By Ryan Seghers
May 27, 2015
QuickRT is a highly optimized Extremely Randomized Trees classifier implementation. This page contains some preliminary benchmarks comparing QuickRT to other available implementations.
QuickRT can perform binary (two-class) and multi-class classification, but the benchmarks on this page only cover binary classification.
These benchmarks were performed using a pre-alpha version of QuickRT. The test machine was:
- Core i7 4770 3.4 GHz (Haswell)
- 16 GB of RAM
- Windows 8.1 Pro x64 (with all updates installed as of test date)
The test data files used are:
|HIGGS||binary classification||up to 7.8 GB||up to 11M||28 floats||UCI Machine Learning Repository link|
|Adult (Census)||binary classification||3.8 MB||32,561||14 (most categorical)||UCI Machine Learning Repository link|
The classifiers used were:
- QuickRT (QRT): Pre-alpha version
- Scikit-learn Extra Trees (SKL ERT): scikit-learn ExtraTreesClassifier
- Weka RandomForest (WRF): Weka Version 3.7
- Weka ExtraTree (WERT): Weka Version 3.7
- OpenCV RandomForest (CVRF): CvRTrees
- OpenCV ExtraTrees (CVERT): CvERTrees
The Java version (for Weka) was:
Java(TM) SE Runtime Environment (build 1.7.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
The Python version (for Scikit-learn) was:
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)]
Memory use was measured using Task Manager looking at Peak Working Set during a single-thread/process run of a single fold (with K = 4) and requesting 500 trees.
The results (from lowest to highest) for the HIGGS 10,000 data set:
|Classifier||Memory (Peak Working Set)|
|OpenCV RF||187 MB|
|scikit-learn ERT||228 MB|
|OpenCV ERT||482 MB|
|Weka RF||595 MB|
|Weka ERT||1,560 MB|
The results (from lowest to highest) for the first three classifiers on the HIGGS 100,000 data set:
|Classifier||Memory (Peak Working Set)|
|OpenCV RF||1,665 MB|
|scikit-learn ERT||1,700 MB|
Speed - Without Overhead
The test scenario was to perform K-fold validation on the input data file, with 4 folds. For each classifier a "warm-up"" run was done requesting just one tree, then an "overhead" run again with just one tree. The "warm-up" run is to spin up caches and VM's/JITters to avoid cold-start penalities, though different subsystems may not benefit from this since the warm-up executes a separate process. The "overhead" run is to measure process startup and csv load overhead. That overhead time is then subtracted from all future times. So the times charted are not total time, they are the additional time beyond the 1-tree time. The overhead times are measured and reported separately, below.
Both single-thread (ST) and multi-threaded (MT) scenarios were benchmarked. In the case that a classifier did not support multi-threading the 4 folds were run simultaneously in separate processes. That puts a multi-process run at a disadvantage to a multi-thread classifier because the 4 threads can work with the same feature data whereas in a multi-process run each process must load and contain its own copy of the data. Therefore single-thread/process runs are the simplest comparison.
For faster classifiers there are 10 data points per run, corresponding to requesting from 50 to 500 trees in steps of 50. For slower classifiers fewer data points were run, but still starting from 50 trees and stepping by 50.
The following are using the first 100,000 records of the HIGGs data set. Most classifiers are excluded because they are much slower.
Speed - With Overhead
The scikit-learn ExtraTreesClassifier is the only classifier tested that has similar speed to QuickRT. The primary benchmark program above subtracts the one-tree classifier run time to remove overhead. That overhead was measured and is reported here.
These are times for training and classifying a single fold (with K=4) of the HIGGS data set.
Here is the QuickRT command-line:
--metric gini [--nThreads 1] --speed 9 --nTrees [number of trees] -i trainFile.csv --test testFile.csv
Here is the time breakdown:
|Classifier||Load CSV Time (s)||Train Time (s)||Classify Time (s)||Total Time (s)|
|100 Trees Single-thread|
|100 Trees Multi-thread|
|500 Trees Single-thread|
|QuickRT 500 Trees ST||0.25||16.6||2.3||19.2|
|scikit-learn ERT: 500 Trees ST||2.5||37.5||3.2||43.2|
|500 Trees Multi-thread|
|QuickRT 500 Trees MT||0.25||4.6||0.6||5.5|
|scikit-learn ERT: 500 Trees MT||2.5||11.3||0.8||14.6|
QuickRT Speed Argument
QuickRT has a speed-accuracy trade-off argument (--speed). It can have values from 0 to 10. The value 0 means slowest speed, highest accuracy. The value 10 means highest speed, lowest accuracy. QuickRT varies the thoroughness of node split searches based on this parameter. See the usage for details. The purpose of this section is to show the effects of changing that parameter. The QuickRT?? runs in the chart correspond to runs with different speed argument values.
The test scenario is the same as for the HIGGs data set above, but this time on the Adult (Census) data set.
QuickRT is a highly optimized Extremely Random Trees (Random Forest) implementation. For binary (two-class) classification QuickRT is both faster and uses less memory than the alternatives benchmarked here.