By Ryan Seghers

May 27, 2015

Introduction

QuickRT is a highly optimized Extremely Randomized Trees classifier implementation. This page contains some preliminary benchmarks comparing QuickRT to other available implementations.

QuickRT can perform binary (two-class) and multi-class classification, but the benchmarks on this page only cover binary classification.

Test Environment

These benchmarks were performed using a pre-alpha version of QuickRT. The test machine was:

  • Core i7 4770 3.4 GHz (Haswell)
  • 16 GB of RAM
  • Windows 8.1 Pro x64 (with all updates installed as of test date)

Test Data

The test data files used are:

Data Set        Type                   Size          Samples    Features               Notes
HIGGS           binary classification  up to 7.8 GB  up to 11M  28 (floats)            UCI Machine Learning Repository
Adult (Census)  binary classification  3.8 MB        32,561     14 (most categorical)  UCI Machine Learning Repository
Not all samples of HIGGS were used due to its large size; the number of samples used is specified with each test that uses that data set. When a subset was used, the samples were taken consecutively from the start of the file.

Classifiers

The classifiers used were:

  • QuickRT (QRT): Pre-alpha version
  • Scikit-learn Extra Trees (SKL ERT): scikit-learn ExtraTreesClassifier
  • Weka RandomForest (WRF): Weka Version 3.7
  • Weka ExtraTree (WERT): Weka Version 3.7
  • OpenCV RandomForest (CVRF): CvRTrees
  • OpenCV ExtraTrees (CVERT): CvERTrees

The Java version (for Weka) was:
Java(TM) SE Runtime Environment (build 1.7.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

The Python version (for Scikit-learn) was:
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)]
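For reference, the SKL ERT configuration corresponds to a plain ExtraTreesClassifier run. A minimal sketch (using a current scikit-learn API rather than the 2015 version benchmarked, and synthetic data standing in for HIGGS's 28 float features):

```python
# Minimal sketch of the SKL ERT setup: an ExtraTreesClassifier trained
# single-thread on synthetic binary-classification data. The data here is
# synthetic; the benchmarks used the HIGGS and Adult CSV files.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
X = rng.rand(1000, 28)                     # 28 float features, like HIGGS
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic binary labels

# n_jobs=1 matches the single-thread (ST) runs; n_jobs=4 would match MT.
clf = ExtraTreesClassifier(n_estimators=100, n_jobs=1, random_state=0)
clf.fit(X[:750], y[:750])
acc = clf.score(X[750:], y[750:])
```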

Memory Use

Memory use was measured as the Peak Working Set reported by Windows Task Manager during a single-thread/process run of a single fold (with K = 4), requesting 500 trees.
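The benchmarks read Peak Working Set from Task Manager by hand; on Unix-like systems the analogous figure (peak resident set size) can be read in-process from the standard library, as this stdlib-only sketch shows (not part of the original benchmark procedure):

```python
# Peak resident set size of the current process via the stdlib
# resource module (Unix-only; roughly analogous to Windows'
# Peak Working Set). Note the units differ by platform:
# kilobytes on Linux, bytes on macOS.
import resource

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak resident set (raw ru_maxrss):", peak)
```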

The results (from lowest to highest) for the HIGGS 10,000 data set:

Classifier        Memory (Peak Working Set)
QuickRT           93 MB
OpenCV RF         187 MB
scikit-learn ERT  228 MB
OpenCV ERT        482 MB
Weka RF           595 MB
Weka ERT          1,560 MB

The results (from lowest to highest) for the first three classifiers on the HIGGS 100,000 data set:

Classifier        Memory (Peak Working Set)
QuickRT           829 MB
OpenCV RF         1,665 MB
scikit-learn ERT  1,700 MB

Speed - Without Overhead

The test scenario was K-fold validation on the input data file, with 4 folds. For each classifier a "warm-up" run was done requesting just one tree, then an "overhead" run, again with just one tree. The "warm-up" run spins up caches, VMs, and JIT compilers to avoid cold-start penalties, though some subsystems may not benefit because the warm-up executes in a separate process. The "overhead" run measures process startup and CSV load overhead; that overhead time is then subtracted from all subsequent times. The times charted are therefore not total times but the additional time beyond the one-tree time. The overhead times are measured and reported separately, below.

Both single-thread (ST) and multi-threaded (MT) scenarios were benchmarked. Where a classifier did not support multi-threading, the 4 folds were run simultaneously in separate processes. This puts a multi-process run at a disadvantage relative to a multi-threaded classifier, because 4 threads can share the same feature data whereas each process must load and hold its own copy. Single-thread/process runs are therefore the simplest comparison.

For faster classifiers there are 10 data points per run, corresponding to requesting from 50 to 500 trees in steps of 50. For slower classifiers fewer data points were run, but still starting from 50 trees and stepping by 50.
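The warm-up/overhead-subtraction scheme described above can be sketched in a few lines of stdlib Python. Here train_and_test is a hypothetical stand-in for a real run (start process, load CSV, train N trees, classify), so the absolute numbers are meaningless; only the bookkeeping is illustrated:

```python
# Sketch of the benchmark timing scheme: a warm-up run, a timed
# one-tree "overhead" run, then overhead-subtracted timings for
# 50..500 trees in steps of 50.
import time

def train_and_test(n_trees):
    # Hypothetical stand-in for "load CSV, train n_trees, classify".
    time.sleep(0.0002 * n_trees)  # pretend each tree has a fixed cost

def timed(n_trees):
    t0 = time.perf_counter()
    train_and_test(n_trees)
    return time.perf_counter() - t0

timed(1)             # "warm-up" run: one tree, result discarded
overhead = timed(1)  # "overhead" run: one tree, time kept

# Charted values: additional time beyond the one-tree time.
points = {n: timed(n) - overhead for n in range(50, 501, 50)}
```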

HIGGS 10,000

The following charts use the first 10,000 records of the HIGGS data set.

[Charts: HIGGS 10,000 single-thread (ST) and multi-thread (MT) results]

HIGGS 100,000

The following charts use the first 100,000 records of the HIGGS data set. Most classifiers are excluded because they are much slower.

[Charts: HIGGS 100,000 single-thread (ST) and multi-thread (MT) results]

Speed - With Overhead

The scikit-learn ExtraTreesClassifier is the only classifier tested with speed comparable to QuickRT. The primary benchmark above subtracts the one-tree run time to remove overhead; that overhead was measured and is reported here.

These are times for training and classifying a single fold (with K = 4) of the HIGGS data set. Here is the QuickRT command line:
--metric gini [--nThreads 1] --speed 9 --nTrees [number of trees] -i trainFile.csv --test testFile.csv

Here is the time breakdown:

Classifier                  Load CSV (s)  Train (s)  Classify (s)  Total (s)
100 Trees, Single-thread
  QuickRT                   0.25          3.4        0.5           4.2
  scikit-learn ERT          2.5           7.7        0.7           10.9
100 Trees, Multi-thread
  QuickRT                   0.25          1.0        0.1           1.4
  scikit-learn ERT          2.5           2.7        0.3           5.6
500 Trees, Single-thread
  QuickRT                   0.25          16.6       2.3           19.2
  scikit-learn ERT          2.5           37.5       3.2           43.2
500 Trees, Multi-thread
  QuickRT                   0.25          4.6        0.6           5.5
  scikit-learn ERT          2.5           11.3       0.8           14.6

QuickRT Speed Argument

QuickRT has a speed-accuracy trade-off argument (--speed) that takes values from 0 to 10: 0 means slowest speed and highest accuracy, 10 means highest speed and lowest accuracy. QuickRT varies the thoroughness of node-split searches based on this parameter; see the usage for details. This section shows the effect of varying it. The QuickRT?? runs in the chart correspond to runs with different speed argument values.

The test scenario is the same as for the HIGGS data set above, but this time on the Adult (Census) data set.

[Chart: HIGGS 10,000 ST]

Conclusion

QuickRT is a highly optimized Extremely Randomized Trees (Random Forest) implementation. For binary (two-class) classification, QuickRT is both faster and uses less memory than the alternatives benchmarked here.