Pedro Domingos and Geoff Hulten developed VFML (Very Fast Machine Learning) in 2003 as a toolkit for experimenting with machine learning on data streams whose volume makes traditional batch techniques impractical. Their original work is described in:
Hulten, G. and Domingos, P. (2003). "VFML -- A toolkit for mining high-speed time-changing data streams." http://www.cs.washington.edu/dm/vfml/
JVFML is a Java implementation of Domingos and Hulten's Very Fast Decision Tree (VFDT) algorithm, which builds decision trees from streaming data using a statistical result known as the Hoeffding bound. VFDT uses the bound to decide when a leaf has seen enough instances to be split, with high confidence that a traditional batch learner given all the data would have chosen the same split attribute.
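As a rough illustration of the split test, the sketch below computes the Hoeffding bound ε = sqrt(R² ln(1/δ) / 2n), where n is the number of instances seen at a leaf, 1 − δ is the desired confidence, and R is the range of the split heuristic (log2 of the number of classes for information gain). The leaf splits when the best attribute's gain beats the runner-up's by more than ε, or when ε has shrunk below a tie-breaking threshold τ. The class and method names, and the values chosen for δ and τ, are illustrative assumptions rather than JVFML's API or defaults.

```java
/** Minimal sketch of a VFDT-style Hoeffding-bound split test (hypothetical, not JVFML's code). */
public class HoeffdingSplitSketch {

    /** Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)). */
    static double hoeffdingBound(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /**
     * True if the leaf should split: the gain difference between the best and
     * second-best attributes exceeds epsilon, or epsilon is below the tie threshold tau.
     */
    static boolean shouldSplit(double bestGain, double secondGain, int numClasses,
                               double delta, double tau, long n) {
        double range = Math.log(numClasses) / Math.log(2); // R = log2(c) for information gain
        double eps = hoeffdingBound(range, delta, n);
        return (bestGain - secondGain) > eps || eps < tau;
    }

    public static void main(String[] args) {
        double delta = 1e-7; // assumed confidence parameter
        double tau = 0.05;   // assumed tie-breaking threshold
        for (long n : new long[] {100, 1000, 10000}) {
            System.out.printf("n=%d eps=%.4f split=%b%n", n,
                    hoeffdingBound(1.0, delta, n),
                    shouldSplit(0.30, 0.20, 2, delta, tau, n));
        }
    }
}
```

With these example values, a gain difference of 0.10 on a two-class problem is not yet significant after 100 instances (ε ≈ 0.28) but is after 1,000 (ε ≈ 0.09), which is how VFDT defers a split until the evidence is strong enough.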
JVFML is designed to interface with Weka. Because Weka loads the whole data set into memory, this removes one of VFDT's main advantages: processing a stream one instance at a time without ever holding the entire data set in memory. The Weka integration is nonetheless a useful tool for experimenting with the algorithm.
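To make the one-instance-at-a-time idea concrete: VFDT does not keep the instances it has seen. Each leaf keeps only counts of how often each attribute value co-occurs with each class, and the split heuristic is computed from those counts. The sketch below shows this for a single nominal attribute using information gain; the class name `LeafStatsSketch` is hypothetical, and this is not JVFML's internal data structure.

```java
/** Hypothetical per-leaf sufficient statistics for one nominal attribute. */
public class LeafStatsSketch {
    private final long[][] counts; // counts[v][c] = instances seen with attribute value v and class c
    private long total;

    LeafStatsSketch(int numValues, int numClasses) {
        counts = new long[numValues][numClasses];
    }

    /** Absorb one instance; the instance itself is not stored, so memory is O(values x classes). */
    void update(int attrValue, int classValue) {
        counts[attrValue][classValue]++;
        total++;
    }

    /** Entropy (in bits) of a class distribution given as counts summing to n. */
    static double entropy(long[] classCounts, long n) {
        double h = 0.0;
        for (long c : classCounts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Information gain of splitting on this attribute, computed from the counts alone. */
    double infoGain() {
        int numClasses = counts[0].length;
        long[] classTotals = new long[numClasses];
        double weightedChildEntropy = 0.0;
        for (long[] row : counts) {
            long rowTotal = 0;
            for (int c = 0; c < numClasses; c++) {
                classTotals[c] += row[c];
                rowTotal += row[c];
            }
            if (rowTotal > 0) {
                weightedChildEntropy += (double) rowTotal / total * entropy(row, rowTotal);
            }
        }
        return entropy(classTotals, total) - weightedChildEntropy;
    }

    public static void main(String[] args) {
        LeafStatsSketch leaf = new LeafStatsSketch(2, 2);
        // A tiny "stream" in which the attribute value predicts the class perfectly.
        int[][] stream = {{0, 0}, {1, 1}, {0, 0}, {1, 1}};
        for (int[] instance : stream) {
            leaf.update(instance[0], instance[1]);
        }
        System.out.printf("gain=%.3f%n", leaf.infoGain());
    }
}
```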
The developers of Weka have also developed a streaming machine learning toolkit called MOA (Massive Online Analysis). MOA includes an implementation of VFDT (its Hoeffding Tree classifier) as well as a number of other stream classifiers.
Domingos and Hulten's original VFML implementation is written in C, and its source code can be downloaded from the VFML SourceForge repository. However, on Ubuntu 12.04 LTS (and possibly other modern Linux distributions), the project does not build as-is. JVFML therefore includes a slightly modified version of Domingos and Hulten's original C code that compiles under Ubuntu 12.04 LTS.
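To launch the Weka GUI with JVFML on the classpath, run the following command from the directory containing both jar files:
java -classpath weka.jar;vfml-weka-1.0.0.jar weka.gui.GUIChooser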
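Note: On Linux and other Unix-like systems (such as macOS), use : instead of ; as the classpath separator. In a Unix shell, ; ends the command, so the Windows form of the command above will fail.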
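If you prefer to launch Weka from Java code, `java.io.File.pathSeparator` supplies the correct separator for the current platform (; on Windows, : on Unix-like systems). The helper below is a hypothetical sketch, not part of JVFML; it assumes the `java` executable is on the PATH and that both jars sit in the working directory. Because `ProcessBuilder` passes arguments directly rather than through a shell, the separator is never misread as a command terminator.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher that builds the Weka command with the platform's classpath separator. */
public class WekaLauncherSketch {

    /** Builds: java -classpath <jars joined by File.pathSeparator> weka.gui.GUIChooser */
    static List<String> buildCommand(String... jars) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); // assumes java is on the PATH
        cmd.add("-classpath");
        cmd.add(String.join(File.pathSeparator, jars));
        cmd.add("weka.gui.GUIChooser");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand("weka.jar", "vfml-weka-1.0.0.jar");
        System.out.println(String.join(" ", cmd));
        // Pass --run to actually start Weka; otherwise the command is only printed.
        if (args.length > 0 && args[0].equals("--run")) {
            Process weka = new ProcessBuilder(cmd).inheritIO().start();
            System.exit(weka.waitFor());
        }
    }
}
```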
Note: To work with large data sets, you may need to increase the Java heap space available to Weka. For example, to give Weka 2 GB of memory, add the option `-Xmx2g` to the command above, placing it before `weka.gui.GUIChooser` (JVM options must come before the main class name).
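To confirm that an `-Xmx` setting actually took effect, a small program can report the maximum heap the JVM will attempt to use via `Runtime.maxMemory()`. `HeapCheck` is a hypothetical name used only for this sketch; run it with the same JVM options you pass to Weka.

```java
/** Prints the maximum heap the running JVM will attempt to use. */
public class HeapCheck {

    static long maxHeapMiB() {
        // maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

For example, `java -Xmx2g HeapCheck.java` (single-file source launch, Java 11 or later) should print a value near 2048 MiB; the reported figure is often somewhat lower than the `-Xmx` value, depending on the garbage collector.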
Once Weka has been launched with the JVFML jar on the Java classpath, JVFML's classifiers can be used like any other Weka classifier. Two new classifiers should appear under weka/classifiers/trees: VFDT and CVFDT (an extension of VFDT that supports adapting to concept drift). Note that both classifiers currently support only nominal attributes and do not support missing values.
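CVFDT adapts to concept drift by maintaining its statistics over a sliding window of recent instances: each new instance increments the counts, and the instance that falls out of the window decrements them, so outdated concepts are gradually forgotten. The sketch below shows only that window bookkeeping for a single nominal attribute; CVFDT's mechanism for growing and swapping in alternate subtrees is omitted, and the class name and window size are illustrative assumptions, not JVFML's implementation.

```java
import java.util.ArrayDeque;

/** Hypothetical sliding-window counts for one nominal attribute (window bookkeeping only). */
public class WindowedCountsSketch {
    private final int windowSize;
    private final long[][] counts; // counts[v][c] over the most recent windowSize instances
    private final ArrayDeque<int[]> window = new ArrayDeque<>();

    WindowedCountsSketch(int windowSize, int numValues, int numClasses) {
        this.windowSize = windowSize;
        this.counts = new long[numValues][numClasses];
    }

    /** Adds the newest instance and forgets the oldest once the window is over capacity. */
    void add(int attrValue, int classValue) {
        window.addLast(new int[] {attrValue, classValue});
        counts[attrValue][classValue]++;
        if (window.size() > windowSize) {
            int[] oldest = window.removeFirst();
            counts[oldest[0]][oldest[1]]--;
        }
    }

    long count(int attrValue, int classValue) {
        return counts[attrValue][classValue];
    }

    int size() {
        return window.size();
    }

    public static void main(String[] args) {
        WindowedCountsSketch stats = new WindowedCountsSketch(4, 2, 2);
        // Old concept: attribute value 0 goes with class 0.
        for (int i = 0; i < 4; i++) stats.add(0, 0);
        // Drift: attribute value 0 now goes with class 1.
        for (int i = 0; i < 4; i++) stats.add(0, 1);
        System.out.println("n(0,0)=" + stats.count(0, 0) + " n(0,1)=" + stats.count(0, 1));
    }
}
```

After the drift, the old association has been fully forgotten: the counts for (0, 0) drop to zero while (0, 1) fills the window.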
Follow these steps as a quick way to get started running VFML using the default Weka data sets: