Experimental test datasets for Fast and Scalable Implementation of the Bayesian SVM
Dataset posted on 29.12.2017, 18:22 by Florian Wenzel, Théo Galy-Fajou, Matthäus Deutsch, Marius Kloft
This record contains seven real-world test datasets used in experiments with the Bayesian SVM algorithm in the ECML PKDD 2017 paper: Wenzel et al., Bayesian Nonlinear Support Vector Machines for Big Data.
For code used in the related experiments please see https://doi.org/10.6084/m9.figshare.5443627
The datasets are used in the related experiments to compare the prediction performance, the quality of the uncertainty estimates, and the run time of the various methods. Collectively they contain millions of samples. All of them come from the Rätsch benchmark datasets, which are commonly used to test the accuracy of binary nonlinear classifiers.
Data files are in the .data format used by Analysis Studio, a statistical analysis and data mining program. Each file contains mined data in a plain-text, tab-delimited format, including an Analysis Studio file header. The raw data can be opened with any text editor.
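Because the files are plain, tab-delimited text, they can also be read without Analysis Studio. The following is a minimal sketch in Python, not part of the original experiment code. The function name read_tab_delimited and the skip_header_lines parameter are hypothetical, and the number of header lines is an assumption: this record does not state how long the Analysis Studio header is, so inspect a file in a text editor first. The sketch also assumes that every field after the header is numeric.

```python
import csv
import io


def read_tab_delimited(stream, skip_header_lines=0):
    """Parse tab-delimited rows of numbers from an open text stream.

    skip_header_lines is an assumption: the length of the Analysis Studio
    header is not documented in this record and must be checked per file.
    """
    reader = csv.reader(stream, delimiter="\t")
    rows = []
    for index, fields in enumerate(reader):
        if index < skip_header_lines or not fields:
            continue  # skip header lines and blank lines
        rows.append([float(value) for value in fields])
    return rows


# In-memory stand-in for a real file: made-up values with one header line.
sample = io.StringIO("header line\n1.0\t0.5\t-1\n2.0\t1.5\t1\n")
rows = read_tab_delimited(sample, skip_header_lines=1)
print(len(rows), rows[0])
```

For a downloaded file, the same function can be given a handle from open(path, newline=""), where path points to one of the .data files in this record.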
The data are from a range of disciplines that correspond to applications considered in the related publication:
We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.
Please also check out our GitHub repository:
This work was partly funded by the German Research Foundation (DFG) award KL 2698/2-1.
Research Data Support
Research data support provided by Springer Nature.
Bayesian Approximative Inference, Support Vector Machines, Kernel Methods, Big Data, Bayesian nonlinear support vector machines, SVM, Statistical machine learning, stochastic, uncertainty quantification, class membership probabilities, cancer screening, supervised classification algorithm, classification algorithm, Bayesian inference techniques, machine learning