Experimental test datasets for Fast and Scalable Implementation of the Bayesian SVM
dataset
posted on 29.12.2017, 18:22, authored by Florian Wenzel, Théo Galy-Fajou, Matthäus Deutsch, Marius Kloft
This record contains seven real-world test datasets used in experiments with the Bayesian SVM algorithm in the ECML PKDD 2017 paper by Wenzel et al., "Bayesian Nonlinear Support Vector Machines for Big Data".
For code used in the related experiments please see https://doi.org/10.6084/m9.figshare.5443627
The related experiments use these datasets to compare the prediction performance, the quality of the uncertainty estimates, and the run time of the various methods. Collectively, the datasets contain millions of samples. All of them come from the Rätsch benchmark datasets, which are commonly used to test the accuracy of binary nonlinear classifiers.
Data files are in the .data format used by Analysis Studio, a statistical analysis and data mining program. Each file contains mined data in a plain-text, tab-delimited format, including an Analysis Studio file header. The raw data can be opened with any text editor.
The data are from a range of disciplines that correspond to applications considered in the related publication:
Processed_BreastCancer.data
Processed_Diabetis.data
Processed_Flare.data
Processed_German.data
Processed_Heart.data
Processed_Splice.data
Processed_Waveform.data
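Because the files are described as plain text and tab-delimited with a file header, they can be read with standard tools. The following is a minimal sketch in Python, not the authors' loading code. It assumes that data rows consist only of numeric fields and that any line that does not parse as numbers belongs to the header. The function names and the position of the label column are hypothetical and should be checked against the actual files.

```python
import csv
from typing import List, Tuple


def read_numeric_rows(path: str, delimiter: str = "\t") -> List[List[float]]:
    """Return every row whose fields all parse as floats.

    Assumption: header lines contain at least one non-numeric field,
    so they are skipped rather than parsed.
    """
    rows: List[List[float]] = []
    with open(path, newline="", encoding="utf-8") as handle:
        for fields in csv.reader(handle, delimiter=delimiter):
            fields = [field.strip() for field in fields]
            # Drop empty trailing fields produced by a trailing delimiter.
            while fields and fields[-1] == "":
                fields.pop()
            if not fields:
                continue  # blank line
            try:
                rows.append([float(field) for field in fields])
            except ValueError:
                continue  # assumed to be a header line
    return rows


def split_features_labels(
    rows: List[List[float]], label_column: int = 0
) -> Tuple[List[List[float]], List[float]]:
    """Separate one column as labels; label_column is a hypothetical default."""
    labels = [row[label_column] for row in rows]
    features = [row[:label_column] + row[label_column + 1:] for row in rows]
    return features, labels
```

For example, `rows = read_numeric_rows("Processed_Heart.data")` followed by `X, y = split_features_labels(rows)` would yield feature rows and labels, provided the label really sits in the first column; inspect a file in a text editor first to confirm its layout.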
Background
We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.
Please also check out our github repository:
Funding
This work was partly funded by the German Research Foundation (DFG) award KL 2698/2-1.
Research Data Support
Research data support provided by Springer Nature.
Keywords
Bayesian Approximative Inference, Support Vector Machines, Kernel Methods, Big Data, Bayesian nonlinear support vector machines, SVM, Statistical machine learning, stochastic, uncertainty quantification, class membership probabilities, cancer screening, supervised classification algorithm, classification algorithm, Bayesian inference techniques, machine learning