Experimental test datasets for Fast and Scalable Implementation of the Bayesian SVM

dataset

posted on 2017-12-29, 18:22 authored by Florian Wenzel, Théo Galy-Fajou, Matthäus Deutsch, Marius Kloft

This record contains seven real-world test datasets used in experiments with the Bayesian SVM algorithm in the ECML PKDD 2017 paper by Wenzel et al., "Bayesian Nonlinear Support Vector Machines for Big Data".

For the code used in the related experiments, please see https://doi.org/10.6084/m9.figshare.5443627

The datasets are used in the related experiments to compare the prediction performance, the quality of the uncertainty estimates, and the run time of the various methods. Collectively they contain millions of samples. All of the datasets are from the Rätsch benchmark datasets, which are commonly used to test the accuracy of binary nonlinear classifiers.

Data files are in the .data format used by Analysis Studio, a statistical analysis and data mining program. Each file contains mined data in a plain-text, tab-delimited format, including an Analysis Studio file header. The raw data can be opened in any text editor.
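Because the files are plain, tab-delimited text, they can in principle be read with standard tooling. The following is a minimal Python sketch, not the authors' loading code. The function name `read_tab_delimited` and the `header_lines` parameter are hypothetical, and the sample content is made up for illustration. The length of the Analysis Studio file header is not stated in this record, so inspect a file in a text editor and set `header_lines` accordingly. The sketch also assumes that every remaining field is numeric.

```python
import csv
import io


def read_tab_delimited(stream, header_lines=0):
    """Parse tab-delimited numeric rows from a text stream.

    `header_lines` is an assumption: the number of leading lines to skip
    (for example a file header). Check the actual file before choosing it.
    """
    for _ in range(header_lines):
        next(stream, None)  # discard one header line, if present
    rows = []
    for record in csv.reader(stream, delimiter="\t"):
        if not record:  # skip blank lines
            continue
        rows.append([float(value) for value in record])
    return rows


# Made-up sample content, NOT taken from the real data files.
sample = "hypothetical header line\n1.0\t0.5\t-1\n0.2\t0.7\t1\n"
rows = read_tab_delimited(io.StringIO(sample), header_lines=1)
print(len(rows), len(rows[0]))  # prints: 2 3
```

To read one of the files listed in this record, the same function could be called on an opened file, for example `with open("Processed_Heart.data", newline="") as f: rows = read_tab_delimited(f, header_lines=...)`, with the header length filled in after inspecting the file. Which column holds the class label is likewise not stated here and should be checked against the data.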

The data are from a range of disciplines that correspond to applications considered in the related publication:

Processed_BreastCancer.data

Processed_Diabetis.data

Processed_Flare.data

Processed_German.data

Processed_Heart.data

Processed_Splice.data

Processed_Waveform.data

Background

We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.

Please also check out our GitHub repository:

https://github.com/theogf/BayesianSVM.jl

Funding

This work was partly funded by the German Research Foundation (DFG) award KL 2698/2-1.

Research Data Support

Research data support provided by Springer Nature.
