Code for Fast and Scalable Implementation of the Bayesian SVM

This dataset contains the Julia code package for the Bayesian SVM algorithm described in the ECML PKDD 2017 paper, Wenzel et al.: "Bayesian Nonlinear Support Vector Machines for Big Data".

Files are provided in .jl format and contain code in Julia, a high-performance dynamic programming language for numerical computing. The files can be opened with any freely available text editor. To run the code, please see the description below or the more detailed wiki.


BSVM.jl - contains the module to run the Bayesian SVM algorithm.

AFKMC2.jl - File implementing the Assumption-Free K-MC² seeding algorithm for k-means clustering
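To illustrate what this module does, here is a simplified sketch of AFK-MC² seeding (Bachem et al., 2016): k-means++-style center selection where exact D²-sampling is replaced by a short Markov chain, so each new center costs O(m) distance evaluations instead of O(n). Function names and signatures here are illustrative assumptions, not the package's actual API.

```julia
using Random, LinearAlgebra

# Simplified sketch of AFK-MC² seeding. Rows of X are samples.
# NOT the package's actual interface; for illustration only.
function afkmc2_seeding(X::Matrix{Float64}, k::Int, m::Int=100; rng=Random.default_rng())
    n = size(X, 1)
    centers = zeros(k, size(X, 2))
    # First center: uniform sample.
    c1 = rand(rng, 1:n)
    centers[1, :] = X[c1, :]
    # Proposal distribution q: mixture of D²-weights w.r.t. the first center
    # and a uniform term (the "assumption free" regularization).
    d2 = [sum(abs2, X[i, :] .- X[c1, :]) for i in 1:n]
    q = 0.5 .* d2 ./ sum(d2) .+ 0.5 / n
    for j in 2:k
        # Current chain state sampled from q.
        x = sample_from(q, rng)
        dx = minimum(sum(abs2, X[x, :] .- centers[l, :]) for l in 1:j-1)
        for _ in 1:m                       # Metropolis-Hastings chain of length m
            y = sample_from(q, rng)
            dy = minimum(sum(abs2, X[y, :] .- centers[l, :]) for l in 1:j-1)
            # Accept with probability min(1, (dy/q[y]) / (dx/q[x])).
            if dy * q[x] > dx * q[y] * rand(rng)
                x, dx = y, dy
            end
        end
        centers[j, :] = X[x, :]
    end
    return centers
end

# Inverse-CDF sampling of an index from a discrete distribution q.
function sample_from(q::Vector{Float64}, rng)
    r = rand(rng)
    s = 0.0
    for i in eachindex(q)
        s += q[i]
        r <= s && return i
    end
    return length(q)
end
```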

KernelFunctions.jl - Module defining the kernel types and functions
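As an illustration, the squared-exponential (RBF) kernel is the typical default for a nonlinear SVM; the sketch below computes its Gram matrix. The function name and signature are assumptions for illustration, not the module's actual interface.

```julia
# Illustrative RBF (squared-exponential) kernel matrix, the kind of kernel a
# module like KernelFunctions.jl would provide. NOT the package's actual API.
function rbf_kernel(X::Matrix{Float64}, Y::Matrix{Float64}; lengthscale::Float64=1.0)
    # X is n×d, Y is m×d; returns the n×m Gram matrix K with
    # K[i, j] = exp(-||x_i - y_j||² / (2ℓ²)).
    sqdist = [sum(abs2, X[i, :] .- Y[j, :]) for i in 1:size(X, 1), j in 1:size(Y, 1)]
    return exp.(-sqdist ./ (2 * lengthscale^2))
end
```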

DataAccess.jl - Module for either generating data or exporting from an existing dataset

run_test.jl and paper_experiments.jl - Scripts to run the algorithm on a dataset and compute accuracy via n-fold cross-validation, as well as the Brier score and the log score
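The two probabilistic scoring rules mentioned above have standard definitions for binary labels y ∈ {-1, 1} and predicted probabilities p = P(y = 1); the sketch below uses those standard definitions, and the actual scripts may differ in detail.

```julia
using Statistics

# Brier score: mean squared error between p and the 0/1-coded label (lower is better).
brier_score(y, p) = mean((p .- (y .+ 1) ./ 2) .^ 2)

# Log score: mean negative log-likelihood of the observed label (lower is better).
function log_score(y, p; eps=1e-12)
    q = clamp.(p, eps, 1 - eps)            # guard against log(0)
    t = (y .+ 1) ./ 2                      # map {-1, 1} -> {0, 1}
    return -mean(t .* log.(q) .+ (1 .- t) .* log.(1 .- q))
end
```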

test_functions.jl and paper_experiment_functions.jl - Sets of data types and functions for efficient testing.

ECM.jl - Module for expectation conditional maximization (ECM) for nonlinear Bayesian SVM

For the datasets used in the related experiments, please see https://doi.org/10.6084/m9.figshare.5443621

Requirements

BayesianSVM requires Julia version 0.5 or later. Other necessary packages are added automatically during installation. It is also possible to run the package from Python; to do so, please check PyJulia. If you prefer R, you can use RJulia. Both options are somewhat technical, since Julia is still a young language.

Installation

To install the latest version of the package, run the following in Julia:

Pkg.clone("git://github.com/theogf/BayesianSVM.jl.git")

Running the Algorithm

Here are the basic steps for using the algorithm:

using BayesianSVM
Model = BSVM(X_training,y_training)
Model.Train()
y_predic = sign.(Model.Predict(X_test))
y_uncertaintypredic = Model.PredictProb(X_test)
Here X_training should be a matrix of size NSamples × NFeatures, and y_training a vector with entries 1 and -1.
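A minimal way to build inputs in those shapes, using toy data (this only illustrates data preparation; variable names follow the snippet above, and the BSVM calls then apply unchanged):

```julia
using Random

# Toy binary classification problem in the shapes BSVM expects:
# X_training is NSamples × NFeatures, y_training is a vector of +1/-1 labels.
Random.seed!(42)
n, d = 100, 2
X_pos = randn(n ÷ 2, d) .+ 2.0          # cluster around (+2, +2)
X_neg = randn(n ÷ 2, d) .- 2.0          # cluster around (-2, -2)
X_training = vcat(X_pos, X_neg)
y_training = vcat(ones(n ÷ 2), -ones(n ÷ 2))
```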

You can find a more complete description in the Wiki

Background

We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.
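The scalability benefit of inducing points can be illustrated with a generic Nyström-style low-rank approximation (a sketch of the general idea only, not the paper's variational scheme): m ≪ n inducing points Z give K ≈ K_nm K_mm⁻¹ K_mn, so the full n×n kernel matrix never needs to be formed.

```julia
using LinearAlgebra

# Generic Nyström sketch of the inducing-point idea with an RBF kernel.
# The paper's method embeds the same low-rank idea in a stochastic
# variational inference scheme; this is NOT that method.
rbf(a, b; ℓ=1.0) = exp(-sum(abs2, a .- b) / (2ℓ^2))

function nystrom(X::Matrix{Float64}, Z::Matrix{Float64}; jitter=1e-8)
    n, m = size(X, 1), size(Z, 1)
    Knm = [rbf(X[i, :], Z[j, :]) for i in 1:n, j in 1:m]   # n×m cross-kernel
    Kmm = [rbf(Z[i, :], Z[j, :]) for i in 1:m, j in 1:m]   # m×m inducing kernel
    # Rank-m approximation of the full n×n Gram matrix; jitter for stability.
    return Knm * ((Kmm + jitter * I) \ Knm')
end
```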

Please also check out our GitHub repository:
github.com/theogf/BayesianSVM.jl