14 files

Data and code files for the Adaptive Skip-Train Structured Ensemble for Temporal Networks

dataset
posted on 2017-12-29, 18:23 authored by Martin Pavlovski, Fang Zhou, Ivan Stojkovic, Ljupco Kocarev, Zoran Obradovic
This fileset contains the data and source code related to the paper:

Pavlovski, M., Zhou, F., Stojkovic, I., Kocarev, L., & Obradovic, Z. "Adaptive Skip-Train Structured Regression for Temporal Networks", Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017

The code files contain the experimental setups and code for running the various models:

AST-SE: Adaptive Skip-Train Structured Ensemble, a sampling-based structured regression ensemble for prediction on top of temporal networks
LR: L1-regularized linear regression. For efficiency, LR was used as the unstructured predictor in each of the following models.
GCRF: Standard Gaussian Conditional Random Field (GCRF) model that enables the chosen unstructured predictor to learn the network structure.
SE: Structured ensemble composed of multiple GCRF models.
WSE: Weighted structured ensemble that combines the predictions of multiple GCRFs in a weighted mixture in order to predict the nodes' outputs in the next timestep.
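The difference between SE and WSE described above can be illustrated with a minimal Python sketch. It is not taken from the fileset (the original code is in MATLAB). The function name `combine_predictions`, the input layout, and the example weights are assumptions made for illustration only, and how the paper actually derives the mixture weights is not stated on this page.

```python
def combine_predictions(predictions, weights=None):
    """Combine per-model node predictions into one prediction per node.

    predictions: list of lists, one inner list per ensemble member,
                 each holding one predicted value per node (hypothetical layout).
    weights:     optional non-negative weight per member. If omitted, every
                 member counts equally (SE-style average); if given, the result
                 is a weighted mixture (WSE-style).
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("at least one model prediction is required")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")

    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]  # make the weights sum to 1

    n_nodes = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(normalized, predictions))
        for i in range(n_nodes)
    ]


# Two hypothetical ensemble members predicting outputs for two nodes.
member_predictions = [[1.0, 2.0], [3.0, 4.0]]
unweighted = combine_predictions(member_predictions)
weighted = combine_predictions(member_predictions, weights=[3.0, 1.0])
```

With equal weights every member contributes the same amount; with weights `[3.0, 1.0]` the first member contributes three quarters of each node's prediction.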

The data file H3N2_data.mat contains temporally collected gene expression measurements (12,032 genes) of a human subject infected with the H3N2 virus.

For further details, see the related conference paper.

All code is written in MATLAB and provided as .m files, which are plain text and can be opened with any freely available text editor. Data are provided in .mat format, which can be read in the MATLAB computing environment.

Background
A broad range of high-impact applications involve learning a predictive model in a temporal network environment. In weather forecasting, predicting the effectiveness of treatments, outcomes in healthcare, and many other domains, networks are often large, while the intervals between consecutive time points are brief. Models are therefore required to forecast in a more scalable and efficient way without compromising accuracy. The Gaussian Conditional Random Field (GCRF) is a widely used graphical model for performing structured regression on networks. However, GCRF is not applicable to large networks, and it cannot capture different network substructures (communities) since it considers the entire network while learning. In this study, we present a novel model, Adaptive Skip-Train Structured Ensemble (AST-SE), which is a sampling-based structured regression ensemble for prediction on top of temporal networks. AST-SE takes advantage of the scheme of ensemble methods to allow multiple GCRFs to learn from several subnetworks. The proposed model is able to automatically skip the entire training or some phases of the training process. The prediction accuracy and efficiency of AST-SE were assessed and compared against alternatives on synthetic temporal networks and the H3N2 Virus Influenza network. The obtained results provide evidence that (1) AST-SE is ~140 times faster than GCRF, as it skips retraining quite frequently; (2) it still captures the original network structure more accurately than GCRF while operating solely on partial views of the network; and (3) it outperforms both unweighted and weighted GCRF ensembles, which also operate on subnetworks but require retraining at each timestep.
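For orientation, the GCRF mentioned above is commonly written in the following simplified form, shown here for a single unstructured predictor and a single symmetric similarity matrix. The notation is an assumption made for illustration and may differ from the paper: N is the number of nodes, R_i(x) is the unstructured prediction for node i, S_ij is the similarity between nodes i and j, and alpha and beta are the learned non-negative parameters.

```latex
% Simplified GCRF: one unstructured predictor R, one symmetric similarity matrix S
P(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z}\exp\Big(
    -\alpha\sum_{i=1}^{N}\big(y_i - R_i(\mathbf{x})\big)^2
    -\beta\sum_{i<j} S_{ij}\,(y_i - y_j)^2 \Big)

% Equivalent multivariate Gaussian, with L = D - S the graph Laplacian of S
\mathbf{Q} = 2\,(\alpha\mathbf{I} + \beta\mathbf{L}), \qquad
\boldsymbol{\mu} = (\alpha\mathbf{I} + \beta\mathbf{L})^{-1}\,\alpha\,\mathbf{R}(\mathbf{x})
```

In this form the predicted outputs are the Gaussian mean, which requires solving a linear system over all N nodes. This is one reason the model becomes expensive on large networks.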

Funding

Targeted Funding Program, NSF grant CNS-1625061, Pennsylvania Department of Health CURE grant and ONR/ONR Global (grant No. N62909-16-1-2222)

Research Data Support

Research data support provided by Springer Nature.