Training and testing sample data used in "Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction"

2018-01-16T17:28:23Z (GMT) by Sebastian Daberdaku, Carlo Ferrari
This dataset contains the training and testing samples generated for the paper "Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction".

3D Zernike descriptors are used to represent protein surface shape, while a Support Vector Machine (SVM) is the binary classification technique: a model is built from the training data and then predicts the class labels of the test data given only the test feature vectors.

Data are archived in the .tar.xz format, which can be extracted with common archive utilities. Each archive contains a number of text files in the SVM light format. Each line records 1331 colon-separated pairs of numbers; in each pair, the first number is a feature index (an integer ranging from 1 to 1331) and the second is a floating-point value. CSV (.csv) files are also provided, recording the precision, recall and accuracy values for the test predictions.

The first one or two letters of the archive filename, together with the letter "l" or "r", identify the protein class. The "train" keyword identifies training sets and the "test" keyword identifies testing sets; training sets are available in either balanced or unbalanced form. See the associated paper for further details.

Background:
In the related paper we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as the classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0.
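
As an illustration of the file layout described above, the following is a minimal Python sketch of how one line of an SVM light file could be parsed. It is not taken from the paper or its code. The function name `parse_svmlight_line` and the sample line are hypothetical. The sketch assumes the usual SVM light convention that each line starts with the class label, followed by space-separated index:value pairs.

```python
from typing import Dict, Tuple

N_FEATURES = 1331  # feature indices range from 1 to 1331, per the description


def parse_svmlight_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one SVM light line into (label, {feature_index: value})."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    if not tokens:
        raise ValueError("empty line")
    # Assumption: the first token is the class label (SVM light convention).
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        if not 1 <= index <= N_FEATURES:
            raise ValueError(f"feature index {index} out of range")
        features[index] = float(value_text)
    return label, features


# Made-up line with three pairs, for illustration only;
# lines in the dataset carry 1331 pairs.
sample_label, sample_features = parse_svmlight_line("+1 1:0.25 2:-0.5 1331:1.0")
```

To read a whole file, the same function can be applied to each non-empty line after extracting the archive. Existing SVM light readers in common machine-learning libraries may be a more convenient choice for large files.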