Training and testing sample data used in "Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction"
2018-01-16T17:28:23Z (GMT) by
This dataset contains the generated training and testing samples used in the paper "Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction".

3D Zernike descriptors represent the shape of the protein surface. A Support Vector Machine (SVM) is the binary classifier: it is trained on the training data to produce a model, which then predicts the class labels of the test data given only their feature vectors.

Data are archived in .tar.xz format, which common archive utilities can extract. Each archive contains a number of text files in the SVMlight format. Each line records 1331 colon-separated pairs of numbers: the first number is a feature index (an integer ranging from 1 to 1331) and the second is a floating-point value. .csv files are also provided, recording the precision, recall and accuracy values for the test predictions.

The first one or two letters of the archive filename, together with the letter "l" or "r", identify the protein class. The "train" keyword identifies training sets and the "test" keyword identifies testing sets. Training sets are available as either balanced or unbalanced. See the associated paper for further details.

Background:
In the related paper we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface, enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as the classifier to distinguish interface local surface patches from non-interface ones.
The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0.
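As an illustration of the file layout described above, the following is a minimal Python sketch of how one sample line might be read and how precision, recall and accuracy could be recomputed from predicted labels. It is not code from the paper. It assumes that each line begins with a class label token (as is standard in the SVMlight format, although the description here mentions only the index:value pairs) and that the positive class is labelled +1. The function names `parse_svmlight_line` and `precision_recall_accuracy` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

N_FEATURES = 1331  # feature indices run from 1 to 1331 per the description


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one sample line into (label, {feature_index: value}).

    Assumption: the line starts with a class label (e.g. +1 or -1),
    followed by whitespace-separated index:value pairs.
    """
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(float(tokens[0]))
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        if not 1 <= index <= N_FEATURES:
            raise ValueError(f"feature index {index} out of range")
        features[index] = float(value_text)
    return label, features


def precision_recall_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and accuracy for binary labels.

    Assumption: `positive` marks the class of interest (here +1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

For whole files, an established reader such as scikit-learn's `load_svmlight_file` is likely a more practical choice than a hand-written parser. The sketch above is only meant to make the per-line structure explicit.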