MOESM1 of Database fingerprint (DFP): an approach to represent molecular databases

Additional file 1:
Table S1. DFPs of representative data sets used in this work.
Table S2. Inter-set relationships computed with the newly developed database fingerprint (DFP) and the Tanimoto coefficient.
Fig. S1. Distributions of MACCS keys (166 bits) of selected data sets studied in this work (others are shown in the main text).
Fig. S2. Visual representation of the distance matrix comparing inter-set relationships of the compound data sets computed with the database fingerprint (DFP) and city block distance.
Fig. S3. Relationship between inverse normalized city block distance and Tanimoto similarity using the DFP.
Fig. S4. Inter-set relationships of the compound data sets computed with MACCS keys and the Tanimoto coefficient.
Fig. S5. Relationship between mean similarities computed with MACCS keys and DFP.
Fig. S6. Relationship between Shannon entropy and DFP/Tanimoto similarity, and k-means Euclidean clustering, for the ten compound data sets in Table 2 at a threshold of 0.6.
Fig. S7. Probability distribution of the 198 significant bit positions recovered from the original databases represented by the PubChem fingerprint at a threshold of 0.6.
Fig. S8. Relationship between Shannon entropy and DFP/Tanimoto similarity, and k-means Euclidean clustering, for the ten compound data sets in Table 2 at a threshold of 0.7.
Fig. S9. Probability distribution of the 198 significant bit positions recovered from the original databases represented by the PubChem fingerprint at a threshold of 0.7.
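
For readers unfamiliar with the two measures named in Table S2 and Figs. S2 and S3, the following is a minimal sketch of the Tanimoto coefficient and the city block distance applied to two binary fingerprints. The function names and the example bit vectors are hypothetical, and the definition of "inverse normalized" as one minus the distance divided by the fingerprint length is an assumption made for illustration; it is not taken from the supplementary file.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Convention chosen here (an assumption): two all-zero fingerprints
    # are treated as identical.
    return both / either if either else 1.0


def city_block(a: Sequence[int], b: Sequence[int]) -> int:
    """City block (Manhattan) distance: number of differing bit positions."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def inverse_normalized_city_block(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed definition: 1 - (city block distance / fingerprint length)."""
    if not a:
        raise ValueError("fingerprints must not be empty")
    return 1.0 - city_block(a, b) / len(a)


# Hypothetical 8-bit fingerprints, for illustration only.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]

print(tanimoto(fp_a, fp_b))                       # 3 shared bits / 5 set bits
print(city_block(fp_a, fp_b))                     # 2 differing positions
print(inverse_normalized_city_block(fp_a, fp_b))  # 1 - 2/8
```

Both quantities grow as two fingerprints agree on more positions, but the Tanimoto coefficient ignores positions that are off in both vectors, whereas the city block distance counts every position equally.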