Additional file 13 of Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species

2015-08-25T05:00:00Z (GMT) by Aram Avila-Herrera, Katherine Pollard
Figure S1. HisKA-RR. Number of effective sequences (Neff) versus number of sequences (N) in the 60 sub-sampled HisKA-RR alignments. Dashed line indicates the diagonal. Blue line indicates a linear fit with 95 % confidence intervals in gray.
Figure S2. Ovch32. Number of effective sequences (Neff) versus number of sequences (N) in the Ovch32 alignments. Dashed line indicates the diagonal. Blue line indicates a linear fit with 95 % confidence intervals in gray.
Figure S3. Distribution of Cβ distances in HisKA-RR interaction (PDB: 3DGE).
Figure S4. Distribution of Cβ distances in Ovch32 interactions [67] (see supplemental file for PDB accessions).
Figure S5. Ovch32. Precision (PPV) versus Neff at FPR < 0.1 %. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S6. Ovch32. Power (TPR) versus Neff at FPR < 5 %. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S7. Ovch32. ϕ max versus Neff. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S8. HisKA-RR alt. Power (TPR) versus Neff/L at FPR < 5 %. A stricter definition of positives, defined experimentally in [46–48], is used. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S9. HisKA-RR alt. Power (TPR) versus Neff/L at FPR < 0.1 %. A stricter definition of positives, defined experimentally in [46–48], is used. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S10. HisKA-RR alt. Precision (PPV) versus Neff/L at FPR < 0.1 %. A stricter definition of positives, defined experimentally in [46–48], is used. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S11. Ovch32. Power (TPR) at FPR < 5 % and precision (PPV) at FPR < 0.1 % versus Neff/L. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray.
Figure S12. HisKA-RR. Nominal false positive rate (FPR) for target FPR 5 %.
Figure S13. Ovch32. Nominal false positive rate (FPR) for target FPR 0.1 %.
Figure S14. Ovch32. Nominal false positive rate (FPR) for target FPR 5 %.
Figure S15. HisKA-RR. ϕ max.
Figure S16. HisKA-RR. F max.
Figure S17. HisKA-RR. Area under precision-recall curve.
Figure S18. HisKA-RR. Area under ROC curve.
Figure S19. Ovch32. ϕ max.
Figure S20. Ovch32. F max.
Figure S21. Ovch32. Area under precision-recall curve.
Figure S22. Ovch32. Area under ROC curve.
Figure S23. HisKA-RR. Median precision (PPV) at FPR < 0.1 % and median power (TPR) at FPR < 5 % per rate categories of individual alignment columns. Rate categories are defined as above- and below-median entropy for the HisKA and RR columns in each set of 10 alignments of equal size (number of sequences (N)).
Figure S24. Ovch32. Precision (PPV) versus the proportion of contacting pairs of residues in each interaction (i.e. contacting pairs divided by all pairs of residues) at FPR < 0.1 %.
Figure S25. Ovch32. Precision (PPV) versus the proportion of contacting pairs of residues in each interaction (i.e. contacting pairs divided by all pairs of residues) at FPR < 5 %.
Figure S26. Ovch32. False positive rate (FPR) versus the proportion of contacting pairs of residues in each interaction (i.e. contacting pairs divided by all pairs of residues) at P < 0.05.
Figure S27. Ovch32. False positive rate (FPR) versus the proportion of contacting pairs of residues in each interaction (i.e. contacting pairs divided by all pairs of residues) at P < 0.001.
Figure S28. HisKA-RR. The phylogenetic methods CTMP and Spidermonkey successfully ran on a subset of our alignments. Power (TPR) at FPR < 5 % and precision (PPV) at FPR < 0.1 %. Select methods are included for comparison. Blue line indicates a linear fit with 95 % confidence intervals in gray.
Figure S29. HisKA-RR. The phylogenetic methods CTMP and Spidermonkey successfully ran on a subset of our alignments. Nominal false positive rate (FPR) at target FPR 0.1 %. Select methods are included for comparison. Blue line indicates a linear fit with 95 % confidence intervals in gray.
Figure S30. HisKA-RR. Quantile-quantile plots show that standardized coevolution scores are not always normally distributed. Scores are from 10 alignments with 5 sequences.
Figure S31. HisKA-RR. Quantile-quantile plots show that standardized coevolution scores are not always normally distributed. Scores are from 10 alignments with 500 sequences.
Figure S32. HisKA-RR. Quantile-quantile plots show that standardized coevolution scores are not always normally distributed. Scores are from 10 alignments with 5000 sequences.
Figure S33. HisKA-RR. P bootstrap fails to control the FPR except for PSICOV at target FPR < 5 % in HisKA-RR alignments. Eliminating residue pairs with large simulation errors shows PSICOV and MIHmin are most robust to variation at individual sites. See Misc. Abbreviations and Table 1 for abbreviations.
Figure S34. Vif. Power (TPR), precision (PPV), and false positive rate (FPR) for predicting viral protein Vif residues (not pairs) essential for interacting with its host target A3G at P empirical < α thresholds that maximize PPV for each coevolution method. Residues defined as positive are taken from previous functional mutation studies in Table 3. See Abbreviations and Table 1 for abbreviations.
Figure S35. Vif. Power (TPR), precision (PPV), and false positive rate (FPR) for predicting viral protein Vif residues (not pairs) essential for interacting with its host target A3G at P empirical < α thresholds that maximize PPV for each coevolution method. Residues defined as positive are taken from previous functional mutation studies in Table 3. See Abbreviations and Table 1 for abbreviations.
Figure S36. Residues (red) on viral protein Vif (light blue) that are predicted to coevolve with its host target A3G (structure unknown). Cofactors are shown in gray. Predictions are made at a threshold that maximizes precision (PPV) using A known essential residues (Table 3) using B-D MI,
Figure S37. HIV1-human. Distinguishing HIV1-human interactors from protein pairs in a permuted network is difficult with small Neff/L. ϕ max across the number of predicted coevolving column-pairs per protein-pair versus Pˆp empirical threshold for making column-pair predictions. Blue line indicates a linear fit with 95 % confidence intervals in gray.
Figure S38. HIV1-human. Neff/L distribution of alignments in HIV1-human interactors. The minimum Neff/L seen in the HisKA-RR (red) and Ovch32 (orange) data sets is marked.
(ZIP 28364 kb)