Additional file 5 of Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models

2017-01-17T05:00:00Z (GMT) by Jesse Bloom
Simulations validate the statistical approach used to identify diversifying selection. Using the actual ExpCM parameters for NP in Table 2, but fixing ω = 1 for all sites other than those chosen to be simulated under diversifying selection, I used pyvolve [46] to simulate 40 alignments along the tree inferred from the actual NP sequences. For each simulation, I randomly selected 5 sites to place under diversifying selection, with ω_r values ranging from 1 (no diversifying selection) to 30 (very strong diversifying selection). I then analyzed the simulated data using phydms in the same way that the actual data were analyzed. Sites were called as being under significant diversifying selection at the false discovery rates (FDRs) indicated in the figure. The top panel shows that ExpCM greatly outperformed the FEL-like GY94 method at identifying true positives. The bottom panel shows that the Benjamini-Hochberg [28] procedure effectively controls the fraction of false discoveries among the sites called as being under diversifying selection using ExpCM. The Benjamini-Hochberg procedure may be slightly too conservative for ExpCM (for every value of ω_r, the actual rate of false discoveries is slightly below the nominal FDR), but the differences seem modest. The computer code to perform these simulations is in Additional file 17. (PDF 150 kb)
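
The two quantities compared in the bottom panel, the sites called at a nominal FDR and the fraction of those calls that are false, can be illustrated with a minimal Python sketch. This is not the code from Additional file 17. The function names `benjamini_hochberg` and `false_discovery_fraction` and the example P-values are hypothetical, and the sketch assumes one P-value per site and a known list of which sites were truly simulated under diversifying selection.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return a list of booleans, True where a site is called at the given FDR.

    Standard Benjamini-Hochberg step-up rule: find the largest rank k such
    that the k-th smallest P-value is <= (k / m) * fdr, then call the k sites
    with the smallest P-values.
    """
    m = len(pvalues)
    # Site indices ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    called = [False] * m
    for idx in order[:k_max]:
        called[idx] = True
    return called


def false_discovery_fraction(called, truly_selected):
    """Fraction of called sites that were not simulated under selection."""
    n_called = sum(called)
    if n_called == 0:
        return 0.0  # no calls, so no false discoveries
    n_false = sum(1 for c, t in zip(called, truly_selected) if c and not t)
    return n_false / n_called


# Hypothetical per-site P-values; they are not taken from the simulations.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_pvalues, fdr=0.05)
```

In a simulation study of this kind, the output of `false_discovery_fraction` averaged over replicate alignments would be compared with the nominal `fdr`; a value consistently below the nominal level corresponds to the conservative behavior described above.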