Additional file 5 of Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models

Bloom, Jesse

doi:10.6084/m9.figshare.c.3667438_D17.v1

13062_2016_172_MOESM5_ESM.pdf (150.74 kB)

journal contribution

posted on 2017-01-17, 05:00 authored by Jesse Bloom

Simulations validate the statistical approach used to identify diversifying selection. Using the actual ExpCM parameters for NP in Table 2, but with ω = 1 fixed at every site other than those chosen to be simulated under diversifying selection, I used pyvolve [46] to simulate 40 alignments along the tree inferred from the actual NP sequences. For each simulation, I randomly selected 5 sites to place under diversifying selection, with ω_r values ranging from 1 (no diversifying selection) to 30 (very strong diversifying selection). I then analyzed the simulated data using phydms in the same way that the actual data were analyzed. Sites were called as being under significant diversifying selection using the false discovery rates (FDRs) indicated in the figure. The top panel shows that ExpCM greatly outperformed the FEL-like GY94 method at identifying true positives. The bottom panel shows that the Benjamini-Hochberg [28] procedure effectively controls the fraction of false discoveries among the sites called as being under diversifying selection using ExpCM. The Benjamini-Hochberg procedure may be slightly too conservative for ExpCM (for every value of ω_r, the actual rate of false discoveries is slightly below the nominal FDR), but the differences seem modest. The computer code to perform these simulations is in Additional file 17. (PDF 150 kb)
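The Benjamini-Hochberg step-up rule mentioned above, and the false-discovery fraction plotted in the bottom panel, can be sketched in a few lines of Python. This is a generic illustration only: it is not the code in Additional file 17 and not the phydms implementation, and the function names `benjamini_hochberg` and `false_discovery_fraction`, as well as the example P-values, are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return one boolean per P-value: True if called significant at this FDR.

    Generic Benjamini-Hochberg step-up rule (illustrative, not the paper's code):
    find the largest rank k with p_(k) <= fdr * k / m, then call every
    hypothesis with rank <= k significant.
    """
    m = len(pvalues)
    # Indices of the P-values, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


def false_discovery_fraction(called, truly_selected):
    """Fraction of called sites that were not actually simulated under selection."""
    n_called = sum(called)
    if n_called == 0:
        return 0.0  # no calls, so no false discoveries
    n_false = sum(1 for c, t in zip(called, truly_selected) if c and not t)
    return n_false / n_called


if __name__ == "__main__":
    # Hypothetical per-site P-values and simulation truth, for illustration.
    site_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20]
    simulated_under_selection = [True, False, False, False, False]
    calls = benjamini_hochberg(site_pvalues, fdr=0.05)
    print(calls, false_discovery_fraction(calls, simulated_under_selection))
```

In a simulation study, the truth vector is known by construction, so the observed false-discovery fraction can be compared directly against the nominal FDR passed to the procedure.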

Funding

National Institute of General Medical Sciences

Keywords

Deep mutational scanning, Phylogenetics, Substitution model, Diversifying selection, dN/dS

Licence

CC BY + CC0
