10.6084/m9.figshare.c.3606128_D12.v1
Bradlee Nelms
Bradlee
Nelms
Levi Waldron
Levi
Waldron
Luis Barrera
Luis
Barrera
Andrew Weflen
Andrew
Weflen
Jeremy Goettel
Jeremy
Goettel
Guoji Guo
Guoji
Guo
Robert Montgomery
Robert
Montgomery
Marian Neutra
Marian
Neutra
David Breault
David
Breault
Scott Snapper
Scott
Snapper
Stuart Orkin
Stuart
Orkin
Martha Bulyk
Martha
Bulyk
Curtis Huttenhower
Curtis
Huttenhower
Wayne Lencer
Wayne
Lencer
Additional file 5: of CellMapper: rapid and accurate inference of gene expression in difficult-to-isolate cell types
Springer Nature
2016
Cell type
Expression
Microarray
Genome-wide association study
Inflammatory bowel disease
2016-09-29 05:00:00
Journal contribution
https://springernature.figshare.com/articles/journal_contribution/Additional_file_5_of_CellMapper_rapid_and_accurate_inference_of_gene_expression_in_difficult-to-isolate_cell_types/4347890
Robustness of CellMapper to bias in dataset composition. Samples were drawn from the Lukk et al. [28] dataset to intentionally increase or decrease bias in sample composition, and the effect on algorithm performance was quantified. a Sensitivity to adding redundant samples. CellMapper was applied, with and without the SVD filter, to search for tissue-specific genes using 500 randomly selected samples from the total microarray dataset plus varying numbers of added “redundant samples.” For this analysis, “redundant samples” were selected from the subset of the data annotated as “blood,” “bone marrow,” or “mammary gland” because these three sample annotations are the most over-represented in the Lukk dataset, together accounting for over half of all samples. Performance degraded when redundant samples were added without the SVD filter; with the SVD filter, CellMapper actually performed better and was able to benefit from the increase in sample size. b Sensitivity to removing relevant samples. Samples annotated as belonging to a specific tissue were removed from the Lukk dataset, and CellMapper was applied to this truncated dataset to search for genes expressed in the tissue whose samples had been removed. This analysis was run separately for each of seven tissues (“bone,” “colon,” “kidney,” “liver,” “ovary,” “prostate,” and “skin”), and the mean change in AUPR across all seven tissues is reported. These tissues were chosen because each is represented by an intermediate number of samples in the Lukk dataset (50–150 samples per tissue, or 1–3 % of the total). (PDF 19 kb)