10.6084/m9.figshare.c.3606128_D12.v1 Bradlee Nelms, Levi Waldron, Luis Barrera, Andrew Weflen, Jeremy Goettel, Guoji Guo, Robert Montgomery, Marian Neutra, David Breault, Scott Snapper, Stuart Orkin, Martha Bulyk, Curtis Huttenhower, Wayne Lencer. Additional file 5: of CellMapper: rapid and accurate inference of gene expression in difficult-to-isolate cell types Springer Nature 2016 Cell type Expression Microarray Genome-wide association study Inflammatory bowel disease 2016-09-29 05:00:00 Journal contribution https://springernature.figshare.com/articles/journal_contribution/Additional_file_5_of_CellMapper_rapid_and_accurate_inference_of_gene_expression_in_difficult-to-isolate_cell_types/4347890
Robustness of CellMapper to bias in dataset composition. Samples were drawn from the Lukk et al. [28] dataset to intentionally increase or decrease bias in sample composition, and the effect on algorithm performance was quantified. a Sensitivity to adding redundant samples. CellMapper was applied, with and without the SVD filter, to search for tissue-specific genes using 500 randomly selected samples from the total microarray dataset, plus varying numbers of added “redundant samples.” For this analysis, “redundant samples” were selected from the subset of the data annotated as “blood,” “bone marrow,” or “mammary gland,” because these three sample annotations are the most over-represented in the Lukk dataset, together accounting for over half of all samples. Without the SVD filter, performance degraded as redundant samples were added; with the SVD filter, CellMapper actually performed better and was able to benefit from the increase in sample size. b Sensitivity to removing relevant samples.
Samples annotated as belonging to a specific tissue were removed from the Lukk dataset, and CellMapper was applied to this truncated dataset to search for genes expressed in the tissue whose samples had been removed. This analysis was run separately for each of seven tissues (“bone,” “colon,” “kidney,” “liver,” “ovary,” “prostate,” and “skin”), and the mean change in AUPR across all tissues is reported. These tissues were analyzed because each is represented by an intermediate number of samples in the Lukk dataset (50–150 samples per tissue, or 1–3% of the total). (PDF 19 kb)
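The two sampling schemes described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names, the `annotations` mapping (sample ID to tissue label), the fixed seed, and the choice to draw redundant samples without overlapping the 500 base samples are all assumptions, and CellMapper itself is not called here. The AUPR values are assumed to be computed elsewhere.

```python
import random

# Over-represented annotations named in the legend.
REDUNDANT_LABELS = ("blood", "bone marrow", "mammary gland")


def add_redundant_samples(annotations, n_base=500, n_redundant=100, seed=0):
    """Panel a (sketch): n_base random samples plus n_redundant extra
    samples drawn only from the over-represented annotations.

    Assumption: the extra samples do not overlap with the base samples.
    """
    rng = random.Random(seed)
    all_ids = sorted(annotations)
    base = rng.sample(all_ids, n_base)
    base_set = set(base)
    pool = [s for s in all_ids
            if annotations[s] in REDUNDANT_LABELS and s not in base_set]
    extra = rng.sample(pool, n_redundant)
    return base + extra


def remove_tissue(annotations, tissue):
    """Panel b (sketch): drop every sample annotated with `tissue`."""
    return [s for s in sorted(annotations) if annotations[s] != tissue]


def mean_aupr_change(aupr_full, aupr_truncated):
    """Mean over tissues of (AUPR on truncated data - AUPR on full data)."""
    tissues = sorted(aupr_full)
    return sum(aupr_truncated[t] - aupr_full[t] for t in tissues) / len(tissues)
```

In such a setup, `add_redundant_samples` would be called with increasing `n_redundant` and the search repeated with and without the SVD filter, while `remove_tissue` would be called once per tissue before collecting the per-tissue AUPR values passed to `mean_aupr_change`.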