Additional file 5 of CellMapper: rapid and accurate inference of gene expression in difficult-to-isolate cell types

Robustness of CellMapper to bias in dataset composition. Samples were drawn from the Lukk et al. [28] dataset to intentionally increase or decrease bias in sample composition, and the effect on algorithm performance was quantified. a Sensitivity to adding redundant samples. CellMapper was applied, with and without the SVD filter, to search for tissue-specific genes using 500 randomly selected samples from the total microarray dataset plus varying numbers of added “redundant samples.” For this analysis, redundant samples were selected from the subset of the data annotated as “blood,” “bone marrow,” or “mammary gland,” because these three annotations are the most over-represented in the Lukk dataset, together accounting for over half of all samples. Without the SVD filter, performance degraded as redundant samples were added; with the SVD filter, CellMapper actually performed better and was able to benefit from the increase in sample size. b Sensitivity to removing relevant samples. Samples annotated as belonging to a specific tissue were removed from the Lukk dataset, and CellMapper was applied to this truncated dataset to search for genes expressed in the tissue whose samples had been removed. This analysis was run separately for each of seven tissues (“bone,” “colon,” “kidney,” “liver,” “ovary,” “prostate,” and “skin”), and the mean change in AUPR across all tissues is reported. These tissues were chosen because each is represented by an intermediate number of samples in the Lukk dataset (50–150 samples per tissue, or 1–3 % of the total). (PDF 19 kb)
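
The two sampling schemes described in panels a and b can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`biased_subset`, `drop_tissue`, `mean_aupr_change`) are hypothetical, the sketch assumes that redundant samples do not overlap with the 500 base samples and that every sample of a tissue is removed in panel b, and it takes per-tissue AUPR values as given rather than running CellMapper.

```python
import random

# The three over-represented annotations named in the caption.
REDUNDANT_LABELS = {"blood", "bone marrow", "mammary gland"}


def biased_subset(annotations, n_base, n_redundant, seed=0):
    """Panel a: n_base random samples plus n_redundant extra samples
    drawn only from the over-represented annotations."""
    rng = random.Random(seed)
    all_idx = list(range(len(annotations)))
    base = rng.sample(all_idx, n_base)
    chosen = set(base)
    # Assumption: redundant samples are distinct from the base draw.
    pool = [i for i in all_idx
            if annotations[i] in REDUNDANT_LABELS and i not in chosen]
    extra = rng.sample(pool, n_redundant)
    return base + extra


def drop_tissue(annotations, tissue):
    """Panel b: indices of samples NOT annotated as `tissue`.
    Assumption: every sample of that tissue is removed."""
    return [i for i, label in enumerate(annotations) if label != tissue]


def mean_aupr_change(aupr_full, aupr_truncated):
    """Mean over tissues of (truncated AUPR minus full-dataset AUPR)."""
    tissues = sorted(aupr_full)
    return sum(aupr_truncated[t] - aupr_full[t] for t in tissues) / len(tissues)
```

In this sketch, `annotations` is a list with one tissue label per sample; the returned indices would be used to select columns of the expression matrix before each CellMapper run, and a negative value from `mean_aupr_change` indicates that performance dropped after removal.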