Additional file 1 of Analysis of error profiles in deep next-generation sequencing data

Supplementary Tables S1-S5.

Table S1. Datasets used. Information provided includes data type, provider, analysis type, target region size, sequencing depth, dilution ratio, and sequencer.

Table S2a. The 19 designed substitution markers in the COLO829 experiment (HiSeq with Kapa enzyme; StJude dataset). Listed are the chromosome, position (hg19), mutation context, and ploidy of each mutation. Also listed are the MAF, mutant allele count (Mut), and total coverage (Tot) for each replicate of each lane output for Normal, Tumor, and two dilutions (1:1000 and 1:5000). For ploidy, 4v4 means 4 out of 4 alleles are mutated in cancer cells; 2v4 means 2 out of 4 alleles are mutated; 1v4 means 1 out of 4 alleles are mutated; 4v6 means 4 out of 6 alleles are mutated; and 1v2 means 1 out of 2 alleles are mutated. The CleanLens allele counts are based on a Phred score cutoff of 38. *: BRAF V600E.

Table S2b. The 19 designed substitution markers in the COLO829 experiment (NovaSeq with Kapa enzyme; StJude dataset). Listed are the chromosome, position (hg19), mutation context, and ploidy of each mutation. Also listed are the MAF, mutant allele count (Mut), and total coverage (Tot) for each replicate of each lane output for Normal, Tumor, and two dilutions (1:1000 and 1:5000). For ploidy, 4v4 means 4 out of 4 alleles are mutated in cancer cells; 2v4 means 2 out of 4 alleles are mutated; 1v4 means 1 out of 4 alleles are mutated; 4v6 means 4 out of 6 alleles are mutated; and 1v2 means 1 out of 2 alleles are mutated. The CleanLens allele counts are based on a Phred score cutoff of 30. *: BRAF V600E.

Table S2c. The 19 designed substitution markers in the COLO829 experiment (NovaSeq with Q5 enzyme; StJude dataset). Listed are the chromosome, position (hg19), mutation context, and ploidy of each mutation. Also listed are the MAF, mutant allele count (Mut), and total coverage (Tot) for each replicate of each lane output for Normal, Tumor, and two dilutions (1:1000 and 1:5000). For ploidy, 4v4 means 4 out of 4 alleles are mutated in cancer cells; 2v4 means 2 out of 4 alleles are mutated; 1v4 means 1 out of 4 alleles are mutated; 4v6 means 4 out of 6 alleles are mutated; and 1v2 means 1 out of 2 alleles are mutated. The CleanLens allele counts are based on a Phred score cutoff of 30. *: BRAF V600E. N.D.: PCR failure.

Table S2d. Primers for the 19 substitution markers in the COLO829 experiment.

Table S3a. Mutation counts in pediatric cancers (non-NBL and NBL) and adult cancers (COSMIC v82) are listed in columns C, D, and E. For COSMIC data, we also excluded markers with a population allele frequency (AF) >=0.1% (from the ExAC database with TCGA samples subtracted) and required the mutation recurrence (Rec) to be >=1 (columns F, M, S), >=5 (columns G, N, T), or >=10 (columns H, O, U). The numbers of C>T/G>A mutations in high-error-rate contexts for each group are listed in columns J-O, and the corresponding percentages of high-error-rate contexts are summarized in columns P-U.

Table S3b. Analysis of the sequence context of hotspot substitutions defined by Chang et al. (PMID: 29247016). In total, 947 hotspot substitutions mutated in 5 or more samples (column C) are included. The gene name (column A), amino acid change (column B), and genomic substitutions (column D) were extracted from the source paper. The mutational contexts are provided in columns E, F, and G, because multiple mutations can cause the same amino acid change. C>T/G>A mutations in high-error-rate contexts are highlighted in orange.

Table S4. List of 47 hybridization capture samples. Related to Fig. 5 and Fig. 7.

Table S5. List of 1663 whole-genome samples. Related to Fig. 7. (XLSX 190 kb)