Analysis of error profiles in deep next-generation sequencing data
Posted on 2019-03-14 - 05:00
Abstract
Background: Sequencing errors are key confounding factors in detecting low-frequency genetic variants, which are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of the errors introduced at the various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these error sources.
Results: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than the rate generally considered achievable (10⁻³) in the current literature. We then quantify the substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution type, ranging from 10⁻⁵ for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10⁻⁴ for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR leads to an approximately 6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at frequencies of 0.1% to 0.01% with current NGS technology by applying in silico error suppression.
Conclusions: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.
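The strand-collapsed substitution classes named in the abstract (for example A>G/T>C) pair each change with its reverse complement. The following is a minimal sketch of how a per-class error rate could be tallied. It is not the authors' pipeline: the function names, the input layout, and the counts are hypothetical, and the denominator (all sequenced bases whose reference base belongs to the class) is an assumption made for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(ref, alt):
    """Return a strand-collapsed label such as 'A>G/T>C' (hypothetical helper)."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    # Sorting puts the A- or C-referenced change first, matching the abstract's labels.
    return "/".join(sorted([forward, reverse]))


def substitution_error_rates(mismatch_counts, ref_base_depth):
    """Compute an error rate per strand-collapsed substitution class.

    mismatch_counts: {(ref, alt): number of mismatching bases observed}
    ref_base_depth:  {ref: total bases sequenced over positions with that reference base}
    Assumption: rate = errors / bases sequenced at either reference base of the class.
    """
    errors = Counter()
    for (ref, alt), n in mismatch_counts.items():
        errors[collapse(ref, alt)] += n

    rates = {}
    for label, n in errors.items():
        ref = label[0]  # reference base of the first change in the label
        denominator = ref_base_depth[ref] + ref_base_depth[COMPLEMENT[ref]]
        rates[label] = n / denominator
    return rates


# Made-up counts, chosen only to land in the 10^-5 to 10^-4 range discussed above.
example_mismatches = {("A", "G"): 60, ("T", "C"): 40, ("C", "A"): 5, ("G", "T"): 5}
example_depth = {"A": 500_000, "C": 500_000, "G": 500_000, "T": 500_000}
example_rates = substitution_error_rates(example_mismatches, example_depth)
```

With these invented counts, the A>G/T>C class comes out ten times more error-prone than C>A/G>T, which mirrors the order-of-magnitude spread between substitution types reported in the Results.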
CITE THIS COLLECTION
Ma, Xiaotu; Shao, Ying; Tian, Liqing; Flasch, Diane; Mulder, Heather; Edmonson, Michael; et al. (2019). Analysis of error profiles in deep next-generation sequencing data. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4433663.v1
AUTHORS (19)
Xiaotu Ma
Ying Shao
Liqing Tian
Diane Flasch
Heather Mulder
Michael Edmonson
Yu Liu
Xiang Chen
Scott Newman
Joy Nakitandwe
Yongjin Li
Benshang Li
Shuhong Shen
Zhaoming Wang
Sheila Shurtleff
Leslie Robison
Shawn Levy
John Easton
Jinghui Zhang