Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Pardo-Palacios, Francisco J.; Wang, Dingjie; Reese, Fairlie; Diekhans, Mark; Carbonell-Sala, Sílvia; Williams, Brian; Loveland, Jane E.; Adams, Matthew S.; Balderrama-Gutierrez, Gabriela; Behera, Amit K.; María, Maite De; Gonzalez, Jose M.; Hunt, Toby; Lagarde, Julien; Li, Haoran; Liang, Cindy E.; Prjibelski, Andrey D.; Sheynkman, Leon; Amador, David Moraga; Barnes, If; Berry, Andrew; Çelik, Muhammed Hasan; Garcia-Reyero, Natàlia; Goetz, Stefan; Kondratova, Liudmyla; Martinez-Tomas, Jorge; Menor, Carlos; Mudge, Jonathan M.; Paniagua, Alejandro; Suner, Marie-Marthe; Takahashi, Hazuki; Tang, Alison D.; Youngworth, Ingrid Ashley; Carninci, Piero; Denslow, Nancy; Guigó, Roderic; Hunter, Margaret E.; Tilgner, Hagen U.; Wold, Barbara J.; Vollmers, Christopher; Frankish, Adam; Au, Kin Fai; Sheynkman, Gloria M.; Conesa, Ana; Mortazavi, Ali; Brooks, Angela N.

doi:10.6084/m9.figshare.19642383.v1

journal contribution

posted on 2022-04-28, 02:01 authored by Francisco J. Pardo-Palacios, Dingjie Wang, Fairlie Reese, Mark Diekhans, Sílvia Carbonell-Sala, Brian Williams, Jane E. Loveland, Matthew S. Adams, Gabriela Balderrama-Gutierrez, Amit K. Behera, Maite De María, Jose M. Gonzalez, Toby Hunt, Julien Lagarde, Haoran Li, Cindy E. Liang, Andrey D. Prjibelski, Leon Sheynkman, David Moraga Amador, If Barnes, Andrew Berry, Muhammed Hasan Çelik, Natàlia Garcia-Reyero, Stefan Goetz, Liudmyla Kondratova, Jorge Martinez-Tomas, Carlos Menor, Jonathan M. Mudge, Alejandro Paniagua, Marie-Marthe Suner, Hazuki Takahashi, Alison D. Tang, Ingrid Ashley Youngworth, Piero Carninci, Nancy Denslow, Roderic Guigó, Margaret E. Hunter, Hagen U. Tilgner, Barbara J. Wold, Christopher Vollmers, Adam Frankish, Kin Fai Au, Gloria M. Sheynkman, Ana Conesa, Ali Mortazavi, Angela N. Brooks

Abstract

With the increased use of long-read sequencing technologies for transcriptome analyses, there is a growing need to evaluate different methodologies, including library preparation, sequencing platform, and computational analysis tools. Here, we report the study design of a community effort called the Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium, whose goal is to characterize the strengths and remaining challenges of using long-read approaches to identify and quantify the transcriptomes of both model and non-model organisms. The LRGASP organizers have generated cDNA and direct RNA datasets in human, mouse, and manatee samples using different protocols, followed by sequencing on Illumina, Pacific Biosciences, and Oxford Nanopore Technologies platforms. Participants will use the provided data to submit predictions for three challenges: transcript isoform detection with a high-quality genome, transcript isoform quantification, and de novo transcript isoform identification. Evaluators from different institutions will determine which pipelines have the highest accuracy for a variety of metrics, using benchmarks that include spike-in synthetic transcripts, simulated data, and a set of undisclosed transcripts manually curated by GENCODE. We also describe plans for experimental validation of predictions that are platform-specific and computational tool-specific. We believe that a community effort to evaluate long-read RNA-seq methods will help move the field toward a better consensus on the best approaches to use for transcriptome analyses.

Items:

The LRGASP Registered Report

Supplementary File

Supplementary Figures

Supplementary Table 1

Funding

Pew Charitable Trust (A.N.B.)

NIGMS R35GM138122 (A.N.B.)

NIGMS R35GM142647 (G.M.S.)

NHGRI U41HG007234 (J.L., M.D., R.G. and S.C-S.)

UM1 HG009443 (A.M. and B.W.)

An institutional fund of the Department of Biomedical Informatics, The Ohio State University (K.F.A., D.W. and H.L.)

NHGRI R01HG008759 (K.F.A., D.W. and H.L.)

NIGMS R01GM136886 (K.F.A., D.W. and H.L.)

SPBU 93023437 (A.P.)

J.E.L., J.M.M. and A.F. are supported by the National Human Genome Research Institute of the National Institutes of Health [U41HG007234]

Wellcome Trust [WT108749/Z/15/Z, WT200990/Z/16/Z]

History

Date of in-principle acceptance

2022-04-02

Keywords

Sequencing; Long-read sequencing technology; transcriptomics analysis; transcriptomics experiments

Licence

CC BY 4.0
