Chloroplast sequence variation and the efficacy of peptide nucleic acids for blocking host amplification in plant microbiome studies

dataset

posted on 2018-08-29, 15:41 authored by Connor R. Fitzpatrick, Patricia Lu-Irving, Julia Copeland, David S. Guttman, Pauline W. Wang, David A Baltrus, Katrina M. Dlugosch, Marc TJ Johnson

This dataset consists of bacterial and host plant phylogenetic trees, sample metadata, and amplicon sequence variant (ASV) taxonomy assignments and count tables. Data were generated for root microbial communities from replicate individuals of 32 plant species using universal peptide nucleic acid (PNA) clamps.

Sequencing files are available on the NCBI SRA (SRP128025).

.tre extension: phylogenetic tree files, created here with BEAST (Bayesian Evolutionary Analysis Sampling Trees) or PASTA (Practical Alignment using SATé and TrAnsitivity). They can be opened in any text editor or in specialist visualisation software such as FigTree.

.csv extension: comma-separated values, a plain-text format readable in any text editor or spreadsheet software.

.rds extension: serialized R data object, here holding ASV count and taxonomy data; readable in the R programming language with readRDS().

PNA_16S_root_microbiome.tre: phylogenetic tree of all bacterial ASVs generated by DADA2. The phylogeny was built with PASTA (doi:10.1089/cmb.2014.0156).
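Because PASTA writes trees in Newick format, the tip labels of PNA_16S_root_microbiome.tre (presumably the ASV identifiers) can be listed without specialist software. The following is a minimal Python sketch, not the authors' code. It assumes unquoted labels and no bracketed comments, so a BEAST tree wrapped in a NEXUS block, as PNA_host_plant.tre may be, would need its tree line extracted and annotations stripped first.

```python
import re

# Minimal Newick tip-label extractor (a sketch, not a full Newick parser).
# Assumptions: labels are unquoted and the string contains no [...] comments.
TIP_PATTERN = re.compile(r"[(,]\s*([^():,;\[\]\s]+)")


def newick_tip_labels(newick: str) -> list[str]:
    """Return leaf labels in the order they appear in a Newick string.

    A leaf label directly follows '(' or ','; internal-node labels follow ')'
    and are therefore skipped.
    """
    return [m.group(1) for m in TIP_PATTERN.finditer(newick)]


# Usage on a toy tree; for the real file, pass the text of
# PNA_16S_root_microbiome.tre (read with open(...).read()).
print(newick_tip_labels("((A:0.1,B:0.2):0.3,C:0.4);"))  # ['A', 'B', 'C']
```

A regular expression is enough here because only leaf names are needed; reconstructing the topology or branch lengths would call for a real tree library.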

PNA_host_plant.tre: phylogenetic tree of host plant species. We downloaded GenBank accessions of three loci (two plastid and one nuclear) for each of our plant species: ribulose-bisphosphate carboxylase (rbcL), maturase K (matK), and the internal transcribed spacer (ITS) adjacent to the 5.8S ribosomal RNA gene. We aligned sequences in MEGA v. 6.0 using MUSCLE with default parameters and then checked the alignments manually. We used BEAST v. 2.1.3 to build a Bayesian phylogenetic tree. For each locus we applied a general time-reversible substitution model with a proportion of invariant sites and gamma-distributed rate variation among sites (GTR + I + Γ), together with an uncorrelated lognormal relaxed clock (UCLN) to model rates of nucleotide change.

PNA_metadata.csv: metadata associated with each individual sample. PNA: clamp type, universal (o) or Asteraceae-modified (m). Community: E (endosphere); R (rhizosphere); S (bulk soil); RT (rhizosphere toothpick); ET (endosphere toothpick); MOCK (mock community); PAO (pure culture of Pseudomonas aeruginosa); WATER (negative control). Species: host plant species; code 600 denotes bare soil and code 700 denotes toothpick samples. Replicate: experimental block ID. Run: individual MiSeq sequencing run. Treatment: watering treatment (well-watered or drought).
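The codes above can be decoded and used to subset samples with Python's standard csv module. This is a hedged sketch: it assumes the CSV header uses the field names PNA, Community and Species exactly as written above, and the example rows (including the SampleID column, species names and treatment values) are hypothetical, not taken from the dataset.

```python
import csv
import io

# Code tables transcribed from the PNA_metadata.csv description.
PNA_CODES = {"o": "universal", "m": "Asteraceae-modified"}
COMMUNITY_CODES = {
    "E": "endosphere",
    "R": "rhizosphere",
    "S": "bulk soil",
    "RT": "rhizosphere toothpick",
    "ET": "endosphere toothpick",
    "MOCK": "mock community",
    "PAO": "pure culture of Pseudomonas aeruginosa",
    "WATER": "negative control",
}
NON_PLANT_SPECIES_CODES = {"600": "bare soil", "700": "toothpick"}


def select_samples(handle, pna_code, community_code):
    """Return metadata rows for one PNA type and community, excluding
    bare-soil (600) and toothpick (700) species codes.

    Assumes header fields named 'PNA', 'Community' and 'Species'.
    """
    if pna_code not in PNA_CODES or community_code not in COMMUNITY_CODES:
        raise ValueError(f"unknown code: {pna_code!r}, {community_code!r}")
    selected = []
    for row in csv.DictReader(handle):
        if row["Species"].strip() in NON_PLANT_SPECIES_CODES:
            continue  # skip bare-soil and toothpick samples
        if row["PNA"].strip() == pna_code and row["Community"].strip() == community_code:
            # Attach human-readable labels alongside the raw codes.
            row["PNA_label"] = PNA_CODES[pna_code]
            row["Community_label"] = COMMUNITY_CODES[community_code]
            selected.append(row)
    return selected


# Hypothetical rows for illustration only.
example = """SampleID,PNA,Community,Species,Replicate,Run,Treatment
s1,o,E,plantA,1,1,well-watered
s2,m,E,plantA,1,1,drought
s3,o,R,plantB,2,1,well-watered
s4,o,ET,700,1,1,well-watered
"""
rows = select_samples(io.StringIO(example), "o", "E")
print([r["SampleID"] for r in rows])  # ['s1']
```

Validating the requested codes up front means a mistyped code raises an error instead of silently returning no samples.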

PNA_root_microbiome_seqtab.rds: count table of ASVs generated by DADA2 from raw MiSeq reads. Rows are individual samples and columns are individual ASVs. This table underlies nearly all analyses. It can be loaded into R with readRDS(). A phyloseq object can then be generated with phyloseq() by combining it with the metadata, bacterial phylogeny and bacterial taxonomy provided here.
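Outside R, the same samples-by-ASVs layout can be handled once the table has been exported, for example with write.csv(readRDS("PNA_root_microbiome_seqtab.rds"), "seqtab.csv") in R. The Python sketch below assumes that export (sample names in the first column, ASV identifiers in the header, which for a DADA2 sequence table are the ASV sequences themselves) and converts each sample's counts to relative abundances. The file name seqtab.csv and the example values are hypothetical.

```python
import csv
import io


def read_count_table(handle):
    """Read a samples x ASVs count table exported from R with write.csv().

    Assumes the first column holds sample names and the header row holds ASV
    identifiers (for a DADA2 sequence table these are the ASV sequences).
    """
    reader = csv.reader(handle)
    header = next(reader)
    asvs = header[1:]  # first header cell is blank (row-name column)
    samples, counts = [], []
    for row in reader:
        samples.append(row[0])
        counts.append([int(float(value)) for value in row[1:]])
    return samples, asvs, counts


def relative_abundance(counts):
    """Scale each sample (row) so its ASV proportions sum to 1.

    Samples with zero total reads are returned as all zeros.
    """
    proportions = []
    for row in counts:
        total = sum(row)
        proportions.append([c / total if total else 0.0 for c in row])
    return proportions


# Hypothetical two-sample, two-ASV table in write.csv() layout.
example = '"","ACGT","TTGA"\n"s1",3,1\n"s2",0,0\n'
samples, asvs, counts = read_count_table(io.StringIO(example))
print(samples, relative_abundance(counts))  # ['s1', 's2'] [[0.75, 0.25], [0.0, 0.0]]
```

Normalising per row follows the table orientation described above (samples as rows); a transposed export would need to be flipped first.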

PNA_root_microbiome_SILVA_taxa.rds: taxonomic assignment of each ASV, generated in DADA2 using the SILVA training set and the RDP naive Bayesian classifier as implemented in DADA2 (see Methods). Saved as an RDS file; load into R with readRDS().
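Combining the taxonomy with the count table makes it possible to estimate what fraction of each sample's reads comes from host organelles, the contamination that PNA clamps are meant to suppress. This Python sketch is illustrative, not the published analysis. It assumes SILVA-style labels in which plastid sequences carry the Order name Chloroplast and mitochondrial sequences the Family name Mitochondria (rank names and labels can differ between SILVA releases, so check the actual table), and it represents the taxonomy as a plain dictionary rather than the R matrix stored in the .rds file. The ASV names and counts in the example are hypothetical.

```python
def is_host_organelle(ranks):
    """True if an ASV's taxonomy looks plastid- or mitochondria-derived.

    Assumes SILVA-style labels: Order 'Chloroplast' or Family 'Mitochondria'.
    """
    return ranks.get("Order") == "Chloroplast" or ranks.get("Family") == "Mitochondria"


def host_read_fraction(asvs, counts, taxonomy):
    """Per-sample fraction of reads assigned to host-organelle ASVs.

    asvs: ASV identifiers in the column order of the count table.
    counts: per-sample count rows (samples x ASVs).
    taxonomy: dict mapping ASV identifier -> {rank: name}; ASVs missing
    from the dict are treated as non-host.
    """
    host_mask = [is_host_organelle(taxonomy.get(asv, {})) for asv in asvs]
    fractions = []
    for row in counts:
        total = sum(row)
        host = sum(c for c, flag in zip(row, host_mask) if flag)
        fractions.append(host / total if total else 0.0)
    return fractions


# Hypothetical three-ASV example.
asvs = ["asv1", "asv2", "asv3"]
counts = [[5, 3, 2], [0, 0, 0]]
taxonomy = {
    "asv1": {"Order": "Chloroplast"},
    "asv2": {"Family": "Mitochondria"},
    "asv3": {"Order": "Burkholderiales"},
}
print(host_read_fraction(asvs, counts, taxonomy))  # [0.8, 0.0]
```

Comparing this fraction between samples amplified with different clamp types (the PNA field in the metadata) is one simple way to summarise blocking efficacy.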

Background

The ability to efficiently characterize microbial communities from host individuals can be limited by co-amplification of host organellar sequences (mitochondrial and/or plastid), which share a common ancestor, and thus sequence similarity, with extant bacterial lineages. One promising approach to this problem is the use of sequence-specific peptide nucleic acid (PNA) clamps, which bind to host-derived DNA and block its amplification. Universal PNA clamps have been proposed to block host plant-derived mitochondrial (mPNA) and plastid (pPNA) sequences at the V4 region of the 16S rRNA gene, but their efficacy