ABSTRACT
Next-generation sequencing (NGS) opens up exciting possibilities for improving our knowledge of environmental microbial diversity, allowing rapid and cost-effective identification of both cultivated and uncultivated microorganisms. However, library preparation, sequencing, and analysis of the results can provide inaccurate representations of the studied community compositions. Therefore, all these steps need to be taken into account carefully. Here we evaluated the effects of DNA extraction methods, targeted 16S rRNA hypervariable regions, and sample origins on the diverse microbes detected by 454 pyrosequencing in marine cold seep and hydrothermal vent sediments. To assign the reads with enough taxonomic precision, we built a database with about 2,500 sequences from Archaea and Bacteria from deep-sea marine sediments, affiliated according to reference publications in the field. Thanks to statistical and diversity analyses as well as inference of operational taxonomic unit (OTU) networks, we show that (i) while DNA extraction methods do not seem to affect the results for some samples, they can lead to dramatic changes for others; and (ii) the choice of amplification and sequencing primers also considerably affects the microbial community detected in the samples. Thereby, very different proportions of pyrosequencing reads were obtained for some microbial lineages, such as the archaeal ANME-1, ANME-2c, and MBG-D and deltaproteobacterial subgroups. This work clearly indicates that the results from sequencing-based analyses, such as pyrosequencing, should be interpreted very carefully. Therefore, the combination of NGS with complementary approaches, such as fluorescence in situ hybridization (FISH)/catalyzed reporter deposition (CARD)-FISH or quantitative PCR (Q-PCR), would be desirable to gain a more comprehensive picture of environmental microbial communities.
INTRODUCTION
Elucidating how biodiversity is distributed and determining the underlying mechanisms that drive these patterns are central questions in ecology. However, the majority of microorganisms that occur in natural environments cannot be cultured or isolated using traditional cultivation techniques (1–3), which limits our knowledge of environmental microbial diversity. Culture-independent studies, such as molecular characterization, have provided new insights into our understanding of microbial community structures, especially for populations inhabiting extreme environments, such as marine cold seeps and hydrothermal vents, whose growth conditions may be difficult to mimic in the laboratory (4). Today, high-throughput sequencing allows the characterization of microbial communities with unprecedented levels of coverage (5–9). While molecular analyses provide a less biased picture of environmental microbial communities than culture-based methods, each step of the analysis needs to be taken into account. Indeed, the DNA extraction method, PCR amplification, or data analysis can all lead to distortions of the compositions of analyzed samples (10–18).
Extraction is often pointed out as a key step in obtaining DNAs from all microorganisms present in one studied environment (10, 11, 13, 14, 19–22). However, DNA extraction from sediment samples may be difficult. The presence of complex organic matter and enzymatic inhibitors, such as humic substances, can interfere with enzymatic reactions (23–25). Additionally, some microorganism cells might be resistant to lysis (e.g., resting cells, spores, and cysts), especially in extreme environments (26). Indeed, microbial cell envelopes vary in structure between and within microbial lineages, and some cells are more easily disrupted than others (13, 27, 28). Thus, optimization of DNA extraction methods is often necessary to access a correct representation of whole microbial communities.
The DNA amplification step required for most sequencing methods is also critical. Sequencing of the 16S rRNA gene has been used widely to assess the phylogenetic diversity in environmental samples (4, 29–31). This gene is composed of alternating conserved and hypervariable regions (32). The presence of conserved regions enables PCR amplifications using “universal” primers supposed to bind even to DNAs of unknown organisms (32). DNA primers can be designed to be complementary to two distant conserved regions. PCR can then be done by using forward and reverse primers leading to a PCR fragment including both conserved and variable regions. Amplified products are then used to evaluate microbial diversity and to study taxonomic affiliations and evolutionary relationships between organisms (32). However, 16S rRNA gene hypervariable regions exhibit different degrees of variability according to the studied phylogenetic groups; therefore, no unique hypervariable region allows discrimination between all known microbial lineages (33–35). Thus, working on partial 16S rRNA gene sequences may bias analyses of diversity and species richness recovery (33, 35, 36). Compared to other high-throughput sequencing technologies, pyrosequencing (Roche 454) is an attractive method for 16S rRNA gene sequencing, as relatively long reads are obtained (14, 37), though it does not allow access to the full length of the gene (i.e., about 1,550 bp). Therefore, in pyrosequencing studies, the choice of primers (14) and sequenced regions may significantly affect the estimates of operational taxonomic unit (OTU) richness and microbial diversity analysis (15, 33, 38).
In this study, we analyzed the effects of two major steps of microbial diversity characterization using the 16S rRNA gene. Three physicochemically contrasting samples of cold seep and hydrothermal vent sediments from the Guaymas Basin were recovered. These environments are known to contain important amounts of humic substances (39) that could impair subsequent molecular analyses (12, 23–25). Two different extraction methods and two primer sets targeting different hypervariable regions of the 16S rRNA gene for Bacteria, and two others for Archaea, were tested on each sample. Amplicons were sequenced using 454 pyrosequencing. Impacts of experimental designs were evaluated by comparing DNA extraction yields, OTU richness, and taxonomic diversity.
MATERIALS AND METHODS
Sample description and handling.Sediment samples were collected from the Guaymas Basin by use of push cores operated by the manned submersible Nautile during the “BIG” cruise (June 2010). The target sites included the “Vasconcelos” site, in a cold seep area from the Sonora Margin (marker BIG18; 27°35.5770N, 111°28.9840W; 1,574-m depth; dive 1758-14 core CT8) (40, 41), and the “MegaMat” site, in a hydrothermal vent area from the Southern Trough (marker M27; 27°00.4451N, 111°24.5298W; 2,000-m depth; dive 1766-22 core CT3). At marker BIG18 (Vasconcelos site; cold seeps), the sediment surface was colonized by abundant populations of gray polychaetes and gastropods surrounding a large white microbial mat (40). At marker M27 (MegaMat site; hydrothermal vents), the sediment surface was covered with white microbial mats near a hydrothermal mound overgrown with Riftia clusters. After recovery on board, sediment cores were immediately transferred to a cold room (4°C). Cores were then subsampled aseptically in 2-cm-thick layers that were frozen at −80°C until required for molecular analysis.
Two different sections from the cold seep core (surface layer [0 to 2 centimeters below the seafloor {cmbsf}; called SC1] and bottom layer [10 to 12 cmbsf; called SC6]; dive 1758-14 core CT8) and the surface layer from the hydrothermal vent core (0 to 2 cmbsf; called SH1; dive 1766-22 core CT3) were studied.
DNA extraction methods.We tested two of the most commonly used protocols to extract DNAs from deep marine sediment samples (42): the method proposed by Zhou et al. (12) (referred to here as “Zhou's method”) and a manufactured bead-beating kit. DNA extractions were carried out independently on each sample (SC1, SC6, and SH1), as follows. DNAs from three replicates of 2.5 g per sample were extracted using Zhou's method (12), with some modifications. Cycles of freeze-thawing in a high-salt extraction buffer were followed by chemical lysis at 65°C in the presence of sodium dodecyl sulfate (SDS), cetyltrimethylammonium bromide (CTAB), and proteinase K. Since lysis efficiencies vary between and within microbial groups (13, 28), the freeze-thawing step was repeated 10 times in order to increase the diversity of extracted lineages. Two-milliliter samples of this mix (total volume, 8 ml) were taken after 3, 5, 7, and 10 freeze-thaw cycles in order to protect DNAs released from easily disrupted cells and then were pooled before the chemical lysis step. This method allowed DNA extraction with minimal shearing, which might prevent the formation of chimeras (12). Moreover, CTAB has been used in previous studies to complex and remove contaminants from DNA (12).
In parallel, DNAs from four replicates of 0.6 g per sample were extracted by a bead-beating method. This method is based on a commercially available kit, FastDNA Spin for Soil (MP Biomedicals, Santa Ana, CA), that was modified following the method of Webster et al. (10). The FastDNA Spin kit is based on a mechanical lysis with ceramic and silica beads in a bead beater, which theoretically allows a more efficient lysis of all microorganisms, including bacterial spores, endospores, and Gram-positive bacteria (10, 11). Furthermore, this method uses silica columns to retain high-molecular-weight DNA while removing inhibitors, such as humic acids, salts, and proteins.
The quantities and purities of the extracted DNAs from all replicates were evaluated by optical density measurement using a Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, DE). Results were statistically compared via Student t tests and analysis of variance (ANOVA), using R software (R Development Core Team, 2011). DNA extract replicates were then pooled for each extraction method and purified using a Wizard DNA cleanup kit (Promega, Madison, WI) following the manufacturer's instructions. Purified DNA extracts were stored at −20°C until PCR amplification.
PCR amplification and pyrosequencing.The 16S rRNA gene was amplified using four different primer sets: two targeting Bacteria and two others for Archaea (Table 1).
Sets of primers used for PCR amplifications and in silico PCR performances of the primer pairs
The primer sets for Bacteria targeted the V4 (Bact1F-Bact1R) and V5–V6 (Bact2F-Bact2R) hypervariable regions (Table 1), while the primer sets for Archaea targeted the V2–V3 (Arch1F-Arch1R) and V5–V6 (Arch2F-Arch2R) hypervariable regions (Table 1). To allow multiplexing, forward primers were fused to 6 different tags (10 nucleotides [nt]) and the 454 GS-FLX sequencing primer A (5′-CGTATCGCCTCCCTCGCGCCATCAG-Tag-PrimerF-3′). The reverse primers were fused to the 454 GS-FLX sequencing primer B (5′-CCTATCCCTGTGTGCCTTGGCAGTC-PrimerR-3′). Primer design and DNA amplifications were performed by Genoscreen (Lille, France). PCR amplifications were performed using the following reagents in a 25-μl reaction volume: 10× DNA polymerase buffer (Applied Biosystems), 1.5 mM MgCl2 (Applied Biosystems), a 0.24 mM concentration of each deoxynucleoside triphosphate (dNTP; Roche), 0.4 mM (each) primers, and 1 U of Taq DNA polymerase (Applied Biosystems). The PCR program involved an initial denaturation step at 95°C for 10 min, followed by 25 cycles for Bacteria and 40 cycles for Archaea, at 95°C for 30 s, 50°C for 30 s, and 72°C for 1 min, with a final extension at 72°C for 10 min.
Amplicons were purified, and their concentrations and sizes were analyzed using DNA 1000 chips on an Agilent 2100 bioanalyzer (Agilent Technologies, Waldbronn, Germany). Amplicon products were mixed at an equimolar ratio (108 molecules per microliter for each amplicon) and then sequenced on a 454 Life Sciences GS-FLX genome sequencer (Roche Diagnostics, Indianapolis, IN). Quantification, purification, and sequencing were performed by Genoscreen (Lille, France). Two replicates from the hydrothermal vent sample extracted via Zhou's method and two others for the kit method were amplified, sequenced, and analyzed independently to check for reproducibility.
Read filtering.To minimize the effects of random sequencing errors (43, 44), pyrosequencing reads were filtered using the Mothur pipeline (45). We removed sequences (i) that were shorter than 200 bp, (ii) containing homopolymers longer than 8 bp, (iii) that aligned with the incorrect region of the 16S rRNA gene, (iv) identified as chimeras by the Uchime algorithm (45–47), and (v) affiliated with Archaea when Bacteria was targeted, and vice versa. Results were compared (Student t test and ANOVA) according to sample origin, targeted region, and extraction method, using R.
In order to minimize the impact of larger numbers of sequences per sample, all the following analyses were performed on the same number of sequences for the bacterial and archaeal regions.
OTU analysis.Numbers of OTUs and beta diversity indexes (Yue and Clayton, Bray-Curtis, and Morisita-Horn indexes) were determined using the Mothur pipeline (45) at two different levels of dissimilarity (3% and 10%). Results were compared (Student t test and ANOVA) according to sample origin, targeted region, and extraction method, using R. OTU networks were constructed at 10% dissimilarity by using the QIIME pipeline (48) and Cytoscape software (49).
Taxonomic assignment.Effects of DNA extraction methods and targeted 16S rRNA gene regions were also evaluated using taxonomic assignment of sequences by use of BLAST algorithms. The use of international databases, such as the EMBL, DDBJ, GenBank, and RFAM databases, and their associated taxonomies did not allow assignment at a sufficient level of precision to allow pertinent comparisons between our samples. Indeed, taxonomies used in these databases are sometimes inaccurate or not updated, and levels of classification are often too high to be informative (e.g., “uncultured archaeon”). Therefore, we designed a reference database for the purpose of this study, consisting of 786 archaeal and 1,677 bacterial sequences from deep-sea marine sediments, collected from and affiliated according to reference publications (1, 3, 40, 41, 50–74; L. Durand, M. Roumagnac, V. Cueff-Gauchard, C. Jan, M. Guri, C. Tessier, M. Haond, P. Crassous, M. Zbinden, S. Arnaud-Haond, and M.-A. Cambon Bonavita, unpublished data). Sequences were downloaded from the NCBI website and aligned with MAFFT 6.903 (75). The consistency of taxonomic affiliations was checked by phylogenetic reconstructions. Phylogenetic trees were estimated using distance (neighbor joining) and maximum likelihood (ML) methods. Pairwise nucleotide sequence divergences were calculated using the Kimura 2-parameter model of substitution (76), and neighbor-joining trees (77) were reconstructed using MEGA 4.0.2 (78). The robustness of topologies was assessed by bootstrap procedures, using 1,000 replicates (79). ML analyses were performed using MPI-parallelized RAxML 7.2.8 (80), using a GTR model with among-site rate variation modeled by a discrete gamma approximation with four categories. GTRCAT approximation of models was used for ML bootstrapping (1,000 replicates). ML analyses were conducted on the CIPRES Science Gateway (81). When groups were recovered on the two phylogenetic trees, corresponding names provided in reference publications were considered valid and kept in our reference database. When names did not yet exist in the NCBI database, new taxonomic IDs were created. When groups appeared to be para- or polyphyletic, sequences were affiliated in our database to the first higher level of classification that appeared to be monophyletic. The full list of sequences and associated details (NCBI accession numbers, reference publications, associated affiliations, and affiliations used in our database) is available in Table S2 in the supplemental material. The custom database and its associated taxonomy were implemented in Koriblast software (Korilog, Questembert, France) and can be downloaded from an FTP server (http://ftp.genostar.com/ifremer/Bacteries_Archees_genostar.fasta). The pyrosequencing reads were searched against this reference database for taxonomic assignment (BLAST was performed in Koriblast, in which sequences are assigned to the first BLAST hit). Since errors in taxonomic assignments may be due to the restricted lengths of sequences, correct taxonomic assignments of short sequences obtained with the pyrosequencing primers used in this study were theoretically validated by a congruent BLAST search of our sequence database with sequences cut to the amplified fragment length against the original database, using Koriblast software.
Microbial diversities were then statistically analyzed using Primer-E software (82). Ordination by multidimensional scaling, derived from the Euclidean distance between samples, was used to visualize diversity (83).
Nucleotide sequence accession number.The raw sequencing data have been submitted to the NCBI database under SRA accession number SRP038099 (details are given in Table S1 in the supplemental material).
RESULTS
Extraction yields and DNA quality.The DNA quantities and qualities recovered varied according to samples and extraction methods (see Table S3 in the supplemental material).
Quantities of DNA obtained by Zhou's method were significantly different between samples (ANOVA; P < 0.0001), while the quantities obtained with the commercial kit seemed steadier. Differences between DNA quantities obtained using either Zhou's method or the kit method were significant only for samples from the cold seep surface (SC1) (ANOVA; P < 0.0001). The highest yields of DNA were recovered from cold seep surface samples extracted by Zhou's method.
DNA qualities revealed by the 260/280 and 260/230 spectrophotometric ratios showed significant differences between samples (ANOVA; P < 0.01), except for the 260/230 ratio obtained with the commercial kit (see Table S3 in the supplemental material). For 260/280 ratios, Zhou's method gave better results than the commercial kit for the cold seep surface samples (ANOVA; P < 0.001). However, the commercial kit performed better than Zhou's method with other samples (ANOVA; P < 0.03 for cold seep sample SC6 and hydrothermal vent sample SH1). For 260/230 ratios, Zhou's method yielded higher ratios than those with the commercial kit for all samples (ANOVA; P < 0.001).
Numbers of reads and sequence filtering.Overall, no significant differences were observed between the numbers of reads obtained from the different extraction methods and from the different samples (see Table S4 in the supplemental material). However, the numbers of reads obtained for each amplified region were significantly different (ANOVA; P < 0.0001).
During the filtering step, 17.36% to 48.78% of reads were removed, depending on sample origin and amplified region (see Table S4 in the supplemental material). No significant difference was detected between proportions of remaining sequences obtained with Zhou's method and the kit extraction method. However, significant differences were observed between proportions of remaining sequences among the different samples and the different regions (ANOVA; P < 0.03).
To determine the filtering step when sequences were removed, we compared (i) the numbers of detected chimeras between samples and (ii) the numbers of sequences belonging to kingdoms that were not supposed to be amplified by our kingdom-specific primer sets.
Sequences identified as chimeras represented only a small part of the reads (see Table S4 in the supplemental material). No significant difference was observed between proportions of detected chimeras obtained with Zhou's method and the kit extraction method. However, the proportions of chimeras were globally significantly different between samples for the Arch2 region and between the amplified regions (ANOVA; P < 0.02). The largest proportion of chimeras was observed for the Arch2 region amplified from the hydrothermal vent samples (average of 4.38%).
Primer sets were designed to amplify specifically Bacteria or Archaea. However, with the exception of the Bact2 primer set, some sequences affiliated with Archaea were also detected with the Bact1 primer set and some sequences affiliated with Bacteria were also detected with the archaeal primer sets (see Table S4). No significant difference was pointed out between proportions of removed sequences due to a lack of specificity when DNAs were extracted using Zhou's method or the kit method. In contrast, proportions of removed sequences were significantly different between samples for the Arch1 and Arch2 regions (ANOVA; P < 0.001). Likewise, significant differences were identified depending on the amplified region (ANOVA; P < 0.01).
OTU analysis.The number of OTUs was calculated for each sample at two different levels of dissimilarity, i.e., 3% and 10%, corresponding to the phylogenetic distances conventionally used to define species (3%) and families (10%) based on full-length 16S rRNA gene sequences (29) (Table 2). Whatever the amplified region, no significant difference was observed between the numbers of OTUs obtained when DNAs were extracted using Zhou's method or the kit method. However, the numbers of OTUs were globally significantly different between the sequenced regions for the archaeal data set (Student t test; P < 0.03). Indeed, fewer OTUs were obtained for the Arch2 region than for the Arch1 region for the cold seep SC6 sample (141 and 299, respectively, on average, at the 3% level and 26 and 125, respectively, on average, at the 10% level). For the hydrothermal vent sample, fewer OTUs were found for the Arch1 region than for the Arch2 region at 3% dissimilarity (183 and 224, respectively), whereas the opposite was true at 10% dissimilarity (95 and 78, respectively). For the bacterial data set, the numbers of OTUs at 10% dissimilarity were significantly higher for the Bact1 region (86, 131, and 124 for cold seep SC1, cold seep SC6, and hydrothermal vent SH1 samples, respectively) than for the Bact2 region (65, 92, and 100 for cold seep SC1, cold seep SC6, and hydrothermal vent SH1 samples, respectively) (Student t test; P < 0.02).
Numbers of OTUs for each sample, according to primer set and DNA extraction method, determined at two levels of dissimilarity (3% and 10%)
Yue and Clayton (84), Bray-Curtis (85), and Morisita-Horn (86, 87) diversity indexes (see Table S5 in the supplemental material) and OTU networks (Fig. 1) were used to compare OTU compositions between samples, replicates, extraction methods, and amplified regions, at a level of dissimilarity of 10%.
Network analysis of microbial communities according to the sample origin and DNA extraction method used. OTUs were defined at a 10% dissimilarity. Sample nodes are denoted by purple nodes, while other nodes represent individual OTUs. OTUs with the largest numbers of sequences are represented by large red dots (1,046 sequences for Bact1, 3,560 for Bact2, 3,438 for Arch1, and 4,992 for Arch2), while OTUs including few sequences are represented by small green dots (1 sequence) and intermediates. Shared OTUs are connected by lines. (A) Arch1 region; (B) Arch2 region; (C) Bact1 region; (D) Bact2 region.
The beta diversity indexes obtained showed that OTU compositions between samples were not similar, though some of them were very close (see Table S5). Regarding the Bacteria, the highest similarity indexes were obtained between samples from the same origin. The two extraction methods did not seem to affect the similarity indexes. Furthermore, community structures obtained with the Bact2 primer set appeared to be more similar between all samples than those obtained with the Bact1 primer set (Student t test; P < 0.01 for the three indexes). Regarding the Archaea, the highest similarity indexes were obtained between the samples from the same origin in the case of the Arch2 region. However, community structures appeared to be different depending on the extraction method used for the hydrothermal vent sample in the case of the Arch1 region (Student t test; P < 0.001 for the three indexes).
On OTU networks, both OTUs and samples are represented by nodes. In Fig. 1, samples are represented by purple dots, while other nodes represent OTUs. OTUs with the largest numbers of sequences are represented by large red dots (1,046 sequences for Bact1, 3,560 for Bact2, 3,438 for Arch1, and 4,992 for Arch2), while OTUs including few sequences are represented by small green dots (1 sequence) and intermediates. When a sequence is affiliated with an OTU in a sample, the OTU is linked by a line to the corresponding sample node. The OTU networks show that OTUs represented by many sequences were shared between all samples (large red dots in Fig. 1), while nonshared OTUs contained only a few sequences (small green dots in Fig. 1). The more scattered Bact1 network shows that OTUs were more similar between samples when the Bact2 region was amplified.
Taxonomic affiliations.After filtering of the reads, the taxonomic affiliations of the remaining sequences highlighted different community structures according to sampling sites, targeted regions, and extraction methods.
Similarities between samples are represented graphically for Bacteria (Fig. 2) and Archaea (Fig. 3), using Euclidean distances, and proportions of the different microbial groups are reported in Tables 3 and 4. If no bias existed between extraction methods and the primer set used, we would expect only three different microbial community structures, for the three physicochemically contrasting samples used.
Ordination by multidimensional scaling (MDS), derived from the Euclidean distances between samples, according to DNA extraction method and the primer set used for Bacteria. Dotted lines indicate percent similarities of the community compositions between grouped samples (green, 25%; blue, 58%; turquoise, 70%; red, 80%). (B to D) Sizes of colored circles are proportional to microbial group relative abundances in each sample (Table 3) (for the GBG [B], CFB [C], and OD1 [D] groups).
Ordination by MDS, derived from the Euclidean distances between samples, according to DNA extraction method and the primer set used for Archaea. Dotted lines indicate percent similarities of the community compositions between grouped samples (green, 35%; blue, 60%; turquoise, 70%; red, 80%; pink, 90%). (B to D) Sizes of colored circles are proportional to microbial group relative abundances in each sample (Table 4) (for the ANME-3 [B], ANME-1 [C], and MBG-D [D] groups).
Taxonomic affiliations of pyrosequencing reads after filteringa
Taxonomic affiliations of pyrosequencing reads after filteringa
Regarding Bacteria, higher levels of dissimilarity were observed between samples from different origins. The three expected groups were observed as main clusters: the surface seep sediments, the deep-seep sediments, and the surface vent sediments (Fig. 2A). The differences observed between samples from different origins were mainly due to variations in some bacterial group proportions (Table 3). Similarity percentage analysis (SIMPER) (83) highlighted that the Epsilonproteobacteria (21.75% contribution) and SEEP-SRB1b (sulfate-reducing Bacteria) (71, 72, 88) (11.03% contribution) groups were mainly responsible for the differences between surface and deep-seep samples (Table 3). The difference between seep and vent samples was mainly due to the Guaymas bacterial group (GBG), also known as the HotSeep-1 cluster (63, 89) (Fig. 2B) (15.10% contribution), the Epsilonproteobacteria (11.40% contribution), and SEEP-SRB2 (11.01% contribution) (Table 3). This is congruent with previous molecular surveys of these ecosystems (41, 52, 90). Within these three main clusters, samples grouped according to the primer set used (Fig. 2A). SIMPER analysis highlighted that the difference in community structures observed between Bact1 and Bact2 amplifications was due mainly to variations in amplification of CFB (Bacteroidetes group) (Fig. 2C) (25.56%, 14.62%, and 15.11% contributions for surface seep samples, deep-seep samples, and vent samples, respectively). However, Epsilonproteobacteria from surface seep sediments (23.55% contribution) and SEEP-SRB1b from deep-seep sediments (27.65% contribution) were amplified more with the Bact2 primer set (SIMPER analysis). Likewise, the candidate division OD1 (69) was detected only with the Bact2 primer set (Fig. 2D), contributing 5.32% and 7.66% of the variation within the SC6 and SH1 samples, respectively (Fig. 2D and Table 3). Finally, we observed low levels of dissimilarity between extraction methods (Fig. 2A).
Regarding Archaea, differences generated by the targeted regions were much more conspicuous than for Bacteria (Fig. 3A). For both the Arch1 and Arch2 regions, clustering between samples of the same origin was partially masked by “region and extraction method effects” (Fig. 3A). Indeed, for deep-seep and vent samples, we observed an important effect of the targeted regions. For deep-seep samples, ANME-1 (archaeal anaerobic methanotroph) (91, 92) (Fig. 3C) (49.20% contribution), DHVE-6 (deep-sea hydrothermal vent euryarchaeota) (93) (13.84% contribution), and MBG-D (marine benthic group D) (94) (Fig. 3D) (12.68% contribution) were mainly responsible for this cleavage (0.28% and 89.40% ANME-1, 25.11% and 0.04% DHVE-6, and 22.96% and 0% MBG-D for Arch1 and Arch2, respectively, amplified from deep-seep samples) (Table 4). For vent samples, ANME-1 (Fig. 3C) (42.59% contribution), ANME-2c (20.03% contribution), and MBG-D (Fig. 3D) (14.81% contribution) were mainly responsible for this cleavage (13.01% and 59.41% ANME-1, 43.97% and 23.55% ANME-2c, and 15.13% and 0.04% MBG-D for Arch1 and Arch2, respectively, amplified from vent samples extracted by Zhou's method) (Table 4). Finally, the important difference between community structures obtained with the Arch1 and Arch2 primer sets and using Zhou's method on vent samples was not observed when the commercial kit was used (Fig. 3A).
DISCUSSION
In marine sediments, like the case in many other ecosystems, most microorganisms remain unculturable. Thus, the identification of the diverse microbial populations is based mainly on molecular characterization. Numerous studies investigating the microbial compositions of environments have focused on DNA profiling (3, 64), which requires efficient and unbiased DNA extraction procedures as well as the use of universal primers and relevant bioinformatic tools. These requirements become critical with the emergence and increasing use of high-throughput sequencing and analyses of complex environments harboring various contaminants and almost unknown microbial communities. In this study, we investigated the influence of DNA extraction methods and 16S rRNA gene regions targeted by four different primer sets through 454 pyrosequencing of three physicochemically contrasting samples from extreme marine environments (cold seeps and hydrothermal vents).
Impact of DNA extraction methods.We showed that the two extraction methods led to various levels of DNA quantity and quality (see Table S3 in the supplemental material). Indeed, commercial kit extraction (FastDNA Spin for Soil), which used bead-beating lysis in reduced-size tubes and included a purification step, yielded steadier DNA quantities between samples than Zhou's method (12) (performed with modifications), which used chemical, enzymatic, and freeze-thaw lysis steps. The quantity of DNA extracted with the commercial kit could be limited by the maximum DNA concentrations allowed by the filters used in the filtration step. Variations between DNA quantities might also be dependent on sample features, such as presence of macrofauna or high-molecular-weight compounds, such as humic acids. The higher concentrations of DNA obtained with surface seep samples and Zhou's method could be explained by the coextraction of DNAs from the macrofauna living on the surface of this sediment sample (e.g., small gastropods and polychaetes). In contrast, the tube sizes used for the bead-beating extraction could have excluded these animals, whose DNA would then not be extracted. Interestingly, previous studies on soils highlighted that coextraction of eukaryotic DNAs or other polyvalent polymers among microbial DNAs enhanced the DNA extraction efficiency, minimizing adsorption and degradation of target DNA (21, 95). Thus, coextraction of eukaryotic DNAs by Zhou's method could eventually provide a benefice in case of low-microbial-biomass environments. Furthermore, DNA quality also seemed to be dependent on the extraction method used. While the commercial kit provided higher 260/280 ratios, Zhou's method led to higher 260/230 ratios. Therefore, the commercial kit may perform better in removing contaminant proteins, while Zhou's method may be more efficient at removing humic substances (96–98).
The influence of the extraction methods on pyrosequencing results was investigated. No significant difference in terms of numbers of reads was observed between DNA extraction protocols in the pyrosequencing raw data outputs (see Table S4 in the supplemental material). This result indicates that despite differences in DNA quality after extraction, the purification step led to similar amplification efficiencies. Likewise, after the filtering steps with the raw pyrosequencing data, no significant difference was observed among the numbers of reads obtained, regardless of the DNA extraction method used (see Table S4). Bead beating has previously been suspected to shear DNA due to mechanical lysis and thus may give rise to the formation of more chimeric PCR products than Zhou's protocol (12, 99). However, our results indicated that the two extraction methods did not appear to drastically affect the quality of the reads or chimera formation.
Extraction methods have previously been shown to bias community structure and microbial diversity results (10, 13, 16–18, 21, 99). In our study, the OTUs obtained when DNA was extracted with the commercial kit or by Zhou's method were globally similar for each given sample (Table 2 and Fig. 1; see Table S5 in the supplemental material), which indicated that the two extraction methods might lead to similar estimations of the global microbial diversity. Interestingly, the differences in DNA quantity observed in the surface seep sample did not appear to change the results of the analysis, confirming that the supplementary DNA extracted by Zhou's method likely corresponded to nonprokaryotic DNA. However, the extraction method used seemed to affect the results we obtained for the vent samples when the Arch1 primer set was used (Fig. 3 and Table 4; see Table S5). This could indicate that some groups, amplified only by the Arch1 primer set, could be extracted more efficiently with one extraction method, as previously noted (21, 99–101). Thus, the picture of the community structure may sometimes be strongly biased by the extraction step, and a combination of several extraction methods would be a good alternative to gain a more accurate picture of microbial diversity.
Impacts of primers and targeted regions.Besides the DNA extraction step, the choice of PCR primers and 16S rRNA gene target regions is also critical for molecular analysis of microbial diversity (14, 15, 33, 38). Two primer sets for Bacteria and two others for Archaea, each targeting a different 16S rRNA gene region, were tested, and contrasting results were obtained.
The primer set choice affected the results in the early stages of our analysis. Even if an equimolar mix of samples was used for sequencing, significantly different numbers of reads were obtained depending on the primer set used (see Table S4), as previously noted in other projects (A. Dheilly, Biogenouest Platform, Rennes, France, personal communication). This variability may reflect differences in the sample titration in the equimolar mix (16) or may have been due to the fusion primer set used. Indeed, association of primers with tag sequences and pyrosequencing adaptors may reduce the hybridization efficiency on DNA capture beads, which is required for emulsion PCR and sequencing, and/or may be a disadvantage in amplification. Fusion primers must be designed carefully to minimize the formation of potential secondary structures. Each sample might also be labeled with two different tags and sequenced twice to reduce amplification bias due to differential binding of the fusion primers. We also observed contrasting impacts of read filtering according to the primer set used (see Table S4). Chimera detection and specificities of the primer sets for their respective kingdoms led to the removal of more or less reads depending on the primer set used (see Table S4), thus reducing the number of sequences available for the rest of our study. The largest number of chimeras obtained, for the V5–V6 regions (amplified with the Arch2 primer set) (see Table S4), might reveal that these regions could be more prone to splitting than the V2–V3 regions (amplified with the Arch1 primer set). Such variations between 16S rRNA gene regions have previously been highlighted in pyrosequencing analyses (102). The amount of low-quality data removed during the filtering step could also depend on samples and their associated microbial communities, as previously observed for chimera formation (102). Indeed, our results reflected some differences in terms of diversity among the samples (Tables 2 to 4 and Fig. 1 to 3; see Table S5). For example, we highlighted that the number of detected chimeras was dependent on the sample for the Arch2 region (see Table S4). The origin of these chimeras could be explained by the presence of some particular hydrothermal vent microbial lineages. Due to their nucleotide composition, 16S rRNA gene V2–V3 regions could not be amplified for these lineages with the Arch1 primer set and thus do not appear in our analyses. Their successful amplification using the Arch2 primer for the V5–V6 regions could have generated more chimeras.
Regarding the microbial diversity, important differences in community structure were observed depending on the primer set used (Tables 3 and 4 and Fig. 1 to 3; see Table S5 in the supplemental material). The use of a custom database allowed a sufficient level of precision for read assignment to detect these differences for some specific groups (e.g., SEEP-SRB1 or MBG-D), while the NCBI databases (EMBL, DDBJ, and GenBank databases) and associated taxonomies were inaccurate, and the levels of classification were often too high to be informative. These differences were more pronounced for Archaea than for Bacteria and were probably due to a disparity of matching efficiencies between primers for some microbial groups (Table 1), as previously observed in marine sediments (64). Thus, the Arch1 primer set appeared to amplify some archaeal groups which were less or not amplified with the Arch2 primer set (Tables 2 and 4), notably some ANME-2c, MBG-D, and DHVE-6 lineages (Table 4 and Fig. 3). Conversely, the Arch2 primer set appeared to amplify some archaeal groups which were less or not amplified with the Arch1 primer set (Tables 2 and 4), such as the ANME-1 lineage (Table 4 and Fig. 3). Likewise, the Bact1 primer set appeared to successfully amplify some bacterial groups (Tables 2 and 3), particularly CFB and candidate division OD1 bacteria, that were less or not amplified with the Bact2 primers, while the Bact2 primers more efficiently amplified Epsilonproteobacteria and the Deltaproteobacteria subgroup SEEP-SRB1b (Table 3 and Fig. 2). This differential amplification success led to more similar microbial communities between samples when the Bact2 primer set was used instead of the Bact1 primer set (Fig. 1; see Table S5). Differences in terms of numbers of OTUs at a 3% level of dissimilarity were observed between the two archaeal primer sets but were not detected at 10% dissimilarity, suggesting that the variation in the community structure was probably due to misdetection of some archaeal subgroups. This result might suggest a great variability of primer hybridization efficiency and amplification due to sequence mismatches of close microbial lineages (above 97% similarity).
Therefore, the choice of primer sets considerably affects the representation of the microbial community and, more particularly in our study, the archaeal community. This is important in microbial ecology because many of the discussions in the literature are based on the presence/absence of lineages to explain environmental data and microbial community distribution and function. The design of universal primers remains very delicate, especially since we do not know a priori the microbial compositions of samples. The situation is obviously more delicate for lineages presenting an intragroup heterogeneity. The use of different primer sets might be a good solution to minimize these biases. However, it would be better to totally eliminate the PCR amplification step that can generate many biases (e.g., by using de novo analyses or metagenomic studies).
Conclusions.In conclusion, we found that DNA extraction methods and primer sets could have major effects on species richness estimations for our samples. This revealed that every type of sample, because of its own nature, requires a very careful drawing of community structure conclusions. Indeed, community structure, microbial diversity, and proportions of read numbers, sometimes used as proxies for microbial relative abundance (103), could be strongly biased in some cases, and researchers must be very careful in drawing conclusions about community structures. Differences in DNA quantities and qualities observed after extraction did not always reflect a larger degree of microbial diversity. Similarly, if fewer reads are obtained after pyrosequencing for some samples, it does not necessarily reflect less microbial diversity. As long as molecular studies are dependent on a PCR amplification step, the choice of a good primer set will be crucial for giving a nonbiased vision of the true composition of the analyzed sample. In our study, the use of either primer set led to a totally different representation of the microbial communities. Indeed, for deep-seep samples, we obtained a proportion of 92.45% ANME-1 with one set of primers but only 0.48% with the other set. In the same way, while we detected a proportion of 25.31% for the Deltaproteobacteria subgroup SEEP-SRB1b with one set of primers and 2.11% with the other set, we did not observe differences for the other Deltaproteobacteria lineages between the two sets of primers. Next, even if the two extraction methods gave the same results for one given sample, they could lead to drastic changes for another sample. While the two extraction methods resulted in totally different archaeal community compositions for the vent samples with one primer set, no significant impact was detected for all other samples, regardless of the amplified region. Thus, using Zhou's method, we detected 12.31% MBG-D, 14.30% ANME-1, and 49.88% ANME-2c, while using the commercial kit on the same sample, we observed 7.07% MBG-D, 67.12% ANME-1, and 7.70% ANME-2. Therefore, in addition to the combination of several extraction methods and primer sets, we recommend combining sequencing methods with complementary approaches to gain a more comprehensive picture of the communities under scrutiny. For example, real-time PCR with different primer sets appeared to be very useful for obtaining more details about microbial group quantities (40). In addition, since we do not really know the numbers of 16S rRNA gene copies within bacterial and archaeal genomes, which could drastically increase microbial diversity results (104, 105), a fluorescence in situ hybridization (FISH/catalyzed reporter deposition [CARD]-FISH) approach (90, 106) could allow for confirmation of previous results.
ACKNOWLEDGMENTS
We are indebted to the crews of the research vessel L'Atalante and the submersible Nautile of the “BIG” cruise and to the scientific team, in particular Françoise Lesongeur, Mathilde Le Roy, Laurent Toffin, and Nolwenn Callac, for their work on board. We thank Maxime Galan, Alexandra Dheilly, and Sophie Coudouel for very helpful scientific discussions.
This cruise was funded by Ifremer (France) and benefited from a work permit in Mexican waters (permit DAPA/2/281009/3803; 28 October 2009). This study was supported by Carnot Institute funding and by an Ifremer/Brittany region Ph.D. grant for P.C.
FOOTNOTES
- Received 20 February 2014.
- Accepted 8 May 2014.
- Accepted manuscript posted online 16 May 2014.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/AEM.00592-14.
- Copyright © 2014, American Society for Microbiology. All Rights Reserved.
REFERENCES
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.