Applied and Environmental Microbiology, May 2009, p. 2677-2683, Vol. 75, No. 9
0099-2240/09/$08.00+0 doi:10.1128/AEM.02166-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Microbial Ecology Program, Division of Biological Sciences,1 Montana—Ecology of Infectious Diseases Program, The University of Montana, Missoula, Montana2
Received 18 September 2008/ Accepted 20 February 2009
5,000 sequences from a single soil sample (i.e., a closed site-specific library was used to create PCR primers for use at this site). These primers were initially tested in silico prior to empirical testing by PCR amplification of known target sequences and of controls based on disparate phylogenetic groups. Although all primers were highly specific according to the in silico analysis, the empirical analyses clearly exhibited a high degree of nonspecificity for many of the phyla or classes, while other primers proved to be highly specific. These findings suggest that significant care must be taken when interpreting studies whose results were obtained with target-specific primers that were not adequately validated, especially where population densities or dynamics have been inferred from the data. Further, we suggest that the reliability of quantification of specific target abundance using 16S rRNA-based quantitative PCR is case specific and must be determined through rigorous empirical testing rather than solely in silico.
Additional concerns regarding primer specificity and primer-template mismatch come into play where quantitative analyses or comprehensive surveys are desired. Because of the massive numbers but uneven representation of 16S rRNA gene sequences in databases (i.e., generally low representation of cultivated, well described, environmental organisms), the feasibility of comprehensive primer testing is limited. This has been partly dealt with by performing in silico testing using freely available software such as PRIMROSE (3, 7). Alternatively, researchers have simply tested specificity against a subset of potential targets (5, 6, 11, 15, 16, 19, 21, 23), which in most cases likely represent only a fraction of the biodiversity present in their respective study sites.
The purpose of this study was to directly and rigorously assess the validity and efficacy of using 16S rRNA-based primers for phylum-, class-, and operational taxonomic unit (OTU)-specific target amplification in support of bacterial population studies at our site. We utilized a large 16S rRNA gene clone library (approximately 5,000 sequences) and the PRIMROSE program (3) to develop primers and to assess (in silico) the specificity of primers targeting multiple taxonomic levels across several phyla. Each potential primer set was then subjected to primer and reaction condition optimization for PCR and subsequently tested in vitro for specificity against genomic DNA and PCR-derived templates.
16S rRNA gene sequence library.
A 16S rRNA gene sequence library containing 4,889 sequences that was generated as described previously (17) from soil at the Kellogg Biological Station Long Term Ecological Research site (KBS-LTER) by targeting hypervariable regions V4 and V5 of the 16S rRNA gene (GenBank accession no. EU352912 to EU357802) was used for primer development and in silico testing as described below. An additional 16S rRNA gene sequence library containing >50,000 partial, aligned, and annotated sequences was downloaded from the ARB website (http://www.arb-home.de/) (14) for in silico testing of primer specificity.
Primer design and in silico testing.
The PRIMROSE software (3) was used to design 16S rRNA gene primer sets for several major bacterial groups identified in the KBS-specific library based on taxonomic assignments developed using ARB (Table 1) (17). Each primer pair was composed of a phylum-, class-, or OTU-specific forward primer targeting the V4 and V5 hypervariable regions and the generally conserved reverse primer 907r (5'-CCGTCAATTCMTTTRAGTTT-3') (13). Primer sets were generated for predominant phyla, as well as for some of the most abundant individual phylotypes or OTUs that were found at the site, based on ≥97% sequence similarity. All primer sets were designed using the default settings of the PRIMROSE program.
TABLE 1. Primer targets and sequence representation within libraries
85%) of sequences contained within its target taxon (within the KBS library) were indicated as detected; (ii) concomitantly, a low proportion (≤17%) of nontarget phylotypes were indicated as detected; and (iii) the least number of degeneracies within the primer was required to achieve the first two criteria. Primers that qualified under these criteria were then tested in silico using PRIMROSE against target and nontarget taxa from the ARB-generated library containing over 50,000 16S rRNA gene sequences, including archaeal and eukaryotic sequences not specific to the KBS site.
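The three selection criteria can be expressed as a simple filter followed by a tie-break. The following is a minimal sketch only: the `Candidate` record, the function names, and the reading of the two thresholds as "at least 85% of targets" and "at most 17% of nontargets" are assumptions made for illustration, and the code does not reproduce how PRIMROSE itself scores primers.

```python
from dataclasses import dataclass

# IUPAC ambiguity codes; each one counts as a degenerate position.
DEGENERATE_BASES = set("RYSWKMBDHVN")


@dataclass
class Candidate:
    """Hypothetical record of in silico hit counts for one candidate primer."""
    name: str
    sequence: str          # 5'->3', may contain IUPAC codes
    target_hits: int
    target_total: int
    nontarget_hits: int
    nontarget_total: int

    @property
    def target_fraction(self) -> float:
        return self.target_hits / self.target_total

    @property
    def nontarget_fraction(self) -> float:
        return self.nontarget_hits / self.nontarget_total

    @property
    def degeneracies(self) -> int:
        return sum(base in DEGENERATE_BASES for base in self.sequence.upper())


def select_primer(candidates, min_target=0.85, max_nontarget=0.17):
    """Return the passing candidate with the fewest degeneracies, or None.

    Thresholds are assumed readings of criteria (i) and (ii); ties on
    degeneracy count are broken by higher target coverage.
    """
    passing = [
        c for c in candidates
        if c.target_fraction >= min_target
        and c.nontarget_fraction <= max_nontarget
    ]
    if not passing:
        return None
    return min(passing, key=lambda c: (c.degeneracies, -c.target_fraction))
```

Counting degeneracies directly from the primer string keeps criterion (iii) independent of how the hit counts were produced.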
Primer optimization and in vitro testing.
Genomic DNA preparations from Bradyrhizobium japonicum USDA 110d, Streptomyces griseus, Acidovorax facilis, Pseudomonas putida, Acidobacterium capsulatum, and Pedobacter heparinus were initially used to assess and optimize amplification from each respective set of phylum- and class-level primers. These isolates are not from the KBS site but are known through sequence analysis to have 100% complementarity to their respective primers. Temperature gradient (±10°C) PCRs were run for each phylum- or class-specific primer (Tables 2 and 3), using primer 907r as the second primer in the pair and the predicted melting temperature (Tm) for the specific primer as the central point in the temperature gradient. Reactions were performed in a final volume of 50 µl, and each reaction mixture contained 1x PCR buffer (containing 1.5 mM MgCl2), 200 µM deoxynucleoside triphosphates, 10 pmol of each primer, 20 µg bovine serum albumin, 1 U HotStar Taq polymerase (Qiagen, Valencia, CA), and 5 ng of genomic DNA as the template. Cycling conditions included an initial denaturation step of 15 min at 95°C; 30 cycles of amplification consisting of denaturation (25 s at 96°C), primer annealing (30 s at the requisite Tm), and primer extension (30 s at 72°C); and a final extension step of 1.5 min at 72°C for all primers tested.
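The gradient and the reaction composition reduce to simple arithmetic. The sketch below is illustrative only: the number of gradient steps is not stated in the text and is a hypothetical parameter, as are the function names.

```python
def gradient_temperatures(center_tm_c, span_c=10.0, steps=12):
    """Evenly spaced annealing temperatures from Tm - span to Tm + span.

    `steps` (number of gradient positions) is an assumed value; the
    source only gives the +/-10 degree C span around the predicted Tm.
    """
    if steps < 2:
        raise ValueError("a gradient needs at least two temperatures")
    low = center_tm_c - span_c
    increment = 2 * span_c / (steps - 1)
    return [round(low + i * increment, 1) for i in range(steps)]


def final_concentration(amount, volume_ul):
    """Amount per microliter; pmol/ul is numerically equal to uM."""
    return amount / volume_ul


# Reaction components as stated in the text, in a 50-ul reaction.
primer_um = final_concentration(10, 50)        # 10 pmol primer
bsa_ug_per_ul = final_concentration(20, 50)    # 20 ug bovine serum albumin
template_ng_per_ul = final_concentration(5, 50)  # 5 ng genomic DNA
```

Under these definitions, 10 pmol of primer in 50 µl corresponds to 0.2 µM of each primer.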
TABLE 2. Phylum- and class-level primer sequences and target specificities as tested by PRIMROSE
TABLE 3. Primer sequences and target specificities to the KBS library as tested by PRIMROSE for the top 10 most abundant OTUs (≥97% sequence similarity)
Each primer set was subsequently screened for specificity by testing against its relevant positive control DNA, while using the positive controls for all other primer sets as negative controls. These reactions were run using the optimal experimentally derived annealing temperature and DNA concentration for each primer pair. The first round of specificity screening was conducted by creating four separate negative control groups of DNA, each containing four negative control DNAs (four different nontarget sequences) in equimolar concentration. Candidate primers were first tested against these groups to identify nontarget amplification. Primers that showed positive amplification with more than one negative control group were not tested further. A second round of testing was done with individual templates found within the amplified negative control groups (see above) in order to identify which of the four nontarget DNAs in that group was responsible for nonspecific amplification.
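The two-round pooled screen described above follows a simple decision procedure. The sketch below is one possible reading of it: `amplifies` stands in for the wet-lab PCR result and is a hypothetical callable, the status labels are invented for illustration, and the second round is assumed to apply when exactly one pool amplifies.

```python
def screen_primer(amplifies, pools):
    """Two-round pooled specificity screen for one primer set.

    amplifies: callable taking a list of template names and returning
               True if a PCR product is observed (hypothetical stand-in
               for the experimental readout).
    pools:     negative control groups, each a list of nontarget templates.
    """
    # Round 1: test the primer against each pooled negative control group.
    positive_pools = [pool for pool in pools if amplifies(pool)]

    if not positive_pools:
        return {"status": "specific", "offenders": []}
    if len(positive_pools) > 1:
        # Amplification in more than one group: not tested further.
        return {"status": "rejected", "offenders": None}

    # Round 2: test each template of the single positive pool on its own
    # to find which nontarget DNA caused the amplification.
    offenders = [t for t in positive_pools[0] if amplifies([t])]
    return {"status": "follow_up", "offenders": offenders}
```

Pooling four templates per reaction means a clean primer needs only four negative control reactions instead of sixteen, with individual reactions run only for a pool that amplifies.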
Real-time PCR assays.
Quantitative real-time PCR (qPCR) was performed on an iCycler iQ thermocycler (Bio-Rad, Hercules, CA) using 25-µl reaction mixtures that included 1 µl of template DNA at the concentration being tested (see Fig. 3), 10 pmol of each primer, 20 µg bovine serum albumin, and 12.5 µl of ABsolute Blue QPCR Sybr green ROX mix (ABgene, Rochester, NY). Forty cycles of amplification were performed using the same cycling conditions used for primer testing [Tm of 55°C and 53°C for primer pairs 688-706fAB plus 907r and Acido (#6)654-672 plus 907r, respectively]. An additional melting curve analysis, where fluorescence was measured as the temperature increased from 50°C to 100°C, was also performed to test for the amplification of a single target. The efficiency of PCR amplification for both primer pairs was assessed using a standard curve prepared using the PCR product generated from a cloned 16S rRNA gene insert (clone 302-F22 [Table 4]) assigned to the Acidobacteria group 6 cluster as determined by sequence alignments performed in ARB (17, 18). Control reactions lacking template DNA were run to ensure that primer dimers were not contributing to the overall signal. Two independent rounds of triplicate reactions were performed for each target, and the results of at least three qPCRs were analyzed. Abundances for all replicate reactions were related to the standard curve by their respective fluorescence intensity values, giving values of relative concentration.
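For readers who want to reproduce a standard-curve calculation, the sketch below uses the common threshold-cycle (Ct) convention: Ct is regressed on log10 of the starting quantity, and efficiency is taken as 10^(-1/slope) - 1. This convention is an assumption here; the text states only that abundances were related to the standard curve through fluorescence intensity values. All function names are illustrative.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    if len(quantities) != len(ct_values) or len(quantities) < 2:
        raise ValueError("need at least two paired standards")
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the starting quantity of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)
```

A slope near -3.32 corresponds to 100% efficiency under this convention, so efficiencies slightly above 100%, as reported for one primer pair, correspond to slopes slightly shallower than that.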
FIG. 3. Specific detection of Acidobacterium target DNA (Acidobacteria group 6 clone 302.F22 DNA) using the Acido (#6)654-672 plus 907r and 688-706fAB plus 907r primer sets. The given amounts of target DNA were tested alone or after addition to 9 ng of total community DNA isolated from KBS-LTER treatment 1, replicate plot 1. Values indicate the fold change in detection of the target group as a function of the amount of target added. Values for each primer set were normalized to 1 pg of specific target to show fold change in detection. Error bars are one SE of the mean for two rounds of triplicate qPCRs (final n ≥ 3). Target, Acidobacteria group 6 clone 302.F22; soil, 9 ng of total community DNA extracted from soils at the KBS-LTER treatment 1, replicate plot 1.
TABLE 4. Positive control targets and classification based on ARB and Classifier
On average, all higher-order primers (targeting phylum- and class-level groups) (Table 2) were predicted to detect about 92.7% (standard error [SE], 1%) of their specific targets while conversely predicted to amplify 4.6% (SE, 1.1%) of nontarget sequences. In order to assess whether primers designed from our site-specific sequence library might have a reduced ability to detect sequences from other sites (indicating some level of site specificity), an ARB database with over 50,000 sequences (Table 1) was used to assess both specificity and target group detection capabilities in silico. The results indicated a reduced ability of the primers to detect their corresponding phylum- or class-specific target sequences in the ARB database, especially for groups with low sequence representation such as the Chlorobi, Chloroflexi, and Gemmatimonadetes. On average, primer sets were able to detect 75.3% (SE, 3.6%) of their intended phylum or class in the ARB database while being predicted to detect about 6% (SE, 1.6%) of nontarget prokaryotic sequences. No significant detection of archaeal or eukaryotic sequences was noted.
An additional 10 primer sets were designed for detection of the 10 most abundant OTUs (genus-level phylotypes based on ≥97% sequence similarity) identified from our KBS-LTER data set (Table 3). On average, these primers detected 94.2% (SE, 0.9%) of their specific targets while detecting only 1.2% (SE, 0.5%) of nontarget sequences in the KBS database.
The effect of primer-target mismatch (theoretical number of mismatched base pairs allowed during annealing) on primer specificity was examined and shown to be highly significant in increasing nonspecific target detection (Fig. 1 and 2). As anticipated, a single mismatch was sufficient to substantially increase detection of nontarget sequences for all primers tested (some data not shown). When tested against our site-specific library, all higher-order primers were predicted to exhibit a ninefold increase in detection of nonspecific targets on average when a single mismatch was allowed (Table 2). For the OTU-level primers, a sixfold increase in detection of nontarget sequences was predicted when a single mismatch was allowed (Table 3).
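The effect of allowing mismatches can be explored with a small in silico matcher. The sketch below is a generic stand-in, not the PRIMROSE algorithm: it slides a forward primer (which may contain IUPAC ambiguity codes) along each sense-strand sequence and counts a sequence as detected if any window has no more than the allowed number of mismatches. Function names are illustrative, and a reverse primer such as 907r would first need to be reverse complemented.

```python
# Bases permitted by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Number of positions where the site base is not allowed by the primer."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def primer_hits(primer, sequence, max_mismatches=0):
    """True if any window of the sequence matches within the mismatch limit."""
    primer = primer.upper()
    sequence = sequence.upper().replace("U", "T")
    k = len(primer)
    return any(
        count_mismatches(primer, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def fraction_detected(primer, library, max_mismatches=0):
    """Fraction of library sequences detected at the given mismatch limit."""
    if not library:
        return 0.0
    hits = sum(primer_hits(primer, seq, max_mismatches) for seq in library)
    return hits / len(library)
```

Running `fraction_detected` separately on target and nontarget sets at 0 and 1 allowed mismatches gives the kind of comparison summarized in the paragraph above.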
FIG. 1. Effect of mismatched bases on the recovery of target and nontarget sequences using phylum-level primers. Primers were tested in silico against 4,889 sequences from the KBS-LTER library and >50,000 sequences from the ARB database. (A) Thermomicrobia-specific primer 555-573fTM; (B) Gemmatimonadetes-specific primer 677-695fGT.
FIG. 2. Effect of mismatched bases on the recovery of target and nontarget sequences using OTU-level (≥97% sequence similarity) primers. Primers were tested in silico against 4,889 sequences from the KBS-LTER library. (A) OTU-specific primer Coma851-869f; (B) OTU-specific primer Pseudo573-591f.
Primer specificity was tested using a set of mixed pools of four nontarget DNA samples in PCRs. Of all 28 primers tested, only 5 primers (at their optimal Tm, which is given in parentheses below) were shown to have sufficient specificity to support their use in qPCR, namely, Acido (#4)599-617f (57°C), Acido (#6)654-672f (53°C), Thermo (#4)735-753f (52°C), Thermo (#7)658-676f (52°C), and Nitro813-831f (49°C) (Tables 2 and 3). Of these five, the two primer sets designed to target subgroups within the Thermomicrobia were not sufficiently specific to distinguish between these subgroups within the phylum when tested empirically but rather represented useful primer sets for the entire phylum. The other three primer sets exhibited no nonspecific amplification using any of the tested sequences and thus represent validated primers for further quantitative analysis. Additional optimization was attempted to increase the specificity of the remaining primer sets that had not met validation criteria thus far, but conditions that improved specificity invariably resulted in concurrent decrease in target amplification efficiency, thereby limiting their utility as specific qPCR primers for community DNA.
Target detection using real-time qPCR.
The target detection and PCR efficiencies of two primer sets, one that passed validation [Acido (#6)654-672 plus 907r, specific for Acidobacteria group 6 targets] and one that did not (688-706fAB plus 907r, putatively specific for the phylum Acidobacteria), were compared using a real-time qPCR assay. Primers were initially validated for qPCR assays by creating a target standard curve (10⁵ to 10 copies of the target). Once the primers were validated, an appropriate standard curve was used to assess and compare the abilities of the primers to detect changes in target concentration. A 10% difference in PCR efficiency was observed between the primer pairs, with Acido (#6)654-672 plus 907r exhibiting 103.05% efficiency while 688-706fAB plus 907r showed 92.9% efficiency. Although both primer sets were able to detect and quantify different target concentrations both alone and in the presence of total community DNA from soil (Fig. 3), the higher PCR efficiency of the primer pair containing Acido (#6)654-672 resulted in 3-fold more signal for a given target concentration compared to that with the primer pair containing 688-706fAB. Interestingly, despite the lower efficiency of amplification with the phylum-level primer set, quantification of target sequences from an unspiked soil sample from the KBS-LTER (treatment 1, replicate plot 1) indicated a higher abundance (4 pg, or 10⁷ copies) of phylum-level acidobacterial targets than of genus-level Acidobacteria group 6 targets (0.23 pg, or 10⁵ copies) per 10 ng of soil-extracted DNA, indicating a low proportion of Acidobacteria group 6 within the total Acidobacteria phylum representatives present in that soil.
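Target mass and copy number are related through the molar mass of the double-stranded template. The sketch below shows the usual conversion under two loudly stated assumptions that are not given in the text: an average of about 650 g/mol per base pair, and a template length supplied by the user (the 372-bp value in the usage note simply spans E. coli positions 536 to 907 and is hypothetical for the actual standard).

```python
AVOGADRO = 6.02214076e23     # molecules per mole
BP_MOLAR_MASS = 650.0        # g/mol per base pair; approximate average (assumption)


def copies_from_mass(mass_pg, template_bp, bp_molar_mass=BP_MOLAR_MASS):
    """Estimate double-stranded DNA copy number from a mass in picograms.

    template_bp is the length of the quantified template in base pairs;
    it is a caller-supplied assumption, not a value taken from the text.
    """
    if template_bp <= 0:
        raise ValueError("template length must be positive")
    grams = mass_pg * 1e-12
    return grams * AVOGADRO / (template_bp * bp_molar_mass)
```

For example, `copies_from_mass(4, 372)` evaluates to roughly 1e7 copies, so a few picograms of a template of a few hundred base pairs corresponds to millions of copies under these assumptions.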
Recovery of target-specific sequences from soil by cloning.
To experimentally test the target specificities of a primer set that passed validation [Acido (#6)654-672 plus 907r] and one that did not (688-706fAB plus 907r), two independent clone libraries were created from the same soil sample used for the real-time PCR assay. Totals of 79 and 77 clones were analyzed, respectively, for these primer sets (Table 5). As expected from the results of the validation experiments, the phylum-specific Acidobacteria primer set (which had failed validation) recovered target-specific phylotypes but also resulted in recovery of a large number of nontarget phylotypes from soil community DNA (35% versus 65%, respectively). These nontarget sequences were distributed among four different phyla, with a large proportion (39%) being unclassified. In contrast, the Acidobacteria group 6-specific primer set Acido (#6)654-672 plus 907r (which passed validation) displayed an extremely high level of specificity, resulting in 96% of all sequences recovered from the soil community being classified as Acidobacteria group 6 and the remaining 4% not being highly associated with any particular phylum (Table 5).
TABLE 5. Phylogenetic distribution of soil clones generated using primers 688-706fAB and Acido (#6)654-672
5,000 sequences), site-specific database of bacterial 16S rRNA gene sequences from treatment 1 at the KBS-LTER site. The size of this library allowed a highly robust analysis of specificity within a well-defined microbial community. The rationale for this approach was that phylum-, class-, and OTU-specific primer sets derived solely from this extensive site-specific (i.e., closed) library could be optimally directed against the groups present without concern for nonspecific detection of extraneous sequences from other locations and environments. Primer sets were designed using the widely available and widely utilized PRIMROSE software at its default settings. Given that the primer sets generated using this approach exhibited a diminished (but still substantial) ability to amplify phylotypes from other sites within the ARB database but belonging to the same phylum or class, they are probably at least to some degree site specific. This site specificity presumably makes them more appropriate for studies at the KBS-LTER site than primer sets generated in the context of all known sequences from everywhere, which was the objective of this work. We suggest that this approach represents a general strategy for development of qPCR primers that will be most effective at a given site.
The utility, efficacy, and specificity of these primer sets were assessed both in silico (using the KBS closed library and a much broader ARB-based library) and empirically using PCR on positive and negative control DNAs. Despite predicted high specificity for targeted sequences in silico (≥83% targets detected versus ≤17% nontargets detected for all primer sets [Tables 2 and 3]), empirical testing revealed significant variability in target specificity, where some primer sets worked as predicted and others were highly cross-reactive. Presumably, this is due to the ability of primers with slight internal mismatches to bind sufficiently well to enable an initial elongation event. After the initial round of such "misprimed" elongation, subsequent PCR products would readily accumulate and sustain the exact sequence of the primers being used and thereafter would amplify with high efficiency. Thus, mismatching between primer and DNA template lends high specificity only where there is virtually no initial elongation taking place. Unfortunately, this likely occurs only where multiple, consecutive, or 3' mismatches are present or in in silico exercises where this "initial mispriming followed by high-fidelity amplification" phenomenon is not accounted for. We suggest that this phenomenon, at least in part, explains the equivocal nature of reports regarding primer specificity in previously published studies.
We acknowledge that our focus on a partial region of the 16S rRNA gene (corresponding to Escherichia coli positions 536 to 907) rather than full-length sequence information limited us to reliance on the V4 and V5 hypervariable regions for identification of signature sequences to provide specificity. However, it has previously been shown by our group and others that the V4-V5 region is suitable for consistent phylogenetic assignment compared to full-length sequence information (17, 26). This was further supported by our in silico analyses, which indicated high ratios of predicted target to nontarget sequence recovery from the large ARB database, suggesting that sufficient sequence information and variation were present to find differences delineating groups at the desired levels. Unfortunately, that indicated degree of specificity was often not sufficient when used experimentally, indicating a need for caution when analyzing data obtained using 16S rRNA gene-derived primers that have not been properly tested and validated empirically.
Our study employed a strategy of maximizing the depth of coverage of diversity by analyzing large numbers of partial (approximately 400-bp) 16S rRNA gene sequences in single reads, which were sufficient for reliable classification. This strategy made it challenging to identify two specific, opposing primers that would generate a readily detectable PCR product, and we therefore paired a target-specific forward primer with the same general reverse primer used in the original cloning exercise. Studies utilizing complete or larger stretches of 16S rRNA gene sequences might afford greater opportunity to identify pairs of specific primers, thereby increasing specificity.
Recent reports (4, 10, 12) have shown that so-called "universal" primer sets are incapable of efficiently amplifying all targets, even those from pure cultures and with perfect sequence matches, casting doubts on the universality of such primers. Further, other groups have suggested that 16S rRNA gene group-specific primers can produce upwards of 25% nontarget amplification, making them unsuitable for qPCR (10). It has also been suggested that reliance on the slowly evolving 16S rRNA gene makes it difficult to recognize recent events in the evolutionary history of a species, such as those associated with incipient speciation (24), which might also be a contributing factor to the high levels of nonspecific binding for some of the primer sets employed here.
We conclude by recommending that specific care must be taken when interpreting new or previously published results obtained with 16S rRNA gene-based PCR primers that have not been fully validated, especially where population densities or dynamics are being inferred from the data. We further suggest that the reliability of quantification of group abundance using 16S rRNA gene-based qPCR is case specific and must be determined empirically rather than solely in silico.
Soil samples for this project were graciously provided by the Kellogg Biological Station Long Term Ecological Research project (KBS-LTER). We also gratefully acknowledge Linda Schimmelpfennig and Tara Westlie for technical assistance in creating the clone libraries.
Published ahead of print on 27 February 2009.