Applied and Environmental Microbiology, June 2008, p. 3831-3838, Vol. 74, No. 12
0099-2240/08/$08.00+0 doi:10.1128/AEM.02743-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Department of Civil and Environmental Engineering,¹ Center for Microbial Ecology,² Department of Crop and Soil Science, Michigan State University, East Lansing, Michigan 48824,³ Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109⁴
Received 5 December 2007/ Accepted 13 April 2008
When many pathogens must be screened in parallel with the ability to characterize the uneven distribution of the associated VMGs and their allelic variability, optimization is necessary for high-throughput tools with high sensitivity and specificity, e.g., quantitative PCR (QPCR) (14, 19, 27, 32, 44, 46, 50, 51). This optimization may be cumbersome because it requires the generation of multiple standard curves with the caveat that all primer sets must perform under the same amplification conditions. Approaches that increase the reliability of the primer design or avoid the use of standard curves altogether should be extremely useful in developing such parallel assays. The development of such approaches will undoubtedly depend upon the sequence characteristics (guanine and cytosine [GC] content, melting temperature [Tm], amplicon length, etc.) of primers, amplicons, genome size, amplification conditions, and the matrix in which the target is present. However, the influence of these factors on the performance of primer sets, especially on the threshold cycle (CT) number extensively used in QPCR to predict the abundance of pathogens, has not been explored fully.
This study used the nanoliter-volume BioTrove OpenArray platform (33) to examine the capacity of QPCR for highly parallel diagnostics of human pathogens and to systematically examine the influence of target and primer sequence characteristics on specificity, sensitivity, and CT value. The results were used to establish a predictive CT equation to estimate the number of starting copies without the use of standard curves. The study was performed with Sybr green I and used approximately 220 primer sets targeting 200 VMGs for 30 human pathogens. The performance of the predictive CT equation is examined, and the success rate of previously unvalidated pathogen-targeted primer sets is also presented. These results have significance in developing high-throughput and reliable screening tools for large numbers of pathogens without extensive validation.
TABLE 1. Pathogens and VMGs targeted with OpenArray plates and organisms used for validation
Design of the PCR primers to establish the predictive CT equation.
A set of primers was designed to target 20 human pathogens and was used to examine relationships between primer sequence characteristics and experimental CT. Prior to the primer design, consensus sequences were generated by aligning sequences for each of 96 VMGs using Kodon (Applied Maths, Austin, TX). The consensus sequence was used for the primer design with Primer Express (Applied Biosystems, Foster City, CA). A majority of the designed amplicons had a maximum length of 150 bases, and primers had a theoretical melting temperature (Tm) of 59°C. Some genes required longer amplicons (<250 bases) to generate acceptable primer sets. From the list of primers provided by Primer Express, optimal primers were selected using a script that automatically flagged nonspecific primers. The script used NCBI BLAST (2) to check specificity against the GenBank database. Specificity was based on the extent of 3'-end perfect matches to nontargeted bacterial sequences. Primers were then selected manually based on the BLAST output. When available, primers described in the literature for successful QPCR were also used. Overall, 110 primer pairs were either designed or extracted from the literature (Table 1). Including primers previously described in the literature, this primer set targeted 3,687 VMG sequences (determined by BLAST analysis with GenBank, May 2006). Primer sequences and references for primers extracted from the literature are listed in the supplemental material.
Design of PCR primers to validate correlations used in the development of the predictive CT equation.
To validate the predictive CT equation and further examine the success rate of the primers, 111 new primer sets were designed. The new group of primers was designed with the same criteria as the primers designed to establish the predictive CT equation. In addition, the primers were filtered further to select sequences with the lowest possible percentages of GC bases in the 3' ends of the primers. One nonspecific primer set targeting C. parvum was removed from the original 110 because of false-positive observations. Thus, a total of 220 primer sets (109 used to establish the predictive CT equation and 111 used to validate it) targeting 200 VMGs for 30 pathogens were tested (Table 1; the new VMGs targeted with this primer set are in bold).
PCR on BioTrove OpenArray plates.
Primer sets were tested simultaneously on the BioTrove OpenArray plates. Primers were synthesized by Sigma-Aldrich (St. Louis, MO) and preloaded (128 nM for primers designed to establish the predictive CT equation and 400 nM for primers designed to validate the predictive CT equation) into BioTrove OpenArray plates (Woburn, MA) (33, 46). Two to four subarrays (each with 64 wells for 56 separate assays and eight loading controls) were used for each PCR sample. PCR mixtures (5 µl for each sample array) consisted of 1x LightCycler FastStart DNA Master Sybr green I mix (Roche Applied Sciences, Indianapolis, IN), 1.6x Sybr green I, 0.5% glycerol, 0.2% Pluronic F-68, 1 mg per ml bovine serum albumin (New England Biolabs, Beverly, MA), 2.5 mM MgCl2, 8% formamide, and a DNA mixture. After an initial enzyme activation at 95°C for 10 min, 36 cycles of the following program were used for amplification: denaturation at 95°C for 10 s, annealing at 53°C for 10 s, and elongation at 72°C for 10 s.
Design of the sample mixtures for development of the predictive CT equation.
Samples were mixed to evaluate the influences of the sequence characteristics of primers, amplicons, genome size, amplification conditions, and the matrix in which the target is present on specificity and sensitivity. To evaluate specificity without a complex background, gDNA from 14 pathogenic organisms (pure cultures; ATCC numbers are listed in Table 1) was tested individually (6 ng in the total sample or 20 pg per reaction well). To develop standard curves and further evaluate specificity and sensitivity within a complex background, gDNA from 14 pathogens was mixed and spiked at various concentrations into gDNA from the wastewater tertiary effluent and river water samples (20 pg, 2 pg, 200 fg, 20 fg, and 2 fg of each of the 14 pathogens mixed together spiked into 66.6 pg of background gDNA per reaction well). The mixture of various concentrations was also examined without a background to serve as a control. The complex background samples were also tested without spiking with gDNA from pure cultures. All of these samples were tested with the set of PCR primers designed to establish the predictive CT equation. These samples were examined further to evaluate the influence of sample inhibition and variation between OpenArray plates. All samples were tested in triplicate.
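The spiking amounts above are given as gDNA mass per reaction well, and converting them to genome copies is a short calculation. The sketch below is illustrative only: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value from this study), and the 5-Mbp genome size is a hypothetical placeholder rather than an entry from Table 1.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one base pair (approximation)


def genome_copies(mass_g: float, genome_size_bp: float) -> float:
    """Estimate the number of genome copies in a given mass of double-stranded gDNA."""
    return mass_g * AVOGADRO / (genome_size_bp * BP_MASS_G_PER_MOL)


# Dilution series from the text, in grams per reaction well: 20 pg down to 2 fg.
dilutions_g = [20e-12, 2e-12, 200e-15, 20e-15, 2e-15]
HYPOTHETICAL_GENOME_BP = 5e6  # illustrative genome size, not taken from Table 1

for mass in dilutions_g:
    copies = genome_copies(mass, HYPOTHETICAL_GENOME_BP)
    print(f"{mass:.0e} g per well -> {copies:.2f} genome copies")
```

Under these assumptions, a genome twice as large yields half as many copies from the same mass, so equal masses of gDNA do not correspond to equal copy numbers across organisms.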
Design of the validation sample mixtures for validation of the predictive CT equation.
The validation samples were used to further evaluate the effect of characteristics of the template and primer sequences on the primer success rate and to evaluate the predictive CT equation. For the validation samples, gDNA from 21 organisms (ATCC numbers are listed in Table 1; the organisms used solely with these mixtures are in bold) was spiked at various concentrations, for an absolute abundance of approximately 10, 100, and 1,000 genomic copies per reaction well, into gDNA extracted from either river water, tertiary effluent, or activated sludge (0.99 ng per reaction well). In total, 36 validation samples were prepared. All samples were tested in triplicate.
Data analysis.
For all analyses, data were filtered to distinguish true-positive, false-positive, true-negative, and false-negative signals. Amplification was considered positive if the CT was less than 26 for all three replicates and the experimental Tm was consistent. The influence of a primer's 3'-end GC content on specificity was analyzed using the primer (either forward or reverse) with the highest GC content. Primers were grouped based on the number of GC bases within the last 5 bases on the 3' end. For developing the predictive CT equation, only the primer sets displaying true-positive amplification were considered.
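The filtering and grouping rules above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function names and the example primer sequences are hypothetical, the sketch assumes that "highest GC content" refers to the 3'-end window, and the check for a consistent experimental Tm is omitted.

```python
def gc_in_3prime_end(primer: str, window: int = 5) -> int:
    """Count G and C bases in the last `window` bases of a primer written 5'->3'."""
    tail = primer.upper()[-window:]
    return sum(base in "GC" for base in tail)


def primer_set_gc_group(forward: str, reverse: str, window: int = 5) -> int:
    """Group a primer set by whichever primer has more GC bases at its 3' end."""
    return max(gc_in_3prime_end(forward, window), gc_in_3prime_end(reverse, window))


def is_positive(ct_replicates, ct_cutoff: float = 26.0) -> bool:
    """Positive only if every replicate amplified (not None) with CT below the cutoff."""
    return all(ct is not None and ct < ct_cutoff for ct in ct_replicates)


# Hypothetical primer pair, for illustration only.
forward = "ATGCGTACCTGAAGCTGGCC"   # 3' end ...TGGCC
reverse = "TTAGCATCGGATTCAAATAT"   # 3' end ...AATAT
print(primer_set_gc_group(forward, reverse))
print(is_positive([21.3, 21.8, 22.0]), is_positive([25.1, None, 24.9]))
```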
The influence of the GC content of the primers and the target organisms on the success rates of novel primers was examined based on the average sum of the successes (taken from 36 sample mixtures for validating the predictive CT equation) for all assays targeting an organism. Assays were considered successful if they displayed a true-positive or true-negative signal, and an organism was deemed present if two or more assays (targeting one organism) displayed amplification with a CT less than 26 for all three replicates. The average GC content of all primers used for a targeted organism was employed for the analysis examining the influence of primer GC content on the success rate. The three-dimensional plot was generated with a loess smoother and 1.0 sampling proportion. This smoothing was performed to identify characteristics of the population.
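The success-rate and presence definitions above translate directly into code. The sketch below is an assumption-laden illustration: the outcome labels ("TP", "TN", "FP", "FN"), the function names, and the example CT values are hypothetical and are not taken from the study's data.

```python
def success_rate(outcomes) -> float:
    """Fraction of assays scored as true positive ("TP") or true negative ("TN")."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no assays supplied")
    successes = sum(outcome in ("TP", "TN") for outcome in outcomes)
    return successes / len(outcomes)


def organism_present(assay_cts, ct_cutoff: float = 26.0, min_assays: int = 2) -> bool:
    """Call an organism present when at least `min_assays` of its assays
    amplified with CT below the cutoff in every replicate."""
    positive = 0
    for replicates in assay_cts:
        if all(ct is not None and ct < ct_cutoff for ct in replicates):
            positive += 1
    return positive >= min_assays


# Hypothetical results for one organism targeted by three assays.
print(success_rate(["TP", "TN", "FP", "FN"]))
print(organism_present([[20.0, 21.0, 20.5], [24.0, 25.0, 25.5], [None, 30.0, 28.0]]))
```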
For comparison with the predictive CT equation, standard curves were generated using an average slope and intercept from all three replicates in all three backgrounds (control and gDNA spiked with river and tertiary gDNA) from sample mixtures designed to develop a predictive CT equation. PCR efficiency was examined to determine the influence of the sample background on quantitative values and was calculated from the slope of the standard curves with the following equation: $\text{PCR efficiency} = -1 + 10^{-1/\text{slope}}$.
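As a worked illustration of the efficiency formula and of how a standard curve is used to estimate starting copies, the sketch below assumes the usual standard-curve form CT = slope × log10(copies) + intercept. The function names and the intercept value of 30 are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def pcr_efficiency(slope: float) -> float:
    """PCR efficiency = -1 + 10^(-1/slope), with slope in CT per log10(copies)."""
    return -1.0 + 10.0 ** (-1.0 / slope)


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert a standard curve of the form CT = slope * log10(copies) + intercept."""
    return 10.0 ** ((ct - intercept) / slope)


# A slope of -1/log10(2) (about -3.32) corresponds to perfect doubling per cycle.
perfect_slope = -1.0 / math.log10(2.0)
print(round(pcr_efficiency(perfect_slope), 3))

# Hypothetical curve: slope -3.3, intercept 30 (placeholder values).
print(round(copies_from_ct(23.4, slope=-3.3, intercept=30.0), 1))
```

An efficiency of 1 means the product doubles each cycle; values above or below 1 indicate departures from ideal doubling, which is how the background effects on efficiency are read later in the text.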
Raw results of all the experiments are included in the supplemental material.
FIG. 1. Impact of gDNA from various environmental water samples and primer stability on specificity and sensitivity. The left panel shows the percentages of targeted and nontargeted primer assays displaying amplification at various dilutions of organisms spiked in gDNA from environmental samples. Error bars represent the standard deviations between replicates performed on three plates. The right panel shows the sum of the GC bases on the terminal 3' end (various-size circles) versus the percentages of primer sets displaying false-positive amplification when targets are spiked into background gDNA and not spiked into background gDNA. bkg, background.
The false-negative signal observed with G. intestinalis may be due to lower relative abundance caused by having a larger genome and a potential reduction in the availability of the target. The influence of genome size on amplification potential has been described previously (11, 15) and may be due to a decrease in the relative abundance of the template over the nontarget DNA. Garner proposed that in addition to a decreased relative abundance, there is an increased chance of the nonspecific annealing of primers to nontarget regions, diminishing the annealing of primers to the target strand (15). Optimizing the PCR cycle conditions or the concentration of the reagents may alleviate false-negative signals (40); however, changing these parameters may influence the specificity and sensitivity of the targeted assays that behaved well.
Since the primers were designed to have the same theoretical Tm, the terminal 3' end of the primers was also examined. The GC content, Tm, and binding energy within the terminal 7, 5, 3, and 2 bases on the terminal 3' end of the primers were considered for all false-positive signals. Correlations (provided in the supplemental material) were used to determine that both the Tm of the last 7 bases and the GC content of the last 5 bases had the highest influence on false-positive signals (considering targets alone and spiked into a background). To demonstrate this, primer sets were grouped based on the GC content within the last 5 bases of the terminal 3' end (Fig. 1, right panel). Two out of seven primer sets (29%) with five GCs within the last 5 bp at the 3' end of the primer displayed false-positive amplification. The percentage of false-positive signals decreased as the number of GC bases in the 3' ends of the primers decreased. The influence of the 3' end of a primer on the specificity of the amplification has been described previously (31). As a result, many primer design software programs now analyze the 3' ends of potential primers; Primer3, for example, emphasizes the stability of 5-base segments of the terminal 3' end (37).
It should be noted that the results obtained with the OpenArray nanoliter-volume reactions are comparable to conventional microliter-volume QPCR. Low-volume PCR has been optimized in nanoliter-volume reactions by adjusting surface chemistry (33), ramping rates and decreased annealing temperatures (46), and adjusting the PCR master mix composition with extra Sybr green I, bovine serum albumin, and formamide added to the standard PCR mixture. Cross-platform comparisons between the PCR performed with the microplate format (10- and 20-µl-volume reactions in the 7900HT) and with the OpenArray platform have shown high similarities in PCR efficiencies and detection limits (E. Ortenberg and D. Roberts, unpublished data; 7). The comparison also demonstrated a high correlation (between the two platforms) of specific gene regulation patterns between experimental (diseased heart) and control (normal adult heart) tissues. The PCR efficiency observed with the experiments in this study (described below) also demonstrates the success of PCR primers with the OpenArray environment.
Development of the predictive CT equation.
An empirical equation was developed using the sequence-specific results observed with the primers designed to develop the predictive CT equation. Multiple parameters were considered for the predictive CT equation, including size of the genome, GC content of the genome, and primer binding energy. Since all of the primers were designed with the same theoretical Tm (for simultaneous amplification of all primer sets on the OpenArray plate), the terminal 3' ends of the primers were also considered. This included the binding energy, position of the G and C bases, Tm, and GC content for the terminal 7, 5, 3, and 2 bases on the 3' ends of the primers. Correlations (Table 1; see the supplemental material) were used to determine which parameters had a greater influence on the CT value. The inclusion of parameters other than those chosen either had no effect or decreased the correlation between the predictive and experimental CTs. In addition, using the general linear model requires that all variables are independent; thus, only the top independent parameters were used. For example, parameters such as binding energy are not entirely independent of the Tm of the terminal 7 bases, as a primer with a high binding energy may have a higher Tm on the terminal 3' end. Five parameters were identified that influenced the CT of each primer set. These were (i) the genome size of the targeted organism, (ii) the target organism concentration, (iii) the GC content of the targeted organism, (iv) amplicon length, and (v) the theoretical Tm of the last 7 bases on the primer's 3' end. The levels of influence of these variables on the CT were in the order listed (i.e., the genome size of the targeted organism had the greatest influence, and the Tm of the last 7 bases on the primer's 3' end had the least).
A multiple-parameter linear regression curve was used to place a weighted influence on each of these parameters, and the following equation was developed:
[Equation displayed as an image in the original; not reproduced in this copy.]
A high correlation (R² = 0.816) between the CT predicted with the predictive CT equation and the experimental CT was observed (Fig. 2). Since the number of gene copies will vary based on the targeted organisms and gene, the predictive CT equation solely predicts starting copies per reaction. The accuracy of the equation was tested and compared with standard curves using validation sample mixtures with the primer set designed to validate the empirical CT equation.
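Whatever its fitted coefficients, a multiple-parameter linear regression over the five parameters named in the text has a simple general shape. The form below is only a sketch: the coefficients $\beta_0, \dots, \beta_5$ are symbolic placeholders, and the assumption that the starting copy number enters as $\log_{10} N_0$ (as it does in a standard curve) is illustrative and not confirmed by the text. Here $G$ is the genome size of the target organism, $N_0$ the starting copies, $\mathrm{GC}$ the GC content of the target genome, $L$ the amplicon length, and $T_{m,7}$ the theoretical Tm of the last 7 bases at the primer's 3' end.

```latex
C_T = \beta_0 + \beta_1 G + \beta_2 \log_{10} N_0 + \beta_3\,\mathrm{GC} + \beta_4 L + \beta_5 T_{m,7}

% Solving for the starting copies, given a measured C_T and known sequence characteristics:
N_0 = 10^{\left( C_T - \beta_0 - \beta_1 G - \beta_3\,\mathrm{GC} - \beta_4 L - \beta_5 T_{m,7} \right) / \beta_2}
```

Under this assumed form, every term except $N_0$ is known from the sequence before the run, which is what would allow starting copies to be estimated from a measured CT without a per-assay standard curve.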
FIG. 2. Experimental versus predicted CT using the predictive CT equation. The results were obtained with the primers designed to establish the predictive CT equation. The equation is derived using amplicon length, starting genomic copies, number of base pairs in the target organism's genome, GC content of the target organism's genome, and theoretical Tm of the last 7 bases of the primer's 3' end. Error bars represent the standard deviations of experimental CT between replicates on three plates.
Validation of an organism's sequence characteristics to determine primer success rate.
Validation sample mixtures were tested with primers designed to validate the predictive CT equation to determine the success rate of novel primers. The influence of primer and targeted organism GC content on the success of the primers was observed (Fig. 3). Results show that assays targeting organisms with extreme GC content (low or high) had lower success rates. A study by Housley et al. (18) observed similar results concerning the influence of GC content on the success of the primers, describing success rates of 56.9% for primers designed to target an amplified region with GC content greater than 50% and 74.2% for primers designed to target an amplified region with GC content less than 50%. Targeted organisms with high GC content frequently tend to give weak signals in amplification (due to secondary structure and template-template annealing), and primers with high GC content will amplify nontargeted regions (due to high stability). In addition, an organism with a low GC content in the genome will have a higher rate of false-negative signals. The number of assays required for confidently determining the presence or absence of an organism is dependent on the success rates of designed primers. Therefore, targets with extreme GC content will require more assays for determining the presence or absence of organisms.
FIG. 3. Success rate (%) of all assays targeting an organism. Success rate is defined as the sum of all true-positive and -negative assays divided by the sum of all assays targeting a single organism. The success rate was taken as an average of all 36 validation sample mixtures prepared to validate the predictive CT equation.
Validation of the predictive CT equation.
The predictive CT equation was examined and compared with standard curves (obtained from dilution experiments performed with primers designed to develop the predictive CT equation) by predicting the starting copy number in the 36 validation samples tested with the primers designed to validate the predictive CT equation. The predicted starting copies clustered around 20, 100, and 1,000, close to the actual starting copy numbers of 10, 100, and 1,000 spiked into the validation samples (Fig. 4). A Wilcoxon signed-rank test comparing the two sets of predictions showed that the standard curves tended to predict higher starting copies than the predictive CT equation. With the predictive CT equation, assays spiked at 10 and 100 starting copies could not be clearly differentiated.
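The paired comparison between the two prediction methods rests on the signed-rank statistic, which can be computed with the standard library alone. The sketch below is illustrative: the function name and the example values are hypothetical placeholders (not data from this study), and it returns only the rank sums, since a p-value would additionally require reference tables or a statistics package.

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped, and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across the run of equal absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Placeholder predictions of starting copies for five paired assays.
standard_curve = [150, 1200, 30, 90, 1100]
ct_equation = [100, 1000, 20, 110, 900]
print(signed_rank_sums(standard_curve, ct_equation))
```

A W_plus much larger than W_minus indicates that the first method tends to give the higher value of each pair, which is the direction of the difference reported for the standard curves.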
FIG. 4. Distribution of predicted starting copy numbers using predictive CT equation and standard curves for validation samples and primers designed to validate the predictive CT equation. Note that the x axes of the three panels have different scales. The templates of all the organisms were spiked at either 10, 100, or 1,000 genomic copies per reaction well (indicated by the dotted lines). Error bars represent the standard deviations between three replicates performed on the same OpenArray plates.
Inhibition within the various environmental samples may also have influenced quantification. The distribution of PCR efficiencies shifted when targets were spiked into gDNA from different environmental waters (Fig. 5, left panel), as observed by testing primers and samples designed to validate the predictive CT equation. The tertiary effluent background shifted the distribution of PCR efficiency above 1, and the river water shifted the distribution below 1. An analysis of variance showed a significant difference in mean PCR efficiency between the control and the two backgrounds. Approximately 83% of the targeted assays with 1,000 to 10,000 starting copies had a CT standard deviation equal to or less than 0.35, as did 80% of the assays with 100 to 1,000 starting copies, 33% with 10 to 100 starting copies, and 15% with 1 to 10 copies. A CT standard deviation of 0.35 corresponds to a coefficient of variation of 25% for the estimated starting copies (assuming a standard curve slope of -3.3), which has been described in previous reports as a cutoff for acceptable precision in QPCR diagnostics (4, 20, 43). A correlation matrix was used to examine whether primer characteristics influenced assays with high standard deviations (data not shown). Analysis showed that the size of the genome had the highest weighted influence on the standard deviation of the CT, followed by the number of starting copies and the GC content of the target genome. Other studies have observed an influence of GC content on primer success. A study by Vanichanon et al. (47) found reduced repeatability for primer sets with high GC content. Thus, primers with GC content closer to 50% should be ideal for maintaining high reproducibility (10, 21, 23). The fluctuation in efficiency could be due to the designed assays, enzyme instability, and sample-dependent inhibition (22).
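The link between a CT standard deviation of 0.35 and a coefficient of variation of about 25% can be checked numerically. The sketch below assumes that CT errors are normally distributed, so that the estimated copies are log-normally distributed; the text does not state which conversion was used, so this formula is an assumption that happens to reproduce the quoted figure. The function name is hypothetical.

```python
import math


def copy_number_cv(ct_sd: float, slope: float = -3.3) -> float:
    """Coefficient of variation of estimated starting copies implied by a CT
    standard deviation, assuming normal CT errors (log-normal copies)."""
    # A CT error of ct_sd maps to an error of ct_sd/|slope| in log10(copies),
    # i.e. ln(10) * ct_sd / |slope| in natural-log units.
    sigma_ln = math.log(10.0) * ct_sd / abs(slope)
    return math.sqrt(math.exp(sigma_ln ** 2) - 1.0)


print(f"{copy_number_cv(0.35):.1%}")
```

With a slope of -3.3 this comes out close to 25%, and because the relation is nearly linear for small deviations, doubling the CT standard deviation roughly doubles the uncertainty in the estimated copies.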
FIG. 5. Factors potentially influencing quantitative inaccuracies as observed with primers and samples designed to establish the predictive CT equation. In the left panel, a box plot shows the distribution of PCR efficiency for target organisms spiked into gDNA from complex environmental waters and a control (no background). The right panel shows the cumulative frequency distributions of the standard deviations of CT determined between three separate OpenArray plates with various transcript concentrations.
Future diagnostics will include both an exogenous internal positive control and a universal assay targeting the 16S rRNA gene to allow normalization for the potential influence of sample inhibition, plate variation, lot constitutes, and assays on CT (17, 26, 51). Taken together, the wide distributions observed when predicting starting copies with standard curves, along with the cumbersome requirement of validation, make the predictive CT equation an attractive potential alternative. One other strategy for quantifying starting copies without generating standard curves has been described, using competitive PCR with fluorescence quenching (45). This alternative has the potential to quantify DNA in the presence of inhibitors, as tested with high concentrations of gDNA from pure cultures spiked into humic acid; however, the accuracy and precision of the quantification and the influence of inhibitors were not described for samples containing fewer than 1,000 starting copies.
Conclusion.
The influences of the sequence characteristics of primers, amplicons, target genome size, amplification conditions, and the matrix in which the target is present on CT were explored. A predictive CT equation was experimentally established by examining PCR-based assays targeting multiple VMGs and may be an efficient alternative to generating standard curves. These results are valuable when developing high-throughput and reliable screening tools for a large number of pathogens without extensive validation.
This work was supported by the National Institutes of Health (grant R01 RR018625-01), Environmental Protection Agency (RD83162801-0 and RD83301001), and 21st Century Michigan Economic Development Corporation (GR-476 PO 085P3000517).
Published ahead of print on 18 April 2008.
Supplemental material for this article may be found at http://aem.asm.org/.