Applied and Environmental Microbiology, April 2008, p. 2200-2209, Vol. 74, No. 7
0099-2240/08/$08.00+0 doi:10.1128/AEM.01962-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Dieter M. Tourlousse,1,
Robert D. Stedtfeld,1
Samuel W. Baushke,1
Amanda B. Herzog,1
Lukas M. Wick,3
Jean Marie Rouillard,4
Erdogan Gulari,4
James M. Tiedje,2 and
Syed A. Hashsham1,2*
Department of Civil and Environmental Engineering,1 Center for Microbial Ecology,2 National Center for Food Safety and Toxicology, Michigan State University, East Lansing, Michigan 48824,3 Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109,4
Received 27 August 2007/ Accepted 25 January 2008
One of the first microarrays for the detection of microbial pathogens using multiple VMGs was described by Wilson et al. (66), who used the Affymetrix platform. This microarray was validated for 11 bacterial, five viral, and two eukaryotic pathogens and employed genomic DNA (gDNA) of pathogens spiked in total DNA extracted from filtered air. This study also highlighted the advantage of using highly redundant probe sets to infer presence/absence calls based on the positive fraction (PF). The PF was defined as the number of positive probes for a given target divided by the total number of probes used to detect that target. Setting a sufficiently high threshold for the PF to define a target as present substantially reduces the rate of false-positive calls due to cross-hybridization. When redundant probe sets are used, the reliability of the presence/absence calls is expected to be somewhere between the reliability achieved by PCR-based detection of the VMGs and the reliability achieved by sequencing of the corresponding VMG amplicons. Utilization of redundant probe sets may also result in more efficient validation of newly designed microarrays. This is because the success levels for probe design are generally high and presence/absence calls based on redundant probe sets are not affected by failure of a small number of probes within a set.
Among the challenges in the development of pathogen detection microarrays is the limited ability to detect low-abundance target sequences within complex mixtures. Microarrays without target gene amplification only allow detection of microorganisms at a relative abundance of approximately 1 to 5% (4, 11, 26, 50, 67). Obviously, pathogens may be present at levels well below this limit in various matrices. Hence, microarray-based pathogen detection generally relies upon target gene amplification strategies, typically using PCR (6, 7, 35). For conserved genes (e.g., the 16S rRNA gene), amplification using universal or group-specific primer sets in multitemplate PCR has been extensively used to enrich target genes prior to hybridization (12, 13, 16, 17, 24, 36-38, 43, 44, 51, 52, 55, 63, 65). Methods for simultaneous amplification of VMGs, however, are less well developed. Frequently, multiple PCRs or multiplex PCR has been used to amplify multiple VMGs to enhance probe signals and to improve detection limits for genotyping of isolates and pathogen detection in various matrices (1, 2, 8, 10, 20, 25, 31, 32, 45, 54, 61, 62, 66). However, development of robust and sensitive multiplex PCR assays for multiple VMGs still requires careful primer design and significant optimization of the reaction parameters (41). If a large number of VMGs could be simultaneously amplified in a robust manner in complex matrices, VMG-based microarrays could provide the required specificity, sensitivity, and target throughput, all in the same assay, for diagnostic purposes.
The sensitivity of microarrays is also influenced by the strength of probe signals obtained after hybridization. Probes with high target binding affinities may be preferred for detecting low-abundance targets. However, such probes are also more prone to cross-hybridization (23) and may decrease specificity. This trade-off between specificity and sensitivity needs to be assessed experimentally to define optimal probe selection criteria.
The main objective of this study was to design and validate an in situ-synthesized VMG biochip to detect 12 bacterial pathogens. Many of the pathogens included (Aeromonas hydrophila, Helicobacter pylori, Legionella pneumophila, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, and Yersinia enterocolitica) were waterborne. Other pathogens (Clostridium perfringens, Salmonella, Staphylococcus aureus, Campylobacter jejuni, and Listeria monocytogenes) were relevant to clinical diagnostics and food safety. gDNA extracted from tap water, river water, and tertiary effluent from a municipal wastewater treatment plant served as the matrix used to assess sensitivity and specificity after samples were spiked with pathogen gDNA. Target gene amplification using a split multiplex PCR assay allowed detection of pathogens at a relative abundance between 0.1 and 0.01%, depending on the pathogen and the VMG. Up to six VMG amplicons per pathogen and up to 35 probes per VMG amplicon were used to eliminate false-positive calls. The effect of characteristics of probes on their hybridization behavior was also evaluated in order to derive probe design rules yielding the best trade-off between sensitivity and specificity. The described VMG biochip may have applications in diagnostic areas where parallel screening of multiple pathogens with high levels of specificity is critical.
ΔG°duplex) ranged from –14.1 to –23.9 kcal/mol; the mean was –18.4 kcal/mol, and the variability (standard deviation) was 2.1 kcal/mol.
TABLE 1. Overview of the validation resultsa
FIG. 1. Evaluation of 791 targeted and 2,034 nontargeted probes with a composite target mixture containing all 47 VMG amplicons (24 amplicons labeled with Cy3 and 23 amplicons labeled with Cy5). (a) Part of the Xeotron chip after hybridization and washing up to a temperature of 30°C. (b) Distribution of the SNRs for 791 targeted probes (filled bars) and 2,034 nontargeted probes (open bars). (c) Distribution of the PFs for 47 targeted probe sets (filled bars) and 67 nontargeted probe sets (open bars). (d) Success of probe selection for targeted and nontargeted probe sets. The indicated slopes of the dashed lines are numerically identical to the PFs.
Tap water (40 liters), river water (10 liters; Red Cedar River, East Lansing, MI), and tertiary effluent (20 liters; wastewater treatment plant in East Lansing, MI) were sampled and filtered through 0.45-µm nitrocellulose filters (Millipore, Billerica, MA). gDNA was extracted from the filters using a MegaPrep UltraClean soil DNA kit (Mo Bio Laboratories, Carlsbad, CA) according to the manufacturer's instructions. The amount and quality of the extracted DNA were determined with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE).
Primer design and PCR conditions.
A total of 47 gene-specific primer sets (see Table S1 in the supplemental material) were designed for 35 VMGs flanking gene regions with high probe density. Primers were selected so that they had similar annealing temperatures (one set had an annealing temperature of 53°C, and another set had an annealing temperature of 58°C) and covered most alleles of a given VMG available in the GenBank database. The uniqueness of the primers was confirmed by a BLAST search against the GenBank database. To ensure primer specificity, mismatches with related nontarget sequences were located near the 3' ends of the primers. For multiplex PCR, primers were segregated into five primer combinations, and each combination contained 9 or 10 primer pairs (see Table S2 in the supplemental material). Mixtures of primer sets were selected so that each pathogen (except P. aeruginosa) was targeted in at least two different multiplex PCRs. Primers were synthesized by Integrated DNA Technologies (Coralville, IA).
PCR mixtures (25 µl) consisted of 1x PCR buffer, 2 mM MgCl2, 1.5 U (for monoplex PCR) or 3 U (for multiplex PCR) of AmpliTaq Gold (Roche Molecular Systems, Pleasanton, CA), each deoxynucleoside triphosphate at a concentration of 200 µM (Invitrogen), each primer at a concentration of 500 nM, 200 ng of bovine serum albumin (New England BioLabs, Beverly, MA), and 1 µl of DNA. After the initial enzyme activation at 94°C for 10 min, 35 cycles of the following temperature regimen were used for amplification: denaturation at 94°C for 60 s, annealing at 53 or 58°C for 60 s, and elongation at 72°C for 60 s. This was followed by a final elongation step at 72°C for 7 min. PCR mixtures were purified using a QIAquick PCR purification kit (Qiagen), and the amplification products were quantified using the NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies).
Fluorescent labeling of PCR products.
PCR products pooled from different PCRs were labeled using an aminoallyl dUTP (aa-dUTP) incorporation and cyanine dye coupling protocol as described previously (56, 64), with minor modifications. Briefly, PCR products were amplified, and aa-dUTP was incorporated into the amplification products with a Bioprime DNA labeling kit (Invitrogen, San Diego, CA) using a ratio of aa-dUTP to dTTP of 5:1 and an incubation time of 120 min. Purified products were then coupled with cyanine dye by incubation for 80 min in a 1:1 mixture of 0.1 M sodium carbonate buffer (pH 9.3) and N-hydroxysuccinimide ester cyanine dye (Amersham Biosciences).
Biochip hybridization and melting curve development.
Hybridization and melting profile generation were performed using previously established protocols (56, 64). The method used for melting profile analysis was based on methods used in previous studies utilizing this approach for microarrays using 16S rRNA targets (15, 29, 34, 60). Briefly, after priming of the biochips, target DNA (200 pmol of Cy dye) was hybridized overnight at 20°C using an M-2 hybridization station (Invitrogen [formerly Xeotron Corporation, Houston, TX]) with hybridization buffer containing 35% deionized formamide (Ambion), 6x SSPE (pH 6.6; Invitrogen), and 0.4% Triton X-100 (Sigma) (1x SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA [pH 7.7]). After the chips were washed, the initial point of the melting curve profile was obtained by washing each chip with high-stringency wash buffer (20 mM NaCl, 10 mM Na2PO4, 5 mM Na2EDTA; pH adjusted to 6.6 with HCl) for 1.4 min at 25°C and imaging the chip. High-stringency wash buffer was degassed under a vacuum to prevent formation of air bubbles during the wash steps. Subsequent development of the melting profile was performed by using manual cycles of washing and scanning of the chip that were repeated at 1°C intervals until 60°C was reached. At the end of this procedure, the chip was additionally stripped at 60, 45 and 30°C (2.5 min each) with nuclease-free water (Sigma). The tubing of the hybridization station was washed for 20 min before and after each experiment to prevent carryover between experiments.
Experimental design.
Specificity and sensitivity were evaluated with samples consisting of pathogen gDNA spiked into DNA obtained from different water sources (tap water, river water, and tertiary effluent from a wastewater treatment plant), as has been done in numerous other validation studies (43, 66). A first set of samples was prepared by spiking 10 pg of gDNA of each pathogen (equivalent to approximately 1,400 to 5,500 genome copies depending on the pathogen) into 10 ng of DNA from the different water samples. This yielded a relative abundance of 0.1% for each of the pathogens, calculated as a mass-based percentage of pathogen gDNA in the total community gDNA. Both spiked and unspiked samples were subjected to the split multiplex PCR amplification step. The amplicons obtained from spiked water samples were labeled with Cy3, and the amplicons obtained from unspiked water samples were labeled with Cy5. Equal amounts of the two labeled products (100 pmol of each dye or 2.5 µg of DNA) were mixed and hybridized in duplicate. For the unspiked water samples, 2,034 nontargeted probes were evaluated to examine probe specificity. For the spiked samples, 673 probes were evaluated to examine probe sensitivity. A second set of samples contained pathogen gDNA spiked at a relative abundance of 0.01% into gDNA from river water and at a relative abundance of 0.001% in gDNA from tertiary effluent.
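The spiking arithmetic in this section can be checked with a short sketch. The average mass of 650 g/mol per base pair and the two genome sizes used in the usage note are assumptions introduced here for illustration and are not taken from the article; the relative abundance is computed against the background DNA mass, which matches the reported 0.1% after rounding.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def genome_copies(mass_pg, genome_size_bp):
    """Approximate number of genome copies in a given mass (pg) of gDNA."""
    mass_g = mass_pg * 1e-12
    return mass_g * AVOGADRO / (genome_size_bp * BP_MASS_G_PER_MOL)


def relative_abundance_percent(spike_pg, background_ng):
    """Mass-based percentage of spiked gDNA relative to background gDNA."""
    return 100.0 * spike_pg / (background_ng * 1000.0)


# 10 pg of pathogen gDNA spiked into 10 ng of background DNA
print(relative_abundance_percent(10, 10))

# Hypothetical small (1.7 Mb) and large (6.3 Mb) bacterial genomes
print(round(genome_copies(10, 1.7e6)))
print(round(genome_copies(10, 6.3e6)))
```

With these assumed genome sizes, 10 pg corresponds to a few thousand genome copies at the small end and somewhat under 1,500 at the large end, which is in line with the range of approximately 1,400 to 5,500 copies stated above.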
Data acquisition and processing.
Microarrays were scanned with a GenePix 4000B 16-bit laser scanner (Axon Instruments, Union City, CA). Fluorescence signal intensities were extracted from the scanned images using GenePix 5.0 (Axon Instruments, Union City, CA), which yielded values between 0 and 65,535 arbitrary units. The median of all pixel intensities within a spot was used as the raw spot intensity. Subsequent data analysis was done with Microsoft Excel (Microsoft, Redmond, WA), and plotting was done with SigmaPlot 9.0 (Systat Software, Point Richmond, CA).
Raw spot intensities were divided by the mean signal intensity of 22 empty control spots to obtain signal-to-noise ratios (SNRs) (56). The SNR was computed for each wash step between 30 and 45°C, and the values were subsequently averaged. Averaging of SNRs is equivalent to calculating an SNR based on the area under the melting curve within this temperature interval. The SNR was subsequently divided by the median SNR of the 2,034 nontargeted probes. A probe signal was considered positive if the SNR was greater than 3. The PF was computed by dividing the number of positive signal probes within a set by the number of probes within the set (66).
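The SNR and PF calculations described above can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet workflow; the function names and the input layout (one raw intensity per wash step, one list of empty-spot intensities per wash step) are assumptions made here.

```python
from statistics import mean, median

SNR_THRESHOLD = 3.0  # from the text: a probe is positive if its SNR exceeds 3


def averaged_snr(raw_intensities, empty_spot_intensities):
    """Average the per-wash-step SNR of one probe.

    raw_intensities: one raw spot intensity per wash step (30 to 45 degrees C).
    empty_spot_intensities: one list of empty-control intensities per wash step.
    """
    per_step = [
        raw / mean(empty)
        for raw, empty in zip(raw_intensities, empty_spot_intensities)
    ]
    return mean(per_step)


def positive_fraction(snrs, nontargeted_snrs, threshold=SNR_THRESHOLD):
    """PF of one probe set after scaling by the median nontargeted SNR."""
    background = median(nontargeted_snrs)
    positives = sum(1 for s in snrs if s / background > threshold)
    return positives / len(snrs)


# Toy numbers, for illustration only
snr = averaged_snr([500, 1000], [[100, 100], [200, 200]])
pf = positive_fraction([2.0, 8.0, 10.0, 1.0], [1.0, 2.0, 3.0])
print(snr, pf)
```

In the toy example, two of the four probes exceed the threshold after scaling, so the probe set has a PF of 0.5.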
Gibbs free energy estimation.
The ΔG°duplex was calculated with the DINAMelt web server (14, 40, 53). This parameter reflects the energy released during probe-target duplex formation and is sequence dependent (53). It provides a thermodynamic measure of the affinity between a probe and a target, with more negative ΔG°duplex values being indicative of higher binding affinities. For all calculations, perfectly matching duplexes were assumed, and effects of target dangling ends were neglected. The temperature used for the calculations was 43°C instead of the actual hybridization temperature (20°C) to account for the presence of 35% formamide in the hybridization solution (3, 60), and the Na+ concentration used was 1 M. A linear relationship between the G+C content of a probe and its ΔG°duplex was observed. This relationship was used to convert an estimated ΔG°duplex into a corresponding G+C content.
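The conversion between ΔG°duplex and G+C content can be illustrated with an ordinary least-squares line. The article does not report the regression coefficients, so the sketch below fits a line to the four (ΔG°duplex, G+C) pairs quoted elsewhere in the text (–17 and 34.4%, –18.6 and 42.3%, –19.3 and 47.2%, –21.1 and 56.1%). This is an assumption-laden approximation and will not reproduce the authors' regression exactly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# (delta G in kcal/mol, G+C content in %) pairs quoted in the text
delta_g = [-17.0, -18.6, -19.3, -21.1]
gc_percent = [34.4, 42.3, 47.2, 56.1]

slope, intercept = fit_line(delta_g, gc_percent)


def gc_from_delta_g(dg):
    """Estimate G+C content (%) from delta G using the illustrative fit."""
    return slope * dg + intercept


print(round(slope, 2), round(gc_from_delta_g(-19.3), 1))
```

The fitted slope is roughly –5.3 percentage points of G+C per kcal/mol, so each additional kcal/mol of duplex stability corresponds to about five more percentage points of G+C for these 18-mers.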
Overall, of the 791 targeted probes, 720 (91.0%) produced positive signals with SNRs up to 1,000. Of the 2,034 nontargeted probes, 61 (3.0%) yielded positive signals with either the Cy3- or Cy5-labeled amplicons. The signal intensities for targeted and nontargeted probes were well separated, except for a small fraction of the probes (Fig. 1b), indicating the good discriminatory power of the probes. Altogether, 95% of the 5,650 individual probe-target interactions (twice the total number of targeted and nontargeted probes) were the expected interactions in terms of positive/negative probe signals.
The PF was calculated for each probe set (and corresponding amplicon) by dividing the number of positive probes by the number of probes within the set (66). A PF based on the number of initially designed probes is equivalent to the success level of probe selection for individual probe sets. Of the 47 targeted probe sets, 22 displayed a PF of 1, implying that 100% of the designed probes yielded positive signals. Of the remaining 25 sets, 18 displayed a PF between 0.8 and 1 and 7 displayed a PF between 0.6 and 0.8 (Fig. 1c). Of the 67 nontargeted probe sets, 34 displayed a PF of 0 and 31 displayed a PF between 0 and 0.1. The two remaining sets had PFs of 0.125 and 0.3. The success level of probe selection could also be illustrated by plotting the number of designed probes versus the number of positive probes (Fig. 1d). As Fig. 1d shows, the PFs for both targeted and nontargeted probes were not related to the number of designed probes.
Substantial variation in signal intensity was observed among probes for a given VMG amplicon (Fig. 2b). Extensive unevenness in signal intensity among probes for a given target is well documented (5, 44, 46, 56). This variability may be attributed to various factors, including probe and target secondary structure (9, 27, 47), target length (33), ΔG°duplex (23), and even the position of fluorescent labels (68). Also, sequence dissimilarities between the probes and hybridized targets may have contributed to this variability. When probe sets were sorted according to their median SNRs, a trend toward increasing PF with a higher median SNR was apparent (Fig. 2a). Interestingly, the G+C content of the VMG amplicons displayed an analogous tendency (Fig. 2c). This may also partially explain the overall lower success rate for the probe sets hybridized with the Cy3-labeled amplicons (85.5%) than for the probe sets targeted by the Cy5-labeled amplicons (96.2%). The G+C content for Cy3-labeled amplicons (36.5% ± 5.4%) was, on average, lower than the G+C content for the Cy5-labeled amplicons (48.6% ± 7.9%). These trends imply that probe design for genes or genomes (and hence detection of the corresponding microorganisms) with a considerably lower G+C content may be more challenging when continuous regions with a higher G+C content are not present in the selected gene targets.
FIG. 2. PFs (a), SNRs (b), and G+C contents (c) for all 47 targeted VMG amplicons. The probe sets are sorted from bottom to top according to increasing median SNR. The boundaries of the boxes in panel b indicate the 25th and the 75th percentiles, and the whiskers indicate the 10th and the 90th percentiles. The median is given as a solid line, and outlying data points are shown as open symbols.
ΔG°duplex.
ΔG°duplex) would be the most valuable parameter for this screening (23, 39, 42, 49). Only a weak linear relationship (r² = 0.34) was observed between ΔG°duplex and the natural logarithm of signal intensity (see Fig. S2 in the supplemental material). This variability in signal intensity for probes with comparable ΔG°duplex values precluded evaluation of probe design criteria in terms of signal intensity based on ΔG°duplex. To determine the relationship between ΔG°duplex and probe behavior, hybridization patterns were interpreted in terms of positive/negative probe signals, irrespective of their strength. For this analysis, all 791 targeted probes were sorted according to their ΔG°duplex values and binned, and the percentage of positive probes in each bin was calculated. The latter value is equivalent to the success level for probes with a ΔG°duplex within the bin range. A drastic decrease in the probe design success rate was observed for probes with a ΔG°duplex value less negative than –17 kcal/mol (Fig. 3) and a G+C content less than 34.4% (as estimated using the linear relationship between G+C content and ΔG°duplex). This trend was independent of the window size used for binning of the probes (see Fig. S3 in the supplemental material). A similar analysis with increasing SNR thresholds revealed analogous trends, but the results were shifted toward more negative ΔG°duplex values (data not shown). The factors contributing to the unevenness in probe signals for a given target may also explain the lack of hybridization signals for probes with a highly negative ΔG°duplex. In accordance with our observations, Reyes-Lopez et al. (49) previously reported that the proportion of predicted signals observed experimentally increased for 9-mer probes with increasingly negative ΔG°duplex values.
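The sort-and-bin analysis described in this paragraph can be sketched as below. The function name and the handling of a trailing partial bin (kept as its own smaller bin) are assumptions made here; the article does not state how leftover probes were treated.

```python
from statistics import median


def success_by_bin(probes, bin_size=40):
    """Percentage of positive probes per bin of probes sorted by delta G.

    probes: list of (delta_g, is_positive) tuples.
    Returns a list of (median_delta_g, percent_positive), one entry per bin.
    A trailing partial bin is kept as its own bin (an assumption).
    """
    ordered = sorted(probes, key=lambda p: p[0])
    results = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        positives = sum(1 for _, is_positive in chunk if is_positive)
        percent = 100.0 * positives / len(chunk)
        results.append((median(dg for dg, _ in chunk), percent))
    return results


# Toy data with a bin size of 2, for illustration only
toy = [(-15.0, True), (-20.0, True), (-16.0, False), (-19.0, True)]
print(success_by_bin(toy, bin_size=2))
```

Plotting the percentage against the median ΔG°duplex of each bin gives the kind of curve shown in Fig. 3. Rerunning with different values of `bin_size` is one way to check that a trend does not depend on the window size.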
FIG. 3. Success of probe design as a function of ΔG°duplex. Each symbol indicates the percentage of positive probes for bins of 40 probes, and the error bars indicate the standard deviations for three replicates. The percentage of positive probes is plotted at the median ΔG°duplex of each bin. The G+C content was derived from the linear relationship between ΔG°duplex and G+C content.
ΔG°duplex threshold of –17 kcal/mol could be used as a valuable design rule for future oligonucleotide probe design. Although this criterion was demonstrated only for 18-mers in this study, a similar approach to assess the effect of ΔG°duplex on the success level of probe design could be adopted for longer probes. The proposed ΔG°duplex design threshold is in accordance with the findings of Loy et al. (38), who proposed a value of –16 kcal/mol for 18-mers targeting the 16S rRNA gene. This value was suggested based on the observation that the majority of probes displaying positive signals (including cross-hybridization signals) were attributed to probe-target duplexes with a ΔG°duplex more negative than –16 kcal/mol. It should be noted that small deviations from these ΔG°duplex design thresholds may be observed for hybridizations performed under different experimental conditions and/or for theoretical ΔG°duplex estimates obtained using other nearest-neighbor model parameters (38). Finally, the ΔG°duplex criterion was derived from hybridization patterns with high-abundance and low-complexity target mixtures. For target mixtures having very different compositions in terms of target sequence abundance and diversity, these rules should be applied with caution.
Multiplex amplification of VMG targets.
A split multiplex PCR assay was developed to amplify all VMG amplicons (see Table S2 in the supplemental material). A gel image of the amplification products obtained with a mixture of gDNA from all 12 pathogens is shown in Fig. S1 in the supplemental material. Of the 720 probes displaying positive signals after hybridization of monoplex PCR amplicons, 673 (93.5%) yielded positive signals after hybridization of multiplex PCR amplicons (Table 1). For 28 of the 47 probe sets, all probes yielded positive signals. In general, the probes that displayed negative signals with the multiplex PCR amplicons also yielded low signals with the monoplex PCR amplicons (data not shown). For the ctxB gene of V. cholerae, only 2 of 20 probes yielded positive hybridization signals after hybridization of amplicons generated by multiplex PCR. This was attributed to poor amplification in the multiplex PCR rather than to low probe hybridization efficiency. With monoplex PCR, an amplification product of the expected length was observed, verifying that the designed primers successfully amplified the ctxB gene of V. cholerae ATCC 39315. In addition, all probes targeting the ctxB gene yielded positive signals after hybridization of the ctxB gene amplicon generated by monoplex PCR. The 27 targeted probes (3.9%, excluding probes targeting the ctxB gene) displaying negative signals were distributed among multiple VMG amplicons, and hence redesign or further optimization of the multiplex PCR assays was expected to be ineffective. In addition, the low levels of the signals of these probes with the monoplex PCR amplicons suggested that their sensitivity may be limited; therefore, these probes were masked during further analysis.
Performance of the VMG biochip with spiked water samples.
The performance of the VMG biochip probe sets was further tested with samples containing pathogen gDNA spiked into gDNA extracted from three different water samples: tap water, tertiary effluent from a wastewater treatment plant, and river water. For pathogens spiked at a relative abundance level of 0.1%, the median PFs for 46 targeted VMG amplicons were 0.79, 0.87, and 0.75 for tap water, tertiary effluent, and river water, respectively (Fig. 4). The median PFs for the 67 other probe sets were 0.05 for tap water, 0.09 for tertiary effluent, and 0.17 for river water (Fig. 4). By applying a PF threshold of 0.5, the numbers of spiked VMG amplicons assigned as present were 39 for tap water (85%), 40 for tertiary effluent (87%), and 39 for river water (85%). For the nontargeted probe sets, the PF was always less than 0.5, and consequently none of the corresponding genes were identified as present in the water samples. Probe sets targeting S. aureus VMG amplicons indicated the presence of this pathogen in all three spiked water samples, although the VMG amplicons assigned as present were in disagreement among samples. The VMG amplicons of C. perfringens could not be detected in any of the spiked water samples.
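The presence/absence calls described above reduce to a threshold on the PF of each probe set. The sketch below uses hypothetical amplicon labels and PF values; treating a PF exactly equal to the threshold as present is also an assumption, since the text does not say whether the comparison is strict.

```python
PF_THRESHOLD = 0.5  # threshold applied in this study


def presence_calls(pf_by_amplicon, threshold=PF_THRESHOLD):
    """Map each amplicon to True (present) or False (absent) from its PF."""
    return {name: pf >= threshold for name, pf in pf_by_amplicon.items()}


def percent_detected(n_present, n_total):
    """Percentage of spiked amplicons assigned as present, rounded."""
    return round(100.0 * n_present / n_total)


# Hypothetical PF values, for illustration only
calls = presence_calls({"amplicon_A": 0.87, "amplicon_B": 0.31, "amplicon_C": 0.05})
print(calls)

# Counts reported in the text: 39 or 40 of 46 spiked amplicons
print(percent_detected(39, 46), percent_detected(40, 46))
```

The two reported counts correspond to the 85% and 87% detection rates given in the text.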
FIG. 4. Performance of the VMG biochip with gDNA extracted from various water samples. DNA (10 pg) of each pathogen was spiked into 10 ng of background gDNA, yielding a relative abundance for each pathogen of 0.1%. For spiked samples, 673 probes (46 VMG amplicons) were analyzed. For unspiked samples, 2,034 probes (67 VMG amplicons) were analyzed. The box plots indicate the distribution of the PF among the VMG amplicons. The dashed line at a PF of 0.5 indicates the selected threshold for presence/absence calls. The boundaries of the boxes indicate the 25th and the 75th percentiles, and the whiskers indicate the 10th and the 90th percentiles. The median is given as a solid line, the mean is shown as a dashed line, and outlying data points are shown as open symbols.
The PF threshold applied in this study was less stringent than the cutoff values (at least 0.8) used by previous researchers (13, 66). Both target gene sequence diversity and abundance affect the selection of an optimal threshold for the PF. Uncharacterized sequence diversity within the target gene may lead to reduced PFs due to the increased potential for mismatching probe-target duplexes. Low target gene abundance may also yield decreased PFs due to variability in probe sensitivity. In both cases, a lower threshold PF needs to be used to reduce false-negative calls. Although the concept of redundant probe sets for enhanced reliability in presence/absence calls has been exploited widely (1, 12, 31, 32, 37, 38, 54, 61, 66), the effect of target gene abundance on PFs is less well documented. As demonstrated in this study and by Wilson et al. (66), lower PFs are expected with decreasing target gene amounts, and the extent of the decrease may vary among probe sets.
The use of mismatch probes is a common strategy to identify cross-hybridization signals on microarrays and to enhance the reliability of presence/absence calls (65, 66). In this study, excellent specificity was observed without inclusion of mismatch probes, even for environmental samples. A combination of various factors, including the use of redundant probe sets (with an average of 15 probes per VMG amplicon after experimental screening), the use of short probes with increased discriminatory power compared to long probes, reduction of sample complexity by target gene enrichment using multiplex PCR, and use of the melting curve approach, contributed to the elimination of false-positive calls.
Probe selectivity as a function of ΔG°duplex.
For selective detection, probes should yield high hybridization signals with intended target sequences and low signals with nontarget sequences. However, due to the biochemical nature of the probe-target hybridization process, these two criteria cannot be optimized independently. This translates into a trade-off between probe specificity and sensitivity that must be considered during probe selection. Based on our analysis shown in Fig. 3, ΔG°duplex was selected as the probe design parameter quantifying the trade-off between probe specificity and sensitivity. Probe sensitivity was derived from the hybridization patterns of the targeted probes, while probe specificity was quantified based on the hybridization patterns of the nontargeted probes. The hybridization results with the unspiked tap water sample were omitted from this analysis due to the minimal number of nontargeted probes with positive signals (Fig. 4). The hybridization results for all three water samples spiked with pathogen gDNA at a relative abundance of 0.1% were included in the analysis.
Both targeted and nontargeted probes were sorted according to their ΔG°duplex values and binned, and the percentage of positive probes was computed for each bin. After the percentage of positive probes was plotted as a function of ΔG°duplex, two distinct regions were apparent (Fig. 5b), and this effect was independent of the window size used for probe binning (see Fig. S4 in the supplemental material). In region 1 (ΔG°duplex more negative than –19.3 kcal/mol), 93.8% ± 2.8% of the targeted probes displayed positive signals, independent of the ΔG°duplex. The percentage of nontargeted probes with positive signals was influenced more by ΔG°duplex and increased from 14.7% for probes with a ΔG°duplex of –19.9 kcal/mol to 42.7% for probes with a ΔG°duplex of –23.2 kcal/mol (average increase, 6.8% per ΔG°duplex). In region 2 (ΔG°duplex less negative than –19.3 kcal/mol), a rapid decrease in the percentage of positive targeted probes was observed, and only 31.3% of the probes yielded positive signals for probes with a ΔG°duplex of –15.6 kcal/mol (average decrease, –16.1% per ΔG°duplex). The percentage of nontargeted probes yielding a positive signal was significantly lower in this region (average, 9.2% ± 3.9%).
FIG. 5. Derivation of optimal probe design criteria. (a) Selectivity, expressed as the difference in the percentage of positive probes for targeted and nontargeted probes, as a function of ΔG°duplex. (b) Percentage of positive probes as a function of ΔG°duplex for targeted and nontargeted probes. Each symbol indicates the percentage of positive probes for bins of 40 targeted and 80 nontargeted probes, and the error bars indicate the standard deviations for samples and replicates (n = 6 for targeted probes and n = 4 for nontargeted probes). The G+C content was derived from the linear relationship between ΔG°duplex and G+C content.
ΔG°duplex, was estimated based on the linear regression lines in Fig. 5b. The highest selectivity (80%) was observed for probes with a ΔG°duplex of –19.3 kcal/mol and a G+C content of 47.2% (Fig. 5a). Thus, probes with a ΔG°duplex of –19.3 kcal/mol provide the best trade-off between sensitivity and specificity. Probes with a ΔG°duplex deviating from this optimum displayed lower selectivity. The selectivity was more than 70% for probes with a ΔG°duplex between –18.6 and –21.1 kcal/mol, which corresponds to a G+C content between 42.3 and 56.1% (Fig. 5a). Interestingly, the decrease in selectivity in region 2 was approximately twofold greater than that in region 1 (–6.6% per ΔG°duplex for region 1 and –14.0% per ΔG°duplex for region 2). This difference was attributed to the larger decrease in the percentage of positive targeted probes in region 2 than increase in the percentage of positive nontargeted probes in region 1 (Fig. 5b). It should be noted that for DNA samples containing more complex nontargeted sequences, the increase in the percentage of positive nontargeted probes (i.e., cross-hybridization) for increasingly negative ΔG°duplex values may be more pronounced, while for more abundant target sequences the decrease in the percentage of positive targeted probes for decreasingly negative ΔG°duplex values may be suppressed.
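Selectivity, as defined in the caption of Fig. 5, is the percentage of positive targeted probes minus the percentage of positive nontargeted probes at a given ΔG°duplex. The sketch below applies that definition to three illustrative points assembled from values quoted in the text. Pairing the region 1 average for targeted probes (93.8%) and the region 2 average for nontargeted probes (9.2%) with single ΔG°duplex values is an approximation made here, not the authors' regression-based estimate.

```python
def selectivity(targeted_pct, nontargeted_pct):
    """Difference in percent positive between targeted and nontargeted probes."""
    return targeted_pct - nontargeted_pct


def most_selective(bins):
    """Return the (delta_g, targeted_pct, nontargeted_pct) bin with top selectivity."""
    return max(bins, key=lambda b: selectivity(b[1], b[2]))


# (delta G in kcal/mol, % positive targeted, % positive nontargeted)
# Assembled from values quoted in the text; an approximation for illustration.
bins = [
    (-23.2, 93.8, 42.7),  # strong binders: sensitive but cross-hybridizing
    (-19.9, 93.8, 14.7),  # near the reported optimum
    (-15.6, 31.3, 9.2),   # weak binders: specific but insensitive
]

best = most_selective(bins)
print(best[0], round(selectivity(best[1], best[2]), 1))
```

Of these three points, the one nearest the reported optimum of –19.3 kcal/mol has the highest selectivity (about 79%), while the two extremes lose selectivity for opposite reasons.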
A number of studies have demonstrated the effectiveness of ΔG°duplex as a theoretical probe selection parameter for both long and short probes and RNA or DNA targets (22, 30, 38, 39, 42, 49, 50, 57, 58). In contrast, Pozhitkov et al. (46) recently suggested that all theoretical (thermodynamics-based) screening of oligonucleotide probes should be omitted due to poor correlations between ΔG°duplex (and ΔG° for intra- and intermolecular self-structures) and experimentally observed signal strengths for rRNA targets. Similarly, only a weak linear correlation was observed in this study between probe signal intensity and ΔG°duplex (see Fig. S2 in the supplemental material). This indicates that hybridization signals cannot be accurately predicted with current thermodynamics models derived from hybridizations in solution. However, thermodynamic indices may still be indicative of probe behavior by reflecting the probability that a probe will yield a signal intensity higher than a given threshold. Based on the effect of ΔG°duplex on probe selectivity (Fig. 5a), an optimal ΔG°duplex of –19.3 kcal/mol is suggested for 18-mer oligonucleotide probes under the described hybridization conditions. These probes should yield the highest confidence in presence/absence calls for a given number of probes per target gene. For probes with more or less negative ΔG°duplex values, and thus lower selectivity, more probes need to be designed for a given target to achieve high confidence in presence/absence calls. In general, probes with a more negative ΔG°duplex displayed better sensitivity but poorer specificity. Finally, it is interesting that the effect of ΔG°duplex on the trade-off between probe sensitivity and specificity is analogous to the effect of probe length. In general, long probes (50- to 70-mers) are more sensitive than short probes (15- to 30-mers) but display lower specificity (28, 48).
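As an illustration of how the suggested ΔG°duplex range could be applied during probe selection, the following Python sketch keeps candidate probes whose predicted ΔG°duplex falls within the window in which selectivity exceeded 70% (–21.1 to –18.6 kcal/mol) and orders them by distance from the suggested optimum of –19.3 kcal/mol. The probe names and ΔG°duplex values are hypothetical, the calculation of ΔG°duplex itself is assumed to be done elsewhere, and ranking by absolute distance is a simplifying assumption, since selectivity declined at different rates on either side of the optimum.

```python
# Window and optimum taken from the text; values in kcal/mol.
DG_MIN = -21.1
DG_MAX = -18.6
DG_OPT = -19.3


def select_probes(candidates, dg_min=DG_MIN, dg_max=DG_MAX, dg_opt=DG_OPT):
    """Keep probes inside [dg_min, dg_max], closest to dg_opt first.

    candidates: list of (probe_name, predicted_dG) tuples.
    """
    in_window = [(name, dg) for name, dg in candidates if dg_min <= dg <= dg_max]
    return sorted(in_window, key=lambda probe: abs(probe[1] - dg_opt))


# Hypothetical candidates, invented for illustration only.
candidates = [
    ("probe_A", -17.2),  # too weak: outside the window
    ("probe_B", -19.5),
    ("probe_C", -20.8),
    ("probe_D", -23.0),  # too strong: outside the window
]

selected = select_probes(candidates)
print(selected)
```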
Cost and flexibility in microarray probe synthesis are often recognized as two of the major limiting factors in the use of microarray technology for diagnostic purposes (6, 21). In this study, a maskless light-directed in situ microarray synthesis technology developed at the University of Michigan was employed (19). This technology employs a digital micromirror device to generate preprogrammed light patterns on the chip surface, triggering deprotection of the 5'-hydroxyl group in conventional phosphoramidite monomers. Synthesis of oligonucleotides using this chemistry provides high fidelity and a high stepwise yield and also allows synthesis of probes that are up to 100 nucleotides long. In addition, synthesis of new chips is low cost, rapid, and flexible and involves simply uploading a list of probe sequences to the optical unit. Other advantages of in situ probe synthesis include high spot uniformity and probe molecule density. In addition, continuous recycling of the hybridization solution in the microfluidic chips used in this study provides increased signal uniformity within a spot and increased reproducibility of the hybridization signals. The higher cost per chip compared to conventional glass slides is the major limitation of this platform (21). The ability to rapidly synthesize biochips with updated and reiterated probe sets is considered critical for diagnostic purposes, because new gene sequence information is appearing almost daily and should be incorporated in the probe selection exercise and data analysis as soon as it becomes available.
Conclusions.
In conclusion, we developed and validated a coupled format for multiplex PCR and an in situ-synthesized DNA biochip for detection of 12 bacterial pathogens in various water samples. By using redundant probe sets targeting various VMGs and inferring presence/absence calls based on the PF, false-positive calls were eliminated. Pathogens could be detected at a relative abundance of 0.1 to 0.01%, depending on the pathogen. Analysis of the hybridization patterns also showed that probes with a ΔG°duplex of –19.3 kcal/mol provided the best trade-off between sensitivity and specificity. In future studies, the VMG biochip will be applied to additional environmental samples, and the presence/absence calls will be verified independently using real-time PCR.
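The PF-based presence/absence call can be sketched in a few lines of Python. The PF is computed as defined earlier, that is, the number of positive probes for a target divided by the total number of probes for that target. The function names and the threshold of 0.5 are assumptions made only for illustration; they are not the threshold or the software used in this study.

```python
def positive_fraction(probe_calls):
    """Return the fraction of probes called positive for one target.

    probe_calls: non-empty list of booleans, one per probe in the redundant set.
    """
    if not probe_calls:
        raise ValueError("at least one probe call is required")
    return sum(1 for call in probe_calls if call) / len(probe_calls)


def is_present(probe_calls, pf_threshold=0.5):
    """Call the target present when the PF reaches pf_threshold.

    pf_threshold is a hypothetical value chosen for this example.
    """
    return positive_fraction(probe_calls) >= pf_threshold


# Hypothetical probe-level calls, invented for illustration only.
mostly_positive = [True, True, False, True]   # PF = 0.75
single_positive = [True, False, False, False]  # PF = 0.25

print(is_present(mostly_positive), is_present(single_positive))
```

Because the call depends on the fraction of positive probes rather than on any single probe, one cross-hybridizing probe in the second example does not produce a presence call, and one failed probe in the first example does not prevent it.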
Published ahead of print on 1 February 2008.
S.M.M. and D.M.T. contributed equally to this study.
Supplemental material for this article may be found at http://aem.asm.org/.
Copyright © 2009 by the American Society for Microbiology.