Applied and Environmental Microbiology, May 2003, p. 2848-2856, Vol. 69, No. 5
0099-2240/03/$08.00+0 DOI: 10.1128/AEM.69.5.2848-2856.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Said El Fantroussi,1 Hauke Smidt,1 James C. Smoot,1 Erik H. Tribou,1 John J. Kelly,2 Peter A. Noble,1 and David A. Stahl1*
Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington 98195,1 Department of Biology, Loyola University Chicago, Chicago, Illinois 606262
Received 26 August 2002/ Accepted 2 January 2003
In general, oligonucleotide DNA microarrays containing 15- to 25-mer oligonucleotide probes provide greater discrimination than microarrays composed of larger PCR-amplified DNA fragments. However, a central challenge to the application of DNA microarrays in environmental microbiology is achieving the specificity needed to resolve complex microbial populations, including discriminating between target and nontarget populations that differ by a single nucleotide (10). This level of specificity is needed to resolve variants of highly conserved genes (e.g., those encoding the rRNAs) and to distinguish between closely related target and nontarget microorganisms.
In conventional hybridization assays, single-base-pair discrimination is achieved by adjusting the hybridization conditions (e.g., temperature, ionic strength, or formamide concentration) or washing conditions (dissociation) of the probe-target duplex (31). In DNA microarray assays, however, this approach is difficult to use since one set of hybridization and wash conditions does not provide optimal target discrimination for all probes on the microarray. We therefore have developed an alternative approach that uses differences in thermal dissociation rates of probe-target duplexes to resolve matched and mismatched probe-target duplexes (13, 25).
The oligonucleotide DNA microarray used in this study is a variant of the more conventional format (15, 22, 29). Rather than being directly attached to glass, the probes are immobilized in three-dimensional polyacrylamide gel pads affixed to the glass (2, 6, 9, 10, 12, 13, 18, 24-26, 30). The gel pads provide a format suitable for the determination of equilibrium and nonequilibrium dissociation kinetics (i.e., melting profiles) of a large number of probe-target duplexes and for determining the dissociation temperature (Td), the temperature at which 50% of the duplexes remain during a specified wash period (13, 25). In this study, we used nonequilibrium dissociation kinetics to derive the optimum washing temperature for each probe, providing for maximum discrimination between target RNA or target DNA and all possible single-nucleotide-mismatch variants.
Oligonucleotide probe design and synthesis.
A 19-mer oligonucleotide probe (S-G-Staphy-0747-a-A-19) targeting Staphylococcus 16S rRNA was designed as described previously (25). An 18-mer oligonucleotide probe (S-*-Nsom-0653-a-A-18) targeting halotolerant and obligate halophilic Nitrosomonas (27) was used for the Nitrosomonas target. These probes were complemented by a set of probes having all possible single-mismatch variants at each position (Table 1). Probes having two to five mismatches were also incorporated on the microarray. All probes were synthesized with an amino linker at the 3' terminus as described previously (26).
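As an illustration of what "all possible single-mismatch variants at each position" implies for probe-set size, the following minimal sketch enumerates every single-base substitution of a probe. The function name and the example sequence are hypothetical; the published probe sequences are those listed in Table 1 and are not reproduced here.

```python
from typing import List, Tuple

BASES = "ACGT"


def single_mismatch_variants(probe: str) -> List[Tuple[int, str, str]]:
    """Return (position, substituted_base, variant) for every single-base substitution.

    Positions are 1-based and counted from the 5' terminus of the probe.
    """
    probe = probe.upper()
    variants = []
    for i, original in enumerate(probe):
        for base in BASES:
            if base != original:
                variants.append((i + 1, base, probe[:i] + base + probe[i + 1:]))
    return variants


# Hypothetical 19-mer used only for illustration (not a published probe sequence).
example_probe = "ACGTACGTACGTACGTACG"
variants = single_mismatch_variants(example_probe)
print(len(variants))  # 3 substitutions x 19 positions = 57
```

Under this enumeration, a 19-mer yields 57 single-mismatch variants and an 18-mer yields 54.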
TABLE 1. Oligonucleotide probes used in this study and their corresponding Tds
Microarray hybridization.
Hybridizations were conducted at room temperature (20°C) for 12 h in a hybridization chamber affixed to the surface of the glass slide (Grace BioLabs, Bend, Oreg.) containing 40 µl of hybridization buffer (0.9 M NaCl, 20 mM Tris-HCl [pH 8.0], 40% formamide) and 1 µl of Cy3-labeled target nucleic acids (each at 25 ng/µl). Following hybridization, the microarray was briefly washed three times at room temperature with 100 µl of wash buffer (20 mM Tris-HCl [pH 8.0], 5 mM EDTA, 4 mM NaCl). After the final wash, 100 µl of wash buffer was added to the wash chamber (Grace BioLabs) for fluorescence monitoring. Image analysis was performed by using a custom-designed fluorescence microscope (State Optical Institute, St. Petersburg, Russia) equipped with a cooled charge-coupled device camera (Princeton Instruments, Trenton, N.J.). Preliminary experiments revealed that there was no cross-hybridization of any probe-target duplexes when both target sequences were used (data not shown). Four microarray slides were used repeatedly in this study.
Generation of melting profiles.
To generate melting profiles, the microarray was fixed to a thermal table mounted on the stage of the microscope. The thermal table was connected to a thermoelectric temperature controller (LFI-3751; Wavelength Electronics, Inc., Bozeman, Mont.) and a water bath (Cole Parmer Instruments Co., Chicago, Ill.). Melting profiles for all gel pads were generated by gradually increasing the temperature (1°C/min) of the thermal table from 20 to 70°C and recording the fluorescence signal intensity of the gel pads at 2°C intervals. Temperature, data acquisition, image processing, and analysis were controlled with custom software written in LabVIEW (version 5.1; National Instruments Co., Austin, Tex.). The signal intensity of each melting profile was normalized, and the Td was calculated by using Td-calculator (http://stahl.ce.washington.edu) as described previously (25). Obtained Tds are listed in Table 1. Hybridization and melting profile analyses were repeated five times for both DNA and RNA targets.
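For readers who wish to reproduce the idea of a Td estimate from a recorded melting profile, a minimal sketch follows. It is not the algorithm of the Td-calculator cited above; it assumes that each profile is normalized to its intensity at the first (lowest) temperature and that the 50% point is located by linear interpolation between the two bracketing readings. The function name `estimate_td` is hypothetical.

```python
from typing import Optional, Sequence


def estimate_td(temps: Sequence[float], intensities: Sequence[float],
                fraction: float = 0.5) -> Optional[float]:
    """Temperature at which the signal first falls to `fraction` of its starting value."""
    if len(temps) != len(intensities) or len(temps) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    start = intensities[0]
    if start <= 0:
        raise ValueError("starting intensity must be positive")
    # Assumption: normalize each profile to its first reading.
    norm = [v / start for v in intensities]
    for i in range(1, len(norm)):
        hi, lo = norm[i - 1], norm[i]
        if hi >= fraction > lo:
            # Linear interpolation between the two bracketing readings.
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (hi - fraction) * (t1 - t0) / (hi - lo)
    return None  # signal never dropped below the threshold in the measured range


# Hypothetical readings at 2 degree C intervals (illustrative values only).
print(estimate_td([20, 22, 24, 26], [2.0, 1.6, 0.8, 0.2]))
```

A return value of `None` indicates that the duplex retained more than half of its initial signal over the whole measured range.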
DI.
The optimum wash temperature, defined as that providing maximum discrimination between perfect-match duplexes and those containing mismatches, is generally determined empirically. To refine and systematize the determination of an optimum wash temperature, we introduced a discrimination index (DI). The DI for a specific wash temperature was determined by the following equation: $\mathrm{DI}_{\text{temperature}} = (\mathrm{pm}_{\text{temperature}}/\mathrm{mm}_{\text{temperature}}) \times (\mathrm{pm}_{\text{temperature}} - \mathrm{mm}_{\text{temperature}})$, where $\mathrm{pm}_{\text{temperature}}$ is the average signal intensity of perfect-match duplexes at a specific wash temperature and $\mathrm{mm}_{\text{temperature}}$ is the average signal intensity of mismatched duplexes, excluding those duplexes which have terminal and next-to-terminal mismatches.
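The DI calculation can be sketched in a few lines. The function names, the dictionary layout (wash temperature mapped to a list of intensities), and the numerical values below are assumptions made for illustration; only the formula itself is taken from the text. Because the ratio term divides by the mean mismatch intensity, the sketch rejects temperatures at which that mean is zero or negative (e.g., after background subtraction).

```python
from statistics import mean
from typing import Dict, Mapping, Sequence


def discrimination_index(pm: float, mm: float) -> float:
    """DI = (pm / mm) * (pm - mm) for mean intensities at one wash temperature."""
    if mm <= 0:
        raise ValueError("mean mismatch intensity must be positive")
    return (pm / mm) * (pm - mm)


def optimum_wash_temperature(pm_by_temp: Mapping[float, Sequence[float]],
                             mm_by_temp: Mapping[float, Sequence[float]]) -> float:
    """Return the wash temperature with the largest DI.

    `mm_by_temp` is assumed to hold only internal mismatches; terminal and
    next-to-terminal mismatches should be excluded by the caller.
    """
    di: Dict[float, float] = {
        t: discrimination_index(mean(pm_by_temp[t]), mean(mm_by_temp[t]))
        for t in pm_by_temp
    }
    return max(di, key=lambda t: di[t])


# Hypothetical mean intensities (arbitrary units), for illustration only.
pm = {48: [0.30], 50: [0.22], 52: [0.16], 54: [0.08]}
mm = {48: [0.20], 50: [0.10], 52: [0.04], 54: [0.03]}
print(optimum_wash_temperature(pm, mm))
```

Multiplying the ratio by the difference penalizes temperatures at which the ratio is high only because both signals are close to background.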
Data for the NN.
The input data set consisted of signal intensity (melting) profiles, with each input record consisting of a single profile of either a perfect-match duplex, a duplex with a mismatch in the ultimate or penultimate position, or a duplex with an internal mismatch. The output data set consisted of one categorical variable that was coded 0 if the corresponding record was a perfect-match duplex, 1 if the duplex had a mismatch in the ultimate or penultimate position, or 2 if the duplex had an internal mismatch. Prior to neural network (NN) analyses, the data were all normalized to have a mean of 0 and a standard deviation (SD) of 1.
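The coding of the output variable and the normalization step can be sketched as follows. Two points are assumptions rather than statements from the text: that "ultimate or penultimate position" covers both ends of the probe, and that normalization is applied separately to each input (i.e., to the intensities recorded at one temperature) using the population SD. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence

PERFECT_MATCH, TERMINAL_MISMATCH, INTERNAL_MISMATCH = 0, 1, 2


def code_duplex(mismatch_position: Optional[int], probe_length: int) -> int:
    """Code a duplex as 0, 1, or 2 from the 1-based position of its single mismatch."""
    if mismatch_position is None:
        return PERFECT_MATCH
    # Assumption: ultimate and penultimate positions are counted from both termini.
    if mismatch_position in (1, 2, probe_length - 1, probe_length):
        return TERMINAL_MISMATCH
    return INTERNAL_MISMATCH


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale values to a mean of 0 and a (population) SD of 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(v - mu) / sd for v in values]


print(code_duplex(None, 19), code_duplex(18, 19), code_duplex(10, 19))  # 0 1 2
```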
NN software and analyses.
The NN software was custom designed by using Java software and was based on the "leave one input out" cross-validation model (3). Rather than leave one input out, we modified the model to use one input (e.g., single intensity values at a specific temperature) to predict a categorical output (e.g., a perfect-match duplex, a duplex with a mismatch in the ultimate or penultimate position, or a duplex with an internal mismatch). We chose this approach because it was difficult to measure the importance of inputs that are statistically dependent (i.e., signal intensities within the same melting profile are highly correlated to one another). The software is available at a World Wide Web-based interface at http://stahl.ce.washington.edu under the heading "Tools for data analyses."
The network architecture consisted of one input layer, one hidden layer, and one output layer. Neurons in the hidden layer used a hyperbolic tangent activation function, while the neuron in the output layer used a standard purely linear activation function (11). All neurons included a bias term. The Levenberg-Marquardt algorithm was used for training the NN rather than standard back-propagation and conjugate gradient methods because preliminary results showed that the Levenberg-Marquardt algorithm was superior in terms of the number of iterations needed to reach the error minima (11). Since preliminary analysis revealed that the minimum number of hidden neurons needed to produce the highest R2 results was two, only two hidden neurons were used for all NN analyses. A standard least-squares error function was used for training the NN since this function could be easily converted to R2 values.
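The architecture described above is small enough to write out directly. The sketch below shows only the forward pass for one input, two hyperbolic tangent hidden neurons, and one linear output neuron (seven parameters including biases), together with the usual conversion of squared error to R2 (R2 = 1 - SSE/SST). Training by the Levenberg-Marquardt algorithm is not shown, the function names are hypothetical, and the assumption that the authors used exactly this definition of R2 is ours.

```python
import math
from typing import Sequence


def forward(x: float, w_hidden: Sequence[float], b_hidden: Sequence[float],
            w_out: Sequence[float], b_out: float) -> float:
    """One input -> tanh hidden neurons (with biases) -> one linear output neuron."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def r_squared(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Coefficient of determination computed from the sum of squared errors."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    if ss_tot == 0:
        raise ValueError("targets have zero variance")
    return 1.0 - ss_res / ss_tot


# Arbitrary, untrained weights used only to show the call signature.
print(forward(0.0, [1.0, -1.0], [0.0, 0.0], [1.0, 1.0], 0.5))  # 0.5
```

A network that always predicts the mean of the targets scores an R2 of 0 under this definition, and a perfect fit scores 1.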
It should be noted that our method does not produce generalizable NNs since our specific objective was to identify with which inputs the NN learned best. Therefore, no data were used for testing or validation purposes. The NN was deemed to have reached minima (and consequently training was stopped) when the R2 did not increase by more than 0.001 U over a period of 10 s (i.e., approximately 200 megaflops).
For NN analyses, we generated an independent NN for each individual input. If one NN performed better with one input rather than another (i.e., it had a higher R2 value), the input having the better prediction was assumed to be more important. It is essential to recognize that this approach does not provide information on the optimal subsets of inputs but rather identifies which inputs are most important for predicting outputs when presented independently. Since some NNs do not train properly because they reach local minima of their error space, a median of 11 NN runs was conducted for each input. We chose the median rather than the mean since the median minimizes local-minimum effects.
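The ranking procedure described above (one independent NN per input, 11 runs each, median R2 as the score) can be sketched independently of the training code. In the sketch below, `train_once` is a hypothetical callable standing in for a single training run that returns an R2 value; it is not part of the published software, and the dictionary layout of the inputs is likewise an assumption.

```python
import random
from statistics import median
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Trainer = Callable[[Sequence[float], Sequence[float], random.Random], float]


def rank_inputs(inputs: Mapping[str, Sequence[float]], targets: Sequence[float],
                train_once: Trainer, runs: int = 11,
                seed: int = 0) -> List[Tuple[str, float]]:
    """Score each input by the median R2 of `runs` independent trainings.

    Returns (input_name, median_r2) pairs sorted from most to least important.
    """
    rng = random.Random(seed)  # shared source of random initial weights
    scores: Dict[str, float] = {}
    for name, values in inputs.items():
        r2_runs = [train_once(values, targets, rng) for _ in range(runs)]
        scores[name] = median(r2_runs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With an odd number of runs, the median is simply the middle R2 value, so a minority of runs trapped in poor local minima does not change the score, whereas it would pull down the mean.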
FIG. 1. Typical image of a DNA microarray after hybridization with DNA target sequences (A) and the locations of the oligonucleotide probes (B). Probe labels are as in Table 1. Hybridization and wash conditions are described in the text. Exposure time was 1.0 s. White boxes (A) indicate probes that did not yield detectable fluorescence signals after the wash at 20°C.
FIG. 2. Typical normalized melting profiles of DNA-DNA duplexes of Staphylococcus. S0, perfect-match duplex; S30, single-base-pair-mismatched duplex containing a tt mismatch (probe-target) at position 10 from the 5' terminus; S59, double-base-pair-mismatched duplex containing cc and gg mismatches at positions 3 and 4, respectively. Error bars, SDs of the data (S0, n = 10; S30, n = 5; S59, n = 4).
In this study we evaluated the inclusion of signal intensity data to optimize discrimination among perfect-match and mismatched probe-target duplexes. Considering intensity data alone, an optimum corresponds to hybridization and washing conditions at which the signal intensity of mismatches reaches (or approaches) background and the perfect-match duplex maintains a detectable signal. Often these conditions are determined empirically, as represented by Fig. 3. This figure shows the signal intensity for each probe duplex (color) measured at 2°C increments during the thermal dissociation. An empirical estimate of the optimum wash temperature for each probe-target duplex is shown (left section of each panel), and the corresponding intensity data are shown in the adjacent section. For perfect-match duplexes, signal intensities at each empirically defined optimum were approximately 20% of the initial signal intensities. For example, the signal intensity of the perfect-match DNA-DNA duplex of Staphylococcus was 1.11 U at 20°C, while the signal intensity was 0.16 U at the empirically determined optimal wash temperature (52°C) (Fig. 3A, right section). These intensity measurements corresponded to those achieved in a separate experiment in which the microarray was washed at the identified temperature optimum (Fig. 4). However, it was not possible to fully resolve perfect-match probe-target duplexes and those with mismatches at the ultimate or penultimate position. These results were in accordance with the conclusion derived from Td analysis reported previously (25).
FIG. 3. Signal intensity profile of probe-target duplexes with temperature gradient (color sections; A.U., arbitrary units of fluorescence intensities) and signal intensities at empirically determined optimum wash temperatures (bars). Red triangles, optimum wash temperatures. (A and B) Staphylococcus target DNA (A) and target RNA (B); (C and D) Nitrosomonas target DNA (C) and target RNA (D). Data represent the mean signal intensities of five melting profile analyses and SDs (error bars).
FIG. 4. Typical images of DNA microarrays washed at optimum temperatures. Melting profiling was terminated at the empirically determined optimum wash temperature. The optimum wash temperatures (shown in Fig. 3) were 52°C for the Staphylococcus DNA probe-target duplexes (A) and 50°C for the Nitrosomonas DNA probe-target duplexes (B). pm, perfect-match probes for Staphylococcus target (S0 in Fig. 1B) or the Nitrosomonas target (N0 in Fig. 1B); 1st, probes define ultimate position; 2nd, probes define penultimate position.
To refine and systematize the above-described empirical approach, we introduced a DI, which is calculated by the formula given in Materials and Methods. This index is defined as the product of the ratio and the difference of the signal intensities for perfect-match and mismatched duplexes at a given wash temperature. The temperature with the maximum DI is defined as the optimum wash temperature. As shown in Fig. 5, optimum wash temperatures were calculated by using the DI from the signal intensity profiles in the range of 20 to 64°C (Fig. 3). For hybridization with the Staphylococcus target, DI-based and empirically determined optimum wash temperatures for DNA-DNA and RNA-DNA hybridizations were identical (Fig. 3 and 5). For Nitrosomonas, DI-based and empirically determined optimum wash temperatures were within 2°C of each other (i.e., the triangle and the peak DI value occur at nearly the same temperature in Fig. 5), indicating reasonable agreement between the DI-based prediction and the empirical determination.
FIG. 5. Inferred optimum wash temperatures for the discrimination of perfect and mismatched duplexes. (A and B) Staphylococcus target DNA (sDNA; A) and target RNA (sRNA; B); (C and D) Nitrosomonas target DNA (nDNA; C) and target RNA (nRNA; D). DI was calculated by using the formula given in Materials and Methods. Triangles, temperatures empirically inferred from melting profiles. Light gray zones, temperature intervals allowing for mismatch discrimination as deduced from NN analysis using all data sets (R2 > 0.7); dark gray zones, temperature intervals deduced from NN analysis using data sets excluding data from ultimate and penultimate positions (R2 > 0.9).
The application of NNs to the analysis of complex data in microbiology is relatively new (1). NNs have been used to identify the restriction enzyme profiles for E. coli O157:H7 (4), the pyrolysis mass spectra for Mycobacterium tuberculosis complex species (7), bacterial species from randomly amplified polymorphic DNA patterns (14), fatty acid profiles of microbial communities (17), stable low-molecular-weight rRNA from gel electrophoresis patterns (16), and Td from microarray data (25). However, to our knowledge, no study has used the method outlined in this paper to determine the relative importance of inputs to outputs.
In conclusion, our studies have established an analytical approach to achieving optimum discrimination between target and nontarget duplex structures. Although this objective is important in any application of DNA microarrays to sequence analysis (e.g., identification of point mutations), we note that the application of microarrays to environmental systems must consider a larger and uncharacterized diversity of sequences. Since the character and position of nontarget mismatches in an environmental sample are not known in advance, it is essential that conditions for optimum discrimination be generally defined. Continuing studies are evaluating the number and composition of mismatch probes required to implement the proposed optimization approach in standard applications and possible deviations from model predictions using rRNA derived from natural samples. The melting profiles obtained for this subset of mismatch probes would be used to calculate the maximum DI for each probe. More generally, our results further support the utility of melting profiles for achieving optimum resolution of microarray hybridization data.
This work was supported by grants from the DARPA (DABT63-99-1-0009) and NASA (NAG9-1271) to D.A.S., from the NSF (DEB-0088879) and the EPA (R-82945801) to P.A.N., and from the DARPA to J.J.K.
Present address: National Institute for Environmental Studies, Tsukuba, Ibaraki 305-8606, Japan.
Present address: Unit of Bioengineering, University of Louvain, B-1348 Louvain-la-Neuve, Belgium.
Present address: Wageningen University, 6703 CT Wageningen, The Netherlands.