Applied and Environmental Microbiology, May 2008, p. 2957-2966, Vol. 74, No. 10
0099-2240/08/$08.00+0 doi:10.1128/AEM.02536-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Institute for Environmental Genomics, Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma 73019
Received 10 November 2007/ Accepted 6 March 2008
The signal-to-noise ratio (SNR) has been used to define a positive spot, and two general methods are currently used to calculate SNR values. One is to use the ratio of the differences between the signal mean and background noise divided by the background standard deviation (2). This calculation method has been commonly used in many signal-processing disciplines, such as radio, electronics, and imaging (2, 30), and the threshold is usually set to 3.0 (30). The other method is to use the ratio of the signal median divided by the background median with the threshold set to 1.50 (26), and it was modified to calculate the SNR for a probe with replicate spots and to set a threshold of 2.0 (18, 19). However, the determination of these thresholds is arbitrary and has not been experimentally validated. Although the background standard deviation of pixel intensities for each spot is included in the first calculation method, the signal standard deviation is not considered in either of the two SNR calculation methods. In addition, an SNR threshold may vary with different types of targets, target compositions, and hybridization conditions, and hence, it could be difficult to set a universal SNR threshold. Therefore, new SNR calculation methods that include both signal and background standard deviations and experimental evaluations of SNR thresholds are needed.
The objectives of this study were to (i) evaluate a new method for SNR calculation, (ii) determine appropriate SNR thresholds for differentiating signals from noise based on different SNR calculation methods, and (iii) examine the effects of target types, background DNA, and target compositions on the threshold determination. Our results demonstrated that our new calculation performed better than two other existing calculations and that SNR thresholds were affected by the hybridization stringency, types of target templates, background DNAs, and compositions of the target templates. The results provide general guidance for users to select appropriate SNR thresholds under different conditions.
Target template preparations.
Four 70-mer artificial targets (T1-SO1679, T2-SO1744, T3-SO2680, and T4-SO0848) that were complementary to the 70-mer PM probes were synthesized by the Molecular Structure Facility at Michigan State University (East Lansing, MI). The artificial oligonucleotide targets were labeled at the 5' ends with Cy5 (T1-SO1679, T2-SO1744, and T3-SO2680) or Cy3 (T4-SO0848) fluorescent dye during synthesis. The 70-mer oligonucleotide targets also contained the sequences of the 50-mer oligonucleotide targets.
Gene-specific primers were chosen for the four selected genes (see Table S1 in the supplemental material), with each PCR product about 500 bp, covering both 50-mer and 70-mer probe sequences. Each gene was amplified with S. oneidensis MR-1 genomic DNA (gDNA) as a template using the standard PCR amplification protocol. The amplified PCR products were purified using the QIAquick PCR purification kit (Qiagen Inc., California) according to the protocol of the manufacturer. The purified PCR fragments were visualized and their sizes were checked by agarose gel electrophoresis, and the fragments were then quantified using the PicoGreen dsDNA Assay Kit (Invitrogen, California).
Genomic DNAs from four bacteria were also used as target DNAs. S. oneidensis MR-1, Escherichia coli S17, and Pseudomonas sp. strain G179 were grown in LB medium to stationary phase, and Desulfovibrio vulgaris Hildenborough was grown in the standard lactate and sulfate (LS) medium (20a). The cells were collected by centrifugation at 4,000 x g at room temperature for 10 min. Their gDNAs were isolated and purified as described previously (34). Methanococcus maripaludis gDNA was provided by Sergey Stolyar at the University of Washington (Seattle). The yeast Saccharomyces cerevisiae was grown in yeast-peptone-dextrose medium to saturation, and its gDNA was extracted using the glass bead method as described by Hoffman and Winston (15).
To test how bacterial ratios affect the determination of SNRs, S. oneidensis MR-1 gDNA was mixed with four other bacterial gDNAs (D. vulgaris Hildenborough, E. coli S17, Pseudomonas sp. strain G179, and M. maripaludis) at three different ratios: A (10 [S. oneidensis MR-1]:1:1:1:1), B (1 [S. oneidensis MR-1]:1:1:1:1), and C (1 [S. oneidensis MR-1]:10:10:10:10). Each sample had the same amount of total gDNA (2.5 µg).
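As a quick arithmetic check on these mixtures, the share of S. oneidensis MR-1 gDNA follows directly from the stated ratios. The sketch below is illustrative only: the function name is hypothetical, and it assumes the ratios refer to DNA mass.

```python
def target_fraction(target_parts, other_parts, n_others=4):
    """Fraction of total gDNA contributed by the target organism.

    Assumes the mixing ratios are mass ratios (an assumption, not stated
    explicitly in the text) and that there are n_others nontarget genomes.
    """
    total = target_parts + other_parts * n_others
    return target_parts / total


# (target parts, parts of each of the four other genomes)
mixtures = {"A": (10, 1), "B": (1, 1), "C": (1, 10)}
fractions = {name: target_fraction(t, o) for name, (t, o) in mixtures.items()}

for name, frac in fractions.items():
    print(f"Mixture {name}: {frac:.1%} S. oneidensis MR-1 gDNA")
```

Under that assumption the target makes up roughly 71%, 20%, and 2.4% of mixtures A, B, and C, respectively.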
Probe labeling, microarray hybridization, and image quantification.
PCR amplicons, the purified gDNAs from pure cultures (500 ng), and mixed gDNAs (2.5 µg) were fluorescently labeled by random priming using the Klenow fragment of DNA polymerase (12). Mixture I (35 µl), containing certain amounts (as indicated for different experiments) of gDNA and 20 µl of random primers (Invitrogen, California), was heated at 98°C for 3 to 5 min, cooled on ice, and then centrifuged. Mixture II (15 µl), containing 1 µl of 5 mM dATP, dGTP, and dTTP and 2.5 mM dCTP, 2 µl (80 U) of Klenow (Invitrogen, CA), and 0.5 µl of Cy3 dye (Amersham BioSciences, United Kingdom), was added to mixture I. A total of 50 µl labeling-reaction solution was incubated for 3 h at 42°C. The labeling reaction was terminated by heating the solution at 98°C for 3 min. The tubes were removed and placed on ice. The labeled cDNA targets were purified immediately using a QIAquick PCR purification column and concentrated in a Savant Speedvac centrifuge (Savant Instruments Inc., Holbrook, NY).
The labeled PCR amplicons or gDNAs were resuspended in 25 µl of hybridization solution containing 50% formamide, 5x saline-sodium citrate (SSC) (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate), 0.1% sodium dodecyl sulfate (SDS), and 0.1 mg/ml of herring sperm DNA (Invitrogen, California). The hybridization solution was incubated at 95 to 98°C for 5 min, centrifuged to collect condensation, and kept at 50°C. The solution was immediately applied to the microarray slide, and hybridization was carried out in a waterproof Corning hybridization chamber (Corning Life Science, New York) submerged in a 45°C water bath in the dark for 16 h (12). Washing was performed immediately in the following steps: (i) in a solution containing 2x SSC and 0.1% SDS at 40°C for 5 min, repeated once; (ii) in a solution containing 0.1x SSC and 0.1% SDS at room temperature for 10 min, repeated once; and (iii) in 0.1x SSC at room temperature for 2 min, repeated once. The slides were dried with compressed air prior to being scanned. Slides from the same batch and the same settings were used for all experiments. The laser power was set to 95%, and photomultiplier tube efficiency was set to 70%. Five slides (with four replicated spots on each slide) were used for each condition, and hence, each spot had up to 20 data points. The hybridized microarray slides were scanned using a ScanArray Express microarray analysis system (Perkin Elmer, Massachusetts). The spot signals, spot quality, and background fluorescence intensities of scanned images were quantified with ImaGene version 6.0 (Biodiscovery Inc., Los Angeles, CA).
Data analysis.
Data analysis included four major steps.
(i) Defining positive and negative spot pools.
Microarray detection mainly depends on probe specificity and hybridization stringency (e.g., temperature), and two levels of stringency were used in this study. High-level stringency is expected to eliminate cross-hybridization for the probes with a higher probe-target similarity, a longer continuous stretch length, and a lower free energy. At both stringencies, positive and negative pools were defined (see Tables S2 and S3 in the supplemental material). At high stringency, a positive 50-mer probe had a sequence identity of >90%, a stretch length of >20, and free energy of <–35 kcal/mol with its nontargets, and a negative probe had a sequence identity of ≤90%, a stretch length of ≤20, and free energy of ≥–35 kcal/mol with its nontargets. Our previous experimental results showed that such high-stringency hybridization could be achieved at 50°C with 50% formamide (17). Similarly, a positive 70-mer probe had a sequence identity of >90%, a stretch length of >25, and free energy of <–50 kcal/mol with its nontargets, and a negative probe had a sequence identity of ≤90%, a stretch length of ≤25, and free energy of ≥–50 kcal/mol with its nontargets. At low stringency, a positive 50-mer probe had a sequence identity of >85%, a stretch length of >15, and free energy of <–30 kcal/mol with its nontargets, and a negative probe had a sequence identity of ≤85%, a stretch length of ≤15, and free energy of ≥–30 kcal/mol with its nontargets (12). The low stringency generally corresponded to hybridization at 42°C with 50% formamide. Similarly, a positive 70-mer probe had a sequence identity of >85%, a stretch length of >20, and free energy of <–40 kcal/mol with its nontargets, and a negative probe had a sequence identity of ≤85%, a stretch length of ≤20, and free energy of ≥–40 kcal/mol with its nontargets (12). In addition, the probes that did not qualify for either the positive pool or the negative pool were excluded from further analysis.
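The pool definitions above amount to a three-way classification per probe. The following is a minimal sketch of that logic, not the authors' implementation: the names `Criteria`, `CRITERIA`, and `classify_probe` are hypothetical, and it assumes the negative pool uses the complementary inequalities (at or below the identity and stretch cutoffs, at or above the free-energy cutoff).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    identity: float     # percent sequence identity cutoff
    stretch: int        # continuous stretch length cutoff
    free_energy: float  # free energy cutoff (kcal/mol)


# Cutoffs taken from the text, keyed by (stringency, probe length in nt).
CRITERIA = {
    ("high", 50): Criteria(90.0, 20, -35.0),
    ("high", 70): Criteria(90.0, 25, -50.0),
    ("low", 50): Criteria(85.0, 15, -30.0),
    ("low", 70): Criteria(85.0, 20, -40.0),
}


def classify_probe(identity, stretch, free_energy, stringency, probe_len):
    """Return 'positive', 'negative', or 'ignored' for one probe."""
    c = CRITERIA[(stringency, probe_len)]
    if identity > c.identity and stretch > c.stretch and free_energy < c.free_energy:
        return "positive"
    # Assumed complementary inequalities for the negative pool.
    if identity <= c.identity and stretch <= c.stretch and free_energy >= c.free_energy:
        return "negative"
    # Probes meeting only some criteria fall into neither pool.
    return "ignored"
```

A probe that passes the identity cutoff but fails the stretch or free-energy cutoff lands in neither pool, which is why some probes are dropped from the analysis.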
(ii) Microarray spot analysis.
Spot intensity data were extracted from ImaGene output files: gene ID, flag, signal mean ($\bar{S}$), background mean ($\bar{B}$), signal standard deviation ($\sigma_s$), and background standard deviation ($\sigma_b$). After the removal of bad spots, the remaining spots (including potential empty spots and good spots) were kept for further analysis. All processing was done in Microsoft Excel.
(iii) Calculation of SNR values.
For each spot, three methods were used to calculate SNR values:

$$\mathrm{SSR} = \frac{\bar{S} - \bar{B}}{\sigma_b} \qquad (1)$$

$$\mathrm{SBR} = \frac{\bar{S}}{\bar{B}} \qquad (2)$$

$$\mathrm{SSDR} = \frac{\bar{S} - \bar{B}}{\sigma_s + \sigma_b} \qquad (3)$$

where $\bar{S}$ and $\bar{B}$ are the signal mean and the background mean of pixel intensities, respectively, and $\sigma_s$ and $\sigma_b$ are the standard deviations of signal and background, respectively. Based on false-positive (FP) and false-negative (FN) spots at different values of the signal-to-standard-deviation ratio (SSR), the signal-to-background ratio (SBR), and the signal-to-both-standard-deviations ratio (SSDR) (in comparison with the defined positive and negative spot pools), their thresholds were determined by (i) minimizing FPs, (ii) minimizing FNs, and (iii) optimizing the overall percentage of FPs and FNs.
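The three ratios can be sketched in a few lines. This is a hedged illustration rather than the authors' spreadsheet: it assumes SSR is the signal-background difference over the background standard deviation, SBR is the ratio of signal mean to background mean, and SSDR is the signal-background difference over the sum of both standard deviations, as the surrounding text describes. The function names and the example spot values are hypothetical.

```python
def ssr(signal_mean, background_mean, background_sd):
    """Signal-to-standard-deviation ratio: (S - B) / sd_b."""
    return (signal_mean - background_mean) / background_sd


def sbr(signal_mean, background_mean):
    """Signal-to-background ratio: S / B."""
    return signal_mean / background_mean


def ssdr(signal_mean, background_mean, signal_sd, background_sd):
    """Signal-to-both-standard-deviations ratio: (S - B) / (sd_s + sd_b)."""
    return (signal_mean - background_mean) / (signal_sd + background_sd)


# Hypothetical pixel statistics for a single spot (not data from the study).
spot = {"signal_mean": 1200.0, "background_mean": 400.0,
        "signal_sd": 300.0, "background_sd": 100.0}

spot_ssr = ssr(spot["signal_mean"], spot["background_mean"], spot["background_sd"])
spot_sbr = sbr(spot["signal_mean"], spot["background_mean"])
spot_ssdr = ssdr(spot["signal_mean"], spot["background_mean"],
                 spot["signal_sd"], spot["background_sd"])
print(spot_ssr, spot_sbr, spot_ssdr)
```

Only SSDR responds to a change in the signal standard deviation; SSR and SBR return the same value for a tight spot and a highly uneven one with the same mean.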
(iv) Student t test analysis of threshold-identified positive spots.
The values of signal (S) and background (B) for a probe with replicate spots were extracted from ImaGene output files, and their means ($\bar{S}_m$ and $\bar{B}_m$, respectively) and standard deviations ($\sigma_{s,m}$ and $\sigma_{b,m}$, respectively) were calculated. Outliers were removed if $S - \bar{S}_m$ was greater than or equal to $2.0 \times \sigma_{s,m}$ or $B - \bar{B}_m$ was greater than or equal to $2.0 \times \sigma_{b,m}$, and this process was repeated until no outliers remained. The final $\bar{S}_m$, $\bar{B}_m$, $\sigma_{s,m}$, and $\sigma_{b,m}$ were used for the Student t test, and the significance of the difference between $\bar{S}_m$ and $\bar{B}_m$ was evaluated for each probe at a given P value.
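The replicate filtering and test can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it treats deviations in either direction as outliers (the text states the rule without an absolute value), it uses an unpaired two-sample t statistic with pooled variance (the text does not say which form of the Student t test was used), and the function names and replicate values are hypothetical. Converting the statistic to a P value is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def remove_outliers(values, k=2.0):
    """Repeatedly drop values lying k or more sample SDs from the mean."""
    kept = list(values)
    while len(kept) > 2:
        m, sd = mean(kept), stdev(kept)
        filtered = [v for v in kept if abs(v - m) < k * sd]
        # Stop when nothing was removed or too few values would remain.
        if len(filtered) == len(kept) or len(filtered) < 2:
            break
        kept = filtered
    return kept


def t_statistic(signal, background):
    """Unpaired two-sample t statistic with pooled variance (assumed form)."""
    n1, n2 = len(signal), len(background)
    s1, s2 = stdev(signal), stdev(background)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(signal) - mean(background)) / (pooled * sqrt(1 / n1 + 1 / n2))


# Hypothetical replicate intensities for one probe; 5000 is an obvious outlier.
signal = [1000, 1010, 990, 1005, 995, 1000, 1010, 990, 1000, 5000]
background = [400, 410, 390, 405, 395, 400, 410, 390, 400]

clean_signal = remove_outliers(signal)
clean_background = remove_outliers(background)
t_value = t_statistic(clean_signal, clean_background)
```

Note that with very few replicates a single extreme value cannot reach two sample standard deviations from the mean, so a filter of this kind only becomes effective with replicate counts like those used in the study (up to 20 per probe).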
Data analysis for D. vulgaris Hildenborough microarrays.
Both wild-type and fur mutant D. vulgaris cells were grown in LS4D medium with 60 µM of iron, and microarray data were obtained as previously described (3). The SSDR method was used to detect positive spots with a threshold of 0.80, and details of data analysis were conducted as previously described (3).
Data analysis for GeoChip with a soil sample.
A soil sample was taken from a plot at BioCON (23), and 5 g of soil was used to extract DNA. GeoChip (13) was used to detect functional genes in such a microbial soil community. SSR, SBR, and SSDR were used to detect positive spots with thresholds of 2.0, 1.6, and 0.8, respectively, and details of labeling, hybridization, and scanning were performed as described previously (13).
When the SSDR is ≥1.0, the difference between the signal intensity and the background noise is equal to or larger than the sum of the signal and background standard deviations. In this case, the pixel values of signal intensity are completely separated from those of background noise (Fig. 1). Intuitively, such a spot should represent a positive signal. When the SSDR is <1.0, the pixel values of signal and background noise overlap (Fig. 1). In this case, some spots could be positive and others not, and the key question is what minimum SNR (e.g., SSDR) threshold distinguishes a signal from its background noise. Thus, in this study, we experimentally determined the threshold of SSDR for differentiating signals from noise.
FIG. 1. Schematic presentation of the SSDR calculation method. A, B, and C represent SSDRs of <1.0, 1.0, and >1.0, respectively. All four parameters used in the calculation were extracted from the ImaGene output files (ImaGene manual). The error bars represent standard deviations.
TABLE 1. Thresholds of SSR, SBR, and SSDR determined by minimizing the percentage of FP or FN spots on the array using synthesized oligonucleotide targets under low and high stringencies
FIG. 2. Determination of thresholds of the SSR (A), SBR (B), and SSDR (C) at low stringency by minimizing the percentages of FP and FN spots. Ten picograms of each synthesized oligonucleotide was used to hybridize with the array, and five replicate slides were used. The SSR, SBR, and SSDR thresholds were determined to be 2.5, 1.6, and 0.80, respectively. The error bars represent standard deviations.
FIG. 3. Determination of thresholds of the SSR (A), SBR (B), and SSDR (C) at high stringency by minimizing the percentages of FP and FN spots. Ten picograms of each synthesized oligonucleotide was used to hybridize with the array, and five replicate slides were used. The SSR, SBR, and SSDR thresholds were determined to be 3.0, 2.0, and 0.90, respectively. The error bars represent standard deviations.
FIG. 4. Effects of target types on the thresholds and the percentages of FPs, FNs, and both (FP+FN) for the SSR (A), SBR (B), and SSDR (C). The left y axes present the optimal thresholds, and the right y axes present the percentages of FP, FN, or FP plus FN under the optimal threshold. The targets used were synthesized oligonucleotides (10 pg each), PCR amplicons (100 pg each), and S. oneidensis MR1 gDNA (500 ng). The more significant P value is shown on the top of each column, with the following notations: nd, no difference; one asterisk, P < 0.10; two asterisks, P < 0.05; and three asterisks, P < 0.01 (the Student t test) when one type of target was compared with two others.
FIG. 5. Effects of background DNA on the determination of SSR, SBR, and SSDR thresholds. Five hundred nanograms of S. oneidensis MR-1 gDNA (A) and 10 pg for each synthesized oligonucleotide (oligo) (B) were spiked into 1.0 µg of yeast gDNA. For synthesized oligonucleotide targets, the yeast gDNA was first labeled and then mixed with the spiked oligonucleotides. S. oneidensis MR-1 gDNA was first mixed with the yeast gDNA and then labeled together. The significance is shown on the top of each column, with the following notations: nd, no difference; one asterisk, P < 0.10; two asterisks, P < 0.05; and three asterisks, P < 0.01 (the Student t test) when thresholds with background DNA were compared to those without background DNA.
FIG. 6. Comparison of changes in signal mean, background (Bkgrd.) mean, signal standard deviation (std. dev.), and background standard deviation for each spot on the array when the yeast gDNA was added to the S. oneidensis gDNA (A) or the synthesized oligonucleotide (Oligo) targets (B). The error bars represent standard deviations.
TABLE 2. Thresholds of SSR, SBR, and SSDR and the percentages of FNs, FPs, or both for artificial bacterial mixtures
TABLE 3. Comparison of positive probes identified by probe design criteria, by the Student t test, and by SNR thresholds
fur mutant (JW707) D. vulgaris Hildenborough with the D. vulgaris Hildenborough oligonucleotide microarray (3), and the other was a BioCON soil sample with GeoChip (13). For the first data set, an SSDR threshold of 0.80 was used. The average SSDR for the fur probe was 0.25 for the fur mutant and 2.16 for the wild type, confirming the absence of the gene in the mutant (Table 4). Fur is a transcriptional regulator, and it negatively regulates several genes in the fur regulon when it binds to a promoter. The microarray data did show that genes such as feoA, feoB, fld, and gdp, predicted in the fur regulon (25), were up-regulated in the mutant JW707 (Table 4). The Fur regulator has been shown to be involved in oxidative-stress responses, which are mainly controlled by the PerR regulator (25). Indeed, our results also showed that ahpC, rbr, and perR were overexpressed in the JW707 mutant (Table 4). In addition, it was observed that the expression of genes (cobI, cluster of orthologous groups [COG] fepB, fepC, and COG fepD) involved in iron uptake was repressed and that the expression of genes (bfr and ftn) involved in iron storage was induced (Table 4). This is consistent with the fact that more iron may accumulate in the mutant due to the absence of the Fur protein. It should be noted that different cutoffs for up-regulation and down-regulation were used in this study (twofold) and the previous study (3).
TABLE 4. Examples of transcriptional changes of genes of known function in fur mutant (JW707) and wild-type D. vulgaris Hildenborough
TABLE 5. Numbers of detected, unique, and overlap spots among replicates A, B, and C
Considering the standard deviations of pixel intensities of both signal and background, a new calculation method was developed. It had two advantages. First, the signal standard deviation was considered as a parameter together with the background standard deviation. Since the pixel intensities of a spot are not uniform, its standard deviation significantly affects the ability to distinguish a true signal from its background. In this case, consideration of the signal standard deviation can more accurately reflect microarray hybridization behaviors and more reliably identify a true spot and its threshold. Second, our experimental data demonstrated that fewer FPs and FNs were observed with this method than with the two other methods. The SBR did not change with target types or background DNA, since this calculation does not consider the signal standard deviation or background standard deviation, but it generally had a high percentage of FN and FP spots, and it may not be a good parameter to distinguish a true signal from its background noise. Therefore, this new method may be used for a general SNR calculation, and more accurate thresholds could be obtained with this calculation.
Three possible scenarios, minimizing FPs, minimizing FNs, and optimizing FPs and FNs, were considered to determine the ranges of SNR thresholds for detecting real signals, but the threshold values for optimal FPs and FNs are likely to be used most often. By optimizing the percentage of FP and FN spots, the thresholds of the SSR and SBR determined in this experiment appeared to be lower than other commonly accepted thresholds. For example, the threshold of the SSR was previously set to 3.0 (30) and that of the SBR to 1.50 (26) or 2.0 (19). Considering all three methods for SNR determination, the ranges of SNR thresholds for gDNA targets are summarized in Table 6. For example, under low-stringency conditions the thresholds of the SSR ranged from 0.5 (no FN) through 2.0 (optimal) to 4.0 (no FP), and those of the SSDR ranged from 0.3 (no FN) through 0.7 (optimal) to 0.9 (no FP). Those ranges provide a general guideline for users to select appropriate SNR thresholds based on their experiments. Two points need to be mentioned. One is that an error rate of 5% (FP plus FN) was used in this study, which is considered reasonable, since microarray data have relatively high variations due to various reasons, such as the small size, degrees of uniformity of printing pins, and uneven hybridization. The other is that the SNR threshold values determined here for DNA microarray studies under different stringencies and different target types and/or concentrations may be applied only to long (50- to 70-mer) oligonucleotide microarrays. The application of such parameters to short (18- to 25-mer) oligonucleotide microarrays remains unclear and needs to be further evaluated.
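The three scenarios can be illustrated with a simple threshold sweep over candidate values. This sketch makes several assumptions that the text does not spell out: a spot is called positive when its SNR value is at or above the threshold, FP and FN percentages are taken relative to all spots in both pools, and ties are broken toward the least extreme threshold. The function names and the SSDR values are hypothetical, not data from the study.

```python
def error_rates(positives, negatives, threshold):
    """Return (FP %, FN %) relative to all spots in both pools."""
    total = len(positives) + len(negatives)
    fp = sum(1 for v in negatives if v >= threshold)  # negative pool called positive
    fn = sum(1 for v in positives if v < threshold)   # positive pool called negative
    return 100.0 * fp / total, 100.0 * fn / total


def pick_thresholds(positives, negatives, candidates):
    """Pick thresholds that minimize FN, minimize FP + FN, and minimize FP."""
    rates = {t: error_rates(positives, negatives, t) for t in candidates}
    min_fp = min(r[0] for r in rates.values())
    min_fn = min(r[1] for r in rates.values())
    return {
        # Highest threshold that still gives the fewest false negatives.
        "min_FN": max(t for t in candidates if rates[t][1] == min_fn),
        # First candidate with the lowest combined error.
        "optimal": min(candidates, key=lambda t: rates[t][0] + rates[t][1]),
        # Lowest threshold that already gives the fewest false positives.
        "min_FP": min(t for t in candidates if rates[t][0] == min_fp),
    }


# Hypothetical SSDR values for spots in the defined positive and negative pools.
positive_pool = [0.9, 1.2, 0.6, 1.5, 0.75]
negative_pool = [0.1, 0.3, 0.5, 0.2, 0.85]
candidates = [0.2, 0.4, 0.6, 0.8, 1.0]

chosen = pick_thresholds(positive_pool, negative_pool, candidates)
print(chosen)
```

The gap between the `min_FN` and `min_FP` values corresponds to the kind of range reported in Table 6, with the optimal threshold lying inside it.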
TABLE 6. Summary of ranges of experimentally determined SNR thresholds under low- and high-stringency conditions using the S. oneidensis MR-1 gDNA target
Many factors, such as target type, background DNAs, target composition, and target amount in the tested sample, affect the SNR threshold determination. The microarray hybridization signal intensity is determined by the number of probe molecules bound to the microarray surface, the number of labeled targets present in the sample, and their ratios, which are closely related to the target type and their concentrations. In this study, the synthesized oligonucleotides and PCR amplicons were the simplest targets, they are similar, and they had almost the same thresholds. S. oneidensis MR-1 gDNA is more complex, and its threshold was a bit lower. Similarly, the complexity of the target was expected to increase in the presence of background DNA, and hence, a lower threshold was observed. Further analysis revealed that this might be due to an increase in the background standard deviation. This was validated by the fact that the thresholds of the SBR did not change with the target type or with the background DNA. With the mixed templates, mixture A contained >70% real target (S. oneidensis gDNA), and the threshold did not change significantly. However, a slight decrease in threshold was observed in mixture B, with 20% real target, and it became undeterminable for mixture C, containing about 2.5% real target. The decrease in the thresholds with a decrease in the target template composition can be explained by an increase in sample noise when the target concentration decreased. Sample noise is mostly from labeled molecules in a sample. For example, labeled target solutions can react in a nonspecific manner on microarrays, which masks the interactions between a probe and its target and obscures the microarray signal. Therefore, an increase in nontarget concentrations leads to an increase in noise, which may reduce SNR thresholds to compromise microarray detectability. 
This is also consistent with our observations for different types of target or with background DNAs, since labeled nontargets, such as background DNAs, cause a significant amount of background noise.
As previous studies showed, the detection limits for 50-mer oligonucleotide and 70-mer oligonucleotide arrays were estimated to be 25 to 100 ng of gDNA (11) for a pure culture, although a higher sensitivity (5 to 10 ng gDNA) was also observed (24, 29). In the presence of background DNA, the detection limit for a 50-mer oligonucleotide was estimated to be 50 to 100 ng of gDNA (24, 29). In mixture C, the real target was about 63 ng of gDNA, so it was not surprising that only 23.3% of defined positive probes had true signals. These results suggest that a threshold might change with the target composition, which is closely related to the microarray sensitivity.
It was also noted that the amount of target might affect the threshold determination. For example, a higher threshold might be required when a relatively large amount of target is used. In this study, we used the optimal concentrations of 10 pg for each oligonucleotide, 100 pg for each PCR amplicon, and 500 ng for gDNA, which are considered equivalent amounts of the target in samples. This is a simulation for a pure culture or a mixture of a few known microorganisms. For a sample with many unknown microorganisms, such as microbial communities in soil and the human intestinal tract, a determination of SNR thresholds may be even more challenging. Because of unequal abundances, low-abundance genes/microorganisms may not be detected even at a relatively low threshold.
In summary, three methods were used to calculate SNR values, and the newly developed calculation showed a better performance for distinguishing a true signal from its background than the other two methods. The positives identified based on SNR thresholds were verified by the Student t test across many replicate data, and consistent results were obtained. This study provides guidance for the selection of SNR thresholds for different samples, such as PCR amplicons and gDNAs from pure cultures and simple mixed cultures.
fur mutant. This research was supported by the U.S. Department of Energy under the Genomics:GTL program through the Virtual Institute of Microbial Stress and Survival (VIMSS) (http://vimss.lbl.gov) and the Environmental Remediation Science Program.
Published ahead of print on 14 March 2008.
Supplemental material for this article may be found at http://aem.asm.org/.
-proteobacteria. Genome Biol. 5:R90.