Previous Article | Next Article ![]()
Applied and Environmental Microbiology, August 2003, p. 4927-4934, Vol. 69, No. 8
0099-2240/03/$08.00+0 DOI: 10.1128/AEM.69.8.4927-4934.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Frederick S. Colwell,2 and Ronald L. Crawford1*
Environmental Research Institute, University of Idaho, Moscow, Idaho 83844-1052,1 Idaho National Engineering and Environmental Laboratory, Idaho Falls, Idaho 834152
Received 3 March 2003/ Accepted 30 May 2003
|
|
|---|
1-kb PCR products amplified from the inserts of 672 cosmids plus a set of 16S ribosomal DNA controls. COSMO was hybridized with Cy5-labeled genomic DNA from each bacterial strain, and the results were compared with the results for a common Cy3-labeled reference DNA sample consisting of a composite of genomic DNA from multiple species. The accuracy of the results was confirmed by the preferential hybridization of each strain to its corresponding rDNA probe. Cosmid clones were identified that hybridized specifically to each of 10 microcosm isolates, and other clones produced positive results with multiple related species, which is indicative of conserved genes. Many clones did not hybridize to any microcosm isolate; however, some of these clones hybridized to community genomic DNA, suggesting that they were derived from microbes that we failed to isolate in pure culture. Based on identification of genes by end sequencing of 17 such clones, DNA could be assigned to functions that have potential ecological importance, including hydrogen oxidation, nitrate reduction, and transposition. Metagenomic profiling offers an effective approach for rapidly characterizing many clones and identifying the clones corresponding to unidentified species of microorganisms. |
|
|---|
The use of large-insert genomic libraries is a powerful approach for isolating DNA sequences from complex mixtures of uncultured microorganisms. Direct cloning of DNA from environmental samples makes it possible to avoid some of the biases of cultivation and PCR. In addition, genomic fragments that are >100 kb long can be obtained, and they provide significant functional and taxonomic information about the organisms from which they were derived. Such metagenomic libraries have been used to identify novel genes from uncultivated species of archaea, bacteria, and viruses that are responsible for significant ecosystem processes (4, 5, 8, 12) and to isolate enzymes that are involved in biosynthesis of novel pharmaceuticals (7, 15, 24, 41, 42) or have other industrial uses (11, 16, 17, 26, 33).
Given the immense uncultivated and uncharacterized metabolic diversity in the environment, one would need to sequence relatively few bacterial artificial chromosome (BAC) or cosmid clones to discover fundamentally interesting sets of genes. If modern genomic techniques can be used to carry out more comprehensive surveys of metagenomic libraries, our understanding of natural genetic diversity should be greatly enhanced.
Screening a genomic library can be done a number of different ways. Typically, screening involves colony hybridization with a probe of interest, which yields information one gene at a time. Bioassays have been developed to screen libraries for genes involved in the production of specific enzymes (11, 16, 17, 26, 33) or natural products (15, 24, 42); however, this approach relies on the fortuitous expression of heterologous DNA by the library host strain. High-throughput end sequencing of BAC clones has been used to accelerate various single-genome projects (39), and it is currently being used to characterize some environmental DNA libraries (8).
Although the speed and effectiveness of brute-force sequencing are constantly improving, it is not yet practical to assemble a complete bacterial genome from a metagenome. There is still a need for new functional genomic approaches that systematically yield information about many of the elements in a metagenomic library. These new approaches should ideally allow us to identify the organism from which each clone came, to determine some functional characteristics of various clones, and to identify many more novel uncultivated bacteria.
We sought to develop a practical approach that would provide a large amount of information about the microbial community from a limited set of clones. The purpose of our approach was to classify many of our cosmids and to identify a few candidates for sequencing rather than to undertake a major sequencing and assembly project. Our method involves hybridization of the library with genomic DNA of various reference strains and bacterial isolates from the community under study. In addition, DNA derived from as-yet-uncultivated organisms can be identified by hybridization with metagenomic DNA.
|
|
|---|
DNA microarray technology has become an important tool for determining the gene contents of entire genomes and measuring the expression of genes (18). High-density arrays are effective for quantitative detection of genes in complex samples. Thus, microarrays are a promising technique for characterization of genes in environments such as soil and water (3, 9, 34, 44, 45). However, the use of microarrays has been limited to 16S rRNA markers or a relatively small set of functional genes, and no practical approach has been developed to specifically target the unculturable majority of the species in the environment.
We used a microarray platform to screen a metagenomic library with whole microbial genomes and community genomes (Fig. 1). The microarray (COSMO) contained
1-kb PCR products amplified from the inserts of 672 cosmids along with a set of controls (16S ribosomal DNA [rDNA] probes). From an environmental sample (the same sample from which the library was derived), numerous bacterial isolates were obtained. Genomic DNA was purified from each environmental isolate. In addition, metagenomic DNA was purified directly from the mixed population, which was done without cultivation of bacteria. Each test genome was labeled with Cy5-dCTP and probed with COSMO. In order to subtract any signal that may have come from nonspecific hybridization, in each experiment we used two-color hybridization, in which each test genome was compared to a reference sample of common bacterial DNA. The reference DNA consisted of a pooled sample of genomic DNA from 14 species of bacteria (which effectively diluted the strain-specific genes and enriched common sequences). The reference DNA was given a different label (Cy3), and equal amounts of test and reference DNA were combined and hybridized to COSMO. We refer to this approach as comparative genomic hybridization (CGH). CGH was repeated for all environmental isolates, as well as for the metagenomic DNA sample(s). Positive results were determined based on a Cy5/Cy3 ratio greater than 1 (>0 on a log2 scale). As a result, we obtained a profile for each clone in the metagenomic library (i.e., a graphical representation of its hybridization to one or more species of bacteria). Clones that were specific to a test strain or community hybridized only to those DNA samples. Clones that contained a conserved sequence within their corresponding PCR amplicons hybridized to the genomes of multiple species.
![]() View larger version (23K): [in a new window] |
FIG. 1. Metagenomic profiling. Genomic DNA is purified from various bacterial isolates, as well as from a mixed population. A reference sample of common bacterial DNA is created by pooling the genomic DNA of many strains. In each CGH experiment, the genome of each strain or community is analyzed in comparison to the reference DNA. Some examples of informative results are shown. A clone of environmental DNA may correspond to only bacterial strain 1 (clone a), multiple strains (clone b), the metagenome and a bacterial isolate (clone c), or only the metagenome (clone d).
|
|
|
|---|
To obtain community DNA that were to be used for microarray target samples, a second enrichment of the microcosm was carried out under the same conditions by using a 0.5% inoculum of frozen stock. After incubation, two different fractions of cells were collected (pellicle and planktonic). First, the pellicle was removed with a sterile spatula and placed in a 50-ml polypropylene tube, and then the remaining cells were suspended by gently vortexing the flask, decanting the medium into two 50-ml polypropylene tubes, and collecting the cells by centrifugation. DNA from the samples described above was probed with COSMO in order to identify clones corresponding to microbial strains that were enriched in the mixed culture but that we failed to isolate in pure culture. We used the unisolated fraction of the microcosm to represent the uncultured microorganisms in the environment.
From the original microcosm and all subsequent enrichments, bacterial species were isolated by plating serial dilutions of liquid cultures onto plates containing tryptic soy agar. Bacterial strains were identified by selecting colonies with unique morphology that appeared during a 7-day incubation at 30°C. Ten different strains were obtained (see Fig. 4) along with the Pseudomonas aeruginosa and Staphylococcus aureus reference strains. To determine the identity of each isolate, the 16S rRNA gene was amplified by PCR from a genomic DNA template by using eubacterial primers 27F (20) and 907R (29). 16S amplicons were sequenced by using the 27F primer.
![]() View larger version (50K): [in a new window] |
FIG. 4. Identification of clones corresponding to uncultivated organisms. CGH experiments were performed by using metagenomic DNA from different cell fractions, including free-swimming (planktonic) cells and cells accumulating at the surface (pellicle cells). When duplicate experiments were added to the original data set, it was possible to identify uncultivated DNA (A). In addition, probes classified by single-genome CGH could be used to track the distribution of an organism in the mixed population (B, C, and D).
|
40 kb. The results presented below verify that COSMO represented much of the microcosm's diversity. We did not attempt to characterize the species represented in the cosmid library prior to fabrication of COSMO. The task of amplifying 16S rRNA genes from a pool of cosmids was confounded by the presence of contaminating Escherichia coli genomic DNA in the plasmid preparation. (While the manuscript was being reviewed, Liles et al. [21] published a new procedure for eliminating E. coli 16S rDNA from pools of purified BACs.)
Array fabrication.
All microarray experiments were performed with COSMO, a DNA microarray containing end fragments of 672 cosmids selected randomly from the metagenomic library plus a set of reference genes (16S rDNA markers from several microcosm isolates [Fig. 2]). Cosmid end fragments were produced by a thermal asymmetric interlaced PCR method (22). Briefly, the thermal asymmetric interlaced PCR method involved sequential cycles of linear amplification of insert DNA from the T7 end of the vector with nested primers, followed by exponential amplification of the specific product by random priming, which resulted in PCR products that were 200 to 2,000 bp long. High-throughput PCR was performed with an MBS 384S thermocycler system (ThermoHybaid). 16S rRNA gene controls were also amplified by PCR. The quality and quantity of DNA were confirmed by agarose gel electrophoresis. PCR products were purified by using 384-well filter plates (Millipore) and were resuspended in 15 µl of 1x Spotting Solution Plus (Telechem) to obtain a final DNA concentration of 100 to 200 ng/µl. Each sample was spotted in duplicate on SuperAmine slides (Telechem) by using a Microgrid arrayer (BioRobotics). Slide cross-linking, washing, and blocking steps were carried out by using the manufacturer's protocols (http://arrayit.com/PDF/Super_Microarray_Substrates.pdf).
![]() View larger version (53K): [in a new window] |
FIG. 2. Microarray validation with 16S rDNA controls. The data are the results obtained with rRNA probes in 10 different experiments (one replicate per strain). The results of each experiment are presented as a column of color-coded values. The colors, ranging from green to red, correspond to the log2 of the Cy5/Cy3 ratio (see key at the top). Preferential hybridization of the test strain to a rRNA gene probe is indicated by a positive log2 ratio (red). Results with a signal-to-noise ratio of less than 3 are indicated by blank squares. Species are sorted horizontally and vertically according to their phylogenetic relationships (as determined by Clustal W alignment of rRNA sequences) in order to show the amount of cross-hybridization that occurs between the sequences of related strains. The last four probes shown are additional negative controls.
|
Data analysis.
Arrays were scanned with an Axon 4000 scanner, and fluorescence measurements were obtained by using Genepix Pro 3.0 software (Axon). Data sets were filtered for spots with a signal-to-noise ratio greater than 3.0 by using Microsoft Excel. Results are reported below as log2 of the Cy5/Cy3 ratio. Data analysis and graphical display were done with Expression NTI (Informax Inc.) in the following manner: (i) all clones that failed to produce a positive result (log2, >0) in duplicate experiments with at least one target genome were removed from the data set, (ii) clones were clustered by complete linkage of genes by using the correlation coefficient, and (iii) experiments were sorted according to the phylogenetic relationships of test strains as determined by Clustal W alignment of 16S rRNA sequences.
Analysis of cosmid end sequences.
The inserts of select clones were sequenced from both ends of the multiple cloning site by using the T3 and T7 primers. The plasmid template was purified from 500-ml cultures of E. coli by using a large-construct kit (Qiagen). A sequencing reaction mixture (total volume, 20 µl) was prepared by combining 1 µg of plasmid DNA with 0.32 µl of a 10 µM primer T3 or T7 stock solution in Tris-EDTA buffer and 8.0 µl of a Big Dye mixture. PCR was performed with a PTC-100 thermocycler (MJ Research) for 80 cycles (94°C for 30 s, 47°C for 15 s, and 60°C for 4 min). Reaction mixtures were purified with DTR gel filtration columns (Edge Biosystems). Nucleotide sequences (average size, 450 bp) were determined with an ABI 3100 DNA sequencer.
The potential functions and phylogenetic affinities of cosmid end sequences were determined by performing a nucleotide and translated-protein search of GenBank by using the Basic Local Alignment Search Tool (BLAST) (1). The potential function of a given sequence was determined by examination of all homologous sequences with expect values of <1e-2. A particular function was assigned if (i) a consensus was apparent among the best hits and (ii) there was no disagreement between the consensus of the nucleotide results and the translated-protein results. We also listed the species corresponding to the nucleotide best hit; when no significant nucleotide matches were observed, we listed the species corresponding to the protein best hit (Table 1).
|
View this table: [in a new window] |
TABLE 1. Potential genes observed in cosmid end sequencesa
|
|
|
|---|
The sensitivity and specificity of our microarray analysis were evident from the results obtained for controls and reference genes (Fig. 2) in our CGH experiments with individual genomes (see below). 16S rRNA genes from several microcosm isolates were obtained by PCR and included in COSMO as controls. Each rRNA gene served as a positive control for its corresponding species and as a negative control for distantly related species. Additional negative controls are shown last in the figures. The results indicated that our hybridization conditions allowed identification of strain-specific genes. In all but one case, genomic DNA of each strain hybridized preferentially to its corresponding rRNA probe (yielding the highest ratio). In most cases, some hybridization to a related strain was observed, but the level of hybridization was lower. In the case of Ultramicrobacterium 3, genomic DNA hybridized nearly equally to the Ultramicrobacterium 3 and Acidovorax B rRNA probes. No positive results were obtained with the Bacillus sp., Shigella sp., and Staphylococcus sp. negative controls or with the cosmid vector.
Hybridization with individual genomes.
Various reference strains and microcosm isolates were used individually as target DNA in experiments to locate clones related to these organisms. The clustered microarray results for various microcosm isolates and reference strains revealed distinct classes of DNA that corresponded to individual strains or groups of bacteria (Fig. 3). Numerous strain-specific clones were apparent. The data also revealed examples in which clones hybridized to multiple related organisms (indicative of conserved genes). Based on these patterns, we classified some clones as members a particular species, genus, or branch (Fig. 3).
![]() ![]() View larger version (61K): [in a new window] |
FIG. 3. Classification of clones based on hybridization of COSMO with individual genomes. Twelve different strains were analyzed with COSMO in duplicate. Experiments were sorted horizontally according to the phylogenetic relationships of the test strains. Clones were clustered (by rows) according to the correlation coefficients of the microarray data across 23 experiments. S. aureus was included to show that cross-hybridization between very distant relatives may not be significant, but one replicate was removed in order to minimize its effect on clustering. Based on the clustered data, distinct classes of clones are apparent. The colored bars on the right indicate the clones with reproducible hybridization patterns specific for one test strain (strain-specific clones) and clones that hybridized to multiple related species (conserved clones).
|
Classifying uncultured DNA.
In the experiment described above (Fig. 3), 156 probes (clones) passed our filtering process. The remaining 524 clones failed to produce a positive log2 ratio with any test strain or failed to produce any significant signal. Some of these clones may have corresponded to organisms present in the microcosm that we failed to isolate in pure culture. We considered the unisolated organisms in our microcosm to be analogous to uncultured microbes in the environment. To identify cosmids derived from such organisms, we performed a similar experiment using genomic DNA extracted directly from the mixed bacterial population as the test DNA (the same reference DNA was used). The results of two metagenomic CGH experiments were added to the data set prior to clustering (Fig. 4). The results identified a number of clones that were present in the community and not in our catalog of isolates (Fig. 4A). Such clones were classified as uncultured.
A single experiment, such as the one described above, yielded a spectrum of ratios. It is likely that the clones that yielded a log2 ratio of 1 corresponded to a different organism than the clones that yielded a log2 ratio of 6. The uncultured class could be separated into subgroups if there were clear differences in the abundance of different genes in the community, but it was not obvious where to draw the line between one organism and another.
In an attempt to resolve the uncultivated community in slightly better detail, the microcosm cells were fractionated into two types, pellicle cells and free-swimming cells, which were analyzed separately. All uncultured clones (Fig. 4A) were found to have the same even distribution between planktonic and pellicle cells; therefore, we were not able to determine if these clones corresponded to multiple species. However, we observed that clones corresponding to some of the cultivable species had distinct distribution patterns. Acidivorax I was apparently distributed throughout the microcosm (Fig. 4B), Acidivorax B was present primarily in the pellicle (Fig. 4C), and Caulobacter 6 was not abundant in either fraction (Fig. 4D). These clusters were labeled and presented as examples of clones from different species that, in principle, could be distinguished based solely on community analysis.
Sequence analysis of uncultivated DNA.
In order to identify functional characteristics of uncultured microorganisms from the microcosm, insert DNA from each of 17 random clones from cluster A (Fig. 4A) was sequenced from the T7 and T3 primer sites that flanked the insert. A nucleotide and translated-protein search of GenBank was performed with each cosmid end sequence (Table 1). Among 34 sequences, 18 functional genes were identified, 10 hits were obtained with genes of unknown function, and 6 sequences yielded no significant result. The great majority of the sequences were found to be significantly similar to genes from members of the Proteobacteria, including seven genes from Ralstonia sp. Some of the sequences could be assigned to functions having ecological importance, including a putative [NiFe] hydrogenase, nitrate reduction, and several transposases. Four different insertion sequence elements (ISs) were observed in five clones, 2H6, 6A12, 6A7, 6D10, and 7A12 (sequences from 6A7 and 6D10 were from different positions of the same gene). All ISs that we identified occurred precisely in the T7 end fragment.
|
|
|---|
Hybridizing COSMO with single genomes and cluster analysis of the microarray results were effective for visualizing groups of clones that have unique patterns of hybridization to different bacterial genomes (Fig. 3). We identified clones that hybridized to a single strain, as well as clones that hybridized to related species. In this manner, we classified clones as strain specific or specific to broader phylogenetic group.
Many of the clones in the scrambled section of Fig. 3 have much broader specificity, and some cross-hybridize only between two distantly related strains. The latter finding may be of interest to researchers who wish to identify genes that have been transferred horizontally between species. However, before such claims are made based on hybridization between more divergent species, the correlation between homology and signal intensity must be examined more carefully.
Identification of clones corresponding to cultivable strains is also useful for eliminating clones that do not need to be examined in a culture-independent manner. This may in fact be the primary objective, in which case it would not be necessary to probe genomes one at a time, as we did. The cultivable genomes could be combined into pools in order to accelerate the process.
The eventual identification of clones corresponding to uncultivated microorganisms is the most valuable aspect of this approach. We identified clones that hybridized uniquely to total community DNA. These clones corresponded to one or more species of bacteria that were enriched in mixed cultures but that we were unable to obtain by isolation on tryptic soy agar. End sequencing of 17 cosmids resulted in identification of numerous sequences that had nucleotide similarity to the genes of Ralstonia spp. In addition, we identified a putative hydrogenase and genes involved in reduction of nitrate. Some species of bacteria, including Ralstonia spp. (6, 19, 32), have the ability to couple hydrogen oxidation to nitrate reduction. The putative hydrogenase gene and the norE homologue were not found in the same clone, and we could not confirm that they are linked any way. Such proof would require isolating both sets of genes on the same fragment of DNA. However, knowledge of the metabolic genes present in the uncultured population provides information that should be important for selectively culturing such organisms if they are indeed present.
Among the clones that hybridized only to community DNA, a remarkably high frequency of ISs was observed. We do not believe that this is a unique characteristic of the uncultivated species. ISs occur in a wide range of genera (25). A variety of ISs can occur in a single genome, and a single type of IS may have a copy number of >20 (38). We believe that the five occurrences of four different IS-like elements precisely in a clone's T7 end fragment (the probe actually printed on the array) may have reflected a greater sensitivity of our method for high-copy-number sequences.
Of course, in a mixed population, our method detects the most abundant genes. A common limitation of microarray studies is sensitivity. The detection limit for microarray analysis of soil samples is on the order of
50 ng of a single bacterial genome (10); therefore, in a typical experiment one would detect only organisms that constitute at least 1/40 of the population. However, new techniques for uniform amplification of genomic DNA (27) and enrichment of unique sequences (23, 28) may be applied to environmental samples in order to access the single-copy genes of less abundant species.
We hope to further develop applications of the COSMO microarray for environmental samples. We demonstrate here that in addition to simple identification of an uncultured subset of clones, additional classes of organisms can be defined by comparing different environmental samples from the same habitat (Fig. 4B to D). Clones that have a unique distribution in the two fractions may constitute a separate phylogenetic class. By comparing the metagenomic profiles of a field site to a map of metabolic activities (13) or microbial species (36), clones whose distribution correlates with a biological process may be identified. In this manner, ecologically important genes for which there is not a specific probe or assay may be identified.
Systematic organization of a metagenomic library is essential for performing a more comprehensive study of a microbial community. We must continue to utilize modern techniques of genome analysis and adapt them to the study of complex mixed genomes, keeping in mind that the purpose of a genome-wide study is to accelerate the discovery of genes that are important for specific processes.
A study of the microbial metagenome need not seek truly comprehensive knowledge concerning all microbial genomes if it is possible to first negotiate the genomic landscape of a given site and find what is relevant and interesting. A practical approach to metagenomics is (i) to quickly identify familiar genes, (ii) to identify the unknowns, and (iii) to attempt to classify them based on knowledge such as where certain genes are present and what apparent linkages there are between different genes. Once a set of clones that is linked to a biological process is identified, the specific genes involved may be identified from the complete DNA sequences of the clones. Metagenomic profiling is an appropriate technique for these tasks. We believe that microarray studies in combination with DNA sequence analysis will be important tools for enhancing our understanding of earth's microbial diversity.
This work was funded by a predoctoral fellowship from the Inland Northwest Research Alliance. Additionally, the project was supported by NIH grant P20 RR16454 from the BRIN Program of the National Center for Research Resources.
Present address: Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»