Applied and Environmental Microbiology, November 2007, p. 7059-7066, Vol. 73, No. 21
0099-2240/07/$08.00+0 doi:10.1128/AEM.00358-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado,1 Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado,2 College of Marine Science, University of South Florida, St. Petersburg, Florida,3 Department of Mathematics and Statistics, San Diego State University, San Diego, California,4 Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, Colorado,5 Center for Microbial Sciences, San Diego State University, San Diego, California,6 Department of Biology, San Diego State University, San Diego, California,7 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado,8 Department of Biology, Duke University, Durham, North Carolina,9 Nicholas School of the Environment and Earth Sciences, Duke University, Durham, North Carolina,10
Received 13 February 2007/ Accepted 29 August 2007
Of the microbial groups that are abundant in soil, the bacteria have been the most extensively studied. With an estimated 10³ to 10⁷ bacterial "species" per individual soil sample (15, 23, 59, 60), they are often considered to be the most diverse group of soil microorganisms (13). However, bacteria are not the only microorganisms found in soil; archaea, fungi, and viruses are also numerically abundant (58). To our knowledge, no previous studies have examined the sequence diversity of soil viruses, and no studies have compared the levels of genetic diversity found in the different taxonomic groups of soil microorganisms (bacteria, archaea, fungi, and viruses) inhabiting a given soil sample.
We propose that soil fungal, archaeal, and viral communities are likely to be as taxonomically diverse as soil bacterial communities. Although soil fungi have been studied for centuries, recent DNA-based surveys suggest that fruiting body and cultivation-based surveys have underestimated the total richness of soil fungal communities (33, 43, 54). Recent research also indicates that soil archaea are phylogenetically diverse (44, 46, 61) and are undersurveyed despite their apparent importance in soil processes (37). Soil viruses are known to be abundant, to be morphologically diverse, and to span a wide range of genome sizes (48, 64), but there are currently no published reports describing the genomic diversity of soil viral communities.
For this study, our goal was not to identify every individual microorganism found in soil. To do so would be prohibitively difficult given the magnitude of the required sequencing effort (17, 55). Rather, our goal was to compare the phylogenetic diversities of the four dominant taxonomic groups of soil microorganisms in soils collected from a tallgrass prairie, an arid desert, and a tropical rainforest. These sites were chosen because they represent globally dominant ecosystem types and span a broad gradient in aridity and productivity. We analyzed partial sequences of amplified 16S and 18S rRNA genes to characterize the phylogenetic diversity of archaeal, fungal, and bacterial communities in each soil. Because viruses lack ubiquitously conserved genetic elements, we assessed viral diversity by sequencing randomly chosen clones from viral DNA metagenomic libraries.
TABLE 1. Site information and general properties of the three soils studieda
Viral community DNA was extracted from the soils using methods similar to those described elsewhere (8, 10). Soil samples (200 g [wet weight]) were resuspended in 0.02-µm-filtered 1× phosphate-buffered saline solution and shaken vigorously to dislodge the viruses from the soil particles. The sediments were pelleted, and the supernatant was then filtered through a 0.2-µm Sterivex filter to remove all nonviral organisms. Viruses in the filtrate were concentrated by polyethylene glycol precipitation with polyethylene glycol 8000 added to a final concentration of 10%, and the samples were incubated for 12 h at 4°C (11). The samples were then centrifuged at 13,000 × g for 30 min on an SW41 rotor to pellet the viral particles. The viral pellet was resuspended in 0.02-µm-filtered phosphate-buffered saline solution and loaded onto a cesium chloride step gradient consisting of 1 ml each of 1.7, 1.5, and 1.35 g ml⁻¹. The gradient was centrifuged for 2 h at 22,000 rpm on an SW41 rotor (average of 60,000 × g), and the DNA was isolated from the 1.35 to 1.5 g ml⁻¹ fraction (which contains most of the viral particles) using formamide and cetyltrimethylammonium bromide extraction (53).
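As a quick plausibility check on the gradient step, the standard rpm-to-RCF conversion can be sketched in a few lines. The average rotor radius used here (11.02 cm) is an assumption for an SW41-type swinging-bucket rotor and is not given in the text; the function name is illustrative only.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (in multiples of g) for a given speed and radius."""
    # Standard conversion: RCF = 1.118e-5 * radius (cm) * rpm^2
    return 1.118e-5 * radius_cm * rpm ** 2

# ASSUMPTION: average radius of an SW41-type rotor; not stated in the article.
ASSUMED_R_AV_CM = 11.02

rcf = rcf_from_rpm(22_000, ASSUMED_R_AV_CM)
print(f"22,000 rpm at r = {ASSUMED_R_AV_CM} cm is about {rcf:,.0f} x g")
```

Under that assumed radius the result lands close to the "average of 60,000 × g" quoted for the 22,000-rpm spin.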
Clone library construction.
For the analysis of small-subunit rRNA genes, individual bacterial, archaeal, and fungal clone libraries were constructed from each soil sample. For each library, three replicate PCRs were conducted per soil DNA template (for a total of 30 replicate PCRs per library) using group-specific primers. The bacterial clone library was constructed using a universal eubacterial primer set, Bac8f (5'-AGAGTTTGATCCTGGCTCAG-3') and Univ529r (5'-ACCGCGGCKGCTGGC-3') (5, 36, 49). The archaeal clone library was constructed using the archaeon-specific primer Arc21f (5'-TTCCGGTTGATCCTGCCGGA-3') (5) and Univ529r. The fungal library was constructed with the EF4 (5'-GGAAGGGRTGTATTTATTAG-3') and fung5 (5'-GTAAAAGTCCTGGTTCCCC-3') primer set (57), which has previously been shown to amplify 18S rRNA genes from most fungal groups (3, 24, 26, 32). Each 50-µl PCR mixture contained 1x HotStarTaq master mix (QIAGEN, Valencia, CA), 0.5 µM of each primer, and 50 ng of template DNA. The amplification protocol consisted of 15 min at 95°C, followed by 25 cycles of 60 s at 94°C, 30 s at the appropriate annealing temperature, and 60 s at 72°C and a final 10-min extension step at 72°C. The annealing temperatures for the bacterial, archaeal, and fungal amplifications were 54°C, 55°C, and 48°C, respectively.
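Two of the primers listed above contain IUPAC ambiguity codes (K in Univ529r, R in EF4), so each is in practice a small pool of oligonucleotides. The sketch below enumerates the concrete sequences behind each primer; the helper name `expand_primer` is hypothetical, and the lookup table covers only the codes that occur in these five primers.

```python
from itertools import product

# IUPAC codes needed for the primers in this study: K = G or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "R": "AG"}

def expand_primer(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "Bac8f": "AGAGTTTGATCCTGGCTCAG",
    "Univ529r": "ACCGCGGCKGCTGGC",
    "Arc21f": "TTCCGGTTGATCCTGCCGGA",
    "EF4": "GGAAGGGRTGTATTTATTAG",
    "fung5": "GTAAAAGTCCTGGTTCCCC",
}

for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variant(s)")
```

Only Univ529r and EF4 expand to more than one sequence (two each); the other three primers are fully specified.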
The amplified products from the replicate PCRs were pooled together and cloned using the TOPO-TA PCR cloning kit (Invitrogen). Clones were picked and unidirectionally sequenced following standard protocols (SymBio, Menlo Park, CA). Sequences were screened for chimeras using Bellerophon (29), trimmed at conserved motifs, and aligned using either NAST (available at http://greengenes.lbl.gov) or ARB (available at http://www.arb-home.de). Figure 1 and Table 2 indicate the number of sequences included in each library.
FIG. 1. Rarefaction curves for the bacterial, fungal, and archaeal clone libraries constructed from each of the soil samples. Rarefaction curves were generated using EstimateS (version 7.5; R. K. Colwell, http://purl.oclc.org/estimates). In all nine libraries, there is no apparent asymptote in the rarefaction curves, suggesting that the libraries do not encompass the full extent of OTU richness in each of the communities with an OTU defined at the 97% sequence similarity level.
TABLE 2. Number of OTUs observed per library versus number predicted by the power law model
Analysis of archaeal, bacterial, and fungal libraries.
We confirmed that the sequences from each library matched the targeted taxonomic group by comparing the sequences to those in the GenBank database using the BLAST algorithm (1). The archaeal, fungal, and bacterial libraries were dereplicated into operational taxonomic units (OTUs) using Fastgroup II (65). An OTU was defined as a group of sequences sharing ≥97% identity in their small-subunit rRNA gene sequences, following the conventional definition of a microbial "species" (52). Due to the computational challenges associated with estimating diversity indices and the associated error around these estimates, we used only a single OTU definition for this study. After grouping sequences into OTUs at the 97% sequence similarity level, we used EstimateS (version 7; R. K. Colwell, http://purl.oclc.org/estimates) to produce rarefaction curves (Fig. 1). Because none of the rarefaction curves approached an asymptote, we know that we have undersampled the total diversity of each microbial group, and therefore the rarefaction curves cannot be used to compare the diversities of the microbial communities.
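To illustrate what a rarefaction curve measures, the following sketch uses the classical analytic (hypergeometric) formula for the expected number of OTUs in a random subsample of n clones. It is not necessarily the computation EstimateS performs, and the OTU counts are made-up values for illustration, not data from this study.

```python
from math import comb

def rarefy(otu_counts: list[int], n: int) -> float:
    """Expected number of OTUs seen when n clones are drawn without replacement."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the library size")
    denom = comb(total, n)
    # An OTU with c clones is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)

# HYPOTHETICAL library: clones per OTU (not taken from the article).
counts = [12, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1]

for n in (1, 5, 10, 20, sum(counts)):
    print(n, round(rarefy(counts, n), 2))
```

A curve that is still climbing at the full library size, as with the many singletons in this toy example, is the pattern described above as having no apparent asymptote.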
For each of the nine libraries, the rank-abundance data (where the observed OTUs are ordered from most to least abundant on the x axis and the abundance of each OTU is plotted on the y axis) were fit to four possible models: logarithmic, log-normal, exponential, and power law models. The equations for these four models are provided in reference 4. These equations describe the community structure by expressing the fraction fi of the community in the ith ranked OTU in terms of the model parameters a, b, and M. As an example, the equation describing the community structure of the power law model is
$$f_i = a\,i^{-b}, \qquad i = 1, \ldots, M \tag{1}$$
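A minimal sketch of how a power law rank-abundance curve can be built from the two parameters reported later (M, the OTU richness, and a, the fraction held by the most common OTU). It assumes the form f_i = a * i**(-b) with the fractions summing to 1, which fixes b once a and M are chosen; that form is inferred from the parameter definitions in the text, and the bisection solver and example values are illustrative, not the authors' fitting procedure.

```python
def power_law_fractions(a: float, M: int) -> tuple[float, list[float]]:
    """Return (b, [f_1, ..., f_M]) with f_i = a * i**(-b) and sum(f_i) = 1."""
    if not (0 < a < 1 <= a * M):
        raise ValueError("need 0 < a < 1 and a * M >= 1 for a normalizable curve")

    def total(b: float) -> float:
        return sum(a * i ** (-b) for i in range(1, M + 1))

    # total(b) decreases from a*M at b = 0 toward a as b grows, so bracket the root.
    lo, hi = 0.0, 1.0
    while total(hi) > 1.0:
        hi *= 2.0
    for _ in range(200):  # plain bisection on the normalization condition
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2.0
    return b, [a * i ** (-b) for i in range(1, M + 1)]

# HYPOTHETICAL parameters: most common OTU holds 8% of a 500-OTU community.
b, fractions = power_law_fractions(a=0.08, M=500)
print(f"b = {b:.4f}, f_1 = {fractions[0]:.3f}, f_M = {fractions[-1]:.2e}")
```

Because the tail contributes very little to the sum, large changes in M move b only slightly, which gives some intuition for why richness is harder to pin down than the abundance of the dominant OTU.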
The parameters for all the models for all of the libraries were estimated using maximum-likelihood methods. The estimates for the viral communities followed the procedure described by Breitbart et al. (11) and are further described in "Viral sequence analysis" below. The maximum-likelihood estimates of M and a for the other communities proceeded by minimizing the variance-weighted sum of squared deviations Y between the observed and the predicted number of OTUs sampled exactly k times in a sample of size n:
[Equation 2]
TABLE 3. Estimation of error for the parametric models used to describe the OTU abundance distribution in each community and estimates of OTU richnessa
[Equation 3]
[Equation 4]
Xi,k. Thus, knowing the expected value of the Xi,k enables us to calculate
[Equation 5]
[Equation 6]
[Equation 7]
FIG. 2. Estimation of OTU richness (M in equation 1) (panel a) and the abundance of the most common OTU (a in equation 1) (panel b) in each of the three soils. Symbols correspond to soil type (prairie, rainforest, or desert). Parameters were estimated by fitting a power law function to OTU abundance distributions. Maximum-likelihood values are denoted with symbols, and bars indicate 68% confidence regions for the parameter estimates of the actual community (see Materials and Methods). Due to the high range of isolikelihood estimates for OTU richness in the desert archaeal, prairie fungal, and rainforest viral communities, we can conclude only that the number of OTUs in each of these communities is likely to exceed 10⁶. The asterisks indicate that the maximum-likelihood estimates of OTU richness for the desert archaeal and rainforest viral communities exceeded 10¹⁰ OTUs.
98% identity over a minimum of 20 bp, as per Breitbart et al. (11). The contig spectra were as follows: rainforest [980, 8, 3, 1, 0, 0, 0...], desert [1592, 24, 0, 0, 0...], and prairie [1899, 13, 1, 0, 0, 0...]. The resulting contig spectra were mathematically modeled to predict community structure using PHACCS (4) and Monte Carlo simulations as described previously (8, 11). To determine the identities of the environmental viruses, the viral metagenomic sequences were compared against the GenBank nonredundant database using TBLASTX. Significant hits to GenBank entries (E value of <0.001) were classified into groups based on sequence annotation in the nonredundant database. To determine the types of phages found in the soils, the sequences were compared against a database containing 510 complete phage genomes (51) using TBLASTX (http://phage.sdsu.edu/oceanviruses). Hits with an E value of <10⁻⁶ against this database (approximately equivalent to an E value of 0.001 against the nonredundant database) were considered significant.
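A contig spectrum lists, at position q, the number of contigs assembled from exactly q sequences, so the first entry counts the sequences that did not assemble with any other. The short sketch below recovers the number of sequences behind each reported spectrum; the function and variable names are illustrative, and trailing zeros in the spectra are omitted.

```python
# Contig spectra as reported in the text (trailing zeros dropped).
CONTIG_SPECTRA = {
    "rainforest": [980, 8, 3, 1],
    "desert": [1592, 24],
    "prairie": [1899, 13, 1],
}

def sequences_in_spectrum(spectrum: list[int]) -> int:
    """Total sequences: entry q (1-based) counts contigs built from q sequences."""
    return sum(q * n_contigs for q, n_contigs in enumerate(spectrum, start=1))

totals = {site: sequences_in_spectrum(s) for site, s in CONTIG_SPECTRA.items()}
for site, n in totals.items():
    singleton_share = CONTIG_SPECTRA[site][0] / n
    print(f"{site}: {n} sequences, {singleton_share:.1%} unassembled")
print("all sites:", sum(totals.values()))
```

The three libraries sum to 4,577 sequences, which agrees with the total number of viral sequences the article reports for the combined assembly. In every library, more than 97% of sequences failed to assemble with any other, which is what drives the very high viral richness estimates.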
Nucleotide sequence accession numbers.
The nonredundant sequences from this study have been deposited in the GenBank nonredundant database and have accession numbers EF429664 through EF431845 (bacteria, archaea, and fungi). All viral sequences from this study have been deposited in the GenBank GSS database with accession numbers ER781257 through ER785833.
97% sequence similarity level) was surveyed with the clone libraries, as none of the curves reached an asymptote. However, coarse estimates of microbial diversity can be obtained without sampling every individual OTU in a given community (15, 28), and we can compare relative levels of community richness and evenness in the targeted microbial taxa. Nonparametric estimators (i.e., Chao I and ACE) (41) are frequently used to estimate the total number of OTUs in a given community (6, 30). However, in all cases, the nonparametric estimates of total OTU richness failed to stabilize or reach an asymptote (data not shown), so they cannot be used to estimate the total number of OTUs within each community (34). Instead, we used a parametric technique, based on the observed OTU abundance distribution, to predict the community-level diversity of these three groups, assuming that the form of the OTU abundance distribution is the same for both the libraries and the communities as a whole. For the viral communities, which were surveyed by constructing metagenomic libraries, the OTU abundance distribution was predicted by mathematically modeling the contig spectra. We tested four different models that are commonly used to describe microbial community structure (23, 28) and used the most appropriate model (a power law function [Table 3]) to estimate the OTU richness and evenness of each community. While more complex parametric models have been used to estimate OTU richness (23, 55), these models were not tested because there is no a priori reason to choose one type of model over another and because less parsimonious models (those with a larger number of parameters) are likely to underestimate model error. The power law model yielded the lowest model error in 9 of the 12 cases (Table 3). Table 2 shows the close correspondence between the observed number of OTUs and the power law model prediction of OTU numbers for each library. 
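For readers unfamiliar with the nonparametric estimators mentioned above, here is a minimal sketch of the Chao1 estimator in its bias-corrected form, which extrapolates richness from the numbers of singleton and doubleton OTUs. The counts are hypothetical, and this is a generic textbook formulation rather than the exact routine used in the study.

```python
from collections import Counter

def chao1(otu_counts: list[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    observed = sum(1 for c in otu_counts if c > 0)
    freq = Counter(otu_counts)
    f1, f2 = freq[1], freq[2]  # OTUs seen exactly once and exactly twice
    # The bias-corrected form stays defined even when there are no doubletons.
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))

# HYPOTHETICAL library: clones per OTU (not taken from the article).
counts = [12, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1]
print(chao1(counts))  # 11 observed OTUs plus 6*5 / (2*3) = 16.0
```

When singletons keep accumulating as more clones are sequenced, the estimate keeps rising with library size, which is the failure to stabilize described in the paragraph above.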
The second-best-performing model, the log-normal model, yielded estimates of OTU richness across soils and taxonomic groups that were generally similar to the estimates obtained using the power law model (Table 3). Since the levels of diversity are estimated from the OTU abundance curve, the estimates of OTU richness should be relatively robust to changes in library size (Table 2). However, for some of the OTU richness estimates, there was a wide range in the 70% confidence regions around the maximum-likelihood values (Fig. 2). This high degree of uncertainty in richness estimates reflects the difficulties associated with reliably fitting the tail of a given distribution. This is readily apparent in Table 3 and in the extremely high richness estimates for the desert archaeal and prairie fungal communities (Fig. 2). Although our clone libraries are larger than most clone libraries published to date, they are still minuscule considering the overwhelming complexity of the soil microbial communities, making it difficult to estimate the exact number of OTUs in each taxonomic group. Due to this high degree of uncertainty, the richness estimates should be considered carefully, as they are likely to be more useful for comparing richness levels between taxonomic groups than for defining the exact number of OTUs in each of the collected soil samples. However, it is worth noting that there is far less uncertainty associated with the estimates of evenness for the individual communities (Fig. 2), as the evenness estimates are less susceptible to errors associated with predicting the specific shape of the tail end of the OTU distribution.
The model results suggest that the total OTU-level richness of bacteria, archaea, fungi, and viruses was extremely high at all sites (Fig. 2a), with the estimated richness of the last three groups equaling or exceeding the richness of soil bacteria in all habitats. The desert archaeal, prairie fungal, and rainforest viral communities were particularly OTU rich, with a minimum estimate of >10⁶ unique OTUs each (Fig. 2a), more than an order of magnitude higher than bacterial richness at the same sites. Of course, given the caveats detailed above, it is important to recognize the high degree of uncertainty inherent in these richness estimates.
The estimated differences in evenness between taxa are likely to be more robust than our estimates of total OTU richness (Fig. 2). Of the four taxonomic groups, the archaeal communities were the least even, with a single OTU accounting for >8% of the population in a given community (Fig. 2b). The fungal and archaeal communities had lower evenness levels than bacterial communities, an observation consistent with results reported elsewhere (43, 46, 61). There was no apparent correlation between the estimated evenness and richness of the communities (r² = 0.05; P > 0.5). Interestingly, the estimated probabilities of selecting two individuals of the same OTU from a community (Simpson's diversity index) (41) were relatively consistent within each taxonomic group regardless of soil type (Fig. 3). This consistency suggests that the overall structure of each of these communities is controlled by the type of microbe in question rather than the specific features of the soil environment.
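Simpson's index as described here is the probability that two individuals drawn at random belong to the same OTU, so it can be computed directly from the OTU fractions of a community. The sketch below uses two made-up four-OTU communities to show how dominance by one OTU raises D and lowers its inverse; the function names are illustrative.

```python
def simpson_d(fractions: list[float]) -> float:
    """Probability that two random individuals belong to the same OTU."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("OTU fractions must sum to 1")
    return sum(p * p for p in fractions)

def inverse_simpson(fractions: list[float]) -> float:
    """1/D: equals the number of OTUs when all OTUs are equally abundant."""
    return 1.0 / simpson_d(fractions)

# HYPOTHETICAL communities with the same richness but different evenness.
even = [0.25, 0.25, 0.25, 0.25]
uneven = [0.70, 0.10, 0.10, 0.10]

print(inverse_simpson(even))    # 4.0
print(round(inverse_simpson(uneven), 3))
```

Both communities contain four OTUs, yet the uneven one has an inverse index below 2, which is why the index responds mainly to the abundant OTUs and much less to the poorly constrained tail of rare ones.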
FIG. 3. Predicted values of Simpson's diversity index for each of the 12 communities. Since Simpson's index (D) is defined as the probability that two individuals taken at random from the community belong to the same species (or, in this case, OTU) (41), higher values of D⁻¹ indicate higher overall diversity. Symbols correspond to soil type (prairie, rainforest, or desert). The mean value for D⁻¹ (with one standard error in parentheses) for each taxonomic group is denoted above each set of symbols.
Not only are soil bacteria, archaea, fungi, and viruses locally diverse, but our results indicate that these groups are also globally diverse, as we observed little phylogenetic overlap between soils. None of the identified archaeal, fungal, or bacterial OTUs was found at more than one site, and we observed only one instance of an overlapping viral sequence (98% identity over 20 bp) between sites when all viral sequences (4,577 in total) were assembled together. While we have no way of estimating the global richness of these groups, the lack of overlap in observed OTUs between sites tells us that the global diversity of each of these groups must be very high. The century-old speculation that the global diversity of the smallest organisms should be relatively low (22) appears to be incorrect.
The estimated number of bacterial OTUs in the three plots (10⁴ unique OTUs [Fig. 2a]) closely matches the estimates obtained in other studies (59, 60). Our estimates of fungal richness are substantially higher than estimates obtained using classical taxonomic approaches (a maximum of 3,000 fungal species identified from a single 400-ha site) (25), confirming the results of other studies showing that molecular surveys can uncover a large pool of fungal diversity that has been overlooked (2, 33, 40, 43). Soil archaea also appear to have an equivalent, if not greater, OTU richness than soil bacterial communities, consistent with the high levels of phylogenetic diversity observed in other studies of soil archaea (46, 61). To our knowledge, there are no comparable studies of phylogenetic richness in soil viral communities. However, it is important to note that because we examined only viruses with double-stranded DNA, the true richness of viral communities at each site is likely to be even higher than our estimates.
Of the three soils examined, no individual soil harbored the most diverse community of microorganisms. The estimated number of OTUs was highest in the desert soil for archaea, the prairie soil for fungi, and the rainforest soil for viruses, while the richness of bacterial OTUs was very similar across the three soils (Fig. 2a). Due to a paucity of studies comparing microbial diversity across soils from different ecosystems and the large number of possible mechanisms that may influence levels of taxonomic richness, it is unclear how to interpret these results. Fierer and Jackson (21) found the lowest levels of bacterial diversity in rainforest soils, but their study (which estimated diversity by terminal restriction fragment length polymorphism fingerprinting) was not necessarily examining diversity at the same level of taxonomic resolution as in this study. The high estimated richness of archaeal OTUs in the desert soil is surprising considering the challenging nature of this environment, but other studies have also observed high levels of archaeal diversity in soils and other environments that are likely to be suboptimal for microbial growth (50, 61). The fungal results (Fig. 2a) are consistent with a study by Jumpponen and Johnson (33) in which high fungal diversity was also observed in soils collected from Konza Prairie, KS.
To our knowledge, this is the first study to use sequencing to characterize soil viral communities. TBLASTX comparison of the soil sequences against the GenBank nonredundant database revealed that the majority of the viral sequences showed no significant similarity to previously described sequences (E value of <0.001). Among the identifiable hits, there were numerous similarities to phages (viruses that infect bacteria) (Table 4) and to herpesviruses (data not shown). While there was very little overlap in viral sequences (98% identity over 20 bp) between sites (see above), comparison of the sequences against a database containing the genomes of 510 completely sequenced phages demonstrated that similar types of phages were found in all three soil types (Table 4; Fig. 4). The most abundant phage types observed in the soil samples were similar to phages that infect the soil bacteria Actinoplanes, Mycobacterium, Myxococcus, and Streptomyces, as well as the halophilic archaeon Haloarcula (Table 4). The phage types observed in the soil samples were significantly different from the dominant types found in marine or fecal samples (8, 9, 11) (Table 4; Fig. 4), suggesting that distinct habitat types harbor distinct viral communities.
TABLE 4. Comparison of viral communities from soil and other environments
FIG. 4. Hierarchical clustering showing the phylogenetic distance between viral communities from soil (this study), marine sediment (8), human fecal samples (9), and seawater environments (11). Distances were estimated with the weighted UniFrac algorithm (38, 39) using only those sequences from the metagenomic libraries with significant hits to the Phage Proteomic Tree (http://phage.sdsu.edu/oceanviruses) to generate the input phylogenetic trees. A sequence jackknifing technique was applied to each cluster to determine the sensitivity of the relationships to sample size. Asterisks indicate that the nodes are well supported, having been observed in >95% of the jackknifing runs. The soil viral communities were significantly different from the viral communities in the other environments (P < 0.02 in all cases with the UniFrac significance test) (39).
Together our results confirm that we have only begun to explore the diversity of soil microorganisms. In an individual sample, our data suggest that the actual number of archaeal, fungal, bacterial, and viral "species" (or OTUs) exceeds the total number of microbial species that have been named to date (7,500 named archaea and bacteria combined, 80,000 fungi, and 2,000 viruses) (12, 19, 20). Clearly, the majority of the microbial diversity on Earth remains undiscovered.
This work was supported by grants from the Mellon Foundation and NSF to N.F.; grants from the Mellon Foundation, NIGEC/NICCR/DOE, IAI, and NSF to R.B.J.; and grants from the Gordon and Betty Moore Foundation and NSF to F.R.
Published ahead of print on 7 September 2007.