Applied and Environmental Microbiology, September 2008, p. 5422-5428, Vol. 74, No. 17
0099-2240/08/$08.00+0 doi:10.1128/AEM.00410-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Department of Microbiology and Molecular Genetics, Oklahoma State University, 1110 S. Innovation Way, Stillwater, Oklahoma 74074,1 Department of Botany and Microbiology and Institute for Energy and The Environment, University of Oklahoma, 770 Van Vleet Oval, Norman, Oklahoma 73019,2 Department of Chemistry and Biochemistry and the Advanced Center for Genome Technology, University of Oklahoma, 101 David L. Boren Blvd., Norman, Oklahoma 73019,3 Department of Microbiology, University of Massachusetts, 639 North Pleasant Street, Amherst, Massachusetts 01003,4 Microbiology Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 993545
Received 18 February 2008/ Accepted 26 June 2008
Within virtually all ecosystems, little information is currently available on the composition, origins, dynamics, and ecological roles of members of the rare biosphere. To investigate these issues, a detailed analysis examining the presence and prevalence of previously unidentified lineages within the rare biosphere (i.e., novelty) and the phylogenetic and evolutionary relationships between rare and abundant members within a specific microbial community (i.e., uniqueness) is needed. Examination of the novelty and uniqueness of the rare biosphere in a specific habitat will require extensive sampling efforts to access this fraction of the community. Pyrosequencing and other sequence tag-based approaches produce hundreds of thousands of short sequences (100 to 250 bp in length) (2, 14, 27, 32). As such, these data sets are helpful for comparative analysis of bacterial communities (19), species richness, and coverage estimates, as well as for an overall description of phylum- and class-level diversity. However, sorting a specific sequence into the correct taxonomic bin is often unreliable when short fragments are used, except for queries with high similarity to sequences currently available in public databases. The short amplicon size severely limits the utility of these sequences in satisfactorily documenting the presence of novel lineages, and hence sequences with low database similarity are usually sorted into "unclassified" or "other" categories in these studies (14, 27). In addition, the level of sequence divergence, and hence operational taxonomic unit (OTU) assignments, at various taxonomic cutoffs is not always comparable between pyrosequenced fragments and near-full-length 16S rRNA gene sequences (see File S1 in the supplemental material).
To avoid these limitations, we chose to utilize a capillary sequencing-based approach to investigate the nature of the rare biosphere in soil. Soil is an extremely valuable ecosystem for economic sustainability as well as for global nutrient cycling. The soil microbial community is extremely complex, and rRNA gene clone library-based estimates of species diversity range between 3,000 and 52,000 (27, 35). Soil is also one of the most intensively sampled ecosystems: the RDP project II release 9.56 (November 8, 2007) lists at least 77,692 16S rRNA gene sequences originating from soil. However, the vast majority of soil surveys generated small-sized clone libraries (less than 500 clones), with the exception of a composite library of 1,700 clones and a 1,033-clone library from Minnesota farm soil and Alaskan soil, respectively (29, 35). As such, the current collection of long (>300-bp) 16S rRNA gene sequences available in public databases can be regarded as a vast global survey of numerically abundant microorganisms in various soils as well as other habitats.
Here, we report our analysis of 13,001 near-complete 16S rRNA gene clones from an undisturbed tall grass prairie soil in central Oklahoma. The data set is one-fourth the size of the largest pyrosequencing-based soil data set recently reported from a boreal forest in northwestern Canada (27) and eight times the size of the largest published near-complete 16S rRNA gene clone library, which was derived from Minnesota farm soil (35). The analysis describes the novelty and uniqueness patterns observed within the community and suggests how these observed patterns could hold clues regarding the origins and potential ecological roles of rare members of the soil biosphere.
Sampling, DNA extraction, PCR amplification, library construction, and sequencing.
The top 5 cm of soil was scooped using a sterile spatula into a sterile 50-ml Falcon tube, stored on dry ice, and transferred to the laboratory, where it was stored at –20°C. The sample did not contain any grass or apparent root structures. DNA was extracted using a FastDNA spin kit for soil (Bio 101 Corp., Vista, CA). A near-complete 16S rRNA gene fragment was amplified using the primer pair 27F (AGAGTTTGATCCTGGCTCAG) and 1391R (GACGGGCGGTGWGTRCA) (17) in a 50-µl reaction mixture containing (final concentrations) 2 µl of extracted DNA, 1x PCR buffer (Invitrogen), 2.5 mM MgSO4, 0.2 mM deoxynucleoside triphosphate mixture, 2.5 U of platinum Taq DNA polymerase (Invitrogen), and 10 µM of each of the forward and reverse primers. PCR amplification was carried out according to the following protocol: initial denaturation for 5 min at 95°C, followed by 20 cycles of denaturation at 95°C for 45 s, annealing at 55°C for 45 s, and elongation at 72°C for 1.5 min, followed by a final elongation step at 72°C for 15 min. PCR products obtained were cloned into a TOPO-TA cloning vector according to the manufacturer's instructions (Invitrogen Corp., Carlsbad, CA) and sequenced at the Department of Energy Joint Genome Institute (Walnut Creek, CA) as previously described (35).
Phylogenetic analysis.
The data set was initially run through the RDP classifier (36), and each clone was sorted into bins based on the resulting preliminary taxonomic affiliations. Each RDP classifier-generated bin was treated as a single data set and aligned using Greengenes NAST-aligner to a 7,682-character global alignment (7). Sequences were assigned to phyla and candidate phyla according to the Hugenholtz taxonomy framework (8), as well as by importing them to the Greengenes May 2007 ARB database in the ARB software package (version 06.03.22) (21), and determining their position after parsimony insertion into the universal ARB dendrogram. Sequences with less than 90% sequence identity with their closest relatives were further probed by comparing them to entries in the GenBank nr database using the BLASTn function (1). The combined use of Greengenes classifier, BLAST, and ARB resulted in sorting of all sequences into phyla and candidate phyla except for 15 sequences putatively identified as members of novel candidate phyla. Potential chimeric sequences within the data sets were identified from NAST-aligned sequences using the program Mallard (3), and chimeric sequences were removed from the data set. Distance matrices from aligned, chimera-checked sequences were generated using the "create distance matrix" function on the Greengenes web server. The resulting distance matrices were used to generate OTUs at different taxonomic cutoffs using the DOTUR program (28). The rarefaction curve for the entire KFS data set was constructed using the Analytic Rarefaction software available from the University of Georgia Stratigraphy Laboratory with the cumulative DOTUR output of all bacterial phyla. Chao and ACE estimates of species richness were calculated using the program EstimateS (5).
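As a rough illustration of the kind of richness estimate mentioned above, the following sketch computes the bias-corrected form of the Chao1 estimator from a list of OTU sizes (clones per OTU). It is not the EstimateS implementation used in this study, which may apply a different variant; the function name and the toy OTU sizes are hypothetical.

```python
from collections import Counter


def chao1(otu_sizes):
    """Bias-corrected Chao1 richness estimate.

    otu_sizes: number of clones in each observed OTU.
    Formula: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs.
    """
    sizes = [s for s in otu_sizes if s > 0]
    s_obs = len(sizes)          # observed number of OTUs
    freq = Counter(sizes)
    f1 = freq[1]                # OTUs represented by exactly one clone
    f2 = freq[2]                # OTUs represented by exactly two clones
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy OTU sizes, invented for illustration only (not KFS data).
toy_otu_sizes = [1, 1, 1, 2, 2, 5, 10]
print(chao1(toy_otu_sizes))
```

The more singletons a library contains relative to doubletons, the larger the estimated number of unseen OTUs, which is why large libraries rich in rare OTUs yield high richness estimates.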
Phylogenetic trees were constructed by importing KFS NAST-aligned sequences to the May 2007 ARB database in the ARB software package (21). Sequences were initially inserted to the universal ARB dendrogram using the ARB parsimony function, and phylogenetic trees were subsequently constructed from KFS sequences and closely related sequences using the ARB neighbor-joining (ARB-NJ) method with a Lane mask filter (17). Phylogenetic affiliations of novel phyla and subphyla lineages were further evaluated by exporting aligned sequences of KFS as well as their closest relatives from the ARB database into the PAUP 4.01 software package (Sinauer Associates, Sunderland, MA). Evolutionary distance trees and maximum parsimony trees were constructed from the data set, and the bootstrap values (100 replicates) were determined. Novel phyla described in this study remained monophyletic, with >50% bootstrap support upon using all previously described tree-building approaches, as well as the alteration of the composition and size of the data set used for phylogenetic analysis. Each new candidate phylum had at least two sequences and was unaffiliated with any of the previously described bacterial phyla and candidate phyla (15).
qPCR.
DNA was extracted from the KFS soil samples in triplicate using the MoBio UltraClean soil DNA isolation kit and then purified with the MoBio Power Clean DNA cleanup kit (MoBio, Carlsbad, CA). Extracts were pooled into one sample prior to use in quantitative PCR (qPCR). qPCR was performed in triplicate using a MyIQ real-time PCR system (Bio-Rad, Hercules, CA). The general bacterial 16S rRNA gene primers EUB338F (5'-ACTCCTACGGGAGGCAGCA) and EUB518R (5'-ATTACCGCGGCTGCTGG) (11) were used for amplification. Each reaction mixture (25-µl total volume) consisted of 12.5 µl IQ SYBR Green Supermix (Bio-Rad), 8.5 µl of water, 0.75 µl of each primer (Invitrogen), and 2.0 µl of DNA. qPCR conditions were 5 min at 95°C, followed by 40 cycles of 95°C for 30 s, 54°C for 30 s, and 72°C for 30 s. Tenfold serial dilutions of pCR4-TOPO plasmid (Invitrogen Corp., Carlsbad, CA) containing a EUB338/EUB518 PCR fragment amplified from the Escherichia coli 16S rRNA gene (strain K-12 MG1655) were used to construct standard curves. The copy number in the KFS was determined from the standard curve and subsequently standardized to copy numbers per gram of dry soil.
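The standard-curve calculation described above can be sketched as follows: fit threshold cycle (Ct) against log10 of plasmid copy number, invert the fit for a sample Ct, and scale to copies per gram of dry soil. This is a minimal sketch under stated assumptions, not the authors' analysis. The Ct values, elution volume, and soil mass are invented for illustration; only the 2.0-µl template volume comes from the text.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_gram(copies_per_reaction, template_ul, elution_ul, dry_soil_g):
    """Scale copies per reaction to copies per gram of dry soil."""
    return copies_per_reaction * (elution_ul / template_ul) / dry_soil_g


# Hypothetical tenfold dilution series and Ct readings (invented values).
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [28.0, 24.7, 21.4, 18.1, 14.8]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
sample_copies = copies_from_ct(20.0, slope, intercept)

# elution_ul and dry_soil_g are assumptions, not values from the study.
print(copies_per_gram(sample_copies, template_ul=2.0, elution_ul=50.0, dry_soil_g=0.25))
```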
Novelty estimates.
Novelty (sequence divergence between a specific OTU and its closest relative in public databases) was determined by identifying the closest relative of each OTU within the 151,925 chimera-checked, near-full-length 16S rRNA gene sequences available in the Greengenes database (August 2007) using the Classifier program (8). Novelty was also determined against the noncurated and frequently updated collection of complete and partial 16S rRNA gene sequences available in the GenBank nr database in September 2007 (see File S1 in the supplemental material).
Defining rare and abundant species in Kessler farm soil.
We reasoned that OTUs labeled as rare in the KFS data set should represent OTUs with a low probability of being encountered in average-sized clone libraries. Using the formula $P = 1 - (1 - x)^{y}$, where P is the probability of detecting a species with relative abundance x in the large data set in a small data set of size y (33), we determined the probability of encountering OTUs with different occurrences in the KFS data set in smaller clone libraries. OTUs occurring once in the KFS data set have only a 0.77% probability of being encountered in a 100-clone library. Similarly, OTUs with two, three, four, and five clones have 1.53, 2.28, 3.03, and 3.77% chance of being encountered in a 100-clone library. Therefore, while not counting each microorganism present in KFS soil, we decided to empirically define rare species at a lower cutoff of n = 1 and a higher cutoff of n ≤ 5. In this sense, OTUs present at such low abundances have rarely been sampled from soils and other environments.
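The detection probabilities quoted above can be reproduced with a few lines of code, reading the formula as P = 1 − (1 − x)^y with x equal to the number of clones in the OTU divided by the 13,001 clones in the KFS data set and y = 100. The function and variable names below are illustrative choices, not part of the original analysis.

```python
def detection_probability(clones_in_otu, large_library_size, small_library_size):
    """P = 1 - (1 - x)**y, where x is the relative abundance of the OTU in the
    large library and y is the size of the smaller library."""
    x = clones_in_otu / large_library_size
    return 1 - (1 - x) ** small_library_size


KFS_CLONES = 13_001  # size of the KFS data set reported in the text

for n in range(1, 6):
    p = detection_probability(n, KFS_CLONES, 100)
    print(f"n = {n}: {100 * p:.2f}% chance of detection in a 100-clone library")
```

Because x is very small, P is close to x·y, so the probability grows almost linearly with the number of clones in the OTU over this range.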
Nucleotide sequence accession numbers.
Sequences obtained in this study have been deposited in the GenBank database under accession numbers EU131915 to EU135578.
Detailed phylogenetic analysis grouped KFS clones into 34 different phyla and subphyla (Fig. 1A). Of these, 15 phyla have cultured representatives, 14 are previously described candidate phyla with no cultured representatives, and 5 phyla are novel (novel candidate phyla KFS1 to KFS5; see below). The phylum-level diversity in KFS is much higher than in previously reported data sets, and even higher than the total number (32 phyla) collectively compiled by Janssen (16) from a large number of 16S soil studies. However, this is not necessarily a reflection of a higher-than-expected phylum-level diversity in KFS, since it could be attributed to the ability of larger data sets to identify phyla present in extremely low abundance. In KFS, 24 phyla were present at less than 1% abundance, 14 phyla were represented by less than five clones, and 4 phyla were represented by a single clone.
FIG. 1. Kessler Farm soil overall library composition. (A) Distribution of various phyla in KFS. (B) Species distribution pattern of KFS OTU0.03 assignments. (C) Rarefaction curve at different taxonomic cutoffs.
We used two empirically defined clone abundance cutoffs of n = 1 and n ≤ 5 clones to define rare OTU0.03 within the KFS data set (see Materials and Methods). Using these values, the percentage of rare species in KFS ranges between 18.1% (n = 1) and 37.1% (n ≤ 5) of the total number of KFS clones (Table 1). Interestingly, the proportion of rare species varied widely among various major bacterial phyla in KFS (defined as phyla represented by >500 clones), with Planctomycetes having the highest percentage of rare species (34.4 to 77.8% of total Planctomycetes clones) and Acidobacteria the lowest (12.0 to 25.0% of total Acidobacteria clones).
TABLE 1. Clones belonging to rare OTU0.03s in the entire data set as well as in bacterial phyla represented by more than 500 clones in the data set
FIG. 2. (A) Correlation between novelty and abundance of clones within each OTU0.03 identified in the KFS data set. (B) Correlation between average percent sequence divergence and abundance of clones within KFS OTU0.03 assignments.
Detailed phylogenetic analysis of the KFS data set identified multiple novel lineages at the phylum and subphylum level. Except for three OTUs within the Deltaproteobacteria, all novel phylum- and class-level lineages identified were present in low abundance (see Files S4 to S6 in the supplemental material). Fifteen clones (14 OTUs) could not be placed within any of the currently described phyla in the Hugenholtz taxonomy framework in the Greengenes database (8) and as such, whether alone or with few previously unaffiliated sequences, could be grouped into five novel candidate divisions designated KFS1 to KFS5 (Fig. 3). In addition, we speculate that the future availability of sequences related to OTUs FFCH894, FFCH16611, and FFCH9315 might result in recognizing them as three additional novel bacterial phyla (Fig. 3).
FIG. 3. Distance NJ tree highlighting the phylogenetic position of five novel candidate phyla identified in KFS data set. The tree was constructed from 1,643 aligned sequences using the ARB-NJ method with Olsen correction and a Lane mask filter. Bootstrap values are based on 1,000 replicates and are shown for novel candidate phyla branches.
In addition, within the rare members of the KFS data set, we identified several clones that, although belonging to previously described lineages, have rarely been encountered in soils. Examples include clones belonging to the phyla Chlorobia, Caldithrix, Elusimicrobia, candidate phylum BRC-1, clones affiliated with the genus Salinibacter within the Bacteroidetes, and Clostridiales-affiliated clones, as well as clones belonging to the Sup-05 lineage within the Gammaproteobacteria. Interestingly, many of these clones belonged to lineages requiring specific environmental conditions (strict anaerobic conditions, high salt, high temperature) that are usually not prevalent in soil ecosystems.
Proportion of unique clones among rare members of the KFS bacterial community.
We quantified the percentage of clones that belong to rare OTUs (n = 1 and n ≤ 5) within the KFS clone library at different taxonomic cutoffs (3, 6, 8, 10, 15, 20, and 25% sequence divergence). The proportion of clones that remains assigned to rare OTUs at higher sequence divergence cutoffs represents unique clones with no close relatives among more abundant members of the community. Similarly, the drop in the number of clones assigned to rare OTUs at higher sequence divergence cutoffs represents the fraction of rare clones with close relatives among more abundant members of the community. Clones within OTUs identified as rare at a putative genus level (6% sequence divergence) represented 9.1 to 24.6% of the total KFS clones (Fig. 4A) and 50.1 to 66.1% (at n = 1 and n ≤ 5) of the rare clones at the putative species level (OTU0.03) (Fig. 4B). At the putative class level (15% sequence divergence), clones within OTUs identified as rare represented only 1.4 to 6.1% of the total KFS clones (Fig. 4A) and only 7.9 to 16.3% of the rare clones at the putative species level (Fig. 4B). These results indicate that while a fraction of the rare biosphere is closely related to more abundant species, a significant fraction is unique and represents evolutionarily distinct lineages within the KFS biosphere.
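The two ways of expressing uniqueness used above (rare clones as a share of all clones, and as a share of the clones that were rare at the 3% cutoff) can be sketched as follows. This is an illustrative calculation on invented OTU sizes, assuming the same clones are clustered at each cutoff; it is not the DOTUR-based analysis itself, and all names and numbers are hypothetical.

```python
def rare_clones(otu_sizes, max_rare_size):
    """Count clones that fall in OTUs with at most max_rare_size members."""
    return sum(size for size in otu_sizes if size <= max_rare_size)


# Toy OTU sizes for the same 40 clones clustered at two cutoffs (invented data).
sizes_at_3_percent = [1, 1, 2, 3, 5, 8, 20]
sizes_at_15_percent = [1, 4, 35]

total_clones = sum(sizes_at_3_percent)

for label, max_size in (("n = 1", 1), ("n <= 5", 5)):
    rare_003 = rare_clones(sizes_at_3_percent, max_size)
    rare_015 = rare_clones(sizes_at_15_percent, max_size)
    share_of_total = rare_015 / total_clones  # analogous to a Fig. 4A value
    share_of_rare = rare_015 / rare_003       # analogous to a Fig. 4B value
    print(f"{label}: {share_of_total:.1%} of all clones, "
          f"{share_of_rare:.1%} of clones rare at the 3% cutoff")
```

Clones that stay in small OTUs even at the broad 15% cutoff have no close relatives among the abundant OTUs, which is the sense of "unique" used in the text.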
FIG. 4. Quantification of the proportion of unique clones within the KFS rare biosphere. Rare OTUs are defined at two cutoffs: those containing a single clone (n = 1) and those containing five or fewer clones (n ≤ 5). (A) Percentage of clones belonging to rare OTUs at different taxonomic cutoffs expressed as a fraction of the total number of clones in the KFS data set (13,001). (B) Percentage of clones belonging to rare OTUs at different taxonomic cutoffs expressed as a fraction of the number of clones belonging to rare OTUs at a 97% taxonomic cutoff (OTU0.03).
Based on our evaluation of the novelty of rare clones (Fig. 2), their phylogenetic affiliations (Fig. 3; see also Files S4 to S6 in the supplemental material), as well as their relationships to more abundant members of the community (Fig. 4), we broadly identify two main groups within the KFS rare biosphere: those with close relatives among the more abundant members of the KFS bacterial community, and those that belong to unique, phylogenetically distinct lineages with no close sequence similarity to more abundant members of KFS. Using 15% sequence divergence from the closest abundant relative within the KFS data set as an empirical "uniqueness" cutoff, members of group I represent 83.6 to 92.1% and members of the second group represent 7.9 to 16.4% of the total number of rare KFS clones (at n = 1 and n ≤ 5, respectively) (Fig. 4B). Similar to more abundant members of the community, members of group I belong to common, well-described, and well-sampled soil lineages. On the other hand, members of the unique group II usually belong to novel bacterial phyla, novel lineages within previously described phyla and candidate phyla, or are members of lineages that are ubiquitous in specific environments but rarely encountered in soils. We reason that these novelty and uniqueness patterns provide clues regarding the origins and potential ecological roles of members of the soil's rare biosphere.
The close sequence similarity between nonunique members of the rare biosphere (group I) and dominant OTUs within the community argues against an old, evolutionarily distinct origin for this fraction of the rare biosphere, as previously suggested (32). Various lines of ecological evidence suggest that this group of nonunique, nonnovel members of the rare biosphere acts as a backup system and readily responds to seasonal variations encountered in soil temperature, pH, light exposure, and nutrient levels. The constant seasonal promotion of some members of group I rare species to be members of the dominant (and hence readily identifiable) taxa in soil, together with the seasonal demotion of some of the more abundant taxa in soil, is probably responsible for the observation that seasonal variations often result in significant changes in the phylogenetic affiliations of the most numerous members of the community, leading to statistically detectable differences between seasonal clone libraries from the same soil (12, 18, 30, 31). However, these seasonal cyclic changes rarely affect the fundamental soil community structure, and in all seasons, the sampled soils will still have their distinctive community composition (16). We also reason that this fine-tuning function of group I of the rare biosphere is responsible for the fact that within the thousands of soil studies conducted so far (see reference 16 for a review), no two clone libraries have had exactly the same community composition, and exact (100%) sequence matches between the most abundant species and database-deposited sequences (that broadly represent a global repository of more abundant members of soil and other communities) are very rarely encountered. The variations in physical and geochemical characteristics between different soils always select for different species as the most dominant members of the community.
Therefore, in all soil surveys reported so far, dominant species identified almost always belong to a previously unencountered strain, species, or genus within well-recognized soil lineages (and hence the tail end of the curve in Fig. 2B never reaches zero).
Within group II of the rare biosphere in the KFS data set (rare bacterial taxa with no close relatives within the dominant species), a fraction belongs to well-described phylogenetic lineages that are prevalent in other ecosystems but are rarely encountered in soil (phyla Chlorobia, Caldithrix, Elusimicrobia, candidate phylum BRC-1, Salinibacter, and Clostridiales-affiliated clones, and clones belonging to the Sup-05 lineage within the Gammaproteobacteria) (see Files S4 to S6 in the supplemental material). In addition, we speculate that since members of this group are present in a far less than ideal habitat, the majority are present in extremely low numbers and escaped detection in this study. We suggest that taxa belonging to this fraction of group II of the rare biosphere (together with other species recruited via immigration) respond to more drastic disturbances that could occur in the ecosystem. For example, desertification has been shown to consistently result in an increase in the numbers of organisms from the Deinococcus-Thermus group (26), which is otherwise rarely detected in other soil ecosystems. A change in redox potential could regenerate (among other changes) Clostridiales-affiliated cells (or spores) present in KFS in low abundance. More drastic and sustained disturbances (e.g., the occurrence of a major hydrocarbon spill and the development of anaerobic conditions in soil) initiate more radical promotion, demotion, and recruitment processes, resulting in a completely different community composition.
Finally, a fraction of group II of the rare biosphere belongs to novel bacterial lineages (phyla and subphyla) with no close relatives in the entire global 16S rRNA gene data set currently available (members of candidate phyla KFS1 to KFS5 and novel lineages within different bacterial phyla and candidate phyla) (see Files S3, S5, and S6 in the supplemental material). The ecological role of members of these novel, unique lineages is not yet known. It has been suggested that members of this group fulfill specific crucial, yet unknown functions within soil ecosystems (14, 32). Alternatively, this fraction of the rare biosphere might represent remnants of microbial evolution that, although currently out-competed in all global ecosystems, possess an exceptional ability to survive and escape extinction.
The comprehensive data set obtained in this study should prove extremely valuable in future research aimed at understanding community dynamics in response to environmental fluctuations, as well as a starting point for elucidating the physiological capabilities and metabolic potential of the numerous novel, as-yet-uncultured lineages in the rare biosphere. We are currently evaluating the effect of simulated global warming on the dynamics of the KFS bacterial community at different levels of phylogenetic resolution, ranging from the phylum level (using quantitative PCR) to the species level (using Phylochip, a comprehensive 16S rRNA gene microarray [4]). Further, targeted metagenomic approaches such as fluorescent in situ hybridization coupled to fluorescence-activated cell sorting and multiple displacement amplification, or microfluidic separation of individual cells coupled to multiple displacement amplification, are two promising approaches that could help in elucidating the metabolic potential of novel yet-uncultured groups with low abundance in soil and other complex environments (23, 25).
This work provides an overall assessment of the phylogenetic diversity and evolutionary relationships between rare and more abundant members of the soil biosphere. The data demonstrate that even in extensively studied habitats, the rare biosphere harbors novel lineages (with no representatives in the database) and unique lineages (that are evolutionarily distinct and dissimilar to more abundant members of the community). We anticipate that similar efforts in different soils will greatly expand our understanding of the nature of the soil rare biosphere. Similarly, future efforts examining the rare biosphere in ecosystems currently estimated to have a higher level of yet-unexplored bacterial diversity (see reference 20 for a list of these environments) will greatly expand our understanding of the phylum-level global bacterial diversity. The identification of multiple novel bacterial phyla and subphyla within one of the most intensively studied and sampled habitats on earth clearly indicates that while the probability of identifying novel bacterial groups within numerically abundant members of various microbial communities appears to be nearing saturation, the potential of identifying novel lineages, genes, and genomes, as well as potentially novel metabolic abilities and microbial secondary metabolites within the rare biosphere, is just starting to be realized.
Published ahead of print on 7 July 2008.
Supplemental material for this article may be found at http://aem.asm.org/.