Applied and Environmental Microbiology, January 2007, p. 278-288, Vol. 73, No. 1
0099-2240/07/$08.00+0 doi:10.1128/AEM.01177-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Ingela Dahllöf,4
Carola Holmström,1,2
W. Ford Doolittle,3 and
Staffan Kjelleberg1,2*
School of Biotechnology and Biomolecular Sciences,1 The Centre for Marine Biofouling and Bio-Innovation, University of New South Wales, Sydney 2052, Australia,2 Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry, Dalhousie University, 5859 University Avenue, Halifax B3H 4H7, Canada,3 The National Environmental Research Institute, Roskilde, Denmark4
Received 22 May 2006/ Accepted 21 October 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Pauling and Zuckerkandl (45) first proposed the use of gene sequences as a molecular clock to decipher phylogenetic relationships. Woese and Fox (43, 44) introduced the use of rRNA genes for this purpose, which served as the basis for their definition of the three domains of life. The notion that rRNA genes could identify an organism by reconstructing its phylogeny, along with the possibility of storing sequences in databases, resulted in the rapid adoption of the 16S rRNA gene by microbiologists. This gene has now established itself as the "gold standard," not only in bacterial phylogeny but also in microbial ecology studies.
However, none of the 16S rRNA-based molecular methods allows for an accurate representation of microbial communities. While bias is introduced into molecular community analysis by many mechanical factors, such as sample handling, fixation, DNA extraction, and PCR (9, 12, 13, 15, 28, 42), it is also created by the existence of multiple heterogeneous copies of the 16S rRNA gene within a genome (10, 11).
The implications of using a gene displaying intragenomic heterogeneity for fingerprinting methods used in molecular community analysis were first described by Dahllöf et al. (11). They demonstrated that single species could produce complex banding patterns with DGGE, similar to those reported for whole communities. Subsequently, Crosby and Criddle (10) used an artificial community of bacteria whose whole genomes have been sequenced completely to determine the factors causing both over- and underestimates of diversity with various fingerprinting methods.
Most researchers currently adopt an OTU definition based on 16S rRNA gene sequence identity, usually considering organisms displaying 97 to 98% identity in this gene to be part of the same OTU (16). However, recent evidence suggests that even when taking into account intragenomic heterogeneity, clusters of sequences with 99% sequence identity reveal extensive ribotype microdiversity, which potentially underlies important ecological differentiation (2). For example, in a natural Vibrio splendidus population, even single ribotypes (unique 16S rRNA gene sequences) present important genotypic variations (38). The case for greater diversity than can be detected using current 16S rRNA gene-based molecular community analysis techniques is also supported by multilocus sequence analysis (MLSA) studies, which involve the sequencing of a number of genes coding for proteins with housekeeping functions to assess diversity in collections of isolates (23, 36). Such studies have identified organisms with identical 16S rRNA gene sequences that have significant sequence divergence in protein-encoding genes. Although the 16S rRNA gene is by far the most frequently used gene, methods in molecular microbial ecology are not limited to this gene. Alternative core housekeeping genes, such as the RNA polymerase β subunit gene (rpoB), have been used with DGGE (7, 11, 25, 31, 32).
The use of a single-copy gene for community analysis is an important milestone in microbial ecology, as it could allow for the accurate measurement of diversity and phylogenetic relationships, avoiding a loss in phylogenetic resolution and biases in diversity measurements due to the presence of intragenomic heterogeneity. However, the criteria for which the 16S rRNA gene was originally selected must be established for alternative markers. In this study, a comparison between the 16S rRNA and rpoB genes is performed in order to evaluate the use of an alternative gene as a marker for molecular microbial ecology.
This study addresses four issues. Firstly, the localization of intragenomic heterogeneity within specific regions of the 16S rRNA molecule is explored. Secondly, we compare how well the 16S rRNA and rpoB genes and their fragments used for DGGE reconstruct bacterial phylogenies. Thirdly, the influence of 16S rRNA gene intragenomic heterogeneity on bacterial phylogeny at the subspecies level is determined. Finally, we address whether the rpoB gene fulfills the criteria required for a molecular marker in microbial ecology.
| MATERIALS AND METHODS |
|---|
Recoding of alignment positions displaying intragenomic heterogeneity.
Multiple copies of the 16S rRNA gene present in a single organism were condensed to a single consensus sequence. When heterogeneity was present between multiple copies found in a given genome, the specific positions at which this heterogeneity occurred were recoded to denote the multiple character states observed for that position. The following code, which is recognized by the PAUP* program (35) used to compile phylogenetic trees, was used for recoding: R = AG, Y = CT, M = AC, K = GT, W = AT, S = CG, B = CGT, D = AGT, H = ACT, V = ACG, and N = ACGT.
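As an illustration of this recoding step, the following Python sketch condenses aligned gene copies into a single sequence using the ambiguity codes listed above. The function name `recode_consensus` and the toy sequences are assumptions made for illustration and are not part of the original workflow. The sketch assumes ungapped, uppercase A/C/G/T input.

```python
# Map each set of observed nucleotides to the ambiguity code listed in the text.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def recode_consensus(copies):
    """Condense aligned, equal-length gene copies into one recoded sequence.

    Hypothetical helper: assumes ungapped, uppercase A/C/G/T input.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    # Each alignment column becomes the code for the set of states seen in it.
    return "".join(IUPAC[frozenset(column)] for column in zip(*copies))


# Toy example: three made-up copies that differ at two positions.
copies = ["ACGTA", "ACGTA", "ATGCA"]
print(recode_consensus(copies))
```

Positions at which all copies agree keep their original nucleotide, so a genome without intragenomic heterogeneity is returned unchanged.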
Mutational saturation analysis.
The first and third positions within codons, which are synonymous for many amino acids, display higher mutational rates than the second position. If the sequences compared are sufficiently distant, then multiple successive nucleotide substitutions can occur at the same position after divergence, making the position uninformative and/or misleading with regard to phylogenetic analyses. We therefore tested whether the rpoB data sets used for phylogenetic analysis were affected by mutational saturation. Saturation analysis was done for each codon position of the rpoB gene, using comp_mat in MUST, version 3.0 (27). The proportion of observed pairwise differences between rpoB sequences was calculated with an uncorrected neighbor-joining method using MUST 3.0 and was plotted against the proportion of inferred substitutions estimated by maximum parsimony using PAUP* 4.0b10 (35). If no linear fit could be found between observed and inferred nucleotide substitutions, the codon position tested was considered to present some degree of mutational saturation.
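The "observed" axis of such a saturation plot can be sketched in Python as the uncorrected proportion of differing sites at one codon position. The helper `codon_position_p_distance` and the toy sequences are hypothetical, and this is not the MUST implementation. The inferred substitutions, which the study estimated by maximum parsimony in PAUP*, are not computed here.

```python
def codon_position_p_distance(seq_a, seq_b, position):
    """Proportion of differing sites at one codon position (1, 2, or 3).

    Hypothetical helper: assumes in-frame, aligned, ungapped coding sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    # Take every third nucleotide, starting at the requested codon position.
    sites_a = seq_a[position - 1::3]
    sites_b = seq_b[position - 1::3]
    if not sites_a:
        raise ValueError("no sites at this codon position")
    differences = sum(a != b for a, b in zip(sites_a, sites_b))
    return differences / len(sites_a)


# Toy example: two made-up coding sequences that differ only at third positions.
a = "ATGGCTAAA"
b = "ATGGCCAAG"
for pos in (1, 2, 3):
    print(pos, codon_position_p_distance(a, b, pos))
```

Plotting these values for every sequence pair against the corresponding inferred substitutions would give one saturation plot per codon position.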
Phylogenetic analyses.
Phylogenetic analyses were performed at the DNA level for the 16S rRNA gene data sets and some of the rpoB data sets. For the rpoB gene, third codon positions that did not display mutational saturation were included in the analyses. The trees were constructed with PAUP* 4.04b, applying the heuristic search option and using the TBR branch-swapping algorithm. Maximum likelihood was used as the tree reconstruction method, with the nucleotide substitution model (GTR), among-sites rate variation parameter (G), proportion of invariable sites (I), and nucleotide frequencies determined using MODELTEST (29). The confidence of each node was determined by building a consensus tree of 100 maximum likelihood trees from bootstrap pseudoreplicates of the original data set.
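For readers unfamiliar with bootstrap pseudoreplicates, the resampling idea can be sketched in Python as follows: alignment columns are drawn with replacement until the pseudoreplicate has the same length as the original. The function name, the dictionary layout, and the toy alignment are assumptions for illustration. The study itself relied on PAUP* and PHYLIP, and tree inference on each pseudoreplicate is outside this sketch.

```python
import random


def bootstrap_pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    `alignment` maps a taxon name to its aligned sequence (hypothetical layout).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # The same column indices are applied to every taxon so columns stay intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment with made-up sequences.
alignment = {"taxon1": "ACGTACGT", "taxon2": "ACGTTCGA", "taxon3": "ACCTACGA"}
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
replicates = [bootstrap_pseudoreplicate(alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Each of the 100 pseudoreplicates would then be analyzed with the same tree-building settings, and the resulting trees summarized as a consensus.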
Maximum likelihood phylogenetic analyses of the rpoB gene amino acid translation were performed using PROML with the JTT amino acid substitution matrix, a rate heterogeneity model with gamma-distributed rates over four categories (with the among-sites rate variation parameter estimated using TREE-PUZZLE), global rearrangements, and randomized input order of sequences (10 jumbles). Bootstrap support values represent a consensus (obtained using CONSENSE) of 100 Fitch-Margoliash distance trees (obtained using PUZZLEBOOT and FITCH) from pseudoreplicates (obtained using SEQBOOT) of the original alignment. The settings of PUZZLEBOOT were the same as those used for PROML, except that neither global rearrangements nor randomized input order of sequences is available in this program. PROML, CONSENSE, FITCH, and SEQBOOT are from the PHYLIP package, version 3.6a (http://evolution.genetics.washington.edu/phylip.html). TREE-PUZZLE and PUZZLEBOOT can be obtained from the program's website (http://www.tree-puzzle.de).
Random taxon sampling analysis.
To evaluate the influence of intragenomic heterogeneity in a 16S rRNA gene phylogeny at the subspecies level, a random taxon sampling analysis of 16S rRNA paralogous gene copies from multiple Escherichia coli and Shigella flexneri strains was performed. Shigella strains, although bearing a separate genus name, have been shown to be part of the E. coli species (30). Sequences from the more distant organism Salmonella enterica serovar Typhi LT2 were also included as an outgroup for the analysis. All strains of E. coli, S. flexneri, and S. enterica serovar Typhi LT2 analyzed here possess seven copies of the 16S rRNA gene. In this analysis, one of these seven paralogous copies was randomly sampled from each strain to create a data set, for which a maximum likelihood tree was computed. The procedure was repeated 1,000 times, giving a set of 16S rRNA gene trees, with each including one randomly chosen 16S rRNA gene copy from each E. coli and S. flexneri strain. Trees were systematically rooted with randomly sampled 16S rRNA gene copies from S. enterica serovar Typhi LT2.
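The sampling step of this analysis can be sketched in Python as shown below. The strain labels and copy identifiers are placeholders, and the function name is an assumption. Computing a maximum likelihood tree for each sampled data set, as done in the study, would be delegated to external phylogenetic software and is not shown.

```python
import random


def sample_one_copy_per_strain(copies_by_strain, rng):
    """Pick one paralogous gene copy at random from each strain."""
    return {strain: rng.choice(copies)
            for strain, copies in copies_by_strain.items()}


# Placeholder identifiers standing in for the seven 16S rRNA gene copies
# found in each genome; real input would be the aligned sequences.
copies_by_strain = {
    f"strain_{s}": [f"strain_{s}_copy{i}" for i in range(1, 8)]
    for s in ("A", "B", "C")
}

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
datasets = [sample_one_copy_per_strain(copies_by_strain, rng)
            for _ in range(1000)]
print(len(datasets), datasets[0])
```

Because each replicate draws the copies independently, two replicates can represent the same strain by different paralogs, which is what allows the effect of copy choice on tree topology to be measured.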
Determination of evolutionary rates across sites (sliding-window analysis).
To better define regions of the 16S rRNA and rpoB genes with higher evolutionary rates, a sliding-window analysis was performed on a data set composed of the evolutionary rates at each position in the sequence alignments. These evolutionary rates were calculated for each position of the RpoB amino acid alignment and the 16S rRNA gene alignment of sequences from 50 representative species of the domain Bacteria. TREE-PUZZLE was used to assign an evolutionary rate category from 1 to 8 for each position (1 = slowest, 8 = fastest). These position and rate values were copied into an Excel spreadsheet, in which a macro was used to calculate the average evolutionary rate category for each 10 neighboring positions. The 10-position window was moved along the data set by increments of a single position, recalculating the average evolutionary rate category each time.
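The windowed averaging described above was done with a spreadsheet macro. An equivalent calculation can be sketched in Python as follows, where the function name and the example rate categories are made up for illustration.

```python
def sliding_window_means(rates, window=10):
    """Average rate category over each run of `window` neighboring positions,
    advancing one position at a time."""
    if window < 1 or window > len(rates):
        raise ValueError("window must be between 1 and len(rates)")
    total = sum(rates[:window])
    means = [total / window]
    # Slide by one position: add the entering value, drop the leaving one.
    for i in range(window, len(rates)):
        total += rates[i] - rates[i - window]
        means.append(total / window)
    return means


# Made-up rate categories (1 = slowest, 8 = fastest) for 12 alignment positions.
rates = [1, 1, 2, 8, 8, 7, 3, 1, 1, 1, 2, 5]
print(sliding_window_means(rates))
```

An alignment of n positions yields n - 9 window averages with the default 10-position window, one for each possible start position.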
| RESULTS |
|---|
Comparing rpoB and 16S rRNA gene phylogenies.
To compare the phylogenetic resolution of the rpoB and 16S rRNA genes, both full-length and DGGE fragments were used to reconstruct bacterial phylogenies at various taxonomic levels (Tables 2 and 3). The phylogeny of T-RFLP analysis-generated DNA fragments was not included, as this method generates DNA fragments of various sizes. In order to reduce overrepresentation of some phyla and make computing-intensive methods such as maximum likelihood phylogeny and maximum likelihood distance bootstrapping feasible, 50 bacterial species were selected to reconstruct the phylogeny of the bacterial domain (Fig. 3). To compare the capacities of rpoB and 16S rRNA genes to reconstruct phylogenies at the subspecies level, data sets of these genes from E. coli and Shigella flexneri strains were compiled and analyzed using various phylogenetic tools (Fig. 4).
Tables 2 and 3 summarize the comparisons of phylogenetic resolution using the full-length rpoB and 16S rRNA genes and their DGGE fragments at various taxonomic levels (domain, phylum, family, order, class, and species). The full-length rpoB and 16S rRNA genes showed equal resolution (number of tree nodes showing significant statistical support) in four of the comparisons. The rpoB gene offered better resolution in 7 of the 13 comparisons, while the 16S rRNA gene only provided enhanced resolution in 2 of the comparisons. The rpoB and 16S rRNA DGGE fragments showed similar phylogenetic resolution powers, with both fragments having improved resolution in 5 of the 13 comparisons and 3 of the comparisons having equal phylogenetic resolution. Interestingly, the rpoB and 16S rRNA-DGGE fragments showed comparable phylogenetic resolution to that of the full-length rpoB and 16S rRNA genes for several data sets, including those for the Actinobacteria, Chlamydiaceae, Bacillales, Lactobacillales, Streptococcus, Mycoplasmataceae, Alphaproteobacteria, and Enterobacteriales.
Intragenomic heterogeneity displayed by the 16S rRNA gene is most likely to affect the fine-scale phylogeny of closely related organisms, as their 16S rRNA gene intergenomic heterogeneity could be comparable to intragenomic heterogeneity. For this reason, we used the more exhaustively sampled species E. coli/S. flexneri (Shigella strains are part of the E. coli species [30]) as a data set for various phylogenetic methods to determine if their fine-scale phylogeny could be resolved. We recovered 17 unique 16S rRNA gene sequences from the five E. coli/S. flexneri strains in the data set (Fig. 4A), which would cause diversity to be grossly overestimated when 100% 16S rRNA gene sequence identity is used to define an OTU. Additionally, intragenomic and intergenomic heterogeneity levels appear to be of comparable importance, given that the 16S rRNA gene copies from each strain do not form monophyletic clades. There is also little bootstrap support for the nodes, with the outgroup alone being supported with statistical significance as a monophyletic clade. If a single 16S rRNA gene is chosen to represent a strain, the phylogeny obtained can change significantly depending on which copy was chosen. This is illustrated by the results of our random taxon sampling analysis (Fig. 4B). In this analysis, one of the seven 16S rRNA paralogous gene copies found in each E. coli/S. flexneri genome was randomly chosen for phylogenetic analysis, and this procedure was replicated 1,000 times. The trees obtained in these analyses differed significantly from each other, as a consensus of these trees was poorly supported (Fig. 4B). Recoding nucleotide positions that displayed intragenomic heterogeneity as ambiguous also yielded a poorly resolved tree (Fig. 4C). All of the methods employed here clearly show that the 16S rRNA gene cannot resolve the relationships among E. coli/S. flexneri strains. The rpoB gene, for its part, can resolve the relationships between some of these strains with strong statistical support (Fig. 4D).
| DISCUSSION |
|---|
There are two main hypotheses used to explain the existence of multiple rRNA operons within a genome, as follows: (i) multiple rRNA operons provide a multiplier effect on translation, allowing a bacterium to grow rapidly in response to environmental change (20); and (ii) functional differentiation between rRNA operons allows for differential expression of rRNA operons in response to environmental change (18). For the first hypothesis to be true, genetic drift would have to be counteracted by gene conversion to give rise to rRNA operons with little sequence heterogeneity distributed randomly across their length. The second hypothesis predicts that selection of functionally differentiated rRNA operons would lead to the concentration of heterogeneous positions in specific regions and to differentiated stem and loop structures between rRNA molecules.
We have found evidence for both scenarios. The 38% of bacterial genomes which contain more than one 16S rRNA gene copy but display no sequence heterogeneity between copies suggest that multiple rRNA operons can exist because of their multiplier effect on translation. However, we have not examined promoters, ribosomal proteins, or the 5S and 23S rRNA genes, which could also be important for functional differentiation between ribosomal operons. For genomes with more than one rRNA operon, 62% display some degree of sequence divergence (0 to 11.6%) between the intragenomic 16S rRNA gene copies. While heterogeneous positions are found throughout the length of 16S rRNA gene copies, localized regions display above-average numbers of heterogeneous positions (helices 6, 9 to 11, 17, 33, 34, 39, and 41) (Fig. 1). Such hot spots could result from the accumulation of neutral nucleotide substitutions or recombination. In haloarchaea, hot spots have been found to result from recombination (6). For the firmicute Thermoanaerobacter tengcongensis, about half of the 11.6% pairwise sequence difference between its two most divergent 16S rRNA gene copies is a result of large insertions in one of these copies. These large inserts form secondary structures (3) suggestive of a functional role in the ribosome.
It has yet to be shown whether paralogous copies of rRNA operons are co- or differentially expressed under various environmental conditions in prokaryotes. However, such a link between functionality and intragenomic rRNA divergence has been observed in the apicomplexan parasite Plasmodium berghei (18). Its two types of 18S rRNA genes (which differ at 5.0% of their nucleotide positions) are preferentially expressed in different stages of the life cycle of this eukaryotic parasite (18).
Comparing the rpoB and 16S rRNA genes as phylogenetic markers.
The rpoB and 16S rRNA genes were compared at various taxonomic levels for the ability to resolve bacterial phylogeny. For a total of 13 data sets, the rpoB gene provided more phylogenetic resolution than the 16S rRNA gene in 7 cases, equal resolution in 4 cases, and lower resolution in 2 cases (Table 2). Importantly, it resolved more of the relationships in the comparison at the domain level (Bacteria), which is the level at which most molecular microbial ecology studies occur.
A different picture emerges from a comparison of the 16S rRNA- and rpoB-DGGE fragments, with each yielding better resolution in five of the comparisons and being equal in three of them. The 16S rRNA-DGGE fragment produced enhanced resolution in the comparison at the domain level (Bacteria), although both fragments had poor resolution at this level. This is not surprising, as the rpoB- and 16S rRNA-DGGE fragments are very small (380 and 550 bp, respectively).
It should be noted that neither the rpoB nor the 16S rRNA gene can resolve all of the phylogenetic relationships in a given data set, whether as a full-length gene or as a DGGE fragment; each generally resolves only about half of them. Phylogenetic studies should therefore be cautious in arriving at conclusions on the origins of environmentally derived sequences and should employ rigorous phylogenetic methods involving maximum likelihood-based algorithms to determine relationships between sequences. Also, a measure of statistical confidence, such as bootstrapping, should always be included in phylogenetic analyses.
The influence of intragenomic heterogeneity displayed by the 16S rRNA gene on bacterial phylogeny was also assessed, as most heterogeneous positions are localized within fragments commonly used for microbial ecology studies (namely, clone libraries, DGGE, and T-RFLP analyses) (Fig. 2). Phylogenetic analysis of all 16S rRNA gene copies present in the five strains of E. coli/S. flexneri of our data set (17 unique sequences) presented multiple clusters containing sequences from different strains. This clustering of sequences from various strains demonstrates a lack of correlation between the origin of 16S rRNA genes and their nucleotide sequences at the subspecies level for the E. coli/S. flexneri group. Intragenomic heterogeneity is therefore likely to affect the fine-scale phylogeny of closely related bacteria. For such closely related organisms, intragenomic heterogeneity can be as significant as intergenomic heterogeneity, as we have shown here for the E. coli/S. flexneri strains. This suggests that one should use caution in interpreting environmental 16S rRNA gene sequence-derived phylogenies at the species level or lower. For some prokaryotes, such as the archaeal order Halobacteriales, levels of 16S rRNA intragenomic heterogeneity are so high (>5.0%) that they can even affect phylogeny at the genus level (6, 40).
The rpoB gene did resolve several of the relationships among the E. coli/S. flexneri strains with strong statistical support (Fig. 4D). We therefore suggest rpoB as an additional or alternative marker in ecological studies, as it is capable of deciphering fine-scale phylogenetic relationships that go undetected using the 16S rRNA gene. Indeed, the use of a marker such as rpoB would be especially important for studies focusing on the genus level or lower because of the increased resolution it would provide as well as the absence of intragenomic heterogeneity as a source of bias.
Establishing rpoB as an alternative gene marker for microbial ecology.
Protein-encoding genes, such as rpoB, have several advantages over RNA-encoding genes as molecular markers. They can be used at both the amino acid and nucleotide levels for phylogenetic analysis. Protein alignments allow for resolution of relationships at higher taxonomic levels (domain or phylum) when one or more codon positions are saturated. Nucleotide-level alignments allow for fine-scale resolution, with synonymous first and third codon positions allowing for nearly neutral mutations which can be detected between very closely related organisms (species level or lower). This is likely the reason why rpoB performed better than the 16S rRNA gene in resolving relationships at the subspecies level. The resolution provided by rpoB can also be increased further by sequencing other protein-encoding genes and examining allelic profiles of different isolates rather than using single-gene sequence comparisons (36, 38). Although such MLSA cannot be done directly on an environmental sample, a marker used to describe diversity in an environmental sample (such as rpoB) can also be included in the set of genes used for MLSA of a collection of isolates. The improved phylogenetic resolution at the subspecies level provided by rpoB seems to be of increasing importance, with recent studies suggesting important ecological or genotypic differences in organisms with identical 16S rRNA gene sequences (ribotypes) (e.g. see reference 23).
Our analyses also demonstrate that rpoB displays other important characteristics as an ecological marker, including (i) its universal presence in all prokaryotes; (ii) the presence of slowly and quickly evolving regions for the design of probes and primers of differing specificities; (iii) having a housekeeping function, making it less susceptible to some forms of lateral gene transfer; and (iv) a large enough size to contain phylogenetic information, even after removal of regions that are difficult to align.
There are a few disadvantages in using rpoB (or another protein-encoding gene with similar characteristics) over the 16S rRNA gene as a molecular marker for microbial ecology studies. The single major obstacle resides in a fundamental property of protein-encoding genes, i.e., the saturation of all third codon positions over a long evolutionary timescale, which makes it more difficult to design universal primers for rpoB. Nonetheless, rpoB primers that can amplify a range of bacterial groups have been successfully used for DGGE studies (7, 11, 25, 31, 32). Alternative rpoB primers targeting the most conserved amino acid motifs of the RpoB protein have been developed and tested on most culturable bacterial phyla (R. J. Case, unpublished data). Additionally, Santos and Ochman (33) have developed a set of 10 primer pairs targeting protein-encoding housekeeping genes (including rpoB) to overcome limitations of the 16S rRNA genes to identify and classify bacteria.
A distinct advantage of using 16S rRNA as an ecological marker is the high concentration of targets in cells for FISH analyses. However, the development of more sensitive methods, such as catalyzed reporter deposition-fluorescence in situ hybridization (CARD-FISH), should also allow for detection of organisms using protein-encoding genes (14, 26). Although mRNA for the rpoB gene would not be as abundant as rRNA in prokaryotic cells, it has been shown that this mRNA is relatively stable and should be present at significant concentrations, regardless of the growth phase of the host (24). Interestingly, we noticed that heterogeneous regions between the seven 16S rRNA gene copies in E. coli coincided with regions of poor FISH probe accessibility (data not shown) described by Behrens et al. (5), where probe accessibility was measured as a function of probe signal. However, a poor probe signal could result from a heterogeneous pool of 16S rRNAs, as Behrens et al. later showed that poor accessibility of probes did not result from the ribosome's tertiary structure (4).
There is little doubt of the scientific contributions made by studies using the 16S rRNA gene, as molecular studies have circumvented the need to culture bacteria. Central to this contribution is our appreciation of the extent of and ability to catalogue microbial diversity by storing 16S rRNA gene sequences in free and accessible databases. However, the growing number of sequenced genomes and environmental metagenomic libraries provides us with the necessary reference sequences to use other genes for molecular microbial ecology. Also contributing to the accumulation of sequences for alternative molecular markers are MLSA studies using protein-encoding genes.
Alternative markers in molecular microbial ecology: the way forward.
Lateral gene transfer has uncoupled function from phylogeny for prokaryotes, so the identity of a bacterium responsible for a function is unlikely to be uncovered through diversity studies. This has already led researchers to use functional genes, such as the [NiFe] hydrogenase gene, amoA, pmoA, nirS, nirK, nosZ, and pufM, in conjunction with DGGE (1, 19, 22, 39, 41) and to detect pmoA and dysB mRNAs with CARD-FISH (14, 26), leading the way for in situ identification of protein-encoding genes. The potential for alternative housekeeping genes to study diversity or the use of operational genes in molecular microbial ecology is an exciting prospect. This study demonstrates that some of them, with characteristics similar to those of rpoB, would have the potential to yield the same phylogenetic information as the 16S rRNA gene, with enhanced resolution for fine-scale analyses.
| FOOTNOTES |
|---|
Published ahead of print on 27 October 2006.
Present address: Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney 2109, Australia.
| REFERENCES |
|---|