Department of Biochemistry and Molecular Biology, Dalhousie University, and Genome Atlantic, Halifax, Nova Scotia, Canada
Received 10 February 2006/ Accepted 10 May 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
| MATERIALS AND METHODS |
|---|
A fosmid library was constructed using the CopyControl Fosmid Library production kit from Epicentre (Madison, WI), following the protocol of the manufacturer. Fosmid clone DNA was extracted using the REAL Prep 96 plasmid kit (QIAGEN, Mississauga, ON, Canada). We screened six 96-well plates for the presence of a mesotoga 16S rRNA by PCR using the Thermotogales-specific primers Balt.SSU.D.42, ATC ACT GGG CGT AAA GGG AG, and Balt.SSU.D.494, GTG GTC GTT CCT CTT TCA AT.
Subcloning of fosmids was performed using the TOPO Shotgun subcloning kit (Invitrogen, Burlington, ON, Canada), and each fosmid was sequenced to more than 8x coverage. Low-quality regions and gaps were targeted by PCR. The fosmid sequences were assembled using phredPhrap and Consed (http://www.phrap.org/phredphrapconsed.html) (8, 9, 13). Open reading frames (ORFs) were identified with the run-glimmer2 script at its default settings (5), and ORFs shorter than 100 bp were eliminated. If two overlapping ORFs were identified, we selected the one that had significant homologs in GenBank. tRNAs were identified with tRNAscan-SE (25). The ORFs were annotated using BLASTP searches (1) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST/ and Pfam searches (2) at http://www.sanger.ac.uk/Software/Pfam/search.shtml. The ORFs were designated MES0001 to MES0058.
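The ORF filtering rule described above (drop ORFs shorter than 100 bp, and of two overlapping ORFs keep the one with significant GenBank homologs) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the `Orf` record, the `has_homolog` flag, and the tie-breaking rule are hypothetical, strand is ignored, and any shared base is treated as an overlap.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    name: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    has_homolog: bool   # hypothetical flag: a significant GenBank hit exists

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two ORFs share at least one base (strand is ignored)."""
    return a.start <= b.end and b.start <= a.end


def filter_orfs(orfs, min_len=100):
    """Drop ORFs shorter than min_len bp, then resolve overlapping pairs."""
    kept = []
    long_enough = (o for o in orfs if o.length >= min_len)
    for orf in sorted(long_enough, key=lambda o: o.start):
        if kept and overlaps(kept[-1], orf):
            # Prefer the ORF with a significant homolog; if the evidence is
            # tied, keep the upstream ORF (an arbitrary choice in this sketch).
            if orf.has_homolog and not kept[-1].has_homolog:
                kept[-1] = orf
            continue
        kept.append(orf)
    return kept
```

For example, given ORFs of 90 bp, 301 bp (no homolog), an overlapping 401 bp ORF with a homolog, and a separate 401 bp ORF, the function keeps only the last two.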
Phylogenetic analyses of the 23S rRNA fragment and 16S rRNA genes were performed in PAUP*, version 4.0b10 (39). A 16S rRNA gene alignment based on secondary structure containing several Thermotogales lineages as well as outgroup sequences (see Fig. 2) was obtained from http://www.psb.ugent.be/rRNA/ssu/. Additional Thermotogales sequences were added to this alignment manually. The 23S rRNA alignment contained only Thermotogales sequences and could be aligned by eye. Minimum evolution (ME) trees were constructed using LogDet distances, and maximum likelihood (ML) trees were constructed using a general time-reversible model with gamma-distributed rates with four categories and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection branch swapping were used in both cases.
Clusters of very similar sequences from the same or sister taxon were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. For each ORF, we made both simple neighbor-joining (NJ) trees with bootstrap analysis and a ME tree (with bootstrap analysis; 100 replicates with global rearrangements) estimated from ML distances (the substitution matrix was auto selected by TREE-PUZZLE 5.2 [37] + Γ, global rearrangements, and 10 random-addition replicates). In addition to these trees, we also constructed automated ML trees (WAG [42] + Γ + I model) using the PhyloGenie package (11) modified by Eric Bapteste (Department of Biochemistry and Molecular Biology, Dalhousie University). Thermosipho sp. TCF52B sequences were not included in these trees as this program performs an automated retrieval of sequences from GenBank. In cases where the automated ML tree differed significantly from both the NJ and ME trees or where the Thermosipho sp. TCF52B was the only close Thermotogales homolog, an ML tree was constructed from the same alignments as those of the NJ and ME trees using PMBML (40), a modified version of the PROML within the PHYLIP package, version 3.6a2 (10). For these analyses, we used a JTT (21) + Γ model, global rearrangements, and 10 random-addition replicates. In the ML bootstrap analyses, we performed only one random addition of sequences per bootstrap replicate.
The ratio of synonymous to nonsynonymous mutations (ds/dn ratio) was determined by using the Nei-Gojobori method (27) in SNAP (synonymous/nonsynonymous analysis program) (23) at http://hiv-web.lanl.gov/SNAP/WEBSNAP/SNAP.html.
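For readers unfamiliar with the Nei-Gojobori estimate, the quantities behind a ds/dn ratio can be summarized as below. This is a sketch of the standard method as generally described, not a description of SNAP's internals; the symbols $S$, $N$, $S_d$, and $N_d$ are notation introduced here, denoting the numbers of potential synonymous and nonsynonymous sites and the numbers of observed synonymous and nonsynonymous differences between two aligned coding sequences.

```latex
p_S = \frac{S_d}{S}, \qquad p_N = \frac{N_d}{N}
\\[6pt]
d_S = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_S\right), \qquad
d_N = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_N\right)
\\[6pt]
\text{ds/dn ratio} = \frac{d_S}{d_N}
```

The logarithmic terms are the Jukes-Cantor correction for multiple substitutions at the same site. A ds/dn ratio well above 1 means that synonymous changes outnumber nonsynonymous changes per site, which is the pattern expected under purifying selection.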
The absolute difference between the frequency of charged and polar amino acid residues (charged versus polar [CvP] bias) (38) was calculated for each mesotoga ORF that was predicted from the fosmid sequences as well as for their Thermotoga maritima MSB8 and Thermosipho sp. TCF52B homologs (where available) using an in-house Perl script. Transmembrane helices were predicted using the TMAP program from the EMBOSS package (35). ORFs with at least two predicted transmembrane helices were considered putative transmembrane proteins. Since the correlation of the CvP value with the thermostability of a protein does not hold for transmembrane proteins (38), the predicted transmembrane proteins were excluded from further analyses. ORFs were further classified into three categories according to their CvP values: mesophilic (CvP value, <7.35), thermophilic (CvP value, >10.62) and intermediate (CvP value, >7.35 to <10.62). The categories were defined following the observed CvP values in different genomes (Table 1 in reference 38).
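The CvP calculation and the three-way classification above can be sketched in Python. This is not the in-house Perl script, which is not shown in the article. It assumes that the charged residues are Asp, Glu, Lys, and Arg and the polar residues are Asn, Gln, Ser, and Thr (as listed in Results and Discussion), that the value is expressed in percentage points as charged minus polar, and that values falling exactly on a threshold count as intermediate. The function names are hypothetical.

```python
CHARGED = set("DEKR")  # Asp, Glu, Lys, Arg
POLAR = set("NQST")    # Asn, Gln, Ser, Thr (polar, noncharged)


def cvp_bias(protein: str) -> float:
    """Charged-minus-polar residue frequency, in percentage points."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    charged = sum(aa in CHARGED for aa in residues)
    polar = sum(aa in POLAR for aa in residues)
    return 100.0 * (charged - polar) / len(residues)


def classify_cvp(value: float) -> str:
    """Apply the thresholds quoted in the text (7.35 and 10.62)."""
    if value < 7.35:
        return "mesophilic"
    if value > 10.62:
        return "thermophilic"
    return "intermediate"
```

Predicted transmembrane proteins would be removed before calling `classify_cvp`, since the text notes that the correlation with thermostability does not hold for them.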
Thermotogales 16S rRNA stems were identified by comparing the 16S rRNA gene alignment used in the phylogenetic analysis to the T. maritima MSB8 secondary structure obtained from Konings and Gutell (22), available at the comparative RNA web site (3). We used only unambiguously aligned regions. Optimal growth temperatures for members of the order Thermotogales were obtained from Huber and Hannig (17).
Nucleotide sequence accession numbers.
The sequenced fosmids have been submitted to GenBank with accession numbers AM184115 and AM184116.
| RESULTS AND DISCUSSION |
|---|
46%) (see Table S1 in the supplemental material). The end sequences obtained from the fosmid library varied in G+C content from 33 to 75%, with a median of 55.85%. Thus, we would expect to observe variations in G+C content if the clones were chimeric. We also observed only a single potential pseudogene (MES0053). Duplication or illegitimate recombination seems to be the most likely explanation for its origin (see below) since it has close neighbors on both sides that cluster strongly within the order Thermotogales (Fig. 1). Annotation of the identified ORFs is given in Table S1 of the supplemental material, and an overview of the contig is given in Fig. 1. There were four cases in which adjacent ORFs were also neighbors in the T. maritima MSB8 genome and one case each where three and four ORFs were found in the same order in T. maritima MSB8. We examined the syntenic four-gene cluster in some detail. It comprises a hypothetical protein of unknown function, a predicted metal-dependent hydrolase, a predicted "HD" superfamily hydrolase, and a putative phosphate starvation-inducible protein. The three ORFs with no transmembrane domains were all predicted to be hyperthermophilic from their CvP values (CvP values of 15, 17.4, and 13.2, see below). Phylogenetic analyses showed that the mesotoga sequences clustered with at least one other member of the order Thermotogales for all four ORFs.
The two sequenced fosmid clones overlapped by 6,936 bp. The two clones were very similar in sequence in the overlapping region, except for a 1,524-bp region where 56 differences were observed, giving an uncorrected distance of 3.67% (see Table S1 in the supplemental material; Fig. 1). For the rest of the overlapping region, only five differences (0.1%) were observed. Thus, the two clones were likely obtained from two very similar strains, with the high-variability region representing genes subjected to diversifying selection or, as we think more likely, homologous recombination with a less closely related lineage. The ratio of synonymous to nonsynonymous substitutions for the two ORFs where the polymorphisms were observed suggested purifying selection; ds/dn was 3.86 for MES0029 (a glycosidase) and 8.06 for MES0030 (alpha-galactosidase), in support of this interpretation.
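The uncorrected distances quoted above are simply the proportion of differing sites: 56/1,524 is about 3.67%, and 5 differences over the remaining 5,412 bp (6,936 minus 1,524) is about 0.09%, reported as 0.1%. A minimal sketch of the calculation for two aligned sequences follows; the function name is hypothetical, and skipping gapped columns is an assumption, since the article does not say how gaps were treated.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(a != b for a, b in pairs)
    return differences / len(pairs)


# Arithmetic behind the figures reported in the text.
variable_region = 100 * 56 / 1524            # percent, variable region
rest_of_overlap = 100 * 5 / (6936 - 1524)    # percent, rest of the overlap
```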
Phylogenetic analyses.
Phylogenetic analyses confirmed that the DNA fragments sequenced here do indeed originate from a Thermotogales bacterium (Fig. 2 and see Table S2 in the supplemental material). In the 16S rRNA gene tree, the sequence from the fosmid clone clusters with the majority of other Thermotogales 16S rRNA genes found in mesophilic environments. The closest relative of this mesotoga clade is a thermophilic strain, TBF19.5, that we recently isolated at 70°C from production water from the Troll oil field in the North Sea (C. L. Nesbø et al., unpublished data). TBF19.5 and the mesotoga sequences cluster together as a sister clade to the clade containing Marinitoga, Geotoga, and Petrotoga, of which all known isolates are thermophiles. Notably, there is a second clade containing mesophilic Thermotogales strains clustering as a sister to Petrotoga, indicating that adaptation to mesophily has occurred at least twice among these bacteria. Importantly, the topology of the tree suggests that ancestral members of the order Thermotogales were thermophiles or hyperthermophiles and that mesophily is a derived trait. A similar topology was obtained from the 23S rRNA tree, with the bh459.f1.4.b07 sequence clustering as a sister to Petrotoga and Marinitoga (no sequence is available from TBF19.5 or other mesotoga lineages).
The majority of protein-coding genes also put mesotoga sequences within the order Thermotogales in phylogenetic analyses. For 21 of the ORFs, phylogenetic analysis strongly supported a monophyletic clade containing the mesotoga sequence and one or more other Thermotogales sequences (see Table S1 in the supplemental material). Mesotoga and other Thermotogales bacteria also formed a monophyletic clade, but with no bootstrap support, in an additional five gene trees. Finally, for two ORFs, mesotoga was found in the same clade as other members of the order Thermotogales interspersed with sequences from another bacterial group, suggesting lateral gene transfer (LGT) from Thermotogales bacteria to various lineages.
Among the 15 ORFs for which we have sequence data from T. maritima MSB8, Thermosipho sp. TCF52B, and mesotoga and for which these three Thermotogales lineages appear to form a monophyletic clade, 12 displayed our mesotoga lineage as sister to a monophyletic Thermotoga-Thermosipho clade, which was in agreement with the rRNA result. For two ORFs, Thermosipho sp. TCF52B clustered as sister to a monophyletic mesotoga-Thermotoga clade, and for one ORF, T. maritima MSB8 clustered as sister to a monophyletic mesotoga-Thermosipho clade.
The second most common pattern grouped mesotoga or the Thermotogales clade containing mesotoga with Firmicutes, which are low-G+C gram positives (this pattern was observed for 14 of the ORFs) (see Table S2 in the supplemental material). Omelchenko et al. (32) have previously shown that T. maritima clusters with Clostridium acetobutylicum within a Firmicutes clade in a gene content tree based on the presence and absence of clusters of orthologous groups. These observations could be due to a high level of LGT between members of the order Thermotogales and low-G+C gram positives, a specific relationship between these groups, or perhaps more likely, a combination of these factors. The phylogenetic analyses of mesotoga ORFs were consistent with LGT in most cases (at least nine ORFs) (see Table S2 in the supplemental material), both recent and likely ancient events involving all three Thermotogales lineages, where members of both Thermotogales and Firmicutes act as donors of the sequence (see Table S2 in the supplemental material). Mesotoga and anaerobic Firmicutes such as Clostridia are found in similar environments, providing ample opportunities for LGT (4, 6, 15, 45). For a few trees, a sister relationship between low-G+C gram positives and Thermotogales appeared most likely. This was, for instance, the case for MES0017 encoding a methionyl-tRNA formyltransferase where the order Thermotogales clustered at the base of the phylum Firmicutes. For the potential phylogenetic marker MES0006, encoding a gram-positive DNA polymerase alpha subunit (PolC), the Thermotogales sequences (together with a sequence from Fusobacterium) clustered as sister to Clostridium and Thermoanaerobacter and, in trees containing more distantly related homologs to root the tree, the Thermotogales sequences are found at the base of the clade containing the PolC sequences.
In total, the phylogenetic analyses suggested that 15 ORFs had been acquired by LGT (Fig. 1 and see Table S2 in the supplemental material). This is clearly an underestimate since we consider only ORFs for which robust phylogenetic analyses appeared possible to perform. Five of these are ancient LGTs that involve all three Thermotogales lineages analyzed here. MES0032 is an oxidoreductase of eukaryotic origin that has again been transferred from a member of the order Thermotogales to the order Thermococcales. MES0046 is a phosphomannomutase for which the sequences in mesotoga and Thermosipho show patterns consistent with frequent transfer between Firmicutes and Thermotogales. MES0050, MES0054, and MES0055 all appear to be ancient LGTs from Archaea. Interestingly, both MES0053 and MES0054 encode TrkA (K+ transport systems, NAD-binding component). MES0053 is shorter than other TrkA homologs, being only 135 amino acids, and is probably a pseudogene, possibly the result of a duplication of MES0054. However, in phylogenetic trees, this ORF clusters consistently with Actinobacteria, suggesting instead that it is a result of a recent LGT and possibly illegitimate recombination upstream of MES0054. The eight remaining recent LGTs appear to have been acquired from a wide range of sources, Archaea, Eukarya, and Bacteria (see Tables S1 and S2 in the supplemental material). Most of the recent transfers appear to be from mesophilic organisms (see Table S2 in the supplemental material). Two exceptions are MES0020 and MES0021 that likely have been acquired from an archaeon, as they are found branching at the base of archaeal clades. These archaeal clades contain both mesophilic and thermophilic organisms; therefore, it is not possible to say whether the donor was a thermophile or a mesophile.
These recently transferred genes encode a range of different functions. Interestingly, one of the recently obtained genes, MES0002, encodes a protein with diguanylate cyclase and phosphodiesterase activity (DGC-PDE). This protein has four distinct domains, a periplasmic binding domain, a PAS-PAC domain, a GGDEF domain, and a metal-dependent phosphohydrolase (HDc) domain (see Table S1 in the supplemental material). Such proteins have been shown to be involved in control of the cellular levels of c-diGMP, a secondary messenger involved in modulating bacterial growth on surfaces by regulating cellular adhesion components (19). The PAS-PAC domain is likely to be involved in signal sensing, while the GGDEF domain converts GTP to c-diGMP (19). The HDc is likely to be responsible for the hydrolytic cleavage of c-diGMP into GMPs (19). Thus, the GGDEF domain and the HDc domain probably have opposing effects on the cellular level of c-diGMP. Biofilm formation regulated by c-diGMP has been shown for T. maritima MSB8 (20). The presence of a DGC-PDE protein suggests that this also may be the case for mesotoga. The phylogenetic analyses show that the mesotoga DGC-PDE likely was acquired from a Clostridium bacterium, perhaps providing mesotoga with the ability to sense changes specific to its current environment. The fact that both c-diGMP regulators are present in the same protein probably facilitated the integration of such a new sensory system. Notably, there are two other genes, MES0045 and MES0052, which encode response regulator proteins involved in a two-component signal transduction system in bacteria, that function to detect and respond to environmental changes. These proteins show the highest similarity to organisms outside the order Thermotogales, and mesotoga sequences did not cluster consistently with any other sequences in phylogenetic trees. It is likely that these genes were also acquired through LGT. 
Thus, it appears that mesotoga has acquired several genes involved in interactions with its new environment.
Adaptation to mesophily.
One interesting question is how mesotoga has changed as a result of its mesophilic lifestyle. The phylogenetic analyses suggest that both the modification of existing sequences and the acquisition of new genes from mesophilic organisms have played a role. The most prominent signature of hyperthermophilic lifestyle at the proteome level has been shown to be the large difference between the proportion of charged (Asp, Glu, Lys, and Arg) versus polar (noncharged) (Asn, Gln, Ser, and Thr) amino acids in soluble proteins, abbreviated as the CvP-bias (38). A plot of CvP values for each mesotoga ORF and its Thermotogales homolog (if present) is shown in Fig. 3. Mesotoga shows a lower average CvP value (excluding predicted transmembrane proteins) than do both of the other Thermotogales lineages analyzed, with 9 for mesotoga, 11.2 for Thermosipho, and 15.4 for T. maritima. The CvP values correlate well with the growth temperature of these lineages, where T. maritima has the highest growth temperature (55 to 90°C, with an 80°C optimum) (18) and Thermosipho is a more moderate thermophile (37 to 75°C, with a 65°C optimum [Birkeland and Dahle, University of Bergen, personal communication]).
When we classified all 44 mesotoga nontransmembrane proteins into mesophilic (CvP value, <7.35), thermophilic (CvP value, >10.62), and intermediate, 17 appeared mesophilic, 14 intermediate, and 13 thermophilic. Among these, 10 of the mesophilic proteins, 12 of the thermophilic, and 9 of the intermediate could be used in phylogenetic analyses. From these trees, recent LGT was inferred for five of the mesophilic proteins (50%), while LGT was suggested for only two of the thermophilic proteins and one of the intermediate proteins, suggesting that mesotoga is using LGT to adapt to a mesophilic environment. Notably, the two thermophilic proteins that are suggested to have been acquired by LGT appear to have mesophilic donors. However, as observed for MES0007, both of these proteins are short (Fig. 1 and see Table S1 in the supplemental material) and the high CvP value might be due to structural and evolutionary constraints. LGT, as an important means to adapt to new environmental niches, high temperature, and high levels of radiation, was also shown for Thermus thermophilus and Deinococcus radiodurans (32).
Another adaptation to living at high temperatures is the higher G+C content of the stems in rRNA secondary structures (12). A plot of the percent G+C content of predicted stems of Thermotogales 16S rRNA genes from strains with known optimal growth temperature is shown in Fig. 3B. In addition, we plotted the G+C content of mesotoga's closest relative in 16S rRNA gene trees, the strain we isolated at 70°C (Fig. 3B), and the percent G+C content of all the rRNA genes from mesophilic environments, arbitrarily positioned at 40°C. As observed earlier (12), the variation in percent G+C at temperatures below the 60 to 65°C optimal growth temperature is considerable, likely because there is low or no selective pressure for lower G+C contents at lower temperatures, and thus it is not possible to predict mesotoga's optimal growth temperature from such plots. However, Fig. 3B shows that the percent G+C content of mesotoga 16S rRNA stems is considerably lower than that of its closest relative TBF19.5, which likely has an optimal growth temperature of 70°C. The fact that the mesotoga 16S rRNA stem percent G+C is somewhat higher than what is observed for strains with optimum growth temperatures much higher than the environments in which the mesotoga rRNA genes have been detected could be due to the fact that it has just recently, in terms of 16S rRNA gene evolution, moved into more temperate environments. Alternatively, it might thrive in a range of mildly thermophilic and mesophilic environments. We favor the first explanation, since mesotoga rRNA genes have repeatedly been isolated from similar environments all over the world (Table 1).
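The stem G+C measurement discussed above can be sketched as follows. This is an illustration only: the function name is hypothetical, the stem coordinates are assumed to come from mapping the sequence onto a reference secondary structure (here supplied as 0-based, half-open intervals), and ambiguous bases and gaps are ignored.

```python
def stem_gc_percent(rrna: str, stem_regions) -> float:
    """Percent G+C over the stem positions of an rRNA sequence.

    stem_regions: iterable of (start, end) pairs, 0-based and half-open,
    marking the positions that fall in helices of the secondary structure.
    """
    stem = "".join(rrna[start:end] for start, end in stem_regions)
    # Accept DNA or RNA input; count only unambiguous bases.
    bases = [b for b in stem.upper().replace("T", "U") if b in "ACGU"]
    if not bases:
        raise ValueError("no unambiguous bases in the stem regions")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)
```

Plotting this value against optimal growth temperature for each strain gives the kind of comparison shown in Fig. 3B.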
Mesotoga habitats.
The habitats in which mesotoga rRNA genes have been observed are listed in Table 1. Where reported, all of these environments are anaerobic or microaerophilic, consistent with all previously described members of the order Thermotogales being anaerobes. Moreover, these sequences are frequently recovered from communities involved in the remediation of environmental contamination of toxic chemicals, in particular, organohalide pollutants (Table 1). None of the published studies suggested that the Thermotogales bacterium was directly involved in the bioremediation. However, the consistent finding of mesotoga in such communities does suggest that a role in such processes is possible. The fosmid clones sequenced here contain two genes that might be involved in such processes. Both MES0001 and MES0039 encode proteins similar to predicted dienelactone hydrolases. These enzymes use substrate-assisted catalysis to degrade aromatic compounds (14) and have been shown to be involved in the degradation of several chloroaromatic compounds (7). These genes have no match in T. maritima MSB8 or Thermosipho sp. TCF52B, suggesting that they may have been acquired by LGT. Unfortunately, we were not able to construct reliable alignments of these ORFs and their homologs in GenBank.
It is interesting that mesotoga's closest relative, TBF19.5, was isolated from the production waters from an oil well. Similarly, the second clade of sequences obtained from mesophilic environments is sister to another Thermotogales group, Petrotoga, which has so far been found only in oil wells (17, 24, 31). Thus, it is possible that both mesophilic Thermotogales lineages started out as inhabitants of oil reservoirs and have adapted to living at lower temperatures, perhaps as the temperature in the oil well lowered or by following petroleum leaking out of the reservoir. As these environments contain chemicals similar to those found in the environments where mesotoga rRNA genes have been detected (e.g., various aromatic hydrocarbons) (16, 33), this might have prepared mesotoga to live in environments with anthropogenic pollution.
Conclusion.
We used a metagenomic approach to get a first glimpse of the genome of a Thermotogales bacterium lineage living at mesophilic temperatures. Our analyses show that mesotoga may have acquired several genes from mesophilic bacteria living in the same types of environments, while many of its original proteins have secondarily adapted to function at lower temperatures. It is not uncommon to obtain, from environmental DNA samples, 16S rRNA sequences of taxa that are not expected to occur there. For instance, Marchant et al. (26) and Rahman et al. (34) recently described the isolation of highly thermophilic bacteria from mesophilic soil environments and they show that these bacteria are present in significant numbers in the soils studied. Whether such sequences are "contaminants" or identify organisms that are actually growing at the sampled site will always be a question, especially if isolation and cultivation are difficult. We have shown that metagenomics and comparative bioinformatics can provide an answer. It is particularly compelling that of the 31 ORFs that have matches to the hyperthermophile Thermotoga maritima, 29 appear (through CvP analysis) to be adapted to function at lower temperatures. We are currently attempting to obtain a pure mesotoga culture by inoculating standard Thermotogales media with the enrichments we used to construct the fosmid library and isolating single colonies. Such an isolate will help us further determine how this lineage has adapted to living at lower temperatures and should also be useful in uncovering thermophilic and hyperthermophilic determinants in other members of the order Thermotogales.
| ACKNOWLEDGMENTS |
|---|
We thank Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute, presently at the Department of Biological Sciences, Towson University) for providing enrichment cultures. We also thank E. Bapteste (Department of Biochemistry and Molecular Biology, Dalhousie University) for help with PhyloGenie.
| FOOTNOTES |
|---|
Supplemental material for this article may be found at http://aem.asm.org/.
| REFERENCES |
|---|