Applied and Environmental Microbiology, January 2002, p. 335-345, Vol. 68, No. 1
0099-2240/02/$04.00+0 DOI: 10.1128/AEM.68.1.335-345.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Eugene V. Koonin,2 L. Aravind,2 Lance T. Taylor,1 Heidi Seitz,3, Jefferey L. Stein,4, Daniel C. Bensen,4, Robert A. Feldman,4,|| Ronald V. Swanson,4,# and Edward F. DeLong1*
Monterey Bay Aquarium Research Institute, Moss Landing, California 95039,1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894,2 Marine Science Institute, University of California, Santa Barbara, California 93106,3 Diversa Corporation, San Diego, California 921214
Received 30 May 2001/ Accepted 1 October 2001
Members of the Archaea (48) are much more diverse and widespread than previously suspected. Representatives have now been detected in terrestrial environments, marine and lake sediments, and temperate ocean waters and polar seas (for a review, see reference 10). Marine planktonic archaea have been shown to occur in high relative abundance in the oceanic subsurface (13, 26, 27) and to dominate the prokaryotic fraction in the mesopelagic zone of the Pacific Ocean (20). Planktonic archaea also reach a relative seasonal maximum in winter Antarctic waters, approaching 10 to 30% of the total planktonic microbial population (12, 14, 28, 29). To gain additional information on yet-uncultivated Antarctic archaea, we constructed, by use of a fosmid vector (42), a recombinant DNA library that contained inserts of approximately 40 kb from surface water picoplankton collected near Palmer Station, Antarctica, in late winter. Planktonic crenarchaeotal genome fragments that contained rRNA genes and originated from the same population were isolated and compared. These within-population genome comparisons yielded high-resolution information on genomic variations of uncultivated, sympatric archaeal cells. The entire sequences of one Antarctic crenarchaeotal clone (fosmid 74A4) and one temperate water subsurface crenarchaeotal clone (fosmid 4B7) (42) were also determined to compare the genomes of related archaea inhabiting different oceanic provinces. This analysis provided comparative information on more distantly related crenarchaeotes derived from two different oceanic provinces. Our results demonstrate that microbial population structure can be determined at high resolution by examining genome divergence among highly related but genetically distinct cohorts coexisting in the same population. Insights into genomic variation, as it relates to rRNA sequence variation, also can be derived from comparisons of more distantly related microbial species sampled from different geographic locales.
900 ml. The cells were collected by centrifugation (4°C, 38,900 x g, 1 h) as previously described (12). The bacterioplankton pellet was embedded in agarose plugs as previously described (42). DNA extraction, preparation of the fosmid library, and multiplex PCR screening by using archaeon-biased 16S rRNA oligonucleotide primers were carried out as previously described (42). PCR primers used for screening the library were the 16S rRNA oligonucleotide primers Ar20-F (TTC CGG TTG ATC CYG CCR G) (13) and Arch958R (TCC GGC GTT GAM TCC AAT T) (9) and the 23S rRNA oligonucleotide primer LS2445a-R (CCC YGG GGT ARC TTT TCT ST) (13).
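The screening primers listed above contain degenerate IUPAC positions (Y, R, M, and S), so each primer stands for a small pool of concrete oligonucleotides. The following is a minimal Python sketch, not part of the original protocol, that expands each primer into that pool. The function name `expand_primer` and the dictionary names are illustrative assumptions; the primer sequences are copied from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3', spaces kept for readability).
PRIMERS = {
    "Ar20-F": "TTC CGG TTG ATC CYG CCR G",
    "Arch958R": "TCC GGC GTT GAM TCC AAT T",
    "LS2445a-R": "CCC YGG GGT ARC TTT TCT ST",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in bases]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand_primer(seq)
        print(f"{name}: {len(variants)} variant(s)")
        for variant in variants:
            print("   ", variant)
```

Each degenerate position multiplies the pool size by the number of bases it allows, so a primer with three two-fold degenerate positions expands to eight sequences.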
Subclone libraries.
A subclone library was constructed from fosmid 74A4 with DNA partially digested with RsaI (42). In this library, the content of the fosmid was not randomly represented; therefore, a second library was constructed. This library was prepared with randomly sheared DNA as described by Kawata et al. (21), except that the DNA was sheared by passage through a microemulsifying 25-gauge needle. DNA was cloned by using vector pCR 2.1 and the original TA cloning kit (Invitrogen Corporation, Carlsbad, Calif.). Fosmid subclone plasmids were purified by using a Mini-Prep 24 machine (MacConnell Research Corporation, San Diego, Calif.) according to the manufacturer's instructions. Nucleotide sequences (800 bp, on average) were determined by the dideoxy termination reaction with fluorescence-labeled M13 forward and reverse primers, a SequiTherm EXCel II sequencing kit (Epicentre, Madison, Wis.), and a model 4200 automated DNA sequencer (LI-COR, Lincoln, Nebr.). The strategy for the construction of the subclone library and for the determination of the sequence of fosmid 4B7 was as previously described (41). Contiguous sequences were assembled by using SEQUENCHER 3.1.1 software (Gene Codes Co., Ann Arbor, Mich.).
Proteins, RNA genes, and motif search.
In-depth sequence analysis was based primarily on the use of the PSI-BLAST program (1) essentially as previously described (6). tRNAs were searched by using the tRNAscan-SE program (24). The BLAST-derived "e-values" (1) reported here take into account the statistics of database and local alignment size for the similarity scores obtained from local alignments. Signal peptides were predicted by using the SignalP program (30), and transmembrane segments were predicted by using the PHDhtm program (39, 40). Comparisons among 4B7, 74A4, and Cenarchaeum symbiosum (41) fosmids were performed with the BLAST 2 sequences program (43).
Phylogenetic analyses.
For distance and parsimony analyses of the inferred amino acid sequence of translation elongation factor 1-α (EF1-α), the program PaupSearch of the Wisconsin Package, version 10.0 (Genetics Computer Group, Madison, Wis.), was used.
Nucleotide sequence accession numbers.
Sequences reported in this study have been submitted to GenBank under the following accession numbers: AF393466, U40238, and AF393304 to AF393307.
Five unique archaeal 16S rRNA-containing fosmids (15G10, 19H8, 31B2, 74A4, and 83A10) were identified by restriction fragment length polymorphism analysis (data not shown). Sequence analyses of the archaeal 16S rRNAs showed that all belonged to group I marine planktonic Crenarchaeota (9). The 16S rRNA gene sequences from fosmid clones 83A10 and 31B2 were identical to one another. One clone (15G10) contained an rRNA gene identical in sequence to a PCR-amplified 16S rRNA gene (ANTARCTIC 12) (14) that was recovered at the same site 3 years prior to the sampling reported here. The 16S rRNA gene sequence variation observed among the other Antarctic archaeal fosmids was limited, occurring at a total of four nucleotide residues within the 16S rRNA gene (Fig. 1 and 2). Relative to those in 83A10 and 31B2, 16S rRNAs in the other clones contained only two (74A4 and 15G10) or three (19H8) nucleotide sequence differences (Fig. 1).
FIG. 1. Genetic variability in the 16S rRNA gene, intergenic spacer region (ITS), and GSAT gene in sympatric Antarctic archaea. Asterisks represent a base substitution at residues where sequence variation was observed, relative to the 83A10 sequence. The plus sign indicates an insertion. Numbers at the right indicate the sequence positions (Escherichia coli numbering system) where variation was observed. Clone 19H8 is missing the GSAT gene because the recombinant DNA insert terminates in the ITS. Similarity tables for the 16S rRNA gene, ITS, and GSAT gene are shown at the far right. Fosmid 4B7 from deep temperate Pacific waters is included as an outgroup.
FIG. 2. Nucleotide (A) and amino acid (B) sequence comparisons of the N terminus of the GSAT gene. The origin of the sequence is shown at the left. Triplets corresponding to amino acids are separated by spaces. Dashes indicate nucleotide residues identical to those of 15G10.
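The dash notation described in the Fig. 2 legend can be reproduced with a few lines of Python. This is only an illustrative sketch: the two sequences below are hypothetical placeholders, not the GSAT sequences from the fosmids, and the function names `mask_identical` and `in_triplets` are assumptions introduced here.

```python
def mask_identical(reference, sequence):
    """Replace residues identical to the reference with dashes."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("-" if r == s else s for r, s in zip(reference, sequence))


def in_triplets(sequence):
    """Separate a nucleotide sequence into space-delimited codon triplets."""
    return " ".join(sequence[i:i + 3] for i in range(0, len(sequence), 3))


if __name__ == "__main__":
    # Hypothetical aligned sequences, used only to show the notation.
    reference = "ATGAAAGCTCTT"
    variant = "ATGAAGGCTCTA"
    print(in_triplets(reference))
    print(in_triplets(mask_identical(reference, variant)))
```

With this layout, only the positions that differ from the reference remain visible, and the triplet spacing shows which codon each difference falls in.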
Identification of rRNA and protein coding genes on fosmid 74A4.
The entire sequence of the 43.6-kb genome fragment contained on fosmid 74A4 was determined. Forty-nine predicted protein coding frames were identified on fosmid 74A4 by using the National Center for Biotechnology Information BLAST 2.0 program (1), allowing the prediction of protein structural features such as transmembrane segments and signal peptides. Of these predicted proteins, 28 showed significant similarity to the products of genes with known functions, allowing a clear functional prediction, and 7 proteins were homologs of other, uncharacterized proteins. The remaining 14 predicted proteins had no detectable homologs, but some of them were predicted to be either membrane or secreted proteins on the basis of the predicted corresponding structural features (Table 1). The majority of the encoded proteins showed the greatest sequence similarity to homologs from other archaea. No specific affinity was noted with homologs from the only completely sequenced crenarchaeotal genome, that of Aeropyrum pernix, but several proteins showed greatest similarity to homologs from the partially sequenced genome of another crenarchaeote, Sulfolobus (Table 1). The protein sequence of EF1-α that showed the highest similarity to EF1-α proteins from Crenarchaeota was used for phylogenetic analysis. Marine crenarchaeotal EF1-α contained a modified 11-amino-acid insert that is shared between eukaryotes and Crenarchaeota but is not found in Euryarchaeota (36). However, both distance and parsimony analyses of the amino acid sequence of EF1-α could not resolve the placement of marine crenarchaeotal EF1-α (data not shown). Affiliation with either euryarchaeotes or crenarchaeotes was not well supported by either method.
TABLE 1. Predicted rRNA and protein coding genes in Antarctic crenarchaeotal fosmid 74A4a
TABLE 2. Predicted rRNA and protein coding genes in marine crenarchaeal fosmid 4B7a
, and ribosomal protein S10 on 74A4 and elongation factor 2 (EF2) on 4B7. The genes for EF1-α and S10 form a cluster that so far has not been detected in any other genomes. Another protein potentially involved in translation is a predicted RNA helicase encoded on 4B7.
DNA replication and repair.
One DNA repair enzyme, DinB/UmuC, recently identified as a repair-associated DNA polymerase (46), was identified on 4B7 and is most similar to the Dbh protein found in Sulfolobus solfataricus (22). Two other typical archaeal enzymes involved in replication and repair, DNA ligase (ATP dependent) and tRNA intron endonuclease, were identified on 74A4.
Transport and energy metabolism.
Both marine archaeal fosmids encoded several proteins implicated in energy conversion, particularly fatty acid metabolism. These included 3-hydroxyacyl-coenzyme A (CoA) dehydrogenase, acyl dehydratase, a predicted CoA-binding protein, glucose-1-phosphate dehydrogenase, and some other, poorly characterized oxidoreductases. Only one protein, a periplasmic solute-binding protein homologous to those found in iron(III) ATP-binding cassette transporters, was clearly assigned as a protein involved in transport. Other membrane transporters were tentatively identified on the basis of transmembrane segment predictions.
Miscellaneous proteins.
It was noted previously that moderately thermophilic archaea, such as Methanobacterium thermoautotrophicum or Methanosarcina barkeri, encode classical molecular chaperones of the hsp70 (DnaK) and hsp40 (DnaJ) families, whereas archaeal hyperthermophiles do not have those proteins (25). In agreement with this trend, two predicted marine archaeal proteins, 74A4#31 and 4B7#19, contain the J domain of the hsp40 family of chaperones and parts of the heat shock chaperone DnaJ, which interacts with and stimulates the hydrolysis of ATP by the cognate DnaK proteins (47). Protein 74A4#31 also contains a ferredoxin domain that is predicted to bind iron or possibly other ions and might be functionally analogous to the Zn clusters that are present in bacterial and eukaryotic DnaJ proteins. A J domain-ferredoxin fusion has not been reported so far. In contrast, protein 4B7#19 is predicted to be a type I membrane protein in which the J domain is the C-terminal cytoplasmic portion. Another interesting pair of paralogous proteins are 74A4#15 and 4B7#24, which are predicted membrane-associated, collagenase-like, metal-dependent proteases. A cluster of three genes on 74A4 encodes three predicted enzymes of the double-stranded beta-helix fold that might possess a variety of enzymatic activities, for example, that of sugar-phosphate isomerase.
Comparative genomic analyses of marine group I archaea.
Protein and RNA gene organizations on fosmids 74A4 and 4B7 were compared to that of the C. symbiosum (variant A) fosmid (41) by using the BLAST 2 Sequences program (43). The 23S and 16S rDNA sequences were used as an alignment point for the sequences (Fig. 3). The typical crenarchaeotal rRNA operon (16S-23S) was shared by all three fosmids, and the operon was adjacent to the GSAT gene in all. Fosmids 74A4 and 4B7 shared two other regions in common. The first region includes an unknown open reading frame, the BirA protein gene, and a hypothetical metalloprotease gene. The second region represents an inversion between the two genomes and includes genes for a putative periplasmic-binding protein of an iron(III) ATP-binding cassette transporter and a neighboring hypothetical protein (Fig. 3). Fosmid 74A4 and the fosmid from C. symbiosum shared a region including hypothetical protein 02 and the product of ORF01, previously reported only for C. symbiosum (41), and a putative glucose dehydrogenase which was also reported only for C. symbiosum (GenBank accession number AAC62698).
FIG. 3. Genomic organization in planktonic marine crenarchaeotes. Gene maps for fosmids 4B7 (42) and 74A4 and C. symbiosum A (35) were aligned based on ribosomal 16S and 23S sequences. Homologous regions are connected with lines.
The 16S rRNA sequences of all five unique archaeal fosmids from the Antarctic picoplankton library were highly similar. One (15G10) was identical in sequence to a PCR-amplified 16S rRNA gene (ANTARCTIC 12) isolated from the same Antarctic waters in 1993 (3 years prior to the sampling of this report) (14). The sympatric archaeal 16S rRNA genes differed from one another by a maximum of three nucleotide substitutions over 1,418 nucleotide residues. However, the different restriction fragment length polymorphism patterns of the fosmids could not be explained solely by the distance between the rRNA operon and the cloning sites and suggested significant sequence divergence between these highly related variants. Since it was possible that the protein sequences and gene organization were less conserved than rRNA (as recently observed for natural variants of C. symbiosum) (41), we characterized the GSAT gene from the different clones. Microheterogeneity in the DNA sequence of the GSAT gene was observed for all clones, including 31B2 and 83A10, which were identical over the entire 16S rRNA gene. To our knowledge, this is one of the first studies that directly links 16S rRNA gene variation to heterogeneity in flanking protein coding genes in sympatric free-living microbes. Our study shows that within a single microbial population, considerable genomic variation exists, even among microbes with identical 16S rRNA gene sequences.
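To put the reported divergence in numerical terms, the short Python sketch below converts the stated maximum of three substitutions over 1,418 residues into a percent identity, and shows how the same figure would be computed from a pair of aligned sequences. The function names and the toy sequences are assumptions made for illustration; only the numbers 3 and 1,418 come from the text.

```python
def identity_from_differences(n_differences, length):
    """Percent identity given a count of substitutions over an aligned length."""
    return 100.0 * (length - n_differences) / length


def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    # Values from the text: at most 3 substitutions over 1,418 residues.
    print(f"{identity_from_differences(3, 1418):.2f}% identity")
    # Hypothetical toy alignment, for illustration only.
    print(f"{percent_identity('ACGTACGT', 'ACGTACGA'):.1f}% identity")
```

Under these figures, the most divergent pair of sympatric 16S rRNA genes still shares about 99.79% identity, which is the context for the contrast with the more variable flanking GSAT gene.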
Sequence analysis of the 43- and 42-kb genome fragments derived from marine archaea from two different oceanic provinces showed many features typical of the domain Archaea. The majority of the genes identified, including those whose functions could not be predicted, most closely resembled archaeal protein genes (data not shown). Some features of these genome sequences, including the rRNA gene order, chromosomal organization, and nucleotide sequence of the genes for EF1-α and ribosomal protein S10, resembled those of other Crenarchaeota (7, 16, 17). The observation that the GSAT gene is located downstream of the ribosomal operon in all planktonic marine crenarchaeotes analyzed to date (41, 42; this study) also suggests some chromosomal organization common to marine group I crenarchaeotes. Specific gene sequences that we recovered might provide further insight into the relationship of marine crenarchaeotes to other cultivated species. For instance, EF1-α (EF-Tu in Bacteria) is a highly conserved protein that is found in all cellular organisms and that has proven extremely useful for global phylogenetic comparisons (2, 8, 37). Both distance and parsimony analyses of the EF1-α amino acid sequence derived from Antarctic fosmid 74A4, however, could not resolve its placement within the Crenarchaeota or Euryarchaeota (data not shown). Other important features of EF1-α are specific insertions and deletions among homologs that provide evidence for a specific evolutionary linkage between eukaryotes and crenarchaeotes. Antarctic crenarchaeotal EF1-α did contain an 11-amino-acid insertion (data not shown) that is characteristic of Eucarya and Crenarchaeota but not Euryarchaeota or bacteria (34). The sequence of the 11-amino-acid insertion of 74A4 most closely resembles the insertion of Pyrobaculum aerophilum, another crenarchaeote (four-amino-acid sequence difference; data not shown). This observation, together with the deep branching of planktonic archaeal 16S rRNA and of the EF2 amino acid sequence of fosmid 4B7 (42) in phylogenetic trees, could reflect a nonthermophilic origin of the crenarchaeotal subdivision. Alternatively, EF1-α homologs from as-yet-uncultured thermophilic relatives of low-temperature crenarchaeotes that have been detected in hot springs (3, 4) may branch more deeply, placing these thermophilic groups basal to cultivated crenarchaeotal lineages.
Small cold shock proteins were believed to be present only in bacteria and eukaryotes (18, 33). To date, no cold shock genes have been found in the archaeal genomes that have been entirely sequenced. It was therefore surprising to find a gene encoding a cold shock protein on fosmid 4B7. Based on amino acid similarity, this putative cold shock protein resembles those of bacteria. The observation that no other sequenced archaeal genomes encode members of the small cold shock protein family raises the possibility of lateral gene transfer of this gene into cold-adapted archaea from bacteria. Several other genes present on the two marine archaeal fosmids may have been acquired from bacteria, for example, the genes for double-stranded beta-helix fold proteins, the SWI/SNF helicase, and peptide methionine sulfoxide reductase. Even more unexpectedly, we identified a gene coding for a C2H2 Zn finger protein that so far has been found only in eukaryotes.
Analysis of the combined 80 kb of sequence data has identified several genes indicative of metabolic pathways (Tables 1 and 2). However, the lack of known transporters on the two fosmids makes it difficult to predict possible components being taken up by the archaeal cells. Additional data obtained for C. symbiosum and from other techniques, such as stable isotope and natural radiotracer analyses (34) and microautoradiography and fluorescence in situ hybridization (23, 31), should provide more insight into the potential metabolic traits of uncultivated marine archaea.
With regard to the marine planktonic crenarchaeotal clade in general, there exists considerable divergence and genome evolution. The 16S rRNA genes in fosmids 4B7 and 74A4 and C. symbiosum all share greater than 94% sequence similarity. However, the regions surrounding the rRNA operons vary substantially, indicating extensive genome rearrangements and various genome contents. These differences are also likely reflected in the phenotypic properties of the different crenarchaeotes that occupy the different oceanic regions.
Variation among homologous protein coding genes from microbes that share moderately similar (97%) 16S rRNA gene sequences has been reported for Prochlorococcus isolates derived from the same sample. Prochlorococcus isolates MED and SS120 shared 98% sequence similarity in their 16S rRNAs (44) but were only 76% identical based on their RNA polymerase C1 gene sequence (15). To our knowledge, however, genome variation among free-living, sympatric, uncultivated microbes has never been reported. Our data now provide a significant perspective on the extent of genome variation that can exist within a single population of free-living microbial cells that share identical (or nearly so) rRNA gene sequences. Our data suggest that the observed seasonal maximum of planktonic crenarchaeotes in Antarctic waters (29) is composed of (minimally) four highly related, yet nonidentical, co-occurring strains or variants. Of course, due to the labor and resource intensiveness of our procedures, our library screening procedure severely undersamples the actual population. Despite this undersampling, however, we did not recover any one dominant or identical genotype. Rather, we recovered identical or nearly identical rRNA phylotypes with significant differences in flanking genomic regions. Greater variation would be expected to be observed with larger sample size. These data strongly suggest that even within a single population, a very large amount of genomic heterogeneity exists that is undetectable by 16S rRNA sequence variation.
Presumably, genomic microheterogeneity can generate, eventually, physiological diversity. Even small variations among protein coding genes, such as those found here in sympatric archaeal cells that share identical or nearly identical rRNA gene sequences, could provide a selective advantage to the different genotypes under fluctuating environmental conditions. Such microvariations could confer greater fitness to the population as a whole under various environmental conditions, relative to any individual clonal phenotype. Our data strongly suggest that naturally occurring populations of bacteria and archaea can be viewed as nonclonal populations that harbor tremendous allelic variation.
This work was supported by NSF grants OPP94-18442 and OCE0001619 and the David and Lucile Packard Foundation to E.F.D. O.B. was supported by a fellowship from the European Molecular Biology Organization. D.C.B., R.V.S., R.A.F., and J.L.S. were supported by Diversa Corporation.
Present address: Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel.
Present address: Institut für Mikrobiologie und Weinforschung, Universität Mainz, 55099 Mainz, Germany.
Present address: Quorex Pharmaceuticals Inc., Carlsbad, CA 92009.
|| Present address: Molecular Dynamics Inc., Amersham Pharmacia Biotech, Sunnyvale, CA 94086.
# Present address: Syrrx Inc., San Diego, CA 92121.