Applied and Environmental Microbiology, October 2006, p. 6841-6844, Vol. 72, No. 10
0099-2240/06/$08.00+0 doi:10.1128/AEM.00429-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
SHORT REPORT
School of Oceanography, University of Washington, Seattle, Washington 98195
Received 21 February 2006/ Accepted 28 July 2006
Planctomycetes are morphologically different from other bacteria and resemble eukaryotes in several ways. In particular, Planctomycetes do not have typical bacterial cell membranes. They lack peptidoglycan, and the fatty acids that constitute their phospholipids are mainly palmitic, palmitoleic, and oleic acids, which are typical of microeukaryotes, not Bacteria (9). Members of the Planctomycetes that utilize the anammox reaction have ether lipids (17, 18), once thought to be diagnostic of Archaea (11) but also found in some thermophiles and sulfate-reducing bacteria (10, 15). The planctomycete Gemmata obscuriglobus is capable of synthesizing sterols (16), a trait originally assumed to be eukaryotic but now also found in the bacterial lineages Methylococcales (16) and Myxobacteriales (2). Each of these sterol-utilizing bacterial lineages is characterized by cell compartmentalization (16). All Planctomycetes have at least a paryphoplasm: a membrane-bound, ribosome-free region (13). In addition to the paryphoplasm, G. obscuriglobus has a double membrane surrounding a nucleoid (6), and Planctomycetes that utilize the anammox reaction have an internal compartment called the anammoxosome where this reaction takes place (24).
Though morphological arguments suggest that Planctomycetes are similar to eukaryotes and therefore may be an ancient lineage, molecular analyses are in conflict with regard to the evolution of Planctomycetes. Phylogenetic analyses of the 16S rRNA gene have disagreed on the placement of the Planctomycetes in the bacterial phylogenetic tree (3, 5, 12, 14, 20, 25). A recent analysis resulted in significantly different 16S rRNA gene trees depending on the species of Planctomycetes included, consistent with the hypothesis that this group has experienced a high rate of evolution (12). Trees made from the most slowly evolving nucleotide bases of 16S rRNA have Planctomycetes at the root of the Bacteria (3). However, the limited number of positions in these analyses has led to the suggestion that these trees are not robust (5). Phylogenetic analysis using amino acid sequences of the elongation factor Tu also could not reliably resolve the division's position in the tree (8). Most recently, a comparison of 347 eukaryotic signature proteins to the unpublished genome of the planctomycete Gemmata sp. strain Wa-1 found a low number of high-scoring matches. Compared with matches to a proteobacterium, Gemmata did not appear significantly more closely related to Eukarya (21).
Genome sequences of two Planctomycetes (the marine organism Rhodopirellula baltica [7] and the uncultured anammox bacterium Kuenenia stuttgartiensis [22]) are now complete, and two more (Blastopirellula marina and Gemmata obscuriglobus) are in progress and publicly available, providing a molecular data set to address this question. Interestingly, both completed genomes have genes for peptidoglycan synthesis and other vestiges of a gram-negative cell wall (7, 22), which could suggest that the Planctomycetes lineage once possessed and then lost a peptidoglycan cell wall. Phylogenetic trees of 39 concatenated ribosomal proteins (23), 49 concatenated protein sequences (22), and conserved proteins, such as ATP synthase and heat shock proteins 60 and 70 (7), indicate that Planctomycetes are not deeply branching. However, the higher-than-usual percentage of genes with BLAST hits to Archaea and Eukarya in R. baltica (7) might seem to support a basal position for the Planctomycetes. No trend was found for an organism of origin or a distinct functional category for these genes (7), reducing the likelihood that the result is due to a few instances of massive horizontal gene transfer.
R. baltica, the first planctomycete to be sequenced, is only distantly related to other divisions of Bacteria and hence to the majority of bacterial sequences in GenBank at the time of its sequencing. Thus, it is possible that the relatively large percentage of genes with best BLAST hits to Archaea and Eukarya was a result of the lack of close relatives available in the database. Other researchers have used comparisons of completed bacterial genomes to archaeal genomes to determine orthologs between genome pairs and to infer phylogeny (19). Although the number of orthologs was highly dependent on genome size (19), because this method compares one bacterial genome to one archaeal genome at a time, rather than examining the single best hit from a multiple-genome database, results are less susceptible to the taxonomic biases present in sequence databases. Here we employ a similar technique to compare bacterial genomes to archaeal and eukaryotic genomes in order to find out if the genomes of the planctomycetes R. baltica, B. marina, G. obscuriglobus, and K. stuttgartiensis contain an unusual number of eukaryotic and archaeal genes when biases in genome databases and genome sizes are taken into account.
The complete predicted protein sequences from 166 genomes (18 archaeal, 134 bacterial, and 14 eukaryotic) were downloaded from public databases in April 2006 (see Table S1 in the supplemental material). The genomes include a broad phylogenetic sampling, including both Crenarchaeotes and Euryarchaeotes and single-celled and multicellular eukaryotes. In addition, preliminary sequence data for G. obscuriglobus were obtained from The Institute for Genomic Research through the website at http://www.tigr.org, and open reading frames were predicted using Glimmer, version 3.0 (4). Proteins in each bacterial genome were individually compared to proteins in each archaeal genome by reciprocal BLASTP (1). The best hit for each gene was extracted using a Perl script. The best hits for each bacterial-archaeal and archaeal-bacterial pair were compared in order to determine the number of reciprocal best hits for each pairwise comparison. The number of reciprocal best hits was counted by using an expectation value (E) of <e⁻¹⁰ as the stringency threshold for determining a valid best hit. A subset of the genomes was also counted using a stringency threshold of E < e⁻². The same process was used to identify eukaryotic and bacterial protein pairs. Finally, archaeal genomes were compared to one another in order to confirm the underlying assumption that this method reflects organismal phylogeny. Indeed, Archaea had larger numbers of reciprocal best hits to other Archaea than did Bacteria of a similar genome size.
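The reciprocal-best-hit count described above can be sketched in Python. This is only an illustration under stated assumptions, not the Perl script used in the study: it assumes the BLASTP results for each direction are available as 12-column tab-delimited text (query ID in column 1, subject ID in column 2, E value in column 11, bit score in column 12), it reads the stringency threshold as 1e-10, and it breaks ties by bit score. The function names (`parse_blast_tabular`, `best_hits`, `reciprocal_best_hits`) are hypothetical.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject evalue bitscore")


def parse_blast_tabular(lines):
    """Yield Hit records from 12-column tab-delimited BLAST output (assumed format)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))


def best_hits(hits, max_evalue=1e-10):
    """Map each query to its single best subject among hits with E < max_evalue."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # fails the stringency threshold
        current = best.get(hit.query)
        # Lower E value wins; a higher bit score breaks ties (an assumption).
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

The number of reciprocal best hits for one genome pair is then `len(reciprocal_best_hits(...))`, computed once per bacterial-archaeal or bacterial-eukaryotic pair. Passing `max_evalue=1e-2` corresponds to the looser threshold applied to a subset of genomes.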
Regardless of the E value employed as a cutoff, the number of reciprocal BLAST hits between Bacteria and Archaea or Eukarya increased linearly with the number of genes in the bacterial genome until the number of genes in the bacterial genome reached approximately 4,000, after which the number of reciprocal best hits leveled off (Fig. 1) (see also Tables S2 and S3 in the supplemental material). Multiple regressions indicated that the number of reciprocal best hits depended on the numbers of both bacterial and archaeal genes (P < 0.001) but not on the number of eukaryotic genes. In the multiple regression of archaeal reciprocal best hit data, the number of genes in the bacterial genome was the most important variable used to explain the data (β coefficient, >0.7). The lack of importance of the number of eukaryotic genes for the number of reciprocal best hits could be due to the fact that the number of genes in all the eukaryotic genomes is relatively large compared to the number of genes in a bacterial or archaeal genome.
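As a rough illustration of what a standardized (β) coefficient measures in a regression with two predictors, such as the numbers of bacterial and archaeal genes, the sketch below computes both coefficients from pairwise Pearson correlations. It assumes an ordinary least-squares model with exactly two predictors; the text does not state which software or model specification was used, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(y, x1, x2):
    """Standardized least-squares coefficients for the model y ~ x1 + x2.

    With all variables z-scored, the two coefficients follow directly
    from the three pairwise correlations.
    """
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    denom = 1.0 - r_12 ** 2
    if denom == 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2
```

Here `y` would hold the reciprocal-best-hit count for each genome pair, and `x1` and `x2` the gene counts of the two genomes in that pair. The predictor with the larger absolute coefficient explains more of the variation once both are on the same scale.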
FIG. 1. Comparison between the number of best reciprocal BLAST hits (E < e⁻¹⁰) between bacterial and archaeal (A) or bacterial and eukaryotic (B) genomes and the number of genes in the bacterial genome. Bacteria whose genomes have more than 4,000 genes are listed along the top axes. Each bacterial genome was separately compared to each archaeal or eukaryotic genome. Results are shown for the bacteria Kuenenia stuttgartiensis, Rhodopseudomonas palustris, Colwellia psychrerythraea, Bordetella bronchiseptica, Anabaena variabilis, Desulfitobacterium hafniense, Nostoc spp., Bacillus cereus, Nocardia farcinica, Blastopirellula marina, Pseudomonas fluorescens, Mesorhizobium loti, Hahella chejuensis, Rhodopirellula baltica, Streptomyces avermitilis, Bradyrhizobium japonicum, and Gemmata obscuriglobus. Information for bacteria with fewer than 4,000 genes can be found in Tables S2 and S3 in the supplemental material. Archaea used were Thermoplasma volcanium, Methanosphaera stadtmanae, Picrophilus torridus, Methanopyrus kandleri, Methanococcus maripaludis, Methanococcus jannaschii, Aeropyrum pernix, Methanobacterium thermoautotrophicum, Pyrococcus horikoshii, Halobacterium spp., Thermococcus kodakarensis, Archaeoglobus fulgidus, Pyrobaculum aerophilum, Natronomonas pharaonis, Sulfolobus tokodaii, Haloarcula marismortui, and Methanospirillum hungatei. Eukaryotes used were Aspergillus fumigatus, Ashbya gossypii, Arabidopsis thaliana, Caenorhabditis elegans, Candida glabrata, Dictyostelium discoideum, Drosophila melanogaster, Kluyveromyces lactis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Yarrowia lipolytica, Cyanidioschyzon merolae, Debaryomyces hansenii, and Thalassiosira pseudonana. Data for Photorhabdus luminescens and for Methanosarcina acetivorans were omitted for the sake of viewing clarity.
Our data suggest that the initial genome analyses found R. baltica to have an unusually large number of genes with best hits to Archaea and Eukarya (7) because of the dearth of sequences from closely related taxa in the database rather than because of its phylogenetic position. Indeed, when we repeated this analysis by comparing all 7,325 potential proteins in R. baltica to the NCBI nonredundant database in May 2006, 4,243 proteins had a significant hit (BLASTP expectation value, <10⁻³), compared to 3,380 in 2003. Furthermore, now that the genomes of K. stuttgartiensis and B. marina are available in GenBank, the number of apparent BLAST hits of R. baltica to archaeal and eukaryotic sequences has decreased to 0.52% and 0.87% of the total number of genes (0.9% and 1.5% of the genes with significant BLAST hits). These results underscore the danger of assigning phylogenetic affinities from top BLAST hits, especially if close relatives are absent from the sequence database.
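The two sets of percentages quoted above differ only in their denominator, which a short calculation makes explicit. The sketch below uses only the figures given in the text (7,325 predicted proteins, 4,243 with a significant hit in 2006, and 3,380 in 2003); the variable and function names are hypothetical.

```python
TOTAL_PROTEINS = 7325        # predicted R. baltica proteins
WITH_HIT_2006 = 4243         # proteins with a significant hit, May 2006
WITH_HIT_2003 = 3380         # proteins with a significant hit, 2003


def rebase_percentage(pct_of_total, total=TOTAL_PROTEINS, subset=WITH_HIT_2006):
    """Convert a percentage of all proteins into a percentage of the subset with hits."""
    return pct_of_total * total / subset


# Share of all proteins with a significant database hit in each year.
coverage_2003 = 100.0 * WITH_HIT_2003 / TOTAL_PROTEINS
coverage_2006 = 100.0 * WITH_HIT_2006 / TOTAL_PROTEINS

# Archaeal and eukaryotic best hits, re-expressed relative to proteins with hits.
archaeal_pct_of_hits = rebase_percentage(0.52)
eukaryotic_pct_of_hits = rebase_percentage(0.87)
```

Rounded to one decimal place, the rebased values agree with the 0.9% and 1.5% given in parentheses above.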
C.A.F. was funded by NSF MCB 0132101 to J. Murray and J. Staley. G.R. was funded by NSF OCE-0220826. Sequencing of the G. obscuriglobus genome at TIGR was accomplished with support from the Department of Energy.
Supplemental material for this article may be found at http://aem.asm.org/.