Applied and Environmental Microbiology, December 2003, p. 7298-7309, Vol. 69, No. 12
0099-2240/03/$08.00+0 DOI: 10.1128/AEM.69.12.7298-7309.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Institut für Mikrobiologie und Genetik, Universität Göttingen,1 Laboratorium für Genomanalyse der Universität Göttingen,37077 Göttingen,2 Institut für Grenzflächenbiotechnologie, Universität Duisburg-Essen, 47057 Duisburg,3 Institut für Molekulare Enzymtechnologie, Heinrich Heine-Universität Düsseldorf, Forschungszentrum Jülich, 52425 Jülich,5 Gesellschaft für Biotechnologische Forschung, 38124 Braunschweig,Germany4
Received 23 May 2003/ Accepted 4 September 2003
To overcome the difficulties and limitations associated with cultivation techniques, several DNA-based molecular methods have been developed. In general, methods based on 16S rRNA gene analysis provide extensive information about the taxa and species present in an environment. However, these data usually provide little information about the functional role of any of the different microbes within the community and the genetic information they contain.
Metagenomics is a new and rapidly developing field that tries to analyze the complex genomes of microbial niches. Although the term metagenome has been introduced only recently to describe the genomes of noncultivated microbes present within a soil microbial community (10), earlier studies used a similar approach. In one such study, the approach was employed for the isolation of cellulases from a thermophilic environment (11), and in a second study the approach was used for the phylogenetic characterization of marine picoplankton (27).
Since then, an increasing number of publications have applied similar techniques to study the metagenomes of diverse microbial communities. The microbial niches addressed within these studies included the characterization of a wide range of different microbial communities ranging from soil and rather extreme environments to laboratory enrichments (2-5, 7, 19, 22, 24, 25, 32). The goal of these studies was to increase our understanding of ecological and molecular processes in the microbial communities, and several of these studies also aimed at an increased understanding of the genome information of individual microbes within the complex communities. In addition, the approach has been used to identify a number of novel biocatalysts and other interesting biomolecules from noncultivated microbes (8, 9, 11-13, 16, 17, 32). Altogether, these studies have led to an increased knowledge of the genetic structure of the microbial communities studied. Despite the number of metagenome studies, the amount of DNA information generated for individual niches is still very limited if one takes into account that the DNA information of several thousand different microbial genomes may be stored within a single microbial habitat (31). Thus, conclusions on the functional role of the microbes and sequences identified within these highly diverse bacterial communities cannot easily be made.
Since it can be assumed that microbial biofilms commonly found in drinking water distribution systems typically consist of fewer bacterial species than soil samples, they are ideal models to study metagenomes in combination with a phylogenetic analysis. The microbial communities that build drinking water biofilms have been characterized to some extent by 16S rRNA gene analyses. While these studies have mostly focused on the detection of bacterial species causing infectious diseases, such as Legionella and indicator organisms for fecal contamination, such as coliform bacteria (30), a number of more recent studies have led to the identification of novel nonpathogenic bacterial species (14, 15). Thus, the metagenomes of drinking water biofilms represent distinct and highly intriguing ecological niches, and their analysis is of significance to both the water suppliers and the consumers.
The aim of this study was to give insight into the metagenomes of drinking water biofilms grown on rubber-coated valves. For this purpose we characterized the phylogenetic structure of bacterial biofilms derived from rubber-coated drinking water valves by sequencing 16S rRNA clones. Additionally, we generated and analyzed about 2.0 Mb of DNA sequence information with a snapshot genome sequencing approach. With this sequence information, we analyzed the DNA sequence of four cosmid clones. This information has been used to set up a database to link the phylogenetic information with the genomic and functional information and to shed new light on the fine structure and evolution of the metagenomes of such complex microbial communities.
FIG. 1. Bacterial biofilm observed on the surface of a rubber-coated drinking water valve. The arrow indicates the bacterial biofilm. The drinking water valve was obtained from a drinking water pipe with an internal diameter of 15 cm. The valves are normally submerged in the drinking water.
Cosmid libraries were prepared in pWE15 (Stratagene, La Jolla, Calif.) with standard protocols (8). DNA fragments (20 to 40 kb) obtained after partial Sau3A digestion were ligated into the BamHI restriction site of the cosmid vector. Phage packaging mixes were obtained from Stratagene (La Jolla, Calif.), and infection of Escherichia coli VCS257 was performed according to the manufacturer's protocol. For the construction of the snapshot libraries, DNA fragments of 3 to 7 kb were ligated into the sequencing vector pTZ19R (Amersham-Pharmacia, Essex, United Kingdom) and transformed into E. coli. For the construction of cosmid and small-insert libraries, the DNAs of the three samples were pooled. This was necessary because the amounts of DNA obtained from each individual sample were not sufficient to allow construction of separate libraries for the individual samples. Therefore, the DNA of the three samples is considered a pool of biofilm genomes throughout this work, and the data summarize the possible microbes and genes occurring in these microbial niches.
PCR and cloning of 16S rRNA sequences. Bacterial biofilm ribosomal DNAs (rDNAs) were amplified by PCR from DNA in reaction mixtures containing (as final concentrations) 1x PCR buffer (Perkin-Elmer), 2.5 mM MgCl2, 200 µM each deoxynucleoside triphosphate, 300 nM each forward and reverse primer, and 0.25 U of Taq DNA polymerase (Perkin-Elmer) per ml. Reaction mixtures were incubated in a gradient thermal cycler (MJ Research, Boston, Mass.) at 96°C for 5 min for initial denaturation, followed by 25 to 35 cycles at 94°C for 30 s, 50°C for 45 s, and 72°C for 1.5 min, followed by a final extension period of 10 min at 72°C. For the clone library, rRNA genes were amplified with the universal reverse oligonucleotide primer 5'-CGGCCTCTACCTTGTTACGAC-3' and the universal forward primer 5'-AGAGTTTGATCCTCACTGGCTCAG-3'. The resulting PCR products (1.5 kb) were cloned with a Topo TA cloning kit in accordance with the manufacturer's instructions (Invitrogen Corp., Karlsruhe, Germany). Plasmid DNAs containing inserts were sequenced with standard protocols for ABI 377 automated sequencing.
Assignment of cloned sequences to established phylogenetic divisions. The phylogenetic diversity was assessed with clone libraries of the 16S rRNA gene sequences of the different biofilm samples. The cloned 16S rRNA gene sequences were compared with reference sequences contained in the NCBI nucleotide sequence database with the FASTA program. For calculation of a phylogenetic tree, all ambiguous positions were excluded from similarity calculations. Sequences were screened for chimeras with the Check_Chimera program of the Ribosome Database Project and by manual alignments of secondary structure. As a final check for chimeras, each sequence was split into 5' and 3' fragments, which were analyzed separately by Blast searching of GenBank. Sequences for which either the 5' or 3' fragment had significantly different closest relatives were considered probable chimeras and were removed from the data set.
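The final chimera check described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the original text: the sequence is split at its midpoint, any difference between the closest relatives of the two halves counts as "significantly different", and `closest_relative` is a hypothetical callback standing in for a database search such as a Blast query. `toy_lookup` exists only to make the example runnable.

```python
from typing import Callable, Tuple


def split_halves(seq: str) -> Tuple[str, str]:
    """Split a sequence into a 5' half and a 3' half (midpoint is an assumption)."""
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


def is_probable_chimera(seq: str, closest_relative: Callable[[str], str]) -> bool:
    """Flag a sequence whose 5' and 3' halves have different closest relatives.

    `closest_relative` is a hypothetical stand-in for a database search
    (for example, the top hit of a Blast search); it is not a real API.
    """
    five_prime, three_prime = split_halves(seq)
    return closest_relative(five_prime) != closest_relative(three_prime)


def toy_lookup(fragment: str) -> str:
    """Toy classifier for illustration only: labels a fragment by its dominant base."""
    return "taxon_A" if fragment.count("A") >= fragment.count("G") else "taxon_G"


if __name__ == "__main__":
    print(is_probable_chimera("AAAAAAAAGGGGGGGG", toy_lookup))  # halves disagree
    print(is_probable_chimera("AAAAAAAAAAAAAAAA", toy_lookup))  # halves agree
```

In practice the comparison would need a tolerance (for instance, comparing taxonomic ranks rather than exact hit names), which this sketch deliberately leaves out.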
For calculation of the dendrogram shown in Fig. 2, cloned sequences were aligned with 16S rRNA gene sequences representative of the main bacterial divisions. Sequences were aligned with 16S rRNA sequences of other bacteria obtained from the Ribosomal Database Project (RDP-II) (18). Matrices of evolutionary distance were computed from the sequence alignment with the program DNADIST implemented in the software package Phylip (http://evolution.genetics.washington.edu/phylip.html) (version 3.5). For calculations of a phylogenetic tree from the distance matrices, the program applies the neighbor-joining method described by Saitou and Nei (23).
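To make the distance-matrix step concrete, the following Python sketch computes pairwise evolutionary distances from an alignment. It is not the DNADIST implementation. The text does not state which substitution model was used, so the Jukes-Cantor correction here is an assumption, as are the function names. Positions with gaps or ambiguous bases are skipped, in line with the exclusion of ambiguous positions mentioned above.

```python
import math
from typing import List

UNAMBIGUOUS = set("ACGT")


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences.

    Columns where either sequence carries a gap or an ambiguous base are skipped.
    """
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            if x != y:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    if mismatches == 0:
        return 0.0
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise Jukes-Cantor distances."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = jc_distance(seqs[i], seqs[j])
    return matrix


if __name__ == "__main__":
    alignment = ["ACGTACGT", "ACGTACGA", "ACGNACGT"]
    for row in distance_matrix(alignment):
        print(["%.4f" % d for d in row])
```

A matrix of this kind is the input that a neighbor-joining implementation would then cluster into a tree.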
FIG. 2. Dendrogram of the 16S rRNA clones identified within the drinking water bacterial community DNA, showing the relationship to the closest known relatives. Only the proteobacterial lineages are depicted. (A) Alpha-proteobacterial lineage; (B) beta-proteobacterial lineage; (C) gamma-proteobacterial lineage; (D) delta-proteobacterial lineage. The phylogenetic trees were calculated with the software package MEGA version 2.1 (Molecular Evolutionary Genetics Analysis software, Arizona State University, Tempe, Ariz.) and verified with the Phylip software package from the Ribosomal Database Project (RDP) (18). Only high-quality sequences from the 16S rRNA gene clones were included in the calculations, and the hypervariable regions in the 16S rRNA molecule were excluded from the calculations. Numbers indicate data from a bootstrap analysis, and values below 50% are not indicated.
exonuclease digestion. The fragment size of the amplified V4 and V5 regions of the 16S rRNA gene was 390 bp. To introduce specific secondary structures in the strands, samples were heat denatured, quickly chilled on ice, and then electrophoresed on nondenaturing gels; bands were visualized by silver staining. For determining the band numbers, the gels were digitized to create TIF files. Analysis of the 16S rRNA fingerprints was performed with the software package GelCompare II (Applied Maths, Kortrijk, Belgium). The background was subtracted with rolling-circle correction (circle diameter, 30 points), and lanes were normalized. Only bands with an intensity of 2% or more of the total lane intensity were considered.
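The 2% intensity cutoff for band counting can be written as a short Python sketch. The function name and the list-of-intensities input are assumptions made for illustration; the sketch does not reproduce the gel analysis software and assumes background subtraction and lane normalization have already been done.

```python
from typing import List


def significant_bands(intensities: List[float], min_fraction: float = 0.02) -> List[int]:
    """Return indices of bands contributing at least `min_fraction` of the lane total."""
    total = sum(intensities)
    if total <= 0:
        return []
    return [i for i, value in enumerate(intensities) if value / total >= min_fraction]


if __name__ == "__main__":
    lane = [50.0, 30.0, 19.0, 1.0]  # toy intensities, not measured data
    kept = significant_bands(lane)
    print("bands counted:", len(kept), "indices:", kept)
```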
Nucleotide sequence data analysis. Automated DNA sequencing was performed with ABI377 and dye terminator chemistry following the manufacturer's instructions; when required, gaps in the DNA sequences were filled by PCR. The nucleotide sequences obtained for larger contigs or complete cosmids have been deposited at GenBank, and accession numbers are listed in Tables 4 to 6. The sequence data of the cloned 16S rRNA genes were deposited at GenBank, and all 81 accession numbers (AY187312 to AY187393) are available at www.gwdg.de/ biofilm/ together with the corresponding sequences; the snapshot genome sequences are available at the same web pages together with the BlastX results. Also, the sequences of the completely sequenced cosmids are available together with the GenBank accession numbers and other useful information on this web site. The GC contents of the nucleic acid sequences from the cosmids and the snapshot library were calculated with the program Geecee from the free open-source software package for sequence analysis, EMBOSS (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/) running on a local Linux server.
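For readers unfamiliar with the calculation, G+C content is the fraction of G and C bases in a sequence. The Python sketch below is not the EMBOSS program. The function name is hypothetical, and ignoring ambiguous bases such as N in both numerator and denominator is an assumption about how such characters should be treated.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


if __name__ == "__main__":
    print("%.1f%%" % (100 * gc_fraction("GGCCAATT")))
```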
TABLE 4. Genes identified and observed similarities for ORFs identified on pBioVa
TABLE 6. Genes and observed similarities for ORFs identified on the 75-kb DNA fragment formed by overlapping cosmids pbioX and pbioYa
Interestingly, no single phylogenetic group of bacteria dominated the clone collection. Instead, common bacterial phylotypes that occurred in the sample included members of the alpha-, beta-, delta-, and gamma-Proteobacteria, the Cytophaga-Flavobacterium-Bacteroides group, the Actinobacteria, and the low G+C gram-positive group (Fig. 2A to D and Table 1). Altogether, the Proteobacteria constituted 86% of the clones identified and thus represented the largest fraction of microbes within the bacterial community. The Actinobacteria, the low G+C gram-positives, the Cytophaga-Flavobacterium-Bacteroides group, and the Acidobacteria constituted only minor fractions of the clones. Finally, a small number of sequences were highly similar to unclassified bacteria (Table 1). While several of the isolates were highly similar to previously described microbial species within drinking water bacterial communities, a novel observation was that a limited number of the clones identified were closely related to the microbes which belong to the genera Rhizobium and Bradyrhizobium.
TABLE 1. Different phylogenetic groups and clones observed in the 16S rRNA clone library derived from a drinking water biofilm community DNAa
Random sequencing of 2,500 small insert clones containing biofilm DNA. Total genomic DNA of the drinking water biofilms was used to construct a small insert library with inserts ranging in size from 1 to 5 kb. Of the 5,000 random sequences obtained, 2,496 produced high-quality DNA sequences (Table 2), and 2,504 sequences (50.1%) were not included in further analyses because of poor sequence quality, short length of the reads, or vector contaminations. In this way, more than 2.0 Mb of high-quality nucleotide sequence were collected and analyzed. The G+C content of the high-quality sequences was 62%.
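The read counts above can be checked with a few lines of Python. All numbers come from the text. The mean read length is a derived figure not stated in the original, and because the text reports "more than 2.0 Mb" it should be read as a lower bound.

```python
total_reads = 5000            # random sequences obtained
high_quality_reads = 2496     # reads retained for analysis
high_quality_bases = 2.0e6    # "more than 2.0 Mb", so treated as a lower bound

discarded_reads = total_reads - high_quality_reads
discarded_pct = 100 * discarded_reads / total_reads

# Derived value, not reported in the text: approximate mean length of a retained read.
mean_read_length = high_quality_bases / high_quality_reads

print("discarded reads: %d (%.1f%%)" % (discarded_reads, discarded_pct))
print("mean read length: at least about %.0f bp" % mean_read_length)
```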
TABLE 2. Overview of snapshot genome sequence analysis of a small insert library of drinking water biofilm DNAa
biofilm.de together with the corresponding sequences and other information on the metagenome analyzed.
TABLE 3. Functional classes and possible ORFs identified in random biofilm genome sequences after automated BlastX searchesa
Furthermore, a number of genes were identified which encoded proteins involved in the degradation of aromatic compounds. These included mostly genes involved in the degradation of toluate and benzoate or related compounds. The partial proteins were highly similar to corresponding proteins from gram-positive and gram-negative microbes. Also, 14 possible ORFs were identified encoding genes involved in the degradation or modification of polysaccharides (i.e., starch and cellulose). Surprisingly, 21 putative protease genes were identified and 12 ORFs possibly involved in the catabolism of amino acids were found. Altogether, these findings suggest that the microbial community analyzed in this study is nutritionally highly diverse and able to catabolize a wide range of different carbon and energy sources.
Other remarkable features included the identification of 28 (2.1%) sequences encoding genes that are involved in protection response, such as antibiotic resistance or metal detoxification. Eight clones carried possible tetracycline resistance genes, and seven clones were possibly involved in resistance to β-lactam antibiotics. Two ORFs were identified that might be linked to bacterial polyketide synthesis. Other features identified included possible ORFs involved in bacterial photosynthesis and light emission. Finally, it is noteworthy that none of the sequences of the snapshot analysis encoded proteins specifically related to pathogenic mechanisms. A complete list of all the possible ORFs identified and their possible functions is available at http://www.gwdg.de/ biofilm/overviewtable.htm.
Statistical and phylogenetic analysis of the BlastX hits. To further exploit the DNA snapshot sequences, we analyzed the distribution of BlastX hits over different bacterial groups. For this purpose, the results of 1,026 BlastX similarity searches were evaluated. The statistical analysis of the BlastX searches indicated that the major fraction (84%) of all proteins were highly similar to proteins derived from the Proteobacteria (Fig. 3A). Among these, most were highly similar to the group of the alpha- and gamma-Proteobacteria (74.3%). Among the proteins most similar to proteins originating from the alpha-Proteobacteria, the largest fraction were highly similar to rhizobial proteins (i.e., Rhizobiales) (Fig. 3B). Interestingly, within the Rhizobiales most deduced proteins were highly similar to Sinorhizobium meliloti and Mesorhizobium loti proteins (Fig. 3C). Also, a significant fraction of proteins (5.5%) were highly similar to proteins originating from microbes closely related to the typical freshwater microbe Caulobacter crescentus.
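A tally of this kind, counting the taxon of the best BlastX hit for each sequence and reporting shares of the total, can be sketched in Python as follows. The function name, the input format (one taxon label per search), and the toy data are assumptions for illustration; the `min_hits` argument mirrors the "more than N hits" cutoffs used for Fig. 3B and 3C.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hit_distribution(top_hit_taxa: Iterable[str], min_hits: int = 0) -> Dict[str, Tuple[int, float]]:
    """Map each taxon to (hit count, percentage of all hits), keeping taxa with more than `min_hits` hits."""
    counts = Counter(top_hit_taxa)
    total = sum(counts.values())
    return {
        taxon: (n, 100.0 * n / total)
        for taxon, n in counts.most_common()
        if n > min_hits
    }


if __name__ == "__main__":
    # Toy labels only; these are not the study's BlastX results.
    toy_hits = ["Proteobacteria"] * 8 + ["Actinobacteria"] + ["Firmicutes"]
    for taxon, (n, pct) in hit_distribution(toy_hits).items():
        print("%-15s %3d hits (%.1f%%)" % (taxon, n, pct))
```

Percentages are computed against all hits, so raising `min_hits` hides rare taxa without changing the shares of the remaining ones.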
FIG. 3. Distribution of BlastX similarities among bacterial phyla (A), bacterial families (B), and bacterial genera and species (C). The results indicate the distribution of the highest similarities observed after 1,026 BlastX searches. The DNA sequences were derived from the snapshot genome sequencing project, and only those sequences which resulted in the identification of functional proteins were included. In B, only those bacterial families for which more than 20 hits (2%) could be observed were included; and in C, only the bacterial species for which more than 10 hits (1%) could be observed were included.
Sequence analysis of large insert clones. To further exploit the genomic information contents of drinking water biofilms, the complete DNA sequences of four cosmid clones were determined. Three of the sequenced cosmid clones were randomly selected from a library containing approximately 2,500 clones, and the sequenced clones were designated pbioW, pbioV, and pbioX. Cosmid clone pbioY was selected because it overlapped cosmid pbioX. In total, 144 kb of additional DNA sequence information was generated, and this resulted in the identification of 94 ORFs. The G+C content was highly similar for all the cosmids and ranged between 65 and 67%. The nucleotide sequences obtained for the cosmids have been deposited at GenBank, and the accession numbers are listed in Tables 4 to 6. All ORFs identified on the sequenced cosmids are summarized in Fig. 4.
FIG. 4. Physical maps of the central parts of four cosmid clones isolated from the biofilm metagenome library. Arrows indicate the locations and directions of transcription of the identified open reading frames (ORFs) on the different cosmids. Observed similarities for the indicated ORFs are listed in Tables 4 to 6, together with the GenBank accession numbers. Color codes indicate the highest similarities of the deduced protein sequences to proteins of known bacterial species and their phylogenetic positions within the Proteobacteria, Actinobacteria, and Firmicutes. Only the highest similarities were considered for this analysis; color coding is identical to the color coding used in Fig. 3. The clones pbioX and pbioY form a 75-kb overlapping DNA fragment, and the DNA sequence was submitted to GenBank in two parts (contig1, csx001 to csx024; contig2, csx026 to csx051).
Cosmid clone pbioW encoded 22 ORFs in its 30.8-kb insert. Among these was a cluster of ORFs possibly involved in nitrogen regulatory circuits. Other possible genes encoded included a heme oxygenase and two proteins possibly involved in DNA modification. In addition, a number of hypothetical proteins were identified (Table 5).
TABLE 5. Genes and observed similarities for ORFs identified on pBioWa
Of the 94 identified proteins, three were highly similar to proteins derived from delta-Proteobacteria, 14 were highly similar to proteins derived from the alpha-Proteobacteria, 34 were highly similar to the beta-proteobacterial proteins, and 30 were highly similar to proteins derived from gamma-proteobacterial species. Only 13 proteins were highly similar to known proteins from gram-positive microbes or other microbial species (Fig. 4). Altogether, the analysis of large insert clones also supports the concept that the studied biofilm is mainly constructed of microbes closely related to known species of the alpha-, beta-, and gamma-proteobacterial lineages.
In summary, all these data give a first insight into the complex metagenome of biofilms derived from rubber-coated valves used in drinking water networks.
Surprisingly, many of the 16S rRNA clones analyzed in this work were highly similar to microbes closely related to rhizobial species. Microorganisms from the gram-negative genera Rhizobium, Sinorhizobium, Bradyrhizobium, Mesorhizobium, and Azorhizobium, collectively termed rhizobia, are well known for their capacity to establish N2-fixing symbioses with legume plants (6). The observation here that rhizobial species or closely related microbes are possibly present within the biofilm community is a novel finding and might suggest an ecological role for these microbes in these nutrient-deprived environments.
In the second approach applied in this work, we analyzed and evaluated the genome information of 2,496 high-quality snapshot sequences (Table 2), which encode approximately 2.0 Mb of raw DNA sequence information. We speculate that the overall biofilm metagenome of the studied drinking water biofilm has a size of at least 324 to 648 Mb. This is based on the finding that the biofilm communities of the analyzed samples consisted of more than 81 different microbial species (Fig. 2), each with a genome size of 4 to 8 Mb. Thus, the amount of genomic sequences generated corresponds to approximately 0.3 to 0.6% of the genomic information stored in the samples analyzed.
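The size estimate in the preceding paragraph is a simple product and ratio, reproduced here as a Python sketch. The input values (81 species, 4 to 8 Mb per genome, 2.0 Mb sequenced) are taken from the text; the function names are hypothetical.

```python
from typing import Tuple


def metagenome_size_range_mb(n_species: int, genome_min_mb: float, genome_max_mb: float) -> Tuple[float, float]:
    """Lower and upper metagenome size estimates, assuming one genome per species."""
    return n_species * genome_min_mb, n_species * genome_max_mb


def coverage_percent_range(sequenced_mb: float, size_range_mb: Tuple[float, float]) -> Tuple[float, float]:
    """Share of the metagenome covered by the sequenced amount, as (low, high) percentages."""
    smallest, largest = size_range_mb
    return 100 * sequenced_mb / largest, 100 * sequenced_mb / smallest


if __name__ == "__main__":
    size = metagenome_size_range_mb(81, 4, 8)
    low, high = coverage_percent_range(2.0, size)
    print("metagenome size: %d to %d Mb" % size)
    print("coverage: %.1f%% to %.1f%%" % (low, high))
```

Because the species count is a minimum, the size range is a lower bound and the coverage figures are correspondingly upper bounds.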
Although the available sequences do not allow a complete analysis of the physiological and metabolic functions within this bacterial community, the sequences give a first insight into the biofilm genome structure and its metabolic potential. The genomic information suggests that the biofilm community is able to metabolize and catabolize a wide range of complex nutrients. Possible carbon sources available to the biofilm bacteria might be derived from the additives within the rubber coating, namely fatty acids, solubilizers, paraffin oils, and other compounds. However, additional experiments are necessary to correlate the occurrence and frequency of the catabolic genes identified through the snapshot sequencing with the in vivo catabolism of such compounds.
Our third strategy focused on the DNA analysis of large cosmid clones. The information on the DNA sequence has led to the identification of 94 ORFs (Fig. 4). The data obtained by whole cosmid sequencing supported the concept that our model microbial community is constructed of novel uncultured microbes closely related to Proteobacteria, and these findings support the data obtained through the phylogenetic analysis (Fig. 2A to D) and the snapshot sequencing analysis (Fig. 3). Although the observed similarities were surprisingly high for several of the identified genes, we have no evidence indicating from which species the sequenced cosmids were derived.
It is further noteworthy that the whole-cosmid sequencing as well as the snapshot genome sequencing did not result in the identification of genes encoding potential virulence factors. Therefore, we conclude that the microbial community within the studied microbial niche has only negligible pathogenic potential. This conclusion is further supported by the phylogenetic data (Fig. 2). Although the phylogenetic analysis indicated the presence of several potentially pathogenic microbes, the majority of clones were similar to nonpathogenic microbial species.
Lastly, the sequencing data have been used to set up a publicly accessible database. Together with this information, a Blast server has been set up to allow in silico gene mining in the accumulated DNA sequences. Thus, one of the strengths of this report is that all the data generated are available in a searchable database, giving insight into the fine structure of the metagenome studied and other features of this unique biofilm community.