Applied and Environmental Microbiology, March 2006, p. 2092-2101, Vol. 72, No. 3
0099-2240/06/$08.00+0 doi:10.1128/AEM.72.3.2092-2101.2006
Brookhaven National Laboratory, Biology Department, Building 463, Upton, New York 11973,1 IRD, UR 101, IFR-BAIM, Université de Provence, ESIL, F-13288 Marseille Cedex 09, France,2 and Universiteit Hasselt, Environmental Sciences, Building D, Universitaire Campus, Diepenbeek B3590, Belgium3
Received 13 May 2005/ Accepted 28 December 2005
An emerging alternative approach to studying microbial communities is the use of microarrays designed to detect specific sequences from important lineages of microorganisms known or suspected to be present in a particular population (16, 21, 22). While this approach can provide a comprehensive quantitative survey for the presence or absence of a particular sequence, the technique has a closed architecture; i.e., it cannot identify novel sequences, nor can it easily distinguish between two or more closely related sequences in mixed populations. For microbial community analysis to be meaningful, the ability to identify previously uncharacterized members and to discriminate between closely related organisms in a population is essential.
The improvement of sequencing technologies has made metagenome shotgun sequencing of an environmental sample feasible; however, most environmental communities are far too complex to be fully sequenced in this manner. Reconstruction of community metagenomes was initially attempted for viral communities in the ocean and in human feces (2-4) and has since been applied to samples from the Sargasso Sea (29) and an acid mine drainage biofilm (25). Most marine communities, however, are far richer in species diversity, on the order of 100 to 200 species per ml of water (8, 9), further complicating sequencing and assembly efforts. Soil communities are even more complex, with an estimated species richness on the order of 4,000 species per gram of soil (8, 9, 24). Sequencing a soil community's metagenome will require technological developments aimed at increasing sequencing capacity and data processing, along with more cost-effective sequencing chemistries.
Recently, serial analysis of ribosomal sequence tags (SARST) was developed as a novel technique for characterizing microbial community composition. The SARST method captures sequence information from concatenates of short PCR amplicons (tags) derived from either the V1 (20) or V6 hypervariable regions (15) of 16S rRNA genes from complex bacterial populations. The major advantage of the SARST method is the high-throughput generation of sequence data that can be directly used for species identification and comparisons between different experiments.
Genome signature tags (GSTs) were developed for use in a cost-effective sequencing-based method to identify and quantitatively analyze genomic DNA or mixtures of genomic DNAs (10). In silico analysis of the 168 entries in the current NCBI database of completely sequenced genomes indicates that in many cases the individual GST sequences provided sufficient specificity for species identification. This result prompted us to look for fragmenting enzymes that would generate only one or a few informative tags per organism, which in turn would reduce the complexity of the tag libraries and decrease the amount of sequencing required to characterize complex microbial communities. Since we were unable to identify a universal fragmenting enzyme that would generate a limited number of tags from all the listed genomes, we decided to devise a modified approach that uses conserved gene sequences in place of the requirement for a fragmenting enzyme. Based on the position of the conserved region and the orientation of the primer, single-point GSTs (SP-GSTs) can be generated internally or externally for any gene of interest, such as the 16S rRNA, rpoC, recA, and uvrB genes. This new approach is schematically outlined for the 16S rRNA gene in Fig. 1. In this paper we describe the application of this method to discriminate between closely related strains of Bacillus cereus and Bacillus anthracis and to identify the individual members of a defined microbial community.
FIG. 1. Schematic representation of the SP-GST approach on the 16S rRNA gene. Tags are generated upstream of a conserved domain (e.g., position 8 to 27 in the 16S rRNA gene). DNA is first cleaved to completion with Csp6I, the anchoring enzyme. The free cohesive ends are ligated with an asymmetrical oligonucleotide cassette that restores the recognition sequence for the anchoring enzyme and places an MmeI recognition sequence immediately adjacent to the restored sequence. A biotinylated primer specific for the region of position 8 to 27 in the 16S rRNA gene and pointing outward of this gene is used in a first PCR cycle to linearly amplify the region between this specific domain and the most proximal site for the anchoring enzyme. This will result in the synthesis of the complementary strand of the linker fragment. The resulting single-stranded fragment is then exponentially amplified using a primer unique to the restored sequence of the MmeI cassette and the domain-specific primer. The biotinylated products are bound to streptavidin-coated magnetic beads and then digested with MmeI to release the tags, which are further treated as described in our original GST protocol (10).
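The tag-generation scheme in Fig. 1 can be mimicked in silico. The following is a minimal sketch, not the authors' software: it assumes the genome is given as a plain string on the strand of the 16S rRNA gene, that the tag is read from the Csp6I site (GTAC) nearest upstream of the primer site toward the gene, and that a tag spans 20 nt (the anchoring site plus 16 nt, matching the length of the tag sequences quoted later in the text). The function name `upstream_tag` and the toy sequences are hypothetical.

```python
ANCHOR_SITE = "GTAC"  # Csp6I recognition sequence (G^TAC)
TAG_LENGTH = 20       # assumed: anchoring site plus 16 nt


def upstream_tag(genome, primer_start, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return (tag, distance_to_primer) for the anchoring site nearest
    upstream of primer_start, or None when no usable site exists."""
    genome = genome.upper()
    # rfind returns the last anchor lying entirely before the primer site.
    site = genome.rfind(anchor, 0, primer_start)
    if site == -1:
        return None
    tag = genome[site:site + tag_length]
    if len(tag) < tag_length:
        return None  # site too close to the end of the sequence
    return tag, primer_start - site


# Toy sequence (hypothetical, not taken from any genome):
# two GTAC sites followed by a stand-in primer site.
toy_primer = "AGAGTTTGATCCTGGCTCAG"
toy_genome = "AAAGTACCCCC" + "GTACTTGACCATGCAATCGG" + "TTTTTTTT" + toy_primer
print(upstream_tag(toy_genome, toy_genome.find(toy_primer)))
```

Only the site closest to the primer yields a tag, which is why the number of tags per genome is bounded by the copy number of the target gene.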
For the selection of anchoring enzymes, we surveyed the restriction enzyme database REBASE (http://rebase.neb.com) for enzymes that met the following criteria: are commercially available, recognize a palindromic sequence, create cohesive overhangs, are insensitive to inhibition by DNA methylation, and contain no ambiguity codes. Of the 3,816 enzymes in REBASE, 479 met these criteria and recognized a total of 59 unique sequences as their restriction sites, which we considered as candidates in our in silico survey.
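The selection criteria above amount to a simple filter over enzyme records. The sketch below is illustrative only: the `Enzyme` record layout, the enzyme names, and all flag values are hypothetical placeholders rather than REBASE data, and a cohesive end is inferred from the top-strand cut position not falling at the center of a palindromic site.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class Enzyme:
    name: str                    # hypothetical identifier
    site: str                    # recognition sequence, 5' to 3'
    cut_offset: int              # top-strand cut position within the site
    commercial: bool             # commercially available
    methylation_sensitive: bool  # inhibited by DNA methylation


def is_unambiguous(site):
    """True when the site contains no ambiguity codes (only A, C, G, T)."""
    return set(site) <= set("ACGT")


def is_palindromic(site):
    """True when the site equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]


def leaves_cohesive_ends(enzyme):
    """A palindromic site cut exactly at its center gives blunt ends."""
    return 2 * enzyme.cut_offset != len(enzyme.site)


def candidate_sites(enzymes):
    """Unique recognition sequences of the enzymes that meet all criteria."""
    return sorted({
        e.site for e in enzymes
        if e.commercial
        and not e.methylation_sensitive
        and is_unambiguous(e.site)
        and is_palindromic(e.site)
        and leaves_cohesive_ends(e)
    })


# Placeholder records for illustration only.
example_enzymes = [
    Enzyme("EnzA", "GTAC", 1, True, False),    # passes
    Enzyme("EnzB", "GTAC", 2, True, False),    # blunt cutter
    Enzyme("EnzC", "GANTC", 1, True, False),   # ambiguity code
    Enzyme("EnzD", "GGTCTC", 1, True, False),  # not palindromic
    Enzyme("EnzE", "GATC", 0, False, False),   # not commercial
    Enzyme("EnzF", "GATC", 0, True, True),     # methylation sensitive
    Enzyme("EnzG", "ACGT", 1, True, False),    # passes
]
print(candidate_sites(example_enzymes))
```

Collapsing enzymes to their unique recognition sequences mirrors the step in which many qualifying enzymes reduce to a much smaller set of candidate sites.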
The type IIS restriction enzymes MmeI and EcoP15I were considered for tag generation, yielding tags of 21 bp and 27 bp, respectively. The number of possible sequences for each tag is given by the expression $4^{(m-n+o)}$, where m is the overhang length of the type IIS restriction enzyme, n is the length of the anchoring enzyme's recognition site, and o is the overlap in nucleotide sequence between the recognition site of the type IIS restriction enzyme and the recognition site of the fragmenting enzyme. To design the best SP-GST protocol, 168 unique prokaryotic genomes were surveyed from the NCBI database (ftp://ftp.ncbi.nih.gov/genomes/bacteria) for the in silico generation of SP-GSTs from conserved domains present in the 16S rRNA, rpoC, recA, and uvrB genes. In cases where the sequences of several strains of the same species were available, we selected the strain with the larger genome.
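As a worked example of the tag-diversity expression, the sketch below reads the exponent as m - n + o, which is an assumption about the intended formula: the fixed anchoring site contributes no variability, so it is subtracted from the reach of the type IIS enzyme. The input values (m = 20, n = 4 for GTAC, o = 0) are illustrative and not taken from the text.

```python
def possible_tag_sequences(m, n, o):
    """Number of distinct tags, assuming the exponent is m - n + o.

    m: reach of the type IIS enzyme in nucleotides (illustrative value below)
    n: length of the anchoring enzyme's recognition site
    o: overlap between the two recognition sites
    """
    return 4 ** (m - n + o)


# Illustrative values only: 16 variable positions.
print(possible_tag_sequences(m=20, n=4, o=0))
```

Under these assumed values each tag has 16 variable positions, which is why even short tags can be highly specific.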
DNA isolation, DNA fragmentation, and linker ligation.
Genomic DNA was isolated from all bacterial strains as described in Bron et al. (5). Before a DNA sample was used for the SP-GST protocol, its quality was checked via PCR using the 16S rRNA gene-specific primers 8F and 1392R (1) (Table 1) as previously described (6), while DNAs from clinical B. cereus isolates were also compared using BOX-PCR (18, 26, 30).
TABLE 1. Table of primers
A nonphosphorylated Csp6I-compatible, asymmetric oligonucleotide cassette was created by mixing 3,600 pmol of Csp6I Cas1 (sense strand) and Csp6I Cas2 (antisense strand) (Table 1) with 10 µl of OFA buffer (10 mM Tris-acetate, pH 7.5, 10 mM Mg acetate, 50 mM K acetate; Amersham Biosciences, Piscataway, NJ) and 18 µl of TESL buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA-Na3). The mixture was heated at 95°C for 2 min and then for 10 min at 65°C, 10 min at 37°C, and finally for 20 min at room temperature, and it was then placed on ice. Subsequently, 600 pmol was ligated to the fragmented DNA in a total volume of 50 µl of 1x ligase buffer containing 3 Weiss units of T4 DNA ligase (Takara, Pittsburgh, PA). The reaction mixture was incubated overnight at 16°C, purified by using a GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences, Piscataway, NJ) per the manufacturer's instructions, and eluted in 50 µl of double-distilled water (ddH2O).
Amplification of DNA/adapter product: extended tags.
PCR was performed on the ligation product using a 0.4 µM final concentration of both the 27R-Bio and GST1 primers (Table 1), in 1x Promega buffer (catalog no. M190G; Madison, WI) containing 2 mM Mg sulfate, a 0.3 mM concentration of each deoxynucleoside triphosphate, 5 µl of ligation product, and 1 unit of high fidelity platinum Taq DNA polymerase (Invitrogen, Carlsbad, CA) in a total volume of 50 µl. Only fragments that have the bound asymmetric linker cassette and that contain the annealing site for the 27R-Bio primer will be amplified during this PCR; these fragments are referred to as extended tags. The reaction was carried out with an initial denaturing step for 2 min at 95°C, followed by 35 cycles of 30 s at 95°C, 30 s at 52°C, and 3 min at 72°C, with a final extension step for 8 min at 72°C.
Binding biotinylated fragments to streptavidin beads and MmeI digestion.
A total of 100 µl of thoroughly suspended streptavidin MagneSphere paramagnetic particles (Promega, Madison, WI) was transferred to a 1.5-ml Eppendorf tube and bound to a magnetic stand. The storage buffer was removed; the beads were washed three times with 400 µl of 1x B&W buffer (10 mM Tris-HCl, pH 8.0, 2 M NaCl, 1 mM EDTA) and resuspended in 100 µl of 1x B&W buffer. A total of 50 µl of 2x B&W buffer was added to 50 µl of the PCR mixture, which was then added to the beads. The PCR tube was washed with 200 µl of 1x B&W buffer, and the wash was pooled with the beads. The sample was mixed gently and incubated at room temperature for 1 h with occasional mixing. Unbound DNA fragments were removed by washing the beads once with 400 µl of 1x B&W buffer, twice with TE buffer (10 mM Tris-HCl, pH 7.5, 1 mM EDTA-Na3), and once with 100 µl of MmeI digestion buffer (100 mM HEPES, pH 8.0, 25 mM K acetate, pH 8.0, 50 mM Mg acetate, pH 8.0, 20 mM dithiothreitol, 4 mM S-adenosylmethionine-HCl). The beads were finally resuspended in 100 µl of 1x MmeI digestion buffer containing 8 U of MmeI (New England Biolabs, Beverly, MA) and incubated for 3 h at 37°C. The beads were collected, and the supernatant containing the released tags was removed to a clean 1.5-ml Eppendorf tube. The beads were washed with 100 µl of TESL buffer, which was combined with the first MmeI supernatant. The pooled MmeI digest was extracted with phenol-chloroform (equal mixture, vol/vol) and precipitated overnight at -20°C with 1 ml of ethanol after the addition of 133 µl of 7.5 M ammonium acetate and 2 µl of GlycoBlue (Ambion, Austin, TX). The resulting pellet was washed with cold 75% ethanol, dried, and resuspended in 29.5 µl of TESL buffer plus 4 µl of 10x T4 DNA ligase buffer (Takara, Pittsburgh, PA).
Degenerate linker ligation and GST amplification.
A degenerate linker containing a Csp6I site preceded by a TTT triplet (serving as a punctuation mark to orient the GST toward the 16S rRNA gene) was prepared by annealing Deg.cas1 (sense strand) and Deg.cas2 (antisense strand) (Table 1) as described above. A total of 35 pmol of the degenerate linker (in 3.5 µl) was added to 29.5 µl of suspended tag solution, along with 3 µl of DNA ligase (8 Weiss units; Takara, Pittsburgh, PA), after which the reaction mixture was incubated overnight at 16°C. The ligation product was then subjected to PCR amplification; the cycling programs and reaction mixture composition (50 µl) were as previously described (10), with GST1 and GST2 (Table 1) used as primers.
Linear amplifications to reduce heteroduplexes.
The homology of the adapter sequences results in the formation of heteroduplexes. These were resolved, the unincorporated primers were digested, and the final sample was purified using previously described methods (10) with the same primer modification mentioned above. The only exception is that the 500 µl of amplified product was purified using the GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences, Piscataway, NJ) according to the manufacturer's instructions, and eluted in 240 µl of ddH2O.
Csp6I digestion, concatenation, cloning, and sequencing.
A total of 240 µl of the product of linear amplification to reduce heteroduplexes was digested at 37°C for 3 h with 20 units of Csp6I in a final volume of 400 µl. The digest was purified via phenol-chloroform extraction (equal mixture, vol/vol), ethanol precipitated in the presence of Na acetate and GlycoBlue (Ambion, Austin, TX) carrier, and resuspended in 20 µl of TESL buffer. The sample was then run on a 12% polyacrylamide gel with a 20-bp DNA ladder (Sigma, St. Louis, MO), and the 25-bp band corresponding to the tags was cut out. SP-GSTs were eluted from the pulverized gel by adding 250 µl of TESL buffer and 50 µl of 7.5 M ammonium acetate and by incubating the sample at 37°C for 6 h. The tags were purified using a GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences, Piscataway, NJ) column without the chaotropic agent, thus trapping the polyacrylamide on the column and permitting the small tags to pass through. The tags were then precipitated by adding 2.5 volumes of ethanol and 2.5 µl of GlycoBlue (Ambion, Austin, TX); they were washed twice with ice-cold 80% ethanol, resuspended in 12.5 µl of TESL buffer, and concatenated as previously described (10). The concatenated tags were then purified using a GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences, Piscataway, NJ), and the sample was eluted in 20 µl of ddH2O. Five microliters of this product was cloned into NdeI-digested pGEM5 vector (Promega, Madison, WI). Recombinant clones, obtained after electroporation of competent Escherichia coli TOP10 cells (Invitrogen, Carlsbad, CA), were selected on LB plates containing 100 µg/ml ampicillin supplemented with 0.4 mg/ml X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) and 0.1 mM IPTG (isopropyl-β-D-thiogalactopyranoside).
Plasmid preps, DNA sequencing, and data analysis were carried out as previously described (10). The SP-GST analysis software we developed is now publicly available at (http://genome.bio.bnl.gov:16080/16S_defined_GSTs/).
Real-time PCR.
After sequencing the extended tags of each isolate, primer pairs were designed (see supplemental material) to determine the number of 16S rRNA genes linked to each tag. This was carried out via quantitative real-time PCR (qRT-PCR) using an iCycler and iQ SYBR Green Supermix kit (Bio-Rad, Hercules, CA) chemistry according to the manufacturer's instructions. The qRT-PCR consisted of an initial hot-start activation step at 80°C for 30 s, followed by a denaturation step at 95°C for 30 s, followed by 35 cycles at 95°C for 15 s, 55°C for 30 s, and 72°C for 1.5 min; the final extension was for 4 min at 72°C. It should be noted that for all Pseudomonas samples, qRT-PCR results obtained with 27R were normalized relative to sequence length to obtain true quantification values.
Software programs to extend the SP-GST concept to other functions.
Restriction enzyme candidate sequences were obtained via SQL queries on a PostgreSQL database containing relevant information downloaded from REBASE. A custom program written in C was used to produce tables of tag sequences and their respective distances from adjacent restriction enzyme sites for each bacterial genome and candidate enzyme. Primer sequences and positions were identified in each genome using a second C program, which finds patterns and allows for substitution mismatches. To simulate the various protocols described in this work, we wrote a series of Perl scripts to collate the tag and primer site files and then summarize uniqueness and degeneracies across genomes. Phylogenetic assignments (based on Bergey's taxonomy) were made for each bacterial genome by automatically querying the Ribosomal Database Project website (http://rdp.cme.msu.edu/index.jsp) with 1,500-bp sequences extracted downstream of the 8F (Table 1) priming sites in each genome sequence.
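The primer-site search that tolerates substitution mismatches can be illustrated with a short sketch. This is not the authors' C program; it is a plain sliding-window comparison, and the function name `find_primer_sites`, the mismatch limit, and the toy sequences are assumptions made for illustration.

```python
def find_primer_sites(genome, primer, max_mismatches=2):
    """Return (start, mismatches) for every window of the genome that matches
    the primer with at most max_mismatches substitutions (no indels)."""
    genome = genome.upper()
    primer = primer.upper()
    hits = []
    for start in range(len(genome) - len(primer) + 1):
        mismatches = 0
        for base, expected in zip(genome[start:start + len(primer)], primer):
            if base != expected:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions, abandon this window
        else:
            # The inner loop finished without breaking: window accepted.
            hits.append((start, mismatches))
    return hits


# Toy example (hypothetical sequences): one exact site and one site
# carrying a single substitution.
toy_genome = "GGGG" + "ACCTGATC" + "GGGG" + "ACCTGTTC" + "GGGG"
print(find_primer_sites(toy_genome, "ACCTGATC", max_mismatches=1))
```

Only the forward strand is scanned here; a fuller survey would also scan the reverse complement so that primer sites on either strand are found.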
TABLE 2. Overview of primer sequences designed for the in silico generation of unique identifier tags
TABLE 3. Numbers of rpoC-, uvrB-, and recA-derived tags and the phylogenetic level at which they are able to discriminate the 168 sequenced microbial genomes
HpyCH4IV yields a nondiscriminating tag downstream of the uvrB primer, which was present in Streptomyces coelicolor, Thermus thermophilus, and the archaeon Haloarcula marismortui (results not shown in Table 3). Csp6I also yields one upstream tag unable to distinguish the phylogenetic domain of two organisms: H. marismortui, an archaeon, and Nocardia farcinica, a bacterium. In all these cases the tags were located immediately adjacent (20 nucleotides) to the conserved priming sites.
For rpoC, tags generated upstream with HpyCH4IV and Sau3AI gave the best results (Table 3). The worst case for HpyCH4IV was a single upstream tag unable to discriminate at the phylum level between three Bordetella species, Bordetella bronchiseptica, Bordetella parapertussis, and Bordetella pertussis, and Caulobacter crescentus. However, in the complete data set (see supplemental material) tags generated with TasI (/AATT) as the anchoring enzyme were able to discriminate to at least the family level.
Many of the genomes examined contained more than one copy of the recA priming site, in some cases yielding multiple tags; however, tags generated with Csp6I discriminated all organisms to at least the genus level and most to the species level. More than one different tag per genome can be helpful for phylogenetic identification: HpyCH4IV sites upstream and Sau3AI sites downstream of the primer annealing position yielded some tags shared across phylogenetic domains, classes, and orders, but these organisms had additional recA-linked tags that permitted their identification at a lower phylogenetic level.
From this survey we can conclude that anchoring enzymes that yield excellent discrimination can be chosen for each conserved primer. However, there is not one choice that is optimal for all primers. Interestingly, we found that EcoP15I-generated tags (27 bp) in general did not provide much more information than the MmeI-generated tags (21 bp) in this data set.
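The notion of the phylogenetic level at which a tag discriminates can be made concrete with a small sketch. It groups genomes by shared tag and reports the finest taxonomic rank on which all genomes carrying that tag still agree. The tag identifiers are placeholders, the function names are hypothetical, and the lineages are written out by hand for illustration rather than retrieved from the Ribosomal Database Project.

```python
from collections import defaultdict

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def finest_shared_rank(lineages):
    """Finest rank on which all lineages agree, or None if they already
    differ at the domain level. Each lineage is a tuple in RANKS order."""
    shared = None
    for index, rank in enumerate(RANKS):
        if len({lineage[index] for lineage in lineages}) > 1:
            break
        shared = rank
    return shared


def tag_resolution(genome_tags, lineages):
    """Map each tag to the finest rank it can identify across all genomes."""
    owners = defaultdict(set)
    for genome, tags in genome_tags.items():
        for tag in tags:
            owners[tag].add(genome)
    return {tag: finest_shared_rank([lineages[g] for g in genomes])
            for tag, genomes in owners.items()}


lineages = {
    "B. pertussis": ("Bacteria", "Proteobacteria", "Betaproteobacteria",
                     "Burkholderiales", "Alcaligenaceae", "Bordetella",
                     "Bordetella pertussis"),
    "B. parapertussis": ("Bacteria", "Proteobacteria", "Betaproteobacteria",
                         "Burkholderiales", "Alcaligenaceae", "Bordetella",
                         "Bordetella parapertussis"),
    "C. crescentus": ("Bacteria", "Proteobacteria", "Alphaproteobacteria",
                      "Caulobacterales", "Caulobacteraceae", "Caulobacter",
                      "Caulobacter crescentus"),
}
# Placeholder tag identifiers, not real tag sequences.
genome_tags = {
    "B. pertussis": {"tag_shared", "tag_bp"},
    "B. parapertussis": {"tag_shared"},
    "C. crescentus": {"tag_shared", "tag_cc"},
}
print(tag_resolution(genome_tags, lineages))
```

A tag shared by Bordetella and Caulobacter resolves only to the phylum, whereas a genome carrying an additional private tag can still be identified at a lower level, which is the situation described for the recA-linked tags.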
SP-GSTs on the 16S rRNA gene: in silico analysis.
Although rpoC, uvrB, and recA can function as phylogenetic identifiers, the number of entries for these genes in current sequence databases is small. Given this limitation, the 16S rRNA gene is an ideal alternative. Though typically present in multiple copies, it is found in all prokaryotes and has several highly conserved regions. An in silico survey of this gene was performed on the NCBI genomes, as described above, to examine how unique and informative 21-bp MmeI-generated tags would be for species identification. All 59 anchoring enzyme candidates were examined; only the exemplars HpyCH4IV, Csp6I, Sau3AI, and BamHI are presented in Table 4. The conserved sequence from position 8 to 27 was chosen as the optimal primer annealing site. Tags generated downstream of the priming site were largely located within the rRNA operon, and their uniqueness was compared to that of tags generated from the V1 hypervariable region by SARST (20). Using SARST, several organisms were not discriminated below the family level, and many downstream 16S-derived SP-GSTs yielded even less information. The best results using the 16S rRNA gene were obtained with Csp6I upstream-derived tags, which discriminated all organisms to at least the genus level and most organisms to the species level.
TABLE 4. Numbers of 16S rRNA gene-derived tags and the phylogenetic level at which they are able to discriminate the 140 sequenced bacterial genomes
TABLE 5. Comparison of the Csp6I-generated SP-GSTs located upstream of the 16S rRNA gene for B. cereus and B. anthracis species
Using Csp6I, sequence analysis of the resulting library of concatenated tags demonstrated that we were successful in obtaining 16S-linked tags from all species (Table 6). We accurately found the two tags adjacent to the Csp6I sites upstream of three 16S rRNA genes of D. radiodurans: GST-DR1, which is present in both sections 8 and 213 of the complete chromosome 1 sequence, and GST-DR2 from section 198 of the chromosome 1 sequence. These two D. radiodurans tags were present in a ratio of approximately 2:1, demonstrating that tag frequency can provide quantitative information concerning the relative abundance of the target sequence from which they were derived. We also obtained an unexpected tag, GST-MP1, with the sequence GTACAGCGAGGAATGGCTCA from the D. radiodurans R1 177-kb megaplasmid. PCR amplification with the GST-MP1 and 27R primers and sequence analysis of the obtained amplicon showed that the 27R primer annealed to a region of the megaplasmid, which resulted in the generation of the GST-MP1 tag.
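The use of tag frequencies as a proxy for relative target abundance can be sketched as follows. The observation counts are hypothetical and are not the counts from Table 6; the function name `relative_copy_numbers` is likewise an assumption. Counts are scaled to the rarest tag and rounded to the nearest integer.

```python
from collections import Counter


def relative_copy_numbers(tag_observations):
    """Scale tag counts to the rarest tag and round to whole copies."""
    counts = Counter(tag_observations)
    if not counts:
        return {}
    rarest = min(counts.values())
    return {tag: round(count / rarest) for tag, count in counts.items()}


# Hypothetical sequencing outcome: 41 reads of one tag and 20 of another.
observed = ["GST-DR1"] * 41 + ["GST-DR2"] * 20
ratios = relative_copy_numbers(observed)
print(ratios, "total copies:", sum(ratios.values()))
```

With these made-up counts the ratio is approximately 2:1, consistent in form with two tags covering three 16S rRNA gene copies. Within a single genome such ratios are informative, whereas comparisons across species remain subject to amplification bias.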
TABLE 6. 16S SP-GST identifier tags obtained from a microbial consortium comprised of D. radiodurans R1, B. licheniformis B-6-4J, A. globiformis DSM 20124, and the P. stutzeri strains Stanier 221 and BRW1
As was the case for the D. radiodurans R1 tags, tag frequencies for B. licheniformis B-6-4J reflected the relative abundances of the target sequences from which they were derived. QPCR showed that GST-BL3, GST-BL4, and GST-BL5 were present once in the B. licheniformis B-6-4J genome, while GST-BL1 and GST-BL2 were observed twice as frequently. This suggests that the B. licheniformis B-6-4J genome contains, like strain ATCC 14580, seven copies of its 16S rRNA gene. These tag frequencies were compared to those of the fully sequenced genome of B. licheniformis ATCC 14580 (GenBank accession no. AE017333), which showed that the two strains had four tags in common, although their frequencies differed between strains. Three copies of GST-BL2, two copies of GST-BL3, and one copy of both GST-BL4 and GST-BL5 were identified in B. licheniformis ATCC 14580, while GST-BL1 turned out to be a tag unique to B. licheniformis B-6-4J.
Tag frequencies for P. stutzeri also reflected the relative abundances of the target sequences from which they were derived. QPCR showed that GST-PS2, GST-PS3, and GST-PS4 were present once in the P. stutzeri genome, while GST-PS1 was observed twice as frequently. These data were consistent for both P. stutzeri Stanier 221 and BRW1 strains and indicate that both P. stutzeri strains contain five copies of the 16S rRNA genes, one more than previously found for this species (http://rrndb.cme.msu.edu/rrndb/servlet/controller).
SP-GST distributions in A. globiformis suggested that this species has three copies of a 16S rRNA gene with two copies of GST-AG1 and a single copy of GST-AG2 (Table 6). Tags for A. globiformis DSM 20124 may have been harder to obtain due to the high genomic GC content of this species. Because only a small number of tags was recovered from this species, SP-GST tagging was carried out separately on A. globiformis DNA to determine whether these results were accurate. Two additional tags belonging to this species were discovered, which were linked to two additional copies of the 16S rRNA gene: GST-AG3, GTACTAGAGGGGCCCAAGAT, and GST-AG4, GTACTGCACCCGGGAGGGTG. QPCR on A. globiformis DSM 20124 confirmed that GST-AG1 was present twice as frequently on the genome as GST-AG2. QPCR further suggested that A. globiformis DSM 20124 has a total of 15 copies of its 16S rRNA gene, 8 of which were linked to GST-AG1, 4 to GST-AG2, 2 to GST-AG3, and 1 to GST-AG4.
Due to differences in codon usage, especially among unrelated species, it is not always easy (or reliable) to translate conserved protein domains into their corresponding DNA sequences. The use of SP-GSTs has the advantage over other PCR-based methods that only one conserved DNA domain, rather than two, is required for primer annealing. In addition to taxonomic identification, this method promises to be very useful for examining the distribution of specific functional genes that share only one conserved domain, which are inaccessible to SARST (15, 20) or other related techniques. Other advantages of the SP-GST method are as follows: (i) the number of tags, defined by the copy number of the target gene, is small and minimizes the amount of required sequencing; (ii) the output is actual DNA sequence data, making it easy to make comparisons between experiments; and (iii) different anchoring enzymes can be used to tailor the sampling depth to the community in question. This also avoids complications that would arise where a recognition site for an anchoring enzyme is present in a specific target domain, as was the case, for instance, with Sau3AI tags generated from the 16S rRNA gene.
The large number of 16S rRNA gene entries in databases has reinforced their extensive use for the culture-independent identification of prokaryotes by PCR and cloning. 16S rRNA gene-based tags thus have the advantage that the organisms from which they were derived can be identified more readily, making them preferable to tags generated from other conserved genes. SP-GSTs located within the 16S rRNA gene have the advantage that the sequence is already tied to phylogenetic identification for many thousands of species. Since many tags (between 10 and 20, depending on the efficiency of the concatenation) are sequenced concomitantly, the SP-GSTs provide a major reduction in sequencing effort compared to 16S rRNA gene libraries for community analysis. However, their discriminatory power is reduced, given that they can also be located in regions conserved across species. Identifier tags upstream of the 16S rRNA gene are typically located in more variable regions and have a better discriminating power for species identification. A disadvantage of the upstream 16S SP-GST approach is that the identifier tags are not yet directly tied to species identification unless they are derived from species with sequenced genomes; this is also the case for tags derived from rpoC, uvrB, and recA. It is possible, however, to use the tag sequence as a primer in combination with a primer against a conserved domain in the 16S rRNA gene, such as the 1392R reverse primer, to amplify and subsequently identify by sequencing the 16S rRNA gene and, thus, the organism from which the tag was derived. Using this approach, databases of SP-GSTs can be established. This approach also helps to exclude false tags: as expected, the GST-MP1 tag derived from the D. radiodurans R1 177-kb megaplasmid in combination with the 1392R primer failed to provide a PCR amplicon (results not shown).
The best results using the 16S rRNA gene were obtained with Csp6I upstream-derived tags, which discriminated all organisms to at least the genus level and most organisms to the species level. Csp6I has the following additional characteristics that make this restriction enzyme a suitable choice: the enzyme cuts all known microbial genomes frequently (theoretically, once per 256 nucleotides); it is insensitive to Dam methylation; the in silico analysis showed that the average position of its first recognition site is approximately 400 to 600 nucleotides upstream of the 16S priming site, which is well within the range of a PCR; the enzyme generates a 2-nucleotide 5' cohesive end; and, in contrast to Sau3AI, none of the highly conserved domains of the 16S rRNA gene contains a Csp6I site.
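The theoretical cutting frequency quoted for Csp6I follows from the length of its recognition site. The sketch below shows the arithmetic under the simplifying assumption of equal, independent base frequencies, which real genomes (particularly those with high GC content) do not strictly satisfy; the function names are hypothetical.

```python
def expected_site_spacing(site):
    """Expected distance between occurrences of an unambiguous site,
    assuming all four bases are equally likely and independent."""
    return 4 ** len(site)


def expected_site_count(genome_length, site):
    """Approximate number of sites expected in a genome of the given length."""
    return genome_length / expected_site_spacing(site)


print(expected_site_spacing("GTAC"))         # 4-bp site
print(expected_site_count(1_000_000, "GTAC"))
```

A 4-bp recognition site gives 4^4 = 256, the spacing cited in the text, whereas a 6-bp site would be expected only once per 4,096 nucleotides, placing the nearest site upstream of a priming site much farther away on average.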
The discriminating power of identifier tags generated from the variable regions upstream of the 16S rRNA gene was further demonstrated in comparisons of Csp6I-based tags generated from closely related B. cereus and B. anthracis species. Although none of the generated tags could distinguish between the closely related B. anthracis strains, Csp6I-based tags upstream of the 16S rRNA gene were often found to be specific for the different B. cereus strains. Of the three B. cereus strains whose genomes have been sequenced to completion, strain ZK was the most closely related to B. anthracis. This strain shared the highest number of tags with B. anthracis, including a unique internally generated identifier tag from one of its 16S rRNA genes (Table 4). The second closest strain is B. cereus ATCC 10987, and strain ATCC 14579 shares the lowest number of tags and is phylogenetically the most distant from B. anthracis. This was confirmed by determining the percentage of exactly shared sequences between the genomes of the individual species using MUMmer version 3.0 (14). Compared to the B. anthracis Ames reference strain, these percentages were 79.7%, 59.1%, and 44.4% for B. cereus ZK, B. cereus ATCC 10987, and B. cereus ATCC 14579, respectively. We conclude that tags upstream of the 16S rRNA gene can be used to rapidly provide information on the phylogenetic relationship between closely related Bacillus strains and species without the need for whole-genome sequencing. A prerequisite is that a sufficiently large number of unique identifier tags can be generated. This was also experimentally observed when we obtained tags from other clinical B. cereus isolates and compared them with tags found in the sequenced B. cereus and B. anthracis strains. Based on the tag profiles, our data suggest that these clinical isolates are more closely related to each other than to the fully sequenced strains. The fact that the majority of them share the largest numbers of tags with the genomes from B. cereus ZK and B. cereus ATCC 10987 would suggest that they are evolutionarily closer to these two strains than to B. cereus ATCC 14579 and the B. anthracis strains.
The SP-GST method successfully produced tags from all member species of a defined microbial consortium. Within a species, tag frequencies reflected the relative abundances of the target sequences from which they were derived and allowed for the determination of 16S rRNA gene copy numbers within a species. As has been documented for other PCR-based methods, amplification biases lead to a misrepresentation of the overall community composition. We conclude that the great strength of this technology lies in its discriminatory power. Given its open architecture, diverse applications, and the ease with which tags can be linked to any gene of interest, the use of SP-GSTs has great potential for identifying and analyzing closely related species or strains and simple microbial communities.
We specially thank Diane Heiser, who received a Student Undergraduate Laboratory Internship from the Department of Energy's Office of Science, for her role in primer design. We also thank George T. Tortora for providing us with the clinical B. cereus isolates. Judi Romeo and Mike Blewitt are acknowledged for sequencing the SP-GSTs.
Supplemental material for this article may be found at http://aem.asm.org/.
Copyright © 2009 by the American Society for Microbiology.