Applied and Environmental Microbiology, November 2003, p. 6768-6776, Vol. 69, No. 11
0099-2240/03/$08.00+0 DOI: 10.1128/AEM.69.11.6768-6776.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Microbial Observatory of the North Temperate Lakes Long-Term Ecological Research Site (NTL-LTER), Center for Limnology,¹ Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin 53706²
Received 30 December 2002/ Accepted 29 August 2003
Community analysis by terminal restriction fragment length polymorphism (T-RFLP) offers a compromise between sample throughput and phylogenetic resolution (24, 29). T-RFLP can be used to compare and contrast microbial community structure (4, 6-8, 12-14, 25, 30). Restriction fragment length is determined by the sequence of the fragment to be digested. Terminal restriction fragment (T-RF) lengths can be predicted from known sequences; thus, the T-RFLP method can potentially identify specific organisms in a community based on their T-RF length. There are many instances where the same T-RF length is predicted for multiple species of bacteria, but increased specificity can result from analysis of digests with multiple enzymes (9, 29).
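As an illustration of how a T-RF length follows from sequence, the sketch below performs a minimal in silico digest. It assumes the amplicon string begins at the first base of the labeled forward primer and that fragment length is counted from that base to the first cut site; the toy amplicon and the function name `predict_trf_length` are hypothetical and are not taken from the RDP or MiCA tools. The recognition sites are those of the three enzymes used later in this work (HhaI, GCG^C; MspI, C^CGG; RsaI, GT^AC).

```python
# Recognition site and cut offset (bases from the start of the site to the
# cut on the labeled strand) for each enzyme.
ENZYMES = {
    "HhaI": ("GCGC", 3),  # GCG^C
    "MspI": ("CCGG", 1),  # C^CGG
    "RsaI": ("GTAC", 2),  # GT^AC
}


def predict_trf_length(amplicon, enzyme):
    """Return the predicted 5'-terminal fragment length in bases.

    Assumes `amplicon` starts at the first base of the labeled forward
    primer. Returns None when the enzyme has no site in the amplicon.
    """
    site, offset = ENZYMES[enzyme]
    position = amplicon.upper().find(site)  # first site from the labeled end
    if position == -1:
        return None
    return position + offset


# Hypothetical toy amplicon, built so that each enzyme cuts exactly once.
TOY_AMPLICON = "A" * 10 + "CCGG" + "T" * 6 + "GCGC" + "A" * 5 + "GTAC" + "TT"

for name in ENZYMES:
    print(name, predict_trf_length(TOY_AMPLICON, name))
```

Only the first site from the labeled end matters, because fragments downstream of that cut carry no fluorescent label and are not detected.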
Web-based resources available through the Ribosomal Database Project (http://rdp.cme.msu.edu) (31) or through the Microbial Community Analysis (MiCA) website at the University of Idaho (http://hermes.campus.uidaho.edu) allow prediction of T-RFs from 16S rRNA gene sequences presently in the database based on user input of PCR primers and restriction enzymes. Users are able to compare fragments obtained from T-RFLP analysis to the fragment sizes predicted from known 16S rRNA gene sequences. This comparison is accomplished by manually scanning the predicted fragment sizes to find a subset of species that produce a fragment size similar to one obtained experimentally. A species list can then be refined by comparison with additional digests. This is a reasonable procedure to carry out for uncomplicated profiles (e.g., individual unknown species or mixtures of very few species). However, such assignments are considerably more difficult in complex communities when each individual peak from each digest has the potential to represent multiple species (22, 31). Phylogenetic assignment for complex community profiles involves finding the intersection of the species sets represented by each peak. This is a daunting task when an individual T-RF may correspond to 15 or more species (31).
Discrepancies between observed and predicted fragment sizes may occur, an issue that further increases the list of species associated with each fragment (17, 22). This necessitates specification of size tolerances for matching. The large number of samples generated by studies of spatial or temporal variability magnifies this complexity. As a result, few studies are making use of the full potential of T-RFLP. Those studies which do utilize the T-RFLP method for identification of species from mixed communities often do so in conjunction with sequence analysis of a clone library (3, 6-8, 12-14, 19, 25, 27, 28, 30, 33) or by determination of T-RFs from cultured isolates (35). Others have coupled T-RFLP with Southern hybridization to assign T-RFs to specific phylogenetic groups (17). These variations are all designed to allow phylogenetic assignment from a single T-RFLP profile. A recent study assessing the value of T-RFLP profiles for phylogenetic inference similarly sought to maximize the information provided by a single digest by examining the phylogenetic specificity that can be achieved with different restriction enzymes (9).
In this work, the functionality of the T-RFLP method is expanded by automating the task of phylogenetic assignment from T-RF profiles produced by multiple digests. The use of multiple digests increases the specificity of phylogenetic inferences derived from T-RFLP profiles, and automation of this task makes this type of analysis accessible for analysis of complex communities. The effectiveness of the phylogenetic assignment tool is demonstrated with an analysis of aquatic microbial communities collected from a humic lake compared with the results of a 16S rRNA gene library.
Database.
The default database for PAT includes T-RFs predicted from 16S rRNA gene sequences by using the forward primer 8F (5'-AGAGTTTGATCMTGGCTCAG-3') (23) and a selection of tetrameric restriction enzymes. The database was generated by using the MiCA query function found at the MiCA website. PAT users may supply a custom database for analysis of their T-RFLP profiles. The MiCA website can be used to generate a database of predicted T-RFs for each species by using different primers or restriction enzymes, or such a database could be generated from sequence data obtained from a clone library. The database file is an array of T-RF lengths for each species, with restriction enzyme names as column headings and bacterial species designations as row labels. The column headings are used to generate the restriction enzyme list used by the program. This file must be formatted as a tab-delimited text file for use with the PAT program. There is no sequence analysis function included with the PAT algorithm.
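The database layout described above (enzyme names as column headings, species designations as row labels, tab-delimited) can be read with a few lines of code. The sketch below is not the PAT implementation; it assumes that the first header cell labels the species column and that an empty cell means no T-RF was predicted for that enzyme. The species names and lengths are hypothetical.

```python
import csv
import io


def load_trf_database(handle):
    """Parse a tab-delimited T-RF table.

    Returns (enzyme_names, {species: {enzyme: predicted_length}}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    enzymes = header[1:]  # the first column holds the species labels
    database = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        lengths = {}
        for enzyme, cell in zip(enzymes, row[1:]):
            cell = cell.strip()
            if cell:  # an empty cell is treated as "no predicted T-RF"
                lengths[enzyme] = float(cell)
        database[row[0].strip()] = lengths
    return enzymes, database


# Hypothetical three-species database.
EXAMPLE_DB = (
    "Species\tHhaI\tMspI\tRsaI\n"
    "Hypothetical species A\t205\t490\t118\n"
    "Hypothetical species B\t205\t148\t472\n"
    "Hypothetical species C\t370\t\t118\n"
)

enzymes, database = load_trf_database(io.StringIO(EXAMPLE_DB))
print(enzymes)
print(database["Hypothetical species C"])
```

Deriving the enzyme list from the header row mirrors the description above, in which the column headings generate the restriction enzyme list used by the program.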
Program implementation.
The program prompts the researcher for the necessary data and configuration files prior to computation. Input data files are the tab-delimited six-column output tables generated by automated sequencers such as the ABI Genetic Analyzer instruments. Each file includes the data obtained from a single restriction enzyme digest for a series of samples (e.g., data obtained from all HhaI digests for a batch of samples would be contained in one file, MspI digests would be contained in another file, and RsaI digest data would be contained in a third file). One data file is loaded into the program for each enzyme digest, and corresponding samples from each digest must have the same lane identification (ID) so that the algorithm can recognize that the fragment data are derived from the same sample. If corresponding samples do not share a lane ID in a given data set, the lane ID labels can be edited prior to analysis. Each record in these data files contains a terminal fragment size, lane ID, peak height, and peak area found in the sample. The program also requires a database of known organisms with known T-RF lengths for each specific enzyme used. User input specifies the names of the enzymes associated with each uploaded digest file and the size tolerances to be used during the matching process.
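The requirement that corresponding samples share a lane ID can be checked before matching. The sketch below assumes a simplified four-field record (lane ID, size, peak height, peak area) with hypothetical column names; the actual six-column sequencer export is not reproduced here, and all values are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    lane_id: str
    size: float
    height: float
    area: float


def read_digest(handle):
    """Group the peaks of one digest file by lane ID."""
    peaks_by_lane = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        peak = Peak(row["lane_id"], float(row["size"]),
                    float(row["height"]), float(row["area"]))
        peaks_by_lane[peak.lane_id].append(peak)
    return dict(peaks_by_lane)


def shared_lanes(digests):
    """Return the sorted lane IDs that are present in every digest."""
    lane_sets = [set(lanes) for lanes in digests.values()]
    return sorted(set.intersection(*lane_sets)) if lane_sets else []


# Hypothetical digest files, one per enzyme.
HHAI_TEXT = ("lane_id\tsize\theight\tarea\n"
             "L1\t204.6\t850\t3000\n"
             "L1\t370.8\t300\t1000\n"
             "L2\t205.1\t500\t1800\n")
MSPI_TEXT = ("lane_id\tsize\theight\tarea\n"
             "L1\t489.5\t700\t2500\n"
             "L3\t148.2\t400\t1200\n")

digests = {
    "HhaI": read_digest(io.StringIO(HHAI_TEXT)),
    "MspI": read_digest(io.StringIO(MSPI_TEXT)),
}
print(shared_lanes(digests))
```

Lanes missing from any digest (here L2 and L3) would need their labels corrected before the digests can be combined for the same sample.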
The program uses a filtering approach for phylogenetic assignment. It performs a series of passes through the database of possible organisms to discover all possible phylogenetic assignments consistent with a given set of microbial community data. This series of passes is illustrated in Fig. 1.
FIG. 1. Cycle of the matching algorithm. For each digest, each individual fragment is assigned a collection of species from the database that are predicted to have a T-RF length that matches the observed fragment length (within the user-specified size tolerance) (step 1). Records that do not match fragments found in additional digests (steps 2 and 3) are discarded. Steps 1 through 3 are repeated for each digest file.
For each subsequent restriction enzyme digest, possible matches present in the collection are compared to the T-RF lengths for the digest. Records that do not match a T-RF length found in subsequent enzyme digests are discarded from the collection. Steps 2 and 3 in the diagram represent this filtering technique. The collection then contains a list of species from the known organism database that have matched T-RF lengths for all enzyme digests.
The collection creation and filtering cycle is carried out for each fragment length present in the first digest, and steps 1 through 3 are repeated for each digest file. The final result is a series of collections of phylogenetic assignments representing the matches from a given sample. Each possible match contains the organism's name, the observed T-RF lengths that generated each assignment, and the lane ID and peak area of matched fragments.
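The filtering cycle can be sketched as follows for a single sample. This is a simplified illustration rather than the PAT source code: it uses one fixed size tolerance, starts from the first digest only (the description above repeats the cycle for each digest file), and treats a species lacking a predicted T-RF for an enzyme as a non-match. The database contents, observed fragment lengths, and function names are hypothetical.

```python
def within(observed, predicted, tolerance):
    """True when the observed length falls inside the tolerance window."""
    return abs(observed - predicted) <= tolerance


def assign(sample_peaks, database, tolerance=1.0):
    """Filter database species against the observed T-RFs of one sample.

    sample_peaks: {enzyme: [observed lengths]}, in digest order.
    database: {species: {enzyme: predicted length}}.
    """
    enzymes = list(sample_peaks)
    first, rest = enzymes[0], enzymes[1:]
    matches = []
    for fragment in sample_peaks[first]:
        # Step 1: collect every species predicted to give this fragment.
        collection = [
            {"species": species, "fragments": {first: fragment}}
            for species, predicted in database.items()
            if first in predicted
            and within(fragment, predicted[first], tolerance)
        ]
        # Steps 2 and 3: discard records without a match in later digests.
        for enzyme in rest:
            kept = []
            for record in collection:
                predicted = database[record["species"]].get(enzyme)
                if predicted is None:
                    continue
                hit = next((obs for obs in sample_peaks[enzyme]
                            if within(obs, predicted, tolerance)), None)
                if hit is not None:
                    record["fragments"][enzyme] = hit
                    kept.append(record)
            collection = kept
        matches.extend(collection)
    return matches


database = {
    "Hypothetical species A": {"HhaI": 205, "MspI": 490, "RsaI": 118},
    "Hypothetical species B": {"HhaI": 205, "MspI": 148, "RsaI": 472},
    "Hypothetical species C": {"HhaI": 370, "RsaI": 118},
}
sample = {"HhaI": [204.6, 370.8], "MspI": [489.5, 300.0], "RsaI": [118.4]}

matches = assign(sample, database)
for match in matches:
    print(match["species"], match["fragments"])
```

In this toy case species A and B share the HhaI fragment, but only species A survives the MspI and RsaI filters, which is the gain in specificity that multiple digests are intended to provide.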
Output.
After the calculations have been performed on a series of input digestions, the program produces several tab-delimited output files. The first file contains the phylogenetic matches as determined by the program. The record for each match contains the lane ID, peak area, and length of the fragment from each digest. A second file contains an analysis of the completeness of the matching algorithm and the abundance of unmatched fragments from each restriction digest. The final file contains a list of unmatched fragments. It includes lane IDs and peak areas for T-RFs that were not matched to a known organism by the program. All of these files are compatible with Excel or other common spreadsheet applications.
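The unmatched-fragment list and a simple completeness measure can be derived from the peak records and the set of matched fragments. The sketch below assumes that completeness is expressed as the fraction of peak area left unmatched in each digest; the exact columns and statistics written by PAT are not specified here, so the field names and the measure itself are assumptions, and the data are hypothetical.

```python
import csv
import io


def split_unmatched(peaks, matched_keys):
    """Return (unmatched peaks, fraction of peak area unmatched per enzyme).

    peaks: list of (enzyme, lane_id, size, area) tuples.
    matched_keys: set of (enzyme, lane_id, size) tuples that were assigned.
    """
    unmatched = [p for p in peaks if p[:3] not in matched_keys]
    totals, missing = {}, {}
    for enzyme, _lane, _size, area in peaks:
        totals[enzyme] = totals.get(enzyme, 0.0) + area
    for enzyme, _lane, _size, area in unmatched:
        missing[enzyme] = missing.get(enzyme, 0.0) + area
    fraction = {
        enzyme: (missing.get(enzyme, 0.0) / total if total else 0.0)
        for enzyme, total in totals.items()
    }
    return unmatched, fraction


def to_tsv(rows, header):
    """Render rows as tab-delimited text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical peaks and matches for one lane.
peaks = [
    ("HhaI", "L1", 204.6, 3000.0),
    ("HhaI", "L1", 370.8, 1000.0),
    ("MspI", "L1", 489.5, 2500.0),
    ("MspI", "L1", 300.0, 2500.0),
]
matched = {("HhaI", "L1", 204.6), ("MspI", "L1", 489.5)}

unmatched, fraction = split_unmatched(peaks, matched)
print(to_tsv(unmatched, ["enzyme", "lane_id", "size", "area"]))
print(fraction)
```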
Portability.
This program was written in Java to allow portability across multiple computer platforms. The Java implementation also provided the opportunity to quickly create a web-based interface through the use of Java servlets and JavaServer Pages (JSP).
Web interface.
The web interface for the PAT allows investigators to submit, process, and manage every aspect of the phylogenetic assignment. It can be accessed at http://trflp.limnology.wisc.edu. The web-based PAT allows each user of the system to register and maintain an account for data and configuration information. A user can upload digest and database files for use with the account. Users can also elect to use the default database of T-RFs produced by the in silico digestion of known 16S rRNA gene sequences by 27 separate restriction enzymes. The web interface provides management of a user's enzyme digests, known organism database files, and bin size configuration (Fig. 2).
FIG. 2. The PAT web interface allows users to manage uploaded digest files, known organism database files, and T-RF bin size configuration.
FIG. 3. The PAT web interface prompts users to select the restriction enzymes used to generate their T-RF digest files from a list of enzymes found in the selected database.
Community analysis.
T-RFLP data produced from aquatic samples were analyzed to demonstrate the functionality of the PAT. Samples were obtained from Devil's Lake (45°31'N, 88°52'W), a humic lake located in northern Wisconsin with a surface area of 12.5 ha and a maximum depth of 7 m.
Sample collection.
Whole water samples were collected on 19 August 2002 from discrete depths throughout the water column. Samples for culture-independent analyses of community composition were immediately concentrated in aliquots of 250 ml onto 0.2-µm-pore-size filters (Supor-200; Gelman). Filters were placed in cryovials, frozen immediately, and stored at -80°C.
DNA extraction.
DNA was extracted from aquatic samples by using the method described by Fisher and Triplett (11).
T-RFLP.
PCRs to amplify 16S rRNA genes for T-RFLP analysis (24) contained PCR buffer consisting of 50 mM Tris (pH 8.0), 250 µg of bovine serum albumin per ml, 3.0 mM MgCl2 (catalog no. 1770; Idaho Technology), 250 µM (each) deoxynucleoside triphosphate, 10 pmol of each primer, 1.25 U of Taq polymerase (Promega), and 1 µl of extracted DNA in a final volume of 25 µl. The primers used were 8F (labeled with 6-FAM) and 1492R (23). Reactions were cycled in an Eppendorf MasterCycler Gradient (Eppendorf) with an initial denaturation at 94°C for 2 min, followed by 30 cycles of 94°C for 35 s, 55°C for 45 s, and 72°C for 2 min, with a final extension carried out at 72°C for 2 min. PCR products were digested with HhaI, MspI, and RsaI. Multiple single digests were carried out to increase the specificity of the phylogenetic assignments. Denaturing capillary electrophoresis was carried out for each digest by using an ABI 310 genetic analyzer (PE Applied Biosystems). Electrophoresis conditions were 60°C and 15 kV, with a run time of 50 min with POP-4 polymer. A custom 200- to 2,000-bp rhodamine X-labeled size standard (Bioventures) was used as the internal size standard for each sample. The data were analyzed using GeneScan 3.1 software (Perkin-Elmer).
Analysis.
Data tables containing fragment size and abundance data for each digest of the aquatic samples were exported from GeneScan, and the resulting text files were uploaded to the PAT website for phylogenetic assignment.
Clone library analysis.
Amplification of the 16S rRNA genes by PCR for clone library analysis of microbial community samples collected at 4.9 and 5.5 m was carried out as described for T-RFLP except that the 8F primer was unlabeled. Three µl of PCR product from each community was ligated into the pGEM-T vector (Promega). Clone libraries were produced with the pGEM-T Easy Vector System II (no. A1380; Promega) according to the manufacturer's instructions. Sequencing of 48 clones from each sample was carried out as described by Vergin et al. (39). Briefly, insert DNA was sequenced by using the ABI PRISM Big Dye Terminator cycle sequencing kit and 30 pmol of the 8F primer according to standard cycle sequencing parameters. The sequences were edited by using ABIView and submitted to BLAST (1) for initial phylogenetic assignment. Additional information regarding taxonomic assignment of cloned sequences was obtained by using the Hierarchy Browser function of the Ribosomal Database Project II at http://rdp.cme.msu.edu (5).
TABLE 1. PAT output for samples collected from Devil's Lake (northern Wisconsin) in August 2002ᵃ
- and β-Proteobacteria and Actinobacteria.
FIG. 4. Bacterial classes detected by T-RFLP in Devil's Lake (northern Wisconsin) at various depths in August 2002. The plots indicate the number of fragments in each sample that could be assigned to each indicated phylogenetic class.
γ-Proteobacteria were detected at every sample depth (Fig. 4); the PAT assignments can be used to describe the variability of T-RFs within this class. T-RFs classified as Oceanospirillaceae, for example, are detected at only a few depths, while Vibrionaceae are detected throughout the water column (Fig. 5).
FIG. 5. Bacterial diversity of families within the γ-Proteobacteria in Devil's Lake (northern Wisconsin) throughout the water column in August 2002. The y axis plots the number of fragments in each sample that could be assigned to a particular phylogenetic group within the γ-Proteobacteria.
-Proteobacteria, Acidobacteria, Flavobacteria, and Fusobacteria) were found only in the T-RFLP analysis. Eight classes were found in both analyses.
TABLE 2. Comparison of taxonomic groups detected by T-RFLP using PAT with those detected by clone library analysisᵃ
Attempts to conduct phylogenetic assignment for T-RFs by using a single digest are effective only when the amplified PCR products are produced from a single bacterial division (or smaller taxonomic group) (9). The predictive power of a single digest is also diminished by inaccuracies in fragment size measurement that require assignment of each T-RF to a bin of contiguous fragment sizes (which will almost certainly correspond to more organisms than are represented by a single observed T-RF). The phylogenetic information derived from a single digest is further reduced when T-RFs are generated from multiple bacterial subdivisions and also as a result of the ever-increasing number of sequences available for reference (9, 22). Thus, the use of multiple digests is recommended to accomplish any degree of phylogenetic resolution (9, 20, 22, 29, 31). However, one of the reasons that researchers have sought to determine the phylogenetic specificity of T-RFs from a single digest is the difficulty in correlating peaks from different digests produced from complex mixtures of bacteria.
This and other issues associated with database matching of T-RFLP profiles were reviewed by Kitts (22). Phylogenetic assignment uncertainties that arise due to discrepancies between predicted and observed T-RF sizes are multiplied when several digests are considered and contribute to the difficulty in manually interpreting community composition from multiple T-RFLP profiles.
Automation of T-RFLP analysis resolves many of the issues involved with analysis of complex profiles. Our PAT incorporates all of the recommendations found in the literature intended to maximize identification of T-RF peaks. First, it utilizes T-RFLP profiles from multiple digests (9, 20, 22, 29, 31). Second, it incorporates a user-defined window of size tolerances to accommodate discrepancies between predicted and observed T-RF lengths (22). In addition to these recommendations, we have incorporated the ability to define an increasing bin size for this matching window in recognition of the observation that uncertainty in size calling increases with increasing fragment length. Automation of T-RFLP analysis greatly reduces the time involved in phylogenetic assignment of T-RF peaks; in addition, it ensures that the size tolerance windows are uniformly applied and that all possible species matches are considered, thus reducing user-introduced bias. Also included is the option to compare the T-RFLP profiles to a user-defined database, allowing the phylogenetic assignments to be restricted to specific taxonomic groups or to a species list generated from a clone library. A database of T-RFs generated for a gene other than the 16S rRNA gene [e.g., nifH (36, 40), mer (2), elongation factor Tu (29), heat shock proteins (10, 15), glutamine synthetase (37), ATPases (26), and topoisomerases (18)] could also be used for analysis by the PAT algorithm.
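A matching window that widens with fragment length can be expressed as a small lookup. The schedule below is hypothetical; the bin boundaries and tolerance values are illustrative assumptions rather than PAT defaults, and the window is keyed to the observed length on the assumption that size-calling uncertainty resides in the measurement.

```python
# Hypothetical schedule: (upper bound on fragment length in bp, tolerance in bp).
BIN_SCHEDULE = [(200, 1.0), (500, 2.0), (1000, 3.0)]
DEFAULT_TOLERANCE = 5.0  # applied to fragments longer than the last bound


def tolerance_for(length, schedule=BIN_SCHEDULE, default=DEFAULT_TOLERANCE):
    """Return the size tolerance that applies to a fragment of this length."""
    for upper, tolerance in schedule:
        if length <= upper:
            return tolerance
    return default


def is_match(observed, predicted):
    """Compare lengths using the window for the observed fragment."""
    return abs(observed - predicted) <= tolerance_for(observed)


print(is_match(148.5, 150))  # short fragment, narrow window
print(is_match(748.0, 750))  # long fragment, wider window
```

Under this schedule a 1.5-bp discrepancy is rejected for a short fragment but a 2-bp discrepancy is accepted for a long one, reflecting the larger uncertainty in size calling at greater fragment lengths.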
The phylogenetic assignment tool generates a list of peaks that are not matched to species in the database. One explanation for the unmatched T-RFs is that these peaks are derived from previously uncharacterized bacteria. However, it is also possible that these peaks merely represent instances where there was insufficient information to make a match. This could occur when a 16S rRNA gene sequence in the database is not full length and therefore fails to match the input primer used for prediction of T-RFs. Unmatched T-RFs may also represent instances where one or more of the peaks produced by an organism falls below the peak detection threshold of the electrophoresis instrument and thus is not included in the PAT analysis.
Analysis of aquatic microbial community samples throughout the water column in a humic lake was used to demonstrate the functionality of PAT. The species lists generated by the PAT program were used to examine the variability in bacterial community composition at different levels of phylogenetic resolution. Many of the T-RFs were classified as - and β-Proteobacteria and Actinobacteria, suggesting that these taxa may dominate the microbial communities in this lake.
The phylogenetic assignments generated by using PAT were compared to the results obtained through clone library analysis of the same sample. For the two lake samples compared, 64.3 and 61.5% of the bacterial classes identified were found by both approaches. However, the remainder were found by only one approach. This suggests that the two approaches give similar results and can also complement each other in the identification of taxa not found by using only one method. For those taxa found only by the T-RFLP approach, taxon-specific primers can be designed to screen clones in a 16S rRNA library for these groups. The taxa identified only through the clone library analysis may indicate that some phylogenetic groups are not well represented in the T-RF database.
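The overlap between the two approaches can be computed as the number of classes found by both divided by the number of classes found by either. The sketch below assumes that this definition underlies the reported percentages, which is not stated explicitly. The class labels are placeholders, and the set sizes (8 shared classes out of 13 in total) are assumptions chosen only to show that such counts are consistent with a value of 61.5%.

```python
def percent_shared(classes_a, classes_b):
    """Percentage of all detected classes that both approaches found."""
    union = classes_a | classes_b
    if not union:
        return 0.0
    return 100.0 * len(classes_a & classes_b) / len(union)


# Placeholder labels; the counts are assumptions, not data from this study.
trflp_classes = {f"class_{i}" for i in range(1, 13)}                # 12 classes
clone_classes = {f"class_{i}" for i in range(1, 9)} | {"class_13"}  # 8 shared + 1 unique

print(round(percent_shared(trflp_classes, clone_classes), 1))
```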
T-RFLP has demonstrated its utility as a community fingerprint method for comparisons of bacterial community composition between environments or treatments. The phylogenetic assignment tool described here extends this utility by offering a rapid, automated approach for phylogenetic analysis of T-RFs.
re, and L. J. Monrozier. 2001. Comparison of nifH gene pools in soils and soil microenvironments with contrasting properties. Appl. Environ. Microbiol. 67:2255-2262.