Applied and Environmental Microbiology, March 2006, p. 1843-1851, Vol. 72, No. 3
0099-2240/06/$08.00+0 doi:10.1128/AEM.72.3.1843-1851.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Microbiology and Pathology, Washington State University, Pullman, Washington,1 Department of Civil and Environmental Engineering, University of California at Davis, Davis, California,2 Department of Veterinary Clinical Sciences, Washington State University, Pullman, Washington3
Received 28 June 2005/ Accepted 27 December 2005
Bacterial source tracking (BST) methods can be characterized as library dependent or library independent. Library-dependent methods begin with characterization of a large collection of bacteria that are isolated from known sources (e.g., humans or cows). Only one species or genus of bacteria is usually considered in a given study (e.g., Escherichia coli, Enterococcus, or Streptococcus), and the traits of interest might be antibiotic resistance profiles (10, 12, 14, 30, 31), carbon utilization profiles (11), DNA fingerprints (5, 8, 13, 15, 29), or other DNA polymorphisms (17, 20). Once a large number of strains have been characterized, these data are used as "training" data to develop a classification equation using multivariate statistics. The reliability of the resultant classification function is evaluated both by its ability to classify the original training data and by its ability to correctly classify independent isolates from known sources, with the latter being the most legitimate means to assess the quality of the classification equation. In both cases, the analysis reports the percent correct classification rates for the various host animals.
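Both evaluation modes described above reduce to a per-host rate of correct classification. The following is a minimal sketch of that calculation, assuming a list of (true source, predicted source) pairs for independent isolates; the function name and the example data are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter

def correct_classification_rates(pairs):
    """Return {host: percent of isolates from that host assigned to that host}."""
    totals = Counter()
    correct = Counter()
    for true_host, predicted_host in pairs:
        totals[true_host] += 1
        if true_host == predicted_host:
            correct[true_host] += 1
    return {host: 100.0 * correct[host] / totals[host] for host in totals}

# Hypothetical independent isolates from known sources (illustrative only).
holdout = [
    ("human", "human"), ("human", "cow"), ("human", "human"), ("human", "human"),
    ("cow", "cow"), ("cow", "cow"), ("cow", "human"), ("cow", "cow"),
]
rates = correct_classification_rates(holdout)
print(rates)
```

Running the same function on the training isolates and on the independent isolates gives the two sets of rates that such studies typically report.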
A number of challenges can arise from the library-dependent strategy. The first challenge is that a large number of isolates must be characterized before a suitable classification equation can be generated. No guidelines exist to help determine how many isolates should be characterized per host, how many hosts should be sampled, or how many hosts should be included in the sampling frame (15). More representation is clearly a preferred strategy, but actual guidelines may be difficult to generate because this will be a function of the variance in the system under study (27). Also, there is a spatial component to the analysis. Because most of the traits considered by these methods are unlikely to represent selective loci that are specific to the host organisms (e.g., pulsed-field gel electrophoresis patterns are unlikely to be related to fitness), the libraries may have geographic limitations. While a library may perform within acceptable limits for a given watershed, it may not be applicable to an adjacent watershed (13, 22, 31). There is probably a temporal component leading to potential misclassification errors because we can expect selectively neutral traits to drift with time. The coefficients for classification functions are not reported in most papers, and the classification databases are effectively proprietary information, which is understandable because the host institution often carries the burden of maintenance of classification libraries. Nevertheless, when raw data and classification functions are not available, the reported classification system is effectively unavailable to the broader community. Finally, suitable methods are lacking for assigning valid confidence intervals for estimates originating from classification functions.
Library-independent methods require the development of an initial library of host-specific strains that are then characterized to identify host-specific genetic or phenotypic markers. The key difference from library-dependent methods is that these markers can be strongly associated only with bacterial strains from a single host animal. Thus, the markers can be used in a binary context (presence or absence) without reference to a classification function or library. Ideally, library-independent markers are "universal" in both space and time. To meet these criteria, the markers must target either an organism that appears specific to a given host or a gene product or gene sequence that is specific to bacteria from a given host. In the latter case, it is likely that the trait of interest confers a selective advantage in a specific host, which would explain a strong association between the marker and host. While several papers report polymorphisms in the 16S rRNA gene for bacterial species closely associated with specific hosts (1, 2, 6, 7, 9), there is only one published example of a host-specific genetic marker for a common, facultative aerobe such as Enterococcus (21), and this marker is only useful for detecting human fecal pollution.
The lack of progress in the identification of DNA markers suitable for identifying bacteria from specific hosts is probably related to our lack of understanding about which unique genes would be required to persist in different hosts and our inability to screen large numbers of potential markers. For this project, we attempted to circumvent the latter limitation by using a custom DNA microarray to screen a large number of genetic sequences from bacteria originating from different host animals. The microarray was composed of cloned DNA fragments from a large collection of isolates of Enterococcus from known sources (hence the term "mixed-genome" microarray) (4). This process identified a number of genetic markers that could be used in a library-independent context, and these have been partially validated using Enterococcus isolates from known sources from across the United States.
Isolation of Enterococcus spp.
Fecal samples were collected with a sterile tongue depressor or swab and stored in a sterile plastic bag at 4°C until processed. All samples were processed within 7 days of collection. A sterile swab containing fecal material was used to inoculate 3 ml salt tolerance solution (2.5% [wt/vol] brain heart infusion broth [Becton-Dickinson, Sparks, MD], 0.5% dextrose, 6% NaCl, 0.0016% bromocresol purple) and incubated overnight at 45°C. Positive samples (yellow) were plated onto M-Enterococcus agar (Remel, Lenexa, KS) and incubated at 37°C for 72 to 96 h. This isolation procedure selects for Enterococcus, and to verify that selection was effective, we confirmed a representative sample of 52 isolates (from set A) as Enterococcus spp. by using the API Strep test (bioMérieux, Inc., Hazelwood, MO).
Fresh isolates were picked from the M-Enterococcus agar into brain heart infusion broth (Becton-Dickinson) and grown overnight at 37°C. Three milliliters of broth culture was used for genomic DNA (gDNA) extraction, and 1 ml was banked at -80°C after the addition of 330 µl phosphate-buffered glycerol (45 mM Na2HPO4, 34 mM NaH2PO4, 58.8% [vol/vol] glycerol).
Construction of mixed-genome shotgun libraries.
A separate shotgun library was constructed for each of the following hosts: cow, dog, human, waterfowl, and elk/deer. gDNAs were extracted from all isolates in set A by using a DNeasy tissue kit (QIAGEN, Valencia, CA) and were quantified by electrophoresis and spectroscopy. An equal amount of gDNAs from 50 isolates per host (only 43 for waterfowl) was mixed to make five mixed-genome pools containing 3 to 5 µg DNA in 160 µl elution buffer (supplied with the DNeasy tissue kit). The gDNAs were divided into 40-µl aliquots and sonicated to obtain 500- to 700-bp fragments. This included up to four sonication treatments (45 s to 1 min each at level 7) with a Misonix cup-horn sonicator (Misonix Inc., Farmingdale, NY). Samples were cooled on ice between treatments. Sonicated gDNA fragments were separated in a 1% agarose gel, the 600-bp region was excised, and the DNA was extracted using a Montage gel extraction kit (Millipore, Billerica, MA) followed by ethanol precipitation. A TOPO Shotgun subcloning kit (Invitrogen, Carlsbad, CA) was used according to the manufacturer's instructions to repair fragment ends and clone the fragments into the pCR4-TOPO vector. Recombinant plasmids were electroporated into E. coli Top 10 cells (Invitrogen) and selected on LB agar plates supplemented with 100 µg/ml ampicillin (Fisher Scientific, Fairlawn, NJ) and 40 µg/ml X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside; Sigma-Aldrich, Milwaukee, WI) (LBAmp100,X-Gal40). For each library, 864 (9 x 96) white colonies were selected, placed into a 96-spot grid on fresh LBAmp100,X-Gal40 agar plates, and incubated overnight at 37°C. A 96-pin replicating tool was used to inoculate a 96-well U-bottomed plate (plate A) containing 150 µl LBAmp100 broth per well, which was incubated overnight at 37°C. This plate was then used to inoculate two additional 96-well plates (plates B and C) containing LBAmp100 broth, which were incubated overnight at 37°C.
Glycerol (50 µl of 50% glycerol) was added to plates A and B, which were stored at -80°C for routine use and long-term storage, respectively. Plate C was stored at -80°C without glycerol for use as a template for PCR.
Mixed-genome microarray fabrication.
Clone inserts were PCR amplified from whole cells containing recombinant plasmids (5 µl broth from plate C [described above]) in 50-µl PCR mixtures as described for the TOPO Shotgun subcloning kit (Invitrogen), using T7 and T3 primers. PCR products were purified by isopropanol precipitation and resuspended in 22 µl H2O, and 2 µl was evaluated in a 1% agarose gel. To the remaining 20 µl PCR product, 12.5 µl 4x print buffer (0.4 M Na2HPO4, 0.8 M NaCl, 0.04% sodium dodecyl sulfate; pH 11.6) and 17.5 µl H2O were added. Single spots of each PCR product were deposited as eight subarrays on Superfrost/Plus slides (Fisher Scientific) by using a MicroGrid II arrayer (Genomic Solutions, Ann Arbor, MI). Each subarray contained four replicate spots of a region of the 16S rRNA gene from an Enterococcus sp. isolate amplified with primers 16S_008Fwd and 16S_517Rvs (28) and four replicate spots of an arbitrary, biotinylated oligonucleotide. The former served as a control for target labeling and hybridization efficiency, and the latter served as a control for detection chemistry (4).
Sample hybridization.
gDNAs were extracted from target strains using a DNeasy tissue kit (QIAGEN), and 0.5 µg was nick translated for 2 h in the presence of biotin-dATP (BioNick labeling system; Invitrogen). Labeled gDNAs were purified by ethanol precipitation and resuspended in 225 µl hybridization buffer (4x SSC [600 mM NaCl, 60 mM sodium citrate; pH 7.0] and 5x Denhardt's solution [0.1% {wt/vol} Ficoll, 0.1% polyvinylpyrrolidone, 0.1% bovine serum albumin]). Biotinylated gDNAs (75 µl) were heat denatured, applied to the slide, and incubated overnight in a humidified chamber at 60°C. Slides were preblocked at 23°C for 30 min with TNB buffer (100 mM Tris-HCl [pH 7.5], 150 mM NaCl, 0.5% blocking reagent [TSA biotin system; Perkin-Elmer, Boston, MA]). The remaining detection steps were carried out as previously described (3), with 75 µl of the appropriate reagent applied to the slide at each step. Images were captured with an arrayWoRxe scanner (Applied Precision, Issaquah, WA).
Image and data analysis.
Microarray images were quantified using softWoRx Tracker software (Applied Precision). The final output included median pixel values (range, 0 to 65,535) that were imported into a custom relational database (MS Access; Microsoft Corp., Redmond, WA) for further processing. Data from each slide were normalized by first calculating the mean intensity for the replicated 16S rRNA gene probes and then dividing every probe intensity value by this mean. When normalized values exceeded 0.5, the probe was considered positive (and thus present in the genome). We then generated frequency tables to identify probes that were exclusive or nearly exclusive to a specific host (chi-square test). We only considered probes to be specific to a host if the probes originated from the mixed-genome library for that host. In cases where no probes could be identified by contingency analysis, we selected the best possible match to the host. NCSS 2004 software (Number Cruncher Statistical Systems, Kaysville, UT) was used for statistical tests, with a P value of 0.05 as the threshold for significant findings. We did not employ a correction factor for multiple tests because our goal at this stage was to be liberal in the selection process and rely on subsequent PCR validation testing to narrow the list of potential BST markers.
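The normalization, presence call, and contingency test described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' Access or NCSS workflow: the function names and example values are hypothetical, and the test shown is a Pearson chi-square on a 2 x 2 table without continuity correction, which the text does not specify.

```python
import math
from statistics import mean

POSITIVE_THRESHOLD = 0.5  # normalized intensity above which a probe is called present

def call_probes(probe_intensities, control_intensities):
    """Divide each probe by the mean 16S control intensity; return presence calls."""
    control_mean = mean(control_intensities)
    return {probe: (value / control_mean) > POSITIVE_THRESHOLD
            for probe, value in probe_intensities.items()}

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical slide: four replicate 16S control spots and three probes.
calls = call_probes({"p1": 30000, "p2": 10000, "p3": 20000},
                    [40000, 42000, 38000, 40000])

# Hypothetical frequency table: probe positive in 10 of 50 isolates from the
# target host and in 0 of 50 isolates from all other hosts.
stat, p_value = chi_square_2x2(10, 40, 0, 50)
is_candidate = p_value < 0.05  # threshold used in the text, no multiple-test correction
```

In this sketch a probe at exactly 0.5 is called negative because the text says values must exceed 0.5.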
Plasmid insert sequencing and primer design.
Clones identified as potentially specific for a single host were retrieved from the glycerol stock bank and sequenced. Plate A (glycerol stock) was thawed on ice, and 20 µl of each selected clone was used to inoculate 3 ml LBAmp100 broth. Cultures were grown overnight at 37°C, and a QIAGEN Plasmid mini spin kit was used to isolate plasmid DNA. Extracted plasmids were quantified by spectrometry, diluted to 25 ng/µl, and sequenced with T7 and T3 primers using an ABI BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA). Each sequencing reaction contained 2 µl Terminator mix, 3 µl 5x dilution buffer, 3.4 µl H2O, 1.6 µl 2 µM primer, and 10 µl of 25 ng/µl plasmid template. Thermal cycling parameters for sequencing were set as described by the manufacturer (Applied Biosystems). Sequencing reaction products were purified by ethanol precipitation, resuspended in 15 µl HiDi formamide (Applied Biosystems), and analyzed with an ABI Prism 3100 genetic analyzer (Applied Biosystems). Vector NTI 9.0.0 (InforMax, Frederick, MD) software was used for base calling, to generate sequence alignments, and to design primers (Table 1). Database searches for similar protein sequences were performed by using the BLASTx network server (NCBI; www.ncbi.nlm.nih.gov) to assign a putative function to each probe sequence.
TABLE 1. Primers used in this study
TABLE 2. Templates used for validation PCR screening
Sequencing and analysis of 16S rRNA genes from selected isolates.
For each putative marker from this study, we identified two or three isolates (32 total) that harbored the marker and sequenced a portion of the 16S rRNA gene to aid in their identification. A portion of the 16S rRNA gene was PCR amplified using primers 16S_008Fwd and 16S_517Rvs (28), with reaction and thermal cycling conditions as described previously. Two units of exonuclease I (New England Biolabs, Ipswich, MA) and 4 units of shrimp alkaline phosphatase (New England Biolabs) were added to each PCR mixture, and reactions were incubated at 37°C for 20 min followed by 80°C for 15 min. PCR products were then diluted to 4 ng/µl, and 10 µl was added to a standard one-quarter sequencing reaction mix as described above. Sequencing was completed as described above. Vector NTI 9.0.0 (InforMax) software was used for base calling and to generate sequence alignments. Database searches for similar DNA sequences were performed by using the BLASTn network server (NCBI; www.ncbi.nlm.nih.gov) to assign a putative species to each isolate.
Screening putative host-specific markers using PCR.
Putative host-specific primers were used to screen gDNA pools generated from multiple isolates of Enterococcus spp. (Table 2). We rejected all potential markers for waterfowl and dogs but identified 15 putative markers distributed between humans, cattle, and deer/elk (Table 3). Two cattle markers were identified as positive for only cow isolates and dairy lagoon samples. The elk/deer markers varied in their specificities, from specific for only elk/deer isolates to positive detection for several combinations of nonhuman templates. The human-specific primer pairs were mostly very specific, although only marker 77 amplified a band from the sewage samples (Table 3).
TABLE 3. Validation PCR screening results
TABLE 4. Distribution of markers among 385 isolates hybridized to the mixed-genome microarray
TABLE 5. Description of selected markers
TABLE 6. Distribution of markers among 32 positive control isolates
Our effort focused on a different strategy whereby we used microarrays to screen for the presence or absence of a large number of DNA fragments (n = 4,320) in an effort to detect host-specific markers. We focused our efforts on the genus Enterococcus in part because there is a considerable history of using these organisms for water quality testing and because these organisms are relatively simple to culture. We have no means to gauge the efficiency of our approach, but given that only 0.35% of the cloned inserts (15 of 4,320) appeared to be applicable to BST, other approaches should be considered to enrich the libraries for host-specific markers (e.g., using suppression subtractive hybridization). While library construction and microarray hybridization consumed considerable effort, we found that a significant amount of time was also required for PCR validation testing, and this would be true regardless of how one goes about selecting putative markers.
The 15 markers identified in this study originated from human, cow, and cervid isolates. It is possible that our screening process invalidated additional markers that could be useful for source tracking. For example, markers identified by microarray hybridization that failed the PCR screening process could represent useful genetic polymorphisms. With enough polymorphisms (typically >10% difference between microarray probe and target sequences), we would see a reduced hybridization efficiency, but if our PCR primers were conserved for multiple alleles, then we would detect the presence of the gene in many hosts, regardless of host-specific internal sequence polymorphisms. Nevertheless, our simple screen for the presence or absence of markers was the most efficient means to identify relatively conserved markers for source tracking.
Of the 15 markers described here, the two cow markers (15 and 19) met all of our screening criteria to be defined as specific to cattle, and from both the hybridization and PCR screening results, we estimate that each of the cow markers is present in about 10% of cow isolates. One marker (48) was unambiguously associated with elk/deer, and we estimate that it is present in ca. 15% of Enterococcus isolates from cervid sources. We identified five human-specific markers. Four of these were present in an estimated 2% to 10% of human isolates, while the fifth was present in a larger proportion of isolates but may be present in isolates from other hosts. The remaining seven markers were specific for various combinations of hosts, and in most cases we found them to be clearly positive for nonhuman sources and negative for human sources.
We anticipated that our search would identify at least some host-specific markers that are involved in host recognition, such as cell surface proteins, or markers involved in the metabolism of host-specific nutrients. Three of the markers appear to be involved in metabolic pathways (markers 68, 77, and 93), but none are obvious outer membrane proteins that would be involved in host recognition and adhesion. Nevertheless, seven of the markers encode unknown and hypothetical proteins, and it is possible that these serve functional roles important to survival in specific hosts. It is very interesting that four of the markers appear to be involved in DNA replication and repair (markers 15, 40, 90, and 91), which are basic housekeeping functions. This implies that the host-specific strains are divergent enough to have evolved strain-specific housekeeping gene sequences. Others have found that host specificity can correlate with housekeeping gene divergence (e.g., Flavobacterium psychrophilum) (25). Finally, marker 81 encodes a major tail protein from a lactococcal bacteriophage. There is no documented evidence of the presence of this bacteriophage in Enterococcus spp.
The 15 markers from this study partition among discrete Enterococcus species. Cervid markers are present only in E. mundtii and E. casseliflavus, while E. faecalis harbors only human markers. E. hirae harbors a wider variety of markers, with human, cow, and cervid markers represented by this species. Nevertheless, the host-specific markers that we identified in E. hirae are still host specific, indicating the presence of host-adapted strains within a single species. This assumes that our 16S rRNA sequencing efforts correctly reflected species identification. In some cases, simply identifying the species present in compromised water may be useful as a preliminary and cost-effective means to test for sources of fecal contamination (29). We do not know what proportions of these species are represented in the normal flora of humans, cattle, and cervids.
Validating markers for bacterial source tracking is an open-ended process, and there are no universally accepted criteria to determine when a marker can be considered host specific. For example, it is possible that a host-adapted strain with a specific marker is present transiently within a nonspecific host, whereas it is found much more frequently in a target host (e.g., markers 89, 90, 91, 93, and 94; Table 4). It is also possible to have markers that are very host specific but that are so rare that they are not particularly useful for bacterial source tracking. The conclusion about the utility of the marker is also dependent on the assay that is employed. In our case, we tried to validate markers by testing them against a geographically diverse collection of specific and nonspecific isolates, and the tests involved a series of individual isolates or pools of gDNAs isolated from individual isolates or from broth enrichments. These isolates were obtained opportunistically from cooperators located around the United States. While we are not proposing validation criteria, we propose that any screening procedure should include isolates from diverse geographic origins.
As more library-independent markers are identified, there remain questions about how to implement these in a manner that provides an accurate, meaningful, and cost-effective assay for BST. One strategy is to test material that has been cultivated from water samples. For example, Scott et al. (21) filtered water samples, incubated the filters on selective media for Enterococcus spp., and then briefly enriched the colonies in broth, extracted gDNAs, and tested this material for the presence or absence of the esp marker. This approach is highly desirable because of its simplicity, but it becomes problematic when markers from more than one host species are detected. Simple detection of a marker's presence or absence does not help to enumerate the proportion of each marker in the sample. Furthermore, if multiple species of Enterococcus are being detected (e.g., four species in the present work), then there could be problems with differential recovery with the enrichment media (although this can be tested). The latter complication could be circumvented by directly extracting gDNAs from membrane filtrates without enrichment (18). Depending on the filtration method that is employed, however, this strategy can encounter significant sensitivity limitations, and filtering large volumes of water can be expensive.
Another approach being considered in the BST field is to develop quantitative real-time PCR assays for library-independent markers. In theory, this will permit the practitioner to enumerate each marker either in a multiplex reaction or through a series of individual reactions for each marker of interest. This analysis usually assumes that quantification is feasible based on a standard curve. For environmental samples, we know that each gDNA extraction can have different levels of PCR inhibition (18), so unless the standard curve is incorporated into the sample extract, there can be a fair amount of uncertainty for quantification. This limitation could be circumvented by using a relative ratio analysis where real-time PCR results could be expressed as ratios within each sample. Even this method, however, does not circumvent a larger question about marker representation, that is, if accurate enumeration or ratios can be estimated, how does this relate to the original contribution from each host animal? For example, suppose that we detect equal numbers of cattle and human targets within a sample. If the cattle marker is found in 1% of all cattle isolates and the human marker is found in 10% of all human isolates, then our original conclusion about a 50:50 contribution is incorrect by a 10-fold difference. More work will be needed to assess the proportion and variance for markers that are shed by host animals. More work is also needed to assess the sample-to-sample variance and differential survivorship of bacterial strains that harbor different BST markers.
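The 10-fold discrepancy in the example above follows from dividing each marker count by the fraction of that host's isolates carrying the marker. The sketch below illustrates that arithmetic only; the function name and counts are hypothetical, and it assumes that marker prevalence is known and that strains from each host survive and are recovered equally, which the text identifies as open questions.

```python
def prevalence_adjusted_shares(marker_counts, marker_prevalence):
    """Scale each host's marker count by 1/prevalence; return fractional shares."""
    implied = {}
    for host, count in marker_counts.items():
        prevalence = marker_prevalence[host]
        if not 0 < prevalence <= 1:
            raise ValueError(f"prevalence for {host} must be in (0, 1]")
        implied[host] = count / prevalence  # implied number of isolates from this host
    total = sum(implied.values())
    if total == 0:
        raise ValueError("no markers were detected")
    return {host: value / total for host, value in implied.items()}

# Equal marker counts, as in the example in the text (counts are hypothetical).
counts = {"cattle": 100, "human": 100}
prevalence = {"cattle": 0.01, "human": 0.10}
shares = prevalence_adjusted_shares(counts, prevalence)
print(shares)
```

With these inputs the implied cattle contribution is 10 times the human contribution rather than equal to it.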
Regardless of the method that is chosen, we advocate simplicity so that institutions that need to address total maximum daily load requirements can obtain accurate and timely information at the lowest possible cost. If we recognize that there is considerable variance in the temporal and spatial distributions of host-specific markers, then one strategy may be to use simple presence-absence detection (by PCR), but conclusions would be based on multiple site visits. That is, conclusions would be based on positive "events" rather than investing considerable effort to carefully enumerate individual single samples. For example, if a site is visited 10 times, and on three occasions markers are detected for human feces, but cattle fecal markers are detected for all 10 samples, then one can draw the reasonable conclusion that the contribution from cattle is the first concern.
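The event-based strategy described above amounts to counting, for each host, the fraction of site visits on which its marker was detected. A minimal sketch follows, assuming each visit is recorded as the set of hosts whose markers gave a positive PCR result; the function name and data layout are hypothetical.

```python
def detection_frequencies(visits):
    """Return {host: fraction of visits with a detection}, most frequent first.

    visits: list of sets, one per site visit, naming the hosts whose
    markers were detected on that visit.
    """
    if not visits:
        raise ValueError("at least one site visit is required")
    hosts = set().union(*visits)
    freq = {host: sum(host in visit for visit in visits) / len(visits)
            for host in hosts}
    # Sort by descending frequency, then by name for a stable order.
    return dict(sorted(freq.items(), key=lambda item: (-item[1], item[0])))

# The example from the text: 10 visits, cattle markers on all 10,
# human markers on 3 of them.
visits = [{"cattle", "human"}] * 3 + [{"cattle"}] * 7
ranking = detection_frequencies(visits)
print(ranking)
```

The first key in the result is the host detected most often, which in this example is cattle.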
This project was funded by USDA NRI contract 2002-35102-12374 and by the Agricultural Animal Health Program at the College of Veterinary Medicine, Washington State University, Pullman, Wash.