Applied and Environmental Microbiology, March 2009, p. 1658-1666, Vol. 75, No. 6
0099-2240/09/$08.00+0 doi:10.1128/AEM.01304-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Department of Civil & Environmental Engineering, Stanford University, Stanford, California 94305
Received 11 June 2008/ Accepted 6 January 2009
[…] r ≤ 0.05), temporal proximity (r = 0.09), and geographic distance (r = 0.09). A neutral community model for all sampling events explained 61% of the variation in genotype abundance. Cooccurrence indices (C-score, C-board, and Combo) were significantly different than expected by chance, suggesting that the V. cholerae population may have a competitive structure, especially at the regional scale. Even though stochastic processes are undoubtedly important in generating biogeographic patterns in diversity, deterministic factors appear to play a significant, albeit small, role in shaping the V. cholerae population structure in this system.
The field of microbial ecology is also subject to the controversies raised in macroecology. Several studies have raised the apparent conflict between explanations of community taxon abundance distributions using either niche assembly or neutral assembly models (18, 27). Sloan et al. (39, 40) derived a continuous form of a neutral community model (NCM) that explains microbial taxon abundance in terms of only stochastic dispersal, random speciation, and ecological drift. While the NCM provided a convincing fit to clone libraries for genes from a diverse collection of bacterial communities, there is some question as to whether these patterns could also be generated by deterministic mechanisms (27).
In this study, we describe patterns in the diversity of a coastal Vibrio cholerae population spanning 3 years and nine sampling sites along the central California coast, with the goal of testing whether population structure is controlled by deterministic and/or stochastic processes. The bacterial species V. cholerae was chosen for this analysis because of its importance as a human pathogen, its ubiquity in coastal waters across the globe (8, 9, 34), and its extensive genomic and phenotypic diversity (22, 28). Previous fingerprinting studies have repeatedly exposed a high degree of diversity among nontoxigenic V. cholerae isolates and especially non-O1/O139 serotype isolates compared to the primarily clonal relationships among both clinical and environmental isolates of the O1 and O139 serotypes (8, 21, 37). However, a thorough biogeographic analysis of environmental V. cholerae has not previously been performed.
Genomic fingerprinting based on conserved enterobacterial repetitive intergenic consensus sequence PCR (ERIC-PCR) is particularly amenable to characterizing genomic diversity within large collections of closely related isolates due to its low cost and technical ease (30, 44). This method provides a high level of discrimination between bacterial strains and correlates well with other measures of genome similarity, including pulsed-field gel electrophoresis, amplified fragment length polymorphism, multilocus enzyme electrophoresis, and DNA-DNA hybridization (33, 44). ERIC-PCR fingerprinting has been successfully implemented in prior studies of V. cholerae epidemiology (8), source tracking (36, 48), and genomic diversity (4, 21, 37).
Classifying our unique collection of environmental V. cholerae isolates from coastal waters and sediments into distinct ERIC-PCR genotypes allows us to assess patterns in the genomic diversity of isolates with respect to differences in geography, temporal proximity, and environmental conditions. Assembly of this microbial population in time and space is likely determined both by niche differentiation between genotypes or other deterministic mechanisms and by neutral processes, such as random dispersal, ecological drift, and chance (19). However, providing evidence for competitive structuring in a community beyond that which can be explained by stochastic processes is rarely possible (1, 27). Here, we use partial Mantel tests, NCMs, and cooccurrence indices to explore how stochastic and deterministic factors shape V. cholerae population structure.
In addition to the 601 isolates collected during 2006, we included the 41 environmental and 4 clinical isolates previously described by Keymer et al. (22), 1 water column isolate from August 2004, 162 water column isolates from June and July 2005, and 27 and 10 isolates obtained by alkaline peptone water (APW) enrichment from filters and crab shells, respectively (for a total of 846 isolates) (see Tables S1 and S2 in the supplemental material). For the filter enrichment isolates, water column samples were membrane filtered as described above, but the filters were vortexed in APW and incubated at 37°C for 6 h before aliquots of APW were spread plated on TCBS agar. One-square-centimeter pieces of crab shells recovered after 2 weeks of deployment at five sampling sites were also vortexed and incubated in APW prior to selection of isolates on TCBS agar. Some of these additional isolates were isolated from Kirby Park in Elkhorn Slough, which increased the number of coastal sampling sites to nine. Seven strains from the collection did not produce analyzable fingerprints and were classified as untyped, so the number of Vibrio isolates used in the study was 839. The genomic similarity of the 45 isolates described by Keymer et al. (22) has been well characterized using comparative genome hybridization (CGH), which provides a set of isolates that can be used to assess the accuracy of ERIC-PCR analysis methods. Below, these 45 isolates are referred to as the "calibration set".
Environmental parameters.
Coincident with sample collection, water temperature, salinity, dissolved oxygen, pH, and turbidity were measured in situ using a calibrated Hydrolab Quanta water quality probe (HACH Environmental, Loveland, CO). Approximately 30-ml water samples were 0.2-µm syringe filtered into acid-washed containers and stored at –20°C prior to analysis for dissolved nutrients. Nutrient samples were sent to the UCSB Marine Science Institute Analytical Lab for analysis of ammonium, soluble reactive phosphate, nitrate plus nitrite, nitrite, and silicate using flow injection methods. Since the first four parameters were correlated over all sampling events, only ammonium concentration is included in the analysis of genotype distribution patterns. Log transformations of salinity, ammonium, and turbidity data were performed to ensure that the data were normally distributed. Raw environmental data are shown in Table S3 in the supplemental material.
ERIC-PCR.
Whole cells grown in LB broth with 1% NaCl were washed and resuspended in sterile water. One microliter of cell suspension was added to 24 µl of a master mixture containing 1.25 U high-fidelity Hot Star Taq polymerase (Qiagen), 1x HotStar HiFidelity PCR buffer, 1 mM additional MgSO4, and 2 µM each of 6-carboxyfluorescein-labeled primer ERIC2 and unlabeled primer ERIC1R (36). The thermal cycling conditions were the conditions described by Zo et al. (48), except for an initial 15-min hot start at 94°C. Three microliters of the PCR product was mixed with 22.5 µl of Hi-Di formamide (Applied Biosystems) and 0.5 µl of a custom MapMarker (Bioventures Inc., Murfreesboro, TN) consisting of 6-carboxyl-X-rhodamine-labeled size fragments (100 to 2,000 bp). Fragment analysis of the PCR-amplified genome fragments was performed with an ABI 3730XL DNA analyzer at the University of Wisconsin Biotechnology Center.
Processing of chromatograms.
The method described below for generating ERIC-PCR fingerprints from sample chromatograms was found to be superior to an automated band-calling method on the basis of precision, reproducibility, discriminatory power, and accuracy in classifying the calibration set (data not shown). Chromatogram files were uploaded to GelCompar II 5.0 (Applied Maths, Austin, TX) using the CrvConv filter. Curves were normalized to the internal standard and filtered using a rolling disk size of 16% to remove background noise and a least-squares cutoff of 0.03% to remove high-frequency noise. Filtered curves were compared using Pearson product-moment correlation (see Fig. S1 in the supplemental material), and the similarity matrix was exported to MATLAB 7.4 (MathWorks, Natick, MA).
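The curve-based comparison can be illustrated with a minimal sketch. It assumes that each filtered, normalized chromatogram has already been resampled onto a common set of positions and is available as a list of intensities. The function names are hypothetical, and this is not the GelCompar II implementation.

```python
import math


def pearson_similarity(curve_a, curve_b):
    """Pearson product-moment correlation between two equal-length curves."""
    if len(curve_a) != len(curve_b) or len(curve_a) < 2:
        raise ValueError("curves must have the same length (at least 2 points)")
    n = len(curve_a)
    mean_a = sum(curve_a) / n
    mean_b = sum(curve_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(curve_a, curve_b))
    var_a = sum((a - mean_a) ** 2 for a in curve_a)
    var_b = sum((b - mean_b) ** 2 for b in curve_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat curve has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


def similarity_matrix(curves):
    """Symmetric matrix of pairwise Pearson similarities."""
    n = len(curves)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = pearson_similarity(curves[i], curves[j])
    return sim
```

Because the correlation is computed on whole curves rather than on called bands, it does not depend on a peak-detection threshold, which is consistent with the preference for the curve-based method stated above.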
Evaluation criteria for ERIC-PCR fingerprinting.
Four criteria were used to evaluate the ERIC-PCR fingerprinting method: precision, reproducibility, discriminatory power, and accuracy. Precision was computed as the mean of a lognormal distribution fit to the histogram of pairwise similarity between replicates. Replicates refer to multiple independent fingerprints (between 2 and 10 fingerprints; median, 2 fingerprints) obtained for the same isolate. For the other criteria, fingerprints were clustered into groups with intracluster similarity greater than a specified identity cutoff. Each cluster corresponds to one ERIC-PCR genotype and is designated by a unique identifier from 1 to 115. Reproducibility was calculated for all replicate fingerprints as the average percentage of replicates present in the cluster containing the most replicates of a given isolate. Discriminatory power (D) was computed for all fingerprints using equation 1 (20):
(1) [equation image not recovered]
(2) [equation image not recovered]
(3) [equation image not recovered]
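The equations did not survive in this copy. A widely used index of discriminatory power for typing methods is Simpson's index of diversity in the form given by Hunter and Gaston, D = 1 − [1/(N(N − 1))] Σ n_j(n_j − 1), where N is the number of fingerprints and n_j is the number assigned to the jth type. The sketch below assumes that this is the form intended by equation 1; the function name is hypothetical.

```python
def discriminatory_power(cluster_sizes):
    """Simpson's index of diversity (Hunter-Gaston form) from cluster sizes.

    Assumption: this matches equation 1, which is missing from this copy.
    """
    n_total = sum(cluster_sizes)
    if n_total < 2:
        raise ValueError("at least two fingerprints are required")
    # Number of ordered pairs of fingerprints that share a type.
    same_type_pairs = sum(n * (n - 1) for n in cluster_sizes)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Example: four fingerprints split evenly into two genotypes.
print(discriminatory_power([2, 2]))
```

D is the probability that two fingerprints drawn at random without replacement fall into different genotypes, so it equals 0 when everything clusters together and 1 when every fingerprint is its own genotype.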
Biogeography analysis.
Following the division of all fingerprints into ERIC-PCR genotypes, DNA sequences for the housekeeping gene dnaE were generated for representative isolates for all 115 genotypes (see Fig. S4 in the supplemental material). Six additional housekeeping genes were sequenced for isolates for 77 genotypes, and in all cases classification of isolates as species other than V. cholerae on the basis of dnaE sequence homology was verified by the additional locus sequences (D. P. Keymer and A. B. Boehm, unpublished data). Genotypes with <95% sequence similarity to V. cholerae (23) were classified as non-V. cholerae vibrios and excluded from further analysis.
The number of ERIC-PCR genotypes detected was plotted as a function of the number of isolates collected for each sampling event. Data points were fitted to both linear and power responses using least-squares regression. The power curve had the form y = ax^b + c. Goodness of fit for the two models was evaluated using Fisher's F test.
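The model comparison can be sketched as follows. The linear model is the special case b = 1 of the power model, so the two are nested and an F statistic can be formed from their residual sums of squares. The grid search over b is an illustrative shortcut, not the fitting routine used in the study, and all function names are hypothetical.

```python
def fit_line(u, y):
    """Ordinary least squares for y = a*u + c; returns (a, c, rss)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    if s_uu == 0:
        raise ValueError("predictor has zero variance")
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    c = mean_y - a * mean_u
    rss = sum((yi - (a * ui + c)) ** 2 for ui, yi in zip(u, y))
    return a, c, rss


def fit_power(x, y, exponents=None):
    """Fit y = a*x**b + c by scanning b; for fixed b the model is linear.

    Requires positive x. Returns (a, b, c, rss) for the best exponent.
    """
    if exponents is None:
        exponents = [i / 100 for i in range(5, 201)]  # assumed search grid
    best = None
    for b in exponents:
        a, c, rss = fit_line([xi ** b for xi in x], y)
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


def nested_f_statistic(rss_simple, rss_full, p_simple, p_full, n):
    """F statistic comparing a simple model with a fuller nested model."""
    numerator = (rss_simple - rss_full) / (p_full - p_simple)
    denominator = rss_full / (n - p_full)
    return numerator / denominator
```

With two parameters in the linear model and three in the power model, the statistic would be compared with an F distribution with 1 and n − 3 degrees of freedom, where n is the number of sampling events.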
Simple and partial Mantel tests were performed with the open-source program zt (5) to test for correlation between genotypic dissimilarity of samples and geographic distance, while accounting for differences in environmental conditions. Downloads of zt are free through the Journal of Statistical Software website (http://www.jstatsoft.org/v07/i10). Matrices for Mantel tests were assembled in MATLAB 7.4. The genetic dissimilarity matrix used the Dice coefficient dissimilarity between the incidences of ERIC-PCR genotypes within each sampling event. The geographical distance matrix was based on great-circle distances computed with the spherical law of cosines from the latitude-longitude coordinates of each site. Simple Mantel tests were used to determine which environmental parameters were correlated with the genotypic dissimilarity matrix. Measured parameters were added stepwise to the environmental dissimilarity matrix to generate a matrix that had maximum correlation with the genotypic dissimilarity matrix and included the fewest parameters. The environmental distance matrix consisted of standardized Euclidean distances between values for water temperature, log salinity, and log ammonium concentration measured during each sampling event. The temporal distance matrix consisted of the number of days between sampling events normalized to the study period duration. A false-discovery rate correction was used to assess significance when testing multiple hypotheses (3). Values and ranges of correlation coefficients presented below reflect the individual correlation of geographic, environmental, or temporal distance with genotypic dissimilarity when both of the remaining predictor variables were controlled for.
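The distance measures and the simple Mantel test can be sketched as follows. The analysis itself was run in zt and MATLAB; this standalone version is only illustrative. The Earth radius, the one-tailed permutation P value, and all function names are assumptions, and the partial Mantel test, which additionally controls for a third matrix, is not shown.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def dice_dissimilarity(genotypes_a, genotypes_b):
    """1 - Dice coefficient between two sets of detected genotypes."""
    a, b = set(genotypes_a), set(genotypes_b)
    if not a and not b:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance from the spherical law of cosines (degrees in)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def _upper(matrix, order):
    """Upper-triangle entries after relabeling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_test(dist_a, dist_b, n_perm=9999, seed=0):
    """Simple Mantel test; returns (r, one-tailed permutation P value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    vec_b = _upper(dist_b, identity)
    r_obs = _pearson(_upper(dist_a, identity), vec_b)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute sample labels of one matrix only
        if _pearson(_upper(dist_a, order), vec_b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting the labels of whole rows and columns, rather than shuffling individual matrix entries, preserves the dependence among distances that share a sampling event, which is why the Mantel test is used in place of an ordinary correlation test.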
Distance-decay relationships were estimated for log-log plots of genotypic similarity versus geographic and temporal distance. Bootstrapped linear regressions were performed with 10,000 replicate resamples to verify that taxon-area and taxon-time exponents were nonzero, as described previously (16).
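A bootstrapped log-log regression of this kind can be sketched as follows. The percentile confidence interval and the resampling of individual pairwise comparisons are simplifying assumptions, since the exact procedure follows reference 16 and is not described here. Function names are hypothetical, and both inputs must be strictly positive.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y on x; None if x has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        return None
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx


def bootstrap_exponent(distance, similarity, n_boot=10000, seed=0, alpha=0.05):
    """Slope of log10(similarity) vs log10(distance) with a percentile CI."""
    rng = random.Random(seed)
    log_d = [math.log10(d) for d in distance]
    log_s = [math.log10(s) for s in similarity]
    observed = ols_slope(log_d, log_s)
    if observed is None:
        raise ValueError("all distances are identical")
    n = len(log_d)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        slope = ols_slope([log_d[i] for i in idx], [log_s[i] for i in idx])
        if slope is not None:  # skip degenerate resamples
            slopes.append(slope)
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return observed, (lower, upper)
```

An exponent would be judged nonzero under this sketch when the interval excludes zero.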
NCMs were fitted with least squares to mean relative abundance and frequency data for each ERIC-PCR genotype across all sampling events based on the equations derived by Sloan et al. (40). NCMs were also fitted to sampling events for individual sites to assess V. cholerae community assembly on a local-versus-regional spatial scale. Coefficients of determination and root mean square errors (RMSEs) (normalized to range in dependent variable) were computed to assess goodness of fit for each model. NCMs were not fitted for individual sites that contained fewer than 16 genotypes (Kirby Park, Lagunitas Creek, Moss Landing Harbor, and San Pedro Creek).
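The two goodness-of-fit measures can be sketched as follows. The neutral model itself is not reproduced here. The sketch assumes that the coefficient of determination is computed as 1 − SS_res/SS_tot and that the RMSE is divided by the range of the observed dependent variable; the function names are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def normalized_rmse(observed, predicted):
    """Root mean square error divided by the range of the observed values."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / span
```

Normalizing the RMSE to the range makes the error comparable between fits whose dependent variables span different intervals, such as the all-sites and single-site models.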
Cooccurrence indices (C-board, Combo, and C-score) for incidence matrices of samples within individual sites or all sites were computed as described by Horner-Devine et al. (17), using EcoSim 7.72 (14). Singleton genotypes were removed from the matrices before analysis. Indices were not computed for individual sites that contained fewer than three nonsingleton genotypes (Kirby Park, Lagunitas Creek, Moss Landing Harbor, and San Pedro Creek). Standardized effect scores, equivalent to statistical z scores, are included below to allow comparison between data sets of different sizes.
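The C-score and its standardized effect score can be sketched as follows. The incidence matrix is assumed to have one row per genotype and one column per sample. The null model shown, which shuffles each row independently so that only row totals are preserved, is an assumption chosen for brevity; the null model settings used in EcoSim for the study are not stated here. Function names are hypothetical.

```python
import random
import statistics
from itertools import combinations


def c_score(incidence):
    """Mean number of checkerboard units over all pairs of rows (genotypes)."""
    if len(incidence) < 2:
        raise ValueError("at least two genotypes are required")
    row_totals = [sum(row) for row in incidence]
    units = []
    for i, j in combinations(range(len(incidence)), 2):
        shared = sum(a * b for a, b in zip(incidence[i], incidence[j]))
        units.append((row_totals[i] - shared) * (row_totals[j] - shared))
    return sum(units) / len(units)


def standardized_effect_size(incidence, n_null=1000, seed=0):
    """Return (observed C-score, (observed - null mean) / null SD)."""
    rng = random.Random(seed)
    observed = c_score(incidence)
    null_scores = []
    for _ in range(n_null):
        # Assumed null model: shuffle occurrences within each genotype row.
        shuffled = [rng.sample(row, len(row)) for row in incidence]
        null_scores.append(c_score(shuffled))
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return observed, (observed - statistics.mean(null_scores)) / sd
```

A positive standardized effect score indicates more segregation between genotypes than the null matrices produce, and because it is expressed in null standard deviations it can be compared between sites with different numbers of samples.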
All statistical comparisons except the Mantel tests were performed in MATLAB 7.4 (MathWorks).
Nucleotide sequence accession numbers.
The GenBank accession numbers for the DNA sequences determined in this study are FJ609424 to FJ609633.
The high-resolution CGH approach used by Keymer et al. (22) identified several groups of apparently identical V. cholerae isolates. Using the same isolates characterized in that study, we compared clusters generated from the ERIC-PCR fingerprints with different identity cutoffs to the clusters defined using CGH. False positives (divergent isolates clustering together) and false negatives (identical isolates clustering separately) were tallied over a range of similarity cutoffs (see Fig. S3 in the supplemental material). When evaluated at the optimal identity cutoff, 80%, our genotype classification method had a false-positive rate of 0.57% and a false-negative rate of 15.0%.
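The tally of false positives and false negatives can be sketched as follows. The exact definitions of the two rates are not given in the surviving text, so the sketch assumes pair-based rates: the false-positive rate is the fraction of CGH-divergent isolate pairs that share an ERIC-PCR cluster, and the false-negative rate is the fraction of CGH-identical pairs that fall into different clusters. The function name is hypothetical.

```python
from itertools import combinations


def pairwise_error_rates(reference_groups, test_clusters):
    """Return (false-positive rate, false-negative rate) over isolate pairs.

    reference_groups[i] is the reference (e.g. CGH) group of isolate i and
    test_clusters[i] is its cluster under the method being evaluated.
    """
    if len(reference_groups) != len(test_clusters):
        raise ValueError("label lists must have the same length")
    false_pos = false_neg = same_ref_pairs = diff_ref_pairs = 0
    for i, j in combinations(range(len(reference_groups)), 2):
        same_reference = reference_groups[i] == reference_groups[j]
        same_cluster = test_clusters[i] == test_clusters[j]
        if same_reference:
            same_ref_pairs += 1
            if not same_cluster:
                false_neg += 1  # identical isolates clustered separately
        else:
            diff_ref_pairs += 1
            if same_cluster:
                false_pos += 1  # divergent isolates clustered together
    fp_rate = false_pos / diff_ref_pairs if diff_ref_pairs else 0.0
    fn_rate = false_neg / same_ref_pairs if same_ref_pairs else 0.0
    return fp_rate, fn_rate
```

Repeating the calculation for clusterings produced at a range of identity cutoffs gives the trade-off from which an optimal cutoff can be chosen.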
Fingerprint analysis of entire strain collection.
The entire collection of ERIC-PCR fingerprints was analyzed using an 80% identity cutoff to discern distinct genotypes. A total of 998 fingerprints, including replicates, from 839 isolates were divided into 115 ERIC-PCR genotypes, each represented by between 1 and 72 isolates (see Fig. S1 in the supplemental material). Nucleotide sequence analysis using the DNA polymerase I gene (dnaE) (see Fig. S4 in the supplemental material) divided the genotypes into confirmed V. cholerae isolates (99 genotypes, 799 isolates) and other putative vibrios (16 genotypes, 40 isolates). Genotypes classified as non-V. cholerae vibrios (genotypes 38, 50, 62, 65 to 67, 70, 72 to 75, 103, 107, 112, and 114 to 115) were excluded from further analysis. Our calibration set of 45 V. cholerae isolates is ≥99% similar at the 16S rRNA gene level (data not shown); however, ERIC-PCR fingerprinting classified the 45 isolates into 21 different genotypes.
The genotype richness of isolates collected during most sampling events did not level off with increasing sampling effort, even though we collected up to 50 isolates per event. For the 97 sampling events, we observed a power law relationship in the number of genotypes detected with the number of isolates collected (Fig. 1) (r² = 0.71). The power law curve provided a better fit than the linear model (r² = 0.68, F = 9.49, P < 0.05). Samples below the regression line were relatively closer to saturation of diversity, while samples above the line were relatively further from saturation. The isolates obtained by enrichment had relatively low diversity and fell below the regression line, along with the isolates collected at Waddell Creek in June 2006. Conversely, the samples from San Lorenzo River collected in June, July, and August 2006 had relatively high diversity and fell above the regression line.
FIG. 1. Relationship between sample size and ERIC-PCR genotype diversity. The number of ERIC-PCR genotypes detected in a sample increases according to a power law relationship with the total number of isolates collected (y = ax^b + c, where a is 1.48 ± 1.47, b is 0.60 ± 0.24, and c is –0.51 ± 1.97; r² = 0.71). The power law model fit better than a linear model (r² = 0.68) based on Fisher's F test (F = 9.49, P < 0.05). Samples discussed in the text are labeled for clarity.
The distribution of persistent genotypes at Pescadero Creek, Old Salinas River, San Lorenzo River, and Waddell Creek is shown in Fig. 2 to illustrate interesting spatial and temporal patterns in genotype occurrence. For instance, genotypes 27, 45, 86, and 89 are cosmopolitan and were repeatedly found across the sites. In contrast, genotypes 1, 98, 99, and 113 are restricted to a single site. Temporal shifts in the genotypic composition of particular sites are also evident in Fig. 2. Genotype 90 appears to dominate the V. cholerae population in Waddell Creek in June 2006, but less than 30 days later most of its contribution to the sample diversity has been replaced by a number of other genotypes. Genotype 54 is numerically dominant in the Old Salinas River during many summer months, but its contribution declines throughout the rest of the year. Cooccurrence of some genotypes also appears to be common. For example, genotypes 1 and 111, 16 and 33, and 21, 42, 56, and 82 repeatedly occur together during various sampling events.
FIG. 2. Proportions of selected genotypes collected from four sites over the entire sampling period. The numbers above the bars indicate the numbers of water column and sediment isolates collected and typed for the sampling event. The letters at the bottom indicate the months in which samples were collected. Each sample bar is divided into areas representing different genotypes (indicated by different colors and patterns) detected in the sample. To improve visual appearance, rare genotypes that appeared only once in the entire data set are indicated by white areas, while other genotypes that were not persistent in time or space are indicated by light gray areas. Genotypes classified as non-V. cholerae vibrios (non-VC) are indicated by black areas.
Simple Mantel tests revealed significant intercorrelations (P < 0.05) between our deterministic predictor variables (environmental, geographic, and temporal distances). Ammonium concentration was the most highly correlated individual environmental predictor of genotypic dissimilarity, followed by salinity and then water temperature. The partial Mantel tests revealed that the environmental conditions, sampling date, and geographic locations of the sampling events had significant, although small, independent effects on the genetic dissimilarity between samples (Table 1). Recall that the partial Mantel tests control for intercorrelation between predictor variables. Environmental differences had a relatively smaller effect on genotype similarity among samples (0.04 ≤ r ≤ 0.05, P < 0.008), and days between samples (r = 0.09, P = 0.0001) and kilometers between sites (r = 0.09, P < 0.0001) were equally significant predictors. While the geographic, temporal, and environmental associations were significantly correlated with the genetic similarity between samples, greater than 95% of the variability in the data remains unexplained by the variability in our predictor matrices.
TABLE 1. Results of simple and partial Mantel tests between distance matrices for genotypic dissimilarity, environmental dissimilarity, geographic distance, and temporal separation of isolates from each sampling event
[…] m […] 0.0045 and equally prominent scatter in the observed abundance data (Table 2 and Fig. 3).
TABLE 2. Fitting parameters (N_T and m) and goodness of fit (r² and RMSE) for NCMs
FIG. 3. Fit of NCM to observed isolation frequency and mean relative abundance for ERIC-PCR genotypes from all environmental samples and from a subset of individual sampling sites. The open circles represent individual genotypes, and the dashed lines indicate the least-squares best fit. Fitting parameters are shown in Table 2.
TABLE 3. Cooccurrence indices computed for incidence matrices from all sampling events or individual sites relative to null matrices using EcoSim 7.72
[…] z […] –0.07) (12, 16) but lower than values computed for high-mountain-lake bacteria (z = 0.16) (35) and beech tree hole bacterial communities (z = 0.26) (2). The small sample sizes used here and in other studies allow only systematic shifts in abundant taxa to be detected using the taxon-area relationship, so the actual absolute value of the exponent could be much larger if rare taxa behave similarly (46). The small exponent observed for the V. cholerae taxon-area relationship probably indicates some niche overlap between intraspecific genotypes (29). A negative log-log relationship was also observed between genotypic similarity and temporal distance of sampling events (z = –0.047 ± 0.024). The taxon-time exponent is much smaller than the exponents estimated for bacteria treating industrial wastewater (–0.512 ≤ z ≤ –0.162) (43). Our smaller exponent compared to the industrial wastewater reactor exponents suggests that there is higher temporal stability in the V. cholerae population, but it could stem from differences in the taxon-level resolution as well.
The distance-decay relationships described above provide an elementary tool to describe distribution patterns in microbial communities, but they cannot distinguish the effects of multiple interrelated factors. Therefore, we used partial Mantel tests to quantify independent effects of our geographic, temporal, and environmental predictor matrices on genotype occurrence across sampling events. All three of our predictor variables had significant, but minor effects on the genotype dissimilarity between samples. The correlation coefficients for the partial Mantel tests are somewhat lower than the values that have been seen for environmental effects in salt marsh bacteria (0.26 ≤ r ≤ 0.37) (16), for spatial effects in high-mountain-lake bacteria (r = 0.29) (35), and for pH effects on soil bacterial communities (r = 0.75) (12). In contrast to previous studies, we found evidence for independent effects on genotypic similarity from both environmental heterogeneity and spatial distance, as well as temporal proximity. The fact that we are able to detect geographic effects in our data set means that dispersal rates are not high enough to mask a distance-decay relationship (26). Correlation between environmental similarity and genotypic similarity implies that local deterministic factors and niche differences play a role in shaping V. cholerae population structure. However, the relatively strong temporal influence on genotypic similarity across sites up to 100 km apart suggests that regional factors acting simultaneously at all sites are also important. These factors could include biological interactions with other seasonally abundant taxa or fluctuations in climate and near-shore oceanographic phenomena.
The amount of variability in the genotype dissimilarity matrix explained by deterministic factors (location, time, and environmental conditions) according to the partial Mantel tests was less than 5% of the total variability, so much of the variability remains unexplained. There are several possible explanations for this result. We may not have measured some environmental parameters that are important in controlling V. cholerae population structure. V. cholerae may inhabit microniches where environmental parameters vary immensely from those measured and assigned to the sampling event (31, 45). Niche selection may occur at the level of gene content or gene regulation, which is finer than the resolution that we observe with genome fingerprinting. If this is the case, genotypes with similar genome fingerprints may respond very differently to changes in their environment. Alternatively, it may be possible that isolates that are classified as having different ERIC-PCR genotypes but are similar in some portion of their gene content experience selection with the same environmental cues (22). Finally, most of the diversity observed among isolates in this environment may be neutral and shaped by stochastic processes (42). To further explore the latter explanation, the role of stochastic and deterministic processes in controlling genotype distribution patterns was examined using NCMs and cooccurrence indices.
NCMs appear to describe much of the distribution of genotypes, suggesting that stochastic processes play an important role in shaping population structure. The NCMs fit our taxon abundance data better than the human fecal community example described by Sloan et al. (40), but our r² values are relatively low and the curve fits are less convincing than those for other bacterial communities (39, 47). However, the simple NCMs can very rarely be rejected based on statistical considerations, and simulations show that under most circumstances niche and neutral models are indistinguishable, even under strong selective pressure (1, 27). Therefore, caution should be taken in interpretation of these results; they do not imply that deterministic processes are unimportant. To the contrary, we interpret the relatively weak NCM fit as evidence that stochastic processes, while important, are not sufficient to fully explain the observed patterns in genotype diversity.
The nonrandom cooccurrence indices suggest that competitive structuring of the V. cholerae population is important. The C-board and C-score indices were significantly larger than the indices generated for the null matrices (P = 0.001 and P = 0.019, respectively), and Combo was significantly smaller (P < 0.001), when all samples were compared across sites. Cooccurrence indices for samples within individual sites gave similar results, with three sites and two sites of four sites tested having significantly larger C-score and C-board indices than expected by chance, respectively. Competitive exclusion within the same niche is one mechanism that could produce higher-than-expected C-board and C-score values, but environmental differences between sites could also lead to habitat exclusion for some genotypes at the regional scale (1). These results highlight the potential importance of deterministic processes in shaping population structure. However, the work described here did not identify which deterministic factors ultimately control the observed patterns in V. cholerae biogeography, so further work is needed to verify effects of individual factors.
The results presented here ultimately depend on the tool that we used to discriminate genotypes. ERIC-PCR has been used by other researchers to explore V. cholerae diversity (21, 36, 37), and our independent evaluation of the ERIC-PCR analysis method confirmed that this genomic fingerprinting tool could be used to accurately define V. cholerae genotypes. However, other genomic fingerprinting methods (e.g., repetitive extragenic palindromic PCR with BOX or GTG5 primers), as well as multilocus sequence analysis, may provide greater discrimination between Vibrio isolates (13, 33, 38). It will be interesting to see how the results of biogeographic analysis of V. cholerae using these tools to define genotypes compare to our results.
The vast difference between practical sample size and actual population size in a site is an inherent problem in environmental microbial ecology studies (10). Due to constraints on time and resources, we could collect only up to 50 V. cholerae isolates for each sampling event. Based on our analysis of the relationship between sample size and genotype diversity, we found that there was a power law increase in the number of genotypes detected as the number of isolates collected increased. Therefore, the patterns that we observed take into account only the most abundant culturable genotypes, even though rare genotypes should be primarily responsible for driving spatial or other patterns in genotype distribution (46). We cannot currently assess how the results observed here would be affected by the use of culture-independent methods or larger sample sizes, but future work in these and other ecosystems should address this issue. Although future work must assess how well the patterns observed for our collection of isolates can be extrapolated to the larger coastal V. cholerae population, the nonrandom genotype distributions observed here provide evidence that deterministic factors have a small, but meaningful role in shaping the coastal V. cholerae population.
This work was funded by NOAA Oceans and Human Health Initiative grant NA04OAR4600195 (D.P.K., L.H.L., and A.B.B.), NSF grant OCE-0742048 (D.P.K. and A.B.B.), and a Gerhard Casper Stanford Graduate Fellowship (D.P.K.).
Published ahead of print on 9 January 2009.
Supplemental material for this article may be found at http://aem.asm.org/.