Applied and Environmental Microbiology, June 2007, p. 3705-3714, Vol. 73, No. 11
0099-2240/07/$08.00+0 doi:10.1128/AEM.02736-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Department of Civil and Environmental Engineering, Stanford University, Stanford, California 94305,¹ Division of Infectious Diseases and Geographic Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305²
Received 22 November 2006/ Accepted 9 April 2007
Lan and Reeves (22) introduced the species genome concept for bacteria in 2000. They divided the genome into two parts: a core, present in ≥95% of strains, that defines the characteristics of the species, and auxiliary, or dispensable, components that allow adaptation to individual niches. Recently, several studies have used multiple sequenced genomes from the same bacterial species to identify core and dispensable genes, including three strains of Listeria monocytogenes and eight strains of Streptococcus agalactiae (30, 36). Nearly half of the bacterial genomes sequenced to date are from pathogenic strains. This limits our ability to study functions that facilitate adaptation to environments outside of animal hosts. Analysis of multiple strains from diverse environments is needed to understand how genome content is distributed across a species.
It is generally agreed that a bacterial species should be defined by both its core genetic composition and the phenotypes encoded by this core that define its broad ecological niche (23, 40). Diagnostic phenotypic characteristics of Vibrio cholerae were determined in clinical laboratories, using pathogenic strains and enteric media. These characteristics may not hold true for environmental strains distantly related to the infectious strains. Phylogenetic studies have demonstrated the clonality of O1 biovars and O139 strains while uncovering broad genetic diversity in non-O1/O139 strains (5, 12). If the genetic diversity is any indication of the phenotypic diversity of the species, then the core phenotypic characteristics of V. cholerae should be reevaluated for collections of environmental strains.
Given the sampling limitations associated with complete genome sequencing, comparative genome hybridization (CGH) using DNA microarrays provides a relatively inexpensive and rapid method for probing the genomic diversity of a collection of closely related strains. CGH has been widely used to explore the genomic composition of clinical and environmental strains of several bacterial species (4, 8, 10, 11, 14, 17, 33, 34). In the present study, we use DNA microarrays from the sequenced V. cholerae O1 El Tor strain N16961 to profile the genomes of 41 environmental isolates from a wide range of central California coastal environments and four clinical strains from India and Bangladesh. The 45 strains were also assayed for growth on 190 different carbon sources, allowing the classification of core and dispensable metabolomes for V. cholerae strains. These combined analyses identify basic V. cholerae genomic and metabolic cores that contain genes and functions enabling persistence in the aquatic environment.
This study provides the first analysis of both comparative genome content and metabolic capabilities of a substantial collection of environmental V. cholerae strains from a region without epidemic cholera. Although the isolates are geographically segregated from areas where cholera is endemic, we show that their genomes are not substantially different from those of non-O1/O139 strains from endemic regions. Because strains were isolated at the same time that environmental parameters were measured, we can correlate gene presence and carbon source utilization with in situ water temperature, salinity, turbidity, and inorganic nutrient concentrations. This analysis identifies genes and pathways that may allow adaptation to specific niches in the coastal marine environment and may play a role in shaping the Vibrio cholerae population structure in these ecosystems.
FIG. 1. Locations of sampling sites in and around San Francisco Bay, California. Sites where Vibrio cholerae strains were isolated are displayed in bold followed by an asterisk. Darker colored areas along the coast indicate dense vegetative cover. (Map background data available from U.S. Geological Survey/EROS, Sioux Falls, SD.)
Environmental parameters.
Coincident with sample collection, water temperature, salinity, dissolved oxygen, pH, turbidity, and chlorophyll concentration were measured in situ using a calibrated water quality probe (model YSI 6600; Hydrolab, Yellow Springs, OH). For dissolved nutrient analysis, water samples were passed through 0.2-µm syringe filters into acid-washed containers and stored at −20°C until analysis. A five-channel, continuous flow analyzer was used to measure ammonium, soluble reactive phosphate, nitrate-nitrite, nitrite, and silicate following standard methods (3). Salinity, ammonium, soluble reactive phosphate, nitrate, and nitrite were log transformed and turbidity was inverse transformed so that the data were approximately normally distributed. Selected environmental data are presented in Table S1 in the supplemental material.
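The following is a minimal sketch of the normalizing transformations described above. It is not the authors' code. It log-transforms the nutrient and salinity readings and takes the reciprocal of turbidity. Three things are our assumptions: the use of natural logarithms, the dictionary keys, and strictly positive readings. The text states neither the log base nor how zero readings were handled.

```python
import math

# Parameters listed in the Methods as log transformed (keys are hypothetical names).
LOG_TRANSFORMED = {"salinity", "ammonium", "phosphate", "nitrate", "nitrite"}


def transform_sample(sample: dict[str, float]) -> dict[str, float]:
    """Return a copy of one water sample with normalizing transforms applied.

    Assumes natural logarithms and strictly positive readings; neither the
    log base nor the handling of zeros is stated in the source.
    """
    transformed = {}
    for name, value in sample.items():
        if name in LOG_TRANSFORMED:
            transformed[name] = math.log(value)
        elif name == "turbidity":
            transformed[name] = 1.0 / value  # inverse transformation
        else:
            transformed[name] = value  # e.g. temperature is left untransformed
    return transformed
```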
Gene expression data.
We compared our CGH data with gene expression data from two sources: rice-water stool samples of cholera patients in Bangladesh (26) and fluid from the ileal loops of infected rabbits (41). The aim was to examine whether genes expressed in the animal intestine also provide a selective advantage in coastal ecosystems. Hybridization data were compiled as log2 ratios. Mid-log-phase growth in LB served as the reference condition for the stool samples, and genomic DNA served as the reference for the ileal loop samples. Upregulated genes were defined as those whose log2 ratios were more than two standard deviations above the mean of the distribution.
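The upregulation cutoff can be sketched as shown below. The sketch assumes one log2 ratio per gene and a sample standard deviation; the text does not say whether a sample or population SD was used. The gene names in the test are hypothetical placeholders.

```python
import statistics


def upregulated_genes(log2_ratios: dict[str, float], n_sd: float = 2.0) -> set[str]:
    """Return genes whose log2 ratio exceeds the mean by more than n_sd standard deviations."""
    values = list(log2_ratios.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (an assumption; population SD is also plausible)
    cutoff = mean + n_sd * sd
    return {gene for gene, ratio in log2_ratios.items() if ratio > cutoff}
```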
Phenotype comparisons.
Isolates were assayed for growth on nearly 190 different carbon substrates using proprietary phenotype microarrays (Biolog, Hayward, CA). Metabolic activity was quantified after 48 h at 22°C. Isolates were scored as growth or no growth using a procedure modified from the method for calling CGH data (27). Raw data for the phenotype microarrays consisted of intensity measurements of a chromogenic substrate at 15-min time points. The maximum intensity value was collected for all wells, and the maximum intensity in the negative control was subtracted from these values. Assays that were poorly reproducible between replicates were excluded. Substrates for which strain N16961 had maximum intensity values greater than or less than 175 were pooled into preliminary subsets A and B, respectively. Two thresholds were then set. The growth threshold lay 2 standard deviations of subset A from median(A), and substrates with maximum intensities above it were designated "growth." The no-growth threshold lay 2 standard deviations of subset B from median(B), and substrates with maximum intensities below it were designated "no growth." Any substrates with maximum intensities between the two thresholds were designated "unknown." Next, the maximum intensity for N16961 was subtracted from the maximum intensities for all other strains for each substrate. Substrates for all strains were called using the thresholds set for N16961. Each assay was therefore called growth or no growth relative to the score for N16961, based on the consensus of two replicates. Any substrates without a consensus were called unknown.
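The following is a simplified sketch of the reference-strain calling step, under stated assumptions. The text gives only the distance of each threshold from its median, not its direction. We assume growth above median(A) − 2 × SD(A) and no growth below median(B) + 2 × SD(B), so that each call falls within the bulk of its own subset. The later subtraction of N16961 intensities from other strains is not reproduced, because the text does not specify how those differences map onto the thresholds. All function names are hypothetical.

```python
import statistics

GROWTH, NO_GROWTH, UNKNOWN = "growth", "no growth", "unknown"


def background_corrected_max(well: list[float], negative_control: list[float]) -> float:
    """Maximum well intensity over the time course minus the negative-control maximum."""
    return max(well) - max(negative_control)


def reference_thresholds(ref_signals: dict[str, float], split: float = 175.0) -> tuple[float, float]:
    """Derive (growth, no-growth) thresholds from the reference strain's substrates.

    ASSUMPTION: growth > median(A) - 2*SD(A); no growth < median(B) + 2*SD(B).
    Substrates exactly at the split value fall in neither subset.
    """
    subset_a = [v for v in ref_signals.values() if v > split]
    subset_b = [v for v in ref_signals.values() if v < split]
    growth_cut = statistics.median(subset_a) - 2 * statistics.stdev(subset_a)
    no_growth_cut = statistics.median(subset_b) + 2 * statistics.stdev(subset_b)
    return growth_cut, no_growth_cut


def call_well(signal: float, growth_cut: float, no_growth_cut: float) -> str:
    """Score a single replicate as growth, no growth, or unknown."""
    if signal > growth_cut:
        return GROWTH
    if signal < no_growth_cut:
        return NO_GROWTH
    return UNKNOWN


def consensus(replicate_calls: list[str]) -> str:
    """Keep a call only if all replicates agree; otherwise report unknown."""
    first = replicate_calls[0]
    return first if all(c == first for c in replicate_calls) else UNKNOWN
```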
Statistical analysis.
All raw data management, scoring of data, and regression analysis for the CGH and phenotype microarrays were performed with MATLAB (MathWorks). Overrepresentation of functional role categories in the dispensable genome was verified with LACK 4.2 lexical analysis software (21). Nonmetric multidimensional scaling of microarray profiles was completed with Primer v.5 (PRIMER-E Ltd., Plymouth, United Kingdom). Canonical correspondence analysis was performed with PC ORD v4.0 (MjM Software). The unweighted-pair group method with arithmetic averages (UPGMA) tree of the binary CGH data was constructed using PAUP* 4.0b8 software with Jukes-Cantor distances and 1,000 bootstrap replicates. Logistic regression was performed with StatView 5.0.1 (SAS Institute, Inc.).
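The core of the tree construction can be approximated in a few lines for readers without PAUP*. This sketch is not a reimplementation of PAUP* and omits bootstrapping. It assumes the two-state form of the Jukes-Cantor correction, d = −½ ln(1 − 2p), where p is the fraction of probed genes at which two presence/absence profiles differ. Strain labels in the test are hypothetical.

```python
import math
from itertools import combinations


def jc_binary_distance(x: str, y: str) -> float:
    """Two-state Jukes-Cantor distance between two equal-length 0/1 profiles."""
    p = sum(a != b for a, b in zip(x, y)) / len(x)
    if p >= 0.5:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.5 * math.log(1 - 2 * p)


def upgma(profiles: dict[str, str]) -> str:
    """Build a UPGMA tree and return its topology in Newick format (no branch lengths)."""
    sizes = {name: 1 for name in profiles}  # cluster label -> number of strains
    dist = {
        frozenset(pair): jc_binary_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)  # ties broken by insertion order
        a, b = sorted(closest)
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # New distances are size-weighted averages of the two merged clusters.
        for other in sizes:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (n_a * d_a + n_b * d_b) / (n_a + n_b)
        del dist[closest]
        sizes[merged] = n_a + n_b
    return next(iter(sizes)) + ";"
```

Averaging distances weighted by cluster size is what distinguishes UPGMA from its unweighted counterpart (WPGMA).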
Forty-one environmental and four clinical strains (see Table S2 in the supplemental material) were analyzed by CGH using amplicon microarrays, with the V. cholerae O1 El Tor strain N16961 as the reference. The hybridization data and details of the CGH methods, including descriptions of the probes printed on the microarray and explanations of the gene calls, are described in the report by Miller et al. (27). Briefly, the log2 ratio of the hybridization intensity of the query genomic DNA (gDNA) to that of the reference gDNA (strain N16961) was used to divide genes into three categories based on gene presence: positive, negative, and uncertain. Comparing the gene calls for each probed gene across the entire collection of isolates allowed genes to be sorted into biologically meaningful groups. Following the convention of Lan and Reeves, genes that were positive in 95% or more of independent strains were defined as the core genome for Vibrio cholerae, and genes that were negative in more than 5% of independent strains were defined as the dispensable genome (22). The core and dispensable gene sets contained 83.0% and 13.3%, respectively, of the 3,357 probed genes. The remaining 3.6% of probed genes could not be determined to be positive or negative ("called") in at least 70% of the strains and are hereafter referred to as the uncalled gene set. The uncalled gene set is identical to the one described by Miller et al. (27), but the core and dispensable gene sets differ slightly from that study's "conserved," "absent," and "variable" gene sets. The core gene set comprises the entire "conserved" gene set plus the "variable" genes called present in ≥95% and <100% of the strains. Similarly, the dispensable gene set includes all "absent" genes and the "variable" genes called present in <95% of strains. The 95% cutoff was chosen to allow comparison with other studies. It should also minimize the miscategorization of core genes due to stochastic errors, resulting in a more biologically meaningful core gene set (6).
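The gene-set rules can be expressed as a small classifier. This sketch is an interpretation, not the authors' code. A gene is treated as uncalled when fewer than 70% of strains give a definite call. The 95% presence threshold is then applied over the strains that were called. Integer percentages avoid floating-point edge cases at the cutoff. With 45 fully called strains, the rule requires presence in at least 43, the same threshold used for the core metabolome.

```python
from collections import Counter


def classify_gene(calls: list[str], core_pct: int = 95, min_called_pct: int = 70) -> str:
    """Assign one probed gene to the core, dispensable, or uncalled gene set.

    `calls` holds one CGH call per independent strain: "positive",
    "negative", or "uncertain". ASSUMPTION: the presence frequency is
    computed over strains with a definite (positive or negative) call.
    """
    counts = Counter(calls)
    called = counts["positive"] + counts["negative"]
    if 100 * called < min_called_pct * len(calls):
        return "uncalled"
    if 100 * counts["positive"] >= core_pct * called:
        return "core"
    return "dispensable"
```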
We used regression analysis to estimate whether our sample size was adequate for predicting core genome size (Fig. 2). Six strains that shared a clonal genotype with another isolate from the same sampling event were removed from the data set. We then generated 10,000 random permutations of strain order for the remaining 39 strains. A power law was fitted to a plot of the reduction in core genome size with each additional strain sampled (R-squared value, 0.9998). The predicted core genome size for an infinite number of sampled genomes is 2,741 genes, indicating that we overestimated the core genome size by approximately 1.6%. Improving the accuracy of the core genome size estimate by an additional 27 genes (1%) would require the hybridization of 86 additional independent genomes. This regression analysis validates the use of this strain set in approximating the V. cholerae core genome. However, there are two notable sources of error in this analysis. First, 530 predicted protein-encoding genes were not included on the amplicon microarrays. A substantial portion of these unprobed genes fall within the integron, so core genes are likely underrepresented in the unprobed gene set. Nevertheless, excluding the unprobed genes necessarily reduces the estimated core genome size. Second, a small percentage of core genes is likely excluded because they are absent from N16961 but present in most other V. cholerae strains.
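Consider the model y = a·x^b + c, with b < 0 so that c is the asymptotic core size. Once b is fixed, the model is linear in a and c. One simple way to fit it is therefore to scan b over a grid and solve ordinary least squares for a and c at each step. The sketch below uses this profile least-squares approach; it is not the MATLAB routine the authors used. The grid bounds and spacing are arbitrary choices.

```python
def _ols(u: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares slope a, intercept c, and residual sum of squares for y = a*u + c."""
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    c = mean_y - a * mean_u
    sse = sum((yi - (a * ui + c)) ** 2 for ui, yi in zip(u, y))
    return a, c, sse


def fit_power_law(x: list[float], y: list[float], b_grid: list[float]) -> tuple[float, float, float]:
    """Fit y = a * x**b + c by scanning b and solving for a and c by least squares."""
    best = None
    for b in b_grid:
        a, c, sse = _ols([xi ** b for xi in x], y)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best[0], best[1], best[2]
```

As a consistency check on the reported coefficients, 906.1 × 39^(−0.8215) + 2,741 ≈ 2,786. This agrees with 83.0% of the 3,357 probed genes, and dividing it by the asymptote of 2,741 gives the ~1.6% overestimate quoted above.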
FIG. 2. Estimation of Vibrio cholerae core genome size by regression analysis. Open circles with 95% confidence limits represent the mean number of core genes with increasing numbers of genomes sampled for 10,000 random permutations of sampling order. A power law regression fit [$y = a x^{b} + c$] with an R-squared value of 0.9998 is included. Regression coefficients with 95% confidence limits (CL) are as follows: a, 906.1 (CL, 894.1, 918.0); b, −0.8215 (CL, −0.8348, −0.8083); and c, 2,741 (CL, 2,739, 2,744). The horizontal dashed line represents the extrapolated core genome size for Vibrio cholerae, which is equal to 2,741 genes for a threshold of genes shared among 95% of sampled genomes. (Inset) Closed squares show the reduction in projected core genome size with increased stringency for gene ubiquity from 95% to 100% of strains.
We mapped the core gene set onto the pathway-genome database VchoCyc, which was designed to predict 171 likely metabolic pathways in Vibrio cholerae (35). Nearly all predicted metabolic pathways are fully represented within the core genome, including glycolysis, the tricarboxylic acid cycle, and pentose phosphate pathways. Biosynthesis of metabolic intermediates, amino acids, cofactors, nucleotides, and cell building blocks is represented, except for ectoine, cysteine, and derivatives of mannose. Pathways involved in metabolism of carbon and nutrient sources are also represented, except for sialic acid assimilation, removal of superoxides, galactose degradation, glycogen degradation, and citrate fermentation. Because all strains of V. cholerae inhabit the aquatic environment, while only a subset of strains reside for any time in other environments, like the intestine, pathways encoded in the core genome should primarily provide for persistence in the aquatic environment.
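The pathway-mapping step can be made concrete with a minimal sketch. It checks whether every gene assigned to a pathway falls in the core set and reports the missing genes otherwise. The pathway-to-gene mapping and gene identifiers in the test are hypothetical placeholders, not VchoCyc data.

```python
def pathway_coverage(
    pathways: dict[str, set[str]], core: set[str]
) -> tuple[list[str], dict[str, set[str]]]:
    """Split pathways into those fully encoded by core genes and those missing some genes."""
    complete, incomplete = [], {}
    for name, genes in pathways.items():
        missing = genes - core
        if missing:
            incomplete[name] = missing  # genes outside the core (dispensable, uncalled, or unprobed)
        else:
            complete.append(name)
    return complete, incomplete
```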
Functional characteristics of the dispensable genome.
Gene ontology (GO) terms (2) were collected for all genes probed on the microarray, yielding 5,957 and 637 functional annotations for the entire and dispensable genomes, respectively. Lexical analysis was used to identify GO terms that were overrepresented in the dispensable gene set relative to the entire genome. For each term, the cumulative binomial probability function gave the probability of observing at least that many hits in a random sample of the same size. Results were deemed significant at a P value of <0.05 (Table 1). Because our methods restrict the analysis to genes probed on the microarray, these results offer only a partial view of the functions encoded by the dispensable genome. Pathogenesis, transposition, and prophage functions and O-antigen biosynthesis genes are highly enriched in the dispensable genome, as has been shown elsewhere (11, 29). Primary virulence determinants contained on the CTX phage and Vibrio pathogenicity island 1 were missing from all the environmental isolates. Overrepresented functions also included chemosensing, cell surface modification, and lipopolysaccharide transport, all of which mediate interactions between the bacterium and the extracellular environment. Such functions are expected to vary between strains adapted to different niches. In other bacterial genomes, they have been found to be preferentially absent relative to other functional categories (29, 30). Transport functions for several metabolic substrates were also overrepresented in the dispensable genome, including citrate, mannose, fructose, and chitosan oligosaccharide (nonacetylated glucosamine dimer). This is consistent with the hypothesis that gene alteration is most likely to occur in the peripheral metabolic network. Modification or loss of core enzymes would affect the metabolism of many connected substrates that are funneled through the same enzyme (31, 37).
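The cumulative binomial test described above can be sketched with the standard library. Here p is the genome-wide frequency of a GO term among all 5,957 annotations, and n is the 637 annotations in the dispensable genome. The sketch mirrors our reading of the calculation rather than LACK's actual code. Exact integer binomial coefficients work at this scale. For much larger n, converting those integers to floats would overflow.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def go_term_enrichment(hits_in_subset: int, subset_size: int,
                       hits_in_genome: int, genome_size: int) -> float:
    """Probability of at least `hits_in_subset` annotations to a GO term in a random
    sample of `subset_size` annotations, given the term's genome-wide frequency."""
    return binomial_upper_tail(hits_in_subset, subset_size, hits_in_genome / genome_size)
```

Consider a hypothetical term with 50 annotations genome-wide. By chance, about 637 × 50/5,957 ≈ 5.3 hits would be expected in the dispensable set, and `go_term_enrichment(20, 637, 50, 5957)` gives the probability of seeing 20 or more.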
TABLE 1. Overrepresented GO functions in the dispensable genome^a
FIG. 3. Division of strains into clades based on CGH profile. The UPGMA tree was generated using Jukes-Cantor distances and 1,000 bootstrap replicates, which provide 100% support for the five genotype clusters. Clades A, B, C, and D and clinical strains are shown in cyan, green, yellow, red, and brown, respectively. Bootstrap scores greater than 50 are displayed above the respective nodes.
FIG. 4. Distribution of genotype groups illustrates relationship with changes in the environment. Clades A, B, C, and D are shown in cyan, green, yellow, and red, respectively. (A) Number of unique genotypes in each clade isolated, plotted for each month throughout the year. (B) Diversity of genotypes isolated from individual sampling sites over the entire sampling period listed, from north to south. Total numbers of strains sampled are listed below each pie graph. (Bottom) Canonical correspondence analysis ordination plots for (C) water temperature and (D) log ammonium. Each spot represents a genotype colored by clade, with the size of the spot proportional to the magnitude of the parameter when the strain was isolated. R values for axes 1 and 2 for water temperature are 0.742 and 0.294, respectively, confirming that clade C, followed by D, is most likely to be found in warmer water. R values for axes 1 and 2 for log ammonium are 0.470 and 0.779, respectively, confirming that clade D, followed by B, is the most likely to be found in nutrient-enriched waters.
Dispensable genes whose presence correlates with specific environmental conditions may give Vibrio cholerae a selective advantage for survival and growth under those conditions. Isolates that cluster together in the clades defined in Fig. 3 have similar dispensable genomes. However, membership in a specific clade is determined by only a subset of dispensable genes, so many of the remaining dispensable genes vary between strains within the same clade. Suppose that natural selection acts at the level of the gene, especially if gene loss and horizontal exchange are common. Then genes under selection at a subset of our sites or sampling events should correlate with environmental parameters that reflect the differences among samples. A two-tailed t test was used to identify specific dispensable genes whose presence was significantly correlated (P < 0.05) with environmental water parameters. Stepwise exclusion of water parameters from a logistic regression model was used to identify which combinations of parameters best predict the observed pattern of gene conservation (Table 2). Strains isolated from samples with colder water temperatures are more likely to contain genes with functions in iron transport, transport and metabolism of chitosan oligosaccharides, chemotaxis, and chitin degradation. In contrast, strains from warmer waters are more likely to possess three features: a gene cluster involved in exopolysaccharide production, another conferring protection against superoxide, and a fructose transporter. The gene cluster involved in fructose transport was shown to confer mannose metabolism because of an adjacent mannose-6-phosphate isomerase gene (27). Strains isolated from samples with lower inorganic nutrient concentrations are more likely to contain genes for ectoine biosynthesis. Strains from lower-salinity samples more often contain genes encoding citrate transport and fermentation functions.
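The per-gene comparison can be sketched as a two-sample t statistic between strains with and without a given gene. The text does not say whether a pooled (Student) or unequal-variance (Welch) test was used. This sketch assumes the pooled form and returns only t and its degrees of freedom. The two-tailed P value is left to a t distribution routine such as SciPy's `scipy.stats.t.sf`, as 2 × sf(|t|, df). The stepwise logistic regression is not reproduced, and the test data are invented.

```python
import statistics
from math import sqrt


def presence_t_statistic(has_gene: list[bool], parameter: list[float]) -> tuple[float, int]:
    """Pooled-variance two-sample t statistic comparing an environmental parameter
    between strains with and without a gene; returns (t, degrees of freedom).

    Each group needs at least two strains for its variance to be defined.
    """
    with_gene = [v for present, v in zip(has_gene, parameter) if present]
    without_gene = [v for present, v in zip(has_gene, parameter) if not present]
    n1, n2 = len(with_gene), len(without_gene)
    pooled_var = ((n1 - 1) * statistics.variance(with_gene)
                  + (n2 - 1) * statistics.variance(without_gene)) / (n1 + n2 - 2)
    mean_diff = statistics.mean(with_gene) - statistics.mean(without_gene)
    t = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```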
Interestingly, conservation of a gene cluster enabling metabolism of sialic acid (N-acetylneuraminate) does not significantly correlate with water temperature but displays a strong seasonal signal. Eleven of the 12 strains that possess these genes were isolated from March through June, whereas only 60% of all strains were isolated during this period.
TABLE 2. Statistically significant correlations between gene presence and measured environmental variables for selected clusters of genes^a
All 45 strains of Vibrio cholerae were tested for metabolic diversity using proprietary phenotype microarrays (Biolog, Hayward, CA). Each assay was scored as growth or no growth (see Tables S3 and S4 in the supplemental material). Of the 190 carbon sources assayed, 47 supported growth of at least one strain. At least 95% of strains (at least 43 of 45) grew on 26 of these 47 substrates. These 26 substrates define the V. cholerae core metabolome: N-acetylglucosamine, succinate, D-galactose, D-trehalose, glycerol, D-gluconate, L-lactate, D-mannitol, D,L-malate, D-fructose, α-D-glucose, maltose, D,L-α-glycerolphosphate, maltotriose, adenosine, fumarate, inosine, L-serine, L-malate, pyruvate, dextrin, L-asparagine, L-glutamate, α-ketobutyrate, D-glucosamine, and sucrose. The remaining 21 carbon sources were classified as the dispensable metabolome (Table 3). The metabolomes listed in Table 3 are undoubtedly incomplete because only a subset of possible metabolites was tested. In addition, strains were grown in liquid culture in microtiter plates. Metabolic pathways used under other growth conditions, such as surface-attached or eukaryote-associated growth, might therefore not have been detected.
TABLE 3. Dispensable carbon sources for Vibrio cholerae metabolism^d
The Vibrio cholerae core genome supports a generalist lifestyle.
Epidemic cholera has not been documented in central California for over 150 years. The last epidemic, in 1850, claimed approximately 1,000 lives in Sacramento and 250 in San Francisco (15). V. cholerae strains isolated from coastal waters here have therefore likely not been in the human intestine for tens or hundreds of thousands of bacterial generations, if ever. These environmental isolates may even have diverged from ancestral pathogenic strains before the acquisition of the cholera toxin genes. If so, they may never have contained genes under selection for causing cholera. If intestine-specific genes were ever present in these isolates, gene loss and genome degradation would over time have periodically removed genes that were not selected for given the bacterium's lifestyle and ecology (28). If genes expressed in the animal intestine provide no selective advantage for life in coastal ecosystems, these genes might be expected to be preferentially absent from the environmental isolates. To test this, we compared our data with two sets of expression data: rice-water stool samples from three cholera patients in Bangladesh (26) and fluid from the ileal loops of infected rabbits (41). All strains of V. cholerae used in the expression studies were O1 El Tor biotype strains genetically similar to N16961. Genes upregulated in each of the rice-water stool samples relative to mid-log-phase growth in LB culture were mapped to the gene sets defined by CGH (Table 4). Remarkably, between 71 and 88% of the genes that were upregulated in stool samples from cholera patients belonged to the core gene set, and less than 7% were absent from all of the environmental isolates. For in vivo expression data from the rabbit ileal loop, approximately 90% of the upregulated genes mapped to the core gene set. This held with either genomic DNA or mid-log-phase growth in LB culture as the reference.
Similar results were observed with additional ileal loop expression data using the O1 El Tor strain 92A1552 (N. Dolganov, personal communication). We compared these data with the V. cholerae core and dispensable gene sets using a cumulative binomial distribution function. Genes upregulated in rabbit ileal loop samples were less likely than the rest of the genome to be absent from environmental isolates (P < 0.01). The same trend holds for genes upregulated in stool from two or more of the cholera patients, but it is not statistically significant (P = 0.32). These data indicate that some of the genes in the core genome that are induced during in vivo growth are also found in the genomes of V. cholerae from California coastal waters. Such genes likely have a role during growth in vivo and are also required for survival of V. cholerae within one or more aquatic niches. Ultimately, the V. cholerae core genome contains genes useful for inhabiting multiple diverse habitats, including the animal intestine and coastal marine environments.
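One plausible reading of this comparison is a lower-tail binomial test, sketched below. Suppose a fraction p of probed genes is dispensable. How likely is it that a random set of n genes contains k or fewer dispensable ones? The first helper tallies upregulated genes into CGH gene sets, in the spirit of Table 4. Helper names and gene identifiers are hypothetical.

```python
from collections import Counter
from math import comb


def tally_gene_sets(upregulated: set[str], gene_set_of: dict[str, str]) -> Counter:
    """Count upregulated genes in each CGH-defined set; genes not probed are skipped."""
    return Counter(gene_set_of[g] for g in upregulated if g in gene_set_of)


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the chance that a random set of n genes
    contains k or fewer dispensable genes when a fraction p of genes is dispensable."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
```

Using the dispensable fraction reported above (13.3% of probed genes), the test for n upregulated genes of which k are dispensable would be `binomial_lower_tail(k, n, 0.133)`.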
TABLE 4. Distribution of upregulated genes in rice water stool and ileal loop samples in gene sets defined by comparative genome hybridization
The coassociation of genes involved in mannose metabolism, superoxide protection, and exopolysaccharide production with warmer water temperature suggests that these genes may be selected for in environments with increased oxidative stress. Mannose was found to be a component of the exopolysaccharide produced by the V. cholerae O1 rugose variant, which provides increased resistance to osmotic and oxidative stress relative to that of smooth phenotype strains (39). During the summer months when water temperatures are higher, more solar radiation is absorbed by dissolved organic matter in coastal waters, leading to an increase in the production of reactive oxygen species (24). Strains containing genes for the production of protective exopolysaccharides and superoxide dismutase may be selectively isolated from warmer water samples due to their increased resistance to oxidative stress. Similarly, the coassociation of colder water temperature with genes encoding chitin degradation, chemotaxis, and chitosan oligosaccharide transport and metabolism functions suggests that these genes may all be under selection for chitin utilization. It is not surprising that chitin utilization is important in the coastal aquatic environment, and both particulate and soluble forms of chitin were shown to enhance the survival of V. cholerae at low temperature (1). Further exploration is required to assess whether the selection of chitin utilization genes in cold water also reflects the seasonal availability of chitin or other carbon sources at our sites.
From the dispensable metabolome, we identified six phenotypes with significant (P < 0.05) correlations with environmental parameters, including water temperature, salinity, turbidity, ammonium concentration, and season. The ability of strains to grow on D-glucuronate was positively correlated with turbidity, while increased salinity was correlated with growth on p-hydroxyphenylacetate and tyramine. The latter two substrates are products of the anaerobic deamination and decarboxylation of the amino acid tyrosine, respectively. In the environment, these transformations could occur in animal waste or anaerobic sediment in water bodies receiving high nutrient inputs (9, 25). Six of the seven strains that grow on p-hydroxyphenylacetate and tyramine were isolated from the Old Salinas River and Moss Landing Harbor sites. Agriculture makes up 68% of watershed land use at these sites, compared with 16% for the other sites where V. cholerae strains were isolated (16). These sites receive runoff from agricultural fields and grazing areas where water can mix with raw or composted manure (L. Crawford-Miksza [California Department of Health Services, Food and Drug Laboratory Branch], personal communication). The ability of some strains to grow on p-hydroxyphenylacetate and tyramine suggests the presence of agricultural runoff. Such runoff may stimulate anaerobic activity in the surrounding sediments or provide a rich supply of amines and volatile fatty acids. The coassociation of growth on these substrates with increased salinity reflects the higher salinities at sites in agricultural areas relative to other sampling sites.
An obvious drawback to CGH is the restriction of data sets to the sequences printed on the microarrays. We cannot use CGH to explore expanded genome content or the phenotypes that those genes might encode. In the present study this means we are unable to assess the genomic basis for the ability of certain strains to metabolize 43% of the carbon substrates in the dispensable metabolome (D-glucuronate, L-threonine, p-hydroxyphenylacetate, tyramine, alpha-cyclodextrin, gelatin, laminarin, and N-acetyl-D-galactosamine). Despite the limited scope of the CGH sequence coverage, the trends observed in functional bias for the dispensable genome are corroborated by the expanded genome content in other environmental V. cholerae strains uncovered by subtractive hybridization. Unique sequence fragments found in two environmental V. cholerae strains from southern California were enriched in mobile elements and genes involved in cell surface modification, bioluminescence, transport, carbohydrate metabolism, virulence, stress resistance, and signal transduction (32).
Despite the high phenotypic diversity among environmental strains of Vibrio cholerae in central California, little is known about how various phenotypic traits might affect the fitness of these strains in coastal waters. More work is required to analyze the relative importance of individual carbon and nutrient sources to overall metabolic requirements in coastal aquatic systems. In studying a Vibrio splendidus population in Plum Island Sound, Thompson et al. uncovered extremely high genotypic diversity that appears to be neutral, revealing no population structure in time or space (38). While statistically significant correlations suggest that our genotype clusters align themselves with different environmental conditions, further characterization of the sampling sites and the relative fitness of genotypes is needed to assess the importance of the observed genomic diversity. Finally, this analysis only uses isolates we were able to cultivate on selective media, so molecular probing of environmental samples will be needed to understand how these patterns apply to the nonculturable population of Vibrio cholerae strains.
This work was funded by the Woods Institute for the Environment (to D.P.K., A.B.B., and G.K.S.), by the NIH (to G.K.S.), and by the Giannini Family Foundation (to M.C.M.).
Published ahead of print on 20 April 2007.
Supplemental material for this article may be found at http://aem.asm.org/.