Applied and Environmental Microbiology, April 2004, p. 2464-2473, Vol. 70, No. 4
0099-2240/04/$08.00+0 DOI: 10.1128/AEM.70.4.2464-2473.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Yniv Palti, Riva Gur-Arie, Helit Cohen, Eric M. Hallerman, and Yechezkel Kashi*
Department of Food Engineering and Biotechnology, Technion-Israel Institute of Technology, Haifa 32000, Israel
Received 10 September 2003/ Accepted 13 January 2004
Subdivision of bacterial strains is based on various O:H serotypes, discrete virulence and adherence properties, and distinct clinical phenotypes. Electrophoretic allozyme typing has been used to assess genetic relationships among E. coli strains in several studies (31, 32, 35, 41, 49, 50). In these studies, it was demonstrated that O:H serotyping is not sufficient for defining phylogenetic relatedness among E. coli strains. Analysis of the DNA sequences of housekeeping genes has also been used to study phylogenetic relationships among E. coli strains (21, 26, 35). Reid et al. (36) used sequence data for seven housekeeping genes to elucidate the evolution of pathogenic mechanisms by inferring phylogenetic relationships among 14 EHEC, EPEC, and K-12 strains.
Simple sequence repeats (SSRs, or microsatellites) are a class of DNA sequences consisting of simple motifs that are tandemly repeated at a locus (47). SSRs have long been known to be distributed throughout the genomes of eukaryotes, highly polymorphic (43, 48), and useful as tools for phylogenetic inference (5). The variability observed in SSRs is thought to be caused by slipped-strand mispairing. The abnormal tertiary structure of repetitive DNA allows mismatching of neighboring sequences, and repeats can be inserted or deleted during DNA duplication (reference 46 and references therein). Screening of prokaryotic genomes for SSRs has revealed large numbers of SSR tracts (7-9, 45, 46). Publication of the complete genome sequence for E. coli (2) provided the basis for characterizing SSR tracts in this organism, both genomewide and at particular loci (8, 9, 24). PCR-based assays have been developed in our laboratory for screening SSR polymorphism in different E. coli strains. Mononucleotide SSRs (mononucleotide repeats [MNRs]), consisting of at least five repeats, were found to be abundant and polymorphic in noncoding regions of the E. coli genome (9, 24).
Inference of evolutionary relationships within E. coli is not trivial due to horizontal transfer of genomic sequences between strains and species, which leads to fragmentation of the "clonal frame" (25). The inferred evolutionary history of a particular lineage may differ among different parts of its genome. Thus, combining data from two loci may obscure the reconstruction of either history (21). Statistical methods for identifying regions of recombination and for assessing its impact on phylogenetic reconstruction rely on large numbers of polymorphic genetic markers distributed throughout the genome. Therefore, screening of multiple polymorphic MNR loci is appropriate for supporting inferences about evolutionary relationships within E. coli and as a model system for phylogenetic studies in bacteria.
Metzgar et al. (24) found poor consistency between phylogenetic trees constructed by amplification fragment size analysis of SSRs and the standard E. coli reference (ECOR) tree of Herzer et al. (12). They concluded that individual SSRs mutate too frequently to retain meaningful phylogenetic information at the evolutionary scale represented by the standard ECOR tree. However, we hypothesize that by combining sequence data from as many loci as possible, MNRs may be used to construct phylogenetic trees which are consistent with those reported in previous studies that used other genetic markers. Additionally, we hypothesize that sequence variation at the flanking regions of the MNRs contains important information that can aid in the reconstruction of evolutionary relationships. Hence, in this study, we addressed two questions: Are randomly selected noncoding loci that contain MNRs more polymorphic at the sequence level than noncoding loci that do not contain SSRs in E. coli? How useful are those MNRs and their flanking sequences for inferring evolutionary relationships in E. coli when analyzed for sequence polymorphism?
Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 EHEC, EPEC, ETEC, B, and K-12 strains to address the first question. The MNR loci were also examined in the 72 strains of the ECOR collection (31) to enable comparison of the inferred evolutionary relationships with those of previous studies (12, 24, 35, 36, 50). The underlying goal of this study was to test the utility of MNRs for phylogenetic studies and strain identification in E. coli as a model system for prokaryotes.
TABLE 1. Details of E. coli strains analyzed in this study
TABLE 2. PCR primers used for amplification of given loci, based on genomic sequence in E. coli K-12
FIG. 1. Genomic locations of the noncoding loci sequenced. The locations are based on the sequenced E. coli K-12 genome (2) as follows: pepD, kb 254; ykgE, kb 321; yaiN, kb 379; ycgW, kb 1211; osmB, kb 1341; galS, kb 2239; and b2345, kb 2461.
DNA sequencing.
PCR products were purified (QIAquick; Qiagen) and sequenced on both strands on an ABI 310 automated DNA sequencer with the BigDye terminator cycle sequencing kit (Applied Biosystems, Inc.), following established procedures (3). Only sequences with complete agreement between the two strands were used for further analysis. Multiple alignment of the sequences was performed with the Sequence Navigator program (version 1.0.1; Applied Biosystems, Inc.).
Sequence polymorphism: comparison between MNR-containing loci and non-SSR loci.
Two methods were used to assess the level of sequence variation at each locus. In the first, the percentage of polymorphic positions in a locus (single nucleotide polymorphisms [SNPs]) divided by the number of strains that were amplified in that locus was determined: polymorphism = [(SNP/SEQ) × 100]/no. of strains, where polymorphism is the fraction of polymorphic positions, SNP is the number of SNPs, SEQ is the length of the sequence in base pairs, and the number of strains is the number of strains that were amplified by PCR. The second method was the same except that the core repeat motifs within the MNRs were not included in the calculation of polymorphism. This was done to quantify the level of polymorphism in the sequences flanking MNR loci.
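The calculation above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' own code: the function names, the `skip` argument, and the toy alignment are assumptions introduced here. The `skip` argument stands in for the second method by excluding chosen columns (for example, the core repeat motif) from the SNP count; whether SEQ should also be shortened in that case is not stated in the text, so the sequence length is left to the caller.

```python
def count_polymorphic_columns(alignment, skip=frozenset()):
    """Count alignment columns that show more than one distinct character.

    `skip` holds 0-based column indices to ignore, e.g. the core repeat
    motif of an MNR (an assumption about how the second method is applied).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for col in range(length)
        if col not in skip and len({seq[col] for seq in alignment}) > 1
    )


def polymorphism(n_snps, seq_len_bp, n_strains):
    """polymorphism = [(SNP/SEQ) x 100] / no. of strains."""
    if seq_len_bp <= 0 or n_strains <= 0:
        raise ValueError("sequence length and strain count must be positive")
    return (n_snps / seq_len_bp) * 100 / n_strains


# Hypothetical toy alignment of one locus amplified in three strains.
toy = ["ACGGGGGT", "ACGGGGAT", "TCGGGGGT"]
snps = count_polymorphic_columns(toy)
print(polymorphism(snps, len(toy[0]), len(toy)))
```

Dividing by the number of amplified strains keeps loci that amplified in different numbers of strains on a comparable scale.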
The number of microhaplotypes (set of specific mutations within limited chromosomal regions) observed at each locus was normalized as a fraction of the sequence length and number of strains that were amplified as in the first method. Two sequences were considered different genotypes if they contained at least one position at which they differed.
Student's t test for unequal variances (27) was conducted to compare the level of polymorphism between MNR and non-SSR loci as calculated by each of the two methods.
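For readers unfamiliar with the unequal-variance form of the t test (often called Welch's test), the statistic and its approximate degrees of freedom can be sketched as follows. The function name and the input values are hypothetical and are not data from this study; the sketch stops at the statistic and degrees of freedom, and the P value would then be read from a t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t test with unequal variances.

    df uses the Welch-Satterthwaite approximation.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical polymorphism values for MNR loci and non-SSR loci.
mnr_loci = [0.30, 0.22, 0.41, 0.27]
non_ssr_loci = [0.05, 0.09, 0.04]
t_stat, dof = welch_t(mnr_loci, non_ssr_loci)
print(t_stat, dof)
```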
The comparison of sequence polymorphism between MNR and non-SSR loci was conducted in 27 strains (not including ECOR strains [Table 1]).
Phylogenetic analysis.
The program Reticulate was used to identify putative recombination between loci through the construction of a compatibility matrix (13). The program START, version 1.0.4, written by Keith Jolley, University of Oxford, 2000 (Index of Association implementation modified from code provided by John Maynard-Smith, United Kingdom), was used to calculate the index of association (IA) value for the four MNR loci. An IA value that is not significantly different from 0 indicates that the loci may be incompatible (42). Loci that show incompatibility should not be combined for phylogenetic analysis, since their incompatibility may be the result of horizontal gene transfer (21).
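The idea behind a pairwise compatibility matrix can be illustrated with a short sketch. It assumes biallelic sites and uses the four-gamete criterion: two sites are scored as compatible when fewer than four distinct allele combinations are observed across strains. This is a simplification chosen for illustration and is not claimed to reproduce the Reticulate implementation; the function names and the convention of skipping "?" (locus not amplified) are assumptions.

```python
from itertools import combinations


def sites_compatible(col_a, col_b):
    """Four-gamete test for two biallelic sites.

    Strains with "?" at either site are ignored. The sites are called
    compatible when fewer than four allele combinations are observed.
    """
    observed = {(a, b) for a, b in zip(col_a, col_b) if "?" not in (a, b)}
    return len(observed) < 4


def compatibility_matrix(alignment):
    """Return a symmetric matrix of booleans, one row per alignment column."""
    columns = list(zip(*alignment))
    n_sites = len(columns)
    matrix = [[True] * n_sites for _ in range(n_sites)]
    for i, j in combinations(range(n_sites), 2):
        result = sites_compatible(columns[i], columns[j])
        matrix[i][j] = matrix[j][i] = result
    return matrix


# Hypothetical two-site alignment in which all four combinations occur,
# so the two sites are scored as incompatible.
print(compatibility_matrix(["AA", "AT", "TA", "TT"]))
```

Incompatible pairs are the ones that cannot both be explained by a single mutation each on one tree, which is why an excess of them points to recombination or repeated mutation.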
Phylogenetic trees were inferred for each of the four MNR loci and for the combined sequence of all four loci. Trees were constructed by the unweighted pair-group method with arithmetic means (UPGMA) and neighbor-joining (NJ) algorithms (MEGA software) (29) and by the method of maximum parsimony (MP) (6). Bootstrap confidence values for the UPGMA and NJ algorithms were based on 1,000 simulated trees.
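As a rough illustration of the UPGMA step only, the following sketch clusters aligned sequences by the proportion of differing positions (p-distance). It is not the MEGA implementation and omits branch lengths and bootstrapping; the function names, the choice of p-distance, and the toy sequences are assumptions made for this example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def upgma(names, seqs):
    """Build a tree topology by UPGMA; returns nested tuples of names."""
    # cluster id -> (subtree, number of strains in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters (first minimum wins on ties).
        i, j = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: dist[frozenset(pair)],
        )
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average so that every strain counts equally.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (
                (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
            )
        del dist[frozenset((i, j))]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical strains: s1 and s2 differ at one site, s3 is distant.
print(upgma(["s1", "s2", "s3"], ["AAAA", "AAAT", "TTTT"]))
```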
TABLE 3. PCR amplification data for four MNR loci in the 72 ECOR strains
In order to simplify the sequence analysis, all the polymorphic sites were joined to create an artificial core sequence alignment. For example, the core sequence of yaiN consists of the 13 positions at which point mutations distinguish strains from the 198-bp consensus sequence. The alignment results for four MNR loci in 41 strains are presented in Fig. 2.
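Reducing a full alignment to its polymorphic columns can be sketched as below. The function name and the toy sequences are hypothetical; following the symbols in the Fig. 2 legend, "?" is treated here as missing data (locus not amplified) and does not by itself make a column polymorphic, which is an assumption of this sketch.

```python
def core_alignment(alignment):
    """Keep only the columns that vary among strains.

    Returns (positions, reduced): 1-based column numbers in the full
    alignment, and one reduced string per strain.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    positions = [
        col + 1
        for col in range(length)
        # "?" marks missing data and is not counted as a variant.
        if len({seq[col] for seq in alignment} - {"?"}) > 1
    ]
    reduced = ["".join(seq[p - 1] for p in positions) for seq in alignment]
    return positions, reduced


# Hypothetical alignment: a substitution at column 1, a deletion at column 5.
positions, reduced = core_alignment(["ACGT-A", "ACGTTA", "TCGTTA"])
print(positions, reduced)
```

Keeping the original column numbers alongside the reduced strings preserves the link back to the position at each locus.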
FIG. 2. Core artificial sequence alignment. The polymorphic positions (SNPs, insertions, and deletions) at four MNR loci were joined to create a reduced sequence alignment (68 polymorphic points along 534 bp) for 41 E. coli strains. Numbers above each column indicate the position at the specific locus. ?, entire locus was not amplified in this strain; , the specific position was absent (i.e., deleted).
TABLE 4. Levels of polymorphism at MNR loci compared to non-SSR type loci among 27 E. coli strains
FIG. 3. (a) Compatibility matrix of polymorphic positions. The matrix was constructed based on 68 polymorphic positions at four MNR loci of 41 E. coli strains with the Reticulate software. Each square describes pairwise compatibility relationships between the 68 sites. Black squares, incompatibility; white squares, compatibility between two polymorphic sites at two different loci (between loci); colored squares, compatibility between two polymorphic sites at the same locus (within locus). (b) Compatibility at four MNR loci. Levels of between-loci compatibility are plotted against within-locus compatibility.
A phylogenetic tree was constructed based on each of the loci separately (supplemental material may be found at http://www.technion.ac.il/technion/food/) and on the multilocus sequence of the four MNR loci (Fig. 4).
FIG. 4. Phylogenetic tree of 41 E. coli strains. The phylogeny was based on the polymorphic sites at the four MNR loci (multilocus analysis). The tree was constructed with the UPGMA method. The numbers at the nodes are bootstrap confidence values based on 1,000 replicates. The ECOR A, B1, and D groups are indicated.
Our goal in this phylogenetic analysis was to utilize polymorphic DNA sequences by combining four MNR loci for multilocus sequence typing (MLST) analysis. We found that sequence analysis of MNR loci produced phylogenetic trees that are in good agreement with those constructed by use of other genetic markers.
A panel of the ECOR strains that were amplifiable at most of the MNR loci and that represented the five phylogenetic groups of Herzer et al. (12) was chosen for the phylogenetic study. Based on the rooted phylogenetic analysis of Lecointre et al. (21), B2 is the most ancient E. coli group, followed by D and then the sister groups A and B1. Since ycgW, b2345, and ykgE were not amplifiable in the B2 group (Table 3), they likely were transferred into the genome of the common ancestor of the D group (ycgW and b2345) or during the early evolution of D group before the segregation of the A and B1 clades (ykgE). These PCR amplification results (Table 3) can support the hypothesis that yaiN is the oldest locus of the four, followed by ycgW and b2345 and then ykgE.
The divergence of bacterial strains in nature is accelerated by the high rate of recombination, which may indicate horizontal gene transfer, resulting in fragmentation of the clonal frame of each strain (10, 25). Loci identified as recombination hot spots should be removed from the multilocus analysis because they may represent different evolutionary histories (21). Several approaches for inferring recombination and horizontal gene transfer in bacteria from DNA sequence data have been described (4, 13, 19, 20, 33, 42). Due to the availability of appropriate software and their wide use in other studies, we used the compatibility matrix approach (13) and calculated an association index for the four loci (42). Both approaches aim to examine the sequence compatibility between loci. A low level of compatibility indicates that the loci have experienced several changes due to recombination or repeated mutations at specific sites. The analysis of our sequence data by the two methods suggested that the four loci are compatible. The order of sequence compatibility plotted in Fig. 3b is in agreement with the hypothesis that we drew from the PCR amplification analysis: the longest-evolving locus, yaiN, had the lowest within-locus and between-loci compatibility levels, followed by b2345 and ycgW and then ykgE, which had the highest between-loci compatibility level. However, it is important to note that the source of yaiN's low between-loci compatibility is its hypervariable poly(G) MNR. Both ycgW and b2345 had higher within-locus compatibility values. This high level of within-locus conservation may be related to their functions, which may be subject to selection pressures.
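The association index mentioned above can be illustrated with the classical formulation IA = VO/VE - 1, where VO is the observed variance of K (the number of loci at which two strains differ) over all strain pairs and VE is the sum over loci of h(1 - h), with h the probability that two strains differ at that locus. The sketch below is an assumption-laden illustration rather than the START implementation: the function name, the use of the population variance, and the estimate of h from pairwise comparisons are choices made here, and the profiles are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def index_of_association(profiles):
    """IA = VO/VE - 1 for a list of per-strain allele profiles.

    Each profile is a tuple with one allele label per locus.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("at least two strains are required")
    n_loci = len(profiles[0])
    # K: number of loci at which the two strains of a pair differ.
    k_values = [sum(a != b for a, b in zip(p, q)) for p, q in pairs]
    v_obs = pvariance(k_values)
    # h_j: proportion of strain pairs that differ at locus j.
    h = [
        sum(p[j] != q[j] for p, q in pairs) / len(pairs)
        for j in range(n_loci)
    ]
    v_exp = sum(h_j * (1 - h_j) for h_j in h)
    if v_exp == 0:
        raise ValueError("no locus is polymorphic")
    return v_obs / v_exp - 1


# Hypothetical profiles: alleles at two loci are perfectly associated.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
# Hypothetical profiles: every allele combination occurs once.
shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(index_of_association(linked), index_of_association(shuffled))
```

In this toy case the associated profiles give a higher value than the profiles in which all combinations occur, which is the direction of the contrast the index is meant to capture.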
A multilocus tree was constructed from the combined sequence of the four loci (Fig. 4), and for each locus separately (http://www.technion.ac.il/technion/food/). We found very good agreement among the three methods of phylogenetic analysis that we used (NJ, UPGMA, and MP). Groups A and D clustered as expected, and group B1 strains branched separately from A and D but did not cluster with each other. Group B2 is not present because it did not amplify at three of the four loci.
The six O157:H7 outbreak serotypes that we examined had completely identical sequences at the loci examined. O157:H7 clustered tightly with O55:H7 and the ungrouped EC37, which is consistent with findings in previous studies (35, 50). O55:H7 and O157:H7 evolved recently from a common ancestor and are more likely to be distinguished from each other by the presence or absence of the PCR amplicon due to the high rate of insertion and deletion events in the O157 genome (17, 30). EC42 was closely related to this cluster in the MP and NJ trees but not in the UPGMA tree (Fig. 4). It was found by Pupo et al. (35) to cluster with O157:H7 and EC37.
The K-12 strains, O111ac:H (EPEC), and O78:H (ETEC) clustered with group A strains in the multilocus analysis and in each of the trees constructed for the individual loci. The clustering of these strains with group A is also consistent with the results of previous studies (11, 35). The clustering of O86:H18 (ETEC) with group D was a consensus in the three multilocus trees and was also evident in all the single-locus trees except that for yaiN.
An illustration of the increased resolving power of the multilocus analysis was found in the clustering of EC08 with O78:H (ETEC) and of O157:H (EHEC) with O153:H (ETEC). The two clusters were not distinct in any of the single-locus trees but were distinct in each of the three multilocus trees. Further support for the advantage of multilocus over single-locus phylogenetic analysis was recently reported by Rokas et al. (38).
Most of the EPEC and EHEC strains did not cluster with either group A or D. With the addition of loci to the analysis and increase in resolution, they may cluster with B1 strains or with strains from the rapidly evolving ungrouped (E) category. However, since they were amplified at all loci, it is unlikely that they are closely related to the B2 group.
MLST of a number of fragments from housekeeping genes is widely used for evolutionary studies and has been put forward as a powerful tool for "global" epidemiology (23, 30, 36). DNA sequencing provides far more variation per locus than any other method currently used for bacterial strain typing, and it provides a uniform platform for comparison between laboratories and for database storage. Noncoding loci that contain mononucleotide SSRs were significantly more polymorphic at the sequence level than loci that did not contain SSRs (Table 4). Combining several polymorphic SSR loci enables the use of these sequences for SSR-based MLST. In this study, we demonstrated that MLST of MNRs from randomly chosen noncoding regions is as consistent and reliable as MLST of housekeeping genes. The advantage of MNRs is that they provide much higher variation per base.
The level of resolution that requires sequencing thousands of base pairs from housekeeping genes can be achieved by sequencing hundreds of base pairs from MNR loci. This should be most cost effective in clonal species such as E. coli and should make MLST a more rapid and affordable tool for epidemiology and clinical diagnostics.
Recent studies demonstrating that O157:H7 is rapidly evolving by unknown mechanisms of insertion and deletion of genomic fragments (17, 34) suggested a method for typing of O157 strains based on the absence or presence of specific PCR amplifications. In this study, we found that the amplification of randomly chosen sites of the E. coli genome can be useful for "local" typing of closely related strains (e.g., O55:H7 and O157:H7 with ycgW) as well as "global" typing for distinguishing between subspecies (of the four MNRs, only yaiN was amplified in B2 strains). This amplicon presence or absence approach was successful for strain typing in E. coli and Listeria spp. in our laboratory (L. Somer, E. Diamant, R. Gur-Arie, Y. Palti, Y. Danin-Poleg, and Y. Kashi, submitted for publication), and in combination with MNR, MLST can provide an efficient tool for epidemiology and for the development of rapid diagnostic kits for bacterial pathogens.
There is accumulating evidence that SSRs serve a functional role, affecting gene expression, and that polymorphism of SSR tracts may be important in the evolution of gene regulation (14, 16, 18, 28, 37, 39, 44-46). The markers used in phylogenetic studies should be as neutral as possible. Due to the potential involvement of SSRs in gene regulation, inference of SSR variation for evolutionary studies should be conducted with attention given to the ecological and epidemiological conditions. SSRs in genes that are known to contribute to ecological adaptation should be avoided in studies designed to infer evolutionary relationships. However, as demonstrated in this study, one way to overcome this problem is to conduct multilocus analysis, which dilutes the bias of individual loci.
In this study, we found that randomly selected noncoding loci that contain MNRs were significantly more polymorphic at the sequence level than noncoding loci that did not contain SSRs in E. coli. We also found that these polymorphic MNRs were useful for inferring phylogenetic relationships and reconstructed trees that were consistent with the standard multilocus enzyme electrophoresis trees (12, 35, 50). The usefulness of SSRs for evolution studies and strain typing in less clonal species such as Neisseria meningitidis (23) should be tested in similar future studies.
This research was supported by the Grand Water Research Institute, Mitchel Soref Innovation Awards program, Technion, and by the Otto Meyerhof Center for Biotechnology, Technion, established by the Minerva Foundation, Germany. R. Gur-Arie was supported by the Food Control Administration in the Israel Ministry of Health. Eric Hallerman was supported by the Fulbright Senior Scholars Program and by the Virginia Polytechnic Institute and State University.
E.D. and Y.P. contributed equally to this article.
Present address: National Center for Cool and Cold Water Aquaculture, USDA-ARS, Kearneysville, WV 25430.
Permanent address: Department of Fisheries and Wildlife Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061.