Applied and Environmental Microbiology, August 2005, p. 4784-4792, Vol. 71, No. 8
0099-2240/05/$08.00+0 doi:10.1128/AEM.71.8.4784-4792.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
INSERM U722 and Université Paris 7, Faculté de Médecine Xavier Bichat, 16 rue Henri Huchard, 75018 Paris, France,1 UMR 5122, Université Claude Bernard Lyon 1, 10 rue Dubois, 69622 Villeurbanne cedex, France,2 Service de Biochimie Génétique, Hôpital Robert Debré, 48 boulevard Sérurier, 75019 Paris, France3
Received 2 November 2004/ Accepted 22 February 2005
Here, we developed a new, simple, rapid, cost-effective readout method for phylotyping based on single base changes that are informative for phylogeny. The phylogenetic SNPs were identified from a DNA sequence analysis of 11 essential genes (encoding seven metabolic proteins and four DNA polymerases), which allowed phylogenetic tree reconstruction for 30 E. coli reference strains (20) representing the genetic diversity of the species (12). In the first step, this phylogenetic typing method was validated in silico with 65 pathogenic strains. In the second step, it was applied to 183 commensal and extraintestinal pathogenic strains of E. coli and compared to several typing methods.
General strategy of SNP phylotyping.
Multiple PCRs were first required to amplify the specific genes of interest encompassing the SNPs (step 1) (Fig. 1). Then dideoxy single-base extensions of unlabeled oligonucleotide primers were prepared with specific primers (SNaPshot multiplex assays), and the reaction products were loaded on a DNA sequencing apparatus (step 2). Finally, the SNP data obtained were analyzed as informative site data for phylogenetic analysis using classical phylogenetic tools (step 3).
FIG. 1. General strategy used for SNP phylotyping. In step 1, chromosomal DNA was used as a template for amplification of five genes (trpA, trpB, putP, icdA, and polB) in multiplex PCRs. In step 2, after purification of PCR fragments, dideoxy single-base extensions were added to specific oligonucleotide primers with fluorescent dideoxynucleoside triphosphates (ddNTPs) by using the SNaPshot reaction protocol (Applied Biosystems). The labeled primers were loaded onto a DNA sequencing apparatus, and the results were determined. In step 3, the SNP data were used as informative sites for phylogenetic tree construction using a classical phylogenetic approach. Note that the numbering of SNPs is arbitrary and corresponds to the localization of the SNPs in the concatenated alignment of 11 genes (12).
TABLE 1. Primers used for the amplification of trpA, trpB, putP, icdA, and polB and primers used for SNaPshot analysis
Samples were analyzed with an ABI PRISM 3100 genetic analyzer (Applied Biosystems). One microliter of a SNaPshot reaction mixture was added to 14 µl of Hi-Di formamide. Samples were denatured at 100°C for 5 min and then placed on ice until they were loaded onto the analyzer. The parameters used were dye set E and the SNP36_POP4 default module (Applied Biosystems). After runs, the results were read using the GeneScan software (Applied Biosystems). Note that the numbering of SNPs is arbitrary and corresponds to their localization in the concatenated alignment of 11 genes (12) (Fig. 2B).
FIG. 2. Phylogenetic SNPs used for E. coli typing. (A) Phylogenetic tree of 30 strains from the ECOR collection (20) reconstructed by parsimony using the 13 informative phylogenetic SNPs extracted from the complete sequences of five genes and rooted with E. fergusonii. The 12 nodes were numbered as follows. Nodes 1 and 2 correspond to subgroup Aa (SNP 2396) and group A (SNP 10607), respectively. Node 3 (SNPs 5081 and 10676) corresponds to the B1 group. Nodes 4 (SNP 5066), 5 (SNP 10376), and 6 (SNP 2022) correspond to the ancestor groups of groups A and B1, groups A, B1, and E, and groups A, B1, E, and D, respectively. Node 9 corresponds to group D (SNP 2258), and nodes 7 and 8 correspond to subgroups Da (SNP 381) and Db (SNP 10607). Node 12 corresponds to group B2 (SNP 2313), and nodes 10 and 11 correspond to subgroups B2a (SNP 1096) and B2b (SNPs 220 and 2258). The phylogenetic group to which a strain belongs is indicated after the designation of the strain and was based on MLEE data (16) in most cases; the exception was the ECOR37 strain, which was considered a group E strain (12) (tree length = 24, consistency index = 0.58, retention index = 0.90). The numbers below the nodes are bootstrap percentages calculated from 1,000 iterations. Only values that are 50% are indicated. (B) SNP combinations for five genes (trpA, trpB, putP, icdA, and polB) for the 30 E. coli strains and E. fergusonii. The nucleotides at the 13 SNP positions are indicated for each strain.
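The node-to-SNP correspondence listed in the Fig. 2 caption can be written down as a small lookup table, which makes it easy to see which SNP positions are cited at more than one node. The following is only an illustrative sketch: the names `NODE_SNPS`, `nodes_for_snp`, and `recurrent_snps` are hypothetical, and the "anc." labels are shorthand introduced here for the ancestral nodes.

```python
# Node number -> (group label, SNP positions), transcribed from the Fig. 2 caption.
# The "anc." labels are shorthand introduced here, not names used in the article.
NODE_SNPS = {
    1: ("Aa", [2396]),
    2: ("A", [10607]),
    3: ("B1", [5081, 10676]),
    4: ("anc. A+B1", [5066]),
    5: ("anc. A+B1+E", [10376]),
    6: ("anc. A+B1+E+D", [2022]),
    7: ("Da", [381]),
    8: ("Db", [10607]),
    9: ("D", [2258]),
    10: ("B2a", [1096]),
    11: ("B2b", [220, 2258]),
    12: ("B2", [2313]),
}


def nodes_for_snp(position):
    """Return the sorted node numbers whose definition cites this SNP position."""
    return sorted(n for n, (_, snps) in NODE_SNPS.items() if position in snps)


def recurrent_snps():
    """Return the SNP positions that are cited at more than one node."""
    positions = {p for _, snps in NODE_SNPS.values() for p in snps}
    return sorted(p for p in positions if len(nodes_for_snp(p)) > 1)


print(nodes_for_snp(10607))
print(recurrent_snps())
```

A position that appears at two nodes changes state more than once on the tree, which may be one reason the reported consistency index is below 1.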
In silico analysis of a collection of E. coli strains representing the pathovar diversity of the species.
To validate in silico the choice of these SNPs, our SNP phylotyping technique was first applied to a collection of 65 E. coli strains representing all the intestinal and extraintestinal pathovars previously studied by MLST (11) and the E. coli K-12 strain (3). The 13 SNPs were extracted from the complete sequences of five genes (trpA, trpB, putP, icdA, and polB) (11, 12). The SNP data for these 66 strains were added individually for each strain to the data of the 30 ECOR reference strains for phylogenetic reconstruction of a 50% majority rule consensus tree using parsimony in PAUP*4.0. The data for 48 (73%) of the 66 strains did not disturb the topology of the phylogenetic tree and allowed easy grouping within the defined groups and subgroups. For 16 strains (23%), addition of the SNP data disturbed the topology of the phylogenetic tree, but the strains in groups and subgroups remained closely related. In these cases, the E. coli strain tested could also be assigned to a phylogenetic group or subgroup since the strain clustered with a reference strain. Finally, one strain, DAEC213, could not be assigned to any group or subgroup (see Fig. S1A in the supplemental material). This could have resulted from horizontal transfers in the genes studied, which affected the tree reconstructed from the SNP data but not the tree reconstructed from the MLST data because, in the latter case, the horizontally transferred genes were swamped by the remaining informative sites. Results obtained from the SNP phylotyping were then compared to previous phylogenetic MLST typing done with this collection (12). Excluding the strain that we were unable to group by SNP typing, 60 of 65 strains (92.3%) belonged to identical groups as determined by both approaches. The five differences were due to mistyping involving strains that in the MLST analysis (11) were in a basal position with low bootstrap values in the A, B1, and D groups, indicating weak support of the phylogenetic positions of the strains.
As expected, the K-12 strain was assigned to the A group.
These results show that the information for the 13 SNPs could be sufficient to reconstruct the phylogenetic history of E. coli and that the SNP phylotyping approach could be suitable and robust for phylogenetic analyses of new collections of strains.
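The in silico step described above amounts to reading the nucleotides at fixed positions of an alignment and comparing the resulting profile with those of reference strains. The sketch below illustrates that idea under loud assumptions: the sequences and positions are invented toy data, the function names are hypothetical, and a simple Hamming-distance lookup stands in for the parsimony placement in PAUP*4.0 that was actually used.

```python
def extract_profile(aligned_seq, positions):
    """Return the nucleotides found at the given 1-based alignment positions."""
    return "".join(aligned_seq[p - 1] for p in positions)


def hamming(a, b):
    """Count mismatching positions between two equal-length SNP profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_references(query, references):
    """Return (smallest distance, sorted names of the references at that distance)."""
    distances = {name: hamming(query, prof) for name, prof in references.items()}
    best = min(distances.values())
    return best, sorted(name for name, d in distances.items() if d == best)


# Toy data only: invented sequences and positions, not values from the article.
positions = [2, 5, 7]
references = {
    "refA": extract_profile("ACGTACGT", positions),
    "refB2": extract_profile("ATGTTCTT", positions),
}
query = extract_profile("ACGTTCGT", positions)
print(nearest_references(query, references))
```

Returning every reference tied at the smallest distance, rather than a single name, keeps ambiguous profiles visible instead of silently forcing them into one group.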
Oligonucleotide design and SNaPshot validation.
To use SNaPshot analysis, primers for each of the 13 phylogenetic SNPs were designed so that they exactly adjoined the variable nucleotide with the constraint of homogeneity at the annealing temperature (Table 1). The 30 ECOR strains that were used to determine the phylogenetic SNPs were then analyzed by the SNaPshot approach. The reproducibility of the method was assessed by using a panel of five ECOR strains belonging to different phylogenetic groups (ECOR4, ECOR26, ECOR27, ECOR40, and ECOR47). Identical results for a given strain were obtained when SNaPshot reactions were performed (i) with the same PCR products or (ii) with PCR products resulting from different DNA amplifications (data not shown). An example of an electropherogram of two SNaPshot reactions for strain ECOR63 is shown in Fig. 3. For all strains, double peaks were observed with primer SNP-2396 in the first SNaPshot reaction (Fig. 3A) and with primer SNP-220 in the second SNaPshot reaction (Fig. 3B). Although these primers were purified by HPLC, mass spectrometry analysis showed that each oligonucleotide primer (SNP-2396 and SNP-220) is a mixture of molecules of different lengths, one corresponding to the expected size (35 bases) and another of 34 bases (data not shown). This explains the two peaks observed experimentally. Moreover, another double blue (guanine) peak, as well as a weak green (adenine) peak in SNaPshot reaction 2 with the SNP-220 primer, were observed (Fig. 3B). These peaks were considered artifacts and were not taken into consideration since they were also present with only the SNP-220 primer in the absence of PCR-amplified DNA (data not shown). Such peaks were probably due to a potential 7-nucleotide annealing site between two molecules of the SNP-220 primer, as predicted by bioinformatics analysis (data not shown). This annealing led to addition of an adenine or a guanine at the end of the SNP-220 primer during the SNaPshot reaction.
FIG. 3. SNaPshot analysis migration profile of strain ECOR63. (A) SNaPshot reaction 1. (B) SNaPshot reaction 2. The migration order corresponds to the primer lengths. The double peaks obtained in SNaPshot reaction 1 with the SNP-2396 primer (putP) (A) and in SNaPshot reaction 2 with the SNP-220 primer (trpA) (B) were due to a mixture of primers of different lengths (34- and 35-mers). Unfilled peaks with SNP-220 (trpA) are obtained in the presence of the primer without template DNA and thus were not taken into account in the analysis and were considered artifacts. The SNP combination defining ECOR63 as a subgroup B2b strain is indicated in panel C. Note that the numbering of SNPs is arbitrary and corresponds to the localization of the SNPs in the concatenated alignment of 11 genes (12).
Validation of SNP phylotyping by analysis of the complete ECOR collection.
To determine whether the new method of phylotyping was suitable for the diversity of the species E. coli, we extended the analysis to the 42 remaining strains of the ECOR collection (20). The SNPs were determined for these 42 additional strains by SNaPshot reactions and were used to construct a tree with the SNPs of the 30 ECOR strains, which was rooted with E. fergusonii (Fig. 4). Whatever optimal criterion was chosen to reconstruct the tree (parsimony, unweighted-pair group method using average linkages, or neighbor joining), the global topology of the tree was in accordance with our current knowledge concerning the phylogeny of the species. Only the tree constructed with parsimony as the optimal criterion is presented in Fig. 4. The B2 group strains are basal strains. There are two additional B2 subgroups, one consisting of the ECOR54 strain and the other intermediate between the B2 and D groups consisting of strains ECOR65 and ECOR66. It should be noted that the ECOR66 strain is the most external of the group B2 strains in the MLEE tree (16) and that it has a group D profile as determined by ribotyping (6). Group D strains are located between group B2 and groups A, B1, and E. They include the ECOR42 strain considered "ungrouped" in the MLEE analysis (16) but classified as a group D strain by PCR phylotyping (5). The ECOR69 group B1 strain is clustered with the ECOR37 strain in group E, outside groups A and B1. This strain has an atypical ribotype profile that is different from that of group B1 strains (6). Groups A and B1 are sister groups. Group B1 is monophyletic. Group A also is monophyletic except for strains ECOR24 and ECOR16. ECOR31 and ECOR43, which were also considered "ungrouped" in the MLEE analysis (16), were grouped with the group A strains. The ECOR43 strain was also grouped as a group A strain by PCR phylotyping (5), FAFLP analysis (1), and MLST (Escobar-Páramo and Denamur, unpublished data). 
Strain ECOR24 is an atypical group A strain as it exhibits numerous virulence factors (17) and has a group B1 profile as determined by ribotyping (6). Strain ECOR16 also is not a group A strain as determined by FAFLP analysis (1).
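The tree length and consistency index reported for the parsimony trees can be illustrated with Fitch's small-parsimony count on a fixed tree. This is a minimal sketch, not the PAUP* analysis itself: the tree, strain names, and two-site profiles are invented, the function names are hypothetical, and the consistency index is computed with the usual definition (minimum possible number of changes divided by the observed number).

```python
def fitch_site(tree, states):
    """Return (state set, number of changes) for one site on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its nucleotide at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, profiles):
    """Sum the Fitch change counts over all sites of the SNP profiles."""
    n_sites = len(next(iter(profiles.values())))
    return sum(
        fitch_site(tree, {name: prof[i] for name, prof in profiles.items()})[1]
        for i in range(n_sites)
    )


def consistency_index(tree, profiles):
    """Minimum possible changes (states - 1 per site) divided by the tree length."""
    n_sites = len(next(iter(profiles.values())))
    minimum = sum(
        len({prof[i] for prof in profiles.values()}) - 1 for i in range(n_sites)
    )
    return minimum / tree_length(tree, profiles)


# Toy data only: an invented four-strain tree and two-site profiles.
tree = ((("s1", "s2"), "s3"), "s4")
profiles = {"s1": "AG", "s2": "AT", "s3": "AG", "s4": "CT"}
print(tree_length(tree, profiles), consistency_index(tree, profiles))
```

In the toy data the second site needs two changes on this tree although it has only two states, so the consistency index falls below 1, which is the same kind of signal as the values below 1 reported for the ECOR trees.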
FIG. 4. Fifty percent majority rule consensus tree (phylogram), obtained by using parsimony, based on simultaneous analysis of 13 SNPs of the 72 ECOR strains and rooted with E. fergusonii. SNP data were obtained from SNaPshot analysis. The 30 E. coli strains used to reconstruct the tree in Fig. 2 are indicated by asterisks. The numbers above the nodes correspond to conserved nodes in the tree of the 30 ECOR strains rooted with E. fergusonii (Fig. 2). Subgroups B2c and B2d defined by SNP analysis are indicated by boldface type (tree length = 32, consistency index = 0.47, retention index = 0.92). The numbers below the nodes are bootstrap percentages calculated from 1,000 iterations. Only values that are 50% are indicated. The SNP phylotyping is in agreement with the MLEE typing (16) except for the "ungrouped" ECOR31, ECOR37, ECOR42, and ECOR43 strains, which appear in groups A, E, D, and A, respectively, and the group B1 strain ECOR69, which appears in group E.
Analysis of a collection of commensal and extraintestinal pathogenic E. coli strains.
A total of 111 commensal and pathogenic extraintestinal E. coli strains originating from Europe, America, and Australia were analyzed by SNP phylotyping. These strains were classified in the four main phylogenetic groups by PCR phylotyping (5) for this work (C. Amorin and E. Denamur, unpublished data). According to this typing approach, 37%, 4.5%, 10.8%, and 47.7% of them belong to the A, B1, D, and B2 phylogenetic groups, respectively. SNP data for each strain were added one by one to the SNP data set for the 32 ECOR strains representative of the phylogenetic diversity of the species. From these data, a 50% majority rule consensus tree was then reconstructed using parsimony in PAUP*4.0. The phylogenetic position of each strain was determined by comparison with the 32 reference strains (see Fig. S1B in the supplemental material). Ninety-three strains (84%) were easily classified in a phylogenetic group since the topology of the constructed phylogenetic tree was conserved after addition of the strain's SNP data. For 13 strains (12%), the tree topology was not conserved when the SNP data were added, but they could also be easily classified in a phylogenetic group. Indeed, as stated above, the main phylogenetic groups and subgroups were still closely related in the nonconserved topology trees. Finally, five strains (4%) remained ungrouped by SNP phylotyping analysis (see Fig. S1B in the supplemental material). The five genes (trpA, trpB, putP, icdA, and polB) of strains that did not conserve the phylogenetic topology of the reference tree or were ungrouped were sequenced to see if the SNaPshot technology failed in the identification of the SNPs. The sequencing results confirmed the results obtained by SNaPshot analysis (data not shown).
Of the 111 strains analyzed, 24.3% belonged to group A (node 2), with three-fourths of them in internal group Aa (node 1), 10% belonged to group B1 (node 3), and 14.4% belonged to group D (node 9); the majority of the group D strains were in internal group Da (node 7). Forty-five percent of the strains analyzed belonged to group B2 (node 12), with 28%, 42%, 10%, and 8% of them belonging to subgroups B2a (node 10), B2b (node 11), B2c (ECOR66), and B2d (ECOR54), respectively, and 1.8% (including ECOR37) belonged to group E. We then compared the SNP phylotyping results with the results obtained by PCR phylotyping (5) (see Table S1 in the supplemental material). More than 80% of the strains in this E. coli strain collection were grouped identically by PCR typing and SNaPshot analysis. In contrast to PCR typing, the new approach could identify strains belonging to group E. Indeed, many of the differences observed between PCR phylotyping and SNP phylotyping were due to group E strains that were mistyped as group D or B2 strains by PCR phylotyping. Other observed differences were mainly due to strains localized in group A by PCR typing and in group B1 by the SNP phylotyping approach or localized in group B2 by PCR typing and in group D by the SNP approach. Discrepancies between groups A and B2 were rarely observed (see Table S1 in the supplemental material).
The presence of strains outside previously defined groups or subgroups could have resulted from horizontal transfers, as stated above. Disagreements between PCR typing and SNP typing could have been due to horizontal transfers that scrambled the SNP phylogeny, especially when the disagreement was between closely related groups, such as groups A and B1 or groups D and B2. Alternatively, as the PCR typing method is based on only three genomic DNA fragments, recent mobilization of these fragments with acquisition or loss could be a pitfall of this approach (O. Clermont and E. Denamur, unpublished data). Furthermore, one of the DNA fragments, chuA, is involved in the iron capture process and so cannot be considered neutral.
Taken together, these results confirmed the validity of the new phylogenetic typing method.
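The comparison between PCR phylotyping and SNP phylotyping is essentially a cross-tabulation of two group assignments for the same strains. A minimal sketch of that bookkeeping follows; the strain names and assignments are invented toy data, and `compare_typings` is a hypothetical helper, not part of any tool used in this work.

```python
from collections import Counter


def compare_typings(pcr, snp):
    """Cross-tabulate two typings and return (table, fraction of identical calls).

    pcr, snp: dicts mapping strain name -> phylogenetic group.
    Only strains present in both dicts are compared.
    """
    shared = sorted(set(pcr) & set(snp))
    table = Counter((pcr[s], snp[s]) for s in shared)
    agree = sum(n for (a, b), n in table.items() if a == b)
    return table, agree / len(shared)


# Toy data only: invented strains and group calls, not results from the article.
pcr = {"s1": "A", "s2": "A", "s3": "B2", "s4": "D", "s5": "B2"}
snp = {"s1": "A", "s2": "B1", "s3": "B2", "s4": "E", "s5": "B2"}
table, agreement = compare_typings(pcr, snp)
for (pcr_group, snp_group), count in sorted(table.items()):
    print(pcr_group, snp_group, count)
print(agreement)
```

Keeping the full table rather than only the agreement fraction shows which pairs of groups account for the discrepancies, such as A versus B1 or D versus E in the toy data.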
Concluding remarks.
We developed a new and easily automated phylogenetic grouping method similar to MLST, which was based on SNP analysis. This new method was validated with E. coli, which had one advantage: the availability of a reference collection representative of the diversity of the species (20) previously studied by a wide range of typing methods, including MLST. Although the new approach is less discriminative than MLST with the number of SNPs that we analyzed, the resolution of this approach could be increased by increasing the number of SNPs studied. Our results indicate that similar approaches may be used for a wide variety of bacterial species.
This work was partially supported by grants from the Programme de Recherche Fondamentale en Microbiologie et Maladies Infectieuses et Parasitaires (MENRT) and from the Fondation pour la Recherche Médicale.
Supplemental material for this article may be found at http://aem.asm.org/.