Unité Biodiversité des Bactéries Pathogènes Emergentes, Institut Pasteur, Paris, France (1); Danone Research, Palaiseau, France (2)
Received 16 May 2007/Accepted 7 August 2007
| ABSTRACT |
|---|
ranging from 0.0038 to 0.0109), 3 to 12 alleles were distinguished, resulting in 31 sequence types. One sequence type (ST1) was frequent (17 strains), but most others were represented by a single strain. Attempts to subtype ST1 strains by MLVA, ribotyping, clustered regularly interspaced short palindromic repeat characterization, and single nucleotide repeat variation were unsuccessful. We found clear evidence for homologous recombination during the diversification of L. casei clones, including a putative intragenic import of DNA into one strain. Nucleotides were estimated to change four times more frequently by recombination than by mutation. However, statistical congruence between individual gene trees was retained, indicating that recombination is not frequent enough to disrupt the phylogenetic signal. The developed multilocus sequence typing scheme should be useful for future studies of L. casei strain diversity and evolution.
| INTRODUCTION |
|---|
Virtually nothing is currently known about the genetic diversity and population structure of L. casei. However, knowledge of strain diversity and phylogenetic relationships would be highly relevant for understanding the evolution of ecological or biological properties of strains and for optimizing their industrial or medical exploitation. Basic but far-reaching questions about the biology of fermentative or probiotic characteristics, such as their strain specificity or evolutionary stability, cannot be answered without a proper population genetic framework. Distinguishing L. casei members is also important for identifying strains with particular phenotypic or industrial properties and for strain tracking, collection management, and traceability. Limited molecular typing data based on randomly amplified polymorphic DNA-PCR, ribotyping, pulsed-field gel electrophoresis, or insertion sequences (41, 49, 54) indicate the existence of DNA-level differences among L. casei strains. However, these methods are not considered robust for strain delineation and phylogenetic inference (2). For pathogenic bacteria, including species that are grouped together with lactobacilli in the order Lactobacillales (for example, Enterococcus faecium or Streptococcus pneumoniae), the method of choice for population genetics and standardized strain typing is multilocus sequence typing (MLST) (34). MLST consists of determining the sequence of an internal portion of a small number (most often seven) of housekeeping genes. It provides unambiguous genotype nomenclature that can easily be shared between laboratories and provides precise information on strain evolution. 
Although MLST is now widely used for international collaboration on strain tracking and population biology of bacterial pathogens, its application to the study of strain diversity and evolution in the field of food production microbiology is still in its infancy, with developments only for Lactobacillus plantarum and Oenococcus oeni (8, 9).
In evolutionarily recent bacterial groups, MLST often fails to differentiate strains because nucleotide variation in housekeeping genes accumulates only slowly (43). Fine subtyping of these groups requires markers that evolve faster. Methods shown to be useful for fine typing of homogeneous groups include multilocus variable-number tandem repeats (VNTR) analysis (MLVA) (31), single nucleotide repeat (SNR) variation (51), and clustered regularly interspaced short palindromic repeats (CRISPR) locus variation, also called spoligotyping for Mycobacterium tuberculosis (28).
The main aim of the present study was to develop an MLST scheme for L. casei and to initiate characterization of the population structure of this species. Due to the currently debated taxonomic status and relationships of L. casei and the related species Lactobacillus zeae, Lactobacillus paracasei, and Lactobacillus rhamnosus, which together are regarded as the L. casei group, we first determined the phylogenetic clustering of our strains compared to the type and reference strains of the L. casei group. MLST was then applied to the study of diversity among 52 strains. We also explored potential alternative typing methods with the hope of providing increased discrimination for some strain groups that were homogeneous based on MLST.
| MATERIALS AND METHODS |
|---|
MLST.
Lactobacillus casei strains were grown at 30°C overnight in 10 ml of MRS broth. The cells were pelleted by centrifugation and resuspended in 500 µl of TE (10 mM Tris-HCl [pH 8.0], 1 mM EDTA) solution containing 15 mg/ml of lysozyme (Sigma, Germany) and 15 µl of mutanolysin (5 U/µl). Cells were incubated overnight at 37°C and then lysed by adding 150 µl of 25% (wt/vol) sodium dodecyl sulfate and 150 µl of proteinase K (20 mg/ml) (Sigma).
DNA extraction was performed using the Wizard genomic DNA purification kit (Promega, Madison, WI). To design primers suitable for PCR amplification of all L. casei strains (Table 2), we optimized primers proposed for a wide range of bacterial groups (46), reducing the degeneracy of the original primers on the basis of the genome sequence of strain DN-114 001. PCR amplification was successful for eight loci, corresponding to the genes encoding elongation factor EF-2 (fusA), isoleucyl-tRNA synthetase (ileS), GTP-binding protein LepA (lepA), leucyl-tRNA synthetase (leuS), CTP synthetase (pyrG), recombinase A (recA), ATP-dependent DNA helicase (recG), and 50S ribosomal protein L2 (rplB). These genes are widely separated on the chromosome of strain DN-114 001 (Table 2), except for rplB and fusA, whose start codons are only 5,744 nucleotides apart. PCR conditions for all amplification reactions were as follows: initial denaturation at 94°C for 5 min; 30 cycles at 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s; and final extension at 72°C for 5 min. PCR products were purified by ultrafiltration (Millipore), and nucleotide sequences were obtained using the PCR primers and BigDye Terminator v3.1 chemistry on an ABI 3700 apparatus (Applied Biosystems, Foster City, CA). Sequence traces were edited and stored using BioNumerics version 4.6 (Applied Maths, Sint-Martens-Latem, Belgium). For reliability, the quality of the chromatogram traces was checked, and sequencing was repeated until every nucleotide in the consensus sequence was supported by at least two chromatogram traces.
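The acceptance rule above (every consensus nucleotide supported by at least two chromatogram traces) amounts to a per-position check. The following is a minimal sketch, not BioNumerics functionality: it assumes the traces were already base-called and aligned to equal length, with '-' marking positions a read does not cover, and the function name is hypothetical. It also ignores base-call quality values, which a real pipeline would weigh as well.

```python
from collections import Counter


def consensus_with_support(reads, min_support=2):
    """Majority consensus of aligned base calls, plus positions needing resequencing.

    reads: equal-length aligned base-call strings; '-' marks an uncovered position.
    Returns (consensus, positions where fewer than min_support reads agree).
    """
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus, weak = [], []
    for i in range(length):
        calls = Counter(r[i] for r in reads if r[i] != "-")
        if not calls:
            consensus.append("N")  # no read covers this position
            weak.append(i)
            continue
        base, count = calls.most_common(1)[0]
        consensus.append(base)
        if count < min_support:
            weak.append(i)  # too few agreeing traces: sequence this region again
    return "".join(consensus), weak
```

For three reads where the last position is covered by a single trace, that position is flagged while the rest of the consensus is accepted.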
Amplification and sequencing of the CRISPR locus.
CRISPRs are a family of DNA direct repeats found in many prokaryotic genomes (27). The CRISPR region was amplified with primers 482F (5'-CCAGGGTTCAAATAAGTTATTAATCGC-3') and 483R (5'-TTTAAGTGCCAGAGACTTTTCGTCGG-3'), which target the regions flanking the unique CRISPR locus found in the genome of strain DN-114 001. PCR amplification conditions were as follows: initial denaturation at 94°C for 2 min; 30 cycles at 94°C for 30 s, 58°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 5 min. Nucleotide sequencing was performed from both ends using the PCR primers.
Amplification and sequencing of SNRs.
SNRs are short sequence stretches of the same nucleotide base. Four loci with mononucleotide repeats of 9 or 10 nucleotides were selected from the genome sequence of strain DN-114 001 (unpublished), and PCR primers flanking each SNR were designed. Two loci were successfully amplified and sequenced, with primers SNR2F (5'-GTGTTGCTAATTGCATCGTCACG-3') and SNR2R (5'-TTCACGATGGTCGGCTTGTCTGG-3') and with primers SNR4F (5'-GGACTGCCATCAACACTGTCG-3') and SNR4R (5'-CCATATCGCACGATGACATGG-3'). Loci SNR2 and SNR4 are located at positions 2453190 and 3063640, respectively, in the genome sequence of strain DN-114 001. Both loci are intergenic.
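The text does not describe how candidate SNR loci were located in the DN-114 001 genome. One simple way, sketched here under that caveat, is a regular-expression scan for mononucleotide runs of at least nine bases; the function name and the 1-based inclusive coordinate convention are illustrative choices.

```python
import re


def find_snr_loci(genome, min_length=9):
    """Mononucleotide runs of at least min_length bases.

    Returns (start, end, base, length) tuples with 1-based inclusive coordinates.
    """
    # One base followed by at least (min_length - 1) copies of the same base;
    # greedy matching means each hit spans the whole run.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start() + 1, m.end(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(genome.upper())]
```

Scanning one strand is enough, because a run of T on one strand is a run of A on the other. Flanking primers would then be designed around each hit.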
Data analysis.
For each MLST locus, an allele number was given to each distinct sequence variant, and a distinct sequence type (ST) number was attributed to each distinct combination of alleles at the seven genes. Minimum spanning tree analysis was performed using BioNumerics v4.6 (Applied Maths, Sint-Martens-Latem, Belgium). Neighbor-joining tree analysis was performed using MEGA v3.1 (30) or SplitsTree v4b06 (26). Recombination tests were performed using RDP2 (36). Nucleotide diversity was calculated using DnaSP v4 (44). To test for phylogenetic congruence among the genes, the 31 distinct STs were used. Neighbor-joining trees were generated using PAUP* v4 (http://paup.csit.fsu.edu/index.html) for each gene individually and for the concatenated sequence of the eight genes. Following the method of Feil et al. (16), we used PAUP* to compute, for each gene, the differences in log likelihood between the tree for that gene and the trees constructed from the other genes, with branch lengths optimized. These differences were compared with those obtained for 100 randomly generated trees. The recombination rate during the diversification of clonal lineages was computed according to a previously described principle (18, 23): between single-locus variants, allelic changes involving more than one nucleotide difference are considered to result from recombination, whereas changes involving a single nucleotide difference are considered to result from mutation. Possible recombination events introducing a single nucleotide change were excluded. The relative contributions of recombination to allelic and nucleotide changes were computed according to this principle using the program MultiLocus Analyzer (S. Brisse, unpublished).
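The allele and ST numbering rule stated above can be sketched as follows. This is an illustration only: it numbers alleles and STs in order of first appearance, whereas curated MLST databases assign numbers centrally and keep them stable, and the input layout (`{strain: {locus: sequence}}`) is an assumption.

```python
def assign_types(sequences_by_strain, loci):
    """Allele numbers per distinct sequence at each locus, ST numbers per distinct profile.

    sequences_by_strain: {strain: {locus: sequence}}; numbering follows first appearance.
    Returns ({strain: ST}, {ST: allele-number tuple ordered like loci}).
    """
    allele_ids = {locus: {} for locus in loci}
    st_ids = {}
    strain_st = {}
    for strain, seqs in sequences_by_strain.items():
        profile = []
        for locus in loci:
            table = allele_ids[locus]
            if seqs[locus] not in table:
                table[seqs[locus]] = len(table) + 1  # new sequence variant -> new allele
            profile.append(table[seqs[locus]])
        profile = tuple(profile)
        if profile not in st_ids:
            st_ids[profile] = len(st_ids) + 1  # new allele combination -> new ST
        strain_st[strain] = st_ids[profile]
    return strain_st, {st: prof for prof, st in st_ids.items()}
```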
For MLVA data, each allele was coded using the deduced number of repeats. Each unique allelic combination of the repeat numbers at the nine VNTR loci was considered as a new repeat type (RT). Negative amplifications (see Results) were not taken into account for pairwise profile comparison. Hence, VNTR profiles with one or more missing loci were associated with the profile composed of the same values at the other loci. To keep track of the missing information, we coded these profiles with a delta suffix followed by the number of the missing locus (for example, RT1Δ9 corresponds to RT1 except that locus VNTR-9 was not amplified). The relatedness between the different STs or RTs was investigated using BioNumerics software by the minimum spanning tree method (48).
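The repeat-type coding with a delta suffix can be sketched as below, under stated assumptions: `None` marks a locus that failed to amplify, locus numbers are taken as 1-based positions in the profile tuple, several missing loci are written as consecutive suffixes (for example RT1Δ4Δ9), and an incomplete profile compatible with several complete profiles takes the first one. The text specifies none of these last cases, nor how an incomplete profile matching no complete profile was named; the sketch labels that case `RT?`.

```python
def code_repeat_types(profiles):
    """Repeat types (RTs) for MLVA profiles; None marks a locus that failed to amplify.

    profiles: {strain: (repeats at locus 1, locus 2, ...)}.
    """
    # Number each distinct complete profile in order of first appearance.
    rt_of = {}
    for prof in profiles.values():
        if None not in prof and prof not in rt_of:
            rt_of[prof] = len(rt_of) + 1
    result = {}
    for strain, prof in profiles.items():
        missing = [i + 1 for i, v in enumerate(prof) if v is None]
        if not missing:
            result[strain] = f"RT{rt_of[prof]}"
            continue
        # First complete profile that agrees at every locus that did amplify.
        match = next((rt for full, rt in rt_of.items()
                      if all(v is None or v == f for v, f in zip(prof, full))), None)
        suffix = "".join(f"Δ{m}" for m in missing)
        label = "?" if match is None else str(match)  # "?" case: not specified in the text
        result[strain] = f"RT{label}{suffix}"
    return result
```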
Nucleotide sequence accession numbers.
The rplB sequences generated in this study are available from the GenBank/EMBL databases under the accession numbers AM502819 to AM502835. Sequence data for the other genes are available through our MLST web site (www.pasteur.fr/mlst) and were also deposited in GenBank/EMBL, under the accession numbers EU030989 to EU031043.
| RESULTS AND DISCUSSION |
|---|
In order to ensure that the study strains belonged to L. casei, we gathered reference and type strains of Lactobacillus plantarum, Lactobacillus brevis, Lactobacillus casei, Lactobacillus zeae, Lactobacillus animalis, Lactobacillus sakei subsp. carnosus, and Lactobacillus rhamnosus. The rplB gene was amplified in all study strains and all of these species, whereas the seven other genes could not be amplified in some of these species. The rplB sequence was therefore chosen to compare the study strains with the reference/type strains of the other species. Excluding insertion and deletion events (in which two adjacent codons are implicated), a total of 184 polymorphic sites (50.3%) were found in the 366-bp alignment. Phylogenetic analysis of the sequence data showed that the 52 study strains had rplB sequences very similar to that of L. casei reference strain CIP107868 (= ATCC 334), proposed as the neotype strain for L. casei (10, 13), and clearly distinct from rplB sequences obtained for other species, including the compact cluster formed by two L. zeae strains (including the type strain of this species) and the current taxonomic type strain of L. casei, CIP103137T (= ATCC 393T) (Fig. 1). These results, which support the atypical position of the current type strain of L. casei, are concordant with those based on sequence analysis of the recA gene (20), the tuf gene (6, 56), and ribotyping (52). Concordance between these markers makes it very unlikely that the atypical phylogenetic position of strain CIP103137T in single-gene phylogenies is due to horizontal transfer of any of these three protein-coding genes. The rplB gene hence appears to be a reliable phylogenetic marker for strains of the L. casei group and could be used in conjunction with tuf and recA for multilocus sequence analysis-based species delineation (24) in this group (6, 20, 56).
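The rplB figure above is the number of variable columns over the alignment length: 184/366 ≈ 0.503, that is, 50.3%. A minimal sketch of such a count, assuming aligned sequences in which indel positions appear as '-' and are skipped, as in the text; the percentage is taken over the full alignment length to match that arithmetic. The function name is hypothetical.

```python
def polymorphic_sites(alignment):
    """Count polymorphic columns, skipping columns that contain a gap ('-').

    Returns (polymorphic count, number of gap-free columns, percent of full alignment length).
    """
    length = len(alignment[0])
    polymorphic = considered = 0
    for column in zip(*alignment):
        if "-" in column:
            continue  # indel position: excluded from the count
        considered += 1
        if len(set(column)) > 1:
            polymorphic += 1
    return polymorphic, considered, 100 * polymorphic / length
```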
Strain BL23, which is considered to be a plasmid-cured derivative of the type strain ATCC 393T, clustered within the L. casei rplB cluster. It thus appears distantly related to its supposed parent strain, an observation that confirms previous reports that BL23 is not directly related to ATCC 393T (1).
Nucleotide variation.
The sequences of the eight loci were determined for the 52 study strains. Consensus sequence templates ranged in length from 315 bp (recA) to 663 bp (fusA). The proportion of variable sites ranged from 0.58% (pyrG) to 4.09% (recG). Polymorphic sites are shown in Fig. 2. Nonsynonymous substitutions per nonsynonymous site were rare relative to synonymous substitutions per synonymous site (Table 4), indicating selection against amino acid changes and arguing against strong positive selection on the observed allelic diversity, as is typical for housekeeping genes. The GC content of all alleles of the eight genes ranged from 47% to 50%, close to the GC content (46.6%) of the complete genome of strain ATCC 334 (35).
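Nucleotide diversity (π), computed in this study with DnaSP, is the mean number of pairwise differences per site over all pairs of sequences. The sketch below computes it, together with GC content, for aligned gap-free sequences. It is not DnaSP, and the result depends on whether the input list holds strains, alleles, or STs; which set underlies the reported values is not stated in this section.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise differences per site over all sequence pairs.

    Assumes at least two aligned, equal-length, gap-free sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)
```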
MLST scheme optimization for future use.
The number of alleles per locus ranged from 3 (pyrG) to 12 (leuS). By combining the eight loci, 31 STs were distinguished. MLST schemes generally include only five to seven gene loci, because adding more genes often does not increase the number of STs distinguished. We therefore sought to limit the number of genes (for practical reasons) while retaining the highest number of STs. When rplB was removed from the analysis, the same 31 STs were found based on the seven remaining genes. In contrast, when either pyrG or recA (the two genes with the lowest number of alleles) was removed, the number of STs decreased to 30. We therefore eliminated rplB from our MLST scheme. A further reason was that rplB and fusA are close together on the chromosome of strain DN-114 001 (their start nucleotides are at positions 2673252 and 2678996, respectively; see Table 2) and may therefore be transferred horizontally by a single recombination event, which would bias recombination rate estimates, for example, when using the clonal diversification method (see below).
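The locus-removal check used to drop rplB can be sketched as counting distinct allelic profiles after deleting one locus at a time. A minimal illustration; the function name and the toy profiles in the test are hypothetical.

```python
def st_counts_after_dropping(profiles, loci):
    """Distinct profiles with all loci, and after removing each locus in turn.

    profiles: allele-number tuples ordered like `loci`.
    Returns (count with all loci, {locus: count without that locus}).
    """
    profiles = [tuple(p) for p in profiles]
    full = len(set(profiles))
    reduced = {}
    for i, locus in enumerate(loci):
        reduced[locus] = len({p[:i] + p[i + 1:] for p in profiles})
    return full, reduced
```

In the toy test, loci b and c carry the same information, so removing either leaves the ST count unchanged, which is the situation reported for rplB.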
Strain relationships based on allelic profiles.
To explore the relationships among the 52 study strains, allelic profile-based phylogenetic analysis was performed using the minimum spanning tree algorithm (Fig. 3), which links profiles so that the sum of the distances (number of distinct alleles between two profiles) is minimized (48). In this representation, strains with the same allelic profile fall in the same circle, whose size is proportional to the number of strains with that profile. As with eBURST (17), this approach is less sensitive than nucleotide-based approaches to the disturbing effect of genetic recombination on phylogenetic reconstruction. Figure 3 shows that ST1 was the most frequent genotype among our 52 study strains, with 17 strains (33%). Notably, L. casei strain DN-114 001 belonged to ST1. Strain BL23 also belonged to the ST1 cluster, and this affiliation was confirmed by AFLP (see below), further indicating that it is not a direct derivative of ATCC 393T. Other STs with more than one strain were ST2, ST7, ST9, ST12, and ST18 (Fig. 3; Table 1).
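The minimum spanning tree over allelic profiles can be illustrated with Kruskal's algorithm, using the number of differing loci as the distance. This is a sketch, not BioNumerics' implementation: when several trees share the same total length, BioNumerics may choose among them with its own tie-breaking rules, whereas this code takes edges in order of distance and then ST number. The profiles in the test are hypothetical.

```python
from itertools import combinations


def allele_distance(p, q):
    """Number of loci at which two allelic profiles carry different allele numbers."""
    return sum(a != b for a, b in zip(p, q))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over allelic-profile distances.

    profiles: {ST: allele tuple}. Returns a list of (ST_a, ST_b, distance) edges.
    """
    parent = {st: st for st in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((allele_distance(profiles[a], profiles[b]), a, b)
                   for a, b in combinations(sorted(profiles), 2))
    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two separate components: keep it
            parent[ra] = rb
            tree.append((a, b, d))
    return tree
```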
Comparison of MLST with AFLP data.
Comparison of MLST data with AFLP data showed very high agreement (see the color pattern in Fig. 3). All strains with different STs had different AFLP types, with two exceptions. First, strain D659 (ST3), which had the same AFLP type as ST1 strains, differed from ST1 strains at the ileS locus, with two nucleotide differences in this gene between the two STs. Second, strains D640 (ST14) and D641 (ST15) differed at the recG locus by five nucleotides, most probably as a result of a single recombination event. In both cases, therefore, the MLST profile appears to have evolved by a single-locus change from a common ancestral strain while the AFLP profile remained identical. Concordance between MLST and AFLP was further evident in that most strains of a given ST had the same AFLP type. In particular, all ST1 strains were indistinguishable by AFLP. The discriminatory powers of MLST and AFLP found here were very similar, which is concordant with comparisons of these two methods in other bacterial species.
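The text does not state which index underlies the comparison of discriminatory power. A common choice for typing methods is the index of Hunter and Gaston (Simpson's index of diversity), D = 1 - [1 / (N(N - 1))] × Σ n_j(n_j - 1), where N is the number of strains and n_j the number of strains of type j. Comparing MLST with AFLP would mean computing D once on ST labels and once on AFLP types for the same strains. The sketch below assumes that index; the labels in the test are hypothetical.

```python
from collections import Counter


def hunter_gaston_index(type_labels):
    """Simpson's index of diversity as adapted by Hunter and Gaston.

    Probability that two strains drawn at random without replacement fall into different types.
    """
    counts = Counter(type_labels).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two strains")
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
```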
Evidence for homologous recombination in L. casei.
Bacterial species differ widely in their rates of homologous recombination (19). High rates of recombination accelerate genome diversification, which affects the interpretation of the genomic differences observed among strains. In addition, recombination reduces the linkage between a given genomic background and individual genes, including genes possibly involved in probiotic characteristics. Because housekeeping genes are unlikely to be positively selected for variation, detection of recombination in these genes would indicate that recombination is relatively frequent in the population (19).
Homologous recombination introduces conflict among nucleotide sites in sequence data, which can be visually detected using split decomposition analysis and representing the conflicting relationships among sequences by a network rather than a tree (25). The concatenated sequence of the seven MLST genes did not show a network-like structure (see Fig. S1A in the supplemental material), suggesting that overall the seven gene portions are compatible with one another, which argues against widespread recombination among genes. Accordingly, the phylogeny obtained using the neighbor-joining method (see Fig. S1B in the supplemental material) was very similar to the split network.
A method based on likelihood analysis has been proposed to evaluate the long-term consequences of recombination for the congruence of gene genealogies derived from independent genes (16). We found significant congruence with the concatenated sequence of the eight genes (including rplB) for the gene trees derived from lepA, leuS, recG, rplB, ileS, and fusA (see Fig. S4, last panel, in the supplemental material). In addition, all of these genes except fusA were congruent with each other (see Fig. S4, first six panels, in the supplemental material). For fusA, the lack of congruence with the other genes could be explained by the presence of only one phylogenetically informative site (Fig. 2), that is, a site at which at least two variants each occur in at least two sequences; a single such site does not provide enough phylogenetic signal. For pyrG and recA, there was not a single phylogenetically informative site (Fig. 2), and for these two genes the obtained phylogenies were not statistically different from the random phylogenies (see Fig. S4 in the supplemental material). Overall, these results indicate that recombination is not frequent enough to disrupt the phylogenetic signal, but they do not exclude low rates of recombination.
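Counting phylogenetically (parsimony-) informative sites, as done for fusA, pyrG, and recA, can be sketched with the standard definition: at least two character states, each shared by at least two sequences. Gaps are ignored. Running it on distinct alleles or STs rather than on all strains is an assumption about how the counts were made; the function name and test alignment are hypothetical.

```python
from collections import Counter


def informative_sites(alignment):
    """0-based columns with at least two states that each occur in at least two sequences."""
    sites = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(b for b in column if b != "-")
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites
```

Singleton polymorphisms (a state seen in one sequence only) are excluded, which is why a gene with several variable sites can still carry little phylogenetic signal.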
When the nucleotide sequences of individual genes were analyzed (see Fig. S2 in the supplemental material), they showed no or few network-like relationships (splits), except for lepA. Visual inspection of the nucleotide polymorphisms confirmed the existence of a number of conflicting partitions among some sites in lepA sequences. For example, site 28 partitions alleles 5, 6, and 10 versus all other alleles, whereas site 267 would group allele 5 with allele 8. Accordingly, lepA-5 appears to be related either to lepA-8 or to the node leading to lepA-6 (with the lepA-6 branch being explained by a singleton polymorphism, A at position 235).
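Conflicts such as the one between lepA sites 28 and 267 can be detected pairwise with the four-gamete test: if all four combinations of states occur at two biallelic sites, no single tree explains both without homoplasy or recombination. A minimal sketch; the test input is a hypothetical toy alignment mimicking the described partitions (one site grouping alleles 5, 6, and 10, the other grouping allele 5 with allele 8), not the real lepA sequences.

```python
from itertools import combinations


def incompatible_site_pairs(alignment):
    """Pairs of biallelic columns (0-based) that fail the four-gamete test.

    Four observed state combinations at two biallelic sites cannot arise on one tree
    without homoplasy (reversion or parallel mutation) or recombination.
    """
    columns = list(zip(*alignment))
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    return [(i, j) for i, j in combinations(biallelic, 2)
            if len(set(zip(columns[i], columns[j]))) == 4]
```

Toy alleles 1, 5, 6, 8, and 10 encoded as two-site haplotypes AC, GT, GC, AT, and GC show all four combinations, so the two sites are flagged as incompatible.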
Although recombination can introduce conflicts between sites, other explanations are possible, for example, homoplasy (reversion or parallel mutations in independent lineages). A more direct way to detect intragenic recombination is the observation of a clustered distribution of polymorphisms along the sequence (mosaic structure). In the case of lepA-10, we observed five differences from lepA-1 between positions 396 and 549 but none between positions 29 and 395 (Fig. 2). The existence of intragenic recombination among lepA alleles was also suggested by a positive Sawyer's test (P = 0.009 using uncondensed fragments) and the chi-square intragenic recombination test (P < 0.01). Overall, these results suggest that homologous recombination occurs in L. casei but at a rate that is not sufficiently high to eliminate most of the phylogenetic signal contained in the nucleotide sequences.
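The mosaic pattern described for lepA-10 (differences clustered in one segment, none elsewhere) is what a maximum chi-square scan, in the spirit of Maynard Smith's method, looks for: the breakpoint that best separates a segment rich in differences from one without. The sketch below is a simplified, hypothetical version for one pair of aligned sequences. Unlike the test run in RDP2, it does not restrict the scan to variable sites of the whole alignment and computes no significance (normally assessed by permutation or with a multiple-comparison correction).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi_breakpoint(seq_a, seq_b):
    """Best split of two aligned sequences into a left and right segment.

    Returns (largest chi-square, 0-based index of the first site of the right segment),
    or (0.0, None) if no split separates differences from identities.
    """
    diffs = [x != y for x, y in zip(seq_a, seq_b)]
    n, total = len(diffs), sum(diffs)
    best_chi, best_k = 0.0, None
    left = 0
    for k in range(1, n):
        left += diffs[k - 1]  # differences within seq[:k]
        right = total - left
        # Rows: left / right segment; columns: differing / identical sites.
        chi = chi_square_2x2(left, k - left, right, (n - k) - right)
        if chi > best_chi:
            best_chi, best_k = chi, k
    return best_chi, best_k
```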
Although the above-used methods can detect recombination, they do not provide a direct estimate of the relative contributions of recombination and mutation to the evolution of strains. One way to quantify the recent contribution of recombination to the generation of genotypic diversity is the clonal diversification method (18, 23). In this method, for each pair of profiles that differ at only one gene (out of seven) along the evolutionary tree, the number of nucleotide differences between the two differing alleles is counted. A single nucleotide difference is considered likely to be caused by mutation, whereas more than one nucleotide difference is attributed to recombination. Out of 13 allelic changes, 8 could be attributed to recombination, representing 21 nucleotide changes in total; the remaining 5 were single-nucleotide changes attributed to mutation (see Table S2 in the supplemental material). Therefore, nucleotides were approximately four times (21/5) more likely to change by recombination than by mutation in this data set. This result is comparable to the relative contributions of recombination and mutation in E. coli: when the 527 STs available on the E. coli MLST web site (http://web.mpiib-berlin.mpg.de/mlst/) were considered, allelic changes observed within the 81 clonal complexes were found to be caused by recombination 0.85 times as frequently as by mutation, with nucleotides being 5.18 times more likely to change by recombination than by mutation. The species E. coli can be considered weakly clonal, with statistical congruence among gene phylogenies but with evidence for localized recombination and gene mosaicism (16, 23, 57). As for E. coli, the apparently paradoxical observation for L. casei of both recombination and phylogenetic congruence among genes can be reconciled by the low rate of recombination or by ecological structuring of the natural populations (16, 23).
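The clonal diversification arithmetic can be sketched as follows. The per-change list in the test is hypothetical: Table S2 is not reproduced here, so the values were chosen only to match the reported totals (13 allelic changes, 8 recombinant ones totaling 21 nucleotides, hence 5 single-nucleotide mutations). With those totals the function returns 8/5 = 1.6 recombination events per mutation event and 21/5 = 4.2 nucleotides changed by recombination per nucleotide changed by mutation.

```python
def recombination_to_mutation(nucleotide_changes):
    """Clonal diversification estimate from single-locus-variant allelic changes.

    nucleotide_changes: nucleotide differences for each allelic change between
    single-locus variants. One difference counts as mutation; more than one as recombination.
    Returns (r/m per allelic change, r/m per nucleotide changed).
    """
    recombinant = [d for d in nucleotide_changes if d > 1]
    mutational = [d for d in nucleotide_changes if d == 1]
    if not mutational:
        raise ValueError("no single-nucleotide changes: ratio undefined")
    per_event = len(recombinant) / len(mutational)
    per_site = sum(recombinant) / len(mutational)  # each mutation changes one nucleotide
    return per_event, per_site
```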
Characterization of VNTR loci.
CC1 represented a high proportion of strains in our collection. Therefore, we were interested in identifying genetic markers that would allow discrimination among members of this complex and in particular for members of ST1. MLVA has been shown to be a powerful method for subtyping very closely related strains that are not distinguished by MLST (31). This method is based on tandem repeat copy number differences between strains at well-defined loci. These differences result in locus size variation, which can be detected efficiently by PCR amplification using locus-specific primers targeting the regions that flank the tandem repeats.
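MLVA alleles are recorded as repeat numbers deduced from amplicon sizes. A minimal sketch of that deduction, assuming the length of the amplicon outside the repeat array (primers plus flanks) is known from the DN-114 001 genome and constant across strains; the locus parameters in the test are hypothetical, as the text does not list them.

```python
def repeat_copy_number(amplicon_bp, flank_bp, unit_bp):
    """Deduce tandem-repeat copy number from PCR product size.

    flank_bp: amplicon length outside the repeat array; unit_bp: repeat unit length.
    Returns (rounded copy number, residual bp). A nonzero residual suggests a partial
    repeat or an indel in the flanks, worth confirming by sequencing.
    """
    raw = (amplicon_bp - flank_bp) / unit_bp
    copies = round(raw)
    residual = amplicon_bp - flank_bp - copies * unit_bp
    return copies, residual
```

Sequencing distinct alleles, as reported below, is what confirms that a size difference really reflects repeat number rather than a flank indel.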
After in silico identification of tandem repeats in the genome of strain DN-114 001 (see Materials and Methods) and PCR amplification testing on a small strain set, we selected the nine loci that gave the best results. The number of repeats at these nine VNTR loci was determined for 40 strains from the Danone Research collection (see Table S1 in the supplemental material). Despite repeated PCR amplification trials, only two loci (VNTR-10 and VNTR-14) could be amplified in all strains. However, VNTR-14 was monomorphic, and VNTR-10 showed a distinct allele in only two strains. These two loci therefore appear to be highly stable in the L. casei population, both in copy number and in the flanking sequences used for PCR priming. At the other loci, the number of PCR-negative strains ranged from 2 (VNTR-2 and VNTR-3) to 26 (VNTR-9). Notably, PCR failure was clearly associated with MLST data (see Table S1 in the supplemental material). For example, ST14 and ST15, which belong to the same MLST clonal complex, were PCR negative for loci VNTR-4, VNTR-9, and VNTR-12. Failure to amplify VNTR loci in strains with a specific phylogenetic background has been observed for other species (31, 33) and is most likely attributable to sequence variation at the priming sites or to absence of the locus in specific phylogenetic lineages.
The number of alleles ranged from one (VNTR-14) to five (VNTR-4). Sequencing of the distinct alleles confirmed that size variation was due to repeat number variation (not shown). When negative amplification was not counted as a distinguishing character, only 14 unique MLVA types were identified among the 40 strains tested, whereas 23 STs were distinguished among the same strains. Only one pair of strains of the same ST (ST18) had distinct MLVA profiles, which differed by a single repeat at VNTR-4. All analyzed ST1 strains had the same MLVA pattern. We conclude that MLVA based on the nine tested loci is less discriminatory than MLST for L. casei strains.
Despite this lower degree of variation, the observed concordance of MLVA data with MLST data was very high. In all cases, strains with distinct STs were distinct by MLVA, except for the highly related ST1, ST2, and ST3 (see Fig. S3 in the supplemental material).
Ribotyping, CRISPR locus variation, and SNR variation of CC1 strains.
We attempted to find genetic differences between CC1 strains using three other methods. First, five ST1 strains (including DN-114 001), one ST2 strain (D6.1), and one ST18 strain (D697) were tested by ribotyping. The banding patterns of the six ST1 and ST2 strains were identical, all showing the same six DNA fragments. Therefore, in contrast to MLST, ribotyping using EcoRI could not discriminate between DN-114 001 and D6.1. By comparison, the ribotype pattern of strain D697 (ST18) showed two clearly distinct bands (data not shown), in agreement with the more distant relationship of this strain based on MLST.
We next explored the possible existence of sequence or spacer content variation at the single CRISPR locus that we identified (at positions 2393826 to 2394728) in the genome sequence of strain DN-114 001. CRISPR loci are widely distributed in prokaryotes (27) and consist of an array of short (21 to 47 bp in the currently described CRISPR loci) conserved nucleotide stretches interleaved with nonrepeating spacers of similar size. In Mycobacterium tuberculosis, the CRISPR locus is called the DR locus and is the basis of spoligotyping (28). Although M. tuberculosis is highly homogeneous based on nucleotide sequencing of protein-coding genes, M. tuberculosis strains show extensive spacer content variation at the CRISPR locus (5). Hence, spacer content variation can be used for strain subtyping, although this method has so far been used on only a limited number of bacterial groups (4, 7, 42, 47). Four CC1 strains (D657, D658, and D573 of ST1 and D6.1 of ST2) were selected, together with three strains of ST19, ST29, and ST30 for comparison. PCR amplification of the entire CRISPR locus was successful for the four CC1 strains. However, sequencing of 1,400 bp from the two ends of the CRISPR locus did not reveal a single nucleotide difference among the four strains. We therefore did not consider this method promising for strain discrimination in CC1. PCR amplification failed for the three other strains tested, and we also noted that our primers do not match the CRISPR locus of strain ATCC 334.
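Spacer content, the basis of spoligotyping, can be read from an array sequence by splitting it at copies of the direct repeat and comparing the resulting spacer sets between strains. A minimal sketch, assuming exact repeat copies and a hypothetical repeat sequence (the L. casei repeat is not given in the text); real arrays often contain degenerate repeats, especially at the ends, which would need approximate matching.

```python
def split_crispr_array(array_seq, repeat):
    """Spacers found between consecutive exact copies of a direct repeat."""
    positions = []
    start = array_seq.find(repeat)
    while start != -1:
        positions.append(start)
        start = array_seq.find(repeat, start + len(repeat))
    # Each spacer runs from the end of one repeat copy to the start of the next.
    return [array_seq[p + len(repeat):q] for p, q in zip(positions, positions[1:])]


def spacer_content_difference(spacers_a, spacers_b):
    """Spacers present in only one of two strains (order-insensitive comparison)."""
    set_a, set_b = set(spacers_a), set(spacers_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)
```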
SNRs are short sequence stretches of the same nucleotide base at defined genome positions. Some of these mononucleotide stretches were shown to be highly polymorphic, including among strains with the same VNTR type (21, 51, 55). We selected four SNR loci with at least 9 repetitions of the same base (either T or A) in the DN-114 001 genome sequence. Two of these loci were successfully amplified by PCR and sequenced for the seven strains that were tested for CRISPR variation. SNR variation was found at both loci, in the form of a single nucleotide insertion in CC1 strains compared with the three non-CC1 strains. In addition, just upstream of each SNR locus, a single nucleotide polymorphism (T/G and A/G) was observed in one non-CC1 strain (ST19 and ST29, respectively). Unfortunately, no variation was found among the CC1 strains tested.
Conclusions.
In conclusion, we developed an MLST scheme for L. casei (L. paracasei) strains and estimated, using 52 strains, some basic population biology parameters of L. casei, including diversity indices and the impact of homologous recombination on the diversification of clones. The present MLST method is intended to become a common language for characterizing L. casei strains. Our study strains were mostly from food sources, industrial or traditional, raising the possibility that some of these strains were selected several times independently for their appreciated characteristics. Analysis of larger, well-documented strain collections with global strain sampling should refine our view of the population structure of L. casei and could provide interesting information on the history of dairy products and on the genotype-phenotype relationships of strains. To this end, we developed an MLST web site for L. casei, publicly available at http://www.pasteur.fr/mlst. The discriminatory power of MLST using the seven proposed genes was very similar to that of AFLP, a more complex, less reproducible, and less portable method. MLST should prove useful for strain collection management or traceability purposes.
We intended, but failed, to develop a complementary typing method that would allow subtyping of strains belonging to the major ST encountered in our strain collection, ST1. The subtyping of strains belonging to ST1 was important to us because they show differences in phenotype (Danone Research, unpublished). Subtyping of this major ST was not achieved using VNTR markers, a result that was surprising given the repeated finding of a large amount of VNTR variation among strains with the same ST in several bacterial species, such as Escherichia coli O157:H7 (32) or Salmonella enterica serotype Typhimurium (33). Similarly, Bacillus anthracis is homogeneous by MLST but shows MLVA variation (29). However, in the case of Enterococcus faecium, a species that is phylogenetically more closely related to lactobacilli based on 16S rRNA gene sequences, MLVA variation was not higher than MLST variation (53), similar to our present finding. These results may indicate an atypical stability of VNTR loci in this group of bacteria.
The other simple methods explored for ST1 subtyping (ribotyping, SNR markers, and CRISPR) did not show promise for this purpose. Further subtyping efforts may be warranted, possibly using more global mutation discovery approaches, such as whole-genome shotgun sequencing, microarray hybridization-based comparative genome sequencing, or mutation screening in large numbers of selected genes (43).
| ACKNOWLEDGMENTS |
|---|
We are grateful to Chantal Bizet and the Collection de l'Institut Pasteur for providing reference and type strains.
| FOOTNOTES |
|---|
Published ahead of print on 17 August 2007.
Supplemental material for this article may be found at http://aem.asm.org/.
| REFERENCES |
|---|