Applied and Environmental Microbiology, June 2008, p. 3702-3709, Vol. 74, No. 12
0099-2240/08/$08.00+0 doi:10.1128/AEM.00244-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Guillaume Achaz,3,4
Catherine Bouchenot,2
Jean-François Bernardet,2 and
Eric Duchaud2
INRA, Mathématique Informatique et Génome UR1077, F-78350 Jouy-en-Josas, France,1 INRA, Virologie et Immunologie Moléculaires UR892, F-78350 Jouy-en-Josas, France,2 Université Pierre et Marie Curie-Paris 6, Atelier de Bioinformatique, F-75005 Paris, France,3 Université Pierre et Marie Curie-Paris 6, Systématique Adaptation Evolution UMR7138, F-75005 Paris, France4
Received 28 January 2008/ Accepted 14 April 2008
Answers to a number of important questions regarding F. psychrophilum are intimately connected with a better knowledge of its population structure. Distribution of F. psychrophilum is currently worldwide, but it is unclear what the initial geographic range of the pathogen was prior to the development of the fish-farming industry and international trade of fish and fish eggs. The range of natural host species also remains unclear: F. psychrophilum has been occasionally isolated from nonsalmonid freshwater fish, such as carp, sturgeon, sea lamprey, and eel (17, 26), but whether these fish species harbor a significant population of the bacterium and may constitute reservoirs for salmonid fish is not known. It also remains to be understood how the respective contributions of the pathogen characteristics (i.e., virulence and host specificity), as opposed to the fish susceptibility, affect the success and severity of the infection. There is experimental evidence for strain-dependent variations of virulence (28). Hence, diffusion of the disease could have resulted from the spreading of some highly virulent strains. The trade of fish eggs could have played a critical role, as the vertical transmission of F. psychrophilum is highly suspected (10, 13, 34).
To gain insight into the population structure of F. psychrophilum and its mode of evolution, we undertook a multilocus sequence analysis taking advantage of the recent determination of the whole-genome sequence of strain JIP 02/86 (15). Here, we report on the analysis of sequence polymorphisms at 11 core genome loci across a collection of 50 isolates selected to represent the broadest possible genetic diversity. This study also aimed at establishing a multilocus sequence typing (MLST) scheme (29, 43) for future epidemiological monitoring as a complement or an alternative to other typing methods, such as random amplification of polymorphic DNA, PCR-restriction fragment length polymorphism, ribotyping, pulsed-field gel electrophoresis, serotyping, and plasmid profiling (3, 11, 12, 23, 27, 37, 41). To our knowledge, MLST data already published for bacteria isolated from diseased fish are limited to a few strains of Yersinia ruckeri and Vibrio vulnificus (8, 25).
The 50 F. psychrophilum strains used in this study were isolated from all over the world over a period of more than 20 years, between 1981 and 2003, except for the type strain NCIMB 1947T, whose year of isolation is unknown but which was deposited in culture collections long before 1980. They represented six different geographical areas (North America, Europe, Israel, Chile, Tasmania, and Japan) and 10 different fish species (rainbow trout [Oncorhynchus mykiss], cutthroat trout [Oncorhynchus clarki], Coho salmon [Oncorhynchus kisutch], Chinook salmon [Oncorhynchus tshawytscha], ayu [Plecoglossus altivelis], Atlantic salmon [Salmo salar], brown trout [Salmo trutta], European eel [Anguilla anguilla], tench [Tinca tinca], and carp [Cyprinus carpio]). Most strains had already been included in previous typing studies performed in our laboratory (11, 12).
Strains were cultivated on modified Anacker and Ordal agar at 18°C as described previously (33). PCR amplifications were performed with 10 ng of purified genomic DNA extracted using a Wizard genomic DNA purification kit (Promega). The following primers were designed from regions showing high degrees of conservation within the genus Flavobacterium: trpB (forward, AAGATTATGTAGGCCGCCC; reverse, TGATAGATTGATGACTACAATATC), gyrB (GTTGTAATGACTAAAATTGGTG and CAATATCGGCATCACACAT), glyA (AAAGATAGACAAATTCACGG and GGTGATTTATCATCAAAAGG), dnaK (AAGGTGGAGAAATTAAAGTAGG and CCACCCATAGTTTCGATACC), tuf (GAAGAAAAAGAAAGAGGTATTAC and CACCTTCACGGATAGCGAA), rplB (TCAAATTAAGAAAGCTGTTGA and TAATGAACGTGCCATATCTT), fumC (CCAGCAAACAAATACTGGGG and GGTTTACTTTTCCTGGCATGAT), ftsQ (TTACGATAGGTGCGGGGGATAT and GTGCTGACCCGCAATACCTAC), murG (TGGCGGTACAGGAGGACATAT and GCATTCTTGGTTTGATGGTCTTC), recA (TAGACAAGACTTACGGCAAAGG and TCCGTAGCTAAACCACGATCC), and atpA (CTTGAAGAAGATAATGTGGG and TGTTCCAGCTACTTTTTTCAT). PCR amplifications were performed with a 20-µl reaction volume by using recombinant Taq polymerase (Invitrogen) and the following conditions: 94°C for 3 min, followed by 35 cycles at 94°C for 0.5 min, 52°C for 0.5 min, and 72°C for 2 min, and a final extension at 72°C for 8 min. Two microliters of the PCR products was resolved on a 1% agarose gel to check amplification. One microliter of the PCR products was purified by using exonuclease I (Biolabs)-alkaline phosphatase (USB) for 1 h at 37°C, followed by enzyme inactivation for 5 min at 94°C. One-tenth of the purified PCR products was sequenced on both strands, using the same primers as for the PCR and a BigDye Terminator version 3.1 sequencing kit (Applied Biosystems). The resulting products were analyzed with an Applied Biosystems 3730 automated sequencer. All chromatograms were manually verified to ensure high sequence quality. Sequences of strain CSF 259-93 were retrieved from the whole-genome sequencing project USDA-ARS 1930-32000-002-03 (G. Wiens, personal communication).
Data analysis.
According to MLST standards (43), arbitrary numbers were used for unambiguous identification of the allele types (ATs; particular alleles at particular loci) and sequence types (STs; unique combinations of ATs at the different loci).
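The numbering convention can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `assign_types`, the isolate names, and the toy sequences are hypothetical, and numbers are simply given in order of first appearance.

```python
from typing import Dict, List, Tuple


def assign_types(
    isolates: Dict[str, Dict[str, str]], loci: List[str]
) -> Dict[str, Tuple[Tuple[int, ...], int]]:
    """Return {isolate: (allele profile, ST number)}.

    Numbers are arbitrary: they follow the order of first appearance.
    """
    allele_numbers: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
    st_numbers: Dict[Tuple[int, ...], int] = {}
    typed = {}
    for name, sequences in isolates.items():
        profile = []
        for locus in loci:
            known = allele_numbers[locus]
            seq = sequences[locus]
            if seq not in known:
                # A sequence not seen before at this locus gets the next AT number.
                known[seq] = len(known) + 1
            profile.append(known[seq])
        key = tuple(profile)
        if key not in st_numbers:
            # A new combination of ATs defines a new ST.
            st_numbers[key] = len(st_numbers) + 1
        typed[name] = (key, st_numbers[key])
    return typed


# Toy data (hypothetical isolates and sequences, two loci only).
toy_isolates = {
    "iso1": {"trpB": "ACGT", "tuf": "GGCA"},
    "iso2": {"trpB": "ACGA", "tuf": "GGCA"},
    "iso3": {"trpB": "ACGT", "tuf": "GGCA"},
}
toy_types = assign_types(toy_isolates, ["trpB", "tuf"])
```

In this toy example, iso1 and iso3 share the same profile and therefore the same ST, while iso2 differs at one locus and receives a new AT and a new ST.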
A neighbor-joining tree with a simple Jukes-Cantor model of sequence evolution was constructed using the neighbor program of PHYLIP (version 3.6; J. Felsenstein, Department of Genome Sciences, University of Washington, Seattle, WA) for the graphical representation of the overall sequence divergence between isolates. Bootstrap analysis was based on 1,000 resamplings of the loci.
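For readers unfamiliar with the Jukes-Cantor correction, the following Python sketch computes the corrected distance between two aligned sequences from the observed proportion of differing sites p, as d = -(3/4) ln(1 - 4p/3). It is an illustration only, not PHYLIP code; the function name is hypothetical, and the sketch assumes gap-free sequences of equal length.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned, gap-free sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # p is the observed proportion of sites at which the sequences differ.
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten sites: the corrected distance slightly exceeds p = 0.1.
example_distance = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
```

At the low divergence levels typical of within-species data, the corrected distance stays very close to the raw proportion of differences.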
For a more appropriate network representation of the relationships between STs, E-burst version 3 with default settings (19) and a new allele-sharing network representation were used. The latter involved bidimensional projection of the distance matrix between STs, defined as dij = –log[(Lij + 1)/(L + 1)], where Lij is the number of loci with identical ATs in ST i and ST j, and L is the total number of loci (L = 11). The projection was carried out with the nonmetric multidimensional scaling method implemented in the Sammon function of the R statistical language (39, 44).
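The allele-sharing distance defined above is straightforward to compute from two allele profiles. The Python sketch below is a direct transcription of the formula; the function name and the toy profiles are hypothetical.

```python
import math
from typing import Sequence


def allele_sharing_distance(profile_i: Sequence[int], profile_j: Sequence[int]) -> float:
    """d_ij = -log[(L_ij + 1) / (L + 1)] for two allele profiles of equal length."""
    if len(profile_i) != len(profile_j):
        raise ValueError("profiles must cover the same loci")
    n_loci = len(profile_i)
    # L_ij: number of loci at which the two STs carry the same AT.
    shared = sum(a == b for a, b in zip(profile_i, profile_j))
    return -math.log((shared + 1) / (n_loci + 1))


# Hypothetical 11-locus profiles: an SLV pair and a pair sharing no allele.
base = (1,) * 11
slv = (2,) + (1,) * 10
unrelated = (3,) * 11
d_slv = allele_sharing_distance(base, slv)
d_unrelated = allele_sharing_distance(base, unrelated)
```

The added 1 in the numerator keeps the distance finite for pairs of STs that share no allele: with L = 11, such a pair is at distance log(12), whereas an SLV pair is at distance log(12/11).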
Statistical analysis of the population structure (host-genotype association) was performed using the analysis of molecular variance method with simple Euclidean distance between STs (dij = √nij, where nij is the number of differences at the nucleotide level) and a permutation-based, nonparametric estimate of statistical significance (18).
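The logic of such an analysis can be sketched in Python as a standard one-way analysis of molecular variance with a label-permutation test. This is a simplified illustration, not the software or settings used in the study: the function names are hypothetical, and the input is assumed to be a matrix of squared distances between isolates (for example, pairwise numbers of nucleotide differences) together with one host label per isolate.

```python
import random
from itertools import combinations
from typing import List, Sequence


def among_group_fraction(sq_dist: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """Fraction of the molecular variance attributed to differences among groups."""
    n = len(groups)
    members = {}
    for index, label in enumerate(groups):
        members.setdefault(label, []).append(index)
    k = len(members)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more isolates than groups")

    def ssd(indices: List[int]) -> float:
        # Sum of squared deviations expressed through pairwise squared distances.
        return sum(sq_dist[i][j] for i, j in combinations(indices, 2)) / len(indices)

    ssd_total = ssd(list(range(n)))
    ssd_within = sum(ssd(idx) for idx in members.values())
    var_within = ssd_within / (n - k)
    # n0 is the usual average group size correction for unequal group sizes.
    n0 = (n - sum(len(idx) ** 2 for idx in members.values()) / n) / (k - 1)
    var_among = ((ssd_total - ssd_within) / (k - 1) - var_within) / n0
    return var_among / (var_among + var_within)


def permutation_p_value(sq_dist, groups, n_perm: int = 999, seed: int = 0) -> float:
    """P value obtained by randomly permuting the group labels."""
    rng = random.Random(seed)
    observed = among_group_fraction(sq_dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if among_group_fraction(sq_dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: two hosts, identical genotypes within host, 4 differences between hosts.
toy_groups = ["trout"] * 3 + ["salmon"] * 3
toy_sq_dist = [[0 if a == b else 4 for b in toy_groups] for a in toy_groups]
toy_fraction = among_group_fraction(toy_sq_dist, toy_groups)
toy_p = permutation_p_value(toy_sq_dist, toy_groups)
```

In the toy data all the variance lies among hosts, so the fraction equals 1. With only six isolates the permutation P value cannot be small, which illustrates why the number of isolates matters for such a test.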
The detection of recombination within and between loci involved the homoplasy test (31) and computation of the Hudson and Kaplan lower bound on the minimal number of recombination events in an infinite site model, Rmin (22). The minimal number of apparent homoplasies (h) was computed on the most parsimonious tree found with DNApars (PHYLIP). For the purpose of statistical testing, the fraction of effective sites was set to 18%, and h was recomputed on biallelic sites only. Rmin was computed on biallelic sites by using LDhat (32). A quantitative estimate of the contribution of recombination versus that of mutation in short-term divergence between strains was obtained by examining the relationships between single-locus-variant (SLV) STs according to the method described by Feil et al. (20, 21).
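The idea behind the Hudson and Kaplan bound can be sketched in a few lines of Python. Under the infinite-site model, two biallelic sites that display all four possible allele combinations cannot be explained without a recombination event between them; Rmin is then the largest number of such site pairs whose intervals do not overlap. The sketch below is not the LDhat implementation; function names and toy haplotypes are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def four_gamete_pairs(haplotypes: List[str]) -> List[Tuple[int, int]]:
    """Pairs of sites (i, j), i < j, at which all four allele combinations occur."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


def hudson_kaplan_rmin(haplotypes: List[str]) -> int:
    """Lower bound on the number of recombination events (biallelic sites only)."""
    n_sites = len(haplotypes[0])
    biallelic = [k for k in range(n_sites) if len({h[k] for h in haplotypes}) == 2]
    reduced = ["".join(h[k] for k in biallelic) for h in haplotypes]
    intervals = four_gamete_pairs(reduced)
    # Greedy selection of the maximum number of non-overlapping intervals:
    # intervals sharing only an endpoint count as non-overlapping.
    count, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda pair: pair[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count


# Toy haplotypes coded 0/1 (hypothetical).
rmin_none = hudson_kaplan_rmin(["00", "01", "11"])
rmin_one = hudson_kaplan_rmin(["00", "01", "10", "11"])
rmin_two = hudson_kaplan_rmin(["000", "011", "101", "110"])
```

The bound is conservative: many recombination events leave no incompatible pair of sites, especially when polymorphism is low.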
Nucleotide sequence accession numbers.
Nucleotide sequences have been deposited in GenBank (accession numbers EU428196 to EU428795).
The numbers of SNPs and the nucleotide diversities differed between loci, from 2 SNPs and 0.02% pairwise diversity (at locus ftsQ) to 26 SNPs and 0.95% pairwise diversity (at locus atpA). The vast majority of the SNPs were biallelic (four were triallelic), as expected given the small fraction of polymorphic sites. Only 16 SNPs corresponded to nonsynonymous variations, suggesting that most of the polymorphisms reported here are selectively neutral or almost neutral. The numbers of distinct alleles ranged from 3 (at locus ftsQ) to 18 (at locus tuf), with 17 at locus atpA. The combination of the ATs at the 11 loci allowed 33 different STs to be distinguished among the 50 isolates.
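The two summary statistics quoted per locus can be computed from an alignment as in the following Python sketch. It is an illustration under simple assumptions (gap-free sequences of equal length, one sequence per sampled unit); the function names and toy alignment are hypothetical, and whether the published values were averaged over isolates or over STs is not specified here.

```python
from itertools import combinations
from typing import List


def count_snps(alignment: List[str]) -> int:
    """Number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*alignment))


def nucleotide_diversity(alignment: List[str]) -> float:
    """Average proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(alignment, 2))
    length = len(alignment[0])
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


# Toy alignment of three 4-bp sequences (hypothetical).
toy_alignment = ["AAAA", "AAAT", "AATT"]
toy_snps = count_snps(toy_alignment)
toy_pi = nucleotide_diversity(toy_alignment)
```

Multiplying the returned diversity by 100 gives the percentage of pairwise diversity in the form reported above.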
TABLE 1. Summary of polymorphisms
The second statistic, h, is the minimal number of apparent homoplasies obtained as the difference between the number of visible polymorphisms and the minimal number of mutations on a tree for explaining the sequences. This number is 0 in the absence of recurrent mutations and recombinations. Here, h was greater than 0 for all loci except ftsQ and reached 36 for atpA. Statistical testing showed that each of the 10 positive values of h was significantly greater than expected as a consequence of recurrent mutations alone, indicative of pervasive recombination within loci. Furthermore, the value of h obtained on the concatenated sequence was much higher than the sum of 11 independent values of h (294 versus 105), suggesting additional recombination between loci.
The level of polymorphism at locus ftsQ seemed unexpectedly low compared to those at the 10 other loci. Somewhat surprisingly, comparison with the sequences of two other members of the genus Flavobacterium, F. johnsoniae and F. columnare, indicated that locus ftsQ tended to evolve faster than the other loci (data not shown). The low diversity is therefore unlikely to result from a particularly low rate of molecular evolution. One possibility is that nucleotide diversity at this locus was recently swept away by the fixation of a beneficial mutation (30).
Population structure: clonal complexes, host association, and quantification of recombination.
The relationships between isolates can be represented in several ways. Although recombination precludes the reconstruction of a phylogenetic tree for the 50 isolates, Fig. 1 shows a tree aimed at reflecting the genetic distances between isolates. The limited value of this tree in terms of sequence genealogy is well illustrated by the low bootstrap support of most internal branches. The shape of the tree is well balanced in the sense that none of the isolates stands as "atypical." This gives another indication of the homogeneity of the species already supported by the low level of nucleotide divergence. Examination of the host fish species and the geographical origins of the isolates clearly shows an association between the genotype of the strain and the host fish species. In particular, 17 STs were sampled more than once but each ST always occurred in a unique fish species, sometimes in very distant regions of the world, for instance, ST 10 (rainbow trout) in North America and Europe, ST 12 (rainbow trout) in Chile and Europe, and ST 9 (Coho salmon) in North America and Chile. Conversely, very different STs coexisted in the same geographical area in association with different hosts. For instance, at least four very different STs coexisted in Japan: ST 17 (rainbow trout), ST 13 and ST 30 (Coho salmon), and ST 5 (ayu). The bootstrap analysis further suggested the existence of five groups of STs with low levels of divergence within groups, each group being preferentially associated with a particular fish species. The association between host fish species and genotype was quantitatively assessed by an analysis of molecular variance (18). The fish species accounted for 51.3% of the total molecular variance of the sample in terms of nucleotide differences. This association was highly statistically significant: the quantile associated with the P value 10^-3 was estimated at 15% of the variance by random permutation of the fish labels.
FIG. 1. Tentative tree representation of the relationships between the 50 Flavobacterium psychrophilum isolates. For each isolate, the host fish species, geographical area, and ST number are reported. Internal branches supported by more than 700 out of 1,000 bootstrap replicates are indicated by plain lines, and bootstrap values are provided. The bootstrap support associated with the cluster of strains highlighted with a vertical bar (*) was above 700/1,000 only when ST 6 and ST 21 were excluded from the analysis (the positions of these two sequences in the tree were unstable). The tree is unrooted. Abbreviations of fish names: Rbt, rainbow trout; BrT, brown trout; CuT, cutthroat trout; AtS, Atlantic salmon; ChS, Chinook salmon; CoS, Coho salmon; Ten, tench; Car, carp. Abbreviations of geographical areas: EU, Europe; ISR, Israel; NA, North America; CHL, Chile; AUS, Australia; JPN, Japan.
TABLE 2. Strains and sequence types
FIG. 2. Network representations. (a) eBURST diagram. The three clonal complexes identified as groups of STs connected by SLVs are highlighted. (b) Allele-sharing network. Line styles reflect the number of ATs shared by a pair of STs: >3 ATs, plain black lines; 2 or 3 ATs, plain gray lines; and 1 AT, dotted gray lines. The major AT at locus ftsQ was not considered here, because of its overwhelming preponderance in the population.
A simple quantitative estimate of the contribution of recombination to the generation of new genotypes can be obtained from the study of the events that participated in the diversification process of clonal complexes (20, 21). A total of 11 SLV pairs were examined. The E-burst diagram served as a reference to select the eight SLV pairs and to orient the genetic events in CC1. Three additional SLV pairs belonged to CC2. E-burst inferred that ST 9 was ancestral to this clonal complex, given the number of isolates that ST 9 represented, but we preferred to consider ST 19 ancestral: ST 19 corresponded to the oldest isolate in our collection (i.e., the species type strain). This choice was also more parsimonious in terms of sequence evolutionary changes (data not shown). For the last SLV pair (ST 28 and ST 31), the inferred genetic event was independent of the choice of the ancestral ST. Nine out of the 11 SLV pairs were better explained by recombination events and 2 by mutation events. At the nucleotide level, 52 changes were apparently due to recombinations and only 2 to mutations.
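As an illustration of the first step of this analysis, the Python sketch below lists the SLV pairs in a set of allele profiles and then recomputes the two ratios from the counts given above. The function name and toy profiles are hypothetical; the classification of each pair as a recombination or mutation event follows the cited method (20, 21) and is not reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def single_locus_variants(profiles: Dict[int, Tuple[int, ...]]) -> List[Tuple[int, int, int]]:
    """Return (ST a, ST b, index of the differing locus) for every SLV pair."""
    pairs = []
    for (st_a, prof_a), (st_b, prof_b) in combinations(sorted(profiles.items()), 2):
        differing = [k for k, (x, y) in enumerate(zip(prof_a, prof_b)) if x != y]
        if len(differing) == 1:
            pairs.append((st_a, st_b, differing[0]))
    return pairs


# Toy profiles over three loci (hypothetical ST numbers and ATs).
toy_profiles = {1: (1, 1, 1), 2: (1, 2, 1), 3: (2, 2, 2)}
toy_slv_pairs = single_locus_variants(toy_profiles)

# Counts reported in the text: 9 recombination versus 2 mutation events,
# and 52 versus 2 nucleotide changes.
allele_level_ratio = 9 / 2
nucleotide_level_ratio = 52 / 2
```

With the reported counts, the ratios come to 4.5:1 at the allele level and 26:1 at the nucleotide level.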
Scheme for future MLST.
MLST is now recognized as a reference method for the typing of many bacterial pathogens. For MLST to be effective, the sequences of a few loci have to provide enough information to discriminate a high number of STs. The pattern of sequence polymorphism reported in this study shows that this is indeed the case for F. psychrophilum. It also allows the most informative loci to be selected for this purpose. As the result of a balance between cost and resolving power, most MLST schemes rely on seven loci. We propose that future MLST surveys of F. psychrophilum use the seven loci with the largest amounts of polymorphism as reflected in number of ATs: trpB, gyrB, dnaK, tuf, fumC, murG, and atpA. As shown in Table 2, the combination of ATs at these seven loci captures the 33 distinct STs. The data for the 50 strains at the seven loci have been deposited in a dedicated MLST database (24) hosted at the Institut Pasteur (http://www.pasteur.fr/mlst/).
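Whether a reduced set of loci retains the resolving power of the full set can be checked by counting distinct profiles, as in this minimal Python sketch. The function names and the three-locus toy profiles are hypothetical and do not reproduce the data in Table 2.

```python
from typing import Dict, Iterable, List


def count_distinct_sts(profiles: List[Dict[str, int]], loci: Iterable[str]) -> int:
    """Number of distinct AT combinations when only the given loci are considered."""
    loci = list(loci)
    return len({tuple(profile[locus] for locus in loci) for profile in profiles})


def resolves_all_sts(profiles: List[Dict[str, int]], all_loci, subset) -> bool:
    """True if the subset distinguishes as many STs as the full set of loci."""
    return count_distinct_sts(profiles, subset) == count_distinct_sts(profiles, all_loci)


# Toy allele profiles (hypothetical ATs at three loci).
toy_profiles = [
    {"trpB": 1, "tuf": 1, "ftsQ": 1},
    {"trpB": 2, "tuf": 1, "ftsQ": 1},
    {"trpB": 2, "tuf": 2, "ftsQ": 1},
]
toy_all_loci = ["trpB", "tuf", "ftsQ"]
two_loci_ok = resolves_all_sts(toy_profiles, toy_all_loci, ["trpB", "tuf"])
one_locus_ok = resolves_all_sts(toy_profiles, toy_all_loci, ["ftsQ"])
```

In the toy data, dropping the invariant locus loses no resolution, whereas that locus alone cannot separate any of the profiles.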
Remarkably, the amount of nucleotide polymorphism as measured by the population mutation parameter (θ = 2Nu, here estimated by π) reported in this study for F. psychrophilum is lower than that reported in the MLST datasets for 16 other bacterial pathogens (36). This low level of polymorphism may reflect a small effective-population size, as there is no sign that the diversity is limited by the recent origin of the pathogen. Indeed, Tajima's D statistic tends here to be positive rather than negative (data not shown) (42). This might indicate a relatively narrow original range of host species or a limited range of geographical origin for the pathogen in wild fish. References are lacking, however, as this is the first study on the nucleotide polymorphism of a fish pathogen and of a cold-living freshwater bacterium. Furthermore, almost nothing is known on the variability of mutation rates in the bacterial world, and the low diversity observed in this study could also reflect a low mutation rate.
The biological mechanism responsible for the high rate of recombination in F. psychrophilum remains to be elucidated. Natural competence cannot be excluded but has never been reported in the literature for this bacterium. Conjugative plasmids and transposons that might mediate DNA transfer between strains have, however, been found (1, 11, 15). To quantify the amount of recombination, we computed that 9 out of 11 pairs of STs differing at one locus (SLV pairs) were best explained by recombination (a ratio of 4.5:1 at the allele level). At the nucleotide level, this translated into 52 changes apparently due to recombination and only 2 changes due to mutation (a ratio of 26:1 at the nucleotide level). Although there is statistical uncertainty associated with those estimates of the contribution of recombination to short-term divergence, they can be compared with values reported in the literature for other bacteria. They are much higher than that for a mildly recombinogenic bacterium, such as Escherichia coli (allele level ratio, 0.84:1; nucleotide level ratio, 5.18:1) (38). They are somewhat lower than those reported in the literature for Streptococcus pneumoniae (allele level ratio, 8.9:1; nucleotide level ratio, 61:1) and Neisseria meningitidis (allele level ratio, 4.75:1; nucleotide level ratio, 100:1), both considered highly recombinogenic bacterial species (20, 38). It is, however, important to note that these values reflect not only the recombination rate but also the average divergence time between the sequences that recombine: the greater this evolutionary time, the more each recombination event alters the sequence (see next paragraph). For S. pneumoniae and N. meningitidis, the average genetic diversities (π) computed from the data of reference 20 are 0.012 and 0.040, respectively. These two species thus harbor higher genetic diversity than F. psychrophilum. Unless this is a simple consequence of higher mutation rates, intraspecific divergence times are also longer. Each recombination event may thus induce more allele and nucleotide changes in S. pneumoniae and N. meningitidis than in F. psychrophilum.
A more explicit description of the relationships between genetic diversity and the proportion of nucleotide change due to recombination is instructive. The expected number of nucleotide changes in a sequence of length L in a short evolutionary time t decomposes into L x t x r x T x u changes due to recombination and L x t x u changes due to mutation, where r and u are the rates of recombination and mutation, respectively, at each nucleotide position per unit of time and T is the average evolutionary time that separates the two sequences that recombine. The ratio between the numbers of nucleotide changes apparently due to recombination and mutation therefore depends not on r alone but rather on the composite parameter r x T. Direct information on T is lacking, but T is likely to be correlated with genetic diversity (π), namely, T = π/u, in a panmictic model. The roughly twofold difference in the proportion of nucleotide change due to recombination in S. pneumoniae versus that for F. psychrophilum (61:1 versus 26:1) might thus reflect a difference of similar magnitude in genetic diversity rather than a higher recombination rate.
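The argument of this paragraph can be written out as a short worked equation. The symbols L, t, r, T, and u are those defined above; writing the genetic diversity as π is a notational assumption made here.

```latex
\[
\frac{E[\text{changes due to recombination}]}{E[\text{changes due to mutation}]}
  = \frac{L \, t \, r \, T \, u}{L \, t \, u}
  = r \, T ,
\qquad \text{and, with } T = \frac{\pi}{u}, \qquad
  r \, T = \frac{r \, \pi}{u} .
\]
```

For a given pair of rates r and u, the ratio is therefore expected to scale linearly with the genetic diversity, so that two species with the same recombination rate but different diversities would show different ratios.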
In keeping with a number of earlier typing surveys, the analysis showed a strong relationship between certain types of isolates and their host fish species (see, for instance, references 3 and 11). The sequence data presented here allowed this relationship to be quantified and revealed that it stems from the existence of clonal complexes with marked preferential association with particular host fish species. Although highly significant from a statistical standpoint, the association was not absolute and representatives of the same clonal complex were occasionally found in different host species. Evolutionary and epidemiological hypotheses could explain this association, and it would be premature to draw any conclusion on the relative contributions of these two lines of explanation. The association between ST and host fish species might reflect adaptive niche specialization, but it could also be merely random and maintained by preferential routes of transmission. The occurrence of identical STs on the same farmed fish species in distant regions of the world (Europe and North America, North America and Chile or Japan, and Europe and Chile) revealed by our data indeed indicates a probable role for the international trade of brood fish and fish eggs in the spread of certain STs. The association between ST and host fish species was not limited to farmed species. For instance, ST 22 (three isolates from a 5-year period) was found only in tench and ST 15 (two isolates from a 9-year period) was found only in eel.
Interestingly, this study provides little support for the hypothesis of a global spread of the pathogen from North America. Such a scenario could hardly explain the diversity of strains found in Europe, particularly those found only on wild, nonsalmonid fish. Instead, the pattern of expansion provided by historical records may reflect the spread of two main clonal complexes (CC1 and CC2) by human activities.
An important objective for future research will be to delineate the respective roles of epidemiology and evolution in the association between clonal complexes and host species. The data collected here allowed the design of an efficient MLST scheme based on seven loci that could distinguish the 33 different STs. The sequences of these loci are available through an MLST database that should provide a powerful tool for future surveys needed to better understand the epidemiology and population structure of F. psychrophilum. In particular, the data collected here will allow the results of in-depth studies of any particular collection of strains to be assigned within the broader context of species diversity. Evolutionary niche adaptation may depend on the presence/absence of specific genes, and experimental evidence of the variability of gene repertoires within the species already exists (40). Our understanding of possible adaptive niche specialization will therefore be considerably improved by additional complete genome sequences. The basic knowledge of the population structure acquired in this study will help interpret the forthcoming genome sequences and prioritize the isolates to be sequenced.
This study was supported by the INRA-AIP séquençage program and ANR-07-GMGE-004 FLAVOPHYLOGENOMICS.
Published ahead of print on 18 April 2008.
Present address: INRA, Ecologie et de Physiologie du Système Digestif UR910, F-78350 Jouy-en-Josas, France.