Applied and Environmental Microbiology, April 2004, p. 1999-2012, Vol. 70, No. 4
0099-2240/04/$08.00+0 DOI: 10.1128/AEM.70.4.1999-2012.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Botany, University of Toronto, Toronto, Ontario, Canada
Received 15 September 2003/ Accepted 15 December 2003
P. syringae strains are subclassified into approximately 50 pathogenic varieties, or pathovars, according to the plant host from which they were originally isolated. Although the pathovar nomenclature system has been useful in an agricultural context, its biological justification is questionable. Many individual clones are known to grow quite well on a number of different plant hosts. Additionally, there are essentially no biochemical or physiological distinctions that reliably differentiate P. syringae pathovars (35). Finally, phylogenetic studies of P. syringae indicate that strains with the same pathovar designation are not always closely related (53).
A previous phylogenetic study of P. syringae by Sawada et al. (53) revealed a remarkable degree of congruence between two housekeeping genes (gyrB and rpoD) and two components of the pathogenesis-associated type III secretion system (hrpS and hrpL), leading to the conclusion that the type III secretion system was acquired prior to the diversification of the P. syringae pathovars. The evolutionary history of the argK gene (involved in phaseolotoxin production), on the other hand, was clearly inconsistent with that of the housekeeping genes, lending strong support for an important role for horizontal transfer at this locus (52). On the basis of the four congruent genes, the species was partitioned into three primary monophyletic groups.
The diversity of P. syringae strains has been further explored by the physical mapping of the ribosomal gene cluster (rrn) (52). This analysis revealed that the size and structure of P. syringae genomes vary greatly by pathovar and that large-scale genomic rearrangements are common.
Gardan et al. (20) have used DNA-DNA hybridization to characterize the taxonomic structure of P. syringae. They concluded that one of the major clades of P. syringae (the Sawada group 3 strains, which include the pathovars savastanoi, phaseolicola, and glycinea) was sufficiently distinct that it should be given separate species status as Pseudomonas savastanoi.
Although all of these studies have been informative, a clearer picture of the population structure of the species would be gained by focusing strictly on housekeeping genes. These genes are components of the "core genome" (see below) and are less likely to undergo horizontal gene transfer. Housekeeping genes are particularly useful for clarifying clonal relationships among strains and for assessing the importance of recombination in driving the evolution of clonal lineages.
Recent comparative studies of bacterial genomes have found bacterial evolution to be a composite of forces acting on two largely independent yet intimately intertwined genomes: the "core" and "flexible" genomes (25). The core genome consists of genes ubiquitously found among strains of a bacterial species. These genes typically encode proteins that are essential for the survival of the organism, such as housekeeping genes. Components of the core genome are generally less likely to undergo horizontal gene transfer, and they either evolve neutrally or are selectively constrained. The core genome can be thought of as the clonal backbone of the species, and its constituents can be used to track the evolutionary history of clonal lineages through time.
Unlike the core genome, the flexible genome consists of genes that vary among strains within a species. These genes typically encode proteins that are responsible for adaptation to specific niches, hosts, or environments. The flexible genome may include virulence-associated genes, resistance genes, and genes associated with mobile elements such as bacteriophage, plasmids, or transposons. By definition, the flexible genome evolves largely through horizontal genetic exchange (i.e., through gene acquisition and loss). Since horizontal transfer shuffles and effectively obscures evolutionary histories, the most reliable approach to characterizing bacterial diversity would focus strictly on the core genome.
Multilocus sequence typing (MLST) (12, 40) is a recently developed strain-typing system that focuses strictly on the core genome. This highly accurate and reproducible approach uses the DNA sequences from seven housekeeping genes to differentiate strains and clonal lineages. The choice of seven loci ensures adequate variability so that one can distinguish between the most closely related strains and still be able to track global clonal dynamics. The use of housekeeping genes focuses the analysis on the core genome, thereby revealing the clonal history of the species with the highest possible accuracy.
One of the most powerful aspects of MLST analysis is its ability to detect and measure recombination (15, 16). Recombination has a tremendous influence on bacterial evolutionary dynamics (1, 17, 21, 22). It can cause the rapid diversification of clones when genetic material is introduced from other clonal lineages; conversely, it can homogenize genetic variation when it occurs within individual clonal groups. By reshuffling genetic variation, recombination creates new genotypes that may be better adapted to particular hosts or environments. Recombination has been shown to play a central role in the evolution of several important pathogens (17). An appreciation of recombination is central to understanding how bacterial clones and populations evolve and adapt to new environmental and host challenges.
In this study, we have provided the first MLST analysis of a plant-pathogenic bacterium. We find that P. syringae is surprisingly clonal, contrasting sharply with the extremely labile virulence-associated genes (24, 53). We also show that the core genome is only weakly associated with the host of isolation and pathovar designation and that P. syringae is endemic in plant populations. We hope that this study will provide a community resource and form the foundation for future investigations into P. syringae pathogenesis and host adaptation.
TABLE 1. Strains
FIG. 1. Schematic representation of the positions of the seven housekeeping genes used in this study, based on the sequenced P. syringae pv. tomato DC3000 genome (NCBI accession no. NC_004578). The position of each locus (in base pairs) is given below the gene name.
TABLE 2. MLST primers
DNA sequencing was performed with the CEQ-DTCS Quick Start kit on a Beckman-Coulter CEQ 8000 DNA sequencer according to the manufacturer's instructions. Forward and reverse sequences were obtained by using either the PCR primers or internal primers for each locus (Table 2). These sequences were edited and aligned by using Sequencher (Gene Codes). A total of 399 to 650 bp of overlapping sequence was obtained from the seven housekeeping genes for each strain. Since the amount of data obtained was different for each strain, all sequences were trimmed to include only those regions for which we had data for all strains. Sequences from each locus were aligned by using ClustalW (4) with the "slow-accurate" default alignment parameters and were trimmed to the minimal shared length in GeneDoc (www.psc.edu/biomed/genedoc).
There were 40 unique sequence types (see the next section) among the 60 strains sequenced. Some analyses used the "nonredundant" data set, which did not include sequences that were identical to another sequence unless they had been isolated from different hosts. Other analyses used the "fully nonredundant" data set, which did not include any sequences that were identical to another, regardless of host of isolation.
MLST analyses.
Aligned sequences were analyzed with
the Sequence Type Analysis and Recombination Test package (START)
(32) or with applications
available from the MLST home page
(www.mlst.net).
Allele assignments were made through the MLST nonredundant database
program NRDB, which gave each strain an allele profile known as a
sequence type (ST). These STs were then grouped by similarity by use of
BURST ("based upon related sequence types"). Clonal
complexes were defined as in the work of Feil et al.
(16). A consensus group
of a clonal complex is composed of those strains with the predominant
allelic profile. Single-locus variants (SLVs) are those strains that
differ from the consensus group at a single locus. Double-locus
variants (DLVs) differ from the consensus group at two loci, while
satellites (SATs) differ at three or more loci. A unique clonal group
was defined as comprising strains that were identical at five or more
loci (12,
40). SLVs were used to
estimate the relative rates of recombination and mutation
(16).
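The categories defined above can be illustrated with a short Python sketch. This is not the BURST or START implementation; the function name, the representation of allelic profiles as tuples, and the example profiles are assumptions made purely for illustration.

```python
from collections import Counter

def classify_profiles(profiles):
    """Label each strain relative to the predominant allelic profile.

    profiles: dict mapping strain name -> tuple of allele numbers (one per locus).
    Returns a dict mapping strain name -> 'consensus', 'SLV', 'DLV', or 'SAT'.
    """
    # The consensus is taken here as the most frequent profile
    # (ties are broken by first appearance, an arbitrary choice).
    consensus, _ = Counter(profiles.values()).most_common(1)[0]
    labels = {}
    for strain, profile in profiles.items():
        n_diff = sum(a != b for a, b in zip(profile, consensus))
        if n_diff == 0:
            labels[strain] = "consensus"
        elif n_diff == 1:
            labels[strain] = "SLV"
        elif n_diff == 2:
            labels[strain] = "DLV"
        else:
            labels[strain] = "SAT"
    return labels

# Hypothetical seven-locus profiles; the allele numbers are invented.
example = {
    "strainA": (1, 1, 1, 1, 1, 1, 1),
    "strainB": (1, 1, 1, 1, 1, 1, 1),
    "strainC": (1, 1, 1, 1, 1, 1, 2),
    "strainD": (1, 1, 3, 1, 1, 1, 2),
    "strainE": (4, 5, 3, 1, 1, 1, 2),
}
labels = classify_profiles(example)
```

In this toy input, strainC differs from the consensus at one locus, strainD at two, and strainE at four.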
Phylogenetic analysis.
Analyses were
performed on individual gene sequences as well as on the concatenated
data set. Modeltest (46)
was used to determine the optimal nucleotide substitution model for
each gene. The evolutionary models chosen based on the likelihood ratio
test implemented in Modeltest were found to be more strongly supported
than the alternative Akaike information criterion parameters and were
used for further analyses. Neighbor-joining (NJ) trees were generated
in MEGA, version 2.1
(36), by using the
Tamura-Nei evolutionary model with gamma correction and 1,000 bootstrap
replicates for all sequences. Maximum-likelihood (ML) and
maximum-parsimony (MP) trees were generated in PAUP*,
version 4.0b10 for UNIX
(59), by using the
optimal Modeltest parameters and a starting NJ tree. Split
decomposition (2,
11) analyses were
performed with SplitsTree, version 3.2
(30), by using Hamming
distances, equal edge lengths, and 1,000 bootstrap replicates. Split
decomposition is a parsimony method that does not impose a branching or
tree-like structure on the data set. It permits reticulations or
network structure that may be indicative of past recombination events.
Intragenic recombination was estimated by split decomposition of
individual genes, while total recombination (intra- and intergenic) was
estimated by using the concatenated data set.
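The Hamming distances used as input for split decomposition can be sketched as follows. This is not SplitsTree code; the function names and toy sequences are assumptions, and the distance is computed here as the proportion of differing aligned positions without any special handling of gaps or ambiguous bases.

```python
from itertools import combinations

def hamming_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)

def distance_matrix(alignment):
    """alignment: dict mapping name -> aligned sequence.

    Returns a dict mapping (name1, name2) -> Hamming distance for every pair.
    """
    return {
        (n1, n2): hamming_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }

# Toy alignment (invented sequences).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTACGA"}
dists = distance_matrix(toy)
```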
Phylogenetic congruence between ML gene trees was tested by using the Shimodaira-Hasegawa (SH) (55) and likelihood congruence (LC) tests (14). The SH test determines the likelihood of a data set given alternative trees, while the LC test determines the likelihood of the tree topologies from the different genes relative to random tree topologies. To put it simplistically, the LC test asks if the evolutionary histories of the MLST loci are different, while the SH test asks if the evolutionary histories of the MLST loci are the same. The LC test is significant whenever evolutionary histories are correlated and is rejected when there is free recombination. The SH test is significant only when evolutionary histories are essentially identical and is rejected when there is even a very small amount of recombination. The SH test was implemented via the Phylip program DNAML with no branch lengths (18). The LC test was implemented in PAUP*. The incongruence length difference (ILD) test (5, 45) was also performed to detect differences between the MP trees. The ILD test measures the increase in homoplasy seen when data sets are concatenated.
Population genetic analysis.
Pairwise nucleotide diversity (π) and the number of segregating sites (Watterson's θ) were calculated with DnaSP, version 3.53 (50). Three tests of selection were performed: Tajima's D, Fu's Fs statistic, and the Ka/Ks ratio test. The significance of the Fs statistic was determined by coalescent simulations. These calculations were performed with DnaSP or LDhat or through the START package. Genetic distances were calculated by using a Kimura 2-parameter model with a gamma correction of 0.18 in MEGA, version 2.1 (36).
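For readers unfamiliar with the two diversity statistics, the following Python sketch shows the textbook per-site definitions on a toy alignment. It is not the DnaSP implementation, which applies its own rules for gaps and sites with more than two variants; the function names and sequences are assumptions for illustration.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Pairwise nucleotide diversity: mean pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)

def watterson_theta(seqs):
    """Per-site Watterson estimator: S / (a_n * L), with a_n = sum of 1/i for i < n."""
    n = len(seqs)
    length = len(seqs[0])
    # A site is segregating if more than one base is observed in its column.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)

# Toy alignment of four sequences, 10 bp each (invented data).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
theta = watterson_theta(toy)
```

In the toy alignment, two of the ten sites are segregating and the six sequence pairs differ at eight positions in total.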
A number of
recombination analyses were performed in addition to the MLST analysis.
The index of association (IA) test
(43) and the homoplasy
ratio test (42) were
conducted with the START package. Sliding-window analyses of
phylogenetic congruence were performed using the difference of sums of
squares (DSS) analysis of TOPALi, with a step size of 10 and a window
size of 40
(www.bioss.ac.uk/
iainm/topali/),
and bootscanning (51) as
implemented in SimPlot, version 2.5
(sray.med.som.jhmi.edu/RaySoft/SimPlot/Version2/SimPlot_Doc.html).
Recombination rates were measured by using a
coalescence-based method for detecting linkage
disequilibrium as implemented in LDhat
(44). Gene conversion
model C and a mean tract length of 100 nucleotides were used for
analysis of biallelic sites.
Demographic history.
We explored the
demographic history of our sample by using classic skyline plots
(49,
57). Skyline plots are
graphical, nonparametric tests that use coalescent methods to estimate
the effective size of a population through time. We used our fully
nonredundant concatenated data set to make a fully resolved ML tree in
PAUP*. This tree was analyzed with GENIE, version 3
(48), by using the
differential evolution optimizer and models of constant, exponential,
and expansion population
growth.
Association tests.
We tested for associations between
the genetic data and host of isolation or pathovar designation by using
the analysis of molecular variance (AMOVA) as implemented in Arlequin,
version 2.0 (54).
Pairwise distances were computed by using the Tamura and Nei distance
measure with a gamma correction of 0.18. This correction was based on
the PAUP* ML analysis. One thousand permutations of the data were used
to create the null
distribution.
|
|
|---|
Phylogenetic analysis.
NJ, ML, and MP trees were constructed
independently for each locus and for the concatenated data set. The
trees were rooted with the orthologous sequences from P.
fluorescens K56. All of the phylogenetic methods produced very
similar trees, with identical major monophyletic groups for each data
set. The monophyletic groups corresponded well to the three major
groups identified by Sawada et al.
(53) (Fig.
2). Group 1 contains primarily pathogens of brassicaceous crops such as
radish and cabbage (pathovar maculicola) and tomato pathogens (Table
1). Group 1 also contains
the tomato and Arabidopsis pathogen PtoDC3000, which has been
sequenced by The Institute for Genomic Research (TIGR). Group 2 has the
greatest host diversity, containing strains that are pathogenic to
hosts as diverse as tomatoes, brown rice, and lilacs (pathovars tomato,
aptata, and syringae, respectively). Pea pathogens (pathovar pisi)
appear to belong exclusively to this group. This group also contains
the greatest number of syringae pathovars, a designation often
applied rather indiscriminately. The nonpathogenic strains Cit7 and
TLP2 belong to this group, as well as PsyB728a, which causes bacterial
brown spot of bean and has been sequenced by DOE-JGI. Group
3 is dominated by bean pathogens (pathovars glycinea and phaseolicola).
It also contains cucumber (pathovar lachrymans) and tobacco (pathovars
tabaci and mellea) pathogens. Although not included in our original
group of 60 strains, strain Pph1448A, which causes halo blight on beans
and Arabidopsis thaliana, is also found in group
3. This strain is currently being sequenced by TIGR. Of special
interest in this phylogenetic analysis is the identification of a new
syringae group. Group 4 strains constitute the most basal clade of the
major groups and are strictly pathogens of monocotyledon hosts (rice,
onions, and oats). Of the six monocot pathogens in our study, four are
in group 4. Only two strains do not cluster in one of the four syringae
groups. PmaES4326 and PmaYm7930 are both radish pathogens (pathovar
maculicola) that diverged prior to the diversification of the rest of
the species. PmaES4326 has gained attention as an important model
strain for the study of P. syringae pathogenesis
(10,
23,
24,
26).
FIG. 2. NJ tree of the concatenated data set. The four major groups discussed in the text are labeled. The host of isolation is given next to each strain designation. Bootstrap scores greater than 60 are given at each node. PfK756 is a P. fluorescens strain used as an outgroup.
An LC test was performed on the nonredundant data
set to determine the likelihood of the observed gene tree topologies
relative to random topologies (which would be expected with truly
independent evolutionary histories)
(14). ML trees were
constructed for each gene in PAUP* by using parameters for the model of
best fit as chosen by Modeltest
(46). The likelihood of
these trees given the data for the other housekeeping genes was
determined and compared to the likelihood of 1,000 random trees
generated using the same parameters for the original data. The LC test
showed that the differences in the likelihoods (ΔlnL) among MLST gene genealogies are
significantly smaller than the differences observed between the MLST
trees and random trees (Table
3). The range of likelihood differences among MLST trees was very small
relative to the differences obtained by using the random trees,
supporting strong congruence among the MLST genes.
TABLE 3. LC test for phylogenetic congruence
TABLE 4. SH test for phylogenetic congruence
Split decomposition analysis was used to investigate the influence of recombination on the evolution of each locus (2, 11). Split decomposition constructs a network connection between taxa whenever there is a phylogenetic inconsistency due to homoplasy or recombination. Recombination is generally inferred when competing splits have equal support. Split decomposition analysis of the individual MLST loci recovered the same phylogenetic clusters as the other, more traditional phylogenetic approaches (Fig. 3). This included very strong support (bootstrap, 97%) for the unusual grouping of some group-2 and -3 strains at the gyrB locus as previously mentioned. In general there was very little network structure for the individual genealogies, and that which was seen was primarily localized near the tips, indicating some recombination within groups but not between groups. acn, cts, gyrB, pfk, pgi, and rpoD had no significant reticulations, while gapA had a single significant reticulation in the group-2 strains. Significant reticulations are indicated when there are alternate statistically supported paths in the graph. These are regions of the graph where alternate paths have roughly the same high level of statistical support, as determined by bootstrap analysis. Split decomposition analysis of the concatenated data set showed much more significant network structure (Fig. 4). The majority of the reticulations occurred near the base of the graph, with one supported by bootstrap analysis (100% along one path and 71% along the alternate path). This reticulation corresponds to the recombination event seen in the gyrB locus.
FIG. 3. Split decomposition analysis of each housekeeping gene. Bootstrap scores greater than 60 are given at each node. Significant reticulations are those in which there is a roughly equal high level of support for alternative paths in the graph.
FIG. 4. Split decomposition analysis of the concatenated data set. The four major groups discussed in the text are indicated. Bootstrap scores greater than 60 are given at each node.
Each of the two clonal groups had one SLV. One of these, PlaN7512, differed from the consensus (as represented by Pla1) at a single nucleotide site of the rpoD locus. This was a synonymous change that resulted in the creation of a unique allele. The other SLV, PtoKN10, differed at 34 nucleotide sites from the consensus (as represented by PtoDC3000), also at the rpoD locus. Interestingly, the PtoKN10 rpoD allele was the same as the allele found in the highly divergent bean pathogen PsyB728a. Of the 34 pairwise differences between the PtoKN10 and PtoDC3000 alleles, 2 are nonsynonymous (Ks = 0.3612; Ka = 0.0061). Given the large number of differences between these alleles and the fact that an identical allele is present in the population, it is very likely that the PtoKN10 rpoD allele was introduced into the strain by recombination. Based on these very limited groups we can calculate a recombination-to-mutation ratio (16, 22). Of the two SLVs, one change is presumably due to mutation and the other is presumably due to recombination; therefore, the per-gene ratio would be 1:1. The putative recombination event resulted in 34 changes, while the mutation resulted in a single change, giving a per-site recombination-to-mutation ratio of 34:1. Given the extremely small sample size from which these numbers are derived, they must be interpreted with caution. A better estimate of the recombination-to-mutation ratio, which is based on a coalescent analysis, is presented below.
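The arithmetic behind the per-gene and per-site ratios can be made explicit with a small Python sketch. The function name and the input format are assumptions for illustration; the two events and their site counts are the ones described in the text.

```python
def recombination_to_mutation(slv_events):
    """Compute per-gene and per-site recombination-to-mutation ratios.

    slv_events: list of (kind, n_sites) tuples, where kind is
    'recombination' or 'mutation' and n_sites is the number of
    nucleotide changes attributed to that event.
    """
    rec_sites = [n for kind, n in slv_events if kind == "recombination"]
    mut_sites = [n for kind, n in slv_events if kind == "mutation"]
    if not mut_sites:
        raise ValueError("at least one mutation event is needed to form a ratio")
    per_gene = len(rec_sites) / len(mut_sites)  # events per event
    per_site = sum(rec_sites) / sum(mut_sites)  # changed sites per changed site
    return per_gene, per_site

# PtoKN10: 34 sites, putative recombination; PlaN7512: 1 site, mutation.
per_gene, per_site = recombination_to_mutation(
    [("recombination", 34), ("mutation", 1)]
)
```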
Polymorphism.
The rates of evolution are roughly the same among loci. All loci have 23 to 31 alleles, and the average number of alleles per locus is 27 (Table 5), resulting in more than 10^10 (27^7) potential STs. Watterson's θ and the pairwise nucleotide diversity (π) are also highly consistent across loci (Table 5), with the two statistics ranging from 0.03977 to 0.08547 and from 0.04211 to 0.10035. The average synonymous nucleotide diversity was 0.2411, while the average nonsynonymous nucleotide diversity was 0.0108. The average divergence from P. fluorescens was 0.1792. A sliding-window analysis of the total pairwise nucleotide diversity and divergence from P. fluorescens is presented in Fig. 5. Nucleotide diversity remains fairly constant over the seven loci, while substantial variation is seen in the degree of divergence, particularly in the pfk and rpoD loci. These peaks in divergence are presumably not due to positive selection (see below) but may be due to the relaxation of selective constraints for part of the gene. The genetic distances within and between phylogenetic groups (Fig. 2) are presented in Table 6. The mean within-group distance is 0.017, while the mean between-group distance is 0.150.
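The number of potential STs follows from multiplying the allele counts across loci. A minimal Python check, assuming the reported average of 27 alleles at each of the seven loci (the function name is hypothetical):

```python
from math import prod

def potential_sequence_types(alleles_per_locus):
    """Number of distinct allelic profiles if alleles combined freely across loci."""
    return prod(alleles_per_locus)

# Seven loci, each with the reported average of 27 alleles.
n_potential = potential_sequence_types([27] * 7)
```

The result, 10,460,353,203, is slightly above 10^10.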
TABLE 5. Population genetic analyses
FIG. 5. Sliding-window analysis of polymorphism (thin lines) and divergence from P. fluorescens (heavy lines) for each of the seven housekeeping genes.
TABLE 6. Genetic distances within and between groups
π and θ), while Fu's Fs statistic
(19) is a one-sided test
that looks for an excess of rare alleles. This test is particularly
powerful for detecting genetic hitchhiking and population expansion
(34). Neither of the
tests produced any significant results (Table
5). The final test of
selection performed was the
Ka/Ks test, which measures the
ratio of the nonsynonymous to the synonymous substitution rate. This
ratio should be equal to 1 under strict neutrality, greater than 1
under positive selection, and less than 1 under purifying selection.
All Ka/Ks ratios were
substantially less than 1, ranging from 0.0135 for cts to
0.052 for pfk, indicating that all the loci are under fairly
strong purifying
selection.
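The interpretation rule for the Ka/Ks ratio can be written as a small Python helper. This is only a sketch of the decision rule, not a statistical test; the function name and the tolerance around 1 are assumptions. The example values are the Ka and Ks reported earlier for the PtoKN10 and PtoDC3000 rpoD alleles.

```python
def interpret_ka_ks(ka, ks, tolerance=0.05):
    """Return the Ka/Ks ratio and a qualitative reading of it.

    tolerance is an arbitrary band around 1 used only for this illustration.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        label = "positive selection"
    elif ratio < 1 - tolerance:
        label = "purifying selection"
    else:
        label = "consistent with neutrality"
    return ratio, label

# Ka = 0.0061 and Ks = 0.3612, as reported for the rpoD allele comparison.
ratio, label = interpret_ka_ks(0.0061, 0.3612)
```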
Recombination.
Maynard Smith's IA (43) was used to assess the linkage disequilibrium between alleles among loci. An IA significantly greater than 0 indicates linkage disequilibrium in the sample. IA for the complete MLST data set was highly significant at 3.689, with an observed variance of 2.143 and an expected variance of 0.457. IA calculated for the four major monophyletic groups had roughly the same values, ranging from 3.290 for group 1 to 3.713 for group 2, and therefore showed significant linkage within groups. This indicates a population structure where recombination is limited both within and between groups.
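The index of association is the ratio of the observed to the expected variance in the number of loci at which pairs of strains differ, minus 1. The Python sketch below follows that definition on allelic profiles; it is not the START implementation, which may use a different variance estimator, and the function name and toy profiles are assumptions.

```python
from itertools import combinations

def index_of_association(profiles):
    """I_A = Vo / Ve - 1 for a list of allelic profiles (one tuple per strain)."""
    n_loci = len(profiles[0])
    pairs = list(combinations(profiles, 2))
    # K = number of loci at which a pair of strains carries different alleles.
    k_values = [sum(a != b for a, b in zip(p, q)) for p, q in pairs]
    mean_k = sum(k_values) / len(pairs)
    v_obs = sum((k - mean_k) ** 2 for k in k_values) / len(pairs)
    # h_j = proportion of strain pairs that differ at locus j;
    # the expected variance under linkage equilibrium is the sum of h_j * (1 - h_j).
    v_exp = 0.0
    for j in range(n_loci):
        h_j = sum(p[j] != q[j] for p, q in pairs) / len(pairs)
        v_exp += h_j * (1 - h_j)
    return v_obs / v_exp - 1

# Toy data: two loci whose alleles are perfectly associated.
linked = [(1, 1), (1, 1), (2, 2), (2, 2)]
ia_linked = index_of_association(linked)

# Consistency check against the variances reported in the text.
ia_reported = 2.143 / 0.457 - 1
```

The reported variances reproduce the published value: 2.143/0.457 - 1 rounds to 3.689.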
The homoplasy ratio test determines if there is a significant excess of homoplasies in a phylogenetic tree relative to an estimate of the number of homoplasies expected by repeated mutation in a strictly clonal species (42). Recombination should result in an excess of homoplasies. When this test was applied to the P. syringae data, it actually resulted in homoplasy ratios below zero, which is expected to be the lower bound for a strictly clonal organism (data not shown). Tests on all loci gave nonsignificant results, indicating that there were significantly fewer homoplasies than would be expected under free recombination.
McVean et al. (44) have developed a coalescence-based method for detecting recombination that is implemented in LDhat. This program permits the estimation of the per-locus population recombination rate, 2Ner (Ne, effective population size; r, recombination rate), by using a coalescence-based approximate-likelihood method. Intragenic recombination was estimated by using individual genes, while intergenic recombination was estimated by using the concatenated data set. By use of an average tract length of 100 nucleotides, limited recombination was detected in the seven housekeeping genes, with 2Ner estimated at 3.03 to 10.10 for the individual genes and 11.11 for the concatenated data set (Table 7). The highest recombination rate detected was 10.101, in gyrB. We also tested for recombination within the major monophyletic groups. Very little intragenic or intergenic recombination was detected in groups 1 and 4 (2Ner of the concatenated data = 0), but significant recombination was seen in groups 2 and 3 (2Ner of the concatenated data, 4.04 and 16.16, respectively).
TABLE 7. LDhat recombination analysis
A useful summary statistic is ρ/θ, the ratio of the population recombination rate (ρ = 2Ner) to the population mutation parameter (θ = 2Neµ), where Ne is the effective population size. By dividing these two population parameters, we cancel out the Ne and are left with r/µ, the relative strength of recombination versus mutation in generating genetic variability. Using the values of ρ and θ obtained from LDhat, we see that ρ/θ ranges from 0.121 for pfk to 0.622 for pgi, with an average ρ/θ of 0.252 (Table 7). These numbers are in general agreement with those derived from DnaSP, which uses the Hudson recombination rate estimator (29). The simple
interpretation of these results is that any single nucleotide in the
P. syringae genome is 4 times more likely to change due to
mutation than it is to change due to recombination. When the data set
is broken down into the four major clades, a striking pattern emerges.
Although the mutation rate stays roughly the same across the groups,
the recombination rates are dramatically different. Essentially all of
the recombination in the species appears to be happening in groups 2
and 3, while groups 1 and 4 appear to be almost strictly clonal (Table
7). Split decomposition
analyses of the individual groups also support this conclusion (data
not shown). These results may once again reflect the early
recombination event that occurred between the subset of group-2
(PsCit7, PsyFTRS_W6, Pto2170, and PapG733) and group-3 (Pmy1
and PmsFTRS_U7) strains at or near the gyrB
locus. We attempted to localize putative recombination breakpoints via sliding-window phylogenetic analysis. Both TOPALi, which performs a DSS test on trees generated from the two halves of a window moved across the target sequence, and bootscanning (as implemented in SimPlot), which scans for changes in the phylogenetic relatedness of individual sequences among clades, failed to find any significant recombination breakpoints (data not shown).
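The cancellation of Ne in the ratio of the population recombination rate (2Ner) to the population mutation parameter (2Neµ) can be checked numerically. In the Python sketch below, the function name and the values of Ne, r, and µ are hypothetical and chosen only so that the ratio equals the reported average of 0.252.

```python
def rho_over_theta(ne, r, mu):
    """Ratio of 2*Ne*r to 2*Ne*mu; Ne cancels, leaving r / mu."""
    rho = 2 * ne * r
    theta = 2 * ne * mu
    return rho / theta

# Hypothetical per-site rates; two very different effective population sizes.
ratio_large = rho_over_theta(ne=1e6, r=2.52e-10, mu=1e-9)
ratio_small = rho_over_theta(ne=1e3, r=2.52e-10, mu=1e-9)

# With a ratio of 0.252, a site changes by mutation about 1 / 0.252 times
# as often as by recombination.
fold_mutation = 1 / 0.252
```

Both effective population sizes give the same ratio, and its reciprocal is close to the fourfold figure quoted in the text.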
Demographic history.
An exciting recent development in
coalescent analyses of sequence data is the ability to make estimates
of the sample's effective population size through time. Skyline
plots are coalescence-based analyses of demographic history. These
analyses provide a visual and intuitive method to determine if
population sizes have been constant or changed during recent
evolutionary history. Skyline plots that are roughly parallel to the
x axis indicate a constant population size. Those that drop
off as time increases indicate population expansion. The exact shape of
the curve can be used to infer more details about the nature of the
population size changes
(57). Rapid population
expansion is indicative of epidemic pathogens, while constant
population sizes are indicative of endemic pathogens. A classic skyline
plot was constructed for our fully nonredundant sample by using the
concatenated data. The skyline plot (Fig.
6) shows no indications of dropping off as time increases, thereby
indicating that the sample maintains a fairly constant effective
population size through time. These results are consistent with P.
syringae being an endemic
pathogen.
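The idea behind a classic skyline plot can be sketched in a few lines of Python. For each coalescent interval of a genealogy, the estimate is the interval duration multiplied by k(k-1)/2, where k is the number of lineages present. This is only the per-interval estimator, not GENIE itself; the function name and the toy intervals are assumptions, and extracting intervals from an ML tree is not shown.

```python
def classic_skyline(intervals):
    """Classic skyline estimates from coalescent intervals.

    intervals: list of (n_lineages, duration) ordered from the tips to the root.
    Returns a list of (time_at_end_of_interval, estimate) points.
    """
    points = []
    elapsed = 0.0
    for k, duration in intervals:
        if k < 2:
            raise ValueError("each interval needs at least two lineages")
        elapsed += duration
        points.append((elapsed, duration * k * (k - 1) / 2))
    return points

# Hypothetical durations set to their constant-size expectations,
# so the resulting plot is flat.
toy_intervals = [(4, 1 / 6), (3, 1 / 3), (2, 1.0)]
skyline = classic_skyline(toy_intervals)
```

A flat series of estimates, as in this toy case, corresponds to a plot parallel to the x axis.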
FIG. 6. Skyline plot of the concatenated data set.
TABLE 8. AMOVA
Phylogenetics of P. syringae.
Our
phylogenetic analysis of P. syringae reveals four major groups
of strains, three of which largely correspond to those identified by
Sawada et al. (53). Group
4 is intriguing because it is the most divergent major clade and
contains only pathogens of monocots (rice, oats, and onions). Only two
other monocot pathogens are present in our study; both of these strains
are found in group 2, but they are highly divergent from each other. Of
the nine strains that overlapped between our analysis and that made by
Sawada et al. (53), all
were in agreement with respect to their phylogenetic
grouping.
Host association was found to be relatively weak in the nonredundant sample set. Almost 80% of the total variation was found within populations defined by host of isolation. Alternatively stated, less than 20% of the variation was host specific. These numbers shift fairly dramatically to 61% within-host variation and 39% among-host variation when the full sample set is used, revealing a significant sampling bias. The bias is most apparent with the high level of identity found in the glycinea and maculicola pathovars. It is possible that this bias has a biological basis, but a simpler explanation is that most of the strains came from the same stock center, and many were collected from the same geographic area. This critical weakness must be corrected before a full picture of P. syringae population structure is obtained.
Although sampling biases are almost universally considered negative aspects of studies, in this case these "problems" actually provide tantalizing glimpses into the dynamics of natural bacterial populations. Eight soybean (glycinea) pathogens collected over a 13-year period are genetically identical, while four Chinese cabbage pathogens (maculicola) remained genetically uniform over 12 years. Three strains are genetically identical, yet all were isolated from different hosts; most interesting is the identity between the paper mulberry pathogen PbrKOZ8101 and the tobacco pathogen Pta6606 despite their different hosts and 13 years separating their collection. Finally, the two radish pathogens PmaES4326 and PmaYM7930 are genetically identical despite the fact that the former was collected in the United States in 1965 while the latter was collected 14 years later in Japan. If we take the conservative assumption of 2 generations per day, then 14 years encompass more than 10,000 generations, during which no mutations occurred in the 3,135 bp of sequence we analyzed. This may indicate a stabilized host-pathogen relationship.
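The generation count quoted above is simple arithmetic, shown here as a Python sketch. The function name and the 365-day year are assumptions; the 14 years and 2 generations per day are taken from the text.

```python
def generations_elapsed(years, generations_per_day, days_per_year=365):
    """Number of generations over a period, ignoring leap days."""
    return years * days_per_year * generations_per_day

# 14 years at 2 generations per day.
n_generations = generations_elapsed(14, 2)
```

This gives 10,220 generations, consistent with "more than 10,000".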
Conversely, there are also quite a few cases where strains isolated from the same host turn out to be extremely divergent. The radish pathogen PmaKN91 clusters with the bean pathogens in group 3, while all other maculicola pathovars are in group 1 or are species outliers. There are tomato pathogens both in group 1 (PtoDC3000 and PtoKN10) and in group 2 (Pto2170). The well-characterized snap bean pathogen PsyB728a is found in group 2, while all other bean pathogens are tightly clustered in group 3. There are two widely divergent Japanese apricot pathogens in group 2 (PsyFTRS_W7835 and PsyFTRS_W6601) and a third in group 3 (PmpFTRS_U7805). Finally, although two rice pathogens are in group 4 (PorI_6 and Por36_1), there is a third (PttG733) that clusters tightly with apricot and tomato pathogens and with Cit7, a nonpathogenic isolate from oranges. These instances of host convergence may be due to common features of the flexible genome.
P. syringae species definition. Gardan et al. (20) have proposed that group-3 strains be given separate species status as P. savastanoi. A numerical taxonomy analysis of DNA-DNA hybridization data showed that savastanoi pathovars clustered with glycinea and phaseolicola pathovars and that this cluster was distinct from the syringae pathovars. These hybridization results are largely consistent with our MLST analysis; nevertheless, a cladistic analysis of our data strongly refutes the ascension of pathovar savastanoi to species status. Our analyses reveal that P. syringae group 3 is not monophyletic at the gyrB locus and that it is consistently a sister clade to group 2. Raising group 3 to species status would leave the rest of the species paraphyletic, thereby violating cladistic rules of systematics. If group 3 is to be given species status, then each of the other groups would likewise necessarily have to be given the same status, thus splitting P. syringae into four separate species.
Should P. syringae be split into four distinct species based on this study? We do not believe the data support this proposal. The ecology of all P. syringae strains is very similar: all are commensal and/or pathogens of aerial plant surfaces. Additionally, there are no reliable biochemical or physiological distinctions that differentiate the four groups (35). There are also a small number of core genome alleles shared among strains that belong to different P. syringae groups. Finally, the evolutionary histories of a very large number of noncore genes (e.g., virulence-associated genes such as type III effectors) are highly incongruent with that derived from the core genome and are supportive of extensive horizontal gene transfer among strains (24; D. S. Guttman, unpublished data). In conclusion, given the relatively small size of the present data set and the lack of distinctiveness of the four groups, we believe that splitting the species into four is unjustified at this time.
Recombination and clonality. Recombination plays an extremely important role in bacterial evolution by homogenizing genetic variation within clones and introducing genetic variation between clones. The relative importance of recombination in generating genetic variation and breaking down clonal complexes has been a source of substantial controversy and intensive investigation (14, 15, 21, 22, 41, 43, 56, 61). The extent of recombination in P. syringae has been addressed only once before. Maynard Smith et al. (41) analyzed the multilocus enzyme electrophoresis (MLEE) data collected by Denny et al. (9) from two pathovars of P. syringae. They found extremely high levels of linkage disequilibrium in the total sample and slightly lower, but still significant, levels within each pathovar. There is difficulty in interpreting these data simply due to the imperfect correspondence between linkage disequilibrium and clonality; nevertheless, the extraordinarily high level of linkage seen (nearly three times higher than that of any other species in the study) is prima facie evidence that P. syringae is a highly clonal species.
Our analyses support the conclusion that P. syringae is a highly clonal organism. The high level of congruence between gene trees and the inability of the sliding-window phylogenetic tests to identify recombination breakpoints within loci support a common evolutionary history for loci widely separated around the genome. The relative lack of reticulation in the split decomposition graphs, particularly when individual loci are examined, further supports a relatively limited role for recombination. The coalescence-based estimates of the recombination-to-mutation rate ratio indicate that mutation is perhaps four times more likely to change any particular nucleotide than recombination. The MLST-based recombination analyses are also interesting, but much less reliable. The per-locus rate of recombination was estimated to be equal to the mutation rate, while the per-site recombination rate was 34 times that of the per-site mutation rate. These numbers must be accepted with extreme caution, because this analytical technique was developed for much larger data sets and is probably inappropriate for a data set of the present size. Calculation of Maynard Smith's index of association, IA (41), again is in agreement with the findings of the MLEE study, with highly significant levels of linkage observed in the total sample and within each group. The homoplasy ratio test reached its lower limit, indicating that there was far less homoplasy than would be expected under free recombination. In summary, all of the analyses are in general agreement that recombination is relatively rare in this species. The conclusions from the coalescence-based approach are perhaps most meaningful in this analysis given the size and structure of the data set.
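To make the linkage statistic concrete, the following is a minimal sketch of the index of association as it is commonly formulated: IA = VO/VE - 1, where VO is the observed variance of the number of loci at which pairs of strains differ and VE is the variance expected under linkage equilibrium. The function name, the toy allelic profiles, and the choice to estimate per-locus diversity directly from pairwise comparisons are assumptions made for illustration. This is not the software used in the present study, and it omits the significance test.

```python
from itertools import combinations


def index_of_association(profiles):
    """Return IA = VO/VE - 1 for a list of equal-length allelic profiles."""
    pairs = list(combinations(profiles, 2))
    n_pairs = len(pairs)
    n_loci = len(profiles[0])

    # K: number of loci at which each pair of strains differs.
    mismatches = [sum(a != b for a, b in zip(p, q)) for p, q in pairs]
    mean_k = sum(mismatches) / n_pairs

    # VO: observed variance of K over all pairs.
    v_obs = sum((k - mean_k) ** 2 for k in mismatches) / n_pairs

    # VE: sum over loci of h_j * (1 - h_j), where h_j is the fraction of
    # pairs that differ at locus j.
    v_exp = 0.0
    for j in range(n_loci):
        h_j = sum(p[j] != q[j] for p, q in pairs) / n_pairs
        v_exp += h_j * (1 - h_j)

    if v_exp == 0:
        raise ValueError("no variation in the sample; IA is undefined")
    return v_obs / v_exp - 1


# Toy data (hypothetical): two clones whose alleles co-vary at all three loci.
clonal = [(1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2)]
# Toy data (hypothetical): every combination of two alleles at two loci.
shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)]

print(index_of_association(clonal))
print(index_of_association(shuffled))
```

In the fully linked toy sample, IA equals the number of loci minus one (here 2), its maximum for three loci, whereas the sample containing every allele combination gives a value below zero. Large positive values of this kind are what the text refers to as high levels of linkage.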
How does P. syringae compare to other species studied? Taking all of the analyses together, it appears that the variation-generating potential of recombination is roughly equal to or slightly less than that of mutation in this species. This recombination rate is dramatically lower than that seen in most other species. Neisseria meningitidis has the highest per-nucleotide ratio of the recombination rate to the mutation rate on record, at 100:1 (14), while Escherichia coli has a ratio of approximately 50:1 (22) and Streptococcus pneumoniae has a ratio of 24:1 (14). The lowest ratio on record is that of Staphylococcus aureus, where any nucleotide site is 15 times more likely to be changed by mutation than by recombination (ratio, 1:15) (13).
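The ratios quoted above can be placed on a common scale by converting each recombination:mutation ratio r:m into the fraction of nucleotide changes attributable to recombination, r/(r + m). The sketch below is illustrative only. The P. syringae entry uses the coalescence-based figure of roughly 1:4 from this study, which is an assumption about which estimate to compare, and the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# Per-nucleotide (recombination, mutation) ratios as quoted in the text.
RATIOS = {
    "Neisseria meningitidis": (100, 1),
    "Escherichia coli": (50, 1),
    "Streptococcus pneumoniae": (24, 1),
    "Pseudomonas syringae": (1, 4),    # assumption: coalescence-based estimate
    "Staphylococcus aureus": (1, 15),
}


def recombination_fraction(r, m):
    """Fraction of nucleotide changes due to recombination for a ratio r:m."""
    return Fraction(r, r + m)


# Rank species from most to least recombination-driven.
ranked = sorted(
    RATIOS,
    key=lambda species: recombination_fraction(*RATIOS[species]),
    reverse=True,
)

for species in ranked:
    fraction = recombination_fraction(*RATIOS[species])
    print(f"{species}: {float(fraction):.3f}")
```

On this scale, the species at the top of the list owe nearly all nucleotide changes to recombination, while for P. syringae and S. aureus mutation accounts for most of them.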
Population structure of P. syringae. In synthesizing these analyses, we are left with a picture of a species that is highly clonal. There is essentially no genetic exchange of the core genome among strains on different hosts. The split decomposition analysis reveals a significant network structure only in the concatenated data set, which may correspond to a past intergenic recombination (occurring at loci other than those sequenced) or to the one event seen at the gyrB locus. All of the significant reticulations in this analysis are near the center of the graph. The most likely explanation for this pattern is that early in the origin of the species there was limited genetic exchange between strains, but as the strains diverged and specialized on their respective hosts, they became more reproductively isolated. The result is clonal lineages that are evolving essentially independently of the rest of the species with respect to their core genome. The finding of remarkable genetic homogeneity among soybean, cabbage, or radish pathogens isolated over a dozen years further supports this conclusion.
The remarkable genetic stability of these strains is supported by the analysis of demographic history. The skyline plot clearly indicates that the P. syringae population has maintained a roughly constant size through time. This is an excellent indication that the species is endemic in plant populations and that large-scale (affecting a significant fraction of the total species) outbreaks of new and more virulent pathogens are rare. What cannot be ruled out at this time is smaller-scale, host-specific outbreaks. The emergence and spread of a new strain that is more virulent on a single host would result in a selective sweep, or purge, of the genetic variation within that host-specific population. Importantly, this epidemic would not affect strains on other hosts. We would expect these dynamics to result in genetic homogeneity within host-specific populations and in extensive divergence and perhaps isolation between populations. This is not inconsistent with the present data, but much more extensive natural population sampling would be required to confirm this hypothesis and identify these epidemics.
If the core genome of P. syringae were responsible for determining host specificity, it would be reasonable to assume that the genetic variation in the housekeeping genes would be very tightly associated with the host of isolation. Additionally, since the core genome is effectively clonal, we would expect phenotypic differences between clonal lineages to accumulate as they wandered down their independent evolutionary paths. However, host of isolation explains only 20% of the variation in the core genome. Short of host specificity, one pathovar of P. syringae is largely phenotypically indistinguishable from another. Clearly, factors outside of the core genome must be maintaining the cohesion of the species and must play very significant roles in determining host suitability. Likely flexible-genome candidates for this role include a wide range of virulence factors, such as type III secreted effector proteins, toxins, and resistance genes. Hopefully, the intensive study of P. syringae virulence factors currently under way will shed light on the complex mechanisms used by this important organism to adapt to its diverse set of hosts.
D.S.G. is supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation. S.F.S. is partially supported by an Ontario Graduate Scholarship and a Vedanta Society of Toronto Vivekananda Scholarship.