Previous Article | Next Article ![]()
Applied and Environmental Microbiology, November 2006, p. 7098-7110, Vol. 72, No. 11
0099-2240/06/$08.00+0 doi:10.1128/AEM.00731-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Biology, University of California, Riverside, California,1 The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland,2 The Peter Medawar Building for Pathogen Research, University of Oxford, Oxford, United Kingdom,3 Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, The Marine Biological Laboratory, Woods Hole, Massachusetts,4 Department of Biology, University of Rochester, Rochester, New York5
Received 29 March 2006/ Accepted 13 August 2006
|
|
|---|
|
|
|---|
The importance of Wolbachia strains ranges from their effects on the reproductive biology, ecology, and evolution of their hosts to their potential use in biological control of pest insects and biomedical applications (35, 54, 61). Wolbachia strains can extensively manipulate host reproduction by means such as inducing parthenogenesis, feminizing genetic males, killing male embryos, or causing cytoplasmic incompatibility between gametes (16, 45, 46). In filarial nematodes, the host dependence on Wolbachia for propagation has generated great interest in the medical field for targeting these bacteria in the treatment of filariasis (10, 38).
Wolbachia strains were first found in the mosquito species Culex pipiens and designated Wolbachia pipientis (15). These bacteria were subsequently found to be widely distributed in arthropods. However, additional species have not been named, pending improved understanding of the genetic variation of this genus (6, 26). At present, supergroup designations are routinely used to describe the major phylogenetic subdivisions of this bacterial group. So far, eight supergroups have been designated (supergroups A to H) primarily on the basis of data from the 16S rRNA, ftsZ, and wsp (Wolbachia surface protein [WSP]) genes (31, 59, 62). Most of the supergroups are found in arthropods (supergroups A, B, E, F, G, and H), and the majority of insect Wolbachia strains belong to supergroups A and B. Fully annotated genomes of the following two Wolbachia strains are available: wMel, a supergroup A strain from the arthropod host Drosophila melanogaster (60), and wBm, a supergroup D strain from the nematode host Brugia malayi (9).
Despite increasing studies of Wolbachia and a growing list of infected arthropod species, researchers still lack a standard and reliable system for typing and quantifying strain diversity. A strain typing system was previously developed using the Wolbachia surface protein gene, wsp (62). However, evidence of extensive recombination (2, 57) and strong diversifying selection (1, 21) at this gene make it unsuitable for reliable strain characterization if it is used alone. Furthermore, recent detection of recombination both within and between Wolbachia genes (3) suggests that a single-locus approach to strain characterization may be misleading. A multigene typing approach has recently been used to type Wolbachia strains (32). This study focused on strains found in Drosophila hosts and demonstrated the utility of multigene typing for Wolbachia strains.
Here we developed a standard multilocus sequence typing (MLST) system for Wolbachia, which works broadly with Wolbachia strains found in diverse singly infected host species. This system includes a web-accessible central database and toolkit for storage, organization, and analysis of data. The data are fully portable between laboratories and allow extensive comparative analyses (22). This accurate strain typing system should provide the genetic framework for tracing the movement of Wolbachia globally and within insect communities and for associating Wolbachia strains with geographic regions, host features (e.g., ecology and phylogeny), and phenotypic effects on hosts.
Originally used for global epidemiology and surveillance of recombinant pathogenic bacteria (27), the MLST approach defines a strain as a sequence type (ST) (equivalent to a haplotype) on the basis of a combination of alleles (i.e., allelic profile) at a sample of housekeeping genes. Similarity relationships among strains are described using the nucleotide sequence identity of alleles at each locus. The use of combinations of alleles as molecular markers to genotype strains means that recombination is not an impediment to strain characterization, and allelic information can be used to readily detect recombination between genes and rates of recombination within a population (17). Phylogenetic and other clustering algorithms can be applied to individual or concatenated gene data sets for further analyses of strain relationships, although caution in interpretation of phylogenetic relationships is necessary due to recombination issues.
The MLST scheme developed for Wolbachia indexes variation at five conserved genes (ftsZ, gatB, coxA, hcpA, and fbpA). The PCR primers used robustly amplify loci of strains belonging to supergroups A and B and potentially amplify loci of strains belonging to other supergroups as well. Specific primers for amplification of supergroup A and B strains coinfecting the same host individual are also described here. A profile database and an isolate database were generated to store and distribute the sequences and the strain and host information, respectively (22; http://pubmlst.org/wolbachia/). New strains and profiles can be submitted to the curator for inclusion in the databases, which thus represent a focal point for future storage and management of all Wolbachia strain data.
Given the current extensive use of wsp sequences for characterizing Wolbachia strains (25, 30, 36, 43, 53) and the suspected role of the Wolbachia surface protein in the host-symbiont interactions (1, 62), use of this gene as an additional optional strain marker is proposed. A WSP typing system was developed based on the amino acid sequences encoded by the four hypervariable regions (HVRs) of this gene (2), and a separate database for storing both nucleotide sequences of the wsp gene and amino acid sequences of single HVRs was generated. WSP typing is analogous to antigen protein typing used for pathogenic bacteria (33) and can complement the MLST information.
Thirty-seven strains representing supergroups A, B, D, and F from diverse singly infected host species were typed by using the MLST and WSP systems. MLST unambiguously identified, differentiated, and grouped genetically similar strains through comparisons of their allelic profiles. The MLST data revealed considerable diversity within the genus Wolbachia and also identified horizontal movement of strains between hosts and shifts in Wolbachia phenotypic effects on hosts (e.g., cytoplasmic and parthenogenesis induction).
|
|
|---|
Amplicons were generated using standard PCR conditions with HotStarTaq (QIAGEN) according to the manufacturer's recommendations and with each
64-fold degenerate primer at a concentration of 0.5 µM. The forward and reverse primers had 5' tags of M13 forward and reverse primer sequences, respectively. These tags served as anchors for the degenerate primers during amplification and were subsequently used for sequencing. Amplification reactions were initiated with 15 min of incubation at 95°C, followed by 50 cycles of 95°C for 15 s, 55°C for 30 s, and 72°C for 1 min. The amplification reaction mixtures (8 µl) were treated with 0.5 U shrimp alkaline phosphatase and 1.0 U exonuclease I (Amersham) in the buffer supplied. Sequencing reactions were performed at the J. Craig Venter Joint Technology Center, Rockville, MD (an affiliate of The Institute for Genomic Research [TIGR], Rockville, MD) with the M13 forward and reverse primers. Assembly was done with the TIGR assembler (47), and sequences were manually curated with the multiple-sequence editor Cloe (a multiple-sequence editor heavily integrated into the TIGR database). Alignments were initially generated using CLUSTALX (51) and then manually edited in MacClade 4.08 (http://macclade.org) and Bioedit v. 7.0.4.1 (14).
From the initial collection of 46 loci, five conserved genes were chosen for MLST on the basis of the following features: (i) presence throughout the sequenced Rickettsiales order; (ii) a single copy in the wMel genome (60); (iii) wide distribution in the wMel genome (Fig. 1); and (iv) evidence of strong stabilizing selection within the genus Wolbachia (average ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site [Ka/Ks], <<1). These criteria conform to the standard locus requirements for an MLST system (27, 52). The five genes chosen for the MLST scheme are gatB, coxA, hcpA, ftsZ, and fbpA (Table 1). Only one of these genes, ftsZ, was used in previous studies (59, 32). Briefly, the gatB gene encodes aspartyl/glutamyl-tRNA amidotransferase subunit B that synthesizes charged tRNAs in organisms lacking certain tRNA synthetases; coxA encodes a catalytic subunit of cytochrome oxidase of the respiratory chain; hcpA encodes a functionally uncharacterized protein whose length and amino acid sequence are highly conserved in many bacteria, which is designated NCBI COG0217 (49, 50); ftsZ encodes a protein involved in bacterial cell division; and fbpA encodes fructose-bisphosphate aldolase that in Wolbachia is likely to be involved in gluconeogenesis (8, 9). The surface protein WSP was included in this study as an additional marker for strain typing.
![]() View larger version (30K): [in a new window] |
FIG. 1. Map of the wMel chromosome, showing the locations of the five MLST loci and wsp (inner circle). The outer and middle circles show the open reading frames on the plus and minus chromosomal strands, respectively.
|
|
View this table: [in a new window] |
TABLE 1. MLST and wsp loci and primer features
|
PCR amplification was performed by using 20-µl mixtures containing each deoxynucleoside triphosphate at a concentration of 0.2 mM, 1.5 mM MgCl2, each primer at a concentration of 1 µM, 0.5 U Taq DNA polymerase, 1x PCR buffer, and 1 µl DNA. Reactions were initiated by incubation at 94°C for 2 min, which was followed by 36 cycles of 94°C for 30 s, the optimal annealing temperature (see below) for 45 s, and 72°C for 1.5 min, a final elongation step at 70°C for 10 min, and a final hold at 4°C. The optimal PCR annealing temperature was 53°C for hcpA, 54°C for both gatB and ftsZ, 55°C for coxA, and 59°C for fbpA and wsp. PCR products were purified using Montage PCR centrifugal filter devices (Millipore) and were bidirectionally sequenced. An internal fragment of each PCR product was specifically selected for MLST (Table 1). Ambiguous bases and single-point polymorphisms were assessed by performing a second amplification and resequencing. All MLST and wsp sequences are presented in the coding reading frame.
Given the great genetic diversity of Wolbachia strains, alternative primer amplification strategies may be required if the standard primers (Table 1) do not work. For instance, use of tags for the MLST primers can improve their performance. The 64-fold degenerate primers used initially as part of a larger study are also provided for four of the five genes at the MLST website. For PCR amplification these primers can be used singly or in combination with the MLST primers using a nested strategy. Alternative ftsZ primers, ftsZuniF and ftsZuniR, which were designed previously (26), can also be used. Additional primers to help with amplification of more divergent strains and supergroup A and B strains occurring in doubly infected individuals have been also designed. All the primers described above and the PCR protocols are posted at the MLST website (http://pubmlst.org/wolbachia/). As experience with MLST sequencing of divergent strains accumulates, the information will be provided at the website.
Wolbachia strains.
MLST and wsp primers were tested with 42 host species harboring single infections with strains belonging either to supergroup A, B, D, E, or F or to an unassigned supergroup (Table 2). Single-infection status was confirmed during sequencing. Wolbachia infection of the termite species Incisitermes snyderi was not reported previously, and the single-infection status of this relationship was assessed using supergroup A- and B-specific wsp primers (61). In addition, primer performance was evaluated for new species with previously unreported infections, including several species of the spider genus Agelenopsis, the hymenopteran Neivamyrmex opacithorax, the embiopteran Haploembia solieri, and unidentified species of the bristletail family Meinertellidae (Archaeognatha). In the latter case, Wolbachia 16S rRNA gene general primers were used as a control (56). In order to confirm the robustness and consistency of PCR results obtained in different laboratories, all primers were independently tested with a variable sample of strains by multiple laboratories. The supergroup A- and B-specific primers were tested with insects known to harbor double infections with supergroup A and B strains, including Nasonia longicornis (IV7), Nasonia giraulti (RV2), Nasonia vitripennis (R511), and Aedes albopictus; with insects singly infected with supergroup B strains, including N. vitripennis (4.9) and Protocalliphora sialia (NY); and with insects singly infected with supergroup A strains, including N. giraulti (16.2), N. longicornis (2.1), N. vitripennis (12.1), Drosophila ananassae (Hawaii), Drosophila simulans (wRi), D. simulans (wAu), and Drosophila melanogaster (wMel). Specific amplification of the group-specific sequences in doubly infected individuals (Nasonia species) was assessed by comparison with sequences obtained from experimentally separated cultures of the two strains. The primers were also tested with the following three N. longicornis strains carrying multiple supergroup A (A1A2) and/or B (B1B2) strains: N. longicornis strain PE 18bx5 (infection status, A1A2B1B2), strain PE 18bx4 (A1A2B1), and strain BTPE 18bx5 (A1A2B2). Electropherograms were inspected for the occurrence of double peaks to determine whether the use of the primers could reveal multiple infections by members of a single supergroup.
|
View this table: [in a new window] |
TABLE 2. Wolbachia strains analyzed
|
When the complete set of the five MLST gene sequences cannot be obtained for a strain, single gene alleles and partial MLST allelic profiles can be submitted to the database. Partial data provide useful allele diversity information, allowing the profile database to grow. A strain cannot be assigned to an ST until full MLST data are provided, but it can receive an identification number and all specific strain and host information can be submitted to the isolate database.
WSP typing.
In addition to the ST, each strain was characterized based on the amino acid motifs of the four HVRs of its WSP sequence. A relatively conserved set of amino acid motifs are present in each of the four HVRs, and there is shuffling of HVR motifs among strains (2). The use of WSP is analogous to the use of antigens for serotyping pathogenic bacteria (33) and is meant to be a complement to MLST, not a substitute for MLST. Specifically, each WSP amino acid sequence (amino acids 52 to 222 with respect to the wMel sequences) was partitioned into four consecutive sections whose breakpoints fell within conserved regions between the hypervariable regions (2), as follows: HVR1 (amino acids 52 to 84), HVR2 (amino acids 85 to 134), HVR3 (amino acids 135 to 185), and HVR4 (amino acids 186 to 222). Each section encompassed one of the four HVR motifs and a portion of the two conserved flanking regions. For simplicity here we refer to each section as an HVR. Each unique HVR haplotype was given a number. Each WSP sequence was thus defined as a combination of four numbers, its WSP profile, corresponding to the four HVR haplotypes.
Features of the Wolbachia databases.
The following three separate databases have been generated for storing and managing all Wolbachia data: the profile, isolate, and WSP databases. The profile database contains the MLST allele sequences, allelic profiles, and STs. The isolate database stores information about the strains and host strain biological information (e.g., taxonomy, natural history, collection date, and locality). Each strain is designated by an identification number and a code consisting of the first letter of the host genus name, followed by the first three letters of the host species name, underscore, a letter indicating the supergroup to which it belongs, underscore, and any further specific information about the strain needed for discrimination. For example, Dsim_A_wRi indicates the Wolbachia strain from D. simulans belonging to supergroup A found in Riverside County (CA) currently referred to as wRi. The isolate database allows storage of accession numbers for sequences other than the sequences for the MLST genes that are available for a specific strain. The WSP database stores both nucleotide and amino acid sequences for wsp alleles and HVRs. All of the databases are reciprocally linked and can be accessed at the Wolbachia MLST website (http://pubmlst.org/wolbachia/).
Analyses of genetic variation and recombination.
For each locus, the number of variable sites (VI), G+C content, level of nucleotide diversity per site (Pi), and Ka/Ks were calculated using DNAsp v. 4.0 (41). Significant differences in genetic divergence for loci for groups of strains having one identical allele were determined using
2 analysis.
Recombination analyses were performed using MaxChi (28) with single and combined data sets from each gene alignment. The parameters were set as follows: sequences were considered linear, the highest acceptable P value cutoff was 0.01, a Bonferroni correction was applied, consensus daughter sequences were found, gaps were included, different window sizes of variable sites were tested (10, 20, and 30 VI), and 1,000 permutations were performed. All recombination events detected by the program were visually inspected for the alignment, and only clearly recombinant strains were listed.
Tree reconstruction.
Strict maximum likelihood (ML) inference was used to reconstruct phylogenies based on single and concatenated MLST genes (PAUP v. 4.01). The appropriate model of evolution was estimated with Modeltest v. 3.06, and the best likelihood score was evaluated with the Akaike information criterion (37). The models selected were GTR+I+G for gatB, wsp, and the concatenated MLST gene data set and GTR+G for coxA, hcpA, ftsZ, and fbpA. Bayesian maximum likelihood inference and maximum parsimony were used to reconstruct phylogenies based on the concatenated MLST data set and wsp data. Bayesian analyses were performed using MrBayes, v. 3.1.1 (18), with 106 generations and a sample frequency of 100. The first 5,000 trees were discarded as burn-in, and 50% majority rule was applied to the remaining set of trees. Maximum parsimony bootstrap analyses were performed with PAUP v. 4.01 (48) using 1,000 replicates and 100 random-addition replicates per bootstrap replicate, and a 50% majority rule bootstrap was applied.
A neighbor-joining tree was reconstructed from the 37 MLST allelic profiles using the START2 program (23). An unweighted pair group method with arithmetic mean dendrogram was reconstructed from the matrix of pairwise similarities for the 36 WSP profiles (START2 program). START2 and a suite of publicly available bioinformatics software for MLST data analysis can be found at the MLST home page (http://pubmlst.org/).
Analysis of congruence between trees.
We used the Shimodaira-Hasegawa test (42) to evaluate the significance of topological incongruence among the ML trees based on the five MLST genes and the concatenated data set. The lnL scores and significance values for nucleotide substitution were optimized separately for all topologies considered. Differences in lnL scores between the best ML topology and an alternative topology were calculated for each data set. The significance of these differences was measured using a bootstrap approach with RELL and full optimization for 1,000 replicates (PAUP v. 4.01).
Nucleotide sequence accession numbers.
All MLST and wsp sequences generated in this study have been deposited in the GenBank database under accession numbers DQ842268 to DQ842486.
|
|
|---|
The primers were sufficiently degenerate to also amplify divergent strains not belonging to supergroups A and B, such as Wolbachia strains from the nematode B. malayi (supergroup D) and the bed bug Cimex lectularius (supergroup F). For other divergent strains the primers worked well for only some of the loci. All loci except ftsZ were amplified from Wolbachia strains in the termite Zootermopsis angusticollis (supergroup H) and the flea Ctenocephalides felis. All loci except coxA were amplified from Wolbachia strains in the pseudoscorpion Cordylochernes scorpioides and the collembolan Folsomia candida (supergroup E). Finally, hcpA, fbpA, and ftsZ were amplified from Wolbachia strains in the termite Coptotermes acinaciformis (supergroup F). The wsp primers worked with all the strains mentioned above except the strain in F. candida. All species whose infection status was previously unknown gave results consistent with 16S rRNA gene results. Only for the embiopteran H. solieri were the results discordant; this insect tested positive for infection with a Wolbachia-specific 16S rRNA gene, but there was no amplification with MLST and wsp primers.
When distinct supergroup A and B strains coinfected the same individuals, the MLST supergroup A- and B-specific primers successfully amplified the target supergroup A and B type sequences from all the doubly and singly infected species examined. Group-specific MLST primers also amplified both supergroup A and B strain products from the three N. longicornis strains carrying multiple supergroup A and B strains. Analysis of the electropherograms showed unambiguous double peaks for the coxA, gatB, and wsp genes, revealing the multiple infections for each group. Based on the electropherogram pattern of polymorphic sites, specific restriction enzymes were used to digest the PCR products, and the results confirmed the presence of multiple sequence products (data not shown). This illustrates how the MLST gene set can be used to detect multiple infections in individual hosts.
Allelic variation at MLST loci.
A total of 37 strains were characterized by MLST, including 35 strains belonging to supergroup A or B plus strains from B. malayi (supergroup D) and C. lectularius (supergroup F) (Table 2). In the 37 strains analyzed, there were 25 ftsZ alleles, 26 gatB alleles, 27 coxA alleles, 28 fbpA alleles, and 31 hcpA alleles (Table 3). Comparative analysis of the genetic divergence across loci indicated that ftsZ was the least polymorphic locus, with only 24.8% of the sites showing variation (VI), the lowest number of alleles (25), and the lowest level of nucleotide diversity per site (Pi, 0.065). The most polymorphic locus was hcpA, with 36.5% VI and the highest number of alleles (31). The greatest nucleotide diversity per site was found in fbpA (Pi, 0.092). All genes showed an average Ka/Ks of <<1, indicating that the genes were subject to stabilizing selection, conforming to the general requirements for MLST loci. Comparison of the genetic polymorphism between supergroups A and B at the five loci (Table 3) led to three main observations: (i) the percentage of VI for all five loci was always higher in supergroup B than in supergroup A; (ii) the number of alleles was generally higher for supergroup B (except for gatB); and (iii) the Ka/Ks was consistently <<1 for both supergroups.
|
View this table: [in a new window] |
TABLE 3. Genetic features and occurrence of recombination at the five MLST loci
|
As a likely result of recombination both within and outside the gene boundaries (intra- and intergenic recombination, respectively), some strains were extremely similar at one locus and highly divergent at another. This disparity in divergence was visualized by plotting the genetic distances for loci for pairs of strains that were identical at one or two loci (Fig. 2). The levels of genetic divergence at nonidentical loci estimated for pairs of strains that exhibited identity at gatB were remarkably high in some cases (Fig. 2). For instance, for Ostrinia scapulalis and N. vitripennis (B) there was about 11% divergence at fbpA. Similar values were found for O. scapulalis and Acraea encedon, for Telogryllus taiwanemma and N. vitripennis(B), for T. taiwanemma and A. encedon, for N. vitripennis(B) and Chelymorpha alternans, and for N. vitripennis(B) and A. encedon. These values are significantly greater than the average level of divergence for this locus (2.7%) based on the pairs of strains with identity at another locus (P < 0.05). Conflicting relationships among strains could also be visualized (Fig. 2); N. vitripennis B and C. alternans were very similar based on their coxA alleles (0.2% divergence) but were very divergent based on their fbpA alleles (11.4%). Conversely, N. vitripennis(B) and A. encedon were very similar based on their fbpA alleles (0.2%) but were very divergent based on their coxA alleles (8.2%). These inversions of relationships indicate that there was recombination among divergent strains.
![]() View larger version (33K): [in a new window] |
FIG. 2. Pairwise divergence at MLST loci. Thirty-three strain pairs identical at one or two loci were used to show the variation in similarity across loci. The most striking pattern was found for strains identical at gatB (vertical bar); some of the strains showed remarkable divergence in the fbpA and coxA alleles, likely associated with recombination events. The arrows indicate examples of contradictory relationships inferred from the results for the five genes involving N. vitripenni (B), C. alternans, and A. encedon strains (values are circled; see text for explanation).
|
100%. However, based on coxA phylogeny, the E. formosa strain clustered with the Acraea eponina strain (data not shown) and was separated from the P. sialia (B1) strain by a remarkable genetic distance (Fig. 2). Consistent with the effects of recombination, these two strains also did not cluster based on the wsp phylogeny (Fig. 4). Hence, the concatenated gene approach alone can be useful for identifying very closely related strains and to resolve major supergroups. However, it cannot be used to interpret more distant phylogenetic relationships within a supergroup, even when clades are highly supported, because of artifacts resulting from recombination among genes. Combined analyses using allelic profiles, single-gene phylogenies, and concatenated gene phylogenies are advisable for any inference of strain relationships. |
View this table: [in a new window] |
TABLE 4. Results of a Shimodaira-Hasegawa test of alternative tree topologies for the WolbachiaMLST genes
|
![]() View larger version (21K): [in a new window] |
FIG. 3. Bayesian likelihood inference phylogeny based on the concatenated data set for the five MLST loci (37 strains, 2,079 bp). Groups of strains sharing at least three MLST alleles are highlighted. Arrows and asterisks indicate two examples of strain pairs whose predicted relationships are highly discordant with the wsp phylogeny (Fig. 4). Posterior probability (left) and parsimony bootstrap (right) values are indicated at major nodes if they were supported by both clustering algorithms.
|
![]() View larger version (21K): [in a new window] |
FIG. 4. Bayesian likelihood inference phylogeny based on wsp sequences of the 36 strains analyzed by MLST (without the T. deion strain). ST complexes and strains with identical STs identified by MLST are highlighted. Arrows and asterisks indicate two examples of strain pairs whose predicted relationships are highly discordant with the MLST phylogeny (Fig. 4). The supergroup placement of the A. sparsa and S. invicta strains based on MLST phylogeny (in parentheses) (Fig. 3) is not supported by the wsp-based phylogeny. Posterior probability (left) and parsimony bootstrap (right) values are shown at major nodes if they were supported by both clustering algorithms.
|
![]() View larger version (25K): [in a new window] |
FIG. 5. Neighbor-joining tree based on the MLST allelic profiles (right columns) of 37 Wolbachia strains. ST complexes and strains with identical STs are highlighted (STs shared by strains are indicated by arrowheads). Only closely related strains predicted by MLST were also predicted by the concatenated MLST phylogeny (Fig. 3), WSP profiles (Fig. 6), and wsp phylogeny (Fig. 4).
|
A total of three ST complexes were identified based on the criteria used (three shared alleles with a central ST) (Fig. 5). The ST-13 complex included supergroup A strains present in Drosophila species and the parasitoid wasp N. longicornis (A) (three shared alleles). The five strains differed by only five nucleotide polymorphisms based exclusively on the concatenated alignment of the five MLST genes (2,079 bases). The central ST (ST-13) was identified as the Drosophila recens ST based on BURST analysis. The other two complexes included only two strains each and were temporarily designated ST-a and ST-b, until additional STs that allow identification of the central ST become available. ST-a included two supergroup B strains, the D. simulans wNo strain and the D. simulans wMa strain (three shared alleles, three polymorphic sites), and ST-b included two supergroup A strains, the Drosophila orientacea and Drosophila neotestacea strains (four shared alleles, one polymorphic site).
WSP characterization.
The HVRs of the WSP protein were employed as an additional, optional marker to assess strain diversity. A new method for representing WSP sequence relationships is proposed here, which can be used in combination with standard distance-based methods. This method takes into account the extensive intragenic shuffling of the four HVRs (2) and can be used to readily detect allele shuffling based on unique identifiers for each HVR. Since the majority of amino acid changes among WSP sequences occur in the four HVRs, these motifs can be used as signatures for discrimination of the different WSP protein types (see Materials and Methods).
The sample of strains characterized by MLST was also typed by WSP, with the exception of the T. deion strain, for which amplification of wsp failed. Figure 6 shows the WSP profile, a combination of the four HVR amino acid haplotypes. For 36 strains, 33 unique WSP profiles were identified. Five groups of strains shared at least three HVR haplotypes (Fig. 6), and three of them had identical WSP profiles. Nine of the 33 WSP profiles (Fig. 6) did not share any haplotype with any other WSP profile. The overall numbers of unique haplotypes per HVR were similar, ranging from 24 to 27. However, the number of nonunique haplotypes within an HVR (i.e., haplotypes that occurred in multiple WSP profiles) varied for the four HVRs; there were only three haplotypes for HVR1 (haplotypes 1, 10, and 17), compared to seven haplotypes for HVR2, six haplotypes for HVR3, and nine haplotypes for HVR4. Motif shuffling was apparent in the data set. Indeed, several strains exhibited identity at individual HVRs alternatively with different sets of strains. For example, N. giraulti (A) shared haplotype 1 in HVR1 with wMel, N. longicornis (A), D. simulans wAu, D. recens, Drosophila innubila (A), D. orientacea, D. neotestacea, and A. albopictus. However, in HVR2 it had a unique haplotype sequence (haplotype 23), while in HVR3 it had the same haplotype sequence (haplotype 15) as D. simulans wRi; finally, in HVR4 it had the same haplotype sequence (haplotype 25) as Camponotus pennsylvanicus.
![]() View larger version (27K): [in a new window] |
FIG. 6. Unweighted pair group method with arithmetic mean dendrogram based on the WSP profiles (right columns) of 36 Wolbachia strains, corresponding to the same sample typed by MLST (without the T. deion strain). Groups of strains sharing at least three HVR haplotypes are highlighted. In some cases groups of strains identified by MLST (Fig. 5, highlighted strains) were also closely related based on their WSP profiles. In strains belonging to the ST-13 complex there was a clear recombination signature in HVR4, which resulted in remarkable genetic divergence in this region among the five strains.
|
As found with the MLST profiles, no identical HVRs were observed for the supergroup A and B strains, although some HVRs were closely related. Nine of 16 supergroup A strains had the same haplotype sequence in HVR1 (haplotype 1). Two strains, from the tortoise beetle Acromis sparsa and the ant Solenopsis invicta, had four unique HVR haplotype sequences. Both strains belonged to supergroup A, as supported by phylogenetic analyses based on MLST data (both the posterior probability and bootstrap values were 100) (Fig. 3) and MLST profiles (both strains exhibited identity at least at one allele with another supergroup A strain) (Fig. 5). However, based on the wsp gene phylogeny (Fig. 4), the strain from A. sparsa was excluded from both supergroup A and supergroup B and formed a divergent lineage, while the strain from S. invicta clustered significantly in supergroup B (both the posterior probability and bootstrap value were >90). Based on MaxChi analyses, the A. sparsa wsp sequence was the product of recombination between the supergroup A wsp-type sequence of Ephestia kuehniella (A) and the supergroup B-type sequence of E. formosa (P < 0.001). In the case of the Wolbachia strain from S. invicta, lateral transfer of the whole wsp gene from a supergroup B-type donor strain and recombination may explain the contradictory clustering inferred from the MLST and WSP data sets (Fig. 3 and 4).
Overall, there was strong incongruence between MLST- and WSP-based relationships for both subgroup and supergroup strain assignment, indicating the need for a multilocus approach.
|
|
|---|
MLST is also an important source of sequence data for extensive comparative genetics, providing a platform for investigating molecular evolutionary processes in these intracellular bacteria.
Results reported in this study reveal that (i) the MLST primers work reliably for the two major supergroups of Wolbachia found in arthropods (supergroups A and B), as well as with other groups (however, additional flanking primers are provided at the MLST website); (ii) there is considerable allelic diversity in genes and among Wolbachia strains, with recombination occurring both within and between the MLST genes; (iii) MLST can identify and discriminate closely related strains using chromosomal genes; (iv) closely related strains detected by MLST (strains that share a minimum of three alleles in their MLST profiles) can show remarkable divergence at WSP as a result of recombination at the wsp gene; (v) strains with very similar WSP can have highly divergent MLST profiles; and (vi) strains closely related based on MLST can produce very different host effects.
Discrimination and grouping of closely related strains.
One of the challenges in the study of Wolbachia is the difficulty encountered when workers attempt to infer similarities among strains. Because of horizontal genetic exchange, Wolbachia genes do not evolve in a bifurcating, tree-like fashion. Thus, groups of truly derived strains cannot be recovered with traditional cladistic methods. Previous studies have inferred phylogenetic relatedness among strains based on similarity or identity at one or two genes, for instance, ftsZ (59) or wsp (43, 62). The data presented here demonstrate that because of recombination, identity at single genes is not sufficient for reliable strain typing and that the high level of support for a clade in a concatenated phylogeny could be biased by the unequal divergence of loci, masking single-locus conflicts. Because MLST does not describe strain relationships in terms of the level of homology among sequences (i.e., phylogenetically) but describes them in terms of identity versus nonidentity among alleles (i.e., similarity), it avoids false inferences due to recombination issues. The ST tree in Fig. 2 provides a schematic representation of relationships among allelic profiles and can be combined with phylogenetic and other clustering algorithm analyses based on the sequences in individual or concatenated MLST data sets.
Based on the data set analyzed, only STs sharing three of five identical MLST loci (designated an ST complex) show consistent similarity at all the remaining loci. Strains identical at three alleles are likely to have a recent common ancestor and thus represent a phylogenetic cluster of recently diverged strains. Preliminary analyses of an additional 28 chromosomal genes for the members of three complexes (unpublished data) confirmed their consistent relatedness, indicating that MLST profiles can be used to predict similarity at other loci. Furthermore, no recombination was found between members of the same ST complex at MLST genes. In contrast, recombination at MLST genes was detected between strains identical at one or two MLST alleles. This indicates that grouping strains based on at least three shared alleles at MLST loci can form the basis of a reliable system for assessing actual similarity at chromosomal genes among strains. However, as more allelic profiles become available, it should be possible to test this observation more extensively.
Wolbachia researchers have used the gene wsp extensively to determine relationships among strains. However, using our data set, we found several instances in which strains closely related based on MLST data are very divergent based on WSP data and strains that have similar WSP differ substantially based on MLST data. For instance, members of the ST-13 complex showed extensive divergence in HVR4 of WSP, as a result of recombination. However, when phylogenetic methods were applied to this recombinant data set, a recombination signature in HVR4 was masked by a high level of similarity in the first three HVRs and the five strains apparently formed a strong cluster (posterior probability, 100%; bootstrap support, 100%) (Fig. 4). As an example of the opposite pattern, highly similar WSP sequences were shared by strains found in N. vitripennis (B) and E. kuehniella (B) (1% amino acid divergence), whose FbpA and HcpA protein sequences differ substantially (8 and 5% divergence, respectively). Such findings further support the conclusion that wsp alone is an unreliable predictor of either strain similarity or divergence.
Overall, the results show that the MLST system described here provides an excellent method for typing Wolbachia strains from diverse hosts and also for discriminating among strains in the same host species (e.g., D. simulans strains). It should be noted that identity in MLST genes does not mean that two strains are identical throughout their genomes. Even within a host species, where an ancestral strain can be derived from a relatively recent ancestor, variation within the genome can accumulate, particularly in quickly evolving or highly recombinant genes, such as mobile elements and/or ankyrin repeat domains. This has been shown for strains found in C. pipiens and Drosophila species (19, 44). Discrimination of substrains within the same ST requires the use of additional specific polymorphic markers (39, 44). The MLST database can accommodate this additional information through links that can be accessed through stored accession numbers.
Detection and typing of multiple coinfecting strains.
Double Wolbachia infections are common in insects (24, 55), and only in some circumstances can they be separated experimentally (34). MLST primers for typing supergroup A and B Wolbachia strains from individuals that are doubly infected have been designed to address this issue. However, PCR-based systems like MLST and WSP typing are constrained for examination of double infections as the different alleles cannot always be reliably assigned to specific strains coinfecting the same individual. A preliminary analysis using a small sample of multiply and singly infected species gave positive results. Primer performance, however, should be verified on a larger scale. Use of the group-specific primers for MLST requires workers to assume that there have not been reciprocal gene exchanges between the strains. Therefore, MLST for typing hosts that are doubly infected with supergroup A and B Wolbachia strains should not be used unless reliable supergroup A- and B-specific alleles can be identified for each of the MLST genes. Allelic profiles for doubly infected individuals are given the status "candidate" STs in the MLST database until the strains can be experimentally separated (e.g., by partial antibiotic curing or segregation by other means).
In some cases host species are coinfected with very closely related strains that belong to the same supergroup. These strains can occur in double or multiple infections and are coded A1A2, B1B2, etc. Because of the high levels of genetic similarity among the strains, these multiple infections are normally hard to detect, and when they are detected, the specific amplification and correct assignment of sequences to one of the strains are often difficult. Such multiple infections may be more common than has been assumed, and providing an estimate of their global frequency in host species represents an important first step. Based on a preliminary screening of a small sample of N. longicornis strains doubly infected with supergroup A and/or B strains using MLST group-specific primers, the multiple peaks in the electropherograms of some of the MLST gene products can be read as a signature of the potential presence of multiple strain variants. However, additional analyses are necessary to verify the actual existence of multiple strains.
Relating MLST to Wolbachia biology.
Even with the limited number of strains analyzed here, several patterns regarding the transmission of Wolbachia are apparent. Based on both MLST and WSP data, different host organisms were found to harbor very similar or identical Wolbachia strains; these organisms include the wasp species N. vitripennis (A) and M. uniraptor; the mosquito subspecies C. pipiens pipiens and C. pipiens quinquefasciatus; the fly species D. melanogaster, D. simulans wAu, D. recens, and D. innubila, as well as the wasp species N. longicornis; and the fly species D. neotestacea and D. orientacea. The strain similarities may have been due to recent horizontal transmission of Wolbachia across host species or, in the case of the closely related species D. neotestacea and D. orientacea, to codivergence and coevolution of Wolbachia strains and their hosts. Because genetically similar strains are often found in similar insect hosts, ecological interactions among hosts may have mediated such horizontal transfers. In particular, it appears that horizontal transmission of Wolbachia has occurred quite recently and frequently within the genus Drosophila.
Many of the reproductive phenotypes of the strains analyzed in this study remain unknown, and the possibility that there is a correlation between STs and/or WSP profiles and the strain phenotype cannot be eliminated yet. Based on the data set analyzed, genetically similar strains found in different host species were often associated with distinct host reproductive manipulations (Table 2). For example, strains found in N. vitripennis (A) and M. uniraptor, which have identical ST and very similar WSP profiles, are responsible for inducing cytoplasmic incompatibility (5) and parthenogenesis (45), respectively, in their hosts. Similarly, based on both MLST and WSP data the D. simulans wAu and wasp N. longicornis (A) strains are very closely related but are responsible for different patterns of cytoplasmic incompatibility (5, 29, 58). Based on these observations, it is now relevant to investigate the genetic changes in these closely related strains that may be involved in the phenotypic shift. In addition, as future studies clarify the specific role of wsp in host-parasite interactions, information on amino acid motifs in HVRs may prove to be useful. As more allelic profiles become available for strains that are found in hosts belonging to different taxa, occur in different geographic regions, and are subjected to different phenotypes, the potential of the MLST database for revealing ecological, evolutionary, and functional patterns should increase.
This work was supported by funds from U.S. National Science Foundation grant EF-0328363 to J.H.W. Development of the database software was funded by the Wellcome Trust and the European Union.
Published ahead of print on 25 August 2006. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»