Multilocus Sequence Typing System for the Endosymbiont Wolbachia pipientis

ABSTRACT The eubacterial genus Wolbachia comprises one of the most abundant groups of obligate intracellular bacteria, and it has a host range that spans the phyla Arthropoda and Nematoda. Here we developed a multilocus sequence typing (MLST) scheme as a universal genotyping tool for Wolbachia. Internal fragments of five ubiquitous genes (gatB, coxA, hcpA, fbpA, and ftsZ) were chosen, and primers that amplified across the major Wolbachia supergroups found in arthropods, as well as other divergent lineages, were designed. A supplemental typing system using the hypervariable regions of the Wolbachia surface protein (WSP) was also developed. Thirty-seven strains belonging to supergroups A, B, D, and F obtained from singly infected hosts were characterized by using MLST and WSP. The number of alleles per MLST locus ranged from 25 to 31, and the average levels of genetic diversity among alleles were 6.5% to 9.2%. A total of 35 unique allelic profiles were found. The results confirmed that there is a high level of recombination in chromosomal genes. MLST was shown to be effective for detecting diversity among strains within a single host species, as well as for identifying closely related strains found in different arthropod hosts. Identical or similar allelic profiles were obtained for strains harbored by different insect species and causing distinct reproductive phenotypes. Strains with similar WSP sequences can have very different MLST allelic profiles and vice versa, indicating the importance of the MLST approach for strain identification. The MLST system provides a universal and unambiguous tool for strain typing, population genetics, and molecular evolutionary studies. The central database for storing and organizing Wolbachia bacterial and host information can be accessed at http://pubmlst.org/wolbachia/ .

In the past 10 years there has been increasing interest in the maternally inherited Wolbachia endosymbionts because of their remarkably widespread distribution and significant impact on the ecology, evolution, and reproductive biology of their host species (46,54). Approximately 20 to 75% of all insect species harbor Wolbachia (20,55), as do many arachnids and terrestrial crustaceans (7,11,12,40). Individual insects can be infected with multiple Wolbachia strains (24,55,58), and geographically distinct populations of the same species can harbor different strains (29,39). Outside the phylum Arthropoda, high infection levels have also been detected in the vast majority of pathogenic filarial nematodes (4). Overall, the extraordinary infection frequency among insects alone places members of the genus Wolbachia among the most widespread intracellular bacteria described thus far (55,56). Wolbachia strains are typically vertically transmitted within a species through the cytoplasm of eggs (maternal inheritance). However, the wide range of infected hosts cannot be explained by vertical transmission alone, and the ability of these bacteria to spread to new host species via horizontal transfer accounts for the pandemic distribution, although the mechanisms and patterns of interspecies transfer are not well understood (59).
The importance of Wolbachia strains ranges from their effects on the reproductive biology, ecology, and evolution of their hosts to their potential use in biological control of pest insects and biomedical applications (35,54,61). Wolbachia strains can extensively manipulate host reproduction by means such as inducing parthenogenesis, feminizing genetic males, killing male embryos, or causing cytoplasmic incompatibility between gametes (16,45,46). In filarial nematodes, the host dependence on Wolbachia for propagation has generated great interest in the medical field for targeting these bacteria in the treatment of filariasis (10,38).
Wolbachia strains were first found in the mosquito species Culex pipiens and designated Wolbachia pipientis (15). These bacteria were subsequently found to be widely distributed in arthropods. However, additional species have not been named, pending improved understanding of the genetic variation of this genus (6,26). At present, supergroup designations are routinely used to describe the major phylogenetic subdivisions of this bacterial group. So far, eight supergroups have been designated (supergroups A to H) primarily on the basis of data from the 16S rRNA, ftsZ, and wsp (Wolbachia surface protein [WSP]) genes (31,59,62). Most of the supergroups are found in arthropods (supergroups A, B, E, F, G, and H), and the majority of insect Wolbachia strains belong to supergroups A and B. Fully annotated genomes of the following two Wolbachia strains are available: wMel, a supergroup A strain from the arthropod host Drosophila mela-nogaster (60), and wBm, a supergroup D strain from the nematode host Brugia malayi (9).
Despite increasing studies of Wolbachia and a growing list of infected arthropod species, researchers still lack a standard and reliable system for typing and quantifying strain diversity. A strain typing system was previously developed using the Wolbachia surface protein gene, wsp (62). However, evidence of extensive recombination (2,57) and strong diversifying selection (1,21) at this gene make it unsuitable for reliable strain characterization if it is used alone. Furthermore, recent detection of recombination both within and between Wolbachia genes (3) suggests that a single-locus approach to strain characterization may be misleading. A multigene typing approach has recently been used to type Wolbachia strains (32). This study focused on strains found in Drosophila hosts and demonstrated the utility of multigene typing for Wolbachia strains.
Here we developed a standard multilocus sequence typing (MLST) system for Wolbachia, which works broadly with Wolbachia strains found in diverse singly infected host species. This system includes a web-accessible central database and toolkit for storage, organization, and analysis of data. The data are fully portable between laboratories and allow extensive comparative analyses (22). This accurate strain typing system should provide the genetic framework for tracing the movement of Wolbachia globally and within insect communities and for associating Wolbachia strains with geographic regions, host features (e.g., ecology and phylogeny), and phenotypic effects on hosts.
Originally used for global epidemiology and surveillance of recombinant pathogenic bacteria (27), the MLST approach defines a strain as a sequence type (ST) (equivalent to a haplotype) on the basis of a combination of alleles (i.e., allelic profile) at a sample of housekeeping genes. Similarity relationships among strains are described using the nucleotide sequence identity of alleles at each locus. The use of combinations of alleles as molecular markers to genotype strains means that recombination is not an impediment to strain characterization, and allelic information can be used to readily detect recombination between genes and rates of recombination within a population (17). Phylogenetic and other clustering algorithms can be applied to individual or concatenated gene data sets for further analyses of strain relationships, although caution in interpretation of phylogenetic relationships is necessary due to recombination issues.
The MLST scheme developed for Wolbachia indexes variation at five conserved genes (ftsZ, gatB, coxA, hcpA, and fbpA). The PCR primers used robustly amplify loci of strains belonging to supergroups A and B and potentially amplify loci of strains belonging to other supergroups as well. Specific primers for amplification of supergroup A and B strains coinfecting the same host individual are also described here. A profile database and an isolate database were generated to store and distribute the sequences and the strain and host information, respectively (22; http://pubmlst.org/wolbachia/). New strains and profiles can be submitted to the curator for inclusion in the databases, which thus represent a focal point for future storage and management of all Wolbachia strain data.
Given the current extensive use of wsp sequences for characterizing Wolbachia strains (25,30,36,43,53) and the suspected role of the Wolbachia surface protein in the host-sym-biont interactions (1,62), use of this gene as an additional optional strain marker is proposed. A WSP typing system was developed based on the amino acid sequences encoded by the four hypervariable regions (HVRs) of this gene (2), and a separate database for storing both nucleotide sequences of the wsp gene and amino acid sequences of single HVRs was generated. WSP typing is analogous to antigen protein typing used for pathogenic bacteria (33) and can complement the MLST information.
Thirty-seven strains representing supergroups A, B, D, and F from diverse singly infected host species were typed by using the MLST and WSP systems. MLST unambiguously identified, differentiated, and grouped genetically similar strains through comparisons of their allelic profiles. The MLST data revealed considerable diversity within the genus Wolbachia and also identified horizontal movement of strains between hosts and shifts in Wolbachia phenotypic effects on hosts (e.g., cytoplasmic and parthenogenesis induction).

MATERIALS AND METHODS
Choice of loci. The choice of the genes used for the MLST system was facilitated by an ongoing project to characterize genetic diversity in Wolbachia. A total of 46 unique loci were amplified from a subset of more than 50 Wolbachia strains belonging to supergroups A, B, C, D, E, F, and H and sequenced. DNA was extracted from either a whole individual, the abdomen, or the gonads, depending on the size of the specimen, using a DNAeasy tissue kit (QIAGEN, Germantown, MD).
Amplicons were generated using standard PCR conditions with HotStarTaq (QIAGEN) according to the manufacturer's recommendations and with each ϳ64-fold degenerate primer at a concentration of 0.5 M. The forward and reverse primers had 5Ј tags of M13 forward and reverse primer sequences, respectively. These tags served as anchors for the degenerate primers during amplification and were subsequently used for sequencing. Amplification reactions were initiated with 15 min of incubation at 95°C, followed by 50 cycles of 95°C for 15 s, 55°C for 30 s, and 72°C for 1 min. The amplification reaction mixtures (8 l) were treated with 0.5 U shrimp alkaline phosphatase and 1.0 U exonuclease I (Amersham) in the buffer supplied. Sequencing reactions were performed at the J. Craig Venter Joint Technology Center, Rockville, MD (an  affiliate of The Institute for Genomic Research [TIGR], Rockville, MD) with the M13 forward and reverse primers. Assembly was done with the TIGR assembler (47), and sequences were manually curated with the multiple-sequence editor Cloe (a multiple-sequence editor heavily integrated into the TIGR database). Alignments were initially generated using CLUSTALX (51) and then manually edited in MacClade 4.08 (http://macclade.org) and Bioedit v. 7.0.4.1 (14).
From the initial collection of 46 loci, five conserved genes were chosen for MLST on the basis of the following features: (i) presence throughout the sequenced Rickettsiales order; (ii) a single copy in the wMel genome (60); (iii) wide distribution in the wMel genome (Fig. 1); and (iv) evidence of strong stabilizing selection within the genus Wolbachia (average ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site [K a /K s ], Ͻ Ͻ1). These criteria conform to the standard locus requirements for an MLST system (27,52). The five genes chosen for the MLST scheme are gatB, coxA, hcpA, ftsZ, and fbpA (Table 1). Only one of these genes, ftsZ, was used in previous studies (59,32). Briefly, the gatB gene encodes aspartyl/glutamyl-tRNA amidotransferase subunit B that synthesizes charged tRNAs in organisms lacking certain tRNA synthetases; coxA encodes a catalytic subunit of cytochrome oxidase of the respiratory chain; hcpA encodes a functionally uncharacterized protein whose length and amino acid sequence are highly conserved in many bacteria, which is designated NCBI COG0217 (49,50); ftsZ encodes a protein involved in bacterial cell division; and fbpA encodes fructose-bisphosphate aldolase that in Wolbachia is likely to be involved in gluconeogenesis (8,9). The surface protein WSP was included in this study as an additional marker for strain typing.
MLST and wsp primer design and amplification. For each gene, primers were designed manually using sequence alignments that were generated from the initial sample of more than 50  PCR amplification was performed by using 20-l mixtures containing each deoxynucleoside triphosphate at a concentration of 0.2 mM, 1.5 mM MgCl 2 , each primer at a concentration of 1 M, 0.5 U Taq DNA polymerase, 1ϫ PCR buffer, and 1 l DNA. Reactions were initiated by incubation at 94°C for 2 min, which was followed by 36 cycles of 94°C for 30 s, the optimal annealing temperature (see below) for 45 s, and 72°C for 1.5 min, a final elongation step at 70°C for 10 min, and a final hold at 4°C. The optimal PCR annealing temperature was 53°C for hcpA, 54°C for both gatB and ftsZ, 55°C for coxA, and 59°C for fbpA and wsp. PCR products were purified using Montage PCR centrifugal filter devices (Millipore) and were bidirectionally sequenced. An internal fragment of each PCR product was specifically selected for MLST (Table 1). Ambiguous bases and single-point polymorphisms were assessed by performing a second amplification and resequencing. All MLST and wsp sequences are presented in the coding reading frame.
Given the great genetic diversity of Wolbachia strains, alternative primer amplification strategies may be required if the standard primers (Table 1) do not work. For instance, use of tags for the MLST primers can improve their performance. The 64-fold degenerate primers used initially as part of a larger study are also provided for four of the five genes at the MLST website. For PCR ampli-fication these primers can be used singly or in combination with the MLST primers using a nested strategy. Alternative ftsZ primers, ftsZuniF and ftsZuniR, which were designed previously (26), can also be used. Additional primers to help with amplification of more divergent strains and supergroup A and B strains occurring in doubly infected individuals have been also designed. All the primers described above and the PCR protocols are posted at the MLST website (http: //pubmlst.org/wolbachia/). As experience with MLST sequencing of divergent strains accumulates, the information will be provided at the website.
Wolbachia strains. MLST and wsp primers were tested with 42 host species harboring single infections with strains belonging either to supergroup A, B, D, E, or F or to an unassigned supergroup ( Table 2). Single-infection status was confirmed during sequencing. Wolbachia infection of the termite species Incisitermes snyderi was not reported previously, and the single-infection status of this relationship was assessed using supergroup A-and B-specific wsp primers (61). In addition, primer performance was evaluated for new species with previously unreported infections, including several species of the spider genus Agelenopsis, the hymenopteran Neivamyrmex opacithorax, the embiopteran Haploembia solieri, and unidentified species of the bristletail family Meinertellidae (Archaeognatha). In the latter case, Wolbachia 16S rRNA gene general primers were used as a control (56). In order to confirm the robustness and consistency of PCR results obtained in different laboratories, all primers were independently tested with a variable sample of strains by multiple laboratories. The supergroup A-and B-specific primers were tested with insects known to harbor double infections with supergroup A and B strains, including Nasonia longicornis (IV7), Nasonia giraulti (RV2), Nasonia vitripennis (R511), and Aedes albopictus; with insects singly infected with supergroup B strains, including N. vitripennis (4.9) and Protocalliphora sialia (NY); and with insects singly infected with supergroup A strains, including N. giraulti (16.2), N. longicornis (2.1), N. vitripennis (12.1), Drosophila ananassae (Hawaii), Drosophila simulans (wRi), D. simulans (wAu), and Drosophila melanogaster (wMel). Specific amplification of the group-specific sequences in doubly infected individuals (Nasonia species) was assessed by comparison with sequences obtained from experimentally separated cultures of the two strains. The primers were also tested with the following three N. longicornis strains carrying multiple supergroup A (A1A2) and/or B (B1B2) strains: N. longicornis strain PE 18bx5 (infection status, A1A2B1B2), strain PE 18bx4 (A1A2B1), and strain BTPE 18bx5 (A1A2B2). Electropherograms were inspected for the occurrence of double peaks to determine whether the use of the primers could reveal multiple infections by members of a single supergroup.
MLST. Following MLST convention (27,52), identical nucleotide sequences at a given locus for different strains were assigned the same arbitrary allele number (i.e., each allele has a unique identifier). Each strain was then characterized by the combination of the five MLST allelic numbers, representing its allelic profile. Each unique allelic profile was assigned an ST. Ultimately, an ST characterizes a strain. We defined an ST complex as a group of STs sharing a minimum of three alleles with a central ST. The complex is designated by the  central ST, which is identified as the group founder based on BURST analyses implemented in START2 (23). When the complete set of the five MLST gene sequences cannot be obtained for a strain, single gene alleles and partial MLST allelic profiles can be submitted to the database. Partial data provide useful allele diversity information, allowing the profile database to grow. A strain cannot be assigned to an ST until full MLST data are provided, but it can receive an identification number and all specific strain and host information can be submitted to the isolate database.
WSP typing. In addition to the ST, each strain was characterized based on the amino acid motifs of the four HVRs of its WSP sequence. A relatively conserved set of amino acid motifs are present in each of the four HVRs, and there is shuffling of HVR motifs among strains (2). The use of WSP is analogous to the use of antigens for serotyping pathogenic bacteria (33) and is meant to be a complement to MLST, not a substitute for MLST. Specifically, each WSP amino acid sequence (amino acids 52 to 222 with respect to the wMel sequences) was partitioned into four consecutive sections whose breakpoints fell within conserved regions between the hypervariable regions (2), as follows: HVR1 (amino acids 52 to 84), HVR2 (amino acids 85 to 134), HVR3 (amino acids 135 to 185), and HVR4 (amino acids 186 to 222). Each section encompassed one of the four HVR motifs and a portion of the two conserved flanking regions. For simplicity here we refer to each section as an HVR. Each unique HVR haplotype was given a number. Each WSP sequence was thus defined as a combination of four numbers, its WSP profile, corresponding to the four HVR haplotypes.
Features of the Wolbachia databases. The following three separate databases have been generated for storing and managing all Wolbachia data: the profile, isolate, and WSP databases. The profile database contains the MLST allele sequences, allelic profiles, and STs. The isolate database stores information about the strains and host strain biological information (e.g., taxonomy, natural history, collection date, and locality). Each strain is designated by an identification number and a code consisting of the first letter of the host genus name, followed by the first three letters of the host species name, underscore, a letter indicating the supergroup to which it belongs, underscore, and any further specific information about the strain needed for discrimination. For example, Dsim_A_wRi indicates the Wolbachia strain from D. simulans belonging to supergroup A found in Riverside County (CA) currently referred to as wRi. The isolate database allows storage of accession numbers for sequences other than the sequences for the MLST genes that are available for a specific strain. The WSP database stores both nucleotide and amino acid sequences for wsp alleles and HVRs. All of the databases are reciprocally linked and can be accessed at the Wolbachia MLST website (http://pubmlst.org/wolbachia/).

Analyses of genetic variation and recombination.
For each locus, the number of variable sites (VI), GϩC content, level of nucleotide diversity per site (Pi), and  (41). Significant differences in genetic divergence for loci for groups of strains having one identical allele were determined using 2 analysis. Recombination analyses were performed using MaxChi (28) with single and combined data sets from each gene alignment. The parameters were set as follows: sequences were considered linear, the highest acceptable P value cutoff was 0.01, a Bonferroni correction was applied, consensus daughter sequences were found, gaps were included, different window sizes of variable sites were tested (10, 20, and 30 VI), and 1,000 permutations were performed. All recombination events detected by the program were visually inspected for the alignment, and only clearly recombinant strains were listed.
Tree reconstruction. Strict maximum likelihood (ML) inference was used to reconstruct phylogenies based on single and concatenated MLST genes (PAUP v. 4.01). The appropriate model of evolution was estimated with Modeltest v. 3.06, and the best likelihood score was evaluated with the Akaike information criterion (37). The models selected were GTRϩIϩG for gatB, wsp, and the concatenated MLST gene data set and GTRϩG for coxA, hcpA, ftsZ, and fbpA. Bayesian maximum likelihood inference and maximum parsimony were used to reconstruct phylogenies based on the concatenated MLST data set and wsp data. Bayesian analyses were performed using MrBayes, v. 3.1.1 (18), with 10 6 generations and a sample frequency of 100. The first 5,000 trees were discarded as burn-in, and 50% majority rule was applied to the remaining set of trees. Maximum parsimony bootstrap analyses were performed with PAUP v. 4.01 (48) using 1,000 replicates and 100 random-addition replicates per bootstrap replicate, and a 50% majority rule bootstrap was applied.
A neighbor-joining tree was reconstructed from the 37 MLST allelic profiles using the START2 program (23). An unweighted pair group method with arithmetic mean dendrogram was reconstructed from the matrix of pairwise similarities for the 36 WSP profiles (START2 program). START2 and a suite of publicly available bioinformatics software for MLST data analysis can be found at the MLST home page (http://pubmlst.org/).
Analysis of congruence between trees. We used the Shimodaira-Hasegawa test (42) to evaluate the significance of topological incongruence among the ML trees based on the five MLST genes and the concatenated data set. The ϪlnL scores and significance values for nucleotide substitution were optimized separately for all topologies considered. Differences in ϪlnL scores between the best ML topology and an alternative topology were calculated for each data set. The significance of these differences was measured using a bootstrap approach with RELL and full optimization for 1,000 replicates (PAUP v. 4.01).
Nucleotide sequence accession numbers. All MLST and wsp sequences generated in this study have been deposited in the GenBank database under accession numbers DQ842268 to DQ842486.

RESULTS
Primer performance. The MLST primers worked for 34 of 35 supergroup A and B strains tested (Table 2, first 35 strains). In only one case, Trichogramma deion (TX), did the ftsZ primers not amplify. This strain sequence was generated using the MLST primers with tags (http://pubmlst.org/wolbachia/). The wsp primers amplified the gene from all the supergroup A and B strains tested except the strain in T. deion. All sequences generated using MLST primers perfectly matched the corresponding sequences independently generated using the initial 64-fold degenerate primers. MLST and wsp primers consistently generated amplification products in all the laboratories where they were tested, and bidirectional sequencing of PCR products gave unambiguous results.
The primers were sufficiently degenerate to also amplify divergent strains not belonging to supergroups A and B, such as Wolbachia strains from the nematode B. malayi (supergroup D) and the bed bug Cimex lectularius (supergroup F). For other divergent strains the primers worked well for only some of the loci. All loci except ftsZ were amplified from Wolbachia strains in the termite Zootermopsis angusticollis (supergroup H) and the flea Ctenocephalides felis. All loci except coxA were amplified from Wolbachia strains in the pseudoscorpion Cor-dylochernes scorpioides and the collembolan Folsomia candida (supergroup E). Finally, hcpA, fbpA, and ftsZ were amplified from Wolbachia strains in the termite Coptotermes acinaciformis (supergroup F). The wsp primers worked with all the strains mentioned above except the strain in F. candida. All species whose infection status was previously unknown gave results consistent with 16S rRNA gene results. Only for the embiopteran H. solieri were the results discordant; this insect tested positive for infection with a Wolbachia-specific 16S rRNA gene, but there was no amplification with MLST and wsp primers.
When distinct supergroup A and B strains coinfected the same individuals, the MLST supergroup A-and B-specific primers successfully amplified the target supergroup A and B type sequences from all the doubly and singly infected species examined. Group-specific MLST primers also amplified both supergroup A and B strain products from the three N. longicornis strains carrying multiple supergroup A and B strains. Analysis of the electropherograms showed unambiguous double peaks for the coxA, gatB, and wsp genes, revealing the multiple infections for each group. Based on the electropherogram pattern of polymorphic sites, specific restriction enzymes were used to digest the PCR products, and the results confirmed the presence of multiple sequence products (data not shown). This illustrates how the MLST gene set can be used to detect multiple infections in individual hosts.
Allelic variation at MLST loci. A total of 37 strains were characterized by MLST, including 35 strains belonging to supergroup A or B plus strains from B. malayi (supergroup D) and C. lectularius (supergroup F) ( Table 2). In the 37 strains analyzed, there were 25 ftsZ alleles, 26 gatB alleles, 27 coxA alleles, 28 fbpA alleles, and 31 hcpA alleles (Table 3). Comparative analysis of the genetic divergence across loci indicated that ftsZ was the least polymorphic locus, with only 24.8% of the sites showing variation (VI), the lowest number of alleles (25), and the lowest level of nucleotide diversity per site (Pi, 0.065). The most polymorphic locus was hcpA, with 36.5% VI and the highest number of alleles (31). The greatest nucleotide diversity per site was found in fbpA (Pi, 0.092). All genes showed an average K a /K s of Ͻ Ͻ1, indicating that the genes were subject to stabilizing selection, conforming to the general requirements for MLST loci. Comparison of the genetic polymorphism between supergroups A and B at the five loci (Table  3) led to three main observations: (i) the percentage of VI for all five loci was always higher in supergroup B than in supergroup A; (ii) the number of alleles was generally higher for supergroup B (except for gatB); and (iii) the K a /K s was consistently Ͻ Ͻ1 for both supergroups.
Recombination at MLST loci. Four of the five genes chosen for the MLST scheme (gatB, coxA, hcpA, and fbpA) exhibited evidence of intragenic recombination (P Ͻ 0.001, as determined by MaxChi) ( Table 3). The lack of evidence for recombination in ftsZ was consistent with previous findings (3). The hcpA, coxA, and fbpA alleles had recombinants within supergroup A or B. The gatB and fbpA alleles had recombinants between supergroups A and B. These results indicated that the two supergroups exchanged DNA frequently enough to be detected with this small data set, although most genes showed consistent association with one of the two supergroups (see below). 7102 BALDO ET AL. APPL. ENVIRON. MICROBIOL.
As a likely result of recombination both within and outside the gene boundaries (intra-and intergenic recombination, respectively), some strains were extremely similar at one locus and highly divergent at another. This disparity in divergence was visualized by plotting the genetic distances for loci for pairs of strains that were identical at one or two loci (Fig. 2). The levels of genetic divergence at nonidentical loci estimated for pairs of strains that exhibited identity at gatB were remarkably high in some cases (Fig. 2). For instance, for Ostrinia scapulalis and N. vitripennis (B) there was about 11% divergence at fbpA. Similar values were found for O. scapulalis and Acraea encedon, for Telogryllus taiwanemma and N. vitripennis (B), for T. taiwanemma and A. encedon, for N. vitripennis (B) and Chelymorpha alternans, and for N. vitripennis (B) and A. encedon. These values are significantly greater than the average level of divergence for this locus (2.7%) based on the pairs of strains with identity at another locus (P Ͻ 0.05). Conflicting relationships among strains could also be visualized (Fig. 2); N. vitripennis B and C. alternans were very similar based on their coxA alleles (0.2% divergence) but were very divergent based on their fbpA alleles (11.4%). Conversely, N. vitripennis (B) and A. encedon were very similar based on their fbpA alleles (0.2%) but were very divergent based on their coxA alleles (8.2%). These inversions of relationships indicate that there was recombination among divergent strains.
Phylogenetic inference of relationships. As expected for a data set affected by recombination, phylogenetic analyses for single and concatenated MLST loci revealed significant incongruence among loci (Table 4). Conflicts in tree topologies were inferred for all pairwise comparisons of the five genes. Indeed, there appeared to be no single phylogeny for a strain but multiple phylogenies for different fragments of the genome. As a result, the phylogenetic tree based on the concatenated sequences (Fig. 3) revealed strains that are closely related but  also created phylogenetic artifacts that resulted from recombination between more divergent strains. For instance, members of the five clusters highlighted in the tree (Fig. 3) likely descended from a recent common ancestor, as reflected by strong grouping in the concatenated tree (both posterior probability and bootstrap values were Ͼ95%) and consistent phylogenetic clustering based on single-gene phylogenies (data not shown). However, the concatenated tree also revealed several clusters of strains whose relationships differ in single-locus phylogenies. For example, based on the concatenated data set, the Encarsia formosa and P. sialia (B1) strains formed a highly supported cluster, with both bootstrap and posterior probability values of ϳ100%. However, based on coxA phylogeny, the E. formosa strain clustered with the Acraea eponina strain (data not shown) and was separated from the P. sialia (B1) strain by a remarkable genetic distance (Fig. 2). Consistent with the effects of recombination, these two strains also did not cluster based on the wsp phylogeny (Fig. 4). Hence, the concatenated gene approach alone can be useful for identifying very closely related strains and to resolve major supergroups. However, it FIG. 3. Bayesian likelihood inference phylogeny based on the concatenated data set for the five MLST loci (37 strains, 2,079 bp). Groups of strains sharing at least three MLST alleles are highlighted. Arrows and asterisks indicate two examples of strain pairs whose predicted relationships are highly discordant with the wsp phylogeny (Fig. 4). Posterior probability (left) and parsimony bootstrap (right) values are indicated at major nodes if they were supported by both clustering algorithms. cannot be used to interpret more distant phylogenetic relationships within a supergroup, even when clades are highly supported, because of artifacts resulting from recombination among genes. Combined analyses using allelic profiles, singlegene phylogenies, and concatenated gene phylogenies are advisable for any inference of strain relationships. MLST inference. Based on the neighbor-joining tree constructed from the STs (Fig. 5), the majority (35/37) of the STs were unique to a strain, as were most of the alleles at a single locus. Nine of the 35 STs (Fig. 4) did not share any alleles with any other ST. This allelic diversity was probably due in part to the great genetic diversity of Wolbachia and in part to the sample of strains selected, which spanned a wide range of host species. No alleles were shared by STs of supergroup A and B strains. The following two pairs of strains among the pairs analyzed exhibited identity at all loci: the pair consisting of the Culex pipiens pipiens and C. pipiens quinquefasciatus strains (ST-9) and the pair consisting of the N. vitripennis (A) and Muscidifurax uniraptor strains (ST-23). Culex strains are identical at other chromosomal genes (13,44), while the identity for the N. vitripennis (A) and M. uniraptor wasp strains was not reported previously.
By convention, strains that share at least three MLST alleles with a central ST are grouped in a complex (ST complex). Additional sequencing of 28 genes showed that members of an ST complex based on our five MLST genes were also consistently closely related based on the larger gene set, indicating the reliability of the MLST system for identifying closely related strains.
A total of three ST complexes were identified based on the criteria used (three shared alleles with a central ST) (Fig. 5). WSP characterization. The HVRs of the WSP protein were employed as an additional, optional marker to assess strain diversity. A new method for representing WSP sequence relationships is proposed here, which can be used in combination with standard distance-based methods. This method takes into account the extensive intragenic shuffling of the four HVRs (2) and can be used to readily detect allele shuffling based on unique identifiers for each HVR. Since the majority of amino acid changes among WSP sequences occur in the four HVRs, these motifs can be used as signatures for discrimination of the different WSP protein types (see Materials and Methods).
The sample of strains characterized by MLST was also typed by WSP, with the exception of the T. deion strain, for which amplification of wsp failed. Figure 6 shows the WSP profile, a combination of the four HVR amino acid haplotypes. For 36 strains, 33 unique WSP profiles were identified. Five groups of strains shared at least three HVR haplotypes (Fig. 6), and three of them had identical WSP profiles. Nine of the 33 WSP profiles (Fig. 6) did not share any haplotype with any other WSP profile. The overall numbers of unique haplotypes per HVR were similar, ranging from 24 to 27. However, the number of nonunique haplotypes within an HVR (i.e., haplotypes that occurred in multiple WSP profiles) varied for the four HVRs; there were only three haplotypes for HVR1 (haplotypes 1, 10, and 17), compared to seven haplotypes for HVR2, six haplotypes for HVR3, and nine haplotypes for HVR4. Motif shuffling was apparent in the data set. Indeed, several strains exhibited identity at individual HVRs alternatively with different sets of strains. For example, N. giraulti (A) shared haplotype 1 in HVR1 with wMel, N. longicornis (A), D. simulans wAu, D. recens, Drosophila innubila (A), D. orientacea, D. neotestacea, and A. albopictus. However, in HVR2 it had a unique haplotype sequence (haplotype 23), while in HVR3 it had the same haplotype sequence (haplotype 15) as D. simulans wRi; finally, in HVR4 it had the same haplotype sequence (haplotype 25) as Camponotus pennsylvanicus.
WSP inference and comparison to MLST. Analogous to the ST-based neighbor-joining tree (Fig. 5), the relationships among WSP profiles could be schematically represented in a tree (Fig. 6). We noted that relationships inferred by MLST and WSP profiles were not necessarily concordant but were complementary. In some cases strains that were closely related based on their MLST profiles also had similar WSP profiles. Indeed, the two C. pipiens strains had identical ST (ST-9) (Fig.  5) and WSP (Fig. 6) profiles. The M. uniraptor and N. vitripennis (A) strains, which had the same ST (ST-23), shared three of four HVR haplotypes and differed by only two amino acids in HVR3. Two of the three ST complexes (Fig. 5) were also very closely related based on their WSP profiles (Fig. 6). Specifically, D. simulans wMa and D. simulans wNo (ST-a FIG. 5. Neighbor-joining tree based on the MLST allelic profiles (right columns) of 37 Wolbachia strains. ST complexes and strains with identical STs are highlighted (STs shared by strains are indicated by arrowheads). Only closely related strains predicted by MLST were also predicted by the concatenated MLST phylogeny (Fig. 3), WSP profiles (Fig. 6), and wsp phylogeny (Fig. 4).  (Fig. 5). The mean levels of amino acid divergence between alleles of the two groups were 2% for gatB, 4% for hcpA, 5% for fbpA, and 6% for coxA. There were no differences between ftsZ alleles. Based on their WSP profiles, however, the two groups exhibited high levels of similarity in their WSP profiles, with two of the four HVRs identical (Fig. 6).
Divergence between the two groups in the two nonidentical HVRs accounted for only one amino acid change each. Such a level of high similarity at a quickly evolving gene is in contrast to the higher levels of divergence in some housekeeping genes; this evidence indicates that horizontal transfer and acquisition of wsp sequences occur through recombination.
As found with the MLST profiles, no identical HVRs were observed for the supergroup A and B strains, although some HVRs were closely related. Nine of 16 supergroup A strains had the same haplotype sequence in HVR1 (haplotype 1). Two strains, from the tortoise beetle Acromis sparsa and the ant Solenopsis invicta, had four unique HVR haplotype sequences. Both strains belonged to supergroup A, as supported by phylogenetic analyses based on MLST data (both the posterior probability and bootstrap values were 100) (Fig. 3) and MLST profiles (both strains exhibited identity at least at one allele with another supergroup A strain) (Fig. 5). However, based on the wsp gene phylogeny (Fig. 4), the strain from A. sparsa was excluded from both supergroup A and supergroup B and formed a divergent lineage, while the strain from S. invicta clustered significantly in supergroup B (both the posterior probability and bootstrap value were Ͼ90). Based on MaxChi analyses, the A. sparsa wsp sequence was the product of recombination between the supergroup A wsp-type sequence of Ephestia kuehniella (A) and the supergroup B-type sequence of E. formosa (P Ͻ 0.001). In the case of the Wolbachia strain from S. invicta, lateral transfer of the whole wsp gene from a supergroup B-type donor strain and recombination may explain the contradictory clustering inferred from the MLST and WSP data sets ( Fig. 3 and 4).
Overall, there was strong incongruence between MLST-and WSP-based relationships for both subgroup and supergroup strain assignment, indicating the need for a multilocus approach.

DISCUSSION
Wolbachia strains are among the most common obligate intracellular bacteria (56), and by virtue of the diverse effects that they have on their hosts (46,54) and their potential health and agricultural implications (35,61), they represent an extremely important group. The MLST system described here allows universal characterization of Wolbachia strains that induce a variety of phenotypes in a diverse range of hosts. It provides a standard method for reliable strain typing, a unified nomenclature for Wolbachia, and a centralized and curated database for data storage and management. Accurate strain typing is an extremely powerful tool for investigating the biology and evolution of Wolbachia. Because of the indexing of a standard set of genes for typing Wolbachia strains and the common field host information in the isolate database, it is possible to relate any new submitted strain to an expanding data set. With the isolate database, host and strain information are partitioned into multiple, searchable fields. Thus, each field can be used to sort the data, and the frequency of a single field or a combination of fields can be calculated and graphically displayed (for instance, host geographical location versus specific allele frequencies). As the database grows, informative patterns can emerge and key questions can be addressed, such as (i) what is the global pattern of diversity of Wolbachia strains; (ii) are there particular taxa that have an unusually high number of infected species; (iii) how do Wolbachia strains move between host taxa, and what genetic changes occur when they shift hosts; (iv) how do the strains move locally and globally within arthropod communities (e.g., are there strains or alleles that are found in particular geographical regions, while FIG. 6. Unweighted pair group method with arithmetic mean dendrogram based on the WSP profiles (right columns) of 36 Wolbachia strains, corresponding to the same sample typed by MLST (without the T. deion strain). Groups of strains sharing at least three HVR haplotypes are highlighted. In some cases groups of strains identified by MLST (Fig. 5, highlighted strains) were also closely related based on their WSP profiles. In strains belonging to the ST-13 complex there was a clear recombination signature in HVR4, which resulted in remarkable genetic divergence in this region among the five strains. MLST is also an important source of sequence data for extensive comparative genetics, providing a platform for investigating molecular evolutionary processes in these intracellular bacteria.
Results reported in this study reveal that (i) the MLST primers work reliably for the two major supergroups of Wolbachia found in arthropods (supergroups A and B), as well as with other groups (however, additional flanking primers are provided at the MLST website); (ii) there is considerable allelic diversity in genes and among Wolbachia strains, with recombination occurring both within and between the MLST genes; (iii) MLST can identify and discriminate closely related strains using chromosomal genes; (iv) closely related strains detected by MLST (strains that share a minimum of three alleles in their MLST profiles) can show remarkable divergence at WSP as a result of recombination at the wsp gene; (v) strains with very similar WSP can have highly divergent MLST profiles; and (vi) strains closely related based on MLST can produce very different host effects.
Discrimination and grouping of closely related strains. One of the challenges in the study of Wolbachia is the difficulty encountered when workers attempt to infer similarities among strains. Because of horizontal genetic exchange, Wolbachia genes do not evolve in a bifurcating, tree-like fashion. Thus, groups of truly derived strains cannot be recovered with traditional cladistic methods. Previous studies have inferred phylogenetic relatedness among strains based on similarity or identity at one or two genes, for instance, ftsZ (59) or wsp (43,62). The data presented here demonstrate that because of recombination, identity at single genes is not sufficient for reliable strain typing and that the high level of support for a clade in a concatenated phylogeny could be biased by the unequal divergence of loci, masking single-locus conflicts. Because MLST does not describe strain relationships in terms of the level of homology among sequences (i.e., phylogenetically) but describes them in terms of identity versus nonidentity among alleles (i.e., similarity), it avoids false inferences due to recombination issues. The ST tree in Fig. 2 provides a schematic representation of relationships among allelic profiles and can be combined with phylogenetic and other clustering algorithm analyses based on the sequences in individual or concatenated MLST data sets.
Based on the data set analyzed, only STs sharing three of five identical MLST loci (designated an ST complex) show consistent similarity at all the remaining loci. Strains identical at three alleles are likely to have a recent common ancestor and thus represent a phylogenetic cluster of recently diverged strains. Preliminary analyses of an additional 28 chromosomal genes for the members of three complexes (unpublished data) confirmed their consistent relatedness, indicating that MLST profiles can be used to predict similarity at other loci. Furthermore, no recombination was found between members of the same ST complex at MLST genes. In contrast, recombination at MLST genes was detected between strains identical at one or two MLST alleles. This indicates that grouping strains based on at least three shared alleles at MLST loci can form the basis of a reliable system for assessing actual similarity at chromosomal genes among strains. However, as more allelic profiles become available, it should be possible to test this observation more extensively.
Wolbachia researchers have used the gene wsp extensively to determine relationships among strains. However, using our data set, we found several instances in which strains closely related based on MLST data are very divergent based on WSP data and strains that have similar WSP differ substantially based on MLST data. For instance, members of the ST-13 complex showed extensive divergence in HVR4 of WSP, as a result of recombination. However, when phylogenetic methods were applied to this recombinant data set, a recombination signature in HVR4 was masked by a high level of similarity in the first three HVRs and the five strains apparently formed a strong cluster (posterior probability, 100%; bootstrap support, 100%) (Fig. 4). As an example of the opposite pattern, highly similar WSP sequences were shared by strains found in N. vitripennis (B) and E. kuehniella (B) (1% amino acid divergence), whose FbpA and HcpA protein sequences differ substantially (8 and 5% divergence, respectively). Such findings further support the conclusion that wsp alone is an unreliable predictor of either strain similarity or divergence.
Overall, the results show that the MLST system described here provides an excellent method for typing Wolbachia strains from diverse hosts and also for discriminating among strains in the same host species (e.g., D. simulans strains). It should be noted that identity in MLST genes does not mean that two strains are identical throughout their genomes. Even within a host species, where an ancestral strain can be derived from a relatively recent ancestor, variation within the genome can accumulate, particularly in quickly evolving or highly recombinant genes, such as mobile elements and/or ankyrin repeat domains. This has been shown for strains found in C. pipiens and Drosophila species (19,44). Discrimination of substrains within the same ST requires the use of additional specific polymorphic markers (39,44). The MLST database can accommodate this additional information through links that can be accessed through stored accession numbers.
Detection and typing of multiple coinfecting strains. Double Wolbachia infections are common in insects (24,55), and only in some circumstances can they be separated experimentally (34). MLST primers for typing supergroup A and B Wolbachia strains from individuals that are doubly infected have been designed to address this issue. However, PCR-based systems like MLST and WSP typing are constrained for examination of double infections as the different alleles cannot always be reliably assigned to specific strains coinfecting the same individual. A preliminary analysis using a small sample of multiply and singly infected species gave positive results. Primer performance, however, should be verified on a larger scale. Use of the group-specific primers for MLST requires workers to assume that there have not been reciprocal gene exchanges between the strains. Therefore, MLST for typing hosts that are doubly infected with supergroup A and B Wolbachia strains should not be used unless reliable supergroup A-and B-specific alleles can be identified for each of the MLST genes. Allelic profiles for doubly infected individuals are given the status "candidate" STs in the MLST database until the strains can be experimentally separated (e.g., by partial antibiotic curing or segregation by other means).
In some cases host species are coinfected with very closely 7108 BALDO ET AL. APPL. ENVIRON. MICROBIOL. related strains that belong to the same supergroup. These strains can occur in double or multiple infections and are coded A1A2, B1B2, etc. Because of the high levels of genetic similarity among the strains, these multiple infections are normally hard to detect, and when they are detected, the specific amplification and correct assignment of sequences to one of the strains are often difficult. Such multiple infections may be more common than has been assumed, and providing an estimate of their global frequency in host species represents an important first step. Based on a preliminary screening of a small sample of N. longicornis strains doubly infected with supergroup A and/or B strains using MLST group-specific primers, the multiple peaks in the electropherograms of some of the MLST gene products can be read as a signature of the potential presence of multiple strain variants. However, additional analyses are necessary to verify the actual existence of multiple strains. Relating MLST to Wolbachia biology. Even with the limited number of strains analyzed here, several patterns regarding the transmission of Wolbachia are apparent. Based on both MLST and WSP data, different host organisms were found to harbor very similar or identical Wolbachia strains; these organisms include the wasp species N. vitripennis (A) and M. uniraptor; the mosquito subspecies C. pipiens pipiens and C. pipiens quinquefasciatus; the fly species D. melanogaster, D. simulans wAu, D. recens, and D. innubila, as well as the wasp species N. longicornis; and the fly species D. neotestacea and D. orientacea. The strain similarities may have been due to recent horizontal transmission of Wolbachia across host species or, in the case of the closely related species D. neotestacea and D. orientacea, to codivergence and coevolution of Wolbachia strains and their hosts. Because genetically similar strains are often found in similar insect hosts, ecological interactions among hosts may have mediated such horizontal transfers. In particular, it appears that horizontal transmission of Wolbachia has occurred quite recently and frequently within the genus Drosophila.
Many of the reproductive phenotypes of the strains analyzed in this study remain unknown, and the possibility that there is a correlation between STs and/or WSP profiles and the strain phenotype cannot be eliminated yet. Based on the data set analyzed, genetically similar strains found in different host species were often associated with distinct host reproductive manipulations ( Table 2). For example, strains found in N. vitripennis (A) and M. uniraptor, which have identical ST and very similar WSP profiles, are responsible for inducing cytoplasmic incompatibility (5) and parthenogenesis (45), respectively, in their hosts. Similarly, based on both MLST and WSP data the D. simulans wAu and wasp N. longicornis (A) strains are very closely related but are responsible for different patterns of cytoplasmic incompatibility (5,29,58). Based on these observations, it is now relevant to investigate the genetic changes in these closely related strains that may be involved in the phenotypic shift. In addition, as future studies clarify the specific role of wsp in host-parasite interactions, information on amino acid motifs in HVRs may prove to be useful. As more allelic profiles become available for strains that are found in hosts belonging to different taxa, occur in different geographic regions, and are subjected to different phenotypes, the potential of the MLST database for revealing ecological, evolutionary, and functional patterns should increase.