Application of Whole-Cell Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry for Rapid Identification and Clustering Analysis of Pantoea Species

ABSTRACT Pantoea agglomerans is an ecologically diverse taxon that includes commercially important plant-beneficial strains and opportunistic clinical isolates. Standard biochemical identification methods used in diagnostic laboratories have repeatedly been shown to produce false-positive identifications of P. agglomerans, a problem also reflected in the high number of 16S rRNA gene sequences in public databases that are incorrectly assigned to this species. More reliable methods for rapid identification are required to ascertain the prevalence of this species in clinical samples and to evaluate the biosafety of beneficial isolates. Whole-cell matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) methods and reference spectra (SuperSpectrum) were developed for accurate identification of P. agglomerans and related bacteria and were used to detect differences in the protein profile between variants of the same strain, including a ribosomal point mutation conferring streptomycin resistance. MALDI-TOF MS-based clustering was shown to generally agree with classification based on gyrB sequencing, allowing rapid and reliable identification at the species level.

Pantoea agglomerans (20) is a ubiquitous plant-epiphytic bacterium that belongs to the family Enterobacteriaceae. While several strains are commercialized for biological control of plant diseases (23), the species also includes two phytopathogenic pathovars that carry distinctive virulence plasmids (32). P. agglomerans has a Jekyll-Hyde nature, having also been described as an opportunistic human pathogen (30), which raises biosafety regulatory issues for the utilization of beneficial isolates (45). Clinical reports predominantly involve septicemia following penetrating trauma (16,56) or nosocomial infections (14,55). Clinical pathogenicity of this species has not been conclusively confirmed (Koch's postulates remain unfulfilled). Infections attributed to P. agglomerans are typically polymicrobial, involve patients affected by other diseases (14), and may represent secondary contamination of wounds. Standard clinical diagnostics and identification rely mainly on biochemical profiling or, alternatively, on 16S rRNA gene sequencing, despite the inadequacy of these techniques for precise discrimination within the Enterobacter and Pantoea genera (5,20,39). Problems with correct identification have been observed for automated systems such as the API 20E (24,39) and Vitek-2/GNI+ (39,40) (both from bioMérieux) or the Phoenix (11,38) and Crystal identification systems (40,48) (both from BD Diagnostic Systems).
P. agglomerans is a composite taxon conglomerating former Enterobacter agglomerans, Erwinia milletiae, and Erwinia herbicola strains. Accurate identification is complicated by the unsettled taxonomy of the "P. agglomerans-E. herbicola-E. agglomerans" complex (45). Recent analyses based on gyrB sequencing, multilocus sequence analysis (MLSA) (4), and fluorescent amplified fragment length polymorphisms (fAFLP) (45) indicate that strains belonging to Enterobacter or Erwinia archived in culture collections are often erroneously assigned to P. agglomerans and are likely also misidentified in clinical diagnostics. False classifications of environmental P. agglomerans strains as related Pantoea species, including human- or plant-pathogenic P. ananatis, are also common (45). Inadequate biochemical identification methods and uncertainty regarding current taxonomy are also revealed by the excessive number of 16S rRNA gene sequences incorrectly assigned to P. agglomerans that can be retrieved from GenBank (Fig. 1). Sequencing of housekeeping genes, MLSA, and fAFLP are labor-intensive, time-consuming, and impractical as routine diagnostic tools. Whole-cell matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) (31) is an emerging technology for identification of bacteria (26,46), fungi (17,33), viruses (29,51), insects (41), and helminths (42). MALDI-TOF MS-based identification can accurately resolve bacterial identity at the genus, species, and in some taxa subspecies levels (e.g., Salmonella enterica serovars, Listeria genotypes) (1,18). Identity is based on unique mass/charge ratio (m/z) fingerprints of proteins, which are ionized using short laser pulses directed to bacterial cells obtained from a single colony embedded in a matrix.
FIG. 1. Taxonomy of putative P. agglomerans isolates based on 16S rRNA gene sequences retrieved from GenBank under the currently accepted species name or under the old basonyms Enterobacter agglomerans and Erwinia herbicola. Out of a total of 331 complete or partial sequences found, 263 could be aligned over their 1,240-bp central region resulting in a minimum evolution tree. For the analysis, gaps and missing data were eliminated only in pairwise sequence comparisons, resulting in a total of 1,114 positions. Nodal supports were assessed by 1,000 bootstrap replicates. Only bootstrap values greater than 50% are shown. The scale bar represents the number of base substitutions per site. The number of "P. agglomerans" sequences clustering with a given reference strain is shown in parentheses. Reference strains and clades containing reference strains are marked in bold, and the corresponding accession numbers are indicated between brackets. For the genus Erwinia the following reference strains were used:
After desorption, ions are accelerated in vacuum by a high electric potential and separated on the basis of the time taken to reach a detector, which increases with the square root of the mass-to-charge ratio of the ion. This technique has been shown to deliver reproducible protein mass fingerprints starting from an aliquot of a single bacterial colony within minutes and without any prior separation, purification, or concentration of samples. Whole-cell MALDI-TOF MS is a reliable technique across broad conditions (e.g., different growth media, cell growth states), with limited variability in mass-peak signatures within a selected mass range (m/z between 2,000 and 20,000) that does not affect reliability of identification (28,31). MALDI-TOF MS profiles primarily represent ribosomal proteins, which are the most abundant cellular proteins and are synthesized under all growth conditions (47).
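The time-of-flight separation can be illustrated with a minimal sketch of an idealized linear instrument, in which an ion of charge z accelerated through a potential U acquires kinetic energy zeU and then drifts through a field-free tube. Under that textbook model the flight time scales with the square root of m/z. The 20-kV potential is the value given in the acquisition protocol; the 1-m drift length and the function name are assumptions made only for illustration, and effects such as delayed extraction are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def flight_time(mass_da, charge=1, voltage=20_000.0, drift_length_m=1.0):
    """Idealized field-free flight time (seconds) in a linear TOF tube.

    voltage: acceleration potential in volts (20 kV, as in the protocol).
    drift_length_m: ASSUMED tube length; the text does not state it.
    """
    mass_kg = mass_da * DALTON
    # z*e*U = (1/2)*m*v^2  =>  t = L * sqrt(m / (2*z*e*U))
    return drift_length_m * math.sqrt(mass_kg / (2 * charge * E_CHARGE * voltage))


t_low = flight_time(2_000)     # lower bound of the acquisition mass range
t_high = flight_time(20_000)   # upper bound of the acquisition mass range
ratio = t_high / t_low         # a 10-fold mass increase gives sqrt(10) in time

print(f"{t_low * 1e6:.1f} us, {t_high * 1e6:.1f} us, ratio {ratio:.3f}")
```

Because only the ratio m/z enters the expression, a doubly charged ion of 8,000 Da would arrive together with a singly charged ion of 4,000 Da in this model.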
MALDI-TOF MS identification profiles derived from several characterized strains for a given species are used to develop reference spectra (e.g., SuperSpectrum; AnagnosTec GmbH, Potsdam, Germany), and they include a subset of characteristic and reproducible markers. MALDI-TOF MS identification databases are currently available for a relatively wide range of clinical bacteria, and this method has become an accepted tool for routine clinical diagnostics due to enhanced simplicity, rapidity, and reliability. However, environmental bacteria, such as Pantoea, have not been widely evaluated using MALDI-TOF MS and are largely absent from identification databases, limiting the practical reach of this new technology. Our objectives were to develop a robust method for rapid identification of P. agglomerans and related bacteria based on MALDI-TOF MS and to compare MALDI-TOF MS results against those obtained from a phylogenetic analysis based on gyrB sequencing as well as against biochemical identification methods.

MATERIALS AND METHODS
Screening and analysis of total available Pantoea 16S rRNA gene sequences. The NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore) was scrutinized for 16S rRNA gene sequences of putative P. agglomerans isolates by searching for entries beneath the currently accepted species name and under the old basonyms Enterobacter agglomerans and Erwinia herbicola. Out of a total of 394 complete or partial sequences found, 263 were at least 1,240 bp long and were retained for the analysis together with 21 reference sequences of relevant species of Enterobacteriaceae. The resulting ClustalW (52) alignment was employed to construct a minimum evolution tree using the Molecular Evolutionary Genetics Analysis (MEGA) program, version 4.0 (50).
Bacterial strains. A collection of 53 strains received as P. agglomerans, a Pantoea sp., or E. agglomerans from research and culture collections and 20 reference strains belonging to closely related Pantoea species and other Enterobacteriaceae were compared (Table 1). The streptomycin-resistant, commercial biocontrol strain Pantoea vagans C9-1S (formerly P. agglomerans [45]) and a variant strain, P. vagans C9-1W, lacking the 530-kb pPag3 plasmid (49), were included to evaluate the sensitivity and robustness of MALDI-TOF MS. For standardized spectral acquisition and generation of "SuperSpectra," as determined by SARAMIS (spectral archive and microbial identification system) (AnagnosTec GmbH), bacteria were grown on LB agar at 28°C for 24 to 48 h. Tryptic soy agar (TSA) and Mueller-Hinton agar (MHA) were used in parallel to assess the influence of different media on species recognition. Alongside MALDI-TOF MS analysis, biochemical profiling and conventional Sanger sequencing of the gyrB gene were performed for all strains.
gyrB sequence analysis. Amplification and sequencing of the gyrB gene were performed as described previously (45) by means of the HotStarTaq master mix kit (Qiagen, Basel, Switzerland) and the ABI PRISM BigDye Terminators version 1.1 cycle sequencing kit (Applied Biosystems, Foster City, CA), respectively. Two degenerate primers, gyr-320 (5′-TAARTTYGAYGAYAACTCYTAYAAAGT-3′) and rgyr-1260 (5′-CMCCYTCCACCARGTAMAGTTC-3′) (15), were used to amplify and sequence a 970-bp region of the gyrB gene. The phylogenetic tree was generated on the basis of a 740-bp fragment of the gyrB amplicon. DNA sequences were aligned with ClustalW (52). Sites presenting alignment gaps were excluded from analysis. The Molecular Evolutionary Genetics Analysis (MEGA) program, version 4.0 (50), was used to calculate evolutionary distances and to infer a tree based on the neighbor-joining (NJ) method using the maximum composite likelihood (MCL) model. Nodal robustness of the inferred tree was assessed by 1,000 bootstrap replicates. GenBank accession numbers for sequences used in this work were GU225728 and GU225729, FJ617346 to FJ617453, FJ617355 to FJ617459, FJ617361 to FJ617483, FJ617385 to FJ617486, FJ617389 to FJ617491, FJ617393 to FJ617496, FJ617398 to FJ617402, FJ617404 to FJ617405, FJ617408 to FJ617413, FJ617416 to FJ617419, FJ617422, FJ617424, FJ617425 to FJ617427, EF988757 to EF988758, EF988768, and EU145275.
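The two gyrB primers are written with IUPAC ambiguity codes (R, Y, M), so each one stands for a pool of concrete oligonucleotides. The following sketch, given only as an illustration, counts how many distinct sequences each degenerate primer represents. The primer strings are those quoted in the text; the function and variable names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "gyr-320": "TAARTTYGAYGAYAACTCYTAYAAAGT",
    "rgyr-1260": "CMCCYTCCACCARGTAMAGTTC",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt,", degeneracy(seq), "variants")
```

Each primer contains only two-fold ambiguities, so the pool size is a power of two.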
Biochemical identification. Automated biochemical identification of strains was performed using the Phoenix 100 ID/AST system (V5.66A) with NMC/ID-51 panels and the EpiCenter (V5.66A/V4.61A) microbiology data management system (BD Biosciences, Sparks, MD), following the manufacturer's protocols. Each panel contained 45 substrates (Fig. 2) and included two fluorescent positive-control wells. Identification was determined using the Phoenix software by comparing the patterns of positive and negative reactions of individual samples with those of species contained in the commercial database. The current Phoenix database contains 90 genera, 324 species, and five CDC enteric groups.
MALDI-TOF MS spectrum acquisition. Cells from a single bacterial colony grown on LB agar for 24 h were transferred to a target spot of a steel target plate using a disposable loop, overlaid with 0.5 µl of a 2,5-dihydroxybenzoic acid (DHB) matrix (AnagnosTec GmbH, Potsdam, Germany), and air dried within 1 to 2 min at 24 to 27°C. Protein mass fingerprints were obtained using an Axima Confidence MALDI-TOF mass spectrometer (Shimadzu-Biotech Corp., Kyoto, Japan), with detection in the linear, positive mode at a laser frequency of 50 Hz and within a mass range of 2,000 to 20,000 Da. Acceleration voltage was 20 kV, and the extraction delay time was 200 ns. A minimum of 20 laser shots per sample was used to generate each ion spectrum. For each bacterial sample, a total of 50 protein mass fingerprints were averaged and processed using the Launchpad version 2.8 software (Shimadzu-Biotech Corp.). For peak acquisition, the average smoothing method was chosen, with a smoothing filter width of 50 channels. Peak detection was performed with the threshold-apex peak detection method using the adaptive voltage threshold, which roughly follows the signal noise level, and the baseline was subtracted with a baseline subtraction filter width of 500 channels. For each sample, a list of the significant spectrum peaks was generated that included the m/z values for each peak, mass deviations, and signal intensity. Calibration was conducted for each target plate using spectra of the reference strain Escherichia coli K-12 (GM48 genotype). E. coli K-12 was deposited on each plate in two fixed positions, and the calibration was performed at the beginning of each plate acquisition. At the end, a second measurement of K-12 was performed as a control.
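The processing chain described above (average smoothing, baseline subtraction, apex detection above a threshold) can be sketched generically as follows. This is not the Launchpad implementation, whose internals are not described in the text: the rolling-minimum baseline, the fixed threshold (in place of the adaptive noise-following threshold), the small window widths, and the toy signal are all assumptions chosen to keep the example short.

```python
def moving_average(signal, width):
    """Smooth by averaging each channel with its neighbours (window = width)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def subtract_baseline(signal, width):
    """Subtract a rolling-minimum baseline estimate (an assumed method)."""
    half = width // 2
    return [
        value - min(signal[max(0, i - half): i + half + 1])
        for i, value in enumerate(signal)
    ]


def apex_peaks(signal, threshold):
    """Indices of local maxima whose intensity exceeds a fixed threshold."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


# Toy intensity trace with two peaks on a constant offset (hypothetical data).
raw = [5, 5, 6, 9, 6, 5, 5, 8, 14, 8, 5, 5]
smoothed = moving_average(raw, width=3)        # the text uses 50 channels
corrected = subtract_baseline(smoothed, width=7)  # the text uses 500 channels
peaks = apex_peaks(corrected, threshold=1.5)
print(peaks)
```

On real spectra the window widths would be set in channels as in the protocol, and the threshold would follow the local noise level rather than being constant.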
MALDI-TOF MS spectrum analysis and SuperSpectrum generation. Generated protein mass fingerprints were first imported into SARAMIS and analyzed using the following preset parameters: mass range, from 2,000 to 20,000 Da; allowed mass deviation, 800 ppm. The spectra were related through cluster analysis by applying the single-link agglomerative algorithm of SARAMIS. Distance trees were compared to the neighbor-joining phylogenetic gyrB tree. As reference spectra for rapid identification, SARAMIS uses so-called "SuperSpectra" consisting of taxon-specific biomarkers. SuperSpectrum generation was based on recovered mass signal markers with an absolute intensity of at least 200 mV in the 2,000- to 20,000-Da mass range. To create the SuperSpectrum of a species, only the protein mass fingerprints of the strains clustering with the respective type strain in the gyrB tree were used (with species clustering indicated by the associated bracket). Using the SuperSpectrum tool, a subset of protein masses found in at least 90% of the strains of one species were selected and tested for their discriminatory power by comparing them to all of the database entries. Depending on the number of remaining species-identifying marker masses, each was given a numeric weight such that the total number of points did not exceed 1,250. A set of 20 to 40 marker masses is normally sufficient to obtain a specific identification to the species level. The identification results obtained using MALDI-TOF MS were compared to those obtained using DNA sequencing combined with a BLAST similarity search and to the outcome of the biochemical analysis using the Phoenix 100 ID/AST system.
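Two of the parameters above lend themselves to a small worked example: the 800-ppm mass tolerance and the rule that a candidate marker must occur in at least 90% of the strains of a species. The sketch below is an illustration under simplifying assumptions, not the SARAMIS SuperSpectrum tool: candidates are taken from the first spectrum only, the 200-mV intensity filter, the discriminatory check against other database entries, and the point weighting are omitted, and all peak lists are hypothetical.

```python
def tolerance_da(mass, tol_ppm=800.0):
    """Absolute width (Da) of a ppm tolerance at a given mass."""
    return mass * tol_ppm / 1e6


def matches(mass_a, mass_b, tol_ppm=800.0):
    """True if two masses differ by no more than tol_ppm of their mean."""
    mean = (mass_a + mass_b) / 2
    return abs(mass_a - mass_b) <= tolerance_da(mean, tol_ppm)


def candidate_markers(spectra, min_fraction=0.9, tol_ppm=800.0):
    """Masses of the first spectrum found in >= min_fraction of all spectra."""
    markers = []
    for mass in spectra[0]:
        hits = sum(
            any(matches(mass, other, tol_ppm) for other in peaks)
            for peaks in spectra
        )
        if hits / len(spectra) >= min_fraction:
            markers.append(mass)
    return markers


# Hypothetical peak lists (Da) for ten strains of one species.
spectra = [[5000.0 + i * 0.3, 7000.0, 9000.0] for i in range(8)]
spectra += [[5002.5, 7001.0], [5001.0]]

print(tolerance_da(2_000), tolerance_da(20_000))
print(candidate_markers(spectra))
```

In this toy set the mass near 5,000 Da occurs in all ten strains and the one near 7,000 Da in nine, so both pass the 90% rule, whereas the 9,000-Da mass (eight of ten strains) is rejected.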
Nucleotide sequence accession numbers. Sequences newly obtained as a result of this study were deposited in GenBank under accession numbers GU225728 and GU225729.

RESULTS AND DISCUSSION
Only 151 of 263 (i.e., 57%) 16S rRNA sequences retrieved from NCBI and listed as belonging to P. agglomerans isolates actually clustered with type strain P. agglomerans LMG 1286T. The remaining ones could be assigned either to other Pantoea spp. (29 sequences), to the genus Erwinia (20 sequences), or to other taxa of the Enterobacteriaceae (62 sequences) or did not cluster with any of the chosen reference species or genera and did not produce significant matches with reliable 16S rRNA sequences at NCBI (i.e., blastn ≤ 97%) (Fig. 1). These results underscore the inadequacy of current biochemical and molecular identification methods employed in clinical diagnostics for P. agglomerans and the common use of obsolete taxonomy. Incorrect sequences may have entered GenBank in two ways: blanket relocation to P. agglomerans of some species within the "P. agglomerans-E. herbicola-E. agglomerans" complex, or a posteriori sequencing of isolates biochemically misidentified as P. agglomerans. The presence of such rogue data in the GenBank 16S rRNA gene database constitutes a potential pitfall for anyone trying to identify P. agglomerans only on the basis of 16S rRNA gene sequences.
Automated biochemical identification using the Phoenix 100 ID/AST system was less accurate than gyrB sequencing or MALDI-TOF MS and returned uncertain strain identification within P. agglomerans and related species. Of the 23 Pantoea strains analyzed, 19 were identified biochemically as P. agglomerans (Fig. 2), although only six of them could accurately be assigned to P. agglomerans using gyrB sequence analysis (Fig. 3). The Phoenix 100 ID/AST system was unable to separate the different species within the genus, as none of the other recognized Pantoea species is currently present in its database. Conversely, five strains belonging to other genera were incorrectly assigned to P. agglomerans using biochemical analysis: Erwinia persicina LMG 3622, Enterobacter sp. ATCC 27988 and ATCC 27991, Tatumella punctata LMG 22097, and Tatumella citrea LMG 23359, the latter two previously belonging to the former "Japanese species" of Pantoea (4,8). For the first three strains this outcome is the consequence of imprecise biochemical profiling, as 29 out of 45 reactions within the NMC/ID-51 panel are allowed to deliver a variable result while still retaining the identification as P. agglomerans (Fig. 2). This is most evident in Enterobacter sp. ATCC 27988, where all 16 nonvariable reactions correspond to the Phoenix profile of P. agglomerans but where 14 out of the 29 reactions which allow a variable result are different between this strain and P. agglomerans LMG 1286T. On the other hand, Tatumella strains LMG 22097 and LMG 23359 have biochemical profiles which are closely related to those of Pantoea, and only recently have DNA-DNA hybridization and phenotypic tests allowed the transfer of these species to the genus Tatumella (8). One strain belonging to P. agglomerans (ATCC 27987) was incorrectly assigned biochemically to Mannheimia haemolytica, while P. ananatis ATCC 27996 and Pantoea sp. LMG 5343 were misidentified as Vibrio cholerae and Cedecea davisae, respectively. For all three strains the number of nonvariable reactions matching the Phoenix profile of P. agglomerans fell short of 15, which is apparently among the minimum prerequisites for a positive identification (Fig. 2). Taken together, these results suggest that the taxonomical confusion within the former "P. agglomerans-E. herbicola-E. agglomerans" complex (19) contributed, at least in part, to the generation of imprecise biochemical profiles for the identification of P. agglomerans.
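The weakness of a profile in which 29 of 45 reactions may go either way can be made concrete with a toy matcher. The sketch below only illustrates the counting of nonvariable reactions (NVR) discussed above. It is not the Phoenix algorithm, the cutoff of 15 is the value the text describes as apparent rather than documented, and the reaction patterns are invented.

```python
def count_nonvariable_matches(profile, observed):
    """Count fixed reactions that agree with the expected profile.

    profile: True/False for nonvariable reactions, None for variable ones.
    observed: True/False result for every reaction of the tested strain.
    """
    return sum(
        1
        for expected, result in zip(profile, observed)
        if expected is not None and expected == result
    )


def accepted(profile, observed, min_matches=15):
    """Assumed acceptance rule: enough nonvariable reactions must match."""
    return count_nonvariable_matches(profile, observed) >= min_matches


# Hypothetical 45-reaction profile: 16 nonvariable and 29 variable reactions.
profile = [True, False] * 8 + [None] * 29

# Strain A matches every fixed reaction; its variable reactions are ignored.
strain_a = list(profile[:16]) + [True] * 29
# Strain B differs in three fixed reactions and therefore falls below 15.
strain_b = [not r for r in profile[:3]] + list(profile[3:16]) + [False] * 29

print(count_nonvariable_matches(profile, strain_a), accepted(profile, strain_a))
print(count_nonvariable_matches(profile, strain_b), accepted(profile, strain_b))
```

Under such a rule any organism that happens to share the 16 fixed reactions is accepted regardless of how it behaves in the remaining 29, which is consistent with the false positives reported for strains from other genera.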
As expected, phylogenetic analysis based on gyrB sequences (Fig. 3) provided greater discriminatory power than either biochemical or 16S rRNA gene sequencing to describe the Pantoea group (45). Confirming the uncertainty of P. agglomerans identification, only 20 of the 53 strains received from culture collections as P. agglomerans, Pantoea spp., or E. agglomerans clustered with type strain LMG 1286T according to gyrB sequencing. Seven strains previously assigned to P. agglomerans were found to fit in the MLSA groups of Pantoea recently described as new species (i.e., C9-1 as P. vagans; EM13cb and SC-1 as P. anthophila; LMG 5343 and ATCC 29001 as P. brenneri; and EM17cb as P. conspicua), or to belong to a novel subspecies of P. agglomerans (Eh252) (4,6,7,45). The remaining strains were reassigned to other Pantoea species or Enterobacteriaceae, although precise identification was not possible in all cases.
FIG. 2. A UPGMA dendrogram (on the left) was generated based on the binary biochemical data (shown in the middle) of the selected strains. Positive reactions are indicated by a black square, whereas an empty square means that for that strain the reaction was negative after 48 h of incubation. Names of strains belonging to the Pantoea genus according to gyrB sequencing are in boldface, while those belonging to P. agglomerans are preceded by a number sign (#). The projected Phoenix profile for P. agglomerans is shown below; a grey square indicates that either a positive or a negative result can be expected for that reaction. Identification provided by the Phoenix 100 ID/AST system is shown on the right side, followed by the number of nonvariable reactions (NVR, Nmax = 16) matching the expected biochemical profile of P. agglomerans. Substrates evaluated were as follows: 1, arginine-arginine-AMC (AMC is 7-amido-4-methylcoumarin); 2, glycine-proline-AMC; 3, glycine-AMC; 4, glutaryl-glycine-arginine-AMC; 5, L-arginine; 6, L-glutamic acid-AMC; 7, L-leucine-AMC; 8, L-phenylalanine-AMC; 9, L-proline-AMC; 10, L-pyroglutamic acid-AMC; 11, L-tryptophan-AMC; 12, lysine-alanine-AMC; 13
FIG. 3. Comparison between dendrograms derived from gyrB sequencing (A) and MALDI-TOF MS protein mass fingerprints (B). With the exception of the reference strains marked in bold, all other isolates were received as P. agglomerans, a Pantoea sp., or E. agglomerans from culture collections. The gyrB phylogenetic tree was generated on the basis of a 740-bp fragment of the gyrB amplicon using 1,000 bootstrap replicates of the neighbor-joining (NJ) method with the maximum composite likelihood (MCL) model without choosing any outgroup. Sites presenting alignment gaps were excluded from analysis. The MALDI-TOF MS dendrogram was constructed based on the protein mass fingerprint patterns of all analyzed strains using the single-link clustering algorithm implemented in SARAMIS. This algorithm is proprietary to the database provider and does not provide the option to perform bootstrap analysis. Correct clustering of each strain was confirmed by running at least four replicates per isolate, but, for clarity and conciseness, graphics were reduced to show a single representative spectrum per strain. The main clusters obtained in both phylogenetic trees were highly similar; thus, the two methods were comparable in discriminating P. agglomerans from other species and in recognizing previously misidentified strains. The diversity within a single species is apparently exaggerated by MALDI-TOF MS (deeper strain branches) and does not reflect the actual phylogenetic distances. *, for P. agglomerans, P. vagans, P. ananatis, and P. dispersa, brackets indicate the isolates clustering with the respective type strain, which were used to create the SuperSpectrum of the corresponding species.

VOL. 76, 2010
PANTOEA IDENTIFICATION BY MALDI-TOF MS 4503
FIG. 4. Characteristic masses (in daltons) recognized as markers for the identification of P. agglomerans, P. vagans, P. ananatis, and P. dispersa. Two or more masses are considered equivalent and are shown on the same line if their values deviate by less than 800 ppm from the average. The 800-ppm mass accuracy setting defining the spectral bin size was evaluated/validated by AnagnosTec GmbH. Even if the error margin in a data set analyzed on a single MALDI-TOF MS instrument is usually smaller (around 400 ppm), a larger bin size is necessary to allow, within the SARAMIS database, the comparison of data collected with different instruments in different laboratories. An 800-ppm mass accuracy still allows one to distinguish mass differences of 14 Da for proteins of 10 kDa, which is normally sufficient to detect single-amino-acid exchanges in proteins of that size. Masses shared by three or more Pantoea species are marked in bold.
MALDI-TOF MS delivered results which were almost equivalent to those of gyrB sequencing in terms of species grouping (Fig. 3). Only a single strain (ATCC 27987) was found to be intermediate, being offset from the P. agglomerans (sensu stricto) group when a mass range of 2 to 20 kDa was used (Fig. 3) but not when the lower limit of the mass range was raised to 3 kDa. ATCC 27987 was confirmed as P. agglomerans (sensu stricto) using fAFLP, although it must be noted that it was the only isolate genetically assigned to P. agglomerans for which the biochemical signature obtained with the Phoenix 100 ID/AST system was noticeably dissimilar from those of the other strains of the species, leading not only to a wrong automated identification based on the number of correct nonvariable reactions but also to an apparently incorrect clustering based on the overall reaction pattern (Fig. 2).
The protein mass fingerprints of the 21 P. agglomerans (sensu stricto) strains and strains of P. ananatis, P. dispersa, and P. vagans were used to generate a SuperSpectrum with identifying mass peaks for each species (Fig. 4). A typical MALDI-TOF MS spectrum of P. agglomerans contained about 150 ion peaks between 2,000 and 20,000 Da, with the highest intensity peaks found between 4,000 and 11,000 Da. Comparison of these protein mass fingerprints defined a set of 21 markers present in at least 90% of all protein mass fingerprints for confident identification of P. agglomerans. While a subset of masses were shared by different Pantoea species, a combination of discriminatory signals provided a unique species-level signature (Fig. 4). MALDI-TOF MS spectra of strains having an identity that could not be genetically confirmed as Pantoea (e.g., LMG 5339) showed widely divergent profiles compared to strains confidently assigned to this genus (Fig. 5). These divergent signal patterns were reflected in both the MALDI-TOF MS and gyrB trees, with the related strains clustering well outside the genus Pantoea (Fig. 3). On the basis of 16S rRNA gene sequencing, strain LMG 5339 showed 99.5% identity to Buttiauxella agrestis DSM 4586 (45). MALDI-TOF MS analysis was able to discriminate strains within Pantoea and to segregate related strains into separate species/clades with the same level of accuracy as gyrB sequencing and more sensitively than either biochemical or 16S rRNA gene sequencing approaches. Clustering between strains within a group was not identical in the two methods, but these fluctuations at the subgroup level are inconsequential as long as the aim is to ensure that strains remain within a species.
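The MALDI-TOF MS dendrogram relies on single-link agglomerative clustering of protein mass fingerprints. Since the SARAMIS implementation is proprietary, the following is only a generic sketch of the single-link idea: each spectrum is reduced to a set of binned peak labels, the Jaccard distance is assumed as the dissimilarity measure, and clusters are merged by their closest pair of members until a cutoff is exceeded. Strain names, peak sets, and the cutoff are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two peak sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def single_link(items, distance, cutoff):
    """Merge clusters by smallest pairwise member distance up to a cutoff."""
    clusters = [{name} for name in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(
                    distance(items[a], items[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > cutoff:
            break
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical binned peak sets for four strains of two species.
fingerprints = {
    "s1": {1, 2, 3, 4},
    "s2": {1, 2, 3, 5},
    "s3": {6, 7, 8, 9},
    "s4": {6, 7, 8, 1},
}

clusters = single_link(fingerprints, jaccard_distance, cutoff=0.5)
print(sorted(sorted(c) for c in clusters))
```

Single linkage joins groups through their nearest members only, which may be one reason why within-species branching in such a dendrogram need not mirror phylogenetic distances.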
Misidentified strains previously grouped into P. agglomerans were accurately assigned to P. ananatis, P. dispersa, or the recently defined species P. vagans or to discrete clades following Brenner's biogroups (10). For example, strains could be assigned to biogroup VII (LMG 5336, ATCC 27993, ATCC 27994, and EM2cb), biogroup VIII (LMG 5341, ATCC 27991, and ATCC 27992), and biogroup XII (LMG 5337, ATCC 27981, and ATCC 27990) using both MALDI-TOF MS and gyrB sequencing (Fig. 3). Furthermore, MALDI-TOF MS was in agreement with gyrB sequencing regarding strains clustering together as P. stewartii (CFBP 3517 and CFBP 3614), P. anthophila (SC-1 and EM13cb), or a probable novel Pantoea species (EM486 and EM595) (45). Our standardized MALDI-TOF MS protocol using strains grown on LB agar delivered highly repeatable results, with only a few replicates that did not immediately cluster with the other measurements performed on the same isolate. Even so, plotting all the replicates makes it easy to discard substandard measurements which, in this case, still cluster within the same species and hence would retain all prerequisites for successful identification at this taxonomical level (see Fig. S1 in the supplemental material). The use of alternative media such as TSA or MHA shows that altering the growth parameters does have a certain influence on the composition of the mass spectra obtained. This is reflected in the merging of nearby strain measurements in the dendrogram and leads to a loss of resolution that no longer allows the unambiguous recognition of each single strain. However, all measurements within the same species are still kept in a tight cluster, thereby preserving the conditions for unequivocal identification at the species level (see Fig. S2 in the supplemental material).
Identification using MALDI-TOF MS not only provided robust recognition but was also able to detect protein profile differences within P. vagans C9-1 (4.88-Mb genome) resulting from a large genomic alteration such as curing of the 530-kb megaplasmid pPag3 (pigmentless variant C9-1W, 4.35-Mb genome) (49), as shown for instance by the loss of the 3,499-Da mass signal (Fig. 6). Other masses missing from the plasmid-cured variant C9-1W were at 2,133, 2,140, 2,309, 2,396, 3,197, 3,552, 3,570, 4,295, and 4,850 Da, while no peak was found exclusively in the wild type. Since distinctive marker signals for P. vagans (Fig. 4) were unaffected by the loss of the plasmid and the number of missing masses was relatively low, neither the species-level assignment of this variant nor its position in the MALDI-TOF MS dendrogram was altered with respect to the wild type (Fig. 3). This can be explained by the fact that the main signals in MALDI-TOF MS were reported to be ribosomal proteins (47), and thus only a fraction of masses disappears from the profile following the loss of pPag3. Indeed, a number of characteristic masses recognized as markers for the identification of the considered Pantoea species are compatible with the predicted masses of Pantoea sp. At-9b ribosomal proteins deposited in the UniProt database (http://www.uniprot.org/) within the allowed mass deviation of 800 ppm (Table 2). Acquisition of the 135-kb virulence plasmid pPATH (32) also did not confound accurate identification of the phytopathogenic strain ATCC 43348 as P. agglomerans using MALDI-TOF MS. Moreover, replacement of the DHB matrix with sinapinic acid allowed us to identify the K43R (lysine to arginine) point mutation in 30S ribosomal protein S12 (12) that confers streptomycin resistance in the commercial biocontrol strain P. vagans C9-1S (Fig. 7).
The primary drawback of MALDI-TOF MS for bacterial identification and diagnostics is the dearth of reference species in databases, which are limited primarily to clinical species while environmental bacteria such as Pantoea are largely absent. Unfortunately, there are at present no public repositories for diagnostic protein patterns, which are currently archived only within commercial databases such as that contained in SARAMIS. A further potential problem is the need for isolated colonies for MALDI-TOF MS analysis, which may still represent a limitation in diagnostic laboratories for fastidious or slow-growing species (e.g., Bordetella sp., Borrelia sp., Neisseria gonorrhoeae, or Mycoplasma spp.). In such instances direct molecular or serological methods may still retain the advantage, although these methods often require some a priori assumption about the nature of the organism to be identified.
This study demonstrates both the accuracy and the simplicity of whole-cell MALDI-TOF MS for identification of the complex P. agglomerans group and related taxa. Strain identification using the unique protein profiles generated was achieved with minimal labor and materials and within a few minutes for MALDI-TOF MS sample preparation and analysis. This offers an attractive alternative to the relatively high investment required for single-locus validation, PCR amplification, and sequencing. Investment in a MALDI-TOF mass spectrometer is comparable to that needed for a 16-capillary DNA-sequencing machine, but it requires a fraction of the operating costs and consumables. We also demonstrated the application of MALDI-TOF MS for clustering analysis of Pantoea, almost equivalent to gyrB phylogenetic analysis. The unique intact-cell mass spectrometry (ICMS) fingerprints for Pantoea species developed in this study will facilitate more accurate and rapid identification of isolates from environmental and clinical samples using this technology.
TABLE 2. ... Fig. 4) with the predicted masses of ribosomal proteins of Pantoea sp. At-9b deposited in the UniProt database (http://www.uniprot.org/) within the allowed mass deviation of 800 ppm. Three single, common posttranslational modifications (N-terminal methionine loss, methylation, and acetylation) were considered to estimate the possible mass variations starting from the predicted mass of the protein in Pantoea sp. At-9b.
FIG. 7. MALDI-TOF MS identification of the spontaneous lysine-to-arginine mutation in 30S ribosomal protein S12 (encoded by the rpsL gene) leading to streptomycin resistance in commercial biocontrol strain P. vagans C9-1S (BlightBan C9-1). The 13,607-Da mass signal found in wild-type strain C9-1 (in blue) corresponds to the predicted mass of RpsL in Pantoea sp. At-9b (molecular mass = 13,737 Da; UniProt accession code, C8QDZ1), assuming the removal of the N-terminal methionine (−131 Da) and matrix-assisted protonation (+1 Da) following the laser pulse. In streptomycin-resistant strain C9-1S (in red) the peak is shifted to 13,635 Da, whereas the positions of the other peaks remain constant. This variation (+28 Da) corresponds precisely to the difference in molecular mass between arginine (molecular mass = 174.2 Da) and lysine (molecular mass = 146.2 Da). The masses (in daltons) of the ions are shown on the x axis. The m/z values represent mass-to-charge ratios. For improved clarity, the relative intensities of ions (percentages) on the y axis are 5-fold exaggerated between 13,300 and 13,970 Da.
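The mass arithmetic behind the Fig. 7 assignment of the K43R mutation can be retraced in a few lines. The sketch below uses only the values quoted in the Fig. 7 caption (predicted RpsL mass, methionine loss, protonation, and the molecular masses of lysine and arginine) and checks the observed peaks against the 800-ppm tolerance used throughout the study. Variable and function names are illustrative.

```python
PREDICTED_RPSL = 13_737.0   # predicted mass of RpsL in Pantoea sp. At-9b (Da)
MET_LOSS = -131.0           # removal of the N-terminal methionine (Da)
PROTONATION = 1.0           # matrix-assisted protonation (Da)
LYSINE = 146.2              # molecular mass of lysine (Da)
ARGININE = 174.2            # molecular mass of arginine (Da)

OBSERVED_WILD_TYPE = 13_607.0   # peak in strain C9-1 (Da)
OBSERVED_RESISTANT = 13_635.0   # peak in strain C9-1S (Da)


def ppm_error(observed, expected):
    """Relative deviation of an observed mass from its expected value."""
    return abs(observed - expected) / expected * 1e6


# Expected ion mass of the processed wild-type protein.
expected_wild_type = PREDICTED_RPSL + MET_LOSS + PROTONATION
# A lysine-to-arginine exchange adds the mass difference of the two residues.
shift = ARGININE - LYSINE
expected_resistant = expected_wild_type + shift

print(expected_wild_type, round(shift, 1), round(expected_resistant, 1))
print(ppm_error(OBSERVED_WILD_TYPE, expected_wild_type) <= 800)
print(ppm_error(OBSERVED_RESISTANT, expected_resistant) <= 800)
```

Both observed peaks fall on the expected values, so the 28-Da shift is consistent with a single lysine-to-arginine exchange in this protein.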