Applied and Environmental Microbiology, August 2003, p. 4566-4574, Vol. 69, No. 8
0099-2240/03/$08.00+0 DOI: 10.1128/AEM.69.8.4566-4574.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Centre for Infectious Diseases and Microbiology, Institute for Clinical Pathology and Medical Research, University of Sydney at Westmead Hospital, Westmead, New South Wales 2145,¹ Institute for Magnetic Resonance Research and Department of Magnetic Resonance in Medicine, University of Sydney, Sydney, New South Wales 2006, Australia,² Institute for Biodiagnostics, National Research Council of Canada, Winnipeg, Manitoba R3B 1Y6, Canada³
Received 6 March 2003/ Accepted 3 June 2003
Secondary metabolites and other compounds, such as proteins, lipids, or carbohydrates, have been utilized in chemotaxonomic approaches to the classification of fungi and lichenized fungi and for identification (11). Within the yeasts, classification to the level of genus has been achieved by using a monophasic approach, i.e., by analysis of profiles from particular groups of compounds, including fatty acids, carbohydrates, or polyols (1, 3, 13, 36). Identification of Candida isolates in routine laboratories is based on a polyphasic approach, which includes combinations of morphological characters, assimilation and fermentation profiles, identification of particular metabolites, and growth on differential media. These phenotypic methods are slow, may not distinguish between closely related species, and are sometimes unreliable, and not all clinically relevant species are included in the databases (10, 19, 23, 40, 42, 46, 49). Molecular methods of identification (including DNA-DNA reassociation, PCR fingerprinting, and DNA sequencing of small numbers of reference organisms) are generally more discriminatory than methods based on phenotypic characters. However, criteria for defining species boundaries based only on DNA sequences have not been agreed upon. Genotypic characters alone, which do not take into account expressed gene products of biological importance, may not provide a plausible and practical definition of species boundaries. As many genotypic and phenotypic characters as possible should be considered to further the understanding of species boundaries in the absence of sexuality.
NMR spectroscopy offers a rapid, high-throughput, polyphasic approach to establishing the chemotype of microorganisms by providing information on a large range of metabolites simultaneously. One-dimensional proton (1H) NMR spectra of cell suspensions provide an overview of mobile hydrogen-containing compounds, which can be identified by multidimensional NMR correlation spectroscopy. Multivariate analysis and pattern recognition techniques detect differences in gross spectral characteristics (shape and pattern) of spectroscopic data from biological samples without the need to identify individual compounds (2, 14, 17, 24, 30, 44, 45). A preliminary study showed that analysis of 1H NMR spectra by linear discriminant analysis can distinguish selected species of streptococci and staphylococci with high accuracy (2).
We demonstrate here that applying a multistage, supervised statistical classification strategy (SCS) developed for the analysis of NMR spectra (24, 44) results in accurate identification of isolates within the genus Candida.
Cultures.
Isolates were obtained from the culture collections at the Centre for Infectious Diseases and Microbiology (CIDM, University of Sydney at Westmead Hospital, Westmead, New South Wales, Australia), the ATCC (Manassas, Va.), and the CBS (Utrecht, The Netherlands). Recent clinical isolates were obtained from the CIDM Laboratory Services (CIDMLS), Institute of Clinical Pathology and Medical Research, Westmead Hospital. Type or neotype cultures of C. albicans (CBS 562), C. glabrata (CBS 138), C. krusei (CBS 573), C. parapsilosis (CBS 604), and C. tropicalis (CBS 94) were obtained from the CBS. Overall, 96% of the isolates were clinical (82% from eight hospitals in Australia and New Zealand, 9% from North and South America, 4% from Europe, and 1% from Asia), and 4% were of environmental origin. Isolates were stored either in autoclaved water at 25°C or in 10% glycerol in nutrient broth at -70°C. Identifications were made from duplicate cultures that had been incubated at 27°C for 42 to 48 h on Sabouraud dextrose agar (SAB; Difco Laboratories, Detroit, Mich.). Specimens for NMR spectroscopy were held at room temperature (20 to 30°C) for 1 to 4 h before use.
Identification.
Prior to storage, isolates had been identified biochemically (VITEK YBC or API ID32; BioMerieux, Marcy l'Etoile, France). All tests were carried out as specified by the manufacturers. To check for discrepancies arising from incorrect handling of stored cultures, randomly selected isolates were reidentified by conventional tests. Approximately 15% of isolates (n = 74) were also identified by PCR fingerprinting (23, 29). Whenever identification by conventional methods (VITEK/API) disagreed with the statistical classification of the NMR spectra, conventional identification was repeated and PCR fingerprinting was performed routinely.
PCR fingerprinting.
Briefly, genomic DNA was isolated, and PCR was performed with a single primer derived from the minisatellite-specific core sequence of wild-type phage M13 (5'-GAGGGTGGCGGTTCT-3') (29). Reactions were carried out in a Perkin-Elmer thermal cycler (model 480) for 35 cycles of denaturation (20 s at 94°C), annealing (1 min at 50°C), and extension (20 s at 72°C), followed by a final extension (6 min at 72°C). Products were separated by electrophoresis in 1.4% agarose gels in Tris-borate-EDTA buffer at 3 V cm⁻¹, stained with ethidium bromide, and visualized under UV light. PCR fingerprint profiles were evaluated manually. Type strains of all tested species were included as reference strains.
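As a rough check on instrument time, the cycling program can be written down as data and its nominal duration summed. This is a minimal sketch: ramp times and any initial denaturation step are not stated in the text and are ignored, so a real run will take longer than the value computed here.

```python
# Nominal hold steps of the M13 fingerprinting program: (name, temperature in °C, time in s).
# Ramp rates and an initial denaturation step are not given in the text and are ignored.
CYCLE_STEPS = [
    ("denaturation", 94, 20),
    ("annealing", 50, 60),
    ("extension", 72, 20),
]
N_CYCLES = 35
FINAL_EXTENSION = ("final extension", 72, 6 * 60)


def nominal_run_time_s(steps=CYCLE_STEPS, cycles=N_CYCLES, final=FINAL_EXTENSION):
    """Sum the hold times of all cycles plus the final extension (ramps excluded)."""
    per_cycle = sum(seconds for _, _, seconds in steps)
    return cycles * per_cycle + final[2]


if __name__ == "__main__":
    total = nominal_run_time_s()
    print(f"{total} s (about {total / 60:.0f} min) of hold time")
```

Representing the program as a list keeps the per-step times next to their temperatures, so a modified protocol only needs the list edited.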
NMR spectroscopy.
Yeast colonies were gently removed from the SAB plate with a plastic inoculation loop and suspended in 0.5 ml of PBS-D2O (PBS [pH 7.2, room temperature] made up in 99.5% D2O; Australian Nuclear Science and Technology Organization, Lucas Heights, Australia) to a final concentration of 10⁸ to 5 × 10⁹ CFU ml⁻¹. The suspension was immediately transferred to a 5-mm NMR tube (Wilmad Glass Co., Inc., Buena, N.J.). 1H NMR spectra were obtained at 37°C on a Bruker Avance 360-MHz NMR spectrometer with a 5-mm {1H, 13C} inverse-detection dual-frequency probe. Acquisition parameters were as follows: frequency, 360.13 MHz; pulse angle, 90° (6 or 7 µs); repetition time, 2.3 s; 4,096 data points; 32 transients; and spectral width, 3,600 Hz. The samples were spun at 20 Hz to prevent the cells from settling in the NMR tube. The field was locked to D2O. Water suppression was performed by a selective excitation field gradient method (18). Spectra of cell suspensions were stable at 37°C for at least 2 h. Spectra were processed with Bruker xwinnmr 2.6 software. Chemical shifts were calibrated by setting the center of the spectrum to 4.64 ppm (nominal position of the water resonance with respect to tetramethylsilane in PBS-D2O at 37°C). The viability and purity of cultures were confirmed by plating fungal cell suspensions (two isolates per species) after 2-h NMR experiments (1H NMR plus {1H, 1H} COSY).
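The acquisition parameters can be cross-checked with simple arithmetic: a width in hertz divided by the observe frequency in megahertz gives the width in ppm, and the number of transients times the repetition time gives the scan time. The sketch below assumes the spectrum is centered exactly on the 4.64-ppm calibration point and ignores dummy scans and setup time.

```python
SPECTROMETER_MHZ = 360.13   # 1H observe frequency
SPECTRAL_WIDTH_HZ = 3600.0
CENTER_PPM = 4.64           # water resonance used as the spectral center
TRANSIENTS = 32
REPETITION_TIME_S = 2.3


def hz_to_ppm(width_hz: float, freq_mhz: float) -> float:
    """At freq_mhz MHz, 1 ppm corresponds to freq_mhz Hz."""
    return width_hz / freq_mhz


sw_ppm = hz_to_ppm(SPECTRAL_WIDTH_HZ, SPECTROMETER_MHZ)
low_ppm = CENTER_PPM - sw_ppm / 2
high_ppm = CENTER_PPM + sw_ppm / 2
scan_time_s = TRANSIENTS * REPETITION_TIME_S  # excludes dummy scans and setup (assumption)

print(f"spectral width: {sw_ppm:.3f} ppm ({low_ppm:.2f} to {high_ppm:.2f} ppm)")
print(f"scan time: {scan_time_s:.1f} s")
```

The result, just under 10 ppm, agrees with the 10-ppm spectral width quoted for the classification data, and the window comfortably contains the 0.35 to 4.00 ppm region later used for classification.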
Signal assignment.
2D homo- and heteronuclear correlation spectra were acquired for 7 to 12 isolates per species to assign 1H NMR resonances to specific compounds. {1H, 1H} gradient COSY experiments were performed in magnitude mode with the following acquisition parameters: spectral width in t2, 3,600 Hz; t2 time domain, 2,048 data points; 256 increments of four or eight acquisitions each; and relaxation delay, 1 s. Sine-bell window functions were applied in the t1 dimension, and Gaussian-Lorentzian window functions were applied in the t2 dimension. Zero filling was used to expand the data matrix to 1,024 points in the t1 dimension. TOCSY spectra with mixing times of 40 and 150 ms were acquired with 256 increments of 2,048 data points and 16 acquisitions. {1H, 13C} single-bond shift correlation spectra were obtained in the 1H detection mode with an HSQC pulse sequence. The 1H spectral width was 3,600 Hz, and the 13C spectral width was 15,000 Hz. 13C decoupling during acquisition was achieved by GARP-1 (41). The evolution time (t1) was incremented to obtain 256 FIDs, each of 32 or 64 acquisitions and consisting of 2,048 data points. The relaxation delay was 2 s. A sine-bell function was applied in the t2 dimension, and a Gaussian-Lorentzian function was applied in the t1 dimension. Zero filling to 1,024 points was used in the t1 dimension prior to Fourier transformation. {1H, 13C} gradient HMBC spectra were acquired without proton decoupling by using the same parameters as for the HSQC experiments, except for a 13C spectral width of 20 kHz. Single-bond and long-range correlation experiments were usually optimized for ¹J(C,H) = 140 Hz and ⁿJ(C,H) = 7 Hz, respectively. 1H and 13C chemical shifts in HSQC and HMBC spectra were referenced to the two ethanol cross-peaks at 3.65/58.3 and 1.18/17.6 ppm, which were present in all samples. 1D 1H NMR spectra were acquired before and after the 2D experiments to verify the absence of metabolic changes during the time in the magnet.
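"Optimized for" a coupling constant J usually means that the polarization-transfer delay is set from J. The sketch below assumes the common convention of a total transfer period of 1/(2J), which many HSQC pulse programs split into two half-echo delays of 1/(4J), and a long-range evolution delay of 1/(2ⁿJ) for HMBC. The exact delays in the authors' pulse programs are not stated.

```python
def transfer_delay_ms(j_hz: float) -> float:
    """Return 1/(2J) in milliseconds for a coupling constant J in hertz."""
    return 1000.0 / (2.0 * j_hz)


one_bond = transfer_delay_ms(140.0)   # HSQC, 1J(C,H) = 140 Hz
long_range = transfer_delay_ms(7.0)   # HMBC, nJ(C,H) = 7 Hz

print(f"HSQC 1/(2J): {one_bond:.2f} ms (half-echo 1/(4J): {one_bond / 2:.2f} ms)")
print(f"HMBC 1/(2J): {long_range:.1f} ms")
```

The roughly 20-fold longer HMBC delay follows directly from the 20-fold smaller coupling constant.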
Quantification of HSQC spectra.
Quantification of metabolites from 1D 1H NMR spectra was not possible because of overlapping resonances. We therefore applied a modification of the method of Bubb et al., which is based on the integration of fully relaxed {1H, 13C} HSQC cross-peaks (4). The concentrations of amino acid residues, carbohydrates, and alditols were estimated by comparing the volumes of resolved HSQC cross-peaks with those of p-aminobenzoic acid. To account for differences in relaxation times and ¹J(C,H) coupling constants between compounds (for most cross-peaks, ¹J(C,H) was between 130 and 145 Hz), calibration factors were determined by comparing the NMR-based concentrations with chemically determined trehalose, glycerol, and lysine concentrations (one isolate per species). To determine experimental errors, the procedure was repeated with a duplicate culture. The error of the method was of the order of 30 to 40%, consistent with previous reports (4). Despite the recognized limitations of measuring peak volumes in 2D NMR spectra, the concentration estimates are considered more meaningful than qualitative descriptions of peak intensities (51).
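The text does not give the working equation, so the following is only a minimal sketch of reference-based quantification. It assumes that cross-peak volumes are normalized per contributing proton before comparison with the p-aminobenzoic acid reference, and that a compound-specific calibration factor (chemical assay divided by uncalibrated NMR estimate) is then applied. The `CrossPeak` class and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrossPeak:
    volume: float   # integrated HSQC cross-peak volume (arbitrary units)
    n_protons: int  # protons contributing to the cross-peak (assumed normalization)


def raw_concentration(peak: CrossPeak, ref: CrossPeak, ref_conc_mM: float) -> float:
    """Concentration relative to the p-aminobenzoic acid reference, before calibration."""
    per_proton = peak.volume / peak.n_protons
    ref_per_proton = ref.volume / ref.n_protons
    return per_proton / ref_per_proton * ref_conc_mM


def calibration_factor(chemical_mM: float, nmr_mM: float) -> float:
    """Compound-specific factor absorbing relaxation and 1J(C,H) differences."""
    return chemical_mM / nmr_mM


# Hypothetical numbers for illustration only.
reference = CrossPeak(volume=200.0, n_protons=2)
calibrant = CrossPeak(volume=300.0, n_protons=2)   # e.g., trehalose in the chemically assayed isolate
factor = calibration_factor(chemical_mM=2.0,
                            nmr_mM=raw_concentration(calibrant, reference, 1.0))
unknown = CrossPeak(volume=150.0, n_protons=2)     # same compound in another isolate
estimate = factor * raw_concentration(unknown, reference, 1.0)
print(f"calibration factor {factor:.2f}, estimate {estimate:.2f} mM")
```

Because each factor is fitted on one chemically assayed isolate per species, errors in that single assay propagate to every estimate for the compound, which is one reason the 30 to 40% error is plausible.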
Reproducibility of NMR spectra.
Short-term method variability was investigated by examining duplicate cultures. Long-term variability due to storage or to minor variations in methods or cultures (e.g., differences between batches of culture media) was assessed by retesting two isolates per Candida species up to five times over a 20-month period. The effect of specified culture conditions on classification of NMR spectra was investigated with duplicate cultures of two isolates per species by varying the following: temperature (25 to 40°C), pH (5.0 or 7.2), incubation time (24 to 192 h), growth medium (SAB plates or YPG broth), and storage at room temperature (0 to 192 h).
Classification of NMR spectra.
An SCS was employed that was specifically designed for NMR and infrared spectra of biofluids and tissue biopsies, for which databases contain far fewer spectra than there are data points (attributes) in each spectrum (31, 44). NMR data were prepared for statistical classification with software developed in-house (Xprep; IBD, NRC, Winnipeg, Canada). Magnitude spectra, consisting of 4,096 data points over a spectral width of 10 ppm, were reduced to the most informative 1,500 points between 0.35 and 4.00 ppm and normalized to unit area in this region. Correct alignment of the spectra was checked visually, using the lipid resonance at 1.3 ppm as a marker, by displaying all NMR spectra simultaneously and sequentially.
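The cropping and normalization step can be sketched as follows. The sketch assumes evenly spaced points over the 10-ppm width centered at 4.64 ppm, and it takes "unit area" to mean that the retained intensities sum to 1. Xprep's exact bounds are not given, but with 409.6 points per ppm the 0.35 to 4.00 ppm window holds about 1,495 points, close to the 1,500 quoted. The random spectrum is a stand-in, not real data.

```python
import numpy as np


def ppm_axis(n_points=4096, center_ppm=4.64, width_ppm=10.0):
    """Evenly spaced chemical-shift axis, high to low ppm (assumed point spacing)."""
    half = width_ppm / 2.0
    return np.linspace(center_ppm + half, center_ppm - half, n_points)


def crop_and_normalize(magnitude, ppm, low=0.35, high=4.00):
    """Keep points between low and high ppm and scale them to unit area (sum of intensities)."""
    mask = (ppm >= low) & (ppm <= high)
    region = np.abs(magnitude[mask])
    return region / region.sum()


rng = np.random.default_rng(0)
ppm = ppm_axis()
spectrum = rng.random(ppm.size) + 1.0   # stand-in for a real magnitude spectrum
features = crop_and_normalize(spectrum, ppm)
print(features.size, features.sum())
```

Normalizing to unit area removes differences in overall signal due to cell density, which varied up to 50-fold between suspensions.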
The magnitude spectra were then analyzed by a genetic algorithm-based optimal region selector (ORS) to reduce the number of attributes and hence eliminate redundant information (31). Two or three maximally discriminatory subregions of the 1D NMR spectra of each group (species) were selected for the development of classifiers based on linear discriminant analysis (LDA). The averages of these subregions were used to develop LDA-based pairwise classifiers for all combinations of the five Candida species (10 pairs).
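Once subregions have been chosen, each spectrum is reduced to a handful of attributes, one average per subregion, and one classifier is built per species pair. The sketch below shows only that bookkeeping. The genetic-algorithm search of the ORS is not reproduced, and the index ranges in the demo are hypothetical.

```python
from itertools import combinations

import numpy as np

SPECIES = ["C. albicans", "C. glabrata", "C. krusei", "C. parapsilosis", "C. tropicalis"]


def region_averages(spectra, regions):
    """Average each spectrum over each (start, stop) index range; one attribute per region.

    spectra: array of shape (n_spectra, n_points); regions: list of half-open index ranges.
    """
    return np.column_stack([spectra[:, start:stop].mean(axis=1) for start, stop in regions])


pairs = list(combinations(SPECIES, 2))   # one LDA classifier per species pair
print(len(pairs), "pairwise classifiers")

demo = np.arange(12.0).reshape(2, 6)            # two toy "spectra" of six points each
print(region_averages(demo, [(0, 2), (3, 6)]))  # hypothetical subregions
```

With five species, C(5, 2) = 10 pairs result, matching the number of pairwise classifiers.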
These classifiers were made robust by a bootstrap-based cross-validation method developed in-house (IBD, NRC) (8, 43). Specifically, about half the spectra from each class were selected at random and used to train an LDA classifier (the "training set"). The remaining spectra were then tested against this classifier. This process was repeated 1,000 times (with random replacement), and the optimized LDA coefficients were saved. The final classifier consisted of the weighted output of the 1,000 bootstrap coefficient sets. Each classifier yielded probabilities of class assignment for the individual spectra. For each sample x, the a posteriori probabilities pm(x) for all K = 5 classes (species), m = 1, 2, ..., K, were calculated according to
[Equation: a posteriori probability pm(x) of class m, m = 1, 2, ..., K]
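A two-class version of the bootstrap procedure can be sketched with NumPy. Several details are assumptions because the text does not specify them: each iteration trains on a random half of each class drawn without replacement, each coefficient set is weighted by its accuracy on the held-out half, and the class probability is the logistic function of the linear score (the LDA posterior for Gaussian classes with equal covariance and equal priors). Averaging linear coefficients is equivalent to averaging the linear scores. How the pairwise probabilities are combined into pm(x) is not reproduced.

```python
import numpy as np


def fit_lda(X0, X1, ridge=1e-9):
    """Two-class LDA with a pooled covariance matrix.

    Returns (w, b) such that, for Gaussian classes with equal covariance and equal
    priors, P(class 1 | x) = 1 / (1 + exp(-(x @ w + b))).
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    pooled = (np.atleast_2d(np.cov(X0, rowvar=False)) * (n0 - 1)
              + np.atleast_2d(np.cov(X1, rowvar=False)) * (n1 - 1)) / (n0 + n1 - 2)
    pooled += ridge * np.eye(X0.shape[1])   # keeps the solve stable for near-singular data
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * (mu0 + mu1) @ w
    return w, b


def bootstrap_lda(X0, X1, n_iter=1000, seed=0):
    """Train on random halves, test on the rest, and return accuracy-weighted mean coefficients."""
    rng = np.random.default_rng(seed)
    h0, h1 = len(X0) // 2, len(X1) // 2
    coefs, weights = [], []
    for _ in range(n_iter):
        i0, i1 = rng.permutation(len(X0)), rng.permutation(len(X1))
        w, b = fit_lda(X0[i0[:h0]], X1[i1[:h1]])
        X_test = np.vstack([X0[i0[h0:]], X1[i1[h1:]]])
        y_test = np.r_[np.zeros(len(X0) - h0), np.ones(len(X1) - h1)]
        accuracy = np.mean(((X_test @ w + b) > 0) == y_test)
        coefs.append(np.r_[w, b])
        weights.append(accuracy)
    mean = np.average(np.array(coefs), axis=0, weights=np.array(weights) + 1e-12)
    return mean[:-1], mean[-1]


def prob_class1(x, w, b):
    """Logistic posterior of class 1 for attribute vector x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


rng = np.random.default_rng(1)
X_a = rng.normal(0.0, 1.0, size=(40, 2))   # toy two-attribute features, "species A"
X_b = rng.normal(3.0, 1.0, size=(40, 2))   # toy features, "species B"
w, b = bootstrap_lda(X_a, X_b, n_iter=200)
print(prob_class1(np.array([3.0, 3.0]), w, b), prob_class1(np.array([0.0, 0.0]), w, b))
```

Training each iteration on only half the data and testing on the other half gives a built-in estimate of how well each coefficient set generalizes, which is what the weighting exploits.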
Class assignment was defined as crisp if the probability of belonging to one class was larger than the average of 1 and the even-split probability 1/K, i.e., (1 + K⁻¹)/2. Thus, for five classes, assignment was crisp if the probability was larger than 60%. Correct classification refers to assignment of a spectrum to the same species group as conventional classification, with a classification probability >60%. Indeterminate or fuzzy classification refers to assignment of a spectrum to any species group with a classification probability ≤60%. Accuracy refers to the number of cultures correctly identified relative to the total number of cultures. Specificity refers to the number of cultures identified correctly as a given species relative to the number of cultures actually belonging to that species. Sensitivity refers to the number of correctly identified cultures from one species relative to the total number of cultures from that species.
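The crispness rule and the accuracy definition translate directly into code. In this sketch, a culture counts as correct only if its most probable species matches the reference identification and that probability exceeds (1 + 1/K)/2. The probability vectors are invented for illustration.

```python
def crisp_threshold(k: int) -> float:
    """Midpoint between certainty (1) and an even split over k classes (1/k)."""
    return (1.0 + 1.0 / k) / 2.0


def assign(probabilities: dict[str, float]) -> tuple[str, bool]:
    """Return the most probable class and whether that assignment is crisp."""
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best] > crisp_threshold(len(probabilities))


def accuracy(predicted: list[tuple[str, bool]], reference: list[str]) -> float:
    """Fraction of cultures assigned crisply to their reference species."""
    correct = sum(1 for (label, crisp), ref in zip(predicted, reference) if crisp and label == ref)
    return correct / len(reference)


print(crisp_threshold(5))
cultures = [  # hypothetical probability vectors for two cultures
    {"C. albicans": 0.72, "C. glabrata": 0.10, "C. krusei": 0.08,
     "C. parapsilosis": 0.06, "C. tropicalis": 0.04},
    {"C. albicans": 0.55, "C. glabrata": 0.30, "C. krusei": 0.05,
     "C. parapsilosis": 0.05, "C. tropicalis": 0.05},
]
calls = [assign(p) for p in cultures]
print(calls, accuracy(calls, ["C. albicans", "C. albicans"]))
```

The second culture points to the right species but at 55%, so it is fuzzy and does not count toward accuracy.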
Validation.
Classifiers were evaluated by using independent validation sets that consisted of NMR data from isolates that were not part of the training process (usually 130 to 200 cultures) and also included yeast species other than the five in the study. The initial classifiers were redeveloped and tested by using additional Candida isolates as they became available. Some spectra from the initial validation sets were later added to the training sets.
FIG. 1. Representative 1H NMR spectra from suspensions of Candida species in PBS-D2O. (A) C. albicans, (B) C. krusei, (C) C. glabrata, (D) C. parapsilosis, and (E) C. tropicalis. Signals were assigned from {1H, 1H} COSY spectra, except for polyol and carbohydrate residues, for which {1H, 13C} correlation spectra (HSQC and HMBC) were used. Abbreviations: CHOH, carbohydrate and polyol residues; eth, ethanol; lip, lipids; and N(CH3)3, betaine- and choline-containing metabolites (mainly glycerophosphocholine). Spectra are shown after processing in xwinnmr (multiplication with an exponential function resulting in a line broadening of 1 Hz, Fourier transformation, first- and second-order phase correction, and polynomial baseline correction).
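Exponential multiplication with a line broadening (LB) of 1 Hz multiplies the free induction decay by exp(-π·LB·t), which adds LB hertz to the width of each Lorentzian line. The sketch below demonstrates this on a synthetic on-resonance signal, assuming complex points sampled at a dwell time of 1/SW; it is not the xwinnmr implementation.

```python
import numpy as np


def exponential_multiply(fid, dwell_s, lb_hz=1.0):
    """Apply exp(-pi * LB * t) to a time-domain signal (exponential line broadening)."""
    t = np.arange(fid.size) * dwell_s
    return fid * np.exp(-np.pi * lb_hz * t)


dwell = 1.0 / 3600.0                  # assumed: complex points sampled at 1/SW
t = np.arange(4096) * dwell
fid = np.exp(-np.pi * 2.0 * t)        # on-resonance signal with a 2-Hz Lorentzian linewidth
broadened = exponential_multiply(fid, dwell, lb_hz=1.0)
print(np.allclose(broadened, np.exp(-np.pi * 3.0 * t)))   # linewidth is now 3 Hz
```

The trade-off is a slightly broader line in exchange for suppressing noise in the late, signal-poor part of the FID.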
TABLE 1. Metabolites of Candida species that were identified by 2D correlation NMR spectra
TABLE 2. Classification results for cultures that were misidentified at some stage of the classifier development
FIG. 2. Accuracy of the classification of Candida species with increasing numbers of isolates. Classifiers were developed as isolates became available. A first set of pairwise classifiers was developed on 162 cultures. The accuracy of the classifiers on the training set is indicated by crosses. These classifiers were then tested against an independent validation set of cultures (circles). Only the number of cultures used for the development of classifiers (training set) is indicated. For number of cultures in the validation set, see Table 2. Accuracy was determined based on correct identification (compared to PCR fingerprinting). Isolates belonging to species not included in this study but part of the validation data sets were not considered.
TABLE 3. Performance of final pairwise classifiers on a training and validation set of NMR spectra from cell suspensions of Candida spp.
Reproducibility and effect of specific culture conditions on identification by SCS/NMR.
The reproducibility of the method was evaluated by repeated testing of five cultures of two isolates per species. The results obtained under standard culture conditions were highly reproducible, with only one fuzzy classification out of 50 cultures. The effect of different culture conditions was tested with duplicate cultures of two isolates per species. Varying the incubation time (24, 49, 72, 96, and 192 h) showed that identifications based on 24- to 72-h cultures were correct, with the exception of one C. albicans culture that was indeterminately classified. After incubation for 96 h, four cultures (two of them C. albicans) were indeterminate and two were misclassified; after 192 h, seven cultures were indeterminate and four were misclassified. Two isolates of C. parapsilosis and one of C. albicans failed to grow at 40°C, but cultures incubated at 25°C and 30°C were identified correctly. Two cultures of C. krusei and one of C. parapsilosis were indeterminate after growth at 25°C. After incubation at 35°C, one culture of C. krusei and one of C. tropicalis were misidentified, and two cultures of C. krusei and one of C. glabrata were indeterminate. One culture of C. krusei and one of C. tropicalis were misidentified after incubation at 40°C. Storage for up to 24 h at room temperature did not influence the classification results. Storage for an additional 96 h resulted in indeterminate classification or misclassification of 30% of all cultures and of up to 50% of C. krusei and C. glabrata cultures. The growth medium significantly influenced the NMR spectral characteristics. Cultures grown in YPG displayed quantitatively different metabolite profiles, generally with larger amounts of ethanol and acetate, than those cultured on SAB plates (data not shown). As a result, up to 50% of C. albicans and C. glabrata isolates were indeterminate or misclassified.
Statistical classification strategy.
1H NMR spectra of living cells contain information derived simultaneously from all mobile chemicals (metabolites and cell components) present in the cell suspension. The advantage of NMR spectroscopy over other biochemical methods of identification is that this information is retrieved rapidly from a single test. The need to assign NMR resonances to particular metabolites is circumvented by analyzing variations in the signal pattern rather than identifying individual compounds. The SCS was specifically designed for the analysis of NMR and infrared spectra from biological fluids and tissues (44), where individual spectra contain many more data points (attributes) than there are spectra (sample size) (reviewed in reference 24). Compared with principal component analysis, which others have applied to NMR, FTIR, and Raman spectra (9, 17, 26, 27), the genetic algorithm-based ORS used for attribute reduction in NMR/SCS has the advantage that the selected features retain their spectral identity (they correspond to spectral subregions). The most discriminatory spectral regions identified by the ORS point to the particular metabolites that contributed to successful discrimination. These spectral regions (metabolites) are consistent for all isolates within a species. Thus, NMR/SCS also provides a means of rapidly screening for stable phenotypic markers (metabolite profiles) that can be used for classification. This is of particular value for yeast species with few distinctive phenotypic characters (28). In the present study, SCS proved to be a robust method for identifying Candida isolates to the species level, based on analysis of only two spectral regions selected by the ORS. The fewer attributes a successful analysis requires, the more robust it is, because both theoretical and empirical evidence suggests that the sample-to-attribute ratio must be greater than 10 to 1 (31).
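A back-of-the-envelope comparison shows why attribute reduction matters for the 10:1 rule. The sketch assumes, hypothetically, that the 444 training cultures are split evenly among the five species, so that a pairwise classifier sees about two-fifths of them; actual class sizes are not given here.

```python
def sample_to_attribute_ratio(n_samples: int, n_attributes: int) -> float:
    return n_samples / n_attributes


TRAINING_CULTURES = 444     # training-set size at convergence (Fig. 2)
N_SPECIES = 5
RAW_POINTS = 1500           # points kept between 0.35 and 4.00 ppm
SELECTED_REGIONS = 2        # subregion averages used by a pairwise classifier

# Hypothetical: equally sized species, so a pairwise classifier sees two-fifths of the cultures.
per_pair = TRAINING_CULTURES * 2 // N_SPECIES

for label, n_attr in [("full spectrum", RAW_POINTS), ("ORS regions", SELECTED_REGIONS)]:
    ratio = sample_to_attribute_ratio(per_pair, n_attr)
    print(f"{label}: {ratio:.2f} samples per attribute, meets 10:1 rule: {ratio > 10}")
```

Under this assumption, the full cropped spectrum gives far fewer than one sample per attribute, whereas two region averages give a ratio well above 10:1.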
The robustness and reliability of the NMR/SCS analysis were confirmed by testing against additional, independent isolates that were not included in the development of the initial classifiers. As expected, the accuracy of identification increased with the number of isolates in the training set (Fig. 2). Classifiers were considered robust when the accuracy and crispness of identification on an independent validation set approached those obtained on the training set. Convergence of the accuracy for the training and validation sets was achieved with 444 cultures in the training set (Fig. 2). This number may vary for different classification problems and is potentially problematic for species with few known isolates. The larger the training set, the more likely it is that the randomly selected members (isolates) of a class (species) represent most of the data space (phenotype range) occupied by all members of that species. The increased reliability of the classifiers with more isolates was confirmed by improved results for isolates that were initially indeterminate or misclassified (Table 2).
Metabolite profiles.
Although NMR/SCS is based on biochemical information in NMR spectra, identification of individual compounds is not necessary for successful classification. Using 2D correlation NMR spectroscopy, we confirmed that many of the most discriminatory regions used for development of the pairwise classifiers correspond to the polyol/carbohydrate region of the NMR spectra (3.5 to 4.0 ppm) and to a less specific region (1.5 to 2.5 ppm) containing resonances of lipids, amino acid residues, acetate, and other O- and N-acetyl compounds. Differences in the polyol, lipid, and carbohydrate composition of purified yeast extracts, cell walls, or capsular components have been used previously to classify fungi at taxonomic levels above species (1, 3, 13, 36). Table 1 shows that more than one metabolite or group of compounds differs quantitatively between species. The advantage of NMR spectroscopy of whole-cell suspensions is that all NMR-visible metabolites can be included in the analysis, which increases the discriminatory power of the method and allows rapid sample throughput. Analysis of large databases is essential for any chemotaxonomic approach to overcome natural variation between isolates of the same taxon (5). The NMR/SCS approach selects the spectral regions, and hence metabolites, with the most discriminatory power and thus uses only a small portion of the entire NMR spectrum. The polyphasic approach of NMR spectroscopy, based on an overview of secondary metabolites, primary metabolite pools, and other compounds (including carbohydrates, lipids, and amino acids), provided a more discriminating and rapid method for classifying fungi to the species level than monophasic chemotypic methods. In addition, the use of live cells rather than cell extracts minimizes manual sample handling.
Alternative, nondestructive, and polyphasic spectroscopic techniques being investigated for microbial identification (FTIR and near infrared multichannel Raman spectroscopy) are less robust and cannot be used to identify particular metabolites (26, 49). Furthermore, these techniques require preprocessing of plate cultures before spectroscopy, which adds at least an extra 6 h to the identification time (26, 27).
Characterization of indeterminate and misclassified isolates.
Notably, classification with the final classifiers correctly identified 80% of previously misclassified isolates. Besides the fact that the isolates in our analysis still represent only a small fraction of the entire data space, additional factors that may have contributed to incorrect or indeterminate identification are (i) mixed cultures, found for seven isolates, and (ii) intraspecific discontinuity among seven isolates (Table 2).
Mixed isolates were in most cases misidentified by both VITEK/API and NMR/SCS, with marginally better results for NMR-based identification, presumably because one of the organisms dominated the NMR spectrum. After purification and reculturing, NMR/SCS assigned these isolates to the correct species in all cases.
Intraspecies discontinuity has been reported in a number of Candida species (22, 25, 32, 38, 39). A small number of isolates analyzed by NMR/SCS were indeterminate or misclassified throughout the study. Most of these isolates had NMR spectra and genotypes (PCR and partial actin gene sequences) that differed from those of the majority. For example, three distinct groups have been recognized for C. parapsilosis (25, 38, 39). Of the less common types II and III, type III differs more from type I, consistent with its being assigned a different biotype code in the API 20C identification kit (38). Some C. parapsilosis type II (WM 1.57) and type III (WM 01.18, WM 1007, and WM 1.56) isolates included in this study were identified by partial sequence analysis of the actin gene (7). Whereas only one culture of WM 1.57 was at some stage indeterminately classified by NMR/SCS, all cultures of the more distantly related type III were repeatedly indeterminate or misclassified (Table 2), indicating distinct NMR spectra and therefore distinct "chemotypes." Similarly, isolates of C. glabrata (WM 929, WM 01.181, and WM 1107) with an unusual PCR fingerprinting pattern (data not shown) were more often indeterminate or misclassified by NMR/SCS.
Identification of unusual clinical isolates.
Some isolates in the blinded validation sets were subsequently found to belong to species not included in this study (Table 2). They were either classified as indeterminate or misclassified as closely related species. For example, isolates of C. dubliniensis were consistently identified as C. albicans, in line with their close genetic relationship (48). The type strain of C. dubliniensis (CBS 7987) was also classified as C. albicans when tested against the classifiers (data not shown). In other cases, metabolic similarities resulted in consistently indeterminate identification or misidentification. For example, Yarrowia lipolytica, a species with high lipid content, was misidentified as C. tropicalis, the species in the training set with the highest lipid content. Cryptococcus neoformans was consistently identified as C. glabrata; both are characterized by a very high trehalose content (16) (Table 2).
Identification of species outside those used for classifier development requires an additional data analysis approach, based, for example, on distance analysis. Such an approach has been applied successfully to the analysis of complex proteomic data (R. L. Somorjai et al., unpublished data) and is under development for microbial applications.
The data presented suggest that NMR/SCS will provide a rapid chemotypic method for microbial identification that can be fully automated. This study investigated species characterized by a paucity of phenotypic characters, and the accuracy of NMR/SCS exceeded that of traditional identification systems.
This work was supported by the National Health and Medical Research Council of Australia (grant 153805).