Applied and Environmental Microbiology, June 2008, p. 3573-3582, Vol. 74, No. 11
0099-2240/08/$08.00+0 doi:10.1128/AEM.02526-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352
Received 8 November 2007/ Accepted 23 March 2008
It became clear early in the investigation of the anthrax letters that genetic identity alone was not necessarily sufficient to lead investigators to a perpetrator. Analytical methods were needed that could differentiate genetically identical organisms produced under different conditions and also provide information about how a particular batch of organisms was produced. In response, scientists have been exploring the use of a variety of analytical approaches to characterize changes in microbial signatures with culture conditions (4, 8, 20, 31, 32, 35, 36). A major task faced in these studies is assessing how the profile of a given species varies in response to different culture conditions.
Different organisms prefer different growth media. Nearly all media, however, contain carbon and nitrogen sources, sulfur, mineral ions, and water (5). Sugar, yeast, soy, peptone, and protein hydrolysates are just a few of the different sources of carbon used in culture media. Nitrogen can sometimes be derived from the same source as carbon, but it can also be supplied in the form of inorganic salts, such as NH4Cl or NaNO3. Sulfur can often be supplied with the addition of salts, such as (NH4)2SO4, or it can be obtained from cystine- or methionine-containing constituents, such as peptone. Additionally, one or more minerals, such as Ca2+, K+, Na+, Mg2+, Mn2+, Fe2+ or Fe3+, Zn2+, Cu2+, or Co2+, are added in the form of salts.
Many aspects of microorganisms are known to vary with the culture medium and thus could provide clues to production conditions. For example, the protein expression and content of microorganisms vary with growth conditions (1, 2, 24), as does the lipid content (29, 33). The stable isotope ratios of heterotrophic microbes, like those of other heterotrophs, are a function of the stable isotope ratios of their growth medium nutrients and water and thus vary with the growth environment (11, 12, 13, 21, 22). In addition to triggering intrinsic variation in the compositions of microorganisms, growth media may also leave direct traces, such as medium-specific metabolic products or unused medium components, on microbial cells.
The utilities of various analytical techniques and approaches for characterizing an organism's growth environment have been explored. Valentine et al. (31) demonstrated reproducible differences in matrix-assisted laser desorption ionization mass spectrometry (MALDI MS) signatures of Bacillus subtilis spores grown in different culture media. Kreuzer-Martin and Jarman (20) have discussed the usefulness of 13C/12C and 15N/14N isotope ratios for characterizing culture media. Cliff et al. (8) suggested secondary-ion MS (SIMS) as a means for identifying a culture medium based on the metal content. Edberg et al. (H. C. Edberg, C. E. Petersen, N. B. Valentine, D. S. Wunschel, and K. L. Wahl, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006; H. C. Edberg, C. E. Petersen, N. B. Valentine, and K. L. Wahl, unpublished data) demonstrated a method for detecting the presence of agar in microbial samples based on electrospray ionization (ESI) MS and derivatization gas chromatography MS. Whiteaker et al. (34) developed a MALDI MS-based method for detecting heme on Bacillus spores.
Each of these individual techniques has the potential to capture one aspect of an organism's growth environment. However, by combining information from different "orthogonal" techniques, a more complete characterization seems possible. We previously presented a Bayesian classification scheme for identifying the culture medium of an unknown source microorganism when signatures of the organism in candidate culture media are available (K. H. Jarman, K. L. Wahl, N. B. Valentine, J. B. Cliff, H. Kreuzer-Martin, C. E. Petersen, H. C. Edberg, and D. S. Wunschel, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006). Built from probability models of isotope ratio MS (IRMS), SIMS, MALDI MS, and ESI MS data, this scheme integrates disparate data sets by converting them into a likelihood score ranging from zero to one and multiplying the different likelihoods to produce a single, integrated score for every candidate culture medium. By integrating the different instrument data using this scheme, the culture media were correctly identified 92% of the time on average, as opposed to the 61% average correct identification rate for the individual instruments.
This approach has some serious drawbacks. First, it requires a signature for the microorganism in every culture medium of interest. While it may be possible to collect such signatures when reference samples from a suspect laboratory are available, in many cases no reference samples will be available. Even if a large database of culture medium signatures could be constructed, the culture medium from an unknown sample could be made from atypical components, spiked with additional metals or other compounds, or made without following any known recipe, making a database-matching approach extremely challenging. The question then becomes one of whether a data analysis framework can be developed so that culture medium characterization is possible when traditional signatures are unavailable.
Here, we expand on the previous work by developing a Bayesian network for microbial forensics. This new approach alleviates the need for a database of specific signatures of interest and allows us to characterize culture medium components as opposed to complete culture medium recipes. It also has several other benefits. First, a Bayesian network provides an intuitive visual representation of the relationships between culture media and analytical measurements. Second, it allows us to model dependencies that sometimes occur between nominally orthogonal measurement techniques. Finally, an existing network can easily be expanded to include more measurement techniques and more culture medium components, particularly with the use of a free or commercially available Bayesian network software package.
Initially developed to express causal relationships in a probabilistic setting, Bayesian networks have become a well-established decision support tool used in many applications ranging from disease diagnosis to oil drilling. Many books on the subject have been written (10, 15, 19, 26, 28). Additionally, numerous commercial and free software products are available to assist users in developing Bayesian networks for virtually any application. (The interested reader is referred to reference 26 and http://en.wikipedia.org/wiki/Bayesian_network for lists of available software.)
In this application, we present a Bayesian network that follows the causal relationship from selection of a culture medium recipe to the addition of recipe components and the SIMS, ESI MS, and 13C and 15N isotope ratios of spores grown in the medium. We discuss the benefits of our approach, such as the use of intuitive probability scores and the ability to analyze samples from partial evidence. Using data collected from Bacillus spores grown under multiple culture conditions, we demonstrate the ability of this Bayesian network to characterize a culture medium without constructing a reference signature database. The results of this limited study show an error rate of less than 7% in characterizing carbon and nitrogen sources, addition of metals, and the presence of agar and an error rate of 19% in characterizing the culture medium recipe.
200 additional ESI MS analyses of Bacillus thuringiensis 58890 and Bacillus anthracis Sterne grown in brain heart infusion agar, brain heart infusion broth, blood agar, tryptic soy agar [TSA], and tryptic soy broth [TSB] using methods described below; and [iii]
300 additional IRMS analyses of B. subtilis 6051 spores grown in Schaeffer's sporulation medium broth (SSMB) and agar (SSMA), TSB, TSA, Columbia broth, and LLB [20].) Here, we describe the laboratory and instrumental data collection methods for the designed study. Additional data sets were collected in an analogous manner; however, replication in cultures and analyses were not always performed. For the designed experiment, three replicate B. subtilis 49760 cultures were prepared both on agar plates and in broth using modified SSM (16) in which MgCl2 was substituted for MgSO4 (Fox), G (17), LD (23), and NSM (30). Vegetative starter cultures were inoculated from frozen stocks into TSB and incubated overnight at 30°C on a rotary shaker. For broth cultures, 1 ml of the overnight culture was added to 225 ml liquid medium, and the cultures were incubated at 37°C for 5 to 7 days on a rotary shaker at 150 rpm. Agar plates were spread with 150 µl overnight culture and incubated upside down at 37°C for 3 to 5 days. The cultures were checked microscopically and harvested when >95% spores were observed. The spores were washed a minimum of five times with distilled/deionized water. Following the washing, the spores were checked again for purity using phase-contrast microscopy and enumerated by plate counting. Aliquots of each spore preparation were then delivered for instrument analysis. When possible, replicate analyses were performed on each instrument for each aliquot. A description of the instrument analyses and the Bayes framework for data integration is provided below.
Metal analysis.
The residual metal composition was measured using SIMS (ION-TOF IV instrument; IONTOF GmbH) according to the experimental protocol described by Cliff et al. (8). The relative abundances of Na+, K+, Mn+, Mg+, Ca+, Fe+, Zn+, and Cu+ were used in the data analysis. For illustration, typical SIMS spectra of B. subtilis 49760 grown in different media are provided in Fig. 1.
FIG. 1. SIMS metal profiles of B. subtilis 49760 in different growth media.

), where $\delta = [(R_A/R_{Std}) - 1] \cdot 1000$, and $R_A$ and $R_{Std}$ are the molar ratios of the rare isotope to the abundant isotope (e.g., 13C/12C) in the sample and the standard. The standard for carbon is Peedee belemnite, a fossil limestone from South Carolina, and for nitrogen, it is air (9). 13C/12C and 15N/14N isotope ratios (expressed as $\delta$) of Bacillus spores grown in different media are shown in Fig. 2.
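As a worked illustration of the delta notation defined above, the following Python sketch converts a measured isotope ratio into a delta value in parts per thousand. The function name and the numeric ratios in the usage note are illustrative assumptions, not values from this study.

```python
def delta_per_mil(r_sample: float, r_standard: float) -> float:
    """Return delta = [(R_A / R_Std) - 1] * 1000, in parts per thousand.

    r_sample   -- molar ratio of rare to abundant isotope in the sample (R_A)
    r_standard -- the same ratio in the reference standard (R_Std)
    """
    if r_standard <= 0:
        raise ValueError("r_standard must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0
```

For example, a hypothetical sample whose ratio is 1% lower than that of the standard, `delta_per_mil(0.99, 1.0)`, yields a delta value of approximately -10.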
FIG. 2. Carbon and nitrogen stable isotope ratios of Bacillus spores (multiple species) in common media. Col A, Columbia agar; Fox A and B, modified SSMA and SSMB.
FIG. 3. ESI MS agar profiles of B. subtilis 49760 in agar-based medium. Ions marked by an asterisk are due to digested agar. The arrows indicate the progression of agar ions through three tandem mass spectral analyses. Only the MS3 329 ion is used as an indicator of the presence of agar in this study.
probM1{x|CMi} · probM2{x|CMi} · ... · probMN{x|CMi}, where x refers to data from instruments M1, M2, ... MN and CMi refers to culture medium i. This relationship is particularly useful, since it allows us to reverse likelihoods from what we can measure (prob{data|CM}) to what we would like to know (prob{CM|data}). It also provides the basis for a simple and powerful classification scheme. In particular, given a sample grown in an unknown culture medium, mass spectral analyses can be performed and prob{CMi|x} can be calculated for each culture medium. The culture medium with the highest probability is then identified as the culture medium of the unknown. Bayes networks expand on this relationship (25). Also called belief networks, Bayes networks were originally used to model causal relationships in a process. By imposing certain conditional-independence assumptions on the nodes of the network, Bayes' theorem can be used to make inferences about the underlying (unobserved) state of a process from measurement data or other evidence. Since their inception, Bayes networks have proven to be useful for modeling complex systems and processes over a broad spectrum of applications.
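The classification scheme described above can be sketched in a few lines of Python: multiply the a priori probability of each candidate medium by the per-instrument likelihoods, normalize, and pick the highest score. This is a minimal sketch, not the authors' implementation; the function name, the medium labels used as keys, and all numeric values are hypothetical.

```python
from math import prod


def posterior_over_media(priors, likelihoods):
    """Apply Bayes' theorem across candidate culture media.

    priors      -- {medium: prob{CM}}
    likelihoods -- {medium: [prob{x|CM} for each instrument]}
    Returns {medium: prob{CM|x}}, normalized to sum to 1.
    """
    unnormalized = {cm: priors[cm] * prod(likelihoods[cm]) for cm in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate media have zero probability")
    return {cm: p / total for cm, p in unnormalized.items()}


# Hypothetical example: two candidate media, two instruments.
priors = {"TSB": 0.5, "LB": 0.5}
likelihoods = {"TSB": [0.8, 0.6], "LB": [0.2, 0.3]}
post = posterior_over_media(priors, likelihoods)
best = max(post, key=post.get)  # medium with the highest posterior score
```

Normalizing by the sum over candidates is what turns the product of likelihoods into a probability; it assumes the true medium is among the candidates listed.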
Following the classical approach, we define the Bayes network for characterizing a culture medium through causal relationships. In particular, the culture medium used determines the ingredients to be used. The specific ingredients used are consumed by the microorganism during the culturing process, and their effects are measured through SIMS, IRMS, and ESI MS analyses after the fact.
This modeling process is illustrated by way of the directed acyclic graph (DAG) in Fig. 4. The circles in the DAG representing various steps in the process are called nodes. Arrows, called edges, connect the nodes. The arrows point from parent nodes to child nodes, and the directions of the arrows in the graph represent causal relationships (i.e., from culture medium recipe to ingredients to SIMS, IRMS, and ESI MS analyses of the cultured microorganisms).
FIG. 4. DAG for integrated characterization of culture media. The nodes indicate different steps in the process, while the arrows indicate causal relationships between the nodes. IR, isotope ratio.
The DAG in Fig. 4 illustrates the dependencies in the process and provides the basis for our framework. To complete the Bayes network, a series of conditional-probability models must be developed. First, we make the standard assumption that any two child nodes are conditionally independent of one another, given that we know the state of the parent node (10, 25). This means, for example, that the ESI MS peak intensities and the IRMS peak intensities are orthogonal to (and statistically independent of) one another if we know whether agar was added to the culture medium.
Next, we specify conditional probabilities associated with transition from parent nodes to their children (i.e., prob{metals added|CMi}, prob{agar added|CMi}, prob{C/N source|CMi}, prob{SIMS intensities|metals added}, prob{ESI-MS ion intensity|agar added}, and prob{13C/15N isotope ratios|agar added, C/N source}). We also put a priori probabilities on the different possible culture media for the organism under consideration. A combination of scientific understanding and empirical data is used to determine all of these probabilities, and their specification is provided below.
Once the Bayes network has been fully specified, it can be used to take evidence (SIMS, IRMS, and/or ESI MS data) and make inferences about the unobserved nodes in the process (metals added, agar added, C/N sources, and culture medium). In particular, the rules of Bayesian statistics are applied to data to obtain probabilities, or scores, for the culture medium ingredients and the culture medium. By ranking these scores, we can determine which culture medium components and culture medium were most likely used.
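To make the inference step concrete, here is a minimal Python sketch of a much-reduced network: a culture medium node, two recipe-derived ingredient nodes (agar added, Zn2+ added), and two binary measurement nodes. It is an illustration under stated assumptions, not the network used in this study. The four-medium candidate list and the ESI MS probabilities are hypothetical placeholders; the Zn probabilities are the SIMS threshold values reported later in this section.

```python
# Recipe-derived ingredient states for each candidate medium (deterministic CPTs).
RECIPES = {
    "GA":  {"agar": True,  "zn": True},
    "GB":  {"agar": False, "zn": True},
    "TSA": {"agar": True,  "zn": False},
    "TSB": {"agar": False, "zn": False},
}

# prob{measurement exceeds its threshold | ingredient state}
P_ZN_HIGH = {True: 0.54, False: 0.008}   # SIMS Zn+ values reported in this study
P_ESI_HIGH = {True: 0.90, False: 0.05}   # hypothetical placeholder values


def bernoulli(p_high, observed):
    """Likelihood of a binary observation; None means 'not measured' (factor of 1)."""
    if observed is None:
        return 1.0
    return p_high if observed else 1.0 - p_high


def infer(zn_high=None, esi_high=None, priors=None):
    """Return posterior scores for each medium and for each ingredient."""
    priors = priors or {cm: 1.0 / len(RECIPES) for cm in RECIPES}
    joint = {}
    for cm, ingredients in RECIPES.items():
        joint[cm] = (priors[cm]
                     * bernoulli(P_ZN_HIGH[ingredients["zn"]], zn_high)
                     * bernoulli(P_ESI_HIGH[ingredients["agar"]], esi_high))
    total = sum(joint.values())
    media = {cm: p / total for cm, p in joint.items()}
    # Ingredient scores are sums over the media whose recipes contain the ingredient.
    components = {
        "agar added": sum(p for cm, p in media.items() if RECIPES[cm]["agar"]),
        "Zn added": sum(p for cm, p in media.items() if RECIPES[cm]["zn"]),
    }
    return media, components
```

Because the ingredient nodes here are fully determined by the recipe, summing over media is enough to score them. Calling `infer(zn_high=True)` with no ESI MS evidence leaves the agar score at 0.5 while the Zn score becomes high, which mirrors how partial evidence still characterizes some components.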
(i) Culture medium.
The states of the culture medium node represent the collection of all possible culture media, CMi, and the a priori probability attached to each state represents our initial belief that the culture medium used was CMi. For microorganisms that grow well in a variety of media, such as B. anthracis, these probabilities might be evenly weighted over several of the CMi. For organisms that grow well in only a few media, such as Francisella tularensis, only a few culture media will have nonzero probabilities. For our study of B. subtilis, we placed equal a priori probabilities on all of the following media: Luria broth (LB), LB with agar, GB and GA, LDB and LDA, NSMB and NSMA, LLB and LL agar (LLA), TSB and TSA, SSB and SSA, and nutrient broth (NB) and agar.
(ii) Metals added.
Ca2+, K+, Na+, Mg2+, Mn2+, Fe2+ or Fe3+, Zn2+, Cu2+, and Co2+ are often added to culture media in different amounts. Our SIMS data provided intensities for each of these ions; however, preliminary data analysis indicated that the relationship between the addition of metals and the SIMS relative intensity is not readily modeled. In particular, some metals, such as Zn2+, showed an increase in relative ion intensity for the media to which Zn2+ was added. Others, such as Fe2+, showed the opposite. These seemingly nonintuitive observations may be due to the ubiquitous nature of metals in many base culture medium components, tight intracellular regulation of key metals (27), or the complex way in which mineral ions are utilized by microorganisms (18).
In our study, two metals (Zn2+ and Cu2+) appeared to have a strong, predictable effect on the relative SIMS intensity and were therefore included in the metals-added node. The conditional probabilities for this node reflect the explicit addition of Zn2+ and Cu2+, given each culture medium, and are derived directly from the culture medium recipes. For example, since Zn2+ and Cu2+ are both added to glucose medium, we set prob{Zn2+ added|G} equal to 1, prob{Zn2+ not added|G} equal to 0, prob{Cu2+ added|G} equal to 1, and prob{Cu2+ not added|G} equal to 0.
The conditional probabilities for the states of this node, namely, all possible combinations of Zn2+ and Cu2+, were then obtained by multiplying the individual Zn2+ and Cu2+ conditional probabilities. (This gives prob{Zn2+ added and Cu2+ added|G} = 1. All other combinations have zero probability). We noted that aside from G, none of the other media included in this study had Zn2+ or Cu2+ explicitly added. This is a limitation we hope to overcome in future studies.
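The multiplication described above can be sketched as follows. This is an illustrative Python sketch; the function and variable names are hypothetical, and the only inputs taken from the text are the recipe-derived values of 1 (metal explicitly added) and 0 (not added).

```python
from itertools import product


def metals_added_cpt(p_zn_added: float, p_cu_added: float) -> dict:
    """Joint conditional probabilities over (Zn added, Cu added) for one medium,
    obtained by multiplying the individual Zn and Cu probabilities."""
    cpt = {}
    for zn, cu in product((True, False), repeat=2):
        p_zn = p_zn_added if zn else 1.0 - p_zn_added
        p_cu = p_cu_added if cu else 1.0 - p_cu_added
        cpt[(zn, cu)] = p_zn * p_cu
    return cpt


cpt_G = metals_added_cpt(1.0, 1.0)      # G: both metals explicitly added
cpt_other = metals_added_cpt(0.0, 0.0)  # media with neither metal added
```

With recipe-derived inputs, exactly one of the four states carries probability 1 for each medium; intermediate inputs would only arise if a recipe were uncertain.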
(iii) Agar added.
The agar-added probabilities are derived directly from the culture medium recipe. For the agar media, prob{agar added|agar-based medium} is equal to 1. For the broth media, prob{agar added|broth-based medium} is equal to 0.
(iv) C/N food source.
Based on the culture medium recipes considered in this study, the carbon and nitrogen food sources were divided into five groups, each with an agar and broth form: yeast/tryptone (LB and LB with agar), yeast/sugar (GA and GB), soy/tryptone (TSA and TSB), beef extract/peptone (NB, nutrient agar, LLB, LLA, SSB, SSA, LDB, and LDA), and beef extract/tryptone (NSMB and NSMA). The conditional probabilities for each group were derived directly from the culture medium recipes, and all were either zero or one.
(v) SIMS intensities.
The probabilities for the SIMS intensities were calculated using a combination of data from the designed study and a historical data set containing B. subtilis 49760 spores grown in LLB. We employed a qualitative metric for these data, specifically, whether the ion intensity, relative to the sum of K+, Mg+, Mn+, Fe+, Zn+, and Cu+ ion intensities, exceeded a previously specified threshold.
Box plots for the relative Zn+ and Cu+ ion intensities (Fig. 5) show the range of relative ion intensities for Zn+ (or Cu+) when the metal was or was not explicitly added to the culture medium. The threshold, selected visually, is also indicated in Fig. 5. From these data, we have the following: prob{Cu+ relative intensity > 0.00025|Cu2+ explicitly added to medium} = 0.5; prob{Cu+ relative intensity > 0.00025|Cu2+ not explicitly added to medium} = 0.18; prob{Zn+ relative intensity > 0.0005|Zn2+ explicitly added to medium} = 0.54; and prob{Zn+ relative intensity > 0.0005|Zn2+ not explicitly added to medium} = 0.008.
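One way to read these four probabilities is as likelihood ratios: how much more probable an above-threshold reading is when the metal was added than when it was not. The Python sketch below computes that ratio and, for a single metal in isolation, the resulting probability that the metal was added. The names are hypothetical, and the 0.5 default prior is an assumption chosen for illustration, not a value from this study.

```python
# prob{relative intensity > threshold | metal explicitly added / not added},
# taken from the values stated in the text above.
SIMS_PROBS = {
    "Zn": {"added": 0.54, "not_added": 0.008},
    "Cu": {"added": 0.50, "not_added": 0.18},
}


def likelihood_ratio(metal: str) -> float:
    """Ratio of above-threshold probabilities: added versus not added."""
    p = SIMS_PROBS[metal]
    return p["added"] / p["not_added"]


def posterior_added(metal: str, prior_added: float = 0.5) -> float:
    """prob{metal added | intensity exceeds threshold} for an assumed prior."""
    p = SIMS_PROBS[metal]
    numerator = p["added"] * prior_added
    denominator = numerator + p["not_added"] * (1.0 - prior_added)
    return numerator / denominator
```

With the reported values, the ratio is 67.5 for Zn+ and roughly 2.8 for Cu+, so an above-threshold Zn+ reading is considerably more diagnostic than an above-threshold Cu+ reading.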
FIG. 5. Box plots of SIMS Zn and Cu intensities (Int.) for B. subtilis spores in multiple media. The center lines through the boxes indicate the median values. The heights of the boxes indicate the spread of 75% of the measurements. The dotted lines spanning the plots show the critical threshold (thresh) used in the Bayesian network.
FIG. 6. Box plots of the ESI-MS MS3 329 m/z ion intensities (Int.) for multiple Bacillus species in multiple culture media. The center lines through the boxes indicate the median values. The heights of the boxes indicate the spread of 75% of the measurements. The dotted lines spanning the plots show the critical threshold used in the Bayesian network.
FIG. 7. Nitrogen and carbon isotope ratios of B. subtilis spores produced in different media identified by C/N source. Agar- and broth-grown samples are included in the plot without being explicitly indicated.
TABLE 1. Probabilities of different 13C and 15N isotope ratio ranges by growth medium type (broth samples)
FIG. 8. Screenshot of Bayesian network applied to data on B. subtilis spores grown in GA.
FIG. 9. Illustration of Bayesian network on data from B. subtilis spores grown in LDB.
FIG. 10. Illustration of Bayesian network on data from B. subtilis spores grown in TSB with ESI MS data omitted.
When evidence for a single culture medium is weak or when the true culture medium of the sample has not been included in the Bayes net, the probability scores for individual culture medium components become crucial. This is illustrated in Fig. 9, where a specific culture medium cannot be identified but where the probabilities for a broth culture with beef/peptone base and no Zn2+ or Cu2+ are all strong. Ultimately, this may be the most useful information, because it would allow investigators to partially characterize the culture environment of an unknown sample without having the exact culture medium recipe entered in the network.
Finally, Fig. 10 shows how partial evidence can be used to characterize a sample. The isotope ratios and SIMS intensities of spores grown in TSA are entered into the network, but the ESI MS data are omitted as if they were unavailable. In this case, the probability scores for both TSB and TSA are approximately the same, indicating that the media cannot be differentiated based on the evidence provided. However, the probabilities for soy/tryptone base and no Zn2+ or Cu2+ are strong, giving us a partial characterization of the medium. It is also interesting that the probability for broth culture is slightly higher than for agar, even though no ESI MS data were entered. This is because of the modeled dependency between the agar node and the isotope ratio node. Even though the 13C and 15N isotope ratios do not directly measure agar, the 15N isotope ratio is affected by agar. Thus, when isotope ratio data are propagated through the Bayes network, the 15N value causes the probabilities for broth and agar to adjust slightly.
Data from all the studies listed in Materials and Methods were combined and used in a cursory simulation designed to test the performance of the Bayes network. In some cases, such as the designed study, the IRMS, SIMS, and ESI MS data came from the same B. subtilis culture. In others, IRMS, SIMS, and ESI MS data were paired randomly and run through the Bayes net as if all three data sets were from the same culture. In an attempt to use as much of the data as possible, three substitutions were made. First, in the designed study, data from modified SSM (Fox) agar and broth were available. Additional data were available from SSM. The two recipes were identical, except that Fox medium calls for 0.012% MgCl2 and SSM calls for 0.012% MgSO4 heptahydrate. Since our analysis did not include the anion Cl– or SO42–, we randomly combined data from the two media. Second, replicate ESI MS data were limited. Our observations indicated that the culture medium has little or no effect on the ESI MS analysis results (Edberg et al., unpublished). Therefore, when ESI MS data from the same broth (or agar) culture medium were unavailable, we randomly paired another broth (or agar) ESI MS data set with the IRMS and SIMS data to produce our results. Finally, data from all Bacillus samples were combined without regard to species.
In performing the data analysis, we established criteria for identification of the culture medium as follows. If the top probability score for a culture medium was at least 0.45 and twice as large as the second top score, then we considered it to be strong evidence for the top-scoring culture medium. Otherwise, it was considered to be weak evidence. The results are tabulated in Table 2. In the table, the numbers of correct and incorrect strong identifications are provided, along with the number of weak identifications and comments on the results.
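The identification criteria above can be expressed as a short decision rule. The Python sketch below is one reading of that rule: it treats "twice as large" as "at least twice as large", which is an assumption, and the function name and example scores are hypothetical.

```python
def classify_evidence(scores: dict, min_top: float = 0.45, ratio: float = 2.0):
    """Return (top_medium, 'strong' or 'weak') for a set of probability scores.

    Evidence is 'strong' when the top score is at least min_top and at least
    ratio times the second-highest score; otherwise it is 'weak'.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_medium, top_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    strong = top_score >= min_top and top_score >= ratio * second_score
    return top_medium, "strong" if strong else "weak"
```

For instance, hypothetical scores of 0.48 and 0.47 for an agar and broth form of the same medium would be reported as weak evidence, since the top score is not twice the runner-up.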
TABLE 2. Characterization of growth media of Bacillus spores using a Bayesian network
Table 3 tabulates the performance of the Bayes network in characterizing the culture medium components. The numbers of correct and incorrect characterizations of metals, agar, and C/N source are provided. With the exception of SSM, the numbers of errors are very small, and the tallies in the bottom row of the table indicate that the error rates are low.
TABLE 3. Characterization of medium components of Bacillus spores using a Bayesian network
Conclusions.
Early studies indicated that different growth environments impose distinct signatures on microorganisms. However, taking these early results and translating them into practical solutions in forensics, where known signatures are not always available for comparison, is not a trivial task. The work presented here provides an avenue for bridging this gap between basic science and practice.
The integration of multiple analytical techniques allows us to maximize the amount of information obtained from unknown-source microorganisms. In this work, we focused on characterizing metals, agar, and carbon/nitrogen sources in culture media, but our framework can be easily extended to include any number of analytical techniques measuring a wide range of production environment characteristics. Once a relationship between a particular aspect of the culture environment and a specific analytical measurement is established, all that is needed is a proper set of training data and some science-based statistical modeling to incorporate the new technique into this approach.
The Bayesian network approach allows us to combine scientific understanding with well-established statistical methodologies to characterize a microbe's growth environment without the need for reference signatures. The use of probability scores provides the user with an estimate of the uncertainty in the analysis, which is crucial to investigators who would like to have some measure of the strength of their evidence. Even when faced with a previously uncharacterized culture medium, the probability scores attached to the medium components still provide investigators with useful information about the organism's growth environment.
A Bayesian network is easily expanded and adapted. Any number of culture media can be included in the network without added data collection, and the a priori probabilities attached to each culture medium can be used to weight the likelihood of different culture media based on their effectiveness in growing a given organism. In addition, a Bayesian network can be easily adapted to our increasing understanding of microbial forensics by adjusting the conditional probabilities embedded in the model as more data are added.
The study presented here serves only as an initial proof of concept of the approach. A scientifically defensible Bayesian network for microbial forensics should integrate biological models, as well as a statistical description of each node based on carefully collected data. Additionally, a sensitivity study characterizing changes in the performance of the Bayesian network with changes in the specified parameters would provide a measure of the robustness of this approach. Finally, the network needs to be demonstrated under realistic conditions. The effects of different species on our models need to be established, and a broader range of culture media needs to be studied. In addition, the effects of interference and the presence of complex backgrounds need to be evaluated. Nonetheless, this framework provides researchers with a methodology that can be readily expanded upon and refined as our knowledge and understanding of microbial forensics increase.
We thank Stephen Golledge at the University of Oregon for his assistance. Stable isotope ratio analyses were performed at the Stable Isotope Ratio Facility for Environmental Research at the University of Utah, Salt Lake City. We thank James Ehleringer for his support and Lesley Chesson, Michael Lott, Janet Barnette, and Jeremiah Hoffman for their assistance. Finally, we thank Thomas A. Martin for his assistance with preparation of graphics for this publication.
Published ahead of print on 4 April 2008.