
Applied and Environmental Microbiology, June 2008, p. 3573-3582, Vol. 74, No. 11
0099-2240/08/$08.00+0     doi:10.1128/AEM.02526-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Bayesian-Integrated Microbial Forensics

Kristin H. Jarman,* Helen W. Kreuzer-Martin, David S. Wunschel, Nancy B. Valentine, John B. Cliff, Catherine E. Petersen, Heather A. Colburn, and Karen L. Wahl

Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352

Received 8 November 2007/ Accepted 23 March 2008


ABSTRACT
 
In the aftermath of the 2001 anthrax letters, researchers have been exploring ways to predict the production environment of unknown-source microorganisms. Culture medium, presence of agar, culturing temperature, and drying method are just some of the broad spectrum of characteristics an investigator might like to infer. The effects of many of these factors on microorganisms are not well understood, but the complex way in which microbes interact with their environments suggests that numerous analytical techniques measuring different properties will eventually be needed for complete characterization. In this work, we present a Bayesian statistical framework for integrating disparate analytical measurements. We illustrate its application to the problem of characterizing the culture medium of Bacillus spores using three different mass spectral techniques. The results of our study suggest that integrating data in this way significantly improves the accuracy and robustness of the analyses.


INTRODUCTION
 
The anthrax mailings of 2001 dramatically heightened concerns about the possibility of terrorist incidents involving microbiological agents. In the wake of the attacks, microbial forensics has emerged as a new focus area for research. Microbial forensics involves characterization of microorganisms used as weapons for the purpose of identifying and convicting those responsible (6, 7). Researchers in this nascent field have been working to develop methods that provide information useful in an investigation and ultimately a courtroom.

It became clear early in the investigation of the anthrax letters that genetic identity alone was not necessarily sufficient to lead investigators to a perpetrator. Analytical methods were needed that could differentiate genetically identical organisms produced under different conditions and also provide information about how a particular batch of organisms was produced. In response, scientists have been exploring the use of a variety of analytical approaches to characterize changes in microbial signatures with culture conditions (4, 8, 20, 31, 32, 35, 36). A major task faced in these studies is assessing how the profile of a given species varies in response to different culture conditions.

Different organisms prefer different growth media. Nearly all media, however, contain carbon and nitrogen sources, sulfur, mineral ions, and water (5). Sugar, yeast, soy, peptone, and protein hydrolysates are just a few of the different sources of carbon used in culture media. Nitrogen can sometimes be derived from the same source as carbon, but it can also be supplied in the form of inorganic salts, such as NH4Cl or NaNO3. Sulfur can often be supplied with the addition of salts, such as (NH4)2SO4, or it can be obtained from cystine- or methionine-containing constituents, such as peptone. Additionally, one or more minerals, such as Ca2+, K+, Na+, Mg2+, Mn2+, Fe2+ or Fe3+, Zn2+, Cu2+, or Co2+, are added in the form of salts.

Many aspects of microorganisms are known to vary with the culture medium and thus could provide clues to production conditions. For example, the protein expression and content of microorganisms vary with growth conditions (1, 2, 24), as does the lipid content (29, 33). The stable isotope ratios of heterotrophic microbes, like those of other heterotrophs, are a function of the stable isotope ratios of their growth medium nutrients and water and thus vary with the growth environment (11, 12, 13, 21, 22). In addition to triggering intrinsic variation in the compositions of microorganisms, growth media may also leave direct traces, such as medium-specific metabolic products or unused medium components, on microbial cells.

The utilities of various analytical techniques and approaches for characterizing an organism's growth environment have been explored. Valentine et al. (31) demonstrated reproducible differences in matrix-assisted laser desorption ionization mass spectrometry (MALDI MS) signatures of Bacillus subtilis spores grown in different culture media. Kreuzer-Martin and Jarman (20) have discussed the usefulness of 13C/12C and 15N/14N isotope ratios for characterizing culture media. Cliff et al. (8) suggested secondary-ion MS (SIMS) as a means for identifying a culture medium based on the metal content. Edberg et al. (H. C. Edberg, C. E. Petersen, N. B. Valentine, D. S. Wunschel, and K. L. Wahl, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006; H. C. Edberg, C. E. Petersen, N. B. Valentine, and K. L. Wahl, unpublished data) demonstrated a method for detecting the presence of agar in microbial samples based on electrospray ionization (ESI) MS and derivatization gas chromatography MS. Whiteaker et al. (34) developed a MALDI MS-based method for detecting heme on Bacillus spores.

Each of these individual techniques has the potential to capture one aspect of an organism's growth environment. However, by combining information from different "orthogonal" techniques, a more complete characterization seems possible. We previously presented a Bayesian classification scheme for identifying the culture medium of an unknown source microorganism when signatures of the organism in candidate culture media are available (K. H. Jarman, K. L. Wahl, N. B. Valentine, J. B. Cliff, H. Kreuzer-Martin, C. E. Petersen, H. C. Edberg, and D. S. Wunschel, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006). Built from probability models of isotope ratio MS (IRMS), SIMS, MALDI MS, and ESI MS data, this scheme integrates disparate data sets by converting them into a likelihood score ranging from zero to one and multiplying the different likelihoods to produce a single, integrated score for every candidate culture medium. By integrating the different instrument data using this scheme, the culture media were correctly identified 92% of the time on average, as opposed to the 61% average correct identification rate for the individual instruments.

This approach has some serious drawbacks. First, it requires a signature for the microorganism in every culture medium of interest. While it may be possible to collect such signatures when reference samples from a suspect laboratory are available, in many cases no reference samples will be available. Even if a large database of culture medium signatures could be constructed, the culture medium from an unknown sample could be made from atypical components, spiked with additional metals or other compounds, or made without following any known recipe, making a database-matching approach extremely challenging. The question then becomes one of whether a data analysis framework can be developed so that culture medium characterization is possible when traditional signatures are unavailable.

Here, we expand on the previous work by developing a Bayesian network for microbial forensics. This new approach alleviates the need for a database of specific signatures of interest and allows us to characterize culture medium components as opposed to complete culture medium recipes. It also has several other benefits. First, a Bayesian network provides an intuitive visual representation of the relationships between culture media and analytical measurements. Second, it allows us to model dependencies that sometimes occur between nominally orthogonal measurement techniques. Finally, an existing network can easily be expanded to include more measurement techniques and more culture medium components, particularly with the use of a free or commercially available Bayesian network software package.

Initially developed to express causal relationships in a probabilistic setting, Bayesian networks have become a well-established decision support tool used in many applications ranging from disease diagnosis to oil drilling. Many books on the subject have been written (10, 15, 19, 26, 28). Additionally, numerous commercial and free software products are available to assist users in developing Bayesian networks for virtually any application. (The interested reader is referred to reference 26 and http://en.wikipedia.org/wiki/Bayesian_network for lists of available software.)

In this application, we present a Bayesian network that follows the causal relationship from selection of a culture medium recipe to the addition of recipe components and the SIMS, ESI MS, and 13C and 15N isotope ratios of spores grown in the medium. We discuss the benefits of our approach, such as the use of intuitive probability scores and the ability to analyze samples from partial evidence. Using data collected from Bacillus spores grown under multiple culture conditions, we demonstrate the ability of this Bayesian network to characterize a culture medium without constructing a reference signature database. The results of this limited study show an error rate of less than 7% in characterizing carbon and nitrogen sources, addition of metals, and the presence of agar and an error rate of 19% in characterizing the culture medium recipe.


MATERIALS AND METHODS
 
A combination of historical data and data collected as part of a designed experiment was used in our analysis. (The data for the designed study were as follows: [i] three replicate cultures of B. subtilis 49760 prepared in Fox broth, Fox agar, glucose medium broth [GB], G agar [GA], nutrient sporulating medium broth [NSMB], NSM agar [NSMA], Leighton-Doi broth [LDB], or LD agar [LDA]; [ii] multiple analyses of spores performed using SIMS, IRMS, MALDI MS, and ESI MS; [iii] data used to develop and test the Bayesian classification method for identifying a culture medium; and [iv] data used to develop and test the Bayes network. Additional data used to develop and test the Bayes network were as follows: [i] nine additional SIMS measurements of B. subtilis 49760 grown in Lab Lemco broth [LLB] [8] using methods described below; [ii] ~200 additional ESI MS analyses of Bacillus thuringiensis 58890 and Bacillus anthracis Sterne grown in brain heart infusion agar, brain heart infusion broth, blood agar, tryptic soy agar [TSA], and tryptic soy broth [TSB] using methods described below; and [iii] ~300 additional IRMS analyses of B. subtilis 6051 spores grown in Schaeffer's sporulation medium broth (SSMB) and agar (SSMA), TSB, TSA, Columbia broth, and LLB [20].) Here, we describe the laboratory and instrumental data collection methods for the designed study. Additional data sets were collected in an analogous manner; however, replication in cultures and analyses were not always performed.

For the designed experiment, three replicate B. subtilis 49760 cultures were prepared both on agar plates and in broth using modified SSM (16) in which MgCl2 was substituted for MgSO4 (Fox), G (17), LD (23), and NSM (30). Vegetative starter cultures were inoculated from frozen stocks into TSB and incubated overnight at 30°C on a rotary shaker. For broth cultures, 1 ml of the overnight culture was added to 225 ml liquid medium, and the cultures were incubated at 37°C for 5 to 7 days on a rotary shaker at 150 rpm. Agar plates were spread with 150 µl overnight culture and incubated upside down at 37°C for 3 to 5 days. The cultures were checked microscopically and harvested when >95% spores were observed. The spores were washed a minimum of five times with distilled/deionized water. Following the washing, the spores were checked again for purity using phase-contrast microscopy and enumerated by plate counting. Aliquots of each spore preparation were then delivered for instrument analysis. When possible, replicate analyses were performed on each instrument for each aliquot. A description of the instrument analyses and the Bayes framework for data integration is provided below.

Metal analysis.
The residual metal composition was measured using SIMS (ION-TOF IV instrument; IONTOF GmbH) according to the experimental protocol described by Cliff et al. (8). The relative abundances of Na+, K+, Mn+, Mg+, Ca+, Fe+, Zn+, and Cu+ were used in the data analysis. For illustration, typical SIMS spectra of B. subtilis 49760 grown in different media are provided in Fig. 1.


FIG. 1. SIMS metal profiles of B. subtilis 49760 in different growth media.

Carbon and nitrogen isotope ratios.
Carbon and nitrogen isotope ratios were measured using IRMS (Finnigan-MAT Delta S isotope ratio mass spectrometer). A detailed description of the analysis protocol is provided elsewhere (20, 21). Data analysis was applied to the delta value (δ, in ‰), where δ = [(RA/RStd) – 1]·1,000‰, and RA and RStd are the molar ratios of the rare isotope to the abundant isotope (e.g., 13C/12C) in the sample and the standard. The standard for carbon is Peedee belemnite, a fossil limestone from South Carolina, and for nitrogen, it is air (9). 13C/12C and 15N/14N isotope ratios (expressed as δ, in ‰) of Bacillus spores grown in different media are shown in Fig. 2.
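The delta calculation defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the numerical ratios are hypothetical placeholders, not reference values for Peedee belemnite or air.

```python
def delta_per_mil(r_sample: float, r_standard: float) -> float:
    """Return delta in per mil: [(R_A / R_Std) - 1] * 1000."""
    if r_standard <= 0:
        raise ValueError("standard ratio must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


# Hypothetical rare-to-abundant isotope ratios, chosen only to show the arithmetic.
R_STD_EXAMPLE = 0.0112     # assumed placeholder, not an authoritative standard ratio
R_SAMPLE_EXAMPLE = 0.0110  # assumed placeholder for a measured sample

example_delta = delta_per_mil(R_SAMPLE_EXAMPLE, R_STD_EXAMPLE)
print(f"delta = {example_delta:.2f} per mil")
```

A sample that is depleted in the rare isotope relative to the standard gives a negative delta, and a sample identical to the standard gives zero.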


FIG. 2. Carbon and nitrogen stable isotope ratios of Bacillus spores (multiple species) in common media. Col A, Columbia agar; Fox A and B, modified SSMA and SSMB.

Presence of agar.
Residual agar was detected by soaking spores in hot water to remove agar, adding agarase to the water to digest the agar, and detecting diagnostic carbohydrate fragments by ESI MS (Agilent LC-MSD Ion Trap XCT and ThermoFinnigan LTQ) according to the protocol described by Edberg et al. (H. C. Edberg, C. E. Petersen, N. B. Valentine, D. S. Wunschel, and K. L. Wahl, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006; Edberg et al., unpublished). Agar is a repeating polysaccharide of galactose and anhydrogalactose. The agarase digestion resulted in agar fragments of various lengths. The resulting dimer of the agar was detected in MS mode as a sodium adduct ion at m/z 653. For specificity, the m/z 653 ion was selected and fragmented, resulting in tandem-MS fragments of 329, 473, and 509. The m/z 473 ion was fragmented further, and the total abundances of the MS3 329 m/z ion fragment were used in the data analysis. A typical ESI MS spectrum of B. subtilis 49760 grown in an agar-based medium is provided in Fig. 3.


FIG. 3. ESI MS agar profiles of B. subtilis 49760 in agar-based medium. Ions marked by an asterisk are due to digested agar. The arrows indicate the progression of agar ions through three tandem mass spectral analyses. Only the MS3 329 ion is used as an indicator of the presence of agar in this study.

Data analysis methods.
In earlier work (K. H. Jarman, K. L. Wahl, N. B. Valentine, J. B. Cliff, H. Kreuzer-Martin, C. E. Petersen, H. C. Edberg, and D. S. Wunschel, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006), we presented a method for integrating disparate data using a Bayesian classification framework. The framework uses conditional-probability models for organisms grown in culture media (CM) of interest, namely, prob{data|CM}, the conditional probability of the observed data given a specific culture medium, where the vertical line represents the word "given" (i.e., the conditioning). Data from different analytical instruments are then combined using a well-known relationship derived from Bayesian statistics (3, 15), prob{CMi|x} ∝ probM1{x|CMi} · probM2{x|CMi} · ... · probMN{x|CMi}, where x refers to data from instruments M1, M2, ..., MN and CMi refers to culture medium i. This relationship is particularly useful, since it allows us to reverse likelihoods from what we can measure (prob{data|CM}) to what we would like to know (prob{CM|data}). It also provides the basis for a simple and powerful classification scheme. In particular, given a sample grown in an unknown culture medium, mass spectral analyses can be performed and prob{CMi|x} can be calculated for each culture medium. The culture medium with the highest probability is then identified as the culture medium of the unknown.
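The classification rule described above (multiply the per-instrument likelihoods for each candidate medium, normalize, and pick the largest score) can be sketched as follows. This is a minimal illustration and not the authors' Matlab implementation; the function name and every likelihood value are hypothetical, and equal prior probabilities are assumed unless priors are supplied.

```python
from math import prod


def classify_medium(likelihoods, priors=None):
    """Combine per-instrument likelihoods prob{x|CM_i} into normalized scores.

    likelihoods maps each medium name to a list with one likelihood per instrument.
    priors maps each medium name to its prior; equal priors are assumed if omitted.
    Returns (best_medium, scores), where the scores sum to 1.
    """
    if priors is None:
        priors = {name: 1.0 / len(likelihoods) for name in likelihoods}
    unnormalized = {
        name: priors[name] * prod(values) for name, values in likelihoods.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate media have zero likelihood")
    scores = {name: value / total for name, value in unnormalized.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical likelihoods for three instruments (illustrative values only).
example_likelihoods = {
    "GA": [0.8, 0.6, 0.9],
    "LDB": [0.3, 0.5, 0.2],
    "TSB": [0.1, 0.4, 0.3],
}
best_medium, medium_scores = classify_medium(example_likelihoods)
print(best_medium, medium_scores)
```

Multiplying the likelihoods treats the instruments as independent given the medium, which is the assumption the Bayes network later relaxes.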

Bayes networks expand on this relationship (25). Also called belief networks, Bayes networks were originally used to model causal relationships in a process. By imposing certain conditional-independence assumptions on the nodes of the network, Bayes' theorem can be used to make inferences about the underlying (unobserved) state of a process from measurement data or other evidence. Since their original inception, Bayes networks have proven to be useful for modeling complex systems and processes over a broad spectrum of applications.

Following the classical approach, we define the Bayes network for characterizing a culture medium through causal relationships. In particular, the culture medium used determines the ingredients to be used. The specific ingredients used are consumed by the microorganism during the culturing process, and their effects are measured through SIMS, IRMS, and ESI MS analyses after the fact.

This modeling process is illustrated by way of the directed acyclic graph (DAG) in Fig. 4. The circles in the DAG representing various steps in the process are called nodes. Arrows, called edges, connect the nodes. The arrows point from parent nodes to child nodes, and the directions of the arrows in the graph represent causal relationships (i.e., from culture medium recipe to ingredients to SIMS, IRMS, and ESI MS analyses of the cultured microorganisms).


FIG. 4. DAG for integrated characterization of culture media. The nodes indicate different steps in the process, while the arrows indicate causal relationships between the nodes. IR, isotope ratio.

Along with illustrating causality, Fig. 4 also demonstrates the ability of DAGs to model dependencies between nominally orthogonal analytical techniques. For example, the addition of agar to a culture medium affects the ESI MS agar peak intensities by design, but it also affects the 15N isotope ratios. The reason for this has not been determined, but it could be because solidification of the growth medium promotes volatilization of ammonia from the medium. Ammonia containing 14N would evaporate somewhat more readily than ammonia containing 15N (14) and thus slightly increase the relative amount of 15N in the pool of nitrogen available to the microbes. This causal relationship between the addition of agar and the ESI MS and 13C and 15N isotope ratios is modeled by adding an arrow from the agar node to both the ESI MS and IRMS measurement nodes.

The DAG in Fig. 4 illustrates the dependencies in the process and provides the basis for our framework. To complete the Bayes network, a series of conditional-probability models must be developed. First, we make the standard assumption that any two child nodes are conditionally independent of one another, given the state of their parent node (10, 25). This means, for example, that the ESI MS peak intensities and the IRMS peak intensities are orthogonal to (and statistically independent of) one another if we know whether agar was added to the culture medium.

Next, we specify the conditional probabilities associated with the transition from parent nodes to their children (i.e., prob{metals added|CMi}, prob{agar added|CMi}, prob{C/N source|CMi}, prob{SIMS intensities|metals added}, prob{ESI MS ion intensity|agar added}, and prob{13C/15N isotope ratios|agar added, C/N source}). We also put a priori probabilities on the different possible culture media for the organism under consideration. A combination of scientific understanding and empirical data is used to determine all of these probabilities, and their specification is provided below.

Once the Bayes network has been fully specified, it can be used to take evidence (SIMS, IRMS, and/or ESI MS data) and make inferences about the unobserved nodes in the process (metals added, agar added, C/N sources, and culture medium). In particular, the rules of Bayesian statistics are applied to data to obtain probabilities, or scores, for the culture medium ingredients and the culture medium. By ranking these scores, we can determine which culture medium components and culture medium were most likely used.
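In symbols, the structure described above can be summarized by the factorization below. This is our reading of the network as described in the text, not a formula taken from the original figure; the shorthand M (metals added), A (agar added), and S (C/N source) is introduced here only for illustration.

```latex
\begin{align*}
P(CM, M, A, S, x_{\mathrm{SIMS}}, x_{\mathrm{ESI}}, x_{\mathrm{IR}})
  &= P(CM)\, P(M \mid CM)\, P(A \mid CM)\, P(S \mid CM) \\
  &\quad \times P(x_{\mathrm{SIMS}} \mid M)\, P(x_{\mathrm{ESI}} \mid A)\,
     P(x_{\mathrm{IR}} \mid A, S), \\
P(CM \mid x_{\mathrm{SIMS}}, x_{\mathrm{ESI}}, x_{\mathrm{IR}})
  &\propto \sum_{M,\, A,\, S}
     P(CM, M, A, S, x_{\mathrm{SIMS}}, x_{\mathrm{ESI}}, x_{\mathrm{IR}}).
\end{align*}
```

Scores for an ingredient node, such as agar added, follow in the same way by summing the joint probability over all other unobserved nodes and normalizing. When a measurement is unavailable, its factor is simply dropped from the product.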

(i) Culture medium.
The states of the culture medium node represent the collection of all possible culture media, CMi, and the a priori probability attached to each state represents our initial belief that the culture medium used was CMi. For microorganisms that grow well in a variety of media, such as B. anthracis, these probabilities might be evenly weighted over several of the CMi. For organisms that grow well in only a few media, such as Francisella tularensis, only a few culture media will have nonzero probabilities. For our study of B. subtilis, we placed equal a priori probabilities on all of the following media: Luria broth (LB), LB with agar, GB and GA, LDB and LDA, NSMB and NSMA, LLB and LL agar (LLA), TSB and TSA, SSB and SSA, and nutrient broth (NB) and agar.

(ii) Metals added.
Ca2+, K+, Na+, Mg2+, Mn2+, Fe2+ or Fe3+, Zn2+, Cu2+, and Co2+ are often added to culture media in different amounts. Our SIMS data provided intensities for each of these ions; however, preliminary data analysis indicated that the relationship between the addition of metals and the SIMS relative intensity is not readily modeled. In particular, some metals, such as Zn2+, showed an increase in relative ion intensity for the media to which Zn2+ was added. Others, such as Fe2+, showed the opposite. These seemingly nonintuitive observations may be due to the ubiquitous nature of metals in many base culture medium components, tight intercellular regulation of key metals (27), or the complex way in which mineral ions are utilized by microorganisms (18).

In our study, two metals (Zn2+ and Cu2+) appeared to have a strong, predictable effect on the relative SIMS intensity and were therefore included in the metals-added node. The conditional probabilities for this node reflect the explicit addition of Zn2+ and Cu2+, given each culture medium, and are derived directly from the culture medium recipes. For example, since Zn2+ and Cu2+ are both added to glucose medium, we set prob{Zn2+ added|G} equal to 1 and prob{Zn2+ not added|G} equal to 0, and likewise prob{Cu2+ added|G} equal to 1 and prob{Cu2+ not added|G} equal to 0.

The conditional probabilities for the states of this node, namely, all possible combinations of Zn2+ and Cu2+, were then obtained by multiplying the individual Zn2+ and Cu2+ conditional probabilities. (This gives prob{Zn2+ added and Cu2+ added|G} = 1. All other combinations have zero probability.) We note that aside from G, none of the other media included in this study had Zn2+ or Cu2+ explicitly added. This is a limitation we hope to overcome in future studies.
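The construction of the metals-added node from the individual Zn2+ and Cu2+ probabilities can be sketched as a small table-building function. The function name is hypothetical; the 0 and 1 entries follow the recipe-derived values stated above for glucose medium and for media without explicitly added Zn2+ or Cu2+.

```python
from itertools import product


def metals_added_cpt(p_zn_added: float, p_cu_added: float) -> dict:
    """Return prob{(Zn added, Cu added) | medium} for all four combinations.

    Each joint probability is the product of the individual Zn and Cu
    probabilities, as described in the text.
    """
    table = {}
    for zn_added, cu_added in product((True, False), repeat=2):
        p_zn = p_zn_added if zn_added else 1.0 - p_zn_added
        p_cu = p_cu_added if cu_added else 1.0 - p_cu_added
        table[(zn_added, cu_added)] = p_zn * p_cu
    return table


# Glucose medium: both metals are explicitly added in the recipe.
cpt_glucose = metals_added_cpt(1.0, 1.0)
# Any other medium in the study: neither metal is explicitly added.
cpt_other = metals_added_cpt(0.0, 0.0)
print(cpt_glucose)
print(cpt_other)
```

Because the recipes fix these entries at 0 or 1, each medium puts all of its probability on a single combination; intermediate values would only be needed if a recipe were uncertain.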

(iii) Agar added.
The agar-added probabilities are derived directly from the culture medium recipe. For the agar media, prob{agar added|agar-based medium} is equal to 1. For the broth media, prob{agar added|broth-based medium} is equal to 0.

(iv) C/N food source.
Based on the culture medium recipes considered in this study, the carbon and nitrogen food sources were divided into five groups, each with an agar and broth form: yeast/tryptone (LB and LB with agar), yeast/sugar (GA and GB), soy/tryptone (TSA and TSB), beef extract/peptone (NB, nutrient agar, LLB, LLA, SSB, SSA, LDB, and LDA), and beef extract/tryptone (NSMB and NSMA). The conditional probabilities for each group were derived directly from the culture medium recipes, and all were either zero or one.

(v) SIMS intensities.
The probabilities for the SIMS intensities were calculated using a combination of data from the designed study and a historical data set containing B. subtilis 49760 spores grown in LLB. We employed a qualitative metric for these data, specifically, whether the ion intensity, relative to the sum of K+, Mg+, Mn+, Fe+, Zn+, and Cu+ ion intensities, exceeded a previously specified threshold.

Box plots for the relative Zn+ and Cu+ ion intensities (Fig. 5) show the range of relative ion intensities for Zn+ (or Cu+) when the metal was or was not explicitly added to the culture medium. The threshold, selected visually, is also indicated in Fig. 5. From these data, we have the following: prob{Cu+ relative intensity > 0.00025|Cu2+ explicitly added to medium} = 0.5; prob{Cu+ relative intensity > 0.00025|Cu2+ not explicitly added to medium} = 0.18; prob{Zn+ relative intensity > 0.0005|Zn2+ explicitly added to medium} = 0.54; and prob{Zn+ relative intensity > 0.0005|Zn2+ not explicitly added to medium} = 0.008.
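To show how these threshold probabilities act as evidence, the sketch below applies Bayes' theorem to the Zn values reported above (0.54 and 0.008). The prior of 2/16 is our assumption: it corresponds to equal prior weight on the 16 candidate media listed for this study, of which only the two glucose media have Zn2+ explicitly added. The function name is hypothetical.

```python
def posterior_added(prior_added, p_high_if_added, p_high_if_not, observed_high):
    """Posterior probability that a metal was added, given a thresholded reading.

    observed_high is True when the relative ion intensity exceeds the threshold.
    """
    like_added = p_high_if_added if observed_high else 1.0 - p_high_if_added
    like_not = p_high_if_not if observed_high else 1.0 - p_high_if_not
    numerator = like_added * prior_added
    return numerator / (numerator + like_not * (1.0 - prior_added))


PRIOR_ZN_ADDED = 2 / 16  # assumption: equal priors over 16 media, 2 with Zn added

# Likelihoods for the Zn+ threshold as reported in the text.
zn_if_high = posterior_added(PRIOR_ZN_ADDED, 0.54, 0.008, observed_high=True)
zn_if_low = posterior_added(PRIOR_ZN_ADDED, 0.54, 0.008, observed_high=False)
print(f"P(Zn added | above threshold) = {zn_if_high:.3f}")
print(f"P(Zn added | below threshold) = {zn_if_low:.3f}")
```

Under these assumptions, a reading above the threshold is strong evidence that Zn2+ was added, because such readings are rare when it was not. A reading below the threshold is weaker evidence against addition, since nearly half of the Zn-supplemented samples also fall below it.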


FIG. 5. Box plots of SIMS Zn and Cu intensities (Int.) for B. subtilis spores in multiple media. The center lines through the boxes indicate the median values. The heights of the boxes indicate the spread of 75% of the measurements. The dotted lines spanning the plots show the critical threshold (thresh) used in the Bayesian network.

(vi) ESI MS ion intensities.
In their assay for residual agar, Edberg et al. used the MS3 329 m/z ion intensity as an indicator of the presence of agar (H. C. Edberg, C. E. Petersen, N. B. Valentine, D. S. Wunschel, and K. L. Wahl, presented at the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, WA, 2006; Edberg et al., unpublished). The probabilities for this ion were calculated using data from the designed study, along with an additional 300 analyses described above. As with the SIMS intensities, we chose a qualitative indicator of the presence of agar: whether the ion intensity exceeded a previously specified threshold. Figure 6 shows box plots of the MS3 329 m/z ion intensity for data where agar was and was not added. The threshold, selected visually, is also indicated. From these data, we have the following: prob{MS3 329 ion > 40 counts|agar not added} = 0.16 and prob{MS3 329 ion > 40 counts|agar added} = 0.85.


FIG. 6. Box plots of the ESI-MS MS3 329 m/z ion intensities (Int.) for multiple Bacillus species in multiple culture media. The center lines through the boxes indicate the median values. The heights of the boxes indicate the spread of 75% of the measurements. The dotted lines spanning the plots show the critical threshold used in the Bayesian network.

(vii) 13C/15N isotope ratios.
The carbon and nitrogen isotope ratio probabilities are conditioned on two nodes: agar added and the C/N source. They were constructed using a combination of data from the designed study, along with the additional 300 historical data values discussed above. Figure 7 plots the 13C and 15N isotope ratios labeled by C/N source (without regard to agar added). Once again, we employed qualitative metrics for the 13C/15N values, namely, whether the isotope ratios fell within specified ranges. In particular, we divided the range of 13C and 15N values into the regions indicated in Fig. 7. Then, given each C/N source and agar combination, we calculated the probability of having 13C and 15N values inside each region as the fraction of observed values in that region. These probabilities are provided in Table 1.
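The estimation step described above (assign each measurement to a region, then take the fraction of observations per region) can be sketched as follows. The region boundaries and sample values here are hypothetical placeholders; the actual regions are those drawn in Fig. 7, and the actual probabilities are those in Table 1.

```python
from bisect import bisect_right
from collections import Counter


def region_index(value, edges):
    """Return the index of the interval containing value, given sorted edges."""
    return bisect_right(edges, value)


def region_probabilities(samples, c_edges, n_edges):
    """Estimate prob{region | group} as the fraction of samples in each region.

    samples is a list of (delta 13C, delta 15N) pairs for one C/N source
    and agar combination.
    """
    counts = Counter(
        (region_index(c, c_edges), region_index(n, n_edges)) for c, n in samples
    )
    total = len(samples)
    return {region: count / total for region, count in counts.items()}


# Hypothetical measurements and region boundaries (illustrative values only).
example_samples = [(-20.1, 3.2), (-19.5, 2.8), (-24.0, 5.1), (-18.9, 3.0)]
example_probs = region_probabilities(example_samples, c_edges=[-22.0], n_edges=[4.0])
print(example_probs)
```

Regions that contain no observations receive no entry, which is equivalent to a probability of zero. With small data sets, that can rule out a C/N source on the basis of a single unusual measurement, so the choice of region boundaries deserves care.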


FIG. 7. Nitrogen and carbon isotope ratios of B. subtilis spores produced in different media identified by C/N source. Agar- and broth-grown samples are included in the plot without being explicitly indicated.


TABLE 1. Probabilities of different 13C and 15N isotope ratio ranges by growth medium type (broth samples)


RESULTS AND DISCUSSION
 
Data analysis was performed in Matlab 7.0.1. A user-developed Bayes net toolbox (http://bnt.sourceforge.net) was used to construct and make inferences with the Bayes network. Inferences were made using the junction tree engine algorithm (10), and the datasets described in Materials and Methods were used to test the network. Figure 8 shows a sample screenshot of our Bayes network applied to data from B. subtilis spores grown in GA (entered in the SIMS, ESI MS, and IRMS text boxes). The inferred probabilities for culture medium and culture medium ingredients are provided in their respective list boxes, ranked so that the highest probabilities appear at the top of the list. Figures 9 and 10 show screen shots of the Bayes network applied to data from B. subtilis spores grown in LDB and TSB, respectively.


FIG. 8. Screenshot of Bayesian network applied to data on B. subtilis spores grown in GA.


FIG. 9. Illustration of Bayesian network on data from B. subtilis spores grown in LDB.


FIG. 10. Illustration of Bayesian network on data from B. subtilis spores grown in TSB with ESI MS data omitted.

Figures 8 to 10 demonstrate some of the benefits of the Bayes network approach. First, probabilities are provided for each culture medium, giving the user a straightforward way to evaluate the strength of his or her evidence. For example, in Fig. 8, GA is the medium with the highest probability, having a much larger score than any of the other media. In this case, we could say that the evidence for GA is strong. By way of contrast, Fig. 9 shows a sample screenshot of the Bayes network applied to spores grown in LDB. Because there are four culture broths with the same beef extract/peptone base and no Zn2+ or Cu2+ added (NB, SSMB, LLB, and LDB), these media cannot be distinguished by our analysis techniques, and all four have the same top score of 0.21. In this case, we could say that the evidence for a particular culture medium is weak. (However, we note that by incorporating additional analytical techniques into the network, it may eventually be possible to differentiate spores grown in these four similar media.)

When evidence for a single culture medium is weak or when the true culture medium of the sample has not been included in the Bayes net, the probability scores for individual culture medium components become crucial. This is illustrated in Fig. 9, where a specific culture medium cannot be identified but where the probabilities for a broth culture with beef/peptone base and no Zn2+ or Cu2+ are all strong. Ultimately, this may be the most useful information, because it would allow investigators to partially characterize the culture environment of an unknown sample without having the exact culture medium recipe entered in the network.

Finally, Fig. 10 shows how partial evidence can be used to characterize a sample. The isotope ratios and SIMS intensities of spores grown in TSA are entered into the network, but the ESI MS data are omitted as if they were unavailable. In this case, the probability scores for both TSB and TSA are approximately the same, indicating that the media cannot be differentiated based on the evidence provided. However, the probabilities for soy/tryptone base and no Zn2+ or Cu2+ are strong, giving us a partial characterization of the medium. It is also interesting that the probability for broth culture is slightly higher than for agar, even though no ESI MS data were entered. This is because of the modeled dependency between the agar node and the isotope ratio node. Even though the 13C and 15N isotope ratios do not directly measure agar, the 15N isotope ratio is affected by agar. Thus, when isotope ratio data are propagated through the Bayes network, the 15N value causes the probabilities for broth and agar to adjust slightly.
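The handling of missing measurements can be sketched with a deliberately simplified model in which each technique contributes one likelihood term per medium and unavailable techniques are simply left out of the product. This is an assumption made for illustration: the network in this study routes evidence through intermediate component nodes, and every likelihood value, the three-medium candidate list, and the function name below are hypothetical.

```python
def posterior(prior, likelihoods, observed):
    """Bayes update using only the techniques listed in `observed`.

    prior:       {medium: P(medium)}
    likelihoods: {technique: {medium: P(data from technique | medium)}}
    observed:    names of the techniques for which data are available
    """
    unnormalized = {}
    for medium, p in prior.items():
        for technique in observed:
            p *= likelihoods[technique][medium]
        unnormalized[medium] = p
    total = sum(unnormalized.values())
    return {medium: p / total for medium, p in unnormalized.items()}


# Hypothetical values chosen only to mimic the qualitative behavior in Fig. 10.
prior = {"TSB": 1 / 3, "TSA": 1 / 3, "NB": 1 / 3}
likelihoods = {
    "IRMS": {"TSB": 0.50, "TSA": 0.45, "NB": 0.05},    # 15N leans slightly toward broth
    "SIMS": {"TSB": 0.40, "TSA": 0.40, "NB": 0.40},    # no added metals in any medium
    "ESI_MS": {"TSB": 0.90, "TSA": 0.10, "NB": 0.90},  # sensitive to agar
}

partial = posterior(prior, likelihoods, observed=["IRMS", "SIMS"])
full = posterior(prior, likelihoods, observed=["IRMS", "SIMS", "ESI_MS"])
print("without ESI MS:", partial)
print("with ESI MS:   ", full)
```

In this sketch, omitting the ESI MS term leaves TSB and TSA nearly tied, with broth slightly favored through the isotope ratio term alone; adding the ESI MS term separates them.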

Data from all the studies listed in Materials and Methods were combined and used in a cursory simulation designed to test the performance of the Bayes network. In some cases, such as the designed study, the IRMS, SIMS, and ESI MS data came from the same B. subtilis culture. In others, IRMS, SIMS, and ESI MS data were paired randomly and run through the Bayes net as if all three data sets were from the same culture. In an attempt to use as much of the data as possible, three substitutions were made. First, in the designed study, data from modified SSM (Fox) agar and broth were available. Additional data were available from SSM. The two recipes were identical, except that Fox medium calls for 0.012% MgCl and SSM calls for 0.012% MgSO4 heptahydrate. Since our analysis did not include the anion Cl or SO42–, we randomly combined data from the two media. Second, replicate ESI MS data were limited. Our observations indicated that the culture medium has little or no effect on the ESI MS analysis results (Edberg et al., unpublished). Therefore, when ESI MS data from the same broth (or agar) culture medium were unavailable, we randomly paired another broth (or agar) ESI MS data set with the IRMS and SIMS data to produce our results. Finally, data from all Bacillus samples were combined without regard to species.

In performing the data analysis, we established criteria for identification of the culture medium as follows. If the top probability score for a culture medium was at least 0.45 and twice as large as the second top score, then we considered it to be strong evidence for the top-scoring culture medium. Otherwise, it was considered to be weak evidence. The results are tabulated in Table 2. In the table, the numbers of correct and incorrect strong identifications are provided, along with the number of weak identifications and comments on the results.
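The identification criteria above translate directly into a small decision rule. The sketch below is one possible reading of them: "twice as large" is interpreted as "at least twice as large", and the function name, parameter names, and the scores in the second example are hypothetical.

```python
def classify_evidence(scores, min_top=0.45, min_ratio=2.0):
    """Return (top-scoring medium, 'strong' or 'weak').

    Evidence is called strong when the top score is at least `min_top` and at
    least `min_ratio` times the second-highest score. Requires two or more media.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_medium, top_score), (_, second_score) = ranked[0], ranked[1]
    is_strong = top_score >= min_top and top_score >= min_ratio * second_score
    return top_medium, "strong" if is_strong else "weak"


# Four-way tie at 0.21, as described for Fig. 9 (remaining mass is a filler).
tied = {"NB": 0.21, "SSMB": 0.21, "LB": 0.21, "LDB": 0.21, "OTHER": 0.16}
print(classify_evidence(tied))

# Hypothetical scores with one dominant medium.
dominant = {"GA": 0.70, "TSA": 0.15, "TSB": 0.15}
print(classify_evidence(dominant))
```

The two thresholds are exposed as parameters so that the effect of stricter or looser criteria on the counts of strong and weak identifications can be explored.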


TABLE 2. Characterization of growth media of Bacillus spores using a Bayesian network

In Table 2, four of the media generally produced strong evidence while six of the media tended to produce weak evidence. Of those 31 runs giving strong evidence, only 6 (19%) produced the incorrect culture medium as the top score. Of the six that produced the incorrect medium as the top score, four of the errors were due to incorrect determinations as to whether the sample was grown on agar or in broth.

Table 3 tabulates the performance of the Bayes network in characterizing the culture medium components. The numbers of correct and incorrect characterizations of metals, agar, and C/N source are provided. With the exception of SSM, the numbers of errors are very small, and the tallies in the bottom row of the table indicate that the error rates are low.


TABLE 3. Characterization of medium components of Bacillus spores using a Bayesian network

In spite of our imperfect data, the results in Tables 2 and 3 suggest that the proposed Bayes network has potential utility in microbial forensics. The error rate for characterizing culture medium components is small, ranging from less than 1% for the metals to just over 6% for determination of agar versus broth. The error rate for identifying the culture medium is higher at 19%. (The fact that the error rate is higher for identifying a culture medium than for individual components is expected, since correct identification of a culture medium relies on correct characterization of all three components, not just one.) These error rates are encouraging and a little surprising given the liberties taken in the data analysis. For example, the effects of different species on 13C and 15N isotope ratios and SIMS metal intensities are undocumented and might possibly be significant factors. In this case, combining data from different Bacillus species in the Bayes network would increase statistical variability, resulting in an adverse effect on our reported error rates. A controlled study encompassing multiple species in multiple media could help us refine our error estimates for this Bayes net approach.
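The parenthetical argument about compounding errors can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the three component characterizations fail independently. The metals and agar rates are rounded from the text (less than 1% and just over 6%); the C/N source rate of 3% is a hypothetical placeholder, since only the range of component error rates is quoted here.

```python
def combined_error(component_error_rates):
    """Probability that at least one component is wrong, assuming independence."""
    p_all_correct = 1.0
    for rate in component_error_rates:
        p_all_correct *= 1.0 - rate
    return 1.0 - p_all_correct


metals_error = 0.01  # rounded from "less than 1%"
agar_error = 0.06    # rounded from "just over 6%"
cn_error = 0.03      # hypothetical placeholder, not a reported value

overall = combined_error([metals_error, agar_error, cn_error])
print(f"combined error under independence: {overall:.3f}")
```

Even under these assumptions the combined rate exceeds every individual rate. The figure is not directly comparable to the reported 19%, which was computed over the strong identifications only.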

Conclusions.
Early studies indicated that different growth environments impose distinct signatures on microorganisms. However, taking these early results and translating them into practical solutions in forensics, where known signatures are not always available for comparison, is not a trivial task. The work presented here provides an avenue for bridging this gap between basic science and practice.

The integration of multiple analytical techniques allows us to maximize the amount of information obtained from unknown-source microorganisms. In this work, we focused on characterizing metals, agar, and carbon/nitrogen sources in culture media, but our framework can be easily extended to include any number of analytical techniques measuring a wide range of production environment characteristics. Once a relationship between a particular aspect of the culture environment and a specific analytical measurement is established, all that is needed is a proper set of training data and some science-based statistical modeling to incorporate the new technique into this approach.

The Bayesian network approach allows us to combine scientific understanding with well-established statistical methodologies to characterize a microbe's growth environment without the need for reference signatures. The use of probability scores provides the user with an estimate of the uncertainty in the analysis, which is crucial to investigators who would like to have some measure of the strength of their evidence. Even when faced with a previously uncharacterized culture medium, the probability scores attached to the medium components still provide investigators with useful information about the organism's growth environment.

A Bayesian network is easily expanded and adapted. Any number of culture media can be included in the network without added data collection, and the a priori probabilities attached to each culture medium can be used to weight the likelihood of different culture media based on their effectiveness in growing a given organism. In addition, a Bayesian network can be easily adapted to our increasing understanding of microbial forensics by adjusting the conditional probabilities embedded in the model as more data are added.
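The role of the a priori probabilities can be illustrated without rebuilding a network. If medium scores were computed under equal priors, they are proportional to the likelihood of the data, so a different prior can be applied by multiplying and renormalizing. The sketch below shows only that step; the scores, the prior weights, and the function name are hypothetical.

```python
def reweight(scores_under_equal_priors, new_prior):
    """Apply a non-uniform prior to scores that were computed with equal priors.

    Under equal priors the scores are proportional to the likelihoods, so the
    updated posterior is proportional to score * new prior weight.
    """
    unnormalized = {
        medium: score * new_prior[medium]
        for medium, score in scores_under_equal_priors.items()
    }
    total = sum(unnormalized.values())
    return {medium: value / total for medium, value in unnormalized.items()}


# Hypothetical scores obtained with equal priors on three media.
equal_prior_scores = {"TSB": 0.50, "TSA": 0.45, "NB": 0.05}

# Hypothetical prior reflecting a belief that agar culture is more plausible
# for the organism in question. The weights need not sum to 1.
prior_weights = {"TSB": 0.2, "TSA": 0.6, "NB": 0.2}

updated = reweight(equal_prior_scores, prior_weights)
print(updated)
```

In this example the prior is enough to move TSA ahead of TSB, which shows why the choice of prior weights should be documented alongside any reported scores.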

The study presented here serves only as an initial proof of concept of the approach. A scientifically defensible Bayesian network for microbial forensics should integrate biological models, as well as a statistical description of each node based on carefully collected data. Additionally, a sensitivity study characterizing changes in the performance of the Bayesian network with changes in the specified parameters would provide a measure of the robustness of this approach. Finally, the network needs to be demonstrated under realistic conditions. The effects of different species on our models need to be established, and a broader range of culture media needs to be studied. In addition, the effects of interference and the presence of complex backgrounds need to be evaluated. Nonetheless, this framework provides researchers with a methodology that can be readily expanded upon and refined as our knowledge and understanding of microbial forensics increase.


ACKNOWLEDGMENTS
 
This work was funded by the U.S. Department of Energy through the Laboratory Directed Research and Development Program. Battelle Memorial Institute under contract DE-AC06-76RLO operates Pacific Northwest National Laboratory for the U.S. DOE. The ESI MS agar data were provided courtesy of a program funded by the Department of Homeland Security. Support from the NSF (DMR-0216639) for the TOFSIMS instrumentation at the University of Oregon is gratefully acknowledged.

We thank Stephen Golledge at the University of Oregon for his assistance. Stable isotope ratio analyses were performed at the Stable Isotope Ratio Facility for Environmental Research at the University of Utah, Salt Lake City. We thank James Ehleringer for his support and Lesley Chesson, Michael Lott, Janet Barnette, and Jeremiah Hoffman for their assistance. Finally, we thank Thomas A. Martin for his assistance with preparation of graphics for this publication.


FOOTNOTES
 
* Corresponding author. Mailing address: Pacific Northwest National Laboratory, P.O. Box 999/MS K9-72, Richland, WA 99352. Phone: (509) 375-4539. Fax: (509) 375-2604. E-mail: Kristin.jarman@pnl.gov

{triangledown} Published ahead of print on 4 April 2008.



