**DOI:** 10.1128/AEM.01448-10

## ABSTRACT

The expanded Fermi solution was originally developed for estimating the number of food-poisoning victims when information concerning the circumstances of exposure is scarce. The method has been modified for estimating the initial number of pathogenic or probiotic cells or spores so that enough of them will survive the food preparation and digestive tract's obstacles to reach or colonize the gut in sufficient numbers to have an effect. The method is based on identifying the relevant obstacles and assigning each a survival probability range. The assumed number of needed survivors is also specified as a range. The initial number is then estimated to be the ratio of the number of survivors to the product of the survival probabilities. Assuming that the values of the number of survivors and the survival probabilities are uniformly distributed over their respective ranges, the sought initial number is construed as a random variable with a probability distribution whose parameters are explicitly determined by the individual factors' ranges. The distribution of the initial number is often approximately lognormal, and its mode is taken to be the best estimate of the initial number. The distribution also provides a credible interval for this estimated initial number. The best estimate and credible interval are shown to be robust against small perturbations of the ranges and therefore can help assessors achieve consensus where hard knowledge is scant. The calculation procedure has been automated and made freely downloadable as a Wolfram Demonstration.

The number or fraction of ingested microbial cells or spores reaching the gut intact or viable is of interest in two main situations: when we want them to survive, as in the case of probiotic lactobacilli, and when we do not, as in the case of food- or waterborne pathogens. In both cases, direct determination of the number of viable cells or spores that can successfully colonize or inhabit the human gut is a very difficult if not impossible task. This is true because of a variety of methodological and logistic impediments and ethical and safety considerations. But even if there were safe and feasible methods to determine the number of surviving cells or spores in humans *in vivo*, the investigator would still have to face the likelihood of inconsistent or highly scattered results due to variations in the digestive systems of individual humans, the potential influence of the food with which the cells or spores have been ingested, and the manner in which the food has been prepared and consumed. Moreover, cells of different strains might respond differently to the stresses imposed on them within the human digestive tract (1). Also, their survival as well as infectivity and virulence can be influenced by their preingestion history and the particular food's composition. Perhaps with the exception of some damage in the stomach, bacterial spores can pass the digestive tract practically intact. This, however, may not be the case if they have started to germinate prior to or after their ingestion. Whether germination occurs and to what extent are affected by different factors and are usually not well-known. All the above suggests that the number or fraction of ingested cells or spores can be only approximately estimated even under the best circumstances.

One question that arises is whether one can calculate a plausible estimate of the survival ratio of ingested microbial cells or spores, despite the absence of solid information on what actually happens to them once they enter the human body. A second, closely related question is, what is the most likely number of cells or spores that, if ingested, would leave enough survivors reaching the gut to cause acute food poisoning? Similarly, in the case of probiotic cells or spores, how many should be ingested in order to guarantee a given number of survivors in the gut to have the desirable effect on health? The two questions can be extended to include what happens to the cells or spores prior to the food's ingestion. For example, how many pathogens should initially be in a food to cause food poisoning if it is kept refrigerated and then washed and/or (partly) cooked as a step in its preparation? A similar question can be asked about probiotic cells and spores if they are introduced via a baked or frozen food. Notice that except under strictly controlled laboratory conditions, the terms “washed,” “refrigerated,” “cooked,” and “baked” are not clearly defined. But even in controlled experiments, the survival of pathogens and other microorganisms can vary by as much as an order of magnitude or even more (2, 9, 10, 19, 24). Thus, determining by how much food storage and preparation reduce the number of cells or spores of interest would require a substantial amount of data and might still yield inconclusive results.

One method used to estimate risk in the absence of sufficient data is to multiply and/or divide a series of assumed probabilities of the factors that determine the risk (3-5, 7, 22). When these probabilities (or other relevant factors) are merely reasonable guesses, the method is known as the Fermi solution (21), named after the physicist Enrico Fermi (1901 to 1954), who developed it to a high art. The method has recently been expanded for microbiology applications by replacing the values of the guessed probabilities and factors by their assumed ranges (13). In this form, the method can be used to estimate the daily number of salmonellosis cases in a large city or the number of food-poisoning cases from a contaminated dish served in a restaurant or at a party. In addition to a point estimate, i.e., a single value, for the number of cases, which may or may not be convincing, this so-called expanded Fermi solution method gives a plausible range of values which is likely to capture the true number of cases.

In this work we present a version of the expanded Fermi solution adapted to answer the two questions posed above. The goal has not been to investigate any particular organism and its survival pattern prior to and after its ingestion. The method complements, but does not replace, existing methods for microbial risk assessment (6, 23, 25) or the mathematical models on which they are based (16, 20, 26). Therefore, what follows will present only the method, explain the concept on which it is based, and demonstrate the capabilities of the freely downloadable interactive software that has been developed for its implementation.

## METHODS

**Original Fermi solution application to microbial survival in the digestive tract.** Suppose *M* is the number of microbial cells or spores ingested. In order for them to reach the gut viable, they have to survive the stomach's acid and enzymes, the exposure to bile and the pancreatic juice, the competition with cells of other species in the gut, etc. The survival probabilities after each of these exposures are denoted by *P*_{1}, *P*_{2}, *P*_{3}, …, *P*_{k}. The expected number of cells arriving viable in the gut would be *N* = *MP*_{1}*P*_{2}*P*_{3}…*P*_{k}. Now, take *N* to be the minimum number of a pathogen's cells in the gut that is needed to cause food poisoning in a human. If all the probabilities *P*_{1}, …, *P*_{k} were known, then the number of cells needed to be ingested to leave *N* viable survivors in the gut, and thus cause food poisoning, is given by *M* = *M**, where

$$mathtex$$\[M^{\ast} = \frac{N}{P_{1} \times P_{2} \times P_{3} \times \cdots \times P_{k}} = \frac{N}{\prod_{i=1}^{k} P_{i}}\]$$mathtex$$ (1)

and *k* is the number of probabilities (*P*_{i}) that determine the survival level. (The reason for the notation *M** will become apparent below.) Notice that the probabilities can include preingestion probabilities, such as probabilities of surviving freezing, cold storage and refrigerated transportation, washing, and mild heating. In principle, the denominator of equation 1 can include not only survival probabilities (0 < *P*_{i} ≤ 1) but also factors having a numerical value bigger than 1 to account for growth. Although the mathematical procedure to estimate the number of cells reaching the gut will be the same, we shall not address such scenarios in this work. In the case of probiotic spores, we can also include the probability that they will germinate after reaching the gut intact.

In reality, we might not know *N* exactly, and the same can be said about some or all of the *P*_{i}'s. However, if we could come up with reasonable estimates of *N* and the *P*_{i}'s, the value of *M** calculated with equation 1 might be a realistic estimate of the actual number. This is the Fermi solution. The reason that it often works is that if the estimated (or guessed) values of *N* and the *P*_{i}'s are reasonable, the likelihood of over- or underestimating all or most of the factors is fairly low. Therefore, it is more likely that the errors in one direction will be compensated for by errors in the opposite direction, so that the final result will not be too far off. What constitutes a “reasonable guess” is not scientifically defined, but in most systems, unrealistic guesses of the parameters can be avoided. When used judiciously, the method renders estimates that are often much closer to the correct value than a wild guess, as has been demonstrated in systems where they could be compared to the actual values (21).
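As a minimal illustration of equation 1, the point estimate can be written as a one-line function. This is only a sketch: the function name `fermi_initial_number` and the numbers in the usage line are hypothetical and are not taken from the authors' software.

```python
from math import prod


def fermi_initial_number(n_survivors, survival_probs):
    """Equation 1: M* = N / (P_1 * P_2 * ... * P_k)."""
    # Growth factors larger than 1 are deliberately excluded, as in the text.
    if any(not 0.0 < p <= 1.0 for p in survival_probs):
        raise ValueError("each survival probability must satisfy 0 < P_i <= 1")
    return n_survivors / prod(survival_probs)


# Hypothetical guesses: 10 survivors needed, two obstacles passed with
# probabilities 0.5 and 0.1, so about 200 cells would have to be ingested.
print(fermi_initial_number(10, [0.5, 0.1]))
```

Because the estimate is a simple ratio, an overestimate of one probability can be offset by an underestimate of another, which is the compensation of errors described above.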

**The expanded Fermi solution.** When it comes to a factor whose value is unknown, it is much more plausible to identify upper and lower limits for it than to specify a single value. For example, it would be more convincing to state that the number of pathogenic cells that causes acute poisoning is between 20 and 40 than to state that it is exactly 27. The same can be said about giving ranges for the probabilities (*P*_{i}'s), rather than specific values. According to the expanded Fermi solution (13), once the ranges have been specified, the factors, namely, *N* and the probabilities (*P*_{i}'s), are regarded as independent random variables. Since we usually have little or no knowledge of the values of the factors, except for their ranges, we regard them as random variables having a uniform probability distribution within their respective ranges; this represents the case of maximum ignorance. Of course, if more information was available about a particular factor, another distribution deemed more appropriate could be used instead. Since *N* and the *P*_{i} are random variables, so is *M** in equation 1. The distribution of *M** will often be approximately lognormal, for the following reason: since

$$mathtex$$\[\log(M^{\ast}) = \log(N) - \log(P_{1}) - \log(P_{2}) - \log(P_{3}) - \cdots - \log(P_{k})\]$$mathtex$$ (2)

and since the terms on the right side of the equation are independent random variables, a version of the central limit theorem implies that log(*M**) will have (approximately) a normal (Gaussian) distribution (15). The distribution will approach a perfect normal distribution as the number of terms in equation 2 increases. Consequently, if log(*M**) is normally distributed, *M** will be lognormally distributed. The mode of the distribution of *M**, i.e., the value with the highest frequency (see below), is taken to be the best estimate of *M*.
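The central-limit argument can be checked numerically. The sketch below draws random values of *M** with every factor uniform on its range and compares the sample skewness of *M** with that of log(*M**); a value near zero for the logarithm is what a roughly normal distribution would give. The ranges, the seed, and the helper names are hypothetical choices made only for this illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_m_star(rng, n_range, p_ranges):
    """One random M* = N / prod(P_i), every factor uniform on its range."""
    n = rng.uniform(*n_range)
    product = math.prod(rng.uniform(lo, hi) for lo, hi in p_ranges)
    return n / product


rng = random.Random(0)  # fixed seed, only so that the run is repeatable
n_range = (20.0, 40.0)  # hypothetical range for N
p_ranges = [(0.3, 0.5), (0.05, 0.1), (0.7, 0.9), (0.8, 0.9), (0.4, 0.8)]  # hypothetical

samples = [sample_m_star(rng, n_range, p_ranges) for _ in range(20000)]
skew_raw = skewness(samples)
skew_log = skewness([math.log(m) for m in samples])
print(f"skewness of M*: {skew_raw:.2f}; skewness of log(M*): {skew_log:.2f}")
```

A clearly positive skewness for *M** together with a much smaller one for log(*M**) is consistent with the lognormal approximation, although it does not prove it.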

Given the lognormal approximation, the best estimate can be calculated analytically, as follows. If μ_{L} is the logarithmic mean [the expected value of log(*M**)] and σ_{L} is the logarithmic standard deviation (logarithmic variance σ_{L}^{2}), the best estimate, i.e., the mode of the approximating lognormal distribution, is exp(μ_{L} − σ_{L}^{2}), where log denotes the natural logarithm. We denote this “analytical best estimate” by *M**_{A}. The logarithmic mean and variance of *M** are given by

$$mathtex$$\[\mu_{L} = \mu_{LN} - \mu_{L1} - \mu_{L2} - \mu_{L3} - \cdots - \mu_{Lk}\]$$mathtex$$ (3)

and

$$mathtex$$\[\sigma_{L}^{2} = \sigma_{LN}^{2} + \sigma_{L1}^{2} + \sigma_{L2}^{2} + \sigma_{L3}^{2} + \cdots + \sigma_{Lk}^{2}\]$$mathtex$$ (4)

respectively, where μ_{LN} and σ_{LN}^{2} are the mean and variance of log(*N*), respectively, and μ_{Li} and σ_{Li}^{2} are the mean and variance of log(*P*_{i}), respectively. In the case of the uniform distribution for the factors, these can be calculated from the formulas

$$mathtex$$\[\mu_{LN} = \frac{N_{\mathrm{max}}\log(N_{\mathrm{max}}) - N_{\mathrm{min}}\log(N_{\mathrm{min}})}{N_{\mathrm{max}} - N_{\mathrm{min}}} - 1\]$$mathtex$$ (5)

$$mathtex$$\[\mu_{Li} = \frac{P_{i\,\mathrm{max}}\log(P_{i\,\mathrm{max}}) - P_{i\,\mathrm{min}}\log(P_{i\,\mathrm{min}})}{P_{i\,\mathrm{max}} - P_{i\,\mathrm{min}}} - 1\]$$mathtex$$ (6)

$$mathtex$$\[\sigma_{LN}^{2} = \frac{N_{\mathrm{max}}[\log(N_{\mathrm{max}})]^{2} - N_{\mathrm{min}}[\log(N_{\mathrm{min}})]^{2}}{N_{\mathrm{max}} - N_{\mathrm{min}}} - 2\mu_{LN} - (\mu_{LN})^{2}\]$$mathtex$$ (7)

and

$$mathtex$$\[\sigma_{Li}^{2} = \frac{P_{i\,\mathrm{max}}[\log(P_{i\,\mathrm{max}})]^{2} - P_{i\,\mathrm{min}}[\log(P_{i\,\mathrm{min}})]^{2}}{P_{i\,\mathrm{max}} - P_{i\,\mathrm{min}}} - 2\mu_{Li} - (\mu_{Li})^{2}\]$$mathtex$$ (8)

where the subscripts max and min indicate the upper and lower bounds of the ranges, respectively.

A simple calculus argument shows that, as the factor ranges become narrower, the best estimate calculated in this way will become closer to that calculated by equation 1, had all the factors' values been known exactly.
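The analytical route can be sketched in a few lines of Python. The function names are hypothetical, natural logarithms are assumed throughout (as the exp in the mode formula requires), and the same closed form is applied to *N* and to each *P*_{i}, including the −1 term that results from integrating log *x* over a uniform range. Treating a factor whose bounds coincide as exactly known is an assumption of this sketch, not something stated in the text.

```python
import math


def log_uniform_moments(lo, hi):
    """Mean and variance of ln(X) for X uniform on [lo, hi] (equations 5 to 8)."""
    if not 0.0 < lo <= hi:
        raise ValueError("need 0 < lo <= hi")
    if lo == hi:  # assumption: a zero-width range means the factor is known exactly
        return math.log(lo), 0.0
    width = hi - lo
    mean = (hi * math.log(hi) - lo * math.log(lo)) / width - 1.0
    second = (hi * math.log(hi) ** 2 - lo * math.log(lo) ** 2) / width
    variance = second - 2.0 * mean - mean ** 2
    return mean, max(variance, 0.0)  # guard against round-off for very narrow ranges


def analytical_best_estimate(n_range, p_ranges):
    """Return (M*_A, mu_L, sigma_L^2) using equations 3 and 4 and the lognormal mode."""
    mu, var = log_uniform_moments(*n_range)
    for lo, hi in p_ranges:
        mu_i, var_i = log_uniform_moments(lo, hi)
        mu -= mu_i   # equation 3: the log-means of the P_i are subtracted
        var += var_i  # equation 4: the variances always add
    return math.exp(mu - var), mu, var


# Ranges of Example 1 in the Results, with N fixed at one surviving cell.
example_1 = [(0.3, 0.5), (0.05, 0.1), (0.7, 0.9), (0.8, 0.9), (0.4, 0.8)]
best, mu_l, var_l = analytical_best_estimate((1.0, 1.0), example_1)
print(f"M*_A = {best:.1f}, mu_L = {mu_l:.3f}, sigma_L^2 = {var_l:.3f}")
```

For the Example 1 ranges this sketch gives a value near 77, in line with the analytical estimate quoted for that example.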

When the lognormal approximation holds, it is also easy to find a credible interval for *M*. A credible interval is a Bayesian analogue of a confidence interval. A 95% credible interval for *M* is a range or interval of numbers, from *x* = *a* to *x* = *b* (*b* > *a*), such that the probability of *M* lying between *a* and *b* is 0.95. (Any other percentage is possible; we use 95% for illustration.) It may be regarded as a range of plausible values of *M* at the 95% level of confidence (or credibility). In the lognormal case, a 95% credible interval, denoted by *C*_{A}, is given by *a* = exp(μ_{L} − 1.96σ_{L}) and *b* = exp(μ_{L} + 1.96σ_{L}) (11, 13).

Other methods are available to compute the best estimate and credible interval in those cases where the lognormal approximation does not hold (13); however, the simplest approach is by simulation, as described next.
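Before turning to simulation, the lognormal credible interval can be sketched as follows. The function names and the numerical inputs are hypothetical; the multiplier 1.96 is recovered from the standard normal quantile so that other credibility levels can be used as well. The second helper gives the cumulative lognormal probability, which is the form in which the distribution is plotted in the figures.

```python
import math
from statistics import NormalDist


def lognormal_credible_interval(mu_log, sigma_log, level=0.95):
    """Equal-tailed interval (a, b) for a lognormal variable.

    mu_log and sigma_log are the mean and standard deviation of ln(M*).
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return math.exp(mu_log - z * sigma_log), math.exp(mu_log + z * sigma_log)


def lognormal_cdf(m, mu_log, sigma_log):
    """P(M* <= m) under the lognormal approximation (requires sigma_log > 0)."""
    return NormalDist(mu_log, sigma_log).cdf(math.log(m))


# Hypothetical logarithmic mean and standard deviation, for illustration only.
a, b = lognormal_credible_interval(4.5, 0.3)
print(f"95% credible interval: {a:.0f} to {b:.0f}")
print(f"P(M* <= 120) = {lognormal_cdf(120.0, 4.5, 0.3):.2f}")
```

Note that the interval is symmetric on the logarithmic scale only; on the original scale the upper limit lies farther from the mode than the lower one.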

**Calculation of best estimate by simulation.** The simulation method that we now describe and which is implemented in our software (see below) provides a simple way to determine the best estimate and credible interval, whether or not the lognormal approximation holds. We refer to these as the “simulation best estimate” *M**_{S} and credible interval *C*_{S}. The software allows one to check the lognormal approximation visually and to calculate the analytical best estimate *M**_{A} and credible interval *C*_{A} given above.

The Monte Carlo simulation starts with a specification by the user of the ranges for the factors *N* and *P*_{1}, …, *P*_{k} in the form of intervals from *N*_{min} to *N*_{max}, *P*_{1 min} to *P*_{1 max}, etc. The user also chooses the number of Monte Carlo simulations (*S*), where *S* is, say, 1,000. A random-number generator, which is part of the program, then generates random values for *N* and the *P*_{i}'s within their respective ranges. From each such set of random factor values, a random value of *M** is computed using equation 1, i.e., by dividing the random value of *N* by the product of the random values of *P*_{i}'s. This yields a random sample of *S* values of *M** (*S* = 1,000 in this example). These values can be used to study the probability distribution of *M** regardless of whether the lognormal approximation holds. The software yields a histogram of the *M** values so generated and plots the best-fitting lognormal curve, which allows visual assessment of the fit. The software estimates the mode of the distribution of *M**, based on the histogram; this mode is called the simulation best estimate *M**_{S}. Similarly, the simulation 95% credible interval *C*_{S} is the interval from the 2.5 percentile to the 97.5 percentile of the simulated *M** values. In our software, the only user input is the specification of the ranges; everything else is automated. The factor values are uniformly distributed within their respective ranges.
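The simulation procedure described above can be sketched with the Python standard library. This is not the authors' Mathematica code: the function names, the seed, the number of histogram bins, and the rule of taking the midpoint of the fullest bin as the mode are all assumptions made for illustration, and the software's own mode estimate may be computed differently.

```python
import math
import random


def simulate_m_star(n_range, p_ranges, n_sims=1000, seed=None):
    """Draw n_sims random values of M* = N / prod(P_i), factors uniform on their ranges."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        n = rng.uniform(*n_range)
        denominator = math.prod(rng.uniform(lo, hi) for lo, hi in p_ranges)
        draws.append(n / denominator)
    return draws


def histogram_mode(values, n_bins=30):
    """Midpoint of the fullest equal-width bin (a crude, assumed mode estimator)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return lo + (counts.index(max(counts)) + 0.5) * width


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


# Ranges of Example 1 in the Results, with N fixed at one surviving cell.
example_1 = [(0.3, 0.5), (0.05, 0.1), (0.7, 0.9), (0.8, 0.9), (0.4, 0.8)]
draws = simulate_m_star((1.0, 1.0), example_1, n_sims=1000, seed=2010)
print("simulation best estimate:", round(histogram_mode(draws)))
print("95% interval:", round(percentile(draws, 2.5)), "to", round(percentile(draws, 97.5)))
```

With only 1,000 draws the histogram mode depends noticeably on the bin width, so a smoother estimator or a larger *S* may be preferable in practice; the percentiles are much less sensitive.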

The two methods to calculate the best estimate, i.e., by Monte Carlo simulations and analytically, are the basis of a Wolfram Demonstration recently posted by the authors on the Internet (see http://www-unix.oit.umass.edu/aew2000/WolframDemoLinks.html and click on “Expanded Fermi Solution to Retrodict the Initial from the Final Number in a Stochastic Process”) (Fig. 1). Notice that with the number of simulations being on the order of a thousand, the corresponding best estimates are very close to each other and, for all practical purposes, almost identical. They are also practically independent of the seed which is used to generate the random entries. The Wolfram Demonstrations Project is a compilation of over 6,000 interactive demonstrations to date in almost every field of the physical sciences (including mathematics), the social sciences, engineering, and the arts. It was initiated and has been hosted by Wolfram Research, the company that has developed Mathematica. The Wolfram Demonstrations themselves have been contributed by Mathematica users around the world. To use the Wolfram Demonstrations, one first has to download Mathematica Player, a free program. It is not necessary to have Mathematica installed on one's computer. This would be needed only if one wants to modify the code of an existing Wolfram Demonstration or write a new one. (However, one can see an animated version of almost all the Wolfram Demonstrations in the project without downloading Mathematica Player by clicking on the “watch web preview” at the top-right corner of the web display.) What is unique to the Wolfram Demonstrations is that the parameters for each plot can be entered and altered by moving sliders on the screen and/or by clicking on a box setter, and the display will be modified accordingly almost instantaneously. This enables the user to examine a large number of contemplated scenarios within a very short time (12).

## RESULTS AND DISCUSSION

**Some hypothetical examples. (i) Example 1: survival of a single cell.** Suppose the probability *P*_{1} of a pathogenic cell present on a tomato to survive washing is in the range of 0.3 to 0.5, to survive the stomach environment (*P*_{2}), 0.05 to 0.1, to survive the bile (*P*_{3}), 0.7 to 0.9, and to survive the pancreatic juice (*P*_{4}), 0.8 to 0.9. (For information on the antibacterial activities of the stomach, bile, and pancreatic fluid, see for example, the work of Peterson et al. [14], Hofmann and Eckmann [8], and Rubinstein et al. [17], respectively.) In addition, suppose the probability that a cell will be able to establish itself on the gut wall once it reaches it (*P*_{5}) is in the range 0.4 to 0.8. If so, what is the most probable number of cells on the tomato (assumed to be eaten whole) that will result in one surviving cell at the gut wall? According to the traditional Fermi solution, we would probably guess random values within each of the probability ranges, e.g., that *P*_{1} is equal to 0.4, *P*_{2} is equal to 0.06, *P*_{3} is equal to 0.9, *P*_{4} is equal to 0.8, and *P*_{5} is equal to 0.7 and, hence, that the initial number estimate, *M**_{F}, calculated with equation 1 is *M**_{F} = 1/(0.4 × 0.06 × 0.9 × 0.8 × 0.7) = 83 (rounded). The Fermi solution calculated with the probabilities assuming their respective ranges' middle value is 82 (rounded). The expanded Fermi solution yields 77 (rounded) as the solution when calculated analytically (Fig. 2 and Table 1), and the Monte Carlo simulation method yields a very similar number, typically in the range of 75 to 79. Notice that because of the random probability entries, the actual best estimate is not exactly the same in every simulation. But, as previously stated, with *S* being on the order of a thousand, the corresponding best estimates are very close to each other and are almost identical. As demonstrated in Table 1, a slight change in the factor ranges does not affect the best estimates substantially, an indication that the method is robust against small perturbations. We return to this point later. Also notice that had the question been what pathogen load would have resulted in 10 or 20 surviving cells at the gut wall, the corresponding best estimates would be that found for 1 cell multiplied by 10 and 20, respectively. Obviously, any initial number of cells higher than the best estimate, *M**, will almost certainly result in at least 1 cell reaching the gut viable. However, since we are dealing with a stochastic process, cases where the pathogen is present in a smaller number than the best estimate might sometimes result in a surviving cell or spore at the gut, albeit at a lower probability. The cumulative form of the (lognormal) distribution of the best estimate is shown in Fig. 2. The bottom plots demonstrate that as the number of the initial cells or spores increases past the best estimate, the probability that one of them will survive the journey fast approaches (but never reaches exactly) 100%.
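The two traditional Fermi figures in this example are plain arithmetic and can be reproduced directly. The sketch below uses the guessed values that appear in the worked division (0.4, 0.06, 0.9, 0.8, and 0.7) and the midpoints of the five stated ranges; the variable names are hypothetical.

```python
from math import prod

# Probability ranges of Example 1, in the order P_1 to P_5.
ranges = {
    "washing": (0.3, 0.5),
    "stomach": (0.05, 0.1),
    "bile": (0.7, 0.9),
    "pancreatic juice": (0.8, 0.9),
    "gut wall": (0.4, 0.8),
}

guesses = [0.4, 0.06, 0.9, 0.8, 0.7]  # one arbitrary guess inside each range
midpoints = [(lo + hi) / 2 for lo, hi in ranges.values()]

m_guess = 1 / prod(guesses)   # equation 1 with N = 1 and the guessed values
m_mid = 1 / prod(midpoints)   # equation 1 with N = 1 and the range midpoints
print(round(m_guess), round(m_mid))  # 83 82

# Equation 1 is linear in N, so 10 or 20 required survivors simply scale the estimate.
print(round(10 * m_mid), round(20 * m_mid))
```

Both point estimates lie somewhat above the expanded solution's 77, which reflects the fact that the mode of a right-skewed distribution sits below the value obtained from central guesses.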

**(ii) Example 2: survival of an ingested pathogen.** An example where the number of colonizing pathogen cells needed to cause infection is uncertain and hence ought to be specified by a range is given in Fig. 3, with the details presented in Table 2. As before, there is good agreement between the best estimate calculated analytically and that derived from the Monte Carlo simulations. Here, the estimates reached by the expanded Fermi solution differed substantially from those reached by the traditional method (based on the middle values as guesses). Table 2 also demonstrates that in this case, too, small changes in the factor ranges had a small effect on the best estimate, a manifestation of the method's robustness. This has been observed in many other factor combinations, an exercise facilitated by the ease of changing the ranges with sliders on the screen in our software. In this case, the cumulative curve depicts the probability not that a single cell or spore will survive and settle in the gut but that a number of cells or spores between the specified *N*_{min} and *N*_{max} will complete the journey successfully.
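The robustness claim can be probed with a small perturbation experiment. The sketch below does not reproduce Table 2: the ranges are hypothetical (with *N* between 20 and 40, the illustrative range mentioned in Methods), the helper names are invented, and widening every bound by 5% is just one arbitrary way of defining a small perturbation. The analytical best estimate is taken as the mode of the approximating lognormal distribution, with natural logarithms.

```python
import math


def log_moments(lo, hi):
    """Mean and variance of ln(X) for X uniform on [lo, hi], with 0 < lo < hi."""
    width = hi - lo
    mean = (hi * math.log(hi) - lo * math.log(lo)) / width - 1.0
    second = (hi * math.log(hi) ** 2 - lo * math.log(lo) ** 2) / width
    return mean, second - 2.0 * mean - mean ** 2


def best_estimate(n_range, p_ranges):
    """Lognormal-mode estimate exp(mu_L - sigma_L^2) of the initial number."""
    mu, var = log_moments(*n_range)
    for lo, hi in p_ranges:
        mu_i, var_i = log_moments(lo, hi)
        mu, var = mu - mu_i, var + var_i
    return math.exp(mu - var)


def widen(bounds, fraction, cap=None):
    """Move both bounds outward by the given fraction; cap keeps probabilities <= 1."""
    lo, hi = bounds
    new_hi = hi * (1.0 + fraction)
    if cap is not None:
        new_hi = min(new_hi, cap)
    return lo * (1.0 - fraction), new_hi


n_range = (20.0, 40.0)  # hypothetical
p_ranges = [(0.3, 0.5), (0.05, 0.1), (0.7, 0.9), (0.8, 0.9), (0.4, 0.8)]  # hypothetical

base = best_estimate(n_range, p_ranges)
perturbed = best_estimate(widen(n_range, 0.05), [widen(r, 0.05, cap=1.0) for r in p_ranges])
relative_change = abs(perturbed - base) / base
print(f"base = {base:.0f}, perturbed = {perturbed:.0f}, change = {relative_change:.1%}")
```

Comparing the printed relative change with the size of the perturbation gives a rough feel for how sensitive the best estimate is to the chosen limits; other perturbation schemes could of course behave differently.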

**(iii) Example 3: survival of probiotic bacterial cells.** An example of how to estimate the number of probiotic bacterial cells needed to be present in a beverage, for instance, in order to have 1 to 10 million settling in the gut is given in Fig. 4, and some of its details are presented in Table 3. The estimation method is the same as that used in the previous examples, except that the numbers are of different orders of magnitude. As seen in Fig. 1, the scale of *N* can be selected by the multiplier bar setter, while that of *M* is automatically adjusted by the program. The observations concerning the method's robustness, the differences between the estimates obtained by the expanded and traditional versions of the Fermi solution, and the agreement between analytical and Monte Carlo methods' results have all been repeated (Table 3). The curves at the bottom of Fig. 4 are again the cumulative form of the estimates' distribution. As in the previous example, the probability is of the survival of a number between the specified *N*_{min} and *N*_{max}, not that of any particular number of cells.

**(iv) Example 4: survival of probiotic spores in a heat-processed food.** Foods can be enriched with the endospores of probiotic bacilli, which will germinate upon arrival in the gut. Such foods can be heat treated to rid them of pathogens and inactivate their enzymes, thus rendering them biochemically stable, provided that the process is not intensive enough to destroy the probiotic spores themselves. (This can also be done with heat-resistant lactobacillus cells in dairy products, for example.) But the heat can activate some of the spores, making them susceptible to conditions in the digestive tract, thus reducing their effective number. The method of estimating the initial load is essentially the same as in the previous example. But besides the scale that might be different, there are other probabilities added to the list. Here *P*_{1} is the probability that the spore will survive the heat treatment, *P*_{5} that it will successfully germinate after reaching the gut, and *P*_{6} that cells originating from the germinated spores will succeed in establishing themselves in the gut. An example of a hypothetical scenario of this kind is given in Fig. 5 and Table 4. All the observations concerning the method's performance reported in the previous three examples were repeated here, too, as expected.

The four examples given above are intended to demonstrate how the expanded Fermi solution works for the kind of problem that each represents rather than to provide a solution to specific actual problems. Consequently, the numbers in the figures and tables, although realistic, should not be used as if they were real data on food-borne pathogens or probiotics. For actual use of the method and software, the reader will have to identify the pertinent survival level in the gut and the (up to six) most important probabilities that determine the cells' or spores' fate in the system in question and then estimate their ranges, on the basis of previous knowledge and/or published results on the organism's survival in *in vitro* or animal experiments. The narrower the factors' ranges are, the closer the best estimate will be to the correct value. Thus, the method should not be viewed as an alternative to knowledge but as a tool to be used in the absence of hard information. Although the focus has been on foods, the methodology can be equally applicable to drinking water accidentally contaminated with a pathogen or insufficiently disinfected. In either case, experts' opinions on a particular survival level or probability may vary considerably, which could result in disagreement concerning the critical initial microbial load and, hence, on how to respond in real-life situations. But it will be much easier for the experts to agree on the pertinent parameters' ranges, which will help them reach a consensus. Also, the Wolfram Demonstration allows an individual expert or a team of experts to examine very rapidly the consequences of different contemplated limits on the parameters, which will help them to improve the accuracy and reliability of their assessment through mutual persuasion.

The method's usefulness is not limited to food or water safety issues or to medical microbiology, for that matter. It can also be an extremely useful tool in the development of food products for effective delivery of probiotic cells and spores. In principle, the same expanded Fermi solution method can also be used to examine the survivability of nutrients and nutraceuticals during processing and storage, in the human digestive system once they are ingested, or in supplements taken orally. In either case, the results could be exploited in the development of products that will guarantee an effective dose of the biologically active agent.

We have not attempted to demonstrate that the method presented here renders realistic estimates of the number of viable cells or spores reaching the gut for any specific microbial system. Thus, all the numerical values used in the examples are purely hypothetical and need not agree with the actual parameter ranges of any particular organism. The original Fermi solution (equation 1) and its expanded version (equations 3 to 8) have been developed for situations where the available data are insufficient to derive a reliable estimate or make an accurate prediction. However, the method's performance can be tested *a posteriori* against clinical data or by the examination of laboratory experimental data especially designed for the purpose.

## ACKNOWLEDGMENTS

We thank Eric Decker of the Department of Food Science at the University of Massachusetts for suggesting extension of the method to the delivery of probiotics.

## FOOTNOTES

- Received 17 June 2010.
- Accepted 27 October 2010.
- Accepted manuscript posted online 5 November 2010.
† This is a contribution of the Massachusetts Agricultural Experiment Station at Amherst.

- Copyright © 2011, American Society for Microbiology