**DOI:** 10.1128/AEM.00020-11

## ABSTRACT

In Europe, the Drinking Water Directive of the European Commission indicates which methods (most of which are CEN/ISO-standardized methods) should be used for the analysis of microbiological parameters (European Commission, Environment, Council Directive 98/83/EC of 3 November 1998). According to the Directive, alternative methods “may be used, providing it can be demonstrated that the results obtained are at least as reliable as those produced by the methods specified.” The prerequisite for the routine use of any alternative method is to provide evidence that this method performs equivalently to the corresponding reference method. In this respect, the ISO 16140 standard (ISO, *ISO 16140. Microbiology of Food and Animal Feeding Stuffs—Protocol for the Validation of Alternative Methods*, 2003) represents a key issue in generating such a procedure based on an interlaboratory study. A new statistical tool, called the accuracy profile, has been developed to better interpret the data. The study presented here is based upon the enumeration of Escherichia coli bacteria in water. The reference method may require up to 72 h to provide a confirmed result. The aim of this publication is to present data for an alternative method by which results can be obtained in 18 h (Colilert-18/Quanti-Tray) based upon defined substrate technology (DST). The accuracy profile is a statistical and graphical decision-making tool and consists of simultaneously combining, in a single graphic, β expectation tolerance intervals (β-ETIs) and acceptability limits (λ). The study presents the validation criteria calculated at the three levels of contamination used in the trial for a β equal to 80% and a λ equal to ±0.3 and combines the accuracy profiles of Escherichia coli for a λ of ±0.3 log_{10} unit/100 ml, a λ of ±0.4 log_{10} unit/100 ml, and a β of 80% or 90%. Several interesting conclusions can be drawn from these data. 
The accuracy profile method has been applied to the validation of the Colilert-18/Quanti-Tray method against reference method ISO 9308-1 (ISO, *ISO 9308*-*1. Water Quality—Detection and Enumeration of Escherichia coli and Coliform Bacteria. Part 1. Membrane Filtration Method*, 2000), using a β of 80% and a λ of 0.4; the alternative method can be validated between 1.00 and 2.05 log_{10} units/100 ml, equivalent to 10 to 112 CFU/100 ml.

## INTRODUCTION

Until now, there has been little formal guidance on procedures for adopting alternative methods for determining levels of microbes in water. From a metrological point of view, the first step in developing a procedure is to define the measurand. The measurand itself may be simply defined as “the quantity intended to be measured” (in reference 16, see section 2.3). Due to the nature of microorganisms and the well-recognized concept of CFU, the currently accepted method for defining the measurand and for ensuring traceability in microbiology consists of using a reference (or official) method. This is usually a historic method which has been standardized and is recognized as reliable by the community of microbiologists and regulatory bodies. We must keep in mind, however, that microbiological methods, even if they have the status of reference methods, are based on counts of discrete units, and hence their uncertainties have intrinsic, unavoidable, stochastic components that make them larger than the uncertainties of most chemical methods. This is even more pronounced in most probable number (MPN) methods, where the calculated result is the mode of a statistical distribution of values, and values around this mode remain possible, with lower probabilities. Ideally, each MPN should therefore be expressed with a confidence interval. As microbiological methods are empirical (also called “direct”) analytical techniques, the measurand is highly dependent upon the operating procedure. Therefore, the measurand, as defined by a “reference” method, can be somewhat different when determined by an “alternative” method. For practical purposes, however, and to undertake the validation, it is necessary to assume that both methods measure the same measurand. This situation is rather typical when alternative and reference methods are compared.

Taking into account the points highlighted above, the prerequisite for the commercialization and routine use of any alternative method is evidence that this method performs equivalently to the corresponding reference method. To provide such evidence, the manufacturers of alternative proprietary kits, the food and beverage industry, the public health services, and other authorities require a reliable and commonly agreed upon procedure for the validation of such alternative methods. In this respect, the ISO 16140 standard is a key reference for establishing such a procedure (13).

As suggested in its title, ISO 16140 (13) separately proposes validation protocols for both quantitative and qualitative methods (1). An interesting requirement of the standard for quantitative method validation is the organization of an interlaboratory study in accordance with the recommendations of ISO 5725 (15a). Some years ago, microbiologists were reluctant to participate in collaborative studies due to the assumed instability of samples. Much improvement has occurred with regard to stabilization of samples, and interlaboratory studies are now commonly used for proficiency testing programs. This classical collaborative approach can now also be applied to method validation in microbiology; sample instability and organizational constraints are no longer an issue (2, 4, 12).

The classical statistical strategy employed for the interpretation of data in many validation procedures is based largely on null-hypothesis testing. This type of analysis aims to demonstrate that an alternative method does not produce results significantly different from those of the reference method. This strategy presents many drawbacks that have been described extensively in recent publications (7, 18). The most striking observation was that the more uncertain the results obtained by an alternative method are, the easier the validation is. For this reason, a new statistical tool, called the accuracy profile, has been developed to better interpret the data of a validation study, so that misleading conclusions are avoided. This methodology has been extensively applied to chemical analytical methods and, more recently, in the field of food microbiology (8).

The study presented here is based upon the enumeration of Escherichia coli bacteria in water. The choice to use the accuracy profile for this parameter is based largely upon recent changes introduced by new European regulations and subsequently by regulation 2073/2005 of the Commission of the European Communities and its amendments for microbiological criteria (3).

The objective of these regulations is to ensure that drinking water is free of pathogens such as viruses, protozoa, and bacteria. Waterborne pathogens cause diseases such as hepatitis, giardiasis, and dysentery. The analysis of water for the presence of specific harmful viruses, protozoa, and bacteria is time-consuming and thus expensive. In addition, not all analytical laboratories are equipped and approved to proficiently perform the required testing. Water testing for the presence of specific organisms is therefore limited to investigating specific waterborne disease outbreaks.

E. coli and coliform bacteria are a broad class of bacteria found in the environment and also in the feces of humans and other animals. Therefore, the presence of coliform bacteria and, in particular, E. coli in drinking water may indicate the presence of harmful, disease-causing organisms. For this reason, the enumeration of E. coli cells in water is increasingly used to assess water quality.

The current reference method has several limiting factors, in particular, time, as it may require up to 72 h to provide a confirmed result. The aim of this publication is to present data for an alternative method by which results can be obtained in 18 h. In the context of a validation study, a collaborative study was organized and data were collected according to the guidelines of ISO 16140 (13). The interpretation of these data using the accuracy profile approach is presented, and the “fitness for purpose” of this alternative method versus the reference method is ascertained.

## MATERIALS AND METHODS

(i) Reference method. The reference method for the enumeration of E. coli organisms is the published standard ISO 9308-1 (15). This method consists of using Tergitol 7-triphenyl tetrazolium chloride (TTC) agar after sample filtration. The standard operating procedure can be summarized as follows.

Filter 100 ml of a sample with a sterile membrane as described in ISO 7218 (14).

Carefully place the membrane on Tergitol 7-TTC agar.

Incubate the samples at 36 ± 2°C for 21 ± 3 h.

If no typical colonies are present, incubate the samples at 36 ± 2°C for an additional 24 ± 2 h.

When presumptive coliform colonies (lactose-positive colonies which show a yellow color development in the medium under the membrane) are present, a confirmatory step is required. Selected colonies of presumptive E. coli and non-E. coli coliform bacteria are subcultured onto a nonselective medium and incubated at 37 ± 1°C for 24 ± 2 h. Confirmation involves testing the colonies for oxidase activity and the production of indole. The colonies which are oxidase negative and indole negative are presumed to be non-E. coli coliforms, whereas colonies which are oxidase negative and indole positive are presumed to be E. coli.

(ii) Alternative method. The alternative method (Colilert-18/Quanti-Tray) is based upon the Defined Substrate Technology (DST) of IDEXX Laboratories, Inc. (5). Colilert-18/Quanti-Tray simultaneously detects and enumerates total coliform and E. coli bacteria in water. When total coliform bacteria metabolize the nutrient indicator, *o*-nitrophenyl galactopyranoside (ONPG), the sample turns yellow. When E. coli metabolizes the nutrient indicators ONPG and methylumbelliferyl-β-D-glucuronide (MUG), the sample turns yellow and fluoresces under UV light. Only E. coli detection and enumeration are used in the present study.

One of the outputs of this study is the quantification limit for the Colilert-18 method. The Colilert-18 operating procedure can be summarized as follows.

Add the contents of one blister pack to a 100-ml room temperature water sample in a sterile vessel.

Cap the vessel, and shake it until the reagent is dissolved.

Pour the sample-reagent mixture into a Quanti-Tray and seal it in an IDEXX Quanti-Tray sealer.

Incubate the sealed tray at 36 ± 2°C for 18 to 22 h.

Read the results according to the result interpretation table, and count the number of positive wells.

No confirmations are needed. The most probable number (MPN) can be calculated from the number of positive wells (see the Appendix) or read in the table provided with the trays to convert the number of positive wells to MPN format. The values in this table agree with those calculated with the FDA's *Bacteriological Analytical Manual* (BAM) method (19), used in the Appendix.
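The conversion from positive wells to an MPN can be illustrated with the standard single-dilution MPN estimate. This is a minimal sketch, not the calculation given in the Appendix: the function name is hypothetical, and the default of 51 equal-volume wells is an assumption about the tray format that is not stated in the text.

```python
import math


def mpn_per_100ml(positive_wells: int, total_wells: int = 51,
                  sample_volume_ml: float = 100.0) -> float:
    """Single-dilution most probable number, expressed per 100 ml.

    Assumes the sample is split evenly across `total_wells` wells and that
    organisms are Poisson-distributed among the wells. The default well
    count is an assumption, not a value taken from the article.
    """
    if not 0 <= positive_wells <= total_wells:
        raise ValueError("positive_wells must be between 0 and total_wells")
    if positive_wells == total_wells:
        # All wells positive: the estimate is unbounded (reported as "greater than").
        return math.inf
    well_volume_ml = sample_volume_ml / total_wells
    # P(well negative) = exp(-density * well volume), so the maximum-likelihood
    # density is -ln(fraction of negative wells) / well volume.
    density_per_ml = -math.log(1 - positive_wells / total_wells) / well_volume_ml
    return density_per_ml * 100.0


if __name__ == "__main__":
    for wells in (1, 10, 25, 50):
        print(wells, round(mpn_per_100ml(wells), 1))
```

Under these assumptions the estimate rises slowly at first (about 1 MPN/100 ml for one positive well) and steeply near saturation (about 200 MPN/100 ml for 50 positive wells), which is why the uncertainty of an MPN grows at the top of the counting range.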

Experiment. (i) Experimental design. The experimental design used for the interlaboratory study is described in ISO 16140 (see reference 13, section 6.3.3 and Annex H). The aim of the design is to comparatively determine the performance characteristics (accuracy and precision) of an alternative method against the corresponding reference method. The design requires at least eight participating laboratories producing usable results. The first step of the interlaboratory study is to select a single well-mixed representative water sample. The water sample selected for this study was artificially contaminated with E. coli strain ESC.1.131, an environmental strain isolated from water. Samples were prepared at three nominal contamination levels (expressed as CFU/100 ml), plus a sterile control: control, 0; low, between 1 and 10; medium, between 10 and 50; and high, between 50 and 200. The zero-contamination level was prepared for control purposes only and was not included in the calculations. Prior to inoculation, the absence of E. coli and coliform bacteria in the water sample was confirmed by the organizing laboratory according to NF EN ISO 9308-1 (15).

Each batch of sample was divided into 100-ml aliquots that were transferred to sterile vials, which were subsequently closed and sealed with tape. Each vial was individually well mixed prior to shipment to a participant. Each participant received two samples per contamination level and was required to make duplicate analyses of each sample. A total of 16 enumerations were performed by each laboratory.

The stability of inoculated samples was determined over a 3-day period using a prototype sample stored at 4 ± 2°C. The results of the stability study are presented in Table 1. Following log transformation, data were subjected to a one-way analysis of variance (ANOVA), and stability was found to be satisfactory over a 48-h period. Samples were packaged at +4°C in thermostatic boxes containing a temperature control probe (TomProbe, catalog no. MD30100; AES Chemunex) prior to shipment by express mail to the participant. In order to be included in the study, results for each sample had to be reported to the Institut Scientifique d'Hygiène et d'Analyse (ISHA) within 48 h.

Eleven laboratories (*I* = 11) were selected to participate in the collaborative study. Samples were labeled according to the following coding system: one letter, from A to K, for the identification of the laboratory and a random number, from 1 to 8, anonymously assigned by the organizing laboratory in order to identify the level.

One week before the trial, participants received detailed instructions on the operating procedure and the necessary quantities of analytical reagents and kits required for the study. All participants agreed to perform the analyses within 24 h of receiving the samples. Receipt of samples was acknowledged within 24 h, and all samples were analyzed within 48 h according to the instructions provided by the organizing laboratory.

The organizing laboratory also provided a test protocol and a data sheet for recording experimental data and critical experimental conditions within each laboratory.

During the interlaboratory study, only one type of matrix is required, as a preliminary study has previously been undertaken to fully define the types of matrices for which the method is applicable. The results of this preliminary study are not presented here. Subsequently, each participating laboratory receives three subsamples at the three levels of contamination and performs duplicate analyses with each method (alternative and reference). In all, each participant returns 12 (3 × 2 × 2) analytical results, not including the results for the control sample.

(ii) Statistical processing. Data are gathered by the organizing laboratory and processed in order to calculate validation criteria, such as repeatability, reproducibility, and bias between the two methods. Classical interpretation of data consists of testing hypotheses for each validation criterion. This strategy is often misleading and can result in contradictory conclusions. For this reason, it was decided to apply the new strategy of the accuracy profile to aid data interpretation.

The accuracy profile is a statistical and graphical decision-making tool aimed at helping the analyst to conclude whether an analytical procedure is valid. It consists of simultaneously combining, in a single graphic, β expectation tolerance intervals (β-ETIs) and acceptability limits (λ) (see the Appendix for definitions of terms). β-ETIs (or average tolerance intervals) are defined as intervals that cover, on average, a certain percentage of a distribution. In practical terms, β-ETIs can be claimed to contain, on average, for example, 80% of future measurements. β-ETIs should not be confused with confidence intervals: confidence intervals characterize only statistical parameters, such as an average, whereas β-ETIs relate to individual future observations.

Analysts are often interested in estimating the average value in a population. Information about the population average, in the form of a sample estimate, can be deduced by drawing an interval or range of values around the sample average which is likely to include the true population average. Such ranges are generally referred to as confidence intervals. However, on occasion, the range of values in a population is of greater importance than the average. In such cases, another type of interval, a tolerance interval, may be useful. Average tolerance limits define the bounds of an interval which contains, on average, a specified proportion (β) of the measurements.
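The contrast between the two kinds of interval can be made concrete in the simplest textbook case, which is not the interlaboratory model used in this study: a single sample of *n* independent, normally distributed measurements with average $\bar{x}$ and standard deviation $s$. The symbols $n$, $s$, and the confidence level $\gamma$ are introduced here for illustration only.

$$
\text{confidence interval for the average:}\quad \bar{x} \pm t_{n-1,\,(1+\gamma)/2}\,\frac{s}{\sqrt{n}}
$$

$$
\beta\text{-expectation tolerance interval:}\quad \bar{x} \pm t_{n-1,\,(1+\beta)/2}\; s\,\sqrt{1+\frac{1}{n}}
$$

As *n* increases, the confidence interval shrinks toward zero width, because the average becomes better known, whereas the tolerance interval converges to a fixed width governed by $s$, because it must still cover the spread of individual future results.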

In contrast, acceptability limits are defined as the allowable difference that can be accepted between the reference and alternative methods without misinterpreting a result. For example, in many cases, a difference of 0.3 or 0.4 log_{10} unit/100 ml between the result obtained by a reference method and that given by an alternative method affects the interpretation with regard to the conformity of a sample.

In so far as validation must cover the complete application domain of the method, the accuracy profile combines both tolerance intervals and acceptability limits calculated at several levels of contamination across the application domain of the method, thus meeting the criteria for validation.

The theory, calculation, and application of accuracy profiles to chemical analyses are described in detail elsewhere (9, 10, 11). When applied to microbiological analyses, some modifications are necessary. The construction of the accuracy profile can be summarized as a sequence of nine steps, listed below (8). Within the calculation, *i* is the identification index of a participating laboratory, and 1 ≤ *i* ≤ *I*, where *I* is the total number of laboratories participating in the trial. A further identifier (*j*) is the identification index of a replicate, and 1 ≤ *j* ≤ *J*, where *J* is the number of replicates, which is assumed to be the same for each laboratory-level combination. Finally, *k* is the identification index of a level, and 1 ≤ *k* ≤ *K*, where *K* is the number of contamination levels. According to the recommendations of ISO 16140 (13), *I* should be greater than 8, *J* should equal 2, and *K* should equal 3.

The construction of an accuracy profile involves the following steps.

(i) Define the acceptability criterion (λ), usually ±0.2 or ±0.3 decimal log unit/100 ml, for the alternative method. It is typical to select a single value for λ for all accuracy profiles, but it is possible to choose different values depending on the level of contamination.

(ii) Collect the analytical results (in CFU/100 ml) obtained by the reference method within the interlaboratory trial. For each level of contamination, calculate the median result [*T*_{(k)}] obtained with the reference method and log transform the data. These values are called reference or target values.

(iii) Collect the results (in CFU/100 ml) obtained by the alternative method and log transform the data. These data are denoted *X*.

(iv) For each level, *k*, using *x*_{ijk}, calculate the reproducibility standard deviation (*s*_{R}). The principle behind this calculation is that the total variance of all replicates of one level is modeled according to a random-effect ANOVA, where the random effect corresponds to the laboratory. This method consists of splitting total variance into the within-laboratory variance (*s*_{r}^{2}), also called repeatability variance, and the between-laboratory variance (*s*_{L}^{2}). This classical statistical procedure is fully described in ISO 5725-2 (15b). Finally, the reproducibility standard deviation for one level of contamination can be calculated using equation 1:

$$
s_{R} = \sqrt{s_{r}^{2} + s_{L}^{2}} \qquad (1)
$$

(v) For each level, calculate the average [$\bar{x}_{(k)}$] of measurements made with the alternative method.

(vi) For each level, calculate the absolute bias according to equation 2:

$$
\text{bias}_{(k)} = \bar{x}_{(k)} - T_{(k)} \qquad (2)
$$

(vii) For each level, calculate the limits of the β-ETI according to equation 3:

$$
\bar{x}_{(k)} \pm k_{(k)M} \times s_{(k)R} \qquad (3)
$$

where *s*_{(k)R} is the standard deviation of reproducibility and *k*_{(k)M} is its coverage factor for level *k* [see “Definitions of terms” in the Appendix for the *k*_{(k)M} calculation].

(viii) For each level, calculate the differences between the tolerance interval limits and the target value *T*_{(k)} according to equation 4:

$$
\left[\bar{x}_{(k)} - k_{(k)M} \times s_{(k)R}\right] - T_{(k)} \quad \text{and} \quad \left[\bar{x}_{(k)} + k_{(k)M} \times s_{(k)R}\right] - T_{(k)} \qquad (4)
$$

(ix) Plot the results. On the horizontal (*x*) axis, plot the target values [*T*_{(k)}] in decimal log units (logs). On the vertical (*y*) axis, simultaneously plot the bias (equation 2), the acceptability limits (±λ), and the tolerance interval limits (equation 4), all expressed in log_{10} units.

In this context, the acceptability criterion (±λ) is expressed as an acceptable difference, as we are dealing with logarithms, whereas in fact this difference can be interpreted as a ratio. Acceptability criteria represent the maximum acceptable differences between a result obtained by an alternative method and that given by the reference method.

All calculations in this study were performed using Microsoft Excel, and specific worksheets were prepared for this purpose. These worksheets may be downloaded from http://www.paris.inra.fr/metarisk/downloads/software_programs/excel_templates.

The interpretation of the accuracy profile is as follows. If, across the validation domain of the method, all β-expectation tolerance intervals (β-ETIs) are included within the acceptability limits, the method is declared valid over this range. Where any β-ETI limit exceeds one of the acceptability limits, the method is not valid and the validity domain must be reduced.

This can be interpreted as follows. According to the definition, a β-ETI should contain, on average, β% of the predicted future results. Therefore, the analyst and end user can expect that, on average, β% of future results will fall between the limits of this interval. As long as this interval lies within the acceptability limits, the analyst can be confident that the measurements are comparable to those obtained by the reference method, within ±λ log_{10} units.
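The per-level calculations and the decision rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Excel worksheets: the function and variable names are hypothetical, the design is assumed to be balanced (the same number of replicates in every laboratory), the central value of the alternative method is taken as the average of the log-transformed results, a negative between-laboratory variance estimate is set to zero by convention, and the coverage factor `k_m` is supplied by the caller because its calculation is given in the Appendix rather than here.

```python
import math
from statistics import mean


def accuracy_profile_level(results, target_log, k_m, lam):
    """Evaluate one contamination level of an accuracy profile.

    results    : list of laboratories, each a list of log10 results
                 (alternative method), same length for every laboratory
    target_log : log10 target value T for this level (reference method)
    k_m        : coverage factor of the tolerance interval (assumed given)
    lam        : acceptability limit, in log10 units
    """
    n_labs = len(results)
    n_rep = len(results[0])
    if n_labs < 2 or n_rep < 2 or any(len(lab) != n_rep for lab in results):
        raise ValueError("need a balanced design with >= 2 labs and >= 2 replicates")

    lab_means = [mean(lab) for lab in results]
    grand_mean = mean(lab_means)

    # One-way random-effects ANOVA: the within-laboratory mean square
    # estimates the repeatability variance s_r^2.
    ss_within = sum((x - m) ** 2 for lab, m in zip(results, lab_means) for x in lab)
    s_r2 = ss_within / (n_labs * (n_rep - 1))

    # The between-laboratory mean square gives s_L^2; negative estimates
    # are truncated to zero (a convention assumed here).
    ms_between = n_rep * sum((m - grand_mean) ** 2 for m in lab_means) / (n_labs - 1)
    s_l2 = max(0.0, (ms_between - s_r2) / n_rep)

    s_R = math.sqrt(s_r2 + s_l2)          # equation 1
    bias = grand_mean - target_log        # equation 2
    lower = bias - k_m * s_R              # equation 4, lower limit
    upper = bias + k_m * s_R              # equation 4, upper limit
    return {
        "s_r": math.sqrt(s_r2),
        "s_L": math.sqrt(s_l2),
        "s_R": s_R,
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "valid": -lam <= lower and upper <= lam,
    }


if __name__ == "__main__":
    # Invented data: three laboratories, two replicates each (log10 units).
    demo = [[1.05, 1.10], [1.20, 1.15], [1.00, 1.10]]
    print(accuracy_profile_level(demo, target_log=1.08, k_m=1.5, lam=0.3))
```

Running the function once per contamination level and plotting `bias`, `lower`, and `upper` against the target values, together with horizontal lines at ±λ, gives the accuracy profile; the method is declared valid over the levels for which `valid` is true.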

## RESULTS

Reference material preparation. Within the scope of this validation study, the Colilert-18 method was submitted for validation according to the ISO 16140 procedure (13). Results were collected during a collaborative study organized by the Institut Scientifique d'Hygiène et d'Analyse (ISHA, Massy, France) and supervised by the AFNOR Certification Board.

Raw data (expressed as numbers of CFU/100 ml) are presented in Table 2.

Since microbial counts are not normally distributed, it was decided to transform the data into decimal logarithms, as is typical with such analyses. For each level of contamination, the reference value, *T* (or target value), was calculated as the median result obtained with the reference method. For the low, medium, and high contamination levels, target values in CFU/100 ml and their corresponding log_{10} values (in parentheses) were 12 (1.079), 64 (1.806), and 120 (2.079), respectively. These values are somewhat different from the expected nominal values, demonstrating that the reference method and/or reference sample preparation procedure also represents a source of uncertainty. Counts obtained for the alternative method were also transformed into log_{10} units.

Linearity. Linearity was assessed graphically, as illustrated in Fig. 1, by simultaneously plotting the log-transformed results obtained by each method on the same sample. It can be seen that the alternative method gives results which are proportional to those of the reference method, and this complies with the definition of linearity. Additionally, the slope of the linear regression line between both methods is close to 1 and confirms the linearity of the data. It is not useful to perform any conformity hypothesis testing on this linear regression because within the framework of the accuracy profile approach, no hypothesis testing is performed. This is intended, because with hypothesis testing, usually only the null hypothesis is verified, whereas a true estimate of the performance of the test requires defining an acceptance limit for each alternative hypothesis. It is therefore preferable to globally use the acceptance limit (±λ), which allows a more informed decision to be made, as described below. Although this graphical interpretation may seem rather subjective, it is considered sufficient.

In Fig. 1, the medians of the measurements obtained at each level with the reference method are also represented. These values are used to define the target (*T*) values and to calculate the trueness of the alternative method.

Accuracy profile. For this study, acceptability limits were set at two values: ±0.3 and ±0.4 log_{10} unit/100 ml. In terms of the number of CFU/100 ml, 0.3 corresponds to a factor of 2 in the closeness of agreement of results generated by the alternative method compared to the reference method. For example, when an acceptability limit of ±0.3 is applied, if the reference method gives a result of 10 CFU/100 ml (or 1 log unit), it is deemed acceptable that the alternative method gives extreme results at 5 or 20 CFU/100 ml. Likewise, when using an acceptability limit of ±0.4 log_{10} unit/100 ml, this maximum interval becomes 4 to 25 CFU/100 ml. It should be noted that these values are maximum acceptable limits, and it is expected that performance of the alternative method will be at least as good as that of the reference method.
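The conversion between an acceptability limit in log units and the corresponding range in CFU/100 ml is a one-line calculation, since a difference of λ on the log_{10} scale is a ratio of 10^λ on the count scale. The helper below is a hypothetical illustration of that arithmetic only.

```python
def acceptable_cfu_range(reference_cfu: float, lam: float) -> tuple[float, float]:
    """Return the (lowest, highest) count still within +/- lam log10 units
    of a reference count, both in the same units as `reference_cfu`."""
    factor = 10 ** lam  # a log10 difference of lam is a ratio of 10**lam
    return reference_cfu / factor, reference_cfu * factor


if __name__ == "__main__":
    for lam in (0.3, 0.4):
        low, high = acceptable_cfu_range(10, lam)
        print(f"lambda = {lam}: {low:.1f} to {high:.1f} CFU/100 ml")
```

For a reference result of 10 CFU/100 ml, this reproduces the rounded ranges quoted above: roughly 5 to 20 for λ = 0.3 and roughly 4 to 25 for λ = 0.4.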

These levels of acceptance may appear to be rather lenient in some analytical domains, such as food chemistry, but correspond well to the actual decision rules that are applied to regulatory controls when traditional microbiological analyses based on bacteria growing on agar media are used. Additionally, these thresholds take into account all possible sources of uncertainty, for example, changes within the sample during handling and storing, dilution inaccuracies, sample heterogeneity, matrix effects, the physiological state of the bacteria, the ability of bacteria to grow and develop a colony, and many other factors, such as laboratory effects, not included here.

The proportion (β) of future results falling within the β-ETI was also set at two levels: 80% and 90%. For the alternative method at a given concentration, on average, β% of results lie between the limits of the β-expectation tolerance interval. If this tolerance interval is included within the limits of acceptability, the method is claimed to be valid.

Table 3 presents the validation criteria calculated at the three levels of contamination used in the trial for a β of 80% and a λ of ±0.3. Figure 2 combines the accuracy profiles of *E. coli* for a λ of ±0.3 or ±0.4 log_{10} unit/100 ml and a β equal to 80% or 90%, and several interesting conclusions can be drawn from these data.

For a β of 80% and a λ of ±0.4, the alternative method can be validated between 1.00 and 2.05 log_{10} units/100 ml (i.e., 10 and 112 CFU/100 ml).

For a β equal to 80% and a λ equal to ±0.3, the tolerance interval limits are contained within the acceptability limits only for low and medium contamination levels, and the alternative method cannot be validated over all studied domains.

The bias (the difference between the target value and the average result) is small but varies from 0.02 to 0.09 log_{10} unit/100 ml as the bacterial concentration increases. This may explain why the upper tolerance interval limit exceeds the acceptability limit for higher contamination levels.

When any β-ETI limit intersects any acceptability limit, the alternative method is determined not to be valid above or below the corresponding concentration. This point is marked by a vertical arrow in Fig. 2 and corresponds to a concentration of 1.96 log_{10} units/100 ml (about 92 CFU/100 ml). This value can be denoted the upper limit of quantification (ULOQ) of the alternative method.

With regard to the lower limit of quantification (LLOQ), the lowest level of the validation domain can be used, as it is not possible to extrapolate to values which were not actually included in the study in determining LOQs. However, it can be assumed that alternative-method quantification performance is superior to this limit.
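The concentration at which a tolerance limit crosses an acceptability limit can be located by interpolation between the two adjacent contamination levels. The sketch below assumes that the tolerance limits are joined by straight lines between levels, as they are usually drawn on an accuracy profile; the function name is hypothetical and the numbers in the example are invented, not taken from Table 3.

```python
def crossing_concentration(t1: float, u1: float, t2: float, u2: float,
                           lam: float) -> float:
    """Target value (log10) at which the straight line through
    (t1, u1) and (t2, u2) reaches the acceptability limit `lam`.

    t1, t2 : log10 target values of two adjacent levels
    u1, u2 : tolerance limit minus target at those levels (log10 units)
    """
    if u1 == u2:
        raise ValueError("tolerance limit is flat between the two levels")
    t = t1 + (lam - u1) * (t2 - t1) / (u2 - u1)
    if not min(t1, t2) <= t <= max(t1, t2):
        raise ValueError("no crossing between the two levels")
    return t


if __name__ == "__main__":
    # Invented example: upper limit of 0.25 at 1.8 log and 0.35 at 2.1 log.
    uloq_log = crossing_concentration(1.8, 0.25, 2.1, 0.35, lam=0.3)
    print(f"ULOQ = {uloq_log:.2f} log10 units/100 ml "
          f"(about {10 ** uloq_log:.0f} CFU/100 ml)")
```

The same function applies to a lower tolerance limit by passing the negative acceptability limit, which is how a lower limit of quantification inside the studied domain would be found if the profile crossed −λ.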

## DISCUSSION

The accuracy profile method has been applied to the validation of the Colilert-18/Quanti-Tray against the reference method ISO 9308-1 (15). Using a β of 80% and a λ equal to 0.4, the alternative method can be validated between 1.00 and 2.05 log_{10} units/100 ml, equivalent to 10 to 112 CFU/100 ml.

The bias which is shown in Fig. 2 may be largely due to the difference in the principles of enumeration between the reference method and the alternative method. The reference method is based upon the enumeration of bacterial colonies on an agar medium, while the alternative method is based upon the most probable number (MPN) approach, the results of which derive from a calculation.

It may be an interesting exercise to adjust the observed bias by applying a correction factor, although the values with this correction are no longer useful for method validation purposes. As the slope of the regression line between the reference and the alternative methods is 1.02, as illustrated in Fig. 1, a correction factor of 2% can be proposed. This correction factor was applied to all log-transformed data, and validation criteria were recalculated (Fig. 3). It can be seen that, with these revised data, the alternative method can be deemed valid over all studied domains when a β of 80% and a λ of ±0.3 are chosen. The ULOQ is then 2.05 log_{10} units/100 ml, but the LLOQ remains unchanged.