**DOI:** 10.1128/AEM.01724-09

## ABSTRACT

To assess interchangeability of estimates of bacterial abundance by different epifluorescence microscopy methods, total bacterial numbers (TBNs) determined by the most widely accepted protocols were statistically compared. Bacteria in a set of distinctive samples were stained with acridine orange (AO), 4′,6-diamidino-2-phenylindole (DAPI), and BacLight and enumerated by visual counting (VC) and supervised image analysis (IA). Model II regression and Bland-Altman analysis showed general agreement between the IA and VC methods, although IA counts tended to be lower than VC counts by 7% on a logarithmic scale. Distributions of cells and latex beads on polycarbonate filters were best fitted by negative binomial models rather than by Poisson or log-normal models. The fitted models revealed higher precision of TBNs by the IA method than by the VC method. In pairwise comparisons of the staining methods, TBNs by AO and BacLight staining showed good agreement with each other, but DAPI staining tended toward underestimation. Although the precisions of the three staining methods were comparable to one another (intraclass correlation coefficients, 0.97 to 0.98), the accuracy of the DAPI staining method was rebutted by the disproportionateness of TBNs between pairs of samples that carried 2-fold different volumes of identical cell suspensions. It was concluded that the TBN values estimated by AO and BacLight staining are relatively accurate and interchangeable for quantitative interpretation and that IA provides better precision than does VC. As a prudent measure, it is suggested to avoid use of DAPI staining for comparative studies investigating accuracy of novel cell-counting methods.

Bacterial abundance is an instrumental parameter in assessing the roles of bacteria in the environment (18, 27, 30, 45). While a variety of techniques are available (1, 30, 53, 60), staining bacterial cells with acridine orange (AO) (29) or 4′,6-diamidino-2-phenylindole (DAPI) (48) and counting them on black polycarbonate (PC) filters by epifluorescence microscopy has become the standard procedure for direct counting (9, 18, 30). The Live/Dead BacLight staining kit, which is widely accepted as a rapid measure of viability of individual cells, also provides a total count of bacteria (10). Currently, most studies reporting total bacterial numbers (TBNs) use one of the three staining methods described above. However, the basic question of which fluorochrome to use for a given sample still presents challenges, as comparative studies using two or more of these fluorochromes have often yielded conflicting results (10, 17, 20, 34, 37, 40, 49, 52, 54, 57, 58).

A more perplexing question is whether TBN values based on different fluorochromes are interchangeable for a quantitative interpretation incorporating TBN data from different methods. A large-scale intersystem study, an analysis of long-term collection of longitudinal data, or a collaborative study by multiple laboratories often requires an amalgamated use of TBN values from different fluorochromes. Apart from the interchangeability of fluorochromes, there is another complication at the step of cell enumeration. For example, TBN estimates by digital image analysis (IA) on microscope fields were often either slightly higher (3, 44) or significantly lower (25) than those found by visual counting (VC). With the introduction of various instrument-aided enumeration methods, including photomicrography IA (43, 55, 59), laser-scanning microscopy (8, 36), flow cytometry (2, 27, 34), and microfluidic devices (1, 53), TBN values are now reported based on various combinations of fluorochromes and enumeration methods. Considering the rapid advancement of novel enumeration technologies, establishing a robust “gold standard” method that can estimate bacterial abundance with high accuracy and precision is more in demand than ever.

However, the robust gold standard that can validate novel methods and calibrate different methods apparently does not exist yet, largely due to insufficient attention to random errors and biases involved with fluorochromes or enumeration methods (9, 30). In the studies reporting general agreement among TBN methods (22, 34, 41, 44, 53, 59), using correlation or ordinary linear regression as the only or major evidence of agreement appears to be a major analytical drawback. Since measurements under comparison are from the same quantity, i.e., the true value, intrinsic correlation is naturally expected. Therefore, analytical approaches based on correlation are biased toward finding an agreement (7), and hence, the strength of agreement cannot be objectively quantified. In cases reporting discrepancies between different TBN methods (17, 25, 35, 43, 48, 54, 57, 58), sources of biases were not identified due to the limitation of knowledge on the true abundance values or lack of estimation of precisions of methods. Error propagations of TBN methods were analyzed by several studies (13, 23, 32, 39) but have been limited to identification of sources of error for a specific method (35, 36), instead of comparing precisions and accuracies of commonly used TBN methods. Therefore, a comprehensive statistical study to reveal the intrinsic nature of the errors and biases of conventional TBN methods is necessary to establish the robust gold standard method for determining TBNs. In essence, the statistical study should compare different combinations of staining and enumeration methods that are used as the standard method for calibration of novel TBN methods or those that are most widely used for TBN estimation, either to establish a robust gold standard method for TBN estimation or to understand differences in TBN values reported in the literature.

In this study, we performed intensive analyses of the accuracies and precisions of the conventional TBN methods and determined agreements among their measurements. For the fluorochromes, the three most-used fluorochromes (AO, DAPI, and BacLight) were compared. For the enumeration methods, we employed VC, which is the traditional gold standard method for enumeration of bacteria, and a simple supervised IA method as a representative of methods that use photographic images from imaging instruments. In comparison to other novel instrument-aided enumeration methods, these two methods allow the objects being enumerated to be validated by human decision. Therefore, they had the best potential as a part of the gold standard for TBN estimation. In many studies, these methods were implicitly regarded as the gold standard method in estimation of bacterial abundance. We applied Bland-Altman analysis (5) to quantify differences between measurements, characterized intrinsic errors of count data by generalized linear models (64), and determined accuracies of methods based on the confidence interval (CI) of ratios of average cell counts by a generalized pivotal approach (15). Based on these statistical properties of the methods, we identified biases intrinsic to each method and addressed which methods are accurate and interchangeable.

## MATERIALS AND METHODS

Sample preparation. To compare statistical properties of bacterial cell counts by different direct methods, we employed a two-way factorial design comprising eight samples and three staining methods (Fig. 1). Four samples, samples A to D, were two different dilutions of two bacterial strains, *Lactobacillus* sp. strain IMSNU 10111 and *Escherichia coli* IMSNU 10085, both maintained in the Institute of Microbiology, Seoul National University, Seoul, South Korea. We selected them as representatives of the Gram-positive and the Gram-negative bacteria. The strains were cultivated in Lactobacilli MRS broth (Difco Laboratories, Detroit, MI) and EC broth (Difco), respectively, at 35°C for 24 h, to reach the stationary phase. For each culture (optical density of 0.7 to 1.0), two samples, i.e., samples A and B from the *Lactobacillus* sp. culture and samples C and D from the *E. coli* culture, were made by diluting 100 μl of each culture to a 1-ml aliquot with phosphate-buffered saline (PBS; pH 7). Samples E and F were natural water obtained from two different locations. Sample E was from Lake Soyang, Chuncheon, South Korea, an oligomesotrophic lake. Sample F was obtained from a highly eutrophic shallow pond located in Kangwon National University, Chuncheon, South Korea. Sample F was prepared by diluting 20 ml of the raw pond water with an equal volume of filter-sterilized pond water. Two kinds of biofilm samples from soil, samples G and H, were also prepared to reflect diverse forms of environmental samples. The source of sample G was underground sections of plant roots, which contained soil components as well as plant biomass. The source of sample H was leaf litter in a state of decomposition. Samples G and H were prepared by dispersing bacterial cells attached to the respective sources into water, by placing 5 g of each source material in a stomacher bag with 100 ml of PBS and treating the bags with a Stomacher 400 (Seward Medical, London, United Kingdom) at the maximum speed for 15 min. The debris in the suspensions was removed by centrifugation at 3,000 × *g* at room temperature for 15 min. Supernatants (40 ml) were collected as samples G and H. All samples were fixed by adding buffered formaldehyde to the final concentration of 2% and stored at 4°C for more than 3 h.

For each sample, a total of nine subsamples were subjected to staining and filtration to prepare triplicate filters stained with AO, DAPI, and BacLight, respectively. For samples A and C, subsamples were 50 μl of each fixed sample broth, while the subsamples of samples B and D received 100 μl of the fixed samples. Subsamples of samples E and F were 1-ml aliquots of the respective samples. For both samples G and H, 100 μl of each sample was applied as subsamples for staining and filtration. All subsamples with a <1-ml volume were diluted with PBS to 1-ml aliquots before staining.

For in-depth examination on distributions of cell counts per microscope field, we employed additional samples. *Salmonella enterica* subsp. *enterica* strain ATCC 43971, cultured in Luria-Bertani broth (Difco) at 35°C for 17 h, and *Bacillus subtilis* ATCC 6051 strain, cultured in MRS broth (Difco) at 35°C for 17 h, were fixed by buffered formaldehyde at the final concentration of 2% and subjected to staining and filtration at various serial dilutions with PBS. Fluorescent latex beads (catalog no. L1403 and L1153; Sigma-Aldrich, St. Louis, MO) of sizes 0.5 and 2.0 μm were also employed at various dilutions in distilled water or 0.05% aqueous Tween 80 solution (Junsei, Japan). Tween 80 was used as an additive to help dispersion of latex beads, which tend to form clumps in distilled water.

Staining and filtration procedures. For AO and DAPI staining, we followed the protocols most widely used according to Kepner and Pratt (30), with a modification in incubation time. To stain cells in subsamples by AO, 100 μl of 1 g liter^{−1} aqueous stock solution of AO (Merck, Whitehouse Station, NJ) was added per 1 ml of each subsample. The mixture was incubated for 3 min at room temperature until filtration (29). DAPI staining was performed by adding 100 μl of 10 mg liter^{−1} aqueous stock solution of DAPI (Sigma-Aldrich) per 1 ml of each subsample suspension. The mixture was incubated for 15 min at room temperature in the dark, until filtration (48). Since there were understaining and fading issues raised for DAPI staining, we extended the incubation time from the conventional duration of 5 min (30) to 15 min.
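The working stain concentrations implied by these volumes can be checked with simple dilution arithmetic. The sketch below is illustrative only: the helper name `final_concentration` is hypothetical, and it assumes that the stock and subsample volumes are additive (100 μl of stock plus 1 ml of subsample giving 1.1 ml).

```python
def final_concentration(stock_conc, stock_volume_ml, sample_volume_ml):
    """Concentration after mixing stock into a sample, assuming additive volumes."""
    total_volume_ml = stock_volume_ml + sample_volume_ml
    return stock_conc * stock_volume_ml / total_volume_ml


# AO: 100 ul of a 1 g/liter (1,000 mg/liter) stock per 1 ml of subsample.
ao_final_mg_per_liter = final_concentration(1000.0, 0.1, 1.0)

# DAPI: 100 ul of a 10 mg/liter stock per 1 ml of subsample.
dapi_final_mg_per_liter = final_concentration(10.0, 0.1, 1.0)

print(f"AO final concentration:   {ao_final_mg_per_liter:.1f} mg/liter")
print(f"DAPI final concentration: {dapi_final_mg_per_liter:.2f} mg/liter")
```

Under this assumption the working concentrations come out slightly below one-tenth of the stock concentrations, because the stain is added to, rather than made up to, the 1-ml subsample volume.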

The Live/Dead BacLight stain mixture, which contained equal volumes of SYTO-9 solution and propidium iodide solution provided in the BacLight viability kit (catalog no. L13152; Molecular Probes, Eugene, OR), was prepared according to the manufacturer's instructions. The stain mixture (1 ml) was added to 1 ml of each subsample suspension and incubated for 15 min at room temperature in the dark.

For filtration of all subsamples, we used a single filtration funnel device (XX1002500; Millipore, Billerica, MA). For subsamples with less than 2 ml of volume after staining, the volume was adjusted to 2 ml by adding PBS before filtration. A stained subsample was filtered onto a 0.2-μm-pore-size black PC filter (GTBP Isopore membrane filter, 25-mm diameter; Millipore, MA) placed on a supporting filter (cellulose nitrate filter, 25-mm diameter, 0.45-μm pore size; Whatman, Maidstone, United Kingdom). After the staining and filtration procedures, PC filters were mounted onto glass slides by overlaying them on a drop of mounting agent. For AO- or DAPI-stained subsamples, immersion oil (refractive index at 20°C of 1.515 to 1.517; C_{14}H_{12}O_{2}; Merck, Darmstadt, Germany) was used as the mounting agent, while the low-fluorescence mounting oil included in the BacLight kit was used for BacLight-stained subsamples. For comparison of cell counts when an antifading agent was used for DAPI staining, Citifluor (Ted Pella, Inc., Redding, CA) was used (see the supplemental material).

Enumeration by epifluorescence microscopy. For each filter, 12 microscope fields were randomly selected, and the identical 12 fields were counted by both VC and supervised IA. Each field was defined as the square outlining the 10-by-10 ocular grids (counting squares) installed in the Olympus BX-60 epifluorescence microscope equipped with an HBO 100W/2 mercury lamp. Observations were performed at ×1,000 magnification with a 100× UPlanApo objective lens (Olympus, Japan). Depending on the fluorochromes, appropriate dichromatic filter sets were selected to fit their excitation/emission spectra, i.e., a blue excitation filter set (U-MWB; Olympus) for AO and BacLight and a UV excitation filter set (U-MWU; Olympus) for DAPI. Immediately after focusing the microscope on a field, microphotography was performed with the aid of an Infinity 2 charge-coupled-device (CCD) camera (Lumenera, Ottawa, Canada), to obtain 1,392- by 1,040-pixel joint photographic coding experts group (JPEG) format image files, each pixel of which comprised red, green, and blue colors in 36 bits, resulting in 4,096 intensity values. On the same field, VC was performed immediately after the microphotography. The bacterial cells in each JPEG file were counted by a supervised IA process as follows: (i) bacterium-like objects of a saved image were calibrated as separate objects by changing the thresholds of contrast, brightness, and expected color settings; (ii) the objects were visually validated by sorting bacterium-like morphology versus other shapes; and (iii) validated objects were counted by the semiautomatic enumeration option of the IA software (i-Solution version 6.3; IMT-iSolution Inc., Vancouver, Canada), which allowed for manual marking of individual objects. To compare results of VC and IA for the same area of a microscope field, we enumerated the cells only in the region of the JPEG image that corresponds to the ocular grid area used in VC. The region was determined precisely to the pixel unit by photographing micrometers aligned to match the ocular grids. The ocular grid formed a 1,060- by 1,060-pixel area in the JPEG image, consistently throughout the study.

Statistical models. The statistical programming environment R version 2.8.0 (50) was used for all statistical analyses in this study. When a test of normality was required, the Shapiro-Wilk test was performed. For the test of homogeneity of variance, distributions of tested values were visually checked for homogeneity, and the significance of the Spearman rank correlation coefficient was used as the criterion for heterogeneity. All significances were evaluated at the type I error level of 5%.

Since there were combinations of three staining methods for eight samples, the experimental design was a two-way factorial design in which each combination was replicated by three subsamples (Fig. 1). The individual value of cell counts by supervised IA or VC was obtained as cell counts per microscope field, which can be regarded as split-plots nested under the two-way factorial whole-plot. The resulting values were modeled as equation 1:
$$mathtex$$\[x_{ijkl}=\mu_{x}+\alpha_{i}+\beta_{j}+(\alpha\beta)_{ij}+\gamma_{ijk}+\varepsilon_{ijkl}\]$$mathtex$$ (1)

where *i* is A, B, C, D, E, F, G, or H (samples); *j* is AO, BacLight, or DAPI (fluorochromes); *k* is 1, 2, or 3 (filters containing different subsamples); *l* is 1, 2, 3, …, or 12 (microscope fields); *x*_{ijkl} is a cell count by IA; *μ*_{x} is the grand mean of all *x*_{ijkl}'s; *α*_{i} is the main effect of samples; *β*_{j} is the main effect of fluorochromes; (*αβ*)_{ij} is the first-order interaction between samples and fluorochromes; *γ*_{ijk} is the random error due to subsampling; and *ε*_{ijkl} is the random error due to difference of microscope fields. Cell counts by the VC method (*y*_{ijkl}) were modeled with the same parameters by replacing *μ*_{x} with *μ*_{y}, the grand mean of all *y*_{ijkl}'s. We considered *β*_{j} as the fixed effect while treating the other factors as random contributions to *x*_{ijkl} or *y*_{ijkl}. In fitting these models or their derivatives, the generalized linear modeling approaches were used (64).

Analyses of agreement and precision. For agreement between a pair of methods, Bland-Altman analysis (5, 7), which is the most-cited standard method for visualization of instrumental agreements and discrepancies (51), was used. This method is based on the rationale that conventional correlation analysis is not appropriate for analysis of agreement and that differences should instead be analyzed in order to visualize bias of a new method from the gold standard method (7). Limits of agreement were determined by 95% CI of the mean of the differences of TBN estimates by a pair of methods (*d*). In Bland-Altman analysis, normality and homoscedasticity of random errors of *d* are assumed for parametric determination of the CI (6). To meet this assumption, log transformation was required. When the normality assumption was not met, the nonparametric approach that uses quantiles (6) was used. Since there were triplicate measurements for each method, the overall variance of *d* (*σ*_{d}^{2}) was estimated as equation 2, according to equation 5.3 of Bland and Altman (6).

$$mathtex$$\[\sigma_{\bar{d}}^{2}=s_{\bar{d}}^{2}+\frac{2}{3}s_{pw}^{2}+\frac{2}{3}s_{qw}^{2}\]$$mathtex$$ (2)

where $$mathtex$$\(s_{\bar{d}}^{2}\)$$mathtex$$ is the observed variance of the differences between within-subject means, and $$mathtex$$\(s_{pw}^{2}\)$$mathtex$$ and $$mathtex$$\(s_{qw}^{2}\)$$mathtex$$ are the observed within-subject variances from measurements by the same method, for methods *p* and *q*, respectively. $$mathtex$$\(s_{pw}^{2}\)$$mathtex$$ and $$mathtex$$\(s_{qw}^{2}\)$$mathtex$$ were estimated by mean squares in an analysis of variance (ANOVA) for each method (5).
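The variance decomposition of equation 2 can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study (which was run in R): the function names and the example values are hypothetical, the replicate number is assumed equal for both methods, and the limits of agreement are taken in the conventional Bland-Altman form of bias ± 1.96 standard deviations, which is an assumption about how the limits were computed.

```python
import math
from statistics import mean, variance


def within_subject_variance(replicates):
    """Pooled within-subject variance, i.e., the residual mean square of a one-way ANOVA."""
    ss_within = sum((v - mean(reps)) ** 2 for reps in replicates for v in reps)
    df_within = sum(len(reps) - 1 for reps in replicates)
    return ss_within / df_within


def limits_of_agreement(method_p, method_q, z=1.96):
    """Bias and limits of agreement for two methods with replicated measurements.

    method_p, method_q: one list of replicate (log-transformed) values per subject.
    With m replicates per subject, the factor (1 - 1/m) equals 2/3 for triplicates.
    """
    m = len(method_p[0])  # replicates per subject (assumed equal for both methods)
    diffs = [mean(p) - mean(q) for p, q in zip(method_p, method_q)]
    bias = mean(diffs)
    s2_dbar = variance(diffs)                      # variance of differences of means
    s2_pw = within_subject_variance(method_p)      # within-subject variance, method p
    s2_qw = within_subject_variance(method_q)      # within-subject variance, method q
    sigma2 = s2_dbar + (1 - 1 / m) * s2_pw + (1 - 1 / m) * s2_qw
    sd = math.sqrt(sigma2)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical ln-transformed filter means: four samples, triplicate filters per method.
ln_counts_p = [[3.9, 4.0, 4.1], [4.6, 4.7, 4.5], [2.9, 3.1, 3.0], [5.2, 5.1, 5.3]]
ln_counts_q = [[3.8, 4.0, 3.9], [4.5, 4.6, 4.6], [3.0, 3.2, 3.1], [5.0, 5.1, 5.2]]
bias, lower, upper = limits_of_agreement(ln_counts_p, ln_counts_q)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```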

For the functional relationship between estimates of a pair of methods, the major axis of model II regression was used, since both estimates were random variables (56). For comparison of precisions by different methods, ANOVA was performed, and the intraclass correlation coefficient was calculated as the index of contribution of random measurement errors to the total variance (56). To calculate variances for each fluorochrome, separate models were employed for each fluorochrome, e.g., $$mathtex$$\(x_{ikl}=\mu_{x}+\alpha_{i}+\gamma_{ik}+\varepsilon_{ikl}\)$$mathtex$$. Since a cell count in a sample or a subsample is conventionally represented as the filterwise mean value of cell counts per microscope field, the random error due to field-to-field variation (*ε*_{ijkl}) can be dropped from the model, resulting in a simpler ANOVA model. For this purpose, ANOVA was performed on the mean value of cell counts per filter (x̄_{ik.}) for each fluorochrome, with the model x̄_{ik.} = *μ*_{x} + *α*_{i} + *γ*_{ik}, where the period represents all 12 *l* values.
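For the simplified model of filterwise means, the intraclass correlation coefficient follows from the two mean squares of a balanced one-way ANOVA. The sketch below assumes the usual random-effects definition (among-sample variance divided by total variance); the function name and the example values are hypothetical and are not data from the study.

```python
from statistics import mean


def intraclass_correlation(groups):
    """Intraclass correlation from a balanced one-way random-effects ANOVA.

    groups: one list per sample, each holding the replicate filter means
    (e.g., ln-transformed counts per field); all lists have the same length.
    """
    n = len(groups[0])   # replicate filters per sample
    a = len(groups)      # number of samples
    grand_mean = mean(v for g in groups for v in g)
    ms_among = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (a - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (a * (n - 1))
    var_among = (ms_among - ms_within) / n   # variance component of samples
    return var_among / (var_among + ms_within)


# Hypothetical ln-transformed filter means for three samples, triplicate filters each.
example = [[3.9, 4.0, 4.1], [5.2, 5.1, 5.3], [2.9, 3.1, 3.0]]
print(f"intraclass correlation = {intraclass_correlation(example):.3f}")
```

A value close to 1 indicates that the subsampling error is small relative to the differences among samples.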

Analysis of accuracy. To assess accuracies of TBN methods, proportionateness of TBN estimates between a pair of samples with a known difference of cell densities was tested. Samples with such properties were the pair of samples A and B and the pair of samples C and D. The pairs of samples received identical bacterial populations and differed only in the volume of bacterial cultures employed. Practically, accuracy was tested by calculating the ratio of the mean of sample B to the mean of sample A (see Discussion for more descriptions on this rationale), since we found that the TBN estimates per filter (x̄_{ijk.}) had log-normal distributions. The test examined whether the CI of the ratio of log-normal means included the volume ratio value 2. To build the CI of the ratio, we used generalized CIs of the log-normal mean ratio by a generalized pivotal approach (15, 33). According to the theory of the method, the ratio of means (denoted as *R*) of the two log-normal (Log-N) populations of TBNs is as follows: $$mathtex$$\(R=m_{1}/m_{2}=\exp(\mu_{1}+\sigma_{1}^{2}/2-\mu_{2}-\sigma_{2}^{2}/2)\)$$mathtex$$, where *m*_{1} and *m*_{2} are the means of the TBN estimates *Y*_{1} ∼ Log-N(*μ*_{1}, *σ*_{1}^{2}) and *Y*_{2} ∼ Log-N(*μ*_{2}, *σ*_{2}^{2}), and the generalized pivotal quantity for ln(*m*_{1}/*m*_{2}) is *T*_{R} = *T*_{1} − *T*_{2}, where *T*_{i} is the generalized pivotal quantity for ln(*m*_{i}), with *i* being 1 or 2. *T*_{i} was calculated from the three realized values, namely, the observed sample mean (ȳ_{i}), the observed sample variance (*s*_{i}^{2}), and the sample size (*N*_{i}), and the two independent random parameters were calculated with known distributions, namely, *Z*_{i} ∼ *N*(0, 1) and $$mathtex$$\(Q_{i}\sim\chi_{N_{i}-1}^{2}\)$$mathtex$$, as equation 3.

$$mathtex$$\[T_{i}=\bar{y}_{i}-\frac{Z_{i}}{Q_{i}/\sqrt{N_{i}-1}}\frac{s_{i}}{\sqrt{N_{i}}}+\frac{1}{2}\frac{s_{i}^{2}}{Q_{i}/\sqrt{N_{i}-1}}\]$$mathtex$$ (3)

We determined the CI of the ratio of log-normal means by the 2.5% quantile and 97.5% quantile values of *T*_{R} values, simulated 10,000 times by generating random values of *Z*_{i} and *Q*_{i} from the given distributions.

## RESULTS

Agreement between IA and VC. From the experimental design of this study (Fig. 1; equation 1), each of the same 864 microscope fields was counted by both the IA and VC methods, yielding *x*_{ijkl} and *y*_{ijkl}, respectively. In a pairwise comparison of *x*_{ijkl} and *y*_{ijkl}, a significant correlation was found (Fig. 2a), as expected. Bivariate variance by *x*_{ijkl} and *y*_{ijkl} appeared to increase with cell counts, i.e., the crowdedness of a microscope field, and became relatively homogeneous by transforming *x*_{ijkl} and *y*_{ijkl} by logarithm (Fig. 2b). Model II regression between the log-transformed values generated the major axis with the slope 1.07 (95% CI = 1.05 to 1.09; *P* < 0.001; permutation test; *n* = 999), indicating a slight but significant bias from the slope value 1 (Fig. 2b). Among the 864 fields, 70% of fields showed VC values that were higher than IA counts, while 29% showed IA counts that were higher than VC values. The bias toward higher VC values seemed to be caused by more-frequent, low-VC-value outliers clustered in the zone where the regression predicted VC values that are lower than IA counts. This zone was determined as ln(*x*_{ijkl}) ≤ 3.8 (corresponding to *x*_{ijkl} ≤ 44 cells per microscope field) based on the point where the regression line intercepts the line of identity (the dashed line in Fig. 2b).

Bland-Altman analysis on the log-transformed variables visualized the magnitude and variability of differences between the two counting methods (Fig. 2c). When the expressions ln(*y*_{ijkl}) − ln(*x*_{ijkl}) and [ln(*y*_{ijkl}) + ln(*x*_{ijkl})]/2 were denoted as *d*_{ijkl} and *m*_{ijkl}, respectively, the variance of *d*_{ijkl} also differed, with *m*_{ijkl} = 3.8 as the border line. The high variance zone of *m*_{ijkl} ≤ 3.8, comprising 157 fields out of 864 fields, was mostly populated with *d*_{ijkl} from DAPI staining. For both *m*_{ijkl} ≤ 3.8 and *m*_{ijkl} > 3.8 zones, *d*_{ijkl} values were skewed to the negative values, indicating that the majority of outliers were due to very low *y*_{ijkl} in comparison to the *x*_{ijkl} of the same field. The overall mean of differences, which was designated the bias by Bland and Altman (7), was −0.06 for the *m*_{ijkl} ≤ 3.8 zone and 0.07 for the *m*_{ijkl} > 3.8 zone. These numbers indicated that log TBN by the VC method was 107% of that by the IA method for *m*_{ijkl} > 3.8 and 94% for *m*_{ijkl} ≤ 3.8. Calculated for all 864 fields, the log TBN by the VC method was 104% of that of the IA method. Therefore, based on the results of model II regression and Bland-Altman analysis, visual counts were concluded to be significantly higher than IA counts in most cases.

Parameterization of distribution of cells on PC filters. To compare precisions by VC and IA methods and to understand intrinsic features of the random error *ε*_{ijkl} in equation 1, the distribution of cells on each PC filter was parameterized by fitting the cell counts from 12 microscope fields of a filter to typical count data models. Purely random scattering of particles in a given area is typically modeled as a Poisson distribution, which is a single-parameter model with the variance identical to the mean (19). When *x*_{ijk.} and *y*_{ijk.} were modeled by Poisson distributions, 63% of *x*_{ijk.} and 79% of *y*_{ijk.} of 72 filters did not pass the goodness-of-fit test at a significance level of 5%. When an overdispersion test (12) was performed to evaluate the suitability of two-parameter models, the distribution of cell counts per field in 17 filters showed significantly better fitness to negative binomial (NB) models (*P* < 0.05; likelihood ratio test). Fitness to NB models was acceptable for all *x*_{ijk.} (0.25 ≤ *P* ≤ 0.99; goodness-of-fit test) and *y*_{ijk.} (0.28 ≤ *P* ≤ 0.94). The fitness was also confirmed visually by comparing the observed densities and expected densities of the counts. The two filters that produced the lowest probability of fitness to the NB distribution are shown in Fig. 3a and b as examples.

To test whether NB distribution is unique to the samples and the enumeration procedures employed in this study or general to most conventional schemes for determining TBNs employing PC filters, we enumerated 100 microscope fields of each PC filter prepared with pure cultures of other bacteria and latex beads. For *Salmonella enterica* and *Bacillus subtilis*, the frequency of cell counts per field fitted well to NB models (*P* = 0.47 and 0.38, respectively), while Poisson models deviated from the observed frequencies with high significance (*P* < 0.0001) (Fig. 3c and d). Latex beads with diameters of 0.5 μm also showed distributions on the filters with NB models (Fig. 3e and f). Latex beads with diameters of 2 μm formed clusters of various numbers of beads in distilled water, and the distribution was not suitable to be described with an NB model (Fig. 3g). When 0.05% Tween 80 was applied, beads did not form clusters, and their distribution fitted to an NB model (Fig. 3h). Therefore, NB distribution appears to be an intrinsic feature in the distribution of cells on PC filters. For further analyses, we estimated parameters of the distribution of cell counts on each of all 72 PC filters.
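As a quick way to see why a single-parameter Poisson model can fail for field counts, the sketch below compares the sample variance with the sample mean and derives a method-of-moments estimate of the NB size (dispersion) parameter. It is a simplified stand-in for the likelihood-based fits and tests reported here, not a reproduction of them; the function name and the 12 field counts are hypothetical.

```python
from statistics import mean, variance


def nb_moment_fit(counts):
    """Method-of-moments summary of per-field counts.

    For a Poisson model the variance equals the mean (dispersion index near 1).
    For an NB model with mean mu and size k, variance = mu + mu**2 / k, so
    k can be estimated as mu**2 / (variance - mu) when the counts are overdispersed.
    """
    m = mean(counts)
    v = variance(counts)
    size = m * m / (v - m) if v > m else float("inf")  # inf: no overdispersion detected
    return {"mean": m, "variance": v, "dispersion_index": v / m, "size": size}


# Hypothetical cell counts from the 12 microscope fields of one filter.
field_counts = [31, 45, 28, 52, 39, 61, 24, 47, 35, 58, 42, 30]
fit = nb_moment_fit(field_counts)
print(f"mean = {fit['mean']:.1f}, variance = {fit['variance']:.1f}")
print(f"dispersion index = {fit['dispersion_index']:.2f}, NB size = {fit['size']:.1f}")
```

A dispersion index well above 1, as in this made-up example, is the pattern that favors an NB model over a Poisson model.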

Precisions of IA and VC. Based on NB models fitted to *x*_{ijk.} and *y*_{ijk.}, variances of cell counts in each filter were estimated and compared (Fig. 4a). In general, VC values had significantly higher variance (and hence standard deviation [SD]) than did IA counts (*P* < 0.01; one-tailed Wilcoxon signed-rank test on paired samples), placing 56 filters out of 72 filters above the line of identity in Fig. 4a. This observation suggested that the severe outliers with negative values among the *d*_{ijkl} values shown in Fig. 2c were mostly caused by the high variability in VC values rather than in IA counts. When the skewness of *x*_{ijk.} and *y*_{ijk.} was estimated from the NB models, VC values showed significantly more skewness (*P* < 0.01; one-tailed Wilcoxon signed-rank test on paired samples), placing 50 filters out of 72 filters above the line of identity in Fig. 4b. Therefore, it was concluded that the VC method is less precise than the IA method.
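Once an NB model is fitted, its variance and skewness follow directly from the two parameters. The sketch below assumes the mean and size parameterization (as in the `mu` and `size` arguments of R's `dnbinom`); the text does not state which parameterization was used, so this choice and the function names are assumptions.

```python
import math


def nb_variance(mu, size):
    """Variance of an NB distribution with mean mu and size (dispersion) parameter."""
    return mu + mu * mu / size


def nb_skewness(mu, size):
    """Skewness of the same NB distribution; it approaches the Poisson value
    1 / sqrt(mu) as size grows large."""
    return (size + 2 * mu) / math.sqrt(size * mu * (size + mu))


# Hypothetical fitted parameters for one filter counted by two methods.
for label, mu, size in [("method 1", 40.0, 30.0), ("method 2", 42.0, 12.0)]:
    sd = math.sqrt(nb_variance(mu, size))
    print(f"{label}: SD = {sd:.2f}, skewness = {nb_skewness(mu, size):.3f}")
```

For a similar mean, a smaller size parameter gives both a larger variance and a larger skewness, which is the direction of the VC-versus-IA difference described above.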

Agreement among staining methods. The filterwise mean of cell counts per field (x̄_{ijk.} or ȳ_{ijk.}) for each PC filter was determined by the NB model fitted to cell counts of the 12 fields. The ANOVA models with the log-transformed values of x̄_{ijk.} or ȳ_{ijk.} met the normality and homoscedasticity assumptions, while various transformations of *x*_{ijkl} and *y*_{ijkl} did not yield a normal model for *x*_{ijkl} and *y*_{ijkl}, largely due to the fact that the random error *ε*_{ijkl} in equation 1 followed an NB distribution. This result implied that *γ*_{ijk}, the random error due to subsampling estimated from the triplicate subsamples, could be analyzed as log-normal distributions.

Comparisons of ln(x̄_{ijk.}) or ln(ȳ_{ijk.}) values by pairs of AO versus BacLight, AO versus DAPI, and BacLight versus DAPI were performed by model II regression, Bland-Altman analysis, and ANOVA. In model II regressions, the AO-versus-BacLight pair yielded significant slope values (*P* < 0.05; *n* = 999) that were not significantly different from 1 (Fig. 5a). The slopes were 1.17 (95% CI = 0.45 to 3.63) for the IA method and 1.17 (95% CI = 0.41 to 4.36) for the VC method. In comparisons of DAPI versus BacLight and DAPI versus AO, the slopes were insignificant (*P* > 0.05), and the 95% CI did not include 1 (Fig. 5b).
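The major axis used in model II regression treats both variables as subject to error, so its slope is computed from the sums of squares and cross products of both variables. The sketch below shows only the point estimate; the permutation test and the CI of the slope are not reproduced, and the function name and data are hypothetical.

```python
import math
from statistics import mean


def major_axis(x, y):
    """Slope and intercept of the major axis (model II regression).

    The slope is the direction of the first principal axis of the bivariate
    scatter; the formula is undefined when the cross product sum is zero.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


# Hypothetical ln-transformed filter means by two staining methods.
ln_method_1 = [3.1, 3.9, 4.4, 5.0, 5.6, 6.2]
ln_method_2 = [3.0, 4.1, 4.3, 5.2, 5.5, 6.4]
slope, intercept = major_axis(ln_method_1, ln_method_2)
print(f"major axis: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Unlike ordinary least squares, swapping the two variables gives exactly the reciprocal slope, which is why the method suits comparisons in which neither measurement is the reference.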

In Bland-Altman analysis of ln(x̄_{ijk.}) values, the bias estimate from the AO-versus-BacLight comparison was merely −0.05 (the solid red line in Fig. 5c), and this value was >10 times less than those from the comparisons of DAPI versus the other fluorochromes (the solid blue/green lines in Fig. 5c). In comparisons of the limits of agreement, it was peculiar that only the lower limits of agreement differed considerably, largely due to values from samples E and H (Fig. 5c). These observations were also applicable to Bland-Altman analysis on ln(ȳ_{ijk.}) values. By reverse transforming the bias estimate of −0.52 between DAPI and AO, DAPI counts were on average about 60% of the AO counts. Therefore, DAPI was concluded to underestimate cell counts by about 40% in comparison to the other two methods.

As suggested from agreements between the results of AO and BacLight methods, difference in fluorochrome was not a significant factor (*P* = 0.22) when *j* ≠ DAPI in the ANOVA with the model of ln(x̄_{ijk.}) = *μ*_{x} + *α*_{i} + *β*_{j} + (*αβ*)_{ij} + *γ*_{ijk}, although it was highly significant when DAPI was included (*P* < 0.01). Interestingly, the interaction between fluorochrome and samples, i.e., the interaction term (*αβ*)_{ij}, was still highly significant even when DAPI was excluded from the model (*P* < 0.001), indicating that the relationship between AO and BacLight depends on the sample. In a one-tailed *t* test with individual samples, sample F showed AO counts that were significantly higher than BacLight counts (*P* < 0.05), while samples G and H showed significantly higher BacLight counts (*P* < 0.05). These results implied that the general agreement between AO and BacLight methods can be violated for certain types of samples.

The interchangeability between AO counts and BacLight counts can be estimated with the bias and limits of agreement by Bland-Altman analysis. Calculated with IA counts of all eight samples, the ratios of AO counts to BacLight counts had a mean of 95%, with a CI of 38% and 239%. The bivariate distributions of cell counts shown in Fig. 5a indicated that sample G had the largest deviation from the agreement between the AO and BacLight methods. When sample G was excluded in calculations of bias and limits of agreement between the AO and BacLight methods, the bias was 0.1, with 95% limits of agreement of −0.4 and 0.6 by the IA method (Fig. 5c), which implied that the ratio of AO counts to BacLight counts was 110%, with a CI of 66% and 185%. Calculated with VC values, biases and limits of agreement were similar to those with IA values (Fig. 5c).
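Because the Bland-Altman analysis was run on natural-log-transformed counts, a difference of logs back-transforms to a ratio of counts. The sketch below applies this to the rounded values quoted in the text; the function name is hypothetical, and because the inputs are rounded to one or two decimals, the percentages it prints only approximate those reported from the unrounded estimates.

```python
import math


def log_difference_to_ratio_percent(log_difference):
    """Convert a difference of natural logs into a ratio expressed in percent."""
    return 100.0 * math.exp(log_difference)


# Rounded values quoted in the text for AO versus BacLight (IA, sample G excluded).
bias, lower_limit, upper_limit = 0.1, -0.4, 0.6
print(f"ratio at the bias:        {log_difference_to_ratio_percent(bias):.0f}%")
print(f"ratio at the lower limit: {log_difference_to_ratio_percent(lower_limit):.0f}%")
print(f"ratio at the upper limit: {log_difference_to_ratio_percent(upper_limit):.0f}%")

# Bias between DAPI and AO quoted in the text.
print(f"DAPI relative to AO:      {log_difference_to_ratio_percent(-0.52):.0f}%")
```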

Precisions of staining methods. The precisions of TBN counts by the three staining methods were compared by decomposing variances by their sources and calculating intraclass correlation coefficients for each combination of staining method and counting method, for both $\ln(\bar{x}_{ijk\cdot})$ and $\ln(\bar{y}_{ijk\cdot})$. As expected, residual variances were very small in comparison to TBN differences among samples, producing intraclass correlation coefficients ranging from 0.97 to 0.98 (Table 1). Therefore, it was concluded that the repeatability of TBN estimation under the influence of subsampling was comparable among the three staining methods.

Accuracy of counting methods and staining methods. The estimated ratios between log-normal means of samples A to D, and their CIs, indicated that the AO method had the expected ratios of ∼2 in both *E. coli* and *Lactobacillus* sp. samples (Table 2). BacLight also produced the expected ratio of 2 for *Lactobacillus* samples, but a marginal level of bias was noted for *E. coli* because it produced an estimated ratio of 1.7, with the upper limit of the CI close to 2. DAPI showed significant inaccuracy for both strains with both enumeration methods. The biases that occurred in DAPI counts were inconsistent, showing underestimation of the ratio for *Lactobacillus* and overestimation for *E. coli*. From Fig. 5a, the causes of under- or overestimation of the ratio were identified as underestimation of cell counts for samples B and C. Because the physiological conditions of cells in those samples were identical to those in samples A and D, respectively, the cause of underestimation in samples B and C by DAPI staining was not related to properties of cells influencing DAPI staining (57).

## DISCUSSION

NB distribution of cell counts per field. For randomly distributed particles, a Poisson distribution of count values is expected. In the variance analysis by Kirchman et al. (32), counts per field showed significant deviation from the Poisson distribution due to overdispersion in one out of three water samples and violation of the assumptions of ANOVA in both native and log-transformed data sets. In a similar approach with bacterial cells in marine sediments, the deviation was explained by log-normal distributions (23, 39). Lisle et al. (36) also found that areal distributions of fluorescent beads and ChemChrome-stained *E. coli* O157:H7 cells on PC filters were adequately described after log transformation, with a slight difference in model parameters between the beads and the bacterial cells. Large virus- or bacteria-sized (0.2 μm) fluorescent microspheres in distilled water or artificial groundwater were distributed either normally or log normally on 0.1-μm-pore-size PC filters (13). In the enumeration of fecal bacterial flora described by Thiel and Blaut (59), both IA and VC counts per field were normally distributed. Therefore, prior studies suggested that the areal distribution of cell counts on PC filters can be approximated by normal, Poisson, or log-normal distributions. However, the results of this study demonstrated that the frequency distribution of cell counts per field of PC filters follows NB models consistently among all kinds of samples we examined, i.e., pure cultures of Gram-negative and Gram-positive bacteria, water samples, biofilm samples, and latex beads. While the log-normal distribution was most widely accepted in prior studies, we found that only 15 out of 78 filters examined in this study fitted to log-normal distributions with significance (Shapiro-Wilk's test; *P* < 0.05). This result implies that the NB distribution is the general and adequate model for field-to-field variations of cell counts and that Poisson or log-normal models are working approximations of the intrinsic NB distributions. By varying filtration conditions, we could demonstrate that Poisson, normal, and log-normal distributions can sometimes be obtained, while the NB distribution could be fitted to most cell count distributions (see the supplemental material).
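A quick way to see whether counts per field are overdispersed relative to a Poisson model is to compare the sample variance with the sample mean. The sketch below uses a method-of-moments estimate of the NB size parameter, which is a simpler stand-in for the model fitting reported in this study, not a reproduction of it. The function name `dispersion_summary` and the counts are hypothetical.

```python
import statistics


def dispersion_summary(counts_per_field):
    """Mean, variance, index of dispersion, and moment estimate of NB size k.

    For a Poisson model the index of dispersion (variance/mean) is about 1.
    For an NB model, variance = mean + mean**2 / k, so k can be estimated as
    mean**2 / (variance - mean) when the variance exceeds the mean.
    """
    m = statistics.mean(counts_per_field)
    v = statistics.variance(counts_per_field)  # sample variance (n - 1)
    index = v / m
    k = m * m / (v - m) if v > m else float("inf")  # inf: no overdispersion
    return m, v, index, k


# Hypothetical counts from 10 microscope fields of one filter
fields = [12, 30, 7, 45, 22, 18, 60, 9, 27, 35]
mean, var, index, k = dispersion_summary(fields)
print(f"mean={mean:.1f} variance={var:.1f} index={index:.1f} k={k:.2f}")
```

A small estimated *k* indicates strong overdispersion, whereas *k* approaching infinity recovers the Poisson case. A moment estimate is only a screening tool, and a likelihood-based fit would be needed for formal model comparison.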

NB distributions are often found among bacterial counts at various scales, including repeated measurements and spatial or temporal distributions of specific bacteria (14, 19, 26, 47), which correspond to distributions of $\alpha_i$ or $\gamma_{ijk}$ in the model of equation 1. A published case of NB distribution of bacterial cells at the microscope scale, i.e., the scale of $\varepsilon_{ijkl}$ in equation 1, can be found in counts of ingested bacteria within nanoflagellate cells (11). The authors of the latter study interpreted the mechanism of the NB distribution as the summation of different grazing rates of subpopulations of the protozoa. On PC filters, differences in the volumes of water sample that passed through the pores of a microscope field can be analogous to this mechanism. Uneven distribution of pores and uneven flow rates due to the architecture of the filtration apparatus can cause the volumetric differences. Clumping of bacterial cells was suggested as another mechanism of NB distribution (26). While the clumps are randomly distributed according to a Poisson model, a logarithmic distribution of the number of bacteria per clump generates NB distributions in theory. In our experiment with latex beads in distilled water without detergent (Fig. 3g), clumps of various numbers of beads were collected on the PC filter. When we examined the number of beads per clump, it fitted an exponential distribution. Therefore, clump formation seems unlikely to be the mechanism of NB distribution of bacterial cells on PC filters.

The NB distribution can be highly skewed when the dispersion parameter has a large value. This property implies that the confidence limits of the estimated mean are asymmetric, with the upper limit tending to be more sensitive to the level of dispersion than the lower limit. Therefore, the NB distribution being intrinsic to the within-filter distribution of cells has a direct bearing on how we can interpret the count data. The mean value from a single filter can be sensitive to the dispersion of bacterial cells on a PC filter as well as to the true bacterial abundance.
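Two textbook properties of the NB model lie behind the preceding paragraphs and can be written out explicitly. The notation below is assumed for illustration and is not taken from equation 1: $m$ is the mean count per field, $k$ the NB size parameter, $a = 1/k$ the dispersion, $\lambda$ the mean number of clumps per field, and $\theta$ the parameter of the logarithmic clump-size distribution. $G(s)$ denotes a probability-generating function.

```latex
% Clumps per field ~ Poisson(\lambda); cells per clump ~ logarithmic(\theta), 0 < \theta < 1
G_{\mathrm{clump}}(s) = \frac{\ln(1-\theta s)}{\ln(1-\theta)}, \qquad
G_{\mathrm{field}}(s) = \exp\bigl\{\lambda\,[G_{\mathrm{clump}}(s)-1]\bigr\}
                      = \left(\frac{1-\theta}{1-\theta s}\right)^{k},
\qquad k = -\frac{\lambda}{\ln(1-\theta)}

% NB with mean m and dispersion a = 1/k
\operatorname{Var}(X) = m + a\,m^{2}, \qquad
\operatorname{skewness}(X) = \frac{1 + 2am}{\sqrt{m\,(1+am)}}
```

The first line is the generating function of an NB distribution, which is why Poisson-distributed clumps with logarithmic clump sizes yield NB counts. In the second line, the skewness reduces to the Poisson value $1/\sqrt{m}$ as $a \to 0$ and approaches $2\sqrt{a}$ for large $a$, so under this parameterization a strongly dispersed filter stays right-skewed no matter how many cells are counted per field.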

Differences between counts by VC and IA. When the two enumeration methods were compared, discrepancies were often noted. Grivet et al. (25) used the two methods for quantitative assessment of bacterial adhesion to hard surfaces and found that IA counts were ∼10% lower than visual counts. Zhou et al. (63) noted that threshold-based IA methods tended to underestimate cell counts. In the present study, a 7% underestimation of log TBN by IA was confirmed by model II regression and Bland-Altman analysis.

Clumping of cells was suggested as one of the reasons for the underestimation by IA. Differentiating individual cells in small clumps may be difficult for IA, which works with only two-dimensional silhouettes (55). Clumps also cause a masking effect due to high fluorescence (28). Daims and Wagner (16) demonstrated that microcolony-forming bacteria are often packed so tightly *in situ* that several adjacent cells or whole-cell clusters are detected as one single object by IA software. However, in this study, pure cultures of bacteria (samples A to D) also produced underestimation by IA, indicating that the underestimation by IA is not limited to certain kinds of samples but is intrinsic to all samples.

The presence of weakly stained, presumably inactive or dead cells may also cause underestimation in IA because the sensitivity of CCD cameras is lower than that of the human eye. In this case, the difference between IA and VC counts may vary among samples, depending on their content of weakly stained cells. Another reason for the difference can be overestimation of cell counts in VC by overcounting, i.e., counting a cell more than once. In IA, a CCD camera recorded the image, and counting was performed with the aid of software that tracks counted cells. During VC, the tracking depended on the observer's subjective decision. Hence, the VC method was prone to overcounting errors. Finally, the difference in the number of focal depths between the IA and VC methods can be a source of differences between the two methods. Since the IA method takes a snapshot picture of the microscope field, only one focal depth is used for enumeration, unlike human eyes, which can use several focal planes during observation.

In any case, the results of our study indicated that mean values of log TBN by IA needed to be corrected by multiplying by 1.07 to match those of VC. However, one needs to be cautious about the risk of overcounting when VC is considered the gold standard method. The precision of the IA method appears to be superior to that of the VC method because of less field-to-field variation, which is largely due to the short duration of exposure to excitation light. An IA method equipped with a highly sensitive camera and capable of counting multiple or deep focal planes may function better as the ideal gold standard.
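Because the factor of 1.07 applies to log TBN, its effect on the count itself is a power law rather than a fixed percentage. The sketch below illustrates this under stated assumptions: the relationship is taken as exactly log(VC) = 1.07 × log(IA) with no intercept, the count is expressed in the same units as those used in the regression, and the input value of 10^6 is hypothetical. The function name `ia_to_vc_scale` is illustrative only.

```python
import math


def ia_to_vc_scale(tbn_ia, factor=1.07):
    """Rescale an IA count so that its logarithm is multiplied by `factor`.

    Equivalent to tbn_ia ** factor, whatever the base of the logarithm.
    """
    return math.exp(factor * math.log(tbn_ia))


# Hypothetical IA estimate of 1e6 cells (units as in the regression)
print(f"{ia_to_vc_scale(1e6):.2e}")  # → 2.63e+06
```

Under these assumptions, a 7% difference on the logarithmic scale corresponds to a much larger difference on the linear scale, and the size of that difference depends on the magnitude and units of the count.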

Since there are various CCD instruments and software packages, our observations on the precision and the bias of IA relative to VC might not be generalizable to all kinds of IA. The bias might not be as severe as that observed in this study when a highly sensitive CCD camera is used. The results of this study, however, point to a tendency toward underestimation by IA, since most CCD cameras in use for determining TBN are not as sensitive as the human eye. The better precision of IA observed in this study may be generally applicable because of the shorter exposure time and the validation procedures of IA, which prevent multiple counting of the same cell.

Estimation of accuracy. In the statistical sense, accuracy of measurement is the closeness of a measured value to its true value (56). In estimating the accuracy of bacterial counts by microscopic methods, obtaining the true value is a challenging task. Chae et al. (13) regarded the true value as the best available measure of true concentration, and this concept led them to seek the counts obtained under the most favorable conditions as the true estimates of particle abundance under the microscope. In practicing the concept, the authors used the greatest counts with the highest particle density per field as the true particle counts per field. In the study by Lisle et al. (36), the true values were nominally determined by preparing samples with dilutions from a single high-density stock, the cell abundance of which was enumerated with an interlaboratory standardized growth curve of a bacterial strain. Because the estimation of abundance in the stock and the dilution operations are subject to random errors, we believe that Lisle et al.'s approach is acceptable only when the accuracy is interpreted on a relative scale. Chae et al.'s approach is likely to underestimate accuracy because the highest particle density may be an instance of overestimation or close to the upper limit of possible random variation, rather than the mean.

In comparing the accuracies of the three staining methods used in this study, we addressed the lack of information on true particle density by preparing filters with identical samples in two different volumes. Our definition of accuracy was the conformity of the ratio of cell counts to the expected ratio of sample volumes filtered on PC filters (Table 2). This approach eliminated the need for knowledge of the true density of a sample or a culture stock. If the ratio of cell counts is significantly different from the ratio expected from the volumes of the samples applied to subsample filters, we can conclude that the enumeration method has a significant bias. Although the conformity of the cell count ratio to the volume ratio is not sufficient as evidence for the absolute absence of bias, it is a necessary condition for an accurate enumeration method. Therefore, a significant deviation of the observed ratio from the expected volume ratio can be regarded as evidence for inaccuracy of a TBN determination method. We tested the significance of the deviation of the observed ratio from the expected volume ratio by determining the CI of the observed ratio. Since the observed data were log normal, the difference in log-transformed cell counts provided the log ratio of means. By adopting the recently introduced generalized pivotal value simulation (15, 33), a generalized CI for the ratio of log-normal means could be constructed, and the CI was examined for inclusion of the expected volume ratio of 2.
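The ratio test described above can be sketched with a generalized pivotal simulation for the means of two log-normal samples. The formulation below is one common version from the statistical literature and may differ in detail from the procedure applied in this study. The function names, the number of draws, the seed, and the example counts are all assumptions made for illustration.

```python
import math
import random
import statistics


def log_summary(counts):
    """Sample size, mean, and variance of the log-transformed counts."""
    logs = [math.log(x) for x in counts]
    return len(logs), statistics.mean(logs), statistics.variance(logs)


def gpq_draw(summary, rng):
    """One generalized pivotal draw for ln(mean) of a log-normal sample."""
    n, ybar, s2 = summary
    z = rng.gauss(0.0, 1.0)
    u = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square with n - 1 df
    sigma2 = s2 * (n - 1) / u                 # pivotal draw for the variance
    return ybar - z * math.sqrt(sigma2 / n) + 0.5 * sigma2


def ratio_ci(counts_a, counts_b, draws=20000, level=0.95, seed=1):
    """Simulated CI for mean(a) / mean(b) under log-normal assumptions."""
    rng = random.Random(seed)
    sum_a, sum_b = log_summary(counts_a), log_summary(counts_b)
    sims = sorted(
        gpq_draw(sum_a, rng) - gpq_draw(sum_b, rng) for _ in range(draws)
    )
    lo = sims[int((1 - level) / 2 * draws)]
    hi = sims[int((1 + level) / 2 * draws) - 1]
    return math.exp(lo), math.exp(hi)


# Hypothetical per-filter counts for single and double volumes of one suspension
single = [80, 95, 110, 120, 100, 90]
double = [2 * x for x in single]
low, high = ratio_ci(double, single)
print(f"95% CI for the ratio: {low:.2f} to {high:.2f}")
```

An enumeration method passes this check when the interval includes the expected volume ratio of 2. As noted in the text, passing is a necessary condition for accuracy, not a sufficient one.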

Underestimation by DAPI staining method. In the analysis of the accuracy of the three staining methods, the DAPI staining method produced inaccurate results for both *E. coli* and *Lactobacillus* sp. cultures, while BacLight and AO showed good agreement (Fig. 5a and Table 2). By comparison of means and CIs, DAPI significantly underestimated TBNs compared to AO and BacLight for samples B, C, E, G, and H (Fig. 5; ANOVA; *P* < 0.05).

The DAPI method is known to underestimate bacterial abundance in natural samples. Suzuki et al. (57) found 30% underestimation of TBNs for seawater when DAPI-stained bacterial counts were compared to AO-stained bacterial counts. Porter and Feig (48) found 16% underestimation by DAPI, on average. Newell et al. (42) suggested that this pattern held more strongly for seawater than for freshwater. Those studies recognized that only a subset of AO-stained bacteria could be stained by DAPI because some cells in environments and cultures have low DNA content for various reasons, including starvation and viral lysis. McNamara et al. (38) also demonstrated that TBNs for bacterial cells in starvation-survival mode were underestimated by DAPI staining. Because natural assemblages of bacteria comprise various bacterial species under various physiological conditions, DAPI staining will always tend to underestimate TBNs for environmental samples.

The limitation of DAPI staining, i.e., the incapability of staining a subset of cells in natural bacterial assemblages, is a sufficient explanation for the underestimation of TBNs by DAPI staining for samples E, G, and H. However, this phenomenon still fails to explain the significant bias in the ratio of log-normal means of DAPI-stained pure cultures (Table 2), i.e., samples A and B and samples C and D, because the two paired samples had identical compositions of bacterial cells. Understaining of individual cells or fading of fluorescence signals after staining could also cause underestimation in microscopic enumeration. Since we stained cells for 15 min with DAPI, understaining is unlikely to be the cause of the underestimation. This view was further supported by the fact that underestimation did not occur in the same proportions in the sample A and B pair and the sample C and D pair. In contrast to understaining, the level of loss of fluorescence signal due to fading will be relatively stochastic for each filter. The observation that only one of the two dilutions of each pure culture produced underestimation supports this expectation of stochastic underestimation of TBNs by DAPI.

In contrast to early statements on the robustness of DAPI staining (31, 55, 57), several studies reported rapid fading of DAPI fluorescence (35, 37, 61). Porter and Feig (48) also emphasized the necessity of rapid counting and photomicrography for effective visualization and minimum fading. Some studies (21, 55) suggested counting 25 to 30 bacterial cells per field, claiming that photofading was not much of a problem for the short duration of counting. Immediate counting or darkroom storage of DAPI slides has also been recommended to ensure accurate enumeration results (62).

Differences between AO and BacLight. The AO and BacLight methods showed good agreement and accuracy in this study and in those by others (41), implying an acceptable interchangeability. However, there were reports on at least two types of samples for which BacLight estimations will not agree with AO estimations. *E. coli* cells in chlorinated waters were impermeable to propidium iodide (46). For activated sludge samples, BacLight is reportedly not appropriate because of high background fluorescence and nonspecific binding (4). These observations, therefore, suggest the possibility that the relationship between the two methods may vary by the type of sample.

In this study, sample F showed AO counts that were significantly higher than BacLight counts, while samples G and H showed significantly higher BacLight counts. Since the latter two samples were from biofilms, we hypothesized that some aspects of litter or root biofilm contributed to higher counts by BacLight. We suspected the high-density presence of bacteria-like autofluorescent particles as the cause and examined all stored samples for autofluorescent particles. Sample F showed 9 ± 14 red-fluorescent cell-like particles (mean and SD from 30 microscopic fields), while sample G showed 18 ± 17 green-fluorescent cell-like particles. In sample H, both green-fluorescent particles and red-fluorescent particles were observed, with densities of 12 ± 4 and 10 ± 3, respectively. Given the nature of sample F, the red-fluorescent particles could be autotrophic microorganisms. Interestingly, sample G, which showed the highest bias in Bland-Altman analysis, carried a considerable amount of green-colored particles, which we presumed to originate from mineral particles of soil. Dislodgement and cell separation treatments not only release bacteria from root sections and leaf litter but also yield bacterium-sized organic and inorganic particles, which may interfere with microscopic observations (24). The green-colored particles under the dichromatic filter sets for AO and BacLight are more likely to be counted as cells in BacLight filters because the BacLight method counts both bright-red and bright-green cells, while the AO method counts bright-red and faint-green cells. The dual-staining scheme of the BacLight method appears to cause a higher chance of overestimation under the influence of autofluorescent soil particles.

Interchangeability and the gold standard. Findings in this study can be summarized as follows. (i) Parameters of the distribution of bacterial cells on PC filters were best estimated by fitting cell counts per field to an NB model. (ii) The VC method was less precise than the IA method. (iii) The IA method underestimated log TBNs by 7% in comparison to the VC method. (iv) DAPI staining was inaccurate because of stochastic underestimation. (v) TBN estimates by AO and BacLight were accurate and interchangeable with each other.

Based on those findings, we can suggest the appropriate selection and application of the gold standard method for bacterial cell counting. The choice of fluorochrome can be either AO or BacLight. If BacLight is used, one needs to be cautious of any potential bias due to a high density of green autofluorescent particles from soil and impermeability of chlorinated cells. For studies where a precise standard is required, the appropriate enumeration method is supervised IA, but the log TBN from this method should be considered about 93% of the VC value. Performing IA for multiple focal planes with a highly sensitive camera may reduce the magnitude of underestimation. To minimize bias of the standard method, VC should be employed while minimizing overcounting errors and fading of fluorescence signals during the period of counting. Using an effective antifading agent might help to minimize the variance of cell counts. The number of cells per microscope field should be ≥45, a threshold that was empirically determined in this study to ensure relatively small variance of cell counts. We expect that the effect of signal fading is relatively large in fields with <45 cells, resulting in high variance among cell counts.

## ACKNOWLEDGMENTS

E.-Y.S. and Y.-G.Z. were supported by the second-phase Brain Korea 21 Project in 2008 and 2009.

## FOOTNOTES

- Received 21 July 2009.
- Accepted 8 January 2010.

- Copyright © 2010 American Society for Microbiology