**DOI:**10.1128/AEM.00514-07

## ABSTRACT

Ecological diversity indices are frequently applied to molecular profiling methods, such as terminal restriction fragment length polymorphism (T-RFLP), in order to compare diversity among microbial communities. We performed simulations to determine whether diversity indices calculated from T-RFLP profiles could reflect the true diversity of the underlying communities despite potential analytical artifacts. These include multiple taxa generating the same terminal restriction fragment (TRF) and rare TRFs being excluded by a relative abundance (fluorescence) threshold. True community diversity was simulated using the lognormal species abundance distribution. Simulated T-RFLP profiles were generated by assigning each species a TRF size based on an empirical or modeled TRF size distribution. With a typical threshold (1%), the only consistently useful relationship was between Smith and Wilson evenness applied to T-RFLP data (TRF-*E*_{var}) and true Shannon diversity (*H*′), with correlations between 0.71 and 0.81. TRF-*H*′ and true *H*′ were well correlated in the simulations using the lowest number of species, but this correlation declined substantially in simulations using greater numbers of species, to the point where TRF-*H*′ cannot be considered a useful statistic. The relationships between TRF diversity indices and true indices were sensitive to the relative abundance threshold, with greatly improved correlations observed using a 0.1% threshold, which was investigated for comparative purposes but is not possible to consistently achieve with current technology. In general, the use of diversity indices on T-RFLP data provides inaccurate estimates of true diversity in microbial communities (with the possible exception of TRF-*E*_{var}). We suggest that, where significant differences in T-RFLP diversity indices were found in previous work, these should be reinterpreted as a reflection of differences in community composition rather than a true difference in community diversity.

Microbial ecologists deal with arguably the most diverse biological communities on Earth (3, 7) and with organisms which are among the most difficult to study in their natural environments. Molecular profiling methods based on the heterogeneity of a specific gene resulting in differences in electrophoretic mobility are popular approaches for characterizing microbial community composition. Diversity indices originally adopted for macroorganisms are frequently used on microbial community profile data (12, 23), including data generated by terminal restriction fragment length polymorphism (T-RFLP) (see reference 24 for details on T-RFLP). This is appealing because diversity is at the center of a large body of ecological theory (27) and is a concept that the general public appreciates. Univariate diversity indices (e.g., Table 1) may also be an elegant summary of a complex community.

Despite their use in microbial ecology, the application of diversity indices to T-RFLP data has not been analytically validated, and several aspects common to all electrophoresis-based community profiling methods should be considered before this practice is accepted. For example, molecular profiling methods normally characterize only “dominant” organisms (e.g., >1% of the community) due to detection limits that arise whenever many markers are quantified simultaneously. Hence, rare species can never be detected, but these species often make up the vast majority of the diversity in microbial communities (13, 33). In addition, terminal restriction fragments (TRFs) of the same size can be generated from multiple taxa, which can be disparately related (1, 5, 8). The diversity of even those dominant taxa that are detected will therefore be underestimated.

Given the aforementioned properties of microbial community profiles, what can comparison of diversity indices applied to them tell us about the underlying diversity of the communities? Here, we make use of database information to answer this question for simulated communities with various levels of diversity. We examine T-RFLP of bacterial ribosomal genes because it is a popular, high-throughput method which we hope will not be abused, and databases and bioinformatics tools are available to support our simulations. The utilities of several diversity indices, including species richness, evenness, and integrated diversity indices, are evaluated.

## MATERIALS AND METHODS

The simulations required assumptions about two distributions: a species abundance distribution, which determines the “true” diversity of the community in terms of the number of species present and their relative abundances, and a TRF size distribution, which describes how the sizes of TRFs vary among species (Fig. 1). The simulations were performed by sampling a TRF size distribution to determine the T-RFLP profile for a given species abundance distribution. Species within the bacteria are difficult to define, and we use the term here to be consistent with the literature on diversity indices.

Species abundance distributions.Community species abundance distributions were simulated using the lognormal distribution, which has previously been applied to soil microbial communities (3, 6, 13). Equations defining the lognormal distribution were obtained from Dunbar et al. (6). The distribution is defined by the following equation:
$$mathtex$$\[S(R){=}\frac{S_{T}a}{\sqrt{{\pi}}}\ \mathrm{exp}({-}a^{2}R^{2})\]$$mathtex$$ where *R* is the log_{2} species abundance octave, *S*(*R*) is the number of species in octave *R*, *S*_{T} is the total number of species in the community, and *a* is the dispersion constant which incorporates the variance of the distribution [*a* = (0.5/σ^{2})^{0.5}]. The abundance of each species in octave *R*(*N*_{R}) equals *N*_{0}2^{R}. The focus of our simulations was relative abundance, so the abundance of each species was found relative to *N*_{0}, and then the total arbitrarily scaled population size of the community (from $$mathtex$$\(N_{{-}R_{\mathrm{max}}}\)$$mathtex$$ to $$mathtex$$\(N_{R_{\mathrm{max}}}\)$$mathtex$$) was determined so that relative abundances could be calculated.

The simulations consisted of systematically varying initial parameters which determined the diversity and shape of the distribution, including *a*, *S*_{T}, and *S*(*R*_{max}). The parameter *S*(*R*_{max}) is the number of species in the octave with the most abundant species. The parameter *a* was varied from 0.15 to 0.3 by intervals of 0.05, and *S*(*R*_{max}) was varied from 1 to 7 by intervals of 2. While it is often assumed that *S*(*R*_{max}) equals 1, we did not see an a priori reason to assume this when considering species with highly diverse ecological niches, as is the case for soil bacteria. *S*_{T} was varied from 10 species to different maxima, depending on the analysis (Table 2), by intervals of 10 species. For a given set of these parameters, *S*(0) was calculated as *S*_{T} × *a*/π^{0.5}, and then *R*_{max} was determined iteratively. The number of species in each octave, its relative abundance, and *S*_{T} for the resulting distribution were then calculated. To accommodate the fixed value of *a* and the discrete lognormal distribution, values of *S*_{T} and *S*(*R*_{max}) were allowed to vary slightly below and above their initial values, respectively, after the number of octaves in the distribution was determined. Combinations of parameters which did not result in meaningful distributions were discarded (e.g., distributions with *a* values of 0.2, *S*(*R*_{max}) values of 5, and 40 or fewer species were not used, because at least 42 species are required for this combination of parameters).

Empirical TRF size distributions.A database of TRF sizes was generated using the ARB 2004 small-subunit ribosomal database (28) and the ARB tool TRF-CUT (36). A total of 5,133 sequences which matched the primers Bac8F and Univ1492R underwent in silico digestions using the restriction enzymes HhaI, RsaI, MspI, and HaeIII, assuming that Bac8F was the labeled primer. A sequence was excluded from analysis for a particular enzyme if it contained missing nucleotide data before the first restriction site. This empirical database was sampled in two ways. In “sequence sampling,” sequences were randomly sampled without replacement and assigned to each species in a species abundance distribution. In “OTU sampling,” sequences were first categorized into operational taxonomic units (OTUs) by using the following approach. Genetic distances between the primed gene fragments were calculated using the Olsen distance correction factor recommended by ARB. Sequences were then clustered into OTUs by using a 3% distance cutoff and the average distance method in SAS (SAS Institute, Cary, NC). OTU sampling consisted of randomly choosing one OTU without replacement and then randomly choosing one sequence within that OTU to represent it. OTU sampling therefore avoided bias due to overrepresentation of certain OTUs within the sequence database, whereas sequence sampling was prone to this bias.

Simulations of community diversity and T-RFLP profiles.Simulations were performed using a macro written in SAS Proc IML. For a given simulation, true community diversity was varied by altering the parameters of the species abundance distribution as described above, and 10 replicate T-RFLP profiles were generated for each species abundance distribution. Replicate T-RFLP profiles were generated by repeating the random assignment of sequences to species in the distribution. The TRF signal in the simulated T-RFLP profile (representing height or area) was assumed to be proportional to the relative abundance of the species. TRF signals were pooled when two species had the same TRF size in base pairs. TRFs smaller than 50 bp and greater than 600 bp were deleted, and TRFs were expressed as relative abundances of the total signal detected in the T-RFLP profile. TRFs below a threshold (0.1, 1, or 4% of the total profile signal) were also deleted, and relative abundances of remaining TRFs were recalculated. Diversity statistics (Table 1) were calculated on both the T-RFLP profile (TRF indices) and the species abundance distribution that it was derived from (true indices).

The utilities of TRF indices were assessed by calculating their correlation coefficients with true indices. However, testing the statistical significance of the correlation coefficients is not appropriate, because the very large number of data points makes all correlations significant, even those very close to zero and obviously not useful as an analytical tool. Therefore, we used graphical displays of the data cloud (data density plots) to further interpret correlation coefficients. These were constructed by plotting the mean values and percentile ranges of a true index over each 5-percentile interval of a TRF index.

Theoretical TRF size distributions and simulations.“Parametric sampling” consisted of using mathematical approximations of TRF size distributions, allowing the total number of species in abundance distributions to be increased beyond the number of empirical sequences available in the database. Candidate models for the general form of the parametric TRF size distribution were evaluated for their abilities to fit the HhaI, RsaI, MspI, and HaeIII TRF size distributions. Models were initially screened by one of two methods. The first involved visually comparing empirical TRF frequency plots arranged by size (bp) to plots generated using combinations of normal and uniform probability distributions. The second screening method consisted of fitting nonlinear equations supplied with the software GOSA (Bio-Log, Ramonville, France) (e.g., exponential decay, two-phase exponential decay, lognormal, normal, power, Pareto, and polynomial, etc.) to empirical TRF frequencies arranged as rank abundance data. We were modeling TRF size distributions for the purpose of evaluating TRF diversity indices by simulation, so our main criterion was the ability of the models to generate correlations similar to those obtained in empirical simulations when performed over the same range of *S*_{T}. After the initial screen, simulations were performed to 1,200 and 4,000 species by using selected models, and correlations were compared to results from OTU sampling and sequence sampling, respectively. Simulations were performed as described above, except that assignment of TRF size to each species was according to a probability distribution defined by the model in question (the parametric TRF size distribution) rather than an empirical TRF size distribution. Also, variability between empirical TRF size distributions was incorporated into the simulations by creating a new parametric TRF size distribution for each iteration within a simulation. This was done by allowing random variability in model parameters based on the range of parameter values obtained during fitting of the model to the four empirical TRF size distributions. Correlation matrices obtained using different parametric TRF size distribution models were compared to OTU sampling and sequence sampling correlation matrices by conducting Mantel tests and by calculating the mean absolute deviation between correlation matrices.

Our goal was to simulate communities with up to 10,000 species. Through the above-described process, we chose to use the following model in further simulations: an assemblage of six normal curves and one uniform random distribution. Each normal curve had a randomly generated probability (*P*_{O}) that a species would have a TRF size derived from that curve. *P*_{O} values for all the normal curves summed to 90%, while the underlying random-uniform-distribution *P*_{O} was 10%. Each normal curve was defined by a randomly generated mean (25 to 650 bp) and standard deviation (SD; 2 to 20 bp). The uniform random distribution ranged from 1 to 1,400 bp. Parametric TRF size distributions were sampled by randomly assigning each species in an abundance distribution to one of the six normal curves or the uniform distribution (according to their *P*_{O} values) and then randomly assigning a specific TRF size within the assigned curve.

## RESULTS

OTU sampling.OTU sampling was conducted using species abundance distributions with 10 to 1,200 species for most restriction enzymes (Table 2). Table 3 shows correlation results for a 1% threshold (the most commonly used threshold in the literature) for some of the indices tested. The highest correlation between a true index and its corresponding TRF index was between *H*′ and TRF-*H*′ (0.71). However, TRF-*E*_{var} had the highest correlations with true indices overall, with the correlation between *H*′ and TRF-*E*_{var} the highest (0.81). Data density plots (Fig. 2A and B) show the extent to which TRF-*H*′ and TRF-*E*_{var} may be useful over this range of abundance distributions. Compared to the data in Fig. 2A, the decreased slope and the smaller range of true *H*′ for each category in Fig. 2B show the greater resolution that TRF-*E*_{var} has in reflecting true *H*′. This is particularly true when *H*′ is greater than 4.5, which is the case for approximately half the data. Species richness (*S*) and the evenness indices *E*_{var} and *E*_{1/D} are not predicted well by any TRF index (*r* < 0.4). This is illustrated for the relationship between *S* and TRF-*S* in Fig. 2C. Other indices tested (*J*′, 1/*D*, 1/*d*, and *e*^{H′}) were moderately well correlated with TRF indices (*r* < 0.73).

When a 4% threshold was used, the correlations between most true and TRF indices declined substantially (data not shown), even for those TRF indices with the highest correlations with true indices (Table 4). However, the relationship between TRF-*E*_{var} and *H*′ may still be useful, with a correlation coefficient of 0.66 (SD = 0.03). This was the highest correlation observed between true and TRF indices when a threshold of 4% was used. The correlation between true *H*′ and TRF-*H*′ dropped to −0.54 (SD = 0.03).

With a 0.1% threshold, true indices were generally well correlated with their corresponding TRF indices (see examples in Table 4).

Sequence sampling.When species richness was extended to 4,000 species by using sequence sampling (Table 2) and a 1% peak threshold, almost all correlations between true and TRF indices declined compared to those described above for simulations using up to 1,200 species and OTU sampling (data not shown). The highest correlation remained between *H*′ and TRF-*E*_{var}, which declined from 0.81 to 0.71 (SD = 0.03) (Table 4), whereas the correlation between *H*′ and TRF-*H*′ declined from 0.71 to 0.51 (SD = 0.08). Differences between results for sequence sampling and OTU sampling for 4% or 0.1% thresholds were similar to the differences described for the 1% threshold. Correlations remained basically the same or declined slightly (data not shown).

Parametric sampling for complex communities.A variety of parametric TRF size distributions were compared to the four empirical size distributions obtained from TRF-CUT. The comparisons were based on shapes of the size distributions and on how well correlations between true and TRF indices obtained with communities of up to 1,200 or 4,000 species matched results obtained using empirical TRF size distributions over the same range of *S*_{T}. According to both Mantel tests and the mean absolute deviations, assemblages of six normal curves plus an underlying random uniform distribution were the most similar to results obtained using the empirical distributions (Table 5). This distribution performed better by these criteria than a two-phase exponential-decay model, which had the best fit of built-in nonlinear equations according to analysis with the software GOSA. When the simulations were limited to 1,200 species (1% threshold), the correlation of *H*′ and TRF-*H*′ declined to 0.64 for the parametric distribution, whereas the correlation of *H*′ and TRF-*E*_{var} was 0.82; these values were the closest to those obtained by OTU sampling for all models tested.

When species abundance distributions were extended up to 10,000 species and a threshold of 1% was used, the correlation between *H*′ and TRF-*E*_{var} remained high (0.73), whereas the correlation between *H*′ and TRF-*H*′ dropped to 0.15 (Fig. 3). Other TRF evenness indices also correlated well with *H*′ and *J*′, but the TRF species richness and diversity indices did not correlate well with any true indices (Table 6). The results were found to be essentially the same when a 4% threshold was used (correlation between *H*′ and TRF-*E*_{var} of 0.56) (Table 4). In contrast, when a 0.1% threshold was used, all true indices correlated well with their corresponding TRF indices, with the best correlations between *H*′ and TRF-*H*′ (0.76) and between *J*′ and TRF-*J*′ (0.68) (e.g., Table 4).

## DISCUSSION

Estimates of species richness in soil bacterial communities range from thousands to millions of species, whether based on sequencing of ribosomal genes (3, 31) or DNA reassociation kinetics (13, 39). Here, we asked whether differences in diversity between communities can be meaningfully portrayed by common diversity indices applied to T-RFLP profiles derived from bacterial communities. Our results demonstrate a large disparity between true community diversity and diversity estimates derived from T-RFLP data, indicating that diversity estimates based on T-RFLP inaccurately portray the true underlying diversity of bacterial communities.

When typical peak height or area thresholds (∼1%) were used, richness and diversity indices applied to T-RFLP profiles had limited capabilities to discriminate between levels of diversity in the underlying community. The number of bands in T-RFLP profiles (TRF-*S*) did a very poor job at predicting the number of taxa in the community (*S*) in all simulations. Loisel et al. (26) found that the number of bands did not correspond with *S* for other community profiling methods (single-stranded conformation polymorphism and denaturing gradient electrophoresis profiles) and proposed that the Curtis estimator could be used to evaluate the diversity of the underlying community. The Curtis estimator is based on the reciprocal of the Berger-Parker index (1/*d*), which we did not find to be particularly useful here, as demonstrated by low correlations between TRF-1/*d* and true community indices.

*H*′ is a well-known and widely used diversity index integrating both species richness and evenness, although interpretation of the index itself is not without some conceptual difficulties (19, 21). TRF-*H*′ did correlate well with the true community value of *H*′ in the simulations using the lowest number of species (10 to 1,200). However, this correlation declined substantially in simulations using greater numbers of species, to the point where TRF-*H*′ cannot be considered a useful metric of diversity. We were surprised to find that one community evenness index applied to T-RFLP profiles, TRF-*E*_{var}, consistently had the highest correlations with the true community value of *H*′ in all simulations using 1% or 4% thresholds (correlations above 0.71 to 0.81 and 0.56 to 0.66, respectively). The fact that different taxa can generate the same TRF (1, 8) may in part explain why evenness of T-RFLP profiles (reflected in TRF-*E*_{var}) is affected by true community richness as well as true evenness (reflected in true *H*′). An increase in species richness will likely increase the average number of taxa contributing to each TRF, which will reduce the impact of differences in abundance and increase the evenness of the profile. Additionally, our results may indicate that evenness within the most abundant taxa is often affected by the overall species richness of a community.

The factors captured in our simulations which cause T-RFLP diversity indices to underestimate, and differ independently from, true community diversity include (i) TRFs of the same size generated from multiple taxa, (ii) exclusion of TRFs if they are outside the size range reliably resolved during electrophoresis, and (iii) TRFs not being detected if they fall below a fluorescence threshold. This fluorescence threshold is translated to a relative abundance threshold by the standardization processes that have been recommended to reduce effects of analytical variability in overall profile strength (2, 5, 32). While there are some similarities between the threshold and the lognormal distribution “veil line” of Preston (34), they differ in that the veil line is a consequence of random sampling, whereas the threshold specifically excludes rare TRFs. Hence, there is no reason to expect the well-known relationship between diversity and sampling effort (14) to hold for T-RFLP profiles, if sampling effort is increased by repeating the assay on the same DNA extract. The relationship may or may not hold if sampling effort is increased by repeating the analysis on replicate environmental samples, depending on the spatial scale of the sampling relative to the spatial variability of the community (35).

Sampling effort can also be increased by enhancing the sensitivity of detection of rare TRFs. We found that an order-of-magnitude increase in the simulated sensitivity of T-RFLP, from a 1% threshold to 0.1%, completely alters the relationships between true diversity indices and TRF indices. Evenness indices are replaced by richness or integrated indices as the best predictors of true *S*, *H*′, and 1/*D* for a 0.1% threshold. To our knowledge, current technology cannot be used to consistently differentiate peaks from background noise at this low level, so we present the analysis for comparative purposes. This sensitivity to fluorescence threshold highlights the importance of choosing a common threshold to be applied uniformly across profiles.

All diversity indices tested here, other than TRF-*S*, are based on comparisons of peaks within a single profile and the assumption that these comparisons reflect differences in relative abundances of taxa. It is worth noting that this assumption is also tenuous due to complex interactions between taxa in ribosomal copy number, PCR bias, and DNA extraction bias (11). It is important to note that analysis of profiles by ordination is based on comparisons between rather than within samples and on the more realistic assumption that biases affecting taxa are the same across samples. The simulations are also conservative (in favor of diversity indices) because TRFs were assumed to be sized with single-base-pair resolution, which often cannot be achieved (22, 30).

While we have demonstrated that diversity indices applied to T-RFLP profiles do not tell us much about the underlying community diversity, empirical studies using the indices often find significant differences between communities from different environments. In other words, the indices do not vary randomly in the field. Significant differences could, however, arise from changes in the identities or relative abundances of the taxa present rather than species richness and evenness. Calculation of diversity indices results in an inevitable loss of information, as the community data are reduced to a single value. Indeed, we are unaware of any studies in which diversity indices indicated a difference between communities and multivariate methods (an ordination or cluster analysis) did not, whereas there are numerous studies where both methods of analysis indicated a difference between communities (9, 17, 20, 25) or where multivariate methods indicated a difference but diversity indices did not (4, 10, 15, 16, 40). It has been suggested that the greater sensitivity of multivariate methods for detecting community differences is true even across methods which seem optimal for each type of analysis (18).

From our analyses, we are led to conclude that T-RFLP-based estimates of diversity provide inaccurate insights into the actual diversity of bacterial communities, even as a comparative method. We suggest that previously published work using diversity indices applied to T-RFLP data should be reinterpreted. For example, Fierer and Jackson (9) found that diversity indices (TRF-*S* and TRF-*H*′) and community ordinations of bacterial T-RFLP profiles were correlated with soil pH. Based on conclusions reached here, both of these analyses indicate that community composition responds to pH, but no reliable statements can be made regarding the diversity (richness or evenness) of those communities. In addition, there is little reason to apply diversity indices to T-RFLP data to detect differences in community composition, because multivariate methods are more sensitive and better suited for this task. If it is necessary to measure bacterial diversity using T-RFLP, we recommend using TRF-*E*_{var} as a measure of diversity because it was the index that best represented the true underlying community diversity.

## ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (DEB0075397); the Office of Science (BER), U.S. Department of Energy (DE-FG02-93ER61666); and the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service (2003-35107-13743).

## FOOTNOTES

- Received 6 March 2007.
- Accepted 19 June 2007.
↵▿ Published ahead of print on 29 June 2007.

- American Society for Microbiology