Previous Article | Next Article ![]()
Applied and Environmental Microbiology, March 2007, p. 1576-1585, Vol. 73, No. 5
0099-2240/07/$08.00+0 doi:10.1128/AEM.01996-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, Colorado 80309,1 Department of Computer Science, University of Colorado, Boulder, Colorado 80309,2 Department of Biology, San Diego State University, San Diego, California 92182-4614,3 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 803094
Received 23 August 2006/ Accepted 20 December 2006
|
|
|---|
|
|
|---|
diversity (the diversity within each sample, e.g., the number of species observed in an environment), and ß diversity (the partitioning of biological diversity among environments or along a gradient, e.g., the number of species shared between two environments) (Table 1) (31). Here, we focus on ß diversity, which can be measured in many different ways. These measures can be broadly divided into two categories: qualitative measures, which use the presence/absence of data to compare community composition, and quantitative measures, which also take the relative abundance of each type of organism into account (Table 1). Examples of commonly used qualitative measures of ß diversity include the Sörensen and Jaccard indices; quantitative measures include the Sörensen quantitative index and the Morisita-Horn measure (see reference 16 for a review). |
View this table: [in a new window] |
TABLE 1. Measurements of diversity
|
diversity, the diversity within each sample. These studies have provided overwhelming evidence that quantitative and qualitative measures of diversity can paint markedly different pictures of diversity over a range of taxa and spatial scales. For example, in treeless Appalachian spider communities, rare, transient species dominated qualitative measures of
diversity, but ecological factors such as the availability of flying insect prey determined which species were abundant in a given community and thus dominated the quantitative measures (28). In human-modified grassland plots, qualitative
diversity was primarily influenced by levels of phosphorus and potassium, but quantitative
diversity was primarily influenced by levels of calcium, the carbon-to-nitrogen ratio, and organic matter (15). Other studies of plant diversity in alpine tundra (1), shrub-steppe (17), and tallgrass prairie (23) and of phytoplankton diversity (32) have also found important and biologically meaningful differences in quantitative and qualitative
diversity. Although these studies all focused on
diversity, we expect that similar differences between quantitative and qualitative diversity will also be found in ß diversity and, therefore, that a combination of qualitative and quantitative approaches for measuring ß diversity is also desirable. Most ß diversity measures, including those listed above, treat each species or operational taxonomic unit (OTU; typically defined by a sequence similarity threshold) in the sample as equally related. Newer ß diversity measures that incorporate phylogenetic information are more powerful because they account for the degree of divergence between sequences (13, 18, 29, 30). Phylogenetic ß diversity measures can also be either quantitative or qualitative depending on whether abundance is taken into account. The original, unweighted UniFrac measure (13) is a qualitative measure. Unweighted UniFrac measures the distance between two communities by calculating the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of the two communities (Fig. 1A). The fixation index (FST), which measures the distance between two communities by comparing the genetic diversity within each community to the total genetic diversity of the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).
![]() View larger version (10K): [in a new window] |
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
|
|
|
|---|
The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:
![]() |
If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:
![]() |
Clustering with normalized u values treats each sample equally instead of treating each unit of branch equally: the issues involved are similar to those involved in performing a multivariate analysis using the correlation matrix to treat each variable equally independent of scale or, using the covariance matrix, to take the scale into account. Scaling by D also allows for comparison with unweighted UniFrac values, which also always have a value between 0 and 1.
Multivariate analyses and tests of robustness to sequencing effort.
Weighted UniFrac can be used to compare many communities simultaneously using standard multivariate statistical methods. In this respect, it resembles unweighted UniFrac (13), FST (18), and other ß diversity measures that can be treated as distance metrics (16). In the case studies below, we use a hierarchical clustering method called unweighted pair group method with arithmetic averages (UPGMA) (25) to cluster the community samples. It should be noted that this method is used only to relate the community samples to one another; it is not used to build the phylogenetic tree that relates the sequences. UPGMA sequentially joins the least different samples to create a tree structure describing the differences between communities. We also use principal coordinates analysis (PCoA) (7), in which a distance matrix is used to plot the n samples in (n 1)-dimensional space. The vectors in this space, or factors that describe as much variation as possible, can be plotted as axes in two dimensions for visualization or regressed on environmental variables (e.g., chemistry or temperature) using general linear model regressions to determine which environmental factors have the largest impact on community composition. PCoA is analogous to a related, widely used technique called principal components analysis (PCA). The distinction is that PCA begins with a table of the number of times each phylotype was observed in each environment, whereas PCoA begins with a table of distances between each pair of environments. The output of PCoA is a list of factors (labeled factor 1, factor 2, etc. in descending order of importance) and the factor weighting for each sample (allowing each sample to be plotted in the factor space).
Because microbial communities are usually too complex to sample completely, we measured the robustness of the UPGMA and PCoA results to sequencing effort using a sequence jackknifing technique in which the UPGMA and PCoA clusters are regenerated using a subset of the sequences. Specifically, the same number of sequences are randomly selected from each environment for many replicate trials. Nodes in the UPGMA cluster that are recovered in a large percentage of the jackknife trials are considered robust to sampling effort, particularly if only a small proportion of the data was subsampled from each environment during each jackknife replicate. The positions of the points in jackknifed PCoA scatterplots are the average for the jackknife replicates and are displayed with ellipses representing the interquartile range (IQR) in each axis. If the IQRs are small, the same result would likely be achieved with a different sample of sequences from the same distribution, but if the IQRs are large we might expect to see different relationships.
Case studies.
We chose two studies that compared the community composition of related environments using 16S rRNA genes sequenced directly from environmental samples (20, 21). The first study examined the relative influences of mineral chemistry, temperature, and geography on microbial community composition in acidic thermal springs in Yellowstone National Park (18a). The second study examined the influences of obesity and kinship on microbial community composition in the mouse gut (11).
By analyzing the data from each study with both unweighted UniFrac (a qualitative measure) and weighted UniFrac and FST (both quantitative measures), we show that quantitative and qualitative ß diversity measures can lead to substantially different conclusions about the main factors that structure microbial diversity. In both studies, the results from the weighted and unweighted analyses suggest that different factors affect the presence/absence and relative abundance of microbial lineages.
|
|
|---|
2 km away in the Roaring Mountain (RM) area. In each spring, sediment was sampled at temperatures of 60°C, 65°C, and 70°C. In addition, samples were taken from an AS spring that had both iron- and sulfur-rich sediments as well as mixed iron-sulfur sediments at 75°C. The springs all had similarly low pH levels (
1.5). Mathur et al. (18a) generated 16S rRNA gene libraries of between 65 and 96 sequences representing between 8 and 53 unique sequences for the 12 samples. Based on the analysis of this sequence information, Mathur et al. concluded that mineral chemistry was by far the most important factor for controlling community composition and that temperature had a secondary effect. The clearest statistical evidence for this conclusion resulted from PCoA of pairwise FST values. Factor 1 from the PCoA accounted for 85.7% of the variation in the data and correlated strongly with chemistry. Specifically, sulfur-rich springs clustered apart from springs rich in iron or iron and sulfur. Factor 2 explained only 7.8% of the variation and was correlated with temperature in the iron-rich samples.
The strong correlation with chemistry detected using the quantitative FST values was primarily due to differences in the relative abundances of bacterial lineages rather then the types of microorganisms present. Comparing the FST analysis to the results of PCoA and hierarchical clustering with both weighted and unweighted UniFrac demonstrated that the FST results were consistent with the results of weighted UniFrac but not unweighted UniFrac (Fig. 2), as expected because both FST and weighted UniFrac are quantitative measures.
![]() ![]() View larger version (50K): [in a new window] |
FIG. 2. PCoA analysis of hot spring sediment samples with FST and unweighted, weighted, and normalized weighted UniFrac using a variety of trees. Shown is a plot of the first two principal coordinate axes (factors) for PCoA using each tree-building method and a UniFrac algorithm. Rows show the effects of different tree-building methods; columns show the effects of applying unweighted UniFrac (first column), weighted UniFrac (second column), and weighted UniFrac with the branch length normalization (third column). (a) The legend describes which symbol applies to which sample. Fe-containing springs have solid symbols; springs that contain only S have hollow symbols. Temperature (°C) is denoted by the shape of the symbol. (b) PCoA clustering using FST values as distances. (c through e) Neighbor-joining tree from NEIGHBOR. (f through h) and (i through k) Two representative parsimony trees from DNAPARS. (l through n) ARB parsimony insertion tree. (o through q) RAxML maximum likelihood tree. (r through t) RAxML parsimony guide tree, no branch lengths. (u through w) MrBayes consensus tree.
|
We evaluated the similarities of the seven phylogenetic trees using both the nodal distance algorithm (NDA) (2) and a partition metric (22) (Table 2). The NDA value is the sum of the differences in distance between each pair of sequences in the phylogenetic tree (2). We scaled the branch lengths so that they summed to 1 in each of the seven trees. The partition metric counts the number of tree nodes (partitions) that are not shared between the two trees (22) (Table 2). We divided the result by the total number of partitions so that the values are expressed as fractions.
|
View this table: [in a new window] |
TABLE 2. Pairwise comparisons of the phylogenetic trees evaluated in this studya
|
![]() View larger version (14K): [in a new window] |
FIG. 4. Hierarchical clustering of hot spring sediment samples with weighted and unweighted UniFrac. The percentage support for nodes supported at least 70% of the time with sequence jackknifing is indicated. The name of each sample indicates the spring (e.g., A1, A2, and A3 are different springs from the Amphitheatre Springs area, and RM is from the Roaring Mountain area), whether the sample is sulfur rich (S), iron rich (Fe), or both (FeS), and the temperature. The names and branches are colored black for S samples and gray for Fe and FeS samples. (a) Weighted UniFrac with the neighbor-joining tree and (b) unweighted UniFrac with the neighbor-joining tree.
|
![]() View larger version (8K): [in a new window] |
FIG. 3. Jackknifing of PCoA analysis of hot spring sediment samples with unweighted and weighted UniFrac. Shown is a plot of the first two principal coordinate axes (factors) for PCoA with the neighbor-joining tree. Point locations are the average location in the 100 jackknife replicates. Only 50 randomly selected sequences from each sample were used in each replicate (the range of sequences per sample was 65 to 96). Gray ellipses represent the IQR for the 100 jackknife replicates. The 95% confidence intervals for the point locations were also calculated and were considerably smaller than the IQRs (data not shown). The symbols are the same as those shown in Fig. 2.
|
The UniFrac results were not sensitive to the method used to build the phylogeny. For both weighted and unweighted UniFrac, the results were generally similar for the seven phylogenetic trees, even though the trees had considerable differences in both topology (the partition metric shows that 10 to 75% of the clades were unique to one or the other tree) and branch length (generally high NDA values; see Table 2). Despite these differences, weighted UniFrac always separated the samples by chemistry, and unweighted UniFrac always produced a first factor that correlated significantly with temperature. These results suggest that both weighted and unweighted UniFrac analyses are robust to differences in the trees produced by popular tree-building methods. The one tree-building method that is clearly the most different from the remainder is the RAxML parsimony tree (which lacks branch lengths). This loss of information about the extent of divergence may lead to different results. For these samples, applying the normalization for differential branch lengths generally had little effect (compare the second and third columns of graphs in Fig. 2). The normalization did appear to increase the variability of the spread of the sulfur-rich samples along factor 2 depending on the tree-building method, especially for likelihood and Bayesian trees, perhaps indicating that it is less robust than weighted UniFrac without normalization to differences in branch length estimates.
Study 2: obesity and gut microbiota.
Although in the example above, weighted UniFrac identified clearer patterns of variation between samples than unweighted UniFrac, this is not always the case. In the analysis of the effects of obesity and kinship on the microbial population of mouse gut microbiota, quantitative and qualitative ß diversity measures again provide completely different perspectives on the main factors that impact microbial community composition. Here, unweighted UniFrac identifies clearer patterns of variation. In their study, Ley et al. sequenced bacterial 16S rRNA genes from the distal ceca of obese and nonobese mice who were the offspring of three different mothers (11). Each mother was heterozygous for a mutation in the gene for the hormone leptin, which affects appetite and causes obesity in homozygous mutants (10, 33). Each mother was mated to a heterozygous male, and litters were produced that contained siblings with each of the three possible genotypes. The littermates were kept in the same cage as their mother until they were weaned at 3 weeks and then kept in their own cages for an additional 5 weeks before being sacrificed. Bacterial 16S rRNA gene clone libraries with between 111 and 484 sequences were produced from the cecal microbial communities for each of the three mothers and for 16 of the offspring (11).
We calculated pairwise values using both weighted and unweighted UniFrac and used both hierarchical clustering and PCoA to cluster the mice based on this sequence information (Fig. 5). Unweighted UniFrac revealed a clear association between microbial diversity and kinship: the mice clustered almost perfectly by mother. The two mothers who were sisters (M1 and M3) clustered together with their offspring. An unrelated mother (M2) and her offspring formed a separate cluster. This result was reliably obtained using both hierarchical clustering and PCoA (Fig. 5) (11). Sequence jackknifing showed that these results were robust to sample size. When 200 sequences were randomly selected from each of the 17 mice with between 200 and 484 sequences, both the node that grouped M1 and M3 with their offspring and the node that grouped M2 with most of her offspring were recovered 87 out of 100 times (Fig. 5e). The two mice represented by less than 200 sequences (M2A-1 and M2A-2) were the only mice that did not group with their mother (Fig. 5e). In addition, none of the IQRs for point locations for M1 and M3 and their offspring overlaps those of M2 and her offspring in the jackknifed PCoA plot (Fig. 5c).
![]() View larger version (25K): [in a new window] |
FIG. 5. Analysis of mouse cecal microbial communities with weighted and unweighted UniFrac. Genotypes are ob/ob for homozygotes for the mutant leptin allele that confers obesity, ob/+ for heterozygotes, and +/+ for wild types. All mothers are ob/+. (a) Plot of the first two principal coordinate axes for PCoA with unweighted UniFrac. Symbols represent individual animals. The rectangles highlight the family of mother 2 and the families of mothers 1 and 3, who are sisters. (b) The same plot for weighted Unifrac. The rectangle highlights the majority of the ob/ob mice. The arrows point to outliers: an ob/ob mouse outside of the ob/ob cluster (black triangle) and an ob/+ mouse inside the ob/ob cluster (white square). (c) Same plot for sequence jackknifing of unweighted UniFrac with a maximum of 200 sequences from each mouse for 100 replicates. The symbols are the average values for the 100 replicates, and the gray ellipses represent the IQR of the point locations. (d) Sequence jackknifing with weighted UniFrac with a maximum of 200 sequences from each mouse for 100 replicates. (e) Hierarchical cluster diagram for unweighted UniFrac. The percentage support for nodes supported at least 70% of the time with sequence jackknifing is indicated. The main clustering is by mother. (f) Hierarchical cluster diagram for weighted UniFrac. The clustering by mother is much less clear, and there is more clustering by ob/ob genotype (and hence by obesity phenotype).
|
|
|
|---|
In contrast, our new weighted UniFrac measure is well suited to detecting differences in abundance even when the overall groups of organisms that are present in each sample remain the same. In the thermal spring study, weighted UniFrac clustered the samples primarily by the chemistry of each spring; in the mouse obesity study, weighted UniFrac grouped most of the obese mice together in one cluster. Thus, we expect that weighted UniFrac will be suitable for studying transient changes in microbial communities related to nutrient availability and may also be suited to the analysis of seasonal changes and changes under the influence of different pollutants. We have made an implementation of the weighted measure available through the UniFrac website at http://bmf.colorado.edu/unifrac by selecting the "Use Abundance Weights" option, allowing researchers to run both weighted and unweighted analyses in the context of a convenient Web interface (12).
We demonstrated that neither weighted nor unweighted UniFrac is sensitive to the methodology used to build the underlying phylogenies, which is important because each phylogenetic method has its own strengths and weaknesses (5). Weighted UniFrac performs similarly to FST, another quantitative measure of ß diversity, but has the advantage over FST that it is applied to a phylogenetic tree rather than to a sequence alignment. It can thus be applied to phylogenetic trees that are generated from nonoverlapping sequence, such as trees generated based on top BLAST hits in viral metagenomic analyses (3) or from different regions of the rRNA molecule using ARB's parsimony insertion tool (14). Weighted UniFrac will thus be an important tool for large-scale comparisons across multiple community samples collected by different researchers at different times. Unfortunately, information about the abundance of each sequence in the sample and about whether clones were prescreened for diversity by restriction fragment length polymorphisms or related techniques is typically not available in public databases such as GenBank. Now that methods are available to analyze such data on a global scale, the development of resources that collect this information is more critical than ever.
We conclude that both quantitative and qualitative measures of ß diversity have specific niches in the analysis of microbial communities and that using both types of measures will often be critical for understanding the factors that underlie microbial diversity.
Catherine Lozupone was supported by NIH predoctoral training grant T32 GM08759. This work was supported in part by the W. M. Keck RNA Bioinformatics Initiative and by a donation from the Jane and Charlie Butcher Foundation.
Published ahead of print on 12 January 2007. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»