Use of Antibiotic Resistance Analysis for Representativeness Testing of Multiwatershed Libraries

ABSTRACT The use of antibiotic resistance analysis (ARA) for microbial source tracking requires the generation of a library of isolates collected from known sources in the watershed. The size and composition of the library are critical in determining if it represents the diversity of patterns found in the watershed. This study was performed to determine the size that an ARA library needs to be to be representative of the watersheds for which it will be used and to determine if libraries from different watersheds can be merged to create multiwatershed libraries. Fecal samples from known human, domesticated, and wild animal sources were collected from six Virginia watersheds. From these samples, enterococci were isolated and tested by ARA. Based on cross-validation discriminant analysis, only the largest of the libraries (2,931 isolates) was found to be able to classify nonlibrary isolates as well as library isolates (i.e., was representative). Small libraries tended to have higher average rates of correct classification, but were much less able to correctly classify nonlibrary isolates. A merged multiwatershed library (6,587 isolates) was created and was found to be large enough to be representative of the isolates from the contributing watersheds. When isolates that were collected from the contributing watersheds approximately 1 year later were analyzed with the multiwatershed library, they were classified as well as the isolates in the library, suggesting that the resistance patterns are temporally stable for at least 1 year. The ability to obtain a representative, temporally stable library demonstrates that ARA can be used to identify sources of fecal pollution in natural waters.

The contamination of streams, rivers, and estuaries with untreated fecal material from nonpoint sources continues to be a major environmental problem. Nonpoint source fecal pollution can contain high levels of nitrogen and phosphorus, which contribute to eutrophication. Pollution from fecal sources can also contain pathogenic bacteria, protozoa, and viruses, which can contribute to an increased public health risk. Although methods of quantifying the amount of fecal pollution in natural waters are well established, there is a critical need for the development of methods for the identification of the sources of the pollution to facilitate the remediation of the polluted waters.
For all library-based methods, the size and composition of the library are critical for accurate and reliable source determination. In order for a library to reliably identify fecal sources, it needs to be representative of the sources that are present in the watersheds (i.e., it should contain examples of all of the patterns found in the bacteria from each of the source types that are found in the watershed). Additionally, an ideal library should be representative enough to be able to classify fecal isolates from other geographic areas and should be stable over time so that new libraries do not need to be continually created. Our laboratory has focused on using ARA for source identification and has created libraries of enterococci from six Virginia watersheds. The purpose of this paper is to examine the relationship between the size of ARA libraries and their representativeness and to determine if libraries from different watersheds can be merged to create multiwatershed libraries that are geographically representative and temporally stable.

MATERIALS AND METHODS
Watersheds and library composition. Fecal samples from known sources were collected from six Virginia watersheds (Table 1). Land use of these watersheds varied, including urban (high proportions of paved surface, with a high density of homes connected to a sewer system), rural (primarily animal or crop agricultural land use), and wooded (undeveloped land predominately covered by trees). Fecal samples were collected from three types of sources: human (septic tanks and sewage influent), domesticated animal (beef, chicken, dairy, dog, goat, horse, sheep, and turkey), and wild animal (deer, goose, duck, bird, groundhog, and raccoon). When possible, feces from three to five animals were combined into one sample to maximize the diversity of the library. Human samples were obtained from septic tank pumpouts and from the primary influent to sewage treatment plants. Domesticated animal samples were obtained from farms that raised the animals or (in the case of dogs) from city parks. Wild animal scat samples were collected in the field or from trapped animals and were identified by wildlife biologists. The number of samples of each source type varied among the libraries and ranged from 9 to 191 samples per type per library (Table 2). Fecal samples were collected from as many unique locations within the watershed as time and budgets allowed.
In addition, libraries were created either by merging or by removing isolates from the six individual libraries. A multiwatershed merged library was created by combining the isolates from each of the six individual libraries. For testing the similarity between libraries, combinations of five libraries were used to classify isolates from the remaining sixth library into the three source types: human, domesticated, and wild. For determining the effect of library size on classification success and representativeness, sublibraries of various sizes (5, 10, 20, 40, 60, or 80% of the full library) were randomly generated (along with their corresponding groups of held-out isolates) from the Long Glade library by using the SURVEYSELECT procedure of SAS (SAS version 8.1; SAS Institute, Inc.). To generate the sublibraries, 5, 10, 20, 40, 60, or 80% of the isolates in each of the 349 samples in the Long Glade library were randomly selected. Three sublibraries were generated for each sublibrary size.
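The per-sample subsampling described above can be sketched in Python. This is an illustration only, not the SAS code used in the study: the function name draw_sublibrary, the (sample_id, isolate_id) record layout, and the rule of rounding to the nearest whole isolate while keeping at least one per sample are all assumptions.

```python
import random
from collections import defaultdict


def draw_sublibrary(isolates, fraction, seed=None):
    """Split isolates into (sublibrary, held_out), sampling within each fecal sample.

    isolates: list of (sample_id, isolate_id) tuples (hypothetical layout).
    fraction: share of each sample's isolates to keep, e.g. 0.05 to 0.80.
    """
    rng = random.Random(seed)
    by_sample = defaultdict(list)
    for sample_id, isolate_id in isolates:
        by_sample[sample_id].append(isolate_id)

    sublibrary, held_out = [], []
    for sample_id in sorted(by_sample):
        members = by_sample[sample_id]
        # Assumption: round to the nearest whole isolate, keeping at least one.
        k = max(1, round(fraction * len(members)))
        chosen = set(rng.sample(members, k))
        for isolate_id in members:
            target = sublibrary if isolate_id in chosen else held_out
            target.append((sample_id, isolate_id))
    return sublibrary, held_out
```

Calling the function with three different seeds for each fraction would give three replicate sublibraries per size, each paired with its own set of held-out isolates.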
Statistical analysis. For each of the six individual libraries and for the multiwatershed merged library, the data for each of the known isolates' ability to grow in the presence of each concentration of each antibiotic were classified by discriminant analysis by the DISCRIM procedure of SAS (prior probabilities, equal; covariance matrix, pooled). Within each sample, only isolates with unique resistance patterns were analyzed. (Isolates with identical patterns were discarded by using the NODUPRECS option of the SORT procedure.) The percentage of unique isolates ranged from 64% in Goose Creek to 76% in Blacks Run, and the average number of unique isolates per sample ranged from 7 (Goose Creek) to 14 (Blacks Run).
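The within-sample removal of duplicate resistance patterns can be illustrated with a short Python sketch. It only loosely mirrors the effect of the NODUPRECS option; the function name and the (sample_id, pattern) record layout are assumptions made for the illustration.

```python
def unique_patterns_per_sample(isolates):
    """Keep the first isolate for each (sample, resistance pattern) pair.

    isolates: iterable of (sample_id, pattern), where pattern is a sequence of
    0/1 growth results, one per antibiotic concentration (hypothetical layout).
    """
    seen = set()
    kept = []
    for sample_id, pattern in isolates:
        key = (sample_id, tuple(pattern))
        if key not in seen:
            # First time this pattern appears in this sample: keep it.
            seen.add(key)
            kept.append(key)
    return kept
```

The same pattern is retained once in every sample in which it occurs, so identical patterns from different samples remain in the library.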
The average rate of correct classification (ARCC) of the various libraries, sublibraries, and sets of held-out isolates was computed by averaging the percentages of correctly classified isolates for each source (35). Additionally, the rates of misclassification were calculated. To determine the extent of misclassification of isolates from other sources into a given source type, the percentages of each of the other source types that were misclassified as that type were averaged. This value was termed the "expected frequency of misclassification" by Harwood et al. (17). As a way to determine the lower limit for considering a source to be a significant contributor to a watershed, Whitlock et al. (33) averaged the expected frequencies of misclassification for all sources and then added four times the value of the standard deviation to the average. We propose calling this value the "minimum detectable percentage" (MDP). Thus, if a source is found at levels above the MDP, it can be reasonably assumed that this is not the result of misclassification of other sources and therefore is present in the watershed.
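The three quantities defined above can be computed from a table of classification counts, as in the following Python sketch. The function names and the nested-dictionary layout (true source, then assigned source) are assumptions. The text does not state whether a sample or a population standard deviation was used for the MDP, so the sketch exposes both and defaults to the population form.

```python
from statistics import mean, pstdev, stdev


def arcc(confusion):
    """Average rate of correct classification: mean of per-source percent correct.

    confusion[true_source][assigned_source] = number of isolates.
    """
    return mean(100.0 * row[src] / sum(row.values())
                for src, row in confusion.items())


def expected_misclassification(confusion, target):
    """Mean percentage of each other source type that was classified as target."""
    return mean(100.0 * row[target] / sum(row.values())
                for src, row in confusion.items() if src != target)


def minimum_detectable_percentage(frequencies, sample_sd=False):
    """Mean of the expected misclassification frequencies plus four standard deviations."""
    # Assumption: population standard deviation unless sample_sd is requested.
    sd = stdev(frequencies) if sample_sd else pstdev(frequencies)
    return mean(frequencies) + 4 * sd
```

Because the ARCC averages the per-source percentages rather than pooling all isolates, a source type with many isolates does not dominate the result.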
Representativeness testing. The representativeness of a library can be estimated by how well the library can classify nonlibrary isolates (isolates from the same watershed that are not included in the library). If the nonlibrary isolates are classified (on average) as well as the library isolates, then the library contains enough representation of the patterns to provide confidence in the classification of unknown isolates from the watershed. In the standard type of discriminant analysis (called "resubstitution analysis"), each isolate is classified based on the patterns in the entire library, including its own pattern. As a result, the ARCCs from resubstitution analysis may overestimate representativeness, because each isolate is classified against itself (i.e., it is classified by using a library that contains its resistance pattern). Therefore, in addition to resubstitution analyses, cross-validation analyses (also called "jackknife" or "leave-one-out" analyses) were performed to determine the representativeness of the libraries. For the cross-validation analyses, an individual isolate (or all of the isolates from the same sample) was removed from the library one at a time. Then, the removed isolate or isolates were classified based on the library composed of the remaining isolates, and the ARCC for these removed isolates was calculated. Two types of cross-validation analyses were performed: pulled-isolate analysis and pulled-sample analysis. For the pulled-isolate analyses, each isolate in the library was removed separately. However, isolates from the same sample might have similar patterns, which would make the library seem more representative than it actually is. To eliminate this possibility, pulled-sample analyses were performed in which all of the isolates from a common sample were removed at the same time.
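The difference between resubstitution and pulled-sample analysis can be sketched in Python. A nearest-centroid rule is substituted here for the discriminant analysis of the DISCRIM procedure so that the example needs only the standard library; it is a simplified stand-in, not the classifier used in this study. The (sample_id, source, pattern) record layout and all function names are assumptions.

```python
from collections import defaultdict


def centroids(library):
    """Mean resistance profile per source; rows are (sample_id, source, pattern)."""
    sums, counts = {}, defaultdict(int)
    for _, source, pattern in library:
        if source not in sums:
            sums[source] = [0.0] * len(pattern)
        for j, value in enumerate(pattern):
            sums[source][j] += value
        counts[source] += 1
    return {s: [v / counts[s] for v in sums[s]] for s in sums}


def classify(pattern, cents):
    """Assign the source whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda s: sum((p - c) ** 2
                                        for p, c in zip(pattern, cents[s])))


def arcc(pairs):
    """Average of per-source percent correct over (true, assigned) pairs."""
    total, right = defaultdict(int), defaultdict(int)
    for true, assigned in pairs:
        total[true] += 1
        right[true] += (true == assigned)
    return sum(100.0 * right[s] / total[s] for s in total) / len(total)


def resubstitution(library):
    """Classify every isolate against the full library, itself included."""
    cents = centroids(library)
    return arcc([(src, classify(pat, cents)) for _, src, pat in library])


def pulled_sample(library):
    """Classify each sample's isolates against a library lacking that sample."""
    pairs = []
    for held in sorted({sid for sid, _, _ in library}):
        rest = [row for row in library if row[0] != held]
        cents = centroids(rest)
        pairs += [(src, classify(pat, cents))
                  for sid, src, pat in library if sid == held]
    return arcc(pairs)
```

Grouping the removal by sample_id is what distinguishes the pulled-sample analysis from the pulled-isolate analysis; removing one row at a time instead would give the pulled-isolate variant.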
Temporal and geographical testing. To determine the temporal stability of the libraries, fecal samples were collected from the Moores Creek, Holmans Creek, and Long Glade Creek watersheds approximately 1 year after the original library samples were collected. To measure the geographic range of the libraries, samples were collected from watersheds in southwestern Virginia and in southern Florida. The isolates obtained from these samples will be referred to as "temporal validation isolates" and "geographical validation isolates," respectively.

RESULTS
Classification of the known isolates was performed for each library. When resubstitution analysis was used, the ARCCs ranged from 65 to 81% (Table 3). The ARCC of the merged library (57%) was lower than those of any of the individual libraries. Human isolates were generally classified the best, with domesticated and wild isolates classified approximately equally well. Cross-validation of these libraries by the pulled-isolate method resulted in slightly lower ARCCs (Table 4), with the differences between the two methods ranging from 2% for the Moores Creek and Long Glade libraries up to 8% for the Goose Creek library. The ARCC of the merged library did not change when analyzed by the pulled-isolate method. When pulled-sample cross-validation analyses were performed, there was a more marked reduction in ARCCs (Table 5). The differences between this method and the resubstitution analyses were large for most of the libraries (ranging from 8 to 24%), but were much smaller for the largest library (Long Glade Creek) and for the merged library (4 and 3%, respectively).
One way to estimate the representativeness of a library is to determine how well isolates obtained from new samples from the watershed are classified compared to how well library isolates from that source are classified. If isolates from new samples (nonlibrary isolates) from a given source are classified (on average) as well as the isolates from that source that are in the library, then the library can be considered representative for that source. The testing of new isolates can be simulated by cross-validation analyses (which remove isolates from the library and then classify them). If just one isolate per sample had been used to construct the library, the pulled-isolate method of cross-validation (which classifies each isolate based on a library from which it has been removed) would be a good method to use. However, because the libraries in this study contain multiple isolates from the same sample, the pulled-isolate method may not give a true measure of representativeness, because the similarity of isolates within samples may be greater than the similarity between samples. Therefore, the pulled-isolate method is inappropriate because it may overestimate representativeness. The pulled-sample cross-validation analysis avoids this problem by removing all of the isolates from each sample and then classifying them against the remaining isolates, thus simulating the testing of a set of multiple isolates from a new sample. Thus, by comparing the difference between the ARCC of the resubstitution analysis (which classifies each isolate based on a library containing all of the isolates) and the ARCC of the pulled-sample analysis (which classifies each isolate based on a library from which the isolates from its own sample have been removed), the representativeness of a library can be estimated. If the difference is small (less than 5%), then the library can be considered representative (i.e., new isolates are classified almost as well as isolates in the library). Based on this criterion, only the large libraries (Long Glade and the merged library) can be considered representative of the sources of enterococci.
Another way to estimate the representativeness of a library is to hold out a portion of the isolates from an existing library and then classify them by using the sublibrary that contains the remaining isolates. As with the cross-validation analyses, if these held-out isolates are classified (on average) as well as the isolates in the sublibrary, then the entire library can be considered representative. When various sizes of randomly selected sublibraries of the Long Glade library were created, the smallest libraries had the highest ARCCs, but were the poorest in classifying the held-out isolates (Fig. 1). For example, the smallest Long Glade sublibrary (5% of the full library) had an ARCC of 85%, but when the held-out isolates (the remaining 95% of the full library) were classified based on this small library, they were correctly classified at an average rate of just 48%. As the size of the sublibraries increased, the difference decreased. The held-out isolates were correctly classified about as well as the sublibrary isolates at sizes of about 80% of the full library. Thus, approximately 2,300 isolates were required to produce a representative library.
Because of the large number of isolates that are required for representativeness, it may be impractical to collect enough isolates from a single watershed. However, if the resistance patterns of isolates from different watersheds are similar, several smaller libraries could be combined to produce a larger, more representative multiwatershed library. To determine the similarity of the six libraries to each other, the isolates from each of the libraries were classified by using each of the other five libraries. Generally, the isolates were better classified against isolates from the same library (based on pulled-sample analysis) than they were against any of the other libraries (Fig. 2). The average of the differences between the pulled-sample ARCC and the ARCC of each library classified against each of the other five libraries was 14%.
To determine how well the isolates from each library would be classified by a larger, more representative library, each of the libraries was classified by using a library comprising the other five libraries combined (i.e., the merged library with the individual library removed). Several of the individual libraries were classified nearly as well against the other five libraries combined as they were against themselves (by the pulled-sample method) (Fig. 2). The average of these differences was 9%. The difference probably results from the lack of representativeness of several of the smaller libraries, as well as some geographical differences between the watersheds. However, the previously demonstrated representativeness of the merged six-watershed library suggests that the resistance patterns do not vary greatly over the geographical range of these watersheds (northern and central Virginia).
As a further test of the geographic range of the merged library, samples from known sources from south Florida and from southwestern Virginia were tested in our laboratory and then classified by the merged library. In general, classification of the isolates from these geographical validation samples was lower than the ARCC of the merged library, with correct classification of just 45% of Florida human isolates and 37% of Florida domesticated isolates (Table 6). The classification of southwest Virginia isolates was varied, with human isolates being classified better than the isolates in the library (71% compared to 61%), but with domesticated isolates classified much worse (18% compared to 52%).
If the merged library is to be most useful, the resistance patterns should be stable over time. To determine how stable the resistance patterns are, additional samples from three of the watersheds were collected approximately 1 year after the original library was collected. When compared to the original Moores Creek library, the isolates from these temporal validation samples from Moores Creek were classified worse than the isolates in the library (with an average correct classification rate of just 47%) (Table 6). However, when the merged library was used, they were classified at a higher rate (average = 60%). Similarly, the temporal validation isolates from Holmans Creek were classified better by using the merged library (average = 65%) than by using just the Holmans library (average = 48%). Long Glade Creek temporal validation isolates were classified better than the isolates in the library (84%) by using both the Long Glade library and the merged library. On average, all three sets of temporal validation isolates were classified better than the isolates from the merged library, demonstrating that the patterns are temporally stable over a period of 1 year.
Given that the merged library is representative and is temporally stable, this would be the library of choice for the identification of unknown isolates. Although the ARCC of the library (Table 7) is 57%, it can still provide useful information, because the classification success rate is well above the rate that would be produced by random classification (33%). For this analysis, the expected frequencies of misclassification for human, domesticated, and wild sources were 20, 22, and 22%, respectively. The average rate at which the entire library was misclassified was 21% ± 1% (mean ± standard deviation), which resulted in an MDP of 25%. Thus, if an average of more than 25% of a particular source type is detected in a watershed, it can be reasonably assumed that it is a true source and not just a result of misclassification of the other source types.
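As a check on the arithmetic, the MDP can be written out from the rounded values reported above. The symbols are introduced here for illustration only: \bar{f} denotes the mean expected frequency of misclassification and s its standard deviation.

```latex
\bar{f} = \frac{20 + 22 + 22}{3} \approx 21\%, \qquad
\mathrm{MDP} = \bar{f} + 4s \approx 21\% + 4 \times 1\% = 25\%
```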

DISCUSSION
The representativeness of the libraries was tested by several techniques. If multiple isolates are included from each sample, the pulled-isolate method does not give a true measure of representativeness, because the similarity of isolates within samples is greater than the similarity between samples. Therefore, the pulled-sample method, even though it is harder to perform, is a better measure of representativeness. By using the pulled-sample method, it is clear that larger libraries are more representative. Both the Long Glade Creek library and the 6,500-isolate merged library show approximately equal classification success when isolates from a nonlibrary sample (one that has been pulled out) are classified and thus are representative. At least 2,300 of the isolates in the Long Glade library were required for it to be representative, which suggests that this is the magnitude of a minimum size for a representative library. Unfortunately, these larger libraries also have the lowest ARCCs. This lowered classification success of the larger libraries probably results from variability in the resistance patterns of the isolates within each source type in the watersheds. In other words, the more isolates of each source type that are contained in the library (i.e., the more representative it is), the greater the chance they will vary in their resistance patterns, which would thus result in lower classification success. If there is variability in the patterns in a watershed, then the ARCC of a small library could be misleading, because it would be unable to classify the large number of unknown isolates that have a pattern that is not included in it. Thus, it is unwise to rely only on the ARCC of a library without also knowing its representativeness.
The levels of classification of the temporal validation isolates are encouraging for maintaining a merged library over the period of 1 year. Although the long-term usefulness of ARA libraries has yet to be demonstrated, many source tracking studies are conducted over the course of a single year, so the temporal stability that we observed for this library would be sufficient for these studies. Because the creation of a representative library requires considerable effort, it would be desirable to have long-term temporal stability. To determine this, we are continuing to monitor the resistance patterns in the Holmans Creek and Long Glade Creek watersheds. There were some differences in classification success from watershed to watershed. This may be indicative of real differences between watersheds (feeding practices, antibiotic usage, types of animals, etc.), or it may be a result of nonrepresentativeness of the individual libraries. When the libraries were merged, the differences decreased, suggesting that the lack of representativeness was causing some of the differences and that geographically close watersheds share some patterns. Unfortunately, the lower classification rates of the geographical validation isolates suggest that libraries will have to be comprised of isolates from a specific region for them to be useful. Further research will be needed to determine the size of these regions.
The ARCCs for the six libraries, especially those of the large Long Glade Creek library and the merged library, are generally lower than the values obtained in previous studies. However, for the watersheds presented here, only unique isolates have been included in the analyses. The presence of duplicate isolates can artificially raise the ARCC by the inclusion of more similar patterns of each source type. Thus, the ARCCs in this paper are more truly indicative of the actual classification abilities of ARA. Also, the Long Glade and merged libraries are larger than previously published libraries, and larger libraries have lower ARCCs. Because ARA is rapid and inexpensive, however, large numbers of unknown isolates can be tested, which can overcome the lower ARCC. Even with an ARCC of 57% and an MDP of 25%, the merged library can be useful for source tracking studies. Often, only the major source(s) of fecal pollution will need to be identified for these studies, and this library can reliably identify sources that are present at average levels above 25%.
The ability of ARA to differentiate among the different source types demonstrates the validity of our working hypothesis that differential antibiotic use selects for bacterial populations with different resistance patterns in animals of different sources. This is an advantage over other source tracking methods that do not rely on detecting a selectable difference. Many library-based methods look for overall genetic differences among the strains, but not for any specific difference. However, overall genetic differences may be hard to demonstrate because of the diverse (10; G. A. Dykes, Letter, Appl. Environ. Microbiol. 68:4698, 2002) and changeable (11) nature of the gut microflora. One specific selectable difference between strains that is commonly proposed is that of adhesion or attachment to the intestinal wall. However, recent evidence shows that although strains of Lactobacillus that adhere to the mucosa of the gut wall are host specific, they are very different from the strains present in the feces (36). By relying on a selectable difference that is exogenously applied to the guts of the source animals, ARA can somewhat reduce the effects of the variation in gut composition.
In conclusion, jackknife analysis of multiple ARA libraries has shown that large libraries are required for representativeness, that libraries from several watersheds can be merged to produce a library that is large enough to be representative, and that the resistance patterns within watersheds are stable over the span of 1 year. The ability to obtain a representative, temporally stable library demonstrates that ARA can be used to identify sources of fecal pollution in natural waters.