Expanding the Diversity of Bacterioplankton Isolates and Modeling Isolation Efficacy with Large-Scale Dilution-to-Extinction Cultivation

Even before the coining of the term “great plate count anomaly” in the 1980s, scientists had noted the discrepancy between the number of microorganisms observed under the microscope and the number of colonies that grew on traditional agar media. New cultivation approaches have reduced this disparity, resulting in the isolation of some of the “most wanted” bacterial lineages. Nevertheless, the vast majority of microorganisms remain uncultured, hampering progress toward answering fundamental biological questions about many important microorganisms. Furthermore, few studies have evaluated the underlying factors influencing cultivation success, limiting our ability to improve cultivation efficacy. Our work details the use of dilution-to-extinction (DTE) cultivation to expand the phylogenetic and geographic diversity of available axenic cultures. We also provide a new model of the DTE approach that uses cultivation results and natural abundance information to predict taxon-specific viability and iteratively constrain DTE experimental design to improve cultivation success.

16S rRNA gene amplicon analyses to compare cultivation results with the microbial communities in the source waters. We have previously reported on the success of our artificial media in obtaining abundant taxa over the course of the first seven experiments from this campaign (35). Here, we expand our report to include cultivation results from a total of 17 experiments, and we update the classic viability calculations of Button et al. (33) with a new model to estimate the viability of individual taxa using relative-abundance information. New isolates belonged to cultivated groups in eight putatively novel genera and seven putatively novel species in previously cultivated genera and expanded cultured geographic representation for many important clades like SAR11. Additionally, using model-based predictions, we identified possible taxon-specific viability variation that can influence cultivation success. By incorporating these new viability estimates into the model, our method facilitates statistically informed experimental design for targeting individual taxa, thereby reducing uncertainty for future culturing work (59).

RESULTS
General cultivation campaign results. We conducted a total of 17 DTE cultivation experiments to isolate bacterioplankton (sub-2.7-µm fraction), with paired microbial community characterization of source waters (0.22- to 2.7-µm fraction), from six coastal Louisiana sites over a 3-year period (see Table S1 in the supplemental material, available at https://doi.org/10.6084/m9.figshare.12142113). We inoculated 7,820 distinct cultivation wells (all experiments) with an estimated 1 to 3 cells · well⁻¹ using overlapping suites of artificial seawater media, JW (years 1 and 2 [35]) and MWH (year 3), designed to match the natural environment (Table 1). The MWH suite of media was modified from the JW media to additionally include choline, glycerol, glycine betaine, cyanate, dimethyl sulfoxide (DMSO), dimethylsulfoniopropionate (DMSP), thiosulfate, and orthophosphate (Table S1). These compounds have been identified as important metabolites and osmolytes for marine and freshwater microorganisms and were absent in the first iteration (JW) of our media (60-66). A total of 1,463 wells were positive (>10⁴ cells · ml⁻¹), and 738 of these were transferred to 125-ml polycarbonate flasks. For four experiments (FWC, FWC2, JLB2, and JLB3) we transferred only a subset of positives (48/301, 60/403, 60/103, and 60/146) because the number of isolates exceeded our ability to maintain and identify them at that time (Table 1). The subset of positive wells for these four experiments was selected using flow cytometry signatures usually indicative of smaller oligotrophic cells like SAR11 strain HTCC1062 (49) using our settings. Of the 738 wells from which we transferred cells across all experiments, 328 yielded repeatably transferable isolates that we deemed pure cultures based on 16S rRNA gene PCR and Sanger sequencing.
Phylogenetic and geographic novelty of our isolates.
The 328 isolates belonged to three phyla: Proteobacteria (n = 319), Actinobacteria (n = 8), and Bacteroidetes (n = 1) (see Fig. S1 to S5 in the supplemental material). We placed these isolates into 55 groups based on their positions within 16S rRNA gene phylogenetic trees (Fig. S1 to S5) and as a result of having ≥94% 16S rRNA gene sequence identity to other isolates. We applied a nomenclature to each group based on previous 16S rRNA gene database designations and/or other cultured representatives (Fig. 1; Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]). Isolates represented eight putatively novel genera with ≤94.5% 16S rRNA gene identity to a previously cultured representative: the Actinobacteria acIV subclades A and B and one other unnamed Actinobacteria group, an undescribed Acetobacteraceae clade (Alphaproteobacteria), the freshwater SAR11 LD12 ("Candidatus Fonsibacter ubiquis" [29]), the MWH-UniPo and an unnamed Burkholderiaceae clade (Betaproteobacteria), and the OM241 Gammaproteobacteria (Fig. 1; Table S1). Seven additional putatively novel species in other genera were also isolated (between 94.6 and 96.9% 16S rRNA gene sequence identity) in unnamed Comamonadaceae and Burkholderiales clades (Betaproteobacteria), the SAR92 clade and Pseudohongiella genus (Gammaproteobacteria), and unnamed Rhodobacteraceae and Bradyrhizobiaceae clades, as well as Maricaulis spp. (Alphaproteobacteria) (Fig. 1). Louisiana State University Culture Collection (LSUCC) isolates belonging to the groups BAL58 Betaproteobacteria (Fig. S4), OM252 Gammaproteobacteria, HIMB59 Alphaproteobacteria, and what we designated the LSUCC0101-type Gammaproteobacteria (Fig. S5) had close 16S rRNA gene matches to other isolates at the species level; however, none of those previously cultivated organisms have been formally described (Fig. 1). The OM252, BAL58, and MWH-UniPo clades were the most frequently cultivated, with 124 of our 328 isolates belonging to these three groups (Table S1). In total, 73 and 10 of the 328 isolates belonged in putatively novel genera and novel species in previously cultivated genera, respectively. We estimated that at least 310 of these isolates were geographically novel, being the first of their type cultivated from the nGOM (Fig. 2). This included isolates from cosmopolitan groups like SAR11 subclade IIIa, OM43 Betaproteobacteria, SAR116, and HIMB11-type "Roseobacter" spp. Cultivars from Vibrio sp. and Alteromonas sp. were the only two groups with close relatives (species level) isolated from the GOM.
FIG 1 Percent identity of LSUCC isolate 16S rRNA genes compared with those from other isolates in NCBI ("Other," gray dots) or from the DTE culture collections IMCC (gold dots), HTCC (blue dots), and HIMB (green dots). Each dot represents a pairwise 16S rRNA gene comparison (via BLASTn). x-axis categories are groups designated according to ≥94% sequence identity and phylogenetic placement (see Fig. S1 to S5 in the supplemental material). Above the graph is the 16S rRNA gene sequence percent identity to the closest non-LSUCC isolate within a column. Groups colored in red are those where LSUCC isolates represent putatively novel genera, whereas orange indicates putatively novel species.
Natural abundance of isolates. We matched LSUCC isolate 16S rRNA gene sequences with both operational taxonomic units (OTUs) and amplicon single-nucleotide variants (ASVs) from bacterioplankton communities to assess the relative abundances of our isolates in the coastal nGOM waters that served as inocula. While OTUs provide a broad group-level designation (97% sequence identity), this approach can artificially combine multiple ecologically distinct taxa (67). Due to higher stringency for defining a matching 16S rRNA gene, ASVs can increase the confidence that our isolates represent environmentally relevant organisms (68,69). However, while many abundant oligotrophic bacterioplankton clades, such as SAR11 (29,70), OM43 (40,41), SAR116 (71), and Sphingomonas spp. (72), have a single copy of the rRNA gene operon, other taxa can have multiple rRNA gene copies (70,73), complicating ASV analyses. Since we could not a priori rule out multiple rRNA gene operons for novel groups with no genome-sequenced representatives, we used both OTU and ASV approaches. In total, we obtained at least one isolate from 40 of the 777 OTUs and 71 of the 1,323 ASVs observed throughout the 3-year data set. Forty-three percent and 26% of LSUCC isolates matched the top 50 most abundant OTUs (median relative abundances for all sites, from 8.1 to 0.11% [see Fig. S6A in the supplemental material]) and ASVs (mean relative abundances for all sites, from 3.8 to 0.11% [Fig. S6B]), respectively, across all sites and samples. Microbial communities from all collected samples clustered into two groups corresponding to those inhabiting salinities below 7 and above 12, and salinity was the primary environmental driver distinguishing community beta diversity (OTU, R² = 0.88, P = 0.001; ASV, R² = 0.89, P = 0.001).
As part of the cultivation strategy after the first five experiments, we used a suite of five media differing by salinity and matched the experiment with the medium that most closely resembled the salinity at the sample site. Consequently, our isolates matched abundant environmental groups from both high- and low-salinity regimes. At salinities above 12, LSUCC isolates matched 13 and 14 of the 50 most abundant OTUs and ASVs, respectively (Fig. 3A and 4A; Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]). These taxa included the abundant SAR11 subclade IIIa.1, HIMB59, HIMB11-type "Roseobacter," and SAR116 Alphaproteobacteria; the OM43 Betaproteobacteria; and the OM182 and LSUCC0101-type Gammaproteobacteria. At salinities below 7, 10 and 9 of the 50 most abundant OTUs and ASVs, respectively, were represented by LSUCC isolates, including one of the most abundant taxa in both cluster sets, SAR11 LD12 (Fig. 3B and 4B). Some taxa, such as SAR11 IIIa.1 and OM43, were among the top 15 most abundant taxa in both salinity regimes (Fig. 3 and 4; Table S1), suggesting a euryhaline lifestyle. In fact, our cultured SAR11 IIIa.1 ASV7471 was the most abundant ASV in the aggregate data set (Fig. S6).
Overall, this effort isolated taxa representing 18 and 12 of the top 50 most abundant OTUs and ASVs, respectively (  OTUs and ASVs, respectively), either because their matching OTUs/ASVs were below our thresholds for inclusion (at least two reads from at least two sites) or because they were below the detection limit from our sequencing effort (Table 2). Thus, 43% and 30% of our isolates belonged to OTUs and ASVs, respectively, with median relative abundances of >0.1%.
Modeling DTE cultivation. An enigma that became immediately apparent through a review of our data was the absence of an obvious relationship between the abundance of a given taxon in the inoculum and the frequency of obtaining an isolate of the same type from a DTE cultivation experiment (see Fig. S7 and S8 in the supplemental material). For example, although we could culture SAR11 LD12 over a range of medium conditions (29) and the matching ASV had relative abundances of >5% in six of our 17 experiments (Fig. 5), we isolated only one representative (LSUCC0530). In an ideal DTE cultivation experiment where cells are randomly subsampled from a Poisson-distributed population, if the medium is sufficient for a given microorganism's growth, then the number of isolates should correlate with that microorganism's abundance in the inoculum. However, a qualitative examination of several abundant taxa that grew in our media, some of which we cultured on multiple occasions, revealed no clear pattern between abundance and isolation success (Fig. 5). Considering that medium composition was sufficient for cultivation of these organisms on at least some occasions, we hypothesized that cultivation frequency may reflect differences in the capacity for growth within populations of a given taxon.
Thus, we decided to model cultivation frequency in relationship to estimated abundances in a way that could generate estimates of cellular viability, defined herein as meaning "presently able to grow in defined medium," as opposed to a broader definition equating viability with being alive more generally, since we evaluated only growth capacity in this study. We hoped that modeling might also help us inform experimental design and make DTE cultivation efforts more predictable (59). Previously, Don Button and colleagues developed a statistical model for viability (V) of cells in the entire population for a DTE experiment (33):
$$V = \frac{-\ln(1 - p)}{\lambda} \tag{1}$$
where p is the proportion of wells or tubes, n, with growth, z (p = z/n), and λ is the estimated number of cells inoculated per well (the authors used X originally). The equation uses a Poisson distribution to account for the variability in cell distribution within the inoculum and therefore the variability in the number of wells or tubes receiving the expected number of cells. We and others have used this equation in the past (26,28,35) to evaluate the efficacy of our cultivation experiments in the context of commonly cited numbers for cultivability using agar plate-based methods (13,17,75).
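As a minimal sketch of how equation 1 is applied, assuming it has the Poisson form V = −ln(1 − p)/λ implied by the definitions of p and λ, the calculation can be written as below. The function and argument names are illustrative and do not come from the original work.

```python
import math

def button_viability(positive_wells: int, total_wells: int, cells_per_well: float) -> float:
    """Whole-community viability V = -ln(1 - p) / lambda, with p = z / n."""
    if total_wells <= 0 or cells_per_well <= 0:
        raise ValueError("total_wells and cells_per_well must be positive")
    p = positive_wells / total_wells
    if p >= 1.0:
        # ln(0) is undefined, so no estimate exists when every well is positive.
        raise ValueError("viability is undefined when all wells are positive")
    return -math.log(1.0 - p) / cells_per_well

# Illustration only: pooling all 7,820 wells and 1,463 positives from the campaign
# across the reported inoculum range of 1 to 3 cells per well. The study itself
# evaluates each experiment separately (Table 1).
for lam in (1, 2, 3):
    print(lam, round(button_viability(1463, 7820, lam), 3))
```

Pooling the campaign totals is only a convenient way to exercise the function; per-experiment values depend on each experiment's own p and λ.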
While equation 1 was effective for its intended purpose, it has a number of drawbacks that limit its utility for taxon-specific application: (i) if p = 1, i.e., all wells are positive, then the equation is invalid; (ii) at high values of p and low values of λ, estimates of V can exceed 100% (Table 1); (iii) accuracy of viability, calculated by the asymptotic standard error (ASE) or the coefficient of variation (CV), was shown to be nonuniform across a range of λ, with greatest accuracy when true viability was ~10% (33), and thus, low viability, low values of λ, and small values of n were found to produce unreliable results; (iv) if p = 0, i.e., no positive wells are observed, estimates of viability that could produce 0 positive wells cannot be calculated; and (v) Button's original model assumes that a well will produce a pure culture only if the inoculated well contains one cell, but in contrast, in low-diversity samples, samples dominated by a single taxon, or experiments evaluating viability from axenic cultures across different media, a limitation that only wells with single cells are axenic will underestimate the expected number of pure wells.
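Drawbacks (i) and (ii) follow directly from the algebra, assuming equation 1 has the Poisson form V = −ln(1 − p)/λ (a form inferred here from the definitions of p and λ):

```latex
\begin{align*}
V &= \frac{-\ln(1-p)}{\lambda}, \qquad p = \frac{z}{n},\\
\lim_{p \to 1^{-}} V &= \infty \quad \text{(no finite estimate when every well is positive)},\\
V > 1 &\iff -\ln(1-p) > \lambda \iff p > 1 - e^{-\lambda}.
\end{align*}
```

For λ = 1 cell per well, for example, any experiment in which more than about 63% of wells are positive returns V > 100%.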
Evaluation of Large-Scale DTE Cultivation
Applied and Environmental Microbiology

To overcome these limitations, we developed a Monte Carlo simulation model that facilitates the incorporation of relative-abundance data from complementary community profiling data (e.g., 16S rRNA gene amplicons) to calculate the likelihood of positive wells, pure wells, and viability at a taxon-specific level, based on the observed number of wells for which we obtained an isolate of a particular taxon (Fig. 6). By employing a Monte Carlo approach, our model is robust across all values of p and n with uniform prediction accuracy, and we can estimate the accuracy of our prediction within 95% confidence intervals (CI). Furthermore, the width of 95% CI boundaries of viability, as well as the expected number of positive and pure wells, is entirely controllable and dependent only on available computational capacity for bootstrapping (i.e., these can be improved with more bootstrapping, but at greater computational cost). When zero positive wells are observed experimentally, our approach enables estimation of a maximum viability that could explain such an observation by identifying the range of viability values for which zero resides within the bootstrapped 95% CI. Finally, the ability to calculate the viability of the entire community, as in equation 1, is retained simply by estimating viability using a relative abundance of one. We compared our model to that of Button et al. for evaluating viability from whole-community experimental results, similarly to previous reports (26,28,35) (Table 1). Our viability estimates (V_est) generally agreed with those using equation 1, but we have now provided 95% CI to depict the maximum and minimum viabilities that would match the returned positive-well distribution, as well as maximum and minimum values for the number of wells that ought to have contained a single cell. Maximum V_est ranged from 1.1% to >92.3% depending on the experiment, with a median V_est across all our experiments of 8.6% (Table 1). In one case, the extremely high value (FWC2) was better handled by our model than by equation 1, because it did not lead to a viability estimation greater than 100%. FWC and FWC2 represent V_est outliers compared with the entire data set (maximums of 59.7% and >92.3%, respectively) (Table 1). We believe these high numbers most likely resulted from underestimating the number of cells inoculated into each well (because of the use of microscopy, the presence of clumped cells, or possible pipette error [described in reference 35]), thus increasing the estimated viability.
[Fig. 6 panel text: Step 1: Simulate n wells. Step 2: Simulate inoculation of wells from Poisson distribution (λ = inoculum). Step 3: Simulate likelihood of taxon from binomial distribution. Step 6: Count wells where viable cells ≥ 1. Step 7: Bootstrap steps 1-6 k times at different levels of viability, 0 ≤ v ≤ 1. Step 8: Identify min, max values of v where the number of observed wells falls within the bootstrapped 95% CI.]
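The simulation steps named for Fig. 6 (simulate wells, Poisson inoculation, binomial taxon assignment, count wells with viable cells, repeat k times, and read off the 95% CI) can be sketched with the Python standard library as follows. This is an illustrative reading, not the authors' code: all function names are hypothetical, and the rule used for a "pure" well (at least one viable cell of the target taxon and no cells of any other taxon) is an assumption, since not every step of the figure is available here.

```python
import math
import random

def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's method; adequate for the small lambda (1 to 3 cells per well) used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_pure_wells(rng, n_wells, lam, rel_abund, viability):
    """One simulated experiment: number of wells yielding a pure culture of the taxon."""
    pure = 0
    for _ in range(n_wells):
        cells = poisson_draw(rng, lam)                                # cells delivered to the well
        taxon = sum(rng.random() < rel_abund for _ in range(cells))   # cells of the target taxon
        viable = sum(rng.random() < viability for _ in range(taxon))  # target cells able to grow
        # Assumption: pure means no other taxon present and >= 1 viable target cell.
        if taxon == cells and viable >= 1:
            pure += 1
    return pure

def bootstrap_ci(n_wells, lam, rel_abund, viability, k=999, seed=1):
    """Empirical 95% interval (2.5th and 97.5th percentiles) over k simulated experiments."""
    rng = random.Random(seed)
    counts = sorted(simulate_pure_wells(rng, n_wells, lam, rel_abund, viability)
                    for _ in range(k))
    return counts[int(0.025 * (k - 1))], counts[int(0.975 * (k - 1))]

def viability_range(observed, n_wells, lam, rel_abund, k=999, grid=20):
    """Min and max viability on a coarse grid for which `observed` lies inside the interval."""
    accepted = []
    for i in range(grid + 1):
        v = i / grid
        low, high = bootstrap_ci(n_wells, lam, rel_abund, v, k)
        if low <= observed <= high:
            accepted.append(v)
    return (min(accepted), max(accepted)) if accepted else None
```

Setting `rel_abund=1.0` corresponds to the whole-community case described in the text. A call such as `viability_range(0, 460, 2, 0.145)` scans viability for a taxon with zero observed pure wells; the inoculum of 2 cells per well in that call is a hypothetical value, and run time grows with `k` and `grid`.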
Isolate-specific viability estimates. Our new model also facilitates taxon-specific viability estimates. Cultivation efficacy was evaluated for 71 cultured taxa matching ASVs within our detection limits (219 isolates) across 17 sites (1,207 pairwise combinations) by comparing the number of observed pure wells to those predicted by the Monte Carlo simulation using 9,999 bootstraps, 460 wells per experiment, and an assumption that all cells were viable (i.e., V = 100%). In total, for 1,158 out of 1,207 pairwise combinations (95.9%), the observed number of pure wells fell within the 95% CI of data simulated at matching relative abundance and inoculum size, suggesting that these two parameters alone could explain the observed cultivation success for most taxa (Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]). A total of 1,059 out of these 1,158 combinations (91%) recorded zero observed wells, but with a maximum relative abundance of 2.8% within these combinations, a score of zero fell within predicted 95% CI of simulations with 460 wells. Sensitivity analysis showed that with 460 wells per experiment, an observation of zero pure wells falls below the 95% CI's lower bound (and is thus significantly depleted to enable viability to be estimated) for taxa with relative abundances of 2.3%, 2.9%, and 4.5% for inoculum sizes of one, two, and three cells per well, respectively (see Fig. S9 in the supplemental material). In fact, modeling DTE experiments from 92 wells to 9,200 wells per experiment showed that for taxa comprising ~1% of a microbial community 1,104 wells (or 12 plates at 92 wells per plate), 1,380 wells (15 plates), and 2,576 wells (28 plates) were required to be statistically likely to recover at least one positive, pure well using inocula of one, two, or three cells per well, respectively, with V = 100% (Fig. S9).
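A closed-form approximation gives a quick sanity check on plate counts of this kind. The sketch below is not the Monte Carlo model itself and rests on assumptions that should be read as ours: cells per well are Poisson distributed, a pure well holds at least one viable target cell and no cells of other taxa, and "statistically likely" is taken to mean that the chance of recovering zero pure wells is at most 2.5%. The function names are hypothetical.

```python
import math

def prob_pure_well(lam: float, rel_abund: float, viability: float = 1.0) -> float:
    """P(well holds >= 1 viable target cell and no cells of other taxa).

    With Poisson-distributed cells per well, target and non-target counts are
    independent Poisson variables (Poisson thinning), so the two factors multiply.
    """
    return (1.0 - math.exp(-lam * rel_abund * viability)) * math.exp(-lam * (1.0 - rel_abund))

def wells_needed(lam: float, rel_abund: float, viability: float = 1.0, alpha: float = 0.025) -> int:
    """Smallest number of wells for which P(zero pure wells) is at or below alpha."""
    p = prob_pure_well(lam, rel_abund, viability)
    if p <= 0.0:
        raise ValueError("a pure well is impossible with these parameters")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

def plates_needed(lam: float, rel_abund: float, wells_per_plate: int = 92, **kwargs) -> int:
    """Whole plates required, at 92 usable wells per plate by default."""
    return math.ceil(wells_needed(lam, rel_abund, **kwargs) / wells_per_plate)

for lam in (1, 2, 3):
    print(lam, wells_needed(lam, 0.01), plates_needed(lam, 0.01))
```

For a taxon at 1% relative abundance this approximation gives 11, 15, and 27 plates for one, two, and three cells per well, within about one plate of the simulated 12, 15, and 28; the small differences are expected because the simulation uses bootstrapped percentile intervals rather than an exact probability.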
A small but taxonomically relevant minority (49 out of 1,207) of pairwise combinations had a number of observed pure wells that fell outside the simulated 95% CI with V = 100% (Fig. 7). Of these, 28 had either one, two, or three more observed pure wells than the upper 95% CI (Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]), suggesting cultivability higher than expected based purely on a model capturing the interaction between a Poisson-distributed inoculum and a binomially distributed relative abundance, with V = 100%. However, the deviance from the expected number of positive wells for those above the 95% CI was limited to three or fewer wells, meaning that we obtained only 1 to 3 more isolates than expected (Table S1). Conversely, those organisms that we isolated less frequently than expected showed greater deviance. Twenty-one of the 49 outliers had lower-than-expected cultivability (Fig. 7).
These taxa had relative abundances ranging from 2.7% to 14.5% but recorded only 0, 1, or 2 isolates. In the most extreme case, ASV7629 (SAR11 LD12) at site ARD2c comprised 14.5% of the community but recorded no observed pure wells, compared to an expected number of 13 to 30 isolates (95% CI) predicted by the Monte Carlo simulation.
All the examples of taxa that were isolated less frequently than expected given the assumption of V = 100% belonged to either SAR11 LD12, SAR11 IIIa.1, or one particular OM43 ASV (7241) (Fig. 6). We used our model to calculate estimated viability (V_est) for these organisms based on their cultivation frequency at sites where the assumption of V = 100% appeared to be violated (Table 3). Using the extreme example of SAR11 LD12 ASV7629 at site ARD2c, simulations across a range of V indicated that a result of zero positive wells fell within 95% of simulated values when the associated taxon V_est was ≤15%. When considering all anomalous cultivation results, LD12 had estimated maximum viabilities that ranged up to 55% (Table 3). OM43 (ASV7241) estimated maximum viabilities ranged from 52 to 80%, depending on the site, and similarly, SAR11 IIIa.1 ranged between 22 and 82% maximum viability (Table 3).
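As a rough analytical cross-check on the ARD2c example (a sketch under our own assumptions, not the bootstrap procedure itself), suppose that cells per well are Poisson distributed with mean λ, that a well counts as pure when it holds at least one viable cell of the target taxon and no cells of any other taxon, and that λ = 2 cells per well. The last value is hypothetical, because the inoculum for ARD2c is not stated in this section. With relative abundance r = 0.145, n = 460 wells, and a one-sided cutoff α = 0.025:

```latex
\begin{align*}
P_{\text{pure}}(v) &= \left(1 - e^{-\lambda r v}\right) e^{-\lambda (1 - r)},\\
n\,P_{\text{pure}}(1) &= 460 \left(1 - e^{-0.29}\right) e^{-1.71} \approx 21
  \quad \text{expected pure wells at } V = 100\%,\\
\left(1 - P_{\text{pure}}(v)\right)^{n} \ge \alpha
  &\iff v \le \frac{-\ln\!\left[1 - \left(1 - \alpha^{1/n}\right) e^{\lambda (1 - r)}\right]}{\lambda r}
  \approx 0.16 .
\end{align*}
```

Under these assumptions the expected count sits inside the simulated range of 13 to 30 isolates, and the largest viability compatible with zero pure wells is close to the reported V_est of ≤15%.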

DISCUSSION
This work paired 17 DTE cultivation experiments with cultivation-independent assessments of microbial community structure in source waters to evaluate cultivation efficacy. We generated 328 new bacterial isolates representing 40  This campaign led to the first isolations of the abundant SAR11 LD12 and Actinobacteria acIV, the second isolate of the HIMB59 Alphaproteobacteria, and new genera within the Acetobacteraceae, Burkholderiaceae, OM241 and LSUCC0101-type Gammaproteobacteria, and MWH-UniPo Betaproteobacteria, thereby demonstrating again that continued DTE experimentation leads to isolation of previously uncultured organisms with value for aquatic microbiology. We have also added a considerable collection of isolates to previously cultured groups such as OM252 Gammaproteobacteria, BAL58 Betaproteobacteria, and HIMB11-type "Roseobacter" spp., and the majority of our isolates represent the first versions of these types of taxa from the Gulf of Mexico, adding comparative biogeographic value to these cultures.
Our viability model improved upon the statistical equation developed by Button and colleagues (33) to extend viability estimates to individual taxa within a mixed community and provide 95% CI to constrain those estimates. We cultured several groups of organisms abundant enough to evaluate viability with 460 wells (Fig. 7; see Fig. S9 in the supplemental material). The fact that these organisms were successfully cultured at least once meant that we could reasonably assume that the medium was sufficient for growth.
Some taxa were cultivated more frequently than expected (Fig. 7). We explore two possible explanations for this phenomenon: errors in quantification and variation in microbial cell organization. Any systematic error that led to underestimating the abundance of an organism would have correspondingly resulted in our underestimating the number of wells in which we would expect to find a pure culture of that organism. Such underestimations could come from primer biases associated with amplicon sequencing (76-78), but we do not know if those protocols specifically underestimate the OM252, MWH-UniPo, and HIMB11-type taxa cultured more frequently than expected (Fig. 7). However, due to the low number of expected isolates in these groups and the small deviances in actual isolates from those expected numbers (within 1 to 3 isolates compared to expected values), the biases inherent in the relative abundance estimations for these taxa were probably small. Furthermore, one of the microorganisms isolated more frequently than expected matched the OM43 ASV1389 (see Fig. S6 in the supplemental material), whereas another OM43 ASV (7241) was cultivated less frequently than expected (see below), meaning that if primer bias were the cause of this discrepancy, it would have to be operating differently on very closely related organisms. One possible biological explanation for why some isolates might have been cultured more frequently than expected is clumped cells. If cells of any given taxon in nature grew in small clusters, then the number of cells we added to a well would have been greater than expected based on a Poisson distribution. Furthermore, the model assumes that each cell is independent and that the composition of a subset of cells is only a function of the relative abundance of the taxon in the community. Within a cluster of cells, this assumption is violated, as the probability of cells being from the same taxon is higher.
Thus, the model will underestimate the probability of a well being pure and therefore underestimate the number of pure wells likely to be observed within an experiment, leading to a greater number of isolates than expected. Future microscopy work could examine whether microorganisms such as OM252 and MWH-UniPo form small clusters in situ and/or in pure culture and whether this phenomenon may be different for different ASVs of OM43, or if clumping may be a transient phenotype.
We also identified three taxa (SAR11 LD12, SAR11 subclade IIIa.1, and the aforementioned OM43 ASV7241) that were isolated much less frequently than expected based on their abundances (Fig. 7; Table 3). This could mean that our assumption of V = 100% was incorrect or that, in contrast to the taxa that were cultured more frequently than expected (see above), our methods had biases that overestimated the abundance of these organisms, thereby overinflating the expected number of isolates. We used the modified 515/806RB primers, which have been shown to be much more accurate in quantifying SAR11 compared to fluorescent in situ hybridization (FISH) than the original 515/806 primers (within 6% ± 4% [standard deviation]), and this protocol almost always underestimates SAR11 abundance (76). This suggests that our expected number of isolates may have actually been underestimated and our cultivation success poorer than we measured, and therefore we may be overestimating viability for the SAR11 taxa in this study. Other sources of systematic error that might impinge on successful transfers, and thereby reduce our recovery, include sensitivity to pipette tip and/or flask material. However, the fact that these taxa were sometimes successfully isolated means that if these mechanisms were impacting successful transfers, then their activity was less than 100% efficient, which implies variations in subpopulation vulnerability that would be very similar conceptually to variations in subpopulation viability.
Another possible source of error that could have resulted in lower-than-expected numbers of isolates was the subset of experiments for which we did not transfer all positive wells due to limitations in available personnel time (Tables 1 and 3). However, our selection criteria for the subset of wells to transfer were based on flow cytometric signatures that would have encompassed small cells like SAR11 (see Results), and in any case, there were many examples of lower-than-expected recovery from other experiments where we transferred all positive wells (Table 3). Thus, we believe that these four experiments were unlikely to contribute major errors biasing our estimates of viability for SAR11 LD12, SAR11 IIIa.1, and other small cells like OM43.
If we instead explore biological reasons for the lower-than-expected numbers of positive wells in DTE experiments, a plausible explanation supported by the literature is simply that a large fraction of the population is in some state of inactivity or at least not actively dividing (79). Studies using uptake of a variety of radiolabeled carbon and sulfur sources have demonstrated that substantial fractions of SAR11 cells may be inactive, depending on the population (80-83). SAR11 cells in the northwest Atlantic and Mediterranean showed variable uptake of labeled leucine (30 to 50% [80,81] and ~25 to 55% [83,84]) and amino acids (34 to 61% [80,81] and 34 to 66% [80,81]). Taken in reverse, this means that up to 75% of the SAR11 population may be dormant at any given time. In another study focused on brackish communities, fewer than 10% of SAR11 LD12 cells took up labeled leucine and/or thymidine (82). While this was likely not the ideal habitat for LD12 based on salinities above six (29,82), this study supports the others described above that show substantial proportions of inactive SAR11 cells, the fraction of which may depend on environmental conditions and other unknown factors. Bioorthogonal noncanonical amino acid tagging (BONCAT) showed a similar trend for SAR11 (85). These results also match general data indicating prevalent inactivity among aquatic bacterioplankton (79, 86-88). Although labeled uptake methods do not directly measure rates of cell division, the incorporation of these compounds requires active DNA replication or translation, which represent an even more fundamental level of activity than cell division (89).
Why might selection favor high percentages of subpopulation dormancy? One possibility is as an effective defense mechanism against abundant viruses. Viruses infecting SAR11 have been shown to be extremely abundant in both marine (90,91) and freshwater (92) systems. Indeed, the paradox of high viral abundances and high host abundances in SAR11 has led to a refining of negative density-dependent selection through Lotka-Volterra predator-prey dynamics (93) to include heterogeneous susceptibility at the strain level (94,95) and positive density-dependent selection through intraspecific proliferation of defense mechanisms (96). Activity of lytic viruses infecting SAR11 in situ demonstrated that phages infecting SAR11 have lower ratios of viral transcripts to host cells than in other abundant taxa and that observed abrupt changes in these ratios suggest coexistence of several SAR11 strains with different life strategies and phage susceptibilities (97). Phenotypic stochasticity of phage receptor expression has been shown to maintain a small proportion of phage-insensitive hosts within a population, enabling coexistence of predator and prey without extinction (98). Phages adsorb to a vast array of receptor proteins on their hosts, with many well-characterized receptors (e.g., OmpC, TonB, BtuB, and LamB) associated with nutrient uptake or osmoregulation (99). Selection therefore favors phenotypes that limit receptor expression, with an associated fitness cost, particularly in nutrient-limited environments.
However, an alternative mechanism is possible if a population of cells comprised a small number of susceptible cells and a large number of either resistant or dormant cells where presentation of receptor proteins is retained. The majority of host-virus encounters would occur with resistant or dormant cells and would constrain viral propagation through inefficient or failed infection, effectively acting as a sink for infectious particles. Prevalent lysogeny in SAR11 populations would provide a mechanism for establishing resistant cells via superinfection immunity (100,101), where integration of a temperate phage prevents infection by other closely related viruses. There is growing evidence that many viruses infecting SAR11 are temperate (102,103) and that reversion to virulence can be triggered through nutrient limitation (103), in contrast to other systems where lysogeny is favored in nutrient-poor conditions (104). Viral infection may also trigger host dormancy, lowering cellular metabolism to minimize energy requirements under nutrient-limited conditions (105). Such cells would be selected against during cultivation experiments, potentially explaining the rarity of SAR11 isolate genomes found to contain prophages. Dormancy and/or lysogeny would also enable long-term costability between abundant phages and their hosts (106) and resolve the apparent paradox of high host and virus abundances (101).
Detailed measurements of dormancy in SAR11 and what types of cellular functions become inactivated are part of our ongoing work. In the meantime, it is prudent to examine the implications of a substantial proportion of nondividing cells for our understanding of basic growth dynamics. Studies attempting to measure SAR11 growth rates in nature have yielded a wide range of results, ranging from 0.03 to 1.8 day⁻¹ (70, 80, 83, 107-109). These span wider growth rates than observed for axenic cultures of SAR11 (0.4 to 1.2 day⁻¹), but isolate-specific growth ranges within that spread are much more constrained (29,36,49,110,111). Conversion factors for determining production from [³H]leucine incorporation (112) are accurate for at least some Ia subclade members of SAR11 (113), so variations in growth rate estimates from microradiography experiments likely have other explanations. It is possible that different strains of SAR11 simply have variations in growth rate not captured by existing isolates. Another, not mutually exclusive, possibility is that the differences in in situ growth rate estimates also reflect variations in the proportion of actively dividing cells within the population. A simple model of cell division with binary fission where only a subset of cells divide and nondividing cells persist, rather than die, can still yield logarithmic growth curves (see Fig. S10 in the supplemental material) like those observed for SAR11 in pure culture (29,49,114). However, this subpopulation variability means that the division rate for the subset of cells that are actively dividing is much higher than calculated when assuming 100% dividing cells in the population. Based on our estimated viability for SAR11 LD12 of 15 to 55%, to obtain our previously calculated maximum division rate (0.5 day⁻¹) for the whole culture (29), the per-cell division rate for only a subpopulation would span 2.48 to 0.79 day⁻¹ (Fig. S10 and supplemental text).
Verifying the proportion of SAR11 cells actively dividing in a culture may be challenging. Time-lapse microscopy (115) offers an elegant solution if SAR11 can be maintained for the requisite time periods for accurate measurements in a microfluidic device.
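The subpopulation division rates quoted above can be reproduced with a short calculation. The sketch below assumes one reading of the simple model: in each round of division a fraction V of all cells divides once, so the population multiplies by (1 + V) per round rather than by 2. Matching the whole-culture growth curve then gives a subpopulation rate of (culture rate) × ln 2 / ln(1 + V). The function name is hypothetical and this reading is an assumption; the authors' own derivation is given in Fig. S10 and the supplemental text.

```python
import math


def subpopulation_division_rate(culture_rate, viability):
    """Division rate needed by the dividing subset to match the whole culture.

    Assumption (not taken from the authors' code): per round of division the
    population grows by a factor (1 + viability) instead of 2, so
    2 ** (culture_rate * t) == (1 + viability) ** (subpop_rate * t).
    """
    if not 0.0 < viability <= 1.0:
        raise ValueError("viability must be in (0, 1]")
    return culture_rate * math.log(2.0) / math.log(1.0 + viability)


# Whole-culture maximum division rate of 0.5 per day, viability of 15% and 55%.
for v in (0.15, 0.55):
    print(v, round(subpopulation_division_rate(0.5, v), 2))
```

With a viability of 100% the function returns the whole-culture rate unchanged, which is a quick sanity check on the formula.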
In addition to identifying taxa whose isolation success suggested deviations from biological assumptions of single planktonic cells with 100% viability, the model also revealed the limitations of DTE cultivation in assessing viability depending on relative abundance (Fig. S9). We cannot ascertain whether any given taxon may violate an assumption of V = 100% unless we have enough wells to demonstrate that it grew in fewer wells than expected. For example, taxa at 1% of the microbial community require more than 1,000 wells before the lack of a cultured organism represents a significant negative event, rather than a taxon simply lacking sufficient abundance to ensure inclusion in a well within 95% CI. In our 460-well experiments, we could not resolve whether taxa may have had viabilities below 100% if they were less than 3% of the community for any given experiment (Fig. S9). Modeling DTE experiments showed that for experiments targeting rare taxa, lower inoculum sizes are favored where a selective medium for enrichment is either unknown or undesirable. The exponential increase in the number of required wells with respect to the inoculum size is a function of a pure well requiring all cells within it to belong to the same taxon, assuming all cells are equally and optimally viable.
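The exponential dependence on inoculum size can be made explicit under the Poisson and binomial assumptions of the model described in Materials and Methods. The derivation below is a sketch, not part of the original analysis: it writes λ for the mean inoculum size (cells per well) and r for the relative abundance of the taxon, assumes V = 100%, and sums over the number of cells k that land in a well.

```latex
\begin{align}
P(\text{pure}) &= \sum_{k \ge 1} \frac{e^{-\lambda}\lambda^{k}}{k!}\, r^{k}
               = e^{-\lambda}\left(e^{\lambda r} - 1\right), \\
\frac{1}{P(\text{pure})} &= \frac{e^{\lambda}}{e^{\lambda r} - 1}
               \approx \frac{e^{\lambda}}{\lambda r} \qquad (\lambda r \ll 1).
\end{align}
```

For a rare taxon, the expected number of wells per pure well therefore scales as e^λ/λ, which rises steeply once the inoculum exceeds one cell per well. The closed form is an illustration introduced here; the results reported in this work come from simulation.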
By providing taxon-specific predictions of viability from cultivation data, our model now facilitates an iterative process to improve experimental design and make cultivation more reliable. First, we use the cultivation success rates to determine for which taxa the assumption of 100% viability was violated. Second, we use the model to estimate viability for those organisms. Third, we use the viability and relative-abundance data to determine, within 95% CI, the appropriate number of inoculation attempts required to isolate a new version of that organism. Using SAR11 LD12 as an example, given a relative abundance of 10% and a viability of 15%, 800 DTE wells should yield four pure, positive wells (95% CI, 1 to 8). This means that, for microorganisms that we know successfully grow in our media, we can now statistically constrain the appropriate number of wells required to culture a given taxon again. For organisms that were not abundant enough to estimate viability using the model, we can use a conservative viability assumption (e.g., 50% [86]) with which to base our cultivation strategy, thereby still reducing uncertainty about the experimental effort necessary to reisolate one of these microorganisms.
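The SAR11 LD12 example above can be checked with a closed-form shortcut in place of simulation. The sketch below assumes Poisson-distributed cells per well, independent taxon membership, and independent per-cell viability; it also assumes a mean inoculum of 2 cells per well, which the example in the text does not state. All function names are hypothetical. Under these assumptions the expected count and interval come out close to the values quoted above.

```python
import math


def p_pure_growing(lam, r, v):
    """P(a well is pure for the taxon AND holds at least one viable cell).

    Sum over k >= 1 of Poisson(k; lam) * r**k * (1 - (1 - v)**k).
    """
    return math.exp(-lam) * (math.exp(lam * r) - math.exp(lam * r * (1.0 - v)))


def binom_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_p, log_1mp = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_1mp)
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return k
    return n


wells, lam, r, v = 800, 2.0, 0.10, 0.15  # lam = 2 is an assumption
p = p_pure_growing(lam, r, v)
expected = wells * p
interval = (binom_quantile(wells, p, 0.025), binom_quantile(wells, p, 0.975))
print(round(expected, 1), interval)
```

Changing `lam`, `r`, or `v` shows how sensitive the required effort is to each parameter, which is the practical use of the third step described above.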
Conclusions. This work has provided hundreds of new cultures for microbiological research, many among the most abundant members of the nGOM coastal bacterioplankton community. It also provides another demonstration of the effectiveness of sustained cultivation efforts for bringing previously uncultivated strains into culture.
Our modeled cultivation results have generated compelling evidence for low viability within subpopulations of SAR11 LD12 and IIIa.1, as well as OM43 Betaproteobacteria. The prevalence of and controls on dormancy in these clades deserve further study. We anticipate that future work with larger DTE experiments will yield similar viability data about other groups of taxa with lower abundance, highlighting a valuable diagnostic application of DTE cultivation/modeling beyond the primary role in isolating new microorganisms. The integration of cultivation results, natural-abundance data from inoculum communities, and DTE modeling represents an important step forward in quantifying the risk associated with DTE efforts to isolate valuable taxa from new sources or repeating isolation from the same locations. We hope variations of this approach will be incorporated into wider community efforts to invest in culturing the uncultured.

MATERIALS AND METHODS
Sampling. Surface water samples were collected at six different sites once a year for 3 years, except for Terrebonne Bay, which was collected twice. The sites sampled were Lake Borgne (Table S1 in the supplemental material [available at https://doi.org/10.6084/m9.figshare.12142113]). Water collection for biogeochemical and biological analysis followed the protocol described previously (35). Briefly, we collected surface water in a sterile, acid-washed polycarbonate bottle. Duplicate 120-ml water samples were filtered serially through 2. […] ⁻, and NO₂⁻. Samples for cell counts were filtered through a 2.7-µm GF/D filter, fixed with 10% formaldehyde, and stored on ice until enumeration (maximum of 3 h). Temperature, salinity, pH, and dissolved oxygen were measured using a handheld YSI 556 multiprobe system (YSI Inc., OH, USA). All metadata are available in Table S1.
DTE culturing and propagation. Isolation, propagation, and identification of isolates were completed as previously reported (29,35,116). A subsample of 2.7-µm-filtered surface water was stained with 1× SYBR green (Lonza, Basel, Switzerland) using a repeat pipettor and disposable tip (Gilson, WI, USA) and enumerated using a Guava EasyCyte 5HT HPL flow cytometer (Millipore, MA, USA) as described previously (116). After serial dilution to a predicted 1 to 3 cells · µl⁻¹, 2 µl water was inoculated into five 2-ml 96-well polytetrafluoroethylene (PTFE) plates (Radleys, Essex, UK) containing 1.7 ml artificial seawater medium (Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]) using a 20-µl multichannel pipette (Gilson, WI, USA) to achieve an estimated 1 to 3 cells · well⁻¹ (Table 1). The salinity of the medium was chosen to match in situ salinity after experiment JLB (January 2015) (Tables 1 and S1). After year two, a second generation of media, designated MWH, was designed to incorporate additional important osmolytes, reduced sulfur compounds, and other constituents (Tables 1 and S1) potentially necessary for in vitro growth of uncultivated clades (49, 117-123). The four corner wells of each plate were left uninoculated as negative controls for every experiment. Plates were covered using sterile, PTFE-coated silicon 96-well plate mats (Thermo Scientific, MA, USA). Cultures were incubated at in situ temperatures (Table S1) in the dark for 3 to 6 weeks and evaluated for positive growth (>10⁴ cells · ml⁻¹) by flow cytometry. Two hundred microliters from positive wells was transferred using a 200-µl single-channel pipette (Gilson, WI, USA) to duplicate 125-ml polycarbonate flasks (Corning, NY, USA) containing 50 ml of medium (29,35,116). At FWC, FWC2, JLB2c, and JLB3, not all positive wells were transferred because of the large number of positive wells.
At each site, 48/301, 60/403, 60/103, and 60/146 of the positive wells were transferred, respectively, selected using flow cytometry signatures with <10² green fluorescence and <10² side scatter, which maximized our chances of isolating small microorganisms that encompass many of the most abundant and most wanted taxa, like SAR11, using our settings (116).
Culture identification. Cultures reaching ≥1 × 10⁵ cells · ml⁻¹ had 35 ml of the 50-ml volume filtered for identification via 16S rRNA gene PCR onto 25-mm, 0.22-µm polycarbonate filters (Millipore, MA, USA). DNA extractions were performed using the MoBio PowerWater DNA kit (Qiagen, MA, USA) following the manufacturer's instructions and eluted in sterile water. The 16S rRNA gene was amplified as previously reported by Henson et al. (35) and sequenced at the Michigan State University Research Technology Support Facility Genomics Core. Evaluation of Sanger sequence quality was performed with 4Peaks (v. 1.7.1) (https://nucleobytes.com/4peaks/index.html), and forward and reverse complement sequences (converted via http://www.bioinformatics.org/sms/rev_comp.html) were assembled where overlap was sufficient using the CAP3 web server (http://doua.prabi.fr/software/cap3).
Community iTag sequencing, OTUs, and single-nucleotide variants. Sequentially filtered (2.7 µm, 0.22 µm) duplicate samples were extracted and analyzed using our previously reported protocols and settings (35,124). We sequenced the 2.7- to 0.22-µm fraction for this study because this fraction corresponded with the <2.7-µm communities that were used for the DTE experiments. To avoid batch sequencing effects, DNA from the first seven collections reported previously (35) was resequenced with the additional samples from this study (FWC2 and after) (Table 1). We targeted the 16S rRNA gene V4 region with the 515F-806RB primer set (which corrects for poor amplification of taxa like SAR11) (76,77) using Illumina MiSeq 2 × 250-bp paired-end sequencing at Argonne National Laboratories, resulting in 2,343,106 raw reads for the 2.7- to 0.22-µm fraction. Using mothur v1.33.3 (125), we clustered 16S rRNA gene amplicons into distinctive operational taxonomic units (OTUs) with a 0.03 dissimilarity threshold (OTU0.03) and classified them according to the Silva v119 database (126,127). After these steps, 55,256 distinct OTUs0.03 remained. We also used minimum entropy decomposition (MED) to partition reads into fine-scale amplicon single-nucleotide variants (ASVs) (68). Reads were first analyzed using mothur as described above up to the screen.seqs() command. The cleaned-reads fasta file was converted to MED-compatible headers with the "mothur2oligo" tool renamer.pl from the functions in MicrobeMiseq (https://github.com/DenefLab/MicrobeMiseq) (128) using the fasta output from screen.seqs() and the mothur group file. These curated reads were analyzed using MED (v. 2.1) with the flags -M 60 and -d 1. MED resulted in 2,813 refined ASVs. ASVs were classified in mothur using classify.seqs(), the Silva v119 database, and a cutoff bootstrap value of 80% (129). After classification, we removed ASVs identified as "chloroplast," "mitochondria," or "unknown" from the data set.
Community analyses. OTU (OTU0.03) and ASV abundances were analyzed within the R statistical environment v.3.2.1 (130) following previously published protocols (29,35,124). Using the package PhyloSeq (131), OTUs and ASVs were rarefied using the command rarefy_even_depth(), and OTUs/ASVs without at least two reads in four of the 34 samples (2 sites; ~11%) were removed. This cutoff was used to remove potentially spurious OTUs/ASVs resulting from sequencing errors. Our modified PhyloSeq script is available on our GitHub repository, https://github.com/thrash-lab/Modified-Phyloseq. After filtering, the data sets contained 777 unique OTUs and 1,323 unique ASVs (Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113]). For site-specific community comparisons, beta diversity between sites was examined using Bray-Curtis distances via ordination with nonmetric multidimensional scaling (NMDS) (Table S1). The nutrient data were normalized using the R function "scale," which subtracts the mean and divides by the standard deviation for each column. The influence of the transformed environmental parameters on beta diversity was calculated in R with the envfit function. Relative abundances of an OTU or ASV from each sample were calculated as the number of reads over the sum of all the reads in that sample. The relative abundance was then averaged between biological duplicates for a given OTU or ASV. To determine the best-matching OTU or ASV for a given LSUCC isolate, the OTU representative fasta file, provided by mothur using get.oturep(), and the ASV fasta file were used to create a BLAST database (makeblastdb) against which the LSUCC isolate 16S rRNA genes could be searched via blastn (BLAST v 2.2.26) ("OTU_ASVrep_db"; available as supplemental information at https://doi.org/10.6084/m9.figshare.12142098).
We designated an LSUCC isolate 16S rRNA gene match with an OTU or ASV sequence based on ≥97% or ≥99% sequence identity, respectively, as well as a ≥247-bp alignment.
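The relative-abundance calculation and the isolate-matching thresholds described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R workflow; the function names, taxon labels, and read counts are all hypothetical.

```python
def relative_abundance(read_counts):
    """Reads for each OTU/ASV divided by the total reads in one sample."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in read_counts.items()}


def mean_of_duplicates(rep_a, rep_b):
    """Average relative abundances between two biological duplicates."""
    taxa = set(rep_a) | set(rep_b)
    return {t: (rep_a.get(t, 0.0) + rep_b.get(t, 0.0)) / 2.0 for t in taxa}


# Identity thresholds (percent) and minimum alignment length from the text.
MIN_IDENTITY = {"OTU": 97.0, "ASV": 99.0}
MIN_ALIGNMENT_BP = 247


def is_match(kind, percent_identity, alignment_length):
    """True if a blastn hit qualifies as an isolate-to-OTU/ASV match."""
    return (percent_identity >= MIN_IDENTITY[kind]
            and alignment_length >= MIN_ALIGNMENT_BP)


# Hypothetical duplicate samples from one site.
rep1 = relative_abundance({"ASV_1": 30, "ASV_2": 70})
rep2 = relative_abundance({"ASV_1": 50, "ASV_2": 50})
site_mean = mean_of_duplicates(rep1, rep2)
print(site_mean["ASV_1"], is_match("ASV", 98.9, 253), is_match("OTU", 98.9, 253))
```

The same hit can qualify at the OTU level and fail at the ASV level, which is why the two identity thresholds are kept separate.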
16S rRNA gene phylogeny. Taxa in the Alpha-, Beta-, and Gammaproteobacteria phylogenies from reference 35 served as the backbone for the trees in the current work. For places in these trees with poor representation near isolate sequences, additional taxa were selected by searching the 16S rRNA genes of LSUCC isolates against the NCBI nucleotide database online with BLASTn (132) and selecting a variable number of best hits. The Bacteroidetes and Actinobacteria trees were composed entirely of nonredundant top 100 to 300 MegaBLAST hits to a local version of the NCBI nucleotide database, accessed August 2018. Sequences were aligned with MUSCLE v3.6 (133) using default settings and culled with TrimAl v1.4.rev22 (134) using the -automated1 flag, and trees were inferred from the final alignment with IQ-TREE v1.6.11 (135) with default settings and -bb 1000 for ultrafast bootstrapping (136). Tips were edited with the nw_rename script within Newick Utilities v1.6 (137), and trees were visualized with Archaeopteryx (138). Fasta files for these trees and the naming keys are available as supplemental information at https://doi.org/10.6084/m9.figshare.12142098.
Assessment of isolate novelty. We quantified taxonomic novelty using BLASTn of our isolate 16S rRNA genes to those of other known isolates collected in three databases: (i) the NCBI nucleotide database (accessed August 2018) (NCBIdb), (ii) a custom database comprised of sequences from DTE experiments in other labs (DTEdb), and (iii) a database containing all of our isolate 16S rRNA genes (LSUCCdb). The DTEdb and LSUCCdb fasta files are available as supplemental information at https://doi.org/10.6084/m9.figshare.12142098. We compared our isolate sequences to these databases as follows.
(i) All representative sequences were searched against the nucleotide database using BLASTn (BLAST+ v. 2.7.1) with the flags -perc_identity 84, -evalue 1E-6, -task blastn, -outfmt "6 qseqid sseqid pident length slen qlen mismatch evalue bitscore sscinames sblastnames stitle", and -negative_gilist to remove uncultured and environmental sequences. The negative GI list was obtained by searching "environmental samples[organism] OR metagenomes[orgn]" in the NCBI nucleotide database (accessed 12 September 2018), and hits were downloaded in GI list format. This negative GI list is available as supplemental information at https://doi.org/10.6084/m9.figshare.12142098. The resultant hits from the NCBIdb search were further manually curated to remove sequences classified as single-cell genomes, clones, duplicates, and previously deposited LSUCC isolates.
(ii) We observed that many known HTCC, IMCC, and HIMB isolates that had previously been described as matching our clades (see Fig. S1 to S5 in the supplemental material) were missing from the resultant lists of nucleotide hits, so we extracted isolate accession numbers from numerous DTE experiments (26-28, 31, 34, 37, 44, 139, 140) from the nucleotide database via blastdbcmd and generated a separate DTEdb using makeblastdb. Duplicate accession numbers found in the NCBIdb were removed. The same BLASTn settings as for the first step were used to search our isolate sequences against DTEdb. Any match that fell below the lowest percent identity hit to the NCBIdb was removed from the DTEdb search, since the match would not have been present in the first NCBIdb search.
(iii) Finally, using the same BLASTn settings, we compared all pairwise identities of our 328 LSUCC isolate 16S rRNA gene sequences via the LSUCCdb.
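Because the searches above use a custom tabular output format, downstream filtering depends on reading the columns in the order given to -outfmt. The sketch below parses one tab-separated hit line into named, typed fields. It is an illustration only: the parser, its names, and the example hit line are hypothetical and are not taken from the authors' scripts.

```python
# Column order as listed in the -outfmt string in step (i).
FIELDS = ["qseqid", "sseqid", "pident", "length", "slen", "qlen",
          "mismatch", "evalue", "bitscore", "sscinames", "sblastnames",
          "stitle"]

# Columns that hold numbers and the type each should be converted to.
NUMERIC = {"pident": float, "length": int, "slen": int, "qlen": int,
           "mismatch": int, "evalue": float, "bitscore": float}


def parse_hit(line):
    """Turn one tab-separated BLAST hit line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    for name, cast in NUMERIC.items():
        hit[name] = cast(hit[name])
    return hit


# Hypothetical hit line, for illustration only.
example = "\t".join(["isolate_1", "subject_1", "99.2", "1320", "1450", "1330",
                     "10", "0.0", "2380", "Example species", "bacteria",
                     "Example species strain X 16S ribosomal RNA gene"])
hit = parse_hit(example)
print(hit["pident"], hit["length"], hit["sscinames"])
```

Keeping the column list in one place makes it harder for the parser and the -outfmt string to drift apart when the search is rerun against DTEdb or LSUCCdb.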
We placed our LSUCC isolates into 55 taxonomic groups based on sharing ≥94% identity and/or their occurrence in monophyletic groups within our 16S rRNA gene trees (Fig. S1 to S5) (see above). For visualization purposes, in groups with multiple isolates we used our chronologically first cultivated isolate as the representative sequence for blastn searches, and these are the top point (100% identity to itself) in each group column of Fig. 1. Sequences from the other DTE culture collections were labeled with the corresponding collection name, while all other hits were labeled "other." Geographic novelty was assessed by manually screening the accession numbers from hits to LSUCC isolates with ≥99% 16S rRNA gene sequence identity for the latitude and longitude from a connected publication or location name (e.g., source, country, or site) in the NCBI description. LSUCC isolates in the Janibacter sp., Micrococcus sp., Altererythrobacter sp., Pseudomonas sp., and Phycicoccus sp. groups (16 total isolates) were not assessed because of missing isolation source information and no traceable publication. Isolation locations were plotted for a subset of important taxa (Table S1 [available at https://doi.org/10.6084/m9.figshare.12142113], "Map_cultivars" tab) using the "LSUCC_cultivar_map.R" available at our GitHub repository, https://github.com/thrash-lab/Cultivar-novelty-map.
Modeling DTE cultivation via Monte Carlo simulations. We developed a model using Monte Carlo simulation to estimate the median number of positive and pure wells (and associated 95% confidence intervals [CI]) expected from a DTE experiment for a given taxon at different inoculum sizes (λ), relative abundances (r), and viability (V) (Fig. 5). For each bootstrap, the number of cells added to each well was simulated using a Poisson distribution at a mean inoculum size of λ cells per well across n wells. The number of cells added to each well that belonged to a specific taxon was then estimated using a binomial distribution where the number of trials was set as the number of cells in a well and the probability of a cell belonging to a specific taxon, r, was the relative abundance of its representative ASV in the community analysis. Wells that contained at least one cell of a specific taxon were designated "positive." Wells in which all the cells belonged to a specific taxon were designated "pure." Finally, the influence of taxon-specific viability on recovery of pure wells was simulated using a second binomial distribution, where the number of cells within a pure well was used as the number of trials and the probability of growth was a viability score ranging from 0 to 1. For each simulation, 9,999 bootstraps were performed. Code for the model and all simulations is available at our GitHub repository, https://github.com/thrash-lab/viability_test.
Actual versus expected number of isolates. For each taxon in each DTE experiment, the Monte Carlo simulation was used to evaluate whether the number of recovered pure wells for each taxon was within 95% CI of simulated estimates, assuming optimum growth conditions (i.e., V = 100%). For each of 9,999 bootstraps, 460 wells were simulated with the inoculum size used for the experiment and the relative abundance of the ASV.
For taxa where the number of expected wells fell outside the 95% CI of the model, a deviance score was calculated as the difference between the actual number of wells observed and median of the simulated data set. The results of this output are presented in Table S1 (available at https://doi.org/10.6084/m9.figshare.12142113) under the "Expected vs actual" tab, and the R script for visualizing this output (Fig. 7) is available at our GitHub repository, https://github.com/thrash-lab/EvsA-visualization.
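The simulation steps described above (Poisson draw of cells per well, binomial draw of taxon membership, and a second binomial draw for viability) can be sketched as follows. This is a minimal standard-library illustration written for this text, not the code in the authors' repository; the function names, the percentile convention, and the example parameters (including the mean inoculum of 2 cells per well) are assumptions.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Knuth's method; adequate for the small means (about 1 to 5) used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_dte(n_wells, lam, r, viability=1.0, n_boot=999, seed=0):
    """Median and 95% interval of pure wells that grow, across bootstraps."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_boot):
        growing_pure = 0
        for _ in range(n_wells):
            cells = poisson_draw(rng, lam)
            if cells == 0:
                continue  # empty well
            taxon_cells = sum(rng.random() < r for _ in range(cells))
            if taxon_cells != cells:
                continue  # taxon absent, or well is mixed
            viable = sum(rng.random() < viability for _ in range(taxon_cells))
            if viable >= 1:
                growing_pure += 1
        counts.append(growing_pure)
    counts.sort()
    low = counts[int(0.025 * (n_boot - 1))]
    high = counts[math.ceil(0.975 * (n_boot - 1))]
    return statistics.median(counts), low, high


# Example: 800 wells, 10% relative abundance, 15% viability, 300 bootstraps.
median, low, high = simulate_dte(800, 2.0, 0.10, viability=0.15, n_boot=300, seed=1)
print(median, low, high)
```

Comparing an observed count of pure wells against `low` and `high` from a run with `viability=1.0` corresponds to the actual-versus-expected check; the number of bootstraps is kept small here only so the example runs quickly.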
Estimating viability in underrepresented taxa. For taxa where the observed number of positive wells was lower than the 95% CI lower limit within a given experiment, and because our analysis was restricted to only those organisms for which our medium was sufficient for growth at least once, the deviance was assumed to be a function of a viability term, V (ranging from 0 to 1), associated with suboptimal growth conditions, dormancy, persister cells, etc. To estimate a value of viability for a given taxon within a particular experiment, the Monte Carlo simulation was run using an experiment-appropriate inoculum size, relative abundance, and number of wells (460 for each experiment). Taxon-specific viability was tested across a range of decreasing values from 99% to 1% until such time as the observed number of pure wells for a given taxon fell between the 95% CI bounds of the simulated data. At this point, the viability value is the maximum viability of the taxon that enables the observed number of pure wells for a given taxon to be explained by the model. The results of this output are presented in Table 3 and Table S1 (available at https://doi.org/10.6084/m9.figshare.12142113) under the "Expected vs actual" tab.
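The stepping procedure described above can be sketched as a simple loop over decreasing viability values. To keep the example short and fast, the sketch swaps the Monte Carlo simulation for an exact binomial interval built from a closed-form per-well probability (Poisson cells per well, independent taxon membership and viability). That substitution, the function names, and the example inputs are assumptions made for illustration, not the authors' implementation.

```python
import math


def p_pure_growing(lam, r, v):
    """P(a well is pure for the taxon and holds at least one viable cell)."""
    return math.exp(-lam) * (math.exp(lam * r) - math.exp(lam * r * (1.0 - v)))


def binom_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_p, log_1mp = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_1mp)
        if cdf >= q:
            return k
    return n


def estimate_max_viability(observed, n_wells, lam, r):
    """Step V from 99% down to 1%; return the first V whose 95% interval
    contains the observed number of pure wells, or None if none does."""
    for percent in range(99, 0, -1):
        v = percent / 100.0
        p = p_pure_growing(lam, r, v)
        low = binom_quantile(n_wells, p, 0.025)
        high = binom_quantile(n_wells, p, 0.975)
        if low <= observed <= high:
            return v
    return None


# Hypothetical experiment: 4 pure wells observed in 800 wells, 2 cells per
# well on average, taxon at 10% relative abundance.
v_max = estimate_max_viability(4, 800, 2.0, 0.10)
print(v_max)
```

Because the loop starts at the highest viability, the value returned is an upper bound on viability that is compatible with the observation, matching the interpretation given in the text.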
Likelihood of recovering taxa at different relative abundances. To estimate the number of wells required in a DTE experiment to have a significant chance of recovering a taxon with a relative abundance of r, assuming optimum growth conditions (V = 100%), the Monte Carlo model was used to simulate experiments from 92 wells to 9,200 wells per experiment across a range of relative abundances from 0% to 100% in 0.1% increments and a range of inoculum sizes (cells per well of 1, 1.27, 1.96, 2, 3, 4, and 5). Each experiment was bootstrapped 999 times, and the number of bootstraps in which the lower bound of the 95% CI was ≥1 was recorded.
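A rough analytical counterpart to this simulation is to ask how many wells are needed before the chance of recovering zero pure wells drops below 2.5%, which is when the lower bound of a 95% interval reaches 1. The sketch below assumes Poisson-distributed cells per well, independent taxon membership, and V = 100%; the function names and the choice of 2 cells per well in the example are assumptions, and the closed form stands in for the bootstrapped simulation described above.

```python
import math


def p_pure(lam, r):
    """P(a well receives at least one cell and every cell is the taxon)."""
    return math.exp(-lam) * math.expm1(lam * r)


def min_wells(lam, r, alpha=0.025):
    """Smallest n such that P(no pure well among n wells) < alpha."""
    p = p_pure(lam, r)
    if p <= 0.0:
        raise ValueError("taxon cannot yield a pure well (r must be > 0)")
    if p >= 1.0:
        return 1
    # (1 - p) ** n < alpha  <=>  n > log(alpha) / log(1 - p)
    return math.floor(math.log(alpha) / math.log1p(-p)) + 1


for r in (0.01, 0.03, 0.10):
    print(r, min_wells(2.0, r))
```

Under these assumptions a taxon at 1% relative abundance needs well over 1,000 wells, and a taxon at 3% needs slightly fewer than the 460 wells used per experiment, in line with the thresholds discussed earlier.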
Data availability. All iTag sequences are available at the Sequence Read Archive with accession numbers SRR6235382 to SRR6235415 (29). PCR-generated 16S rRNA gene sequences from this study are accessible in NCBI GenBank under the accession numbers MK603525 to MK603769. Previously generated 16S rRNA gene sequences are accessible in NCBI GenBank under the accession numbers KU382357 to KU382438 (35). Table S1 is available at https://doi.org/10.6084/m9.figshare.12142113.