Expanding the diversity of bacterioplankton isolates and modeling isolation efficacy with large scale dilution-to-extinction cultivation

Cultivated bacterioplankton representatives from diverse lineages and locations are essential for microbiology, but the large majority of taxa either remain uncultivated or lack isolates from diverse geographic locales. We paired large scale dilution-to-extinction (DTE) cultivation with microbial community analysis and modeling to expand the phylogenetic and geographic diversity of cultivated bacterioplankton and to evaluate DTE cultivation success. Here, we report results from 17 DTE experiments totaling 7,820 individual incubations over three years, yielding 328 repeatably transferable isolates. Comparison of isolates to microbial community data of source waters indicated that we successfully isolated 5% of the observed bacterioplankton community throughout the study. 43% and 26% of our isolates matched operational taxonomic units and amplicon single nucleotide variants, respectively, within the top 50 most abundant taxa. Isolates included those from previously uncultivated clades such as SAR11 LD12 and Actinobacteria acIV, as well as geographically novel members from other ecologically important groups like SAR11 subclade IIIa, SAR116, and others; providing the first isolates in eight putatively new genera and seven putatively new species. Using a newly developed DTE cultivation model, we evaluated taxon viability by comparing relative abundance with cultivation success. The model i) revealed the minimum attempts required for successful isolation of taxa amenable to growth on our media, and ii) identified possible subpopulation viability variation in abundant taxa such as SAR11 that likely impacts cultivation success. By incorporating viability in experimental design, we can now statistically constrain the effort necessary for successful cultivation of specific taxa on a defined medium. 
Importance Even before the coining of the term “great plate count anomaly” in the 1980s, scientists had noted the discrepancy between the number of microorganisms observed under the microscope and the number of colonies that grew on traditional agar media. New cultivation approaches have reduced this disparity, resulting in the isolation of some of the “most wanted” bacterial lineages. Nevertheless, the vast majority of microorganisms remain uncultured, hampering progress towards answering fundamental biological questions about many important microorganisms. Furthermore, few studies have evaluated the underlying factors influencing cultivation success, limiting our ability to improve cultivation efficacy. Our work details the use of dilution-to-extinction (DTE) cultivation to expand the phylogenetic and geographic diversity of available axenic cultures. We also provide a new model of the DTE approach that uses cultivation results and natural abundance information to predict taxon-specific viability and iteratively constrain DTE experimental design to improve cultivation success.


Assessment of isolate novelty
We quantified taxonomic novelty using BLASTn of our isolate 16S rRNA genes to those of other known isolates collected in three databases: 1) the NCBI nt database (accessed August 2018), "NCBIdb"; 2) a custom database comprising sequences from DTE experiments in other labs, "DTEdb"; and 3) a database containing all of our isolate 16S rRNA genes, "LSUCCdb". The DTEdb and LSUCCdb fasta files are available as Supplemental Information at https://doi.org/10.6084/m9.figshare.12142098. We compared our isolate sequences to these databases as follows:

1) All representative sequences were searched against the nt database using BLASTn (BLAST+ v. 2.7.1) with the flags -perc_identity 84, -evalue 1E-6, -task blastn, -outfmt "6 qseqid sseqid pident length slen qlen mismatch evalue bitscore sscinames sblastnames stitle", and -negative_gilist to remove uncultured and environmental sequences. [...]

2) We observed that many known HTCC, IMCC, and HIMB isolates that had previously been described as matching our clades (Figs. S1-5) were missing from the resultant lists [...]

The output from these searches is available in Table S1 under the "taxonomic novelty" tab.

We placed our LSUCC isolates into 55 taxonomic groups based on sharing ≥ 94% identity and/or their occurrence in monophyletic groups within our 16S rRNA gene trees (Figs. S1-5, see above). For visualization purposes, in groups with multiple isolates we used our chronologically first cultivated isolate as the representative sequence for blastn searches, while all other hits were labeled as "Other".
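As a rough illustration of the nt search in step 1, the command line could be assembled and its tabular output parsed as sketched below. This is not the authors' pipeline: the flag values and output fields are taken from the text, whereas the file names (`isolates_16S.fasta`, `uncultured_env.gi`), the database name `nt`, and both helper functions are hypothetical placeholders.

```python
import shlex

# Output columns requested via -outfmt 6, exactly as listed in the text.
FIELDS = ("qseqid sseqid pident length slen qlen mismatch evalue bitscore "
          "sscinames sblastnames stitle").split()


def build_blastn_command(query_fasta, database, negative_gilist):
    """Assemble the blastn argument list (file and database names are placeholders)."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-task", "blastn",
        "-perc_identity", "84",
        "-evalue", "1E-6",
        "-outfmt", "6 " + " ".join(FIELDS),
        "-negative_gilist", negative_gilist,
    ]


def parse_hit(line):
    """Turn one tab-separated output row into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    hit["pident"] = float(hit["pident"])   # percent identity as a number
    hit["evalue"] = float(hit["evalue"])
    return hit


if __name__ == "__main__":
    cmd = build_blastn_command("isolates_16S.fasta", "nt", "uncultured_env.gi")
    # Print the command for inspection; it could also be passed to subprocess.run.
    print(shlex.join(cmd))
```

Keeping the field list in one place means the command and the parser cannot drift apart, which matters when hits are later filtered on `pident` (for example at the ≥ 94% or ≥ 99% thresholds used in this section).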

Geographic novelty was assessed by manually screening the accession numbers from hits to LSUCC isolates with ≥ 99% 16S rRNA gene sequence identity for the latitude and longitude from a connected publication or location name (e.g. source, country, site) in the NCBI description. LSUCC isolates in the Janibacter sp., Micrococcus sp., Altererythrobacter sp., Pseudomonas sp., and Phycicoccus sp. groups (16 total isolates) were not assessed because of missing isolation source information and no traceable publication. Isolation locations were plotted for a subset of important taxa (Table S1 "Map_cultivars" tab) using the "LSUCC_cultivar_map.R" script available at our GitHub repository https://github.com/thrash-lab/Cultivar-novelty-map.

We developed a model using Monte Carlo simulation to estimate the median number of positive and pure wells (and associated 95% confidence intervals (CI)) expected from a DTE experiment for a given taxon at different inoculum sizes (λ), relative abundances (r), and viability (V) (Fig. 5). For each bootstrap, the number of cells added to each well was simulated using a Poisson distribution at a mean inoculum size of λ cells per well across n wells. The number of cells added to each well that belonged to a specific taxon was then estimated using a binomial distribution where the number of trials was set as the number of cells in a well and the probability of a cell belonging to a specific taxon, r, was the relative abundance of its representative ASV in the community analysis. Wells that contained at least one cell of a specific taxon were designated 'positive'. Wells in which all the cells belonged to a specific taxon were designated as 'pure'. Finally, the influence of taxon-specific viability on recovery of 'pure' wells was simulated using a second binomial distribution, where the number of cells within a 'pure' well was used as the number of trials and the probability of growth was a viability score ranging from 0 to 1. For each simulation, 9,999 bootstraps were performed. Code for the model and all simulations is available in 'viability_test.py' at our GitHub repository https://github.com/thrash-lab/viability_test.
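The sampling steps described above can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation under stated assumptions, not the authors' 'viability_test.py': all function names are hypothetical, the 95% CI is taken as the 2.5th and 97.5th percentiles of the bootstrap counts, and a 'pure' well is assumed to be recovered when at least one of its cells is viable.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_experiment(lam, r, v, n_wells, rng):
    """Return (positive, pure) well counts for one simulated DTE experiment."""
    positive = pure = 0
    for _ in range(n_wells):
        cells = poisson(lam, rng)                                   # cells inoculated
        taxon_cells = sum(rng.random() < r for _ in range(cells))   # binomial(cells, r)
        if taxon_cells == 0:
            continue
        positive += 1                       # at least one cell of the taxon
        if taxon_cells == cells:            # every cell belongs to the taxon
            viable = sum(rng.random() < v for _ in range(cells))    # binomial(cells, v)
            if viable > 0:                  # assumption: one viable cell suffices
                pure += 1
    return positive, pure


def median_and_ci(counts):
    """Median plus percentile-based 95% interval of a list of counts."""
    ordered = sorted(counts)
    last = len(ordered) - 1
    lower = ordered[math.floor(0.025 * last)]
    upper = ordered[math.ceil(0.975 * last)]
    return statistics.median(ordered), lower, upper


def run_model(lam, r, v=1.0, n_wells=460, n_boot=9999, seed=0):
    """Bootstrap the experiment and summarize positive and pure well counts."""
    rng = random.Random(seed)
    runs = [simulate_experiment(lam, r, v, n_wells, rng) for _ in range(n_boot)]
    return {
        "positive": median_and_ci([pos for pos, _ in runs]),
        "pure": median_and_ci([pur for _, pur in runs]),
    }
```

Calling, for example, `run_model(lam=2.0, r=0.1, n_boot=500)` returns a `(median, lower, upper)` triple for each well type. The pure-Python loop favors readability over speed, so a smaller `n_boot` than the 9,999 used in the study is advisable for quick checks.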

Actual versus expected number of isolates
For each taxon in each DTE experiment, the Monte Carlo simulation was used to evaluate whether the number of recovered pure wells for each taxon was within 95% CI of simulated [...] Table S1 under the "Expected vs actual" tab, and the R script for visualizing this output as Figure 7 is available at our GitHub repository https://github.com/thrash-lab/EvsA-visualization.
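The comparison itself reduces to a simple check of each observed count against the simulated interval. The sketch below is illustrative only: the function names and the record layout are hypothetical, and the CI bounds are assumed to come from a simulation of the kind described in this section.

```python
from collections import Counter


def classify_recovery(observed, ci_lower, ci_upper):
    """Return (category, deviation) for one taxon in one experiment.

    The deviation is the distance in wells from the nearest CI bound
    and is zero when the observed count lies inside the interval.
    """
    if observed > ci_upper:
        return "above", observed - ci_upper
    if observed < ci_lower:
        return "below", ci_lower - observed
    return "within", 0


def tally_outcomes(records):
    """Count categories over (taxon, observed, ci_lower, ci_upper) records."""
    return Counter(classify_recovery(obs, lo, hi)[0] for _, obs, lo, hi in records)
```

A tally of this kind, applied to every taxon and experiment pair, yields the numbers of combinations that fall within, above, or below the simulated interval.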

Estimating viability in under-represented taxa
For taxa where the observed number of positive wells was lower than the 95% CI lower limit within a given experiment, and because our analysis was restricted to only those organisms for which our media was sufficient for growth at least once, the deviance was assumed to be a function of a viability term, V, (ranging from 0 to 1) associated with suboptimal growth conditions, dormancy, persister cells, etc. To estimate a value of viability for a given taxon within a particular experiment, the Monte Carlo simulation was run using an experiment-appropriate inoculum size, relative abundance, and number of wells (460 for each experiment).

Taxon-specific viability was tested across a range of decreasing values from 99% to 1% until such time as the observed number of pure wells for a given taxon fell between the 95% CI bounds of the simulated data. At this point, the viability value is the maximum viability of the taxon that enables the observed number of pure wells for a given taxon to be explained by the model. The results of this output are presented in Table S1 under the "Expected vs actual" tab.
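The descending scan over viability values can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names are hypothetical, the 95% CI is assumed to be the 2.5th to 97.5th percentile range of the bootstrap counts, and a pure well is assumed to grow when at least one of its cells is viable.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def count_pure_wells(lam, r, v, n_wells, rng):
    """Simulate one experiment and count pure wells that contain a viable cell."""
    pure = 0
    for _ in range(n_wells):
        cells = poisson(lam, rng)
        if cells == 0:
            continue
        # Pure: every inoculated cell belongs to the taxon (probability r each).
        if all(rng.random() < r for _ in range(cells)):
            # Assumption: growth requires at least one viable cell (probability v each).
            if any(rng.random() < v for _ in range(cells)):
                pure += 1
    return pure


def pure_well_ci(lam, r, v, n_wells, n_boot, rng):
    """Percentile-based 95% interval of the simulated pure-well counts."""
    counts = sorted(count_pure_wells(lam, r, v, n_wells, rng) for _ in range(n_boot))
    last = n_boot - 1
    return counts[math.floor(0.025 * last)], counts[math.ceil(0.975 * last)]


def estimate_max_viability(observed_pure, lam, r, n_wells=460, n_boot=9999, seed=0):
    """Scan V from 99% down to 1%; return the first V whose CI contains the observation."""
    rng = random.Random(seed)
    for percent in range(99, 0, -1):
        v = percent / 100
        lower, upper = pure_well_ci(lam, r, v, n_wells, n_boot, rng)
        if lower <= observed_pure <= upper:
            return v
    return None  # no tested viability explains the observation
```

Because the scan stops at the first value that fits, the returned number is an upper bound on viability rather than a point estimate, which matches the "maximum viability" interpretation given above. The loop is slow in pure Python at 9,999 bootstraps, so a smaller `n_boot` is sensible for exploration.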

General cultivation campaign results
We conducted a total of seventeen DTE cultivation experiments to isolate bacterioplankton (sub 2.7 µm fraction), with paired microbial community characterization of source waters (0.22 µm - 2.7 µm fraction), from six coastal Louisiana sites over a three-year period (Table S1). We [...] 3), designed to match the natural environment (Table 1). The MWH suite of media was modified from the JW media to additionally include choline, glycerol, glycine betaine, cyanate, DMSO, DMSP, thiosulfate, and orthophosphate (Table S1). These compounds have been identified as [...]
† Experiments where a subset of positive wells were transferred.

FWC2 shows the advantage of our method over equation 1 for extreme values.

Phylogenetic and geographic novelty of our isolates
The 328 isolates belonged to three phyla: Proteobacteria (n = 319), Actinobacteria (n = 8), and Bacteroidetes (n = 1) (Figs. S1-S5). We placed these isolates into 55 groups based on their [...] UniPo clades were the most frequently cultivated, with 124 of our 328 isolates belonging to these three groups (Table S1). In total, 73 and 10 of the 328 isolates belonged in putatively novel genera and novel species in previously cultivated genera, respectively. We estimated that at least 310 of these isolates were geographically novel, being the first of their type cultivated from the nGOM (Fig. 2) [...] OTUs and ASVs, respectively), either because their matching OTUs/ASVs were below our thresholds for inclusion (at least two reads from at least two sites), or because they were below the detection limit from our sequencing effort (Table 2). Thus, 43% and 30% of our isolates belonged to OTUs and ASVs, respectively, with median relative abundances > 0.1%.

An enigma that became immediately apparent through a review of our data was the absence of an [...] revealed no clear pattern between abundance and isolation success (Fig. 5). Considering that medium composition was sufficient for cultivation of these organisms on at least some occasions, we hypothesized that cultivation frequency may reflect differences in the capacity for growth within populations of a given taxon. Thus, we decided to model cultivation frequency in relationship to estimated abundances in a way that could generate estimates of cellular viability, defined herein as meaning "presently able to grow in defined medium," as opposed to a broader definition equating viability with being alive more generally, since we only evaluated growth capacity in this study. We hoped that modeling might also help us inform experimental design and make DTE cultivation efforts more predictable (59).
[...] obtained an isolate of a particular taxon (Fig. 6) [...] an assumption that all cells were viable (i.e. V = 100%). In total, for 1,158 out of 1,207 pairwise combinations (95.9%) the observed number of pure wells fell within the 95% CI of data simulated at matching relative abundance and inoculum size, suggesting that these two parameters alone could explain the observed cultivation success for most taxa (Table S1) [...] be statistically likely to recover at least one positive, pure well using inocula of one, two or three cells per well, respectively, with V = 100% (Fig. S9).

A small, but taxonomically relevant minority (49 out of 1,207) of pairwise combinations had a number of observed pure wells that fell outside of the simulated 95% CI with V = 100% (Fig. 7). Of these, 28 had either one, two, or three more observed pure wells than the upper 95% CI (Table S1), suggesting cultivability higher than expected based purely on a model capturing the interaction between a Poisson-distributed inoculum and a binomially-distributed relative abundance, with V = 100%. However, the deviance from the expected number of positive wells for those above the 95% CI was limited to three or fewer wells, meaning that we only obtained 1-3 more isolates than expected (Table S1). On the other hand, those organisms that we isolated less frequently than expected showed greater deviance. 21 out of the 49 outliers had lower than expected cultivability (Fig. 7). These taxa had relative abundances ranging from 2.7% to 14.5%, [...].

We used our model to calculate estimated viability (V_est) for these organisms based on their cultivation frequency at sites where the assumption of V = 100% appeared violated (Table 3 [...]

One possible biological explanation for why some isolates might have been cultured more frequently than expected is clumped cells. If cells of any given taxon in nature grew in small clusters, then the number of cells we added to a well would have been greater than expected based on a Poisson distribution. Furthermore, the model assumes that each cell is independent, and that the composition of a subset of cells is only a function of the relative abundance of the taxon in the community. Within a cluster of cells, this assumption is violated as the probability of cells being from the same taxon is higher. Thus, the model will underestimate the probability of a well being pure and therefore underestimate the number of pure wells likely to be observed within an experiment, leading to a greater number of isolates than expected.

Future microscopy work could examine whether microorganisms such as OM252 and MWH-UniPo form small clusters in situ and/or in pure culture, and whether this phenomenon may be different for different ASVs of OM43, or if clumping may be a transient phenotype.

We also identified three taxa (SAR11 LD12, SAR11 subclade IIIa.1, and the aforementioned OM43 ASV7241) that were isolated much less frequently than expected based on their abundances (Fig. 7, Table 3). This could mean that our assumption of V = 100% was incorrect, or that, in contrast to the taxa that were cultured more frequently than expected (above), our methods had biases that overestimated the abundance of these organisms, thereby over-inflating the expected number of isolates. We used the modified 515/806RB primers that have been shown to be much more accurate in quantifying SAR11 compared to FISH than the original 515/806 primers (within 6% ± 4% SD), and this protocol almost always underestimates SAR11 abundance (69). This suggests that our expected number of isolates may have actually been underestimated, our cultivation success poorer than we measured, and therefore we may be overestimating viability for the SAR11 taxa in this study. Other sources of systematic error that might impinge on successful transfers, and thereby reduce our recovery, include sensitivity to pipette tip and/or flask material. However, the fact that these taxa were sometimes successfully isolated means that if these mechanisms were impacting successful transfers, then their activity was less than 100% efficient, which implies variations in subpopulation vulnerability that would [...] (Table 3).

Thus, we believe that these four experiments were unlikely to contribute major errors biasing our [...] persist, rather than die, can still yield logarithmic growth curves (Fig. S10) [...]

[...] Table S1 for details). Circles represent LSUCC isolates: green, isolates within the 95% CI for expected frequency; orange, actual isolates > maximum 95% CI for expected isolates; blue, actual isolates < minimum 95% CI for expected isolates.

Circle size is proportional to the deviation of the number of actual isolates from the maximum (for orange) or minimum (for blue) 95% CI for expected isolates. The dotted line is the 1:1 ratio. Notable taxa on the extremities of the actual and expected values are labeled. All datapoints provided in Table S1.