Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis

ABSTRACT Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. Here, we describe a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria. IMPORTANCE Plant roots harbor a diverse collection of microbes that live within host tissues. To gain a comprehensive understanding of microbial adaptations to this endophytic lifestyle from strains that cannot be cultivated, it is necessary to separate bacterial cells from the predominance of plant tissue. This study provides a valuable approach for the separation and isolation of endophytic bacteria from plant root tissue. 
Isolated live bacteria provide material for microbiome sequencing, single-cell genomics, and analyses of genomes of uncultured bacteria to provide genomics information that will facilitate future cultivation attempts.

Microorganisms are the most phylogenetically diverse and abundant life forms on earth, yet an in-depth understanding of their individual physiological diversities was largely limited to the species that can be grown in culture until the advent of cultivation-independent methods (1,2). The presence of many groups of yet uncultured bacteria was revealed mainly through cultivation-independent molecular surveys based on conserved marker genes (small subunit ribosome component, or 16S rRNA) (3). According to 16S rRNA-based phylogeny, microbial species fall into 60 major descents (phyla or divisions) within the bacterial and archaeal domains, of which half have no cultivated representatives (1). Conventional approaches to bring this uncultured majority of bacteria into pure culture are limited by the ability to mimic the required nutrients and microenvironment conditions. Modern cultivation approaches include the use of microfluidics chips (4), the recent iChip design to cultivate microbes in their natural environments (5), or inferred phenotypic traits for the selection of effective cultivation conditions (6,7). Despite a few successes achieved through such intensive approaches, the large majority of microorganisms remain uncultured, to such an extent that this majority has often been referred to as microbial dark matter (8).
An alternative approach to studying such intractable organisms is to bypass culturing altogether and instead infer function from DNA by direct sequencing methods. Metagenomics, or direct sequencing of DNA from mixed environmental samples, can be applied to address the problem of such uncultured microbes (9); in some cases, draft or even complete genomes of the uncultured bacteria have been recovered, computationally segregated into individual taxa or populations, and assembled solely from metagenomics data (10)(11)(12). A complementary culture-independent approach for obtaining genomes from uncultured microbes is single-cell genomics (SCG). This approach involves amplification and sequencing of DNA from a single cell or a few cells obtained directly from environmental samples and separated by flow cytometry or other methods (13). The SCG approach can sometimes be advantageous over metagenomics sequencing for targeted recovery of genomes. In particular, natural populations that are present in low abundance or samples with high degrees of genomic heterogeneity may be more accessible through SCG than through metagenomics. The power of the SCG approach was demonstrated by a recent study in which 200 single cells were isolated from different habitats, including Nevada hot spring sediments and water from near hydrothermal vents in the Pacific Ocean. The researchers sequenced the genome of each cell and classified the cells into more than 20 new archaeal and bacterial lineages without any cultivated representatives (1). Many large-scale studies, including the Microbial Earth Project (generation of a comprehensive genome catalogue of all archaeal and bacterial type strains) and the Human Microbiome Project (sequencing uncultured bacteria from the human microbiome), have relied at least in part on SCG approaches.
Efforts to understand the dynamic interface that exists between plants, the environment, and their microbiomes are critical for biofuel production, agriculture, and environmental sustainability. The soil surrounding the roots of plants accommodates an abundance of microorganisms due to the presence of nutrient-rich plant-derived exudates. The interface between plant root and soil constitutes the rhizosphere (14), and the inside of the root tissues constitutes the endosphere environment (15). These two compartments represent distinct environments for the growth of microbes. Both culture-independent and culture-dependent assessments of microbial communities from Populus have been undertaken, including community profiling using phylogenetic marker genes (16)(17)(18) and large culture collections of endosphere and rhizosphere isolates (19)(20)(21). The microbiome in these root-associated environments is composed primarily of bacteria and fungi and, to a lesser extent, archaea, which are virtually absent from the endosphere (18). Each of these may have potentially beneficial, neutral, or detrimental effects on plant growth and development. Microorganisms within the plant endosphere and rhizosphere are metabolically diverse (22)(23)(24) and can promote plant growth by fixing atmospheric nitrogen, solubilizing inorganic phosphorus, increasing the availability of nitrogen sources, producing plant phytohormones, decreasing ethylene stress, suppressing pathogens, and inducing systemic resistance (25)(26)(27)(28)(29)(30). Within the rhizosphere, bacterial concentrations can be as high as 10^9 cells/g of soil (27). A phylogenetically distinct portion of the soil and rhizosphere populations is able to cross into the root and comprise the bacterial endosphere (18). Endophytic bacterial populations can be as high as 10^8 cells/g of root material (27), but most often they are several orders of magnitude lower, at 10^4 to 10^5 cells/g of root.
Because of the close association between endophytic bacterial communities and host tissues, physical separation of the microorganisms is a challenging task, and certain endophytic groups have been difficult to isolate and culture in a laboratory setting. Culture-independent methods have revealed information about uncultured endophytes and their phylogenetic diversities. However, application of metagenomics or SCG methods to interrogate endophytic samples has been difficult due to the prevalence of contaminating plant material and DNA. In this study, we describe a protocol for the enrichment of endophytic bacteria from Populus deltoides roots, upstream of cultivation and isolation, which in turn achieves a reduction in host plant material and facilitates single-cell genomics analysis. In a first demonstration, we report on the genomes of organisms within the Armatimonadetes, Verrucomicrobia, and Planctomycetes that were absent in our previous cultivation efforts.

MATERIALS AND METHODS
Root harvesting. Three Populus deltoides saplings were harvested from a field on the Oak Ridge National Laboratory campus (35°55′20.2″N, 84°19′24.4″W). Whole root samples were collected from each tree, and roots ≤5 mm in diameter were separated for enrichment. Total root weights used for enrichment were ~10 g. The roots were cut into 1- to 2-cm-long pieces and placed into a 300-ml sterile flask with 40 ml of autoclaved Milli-Q water. The flasks were shaken at 200 rpm for 1 min, and the liquid was poured through a sterile miracloth (EMD Millipore, Billerica, MA) and collected in a 50-ml conical tube. Then, 100 ml of sterile Milli-Q water was added to the flasks containing the roots, and the flask was placed in a water bath sonicator at 40 kHz (Branson 2510; Danbury, CT) for 5 min to remove the rhizoplane microorganisms. The liquid was again poured through sterile miracloth and collected in a 50-ml conical tube. The two washes were pooled for each tree and represented the rhizosphere samples. The roots were further washed with sterile Milli-Q water four more times, and the liquid was discarded. An ethanol- and UV-sterilized (15 min) grinder (KSM2; Braun, Kronberg, Germany) was used to disrupt and homogenize the root samples in 40 ml of sterile Milli-Q water. The homogenate was poured through sterile miracloth and collected in a 50-ml conical tube. This root homogenate constituted the endosphere sample.
Differential and density centrifugation for microbial enrichment. Microbes were enriched using an adaptation of a previously described method developed by Ikeda et al. (31,32). Prior to the enrichment, 1 ml of the rhizosphere and of the endosphere samples was saved as an unenriched control for sequencing. The endosphere homogenates and the rhizosphere samples were centrifuged at 500 × g for 5 min at 10°C (Spinchron R; Beckman Coulter, Brea, CA). The supernatants were transferred to new conical tubes and centrifuged at 5,500 × g for 20 min at 10°C (Sorvall Evolution RC; Thermo Scientific, Carlsbad, CA). The supernatants were discarded, and the pellet was resuspended in 40 ml of bacterial cell extraction (BCE) buffer (50 mM Tris-HCl [pH 7.5] and 1% Triton X-100). The suspension was filtered through a layer of sterile miracloth and transferred to a sterile 50-ml Oak Ridge tube (Nalgene, Rochester, NY). The suspensions were centrifuged at 10,000 × g for 10 min at 10°C. The supernatants were discarded, and the pellet was resuspended in 40 ml of BCE buffer and filtered through a layer of sterile miracloth. The filtrate was centrifuged again at 10,000 × g for 10 min at 10°C. The supernatant was discarded, and the pellet was resuspended in 6 ml of 50 mM Tris-HCl (pH 7.5). The suspension was overlaid on 4 ml Histodenz (Sigma-Aldrich, St. Louis, MO) solution (8 g Histodenz dissolved in 10 ml of 50 mM Tris-HCl [pH 7.5]) in 10-ml ultra-clear centrifuge tubes (Beckman, Palo Alto, CA) such that the two solutions did not mix. The density centrifugation was run at 10,000 × g for 40 min at 10°C (Optima LE-80K; Beckman Coulter, Brea, CA). The microbial fraction (~1 ml) was visible as a white band at the Histodenz-water interface. The microbial fraction was collected and washed by centrifugation at 10,000 × g for 3 min, followed by removal of the supernatant and resuspension of the pellet in 1 ml of 50 mM Tris-HCl (pH 7.5).
Half of the sample was pelleted by centrifugation and stored at -20°C for DNA extraction. Glycerol at a final concentration of 25% (vol/vol) was added to the other half of the sample, and this sample was stored at -80°C for single-cell sorting.
DNA extraction for microbiome sequencing. DNA for the enriched and unenriched rhizosphere samples was extracted using the PowerSoil DNA isolation kit (Mo Bio Laboratories, Carlsbad, CA) using the provided protocol. DNA for the enriched and unenriched endosphere samples was extracted using the PowerPlant Pro DNA isolation kit with phenolic removal protocol (Mo Bio Laboratories, Carlsbad, CA) using the provided protocol.
Sequencing, quality control, and analysis of paired-end Illumina data. Libraries were prepared for the enriched endosphere DNA samples. Paired-end sequencing of the V4 region of the bacterial 16S rRNA gene was performed on the Illumina MiSeq platform (San Diego, CA) using the protocol of Lundberg et al. (33). Sequence processing and quality control were performed through the use of the UPARSE, QIIME, and cutadapt pipelines (34,35), as per Andrei et al. in 2015 (36), with the following modifications: reference-based chimera checking was performed with -minh 1.5. Low-read-count operational taxonomic units (OTUs) were removed using the QIIME command filter_otus_from_otu_table.py --min_count_fraction 0.00005. Finally, enrichment of OTUs was determined via the use of the QIIME script group_significance.py and reported using false discovery rate (FDR)-adjusted P values.
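The low-abundance OTU filter described above can be illustrated with a minimal sketch. It assumes the threshold is applied as a fraction of the total read count across all samples, so that an OTU is retained only if its summed count reaches that fraction; the function name, the dictionary layout, and the example counts are hypothetical and are not part of QIIME.

```python
def filter_low_count_otus(otu_table, min_count_fraction=0.00005):
    """Drop OTUs whose total count is below a fraction of all reads.

    otu_table maps an OTU identifier to a list of per-sample read counts
    (a hypothetical in-memory stand-in for an OTU table file).
    """
    total_reads = sum(sum(counts) for counts in otu_table.values())
    if total_reads == 0:
        return {}
    # Minimum summed count an OTU needs in order to be retained.
    threshold = min_count_fraction * total_reads
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(counts) >= threshold
    }


# Illustrative counts only: 100,000 reads in total, so the cutoff is about 5 reads.
example_table = {
    "OTU_1": [50000, 49990],
    "OTU_2": [1, 2],
    "OTU_3": [7, 0],
}
kept = filter_low_count_otus(example_table)
```

With these made-up counts, OTU_2 (3 reads) falls below the cutoff and is dropped, while OTU_1 and OTU_3 are kept.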
Single-cell sorting, multiple displacement amplification, and 16S rRNA Sanger sequencing. The enriched microbial samples were stained with 5 μM Syto 9 nucleic acid stain (Life Technologies, Grand Island, NY). The stained samples were sorted on a Cytopeia Influx cell sorter (BD, Franklin Lakes, NJ) according to a previously published method (37). A flow cytometry plot was generated from forward scatter and green fluorescence. Ten gates were chosen from different positions on the plot. Single cells from enriched rhizosphere and endosphere samples from one tree were sorted into twenty 96-well plates (10 plates from the rhizosphere and 10 plates from the endosphere; 1 plate per gate).
The single-cell sorted plates were stored at -80°C prior to whole-genome amplification by multiple displacement amplification (MDA), as published previously (37). For 16S rRNA sequencing of amplified DNA, 1 μl of the MDA product was diluted into 150 μl of PCR-grade water. The remainder of the MDA product was stored at -20°C. Universal 16S rRNA primers 27f (5′-AGAGTTTGATCMTGGCTCAG-3′) and 1492r (5′-TACGGYTACCTTGTTACGACTT-3′) were used to PCR amplify the majority of the 16S rRNA sequences in 50-μl reaction mixtures (1× Pfu buffer, 200 μM dNTPs, 2 mM MgCl2, 5 μg bovine serum albumin, 300 μM forward and reverse primers, 0.2 μl Pfu polymerase, 37.90 μl double-distilled water [dH2O], and 1 μl of 1:150 MDA product). Conditions for the PCR were 94°C for 2 min followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 2 min, with a final extension at 72°C for 5 min. Positive amplifications were identified by gel electrophoresis (1.5% agarose [wt/vol]). Positive PCR products were purified with PCR filtration plates (Millipore, Billerica, MA). The purified 16S rRNA products were sequenced by fluorescent dye-terminator cycle Sanger sequencing at the University of Tennessee Molecular Biology Resource Facility. Phylogenetic identifications were acquired using the Ribosomal Database Project (RDP) classifier (38), the SILVA incremental aligner (39), and NCBI blastn.
Whole-genome amplification and sequencing of single cells. Single-cell genomes were selected for whole-genome amplification based on 16S rRNA assignment. Nextera XT sequencing libraries (Illumina, La Jolla, CA) were prepared according to the manufacturer's recommendations (Part 15031942 Rev. E), stopping after library validation. In short, samples were fragmented, barcodes were appended, and samples were amplified. Libraries were cleaned using AMPure XP beads (Beckman Coulter, Indianapolis, IN). Final libraries were validated on an Agilent bioanalyzer (Agilent, Santa Clara, CA) using a DNA7500 chip, and concentration was determined on a Qubit with the broad-range double-stranded DNA assay (Life Technologies, Grand Island, NY). Libraries were prepared for sequencing following the manufacturer's recommended protocols. The library was denatured with 0.2 N sodium hydroxide and then diluted to the final sequencing concentration (19 pM). Libraries were loaded into the sequencing cassette (v3), and a paired-end (2-by-300) run was completed on an Illumina MiSeq instrument to obtain single amplified genomes (SAGs).
Single-cell assembly. Demultiplexed Illumina reads from the MiSeq software output were preprocessed using two separate approaches: (1) khmer digital normalization (40) and (2) regular assembly (41). The khmer digital normalization is a method routinely applied to SCG data in order to decrease the memory and time requirements for de novo assembly without significant impact on the assembly contents. The khmer protocol removes the redundant sequence reads, decreases sampling variation, removes the majority of errors, and substantially reduces the size of the sequence data (40). On the other hand, the regular assembly protocol utilized the complete set of raw reads without any data reduction. During the regular assembly protocol, the quality trimming and filtering of raw sequence reads were performed for each SAG using CLC Genomics Workbench (CLC) (version 7.5.2) at a quality cutoff value of 0.02 (42). De novo genome assembly for each data set (khmer normalized and CLC trimmed) was performed using four assembly software packages with default options: IDBA-UD (version 1.
Single-cell sequence contamination screening. A number of recommended filtering operations (46) were performed to search for contaminated contigs. The first step was to check for any rRNA sequences from assembled SAGs, and blastn was performed to verify that they originated from a target organism of interest. A blastx search was performed against an NCBI-nonredundant database, and any contigs that matched (over half the contig length) with eukaryotic organisms were discarded. GC content was determined for each contig, and any that were outside a ±10% GC content range of the target organism were marked for removal. Cross-contamination between SAGs was analyzed by conservative searching of all assemblies against each other using blastn. Sequence regions that had more than 99.5% identity over at least 5,000 bp with another single cell were removed from the smaller contigs.
Additionally, phylogenetic distribution of the genes on all removed contigs was manually reviewed to identify any false positives. The initial annotation of the screened single-cell genomes was performed using the annotation pipeline at Oak Ridge National Laboratory (47), and any contigs that did not contain protein-coding genes were discarded. The quality of the contamination-screened assemblies was verified using kmer frequency analysis (with settings: fragment window, 1,000 bp; fragment step, 200 bp; oligomer size, 4; minimum variation, 10) before and after contamination removal. Contamination-screened assemblies for each SAG were then submitted to the Integrated Microbial Genomes Expert Review (IMG-ER) system (48) for gene prediction and annotation.
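Two of the screening steps in this section, the GC-content filter and the windowed kmer frequency analysis, can be sketched in a few lines. This is an illustration only and not the tooling used in the study: it assumes the ±10% GC range is measured in absolute percentage points around the target organism's GC content, it reuses the stated window (1,000 bp), step (200 bp), and oligomer size (4) as defaults, and it does not model the "minimum variation" setting. All function names are hypothetical.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(contigs, target_gc, tolerance=0.10):
    """Names of contigs whose GC fraction deviates from target_gc by more
    than tolerance (assumed here to be absolute, e.g. 0.10 = 10 points)."""
    return [
        name
        for name, seq in contigs.items()
        if abs(gc_fraction(seq) - target_gc) > tolerance
    ]


def kmer_windows(seq, window=1000, step=200, k=4):
    """Count k-mers in sliding windows along one contig.

    Returns one Counter per window; a contig shorter than the window
    yields a single profile covering the whole sequence.
    """
    seq = seq.upper()
    profiles = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        fragment = seq[start:start + window]
        profiles.append(
            Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
        )
    return profiles


# Toy sequences for illustration only.
toy_contigs = {"c1": "GCGCGCATAT", "c2": "ATATATATAT", "c3": "GGCCGGCCAT"}
flagged = flag_gc_outliers(toy_contigs, target_gc=0.60)
profiles = kmer_windows("ACGT" * 350)
```

Windows whose kmer profiles cluster apart from the bulk of the assembly would then be candidates for manual review, in the spirit of the two kmer frequency clouds reported in the Results.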
Genome completeness estimation. The assembly completeness estimation was performed using the checkM tool (49) and the genome quality scoring matrix (50) with default parameters.
Genome-based phylogenetic tree construction. Universally distributed single-copy marker genes (51) were identified from individual SAGs. NCBI blastn was employed to extract these genes from other organisms within the same phylogenetic lineage. For concatenated tree construction, all marker gene sequences extracted from a single organism were renamed per the organism name, e.g., all marker genes extracted from SAG E9H3 were named SAG E9H3. Individual marker genes from different organisms were collected into a single group, e.g., all marker genes corresponding to ribosomal protein L18 were collated as a single group (file) of fasta-formatted sequences. Then, 18 files were created, corresponding to 18 commonly used conserved marker genes (see Table S1 in the supplemental material) from our SAGs and selected reference genomes from the same phylum, and imported into Geneious software (version 9.1.2). A multiple-sequence alignment for each individual group (file) was created using the MUSCLE alignment option, with a maximum of 8 iterations allowed. Individual alignments for the 18 groups were sorted from high to low percentage pairwise identity and concatenated using the concatenate sequences or alignments tool from Geneious software. A maximum likelihood-based bootstrapped phylogenetic tree of the concatenated sequence alignment was constructed using the PHYML tree builder plugin within Geneious software with the following options: substitution model, Blosum62; branch support, Bootstrap; number of bootstraps, 100; and optimized for topology/length/rate with topology search option Best (i.e., best of nearest-neighbor interchange [NNI] and subtree pruning and regrafting [SPR] search).
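The concatenation step can be illustrated with a short sketch. It is not the Geneious implementation: it assumes each per-gene alignment is held as a mapping from organism name to aligned sequence, it pads an organism that lacks a given marker gene with gap characters (an assumption, since the handling of missing genes is not stated), and it leaves the ordering of alignments to the caller. The function name and the toy sequences are hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix keyed by organism.

    alignments is a list of dicts, one per marker gene, each mapping an
    organism name to its aligned sequence. Sequences within one alignment
    must share a length; organisms missing from an alignment receive a
    run of '-' of that alignment's width.
    """
    organisms = sorted({org for aln in alignments for org in aln})
    parts = {org: [] for org in organisms}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = widths.pop()
        for org in organisms:
            parts[org].append(aln.get(org, "-" * width))
    return {org: "".join(chunks) for org, chunks in parts.items()}


# Toy alignments for two marker genes; sequences are illustrative only.
gene_one = {"SAG E9H3": "MK-L", "Reference A": "MKAL"}
gene_two = {"Reference A": "GGT"}
supermatrix = concatenate_alignments([gene_one, gene_two])
```

Because sequences are keyed by organism name, this mirrors the renaming convention described above, in which every marker gene from one organism carries that organism's name.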
Functional characterization of SAGs. Genome statistics and comparative analyses were performed using various IMG-ER tools (52). The IMG annotation pipeline is integrated with a phenotype prediction tool (52), which generates phenotype/metabolism assertions from pathways and was used to identify specific genome characteristics. The IMG pipeline also provided lists of protein-coding genes connected to transporter classification, KEGG pathways, and biosynthetic clusters that were used for functional characterization. The complete list of descriptions/annotations for the Pfam clans (53) and the clusters of orthologous groups (COG) categories (54) is available at the IMG website. The abundance profile tool was employed to create functional profiles (containing COG categories and Pfam clans) for each of the SAGs and their corresponding draft/finished genomes. The abundance profile for each genome contained the number of predicted genes in each COG/Pfam category, and clusters were identified that were uniquely present in SAGs but not in close relatives. Another IMG tool, pathway via KEGG orthology (KO) terms, was used to identify the presence/absence of specific genes within pathways.
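The comparison of abundance profiles amounts to a set difference over functional categories. The sketch below is a simplified stand-in for the IMG abundance profile tool, not a description of it: profiles are assumed to be mappings from a category label to a predicted gene count, and the function name and category labels are placeholders.

```python
def categories_unique_to_sag(sag_profile, relative_profiles):
    """Categories with at least one gene in the SAG and none in any relative.

    Each profile maps a functional category label (for example a COG
    category or Pfam clan) to the number of predicted genes assigned to it.
    """
    present_in_relatives = {
        category
        for profile in relative_profiles
        for category, count in profile.items()
        if count > 0
    }
    return sorted(
        category
        for category, count in sag_profile.items()
        if count > 0 and category not in present_in_relatives
    )


# Placeholder labels and counts, for illustration only.
sag = {"cat_A": 2, "cat_B": 1, "cat_C": 0}
relatives = [{"cat_A": 1}, {"cat_B": 0, "cat_C": 4}]
unique = categories_unique_to_sag(sag, relatives)
```

Because the SAGs are incomplete, a category absent from a SAG is uninformative; only presence in the SAG combined with absence from the finished relatives is used here.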

RESULTS AND DISCUSSION
Enrichment and analysis of endophytic bacteria. Approximately 10^7 to 10^8 cells were enriched from the rhizosphere and endosphere samples using the current method (data not shown). On average, 33.67 ± 7.07 ng of DNA was isolated from the enrichments. In contrast, unenriched endosphere extractions yielded an average of 605.25 ± 469.84 ng of DNA, most of which was presumably from the host plant. The 16S rRNA phylotyping performed on the three enriched and three unenriched endosphere samples demonstrated that Proteobacteria dominated the endosphere of these saplings. These data showed similar read percent abundances at the phylum level, though some significant differences existed (Fig. 1). Phyla that were significantly increased in read abundance percentage in the average enrichment of the three trees were the Actinobacteria and the Planctomycetia (P < 0.01, FDR corrected). The Proteobacteria showed different enrichment profiles at the class level. Alphaproteobacteria and Gammaproteobacteria were significantly increased in read abundance percentage (P < 0.1, FDR corrected). Betaproteobacteria showed no significant difference, while Deltaproteobacteria were significantly decreased in read abundance percentage (P < 0.01, FDR corrected). Differences in read abundances between enriched and nonenriched samples could be due to several issues. Not all bacteria are captured by the enrichment. Bacteria that are tightly associated with plant material could be lost, as they would be removed with the plant fraction in filtering and centrifugation. Lysis during the enrichment could also change the sequencing read abundance, both positively, with more free DNA to sequence, and negatively, if that free DNA became degraded prior to sequencing. Importantly, contaminating chloroplast reads from the roots were significantly decreased in the enrichment by approximately 10-fold (~7% to ≤0.7% of all reads; P < 0.01, FDR corrected) due to removal of plant material.
Single-cell sorting, MDA amplification, and sequencing. For single-cell sorting, the endosphere and rhizosphere enrichments from one tree were chosen, and cells from each sample were sorted into ten 96-well plates from 10 different gates on the cytometry plot. After MDA whole-genome amplification and 16S rRNA gene PCR amplification, there were 169 positive 16S rRNA gene amplifications (86 from the endosphere and 83 from the rhizosphere) based on agarose gel observations. PCR investigations of wells that did not produce bacterial 16S rRNA gene signals suggested that a further 179 wells may have contained fungal cells (data not shown). Of the 169 positive 16S rRNA signals, 115 were successfully sequenced by the Sanger method. The RDP classifier (38) and the NCBI reference RNA database were used to assign phylogeny to the amplified signals. Sorted cells represented multiple phyla, including Acidobacteria, Actinobacteria, Armatimonadetes (formerly OP10), Bacteroidetes, Firmicutes, Planctomycetes, Proteobacteria, and Verrucomicrobia. Several 16S rRNA sequences appeared to represent members of the human microbiome, with sequences corresponding to Corynebacterium spp., Propionibacterium acnes, and Staphylococcus epidermidis, implying some potential skin contamination. It is unclear where this contamination originated, as care was taken to avoid contamination during the harvest and preparation of the samples; however, these are common contaminants in many studies (55). OTUs of these sequences were present in the 16S rRNA gene phylotyping data, though at low abundances (data not shown). Regardless, novel 16S rRNA sequences (<97% identity to sequenced relatives) from multiple phyla were present in the sorted cells. Four single-cell genomes were selected for whole-genome sequencing based on their representation of rare and uncultured phyla (from the NCBI database), the abundance of their OTUs within the Populus rhizosphere, and their inferred ability to form associations with plants.
The 16S rRNA gene sequences from these single cells analyzed by the blast search algorithm revealed greater than 99% identity to Zavarzinella sp. (2 SAGs), Armatimonadetes sp., Acidobacteria sp., and Verrucomicrobia sp. that had previously been observed in microbiome studies of Populus endospheres (16-18) but that were not present in our culture collections from these systems.
Genome assembly. De novo genome assembly of single cells was performed using two data preprocessing approaches (khmer digital normalization and regular assembly) and four assembly software packages (SPAdes, Velvet-SC, IDBA-UD, and CLC), as described in Materials and Methods. Independent of preprocessing approach, the IDBA-UD assembler always generated the best assembly statistics, with the highest N50 values and total genome size assembled. It is worth mentioning that, although khmer normalization has become a prevalent step during single-cell assembly, the khmer authors have prepared a blog about the application of the khmer protocol (http://ivory.idyll.org/blog/why-you-shouldnt-use-diginorm.html) which clearly suggests that normalization steps are not necessary when comparable results are obtained through regular assembly protocols. Our single-cell assemblies have comparable statistics from both khmer and regular assembly protocols. Based on the recommendations from the blog, IDBA-UD assemblies generated with regular assembly protocols were used for further downstream analysis.
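For readers unfamiliar with the N50 statistic used to compare assemblers, a minimal sketch follows. It uses the common definition (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size); the function name and the example lengths are illustrative and are not taken from the assemblies reported here.

```python
def n50(contig_lengths):
    """N50 of an assembly given its contig lengths in base pairs.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half the total.
    Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Illustrative contig lengths only (total 300 bp, half is 150 bp).
example_n50 = n50([100, 80, 60, 40, 20])
```

In this toy case the two longest contigs (100 + 80 = 180 bp) are the first to cover half of the assembly, so the N50 is 80.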
Contamination screening of single-cell amplified genomes. Single-cell sequence data are often found to be contaminated with organisms other than the target population, and contamination removal is a necessary step (56). Contamination screening was performed as described in Materials and Methods. Four SAGs (identification numbers E1D9, E2G8, E9H3, and R9F7) contained 30 to 40% contaminants and generated assembly sizes of 4 to 7 Mb. Most of the contaminating DNA corresponded to eukaryotic lineages, with high similarity to human and plant species. The kmer frequency distribution graphs were created before and after contamination removal steps. Before contamination removal, there were two distinct kmer frequency clouds observed, and one of them (belonging mostly to DNA of eukaryotic origin) was absent after contamination removal, suggesting that we were able to effectively remove the majority of contaminants. Detailed assembly statistics for each SAG after contamination removal are presented in Table 1.
Genome-based phylogenetic inference. Small-subunit (SSU) rRNA trees are well-known predictors of phylogenetic novelty (57,58). However, concatenated alignment of multiple universally distributed single-copy marker genes provides greater phylogenetic resolution than any individual gene for estimating a species tree (59). We constructed a bootstrapped maximum likelihood tree based on concatenation of 18 commonly used conserved marker genes that were present in most of our SAGs and selected reference genomes from the same phylum present on IMG (see Table S1 in the supplemental material). Phylogenetic analyses of the 18-gene concatenated alignments (Fig. 2) showed the presence of 3 distinct clusters corresponding to the phyla Armatimonadetes, Planctomycetes, and Verrucomicrobia, each supported by high bootstrap values (>90). Each SAG in the analyses grouped with the members from their predicted lineages. The closest relatives for SAGs E9H3 and R9F7 were Zavarzinella formosa strain  (38) available at the RDP database (60) and found to be matching with expected lineages, thus confirming the origin for each SAG.
Genome completeness analysis. The checkM tool classified each SAG as belonging to the domain Bacteria, with an estimated completeness of 27% for Armatimonadetes sp. SAG E2G8, of 25% for Verrucomicrobia sp. SAG E1D9, of 48% for Planctomycetes sp. SAG R9F7, and of 51% for Planctomycetes sp. SAG E9H3. Despite contamination removal steps, each SAG was determined to contain contaminants at very low levels (<2%). Detailed quality statistics determined by the checkM tool are described in Table 1. Additional evaluation of the quality and completeness of the SAGs was performed by assessment of a set of essential genes present in each genome (50). By this method, the estimated completeness was 64% for Armatimonadetes sp. SAG E2G8, 54% for Verrucomicrobia sp. SAG E1D9, 68% for Planctomycetes sp. SAG R9F7, and 64% for Planctomycetes sp. SAG E9H3 (see Table S2 in the supplemental material). A combined quality score was assigned to each SAG based on the presence of essential gene sets and the completeness of rRNA and tRNA. The combined quality score was >0.6 for Armatimonadetes sp. SAG E2G8 and Planctomycetes sp. SAGs R9F7 and 0.36 for Verrucomicrobia sp. SAG E1D9 (see Table S2 in the supplemental material). The maximum score assigned by this matrix was 1, in which the complete set of all the essential genes, tRNA, and rRNA were present. These two tools provided independent evaluations for SAG quality estimations using different algorithms. The checkM tool used stringent parameters (ubiquitous and single-copy genes within a phylogenetic lineage, various genomic characteristics, and proximity within a reference genome tree) and provided robust estimations. These completeness estimation results were in accordance with a recent study which estimated genome completeness of 201 SAGs from uncultured archaeal and bacterial cells in the range of less than 10% to greater than 90%, with a mean of 40% (1). Another important factor is that these rare and uncultured small bacterial cells are known to be missing many so-called essential genes and core biosynthetic pathways and so are at least partially dependent on other community members (11,61,62). Therefore, the completeness estimation based on common ubiquitous genes from cultured bacteria may only be a relative measure. In another recent example, a near-complete genome of a Verrucomicrobia phylotype was reconstructed from metagenomic data which showed a drastic reduction (2.81 Mb compared to the predicted effective mean genome size of 4.74 Mb for soil bacteria) (63). Therefore, genome reduction could also be a possible reason for comparatively lower completeness estimation scores.
Functional characterization of single cells. The availability of genomic information for uncultured microbes that remain elusive to direct investigation enables comparative genomic analyses and allows inferences about biochemical properties and metabolic traits. These inferences are useful to predict the roles of these microbes in specific environments and could be used to select effective cultivation conditions. Comparisons between SAGs and corresponding finished/draft genomes revealed the presence of several unique genes and functional characteristics of individual SAGs, which allowed for the prediction of putative roles for these bacteria in the plant environment. The putative functional characteristics for individual SAGs compared to close relatives are described below.
(i) SAG of the phylum Armatimonadetes. The Armatimonadetes sp. SAG E2G8 was isolated from the Populus endosphere, and its genome was compared with the complete genomes of the only two cultured members from the same phylum, Fimbriimonas ginsengisoli Gsoil 348 (IMG ID 2585427636) (64) and Chthonomonas calidirosea T49, DSM 23976 (IMG ID 2524614646) (65). One potentially key observation was the unique presence of biotin (vitamin B7) biosynthesis-related genes in SAG E2G8 compared to the two cultured representatives. Biotin biosynthesis starts with the metabolite malonyl-acyl carrier protein (ACP), which is converted to the precursor pimeloyl-ACP through a series of enzymatic reactions. Some bacteria also have an alternative route, in which the precursor pimeloyl-CoA is derived from pimelate (66).
Pimeloyl-ACP and pimeloyl-coenzyme A act as precursor molecules, and conversion to biotin takes place through four reaction steps. Interestingly, the genes involved in the final four steps (8amino-7- (67), suggesting a possible biotin producing phenotype for Armatimonadetes sp. SAG E2G8. However, some intermediate genes involved in conversion of the starting metabolites (malonyl-ACP or pimelate) to precursor molecules were missing from the Armatimonadetes sp. SAG E2G8 (Fig. 3), possibly because the genome was incomplete or because the precursors could be obtained from within the plant endosphere.
The Armatimonadetes sp. SAG E2G8 contains 21 σ-70-like proteins and has a high σ factor-to-genome size (σ/Mb) ratio, as also reported for the Chthonomonas calidirosea strain T49 (65). The high-abundance σ factors are predicted to coordinate transcriptional regulation of functionally related but dispersed genes (65) and are likely to be involved in the transcription regulatory mechanism in SAG E2G8. Central metabolism appears to proceed via standard glycolysis and the tricarboxylic acid cycle, although some key genes were missing. The presence of genes related to oxidative phosphorylation supports a likely aerobic respiration phenotype. The SAG also contains genes for extracellular nitrate/nitrite transporters, assimilatory nitrate reductase (narB), and dissimilatory nitrate reduction components (nirB, nirD) involved in nitrogen cycling which could be beneficial inside and outside the plant. We also identified the genes coding for cyanate lyase (Ga0078968_13342) and carbonic anhydrase (Ga0078968_11235, Ga0078968_12064) in SAG E2G8, which might confer the ability to tolerate environmental cyanate.
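The factor-to-genome size ratio mentioned above is a simple density: the number of annotated sigma factors divided by the assembly size in megabases. The sketch below shows the arithmetic only. The count of 21 comes from the text, but the assembly size is a hypothetical placeholder because the SAG size is not stated in this section, and the function name is illustrative.

```python
def sigma_factors_per_mb(n_sigma_factors, assembly_size_bp):
    """Sigma factor density in factors per megabase of assembled sequence."""
    if assembly_size_bp <= 0:
        raise ValueError("assembly_size_bp must be positive")
    return n_sigma_factors / (assembly_size_bp / 1_000_000)


# 21 sigma-70-like proteins (from the text); 2.0 Mb is a HYPOTHETICAL size.
density = sigma_factors_per_mb(21, 2_000_000)
print(f"{density:.1f} sigma factors per Mb")
```

For a partial SAG, both the gene count and the assembly size describe only the recovered portion of the genome, so the ratio is best read as an estimate.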
(ii) SAGs of the phylum Planctomycetes. Two SAGs of the phylum Planctomycetes, of endosphere (E9H3) and rhizosphere (R9F7) origin, were compared with the draft genome of Zavarzinella formosa strain A10T (IMG identification number 2548877000) (68), the closest sequenced relative based on 16S rRNA gene sequence similarity. The key distinction between the Planctomycetes SAGs and Zavarzinella formosa strain A10T was the presence of the urease system as a unique feature of SAG E9H3. The urease gene cluster, including the urease α, β, and γ subunits (Ga0078970_101213, Ga0078970_101212, and Ga0078970_101211) and the urease accessory proteins UreF (Ga0078970_101214), UreG (Ga0078970_101215), and UreH (Ga0078970_101216), was detected as part of the operon on contig Ga0078970_1012 in SAG E9H3. Other accessory genes coding for the urea binding protein (Ga0078970_10129) and the urea ABC transporters (Ga0078970_10125, Ga0078970_10126) were also detected on the same contig and as part of the operon (Fig. 4). Active ureases require a nickel-containing active site to catalyze the hydrolysis of urea to ammonia and carbamate (69). In SAG E9H3, we also identified genes related to COG0378, with the predicted function of a Ni2+-binding GTPase involved in the regulation of expression and maturation of urease and hydrogenase; these genes were missing from strain A10T. SAG E9H3 also contained a gene related to the hydrogenase/urease accessory protein HupE (Ga0078970_115010), which is implicated as a secondary transporter for nickel or cobalt (70). Additionally, genes involved in various acid tolerance or pH homeostasis mechanisms, such as the F1F0-ATPase proton pump (71), the arginine and/or glutamate decarboxylase system (72,73), and the urease system (74,75), were present in SAG E9H3 and/or SAG R9F7, suggesting the presence of possible pH tolerance and regulation mechanisms.
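The gene neighborhood described above can be inspected directly from the locus tags. The following sketch assumes, as the contig name Ga0078970_1012 suggests but the text does not state, that each locus tag is the contig identifier followed by a gene ordinal; under that assumption it groups the urease-related genes into runs of adjacent ordinals. The helper names are illustrative.

```python
def gene_ordinals(locus_tags, contig_id):
    """Extract gene ordinals, ASSUMING tag = contig identifier + ordinal."""
    ordinals = []
    for tag in locus_tags:
        if not tag.startswith(contig_id) or len(tag) == len(contig_id):
            raise ValueError(f"{tag} does not belong to contig {contig_id}")
        ordinals.append(int(tag[len(contig_id):]))
    return sorted(ordinals)


def consecutive_runs(ordinals):
    """Group integers into runs of consecutive values."""
    runs = []
    for n in sorted(ordinals):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


# Locus tags quoted in the text for the urease-related genes of SAG E9H3.
urease_tags = [
    "Ga0078970_10125", "Ga0078970_10126",    # urea ABC transporters
    "Ga0078970_10129",                       # urea binding protein
    "Ga0078970_101211", "Ga0078970_101212",  # urease subunits
    "Ga0078970_101213",
    "Ga0078970_101214", "Ga0078970_101215",  # accessory proteins UreF, UreG
    "Ga0078970_101216",                      # accessory protein UreH
]
ordinals = gene_ordinals(urease_tags, "Ga0078970_1012")
runs = consecutive_runs(ordinals)
print(runs)
```

Read this way, the subunit and accessory genes form one uninterrupted run, while the transporter and binding protein genes sit a few positions upstream on the same contig.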
Most of the genes involved in glycolysis, the citric acid cycle, the pentose phosphate pathway, and pyruvate metabolism were identified in both SAGs and Zavarzinella formosa strain A10T, which suggests a common route for central metabolism. The IMG phenotype prediction tool (52) predicted an aerobic phenotype for SAG E9H3 based on the presence of the genes coding for cytochrome bd-I ubiquinol oxidase (Ga0078970_104513, Ga0078970_104514), which is known to be involved in ubiquinol oxidation. Interestingly, the cytochrome bd complex genes were detected only in E9H3 and were missing from strain A10T and R9F7, though they could have been missing from R9F7 because the genome was incomplete. Pilus assembly-related genes were also present in both SAGs and might serve the function of cell-to-cell or surface attachment, as observed in the case of Z. formosa strain A10T (76). Further, a gene coding for a putative pectate lyase was found in the rhizosphere SAG R9F7, which is indicative of a plant degradation lifestyle. Pectins are a major component of plant cell walls and an abundant carbon source in the rhizosphere (77).
(iii) SAG of the phylum Verrucomicrobia. The Verrucomicrobia sp. SAG E1D9 came from the Populus endosphere and was compared against the draft genome sequence of its relative Chthoniobacter flavus Ellin428 (78). Most of the genes involved in the glycolysis pathway, several genes involved in the citric acid cycle, and those of the pentose phosphate pathway were present, suggesting a traditional route for carbon metabolism. Although a majority of the members of the phylum Verrucomicrobia exhibit aerobic phenotypes, many genes involved in oxidative phosphorylation were missing from SAG E1D9, possibly because of the incomplete nature of the genome. A putative catalase gene (Ga0078966_11592) was present in both SAG E1D9 and Ellin428, though biochemical tests showed Ellin428 to be catalase negative (79). Based on the Pfam functional profile, a total of 39 protein-coding genes related to various glycosyl hydrolase families were identified, which included 6 genes corresponding to cellulases (glycosyl hydrolase family 5) and 12 genes corresponding to glycosyl hydrolase family 16. Members of this family are known to hydrolyze a variety of plant glucans and galactans. Twelve of these glycosyl hydrolase genes were found in the Verrucomicrobia sp. SAG E1D9 but not in Ellin428. The presence of various glycosyl hydrolase family-related genes in SAG E1D9 suggests the ability to degrade complex plant material and could indicate how the organism gained access to the endosphere.
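A comparison of this kind can be sketched as a tally of annotated genes per functional family in the SAG and in the reference genome, followed by a report of the families seen only in the SAG. This is a family-level simplification and not necessarily the gene-level comparison performed in the study; the gene identifiers, family labels, and counts below are hypothetical toy data.

```python
from collections import Counter


def family_counts(annotations):
    """Count genes per functional family from (gene_id, family) pairs."""
    return Counter(family for _gene_id, family in annotations)


def families_unique_to(query_annotations, reference_annotations):
    """Return {family: gene count} for families present only in the query."""
    query = family_counts(query_annotations)
    reference = family_counts(reference_annotations)
    return {fam: n for fam, n in query.items() if fam not in reference}


# HYPOTHETICAL annotations: (gene identifier, glycosyl hydrolase family).
sag_genes = [
    ("sag_0001", "GH5"), ("sag_0002", "GH5"),
    ("sag_0003", "GH16"), ("sag_0004", "GH43"),
]
reference_genes = [
    ("ref_0001", "GH5"), ("ref_0002", "GH16"), ("ref_0003", "GH2"),
]

sag_totals = family_counts(sag_genes)
unique = families_unique_to(sag_genes, reference_genes)
print(dict(sag_totals))
print(unique)
```

Because a SAG is incomplete, only presence in the SAG and absence from the reference is informative; a family missing from the SAG may simply not have been recovered.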
Strategies for bringing culture to the uncultured. Culture-independent approaches have revolutionized our understanding of microbial diversity and evolution (10); however, laboratory cultures are essential for detailed investigations of complex organismal biology and core biosynthetic capacities and to infer specialized functions within communities. There have been examples of genome-informed isolation of novel microbes, in which sequence-derived information was useful to select appropriate cultivation conditions (6,7,80). Similarly, the genomic information and characteristics described for the current SAGs may be useful to select appropriate cultivation conditions. All of the SAGs described above share an isolation origin, the Populus root environment, which is rich in complex plant polysaccharides like cellulose, hemicellulose, and other complex heteropolysaccharides. Uncultured bacteria, predominantly diverse Planctomycetes, have been shown to be adapted to use these complex heteropolysaccharides for growth, followed by populations of Armatimonadetes and Verrucomicrobia as secondary consumers (81). The current SAGs of Planctomycetes and Verrucomicrobia contain a variety of glycoside hydrolase, polysaccharide, and pectate lyase genes, suggesting the possibility of a mechanism to scavenge a wide variety of plant oligosaccharides and polysaccharides. Therefore, the use of these complex heteropolysaccharides in a growth medium may provide a means for culturing these bacteria by reducing resource competition. The presence of the urease gene cluster and the additional pH tolerance mechanisms of the Planctomycetes SAGs hint that growth media with extreme pH conditions and urea as the sole nitrogen source might further reduce nutrient competition. Similarly, the putative biotin biosynthesis ability of the Armatimonadetes SAG suggests that growth media lacking biotin could limit the growth of biotin auxotrophs.
Several of these conditions, including the use of diluted, low-nutrient, low-pH media and the use of a complex heteropolysaccharide as an energy source, were key to the successful cultivation of the first member of the phylum Armatimonadetes (OP10) (82) and may also facilitate future cultivation efforts for the organisms represented by these SAGs.
Conclusion. Physical separation and isolation of plant-associated bacteria from plant material are challenging tasks. Our modified enrichment protocol, based on differential and density gradient centrifugation, achieved a significant reduction in contaminating plant debris and DNA and enriched for bacteria from the rhizosphere and endosphere. This protocol also enabled single-cell genomic analyses of enriched bacterial samples, which generated genomes of previously uncultured bacteria of interest. Bioinformatics and comparative genomic analyses revealed the unique characteristics of these SAGs compared to their close relatives. These characteristics include the presence of the biotin biosynthesis gene cluster in the Armatimonadetes SAG, the urease gene cluster in the Planctomycetes SAGs, and the putative ability to degrade complex plant material in the Verrucomicrobia SAG. This genomic information may facilitate future efforts to culture these bacteria. This study provides a modified enrichment protocol for the separation and isolation of a live endophytic bacterial sample and facilitates further analyses by single-cell genomics, metagenomics, or culture-based methods.