ABSTRACT
Acquisition of ecologically relevant genes is common among ocean bacteria, but whether it has a major impact on genome evolution in marine environments remains unknown. Here, we analyzed the core genomes of 16 phylogenetically diverse and ecologically relevant bacterioplankton lineages, each consisting of up to five genomes varying at the strain level. Statistical approaches identified from each lineage up to ∼50 loci showing anomalously high divergence at synonymous sites, which is best explained by recombination with distantly related organisms. The enriched gene categories in these outlier loci match well with the characteristics previously identified as the key phenotypes of these lineages. Examples are antibiotic synthesis and detoxification in Phaeobacter inhibens, exopolysaccharide production in Alteromonas macleodii, hydrocarbon degradation in Marinobacter hydrocarbonoclasticus, and cold adaptation in Pseudoalteromonas haloplanktis. Intriguingly, the outlier loci feature polysaccharide catabolism in Cellulophaga baltica but not in Cellulophaga lytica, consistent with their primary habitat preferences in macroalgae and beach sands, respectively. Likewise, analysis of Prochlorococcus showed that photosynthesis-related genes listed in the outlier loci are found only in the high-light-adapted ecotype and not in the low-light adapted ecotype. These observations strongly suggest that recombination with distant relatives is a key mechanism driving the ecological diversification among marine bacterial lineages.
IMPORTANCE Acquisition of new metabolic genes has been known as an important mechanism driving bacterial evolution and adaptation in the ocean, but acquisition of novel alleles of existing genes and its potential ecological role have not been examined. Guided by population genetic theories, our genomic analysis showed that divergent allele acquisition is prevalent in phylogenetically diverse marine bacterial lineages and that the affected loci often encode metabolic functions that underlie the known ecological roles of the lineages under study.
INTRODUCTION
Lateral gene transfer (LGT) has been repeatedly proposed as a major mechanism giving rise to marine microbially diverse populations. A high frequency of LGT occurs in marine bacterioplankton lineages through conjugation (1), transduction (2, 3), genetic transformation (4), and gene transfer agents (5). More recently, the most abundant free-living marine cyanobacteria, Prochlorococcus spp., were found to produce abundant extracellular membrane vesicles (6), which are known as important media for LGT in Escherichia coli (7) and Acinetobacter (8). LGT may transfer a novel gene that did not occur in the recipient genome. It may also replace the original copy with a novel allele at the same locus, a mechanism called homologous recombination (HR). Regardless of these possibilities, LGT is a rapid way to confer new functions to the recipient organisms and may enable new biogeochemical transformations, but it is not clear whether incidences of LGT events have a major impact on genome evolution and diversification of bacterial lineages in the ocean.
To answer these questions, we explored the recombination pattern occurring in the core genomic regions of marine bacterial lineages, each varying at the strain level. Here, the core genome consists of single-copy genes shared by all members of a lineage. Because the core genomic regions represent 82.01% (±7.23%) of the whole genome among the 16 examined lineages in Alphaproteobacteria, Gammaproteobacteria, Flavobacteria, and Cyanobacteria (see Table S1 in the supplemental material), any major recombination pattern in the core genome may have consequences in microbial evolution at the genomic scale. Since synonymous mutations of a protein-coding gene do not lead to amino acid replacement, they are often considered neutral or nearly so. Without recombination, divergence at synonymous sites between two orthologous genes is determined largely by the mutation rate and their divergence time, and most of the variation of synonymous divergence among loci can be explained by the stochastic nature of mutations among genomic regions. If a divergent allele was introduced through recombination, which is not uncommon in prokaryotic lineages, the synonymous site of the recombined loci would expectedly show an unusually large synonymous substitution rate (dS) (defined as the number of synonymous substitutions per synonymous site) compared to those of the remaining regions of the genome (9). Compared to nonsynonymous changes, which are subject to strong functional constraints at the protein sequence level, synonymous changes are largely neutral in a vast majority of genes (10, 11), and thus enormously large dS values provide a reliable measure of allele acquisition from divergent relatives (12–15). It is worth mentioning that the assumption that changes at synonymous sites are effectively neutral has been questioned. Our recent studies have shown that nutrient limitations act as selective pressures driving genomic G+C content evolution in marine bacterioplankton lineages (16, 17). Under these mechanisms, synonymous sites are affected indiscriminately among all genes, and thus unusual dS values are unlikely to arise in a small subset of gene families. In fact, identifying unusual dS values as a way to find genes subject to recombination has been used in several studies (12, 15, 18). Here, we identified such loci and linked their predicted functions to the ecological roles of theses lineages.
RESULTS AND DISCUSSION
Recombination with distant relatives in marine Flavobacteria lineages.As described in Materials and Methods, potential allele acquisition events from distant relatives were differentiated according to the phylogenetic branches where recombination may have occurred. In the case of Cellulophaga baltica, 21 gene families were potentially subject to recombination occurring at the outgroup branch (NN016038) (shown as asterisks in Fig. 1A; see Fig. S1A and Table S2A in the supplemental material). These loci are highly enriched with polysaccharide utilization genes (Fig. 1A; Table S2A). This includes a glycoside hydrolase (GH) cluster (M667_09725, _09730, and _09735) consisting of genes encoding α-1,3-(3,6-anhydro)-l-galactosidase, polygalacturonase, and β-porphyranase, which hydrolyze the glycosidic bonds in the linear polysaccharide chains of agarose, pectin, and porphyran, respectively (19). Phylogenetic analyses showed that these three neighboring GH genes each have a distinct evolutionary history. For instance, Formosa agariphila KMM 3901, a close relative of Cellulophaga baltica, is more closely related to C. baltica 18 and C. baltica E6 for α-1,3-(3,6-anhydro)-l-galactosidase (see Fig. S2A in the supplemental material) but instead becomes more closely related to C. baltica NN016038 for polygalacturonase (Fig. S2B) and β-porphyranase (Fig. S2C), suggesting that recombination occurred independently among these GH genes. Furthermore, this GH cluster is flanked by genes involved in the degradation of various polysaccharides, including multiple copies of genes for sulfatase, peptidylglycine monooxygenase, β-agarase, and alpha-l-fucosidase, as well as tandem susCD-like genes, which together form a large polysaccharide utilization locus (PUL) (Fig. S2D).
Detection of the dS outliers of Cellulophaga baltica (A) and Cellulophaga lytica (B) through two rounds of linear regression analysis. Here, dS was calculated between each of the ingroup genomes and the outgroup genome across all single-copy orthologous genes shared by the three genomes. Take Cellulophaga baltica as an example. Strain NN016038 is the outgroup, and strains E6 and “18” comprise the ingroup. The dS was calculated between NN016038 and E6 and between NN016038 and “18.” The first linear regression (explained in the legend to Fig. S1 in the supplemental material) led to the identification of the outlier genes in which divergent alleles were acquired at the outgroup branch. These outlier genes are shown in Fig. S1A and marked with asterisks in panel A. The second linear regression was performed using the dS of E6/NN016038 against the dS of “18”/NN016038 after the genes shown with asterisks were excluded. It led to the identification of outlier genes resulting from divergent allele acquisitions at one of the ingroup branches, and these outliers are marked as solid triangles. Schematic representations of the two possible allele acquisition scenarios at ingroup branches are also shown: an acquisition at E6 (top left) and “18” (bottom right) in panel A and an acquisition at HI1 (top left) and DSM 7489 (bottom right) in panel B. The locus tags for ecologically relevant outlier genes are shown next to the outlier gene symbols, and their predicted functions are given in Table S2 in the supplemental material.
The loci subject to recombination at the outgroup branch also highlight genes encoding iota-carrageenase (M667_16305), which degrades the red algal cell wall polysaccharide iota-carrageenan, a glycan metabolism protein (M667_07315), and four SusC-like TonB-dependent transporters (M667_01065, _01810, _02955, and _07310) that are known members in the PUL-like systems and function in concert with linked GHs (20, 21). In addition, many more outlier genes are involved in sugar metabolism. These include a gene cluster for sugar transport (M667_10170, _10175, and _10180) and several loci showing evidence of recombination with distant relatives occurring at the ingroup branches (shown as triangles in Fig. 1A), such as a gene cluster for sugar synthesis (M667_05580, _05585, _05590, and _05595), a sugar transferase (M667_03470), a glycosyl hydrolase (M667_09780), and a carbohydrate-binding protein (M667_19640) (Table S2A).
In the case of the closely related lineage Cellulophaga lytica (Fig. 1B; Fig. S1B), polysaccharide-catabolizing genes are surprisingly depleted in the gene pools showing an unusual pattern of dS (Table S2B). Instead, two genes engaged in cobalt/zinc/cadmium resistance (Celly_3219 and _3220) and a few genes involved in the biosynthesis of the O-antigen precursors UDP-α-d-glucuronate (Celly_2638) and UDP-α-d-galactose (Celly_2640) show evidence for recombination with distant relatives.
Recombination with distant relatives in marine Alteromonadales lineages.Application of these approaches to the three closely related genomes of Alteromonas macleodii led to the identification of 44 gene families showing evidence of potential recombination with distant relatives at the outgroup branch and 21 families of recombination at the ingroup branches (Fig. 2A; see Fig. S1C and Table S3A in the supplemental material). One major characteristic is that these genes are often clustered and located in previously reported genomic islands (GIs) (Table S3A) involved in cell surface modifications mainly for exopolysaccharide (EPS) production and for phage avoidance (22, 23). The dS outlier genes related to capsular EPS biosynthesis include genes for a polysaccharide export outer membrane protein (AMEC673_07395), five glycosyltransferases (AMEC673_07440, _07450, _07455, _07490, and _07505), and a putative polysaccharide transporter (AMEC673_07465) (Fig. 3A). Capsular polysaccharide and exopolysaccharide are recognized media for cell-cell communication; they are critical compounds involved in adhesion for surface colonization, cohesion of biofilm, tolerance to antimicrobial agents, resistance to phage infection, and defense against protozoan attack, among other functions (24). Our finding that the dS outlier list includes genes involved in EPS metabolism is consistent with observations that A. macleodii is among the most potent bacteria for EPS production (25), and it is commonly used to produce EPS for industry and other purposes (25, 26).
Detection of dS outliers of Alteromonas macleodii (A) and Paraglaciecola agarilytica (B) through the linear regression procedure outlined in the Fig. 1 legend. Asterisks and solid triangles represent outlier genes in which the divergent alleles were acquired at the outgroup and ingroup branches, respectively. The predicted functions of outlier genes are given in Table S3 in the supplemental material.
Three loci with anomalously high dS values located in previously reported genomic islands (GIs) of exopolysaccharide (EPS) biosynthesis (A), lipopolysaccharide (LPS) O-chain production (B), and flagellum glycosylation (C) in Alteromonas macleodii. Loci with unusually high dS values are indicated by red (recombination at the outgroup branch) and green (recombination at an ingroup branch) arrows, and the remaining genes are indicated by blue arrows. Orthologous genes between genomes are linked with shades in distinct colors: light coral for genes with anomalously high dS values and blue for the rest. Parallelograms illustrate relocations on the same strand, while a twist represents an inversion. Phylograms on the left of each panel describe the relationships among the three taxa.
One prominent example related to phage avoidance in the dS outlier list is the cluster involved in the synthesis of the O chain of lipopolysaccharide (LPS) (AMEC673_13405, _13415, _13465, _13470, _13475, _13540, and _13545) within the LPS O-chain GI proposed previously (23) (Fig. 3B). The LPS O chain has been known as a phage receptor (27), and recombination with divergent relatives is likely an adaptive strategy to escape from phage recognition (28). In fact, high diversity of these loci has been postulated to be an important mechanism for phage avoidance in A. macleodii (23). Another gene cluster, located inside a flagellum glycosylation island, encodes the major exposed components of the flagellum, including two flagellar proteins (AMEC673_04840 and _04845), and an additional gene cluster includes genes for the N-acylneuraminate cytidylyltransferase (AMEC673_04860) essential for the formation of the capsule, the UDP-2,4-diacetamido-2,4,6-trideoxy-beta-l-altropyranose hydrolase (AMEC673_04865) and the pseudaminic acid synthase (AMEC673_04890) used for flagellin modification (29), the DegT/DnrJ/EryC1/StrS aminotransferase (AMEC673_04895), and the nucleoside-diphosphate sugar epimerases (AMEC673_04900) engaged in polysaccharide biogenesis (Fig. 3C). The flagellum has been suggested to be the target for recognition by phages in A. macleodii (23), and flagellum-specific phages may interact with their host by attachment to flagella followed by DNA injection (30). Our analysis suggests that recombination with divergent alleles may allow A. macleodii to escape from phage infection.
In contrast, the dS outlier genes in a related lineage, Paraglaciecola chathamensis (Fig. 2B; Fig. S1D), feature heavy metal resistance functions (Table S3B). This includes a gene cluster (Glaag_4429, _4430, _4431, _4432, _4433, _4434, and _4435) responsible for resistance to cobalt, zinc, and cadmium. Similar czc cassettes for metal efflux pumps are broadly present in Alteromonas genomes and may be used to export toxic products that are imported along with nutrient uptake from particulate organic matter (23). It also includes a gene cluster responsible for mercury (Hg) resistance, consisting of genes for an Hg-responsive transcriptional regulator (Glaag_4413), a periplasmic mercuric ion-scavenging protein (Glaag_4411), two inner membrane-spanning Hg(II) and phenylmercury transporters (Glaag_4410 and _4412), and an Hg(II) reductase (Glaag_4409) (Table S3B). Further analyses suggest that recombination at these Hg resistance loci occurred on the leaf branch of Paraglaciecola chathamensis S18K6 (see Fig. S3 in the supplemental material). This strain was isolated from the deep-sea sediment near Chatham Rise (31), a region where numerous seamounts with hydrothermal vents are distributed (32). One important feature of the vent habitats is high levels of heavy metals, including Hg, and indeed an Hg-resistant bacterium was isolated from a similar vent environment (33). Our finding therefore suggests that allele acquisition from distant relatives at loci involved in Hg metabolism is potentially an important adaptive mechanism for P. chathamensis S18K6.
A third related lineage, Marinobacter hydrocarbonoclasticus, contains 56 gene families showing unusual dS values (see Fig. S4A and Tables S3C and S4A in the supplemental material). Among them, genes involved in aromatic hydrocarbon degradation (MARHY0454) and flagellar assembly (MARHY2506, -2508, and -2178) were subject to repeated recombination with divergent alleles (see Fig. S5A, panels i, g, and h, in the supplemental material). These outlier genes also feature clustered loci responsible for signal transduction, including a cluster (MARHY3575, -3576, -3577, and -3578) encoding the chemotaxis protein CheY, diguanylate cyclase, a two-component system sensor histidine kinase, and another histidine kinase and a second cluster (MARHY1182, -1183, and -1184) encoding a chemotaxis protein, a pseudouridylate synthase, and a diguanylate cyclase. In addition, they include genes for the enzyme glutathione S-transferase (MARHY2825), which has been used as a biomarker of polycyclic aromatic hydrocarbon (PAH) pollution (34), a cluster (MARHY3259, -3260, -3261, and -3262) involved in the catabolism of hydrophobic carbons, including an alkylhydroperoxidase, an alcohol dehydrogenase, a TetR family transcriptional regulator, and an aldehyde dehydrogenase, a cluster (MARHY0073 and -0075) for a dicarboxylic acid transport system, and a cluster (MARHY0216, -0217, -0218, and -0219) encoding a TonB-dependent receptor and ABC transporters. The four strains used here were all isolated from oil-contaminated beach sands (35, 36) and are capable of degrading a wide range of hydrophobic organic compounds (37). Previous metatranscriptomic studies showed that genes involved in PAH, alkane, and toluene degradation, as well as motility and chemotaxis, are often highly expressed in natural beach communities associated with oil contamination (38, 39). Because the genes identified through the dS outlier approaches agree well with the highly expressed genes, we suggest that recombination with distant relatives is potentially an important mechanism increasing their capabilities in hydrocarbon degradation.
A fourth lineage, Pseudoalteromonas haloplanktis, contains 67 gene families showing an anomalous dS pattern (Fig. S4B; Tables S3D and S4B). The most interesting gene within this outlier list is PH505_be00240, an ortholog of the gene for a cold-adapted S-formylglutathione hydrolase (PSHAa1385) in the closely related P. haloplanktis TAC125 (see Fig. S6 in the supplemental material). The crystal structure of this protein from P. haloplanktis TAC125 revealed a significant reduction of arginine residues and iron pairs, as well as a lower arginine/(arginine + lysine) ratio, providing appropriate flexibility around the active site for catalysis at low temperatures (40). Other interesting outlier genes related to cold adaptation include a gene (PH505_ap00150) orthologous to a cold shock RNA methyltransferase gene (PSHAb0516) in P. haloplanktis TAC125 (41) and three genes involved in the defense against oxidative stress, including genes for a peroxiredoxin (PH505_dz00090) responsible for the thiol-dependent reduction of peroxides (42), a chorismate pyruvate-lyase (PH505_bj00180) engaged in the biosynthesis of ubiquinone, which is a lipid subject to oxido-reduction cycles (43), and a lactoylglutathione lyase (PH505_dn00020) transforming methylglyoxal and glutathione to S-lactoylglutathione in methylglyoxal degradation (44). These genes are potentially important for P. haloplanktis to adopt a psychrophilic lifestyle, suggesting that acquisition of divergent alleles is an adaptive mechanism for cold adaptation.
Recombination with distant relatives in marine Roseobacter lineages.Using the same dS outlier approaches, we investigated two marine Roseobacter species, each consisting of three genomes. In the case of Phaeobacter inhibens, three gene families were potentially subject to recombination occurring at the outgroup branch (PG210) (shown as asterisks in Fig. 4A; see Fig. S1E and Table S5A in the supplemental material). These include two genes encoding repeats-in-toxin (RTX) proteins (PGA1_65p00040 and _65p00350) that function as virulence factors (45) and a third gene encoding lactoylglutathione lyase (PGA1_c13890), which is responsible for detoxification of methylglyoxal, a highly toxic by-product of glycolysis metabolism (46) contributing to intracellular invasiveness (47). In addition, 27 gene families show evidence of potential recombination with distant relatives occurring at the ingroup branches (shown as triangles in Fig. 4A). These outlier genes were highly enriched in antibiotic production and resistance (Table S5A). One example is aminoglycoside phosphotransferase (PGA1_c00530), which catalyzes the phosphorylation of the aminoglycoside and macrolide antibiotics, leading to their inactivation and to bacterial antibiotic resistance (48). A second example is d-hydantoinase (PGA1_c12850), which catalyzes the hydrolysis of 5-substituted hydantoins in microorganisms, leading to enantiomerically pure N-carbamyl amino acids, and is widely used for the production of antibiotics, peptide hormones, pyrethroids, and pesticides (49). A third example is the deacylase PagL (PGA1_c04940), which is important for virulence and resistance to cationic antimicrobial peptides via regulating lipid A modification (50). A fourth example includes the two glutathione S-transferases (PGA1_c13330 and _c23760), which are involved in cellular detoxification by catalyzing the conjugation of glutathione with a wide range of endogenous and xenobiotic alkylating agents or by binding antibiotics and reduce the antimicrobial activity of beta-lactam drugs (51, 52). A fifth example is the homoserine/homoserine lactone efflux protein (PGA1_c28370), which is part of the quorum-sensing system in roseobacters (53) and is suggested to be involved in antibiotic/exoenzyme production and biofilm formation in response to high population density (54). Finally, a gene encoding antibiotic biosynthesis monooxygenases (PGA1_c18850) was also in the outlier list. Because Phaeobacter spp. are probiotic bacteria showing strong antagonistic interaction with the common aquaculture pathogens in the Vibrionaceae lineage (55–57), our finding suggests that allele acquisition from distant relatives is potentially an important mechanism supporting this major phenotype.
Detection of dS outliers of Phaeobacter inhibens (A) and Sulfitobacter pontiacus (B) through the linear regression procedure outlined in the Fig. 1 legend. Asterisks and solid triangles represent outlier genes in which the divergent alleles were acquired at the outgroup and ingroup branches, respectively. The predicted functions of outlier genes are given in Table S5 in the supplemental material.
In another Roseobacter lineage, Sulfitobacter pontiacus, we identified seven gene families showing evidence of potential recombination occurring at the outgroup branch and another seven families showing evidence of potential recombination at ingroup branches (Fig. 4B; Fig. S1F). These outlier loci encode various biological functions, such as DNA transposition, signal transduction, and capsular polysaccharide synthesis, but they are not related to antibiotic production and degradation (Table S5B).
Recombination with distant relatives in a marine SAR11 lineage.The k-means procedure identified 55 genes showing unusual pattern of the dS values in the five closely related genomes of “Candidatus Pelagibacter ubique” (see Fig. S4C and Table S6 in the supplemental material). In 8 of the 10 pairwise genome combinations, the mean and median values of dS in this cluster were one or two orders of magnitude greater than those in the other cluster (Table S4C). We reconstructed the most likely scenarios of homologous recombination events that gave rise to the anomalous dS at each of these 55 loci along the phylogeny of these five strains. The most frequently observed pattern (18 of 55) is that all pairs involved in HTCC1040 are highly divergent from each other at synonymous sites, while all remaining pairs are identical or nearly so. This can be readily explained by an acquisition of a divergent allele through recombination on the leaf branch of HTCC1040 or on the ancestral branch, giving rise to the most recent common ancestor (MRCA) of the other four strains (Fig. 5A). Another commonly observed pattern (6 of 55) is that all pairs involved in HTCC1016 have unusually high dS values while the remaining pairs do not, suggesting that recombination occurred on the leaf branch of HTCC1016 (Fig. 5B). Recombination on other leaf branches (HTCC1002, HTCC1062, and HTCC1013) was rare (two gene families for HTCC1002 and one each for HTCC1013 and HTCC1062). For some loci (4 of 55), HTCC1002 and HTCC1062 are very similar to each other at synonymous sites, while they are highly divergent from HTCC1013, HTCC1016, and HTCC1040, suggesting that recombination occurred prior to the diversification of HTCC1002 and HTCC1062 (Fig. 5C). For other loci (1 of 55), HTCC1002, HTCC1062, and HTCC1013 are identical or nearly so to each other at synonymous sites, while they are highly divergent from HTCC1016 and HTCC1040. This can be easily explained by a recombination event having occurred prior to the diversification of HTCC1002, HTCC1062, and HTCC1013 (Fig. 5D). In the above cases, recombination most likely occurred only once. Among the genes showing these patterns include those encoding a number of substrate transporters, such as the TRAP-type C4-dicarboxylate transporter (Ga0076703_1168), tripartite ATP-independent periplasmic transporter (DctQ component) (Ga0076703_1167), tripartite tricarboxylate transporter TctA family (Ga0076703_1244), ABC transporters (mostly for sugar) (Ga0076703_11155, _11156, _11157, and _11158), permease for cytosine/purines (Ga0076703_1164), mannitol transporter (Ga0076703_1166), and Na+-dependent transporter (Ga0076703_11892) (Table S6).
The phylogenetic relationships of five closely related “Candidatus Pelagibacter ubique” genomes, illustrating eight possible scenarios of recombination with distant relatives that gave rise to the anomalous dS values at the 55 loci. This inference was based on the principle of parsimony. Cladograms, rather than phylograms, are used here in order to display the gene recombination among taxa appropriately.
The remaining loci have more complex patterns of synonymous substitution, which must account for two or more recombination events. In two of these loci, for instance, HTCC1002 and HTCC1062 are identical at synonymous sites, while all other possible pairwise combinations are highly divergent. This requires at least three independent recombination events at three ancestral branches of the phylogeny (Fig. 5E). Interestingly, these loci encode an O-antigen ligase (Ga0076703_11357) involved in the synthesis of O antigen, an LPS found in the outer membrane of Gram-negative bacteria. In another locus, encoding a putative lipoprotein (Ga0076703_11990), all pairwise comparisons involving HTCC1062 or HTCC1013 are highly divergent at synonymous sites, but HTCC1062 and HTCC1013 are highly similar at synonymous sites, suggesting that either HTCC1062 or HTCC1013 acquired a divergent allele and this was followed by a recombination event between HTCC1062 and HTCC1013 (Fig. 5F). In a third locus, encoding a glycosyltransferase (Ga0076703_12271) and an endonuclease (Ga0076703_12272), all but the pair HTCC1016 and HTCC1062 are highly divergent at synonymous sites, suggesting that each of the three ancestral branches may have recombined with divergently related organisms and this was followed by a recombination event between HTCC1016 and HTCC1062 (Fig. 5G). In another six genes, the dS values of all pairwise comparisons within the clade HTCC1002/HTCC1062/HTCC1013 are approximately one order of magnitude smaller than those of all the remaining pairwise comparisons, suggesting that recombination with a divergently related organism occurred on the ancestral branch leading to the MRCA of this clade and on the leaf branch of HTCC1040 or on the ancestral branch, giving rise to the clade HTCC1002/HTCC1062/HTCC1013/HTCC1016 (Fig. 5H). This pattern of recombination is evident in a gene cluster consisting of genes for CTP synthase (Ga0076703_12347) and 2-dehydro-3-deoxy-phosphooctonate aldolase (Ga0076703_12346), which is engaged in CMP-2-keto-3-deoxyoctulosonic acid (KDO) biosynthesis, a key step in the biosynthesis of LPS (58), as well as in another gene cluster involved in thiamine salvage comprising genes for a YggS-like protein with a TIM-barrel fold (Ga0076703_11577), a thiamine monophosphate synthase (Ga0076703_11578), and a putative porin (Ga0076703_11579) (Table S6).
The recent isolation of phages (pelagiphage) from “Candidatus Pelagibacter ubique” and discovery of their high abundance in oceanic viromes suggest that SAR11 organisms interact with pelagiphage commonly in the ocean (59). In contrast to recombinational genes encoding various transporters affected by only one recombination event, multiple genes potentially involved in the interaction with phages were subject to repeated recombination with distant relatives, providing further evidence for the ecological significance of the SAR11-pelagiphage interaction. Among these, the gene encoding O-antigen ligase is located in the hypervariable region (HVR2), consistent with a previous speculation that HVR2 may be involved in phage resistance (59). Independent recombination events occurring at the O-antigen ligase gene strongly suggest that LPS plays an important but currently unrecognized role in SAR11 and perhaps many other marine bacteria.
Recombination with distant relatives in marine Prochlorococcus lineages.We also analyzed two high-light (HL)-adapted Prochlorococcus lineages (both in clade HLII) (Fig. 6A, S1G, S4D, and S5C; Table S4D) and two low-light (LL)-adapted Prochlorococcus lineages (clade LLIV and clade LLI) (Fig. 6B, Fig. S1H, S4E, and S5D; Table S4E) using the dS outlier approaches. In the first HLII lineage, consisting of three genomes (MIT 9202, MIT 9201, and MIT 9215), 38 genes fall into the outlier group, where two genes coding for high-light-inducible proteins (HLIPs) (P9215_16211 and _16221) (see Table S7A in the supplemental material) show evidence for recombination with distant relatives. HLIPs play a critical role in protecting the cyanobacterial photosynthetic apparatus under stressful conditions, such as high-level light exposure, nitrogen/sulfur deprivation, and low temperature (60). These genes were found in Prochlorococcus phage genomes and are subject to repeated transfers among hosts through phage infection (61, 62). Phylogenetic analysis suggests that horizontal transfers for these two loci are potentially mediated by different phages (see Fig. S7 in the supplemental material). Another photosynthesis-related gene with evidence for recombination with distant relatives encodes the photosystem II (PSII) reaction center protein H (psbH; P9215_02741) (Table S7A). This protein is potentially important for energy dissipation within PSII (63). We did not find psbH in the 78 available cyanophage genomes, and thus the mechanism underlying its transfer remains unknown. A further interesting finding was that a gene encoding carboxysome shell protein (P9215_06331), involved in CO2 fixation, is also part of the recombination gene pool. In contrast, in the second HLII lineage, consisting of four genomes (AS9601, MIT9301, MIT9314, and SB), only 20 genes show unusual patterns of dS values. This list is enriched with genes involved in the biosynthesis of the cell surface structure, including a UDP-glucose 6-dehydrogenase gene (A9601_13901) and a cluster of four genes engaged in biosynthesis of GDP-mannose (A9601_13971, _13981, _13991, and _14001), a key substrate in glycoprotein formation (Table S7B).
Detection of dS outliers of a high-light (HL)-adapted Prochlorococcus lineage (A), a low-light (LL)-adapted Prochlorococcus lineage (B), and Synechococcus 5.1B (C) through the linear regression procedure outlined in the Fig. 1 legend. Asterisks and solid triangles represent outlier genes in which the divergent alleles were acquired at the outgroup and ingroup branches, respectively. The predicted functions of outlier genes are given in Tables S7 and S8 in the supplemental material.
In the case of the two LL lineages, genes involved in O-antigen biosynthesis are commonly found in the dS outlier list in both lineages (Table S7C and D). Other outlier genes in the LLIV lineage encode dTDP–d-glucose 4,6-dehydratase (rfbB; P9303_01181), NAD-dependent epimerase (P9303_26091), dTDP-4-dehydrorhamnose 3,5-epimerase (rfbC; P9303_00941), and dTDP-4-dehydrorhamnose reductase (rfbD; P9303_00961). These together convert dTDP-α-d-glucose to β-l-rhamnopyranose (64–66), a deoxy-sugar that functions as a building block of the O antigens of many bacterial species (27). They also include several genes encoding group 1 glycosyltransferases (P9303_25631, _25641, and _25651), which are engaged in cell envelope and outer membrane biogenesis. The outlier list in the LLI lineage includes genes for UDP-glucose 4-epimerase (NATL1_08531) and a cluster catalyzing dTDP–l-rhamnose biosynthesis (NATL1_08551, _08561, and _08571).
Recombination with distant relatives in marine Synechococcus lineages.We also applied the dS outlier approaches to three lineages of Synechococcus affiliated with clade 5.1B (Fig. 6C), an unclassified clade (Fig. S1J), and clade 5.1A (Fig. S1K). For the 5.1B lineage, consisting of three genomes, the loci showing evidence for recombination at the outgroup branch feature a cluster of genes involved in the biosynthesis of phycobilisomes (PBS), which are accessory pigments in Synechococcus but are not visible in Prochlorococcus (67). The PBS component genes showing this unusual pattern of dS include a gene encoding ferredoxin oxidoreductase (sync_0491), which catalyzes the conversion of biliverdin-IX-α to phycoerythrobilin, cpaB-1 and cpaB-2 (sync_0495 and _0505), which encode type 1 and 2 phycoerythrin pigments, and two PBS linker polypeptide genes (sync_0512 and _0513), which ensure efficient energy transfer between the peripheral pigments and the photosynthetic reaction center (see Table S8A in the supplemental material). As the PBS pigment composition has been known as a niche-defining feature in Synechococcus (68, 69), the fine-tuning of the pigment-encoding sequences through homologous recombination with distant relatives is likely an important mechanism that Synechococcus uses to adapt to ocean niches where distinct wavelengths of light are available. In the other two Synechococcus lineages, photosynthesis-related genes are surprisingly depleted in the outlier list (Table S8B and C). In the unclassified lineage consisting of PCC7002 and NKBG042902, for instance, these outlier loci contain many genes potentially involved in interaction with phage, including genes for CRISPR-associated proteins Cas5 (SYNPCC7002_D0021) and Cas2 (SYNPCC7002_RS14945), phage integrase (SYNPCC7002_A1888), glycosyltransferases (SYNPCC7002_A1514 and _A1517), and a polysaccharide biosynthesis export protein (SYNPCC7002_A2451) involved in lipooligosaccharide biosynthesis (70). In the case of the 5.1A lineage, the outlier loci feature a number of genes for transporters, including a urea transporter (Syncc9605_2617 and _2618).
Concluding remarks.In sum, the 16 phylogenetically and ecologically divergent heterotrophic marine bacterial populations examined here all modified a subset of their gene repertoire through allele acquisition from evolutionarily distant relatives. This type of recombination may have distinctly facilitated substrate uptake, cell maintenance, phage resistance, antibiotic production, detoxification, adhesion, EPS production, hydrocarbon degradation, polysaccharide degradation, cold adaptation, photosynthesis, and CO2 fixation, among other important cellular processes, and thus may represent an important way that bacteria rapidly adapt to a changing ocean. The dS outlier approaches employed here allow for identification of important differences among closely related phylogenetic lineages (e.g., the two species in Cellulophaga, the high-light- and low-light-adapted ecotypes in Prochlorococcus, clades 5.1A and 5.1B in Synechococcus, and the two genera in the Roseobacter group) and among bacteria that are ecologically distinct but share nearly identical 16S rRNA genes (e.g., Paraglaciecola chathamensis S18K6 from deep-sea sediment near hydrothermal vents compared to two strains from other sediments). As these bacteria are members of Alphaproteobacteria, Gammaproteobacteria, Flavobacteria, and Cyanobacteria, which dominate in global ocean bacterial communities (71–73), our finding has significant implications for microbial evolution and adaptation in the oligotrophic ocean.
MATERIALS AND METHODS
Genomic sequence download, ortholog identification, and phylogenetic analysis.By the time this study was conducted, the genomic sequences varying at the strain level were available for multiple marine bacterial lineages (Table S1), and their predicted protein-coding sequences and 16S rRNA gene sequences were downloaded from NCBI RefSeq (74) and IMG/ER (75). Orthologous gene families shared by all members of a species and its outgroup were identified (Table S1) using the GET_HOMOLOGUES package (76). Members in each gene family were aligned at the amino acid sequence level using the MAFFT software (77), and the columns with gaps were deleted. The trimmed alignments then were concatenated. Best-fitting partition schemes and substitution models were selected by PartitionFinder v2.1.1 (78). A phylogenomic tree for each species was constructed using the RAxML software (79) under the PROTGAMMA model with statistical support estimated from 100 bootstrapped replicates. Phylogenomic trees were also constructed using only the vertically derived genes by deleting the gene families whose members showed a significant signal of homologous recombination through the PhiTest (80) analysis. The resulting topologies are consistent across both methods. The branch lengths and support values obtained from the former method are used in the figures.
Calculation of synonymous substitution rate.All methods for the calculation of dS assume a nucleotide evolution model, and these models fall in two categories depending on whether they take into account the base frequency. The Hasegawa-Kishino-Yano model, for instance, considers the base frequency, while the Kimura two-parameter model does not. Unrealistic results, in particular those for dS, can be obtained if a model that does not control for the base frequency is applied to composition-biased DNA sequences (81). The KaKs_Calculator software is designed to select the best-fit model that accounts for biased nucleotide content based on the corrected Akaike information criterion (82) and thus was employed in the present study, in which the genomic G+C content of the included taxa varies from below 30% to 60% (Table S1).
Outlier identification.We improved the approaches that were used to detect dS outlier genes in a previous study (12). For a lineage with two closely related genomes (e.g., Synechococcus clade 5.1A; a complete list is in Table S1), we ranked the dS values of each genome pair from least to greatest and plotted the dS values against their ranks (Fig. S1J and K). A linear relationship between dS and rank was tested with linear regression analysis (83). The Bonferroni correction was implemented to control for multiple comparisons using the p.adjust function of the “stats” base package of R (83). Loci showing statistically significant Studentized residuals after Bonferroni correction (P < 0.05) were considered outliers.
For a lineage with three closely related genomes, one strain must be an outgroup of the other two strains. Recombination may have occurred in the outgroup strain (or the ancestral branch giving rise to the most recent common ancestor [MRCA] of the other two) and may have also occurred on the leaf branches of the other two. To differentiate these two types, different linear regression approaches were fitted to identify outliers. Taking C. baltica as an example, strain NN016038 is the outgroup of strains E6 and “18,” with the latter two considered “ingroup” strains (Fig. 1A; Fig. S1A). If a divergent allele was introduced into the outgroup (NN016038) or the ancestral branch giving rise to the MRCA of E6 and “18,” both E6 and “18” are expected to show unusually high dS in comparison with NN016038 (Fig. S1A). If E6 acquired a divergent allele, the dS of the E6/NN016038 pair at that locus is expected to be unusually high compared to the dS of the “18”/NN016038 pair (Fig. 1A). Likewise, if strain “18” acquired a divergent allele, the dS of the “18”/NN016038 pair at that locus should be unusually high in comparison to the dS of the E6/NN016038 pair (Fig. 1A). To detect outliers of the first type, we ranked the dS values of the single-copy orthologous genes shared by the E6/NN016038 pair from least to greatest (Fig. S1A) and performed a linear regression analysis of dS against the rank. The same procedure was also applied to the “18”/NN016038 pair. Only those gene families identified as outliers in both genome pairs were considered loci subject to recombination occurring at outgroup (NN016038) or the ancestral branch giving rise to the MRCA of E6 and “18.” To detect outliers of the second type, we performed a linear regression analysis of the dS calculated from the E6/NN016038 pair against the dS calculated from the “18”/NN016038 pair after excluding loci identified as being of the first type. In both types, outliers were defined as gene families whose absolute Studentized residuals exceeded two times the standard deviation and remained significant after Bonferroni correction. The possible influence of outliers and leverage points upon the inference of linearity was minimized using a robust weighted-likelihood approach (wle.lm function in R) (83), which calibrates weights for observations based on their deviation from the smoothed kernel density (84).
For a lineage with four or more closely related genomes, we applied the k-means clustering procedure to identify genes showing unusual patterns of synonymous nucleotide substitution. Taking “Candidatus Pelagibacter ubique” as an example, this method used the all possible pairwise dS values of “Candidatus Pelagibacter ubique” HTCC1002, HTCC1062, HTCC1013, HTCC1016, and HTCC1040 across 1,162 single-copy orthologous genes shared by these five genomes. The number (k) of clusters has to be provided before the k-means procedure can be started. The R package “NbClust” v2.0.1 (85) provides a wide variety of indices for cluster validity and thus was used to determine the optimal number of clusters. For all the lineages under the k-means analysis, k was determined to be two as the optimal value (see Fig. S4A for Marinobacter hydrocarbonoclasticus, Fig. S4B for Pseudoalteromonas haloplanktis, Fig. S4C for “Candidatus Pelagibacter ubique,” Fig. S4D for Prochlorococcus HLII, and Fig. S4E for Prochlorococcus LLI).
Syntenic block analysis.Syntenic blocks of multiple genomes were identified using Sibelia (86), a sequence-referenced approach based on de Bruijn graphs. The cutoff for the minimum length of syntenic blocks was set as 5,000 bp, and 4 rounds of iteration were performed for fine-granularity scalable refinement. Genes in the orthologous families identified above were assigned to a syntenic block with regard to their loci on the chromosome/scaffold/contigs.
ACKNOWLEDGMENTS
We thank Mary Ann Moran for providing valuable advice on an early version of the manuscript.
This research was funded by the Hong Kong Environment and Conservation Fund (15/2016), the Hong Kong Research Grants Council Area of Excellence Scheme (AoE/M-403/16), and a Direct Grant (4930062) from the Chinese University of Hong Kong.
We declare no competing commercial interests in relation to this work.
FOOTNOTES
- Received 15 November 2017.
- Accepted 20 March 2018.
- Accepted manuscript posted online 23 March 2018.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.02545-17.
- Copyright © 2018 American Society for Microbiology.
REFERENCES
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵