ABSTRACT
Survival and growth of the anaerobic gut fungi (AGF; Neocallimastigomycota) in the herbivorous gut necessitate the possession of multiple abilities absent in other fungal lineages. We hypothesized that horizontal gene transfer (HGT) was instrumental in forging the evolution of AGF into a phylogenetically distinct gut-dwelling fungal lineage. The patterns of HGT were evaluated in the transcriptomes of 27 AGF strains (22 of which were isolated and sequenced in this study) and in 4 AGF genomes, broadly covering the breadth of AGF diversity. We identified 277 distinct incidents of HGT in AGF transcriptomes, with subsequent gene duplication resulting in an HGT frequency of 2 to 3.5% in AGF genomes. The majority of HGT events were AGF specific (91.7%) and AGF-wide (70.8%), indicating their occurrence at early stages of AGF evolution. The acquired genes allowed AGF to expand their substrate utilization range, provided new avenues for electron disposal, augmented their biosynthetic capabilities, and facilitated their adaptation to anaerobiosis. The majority of donors were anaerobic fermentative bacteria prevalent in the herbivorous gut. This study strongly indicates that HGT indispensably forged the evolution of AGF as a distinct fungal phylum and provides a unique example of the role of HGT in shaping the evolution of a high-rank taxonomic eukaryotic lineage.
IMPORTANCE The anaerobic gut fungi (AGF) represent a distinct basal phylum-level lineage (Neocallimastigomycota) commonly encountered in the rumen and alimentary tracts of herbivores. Survival and growth of anaerobic gut fungi in these anaerobic, eutrophic, and prokaryote-dominated habitats necessitate the acquisition of several traits absent in other fungal lineages. Here, we assess the role of horizontal gene transfer as a relatively fast mechanism for trait acquisition by the Neocallimastigomycota following their sequestration in the herbivorous gut. Analysis of 27 transcriptomes that represent the broad diversity of Neocallimastigomycota identified 277 distinct HGT events, with subsequent gene duplication resulting in an HGT frequency of 2 to 3.5% in AGF genomes. These HGT events have allowed AGF to survive in the herbivorous gut by expanding their substrate utilization range, augmenting their biosynthetic pathways, providing new routes for electron disposal by expanding fermentative capacities, and facilitating their adaptation to anaerobiosis. HGT in the AGF is also shown to be mainly a cross-kingdom affair, with the majority of donors belonging to the bacteria. This study represents a unique example of the role of HGT in shaping the evolution of a high-rank taxonomic eukaryotic lineage.
INTRODUCTION
Horizontal gene transfer (HGT) is defined as the acquisition, integration, and retention of foreign genetic material into a recipient organism (1). HGT represents a relatively rapid process for trait acquisition, as opposed to gene creation either from preexisting genes (via duplication, fission, fusion, or exon shuffling) or through de novo gene birth from noncoding sequences (2–6). In prokaryotes, the occurrence, patterns, frequency, and impact of HGT on genomic architecture (7), metabolic abilities (8, 9), physiological preferences (10, 11), and ecological fitness (12) have been widely investigated, and the process is now regarded as a major driver of genome evolution in bacteria and archaea (13, 14). Although eukaryotes are perceived to evolve principally through modification of existing genetic information, HGT events in eukaryotic genomes have been attracting increasing interest and scrutiny. Despite additional barriers that must be overcome in eukaryotes, e.g., crossing the nuclear membrane, germ line sequestration in sexual multicellular eukaryotes, and epigenetic nucleic acid modification mechanisms (5, 15), it is now widely accepted that HGT contributes significantly to eukaryotic genome evolution (16, 17). HGT events have been convincingly documented in multiple phylogenetically disparate eukaryotes, including members of the Excavata (18–21), the SAR supergroup (22–25), algae (26), plants (27), and the Opisthokonta (28–31). Reported HGT frequencies in eukaryotic genomes range from a few genes (see, for example, reference 32) to as much as 9.6% of genes in bdelloid rotifers (30).
The kingdom Fungi represents a phylogenetically coherent clade that evolved ca. 900 to 1,481 million years ago from a unicellular flagellated ancestor (33–35). Multiple studies have addressed the detection and quantification of HGT in fungi. A survey of 60 fungal genomes reported HGT frequencies of 0 to 0.38% (29), and similarly low values were observed in the genomes of five early-diverging pathogenic microsporidia and Cryptomycota (36). A recent study documented the role of HGT in expanding the catabolic capabilities of members of the mycotrophic genus Trichoderma through extensive acquisition of plant biomass degradation capacities from plant-associated filamentous ascomycetes (37). The osmotrophic lifestyle of fungi (38) has typically been regarded as less conducive to HGT than the phagocytic lifestyle of several microeukaryotes, which exhibit relatively higher HGT frequencies (39).
The anaerobic gut fungi (AGF; Neocallimastigomycota) represent a phylogenetically distinct basal fungal lineage. The AGF appear to exhibit a restricted distribution pattern, being encountered in the gut of ruminant and nonruminant herbivores (40). In the herbivorous gut, the life cycle of the AGF (see Fig. S1 in the supplemental material) involves the discharge of motile flagellated zoospores from sporangia in response to animal feeding, the chemotaxis and attachment of zoospores to ingested plant material, spore encystment, and the subsequent production of rhizoidal growth that penetrates and digests plant biomass through the production of a wide array of cellulolytic and lignocellulolytic enzymes.
Survival, colonization, and successful propagation of AGF in the herbivorous gut necessitate the acquisition of multiple unique physiological characteristics and metabolic abilities absent in other fungal lineages. These include, but are not limited to, development of a robust plant biomass degradation machinery, adaptation to anaerobiosis, and exclusive dependence on fermentation for energy generation and recycling of electron carriers (41, 42). We therefore hypothesized that sequestration into the herbivorous gut was conducive to the broad adoption of HGT as a relatively fast adaptive evolutionary strategy for niche adaptation by the AGF (Fig. S1). Further, since no part of the AGF life cycle occurs outside the animal host and no reservoir of AGF outside the herbivorous gut has been identified (40), acquisition would mainly occur from donors that are prevalent in the herbivorous gut (Fig. S1). Apart from earlier observations on the putative bacterial origin of a few catabolic genes in two AGF isolates (43, 44) and preliminary BLAST-based queries of a few genomes (42, 45), little is currently known about the patterns, determinants, and frequency of HGT in the Neocallimastigomycota. To test this hypothesis, we systematically evaluated the patterns of HGT acquisition in the transcriptomes of 27 AGF strains and 4 AGF genomes broadly covering the breadth of AGF genus-level diversity. Our results document a high level of HGT in AGF, in contrast to the paucity of HGT across the fungal kingdom. The identity of the transferred genes, the distribution pattern of events across AGF genera, the phylogenetic affiliation of donors, and the expansion of acquired genetic material in AGF genomes highlight the role played by HGT in forging the evolution and diversification of the Neocallimastigomycota as a phylogenetically, metabolically, and ecologically distinct lineage in the fungal kingdom.
RESULTS
Isolates. The transcriptomes of 22 different isolates were sequenced. These isolates belonged to six of the nine currently described AGF genera, namely, Anaeromyces (n = 5), Caecomyces (n = 2), Neocallimastix (n = 2), Orpinomyces (n = 3), Pecoramyces (n = 4), and Piromyces (n = 4), as well as to the recently proposed genus Feramyces (n = 2) (46) (Table 1, Fig. S3). Of the three AGF genera not included in this analysis, two are currently represented by a single strain that was either lost (genus Oontomyces [47]) or appears to exhibit an extremely limited geographic and animal host distribution (genus Buwchfawromyces [48]). The third unrepresented genus (Cyllamyces) has recently been suggested to be phylogenetically synonymous with Caecomyces (49). As such, the current collection broadly represents the currently described AGF genera.
TABLE 1 Neocallimastigomycota strains analyzed in this study
Sequencing. Transcriptomic sequencing yielded 15.2 to 110.8 million reads (average, 40.87 million) that were assembled into 31,021 to 178,809 total transcripts, 17,539 to 132,141 distinct transcripts (clustering at 95%), and 16,500 to 70,061 predicted peptides (average, 31,611) (Table S2). Assessment of transcriptome completeness using BUSCO (50) yielded high values (82.76 to 97.24%) for all assemblies (Table S1). For strains with a sequenced genome, genome coverage (the percentage of genes in a strain’s genome for which a transcript was identified) ranged between 70.9 and 91.4% (Table S2).
HGT events. A total of 12,786 orthologues with a nonfungal bit score of >100 and an HGT index of >30 were identified (Fig. 1). After removal of orthologues occurring only in a single strain or in <50% of the isolates belonging to the same genus, 2,147 events were further evaluated. Phylogenetic analysis could not confirm the HGT nature of 1,863 orthologues (e.g., because of a single long branch that could be attributed either to HGT or to gene loss in all other fungi, an unstable phylogeny, and/or low bootstrap support), and these were therefore removed. Of the remaining 286 orthologues, 8 had suspiciously high (>90%) first-hit amino acid identities. Although a relatively recent divergence and/or acquisition time could explain this high level of similarity, we opted to remove these orthologues as a safeguard against possible bacterial contamination of the transcriptomes. Of the remaining 278 orthologues, one was not inferred as horizontally transferred by the gene-species tree reconciliation software used. Ultimately, a total of 277 distinct HGT events that satisfied the criteria described above were identified (Table S3). The average number of events per genus was 220 ± 12.6, ranging from 206 in the Orpinomyces pantranscriptome to 237 in the Pecoramyces pantranscriptome (Fig. 2A). The majority of HGT acquisition events identified (254, 91.7%) appear to be Neocallimastigomycota specific, i.e., identified only in genomes belonging to the Neocallimastigomycota but not in other basal fungal genomes (Table S4), strongly suggesting that such acquisitions occurred after, or concurrent with, the evolution of Neocallimastigomycota as a distinct fungal lineage. Further, the majority of the identified events were Neocallimastigomycota-wide, being identified in strains belonging to at least six of the seven examined genera (196 events, 70.76%), suggesting that these genes were acquired prior to genus-level diversification within the Neocallimastigomycota.
Only 30 events (10.83%) were genus specific, with the remainder (51 events, 18.4%) being identified in the transcriptomes of three to five genera (Table S4, Fig. S4, and Fig. 2B).
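The automated part of the screening cascade described above can be summarized as a sequence of threshold checks. The sketch below is illustrative only: the `Candidate` record, its field names, and the helper functions are hypothetical. It assumes that the HGT index hU is the best nonfungal bit score minus the best fungal bit score, and it interprets the prevalence rule as requiring that every genus in which an orthologue is detected carries it in at least half of its isolates. The manual phylogenetic confirmation and gene-species tree reconciliation steps are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Thresholds taken from the text.
MIN_NONFUNGAL_BITSCORE = 100.0   # best nonfungal hit must exceed this bit score
MIN_HGT_INDEX = 30.0             # hU threshold
MAX_FIRST_HIT_IDENTITY = 90.0    # percent; higher values treated as possible contamination
MIN_GENUS_PREVALENCE = 0.5       # fraction of a genus's isolates that must carry the orthologue


@dataclass
class Candidate:
    """Hypothetical summary record for one orthologue group."""
    name: str
    best_nonfungal_bitscore: float
    best_fungal_bitscore: float      # assumed: best hit to non-AGF fungi
    first_hit_identity: float        # percent amino acid identity of the top hit
    strains_per_genus: dict[str, int] = field(default_factory=dict)


def hgt_index(c: Candidate) -> float:
    """Assumed definition of hU: nonfungal minus fungal best bit score."""
    return c.best_nonfungal_bitscore - c.best_fungal_bitscore


def passes_similarity_screen(c: Candidate) -> bool:
    return (c.best_nonfungal_bitscore > MIN_NONFUNGAL_BITSCORE
            and hgt_index(c) > MIN_HGT_INDEX)


def passes_prevalence_screen(c: Candidate, genus_sizes: dict[str, int]) -> bool:
    """Reject singletons and orthologues found in <50% of a genus's isolates."""
    present = {g: n for g, n in c.strains_per_genus.items() if n > 0}
    if sum(present.values()) <= 1:
        return False
    return all(n / genus_sizes[g] >= MIN_GENUS_PREVALENCE for g, n in present.items())


def passes_identity_screen(c: Candidate) -> bool:
    return c.first_hit_identity <= MAX_FIRST_HIT_IDENTITY


def automated_screens(c: Candidate, genus_sizes: dict[str, int]) -> bool:
    # In the described workflow, phylogenetic confirmation and tree
    # reconciliation (not modeled here) sit between the prevalence
    # and identity screens.
    return (passes_similarity_screen(c)
            and passes_prevalence_screen(c, genus_sizes)
            and passes_identity_screen(c))


if __name__ == "__main__":
    # Hypothetical genus sizes and orthologue, for illustration only.
    sizes = {"Piromyces": 4, "Caecomyces": 2}
    example = Candidate("ortho_1", 250.0, 180.0, 62.0, {"Piromyces": 3, "Caecomyces": 2})
    print(example.name, hgt_index(example), automated_screens(example, sizes))
```

Keeping each criterion in its own function mirrors the stepwise attrition reported above (12,786 to 2,147 to 286 to 278), so the count surviving each stage can be tallied separately.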
FIG 1 Workflow diagram describing the procedure employed for identifying HGT events in the Neocallimastigomycota data sets analyzed in this study.
FIG 2 (A) Distribution pattern of HGT events in AGF transcriptomes demonstrating that the majority of events were Neocallimastigomycota-wide, i.e., identified in all seven AGF genera examined. (B) Total number of HGT events identified per AGF genus.
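The distribution categories used above (genus specific, Neocallimastigomycota-wide, and intermediate) depend only on the number of examined genera in which an event is detected. The following sketch bins events by that count and reports percentage shares. The function names are hypothetical, and the per-event genus counts in the example are placeholders chosen only so that the category totals match the reported 196, 30, and 51 events.

```python
from __future__ import annotations

from collections import Counter

N_GENERA = 7        # AGF genera with transcriptomes in this study
AGF_WIDE_MIN = 6    # "at least six of the seven examined genera"


def distribution_class(n_genera: int) -> str:
    """Bin an HGT event by the number of genera in which it was detected."""
    if not 1 <= n_genera <= N_GENERA:
        raise ValueError(f"n_genera must be between 1 and {N_GENERA}")
    if n_genera == 1:
        return "genus-specific"
    if n_genera >= AGF_WIDE_MIN:
        return "AGF-wide"
    return "intermediate"


def class_shares(genera_per_event: list[int]) -> dict[str, float]:
    """Percentage of events in each distribution class, rounded to 2 decimals."""
    counts = Counter(distribution_class(n) for n in genera_per_event)
    total = len(genera_per_event)
    return {cls: round(100 * k / total, 2) for cls, k in counts.items()}


# Placeholder per-event genus counts reproducing the reported category totals.
events = [7] * 196 + [1] * 30 + [4] * 51
print(class_shares(events))
# → {'AGF-wide': 70.76, 'genus-specific': 10.83, 'intermediate': 18.41}
```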
The vast majority (89.2%) of events were successfully mapped to at least one of the four AGF genomes (Table S5), with a fraction (7/30) of the unmapped transcripts being specific to a genus with no genome representative (Feramyces and Caecomyces). Compared to a random subset of 277 genes in each of the sequenced genomes, horizontally transferred genes in AGF genomes exhibited significantly (P < 0.0001) fewer introns (1.1 ± 0.31 versus 3.32 ± 0.83) and a higher GC content (31 ± 4.5% versus 27.7 ± 5.5%) (Table S5). Further, HGT genes/pfams often displayed high levels of duplication and expansion within the genome (Table S5), resulting in an HGT frequency of 2.03% in Pecoramyces ruminantium (331 HGT genes of 16,347 total genes), 2.91% in Piromyces finnis (334 of 11,477), 3.21% in Anaeromyces robustus (415 of 12,939), and 3.46% in Neocallimastix californiae (724 of 20,939).
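The genome-level HGT frequencies above are simple ratios of HGT-derived genes (after duplication) to all predicted genes. A minimal sketch using the counts given in the text follows; the function name is illustrative. Computed from these counts, the values agree with the reported percentages to within about 0.01 percentage points.

```python
# Counts of HGT-derived genes and total predicted genes, as given in the text.
GENOME_COUNTS = {
    "Pecoramyces ruminantium": (331, 16_347),
    "Piromyces finnis": (334, 11_477),
    "Anaeromyces robustus": (415, 12_939),
    "Neocallimastix californiae": (724, 20_939),
}


def hgt_frequency(hgt_genes: int, total_genes: int) -> float:
    """Percentage of a genome's predicted genes classified as HGT-derived."""
    if total_genes <= 0 or not 0 <= hgt_genes <= total_genes:
        raise ValueError("need 0 <= hgt_genes <= total_genes and total_genes > 0")
    return 100.0 * hgt_genes / total_genes


for species, (hgt, total) in GENOME_COUNTS.items():
    print(f"{species}: {hgt_frequency(hgt, total):.2f}%")
```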
Donors. A bacterial origin was identified for the majority of HGT events (85.9%), with four bacterial phyla (Firmicutes, Proteobacteria, Bacteroidetes, and Spirochaetes) identified as donors for 169 events (61% of total, 71% of bacterial events) (Fig. 3A). The contribution of members of the Firmicutes (119 events) was particularly prominent, with the majority of these events (93) most closely affiliated with members of the order Clostridiales. Minor contributions from a wide range of other bacterial phyla were also identified (Fig. 3A). The majority of the putative donor taxa are strict or facultative anaerobes, many of which are also known to be major inhabitants of the herbivorous gut and often possess polysaccharide-degradation capabilities (51, 52). Archaeal contributions to HGT were extremely rare (five events). On the other hand, 30 events with eukaryotic donors were identified. In a few instances, a clear nonfungal origin was identified for a specific event, but precise inference of the donor based on phylogenetic analysis was not feasible (Table S4).
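Two different denominators are used in the donor percentages above: all 277 events and the subset with a bacterial donor. The sketch below makes this explicit. It back-calculates the number of bacterial-donor events from the reported 85.9% share (about 238 events); this count is derived here for illustration and is not stated in the text.

```python
TOTAL_EVENTS = 277
BACTERIAL_SHARE = 0.859          # reported fraction of events with a bacterial donor
TOP_FOUR_PHYLA_EVENTS = 169      # Firmicutes, Proteobacteria, Bacteroidetes, Spirochaetes

# Derived (not reported): number of events with a bacterial donor.
bacterial_events = round(TOTAL_EVENTS * BACTERIAL_SHARE)


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


share_of_all = percent(TOP_FOUR_PHYLA_EVENTS, TOTAL_EVENTS)
share_of_bacterial = percent(TOP_FOUR_PHYLA_EVENTS, bacterial_events)
print(f"{bacterial_events} bacterial events; top four phyla: "
      f"{share_of_all:.0f}% of all, {share_of_bacterial:.0f}% of bacterial")
```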
FIG 3 (A) Identity of HGT donors and their contribution to the various functional classes. The x axis shows the absolute number of events belonging to each of the functional classes shown in the legend. The tree is intended to show the relationships between donor taxa and is not drawn to scale. Bacterial donors are shown with red branches depicting the phylum level, with the exception of Firmicutes and Bacteroidetes donors, where the order level is shown, and Proteobacteria, where the class level is shown. Archaeal donors are shown with green branches and all belonged to the order Methanobacteriales of the Euryarchaeota. Eukaryotic donors are shown with blue branches. Only the 230 events from a definitive-taxon donor are shown in the figure. The other 53 events were clearly nested within a nonfungal clade, but a definitive donor taxon could not be ascertained. (B) Functional classification of the HGT events, determined by searching the Conserved Domain server (106) against the COG database. For events with no COG classification, a search against the KEGG orthology database (107) was performed. (C to E) Subclassifications of the major COG/KEGG categories: metabolism (C), cellular processes and signaling (D), and information storage and processing (E).
Metabolic characterization. Functional annotation of HGT genes/pfams indicated that the majority (63.9%) of events encode metabolic functions such as extracellular polysaccharide degradation and central metabolic processes. Bacterial donors were slightly overrepresented in metabolic HGT events (87.5% of the metabolism-related events compared to 85.9% of all events). Genes involved in cellular processes and signaling represent the second largest category of HGT events (11.19%), while genes involved in information storage and processing make up only 4.69% of the HGT events identified (Fig. 3B to E). Below, we present a detailed description of the putative abilities and functions enabled by HGT events.
Central catabolic abilities. Multiple HGT events encoding various central catabolic processes were identified in AGF transcriptomes and successfully mapped to the genomes (Fig. 4, Table S4, and Fig. S5 to S16). One group of events appears to encode enzymes that allow AGF to channel specific substrates into central metabolic pathways. For example, genes encoding enzymes of the Leloir pathway for galactose conversion to glucose-1-phosphate (galactose-1-epimerase, galactokinase [Fig. 5A], and galactose-1-phosphate uridylyltransferase) were identified, in addition to genes encoding ribokinase (for ribose) and xylose isomerase and xylulokinase (for xylose), which channel these pentoses into the pentose phosphate pathway. In addition, genes encoding deoxyribose-phosphate aldolase (DeoC), enabling the utilization of purines as carbon and energy sources, were also horizontally acquired in AGF. Further, several glycolysis/gluconeogenesis genes, e.g., those encoding phosphoenolpyruvate synthase and phosphoglycerate mutase, were also of bacterial origin. Fungal homologs of these glycolysis/gluconeogenesis genes were not identified in the AGF transcriptomes and genomes, suggesting the occurrence of xenologous replacement HGT events.
FIG 4 HGT impact on AGF central metabolic abilities. Pathways for sugar metabolism are highlighted in blue, pathways for amino acid metabolism are highlighted in red, pathways for cofactor metabolism are highlighted in green, pathways for nucleotide metabolism are highlighted in gray, pathways for lipid metabolism are highlighted in orange, fermentation pathways are highlighted in purple, while pathways for detoxification are highlighted in brown. The double black lines depict the hydrogenosomal outer and inner membranes. Arrows corresponding to enzymes encoded by horizontally transferred transcripts are shown with thicker dotted lines and are given numbers 1 through 46 as follows. Sugar metabolism (1 to 9): 1, xylose isomerase; 2, xylulokinase; 3, ribokinase; 4, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase; 5, 2,3-bisphosphoglycerate-dependent phosphoglycerate mutase; 6, phosphoenolpyruvate synthase; 7, aldose-1-epimerase; 8, galactokinase; 9, galactose-1-phosphate uridyltransferase. Amino acid metabolism (10 to 18): 10, aspartate-ammonia ligase; 11, tryptophan synthase (TrpB); 12, tryptophanase; 13, monofunctional prephenate dehydratase; 14, serine O-acetyltransferase; 15, cysteine synthase; 16, low-specificity threonine aldolase; 17, 5′-methylthioadenosine nucleosidase/5′-methylthioadenosine phosphorylase (MTA phosphorylase); 18, arginase. Cofactor metabolism (19 to 26): 19, pyridoxamine 5′-phosphate oxidase; 20, l-aspartate oxidase (NadB); 21, quinolinate synthase (NadA); 22, NH3-dependent NAD+ synthetase (NadE); 23, 2-dehydropantoate 2-reductase; 24, dephospho-CoA kinase; 25, dihydrofolate reductase (DHFR) family; 26, dihydropteroate synthase.
Nucleotide metabolism (27 to 34): 27, GMP reductase; 28, trifunctional nucleotide phosphoesterase; 29, deoxyribose-phosphate aldolase (DeoC); 30, oxygen-sensitive ribonucleoside-triphosphate reductase class III (NrdD); 31, nucleoside/nucleotide kinase family protein; 32, cytidylate kinase-like family; 33, thymidylate synthase; 34, thymidine kinase. Pyruvate metabolism (fermentation pathways) (35 to 39): 35, d-lactate dehydrogenase; 36, bifunctional aldehyde/alcohol dehydrogenase family of Fe-alcohol dehydrogenase; 37, butanol dehydrogenase family of Fe-alcohol dehydrogenase; 38, Zn-type alcohol dehydrogenase; 39, Fe-only hydrogenase. Detoxification reactions (40 to 43): 40, phosphoglycolate phosphatase; 41, glyoxal reductase; 42, glyoxalase I; 43, glyoxalase II. Lipid metabolism (44 to 46): 44, CDP-diacylglycerol–serine O-phosphatidyltransferase; 45, lysophospholipid acyltransferase LPEAT; 46, methylene-fatty-acyl-phospholipid synthase. In the figure, each enzyme number is followed, in parentheses, by the distribution of the corresponding event across AGF genera: (all) indicates that the event was detected in all seven genera, while a minus sign followed by a genus indicates that the event was detected in all but that genus (or those genera). Genera are represented by letters as follows: A, Anaeromyces; C, Caecomyces; F, Feramyces; N, Neocallimastix; O, Orpinomyces; Pe, Pecoramyces; Pi, Piromyces. Abbreviations: CDP-DAG, CDP-diacylglycerol; 7,8 DHF, 7,8-dihydrofolate; EthA, ethanolamine; Gal, galactose; GAP, glyceraldehyde-3-P; Glu, glucose; GSH, glutathione; I, complex I NADH dehydrogenase; NaMN, nicotinate d-ribonucleotide; Orn, ornithine; PEP, phosphoenolpyruvate; Phenyl-pyr, phenylpyruvate; PRPP, phosphoribosyl-pyrophosphate; Ptd, phosphatidyl; SAM, S-adenosylmethionine; THF, tetrahydrofolate.
FIG 5 (A) Maximum-likelihood tree showing the phylogenetic affiliation of AGF galactokinase. AGF genes highlighted in light blue clustered within the Flavobacteriales order of the Bacteroidetes phylum and were clearly nested within the bacterial domain (highlighted in green) attesting to their nonfungal origin. Fungal galactokinase representatives are highlighted in pink. (B) Maximum-likelihood tree showing the phylogenetic affiliation of AGF Fe-only hydrogenase. AGF genes highlighted in light blue clustered within the Thermotogae phylum and were clearly nested within the bacterial domain (highlighted in green) attesting to their nonfungal origin. Stygiella incarcerata (anaerobic Jakobidae) clustered with the Thermotogae as well, as has recently been suggested (55). Fe-only hydrogenases from Gonopodya prolifera (Chytridiomycota) (shown in orange text) clustered with the AGF genes. This is an example of one of the rare occasions (n = 24) where a non-AGF basal fungal representative showed an HGT pattern with the same donor affiliation as the Neocallimastigomycota. Other basal fungal Fe-only hydrogenase representatives are highlighted in pink and clustered outside the bacterial domain. (C) Maximum-likelihood tree showing the phylogenetic affiliation of AGF l-aspartate oxidase (NadB). AGF genes highlighted in light blue clustered within the Deltaproteobacteria class and were clearly nested within the bacterial domain (highlighted in green) attesting to their nonfungal origin. Since de novo NAD synthesis in fungi usually follows the five-enzyme pathway starting from tryptophan, as opposed to the two-enzyme pathway from aspartate, no NadB was found in non-AGF fungi, and hence no fungal cluster is shown in the tree. (D) Maximum-likelihood tree showing the phylogenetic affiliation of AGF oxygen-sensitive ribonucleotide reductase (NrdD).
AGF genes highlighted in light blue clustered with representatives from the candidate phylum Dependentiae and were clearly nested within the bacterial domain (highlighted in green) attesting to their nonfungal origin. Fungal NrdD representatives are highlighted in pink. GenBank accession numbers are shown in parentheses. Alignment was done using the standalone MAFFT aligner (94), and trees were constructed using IQ-TREE (95).
In addition to broadening the substrate range, HGT acquisitions provided additional avenues for recycling reduced electron carriers via new fermentative pathways in this strictly anaerobic and fermentative lineage. The production of ethanol, d-lactate, and hydrogen appears to be enabled by HGT (Fig. 4). The acquisition of several aldehyde/alcohol dehydrogenases and of d-lactate dehydrogenase, for ethanol and lactate production from pyruvate, respectively, was identified. Although these enzymes are encoded in other fungi as part of their fermentative capacity (e.g., Saccharomyces and Schizosaccharomyces), no homologs of these fungal genes were identified in AGF pantranscriptomes. Hydrogen production in AGF, as well as in many anaerobic eukaryotes with mitochondrion-related organelles (e.g., hydrogenosomes and mitosomes), involves pyruvate decarboxylation to acetyl coenzyme A (acetyl-CoA), followed by the use of the electrons generated for hydrogen formation via an anaerobic Fe-Fe hydrogenase (42, 53, 54). In AGF, while the enzymes for pyruvate decarboxylation to acetyl-CoA (pyruvate-formate lyase) and the subsequent production of acetate in the hydrogenosome (via acetyl-CoA:succinyl transferase) appear to be of fungal origin, the Fe-Fe hydrogenase and its entire maturation machinery (HydEFG) seem to have been horizontally transferred, being phylogenetically affiliated with similar enzymes in Thermotogae, Clostridiales, and the anaerobic jakobid excavate Stygiella incarcerata (Fig. 5B). It has recently been suggested that Stygiella acquired the Fe-Fe hydrogenase and its maturation machinery from bacterial donors, including Thermotogae, Firmicutes, and Spirochaetes (55), suggesting either a single early acquisition event in eukaryotes or independent acquisitions of the same group of genes in different eukaryotes.
With the exception of the Fe-Fe hydrogenase and its maturation machinery, no other hydrogenosomally targeted proteins (see the list in reference 42) were identified as horizontally transferred in this study. Collectively, these results suggest that HGT did not play a role in the evolution of hydrogenosomes in AGF and reinforce the proposed mitochondrial origin of hydrogenosomes through reductive evolution (54).
Anabolic capabilities. Multiple anabolic genes that expanded AGF biosynthetic capacities appear to have been horizontally transferred (Fig. S17 to S30). These include several amino acid biosynthesis genes, e.g., for cysteine biosynthesis from serine, glycine and threonine interconversion, and asparagine synthesis from aspartate. In addition, HGT allowed AGF to synthesize NAD de novo via the bacterial pathway, starting from aspartate via l-aspartate oxidase (NadB; Fig. 5C) and quinolinate synthase (NadA), rather than via the five-enzyme fungal pathway starting from tryptophan (56). HGT also allowed AGF to salvage thiamine via the acquisition of phosphomethylpyrimidine kinase. In addition, several genes encoding enzymes in purine and pyrimidine biosynthesis were horizontally transferred (Fig. 4). Finally, HGT allowed AGF to synthesize phosphatidylserine from CDP-diacylglycerol and to convert phosphatidylethanolamine to phosphatidylcholine.
Adaptation to the host environment. HGT also appears to have provided means of guarding against toxic levels of compounds known to occur in the host animal gut (Fig. S31 to S37). For example, methylglyoxal, a reactive electrophilic species (57), is inevitably produced by ruminal bacteria from dihydroxyacetone phosphate under growth conditions with excess sugar and limiting nitrogen (58). Genes encoding the enzymes mediating methylglyoxal conversion to d-lactate (glyoxalase I and glyoxalase II) appear to have been acquired via HGT in AGF. Further, HGT provided several means of adaptation to anaerobiosis. These include (i) acquisition of the oxygen-sensitive ribonucleoside-triphosphate reductase class III (Fig. 5D), which is known to function only under anaerobic conditions to convert ribonucleotides to deoxyribonucleotides (59); (ii) acquisition of squalene-hopene cyclase, which catalyzes the cyclization of squalene into hopene, an essential step in the biosynthesis of tetrahymanol, a sterol surrogate that replaces ergosterol (whose biosynthesis requires molecular O2) in AGF cell membranes; and (iii) acquisition of several enzymes of the oxidative stress machinery, including Fe/Mn superoxide dismutase, glutathione peroxidase, rubredoxin/rubrerythrin, and alkylhydroperoxidase.
In addition to anaerobiosis-related adaptations, multiple horizontally transferred general stress and repair enzymes were identified (Fig. S38 to S45). HGT-acquired genes encoding 2-phosphoglycolate phosphatase, which converts to glycolate the 2-phosphoglycolate produced during repair of DNA lesions induced by oxidative stress (60), were identified in all AGF transcriptomes studied (Fig. 4, Table S4). Surprisingly, two genes encoding antibiotic resistance enzymes, chloramphenicol acetyltransferase and aminoglycoside phosphotransferase, were identified in all AGF transcriptomes, presumably improving AGF fitness in the eutrophic rumen habitat, which harbors antibiotic-producing prokaryotes (Table S4). Although it is unusual for eukaryotes to express antibiotic resistance genes, basal fungi such as Allomyces, Batrachochytrium, and Blastocladiella have been shown to be susceptible to chloramphenicol and streptomycin (61, 62). Other horizontally transferred stress-related and repair enzymes include DNA–3-methyladenine glycosylase I, methylated-DNA–protein-cysteine methyltransferase, galactoside and maltose O-acetyltransferases, and methionine-R-sulfoxide reductase (Table S4).
HGT in the AGF carbohydrate-active enzyme machinery. Within the analyzed AGF transcriptomes, carbohydrate-active enzymes (CAZymes) belonging to 39 glycoside hydrolase (GH), 5 polysaccharide lyase (PL), and 10 carbohydrate esterase (CE) families were identified (Fig. 6). The CAZyme composition of the various AGF strains examined was broadly similar, with the following ten notable exceptions: the presence of GH24 and GH78 transcripts only in Anaeromyces and Orpinomyces; the presence of GH28 transcripts only in Pecoramyces, Neocallimastix, and Orpinomyces; the presence of GH30 transcripts only in Anaeromyces and Neocallimastix; the presence of GH36 and GH95 transcripts only in Anaeromyces, Neocallimastix, and Orpinomyces; the presence of GH97 transcripts only in Neocallimastix and Feramyces; the presence of GH108 transcripts only in Neocallimastix and Piromyces; and the predominance of GH37 transcripts in Neocallimastix, GH57 transcripts in Orpinomyces, GH76 transcripts in Feramyces, and CE7 transcripts in Anaeromyces (Fig. 6).
FIG 6 HGT in the AGF CAZyome shown across the seven genera studied. Glycoside hydrolase (GH), carbohydrate esterase (CE), and polysaccharide lyase (PL) families are shown to the left. The color of the cells depicts the prevalence of HGT within each family. Red indicates that 100% of the CAZyme transcripts were horizontally transferred. Shades of red-orange indicate that HGT contributed to >50% of the transcripts belonging to that CAZy family. Dark blue indicates that 100% of the CAZyme transcripts were of fungal origin. Shades of blue indicate that HGT contributed to <50% of the transcripts belonging to that CAZy family. The numbers in each cell indicate the affiliation of the HGT donor as shown in the key to the right.
HGT appears to be rampant in the collective repertoire of CAZymes examined (pan-CAZyome): a total of 72 events (26% of total HGT events) were identified, with 40.3% occurring in at least six of the seven AGF genera examined (Fig. 6, Table S4). In 48.7% of GH families, 50% of CE families, and 40% of PL families, a single event (i.e., attributed to one donor) was observed (Fig. 6, Table S4).
Duplication of these events in AGF genomes was notable, with 132, 310, 156, and 130 copies of HGT CAZyme pfams identified in the Anaeromyces, Neocallimastix, Piromyces, and Pecoramyces genomes, respectively, representing 33.59, 36.77, 40.41, and 24.62% of the overall organismal CAZyme machinery (Table S5). The contribution of Viridiplantae, Fibrobacteres, and Gammaproteobacteria was either exclusive to CAZyme-related HGT events or significantly higher in CAZyme-related than in other events (Fig. 3A).
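Each percentage above is the number of HGT-derived CAZyme pfam copies divided by the genome's total CAZyme pfam count. The sketch below inverts that relationship to back-calculate the implied totals (roughly 393, 843, 386, and 528 for Anaeromyces, Neocallimastix, Piromyces, and Pecoramyces). These totals are derived here for illustration, not reported values, and would need to be checked against Table S5.

```python
# HGT-derived CAZyme pfam copies and their reported share (%) of each genome's CAZymes.
REPORTED = {
    "Anaeromyces": (132, 33.59),
    "Neocallimastix": (310, 36.77),
    "Piromyces": (156, 40.41),
    "Pecoramyces": (130, 24.62),
}


def implied_total(hgt_copies: int, percent_share: float) -> int:
    """Back-calculate the total CAZyme pfam count from copies and percentage."""
    return round(hgt_copies / (percent_share / 100.0))


def hgt_share(hgt_copies: int, total: int) -> float:
    """Percentage of CAZyme pfam copies that are HGT-derived."""
    return 100.0 * hgt_copies / total


for genus, (copies, pct) in REPORTED.items():
    total = implied_total(copies, pct)
    print(f"{genus}: ~{total} CAZyme pfams, {hgt_share(copies, total):.2f}% HGT-derived")
```

Recomputing the share from the rounded total returns the reported percentage in each case, which is a quick consistency check on the two columns.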
Transcripts acquired by HGT represented >50% of transcripts in between 13 (Caecomyces) and 20 (Anaeromyces) GH families; 3 (Caecomyces) to 5 (Anaeromyces, Neocallimastix, Orpinomyces, and Feramyces) CE families; and 2 (Caecomyces and Feramyces) to 3 (Anaeromyces, Pecoramyces, Piromyces, Neocallimastix, and Orpinomyces) PL families (Fig. 6). Notably, in all these families, multiple transcripts appeared to be of bacterial origin based on BLAST similarity searches but did not meet the strict criteria implemented for HGT determination in this study. As such, the contribution of HGT transcripts to overall transcripts in these families is probably underestimated. Only the GH9, GH20, GH37, GH45, and PL3 families appear to lack any detectable HGT events. A principal-component analysis biplot comparing CAZyomes in AGF genomes to those of other basal fungal lineages strongly suggests that the acquisition and expansion of many of these foreign genes played an important role in shaping the lignocellulolytic machinery of AGF (Fig. 7). The majority of the CAZyme families defining the AGF CAZyome were predominantly of nonfungal origin (Fig. 7). This pattern clearly attests to the value of HGT in shaping the AGF CAZyome via acquisition and extensive duplication of acquired gene families.
Principal-component analysis biplot of the distribution of CAZy families in AGF genomes (solid stars), compared to representatives of other basal fungi belonging to the Mucoromycotina (solid hexagons), Chytridiomycota (open circles), Blastocladiomycota (solid boxes), Entomophthoromycotina (ovals), Mortierellomycotina (open hexagons), Glomeromycota (“+” signs), Kickxellomycotina (open boxes), and Zoopagomycotina (“×” signs). CAZy families are shown as colored dots. The color code used was as follows: green, CAZy families that are absent from AGF genomes; black, CAZy families present in AGF genomes and with an entirely fungal origin; blue, CAZy families present in AGF genomes and for which HGT contributed to <50% of the transcripts in the examined transcriptomes; red, CAZy families present in AGF genomes and for which HGT contributed to >50% of the transcripts in the examined transcriptomes. The majority of CAZyme families defining the AGF CAZyome were predominantly of nonfungal origin (red and blue dots).
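The color coding used in Fig. 7 reduces to a simple per-family rule. The Python sketch below is illustrative only: the helper name `cazy_color` and its inputs (whether a family occurs in AGF genomes, plus transcript counts assumed to be pooled across the examined transcriptomes) are hypothetical. Because the caption does not state how a family at exactly 50% was handled, such a family is placed in the blue class here by assumption.

```python
def cazy_color(present_in_agf: bool, n_transcripts: int, n_hgt: int) -> str:
    """Return the Fig. 7 color class for one CAZy family (hypothetical helper).

    present_in_agf: whether the family occurs in AGF genomes
    n_transcripts:  AGF transcripts assigned to the family (assumed pooled)
    n_hgt:          how many of those transcripts were identified as HGT
    """
    if not present_in_agf:
        return "green"  # family absent from AGF genomes
    if n_hgt == 0:
        return "black"  # present in AGF, entirely fungal origin
    if n_transcripts <= 0 or n_hgt > n_transcripts:
        raise ValueError("n_hgt must lie between 1 and n_transcripts")
    share = n_hgt / n_transcripts
    # >50% HGT -> red; <50% -> blue (exactly 50% treated as blue: an assumption)
    return "red" if share > 0.5 else "blue"


print(cazy_color(True, 12, 9))  # prints "red" (9 of 12 transcripts are HGT)
```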
Collectively, HGT had a profound impact on AGF plant biomass degradation capabilities, as recently proposed (63). The AGF CAZyome encodes enzymes putatively mediating the degradation of twelve different polysaccharides (Fig. S46). In all instances, GH and PL families with >50% horizontally transferred transcripts contributed to backbone cleavage of these polymers, although for many polymers, e.g., cellulose, glucoarabinoxylan, and rhamnogalacturonan, multiple different GHs can contribute to backbone cleavage. Similarly, GH, CE, and PL families with >50% horizontally transferred transcripts contributed to 10 of 13 side chain-cleaving activities and 3 of 5 oligomer-to-monomer breakdown activities (Fig. S46).
DISCUSSION
Here, we present a systematic analysis of HGT patterns in 27 transcriptomes and 4 genomes belonging to the Neocallimastigomycota. Our analysis identified 277 events, representing 2 to 3.46% of genes in the examined AGF genomes. We consider these values to be conservative estimates owing to the highly stringent criteria employed. Only events with an HGT index (hU) of ≥30 were considered, and all putative events were further subjected to manual inspection, phylogenetic tree construction, and gene-species tree reconciliation analysis to confirm incongruence with organismal evolution and bootstrap-supported affiliation to donor lineages. In addition, events identified in <50% of strains in a specific genus were excluded, and parametric gene composition approaches were implemented in conjunction with sequence-based analysis.
Multiple factors could account for the observed high HGT frequency in AGF. The sequestration of AGF into the anaerobic, prokaryote-dominated herbivorous gut necessitated relatively fast adaptive mechanisms for survival in this new environment, as opposed to the slower strategies of neofunctionalization and gene birth. Indeed, niche adaptation and habitat diversification events are widely considered important drivers of HGT in eukaryotes (16, 23, 26, 37, 64). Further, AGF are constantly exposed to a rich milieu of cells and degraded DNA in the herbivorous gut, and such close physical proximity between donors/extracellular DNA and recipients is known to greatly facilitate HGT (65–67). Finally, AGF release asexual, motile, free zoospores into the herbivorous gut as part of their life cycle (40). According to the weak-link model (68), such weakly protected and exposed structures provide an excellent entry point for foreign DNA into eukaryotic genomes. Notably, AGF zoospores also appear to be naturally competent, capable of readily taking up nucleic acids from their surrounding environment (69).
The anaerobic gut fungi have a notoriously low GC content, ranging between 13 and 20%. It has previously been postulated that this low GC content is due to genetic drift (42) triggered by low effective population sizes, bottlenecks in vertical transmission, and the asexual lifestyle of anaerobic fungi. Under this view, the low GC content is an additional consequence of AGF sequestration in the herbivorous gut. Whether the low GC content of AGF played a role in facilitating HGT is currently unclear. It is worth noting, however, that the majority of AGF donors identified in this study are members of the bacterial order Clostridiales, many of which have relatively low-GC-content genomes.
The distribution of HGT events across various AGF taxa (Fig. 2), the identities of HGT donors (Fig. 3), and the abilities imparted (Fig. 4 and 5) could offer important clues regarding the timing and impact of HGT on Neocallimastigomycota evolution. The majority of events (70.76%) were Neocallimastigomycota-wide and were mostly acquired from lineages known to inhabit the herbivorous gut, e.g., Firmicutes, Proteobacteria, Bacteroidetes, and Spirochaetes (Fig. 2 and 3). This pattern strongly suggests that these acquisitions occurred after (or concurrently with) AGF sequestration into the herbivorous gut but prior to AGF genus-level diversification. Many of the functions encoded by these events represented novel acquisitions that imparted new abilities, e.g., galactose metabolism, methylglyoxal detoxification, pyruvate fermentation to d-lactate and ethanol, and chloramphenicol resistance (Fig. 3). Others represented the acquisition of genes or pfam’s augmenting existing capabilities within AGF genomes, e.g., acquisition of GH5 cellulases augmenting the fungal GH45, and acquisition of additional GH1 and GH3 β-glucosidases and β-galactosidases augmenting similar enzymes of apparent fungal origin in AGF genomes (Fig. 6 and 7; see also Fig. S46 in the supplemental material). Novel functional acquisitions enabled AGF to survive in and colonize the herbivorous gut by (i) expanding substrate degradation capabilities (Fig. 5A, 6, and 7; see also Fig. S5 to S17 and Table S4), hence improving fitness by maximizing carbon and energy acquisition from available plant substrates; (ii) providing additional avenues for electron disposal via lactate, ethanol, and hydrogen production; and (iii) enabling adaptation to anaerobiosis (Fig. 4; see also Fig. S32 to S38 and Table S4).
A smaller number of observed events (n = 30) were genus-specific (Fig. 2; Table S4). This group was significantly enriched in CAZymes (56.7% of genus-specific events have a predicted CAZyme function, compared to 26% in the overall HGT data set) and was almost exclusively acquired either from donors known to inhabit the herbivorous gut (70) (25 of the 30 events were acquired from the orders Clostridiales, Bacillales, and Lactobacillales within the Firmicutes; Burkholderiales within the Betaproteobacteria; Flavobacteriales and Bacteroidales within the Bacteroidetes; and the Spirochaetes, Actinobacteria, and Lentisphaerae) or from Viridiplantae (4 of the 30 events). This pattern suggests that these events occurred relatively recently in the herbivorous gut, after AGF genus-level diversification. A recent study also highlighted the role of HGT in complementing the CAZyme machinery of Piromyces sp. strain E2 (63). We reason that the lower frequency of such events reflects relaxed pressure for the acquisition and retention of foreign genes at this stage of AGF evolution.
Gene acquisition by HGT necessitates physical contact between donor and recipient organisms. Many of the traits acquired by AGF via HGT originated from prokaryotes that are prevalent in the herbivorous gut microbiota (Fig. 3). However, since many of these traits are absolutely necessary for survival in the gut, the establishment of AGF ancestors in this seemingly inhospitable habitat before these acquisitions would, theoretically, have been unfeasible. This dilemma is common to all HGT processes enabling niche adaptation and habitat diversification (22). We put forth two evolutionary scenarios that could resolve this dilemma, not only for AGF but also for other gut-dwelling anaerobic microeukaryotes, e.g., Giardia, Blastocystis, and Entamoeba, in which HGT was shown to play a vital role in enabling survival under anaerobic conditions (22, 71, 72). The first is a coevolution scenario, in which the progressive evolution of the mammalian gut from the short and predominantly aerobic structure characteristic of carnivores/insectivores to the longer, more complex, and compartmentalized structure encountered in herbivores was associated with a parallel, stepwise acquisition by AGF ancestors of genes required for plant polymer metabolism and anaerobiosis, hence ensuring their survival and establishment in the current herbivorous gut. The second possibility is that AGF ancestors were indeed introduced into an already complex and anaerobic herbivorous gut but initially represented an extremely minor component of the gut microbiome, surviving in locations of the alimentary tract with relatively higher oxygen concentrations, e.g., the mouth, saliva, or esophagus, or in microniches in the rumen where transient oxygen exposure occurs. Subsequent acquisitions by HGT then enabled the expansion of their niche, improving their competitiveness and raising their relative abundance in the herbivorous gut to current levels.
In conclusion, our survey of HGT in AGF demonstrates that the process is crucial for the survival and growth of AGF in their unique habitat. This is reflected not only in the large number of events, the massive duplication of acquired genes, and the overall high HGT frequency observed in AGF genomes but also in the nature of the abilities imparted by the process. HGT events not only facilitated AGF adaptation to anaerobiosis but also allowed them to drastically improve their polysaccharide degradation capacities, gain new avenues for electron disposal via fermentation, and acquire new biosynthetic abilities. As such, we reason that the process should not be regarded merely as a conduit for the supplemental acquisition of a few additional beneficial traits. Rather, we posit that HGT enabled AGF to forge a new evolutionary trajectory, resulting in Neocallimastigomycota sequestration, evolution as a distinct lineage in the fungal tree of life, and subsequent genus- and species-level diversification. This provides an excellent example of the role of HGT in the formation of high-rank taxonomic lineages during eukaryotic evolution.
MATERIALS AND METHODS
Organisms. Type strains of the Neocallimastigomycota are unavailable through culture collections due to their strict anaerobic and fastidious nature, as well as the frequent occurrence of senescence in AGF strains (73). As such, obtaining a broad representation of the Neocallimastigomycota necessitated the isolation of representatives of various AGF genera de novo. Samples were obtained from the feces, rumen, or digesta of domesticated and wild herbivores around the city of Stillwater, OK, and Val Verde County, TX (Table 1). Samples were immediately transferred to the laboratory, and the isolation procedures usually commenced within 24 h of collection. A second round of isolation was occasionally conducted on samples stored at −20°C for several weeks (Table 1).
Isolation was performed using a rumen fluid medium reduced by cysteine-sulfide, supplemented with a mixture of kanamycin, penicillin, streptomycin, and chloramphenicol (50, 50, 20, and 50 μg/ml, respectively), and dispensed under a stream of 100% CO2 (42, 74). All media were prepared according to the Hungate technique (75), as modified by Balch and Wolfe (76). Cellulose (0.5%) or a mixture of switchgrass (0.5%) and cellobiose (0.5%) was used as a carbon source. Samples were serially diluted and incubated at 39°C for 24 to 48 h. Colonies were obtained from dilutions showing visible signs of fungal growth using the roll tube technique (77). The colonies obtained were inoculated into liquid media, and a second round of isolation and colony picking was conducted to ensure culture purity. Microscopic examination of thallus growth pattern, rhizoid morphology, and zoospore flagellation, as well as LSU rRNA gene D1-D2 domain amplification and sequencing, was employed to determine the genus-level affiliation of all isolates (74). Isolates were maintained and routinely subcultured on rumen fluid medium supplemented with antibiotics (to guard against accidental bacterial contamination) and stored on agar media as described previously (42, 69).
RNA extraction, sequencing, and assembly. Transcriptomic sequencing was conducted for 22 AGF strains. Sequencing multiple taxa provides stronger evidence for the occurrence of HGT in a target lineage (78) and allows for the identification of phylum-wide versus genus- and species-specific HGT events. Transcriptomic, rather than genomic, sequencing was chosen for AGF-wide HGT identification efforts for two reasons: enrichment for polyadenylated [poly(A)] transcripts prior to transcriptome sequencing (RNA-seq) provides a built-in safeguard against possible prokaryotic contamination, an issue that has often plagued eukaryotic genome-based HGT detection efforts (79, 80), and it demonstrates that the identified HGT genes are transcribed in AGF. Further, sequencing and assembling a large number of Neocallimastigomycota genomes is challenging due to the extremely high AT content of intergenic regions and the extensive proliferation of microsatellite repeats, often necessitating multiple sequencing technologies for successful genome assembly (42, 45).
Cultures for RNA extraction were grown in rumen fluid medium with cellobiose as the sole carbon source. RNA extraction was conducted on late-log/early-stationary-phase cultures (approximately 48 to 60 h postinoculation, depending on the strain’s growth characteristics) as described previously (81). Briefly, fungal biomass was obtained by vacuum filtration and ground with a pestle under liquid nitrogen. RNA was extracted using the Epicentre MasterPure yeast RNA purification kit (Epicentre, Madison, WI) and stored in RNase-free Tris-EDTA buffer. Transcriptomic sequencing using Illumina HiSeq 2500 2 × 150-bp paired-end technology was conducted by a commercial provider (Novogene Corporation, Beijing, China).
RNA-seq reads were assembled with the de novo transcriptome assembler Trinity (82) using previously established protocols (83). All settings followed the recommended protocol for fungal genomes, except that the “--jaccard_clip” flag was omitted because of the low gene density of anaerobic fungal genomes. Assembly was conducted on the Oklahoma State University High Performance Computing Cluster as well as on the XSEDE HPC Bridges system at the Pittsburgh Supercomputing Center. Quantitative levels for all assembled transcripts were determined using Bowtie2 (84), and Kallisto was used for quantification and normalization of gene expression in the transcriptomes (85). All final predicted peptide models were annotated on the Trinotate platform, combining homology-based searches using BLAST+, domain identification using hmmscan against the Pfam 30.0 database (86), and cellular localization prediction using SignalP 4.0 (87). The 22 transcriptomes sequenced in this effort, as well as previously published transcriptomic data sets from Pecoramyces ruminantium (42), Piromyces finnis, Piromyces sp. E2, Anaeromyces robustus, and Neocallimastix californiae (45), were examined. In each data set, redundant transcripts were grouped into clusters using CD-HIT-EST with an identity threshold of 95% (-c 0.95). The resulting nonredundant transcripts from each transcriptome were then used for peptide and coding sequence prediction with TransDecoder, using a minimum peptide length of 100 amino acids (http://transdecoder.github.io). Transcriptome completeness per strain was assessed using BUSCO (50) with the Fungi data set.
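The 100-amino-acid minimum used at the TransDecoder step can be pictured as a longest-ORF length filter. The Python sketch below is a simplified stand-in, not TransDecoder’s algorithm: it scans both strands in all three frames for complete ATG-to-stop ORFs and ignores partial ORFs and coding-potential scoring. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq: str, min_aa: int = 100) -> int:
    """Length in amino acids of the longest complete ORF (ATG ... stop) on
    either strand; returns 0 if no ORF reaches min_aa.

    Simplified stand-in for TransDecoder's minimum-length filter: partial
    ORFs and coding-potential scoring are deliberately ignored.
    """
    best = 0
    for strand in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG in the current open frame
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)  # Met counted, stop not
                    start = None
    return best if best >= min_aa else 0
```

For example, an ATG followed by 99 GCT codons and a TAA stop encodes exactly 100 residues and passes, whereas the same construct with 98 GCT codons is rejected.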
HGT identification. A combination of BLAST similarity searches, a comparative similarity index (HGT index, hU), and phylogenetic analyses was used to identify HGT events in the analyzed transcriptomic data sets (Fig. 1). We define an HGT event as the acquisition of a foreign gene/pfam by AGF from a single lineage/donor. All predicted peptides were queried against UniProt databases (downloaded May 2017), each containing both reviewed (Swiss-Prot) and unreviewed (TrEMBL) sequences. The databases encompassed nine different phylogenetic groups: Bacteria, Archaea, Viridiplantae, Opisthokonta-Choanoflagellida, Opisthokonta-Fungi (without Neocallimastigomycota representatives), Opisthokonta-Metazoa, the Opisthokonta-Nucleariidae and Fonticula group, all other Opisthokonta, and all other non-Opisthokonta, non-Viridiplantae Eukaryota. For each peptide sequence, the bit-scores of the best hits and the HGT index hU (calculated as the difference between the bit-scores of the best nonfungal and the best Dikarya fungal matches) were determined. Peptide sequences with a BLASTP bit-score >100 against a nonfungal database (i.e., a 2^-100 chance of random observation) and an HGT index hU ≥30 were considered HGT candidates and subjected to additional phylogenetic analysis. We chose to work with bit-scores rather than raw scores since the bit-score measures sequence similarity independent of query sequence length and database size. This is essential when comparing hits from databases of different sizes (for example, the Bacteria database contained 83 million sequences, while the Choanoflagellida database contained 21 thousand sequences). We chose an hU value of ≥30 (i.e., a bit-score difference of at least 30 between the best nonfungal hit and the best fungal hit to an AGF sequence), which was previously suggested and validated (88, 89) as the best trade-off between sensitivity and specificity.
Since the bit-score is a logarithmic measure of sequence similarity, a bit-score difference of ≥30 ensures that the sequence aligned much better to the nonfungal hit than it did to the fungal hit.
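The two candidate thresholds (best nonfungal bit-score >100 and hU ≥30) can be sketched as follows. This is a minimal illustration, assuming the best BLASTP bit-score per database has already been collected into a dictionary; the key "Fungi" standing for the fungal (Dikarya) reference set, the function names, and the treatment of a missing fungal hit as a score of 0 are all hypothetical choices, not details taken from the pipeline.

```python
from typing import Dict, Tuple

FUNGAL_DB = "Fungi"  # hypothetical key for the fungal (Dikarya) database


def hgt_index(best_bits_by_db: Dict[str, float]) -> Tuple[float, float]:
    """Return (best nonfungal bit-score, hU) for one peptide.

    best_bits_by_db maps a database name to the best BLASTP bit-score of the
    peptide in that database; a missing fungal hit counts as 0 (assumption).
    """
    fungal = best_bits_by_db.get(FUNGAL_DB, 0.0)
    nonfungal = max(
        (bits for db, bits in best_bits_by_db.items() if db != FUNGAL_DB),
        default=0.0,
    )
    return nonfungal, nonfungal - fungal


def is_hgt_candidate(best_bits_by_db: Dict[str, float],
                     min_bits: float = 100.0, min_hu: float = 30.0) -> bool:
    """Apply both thresholds: nonfungal bit-score > 100 and hU >= 30."""
    nonfungal, hu = hgt_index(best_bits_by_db)
    return nonfungal > min_bits and hu >= min_hu
```

For instance, a peptide scoring 250 bits against Bacteria and 180 bits against Fungi has hU = 70 and is retained, while one scoring 250 and 230 bits (hU = 20) is not.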
CAZyme-encoding sequences were removed from the identified HGT candidates (owing to their multimodular nature [see below]), and the remaining candidates were clustered into orthologues using OrthoMCL (90). The orthologues obtained were subjected to detailed phylogenetic analysis to confirm HGT occurrence and to determine the potential donor. Each orthologue was queried against the nr database using web BLASTP (91) under two different settings: once against the full nr database and once against Fungi (taxonomy ID 4751) excluding the Neocallimastigomycetes (taxonomy ID 451455). The first 250 hits obtained in these two BLASTP searches with an E value below 1e-10 were downloaded and combined in one fasta file. To remove redundancies, the downloaded sequences were crudely aligned using standalone Clustal Omega (92), and the alignments were used to generate phylogenetic trees in FastTree under the LG model (93). The resulting trees were visualized in FigTree, and groups of sequences clustering together with very short branches were identified. Perl scripts were then used to remove these redundant sequences from the original fasta files, leaving just one representative per group. The resulting nonredundant fasta files were used for subsequent analysis. AGF and reference sequences were aligned using the MAFFT multiple sequence aligner (94), and alignments were masked for sites with >50% alignment gaps using the Mask Alignment Tool in Geneious 10.2.3. Masked alignments were then used in IQ-TREE (95) to first predict the best amino acid substitution model (based on the lowest Bayesian information criterion [BIC] score) and then to generate maximum likelihood trees under the predicted best model. Both the “-alrt 1000” option, for performing the Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT), and the “-bb 1000” option, for ultrafast bootstrap (UFB) (96), were added to the IQ-TREE command line. This resulted in phylogenetic trees with two support values (SH-aLRT and UFB) on each branch.
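The masking rule (drop alignment sites with >50% gaps) is simple to state in code. The sketch below reproduces only that stated rule; it is not the Geneious Mask Alignment Tool, whose implementation is not described here. Treating both "-" and "." as gap characters, and the function name, are assumptions.

```python
from typing import Dict

GAP_CHARS = {"-", "."}  # assumption: both '-' and '.' denote gaps


def mask_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Drop alignment columns in which more than max_gap_fraction of the
    sequences carry a gap; columns at or below the cutoff are kept."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        col for col in range(width)
        if sum(s[col] in GAP_CHARS for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[c] for c in keep) for name, s in alignment.items()}
```

With three sequences, a column gapped in two of them (67%) is removed, while a column gapped in one (33%) is kept.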
Candidates showing a nested phylogenetic affiliation that was incongruent with organismal phylogeny, with strong SH-aLRT and UFB support, were deemed horizontally transferred. As a final confirmatory step, each tree generated was also reconciled against a species tree (constructed using the large ribosomal subunit protein L3) using the programs Ranger-DTL (97) and NOTUNG (98) to infer transfer events at the node where AGF taxa clustered with a phylogenetically incongruent donor.
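IQ-TREE writes the two branch supports as an internal-node label of the form "SH-aLRT/UFB". The text does not give numeric cutoffs for "strong" support; the sketch below assumes the commonly cited guidance in the IQ-TREE documentation (SH-aLRT ≥80% and UFB ≥95%), and the function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_support(label: str) -> Optional[Tuple[float, float]]:
    """Split an IQ-TREE node label such as '92.3/99' into (SH-aLRT, UFB)."""
    parts = label.strip().split("/")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


def strongly_supported(label: str, min_sh_alrt: float = 80.0,
                       min_ufb: float = 95.0) -> bool:
    """True only if both support values reach their (assumed) thresholds."""
    support = parse_support(label)
    if support is None:
        return False
    sh_alrt, ufb = support
    return sh_alrt >= min_sh_alrt and ufb >= min_ufb
```

Requiring both values to pass is the conservative choice: a node with UFB = 99 but SH-aLRT = 75 is rejected.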
Identification of HGT events in carbohydrate-active enzyme transcripts. In AGF genomes, carbohydrate-active enzymes (CAZymes) are often encoded by large multimodule genes with multiple adjacent CAZyme or non-CAZyme domains (42, 45). A single gene can hence harbor multiple CAZyme pfam’s of different (fungal or nonfungal) origins (42, 45). As such, our initial efforts at HGT assessment in CAZyme-encoding transcripts using an entire gene/transcript strategy yielded inaccurate results, since similarity searches identified only the pfam’s with the lowest E value or highest number of copies while overlooking additional CAZyme pfam’s in the transcripts (Fig. S2). To circumvent the multimodular nature of AGF CAZyme transcripts, we identified CAZyme HGT events on trimmed domains rather than on entire transcripts. CAZyme-containing transcripts (glycoside hydrolases [GHs], polysaccharide lyases [PLs], and carbohydrate esterases [CEs]) were first identified by searching the entire transcriptomic data sets against the dbCAN hidden Markov models V5 (99) (downloaded from the dbCAN web server in September 2016) using the hmmscan command in standalone HMMER. For each CAZy family identified, the predicted peptides across all transcriptomic data sets were grouped into one fasta file, which was then amended with the corresponding Pfam seed sequences (downloaded from the Pfam website [http://pfam.xfam.org/] in March 2017). Sequences were aligned to their corresponding Pfam seeds using standalone Clustal Omega (92). Using the Pfam seed sequences as a guide for the start and end of the domain, aligned sequences were then truncated in Jalview (100). Truncated transcripts with an identified CAZy domain were again compared to the Pfam database (101) using hmmscan (102) to ensure correct assignment to CAZy families and accurate domain trimming. These truncated peptide sequences were then analyzed to pinpoint incidents of HGT using the approach described above.
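Domain trimming was done interactively in Jalview, using the Pfam seeds as a guide. As a hedged approximation of that manual step, the sketch below assumes the domain spans from the first to the last alignment column occupied by any seed sequence and returns the ungapped query residues in that window. The boundary rule and the function name are hypothetical; a stricter rule could, for example, require a majority of seeds to be present.

```python
from typing import List

GAP_CHARS = {"-", "."}


def trim_to_domain(aligned_query: str, aligned_seeds: List[str]) -> str:
    """Return the ungapped query residues between the first and last alignment
    columns occupied by any Pfam seed (hypothetical boundary rule)."""
    if any(len(seed) != len(aligned_query) for seed in aligned_seeds):
        raise ValueError("query and seeds must share one alignment")
    occupied = [
        col for col in range(len(aligned_query))
        if any(seed[col] not in GAP_CHARS for seed in aligned_seeds)
    ]
    if not occupied:
        return ""
    start, end = occupied[0], occupied[-1]
    return "".join(c for c in aligned_query[start:end + 1] if c not in GAP_CHARS)
```

Residues flanking the seed-covered window (here, a neighboring module of a multimodular transcript) are discarded before the domain is re-scanned with hmmscan.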
Neocallimastigomycota-specific versus nonspecific HGT events. To determine whether an identified HGT event (i.e., foreign gene acquisition from a specific donor) is specific to the phylum Neocallimastigomycota, we assessed the occurrence of orthologues (30% identity, >100-amino-acid alignment) of the identified HGT genes in basal fungi, i.e., members of Blastocladiales, Chytridiomycota, Cryptomycota, Microsporidia, Mucormycota, and Zoopagomycota, as well as the putative phylogenetic affiliation of these orthologues, when encountered. HGT events were judged to be Neocallimastigomycota specific if (i) orthologues were absent from all basal fungal genomes; (ii) orthologues were identified in basal fungal genomes but were of clear fungal origin; or (iii) orthologues were identified in basal fungal genomes and showed a nonfungal phylogenetic affiliation, but this affiliation differed from that observed in the Neocallimastigomycota. Conversely, events were judged to be nonspecific to the Neocallimastigomycota if phylogenetic analysis of basal fungal orthologues indicated a nonfungal origin with a donor affiliation similar to that observed in the Neocallimastigomycota (Fig. 1).
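Criteria (i) to (iii) collapse into a single test: an event is nonspecific only when some basal fungal orthologue shares the AGF donor affiliation. The sketch below encodes that logic under a simplifying assumption: "similar" donor affiliation is reduced to an exact match of donor lineage names. The function name and input encoding are hypothetical.

```python
from typing import List, Optional


def is_agf_specific(agf_donor: str,
                    basal_orthologue_donors: List[Optional[str]]) -> bool:
    """Apply criteria (i) to (iii) for one HGT event.

    basal_orthologue_donors has one entry per basal fungal orthologue found:
      None      -> orthologue of clear fungal origin
      a string  -> nonfungal donor lineage inferred for that orthologue
    """
    # (i) no orthologues, (ii) only fungal-origin orthologues, or (iii) a
    # nonfungal origin with a different donor all count as AGF specific; the
    # event is nonspecific only if an orthologue shares the AGF donor.
    return not any(donor == agf_donor for donor in basal_orthologue_donors)
```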
Mapping HGT events to available AGF genomes. HGT events identified in the AGF data sets examined (both CAZy and non-CAZy events) were mapped onto currently available AGF genome assemblies (42, 45) (GenBank accession numbers ASRE00000000.1, MCOG00000000.1, MCFG00000000.1, and MCFH00000000.1). The duplication and expansion patterns, GC content, and intron distribution were assessed for all identified genes. Averages were compared to the AGF genome average using a Student t test to identify possible deviations in these characteristics, as often observed with HGT genes (103). To avoid any bias arising from differences in the number of genes compared, we also compared the GC content, codon usage, and intron distribution averages of the identified genes to those of an equal number of randomly chosen genes from AGF genomes. The MEME Suite’s fasta-subsample function (http://meme-suite.org/doc/fasta-subsample.html) was used to randomly select this equal number of genes from the AGF genomes.
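The GC content comparison against an equal-sized random background can be sketched with the Python standard library. This is a minimal sketch, assuming a two-sample pooled-variance Student t statistic between the HGT gene set and a background set; `random.Random.sample` stands in for the MEME Suite fasta-subsample tool, and all function names are hypothetical. The p-value is not computed here; `scipy.stats.ttest_ind(a, b, equal_var=True)` would return it if SciPy is available.

```python
import random
import statistics
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def student_t(a: List[float], b: List[float]) -> float:
    """Two-sample Student t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def random_background(genome_genes: List[str], n: int, seed: int = 0) -> List[str]:
    """Draw n genes without replacement (stand-in for fasta-subsample)."""
    return random.Random(seed).sample(genome_genes, n)
```

In use, `gc_content` would be applied to each HGT gene and to each gene drawn by `random_background` (with n equal to the number of HGT genes), and the two lists of values passed to `student_t`.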
Validation of HGT identification pipeline using previously published data sets. As a control, the frequencies of HGT occurrence in the genomes of a filamentous ascomycete (Colletotrichum graminicola, GenBank Assembly accession no. GCA_000149035.1) and a microsporidian (Encephalitozoon hellem, GenBank Assembly accession no. GCA_000277815.3) were determined using our pipeline (Table S1), and the results were compared to previously published results (36, 104).
Guarding against false-positive HGT events due to contamination. Multiple safeguards were taken to ensure that the frequency and incidence of HGT reported here are not due to bacterial contamination of AGF transcripts. These included (i) application of antibiotics in all culturing procedures as described above; (ii) sequencing transcriptomes rather than genomes, since enrichment for eukaryotic poly(A) transcripts prior to RNA-seq provides a built-in safeguard against possible prokaryotic contamination; (iii) mapping the identified HGT transcripts to genomes generated in prior studies and confirming the occurrence of introns in the majority of identified HGT genes; (iv) applying a threshold whereby only transcripts identified in >50% of transcriptomic assemblies from a specific genus were included; and (v) excluding HGT events showing suspiciously high (>90%) sequence identity to donor sequences.
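Safeguards (iv) and (v) are the two that act directly on the event list. The sketch below assumes, for illustration only, that each event is summarized by per-genus detection counts, the number of transcriptomes per genus, and its percent identity to the donor sequence; retaining an event that passes the >50% threshold in at least one genus is likewise an assumption. All names are hypothetical.

```python
from typing import Dict, Set


def genera_passing(detected_in: Dict[str, int],
                   strains_per_genus: Dict[str, int],
                   min_fraction: float = 0.5) -> Set[str]:
    """Genera in which the event was found in >min_fraction of assemblies."""
    return {genus for genus, hits in detected_in.items()
            if hits / strains_per_genus[genus] > min_fraction}


def keep_event(detected_in: Dict[str, int],
               strains_per_genus: Dict[str, int],
               identity_to_donor_pct: float,
               max_identity_pct: float = 90.0) -> bool:
    """Safeguards (iv) and (v): genus-level recurrence and donor identity."""
    if identity_to_donor_pct > max_identity_pct:
        return False  # suspiciously similar to the donor: possible contamination
    return bool(genera_passing(detected_in, strains_per_genus))
```

Under the ">50%" wording used here, an event seen in exactly half of a genus’s assemblies (e.g., 2 of 4) does not pass.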
In addition, recent studies have demonstrated that GenBank-deposited reference genomes (79) and transcriptomes (105) of multicellular organisms are often plagued by prokaryotic contamination. Such contamination in reference donor genomes/transcriptomes could lead to false-positive HGT identification or incorrect HGT assignments. To guard against this possibility, sequence data from potential donor reference organisms were queried using BLAST, and their congruence with organismal phylogeny was considered a prerequisite for inclusion of an HGT event.
Data availability. Sequences of individual transcripts identified as horizontally transferred were deposited in GenBank under accession numbers MH043627 to MH043936 and MH044722 to MH044724. The whole-transcriptome shotgun sequences were deposited in GenBank under the BioProject PRJNA489922 and Biosample accession numbers SAMN09994575 to SAMN09994596. Transcriptomic assemblies were deposited in the SRA under project accession number SRP161496. Trees of HGT events discussed in Results and Discussion are presented in the supplemental material (Fig. S5 to S45).
ACKNOWLEDGMENTS
This study was funded by the NSF-DEB grants 1557102 to N.H.Y. and M.S.E. and 1557110 to J.E.S.
The authors declare no conflict of interest.
FOOTNOTES
- Received 29 April 2019.
- Accepted 19 May 2019.
- Accepted manuscript posted online 24 May 2019.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.00988-19.
- Copyright © 2019 American Society for Microbiology.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.