ABSTRACT
In soy sauce manufacturing, Candida versatilis plays a role in the production of volatile flavor compounds, such as volatile phenols, but limited accessible information on its genome has prevented further investigation regarding aroma production and breeding. Although the draft genome sequence data of two strains of C. versatilis have recently been reported, these strains are not similar to each other. Here, we reassess the draft genome sequence data for strain t-1, which was originally reported to be C. versatilis, and conclude that strain t-1 is most probably not C. versatilis but a gamete of hybrid Zygosaccharomyces rouxii. Phylogenetic analysis of the D1/D2 region of the 26S ribosomal DNA (rDNA) sequence indicated that strain t-1 is more similar to the genus Zygosaccharomyces than to C. versatilis. Moreover, we found that the genome of strain t-1 is composed of haploid genome content and divided into two regions that show approximately 100% identity with the T or P subgenome derived from the natural hybrid Zygosaccharomyces rouxii, such as NBRC110957 and NBRC1876. We also found a chromosome crossing-over signature in the scaffolds of strain t-1. These results suggest that strain t-1 is a gamete of the hybrid Z. rouxii, generated by either meiosis or chromosome loss following reciprocal translocation between the T and P subgenomes. Although it is unclear why strain t-1 was misidentified as C. versatilis, the genome of strain t-1 has broad implications for considering the evolutionary fate of an allodiploid.
IMPORTANCE In yeast, crossing between different species sometimes leads to interspecies hybrids. The hybrid generally cannot produce viable spores because dissimilarity of parental genomes prevents normal chromosome segregation during meiotic division, leading to a dead end. Thus, only a few natural cases of homoploid hybrid speciation, which requires mating between 1n gametes of hybrids, have been described. However, a recent study provided strong evidence that homoploid hybrid speciation is initiated in natural populations of the budding yeast, suggesting the potential presence of viable 1n gametes of hybrids. The significance of our study is finding that the strain t-1, which had been misidentified as Candida versatilis, is a viable 1n gamete derived from hybrid Zygosaccharomyces rouxii.
INTRODUCTION
Interspecies hybridization is often observed in yeast. The increased genome size and complexity due to the hybridization can confer a selective advantage called “heterosis,” but the hybrids are generally sterile (unable to produce viable spores) or infertile (unable to sporulate). The hybrid sterility (or infertility) can be explained by incompatibility between genes from different species (1), which is known as Dobzhansky-Muller (DM) incompatibility (2, 3), or dissimilarity between chromosomes from different species, which prevents precise chromosome pairing essential for meiosis (4–7). Although viable allohaploids (haploid gametes of hybrids generated by mating between two different species) can be generated experimentally (8), it is less understood whether the viable haploid gametes derived from hybrids exist in the natural environment. To address this question, accumulation of genome data in various yeasts is essential because it is difficult to discriminate between parental haploids and allohaploids derived from hybrids, which are generated by the fusion of two close relatives. Recent advances in next-generation sequencing technology and its outcomes offer an opportunity to explore allohaploids from the international nucleotide sequence database.
Candida versatilis (previously also called Torulopsis versatilis) is a yeast species whose draft genome sequence has only recently been reported. C. versatilis is a highly halotolerant yeast used in the production of soy sauce and soybean paste, similar to Zygosaccharomyces rouxii. In soy sauce manufacturing, C. versatilis produces volatile phenols, such as 4-ethylguaiacol (4-EG) and 4-ethylphenol (4-EP), that confer the characteristic aroma of soy sauce (9). Although excess quantities of volatile phenols in soy sauce normally impart an unpleasant aroma, the presence of 1 to 2 mg/liter of 4-EG gives a better flavor quality to the soy sauce in subjective evaluations (9).
Recently, the draft genome sequence data of C. versatilis became available; the genome data for strain t-1 (originally reported to be C. versatilis) and strain JCM5958 were submitted by Hou et al. (10) and by the RIKEN Center for Life Science Technologies of Japan, respectively. Hou et al. did not clearly describe the strain name and accession number in their paper (10), but they certainly described strain t-1 because the NCBI nucleotide database entries for strain t-1 (accession numbers KV452433 to KV452453) directly cite the paper by Hou et al. as their source.
More recently, the genome sequence analyses of hybrid Z. rouxii NBRC110957 and NBRC1876 have been reported (11, 12). These hybrids contain two subgenomes; one is derived from a parent similar to haploid Z. rouxii CBS732T (referred to as the T subgenome), and the other may be derived from a parent similar to NCYC3042, informally called Z. pseudorouxii (referred to as the P subgenome). However, genome sequence analysis of NCYC3042 has not yet been performed, so it remains to be examined in detail whether one parent of hybrid Z. rouxii (e.g., NBRC110957 and NBRC1876) is indeed Z. pseudorouxii. Interestingly, we noticed that the genome sequence data of strain t-1 are not similar to those of strain JCM5958 and seem to be similar to those of hybrid Z. rouxii. Unlike hybrid Z. rouxii, strain t-1 has haploid genome content; its scaffolds instead form a mosaic of sequences similar to both the T and P subgenomes of hybrid Z. rouxii.
In this study, we conducted a comparative analysis between C. versatilis and Z. rouxii to clarify the taxonomic status and the origin of strain t-1 and revealed that strain t-1 is most probably not C. versatilis but a gamete of hybrid Z. rouxii. Our study provides evidence that an allohaploid derived from hybrid Z. rouxii has been isolated from a natural environment.
RESULTS AND DISCUSSION
Dissimilarity between strains t-1 and JCM5958. We first focused on the divergence of G+C content between the genomes of strains t-1 and JCM5958 in the NCBI genome database (https://www.ncbi.nlm.nih.gov/genome/genomes/44240?). The G+C content of the strain t-1 genome is 40.1% (39.74% in reference 10) and that of the JCM5958 genome is 44.8%, suggesting that these strains are phylogenetically divergent. To confirm the presence of the genes that have been previously cloned from C. versatilis (13–15), we conducted a BLASTN search using these genes as a query against the strain t-1 and JCM5958 genomes (Table 1). The genes encoding Cagpd1 (13), Cagpd2 (14), CvGPD1 (15), PLB1, and PLB2 were detected with high sequence identity in the JCM5958 genome, while they were not detected in the strain t-1 genome (Table 1). We found the D1/D2 region of the 26S ribosomal DNA (rDNA) sequence that can be amplified by the primer set NL1 and NL4 (16) in scaffold00031 (bp 10,704 to 11,327) of the strain t-1 genome. The analysis of the sequence in this region revealed that the D1/D2 region of the 26S rDNA sequence in strain t-1 shares 100% identity with the corresponding region of Z. rouxii CBS732T with 100% query cover (see Fig. S1 in the supplemental material). In contrast, the same region of JCM5958 shares 99% identity with the corresponding region of C. versatilis CBS1752T with 100% query cover (Fig. S1). Phylogenetic analysis of the D1/D2 region of the 26S rDNA sequence indicated that strain t-1 is more similar to the genus Zygosaccharomyces than to C. versatilis (Fig. 1). Ribosomal DNA arrays in the genome of the Z. rouxii complex species consist of Z. rouxii-, Z. sapae-, and Z. mellis-like sequences (17). This indicates the possibility that strain t-1 may also contain a Z. rouxii type D1/D2 sequence and internal transcribed spacer (ITS) sequences of other Zygosaccharomyces species types, so we attempted the phylogenetic analysis of the D1/D2 region and compared it to the ITS sequence in order to confirm whether strain t-1 has a mosaic rDNA array such as those seen in Z. rouxii complex species. However, we could not fill several sequence gaps in scaffold00031 of strain t-1 because of the difficulty in obtaining strain t-1, which is not deposited in a public culture collection, and this prevented the phylogenetic analysis of the ITS sequence. In any case, our partial phylogenetic analysis based on the D1/D2 region of the 26S rDNA sequence suggests that strain t-1 might have been misidentified as C. versatilis.
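The G+C comparison above can be reproduced from any downloaded scaffold file. The following is a minimal sketch, not the procedure used to obtain the reported values: the function names are hypothetical, and it assumes that ambiguous bases (for example, N in scaffold gaps) are excluded from the denominator, which may differ from how the database values were computed.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence: str) -> float:
    """Return the G+C percentage, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)


def genome_gc_content(records: Dict[str, str]) -> float:
    """Return the G+C percentage over all records pooled together."""
    return gc_content("".join(records.values()))
```

Pooling all scaffolds before dividing weights each scaffold by its length, which is the natural choice for a genome-wide value.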
Table 1. Genes previously cloned from C. versatilis
Fig. 1. Phylogenetic tree showing the relationships among the D1/D2 regions of the 26S rDNA sequences. The dendrogram was constructed using the neighbor-joining method. Bootstrap values were calculated from 1,000 replications and are expressed as percentages. The scale bar represents 0.02 substitutions per nucleotide position. The multiple-sequence alignment used to construct the phylogenetic tree is shown in Fig. S1. Sequence data were downloaded from the DNA Data Bank of Japan and the National Institute of Technology and Evaluation Biological Resource Center (http://www.nbrc.nite.go.jp/NBRC2/NBRCDispSearchServlet?lang=en). Zr, Zygosaccharomyces rouxii; HyZr, hybrid Zygosaccharomyces rouxii; Zm, Zygosaccharomyces mellis; Zb, Zygosaccharomyces bailii; Cv, Candida versatilis; Ce, Candida etchellsii.
To test this possibility, we examined the genome-wide synteny and identity between C. versatilis and Z. rouxii by dot plot analysis using YASS. YASS is a genomic alignment search tool that uses a new spaced-seed model called transition-constrained seeds that takes advantage of statistical properties of real genomic sequences to achieve high sensitivity on the nucleic sequences being compared (18). The genome comparison of strain t-1 and Z. rouxii CBS732T shows that they share significant synteny and identity (Fig. 2), but the genomes of strains t-1 and JCM5958 or of JCM5958 and Z. rouxii CBS732T showed poor synteny and very low identity (see Fig. S2 in the supplemental material). This result suggests that strain t-1 is most probably not C. versatilis but a haploid strain related to Z. rouxii. Note that we assume that there are strong, but not complete, collinearities among the genomes of strain t-1, CBS732T, and the T and P subgenomes of hybrid Z. rouxii. There may be some additional chromosome rearrangements that are not visible by assembly of short reads in strains t-1 (10), NBRC110957 (11), and NBRC1876 (12). For example, a physical linkage between scaffold00009 and scaffold00008 in strain t-1 has not been confirmed, suggesting that unexpected chromosomal rearrangement may have occurred.
Fig. 2. Dot plot analysis between the genome of Z. rouxii CBS732T and the reconstructed genome of strain t-1. The x axis indicates the aligned scaffolds of strain t-1, and the y axis indicates the reference genome of CBS732T. Numbers are shown only for the 12 largest scaffolds of strain t-1. Red numbers indicate scaffolds shown as the reverse complement of the scaffold sequence. Scaffolds 29, 31, 32, 34, 35, and 36 are aligned but are not visible at this scale (because they are too small). “Other scaffolds” indicates the remaining scaffolds, 37 to 69. Dot plots were made using the program YASS (18), and horizontal and vertical lines were added according to the length of chromosomes of CBS732T and scaffold of NBRC110957, respectively.
Strain t-1 is an allohaploid derived from hybrid Z. rouxii. Is strain t-1 just a haploid of Z. rouxii? To confirm the phylogenetic status and the origin of strain t-1, we conducted neighbor-joining (NJ)-based phylogenetic analysis of 14 proteins encoded by orthologous genes that are conserved among strains t-1, JCM5958, NBRC110957, and NBRC1876 and cover the seven chromosomes of the haploid Z. rouxii CBS732T genome (Fig. 3; see also Fig. S3 in the supplemental material). First, to detect orthologous genes, we performed a TBLASTN search against the genomes of strains t-1, JCM5958, NBRC110957, and NBRC1876 using the sequences of housekeeping proteins in Z. rouxii as the query. Next, we compared the gene order around the orthologous genes in each strain to confirm synteny using the Yeast Genome Annotation Pipeline (19) and analyzed the results with the Yeast Gene Order Browser (20). Except for JCM5958, the gene order around the orthologous target genes was highly conserved among strains t-1, NBRC1876, NBRC110957, and CBS732T, which ensured that the orthologous target genes analyzed in this study diverged from a common ancestral sequence by speciation. Consistent with a previous study, both allodiploid NBRC110957 and NBRC1876 have two types of proteins: T-type proteins encoded by a T subgenome that originated from a donor similar to Z. rouxii CBS732T and P-type proteins encoded by the P subgenome that originated from donors similar to strains related to Z. rouxii (Fig. 3 and S3). In contrast, strain t-1 has only one type of each protein; Pho88, Met16, Atp12, Leu2, Ura3, Ade2, and Arg2 clustered with the T-type proteins, while Ade1, Trp1, Aur1, Bet3, Aro8, His3, and Ump1 clustered with the P-type proteins (Fig. 3 and S3). These results suggest that the haploid genome of strain t-1 comprises both T- and P-subgenome-derived sequences.
Fig. 3. Patterns of 14 proteins detected in Z. rouxii CBS732T, allodiploids NBRC110957 and NBRC1876, and strain t-1. Red and blue boxes indicate the T- and P-type proteins, respectively. The sequence type discrimination (T type or P type) is supported by the phylogenetic analysis (see also Fig. S3).
To test this hypothesis, a comprehensive BLAST search between strain t-1 scaffolds and the Z. rouxii CBS732T genome was performed. As a result, we were able to divide the scaffolds of strain t-1 into two regions by degree of identity to the Z. rouxii CBS732T; the scaffolds with approximately 98 to 100% identity were considered to be the region derived from the T subgenome (T-type sequence), and the scaffolds with approximately 80 to 90% identity were considered to be the region derived from the P subgenome (P-type sequence) (see Table S1 in the supplemental material). Conserved synteny and high sequence identity permit us to map the scaffolds of strain t-1 to the Z. rouxii CBS732T genome. The reconstructed genome structure of strain t-1 is shown in Fig. 4. We found that strain t-1 has genomic contents that are comparable in size to the genome of haploid Z. rouxii CBS732T, suggesting that strain t-1 is apparently allohaploid.
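The identity-based partition described above can be expressed as a simple rule applied to each alignment hit. The sketch below is illustrative only: the function names and the input tuple layout (scaffold name, percent identity, alignment length) are assumptions, and the thresholds simply restate the approximate ranges given in the text (98 to 100% for T-type, 80 to 90% for P-type). Hits falling outside both ranges are left unassigned rather than forced into a class.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Approximate ranges taken from the text; treated here as hard cutoffs,
# which is a simplifying assumption.
T_MIN_IDENTITY = 98.0
P_IDENTITY_RANGE = (80.0, 90.0)


def classify_identity(percent_identity: float) -> str:
    """Label one hit as 'T', 'P', or 'unassigned' by identity to CBS732T."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= T_MIN_IDENTITY:
        return "T"
    if P_IDENTITY_RANGE[0] <= percent_identity <= P_IDENTITY_RANGE[1]:
        return "P"
    return "unassigned"


def summarize_hits(
    hits: Iterable[Tuple[str, float, int]]
) -> Dict[str, Dict[str, int]]:
    """Sum aligned length per scaffold and per sequence type.

    Each hit is (scaffold_name, percent_identity, alignment_length).
    A scaffold may carry both T-type and P-type segments, so totals are
    kept per type instead of assigning one label to the whole scaffold.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for scaffold, identity, length in hits:
        totals[scaffold][classify_identity(identity)] += length
    return {scaffold: dict(by_type) for scaffold, by_type in totals.items()}
```

Keeping per-segment totals matters because some scaffolds of strain t-1 are mosaics; a single per-scaffold average identity would blur T-type and P-type segments into an intermediate value.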
Fig. 4. Map of the scaffolds of strain t-1 aligned on the chromosome structure of Z. rouxii CBS732T. The black horizontal bars indicate the chromosomes of CBS732T with tick marks for every 100 kbp. Boxes filled with colors indicate the T-type sequence that shares approximately 98 to 100% identity with the sequence of CBS732T. Boxes filled with angled striped colors indicate the P-type sequence that shares approximately 80 to 90% identity with the sequence of CBS732T. The white (T-type sequence) and black (P-type sequence) hatched boxes connected with the colored boxes represent the positions corresponding to the genome of CBS732T. Only the colored boxes exist in the strain t-1 genome (the hatched boxes connected with the colored boxes do not represent redundant sequence in the strain t-1 genome). Arrows indicate the direction of the segment. The numbers show scaffold numbers, and red numbers indicate the reverse complement of the scaffold sequence. The closed circles indicate the positions of DNA crossing-over shown in Fig. S4 to S6.
The characteristic chromosomal rearrangement between chromosomes C and F in scaffold00008, which appears as a tandem repeat (a region corresponding to one end of CBS732T chromosome F occurs twice in the strain t-1 genome) (Fig. 2 and 4), is shared by strain t-1 and NBRC1876 (11), suggesting that strain t-1 and NBRC1876 share an allodiploid ancestor. This tandem repeat is assumed to have been formed by reciprocal translocation between the HML locus of the T subgenome and the MAT locus of the P subgenome in an ancestor of NBRC1876 (11). The chromosomal rearrangement between chromosomes A and G in scaffold00001 seems to be specific to strain t-1 and is not shared with other strains of hybrid Z. rouxii. Taken together, these data suggest the following scenario: the chromosomal rearrangement between C and F occurred in a common allodiploid ancestor of strains t-1 and NBRC1876; the ancestor of strain t-1 then diverged from this common allodiploid ancestor, carrying the rearrangement between chromosomes C and F; and the allohaploid strain t-1 was generated by meiotic division before or after the chromosomal rearrangement between A and G. Note that it is also possible that the chromosomal rearrangement between C and F occurred independently in the strain t-1 and NBRC1876 lineages, because chromosomal translocation between mating-type-like genes is sometimes detected in haploid Z. rouxii (21).
We further compared genetic relatedness among the genomes of strain t-1, Z. rouxii CBS732T, NBRC110957, and NBRC1876 (Fig. 5). The average nucleotide identity (ANI) among the T-type sequence of strain t-1 (the region of filled boxes in Fig. 4), CBS732T, and the T subgenomes of NBRC110957 and NBRC1876 was approximately 100%, as was the ANI between the P-type sequence of strain t-1 (the region of angled-striped boxes) and the P subgenomes of NBRC110957 and NBRC1876 (Fig. 5). These results suggest that strain t-1 is an allohaploid strain derived from hybrid Z. rouxii.
Fig. 5. The average nucleotide identity (ANI) among strains t-1, Z. rouxii CBS732T, allodiploid NBRC110957, and NBRC1876. The ANI among the (sub)genomes of strains t-1, CBS732T, NBRC110957, and NBRC1876 was calculated using an ANI calculator (43) with the default settings. The upper horizontal line of the box is the 75th percentile; the lower horizontal line of the box is the 25th percentile; the horizontal bar within the box is the median value; the upper and lower horizontal thick bars outside the box indicate 1.5 times the interquartile range from the box.
A model for generation of allohaploid strain t-1 from an allodiploid. How is the allohaploid genome structure of strain t-1 generated from an allodiploid? There are two possible ways to generate an allohaploid genome from an allodiploid: meiotic division and chromosome loss (22). We found three traces of chromosome crossing-over between the T and P subgenomes in scaffolds 1, 9, and 10 (Fig. 4, closed circles; see also Fig. S4 to S6 in the supplemental material). Although each crossing-over event can be mapped to a sequence region of 83 to 240 bp that is identical in the T and P subgenomes, we could not discriminate whether the trace of chromosome crossing-over was due to meiotic division or chromosome loss following the reciprocal translocation of chromosomes.
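The logic of mapping a crossing-over event to a stretch that is identical in both subgenomes can be sketched as follows. This is a simplified illustration, not the analysis behind Fig. S4 to S6: it assumes three pre-aligned, gap-free sequences of equal length (the strain t-1 segment and the corresponding T and P references), and the function names are hypothetical. Only sites where the two references differ are informative; a switch in parental type between two consecutive informative sites brackets the interval in which the exchange must have occurred.

```python
from typing import List, Tuple


def informative_sites(child: str, t_ref: str, p_ref: str) -> List[Tuple[int, str]]:
    """Return (position, parental type) for sites where T and P differ
    and the child matches exactly one of them."""
    if not (len(child) == len(t_ref) == len(p_ref)):
        raise ValueError("all three aligned sequences must have equal length")
    sites = []
    for pos, (c, t, p) in enumerate(zip(child.upper(), t_ref.upper(), p_ref.upper())):
        if t == p:
            continue  # uninformative: both parents carry the same base
        if c == t:
            sites.append((pos, "T"))
        elif c == p:
            sites.append((pos, "P"))
        # a base matching neither parent is skipped as uninformative
    return sites


def crossover_intervals(child: str, t_ref: str, p_ref: str) -> List[Tuple[int, int, str, str]]:
    """Return (left_site, right_site, type_before, type_after) for every
    switch of parental type between consecutive informative sites.

    The exchange lies strictly between left_site and right_site, so the
    bracketed stretch spans right_site - left_site - 1 positions.
    """
    sites = informative_sites(child, t_ref, p_ref)
    switches = []
    for (left, before), (right, after) in zip(sites, sites[1:]):
        if before != after:
            switches.append((left, right, before, after))
    return switches
```

In real data, a single informative site that disagrees with its neighbors is as likely to reflect a sequencing or assembly error as a true exchange, so isolated switches would need to be confirmed by several consecutive informative sites on each side.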
We illustrate a model for allohaploid generation from an allodiploid (Fig. 6). In the case of allohaploid generation through meiotic division, two haploid cells of different species mate and form an allodiploid. This allodiploid could start meiotic division in response to nutrient starvation; in most cases, however, homologous chromosomes (homologues) cannot become tethered to each other due to dissimilarity between the T and P subgenomes, leading to failure of chiasma formation. Because the linkage provided by chiasmata ensures proper segregation of homologues through the process by which the kinetochores attach to microtubules in such a way that the homologues will be pulled to the opposite rather than the same side of the spindle at anaphase I (23, 24), defects of chiasmata cause the missegregation of homologues, leading to a dead end. Alternatively, the rare success of chiasma formation enables proper chromosome segregation. After the second round of chromosome segregation, gametes with haploid genomic content could be produced; however, the majority of gametes cannot germinate due to incompatibility between two chromosomes derived from different species (25–28). Thus, it would be very rare for a viable allohaploid to be generated by meiosis.
Fig. 6. Model for allohaploid generation from an allodiploid. For a detailed explanation, see Results and Discussion. Ovals represent yeast cells. Red and blue blocks indicate homologous chromosomes that share 80 to 90% sequence identity with each other.
Chromosome loss is a well-established phenomenon in yeast hybrids. For example, heat stress on Saccharomyces cerevisiae × Saccharomyces uvarum hybrids favored loss of the S. uvarum genome (29), and another type of stress promoted rearrangement between the parental species' chromosomes (30). Thus, it would be possible that the allodiploid returns to a haploid genome content by chromosome loss following reciprocal translocation of chromosomes.
Conclusion. Our results showed that strain t-1 is most probably not C. versatilis but a gamete of hybrid Z. rouxii. Although it is unclear why strain t-1 was misidentified as C. versatilis, the genome of strain t-1 has broad implications for considering the evolutionary fate of allodiploids. In general, allodiploids are viable, but their sexual gametes are not (1). This hybrid sterility is one of the postzygotic reproductive isolation mechanisms that evolve between recently diverged species. One of several possible causes of the nonviability of sexual gametes is incompatibility between genes derived from different species (1), which is known as DM incompatibility (2, 3). The presence of compatible alleles in hybrids can mask the effect of incompatibility because hybrids contain two complete haploid genomes, one derived from each parent species; however, haploid gametes of hybrids could be exposed to recessive incompatibility, preventing allodiploids from reproducing sexually (1). Indeed, some incompatible DM pairs have previously been identified in diverse organisms (8, 26–28, 31). Under experimental conditions, haploid gametes of allodiploids can be generated (8), but it is less well understood whether viable haploid gametes of allodiploids exist in the natural environment. A recent study provided strong evidence that homoploid hybrid speciation is present in the natural population of the budding yeast Saccharomyces paradoxus (32), suggesting the potential presence of viable 1n gametes of hybrids. In this study, we demonstrated that strain t-1, which was isolated from the natural environment in soy sauce mash at Tianjin in China in 2008 (BioSample number SAMN03466599), is most probably not C. versatilis but a viable haploid gamete of hybrid Z. rouxii. Most likely, strain t-1 was generated in soy sauce mash by either meiosis or chromosome loss following reciprocal chromosome translocation.
At this moment, this is the leading hypothesis, which requires further evidence to be fully confirmed, because isolating haploid gametes from allodiploids using NBRC110957 and NBRC1876 remains to be validated experimentally. In addition, genome sequence analysis of NCYC3042, which is one of the putative parents of hybrid Z. rouxii, is necessary to fully demonstrate that strain t-1 is a haploid gamete with a mosaic genome containing T- and P-subgenomic segments.
Hybrid Z. rouxii seems to have generated at least two lineages of new species. One has an allodiploid/allotetraploid sexual reproduction status (11), and the other has an allohaploid/allodiploid sexual reproduction status (e.g., strain t-1). Allodiploid hybrid Z. rouxii strains with opposite mating types can mate with each other, and the resulting allotetraploid can form viable allodiploid spores (11). In addition, it is reasonable to assume that allohaploid strain t-1 can switch mating type and undergo mother-daughter mating, and the resulting allodiploid can also form viable allohaploid spores. It is an open question as to whether these lineages are reproductively isolated from the parent species. To demonstrate that hybrid speciation has occurred, three criteria should be satisfied: (i) reproductive isolation of hybrid lineages from parental species, (ii) evidence of hybridization in the genome, and (iii) evidence that this reproductive isolation is a consequence of hybridization (33). The hybrid Z. rouxii and strain t-1 satisfy the second criterion (ii), but the first and third criteria (i and iii) are untested.
More physiological study of strain t-1 is needed to clarify its adaptation to a harsh environment. Hybrid Z. rouxii organisms, such as NBRC1876 and ATCC 42981, are capable of growing in more-extreme conditions than the haploid Z. rouxii CBS732T (34–36). Given that polyploidization could be an evolutionary device to cope with strong selection pressures during times of environmental instability by doubling all genes (37), the haploid strain t-1 isolated from soy sauce mash would retain fitness in a harsh environment by mixing two parental genomes. The rare appearance of viable haploid gametes of allodiploids may be a driving force to generate diverse species fitting a diverse environment.
MATERIALS AND METHODS
Ortholog finding. Orthologous genes were identified manually using TBLASTN or the Yeast Genome Annotation Pipeline (http://wolfe.ucd.ie/annotation/) (19), and the results were analyzed using the Yeast Gene Order Browser (http://ygob.ucd.ie/) (20). Dot plot analysis was performed using YASS (18).
Sequence comparison. Multiple nucleotide and amino acid sequence alignments were performed using Clustal Omega and Clustal W2 (38) and used for phylogenetic analysis by the neighbor-joining (NJ) method (39) with 1,000 bootstrap replications (40). Phylogenetic trees were drawn with NJplot (41). Searches for nucleotide and protein sequence homology were performed in the GenBank database with BLAST algorithms (42).
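Neighbor-joining operates on a matrix of pairwise distances derived from the multiple-sequence alignment. The sketch below shows one common choice, the uncorrected p-distance (the proportion of differing sites among compared positions). It is an assumption made for illustration: the text does not state which distance measure or correction was applied, and the function names are hypothetical. Columns with a gap in either sequence are skipped.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Return p-distances for every unordered pair of aligned sequences."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }
```

A bootstrap replicate would resample alignment columns with replacement and recompute this matrix before rebuilding the tree; the fraction of replicates recovering a given branch gives its support percentage.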
Data availability. The scaffold and contig sequences of strain t-1 are available in DDBJ/ENA/GenBank under accession numbers KV452433 to KV452453 and LAVI01000001 to LAVI01000543, respectively. The scaffold sequence of C. versatilis JCM5958 is available in DDBJ/ENA/GenBank under accession numbers BCJV01000001 to BCJV01000019. The scaffold sequences of hybrid Z. rouxii NBRC1876 (DF983528 to DF983589) and hybrid Z. rouxii NBRC110957 (BDGX01000001 to BDGX01000132) and the complete genome sequence of haploid Z. rouxii CBS732T (CU928173 to CU928176, CU928178 to CU928179, and CU928181) were downloaded from NCBI.
FOOTNOTES
- Received 22 August 2017.
- Accepted 17 October 2017.
- Accepted manuscript posted online 27 October 2017.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.01845-17.
- Copyright © 2017 American Society for Microbiology.