Characterization of a Novel Integrative Element, ICESt1, in the Lactic Acid Bacterium Streptococcus thermophilus

ABSTRACT The 35.5-kb ICESt1 element of Streptococcus thermophilus CNRZ368 is bordered by a 27-bp repeat and integrated into the 3′ end of a gene encoding a putative fructose-1,6-biphosphate aldolase. This element encodes site-specific integrase and excisionase enzymes related to those of conjugative transposons Tn5276 and Tn5252. The integrase was found to be involved in a site-specific excision of a circular form. ICESt1 also encodes putative conjugative transfer proteins related to those of the conjugative transposon Tn916. Therefore, ICESt1 could be or could be derived from an integrative conjugative element.

Cocultures of various lactic acid bacteria are used during the manufacture of dairy products. Sequence comparisons and hybridizations reveal that horizontal transfers between a large array of species of lactic acid bacteria have occurred, most likely during dairy cocultures (13,32). The most convincing evidence indicates that insertion sequences IS1191, IS981, ISS1, and IS1194 (4,5,14) and some open reading frames (ORFs) involved in exopolysaccharide synthesis (6) or in restriction-modification (24) were transferred between the lactic acid bacteria Streptococcus thermophilus and Lactococcus lactis in cocultures used during cheese manufacture. However, the mechanism of genetic exchange between these two species remains unknown, and no conjugative element has been previously characterized in S. thermophilus.
Cloning of var1C and localization of its limits. The Sm4 fragment of the S. thermophilus CNRZ368 chromosome was previously found to contain the 35-kb variable region var1C, which was absent from the corresponding chromosomal fragments of strains A054 and NST2280 (28). A region containing an IS1191 copy inserted in a truncated IS981 element (14) was cloned and found to be included in var1C (28). Chromosome walking using a GEM11 genomic library of CNRZ368 (25) was performed to isolate recombinant bacteriophages overlapping the var1C region. Their inserts were subcloned in pBC KS+ and used as hybridization probes on A054 and NST2280 DNAs. The S35, ES27, I132.3, ES13, and SC02 fragments hybridized to A054 and NST2280 DNAs. In contrast, none of the probes covering the 35.5-kb region located between the HindIII sites HL and HR (Fig. 1), except IS1191 and IS981, hybridized to A054 and NST2280 DNAs (data not shown). Furthermore, CNRZ368, A054, and NST2280 showed identical restriction maps in regions located to the left of the HindIII site HL and to the right of the HindIII site HR (Fig. 1). These data indicated that the var1C limits are located near these HindIII sites. When ES27, which includes the left end, and ES13, which includes the right end, were hybridized to DNAs of the three strains digested with ClaI, EcoO109, EcoRI, PstI, or XbaI, they revealed the same fragment from A054 and NST2280 but two different fragments from CNRZ368. Thus, the flanking regions of var1C in CNRZ368 are adjacent to each other in strains A054 and NST2280 (Fig. 1).
Because A054 and CNRZ368 are very closely related to each other, but distantly related to NST2280 (28), the absence of var1C in A054 and NST2280 probably results from an insertion in CNRZ368 rather than from two independent identical deletions in the two other strains.
var1C is bordered by a direct repeat and encodes an integrative system. Sequencing of the var1C limits revealed that the element is bordered by a 27-bp direct repeated sequence (R1) containing a HindIII site (Fig. 2). A 362-bp fragment was obtained by PCR performed on DNA of S. thermophilus A054 with the convergent primers O132.3 (GGACTACTAAGAGAACAT) and O131.2 (TGTTGCTGAATACGAAGC) (Fig. 3). The sequence of this fragment revealed a unique R1 copy identical to those found on either side of var1C in CNRZ368 (Fig. 2). Sequence comparison indicates that the R1 direct repeats of CNRZ368 correspond to the boundaries of var1C (Fig. 2).
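What "convergent primers" means for the expected product can be illustrated with a small in-silico PCR. This is only a sketch: the two primer sequences are those quoted above, but the template is a hypothetical filler sequence (not the A054 chromosome), the assignment of O132.3 to the top strand is an assumption, and only exact single-site matches are considered.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template: str, fwd: str, rev: str):
    """Return the product of two convergent primers on `template`, or None.

    `fwd` must match the top strand; `rev` anneals to the top strand, so
    its reverse complement must occur downstream of `fwd` on that strand.
    Only exact matches are considered (a deliberate simplification).
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    end = template.find(revcomp(rev), start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev)]


# Primer sequences quoted in the text (both written 5' to 3').
O132_3 = "GGACTACTAAGAGAACAT"
O131_2 = "TGTTGCTGAATACGAAGC"

# Hypothetical template: filler DNA between the two primer-binding sites.
# It is NOT the real sequence of strain A054.
toy_template = "TTT" + O132_3 + "ACGT" * 10 + revcomp(O131_2) + "AAA"

product = amplicon(toy_template, O132_3, O131_2)
print(len(product))  # 18 + 40 + 18 = 76 for this toy template
```

On a template in which the two binding sites are separated by a large insertion, the same function would still return a product, whereas a real PCR would fail or give a much larger band; product size is therefore the informative readout.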
Two ORFs, int and xis, are located within var1C near the right copy of R1 (Fig. 1 and Table 1). The putative protein encoded by int shows significant similarities to site-specific recombinases belonging to the LC3 subgroup of the integrase family (http://members.home.net/domespo/trhome.html). This subgroup includes a large array of integrases of temperate bacteriophages and conjugative transposons of lactic acid bacteria and other gram-positive low-G+C bacteria. The C terminus of Int contains the five amino acids which are perfectly conserved in this family (data not shown) (1,3,11). Furthermore, xis, located to the left of the int gene, encodes a small basic protein (pI 9.88) which shows significant similarities to excisionases of two conjugative transposons, Tn5252 of Streptococcus pneumoniae and Tn5276 of L. lactis (Table 1). int and xis are located at comparable positions in many prophages and conjugative transposons.
Therefore, these ORFs probably encode an integrative system which would mediate excision of var1C by site-specific recombination between the two R1 copies corresponding to the cores of the left and right attachment sites attL and attR. The unique R1 sequence found in A054 would be the attB attachment site used for var1C integration. fda, which flanks the right of var1C (Fig. 1), encodes a putative fructose-1,6-biphosphate aldolase (Table 1). The 3′ end of fda includes 20 bp of the R1 core of attR (Fig. 2). Thus, var1C integration does not change the sequence of fda. Numerous integrative elements (e.g., prophages or integrative conjugative elements) integrate into the 3′ end of genes encoding tRNAs, their sequences remaining unmodified by the integration (8,15,17,23,30,31). Other integrative elements (e.g., most of the conjugative transposons) integrate into several or numerous sites (19,26). Only a few elements site specifically integrate into the 3′ end of protein-encoding genes. The substitution sequence is then generally only similar to the original one (10,18).
An imperfect 14-bp inverted repeat, R2, is located 29 bp to the right of the 3′ end of the int gene and 21 bp to the left of the R1 core of attR (Fig. 1). The potential stem-loop structure (ΔG = −14.8 kcal · mol⁻¹) (33), preceded by a stretch of A's and followed by a stretch of T's, could be used as a rho-independent transcription terminator for both int and fda. A perfect 13-bp inverted repeat, R5 (ΔG = −18.8 kcal · mol⁻¹), preceded by a stretch of A's, is located 2 bp to the left of the core of attL (Fig. 1) and could be used as a transcription termination signal for fda prior to the var1C integration. Therefore, these data suggest that the expression of fda would not be changed after var1C integration.
R3, a perfect 9-bp direct repeat, was found 2 bp downstream from the stop codon of int (Fig. 1). A copy of this 9-bp sequence was also found 148 bp to the right of the R1 core of attL. R6, an imperfect 12-bp inverted repeat, and R4, an imperfect 9-bp inverted repeat, are located 123 and 229 bp to the right of the core of attL, respectively. R2, R3, R4, and R6 could be binding sites for integrase or host-encoded proteins involved in the recombination.
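How repeats such as R5 can be located in a sequence may be sketched as follows. This is a minimal illustration, assuming perfect inverted repeats only and a hypothetical toy sequence; it is not the procedure used in this work, it does not handle imperfect repeats such as R2, R4, or R6, and it does not estimate ΔG.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_inverted_repeats(seq: str, arm: int, min_loop: int = 3, max_loop: int = 8):
    """List perfect inverted repeats with arms of exactly `arm` bases.

    Returns tuples (start, loop_length, left_arm), where `start` is the
    0-based position of the left arm and the right arm is the reverse
    complement of the left arm, `loop_length` bases further downstream.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm - min_loop + 1):
        left = seq[i:i + arm]
        target = revcomp(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + arm + loop
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, loop, left))
    return hits


# Hypothetical sequence (not from ICESt1): a 7-bp stem around a 4-base loop.
toy = "CCCC" + "GCCGGTA" + "TTTT" + revcomp("GCCGGTA") + "CCCC"
print(find_inverted_repeats(toy, arm=7))  # [(4, 4, 'GCCGGTA')]
```

Direct repeats such as R3 could be searched for in the same way by comparing the left arm with an identical downstream copy instead of its reverse complement.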
Detection of site-specific recombination products. A nested PCR was performed to amplify the putative junction between the var1C termini, which could result from a site-specific recombination event between the R1 cores of attL and attR. Nested-PCR amplification was performed with the O132.5 (GATGAAATTCACATCATC)-O131.5 (CAGGAATCGATATTGACA) outer primer pair and the O132.4 (AGTTGAAACTAGACTCAG)-O131.1 (TTCCGACATACGCATATC) inner primer pair (Fig. 3A) according to the method described by Manganelli et al. (21). As expected, no product was identified in strain A054 (Fig. 3B), which does not contain var1C. The sequence of the 536-bp PCR product obtained in CNRZ368 (attI, Fig. 2) is identical to the expected sequence resulting from site-specific recombination between the R1 cores of attL and attR. The PCR product was digoxigenin labelled and hybridized to EcoRI-digested A054 and CNRZ368 chromosomal DNA. As expected, this probe hybridizes with the two fragments containing the var1C termini in CNRZ368, but not with A054 DNA (data not shown). Site-specific excision of var1C in CNRZ368 should also lead to a junction between sequences flanking var1C, identical to that observed in A054. PCR amplification using the O132.3-O131.2 primer pair (Fig. 3A) was performed to detect this junction. The PCR products obtained for A054 and CNRZ368 have the same size (Fig. 3B) and restriction map (data not shown).
Detection of these two junction fragments implies the excision, in CNRZ368, of a covalent circular molecule in some cells of the population. The R1 sequences found in the chromosome of A054, in the circular form of var1C, and in the ends of integrated var1C probably constitute the core of the attB, attI, attL, and attR attachment sites: the strand exchange reaction probably takes place by crossover events similar to those involved in integration and excision. The length of the core of attachment sites suggests that this element would show very strong insertional site specificity.
FIG. 2 (legend, partially recovered). attB corresponds to the partial sequence of a PCR product obtained from strain A054 with the primers O132.3 and O131.2. attI corresponds to the partial sequence of a nested-PCR product obtained from strain CNRZ368 with the primer pairs O132.5-O131.5 and O132.4-O131.1 (Fig. 3). R1 sequences are written in capital letters. The italic letters correspond to the internal sequence of var1C. Underlined letters indicate the bases that are complementary to the 3′ end of the fda gene encoding fructose-1,6-biphosphate aldolase. Sequences underlined twice correspond to the HindIII restriction sites included in R1.
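The relationship between attL, attR, attB, and attI inferred from the two junction fragments can be made concrete with a toy model of excision by recombination between two direct copies of a core sequence. Everything in this sketch is hypothetical: the 8-bp core (chosen to contain a HindIII site, AAGCTT, like R1) stands in for the real 27-bp R1, and the flanking and internal sequences are arbitrary.

```python
def excise(chromosome: str, core: str):
    """Model excision by recombination between two direct copies of `core`.

    Returns (empty_site, circle): the chromosome left with a single core
    copy (attB-like) and the excised circular molecule, written linearly
    starting at its single core copy (attI-like).
    """
    left = chromosome.find(core)
    right = chromosome.find(core, left + len(core)) if left != -1 else -1
    if left == -1 or right == -1:
        raise ValueError("two direct copies of the core are required")
    empty_site = chromosome[:left] + chromosome[right:]
    circle = chromosome[left:right]
    return empty_site, circle


# Hypothetical sequences, for illustration only.
CORE = "AAGCTTGG"              # stand-in for R1
FLANK_L, FLANK_R = "CCCCCC", "TTTTTT"
INTERNAL = "GATTACAGATTACA"    # stand-in for the element's interior

integrated = FLANK_L + CORE + INTERNAL + CORE + FLANK_R  # attL ... attR
empty_site, circle = excise(integrated, CORE)

# Junction read across the point where the circle closes: right end of the
# element, the single core copy, then the left end of the element.
doubled = circle + circle
junction = doubled[len(circle) - 4:len(circle) + len(CORE) + 4]

print(empty_site)  # CCCCCCAAGCTTGGTTTTTT
print(junction)    # TACAAAGCTTGGGATT
```

In this model, primers reading outward from the two ends of the element converge only on the circular form, while primers in the two flanks give a short product only from the empty site, which mirrors the logic of the two PCR assays described above.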
Disruption of the int gene prevents var1C excision. The ORF int was disrupted in order to test its involvement in var1C excision. The thermosensitive plasmid pNST152 was constructed by subcloning the 754-bp HindIII fragment of pNST131.1 containing a fragment of int (region encoding residues 137 to 383 of the integrase) into pG+Host9 (20). pNST152 was used to transform S. thermophilus CNRZ368 by electroporation according to the method of Marciset and Mollet (22). Integration of pNST152 into the int gene was promoted by homologous recombination at a nonpermissive temperature (42°C). The integration site and the number of integrated copies were verified by hybridization of probe I131.1 to PstI patterns of integrants (data not shown). The recombinant strain NST1008 contains two truncated copies of int resulting from the integration of a unique copy of pNST152 within the int gene of CNRZ368. Junction fragments containing attB or attI were not detected in NST1008 by PCR experiments (Fig. 3B), whereas a fragment bearing attR was amplified from NST1008 by using the O131.1 and O131.2 primers (Fig. 3). Thus, int gene disruption causes the disappearance of the two junction fragments and, consequently, of the covalent circular molecule, showing that this gene is actually involved in var1C excision.
var1C encodes proteins related to those of some conjugative systems. The 5,881-bp region located to the left of the xis ORF start codon was sequenced. Four ORFs have been identified by GeneMark (http://genemark.biology.gatech.edu/GeneMark/) and/or by comparison of the putative translation products with proteins from the EMBL/GenBank databases by using BLASTX and BLASTP (2) (Fig. 1 and Table 1). All of these ORFs are preceded by a suitably located ribosome binding site (RBS) (27), have the same orientation as xis and int, and are separated by very short sequences (Table 1). Therefore, orfDCBA, xis, and int could be translated from a unique transcript.
The orfA and orfD products share significant sequence similarities with proteins involved in conjugative transfer of plasmids from Staphylococcus aureus and Tn916 from Enterococcus faecalis (Table 1). orfC encodes a putative protein weakly related to the translational product of orf15 of the conjugative transposon Tn916. Topology predictions using the HMMTOP server (http://www.enzim.hu/hmmtop/) indicate that the proteins encoded by these two ORFs would be transmembrane proteins with similar three-dimensional structures, suggesting that they are actually related. Thus, this region of var1C could encode conjugative functions. Various recently identified elements excise by forming a circular intermediate, promote self-transfer by conjugation into the recipient cell, and integrate by recombination between the specific site of the circular molecule and another site (17,26,29,31). Therefore, the entire var1C sequence could be or could be derived from a site-specific integrative conjugative element. This possible conjugative element, which would be the first isolated in S. thermophilus, was named ICESt1, for integrative conjugative element of S. thermophilus no. 1. The possible conjugative system of ICESt1 is related to that of Tn916, but not to the system encoded by Tn5252. In contrast, the ICESt1 excisionase is related only to those of Tn5276 and Tn5252. Moreover, the integrases of ICESt1, Tn5276, and Tn5252 belong to the LC3 integrase subfamily, whereas the integrase of Tn916 belongs to another subfamily (http://members.home.net/domespo/trhome.html). Furthermore, differences in G+C content between the xis and int genes (about 34%) and orfABCD (about 42%) of ICESt1 also suggest that the integration-excision system and the possible conjugative system have different origins or have undergone very different evolutions. A similar structure is observed in Tn916 (about 36% G+C for the xis and int genes versus about 40% G+C for the conjugative system).
This suggests that ICESt1 and Tn916 possess a modular structure which results from exchanges or acquisitions of sequences from different sources. This modular structure and evolution are similar to those of bacteriophages (9,16) and enterobacterial plasmids (7). The large size of ICESt1 (35 kb) suggests that this element, like Tn5276, which encodes nisin synthesis (26), could carry industrially attractive genes. The ICESt1 element contains a complete copy of IS1191, an insertion sequence probably transferred from S. thermophilus to L. lactis, and a truncated copy of IS981, which was probably transferred from L. lactis to S. thermophilus, most likely in cocultures of these species used during the manufacture of cheese (14). Furthermore, conjugative transposons related to ICESt1, like Tn916 of Enterococcus faecalis and Tn5252 of S. pneumoniae, are broad-host-range elements (12,34). Therefore, ICESt1 or elements related to ICESt1 could be involved not only in intraspecific but also in interspecific horizontal transfers between S. thermophilus and other lactic acid bacteria.
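The G+C comparison used above as an argument for a modular structure rests on a simple calculation, sketched here. The function is generic; the two input strings are hypothetical placeholders and are not the ICESt1 xis-int or orfABCD sequences, which would have to be taken from the deposited entries.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of `seq` as a percentage of its A/C/G/T bases.

    Characters other than A, C, G, and T (e.g. N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / total


# Hypothetical placeholder sequences, for illustration only.
module_1 = "AATTGC" * 5   # 2 of every 6 bases are G or C
module_2 = "ACGTA" * 4    # 2 of every 5 bases are G or C

print(round(gc_content(module_1), 1))  # 33.3
print(round(gc_content(module_2), 1))  # 40.0
```

Applied to real modules, a difference of several percentage points, like the one reported here, is suggestive rather than conclusive, since G+C content also varies among genes within a single genome.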
Nucleotide sequence accession numbers. The GenBank accession numbers of the nucleotide sequences reported in this paper are AJ243105 (left terminus of var1C) and AJ243106 (right terminus of var1C).
We thank E. Maguin for providing the thermosensitive plasmid pG+Host9.
This work was supported by grants from the Institut National de la Recherche Agronomique, the University of Nancy 1, and the Ministère de l'Education Nationale, de la Recherche et de la Technologie, France.