A Type I Restriction-Modification System Associated with Enterococcus faecium Subspecies Separation

Enterococcus faecium is a leading cause of hospital-acquired infections around the world. Rising antibiotic resistance in certain E. faecium lineages leaves fewer treatment options. The overarching aim of this work was to determine whether restriction-modification (R-M) systems contribute to the structure of the E. faecium species, wherein hospital-epidemic and non-hospital-epidemic isolates have distinct evolutionary histories and highly resolved clade structures. R-M provides bacteria with a type of innate immunity to horizontal gene transfer (HGT). We identified a type I R-M system that is enriched in the hospital-epidemic clade and determined that it is active for DNA modification activity and significantly impacts HGT. Overall, this work is important because it provides a mechanism for the observed clade structure of E. faecium as well as a mechanism for facilitated gene exchange among hospital-epidemic E. faecium isolates.

multidrug-resistant (MDR) E. faecium strains belong to clade A, while clade B consists of drug-susceptible fecal commensal E. faecium strains (8). Clade A is further split into two subclades, clades A1 and A2, with hospital-endemic strains generally clustering in clade A1 and sporadic-infection isolates and animal isolates generally clustering in clade A2 (8). Specific phenotypes and genomic features are enriched in clade A1 isolates relative to clade A2 and B isolates (8). Specifically, clade A1 isolates have significantly higher mutation rates and larger overall genome sizes, including a larger core genome, and possess more mobile elements. On the other hand, clade A2 possesses a larger pangenome than clades A1 and B, possibly reflective of the broader host origins of these strains. Given that clade A and clade B strains would be expected to comingle in certain environments (for example, in a hospital and municipal sewage), the phylogenetic separation among the E. faecium clades suggests that they are not sharing genetic information freely because of endogenous barriers to genetic exchange.
Horizontal gene transfer (HGT) is the exchange of genetic material between cells rather than the vertical inheritance of genetic material from a parental cell. Bacteria can encode genome defense mechanisms that can act in opposition to HGT. Two examples of these mechanisms are clustered regularly interspaced short palindromic repeat (CRISPR) and associated protein (CRISPR-Cas) systems and restriction-modification (R-M) systems. CRISPR-Cas is a dynamic immune system that utilizes sequence complementarity between self (CRISPR RNAs) and foreign nucleic acid to carry out its restrictive function, whereas R-M discriminates self from foreign DNA by DNA methylation patterns. If the E. faecium clades encode different defense mechanisms, they may not exchange genetic information freely, thereby facilitating and maintaining phylogenetic separation. However, little is known about CRISPR-Cas and R-M in E. faecium. Genomic analysis suggests that these systems could contribute to the observed clade structure of E. faecium. For example, CRISPR-Cas systems have been identified exclusively in clade B E. faecium strains and in sporadic clade A-clade B recombinant strains (8). For R-M, a predicted methyl-directed restriction endonuclease (REase) is enriched in clade A2 and B E. faecium genomes relative to clade A1 genomes (8).
Here, we focused on R-M systems and their roles in regulating gene exchange in E. faecium because little is known about R-M defense in this species. Moreover, there is precedent in the literature for R-M systems contributing to bacterial clade structure, as has been observed in Burkholderia (9) and Neisseria (10). Our overarching hypothesis is that the E. faecium clades encode different R-M systems, thereby inhibiting genetic exchange between them. In general, R-M systems are composed of cognate methyltransferase (MTase) and REase activities and are classified into different types based on the specific number and types of enzymes in the system as well as characteristics such as methylation type and pattern, cofactor requirement, and restriction activity (11). A MTase recognizes specific sequences in the bacterial genome and transfers a methyl group to either an adenine or a cytosine residue, resulting in 6-methyladenine (m6A), 4-methylcytosine (m4C), or 5-methylcytosine (m5C). A REase may recognize the same sequence as a MTase and cleave that region if the sequence is unmethylated (or, in some cases, if it is methylated). With the activities of MTases and REases, bacteria can use R-M to impede the entry of nonself DNA.
In this study, we used single-molecule real-time (SMRT) sequencing and wholegenome bisulfite sequencing to characterize the methylomes of representative E. faecium strains from clade A1 (E. faecium 1,231,502 or Efm502) and clade B (E. faecium 1,141,733 or Efm733). Two unique m6A methylation patterns were identified, one in each strain. These patterns were asymmetric and bipartite, which is characteristic of type I R-M methylation motifs (12). Bioinformatic analyses were performed to identify candidate genes responsible for methylation. A unique type I R-M system is encoded by each strain. We have named these systems Efa502I (for Efm502) and Efa733I (for Efm733). Expression of these candidate systems in Enterococcus faecalis heterologous hosts followed by SMRT sequencing confirmed that they are responsible for the methylation patterns observed in Efm502 and Efm733. A functional analysis was performed in order to assess the abilities of these systems to reduce E. faecium HGT by transformation. In a comparative analysis among 73 E. faecium genomes, we found that Efa502I is significantly enriched among clade A1 isolates, while the type I R-M system of Efm733 appears to be strain specific. Overall, this study is a first step toward understanding the role of R-M in regulating HGT in E. faecium and the potential for R-M as one mechanism for the clade structure of E. faecium.

RESULTS
Identification of a clade A1-specific putative type I R-M system in E. faecium. We previously reported that a type II R-M system significantly reduces HGT via conjugation (13) and transformation (14) in E. faecalis. Here, we hypothesize that the E. faecium clades encode distinct R-M systems that reduce the exchange of genetic information between them. We utilized an approach that we previously developed for E. faecalis R-M analysis (13) to predict potential R-M systems in eight previously sequenced E. faecium genomes. The 8 genomes included 3 genomes from clade A1, 3 genomes from clade B, 1 genome from clade A2, and 1 recombinant clade A1-clade B hybrid (5,8). Because REases are difficult to identify with bioinformatics, and MTase prediction is comparatively straightforward, as was previously reported by New England Biolabs (NEB) (15), we first identified predicted DNA MTases in E. faecium genomes and then analyzed surrounding genes for predicted R-M-related activities. The complete list of candidates for the eight strains is shown in Table 1.
Interestingly, we predicted at least one putative type I R-M system for seven of the eight E. faecium strains (Table 1). Type I R-M systems are multisubunit complexes comprised of a specificity (S) subunit, a methylation (M) subunit, and a restriction (R) subunit (11,(16)(17)(18). The S subunit is responsible for the specific DNA recognition motif and associates with the DNA to bring the M and R subunits into contact. The system has two conformations: M 2 S 1 , which is capable of methylating DNA based on the recognition sequence, and R 2 M 2 S 1 , which is capable of restricting DNA (18,19). One predicted E. faecium type I R-M system is comprised of highly conserved (Ͼ90% amino acid sequence identity) M and R subunits in six of eight genomes across both clade A1 and clade B (Table 1; see also Data Set S1 in the supplemental material). The specificity TABLE 1 Distribution of predicted DNA MTases and R-M systems a a Loci that are in black are strain specific. Loci that are the same color are conserved in their protein sequences based on a Ͼ90% sequence identity threshold. An empty cell indicates that the system was not detected. subunit from this system, however, is highly conserved in clade A1 genomes but not in clade B (Table 1, Fig. 1, and Fig. S1). S subunits possess two target recognition domains (TRDs) that determine the nucleotide sequence that the subunit binds to (20,21). The variation in amino acid sequence between the S subunits occurs within these TRDs (Fig.  S1), suggesting that these S subunits recognize different DNA sequences. Notably, the S subunits from clade A1 strains are identical to each other, indicating that they utilize the same recognition sequence.
To examine the distribution of this putative clade A1-specific system in a larger collection of E. faecium strains, we analyzed 73 E. faecium genomes of mostly draft status that were reported previously (8). This list includes 15 clade B isolates, 21 clade A1 isolates, 35 clade A2 isolates, and 2 hybrid isolates (Table S1). We selected Efm502 (clade A1) as our representative clade A1 strain for this analysis and used type I R-M sequences from this genome as references for analysis against the broader collection of E. faecium strains. The M and R subunits of the putative clade A1-specific type I system were detected in 51 and 52, respectively, of 73 E. faecium genomes, including both clade A and clade B strains ( Fig. S2a and b). However, the distribution of the S subunits varied ( Fig. S2a and b). The S subunit present in Efm502, EFQG_01131, was significantly enriched within clade A1 isolates (14/21; P value of Ͻ0.0001 using a Fisher exact test) (Fig. S2a) and absent from all other clades with the exception of strain EnGen002, which is classified as a clade A1-clade B hybrid strain. Interestingly, the S subunits present in most other E. faecium strains are strain specific by the strict thresholds applied here. Given that the Efm502 S subunit is enriched in clade A1, we hypothesize that many clade A1 strains exchange genetic information freely with each other, while exchange with other E. faecium strains is restricted.
SMRT sequencing for E. faecium methylome analysis. We analyzed the Efm502 and Efm733 genomes by SMRT sequencing. SMRT sequencing measures the kinetics of DNA polymerase as it synthesizes DNA in order to identify bases that have been modified (22)(23)(24). It has been extensively utilized for bacterial methylome analysis representative E. faecium genomes were pairwise aligned using MacVector. The percent identity of each pair is shown. White to red, low to high percent identities. Each number represents the percent identity of one protein sequence (row name) to another (column name). (25)(26)(27)(28)(29)(30)(31)(32)(33). With SMRT sequencing, 6-methyladenine (m6A) and 4-methylcytosine (m4C) can be easily detected with modest sequence coverage (ϳ25ϫ per strand), while 5-methylcytosine (m5C) detection requires high coverage (ϳ250ϫ per strand) (32,34). Using SMRT sequencing, we identified two unique m6A methylation motifs in Efm502 and Efm733. Efm502 possessed m6A methylation at the underlined position of the motif 5=-RAYCNNNNNNTTRG-3= (and its complementary strand 5=-CYAANNNNNNGRT Y-3=), and Efm733 possessed m6A methylation at the underlined position of the motif 5=-AGAWNNNNATTA-3= (and its complementary strand 5=-TAATNNNNWTCT-3=) ( Table 2 and Data Set S2). These sequences are asymmetric and bipartite, which is characteristic of type I R-M methylation (12). Due to the coverage of our SMRT sequencing, m5C modification could not be accurately detected. The two unique m6A methylation patterns indicate that DNA from one strain would be recognized as foreign should it cross the strain barrier.
Expression in heterologous hosts links methylation activity to genes in Efm502 and Efm733. According to our predictions (Table 1), there is only one complete type I R-M system encoded by each of the strains Efm502 and Efm733. To determine if these systems are responsible for the methylation patterns identified by SMRT sequencing, we expressed the respective S and M subunits (EFQG_01131-EFQG_01132 [EFQG_ 01131-01132] for Efm502 and EFSG_05028-05027 for Efm733) in the heterologous host E. faecalis OG1RF or its spectinomycin/streptomycin-resistant relative OG1SSp. Previous work in our laboratory had characterized the methylome of OG1RF using SMRT and bisulfite sequencing (14). This allowed us to attribute any new methylation patterns observed during SMRT sequencing to the E. faecium genes that were expressed in the OG1RF background. SMRT sequencing of these strains detected the same methylation patterns originally identified in Efm502 and Efm733 (Table 2 and Data Set S2). These data demonstrate that EFQG_01131-01132 is responsible for 5=-RAYCNNNNNNTTRG-3= methylation in Efm502 and that EFSG_05028-05027 is responsible for 5=-AGAWNNNN ATTA-3= methylation in Efm733. Because we have confirmed the function of these genes, we have named them Efa502I and Efa733I, which is consistent with the R-M system nomenclature convention established by New England Biolabs (12).
Efa502I and Efa733I reduce transformation efficiency in E. faecium. To determine whether the type I R-M systems in Efm733 and Efm502 actively defend against exogenous DNA, we constructed null strains (Efm733ΔRM and Efm502ΔRM) ( Table 3) and evaluated their transformation efficiencies relative to their wild-type parent strains. Here, we utilized the broad-host-range plasmid pAT28 (Table 3) (35). The pAT28 sequence has motifs recognized by the type I R-M systems in both wild-type Efm733 (1 occurrence) and Efm502 (1 occurrence). The transformation of pAT28 into Efm733 and Efm502 served as a baseline for the experiment. If Efa733I and Efa502I are active, we expect to see higher efficiencies of pAT28 transformation into Efm733ΔRM and Efm502ΔRM, respectively. Indeed, we observed significantly higher efficiencies of transformation into type I R-M system-null strains (P value of Ͻ0.05 using one-tailed Student's t test) (Fig. 2). These data demonstrate that the type I R-M systems in Efm733 and Efm502 actively function as mechanisms of genome defense. m5C methylation occurs in Efm733. As described above, our SMRT sequencing had insufficient coverage depth for m5C methylome characterization. Hence, we used REase protection assays with commercially available methylation-sensitive REases to query the presence of m5C methylation in our eight E. faecium strains. Table 4 summarizes the recognition sequences and modifications of the enzymes used in this study. Only Efm733 showed evidence of cytosine modification, as it was digested by MspJI (Table 4 and Fig. S3).
To determine the exact cytosine methylation motif present in Efm733, genomic DNA (gDNA) was subjected to whole-genome bisulfite sequencing. Whole-genomeamplified DNA was used as a negative control since whole-genome amplification (WGA) removes all modifications. During bisulfite treatment, cytosine bases are converted to thymine unless they are protected by either m4C or m5C methylation. Additionally, our laboratory previously reported a method of distinguishing between m4C and m5C methylation using thymine conversion ratios of sequencing reads after bisulfite treatment (36). m5C methylation is sufficient to protect the cytosine residue completely from bisulfite conversion so that most sequencing reads at that position contain the original cytosine base. However, m4C methylation provides only partial protection from bisulfite conversion, so a thymine conversion rate of 0.5 at a particular position within the sequencing reads suggests the presence of m4C methylation. Bisulfite conversion and subsequent sequencing revealed that Efm733 possesses an m5C modification at the motif 5=-R m CCGGY-3= (Fig. 3 and Fig. S4); methylation occurs at the underlined position (position 2), which overlaps the MspJI recognition site (5=-CNNR-3=) and hence supports the evidence of methylation obtained from the MspJI digestion assays. To summarize this analysis, of the 780 5=-RCCGGY-3= motifs in the Efm733 genome, 750 passed the coverage filter applied, and 748 of these were detected as being modified (the number of motifs detected is defined as the number of motifs with a coverage depth of more than 10ϫ and a methylation ratio of 0.35). The average methylation ratio (defined as the mean of the methylation ratios from all motifs in the genome) was 96.5% (standard deviation, 6.2%), and the mean motif coverage was 61.5ϫ. Based on the cytosine conversion ratio of close to 1.0, m5C modification is supported, which is consistent with why it was not detected by our SMRT sequencing. EFSG_00659 is responsible for m5C methylation in Efm733. Based on the bioinformatics analyses, we hypothesized that EFSG_00659 was responsible for the m5C methylation found in Efm733 (Table 1). EFSG_00659 possessed no homologs in the other seven strains analyzed, making it a good candidate for the unique methylation found in Efm733. We queried the EFSG_00659 protein sequence against the REBASE gold-standard list and identified that it has high sequence similarity to M.AvaIX, M.VchO395I, and M.VchAI (E value of Յ3e Ϫ125 ; recognition sites are 5=-RCCGGY-3=). Interestingly, BLASTP identified no significant hits when EFSG_00659 was queried against the larger collection of 73 E. faecium genomes, indicating its unique presence in Efm733.  In order to link EFSG_00659 with the m5C methylation identified during bisulfite sequencing, we expressed it in the heterologous host Escherichia coli STBL4 and performed a REase protection assay. The REase AgeI recognizes the motif 5=-ACCGGT-3=, and its enzymatic activity is blocked if m5C methylation is present at the underlined position. This motif overlaps the m5C methylation motif in Efm733 identified by bisulfite sequencing. If the motif is methylated, DNA will be protected against digestion. We cloned EFSG_00659 into the vector pGEM-T and transformed it into E. coli STBL4, generating strain E. coli STBL4(pRB01). Genomic DNAs from Efm733, STBL4 (pGEM-T), and STBL4(pRB01) were treated with AgeI according to the manufacturer's instructions. EcoRI was used as a positive control for digestion. Figure 4 shows representative results of the digestions on a 1% agarose gel. As expected, Efm733 was digested by EcoRI and protected against digestion from AgeI. STBL4(pGEM-T) was digested by both EcoRI and AgeI, indicating that the original host and the empty vector  pGEM-T did not possess the appropriate m5C methylation. STBL4(pRB01) was protected against digestion by AgeI, demonstrating that EFSG_00659 is responsible for the 5=-R m CCGGY-3= methylation found in Efm733.

DISCUSSION
In this study, we used a combination of genomic and genetic approaches to identify a functional type I R-M system that is enriched in clade A1 E. faecium and that significantly alters the transformability of a model clade A1 strain. We propose that this R-M system impacts HGT rates among E. faecium mixed-clade communities, thereby helping to maintain the observed phylogenetic structure of E. faecium and facilitating HGT specifically among clade A1 strains. Mixed communities of E. faecium clades are expected to occur in environments where healthy and ill human hosts, human and animal hosts, and/or the feces of any of these hosts comingle (i.e., in sewage). In future studies, we plan to assess the impact of R-M on conjugative plasmid transfer, which is a major mode of HGT in enterococci and was not assessed in the present study.
An interesting observation from our study is the sequence diversity of type I S subunits encoded within E. faecium type I R-M systems having nearly identical R and M subunits (see Fig. S2a and b in the supplemental material). After further investigation into those alignments, we found that those S subunits sharing 50% to 70% overall amino acid sequence identities possess sequence diversity within one TRD, where the other TRD and the central conserved domain are conserved. This suggests that these systems share partial recognition sequences. Previous research showed that the diversification of type I R-M recognition sequences is driven by TRD exchanges, permutation of the dimerization domain, and circular permutation of TRDs (37). Our observation suggests that TRD recombination and reorganization events occur for E. faecium type I R-M systems outside clade A1. Future studies will use genomics to further explore the relationship between S subunit sequence diversity and its impact on E. faecium methylomes and interstrain and interclade HGT.

MATERIALS AND METHODS
Bacterial strains and growth conditions. The strains used in this study are shown in Table 3. All enterococci were grown in brain heart infusion (BHI) broth or agar at 37°C, unless otherwise stated. Escherichia coli strains were grown in Luria broth (LB) at 37°C and with shaking at 225 rpm unless otherwise stated. Antibiotic concentrations for enterococcal strains were as follows: 50 g/ml for rifampin, 25 g/ml for fusidic acid, 500 g/ml for spectinomycin, 500 g/ml for streptomycin, and 15 g/ml for chloramphenicol. Antibiotic concentrations for E. coli strains were as follows: 15 g/ml for chloramphenicol and 100 g/ml for ampicillin. All REases were purchased from New England Biolabs (NEB) and used according to the manufacturer's instructions. PCR was performed using Taq polymerase (NEB) or Phusion (Fisher). Sanger sequencing to validate all genetic constructs was performed at the Massachusetts General Hospital DNA Core facility (Boston, MA).
Isolation of genomic DNA. Enterococcal strains were cultured overnight in BHI broth prior to genomic DNA (gDNA) extraction. Extraction was performed using a Qiagen blood and tissue DNeasy kit using a previously reported protocol (38). To isolate E. coli gDNA, bacteria were grown overnight in LB prior to extraction using either the blood and tissue DNeasy kit (Qiagen) or the UltraClean microbial DNA isolation kit (Qiagen) according to the manufacturer's instructions. Whole-genome amplification (WGA) control DNA was generated by amplification of native gDNA using the REPLI-g kit (Qiagen) according to the manufacturer's instructions.
SMRT sequencing and methylome detection in E. faecium. SMRT sequencing of E. faecium gDNAs and their WGA controls was performed by the Johns Hopkins Medical Institute Deep Sequencing and Microarray Core. After sequencing, the reads were aligned to existing references (NCBI RefSeq accession numbers NZ_GG688486 to NZ_GG688546 for Efm502 and NZ_GG688461 to NZ_GG688485 for Efm733) and analyzed using the RS modification and motif detection protocol in SMRT portal v1.3.3. WGA controls were used as methylation baselines.
Bioinformatic analysis of R-M systems in eight E. faecium genomes. The entire protein complement for eight previously sequenced E. faecium isolates (39) was analyzed. To identify potential MTases, the REBASE gold-standard list (15) was used as a reference. This list is comprised of biochemically verified MTases and REases. Each protein sequence from E. faecium genomes was analyzed using BLASTP against the REBASE gold-standard list. The protein sequences with significant (E value of Ͻ1e Ϫ3 ) homology to REBASE gold-standard proteins were further filtered by protein size. If an E. faecium query protein length was less than half of its subject's length, the match was removed from the prediction list. Due to the sequence diversity of REases, which complicates their bioinformatic identification (15), guilt by association was used to identify full R-M systems, as we previously described (14). The proteins encoded near candidate DNA MTases were analyzed using BLAST and Pfam for conserved domains consistent with REase activities and/or sequence identity to confirmed REases. The amino acid sequence of each R-M candidate was then pairwise compared among all the eight strains to identify putative orthologs. If two protein sequences shared an amino acid identity of Ն90% with a query coverage of Ն90%, they were considered to be orthologous.
Expression of R-M systems in E. faecalis heterologous hosts. Genes encoding the specificity and methylation subunits of Efa733I (EFSG_05028-EFSG_05027) were PCR amplified in their entirety, including the upstream region to retain the native promoter, using primers 733_T1A_SM_F and 733_T1A_SM_R (see Table 5 for primer sequences). The PCR product was digested with NotI and ligated into NotIdigested pWH03 (14) using T4 DNA ligase (NEB), generating pHA102. pWH03 is a pLT06 derivative for expression of genes from a previously validated neutral genomic insertion site (EF2238-EF2239) for expression (GISE) (14,40). pHA102 constructs were then introduced into E. coli DH5␣ via heat shock for propagation and sequence confirmation. pHA102 was electroporated into E. faecalis OG1SSp using a previously described method (13). An E. faecalis OG1SSp derivative with a chromosomal integration of Efa733I, referred to as OG1SSp::efa733I, was generated by temperature shifts and p-chlorophenylalanine counterselection, as previously described (41).
Genes encoding the specificity and methylation subunits of Efa502I (EFQG_01131-EFQG_01132) were PCR amplified using primers 502_T1A_SM_F and 502_T1A_SM_R. The PCR product was then TA cloned into the pGEM-T Easy vector (Promega) and introduced into DH5␣ via heat shock to generate pGEM-SMA1. pGEM-SMA1 was then digested with NotI, and the digestion reaction product was used as the insert for ligation into NotI-digested pWH03. The ligation reaction product was then introduced into DH5␣, and colonies were screened for chloramphenicol resistance and ampicillin susceptibility to ensure that the pGEM backbone was not ligated into pWH03. Once the construct, referred to as pHA103, was confirmed via Sanger sequencing, it was introduced into electrocompetent E. faecalis OG1RF cells. An E. faecalis OG1RF derivative with a chromosomal integration of Efa502I, referred to as OG1RF::efa502I, was generated by temperature shifts and p-chlorophenylalanine counterselection. All plasmids and strains for heterologous expression were validated by PCR and Sanger sequencing.
SMRT sequencing of E. faecalis OG1 derivatives expressing Efa502I or Efa733I was performed by the University of Michigan sequencing core facility. Reads were mapped to the E. faecalis OG1RF reference sequence (GenBank accession number NC_017316), and the methylation motifs were detected using the RS modification and motif detection protocol in SMRT portal v.2.3.2. In silico controls were used as modification baselines.
Generation of E. faecium R-M deletion mutants. Regions up-and downstream of Efa502I and Efa733I were PCR amplified using primers listed in Table 5, ligated into pLT06, and transformed into E. coli EC1000 (42), generating pWH16 and pWH17 (Table 3). Insert sequences were confirmed using Sanger sequencing. E. faecium strains were made electrocompetent using a previously reported protocol (43). Two micrograms of sequence-confirmed plasmids was electroporated into electrocompetent Efm733 and Efm502. The generation of deletion mutants was accomplished using temperature shifts and p-chlorophenylalanine counterselection, as previously described (41). The successful deletion mutants were sequence confirmed by PCR and Sanger sequencing.
Transformation efficiency test. Efm733, Efm502, and their respective R-M deletion mutants were made electrocompetent using a modified version of a previously reported protocol (43). Briefly, cultures grown overnight were diluted 10-fold in BHI medium and cultured to an optical density at 600 nm (OD 600 ) of ϳ0.6. The bacteria were then pelleted and treated with filter-sterilized lysozyme buffer (10 mM Tris-HCl [pH 8.0], 10 mM EDTA [pH 8.0], 50 mM NaCl) supplemented with 83 l of a 2.5-kU/ml mutanolysin stock for 30 min at 37°C. The cells were then pelleted and washed three times with ice-cold filter-sterilized electroporation buffer (0.5 M sucrose and 10% glycerol). Finally, the cells were pelleted, resuspended in electroporation buffer, and aliquoted for storage at Ϫ80°C for future use. One microgram of pAT28 (35) was electroporated into the electrocompetent E. faecium cells. The counts of total viable cells and spectinomycin-resistant cells were determined by serial dilution and plating. The transformation efficiency was expressed as a percentage of transformed (spectinomycin-resistant) cells per total viable  502SMR_A1F_XbaI  CTAGTCTAGACCCTTGTTCGATATAGACCC  502SMR_A1R_BamHI  CGCGGATCCCCTAATATGAAAGCAATTATCAAC  502SMR_A2F_BamHI  CGCGGATCCGAGATTATCTGCCGCACTAAAT  502SMR_A2R_XmaI  TCCCCCCGGGGTAGAAACATACACAGCTATAC  733SMR_A1F_XbaI  CTAGTCTAGAGGACGTAAACCTTCGACA  733SMR_A1R_BamHI  CGCGGATCCATGTTAGATGATGAGTTGATTCC  733SMR_A2F_BamHI  CGCGGATCCGCTTACTGTGTCATATTCCTC  733SMR_A2R_XmaI  TCCCCCCGGGCTCTACGAACTTGATTGAATCC  733_T1A_SM_F  TACGATGCGGCCGCGGGAAGAACCATTTACACAA  733_T1A_SM_R  ACATTGGCGGCCGCCGAGCATCATTTATTAACATGTTT  502_T1A_SM_F  TTGCATGCGGCCGCCGGATTTGGAGCCCGACT  502_T1A_SM_R  TACGATGCGGCCGCCCCATAGGTAAAGTGCCAC  EFSG_00659_F CGTTACCTGCAGGCAGCGAAAACTAGTTGGG EFSG_00659_R GAATCAGGATCCGCCTCTAAAGAATAAAGATCC a Underlined sequences are restriction enzyme recognition sites.
cells. Three independent experiments were performed, and statistical significance was assessed using the unpaired one-tailed Student t test. Distribution analysis of putative R-M systems and orphan MTases. The amino acid sequences for select R-M system and orphan MTase candidates were queried against a collection of 73 E. faecium isolates previously analyzed by Lebreton et al. (8) using BLASTP. Any proteins that shared Ͼ90% query coverage and amino acid identity were considered orthologs. Fisher's exact test was used to determine if an orphan MTase or R-M system was significantly over-or underrepresented in a particular clade.
REase protection assays. To identify m5C methylation, gDNA was treated with the methylationsensitive REases McrBC, FspEI, and MspJI (NEB). A total of 500 ng gDNA was incubated with each REase at 37°C for 3 h (McrBC) or 6 h (FspEI and MspJI), followed by analysis by electrophoresis on a 1% agarose gel with ethidium bromide.
Bisulfite sequencing. Whole-genome bisulfite sequencing libraries were constructed using the Illumina TruSeq LT PCR Free kit and the Qiagen EpiTect bisulfite kit. Native DNA was isolated as described above. Whole-genome-amplified control DNA was generated by amplification of native gDNA using the Qiagen REPLI-g kit, according to the manufacturer's instructions. For bisulfite sequencing, briefly, 2 g each of native and WGA control DNA was fragmented using NEB fragmentase. DNA fragments ranging from 200 bp to 700 bp were gel extracted and end repaired. After A tailing of DNA fragments, Illumina TruSeq adapters were ligated. Bisulfite conversion was then performed using the Qiagen EpiTect bisulfite kit, according to the manufacturer's instructions. An 8-cycle PCR enrichment with Illumina primer mix was performed, followed by size selection and gel purification. The libraries were sequenced using Illumina MiSeq with 2-by 75-bp paired-end chemistry.
Whole-genome bisulfite sequencing analysis. The sequencing reads were analyzed using Bismark (44), with additional quality control and filtering as described previously (14). Briefly, the Illumina reads were mapped to the in silico bisulfite-converted references (44). We then quantified the conversion rate of each mapped read by calculating the percentage of converted C bases (which will result in T) to the total number of C bases in the reference within the mapped region. The mapped reads with a Յ80% conversion rate were filtered out from the analysis (14). Next, the coverage depth and methylation ratio were calculated for each C site. The methylation ratio was calculated by dividing the total number of C bases by the coverage depth at each C site. A fully methylated C residue thus protected from bisulfite conversion will have a methylation ratio near 1. An unmethylated C residue will have a methylation ratio near zero. To identify consensus methylation motifs, C sites with a methylation ratio of Ն0.35 and a coverage depth of Ն10ϫ, along with the sequences 5 bp upstream and 5 bp downstream, were extracted. The extracted sequences were subjected to a MEME motif search (45).
Confirmation of m5C MTase activity. Primers EFSG_00659_F and EFSG_00659_R (Table 5) were used to amplify the entire Efm733 EFSG_00659 coding region and its upstream predicted promoter. The PCR product was then cloned into the pGEM-T Easy vector (Promega) according to the manufacturer's instructions and transformed into E. coli STBL4 (Fisher) to generate pRB01. REase digestion assays with methylationsensitive enzymes were performed on purified E. coli and E. faecium gDNAs as described above.
Accession number(s). DNA sequence data generated in this study have been deposited in the Sequence Read Archive under accession numbers PRJNA397049 (for SMRT sequencing data) and PRJNA488088 (for Illumina bisulfite sequencing data).

SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.