Discovery and Biosynthesis of the Antibiotic Bicyclomycin in Distantly Related Bacterial Classes

ABSTRACT Bicyclomycin (BCM) is a clinically promising antibiotic that is biosynthesized by Streptomyces cinnamoneus DSM 41675. BCM is structurally characterized by a core cyclo(L-Ile-L-Leu) 2,5-diketopiperazine (DKP) that is extensively oxidized. Here, we identify the BCM biosynthetic gene cluster, which shows that the core of BCM is biosynthesized by a cyclodipeptide synthase, and the oxidative modifications are introduced by five 2-oxoglutarate-dependent dioxygenases and one cytochrome P450 monooxygenase. The discovery of the gene cluster enabled the identification of BCM pathways encoded by the genomes of hundreds of Pseudomonas aeruginosa isolates distributed globally, and heterologous expression of the pathway from P. aeruginosa SCV20265 demonstrated that the product is chemically identical to BCM produced by S. cinnamoneus. Overall, putative BCM gene clusters have been found in at least seven genera spanning Actinobacteria and Proteobacteria (Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria). This represents a rare example of horizontal gene transfer of an intact biosynthetic gene cluster across such distantly related bacteria, and we show that these gene clusters are almost always associated with mobile genetic elements. IMPORTANCE Bicyclomycin is the only natural product antibiotic that selectively inhibits the transcription termination factor Rho. This mechanism of action, combined with its proven biological safety and its activity against clinically relevant Gram-negative bacterial pathogens, makes it a very promising antibiotic candidate. Here, we report the identification of the bicyclomycin biosynthetic gene cluster in the known bicyclomycin-producing organism Streptomyces cinnamoneus, which will enable the engineered production of new bicyclomycin derivatives. 
The identification of this gene cluster also led to the discovery of hundreds of bicyclomycin pathways encoded in highly diverse bacteria, including in the opportunistic pathogen Pseudomonas aeruginosa. This wide distribution of a complex biosynthetic pathway is very unusual and provides an insight into how a pathway for an antibiotic can be transferred between diverse bacteria.

essential protein in many bacteria (7,8) and has been used to treat traveler's diarrhea (9), as well as in veterinary medicine to treat calves, pigs, and fish (7).
BCM is the only natural product known to target Rho. This, together with its proven safety in mammals and its activity against clinically relevant Gram-negative bacterial pathogens such as Acinetobacter baumannii and Klebsiella pneumoniae, makes it a very attractive antibiotic (7,10). This promise is strengthened by the recent discovery that combining BCM with bacteriostatic concentrations of antibiotics that target protein synthesis results in rapid bactericidal synergy (10). Furthermore, structure-activity relationship studies show that BCM potency can be improved by modifying its exomethylene group (11,12).
In contrast with the extensive knowledge of BCM's mechanism of action (6, 7), very little was known about the biosynthesis of this antibiotic. Feeding experiments previously showed that the DKP scaffold derives from L-leucine and L-isoleucine and suggested that a cytochrome P450 monooxygenase is involved in one of the oxidative steps that convert cyclo(L-Ile-L-Leu) (cIL) into BCM (13) (Fig. 1A). To understand BCM biosynthesis, we identified the BCM biosynthetic gene cluster in S. cinnamoneus DSM 41675, which showed that the DKP core is produced by a cyclodipeptide synthase (CDPS) (Fig. 1B). This discovery enabled the identification of homologous clusters in several other species, including hundreds of isolates of Pseudomonas aeruginosa, an opportunistic pathogen that causes serious hospital-acquired infections. We show that the P. aeruginosa bcm gene cluster is functional and that its product is identical to BCM from Streptomyces; this pathway therefore represents a viable alternative platform for BCM production. This is a rare example of an almost identical biosynthetic gene cluster in Gram-negative and Gram-positive bacteria. An analysis of the phylogeny and genomic context of bcm gene clusters provides insight into their likely dispersal through horizontal gene transfer (HGT) and suggests that the bcm gene cluster may have undergone partial genetic rearrangement between Gram-positive and Gram-negative bacteria.

RESULTS AND DISCUSSION
Genome sequencing and identification of the BCM gene cluster in S. cinnamoneus. The genome sequence of a known BCM producer, S. cinnamoneus DSM 41675, was obtained using a combination of Oxford Nanopore MinION and Illumina MiSeq technologies. Illumina MiSeq provided accurate nucleotide-level read data, but an Illumina-only assembly was fragmented into 415 contigs, in part because of the difficulty of assembling short reads across the highly repetitive sequences of large modular polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) genes (14), which were found at the start or end of multiple contigs. We therefore also sequenced the genome using Oxford Nanopore MinION technology, which can achieve read lengths of over 150 kb (15). The Nanopore data yielded a much more contiguous assembly of 4 contigs, albeit with much lower nucleotide-level accuracy. Using the raw reads from both sequencing runs, we obtained a hybrid assembly composed of a 6.46-Mb contig containing almost all of the chromosome and a smaller 199-kb contig (see Table S1 in the supplemental material). antiSMASH analysis (16) of this assembly revealed that the 199-kb contig is likely to form part of the chromosome, as the termini of this contig and of the 6.46-Mb contig encode different regions of an enduracidin-like gene cluster. Together, these two contigs yield an almost-contiguous 6.66-Mb S. cinnamoneus genome sequence.
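Assembly contiguity of the kind compared above is conventionally summarized with the N50 statistic. The following minimal Python sketch computes it from a list of contig lengths; the function name `assembly_stats` and the input values are illustrative assumptions, not part of the authors' workflow and not the values reported in Table S1.

```python
def assembly_stats(contig_lengths):
    """Return total size, contig count, largest contig and N50 (all in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        # N50 is the length of the contig at which the cumulative size,
        # summed from the longest contig down, first reaches half the total.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "total": total,
        "count": len(lengths),
        "largest": lengths[0] if lengths else 0,
        "n50": n50,
    }


# Illustrative contig sizes echoing the hybrid assembly described in the text
# (a 6.46-Mb contig plus a 199-kb contig); these are not the Table S1 values.
hybrid = assembly_stats([6_460_000, 199_000])
print(hybrid)
```

For a two-contig assembly dominated by one chromosome-sized contig, N50 simply equals that contig's length, which is why N50 rises so sharply when a fragmented short-read assembly is replaced by a hybrid one.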
Published feeding experiments indicate that BCM is a DKP derived from L-leucine and L-isoleucine and that a cytochrome P450 is likely to be involved in the pathway (13). Furthermore, a number of additional oxidative reactions are needed to form the final molecule (Fig. 1A). DKPs are produced naturally either by bimodular NRPSs (17,18) or by CDPSs (19)(20)(21), so we expected the biosynthetic gene cluster for BCM to encode one of these enzymatic systems plus six to seven oxidative enzymes. Analysis of the S. cinnamoneus genome sequence with antiSMASH 3.0.5 (16) identified neither a suitable NRPS pathway nor a CDPS pathway. We therefore assessed the genomic regions surrounding every P450 gene in the genome, which revealed a P450 gene (bcmD) clustered with genes encoding five 2-oxoglutarate (2OG)-dependent dioxygenases (bcmB, bcmC, bcmE, bcmF, and bcmG), a gene encoding a major facilitator superfamily (MFS) transporter (bcmT), and a CDPS gene (bcmA) that falls below the antiSMASH conserved-domain detection limit for CDPSs (Fig. 1B and Table S2). Both P450s and 2OG-dependent dioxygenases can catalyze the regiospecific and stereospecific oxidation of nonactivated C-H bonds (22)(23)(24), while MFS transporters often function as drug efflux pumps and can confer antibiotic resistance (25,26).
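The manual search described above, inspecting the neighborhood of every P450 gene for co-localized 2OG-dependent dioxygenase genes, can be approximated programmatically. The sketch below assumes a hypothetical gene table with free-text annotations; the `Gene` class, the 10-kb window, the threshold of three dioxygenases, and the toy loci are illustrative choices, not parameters reported in this study.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus: str
    start: int       # start coordinate on the contig
    end: int         # end coordinate on the contig (start < end)
    annotation: str  # free-text functional annotation (hypothetical labels)


def p450_neighborhoods(genes, window=10_000, min_2og=3):
    """Yield (P450 locus, nearby 2OG-dioxygenase loci) for each P450 gene that has
    at least `min_2og` 2OG-dependent dioxygenase genes within `window` bp."""
    for gene in genes:
        if "P450" not in gene.annotation:
            continue
        nearby = [
            other.locus
            for other in genes
            if other is not gene
            and "2OG" in other.annotation
            # Overlap test between the other gene and the padded P450 interval.
            and other.start <= gene.end + window
            and other.end >= gene.start - window
        ]
        if len(nearby) >= min_2og:
            yield gene.locus, nearby


# Hypothetical toy contig: one P450 embedded among dioxygenases, one isolated P450.
genes = [
    Gene("orf1", 1000, 2500, "MFS transporter"),
    Gene("orf2", 3000, 3900, "cyclodipeptide synthase"),
    Gene("orf3", 4000, 4900, "2OG-dependent dioxygenase"),
    Gene("orf4", 5000, 5900, "2OG-dependent dioxygenase"),
    Gene("orf5", 6000, 7200, "cytochrome P450"),
    Gene("orf6", 7300, 8200, "2OG-dependent dioxygenase"),
    Gene("orf7", 8300, 9200, "2OG-dependent dioxygenase"),
    Gene("orf8", 9300, 10200, "2OG-dependent dioxygenase"),
    Gene("orf9", 50000, 51500, "cytochrome P450"),
]
hits = list(p450_neighborhoods(genes))
print(hits)
```

A search keyed on tailoring enzymes rather than on core synthases is useful precisely when, as here, the core enzyme is missed by domain-based detection.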
The putative CDPS (pfam16715) BcmA has multiple homologs (>45% identity) in other Actinobacteria and, notably, in various Pseudomonas aeruginosa strains. Interestingly, a homolog from P. aeruginosa (accession no. WP_003158562.1) was previously shown to catalyze the in vitro synthesis of cIL (27), and BcmA shares almost all of the specificity-determining binding-pocket residues of WP_003158562.1 (Fig. S1). Surprisingly, the five 2OG-dependent dioxygenases encoded in the cluster share only moderate sequence identity with one another (33 to 45%). In total, the gene cluster encodes six oxidative enzymes, which is consistent with the number of modifications required to convert cIL into BCM.
Heterologous expression of the bcm gene cluster. To test whether the identified gene cluster was responsible for, and sufficient for, BCM biosynthesis, a 7-kb region spanning bcmA to bcmG (bcmA-G) was PCR amplified and cloned into the ΦBT1 integrative vector pIJ10257 (28) by Gibson assembly (29) to generate pIJ-BCM. This places the constitutive promoter ermE*p upstream of bcmA, which we anticipated would drive expression of all bcm genes, as they are tightly clustered on the same strand. The putative transporter gene bcmT was not included because several homologs of this gene, as well as a homolog of the reported BCM resistance gene (30), are present in the Streptomyces coelicolor genome. pIJ-BCM was introduced into S. coelicolor M1146 and M1152 (31) via intergeneric conjugation. Liquid chromatography-tandem mass spectrometry (LC-MS2) analysis of cultures of the resulting strains yielded a peak of m/z 285.11 that was not present in the control strains (Fig. 2 …)

Identification and heterologous expression of a bcm gene cluster from Pseudomonas aeruginosa. During our bioinformatic analysis of the S. cinnamoneus bcm gene cluster, it became clear that entire bcm-like gene clusters with an apparently identical organization of the bcmA-G genes were present in a variety of Gram-negative and Gram-positive bacterial species, in particular in multiple P. aeruginosa strains. Such a distribution of a conserved antibiotic gene cluster is very rare and prompted us to investigate whether these highly similar gene clusters actually make identical products. As a representative example, we investigated P. aeruginosa SCV20265 for its ability to produce BCM. This strain is a well-studied (34)(35)(36) small-colony variant of this opportunistic pathogen, isolated from the lung of a patient with cystic fibrosis (37), and is considered a reference strain in antibiotic resistance studies (38). The P. aeruginosa SCV20265 bcm-like gene cluster encodes proteins that share 30 to 56% sequence identity with their Streptomyces counterparts. An MFS transporter is also encoded in this cluster, but its gene lies at the end of the bcmA-G operon instead of preceding bcmA (Fig. 1B and Table S2).
No BCM production was detected in cultures of P. aeruginosa SCV20265 in a range of production media, so we carried out heterologous expression of the gene cluster to determine whether the pathway is functional. The putative bcm cluster (including bcmT) was PCR amplified from SCV20265 genomic DNA (gDNA) and cloned into pJH10TS (39,40), which places the putative bcm operon under the control of the synthetic promoter Ptac. Pseudomonas fluorescens SBW25 was transformed with the resulting plasmid (pJH-BCMclp-PA). Several clones of this heterologous expression strain were cultured in the same set of production media as P. aeruginosa and assessed for their ability to produce BCM. LC-MS2 analysis revealed that P. fluorescens SBW25/pJH-BCMclp-PA efficiently produces BCM after 14 h of growth (Fig. 3). The putative BCM detected in these samples exhibited the same retention time, mass, and fragmentation profile as a pure BCM standard, including MS signals at m/z 285.11, as observed previously, and m/z 325.10, corresponding to [BCM+Na]+ (Fig. 3, S3, and S4). Although this result is consistent with parallel work by Patteson et al. (32), it does not exclude variation in stereochemistry at one or more positions in the molecule. We therefore scaled up production, purified the compound, and subjected it to nuclear magnetic resonance (NMR) analysis (1H, 13C, correlation spectroscopy [COSY], heteronuclear multiple-bond correlation [HMBC], and heteronuclear single-quantum correlation [HSQC]), which gave spectra identical (Fig. S5 to S10 and Table S4) to those previously reported for authentic BCM (41). Pseudomonas-produced BCM also had the same optical rotation as a BCM standard, confirming that the two are stereochemically identical.
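The m/z values quoted above can be checked against theoretical ion masses. This minimal sketch assumes the molecular formula C12H18N2O7 for BCM (not stated in this section) and interprets the m/z 285.11 signal as a dehydrated protonated ion, [M+H-H2O]+; that assignment is an assumption made here for illustration, not one stated in the text. Element masses are standard monoisotopic values.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "Na": 22.9897692809}
ELECTRON = 0.00054857990946


def mono_mass(formula):
    """Monoisotopic neutral mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


BCM = {"C": 12, "H": 18, "N": 2, "O": 7}  # assumed molecular formula of bicyclomycin
WATER = {"H": 2, "O": 1}

m = mono_mass(BCM)
proton = MONO["H"] - ELECTRON  # a cation carries one electron fewer than the neutral atom
adducts = {
    "[M+H]+": m + proton,
    "[M+Na]+": m + MONO["Na"] - ELECTRON,
    "[M+H-H2O]+": m + proton - mono_mass(WATER),  # hypothetical assignment of m/z 285.11
}
for ion, mz in adducts.items():
    print(f"{ion}: m/z {mz:.4f}")
```

Under these assumptions both reported signals agree with the calculated values to within 0.01 mass units.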
One of the most efficient media for BCM production in P. fluorescens was synthetic cystic fibrosis medium (SCFM), which mimics the salt and amino acid composition of cystic fibrosis sputum samples (42). We simplified the composition of this medium to generate bicyclomycin production medium (BCMM), in which cultures of P. fluorescens SBW25/pJH-BCMclp-PA provided BCM yields of 34.5 ± 2.1 mg/liter in only 14 h. Interestingly, we detected at least six additional compounds in the heterologous expression strain that were absent from a negative-control strain harboring empty pJH10TS (Fig. 3, S3, and S4). All of these compounds have masses compatible with BCM-like compounds (Table S3), and some have BCM-like MS2 fragmentation patterns, such as a loss of 74.04 Da that corresponds to fragmentation of the oxidized leucine side chain (Fig. S4). This production profile makes P. fluorescens a promising BCM production system, since production in Streptomyces species, the current source of commercially available BCM, requires complex media and longer incubation times. In contrast, we could not detect any BCM-like molecules in cultures of wild-type P. aeruginosa SCV20265, suggesting that additional factors are required to activate the expression of an otherwise-functional gene cluster.
Organization, taxonomic distribution, and phylogeny of the bcm cluster. The presence of seven contiguous biosynthetic genes that make the same antibiotic in both Gram-positive and Gram-negative bacteria was a striking result. The production of the same compound by such distantly related organisms (bacteria that diverged at least 1 billion years ago [43]) is extremely rare but not unprecedented (44). To investigate this unusual result, we used BcmA as the query in a BLASTP search to identify putative bcm gene clusters (bcmA-G) in sequenced bacterial genomes. In total, 724 candidates were identified, of which 31 are found in a variety of taxa and the remainder all come from Pseudomonas species, in particular P. aeruginosa. This initial data set was filtered (see Materials and Methods) to generate a final data set of 374 bcm-like gene clusters for phylogenetic analysis (Data Set S1). Analysis of this data set showed that bcm-like gene clusters are also found in seven other sequenced Streptomyces species besides S. cinnamoneus, as well as in 20 Mycobacterium chelonae strains, Williamsia herbipolensis (order Corynebacteriales), Actinokineospora spheciospongiae (order Pseudonocardiales), and the Gram-negative bacteria Burkholderia plantarii and Tistrella mobilis (Betaproteobacteria and Alphaproteobacteria, respectively). Furthermore, a fragmented bcm-like gene cluster was identified in Photorhabdus temperata (Gammaproteobacteria) by BLAST analysis of BcmA and the P450 BcmD. This cluster is split across two contigs (accession numbers NZ_AYSJ01000007 and NZ_AYSJ01000009), where it is accompanied by transposase genes, and was therefore not included in our data set.
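Filtering of BLAST hits such as the step mentioned above is often applied to BLAST+ tabular output (the criteria actually used in this study are described in Materials and Methods). The sketch below parses the default 12-column `-outfmt 6` format and keeps subjects with at least one hit passing identity, query-coverage, and E-value cut-offs; all thresholds, the query length, and the subject IDs are hypothetical placeholders.

```python
import csv
import io

# Default BLAST+ tabular (-outfmt 6) column order.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, query_len, min_pident=30.0, min_qcov=0.7, max_evalue=1e-10):
    """Return sorted subject IDs with at least one hit passing all cut-offs.

    The thresholds are hypothetical placeholders, not the study's criteria.
    """
    kept = set()
    for row in csv.reader(handle, delimiter="\t"):
        hit = dict(zip(OUTFMT6, row))
        # Fraction of the query spanned by this hit.
        qcov = (int(hit["qend"]) - int(hit["qstart"]) + 1) / query_len
        if (float(hit["pident"]) >= min_pident and qcov >= min_qcov
                and float(hit["evalue"]) <= max_evalue):
            kept.add(hit["sseqid"])
    return sorted(kept)


# Toy tabular output for a hypothetical 240-residue query.
DEMO = (
    "BcmA\thit_A\t62.1\t230\t80\t3\t1\t230\t5\t234\t1e-90\t300\n"
    "BcmA\thit_B\t28.0\t200\t130\t6\t10\t209\t1\t200\t1e-5\t60\n"
    "BcmA\thit_C\t45.3\t120\t60\t2\t1\t120\t1\t120\t1e-30\t120\n"
)
print(filter_hits(io.StringIO(DEMO), query_len=240))
```

Here hit_B fails on identity and hit_C covers only half of the query, so only hit_A survives.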
Most bcm gene clusters from Gram-positive bacteria share the same gene organization, with bcmT upstream of bcmA in the opposite orientation, whereas in all the Gram-negative bacteria (and Actinokineospora), bcmT is downstream of bcmG and in the same orientation as the rest of the cluster. Streptomyces ossamyceticus is the only representative that lacks a transporter gene immediately adjacent to the biosynthetic genes. Additionally, the MFS transporters in gene clusters from Gram-positive bacteria share only 27 to 30% sequence identity (over approximately 40% coverage) with MFS transporters from Gram-negative gene clusters, suggesting that the transporters were recruited independently of the rest of the cluster in these distant bacteria.
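The two transporter layouts contrasted above (bcmT upstream and divergently oriented versus downstream and co-directional) can be classified from gene coordinates and strands. This is a minimal sketch assuming each gene is given as a hypothetical `(start, end, strand)` tuple on a single contig; it is not a tool used in the study.

```python
def transporter_layout(bcm_a, bcm_g, bcm_t):
    """Classify the transporter gene's position relative to the bcmA-G block.

    Each argument is a (start, end, strand) tuple with start < end and
    strand '+' or '-'; 'upstream' and 'downstream' follow the operon direction.
    """
    block_strand = bcm_a[2]
    if block_strand == "+":
        upstream = bcm_t[1] < bcm_a[0]    # ends before bcmA starts
        downstream = bcm_t[0] > bcm_g[1]  # starts after bcmG ends
    else:
        # On the minus strand the operon reads from high to low coordinates.
        upstream = bcm_t[0] > bcm_a[1]
        downstream = bcm_t[1] < bcm_g[0]
    same_strand = bcm_t[2] == block_strand
    if upstream and not same_strand:
        return "upstream, divergent"
    if downstream and same_strand:
        return "downstream, co-directional"
    return "other"


# Toy coordinates mimicking the Gram-positive and Gram-negative arrangements.
gram_pos_like = transporter_layout((3000, 3900, "+"), (9000, 9900, "+"), (1000, 2500, "-"))
gram_neg_like = transporter_layout((3000, 3900, "+"), (9000, 9900, "+"), (10000, 11500, "+"))
print(gram_pos_like, "|", gram_neg_like)
```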
All the bcm gene clusters identified in this work were analyzed phylogenetically by constructing a maximum likelihood tree from the nucleotide sequence spanning bcmA-G. This showed that their evolutionary relationships correlate with bacterial genera (Fig. 4A). Clusters from Gram-negative (particularly Pseudomonas) and Gram-positive bacteria are grouped in completely independent and distant clades, while the clusters from Burkholderia and Tistrella appear at intermediate points between these two groups. Within the Gram-positive clade, the clusters are more divergent but are similarly grouped according to the taxonomy of their native species, with the Williamsia gene cluster grouping with the M. chelonae gene clusters (both genera belong to the order Corynebacteriales) (Fig. 4B). All P. aeruginosa gene clusters are ~99% identical to each other (Fig. 4A), whereas the two most distantly related streptomycete gene clusters share 69% identity and 83% coverage.
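Identity and coverage figures such as those above depend on how they are computed. The sketch below shows one common convention for a pairwise alignment: identity counted over gap-free aligned columns, and coverage as the fraction of query residues included in the alignment. BLAST's `pident`, by contrast, divides by the full alignment length including gaps. The function name and toy sequences are illustrative.

```python
def identity_and_coverage(aln_query, aln_subject, query_len):
    """Percent identity and percent query coverage from two gapped, equal-length strings."""
    if len(aln_query) != len(aln_subject):
        raise ValueError("aligned sequences must have equal length")
    # Columns where both sequences have a residue (columns with a gap are skipped).
    paired = [(a, b) for a, b in zip(aln_query, aln_subject) if a != "-" and b != "-"]
    identical = sum(1 for a, b in paired if a == b)
    identity = 100.0 * identical / len(paired) if paired else 0.0
    # Query residues included in the alignment, relative to the full query length.
    aligned_query = sum(1 for a in aln_query if a != "-")
    coverage = 100.0 * aligned_query / query_len
    return identity, coverage


ident, cov = identity_and_coverage("ACGT-ACGT", "ACTTGAC-T", query_len=10)
print(f"identity {ident:.1f}%, coverage {cov:.1f}%")
```

Reporting identity together with coverage, as the text does, guards against high identities that come from short local alignments.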
Mobile genetic elements associated with bcm-like gene clusters. The contrast between the genetic conservation of the bcm gene cluster and its distribution across distantly related bacteria strongly implies that the bcm gene cluster has been horizontally transferred between them. The increased sequence divergence of the bcm gene clusters in Streptomyces species suggests that the gene cluster may have originated from this taxonomic group, although it is difficult to prove this hypothesis, as the gene clusters in all strains appear to have adapted to their hosts, making HGT difficult to infer. Despite the below-average GC content of the clusters (59.6% in P. aeruginosa SCV20265 and 70.8% in S. cinnamoneus) versus the genome averages (66.3% and 72.4%, respectively), the clusters were not predicted to be part of genomic islands in these strains when analyzed with IslandViewer4 (45).
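The GC comparison above amounts to a simple compositional calculation. The sketch below computes GC content and the deviation from the genome average, applying the latter to the percentages reported in the text; this single signal is far cruder than the combined genomic island predictors in IslandViewer4, and the helper names are hypothetical.

```python
def gc_content(seq):
    """Percent G+C among unambiguous nucleotides (A, C, G, T) in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def gc_deviation(cluster_gc, genome_gc):
    """Cluster GC minus genome-wide GC, in percentage points."""
    return cluster_gc - genome_gc


# (cluster GC %, genome GC %) as reported in the text.
reported = {
    "P. aeruginosa SCV20265": (59.6, 66.3),
    "S. cinnamoneus": (70.8, 72.4),
}
deviations = {name: gc_deviation(c, g) for name, (c, g) in reported.items()}
for name, d in deviations.items():
    print(f"{name}: {d:+.1f} percentage points")
```

Both clusters fall below their genome averages, but by very different margins, which illustrates why compositional signals alone are weak evidence for or against HGT.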
However, analysis of the genomic context of bcm gene clusters in P. aeruginosa strains strongly supports an insertion hypothesis, since the genes that flank the cluster are contiguous in a number of P. aeruginosa strains that lack the cluster (Fig. S11). Most notably, bcmT is adjacent to the glucosamine-fructose-6-phosphate aminotransferase gene glmS, and the intergenic region that precedes glmS contains the specific attachment site for transposon Tn7 (attTn7) (46). Consistent with this observation, some strains that lack the bcm gene cluster (e.g., P. aeruginosa BL08) have mobile genetic elements integrated next to glmS (Fig. S11). Intriguingly, many strains, including the reference strain P. aeruginosa PAO1, contain an MFS transporter gene (PA5548 in PAO1) adjacent to glmS that is 99% identical to bcmT from SCV20265. This indicates either that the bcm gene cluster recently integrated next to an existing P. aeruginosa transporter gene or that a subset of strains lost the biosynthetic genes but retained a potential BCM resistance gene.

[Fig. 4 legend, partial] … Fig. 1B) is shown for each branch of the tree. Flanking genes are color-coded gray if they encode proteins with conserved domains, white for hypothetical proteins with no conserved domains, and red for proteins related to mobile genetic elements (see Table S5 for details). Vertical black lines represent tRNA genes. (C) Genetic context of the bcm clusters in Gram-negative bacteria. The black triangle represents an attTn7 site.
The bcm-like gene clusters in other Gram-negative bacteria (Burkholderia and Tistrella spp.) and in most Gram-positive bacteria are located next to genes coding for integrases, transposases, and other genetic mobility elements (Fig. 4B and C and Table S5). For example, the mycobacterial clusters are found close to tRNA genes, and their flanking genes are syntenic in some Mycobacterium abscessus strains, whereas in other M. abscessus strains these genes are separated by a cluster of phage-related genes (Fig. 4B and S12). In the streptomycetes, the clusters are integrated at different genomic locations, where they are also often associated with mobile genetic elements (Fig. 4B and Table S5). These observations strongly support HGT of the cluster between these taxa as well.
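The synteny argument used above, that genes flanking the cluster are contiguous in strains lacking it, can be expressed as an adjacency test on ordered gene lists. The following sketch uses hypothetical gene identifiers and toy strains; a real analysis would first need to establish orthology of the flanking genes across genomes.

```python
def flanks_adjacent(gene_order, left, right):
    """Return True if genes `left` and `right` are immediate neighbors in `gene_order`."""
    if left not in gene_order or right not in gene_order:
        return False
    return abs(gene_order.index(left) - gene_order.index(right)) == 1


def classify_sites(strains, left, right, marker):
    """Split strains into cluster carriers and strains with an empty insertion site.

    `strains` maps a strain name to its ordered list of gene IDs around the locus;
    `marker` is a gene diagnostic of the inserted cluster.
    """
    carriers = [name for name, order in strains.items() if marker in order]
    empty_site = [
        name for name, order in strains.items()
        if marker not in order and flanks_adjacent(order, left, right)
    ]
    return carriers, empty_site


# Hypothetical toy loci: flank_up/flank_down stand in for conserved flanking genes.
toy = {
    "strain_1": ["x", "flank_up", "bcmA", "bcmG", "bcmT", "flank_down", "y"],
    "strain_2": ["x", "flank_up", "flank_down", "y"],
    "strain_3": ["x", "flank_up", "integrase", "flank_down", "y"],
}
carriers, empty_site = classify_sites(toy, "flank_up", "flank_down", "bcmA")
print("carry cluster:", carriers)
print("contiguous flanks:", empty_site)
```

In this toy example, strain_3 falls into neither group: like the strains with phage-related or other mobile elements at the same site, it has a different insertion between the flanks.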
Diversity and geographical distribution of the bcm cluster in P. aeruginosa. The high sequence identity of the bcm gene cluster across hundreds of P. aeruginosa strains (Fig. 4A), along with its consistent genomic context (Fig. 4C), led us to ask whether this cluster is truly widespread or only found in a small subset of P. aeruginosa strains that are overrepresented in sequence databases. P. aeruginosa isolates have been widely sequenced to evaluate pathogen diversity and evolution (38,47,48). As a result, large collections of sequenced clinical isolates are available in the databases, potentially constituting a biased data set that could lead to overestimation of bcm gene cluster abundance and conservation. Most of the sequences in our final bcm data set come from well-characterized isolate collections. Among them, the Kos collection (38) provides a comprehensive survey of P. aeruginosa diversity, and the bcm gene cluster is present in nearly 20% of the isolates sequenced in this collection (74 of 390 isolates). To assess the phylogenetic diversity of these strains, we mapped the presence of the bcm gene cluster onto the Kos collection phylogenetic tree (38). Strikingly, nearly all of the bcm-positive strains fall within the PAO1 clade (Fig. 5), but they come from very diverse locations, including the United States, Mexico, Spain, France, Germany, China, Argentina, Brazil, Colombia, Croatia, and Israel, among others. An analysis of all P. aeruginosa strains encoding the pathway further extended this geographic diversity (Data Set S1). We therefore conclude that the bcm gene cluster is distributed globally but within a phylogenetically distinct subset of P. aeruginosa strains. Given this phylogenetic distribution, it is surprising that a bcmT gene is also found next to glmS in P. aeruginosa PA14 (Fig. S11), even though no isolates within the PA14 clade carry the bcm gene cluster.
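The prevalence figure above ("nearly 20%", 74 of 390 isolates) can be accompanied by a sampling interval. This minimal sketch computes a Wilson score interval; the choice of the Wilson method is made here for illustration and is not part of the authors' analysis. The interval reflects sampling noise only and assumes independent isolates, so it does not address the database bias discussed in the text.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 for ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


k, n = 74, 390  # bcm-positive isolates in the Kos collection, as reported
low, high = wilson_interval(k, n)
print(f"{k}/{n} = {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The point estimate is about 19%, consistent with the "nearly 20%" stated above.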
2OG-dependent dioxygenase phylogeny. An unusual feature of the bcm gene clusters is the presence of five 2OG-dependent dioxygenase genes. While it is possible that they originally arose by gene duplication events, the S. cinnamoneus 2OG-dependent dioxygenases only possess 33 to 45% sequence identity with each other (Fig. S13). We hypothesized that an analysis of the diversity of the bcm 2OG-dependent dioxygenases across multiple taxa could provide an insight into gene cluster evolution. We therefore constructed a maximum likelihood tree using protein sequences of every 2OG-dependent dioxygenase (BcmB, BcmC, BcmE, BcmF, and BcmG homologs) from both S. cinnamoneus and P. aeruginosa SCV20265, as well as from other selected P. aeruginosa strains and at least one representative from the other genera that contain bcm-like gene clusters.
In contrast to the overall gene cluster phylogeny, the evolutionary relationships of the bcm oxidases correlate with their position in the cluster, as would be expected for a horizontally transferred unit (Fig. 6 and S14). BcmB, BcmC, and BcmG homologs group clearly in different clades, and within these clades the sequences from Gram-negative bacteria branch out from the Gram-positive subgroups, perhaps indicating the ancestral origin of these proteins. A surprising result was the phylogeny of the remaining two 2OG-dependent dioxygenases, BcmE and BcmF. These are clearly separated into two different clades: one containing BcmE from Gram-negative bacteria (BcmE−) and BcmF from Gram-positive bacteria (BcmF+), and one in which BcmE+ groups with BcmF−. Within these two clades, Gram-positive and Gram-negative representatives are more distinct and bifurcate earlier than in the other clades (Fig. 6 and S14). The phylogenetic relationships between the 2OG-dependent dioxygenases strongly support HGT of the cluster between taxa, although the BcmE/BcmF phylogeny indicates that the cluster may have undergone partial reorganization (Fig. 6). This intriguing result might mean that BcmE and BcmF fulfill inverse roles in Gram-positive and Gram-negative bacteria, and further experiments are necessary to test this hypothesis.
In summary, we demonstrate that the antibiotic BCM is a CDPS-derived natural product whose biosynthetic gene cluster is present in a diverse array of both Gram-positive and Gram-negative bacteria. This characterization was supported by heterologous expression of pathways from S. cinnamoneus and P. aeruginosa, and the product of the P. aeruginosa pathway was shown to be stereochemically identical to authentic BCM. We have also shown that the previously orphan P. aeruginosa pathway is a promising system for the production of BCM and related derivatives. The bcm cluster is dispersed across a number of taxonomically distant bacteria, including Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria, as well as several families in Actinobacteria. The widespread presence of bcmT genes in P. aeruginosa (even in strains that lack the biosynthetic genes; Fig. S11) may explain why BCM is inactive toward P. aeruginosa (49), but further work is required to determine whether bcmT confers BCM resistance.
The presence of mobile genetic elements associated with the bcm gene cluster in many bacteria strongly supports dissemination of this gene cluster via HGT, and the diversity of the gene cluster in Gram-positive bacteria suggests that it originated there and was subsequently transferred to Gram-negative bacteria, where two dioxygenase genes have apparently been rearranged and an alternative MFS transporter was acquired. However, horizontal transfer in the opposite direction cannot be ruled out. We are not aware of such a widespread distribution of any other specialized metabolite gene cluster, although some compounds have been found in both Gram-positive and Gram-negative bacteria, such as pyochelin (50), the coronafacoyl phytotoxins (51), and furanomycin (52). Moreover, even when a similar compound is produced by distantly related bacteria, this can be achieved using different biosynthetic machinery. This is the case for the antibiotic fosfomycin, whose biosynthesis in Streptomyces and Pseudomonas spp. is catalyzed by distinct pathways that have undergone convergent evolution (53). Highly conserved biosynthetic gene clusters shared by distantly related bacteria are rare and typically differ in one or more genes, as with the althiomycin gene clusters in Serratia marcescens and Myxococcus xanthus (54). A recent study by McDonald and Currie showed that it is very rare to find intact laterally transferred biosynthetic gene clusters, even between streptomycetes (55).
Given this distribution of bcm gene clusters, it will be interesting to determine the ecological role of BCM, especially given the abundance of functional pathways in pathogenic P. aeruginosa strains isolated from lungs, where adaptive evolutionary pressure would be expected to lead to loss or decay of the cluster unless it conferred a competitive advantage (56). Antibacterial natural products can have roles in pathogen virulence; for example, a bacteriocin produced by the pathogen Listeria monocytogenes modifies the intestinal microbiota to promote infection (57). In addition, given the horizontal transfer of the bcm gene cluster and its extensive association with mobile genetic elements, it is interesting to note that the transcription termination factor Rho most strongly represses transcription of horizontally acquired regions of genomes (58), an activity that would be specifically inhibited by BCM (7). Phages are known to recruit bacterial genes that increase their own fitness and that of their hosts (59,60), and this may have occurred with the bcm gene cluster. These intriguing observations invite further work to determine the natural role of BCM.

MATERIALS AND METHODS
Chemicals and molecular biology reagents. Pharmamedia was obtained from Archer Daniels Midland Company. Antibiotics and all other medium components and reagents were purchased from Sigma-Aldrich. Bicyclomycin was purchased from BioAustralis Fine Chemicals (Australia). Enzymes were purchased from New England BioLabs unless otherwise specified, and molecular biology kits were purchased from Promega and GE Healthcare.
Bacterial strains, plasmids, and culture conditions. Escherichia coli, Streptomyces, and Pseudomonas strains, as well as plasmids and oligonucleotides used or generated in this work, are reported in Tables 1 and 2. S. cinnamoneus DSM 41675 was acquired from the German Collection of Microorganisms and Cell Cultures (DSMZ, Germany), P. aeruginosa SCV20265 was provided by Susanne Häussler (Helmholtz Centre for Infection Research, Germany), and pJH10TS was provided by Barrie Wilkinson (John Innes Centre, UK). E. coli and Pseudomonas strains were grown in lysogeny broth (LB) at 37°C (except for P. fluorescens SBW25, which is temperature sensitive and was grown at 28°C) and stored at −70°C in 50% glycerol stocks. Streptomyces strains were cultured in liquid tryptone soya broth (TSB; Oxoid) or on solid soya flour mannitol (SFM) medium (61) at 28 to 30°C and stored at −70°C as 20% glycerol spore stocks.
The following media were used for bicyclomycin production experiments: Aizunensis production medium (AIZ), adapted from Miyamura et al. […] (a solid version of CIN medium with 20 g/liter agar was used to grow S. cinnamoneus for reliable spore production). Synthetic cystic fibrosis medium (SCFM) was prepared following the recipe reported by Kamath and coworkers (42), and an alternative medium optimized for bicyclomycin production (BCMM) was developed from SCFM and comprised the following (per liter): 6.5 ml of 0.2 M NaH2PO4, 6.25 ml of 0.2 M Na2HPO4, 0.348 ml of 1 M KNO3, 0.122 g NH4Cl, 1.114 g KCl, 3.03 g NaCl, 10 mM morpholinepropanesulfonic acid (MOPS), 16.09 ml of 100 mM L-leucine, 11.2 ml of 100 mM L-isoleucine, 6.33 ml of 100 mM L-methionine, 15.49 ml of 100 mM L-glutamic acid hydrochloride, 6.76 ml of 100 mM L-ornithine-HCl, 1.92 ml of 84 mM L-cystine (dissolved in 0.8 M HCl), and 2 ml of 3.6 mM FeSO4·7H2O, all in Milli-Q water. The solution was adjusted to pH 6.8, filter sterilized, and supplemented with 0.606 ml of 1 M MgCl2 and 1.754 ml of 1 M CaCl2 (sterilized separately). When necessary, antibiotics were added at the following concentrations: 50 μg/ml hygromycin, 50 μg/ml apramycin, 50 μg/ml kanamycin, 25 μg/ml chloramphenicol, 25 μg/ml nalidixic acid, and 12.5 μg/ml tetracycline.
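Because the BCMM recipe is given as stock volumes per liter, final concentrations follow from stock concentration × volume / final volume. The sketch below performs that arithmetic, assuming the additions are made up to exactly 1 liter (ignoring the small volume of the separately sterilized MgCl2 and CaCl2) and using standard molar masses for the three solids; MOPS is already specified as 10 mM and is omitted. Dictionary keys and the function name are illustrative.

```python
# Liquid additions per liter of BCMM: name -> (volume in ml, stock concentration in mM).
STOCKS = {
    "NaH2PO4": (6.5, 200.0),
    "Na2HPO4": (6.25, 200.0),
    "KNO3": (0.348, 1000.0),
    "L-leucine": (16.09, 100.0),
    "L-isoleucine": (11.2, 100.0),
    "L-methionine": (6.33, 100.0),
    "L-glutamic acid HCl": (15.49, 100.0),
    "L-ornithine HCl": (6.76, 100.0),
    "L-cystine": (1.92, 84.0),
    "FeSO4": (2.0, 3.6),
    "MgCl2": (0.606, 1000.0),
    "CaCl2": (1.754, 1000.0),
}
# Solid additions per liter: name -> (mass in g, molar mass in g/mol; standard values).
SOLIDS = {"NH4Cl": (0.122, 53.49), "KCl": (1.114, 74.55), "NaCl": (3.03, 58.44)}
FINAL_VOLUME_ML = 1000.0  # assumption: everything is made up to exactly 1 liter


def final_mM():
    """Final concentration (mM) of each component in 1 liter of BCMM."""
    conc = {name: vol * stock / FINAL_VOLUME_ML for name, (vol, stock) in STOCKS.items()}
    # g / (g/mol) gives mol; x1000 gives mmol, which in 1 liter is mM.
    conc.update({name: 1000.0 * g / mw for name, (g, mw) in SOLIDS.items()})
    return conc


for name, mm in final_mM().items():
    print(f"{name:>22}: {mm:8.4f} mM")
```

For example, 16.09 ml of 100 mM L-leucine per liter corresponds to about 1.6 mM, and the iron addition to 7.2 μM.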
Genome sequencing, annotation, and bioinformatics analysis of S. cinnamoneus. Genomic DNA of S. cinnamoneus DSM 41675 was isolated according to the salting-out protocol (61), subjected to a TruSeq PCR-free library preparation, and sequenced using Illumina MiSeq (600 cycles, 2 × 300 bp) at the DNA Sequencing Facility, Department of Biochemistry, University of Cambridge (UK). MinION Nanopore sequencing (Oxford Nanopore Technologies, UK) was carried out as follows. A single colony of S. cinnamoneus grown on solid CIN medium was used to inoculate 50 ml TSB, which was incubated at 28°C overnight with shaking at 250 rpm. One milliliter of this seed culture was used to inoculate a further 50 ml of TSB, which was again incubated at 28°C overnight with shaking at 250 rpm. DNA was extracted from 10 ml of this culture using the salting-out procedure described above (61) and resuspended in 5 ml Tris-EDTA (TE) buffer. DNA concentration was quantified using a Qubit 2.0 fluorometer (Life Technologies), and fragment length and DNA quality were assessed using the Agilent TapeStation 2200 (Agilent Technologies).
Genomic DNA (~11 μg in 100 μl) was fragmented using a Covaris g-TUBE (Covaris, UK) by centrifugation at 3,380 × g for 90 s, twice, to achieve a fragment distribution with a peak at ~16 kb. The sequencing library was prepared using the Oxford Nanopore Technologies sequencing kit SQK-NSK007 (R9 version) according to the manufacturer's protocol (16 May 2016 version), starting at the end-prep step with ~2.5 ng of DNA. Half (12 μl) of the library was loaded onto a FLO-Min104 (R9 version) flow cell and sequenced for ~22 h using the MinKNOW script NC_48hr_Sequencing_Run_FLO-Min104.py. The flow cell was restarted after ~7 h. The remaining 12 μl of the library was loaded after restarting the flow cell at 22 h, and sequencing was then run for a further 43 h. Base calling was performed using Metrichor Desktop Agent (version 1.107, 2D basecalling for SQK-NSK007).
The complete raw data set comprised 7,044,217 paired-end 301-bp Illumina MiSeq reads and 53,048 Nanopore MinION reads that passed quality control (QC). The Nanopore reads were extracted to fastq format using the poRe R package (63). For the Illumina-only assembly, SPAdes version 3.6.2 (64) was used with the k-mer sizes set to -k 21,33,55,77,99,127. For the Nanopore-only assembly, Canu version 1.5 (65) was used with genomeSize=7.0m and the -nanopore-raw option. For the hybrid Illumina/Nanopore assembly, SPAdes version 3.8.2 (66) was supplied with both data sets and run with the --careful and --nanopore options. Contigs with low sequence coverage were removed from the hybrid assembly. All assemblies were run using 16 central processing units (CPUs) on a 256-GB compute node of the Norwich Bioscience Institutes (NBI) High Performance Computing (HPC) cluster. Genome assembly statistics are reported in Table S1. The hybrid assembly was annotated using Prokka (67), which uses Prodigal (68) for open reading frame (ORF) calling.
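The text does not state how low-coverage contigs were identified. The sketch below assumes the standard SPAdes contig naming, in which each header encodes the k-mer coverage (for example NODE_1_length_12345_cov_67.89), and keeps contigs above a cutoff; the cutoff value and the function names are hypothetical. It also estimates mean Illumina depth as total bases divided by genome size.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# SPAdes contig names encode length and k-mer coverage, e.g. "NODE_1_length_12345_cov_67.89".
SPADES_HEADER = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_coverage(records: Iterable[Tuple[str, str]], min_cov: float) -> List[Tuple[str, str]]:
    """Keep contigs whose SPAdes coverage is >= min_cov (a hypothetical cutoff)."""
    kept = []
    for header, seq in records:
        match = SPADES_HEADER.search(header)
        if match is None:
            continue  # not a SPAdes-style header; this sketch drops it
        if float(match.group(2)) >= min_cov:
            kept.append((header, seq))
    return kept


def estimated_depth(n_reads: int, read_len: int, genome_size: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return n_reads * read_len / genome_size
```

If the 7,044,217 MiSeq reads are counted individually and the genome is ~7.0 Mb (the size given to Canu), estimated_depth(7044217, 301, 7.0e6) gives roughly 303× mean coverage.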
Cloning the S. cinnamoneus bcm gene cluster. The DNA region containing the bcm gene cluster was PCR amplified from S. cinnamoneus gDNA using primers pIJ-bcm_start and pIJ-bcm_end with Herculase II Fusion DNA polymerase (Agilent). The resulting 6,981-bp fragment was gel purified and inserted via Gibson assembly (29) into pIJ10257 (a ΦBT1 integrative vector conferring hygromycin resistance [28]) linearized with NdeI and PacI to generate plasmid pIJ-BCM. To verify that the cluster sequence in this construct was correct, the plasmid was Sanger sequenced using primers BCM_seq_1 to BCM_seq_8. All other DNA isolation and manipulation techniques were performed according to standard procedures (69).
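Linearizing the vector with NdeI and PacI defines the two backbone ends that the Gibson overlaps must match, so it is worth confirming that each enzyme cuts the vector sequence exactly once. A minimal site finder is sketched below; it uses the standard recognition sequences of NdeI (CATATG), PacI (TTAATTAA), and XbaI (TCTAGA, used for the Pseudomonas construct), and the function name is illustrative.

```python
from typing import List

# Recognition sequences (5'->3'). All three are palindromic, so a
# top-strand search also finds every site on the bottom strand.
ENZYMES = {"NdeI": "CATATG", "PacI": "TTAATTAA", "XbaI": "TCTAGA"}


def find_sites(seq: str, site: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def cuts_once(vector_seq: str, enzyme: str) -> bool:
    """True if the enzyme has exactly one site in the (linear) sequence given."""
    return len(find_sites(vector_seq, ENZYMES[enzyme])) == 1
```

Note that this treats the sequence as linear; a site spanning the origin of a circular plasmid would be missed unless the sequence is rotated or its ends are joined before searching.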
Genetic manipulation of Streptomyces and heterologous expression of the bcm cluster. Methylation-deficient E. coli ET12567 carrying the helper plasmid pUZ8002 (70) was transformed with pIJ-BCM by electroporation and used as the donor strain in intergeneric conjugations with S. coelicolor M1146 and M1152 (31), performed according to standard protocols (61). Exconjugants were screened by colony PCR with primers bcm-cdps_chk_fw and bcm-cdps_chk_rv to confirm plasmid integration. Control strains containing empty pIJ10257 were generated using the same methodology.
Cloning and expression of the P. aeruginosa bcm gene cluster. Genomic DNA of P. aeruginosa SCV20265 was obtained using the FastDNA Spin kit for soil (MP Biomedicals). The DNA region containing genes bcmA to bcmT, preceded by their native promoter, was PCR amplified using primers pJH-BCMclp_start and pJH-BCMcl_end with Herculase II Fusion DNA polymerase (Agilent). The resulting 8,604-bp fragment was gel purified and inserted via Gibson assembly (29) into pJH10TS (a derivative of the broad-host-range IncQ expression vector pJH10 carrying the synthetic Tac promoter [39,40]) linearized with NdeI and XbaI to generate expression plasmid pJH-BCMclp-PA. This plasmid was verified by Sanger sequencing with primers BCM_PA_seq_1 to BCM_PA_seq_9 and introduced into P. fluorescens SBW25 by electroporation of freshly made competent cells, prepared as follows: two 1-ml aliquots of an overnight culture of P. fluorescens were centrifuged at 11,000 × g for 1 min, and the pellets were washed three times with 1 ml of HEPES buffer, with centrifugation at 11,000 × g for 1 min after each wash. The two pellets were then combined and resuspended in 100 µl of HEPES buffer, 2 µl of plasmid prep was added, and the suspension was electroporated at 2,500 V.
After electroporation, the suspension was transferred to 1 ml of fresh LB and incubated with shaking at 28°C for 1 h, after which 100 µl of the mixture was plated onto LB containing 12.5 µg/ml tetracycline. As a negative control, the empty vector pJH10TS (40) was also transformed into P. fluorescens SBW25. To verify the presence of the construct in P. fluorescens transformants, colony PCR was carried out using primers pJH_chk_fw and pJH_chk_rv. From the positive clones selected for downstream work, pJH-BCMclp-PA was recovered and sequenced with primers BCM_PA_seq_1 to BCM_PA_seq_9.
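In Gibson assembly, each PCR primer carries a 5′ tail homologous to one end of the linearized vector, followed by a sequence that anneals to the insert. The sketch below builds such a primer pair from hypothetical inputs: vector_left is the backbone sequence ending at the upstream cut site, vector_right starts at the downstream cut site, and the 20-nt overlap and annealing lengths are assumptions, not the values used for pJH-BCMclp-PA. It deliberately ignores the NdeI and XbaI overhangs and melting-temperature checks, which a real design must handle.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gibson_primers(vector_left: str, insert: str, vector_right: str,
                   overlap: int = 20, anneal: int = 20) -> Tuple[str, str]:
    """Forward/reverse primers (5'->3') that add vector-homologous tails to `insert`.

    Forward: last `overlap` nt of the upstream vector end + first `anneal` nt of the insert.
    Reverse: reverse complement of the last `anneal` nt of the insert + first `overlap` nt
    of the downstream vector end.
    """
    fwd = vector_left[-overlap:] + insert[:anneal]
    rev = revcomp(insert[-anneal:] + vector_right[:overlap])
    return fwd, rev
```

Putting the homology in the primer tails means the amplified cluster can be assembled directly with the digested backbone, without depending on restriction sites inside the insert.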
Production and LC-MS analysis of BCM. Thirty microliters of a concentrated stock of S. cinnamoneus spores was used to inoculate 10 ml AIZ medium in 50-ml flasks, which were incubated at 28°C with shaking at 250 rpm for 3 days. Five hundred microliters of this seed culture was used to inoculate 7 ml of CIN medium in 50-ml Falcon tubes covered with foam bungs. These production cultures were incubated at 28°C with shaking at 250 rpm for 4 days. The same procedure was used for S. coelicolor M1146/pIJ-BCM and M1152/pIJ-BCM. For production in P. fluorescens, 20 µl of cell stock was used to inoculate 10 ml SCFM in 30-ml universal polystyrene tubes. These cultures were grown overnight at 28°C with shaking at 250 rpm, with the screw caps slightly loosened to allow aeration, and 400-µl aliquots were used to inoculate 10 ml BCMM in 50-ml Falcon tubes covered with foam bungs. Production cultures were incubated for 12 to 16 h at 28°C with shaking at 250 rpm.
For the analysis of BCM production, 1-ml samples of production culture were centrifuged at 18,000 × g for 5 min. Five microliters of each supernatant was analyzed by LC-MS using a Luna Omega 1.6-µm Polar C18 column (50 mm by 2.1 mm, 100 Å; Phenomenex) connected to a Shimadzu Nexera X2 ultrahigh-performance liquid chromatography (UHPLC) system, eluting with a linear gradient of 0 to 35% methanol in water plus 0.1% formic acid over 6 min at a flow rate of 0.5 ml/min. MS data were obtained using a Shimadzu ion trap-time of flight (IT-TOF) mass spectrometer coupled to the UHPLC and analyzed using the LabSolutions software (Shimadzu). MS data were collected in positive mode over an m/z range of 200 to 2,000, with an ion accumulation time of 10 ms and automatic sensitivity control set to 70% of the base peak. The curved desolvation line (CDL) temperature was 250°C, and the heat block temperature was 300°C. MS2 data were collected between m/z 90 and 2,000 in a data-dependent manner for parent ions between m/z 200 and 1,500, using a collision-induced dissociation energy of 50% and a precursor ion width of 3 Da. The instrument was calibrated using sodium trifluoroacetate cluster ions before every run.
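The mobile-phase composition at any point of a linear gradient follows %B(t) = B_start + (B_end − B_start) × t / duration, which is useful for relating a retention time to the methanol content at elution. The sketch below assumes the gradient starts at injection and that composition is held at the final value afterward; the wash and re-equilibration steps are not described in the text.

```python
def percent_organic(t_min: float, start: float = 0.0, end: float = 35.0,
                    duration: float = 6.0) -> float:
    """%B of a linear gradient at time t (minutes).

    Defaults match the 0 to 35% methanol over 6 min method. Before t=0 and after
    `duration` the value is clamped (an assumption; post-gradient steps are not described).
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration
```

The same function covers the Synapt method (1 to 40% acetonitrile over 7 min) by passing start=1.0, end=40.0, duration=7.0; halfway through that gradient, percent_organic(3.5, 1.0, 40.0, 7.0) gives 20.5%.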
Additional LC-MS analysis was carried out using a Waters Xevo TQ-S tandem LC-MS system fitted with the same column and using the same chromatographic method, but injecting 1 µl of sample. A multiple-reaction monitoring (MRM) method for BCM identification and quantification was configured with the IntelliStart software (Waters) using pure BCM as a standard. MRM tracks signature fragment ions (transitions) of a selected parent ion, determined with an authentic standard, to enable unambiguous identification and quantification of a given molecule.  (22 V). Data were acquired in positive electrospray mode with a capillary voltage of 3.9 kV, desolvation temperature of 500°C, gas flow of 900 liters/h, cone gas flow of 150 liters/h, and nebulizer pressure of 7.0 × 10⁵ Pa. LC-MS data were analyzed using the MassLynx software and the quantification tool QuanLynx (Waters). Xevo MS peak areas were used to determine BCM yields by comparison with a BCM standard.
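The text states only that MRM peak areas were compared with a BCM standard. A common way to do this is a linear external calibration: fit peak area against known standard concentrations by least squares, then invert the line for each sample. The sketch below assumes that design; the calibration points in the usage example are hypothetical, not data from this work.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(area: float, slope: float, intercept: float) -> float:
    """Invert the calibration line: concentration giving this peak area."""
    return (area - intercept) / slope


# Hypothetical standards: concentrations (µg/ml) and their MRM peak areas.
std_conc = [1.0, 5.0, 10.0, 50.0]
std_area = [250.0, 1050.0, 2050.0, 10050.0]
slope, intercept = fit_line(std_conc, std_area)
print(quantify(4050.0, slope, intercept))  # sample concentration in µg/ml
```

In practice the fitted range should bracket the sample areas; extrapolating beyond the highest standard is not reliable.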
For the accurate mass measurement of the BCM-like compounds, high-resolution mass spectra were acquired on a Synapt G2-Si mass spectrometer (Waters) operated in positive mode with a scan time of 0.5 s over the range m/z 50 to 600. Five-microliter samples were injected onto a Luna Omega 1.6-µm Polar C18 column (50 mm by 2.1 mm, 100 Å; Phenomenex) and eluted with a linear gradient of 1 to 40% acetonitrile in water plus 0.1% formic acid over 7 min. Synapt G2-Si MS data were collected with the following parameters: capillary voltage, 2.5 kV; cone voltage, 40 V; source temperature, 120°C; and desolvation temperature, 350°C. Leu-enkephalin was used for dual lock-mass calibration (m/z 278.1135 and m/z 556.2766), measured every 30 s during the run.
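Accurate-mass assignments are usually reported as a ppm error between measured and theoretical m/z, where the theoretical [M+H]+ is the monoisotopic mass plus the mass of a proton. The sketch below computes both from a molecular formula using standard monoisotopic atomic masses. It assumes the formula C12H18N2O7 for BCM; as a check, the same constants reproduce the Leu-enkephalin (C28H37N5O7) lock mass of m/z 556.2766 quoted above. Function names are illustrative.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688


def monoisotopic_mass(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C12H18N2O7'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str) -> float:
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


print(round(mz_protonated("C28H37N5O7"), 4))  # Leu-enkephalin [M+H]+
print(round(mz_protonated("C12H18N2O7"), 4))  # BCM [M+H]+, assuming this formula
```

Under these assumptions BCM [M+H]+ is expected near m/z 303.1187, so a measured m/z of 303.1197 would correspond to an error of about +3.3 ppm.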
Isolation and characterization of BCM from Pseudomonas. Four 2-liter flasks containing 500 ml of BCMM were each inoculated with 20 ml of an overnight SCFM seed culture of SBW25/pJH-BCMclp-PA. After 20 h of fermentation at 28°C with shaking at 250 rpm, the cells were removed by centrifugation, giving approximately 2 liters of cell-free supernatant. The supernatant was lyophilized, and the residue was redissolved in distilled water (0.6 liters). This aqueous solution was extracted with ethyl acetate (3 × 0.6 liters) and then with 1-butanol (3 × 3 liters). Each extract was evaporated to dryness, affording an ethyl acetate extract (0.014 g), a butanol extract (0.914 g), and an aqueous extract (7.06 g). LC-MS analysis showed that the target compounds were mainly in the butanol and aqueous extracts. A portion of the aqueous extract (0.202 g) and all of the butanol extract were subjected to solid-phase extraction (SPE) on a C18 cartridge (DSC-18, 20 ml) using a gradient of H2O-MeOH (100:0 to 80:20). Fractions containing BCM were combined and further purified by semipreparative HPLC (Phenomenex Luna PFP [2], 250 mm by 10 mm, 5 µm; 2 ml/min; UV detection at 218 nm) using a linear gradient of 2 to 35% MeOH in H2O over 35 min, yielding bicyclomycin (3.3 mg; retention time = 30.2 min). One-dimensional (1D) and 2D NMR spectra were recorded at a 1H resonance frequency of 400 MHz and a 13C resonance frequency of 100 MHz on a Bruker Avance 400 MHz NMR spectrometer operated with TopSpin 2.0 software. Spectra were calibrated to the residual solvent signals of CD3…
Identification of bcm gene clusters in sequenced bacteria. The sequences used for the phylogenetic analyses in this work were retrieved as follows. A BLASTP search against the NCBI nonredundant protein sequence database was carried out using the CDPS BcmA from S.
cinnamoneus as the query, and the accession numbers of the resulting 73 hits were retrieved. These accession numbers were then used as input for Batch Entrez (https://www.ncbi.nlm.nih.gov/sites/batchentrez) to retrieve all associated genomic records in the RefSeq nucleotide database (i.e., genomic sequences encoding the protein identifiers [IDs] recovered by BLASTP). This yielded 754 nucleotide records, which were then analyzed using MultiGeneBlast (71) to determine which contained the complete bcm gene cluster. Thirty of the 754 sequences were discarded because they did not contain the bcm gene cluster or were truncated. Based on the associated metadata, 217 P. aeruginosa sequences (accession numbers NZ_LCSU01000019.1 to NZ_LFDI01000014.1, ordered by taxonomic ID) were excluded to avoid overestimating cluster conservation, since they had all been isolated from a single patient (72). An additional 134 P. aeruginosa sequences (accession numbers NZ_FRFJ01000027.1 to NZ_FUEJ01000078.1, ordered by taxonomic ID) were also excluded because they lacked associated metadata, which prevented an assessment of the diversity of the sample set. Finally, the sequence with accession number NZ_LLUU01000091.1 was discarded because it contained a stretch of undetermined nucleotides (Ns) within the bcm gene cluster. This resulted in a final data set of 374 sequences: 372 putative bcm gene clusters (Data Set S1) plus the gene clusters from S. cinnamoneus DSM 41675 and P. aeruginosa SCV20265. For downstream formatting of the data set sequences, scripts or programs that could process multiple inputs in parallel were run via GNU Parallel (73).
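The filtering steps can be checked as a running tally: starting from 754 records, the four exclusions leave 372 putative clusters, and adding the two reference clusters gives the 374-sequence data set. The sketch below simply encodes the counts stated in the text.

```python
from typing import List, Tuple

# Sequential exclusions described in the text: (reason, records removed).
STEPS: List[Tuple[str, int]] = [
    ("no complete bcm cluster, or truncated sequence", 30),
    ("P. aeruginosa isolates from a single patient", 217),
    ("P. aeruginosa records lacking metadata", 134),
    ("undetermined nucleotides (Ns) within the cluster", 1),
]


def remaining_after(start: int, steps: List[Tuple[str, int]]) -> int:
    """Apply each exclusion in order, printing the running total."""
    n = start
    for reason, removed in steps:
        n -= removed
        print(f"{reason:50s} -{removed:4d} -> {n}")
    return n


putative = remaining_after(754, STEPS)
final = putative + 2  # add the S. cinnamoneus and P. aeruginosa SCV20265 clusters
print(putative, final)
```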
Phylogenetic analysis of the bcm gene cluster. Nucleotide sequences of the 374 data set clusters were trimmed to span the region from 200 bp upstream of the start of bcmA to 200 bp downstream of the end of bcmG (average length, 7,224 bp). Phylogenetic analyses were carried out with MUSCLE and RAxML, run through the CIPRES Science Gateway (74) and T-REX (75), and the trees were visualized and edited using iTOL (76).
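A minimal sketch of the trimming step follows, under stated assumptions: gene coordinates are 0-based, half-open (start, end) positions on the contig's top strand; bcmA and bcmG are the outermost genes of the trimmed region; and clusters on the reverse strand are reverse-complemented so that all sequences share one orientation before alignment (the orientation step is our assumption, not a documented part of the pipeline). The window is clipped at contig ends. The function name is hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extract_cluster(contig: str, bcmA: Tuple[int, int], bcmG: Tuple[int, int],
                    minus_strand: bool, flank: int = 200) -> str:
    """Slice from `flank` bp beyond bcmA to `flank` bp beyond bcmG.

    Taking min/max of both genes' coordinates gives the same window whether the
    cluster lies on the top or bottom strand; it is then oriented bcmA-first.
    """
    lo = max(0, min(bcmA[0], bcmG[0]) - flank)
    hi = min(len(contig), max(bcmA[1], bcmG[1]) + flank)
    region = contig[lo:hi].upper()
    if minus_strand:
        region = region.translate(COMPLEMENT)[::-1]
    return region
```

Using the minimum and maximum of the two genes avoids separate coordinate logic for each strand: on the reverse strand, "upstream of bcmA" lies at higher contig coordinates, which the max() already covers.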
The resulting PHYLIP interleaved output file was then used to generate a maximum likelihood phylogenetic tree using RAxML (78). The program was configured to perform rapid bootstrapping (BS) with up to a maximum of 1,000 BS replicate searches (or until convergence was reached), followed by a maximum likelihood search to identify the best tree, with the following command-line parameters: Raxml -T 4 -N autoMRE -n correctorientcluster -s infile.txt -c 25 -m GTRCAT -p 12345 -k -f a -x 12345.
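For readers reproducing the run, the same RAxML parameters are listed below as an argument list with a comment on each flag, following the RAxML version 8 option meanings. The executable name raxmlHPC-PTHREADS is an assumption (the multithreaded build is needed for -T); the text writes it as Raxml. The sketch only builds and prints the command and does not run it.

```python
import shlex
from typing import List


def raxml_args(alignment: str, run_name: str, threads: int = 4,
               executable: str = "raxmlHPC-PTHREADS") -> List[str]:
    """Argument list equivalent to the command given in the text."""
    return [
        executable,          # assumed binary name; depends on the local build
        "-T", str(threads),  # number of threads (PTHREADS build only)
        "-f", "a",           # rapid bootstrap analysis plus best-scoring ML tree search in one run
        "-x", "12345",       # random seed for rapid bootstrapping
        "-p", "12345",       # random seed for the parsimony starting trees
        "-N", "autoMRE",     # bootstopping: stop when the MRE criterion converges (max 1,000 replicates)
        "-m", "GTRCAT",      # GTR nucleotide model with the CAT approximation of rate heterogeneity
        "-c", "25",          # number of distinct rate categories for CAT
        "-k",                # keep branch lengths on the bootstrapped trees
        "-s", alignment,     # input alignment in PHYLIP format
        "-n", run_name,      # suffix appended to all output file names
    ]


print(shlex.join(raxml_args("infile.txt", "correctorientcluster")))
```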
During the phylogenetic analysis with RAxML, 225 sequences were found to be identical to other sequences in the data set and were removed to streamline the analysis of cluster phylogeny. After the analysis, the sequence with accession number NZ_LLQO01000184.1 was also found to be truncated and was removed from the phylogenetic tree, which then contained 148 nonredundant entries.
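Removing exact duplicates amounts to keeping the first record for each distinct sequence and recording which identifiers it stands in for; with the counts above, 374 − 225 − 1 = 148 entries remain. A minimal sketch is shown below; comparison is case-insensitive on the aligned sequences, and the function name is illustrative rather than part of any tool used here.

```python
from typing import Dict, List, Tuple


def dedupe_identical(records: List[Tuple[str, str]]
                     ) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Keep the first record of each distinct sequence.

    Returns the kept (id, sequence) records and a map from each kept id to the
    ids of the identical sequences that were dropped in its favor.
    """
    first_by_seq: Dict[str, str] = {}
    kept: List[Tuple[str, str]] = []
    dropped: Dict[str, List[str]] = {}
    for rec_id, seq in records:
        key = seq.upper()
        if key in first_by_seq:
            dropped.setdefault(first_by_seq[key], []).append(rec_id)
        else:
            first_by_seq[key] = rec_id
            kept.append((rec_id, seq))
    return kept, dropped
```

Keeping the dropped-to-kept mapping makes it possible to annotate each tree leaf with the other strains that carry an identical cluster.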
For the phylogenetic analyses of the 2-OG-dependent dioxygenases, the amino acid sequences of BcmB, BcmC, BcmE, BcmF, and BcmG from S. cinnamoneus, P. aeruginosa SCV20265, and a strain subset including all representatives from Streptomyces, Actinokineospora, Williamsia, Burkholderia, and Tistrella spp., as well as two from Mycobacterium and seven from Pseudomonas, were retrieved and aligned with MUSCLE (with the same parameters as before except for -seqtype protein -hydro 5 -hydrofactor 1.2). A maximum likelihood phylogenetic tree was then generated with RAxML using the model -m PROTGAMMABLOSUM62, with protein BP3529 from Bordetella pertussis (accession no. P0A3X2.1) as the outgroup.
Analysis of the genomic context of the bcm gene cluster. For all sequences containing the bcm gene cluster, a 20-kb region around bcmA was retrieved and reannotated using Prokka. A subset of these sequences (all Gram-positive bacteria, plus Burkholderia, Tistrella, and several Pseudomonas strains) was analyzed for conserved domains using the NCBI Conserved Domain Database (CDD) (79), and mobile genetic elements were identified by manual analysis.
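The text does not specify how the 20-kb region was positioned. The sketch below assumes a window centered on the midpoint of bcmA (10 kb on each side), using 0-based, half-open coordinates, and clips it at contig ends rather than shifting it, so clusters near a contig edge yield shorter regions. The function name is hypothetical.

```python
from typing import Tuple


def context_window(contig_len: int, gene_start: int, gene_end: int,
                   width: int = 20_000) -> Tuple[int, int]:
    """0-based, half-open window of up to `width` bp centered on the gene.

    Assumption: centered on the gene midpoint and clipped (not shifted) at contig ends.
    """
    center = (gene_start + gene_end) // 2
    half = width // 2
    lo = max(0, center - half)
    hi = min(contig_len, center + half)
    return lo, hi
```

Clipping instead of shifting keeps bcmA at a predictable position relative to the window, which simplifies comparing flanking genes across strains; the trade-off is that short contigs contribute less context.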