LibraGen S.A.,1 Écologie Microbienne, UMR CNRS 5557,2 Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Claude Bernard, Villeurbanne, France3
Received 6 May 2004/ Accepted 10 May 2004
| INTRODUCTION |
|---|
Based on physiological studies, the number of cultivated bacteria is expected to increase significantly in the near future, thus providing new bacterial isolates for screening tests (17). However, to circumvent the limits of cultivation, another approach has been developed that consists of screening recombinant bacteria that could express genes from the metagenome, defined as all bacterial genomes of a given environment (6, 19, 21, 24). The value of this metagenomic approach has already been demonstrated by the analysis of clones containing ribosomal genes. Phylogenetic studies of 16S rRNA genes indicate that metagenomic DNA encompasses a large bacterial diversity, including uncultivated bacteria and even unknown bacterial phyla (4, 19, 29).
Beyond the descriptive analysis of diversity, the metagenome has been shown to allow the functional identification of bacterial genes that encode bioactive compounds (11), new polyketide synthases (6, 21), and even new functions like a membrane-associated proteolytic system (4). The functional analysis of metagenomic clones requires their genes, operons, or biosynthetic pathways to be entirely cloned and then transferred into an adapted host for heterologous expression. The construction of metagenomic libraries makes such genetic manipulation possible. The screening of large libraries for biosynthetic genes was shown to detect numerous potentially interesting clones (6). Because of the technical difficulties encountered with heterologous expression, production of the expressed compound, and chemical analysis of the compound, the choice of clones to study is crucial. Type I polyketide synthases (PKSI) synthesize natural products of therapeutic interest such as erythromycin, rapamycin, or epothilone, and their modular organization facilitates the selection of promising clones.
Properties of PKSI make them particularly well suited for the metagenomic DNA library approach. These large multienzymes are composed of a succession of modules. A loading module loads and activates the first substrate. Then each extender module catalyzes an elongation step with condensation of extender units onto the growing polyketide chain (28). A minimal extender module is composed of three domains: a ketosynthase (KS) domain for decarboxylative condensation of the extender unit onto the growing chain, an acyl transferase (AT) domain for substrate selection, activation, and transfer, and an acyl carrier protein (ACP), which loads the growing chain. Each substrate can be reductively tailored by additional domains, ketoreductase, enoylreductase, and dehydratase (14, 27). A thioesterase domain is often localized after the PKSI extender modules and catalyzes the release of the completed polyketide chain.
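The modular organization described above can be written down as a small data structure. The following is only an illustrative sketch: the class name `Module`, the abbreviations KR, ER, and DH (shorthand chosen here for ketoreductase, enoylreductase, and dehydratase), and the example module are assumptions made for this example and are not taken from the study.

```python
from dataclasses import dataclass

# Domains of a minimal extender module, as listed in the text.
MINIMAL_EXTENDER = ("KS", "AT", "ACP")
# Shorthand (assumed here) for the optional reductive tailoring domains.
REDUCTIVE = ("KR", "ER", "DH")


@dataclass(frozen=True)
class Module:
    """One PKSI module, described by its ordered list of domain names."""
    domains: tuple

    def is_minimal_extender(self):
        # True when KS, AT, and ACP are all present.
        return all(d in self.domains for d in MINIMAL_EXTENDER)

    def reductive_domains(self):
        # Tailoring domains present in this module, in module order.
        return [d for d in self.domains if d in REDUCTIVE]


# Hypothetical extender module carrying one tailoring domain.
example = Module(("KS", "AT", "KR", "ACP"))
```

Such a representation makes it straightforward to count KS and AT domains per clone before any phylogenetic analysis.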
The method for selecting the promising clones is supplied by phylogenetic analysis of PKSI domains. Phylogenies of the protein sequences of KS and AT domains led to (i) the determination of the taxonomic position of the donor DNA, since most KS domains from the order Actinomycetales are monophyletic (20, 21), (ii) the identification of unusual KS domain functions, such as KS from loading modules and from hybrid nonribosomal peptide synthases (NRPS)/PKS (20, 21), (iii) the prediction of the incorporated substrate of each module by AT phylogeny (13, 15, 26, 34), and (iv) the prediction of the polyketide novelty. Indeed, phylogenetic analysis shows that KS domains with a usual function from a given PKSI operon tend to cluster (11a, 20). This clustering can be linked with the synthesized polyketide. To select the promising clones containing PKSI genes, their KS domains have to be analyzed through KS phylogeny to predict their functional novelty, which is defined as the novelty of their potentially synthesized polyketide. In other words, the branching of uncharacterized KS domains within clusters of published PKSI operons can lead to the exclusion of these uncharacterized PKSI from further experiments.
Among PKSI domains, the KS domain is the most conserved (3, 20, 21). Thus, we designed PCR primers in the conserved regions of the KS domain to detect PKSI genes in recombinant clones from a large metagenomic soil library. Three of 139 detected positive PKSI clones were entirely sequenced. The KS and AT domains of these three metagenomic clones were analyzed and compared to the domains from 23 previously published PKSI. This method led to the detection of numerous PKSI-positive clones and to the selection of two promising clones for the potential production of new compounds.
| MATERIALS AND METHODS |
|---|
DNA extraction was performed by using a centrifugation-based separation of bacteria from soil particles, followed by the incorporation of bacteria in agarose before a gentle bacterial lysis as described by Nalin et al. (R. Nalin, P. Robe, and V. Tran Van, 11 January 2001, Method for indirectly extracting noncultivable DNA organisms and DNA by said method, French Patent Office). The Nycodenz-mediated extraction of bacteria from the soil matrix was achieved as previously described (5). The bacterial pellets were resuspended in a 50 mM Tris (pH 8.0), 100 mM EDTA buffer, mixed with an equal volume of molten 1.6% Incert agarose (BMA), and then transferred into disposable plug molds (Bio-Rad). The lysis of the soil bacteria was then performed in agarose. Agarose plugs were first transferred in 45 ml of LA lysis buffer (50 mM Tris [pH 8.0], 100 mM EDTA, 5 mg of lysozyme/ml, 0.5 mg of achromopeptidase/ml) and incubated at 37°C for 6 h. The agarose plugs were then incubated in 45 ml of SP lysis buffer (50 mM Tris [pH 8.0], 100 mM EDTA, 1% lauryl sarcosyl, 2 mg of proteinase K/ml) at 55°C for 24 h. An additional incubation for 24 h was performed with fresh SP buffer. Agarose plugs were finally equilibrated in a 10 mM Tris (pH 8.0), 1 mM EDTA storage buffer.
Construction of the metagenomic library.
High-molecular-weight bacterial DNA trapped in agarose plugs was immediately inserted into the wells of a 0.8% low-melting-temperature gel (Bio-Rad) and separated for 18 h by pulsed-field gel electrophoresis at 4.5 V/cm with 5- to 40-s pulse times with a CHEF-DRIII apparatus (Bio-Rad). DNA fragments ranging between 35 and 48 kbp were isolated and then recovered from the gel with GELase (Epicentre Technologies). Metagenomic DNA was then cloned into fosmids by using the EpiFos fosmid library production kit (Epicentre Technologies) as recommended by the manufacturer. Recombinant colonies were transferred to 96-well microtiter plates containing freezing medium (Luria-Bertani, 20% glycerol complemented with 12.5 µg of chloramphenicol/ml). After growing at 37°C for 22 h, the plates were stored at -20°C.
PCR screening of clones for PKSI genes.
Overnight cultures of 1 ml per well in 96-deepwell plates (22 h, 37°C, 250 rpm shaking) were pooled plate by plate and purified with the Nucleobond PC100 kit (Macherey Nagel) by following the instructions of the manufacturer. Purified DNA from these 96-clone pools (100 to 500 ng) was used as a template. Primers KSLF (5'-CCSCAGSAGCGCSTSYTSCTSGA-3') and KSLR (5'-GTSCCSGTSCCGTGSGYSTCSA-3') were designed based on the conserved KS domain motifs. The specific fragment amplified with KSLF-KSLR is about 700 bp in length. PCR amplification on the pooled DNA was performed with recombinant Taq DNA polymerase (Sigma) as follows: a denaturation step at 96°C for 5 min; 7 cycles consisting of 1 min at 96°C, 1 min at 65°C (annealing temperature lowered by 1°C per cycle), and 1 min at 72°C; 40 cycles consisting of 1 min at 96°C, 1 min at 58°C, and 1 min at 72°C; and a final extension for 7 min at 72°C.
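As an illustration of the degenerate primers and the touchdown program, the sketch below computes the number of distinct oligonucleotides represented by each primer and lists the annealing temperature of every cycle. It assumes the standard IUPAC meaning of the two ambiguity codes (S = C or G, Y = C or T) and that the first touchdown cycle anneals at 65°C; the function names are hypothetical and are not part of the original protocol.

```python
from itertools import product

# IUPAC nucleotide codes used in the two primers (S = C/G, Y = C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "Y": "CT"}

KSLF = "CCSCAGSAGCGCSTSYTSCTSGA"
KSLR = "GTSCCSGTSCCGTGSGYSTCSA"


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Enumerate every non-degenerate sequence covered by the primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def touchdown_annealing(start=65, step=1, td_cycles=7,
                        plateau=58, plateau_cycles=40):
    """Annealing temperature (°C) for each cycle, in order.

    Assumption: the first touchdown cycle uses `start`, and each
    following touchdown cycle is `step` degrees lower.
    """
    touchdown = [start - step * i for i in range(td_cycles)]
    return touchdown + [plateau] * plateau_cycles


schedule = touchdown_annealing()
```

Under these assumptions the touchdown phase ends at 59°C, one degree above the 58°C plateau, so the two phases join without a jump in annealing temperature.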
Localization of PKSI-positive clones.
The microtiter plate positions of PKSI-positive clones were determined by colony hybridization. A PCR fragment obtained with the degenerate PKSI primer set from four positive pools was used to generate the probe. The four PCR products were mixed to hybridize the four respective microtiter plates. This probe mixing minimized the number of labeling reactions. The mixed probes were labeled with [α-33P]dCTP by using the random priming DNA labeling kit (Roche) in accordance with the manufacturer's protocol. Transformants were spotted onto GeneScreen Plus (NEF988) nylon membranes previously laid onto Luria-Bertani agar plates and then incubated at 37°C for 18 h. Colonies were lysed by incubating the membranes for 15 min on a sheet of 3M paper (Whatman) saturated with 0.5 M NaOH-1.5 M NaCl. The membranes were then neutralized by incubation for 15 min on a sheet of 3M paper (Whatman) saturated with 1.5 M NaCl-1 M Tris (pH 7.5). After drying at room temperature for 20 min, immobilization of DNA on membranes was performed by the UV cross-linking technique (312 nm for 4 min). Prehybridization was carried out for 2 h under the conditions used for hybridization. Hybridization with the probe was performed for 16 h at 68°C with a 1% sodium dodecyl sulfate-5x Denhardt's-1 M NaCl solution. Membranes were washed sequentially at 68°C in (i) 2x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) for 10 min, (ii) 2x SSC-0.1% sodium dodecyl sulfate for 20 min, and (iii) 1x SSC for 10 min. The hybridization signals were visualized after exposure for 4 h by using a PhosphorImager (Bio-Rad GS-525).
Sequencing.
Fosmid inserts were sequenced by using both transposon-mediated and shotgun subcloning approaches. Transposition was performed by using the transposition kit (Epicentre) according to the manufacturer's instructions. For subcloning, the purified fosmid DNA was partially digested with Sau3AI. Restriction fragments ranging from 1 to 3 kbp were size selected by standard gel electrophoresis and then cloned into the pBC SK (+/-) vector (Stratagene) (25). In addition, PCR products of about 700 bp were purified from an agarose gel with a gel extraction kit (Qiagen) and then cloned by using the Topo PCR II kit (Invitrogen). Recombinant plasmids were purified by using the QIAprep plasmid extraction kit (Qiagen) and sequenced with forward and reverse M13 primers. Sequencing reactions were performed with the DTCS cycle sequencing kit (Beckman Coulter) as recommended by the supplier. Sequencing reactions were run on a CEQ 2000 sequencer (Beckman Coulter). Then the overlaps of at least four independent clones were assembled.
Phylogenetic analysis.
The protein sequences of KS and AT domains detected in the metagenomic library were aligned with a large set of published sequences (supplemental table available at http://web.libragen.com/Phylogeny/sup_table.html). This set contains 23 PKSI clusters representing 203 KS domains and 207 AT domains. The KS and AT domains from three clones detected in the metagenomic library, named Lib4, Lib7, and Lib10, were included and aligned with respective published sequences by using DbClustal (30). Alignments were manually corrected by using Seaview (10). Phylogenetic reconstructions were performed with Phylo_win (10) for the distance method by using neighbor joining (NJ) and the PAM matrix. The program PhyML (12) was used for the maximum-likelihood (ML) method by using BIONJ with the JTT model of substitution (16). All trees were built with 500 bootstrap replicates. For the ML reconstruction, the 500 data sets were generated by using SEQBOOT from the PHYLIP, version 3.57c, package (7). A tree was built for each replicate with PhyML, and then bootstraps were computed with CONSENSE. All trees were drawn with NJplot (22). The trees were rooted by a fatty acid synthase (mas gene, Uniprot/Swiss-Prot database accession number M95808).
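The bootstrap step can be illustrated by a minimal sketch of column resampling, which is the principle applied when replicate data sets are generated from an alignment. This is not the SEQBOOT implementation: the function names, the fixed random seed, and the toy three-sequence alignment are assumptions introduced purely for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` maps a sequence name to its aligned sequence; all
    sequences must have the same length. Whole columns are drawn, so
    the residues of a given site stay together across sequences.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=500, seed=0):
    """Generate `n` pseudo-replicate alignments (500 as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment, not real KS sequences.
toy = {"seqA": "VDTACSSS", "seqB": "VQTACSTS", "seqC": "VDTAQSSS"}
replicates = bootstrap_replicates(toy)
```

A tree would then be built from each replicate, and the proportion of replicate trees that recover a given node gives its bootstrap value.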
Nucleotide sequence accession number.
The three nucleotide sequences Lib4, Lib7, and Lib10 encoding the KS and AT domain regions have been assigned the accession numbers AJ639921, AJ639922, and AJ639923, respectively, in the EMBL database. The accession numbers for published PKSI sequences are reported in the supplemental table (http://web.libragen.com/Phylogeny/sup_table.html).
| RESULTS AND DISCUSSION |
|---|
PCR-based metagenomic library screening for PKSI genes.
Recombinant fosmid DNA from 60,000 of the 100,000 clones of the soil metagenomic library was extracted for use as a template for PCR screening according to the protocol described in Materials and Methods. Primers were designed based on the most conserved DNA regions of the KS domain and gave a positive PCR response for 139 clones. Further analysis was restricted to 39 randomly chosen fosmids, whose PCR products were subsequently cloned and sequenced. All 39 DNA sequences were unique. The 39 deduced protein sequences never exceeded 67% identity to published sequences (maximum of 141 of 210 amino acid identities) according to BLAST analysis (1). These results confirm that the diversity level in the metagenomic DNA library was relatively high, as described for a previous 5,000-clone-rich library in which 11 different KS domain sequences were detected (6).
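The 67% figure follows directly from the reported counts (141 identical residues over 210 aligned positions). The short sketch below reproduces that arithmetic and shows one simple way to compute percent identity from two aligned sequences; the function name and the convention of counting gap columns as mismatches are assumptions of this example, and BLAST's own reporting may differ in detail.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of aligned columns.

    Assumption: both sequences come from the same alignment (equal
    length, '-' for gaps) and gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Best hit reported in the text: 141 identities over 210 positions.
best_hit = 100.0 * 141 / 210

# Two motifs quoted in this article, used here only as a toy alignment.
toy = percent_identity("VDTACSSS", "VQTACSTS")
```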
Phylogenetic analysis of the detected PKSI genes.
An open reading frame encoding a KS domain active site was detected in each of the 39 DNA sequences amplified from the metagenomic DNA library clones. Among the 39 sequences, 11 displayed a unique pattern, N(DE)KD, 22 amino acids upstream from the cysteine active site in the KS domain, and the conserved pattern VDTACSSS was replaced by VQTACSTS (the substitutions are D to Q and S to T). These two patterns were shown to identify KS domains belonging to hybrids between NRPS and PKSI (21) and, more precisely, KS domains preceded by an NRPS, thus acting on an amino acid chain (11a).
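The two sequence patterns lend themselves to a simple motif scan. The sketch below is a hedged illustration only: it searches for N(DE)KD anywhere in a window upstream of the active-site motif rather than at an exact offset, because the precise reference point for the 22-residue distance is not specified here. The window size, the category labels, and the toy sequences are assumptions and do not come from the study.

```python
import re

HYBRID_SITE = "VQTACSTS"    # active-site pattern of the 11-clone group
TYPICAL_SITE = "VDTACSSS"   # conserved active-site pattern
UPSTREAM_MOTIF = re.compile(r"N[DE]KD")


def classify_ks(protein, window=40):
    """Classify a KS fragment from the two patterns described in the text.

    Assumption: N(DE)KD is accepted anywhere within `window` residues
    upstream of the variant active-site motif.
    """
    pos = protein.find(HYBRID_SITE)
    if pos != -1:
        upstream = protein[max(0, pos - window):pos]
        if UPSTREAM_MOTIF.search(upstream):
            return "hybrid-like"
        return "variant active site only"
    if TYPICAL_SITE in protein:
        return "typical"
    return "unclassified"


# Hypothetical toy fragments (poly-G spacers), not real KS sequences.
hybrid_toy = "MA" + "NEKD" + "G" * 18 + "VQTACSTS" + "LL"
typical_toy = "MA" + "G" * 22 + "VDTACSSS" + "LL"
```

A scan of this kind can only flag candidates; in the study, assignment to the hybrid group rests on the phylogenetic analysis.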
For further analysis, complete insert sequences were obtained for 3 of the 39 clones, Lib4, Lib7, and Lib10. Lib10 was selected as a representative of the 11-clone group that could exhibit the presence of an NRPS domain upstream of the KS domain. Lib4 and Lib7 were chosen because the extremities of their fosmid inserts did not contain any PKS genes, and thus, their complete biosynthetic pathway genes were expected to be contained in the cloned DNA inserts.
Analysis of KS domains.
Only one KS domain was found in Lib4, Lib7, and Lib10 clone sequences, indicating that the potentially synthesized compounds would not exhibit a typical linear polyketide structure. Phylogenetic analyses were carried out with the objective of locating these metagenomic KS domains inside the phylogenetic tree built with the KS domains from 23 PKSI published sequences. Both reconstruction methods, distance and ML, provided the same tree topologies, suggesting that observed groups do not result from computational artifacts.
The KS domain sequences of the Lib4, Lib7, and Lib10 clones exhibited low similarity values to those available in GenBank (58, 61, and 58% BLASTP identities with their closest neighbors, respectively) and were not identical to each other.
None of the KS domains from the metagenomic clones Lib4, Lib7, and Lib10 clustered within the Actinomycetales group (bootstrap values, 100 [ML] and 97 [NJ]), suggesting that these clones probably do not belong to this order (Fig. 1). However, such a hypothesis cannot be totally supported considering that functional constraints could have led some genes to evolve differently than the other genes that encode typical KS domains. For instance, KS domains that act not on a usual incoming acyl chain but on an amino acid chain were found to cluster together (called a hybrid group) and separately from the other KS domains (20, 21), independently of their taxonomic positions. An NRPS module systematically precedes these unusual KS domains. Interestingly, an NRPS gene was also detected upstream of the KS domain in the Lib10 clone, which strongly clustered in the hybrid group (Fig. 1).
Analysis of the complete sequences confirmed that the Lib4 and Lib7 inserts contained all of the genes coding for a complete biosynthetic pathway. The two KS domains from Lib4 and Lib7 belong to the loading module of their respective PKSI. They did not exhibit the active-site mutation specific to KSQ domains, in which cysteine is replaced by glutamine. These KSQ domains have lost their condensation activity but still decarboxylate the ACP-bound dicarboxylic acid, giving rise to the initial substrate (33). Moreover, as these modules contain only one AT domain, they cannot be classified in the starting module group that presents the organization ACP-KS-AT-AT-ACP (18). As expected, distance and ML phylogenetic methods did not include the KS domains from Lib4 and Lib7 in the KSQ/2AT group, although they were close to it (Fig. 1). Thus, the KS domains from Lib4 and Lib7 may have a usual function and are reliable for novelty prediction.
Substrate specificity prediction of AT domains.
The chemical structure of the final compound is dictated by (i) the incorporated substrate recognized by AT domains, (ii) the degree of the reduction cycle catalyzed by additional domains of each module (14), and (iii) the number of modules and their succession. Substrate recognition is a major factor influencing polyketide structure and diversity. In most cases, the incorporated substrates are predicted by phylogenetic analyses of AT domains (13, 15, 26, 34).
AT domains specific to malonyl and methylmalonyl were found to cluster in two separate groups. However, the malonyl node was not supported by high bootstrap values when five AT domains (pltB, mtaBbis, mtaB2, mcyD, and mxaC2) were included in the reconstruction. A second phylogenetic reconstruction performed without these five protein sequences showed that the typical malonyl-incorporating sequences clustered together with bootstrap values of 100 (ML) and 100 (NJ) (Fig. 2). The two AT domains from Lib7 and Lib10 clustered in this reliable malonyl group and are therefore predicted to incorporate malonyl (Fig. 2).
The majority of AT domains from loading modules of non-Actinomycetales fall in the unresolved group (Fig. 2). AT domains from loading modules can accept different loading units with a lower specificity, possibly to adapt to the substrate availability in the cell (9). The AT domain from Lib4 did not cluster within either of the reliable malonyl or methylmalonyl groups (Fig. 2); thus, no prediction of substrate incorporation can be made for it on the basis of phylogenetic analysis. Moreover, the Yadav prediction for the Lib4 AT domain provided similar expect values (5e-70 to 3e-58) for various substrates, including 3-methylbutyryl, which ranked first (Table 2). Since this AT domain belongs to a loading module, which is also predicted by the Yadav model, it may recognize several substrates.
The organization of clone libraries containing large fragments of metagenomic DNA provides access to a wide diversity of uncharacterized genes, operons, and biosynthetic pathways. By targeting PKSI genes, our results and those of a previous report (6) indicate that about 0.23% of the clones in a metagenomic library (139 PCR hits for 60,000 clones screened) can be expected to carry PKSI genes. This frequency and the absence of redundancy observed in this study confirmed the potential and the quality of this soil library. Since large and homogeneous libraries of more than 100,000 clones are available, sorting the numerous detected clones is an important challenge. Indeed, the transfer into an adapted host and the identification of conditions for their heterologous expression are tedious to perform. Moreover, even when this heterologous expression is achieved, chemical analysis of natural compounds is often difficult. Our study demonstrates that complete KS and AT sequences from PKSI provide enough fundamental information to select the promising clones. This information includes the prediction of novelty of the potentially synthesized compound and the incorporated substrates. These useful predictions will help the chemical characterization of the polyketide. Thus, this method will enable investigators to decide which clones deserve to be studied further. In this study, we detected and selected Lib4 and Lib7 as the most pertinent clones for potentially novel active compounds.
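The hit frequency quoted above can be checked with a few lines of arithmetic. The sketch below recomputes it from the reported counts and, purely as an extrapolation, estimates how many PKSI-positive clones the full 100,000-clone library would contain if the unscreened clones carried PKSI genes at the same rate; that uniform-rate assumption and the function name are introduced here for illustration.

```python
def hit_frequency(hits, screened):
    """PCR-positive clones as a percentage of clones screened."""
    return 100.0 * hits / screened


# Counts reported in the text.
HITS = 139
SCREENED = 60_000
LIBRARY_SIZE = 100_000

frequency = hit_frequency(HITS, SCREENED)

# Extrapolation under the assumption of a uniform hit rate.
expected_in_library = HITS / SCREENED * LIBRARY_SIZE
```

Under this assumption, a little over 230 PKSI-positive clones would be expected in the whole library, which underlines why a sorting criterion is needed before heterologous expression is attempted.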
| ACKNOWLEDGMENTS |
|---|
This work is part of the project "Développement et exploitation de librairies d'ADN metagénomique," which was funded by Région Rhône-Alpes (Thématiques Prioritaires, Sciences Analytiques Appliquées) and was supported by the Agence Nationale de Valorisation de la Recherche.