Applied and Environmental Microbiology, April 2004, p. 2429-2436, Vol. 70, No. 4
0099-2240/04/$08.00+0 DOI: 10.1128/AEM.70.4.2429-2436.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Toby Richardson, Aileen Milan, Mark Miller, David P. Weiner, Kelvin Wong, Jeff McQuaid, Bob Farwell, Lori A. Preston, Xuqiu Tan, Marjory A. Snead, Martin Keller, Eric Mathur, Patricia L. Kretz, Mark J. Burk, and Jay M. Short
Diversa Corporation, San Diego, California 92121
Received 18 September 2003/ Accepted 22 December 2003
The protein sequence space parsed by modern science represents a small fraction of the genetic information available in the biosphere. This has become apparent from recent microbiological efforts targeting a spectrum of physical and chemical environments. These studies have shown that biotopes at extremes of temperature, pressure, pH, salinity, etc., are rich in microbial biodiversity. It is also clear that the physical properties and chemical specificities of the associated gene products reflect the extrinsic and intrinsic physical properties of the biotope and its own participation in a unique microscopic chemical cycle. It has been estimated that <0.1% of all eubacteria and archaea have been cultured and characterized (1, 20, 27). The geography of this potential sequence space promises the existence of new enzymes with fitness parameters compatible with highly selective industrial transformations.
This study focused on the discovery of new nitrile-hydrolyzing enzymes from various environmental sources with the goal of catalyzing efficient chemo- and enantioselective carboxylic acid synthesis. Due to the favorable economics of their preparation, nitriles are attractive starting points in fine chemical manufacturing. However, their conversion to corresponding amides and carboxylic acids requires harsh conditions and generates significant waste (31). Enzymatic hydrolysis of nitriles can be performed under mild conditions and can also leverage the advantages of stereoselective enzymatic hydrolysis and nitrile reracemization to potentially realize 100% enantioselective conversions.
A broad spectrum of biotopes could be targeted for nitrile-hydrolyzing enzymes due to microbially assisted chemical cycles involving the mineralization of organocyanides. These compounds are intermediates in the metabolism of cyanide by plants, animals, and microbes. Cyanogenic glycosides and cyanolipids are found in plants, aminonitriles and cyanohydrins are found in fungi, mandelonitriles are found in arthropods, and a variety of nitrile compounds are found in microorganisms (15). It has been speculated that microbial nitrile metabolism is, at least in part, a consequence of eukaryotic-prokaryotic ecological relationships and that members of the nitrilase superfamily of genes have been horizontally transferred from plants to bacteria and archaea (18). Microbes metabolize cyanosugars, cyanohormones, such as plant auxins, and other organocyanide compounds produced for defense or storage by bacteria and eukaryotes. Also, some pathogenic fungi have cyanide hydratases that allow them to infect plants that synthesize large amounts of defensive alkenyl glucosinolates, which break down into isothiocyanates and nitriles (25).
Two general but evolutionarily distinct mechanisms have been described for enzymatic nitrile hydrolysis (5, 18). A two-step nitrile hydrolysis is catalyzed by the concerted action of a nitrile hydratase, producing an amide, and a coexpressed amidase, which completes the transformation to the carboxylic acid. A direct hydrolysis of a nitrile to its corresponding carboxylic acid, the mechanism of interest here, is catalyzed by a nitrilase. The nitrilases are a subcategory of the carbon-nitrogen hydrolase superfamily, whose members in general catalyze nonpeptide carbon-nitrogen bond hydrolysis (18). Sequence analysis and subdivision of the superfamily result in the categorization of 13 distinct branches that further sort by bond specificity. Enzymes in the superfamily catalyze amidase, carbamylase, N-acyltransferase, and nitrilase reactions. The inference from sequence and structure data for two of the superfamily members, one an N-carbamyl-D-amino acid amidohydrolase from Agrobacterium (17) and the other the NitFhit Rosetta Stone protein from Caenorhabditis (19), is that the enzymes utilize a thiol mechanism, contain a Glu-Lys-Cys catalytic triad, and fold into an α-β-β-α sandwich conformation (18). There are no structures for the nitrilase subfamily, though their mechanisms and structures are presumed to conform to that of the amidohydrolase.
This study sought to leverage the synthetic utility of the nitrilase subfamily (EC 3.5.5.1) by searching exhaustively through large numbers of microbial genomes for novel sequences. Using a novel culture-independent approach, environmental DNA samples derived from >600 biotopes were screened exhaustively for nitrilase gene products with a view toward development of a synthetic toolbox for chiral carboxylic acid synthesis. Each of the newly discovered nitrilases was screened for activity on three nitrile substrates which are hydrolyzed to yield mandelic acid, phenyl lactic acid, and 4-cyano-3-hydroxybutyric acid. The collection of a large and diverse group of homologous sequences and the characterization of their activities also provides an opportunity to explore principles of protein structure and evolution and their relationship to catalytic specificity.
High-quality, high-molecular-weight DNA was isolated directly from soil or water samples following separation of cells from the environmental matrix (27-29). Highly purified suspensions of microbial consortia were obtained by isopycnic density gradient centrifugation with Nycodenz (22). The resulting cell pellet was lysed by enzymatic and chemical digestions, followed by the isolation and purification of genomic DNA (33). After isolation from environmental contaminants, the genomic DNA was either digested using restriction enzymes or sheared to provide clonable fragments 1 to 10 kb in length.
The gene libraries were constructed in λZAP-based cloning vectors (Stratagene, La Jolla, Calif.). The libraries contained members with insert sizes of 1 to 10 kb. The libraries were propagated in the form of bacteriophage lambda and amplified to produce high-titer stocks. The lambda library constructed in λZAP was then converted into a phagemid library through in vivo excision (26).
Nitrilase selection.
To prepare clones for screening, each eDNA phagemid library was combined with the Escherichia coli host (λr Sup0 F'), suspended at an A600 of 1 in 10 mM MgSO4, allowed to adsorb for 15 min, and then incubated at 37°C for 45 min to allow expression of the kanamycin antibiotic resistance gene. Infected cells, now designated by kanamycin resistance, were plated on kanamycin Luria-Bertani (LB) plates and allowed to grow overnight at 30°C. Titer plates were also made to determine the infection efficiency. After overnight growth, the cells were pooled, washed, and resuspended in 10 mM MgSO4. The cell suspension was then used to inoculate 1 ml of liquid M9 medium (without nitrogen) supplemented with 10 mM nitrile substrate. M9 medium consisted of 1x M9 salts (with NH4Cl omitted), 0.1 mM CaCl2, 1 mM MgSO4, 0.2% glucose, and 10 mM of either adiponitrile or (R,S)-4-chloro-3-hydroxyglutaronitrile. These selection cultures were then incubated at 30°C with shaking at 200 rpm for up to 5 weeks. Libraries containing positive nitrilase clones were identified by visual observation of growth. Growth resulted from a clone's ability to hydrolyze the nitrile substrate (which was the sole nitrogen source), thus generating ammonia to support the growth of that clone. Positive clones were isolated from this liquid culture by streak purification onto solid LB medium. Several colonies from the streak plate were subsequently recultured in the same defined liquid medium. The phagemid DNA from the secondary cultures exhibiting growth was then isolated and sequenced to confirm the discovery of a nitrilase gene and to establish the unique nature of that gene. In general, one to three unique nitrilases were identified from each iteration of this process on a particular library. Unique nitrilase genes were then subcloned into an expression vector. The nitrilase genes were cloned in pSE420 (Invitrogen, San Diego, Calif.) and expressed in E. coli XL1 Blue MR (Stratagene). Expression was performed in LB or Terrific Broth (Difco) medium and induced with 0.1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) at 30°C.
Phylogenetic analysis.
For phylogenetic analysis, 150 nitrilase amino acid sequences were aligned using Clustal W, followed by manual refinement. The small regions in the global alignment that could not be reliably aligned were masked out, resulting in a data set containing 256 positions. A maximum-likelihood analysis was performed in ProML (Phylip version 3.6) (9), using the JTT substitution model (12) with equal rates, global rearrangements, and three random sequence addition replicates. The large data set made the use of more complex evolutionary models computationally prohibitive. We also performed a Bayesian phylogenetic inference using MrBayes (11). Four Monte Carlo Markov chains were run for 700,000 generations after stabilization of the likelihood values, generating 7,000 trees. A majority rule consensus tree was generated, and the percentage of the time a particular clade occurred, i.e., its posterior probability, was recorded at the nodes. Values of >80 to 85% were considered strong support and were considered equivalent to high-confidence values obtained by bootstrap analysis. The topology of the maximum-likelihood tree and that of the Bayesian consensus tree were virtually identical. The same topology was also obtained by a neighbor-joining analysis. The internal topologies of the individual clades were unchanged regardless of whether two plant (Arabidopsis and Oryza) nitrilase genes were included as outgroups.
An alignment of the sequences used in the analysis is available on request.
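The clade support values described above (the percentage of sampled trees in which a clade occurs) can be illustrated with a minimal sketch. This is not the MrBayes implementation: it assumes each sampled tree has already been reduced to a set of clades, each clade being a frozenset of taxon names, and the function names and toy data are hypothetical. For unrooted trees the same counting would be applied to bipartitions rather than clades.

```python
from collections import Counter


def clade_support(sampled_trees):
    """Return the fraction of sampled trees that contain each clade.

    Each tree is a collection of clades; each clade is a frozenset of
    taxon names. This representation is an assumption of the sketch.
    """
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(tree))  # count each clade at most once per tree
    n_trees = len(sampled_trees)
    return {clade: count / n_trees for clade, count in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep only clades present in more than `threshold` of the trees."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy posterior sample of four trees over taxa A to D (hypothetical data).
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
trees = [{AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_support(trees)      # AB occurs in 3 of 4 trees
consensus = majority_rule(support)  # only AB exceeds the 50% threshold
```

In this toy sample the clade {A, B} would carry a support value of 75%, which falls below the >80 to 85% level treated as strong support in this study.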
Biochemical assays for nitrilase specificity.
3-Hydroxyglutaronitrile was purchased from TCI America and was used as received. (R)-3-hydroxy-4-cyanobutyric acid was obtained from Gateway Chemical Technology (St. Louis, Mo.). Mandelonitrile, (R)- and (S)-mandelic acid, and (R)- and (S)-phenyllactic acid were purchased from Sigma Aldrich. The synthesis of phenylacetaldehyde cyanohydrin was described previously (8).
High-performance liquid chromatography assay methods for determination of reaction products have been described previously (8). Briefly, activity assays for mandelonitrile and phenylacetaldehyde cyanohydrin were conducted using 25 mM substrate and 0.6 mg of enzyme (lyophilized cell lysate)/ml in 0.25 ml of assay solution (0.1 M sodium phosphate buffer, pH 8, 10% MeOH) for 24 h. For 3-hydroxyglutaronitrile, assay conditions of 50 mM substrate, 1.2 mg of enzyme/ml, and pH 7 were used, with an incubation time of 48 h.
Nucleotide sequence accession numbers.
The nitrilase sequences were deposited in GenBank (accession numbers AY487426 to AY487562).
α-β-β-α sandwich fold, similar topologies, and characteristic sequence motifs (2). About 20 nitrilases have been identified in bacteria, half of them having been identified in sequenced genomes and not experimentally tested for activity. While members of the superfamily have been identified in archaea, no archaeal enzyme with nitrile-hydrolyzing activity has been reported.
A total of 651 environmental samples collected worldwide from terrestrial and aquatic microenvironments were processed into genomic eDNA libraries containing fragments of 1 to 10 kb. These fragments were cloned into a common bacterial expression vector. The process results in a population of 10⁶ to 10⁹ unique clones, each with the potential to express the individual or multiple genes on its discrete eDNA insert. Each environmental sample resulted in a single eDNA library, and one selection was performed per library. A high-throughput selection screen was designed in which clones were rescued if they were able to hydrolyze an added nitrile substrate and liberate ammonia in nitrogen-free growth medium. Selection substrates were chosen with regard to the substituent character. Adiponitrile was chosen as a small aliphatic dinitrile, and (R,S)-4-chloro-3-hydroxyglutaronitrile was chosen as a chiral substitute for 3-hydroxyglutaronitrile, a prochiral molecule that is useful after desymmetrization in the synthesis of the cholesterol-lowering drug Lipitor (8). Over 200 nitrilase expression hits were sequenced and shown to be unique at the DNA level. In addition, two nitrile hydratase-amidase gene combinations were identified by this screening strategy. They were not further characterized as part of this study; 137 nitrilase gene products were expressed and further characterized.
The collection of a large and diverse group of homologous sequences and characterization of their activities provide an opportunity to begin exploration of principles of evolution and protein structure and their relationship to catalytic specificity. The nitrilase sequences were subjected to phylogenetic analysis using maximum-likelihood and Bayesian inference methods (11). In addition to the 137 novel sequences, 9 bacterial and 2 plant nitrilase sequences, as well as 2 related fungal cyanide hydratase sequences, were retrieved from GenBank for comparison. The Glu-Lys-Cys catalytic triad was identified in all of the sequences. The nitrilases were between 304 and 385 amino acids in length, most falling in the 320- to 340-amino-acid range; the variation in length is due primarily to extensions of C termini.
Inspection of the resulting unrooted tree, shown in Fig. 1, reveals the presence of several distinct and highly supported sequence clades. All previously known bacterial sequences belong to clades 1 and 2. The degree of amino acid sequence similarity varies by 40 to 60% between clades and by 75% within the individual clades. The four panels in Fig. 1 overlay biogeographical data, as well as substrate and enantioselectivity data, on the tree. Quantitative specificity data, correlated with the tree topology, are presented in detail in Fig. 2.
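As a rough illustration of how pairwise comparisons between aligned sequences can be quantified, the sketch below computes percent identity over the columns where neither sequence has a gap. The text does not state how its similarity and identity figures were computed, so the choice of denominator (gap-free columns), the function name, and the toy sequences are all assumptions made for this example. Percent identity is also a simpler measure than similarity, which would additionally score conservative substitutions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical six-column alignment: 4 matches in 5 gap-free columns.
identity = percent_identity("MKC-LE", "MRCALE")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different tools are not always directly comparable.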
FIG. 1. Phylogenetic analysis of nitrilase sequences. The data are illustrated as a maximum-likelihood tree (lnL = -43224.1 [where L is the likelihood]) based on 137 new nitrilases and 13 previously known sequences from GenBank. The numbers at the nodes are percent bipartitions based on Bayesian phylogenetic inference (7,000 trees) and are shown for the major clades only. The scale bar represents the expected number of amino acid substitutions. (A) Geographic origins of environmental libraries from which individual nitrilases were discovered. Each geographic origin is represented by a colored dot: magenta, tropical Southeast Asia; mustard, tropical Central America; blue, Arctic; lilac, Antarctica; green, temperate; and white, other regions. (B to D) Reactivities and enantioselectivities of individual nitrilase enzymes on hydroxyglutaronitrile, mandelonitrile, and phenylacetaldehyde cyanohydrin. The (R)-enantioselectivity is represented by a magenta dot, (S)-enantioselectivity is represented by a green dot, and the absence of enantioselectivity is represented by a blue dot. Nonreactive enzymes do not have an associated dot. The public database enzymes were not tested.
FIG. 2. Linear representation of the maximum-likelihood tree topology overlaid with data for yield (percent) and enantioselectivity (% ee) for each of the individual nitrilases for 3-hydroxy-4-cyanobutyric acid, mandelic acid, and phenyllactic acid. The reactions were performed as described in Materials and Methods. *, enzymes discovered using (R,S)-4-chloro-3-hydroxyglutaronitrile as a growth selection substrate; all other nitrilases were identified through growth selection on adiponitrile.
The natural substrates for nitrilases and the nature of the metabolic selection pressure that shapes the evolution of this family of enzymes are not understood. Activities have been reported, however, on a number of nitrile substrates that have chemical or pharmaceutical importance. Nitrilases from cultured species of the genera Pseudomonas, Alcaligenes, Rhodococcus, and Acinetobacter have previously been used in stereoselective syntheses of substituted (R)-mandelic acids (32), carbohydrate acids (14), (S)-α-phenylglycine (3), (S)-naproxen, and (S)-ibuprofen (13).
To determine the range of enzymatic specificity in the new nitrilase library, we tested the abilities of the 137 enzymes to catalyze stereoselective hydrolysis of three structurally distinct and industrially important nitrile substrates. The resultant chemo- and enantioselectivity of each enzyme was mapped to its position in the phylogenetic tree. Figure 1B to D illustrates the unrooted nitrilase tree with color-coded enantiospecificity indicated at the branch tips for each of the three substrates, 3-hydroxyglutaronitrile (Fig. 1B), mandelonitrile (Fig. 1C), and phenylacetaldehyde cyanohydrin (Fig. 1D). Substrate chemo- and enantioselectivities generally cluster with the phylogenetically defined clades. While none of these substrates is likely to be the natural substrate for these enzymes, the resulting specificities can provide clues regarding the evolution of the nitrilase active site to accommodate and enantioselectively hydrolyze structurally distinct molecules.
The nitrilase library was screened for the ability to desymmetrize the prochiral substrate 3-hydroxyglutaronitrile. This compound, with no reported natural occurrence, is an intermediate in the enantioselective synthesis of the cholesterol-lowering drug Lipitor (8) (see reaction scheme on page 2433, top).
Of the 137 nitrilases, 110 (>80% of the enzymes) hydrolyze this substrate (Fig. 1B and 2), with wide variation in the degree of enzyme enantioselectivity (Fig. 2). A few of the nitrilases exhibited the ability to hydrolyze both nitrile substituents of hydroxyglutaronitrile; however, this product is not commercially relevant and was therefore not pursued in the comprehensive analysis. The most selective enzymes generated products with enantiomeric excesses (ee) of >90% for (S)-4-cyano-3-hydroxybutyric acid (4A2) and >95% for (R)-4-cyano-3-hydroxybutyric acid (1A8 and 1A9). Several patterns of enantioselectivity clustering can be discerned among the clades. While most of the enzymes in clade 1 are (S)-selective (27 enzymes), with ee values of 10 to 68%, there is a group of 12 related enzymes in subclade 1A (1A2 to 1A12) which show the opposite stereoselectivity, reaching values of 95% for 1A8 and 1A9. Outside of clade 1, (R)-selective enzymes predominate in clades 2 and 5, whereas (S)-selective enzymes constitute most of clade 4, although for the most part the ee values of these are modest. We can speculate that clades 1, 3, and 4 evolved from ancestral sequences with an enantioselective active site accommodating substrates structurally resembling the (R)-4-cyano-3-hydroxybutyric acid. Changes in the active-site configuration, resulting in reversal of stereoselectivity, occurred independently multiple times during the evolution of these enzymes, indicating possible switches in function following mutations or gene duplications.
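The enantiomeric excess values quoted here follow the standard definition, ee = 100 × |R − S| / (R + S), where R and S are the amounts of the two product enantiomers. The sketch below is a hedged illustration of that arithmetic only; the function name and the example amounts are hypothetical, and in practice the inputs would be enantiomer quantities such as peak areas from a chiral HPLC separation.

```python
def enantiomeric_excess(r_amount, s_amount):
    """Return (ee in percent, major enantiomer) for two enantiomer amounts.

    The amounts can be in any common unit (for example, peak areas),
    since only their ratio matters.
    """
    total = r_amount + s_amount
    if total <= 0:
        raise ValueError("no product detected")
    ee = 100.0 * abs(r_amount - s_amount) / total
    if r_amount > s_amount:
        major = "R"
    elif s_amount > r_amount:
        major = "S"
    else:
        major = "racemic"
    return ee, major


# Hypothetical product mixture: 97.5 parts (R) to 2.5 parts (S).
ee, major = enantiomeric_excess(97.5, 2.5)
```

With these hypothetical amounts the result is an ee of 95% in favor of the (R)-enantiomer, meaning that a 95% ee still corresponds to 2.5% of the minor enantiomer in the product.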
(S)- and (R)-mandelic acids, the products of enantioselective mandelonitrile hydrolysis, are important in the production of pharmaceutical and agricultural intermediates (4) (see reaction scheme below).
Forty-eight nitrilases are active on mandelonitrile. The activity data fall into a radically different pattern from those of hydroxyglutaronitrile, with most active enzymes clustering in clade 2 (Fig. 1C). Activity was obtained only sporadically in other clades, including clades 1, 4, and 5. Forty-four of the active enzymes were (R)-selective, and 4 were (S)-selective, with ee values of up to 99% for (R)-mandelic acid and up to 30% for (S)-mandelic acid (Fig. 2). Most of the 37 (R)-specific enzymes of clade 2 were highly enantioselective, with ee values of >90%. Members of a clade 2 subgroup (2A24 to 2A32) of closely related sequences exhibit product ee values of 98%, as do several other more distant enzymes scattered throughout clade 2 (2A6, 2A16, and 2A17). Surprisingly, the two enzymes in this clade with the lowest stereoselectivity, 2A4 and 2A5 (ee = 9%), are 80% identical and are most closely related to the highly stereoselective enzymes 2A2 and 2A3 (ee > 90%; 50% sequence identity). A group of three related enzymes in clade 1A (1A8 to 1A10; 82 to 95% identity) and one enzyme in clade 5 (5B17) have opposite enantioselectivities and generate (S)-mandelic acid with 25 to 30% ee. The activities of these enzymes on mandelonitrile are significantly lower than those of clade 2. (S)-Mandelic acid yields of 25% were observed after 24 h of incubation, whereas most of the nitrilases of clade 2 exhibited complete (R)-specific conversions within 6 h. While sequence clustering is observed relative to substrate specificity, the sequence factors, and consequently the structural factors, that control enantioselectivity are subtle, underscoring the need to sample large portions of sequence space to uncover functional biocatalyst diversity.
Hydrolysis of cyanohydrin substrates to yield chiral aryllactic acids (see reaction scheme above) is another important transformation providing valuable starting materials for fine chemical synthesis (4). Phenylacetaldehyde cyanohydrin does not appear in natural-product databases and may be a xenobiotic molecule.
Sixty-eight nitrilases were active on phenylacetaldehyde cyanohydrin; however, the patterns of activity tend to be opposed to those observed for mandelonitrile (Fig. 1D). Specifically, most of the enzymes that sort into clades 1, 3, 4, and 5 are active on this substrate, while they are, with a few exceptions, inactive on mandelonitrile. Enantioselectivity also correlates with individual clades, and unlike observations for hydroxyglutaronitrile, there appear to be fewer evolutionary events that lead to reversal of enantioselectivity. In two cases, minor sequence differences effected a reversal. The first is an (S)-specific enzyme (5B17; ee = 77%) within the cluster of highly similar enzymes in clade 5, whose members are otherwise (R)-selective (5B12 to 5B20). The second example is the (R)-selective enzyme 1B15 (ee = 76%) within clade 1, whose members are otherwise exclusively (S)-selective for this substrate. Strikingly, 1B15 is 97% identical to its neighbor, (S)-selective (ee = 43%) 1B14. 1B15 also shows an enantioselectivity switch for 3-hydroxyglutaronitrile relative to other members of its clade (Fig. 1B).
An important result of this study is the extraordinary multiplicity of novel nitrilase sequences discovered in diverse uncultured microbial populations. In a parallel to results from groups who have recorded new clades of bacteria and archaea in studies of microbial diversity using rRNA indexing (5), we have discovered several new clades of nitrilases, substantially expanding the sequence space previously described for this subfamily, and in doing so we have facilitated access to chiral monomers important in pharmaceutical manufacture. Even so, the full breadth of diversity of these enzymes remains unknown. This is undoubtedly the case for all enzyme classes, as well as classes of other biomolecules.
High-throughput enzyme discovery has very clear benefits, both for substantially expanding protein sequence space and for solving problems in industrial catalysis. This discovery paradigm may be used to access other important and previously underexplored enzyme classes for use not only in biocatalysis but also in therapeutics, as well as evolutionary research. The nitrilase phylogenetic network has locales of exquisite selectivity, and multiple solutions have evolved which address synthetic issues of chemo- and enantiospecificity. Efforts are under way to elucidate the tertiary structures of selected nitrilases to extend these conclusions and to better understand the complex contributions of enzyme structure to activity and selectivity.
Laboratory methods to create gene sequence diversity by generating synthetic gene libraries composed of arrays of point mutations or chimeric recombinants of selected parental genes are now available (10, 21, 23, 30). The directed-evolution approach is complementary to discovery and is enhanced by the availability of novel sequence templates via the high-throughput functional-discovery methods described here (7). Enzyme discovery can directly solve problems in industrial catalysis and can also provide multivariant starting points for rapid and highly targeted directed evolution. Clearly, the combination of discovery of new sequences and laboratory evolution where an ideal functional sequence does not exist is a key to the rapid solution of problems in industrial transformation, particularly in the area of chiral catalysis.
We dedicate this paper to Mark Madden, whose enthusiasm inspired and propelled our nitrilase research.
Present address: Syrrx Corporation, San Diego, CA 92121.
α-aminophenylacetonitrile by nitrilase: development of a new biotechnology for stereospecific production of S-α-phenylglycine. Arch. Pharmacol. Res. 9:45-47.
α-amylase. J. Biol. Chem. 277:26501-26507.
λZAP: a bacteriophage λ expression vector with in vivo excision properties. Nucleic Acids Res. 16:7583-7600.