Applied and Environmental Microbiology, May 2006, p. 3696-3701, Vol. 72, No. 5
0099-2240/06/$08.00+0 doi:10.1128/AEM.72.5.3696-3701.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Hal Alper,
Curt Fischer, and
Gregory Stephanopoulos*
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
Received 16 December 2005/ Accepted 28 February 2006
promoter variants. Each of these
promoters was generated by error-prone PCR and incorporated numerous
mutations. The activity of the promoters was assayed using flow
cytometry to measure the fluorescence of a green fluorescent protein
reporter gene. Our analysis of the sequences of these mutants revealed
seven positions having a statistically significant correlation with
promoter activity. Using site-directed mutagenesis, we constructed
point mutations for several sites, both statistically significant and
insignificant, and combinations of these sites. Our results show that
the statistical method correctly elucidated the phenotypic
manifestations of these mutations. We suggest that this method may be
useful for expediting directed evolution experiments by allowing both
desired and undesired mutations to be identified and incorporated
between rounds of
mutagenesis.
In many cases, directed evolution of genes and other functional DNA loci is an effective approach to sample the sequence space in search of biomolecules with desirable properties (7, 15). However, the most successful examples employ a selectable fitness criterion that allows for high-throughput screening of the mutational space: sampling a large enough space eliminates the need to make rational mutations. For many proteins or functional nucleic acids, it may not be possible to link a desired phenotype with a selectable criterion fit for high-throughput screening. In the absence of such a criterion, clonal populations of mutants must be assayed individually for the phenotype of interest. This scenario might be called "assay-based" directed evolution, a situation in which the upstream mutagenesis has a higher throughput than the downstream characterization does. In this scenario, there is a premium on information linking mutational changes to their phenotypic manifestations. Further, there is a strong incentive to "learn from" the (relatively small) mutational spectra of these mutants to determine sequence-phenotype interactions and to use this information rationally in subsequent rounds of mutagenesis.
Here, we present a simple statistical method for analyzing a mutational spectrum to parse out the phenotypic manifestation of individual mutations, even when they are masked by the presence of many other mutations. Because assay-based directed evolution does not employ any prescreening or selection of clones, as is the case when a selectable marker is available, mutants are expected to have a range of phenotypes, including both increased and decreased fitness. Here, we demonstrate our method by identifying mutations in a library of mutagenized PL-λ promoters (2) that result in either increased or decreased promoter activity, and we show how to quantify the statistical confidence in these mutation-phenotype linkages.
The central premise of our method is that mutations that have no effect on mutant phenotype should partition randomly, following a multinomial distribution, between phenotypic classes. For example, consider a hypothetical experiment in which we mutagenize a protein that can fluoresce in one of three colors: red, blue, or green. After generating a library of 1,000 mutants, each bearing many point mutations, our assay reveals that 600 have the red phenotype, 300 are blue, and 100 are green. If a particular point mutation has no effect on the color, then we expect that, by chance, mutants containing this modification will be distributed between the red, blue, and green classes in a ratio of 6:3:1. That is, the mutation should not be correlated to any particular phenotypic class. More rigorously, we say that the mutations are multinomially distributed between the three classes with background frequencies of 0.6, 0.3, and 0.1.
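The three-color example can be worked through numerically. The following is a minimal sketch, not the authors' code: the function names and the choice of 20 mutation-carrying mutants are illustrative assumptions, while the class sizes (600, 300, 100) come from the example above.

```python
from math import factorial, prod


def expected_counts(class_sizes, subset_size):
    """Expected carriers per class if the mutation has no effect on phenotype."""
    total = sum(class_sizes)
    return [subset_size * n / total for n in class_sizes]


def multinomial_pmf(observed, class_sizes):
    """Probability of one specific split of carriers across the classes,
    using the class frequencies of the whole library as background."""
    total = sum(class_sizes)
    coeff = factorial(sum(observed))
    for y in observed:
        coeff //= factorial(y)  # exact integer division at every step
    return coeff * prod((n / total) ** y for n, y in zip(class_sizes, observed))


# Library from the example: 600 red, 300 blue, 100 green mutants.
library = [600, 300, 100]

# Assumption for illustration: 20 mutants carry the mutation of interest.
print(expected_counts(library, 20))

# A split that mirrors the 6:3:1 background is far more probable under the
# null hypothesis than one that is skewed toward the rare green class.
print(multinomial_pmf([12, 6, 2], library))
print(multinomial_pmf([2, 6, 12], library))
```

A split close to the background ratio is what a neutral mutation should produce; a strongly skewed split has a very small probability under the null hypothesis and therefore suggests a link between the mutation and the phenotype.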
Multinomial statistics and related combinatorial statistics commonly arise in the analysis of naturally occurring mutational diversity (1, 13). For example, similar statistical analyses have been used to find functional gene domains (9), important structural RNA sites (8), and genomic loci with an overabundance of single-nucleotide polymorphisms (16). Here we apply multinomial statistics to the analysis of an artificially generated mutational landscape to parse out critical residues controlling phenotypic behavior. We show that, based on this information, mutants with sets of individual mutations can be made, and we suggest that this can be used as a method for improving directed evolution experiments by incorporating sequence information.
In what follows, we detail the construction of numerous PL-λ promoter variants, which were generated by error-prone PCR such that each mutant incorporated many point mutations. The activity of these promoters was assayed using flow cytometry to measure the fluorescence of a green fluorescent protein (GFP) reporter gene. We show how our statistical analysis revealed the phenotypic manifestation of numerous mutations. Finally, we present a validation of our method by constructing point mutations for several of the identified mutations and combinations of sites using site-directed mutagenesis. These mutations, we show, have the predicted effect on the promoter phenotype, even when removed from the background of other mutations.
(Invitrogen) was used for routine transformations as
described in the protocol. Assay strains were grown at 37°C
with 225 rpm orbital shaking in M9 minimal medium
(11) containing 5 g/liter
D-glucose (M9G) supplemented with 0.1% Casamino Acids. All
other strains and propagations were cultured at 37°C in LB
medium. The medium was supplemented with 68 µg/ml
chloramphenicol. All PCR reagents and restriction enzymes were
purchased from New England BioLabs, and all PCRs used Taq
polymerase. M9 minimal salts were purchased from U.S. Biological, and
all remaining chemicals were from
Sigma-Aldrich.
Library construction.
Nucleotide
analogue mutagenesis was carried out in the presence of 20 µM
8-oxo-2'-deoxyguanosine (8-oxo-dGTP) and
6-(2-deoxy-β-D-ribofuranosyl)-3,4-dihydro-8H-pyrimido-[4,5-c][1,2]oxazin-7-one
(dPTP) (TriLink Biotech), using plasmid pZE-gfp(ASV) kindly provided by
M. Elowitz as the template
(10) along with the
primers PL_sense_AatII
(TCCGACGTCTAAGAAACCATTATTATC) and
PL_anti_EcoRI
(CCGGAATTCGGTCAGTGCGTCCTGCTGAT). Ten and 30
amplification cycles with the primers mentioned above were
performed. The 151-bp PCR products were purified using the GeneClean spin kit (Qbiogene). Following digestion with AatII
and EcoRI, the product was ligated overnight at 16°C and
transformed into the library of E. coli DH5α
mutants (Invitrogen). About 30,000 colonies were screened by eye from
agar plates containing minimal medium and Casamino Acids, and 200
colonies, spanning a wide range in fluorescent intensity,
were picked from each plate. Selected mutants were sequenced using
primers PL_Sense_Seq
(AGATCCTTGGCGGCAAGAAA) and
PL_Anti_Seq
(GCCATGGAACAGGTAGTTTTCCAG).
Library characterization.
About 20
µl of overnight cultures of library clones growing in LB broth
were used to inoculate 5 ml M9G medium supplemented with
0.1% (wt/vol) Casamino Acids. The cultures were grown at 37°C
with orbital shaking. After 14 h, roughly the point of
glucose depletion, a culture sample was centrifuged at 18,000 × g
for 2 min, and the cells were resuspended in ice-cold water.
Flow cytometry was performed on a Becton-Dickinson FACScan as described
elsewhere (2), and the
geometric mean of the fluorescence distribution of each clonal
population was calculated.
The means and standard deviations were calculated from the FL1-H distribution resulting after gating the cells based on a forward scatter-side scatter plot. A total of 200,000 events were counted to gain statistical confidence in the results.
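As a rough illustration of the summary statistics described here, the sketch below computes the geometric mean of a clone's fluorescence together with the mean and standard deviation of the log-transformed values. It assumes the events have already been gated on forward and side scatter; the function name and the sample values are hypothetical and are not taken from the study.

```python
from math import exp, log
from statistics import fmean, stdev


def summarize_fluorescence(events):
    """Return (geometric mean, mean of log values, SD of log values)
    for the positive fluorescence readings of one clonal population."""
    # Non-positive readings cannot be log-transformed, so they are skipped.
    logs = [log(v) for v in events if v > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive events")
    mean_log = fmean(logs)
    # The geometric mean is the exponential of the mean log value.
    return exp(mean_log), mean_log, stdev(logs)


# Hypothetical gated FL1-H events for one clone (arbitrary units).
clone_events = [80.0, 100.0, 125.0, 100.0]
geo_mean, mean_log, sd_log = summarize_fluorescence(clone_events)
print(round(geo_mean, 2), round(sd_log, 3))
```

Working on the log scale suits distributions that are roughly log-normal, because the mean and standard deviation of the log values then describe the population compactly.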
Construction of designed promoters.
Promoters with
specific nucleotide changes were created using overlap-extension PCR
and primers specifically designed to incorporate these changes. Primers
were designed to divide the promoter region into thirds, and the proper
primers were assembled piecewise in a PCR consisting of 95°C
for 4 min, 10 cycles with an annealing temperature of 44°C,
followed by 30 cycles of PCR with an annealing temperature of
60°C, and a final extension for 3 min at 72°C.
Fragments were gel extracted using 2.5% agarose gels and the QIAGEN
MERmaid spin kit. The isolated fragment was then linked with the final
primer using the same PCR and extraction procedures. These fragments
were then digested using EcoRI and AatII and ligated into the digested
plasmid backbone. Sequencing was performed to verify correct
constructs.
promoter (3), which was
placed upstream of a gfp gene. The promoter region contains
two tandem promoters, PL-1 and PL-2, each of
which contains -10 and -35 sigma factor binding sites
(4, 5, 6). Furthermore, the
promoter contains, at approximately the same location, an UP element
that binds the C-terminal domain of the alpha subunit and a binding
site for integration host factor (IHF). In addition, the
PL-TET01 promoter has two tetO2 operators from the
Tn10 tetracycline resistance operon
(10). Mutants in the library were analyzed using flow cytometry to measure the single-cell level of expression of GFP as a proxy for the activity of the mutagenized promoters. (A detailed schematic of the experimental procedure is shown in Fig. 1.) Promoters that had roughly log-normal fluorescence distributions (no obvious tails in the distribution or bimodal distributions) were sequenced, and those mutants that contained deletions or insertions were removed from that set. The final set comprised 69 mutant promoters, with well-behaved fluorescence distributions (single distribution with a low standard deviation), that contained only transition and transversion mutations. Notably, our error-prone PCR method introduces predominantly transitions and not transversions, except in rare cases.
FIG. 1. Schematic of the experimental procedure. A variant of the constitutive bacteriophage PL-λ promoter (PL-TET01) was mutated through error-prone PCR to create mutant promoters. Plasmid constructs containing these promoters were used to drive the expression of gfp in E. coli. Clonal populations of promoter mutant cells were then analyzed using flow cytometry to quantify the fluorescence of GFP and output capacity of the promoter. Kan, kanamycin resistance; FACS, fluorescence-activated cell sorting; FSC, forward scatter; SSC, side scatter; Freq., frequency; Fluor., fluorescence.
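$\sum_{i=1}^{M} n_i = N$. Consider a subset of mutants B of size X, where X < N, comprising mutants with a particular mutation. If the mutation does not influence the phenotype of the mutants, we would expect, by chance, that there would be $x_i = X n_i / N$ mutants of type $P_i$. In general, the probability (Pr) that the set $\{x_1, x_2, \ldots, x_M\}$ will take on the particular set of values $\{y_1, y_2, \ldots, y_M\}$ is

$$\Pr(x_1 = y_1, \ldots, x_M = y_M) = \frac{X!}{y_1!\, y_2! \cdots y_M!} \prod_{i=1}^{M} \left(\frac{n_i}{N}\right)^{y_i} \qquad (1)$$

where $\sum_{i=1}^{M} y_i = X$. In this equation, the term

[Equations 2 and 3: images not recovered]

The probability that q or more (where q < X) of the B mutants would be seen in a particular class, $P_i$, by chance is

$$\Pr(x_i \ge q) = \sum_{k=q}^{X} \binom{X}{k} \left(\frac{n_i}{N}\right)^{k} \left(1 - \frac{n_i}{N}\right)^{X-k} \qquad (4)$$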
For this study, we divided the mutants into two phenotypic classes on the basis of their fluorescence (i.e., M = 2): the top 50th percentile and the lower 50th percentile. Figure 2 shows a detailed schematic of the statistical analysis, which is greatly simplified in this case because there are only two phenotypes. As shown in the figure, applying our statistical method to the sequence data resulted in the identification of seven nucleotide positions that are correlated with one of the two phenotypic classes in a statistically significant manner. The figure should be read clockwise from the top left, progressively showing the fluorescence distribution, mutation distribution, statistical distribution of mutations, and finally, the identified important positions in Fig. 2D in the lower left (see the legend to Fig. 2 for more detail).
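The two-class analysis can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the data layout (a fluorescence value paired with a set of mutated positions), the function names, and the toy library are all hypothetical. Each site is scored by the probability of seeing a split at least as skewed toward one class by chance.

```python
from math import comb


def tail_probability(q, x, p=0.5):
    """Probability that q or more of x carriers fall in a class whose
    background frequency is p, if the mutation has no effect."""
    return sum(comb(x, k) * p**k * (1 - p) ** (x - k) for k in range(q, x + 1))


def site_p_values(mutants):
    """mutants: list of (fluorescence, set of mutated positions).
    Returns {position: (direction, P value)} for the top/bottom split."""
    ranked = sorted(mutants, key=lambda m: m[0], reverse=True)
    n_top = len(ranked) // 2
    p_top = n_top / len(ranked)  # background frequency of the top class
    results = {}
    for site in sorted(set().union(*(sites for _, sites in ranked))):
        carriers = [i for i, (_, sites) in enumerate(ranked) if site in sites]
        in_top = sum(1 for i in carriers if i < n_top)
        p_up = tail_probability(in_top, len(carriers), p_top)
        p_down = tail_probability(len(carriers) - in_top, len(carriers), 1 - p_top)
        results[site] = ("up", p_up) if p_up <= p_down else ("down", p_down)
    return results


# Hypothetical toy library: (relative fluorescence, mutated positions).
toy_library = [
    (9.0, {10, 42}), (8.0, {10}), (7.0, {10, 77}), (6.0, {10}),
    (4.0, {42, 77}), (3.0, {77}), (2.0, {42, 77}), (1.0, {77}),
]
for site, (direction, p_value) in site_p_values(toy_library).items():
    print(site, direction, p_value)
```

In the toy library, position 10 occurs only in the brighter half, so it receives the lowest P value, whereas position 42 is spread across both halves and is not flagged.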
FIG. 2. Statistical distribution of mutations and their effects on mutant fluorescence. In panel A, the vertical axis shows the mutant number, where the mutants
are sorted in descending order by their relative fluorescence. In
general, the single-cell fluorescence distribution for each
mutant strain was log-normally distributed. The horizontal axis shows the mean of the
log relative fluorescence for each mutant strain, where the error is
the standard deviation of this distribution. Reading to the right from
panel A into panel B reveals the point mutations present in each
mutant. For each location in a mutant (where location is indicated on
the horizontal axis) that was changed via the error-prone PCR, a black
dot is indicated. With only two exceptions, all of these changes are
base transitions rather than transversions, so the sequence of each of
the 69 clones can be inferred from the wild-type sequence shown in
panel D. (All of the mutations indicated in panel B are transitions
with the exception of one A-C transversion at 125 bp in clone
53 and one T-G transversion at 8 in clone 68. These were
treated as though they were transitions in our analysis.) Reading down
from panel B into panel C shows how mutations at a particular location
partition between the two classes of mutants: the top and bottom 50th
percentiles. Sites that have no effect on the fluorescence phenotype
should partition equally between the two classes, i.e., they should
follow a binomial distribution with P = 0.5. Sites
that deviate from this distribution are labeled with a dot and are
colored either green or red, corresponding to the apparent effect of a
mutation at the site. For these sites, P values are indicated,
where this value is the probability of seeing a distribution at least
as skewed to one side. Sites that were subsequently tested
experimentally (see text) are indicated with an asterisk, where the
color of the asterisk denotes the expected effect of a mutation at the
site. We chose a range of sites to test experimentally from sites with
high-confidence (low P value) positive effects to those with
low-confidence (P value 0.5) negative effects (Table
1). These sites are also
shown in panel D, which contains the wild-type nucleotide sequence of
the promoter region that was subjected to
mutation.
TABLE 1. Summary of site-directed mutagenesis loci
TABLE 2. Summary of double and triple mutants constructed by site-directed mutagenesis
It is interesting to note that while most of the statistically significant mutations are near the sigma factor binding sites, two are located further upstream of this region. The 123 site, which was not statistically significant but was tested experimentally, showed that such distal sites participate in the regulation of transcription.
There are a few caveats to the use of our statistical method. First, the method assumes independence between mutations. That is, we assume mutated sites cannot interact. As shown in Table 2, four of six of the combination mutations had the predicted effect. The two combination mutants that had unintuitive phenotypes could be a result of interaction between sites. (Notably, the 82, 14, 21 triple mutant appeared to have a high fluorescence by visual inspection in a rich medium preculture; however, quantification of GFP activity by flow cytometry revealed consistently low measurements in the minimal medium used.)
The second caveat is that the method can require a significant number of mutants for each position: for a position to be statistically significant in our particular experiment, at least four observations were required. (This would be true for any two-phenotype mutational spectrum in which each phenotype occurs with equal prior probability.) The number of observations required scales roughly with the number of mutation types. Our mutagenesis method introduced only transitions, not transversions, which allowed us to treat each site as "mutated" or "not mutated" without loss of information. The method can be applied to cases in which all four nucleotides are present; however, roughly four times as many observations would be required to make a statistically significant correlation between a particular nucleotide (at a single position) and a phenotype. Finally, the statistical method presented here is applicable only to situations in which the method used to introduce sequence diversity does not also introduce deletions or insertions. Ignoring relatively small insertions or deletions in the analysis would not significantly bias the results of identifying critical residues (data not shown). However, rigorously, alterations would be needed to differentiate between deletions and mutations in our statistical framework. In such cases, more-complex models could be adapted, such as those used to describe the distribution and effects of naturally occurring mutations over a fitness landscape for populations under positive and negative selective pressures (12, 14).
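The dependence of the minimum number of observations on the significance cutoff can be illustrated with a short calculation. This is a sketch under assumptions: the function name is hypothetical, and the cutoff values are illustrative rather than the threshold used in the study, which is not restated in this passage. It considers the most extreme case, in which every mutant carrying the mutation falls in the same class.

```python
def min_observations(alpha, p=0.5):
    """Smallest number of carriers X such that all X landing in one class
    (background frequency p) has one-sided probability p**X <= alpha."""
    if not 0 < alpha < 1 or not 0 < p < 1:
        raise ValueError("alpha and p must lie strictly between 0 and 1")
    x = 1
    while p**x > alpha:
        x += 1
    return x


# Illustrative cutoffs only; these are assumptions, not values from the article.
for alpha in (0.1, 0.05, 0.01):
    print(alpha, min_observations(alpha))
```

With two equally likely classes, a site mutated in fewer clones than this minimum can never reach the chosen cutoff, however skewed its distribution is.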
Despite its caveats, this method has a significant advantage compared to deducing critical mutations using sequence data from only the best-performing mutants. Intuitively, if we were to ignore the bottom 50th percentile in Fig. 2C, we may mistakenly identify sites as associated with high fluorescence that are, in fact, evenly distributed between the two classes. That is, having sequence data for multiple phenotypes allowed us to determine, with quantifiable confidence, the effect of each individual mutation in a way that discounts artifacts of the mutagenesis method, such as a bias for mutagenizing particular loci.
We also thank the MIT biopolymers laboratory for DNA sequencing.
These authors contributed equally to this work.