05.01.2013 Views

Addressing the numbers problem in directed evolution.pdf


The Numbers Problem in Directed Evolution

$$O_f = T/V = -\ln(1 - P_i) \tag{3}$$

Upon calculating the oversampling factor O_f as a function of the percentage coverage, one obtains the curve shown in Figure 1, which should be kept in mind when designing and analyzing libraries. It can be seen that for ensuring 95% coverage, for example, the oversampling factor O_f amounts to about three, which means that a threefold excess of transformants needs to be screened. Due to the exponential character of the relationship, degrees of coverage beyond 95% require vastly higher screening efforts. Of course, lower degrees of library coverage might suffice in a given experiment, [16] but decisions regarding this important aspect can be guided by consulting Figure 1.

Figure 1. Correlation between library coverage and oversampling of enzyme variants.
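The relationship in Equation (3) can be tabulated directly. The following is a minimal Python sketch, not the authors' code; the function name `oversampling_factor` is chosen here for illustration, and the coverage P_i is passed as a fraction rather than a percentage.

```python
import math


def oversampling_factor(coverage: float) -> float:
    """Return O_f = T/V = -ln(1 - P_i) for a fractional coverage P_i in [0, 1)."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be a fraction in [0, 1)")
    # log1p(-x) evaluates ln(1 - x) accurately, also for small x
    return -math.log1p(-coverage)


if __name__ == "__main__":
    # Tabulate the curve at a few coverage levels
    for p in (0.50, 0.80, 0.90, 0.95, 0.99, 0.999):
        print(f"{p:6.1%} coverage -> O_f = {oversampling_factor(p):.2f}")
```

At 95% coverage the function returns a value just under three, and the values for 99% and 99.9% show how quickly the required excess grows beyond that point.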

In many saturation mutagenesis experiments, sites that are composed of more than one aa position are randomized. [13, 16, 19–23] Using the appropriate algorithm, [24] we calculated for the NNK and NDT libraries the amount of oversampling, in absolute numbers, necessary for 95% coverage of relevant protein sequence space (Table 1). It is immediately clear that the potential screening effort is very different for the NNK versus NDT systems. For example, in the case of a site composed of three aa’s, NNK requires almost 100 000 clones, whereas NDT needs only about 5000 for 95% coverage.

Table 1. Oversampling necessary for 95% coverage as a function of NNK and NDT codon degeneracy.

No. [a]   NNK codons       NNK transformants needed   NDT codons       NDT transformants needed
1         32               94                         12               34
2         1024             3066                       144              430
3         32 768           98 163                     1728             5175
4         1 048 576        3 141 251                  20 736           62 118
5         33 554 432       100 520 093                248 832          745 433
6         > 1.0 × 10^9     > 3.2 × 10^9               > 2.9 × 10^6     > 8.9 × 10^6
7         > 3.4 × 10^10    > 1.0 × 10^11              > 3.5 × 10^7     > 1.1 × 10^8
8         > 1.0 × 10^12    > 3.3 × 10^12              > 4.2 × 10^8     > 1.3 × 10^9
9         > 3.5 × 10^13    > 1.0 × 10^14              > 5.1 × 10^9     > 1.5 × 10^10
10        > 1.1 × 10^15    > 3.4 × 10^15              > 6.1 × 10^10    > 1.9 × 10^11

[a] Number of aa positions at one site.
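The absolute numbers for 95% coverage can be approximated with a short calculation. The sketch below is an assumption rather than the algorithm of ref. [24], which this section does not spell out: it treats all V codon combinations as equally likely and uses T = ln(1 − P)/ln(1 − 1/V). The names `library_size` and `transformants_needed` are illustrative. Under this assumption, the rounded results for sites of one to three positions agree with the values quoted for NNK and NDT.

```python
import math

# Codons per randomized position: NNK = 4 * 4 * 2, NDT = 4 * 3 * 1
CODONS_PER_POSITION = {"NNK": 32, "NDT": 12}


def library_size(scheme: str, positions: int) -> int:
    """Number of codon combinations V for a site with the given number of positions."""
    return CODONS_PER_POSITION[scheme] ** positions


def transformants_needed(size: int, coverage: float = 0.95) -> float:
    """Assumed model: T = ln(1 - P) / ln(1 - 1/V), with equally likely variants."""
    return math.log1p(-coverage) / math.log1p(-1.0 / size)


if __name__ == "__main__":
    for positions in range(1, 6):
        row = []
        for scheme in ("NNK", "NDT"):
            v = library_size(scheme, positions)
            row.append(f"{scheme}: V = {v:>10d}, T = {round(transformants_needed(v)):>11d}")
        print(f"{positions} position(s)  " + "  |  ".join(row))
```

For large V this expression approaches the simpler estimate T ≈ 3V that follows from Equation (3) at 95% coverage.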

Figures 2 and 3 show this dependency over the whole range of coverage (0–95%) for sites composed of 1, 2, 3, 4, and 5 aa positions as a function of NNK or NDT codon degeneracy. Here again the pronounced differences in NNK and NDT libraries become apparent. For example, if in the case of a three-aa site, the experimenter restricts the screening to 5000 transformants for practical and economical reasons, the use of an NDT library correlates with about 95% coverage, whereas the same number of screened clones in an NNK library allows for only 15% coverage. One can suspect that in such size-restricted libraries, the NDT library should be characterized by higher quality.

Figure 2. Library coverage calculated for NNK codon degeneracy at sites comprising 1, 2, 3, 4 and 5 amino acid positions.

Figure 3. Library coverage calculated for NDT degeneracy at sites comprising 1, 2, 3, 4 and 5 amino acid positions.
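The comparison at a fixed screening effort follows from solving Equation (3) for the coverage, P_i = 1 − exp(−T/V). The sketch below is illustrative only; the function name `expected_coverage` is an assumption, and V is taken as the number of codon combinations (32^n for NNK, 12^n for NDT).

```python
import math


def expected_coverage(transformants: int, library_size: int) -> float:
    """Fraction of the library expected to be sampled: P_i = 1 - exp(-T/V)."""
    return 1.0 - math.exp(-transformants / library_size)


if __name__ == "__main__":
    screened = 5000  # screening budget used in the three-aa-site example
    for scheme, codons in (("NNK", 32), ("NDT", 12)):
        v = codons ** 3  # site composed of three aa positions
        print(f"{scheme}: V = {v}, coverage = {expected_coverage(screened, v):.1%}")
```

For 5000 screened clones this gives roughly 94% for NDT and 14% for NNK, close to the approximate figures of 95% and 15% quoted for the three-aa site.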

It is straightforward to analyze in the same way other codon degeneracies that might be more suitable for a given task. In general we conclude that it is best to choose systems in which the number of codons is equal to the number of aa’s. This reduces the inherent bias and over-representation of certain aa’s. Assuming all aa’s are equally probable, a maximum number of distinct protein variants in a given library is then ensured. Moreover, codon degeneracies such as NDT exclude the occurrence of WT transformants in the enzyme libraries. Both facets can be expected to contribute to the quality of the focused libraries.
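The codon-to-amino-acid ratio of a given degeneracy can be checked by enumerating its codons. The sketch below assumes the standard genetic code and the IUPAC ambiguity symbols N (A/C/G/T), K (G/T) and D (A/G/T); the helper names `expand` and `summarize` are illustrative and not taken from the paper.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide symbols needed for NNK and NDT
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "D": "AGT"}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate: str) -> list[str]:
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in degenerate))]


def summarize(degenerate: str) -> tuple[int, Counter, int]:
    """Return (number of codons, codons per amino acid, number of stop codons)."""
    translated = [GENETIC_CODE[codon] for codon in expand(degenerate)]
    stops = translated.count("*")
    per_amino_acid = Counter(aa for aa in translated if aa != "*")
    return len(translated), per_amino_acid, stops


if __name__ == "__main__":
    for scheme in ("NNK", "NDT"):
        n_codons, per_aa, stops = summarize(scheme)
        print(f"{scheme}: {n_codons} codons, {len(per_aa)} amino acids, "
              f"{stops} stop codon(s), max codons per amino acid = {max(per_aa.values())}")
```

With these definitions NNK expands to 32 codons covering 20 amino acids plus one stop codon, with some amino acids encoded up to three times, whereas NDT expands to 12 codons covering 12 amino acids, each encoded exactly once.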

ChemBioChem 2008, 9, 1797 – 1804 © 2008 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.chembiochem.org 1799
