12.07.2015 Views

Protein Engineering Protocols - Mycobacteriology research center



4 Kono et al.

structure is specified. In addition to this structural complexity, there is also sequence complexity. Design involves identifying folding sequences from the enormous ensemble of possible sequences. This search is guided by the large degree of “consistency” observed in folded proteins (1). On average, a folded protein is atomically well packed with favorable van der Waals interactions, hydrophobic residues are sequestered from solvent, and most hydrogen-bonding interactions are satisfied. However, this consistency is often complex and may have little simplifying symmetry. In addition, such noncovalent interactions are some of the most difficult to quantify accurately, and estimating free energies associated with mutation or structural ordering remains a subtle area of computational research (2,3). Despite the predictive power of detailed simulation methods for estimating free energy differences, we cannot presently expect to use them to determine the relative stability changes of large numbers of sequences. Nonetheless, molecular potentials derived from small molecules and from the protein structure database do contain partial information regarding the interactions and forces known to be important for specifying and stabilizing protein structures. In some cases, the optimization of such potentials has led to striking successes in protein design (4). Such potentials are necessarily approximate, and any sequence so designed is likely sensitive to the particular potential and target structure used. Alternatively, the partial information contained in these potentials may be used in a probabilistic manner, to yield the likelihoods of the amino acids.
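As a rough illustration of using an approximate potential probabilistically, the sketch below converts per-residue energies at a single site into normalized likelihoods by Boltzmann weighting. This is only one simple way to do it and is not presented as the authors' formulation; the function name `site_probabilities`, the `temperature` parameter, and the energy values are all hypothetical and chosen purely for illustration.

```python
import math

def site_probabilities(energies, temperature=1.0):
    """Turn per-residue energies at one site into normalized probabilities.

    Assumption: Boltzmann weighting, p(a) proportional to exp(-E(a) / T).
    The energies and temperature are in the same arbitrary units.
    """
    # Shift by the minimum energy so the exponentials stay numerically stable.
    e_min = min(energies.values())
    weights = {aa: math.exp(-(e - e_min) / temperature)
               for aa, e in energies.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

# Hypothetical energies for a buried site (illustrative values only).
example_energies = {"LEU": -2.0, "VAL": -1.5, "ALA": -0.5, "ASP": 3.0}
probs = site_probabilities(example_energies)

for aa, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.3f}")
```

Under this assumption, a lower energy always maps to a higher likelihood, and raising `temperature` flattens the distribution toward uniform, which is one way to express reduced confidence in an approximate potential.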
A probabilistic approach is also appropriate for characterizing the full variability of sequences that may fold to a common structure, because there are likely to be an enormous number of such sequences, far more than can be addressed via sequence search or enumeration. Such probabilistic approaches are also particularly appropriate for de novo protein design in the context of combinatorial protein experiments, which create and rapidly assay many sequences. Although combinatorial methods can address very large numbers of sequences (10^4 to 10^12), these numbers are still infinitesimal compared with the numbers of possible protein sequences, e.g., 20^100 ≈ 10^130 for a 100-residue protein (since 100 × log10(20) ≈ 130). Thus, even with combinatorial methods, we still must focus on selected regions of sequence space. This is most often accomplished by preselecting a few residue sites within the protein by inspection and allowing full or partial variability at these sites. Recently, computational methods have been developed that can keep track of a much wider range of sequence variability and provide quantitative methods for winnowing and focusing the sequence space. Herein, we discuss computational methods for sequence design with an emphasis on probabilistic methods that address the site-specific amino acid variability for a given structure.

1.2. Directed Methods of Protein Design

Here, “directed protein design” refers to the identification of a sequence (or a set of sequences) likely to fold to a predetermined backbone structure. Each
