12.07.2015

From Protein Structure to Function with Bioinformatics.pdf

L.A. Kelley

always unique) structure driven by the energetically favourable interactions between the amino acids within the structure, and between the amino acids and the surrounding solvent.

If one were able to understand which spatial and solvent interactions stabilise a given structure, then one could both detect sequences compatible with that structure and design sequences that fit it. This is the concept of threading. Given a sequence whose structure we wish to predict, one aligns, or ‘drapes’, this sequence over each of the known structures in our database. In each case one calculates a score representing how well our sequence fits that structure. The structure with the most favourable score becomes our prediction. But what are these favourable interactions, and how do we calculate their magnitude? Fortunately, thanks to the diligent work of many experimentalists around the world, we have a database of native protein structures: in effect, a database of favourable interactions.

By careful statistical analysis of the distribution of the different amino acid types throughout known protein structures, powerful sequence-structure relationships can be inferred and used to tackle prediction problems. These empirically derived, or ‘knowledge-based’, force fields are widely used across the entire spectrum of protein structure prediction techniques. Because of their key role in ab initio modelling, many of the details can be found in that chapter. Nevertheless, a brief summary will be useful here.

2.2.1 Knowledge-Based Potentials

Empirically deriving rules that relate protein sequence to three-dimensional structure requires (1) a large number of example sequences together with their corresponding structures and (2) a structural feature of proteins that one wishes to analyse.
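The threading idea described earlier is to drape the query over each known structure, score every placement and keep the best. Combined with a knowledge-based scoring table, it can be sketched in a few lines. This is a minimal, hypothetical illustration and not any published threading program. Each template is reduced to a string of per-position environments ('b' buried, 'e' exposed). The pseudo-energy table and template names are invented, lower values are taken as more favourable, and the query is placed without gaps. Practical methods typically use gapped alignment and richer terms such as pairwise contacts.

```python
from typing import Dict, Tuple

# Hypothetical pseudo-energies (lower = more favourable) for an amino acid placed
# at a buried ('b') or exposed ('e') template position. Illustrative values only.
ENERGY: Dict[Tuple[str, str], float] = {
    ("L", "b"): -1.0, ("L", "e"): 0.8,
    ("I", "b"): -1.0, ("I", "e"): 0.8,
    ("K", "b"): 1.0, ("K", "e"): -0.6,
    ("E", "b"): 0.9, ("E", "e"): -0.5,
}


def thread_score(seq: str, env: str, offset: int) -> float:
    """Total pseudo-energy of seq laid onto env, without gaps, from position offset.
    Pairs missing from ENERGY count as neutral (0.0)."""
    return sum(ENERGY.get((aa, env[offset + i]), 0.0) for i, aa in enumerate(seq))


def best_template(seq: str, templates: Dict[str, str]) -> Tuple[str, int, float]:
    """Drape seq over every template at every ungapped offset and return
    (template name, offset, energy) for the lowest-energy placement.
    Returns ("", -1, inf) if no template is long enough to hold seq."""
    best: Tuple[str, int, float] = ("", -1, float("inf"))
    for name, env in templates.items():
        for offset in range(len(env) - len(seq) + 1):
            energy = thread_score(seq, env, offset)
            if energy < best[2]:
                best = (name, offset, energy)
    return best


if __name__ == "__main__":
    # Two made-up template "folds" described only by their burial patterns.
    library = {"fold_A": "eebbbbee", "fold_B": "bbeeeebb"}
    print(best_template("KLLIE", library))
```

In this toy example the query KLLIE scores best on fold_A at offset 1, where its hydrophobic L, L and I residues fall on buried positions.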
A simple illustration of the technique is the generation of a solvation potential. Any globular protein in its folded native state has some residues buried in the (largely hydrophobic) interior and some (largely hydrophilic) residues on the surface, exposed to the surrounding solvent. It is straightforward to calculate the extent to which a given residue R is exposed or buried in a protein of known structure. One method, albeit crude, is simply to count how many other residues lie within a certain distance of residue R (more sophisticated methods are usually used; Richmond 1984; Kabsch and Sander 1983). It is therefore possible to compile a list of every residue in every known protein structure together with its associated level of solvent accessibility (in terms of neighbours). With these data, a variety of statistical techniques can be used to look for a relationship between amino acid type and its propensity to lie in the interior or on the exterior of the protein. One of the common methods is based on statistical mechanics or Bayesian statistics (for a comparison with other methods, see Xia and Levitt 2000). First proposed by Tanaka and Scheraga (1976) and later refined by Sippl (1990) and Miyazawa and Jernigan (1996), these methods all rely on Boltzmann statistics. First, one assumes that the protein structures in the database constitute a kind of ensemble and that the levels of solvent exposure of a residue type within proteins

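The solvation-potential recipe described above starts from a crude neighbour-count burial measure. It then tallies every residue in a set of structures and converts the resulting propensities into pseudo-energies. The text breaks off before giving the chapter's own formula, so the sketch below assumes one common inverse-Boltzmann form: E(a, c) = -kT ln[P(c | a) / P(c)], where a is the amino acid type and c its burial class. Several choices here are hypothetical and illustrative, not the parameters of any published potential. These are the distance cutoff, the neighbour threshold separating buried from exposed, the two-class scheme, the function names and the toy coordinates. The sketch needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def neighbour_counts(coords: Sequence[Coord], cutoff: float) -> List[int]:
    """Crude burial measure: for each residue, count the other residues whose
    representative atom (e.g. C-alpha) lies within `cutoff` angstroms."""
    return [
        sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= cutoff)
        for i, a in enumerate(coords)
    ]


def solvation_potential(
    proteins: Sequence[Tuple[str, Sequence[Coord]]],
    cutoff: float,
    threshold: int,
    kT: float = 1.0,
) -> Dict[Tuple[str, str], float]:
    """Derive pseudo-energies E(a, c) = -kT ln[P(c | a) / P(c)] for amino acid a
    in burial class c ('b' = buried, 'e' = exposed)."""
    pair_counts: Counter = Counter()   # occurrences of (amino acid, class)
    aa_counts: Counter = Counter()     # occurrences of each amino acid
    class_counts: Counter = Counter()  # occurrences of each burial class
    total = 0
    for seq, coords in proteins:
        for aa, n in zip(seq, neighbour_counts(coords, cutoff)):
            c = "b" if n >= threshold else "e"  # hypothetical two-class split
            pair_counts[(aa, c)] += 1
            aa_counts[aa] += 1
            class_counts[c] += 1
            total += 1

    energies: Dict[Tuple[str, str], float] = {}
    for (aa, c), n_ac in pair_counts.items():
        observed = n_ac / aa_counts[aa]       # P(c | a)
        reference = class_counts[c] / total   # P(c) over all residues
        # Negative energy: class c is more common for `aa` than for residues overall.
        energies[(aa, c)] = -kT * math.log(observed / reference)
    # Unobserved (aa, class) pairs are simply absent; a real potential would
    # need pseudocounts or a much larger database to handle sparse data.
    return energies


if __name__ == "__main__":
    # Toy "database": one five-residue protein with made-up coordinates.
    toy = ("LLIKE", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                     (20.0, 0.0, 0.0), (0.0, 20.0, 0.0)])
    for key, e in sorted(solvation_potential([toy], cutoff=5.0, threshold=2).items()):
        print(key, round(e, 3))
```

In the toy run, the three clustered residues (L, L, I) are classed as buried and the two distant ones (K, E) as exposed. Every observed pairing therefore receives a negative, favourable energy. With a realistic database, the same calculation yields positive energies for pairings that occur less often than expected, such as lysine in the interior.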