13.07.2015 Views

computer modeling in molecular biology.pdf

Tim J. P. Hubbard and Arthur M. Lesk

14% of sequences could be associated with a protein of known structure using standard sequence alignment methods.

For the output, the dividing line may soon become blurred if it becomes possible to predict more than just secondary structure (1-D) when families of homologous sequences are considered together. The Yeast chromosome III analysis found that 24% of sequences could be associated with an existing sequence family that had no known structure. As more proteins are sequenced such families are coming to have increasingly large numbers of members with wider sequence diversity. Since related sequences may all be expected to adopt the same fold, any prediction must be consistent with each sequence in such a family. This is a considerable restriction and has allowed significant improvements in 1-D secondary structure prediction [12-14], the latter method being available to anyone with access to electronic mail (send “help” to Predictprotein@embl-heidelberg.de). Since the number of natural folds is thought to be finite and may be as small as 1000 [15] there will come a time when all new sequences can be associated with a known protein structure. There is therefore something of a race between various methods - fold recognition versus fold prediction - that seek to eliminate the current “unpredictable” region of sequence space.

Figure 2-1 does not include all protein modelling exercises, as it omits designed sequences.
It is important to realise that even if methods to predict a structure consistent with a large family of sequences are developed, this is not a solution of the folding problem. The assumptions that (1) any sequence folds and (2) folds are similar among homologous sequences are based on evolutionary reasoning, for sequences that do not fold would be selected against and would not therefore be observed by chance, and significant sequence homologies are only likely to occur through divergent evolution, i.e. from a single fold. Designed sequences may not fold like the sequence to which they appear to be related and in many cases may not fold at all. In order to be able to predict the structure of a designed sequence it will be necessary to predict structure from individual sequences, ignoring evolutionary relations, i.e. to solve the a priori folding problem [16].

2.2 Proving a Sequence/Structural Relationship

The first stage in any modelling project should be to compare the sequence of the protein of interest with the contents of sequence databases. There are many sequence alignment programs available that can do this with varying speed and sensitivity. The objective is to find homologous sequences of known structure, but finding any homologous sequence is useful since it provides additional information about the protein to be modelled.
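
As a rough illustration of what such a database comparison involves, the sketch below scores a query against each entry of a small database with a Smith-Waterman local alignment and ranks the hits. It is not one of the programs the text refers to. The function names, the toy sequences and the scoring values (match 2, mismatch -1, linear gap -2) are all assumptions chosen for brevity; practical programs use amino acid substitution matrices and more elaborate gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b.

    Assumed toy scoring: identity match/mismatch and a linear gap penalty.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database):
    """Return (name, score) pairs sorted by descending alignment score."""
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; the names and sequences are invented.
database = {
    "toy_homologue": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_unrelated": "GGSGGSGGSGGS",
}
query = "KTAYIAKQRQ"

for name, score in rank_database(query, database):
    print(name, score)
```

A high score alone does not prove homology: in practice the score has to be judged against what unrelated sequences of similar length and composition would produce, which this sketch does not attempt.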
