12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf

4 Membrane Protein Structure Prediction 105

process is continued until all subsets have been validated. Two types of cross-validation are common in TM topology prediction. In K-fold cross-validation, the data set is partitioned into K subsets. Of the K subsets, a single subset containing a number of sequences is retained as validation data for testing the model, while the remaining K − 1 subsets are used as training data. This process is then repeated K times (folds), with each of the K subsets being used exactly once as the validation data. The K results from the folds can then either be combined or averaged to produce a single estimate. A more stringent, although computationally more intensive, form of cross-validation is leave-one-out cross-validation (LOOCV), also referred to as a jackknife test. Jackknifing involves testing a single sequence from the data set against a model trained on the remaining sequences, then repeating the test so that every sequence is validated once. This is the same as K-fold cross-validation with K equal to the number of sequences in the data set.

While some studies have attempted to compare TM topology prediction accuracy between different methods (e.g. Melén et al. 2003), significant progress has been made since then. Currently, the best TM topology predictors claim to predict correct topologies for 80–93% of proteins, though in the absence of independent cross-validation using a common test set it is difficult to compare methods accurately. Methods that perform well when tested on a particular data set, e.g. one containing few signal peptides, may perform poorly when tested on a data set that contains many signal peptides.
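The K-fold and leave-one-out procedures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular predictor: `k_fold_splits`, `cross_validate` and `leave_one_out` are hypothetical helper names, and `train` and `score` are placeholders for whatever model-fitting and accuracy functions a given method uses (for example, the fraction of proteins whose topology is predicted correctly). Folds here are contiguous slices of the input order; in practice the sequences would typically be shuffled first, and homologous sequences kept within the same fold, so that validation sequences are not near-duplicates of training ones.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one data-set entry, e.g. a sequence with its known topology
M = TypeVar("M")  # whatever object the (hypothetical) training function returns


def k_fold_splits(items: Sequence[T], k: int) -> List[Tuple[List[T], List[T]]]:
    """Split items into k non-overlapping subsets and return one
    (training, validation) pair per fold. Every item is used as
    validation data exactly once."""
    n = len(items)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of items")
    # Fold sizes differ by at most one when n is not a multiple of k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        validation = list(items[start:start + size])
        # The training data are the other K - 1 subsets.
        training = list(items[:start]) + list(items[start + size:])
        splits.append((training, validation))
        start += size
    return splits


def cross_validate(
    items: Sequence[T],
    k: int,
    train: Callable[[List[T]], M],
    score: Callable[[M, List[T]], float],
) -> float:
    """Train on K - 1 subsets, score on the held-out subset, and average
    the K per-fold scores into a single estimate."""
    fold_scores = [score(train(training), validation)
                   for training, validation in k_fold_splits(items, k)]
    return sum(fold_scores) / len(fold_scores)


def leave_one_out(
    items: Sequence[T],
    train: Callable[[List[T]], M],
    score: Callable[[M, List[T]], float],
) -> float:
    """LOOCV (jackknife): K-fold cross-validation with K equal to the
    number of items, so each validation subset holds a single item."""
    return cross_validate(items, len(items), train, score)


if __name__ == "__main__":
    seqs = [f"seq{i}" for i in range(10)]
    print([len(v) for _, v in k_fold_splits(seqs, 3)])  # [4, 3, 3]
    # Toy stand-ins: the "model" is just the size of its training set.
    print(leave_one_out(seqs, train=len, score=lambda model, val: float(model)))  # 9.0
```

Averaging the per-fold scores, as `cross_validate` does here, weights each fold equally; pooling all validation predictions before scoring (the "combined" option) weights each sequence equally instead, and the two can differ when folds are unequal in size.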
Methods optimised on a data set containing many weakly hydrophobic TM helices may tend to overpredict TM helices in other data sets. Current gold-standard TM protein data sets with topologies derived solely from structural data contain no more than 150 sequences when homology reduced (Lomize et al. 2006b). A lack of consensus amongst these data sets, combined with the scarcity of the necessary cross-validation files, means that differences in accuracy between methods may be a result of differences in training and validation data sets rather than significant differences in performance.

4.7 3D Structure Prediction

As with globular proteins, 3D structure prediction of TM proteins can be approached in two ways, homology modelling and ab initio modelling, covered in Chapters 1 and 3 of this book.

Homology modelling, also known as comparative modelling, uses a related template structure to build a 3D model of a target protein. The method is based on the observation that protein structure is more highly conserved than amino acid sequence, so even proteins that have diverged significantly in sequence but still share detectable similarity (>30% sequence identity) may also share common structural properties, particularly the overall fold. Because high-resolution crystal structures are difficult to obtain, particularly for TM proteins, homology modelling can provide useful structural models for generating hypotheses about a protein’s function and directing further experimental work. The process can be
