12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


4 Membrane Protein Structure Prediction

Table 4.4 Machine learning-based beta-barrel TM topology predictors. MSA: Topology predictions made using multiple sequence alignments. HGA: Suitable for whole genome analysis

Method      URL                                        Algorithm  Features
B2TMR       http://gpcr.biocomp.unibo.it/predictors/   NN         MSA
TMBETA-NET  http://psfs.cbrc.jp/tmbeta-net/            NN         MSA, HGA
HMM-B2TMR   http://gpcr.biocomp.unibo.it/predictors/   HMM        MSA
PROFtmb     http://www.rostlab.org/services/PROFtmb/   HMM        HGA
PRED-TMBB   http://biophysics.biol.uoa.gr/PRED-TMBB/   HMM        HGA
TMBETA-SVM  http://tmbeta-svm.cbrc.jp/                 SVM        HGA
TMB-Hunt2   http://bmbpcu36.leeds.ac.uk/               HMM + SVM  HGA

for discriminating between globular and TM proteins. Doing so requires that the method be specifically trained for this task and that the program be available as a standalone package, since web-based predictors are unsuitable for such large-scale submissions. A number of methods that are suitable for whole genome analysis of alpha-helical and beta-barrel TM proteins are shown in Tables 4.3 and 4.4. In general, error rates are minimised by prior filtering to remove signal and transit peptides using methods such as SignalP and TargetP, since globular proteins with such signal sequences are frequently predicted as single-spanning TM proteins. Currently, the best methods are capable of error rates of less than 1% for alpha-helical TM proteins (Jones 2007) and less than 6% for beta-barrel TM proteins (Park et al. 2005). The results of applying an alpha-helical TM protein discriminator to a number of proteomes are shown in Fig. 4.6.

4.6.4 Data Sets, Homology, Accuracy and Cross-Validation

A key element when constructing any prediction method is the use of a high quality data set for both training and validation purposes. Extracting a training set from available databases requires a large amount of work and a number of critical decisions. As an example in the case of TM proteins, searches of databases such as the PDB using the keyword ‘transmembrane’ will return both genomically encoded TM proteins as well as TM proteins that are not native, such as entry 1BH1 (a bilayer-disrupting peptide found in bee venom) and 1CII, a bacterial
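
One of the curation decisions described above, removing non-native entries from a keyword search result, can be illustrated with a minimal sketch. It assumes the exclusion list is maintained by hand; the function name `filter_hits`, the exclusion set and the hit list are hypothetical, and the placeholder identifiers `0XXA` and `0XXB` are not real PDB entries. Only 1BH1 and 1CII come from the text.

```python
# Minimal sketch: drop manually flagged entries from a keyword-search hit list.
# The function name, the exclusion set and the hit list are illustrative
# assumptions, not part of any published protocol.

# PDB entries that the text names as examples of non-native TM proteins.
EXCLUDED_IDS = {"1BH1", "1CII"}


def filter_hits(hit_ids, excluded_ids=EXCLUDED_IDS):
    """Return hit IDs that are not excluded, in order, without duplicates."""
    seen = set()
    kept = []
    for raw_id in hit_ids:
        pdb_id = raw_id.strip().upper()  # normalise case and whitespace
        if pdb_id in excluded_ids or pdb_id in seen:
            continue
        seen.add(pdb_id)
        kept.append(pdb_id)
    return kept


# Hypothetical result of a 'transmembrane' keyword search.
# "0XXA" and "0XXB" are placeholders, not real PDB entries.
hits = ["0XXA", "1bh1", "0XXB", "1CII", "0XXA"]
candidates = filter_hits(hits)
print(candidates)
```

A hand-curated exclusion list is only one possible choice; in practice it would likely be combined with other criteria that each require their own judgement.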
