17.12.2012 Views

crc press - E-Lib FK UWKS


312 Cell-Penetrating Peptides: Processes and Applications

and its cleavage site is predicted. The authors reported that their algorithm, SignalP, can predict the presence of signal peptides with high accuracy (72.4% for eukaryotes) and the program has been widely used. Later, SignalP was incorporated in TargetP, which can also predict the presence of mitochondrial targeting signals and chloroplast transit signals [219]. In version 2.0 of SignalP, the hidden Markov model method (see below) is also incorporated (http://www.cbs.dtu.dk/services/SignalP-2.0/).

One potential problem of the ANN-based approach is that it often uses too many numeric parameters relative to the size of the training data. In such a case, there is a danger of overfitting the network to the training data; i.e., the network may memorize the examples without extracting general sequence features. To overcome this difficulty, Jagla and Schuchhardt used an adaptive encoding artificial neural network method, with which they could reduce the number of parameters to one tenth without loss of prediction accuracy [220]. The approach of Bannai et al. was motivated by a different difficulty: the knowledge learned by a network is very hard to interpret from the resulting parameter sets [208]. They therefore extensively searched a much simpler parameter space that is directly interpretable as prediction rules.

14.5.1.3 Global Structure-Based Methods

The other category of signal peptide prediction methods tries to recognize the three-domain (tripartite) structure of signal peptides. Pioneering work on this approach was done by McGeoch [221]. In his algorithm, the start site of the h-region is first searched for within the N-terminal 12 residues; then the length of the h-region and the maximally hydrophobic eight-residue segment within it are used to judge the presence of a signal peptide. Later, the method was refined using discriminant analysis in a general-purpose program for predicting subcellular localization sites, PSORT/PSORT II [222-224]. In that refinement, a third variable, the net charge of the n-region, was also considered.

Note that McGeoch’s algorithm does not take the (–3, –1) rule into account and therefore cannot predict the cleavage site. However, this feature was used in PSORT to detect signal anchors (uncleavable signal peptides) by combining von Heijne’s and McGeoch’s algorithms; namely, a protein is predicted to have a signal anchor when McGeoch’s method gives a positive prediction and von Heijne’s algorithm a negative one [222]. However, as mentioned earlier, it is difficult to predict signal anchors exactly from the absence of potential cleavage sites alone.
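
The combination rule can be written down as a small decision function. This is only a sketch of the logic stated above: the two boolean inputs stand in for the outputs of McGeoch's and von Heijne's methods, the function name and labels are hypothetical, and the handling of the cases other than "signal anchor" is an assumption, since the text specifies only that one case.

```python
def classify_n_terminal_signal(mcgeoch_positive: bool, von_heijne_positive: bool) -> str:
    """Combine a hydrophobicity-based call with a cleavage-site-based call."""
    if mcgeoch_positive and not von_heijne_positive:
        # The case stated in the text: hydrophobic signal, no cleavage site.
        return "signal anchor"
    if mcgeoch_positive and von_heijne_positive:
        # Assumed reading: both methods agree on a cleavable signal peptide.
        return "cleavable signal peptide"
    # Assumed reading: without a McGeoch-positive call, no signal is reported.
    return "no signal sequence"
```

Because the "signal anchor" call rests entirely on a negative result from the cleavage-site predictor, any false negative of that predictor is turned into a false signal anchor, which is one way to see the limitation noted above.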

The hidden Markov model (HMM) has been widely used in various problems of speech recognition and bioinformatics, including the representation of protein functional domains and gene finding [216,225]. It is a probabilistic method in which a predefined model is optimized on a set of training data and the resulting model is then used to scan target sequences. One of the strengths of the HMM is that it can flexibly model various sequence features, including the tripartite structure of signal peptides. Nielsen and Krogh applied HMMs to signal peptide prediction [226]. Their program, named SignalP-HMM, can predict cleavable signal peptides and also signal anchors by modeling the structure of type II signal anchors.
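
To make the idea of scanning a sequence with a tripartite model concrete, the following is a toy left-to-right HMM decoded with the Viterbi algorithm. It is not the SignalP-HMM architecture: the four states (n, h, c, mature), the three residue classes, and every probability are invented for illustration, whereas in a real application the parameters would be estimated from training data.

```python
import math

STATES = ["n", "h", "c", "mature"]


def residue_class(aa):
    """Map a residue to a coarse class (an assumed, simplified alphabet)."""
    if aa in "AVLIMFWC":
        return "hydrophobic"
    if aa in "DEKR":
        return "charged"
    return "other"


# Illustrative parameters only; the model must start in the n-region and
# can only move forward: n -> h -> c -> mature.
START = {"n": 1.0, "h": 0.0, "c": 0.0, "mature": 0.0}
TRANS = {
    "n": {"n": 0.7, "h": 0.3},
    "h": {"h": 0.9, "c": 0.1},
    "c": {"c": 0.8, "mature": 0.2},
    "mature": {"mature": 1.0},
}
EMIT = {
    "n": {"hydrophobic": 0.30, "charged": 0.40, "other": 0.30},
    "h": {"hydrophobic": 0.85, "charged": 0.01, "other": 0.14},
    "c": {"hydrophobic": 0.30, "charged": 0.05, "other": 0.65},
    "mature": {"hydrophobic": 0.35, "charged": 0.25, "other": 0.40},
}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for seq under the toy model."""
    obs = [residue_class(aa) for aa in seq.upper()]
    if not obs:
        return []
    # Log-scores of the best path ending in each state at the first position.
    scores = [{s: _log(START[s]) + _log(EMIT[s][obs[0]]) for s in STATES}]
    backpointers = []
    for o in obs[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + _log(TRANS[p].get(s, 0.0)))
            col[s] = prev[best] + _log(TRANS[best].get(s, 0.0)) + _log(EMIT[s][o])
            ptr[s] = best
        scores.append(col)
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

In this toy model, the position where the decoded path switches from "c" to "mature" plays the role of a predicted cleavage site. A signal-anchor model could be sketched as a second, parallel chain of states without a cleavage transition, with the two chains compared by their scores.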
