12.07.2015 Views

View - ResearchGate

View - ResearchGate

View - ResearchGate

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

72 Uversky et al.sequence space and analyzed the space of various physicochemical properties (55),confirming the existence of several biases in IDP sequences. The mentionedsequence biases were exploited to develop a multitude of highly accurate predictorsof ID regions, which then were used to estimate the commonness of IDPs inthe three kingdoms of life, as well as to elaborate first identifiers of IDP function.The first predictor of ID regions was reported in 1997 (54). This two-layer feedforwardneural network, which achieved a surprising accuracy of about 70% clearlymarked the beginning of a new epoch by showing that (1) there are significant compositionaldifferences between ordered and ID protein regions, (2) the lack of fixedprotein 3D structure is predictable from amino acid sequence alone, and (3) IDregions of different lengths (short, medium, and long) may be compositionally differentfrom each other. The predictive model was later extended to the VLXT predictor(51), which is a combination of the VL1 and XT predictors (56). The lettersdescribe the amino acids used for training, where VL stands for Variously-characterizedLong disordered internal regions and XT stands for X-ray characterizedTerminal regions. The VLXT designation is preceded by a descriptive prefix,Predictor of Natural Disordered Regions (PONDR) giving PONDR VLXT.In 2000, it was noticed that natively unfolded proteins can be separated fromordered proteins by considering their average net charge and hydropathy (25).This observation led to the development of a simple binary classifier, the chargehydropathyplot (CH-plot) (25), which was based on the analysis of the aminoacid composition and instead of predicting ID on a per residue basis, classifiedentire protein as compact or natively unfolded. Another binary classifier is thecumulative distribution functions (CDF) analysis of disorder scores, which separatesordered and disordered sequences based on the per-residue disorder scoreretrieved by PONDR VLXT, and the optimal boundary (57,58). This methodsummarizes the per-residue predictions by plotting PONDR scores against theircumulative frequency, which allows ordered and disordered proteins to be distinguishedbased on the distribution of prediction scores.Later, more sophisticated methods based on various statistical and machinelearningtechniques (including bagging and boosting [59] and linear regressionmodel for the prediction of long disordered regions [60]) emerged, culminatingin the inclusion of the disorder prediction as a separate category in the CriticalAssessment of (protein) Structure Prediction (CASP) experiments (61,62).Table 1 presents the information related to those ID predictors that are scientificallynovel and/or published. These predictors are briefly outlined as follows:1. DISOPRED (63) is a neural network classifier trained on the position-specificscoring matrices and combined disorder prediction with the predictor of secondarystructure (64).2. PONDR VL3 is an ensemble of feed-forward neural networks that uses evolutionaryinformation and is trained on long disordered regions (65).

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!