12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


P. Tompa

5.2.3 Low Sequence Complexity and Disorder

Another manifestation of the repetitive nature of IDPs is the low sequence complexity of their polypeptide chains. Application of an entropy function (Shannon 1948) to amino acid sequences of proteins (Wootton 1994a, b) has shown that globular proteins appear mostly to be in a high-entropy (high-complexity) state, whereas many other proteins contain long regions of apparently low complexity. As much as 25% of all amino acids in Swiss-Prot are in low-complexity regions, and 34% of all proteins have at least one such segment (Wootton 1994a, b). The exact relationship between low complexity and disorder has been addressed in two studies. First, the relationship of alphabet size (the number of distinct amino acids) and complexity to the capacity for folding was studied (Romero et al. 1999). It was found that Swiss-Prot proteins cover the entire possible range of alphabet size (1–20) and entropy (K = 0.0–4.5), whereas globular domains occupy only a limited region (alphabet size = 10–20, K = 3.0–4.2). Regions with lower values (down to alphabet size = 3 and K = 1.5) mostly correspond to structured, fibrous proteins, such as coiled-coils, collagens and fibroins. It was concluded that a minimal alphabet size of 10 and an entropy near 2.9 are necessary and sufficient to define a sequence that can fold into a globular structure. Second, by extending these studies to IDPs (Romero et al. 2001), it was shown that the complexity distribution of disordered proteins is shifted to lower values, but overlaps significantly with that of ordered proteins.
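For orientation, the entropy measure referred to above can be written out explicitly. The form below is the standard Shannon entropy in bits. The symbols p_i (fraction of residue type i in the sequence or window) and N (alphabet size), and the choice of base-2 logarithms, are assumptions made here for illustration; the cited studies may differ in windowing or normalization.

```latex
\[
  K = -\sum_{i=1}^{N} p_i \log_2 p_i ,
  \qquad \sum_{i=1}^{N} p_i = 1
\]
% Upper bound: all N residue types equally frequent
\[
  K_{\max}(N) = \log_2 N ,
  \qquad K_{\max}(3) \approx 1.58 ,
  \quad K_{\max}(10) \approx 3.32 ,
  \quad K_{\max}(20) \approx 4.32
\]
```

Under these assumptions a homopolymer has K = 0, and a sequence built from only 10 residue types cannot exceed about 3.32 bits, which shows how alphabet size bounds the attainable complexity.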
Overall, disordered and low-complexity regions correlate and are abundant in proteomes, but low complexity and disorder should not be treated as synonyms.

5.3 Prediction of Disorder

Based on the noted compositional bias, about 25 predictors of disorder have been developed (see Table 5.1; Ferron et al. 2006; Dosztanyi et al. 2007). The best predictors approach the accuracy of the best secondary structure prediction algorithms, and the principles of comparing their performance have already been laid down.

5.3.1 Prediction of Low-Complexity Regions

As shown by the aforementioned studies, low sequence complexity differs from disorder, yet prediction of low-complexity regions can be considered a reasonable first approach to assessing disorder, or at least the lack of globularity. The entropy function of Shannon (Shannon 1948), adapted to protein sequences (Wootton 1994a, b), forms the basis of the SEG program, which is routinely used to identify sequence fragments of biased, low compositional complexity. This practice has definite value in delineating non-globular regions of proteins.
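The idea of scanning a sequence for low-entropy segments can be sketched in a few lines of Python. This is a simplified sliding-window illustration, not the SEG algorithm itself: the function names, the window length of 12 and the 2.2-bit cutoff are assumptions chosen for the example rather than values taken from the text.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # Sum -p * log2(p) over the residue types present in the segment.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) intervals covered by windows
    whose entropy is at or below the threshold.

    window and threshold are illustrative assumptions, not tuned values.
    """
    regions = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous flagged window: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q stretch between two diverse flanks.
    flank = "ACDEFGHIKLMNPRSTVWY"
    for start, end in low_complexity_regions(flank + "Q" * 12 + flank):
        print(f"low-complexity segment: residues {start + 1}-{end}")
```

Because whole windows are flagged, the reported interval extends a few residues into the flanking sequence; a more careful implementation would trim the boundaries, which this sketch does not attempt.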
