
BeNeLux Bioinformatics Conference – Antwerp, December 7-8, 2015<br />

10th Benelux Bioinformatics Conference bbc 2015<br />

P31. KMAD: KNOWLEDGE BASED MULTIPLE SEQUENCE ALIGNMENT<br />

FOR INTRINSICALLY DISORDERED PROTEINS<br />

Joanna Lange 1,2, Lucjan S Wyrwicz 1 & Gert Vriend 2*.<br />

Laboratory of Bioinformatics and Biostatistics, M. Sklodowska-Curie Memorial Cancer Center;<br />

Institute of Oncology 1, CMBI, Radboud University Nijmegen 2. * vriend@cmbi.ru.nl<br />

INTRODUCTION<br />

Intrinsically disordered proteins (IDPs) lack a stable tertiary<br />

structure and thus differ from globular proteins in terms of<br />

their sequence-structure-function relationships. IDPs have<br />

lower sequence conservation, different types of active<br />

sites, and a different distribution of functionally important<br />

regions, which together make their multiple sequence<br />

alignment (MSA) difficult.<br />

The algorithms underlying existing MSA programs are<br />

directly or indirectly based on knowledge obtained from<br />

studying three-dimensional protein structures. Here we<br />

introduce a tool for Knowledge based Multiple sequence<br />

Alignment for intrinsically Disordered proteins, KMAD,<br />

that incorporates short linear motif (SLiM), domain, and<br />

post-translational modification (PTM) annotations to<br />

improve the alignments.<br />
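
To illustrate the general idea of annotation-aware alignment, the sketch below adds a bonus to an ordinary substitution score whenever two residues carry the same annotation. It is a minimal illustration under stated assumptions, not KMAD's actual scoring scheme: the `Residue` class, the `FEATURE_BONUS` weights, and the toy `base_score` function are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical bonus weights; the abstract does not give KMAD's actual values.
FEATURE_BONUS = {"ptm": 3.0, "domain": 2.0, "slim": 4.0}


@dataclass(frozen=True)
class Residue:
    """One sequence position: an amino acid plus its annotation labels."""
    aa: str
    # Each feature is a (kind, name) tuple, e.g. ("ptm", "phosphorylation").
    features: frozenset = field(default_factory=frozenset)


def base_score(a: str, b: str) -> float:
    """Toy substitution score standing in for a real matrix such as BLOSUM62."""
    return 1.0 if a == b else -1.0


def pair_score(x: Residue, y: Residue) -> float:
    """Substitution score plus a bonus for every annotation both residues share."""
    shared = x.features & y.features
    return base_score(x.aa, y.aa) + sum(FEATURE_BONUS[kind] for kind, _ in shared)


# Two different residues that are both annotated as phosphorylation sites
# score higher than the same residues without a shared annotation.
phospho_s = Residue("S", frozenset({("ptm", "phosphorylation")}))
phospho_t = Residue("T", frozenset({("ptm", "phosphorylation")}))
plain_t = Residue("T")

print(pair_score(phospho_s, phospho_t))  # 2.0  (-1.0 mismatch + 3.0 PTM bonus)
print(pair_score(phospho_s, plain_t))    # -1.0 (no shared annotation)
```

With a score of this shape, a standard dynamic-programming aligner would tend to place annotated positions in the same column even when the surrounding sequence is poorly conserved.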

The KMAD web server is accessible at<br />

http://www.cmbi.ru.nl/kmad/. A standalone version is<br />

freely available.<br />

METHODS<br />

A dataset of proteins experimentally proven to be disordered<br />

was obtained from DisProt (Sickmeier et al., 2007). For<br />

each IDP, all homologous sequences were extracted from<br />

Swiss-Prot (The UniProt Consortium, 2014) using BLAST.<br />

The resulting sequence sets were aligned with several MSA tools.<br />

In addition to manual validation, we performed a<br />

benchmark validation on reference sets from BAliBASE<br />

(Thompson et al., 2005) and PREFAB (Edgar, 2004), which hold<br />

structure-based 'gold standard' sequence alignments. For this<br />

purpose we used KMAD and a modified version of<br />

KMAD that performs a 'refinement' of Clustal Omega<br />

(Sievers et al., 2011) alignments.<br />
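
Benchmark comparisons of this kind are commonly summarised with a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The abstract does not state which score was used, so the following is only a sketch of that common metric; the function names are hypothetical, and both alignments are assumed to list the same sequences in the same order, with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str], gap: str = "-") -> set[tuple[int, int, int, int]]:
    """Return every residue pair that shares a column.

    Each pair is (seq_i, residue_index_i, seq_j, residue_index_j), where the
    residue indices count ungapped positions within each sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_id, symbol in enumerate(column):
            if symbol != gap:
                present.append((seq_id, counters[seq_id]))
                counters[seq_id] += 1
        for (i, pi), (j, pj) in combinations(present, 2):
            pairs.add((i, pi, j, pj))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
# The test alignment recovers two of the three reference pairs.
print(sum_of_pairs_score(test, reference))
```

Published benchmarks often restrict the score to reliably aligned core regions of the reference; that refinement is omitted here for brevity.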

RESULTS & DISCUSSION<br />

Manual validation showed that KMAD avoids many<br />

mistakes made by Clustal Omega. An example of such an<br />

alignment mistake is shown in Figure 1.<br />

a) Clustal Omega<br />

b) KMAD<br />

FIGURE 1. Excerpts from Clustal Omega and KMAD alignments of<br />

human sialoprotein (SIAL_HUMAN) with four homologues. Different PTM<br />

types are highlighted in bright colours.<br />

In sequence alignment research it is common<br />

practice to compare the alignments produced by<br />

MSA software with those obtained from structure<br />

superpositions. IDPs do not possess a static 3D structure,<br />

so this method is not applicable to KMAD alignments.<br />

Both validation methods that we used have<br />

disadvantages, but so far there is no alternative. Validation<br />

on benchmark alignments of structured proteins is biased<br />

towards Clustal Omega, because it was optimized to work<br />

with structured proteins. Manual inspection, on the other hand,<br />

relies on the same features that influence the<br />

alignment, which is not a very elegant method, but given the<br />

nature of IDPs it is probably the best we can do.<br />

REFERENCES<br />

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high<br />

accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–<br />

1797.<br />

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W.,<br />

Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.<br />

D., and Higgins, D. G. (2011). Fast, scalable generation of<br />

high-quality protein multiple sequence alignments using Clustal Omega.<br />

Molecular Systems Biology, 7, 539.<br />

Sickmeier, M., Hamilton, J. A., LeGall, T., Vacic, V., Cortese, M. S.,<br />

Tantos, A., Szabo, B., Tompa, P., Chen, J., Uversky, V. N.,<br />

Obradovic, Z., and Dunker, A. K. (2007). DisProt: the Database of<br />

Disordered Proteins. Nucleic Acids Research, 35(Database issue),<br />

D786–93.<br />

The UniProt Consortium (2014). Activities at the Universal Protein<br />

Resource (UniProt). Nucleic Acids Research, 42(Database issue),<br />

D191–8.<br />

Thompson, J. D., Koehl, P., Ripp, R., and Poch, O. (2005). BAliBASE<br />

3.0: latest developments of the multiple sequence alignment<br />

benchmark. Proteins: Structure, Function, and Bioinformatics,<br />

61(1), 127–136.<br />
