BeNeLux Bioinformatics Conference – Antwerp, December 7-8 <strong>2015</strong><br />
10th Benelux Bioinformatics Conference <strong>bbc</strong> <strong>2015</strong><br />
P31. KMAD: KNOWLEDGE BASED MULTIPLE SEQUENCE ALIGNMENT<br />
FOR INTRINSICALLY DISORDERED PROTEINS<br />
Joanna Lange 1,2, Lucjan S Wyrwicz 1 & Gert Vriend 2*.<br />
Laboratory of Bioinformatics and Biostatistics, M. Sklodowska-Curie Memorial Cancer Center;<br />
Institute of Oncology 1; CMBI, Radboud University Nijmegen 2. * vriend@cmbi.ru.nl<br />
INTRODUCTION<br />
Intrinsically disordered proteins (IDPs) lack tertiary<br />
structure and thus differ from globular proteins in terms of<br />
their sequence – structure – function relations. IDPs have<br />
lower sequence conservation, different types of active<br />
sites, and a different distribution of functionally important<br />
regions, which together make their multiple sequence<br />
alignment (MSA) difficult.<br />
Algorithms underlying existing MSA programs are<br />
directly or indirectly based on knowledge obtained from<br />
studying three-dimensional protein structures. Here we<br />
introduce KMAD, a tool for Knowledge based Multiple<br />
sequence Alignment for intrinsically Disordered proteins,<br />
that incorporates short linear motif (SLiM), domain, and<br />
post-translational modification (PTM) annotations to<br />
improve the alignments.<br />
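The abstract does not describe how the annotations enter the alignment score, so the following is only a minimal sketch of one way an annotation-aware score could work: a residue substitution score plus a bonus for each annotation shared by the two positions. The toy substitution table, the `bonus` weight, and the function names are hypothetical and are not taken from KMAD.

```python
from typing import AbstractSet

# Hypothetical toy substitution scores; a real tool would use a full
# matrix such as BLOSUM62. These values are illustrative only.
TOY_SUBSTITUTION = {("S", "S"): 4, ("S", "T"): 1, ("T", "T"): 5}


def substitution_score(a: str, b: str, default: int = -1) -> int:
    """Look up a symmetric substitution score, falling back to a default."""
    return TOY_SUBSTITUTION.get((a, b), TOY_SUBSTITUTION.get((b, a), default))


def feature_aware_score(
    res_a: str,
    res_b: str,
    feats_a: AbstractSet[str],
    feats_b: AbstractSet[str],
    bonus: int = 3,  # assumed weight per shared annotation
) -> int:
    """Substitution score plus a bonus for every annotation shared by both positions."""
    shared = feats_a & feats_b
    return substitution_score(res_a, res_b) + bonus * len(shared)
```

Under these assumptions, two positions that both carry the same PTM annotation score higher than the residues alone would, which pulls annotated positions into the same column even when the surrounding sequence is poorly conserved.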
The KMAD web server is accessible at<br />
http://www.cmbi.ru.nl/kmad/. A standalone version is<br />
freely available.<br />
METHODS<br />
A dataset of proteins experimentally proven to be disordered<br />
was obtained from DisProt (Sickmeier et al., 2007). For<br />
each IDP, all homologous sequences were extracted from<br />
Swiss-Prot (The UniProt Consortium, 2014) using BLAST.<br />
The sequence sets were aligned with several MSA tools.<br />
Apart from manual validation, we also performed a<br />
benchmark validation on reference sets from BAliBASE<br />
(Thompson et al., 2005) and PREFAB (Edgar, 2004), which<br />
contain structure-based 'gold standard' sequence alignments.<br />
For this purpose we used KMAD and a modified version of<br />
KMAD, which performs a 'refinement' of Clustal Omega<br />
(Sievers et al., 2011) alignments.<br />
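The abstract does not state which score was used to compare alignments with the reference sets. As an illustration only, a common choice for this kind of benchmark is the sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below assumes both alignments are lists of equal-length gapped strings, with the same sequences in the same order and `-` as the gap character. The function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped residue index),
    so the result does not depend on where gaps are placed.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_idx, char in enumerate(column):
            if char != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Residues are collected in sequence order, so pairs are canonical.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

For example, with `reference = ["AC-G", "ACTG"]` and `test = ["ACG-", "ACTG"]`, the reference aligns three residue pairs and the test alignment reproduces two of them, giving a score of 2/3.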
RESULTS & DISCUSSION<br />
Manual validation showed that KMAD avoids many<br />
mistakes made by Clustal Omega. An example of an<br />
alignment mistake is shown in Figure 1.<br />
FIGURE 1. Excerpts from (a) Clustal Omega and (b) KMAD alignments of<br />
human sialoprotein (SIAL_HUMAN) with four homologues. Different PTM<br />
types are highlighted in bright colours.<br />
In the field of sequence alignment research it is common<br />
practice to compare the sequence alignments obtained with<br />
MSA software with those that are obtained from structure<br />
superpositions. IDPs do not possess a static 3D structure,<br />
so this method is not applicable to KMAD alignments.<br />
Both of the validation methods that we used have their<br />
disadvantages, but so far there is no alternative. Validation<br />
on benchmark alignments of structured proteins is biased<br />
towards Clustal Omega, because it was optimized to work<br />
with structured proteins. On the other hand, manual<br />
inspection based on the same features that influence the<br />
alignment is not a very elegant method, but given the<br />
nature of IDPs it is probably the best we can do.<br />
REFERENCES<br />
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high<br />
accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–<br />
1797.<br />
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W.,<br />
Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.<br />
D., and Higgins, D. G. (2011). Fast, scalable generation of high-quality<br />
protein multiple sequence alignments using Clustal Omega.<br />
Molecular Systems Biology, 7, 539.<br />
Sickmeier, M., Hamilton, J. A., LeGall, T., Vacic, V., Cortese, M. S.,<br />
Tantos, A., Szabo, B., Tompa, P., Chen, J., Uversky, V. N.,<br />
Obradovic, Z., and Dunker, A. K. (2007). DisProt: the Database of<br />
Disordered Proteins. Nucleic Acids Research, 35(Database issue),<br />
D786–D793.<br />
The UniProt Consortium (2014). Activities at the Universal Protein<br />
Resource (UniProt). Nucleic Acids Research, 42(Database issue),<br />
D191–D198.<br />
Thompson, J. D., Koehl, P., Ripp, R., and Poch, O. (2005). BAliBASE<br />
3.0: latest developments of the multiple sequence alignment<br />
benchmark. Proteins: Structure, Function, and Bioinformatics,<br />
61(1), 127–136.<br />