Research Unit for Immunoinformatics

An important mission of our research unit is to develop and maintain an open-resource bioinformatics platform and data resources in order to gain insights into primary immunodeficiency diseases (PID) through the accumulation and analysis of genomic, transcriptomic, and proteomic data. Our ultimate goal is to provide relevant, up-to-date, and validated information on PID, in accordance with global community standards, in an easily decipherable and usable format.

Unit leader: Sujatha Mohan
Research Associates: Shivakumar Keerthikumar, Rajesh Raju

Generation of Primary Immunodeficiency Disease Database from Asia

Primary immunodeficiency diseases are a rare and genetically heterogeneous group of disorders affecting various components of the innate and adaptive immune systems. Despite rapid developments in the science of PID, early diagnosis and effective treatment continue to challenge physicians all over the world. In this context, a freely accessible, dynamic, web-based integrated tool on PID has become a pressing need for different target groups such as clinical immunologists, researchers, patients, and family members. We hope that this initiative will assist clinical immunologists and physicians in identifying susceptible populations, followed by cost-effective regular screening procedures to minimize clinical deterioration, and will provide researchers with essential clues for analyzing candidate PID genes.
We also hope to create awareness among PID patients to help them lead a near-normal quality of life and to support affected family members with relevant PID information. The strength of this project is that it makes access to biological information simple for those without any bioinformatics knowledge and integrates it with experimental data from a diverse set of biological platforms, ranging from FACS and DNA genotyping to clinical phenotypes and signaling pathways. Our first focus is on the analysis of primary immunodeficiency diseases because RCAI investigators have been working intensively on PID in collaboration with outside researchers and have accumulated a wide array of clinical and experimental data.

To achieve our goals, we propose the following strategic approaches:
i) curation of PID literature information, along with publicly available clinical information, into a web-accessible knowledge base;
ii) development of PID annotation tools;
iii) accumulation of cell-specific immune and other signaling pathway information and depiction of signaling cascades with various biological processes using a visualization tool;
iv) analysis of known PID gene mutations and, where applicable, the mechanism of defective regulation in signaling pathways;
v) identification of candidate genes by an in silico approach;
vi) three-dimensional structural prediction of mutant PID proteins relative to their normal counterparts.

Figure 1. PID annotation tool web interface.

Recent publications
H. C. Harsha, Shubha Suresh, Ramars Amanchy, Nandan Deshpande, K. Shanker, A. J. Yatish, Babylakshmi Muthusamy, B. M. Vrushabendra, B. P. Rashmi, K. N. Chandrika, N. Padma, Salil Sharma, Jose L. Badano, M. A. Ramya, H. N. Shivashankar, Suraj Peri, Dipanwita Roy Choudhury, M. P. Kavitha, R. Saravana, Vidya Niranjan, T. K. B. Gandhi, Neelanjana Ghosh, Sreenath Chandran, Minal Menezes, Mary Joy, S. Sujatha Mohan, Nicholas Katsanis, Krishna S. Deshpande, Chaerkady Raghothama, C. K. Prasad, and Akhilesh Pandey. Manually Curated Functional Annotation of the Human X chromosome. Nature Genetics 37: 331-332 (2005).

Overall architecture of the PID database
The PID database will be constructed as a web-based tool built on a traditional three-tier architecture: a client application as the first tier, Python and DTML code running on the Zope server engine as the second tier, and a MySQL backend database as the third tier. This architecture will support the annotation, storage, retrieval, review, and update of PID information and will generate a standard output format.
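As a rough illustration of the store, retrieve, review, and update cycle that the data tier of such a three-tier design has to support, the sketch below uses Python's built-in sqlite3 module purely as a stand-in for MySQL. The table layout, the function names, and the placeholder gene identifier are hypothetical and are not taken from the PID database itself.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (stand-in) backend and create a hypothetical annotation table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        " gene TEXT NOT NULL,"
        " field TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " reviewed INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (gene, field))"
    )
    return conn


def save_annotation(conn, gene, field, value):
    """Store or update one annotation; any change resets the review flag."""
    conn.execute(
        "INSERT OR REPLACE INTO annotation (gene, field, value, reviewed) "
        "VALUES (?, ?, ?, 0)",
        (gene, field, value),
    )
    conn.commit()


def mark_reviewed(conn, gene, field):
    """Record that a curator has reviewed this annotation."""
    conn.execute(
        "UPDATE annotation SET reviewed = 1 WHERE gene = ? AND field = ?",
        (gene, field),
    )
    conn.commit()


def export_gene(conn, gene):
    """Retrieve all annotations for a gene in one uniform output structure."""
    rows = conn.execute(
        "SELECT field, value, reviewed FROM annotation "
        "WHERE gene = ? ORDER BY field",
        (gene,),
    ).fetchall()
    return {
        "gene": gene,
        "annotations": [
            {"field": f, "value": v, "reviewed": bool(r)} for f, v, r in rows
        ],
    }
```

In this sketch, a middle-tier handler would call `save_annotation` when a curator submits a form and `export_gene` when a client requests a record. Resetting the review flag on every update is one possible design choice for keeping reviewed and unreviewed content distinguishable.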
Future plans
We propose to develop a web-based, integrated, dynamic database for PID using a simple graphical user interface (Figure 1). We are also standardizing the entire workflow of a prediction program in order to automate and streamline the identification of the most probable candidate PID genes for further experimental validation (Figure 2). Given the rare occurrence of PID, we plan to encourage physicians to submit patient data directly to this resource, thus ensuring a continuous flow of relevant information as well as a sufficient number of updated records. We believe that this repository will serve as a prototype for other immunological diseases and will be of great value to physicians in clinical decision making and diagnosis. This effort will combine the skills of clinicians and of scientists in molecular biology, immunology, genomics, proteomics, and bioinformatics from other countries, especially in Asia.

The PID project has been initiated in collaboration with the Institute of Bioinformatics (IOB), Bangalore, India; the Immunogenomics research group at RIKEN RCAI, Japan; and the Kazusa DNA Research Institute (KDRI), Japan. This research unit is supported by the Asia S&T Strategic Cooperation Promotion Program, Special Coordination Funds for Promoting Science and Technology, of the Ministry of Education, Culture, Sports, Science and Technology (MEXT).

Recent publications (continued)
Muthusamy B, Hanumanthu G, Suresh S, Rekha B, Srinivas D, Karthick L, Vrushabendra BM, Sharma S, Mishra G, Chatterjee P, Mangala KS, Shivashankar HN, Chandrika KN, Deshpande N, Suresh M, Kannabiran N, Niranjan V, Nalli A, Keshava Prasad TS, Arun KS, Reddy R, Chandran S, Jadhav T, Julie D, Mahesh M, John SL, Palvankar K, Sudhir D, Bala P, Rashmi NS, Vishnupriya G, Dhar K, Reshma S, Chaerkady R, Gandhi TK, Harsha HC, S. Sujatha Mohan, Deshpande KS, Sarker M, Pandey A. Plasma proteome database as a resource for proteomics research. Proteomics 5: 3531-3536 (2005).

Suresh S, S. Sujatha Mohan, Mishra G, Hanumanthu GR, Suresh M, Reddy R, Pandey A. Proteomic resources: Integrating biomedical information in humans. Gene 364: 13-18 (2005).

Gopa R. Mishra, M. Suresh, K. Kumaran, N. Kannabiran, Shubha Suresh, P. Bala, K. Shivakumar, N. Anuradha, Raghunath Reddy, T. Madhan Raghavan, Shalini Menon, G. Hanumanthu, Malvika Gupta, Sapna Upendra, Shweta Gupta, M. Mahesh, Bincy Jacob, Pinky Mathew, Pritam Chatterjee, K. S. Arun, Salil Sharma, Chandrika N. Deshpande, Nandan P. Deshpande, Kshitish Palvankar, R. Raghavnath, R. Krishnakanth, Hiren Karathia, B. Rekha, Rashmi Nayak, G. Vishnupriya, Mohan Kumar, M. Nagini, G. S. Sameer Kumar, Rojan Jose, P. Deepthi, S. Sujatha Mohan, T. K. B. Gandhi, H. C. Harsha, Krishna S. Deshpande, Malabika Sarker, T. S. Keshav Prasad, and Akhilesh Pandey. Human Protein Reference Database - 2006 Update. Nucleic Acids Research 34: D411-414 (2006).

T K B Gandhi, Jun Zhong, Suresh Mathivanan, L Karthick, K N Chandrika, S. Sujatha Mohan, Salil Sharma, Stefan Pinkert, Shilpa Nagaraju, Balamurugan Periaswamy, Goparani Mishra, Kannabiran Nandakumar, Beiyi Shen, Nandan Deshpande, Rashmi Nayak, Malabika Sarker, Jef D Boeke, Giovanni Parmigiani, Jörg Schultz, Joel S Bader, and Akhilesh Pandey. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genetics 38: 285-293 (2006).

Figure 2. Flow chart outlining our current approach to the proposed PID prediction algorithm. A list of known PID genes (1) or any genes (2) is given as input and passed through various gateways: the Human Protein Reference Database (HPRD, http://www.hprd.org/) and STRING (http://string.embl.de/) (3); HuPA (http://humanproteinpedia.org/) (4); GO (http://www.geneontology.org/) (5); KEGG (http://www.genome.jp/kegg/) and NetPath (http://netpath.org/) (6); RefDIC (http://refdic.rcai.riken.jp/welcome.cgi) (7); and MGI (http://www.informatics.jax.org/) (8). Each gene is scored and ranked by the number of gateways it passes in order to identify the most probable candidate PID genes (9).
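The gateway scoring outlined in the Figure 2 caption, where each gene is ranked by the number of gateways it passes, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the unit's actual prediction program: each gateway is modeled as a simple yes/no check, the function and variable names are hypothetical, and the toy lookup sets use placeholder gene identifiers instead of real resource content.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A gateway is any check that reports whether a gene "passes" one resource.
Gateway = Callable[[str], bool]


def rank_candidates(
    genes: Iterable[str], gateways: Dict[str, Gateway]
) -> List[Tuple[str, int, List[str]]]:
    """Score each gene by the number of gateways it passes, highest first."""
    scored = []
    for gene in genes:
        passed = [name for name, check in gateways.items() if check(gene)]
        scored.append((gene, len(passed), passed))
    # Sort by descending score; break ties alphabetically for stable output.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored


# Toy stand-ins for resource lookups (placeholder gene IDs, not real data).
TOY_HITS = {
    "HPRD/STRING": {"GENE_A", "GENE_B"},
    "GO": {"GENE_A", "GENE_C"},
    "KEGG/NetPath": {"GENE_A"},
}
TOY_GATEWAYS: Dict[str, Gateway] = {
    name: (lambda gene, hits=hits: gene in hits)
    for name, hits in TOY_HITS.items()
}

if __name__ == "__main__":
    candidates = ["GENE_B", "GENE_C", "GENE_A", "GENE_D"]
    for gene, score, passed in rank_candidates(candidates, TOY_GATEWAYS):
        print(gene, score, passed)
```

An unweighted count treats every gateway as equally informative. A real pipeline might weight gateways differently or require particular combinations, but the caption does not specify this, so the sketch keeps the simplest reading.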