05.12.2012 Views

3D Structure Databases - Uses for Biological Problem solving - EBI


3D Structure Databases - Uses for Biological Problem Solving

The course will teach the basic principles of 3D database technology and the associated tools for data analysis to bioscientists wishing to understand the wealth of structure information available. The course is aimed at PhD students and postdocs, to give them familiarity with how structure data can be used in their own projects.

Databases for 3D structural data for proteins and nucleic acids, together with the associated access tools, have matured into a major tool for molecular biology. The course is intended to cover the background to relational databases and the computational aspects of characterizing the structure of biological macromolecules.

The importance of databases in biological research has been stressed in the recent Nature technology feature by Buckingham [1]. In the United States, the National Science Foundation (NSF) has announced a new initiative, the 'Biological Databases and Informatics Program Announcement' [2], in the belief that future advances in the biological sciences will depend both upon the creation of new knowledge and upon effective management of proliferating information. Further general background can be found in references [3] and [4].

1. S. Buckingham (2004) Data's future shock. Nature 428, 774-777.

2. Biological Databases and Informatics Program Announcement, NSF 02-058. http://www.nsf.gov/pubs/2002/nsf02058/nsf02058.html

3. Michael Y. Galperin (2004) The Molecular Biology Database Collection: 2004 update. Nucleic Acids Research 32, Database issue, D3-D22.

4. Andrej Sali, Robert Glaeser, Thomas Earnest & Wolfgang Baumeister (2003) From words to literature in structural proteomics. Nature 422, 216-225.

Provisional Timetable<br />

Lecturers:<br />

Professor Janet Thornton Dr Sue Jones<br />

Dr Roman Laskowski Dr Hannes Ponstingl<br />

Dr Kim Henrick Dr Eugene Krissinel<br />

Dr Phil McNeil Dr Thomas Oldfield<br />

Dr Sameer Velankar Mr Adel Golovin<br />

Mr Dimitris Dimitropoulos Dr Gerard Kleywegt<br />

Dr Jaime Prilusky Dr Helen Berman<br />

Dr Loredana Lo Conte Dr Christine Orengo<br />

Professor Bob Spence Dr James Milner-White<br />

Dr Tom Oldfield Dr John Westbrook<br />

Dr. Philip E. Bourne Dr Robert Finn<br />

Ms. Kyle Burkhardt<br />

Monday 20th September

9:00-9:40 Structure analysis Professor Janet Thornton (EBI)

1. Todd A.E., Orengo C.A., Thornton J.M. (2002) Plasticity of enzyme active sites. Trends Biochem Sci. 27, 419-26.

2. Steward RE, MacArthur MW, Laskowski RA, Thornton JM. (2003) Molecular basis of inherited diseases: a structural perspective. Trends Genet. 19, 505-13.

3. Sanishvili R, Yakunin AF, Laskowski RA, Skarina T, Evdokimova E, Doherty-Kirby A, Lajoie GA, Thornton JM, Arrowsmith CH, Savchenko A, Joachimiak A, Edwards AM. (2003) Integrating structure, bioinformatics, and enzymology to discover function: BioH, a new carboxylesterase from Escherichia coli. J Biol Chem. 278, 26039-45.

9:40-10:20 An overview of the RCSB Protein Data Bank Dr. Helen M. Berman, RCSB Protein Data Bank, Rutgers, The State University of New Jersey

A description of the resources for data deposition, validation and query offered by the RCSB PDB will be given.

1. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. and Bourne, P. E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.

2. Berman, H.M., Battistuz, T., Bhat, T.N., Bluhm, W.F., Bourne, P.E., Burkhardt, K., Feng, Z., Gilliland, G.L., Iype, L., Jain, S., Fagan, P., Marvin, J., Padilla, D., Ravichandran, V., Schneider, B., Thanki, N., Weissig, H., Westbrook, J.D. and Zardecki, C. (2002) The Protein Data Bank. Acta Crystallogr D 58, 899-907.

3. John Westbrook, Zukang Feng, Li Chen, Huanwang Yang and Helen M. Berman (2003) The Protein Data Bank and structural genomics. Nucleic Acids Research, 31, 489-491.

4. Bhat, T.N., Bourne, P., Feng, Z., Gilliland, G., Jain, S., Ravichandran, V., Schneider, B., Schneider, K., Thanki, N., Weissig, H. et al. (2001) The PDB data uniformity project. Nucleic Acids Res., 29, 214-218.

5. Westbrook, J., Feng, Z., Jain, S., Bhat, T.N., Thanki, N., Ravichandran, V., Gilliland, G.L., Bluhm, W., Weissig, H., Greer, D.S. et al. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245-248.

10:20-11:00 Crystals, Symmetry and Protein Assemblies Dr Kim Henrick (EBI)

1. Henrick, K. and Thornton, J.M. (1998) PQS: a protein quaternary structure file server. Trends Biochem. Sci., 23, 358-361.

11:00-11:30 coffee break

11:30-12:10 Protein-DNA Interactions: analysis and prediction Dr Sue Jones

The 3D structures of over 700 proteins bound to DNA molecules have been determined. These proteins have diverse structural folds, and achieve binding and recognition of DNA in many different ways. This lecture will give an overview of the prominent characteristics of DNA-binding proteins, and explain how common physicochemical properties and conserved structural motifs can be used in a predictive manner to identify novel DNA-binding proteins.

1. Jones S. & Thornton J.M. (2004) Searching for functional sites in protein structures. Current Opinion in Chemical Biology 8, p3-7.

2. Jones S, Shanahan H, Berman H.M. & Thornton J.M. (2003) Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Research 31, p7189-7198.

3. Jones S, Barker J, Nobeli I & Thornton JM. (2003) Using structural motifs to identify proteins with DNA binding function. Nucleic Acids Research 31, p2811-2823.

4. Jones S. & Thornton J.M. (2003) Protein-DNA interactions: the story so far and a new method for prediction. Comparative and Functional Genomics 4, p428-431.

5. Jones S, van Heyningen P, Berman HM & Thornton JM. (1999) Protein-DNA interactions: a structural analysis. Journal of Molecular Biology 287, p877-896.

Additional Notes on Protein-Protein Interactions: classification, analysis and prediction

Interactions between proteins are fundamental to many diverse biological processes including signal transduction, enzyme inhibition and cell adhesion. These interactions can be classified as 'obligate' or 'non-obligate'. Obligate interactions form the basis of the quaternary structure of multimeric proteins, and non-obligate interactions occur between proteins that exist independently as well as in complexes. This lecture will give an overview of the characteristics of protein-protein complexes from 3D structures, and explain how these features vary depending on the class of the complex. A method for the prediction of protein interaction sites will then be described, based on the analysis of patches on the protein surface.

• Jones S & Thornton JM. (1999) Protein domain interfaces: characterisation and comparison with oligomeric protein interfaces. Protein Engineering 13, p77-82.

• Jones S & Thornton JM. (1997) Prediction of protein-protein interaction sites using patch analysis. Journal of Molecular Biology 272, p133-143.

• Jones S & Thornton JM. (1997) Analysis of protein-protein interaction sites using surface patches. Journal of Molecular Biology 272, p121-132.

• Jones S & Thornton JM. (1996) Principles of protein-protein interactions derived from structural studies. Proceedings of the National Academy of Science (USA) 93, p13-20.

12:10-12:50 Using the 'Thornton-Group' WWW Database Services for Structural Research Dr Roman Laskowski EBI

1. R.A. Laskowski, J.D. Watson and J.M. Thornton (2003) From protein structure to biochemical function. J. Struct. Funct. Genomics, 4, 167-177.

2. Laskowski RA. (2001) PDBsum: summaries and analyses of PDB structures. Nucleic Acids Res. 29, 221-2.

3. Luscombe N M, Laskowski R A, Westhead D R, Milburn D, Jones S, Karmirantzou M, Thornton J M (1998) New tools and resources for analysing protein structures and their interactions. Acta Cryst., D54, 1132-1138.

4. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucl. Acids Res. 32, D129-D133.

5. G J Bartlett, C T Porter, N Borkakoti & J M Thornton (2002) Journal of Molecular Biology 324, 105-121.

12:50-14:00 lunch

14:00-14:40 Quaternary Structure Inference of Proteins from their Crystals Dr Hannes Ponstingl EBI

Protein-Protein Interactions: The basic principles which determine the strength and geometry of protein-protein complexes by Patch Analysis and other methods.

1. H. Ponstingl, T. Kabir & J. M. Thornton (2003) Automatic inference of protein quaternary structure from crystals. J. Appl. Cryst. 36, 1116-1122.

2. H. Ponstingl, K. Henrick & J. M. Thornton (2000) Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins 41, 47-57.

14:40-15:20 Validation of protein structures ... Or: just because it was published in Nature, doesn't mean it's true! Dr. Gerard Kleywegt, Uppsala, Sweden

With the explosive growth of the number of experimentally determined macromolecular structures, "structural awareness" is becoming an important aspect of many disciplines, ranging from medicinal chemistry to cell biology. This means that many scientists who have not been specifically trained in the area want to make use of structural information in order to explain the molecular basis of their own research findings, to plan new experiments, to design novel ligands, substrates or inhibitors, etc. What these users of structural information are often unaware of is that there are limitations to and uncertainties in the experimentally determined structures. A number of protein structures have been published (often in prestigious journals) that turned out to be partly or entirely incorrect. A few examples will be given, and simple ways in which non-experts can assess the overall reliability of structures will be discussed.

However, as technology improves, such gross errors are less and less likely to occur. On the other hand, mistakes in the details of the structures are much easier to make, and concomitantly more difficult to detect. It is often in these details, however, that the value of a structure lies, since they reveal the molecular basis of interactions. This is particularly true for non-macromolecular entities (ligands, inhibitors, substrate analogues, sugars, ions, etc.). Some of the pitfalls and limitations of the use of structural information will be discussed, with a view to structure-based design. In the practical, some of the basics of protein structure validation will be reviewed, and the use of various databases (PDB, PDBsum, PDBREPORT and EDS) to assess the quality of deposited protein structures will be explained.

References:

1. G J Kleywegt, "Validation of protein crystal structures" (Topical review), Acta Crystallographica, D56, 249-265 (2000).

2. AM Davis, SJ Teague & GJ Kleywegt, "Applications and limitations of X-ray crystallographic data in structure-based ligand and drug design", Angewandte Chemie International Edition, 42, 2718-2736 (2003).

3. EDS Viewer for structures and electron density maps - interpretation of validation criteria

15:20-16:00 3D databases and data warehouse technology Dr Phil McNeil (EBI)

16:00-16:30 coffee break

16:30-17:10 Clustering of 3D structures and representative sets Dr Thomas Oldfield EBI

1. Li, W., Jaroszewski, L. and Godzik, A. (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17, 282-283.

17:10-17:50 Sequences and 3D structures (Integration of 3D data and sequence databases) Dr Sameer Velankar EBI

The "Structure integration with function, taxonomy and sequence (SIFTS) initiative" aims to work towards the integration of various bioinformatics resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt, which are primary archival databases for structure and sequence data, is the absence of an up-to-date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. The project was started in 2001 and has resulted in robust mechanisms for exchanging data between the two primary data resources. This has dramatically improved the quality of annotation in both databases and is aiding the continuing improvement of legacy data. In the longer term this project will not only allow better and closer integration of derived-data resources but will also continue to improve the quality of all data in the primary resources. This information is vital for the reliable integration of sequence family databases such as Pfam and InterPro with the structure-oriented databases SCOP and CATH. It has been made available to the eFamily group and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, InterPro, SCOP and CATH).
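A mapping between structure and sequence entries of the kind described above can be pictured as a table of aligned segments. The following is a minimal Python sketch under a simplified, assumed record layout: the `Segment` fields, the `to_uniprot` helper and every identifier in `SEGMENTS` are illustrative placeholders, not the actual SIFTS or MSD schema and not real database entries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One contiguous stretch of a PDB chain aligned to a UniProt sequence."""
    pdb_id: str
    chain: str
    pdb_start: int      # first residue number of the stretch in the PDB chain
    pdb_end: int        # last residue number of the stretch (inclusive)
    accession: str      # UniProt accession (placeholder values below)
    uniprot_start: int  # UniProt position that corresponds to pdb_start


def to_uniprot(segments: List[Segment], pdb_id: str, chain: str,
               residue: int) -> Optional[Tuple[str, int]]:
    """Translate a PDB residue number into (accession, UniProt position).

    Returns None when the residue lies outside every mapped segment,
    for example in an expression tag or an unmapped linker.
    """
    for seg in segments:
        same_chain = (seg.pdb_id, seg.chain) == (pdb_id, chain)
        if same_chain and seg.pdb_start <= residue <= seg.pdb_end:
            offset = residue - seg.pdb_start
            return seg.accession, seg.uniprot_start + offset
    return None


# Made-up records for illustration only (hypothetical entry "xxxx").
SEGMENTS = [
    Segment("xxxx", "A", 1, 120, "P00000", 25),
    Segment("xxxx", "A", 130, 200, "P00000", 160),
]

print(to_uniprot(SEGMENTS, "xxxx", "A", 10))
print(to_uniprot(SEGMENTS, "xxxx", "A", 125))
```

Storing segments rather than one row per residue keeps the table small, at the cost of a scan per lookup; a production resource would more likely index the mapping in the relational database itself.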


Tuesday 21st September

Participants split into 2 groups.

Morning session Group 1

Tutorials (15 min intro/demo - 45 minute tutorial)

• Searching 3D protein structures for conserved DNA-binding motifs: Dr Sue Jones

• Using the 'Thornton-Group' WWW Database Services for Structural Research Dr Roman Laskowski EBI (2 hours)
  CATRES, Catalytic Site Atlas (CSA), NetFunc, EC->PDB, Pita - Protein InTerfaces and Assemblies, Receptor Structure and Function, Protein Side-Chain Interactions, Practical: Structural Genomics

• Pfam and MEROPS Robert Finn Sanger

Morning session Group 2

Lectures (40 minutes - intro/demos)

• Protein Structure Classification and Genome Annotation: Technologies and Insights from the CATH Database. Professor Christine Orengo UCL London UK

• SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB Cambridge UK)

• SSM fold characterization Dr Eugene Krissinel EBI

Afternoon session Group 1

Lectures (40 minutes - intro/demos)

• Protein Structure Classification and Genome Annotation: Technologies and Insights from the CATH Database. Professor Christine Orengo UCL London UK

• SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB Cambridge UK)

• SSM fold characterization Dr Eugene Krissinel EBI

Afternoon session Group 2

Tutorials (15 min intro/demo - 45 minute tutorial)

• Searching 3D protein structures for conserved DNA-binding motifs: Dr Sue Jones

• Using the 'Thornton-Group' WWW Database Services for Structural Research Dr Roman Laskowski EBI (2 hours)
  CATRES, Catalytic Site Atlas (CSA), NetFunc, EC->PDB, Pita - Protein InTerfaces and Assemblies, Receptor Structure and Function, Protein Side-Chain Interactions, Practical: Structural Genomics

• Pfam and MEROPS Robert Finn Sanger

[ Motif and protein structures Dr. Gerard Kleywegt Uppsala Sweden ]

Protein Structure Classification and Genome Annotation: Technologies and Insights from the CATH Database. Professor Christine Orengo UCL London UK

CATH is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.

I will illustrate concepts behind protein structure comparison and classification using the CATH database. I will also present methods for providing structural annotations for genome sequences.

1. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. and Thornton, J. M. (1997) CATH - a hierarchic classification of protein domain structures. Structure, 5, 1093-1108.

2. Pearl, F.M.G., Lee, D., Bray, J.E., Sillitoe, I., Todd, A.E., Harrison, A.P., Thornton, J.M. and Orengo, C.A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Research 28, 277-282.
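The four-level hierarchy described above lends itself to a simple prefix grouping. The sketch below assumes the dotted C.A.T.H numbering that CATH uses to label its nodes; the domain names (`domA` and so on), their assignments and the helper functions are hypothetical and serve only to show how domains cluster at each level.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class CathCode(NamedTuple):
    """The four levels of a CATH classification, most general first."""
    class_: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath(code: str) -> CathCode:
    """Split a dotted code such as '3.40.50.720' into its four levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def group_by_level(domains: Dict[str, str], depth: int) -> Dict[str, List[str]]:
    """Cluster domains that share the first `depth` levels (1=C ... 4=H)."""
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, code in domains.items():
        key = ".".join(str(n) for n in parse_cath(code)[:depth])
        groups[key].append(name)
    return dict(groups)


# Hypothetical domain-to-code assignments, for illustration only.
DOMAINS = {
    "domA": "3.40.50.720",
    "domB": "3.40.50.300",
    "domC": "3.30.70.100",
    "domD": "1.10.10.10",
}

print(group_by_level(DOMAINS, 1))  # grouped by Class
print(group_by_level(DOMAINS, 3))  # grouped by Topology
```

Grouping at depth 3 puts `domA` and `domB` in the same topology family while keeping them in separate homologous superfamilies, which mirrors the distinction drawn in the abstract between shared topology and shared ancestry.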

SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB Cambridge UK)

Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

I will take you behind the scenes: all we have learned so far, and what is still missing.

1. Murzin, A. G., Brenner, S. E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.

2. Andreeva A., Howorth D., Brenner S.E., Hubbard T.J.P., Chothia C., Murzin A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucl. Acid Res. 32, D226-D229.

3. Lo Conte L., Brenner S. E., Hubbard T.J.P., Chothia C., Murzin A. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucl. Acid Res. 30, 264-267.

Pfam and MEROPS Robert Finn Sanger

Pfam is a database in two parts. The first, Pfam-A, is the curated part, containing over 7459 protein families. To give Pfam more comprehensive coverage of known proteins, we automatically generate a supplement called Pfam-B. This contains a large number of small families taken from the ProDom database that do not overlap with Pfam-A. Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.

1. Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy (2004) The Pfam Protein Families Database. Nucleic Acids Research, Database Issue 32, D138-D141.

2. Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K, Bairoch A. (2002) The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235-238.

3. Julie D. Thompson, Frédéric Plewniak, Raymond Ripp, Jean-Claude Thierry and Olivier Poch (2001) Towards a Reliable Objective Function for Multiple Sequence Alignments. J. Mol. Biol. 314, 937-951.

4. Servant F, Bru C, Carrère S, Courcelle E, Gouzy J, Peyruc D, Kahn D (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics 3, 246-251.

SSM fold characterization Dr Eugene Krissinel EBI

SSM is a powerful interactive research tool for secondary structure matching that allows protein structures to be compared in 3D. The service provides (i) pairwise comparison and 3D alignment of protein structures, (ii) multiple comparison and 3D alignment of protein structures, (iii) examination of a protein structure for similarity with the whole PDB or SCOP archives, (iv) best Cα alignment of compared structures and (v) download and visualization of best-superposed structures using RasMol or RasTop. The results are linked to other services including OCA, SCOP, GeneCensus, FSSP, 3Dee, CATH, PDBsum, SwissProt and ProtoMap. SSM is recognized as a valuable tool in protein research: an aid to studying protein function via structural similarities (used in drug design), protein conformations, the choice of model for structure solution in X-ray experiments, and many other applications. The development is also extensively integrated with other MSD services.

1. E. Krissinel and K. Henrick, Protein structure comparison in 3D based on secondary structure matching (SSM) followed by Cα alignment, scored by a new structural similarity function. In: Andreas J. Kungl & Penelope J. Kungl (Eds.), Proceedings of the 5th International Conference on Molecular Structural Biology, Vienna, September 3-7, 2003, p.88.

2. E. Krissinel and K. Henrick (2004) Common subgraph isomorphism detection by backtracking search. Software: Practice and Experience, 34, 591-607.

Motif and protein structures Dr. Gerard Kleywegt Uppsala Sweden

Motif recognition (essentially, function-from-structure)

SPASM server:
• Motif recognition in nucleic acid structures (SPANA)
• Motif recognition in proteins (SPASM)

1. Madsen, D. and Kleywegt, G.J. (2002) Interactive motif and fold recognition in protein structures. J. Appl. Cryst. 35, 137-139.


Wednesday 22nd September

Participants split into 2 groups.

Morning session Group 1

Tutorials (15 min intro/demo - 45 minute tutorial)

• Validation of protein structures ... Or: just because it was published in Nature, doesn't mean it's true! Dr. Gerard Kleywegt Uppsala Sweden (90 min)

• Learning about structures using the RCSB PDB Dr. Philip E. Bourne, Professor of Pharmacology, University of California (60 min)

• Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle Burkhardt, RCSB Protein Data Bank, Rutgers, The State University of New Jersey (60 min)

Morning session Group 2

Lectures (7 x 30 mins - intro/demo MSD tools)

• MSDchem, MSDlite, MSDpro, MSDsite

• MSDmySQL, MSDmine

• Advanced AstexViewer integrated into 3D PDB searches

A generic search system for the search database has been written and made available as a public service at http://www.ebi.ac.uk/msd-srv/msdlite and http://www.ebi.ac.uk/msd-srv/msdpro. The system is written using Java servlets and uses XML extensively for configuration of the database/search system interactions, the description of the user interface, and the return of results. The system is designed to translate user input from a series of values into an SQL query, which can then be executed on the database. The architecture of the server side of the search system allows a high degree of flexibility in extending the range of searches (SQL statements that can be created automatically). The system is highly configurable and can be moved easily to other databases simply by modifying the XML dictionaries that describe the database.
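The idea of driving query generation from an XML dictionary can be sketched in a few lines. The example below is written in Python rather than the Java servlets the service itself uses, and everything specific in it is an assumption made for illustration: the XML layout, the table and column names, and the `build_query` helper are hypothetical and do not reproduce the MSD configuration format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical dictionary: maps user-facing field names to columns and operators.
DICTIONARY_XML = """
<search table="entry">
  <field name="resolution" column="resolution" op="&lt;="/>
  <field name="method" column="exp_method" op="="/>
  <field name="title" column="title" op="LIKE"/>
</search>
"""

# Operators are whitelisted because they are spliced into the SQL text.
ALLOWED_OPS = {"=", "<=", ">=", "LIKE"}


def build_query(dictionary_xml: str,
                user_input: Dict[str, object]) -> Tuple[str, List[object]]:
    """Turn a set of user-supplied values into a parameterized SQL query.

    Only fields declared in the dictionary are used; values are passed
    as bound parameters rather than pasted into the statement.
    """
    root = ET.fromstring(dictionary_xml)
    clauses: List[str] = []
    params: List[object] = []
    for field in root.findall("field"):
        name = field.attrib["name"]
        if name not in user_input:
            continue
        op = field.attrib["op"]
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator {op!r} is not allowed")
        clauses.append(f"{field.attrib['column']} {op} ?")
        params.append(user_input[name])
    sql = f"SELECT * FROM {root.attrib['table']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query(DICTIONARY_XML, {"resolution": 2.0, "method": "X-ray"})
print(sql)
print(params)
```

Because the table, columns and operators all come from the dictionary, pointing the same code at another database would, in this sketch, only require a different XML document, which is the portability property the abstract describes.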

Afternoon session Group 1

Lectures (7 x 30 mins - intro/demo MSD tools)

• MSDchem, MSDlite, MSDpro, MSDsite

• MSDmySQL, MSDmine

• Advanced AstexViewer integrated into 3D PDB searches

Afternoon session Group 2

Tutorials

• Validation of protein structures ... Or: just because it was published in Nature, doesn't mean it's true! Dr. Gerard Kleywegt Uppsala Sweden (90 min)

• Learning about structures using the RCSB PDB Dr. Philip E. Bourne, Professor of Pharmacology, University of California (60 min)

• Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle Burkhardt, RCSB Protein Data Bank, Rutgers, The State University of New Jersey (60 min)

Learning about structures using the RCSB PDB Dr. Philip E. Bourne, Professor of Pharmacology, University of California

The RCSB PDB has been reengineered to include many new features and to integrate a variety of additional information related to macromolecular structure and function, ranging from genomic information to disease states. A number of these features will be explored through several typical usage scenarios suited to novice users and more senior biologists.

1. Philip E. Bourne, Kenneth J. Addess, Wolfgang F. Bluhm, Li Chen, Nita Deshpande, Zukang Feng, Ward Fleri, Rachel Green, Jeffrey C. Merino-Ott, Wayne Townsend-Merino, Helge Weissig, John Westbrook and Helen M. Berman (2004) The distribution and query systems of the RCSB Protein Data Bank. Nucleic Acids Research 32, Database issue, D223-D225.

Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle Burkhardt, RCSB Protein Data Bank, Rutgers, The State University of New Jersey

In this one-hour tutorial, users will learn how to evaluate the quality of a structure. Users will download a structure from the PDB and validate it using the online Validation Suite developed by the RCSB. Users will learn how to analyze the validation report as well as PROCHECK, NUCHECK, and SFCheck results to determine structure quality.

MSDsite Service Mr Adel Golovin EBI

The research service MSDsite has been developed to give access to 3D active site data. The three-dimensional environments of ligand binding sites have been derived by parsing the PDB entries and loading them into a relational database. For each bound molecule, the biological assembly of the quaternary structure has been used to determine all contact residues, and a fast interactive search and retrieval system has been developed. Prosite pattern and short sequence search options are available, together with a novel graphical query generator for inter-residue contacts.
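Determining the contact residues of a bound molecule comes down to a distance test between ligand atoms and protein atoms. The sketch below shows one simple way to do this in Python. It is an assumption-laden illustration, not the MSDsite method: the 4.0 Å cutoff, the `Atom` record, the brute-force pairwise loop and the toy coordinates are all chosen here for clarity.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    chain: str
    res_name: str
    res_num: int
    x: float
    y: float
    z: float


def contact_residues(protein: List[Atom], ligand: List[Atom],
                     cutoff: float = 4.0) -> List[Tuple[str, str, int]]:
    """Return residues with at least one atom within `cutoff` of the ligand.

    The cutoff (in the same units as the coordinates, normally angstroms)
    is an assumed value; every protein atom is compared with every ligand
    atom, which is adequate for a single binding site.
    """
    contacts = set()
    for p in protein:
        for lig in ligand:
            if math.dist((p.x, p.y, p.z), (lig.x, lig.y, lig.z)) <= cutoff:
                contacts.add((p.chain, p.res_name, p.res_num))
                break  # one close atom is enough to count the residue
    return sorted(contacts)


# Toy coordinates for illustration only, not taken from a real entry.
PROTEIN = [
    Atom("A", "HIS", 57, 0.0, 0.0, 0.0),
    Atom("A", "SER", 195, 3.0, 0.0, 0.0),
    Atom("A", "GLY", 10, 20.0, 0.0, 0.0),
]
LIGAND = [Atom("L", "LIG", 1, 1.5, 0.0, 0.0)]

print(contact_residues(PROTEIN, LIGAND))
```

For whole-archive searches a pairwise loop would be too slow; precomputing the contacts once and storing them in the relational database, as the abstract describes, is what makes interactive retrieval feasible.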

Dimitris Dimitropoulos

MSDchem tutorial: Chemistry as the starting point of a search, following the path from ligand chemistry to protein structure. This tutorial demonstrates in detail the searching capabilities of the MSD database and the MSDchem tool in identifying ligands using their basic chemical topology and chemical signature.

MSDmySQL tutorial: Working with the bare MSD database in the popular MySQL form, directly using general-purpose standard APIs and programming languages. This is the most flexible and powerful way to use the MSD database infrastructure.

MSDmine tutorial: A web application for scientific discovery, data analysis and knowledge mining for the advanced researcher of the MSD database. It covers the simplest to the most complex searches, combining many different information entities, together with visualisation and cross-references, online generation of charts, and data drill-down and roll-up operations.

1. A. Golovin, T. J. Oldfield, J. G. Tate, S. Velankar, G. J. Barton, H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Hussain, J. M. C. Ionides, M. John, P. A. Keller, E. Krissinel, P. McNeil, A. Naim, R. Newman, A. Pajon, J. Pineda, A. Rachedi, J. Copeland, A. Sitnov, S. Sobhany, A. Suarez-Uruena, G. J. Swaminathan, M. Tagari, S. Tromm, W. Vranken and K. Henrick (2004) E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Research, 32, Database issue, D211-D216.

2. Boutselakis, H., Dimitropoulos, D., Fillon, J., Golovin, A., Henrick, K., Hussain, A., Ionides, J., John, M., Keller, P.A., Krissinel, E., McNeil, P., Naim, A., Newman, R., Oldfield, T., Pineda, J., Rachedi, A., Copeland, J., Sitnov, A., Sobhany, S., Suarez-Uruena, A., Swaminathan, J., Tagari, M., Tate, J., Tromm, S., Velankar, S. and Vranken, W. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Research, 31, 458-462.


Thursday<br />

Morning session Group 1<br />

• Tutorials – Using MSD Tools to solve Problems (60min)<br />

• Tutorials – Database Replication and use on your Desktop (60min):<br />

MSDmySQL and MSDmine<br />

• Tutorials – Scripting languages + data mining, Dr Jaime Prilusky,<br />

Weizmann Institute, Israel (90min)<br />

Morning session Group 2<br />

Lectures and Demos<br />

• Visualisation data mining, Dr Thomas Oldfield, EBI<br />

• What is visualisation and how are complex data represented?<br />

Professor Bob Spence (Imperial College London)<br />

• Small Structural and Sequence Motifs, Dr James Milner-White,<br />

University of Glasgow<br />

Afternoon session Group 1<br />

Lectures and Demos<br />

• Visualisation data mining, Dr Thomas Oldfield, EBI<br />

• What is visualisation and how are complex data represented?<br />

Professor Bob Spence (Imperial College London)<br />

• Small Structural and Sequence Motifs, Dr James Milner-White,<br />

University of Glasgow<br />

Afternoon session Group 2<br />

• Tutorials – Using MSD Tools to solve Problems (60min)<br />

• Tutorials – Database Replication and use on your Desktop (60min):<br />

MSDmySQL and MSDmine<br />

• Tutorials – Scripting languages + data mining, Dr Jaime Prilusky,<br />

Weizmann Institute, Israel (90min)<br />

Scripting languages + data mining, Dr Jaime Prilusky, Weizmann Institute, Israel<br />

Use of fast scripting languages (Perl, Python) for creating ad-hoc searching and<br />

analysis tools on top of existing databases.<br />
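A minimal sketch of such an ad-hoc search tool in Python, under stated assumptions: the dictionary below stands in for sequence records already fetched from a database, and the identifiers, sequences, motif pattern, and function name are invented for illustration rather than taken from the course material.

```python
import re

# Made-up sequences keyed by made-up identifiers; in practice these
# would come from a database query or a downloaded file.
SEQUENCES = {
    "seqA": "MKTAYIAKQRGSGKST",
    "seqB": "MNPLLGAGKSTVV",
    "seqC": "MAAAAWWW",
}


def find_motif(sequences, pattern):
    """Return {identifier: [0-based start positions]} for matching sequences."""
    regex = re.compile(pattern)
    hits = {}
    for name, seq in sequences.items():
        starts = [match.start() for match in regex.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits


# Search for a short motif in which "." matches any single residue.
hits = find_motif(SEQUENCES, r"G.GKST")
print(hits)
```

A few lines of scripting like this are often enough to answer a one-off question without building a dedicated application.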

Visualisation<br />

1. Tate,J.G., Moreland,J.L. and Bourne,P.E. (2001) Design and<br />

implementation of a collaborative molecular graphics environment. J.<br />

Mol. Graph. Model., 19, 280-287, 369-373.<br />

2. Neshich,G., Togawa,R.C., Mancini,A.L., Kuser,P.R., Yamagishi,M.E.,<br />

Pappas,G.,Jr, Torres,W.V., Fonseca e Campos,T., Ferreira,L.L.,<br />

Luna,F.M. et al. (2003) STING Millennium: a web-based suite of<br />

programs for comprehensive and simultaneous analysis of protein<br />

structure and sequence. Nucleic Acids Res., 31, 3386-3392.<br />

3. Bob Spence. The Acquisition of Insight.<br />

http://www.ee.ic.ac.uk/research/information/www/Bobs.html


4. Watson, J. D. and Milner-White, E. J. 2002 The conformations of Polypeptide<br />

Chains where the main-chain parts of successive residues are enantiomeric.<br />

Their occurrence in Cation and Anion-binding regions of proteins. Journal of<br />

Molecular Biology 315, 183-191<br />

5. Watson, J. D. and Milner-White, E. J. 2002 A novel main-chain anion-binding<br />

site in proteins: The Nest. A particular combination of phi,psi values in<br />

successive residues gives rise to anion-binding sites that occur commonly<br />

and are found often at functionally important regions. Journal of Molecular<br />

Biology 315, 171-182<br />

6. Wan, W. Y. and Milner-White, E. J. 1999 A natural grouping of motifs with an<br />

aspartate or asparagine residue forming two hydrogen bonds to residues<br />

ahead in sequence: Their occurrence at alpha-helical N termini and in other<br />

situations. Journal of Molecular Biology 286, 1633-1649<br />

7. Furnas, G.W. Generalised fisheye views. (1986) Human Factors in<br />

Computing Systems. CHI’86 Conference Proceedings, Boston, April 13-17<br />

pp 16-23.<br />

8. George G. Robertson, Jock D. Mackinlay and Stuart K. Card. (1991) The<br />

Perspective wall: Detail and context smoothly integrated. CHI’91 Conference<br />

Proceedings, pp 174-179.<br />

9. Oldfield, T.J. Creating structure features by datamining the PDB to use as<br />

molecular replacement models. Acta Cryst D57 1421-1427.


Friday 24th September<br />

9:00-9:40 Summary of 3D data and WWW: Current scientific resources<br />

for 3D structure (Dr Kim Henrick, EBI)<br />

• Clean data<br />

• Database architectures<br />

9:40-10:20 Now you have seen the services: a word about the conceptual<br />

basis for data analysis, problem solving and critical thinking (Dr Tom<br />

Oldfield, EBI)<br />

10:20-11:00 Data Interchange/APIs, Data Integration problems / Query<br />

Interchange – Dr John Westbrook (RCSB)<br />

11:00-11:30 coffee break<br />

11:30-12:10 Feedback – Wrap-up – Chair: Dr Helen Berman (RCSB)<br />

12:10 Lunch and Depart
