3D Structure Databases - Uses for Biological Problem solving - EBI
<strong>3D</strong> <strong>Structure</strong> <strong>Databases</strong> - <strong>Uses</strong> <strong>for</strong> <strong>Biological</strong><br />
<strong>Problem</strong> <strong>solving</strong><br />
The course will teach the basic principles of <strong>3D</strong> database technology and the<br />
associated tools <strong>for</strong> data analysis to bioscientists wishing to understand the wealth of<br />
structure in<strong>for</strong>mation available. The course is aimed at PhD students and postdocs to<br />
give them a familiarity with how structure data can be used in their own projects.<br />
<strong>Databases</strong> <strong>for</strong> <strong>3D</strong> structural data <strong>for</strong> proteins and nucleic acids, together with the<br />
associated access tools have matured into a major tool <strong>for</strong> molecular biology. The<br />
course is intended to cover the background to relational databases and the<br />
computational aspects of characterizing the structure of biological macromolecules.<br />
The importance of databases in biological research has been stressed in the recent<br />
Nature technology feature by Buckingham [1]. In the United States, the National<br />
Science Foundation (NSF) has announced a new initiative, ‘<strong>Biological</strong> <strong>Databases</strong> and<br />
In<strong>for</strong>matics Program Announcement’ [2], with the belief that future advances in the<br />
biological sciences will depend both upon the creation of new knowledge and upon<br />
effective management of proliferating in<strong>for</strong>mation. Further general background can<br />
be found in references 3 and 4.<br />
1. S. Buckingham Data’s future shock (2004) Nature 428, 774-777<br />
2. <strong>Biological</strong> <strong>Databases</strong> and In<strong>for</strong>matics Program Announcement NSF 02-058<br />
http://www.nsf.gov/pubs/2002/nsf02058/nsf02058.html<br />
3. Michael Y. Galperin (2004) The Molecular Biology Database Collection: 2004<br />
update. Nucleic Acids Research. 32, Database issue D3-D22<br />
4. Andrej Sali, Robert Glaeser, Thomas Earnest & Wolfgang Baumeister (2003)<br />
From words to literature in structural proteomics NATURE, 422, 216-225<br />
Provisional Timetable<br />
Lecturers:<br />
Professor Janet Thornton Dr Sue Jones<br />
Dr Roman Laskowski Dr Hannes Ponstingl<br />
Dr Kim Henrick Dr Eugene Krissinel<br />
Dr Phil McNeil Dr Thomas Oldfield<br />
Dr Sameer Velankar Mr Adel Golovin<br />
Mr Dimitris Dimitropoulos Dr Gerard Kleywegt<br />
Dr Jaime Prilusky Dr Helen Berman<br />
Dr Loredana Lo Conte Dr Christine Orengo<br />
Professor Bob Spence Dr James Milner-White<br />
Dr Tom Oldfield Dr John Westbrook<br />
Dr. Philip E. Bourne Dr Robert Finn<br />
Ms. Kyle Burkhardt<br />
Monday 20th September<br />
9:00-9:40 <strong>Structure</strong> analysis Professor Janet Thornton (<strong>EBI</strong>)<br />
1. Todd A.E, Orengo C.A, Thornton J.M. (2002) Plasticity of enzyme active sites. Trends<br />
Biochem Sci. 27 419-26.
2. Steward RE, MacArthur MW, Laskowski RA, Thornton JM. (2003) Molecular basis of<br />
inherited diseases: a structural perspective. Trends Genet. 19, 505-13.<br />
3. Sanishvili R, Yakunin AF, Laskowski RA, Skarina T, Evdokimova E, Doherty-Kirby A,<br />
Lajoie GA, Thornton JM, Arrowsmith CH, Savchenko A, Joachimiak A, Edwards AM.<br />
(2003) Integrating structure, bioin<strong>for</strong>matics, and enzymology to discover function:<br />
BioH, a new carboxylesterase from Escherichia coli. J Biol Chem. 278, 26039-45.<br />
9:40-10:20 An overview of the RCSB Protein Data Bank Dr. Helen M. Berman<br />
RCSB Protein Data Bank Rutgers, The State University of New Jersey<br />
A description of the resources <strong>for</strong> data deposition, validation and query offered by the<br />
RCSB PDB will be given.<br />
1. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H.,<br />
Shindyalov, I. N. and Bourne, P. E. (2000) The Protein Data Bank. Nucleic Acids<br />
Res., 28, 235-242.<br />
2. Berman, H.M., Battistuz, T., Bhat, T.N., Bluhm, W.F., Bourne, P.E., Burkhardt, K.,<br />
Feng, Z., Gilliland, G.L., Iype, L., Jain, S., Fagan, P., Marvin, J., Padilla, D.,<br />
Ravichandran, V., Schneider, B., Thanki, N., Weissig, H., Westbrook, J.D. and<br />
Zardecki, C. (2002) The Protein Data Bank. Acta Crystallogr D 58, 899-907.<br />
3. John Westbrook, Zukang Feng, Li Chen, Huanwang Yang and Helen M. Berman<br />
(2003) The Protein Data Bank and structural genomics. Nucleic Acids Research, 31,<br />
489–491<br />
4. Bhat,T.N., Bourne,P., Feng,Z., Gilliland,G., Jain,S., Ravichandran,V., Schneider,B.,<br />
Schneider,K., Thanki,N., Weissig,H. et al. (2001) The PDB data uni<strong>for</strong>mity project.<br />
Nucleic Acids Res., 29, 214-218.<br />
5. Westbrook,J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L.,<br />
Bluhm,W., Weissig,H., Greer,D.S. et al. (2002) The Protein Data Bank: unifying the<br />
archive. Nucleic Acids Res., 30, 245-248.<br />
10:20-11:00 Crystals, Symmetry and Protein Assemblies Dr Kim Henrick (<strong>EBI</strong>)<br />
1. Henrick,K. and Thornton,J.M. (1998) PQS: a protein quaternary structure file<br />
server. Trends Biochem. Sci., 23, 358-361.<br />
11:00-11:30 coffee break<br />
11:30-12:10 Protein-DNA Interactions: analysis and prediction Dr Sue Jones<br />
The <strong>3D</strong> structures of over 700 proteins bound to DNA molecules have been determined.<br />
These proteins have diverse structural folds, and achieve binding and recognition of DNA in<br />
many different ways. This lecture will give an overview of the prominent characteristics of<br />
DNA-binding proteins, and explain how common physicochemical properties and conserved<br />
structural motifs can be used in a predictive manner to identify novel DNA-binding proteins.<br />
1. Jones S. & Thornton J.M. (2004) Searching <strong>for</strong> functional sites in protein<br />
structures. Current Opinion in Chemical Biology. 8, p3-7.<br />
2. Jones S, Shanahan H, Berman H.M. & Thornton J.M. (2003) Using<br />
electrostatic potentials to predict DNA-binding sites on DNA-binding proteins.<br />
Nucleic Acids Research 31, p7189-7198.<br />
3. Jones S, Barker J, Nobeli I & Thornton JM. (2003): Using structural motifs to<br />
identify proteins with DNA binding function. Nucleic Acids Research. 31,<br />
p2811-2823.
4. Jones S. & Thornton J.M. (2003) Protein-DNA interactions: the story so far<br />
and a new method <strong>for</strong> prediction. Comparative and Functional Genomics. 4,<br />
p428-431.<br />
5. Jones S, van Heyningen P, Berman HM & Thornton JM. (1999) Protein-DNA<br />
interactions: a structural analysis. Journal of Molecular Biology 287, p877-<br />
896.<br />
Additional Notes on Protein-Protein Interactions: classification, analysis and prediction<br />
Interactions between proteins are fundamental to many diverse biological processes including<br />
signal transduction, enzyme inhibition and cell adhesion. These interactions can be classified<br />
as ‘obligate’ or ‘non-obligate’. Obligate interactions <strong>for</strong>m the basis of the quaternary structure<br />
of multimeric proteins, and non-obligate interactions occur between proteins that exist<br />
independently as well as in complexes. This lecture will give an overview of the<br />
characteristics of protein-protein complexes from <strong>3D</strong> structures, and explain how these<br />
features vary depending on the class of the complex. A method <strong>for</strong> the prediction of protein<br />
interaction sites will then be described based on the analysis of patches on the protein<br />
surface.<br />
• Jones S & Thornton JM. (1999) Protein domain interfaces: characterisation and<br />
comparison with oligomeric protein interfaces. Protein Engineering 13, p77-82.<br />
• Jones S & Thornton JM. (1997): Prediction of protein-protein interaction sites using<br />
patch analysis. Journal of Molecular Biology 272, p133-143.<br />
• Jones S & Thornton JM. (1997): Analysis of protein-protein interaction sites using<br />
surface patches. Journal of Molecular Biology 272, p121-132.<br />
• Jones S & Thornton JM. (1996): Principles of protein-protein interactions derived from<br />
structural studies. Proceedings of the National Academy of Sciences (USA) 93, p13-20.<br />
12:10-12:50 Using the ‘Thornton-Group’ WWW Database Services <strong>for</strong> Structural<br />
Research Dr Roman Laskowski <strong>EBI</strong><br />
12:50-14:00 lunch<br />
1. R.A. Laskowski, J.D. Watson and J.M. Thornton (2003). From Protein<br />
structure to biochemical function. J. Struct. Funct Genomics, 4, 167-177.<br />
2. Laskowski RA. (2001) PDBsum: summaries and analyses of PDB structures.<br />
Nucleic Acids Res. 29 221-2.<br />
3. Luscombe N M, Laskowski R A, Westhead D R, Milburn D, Jones S,<br />
Karmirantzou M, Thornton J M (1998). New tools and resources <strong>for</strong> analysing<br />
protein structures and their interactions. Acta Cryst., D54, 1132-1138.<br />
4. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) The Catalytic<br />
Site Atlas: a resource of catalytic sites and residues identified in enzymes<br />
using structural data. Nucl. Acids. Res. 32: D129-D133.<br />
5. G J Bartlett, C T Porter, N Borkakoti & J M Thornton, Journal of Molecular<br />
Biology (2002) 324, 105-121.<br />
14:00-14:40 Quaternary <strong>Structure</strong> Inference of Proteins from their Crystals Dr<br />
Hannes Ponstingl <strong>EBI</strong><br />
Protein-Protein Interactions: The basic principles which determine the strength and<br />
geometry of protein-protein complexes by Patch Analysis and other methods.
1. H. Ponstingl, T. Kabir & J. M. Thornton (2003) Automatic inference of protein<br />
quaternary structure from crystals, J. Appl. Cryst. 36, 1116-1122.<br />
2. H. Ponstingl, K. Henrick & J. M. Thornton (2000) Discriminating between<br />
homodimeric and monomeric proteins in the crystalline state. Proteins 41, 47-57.<br />
14:40-15:20 Validation of protein structures ... Or: just because it was<br />
published in Nature, doesn't mean it's true ! Dr. Gerard Kleywegt Uppsala<br />
Sweden<br />
With the explosive growth of the number of experimentally determined<br />
macromolecular structures, "structural awareness" is becoming an important aspect<br />
of many disciplines, ranging from medicinal chemistry to cell biology. This means that<br />
many scientists who have not been specifically trained in the area want to make use<br />
of structural in<strong>for</strong>mation in order to explain the molecular basis of their own research<br />
findings, to plan new experiments, to design novel ligands, substrates or inhibitors,<br />
etc. What these users of structural in<strong>for</strong>mation often are unaware of is that there are<br />
limitations to and uncertainties in the experimentally determined structures. A number<br />
of protein structures have been published (often in prestigious journals) that turned<br />
out to be partly or entirely incorrect. A few examples will be given, and simple ways in<br />
which non-experts can assess the overall reliability of structures will be discussed.<br />
However, as technology improves, such gross errors are less and less likely to occur.<br />
On the other hand, mistakes in the details of the structures are much easier to make,<br />
and concomitantly more difficult to detect. It is often in these details, however, that the<br />
value of a structure lies, since they reveal the molecular basis of interactions. This is<br />
particularly true <strong>for</strong> non-macromolecular entities (ligands, inhibitors, substrate analogues,<br />
sugars, ions, etc.). Some of the pitfalls and limitations of the use of<br />
structural in<strong>for</strong>mation will be discussed, with a view to structure-based design. In the<br />
practical, some of the basics of protein structure validation will be reviewed, and the<br />
use of various databases (PDB, PDBsum, PDBREPORT and EDS) to assess the<br />
quality of deposited protein structures will be explained.<br />
References:<br />
1. G J Kleywegt, "Validation of protein crystal structures" (Topical review), Acta<br />
Crystallographica, D56, 249-265 (2000).<br />
2. AM Davis, SJ Teague & GJ Kleywegt, "Applications and limitations of X-ray<br />
crystallographic data in structure-based ligand and drug design", Angewandte<br />
Chemie International Edition, 42, 2718-2736 (2003)<br />
3. EDS Viewer <strong>for</strong> structures and electron density maps – interpretation of validation<br />
criteria<br />
15:20-16:00 <strong>3D</strong> databases and data warehouse technology Dr Phil McNeil (<strong>EBI</strong>)<br />
16:00-16:30 coffee break<br />
16:30-17:10 Clustering of <strong>3D</strong> structures and representative sets Dr Thomas<br />
Oldfield <strong>EBI</strong><br />
1. Li,W., Jaroszewski,L. and Godzik,A. (2001) Clustering of highly homologous<br />
sequences to reduce the size of large protein databases. Bioin<strong>for</strong>matics, 17,<br />
282-283.<br />
17:10-17:50 Sequences and <strong>3D</strong> structures (Integration of <strong>3D</strong> data and sequence<br />
databases) Dr Sameer Velankar <strong>EBI</strong><br />
The "<strong>Structure</strong> integration with function, taxonomy and sequence (SIFTS) initiative"<br />
aims to work towards the integration of various bioin<strong>for</strong>matics resources. One of the
major obstacles to the improved integration of structural databases such as MSD<br />
and sequence databases like UniProt, which are primary archival databases <strong>for</strong><br />
structure and sequence data, is the absence of up-to-date and well-maintained<br />
mappings between corresponding entries. We have worked closely with the UniProt<br />
group at the <strong>EBI</strong> to clean up the taxonomy and sequence cross-reference in<strong>for</strong>mation<br />
in the MSD and UniProt databases. The project started in 2001 and has<br />
resulted in robust mechanisms <strong>for</strong> exchanging data between the two<br />
primary data resources. This has dramatically improved the quality of annotation in<br />
both databases and is aiding the continuing improvement of legacy data. In the<br />
longer term this project will not only allow better and closer integration of<br />
derived-data resources but will also continue to improve the quality of all data in the<br />
primary resources. This in<strong>for</strong>mation is vital <strong>for</strong> the reliable integration of the sequence<br />
family databases such as Pfam and InterPro with the structure-oriented databases<br />
SCOP and CATH. It has been made available to the eFamily group and now <strong>for</strong>ms<br />
the basis of the regular interchange of in<strong>for</strong>mation between the member databases<br />
(MSD, UniProt, Pfam, InterPro, SCOP and CATH).
Tuesday 21st September<br />
Participants split into 2 groups.<br />
Morning session Group 1<br />
Tutorials (15 min intro/demo – 45 minute tutorial)<br />
• Searching <strong>3D</strong> protein structures <strong>for</strong> conserved DNA-binding motifs: Dr<br />
Sue Jones<br />
• Using the ‘Thornton-Group’ WWW Database Services <strong>for</strong> Structural<br />
Research Dr Roman Laskowski <strong>EBI</strong> (2 hours)<br />
CATRES, Catalytic Site Atlas (CSA), NetFunc, EC->PDB, Pita - Protein<br />
InTerfaces and Assemblies, Receptor <strong>Structure</strong> and Function, Protein Side-<br />
Chain Interactions, Practical: Structural Genomics<br />
• Pfam and MEROPS Robert Finn Sanger<br />
Morning session Group 2<br />
Lectures (40 minutes – intro/demos)<br />
• Protein <strong>Structure</strong> Classification and Genome Annotation: Technologies<br />
and Insights from the CATH Database. Professor Christine Orengo UCL<br />
London UK<br />
• SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB<br />
Cambridge UK)<br />
• SSM fold characterization Dr Eugene Krissinel <strong>EBI</strong><br />
Afternoon session Group 1<br />
Lectures (40 minutes - intro/demos)<br />
• Protein <strong>Structure</strong> Classification and Genome Annotation: Technologies<br />
and Insights from the CATH Database. Professor Christine Orengo UCL<br />
London UK<br />
• SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB<br />
Cambridge UK)<br />
• SSM fold characterization Dr Eugene Krissinel <strong>EBI</strong><br />
Afternoon session Group 2<br />
Tutorials (15 min intro/demo – 45 minute tutorial)<br />
• Searching <strong>3D</strong> protein structures <strong>for</strong> conserved DNA-binding motifs: Dr<br />
Sue Jones<br />
• Using the ‘Thornton-Group’ WWW Database Services <strong>for</strong> Structural<br />
Research Dr Roman Laskowski <strong>EBI</strong> (2 hours)<br />
CATRES, Catalytic Site Atlas (CSA), NetFunc, EC->PDB, Pita - Protein<br />
InTerfaces and Assemblies, Receptor <strong>Structure</strong> and Function, Protein Side-<br />
Chain Interactions, Practical: Structural Genomics<br />
• Pfam and MEROPS Robert Finn Sanger<br />
[ Motif and protein structures Dr. Gerard Kleywegt Uppsala Sweden ]<br />
Protein <strong>Structure</strong> Classification and Genome Annotation: Technologies and<br />
Insights from the CATH Database. Professor Christine Orengo UCL London UK<br />
CATH is a novel hierarchical classification of protein domain structures, which<br />
clusters proteins at four major levels, Class(C), Architecture(A), Topology(T) and<br />
Homologous superfamily (H). Class, derived from secondary structure content, is
assigned <strong>for</strong> more than 90% of protein structures automatically. Architecture, which<br />
describes the gross orientation of secondary structures, independent of<br />
connectivities, is currently assigned manually. The topology level clusters structures<br />
according to their topological connections and numbers of secondary structures. The<br />
homologous superfamilies cluster proteins with highly similar structures and<br />
functions. The assignments of structures to topology families and homologous<br />
superfamilies are made by sequence and structure comparisons.<br />
I will illustrate concepts behind protein structure comparison and classification using<br />
the CATH database. I will also present methods <strong>for</strong> providing structural annotations<br />
<strong>for</strong> genome sequences.<br />
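The four-level hierarchy described above can be sketched as a small data structure. This is only an illustration: it assumes the dotted four-number notation used for CATH classifications (C.A.T.H), and the names `CathNode`, `parse_cath_id` and `deepest_shared_level` are hypothetical helpers, not part of any CATH software.

```python
from typing import NamedTuple


class CathNode(NamedTuple):
    """One classification written as four dot-separated numbers (C.A.T.H)."""
    cath_class: int
    architecture: int
    topology: int
    superfamily: int


# Index i names the deepest level shared when the first i numbers agree.
LEVELS = ("none", "Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_id(cath_id: str) -> CathNode:
    """Split an identifier such as '3.40.50.720' into its four levels."""
    parts = cath_id.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    return CathNode(*(int(p) for p in parts))


def deepest_shared_level(a: CathNode, b: CathNode) -> str:
    """Return the deepest level at which two domains fall in the same cluster."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return LEVELS[depth]


# Illustrative identifiers only: these two agree on C, A and T but differ at H.
first = parse_cath_id("3.40.50.720")
second = parse_cath_id("3.40.50.300")
shared = deepest_shared_level(first, second)
```

Because each level nests inside the one above it, comparing the numbers from left to right is enough to find where two domains part ways in the hierarchy.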
1. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. and<br />
Thornton, J. M. (1997) CATH–a hierarchic classification of protein domain<br />
structures. <strong>Structure</strong>, 5, 1093-1108.<br />
2. Pearl, F.M.G, Lee, D., Bray, J.E, Sillitoe, I., Todd, A.E., Harrison, A.P.,<br />
Thornton, J.M. and Orengo, C.A. (2000) Assigning genomic sequences to<br />
CATH Nucleic Acids Research. 28. 277-282<br />
SCOP: Structural Classification of Proteins. Dr Loredana Lo Conte (LMB<br />
Cambridge UK)<br />
Nearly all proteins have structural similarities with other proteins and, in some of<br />
these cases, share a common evolutionary origin. The SCOP database, created by<br />
manual inspection and abetted by a battery of automated methods, aims to provide a<br />
detailed and comprehensive description of the structural and evolutionary<br />
relationships between all proteins whose structure is known. As such, it provides a<br />
broad survey of all known protein folds, detailed in<strong>for</strong>mation about the close relatives<br />
of any particular protein, and a framework <strong>for</strong> future research and classification.<br />
I will take you behind the scenes: what we have learned so far, and what is still missing.<br />
1. Murzin, A. G., Brenner, S. E., Hubbard, T. and Chothia, C. (1995) SCOP: a<br />
structural classification of proteins database <strong>for</strong> the investigation of<br />
sequences and structures. J. Mol. Biol., 247, 536-540.<br />
2. Andreeva A., Howorth D., Brenner S.E., Hubbard T.J.P., Chothia C., Murzin<br />
A.G. (2004). SCOP database in 2004: refinements integrate structure and<br />
sequence family data. Nucl. Acid Res. 32, D226-D229.<br />
3. Lo Conte L., Brenner S. E., Hubbard T.J.P., Chothia C., Murzin A. (2002).<br />
SCOP database in 2002: refinements accommodate structural genomics.<br />
Nucl. Acid Res. 30, 264-267.<br />
Pfam and MEROPS Robert Finn Sanger<br />
Pfam is a database in two parts. The first, Pfam-A, is the curated part, containing over<br />
7459 protein families. To give Pfam a more comprehensive coverage of known<br />
proteins we automatically generate a supplement called Pfam-B. This contains a<br />
large number of small families taken from the PRODOM database that do not overlap<br />
with Pfam-A. Although of lower quality Pfam-B families can be useful when no Pfam-<br />
A families are found.<br />
1. The Pfam Protein Families Database Alex Bateman, Lachlan Coin, Richard Durbin,<br />
Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall,<br />
Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean<br />
R. Eddy (2004) Nucleic Acids Research Database Issue 32, D138-D141
2. Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K, Bairoch A. (2002). The<br />
PROSITE database, its status in 2002 Nucleic Acids Res. 30:235-238<br />
3. Julie D. Thompson, Frédéric Plewniak, Raymond Ripp, Jean-Claude Thierry and<br />
Olivier Poch (2001) Towards a Reliable Objective Function <strong>for</strong> Multiple Sequence<br />
Alignments. J.Mol.Biol. 314, 937-951<br />
4. Servant F, Bru C, Carrère S, Courcelle E, Gouzy J, Peyruc D, Kahn D (2002)<br />
ProDom: Automated clustering of homologous domains. Briefings in Bioin<strong>for</strong>matics.<br />
3, 246-251<br />
SSM fold characterization Dr Eugene Krissinel <strong>EBI</strong><br />
SSM is a powerful interactive research tool <strong>for</strong> secondary structure matching that<br />
allows <strong>for</strong> comparing protein structures in <strong>3D</strong>. The service provides <strong>for</strong> (i) pairwise<br />
comparison and <strong>3D</strong> alignment of protein structures, (ii) multiple comparison and <strong>3D</strong><br />
alignment of protein structures, (iii) examination of a protein structure <strong>for</strong> similarity<br />
with the whole PDB or SCOP archives, (iv) best Cα-alignment of compared structures<br />
and (v) download and visualisation of best-superposed structures using<br />
RasMol or RasTop. The results are linked to other services including OCA, SCOP,<br />
GeneCensus, FSSP, <strong>3D</strong>ee, CATH, PDBsum, SwissProt and ProtoMap. SSM is<br />
recognized as a valuable tool in protein research, an aid to study protein function via<br />
structural similarities (used in drug design), protein con<strong>for</strong>mations, choice of model <strong>for</strong><br />
structure solution in X-ray experiments and many other tasks. SSM is also<br />
extensively integrated with other MSD services.<br />
1. E. Krissinel and K. Henrick, Protein structure comparison in <strong>3D</strong> based on secondary<br />
structure matching (SSM) followed by Cα alignment, scored by a new structural<br />
similarity function. In: Andreas J. Kungl & Penelope J. Kungl (Eds.), Proceedings of<br />
the 5th International Conference on Molecular Structural Biology, Vienna, September<br />
3-7, 2003, p.88.<br />
2. E. Krissinel and K. Henrick, Common subgraph isomorphism detection by backtracking<br />
search. (2004) Software: Practice and Experience, 34, 591-607.<br />
Motif and protein structures Dr. Gerard Kleywegt Uppsala Sweden<br />
Motif recognition (essentially, function-from-structure)<br />
SPASM server Motif recognition in nucleic acids structures (SPANA)<br />
Motif recognition in proteins (SPASM)<br />
1. Madsen, D. and Kleywegt, G.J. (2002). Interactive motif and fold recognition in<br />
protein structures. J. Appl. Cryst. 35, 137-139.
Wednesday 22nd September<br />
Participants split into 2 groups.<br />
Morning session Group 1<br />
Tutorials (15 min intro/demo – 45 minute tutorial)<br />
• Validation of protein structures ... Or: just because it was published in<br />
Nature, doesn't mean it's true! Dr. Gerard Kleywegt Uppsala Sweden<br />
(90min)<br />
• Learning about structures using the RCSB PDB Dr. Philip E. Bourne<br />
Professor of Pharmacology University of Cali<strong>for</strong>nia (60min)<br />
• Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle<br />
Burkhardt RCSB Protein Data Bank Rutgers, The State University of New<br />
Jersey (60min)<br />
Morning session Group 2<br />
Lectures (7x30mins - intro/demo MSD tools)<br />
• MSDchem, MSDlite, MSDpro, MSDsite<br />
• MSDmySQL, MSDmine<br />
• Advanced AstexViewer integrated into <strong>3D</strong> PDB searches<br />
A generic search system <strong>for</strong> the search database has been written and made available as a public<br />
service at http://www.ebi.ac.uk/msd-srv/msdlite and http://www.ebi.ac.uk/msd-srv/msdpro.<br />
The system is written using Java servlets and uses XML extensively <strong>for</strong><br />
configuration of the database/search system interactions, the description of the user<br />
interface, and the return of results. The system is designed to translate user input<br />
from a series of values into an SQL query, which can then be executed on the<br />
database. The architecture of the server side of the search system allows a high<br />
degree of flexibility in extending the range of searches (SQL statements) that can be<br />
created automatically. The system is highly configurable and can be moved easily to<br />
other databases simply by modifying the XML dictionaries that describe the database.<br />
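The idea of translating a series of user-supplied values into an SQL query, driven by a dictionary that describes the database, can be sketched as follows. This is not the MSD implementation (which uses Java servlets and XML): the Python dictionary `FIELDS`, the table `entry`, its columns and the sample rows are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical field dictionary: form field -> (column, SQL operator).
# The real system keeps this kind of description in XML dictionaries.
FIELDS = {
    "method": ("method", "="),
    "max_resolution": ("resolution", "<="),
    "title_contains": ("title", "LIKE"),
}


def build_query(form):
    """Turn a dict of form values into a parameterised SQL string and its parameters."""
    clauses, params = [], []
    for name, value in form.items():
        if value in (None, ""):
            continue  # the user left this field empty
        column, op = FIELDS[name]  # unknown field names raise KeyError
        clauses.append(f"{column} {op} ?")
        params.append(f"%{value}%" if op == "LIKE" else value)
    sql = "SELECT id FROM entry"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


# Toy in-memory database with made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id TEXT, method TEXT, resolution REAL, title TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("e001", "X-ray", 1.8, "Lysozyme complex"),
        ("e002", "X-ray", 2.5, "Kinase domain"),
        ("e003", "NMR", None, "Zinc finger"),
    ],
)

sql, params = build_query({"method": "X-ray", "max_resolution": 2.0, "title_contains": ""})
hits = [row[0] for row in conn.execute(sql, params)]
```

Only column names taken from the dictionary are ever placed in the SQL text, and user values are passed as bound parameters, so pointing the search at another database would mean editing the dictionary, not the query builder.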
Afternoon session Group 1<br />
Lectures (7x30mins - intro/demo MSD tools)<br />
• MSDchem, MSDlite, MSDpro, MSDsite<br />
• MSDmySQL, MSDmine<br />
• Advanced AstexViewer integrated into <strong>3D</strong> PDB searches<br />
Afternoon session Group 2<br />
Tutorials<br />
• Validation of protein structures ... Or: just because it was published in<br />
Nature, doesn't mean it's true! Dr. Gerard Kleywegt Uppsala Sweden<br />
(90min)<br />
• Learning about structures using the RCSB PDB Dr. Philip E. Bourne<br />
Professor of Pharmacology University of Cali<strong>for</strong>nia (60min)<br />
• Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle<br />
Burkhardt RCSB Protein Data Bank Rutgers, The State University of New<br />
Jersey (60min)<br />
Learning about structures using the RCSB PDB Dr. Philip E. Bourne Professor of<br />
Pharmacology University of Cali<strong>for</strong>nia
The RCSB PDB has been reengineered to include many new features and to<br />
integrate a variety of additional in<strong>for</strong>mation related to macromolecular structure and<br />
function ranging from genomic in<strong>for</strong>mation to disease states. A number of these<br />
features will be explored through several typical usage scenarios suited to novice<br />
users and more senior biologists.<br />
1. Philip E. Bourne, Kenneth J. Addess, Wolfgang F. Bluhm, Li Chen,<br />
Nita Deshpande, Zukang Feng, Ward Fleri, Rachel Green, Jeffrey C.<br />
Merino-Ott, Wayne Townsend-Merino, Helge Weissig, John<br />
Westbrook and Helen M. Berman (2004). The distribution and query<br />
systems of the RCSB Protein Data Bank Nucleic Acids Research 32,<br />
Database issue D223-D225<br />
Evaluation of structure quality tutorial, using RCSB tools Ms. Kyle Burkhardt<br />
RCSB Protein Data Bank Rutgers, The State University of New Jersey<br />
In this one hour tutorial, users will learn how to evaluate the quality of a structure.<br />
Users will download a structure from the PDB and validate the structure using the<br />
RCSB developed online Validation Suite. Users will learn how to analyze the<br />
validation report as well as PROCHECK, NUCHECK, and SFCheck results to<br />
determine structure quality.<br />
MSDsite Service Mr Adel Golovin <strong>EBI</strong><br />
The research service, MSDsite has been developed to give access to <strong>3D</strong> active site<br />
data. The three-dimensional environments of ligand binding sites have been derived<br />
from the parsing and loading of the PDB entries into a relational database. For each<br />
bound molecule the biological assembly of the quaternary structure has been used to<br />
determine all contact residues and a fast interactive search and retrieval system has<br />
been developed. Prosite pattern and short sequence search options are available<br />
together with a novel graphical query generator <strong>for</strong> inter-residue contacts.<br />
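Determining the contact residues of a bound molecule comes down to a distance test between ligand atoms and protein atoms. The sketch below illustrates that test only: the 4.0 Å cutoff, the tuple layout and the coordinates are assumptions made for the example, and it ignores the symmetry operations needed to build the biological assembly that MSDsite uses.

```python
def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return the sorted residues with at least one atom within `cutoff` of the ligand.

    protein_atoms: list of (chain, residue_number, residue_name, (x, y, z))
    ligand_atoms:  list of (x, y, z)
    The cutoff (in angstroms) is an illustrative choice, not an MSDsite setting.
    """
    cutoff_sq = cutoff * cutoff
    found = set()
    for chain, resnum, resname, xyz in protein_atoms:
        for lig in ligand_atoms:
            # Compare squared distances to avoid taking a square root.
            if sum((a - b) ** 2 for a, b in zip(xyz, lig)) <= cutoff_sq:
                found.add((chain, resnum, resname))
                break  # one close atom is enough for this residue
    return sorted(found)


# Made-up coordinates for demonstration only.
protein = [
    ("A", 10, "HIS", (0.0, 0.0, 0.0)),
    ("A", 10, "HIS", (1.0, 0.0, 0.0)),
    ("A", 45, "ASP", (2.0, 0.0, 0.0)),
    ("A", 90, "GLY", (12.0, 0.0, 0.0)),
]
ligand = [(0.0, 3.0, 0.0)]

contacts = contact_residues(protein, ligand)
```

Here HIS 10 and ASP 45 each have an atom within 4.0 Å of the ligand atom, while GLY 90 does not. A production service would precompute such contacts and store them in the relational database, which is what makes interactive searching fast.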
Dimitris Dimitropoulos<br />
MSDchem tutorial: Chemistry as the starting point of a search. Following the path<br />
from ligand chemistry to protein structure. This tutorial demonstrates in detail the<br />
searching capabilities of the MSD database and the MSDchem tool in identifying<br />
ligands using their basic chemical topology and chemical signature.<br />
MSDmySQL tutorial: Working with the bare MSD database in the popular MySQL<br />
<strong>for</strong>m, directly using general-purpose standard APIs and programming languages. A<br />
way to use the MSD database infrastructure in the most flexible and powerful way.<br />
MSDmine tutorial: A web application <strong>for</strong> scientific discovery, data analysis and<br />
knowledge mining <strong>for</strong> the advanced researcher of the MSD database. From the<br />
simplest to the most complex searches that combine many different in<strong>for</strong>mation entities,<br />
together with visualisation and cross-references. Online generation of charts and<br />
data drill-down and roll-up operations.<br />
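Working with a relational copy of the database through a general-purpose API, and the roll-up and drill-down operations mentioned above, might look like the following sketch. SQLite from the Python standard library stands in for MySQL here, and the table `ligand_site`, its columns and its rows are invented for illustration; they do not reflect the actual MSD schema.

```python
import sqlite3

# In-memory stand-in for a local database copy; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ligand_site (entry_id TEXT, ligand TEXT, residue TEXT)")
conn.executemany(
    "INSERT INTO ligand_site VALUES (?, ?, ?)",
    [
        ("e001", "ZN", "HIS"),
        ("e001", "ZN", "CYS"),
        ("e002", "ZN", "HIS"),
        ("e002", "HEM", "HIS"),
        ("e003", "HEM", "MET"),
    ],
)


def roll_up(conn):
    """Summarise at the coarsest level: number of contacts per ligand."""
    return conn.execute(
        "SELECT ligand, COUNT(*) FROM ligand_site GROUP BY ligand ORDER BY ligand"
    ).fetchall()


def drill_down(conn, ligand):
    """Expand one ligand into a finer breakdown by contacting residue type."""
    return conn.execute(
        "SELECT residue, COUNT(*) FROM ligand_site "
        "WHERE ligand = ? GROUP BY residue ORDER BY residue",
        (ligand,),
    ).fetchall()


summary = roll_up(conn)
zinc_detail = drill_down(conn, "ZN")
```

The same two queries would run largely unchanged against a MySQL server through any DB-API driver, although the parameter placeholder style can differ between drivers.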
1. A. Golovin, T. J. Oldfield, J. G. Tate, S. Velankar, G. J. Barton, H. Boutselakis, D.<br />
Dimitropoulos, J. Fillon, A. Hussain, J. M. C. Ionides, M. John, P. A. Keller, E.<br />
Krissinel, P. McNeil, A. Naim, R. Newman, A. Pajon, J. Pineda, A. Rachedi, J.<br />
Copeland, A. Sitnov, S. Sobhany, A. Suarez-Uruena, G. J. Swaminathan, M. Tagari,
S. Tromm, W. Vranken and K. Henrick (2004) E-MSD: an integrated data resource <strong>for</strong><br />
bioin<strong>for</strong>matics. Nucleic Acids Research, 32, Database issue D211-D216<br />
2. Boutselakis, H., Dimitropoulos, D., Fillon, J., Golovin, A., Henrick, K., Hussain, A.,<br />
Ionides, J., John, M., Keller, P.A., Krissinel, E., McNeil, P., Naim, A., Newman, R.,<br />
Oldfield, T., Pineda, J., Rachedi, A., Copeland, J., Sitnov, A., Sobhany, S., Suarez-<br />
Uruena, A., Swaminathan, J., Tagari, M., Tate, J., Tromm, S., Velankar, S. and<br />
Vranken, W. (2003) E-MSD: the European Bioin<strong>for</strong>matics Institute Macromolecular<br />
<strong>Structure</strong> Database Nucleic Acids Research, 31, 458-462
Thursday<br />
Morning session Group 1<br />
• Tutorials – Using MSD Tools to solve <strong>Problem</strong>s (60min)<br />
• Tutorials – Database Replication and use on your Desktop (60min)<br />
MSDmySQL and MSDmine<br />
• Tutorials – Scripting languages + data mining Dr Jaime Prilusky<br />
Weizmann Israel (90min)<br />
Morning session Group 2<br />
Lectures and Demos<br />
• Visualisation data mining Dr Thomas Oldfield <strong>EBI</strong><br />
• What is visualisation and how are complex data represented?<br />
Professor Bob Spence (Imperial College London)<br />
• Small Structural and Sequence Motifs Dr James Milner-White<br />
University of Glasgow<br />
Afternoon session Group 1<br />
Lectures and Demos<br />
• Visualisation data mining, Dr Thomas Oldfield, <strong>EBI</strong><br />
• What is visualisation and how are complex data represented?<br />
Professor Bob Spence (Imperial College London)<br />
• Small Structural and Sequence Motifs, Dr James Milner-White,<br />
University of Glasgow<br />
Afternoon session Group 2<br />
• Tutorials – Using MSD Tools to solve <strong>Problem</strong>s (60 min)<br />
• Tutorials – Database Replication and use on your Desktop (60 min)<br />
MSDmySQL and MSDmine<br />
• Tutorials – Scripting languages + data mining, Dr Jaime Prilusky,<br />
Weizmann Institute, Israel (90 min)<br />
Scripting languages + data mining (Dr Jaime Prilusky, Weizmann Institute, Israel)<br />
Use of fast scripting languages (Perl, Python) <strong>for</strong> creating ad-hoc searching and<br />
analysis tools on top of existing databases.<br />
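As a rough illustration of what such an ad-hoc tool might look like, the sketch below uses Python's standard `sqlite3` module against a small in-memory database. The table `entry`, its columns, the sample rows and the helper names `build_demo_db` and `find_entries` are all hypothetical stand-ins; they are not the MSD schema or any tool taught in the tutorial. The point is only that a few lines of script can wrap a parameterised query into a reusable search function.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for an existing structure database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "entry_id TEXT PRIMARY KEY, method TEXT, resolution REAL)"
    )
    # Invented rows for illustration only.
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [
            ("e1", "X-ray", 1.8),
            ("e2", "X-ray", 2.6),
            ("e3", "NMR", None),  # no resolution recorded
            ("e4", "X-ray", 1.2),
        ],
    )
    return conn


def find_entries(conn, method, max_resolution):
    """Return ids of entries for `method` with resolution <= `max_resolution`,
    ordered from the lowest resolution value upwards."""
    rows = conn.execute(
        "SELECT entry_id FROM entry "
        "WHERE method = ? AND resolution <= ? "
        "ORDER BY resolution",
        (method, max_resolution),
    )
    return [entry_id for (entry_id,) in rows]


conn = build_demo_db()
print(find_entries(conn, "X-ray", 2.0))
```

Passing the search values as query parameters, rather than pasting them into the SQL string, keeps the function safe to reuse with user-supplied input. Rows with no recorded resolution are skipped because a comparison against NULL never matches.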
Visualisation<br />
1. Tate,J.G., Moreland,J.L. and Bourne,P.E. (2001) Design and<br />
implementation of a collaborative molecular graphics environment. J.<br />
Mol. Graph. Model., 19, 280-287, 369-373.<br />
2. Neshich,G., Togawa,R.C., Mancini,A.L., Kuser,P.R., Yamagishi,M.E.,<br />
Pappas,G.,Jr, Torres,W.V., Fonseca e Campos,T., Ferreira,L.L.,<br />
Luna,F.M. et al. (2003) STING Millennium: a web-based suite of<br />
programs <strong>for</strong> comprehensive and simultaneous analysis of protein<br />
structure and sequence. Nucleic Acids Res., 31, 3386-3392.<br />
3. Bob Spence The Acquisition of Insight<br />
http://www.ee.ic.ac.uk/research/information/www/Bobs.html
4. Watson, J. D. and Milner-White, E. J. 2002 The con<strong>for</strong>mations of Polypeptide<br />
Chains where the main-chain parts of successive residues are enantiomeric.<br />
Their occurrence in Cation and Anion-binding regions of proteins. Journal of<br />
Molecular Biology 315, 183-191<br />
5. Watson, J. D. and Milner-White, E. J. 2002 A novel main-chain anion-binding<br />
site in proteins: The Nest. A particular combination of phi,psi values in<br />
successive residues gives rise to anion-binding sites that occur commonly<br />
and are found often at functionally important regions. Journal of Molecular<br />
Biology 315, 171-182<br />
6. Wan, W. Y. and Milner-White, E. J. 1999 A natural grouping of motifs with an<br />
aspartate or asparagine residue <strong>for</strong>ming two hydrogen bonds to residues<br />
ahead in sequence: Their occurrence at alpha-helical N termini and in other<br />
situations. Journal of Molecular Biology 286, 1633-1649<br />
7. Furnas, G.W. Generalised fisheye views. (1986) Human Factors in<br />
Computing Systems. CHI’86 Conference Proceedings, Boston, April 13-17<br />
pp 16-23.<br />
8. George G. Robertson, Jock D. Mackinlay and Stuart K. Card. (1991) The<br />
Perspective wall: Detail and context smoothly integrated. CHI’91 Conference<br />
Proceedings, pp 174-179.<br />
9. Oldfield, T.J. Creating structure features by datamining the PDB to use as<br />
molecular replacement models. Acta Cryst D57 1421-1427.
Friday 24th September<br />
9:00-9:40 Summary of <strong>3D</strong> data and WWW: current scientific resources<br />
<strong>for</strong> <strong>3D</strong> structure (Dr Kim Henrick, <strong>EBI</strong>)<br />
• Clean data<br />
• Database architectures<br />
9:40-10:20 Now you have seen the services: a word about the conceptual<br />
basis <strong>for</strong> data analysis, problem <strong>solving</strong> and critical thinking (Dr Tom<br />
Oldfield, <strong>EBI</strong>)<br />
10:20-11:00 Data Interchange/APIs, Data Integration problems, Query<br />
Interchange – Dr John Westbrook (RCSB)<br />
11:00-11:30 coffee break<br />
11:30-12:10 Feedback and wrap-up – Chair Dr Helen Berman (RCSB)<br />
12:10 Lunch and Depart