EMBL-EBI Powerpoint Presentation
21.11.2012
Visualising Molecular Interactions
(Standards and Formats)
Sandra Orchard
EMBL-EBI
EBI is an Outstation of the European Molecular Biology Laboratory.
Who wants to use interaction data?
1. Protein biochemists studying a single protein or a protein family – detail-oriented, down to specific residues at interacting interfaces
2. Cell biologists – interested in processes, pathways, and additional interactions → new biology. Often interested in binding domains.
3. Network biologists – tend to want coverage rather than accuracy
What are the issues with interaction data?
1. Users tend not to understand the data – if they see
A – B
they will interpret it as a direct, experimentally verified interaction.
2. Users do not take data confidence into account – they will naively regard all data as being of equal strength.
Interaction Databases
Deep curation
IntAct – active curation, broad species coverage, all molecule types
MINT – active curation, broad species coverage, PPIs
DIP – active curation, broad species coverage, PPIs
MatrixDB – active curation, extracellular matrix molecules only
MPACT – ceased curation, limited species coverage, PPIs
BIND – ceased curating 2006/7, broad species coverage, all molecule types – information becoming dated
Shallow curation
BioGRID – active curation, limited number of model organisms
HPRD – ? active curation, human-centric, modelled interactions
MPIDB – active curation, microbial interactions, soon to be closed, data to IntAct
Why are data standards essential?
• Prior to 2003, many databases = many formats. Onus on the user to reformat when merging data
• File conversion inevitably leads to data loss
• Many formats compromised tool development – each tool developed tended to be database-specific
HUPO-PSI
• Global representative body for MS proteomics
• Initiated multi-lab tissue profiling studies – data incompatibility between groups compromised comparability
• Mass spectrometry (MS) and molecular interaction (MI) standards developed to enable these comparisons
What constitutes a PSI standard
• Documents that make up each individual standard
• Minimal reporting requirements => MIAPE document
• XML data exchange format
• Domain-specific controlled vocabulary
MIMIx
PSI-MI standard format
• XML schema, tab-delimited option (MITAB) and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• PSI-MI XML Version 1.0 published in February 2004
The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.
• Version 2.5/MITAB2.5 published in October 2007
Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions. Samuel Kerrien et al., BioMed Central, 2007.
PSI-MI standard format benefits
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.
Home page: http://www.psidev.info/MI
PSI-MITAB Extended Columns
2.5 + 2.6 + 2.7
MITAB2.5 columns (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
MITAB2.6 columns (+21):
• Experimental role(s) A & B
• Biological role(s) A & B
• Properties (CrossReference) A & B
• Type(s) of interactors A & B
• HostOrganism
• Expansion method(s)
• Annotations A & B
• Parameters
• Creation/update date
• Checksums A, B & interaction
• Negative
MITAB2.7 columns (+6):
• Features A & B
• Stoichiometry A & B
• Participant detection method A & B
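As a rough illustration of the 15-column MITAB 2.5 layout, the sketch below splits one tab-delimited line into named fields. The short key names (`id_a`, `alias_b`, ...) and the example record are invented for this sketch and are not part of the standard. The split on "|" for multi-valued fields is deliberately naive and assumes no quoted value contains a pipe character.

```python
from typing import Dict, List

# MITAB 2.5 column order; the short key names are this sketch's own choice.
MITAB25_COLUMNS = [
    "id_a", "id_b",
    "alt_id_a", "alt_id_b",
    "alias_a", "alias_b",
    "detection_method",
    "first_author",
    "publication_id",
    "taxid_a", "taxid_b",
    "interaction_type",
    "source_db",
    "interaction_id",
    "confidence",
]


def parse_mitab25_line(line: str) -> Dict[str, List[str]]:
    """Map the first 15 tab-separated fields of a MITAB line to named lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty field; "|" separates multiple values (naive split).
        record[name] = [] if value == "-" else value.split("|")
    return record


# Invented example record, for illustration only (not real curated data).
example = "\t".join([
    "uniprotkb:P51587", "uniprotkb:P12345", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2012)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000000", "intact-miscore:0.50",
])
record = parse_mitab25_line(example)
```

Because MITAB 2.6 and 2.7 only append columns, the same function reads the first 15 fields of those files as well and ignores the rest.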
PSI-MI XML3.0?
• New data stretching the format to its limits
• Cooperative binding – e.g. allostery
• Dynamic data, captured over a series of time points, [agonist] etc.
Need to indicate ordered series of events – currently can be achieved as a hack.
Discussion topic for 2013 PSI-MI meeting (April, Liverpool)
www.psidev.info
Will aim for backward compatibility.
Using interaction data
• Data from IMEx databases is richly annotated – much of this is not used in network analysis
• As model organism interactomes move to completeness, data quality may become more of an issue than quantity
• Passage of data from primary databases → compilation databases → analysis tools results in loss of information
• May wish to rethink pipelines to ensure more sophisticated user requirements are met
Cell cycle – Arabidopsis thaliana
Controlled vocabularies: www.ebi.ac.uk/ols
Additional benefits
• MITAB format – released 2007 by popular demand. Tab-delimited organisation of data.
• PSICQUIC – query access that runs across all interaction databases using PSI formats
• Interaction confidence – it becomes possible to score interactions across multiple databases
• Access to R Bioconductor statistics packages
• Growth industry in “composite” databases, which do no new curation but merge the output of resources producing data in PSI format.
• IMEx
PSICQUIC
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data from molecular interaction databases.
• Widely implemented by independent interaction data resources.
• Based on the PSI standard formats (PSI-MI MITAB) - PSICQUIC View and PSICQUIC web services now support MITAB 2.6 and 2.7
• Not limited to protein-protein interactions, also e.g.
  • Drug-target interactions
  • Simplified pathway data
• A registry listing resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
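A minimal sketch of how a client might build a PSICQUIC REST query URL. It assumes the commonly documented pattern of a provider base URL followed by `/search/query/<MIQL query>`, with `format`, `firstResult` and `maxResults` parameters. The base URL below is a hypothetical placeholder; real service URLs should be taken from the PSICQUIC registry, and the parameter names should be checked against the PSICQUIC documentation before use.

```python
from urllib.parse import quote, urlencode


def build_psicquic_url(base_url: str, miql_query: str, fmt: str = "tab25",
                       first_result: int = 0, max_results: int = 50) -> str:
    """Assemble a query URL; no network request is made here."""
    # The MIQL query goes in the URL path, so it is percent-encoded in full.
    path = f"{base_url.rstrip('/')}/search/query/{quote(miql_query, safe='')}"
    # Assumed parameter names: format, firstResult, maxResults.
    params = urlencode({
        "format": fmt,
        "firstResult": first_result,
        "maxResults": max_results,
    })
    return f"{path}?{params}"


# Hypothetical provider base URL, for illustration only.
url = build_psicquic_url(
    "http://example.org/psicquic/webservices/current/",
    "brca2 AND species:human",
    fmt="tab27",
)
```

Since every provider exposes the same interface, the same function can in principle be reused for each registry entry by changing only `base_url`.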
PSICQUIC
http://www.ncbi.nlm.nih.gov/pubmed/21716279
PSICQUIC Registry
• 25 sources
• >150,000,000 interactions
http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS
PSICQUIC Registry - Tagging
• Content: protein-protein, small molecule-protein, nucleic acid-protein
• Interaction representation: evidence, clustered
• Curation standards: mimix curation, imex curation, rapid curation
• Source: internally curated, text mining, predicted, imported
• Complex expansion: spoke, matrix, bipartite
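The three complex expansion tags describe how an n-ary result (for example a co-purified complex) is turned into binary pairs. The sketch below shows one common reading of each: spoke links the bait to every prey, matrix links every pair of participants, and bipartite links each participant to a node representing the complex. The function names and the participant labels are illustrative only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def spoke_expansion(bait: str, preys: Sequence[str]) -> List[Pair]:
    """Bait paired with each prey: n-1 pairs for n participants."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants: Sequence[str]) -> List[Pair]:
    """Every unordered pair of participants: n*(n-1)/2 pairs."""
    return list(combinations(participants, 2))


def bipartite_expansion(complex_id: str, participants: Sequence[str]) -> List[Pair]:
    """Each participant linked to a node standing for the complex itself."""
    return [(complex_id, member) for member in participants]


spoke = spoke_expansion("A", ["B", "C", "D"])
matrix = matrix_expansion(["A", "B", "C", "D"])
bipartite = bipartite_expansion("complex-1", ["A", "B", "C", "D"])
```

For the same four-member complex, spoke yields 3 pairs and matrix yields 6, which is why the expansion method matters when a binary pair A – B is read as a direct interaction.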
PSICQUIC View
• Simple and complex queries
• Link back to the original source for more details
• Clustering of queries across providers
• Visualization of graphical network
http://www.ebi.ac.uk/Tools/webservices/psicquic/view/
IntAct, PSICQUIC and Cytoscape
PSICQUIC – the downside
• PSICQUIC gives access to multiple databases and large amounts of data
• Many data types can be searched: experimental (physical/genetic/functional), inferred, predictive
• Users have problems easily differentiating the data types
• Many databases do not curate data but repackage and re-present it (APID, iRefIndex, I2D, InnateDB, STRING), so the same information will appear multiple times
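A small sketch of how repeated reports of the same pair could be clustered across providers. Each record is assumed to be an (interactor A, interactor B, source) tuple; the accessions and the grouping key are illustrative. The number of sources per pair is not a count of independent evidence, because repackaging databases re-report the same primary data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cluster_interactions(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Group records by unordered interactor pair and collect their sources."""
    clusters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for id_a, id_b, source in records:
        # Sort the pair so that A-B and B-A fall into the same cluster.
        key = tuple(sorted((id_a, id_b)))
        clusters[key].add(source)
    return dict(clusters)


# Invented records, for illustration only.
records = [
    ("P1", "P2", "IntAct"),
    ("P2", "P1", "STRING"),
    ("P1", "P3", "MINT"),
]
clusters = cluster_interactions(records)
```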
Data distribution best practice
• Flexibility in standard usage leads to non-standard usage
• Data is merged by identifier; if the identifier differs, the same interaction will not merge
  • uniprotkb:P51587 will not merge with P51587
• Ongoing work through HUPO-PSI to standardise what data goes in each column and to have reference identifiers (proteins: UniProt; small molecules: standard InChIKey) in designated columns to allow merging.
• http://code.google.com/p/psicquic/wiki/DataDistributionBestPractices
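The merging problem can be sketched as a normalisation step applied before identifiers are compared. The function below is hypothetical and rests on a strong assumption: an identifier without a database prefix is treated as a UniProtKB accession. That only holds if the column is known to carry protein accessions.

```python
def normalise_identifier(identifier: str, default_db: str = "uniprotkb") -> str:
    """Return the identifier in 'db:accession' form with a lower-case prefix."""
    identifier = identifier.strip()
    if ":" in identifier:
        db, accession = identifier.split(":", 1)
        return f"{db.lower()}:{accession}"
    # Assumption: a bare accession belongs to the default database.
    return f"{default_db}:{identifier}"


# Both spellings from the slide now compare equal.
same = normalise_identifier("P51587") == normalise_identifier("uniprotkb:P51587")
```

Normalising on the consumer side is a workaround; the standardisation effort described above aims to make it unnecessary by fixing the identifier form at the source.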
IMEx
• Consortium of 9 molecular interaction databases dedicated to producing high-quality, annotated data, curated to the same standards
• Data is curated once at a single centre then exchanged between partners
• Users need only go to a single site to obtain all data
• www.imexconsortium.org
IntAct goals & achievements
1. Publicly available repository of molecular interactions (mainly PPIs): ~310K binary interactions taken from >5,600 publications (October 2012)
2. Data is standards-compliant and available via our website, for download from our FTP site, or via PSICQUIC
http://www.ebi.ac.uk/intact
ftp://ftp.ebi.ac.uk/pub/databases/intact
www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
3. Provide open-access versions of the software to allow installation of local IntAct nodes.
IntAct – Home Page
http://www.ebi.ac.uk/intact
First search from the home page…
Visualisation – table view
Screenshot callouts: interaction score; choice of UniProtKB or Dasty view; taxonomy; PubMed/IMEx ID; details of interaction; expansion method
Visualisation – table view
Visualisation – network view
Cytoscape Web
• V1 currently installed; no further development, as a technology change was announced immediately afterwards
• Development of a beta version of V2 will start in the New Year
Applying a better graph layout…
Visualisation – network view
Highlighting network properties…
Visualisation – network view
PSICQUIC plugin for Cytoscape 2.8.3
Cytoscape menu: File → Import → Network from Web Services...
PSICQUIC plugin for Cytoscape 2.8.3
• Does not import MITAB 2.6/2.7 fields (experimental and biological roles of participants, expansion method, host taxID, participant detection method...).
• Visualization has room for improvement.
• Reference attributes for visual features need to be mapped to human-readable ones.
PSICQUIC plugin for Cytoscape 3
Cytoscape menu: File → Import → Network → Public Databases...
PSICQUIC plugin for Cytoscape 3
• Does not import MITAB 2.6/2.7 fields (experimental and biological roles of participants, expansion method, host taxID, participant detection method...).
• Minor issues with some MITAB 2.5 fields (interaction type, publication 1st author, source database).
• Visualization has room for improvement.
• Some bugs while merging and querying.
XGMML format from IntAct / PSICQUIC View
• Includes MITAB 2.6/2.7 fields with some minor issues (fields with multiple values can have exporting problems, participant detection method).
• Visualization is pretty dull, with lots of room for improvement.
• Issues with non-readable attributes should be fixed in the next few days (or have been fixed already).
Current IntAct support:
European Commission grants
PSIMEx (FP7-HEALTH-2007-223411)
APO-SYS (FP7-HEALTH-2007-200767)
Affinomics (241481)