
EMBL-EBI Powerpoint Presentation


21.11.2012

Visualising Molecular Interactions (Standards and Formats)

Sandra Orchard

EMBL-EBI

EBI is an Outstation of the European Molecular Biology Laboratory.


Who wants to use interaction data?

1. Protein biochemists studying one protein or a family of proteins – detail-oriented, down to specific residues at interacting interfaces
2. Cell biologists – interested in processes, pathways, and additional interactions → new biology. Often interested in binding domains.
3. Network biologists – tend to want coverage rather than accuracy


What are the issues with interaction data?

1. Users tend not to understand the data – if they see
   A – B
   this will be interpreted as a direct, experimentally verified interaction.
2. Users do not take data confidence into account – will naively regard all data as of equal strength


Interaction Databases

Deep curation
IntAct – active curation, broad species coverage, all molecule types
MINT – active curation, broad species coverage, PPIs
DIP – active curation, broad species coverage, PPIs
MatrixDB – active curation, extracellular matrix molecules only
MPACT – ceased curation, limited species coverage, PPIs
BIND – ceased curating 2006/7, broad species coverage, all molecule types – information becoming dated

Shallow curation
BioGRID – active curation, limited number of model organisms
HPRD – ? active curation, human-centric, modelled interactions
MPIDB – active curation, microbial interactions, soon to be closed, data to IntAct


Why are data standards essential?

• Prior to 2003, many databases = many formats. Onus on the user to reformat when merging data
• File conversion inevitably leads to data loss
• Many formats compromised tool development – each tool developed tended to be database-specific


HUPO-PSI

• Global representative body for MS proteomics
• Initiated multi-lab tissue profiling studies – data incompatibility between groups compromised comparability
• MS and MI standards developed to enable these comparisons


What constitutes a PSI standard?

• Documents that make up each individual standard:
  • Minimal reporting requirements => MIAPE document
  • XML data exchange format
  • Domain-specific controlled vocabulary


MIMIx


PSI-MI standard format

• XML schema, tab-delimited option (MITAB) and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• PSI-MI XML Version 1.0 published in February 2004
  The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data.
  Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.
• Version 2.5/MITAB2.5 published in October 2007
  Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions.
  Samuel Kerrien et al., BMC Biology 2007, 5, 44.


PSI-MI standard format benefits

• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.

Home page: http://www.psidev.info/MI


PSIMITAB Extended Columns

MITAB2.5 columns (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)

MITAB2.6 columns (+21):
• Experimental role(s) A&B
• Biological role(s) A&B
• Properties (CrossReference) A&B
• Type(s) of interactors A&B
• HostOrganism
• Expansion method(s)
• Annotations A&B
• Parameters
• Creation/update date
• Checksums A, B & interaction
• Negative

MITAB2.7 columns (+6):
• Features A&B
• Stoichiometry A&B
• Participant detection method A&B
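
As a rough illustration of how the 15 MITAB2.5 columns can be consumed, the sketch below splits one tab-delimited line into named fields. The field names are labels chosen here, not part of the standard; the column order follows the published MITAB2.5 layout; the example line uses placeholder publication and interaction identifiers. It assumes that `|` separates multiple values and `-` marks an empty cell, and it deliberately ignores the corner case of a `|` inside a quoted value.

```python
# Labels for the 15 MITAB2.5 columns (names are illustrative, not official).
MITAB25_FIELDS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab25_line(line):
    """Split one MITAB2.5 line into a dict of field name -> list of values."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_FIELDS):
        raise ValueError(f"expected at least 15 columns, got {len(cells)}")
    record = {}
    # zip() stops after 15 fields, so extra MITAB2.6/2.7 columns are ignored.
    for name, cell in zip(MITAB25_FIELDS, cells):
        # "-" means "no value"; "|" separates multiple values (naive split).
        record[name] = [] if cell == "-" else cell.split("|")
    return record


# Made-up example line; the pubmed and intact identifiers are placeholders.
example = "\t".join([
    "uniprotkb:P51587", "uniprotkb:Q06609", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2000)", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
])
record = parse_mitab25_line(example)
```

A MITAB2.6 or 2.7 line would parse the same way for its first 15 columns; the extra columns are simply not read by this sketch.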


PSI-MI XML3.0?

• New data stretching the format to its limits
  • Cooperative binding – e.g. allostery
  • Dynamic data, captured over a series of time points, [agonist] etc.
  Need to indicate ordered series of events – currently can be achieved as a hack.

Discussion topic for 2013 PSI-MI meeting (April, Liverpool)
www.psidev.info

Will aim for backward compatibility.


Using interaction data

• Data from IMEx databases is richly annotated – much of this is not used in network analysis
• As model organism interactomes move to completeness, data quality may become more of an issue than quantity
• Passage of data from
  primary databases → compilation databases → analysis tools
  results in loss of information
• May wish to rethink pipelines to ensure more sophisticated user requirements are met


Cell cycle – Arabidopsis thaliana


Controlled vocabularies

www.ebi.ac.uk/ols


Additional benefits

• MITAB format – released 2007 by popular demand. Tab-delimited organisation of data.
• PSICQUIC – query access that runs across all interaction databases using PSI formats
• Interaction confidence – it becomes possible to score interactions across multiple databases
• Access to R Bioconductor statistics packages
• Growth industry in “composite” databases – these do no new curation but merge the output of resources producing data in PSI format.
• IMEx


PSICQUIC

• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data from Molecular Interaction databases.
• Widely implemented by independent interaction data resources.
• Based on the PSI standard formats (PSI-MI MITAB) - PSICQUIC-view and PSICQUIC webservices now support MITAB 2.6 and 2.7
• Not limited to protein-protein interactions, also e.g.
  • Drug-target interactions
  • Simplified pathway data
• A registry listing resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
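
To make the "common query interface" idea concrete, the sketch below builds a REST-style query URL for a PSICQUIC service. The base URL, the `/current/search/query/` path, the `format`, `firstResult` and `maxResults` parameters, and the `identifier:` query field are written from memory of the PSICQUIC REST documentation and should be treated as assumptions to check against the registry entry of the service being queried. The function only constructs the URL; it makes no network call.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of one PSICQUIC service; check the registry for real values.
ASSUMED_INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices"
)


def build_psicquic_url(service_base, miql_query, fmt="tab25",
                       first=0, max_results=50):
    """Return a REST query URL (path layout and parameter names are assumed)."""
    # The query is percent-encoded so that ':' and spaces survive in the path.
    path = (
        f"{service_base.rstrip('/')}/current/search/query/"
        f"{quote(miql_query, safe='')}"
    )
    params = urlencode(
        {"format": fmt, "firstResult": first, "maxResults": max_results}
    )
    return f"{path}?{params}"


url = build_psicquic_url(ASSUMED_INTACT_BASE, "identifier:P51587",
                         max_results=10)
```

Because every PSICQUIC provider is meant to answer the same kind of request, swapping `service_base` for another registry entry should be the only change needed to query a different database, under the assumptions above.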


PSICQUIC

http://www.ncbi.nlm.nih.gov/pubmed/21716279


PSICQUIC Registry

• 25 sources
• >150,000,000 interactions

http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS


PSICQUIC Registry - Tagging

Content: protein-protein, small molecule-protein, nucleic acid-protein
Interaction representation: evidence, clustered
Curation standards: mimix curation, imex curation, rapid curation
Source: internally curated, text mining, predicted, imported
Complex expansion: spoke, matrix, bipartite
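
The complex expansion tags describe how an n-ary result (for example a bait pulled down with several preys) is turned into binary pairs. The sketch below shows the usual reading of the three terms: spoke pairs the bait with each prey, matrix pairs every participant with every other, and bipartite links each participant to a node standing for the complex itself. Function names and the participants `A` to `D` are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey: n preys give n binary pairs."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other: n*(n-1)/2 binary pairs."""
    return list(combinations(participants, 2))


def bipartite_expansion(complex_id, participants):
    """Link each participant to a node representing the complex."""
    return [(complex_id, participant) for participant in participants]


bait, preys = "A", ["B", "C", "D"]
spoke = spoke_expansion(bait, preys)
matrix = matrix_expansion([bait] + preys)
bipartite = bipartite_expansion("complex_1", [bait] + preys)
```

With one bait and three preys, spoke yields three pairs and matrix yields six. The extra matrix pairs (such as B with C) were never tested directly, which is why the expansion method matters when a user reads "A – B" as a direct interaction.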


PSICQUIC view

• Simple and complex queries
• Link back to the original source for more details
• Clustering of queries across providers
• Visualization of graphical network

http://www.ebi.ac.uk/Tools/webservices/psicquic/view/


IntAct, PSICQUIC and Cytoscape


PSICQUIC – the downside

• PSICQUIC gives access to multiple databases and large amounts of data
• Many data types can be searched – experimental (physical/genetic/functional), inferred, predictive
• Users cannot easily differentiate between the data types
• Many databases do not curate data but repackage and re-present it (APID, iRefIndex, I2D, InnateDB, STRING) – the same information will appear multiple times
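
One way a user might cope with the same evidence arriving from several repackaging resources is to group records before counting them. The sketch below groups MITAB-like records by unordered interactor pair plus publication identifier and collects the databases that reported each group. The record layout, the choice of grouping key and all identifiers are assumptions made for illustration; this is not the clustering used by PSICQUIC View.

```python
from collections import defaultdict


def cluster_records(records):
    """Group records by (unordered interactor pair, publication id)."""
    clusters = defaultdict(set)
    for rec in records:
        # frozenset makes A-B and B-A fall into the same group.
        pair = frozenset((rec["id_a"], rec["id_b"]))
        clusters[(pair, rec["publication_id"])].add(rec["source_db"])
    return dict(clusters)


# Hypothetical records: the first two report the same evidence twice.
records = [
    {"id_a": "uniprotkb:A", "id_b": "uniprotkb:B",
     "publication_id": "pubmed:1", "source_db": "db1"},
    {"id_a": "uniprotkb:B", "id_b": "uniprotkb:A",
     "publication_id": "pubmed:1", "source_db": "db2"},
    {"id_a": "uniprotkb:A", "id_b": "uniprotkb:C",
     "publication_id": "pubmed:2", "source_db": "db1"},
]
clusters = cluster_records(records)
```

Here three records collapse into two groups, so the A and B interaction is counted once even though two resources returned it.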


Data distribution best practice

• Flexibility in standard usage leads to non-standard usage
• Data is merged by identifier – if this differs, the same interaction will not merge
  • uniprotkb:P51587 will not merge with P51587
• Ongoing work through HUPO-PSI to standardise what data goes in each column and to place reference identifiers (proteins – UniProt; small molecules – standard InChI key) in designated columns to allow merging.
• http://code.google.com/p/psicquic/wiki/DataDistributionBestPractices
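
The merging problem above can be illustrated with a small normalisation step applied before records are merged. The sketch assumes that any identifier without a `database:` prefix is a UniProtKB accession, which is a strong assumption that only holds if the source is known to use UniProt; the function name and the default are hypothetical.

```python
def normalise_identifier(identifier, default_db="uniprotkb"):
    """Return the identifier in lower-case 'database:accession' form."""
    identifier = identifier.strip()
    if ":" in identifier:
        # Already prefixed: lower-case the database name, keep the accession.
        database, accession = identifier.split(":", 1)
        return f"{database.lower()}:{accession}"
    # Bare accession: ASSUMED to belong to default_db.
    return f"{default_db}:{identifier}"


merged_keys = {
    normalise_identifier(raw)
    for raw in ["uniprotkb:P51587", "P51587", "UniProtKB:P51587"]
}
```

All three spellings collapse to the single key `uniprotkb:P51587`, so records carrying them would merge.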


IMEx

• Consortium of 9 molecular interaction databases dedicated to producing high quality, annotated data, curated to the same standards
• Data is curated once at a single centre then exchanged between partners
• Users need only go to a single site to obtain all data
• www.imexconsortium.org




IntAct goals & achievements

1. Publicly available repository of molecular interactions (mainly PPIs): ~310K binary interactions taken from >5,600 publications (October 2012)
2. Data is standards-compliant and available via our website, for download at our ftp site or via PSICQUIC
   http://www.ebi.ac.uk/intact
   ftp://ftp.ebi.ac.uk/pub/databases/intact
   www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
3. Provide open-access versions of the software to allow installation of local IntAct nodes.


IntAct – Home Page

http://www.ebi.ac.uk/intact


Visualisation – table view

First search from the home page…

[Screenshot annotations: interaction score; choice of UniProtKB or Dasty view; taxonomy; PubMed/IMEx ID; details of interaction; expansion method]


Visualisation – table view


Visualisation – network view


Cytoscape Web

• Currently V1 installed – no further development, as a technology change was announced immediately afterwards
• Development of a beta version of V2 will start in the New Year


Visualisation – network view

Applying a better graph layout…


Visualisation – network view

Highlighting network properties…


PSICQUIC plugin for Cytoscape 2.8.3

Cytoscape: File → Import → Network from Web Services...


PSICQUIC plugin for Cytoscape 2.8.3

• Does not import MITAB 2.6/2.7 fields (experimental and biological roles of participants, expansion method, host taxID, participant detection method...).
• Visualization has room for improvement.
• Attributes used for visual features need to be mapped to human-readable ones.


PSICQUIC plugin for Cytoscape 3

Cytoscape: File → Import → Network → Public Databases...


PSICQUIC plugin for Cytoscape 3

• Does not import MITAB 2.6/2.7 fields (experimental and biological roles of participants, expansion method, host taxID, participant detection method...).
• Minor issues with some MITAB 2.5 fields (interaction type, publication 1st author, source database).
• Visualization has room for improvement.
• Some bugs while merging and querying.


XGMML format from IntAct / PSICQUIC View

• Includes MITAB 2.6/2.7 fields with some minor issues (fields with multiple values can have exporting problems, participant detection method).
• Visualization is pretty dull, lots of room for improvement.
• Issues with non-readable attributes should be fixed in the next few days (or have been fixed already).


Current IntAct support:

European Commission grants PSIMEx (FP7-HEALTH-2007-223411), APO-SYS (FP7-HEALTH-2007-200767), Affinomics (241481)


