10.07.2015 Views

Protein-Protein Interactions, Networks and Pathways - Chagall


Protein-Protein Interactions, Networks and Pathways
Doron Betel
Computational Biology Center
MSKCC
May 6, 2010


Outline
• Definitions
• Experimental methods & Datasets
• Computational Challenges
• Databases
• Networks
• PPI reliability
• Visualization
• Network properties
• Interaction predictions


David S. Goodsell, http://mgl.scripps.edu/people/goodsell/


Pathways
• Do not exist in the cell!
• A human abstraction to help organize our understanding of biology.
• A description of the chronological ordering of protein/DNA/small molecule interactions.
• Unlike proteins and genes, pathways represent processes that may not be clearly defined.


http://www.biocarta.com/pathfiles/h_mTORPathway.asp


http://www.genome.jp/kegg/pathway/map/map00780.html


Protein Interactions
• A geometric match between two protein surfaces and the ensuing bonds, which are mostly non-covalent. A similar definition applies to protein-DNA and protein-small molecule interactions.
• An obvious requirement for a PPI is the co-occurrence of the two proteins in the same place at the same time.
• Must be specific relative to other molecules present.
• Interactions are governed by two opposing forces: binding affinities, and thermal collisions that disrupt the binding.
• Context is everything.


Experimental methods for analysis of PPI
Shoemaker BA, Panchenko AR, PLoS CB, 2007, 3: e42


Yeast-two-hybrid
• Widely used for detecting PPI.
• Can detect weak interactions (Kd ~ 10^-7).
• Often detects spurious interactions.
• Excludes membrane and secreted proteins that cannot be detected by a nuclear-based system.
• Some TFs are self-activating.
• Review - Fields S., FEBS J., 2005, 272, 5391-7


IP-MS
• Does not detect binary protein interactions but identifies ‘complexes’.
• More biologically correct: in vivo conditions, concentrations, cofactors, unmodified proteins.
• Can use different protein tags and MS detection systems.
Kumar A., Snyder M., Nature 415, 123-4 (10 Jan 2002)


Graph Theory
[Figure: example graph labeling a vertex (node), an edge, a cycle, a directed edge, and weighted edges (weights -5, 7, 10)]
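The graph terms on this slide map onto a small data structure. What follows is a minimal Python sketch, not taken from the slides: the class name `Graph`, its methods, and the example vertex names are hypothetical, and the weights simply reuse the numbers from the figure labels.

```python
from collections import defaultdict


class Graph:
    """Adjacency-list graph; edges may be directed and carry a weight."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(dict)  # adj[u][v] = weight of edge u -> v

    def add_edge(self, u, v, weight=1.0):
        self.adj[u][v] = weight
        if self.directed:
            self.adj[v]  # touch v so it is registered as a vertex
        else:
            self.adj[v][u] = weight

    def vertices(self):
        return set(self.adj)

    def degree(self, u):
        """Number of neighbors of u (out-degree if the graph is directed)."""
        return len(self.adj[u])

    def has_cycle(self):
        """Detect a cycle with depth-first search."""
        state = {}  # vertex -> "open" (on the DFS stack) or "done"

        def visit(u, parent):
            state[u] = "open"
            for v in self.adj[u]:
                if not self.directed and v == parent:
                    continue  # do not walk straight back along the same edge
                if state.get(v) == "open":
                    return True
                if v not in state and visit(v, u):
                    return True
            state[u] = "done"
            return False

        return any(visit(u, None) for u in list(self.adj) if u not in state)


# Undirected, weighted example (hypothetical vertex names).
g = Graph()
g.add_edge("A", "B", weight=7)
g.add_edge("B", "C", weight=10)
g.add_edge("C", "A", weight=-5)  # closes the cycle A-B-C-A
```

An adjacency dictionary keeps neighbor lookups cheap, which matters for the sparse graphs typical of interaction data.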


Internet Mapping Project
• Nodes: ISPs
• 100K nodes
http://www.cs.bell-labs.com/who/ches/map/


Yeast Protein Linkage Map Project
• Nodes: proteins
• 1548 nodes
http://depts.washington.edu/sfields/yplm/data/index.html


Computational Challenges I: Data description
• The identity of the interacting proteins is not always clear. Gene names, different identifiers, alternative splicing, experimental constructs: resolving these is a major resource expense.
• Post-translational modifications are not easy to describe and can be very complex (e.g. glycosylation).
• Conditions and context are often missing. In vitro experiments may lack the temporal and spatial components of interactions.


Computational Challenges II: Data quality
• Different experiments provide different information: functional assays, site-directed mutagenesis, Y2H, IP-MS, IP-Western.
• Different levels of detail: structural complexes provide detailed atomic information; IP-MS provides protein complexes.
• Incomplete coverage: not all cellular components and events are covered equally. For example, membrane proteins are missed by Y2H. This impacts PPI predictions.


Computational Challenges III: Data availability
• Most of the data, and the best of it, is in the minds of expert scientists. Most scientists will not volunteer their time.
    – Indexers read the literature and enter the data. High-quality records, but slow and expensive.
• Literature - a large corpus of text published over many years of research. Accessible but not computable.
    – There are semi-automated methods to extract information from literature. A major area of text-mining research.
• Datasets - obtaining data from other computational sources (PDB, other databases) or high-throughput interaction experiments.


www.pathguide.org


Types of resources
• Interaction/pathway databases focus on different types of data.
• Species specific - Human, Yeast, Drosophila.
• Interaction type specific - Protein:Protein, Protein:DNA, Protein:Ligand.
• Protein specific - G protein-coupled receptors, TF binding sites.
• Varying qualities of data, licensing conditions, query options and updates.


Data standards
• There are no established standards currently. Proposed standards are under development, with a growing user base.
• Most databases have their own representation of PPI and pathways.
• Some use formal description languages (XML) or database schemas; others use simple text files.


Data Exchange Formats
• New XML standards are available to exchange data between PPI and pathway databases.
• PSI-MI - Proteomics Standards Initiative (psidev.sourceforge.net). Mostly for PPI, mass spec and proteomics.
    – Supported by DIP, MINT, IntAct, HPRD, BIND, MIPS
    – IMEx - International Molecular Exchange Consortium using PSI-MI.
• BioPax - Biological Pathway Exchange (biopax.org). Broader coverage towards pathways and networks.
    – Supported by BioCyc, Reactome, NCI PathwayCommons, INOH,...


CellMap.org


Reactome.org


PATIKA.org


Mapping PPI to a Graph
• A simple mapping
    – compound=node, interaction=edge
• A more realistic mapping
    – Cell localization, cell cycle, cell type, taxonomy
    – Only represent physiologically relevant interaction networks
• Visualizing large datasets
    – Collapsing redundancies to a single node
    – Hierarchical organization (pathways, complexes).
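The simple mapping (compound = node, interaction = edge) and the collapsing of redundant records onto a single node can be sketched in a few lines. This is an illustrative sketch only: the function name `build_ppi_graph`, the `synonyms` table, and the example records are assumptions, not part of the slides.

```python
from collections import defaultdict


def build_ppi_graph(interactions, synonyms=None):
    """Map interaction pairs to an undirected graph (protein = node, interaction = edge).

    `synonyms` maps alternative identifiers to one canonical name, so that
    redundant records collapse onto a single node.
    """
    synonyms = synonyms or {}
    graph = defaultdict(set)
    for a, b in interactions:
        a = synonyms.get(a, a)
        b = synonyms.get(b, b)
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Illustrative records: the same protein appears under two names.
records = [("TP53", "MDM2"), ("p53", "MDM2"), ("p53", "EP300")]
graph = build_ppi_graph(records, synonyms={"p53": "TP53"})
```

The more realistic mapping on the slide would attach attributes such as localization or cell type to nodes and edges; that is left out here to keep the sketch short.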


Rual JF, et al. Nature, (2005), 437, 1173-8


Giot L, et al., Science, (2003), 302, 1727-36


Yeast protein complexes
Krogan NJ., et al., Nature, (2006), 440, 637-643


Visualization
• Cytoscape (www.cytoscape.org)
• GenMAPP (www.genmapp.org/) (human, mouse, rat, yeast)
• Osprey (biodata.mshri.on.ca/osprey)
• Patika (www.patika.org)
• Generic graph viewers (Pajek, GraphViz)
• Static image mappers


Many ways to draw the same things
Le Novère et al., Nature Biotech. 27, 735 - 741 (2009)


The Systems Biology Graphical Notation
(process diagram, entity relationship diagram and activity flow diagram)
Le Novère et al., Nature Biotech. 27, 735 - 741 (2009)


Cytoscape
• Open source platform for visualizing interactions and pathways.
• Integrates expression and annotation tools. Can import data directly from data sources (e.g. NCBI, BioMart, IntAct, PathwayCommons).
• Supports many input standards (text, GML, BioPax…).
• Lots of layouts and visual styles.
• Plug-in architecture that allows for addition of new functionalities. New plug-ins are continuously added.
• Collaboration between ISB, UCSD, MSKCC, U of Toronto, Pasteur, Agilent. Very active development team.


GenMAPP
• Application to visualize pathway files.
• Overlays expression and annotation data
• Can build your own pathways – drawing tools
• Contains many human, mouse, rat, yeast pathways.
• Windows only


Osprey


VisAnt


Netbox
Cerami, et al., PLoS One, (2010), 5, e8918


Reliability of PPI
• PPIs are identified by various experimental techniques with varying degrees of accuracy.
• High-throughput interactions are inherently ‘noisy’: large variations between experiments, sparse overlap with known interactions.
• Many abundant ‘sticky’ proteins (e.g. ribosome, chaperones).
• Experimental techniques are continuously improving. New datasets are more reliable.
• References - von Mering et al., 2002; Sprinzak et al., 2003; Bader et al., 2002; Saito et al., 2003; Jansen et al., 2003; Deng et al., 2003; Bader et al., 2004; Gandhi et al., 2006 and more…


How to assess reliability?
• Repeated observations by multiple experiments, by different groups, by different techniques. Limitation: interactions are not likely to be published twice, especially by the same technique.
• Similar annotation - similar GO biological function (with some limitations), similar cellular compartment.
• Correlated gene expression profiles.
• Genetic interactions - suppression, synthetic lethality.
• Homologous interactions - homologs interacting in other species.
• Network characteristics and neighboring interactions.
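The first criterion, repeated observation by different techniques and different groups, is easy to tally. The sketch below is a toy illustration, not a published scoring scheme: the function name `support_counts`, the record layout, and the example observations (protein names, methods, and source labels) are all hypothetical.

```python
from collections import defaultdict


def support_counts(observations):
    """Count distinct methods and distinct sources supporting each protein pair.

    Each observation is (protein_a, protein_b, method, source). Pairs are
    unordered, so (A, B) and (B, A) are pooled.
    """
    methods = defaultdict(set)
    sources = defaultdict(set)
    for a, b, method, source in observations:
        pair = tuple(sorted((a, b)))
        methods[pair].add(method)
        sources[pair].add(source)
    return {pair: (len(methods[pair]), len(sources[pair])) for pair in methods}


# Hypothetical observations.
observations = [
    ("P1", "P2", "Y2H", "study_1"),
    ("P2", "P1", "IP-MS", "study_2"),
    ("P1", "P3", "Y2H", "study_1"),
]
counts = support_counts(observations)
```

A pair seen by two techniques in two independent studies would rank above a pair seen once; the other criteria on the slide (annotation, expression, homology) would need their own data sources.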


Yeast PPI landscape (2002)
von Mering C., et al., Nature, 2002, 417, 399-403


Human PPI (2006)
Gandhi TKB. et al., Nature Genetics, 2006, 38, 285-293


Network motifs
• Network motifs - patterns of connectivity that appear at high frequency in certain types of networks.
• Network motifs constitute the building blocks of networks. They can optimize functionality, which suggests evolutionary selection.
• For example: feed-forward motifs are common in transcriptional networks, enabling cells to regulate the transcriptional response.
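As an illustration of what counting a motif means, the sketch below counts feed-forward loops (edges x→y, y→z and x→z) in a directed edge list. The function name and the example edges are hypothetical. Deciding whether a pattern is actually a motif also requires comparing its count against randomized networks, which this sketch omits.

```python
def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    successors = {}
    for source, target in edges:
        if source != target:  # ignore self-loops
            successors.setdefault(source, set()).add(target)
    count = 0
    for x, x_targets in successors.items():
        for y in x_targets:
            for z in successors.get(y, ()):
                if z in x_targets:
                    count += 1
    return count


# Hypothetical regulatory edges: X regulates Y and Z, and Y regulates Z.
ffl_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
n_ffl = count_feed_forward_loops(ffl_edges)
```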


Network motifs
Milo R., et al., Science, 298:824-827 (2002).


Yeast PPI motifs
Zhang LV., et al., J Biol. 2005;4(2):6


Network Connectivity
• Networks can be characterized by specific attributes.
• Shortest paths, distribution of node connectivity, clustering coefficients (C_I = 2n_I / (k(k-1)), where k is the number of neighbors of node I and n_I is the number of edges among those neighbors), network motifs.
• The evolution and/or design of networks is represented by these attributes.
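The clustering coefficient formula can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the function name and the example graph are hypothetical.

```python
def clustering_coefficient(graph, node):
    """C_I = 2 * n_I / (k * (k - 1)) for an undirected graph {node: set(neighbors)}."""
    neighbors = graph[node] - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0  # the coefficient is undefined for k < 2; report 0 by convention
    # Each edge among the neighbors is seen from both of its endpoints,
    # so this sum equals 2 * n_I.
    twice_n = sum(len((graph[u] & neighbors) - {u}) for u in neighbors)
    return twice_n / (k * (k - 1))


# Hypothetical graph: a triangle A-B-C with one extra neighbor D attached to A.
example = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

In this example node A has three neighbors with one edge among them, giving 2·1/(3·2) = 1/3, while node B has two neighbors that are connected to each other, giving 1.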


Network Models
Barabasi AL, et al., Nat Rev Genet. 2004 Feb;5(2):101-13


Predicting pathways and interactions
• Many of the features that are used to assess reliability can be used for predictions.
• Gene fusion
• Genome co-localization
• Phylogenetic profiles
• Correlated mutations
• Gene expression
• Network properties
• Domain-domain correlation
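To make one of these features concrete: phylogenetic profiling compares the presence or absence of each protein's homologs across genomes, on the idea that proteins which are gained and lost together tend to be functionally linked. The sketch below is a minimal illustration with hypothetical profiles; the function names and the choice of Jaccard similarity are assumptions, not the method of any particular tool.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (similarity, protein, protein) tuples, most similar first."""
    scored = [
        (jaccard(profiles[p], profiles[q]), p, q)
        for p, q in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical profiles over five genomes (1 = homolog present).
profiles = {
    "P1": [1, 1, 0, 1, 0],
    "P2": [1, 1, 0, 1, 0],
    "P3": [0, 0, 1, 0, 1],
}
ranked = rank_pairs(profiles)
```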


Gene Fusion


Predicting pathways and interactions
• STRING – string.embl.de
    – Predicted functional associations – gene neighborhood, phylogenetic profile, gene fusion, coexpression experiments, text mining.
• VisAnt - VisAnt.bu.edu/
• GeneMania – genemania.org
    – Gene function prediction server.
    – Integrates an extensive amount of gene information, interactions and associations.
    – Flash-based visualization and integrated Cytoscape web


Domain-Domain Interactions
• Many PPI predictions are based on identifying interacting domain pairs or domain-motif pairs. (Ref: Deng et al., 2002; Ng et al., 2003; Riley et al., 2005; Sprinzak et al., 2001; Li et al., 2004; Neduva et al., 2005; Wang et al., 2004; Betel et al., 2007)
• Based on identifying over-represented domain pairs in interacting proteins, or conserved motifs.
• To date, only a small number of conserved recognition motifs have been identified (Scansite, ELM).
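One simple way to read "over-represented domain pairs" is as a ratio: how often a domain pair occurs among interacting protein pairs versus among all protein pairs. The sketch below implements that toy statistic. It is not the method of any of the cited papers; the function names, the pseudocount, and the example proteins and domains (P1..P5, D1..D3) are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def domain_pairs(domains_a, domains_b):
    """Unordered domain pairs that can be formed between two proteins."""
    return {tuple(sorted((x, y))) for x in domains_a for y in domains_b}


def domain_pair_enrichment(protein_domains, interactions, pseudocount=1.0):
    """Log2 ratio of a domain pair's frequency in interacting protein pairs
    to its frequency in all protein pairs."""
    interacting = {frozenset(pair) for pair in interactions}
    observed, background = Counter(), Counter()
    n_all = n_int = 0
    for a, b in combinations(sorted(protein_domains), 2):
        is_interacting = frozenset((a, b)) in interacting
        n_all += 1
        n_int += is_interacting
        for dp in domain_pairs(protein_domains[a], protein_domains[b]):
            background[dp] += 1
            observed[dp] += is_interacting
    return {
        dp: math.log2(
            ((observed[dp] + pseudocount) / (n_int + pseudocount))
            / ((background[dp] + pseudocount) / (n_all + pseudocount))
        )
        for dp in background
    }


# Hypothetical proteins, domains and interactions.
protein_domains = {
    "P1": {"D1"}, "P2": {"D2"}, "P3": {"D1"}, "P4": {"D2"}, "P5": {"D3"},
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
scores = domain_pair_enrichment(protein_domains, interactions)
```

With so few proteins the pseudocount dominates rare pairs, so the numbers are only meaningful on realistically sized datasets.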


Domains and protein interactions
Pawson T., Nash P., (2003), Science, 300, 445-452


SH3 domain


Structure-based PPI prediction
[Figure: sequence motif labels, fused in extraction: QEDYXRYVPXVPQEDYXRLXXLYXPXXF]
Betel D. et al., (2007) PLoS Computational Biology 3(9): e182


Learning the motifs
PPI sources: BIND, DIP, IntAct, MINT
[Figure: RhoGAP domains shown with associated peptide sequences: YVMTVF, YVPTVF, QEDYCRLCPL, YVPTVY, YIPSVF, YVPTVF, WVPTVF, QEDGDTLRGL]


Gibbs Sampling
• Gibbs sampling is a Markov Chain Monte Carlo (MCMC) algorithm that samples from an unknown joint distribution using the known conditional probabilities.
• Commonly used in bioinformatics for motif discovery, such as finding transcription factor binding sites.
• Provides a mechanism for incorporating prior knowledge about the length and composition of the motif.
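The core idea, drawing from a joint distribution by repeatedly sampling each variable from its conditional, can be shown on a much simpler target than a sequence motif. The sketch below deliberately swaps motif discovery for a standard bivariate normal with correlation `rho`, chosen only because its conditionals are known in closed form; the function names and parameter defaults are assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sample (x, y) from a standard bivariate normal with correlation rho,
    using only the conditionals x | y ~ N(rho * y, 1 - rho**2) and
    y | x ~ N(rho * x, 1 - rho**2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # update x given the current y
        y = rng.gauss(rho * x, sd)  # update y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples


def correlation(samples):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples)
    vx = sum((x - mx) ** 2 for x, _ in samples)
    vy = sum((y - my) ** 2 for _, y in samples)
    return cov / math.sqrt(vx * vy)


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
estimated_rho = correlation(samples)
```

In motif discovery the variables being resampled are the motif positions in each sequence rather than two real numbers, but the alternating update pattern is the same.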


Converting a motif to a profile
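A common way to turn aligned motif instances into a profile is a position-specific log-odds matrix. The sketch below is one such construction, not necessarily the one used in the talk: it assumes a uniform amino acid background and a pseudocount of 1, and the function names are hypothetical. The five 6-residue peptides are those that appear in the text of the "Learning the motifs" slide, used here only as example input.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Position-specific log-odds scores from equal-length motif instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in instances)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, peptide):
    """Sum of per-position log-odds scores for a peptide of the profile's length."""
    return sum(column[aa] for column, aa in zip(profile, peptide))


def best_match(profile, sequence):
    """Highest-scoring window of the profile's length in a longer sequence."""
    width = len(profile)
    windows = [sequence[i:i + width] for i in range(len(sequence) - width + 1)]
    return max(windows, key=lambda w: score(profile, w))


instances = ["YVMTVF", "YVPTVF", "YVPTVY", "YIPSVF", "WVPTVF"]
profile = build_profile(instances)
```

Scanning a candidate partner with `best_match` is the kind of step that the prediction and evaluation applications of binding profiles rely on.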


Applications of domain binding profiles
• Predicting new interactions – predict interacting proteins based on the domain composition of given proteins.
• Evaluating existing interactions – identify interactions within affinity pull-down complexes.


Verified Predictions
Betel D. et al., (2007) PLoS Computational Biology 3(9): e182
