25.03.2013

Gene ontology & hypergeometric test



Simon Rasmussen

CBS - DTU


The DNA Microarray Analysis Pipeline
[Figure: flow chart of the analysis pipeline. Boxes shown: Question/hypothesis, Experimental design, Array design, Probe design, Buy standard chip/array, Sample preparation, Hybridization, Image analysis, Expression index calculation, Normalization, Comparable gene expression data, Statistical analysis, Fit to model (time series), Advanced data analysis (Clustering, PCA, Gene annotation analysis, Promoter analysis, Classification, Meta analysis, Survival analysis, Regulatory network).]


Gene Ontology
• Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism
• Very useful for interpreting the biological function of genes found in microarray data
• Organized as 3 independent ontologies, each with a hierarchical (tree-like) structure
  – Molecular function (MF), Biological process (BP), Cellular component (CC)


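Tree structure
• Controlled vocabulary of networked terms (~25,000 in total)
  – Terms are linked by parent/child relationships into a tree-like hierarchy; formally a directed acyclic graph, because a term can have more than one parent
  – Terms become more specific as you move down the network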


Relationship
• A gene can be
  – present in any of the ontologies (MF / BP / CC)
  – a member of several GO terms
• True path rule
  – If a gene is a member of a term, it is also a member of that term's parents (and, recursively, of all its ancestors)
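The true path rule can be sketched as a graph traversal: store each term's parents, collect every ancestor of a directly annotated term, and add those ancestors to the gene's annotation set. This is a minimal sketch under stated assumptions: the term hierarchy, the gene names `GENE1`/`GENE2` and the helpers `ancestors` and `propagate` are hypothetical illustrations, not real GO data or the API of any GO library.

```python
from collections import deque

# Hypothetical toy fragment of a GO-like hierarchy (child term -> parent terms).
# Names are illustrative only, not real GO identifiers.
# "DNA repair" has two parents, as terms in the real ontology may have.
PARENTS = {
    "DNA replication": ["DNA metabolic process"],
    "DNA repair": ["DNA metabolic process", "response to stress"],
    "DNA metabolic process": ["nucleic acid metabolic process"],
    "nucleic acid metabolic process": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "response to stress": ["biological_process"],
    "biological_process": [],  # root of the BP ontology
}


def ancestors(term, parents):
    """Return every ancestor of `term` (not including `term`) via breadth-first search."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents):
    """Apply the true path rule: a gene in a term is also in all of that term's ancestors."""
    full = {}
    for gene, terms in direct_annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full


# Hypothetical direct (most specific) annotations.
direct = {"GENE1": {"DNA replication"}, "GENE2": {"DNA repair"}}
full = propagate(direct, PARENTS)
for gene in sorted(full):
    print(gene, sorted(full[gene]))
```

GENE2 inherits terms through both parents of "DNA repair", which is exactly where a plain tree would be too restrictive.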


GO Tree example
• Visit www.geneontology.org for more information


KEGG
• KEGG PATHWAYS:
  – Manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks for a large selection of organisms
    • 1. Metabolism
    • 2. Genetic Information Processing
    • 3. Environmental Information Processing
    • 4. Cellular Processes
    • 5. Human Diseases
    • 6. Drug Development
Another pathway database: Reactome


KEGG example


Using Gene Ontology
• Input: any list of genes from a microarray experiment, e.g.
  – a cluster of genes with similar expression
  – up- or down-regulated genes
• Question we ask:
  – Are any GO terms overrepresented in the gene list, compared to what would be expected by chance?
• Method
  – Hypergeometric testing
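Putting these bullets together, a GO enrichment run loops over GO terms and asks, for each term, how surprising its overlap with the gene list is, using the hypergeometric upper tail as the method. This is a minimal sketch under assumptions: the function names, the toy 20-gene universe and the terms `term A`/`term B` are hypothetical, each term's gene set is assumed to already follow the true path rule, and the p-values are reported per term as they are, with no adjustment for testing many terms.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are in the GO term."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def go_enrichment(gene_list, universe, term_to_genes):
    """Return (term, k, K, p) per GO term, sorted by p-value (most overrepresented first)."""
    selected = set(gene_list) & universe      # list genes that are also in the background
    N, n = len(universe), len(selected)
    results = []
    for term, genes in term_to_genes.items():
        annotated = genes & universe          # background genes annotated to this term
        K = len(annotated)
        k = len(annotated & selected)         # overlap between the term and the gene list
        results.append((term, k, K, hypergeom_sf(k, N, K, n)))
    return sorted(results, key=lambda row: row[3])


# Hypothetical toy data: 20 genes on the array, two GO terms, 5 genes in our list.
universe = {f"g{i}" for i in range(1, 21)}
term_to_genes = {
    "term A": {f"g{i}" for i in range(1, 6)},    # 5 genes
    "term B": {f"g{i}" for i in range(6, 16)},   # 10 genes
}
gene_list = ["g1", "g2", "g3", "g4", "g6"]

for term, k, K, p in go_enrichment(gene_list, universe, term_to_genes):
    print(f"{term}: {k} of {K} term genes in list, p = {p:.3g}")
```

Term A, with 4 of its 5 genes in a 5-gene list, comes out far more significant than term B, which overlaps the list by a single gene.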


Hypergeometric test
• The hypergeometric distribution arises from sampling without replacement from a fixed, finite population.
Urn example: 20 white balls out of 100 balls; we draw 10 balls.
• We want to calculate the probability of drawing 7 or more white balls out of 10, given the distribution of balls in the urn
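For an urn with $N$ balls of which $K$ are white, the number of white balls $X$ in $n$ draws without replacement is hypergeometric, and the quantity asked for is its upper tail:

$$P(X \ge k) = \sum_{i=k}^{\min(K,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

The Python sketch below evaluates this for the urn on the slide ($N = 100$, $K = 20$, $n = 10$, $k = 7$) using only the standard library (`math.comb`, Python 3.8+); the helper name `hypergeom_pmf` is a hypothetical illustration, not a library API.

```python
from math import comb


def hypergeom_pmf(i, N, K, n):
    """P(X = i): exactly i white balls in n draws from N balls, K of them white."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


N, K, n = 100, 20, 10    # urn from the slide: 100 balls, 20 white, draw 10
k = 7                    # "7 or more" white balls

# Upper tail: sum the probabilities of drawing 7, 8, 9 or 10 white balls.
p_tail = sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

# Sanity check: the probabilities of all possible outcomes sum to 1.
total = sum(hypergeom_pmf(i, N, K, n) for i in range(0, min(K, n) + 1))

print(f"P(X >= {k}) = {p_tail:.3g}")   # about 3.9e-4
print(f"sum of all probabilities = {total:.6f}")
```

The expected number of white balls is only $10 \times 20/100 = 2$, so drawing 7 or more is unlikely under random sampling.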


Example
• List of 80 significant genes from a microarray experiment of yeast (~6000 genes)
• 10 of the 80 genes are in the BP GO term "DNA replication"
  – The total number of yeast genes in this GO term is 100
• What is the probability of this occurring by chance?
Urn analogy: 100 white balls (genes in the GO term) out of 6000 balls (all yeast genes); we draw 80 balls (the significant genes) and get 10 white and 70 other balls.
Probability of drawing 10 or more white balls: $p = P(X \ge 10) = 6.6 \times 10^{-7}$
The GO term DNA replication is overrepresented in our list.
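The same upper-tail calculation reproduces the number for the yeast example. In urn terms, $N = 6000$ yeast genes, $K = 100$ genes in the GO term, $n = 80$ significant genes drawn and $k = 10$ of them in the term. A minimal standard-library sketch; the helper name `hypergeom_sf` is hypothetical, and 6000 is the slide's approximate genome size, so the result is only as exact as that figure.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) of the hypergeometric distribution (exact integer arithmetic)."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)   # one final division of two exact integers


N = 6000   # yeast genes on the array (approximate, from the slide)
K = 100    # yeast genes annotated to the BP term "DNA replication"
n = 80     # significant genes in our list
k = 10     # significant genes annotated to "DNA replication"

expected = n * K / N            # genes expected in the term by chance
p = hypergeom_sf(k, N, K, n)
print(f"expected {expected:.2f}, observed {k}, p = {p:.2g}")
```

This gives p ≈ 6.6 × 10^-7, with only about 1.3 genes expected in the term by chance against the 10 observed.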


Fisher exact test
• List of 80 significant genes from a microarray experiment of yeast (~6000 genes)
• 10 of the 80 genes are in the BP GO term "DNA replication"
  – The total number of yeast genes in this GO term is 100

General 2×2 table (n = total number of genes):
             -GO    +GO
  Non-sig.    a      b      a+b
  Signif.     c      d      c+d
             a+c    b+d     n

Yeast example:
             -GO    +GO
  Non-sig.   5830     90    5920
  Signif.      70     10      80
             5900    100    6000

Fisher exact test p-value: $6.6 \times 10^{-7}$
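With all row and column totals of the 2×2 table held fixed, the count $d$ (significant genes in the GO term) follows the same hypergeometric distribution as in the urn picture. The one-sided Fisher exact test for overrepresentation is therefore the hypergeometric upper tail $P(D \ge d)$, which is why it reproduces the hypergeometric p-value. A minimal standard-library sketch using the slide's table layout; the function name `fisher_exact_greater` is hypothetical. If SciPy is installed, `scipy.stats.fisher_exact` with `alternative="greater"` on `[[a, b], [c, d]]` should serve as a cross-check.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for overrepresentation.

    Table layout (as on the slide):
                  -GO  +GO
        Non-sig.   a    b
        Signif.    c    d
    Returns P(D >= d) with all row and column totals fixed.
    """
    N = a + b + c + d     # all genes
    K = b + d             # column total: genes in the GO term (+GO)
    n = c + d             # row total: significant genes
    hi = min(K, n)        # d cannot exceed either total
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(d, hi + 1))
    return tail / comb(N, n)


a, b, c, d = 5830, 90, 70, 10   # yeast example from the slide
assert (a + b, c + d, a + c, b + d) == (5920, 80, 5900, 100)   # totals as on the slide
p = fisher_exact_greater(a, b, c, d)
print(f"one-sided Fisher exact p = {p:.2g}")
```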


Exercise
http://www.cbs.dtu.dk/chipcourse/Exercises/Ex_GO/GOexercise10.php
