Gene ontology & hypergeometric test
<strong>Gene</strong> <strong>ontology</strong> &<br />
<strong>hypergeometric</strong> <strong>test</strong><br />
Simon Rasmussen<br />
CBS - DTU
The DNA Microarray Analysis Pipeline<br />
[Figure: flowchart of the pipeline. Boxes: Question/hypothesis; Experimental Design; Array design; Probe design; Buy standard Chip / Array; Sample Preparation; Hybridization; Image analysis; Normalization; Expression Index Calculation; Comparable Gene Expression Data; Statistical Analysis; Fit to Model (time series); Advanced Data Analysis (Clustering, PCA, Gene Annotation Analysis, Promoter Analysis, Classification, Meta analysis, Survival analysis, Regulatory Network)]
<strong>Gene</strong> Ontology<br />
• <strong>Gene</strong> Ontology (GO) is a collection of controlled<br />
vocabularies describing the biology of a gene<br />
product in any organism<br />
• Very useful for interpreting biological function of<br />
microarray data<br />
• Organized in 3 independent ontologies, each with a<br />
tree-like structure<br />
– Molecular function (MF), Biological process (BP),<br />
Cellular component (CC)
Tree structure<br />
• Controlled, networked terms (~25,000 in total)<br />
– Parent / child network organized like a tree (strictly a<br />
directed acyclic graph: a term can have several parents)<br />
– Terms get more detailed as you move down<br />
the network
Relationship<br />
• A gene can be<br />
– present in any of the ontologies (MF / BP /<br />
CC)<br />
– a member of several GO terms<br />
• True path rule<br />
– If a gene is a member of a term, it is also a<br />
member of that term's parents
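The true path rule can be sketched in Python as below. This is a minimal illustration, not how any GO tool implements it: the `PARENTS` mapping is a hypothetical toy hierarchy (the term names are placeholders, not real GO identifiers), and `ancestors` and `propagate` are names made up for this sketch.

```python
# Toy parent map (hypothetical hierarchy, not real GO identifiers).
PARENTS = {
    "DNA replication": ["DNA metabolic process"],
    "DNA metabolic process": ["metabolic process"],
    "metabolic process": ["biological process"],
    "biological process": [],
}

def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(direct_terms, parents=PARENTS):
    """True path rule: a gene annotated to a term also belongs to all of its ancestors."""
    full = set(direct_terms)
    for term in direct_terms:
        full |= ancestors(term, parents)
    return full

# A gene annotated only to the most specific term ends up in all four terms.
print(sorted(propagate({"DNA replication"})))
```

Because a term may have several parents, the ancestors are collected with a graph traversal and a `seen` set rather than by walking a single chain upwards.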
GO Tree example<br />
• Visit www.geneontology.org for more information
KEGG<br />
• KEGG PATHWAYS:<br />
– Manually drawn pathway maps representing our<br />
knowledge on the molecular interaction and reaction<br />
networks, for a large selection of organisms<br />
• 1. Metabolism<br />
• 2. <strong>Gene</strong>tic Information Processing<br />
• 3. Environmental Information Processing<br />
• 4. Cellular Processes<br />
• 5. Human Diseases<br />
• 6. Drug Development<br />
Other pathway database: Reactome
KEGG example
Using <strong>Gene</strong> <strong>ontology</strong><br />
• Input: any list of genes, e.g. from a microarray experiment<br />
– Cluster of genes with similar expression<br />
– Up/down regulated genes<br />
• Question we ask:<br />
– Are any GO terms overrepresented in the gene list,<br />
compared to what would happen by chance?<br />
• Method<br />
– Hypergeometric <strong>test</strong>ing
Hypergeometric <strong>test</strong><br />
• The hypergeometric distribution arises from<br />
sampling without replacement from a fixed population.<br />
[Figure: urn with 100 balls, 20 of them white; 10 balls are drawn]<br />
• We want to calculate the probability of drawing 7 or<br />
more white balls in 10 draws, given the<br />
composition of the urn
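The urn question can be computed directly from binomial coefficients. The sketch below uses only the Python standard library; the function name `hypergeom_upper_tail` and its parameter names are chosen for this illustration and do not come from the slides.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n balls are drawn without replacement
    from an urn of N balls, K of which are white."""
    total = comb(N, n)  # number of ways to draw n balls from N
    upper = min(K, n)   # cannot draw more white balls than exist or than are drawn
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Urn from the slide: 100 balls, 20 white, 10 drawn, 7 or more white.
p = hypergeom_upper_tail(k=7, N=100, K=20, n=10)
print(f"P(7 or more white balls) = {p:.3g}")
```

Each term of the sum counts the draws with exactly `i` white balls and `n - i` non-white balls, so the sum over `i >= k` is the upper tail.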
Example<br />
• List of 80 significant genes from a microarray<br />
experiment on yeast (~6000 genes)<br />
• 10 of the 80 genes are in the BP GO term: DNA replication<br />
– Total number of yeast genes in the GO term is 100<br />
• What is the probability of this occurring by chance?<br />
[Figure: urn with 6000 balls, 100 of them white; 80 balls are drawn: 10 white, 70 non-white]<br />
p = 6.6 × 10^-7<br />
The GO term DNA replication is overrepresented in our list
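Written out for this example, and assuming the p-value is the upper tail (10 or more annotated genes among the 80), the probability takes the form below. The symbols are introduced here and are not on the slide: N = 6000 genes in total, K = 100 genes in the GO term, n = 80 genes in the list, and X the number of annotated genes in the list.

```latex
P(X \ge 10)
  \;=\; \sum_{k=10}^{80} \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}}
  \;=\; \sum_{k=10}^{80} \frac{\binom{100}{k}\,\binom{5900}{80-k}}{\binom{6000}{80}}
```

For comparison, the expected number of annotated genes in a random list of 80 is nK/N = 80 · 100 / 6000 ≈ 1.3, well below the 10 observed.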
Fisher exact <strong>test</strong><br />
• List of 80 significant genes from a microarray experiment of<br />
yeast (~ 6000 genes)<br />
• 10 of the 80 genes are in BP-GO term: DNA replication<br />
– Total number of yeast genes in the GO term is 100<br />
Contingency table layout:<br />
          -GO    +GO    Total<br />
Non-sig.  a      b      a+b<br />
Signif.   c      d      c+d<br />
Total     a+c    b+d    n<br />
With the counts from the example:<br />
          -GO    +GO    Total<br />
Non-sig.  5830   90     5920<br />
Signif.   70     10     80<br />
Total     5900   100    6000<br />
Fisher exact test:<br />
p-value: 6.6 × 10^-7
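For a 2x2 table, the one-sided Fisher exact test is the same calculation as the hypergeometric upper tail. The sketch below makes that link explicit using only the standard library. It assumes the one-sided (enrichment) alternative, which the slide does not state, and `fisher_one_sided` is a name chosen for this illustration.

```python
from math import comb

def fisher_one_sided(table):
    """One-sided (enrichment) Fisher exact p-value for a 2x2 table.

    table = [[a, b], [c, d]] with rows (non-significant, significant)
    and columns (-GO, +GO), matching the layout on the slide.
    """
    (a, b), (c, d) = table
    N = a + b + c + d  # all genes
    K = b + d          # genes annotated to the GO term
    n = c + d          # significant genes
    # Sum the hypergeometric probabilities of seeing d or more
    # annotated genes among the n significant ones.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(d, min(K, n) + 1))
    return tail / comb(N, n)

# Counts from the slide: 10 of 80 significant genes are in the GO term.
p = fisher_one_sided([[5830, 90], [70, 10]])
print(f"one-sided p-value = {p:.2g}")
```

Whether the tail includes the observed count (d or more, as here) or starts one above it changes the result, so the convention is worth stating whenever a p-value is reported.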
Exercise<br />
http://www.cbs.dtu.dk/chipcourse/Exercises/Ex_GO/GOexercise10.php