Use Case 13 - Genboree
Introduction to the Fourth Epigenome Informatics Workshop
NIH Roadmap Epigenomics Data Analysis and Coordination Center
May 17-18, 2012
Houston, Texas
Workshop Objective:
Catalyze Conversion of Epigenomic Profiling Data into Biological Insights through Integrative Analysis
• Introduction to Epigenomics, Epigenome Informatics
• Methods (Assays, Data Processing)
• Standards (Metadata, Interoperability)
• Data Resources (Human Epigenome Atlas)
• Epigenomic Tools (Genboree Workbench Use Cases)
• Collaborative Opportunities / Networking / Exchange of Experience
Thursday, May 17th, 2012
Session 1
9:00 – 9:45 am Introduction to the Workshop, Epigenome Analysis & Genboree – Matt Roth
Session 2
9:45 – 10:45 am Use case preparation: setting up projects, databases, and groups; accessing files; user privileges; navigating Genboree; toolsets; submitting jobs; etc. (BRL staff)
10:45 – 11:00 am Break
11:00 am – 1:00 pm Hands-on Case Studies: Epigenomic variation between tissues (Part 1 of 3) (BRL staff)
1:00 – 2:00 pm Lunch
Session 3
2:00 – 3:30 pm Hands-on Case Studies: Epigenomic variation between individuals (Part 2 of 3)
3:30 – 5:00 pm Hands-on Case Studies: Epigenomic variation in cancer (Part 3 of 3)
6:00 pm Depart for dinner together (place TBD) or dinner on your own
Friday, May 18th, 2012
8:30 – 9:00 am Continental breakfast (outside auditorium)
Session 4
9:00 – 9:15 am Review of Day 1 and preview of Day 2 – Matt Roth
9:15 – 10:00 am Quantitative profiling of histone modifications, peak calling and segmentation of epigenomic signals, ChIP-Seq, RNA-Seq – Cristi Coarfa (EDACC)
10:00 am – 12:00 pm Hands-on Case Study: ChIP-Seq analysis (BRL staff)
12:00 – 1:00 pm Boxed lunch (outside auditorium)
Session 5
1:00 – 2:30 pm Hands-on Case Study: RNA-Seq analysis (BRL staff)
2:30 – 3:00 pm Hands-on Case Study: Virtual data integration using Genboree (BRL staff)
3:00 – 4:30 pm Hands-on: semi-structured time for completing data analysis/case studies, informal discussions, and feedback from users on practical changes that can improve tools (all)
4:30 – 5:00 pm Open discussion and wrap-up
Workshop Participants:
Brief introductions to facilitate networking
Your goals in attending the workshop
Bioinformatics Research Laboratory (BRL)
Govind Kunde, MS (Rob Waterland Lab)
Viren Amin, Graduate Student
NIH Roadmap Epigenomics: Reference Epigenomes and the Human Epigenome Atlas
NIH Roadmap Epigenomics Data Analysis and Coordination Center (EDACC)
May 17-18, 2012
Houston, Texas
NIH Roadmap Epigenomics Project
Hypothesis:
The origins of health and susceptibility to disease are, in part, the result of epigenetic regulation.
Goal:
Transform biomedical research by
• Developing comprehensive reference epigenome maps
• Developing new technologies for epigenomic analyses, including cyberinfrastructure for epigenomic research
NIH Roadmap Epigenomics Project: Data Flow
Quarterly cumulative releases of the Human Epigenome Atlas (Human Epigenome Atlas Release 6 completed)
Epigenomics Portal at NCBI
Human Epigenome Browser at Wash U
Epigenomic Metadata Standards (www.ihec-epigenomes.org)
Epigenomic Metadata (2010)
Alan Harris
DNA Methylation
• Occurs at the C5 position of cytosines, primarily in CpGs, but also in non-CpG contexts, mainly in embryonic stem cells
• CpG Islands
• Regulation of cellular processes
– Transcription
– Defense against endogenous retroviruses
– Embryonic development
– X chromosome inactivation
– Imprinting
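The slide lists CpG islands without defining them. As a hedged illustration, the sketch below applies the classic Gardiner-Garden and Frommer (1987) criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). These thresholds and the function names are our assumptions rather than anything stated in the workshop material, and genome-scale island callers add sliding windows and further refinements.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")   # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n         # CpGs expected if C and G occurred independently
    obs_exp = observed / expected if expected else 0.0
    return (c + g) / n, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Assumed Gardiner-Garden & Frommer thresholds; not taken from the slides."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


# Usage
print(cpg_stats("CG" * 100))       # (1.0, 2.0)
print(is_cpg_island("CG" * 100))   # True
print(is_cpg_island("AT" * 100))   # False
```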
DNA Methylation Methods
• Bisulfite-based Methods
– MethylC-seq: whole-genome shotgun bisulfite sequencing
– RRBS: Reduced Representation Bisulfite Sequencing
• MspI digestion
• Enrichment-based Methods
– MeDIP-seq: Methylated DNA Immunoprecipitation
• 5-methylcytosine antibody
– MBD-seq: Methyl-Binding Domain
• MBD2 protein methyl-CpG binding domain
– MRE-seq: Methylation-sensitive Restriction Enzyme
• Parallel HpaII, Hin6I, and AciI digestion
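RRBS and MRE-seq both depend on where specific restriction enzymes cut. The sketch below lists recognition sites on the top strand, assuming the standard recognition sequences (MspI and HpaII: CCGG; Hin6I: GCGC; AciI: CCGC, which appears as GCGG on the top strand when the site lies on the bottom strand). MspI cuts CCGG regardless of CpG methylation, whereas the three MRE-seq enzymes are blocked by it, so MRE-seq fragments mark unmethylated sites. The dictionary and function names are illustrative, not part of any workshop tool.

```python
import re

# Recognition sequences read 5'->3' on the top strand (assumed standard sites).
# AciI (CCGC) is not palindromic, so bottom-strand sites show up as GCGG.
RECOGNITION_SITES = {
    "MspI": ["CCGG"],          # cuts regardless of CpG methylation (RRBS)
    "HpaII": ["CCGG"],         # blocked by CpG methylation (MRE-seq)
    "Hin6I": ["GCGC"],         # blocked by CpG methylation (MRE-seq)
    "AciI": ["CCGC", "GCGG"],  # blocked by CpG methylation (MRE-seq)
}


def find_sites(seq, enzyme):
    """Return sorted 0-based start positions of an enzyme's recognition sites."""
    seq = seq.upper()
    positions = set()
    for motif in RECOGNITION_SITES[enzyme]:
        # A zero-width lookahead reports overlapping matches as well
        for match in re.finditer(f"(?={motif})", seq):
            positions.add(match.start())
    return sorted(positions)


# Usage
seq = "AACCGGTTGCGCAACCGCAAGCGG"
print(find_sites(seq, "MspI"))   # [2]
print(find_sites(seq, "Hin6I"))  # [8]
print(find_sites(seq, "AciI"))   # [14, 20]
```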
Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
Methylation Data Analysis Software
BISMARK: Supports both single-end and paired-end reads. Uses the Bowtie aligner.
PASH 3.0: Handles methylation and SNPs. Uses low-memory, high-speed alignment.
BSMAP: Maps both single-end and paired-end reads. Uses the SOAP aligner.
Methylcoder: Maps both single-end and paired-end reads. Also handles color-space reads (SOLiD).
BS-Seq: Uses a Gaussian mixture model (GMM) to estimate the probability of A vs. G vs. C vs. T. The GMM is available only for the Arabidopsis genome.
BRAT: Maps both single-end and paired-end reads. Trims low-quality bases. Improves unique mapping for paired-end reads.
Kismeth: Web-based tool designed for plant methylation data.
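Aligners such as BISMARK handle bisulfite reads by comparing read and reference in a reduced three-letter alphabet and then calling methylation from C/T differences at reference cytosines. The sketch below shows that idea for the top strand only; it is not the code of any listed tool, and the function names are hypothetical. BISMARK also indexes a G-to-A converted genome to cover the reverse strand, which is omitted here.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite treatment of the top strand: unmethylated C reads as T."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def c_to_t(seq):
    """Three-letter conversion applied to read and reference before alignment."""
    return seq.upper().replace("C", "T")


def call_methylation(reference, read, offset):
    """Map each reference C covered by the aligned read to True (read shows C,
    methylated) or False (read shows T, unmethylated). 0-based positions."""
    calls = {}
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if reference[pos].upper() == "C" and base in "CT":
            calls[pos] = base == "C"
    return calls


# Usage: only the CpG cytosine at position 1 is methylated
reference = "ACGTTCGACCA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                                # ACGTTTGATTA
print(c_to_t(read) == c_to_t(reference))   # True: the read aligns in C->T space
print(call_methylation(reference, read, 0))  # {1: True, 5: False, 8: False, 9: False}
```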
Genboree Workbench (cont’d)
Screenshot callouts:
• Various data types (tracks, files, ROIs, etc.)
• Tells the tool to use this data/file
• Tells the tool where to deposit results
• Specific information on files/samples selected in the “Data Selector”
Epigenome Atlas Release 5: over 1,500 experiments
www.epigenomeatlas.org
Biology Across Key Genetic Elements (promoters, exons, UTRs, etc.): Many ROI (i.e., annotation) Tracks Available
[Figure: ROI track along a gene model: promoter, 5’ UTR, exon 1, exon 2, 3’ UTR, introns]
Data Tracks from Epigenomic Experiments Projected onto ROI (i.e., annotation) Tracks
[Figure: a data track projected onto an ROI track of promoters LCP 1, LCP 2, LCP 3]
Promoters: LCP = low-CpG promoter (as defined in Weber et al., Nature Genetics, 2007)
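Projecting a data track onto an ROI track reduces each ROI to a single score. The slides do not say how Genboree aggregates scores, so the sketch below assumes an overlap-length-weighted mean of the data-track intervals (bedGraph-style, 0-based half-open coordinates) that intersect each ROI. The function name is hypothetical, and the nested loop only suits small inputs; a sorted sweep or an interval tree scales better.

```python
def project_scores(rois, data_track):
    """rois: [(chrom, start, end)]; data_track: [(chrom, start, end, score)].
    Returns one value per ROI: the overlap-weighted mean score, or None when
    no data interval overlaps the ROI."""
    results = []
    for chrom, start, end in rois:
        weighted_sum = 0.0
        covered_bp = 0
        for d_chrom, d_start, d_end, score in data_track:
            if d_chrom != chrom:
                continue
            overlap = min(end, d_end) - max(start, d_start)
            if overlap > 0:
                weighted_sum += score * overlap
                covered_bp += overlap
        results.append(weighted_sum / covered_bp if covered_bp else None)
    return results


# Usage with made-up coordinates and scores
rois = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
data = [("chr1", 50, 150, 2.0), ("chr1", 150, 250, 4.0), ("chr2", 100, 200, 1.0)]
print(project_scores(rois, data))  # [3.0, None, None]
```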
Compute the Pearson Correlation Coefficient Between Experiments: the Similarity Matrix Is Output as a Heatmap
[Figure: LCP 1, LCP 2, and LCP 3 scores for Cell line 1 and Cell line 2 are correlated; heatmap rows and columns are Cell line 1, Cell line 2, and Cell line 3 (not shown)]
R Development Core Team (2005). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
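Once each experiment is reduced to one score per ROI, the similarity matrix is the pairwise Pearson correlation of those score vectors. A minimal sketch, assuming equal-length vectors ordered by the same ROIs; the function names and values are illustrative. The slide cites R for this step, where `cor()` applied to a matrix computes the same pairwise correlations between columns.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def similarity_matrix(profiles):
    """profiles: {experiment: [score per ROI]} -> (names, square matrix of r)."""
    names = list(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Usage with made-up LCP scores (LCP 1, LCP 2, LCP 3)
profiles = {
    "Cell line 1": [1.0, 2.0, 3.0],
    "Cell line 2": [2.0, 4.0, 6.0],
    "Cell line 3": [3.0, 2.0, 1.0],
}
names, matrix = similarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(r, 3) for r in row])
# Cell line 1 [1.0, 1.0, -1.0]
# Cell line 2 [1.0, 1.0, -1.0]
# Cell line 3 [-1.0, -1.0, 1.0]
```

Each row of the matrix becomes one row of the heatmap.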
Viewing Selections
Including More Genes in the Same Pathway
The data for several of the workshop Use Cases were kindly provided by Dr. Jonathan Mill (King’s College London, UK) and are under review for publication:
"Tissue-specific epigenetic variation across brain and blood: functional annotation of the human brain methylome". Matthew Davies¹, Manuela Volta¹, Abhishek Dixit¹, Simon Lovestone¹, Cristian Coarfa², R. Alan Harris², Aleksandar Milosavljevic², Claire Troakes¹, Safa Al-Sarraj¹, Richard Dobson¹, Leonard C. Schalkwyk¹, Jonathan Mill¹*
¹Institute of Psychiatry, King’s College London, UK. ²Baylor College of Medicine, Houston, Texas, USA. *Corresponding author: Dr. Jonathan Mill; address: Institute of Psychiatry, SGDP Centre, De Crespigny Park, Denmark Hill, London.
Because the paper (Davies et al.) is under review, it cannot be shared with anyone outside the workshop at this time, and we ask that you consider the data confidential. We will notify you when the data can be shared. Thank you for your understanding.
Use Case 1: Genome-wide Patterns of Methylation Can Distinguish Between Blood, Cerebellum, and Cortex
Epigenomic Tracks:
- Blood
- Cerebellum
- Cortex
ROIs
Use Case 2: Breast Cell Types Cluster Based on Their MeDIP-seq Profiles (Epigenome Atlas and UCSF REMC data)
Epigenomic Tracks:
- Breast Luminal Epithelium
- Breast Myoepithelial
- Breast Stem Cell
ROIs
Data from Davies et al. (in review)
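Use Cases 1 and 2 rest on samples clustering by their methylation profiles. The slides do not state the clustering method, so this sketch assumes agglomerative clustering with average linkage and a distance of 1 - r (Pearson). The sample names and values are made up for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters (mean pairwise 1 - r) until n_clusters remain."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair_dists = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            d = sum(pair_dists) / len(pair_dists)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Usage with made-up methylation scores over four ROIs
profiles = {
    "Blood_1": [0.9, 0.1, 0.8, 0.2],
    "Blood_2": [0.85, 0.15, 0.75, 0.25],
    "Cortex_1": [0.1, 0.9, 0.2, 0.8],
    "Cortex_2": [0.2, 0.8, 0.1, 0.9],
}
print(average_linkage(profiles, 2))
# [['Blood_1', 'Blood_2'], ['Cortex_1', 'Cortex_2']]
```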
Use Case 5: Methylation of Some Features Discriminates Tissue Type Better Than Others
Similar to Use Cases 1 and 2, but uses different ROIs to illustrate how different features produce different similarity matrices (heatmaps).
Epigenomic Tracks:
- Blood
- Cerebellum
- Cortex
ROIs
Data from Davies et al. (in review)
Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types
Epigenomic Tracks:
- H1 cell line
- IMR90 cell line
Collates score tracks into one data matrix and exports it to Excel:
- Column headers = experiments
- Rows = ROIs
Data from Davies et al. (in review)
[Figure: scatter plots]
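The collation step in Use Case 9 amounts to building an ROI-by-experiment matrix and writing it in a spreadsheet-friendly format. A sketch assuming each score track is already a mapping from ROI name to score; the CSV layout (first column ROI, one column per experiment, blank cell for a missing score) is our assumption about a reasonable Excel export, not Genboree's documented format, and the function names are hypothetical.

```python
import csv
import io


def collate(roi_names, tracks):
    """tracks: {experiment: {roi: score}} -> (header, rows) with ROIs as rows."""
    experiments = list(tracks)
    header = ["ROI"] + experiments
    rows = [[roi] + [tracks[exp].get(roi, "") for exp in experiments] for roi in roi_names]
    return header, rows


def to_csv(header, rows):
    """Serialize the matrix as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with made-up scores
tracks = {"H1": {"LCP 1": 0.5, "LCP 2": 0.7}, "IMR90": {"LCP 1": 0.2}}
header, rows = collate(["LCP 1", "LCP 2"], tracks)
print(to_csv(header, rows))
# ROI,H1,IMR90
# LCP 1,0.5,0.2
# LCP 2,0.7,
```

Any two columns of the matrix can then be plotted against each other as a scatter plot.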
Use Case 12: Assess Breast Cancer Cell Type of Origin
“Your” Epigenomic Tracks:
- Breast Luminal Epithelium
- Breast Myoepithelial
- Breast Stem Cell
Public Epigenomic Tracks:
- Breast Luminal Epithelium
- Breast Myoepithelial
- Breast Stem Cell
Data from Davies et al. (in review)
LIMMA: Smyth, G. K. Statistical Applications in Genetics and Molecular Biology (2005)
Use Case 13: Analysis of Epigenomic Variation in Breast Tumors
Use Case 13a: Cluster all 16 breast tissue samples
16 450K samples (Dedeurwaerder, S. et al., 2011)
- 8 normal breast samples
- 8 cancerous breast samples
ROIs (HCP vs. LCP; HCP = high-CpG promoters, LCP = low-CpG promoters)
Data from Davies et al. (in review)
[Figure: LCP and HCP heatmaps, each with samples labeled Normal and Tumor]
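Use Case 13a splits promoters into HCP and LCP sets. Weber et al. (2007) define these classes with 500-bp windows; the thresholds below (HCP: some window with observed/expected CpG ratio above 0.75 and GC content above 55%; LCP: no window with a ratio above 0.48; ICP otherwise) are given as we recall them from that paper and should be checked against it. The function names are hypothetical.

```python
def gc_and_obs_exp(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    c, g = window.count("C"), window.count("G")
    n = len(window)
    expected = c * g / n
    obs_exp = window.count("CG") / expected if expected else 0.0
    return (c + g) / n, obs_exp


def classify_promoter(seq, window=500, step=1):
    """Label a promoter sequence HCP, ICP, or LCP using sliding windows.
    Thresholds are assumed from Weber et al. (2007); verify before use."""
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    max_obs_exp = 0.0
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = gc_and_obs_exp(seq[start:start + window])
        if obs_exp > 0.75 and gc > 0.55:
            return "HCP"  # at least one window is both CpG-rich and GC-rich
        max_obs_exp = max(max_obs_exp, obs_exp)
    return "LCP" if max_obs_exp <= 0.48 else "ICP"


# Usage with synthetic 500-bp sequences
print(classify_promoter("CG" * 250))    # HCP
print(classify_promoter("A" * 500))     # LCP
print(classify_promoter("CGAT" * 125))  # ICP (CpG-rich, but GC content is only 50%)
```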
Use Case 13b: Compare 450K profiles (8 tumor, 8 normal) against reference epigenomes from the Epigenome Atlas
16 450K samples (Dedeurwaerder, S. et al., 2011)
- 8 normal breast samples
- 8 cancerous breast samples
ROIs
Data from Davies et al. (in review)
[Figure: LCP and HCP panels]
Use Case 13c: Because most breast tumor samples appear to contain an excess of blood and immune cells, a comparison of normal and tumor tissue may reveal differentially methylated genes (and corresponding pathways). Identify differentially methylated probes, genes, and pathways using LIMMA and online resources.
16 450K samples (Dedeurwaerder, S. et al., 2011)
- 8 normal breast samples
- 8 cancerous breast samples
ROIs
Gene List
Data from Davies et al. (in review)
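LIMMA fits a linear model per probe and moderates the variance estimates with an empirical Bayes step. As a simpler stand-in, explicitly not LIMMA, the sketch below ranks probes by an ordinary Welch t-statistic on beta values and keeps those whose mean difference passes a delta-beta cutoff. The 0.2 cutoff, the probe IDs, and the function names are hypothetical, and no p-values or multiple-testing correction are computed.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b); each group needs >= 2 values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_probes(normal, tumor, min_delta_beta=0.2):
    """normal/tumor: {probe: [beta per sample]}. Returns (probe, delta_beta, t)
    for probes with |delta_beta| >= min_delta_beta, sorted by |t| descending."""
    results = []
    for probe, normal_betas in normal.items():
        tumor_betas = tumor[probe]
        delta = mean(tumor_betas) - mean(normal_betas)
        if abs(delta) >= min_delta_beta:
            results.append((probe, delta, welch_t(tumor_betas, normal_betas)))
    results.sort(key=lambda r: abs(r[2]), reverse=True)
    return results


# Usage with made-up beta values (4 samples per group)
normal = {"cg01": [0.10, 0.12, 0.11, 0.09], "cg02": [0.50, 0.52, 0.48, 0.50],
          "cg03": [0.80, 0.70, 0.90, 0.60]}
tumor = {"cg01": [0.70, 0.72, 0.68, 0.75], "cg02": [0.51, 0.49, 0.50, 0.52],
         "cg03": [0.40, 0.20, 0.50, 0.30]}
ranked = rank_probes(normal, tumor)
print([probe for probe, _, _ in ranked])  # ['cg01', 'cg03']
```

The resulting probe list can then be mapped to genes and passed to pathway tools.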
Use Case 14: ChIP-Seq and RNA-Seq Data Analysis
ChIP-Seq input: BED files (H3K4me3, H3K4me1)
MACS results (file in Genboree); visualize in Genboree
Data from Davies et al. (in review)
MACS: Zhang et al., Genome Biology (2008)
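MACS reports peaks as BED intervals, and a common first check is how many peaks fall in an ROI set such as promoters (H3K4me3 is expected to concentrate at active promoters). A minimal sketch, assuming tab-separated BED with at least three columns and 0-based half-open coordinates; the function names and example coordinates are hypothetical.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples; extra columns are ignored."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def fraction_overlapping(peaks, rois):
    """Fraction of peaks overlapping at least one ROI (half-open intervals)."""
    if not peaks:
        return 0.0
    hits = sum(
        any(p_chrom == r_chrom and p_start < r_end and r_start < p_end
            for r_chrom, r_start, r_end in rois)
        for p_chrom, p_start, p_end in peaks
    )
    return hits / len(peaks)


# Usage with made-up peaks and promoter ROIs
bed_text = (
    "track name=H3K4me3_peaks\n"
    "chr1\t100\t200\tpeak1\t50\n"
    "chr1\t1000\t1100\tpeak2\t30\n"
    "chr2\t10\t60\tpeak3\t20\n"
)
peaks = parse_bed(bed_text)
promoters = [("chr1", 150, 300), ("chr2", 0, 20)]
print(round(fraction_overlapping(peaks, promoters), 3))  # 0.667
```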
RNA-Seq input: FASTQ, BAM files
Gene expression differences (file in Genboree); visualization/pathway analysis
Data from Davies et al. (in review)
Trapnell et al., Bioinformatics (2009)
Trapnell et al., Nature Biotechnology (2010)
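Cufflinks (Trapnell et al., 2010) reports expression as FPKM, fragments per kilobase of transcript per million mapped fragments. The sketch shows the basic formula, FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments), plus a log2 fold change with a pseudocount. Cufflinks itself uses effective transcript lengths and isoform-level estimation, so this is a simplification; the pseudocount of 1 and the function names are assumptions.

```python
import math


def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2((B + pseudocount) / (A + pseudocount)); the pseudocount avoids log(0)."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Usage with made-up counts
print(fpkm(500, 2000, 10_000_000))   # 25.0
print(log2_fold_change(3.0, 15.0))   # 2.0
```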
Workshop Evaluation
[Figure: Epigenomic profiling → integrative analysis → biological insights]
• Introduction to Workshop, Epigenome Informatics, Genboree
• Methods (Assays, Data Processing)
• Standards (Metadata, Interoperability)
• Data Resources (Human Epigenome Atlas)
• Tools (Epigenomic Toolset, Genboree Workbench)
• Use Cases / Case Studies
• Collaborative Opportunities / Networking / Exchange of Experience
Acknowledgments
• BRL
– Aleksandar Milosavljevic
– Cristian Coarfa, R. Alan Harris
– Elke Norwig-Eastaugh
– Viren Amin
• BRL core
– Matt Roth, Kevin Riehle
• Genboree programming team
– Andrew Jackson, Sameer Paithankar, Sriram Raghuram
• Former BRL members
– Chia-Chin Wu, Arpit Tandon
• Govind Kunde (Waterland lab)
• Peak Calling
– Anshul Kundaje
– Bob Thurman (UW)
– Noam Shoresh (BI)
– Martin Hirst (BCGSC)
– Lee Daniels (NIH)
– Wei Li (BCM)
• RNA-Seq
– Adrian Lee, Joe Gray
– Laising Yen, Kalpana Kannan
– Sean McGuire, Christine Pichot