
Use Case 13 - Genboree


Introduction to the Fourth Epigenome<br />

Informatics Workshop<br />

NIH Roadmap<br />

Epigenomics Data Analysis and Coordination Center<br />

May 17-18, 2012<br />

Houston, Texas


Workshop Objective:<br />

Catalyze Conversion of Epigenomic Profiling Data<br />

into Biological Insights through Integrative Analysis<br />

• Introduction to Epigenomics, Epigenome Informatics<br />

• Methods (Assays, Data Processing)<br />

• Standards (Metadata, Interoperability)<br />

• Data Resources (Human Epigenome Atlas)<br />

• Epigenomic Tools (Genboree Workbench Use Cases)<br />

• Collaborative Opportunities / Networking / Exchange of Experience<br />



Thursday, May 17th 2012<br />

Session 1<br />

9:00 – 9:45 am Introduction to the Workshop, Epigenome Analysis & Genboree –<br />

Matt Roth<br />

Session 2<br />

9:45 – 10:45 am Use case preparation – Setting up projects, databases, groups,<br />

accessing files, user privileges, navigating Genboree, toolsets, submitting jobs,<br />

etc. (BRL staff)<br />

10:45 – 11:00 am Break<br />

11:00 – 1:00 pm Hands-on Case Studies: Epigenomic variation between tissues<br />

(Part 1 of 3) (BRL staff)<br />

1:00 – 2:00 pm Lunch<br />



Thursday, May 17th, 2012<br />

Session 3<br />

2:00 – 3:30 pm Hands-on Case Studies: Epigenomic variation between<br />

individuals (Part 2 of 3)<br />

3:30 – 5:00 pm Hands-on Case Studies: Epigenomic variation in cancer<br />

(Part 3 of 3)<br />

6:00 pm Depart for dinner together (place TBD) or dinner on your own<br />

Friday, May 18th, 2012<br />

8:30 – 9:00 am Continental breakfast (outside auditorium)<br />

Session 4<br />

9:00 – 9:15 am Review of Day 1 and preview of Day 2 – Matt Roth<br />


9:15 – 10:00 am Quantitative profiling of histone modifications, peak calling and<br />

segmentation of epigenomic signals, ChIP-Seq, RNA-Seq –<br />

Cristi Coarfa (EDACC)<br />

10:00 – 12:00 pm Hands-on Case Study: ChIP-Seq analysis (BRL staff)<br />

12:00 – 1:00 pm Boxed lunch (outside auditorium)<br />

Session 5<br />

1:00 – 2:30 pm Hands-on Case Study: RNA-Seq analysis (BRL staff)<br />

2:30 – 3:00 pm Hands-on Case Study: Virtual data integration using Genboree<br />

(BRL staff)<br />

3:00 – 4:30 pm Hands-on: semi-structured time for completing data<br />

analysis/case studies, informal discussions, and feedback from<br />

users on practical changes that can improve tools (all)<br />

4:30 – 5:00 pm Open discussion and wrap-up<br />



Workshop Participants:<br />

Brief introductions to facilitate networking<br />

Your goals in attending workshop<br />



Bioinformatics Research Laboratory (BRL)<br />

Govind Kunde, MS<br />

Rob Waterland Lab<br />

Viren Amin<br />

Graduate Student<br />



NIH Roadmap Epigenomics:<br />

Reference Epigenomes<br />

and the Human Epigenome Atlas<br />



NIH Roadmap Epigenomics Project<br />

Hypothesis:<br />

Origins of health and susceptibility to disease are, in part,<br />

the result of epigenetic regulation<br />

Goal:<br />

Transform biomedical research by<br />

• Developing comprehensive reference epigenome maps<br />

• Developing new technologies for epigenomic analyses<br />

Including cyberinfrastructure for epigenomic research


NIH Roadmap Epigenomics<br />

Project: Data Flow<br />

Quarterly cumulative releases of the Human Epigenome Atlas<br />

(Human Epigenome Atlas Release 6 completed)


Epigenomics Portal at NCBI


Human Epigenome Browser at Wash U


Epigenomic MetaData Standards<br />

www.ihec-epigenomes.org


Epigenomic Metadata


(2010)<br />

Alan Harris


DNA Methylation<br />

• C5 position of cytosines, primarily in CpGs but also in non-CpGs, primarily in embryonic stem cells<br />

• CpG Islands<br />

• Regulation of cellular processes<br />

– Transcription<br />

– Defense against endogenous retroviruses<br />

– Embryonic development<br />

– X chromosome inactivation<br />

– Imprinting


DNA Methylation Methods<br />

• Bisulfite-based Methods<br />

– MethylC-seq: whole genome shotgun bisulfite sequencing<br />

– RRBS: Reduced Representation Bisulfite Sequencing<br />

• MspI digestion<br />

• Enrichment-based Methods<br />

– MeDIP-seq: Methylated DNA Immunoprecipitation<br />

• 5-methylcytosine antibody<br />

– MBD-seq: Methyl-Binding Domain<br />

• MBD2 protein methyl-CpG binding domain<br />

– MRE-seq: Methylation-sensitive Restriction Enzyme<br />

• Parallel HpaII, Hin6I, and AciI digestion
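As a rough illustration of what the bisulfite-based methods above measure: bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C. The methylation level at a site can then be estimated as C / (C + T) over the reads covering it. The sketch below is not taken from any of the tools in this workshop; the function name, site labels, and read counts are hypothetical.

```python
def methylation_level(c_reads: int, t_reads: int) -> float:
    """Estimate the methylated fraction at one cytosine from bisulfite reads.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return c_reads / total


# Hypothetical counts for three CpG sites: (site label, C reads, T reads)
sites = [("cpg_1", 18, 2), ("cpg_2", 5, 15), ("cpg_3", 0, 10)]
levels = {name: methylation_level(c, t) for name, c, t in sites}
```

Enrichment-based methods (MeDIP-seq, MBD-seq) do not give per-cytosine counts of this kind, so this calculation applies only to the bisulfite-based assays.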


Chromatin ImmunoPrecipitation<br />

Sequencing – ChIP-Seq


Methylation Data Analysis Software<br />

Software: Features<br />

BISMARK: Supports both single-end and paired-end reads. Uses the bowtie aligner.<br />

PASH 3.0: Methylation and SNPs. Low-memory, high-speed alignment.<br />

BSMAP: Maps both single-end and paired-end reads. Uses the SOAP aligner.<br />

Methylcoder: Maps both single-end and paired-end reads. Also handles color space reads (SOLiD).<br />

BS-Seq: Uses a Gaussian mixture model (GMM) to identify the probability of A vs G vs C vs T. GMM available only for the Arabidopsis genome.<br />

BRAT: Maps both single-end and paired-end reads. Trims low-quality bases. Improves unique mapping for paired-end reads.<br />

Kismeth: Web-based tool. Designed for plant methylation data.


Genboree Workbench (cont’d)


Various Data Types (tracks, files, ROIs, etc.)<br />

Tells the tool to use this data/file<br />

Tells the tool where to deposit results<br />

Specific information on files/samples selected in the “Data Selector”


Epigenome Atlas Release 5<br />

over 1500 experiments<br />

www.epigenomeatlas.org<br />



Biology Across Key Genetic Elements (promoters, exons, UTR, etc):<br />

Many ROI (i.e. annotation) Tracks Available<br />

ROI tracks available<br />

Promoter 5’ UTR Exon 1 Exon 2 3’ UTR<br />

Introns


ROI track<br />

Data Tracks from Epigenomic Experiments Projected<br />

On To ROI (i.e. annotation) Tracks<br />

Data track<br />

LCP 1 LCP 2 LCP 3<br />

Promoters: LCP = Low-CpG promoters (as defined in Weber et al., Nature Genetics (2007))
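Projecting a data track onto an ROI track can be pictured as summarizing the track's scores within each ROI. The sketch below assumes a length-weighted mean over non-overlapping, half-open intervals; this aggregation rule is an assumption for illustration and is not confirmed to be what the Genboree Workbench computes. The function name, coordinates, and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int, float]  # (start, end, score), half-open coordinates


def project_onto_roi(track: List[Interval], roi_start: int, roi_end: int) -> float:
    """Return the length-weighted mean score of `track` within one ROI.

    ROI bases not covered by any track interval are ignored.
    Track intervals are assumed not to overlap each other.
    """
    covered = 0
    weighted_sum = 0.0
    for start, end, score in track:
        overlap = min(end, roi_end) - max(start, roi_start)
        if overlap > 0:
            covered += overlap
            weighted_sum += overlap * score
    if covered == 0:
        return float("nan")  # the ROI has no data in this track
    return weighted_sum / covered


# Hypothetical data track and three hypothetical low-CpG promoter ROIs
track = [(100, 200, 0.8), (200, 300, 0.4), (500, 600, 0.1)]
rois: Dict[str, Tuple[int, int]] = {
    "LCP 1": (150, 250),
    "LCP 2": (280, 320),
    "LCP 3": (500, 600),
}
scores = {name: project_onto_roi(track, s, e) for name, (s, e) in rois.items()}
```

Each experiment then contributes one score per ROI, which is the vector form that comparisons between experiments operate on.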


Compute Pearson Correlation Coefficient Between Experiments:<br />

Similarity Matrix is Output as Heatmap<br />

[Figure: score tracks over LCP 1, LCP 2, LCP 3 for Cell line 1 and Cell line 2; heatmap axes: Cell line 1, Cell line 2, Cell line 3 (not shown)]<br />

R Development Core Team (2005). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
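The similarity matrix behind such a heatmap can be sketched as follows: each experiment is a vector of per-ROI scores, and every pair of experiments gets a Pearson correlation coefficient. This is a minimal standard-library sketch, not the workshop's R-based implementation; the function names and the scores are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def similarity_matrix(experiments: Dict[str, List[float]]) -> List[List[float]]:
    """Pairwise Pearson correlations, rows and columns in insertion order."""
    names = list(experiments)
    return [[pearson(experiments[a], experiments[b]) for b in names] for a in names]


# Hypothetical per-ROI scores (LCP 1, LCP 2, LCP 3) for three cell lines
experiments = {
    "Cell line 1": [0.2, 0.5, 0.9],
    "Cell line 2": [0.3, 0.6, 0.8],
    "Cell line 3": [0.9, 0.4, 0.1],
}
matrix = similarity_matrix(experiments)
```

The matrix is symmetric with ones on the diagonal, so a heatmap of it shows each pair of experiments twice.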


Viewing selections<br />



Including More Genes in the Same<br />

Pathway<br />



The data for several of the workshop Use Cases was kindly provided by Dr. Jonathan Mill (King’s College London, UK), and is under review for this publication:<br />

"Tissue-specific epigenetic variation across brain and blood: functional annotation of the human brain methylome". Matthew Davies¹, Manuela Volta¹, Abhishek Dixit¹, Simon Lovestone¹, Cristian Coarfa², R. Alan Harris², Aleksandar Milosavljevic², Claire Troakes¹, Safa Al-Sarraj¹, Richard Dobson¹, Leonard C. Schalkwyk¹, Jonathan Mill¹*<br />

¹ Institute of Psychiatry, King’s College London, UK. ² Baylor College of Medicine, Houston, Texas, USA. *Corresponding Author: Dr. Jonathan Mill, Address: Institute of Psychiatry, SGDP Centre, De Crespigny Park, Denmark Hill, London.<br />

Since the paper (Davies et al.) is under review, it cannot be shared with anyone outside<br />

of the workshop at this time, and we ask that you consider the data confidential. We will<br />

notify you when the data can be shared. Thank you for your understanding.


Use Case 1: Genome-wide Patterns of Methylation can Distinguish Between Blood, Cerebellum, and Cortex<br />

Epigenomic Tracks:<br />

-Blood<br />

-Cerebellum<br />

-Cortex<br />

Use Case 2: Breast Cell Types Cluster Based on Their MeDIP-seq Profiles (Epigenome Atlas and UCSF REMC data)<br />

Epigenomic Tracks:<br />

-Breast Luminal Epithelium<br />

-Breast Myoepithelial<br />

-Breast Stem Cell<br />

ROIs<br />

ROIs<br />

Data from Davies et al (in review)<br />



Use Case 5: Methylation of some features discriminates tissue type better than others<br />

Similar to Use Cases 1 & 2, but uses different ROIs to illustrate how different features produce different<br />

similarity matrices (heatmaps).<br />

Epigenomic Tracks<br />

-Blood<br />

-Cerebellum<br />

-Cortex<br />

ROIs<br />

Data from Davies et al (in review)<br />



Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types<br />

Epigenomic Tracks:<br />

-H1 cell line<br />

-IMR90 cell line<br />

Collate score tracks into one data matrix and export it to Excel<br />

Column headers = experiments<br />

Rows = ROIs<br />


Scatter plots<br />
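The data matrix described in this use case (column headers = experiments, rows = ROIs) can be sketched as a CSV export, a format that Excel opens directly. This is an illustrative sketch, not the Genboree tool itself; the function name, ROI names, and scores are hypothetical, and only the H1 and IMR90 labels come from the slide.

```python
import csv
import io
from typing import Dict, List


def collate_to_csv(roi_names: List[str], tracks: Dict[str, List[float]]) -> str:
    """Build a ROI-by-experiment matrix and return it as CSV text."""
    for name, scores in tracks.items():
        if len(scores) != len(roi_names):
            raise ValueError(f"track {name!r} does not have one score per ROI")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ROI"] + list(tracks))  # column headers = experiments
    for i, roi in enumerate(roi_names):      # rows = ROIs
        writer.writerow([roi] + [tracks[name][i] for name in tracks])
    return buffer.getvalue()


# Hypothetical ROI names and per-ROI scores for the two cell lines
roi_names = ["ROI_1", "ROI_2"]
tracks = {"H1": [0.25, 0.75], "IMR90": [0.5, 0.125]}
csv_text = collate_to_csv(roi_names, tracks)
```

With the matrix in this shape, a scatter plot of one experiment against another is a plot of one column against another.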



Use Case 12: Assess breast cancer cell type of origin<br />

“Your” Epigenomic Tracks:<br />

-Breast Luminal Epithelium<br />

-Breast Myoepithelial<br />

-Breast Stem Cell<br />

Public Epigenomic Tracks:<br />

-Breast Luminal Epithelium<br />

-Breast Myoepithelial<br />

-Breast Stem Cell<br />


LIMMA: Smyth, G. K. Statistical Applications in Genetics and Molecular Biology (2005)<br />



Use Case 13: Analysis of epigenomic variation in breast tumors<br />

Use Case 13a: Cluster all 16 breast tissue samples<br />

16 450K samples (Dedeurwaerder, S. et al., 2011)<br />

-8 normal breast samples<br />

-8 cancerous breast samples<br />

ROIs (HCP vs LCP)<br />


[Figure: clustering heatmaps for LCP and HCP ROIs; samples labeled Normal and Tumor]


Use Case 13: Analysis of epigenomic variation in breast tumors<br />

Use Case 13b: Compare 450K profiles (8 tumor, 8 normal) against<br />

reference epigenomes from the Epigenome Atlas<br />

16 450K samples (Dedeurwaerder, S. et al., 2011)<br />

-8 normal breast samples<br />

-8 cancerous breast samples<br />

ROIs<br />


[Figure panels: LCP, HCP]


Use Case 13: Analysis of epigenomic variation in breast tumors<br />

Use Case 13c: Since most breast tumor samples appear to contain an excess of blood and immune<br />

cells, comparison of normal and tumor tissue may reveal differentially methylated<br />

genes (and corresponding pathways). Identify differentially methylated probes,<br />

genes, and pathways using LIMMA and online resources.<br />

16 450K samples (Dedeurwaerder et al., 2011)<br />

-8 normal breast samples<br />

-8 cancerous breast samples<br />

ROIs<br />

Gene List<br />
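The idea of ranking probes by tumor-versus-normal difference can be sketched as below. LIMMA is an R/Bioconductor package that fits linear models with moderated t-statistics; the sketch swaps in a plain Welch t-statistic per probe purely for illustration and does not reproduce LIMMA's method or output. The function names, probe IDs, and methylation values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(group_a: List[float], group_b: List[float]) -> float:
    """Welch's t-statistic for the difference in means (a minus b).

    Stand-in for illustration only; LIMMA uses moderated t-statistics.
    """
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_probes(
    tumor: Dict[str, List[float]], normal: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (probe, t) pairs sorted by decreasing absolute t-statistic."""
    stats = [(probe, welch_t(tumor[probe], normal[probe])) for probe in tumor]
    return sorted(stats, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical methylation values (fraction methylated) for three probes,
# four tumor and four normal samples each
tumor = {
    "probe_a": [0.80, 0.85, 0.78, 0.82],
    "probe_b": [0.50, 0.52, 0.48, 0.51],
    "probe_c": [0.20, 0.25, 0.22, 0.18],
}
normal = {
    "probe_a": [0.30, 0.35, 0.28, 0.33],
    "probe_b": [0.49, 0.51, 0.50, 0.52],
    "probe_c": [0.60, 0.55, 0.62, 0.58],
}
ranked = rank_probes(tumor, normal)
```

A positive statistic indicates higher methylation in tumor and a negative one lower methylation. A real analysis would also need multiple-testing correction across probes, which this sketch omits.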



Use Case 14: ChIP-Seq and RNA-Seq Data Analysis<br />

Input: BED files<br />

MACS results (file in Genboree); visualize in Genboree<br />

H3K4me3, H3K4me1<br />

Zhang et al., Genome Biology (2008)


Use Case 14: ChIP-Seq and RNA-Seq Data Analysis<br />

Input: FASTQ, BAM files<br />

Gene expression diffs (file in Genboree)<br />

Visualization/pathway analysis<br />

Trapnell et al., Bioinformatics (2009)<br />

Trapnell et al., Nature Biotech (2010)


Workshop Evaluation (link)<br />

[Diagram: Epigenomic Profiling to Biological Insights through Integrative Analysis (Workshop)]<br />

• Introduction to Workshop, Epigenome Informatics, Genboree<br />

• Methods (Assays, Data Processing)<br />

• Standards (Metadata, Interoperability)<br />

• Data Resources (Human Epigenome Atlas)<br />

• Tools (Epigenomic Toolset, Genboree Workbench)<br />

• Use Cases / Case Studies<br />

• Collaborative Opportunities / Networking / Exchange of Experience<br />



Acknowledgments<br />

• BRL<br />

– Aleksandar Milosavljevic<br />

– Cristian Coarfa, R. Alan Harris<br />

– Elke Norwig-Eastaugh<br />

– Viren Amin<br />

• BRL core<br />

– Matt Roth, Kevin Riehle<br />

• Genboree programming team<br />

– Andrew Jackson, Sameer Paithankar, Sriram Raghuram<br />

• Former BRL members<br />

– Chia-Chin Wu, Arpit Tandon<br />

• Govind Kunde (Waterland lab)<br />

• Peak Calling<br />

– Anshul Kundaje<br />

– Bob Thurman (UW)<br />

– Noam Shoresh (BI)<br />

– Martin Hirst (BCGSC)<br />

– Lee Daniels (NIH)<br />

– Wei Li (BCM)<br />

• RNA-Seq<br />

– Adrian Lee, Joe Gray<br />

– Laising Yen, Kalpana Kannan<br />

– Sean McGuire, Christine Pichot
