PLoS Genetics


Elucidating Transcriptional Regulation at Multiple Scales Using High-Throughput Sequencing, Data Integration, and Computational Methods

Raymond Auerbach

PhD Candidate, Yale University

Gerstein and Snyder Labs

August 30, 2012

1


Outline

Background

Transcriptional Regulation, ENCODE, ChIP-Seq

Selected Projects from my PhD work

Understanding the technical aspects of ChIP-Seq scoring and how choice of reference sample matters

Using high-throughput sequencing to gain a genome-wide view of chromatin remodeling (SWI/SNF complex)

Understanding the effects of long-range interactions and genome folding on transcription

CAPE: a tool to classify features by RNAPII binding and gene expression

2


Transcriptional Regulation: A Cartoon View

[Figure labels: TF combinations; DNA folding; site-specific binding; histone modifications; chromatin remodeling]

Holstege and Young, PNAS, 1999

Credit: Adam Steinberg

3


ENCODE Data Description

The ENCODE Project Consortium, PLoS Biology, 2011

4


ChIP-Seq

5


Early ChIP-Seq Questions

How should peaks be identified?

Which peaks are significant?

The ChIP peaks seem obvious, so why not score against randomized background?

Are controls or references needed? What biases are present?

Does the peak calling need to be tuned for different factors and/or organisms?

6


Highlights from Key Papers

Nature Biotechnology, 2009

PNAS, 2009

7


First surprise: Input DNA has structure

The input DNA profile itself shows peaks and is not “flat”

The Pol2 antibody is exceptional. For ChIP with a “typical” antibody, the input DNA peaks could affect the ability to call significant peaks

8


Origin of Input DNA from Nuclear Lysate

1. Reverse cross-links

2. Phenol-chloroform extract

3. Purify DNA

4. Size select DNA

5. Ligate Illumina adapters

What happens if we change some of these variables?

9


Hypothesis and Strategy

Initial hypothesis: Input DNA peaks will be highest in regions of open chromatin

Input DNA peaks also seen in other genomes

Strategy

Using ChIP-Seq experiments in HeLa S3 and yeast, score all tracks and aggregate signal over interesting features
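The "aggregate signal over interesting features" step can be sketched roughly as follows. This is a minimal illustration, not the analysis code behind the talk: it assumes a single chromosome, ignores strand, and the function name, `flank`, and `bin_size` are hypothetical choices made for the example.

```python
from bisect import bisect_left


def aggregate_signal(read_positions, feature_positions, flank=1000, bin_size=100):
    """Average read count per bin in windows centred on each feature (e.g. a TSS).

    Assumptions for this sketch: one chromosome, unstranded features,
    and reads represented by a single coordinate each.
    """
    n_bins = (2 * flank) // bin_size
    totals = [0] * n_bins
    reads = sorted(read_positions)
    for center in feature_positions:
        start = center - flank
        # Reads falling in the half-open window [center - flank, center + flank)
        lo = bisect_left(reads, start)
        hi = bisect_left(reads, center + flank)
        for pos in reads[lo:hi]:
            totals[(pos - start) // bin_size] += 1
    n = len(feature_positions)
    return [t / n for t in totals] if n else [0.0] * n_bins


profile = aggregate_signal([950, 1000, 1010, 1990], [1000, 2000], flank=100, bin_size=50)
print(profile)
```

Running the same function on each track (Pol II, input DNA, IgG, and so on) with the same feature list would give profiles that can be compared on one plot.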

10


Reference Types We Examined

Input DNA

ChIP DNA that is not IP’ed with an antibody

MNase-digested DNA

Use MNase to cleave DNA instead of sonication

IgG (non-specific antibody)

ChIP DNA IP’ed with a non-specific antibody

Naked DNA

Sonicated DNA. Not crosslinked or IP’ed. Proteins removed.
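To make concrete why the choice of reference matters, here is a small sketch of scoring one window of ChIP signal against a reference track. It uses a generic depth-scaled fold enrichment with a pseudocount, which is an assumption for illustration and not the scoring method used in the papers cited in this talk; all names and parameters are hypothetical.

```python
def fold_enrichment(chip_count, ref_count, chip_total, ref_total, pseudocount=1.0):
    """Depth-normalised ChIP/reference ratio for one genomic window.

    chip_total and ref_total are the library sizes (total mapped reads);
    the reference count is scaled up or down to the ChIP sequencing depth.
    """
    scale = chip_total / ref_total
    return (chip_count + pseudocount) / (ref_count * scale + pseudocount)


# The same ChIP window scored against two hypothetical references:
# a "flat" reference and one that has a peak of its own in this window.
print(fold_enrichment(40, 2, 1_000_000, 1_000_000))
print(fold_enrichment(40, 20, 1_000_000, 1_000_000))
```

A reference that itself has signal in the window lowers the score, so peaks in the reference directly change which ChIP peaks look significant.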

11


Both Size Selection and Crosslinking are Necessary

Auerbach and Euskirchen et al., PNAS, 2009

12


Aggregation Plot

Expressed Genes (TSS)

[Figure: aggregated signal around the TSS of expressed genes. Tracks: Pol II; Input DNA 100-350 bp; Input DNA 350-500 bp; Naked DNA; IgG; MNase; Mappability]

Input DNA enriched 4x over background!

13


Regions Associated with Active Transcription

Input DNA 100-350 bp

Input DNA 350-500 bp

Auerbach and Euskirchen, et al. PNAS, 2009.

14


Regions Associated with Transcriptional Inactivity

Input DNA 100-350 bp

Input DNA 350-500 bp

Auerbach and Euskirchen, et al. PNAS, 2009.

15


What are the peaks?

16


Summary and Bioinformatics Contributions

First comprehensive analysis of the effect of ChIP-Seq reference DNAs on peak scoring

Led to the choice of a preferred reference by our lab for ENCODE Consortium work (IgG)

Integration of various data sets with reference DNA types to gain a greater understanding of scoring biases

Useful for detecting accessible chromatin regions, particularly as a first pass

17


Generalized Peak Caller

18


Considerations with Early Peak Callers

Usually designed around ChIP with an ideal antibody

Also usually targeted toward one organism

Default parameters typically arise from choices of the experimental collaborator

How do peak callers work with more typical antibodies? How about with members of a protein complex?

19


ChIP-Seq of a Large Chromatin Remodeling Complex (SWI/SNF)

Paper: Euskirchen and Auerbach, et al., PLoS Genetics, 2011

20


Chromatin Remodeling: Why You Should Care

Can change whether a region is accessible to TFs and other proteins

Quick way to regulate regions that are actively transcribed

Zofall et al., Nature Structural & Molecular Biology, 2006

21


Chromatin Remodelers and Epigenetics

de la Serna et al., Nature Reviews Genetics, 2006

22


Role in Cancer

SWI/SNF subunit | Cancer | Mutation Type | Reference
Ini1 | malignant rhabdoid tumors | truncating mutations | (1998) Nature 394: 203; (2006) Mod. Pathol. 19: 717
BAF250A/ARID1A | ovarian clear cell carcinomas | somatically acquired, inactivating mutations | (2010) Science 330: 228; (2010) N. Engl. J. Med. 363: 1532
BAF250A/ARID1A | transitional cell carcinoma of the bladder | somatic, non-silent mutations | (2011) Nat. Genet. 43: 875
BAF200 | hepatitis C virus-associated hepatocellular carcinomas | somatic, inactivating mutations | (2011) Nat. Genet. 43: 828
BAF180 | clear cell renal carcinomas | somatic, inactivating mutations | (2011) Nature 469: 539
Brg1 & Brm | non-small cell lung carcinomas | unknown; based on negative staining of tissue | (2003) Cancer Res. 63: 560
Brg1 | lung cancer cell lines, esp. non-small cell lung cancers | inactivating mutations | (2008) Hum. Mutat. 29: 617
BAF250A/ARID1, Brg1 & BAF180 | pancreatic cancers | various (nonsense, missense, indel, frameshift, rearrangement, splice site) | (2012) PNAS 109: E252
Brd7 | breast cancer | multi-gene deletion | (2010) Nature Cell Biol. 12: 380-389

23


SWI/SNF Has 288 Subunit Combinations!

[Figure: SWI/SNF subunit diagram; ARID (1a or 1b or 2)]

24


Project Overview

Analysis Questions

Where does SWI/SNF bind and in what configurations?

What other elements are associated with SWI/SNF binding sites?

Functional implications (pathway analysis, etc.)

Experimental Procedure

ChIP-Seq against Brg1, BAF155, BAF170, and Ini1 in HeLa S3 cells

Mass spectrometry to inventory co-immunoprecipitating proteins

25


Features We Integrated

Feature | Platform | Source
Ini1 | Sequencing | Euskirchen and Auerbach et al., 2011
Brg1 | Sequencing | Euskirchen and Auerbach et al., 2011
BAF155 | Sequencing | Euskirchen and Auerbach et al., 2011
BAF170 | Sequencing | Euskirchen and Auerbach et al., 2011
RNA Polymerase II | Sequencing | Rozowsky et al., 2009
IgG Control | Sequencing | Auerbach and Euskirchen et al., 2009
Lamin A/C | Array | Euskirchen and Auerbach et al., 2011
Lamin B | Array | Euskirchen and Auerbach et al., 2011
H3K27me3 | Sequencing | Cuddapah et al., 2009
CTCF | Sequencing | Cuddapah et al., 2009
Predicted enhancers | Array | Heintzman et al., 2009
RNA Polymerase III | Sequencing | Oler et al., 2010; Barski et al., 2010
RNA-Seq | Sequencing | Morin et al., 2008
Non-canonical small RNAs | Sequencing | Affymetrix and CSHL ENCODE Transcription Project, 2009
DNA replication origins | Array | Cadoret et al., 2008

26


How to Combine Data

27


Subunit Breakdown from ChIP-Seq

Subunit | Number in 49,555 union regions
Ini1 | 24,478 (49%)
BAF155 | 37,921 (77%)
BAF170 | 25,433 (51%)
Brg1 | 12,317 (25%)

SWI/SNF Subunit Combinations | Total Observed
SWI/SNF high-confidence union set | 49,555
Two or more subunits | 30,310
Three or more subunits | 15,535
Core set: Ini1, BAF155, and BAF170 (may include Brg1) | 9,760
Ini1, BAF155, BAF170, and Brg1 | 4,750
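One way to picture how union regions and subunit combinations can be tallied is the sketch below. It is an illustration under simplifying assumptions (one chromosome, peaks as closed `(start, end)` intervals, any overlap or shared endpoint merges two peaks) and is not the procedure used in the paper; the function names and toy coordinates are hypothetical.

```python
def union_regions(peaks_by_subunit):
    """Merge overlapping peaks from all subunits into union regions.

    Returns a list of (start, end, subunits) tuples, where subunits is the
    set of subunit names contributing at least one peak to that region.
    """
    intervals = sorted(
        (start, end, name)
        for name, peaks in peaks_by_subunit.items()
        for start, end in peaks
    )
    merged = []
    for start, end, name in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the current union region: extend it and record the subunit
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(name)
        else:
            merged.append([start, end, {name}])
    return [(s, e, frozenset(names)) for s, e, names in merged]


def count_with_at_least(regions, k):
    """Number of union regions bound by k or more distinct subunits."""
    return sum(1 for _, _, names in regions if len(names) >= k)


toy_peaks = {
    "Ini1": [(100, 200), (500, 600)],
    "BAF155": [(150, 250), (900, 1000)],
    "BAF170": [(180, 220)],
    "Brg1": [(550, 650)],
}
regions = union_regions(toy_peaks)
print(len(regions), count_with_at_least(regions, 2), count_with_at_least(regions, 3))
```

On real data the same counts would correspond to the "union set", "two or more subunits", and "three or more subunits" rows above.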

28


SWI/SNF Co-occurrences

Category | SWI/SNF Union Set (49,555 regions) | SWI/SNF Core Set (9,760 regions)
CTCF, Pol II, enhancers, 5’ ends (any combination) | 44,755 (90%) | 8,968 (92%)
Unclassified | 4,800 (10%) | 792 (8%)
RNA Pol II Sites | 19,669 (40%) | 6,562 (67%)
Putative Enhancers | 21,228 (43%) | 3,431 (35%)
CTCF Sites | 8,542 (17%) | 1,692 (17%)
5’ ends of Ensembl protein-coding genes (within 2.5 kb) | 14,291 (29%) | 4,089 (42%)

29


Association of Subunit Combinations with Transcription Levels

Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

30


Pathway Analysis

Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

31


Overrepresented GO Categories (Mass Spectrometry)

Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

32


Summary and Bioinformatics Contributions

Different peak scoring criteria for ubiquitous factors

Inferring information about a complex given ChIP-Seq from subunits

Overall, SWI/SNF binds broadly but is enriched at 5’ ends and at genes associated with the cell cycle, DNA repair, and cancer.

33


SWI/SNF and DNA Looping

Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

CIITA locus (~150 kb)

34


Exploring Transcription, DNA Folding, and Nuclear Organization in a Multidimensional Context

Paper: Li, Ruan, Auerbach, and Sandhu, et al., Cell, 2012

35


ChIA-PET

Chromatin Interaction Analysis by Paired End diTag Sequencing

Collaboration with Stanford and Genome Institute of Singapore

In addition to the ChIA-PET method, FISH, qPCR, enhancer assays, and other methods were used for validation.

Question: How does transcriptional regulation work in 3-D space on an intrachromosomal level?

36


So How Does ChIA-PET Work?

(Cliffs Notes Version)

[Figure: ChIP-Seq compared with ChIA-PET. Labels: DNA 1, Linker, DNA 2]
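Because each ChIA-PET read pair ties together two DNA fragments (DNA 1 and DNA 2) joined through a linker, a simplified downstream step is to count how many pairs connect each pair of binding regions. The sketch below assumes one chromosome, predefined anchor regions, and already-filtered pairs; it is an illustration only, not the pipeline from the Cell paper, and the names are hypothetical.

```python
from collections import Counter


def count_interactions(pets, anchors):
    """Count paired-end tags (PETs) whose two ends fall in two different anchors.

    pets:    list of (pos1, pos2) coordinates, one pair per PET
    anchors: list of half-open (start, end) regions, e.g. binding sites
    Returns a Counter keyed by (anchor_index_a, anchor_index_b) with a < b.
    """
    def find_anchor(pos):
        # Linear scan keeps the sketch short; an interval index would scale better
        for idx, (start, end) in enumerate(anchors):
            if start <= pos < end:
                return idx
        return None

    counts = Counter()
    for left, right in pets:
        a, b = find_anchor(left), find_anchor(right)
        if a is not None and b is not None and a != b:
            counts[(min(a, b), max(a, b))] += 1
    return counts


toy_anchors = [(0, 100), (1000, 1100), (5000, 5100)]
toy_pets = [(10, 1050), (20, 1010), (1090, 50), (30, 40), (60, 5050), (200, 1000)]
interactions = count_interactions(toy_pets, toy_anchors)
print(interactions)
```

Pairs with both ends in the same anchor, or with an end outside every anchor, are ignored here; anchor pairs supported by several PETs are the candidate long-range interactions.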

37


The Textbook Version

38


The Textbook Version

39


First Goal - Transcription Factories

Sutherland and Bickmore. Nature Reviews Genetics, 2009.

40


Second Goal - Formation of Protein Complexes

PJ Farnham, Nature Reviews Genetics. 2009.

41


Models of Transcription

Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

42


Gene Expression Characteristics

Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

43


Binding of Different TFs Across Models

Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

44


IRS1 and T2D: Long Range Interactions and Disease

Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

45


ChIA-PET Conclusions

Active regions are connected to other active regions

Some factors are present at promoters while others are brought in by long-range interactions (LRI)

Most interactions follow the basal promoter model, but most genes are involved in multigene complexes

Long-range interactions and their role in disease

46


Bioinformatics Contributions

Integration of LRI data with ChIP-Seq, RNA-Seq, etc., to look at transcription as a system

Basis for future studies in how various protein complexes are formed in vivo

47


CAPE - Coupled Analysis of Polymerase and Expression

48


Combining RNAPII ChIP-Seq with RNA-Seq

A “natural experiment” for transcription analysis

Simple to generate and yields a lot of information

Many paired datasets available in public repositories

Can identify transcripts with unexpected relationships between binding and expression

Compare to other organisms/samples/conditions
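The idea of pairing RNAPII binding with expression can be illustrated with a toy classifier that sorts features into four categories using two cutoffs. This is a hypothetical sketch and not CAPE's actual algorithm, categories, or defaults; the cutoff values, category labels, and gene names are invented for the example.

```python
def classify_feature(binding, expression, binding_cutoff=1.0, expression_cutoff=1.0):
    """Assign one feature to a category from its RNAPII binding and expression levels.

    The cutoffs are arbitrary placeholders; real data would need
    thresholds chosen for the organism and normalisation in use.
    """
    bound = binding >= binding_cutoff
    expressed = expression >= expression_cutoff
    if bound and expressed:
        return "bound/expressed"
    if bound:
        return "bound/not expressed"
    if expressed:
        return "not bound/expressed"
    return "not bound/not expressed"


def classify_all(features, binding_cutoff=1.0, expression_cutoff=1.0):
    """Classify a dict of {feature_name: (binding, expression)}."""
    return {
        name: classify_feature(b, e, binding_cutoff, expression_cutoff)
        for name, (b, e) in features.items()
    }


toy_features = {
    "geneA": (5.0, 10.0),
    "geneB": (5.0, 0.1),
    "geneC": (0.2, 8.0),
    "geneD": (0.0, 0.0),
}
categories = classify_all(toy_features)
for name, category in categories.items():
    print(name, category)
```

The two off-diagonal categories (bound but not expressed, expressed but not bound) are the "unexpected relationships" worth a closer look, and comparing category assignments across samples or organisms follows naturally from the same output.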

49


CAPE Summary

Publicly available tool designed to categorize features based on expression & RNAPII binding

Open-source and multiplatform (Java)

Designed to work on diverse sets of genomes out of the box, but also allows for parameter customization

Useful for comparative genomics (e.g. modENCODE)

Two modules: CAPE-analyze and CAPE-compare

50


CAPE: Coupled Analysis of Polymerase Binding and Expression

Auerbach et al. In revision.

51


Sample CAPE-analyze Output

52


Sample CAPE-compare Output (raw)

53


Sample CAPE-compare Output (HTML)

54


Sample CAPE-compare Output (Venn)

55


Overall Summary

Technical implications of scoring ChIP-Seq data (PNAS)

Considerations when analyzing data from ChIP-Seq experiments targeted to non-standard transcription factors and protein complexes (PLoS Genetics)

How DNA folding affects how we view transcription and ChIP-Seq data (Cell)

New, robust tool to quickly classify transcripts/genes based on mRNA abundance and RNAPII binding levels

56


Other Work While at Yale

Co-author of 14 peer-reviewed papers while at Yale (4 as primary or starred)

12 published

2 in press

One manuscript being revised for resubmission

57


Acknowledgements

58


Questions?

59
