Absolute quantification of somatic DNA alterations in ... - Nature


npg © 2012 Nature America, Inc. All rights reserved.

Analysis

[Figure 1 graphic omitted. Panel titles: cancer and normal cells; DNA extraction with loss of information regarding DNA copies per cancer cell; measurement of local relative DNA concentration; segmentation and smoothing of copy number data; precomputed models of recurrent cancer karyotypes; somatic point mutations (optional); interpretation of somatic DNA alterations on an absolute scale (summary histogram, subclonal copy alterations, clonal and subclonal mutations). Axes: genomic position; local relative DNA concentration; allelic fraction; point mutation multiplicity.]

an integrated analysis of point-mutation and copy-number estimates and its application to ovarian carcinoma.

We describe three key mathematical features of ABSOLUTE. First, it jointly estimates tumor purity and ploidy directly from observed relative copy profiles (point mutations may also be used, if available). Second, because joint estimation may not be fully determined on a single sample, it uses a large and diverse sample collection to help resolve ambiguous cases. Third, it attempts to account for subclonal copy-number alterations and point mutations, which are expected in heterogeneous cancer samples.

We apply ABSOLUTE to conduct the first, to our knowledge, large-scale ‘pan-cancer’ analysis of copy-number alterations on an absolute basis, across 3,155 cancer samples, representing 25 diseases with at least 20 samples each. The analysis reveals that whole-genome doubling events occur frequently during tumorigenesis, ultimately resulting in mature cancers descended from doubled cells bearing complex karyotypes. Despite evidence that genome doublings can result in genetic instability and accelerate oncogenesis (refs. 13,25,26), the incidence and timing of such events had not been broadly characterized in human cancer.

[Figure 1 graphic omitted. Panel titles: candidate interpretations of copy profile; relationship to purity, ploidy; model-based evaluation. Axes: percent; fraction cancer nuclei versus ploidy (2n to 10n); absolute somatic copy numbers; log-likelihood (SCNAs, karyotype, total); allelic copy (high, balanced, low).]

Figure 1 Overview of tumor DNA analysis using ABSOLUTE. (a) A constant mass of DNA is extracted from a heterogeneous cell population consisting of cancer and normal cells. This DNA is profiled using either microarray or massively parallel sequencing technology, giving a genome-wide profile of DNA concentrations. (b) Genome-wide view of homologous copy ratios for a lung adenocarcinoma tumor sample processed using ABSOLUTE. The copy ratios for both homologous chromosomes are shown for each genomic segment with locally constant copy number. Color axis indicates distance between low (blue) and high (red) homologue concentration; segments where these are similar (allelic balance) are purple. (c) Homologous copy-ratio histogram. Copy ratios shown in b were binned at 0.04 resolution (y axis); the length of each block corresponds to the (haploid) genomic fraction (x axis) of each corresponding segment in b. Several discrete SCNA peaks are visible, each corresponding either to an (unknown) integer copy state in the somatic clone or to a subclonal alteration. (d) To aid in the interpretation of potentially ambiguous data, ABSOLUTE uses precomputed statistical models of recurrent cancer karyotypes (left, Online Methods). Optionally, if somatic point mutation data are available (from sequencing of the DNA), then the allelic fractions (fraction of sequencing reads bearing the nonreference allele) of these mutations may be used to help interpret the DNA concentrations. (e) Three potential interpretations of the copy-ratio histogram (b) in terms of absolute copy numbers. Horizontal dotted lines indicate the copy ratios corresponding to the indicated absolute somatic copy numbers. (f) Purity (fraction of tumor nuclei) and cancer-genome ploidy values corresponding to each interpretation in (e). Dotted lines denote potential solutions that share either $b$, the copy ratio associated with zero somatic copies (from upper left to lower right), or $\delta_\tau$, the spacing between consecutive integer copy levels (from lower left to upper right). Candidate solutions lie on the indicated grid of $b = 2(1 - \alpha)/D$ and $\delta_\tau = \alpha/D$ (equation (1)). (g) The log-likelihood (score) of each solution in terms of the SCNA fit of the observed copy ratios to integer absolute copy numbers and plausibility of the proposed karyotype. The highest-scoring solution (green) is identified by the combination of SCNA-fit and karyotype log-likelihood values. This interpretation implies subclonal gain of chromosome 2 (e, arrow). The SCNA score alone cannot distinguish between this and an additional solution (blue), in which the arrowed region is closer to an integer copy state, but the overall SCNA-fit score is equivalent to that of the first solution. (h) Interpretation of somatic DNA alterations on an absolute scale. Modeled SCNA copy states are shown (left). In addition, allelic fractions may be reinterpreted as average allelic copies per cancer cell (multiplicity), potentially revealing subclonal point mutations (right).
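The grid relations quoted in the caption of Figure 1f can be inverted to recover purity and ploidy from a candidate pair of b and δ_τ. The Python sketch below is a minimal illustration of that arithmetic, not the ABSOLUTE implementation. It assumes that D is the average ploidy of the tumor/normal mixture, D = ατ + 2(1 − α), with α the purity and τ the cancer-genome ploidy; that definition belongs to equation (1), which lies outside this excerpt. All function names are hypothetical.

```python
def copy_ratio_grid(alpha, tau):
    """Return (b, delta_tau) for purity alpha and cancer-genome ploidy tau.

    Assumption: D = alpha * tau + 2 * (1 - alpha), the average ploidy of
    the tumor/normal mixture (taken to be the D of equation (1)).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("purity alpha must be in (0, 1]")
    d = alpha * tau + 2.0 * (1.0 - alpha)
    b = 2.0 * (1.0 - alpha) / d   # copy ratio at zero somatic copies
    delta_tau = alpha / d         # spacing between integer copy levels
    return b, delta_tau


def purity_ploidy(b, delta_tau):
    """Invert the grid: recover (alpha, tau) from (b, delta_tau)."""
    if delta_tau <= 0.0 or b < 0.0:
        raise ValueError("need delta_tau > 0 and b >= 0")
    # b / delta_tau = 2 * (1 - alpha) / alpha, solved for alpha
    alpha = 2.0 * delta_tau / (b + 2.0 * delta_tau)
    d = alpha / delta_tau
    tau = (d - 2.0 * (1.0 - alpha)) / alpha
    return alpha, tau


def expected_copy_ratio(q, b, delta_tau):
    """Copy ratio expected for a segment at integer somatic copy number q."""
    return b + q * delta_tau
```

For example, under these assumptions a purity of 0.5 and a ploidy of 4 give b = 1/3 and δ_τ = 1/6, and a segment carrying four somatic copies sits at a copy ratio of 1.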

We then describe how estimates of tumor purity and absolute copy number allow us to analyze allelic-fraction values (the fraction of non-reference sequencing reads supporting a mutation) to distinguish clonal and subclonal point mutations, and to detect macroscopic subclonal structure in an ovarian cancer sample. Clonal events may be classified as homozygous or heterozygous in the cancer cells, guiding interpretation of their function. In addition, the ability to quantify integer multiplicity of point mutations aids in the relative timing of segmental DNA copy-number gains, as multiplicity values of greater than one imply that the point mutation preceded copy gain of the locus. Controlling for tumor purity and local copy number allows such timings to be calculated more generally than in the special case of copy-neutral loss of heterozygosity (ref. 27). Finally, our data allow characterization of somatic cancer evolution with respect to whole-genome doubling, which we demonstrate in ovarian carcinoma and associate with clinicopathological values.
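As a rough illustration of how purity and absolute copy number turn an allelic fraction into a multiplicity, the Python sketch below assumes a simple mixing model: a mutation present on s of the q copies at its locus in every cancer cell, with normal cells contributing two unmutated copies, is expected at allelic fraction f = αs / (αq + 2(1 − α)). This model, the function names and the tolerance used to label a mutation subclonal are assumptions made for the example. They are not taken from the text above, and they ignore sampling noise in read counts.

```python
def multiplicity(allelic_fraction, purity, local_copy_number):
    """Average mutated copies per cancer cell implied by an allelic fraction.

    Assumption: f = purity * s / (purity * q + 2 * (1 - purity)), where q is
    the absolute somatic copy number at the locus (illustrative model only).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    total = purity * local_copy_number + 2.0 * (1.0 - purity)
    return allelic_fraction * total / purity


def describe(mult, tolerance=0.25):
    """Crude labels; the tolerance is an arbitrary choice for illustration."""
    if mult < 1.0 - tolerance:
        return "subclonal"
    if mult > 1.0 + tolerance:
        return "clonal, multiplicity > 1 (mutation preceded copy gain)"
    return "clonal, single copy"


# Hypothetical inputs: purity 0.6, locus at 3 somatic copies, mutation on 2.
f = 0.6 * 2 / (0.6 * 3 + 2 * (1 - 0.6))
print(describe(multiplicity(f, 0.6, 3)))
```

A real analysis would model read-count uncertainty rather than apply a fixed cutoff to a point estimate.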

RESULTS

Inference of sample purity and ploidy in cancer-derived DNA

A conceptual overview of ABSOLUTE is shown in Figure 1. When DNA is extracted from a mixed population of cancer and normal cells, the

414 VOLUME 30 NUMBER 5 MAY 2012 nature biotechnology
