Absolute quantification of somatic DNA alterations in ... - Nature
npg © 2012 Nature America, Inc. All rights reserved.
Analysis
[Figure 1 graphic, first part (image not reproduced). Surviving panel labels: cancer and normal cells; DNA extraction with loss of information regarding DNA copies per cancer cell; measurement of local relative DNA concentration; segmentation and smoothing of copy-number data; precomputed models of recurrent cancer karyotypes; somatic point mutations (optional); interpretation of somatic DNA alterations on an absolute scale, with subclonal copy alterations, subclonal mutations and clonal mutations marked. Axes: genomic position, local relative DNA concentration, allelic fraction, point mutation multiplicity.]
an integrated analysis of point-mutation and copy-number estimates
and its application to ovarian carcinoma.
We describe three key mathematical features of ABSOLUTE. First,
it jointly estimates tumor purity and ploidy directly from observed
relative copy profiles (point mutations may also be used, if available).
Second, because joint estimation may not be fully determined on a
single sample, it uses a large and diverse sample collection to help
resolve ambiguous cases. Third, it attempts to account for subclonal
copy-number alterations and point mutations, which are expected in
heterogeneous cancer samples.
We apply ABSOLUTE to conduct the first, to our knowledge, large-scale
‘pan-cancer’ analysis of copy-number alterations on an absolute
basis, across 3,155 cancer samples, representing 25 diseases with at least
20 samples each. The analysis reveals that whole-genome doubling
events occur frequently during tumorigenesis, ultimately resulting in
mature cancers descended from doubled cells bearing complex karyotypes.
Despite evidence that genome doublings can result in genetic
instability and accelerate oncogenesis 13,25,26, the incidence and timing
of such events had not been broadly characterized in human cancer.

[Figure 1 graphic, second part (image not reproduced). Surviving panel labels: candidate interpretations of copy profile; relationship to purity, ploidy; model-based evaluation; SCNAs; karyotype; total. Axes: percent, fraction cancer nuclei, ploidy (2n to 10n), absolute somatic copy numbers, log-likelihood. Color key for allelic copy: high, balanced, low.]

Figure 1 Overview of tumor DNA analysis using ABSOLUTE. (a) A constant
mass of DNA is extracted from a heterogeneous cell population consisting
of cancer and normal cells. This DNA is profiled using either microarray or
massively parallel sequencing technology, giving a genome-wide profile of
DNA concentrations. (b) Genome-wide view of homologous copy ratios for
a lung adenocarcinoma tumor sample processed using ABSOLUTE. The
copy ratios for both homologous chromosomes are shown for each genomic
segment with locally constant copy number. Color axis indicates distance
between low (blue) and high (red) homologue concentration; segments
where these are similar (allelic balance) are purple. (c) Homologous
copy-ratio histogram. Copy ratios shown in b were binned at 0.04 resolution
(y axis); the length of each block corresponds to the (haploid) genomic
fraction (x axis) of each corresponding segment in b. Several discrete
SCNA peaks are visible, each corresponding either to an (unknown) integer
copy state in the somatic clone or to a subclonal alteration. (d) To aid in
the interpretation of potentially ambiguous data, ABSOLUTE uses precomputed
statistical models of recurrent cancer karyotypes (left, Online
Methods). Optionally, if somatic point mutation data are available (from
sequencing of the DNA), then the allelic fractions (fraction of sequencing
reads bearing the nonreference allele) of these mutations may be used to
help interpret the DNA concentrations. (e) Three potential interpretations of
the copy-ratio histogram (b) in terms of absolute copy numbers. Horizontal
dotted lines indicate the copy ratios corresponding to the indicated
absolute somatic copy numbers. (f) Purity (fraction of tumor nuclei) and
cancer-genome ploidy values corresponding to each interpretation in (e).
Dotted lines denote potential solutions that share either b, the copy ratio
associated with zero somatic copies (from upper left to lower right), or δ_τ,
the spacing between consecutive integer copy levels (from lower left
to upper right). Candidate solutions lie on the indicated grid of
b = 2(1 – α)/D and δ_τ = α/D (equation (1)). (g) The log-likelihood (score) of
each solution in terms of the SCNA fit of the observed copy ratios to integer
absolute copy numbers and plausibility of the proposed karyotype. The
highest-scoring solution (green) is identified by the combination of SCNA-fit
and karyotype log-likelihood values. This interpretation implies subclonal
gain of chromosome 2 (e, arrow). The SCNA score alone cannot distinguish
between this and an additional solution (blue), in which the arrowed region
is closer to an integer copy state, but the overall SCNA-fit score is equivalent
to that of the first solution. (h) Interpretation of somatic DNA alterations on
an absolute scale. Modeled SCNA copy states are shown (left). In addition,
allelic fractions may be reinterpreted as average allelic copies per cancer cell
(multiplicity), potentially revealing subclonal point mutations (right).
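
The grid described in the Figure 1f caption, b = 2(1 – α)/D and δ_τ = α/D, can be illustrated with a small numerical sketch. It assumes that D = ατ + 2(1 – α), the average ploidy of the tumor/normal mixture, with α the purity and τ the cancer-genome ploidy; that definition belongs to the paper's equation (1) and is not reproduced in this excerpt. The function names are illustrative only and are not part of the ABSOLUTE software.

```python
def copy_ratio_grid(alpha, tau):
    """Return (b, delta_tau) for purity `alpha` and cancer-genome ploidy `tau`."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("purity alpha must lie in (0, 1]")
    # Assumption: D is the average ploidy of the tumor/normal mixture,
    # with normal cells taken to be diploid.
    D = alpha * tau + 2.0 * (1.0 - alpha)
    b = 2.0 * (1.0 - alpha) / D   # copy ratio at zero somatic copies
    delta_tau = alpha / D         # spacing between consecutive integer copy levels
    return b, delta_tau


def expected_copy_ratio(q, alpha, tau):
    """Copy ratio expected for a clonal segment with integer copy number q."""
    b, delta_tau = copy_ratio_grid(alpha, tau)
    return b + q * delta_tau


def purity_ploidy_from_grid(b, delta_tau):
    """Invert the grid: recover (alpha, tau) from (b, delta_tau)."""
    alpha = 2.0 * delta_tau / (b + 2.0 * delta_tau)
    D = alpha / delta_tau
    tau = (D - 2.0 * (1.0 - alpha)) / alpha
    return alpha, tau


if __name__ == "__main__":
    b, d = copy_ratio_grid(alpha=0.5, tau=4.0)
    print("b =", b, "delta_tau =", d)
    # Expected copy ratios for integer copy numbers 0..4 under this candidate
    print([expected_copy_ratio(q, 0.5, 4.0) for q in range(5)])
    print("recovered (alpha, tau):", purity_ploidy_from_grid(b, d))
```

Under this assumed definition of D, a segment whose copy number equals the ploidy τ has an expected copy ratio of exactly 1, and each candidate (α, τ) pair places the integer copy levels on a different evenly spaced ladder. Several ladders can align with the same observed peaks, which is why the caption describes multiple candidate interpretations.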
We then describe how estimates of tumor purity and absolute copy
number allow us to analyze allelic-fraction values (the fraction of
non-reference sequencing reads supporting a mutation) to distinguish
clonal and subclonal point mutations, and to detect macroscopic
subclonal structure in an ovarian cancer sample. Clonal events
may be classified as homozygous or heterozygous in the cancer cells,
guiding interpretation of their function. In addition, the ability to
quantify integer multiplicity of point mutations aids in the relative
timing of segmental DNA copy-number gains, as multiplicity values
of greater than one imply that the point mutation preceded copy gain
of the locus. Controlling for tumor purity and local copy number
allows such timings to be calculated more generally than in the special
case of copy-neutral loss of heterozygosity 27. Finally, our data allow
characterization of somatic cancer evolution with respect to whole-genome
doubling, which we demonstrate in ovarian carcinoma and
associate with clinicopathological values.
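
The rescaling of allelic fractions into point-mutation multiplicities can be sketched as follows. The sketch assumes the usual mixture relation f = α·s / (α·q + 2(1 – α)) for a somatic mutation carried on s copies per cancer cell, at a locus with absolute copy number q, in a sample of purity α with diploid normal cells. This relation is not written out in this excerpt, and the function names and the tolerance used for classification are hypothetical choices made for illustration.

```python
def mutation_multiplicity(allelic_fraction, alpha, q):
    """Average mutated copies per cancer cell implied by an allelic fraction.

    Assumption: f = alpha * s / (alpha * q + 2 * (1 - alpha)), solved for s.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("purity alpha must lie in (0, 1]")
    return allelic_fraction * (alpha * q + 2.0 * (1.0 - alpha)) / alpha


def classify(multiplicity, tol=0.25):
    """Coarse label for a multiplicity value; `tol` is an arbitrary cutoff."""
    if multiplicity < 1.0 - tol:
        return "subclonal"
    if multiplicity > 1.0 + tol:
        # Multiplicity above one suggests the mutation preceded a copy gain.
        return "clonal_multi"
    return "clonal_single"


if __name__ == "__main__":
    # Hypothetical mutation: 40% of reads carry the variant, purity 0.6,
    # locus present at 3 copies per cancer cell.
    s = mutation_multiplicity(0.40, alpha=0.6, q=3)
    print(s, classify(s))
```

A real analysis would model sequencing noise in the allelic fraction, for example through read counts, rather than applying a fixed cutoff.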
RESULTS
Inference of sample purity and ploidy in cancer-derived DNA
A conceptual overview of ABSOLUTE is shown in Figure 1. When DNA
is extracted from a mixed population of cancer and normal cells, the
414 VOLUME 30 NUMBER 5 MAY 2012 nature biotechnology