BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015
P20. A MIXTURE MODEL FOR THE OMICS BASED IDENTIFICATION OF
MONOALLELICALLY EXPRESSED LOCI AND THEIR DEREGULATION IN
CANCER
Tine Goovaerts 1, Sandra Steyaert 1, Jeroen Galle 1, Wim Van Criekinge 1 & Tim De Meyer 1*.
BIOBIX lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling,
Statistics and Bioinformatics, Ghent University 1. * tim.demeyer@ugent.be
Imprinting is a phenomenon characterized by parent-of-origin-specific monoallelic gene expression. Its deregulation has
been associated with non-Mendelian inherited genetic diseases and is also a common feature of cancer. Because imprinting
does not alter the genome sequence yet is mitotically inherited, epigenetics is deemed to be a key regulator. Current
knowledge in the field is hampered in particular by a lack of accurate computational techniques suitable for omics data.
Here we introduce a mixture model for the identification of monoallelically expressed loci from large-scale omics data,
which can also be exploited to identify samples and loci characterized by loss of imprinting or of monoallelic expression.
INTRODUCTION
The genome-wide identification of monoallelically
expressed or epigenetically modified loci typically
requires the presence of SNPs to discriminate between
the two alleles. Current methods predominantly rely on
genotyping to identify heterozygous loci in a limited
sample set, followed by testing whether the expression or
epigenetic modification levels of the two alleles deviate
from a 1:1 ratio at those loci (Wang & Clark, 2014). This
approach is limited by the genotyping step and by the
required presence of heterozygous individuals. As
large-scale omics data are becoming increasingly
available, an alternative strategy may be to screen larger
numbers (e.g. hundreds) of samples, which ensures the
presence of heterozygous individuals at predictable rates
and thereby also avoids the need for, and the limitations
of, a prior genotyping step.
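The "predictable rates" mentioned above follow from Hardy-Weinberg equilibrium, under which a SNP with allele frequency p is heterozygous in a fraction 2p(1 - p) of individuals. The short Python sketch below illustrates this arithmetic only; the function names and the example numbers are hypothetical and are not taken from the authors' software.

```python
def het_fraction(allele_freq: float) -> float:
    """Expected heterozygous fraction 2pq under Hardy-Weinberg equilibrium."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def expected_heterozygotes(n_samples: int, allele_freq: float) -> float:
    """Expected number of heterozygous individuals among n_samples."""
    return n_samples * het_fraction(allele_freq)


def prob_no_heterozygote(n_samples: int, allele_freq: float) -> float:
    """Probability that none of n_samples independent individuals is heterozygous."""
    return (1.0 - het_fraction(allele_freq)) ** n_samples


if __name__ == "__main__":
    # Illustrative values: a SNP with allele frequency 0.1 in small and large cohorts.
    for n in (10, 200):
        print(n, expected_heterozygotes(n, 0.1), prob_no_heterozygote(n, 0.1))
```

With only a handful of samples there is a real chance of observing no heterozygote at a low-frequency SNP, whereas with hundreds of samples that chance becomes negligible, which is the motivation for screening large cohorts without prior genotyping.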
Based on this concept, a previous strategy (Steyaert et al.,
2014) enabled us to identify and validate approximately 80
loci characterized by monoallelic DNA methylation.
However, it had several drawbacks that limited its
practical use: computational inefficiency, heavy reliance
on Hardy-Weinberg equilibrium (HWE), the requirement
of complete (100%) imprinting and low power. Here we
present a novel mixture model for the identification of
monoallelically modified or expressed loci from
large-scale omics data (without known genotypes) that
largely circumvents these drawbacks.
METHODS
The rationale of the methodology is that RNA-seq and
ChIP-seq(-like) derived SNP data for monoallelic loci are
characterized by a general lack of apparent
heterozygosity. More specifically, under the null
hypothesis (no imprinting) the homozygous and
heterozygous sample fractions can be modelled as a
mixture of (beta-)binomial distributions, with weights
either following HWE or derived empirically. For
imprinted loci, however, the heterozygous fraction is split
and shifted towards the two homozygous fractions
(Figure 1), which can be evaluated with a likelihood ratio
test. The model does not require prior genotyping data but
can incorporate them, and it allows for deviation from
HWE, sequencing errors, efficiency differences and
partial monoallelic events. Once loci characterized by
monoallelic events have been identified in control data, a
loss-of-imprinting index can be calculated for each
non-control sample from the mixture model likelihoods,
and loci that generally show loss of imprinting in the
pathology under study can be identified.
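The likelihood ratio test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain binomial components with HWE weights, a fixed sequencing error rate `err`, an allele frequency estimated as the mean reference-read fraction, a single imprinting parameter fitted by grid search, and a chi-square approximation with one degree of freedom (which is only approximate because the null value lies on the boundary of the parameter range). All function and parameter names are hypothetical.

```python
import math


def log_binom_pmf(k, n, prob):
    """Log probability of k reference reads out of n under Binomial(n, prob)."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(prob) + (n - k) * math.log(1.0 - prob)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mixture_log_likelihood(counts, p, imprint, err=0.01):
    """Log-likelihood of (ref_reads, total_reads) pairs for one SNP.

    imprint = 0 gives the null model (heterozygotes centred on 0.5);
    imprint = 1 shifts both heterozygous halves onto the homozygous peaks.
    """
    q = 1.0 - p
    shift = imprint * (0.5 - err)
    components = [
        (p * p, 1.0 - err),    # homozygous reference
        (p * q, 0.5 + shift),  # heterozygous, reference allele expressed
        (p * q, 0.5 - shift),  # heterozygous, alternative allele expressed
        (q * q, err),          # homozygous alternative
    ]
    return sum(
        log_sum_exp([math.log(w) + log_binom_pmf(k, n, pr) for w, pr in components])
        for k, n in counts
    )


def imprinting_lrt(counts, err=0.01, grid=101):
    """Return (fitted imprinting degree, LRT statistic, approximate p-value)."""
    fractions = [k / n for k, n in counts]
    # Assumption: the mean reference fraction estimates the allele frequency.
    p = min(max(sum(fractions) / len(fractions), 1e-6), 1.0 - 1e-6)
    ll_null = mixture_log_likelihood(counts, p, 0.0, err)
    best_i, ll_alt = 0.0, ll_null
    for step in range(1, grid):
        i = step / (grid - 1)
        ll = mixture_log_likelihood(counts, p, i, err)
        if ll > ll_alt:
            best_i, ll_alt = i, ll
    stat = 2.0 * (ll_alt - ll_null)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    return best_i, stat, p_value


if __name__ == "__main__":
    # Toy read counts (invented for illustration, not real data).
    imprinted = [(20, 20)] * 15 + [(0, 20)] * 15
    balanced = [(20, 20)] * 8 + [(10, 20)] * 14 + [(0, 20)] * 8
    print(imprinting_lrt(imprinted))
    print(imprinting_lrt(balanced))
```

In the first toy data set no sample shows balanced allele counts, so the fitted imprinting degree is high and the test statistic is large; in the second, the samples with balanced counts keep the fitted degree at zero. A beta-binomial variant, as mentioned in the text, would replace `log_binom_pmf` with an overdispersed counterpart.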
RESULTS & DISCUSSION
We demonstrate the applicability of the novel mixture
model with simulations and a proof-of-concept study
using breast cancer and control RNA-seq data from The
Cancer Genome Atlas (TCGA Research Network, 2008).
Well-known imprinted loci such as IGF2 (Figure 1) and
H19 were indeed identified. Ongoing efforts are directed
towards artefact-free inference of allele frequencies from
RNA/ChIP-seq data and the efficient implementation of a
beta-binomial based mixture.
FIGURE 1. Observed (red) and modelled (green) allele frequencies for a<br />
100% (right, no observable heterozygotes) and a partially imprinted<br />
(left) SNP of the IGF2 gene<br />
In conclusion, we introduce a novel mixture model for the
identification of loci characterized by monoallelic events,
which can subsequently be exploited to determine their
deregulation in the pathology of interest.
REFERENCES
Steyaert S et al. Nucleic Acids Research 42, e157 (2014).<br />
TCGA Research Network. Nature 455, 1061-1068 (2008).<br />
Wang X & Clark AG. Heredity 113, 156-166 (2014).<br />