
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P20. A MIXTURE MODEL FOR THE OMICS BASED IDENTIFICATION OF MONOALLELICALLY EXPRESSED LOCI AND THEIR DEREGULATION IN CANCER

Tine Goovaerts¹, Sandra Steyaert¹, Jeroen Galle¹, Wim Van Criekinge¹ & Tim De Meyer¹*.

¹BIOBIX lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University. *tim.demeyer@ugent.be

Imprinting is a phenomenon characterized by parent-specific monoallelic gene expression. Its deregulation has been associated with non-Mendelian inherited genetic diseases but is also a common feature of cancer. As imprinting does not alter the genome yet is mitotically inherited, epigenetics is deemed to be a key regulator. Current knowledge in the field is particularly hampered by a lack of accurate computational techniques suitable for omics data. Here we introduce a mixture model for the identification of monoallelically expressed loci based on large-scale omics data, which can also be exploited to identify samples and loci characterized by loss of imprinting / monoallelic expression.

INTRODUCTION

The genome-wide identification of monoallelically expressed or epigenetically modified loci typically requires the presence of SNPs to discriminate between the two alleles. Current methods predominantly rely on genotyping to identify heterozygous loci in a limited sample set, followed by testing whether the expression or epigenetic modification levels of the two alleles deviate from a 1:1 ratio at those loci (Wang & Clark, 2014). This approach is limited by the genotyping step and by the required presence of heterozygous individuals. As large-scale omics data are becoming increasingly available, an alternative strategy may be to screen larger numbers (e.g. hundreds) of samples, ensuring the presence of heterozygous individuals at predictable rates and thereby avoiding the need for, and limitations of, a prior genotyping step.
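
As a rough illustration of what "predictable rates" means here: under Hardy-Weinberg equilibrium, a biallelic SNP with allele frequency p is expected to be heterozygous in a fraction 2p(1-p) of individuals. The sketch below computes this fraction and the probability of seeing at least k heterozygotes among n samples. The function names are hypothetical and the calculation assumes independent samples in Hardy-Weinberg equilibrium; it is not taken from the original work.

```python
import math


def heterozygote_fraction(allele_freq: float) -> float:
    """Expected heterozygous fraction 2p(1-p) under Hardy-Weinberg equilibrium."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def prob_at_least_k_heterozygotes(n_samples: int, allele_freq: float, k: int = 1) -> float:
    """Probability that at least k of n independent samples are heterozygous.

    Assumption: each sample is heterozygous independently with
    probability 2p(1-p), so the count is binomially distributed.
    """
    h = heterozygote_fraction(allele_freq)
    # Sum the binomial probabilities of observing fewer than k heterozygotes.
    below_k = sum(
        math.comb(n_samples, j) * h**j * (1.0 - h) ** (n_samples - j)
        for j in range(k)
    )
    return 1.0 - below_k
```

For example, `heterozygote_fraction(0.1)` is approximately 0.18, which corresponds to roughly 36 expected heterozygotes among 200 samples.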

Based on this concept, a previous strategy (Steyaert et al., 2014) enabled us to identify and validate approximately 80 loci characterized by monoallelic DNA methylation, but it had several drawbacks that limited its practical use: computational inefficiency, heavy reliance on Hardy-Weinberg equilibrium (HWE), the need for 100% imprinting, and low power. Here we present a novel mixture model for the identification of monoallelically modified or expressed loci from large-scale omics data (without known genotypes) that largely circumvents these drawbacks.

METHODS

The rationale of the methodology is that RNA-seq and ChIP-seq(-like) derived SNP data for monoallelic loci are characterized by a general lack of apparent heterozygosity. More specifically, under the null hypothesis (no imprinting), the homozygous and heterozygous sample fractions can be modelled as a mixture of (beta-)binomial distributions, with weights either following HWE or derived empirically. For imprinted loci, however, the heterozygous fraction is split and shifted towards the two homozygous fractions (Figure 1), which can be evaluated with a likelihood ratio test. The model does not require, but can incorporate, prior genotyping data, and it allows for deviation from HWE, sequencing errors, efficiency differences, and partial monoallelic events. Once loci characterized by monoallelic events have been identified in control data, a loss-of-imprinting index can be calculated for each non-normal sample based on the mixture model likelihoods, and loci generally characterized by loss of imprinting in the pathology under study can be identified.
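
The general idea of such a mixture can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' implementation: it uses plain binomial components rather than beta-binomial ones, takes the mixture weights from HWE, estimates the allele frequency as the pooled reference-read fraction, fixes the sequencing error rate at an assumed 0.01, splits the heterozygous component symmetrically, and maximizes over the degree of imprinting with a simple grid search. All function and parameter names are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_likelihood(ref_counts, totals, allele_freq, imprinting, seq_error=0.01):
    """Mixture log-likelihood of reference-read counts at one SNP.

    imprinting = 0 gives the null model (heterozygotes centred at 0.5);
    imprinting = 1 moves both heterozygous sub-components onto the
    homozygous ones (complete monoallelic expression).
    """
    p, q = allele_freq, 1.0 - allele_freq

    def clip(f):
        # Keep expected read fractions within what sequencing errors allow.
        return min(max(f, seq_error), 1.0 - seq_error)

    # (HWE weight, expected reference-read fraction) per component.
    components = [
        (p * p, 1.0 - seq_error),              # homozygous reference
        (q * q, seq_error),                    # homozygous alternative
        (p * q, clip(0.5 + imprinting / 2)),   # heterozygous, reference allele expressed
        (p * q, clip(0.5 - imprinting / 2)),   # heterozygous, alternative allele expressed
    ]
    total = 0.0
    for k, n in zip(ref_counts, totals):
        terms = [math.log(w) + binom_logpmf(k, n, f) for w, f in components if w > 0]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def imprinting_lrt(ref_counts, totals, seq_error=0.01, grid_steps=100):
    """Return (estimated degree of imprinting, likelihood ratio statistic)."""
    allele_freq = sum(ref_counts) / sum(totals)  # assumption: pooled read fraction
    ll_null = log_likelihood(ref_counts, totals, allele_freq, 0.0, seq_error)
    best_i, ll_alt = 0.0, ll_null
    for step in range(1, grid_steps + 1):
        i = step / grid_steps
        ll = log_likelihood(ref_counts, totals, allele_freq, i, seq_error)
        if ll > ll_alt:
            best_i, ll_alt = i, ll
    return best_i, 2.0 * (ll_alt - ll_null)
```

Calling `imprinting_lrt` with per-sample reference-read counts and total read counts for one SNP returns a grid estimate of the degree of imprinting together with the likelihood ratio statistic. Because the null value lies on the boundary of the parameter range, the reference distribution for that statistic needs separate consideration and is not addressed in this sketch.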

RESULTS & DISCUSSION

We demonstrate the applicability of the novel mixture model with simulations and a proof-of-concept study using breast cancer and control RNA-seq data from The Cancer Genome Atlas (TCGA Research Network, 2008). Well-known imprinted loci such as IGF2 (Figure 1) and H19 were indeed identified. Ongoing efforts are directed towards artefact-free allele frequency inference from RNA-seq/ChIP-seq data and the efficient implementation of a beta-binomial based mixture.
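
The beta-binomial distribution mentioned above extends the binomial with an overdispersion parameter, which can absorb extra variability in read counts. A minimal sketch of its log-probability mass function follows. The mean/overdispersion parameterization and the function name are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def betabinom_logpmf(k, n, mean, overdispersion):
    """Log beta-binomial probability of k reference reads out of n.

    Assumed parameterization: mean in (0, 1) is the expected read fraction,
    overdispersion in (0, 1) equals 1 / (a + b + 1) for shape parameters a, b;
    values close to 0 approach the plain binomial.
    """
    scale = (1.0 - overdispersion) / overdispersion  # equals a + b
    a, b = mean * scale, (1.0 - mean) * scale

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)
```

A function of this shape could stand in for the binomial log-probability inside a mixture likelihood, at the cost of one extra parameter to estimate per component or per locus.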

FIGURE 1. Observed (red) and modelled (green) allele frequencies for a fully imprinted SNP (100%, right; no observable heterozygotes) and a partially imprinted SNP (left) of the IGF2 gene.

In conclusion, we introduce a novel mixture model for the identification of loci characterized by monoallelic events, which can subsequently be exploited to determine their deregulation in the pathology of interest.

REFERENCES

Steyaert S et al. Nucleic Acids Research 42, e157 (2014).

TCGA Research Network. Nature 455, 1061-1068 (2008).

Wang X & Clark AG. Heredity 113, 156-166 (2014).
