bbc 2015

BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P Poster
10th Benelux Bioinformatics Conference bbc 2015

P20. A MIXTURE MODEL FOR THE OMICS BASED IDENTIFICATION OF MONOALLELICALLY EXPRESSED LOCI AND THEIR DEREGULATION IN CANCER

Tine Goovaerts 1, Sandra Steyaert 1, Jeroen Galle 1, Wim Van Criekinge 1 & Tim De Meyer 1*.
1 BIOBIX lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University.
* tim.demeyer@ugent.be

Imprinting is a phenomenon characterized by parent-specific monoallelic gene expression. Its deregulation has been associated with non-Mendelian inherited genetic diseases but is also a common feature of cancer. As imprinting does not alter the genome yet is mitotically inherited, epigenetics is deemed to be a key regulator. Current knowledge in the field is particularly hampered by a lack of accurate computational techniques suitable for omics data. Here we introduce a mixture model for the identification of monoallelically expressed loci based on large-scale omics data that can also be exploited to identify samples and loci characterized by loss of imprinting / monoallelic expression.

INTRODUCTION

The genome-wide identification of monoallelically expressed or epigenetically modified loci typically requires the presence of SNPs to discriminate both alleles. Current methods predominantly rely on genotyping for the identification of heterozygous loci in a limited sample set, followed by testing whether the expression/epigenetic modification levels for both alleles deviate from a 1:1 ratio for those loci (Wang & Clark, 2014). This approach is limited by the genotyping step and the required presence of heterozygous individuals. As large-scale omics data are becoming increasingly available, an alternative strategy may be to screen larger numbers (e.g. hundreds) of samples, ensuring the presence of heterozygous individuals at predictable rates, thereby also avoiding the need for and limitations of a prior genotyping step.

Based on this concept, a previous strategy (Steyaert et al., 2014) enabled us to identify and validate approximately 80 loci characterized by monoallelic DNA methylation, but had several drawbacks that limited its practical use: computational inefficiency, heavy reliance on Hardy-Weinberg equilibrium (HWE), the need for 100% imprinting and low power. Here we present a novel mixture model for the identification of monoallelically modified or expressed loci from large-scale omics data (without known genotypes) that largely circumvents these drawbacks.

METHODS

The rationale of the methodology is that RNA-seq and ChIP-seq(-like) derived SNP data for monoallelic loci are characterized by a general lack of apparent heterozygosity. More specifically, under the null hypothesis (no imprinting) the homozygous and heterozygous sample fractions can be modelled as a mixture of (beta-)binomial distributions, with weights according to HWE or empirically derived. For imprinted loci, however, the heterozygous fraction is split and shifted towards the two homozygous fractions (Figure 1), which can be evaluated with a likelihood ratio test. The model does not require but can incorporate prior genotyping data, and it allows for deviations from HWE, sequencing errors, efficiency differences and partial monoallelic events.

Once loci characterized by monoallelic events have been identified in control data, a loss of imprinting index can be calculated for each non-normal sample based on the mixture model likelihoods, and loci generally characterized by loss of imprinting in the pathology under study can be identified.
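The likelihood ratio test described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain binomial components rather than beta-binomial ones, HWE mixture weights, a fixed sequencing error rate `err`, an allele frequency estimated as the mean reference-read fraction, and a single hypothetical parameter `imprint` (0 = no imprinting, 1 = complete imprinting) that moves the two halves of the heterozygous fraction towards the homozygous peaks. All function names are illustrative.

```python
import math
import random


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def log_likelihood(counts, p_ref, imprint, err=0.01):
    """Mixture log-likelihood for (reference_reads, total_reads) pairs."""
    q = 1.0 - p_ref
    shift = imprint * (0.5 - err)  # assumed parameterisation of imprinting
    # (HWE weight, expected reference-read fraction) per component
    components = [
        (p_ref ** 2, 1.0 - err),    # homozygous reference
        (p_ref * q, 0.5 + shift),   # heterozygous, reference allele expressed
        (p_ref * q, 0.5 - shift),   # heterozygous, alternative allele expressed
        (q ** 2, err),              # homozygous alternative
    ]
    total = 0.0
    for k, n in counts:
        terms = [math.log(w) + log_binom_pmf(k, n, f)
                 for w, f in components if w > 0]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def imprinting_lrt(counts, err=0.01, grid=101):
    """Return (estimated imprinting degree, likelihood ratio statistic)."""
    p_ref = sum(k / n for k, n in counts) / len(counts)
    ll_null = log_likelihood(counts, p_ref, 0.0, err)
    candidates = [i / (grid - 1) for i in range(grid)]
    best = max(candidates, key=lambda i: log_likelihood(counts, p_ref, i, err))
    ll_alt = log_likelihood(counts, p_ref, best, err)
    return best, 2.0 * (ll_alt - ll_null)


def simulate(n_samples, p_ref, imprint, depth=40, err=0.01, seed=0):
    """Simulate read counts for one SNP under the sketch model (toy data)."""
    rng = random.Random(seed)
    shift = imprint * (0.5 - err)
    counts = []
    for _ in range(n_samples):
        n_ref_alleles = (rng.random() < p_ref) + (rng.random() < p_ref)
        if n_ref_alleles == 2:
            frac = 1.0 - err
        elif n_ref_alleles == 0:
            frac = err
        else:
            frac = 0.5 + shift if rng.random() < 0.5 else 0.5 - shift
        k = sum(rng.random() < frac for _ in range(depth))
        counts.append((k, depth))
    return counts


if __name__ == "__main__":
    for label, degree in (("not imprinted", 0.0), ("imprinted", 1.0)):
        data = simulate(300, 0.5, degree, seed=1)
        print(label, imprinting_lrt(data))
```

Because the null value `imprint = 0` lies on the boundary of the parameter space, the usual chi-square reference distribution for the statistic is only approximate. A per-sample loss of imprinting index could be derived from the same component likelihoods, but its exact definition is not given in the abstract and is therefore not sketched here.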

RESULTS & DISCUSSION

We demonstrate the applicability of the novel mixture model with simulations and a proof of concept study using breast cancer and control RNA-seq data from The Cancer Genome Atlas (TCGA Research Network, 2008). Well known imprinted loci such as IGF2 (Figure 1) and H19 were indeed identified. Ongoing efforts are directed towards artefact-free RNA/ChIP-seq data based allele frequency inference and the efficient implementation of a beta-binomial based mixture.

FIGURE 1. Observed (red) and modelled (green) allele frequencies for a 100% (right, no observable heterozygotes) and a partially imprinted (left) SNP of the IGF2 gene.

In conclusion, we introduce a novel mixture model for the identification of loci featured by monoallelic events which can subsequently be exploited to determine their deregulation in the pathology of interest.

REFERENCES

Steyaert S et al. Nucleic Acids Research 42, e157 (2014).
TCGA Research Network. Nature 455, 1061-1068 (2008).
Wang X & Clark AG. Heredity 113, 156-166 (2014).

P21. GEVACT: GENOMIC VARIANT CLASSIFIER TOOL

Isel Grau 1,4, Dorien Daneels 2,3, Sonia Van Dooren 2,3, Maryse Bonduelle 2, Dewan Md. Farid 1,3, Didier Croes 2,3, Ann Nowé 1,3 & Dipankar Sengupta 1,3*.
1 Como - Artificial Intelligence Lab, Vrije Universiteit Brussel; 2 Centre for Medical Genetics, Reproduction and Genetics, Reproduction Genetics and Regenerative Medicine, Vrije Universiteit Brussel, UZ Brussel; 3 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB; 4 Department of Computer Sciences, Universidad Central de Las Villas.
* Dipankar.Sengupta@vub.ac.be

High-throughput screening (HTS) techniques, such as genome or exome screening, are becoming the norm in conventional clinical analysis. However, classifying the identified variants as pathogenic, potentially pathogenic or non-pathogenic is still a manual, tedious and time-consuming process for clinicians and geneticists. To facilitate the variant classification process, we have developed GEVACT, a Java-based tool built on an algorithm derived from the existing literature and the knowledge of clinical geneticists. GEVACT can classify variants annotated by Alamut Batch; support for input from other annotation software is planned.

INTRODUCTION

With the emergence of new screening techniques, targeted or whole exome and genome screening are becoming standard diagnostic practice in clinical settings to identify the variants underlying a genetic disease (Ng et al., 2010; Saunders et al., 2012). However, the development of bioinformatics solutions for pathogenicity classification of these variants remains a major challenge, which makes the process laborious for geneticists and clinicians.
In this work, we describe GEVACT (Genomic Variant Classifier Tool), a tool for the classification of genomic single nucleotide and short insertion/deletion variants. The aim of this study was to design and implement a variant classification algorithm based on a literature review of cardiac arrhythmia syndromes (Hofman et al., 2013; Schulze-Bahr et al., 2000; Wilde & Tan, 2007) and the existing knowledge of clinical geneticists.

METHODS

The algorithm we propose for GEVACT is based on a published variant classification schema for cardiac arrhythmia syndromes. That schema is based on the yield of DNA testing over a time span of 15 years (1996-2011), compared between probands with isolated and familial cases, and between probands with and without clear disease-specific clinical characteristics (Hofman et al., 2013). It proposes two distinct approaches: one to classify missense variants and another to classify nonsense and frameshift variants.

The algorithm is implemented in two phases: pre-processing and classification. In the pre-processing phase, the annotated tab-delimited variant file (vcf.ann) from Alamut Batch is refined based on the gene list for the disease of interest, so as to reduce the number of variants for the analysis. Filters are applied to look for variants that have already been reported in the Human Gene Mutation Database (Stenson et al., 2003) and in ClinVar (Landrum et al., 2014), or that have previously been detected and classified in an internal patient population. Lastly, the variants are filtered based on their location in the genome and their coding effect, followed by a check of the minor allele frequency of the variant in a control population (Sherry et al., 2001). Thereafter, in the classification phase, the filtered variants are classified as either missense variants or nonsense and frameshift variants.
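The pre-processing phase described above can be sketched as a chain of filters over annotated rows. GEVACT itself is a Java tool, so this Python sketch is only an illustration under stated assumptions: the column names (`variant`, `gene`, `codingEffect`, `controlMAF`), the frequency cut-off `max_maf`, the toy variant identifiers and the helper names are all hypothetical and do not reflect the actual Alamut Batch headers or the tool's thresholds.

```python
import csv
import io

# Hypothetical column names; real Alamut Batch output uses its own headers.
VAR_ID, GENE, EFFECT, MAF = "variant", "gene", "codingEffect", "controlMAF"


def preprocess(rows, gene_panel, known_classes, max_maf=0.01,
               kept_effects=("missense", "nonsense", "frameshift")):
    """Return (already classified variants, variants still to classify)."""
    already_known, to_classify = [], []
    for row in rows:
        if row[GENE] not in gene_panel:
            continue  # outside the gene list for the disease of interest
        if row[VAR_ID] in known_classes:
            # previously reported (database or internal population)
            already_known.append((row[VAR_ID], known_classes[row[VAR_ID]]))
            continue
        if row[EFFECT] not in kept_effects:
            continue  # coding effect not handled by the classifier
        if float(row[MAF] or 0.0) > max_maf:
            continue  # too common in the control population (assumed cut-off)
        to_classify.append(row)
    return already_known, to_classify


def route(row):
    """Choose which of the two classification approaches applies."""
    return "missense" if row[EFFECT] == "missense" else "nonsense_frameshift"


# Invented toy input standing in for an annotated tab-delimited file.
DEMO_LINES = [
    (VAR_ID, GENE, EFFECT, MAF),
    ("var1", "KCNQ1", "missense", "0.0001"),
    ("var2", "SCN5A", "missense", ""),
    ("var3", "TTN", "missense", "0.0"),
    ("var4", "KCNQ1", "synonymous", "0.0"),
    ("var5", "SCN5A", "frameshift", "0.2"),
]
DEMO_TSV = "\n".join("\t".join(fields) for fields in DEMO_LINES)
DEMO_ROWS = list(csv.DictReader(io.StringIO(DEMO_TSV), delimiter="\t"))

if __name__ == "__main__":
    known, todo = preprocess(DEMO_ROWS, {"KCNQ1", "SCN5A"}, {"var2": "Class V"})
    print("already classified:", known)
    print("to classify:", [(r[VAR_ID], route(r)) for r in todo])
```

Keeping previously reported variants in a separate list, rather than discarding them, is a design choice of this sketch; the abstract does not state how the tool reports them.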
For missense variants, the classification is based on the following parameters: amino acid substitution and its impact on protein function (Adzhubei et al., 2010; Kumar et al., 2009), biochemical variation (Mathe et al., 2006), conservation (Pollard et al., 2010), frequency of variant alleles in a control population (ExAC, 2015), effects on splicing (Desmet et al., 2009), family and phenotype information, and functional analysis. For nonsense and frameshift variants, it is based on effects on splicing, frequency of variant alleles in a control population, family and phenotype information, and functional analysis. For each parameter, the variant receives a score, and the scores are summed. Finally, based on the cumulative score, each variant is assigned to one of five categories: Class I - Non-Pathogenic; Class II - VUS1 (unlikely pathogenic); Class III - VUS2 (unclear); Class IV - VUS3 (likely pathogenic); Class V - Pathogenic (Plon et al., 2008).

RESULTS & DISCUSSION

In this study, we report a Java-based tool called GEVACT, developed for the classification of genomic variants. The input for the tool is an annotated vcf file, and the output reports the cumulative classification score and the class label for each variant. The tool was tested on a dataset of 130 cardiac arrhythmia syndrome patients, available at UZ Brussel. The variant classifications made by the tool were cross-validated by manual curation, performed by a clinical geneticist. The study indicates that the tool is promising, but it needs further validation on datasets from other diseases. In addition, we are adapting the tool to accept input files from other annotation software.

REFERENCES

Adzhubei IA et al. Nat Methods 7(4), 248-249 (2010).
Desmet et al. Nucleic Acids Res 37(9), e67 (2009).
Exome Aggregation Consortium (ExAC), Cambridge, MA (2015).
Hofman N et al. Circulation 128(14), 1513-21 (2013).
Kumar P et al. Nat Protoc 4(7), 1073-1081 (2009).
Landrum MJ et al. Nucleic Acids Res 42(1), D980-5 (2014).
Mathe E et al. Nucleic Acids Res 34(5), 1317-25 (2006).
Ng SB et al. Nat Genetics 42, 30-35 (2010).
Plon SE et al. Hum Mutat 29(11), 1282-1291 (2008).
Pollard K et al. Genome Res 20, 110-121 (2010).
Saunders CJ et al. Sci Transl Med 4, 154ra135 (2012).
Schulze-Bahr E et al. Z Kardiol 89 Suppl 4, IV12-22 (2000).
Sherry ST et al. Nucleic Acids Res 29(1), 308-11 (2001).
Stenson et al. Hum Mutat 21, 577-581 (2003).
Wilde AA & Tan HL. Circ J 71 Suppl A, A12-9 (2007).