BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P21. GEVACT: GENOMIC VARIANT CLASSIFIER TOOL

Isel Grau (1,4), Dorien Daneels (2,3), Sonia Van Dooren (2,3), Maryse Bonduelle (2), Dewan Md. Farid (1,3), Didier Croes (2,3), Ann Nowé (1,3) & Dipankar Sengupta (1,3)*.

(1) Como - Artificial Intelligence Lab, Vrije Universiteit Brussel; (2) Centre for Medical Genetics, Reproduction and Genetics, Reproduction Genetics and Regenerative Medicine, Vrije Universiteit Brussel, UZ Brussel; (3) Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB; (4) Department of Computer Sciences, Universidad Central de Las Villas.

* Dipankar.Sengupta@vub.ac.be

High-throughput screening (HTS) techniques, such as genome or exome screening, are becoming the norm in conventional clinical analysis. However, classifying the identified variants as pathogenic, potentially pathogenic, or non-pathogenic is still a manual, tedious, and time-consuming process for clinicians and geneticists. To facilitate variant classification, we have developed GEVACT, a Java-based tool built on an algorithm derived from the existing literature and the knowledge of clinical geneticists. GEVACT can classify variants annotated by Alamut Batch; support for input from other annotation software is planned.

INTRODUCTION

With the emergence of new screening techniques, targeted or whole-exome and whole-genome screening are becoming standard diagnostic practice in clinical settings for identifying the variants underlying a genetic disease (Ng et al., 2010; Saunders et al., 2012). However, developing bioinformatics solutions for the pathogenicity classification of these variants remains a major challenge, which leaves the process laborious for geneticists and clinicians. In this work, we describe GEVACT (Genomic Variant Classifier Tool), a tool for classifying genomic single-nucleotide variants and short insertion/deletion variants. The aim of this study was to design and implement a variant classification algorithm based on a literature review of cardiac arrhythmia syndromes (Hofman et al., 2013; Schulze-Bahr et al., 2000; Wilde & Tan, 2007) and the existing knowledge of clinical geneticists.

METHODS

The algorithm we propose for GEVACT is based on a published variant classification scheme for cardiac arrhythmia syndromes. That scheme draws on the yield of DNA testing over a span of 15 years (1996-2011), compared between probands with isolated and familial cases, and between probands with and without clear disease-specific clinical characteristics (Hofman et al., 2013). It proposes two separate approaches: one to classify missense variants and another to classify nonsense and frameshift variants.

The algorithm is implemented in two phases: pre-processing and classification. In the pre-processing phase, the annotated tab-delimited variant file (vcf.ann) produced by Alamut Batch is first restricted to the gene list for the disease of interest, which reduces the number of variants to analyse. Filters are then applied to look for variants that have already been reported in the Human Gene Mutation Database (Stenson et al., 2003) or in ClinVar (Landrum et al., 2014), or that have previously been detected and classified in an internal patient population. Lastly, the variants are filtered by their location in the genome and their coding effect, followed by a check of the minor allele frequency of each variant in a control population (Sherry et al., 2001).

In the classification phase, the filtered variants are divided into missense variants and nonsense or frameshift variants. For missense variants, classification is based on the following parameters: the amino acid substitution and its impact on protein function (Adzhubei et al., 2010; Kumar et al., 2009), biochemical variation (Mathe et al., 2006), conservation (Pollard et al., 2010), frequency of the variant allele in a control population (ExAC, 2015), effects on splicing (Desmet et al., 2009), family and phenotype information, and functional analysis. For nonsense and frameshift variants, it is based on effects on splicing, frequency of the variant allele in a control population, family and phenotype information, and functional analysis. Each variant receives a score for each parameter, and these scores are summed. Finally, based on the cumulative score, each variant is assigned to one of five categories: Class I - Non-Pathogenic; Class II - VUS1 (unlikely pathogenic); Class III - VUS2 (unclear); Class IV - VUS3 (likely pathogenic); Class V - Pathogenic (Sharon et al., 2008).
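The scoring step described above, in which per-parameter scores are summed and the total is mapped to one of the five classes, can be illustrated with a minimal Java sketch. This is not the GEVACT implementation: the abstract gives neither the per-parameter scores nor the cut-offs between classes, so the class name `CumulativeScoreSketch`, the parameter keys, the example scores, and every threshold in `classify` are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; all scores and cut-offs below are hypothetical. */
final class CumulativeScoreSketch {

    /** The five categories named in the text. */
    enum VariantClass {
        CLASS_I("Non-Pathogenic"),
        CLASS_II("VUS1 (unlikely pathogenic)"),
        CLASS_III("VUS2 (unclear)"),
        CLASS_IV("VUS3 (likely pathogenic)"),
        CLASS_V("Pathogenic");

        final String label;

        VariantClass(String label) {
            this.label = label;
        }
    }

    /** Sums the per-parameter scores assigned to one variant. */
    static int cumulativeScore(Map<String, Integer> parameterScores) {
        int total = 0;
        for (int score : parameterScores.values()) {
            total += score;
        }
        return total;
    }

    /** Maps a cumulative score to a class using HYPOTHETICAL cut-offs. */
    static VariantClass classify(int cumulative) {
        if (cumulative <= 0) return VariantClass.CLASS_I;
        if (cumulative <= 2) return VariantClass.CLASS_II;
        if (cumulative <= 4) return VariantClass.CLASS_III;
        if (cumulative <= 6) return VariantClass.CLASS_IV;
        return VariantClass.CLASS_V;
    }

    public static void main(String[] args) {
        // Hypothetical scores for a missense variant, one entry per parameter.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("proteinImpact", 2);
        scores.put("biochemicalVariation", 1);
        scores.put("conservation", 1);
        scores.put("controlFrequency", 1);
        scores.put("splicing", 0);

        int total = cumulativeScore(scores);
        System.out.println(total + " -> " + classify(total).label);
    }
}
```

Keeping the summation separate from the threshold mapping means the missense and the nonsense/frameshift approaches could share `classify` while supplying different parameter sets, although whether GEVACT is organised this way is not stated in the abstract.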

RESULTS & DISCUSSION

In this study, we report a Java-based tool, GEVACT, developed for the classification of genomic variants. The input to the tool is an annotated VCF file, and the output gives the cumulative classification score together with the class label for each variant. The tool was tested on a dataset of 130 cardiac arrhythmia syndrome patients available at UZ Brussel. The classifications made by the tool were cross-checked against manual curation performed by a clinical geneticist. The study indicates that the tool is promising, but it needs further validation on datasets from other diseases. In addition, we are working on making the tool accept input files from other annotation software.
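As a rough illustration of how a tab-delimited annotated input might be narrowed down before scoring, the sketch below applies two of the pre-processing filters named in the Methods: the disease gene list and the minor allele frequency check. It is an assumption-laden sketch rather than the tool's code. The column positions, the 0.01 frequency cut-off, the handling of empty frequency fields, and the example rows are all invented for the example and do not reflect the real Alamut Batch output layout; KCNQ1 and SCN5A are used only as plausible arrhythmia-associated genes, and the variant strings are made-up placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; column layout and threshold are hypothetical. */
final class PreprocessingSketch {

    // HYPOTHETICAL column positions in the tab-delimited annotation file.
    static final int GENE_COLUMN = 0;
    static final int MAF_COLUMN = 2;

    /**
     * Keeps rows whose gene is in the disease gene list and whose minor
     * allele frequency does not exceed maxMaf (a hypothetical cut-off).
     */
    static List<String> filter(List<String> rows, Set<String> geneList, double maxMaf) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            // A limit of -1 preserves trailing empty fields.
            String[] fields = row.split("\t", -1);
            if (fields.length <= MAF_COLUMN) {
                continue; // skip rows with too few columns
            }
            if (!geneList.contains(fields[GENE_COLUMN])) {
                continue; // gene is not on the disease gene list
            }
            String maf = fields[MAF_COLUMN];
            // Assumption: an empty frequency means "not observed in controls".
            double frequency = maf.isEmpty() ? 0.0 : Double.parseDouble(maf);
            if (frequency <= maxMaf) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> genes = new HashSet<>(Arrays.asList("KCNQ1", "SCN5A"));
        // Made-up rows: gene, variant placeholder, control-population frequency.
        List<String> rows = Arrays.asList(
                "KCNQ1\tc.100A>G\t0.0001",
                "TTN\tc.200C>T\t0.0002",
                "SCN5A\tc.300G>A\t0.25",
                "SCN5A\tc.400del\t");
        for (String row : filter(rows, genes, 0.01)) {
            System.out.println(row);
        }
    }
}
```

A real input would also need header handling and the database, location, and coding-effect filters described in the Methods, none of which are modelled here.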

REFERENCES

Adzhubei IA et al. Nat Methods 7(4), 248-249 (2010).

Desmet et al. Nucleic Acids Res 37(9), e67 (2009).

Exome Aggregation Consortium (ExAC), Cambridge, MA (2015).

Hofman N et al. Circulation 128(14), 1513-1521 (2013).

Kumar P et al. Nat Protoc 4(7), 1073-1081 (2009).

Landrum MJ et al. Nucleic Acids Res 42(1), D980-D985 (2014).

Mathe E et al. Nucleic Acids Res 34(5), 1317-1325 (2006).

Ng SB et al. Nat Genet 42, 30-35 (2010).

Pollard K et al. Genome Res 20, 110-121 (2010).

Saunders CJ et al. Sci Transl Med 4, 154ra135 (2012).

Schulze-Bahr E et al. Z Kardiol 89 Suppl 4, IV12-22 (2000).

Sharon EP et al. Hum Mutat 29(11), 1282-1291 (2008).

Sherry ST et al. Nucleic Acids Res 29(1), 308-311 (2001).

Stenson et al. Hum Mutat 21, 577-581 (2003).

Wilde AA & Tan HL. Circ J 71 Suppl A, A12-A19 (2007).
