bbc 2015
BBC2015_booklet
BBC2015_booklet
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 <strong>2015</strong><br />
Abstract ID: P<br />
Poster<br />
10th Benelux Bioinformatics Conference <strong>bbc</strong> <strong>2015</strong><br />
P21. GEVACT: GENOMIC VARIANT CLASSIFIER TOOL<br />
Isel Grau 1,4 , Dorien Daneels 2,3 , Sonia Van Dooren 2,3 , Maryse Bonduelle 2 ,<br />
Dewan Md. Farid 1,3 , Didier Croes 2,3 , Ann Nowé 1,3 & Dipankar Sengupta 1,3* .<br />
Como - Artificial Intelligence Lab, Vrije Universiteit Brussel 1 ; Centre for Medical Genetics, Reproduction and Genetics,<br />
Reproduction Genetics and Regenerative Medicine, Vrije Universiteit Brussel,UZ Brussel 2 ; Interuniversity Institute of<br />
Bioinformatics in Brussels, ULB-VUB 3 ; Department of Computer Sciences, Universidad Central de Las Villas 4 .<br />
* Dipankar.Sengupta@vub.ac.be<br />
High throughput screening (HTS) techniques, like genome or exome screening are becoming norms in the conventional<br />
clinical analysis. However, classifying the identified variants to be pathogenic, or potentially pathogenic or nonpathogenic,<br />
is still a manual, tedious and time consuming process for clinicians or geneticists. Thus, to facilitate the<br />
variant classification process, we have developed G E V A CT, a Java based tool, designed on an algorithm, i.e. based on the<br />
existing literature and knowledge of clinical geneticists. G E V A CT can classify variants annotated by Alamut Batch, with<br />
a future plan to support for inputs from other annotation software's also.<br />
INTRODUCTION<br />
With the emergence of new screening techniques, targeted<br />
or whole exome and genome screening are becoming<br />
standard diagnostic norms in clinical settings to identify<br />
the variants for a genetic disease (Ng et al., 2010;<br />
Saunders et al., 2012). However, development of<br />
bioinformatics solutions for pathogenic classification of<br />
the variants still remains a big challenge and henceforth,<br />
making the process ponderous for geneticists and<br />
clinicians. In this work, we describe G E V A CT (Genomic<br />
Variant Classifier Tool), a tool for classification of<br />
genomic single nucleotide and short insertion/deletion<br />
variants. The aim of this study was to design and<br />
implement a variant classification algorithm, based on a<br />
literature review of cardiac arrhythmia syndromes<br />
(Hofman et al., 2013; Schulze-Bahr et al., 2000; Wilde &<br />
Tan, 2007) and existing knowledge of clinical geneticists.<br />
METHODS<br />
The algorithm we propose for G E V A CT is based on a<br />
published variant classification schema for cardiac<br />
arrhythmia syndromes. This approach is based on the yield<br />
of DNA testing over a time span of 15 years (1996-2011),<br />
between probands with isolated/familial cases, and also<br />
between probands with or without clear disease-specific<br />
clinical characteristics (Hofman et al., 2013). It proposes<br />
two varying approaches: one to classify missense variants<br />
and another to classify nonsense and frameshift variants.<br />
The algorithm is implemented in two phases: preprocessing<br />
and classification. In the pre-processing phase,<br />
the annotated tab-delimited variant file (vcf.ann) from the<br />
Alamut batch, is refined based on the gene list for the<br />
disease-of-interest, so as to reduce the number of variants<br />
for the analysis. Filters are applied to look for variants that<br />
have already been reported in the Human Genome<br />
Mutation Database (Stenson et al., 2003) and in ClinVar<br />
(Landrum et al., 2014), or that have previously been<br />
detected and classified in an internal patient population.<br />
And lastly, the variants are filtered based on their location<br />
in the genome and their coding effect, followed by the<br />
check for minor allele frequency of the variant in a control<br />
population (Sherry ST et al. 2001). Thereafter, in the<br />
classification phase, the filtered variants are classified as<br />
missense or nonsense and frameshift variants. For<br />
missense variants the classification is based on the<br />
parameters: amino acid substitution and its impact on<br />
protein function (Adzhubei et al., 2010; Kumar et al.,<br />
2009), biochemical variation (Mathe et al., 2006),<br />
conservation (Pollard et al., 2010), frequency of variant<br />
alleles in a control population (ExAC, <strong>2015</strong>), effects on<br />
splicing (Desmet et al., 2009), family and phenotype<br />
information and functional analysis. Whereas, for the<br />
nonsense and frameshift variants, it is based on: effects on<br />
splicing, frequency of variant alleles in a control<br />
population, family and phenotype information and<br />
functional analysis. For each parameter, a score is given to<br />
the variant, which is subsequently cumulated.<br />
Conclusively, based on the cumulative score each variant<br />
is classified into one of the five categories: Class I - Non-<br />
Pathogenic; Class II - VUS1 (unlikely pathogenic); Class<br />
III - VUS2 (unclear); Class IV - VUS3 (likely<br />
pathogenic); Class V - Pathogenic (Sharon et al., 2008).<br />
RESULTS & DISCUSSION<br />
In this study, we report a Java based tool called G E V A CT,<br />
developed for classification of genomic variants. Input for<br />
the tool is an annotated vcf file, while the output depicts<br />
the cumulative classification score along with the class<br />
label for a variant. The tool was tested on a dataset of 130<br />
cardiac arrhythmia syndrome patients, available at UZ<br />
Brussel. The results of the variant classification made by<br />
the tool were cross-validated by manual curation,<br />
performed by the clinical geneticist. Definitively, the<br />
study indicates the tool to be promising but needs to be<br />
further validated on datasets from other diseases. In<br />
addition to, we are working on the tool to be adaptable for<br />
file inputs from other annotation software.<br />
REFERENCES<br />
Adzhubei IA et al. Nat Methods 7(4), 248-249 (2010).<br />
Desmet et al. Nucleic Acids Res 37 (9): e67 (2009).<br />
Exome Aggregation Consortium (ExAC), Cambridge, MA (<strong>2015</strong>).<br />
Hofman N et al. Circulation 128(14),1513-21 (2013).<br />
Kumar P et al. Nat Protoc 4(7), 1073–1081 (2009).<br />
Landrum MJ et al. Nucleic Acids Res 42(1), D980-5 (2014).<br />
Mathe E et al. Nucleic Acids Res 34(5),1317-25 (2006).<br />
Ng SB et al. Nat Genetics 42, 30–35 (2010).<br />
Pollard K et al. Genome Res 20, 110-121 (2010).<br />
Saunders CJ et al. Sci Transl Med 4, 154ra135 (2012).<br />
Sharon EP et al. Hum Mutat. 29(11), 1282–1291 (2008).<br />
Sherry ST et al. Nucleic Acids Res 29(1),308-11 (2001).<br />
Schulze-Bahr E et al. Z Kardiol 89 Suppl 4:IV12-22 (2000).<br />
Stenson et al. Hum Mutat. 21:577-581 (2003).<br />
Wilde AA & Tan HL Circ J 71, Suppl A:A12-9 (2007).<br />
65