28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management



Mining Clinical, Immunological, and Genetic Data

discretized the age information into buckets: the first bucket ranges from 1 to 30 years, and each of the following buckets spans 5 years.
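The bucketing rule can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `age_bucket` is hypothetical, and labelling each bucket by its upper bound is an assumption suggested by the values 30 and 35 in Table 1. Whether the bucket boundaries are inclusive is also assumed.

```python
import math


def age_bucket(age: int, first_upper: int = 30, width: int = 5) -> int:
    """Return the label of the age bucket that contains `age`.

    Assumption: a bucket is labelled by its (inclusive) upper bound,
    so ages 1..30 map to 30, ages 31..35 map to 35, and so on.
    """
    if age < 1:
        raise ValueError("age must be at least 1")
    if age <= first_upper:
        return first_upper
    # Buckets after the first one have a fixed width of `width` years.
    steps = math.ceil((age - first_upper) / width)
    return first_upper + steps * width


if __name__ == "__main__":
    for age in (1, 30, 31, 35, 36):
        print(age, "->", age_bucket(age))
```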

An important problem addressed in this phase was that of homozygous alleles, i.e., the situation in which the two alleles for a given locus are identical. Since we are dealing with information structured in transactions, i.e., plain sets and not bags, no duplicate items are admitted. Therefore, in order not to lose information about homozygous alleles, we created a special item for each locus indicating the presence of homozygous alleles.
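The effect of the special item can be illustrated with a short sketch. The helper `encode_locus` and the numeric scheme (a base offset per locus, with suffix 99 marking homozygosity) are assumptions inferred from Table 1 and Example 1, where 2401 and 2499 appear for the DRB1 locus; the text does not state the actual coding scheme.

```python
from typing import Set

HOMOZYGOUS_SUFFIX = 99  # assumed from the item code 2499 for DRB1


def encode_locus(base: int, allele_1: int, allele_2: int) -> Set[int]:
    """Encode the two alleles of one locus as a set of item codes.

    `base` is a hypothetical per-locus offset (e.g. 2400 for DRB1).
    A set cannot hold the same allele twice, so a homozygous locus
    gets an extra marker item instead of a duplicate.
    """
    items = {base + allele_1, base + allele_2}
    if allele_1 == allele_2:
        items.add(base + HOMOZYGOUS_SUFFIX)
    return items


if __name__ == "__main__":
    # Without the marker, the duplicate allele would simply collapse.
    print({2401, 2401})              # a single item survives
    print(encode_locus(2400, 1, 1))  # allele item plus marker item
    print(encode_locus(2000, 24, 25))
```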

Following the physicians’ indications, we then divided the pathologies into two main groups, with a total of three item codes:

• Hepatic cirrhosis due to viral infection:
– one item code for HCV+;
– one item code for HBV+;
• Hepatic cirrhosis of non-viral origin (mainly autoimmune): a single item code for cryptogenetic cirrhosis, primary biliary cirrhosis, necrosis VBI and VBE, alcoholic cirrhosis, HCC, etc.

Example 1. Table 1 shows a sample of the input file accepted by the algorithm. The file is composed of several transactions (sets of integers), which can have different lengths: a patient can have different types of cirrhosis at the same time (this is the case of the patient in the third transaction, who has two disease codes, 1002 and 1004), or may have some genetic variables missing.

The first transaction in Table 1 describes a female patient, up to 30 years old, with autoimmune cirrhosis (code 1004) and the following HLA characterization: $A_1 = 24$, $A_2 = 25$, $B_1 = 8$, $B_2 = 18$, $DRB1_1 = DRB1_2 = 1$. Here DRB1 is a homozygous locus, i.e., it has the same value on each of the two chromosomes; this is represented by the presence of the item code 2499.
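To make the reading of this transaction concrete, the sketch below decodes it back into its fields. The layout is an assumption inferred only from Example 1 and Table 1 (first item a sex code, second item the age bucket, codes 1000 to 1999 for diseases, offsets 2000, 2200 and 2400 for the loci A, B and DRB1, suffix 99 for homozygosity). The names `decode_transaction` and `LOCUS_BASES` are hypothetical.

```python
# Hypothetical per-locus offsets, inferred from the first row of Table 1.
LOCUS_BASES = {2000: "A", 2200: "B", 2400: "DRB1"}
HOMOZYGOUS_SUFFIX = 99  # assumed from the item code 2499


def decode_transaction(line: str) -> dict:
    """Split one whitespace-separated transaction into labelled fields."""
    items = [int(token) for token in line.split()]
    record = {
        "sex_code": items[0],
        "age_bucket": items[1],
        "diseases": [],
        "alleles": {},
        "homozygous": [],
    }
    for item in items[2:]:
        if 1000 <= item < 2000:
            record["diseases"].append(item)
            continue
        base = item - item % 200  # assumed block of 200 codes per locus
        locus = LOCUS_BASES.get(base)
        if locus is None:
            raise ValueError(f"unrecognised item code: {item}")
        value = item - base
        if value == HOMOZYGOUS_SUFFIX:
            record["homozygous"].append(locus)
        else:
            record["alleles"].setdefault(locus, []).append(value)
    return record


if __name__ == "__main__":
    print(decode_transaction("2 30 2024 2025 2208 2218 2401 2499 1004"))
```

Under these assumptions the first row decodes to alleles 24 and 25 for A, 8 and 18 for B, a single value 1 for DRB1 with the homozygous marker set, and disease code 1004, in agreement with the description above.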

We ran the software on the input database and produced the frequent closed itemsets. After the mining phase we performed a post-processing phase, in which the extracted patterns were automatically selected w.r.t. their interestingness. Here, by interesting pattern we mean a pattern with an exceptionally high frequency in the patients database w.r.t. the frequency of the same pattern, without disease information, in the control database. In fact, this situation would correspond to a possible association between a specific pattern and a specific class of diseases.
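One way to express this comparison is as a ratio of relative frequencies. The sketch below is only an illustration: the text does not specify the exact interestingness measure or any threshold, so the ratio, the function names, and the toy databases (invented here purely for demonstration) are all assumptions.

```python
def relative_frequency(itemset, database):
    """Fraction of transactions in `database` that contain `itemset`."""
    if not database:
        return 0.0
    hits = sum(1 for transaction in database if itemset <= transaction)
    return hits / len(database)


def frequency_ratio(pattern, disease_code, patients, controls):
    """Compare pattern + disease in patients with the bare pattern in controls.

    Assumption: a plain ratio is used as the measure; a large value
    marks a candidate association between pattern and disease.
    """
    patient_freq = relative_frequency(pattern | {disease_code}, patients)
    control_freq = relative_frequency(pattern, controls)
    if control_freq == 0.0:
        return float("inf") if patient_freq > 0 else 0.0
    return patient_freq / control_freq


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the study).
    patients = [{2024, 1004}, {2024, 1004}, {2031, 1002}, {2024, 2208, 1004}]
    controls = [{2024}, {2031}, {2208}, {2001}]
    print(frequency_ratio({2024}, 1004, patients, controls))
```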

Table 2 reports some of the results obtained with this method. In the second column we have the pattern which, if taken together with the disease code in the third column, constitutes a frequent closed itemset in the patients

Table 1. A sample of the input database

2 30 2024 2025 2208 2218 2401 2499 1004
1 30 2001 2002 2203 2297 2403 2499 1004
2 35 2024 2031 2214 2251 2414 2404 1004 1002
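The notion of a frequent closed itemset can be illustrated on these three rows with a brute-force sketch. It is not the mining algorithm used by the authors, and the minimum support of 2 is an arbitrary choice for the illustration. It relies on the fact that every intersection of a group of transactions is a closed itemset, and it enumerates all groups, which is only feasible for very small inputs.

```python
from itertools import combinations

# The three transactions of Table 1, as sets of integer item codes.
TABLE_1 = [
    frozenset({2, 30, 2024, 2025, 2208, 2218, 2401, 2499, 1004}),
    frozenset({1, 30, 2001, 2002, 2203, 2297, 2403, 2499, 1004}),
    frozenset({2, 35, 2024, 2031, 2214, 2251, 2414, 2404, 1004, 1002}),
]


def closed_itemsets(transactions, min_support):
    """Return {closed itemset: support} for itemsets meeting min_support.

    Brute force: intersect every non-empty group of transactions.
    Exponential in the number of transactions, so for illustration only.
    """
    closed = {}
    for size in range(1, len(transactions) + 1):
        for group in combinations(transactions, size):
            candidate = frozenset.intersection(*group)
            if not candidate:
                continue
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                closed[candidate] = support
    return closed


if __name__ == "__main__":
    for itemset, support in closed_itemsets(TABLE_1, min_support=2).items():
        print(sorted(itemset), support)
```

With a minimum support of 2, the sample yields three closed itemsets: {1004} with support 3, and {30, 1004, 2499} and {2, 1004, 2024} with support 2 each.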
