04.04.2013 Views

Transcriptional Characterization of Glioma Neural Stem Cells Diva ...


5.5 Literature Mining Methods

at the last and least-specific of the databases. This algorithm maximises the probability of finding a reported connection between any gene and any disease, provided that minor disease-specific details are changed, such as the identity of the first database, which needs to be the most disease-specific of the six.

If we call "case" the code that searches one database for an association between the queried gene and glioblastoma, and "iteration" the search loop completed from the first to the last database, then during each case a weighted index is accumulated to become, at the end of one iteration, a total weighted index, i.e. a score of how well that particular gene did in the search for its association with glioblastoma across the six databases. The total weighted index therefore reflects the look-up parameters that yielded the most favourable results: which database found an association between the gene and glioblastoma, how many hits were found, and in which part of the paper the hits were found. I assigned a greater weight to associations found in the title than to those found in the abstract, and a greater weight to those found in the abstract than to those found in the main text of a paper. In designing the weight structure I also decided to favour associations found in the same sentence over those found more than one sentence apart. The latter carry barely any weight in the accumulation of the total weighted index.
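The accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in this work: the numeric weights, the `Hit` record and the function names are hypothetical, and the text fixes only the ordering of the weights (title above abstract above main text, same sentence far above distant mentions). Any weighting by database is left out because the text does not specify it.

```python
from dataclasses import dataclass

# Hypothetical weights: only their ordering is taken from the text.
SECTION_WEIGHT = {"title": 3.0, "abstract": 2.0, "body": 1.0}
SAME_SENTENCE_WEIGHT = 1.0
DISTANT_WEIGHT = 0.05  # distant co-mentions barely contribute


@dataclass
class Hit:
    section: str         # "title", "abstract" or "body"
    same_sentence: bool  # gene and disease named in the same sentence?


def case_index(hits):
    """Weighted index for one case (one gene searched in one database)."""
    total = 0.0
    for hit in hits:
        proximity = SAME_SENTENCE_WEIGHT if hit.same_sentence else DISTANT_WEIGHT
        total += SECTION_WEIGHT[hit.section] * proximity
    return total


def total_weighted_index(hits_per_database):
    """Sum the case indices over one iteration (all databases, in order)."""
    return sum(case_index(hits) for hits in hits_per_database)
```

With these assumed weights, a single same-sentence hit in a title outweighs a same-sentence hit in an abstract, while a distant co-mention in the main text adds almost nothing to the total.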

In this particular application it was also of interest to know whether the queried gene was implicated in any type of cancer, especially when the literature did not report an association with glioblastoma. Therefore, at every iteration, the databases are also queried with parameters that answer the question "Is this gene implicated in any cancer?", and an index separate from the total weighted index is determined. To calculate this cancer index I searched for an association between the queried gene and any of the cancers listed in a database that I compiled for this application. The cancer database contains, as a unique set of names, all the cancers listed in the National Cancer Institute A to Z List of Cancers [206]. A search was performed in PubMed for every combination of queried gene symbol and cancer name from the compiled cancer database and, if the two terms appeared in the same sentence, this was considered an association and the cancer index for that combination was incremented by one. At the end of the search, every gene-cancer combination possessed a cancer index, and the association was reported if the value of that index was above an arbitrarily chosen threshold.
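The same-sentence counting behind the cancer index might look like the following Python sketch. It rests on several assumptions: the texts are supplied as plain strings (no PubMed access is shown), the sentence splitter is deliberately naive, and all function names are hypothetical rather than taken from the original implementation.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def mentions(term, sentence):
    """Case-insensitive match of the whole term, not of a substring of a longer word."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def cancer_indices(gene, cancer_names, texts):
    """For each cancer name, count the sentences naming both the gene and that cancer."""
    index = Counter()
    for text in texts:
        for sentence in split_sentences(text):
            if not mentions(gene, sentence):
                continue
            for cancer in cancer_names:
                if mentions(cancer, sentence):
                    index[cancer] += 1
    return index


def reported_associations(index, threshold):
    """Keep only the cancers whose index is above the chosen threshold."""
    return {cancer: count for cancer, count in index.items() if count > threshold}
```

A call such as `reported_associations(cancer_indices(gene, names, texts), threshold)` would then return the gene-cancer associations to report; the threshold itself remains an arbitrary choice, as in the text.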

