13.07.2015 Views

gene pathway text mining and visualization - Artificial Intelligence ...


selective relations without introducing any errors. To learn the coverage of the FSA, we counted all occurrences of “by,” “of,” and “in,” with a few exceptions such as “in addition,” which the parser explicitly disregards because they result in irrelevant relations. Seventy-seven percent of all “of” prepositions, 29 percent of all “by” prepositions, and 14 percent of all “in” prepositions were correctly captured. This indicates that the OF-FSA is relatively complete for biomedical text. The BY-FSA and IN-FSA cover a smaller portion of the available structures.

3.2.5 Ontology and Concept Space Integration

1. Additional Genescene Components

We parsed more than 100,000 PUBMED abstracts related to p53, ap1, and yeast. The parser processes 15 abstracts per second on a regular desktop computer. We stored all relations and combined them with Concept Space (Chen and Lynch 1992), a co-occurrence based semantic network, in Genescene. The two techniques extract complementary biomedical relations: the parser extracts precise, semantically rich relations, and Concept Space extracts co-occurrence relations. The Gene Ontology (Ashburner et al., 2000), the Human Genome Nomenclature (Wain et al., 2002), and the UMLS were used to tag terms. More than half of the terms received a tag. The UMLS provided most of the tags (57 percent); GO (1 percent) and HUGO (0.5 percent) provided fewer.

2. Results of Ontology Integration

In an additional user study, two researchers evaluated terms and relations from abstracts of interest to them. The results showed very high precision for the terms (93 percent) and the parser relations (95 percent). Concept Space relations with terms found in the ontologies were more precise (78 percent) than those without (60 percent). Terms with more specific tags, e.g., from GO versus the UMLS, were evaluated as more relevant. Parser relations were more relevant than Concept Space relations. Details of this system and study can be found in (Leroy and Chen, in press).

3.2.6 Conclusion

This study described a parser based on closed-class English words that efficiently captures relations between noun phrases in biomedical text. Relations are specified with syntactic constraints and described in FSA but may contain any verb, noun, or noun phrase. On average, the extracted relations are more than 90 percent correct. The parser is very efficient, and larger collections have been parsed and combined with the UMLS, GO, HUGO, and a semantic network called Concept Space. This facilitates integration.
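To make the contrast between the two relation types concrete, the following is a minimal sketch and not the Genescene implementation. It assumes sentences have already been chunked into noun phrases and prepositions; the tag names, the function names `extract_prep_relations` and `cooccurrence_counts`, and the three-state matcher are all hypothetical simplifications. The first function mimics a preposition-centered FSA (noun phrase, then “of,” “by,” or “in,” then noun phrase); the second counts document-level co-occurrence in the spirit of a co-occurrence based network.

```python
from collections import Counter
from itertools import combinations

# Assumed input format (hypothetical): each sentence is a list of
# (text, tag) pairs, where tag is "NP" for a noun phrase, "PREP" for a
# preposition, and "O" for anything else.


def extract_prep_relations(chunks, preps=("of", "by", "in")):
    """Tiny finite-state matcher: NP -> PREP -> NP yields one relation."""
    relations = []
    state, left, prep = "START", None, None
    for text, tag in chunks:
        if tag == "NP":
            if state == "SEEN_PREP":
                # Pattern completed: emit (left NP, preposition, right NP).
                relations.append((left, prep, text))
            # Any noun phrase may open a new pattern.
            state, left = "SEEN_NP", text
        elif tag == "PREP" and text.lower() in preps and state == "SEEN_NP":
            state, prep = "SEEN_PREP", text.lower()
        else:
            # Any other token breaks the pattern.
            state = "START"
    return relations


def cooccurrence_counts(docs_terms):
    """Count how often two distinct terms appear in the same document."""
    counts = Counter()
    for terms in docs_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


sentence = [("inhibition", "NP"), ("of", "PREP"), ("p53", "NP"),
            ("was", "O"), ("observed", "O")]
parser_relations = extract_prep_relations(sentence)

docs = [["p53", "mdm2"], ["p53", "mdm2", "ap1"]]
pair_counts = cooccurrence_counts(docs)
```

The matcher only fires on the exact syntactic pattern, which is why such relations tend to be precise but cover fewer structures; the co-occurrence counter links any two terms that share a document, which is broader but less specific.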
