13.07.2015 Views

gene pathway text mining and visualization - Artificial Intelligence ...



MEDICAL INFORMATICS

3.2 GeneScene Parser

This case study discusses the GeneScene parser, which extracts relations between noun phrases and is tuned for biomedical text.

3.2.1 Extracting Semantic Elements

The parser begins by formatting, tokenizing, and tagging the PUBMED abstract with part-of-speech and noun phrase tags. The abstracts are prepared by removing phrases that refer to publisher and copyright information. The sentence splitter is then run, followed by the AZ Noun Phraser (Tolle and Chen, 2000) to extract noun phrases. Verbs and adverbs are tagged with their part-of-speech (POS) based on a rule set and lexical lookup in the UMLS Specialist Lexicon. Closed class words such as prepositions, negation, conjunctions, and punctuation are also tagged. Nouns and noun phrases are checked for nominalizations. When a nominalization is discovered, e.g., “activation,” both the infinitive and the original nominalization are retained. Nominalizations can be replaced by the infinitive to facilitate text mining and visualization.

3.2.2 Extracting Structural Elements

Relations have a syntactic basis: they are built around basic sentence structures and prepositions. Prepositions were chosen because they form a closed class and can help capture the structure of a sentence. Membership of a closed class does not change, which allows us to build very specific but semantically generic relation templates. In addition, prepositions often head phrases (Pullum and Huddleston, 2002) and indicate different types of relations, such as time or spatial relations (Manning and Schütze, 2001). Although prepositional attachment ambiguity may become a problem, we believe that researchers in biomedicine use a common writing style, so the attachment structures will not vary much for a specific structure.

This case study describes relations built around three prepositions, “by,” “of,” and “in,” which occur frequently in text and lead to interesting, diverse biomedical relations. “By” is often used to head complements in passive sentences, for example in “Mdm2 is not increased by the Ala20 mutation.” “Of” is one of the most highly grammaticised prepositions (Pullum and Huddleston, 2002) and is often used as a complement, for example in “the inhibition of the activity of the tumor suppressor protein p53.” “In” is usually an indication of location. It forms interesting relations when combined with verbs, for example in “Bcl-2 expression is inhibited in precancerous B cells.”
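
As a rough illustration of how preposition-based relation templates of this kind might look, the sketch below matches the three example phrases with regular expressions over raw strings. It is not the GeneScene implementation: the parser described here works over POS and noun phrase tags and the UMLS Specialist Lexicon, whereas the names PASSIVE_TEMPLATE, OF_TEMPLATE, NOMINALIZATIONS, and extract_relation are hypothetical and chosen only for this example.

```python
import re

# Hypothetical stand-in for a lexicon lookup that maps a nominalization
# to its infinitive (the parser is described as using the UMLS
# Specialist Lexicon; this tiny table is only an assumption).
NOMINALIZATIONS = {
    "activation": "activate",
    "inhibition": "inhibit",
}

# Toy template for passive clauses headed by "by" or "in":
#   <left> is [not] <verb>ed by|in <right>
PASSIVE_TEMPLATE = re.compile(
    r"^(?P<left>.+?) (?:is|are|was|were) (?P<neg>not )?"
    r"(?P<verb>\w+ed) (?P<prep>by|in) (?P<right>.+?)\.?$"
)

# Toy template for "of" complements: [the] <nominal> of <right>
OF_TEMPLATE = re.compile(r"^(?:the )?(?P<nominal>\w+) of (?P<right>.+?)\.?$")


def extract_relation(text):
    """Return a dict describing one relation, or None if no template fits."""
    text = text.strip()

    match = PASSIVE_TEMPLATE.match(text)
    if match:
        return {
            "preposition": match.group("prep"),
            "left": match.group("left"),
            "verb": match.group("verb"),
            "negated": match.group("neg") is not None,
            "right": match.group("right"),
        }

    match = OF_TEMPLATE.match(text)
    if match:
        nominal = match.group("nominal")
        return {
            "preposition": "of",
            # Keep the original nominalization and, when the toy table
            # knows it, the infinitive as well.
            "left": nominal,
            "verb": NOMINALIZATIONS.get(nominal),
            "negated": False,
            "right": match.group("right"),
        }

    return None


if __name__ == "__main__":
    examples = [
        "Mdm2 is not increased by the Ala20 mutation.",
        "the inhibition of the activity of the tumor suppressor protein p53",
        "Bcl-2 expression is inhibited in precancerous B cells.",
    ]
    for example in examples:
        print(extract_relation(example))
```

Because prepositions form a closed class, a small fixed set of such templates can stay specific in form while remaining generic about what fills the left and right slots. Plain regular expressions cannot resolve prepositional attachment ambiguity, so this sketch only handles the simple structures shown in the examples.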
