12.07.2015 Views

View - ResearchGate


154 Osborne et al.

are included in the 2006 release of UMLS, including widely used vocabularies such as SNOMED, International Statistical Classification of Diseases and Related Health Problems (ICD9), Medical Subject Headings (MeSH), the Foundational Model of Anatomy, and many others. The NLM is making an effort to cover the biomedical domain as widely as possible, so in addition to the standard medical vocabularies, vocabularies covering drug codes, chemicals, adverse reactions, and nursing care standards are also included. In general, the coverage is large enough that most researchers should be able to find the commonly needed systems and concepts required to map free text for their problem domain.

A detailed understanding of MMTx is not required to use it, but understanding the process helps in getting the best possible results. A more detailed and extensive description can be found on the documents page (http://mmtx.nlm.nih.gov/docs.shtml), but the salient points are summarized herein. Figure 1 outlines the steps taken by MMTx as it maps components of free text to candidate concepts.

First, the tokenization module organizes the input document into sections consisting of sets of sentences and tokens. This tokenizer automatically recognizes the MEDLINE format (available for PubMed articles through NCBI) or free text, so in most cases users will need to do little, if any, formatting of input data before running MMTx. The Part of Speech Tagger Client (2) then “tags” the tokens to identify which part of speech (such as a noun) each token belongs to. These tagged tokens are then subject to LexicalLookup, the module that determines whether any of the tagged elements belong to a particular lexicon. Adjacent tokens that are part of the same lexicon (for instance “July” and “5th”) can then be treated as a single lexical element. A noun phrase parser then identifies noun phrases from these elements, for which variants are calculated by table lookup.
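The stages described so far (tokenization, part-of-speech tagging, lexical lookup, and noun phrase parsing) can be illustrated with a small Python sketch. This is not MMTx code and does not use its API: the function names, the two-entry lexicon, and the tiny part-of-speech table are all hypothetical stand-ins chosen only to show how the stages feed one another. Variant generation and candidate scoring are left out.

```python
import re

# Hypothetical toy lexicon of two-token entries (not the lexicon MMTx uses).
LEXICON = {("july", "5th"), ("heart", "attack")}

# Hypothetical toy part-of-speech table; unlisted words are tagged "other".
POS = {"patient": "noun", "heart": "noun", "attack": "noun",
       "severe": "adj", "july": "noun", "5th": "adj"}


def tokenize(text):
    """Split text into sentences, each represented as a list of word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"\w+", s) for s in sentences if s]


def tag(tokens):
    """Attach a part-of-speech label to every token."""
    return [(t, POS.get(t.lower(), "other")) for t in tokens]


def lexical_lookup(tagged):
    """Merge adjacent tokens that form a lexicon entry into one element."""
    merged, i = [], 0
    while i < len(tagged):
        pair = tuple(t.lower() for t, _ in tagged[i:i + 2])
        if len(pair) == 2 and pair in LEXICON:
            # Assumption: a merged entry is treated as a noun.
            merged.append((tagged[i][0] + " " + tagged[i + 1][0], "noun"))
            i += 2
        else:
            merged.append(tagged[i])
            i += 1
    return merged


def noun_phrases(elements):
    """Collect maximal adjective/noun runs that contain at least one noun."""
    phrases, run = [], []
    # The sentinel element flushes the final run.
    for text, pos in elements + [("", "other")]:
        if pos in ("adj", "noun"):
            run.append((text, pos))
        else:
            if any(p == "noun" for _, p in run):
                phrases.append(" ".join(t for t, _ in run))
            run = []
    return phrases


text = "The patient had a severe heart attack on July 5th."
for sentence in tokenize(text):
    print(noun_phrases(lexical_lookup(tag(sentence))))
# prints ['patient', 'severe heart attack', 'July 5th']
```

In this toy version "July" and "5th" end up as a single lexical element, mirroring the example in the text; a real system would draw both the lexicon and the tags from far larger resources.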
These variants are then used to identify matching strings from the UMLS Metathesaurus, termed candidates. Each candidate is evaluated and assigned a score based on the extent of contiguity, central component involvement, cohesiveness, word order, and other factors. The final mapping module generates a list of UMLS Metathesaurus concepts that best cover the input noun phrase, with associated scores representing the mapping result for the input text.

The number of applications for a tool like MMTx is enormous. It can be used for data mining and inferring relationships between concepts in MEDLINE publications and other published data, and is in general appropriate for any task that requires the transformation of free-text biomedical data into categorized, comparable biomedical information. Published examples include extracting information about medical problems from clinical reports (3), detecting respiratory illness in patients from emergency department reports (4), and annotating enzyme classes with disease-related information (5). Although not always easy to
