12.07.2015 Views

View - ResearchGate

Mining Biomedical Data Using MMTx and UMLS

Separator='|') and the text of interest is found in the fifth field (--textField=5). The UMLS concept unique identifiers are displayed (--show_cuis), which can come in handy when mapping results across vocabularies. Neither the list of candidates for a mapping (-c=false) nor the intermediate mappings (-m=false) are displayed, which reduces the volume of output directed to outputfile.txt.

The abbreviations for semantic types are listed at http://mmtx.nlm.nih.gov/semanticTypes.shtml; the numerical and full-length formats cannot be used when specifying a semantic type. Keep in mind when running MMTx that it is CPU bound and usually has a relatively long running time, so it is worthwhile to examine the early results of a run by looking at outputfile.txt to ensure that the results are useful. If machine processing is desired at a later point, the -f (fielded output) or -q (machine output) options can be used. However, neither option offers any flexibility in customizing the output, and neither includes the concept unique identifier (CUI) for the actual mapping result, so both might be of limited utility.

3.6.4. Java API (Java Programmers Only)

Using the Java API is the optimal way of handling MMTx. With a little work, the processing of the input data can be precisely controlled, including the use of any other metadata from the data source at processing time when evaluating mapping candidates. It also allows an exact specification of the output format for easy analysis. A description of the API can be found at http://mmtx.nlm.nih.gov/MMTxAPI_V2.3.pdf. Below is a template for using the Java API to process geneRIFs, consisting of two separate source files. These are also available on the web at: http://download.bioinformatics.northwestern.edu/download/mmtx/mmtx_java_example_template.tar.gz

The first file (MMTxGeneRIF.java) defines a subclass of MMTxAPILite that will handle the processing.
It is in this file that one's evaluation of the input phrases should occur, because at this point one has access to the scoring results in MMTx plus any additional metadata in the input phrase that one wants to use in evaluating the candidate. The text in bold, "Undesired phrase here", can be replaced with whatever is appropriate to remove undesired mappings. The last lines of this file print out a candidate CUI, mapped phrase, and score. Adding an empirically derived cutoff score to remove bad mappings can further reduce the candidates, and the output format can also be adjusted here.

MMTxGeneRIF.java

import java.util.*;
import gov.nih.nlm.nls.nlp.textfeatures.*; // Included in MMTx.jar
import gov.nih.nlm.nls.mmtx.MMTxAPILite;
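
The filtering logic described above (taking the text of interest from the fifth field of a '|'-separated record, dropping an undesired phrase, and applying a cutoff score) can be illustrated without MMTx itself. The following is a minimal sketch and not the template's code: the class GeneRifFilterSketch, its Candidate type, the sample record, the placeholder CUIs, and the cutoff value are all hypothetical, and it assumes that a higher score means a better match. In a real subclass of MMTxAPILite, the candidate values would come from the MMTx API rather than from hand-built objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GeneRifFilterSketch {

    /** Hypothetical stand-in for one mapping candidate; this is NOT an MMTx class. */
    static final class Candidate {
        final String cui;     // concept unique identifier (placeholder values below)
        final String phrase;  // mapped phrase
        final int score;      // assumption: higher means a better match

        Candidate(String cui, String phrase, int score) {
            this.cui = cui;
            this.phrase = phrase;
            this.score = score;
        }
    }

    /** Returns the 1-based field of a '|'-separated record, or null if it does not exist. */
    static String field(String record, int oneBasedIndex) {
        String[] parts = record.split("\\|", -1); // -1 keeps trailing empty fields
        if (oneBasedIndex < 1 || oneBasedIndex > parts.length) {
            return null;
        }
        return parts[oneBasedIndex - 1];
    }

    /** Keeps candidates that are not the undesired phrase and that reach the cutoff score. */
    static List<Candidate> filter(List<Candidate> candidates, String undesiredPhrase, int cutoff) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            boolean undesired = c.phrase.equalsIgnoreCase(undesiredPhrase);
            if (!undesired && c.score >= cutoff) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up record for illustration; only the position of the text (field 5) matters.
        String record = "field1|field2|field3|field4|example geneRIF text";
        System.out.println("Text of interest: " + field(record, 5));

        // Hand-built candidates with placeholder CUIs; real ones would come from MMTx.
        List<Candidate> candidates = Arrays.asList(
                new Candidate("C0000001", "example concept", 900),
                new Candidate("C0000002", "Undesired phrase here", 950),
                new Candidate("C0000003", "weak match", 500));

        // The cutoff of 800 is arbitrary; the text recommends deriving it empirically.
        for (Candidate c : filter(candidates, "Undesired phrase here", 800)) {
            System.out.println(c.cui + "|" + c.phrase + "|" + c.score);
        }
    }
}
```

Keeping the phrase check and the cutoff in one small method makes it easy to adjust either criterion, or the printed output format, in a single place. If the scores in use run in the opposite direction, the comparison against the cutoff would need to be reversed.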
