12.07.2015 Views

View - ResearchGate



Mining Biomedical Data Using MMTx and UMLS 167

                String input = (String) it.next();
                ArrayList al = (ArrayList) inputphrases.get(input);
                for (Iterator it2 = al.iterator(); it2.hasNext();) {
                    it2.next(); // advance the iterator so the loop terminates
                    // rif is the shared lock; only one thread calls MMTx at a time
                    synchronized (rif) { myMMTx.processDocument(input); }
                }
            }
            myMMTx.cleanup();
        } catch (Exception e) {
            // Error handling code here
        }
    }
}

3.6.5. MMTx Wrapper (Non-Java Programmer Option)

This is similar to running MMTx on the command line, except that MMTx is wrapped by an external program that processes the output as it is generated. This gives significantly more power than running MMTx on the command line and allows any language to be used as the processing tool. The disadvantage is the extra programming work involved. The options are too varied and will not be covered here.

3.7. Filtering and Reprocessing Preliminary Results (Command Line)

The output generated by MMTx can be quickly filtered on a UNIX system (see Note 4) with the following command:

    cat outputfile.txt | grep -P 'C0153594|C0855197|Section' > results.txt

This leaves only the original geneRIF input (the MMTx Section) and, below it, any hits for "Testicular malignant germ cell tumor" and "Malignant neoplasm of testis". A visual inspection at this point may reveal problems with the current filtering (Figs. 4 and 5). For instance, calcium ions (Ca2+) are parsed into a Ca token that is recognized as cancer, so a geneRIF discussing calcium and the testes may be flagged as cancer accidentally. This use of abbreviations can be turned off (the --no_acros_abbrs flag), but more may be lost than gained. It is up to the researcher to customize the filtering for the data set; it will be an interactive learning process.

3.8. Analysis

Eventually, the value gained from handling exceptionally bad mapping cases will be less than the effort required to handle them. At that point the user is done. An additional step a user may want to take is to run the data through NegEx (7), which detects negation expressions in text. For instance, the program outlined earlier will map all geneRIFs mentioning testicular data. This means that geneRIFs to the effect of "This gene is not involved in testicular cancer" will show an association. NegEx can detect and remove these.
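
For readers who prefer to stay in Java, the command-line filter of Section 3.7 can be approximated in a few lines. The following is only a minimal sketch: the class name CuiFilter, the method filter, and the sample lines are hypothetical and are not part of MMTx, and the sample lines do not reproduce the real MMTx output format. It keeps every line that contains one of the two concept identifiers or the word "Section", as the grep command does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MMTx: a line-based filter that mirrors
// grep -P 'C0153594|C0855197|Section'
public class CuiFilter {
    // Same alternation as the grep command: two concept IDs plus the Section marker.
    private static final Pattern KEEP =
        Pattern.compile("C0153594|C0855197|Section");

    /** Returns the lines that match the pattern anywhere, in their original order. */
    public static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (KEEP.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up lines for illustration only; real MMTx output looks different.
        List<String> sample = Arrays.asList(
            "Section: example geneRIF text",
            "hypothetical hit C0153594",
            "hypothetical hit C0000000",
            "unrelated line");
        for (String line : filter(sample)) {
            System.out.println(line);
        }
    }
}
```

Keeping the filter in code rather than in a shell pipeline makes it easier to add data-set-specific rules later, such as skipping hits that arise only from an abbreviation like Ca.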
