12.07.2015 Views

View - ResearchGate


Mining Biomedical Data Using MMTx and UMLS

2. Materials

Data mining with MMTx requires a host machine with at least a 1 GHz CPU, 1 GB of RAM, and 3–4 GB of hard disk space for most use cases. If the user is going to use custom data sets (see Subheading 3.1.), then the UMLS Metathesaurus and Metamorphosys will also need to be installed, raising the hard disk requirement to approximately 40 GB.

UMLS Metamorphosys and MMTx are both Java-based programs and require a Java virtual machine (JVM) to run. They have been tested with JVMs for Windows (XP, 2000, and NT), Linux, Solaris (8 and 9), and Macintosh OS X 10.3 or higher. Up-to-date requirements for installing the UMLS Metathesaurus can be found in the README.txt file included with each distribution of UMLS. The ideal running environment for ease of setup is probably one of the non-Windows systems with Java already installed, and the command line examples are designed for a UNIX-like system. However, if NegEx (a program that detects negations of concepts in text mining) is going to be used, then a Windows system is required.

Owing to their large size, UMLS data and programs are easiest to obtain over a fast Internet connection; otherwise, UMLS can be ordered in DVD format. No Internet connection is required while Metamorphosys or MMTx is running. Users should also have their own data set in electronic format on the same machine on which UMLS is installed. UMLS does not need to be loaded into a relational database to be used, but if this is desired, then the host machine should have either MySQL or Oracle installed.

3. Methods

3.1. Determining the Suitability of UMLS for Input Data Set

Broadly speaking, the UMLS is organized by vocabulary, by semantic type, and by individual atomic UMLS concepts. Each vocabulary in UMLS is an organized set of concepts and relationships. Semantic types span vocabularies in UMLS and were created to categorize all concepts represented in the UMLS. They include general categories such as "drugs" and "congenital abnormalities" that are commonly found in UMLS vocabularies. At the lowest level, the UMLS has concepts, which describe the narrowest entities, such as a particular drug or a specific disease.

The first step in determining UMLS suitability is to decide what, in UMLS terms, the input data should be matched against. First-time users will likely use MMTx to search against one of the preconfigured data sets of UMLS and then filter their matches against a particular vocabulary, semantic type, or small concept set to get the results needed. UMLS is preconfigured to make available most of its English language vocabulary sources. The only exceptions are listed in Table 1 and are due to licensing restrictions set in place by the American Medical Association (AMA).
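The filtering step described above (narrowing matches by vocabulary, semantic type, or a small concept set) can be illustrated with a minimal Python sketch. It does not reproduce MMTx's actual output format or API: the Match record, the filter_matches function, and all identifiers and vocabulary names in the sample data are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Match:
    """One concept match. Hypothetical record, not MMTx's real output format."""
    concept_id: str                 # placeholder identifier, not a real UMLS CUI
    name: str                       # matched concept name
    semantic_types: FrozenSet[str]  # semantic types assigned to the concept
    vocabularies: FrozenSet[str]    # source vocabularies containing the concept


def filter_matches(
    matches: Iterable[Match],
    vocabulary: Optional[str] = None,
    semantic_type: Optional[str] = None,
    concept_ids: Optional[Set[str]] = None,
) -> List[Match]:
    """Keep only the matches that satisfy every filter that was supplied."""
    kept = []
    for match in matches:
        if vocabulary is not None and vocabulary not in match.vocabularies:
            continue
        if semantic_type is not None and semantic_type not in match.semantic_types:
            continue
        if concept_ids is not None and match.concept_id not in concept_ids:
            continue
        kept.append(match)
    return kept


# Made-up sample data; the IDs and vocabulary names are placeholders.
matches = [
    Match("X001", "aspirin", frozenset({"drug"}),
          frozenset({"VOCAB_A", "VOCAB_B"})),
    Match("X002", "cleft palate", frozenset({"congenital abnormality"}),
          frozenset({"VOCAB_A"})),
    Match("X003", "ibuprofen", frozenset({"drug"}),
          frozenset({"VOCAB_B"})),
]

drugs = filter_matches(matches, semantic_type="drug")
drugs_in_a = filter_matches(matches, vocabulary="VOCAB_A", semantic_type="drug")
small_set = filter_matches(matches, concept_ids={"X002"})
```

The three filters are combined with a logical AND, so supplying both a vocabulary and a semantic type narrows the result further. Leaving a filter as None disables it, which mirrors the choice of filtering on only one of the three levels.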
