FY2010 - Oak Ridge National Laboratory


Director’s R&D Fund
Ultrascale Computing and Data Science

for Medicine using SEmi-supervised Learning (DAMSEL),” we (1) developed an analytical, automated learning framework and tools for processing multimodality medical data (text and images) for the purpose of data mining and assessment; (2) improved the performance, portability, and scalability of this computational framework by leveraging available intelligent software and hardware computing resources and adding functionality to the system; and (3) conducted preliminary validation of its performance on medical data. These aims were accomplished in the context of two biomedical applications: breast cancer (mammography: imaging, pathologies, and text reports) and abdominal aortic aneurysm (AAA) (using 3D imaging modalities, such as MRI, in addition to surgical and clinical notes from the patient record). In addition, initial exploration of data for traumatic brain injury (TBI) [using Joint Theater Trauma Registries (JTTR) and related TBI source data] was conducted.

Mission Relevance

DAMSEL can facilitate the development of more powerful analytical tools by leveraging all of the available data more effectively than present approaches permit. These technologies are at the forefront of systems-medicine application development and require computationally intensive environments for processing. Potential sponsors for follow-on funding include the National Institutes of Health (NIH) and the Department of Defense (DoD), which have open program announcements and planned initiatives in these areas. For example, NIH’s funding opportunity PAR-09-218, Innovations in Biomedical Computational Science and Technology, will support research on tools for data acquisition, archiving, querying, retrieval, visualization, integration, and management; platform-independent translational tools for data exchange and for promoting interoperability; and analytical and statistical tools for interpretation of large data sets. This project is also consistent with DOE programs such as Mathematical, Information, and Computational Sciences (KJ.01.00.00.0); Computer Science (KJ.01.01.02.0); and Computational Partnerships (KJ.01.01.03.0). Other DOE mission areas that stand to benefit include national security, intelligence, and biosurveillance applications.

Results and Accomplishments

The project created a multimodal learning framework and tools for the analysis of mammography images and reports and of AAA images and reports. We developed a semi-supervised machine learning framework that integrates the text and image modalities by transforming an image feature vector produced through image processing into a lower-dimensional space that is smooth with respect to the problem-specific similarities described in the text reports. DAMSEL supports combining image and text modalities in previously unavailable ways, and the framework is general enough to combine any number of modalities that represent different views of a medical problem. When the secondary modality set is engineered to consist of features representative of the target problem, the framework can be dramatically effective, as demonstrated by improvements over state-of-the-art results.
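As a rough illustration of this idea (not the project's implementation), the sketch below assumes synthetic image and text feature matrices and a Locality Preserving Projection-style formulation: a similarity graph is built from the text-report vectors, and a linear projection of the image features is learned so that the resulting low-dimensional embedding varies smoothly over that text-derived graph. All names, dimensions, and data are placeholders.

```python
"""Minimal sketch: embed image features into a low-dimensional space that is
smooth with respect to a text-derived similarity graph (synthetic data)."""
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Hypothetical inputs: one image feature vector and one text feature vector
# (e.g., TF-IDF of the report) per study.
n_studies, d_image, d_text, d_embed = 200, 64, 300, 5
X_image = rng.normal(size=(n_studies, d_image))   # image-processing features
X_text = rng.random(size=(n_studies, d_text))     # text-report features

# 1. Text-derived affinity: cosine similarity between report vectors,
#    keeping only the strongest links.
T = X_text / np.linalg.norm(X_text, axis=1, keepdims=True)
W = T @ T.T
np.fill_diagonal(W, 0.0)
W[W < np.quantile(W, 0.90)] = 0.0

# 2. Graph Laplacian of the text-similarity graph.
D = np.diag(W.sum(axis=1))
L = D - W

# 3. Linear projection P of image features that is smooth on the text graph:
#    minimize tr(P^T X^T L X P) subject to P^T X^T D X P = I.
X = X_image - X_image.mean(axis=0)
A = X.T @ L @ X
B = X.T @ D @ X + 1e-6 * np.eye(d_image)          # small ridge for stability
eigvals, eigvecs = eigh(A, B)                     # ascending eigenvalues
P = eigvecs[:, :d_embed]                          # smoothest directions

Z = X @ P                                         # low-dimensional embedding
print("embedding shape:", Z.shape)
```

In this kind of formulation, the eigenvectors associated with the smallest generalized eigenvalues give the projection directions that change least across studies whose text reports are most similar.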

The text analysis work for breast imaging produced the following key new capabilities: (1) a genetic-algorithm-based approach to identifying reports of abnormalities; (2) a genetic-algorithm-based approach to identifying key phrase patterns in the language used in the mammography domain; (3) a classifier for mammography documents; and (4) a temporal analysis approach for examining and finding key phrase patterns that behave as precursors to a future event in mammography patients. The key-phrase patterns represent a highly effective set of features for creating cancer-related dichotomies in the data and support the discovery of a valuable image-processing manifold through the framework.
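As a loose illustration of the kind of machinery involved (hand-written example reports, a toy fitness function, and simple GA settings that are all assumptions, not the project's actual objective or data), the sketch below extracts skip-gram ("s-gram") word-pair patterns from short reports and uses a small genetic algorithm to select a compact pattern subset that contrasts abnormal and normal reports.

```python
"""Minimal sketch: s-gram (skip-gram) pattern extraction plus a toy
genetic-algorithm feature selection over synthetic report snippets."""
import random

random.seed(0)

# Hypothetical labeled report snippets (1 = abnormality mentioned, 0 = normal).
reports = [
    ("spiculated mass noted in the upper outer quadrant", 1),
    ("clustered microcalcifications suspicious for malignancy", 1),
    ("no mammographic evidence of malignancy", 0),
    ("breast tissue is heterogeneously dense no suspicious mass", 0),
    ("new spiculated mass with associated calcifications", 1),
    ("stable benign appearing calcifications no new mass", 0),
]

def sgrams(text, max_skip=2):
    """Ordered word pairs separated by at most `max_skip` intervening words."""
    words = text.split()
    return {(words[i], words[j])
            for i in range(len(words))
            for j in range(i + 1, min(len(words), i + 2 + max_skip))}

report_grams = [(sgrams(text), label) for text, label in reports]
vocab = sorted(set().union(*(grams for grams, _ in report_grams)))

def fitness(mask):
    """Reward compact pattern sets whose members occur mostly in one class."""
    chosen = [p for p, keep in zip(vocab, mask) if keep]
    if not chosen:
        return 0.0
    contrast = 0.0
    for p in chosen:
        pos = sum(p in grams for grams, y in report_grams if y == 1)
        neg = sum(p in grams for grams, y in report_grams if y == 0)
        contrast += abs(pos - neg)
    return contrast / len(chosen)

def evolve(pop_size=30, generations=40, p_mut=0.02):
    pop = [[random.random() < 0.1 for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                  # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(len(vocab))
            child = a[:cut] + b[cut:]                   # one-point crossover
            child = [not g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("selected s-grams:", [p for p, keep in zip(vocab, best) if keep][:10])
```

A real pipeline would operate on de-identified clinical reports and score candidate pattern sets with a proper classifier under cross-validation; the sketch only shows the pattern-extraction and selection loop.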

Additional work in year 2 included new analyses for the AAA data. This effort resulted in (1) a searchable index of the mammography reports that includes specific information such as labels, dates, key-phrase patterns (s-grams), and anonymized patient IDs and can be presented in human- or machine-readable (XML) format (a minimal example record is sketched after this paragraph); and (2) a patient-centered approach that uses all reports per patient and their timestamps to facilitate exploratory temporal data analysis and visualization using information retrieval and classical statistical
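For concreteness, a minimal example of one machine-readable index record is sketched below; the field names and values are illustrative placeholders rather than the project's actual schema.

```python
"""Minimal sketch of one XML index record of the kind described above."""
import xml.etree.ElementTree as ET

record = ET.Element("report")
ET.SubElement(record, "patient_id").text = "ANON-000123"   # anonymized ID (placeholder)
ET.SubElement(record, "date").text = "2009-11-04"          # report date (placeholder)
ET.SubElement(record, "label").text = "abnormal"           # assigned label (placeholder)
sgrams = ET.SubElement(record, "sgrams")
for pattern in ("spiculated mass", "clustered microcalcifications"):
    ET.SubElement(sgrams, "sgram").text = pattern           # key-phrase patterns

print(ET.tostring(record, encoding="unicode"))
```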

