FY2010 - Oak Ridge National Laboratory
Director’s R&D Fund—<br />
Ultrascale Computing and Data Science<br />
for Medicine using SEmi-supervised Learning (DAMSEL),” we (1) developed an analytical, automated<br />
learning framework and tools for processing multimodality medical data (text and images) for the purpose<br />
of data mining and assessment; (2) improved performance, portability, and scalability of this<br />
computational framework by leveraging available intelligent software and hardware computing resources<br />
and adding functionality to the system; and (3) conducted preliminary validation of the performance on<br />
medical data. These aims were accomplished in the context of two biomedical applications: breast cancer<br />
(mammography: imaging, pathologies, text reports) and abdominal aortic aneurysm (AAA) (using 3D<br />
imaging modalities, such as MRI, in addition to surgical and clinical notes from the patient record). In<br />
addition, initial exploration of data for traumatic brain injury (TBI) [using Joint Theater Trauma<br />
Registries (JTTR) and related TBI source data] was conducted.<br />
Mission Relevance<br />
DAMSEL can facilitate the development of more powerful analytical tools by using all of the available<br />
data more effectively than present approaches permit. These technologies are at the<br />
forefront of systems medicine application development and require computationally intensive<br />
environments for processing. Potential sponsors for follow-on funding include the National Institutes of<br />
Health (NIH) and the Department of Defense (DoD), which have open program announcements and<br />
planned initiatives in these areas. For example, NIH’s funding opportunity PAR-09-218, Innovations in<br />
Biomedical Computational Science and Technology, will support research in tools for data acquisition,<br />
archiving, querying, retrieval, visualization, integration, and management; platform-independent<br />
translational tools for data exchange and for promoting interoperability; and analytical and statistical tools<br />
for interpretation of large data sets. This project is also consistent with such DOE programs as<br />
Mathematical, Information, and Computational Sciences, KJ.01.00.00.0; Computer Science,<br />
KJ.01.01.02.0; and Computational Partnerships, KJ.01.01.03.0. Other DOE programs that will benefit<br />
include national security, intelligence, and biosurveillance applications.<br />
Results and Accomplishments<br />
The project created a multimodal learning framework and tools for the analysis of mammography images<br />
and reports and also for abdominal aortic aneurysm (AAA) images and reports. We developed a semisupervised<br />
machine learning framework that integrates the text and image modalities by transforming an<br />
image feature vector produced through image processing to a lower dimensional space that is smooth with<br />
respect to the problem-specific similarities described in the text reports. The DAMSEL project provides<br />
support for combining image and text modalities in previously unavailable ways, but the general<br />
framework is also generic enough to support the combination of any number of different modalities that<br />
represent different views of the medical problem. When the secondary modality is engineered to consist of<br />
features representative of the target problem, the framework can be dramatically effective, as<br />
demonstrated by improvement over state-of-the-art results.<br />
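The idea of mapping image features into a lower-dimensional space that is smooth with respect to text-derived similarities can be illustrated with a small sketch. The report does not name the specific algorithm used, so the example below substitutes a standard technique, locality preserving projection, as an assumption: it finds a linear map of the image features such that samples whose text reports are similar land close together. The function names, the cosine similarity measure, the regularization value, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def text_similarity(text_features):
    """Cosine similarity between rows of an (n_samples, n_terms) matrix.

    Stands in for the problem-specific similarity derived from text reports
    (assumption: the report does not specify the similarity measure).
    """
    norms = np.linalg.norm(text_features, axis=1, keepdims=True)
    unit = text_features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)        # ignore self-similarity
    return np.clip(sim, 0.0, None)    # keep edge weights non-negative


def fit_projection(image_features, similarity, n_components=2, reg=1e-6):
    """Locality preserving projection of image features.

    Minimizes sum_ij W_ij * ||P^T x_i - P^T x_j||^2 subject to a
    degree-weighted scale constraint, where W is the text similarity.
    """
    X = image_features - image_features.mean(axis=0)
    degree = np.diag(similarity.sum(axis=1))
    laplacian = degree - similarity

    A = X.T @ laplacian @ X
    B = X.T @ degree @ X + reg * np.eye(X.shape[1])  # regularized for stability

    # Reduce the generalized problem A v = lambda B v to a standard
    # symmetric eigenproblem using the Cholesky factor of B.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0                               # remove round-off asymmetry
    _, eigvecs = np.linalg.eigh(M)                    # eigenvalues in ascending order

    # The smallest eigenvalues give the smoothest directions.
    return C_inv.T @ eigvecs[:, :n_components]


# Synthetic stand-in data: two groups of 20 cases each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)

# Hypothetical key-phrase indicator features from the text reports.
text = np.zeros((40, 2))
text[labels == 0, 0] = 1.0
text[labels == 1, 1] = 1.0

# Image feature vectors: one informative dimension, four noise dimensions.
image = rng.normal(size=(40, 5))
image[:, 0] += 4.0 * labels

W = text_similarity(text)
P = fit_projection(image, W, n_components=1)
embedding = (image - image.mean(axis=0)) @ P
```

In this toy setting the text similarity is simply "same group or not", so the learned direction behaves much like a discriminant axis. With richer text features, such as key-phrase patterns, the same construction would pull together cases whose reports use similar language.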
The text analysis work for breast imaging produced the following key new capabilities: (1) a genetic-algorithm-based<br />
approach to identifying reports of abnormalities; (2) a genetic-algorithm-based approach<br />
to identifying key phrase patterns in the language used for the mammography domain; (3) a classifier for<br />
mammography documents; and (4) a temporal analysis approach for examining and finding key phrase<br />
patterns that behave as precursors to a future event in mammography patients. The key-phrase patterns<br />
represent a highly effective set of features for creating cancer-related dichotomies in the data and support<br />
the discovery of a valuable image-processing manifold through the framework. Additional work in year 2<br />
included new analyses for the AAA data. This effort resulted in (1) a searchable index of the<br />
mammography reports that includes specific information such as labels, date, key-phrase patterns<br />
(s-grams), and anonymized patient IDs and can be presented in human- or machine-readable (XML)<br />
format; (2) a patient-centered approach that uses all reports per patient and their timestamps to facilitate<br />
exploratory temporal data analysis and visualization using information retrieval and classical statistical<br />