16.04.2014 Views

Embedding R in Windows applications, and executing R remotely

Embedding R in Windows applications, and executing R remotely

Embedding R in Windows applications, and executing R remotely

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Application of R to Data M<strong>in</strong><strong>in</strong>g <strong>in</strong> Hospital<br />

Information Systems<br />

Shusaku Tsumoto <strong>and</strong> Shoji Hirano<br />

Department of Medical Informatics,<br />

Shimane Medical University, School of Medic<strong>in</strong>e,<br />

Enya-cho Izumo City, Shimane 693-8501 Japan<br />

Abstract<br />

Hospital <strong>in</strong>formation systems have been <strong>in</strong>troduced s<strong>in</strong>ce 1980’s <strong>and</strong> have stored<br />

a large amount of data of laboratory exam<strong>in</strong>ations. Thus, the reuse of such<br />

stored data becomes a critical issue <strong>in</strong> medic<strong>in</strong>e, which may play an important<br />

role <strong>in</strong> medical decision support. On the other h<strong>and</strong>, data m<strong>in</strong><strong>in</strong>g from the computer<br />

science side emerged <strong>in</strong> early 1990’s, which can be viewed a re-<strong>in</strong>vention<br />

of what statisticians, especially those on exploratory data analysis, had proposed<br />

<strong>in</strong> 1970’s. The ma<strong>in</strong> objective of the present data m<strong>in</strong><strong>in</strong>g is to extract<br />

useful patterns from a large mount of data with statistical <strong>and</strong> mach<strong>in</strong>e learn<strong>in</strong>g<br />

methods. Especially, it has been reported <strong>in</strong> medical field that a comb<strong>in</strong>ation<br />

of these two methodologies are very useful: Mach<strong>in</strong>e learn<strong>in</strong>g method is useful<br />

for extract<strong>in</strong>g “hypotheses” which may not be significant from the viewpo<strong>in</strong>t of<br />

statistics. After deriv<strong>in</strong>g these hypotheses, statistical analysis is used for its validity.<br />

Thus, it has been expected that comb<strong>in</strong>ation of these two methodologies<br />

will play an important role <strong>in</strong> medical decision support, such as <strong>in</strong>tra-hospital<br />

<strong>in</strong>fection control, detection of risk factors.<br />

This paper report an application of R to data m<strong>in</strong><strong>in</strong>g <strong>in</strong> hospital <strong>in</strong>formation<br />

systems. As a prelim<strong>in</strong>ary tool, we developed a package for data m<strong>in</strong><strong>in</strong>g <strong>in</strong><br />

medic<strong>in</strong>e, <strong>in</strong>clud<strong>in</strong>g the follow<strong>in</strong>g procedures: (1) Interface for Statistical Analysis,<br />

which is based on Rcmdr. (2) Rule Induction, which supports associaton <strong>and</strong><br />

rough set-based rule <strong>in</strong>duction method. (3) Categorization of Numerical Variables:<br />

detection of cut-off po<strong>in</strong>t is very important <strong>in</strong> medical diagnosis. Several<br />

methods proposed <strong>in</strong> data m<strong>in</strong><strong>in</strong>g <strong>and</strong> medical decision mak<strong>in</strong>g are implemented.<br />

(4) Cluster<strong>in</strong>g, based on R packages. Also, this package supports rough cluster<strong>in</strong>g,<br />

which gives a <strong>in</strong>discernbility-based cluster<strong>in</strong>g with iterative ref<strong>in</strong>ement<br />

of equivalence relations <strong>in</strong> a data set. (5) Temporal Data M<strong>in</strong><strong>in</strong>g (Multiscale<br />

Match<strong>in</strong>g <strong>and</strong> Dynamic Time Warp<strong>in</strong>g): these methods have been <strong>in</strong>troduced<br />

for classification of long-term time series data. These methods output a distance<br />

between two sequences, which can be used for cluster<strong>in</strong>g methods. The usage of<br />

R gives the follow<strong>in</strong>g advantages: (a) Rough set methods can be easily achieved<br />

by fundamental R-functions, (b) Comb<strong>in</strong>ation of rough set methods <strong>and</strong> statistical<br />

packages are easily acheived by rich R-packages. In the conference, several<br />

aspects of this package <strong>and</strong> experimental results will be presented.<br />

Keywords: Data M<strong>in</strong><strong>in</strong>g, Hospital Information System, Time Series Analysis<br />

1

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!