
Localized Supervised Metric Learning on Temporal Physiological Data

Jimeng Sun, Daby Sow, Jianying Hu, Shahram Ebadollahi

IBM T.J. Watson Research Center, New York, USA

{jimeng, sowdaby, jyhu, ebad}@us.ibm.com

Abstract

Effective patient similarity assessment is important for clinical decision support. It enables the capture of past experience, as manifested in the collective longitudinal medical records of patients, to help clinicians assess the likely outcomes resulting from their decisions and actions. However, it is challenging to devise a patient similarity metric that is clinically relevant and semantically sound. Patient similarity is highly context sensitive: it depends on factors such as the disease, the particular stage of the disease, and co-morbidities. One way to discern the semantics in a particular context is to take advantage of physicians' expert knowledge as reflected in labels assigned to some patients. In this paper we present a method that leverages localized supervised metric learning to effectively incorporate such expert knowledge and arrive at semantically sound patient similarity measures. Experiments using data obtained from the MIMIC II database demonstrate the effectiveness of this approach.

1. Introduction

Medical records capture both observations of patients' health status and the decisions and actions taken by clinicians and care providers. Buried inside these records are nuggets of insight on the temporal evolution pattern of patient health status and the effects of different clinical decisions on the trajectory of a disease. Tapping into this source of insight can be achieved by developing techniques that measure cross-patient similarities. These techniques have the potential to improve patients' clinical outcomes by serving as essential tools for diagnostic and prognostic decision support.

Figure 1 illustrates several aspects of the scenario that drives our research in this area. In this figure, a patient with available observations up to a decision point is presented to the system. The cohort of patients who are clinically similar to the query patient is retrieved. A clinician looks up the decisions and actions applied to the retrieved cohort and their consequences, and makes up her mind about the best course of action for the current patient. In addition, she can project the trajectory of the patient's health status, as captured by the patient's clinical factors and biomarkers, under the regime of any particular decision she makes.

There are three fundamental challenges that need to be addressed before such a decision support mechanism can be materialized:

1. Alignment of the trajectories of patients' temporal characteristics to make the records amenable to semantically and clinically sound comparison;

2. Devising similarity measures that can reflect the clinical proximity or disparity between different patients;

3. Coupling between decisions and their consequences as manifested in patient prognosis.

The focus of this paper is on the second task. We propose two different methods for feature generation over multi-dimensional temporal patient data, and adopt a localized supervised metric learning approach to arrive at a semantically sound similarity measure for retrieving patients represented in the multi-dimensional feature space. The proposed method is tested using the MIMIC II database, which consists of physiological waveforms and accompanying clinical data obtained for ICU patients [1]. The study is carried out on 74 patients from this database, categorized into two groups based on different clinical conditions. Comparisons against unsupervised metric learning approaches on classification and retrieval accuracy are presented to illustrate the performance benefit of the proposed approach.

2. Related Work

In [7], Saeed and Mark reported work on retrieving similar patients using the same database, where they employed a multi-resolution description scheme for physiological temporal ICU data and used unsupervised similarity metrics for retrieving patients. The focus of that work was more on the symbolic representation of the temporal data to make them amenable for comparison. In this paper, we leverage both statistical methods and wavelet methods to extract features over the temporal data.

Supervised metric learning has been studied in the past [11, 12, 5]. The goal has been to learn a distance metric such that samples in the same class are close and those in different classes are far apart. The common treatment is to add constraints and regularization terms to the objective function and then solve it using optimization methods. To avoid a large number of constraints, in this paper we model the problem as a trace ratio problem, which can be solved effectively (similar to Wang et al. [9]).

Figure 1. Retrieving patients based on their clinical similarity to a query patient and using the retrieved patients to project the evolution of the patient's clinical characteristics.

3. Localized Supervised Metric Learning

In this section we present the supervised metric learning problem in the context of patient similarity measures. When a physician looks for similar patients in a database, the similarity is often based not only on quantitative measurements such as lab results, sensor measurements, age, and sex, but also on the physician's assessment of the disease type and stage. This assessment potentially influences the relative importance a physician places on different measurements or groups of measurements. To compute this specific notion of similarity, we propose to learn a distance metric that can automatically adjust the importance of each numeric feature by leveraging the physician's belief.

Formally, the quantitative measurements of a patient are represented by an N-dimensional feature vector x. Examples of features are the mean and variance of the sensor measurements, or wavelet coefficients. The prior belief of physicians is captured as labels on some of the patients. With this formulation, our goal is to learn a generalized Mahalanobis distance between patient x_i and patient x_j, defined as:

    d_m(x_i, x_j) = sqrt((x_i − x_j)^T P (x_i − x_j))    (1)

where P ∈ R^{N×N} is called the precision matrix. Matrix P is positive semi-definite and is used to incorporate the correlations between different feature dimensions. The key is to learn the optimal P such that the resulting distance metric has the following properties:

• Within-class compactness: patients with the same label are close together;

• Between-class scatterness: patients with different labels are far away from each other.
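As a concrete illustration (not from the paper), the distance of Eq. (1) can be sketched in NumPy. The dimensions and the randomly generated P below are placeholders; the sketch also shows why a positive semi-definite P keeps the square root real:

```python
import numpy as np

def mahalanobis_distance(x_i, x_j, P):
    """Generalized Mahalanobis distance of Eq. (1):
    d_m(x_i, x_j) = sqrt((x_i - x_j)^T P (x_i - x_j))."""
    diff = x_i - x_j
    return float(np.sqrt(diff @ P @ diff))

rng = np.random.default_rng(0)
N, d = 4, 2

# Any P = W W^T with W in R^{N x d} is positive semi-definite,
# so the quadratic form above is never negative.
W = rng.standard_normal((N, d))
P = W @ W.T

x_i = rng.standard_normal(N)
x_j = rng.standard_normal(N)

# Equivalent view used later in this section: Euclidean distance
# in the low-dimensional projection W^T x.
d_quad = mahalanobis_distance(x_i, x_j, P)
d_proj = float(np.linalg.norm(W.T @ x_i - W.T @ x_j))
```

The two quantities agree to numerical precision, which is exactly the low-rank decomposition P = WW^T discussed at the end of this section.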

To formally measure these properties, we use two kinds of neighborhoods as defined in [10]: the homogeneous neighborhood of x_i, denoted N_i^o, is the set of k-nearest patients of x_i with the same label; the heterogeneous neighborhood of x_i, denoted N_i^e, is the set of k-nearest patients of x_i with different labels.

Based on these two neighborhoods, we define the local compactness of point x_i as

    C_i = Σ_{x_j ∈ N_i^o} d_m^2(x_i, x_j)    (2)

and the local scatterness of point x_i as

    S_i = Σ_{x_k ∈ N_i^e} d_m^2(x_i, x_k)    (3)

The discriminability of the distance metric d_m is defined as

    J = (Σ_i C_i) / (Σ_i S_i)
      = (Σ_i Σ_{x_j ∈ N_i^o} (x_i − x_j)^T P (x_i − x_j)) / (Σ_i Σ_{x_k ∈ N_i^e} (x_i − x_k)^T P (x_i − x_k))    (4)

The goal is to find a P that minimizes J, which is equivalent to simultaneously minimizing the local compactness and maximizing the local scatterness. In contrast with linear discriminant analysis [4], which seeks a discriminant subspace in a global sense, the localized supervised metric aims to learn a distance metric with enhanced local discriminability. To minimize J, we formulate the problem as a trace ratio minimization problem [9] and use the decomposed Newton's method to find the solution [6].
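The procedure can be sketched as follows. This is a simplified reimplementation under our own assumptions, not the authors' code: the decomposed Newton's method of [6] is replaced by the plain iterative trace-ratio update (alternate between updating the ratio lam and taking the d eigenvectors of C − lam·S with smallest eigenvalues), and the neighborhood search is brute force:

```python
import numpy as np

def scatter_matrices(X, y, k):
    """Compactness matrix C (over homogeneous k-NN pairs) and scatterness
    matrix S (over heterogeneous k-NN pairs), following Eqs. (2)-(4)."""
    n, N = X.shape
    C = np.zeros((N, N))
    S = np.zeros((N, N))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        order = np.argsort(dists)
        same = [j for j in order if j != i and y[j] == y[i]][:k]
        diff = [j for j in order if y[j] != y[i]][:k]
        for j in same:
            v = (X[i] - X[j])[:, None]
            C += v @ v.T
        for j in diff:
            v = (X[i] - X[j])[:, None]
            S += v @ v.T
    return C, S

def trace_ratio_lsml(X, y, k=3, d=1, iters=20):
    """Minimize J = tr(W^T C W) / tr(W^T S W) by the iterative trace-ratio
    scheme; returns W with P = W W^T."""
    C, S = scatter_matrices(X, y, k)
    N = X.shape[1]
    W = np.eye(N)[:, :d]
    for _ in range(iters):
        lam = np.trace(W.T @ C @ W) / np.trace(W.T @ S @ W)
        # eigh returns eigenvalues in ascending order; keep the d smallest.
        _, eigvecs = np.linalg.eigh(C - lam * S)
        W = eigvecs[:, :d]
    return W

# Toy two-class data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(20, 2))
X1 = rng.normal(0.0, 1.0, size=(20, 2))
X1[:, 0] += 8.0
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)
W = trace_ratio_lsml(X, y, k=3, d=1)
```

On this toy data the learned projection weights the discriminative feature 0 far more heavily than the noise feature 1, which is the "enhanced local discriminability" the metric is after.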


Since P is a low-rank positive semi-definite matrix, we can decompose the precision matrix as P = WW^T, where W ∈ R^{N×d} and d ≤ N. The distance metric can be rewritten as d_m(x_i, x_j) = ||W^T x_i − W^T x_j||. Therefore, the distance metric is equivalent to the Euclidean distance over the low-dimensional projection W^T x.

4. Data Description and Feature Extraction

We have used the physiological data of 74 patients obtained from the MIMIC II database [1] in our experiments. Each patient is represented by 5 streams of sensor readings, sampled at 1-minute intervals: 1) SpO2, 2) heart rate (HR), 3) mean ABP (ABPmean), 4) systolic ABP (ABPSys), and 5) diastolic ABP (ABPDias). All patients belong to one of two groups, H or C. Those in group H (36 patients) experienced Acute Hypotensive Episode (AHE) events during the forecast window, whereas those in group C (38 patients) did not experience any AHE within the forecast window. The start of the forecast window is timestamped in the data set (T_0) and its duration is 1 hour, in which an episode of AHE can occur. For this study, we focus on a 2-hour window around T_0 for each patient. Figure 2 illustrates the data from two patients, in which samples in the H group show higher variability than those in the C group. Physicians actually use the variability level of ABP to diagnose AHE [2].

We have used two different schemes to represent the 2-hour temporal data for each patient: a statistical time-domain method and a wavelet-domain method. In the former, we compute the mean and variance of the data from each sensor for each patient; thus, each patient is represented in the time domain by a 10-dimensional vector. In the latter, the wavelet coefficients of the 2-hour window from each sensor are computed. We use the Daubechies-4 wavelet [3] and keep the top 10 coefficients. Finally, the coefficients from all 5 sensors are vectorized into a 50-dimensional feature vector for each patient.
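A minimal sketch of the statistical time-domain scheme follows. The interleaved ordering of mean/variance entries is our assumption (the paper does not specify one), and the wavelet scheme is omitted here since it would require a DWT library such as PyWavelets:

```python
import numpy as np

def statistical_features(streams):
    """Time-domain features: per-sensor mean and variance.

    streams: array of shape (5, 120) -- five sensor streams (SpO2, HR,
    ABPmean, ABPSys, ABPDias) over a 2-hour window sampled every minute.
    Returns a 10-dimensional vector [mean_1, var_1, ..., mean_5, var_5]
    (the interleaved ordering is an assumption, not from the paper).
    """
    means = streams.mean(axis=1)
    variances = streams.var(axis=1)
    return np.column_stack([means, variances]).ravel()

# Placeholder data: 5 streams x 120 one-minute samples (2 hours)
rng = np.random.default_rng(42)
streams = rng.standard_normal((5, 120))
x = statistical_features(streams)
```

Stacking such a vector per patient yields the 74 x 10 feature matrix used in the experiments below; the wavelet scheme would analogously yield a 74 x 50 matrix.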

5. Experiments

From the feature extraction step described in Section 4, we obtain 74 N-dimensional feature vectors, where N = 10 for the statistical method and N = 50 for the wavelet method. We then compare the following three distance metrics using the leave-one-out paradigm:

• Expert uses the Euclidean distance of the variance of the mean ABP, as suggested in [2];

• PCA uses the Euclidean distance over low-dimensional points after PCA (an unsupervised metric learning algorithm);

• LSML uses the localized supervised metric learning method described in Section 3.

Figure 2. Examples of multivariate time series data for the H and C groups: (a) samples in the H group; (b) samples in the C group. H group patients show higher variability than those in the C group.

Note that we do not make comparisons with global supervised metric learning methods like LDA [4] because, as shown in [5, 8], a localized metric usually performs better. The performance metrics include the k-NN classification error rate and precision@10 retrieval results. The precision@10 of a query point is computed by retrieving the 10 nearest points under a specific distance metric and then computing the percentage of retrieved points having the same label as the query point.
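The precision@10 computation just described can be sketched as follows; the Euclidean metric and the toy data are placeholders (in the paper the metric would be Expert, PCA, or LSML over the patient features):

```python
import numpy as np

def precision_at_10(X, y, dist):
    """Mean precision@10 over all query points: retrieve each point's
    10 nearest neighbours (excluding the point itself) under `dist` and
    score the fraction that share the query's label."""
    n = len(y)
    scores = []
    for q in range(n):
        d = np.array([dist(X[q], X[j]) for j in range(n)])
        nearest = [j for j in np.argsort(d) if j != q][:10]
        scores.append(np.mean([y[j] == y[q] for j in nearest]))
    return float(np.mean(scores))

euclidean = lambda a, b: float(np.linalg.norm(a - b))

# Two well-separated toy clusters of 12 points each
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (12, 3)), rng.normal(100, 1, (12, 3))])
y = np.array([0] * 12 + [1] * 12)
p10 = precision_at_10(X, y, euclidean)   # separable clusters -> 1.0
```

Each cluster holds 12 points, so every query has at least 11 same-label candidates and a perfectly separable metric scores exactly 1.0.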

Performance Comparison. To have a fair comparison, both PCA and LSML project the data into a 1-dimensional space, since the Expert method only uses one feature, i.e., the variance of the mean ABP. Table 1 shows the classification results using a 3-NN classifier, and Table 2 shows the retrieval results. As can be observed in both tables, LSML outperforms both Expert and PCA on both statistical and wavelet features, which confirms the importance of incorporating label information into the distance metric. We also observe that wavelet features improve the performance significantly for LSML, where the classification error drops by half (from about 15% to less than 7%).

Table 1. Classification error comparison

                     Expert    PCA       LSML
Statistic features   0.2295    0.2131    0.1475
Wavelet features     NA        0.2295    0.0656

Table 2. Precision@10 retrieval results

                     Expert    PCA       LSML
Statistic features   0.6120    0.5355    0.6557
Wavelet features     NA        0.5410    0.7869

Sensitivity Analysis. There are two parameters in the study: 1) the number of neighbors k in the k-NN classifier, and 2) the dimensionality d of the resulting low-dimensional space (after PCA and LSML). Figure 3 shows the results of the sensitivity analysis on these two parameters. Figure 3(a) plots classification error vs. k for all methods. Small k leads to lower classification error, which confirms the need for a localized distance metric. Figure 3(b) plots classification error vs. dimensionality d for all methods except Expert, and confirms the stability of LSML with respect to different d.
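The leave-one-out k-NN error used in Table 1 and in the sensitivity analysis above can be sketched as follows (Euclidean distance and toy data stand in for the learned metrics):

```python
import numpy as np

def loo_knn_error(X, y, k):
    """Leave-one-out k-NN classification error: each point is classified
    by majority vote of its k nearest neighbours among the other points."""
    n = len(y)
    errors = 0
    for q in range(n):
        d = np.linalg.norm(X - X[q], axis=1)
        nearest = [j for j in np.argsort(d) if j != q][:k]
        votes = np.bincount(y[nearest])
        if votes.argmax() != y[q]:
            errors += 1
    return errors / n

# Two well-separated toy clusters -> zero leave-one-out error
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (12, 2)), rng.normal(100, 1, (12, 2))])
y = np.array([0] * 12 + [1] * 12)
err = loo_knn_error(X, y, k=3)   # -> 0.0
```

Sweeping k (and, for the learned projections, d) over such a routine reproduces the kind of sensitivity curves shown in Figure 3.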

6. Conclusion and Discussion

We have presented a method for deriving semantically sound similarity measures for retrieving patients represented by multi-dimensional time series. Our method uses both statistical and wavelet-based features to capture the characteristics of patients, and leverages localized supervised metric learning to incorporate physicians' expert domain knowledge. Experiments using the MIMIC II database demonstrate the efficacy of this approach. In future work we plan to explore ways to explicitly incorporate temporal characteristics of the data to further improve metric learning in this particular context.

References

[1] MIMIC II Database. http://physionet.org/physiobank/database/mimic2db/.

[2] X. Chen, D. Xu, G. Zhang, and R. Mukkamala. Forecasting acute hypotensive episodes in intensive care patients based on a peripheral arterial blood pressure waveform. Computers in Cardiology, 36, 2009.

[3] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.

Figure 3. LSML is stable with different parameter values: (a) stable with different k; (b) stable with different d.

[4] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego, California, 1990.

[5] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In NIPS, 2005.

[6] Y. Jia, F. Nie, and C. Zhang. Trace ratio problem revisited. IEEE Transactions on Neural Networks, 2009.

[7] M. Saeed and R. Mark. A novel method for the efficient retrieval of similar multiparameter physiologic time series using wavelet-based symbolic representations. In American Medical Informatics Association, 2006.

[8] M. Sugiyama. Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J. Mach. Learn. Res., 8, 2007.

[9] F. Wang, J. Sun, T. Li, and N. Anerousis. Two heads better than one: Metric+active learning and its applications for IT service classification. In ICDM, 2009.

[10] F. Wang and C. Zhang. Feature extraction by maximizing the neighborhood margin. In CVPR, 2007.

[11] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In NIPS, 2002.

[12] L. Yang. Distance metric learning: A comprehensive survey. Technical report, Michigan State University, 2006.
