11.07.2015 Views

2DkcTXceO

2DkcTXceO

2DkcTXceO

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

488 Ah-ha momentit was something of an open question how to do multi-category SVMs in oneoptimization problem that did not have this problem. Yi Lin, Yoonkyung Leeand I were sitting around shooting the breeze and one of us said “how about asum-to-zero constraint?” and the other two said “ah-ha,” or at least that’s theway I remember it. The idea is to code the labels as k-vectors, with a 1 in therth position and −1/(k − 1) in the k − 1 other positions for a training samplein class r. Thus, each observation vector satisfies the sum-to-zero constraint.The idea was to fit a vector of functions satisfying the same sum-to-zero constraint.The multi-category SVM fit estimates f(t) =(f 1 (t),...,f k (t)), t ∈Tsubject to the sum-to-zero constraint everywhere and the classification for asubject with attribute vector t is just the index of the largest component ofthe estimate of f(t). See Lee and Lee (2003) and Lee et al. (2004a,b). Advice:shooting the breeze is good.41.1.8 Fan Lu, Steve Wright, Sunduz Keles, Hector CorradaBravo, and dissimilarity informationWe return to the alternative role of positive definite functions as a way toencode pairwise distance observations. Suppose we are examining n objectsO 1 ,...,O n and are given some noisy or crude observations on their pairwisedistances/dissimilarities, which may not satisfy the triangle inequality. Thegoal is to embed these objects in a Euclidean space in such a way as to respectthe pairwise dissimilarities as much as possible. Positive definite matricesencode pairwise squared distances d ij between O i and O j asd ij (K) =K(i, i)+K(j, j) − 2K(i, j), (41.5)and, given a non-negative definite matrix of rank d ≤ n, canbeusedtoembedthe n objects in a Euclidean space of dimension d, centered at 0 and uniqueup to rotations. We seek a K which respects the dissimilarity information d obsijwhile constraining the complexity of K by∑min |dobsij − d ij (K)| + λ trace(K), (41.6)K∈S nwhere S n is the convex cone of symmetric positive definite matrices. I lookedat this problem for an inordinate amount of time seeking an analytic solutionbut after a conversation with Vishy (S.V.N. Vishwanathan) at a meeting inRotterdam in August of 2003 I realized it wasn’t going to happen. The ahhamoment came about when I showed the problem to Steve Wright, whoright off said it could be solved numerically using recently developed convexcone software. The result so far is Corrada Bravo et al. (2009) and Lu et al.(2005). In Lu et al. (2005) the objects are protein sequences and the pairwisedistances are BLAST scores. The fitted kernel K had three eigenvalues thatcontained about 95% of the trace, so we reduced K to a rank 3 matrix bytruncating the smaller eigenvalues. Clusters of four different kinds of proteinswere readily separated visually in three-d plots; see Lu et al. (2005) for the

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!