11.07.2015 Views

2DkcTXceO

2DkcTXceO

2DkcTXceO

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

G. Wahba 489details. In Corrada Bravo et al. (2009) the objects are persons in pedigreesin a demographic study and the distances are based on Malecot’s kinshipcoefficient, which defines a pedigree dissimilarity measure. The resulting kernelbecame part of an SS ANOVA model with other attributes of persons, andthe model estimates a risk related to an eye disease. Advice: find computerscientist friends.41.1.9 Gábor Székely, Maria Rizzo, Jing Kong and distancecorrelationThe last ah-ha experience that we report is similar to that involving the randomizedtrace estimate of Section 41.1.4, i.e., the ah-ha moment came aboutupon realizing that a particular recent result was very relevant to what wewere doing. In this case Jing Kong brought to my attention the important paperof Gábor Székely and Maria Rizzo (Székely and Rizzo, 2009). Briefly, thispaper considers the joint distribution of two random vectors, X and Y ,say,and provides a test, called distance correlation that it factors so that the tworandom vectors are independent. Starting with n observations from the jointdistribution, let {A ij } be the collection of double-centered pairwise distancesamong the ( n2)X components, and similarly for {Bij }.Thestatistic,calleddistance correlation, is the analogue of the usual sample correlation betweenthe A’s and B’s. The special property of the test is that it is justified for Xand Y in Euclidean p and q space for arbitrary p and q with no further distributionalassumptions. In a demographic study involving pedigrees (Konget al., 2012), we observed that pairwise distance in death age between closerelatives was less than that of unrelated age cohorts. A mortality risk scorefor four lifestyle factors and another score for a group of diseases was developedvia SS ANOVA modeling, and significant distance correlation was foundbetween death ages, lifestyle factors and family relationships, raising morequestions than it answers regarding the “Nature-Nurture” debate (relativerole of genetics and other attributes).We take this opportunity to make a few important remarks about pairwisedistances/dissimilarities, primarily how one measures them can be important,and getting the “right” dissimilarity can be 90% of the problem. We remarkthat family relationships in Kong et al. (2012) were based on a monotonefunction of Malecot’s kinship coefficient that was different from the monotonefunction in Corrada Bravo et al. (2009). Here it was chosen to fit in with thedifferent way the distances were used. In (41.6), the pairwise dissimilaritiescan be noisy, scattered, incomplete and could include subjective distances like“very close, close.. ” etc. not even satisfying the triangle inequality. So there issubstantial flexibility in choosing the dissimilarity measure with respect to theparticular scientific context of the problem. In Kong et al. (2012) the pairwisedistances need to be a complete set, and be Euclidean (with some specificmetric exceptions). There is still substantial choice in choosing the definitionof distance, since any linear transformation of a Euclidean coordinate system

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!