
Dattorro

CONVEX OPTIMIZATION & EUCLIDEAN DISTANCE GEOMETRY

Mεβoo Publishing
Convex Optimization & Euclidean Distance Geometry


\[
\mathrm{EDM}^N \,=\, \mathbb{S}^N_h \,\cap\, \bigl( \mathbb{S}^{N\perp}_c - \mathbb{S}^N_+ \bigr)
\]


Convex Optimization & Euclidean Distance Geometry

Jon Dattorro

Meboo Publishing USA


Meboo Publishing USA
345 Stanford Shopping Ctr., Suite 410
Palo Alto, California 94304-1413

Convex Optimization & Euclidean Distance Geometry, Jon Dattorro
ISBN 0-9764013-0-4
version 03.09.2006

cybersearch:
I. convex optimization
II. convex cones
III. convex geometry
IV. distance geometry
V. distance matrix

programs and graphics by Matlab
typesetting by LaTeX

This searchable electronic color pdfBook is navigable by page, cross reference, citation, equation, figure, table, and hyperlink. A pdfBook has no electronic copy protection and can be read and printed by most computers. The publisher hereby grants the right to reproduce this work in any format but limited to personal use.

© 2005 Jon Dattorro. All rights reserved.


for Jennie Columba ♦ Antonio ♦♦ & Sze Wan


Acknowledgements

Reflecting on help given me all along the way, I am awed by my heavy debt of gratitude:

Anthony, Teppo Aatola, Bob Adams, Chuck Bagnaschi, Jeffrey Barish, Keith Barr, Florence Barker, David Beausejour, Leonard Bellin, Dave Berners, Barry Blesser, Richard Bolt, Stephen Boyd, Merril Bradshaw, Gary Brooks, Jack Brown, Patty Brown, Joe Bryan, Barb Burk, John Calderone, Kathryn Calí, Jon Caliri, Terri Carter, Reginald Centracchio, Bonnie Charvez, Michael Chille, Jan Chomysin, John Cioffi, Robert Cogan, Tony, Sandra, Josie, and Freddy Coletta, Fred, Rita, and Kathy Copley, Gary Corsini, John, Christine, Donna, Sylvia, Ida, Dottie, Sheila, and Yvonne D'Attoro, Cosmo Dellagrotta, Robin DeLuca, Lou D'Orsaneo, Antonietta DeCotis, Donna DeMichele, Tom Destefano, Tom Dickie, Peggy Donatelli, Jeanne Duguay, Pozzi Escot, Courtney Esposito, Terry Fayer, Carol Feola, Valerie, Robin, Eddie, and Evelyn Ferruolo, Martin Fischer, Salvatore Fransosi, Stanley Freedman, Tony Garnett, Edouard Gauthier, Amelia Giglio, Chae Teok Goh, Gene Golub, Patricia Goodrich, Heather Graham, Robert Gray, David Griesinger, Robert Haas, George Heath, Bruce, Carl, John, Joy, and Harley, Michelle, Martin Henry, Mar Hershenson, Evan Holland, Leland Jackson, Rene Jaeger, Sung Dae Jun, Max Kamenetsky, Seymour Kass, Patricia Keefe, Christine Kerrick, James Koch, Mary Kohut, Paul Kruger, Helen Kung, Lynne Lamberis, Paul Lansky, Monique Laurent, Karen Law, Francis Lee, Scott Levine, Hal Lipset, Nando Lopez-Lezcano, Pat Macdonald, Ginny Magee, Ed Mahler, Sylvia, Frankie, and David Martinelli, Bill Mauchly, Jamie Jacobs May, Paul & Angelia McLean, Joyce, Al, and Tilly Menchini, Teresa Meng, Tom Metcalf, J. Andy Moorer, Walter Murray, John O'Donnell, Brad Osgood, Ray Ouellette, Sal Pandozzi, Bob Peretti, Bob Poirier, Manny Pokotilow, Michael Radentz, Timothy Rapa, Doris Rapisardi, Katharina Reissing, Flaviu, Nancy, and Myron Romanul, Marvin Rous, Carlo, Anthony, and Peters Russo, Eric Safire, Jimmy Scialdone, Anthony, Susan, and Esther Scorpio, John Senior, Robin Silver, Josh, Richard, and Jeremy Singer, Albert Siu-Toung, Julius Smith, Terry Stancil, Brigitte Strain, John Strawn, David Taraborelli, Robin Tepper, Susan Thomas, Stavros Toumpis, Georgette Trezvant, Rebecca, Daniel, Jack, and Stella Tuttle, Maryanne Uccello, Joe Votta, Julia Wade, Shelley Marged Weber, Jennifer Westerlind, Todd Wilson, Yinyu Ye, George & MaryAnn Young, Donna & Joe Zagami

From: Jon Dattorro, Convex Optimization & Euclidean Distance Geometry, Meboo Publishing USA, 2005.


Prelude

   The constant demands of my department and university and the ever increasing work needed to obtain funding have stolen much of my precious thinking time, and I sometimes yearn for the halcyon days of Bell Labs.
      −Steven Chu, Nobel laureate [46]

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of the mathematician while Optimization belongs to the engineer.

There is a great race under way to determine which important problems can be posed in a convex setting. Yet, that skill acquired by understanding the geometry and application of Convex Optimization will remain more an art for some time to come; the reason being, there is generally no unique transformation of a given problem to its convex equivalent. This means, two researchers pondering the same problem are likely to formulate the convex equivalent differently; hence, one solution is likely different from the other for the same problem. Any presumption of only one right or correct solution becomes nebulous. Study of equivalence, sameness, and uniqueness therefore pervades study of optimization.


There is tremendous benefit in transforming a given optimization problem to its convex equivalent, primarily because any locally optimal solution is then guaranteed globally optimal. Solving a nonlinear system, for example, by instead solving an equivalent convex optimization problem is therefore highly preferable. [0.1] Yet it can be difficult for the engineer to apply theory without understanding Analysis. Boyd & Vandenberghe's book Convex Optimization [38] is a marvelous bridge between Analysis and Optimization; rigorous though accessible to nonmathematicians. [0.2]

These pages comprise my journal over a six year period filling some remaining gaps between mathematician and engineer; they constitute a translation, unification, and cohering of about two hundred papers, books, and reports from several different areas of mathematics and engineering. Beacons of historical accomplishment are cited throughout. Extraordinary attention is paid to detail, clarity, accuracy, and consistency; accompanied by immense effort to eliminate ambiguity and verbosity. Consequently there is heavy cross-referencing and much background material provided in the text, footnotes, and appendices so as to be self-contained and to provide understanding of fundamental concepts.

Material contained in Boyd & Vandenberghe is, for the most part, not duplicated here although topics found particularly useful in the study of distance geometry are expanded or elaborated; e.g., matrix-valued functions in chapter 3 and calculus in appendix D, linear algebra in appendix A, convex geometry in chapter 2, nonorthogonal and orthogonal projection in appendix E, semidefinite programming in chapter 6, etcetera. In that sense, this book may be regarded a companion to Convex Optimization.

−Jon Dattorro
Stanford, California
2005

0.1 This is the idea motivating a form of convex optimization known as geometric programming [38, p.188] that has already driven great advances in the electronic circuit design industry. [24, §4.7] [153] [252] [255] [55] [107] [113] [114] [115] [116] [117] [118] [164] [165] [172] [196]

0.2 Their text was conventionally published in 2004, having been taught at Stanford University and made freely available on the world-wide web for ten prior years; a dissemination motivated by the belief that a virtual flood of new applications would follow by epiphany that many problems hitherto presumed nonconvex could be transformed or relaxed into convexity. [8] [9] [24, §4.3, p.316-322] [37] [52] [80] [81] [174] [188] [193] [234] [235] [259] [262] Course enrollment for Spring quarter 2005 at Stanford was 170 students.


Convex Optimization & Euclidean Distance Geometry

1 Overview

2 Convex geometry
  2.1 Convex set
    2.1.1 subspace
    2.1.2 linear independence
      2.1.2.1 preservation
    2.1.3 Orthant:
    2.1.4 affine set
    2.1.5 dimension
    2.1.6 empty set versus empty interior
      2.1.6.1 relative interior
    2.1.7 classical boundary
      2.1.7.1 Line intersection with boundary
      2.1.7.2 Tangential line intersection with boundary
    2.1.8 intersection, sum, difference, product
    2.1.9 inverse image
  2.2 Vectorized-matrix inner product
    2.2.1 Frobenius'
    2.2.2 Symmetric matrices
      2.2.2.1 Isomorphism on symmetric matrix subspace
    2.2.3 Symmetric hollow subspace
  2.3 Hulls
    2.3.1 Affine dimension, affine hull
    2.3.2 Convex hull
      2.3.2.1 Comparison with respect to R^N_+
    2.3.3 Conic hull
    2.3.4 Vertex-description
  2.4 Halfspace, Hyperplane
    2.4.1 Halfspaces H+ and H−
      2.4.1.1 PRINCIPLE 1: Halfspace-description
    2.4.2 Hyperplane ∂H representations
      2.4.2.1 Matrix variable
      2.4.2.2 Vertex-description of hyperplane
      2.4.2.3 Affine independence, minimal set
      2.4.2.4 Preservation of affine independence
      2.4.2.5 affine maps
      2.4.2.6 PRINCIPLE 2: Supporting hyperplane
      2.4.2.7 PRINCIPLE 3: Separating hyperplane
  2.5 Subspace representations
    2.5.1 Subspace or affine set
      2.5.1.1 ...as hyperplane intersection
      2.5.1.2 ...as span of nullspace basis
    2.5.2 Intersection of subspaces
  2.6 Extreme, Exposed
    2.6.1 Exposure
      2.6.1.1 Density of exposed points
      2.6.1.2 Face transitivity and algebra
      2.6.1.3 Boundary
  2.7 Cones
    2.7.1 Cone defined
    2.7.2 Convex cone
      2.7.2.1 cone invariance
      2.7.2.2 Pointed closed convex cone and partial order
  2.8 Cone boundary
    2.8.1 Extreme direction
      2.8.1.1 extreme distinction, uniqueness
      2.8.1.2 Generators
    2.8.2 Exposed direction
      2.8.2.1 Connection between boundary and extremes
      2.8.2.2 Converse caveat
  2.9 Positive semidefinite (PSD) cone
      2.9.0.1 Membership
    2.9.1 PSD cone is convex
    2.9.2 PSD cone boundary
      2.9.2.1 rank ρ subset of the PSD cone
      2.9.2.2 Faces of PSD cone
      2.9.2.3 Extreme directions of the PSD cone
      2.9.2.4 PSD cone is not circular
      2.9.2.5 Boundary constituents of the PSD cone
      2.9.2.6 Projection on S^M_+(ρ)
      2.9.2.7 Uniting constituents
      2.9.2.8 Peeling constituents
    2.9.3 Barvinok's proposition
  2.10 Conic independence (c.i.)
    2.10.1 Preservation of conic independence
      2.10.1.1 linear maps of cones
    2.10.2 Pointed closed convex K & conic independence
    2.10.3 Utility of conic independence
  2.11 When extreme means exposed
  2.12 Convex polyhedra
    2.12.1 Polyhedral cone
    2.12.2 Vertices of convex polyhedra
      2.12.2.1 Vertex-description of polyhedral cone
      2.12.2.2 Pointedness
    2.12.3 Unit simplex
      2.12.3.1 Simplex
    2.12.4 Converting between descriptions
  2.13 Dual cone & generalized inequality
    2.13.1 Dual cone
      2.13.1.1 Key properties of dual cone
      2.13.1.2 Examples of dual cone
    2.13.2 Abstractions of Farkas' lemma
      2.13.2.1 Null certificate, Theorem of the alternative
    2.13.3 Optimality condition
    2.13.4 Discretization of membership relation
      2.13.4.1 Dual halfspace-description
      2.13.4.2 First dual-cone formula
    2.13.5 Dual PSD cone and generalized inequality
      2.13.5.1 self-dual cones
    2.13.6 Dual of pointed polyhedral cone
      2.13.6.1 Facet normal & extreme direction
    2.13.7 Biorthogonal expansion by example
      2.13.7.1 Pointed cones and biorthogonality
    2.13.8 Biorthogonal expansion, derivation
      2.13.8.1 x ∈ K
      2.13.8.2 x ∈ aff K
    2.13.9 Formulae, algorithm finding dual cone
      2.13.9.1 Pointed K, dual, X skinny-or-square
      2.13.9.2 Simplicial case
      2.13.9.3 Membership relations in a subspace
      2.13.9.4 Subspace M = aff K
      2.13.9.5 More pointed cone descriptions
    2.13.10 Dual cone-translate
    2.13.11 Proper nonsimplicial K, dual, X fat full-rank

3 Geometry of convex functions
  3.1 Convex function
    3.1.1 Vector-valued function
      3.1.1.1 Strict convexity
      3.1.1.2 Affine function
      3.1.1.3 Epigraph, sublevel set
      3.1.1.4 Gradient
      3.1.1.5 First-order convexity condition, real function
      3.1.1.6 First-order, vector-valued function
      3.1.1.7 Second-order, vector-valued function
      3.1.1.8 second-order ⇒ first-order condition
    3.1.2 Matrix-valued function
      3.1.2.1 First-order condition, matrix-valued function
      3.1.2.2 Epigraph of matrix-valued function
      3.1.2.3 Second-order, matrix-valued function
  3.2 Quasiconvex
    3.2.1 Differentiable and quasiconvex
  3.3 Salient properties

4 Euclidean Distance Matrix
  4.1 EDM
  4.2 First metric properties
  4.3 ∃ fifth Euclidean metric property
    4.3.1 Lookahead
  4.4 EDM definition
      4.4.0.1 homogeneity
    4.4.1 −V_N^T D(X) V_N convexity
    4.4.2 Gram-form EDM definition
      4.4.2.1 First point at origin
      4.4.2.2 0 geometric center
    4.4.3 Inner-product form EDM definition
      4.4.3.1 Relative-angle form
      4.4.3.2 Inner-product form −V_N^T D(Θ) V_N convexity
      4.4.3.3 Inner-product form, discussion
  4.5 Invariance
    4.5.1 Translation
      4.5.1.1 Gram-form invariance
    4.5.2 Rotation/Reflection
      4.5.2.1 Inner-product form invariance
    4.5.3 Invariance conclusion
  4.6 Injectivity of D & unique reconstruction
    4.6.1 Gram-form bijectivity
      4.6.1.1 Gram-form operator D inversion
    4.6.2 Inner-product form bijectivity
      4.6.2.1 Inversion of D(−V_N^T D V_N)
  4.7 Embedding in affine hull
    4.7.1 Determining affine dimension
      4.7.1.1 Affine dimension r versus rank
    4.7.2 Précis
    4.7.3 Eigenvalues of −V D V versus −V_N^† D V_N
  4.8 Euclidean metric versus matrix criteria
    4.8.1 Nonnegativity property 1
      4.8.1.1 Strict positivity
    4.8.2 Triangle inequality property 4
      4.8.2.1 Comment
      4.8.2.2 Strict triangle inequality
    4.8.3 −V_N^T D V_N nesting
      4.8.3.1 Tightening the triangle inequality
    4.8.4 Affine dimension reduction in two dimensions
  4.9 Bridge: Convex polyhedra to EDMs
    4.9.1 Geometric arguments
    4.9.2 Necessity and sufficiency
    4.9.3 Example revisited
  4.10 EDM-entry composition
    4.10.1 EDM by elliptope
  4.11 EDM indefiniteness
    4.11.1 EDM eigenvalues, congruence transformation
    4.11.2 Spectral cones
      4.11.2.1 Cayley-Menger versus Schoenberg
      4.11.2.2 Ordered eigenvalues
      4.11.2.3 Unordered eigenvalues
      4.11.2.4 Dual cone versus dual spectral cone
  4.12 List reconstruction
    4.12.1 x_1 at the origin. V_N
    4.12.2 0 geometric center. V
  4.13 Reconstruction examples
    4.13.1 Isometric reconstruction
    4.13.2 Isotonic reconstruction
      4.13.2.1 Isotonic map of the USA
      4.13.2.2 Isotonic solution with sort constraint
      4.13.2.3 Convergence
      4.13.2.4 Interlude
  4.14 Fifth property of Euclidean metric
    4.14.1 Recapitulate
    4.14.2 Derivation of the fifth
      4.14.2.1 Nonnegative determinant
      4.14.2.2 Beyond the fifth metric property
    4.14.3 Path not followed
      4.14.3.1 N = 4
      4.14.3.2 N = 5
      4.14.3.3 Volume of simplices
    4.14.4 Affine dimension reduction in three dimensions
      4.14.4.1 Exemplum redux

5 EDM cone
  5.1 Defining the EDM cone
  5.2 Polyhedral bounds
  5.3 √EDM cone is not convex
  5.4 a geometry of completion
  5.5 EDM definition in 11^T
    5.5.1 Range of EDM D
    5.5.2 Boundary constituents of EDM cone
    5.5.3 Faces of EDM cone
      5.5.3.1 Isomorphic faces
      5.5.3.2 Extreme direction of EDM cone
      5.5.3.3 Smallest face
      5.5.3.4 Open question
  5.6 Correspondence to PSD cone S^{N−1}_+
    5.6.1 Gram-form correspondence to S^{N−1}_+
    5.6.2 EDM cone by elliptope
  5.7 Vectorization & projection interpretation
    5.7.1 Face of PSD cone S^N_+ containing V
    5.7.2 EDM criteria in 11^T
  5.8 Dual EDM cone
    5.8.1 Ambient S^N
      5.8.1.1 EDM cone and its dual in ambient S^N
      5.8.1.2 nonnegative orthant
      5.8.1.3 Dual Euclidean distance matrix criterion
      5.8.1.4 Nonorthogonal components of dual EDM
      5.8.1.5 Affine dimension complementarity
      5.8.1.6 EDM cone duality
      5.8.1.7 Schoenberg criterion is membership relation
    5.8.2 Ambient S^N_h
  5.9 Theorem of the alternative
  5.10 postscript

6 Semidefinite programming
  6.1 Conic problem
    6.1.1 Maximal complementarity
      6.1.1.1 Reduced-rank solution
      6.1.1.2 Coexistence of low- and high-rank solutions
      6.1.1.3 Previous work
      6.1.1.4 Later developments
  6.2 Framework
    6.2.1 Feasible sets
      6.2.1.1 A ∩ S^n_+ emptiness determination via Farkas'
      6.2.1.2 Theorem of the alternative for semidefinite
      6.2.1.3 boundary-membership criterion
    6.2.2 Duals
    6.2.3 Optimality conditions
  6.3 Rank reduction
    6.3.1 Posit a perturbation of X*
    6.3.2 Perturbation form
    6.3.3 Optimality of perturbed X*
    6.3.4 Final thoughts regarding rank reduction
      6.3.4.1 Inequality constraints

7 EDM proximity
    7.0.1 Measurement matrix H
      7.0.1.1 Order of imposition
      7.0.1.2 Egregious input error under nonnegativity
    7.0.2 Lower bound
    7.0.3 Three prevalent proximity problems
  7.1 First prevalent problem:
    7.1.1 Closest-EDM Problem 1, convex case
    7.1.2 Generic problem
      7.1.2.1 Projection on rank ρ subset of PSD cone
    7.1.3 Choice of spectral cone
      7.1.3.1 Orthant is best spectral cone for Problem 1
    7.1.4 Closest-EDM Problem 1, "nonconvex" case
      7.1.4.1 Significance
    7.1.5 Problem 1 in spectral norm, convex case
  7.2 Second prevalent problem:
    7.2.1 Convex case
      7.2.1.1 Equivalent semidefinite program, Problem 2
      7.2.1.2 Gram-form semidefinite program, Problem 2
    7.2.2 Minimization of affine dimension in Problem 2
      7.2.2.1 Rank minimization heuristic
      7.2.2.2 Applying trace rank-heuristic to Problem 2
      7.2.2.3 Rank-heuristic insight
      7.2.2.4 Rank minimization heuristic beyond convex
      7.2.2.5 Applying log det rank-heuristic to Problem 2
      7.2.2.6 Tightening this log det rank-heuristic
  7.3 Third prevalent problem:
    7.3.1 Convex case
      7.3.1.1 Equivalent semidefinite program, Problem 3
      7.3.1.2 Schur-form semidefinite program, Problem 3
      7.3.1.3 Gram-form semidefinite program, Problem 3
    7.3.2 Minimization of affine dimension in Problem 3
    7.3.3 Constrained affine dimension, Problem 3
      7.3.3.1 Cayley-Menger form
  7.4 Conclusion

A Linear algebra
  A.1 Main-diagonal δ operator, trace, vec
    A.1.1 Identities
  A.2 Semidefiniteness: domain of test
    A.2.1 Symmetry versus semidefiniteness
  A.3 Proper statements
    A.3.1 Semidefiniteness, eigenvalues, nonsymmetric
  A.4 Schur complement
    A.4.1 Semidefinite program via Schur
    A.4.2 Determinant
  A.5 eigen decomposition
      A.5.0.1 Uniqueness
    A.5.1 Eigenmatrix
    A.5.2 Symmetric matrix diagonalization
      A.5.2.1 Positive semidefinite matrix square root
  A.6 Singular value decomposition, SVD
    A.6.1 Compact SVD
    A.6.2 Subcompact SVD
    A.6.3 Full SVD
    A.6.4 SVD of symmetric matrices
    A.6.5 Pseudoinverse by SVD
  A.7 Zeros
    A.7.1 0 entry
    A.7.2 0 eigenvalues theorem
    A.7.3 0 trace and matrix product
    A.7.4 Zero definite

B Simple matrices
  B.1 Rank-one matrix (dyad)
      B.1.0.1 rank-one modification
      B.1.0.2 dyad symmetry
    B.1.1 Dyad independence
      B.1.1.1 Biorthogonality condition, Range, Nullspace
  B.2 Doublet
  B.3 Elementary matrix
    B.3.1 Householder matrix
  B.4 Auxiliary V-matrices
    B.4.1 Auxiliary projector matrix V
    B.4.2 Schoenberg auxiliary matrix V_N
    B.4.3 Orthonormal auxiliary matrix V_W
    B.4.4 Auxiliary V-matrix Table
    B.4.5 More auxiliary matrices
  B.5 Orthogonal matrix
    B.5.1 Vector rotation
    B.5.2 Reflection
    B.5.3 Rotation of range and rowspace
    B.5.4 Matrix rotation
      B.5.4.1 bijection

C Some analytical optimal results
  C.1 properties of infima
  C.2 involving absolute value
  C.3 involving diagonal, trace, eigenvalues
  C.4 Orthogonal Procrustes problem
    C.4.1 Effect of translation
      C.4.1.1 Translation of extended list
  C.5 Two-sided orthogonal Procrustes
      C.5.0.1 Minimization
      C.5.0.2 Maximization
    C.5.1 Procrustes' relation to linear programming
    C.5.2 Two-sided orthogonal Procrustes via SVD
      C.5.2.1 Symmetric matrices
      C.5.2.2 Diagonal matrices

D Matrix calculus
  D.1 Directional derivative, Taylor series
    D.1.1 Gradients
    D.1.2 Product rules for matrix-functions
      D.1.2.1 Kronecker product
    D.1.3 Chain rules for composite matrix-functions
      D.1.3.1 Two arguments
    D.1.4 First directional derivative
      D.1.4.1 Interpretation directional derivative
    D.1.5 Second directional derivative
    D.1.6 Taylor series
    D.1.7 Correspondence of gradient to derivative
      D.1.7.1 first-order
      D.1.7.2 second-order
  D.2 Tables of gradients and derivatives
    D.2.1 Algebraic
    D.2.2 Trace Kronecker
    D.2.3 Trace
    D.2.4 Log determinant
    D.2.5 Determinant
    D.2.6 Logarithmic
    D.2.7 Exponential

E Projection
    E.0.1 Logical deductions
  E.1 Idempotent matrices
    E.1.1 Biorthogonal characterization
    E.1.2 Idempotence summary
  E.2 I − P, Projection on algebraic complement
    E.2.1 Universal projector characteristic
  E.3 Symmetric idempotent matrices
    E.3.1 Four subspaces
    E.3.2 Orthogonal characterization
    E.3.3 Summary, symmetric idempotent
    E.3.4 Orthonormal decomposition
      E.3.4.1 Unifying trait of all projectors: direction
      E.3.4.2 orthogonal projector, orthonormal
      E.3.4.3 orthogonal projector, biorthogonal
      E.3.4.4 nonorthogonal projector, biorthogonal
  E.4 Algebra of projection on affine subsets
  E.5 Projection examples
  E.6 Vectorization interpretation
    E.6.1 Nonorthogonal projection on a vector
    E.6.2 Nonorthogonal projection on vectorized matrix
      E.6.2.1 Nonorthogonal projection on dyad
    E.6.3 Orthogonal projection on a vector
    E.6.4 Orthogonal projection on a vectorized matrix
      E.6.4.1 Orthogonal projection on dyad
      E.6.4.2 Positive semidefiniteness test as projection
      E.6.4.3 PXP ⪰ 0
  E.7 on vectorized matrices of higher rank
    E.7.1 PXP misinterpretation for higher-rank P
    E.7.2 Orthogonal projection on matrix subspaces
  E.8 Range/Rowspace interpretation
  E.9 Projection on convex set
    E.9.1 Dual interpretation of projection on convex set
      E.9.1.1 Dual interpretation as optimization
    E.9.2 Projection on cone
      E.9.2.1 Relation to subspace projection
      E.9.2.2 Salient properties: Projection on cone
    E.9.3 nonexpansivity
    E.9.4 Easy projections
    E.9.5 Projection on convex set in subspace
  E.10 Alternating projection
      E.10.0.1 commutative projectors
      E.10.0.2 noncommutative projectors
      E.10.0.3 the bullets
    E.10.1 Distance and existence
    E.10.2 Feasibility and convergence
      E.10.2.1 Relative measure of convergence
    E.10.3 Optimization and projection
      E.10.3.1 Dykstra's algorithm
      E.10.3.2 Normal cone

F Proof of EDM composition
  F.1 EDM-entry exponential

G Matlab programs
  G.1 isedm()
    G.1.1 Subroutines for isedm()
      G.1.1.1 chop()
      G.1.1.2 Vn()
      G.1.1.3 Vm()
      G.1.1.4 signeig()
      G.1.1.5 Dx()
  G.2 conic independence, conici()
    G.2.1 lp()
  G.3 Map of the USA
    G.3.1 EDM, mapusa()
      G.3.1.1 USA map input-data decimation, decimate()
    G.3.2 EDM using ordinal data, omapusa()
  G.4 Rank reduction subroutine, RRf()
    G.4.1 svec()
    G.4.2 svecinv()

H Notation and a few definitions

Bibliography




List of Figures

1 Overview
  1 Cocoon nebula
  2 Application of trilateration is localization
  3 Molecular conformation
  4 Face recognition
  5 Swiss roll
  6 USA map reconstruction
  7 Honeycomb

2 Convex geometry
  8 Slab
  9 Intersection of line with boundary
  10 Tangentials
  11 Convex hull of a random list of points
  12 A simplicial cone
  13 Fantope
  14 Hulls
  15 Hyperplane illustrated: ∂H is a line partially bounding
  16 Hyperplanes in R^2
  17 Affine independence
  18 {z ∈ C | a^T z = κ_i}
  19 Hyperplane supporting closed set
  20 Closed convex set illustrating exposed and extreme points
  21 Two-dimensional nonconvex cone
  22 Nonconvex cone made from lines
  23 Nonconvex cone is convex cone boundary
  24 Cone exterior is convex cone
  25 Truncated nonconvex cone X
  26 Not a cone
  27 K is a pointed polyhedral cone having empty interior
  28 Exposed and extreme directions
  29 Positive semidefinite cone
  30 Convex Schur-form set
  31 Projection of the PSD cone
  32 Circular cone showing axis of revolution
  33 Circular section
  34 Polyhedral inscription
  35 Conically (in)dependent vectors
  36 Pointed six-faceted polyhedral cone and its dual
  37 Minimal set of generators for halfspace about origin
  38 Unit simplex
  39 Two views of a simplicial cone and its dual
  40 Two equivalent constructions of dual cone
  41 Dual polyhedral cone construction by right-angle
  42 K is a halfspace about the origin
  43 Orthogonal cones
  44 Blades K and K*
  45 Shrouded polyhedral cone
  46 Simplicial cone K in R^2 and its dual
  47 Monotone nonnegative cone K_M+ and its dual
  48 Monotone cone K_M and its dual
  49 Two views of monotone cone K_M and its dual

3 Geometry of convex functions
  50 Convex functions having unique minimizer
  51 Affine function
  52 Supremum of affine functions
  53 Gradient in R^2 evaluated on grid
  54 Quadratic function convexity in terms of its gradient
  55 Plausible contour plot

4 Euclidean Distance Matrix
  56 Convex hull of three points
  57 Complete dimensionless EDM graph
  58 Fifth Euclidean metric property
  59 Arbitrary hexagon in R^3
  60 Kissing number
  61 Trilateration
  62 This EDM graph provides unique isometric reconstruction
  63 Two sensors • and three anchors ◦
  64 Two discrete linear trajectories of sensors
  65 Coverage in cellular telephone network
  66 Orthogonal complements in S^N abstractly oriented
  67 Elliptope E^3
  68 Minimum eigenvalue of −V_N^T D V_N
  69 Some entrywise EDM compositions
  70 Map of United States of America
  71 Largest ten eigenvalues of −V_N^T O V_N
  72 Relative-angle inequality tetrahedron
  73 Nonsimplicial pyramid in R^3

5 EDM cone
  74 Relative boundary of cone of Euclidean distance matrices
  75 Intersection of EDM cone with hyperplane
  76 Neighborhood graph
  77 Trefoil knot untied
  78 Trefoil ribbon
  79 Example of V_X selection to make an EDM
  80 Vector V_X spirals
  81 Three views of translated negated elliptope
  82 Halfline T on PSD cone boundary
  83 Vectorization and projection interpretation example
  84 Orthogonal complement of geometric center subspace
  85 EDM cone construction by flipping PSD cone
  86 Decomposing member of polar EDM cone
  87 Ordinary dual EDM cone projected on S^3_h

6 Semidefinite programming
  88 Visualizing positive semidefinite cone in high dimension

7 EDM proximity
  89 Pseudo-Venn diagram
  90 Elbow placed in path of projection

A Linear algebra
  91 Geometrical interpretation of full SVD

B Simple matrices
  92 Four fundamental subspaces of any dyad
  93 Four fundamental subspaces of a doublet
  94 Four fundamental subspaces of elementary matrix
  95 Gimbal

D Matrix calculus
  96 Convex quadratic bowl in R^2 × R

E Projection
  97 Nonorthogonal projection of x ∈ R^3 on R(U) = R^2
  98 Biorthogonal expansion of point x ∈ aff K
  99 Dual interpretation of projection on convex set
  100 Projection product on convex set in subspace
  101 von Neumann-style projection of point b
  102 Alternating projection on two halfspaces
  103 Distance, optimization, feasibility
  104 Alternating projection on nonnegative orthant and hyperplane
  105 Geometric convergence of iterates in norm
  106 Distance between PSD cone and iterate in A
  107 Dykstra's alternating projection algorithm
  108 Normal cone


List of Tables

2 Convex geometry
  Table 2.9.2.2.1, rank versus dimension of S^3_+ faces
  Table 2.10.0.0.1, maximum number c.i. directions
  Cone Table 1
  Cone Table S
  Cone Table A
  Cone Table 1*

6 Semidefinite programming
  Faces of S^3_+ correspond to faces of S^3_+

B Simple matrices
  Auxiliary V-matrix Table B.4.4

D Matrix calculus
  Table D.2.1, algebraic gradients and derivatives
  Table D.2.2, trace Kronecker gradients
  Table D.2.3, trace gradients and derivatives
  Table D.2.4, log determinant gradients and derivatives
  Table D.2.5, determinant gradients and derivatives
  Table D.2.7, exponential gradients and derivatives


Chapter 1

Overview

Convex Optimization & Euclidean Distance Geometry

   People are so afraid of convex analysis.
      −Claude Lemaréchal, 2003

Any convex optimization problem has geometric interpretation. If a given optimization problem can be transformed to a convex equivalent, then this interpretive benefit is acquired. That is a powerful attraction: the ability to visualize geometry of an optimization problem. Conversely, recent advances in geometry and in graph theory hold convex optimization within their proofs' core. [262] [205]

This book is about convex optimization, convex geometry (with particular attention to distance geometry), geometrical problems, and problems that can be transformed into geometrical problems.


Figure 1: Cocoon nebula; IC 5146. (astrophotography by Jerry Lodriguss)

Euclidean distance geometry is, fundamentally, a determination of point conformation (configuration, relative position or location) by inference from interpoint distance information. By inference we mean: given only distance information, determine whether there corresponds a realizable conformation of points; a list of points in some dimension that attains the given interpoint distances. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space.

We might, for example, realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars). At first it may seem O(N^2) data is required, yet there are circumstances where this can be reduced to O(N). If we agree a set of points can have a shape (three points can form a triangle and its interior, for example, four points a tetrahedron), then we can ascribe shape of a set of points to their convex hull. It should be apparent: these shapes can be determined only to within a rigid transformation (a rotation, reflection, and a translation).
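To make "interpoint distance information" concrete, here is a minimal Matlab sketch (mine, not the book's; the book's own programs appear in appendix G) that builds a Euclidean distance matrix D from a hypothetical list of N points stored columnwise in X:

   % Build an EDM from a list of N random points in R^2 (illustrative only).
   N = 4;
   X = randn(2,N);                 % hypothetical list, points as columns
   G = X'*X;                       % Gram matrix of the list
   d = diag(G);                    % squared norms ||x_i||^2
   D = d*ones(1,N) + ones(N,1)*d' - 2*G;   % D(i,j) = ||x_i - x_j||^2

Every entry of D is invariant to rotation, reflection, and translation of the list X; that is the sense in which only shape is recoverable from distance data.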


Figure 2: Application of trilateration (§4.4.2.2.4) is localization (determining position) of a radio signal source in 2 dimensions; more commonly known by radio engineers as the process "triangulation". In this scenario, anchors ˇx_2, ˇx_3, ˇx_4 are illustrated as fixed antennae. [129] The radio signal source (a sensor • x_1) anywhere in affine hull of three antenna bases can be uniquely localized by measuring distance to each (dashed white arrowed line segments). Ambiguity of lone distance measurement to sensor is represented by circle about each antenna. Ye proved trilateration is expressible as a semidefinite program; hence, a convex optimization problem. [204]


Absolute position information is generally lost, given only distance information, but we can determine the smallest possible dimension in which an unknown list of points can exist; that attribute is their affine dimension (a triangle in any ambient space has affine dimension 2, for example). In circumstances where fixed points of reference are also provided, it becomes possible to determine absolute position or location; e.g., Figure 2.

Geometric problems involving distance between points can sometimes be reduced to convex optimization problems. Mathematics of this combined study of geometry and optimization is rich and deep. Its application has already proven invaluable discerning organic molecular conformation by measuring interatomic distance along covalent bonds; e.g., Figure 3. [49] [227] [108] [247] [75] Many disciplines have already benefitted and simplified consequent to this theory; e.g., distance-based pattern recognition (Figure 4), localization in wireless sensor networks by measurement of intersensor distance along channels of communication, [30, §5] [257] [29] wireless location of a radio-signal source such as a cell phone by multiple measurements of signal strength, the global positioning system (GPS), and multidimensional scaling which is a numerical representation of qualitative data by finding a low-dimensional scale. Euclidean distance geometry has also found application in artificial intelligence: to machine learning by discerning naturally occurring manifolds in Euclidean bodies (Figure 5, §5.4.0.0.1), in Fourier spectra of kindred utterances [131], and in image sequences [243].

by chapter

We study the pervasive convex Euclidean bodies and their various representations. In particular, we make convex polyhedra, cones, and dual cones more visceral through illustration in chapter 2, Convex geometry, and we study the geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion). We explain conversion between halfspace- and vertex-descriptions of convex cones, we motivate dual cones and provide formulae describing them, and we show how first-order optimality conditions or linear matrix inequalities [37] or established alternative systems of linear inequality can be explained by generalized inequalities with respect to convex cones and their duals. The conic analogue to linear independence, called conic independence, is introduced as a new tool in the study of cones; the logical next step in the progression: linear, affine, conic.


Figure 3: [112] [64] Distance data collected via nuclear magnetic resonance (NMR) helped render this 3-dimensional depiction of a protein molecule. At the beginning of the 1980s, Kurt Wüthrich [Nobel laureate] developed an idea about how NMR could be extended to cover biological molecules such as proteins. He invented a systematic method of pairing each NMR signal with the right hydrogen nucleus (proton) in the macromolecule. The method is called sequential assignment and is today a cornerstone of all NMR structural investigations. He also showed how it was subsequently possible to determine pairwise distances between a large number of hydrogen nuclei and use this information with a mathematical method based on distance-geometry to calculate a three-dimensional structure for the molecule. [176] [247] [108]


Figure 4: This coarsely discretized triangulated algorithmically flattened human face (made by Kimmel & the Bronsteins [139]) represents a stage in machine recognition of human identity; called face recognition. Distance geometry is applied to determine discriminating features.

Any convex optimization problem can be visualized geometrically. Chapter 2 provides tools to make visualization easier. The concepts of face, extreme point, and extreme direction of a convex Euclidean body are explained here, crucial to understanding convex optimization. The convex cone of positive semidefinite matrices, in particular, is studied in depth. We interpret, for example, inverse image of the positive semidefinite cone under affine transformation, and we explain how higher-rank subsets of the cone boundary united with its interior are convex.


Figure 5: Swiss roll from Weinberger & Saul [243]. The problem of manifold learning, illustrated for N = 800 data points sampled from a "Swiss roll" ①. A discretized manifold is revealed by connecting each data point and its k = 6 nearest neighbors ②. An unsupervised learning algorithm unfolds the Swiss roll while preserving the local geometry of nearby data points ③. Finally, the data points are projected onto the two-dimensional subspace that maximizes their variance, yielding a faithful embedding of the original manifold ④.


Figure 6 (panels labeled original and reconstruction): About five thousand points along the borders constituting the United States were used to create an exhaustive matrix of interpoint distance for each and every pair of points in the ordered set (a list); called a Euclidean distance matrix. From that noiseless distance information, it is easy to reconstruct the map exactly via the Schoenberg criterion (479). (§4.13.1.0.1, confer Figure 70) Map reconstruction is exact (to within a rigid transformation) given any number of interpoint distances; the greater the number of distances, the greater the detail (just as it is for all conventional map preparation).

Chapter 3, Geometry of convex functions, observes analogies between convex sets and functions: The set of all vector-valued convex functions of particular dimension is a closed convex cone. Included among the examples in this chapter, we show how the real affine function relates to convex functions as the hyperplane relates to convex sets. Here, also, pertinent results for multidimensional convex functions are presented that are largely ignored in the literature; tricks and tips for determining their convexity and discerning their geometry, particularly with regard to matrix calculus, which remains largely unsystematized when compared with traditional (ordinary) calculus. Consequently, we collect some results of matrix differentiation in the appendices.
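As a taste of the tabulated results, two classical matrix-calculus identities of the sort collected in appendix D (standard facts, reproduced here only for illustration, not quoted from the book's tables):

\[
\nabla_x \, x^T\! A \, x \,=\, (A + A^T)\, x \,, \qquad
\nabla_X \, \mathrm{tr}(AX) \,=\, A^T
\]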


Figure 7: These bees construct a honeycomb by solving a convex optimization problem (§4.4.2.2.3). The most dense packing of identical spheres about a central sphere in two dimensions is 6. Packed sphere centers describe a regular lattice.

The EDM is studied in chapter 4, Euclidean distance matrix, its properties and relationship to both positive semidefinite and Gram matrices. We relate the EDM to the four classical properties of the Euclidean metric; thereby, observing existence of an infinity of properties of the Euclidean metric beyond the triangle inequality. We proceed by deriving the fifth Euclidean metric property and then explain why furthering this endeavor is inefficient because the ensuing criteria (while describing polyhedra in angle or area, volume, content, and so on ad infinitum) grow linearly in complexity and number with problem size. Some geometrical problems solvable via EDMs, EDM problems posed as convex optimization, and methods of solution are presented. Methods of reconstruction are discussed and applied to the map of the United States; e.g., Figure 6. We also generate a distorted but recognizable isotonic map of the USA using only comparative distance information (only ordinal distance data).

We offer a new proof of the Schoenberg characterization of Euclidean distance matrices in chapter 4:

\[
D \in \mathrm{EDM}^N
\quad\Leftrightarrow\quad
\begin{cases}
-V_N^T D \, V_N \succeq 0 \\
\ \, D \in \mathbb{S}^N_h
\end{cases}
\tag{479}
\]

Our proof relies on fundamental geometry; assuming, any EDM must correspond to a list of points contained in some polyhedron (possibly at its vertices) and vice versa. It is known, but not obvious, this Schoenberg criterion implies nonnegativity of the EDM entries; proved here.
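A minimal numerical rendering of criterion (479), assuming the distance matrix D and point count N from the earlier sketch; Vn imitates the Schoenberg auxiliary matrix V_N of appendix B.4.2 (the book's own implementation is isedm() in appendix G):

   % Schoenberg criterion (479): D is an EDM iff D is symmetric hollow
   % and -Vn'*D*Vn is positive semidefinite.
   Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);   % Schoenberg auxiliary matrix V_N
   Gm = -Vn'*D*Vn;                          % Gram matrix of x_2..x_N when x_1 = 0
   isedm = all(eig(Gm) > -1e-9) && norm(diag(D)) < 1e-9;

   % List reconstruction (to within a rigid transformation), as in section 4.12:
   [Q,L] = eig(Gm);
   L  = max(L,0);                           % clip tiny negative eigenvalues
   Xr = [zeros(N-1,1)  sqrt(L)*Q'];         % recovered list, first point at origin

The recovered list Xr reproduces every interpoint distance in D though generally not X itself; absolute position is lost, exactly as the text says.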


We characterize the eigenvalue spectrum of an EDM in chapter 4, and then devise a polyhedral cone required for determining membership of a candidate matrix (in Cayley-Menger form) to the convex cone of Euclidean distance matrices (EDM cone); id est, a candidate is an EDM if and only if its eigenspectrum belongs to a spectral cone for EDM^N ;

$$D\in\mathrm{EDM}^N \quad\Leftrightarrow\quad \begin{cases} \lambda\!\left(\begin{bmatrix} 0 & \mathbf 1^{\rm T} \\ \mathbf 1 & -D \end{bmatrix}\right) \in \begin{bmatrix} \mathbb R^N_+ \\ \mathbb R_- \end{bmatrix} \cap\, \partial\mathcal H \\ D\in\mathbb S^N_h \end{cases} \qquad(672)$$

We will see spectral cones are not unique.

In chapter 5, EDM cone, we explain the geometric relationship between the cone of Euclidean distance matrices, two positive semidefinite cones, and the elliptope. We illustrate geometric requirements, in particular, for projection of a candidate matrix on a positive semidefinite cone that establish its membership to the EDM cone. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the positive semidefinite cone. The Schoenberg criterion (479), relating the EDM cone and a positive semidefinite cone, is revealed to be a discretized membership relation (dual generalized inequalities, a new Farkas'-like lemma) between the EDM cone and its ordinary dual, EDM^{N∗}. A matrix criterion for membership to the dual EDM cone is derived that is simpler than the Schoenberg criterion:

D^∗ ∈ EDM^{N∗} ⇔ δ(D^∗ 1) − D^∗ ⪰ 0    (816)

We derive a new concise equality of the EDM cone to two subspaces and a positive semidefinite cone;

EDM^N = S^N_h ∩ ( S^{N⊥}_c − S^N_+ )    (810)
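The dual-cone criterion (816) is likewise a one-line numerical test; a hedged Python/NumPy sketch (the example matrices are illustrative only):

```python
import numpy as np

# Dual EDM cone membership test (816): D* ∈ EDM^{N*} iff δ(D*1) - D* ⪰ 0,
# where δ(·) forms a diagonal matrix from a vector.
def in_dual_EDM_cone(Dstar, tol=1e-9):
    M = np.diag(Dstar @ np.ones(Dstar.shape[0])) - Dstar
    return np.linalg.eigvalsh((M + M.T)/2).min() >= -tol

N = 4
J = np.ones((N, N))
print(in_dual_EDM_cone(np.eye(N)))   # True:  δ(1) - I = 0
print(in_dual_EDM_cone(J))           # True:  N·I - J ⪰ 0
print(in_dual_EDM_cone(-J))          # False: -N·I + J has a negative eigenvalue
```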


Semidefinite programming is reviewed in chapter 6 with particular attention to optimality conditions of prototypical primal and dual conic programs, their interplay, and the perturbation method of rank reduction of optimal solutions (extant but not well-known). Positive definite Farkas' lemma is derived, and a three-dimensional polyhedral analogue for the positive semidefinite cone of 3 × 3 symmetric matrices is introduced. This analogue is a new tool for visualizing coexistence of low- and high-rank optimal solutions in 6 dimensions. We solve one instance of a combinatorial optimization problem via semidefinite program relaxation; id est,

$$\begin{array}{cl} \underset{x}{\text{minimize}} & \|x\|_0 \\ \text{subject to} & Ax=b \\ & x_i\in\{0,1\},\quad i=1\ldots n \end{array} \qquad(874)$$

an optimal minimum cardinality Boolean solution to Ax = b.

In chapter 7, EDM proximity, we explore methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems; the problem of finding that Euclidean distance matrix closest to a given matrix in some sense. We discuss several heuristics for the problems when compounded with rank minimization: (936)

$$\begin{array}{cl} \underset{D}{\text{minimize}} & \|{-V}(D-H)V\|_F^2 \\ \text{subject to} & \operatorname{rank} VDV \le \rho \\ & D\in\mathrm{EDM}^N \end{array} \qquad \begin{array}{cl} \underset{D}{\text{minimize}} & \|D-H\|_F^2 \\ \text{subject to} & \operatorname{rank} VDV \le \rho \\ & D\in\mathrm{EDM}^N \end{array}$$

$$\begin{array}{cl} \underset{\circ\sqrt{D}}{\text{minimize}} & \|\circ\sqrt{D}-H\|_F^2 \\ \text{subject to} & \operatorname{rank} VDV \le \rho \\ & \circ\sqrt{D}\in\sqrt{\mathrm{EDM}^N} \end{array} \qquad \begin{array}{cl} \underset{\circ\sqrt{D}}{\text{minimize}} & \|{-V}(\circ\sqrt{D}-H)V\|_F^2 \\ \text{subject to} & \operatorname{rank} VDV \le \rho \\ & \circ\sqrt{D}\in\sqrt{\mathrm{EDM}^N} \end{array}$$

We offer a new geometrical proof of a famous result discovered by Eckart & Young in 1936 [67] regarding Euclidean projection of a point on that nonconvex subset of the positive semidefinite cone comprising all positive semidefinite matrices having rank not exceeding a prescribed bound ρ. We explain how this problem is transformed to a convex optimization for any rank ρ.
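The Eckart & Young projection just described reduces to eigenvalue truncation; the following is an illustrative sketch (not the book's transformation to a convex program) of projecting a symmetric matrix on the rank-ρ matrices of the positive semidefinite cone:

```python
import numpy as np

# Euclidean projection of symmetric H on {A ⪰ 0 : rank A ≤ rho}:
# eigendecompose, clip negative eigenvalues, keep the rho largest.
def project_psd_rank(H, rho):
    w, Q = np.linalg.eigh((H + H.T) / 2)   # ascending eigenvalues
    w = np.clip(w, 0, None)                # nearest PSD: drop negative part
    w[:max(len(w) - rho, 0)] = 0           # keep only the rho largest
    return (Q * w) @ Q.T

rng = np.random.default_rng(2)
H = rng.standard_normal((5, 5)); H = (H + H.T)/2
P = project_psd_rank(H, 2)
print(np.linalg.matrix_rank(P))            # ≤ 2
print(np.linalg.eigvalsh(P).min() >= 0)    # True: P ⪰ 0
```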


Eight appendices are provided so as to be more self-contained:

linear algebra (appendix A is primarily concerned with proper statements of semidefiniteness for square matrices),

simple matrices (dyad, doublet, elementary, Householder, Schoenberg, orthogonal, etcetera, in appendix B),

a collection of known analytical solutions to some important optimization problems (appendix C),

matrix calculus (appendix D concerns matrix-valued functions, their derivatives and directional derivatives, Taylor series, and tables of first- and second-order gradients and derivatives),

an elaborate and insightful exposition of orthogonal and nonorthogonal projection on convex sets (the connection between projection and positive semidefiniteness, for example, in appendix E),

software to discriminate EDMs, to test conic independence, to reduce rank of an optimal solution to a semidefinite program, and two distinct methods of reconstructing a map of the United States given only distance information (appendix G).


Chapter 2

Convex geometry

Convexity has an immensely rich structure and numerous applications. On the other hand, almost every “convex” idea can be explained by a two-dimensional picture.
−Alexander Barvinok [17, p.vii]

There is relatively less published pertaining to matrix-valued convex sets and functions. [133] [126,§6.6] [185] As convex geometry and linear algebra are inextricably bonded, we provide much background material on linear algebra (especially in the appendices) although it is assumed the reader is comfortable with [212], [214], [125], or any other intermediate-level text. The essential references to convex analysis are [123] [194]. The reader is referred to [210] [17] [242] [27] [38] [191] [231] for a comprehensive treatment of convexity.


2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1

µY + (1 − µ)Z ∈ C    (1)

Under that defining condition on µ, the linear sum in (1) is called a convex combination of Y and Z. If Y and Z are points in Euclidean real vector space Rⁿ or R^{m×n} (matrices), then (1) represents the closed line segment joining them. All line segments are thereby convex sets. Apparent from the definition, a convex set is a connected set. [160,§3.4,§3.5] [27, p.2]

The idealized chicken egg, an ellipsoid (Figure 9(c), p.50), is a good organic icon for a convex set in three dimensions R³.

2.1.1 subspace

A subset of Euclidean real vector space Rⁿ is called a subspace (formally defined in §2.5) if every vector of the form αx + βy, for α, β ∈ R, is in the subset whenever vectors x and y are. [154,§2.3] A subspace is a convex set containing the origin, by definition. [194, p.4] Any subspace is therefore open in the sense that it contains no boundary, but closed in the sense [160,§2]

Rⁿ + Rⁿ = Rⁿ    (2)

It is not difficult to show

Rⁿ = −Rⁿ    (3)

as is true for any subspace R, because x ∈ Rⁿ ⇔ −x ∈ Rⁿ.

The intersection of an arbitrary collection of subspaces remains a subspace. Any subspace not constituting the entire ambient vector space Rⁿ is a proper subspace; e.g.,^{2.1} any line through the origin in two-dimensional Euclidean space R². The vector space Rⁿ is itself a conventional subspace, inclusively, [140,§2.1] although not proper.

^{2.1} We substitute the abbreviation e.g. in place of the Latin exempli gratia.


Figure 8: A slab is a convex Euclidean body infinite in extent but not affine. Illustrated in R², it may be constructed by intersecting two opposing halfspaces whose bounding hyperplanes are parallel but not coincident. Because the number of halfspaces used in its construction is finite, the slab is a polyhedron. (Cartesian axes and vector inward-normal to each halfspace-boundary are drawn for reference.)

2.1.2 linear independence

Arbitrary given vectors in Euclidean space {Γ_i ∈ Rⁿ, i = 1…N} are linearly independent (l.i.) if and only if, for all ζ ∈ R^N

Γ₁ζ₁ + · · · + Γ_{N−1}ζ_{N−1} + Γ_N ζ_N = 0    (4)

has only the trivial solution ζ = 0 ; in other words, iff no vector from the given set can be expressed as a linear combination of those remaining.

2.1.2.1 preservation

Linear independence can be preserved under linear transformation. Given matrix Y ≜ [y₁ y₂ · · · y_N] ∈ R^{N×N}, consider the mapping

T(Γ) : R^{n×N} → R^{n×N} ≜ ΓY    (5)

whose domain is the set of all matrices Γ ∈ R^{n×N} holding a linearly independent set columnar. Linear independence of {Γy_i ∈ Rⁿ, i = 1…N} demands, by definition, there exists no nontrivial solution ζ ∈ R^N to

Γy₁ζ₁ + · · · + Γy_{N−1}ζ_{N−1} + Γy_N ζ_N = 0    (6)

By factoring Γ, we see that is ensured by linear independence of {y_i ∈ R^N}.
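A numerical sanity check of this preservation property (random data; Y is invertible almost surely):

```python
import numpy as np

# Linear independence is preserved under T(Γ) = ΓY when the columns
# {y_i} of Y are themselves linearly independent.
rng = np.random.default_rng(3)
Gamma = rng.standard_normal((5, 3))       # 3 l.i. columns held in R^5
Y = rng.standard_normal((3, 3))           # almost surely invertible
print(np.linalg.matrix_rank(Gamma))       # 3
print(np.linalg.matrix_rank(Gamma @ Y))   # still 3: independence preserved
```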


2.1.3 Orthant:

name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R². The most common is the nonnegative orthant Rⁿ₊ or R^{n×n}₊ (analogue to quadrant I) to which membership denotes nonnegative vector- or matrix-entries respectively; e.g.,

Rⁿ₊ ≜ {x ∈ Rⁿ | x_i ≥ 0 ∀i}    (7)

The nonpositive orthant Rⁿ₋ or R^{n×n}₋ (analogue to quadrant III) denotes negative and 0 entries. Orthant convexity^{2.2} is easily verified by definition (1).

2.1.4 affine set

A nonempty affine set (from the word affinity) is any subset of Rⁿ that is a translation of some subspace. An affine set is convex and open so contains no boundary: e.g., ∅, point, line, plane, hyperplane (§2.4.2), subspace, etcetera. For some parallel^{2.3} subspace M and any point x ∈ A

A is affine ⇔ A = x + M = {y | y − x ∈ M}    (8)

The intersection of an arbitrary collection of affine sets remains affine. The affine hull of a set C ⊆ Rⁿ (§2.3.1) is the smallest affine set containing it.

2.1.5 dimension

Dimension of an arbitrary set S is the dimension of its affine hull; [242, p.14]

dim S ≜ dim aff S = dim aff(S − s), s ∈ S    (9)

the same as dimension of the subspace parallel to that affine set aff S when nonempty. Hence dimension (of a set) is synonymous with affine dimension. [123,§A.2.1]

^{2.2} All orthants are self-dual simplicial cones. (§2.13.5.1, §2.12.3.1.1)
^{2.3} Two affine sets are said to be parallel when one is a translation of the other. [194, p.4]


2.1.6 empty set versus empty interior

Emptiness of a set ∅ is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; [123,§A.2.1] e.g., paper in the real world. Thus the term relative is the conventional fix to this ambiguous terminology:^{2.4}

An ordinary flat piece of paper is an example of a nonempty convex set in R³ having empty interior but relatively nonempty interior.

2.1.6.1 relative interior

We distinguish interior from relative interior throughout. [210] [242] [231] The relative interior rel int C of a convex set C ⊆ Rⁿ is its interior relative to its affine hull.^{2.5} Thus defined, it is common (though confusing) for int C the interior of C to be empty while its relative interior is not: this happens whenever dimension of its affine hull is less than dimension of the ambient space (dim aff C < n, e.g., were C a flat piece of paper in R³) or in the exception when C is a single point; [160,§2.2.1]

rel int{x} ≜ aff{x} = {x}, int{x} = ∅, x ∈ Rⁿ    (10)

In any case, closure of the relative interior of a convex set C always yields the closure of the set itself;

$\overline{\operatorname{rel\,int}\mathcal C} = \overline{\mathcal C}$    (11)

If C is convex then rel int C is convex, and it is always possible to pass to a smaller ambient Euclidean space where a nonempty set acquires an interior. [17,§II.2.3]

Given the intersection of convex set C with an affine set A

rel int(C ∩ A) = rel int(C) ∩ A    (12)

If C has nonempty interior, then rel int C = int C.

^{2.4} Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence. From the opposite perspective, some authors use the term full or full-dimensional to describe a set having nonempty interior.
^{2.5} Likewise for relative boundary (§2.6.1.3), although relative closure is superfluous. [123,§A.2.1]


Figure 9: (a) Ellipsoid in R is a line segment whose boundary comprises two points. Intersection of line with ellipsoid in R, (b) in R², (c) in R³. Each ellipsoid illustrated has entire boundary constituted by 0-dimensional faces. Intersection of line with boundary is a point at entry to interior. These same facts hold in higher dimension.

2.1.7 classical boundary

(confer §2.6.1.3) Boundary of a set C is the closure of C less its interior presumed nonempty; [36,§1.1]

$\partial\mathcal C = \overline{\mathcal C}\setminus\operatorname{int}\mathcal C$    (13)

which follows from the fact

$\overline{\operatorname{int}\mathcal C} = \overline{\mathcal C}$    (14)

assuming nonempty interior.^{2.6} One implication is: an open set has a boundary defined although not contained in the set.

^{2.6} Otherwise, for x ∈ Rⁿ as in (10), [160,§2.1,§2.3] $\overline{\operatorname{int}\{x\}} = \overline{\emptyset} = \emptyset$ ; the empty set is both open and closed.


2.1.7.1 Line intersection with boundary

A line can intersect the boundary of a convex set in any dimension at a point demarcating the line's entry to the set interior. On one side of that entry-point along the line is the exterior of the set, on the other side is the set interior. In other words, starting from any point of a convex set, a move toward the interior is an immediate entry into the interior. [17,§II.2]

When a line intersects the interior of a convex body in any dimension, the boundary appears to the line to be as thin as a point. This is intuitively plausible because, for example, a line intersects the boundary of the ellipsoids in Figure 9 at a point in R, R², and R³. Such thinness is a remarkable fact when pondering visualization of convex polyhedra (§2.12, §4.14.3) in four dimensions, for example, having boundaries constructed from other three-dimensional convex polyhedra called faces.

We formally define face in §2.6. For now, we observe the boundary of a convex body to be entirely constituted by all its faces of dimension lower than the body itself. For example: The ellipsoids in Figure 9 have boundaries composed only of zero-dimensional faces. The two-dimensional slab in Figure 8 is a polyhedron having one-dimensional faces making its boundary. The three-dimensional bounded polyhedron in Figure 11 has zero-, one-, and two-dimensional polygonal faces constituting its boundary.

2.1.7.1.1 Example. Intersection of line with boundary in R⁶.
The convex cone of positive semidefinite matrices S³₊ (§2.9) in the ambient subspace of symmetric matrices S³ (§2.2.2.0.1) is a six-dimensional Euclidean body in isometrically isomorphic R⁶ (§2.2.1). The boundary of the positive semidefinite cone in this dimension comprises faces having only the dimensions 0, 1, and 3 ; id est, {ρ(ρ+1)/2, ρ = 0, 1, 2}.

Unique minimum-distance projection PX (§E.9) of any point X ∈ S³ on that cone is known in closed form (§7.1.2). Given, for example, λ ∈ int R³₊ and diagonalization (§A.5.2) of exterior point

$$X = Q\Lambda Q^{\rm T}\in\mathbb S^3,\qquad \Lambda \triangleq \begin{bmatrix} \lambda_1 & & 0\\ & \lambda_2 & \\ 0 & & -\lambda_3 \end{bmatrix} \qquad(15)$$


where Q ∈ R^{3×3} is an orthogonal matrix, then the projection on S³₊ in R⁶ is

$$PX = Q \begin{bmatrix} \lambda_1 & & 0\\ & \lambda_2 & \\ 0 & & 0 \end{bmatrix} Q^{\rm T}\in\mathbb S^3_+ \qquad(16)$$

This positive semidefinite matrix PX nearest X thus has rank 2, found by discarding all negative eigenvalues. The line connecting these two points is {X + (PX − X)t, t ∈ R} where t = 0 ⇔ X and t = 1 ⇔ PX. Because this line intersects the boundary of the positive semidefinite cone S³₊ at point PX and passes through its interior (by assumption), then the matrix corresponding to an infinitesimally positive perturbation of t there should reside interior to the cone (rank 3). Indeed, for ε an arbitrarily small positive constant,

$$X + (PX-X)t\,\big|_{t=1+\varepsilon} = Q(\Lambda+(P\Lambda-\Lambda)(1+\varepsilon))Q^{\rm T} = Q \begin{bmatrix} \lambda_1 & & 0\\ & \lambda_2 & \\ 0 & & \varepsilon\lambda_3 \end{bmatrix} Q^{\rm T}\in\operatorname{int}\mathbb S^3_+ \qquad(17)$$

2.1.7.2 Tangential line intersection with boundary

A higher-dimensional boundary ∂C of a convex Euclidean body C is simply a dimensionally larger set through which a line can pass when it does not intersect the body's interior. Still, for example, a line existing in five or more dimensions may pass tangentially (intersecting no point interior to C [135,§15.3]) through a single point relatively interior to a three-dimensional face on ∂C. Let's understand why by inductive reasoning.

Figure 10(a) shows a vertical line-segment whose boundary comprises its two endpoints. For a line to pass through the boundary tangentially (intersecting no point relatively interior to the line-segment), it must exist in an ambient space of at least two dimensions. Otherwise, the line is confined to the same one-dimensional space as the line-segment and must pass along the segment to reach the end points.

Figure 10(b) illustrates a two-dimensional ellipsoid whose boundary is constituted entirely by zero-dimensional faces. Again, a line must exist in at least two dimensions to tangentially pass through any single arbitrarily chosen point on the boundary (without intersecting the ellipsoid interior).
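Example 2.1.7.1.1 can be replayed numerically; a sketch with hypothetical λ and a random orthogonal Q:

```python
import numpy as np

# Project exterior point X on S^3_+ by zeroing the negative eigenvalue (16),
# then verify the perturbed point (17) lies interior to the cone (rank 3).
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal Q
lam = np.array([2.0, 1.0, 0.5])                    # λ ∈ int R^3_+
X = Q @ np.diag([lam[0], lam[1], -lam[2]]) @ Q.T   # exterior point (15)

w, U = np.linalg.eigh(X)
PX = (U * np.clip(w, 0, None)) @ U.T               # projection (16)
print(np.linalg.matrix_rank(PX))                   # 2: on the cone boundary

eps = 1e-3
Z = X + (PX - X) * (1 + eps)                       # just past t = 1
print(np.linalg.eigvalsh(Z).min() > 0)             # True: interior, rank 3
```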


Figure 10: Tangential line (a) (b) to relative interior of a 0-dimensional face in R², (c) (d) to relative interior of a 1-dimensional face in R³.


Now let's move to an ambient space of three dimensions. Figure 10(c) shows a polygon rotated into three dimensions. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially, it must exist in at least the two dimensions of the polygon. But for a line to pass tangentially through a single arbitrarily chosen point in the relative interior of a one-dimensional face on the boundary as illustrated, it must exist in at least three dimensions.

Figure 10(d) illustrates a solid circular pyramid (upside-down) whose one-dimensional faces are line-segments emanating from its pointed end (its vertex). This pyramid's boundary is constituted solely by these one-dimensional line-segments. A line may pass through the boundary tangentially, striking only one arbitrarily chosen point relatively interior to a one-dimensional face, if it exists in at least the three-dimensional ambient space of the pyramid.

From these few examples, we deduce a general rule (without proof): A line may pass tangentially through a single arbitrarily chosen point relatively interior to a k-dimensional face on the boundary of a convex Euclidean body if the line exists in dimension at least equal to k+2.

Now the interesting part, with regard to Figure 11 showing a bounded polyhedron in R³; call it P : A line existing in at least four dimensions is required in order to pass tangentially (without hitting int P) through a single arbitrary point in the relative interior of any two-dimensional polygonal face on the boundary of polyhedron P. Now imagine that polyhedron P is itself a three-dimensional face of some other polyhedron in R⁴. To pass a line tangentially through polyhedron P itself, striking only one point from its relative interior rel int P as claimed, requires a line existing in at least five dimensions.

This rule can help determine whether there exists unique solution to a convex optimization problem whose feasible set is an intersecting line; e.g., the trilateration problem (§4.4.2.2.4).

2.1.8 intersection, sum, difference, product

2.1.8.0.1 Theorem. Intersection. [38,§2.3.1] [194,§2]
The intersection of an arbitrary collection of convex sets is convex. ⋄


This theorem in converse is implicitly false in so far as a convex set can be formed by the intersection of sets that are not.

Vector sum of two convex sets C₁ and C₂

C₁ + C₂ = {x + y | x ∈ C₁, y ∈ C₂}    (18)

is convex.

By additive inverse, we can similarly define vector difference of two convex sets

C₁ − C₂ = {x − y | x ∈ C₁, y ∈ C₂}    (19)

which is convex. Applying this definition to nonempty convex set C₁, its self-difference C₁ − C₁ is generally nonempty, nontrivial, and convex; e.g., for any convex cone K, (§2.7.2) the set K − K constitutes its affine hull. [194, p.15]

Cartesian product

$$\mathcal C_1\times\mathcal C_2 = \left\{\begin{bmatrix}x\\y\end{bmatrix} \,\middle|\, x\in\mathcal C_1,\ y\in\mathcal C_2\right\} = \begin{bmatrix}\mathcal C_1\\ \mathcal C_2\end{bmatrix} \qquad(20)$$

remains convex.

Convex results are also obtained for scaling κC of a convex set C, rotation/reflection QC, or translation C + α ; all similarly defined.

Given any operator T and convex set C, we are prone to write T(C) meaning

T(C) ≜ {T(x) | x ∈ C}    (21)

Given linear operator T, it therefore follows from (18),

T(C₁ + C₂) = {T(x + y) | x ∈ C₁, y ∈ C₂} = {T(x) + T(y) | x ∈ C₁, y ∈ C₂} = T(C₁) + T(C₂)    (22)


2.1.9 inverse image

2.1.9.0.1 Theorem. Image, Inverse image. [194,§3] [38,§2.3.2]
Let f be a mapping from R^{p×k} to R^{m×n}.

The image of a convex set C under any affine function (§3.1.1.2)

f(C) = {f(X) | X ∈ C} ⊆ R^{m×n}    (23)

is convex.

The inverse image^{2.7} of a convex set F,

f⁻¹(F) = {X | f(X) ∈ F} ⊆ R^{p×k}    (24)

a single or many-valued mapping, under any affine function f is convex. ⋄

In particular, any affine transformation of an affine set remains affine. [194, p.8]

Each converse of this two-part theorem is generally false; id est, given f affine, a convex image f(C) does not imply that set C is convex, and neither does a convex inverse image f⁻¹(F) imply set F is convex. A counter-example is easy to visualize when the affine function is an orthogonal projector [212] [154]:

2.1.9.0.2 Corollary. Projection on subspace. [194,§3]^{2.8}
Orthogonal projection of a convex set on a subspace is another convex set. ⋄

Again, the converse is false. Shadows, for example, are umbral projections that can be convex when the body providing the shade is not.

^{2.7} See §2.9.1.0.2 for an example.
^{2.8} The corollary holds more generally for projection on hyperplanes (§2.4.2); [242,§6.6] hence, for projection on affine subsets (§2.3.1, nonempty intersections of hyperplanes). Orthogonal projection on affine subsets is reviewed in §E.4.0.0.1.


2.2 Vectorized-matrix inner product

Euclidean space Rⁿ comes equipped with a linear vector inner-product

⟨y, z⟩ ≜ yᵀz    (25)

Two vectors are orthogonal (perpendicular) to one another if and only if their inner product vanishes;

y ⊥ z ⇔ ⟨y, z⟩ = 0    (26)

A vector inner-product defines a norm

‖y‖₂ ≜ √(yᵀy),  ‖y‖₂ = 0 ⇔ y = 0    (27)

When orthogonal vectors each have unit norm, then they are orthonormal. For linear operation A on a vector, represented by a real matrix, the adjoint operation Aᵀ is transposition and defined for matrix A by [140,§3.10]

⟨y, Aᵀz⟩ ≜ ⟨Ay, z⟩    (28)

The vector inner-product for matrices is calculated just as it is for vectors; by first transforming a matrix in R^{p×k} to a vector in R^{pk} by concatenating its columns in the natural order. For lack of a better term, we shall call that linear bijective (one-to-one and onto [140, App.A1.2]) transformation vectorization. For example, the vectorization of Y = [y₁ y₂ · · · y_k] ∈ R^{p×k} [94] [208] is

$$\operatorname{vec} Y \triangleq \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_k \end{bmatrix}\in\mathbb R^{pk} \qquad(29)$$

Then the vectorized-matrix inner-product is trace of matrix inner-product; for Z ∈ R^{p×k}, [38,§2.6.1] [123,§0.3.1] [251,§8] [238,§2.2]

⟨Y, Z⟩ ≜ tr(YᵀZ) = vec(Y)ᵀ vec Z    (30)

where (§A.1.1)

tr(YᵀZ) = tr(ZYᵀ) = tr(YZᵀ) = tr(ZᵀY) = 1ᵀ(Y ∘ Z)1    (31)
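Identities (30) and (31) are easily confirmed numerically; an illustrative Python/NumPy check:

```python
import numpy as np

# Check ⟨Y,Z⟩ = tr(Y^T Z) = vec(Y)^T vec(Z) = 1^T (Y∘Z) 1 on random data.
rng = np.random.default_rng(5)
Y = rng.standard_normal((4, 3))
Z = rng.standard_normal((4, 3))

vecY = Y.T.reshape(-1)     # column-stacking vectorization (29)
vecZ = Z.T.reshape(-1)
lhs = np.trace(Y.T @ Z)
print(np.isclose(lhs, vecY @ vecZ))                          # (30)
print(np.isclose(lhs, np.ones(4) @ (Y * Z) @ np.ones(3)))    # (31)
```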


and where ∘ denotes the Hadamard product^{2.9} of matrices [125] [88,§1.1.4]. The adjoint operation Aᵀ on a matrix can therefore be defined in like manner:

⟨Y, AᵀZ⟩ ≜ ⟨AY, Z⟩    (32)

For example, take any element C₁ from a matrix-valued set in R^{p×k}, and consider any particular dimensionally compatible real vectors v and w. Then vector inner-product of C₁ with vwᵀ is

⟨vwᵀ, C₁⟩ = vᵀC₁w = tr(wvᵀC₁) = 1ᵀ((vwᵀ) ∘ C₁)1    (33)

2.2.0.0.1 Example. Application of the image theorem.
Suppose the set C ⊆ R^{p×k} is convex. Then for any particular vectors v ∈ R^p and w ∈ R^k, the set of vector inner-products

Y ≜ vᵀCw = ⟨vwᵀ, C⟩ ⊆ R    (34)

is convex. This result is a consequence of the image theorem. Yet it is easy to show directly that convex combination of elements from Y remains an element of Y.^{2.10}

More generally, vwᵀ in (34) may be replaced with any particular matrix Z ∈ R^{p×k} while convexity of the set ⟨Z, C⟩ ⊆ R persists. Further, by replacing v and w with any particular respective matrices U and W of dimension compatible with all elements of convex set C, then set UᵀCW is convex by the image theorem because it is a linear mapping of C.

^{2.9} The Hadamard product is a simple entrywise product of corresponding entries from two matrices of like size; id est, not necessarily square.
^{2.10} To verify that, take any two elements C₁ and C₂ from the convex matrix-valued set C, and then form the vector inner-products (34) that are two elements of Y by definition. Now make a convex combination of those inner products; videlicet, for 0 ≤ µ ≤ 1

µ⟨vwᵀ, C₁⟩ + (1 − µ)⟨vwᵀ, C₂⟩ = ⟨vwᵀ, µC₁ + (1 − µ)C₂⟩

The two sides are equivalent by linearity of inner product. The right-hand side remains a vector inner-product of vwᵀ with an element µC₁ + (1 − µ)C₂ from the convex set C ; hence it belongs to Y. Since that holds true for any two elements from Y, then it must be a convex set.


2.2.1 Frobenius'

2.2.1.0.1 Definition. Isomorphic.
An isomorphism of a vector space is a transformation equivalent to a linear bijective mapping. The image and inverse image under the transformation operator are then called isomorphic vector spaces. △

Isomorphic vector spaces are characterized by preservation of adjacency; id est, if v and w are points connected by a line segment in one vector space, then their images will also be connected by a line segment. Two Euclidean bodies may be considered isomorphic if there exists an isomorphism of their corresponding ambient spaces. [239,§I.1]

When Z = Y ∈ R^{p×k} in (30), Frobenius' norm is resultant from vector inner-product;

$$\|Y\|_F^2 = \|\operatorname{vec} Y\|_2^2 = \langle Y, Y\rangle = \operatorname{tr}(Y^{\rm T} Y) = \sum_{i,j} Y_{ij}^2 = \sum_i \lambda(Y^{\rm T} Y)_i = \sum_i \sigma(Y)_i^2 \qquad(35)$$

where λ(YᵀY)_i is the i th eigenvalue of YᵀY, and σ(Y)_i the i th singular value of Y. Were Y a normal matrix (§A.5.2), then σ(Y) = |λ(Y)| [261,§8.1] thus

$$\|Y\|_F^2 = \sum_i \lambda(Y)_i^2 = \|\lambda(Y)\|_2^2 \qquad(36)$$

The converse (36) ⇒ normal Y also holds. [125,§2.5.4]

Because the metrics are equivalent

‖vec X − vec Y‖₂ = ‖X − Y‖_F    (37)

and because vectorization (29) is a linear bijective map, then vector space R^{p×k} is isometrically isomorphic with vector space R^{pk} in the Euclidean sense and vec is an isometric isomorphism on R^{p×k}.^{2.11} Because of this Euclidean structure, all the known results from convex analysis in Euclidean space Rⁿ carry over directly to the space of real matrices R^{p×k}.

^{2.11} Given matrix A, its range R(A) (§2.5) is isometrically isomorphic with its vectorized range vec R(A) but not with R(vec A).
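A numerical confirmation of (35) and the isometry (37) on random matrices:

```python
import numpy as np

# Frobenius norm via vec, eigenvalues of Y^T Y, and singular values.
rng = np.random.default_rng(6)
Y = rng.standard_normal((4, 3))
X = rng.standard_normal((4, 3))

f = np.linalg.norm(Y, 'fro')**2
print(np.isclose(f, np.linalg.norm(Y.T.reshape(-1))**2))             # ‖vec Y‖²
print(np.isclose(f, np.linalg.eigvalsh(Y.T @ Y).sum()))              # Σ λ(YᵀY)
print(np.isclose(f, (np.linalg.svd(Y, compute_uv=False)**2).sum()))  # Σ σ(Y)²
print(np.isclose(np.linalg.norm(X - Y, 'fro'),
                 np.linalg.norm((X - Y).T.reshape(-1))))             # isometry (37)
```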


2.2.1.0.2 Definition. Isometric isomorphism.
An isometric isomorphism of a vector space having a metric defined on it is a linear bijective mapping T that preserves distance; id est, for all x, y ∈ dom T

‖Tx − Ty‖ = ‖x − y‖    (38)

Then the isometric isomorphism T is a bijective isometry. △

Unitary linear operator Q : Rⁿ → Rⁿ representing orthogonal matrix Q ∈ R^{n×n} (§B.5), for example, is an isometric isomorphism. Yet isometric operator T : R² → R³ representing

$$T = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{bmatrix}$$

on R² is injective but not a surjective map [140,§1.6] to R³.

The Frobenius norm is orthogonally invariant; meaning, for X, Y ∈ R^{p×k} and dimensionally compatible orthonormal matrix^{2.12} U and orthogonal matrix Q

‖U(X − Y)Q‖_F = ‖X − Y‖_F    (39)

2.2.2 Symmetric matrices

2.2.2.0.1 Definition. Symmetric matrix subspace.
Define a subspace of R^{M×M} : the convex set of all symmetric M×M matrices;

S^M ≜ {A ∈ R^{M×M} | A = Aᵀ} ⊆ R^{M×M}    (40)

This subspace comprising symmetric matrices S^M is isomorphic with the vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. The orthogonal complement [212] [154] of S^M is

S^{M⊥} ≜ {A ∈ R^{M×M} | A = −Aᵀ} ⊂ R^{M×M}    (41)

the subspace of antisymmetric matrices in R^{M×M} ; id est,

S^M ⊕ S^{M⊥} = R^{M×M}    (42)
△

^{2.12} Any matrix whose columns are orthonormal with respect to each other; this includes the orthogonal matrices.


All antisymmetric matrices are hollow by definition (have 0 main-diagonal). Any square matrix A ∈ R^{M×M} can be written as the sum of its symmetric and antisymmetric parts: respectively,

A = ½(A + Aᵀ) + ½(A − Aᵀ)    (43)

The symmetric part is orthogonal in R^{M²} to the antisymmetric part; videlicet,

tr((Aᵀ + A)(A − Aᵀ)) = 0    (44)

In the ambient space of real matrices, the antisymmetric matrix subspace can be described

S^{M⊥} ≜ {½(A − Aᵀ) | A ∈ R^{M×M}} ⊂ R^{M×M}    (45)

because any matrix in S^M is orthogonal to any matrix in S^{M⊥}. Further confined to the ambient subspace of symmetric matrices, because of antisymmetry, S^{M⊥} would become trivial.

2.2.2.1 Isomorphism on symmetric matrix subspace

When a matrix is symmetric in S^M, we may still employ the vectorization transformation (29) to R^{M²} ; vec, an isometric isomorphism. We might instead choose to realize in the lower-dimensional subspace R^{M(M+1)/2} by ignoring redundant entries (below the main diagonal) during transformation. Such a realization would remain isomorphic but not isometric. Lack of isometry is a spatial distortion due now to disparity in metric between R^{M²} and R^{M(M+1)/2}. To realize isometrically in R^{M(M+1)/2}, we must make a correction: For Y = [Y_ij] ∈ S^M we introduce the symmetric vectorization

$$\operatorname{svec} Y \triangleq \begin{bmatrix} Y_{11}\\ \sqrt2\,Y_{12}\\ Y_{22}\\ \sqrt2\,Y_{13}\\ \sqrt2\,Y_{23}\\ Y_{33}\\ \vdots\\ Y_{MM} \end{bmatrix} \in\mathbb R^{M(M+1)/2} \qquad(46)$$


where all entries off the main diagonal have been scaled. Now for Z ∈ S^M

⟨Y, Z⟩ ≜ tr(YᵀZ) = vec(Y)ᵀ vec Z = svec(Y)ᵀ svec Z    (47)

Then because the metrics become equivalent, for X ∈ S^M

‖svec X − svec Y‖₂ = ‖X − Y‖_F    (48)

and because symmetric vectorization (46) is a linear bijective mapping, then svec is an isometric isomorphism on the symmetric matrix subspace. In other words, S^M is isometrically isomorphic with R^{M(M+1)/2} in the Euclidean sense under transformation svec.

The set of all symmetric matrices S^M forms a proper subspace in R^{M×M}, so for it there exists a standard orthonormal basis in isometrically isomorphic R^{M(M+1)/2}

$$\{E_{ij}\in\mathbb S^M\} = \begin{cases} e_i e_i^{\rm T}, & i=j=1\ldots M\\[2pt] \tfrac{1}{\sqrt2}\left(e_i e_j^{\rm T} + e_j e_i^{\rm T}\right), & 1\le i<j\le M \end{cases} \qquad(49)$$

where M(M+1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M. Thus we have a basic orthogonal expansion for Y ∈ S^M

$$Y = \sum_{j=1}^{M}\sum_{i=1}^{j} \langle E_{ij}, Y\rangle E_{ij} \qquad(50)$$

whose coefficients

$$\langle E_{ij}, Y\rangle = \begin{cases} Y_{ii}, & i=1\ldots M\\ \sqrt2\,Y_{ij}, & 1\le i<j\le M \end{cases} \qquad(51)$$

correspond to entries of the symmetric vectorization (46).
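A minimal implementation of svec (46), confirming (47) and (48) on random symmetric matrices:

```python
import numpy as np

# svec: stack the upper triangle column by column, scaling off-diagonal
# entries by sqrt(2) so that inner products and distances are preserved.
def svec(Y):
    M = Y.shape[0]
    out = []
    for j in range(M):
        for i in range(j + 1):
            out.append(Y[i, j] if i == j else np.sqrt(2) * Y[i, j])
    return np.array(out)

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4)); Y = (A + A.T) / 2
B = rng.standard_normal((4, 4)); Z = (B + B.T) / 2
print(np.isclose(np.trace(Y @ Z), svec(Y) @ svec(Z)))   # (47)
print(np.isclose(np.linalg.norm(Y - Z, 'fro'),
                 np.linalg.norm(svec(Y) - svec(Z))))    # (48)
```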


2.2.3 Symmetric hollow subspace

2.2.3.0.1 Definition. Hollow subspaces. [228]
Define a subspace of R^{M×M} : the convex set of all (real) symmetric M×M matrices having 0 main-diagonal;

R^M_h ≜ {A ∈ R^{M×M} | A = Aᵀ, δ(A) = 0} ⊂ R^{M×M}    (52)

where the main diagonal of A ∈ R^{M×M} is denoted (§A.1.1)

δ(A) ∈ R^M    (1024)

Operating on a vector, δ naturally returns a diagonal matrix; δ²(A) is a diagonal matrix. Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ R^{N×N}, δ(δ(Λ)) returns Λ itself;

δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ    (1025)

The subspace R^M_h (52) comprising (real) symmetric hollow matrices is isomorphic with subspace R^{M(M−1)/2}. The orthogonal complement of R^M_h is

R^{M⊥}_h ≜ {A ∈ R^{M×M} | A = −Aᵀ + 2δ²(A)} ⊆ R^{M×M}    (53)

the subspace of antisymmetric antihollow matrices in R^{M×M} ; id est,

R^M_h ⊕ R^{M⊥}_h = R^{M×M}    (54)

Yet defined instead as a proper subspace of S^M,

S^M_h ≜ {A ∈ S^M | δ(A) = 0} ⊂ S^M    (55)

the orthogonal complement S^{M⊥}_h of S^M_h in ambient S^M

S^{M⊥}_h ≜ {A ∈ S^M | A = δ²(A)} ⊆ S^M    (56)

is simply the subspace of diagonal matrices; id est,

S^M_h ⊕ S^{M⊥}_h = S^M    (57)
△


Any matrix A ∈ R^{M×M} can be written as the sum of its symmetric hollow and antisymmetric antihollow parts: respectively,

A = (½(A + Aᵀ) − δ²(A)) + (½(A − Aᵀ) + δ²(A))    (58)

The symmetric hollow part is orthogonal in R^{M²} to the antisymmetric antihollow part; videlicet,

tr((½(A + Aᵀ) − δ²(A))(½(A − Aᵀ) + δ²(A))) = 0    (59)

In the ambient space of real matrices, the antisymmetric antihollow subspace is described

S^{M⊥}_h ≜ {½(A − Aᵀ) + δ²(A) | A ∈ R^{M×M}} ⊆ R^{M×M}    (60)

because any matrix in S^M_h is orthogonal to any matrix in S^{M⊥}_h. Yet in the ambient space of symmetric matrices S^M, the antihollow subspace is nontrivial;

S^{M⊥}_h ≜ {δ²(A) | A ∈ S^M} = {δ(u) | u ∈ R^M} ⊆ S^M    (61)

In anticipation of their utility with Euclidean distance matrices (EDMs) in chapter 4, for symmetric hollow matrices we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (46): for Y = [Y_ij] ∈ S^M_h

$$\operatorname{dvec} Y \triangleq \sqrt2 \begin{bmatrix} Y_{12}\\ Y_{13}\\ Y_{23}\\ Y_{14}\\ Y_{24}\\ Y_{34}\\ \vdots\\ Y_{M-1,M} \end{bmatrix} \in\mathbb R^{M(M-1)/2} \qquad(62)$$

Like svec (46), dvec is an isometric isomorphism on the symmetric hollow subspace.


Figure 11: Convex hull of a random list of points in R³. Some points from that generating list reside in the interior of this convex polyhedron. [244, Convex Polyhedron] (Avis-Fukuda-Mizukoshi)

The set of all symmetric hollow matrices S^M_h forms a proper subspace in R^{M×M}, so for it there must be a standard orthonormal basis in isometrically isomorphic R^{M(M−1)/2}

$$\{E_{ij}\in\mathbb S^M_h\} = \left\{\tfrac{1}{\sqrt2}\left(e_i e_j^{\rm T} + e_j e_i^{\rm T}\right),\ 1\le i<j\le M\right\} \qquad(63)$$

where M(M−1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M.

The symmetric hollow majorization corollary on page 426 characterizes eigenvalues of symmetric hollow matrices.

2.3 Hulls

2.3.1 Affine dimension, affine hull

Affine dimension of any set in Rⁿ is the dimension of the smallest affine set (empty set, point, line, plane, hyperplane (§2.4.2), subspace, Rⁿ) that contains it. For nonempty sets, affine dimension is the same as dimension of the subspace parallel to that affine set. [194,§1] [123,§A.2.1]

Ascribe the points in a list {x_l ∈ Rⁿ, l = 1…N} to the columns of matrix X :

X = [x₁ · · · x_N] ∈ R^{n×N}    (64)


In particular, we define affine dimension r of the N-point list X as dimension of the smallest affine set in Euclidean space Rⁿ that contains X ;

r ≜ dim aff X    (65)

Affine dimension r is a lower bound sometimes called embedding dimension. [228] [110] That affine set A in which those points are embedded is unique and called the affine hull [38,§2.1.2] [210,§2.1];

A ≜ aff{x_l ∈ Rⁿ, l = 1…N} = aff X = x₁ + R{x_l − x₁, l = 2…N} = {Xa | aᵀ1 = 1} ⊆ Rⁿ    (66)

Given some arbitrary set C and any x ∈ C

aff C = x + aff(C − x)    (67)

where aff(C − x) is a subspace.

2.3.1.0.1 Definition. We analogize affine subset to subspace,^{2.13} defining it to be any nonempty affine set (§2.1.4). △

aff ∅ ≜ ∅    (68)

The affine hull of a point x is that point itself;

aff{x} = {x}    (69)

The affine hull of two distinct points is the unique line through them. (Figure 14) The affine hull of three noncollinear points in any dimension is that unique plane containing the points, and so on. The subspace of symmetric matrices S^m is the affine hull of the cone of positive semidefinite matrices; (§2.9)

aff S^m₊ = S^m    (70)

^{2.13} The popular term affine subspace is an oxymoron.
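Affine dimension (65) is computable as the rank of the centered list X − x₁1ᵀ; a small sketch with six points lying on a (translated) plane in R³:

```python
import numpy as np

# Affine dimension r of a list X equals rank of the centered list.
rng = np.random.default_rng(8)
plane = rng.standard_normal((3, 2))               # spans a 2-dim subspace of R^3
X = plane @ rng.standard_normal((2, 6)) + 5.0     # 6 points on a shifted plane
r = np.linalg.matrix_rank(X - X[:, :1])           # X - x_1 1^T
print(r)                                          # 2
```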


Figure 12: A simplicial cone (§2.12.3.1.1) in R³ whose boundary is drawn truncated; constructed using A ∈ R^{3×3} and C = 0 in (234). By the most fundamental definition of a cone (§2.7.1), entire boundary can be constructed from an aggregate of rays emanating exclusively from the origin. Each of three extreme directions corresponds to an edge (§2.6.0.0.3); they are conically, affinely, and linearly independent for this cone. Because this set is polyhedral, exposed directions are in one-to-one correspondence with extreme directions; there are only three. Its extreme directions give rise to what is called a vertex-description of this polyhedral cone; simply, the conic hull of extreme directions. Obviously this cone can also be constructed by intersection of three halfspaces; hence the equivalent halfspace-description.


2.3.2 Convex hull

The convex hull [123,§A.1.4] [38,§2.1.4] [194] of any bounded^{2.14} list (or set) of N points X ∈ R^{n×N} forms a unique convex polyhedron (§2.12.0.0.1) whose vertices constitute some subset of that list;

P ≜ conv{x_l, l = 1…N} = conv X = {Xa | aᵀ1 = 1, a ⪰ 0} ⊆ Rⁿ    (71)

The union of relative interior and relative boundary (§2.6.1.3) of the polyhedron comprise the convex hull P, the smallest closed convex set that contains the list X ; e.g., Figure 11. Given P, the generating list {x_l} is not unique.

Given some arbitrary set C

$$\operatorname{conv}\mathcal C \subseteq \operatorname{aff}\mathcal C = \overline{\operatorname{aff}\mathcal C} = \operatorname{aff}\overline{\mathcal C} = \operatorname{aff}\operatorname{conv}\mathcal C \qquad(72)$$

Any closed bounded convex set C is equal to the convex hull of its boundary;

C = conv ∂C    (73)

conv ∅ ≜ ∅    (74)

2.3.2.1 Comparison with respect to R^N₊

The notation a ⪰ 0 means vector a belongs to the nonnegative orthant R^N₊, whereas a ⪰ b denotes comparison of vector a to vector b on R^N with respect to the nonnegative orthant; id est, a ⪰ b means a − b belongs to the nonnegative orthant. In particular, a ⪰ b ⇔ a_i ≥ b_i ∀i. (297)

Comparison of matrices with respect to the positive semidefinite cone, like I ⪰ A ⪰ 0 in Example 2.3.2.1.1, is explained in §2.9.0.1.

^{2.14} A set in Rⁿ is bounded if and only if it can be contained in a Euclidean ball having finite radius. [61,§2.2] (confer §4.7.3.0.1)
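Membership in the vertex-description (71) is a linear feasibility problem; a sketch using scipy.optimize.linprog (the helper in_conv_hull is ours, for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# p ∈ conv X iff there exists a ⪰ 0 with a^T 1 = 1 and Xa = p.
def in_conv_hull(X, p):
    n, N = X.shape
    A_eq = np.vstack([X, np.ones((1, N))])
    b_eq = np.concatenate([p, [1.0]])
    res = linprog(np.zeros(N), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * N)
    return res.status == 0                       # 0: feasible point found

rng = np.random.default_rng(9)
X = rng.standard_normal((2, 8))
print(in_conv_hull(X, X.mean(axis=1)))           # True: centroid is in the hull
print(in_conv_hull(X, X[:, 0] + 10.0))           # False: far outside the hull
```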


Figure 13: The circle (radius 1/√2), shown here on boundary of positive semidefinite cone S²₊ in isometrically isomorphic R³ from Figure 29 (coordinates (α, √2β, γ) of svec applied to [α β; β γ]), comprises boundary of a Fantope (75) in this dimension (k = 1, N = 2). Lone point illustrated is the identity matrix I and corresponds to case k = 2. (View is from inside PSD cone looking toward origin.)


2.3.2.1.1 Example. Convex hull of outer product. [179] [9,§4.1] [183,§3] [148,§2.4]
Hull of the set comprising outer product of orthonormal matrices has equivalent expression:

conv{UUᵀ | U ∈ R^{N×k}, UᵀU = I} = {A ∈ S^N | I ⪰ A ⪰ 0, ⟨I, A⟩ = k}    (75)

This we call a Fantope. In case k = 1, there is slight simplification: ((1205), Example 2.9.2.3.1)

conv{UUᵀ | U ∈ R^N, UᵀU = I} = {A ∈ S^N | A ⪰ 0, ⟨I, A⟩ = 1}    (76)

The set, for any k

{UUᵀ | U ∈ R^{N×k}, UᵀU = I}    (77)

comprises the extreme points (§2.6.0.0.1) of its convex hull. By (1067), each and every extreme point UUᵀ has only k nonzero eigenvalues and they all equal 1 ; id est, λ(UUᵀ)_{1:k} = λ(UᵀU) = 1. So the Frobenius norm of each and every extreme point equals the same constant

‖UUᵀ‖²_F = k    (78)

Each extreme point simultaneously lies on the boundary of the positive semidefinite cone (when k < N, §2.9) and on the boundary of a hypersphere of dimension k(N − (k+1)/2) and radius √(k(1 − k/N)) centered at the ray (base 0) through the identity matrix in isomorphic vector space R^{N(N+1)/2} (§2.2.2.1). Figure 13 illustrates the extreme points (77) comprising the boundary of a Fantope; but that circumscription does not hold in higher dimension. (§2.9.2.4)

2.3.3 Conic hull

In terms of a finite-length point list (or set) arranged columnar in X ∈ R^{n×N} (64), its conic hull is expressed

K ≜ cone{x_l, l = 1…N} = cone X = {Xa | a ⪰ 0} ⊆ Rⁿ    (79)

id est, every nonnegative combination of points from the list. The conic hull of any list forms a polyhedral cone [123,§A.4.3] (§2.12.1.0.1; e.g., Figure 12); the smallest closed convex cone that contains the list.
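Properties (77)–(78) of a Fantope extreme point are easy to verify numerically on a random orthonormal U:

```python
import numpy as np

# An extreme point UU^T has k unit eigenvalues and squared Frobenius norm k.
rng = np.random.default_rng(10)
N, k = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((N, k)))   # U^T U = I, U ∈ R^{N×k}
P = U @ U.T
print(np.round(np.linalg.eigvalsh(P), 12))         # N-k zeros, then k ones
print(np.isclose(np.linalg.norm(P, 'fro')**2, k))  # (78)
print(np.isclose(np.trace(P), k))                  # ⟨I, P⟩ = k as in (75)
```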


Figure 14: Given two points in Euclidean vector space of any dimension, their various hulls are illustrated: A affine hull (drawn truncated), C convex hull, K conic hull (truncated), R range or span is a plane (truncated). Each hull is a subset of range; generally, A, C, K ⊆ R. (Cartesian axes drawn for reference.)


By convention, the aberration [210,§2.1]

cone ∅ ≜ {0}    (80)

Given some arbitrary set C, it is apparent

conv C ⊆ cone C    (81)

2.3.4 Vertex-description

The conditions in (66), (71), and (79) respectively define an affine, convex, and conic combination of elements from the set or list. Whenever a Euclidean body can be described as some hull or span of a set of points, then that representation is loosely called a vertex-description.

2.4 Halfspace, Hyperplane

A two-dimensional affine set is called a plane. An (n−1)-dimensional affine set in Rⁿ is called a hyperplane. [194] [123] Every hyperplane partially bounds a halfspace (which is convex but not affine).

2.4.1 Halfspaces H₊ and H₋

Euclidean space Rⁿ is partitioned into two halfspaces by any hyperplane ∂H ; id est, H₋ + H₊ = Rⁿ. The resulting (closed convex) halfspaces, both partially bounded by ∂H, may be described

H₋ = {y | aᵀy ≤ b} = {y | aᵀ(y − y_p) ≤ 0} ⊂ Rⁿ    (82)
H₊ = {y | aᵀy ≥ b} = {y | aᵀ(y − y_p) ≥ 0} ⊂ Rⁿ    (83)

where nonzero vector a ∈ Rⁿ is an outward-normal to the hyperplane partially bounding H₋ while an inward-normal with respect to H₊. For any vector y − y_p that makes an obtuse angle with normal a, vector y will lie in the halfspace H₋ on one side (shaded in Figure 15) of the hyperplane, while acute angles denote y in H₊ on the other side.

An equivalent more intuitive representation of a halfspace comes about when we consider all the points in Rⁿ closer to point d than to point c or equidistant, in the Euclidean sense; from Figure 15,


Figure 15: Hyperplane illustrated ∂H = {y | aᵀ(y − y_p) = 0} is a line partially bounding halfspaces H₋ = {y | aᵀ(y − y_p) ≤ 0} and H₊ = {y | aᵀ(y − y_p) ≥ 0} in R². Shaded is a rectangular piece of semi-infinite H₋ with respect to which vector a is outward-normal to bounding hyperplane; vector a is inward-normal with respect to H₊. Halfspace H₋ contains nullspace N(aᵀ) = {y | aᵀy = 0} (dashed line through origin) because aᵀy_p > 0. Hyperplane, halfspace, and nullspace are each drawn truncated. Points c and d are equidistant from hyperplane, and vector c − d is normal to it. ∆ is distance from origin to hyperplane.


H₋ = {y | ‖y − d‖ ≤ ‖y − c‖}    (84)

This representation, in terms of proximity, is resolved with the more conventional representation of a halfspace (82) by squaring both sides of the inequality in (84);

$$\mathcal H_- = \left\{y \;\middle|\; (c-d)^{\rm T} y \le \frac{\|c\|^2-\|d\|^2}{2}\right\} = \left\{y \;\middle|\; (c-d)^{\rm T}\!\left(y - \frac{c+d}{2}\right) \le 0\right\} \qquad(85)$$

2.4.1.1 PRINCIPLE 1: Halfspace-description of convex sets

The most fundamental principle in convex geometry follows from the geometric Hahn-Banach theorem [154,§5.12] [14,§1] [70,§I.1.2] which guarantees any closed convex set to be an intersection of halfspaces.

2.4.1.1.1 Theorem. Halfspaces. [38,§2.3.1] [194,§18] [123,§A.4.2(b)] [27,§2.4]
A closed convex set in Rⁿ is equivalent to the intersection of all halfspaces that contain it. ⋄

Intersection of multiple halfspaces in Rⁿ may be represented using a matrix constant A ;

⋂ᵢ H_{i−} = {y | Aᵀy ⪯ b} = {y | Aᵀ(y − y_p) ⪯ 0}    (86)
⋂ᵢ H_{i+} = {y | Aᵀy ⪰ b} = {y | Aᵀ(y − y_p) ⪰ 0}    (87)

where b is now a vector, and the i th column of A is normal to a hyperplane ∂H_i partially bounding H_i. By the halfspaces theorem, intersections like this can describe interesting convex Euclidean bodies such as polyhedra and cones, giving rise to the term halfspace-description.
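The algebra behind the equivalence of (84) and (85) can be spot-checked numerically; an illustrative sketch:

```python
import numpy as np

# Identity behind (84) ≡ (85):
#   ‖y-d‖² − ‖y-c‖²  =  2 (c-d)^T (y − (c+d)/2)   for all y.
rng = np.random.default_rng(11)
c, d = rng.standard_normal(3), rng.standard_normal(3)
y = rng.standard_normal((1000, 3))

lhs = np.linalg.norm(y - d, axis=1)**2 - np.linalg.norm(y - c, axis=1)**2
rhs = 2 * (y - (c + d)/2) @ (c - d)
print(np.allclose(lhs, rhs))   # True: both sign conditions define the same H_-
```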


Figure 16: Hyperplanes in R² (drawn truncated); the panels show {y | aᵀy = ±1} for a = [1; 1], {y | bᵀy = ±1} for b = [−1; −1], {y | cᵀy = ±1} for c = [−1; 1], {y | dᵀy = ±1} for d = [1; −1], and {y | eᵀy = ±1} for e = [1; 0]. Hyperplane movement in normal direction increases vector inner-product. This visual concept is exploited to attain analytical solution of some linear programs; e.g., Example 2.4.2.6.2, Example 3.1.1.2.1, [38, exer.4.8–exer.4.20]. Each graph is also interpretable as a contour plot of a real affine function of two variables as in Figure 51.


2.4.2 Hyperplane ∂H representations

Every hyperplane ∂H is an affine set parallel to an (n−1)-dimensional subspace of Rⁿ ; it is itself a subspace if and only if it contains the origin.

dim ∂H = n − 1    (88)

Every hyperplane can be described as the intersection of complementary halfspaces; [194,§19]

∂H = H₋ ∩ H₊ = {y | aᵀy ≤ b, aᵀy ≥ b} = {y | aᵀy = b}    (89)

a halfspace-description. Assuming normal a ∈ Rⁿ to be nonzero, then any hyperplane in Rⁿ can be described as the solution set to vector equation aᵀy = b, illustrated in Figure 15 and Figure 16 for R² ;

∂H ≜ {y | aᵀy = b} = {y | aᵀ(y − y_p) = 0} = {Zξ + y_p | ξ ∈ R^{n−1}} ⊂ Rⁿ    (90)

All solutions y constituting the hyperplane are offset from the nullspace of aᵀ by the same constant vector y_p ∈ Rⁿ that is any particular solution to aᵀy = b ; id est,

y = Zξ + y_p    (91)

where the columns of Z ∈ R^{n×n−1} constitute a basis for the nullspace N(aᵀ) = {x ∈ Rⁿ | aᵀx = 0}.^{2.15}

Conversely, given any point y_p in Rⁿ, the unique hyperplane containing it having normal a is the affine set ∂H (90) where b equals aᵀy_p and where a basis for N(aᵀ) is arranged in Z columnar. Hyperplane dimension is apparent from the dimensions of Z ; that hyperplane is parallel to the span of its columns.

^{2.15} We will later find this expression for y in terms of nullspace of aᵀ (more generally, of matrix Aᵀ (116)) to be a useful device for eliminating affine equality constraints, much as we did here.
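A sketch of representation (90)–(91), computing a nullspace basis Z for N(aᵀ) via SVD (one of several valid choices of basis):

```python
import numpy as np

# Represent the hyperplane {y | a^T y = b} as y = Zξ + y_p.
a = np.array([1.0, 2.0, -1.0]); b = 3.0
yp = a * b / (a @ a)                  # a particular solution to a^T y = b
_, _, Vt = np.linalg.svd(a[None, :])  # rows 1..n-1 of Vt span N(a^T)
Z = Vt[1:].T                          # n×(n-1) basis for the nullspace

xi = np.array([0.7, -1.3])            # arbitrary coordinates
y = Z @ xi + yp
print(np.isclose(a @ y, b))           # True: every such y lies on ∂H
```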


2.4.2.0.1 Example. Distance from origin to hyperplane.
Given the (shortest) distance ∆ ∈ R₊ from the origin to a hyperplane having normal vector a, we can find its representation ∂H by dropping a perpendicular. The point thus found is the orthogonal projection of the origin on ∂H (§E.5.0.0.5), equal to a∆/‖a‖ if the origin is known a priori to belong to halfspace H₋ (Figure 15), or equal to −a∆/‖a‖ if the origin belongs to halfspace H₊ ; id est, when H₋ ∋ 0

∂H = {y | aᵀ(y − a∆/‖a‖) = 0} = {y | aᵀy = ‖a‖∆}    (92)

or when H₊ ∋ 0

∂H = {y | aᵀ(y + a∆/‖a‖) = 0} = {y | aᵀy = −‖a‖∆}    (93)

Knowledge of only distance ∆ and normal a thus introduces ambiguity into the hyperplane representation.

2.4.2.1 Matrix variable

Any halfspace in R^{mn} may be represented using a matrix variable. For variable Y ∈ R^{m×n}, given constants A ∈ R^{m×n} and b = ⟨A, Y_p⟩ ∈ R,

H₋ = {Y ∈ R^{mn} | ⟨A, Y⟩ ≤ b} = {Y ∈ R^{mn} | ⟨A, Y − Y_p⟩ ≤ 0}    (94)
H₊ = {Y ∈ R^{mn} | ⟨A, Y⟩ ≥ b} = {Y ∈ R^{mn} | ⟨A, Y − Y_p⟩ ≥ 0}    (95)

Recall vector inner-product from §2.2, ⟨A, Y⟩ = tr(AᵀY).

Hyperplanes in R^{mn} may, of course, also be represented using matrix variables.

∂H = {Y | ⟨A, Y⟩ = b} = {Y | ⟨A, Y − Y_p⟩ = 0} ⊂ R^{mn}    (96)

Vector a from Figure 15 is normal to the hyperplane illustrated. Likewise, nonzero vectorized matrix A is normal to hyperplane ∂H ;

A ⊥ ∂H in R^{mn}    (97)


Figure 17: Any one particular point of three points illustrated does not belong to affine hull A_i (i ∈ 1, 2, 3, each drawn truncated) of points remaining. Three corresponding vectors in R² are, therefore, affinely independent (but neither linearly nor conically independent).

2.4.2.2 Vertex-description of hyperplane

Any hyperplane in Rⁿ may be described as the affine hull of a minimal set of points {x_l ∈ Rⁿ, l = 1…n} arranged columnar in a matrix X ∈ R^{n×n} (64):

$$\begin{array}{rll} \partial\mathcal H &= \operatorname{aff}\{x_l\in\mathbb R^n,\ l=1\ldots n\}, & \dim\operatorname{aff}\{x_l\,\forall l\} = n-1\\ &= \operatorname{aff} X, & \dim\operatorname{aff} X = n-1\\ &= x_1 + \mathcal R\{x_l - x_1,\ l=2\ldots n\}, & \dim\mathcal R\{x_l - x_1,\ l=2\ldots n\} = n-1\\ &= x_1 + \mathcal R(X - x_1\mathbf 1^{\rm T}), & \dim\mathcal R(X - x_1\mathbf 1^{\rm T}) = n-1 \end{array} \qquad(98)$$

where

R(A) = {Ax | x ∈ Rⁿ}    (114)

2.4.2.3 Affine independence, minimal set

For any particular affine set, a minimal set of points constituting its vertex-description is an affinely independent descriptive set and vice versa.


Arbitrary given points {x_i ∈ Rⁿ, i = 1…N} are affinely independent (a.i.) if and only if, over all ζ ∈ R^N with ζᵀ1 = 1, ζ_k = 0 (confer §2.1.2)

x_i ζ_i + · · · + x_j ζ_j − x_k = 0, i ≠ · · · ≠ j ≠ k = 1…N    (99)

has no solution ζ ; in words, iff no point from the given set can be expressed as an affine combination of those remaining. We deduce

l.i. ⇒ a.i.    (100)

Consequently, {x_i, i = 1…N} is an affinely independent set if and only if {x_i − x₁, i = 2…N} is a linearly independent (l.i.) set. [128,§3] (Figure 17) This is equivalent to the property that the columns of $\begin{bmatrix}X\\ \mathbf 1^{\rm T}\end{bmatrix}$ (for X ∈ R^{n×N} as in (64)) form a linearly independent set. [123,§A.1.3]

2.4.2.4 Preservation of affine independence

Independence in the linear (§2.1.2.1), affine, and conic (§2.10.1) senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (64) holds an affinely independent set in its columns. Consider a transformation

T(X) : R^{n×N} → R^{n×N} ≜ XY    (101)

where the given matrix Y ≜ [y₁ y₂ · · · y_N] ∈ R^{N×N} is represented by linear operator T. Affine independence of {Xy_i ∈ Rⁿ, i = 1…N} demands (by definition (99)) there exists no solution ζ ∈ R^N with ζᵀ1 = 1, ζ_k = 0, to

Xy_i ζ_i + · · · + Xy_j ζ_j − Xy_k = 0, i ≠ · · · ≠ j ≠ k = 1…N    (102)

By factoring X, we see that is ensured by affine independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0 where

R(A) = {Ax | x ∈ Rⁿ}    (114)
N(A) = {x ∈ Rⁿ | Ax = 0}    (115)

2.4.2.5 affine maps

Affine transformations preserve affine hulls. Given any affine mapping T [194, p.8]

aff(T C) = T(aff C)    (103)
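The equivalent rank test for affine independence lends itself to a few lines of code (the helper affinely_independent is ours, for illustration):

```python
import numpy as np

# Points are affinely independent iff the columns of [X; 1^T] are l.i.
def affinely_independent(X):
    N = X.shape[1]
    return np.linalg.matrix_rank(np.vstack([X, np.ones((1, N))])) == N

X = np.array([[0., 1., 0.],
              [0., 0., 1.]])          # three corners of a triangle in R^2
print(affinely_independent(X))        # True
Y = np.array([[0., 1., 2.],
              [0., 1., 2.]])          # three collinear points
print(affinely_independent(Y))        # False
```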


Figure 18: (C ⊂ R², halfspaces H₋ and H₊, scalars 0 > κ₃ > κ₂ > κ₁.) Each shaded line segment {z ∈ C | aᵀz = κ_i} belonging to set C ⊂ R² shows intersection with hyperplane parametrized by scalar κ_i ; each shows a (linear) contour in vector z of equal inner product with normal vector a. Cartesian axes drawn for reference. (confer Figure 51)


Figure 19: (a) (tradition) Hyperplane ∂H₋ (104) supporting closed set Y ∈ R². Vector a is inward-normal to hyperplane with respect to halfspace H₊, but outward-normal with respect to set Y. A supporting hyperplane can be considered the limit of an increasing sequence in the normal-direction like that in Figure 18. (b) (nontraditional) Hyperplane ∂H₊ nontraditionally supporting Y. Vector ã is inward-normal to hyperplane now with respect to both halfspace H₊ and set Y. Tradition [123] [194] recognizes only positive normal polarity in support function σ_Y as in (104); id est, normal a, figure (a). But both interpretations of supporting hyperplane are useful.


2.4.2.6 PRINCIPLE 2: Supporting hyperplane

The second most fundamental principle of convex geometry also follows from the geometric Hahn-Banach theorem [154,§5.12] [14,§1] that guarantees existence of at least one hyperplane in Rⁿ supporting a convex set (having nonempty interior)^{2.16} at each point on its boundary.

2.4.2.6.1 Definition. Supporting hyperplane ∂H.
The partial boundary ∂H of a closed halfspace containing arbitrary set Y is called a supporting hyperplane ∂H to Y when it contains at least one point of Y. [194,§11] Specifically, given normal a ≠ 0 (belonging to H₊ by definition), the supporting hyperplane to Y at y_p ∈ ∂Y [sic] is

∂H₋ = {y | aᵀ(y − y_p) = 0, y_p ∈ Y, aᵀ(z − y_p) ≤ 0 ∀z ∈ Y} = {y | aᵀy = sup{aᵀz | z ∈ Y}}    (104)

where normal a and set Y reside in opposite halfspaces. (Figure 19(a)) Real function

σ_Y(a) ≜ sup{aᵀz | z ∈ Y}    (404)

is called the support function for Y.

An equivalent but nontraditional representation^{2.17} for a supporting hyperplane is obtained by reversing polarity of normal a ; (1260)

∂H₊ = {y | ãᵀ(y − y_p) = 0, y_p ∈ Y, ãᵀ(z − y_p) ≥ 0 ∀z ∈ Y} = {y | ãᵀy = −inf{ãᵀz | z ∈ Y} = sup{−ãᵀz | z ∈ Y}}    (105)

where normal ã and set Y now both reside in H₊. (Figure 19(b))

When a supporting hyperplane contains only a single point of Y, that hyperplane is termed strictly supporting (and termed tangent to Y if the supporting hyperplane is unique there [194,§18, p.169]). △

^{2.16} It is conventional to speak of a hyperplane supporting set C but not containing C ; called nontrivial support. [194, p.100] Hyperplanes in support of lower-dimensional bodies are admitted.
^{2.17} useful for constructing the dual cone; e.g., Figure 40(b). Tradition recognizes the polar cone; which is the negative of the dual cone.


A closed convex set C ⊂ Rⁿ, for example, can be expressed as the intersection of all halfspaces partially bounded by hyperplanes supporting it; videlicet, [154, p.135]

$$\mathcal C = \bigcap_{a\in\mathbb R^n} \left\{y \mid a^{\rm T} y \le \sigma_{\mathcal C}(a)\right\} \qquad(106)$$

by the halfspaces theorem (§2.4.1.1.1). There is no geometric difference^{2.18} between supporting hyperplane ∂H₊ or ∂H₋ or ∂H and an ordinary hyperplane ∂H coincident with them.

2.4.2.6.2 Example. Minimization on the unit cube.
Consider minimization of a linear function on a hypercube, given vector c

minimize_x cᵀx
subject to −1 ⪯ x ⪯ 1    (107)

This convex optimization problem is called a linear program because the constraints describe intersection of a finite number of halfspaces (and hyperplanes). Applying graphical concepts from Figure 16, Figure 18, and Figure 19, an optimal solution can be shown to be x⋆ = −sgn(c) but is not necessarily unique. Because a solution always exists at a hypercube vertex regardless of the value of vector c, [53] mathematicians see this as a means to relax a problem whose desired solution is integer (confer Example 6.2.3.0.2).

2.4.2.6.3 Exercise. Unbounded below.
Suppose instead we minimize over the unit hypersphere in Example 2.4.2.6.2; ‖x‖ ≤ 1. What is an expression for optimal solution now? Is that program still linear?

Now suppose we instead minimize absolute value in (107). Are the following programs equivalent for some arbitrary real convex set C ? (confer (1272))

$$\begin{array}{cl} \underset{x\in\mathbb R}{\text{minimize}} & |x|\\ \text{subject to} & -1\le x\le 1\\ & x\in\mathcal C \end{array} \quad\equiv\quad \begin{array}{cl} \underset{x_+,\,x_-}{\text{minimize}} & x_+ + x_-\\ \text{subject to} & 1\ge x_-\ge 0\\ & 1\ge x_+\ge 0\\ & x_+ - x_-\in\mathcal C \end{array} \qquad(108)$$

^{2.18} If vector-normal polarity is unimportant, we may instead signify a supporting hyperplane by ∂H.
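A numerical spot-check of Example 2.4.2.6.2 using scipy.optimize.linprog (illustrative; the analytical solution requires no solver):

```python
import numpy as np
from scipy.optimize import linprog

# Minimizing c^T x over the hypercube -1 ⪯ x ⪯ 1 yields x* = -sgn(c).
rng = np.random.default_rng(12)
c = rng.standard_normal(4)                # a.s. no zero entries, so x* unique
res = linprog(c, bounds=[(-1, 1)] * 4)
print(np.allclose(res.x, -np.sign(c)))    # True
```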


Many optimization problems of interest and some older methods of solution require nonnegative variables. The method illustrated below splits a variable into its nonnegative and negative parts; x = x₊ − x₋ (extensible to vectors). Under what conditions on vector a and scalar b is an optimal solution x⋆ negative infinity?

$$\begin{array}{cl} \underset{x_+\in\mathbb R,\ x_-\in\mathbb R}{\text{minimize}} & x_+ - x_-\\ \text{subject to} & x_-\ge 0\\ & x_+\ge 0\\ & a^{\rm T}\begin{bmatrix}x_+\\ x_-\end{bmatrix} = b \end{array} \qquad(109)$$

Minimization of the objective entails maximization of x₋.

2.4.2.7 PRINCIPLE 3: Separating hyperplane

The third most fundamental principle of convex geometry again follows from the geometric Hahn-Banach theorem [154,§5.12] [14,§1] [70,§I.1.2] that guarantees existence of a hyperplane separating two nonempty convex sets in Rⁿ whose relative interiors are nonintersecting. Separation intuitively means each set belongs to a halfspace on an opposing side of the hyperplane. There are two cases of interest:

1) If the two sets intersect only at their relative boundaries (§2.6.1.3), then there exists a separating hyperplane ∂H containing the intersection but containing no points relatively interior to either set. If at least one of the two sets is open, conversely, then the existence of a separating hyperplane implies the two sets are nonintersecting. [38,§2.5.1]

2) A strictly separating hyperplane ∂H intersects the closure of neither set; its existence is guaranteed when the intersection of the closures is empty and at least one set is bounded. [123,§A.4.1]


2.5 Subspace representations

There are two common forms of expression for subspaces, both coming from elementary linear algebra: range form and nullspace form; a.k.a., vertex-description and halfspace-description respectively.

The fundamental vector subspaces associated with a matrix A ∈ R^{m×n} [212,§3.1] are ordinarily related

R(Aᵀ) ⊥ N(A), N(Aᵀ) ⊥ R(A)    (110)

and of dimension:

dim R(Aᵀ) = dim R(A) = rank A ≤ min{m, n}    (111)
dim N(A) = n − rank A, dim N(Aᵀ) = m − rank A    (112)

From these four fundamental subspaces, the rowspace and range identify one form of subspace description (range form or vertex-description (§2.3.4))

R(Aᵀ) ≜ span Aᵀ = {Aᵀy | y ∈ R^m} = {x ∈ Rⁿ | Aᵀy = x, y ∈ R(A)}    (113)
R(A) ≜ span A = {Ax | x ∈ Rⁿ} = {y ∈ R^m | Ax = y, x ∈ R(Aᵀ)}    (114)

while the nullspaces identify the second common form (nullspace form or halfspace-description (89))

N(A) ≜ {x ∈ Rⁿ | Ax = 0}    (115)
N(Aᵀ) ≜ {y ∈ R^m | Aᵀy = 0}    (116)

Range forms (113) (114) are realized as the respective span of the column vectors in matrices Aᵀ and A, whereas nullspace form (115) or (116) is the solution set to a linear equation similar to hyperplane definition (90). Yet because matrix A generally has multiple rows, halfspace-description N(A) is actually the intersection of as many hyperplanes through the origin; for (115), each row of A is normal to a hyperplane while each row of Aᵀ is a normal for (116).


2.5.1 Subspace or affine set…

Any particular vector subspace R^p can be described as N(A) the nullspace of some matrix A or as R(B) the range of some matrix B.

More generally, we have the choice of expressing an n−m-dimensional affine subset in Rⁿ as the intersection of m hyperplanes, or as the offset span of n−m vectors:

2.5.1.1 …as hyperplane intersection

Any affine subset A of dimension n−m can be described as an intersection of m hyperplanes in Rⁿ ; given fat full-rank (rank = min{m, n}) matrix

$$A \triangleq \begin{bmatrix} a_1^{\rm T}\\ \vdots\\ a_m^{\rm T} \end{bmatrix}\in\mathbb R^{m\times n} \qquad(117)$$

and vector b ∈ R^m,

$$\mathcal A \triangleq \{x\in\mathbb R^n \mid Ax=b\} = \bigcap_{i=1}^{m}\left\{x \mid a_i^{\rm T} x = b_i\right\} \qquad(118)$$

a halfspace-description. (89)

For example: The intersection of any two independent^{2.19} hyperplanes in R³ is a line, whereas three independent hyperplanes intersect at a point. In R⁴, the intersection of two independent hyperplanes is a plane (Example 2.5.1.2.1), whereas three hyperplanes intersect at a line, four at a point, and so on. For n > k

$$\mathcal A \cap \mathbb R^k = \{x\in\mathbb R^n \mid Ax=b\} \cap \mathbb R^k = \bigcap_{i=1}^{m}\left\{x\in\mathbb R^k \mid a_i(1{:}k)^{\rm T} x = b_i\right\} \qquad(119)$$

The result in §2.4.2.2 is extensible; id est, any affine subset A also has a vertex-description.

^{2.19} Hyperplanes are said to be independent iff the normals defining them are linearly independent.


2.5.1.2 …as span of nullspace basis

Alternatively, we may compute a basis for the nullspace of matrix A in (117) and then equivalently express the affine subset as its range plus an offset: Define

Z ≜ basis N(A) ∈ R^{n×n−m}    (120)

so AZ = 0. Then we have the vertex-description,

A = {x ∈ Rⁿ | Ax = b} = {Zξ + x_p | ξ ∈ R^{n−m}} ⊆ Rⁿ    (121)

the offset span of n−m column vectors, where x_p is any particular solution to Ax = b.

2.5.1.2.1 Example. Intersecting planes in 4-space.
Two planes can intersect at a point in four-dimensional Euclidean vector space. It is easy to visualize intersection of two planes in three dimensions; a line can be formed. In four dimensions it is harder to visualize. So let's resort to the tools acquired.

Suppose an intersection of two hyperplanes in four dimensions is specified by a fat full-rank matrix A₁ ∈ R^{2×4} (m = 2, n = 4) as in (118):

$$\mathcal A_1 \triangleq \left\{x\in\mathbb R^4 \;\middle|\; \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix} x = b_1\right\} \qquad(122)$$

The nullspace of A₁ is two dimensional (from Z in (121)), so A₁ represents a plane in four dimensions. Similarly define a second plane in terms of A₂ ∈ R^{2×4} :

$$\mathcal A_2 \triangleq \left\{x\in\mathbb R^4 \;\middle|\; \begin{bmatrix} a_{31} & a_{32} & a_{33} & a_{34}\\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} x = b_2\right\} \qquad(123)$$

If the two planes are independent (meaning any line in one is linearly independent of any line from the other), they will intersect at a point because $\begin{bmatrix}A_1\\ A_2\end{bmatrix}$ is then invertible;

$$\mathcal A_1 \cap \mathcal A_2 = \left\{x\in\mathbb R^4 \;\middle|\; \begin{bmatrix}A_1\\ A_2\end{bmatrix} x = \begin{bmatrix}b_1\\ b_2\end{bmatrix}\right\} \qquad(124)$$
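Example 2.5.1.2.1 in code, with random (hence almost surely independent) planes standing in for the unspecified A_i, b_i:

```python
import numpy as np

# Two independent planes in R^4 intersect at the single point
# [A1; A2]^{-1} [b1; b2] as in (124).
rng = np.random.default_rng(13)
A1, b1 = rng.standard_normal((2, 4)), rng.standard_normal(2)
A2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

A = np.vstack([A1, A2])                  # invertible when planes independent
x = np.linalg.solve(A, np.concatenate([b1, b2]))
print(np.allclose(A1 @ x, b1) and np.allclose(A2 @ x, b2))   # True
```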


2.5.2 Intersection of subspaces

The intersection of nullspaces associated with two matrices A ∈ R^{m×n} and B ∈ R^{k×n} can be expressed most simply as

    N(A) ∩ N(B) = N([A ; B]) ≜ {x ∈ R^n | [A ; B] x = 0}    (125)

the nullspace of their rowwise concatenation.

Suppose the columns of a matrix Z constitute a basis for N(A) while the columns of a matrix W constitute a basis for N(BZ). Then [88, 12.4.2]

    N(A) ∩ N(B) = R(ZW)    (126)

If each basis is orthonormal, then the columns of ZW constitute an orthonormal basis for the intersection.

In the particular circumstance A and B are each positive semidefinite [16, 6], or in the circumstance A and B are two linearly independent dyads (B.1.1), then

    N(A) ∩ N(B) = N(A + B),   { A, B ∈ S^M_+  or  A + B = u_1 v_1^T + u_2 v_2^T (l.i.) }    (127)

2.6 Extreme, Exposed

2.6.0.0.1 Definition. Extreme point.
An extreme point x_ε of a convex set C is a point, belonging to its closure cl C [27, 3.3], that is not expressible as a convex combination of points in C distinct from x_ε; id est, for x_ε ∈ C and all x_1, x_2 ∈ C \ x_ε

    µ x_1 + (1 − µ) x_2 ≠ x_ε,   µ ∈ [0, 1]    (128)
    △


In other words, x_ε is an extreme point of C if and only if x_ε is not a point relatively interior to any line segment in C. [231, 2.10]

Borwein & Lewis offer: [36, 4.1.6] An extreme point of a convex set C is a point x_ε in C whose relative complement C \ x_ε is convex.

The set consisting of a single point C = {x_ε} is itself an extreme point.

2.6.0.0.2 Theorem. Extreme existence. [194, 18.5.3] [17, II.3.5]
A nonempty closed convex set containing no lines has at least one extreme point.    ⋄

2.6.0.0.3 Definition. Face, edge. [123, A.2.3]
A face F of convex set C is a convex subset F ⊆ C such that every closed line segment x_1 x_2 in C, having a relatively interior point x ∈ rel int x_1 x_2 in F, has both endpoints in F. The zero-dimensional faces of C constitute its extreme points. The empty set and C itself are conventional faces of C. [194, 18]

All faces F are extreme sets by definition; id est, for F ⊆ C and all x_1, x_2 ∈ C \ F

    µ x_1 + (1 − µ) x_2 ∉ F,   µ ∈ [0, 1]    (129)

A one-dimensional face of a convex set is called an edge.    △

Dimension of a face is the penultimate number of affinely independent points (2.4.2.3) belonging to it;

    dim F = sup_ρ dim{x_2 − x_1, x_3 − x_1, ..., x_ρ − x_1 | x_i ∈ F, i = 1 ... ρ}    (130)

The point of intersection in C with a strictly supporting hyperplane identifies an extreme point, but not vice versa. The nonempty intersection of any supporting hyperplane with C identifies a face, in general, but not vice versa. To acquire a converse, the concept exposed face requires introduction:


Figure 20: Closed convex set in R^2 (with labeled points A, B, C, D). Point A is exposed hence extreme. Point B is extreme but not an exposed point. Point C is exposed and extreme. Point D is neither an exposed nor extreme point although it belongs to a one-dimensional exposed face. [123, A.2.4] [210, 3.6] Closed face AB is exposed; a facet. The arc is not a conventional face, yet it is composed entirely of extreme points. The union of rotations of this entire set about its vertical edge produces another set in three dimensions having no edges; the same is false for rotation about the horizontal edge.


2.6.1 Exposure

2.6.1.0.1 Definition. Exposed face, exposed point, vertex, facet. [123, A.2.3, A.2.4]
F is an exposed face of an n-dimensional convex set C iff there is a supporting hyperplane ∂H to C such that

    F = C ∩ ∂H    (131)

Only faces of dimension −1 through n−1 can be exposed by a hyperplane.

An exposed point, the definition of vertex, is equivalent to a zero-dimensional exposed face; the point of intersection with a strictly supporting hyperplane.

A facet is an (n−1)-dimensional exposed face of an n-dimensional convex set C; facets exist in one-to-one correspondence with the (n−1)-dimensional faces.^{2.20}

    {exposed points} = {extreme points}
    {exposed faces} ⊆ {faces}    △

2.6.1.1 Density of exposed points

For any closed convex set C, its exposed points constitute a dense subset of its extreme points; [194, 18] [215] [210, 3.6, p.115] dense in the sense [244] that closure of that subset yields the set of extreme points.

For the convex set illustrated in Figure 20, point B cannot be exposed because it relatively bounds both the facet AB and the closed quarter circle, each bounding the set. Since B is not relatively interior to any line segment in the set, then B is an extreme point by definition. Point B may be regarded as the limit of some sequence of exposed points beginning at C.

^{2.20} This coincidence occurs simply because the facet's dimension is the same as the dimension of the supporting hyperplane exposing it.


2.6.1.2 Face transitivity and algebra

Faces of a convex set enjoy transitive relation. If F_1 is a face (an extreme set) of F_2 which in turn is a face of F_3, then it is always true that F_1 is a face of F_3. (The parallel statement for exposed faces is false. [194, 18]) For example, any extreme point of F_2 is an extreme point of F_3; in this example, F_2 could be a face exposed by a hyperplane supporting polyhedron F_3. [138, def.115/6, p.358] Yet it is erroneous to presume that a face, of dimension 1 or more, consists entirely of extreme points; nor is a face of dimension 2 or more entirely composed of edges, and so on.

For the polyhedron in R^3 from Figure 11, for example, the nonempty faces exposed by a hyperplane are the vertices, edges, and facets; there are no more. The zero-, one-, and two-dimensional faces are in one-to-one correspondence with the exposed faces in that example.

Define the smallest face F that contains some element G of a convex set C:

    F(C ∋ G)    (132)

videlicet, C ⊇ F(C ∋ G) ∋ G. An affine set has no faces except itself and the empty set. The smallest face that contains G of the intersection of convex set C with an affine set A [148, 2.4]

    F((C ∩ A) ∋ G) = F(C ∋ G) ∩ A    (133)

equals the intersection of A with the smallest face that contains G of set C.

2.6.1.3 Boundary

The classical definition of boundary of a set C presumes nonempty interior:

    ∂C = cl C \ int C    (13)

More suitable for the study of convex sets is the relative boundary; defined [123, A.2.1.2]

    rel ∂C = cl C \ rel int C    (134)

the boundary relative to the affine hull of C, conventionally equivalent to:


2.6.1.3.1 Definition. Conventional boundary of convex set. [123, C.3.1]
The relative boundary ∂C of a nonempty convex set C is the union of all the exposed faces of C.    △

Equivalence of this definition to (134) comes about because it is conventionally presumed that any supporting hyperplane, central to the definition of exposure, does not contain C. [194, p.100]

Any face F of convex set C (that is not C itself) belongs to rel ∂C. (2.8.2.1) In the exception when C is a single point {x}, (10)

    rel ∂{x} = {x} \ {x} = ∅,   x ∈ R^n    (135)

A bounded convex polyhedron (2.12.0.0.1) having nonempty interior, for example, in R has a boundary constructed from two points, in R^2 from at least three line segments, in R^3 from convex polygons, while a convex polychoron (a bounded polyhedron in R^4 [244]) has a boundary constructed from three-dimensional convex polyhedra. By Definition 2.6.1.3.1, an affine set has no relative boundary.

2.7 Cones

In optimization, convex cones achieve prominence because they generalize subspaces. Most compelling is the projection analogy: Projection on a subspace can be ascertained from projection on its orthogonal complement (E), whereas projection on a closed convex cone can be determined from projection instead on its algebraic complement (2.13, E.9.2.1); called the polar cone.

2.7.0.0.1 Definition. Ray.
The one-dimensional set

    {ζΓ + B | ζ ≥ 0, Γ ≠ 0} ⊂ R^n    (136)

defines a halfline called a ray in direction Γ ∈ R^n having base B ∈ R^n. When B = 0, a ray is the conic hull of direction Γ; hence a convex cone.    △

The conventional boundary of a single ray, base 0, in any dimension is the origin because that is the union of all exposed faces not containing the entire set. Its relative interior is the ray itself excluding the origin.


Figure 21: (a) Two-dimensional nonconvex cone X drawn truncated. Boundary of this cone is itself a cone. Each polar half is itself a convex cone. (b) This convex cone (drawn truncated) is a line through the origin in any dimension. It has no relative boundary, while its relative interior comprises the entire line.

Figure 22: This nonconvex cone in R^2 is a pair of lines through the origin. [154, 2.4]


Figure 23: Boundary of a convex cone in R^2 is a nonconvex cone; a pair of rays emanating from the origin.

Figure 24: Nonconvex cone X drawn truncated in R^2. Boundary is also a cone. [154, 2.4] Cone exterior is convex cone.


Figure 25: Truncated nonconvex cone X = {x ∈ R^2 | x_1 ≥ x_2, x_1 x_2 ≥ 0}. Boundary is also a cone. [154, 2.4] Cartesian axes drawn for reference. Each half (about the origin) is itself a convex cone.

2.7.1 Cone defined

A set X is called, simply, cone if and only if

    Γ ∈ X ⇒ ζΓ ∈ cl X for all ζ ≥ 0    (137)

where cl X denotes closure of cone X. An example of such a cone is the union of two opposing quadrants; e.g., X = {x ∈ R^2 | x_1 x_2 ≥ 0}, which is not convex. [242, 2.5] Similar examples are shown in Figure 21 and Figure 25.

All cones can be defined by an aggregate of rays emanating exclusively from the origin (but not all cones are convex). Hence all closed cones contain the origin and are unbounded, excepting the simplest cone {0}. The empty set ∅ is not a cone, but

    cone ∅ ≜ {0}    (80)


2.7.2 Convex cone

We call the set K ⊆ R^M a convex cone iff

    Γ_1, Γ_2 ∈ K ⇒ ζΓ_1 + ξΓ_2 ∈ K for all ζ, ξ ≥ 0    (138)

Apparent from this definition: ζΓ_1 ∈ K and ξΓ_2 ∈ K for all ζ, ξ ≥ 0. The set K is convex since, for any particular ζ, ξ ≥ 0

    µζΓ_1 + (1 − µ)ξΓ_2 ∈ K   ∀µ ∈ [0, 1]    (139)

because µζ, (1 − µ)ξ ≥ 0. Obviously,

    {X} ⊃ {K}    (140)

the set of all convex cones is a proper subset of all cones. The set of convex cones is a narrower but more familiar class of cone, any member of which can be equivalently described as the intersection of a possibly (but not necessarily) infinite number of hyperplanes (through the origin) and halfspaces whose bounding hyperplanes pass through the origin; a halfspace-description (2.4). The interior of a convex cone is possibly empty.

Figure 26: Not a cone; ironically, the three-dimensional flared horn (with or without its interior) resembling the mathematical symbol ≻ denoting cone membership and partial order.


Familiar examples of convex cones include an unbounded ice-cream cone united with its interior (a.k.a.: second-order cone, quadratic cone, circular cone (2.9.2.4.1), Lorentz cone (confer Figure 32) [38, exmps.2.3 & 2.25]),

    K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t },   l = 2    (141)

and any polyhedral cone (2.12.1.0.1); e.g., any orthant generated by Cartesian axes (2.1.3). Esoteric examples of convex cones include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R, any halfspace partially bounded by a hyperplane through the origin, the positive semidefinite cone S^M_+ (154), the cone of Euclidean distance matrices EDM^N (462) (5), any subspace, and Euclidean vector space R^n.

2.7.2.1 cone invariance

(confer figures: 12, 21, 22, 23, 24, 25, 26, 27, 29, 36, 39, 42, 43, 44, 45, 46, 47, 48, 49, 74, 87) More Euclidean bodies are cones, it seems, than are not. This class of convex body, the convex cone, is invariant to nonnegative scaling, vector summation, affine and inverse affine transformation, and intersection, [194, p.22] but is not invariant to projection; e.g., Figure 31.

2.7.2.1.1 Theorem. Cone intersection. [194, 2, 19]
The intersection of an arbitrary collection of convex cones is a convex cone. Intersection of an arbitrary collection of closed convex cones is a closed convex cone. [160, 2.3] Intersection of a finite number of polyhedral cones (2.12.1.0.1, Figure 36 p.132) is polyhedral.    ⋄

The property pointedness is associated with a convex cone.

2.7.2.1.2 Definition. Pointed convex cone. (confer 2.12.2.2)
A convex cone K is pointed iff it contains no line. Equivalently, K is not pointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K. [38, 2.4.1] If the origin is an extreme point of K or, equivalently, if

    K ∩ −K = {0}    (142)

then K is pointed, and vice versa. [210, 2.10]    △


Thus the simplest and only bounded [242, p.75] convex cone K = {0} ⊆ R^n is pointed, by convention, but has empty interior. Its relative boundary is the empty set (135) while its relative interior is the point itself (10). The pointed convex cone that is a halfline emanating from the origin in R^n has the origin as relative boundary while its relative interior is the halfline itself, excluding the origin.

2.7.2.1.3 Theorem. Pointed cones. [36, 3.3.15, exer.20]
A closed convex cone K ⊂ R^n is pointed if and only if there exists a normal α such that the set

    C ≜ {x ∈ K | 〈x, α〉 = 1}    (143)

is closed, bounded, and K = cone C. Equivalently, K is pointed if and only if there exist a vector β and positive scalar ǫ such that

    〈x, β〉 ≥ ǫ‖x‖   ∀x ∈ K    (144)
    ⋄

If closed convex cone K is not pointed, then it has no extreme point. Yet a pointed closed convex cone has only one extreme point; it resides at the origin. [27, 3.3]

From the cone intersection theorem it follows that an intersection of convex cones is pointed if at least one of the cones is.

2.7.2.2 Pointed closed convex cone and partial order

A pointed closed convex cone K induces partial order [244] on R^n or R^{m×n}, [16, 1] respectively defined by vector or matrix inequality;

    x ≼_K z ⇔ z − x ∈ K    (145)

    x ≺_K z ⇔ z − x ∈ rel int K    (146)

Neither x nor z is necessarily a member of K for these relations to hold. Only when K is the nonnegative orthant do these inequalities reduce to ordinary entrywise comparison. (2.13.4.2.2) Inclusive of that special case, we ascribe nomenclature generalized inequality to comparison with respect to a pointed closed convex cone.
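For instance, generalized inequality (145) with K the positive semidefinite cone reduces to an eigenvalue test; a minimal Matlab sketch (illustrative, not from the original text):

    % x ≼_K z  ⇔  z - x ∈ K ;  here K = S^2_+ , so test eig(Z - X) >= 0
    X = [1 0; 0 1];  Z = [2 1; 1 2];
    in_K = min(eig(Z - X)) >= 0;    % true: Z - X has eigenvalues 0 and 2
    disp(in_K)                      % hence X ≼ Z with respect to S^2_+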


The visceral mechanics of actually comparing vectors when the cone is not an orthant is well illustrated in the example of Figure 46, which relies on the equivalent membership-interpretation in (145) or (146). Comparable points and the minimum element of some vector- or matrix-valued set are thus well defined (and decreasing sequences with respect to K therefore converge): Vector x ∈ C is the (unique) minimum element of set C with respect to cone K if and only if for each and every z ∈ C we have x ≼ z; equivalently, iff C ⊆ x + K.^{2.21} A closely related concept, minimal element, is useful for sets having no minimum element: Vector x ∈ C is a minimal element of set C with respect to K if and only if (x − K) ∩ C = x.

More properties of partial ordering with respect to K are cataloged in [38, 2.4] with respect to proper cones:^{2.22} such as reflexivity (x ≼ x), antisymmetry (x ≼ z, z ≼ x ⇒ x = z), transitivity (x ≼ y, y ≼ z ⇒ x ≼ z), additivity (x ≼ z, u ≼ v ⇒ x + u ≼ z + v), strict properties, and preservation under nonnegative scaling or limiting operations.

2.7.2.2.1 Definition. Proper cone: [38, 2.4.1] A cone that is
- pointed
- closed
- convex
- has nonempty interior    △

A proper cone remains proper under injective linear transformation. [51, 5.1]

^{2.21} Borwein & Lewis [36, 3.3, exer.21] ignore possibility of equality to x + K in this condition, and require a second condition: ... and C ⊂ y + K for some y in R^n implies x ∈ y + K.
^{2.22} We distinguish pointed closed convex cones here because the generalized inequality and membership corollary (2.13.2.0.1) remains intact. [123, A.4.2.7]


Examples of proper cones are the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (2.9), the nonnegative real line R_+ in vector space R, or any orthant in R^n.

2.8 Cone boundary

Every hyperplane supporting a convex cone contains the origin. [123, A.4.2] Because any supporting hyperplane to a convex cone must therefore itself be a cone, then from the cone intersection theorem it follows:

2.8.0.0.1 Lemma. Cone faces. [17, II.8]
Each nonempty exposed face of a convex cone is a convex cone.    ⋄

2.8.0.0.2 Theorem. Proper-cone boundary.
Suppose a nonzero point Γ lies on the boundary ∂K of proper cone K in R^n. Then it follows that the ray {ζΓ | ζ ≥ 0} also belongs to ∂K.    ⋄

Proof. By virtue of its propriety, a proper cone guarantees the existence of a strictly supporting hyperplane at the origin. [194, Cor.11.7.3]^{2.23} Hence the origin belongs to the boundary of K because it is the zero-dimensional exposed face. The origin belongs to the ray through Γ, and the ray belongs to K by definition (137). By the cone faces lemma, each and every nonempty exposed face must include the origin. Hence the closed line segment 0Γ must lie in an exposed face of K because both endpoints do by Definition 2.6.1.3.1. That means there exists a supporting hyperplane ∂H to K containing 0Γ. So the ray through Γ belongs both to K and to ∂H. ∂H must therefore expose a face of K that contains the ray; id est,

    {ζΓ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K    (147)

^{2.23} Rockafellar's corollary yields a supporting hyperplane at the origin to any convex cone in R^n not equal to R^n.


Proper cone {0} in R^0 has no boundary (134) because (10)

    rel int{0} = {0}    (148)

The boundary of any proper cone in R is the origin. The boundary of any proper cone whose dimension exceeds 1 can be constructed entirely from an aggregate of rays emanating exclusively from the origin.

2.8.1 Extreme direction

The property extreme direction arises naturally in connection with the pointed closed convex cone K ⊂ R^n, being analogous to extreme point. [194, 18, p.162]^{2.24} An extreme direction Γ_ε of pointed K is a vector corresponding to an edge that is a ray emanating from the origin.^{2.25} Nonzero direction Γ_ε in pointed K is extreme if and only if

    ζ_1 Γ_1 + ζ_2 Γ_2 ≠ Γ_ε   ∀ζ_1, ζ_2 ≥ 0,  ∀Γ_1, Γ_2 ∈ K \ {ζΓ_ε ∈ K | ζ ≥ 0}    (149)

In words, an extreme direction in a pointed closed convex cone is the direction of a ray, called an extreme ray, that cannot be expressed as a conic combination of any ray directions in the cone distinct from it.

By (81), extreme direction Γ_ε is not a point relatively interior to any line segment in K \ {ζΓ_ε ∈ K | ζ ≥ 0}. Thus, by analogy, the corresponding extreme ray {ζΓ_ε | ζ ≥ 0} is not a ray relatively interior to any plane segment^{2.26} in K.

2.8.1.1 extreme distinction, uniqueness

An extreme direction is unique, but its vector representation Γ_ε is not because any positive scaling of it produces another vector in the same (extreme) direction. Hence an extreme direction is unique to within a positive scaling. When we say extreme directions are distinct, we are referring to distinctness of rays containing them. Nonzero vectors of various length in

^{2.24} We diverge from Rockafellar's extreme direction: "extreme point at infinity".
^{2.25} An edge of a convex cone is not necessarily a ray. A convex cone may contain an edge that is a line; e.g., a wedge-shaped polyhedral cone (K* in Figure 27).
^{2.26} A planar fragment; in this context, a planar cone.


Figure 27: K is a pointed polyhedral cone having empty interior in R^3 (drawn truncated and in a plane parallel to the floor upon which you stand). K* is a wedge whose truncated boundary ∂K* is illustrated (drawn perpendicular to the floor). In this particular instance, K ⊂ int K* (excepting the origin). Cartesian coordinate axes drawn for reference.


the same extreme direction are therefore interpreted to be identical extreme directions.^{2.27}

The extreme directions of the polyhedral cone in Figure 12 (page 67), for example, correspond to its three edges. The extreme directions of the positive semidefinite cone (2.9) comprise the infinite set of all symmetric rank-one matrices. [16, 6] [120, III] It is sometimes prudent to instead consider the less infinite but complete normalized set, for M > 0 (confer (183))

    {zz^T ∈ S^M | ‖z‖ = 1}    (150)

The positive semidefinite cone in one dimension M = 1, S_+ the nonnegative real line, has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

Pointed closed convex cone K = {0} has no extreme direction because extreme directions are nonzero by definition.

If closed convex cone K is not pointed, then it has no extreme directions and no vertex. [16, 1] Conversely, pointed closed convex cone K is equivalent to the convex hull of its vertex and all its extreme directions. [194, 18, p.167] That is the practical utility of extreme direction; to facilitate construction of polyhedral sets, apparent from the extremes theorem:

2.8.1.1.1 Theorem. (Klee) Extremes. [210, 3.6] [194, 18, p.166] (confer 2.3.2, 2.12.2.0.1)
Any closed convex set containing no lines can be expressed as the convex hull of its extreme points and extreme rays.    ⋄

^{2.27} Like vectors, an extreme direction can be identified by the Cartesian point at the vector's head with respect to the origin.


It follows that any element of a convex set containing no lines may be expressed as a linear combination of its extreme elements; e.g., Example 2.9.2.3.1.

2.8.1.2 Generators

In the narrowest sense, generators for a convex set comprise any collection of points and directions whose convex hull constructs the set. When the extremes theorem applies, the extreme points and directions are called generators of a convex set. An arbitrary collection of generators for a convex set includes its extreme elements as a subset; the set of extreme elements of a convex set is a minimal set of generators for that convex set. Any polyhedral set has a minimal set of generators whose cardinality is finite.

When the convex set under scrutiny is a closed convex cone, conic combination of generators during construction is implicit as shown in Example 2.8.1.2.1 and Example 2.10.2.0.1. So, a vertex at the origin (if it exists) becomes benign.

We can, of course, generate affine sets by taking the affine hull of any collection of points and directions. We broaden, thereby, the meaning of generator to be inclusive of all kinds of hulls. Any hull of generators is loosely called a vertex-description. (2.3.4) Hulls encompass subspaces, so any basis constitutes generators for a vertex-description; span basis R(A).

2.8.1.2.1 Example. Application of extremes theorem.
Given an extreme point at the origin and N extreme rays, denoting the i-th extreme direction by Γ_i ∈ R^n, then the convex hull is (71)

    P = { [0 Γ_1 Γ_2 ··· Γ_N] aζ | a^T 1 = 1, a ≽ 0, ζ ≥ 0 }
      = { [Γ_1 Γ_2 ··· Γ_N] aζ | a^T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }
      = { [Γ_1 Γ_2 ··· Γ_N] b | b ≽ 0 }  ⊂ R^n    (151)

a closed convex set that is simply a conic hull like (79).
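To make (151) concrete (an illustrative sketch, not from the original text): points of the conic hull of N directions arranged columnar in a matrix are exactly the images of nonnegative coefficient vectors, and membership can be tested by nonnegative least squares; a minimal Matlab sketch:

    % points of conic hull {[G1 .. GN] b | b ≽ 0}   (151)
    G = [1 0 1; 0 1 1];           % three directions in R^2, columnar
    b = rand(3,1);                % any nonnegative coefficients
    p = G*b;                      % a point of the conic hull
    % membership test for a query point q: exists b >= 0 with G*b = q ?
    q  = [2; 3];
    b0 = lsqnonneg(G, q);         % nonnegative least squares
    disp(norm(G*b0 - q) < 1e-9)   % true iff q belongs to the conic hull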


2.8.2 Exposed direction

2.8.2.0.1 Definition. Exposed point & direction of pointed convex cone. [194, 18] (confer 2.6.1.0.1)
When a convex cone has a vertex, an exposed point, it resides at the origin; there can be only one.

In the closure of a pointed convex cone, an exposed direction is the direction of a one-dimensional exposed face that is a ray emanating from the origin.

    {exposed directions} ⊆ {extreme directions}    △

For a proper cone in vector space R^n with n ≥ 2, we can say more:

    {exposed directions} = {extreme directions}    (152)

It follows from Lemma 2.8.0.0.1, for any pointed closed convex cone, that there is one-to-one correspondence of one-dimensional exposed faces with exposed directions; id est, there is no one-dimensional exposed face that is not a ray base 0.

The pointed closed convex cone EDM^2, for example, is a ray in isomorphic subspace R whose relative boundary (2.6.1.3.1) is the origin. The conventionally exposed directions of EDM^2 constitute the empty set ∅ ⊂ {extreme direction}. This cone has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

2.8.2.1 Connection between boundary and extremes

2.8.2.1.1 Theorem. Exposed. [194, 18.7] (confer 2.8.1.1.1)
Any closed convex set C containing no lines (and whose dimension is at least 2) can be expressed as the closure of the convex hull of its exposed points and exposed rays.    ⋄


Figure 28: Properties of extreme points carry over to extreme directions. [194, 18] Four rays (drawn truncated) on boundary of conic hull of two-dimensional closed convex set from Figure 20 lifted to R^3 (labeled points A, B, C, D; origin 0). Ray through point A is exposed hence extreme. Extreme direction B on cone boundary is not an exposed direction, although it belongs to the exposed face cone{A, B}. Extreme ray through C is exposed. Point D is neither an exposed nor extreme direction although it belongs to a two-dimensional exposed face of the conic hull.


From Theorem 2.8.1.1.1,

    rel ∂C = cl C \ rel int C    (134)
           = conv{exposed points and exposed rays} \ rel int C
           = conv{extreme points and extreme rays} \ rel int C    (153)

Thus each and every extreme point of a convex set (that is not a point) resides on its relative boundary, while each and every extreme direction of a convex set (that is not a halfline and contains no line) resides on its relative boundary because extreme points and directions of such respective sets do not belong to the relative interior by definition.

The relationship between extreme sets and the relative boundary actually goes deeper: Any face F of convex set C (that is not C itself) belongs to rel ∂C, so dim F < dim C. [194, 18.1.3]

2.8.2.2 Converse caveat

It is inconsequent to presume that each and every extreme point and direction is necessarily exposed, as might be erroneously inferred from the conventional boundary definition (2.6.1.3.1); although it can correctly be inferred: each and every extreme point and direction belongs to some exposed face. Arbitrary points residing on the relative boundary of a convex set are not necessarily exposed or extreme points. Similarly, the direction of an arbitrary ray, base 0, on the boundary of a convex cone is not necessarily an exposed or extreme direction. For the polyhedral cone illustrated in Figure 12, for example, there are three two-dimensional exposed faces constituting the entire boundary, each composed of an infinity of rays. Yet there are only three exposed directions.

Neither is an extreme direction on the boundary of a pointed convex cone necessarily an exposed direction. Lift the two-dimensional set in Figure 20, for example, into three dimensions such that no two points in the set are collinear with the origin. Then its conic hull can have an extreme direction B on the boundary that is not an exposed direction, illustrated in Figure 28.


2.9 Positive semidefinite (PSD) cone

    The cone of positive semidefinite matrices studied in this section is arguably the most important of all non-polyhedral cones whose facial structure we completely understand.
    −Alexander Barvinok [17, p.78]

2.9.0.0.1 Definition. Positive semidefinite (PSD) cone.
The set of all symmetric positive semidefinite matrices of particular dimension M is called the positive semidefinite cone:

    S^M_+ ≜ { A ∈ S^M | A ≽ 0 }
         = { A ∈ S^M | y^T A y ≥ 0 ∀‖y‖ = 1 }
         = ⋂_{‖y‖=1} { A ∈ S^M | 〈yy^T, A〉 ≥ 0 }    (154)

formed by the intersection of an infinite number of halfspaces (2.4.1.1) in vectorized variable A,^{2.28} each halfspace having partial boundary containing the origin in isomorphic R^{M(M+1)/2}. It is a unique immutable proper cone in the ambient space of symmetric matrices S^M.

The positive definite (full-rank) matrices comprise the cone interior,^{2.29}

    int S^M_+ = { A ∈ S^M | A ≻ 0 }
             = { A ∈ S^M | y^T A y > 0 ∀‖y‖ = 1 }
             = { A ∈ S^M_+ | rank A = M }    (155)

while all singular positive semidefinite matrices (having at least one 0 eigenvalue) reside on the cone boundary (Figure 29); (A.7.4)

    ∂S^M_+ = { A ∈ S^M | min{λ(A)_i, i = 1 ... M} = 0 }
          = { A ∈ S^M_+ | 〈yy^T, A〉 = 0 for some ‖y‖ = 1 }
          = { A ∈ S^M_+ | rank A < M }    (156)

where λ(A) ∈ R^M holds the eigenvalues of A.    △

^{2.28} infinite in number when M > 1. Because y^T A y = y^T A^T y, matrix A is almost always assumed symmetric. (A.2.1)
^{2.29} The remaining inequalities in (154) also become strict for membership to the cone interior.
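Membership (154), interior (155), and boundary (156) can all be read off the eigenvalues; a minimal Matlab sketch (illustrative, with a numerical tolerance standing in for exact zero):

    % classify symmetric A against S^M_+ via eigenvalues   (154)-(156)
    A   = [2 1; 1 0.5];                    % rank-1 PSD example
    lam = eig((A + A')/2);                 % symmetrize to guard roundoff
    tol = 1e-10;
    on_or_in = min(lam) >= -tol;           % A ∈ S^M_+        (154)
    interior = min(lam) >   tol;           % A ∈ int S^M_+    (155)
    boundary = on_or_in && min(lam) <= tol;% A ∈ ∂S^M_+       (156)
    disp([on_or_in interior boundary])     % here: on boundary, not interior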


The only symmetric positive semidefinite matrix in S^M_+ having M 0-eigenvalues resides at the origin. (A.7.2.0.1)

2.9.0.1 Membership

Observe the notation A ≽ 0; meaning,^{2.30} matrix A is symmetric and belongs to the positive semidefinite cone in the subspace of symmetric matrices, whereas A ≻ 0 denotes membership to that cone's interior. (2.13.2) This notation further implies that coordinates [sic] for orthogonal expansion of a positive (semi)definite matrix must be its (nonnegative) positive eigenvalues (2.13.7.1.1, E.6.4.1.1) when expanded in its eigenmatrices (A.5.1).

2.9.0.1.1 Example. Equality constraints in semidefinite program (845).
Employing properties of partial ordering (2.7.2.2) for the pointed closed convex positive semidefinite cone, it is easy to show, given A + S = C

    S ≽ 0 ⇔ A ≼ C    (157)

2.9.1 PSD cone is convex

The set of all positive semidefinite matrices forms a convex cone in the ambient space of symmetric matrices because any pair satisfies definition (138); [125, 7.1] videlicet, for all ζ_1, ζ_2 ≥ 0 and each and every A_1, A_2 ∈ S^M

    ζ_1 A_1 + ζ_2 A_2 ≽ 0 ⇐ A_1 ≽ 0, A_2 ≽ 0    (158)

a fact easily verified by the definitive test for positive semidefiniteness of a symmetric matrix (A):

    A ≽ 0 ⇔ x^T A x ≥ 0 for each and every ‖x‖ = 1    (159)

^{2.30} For matrices, notation A ≽ B denotes comparison on S^M with respect to the positive semidefinite cone; (A.3.1) id est, A ≽ B ⇔ A − B ∈ S^M_+, a generalization of comparison on the real line. The symbol ≥ is reserved for scalar comparison on the real line R with respect to the nonnegative real line R_+ as in a^T y ≥ b, while a ≽ b denotes comparison of vectors on R^M with respect to the nonnegative orthant R^M_+.


Figure 29: Truncated boundary svec ∂S^2_+ of PSD cone in S^2 plotted in isometrically isomorphic R^3 via svec (46); coordinate axes α, √2β, γ for A = [α β ; β γ]; courtesy, Alexandre W. d'Aspremont. (Plotted is 0-contour of minimum eigenvalue (156). Lightest shading is closest. Darkest shading is furthest and inside shell.) Entire boundary can be constructed from an aggregate of rays (2.7.0.0.1) emanating exclusively from the origin, {κ² [z_1² √2 z_1 z_2 z_2²]^T | κ ∈ R}. A minimal set of generators is the extreme directions: svec{yy^T | y ∈ R^M}. In this dimension the cone is circular (2.9.2.4) while each and every ray on boundary corresponds to an extreme direction, but such is not the case in any higher dimension (confer Figure 12). PSD cone geometry is not as simple in higher dimensions [17, II.12], although for real matrices it is self-dual (298) in ambient space of symmetric matrices. [120, II] PSD cone has no two-dimensional faces in any dimension, and its only extreme point resides at the origin.


Figure 30: Convex set C = {X ∈ S × x ∈ R | X ≽ xx^T} drawn truncated.

id est, for A_1, A_2 ≽ 0 and each and every ζ_1, ζ_2 ≥ 0

    ζ_1 x^T A_1 x + ζ_2 x^T A_2 x ≥ 0 for each and every normalized x ∈ R^M    (160)

The convex cone S^M_+ is more easily visualized in the isomorphic vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. When M = 2 the PSD cone is semi-infinite in expanse in R^3, having boundary illustrated in Figure 29. When M = 3 the PSD cone is six-dimensional, and so on.

2.9.1.0.1 Example. Sets from maps of the PSD cone.
The set

    C = {X ∈ S^n × x ∈ R^n | X ≽ xx^T}    (161)

is convex because it has a Schur form; (A.4)

    X − xx^T ≽ 0 ⇔ f(X, x) ≜ [X x ; x^T 1] ≽ 0    (162)

e.g., Figure 30. Set C is the inverse image (2.1.9.0.1) of S^{n+1}_+ under the affine mapping f. The set {X ∈ S^n × x ∈ R^n | X ≼ xx^T} is not convex, in contrast, having no Schur form. Yet for fixed x = x_p, the set

    {X ∈ S^n | X ≼ x_p x_p^T}    (163)

is simply the negative semidefinite cone shifted to x_p x_p^T.
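The Schur-form equivalence (162) is easy to verify numerically; a minimal Matlab sketch (illustrative, not from the original text), with (X, x) constructed so both sides hold:

    % X - x*x' ≽ 0  ⇔  [X x; x' 1] ≽ 0   (162)
    n = 3;
    x = randn(n,1);
    X = x*x' + eye(n);                    % makes X - x*x' = I ≽ 0
    lhs = min(eig(X - x*x'))    >= -1e-10;
    rhs = min(eig([X x; x' 1])) >= -1e-10;
    disp([lhs rhs])                       % both true, agreeing with (162)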


2.9.1.0.2 Example. Inverse image of the PSD cone.
Now consider finding the set of all matrices X ∈ S^N satisfying

    AX + B ≽ 0    (164)

given A, B ∈ S^N. Define the set

    X ≜ {X | AX + B ≽ 0} ⊆ S^N    (165)

which is the inverse image of the positive semidefinite cone under affine transformation g(X) ≜ AX + B. Set X must therefore be convex by Theorem 2.1.9.0.1.

Yet we would like a less amorphous characterization of this set, so instead we consider its vectorization (29) which is easier to visualize:

    vec g(X) = vec(AX) + vec B = (I ⊗ A) vec X + vec B    (166)

where

    I ⊗ A ≜ QΛQ^T ∈ S^{N²}    (167)

is a block-diagonal matrix formed by Kronecker product (A.1 no.20, D.1.2.1). Assign

    x ≜ vec X ∈ R^{N²},   b ≜ vec B ∈ R^{N²}    (168)

then make the equivalent problem: Find

    vec X = {x ∈ R^{N²} | (I ⊗ A)x + b ∈ K}    (169)

where

    K ≜ vec S^N_+    (170)

is a proper cone isometrically isomorphic with the positive semidefinite cone in the subspace of symmetric matrices; the vectorization of every element of S^N_+. Utilizing the diagonalization (167),

    vec X = {x | ΛQ^T x ∈ Q^T(K − b)}
          = {x | ΦQ^T x ∈ Λ†Q^T(K − b)} ⊆ R^{N²}    (171)

where

    Φ ≜ Λ†Λ    (172)


is a diagonal projection matrix whose entries are either 1 or 0 (E.3). We have the complementary sum

    ΦQ^T x + (I − Φ)Q^T x = Q^T x    (173)

So, adding (I − Φ)Q^T x to both sides of the membership within (171) admits

    vec X = {x ∈ R^{N²} | Q^T x ∈ Λ†Q^T(K − b) + (I − Φ)Q^T x}
          = {x | Q^T x ∈ Φ(Λ†Q^T(K − b)) ⊕ (I − Φ)R^{N²}}
          = {x ∈ QΛ†Q^T(K − b) ⊕ Q(I − Φ)R^{N²}}
          = (I ⊗ A)†(K − b) ⊕ N(I ⊗ A)    (174)

where we used the facts: linear function Q^T x in x on R^{N²} is a bijection, and ΦΛ† = Λ†. Thus

    vec X = (I ⊗ A)† vec(S^N_+ − B) ⊕ N(I ⊗ A)    (175)

In words, set vec X is the vector sum of the translated PSD cone (linearly mapped onto the rowspace of I ⊗ A (E)) and the nullspace of I ⊗ A (synthesis of fact from A.6.3 and A.7.2.0.1). Should I ⊗ A have no nullspace, then vec X = (I ⊗ A)^{−1} vec(S^N_+ − B), which is the expected result.

2.9.2 PSD cone boundary

For any symmetric positive semidefinite matrix A of rank ρ, there must exist a rank-ρ matrix Y such that A be expressible as an outer product in Y; [212, 6.3]

    A = YY^T ∈ S^M_+,   rank A = ρ,   Y ∈ R^{M×ρ}    (176)

Then the boundary of the positive semidefinite cone may be expressed

    ∂S^M_+ = { A ∈ S^M_+ | rank A < M }    (177)


Figure 31: (a) Projection of the PSD cone svec S^2_+, truncated above γ = 1, on the αβ-plane in isometrically isomorphic R^3 (coordinates α, √2β for A = [α β ; β γ]). View is from above with respect to Figure 29. (b) Truncated above γ = 2. From these plots we may infer, for example, that the line {[0 1/√2 γ]^T | γ ∈ R} intercepts the PSD cone at some large value of γ; in fact, γ = ∞.


2.9.2.1 rank ρ subset of the PSD cone

For the same reason (closure), this applies more generally; for 0 ≤ ρ ≤ M

    cl{A ∈ S^M_+ | rank A = ρ} = {A ∈ S^M_+ | rank A ≤ ρ}    (179)

For easy reference, we give such generally nonconvex sets a name: rank ρ subset of the positive semidefinite cone. For ρ < M this subset, nonconvex for M > 1, resides on the PSD cone boundary. (We leave proof of equality an exercise.) For example,

    ∂S^M_+ = cl{A ∈ S^M_+ | rank A = M−1} = {A ∈ S^M_+ | rank A ≤ M−1}    (180)

In S^2, each and every ray on the boundary of the positive semidefinite cone in isomorphic R^3 corresponds to a symmetric rank-1 matrix (Figure 29), but that does not hold in any higher dimension.

2.9.2.2 Faces of PSD cone, their dimension versus rank

Each and every face of the positive semidefinite cone, having dimension less than that of the cone, is exposed. [152, 6] [133, 2.3.4] Because each and every face of the positive semidefinite cone contains the origin (2.8.0.0.1), each face belongs to a subspace of the same dimension.

Given positive semidefinite matrix A ∈ S^M_+, define F(S^M_+ ∋ A) (132) as the smallest face that contains A of the positive semidefinite cone S^M_+. Then A, having diagonalization QΛQ^T (A.5.2), is relatively interior to [17, II.12] [61, 31.5.3] [148, 2.4]

    F(S^M_+ ∋ A) = {X ∈ S^M_+ | N(X) ⊇ N(A)}
                = {X ∈ S^M_+ | 〈Q(I − ΛΛ†)Q^T, X〉 = 0}
                ≃ S^{rank A}_+    (181)

which is isomorphic with the convex cone S^{rank A}_+. Thus dimension of the smallest face containing given matrix A is

    dim F(S^M_+ ∋ A) = rank(A)(rank(A) + 1)/2    (182)

in isomorphic R^{M(M+1)/2}, and each and every face of S^M_+ is isomorphic with a positive semidefinite cone having dimension the same as the face. Observe: not all dimensions are represented, and the only zero-dimensional face is the origin. The PSD cone has no facets, for example.
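A minimal Matlab sketch of face-dimension formula (182), with an arbitrary low-rank PSD example (illustrative, not from the original text):

    % dimension of smallest face of S^M_+ containing A   (182)
    M = 4;
    Y = randn(M,2);  A = Y*Y';            % rank-2 PSD matrix
    r = rank(A);
    dimF = r*(r+1)/2;                     % = 3 here
    fprintf('rank %d, face dimension %d of ambient %d\n', r, dimF, M*(M+1)/2)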


2.9.2.2.1 Table: Rank versus dimension of S^3_+ faces

    k                dim F(S^3_+ ∋ rank-k matrix)
    0                0
    1 (boundary)     1
    2 (boundary)     3
    3 (interior)     6

For the positive semidefinite cone in isometrically isomorphic R^3 depicted in Figure 29, for example, rank-2 matrices belong to the interior of the face having dimension 3 (the entire closed cone), while rank-1 matrices belong to the relative interior of a face having dimension 1 (the boundary constitutes all the one-dimensional faces, in this dimension, which are rays emanating from the origin), and the only rank-0 matrix is the point at the origin (the zero-dimensional face).

2.9.2.3 Extreme directions of the PSD cone

Because the positive semidefinite cone is pointed (2.7.2.1.2), there is a one-to-one correspondence of one-dimensional faces with extreme directions in any dimension M; id est, because of the cone faces lemma (2.8.0.0.1) and the direct correspondence of exposed faces to faces of S^M_+, it follows there is no one-dimensional face of the positive semidefinite cone that is not a ray emanating from the origin.

Symmetric dyads constitute the set of all extreme directions: For M > 0 (150)

    {yy^T ∈ S^M | y ∈ R^M} ⊂ ∂S^M_+    (183)

this superset of extreme directions for the positive semidefinite cone is, generally, a subset of the boundary. For two-dimensional matrices, (Figure 29)

    {yy^T ∈ S^2 | y ∈ R^2} = ∂S^2_+    (184)

while for one-dimensional matrices, in exception, (2.7)

    {yy^T ∈ S | y ≠ 0} = int S_+    (185)

Each and every extreme direction yy^T makes the same angle with the identity


matrix in isomorphic R^{M(M+1)/2}, dependent only on dimension; videlicet,^{2.31}

    ∠(yy^T, I) = arccos( 〈yy^T, I〉 / (‖yy^T‖_F ‖I‖_F) ) = arccos(1/√M)   ∀y ∈ R^M    (186)

2.9.2.3.1 Example. Positive semidefinite matrix from extreme directions.
Diagonalizability (A.5) of symmetric matrices yields the following results: Any symmetric positive semidefinite matrix (1057) can be written in the form

    A = Σ_i λ_i z_i z_i^T = ÂÂ^T = Σ_i â_i â_i^T ≽ 0,   λ ≽ 0    (187)

a conic combination of linearly independent extreme directions (â_i â_i^T or z_i z_i^T where ‖z_i‖ = 1), where λ is a vector of eigenvalues.

If we limit consideration to all symmetric positive semidefinite matrices bounded such that tr A = 1

    C ≜ {A ≽ 0 | tr A = 1}    (188)

then any matrix A from that set may be expressed as a convex combination of linearly independent extreme directions;

    A = Σ_i λ_i z_i z_i^T ∈ C,   1^T λ = 1,   λ ≽ 0    (189)

Implications are:
1. set C is convex (it is an intersection of the PSD cone with a hyperplane),
2. because the set of eigenvalues corresponding to a given square matrix A is unique, no single eigenvalue can exceed 1; id est, I ≽ A.

2.9.2.4 PSD cone is not circular

Extreme angle equation (186) suggests that the positive semidefinite cone might be invariant to rotation about its axis of revolution; id est, a circular cone. We investigate this now:

^{2.31} Analogy with respect to the EDM cone is considered by Hayden & Wells et alii [110, p.162] where it is found: angle is not constant. The extreme directions of the EDM cone can be found in 5.5.3.2 while the cone axis is −E = 11^T − I (642).
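A minimal Matlab check of decomposition (187) and the constant angle (186), using a random symmetric PSD matrix (illustrative, not from the original text):

    % conic combination of extreme directions (187) and angle (186)
    M = 4;
    Y = randn(M); A = Y*Y';              % PSD
    [Q, L] = eig(A);                     % A = sum_i lambda_i q_i q_i'
    disp(norm(A - Q*L*Q'))               % ~0, confirming (187)
    y = randn(M,1); D = y*y';            % any extreme direction
    ang = acos(trace(D) / (norm(D,'fro') * norm(eye(M),'fro')));
    disp([ang acos(1/sqrt(M))])          % equal, confirming (186)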


Figure 32: This circular cone continues upward infinitely. Axis of revolution is illustrated as vertical line segment through origin 0. R is the radius, the distance measured from any extreme direction to axis of revolution. Were this a Lorentz cone, any plane slice containing the axis of revolution would make a right angle.


2.9.2.4.1 Definition. Circular cone:^{2.32}
a pointed closed convex cone having hyperspherical sections orthogonal to its axis of revolution about which the cone is invariant to rotation.    △

A conic section is the intersection of a cone with any hyperplane. In three dimensions, an intersecting plane perpendicular to a circular cone's axis of revolution produces a section bounded by a circle. (Figure 32) A prominent example of a circular cone in convex analysis is the Lorentz cone (141). We also find that the positive semidefinite cone and cone of Euclidean distance matrices are circular cones, but only in low dimension.

The positive semidefinite cone has axis of revolution that is the ray (base 0) through the identity matrix I. Consider the set of normalized extreme directions of the positive semidefinite cone: for some arbitrary positive constant a

    {yy^T ∈ S^M | ‖y‖ = √a} ⊂ ∂S^M_+    (190)

The distance from each extreme direction to the axis of revolution is the radius

    R ≜ inf_c ‖yy^T − cI‖_F = a √(1 − 1/M)    (191)

which is the distance from yy^T to (a/M)I; the length of vector yy^T − (a/M)I.

Because distance R (in a particular dimension) from the axis of revolution to each and every normalized extreme direction is identical, the extreme directions lie on the boundary of a hypersphere in isometrically isomorphic R^{M(M+1)/2}. From Example 2.9.2.3.1, the convex hull (excluding the vertex at the origin) of the normalized extreme directions is a conic section

    C ≜ conv{yy^T | y ∈ R^M, y^T y = a} = S^M_+ ∩ {A ∈ S^M | 〈I, A〉 = a}    (192)

orthogonal to the identity matrix I;

    〈C − (a/M)I, I〉 = tr(C − (a/M)I) = 0    (193)

Although the positive semidefinite cone possesses some characteristics of a circular cone, we can prove it is not by demonstrating a shortage of extreme

^{2.32} A circular cone is assumed convex throughout, although not so by other authors. We also assume a right circular cone.


Figure 33: Illustrated is a section, perpendicular to the axis of revolution, of the circular cone from Figure 32. Radius R is the distance from any extreme direction yy^T to the axis at (a/M)I. Vector (a/M)11^T is an arbitrary reference by which to measure angle θ.

directions; id est, some extreme directions corresponding to each and every angle of rotation about the axis of revolution are nonexistent: Referring to Figure 33, [249, 1-7]

    cos θ = 〈(a/M)11^T − (a/M)I, yy^T − (a/M)I〉 / (a²(1 − 1/M))    (194)

Solving for vector y we get

    a(1 + (M − 1) cos θ) = (1^T y)²    (195)

Because this does not have real solution for every matrix dimension M and for all 0 ≤ θ ≤ 2π, then we can conclude that the positive semidefinite cone might be circular but only in matrix dimensions 1 and 2.^{2.33} Because of the shortage of extreme directions, the conic section (192) cannot be hyperspherical by the extremes theorem (2.8.1.1.1).

^{2.33} In fact, the positive semidefinite cone is circular in those dimensions while it is a rotation of the Lorentz cone in dimension 2.
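Radius formula (191) is easy to confirm numerically; a minimal Matlab sketch (illustrative, with a and y arbitrary):

    % distance from extreme direction yy' to axis point (a/M) I   (191)
    M = 3;  a = 2;
    y = randn(M,1);  y = sqrt(a)*y/norm(y);    % so ||y||^2 = a
    R = norm(y*y' - (a/M)*eye(M), 'fro');
    disp([R  a*sqrt(1 - 1/M)])                 % agree, confirming (191)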


Figure 34: Polyhedral proper cone K, created by intersection of halfspaces, inscribes PSD cone in isometrically isomorphic R^3 as predicted by Geršgorin discs theorem for A = [A_ij] ∈ S^2. Three panels correspond to Geršgorin parameter values p = [1/2 ; 1], p = [1 ; 1], and p = [2 ; 1]; axes are A_11, √2 A_12, A_22 against svec ∂S^2_+, and the inscribing halfspaces are p_1 A_11 ≥ ±p_2 A_12, p_2 A_22 ≥ ±p_1 A_12. Hyperplanes supporting K intersect along boundary of PSD cone. Four extreme directions of K coincide with extreme directions of PSD cone. Fascinating exercise is to find dual polyhedral proper cone K*.


2.9.2.4.2 Example. PSD cone inscription in three dimensions.

Theorem. Geršgorin discs. [125, 6.1] [237]
For p ∈ R^m_+ given A = [A_ij] ∈ S^m, then all the eigenvalues of A belong to the union of m closed intervals on the real line;

    λ(A) ∈ ⋃_{i=1}^m { ξ ∈ R | |ξ − A_ii| ≤ ̺_i ≜ (1/p_i) Σ_{j=1, j≠i}^m p_j |A_ij| } = ⋃_{i=1}^m [A_ii − ̺_i, A_ii + ̺_i]    (196)

Furthermore, if a union of k of these m [intervals] forms a connected region that is disjoint from all the remaining n−k [intervals], then there are precisely k eigenvalues of A in this region.    ⋄

To apply the theorem to determine positive semidefiniteness of symmetric matrix A, we observe that for each i we must have

    A_ii ≥ ̺_i    (197)

Suppose

    m = 2    (198)

so A ∈ S^2. Vectorizing A as in (46), svec A belongs to isometrically isomorphic R^3. Then we have m 2^{m−1} = 4 inequalities, in the matrix entries A_ij with Geršgorin parameters p = [p_i] ∈ R^2_+,

    p_1 A_11 ≥ ±p_2 A_12
    p_2 A_22 ≥ ±p_1 A_12    (199)

which describe an intersection of four halfspaces in R^{m(m+1)/2}. That intersection creates the polyhedral proper cone K (2.12.1) whose construction is illustrated in Figure 34. Drawn truncated is the boundary of the positive semidefinite cone svec S^2_+ and the bounding hyperplanes supporting K.

Created by means of Geršgorin discs, K always belongs to the positive semidefinite cone for any nonnegative value of p ∈ R^m_+. Hence, any point in K corresponds to some positive semidefinite matrix A. Only the extreme directions of K intersect the positive semidefinite cone boundary in this dimension; the four extreme directions of K are extreme directions of the


positive semidefinite cone. As p_1/p_2 increases in value from 0, two extreme directions of K sweep the entire boundary of this positive semidefinite cone. Because the entire positive semidefinite cone can be swept by K, the dynamic system of linear inequalities

    Y^T svec A ≜ [p_1  ±p_2/√2  0 ; 0  ±p_1/√2  p_2] svec A ≽ 0    (200)

can replace a semidefinite constraint A ≽ 0; id est, for

    K = {z | Y^T z ≽ 0} ⊂ svec S^m_+    (201)

where Y ∈ R^{m(m+1)/2 × m 2^{m−1}},

    ∃p  Y^T svec A ≽ 0 ⇔ A ≽ 0
    svec A ∈ K ⇒ A ∈ S^m_+    (202)

In higher dimension (m > 2), the boundary of the positive semidefinite cone is no longer constituted completely by its extreme directions (symmetric rank-one matrices); the geometry becomes complicated. How all the extreme directions can be swept by an inscribed polyhedral cone,^{2.34} similarly to the foregoing example, remains an open question.

2.9.2.5 Boundary constituents of the PSD cone

2.9.2.5.1 Lemma. Sum of positive semidefinite matrices.
For A, B ∈ S^M_+

    rank(A + B) = rank(µA + (1 − µ)B)    (203)

over the open interval (0, 1) of µ.    ⋄

Proof. Any positive semidefinite matrix belonging to the PSD cone has an eigen decomposition that is a positively scaled sum of linearly independent symmetric dyads. By the linearly independent dyads definition in B.1.1.0.1, rank of the sum A + B is equivalent to the number of linearly independent dyads constituting it. Linear independence is insensitive to further positive scaling by µ. The assumption of positive semidefiniteness prevents annihilation of any dyad from the sum A + B.

^{2.34} It is not necessary to sweep the entire boundary in higher dimension.
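A minimal Matlab sketch of the Geršgorin sufficient condition (197)/(199) for A ∈ S^2, with p and A chosen arbitrarily for illustration:

    % Geršgorin sufficient test for A ≽ 0 when A ∈ S^2   (199)
    p = [1; 1];
    A = [2 1; 1 3];                          % example symmetric matrix
    in_K = p(1)*A(1,1) >= abs(p(2)*A(1,2)) && ...
           p(2)*A(2,2) >= abs(p(1)*A(1,2));  % the four halfspaces (199)
    disp(in_K)                               % true here ...
    disp(min(eig(A)) >= 0)                   % ... and indeed A ≽ 0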


2.9.2.5.2 Example. Rank function quasiconcavity. (confer 3.2)
For A, B ∈ R^{m×n} [125, 0.4]

    rank A + rank B ≥ rank(A + B)    (204)

that follows from the fact [212, 3.6]

    dim R(A) + dim R(B) = dim R(A + B) + dim(R(A) ∩ R(B))    (205)

For A, B ∈ S^M_+ [38, 3.4.2]

    rank A + rank B ≥ rank(A + B) ≥ min{rank A, rank B}    (206)

that follows from the fact

    N(A + B) = N(A) ∩ N(B),   A, B ∈ S^M_+    (127)

Rank is a quasiconcave function on S^M_+ because the right-hand inequality in (206) has the concave form (444); videlicet, Lemma 2.9.2.5.1.

From this example we see, unlike convex functions, quasiconvex functions are not necessarily continuous. (3.2) We also glean:

2.9.2.5.3 Theorem. Convex subsets of the positive semidefinite cone.
The subsets of the positive semidefinite cone S^M_+, for 0 ≤ ρ ≤ M

    S^M_+(ρ) ≜ {X ∈ S^M_+ | rank X ≥ ρ}    (207)

are pointed convex cones, but not closed unless ρ = 0; id est, S^M_+(0) = S^M_+.    ⋄

Proof. Given ρ, a subset S^M_+(ρ) is convex if and only if convex combination of any two members has rank at least ρ. That is confirmed applying identity (203) from Lemma 2.9.2.5.1 to (206); id est, for A, B ∈ S^M_+(ρ) on the closed interval µ ∈ [0, 1]

    rank(µA + (1 − µ)B) ≥ min{rank A, rank B}    (208)

It can similarly be shown, almost identically to proof of the lemma, any conic combination of A, B in subset S^M_+(ρ) remains a member; id est, ∀ζ, ξ ≥ 0

    rank(ζA + ξB) ≥ min{rank(ζA), rank(ξB)}    (209)

Therefore, S^M_+(ρ) is a convex cone.

Another proof of convexity can be made by projection arguments:


2.9.2.6 Projection on S^M_+(ρ)

Because these cones S^M_+(ρ) indexed by ρ (207) are convex, projection on them is straightforward. Given a symmetric matrix H having diagonalization H ≜ QΛQ^T ∈ S^M (A.5.2) with eigenvalues Λ arranged in nonincreasing order, then its Euclidean projection (minimum-distance projection) on S^M_+(ρ)

    P_{S^M_+(ρ)} H = QΥ*Q^T    (210)

corresponds to a map of its eigenvalues:

    Υ*_ii = max{ǫ, Λ_ii},  i = 1 ... ρ
    Υ*_ii = max{0, Λ_ii},  i = ρ+1 ... M    (211)

where ǫ is positive but arbitrarily close to 0. We leave proof an exercise of Theorem E.9.2.0.1.

Because each H ∈ S^M has unique projection on S^M_+(ρ) (despite possibility of repeated eigenvalues in Λ), we may conclude it is a convex set by the Bunt-Motzkin theorem (E.9.0.0.1).

Compare (211) to the well-known result regarding Euclidean projection on a rank ρ subset (2.9.2.1) of the positive semidefinite cone

    S^M_+ \ S^M_+(ρ + 1) = {X ∈ S^M_+ | rank X ≤ ρ}    (212)

    P_{S^M_+ \ S^M_+(ρ+1)} H = QΥ*Q^T    (213)

As proved in 7.1.4, this projection of H corresponds to the eigenvalue map

    Υ*_ii = max{0, Λ_ii},  i = 1 ... ρ
    Υ*_ii = 0,             i = ρ+1 ... M    (967)

Together these two results (211) and (967) mean: A higher-rank solution to projection on the positive semidefinite cone lies arbitrarily close to any given lower-rank projection, but not vice versa. Were the number of nonnegative eigenvalues in Λ known a priori not to exceed ρ, then these two different projections would produce identical results in the limit ǫ → 0.

2.9.2.7 Uniting constituents

Interior of the PSD cone, int S^M_+, is convex by Theorem 2.9.2.5.3, for example, because all positive semidefinite matrices having rank M constitute the cone interior.
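A minimal Matlab sketch of rank-constrained projection (213) via eigenvalue map (967), keeping at most ρ nonnegative eigenvalues (illustrative, not from the original text):

    % Euclidean projection of symmetric H on {X ∈ S^M_+ | rank X <= rho}  (967)
    M = 5;  rho = 2;
    H = randn(M); H = (H + H')/2;
    [Q, L] = eig(H);
    [lam, idx] = sort(diag(L), 'descend');   % nonincreasing eigenvalues
    Q = Q(:, idx);
    ups = zeros(M,1);
    ups(1:rho) = max(0, lam(1:rho));         % eigenvalue map (967)
    P = Q*diag(ups)*Q';                      % the projection
    disp(rank(P) <= rho)                     % true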


All positive semidefinite matrices of rank less than M constitute the cone boundary; an amalgam of positive semidefinite matrices of different rank. Thus each nonconvex rank-ρ subset of positive semidefinite matrices, for 0 < ρ < M, resides on that boundary; by (156) and (180), the boundary is reconstituted by their union over all ρ < M.


2.9.3 Barvinok's proposition

Barvinok posits existence and quantifies an upper bound on rank of a positive semidefinite matrix belonging to the intersection of the PSD cone with an affine subset:

2.9.3.0.1 Proposition. (Barvinok) Affine intersection with PSD cone. [17, II.13] [18, 2.2]
Consider finding a matrix X ∈ S^N satisfying

    X ≽ 0,   〈A_j, X〉 = b_j,   j = 1 ... m    (217)

given nonzero linearly independent A_j ∈ S^N and real b_j. Define the affine subset

    A ≜ {X | 〈A_j, X〉 = b_j, j = 1 ... m} ⊆ S^N    (218)

If the intersection A ∩ S^N_+ is nonempty, then there exists a matrix X ∈ A ∩ S^N_+ such that, given a number of equalities m,

    rank X (rank X + 1)/2 ≤ m    (219)

whence the upper bound^{2.35}

    rank X ≤ ⌊(√(8m + 1) − 1)/2⌋    (220)

or, given desired rank instead, equivalently,

    m < (rank X + 1)(rank X + 2)/2    (221)

An extreme point of A ∩ S^N_+ satisfies (220) and (221). (confer 6.1.1.2) A matrix X ≜ R^T R is an extreme point if and only if the smallest face that contains X of A ∩ S^N_+ has dimension 0; [148, 2.4] id est, iff (132)

    dim F((A ∩ S^N_+) ∋ X)
    = rank(X)(rank(X) + 1)/2 − rank[svec RA_1R^T  svec RA_2R^T  ···  svec RA_mR^T]    (222)

equals 0 in isomorphic R^{N(N+1)/2}.

^{2.35} 6.1.1.2 contains an intuitive explanation. This bound is itself limited above, of course, by N; a tight limit corresponding to an interior point of S^N_+.
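Bound (220) is a one-liner; a minimal Matlab sketch tabulating it for a few constraint counts m (illustrative, not from the original text):

    % Barvinok upper bound (220) on rank at an extreme point
    m = 1:10;                              % number of equality constraints
    r = floor((sqrt(8*m + 1) - 1)/2);      % rank X <= r
    disp([m; r])    % e.g. m = 3 admits a solution of rank at most 2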


Now the intersection A ∩ S^N_+ is assumed bounded: Assume a given nonzero upper bound ρ on rank, a number of equalities

    m = (ρ + 1)(ρ + 2)/2    (223)

and matrix dimension N ≥ ρ + 2 ≥ 3. If the intersection is nonempty and bounded, then there exists a matrix X ∈ A ∩ S^N_+ such that

    rank X ≤ ρ    (224)

This represents a tightening of the upper bound; a reduction by exactly 1 of the bound provided by (220) given the same specified number m (223) of equalities; id est,

    rank X ≤ (√(8m + 1) − 1)/2 − 1    (225)
    ⋄

When the intersection A ∩ S^N_+ is known a priori to consist only of a single point, then Barvinok's proposition provides the greatest upper bound on its rank not exceeding N. The intersection can be a single nonzero point only if the number of linearly independent hyperplanes m constituting A satisfies^{2.36}

    N(N + 1)/2 − 1 ≤ m ≤ N(N + 1)/2    (226)

2.10 Conic independence (c.i.)

In contrast to extreme direction, the property conically independent direction is more generally applicable, inclusive of all closed convex cones (not only pointed closed convex cones). Similar to the definition for linear independence, arbitrary given directions {Γ_i ∈ R^n, i = 1 ... N} are conically independent if and only if, for all ζ ∈ R^N_+

    Γ_i ζ_i + ··· + Γ_j ζ_j − Γ_l ζ_l = 0,   i ≠ ··· ≠ j ≠ l = 1 ... N    (227)

^{2.36} For N > 1, N(N+1)/2 − 1 independent hyperplanes in R^{N(N+1)/2} can make a line tangent to svec ∂S^N_+ at a point because all one-dimensional faces of S^N_+ are exposed. Because a pointed convex cone has only one vertex, the origin, there can be no intersection of svec ∂S^N_+ with any higher-dimensional affine subset A that will make a nonzero point.


Figure 35: Vectors in R^2: (a) affinely and conically independent, (b) affinely independent but not conically independent, (c) conically independent but not affinely independent. None of the examples exhibits linear independence. (In general, a.i. ⇎ c.i.)

has only the trivial solution ζ = 0; in words, iff no direction from the given set can be expressed as a conic combination of those remaining. (Figure 35, for example. A Matlab implementation of test (227) is given in G.2; a similar sketch appears after the table below.) It is evident that linear independence (l.i.) of N directions implies their conic independence;

    l.i. ⇒ c.i.

Arranging any set of generators for a particular convex cone in a matrix columnar,

    X ≜ [Γ_1 Γ_2 ··· Γ_N] ∈ R^{n×N}    (228)

then the relationship l.i. ⇒ c.i. suggests: the number of l.i. generators in the columns of X cannot exceed the number of c.i. generators. Denoting by k the number of conically independent generators contained in X, we have the most fundamental rank inequality for convex cones

    dim aff K = dim aff[0 X] = rank X ≤ k ≤ N    (229)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n, conic independence remains possible:


2.10.0.0.1 Table: Maximum number of c.i. directions

    n     sup k (pointed)     sup k (not pointed)
    0     0                   0
    1     1                   2
    2     2                   4
    3     ∞                   ∞
    ⋮     ⋮                   ⋮

Assuming veracity of this table, there is an apparent vastness between two and three dimensions. The finite numbers of conically independent directions indicate: Convex cones in dimensions 0, 1, and 2 must be polyhedral. (2.12.1)

Conic independence is certainly one convex idea that cannot be completely explained by a two-dimensional picture. [17, p.vii] From this table it is also evident that dimension of Euclidean space cannot exceed the number of conically independent directions possible;

    n ≤ sup k

We suspect the number of conically independent columns (rows) of X to be the same for X^{†T}: The columns (rows) of X are c.i. ⇔ the columns (rows) of X^{†T} are c.i. Proof. Pending.

2.10.1 Preservation of conic independence

Independence in the linear (2.1.2.1), affine (2.4.2.4), and conic senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (228) holds a conically independent set columnar. Consider the transformation

    T(X) : R^{n×N} → R^{n×N} ≜ XY    (230)

where the given matrix Y ≜ [y_1 y_2 ··· y_N] ∈ R^{N×N} is represented by linear operator T. Conic independence of {Xy_i ∈ R^n, i = 1 ... N} demands, by definition (227),

    Xy_i ζ_i + ··· + Xy_j ζ_j − Xy_l ζ_l = 0,   i ≠ ··· ≠ j ≠ l = 1 ... N    (231)

have no nontrivial solution ζ ∈ R^N_+. That is ensured by conic independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0; seen by factoring X.
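The text points to a Matlab implementation of test (227) in G.2; what follows is an independent minimal sketch of the same idea (not the book's implementation), asking via linear programming whether any column of X is a conic combination of the remaining columns. It assumes linprog from the Optimization Toolbox; data are illustrative:

    % conic independence test (227): is column l a conic combination of the rest?
    X = [1 0 0; 0 1 -1];                 % three c.i. directions in R^2
    N = size(X,2);  ci = true;
    opts = optimoptions('linprog','Display','off');
    for l = 1:N
        G = X(:, [1:l-1 l+1:N]);         % remaining directions
        % feasibility problem: G*z = X(:,l), z >= 0
        [~, ~, flag] = linprog(zeros(N-1,1), [], [], G, X(:,l), ...
                               zeros(N-1,1), [], opts);
        if flag == 1, ci = false; end    % feasible => conically dependent
    end
    disp(ci)                             % true for this X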


Figure 36: (a) A pointed polyhedral cone K (drawn truncated) in R^3 having six facets. The extreme directions, corresponding to six edges emanating from the origin, are generators for this cone; not linearly independent but they must be conically independent. (b) The boundary ∂K* of dual cone K* (drawn truncated) is now added to the drawing of same K. K* is polyhedral, proper, and has the same number of extreme directions as K has facets.


2.10.1.1 linear maps of cones

[16, 7] If K is a convex cone in Euclidean space R and T is any linear mapping from R to Euclidean space M, then T(K) is a convex cone in M and x ≼ y with respect to K implies T(x) ≼ T(y) with respect to T(K). If K is closed or has nonempty interior in R, then so is T(K) in M. If T is a linear bijection, then x ≼ y ⇔ T(x) ≼ T(y). Further, if F is a face of K, then T(F) is a face of T(K).

2.10.2 Pointed closed convex K & conic independence

The following bullets can be derived from definitions (149) and (227) in conjunction with the extremes theorem (2.8.1.1.1):

The set of all extreme directions from a pointed closed convex cone K ⊂ R^n is not necessarily a linearly independent set, yet it must be a conically independent set; (compare Figure 12 on page 67 with Figure 36(a))

    {extreme directions} ⇒ {c.i.}

Conversely, when a conically independent set of directions from pointed closed convex cone K is known a priori to comprise generators, then all directions from that set must be extreme directions of the cone;

    {extreme directions} ⇔ {c.i. generators of pointed closed convex K}

Barker & Carlson [16, 1] call the extreme directions a minimal generating set for a pointed closed convex cone. A minimal set of generators is therefore a conically independent set of generators, and vice versa,^{2.37} for a pointed closed convex cone.

Any collection of n or fewer extreme directions from pointed closed convex cone K ⊂ R^n must be linearly independent;

    {≤ n extreme directions in R^n} ⇒ {l.i.}

Conversely, because l.i. ⇒ c.i.,

    {extreme directions} ⇐ {l.i. generators of pointed closed convex K}

^{2.37} This converse does not hold for nonpointed closed convex cones as Table 2.10.0.0.1 implies; e.g., ponder four conically independent generators for a plane (case n = 2).


Figure 37: Minimal set of generators X = [x₁ x₂ x₃] ∈ R^{2×3} for halfspace about origin.

2.10.2.0.1 Example. Vertex-description of halfspace H about origin.
From n + 1 points in R^n we can make a vertex-description of a convex cone that is a halfspace H, where {x_l ∈ R^n, l = 1…n} constitutes a minimal set of generators for a hyperplane ∂H through the origin. An example is illustrated in Figure 37. By demanding the augmented set {x_l ∈ R^n, l = 1…n+1} be affinely independent (we want x_{n+1} not parallel to ∂H), then

    H = ⋃_{ζ≥0} (ζ x_{n+1} + ∂H)
      = {ζ x_{n+1} + cone{x_l ∈ R^n, l = 1…n} | ζ ≥ 0}    (232)
      = cone{x_l ∈ R^n, l = 1…n+1}

a union of parallel hyperplanes. Cardinality is one step beyond dimension of the ambient space, but {x_l ∀l} is a minimal set of generators for this convex cone H which has no extreme elements.
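Equation (232) invites a quick numeric sanity check. The sketch below, assuming SciPy and hypothetical generators, confirms on a few sample points that the upper halfspace of R^2 coincides with the conic hull of three generators, two of which minimally generate the boundary hyperplane.

```python
# Sketch verifying (232) in R^2: {y | y_2 >= 0} = cone{x_1, x_2, x_3}
# for hypothetical generators with x_1, x_2 spanning the line y_2 = 0.
import numpy as np
from scipy.optimize import linprog

X = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])

def in_cone(X, y):
    """Feasibility of y = X a with a >= 0."""
    res = linprog(np.zeros(X.shape[1]), A_eq=X, b_eq=y,
                  bounds=[(0, None)] * X.shape[1], method="highs")
    return res.status == 0

for y, expect in [([0.3, 0.8], True), ([-2.0, 0.5], True),
                  ([0.3, -0.8], False), ([5.0, -0.1], False)]:
    assert in_cone(X, np.array(y)) == expect
print("halfspace = cone of three generators: verified on samples")
```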


2.10.3 Utility of conic independence

Perhaps the most useful application of conic independence is determination of the intersection of closed convex cones from their halfspace-descriptions, or representation of the sum of closed convex cones from their vertex-descriptions.

• ⋂ K_i : A halfspace-description for the intersection of any number of closed convex cones K_i can be acquired by pruning normals; specifically, only the conically independent normals from the aggregate of all the halfspace-descriptions need be retained.

• ∑ K_i : Generators for the sum of any number of closed convex cones K_i can be determined by retaining only the conically independent generators from the aggregate of all the vertex-descriptions.

Such conically independent sets are not necessarily unique or minimal; a numeric sketch of such pruning appears below.

2.11 When extreme means exposed

For any convex polyhedral set in R^n having nonempty interior, distinction between the terms extreme and exposed vanishes [210, §2.4] [61, §2.2] for faces of all dimensions except n; their meanings become equivalent as we saw in Figure 11 (discussed in §2.6.1.2). In other words, each and every face of any polyhedral set (except the set itself) can be exposed by a hyperplane, and vice versa; e.g., Figure 12.

Lewis [152, §6] [133, §2.3.4] claims nonempty extreme proper subsets and the exposed subsets coincide for S^n_+; id est, each and every face of the positive semidefinite cone, whose dimension is less than the dimension of the cone, is exposed. A more general discussion of cones having this property can be found in [220]; e.g., the Lorentz cone (141) [15, §II.A].
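As promised in §2.10.3, here is a greedy numeric sketch of pruning conically dependent generators, assuming SciPy and hypothetical data; the set kept is conically independent but, as cautioned above, need not be unique or minimal.

```python
# Greedy pruning of conically dependent generators: keep a column only if
# it is not a conic combination of the columns kept so far.
import numpy as np
from scipy.optimize import linprog

def in_cone(X, y):
    res = linprog(np.zeros(X.shape[1]), A_eq=X, b_eq=y,
                  bounds=[(0, None)] * X.shape[1], method="highs")
    return res.status == 0

def prune(G):
    kept = []
    for i in range(G.shape[1]):
        if not kept or not in_cone(np.column_stack(kept), G[:, i]):
            kept.append(G[:, i])
    return np.column_stack(kept)

# aggregate of generators from two cones in R^2; (0.5, 0.5) is redundant
G = np.array([[1.0, 0.0, -1.0, 0.5],
              [0.0, 1.0,  0.0, 0.5]])
print(prune(G))   # keeps (1,0), (0,1), (-1,0)
```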


2.12 Convex polyhedra

Every polyhedron, such as the convex hull (71) of a bounded list X, can be expressed as the solution set of a finite system of linear equalities and inequalities, and vice versa. [61, §2.2]

2.12.0.0.1 Definition. Convex polyhedra, halfspace-description. [38, §2.2.4]
A convex polyhedron is the intersection of a finite number of halfspaces and hyperplanes;

    P = {y | Ay ≽ b, Cy = d} ⊆ R^n    (233)

where coefficients A and C generally denote matrices. Each row of C is a vector normal to a hyperplane, while each row of A is a vector inward-normal to a hyperplane partially bounding a halfspace. △

By the halfspaces theorem in §2.4.1.1.1, a polyhedron thus described is a closed convex set having possibly empty interior; e.g., Figure 11. Convex polyhedra^2.38 are finite-dimensional, comprising all affine sets (§2.3.1), polyhedral cones, line segments, rays, halfspaces, convex polygons, solids [138, def.104/6, p.343], polychora, polytopes,^2.39 etcetera.

It follows from definition (233) by exposure that each face of a convex polyhedron is a convex polyhedron.

The projection of any polyhedron on a subspace remains a polyhedron. More generally, the image of a polyhedron under any linear transformation is a polyhedron. [17, §I.9]

When b and d in (233) are 0, the resultant is a polyhedral cone; the set of all polyhedral cones is a subset of convex cones (§2.12.1).

Footnote 2.38: We consider only convex polyhedra throughout, but acknowledge the existence of concave polyhedra. [244, Kepler-Poinsot Solid]
Footnote 2.39: Some authors distinguish bounded polyhedra via the designation polytope. [61, §2.2]
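Definition (233) reads directly as a finite test. A minimal sketch, taking the generalized inequality Ay ≽ b entrywise (K the nonnegative orthant) and using hypothetical data:

```python
# Membership test for the halfspace-description (233).
import numpy as np

def in_polyhedron(A, b, C, d, y, tol=1e-9):
    return bool(np.all(A @ y >= b - tol) and np.allclose(C @ y, d, atol=tol))

# example polyhedron: the probability simplex {y in R^3 | y >= 0, 1'y = 1}
A, b = np.eye(3), np.zeros(3)
C, d = np.ones((1, 3)), np.array([1.0])
print(in_polyhedron(A, b, C, d, np.array([0.2, 0.3, 0.5])))   # True
print(in_polyhedron(A, b, C, d, np.array([0.5, 0.6, -0.1])))  # False
```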


2.12.1 Polyhedral cone

From our study of cones we see the number of intersecting hyperplanes and halfspaces constituting a convex cone is possibly but not necessarily infinite. When the number is finite, the convex cone is termed polyhedral. That is the primary distinguishing feature between the set of all convex cones and polyhedra; all polyhedra, including polyhedral cones, are finitely generated [194, §19]. We distinguish polyhedral cones in the set of all convex cones for this reason, although all convex cones of dimension 2 or less are polyhedral.

2.12.1.0.1 Definition. Polyhedral cone, halfspace-description.^2.40
(confer (240)) A polyhedral cone is the intersection of a finite number of halfspaces and hyperplanes about the origin;

    K = {y | Ay ≽ 0, Cy = 0} ⊆ R^n              (234a)
      = {y | Ay ≽ 0, Cy ≽ 0, Cy ≼ 0}            (234b)
      = {y | [A ; C ; −C] y ≽ 0}                (234c)

(vertical concatenation in (234c)) where coefficients A and C generally denote matrices of finite dimension. Each row of C is a vector normal to a hyperplane containing the origin, while each row of A is a vector inward-normal to a hyperplane containing the origin and partially bounding a halfspace. △

A polyhedral cone thus defined is closed, convex, possibly has empty interior, and has only a finite number of generators (§2.8.1.2), and vice versa. (Minkowski/Weyl) [210, §2.8]

From the definition it follows that any single hyperplane through the origin, or any halfspace partially bounded by a hyperplane through the origin, is a polyhedral cone. The most familiar example of a polyhedral cone is any quadrant (or orthant, §2.1.3) generated by Cartesian axes. Esoteric examples of polyhedral cone include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in

Footnote 2.40: Rockafellar [194, §19] proposes affine sets be handled via complementary pairs of affine inequalities; e.g., Cy ≽ d and Cy ≼ d.


subspace R, polyhedral flavors of the (proper) Lorentz cone (confer (141))

    K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t },   l = 1 or ∞    (235)

any subspace, and R^n. More examples are illustrated in Figure 36 and Figure 12.

2.12.2 Vertices of convex polyhedra

By definition, a vertex (§2.6.1.0.1) always lies on the relative boundary of a convex polyhedron. [138, def.115/6, p.358] In Figure 11, each vertex of the polyhedron is located at the intersection of three or more facets, and every edge belongs to precisely two facets [17, §VI.1, p.252]. In Figure 12, the only vertex of that polyhedral cone lies at the origin.

The set of all polyhedral cones is clearly a subset of convex polyhedra and a subset of convex cones. Not all convex polyhedra are bounded, evidently; neither can they all be described by the convex hull of a bounded set of points as we defined it in (71). Hence we propose a universal vertex-description of polyhedra in terms of that same finite-length list X (64):

2.12.2.0.1 Definition. Convex polyhedra, vertex-description.
(confer §2.8.1.1.1) Denote the truncated a-vector

    a_{i:l} = [a_i ; … ; a_l]    (236)

By discriminating a suitable finite-length generating list (or set) arranged columnar in X ∈ R^{n×N}, then any particular polyhedron may be described

    P = { Xa | a_{1:k}ᵀ 1 = 1, a_{m:N} ≽ 0, {1…k} ∪ {m…N} = {1…N} }    (237)

where 0 ≤ k ≤ N and 1 ≤ m ≤ N + 1. Setting k = 0 removes the affine equality condition. Setting m = N + 1 removes the inequality. △


Coefficient indices in (237) may or may not be overlapping, but all the coefficients are assumed constrained. From (66), (71), and (79), we summarize how the coefficient conditions may be applied:

    affine sets       →  a_{1:k}ᵀ 1 = 1   ⎫
                                           ⎬ ← convex hull (m ≤ k)    (238)
    polyhedral cones  →  a_{m:N} ≽ 0      ⎭

It is always possible to describe a convex hull in the region of overlapping indices because, for 1 ≤ m ≤ k ≤ N,

    {a_{m:k} | a_{m:k}ᵀ 1 = 1, a_{m:k} ≽ 0} ⊆ {a_{m:k} | a_{1:k}ᵀ 1 = 1, a_{m:N} ≽ 0}    (239)

Members of a generating list are not necessarily vertices of the corresponding polyhedron; certainly true for (71) and (237), some subset of list members reside in the polyhedron's relative interior. Conversely, when boundedness (71) applies, the convex hull of the vertices is a polyhedron identical to the convex hull of the generating list.

2.12.2.1 Vertex-description of polyhedral cone

Given closed convex cone K in a subspace of R^n having any set of generators for it arranged in a matrix X ∈ R^{n×N} as in (228), then that cone is described setting m = 1 and k = 0 in vertex-description (237):

    K = cone(X) = {Xa | a ≽ 0} ⊆ R^n    (240)

a conic hull, like (79), of N generators.

This vertex-description is extensible to an infinite number of generators; which follows from the extremes theorem (§2.8.1.1.1) and Example 2.8.1.2.1.


2.12.2.2 Pointedness

[210, §2.10] Assuming all generators constituting the columns of X ∈ R^{n×N} are nonzero, K is pointed (§2.7.2.1.2) if and only if there is no nonzero a ≽ 0 that solves Xa = 0; id est, iff N(X) ∩ R^N_+ = 0. (If rank X = n, then the dual cone K^* is pointed. (253))

A polyhedral proper cone in R^n must have at least n linearly independent generators, or be the intersection of at least n halfspaces whose partial boundaries have normals that are linearly independent. Otherwise, the cone will contain at least one line and there can be no vertex; id est, the cone cannot otherwise be pointed.

For any pointed polyhedral cone, there is a one-to-one correspondence of one-dimensional faces with extreme directions.

Examples of pointed closed convex cones K are not limited to polyhedral cones: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, the Lorentz (ice-cream) cone and its polyhedral flavors, the cone of Euclidean distance matrices EDM^N in S^N_h, the proper cones S^M_+ in ambient S^M, any orthant in R^n or R^{m×n}; e.g., the nonnegative real line R_+ in vector space R.

2.12.3 Unit simplex

A peculiar convex subset of the nonnegative orthant having halfspace-description

    S ≜ {s | s ≽ 0, 1ᵀs ≤ 1} ⊆ R^n_+    (241)

is a unique bounded convex polyhedron called unit simplex (Figure 38) having nonempty interior, n + 1 vertices, and dimension [38, §2.2.4]

    dim S = n    (242)

The origin supplies one vertex while heads of the standard basis [125] [212] {e_i, i = 1…n} in R^n constitute those remaining;^2.41 thus its vertex-description:

Footnote 2.41: In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0,1], in R^2 it is a triangle and its relative interior, in R^3 it is the convex hull of a tetrahedron (Figure 38), in R^4 it is the convex hull of a pentatope [244], and so on.


Figure 38: Unit simplex S = {s | s ≽ 0, 1ᵀs ≤ 1} in R^3 is a unique solid tetrahedron, but is not regular.


    S = conv{0, {e_i, i = 1…n}}
      = { [0 e₁ e₂ ⋯ e_n] a | aᵀ1 = 1, a ≽ 0 }    (243)

2.12.3.1 Simplex

The unit simplex comes from a class of general polyhedra called simplex, having vertex-description: [53] [194] [242] [61]

    conv{x_l ∈ R^n | l = 1…k+1, dim aff{x_l} = k},   n ≥ k    (244)

So defined, a simplex is a closed bounded convex set having possibly empty interior. Examples of simplices, by increasing affine dimension, are: a point, any line segment, any triangle and its relative interior, a general tetrahedron, polychoron, and so on.

2.12.3.1.1 Definition. Simplicial cone.
A polyhedral proper (§2.7.2.2.1) cone K in R^n is called simplicial iff K has exactly n extreme directions; [15, §II.A] equivalently, iff proper K has exactly n linearly independent generators contained in any given set of generators. △

There are an infinite variety of simplicial cones in R^n; e.g., Figure 12, Figure 39, Figure 47. Any orthant is simplicial, as is any rotation thereof.

2.12.4 Converting between descriptions

Conversion between halfspace-descriptions (233) (234) and equivalent vertex-descriptions (71) (237) is nontrivial, in general, [12] [61, §2.2] but the conversion is easy for simplices. [38, §2.2] Nonetheless, we tacitly assume the two descriptions to be equivalent. [194, §19, thm.19.1] We explore conversions in §2.13.4 and §2.13.9.
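The pointedness criterion of §2.12.2.2 also reduces to a single linear program. A minimal sketch, assuming SciPy and hypothetical generator matrices:

```python
# cone(X) is pointed iff the only a >= 0 with Xa = 0 is a = 0,
# i.e. iff the bounded LP below attains optimal value 0.
import numpy as np
from scipy.optimize import linprog

def is_pointed(X, tol=1e-8):
    N = X.shape[1]
    res = linprog(-np.ones(N), A_eq=X, b_eq=np.zeros(X.shape[0]),
                  bounds=[(0, 1)] * N, method="highs")
    return -res.fun < tol          # max 1'a over N(X) ∩ [0,1]^N

print(is_pointed(np.array([[1.0, 0.0],
                           [0.0, 1.0]])))        # True: a quadrant
print(is_pointed(np.array([[1.0, -1.0, 0.0],
                           [0.0,  0.0, 1.0]])))  # False: contains a line
```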


Figure 39: Two views of a simplicial cone and its dual in R^3 (second view on next page). Semi-infinite boundary of each cone is truncated for illustration. Cartesian axes are drawn for reference.




2.13 Dual cone & generalized inequality & biorthogonal expansion

These three concepts, dual cone, generalized inequality, and biorthogonal expansion, are inextricably melded; meaning, it is difficult to completely discuss one without mentioning the others. The dual cone is critical in tests for convergence by contemporary primal/dual methods for numerical solution of conic problems. [259] [174, §4.5] For unique minimum-distance projection on a closed convex cone K, the negative dual cone −K^* plays the role that orthogonal complement plays for subspace projection.^2.42 (§E.9.2.1) Indeed, −K^* is the algebraic complement in R^n;

    K ⊞ −K^* = R^n    (245)

where ⊞ denotes unique orthogonal vector sum.

One way to think of a pointed closed convex cone is as a new kind of coordinate system whose basis is generally nonorthogonal; a conic system, very much like the familiar Cartesian system whose analogous cone is the first quadrant or nonnegative orthant. Generalized inequality ≽_K is a formalized means to determine membership to any pointed closed convex cone (§2.7.2.2), while biorthogonal expansion is simply a formulation for expressing coordinates in a pointed conic coordinate system. When cone K is the nonnegative orthant, then these three concepts come into alignment with the Cartesian prototype; biorthogonal expansion becomes orthogonal expansion.

2.13.1 Dual cone

For any set K (convex or not), the dual cone [38, §2.6.1] [59, §4.2]

    K^* ≜ {y ∈ R^n | ⟨y, x⟩ ≥ 0 for all x ∈ K}    (246)

Footnote 2.42: Namely, projection on a subspace is ascertainable from its projection on the orthogonal complement.


is a unique cone^2.43 that is always closed and convex because it is an intersection of halfspaces (halfspaces theorem (§2.4.1.1.1)) whose partial boundaries each contain the origin, each halfspace having inward-normal x belonging to K; e.g., Figure 40(a).

When cone K is convex, there is a second and equivalent construction: Dual cone K^* is the union of each and every vector y inward-normal to a hyperplane supporting K or bounding a halfspace containing K; e.g., Figure 40(b). When K is represented by a halfspace-description such as (234), for example, where

    A ≜ [a₁ᵀ ; … ; a_mᵀ] ∈ R^{m×n},   C ≜ [c₁ᵀ ; … ; c_pᵀ] ∈ R^{p×n}    (247)

then the dual cone can be represented as the conic hull

    K^* = cone{a₁, …, a_m, ±c₁, …, ±c_p}    (248)

a vertex-description, because each and every conic combination of normals from the halfspace-description of K yields another inward-normal to a hyperplane supporting or bounding a halfspace containing K.

Perhaps the most instructive graphical method of dual cone construction is cut-and-try. The simplest exercise of dual cone equation (246) is to find the dual of each polyhedral cone from Figure 41.

As defined, dual cone K^* exists even when the affine hull of the original cone is a proper subspace; id est, even when the original cone has empty interior. Rockafellar formulates the dimension of K and K^*. [194, §14]^2.44

To further motivate our understanding of the dual cone, consider the ease with which convergence can be observed in the following optimization problem (p):

Footnote 2.43: The dual cone is the negative polar cone defined by many authors; K^* = −K°. [123, §A.3.2] [194, §14] [26] [17] [210, §2.7]
Footnote 2.44: His monumental work Convex Analysis has not one figure or illustration. See [17, §II.16] for a good illustration of Rockafellar's recession cone [27].


Figure 40: Two equivalent constructions of dual cone K^* in R^2: (a) Showing construction by intersection of halfspaces about 0 (drawn truncated). Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R^2 need be drawn; by (296). (b) Suggesting construction by union of inward-normals y to each and every hyperplane ∂H₊ supporting K. This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed (§2.4.2.6).


    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ G(K^*)    (294)

Figure 41: Dual cone construction by right-angle. Each extreme direction of a polyhedral cone is orthogonal to a facet of its dual cone, and vice versa, in any dimension. (§2.13.6.1) (a) This characteristic guides graphical construction of dual cone in two dimensions: It suggests finding dual-cone boundary ∂ by making right-angles with extreme directions of polyhedral cone. The construction is then pruned so that each dual boundary vector does not exceed π/2 radians in angle with each and every vector from polyhedral cone. Were dual cone in R^2 to narrow, Figure 42 would be reached in limit. (b) Same polyhedral cone and its dual continued into three dimensions. (confer Figure 47)


Figure 42: K is a halfspace about the origin in R^2. K^* is a ray base 0, hence has empty interior in R^2; so K cannot be pointed. (Both convex cones appear truncated.)

2.13.1.0.1 Example. Dual problem. (confer §6.1)
Essentially, duality theory concerns the representation of a given optimization problem as half a minimax problem (minimize_x maximize_y f̂(x, y)) [194, §36] [38, §5.4] whose saddle-value [83] exists. [191, p.3] Consider primal conic problem (p) and its corresponding dual problem (d): [184, §3.3.1] [148, §2.1] given vectors α, β and matrix constant C,

    (p)   minimize_x   αᵀx
          subject to   x ∈ K
                       Cx = β
                                          (249)
    (d)   maximize_{y,z}   βᵀz
          subject to       y ∈ K^*
                           Cᵀz + y = α

Observe the dual problem is also conic, and its objective^2.45 never exceeds that of the primal;

    αᵀx ≥ βᵀz
    xᵀ(Cᵀz + y) ≥ (Cx)ᵀz    (250)
    xᵀy ≥ 0

Footnote 2.45: The objective is the function that is argument to minimization or maximization.


which holds by definition (246). Under the sufficient condition that (249)(p) is a convex problem and satisfies Slater's condition,^2.46 each problem (p) and (d) attains the same optimal value of its objective, and each problem is called a strong dual to the other because the duality gap (primal−dual objective difference) is 0. Then (p) and (d) are together equivalent to the minimax problem

    minimize_{x,y,z}   αᵀx − βᵀz
    subject to         x ∈ K,  y ∈ K^*          (p)−(d)    (251)
                       Cx = β,  Cᵀz + y = α

whose optimal objective always has the saddle-value 0 (regardless of the particular convex cone K and other problem parameters). [232, §3.2] Thus determination of convergence for either primal or dual problem is facilitated.

2.13.1.1 Key properties of dual cone

• For any cone, (−K)^* = −K^*.

• For any cones K₁ and K₂, K₁ ⊆ K₂ ⇒ K₁^* ⊇ K₂^*. [210, §2.7]

• (conjugation) [194, §14] [59, §4.5] When K is any convex cone, the dual of the dual cone is the closure of the original cone; K^{**} = cl K. Because K^{***} = K^*,

    K^* = (cl K)^*    (252)

When K is closed and convex, then the dual of the dual cone is the original cone; K^{**} = K.

Footnote 2.46: A convex problem, essentially, has convex objective function optimized over a convex set. (6) In this context, (p) is convex if K is a convex cone. Slater's condition is satisfied whenever any primal strictly feasible point exists. (p.364)


• If any cone K has nonempty interior, then K^* is pointed;

    K nonempty interior ⇒ K^* pointed    (253)

Conversely, if the closure of any convex cone K is pointed, then K^* has nonempty interior;

    cl K pointed ⇒ K^* nonempty interior    (254)

Given that a cone K ⊂ R^n is closed and convex, K is pointed if and only if K^* − K^* = R^n; id est, iff K^* has nonempty interior. [36, §3.3, exer.20]

• (vector sum) [194, thm.3.8] For convex cones K₁ and K₂,

    K₁ + K₂ = conv(K₁ ∪ K₂)    (255)

• (dual vector-sum) [194, §16.4.2] [59, §4.6] For convex cones K₁ and K₂,

    K₁^* ∩ K₂^* = (K₁ + K₂)^* = (K₁ ∪ K₂)^*    (256)

• (closure of vector sum of duals)^2.47 For closed convex cones K₁ and K₂,

    (K₁ ∩ K₂)^* = cl(K₁^* + K₂^*) = cl conv(K₁^* ∪ K₂^*)    (257)

where closure becomes superfluous under the condition K₁ ∩ int K₂ ≠ ∅ [36, §3.3, exer.16, §4.1, exer.7].

• (Krein-Rutman) For closed convex cones K₁ ⊆ R^m and K₂ ⊆ R^n and any linear map A : R^n → R^m, then provided int K₁ ∩ AK₂ ≠ ∅ [36, §3.3.13, confer §4.1, exer.9]

    (A⁻¹K₁ ∩ K₂)^* = AᵀK₁^* + K₂^*    (258)

where the dual of cone K₁ is with respect to its ambient space R^m and the dual of cone K₂ is with respect to R^n, where A⁻¹K₁ denotes the inverse image (§2.1.9.0.1) of K₁ under mapping A, and where Aᵀ denotes the adjoint operation.

Footnote 2.47: These parallel analogous results for subspaces R₁, R₂ ⊆ R^n; [59, §4.6]

    (R₁ + R₂)^⊥ = R₁^⊥ ∩ R₂^⊥
    (R₁ ∩ R₂)^⊥ = R₁^⊥ + R₂^⊥
    R^{⊥⊥} = R for any subspace R.


• K is proper if and only if K^* is proper.

• K is polyhedral if and only if K^* is polyhedral. [210, §2.8]

• K is simplicial if and only if K^* is simplicial. (§2.13.9.2) A simplicial cone and its dual are polyhedral proper cones (Figure 47, Figure 39), but not the converse.

• K ⊞ −K^* = R^n ⇔ K is closed and convex. (1582) (p.610)

• Any direction in a proper cone K is normal to a hyperplane separating K from −K^*.

2.13.1.2 Examples of dual cone

When K is R^n, K^* is the point at the origin, and vice versa.

When K is a subspace, K^* is its orthogonal complement, and vice versa. (§E.9.2.1, Figure 43)

When cone K is a halfspace in R^n with n > 0 (Figure 42 for example), the dual cone K^* is a ray (base 0) belonging to that halfspace but orthogonal to its bounding hyperplane (that contains the origin), and vice versa.

When convex cone K is a closed halfplane in R^3 (Figure 44), it is neither pointed nor of nonempty interior; hence, the dual cone K^* can be neither of nonempty interior nor pointed.

When K is any particular orthant in R^n, the dual cone is identical; id est, K = K^*.

When K is any quadrant in subspace R^2, K^* is a wedge-shaped polyhedral cone in R^3; e.g., for K equal to quadrant I, K^* = R²₊ × R.

When K is a polyhedral flavor of the Lorentz cone K_l (235), the dual is the polyhedral proper cone K_q: for l = 1 or ∞

    K_q = K_l^* = { [x ; t] ∈ R^n × R | ‖x‖_q ≤ t }    (259)

where ‖x‖_q is the dual norm determined via solution to 1/l + 1/q = 1.
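The dual pair (249) of Example 2.13.1.0.1 is easy to exercise numerically for the self-dual orthant K = K^* = R^n_+ just mentioned. A sketch assuming SciPy, with hypothetical random data constructed so that both problems are strictly feasible (hence Slater's condition holds and the duality gap vanishes):

```python
# Primal/dual conic pair (249) with K = K* = R^n_+ (a linear program).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 6, 3
C = rng.normal(size=(p, n))
beta = C @ rng.uniform(1.0, 2.0, size=n)                   # primal strictly feasible
alpha = C.T @ rng.normal(size=p) + rng.uniform(0.5, 1.5, size=n)  # dual too

# (p): minimize alpha'x  subject to  Cx = beta, x in R^n_+
primal = linprog(alpha, A_eq=C, b_eq=beta,
                 bounds=[(0, None)] * n, method="highs")

# (d): maximize beta'z  subject to  y = alpha - C'z in R^n_+  (z free)
dual = linprog(-beta, A_ub=C.T, b_ub=alpha,
               bounds=[(None, None)] * p, method="highs")

print(np.isclose(primal.fun, -dual.fun))   # True: zero duality gap
```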


Figure 43: When convex cone K is any one Cartesian axis, its dual cone is the convex hull of all axes remaining. In R^3, dual cone K^* (drawn tiled and truncated) is a hyperplane through the origin; its normal belongs to line K. In R^2, the dual cone K^* is a line through the origin while convex cone K is the line orthogonal.


Figure 44: K and K^* are halfplanes in R^3; blades. Both semi-infinite convex cones appear truncated. Each cone is like K in Figure 42, but embedded in a two-dimensional subspace of R^3. Cartesian coordinate axes drawn for reference.


2.13.2 Abstractions of Farkas' lemma

2.13.2.0.1 Corollary. Generalized inequality and membership relation. [123, §A.4.2]
Let K be any closed convex cone and K^* its dual, and let x and y belong to a vector space R^n. Then

    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K^*    (260)

which is a simple translation of the Farkas lemma [71] as in [194, §22] to the language of convex cones, and a generalization of the well-known Cartesian fact

    x ≽ 0 ⇔ ⟨y, x⟩ ≥ 0 for all y ≽ 0    (261)

for which implicitly K = K^* = R^n_+ the nonnegative orthant. By closure we have conjugation:

    y ∈ K^* ⇔ ⟨y, x⟩ ≥ 0 for all x ∈ K    (262)

merely, a statement of fact by definition of the dual cone (246).

When K and K^* are pointed closed convex cones, membership relation (260) is often written instead using dual generalized inequalities

    x ≽_K 0 ⇔ ⟨y, x⟩ ≥ 0 for all y ≽_{K^*} 0    (263)

meaning, coordinates for biorthogonal expansion of x (§2.13.8) [238] must be nonnegative when x belongs to K. By conjugation [194, thm.14.1]

    y ≽_{K^*} 0 ⇔ ⟨y, x⟩ ≥ 0 for all x ≽_K 0    (264)    ⋄

When pointed closed convex cone K is not polyhedral, coordinate axes for biorthogonal expansion asserted by the corollary are taken from extreme directions of K; expansion is assured by Carathéodory's theorem (§E.6.4.1.1).

We presume, throughout, the obvious:

    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K^*    (260)
                    ⇔
    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K^*, ‖y‖ = 1    (265)

An easy graphical exercise of this corollary is the two-dimensional polyhedral cone and its dual in Figure 41.


When pointed closed convex cone K is implicit from context: (confer §2.7.2.2)

    x ≽ 0 ⇔ x ∈ K
    x ≻ 0 ⇔ x ∈ rel int K    (266)

Strict inequality x ≻ 0 means coordinates for biorthogonal expansion of x must be positive when x belongs to rel int K. Strict membership relations are useful; e.g., for any proper cone K and its dual K^*:

    x ∈ int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K^*, y ≠ 0    (267)
    x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ int K^*    (268)

By conjugation, we also have the dual relations:

    y ∈ int K^* ⇔ ⟨y, x⟩ > 0 for all x ∈ K, x ≠ 0    (269)
    y ∈ K^*, y ≠ 0 ⇔ ⟨y, x⟩ > 0 for all x ∈ int K    (270)

Boundary-membership relations for proper cones are also useful:

    x ∈ ∂K ⇔ ∃ y  ⟨y, x⟩ = 0, y ∈ K^*, y ≠ 0, x ∈ K    (271)
    y ∈ ∂K^* ⇔ ∃ x  ⟨y, x⟩ = 0, x ∈ K, x ≠ 0, y ∈ K^*    (272)

2.13.2.0.2 Example. Dual linear transformation. [217, §4]
Consider a given matrix A and closed convex cone K. By membership relation we have

    Ay ∈ K^* ⇔ xᵀAy ≥ 0 ∀x ∈ K
             ⇔ yᵀz ≥ 0 ∀z ∈ {Aᵀx | x ∈ K}
             ⇔ y ∈ {Aᵀx | x ∈ K}^*    (273)

This implies

    {y | Ay ∈ K^*} = {Aᵀx | x ∈ K}^*    (274)

If we regard A as a linear operator, then Aᵀ is its adjoint.


2.13.2.1 Null certificate, Theorem of the alternative

If in particular x_p ∉ K, a closed convex cone, then the construction in Figure 40(b) suggests there exists a hyperplane having inward-normal belonging to dual cone K^* separating x_p from K; indeed, (260)

    x_p ∉ K ⇔ ∃ y ∈ K^*  ⟨y, x_p⟩ < 0    (275)

The existence of any one such y is a certificate of null membership. From a different perspective,

    x_p ∈ K
    or in the alternative    (276)
    ∃ y ∈ K^*  ⟨y, x_p⟩ < 0

By alternative is meant: these two systems are incompatible; one system is feasible while the other is not.

2.13.2.1.1 Example. Theorem of the alternative for linear inequality.
Myriad alternative systems of linear inequality can be explained in terms of pointed closed convex cones and their duals. From membership relation (262) with affine transformation of dual variable we write, for example,

    b − Ay ∈ K^* ⇔ xᵀ(b − Ay) ≥ 0 ∀x ∈ K    (277)
    Aᵀx = 0, b − Ay ∈ K^* ⇒ xᵀb ≥ 0 ∀x ∈ K    (278)

where A ∈ R^{n×m} and b ∈ R^n are given. Given membership relation (277), conversely, suppose we allow any y ∈ R^m. Then because −xᵀAy is unbounded below, xᵀ(b − Ay) ≥ 0 implies Aᵀx = 0: for y ∈ R^m

    Aᵀx = 0, b − Ay ∈ K^* ⇐ xᵀ(b − Ay) ≥ 0 ∀x ∈ K    (279)

In toto,

    b − Ay ∈ K^* ⇔ xᵀb ≥ 0, Aᵀx = 0 ∀x ∈ K    (280)

Vector x belongs to cone K but is constrained to lie in a subspace of R^n specified by an intersection of hyperplanes through the origin {x ∈ R^n | Aᵀx = 0}. From this, alternative systems of generalized inequality with respect to pointed closed convex cones K and K^*:


    Ay ≼_{K^*} b
    or in the alternative    (281)
    xᵀb < 0,  Aᵀx = 0,  x ≽_K 0

derived from (280) simply by taking the complementary sense of the inequality in xᵀb. These two systems are alternatives; if one system has a solution, then the other does not.^2.48 [194, p.201]

By invoking a strict membership relation between proper cones (267), we can construct a more exotic interdependency strengthened by the demand for an interior point;

    b − Ay ≻_{K^*} 0 ⇔ xᵀb > 0, Aᵀx = 0 ∀x ≽_K 0, x ≠ 0    (282)

From this, alternative systems of generalized inequality [38, pp.50, 54, 262]

    Ay ≺_{K^*} b
    or in the alternative    (283)
    xᵀb ≤ 0,  Aᵀx = 0,  x ≽_K 0,  x ≠ 0

derived from (282) by taking the complementary sense of the inequality in xᵀb.

Footnote 2.48: If solutions at ±∞ are disallowed, then the alternative systems become instead mutually exclusive with respect to nonpolyhedral cones. Simultaneous infeasibility of the two systems is not precluded by mutual exclusivity; called a weak alternative. Ye provides an example illustrating simultaneous infeasibility with respect to the positive semidefinite cone: x ∈ S², y ∈ R, A = [1 0 ; 0 0], and b = [0 1 ; 1 0], where xᵀb means ⟨x, b⟩. A better strategy than disallowing solutions at ±∞ is to demand an interior point as in (283) or Lemma 6.2.1.1.2. Then question of simultaneous infeasibility is moot.
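For K = K^* = R^n_+ the alternative (281) becomes a pair of linear programs of which exactly one is feasible. A minimal sketch assuming SciPy and hypothetical data; since the strict inequality xᵀb < 0 is homogeneous in x, it is safe to normalize it to xᵀb ≤ −1:

```python
# Alternative systems (281) over the nonnegative orthant.
import numpy as np
from scipy.optimize import linprog

def system_I(A, b):                      # Ay <= b feasible?
    m, n = A.shape
    r = linprog(np.zeros(n), A_ub=A, b_ub=b,
                bounds=[(None, None)] * n, method="highs")
    return r.status == 0

def system_II(A, b):                     # x >= 0, A'x = 0, b'x <= -1 feasible?
    m, n = A.shape
    r = linprog(np.zeros(m), A_eq=A.T, b_eq=np.zeros(n),
                A_ub=b[None, :], b_ub=np.array([-1.0]),
                bounds=[(0, None)] * m, method="highs")
    return r.status == 0

A = np.array([[1.0, 0.0], [-1.0, 0.0]])  # y_1 <= -1 and y_1 >= 1: infeasible
b = np.array([-1.0, -1.0])
print(system_I(A, b), system_II(A, b))   # False True
```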


And from this, alternative systems with respect to the nonnegative orthant attributed to Gordan in 1873: [89] [36, §2.2] substituting A ← Aᵀ and setting b = 0,

    Aᵀy ≺ 0
    or in the alternative    (284)
    Ax = 0,  x ≽ 0,  ‖x‖₁ = 1

2.13.3 Optimality condition

The general first-order necessary and sufficient condition for optimality of solution x⋆ to a convex problem ((249)(p) for example) with real differentiable objective function f̂(x) : R^n → R [193, §3] is

    ∇f̂(x⋆)ᵀ(x − x⋆) ≥ 0 ∀x ∈ C,  x⋆ ∈ C    (285)

where C is the feasible set, the convex set of all variable values satisfying the problem constraints.

2.13.3.0.1 Example. Equality constrained problem.
Given a real differentiable convex function f̂(x) : R^n → R defined on domain R^n, a fat full-rank matrix C ∈ R^{p×n}, and vector d ∈ R^p, the convex optimization problem

    minimize_x   f̂(x)
    subject to   Cx = d    (286)

is characterized by the well-known necessary and sufficient optimality condition [38, §4.2.3]

    ∇f̂(x⋆) + Cᵀν = 0    (287)

where ν ∈ R^p is the eminent Lagrange multiplier. [192] Feasible solution x⋆ is optimal, in other words, if and only if ∇f̂(x⋆) belongs to R(Cᵀ). Via membership relation, we now derive this particular condition from the general first-order condition for optimality (285):

In this case, the feasible set is

    C ≜ {x ∈ R^n | Cx = d} = {Zξ + x_p | ξ ∈ R^{n−rank C}}    (288)


where Z ∈ R^{n×(n−rank C)} holds basis N(C) columnar, and x_p is any particular solution to Cx = d. Since x⋆ ∈ C, we arbitrarily choose x_p = x⋆ which yields the equivalent optimality condition

    ∇f̂(x⋆)ᵀZξ ≥ 0 ∀ξ ∈ R^{n−rank C}    (289)

But this is simply half of a membership relation, and the cone dual to R^{n−rank C} is the origin in R^{n−rank C}. We must therefore have

    Zᵀ∇f̂(x⋆) = 0 ⇔ ∇f̂(x⋆)ᵀZξ ≥ 0 ∀ξ ∈ R^{n−rank C}    (290)

meaning, ∇f̂(x⋆) must be orthogonal to N(C). This condition

    Zᵀ∇f̂(x⋆) = 0,  x⋆ ∈ C    (291)

is necessary and sufficient for optimality of x⋆.

2.13.4 Discretization of membership relation

2.13.4.1 Dual halfspace-description

Halfspace-description of the dual cone is equally simple (and extensible to an infinite number of generators) as vertex-description (240) for the corresponding closed convex cone: By definition (246), for X ∈ R^{n×N} as in (228), (confer (234))

    K^* = {y ∈ R^n | zᵀy ≥ 0 for all z ∈ K}
        = {y ∈ R^n | zᵀy ≥ 0 for all z = Xa, a ≽ 0}
        = {y ∈ R^n | aᵀXᵀy ≥ 0, a ≽ 0}
        = {y ∈ R^n | Xᵀy ≽ 0}    (292)

that follows from the generalized inequality and membership corollary (261). The semi-infinity of tests specified by all z ∈ K has been reduced to a set of generators for K constituting the columns of X; id est, the test has been discretized.

Whenever K is known to be closed and convex, then the converse must also hold; id est, given any set of generators for K^* arranged columnar in Y, then the consequent vertex-description of the dual cone connotes a halfspace-description for K: [210, §2.8]

    K^* = {Ya | a ≽ 0} ⇔ K^{**} = K = {y | Yᵀy ≽ 0}    (293)
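Discretization (292) makes dual-cone membership a finite entrywise test. A minimal sketch assuming NumPy and a hypothetical pointed cone in R^2:

```python
# Membership of y in K* via (292): with generators of K columnar in X,
# y is in K* iff X'y >= 0 entrywise.
import numpy as np

X = np.array([[1.0, 1.0],
              [0.0, 2.0]])               # generators of K

def in_dual(X, y, tol=1e-12):
    return bool(np.all(X.T @ y >= -tol))

print(in_dual(X, np.array([0.0, 1.0])))    # True:  <gamma, y> >= 0 on both
print(in_dual(X, np.array([-1.0, 0.0])))   # False: negative on gamma = (1,0)
```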


2.13.4.2 First dual-cone formula

From these two results (292) and (293) we deduce a general principle:

From any given vertex-description of a convex cone K, a halfspace-description of the dual cone K^* is immediate by matrix transposition; conversely, from any given halfspace-description, a dual vertex-description is immediate.

Various other converses are just a little trickier. (§2.13.9)

We deduce further: For any polyhedral cone K, the dual cone K^* is also polyhedral and K^{**} = K. [210, §2.8]

The generalized inequality and membership corollary is discretized in the following theorem [16, §1]^2.49 that follows directly from (292) and (293):

2.13.4.2.1 Theorem. Discrete membership. (confer §2.13.2.0.1)
Given any set of generators (§2.8.1.2) denoted by G(K) for closed convex cone K ⊆ R^n and any set of generators denoted G(K^*) for its dual, let x and y belong to vector space R^n. Then discretization of the generalized inequality and membership corollary is necessary and sufficient for certifying membership:

    x ∈ K ⇔ ⟨γ^*, x⟩ ≥ 0 for all γ^* ∈ G(K^*)    (294)
    y ∈ K^* ⇔ ⟨γ, y⟩ ≥ 0 for all γ ∈ G(K)    (295)

(Exercise this theorem on Figure 41(a), for example, using the extreme directions as generators.) From this we may further deduce a more surgical description of dual cone that prescribes only a finite number of halfspaces for its construction when polyhedral: (Figure 40(a))

    K^* = {y ∈ R^n | ⟨γ, y⟩ ≥ 0 for all γ ∈ G(K)}    (296)    ⋄

Footnote 2.49: Barker & Carlson state the theorem only for the pointed closed convex case.


2.13.4.2.2 Example. Comparison with respect to orthant.
When comparison is with respect to the nonnegative orthant K = R^n_+, then from the discrete membership theorem it directly follows:

    x ≼ z ⇔ x_i ≤ z_i ∀i    (297)

Simple examples show entrywise inequality holds only with respect to the nonnegative orthant.

2.13.5 Dual PSD cone and generalized inequality

The dual positive semidefinite cone K^* is confined to S^M by convention;

    S^{M*}_+ ≜ {Y ∈ S^M | ⟨Y, X⟩ ≥ 0 for all X ∈ S^M_+} = S^M_+    (298)

The positive semidefinite cone is self-dual in the ambient space of symmetric matrices [38, exmp.2.24] [25] [120, §II]; K = K^*.

Dual generalized inequalities with respect to the positive semidefinite cone in the ambient space of symmetric matrices can therefore be simply stated: (Fejér)

    X ≽ 0 ⇔ tr(YᵀX) ≥ 0 for all Y ≽ 0    (299)

Membership to this cone can be determined in the isometrically isomorphic Euclidean space R^{M²} via (30). (§2.2.1) By the two interpretations in §2.13.1, positive semidefinite matrix Y can be interpreted as inward-normal to a hyperplane supporting the positive semidefinite cone.

The fundamental statement of positive semidefiniteness, yᵀXy ≥ 0 ∀y (§A.3.0.0.1), is a particular instance of these generalized inequalities (299):

    X ≽ 0 ⇔ ⟨yyᵀ, X⟩ ≥ 0 ∀ yyᵀ (≽ 0)    (1049)

Discretization (§2.13.4.2.1) allows replacement of positive semidefinite matrices Y with this minimal set of generators comprising the extreme directions of the positive semidefinite cone (§2.9.2.3).
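A numeric reading of (299) and its discretization (1049), assuming NumPy: random sampling of dyads yyᵀ gives a cheap necessary check, while the eigenvalue test is the exact one.

```python
# Fejer relation checked by sampling dyads and by eigenvalues.
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
X = B @ B.T                              # PSD by construction

ys = rng.normal(size=(1000, 4))
# y_i' X y_i for each sample row y_i
quad_forms = np.einsum('ij,jk,ik->i', ys, X, ys)
print(np.all(quad_forms >= -1e-12))                 # True
print(np.all(np.linalg.eigvalsh(X) >= -1e-12))      # True: exact membership
```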


Figure 45: Two (truncated) views of a polyhedral cone K and its dual in R^3. Each of four extreme directions from K belongs to a face of dual cone K^*. Shrouded (inside) cone K is symmetrical about its axis of revolution. Each pair of diametrically opposed extreme directions from K makes a right angle. An orthant (or any rotation thereof; a simplicial cone) is not the only self-dual polyhedral cone in three or more dimensions; [15, §2.A.21] e.g., consider an equilateral with five extreme directions.


2.13.5.1 Self-dual cones

From (106) (a consequence of the halfspaces theorem (§2.4.1.1.1)), where the support function for a convex cone always has 0 value, or from discretized definition (296) of the dual cone, we get a rather self-evident characterization of self-duality:

    K = K^* ⇔ K = ⋂_{γ ∈ G(K)} {y | γᵀy ≥ 0}    (300)

In words: Cone K is self-dual iff its own extreme directions are inward-normals to a (minimal) set of hyperplanes bounding halfspaces whose intersection constructs it. This means each extreme direction of K is normal to a hyperplane exposing one of its own faces; a necessary but insufficient condition for self-duality (Figure 45, for example).

Self-dual cones are of necessarily nonempty interior [22, §I] and invariant to rotation about the origin. Their most prominent representatives are the orthants, the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (298), and the Lorentz cone (141) [15, §II.A] [38, exmp.2.25]. In three dimensions, a plane containing the axis of revolution of a self-dual cone (and the origin) will produce a slice whose boundary makes a right angle.

2.13.5.1.1 Example. Linear matrix inequality.
Consider a peculiar vertex-description for a closed convex cone defined over the positive semidefinite cone (instead of the nonnegative orthant as in definition (79)): for X ∈ S^n given A_j ∈ S^n, j = 1…m,

    K = { [⟨A₁, X⟩ ; … ; ⟨A_m, X⟩] | X ≽ 0 } ⊆ R^m
      = { [svec(A₁)ᵀ ; … ; svec(A_m)ᵀ] svec X | X ≽ 0 }
      ≜ { A svec X | X ≽ 0 }    (301)

where A ∈ R^{m×n(n+1)/2}, and where symmetric vectorization svec is defined in (46). K is indeed a convex cone because by (138)

    A svec X_{p1}, A svec X_{p2} ∈ K ⇒ A(ζ svec X_{p1} + ξ svec X_{p2}) ∈ K for all ζ, ξ ≥ 0    (302)


since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. (§A.3.1.0.2) Although matrix A is finite-dimensional, K is generally not a polyhedral cone (unless m equals 1 or 2) because X ∈ S^n_+. Provided the A_j matrices are linearly independent, then

    rel int K = int K    (303)

meaning, the cone interior is nonempty, implying the dual cone is pointed by (253).

If matrix A has no nullspace, on the other hand, then (by §2.10.1.1 and Definition 2.2.1.0.1) A svec X is an isomorphism in X between the positive semidefinite cone and R(A). In that case, convex cone K has relative interior

    rel int K = {A svec X | X ≻ 0}    (304)

and boundary

    rel ∂K = {A svec X | X ≽ 0, X ⊁ 0}    (305)

Now consider the (closed convex) dual cone:

    K^* = { y | ⟨A svec X, y⟩ ≥ 0 for all X ≽ 0 } ⊆ R^m
        = { y | ⟨svec X, Aᵀy⟩ ≥ 0 for all X ≽ 0 }
        = { y | svec⁻¹(Aᵀy) ≽ 0 }    (306)

that follows from (299) and leads to an equally peculiar halfspace-description

    K^* = { y ∈ R^m | Σ_{j=1}^m y_j A_j ≽ 0 }    (307)

The summation inequality with respect to the positive semidefinite cone is known as a linear matrix inequality. [37] [81] [163] [235]

When the A_j matrices are linearly independent, function g(y) ≜ Σ y_j A_j on R^m is a linear bijection. The inverse image of the positive semidefinite cone under g(y) must therefore have dimension m. In that circumstance, the dual cone interior is nonempty

    int K^* = { y ∈ R^m | Σ_{j=1}^m y_j A_j ≻ 0 }    (308)

having boundary

    ∂K^* = { y ∈ R^m | Σ_{j=1}^m y_j A_j ≽ 0, Σ_{j=1}^m y_j A_j ⊁ 0 }    (309)
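Halfspace-description (307) is directly computable. A sketch with hypothetical A_j, assuming NumPy, testing y ∈ K^* by an eigenvalue check of Σ y_j A_j:

```python
# Dual-cone membership for the LMI example: y in K* iff sum_j y_j A_j is PSD.
import numpy as np

A1 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])
A3 = np.array([[0.0, 1.0], [1.0, 0.0]])

def in_dual_LMI(y, As, tol=1e-12):
    S = sum(yj * Aj for yj, Aj in zip(y, As))
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

print(in_dual_LMI([1.0, 1.0, 0.5], (A1, A2, A3)))   # True:  [[1,.5],[.5,1]] PSD
print(in_dual_LMI([1.0, 1.0, 2.0], (A1, A2, A3)))   # False: [[1,2],[2,1]] indefinite
```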


2.13.6 Dual of pointed polyhedral cone

In a subspace of R^n, now we consider a pointed polyhedral cone K given in terms of its extreme directions Γ_i arranged columnar in X;

    X = [Γ₁ Γ₂ ⋯ Γ_N] ∈ R^{n×N}    (228)

The extremes theorem (§2.8.1.1.1) provides the vertex-description of a pointed polyhedral cone in terms of its finite number of extreme directions and its lone vertex at the origin:

2.13.6.0.1 Definition. Pointed polyhedral cone, vertex-description.
(encore) (confer (240) (151)) Given pointed polyhedral cone K in a subspace of R^n, denoting its i-th extreme direction by Γ_i ∈ R^n arranged in a matrix X as in (228), then that cone may be described: (71) (confer (241))

    K = { [0 X] a ζ | aᵀ1 = 1, a ≽ 0, ζ ≥ 0 }
      = { X a ζ | aᵀ1 ≤ 1, a ≽ 0, ζ ≥ 0 }    (310)
      = { X b | b ≽ 0 } ⊆ R^n

that is simply a conic hull (like (79)) of a finite number N of directions. △

Whenever cone K is pointed closed and convex (not only polyhedral), then dual cone K^* has a halfspace-description in terms of the extreme directions Γ_i of K:

    K^* = { y | γᵀy ≥ 0 for all γ ∈ {Γ_i, i = 1…N} ⊆ rel ∂K }    (311)

because when {Γ_i} constitutes any set of generators for K, the discretization result in §2.13.4.1 allows relaxation of the requirement ∀x ∈ K in (246) to ∀γ ∈ {Γ_i} directly.^2.50 That dual cone so defined is unique, identical to (246), polyhedral whenever the number of generators N is finite

    K^* = { y | Xᵀy ≽ 0 } ⊆ R^n    (292)

and has nonempty interior because K is assumed pointed (but K^* is not necessarily pointed unless K has nonempty interior (§2.13.1.1)).

Footnote 2.50: The extreme directions of K constitute a minimal set of generators.


2.13.6.1 Facet normal & extreme direction

We see from (292) that the conically independent generators of cone K (namely, the extreme directions of pointed closed convex cone K constituting the columns of X) each define an inward-normal to a hyperplane supporting K^* (§2.4.2.6.1) and exposing a dual facet when N is finite. Were K^* pointed and finitely generated, then by conjugation the dual statement would also hold; id est, the extreme directions of pointed K^* each define an inward-normal to a hyperplane supporting K and exposing a facet when N is finite. Examine Figure 41 or Figure 45, for example.

We may conclude the extreme directions of polyhedral proper K are respectively orthogonal to the facets of K^*; likewise, the extreme directions of polyhedral proper K^* are respectively orthogonal to the facets of K.

2.13.7 Biorthogonal expansion by example

2.13.7.0.1 Example. Relationship to dual polyhedral cone.
The simplicial cone K illustrated in Figure 46 induces a partial order on R^2. All points greater than x with respect to K, for example, are contained in the translated cone x + K. The extreme directions Γ₁ and Γ₂ of K do not make an orthogonal set; neither do extreme directions Γ₃ and Γ₄ of dual cone K^*; rather, we have the biorthogonality condition [238]

    Γ₄ᵀΓ₁ = Γ₃ᵀΓ₂ = 0
    Γ₃ᵀΓ₁ ≠ 0,  Γ₄ᵀΓ₂ ≠ 0    (312)

Biorthogonal expansion of x ∈ K is then

    x = Γ₁ (Γ₃ᵀx)/(Γ₃ᵀΓ₁) + Γ₂ (Γ₄ᵀx)/(Γ₄ᵀΓ₂)    (313)

where Γ₃ᵀx/(Γ₃ᵀΓ₁) is the nonnegative coefficient of nonorthogonal projection (§E.6.1) of x on Γ₁ in the direction orthogonal to Γ₃, and where Γ₄ᵀx/(Γ₄ᵀΓ₂) is the nonnegative coefficient of nonorthogonal projection of x on Γ₂ in the direction orthogonal to Γ₄; they are coordinates in this nonorthogonal system. Those coefficients must be nonnegative, x ≽_K 0, because x ∈ K (266) and K is simplicial.

If we ascribe the extreme directions of K to the columns of a matrix

    X ≜ [Γ₁ Γ₂]    (314)


Figure 46: Simplicial cone K in R^2 and its dual K^* drawn truncated. Conically independent generators Γ₁ and Γ₂ constitute extreme directions of K while Γ₃ and Γ₄ constitute extreme directions of K^*, with Γ₁ ⊥ Γ₄ and Γ₂ ⊥ Γ₃. Dotted ray-pairs bound translated cones K. Point x is comparable to point z (and vice versa) but not to y; z ≽ x ⇔ z − x ∈ K ⇔ z − x ≽_K 0 iff there exist nonnegative coordinates for biorthogonal expansion of z − x. Point y is not comparable to z because z does not belong to y ± K. Flipping a translated cone is quite helpful for visualization: x ≼ z ⇔ x ∈ z − K ⇔ x − z ≼_K 0. Points need not belong to K to be comparable; e.g., all points greater than w belong to w + K.


then we find

    X^{†T} = [ Γ₃/(Γ₃ᵀΓ₁)   Γ₄/(Γ₄ᵀΓ₂) ]    (315)

Therefore,

    x = X X† x    (321)

is the biorthogonal expansion (313) (§E.0.1), and the biorthogonality condition (312) can be expressed succinctly (§E.1.1)^2.51

    X† X = I    (322)

Expansion w = X X† w for any w ∈ R^2 is unique if and only if the extreme directions of K are linearly independent; id est, iff X has no nullspace.

2.13.7.1 Pointed cones and biorthogonality

Biorthogonality condition X† X = I from Example 2.13.7.0.1 means Γ₁ and Γ₂ are linearly independent generators of K (§B.1.1.1); generators because every x ∈ K is their conic combination. From §2.10.2 we know that means Γ₁ and Γ₂ must be extreme directions of K.

A biorthogonal expansion is necessarily associated with a pointed closed convex cone; pointed, otherwise there can be no extreme directions (§2.8.1). We will address biorthogonal expansion with respect to a pointed polyhedral cone having empty interior in §2.13.8.

2.13.7.1.1 Example. Expansions implied by diagonalization. (confer §5.5.3.2.1)
When matrix X ∈ R^{M×M} is diagonalizable (§A.5),

    X = SΛS⁻¹ = [s₁ ⋯ s_M] Λ [w₁ᵀ ; … ; w_Mᵀ] = Σ_{i=1}^M λ_i s_i w_iᵀ    (1144)

coordinates for biorthogonal expansion are its eigenvalues λ_i (contained in diagonal matrix Λ) when expanded in S;

    X = S S⁻¹ X = [s₁ ⋯ s_M] [w₁ᵀX ; … ; w_MᵀX] = Σ_{i=1}^M λ_i s_i w_iᵀ    (316)

Footnote 2.51: Possibly confusing is the fact that formula X X† x is simultaneously the orthogonal projection of x on R(X) (1471), and a sum of nonorthogonal projections of x ∈ R(X) on the range of each and every column of full-rank X skinny-or-square (§E.5.0.0.2).


Coordinate values depend upon the geometric relationship of X to its linearly independent eigenmatrices s_i w_iᵀ. (§A.5.1, §B.1.1)

• Eigenmatrices s_i w_iᵀ are linearly independent dyads constituted by right and left eigenvectors of diagonalizable X and are generators of some pointed polyhedral cone K in a subspace of R^{M×M}.

When S is real and X belongs to that polyhedral cone K, for example, then coordinates of expansion (the eigenvalues λ_i) must be nonnegative.

When X = QΛQᵀ is symmetric, coordinates for biorthogonal expansion are its eigenvalues when expanded in Q; id est, for X ∈ S^M

    X = Q QᵀX = Σ_{i=1}^M q_i q_iᵀ X = Σ_{i=1}^M λ_i q_i q_iᵀ ∈ S^M    (317)

becomes an orthogonal expansion with orthonormality condition QᵀQ = I, where λ_i is the i-th eigenvalue of X, q_i is the corresponding i-th eigenvector arranged columnar in orthogonal matrix

    Q = [q₁ q₂ ⋯ q_M] ∈ R^{M×M}    (318)

and where eigenmatrix q_i q_iᵀ is an extreme direction of some pointed polyhedral cone K ⊂ S^M and an extreme direction of the positive semidefinite cone S^M_+.

• Orthogonal expansion is a special case of biorthogonal expansion of X ∈ aff K occurring when polyhedral cone K is any rotation about the origin of an orthant belonging to a subspace.

Similarly, when X = QΛQᵀ belongs to the positive semidefinite cone in the subspace of symmetric matrices, coordinates for orthogonal expansion must be its nonnegative eigenvalues (1057) when expanded in Q; id est, for X ∈ S^M_+

    X = Q QᵀX = Σ_{i=1}^M q_i q_iᵀ X = Σ_{i=1}^M λ_i q_i q_iᵀ ∈ S^M_+    (319)

where λ_i ≥ 0 is the i-th eigenvalue of X. This means X simultaneously belongs to the positive semidefinite cone and to the pointed polyhedral cone K formed by the conic hull of its eigenmatrices.
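Both flavors of expansion above are easy to observe numerically. A minimal sketch, assuming NumPy and the hypothetical extreme directions below (compare Example 2.13.7.0.1): the columns of X^{†T} supply the dual directions, and coordinates of x in the cone are their inner products with x.

```python
# Biorthogonal expansion (313)/(321) for a simplicial cone in R^2.
import numpy as np

X = np.array([[2.0, 1.0],
              [1.0, 3.0]])                    # Gamma_1, Gamma_2 columnar
Xd = np.linalg.pinv(X).T                      # dual directions, as in (315)

print(np.allclose(np.linalg.pinv(X) @ X, np.eye(2)))   # biorthogonality (322)
print(np.isclose(Xd[:, 0] @ X[:, 1], 0.0))             # Gamma*_1 orthogonal to Gamma_2

x = X @ np.array([0.7, 0.2])                  # x in K: nonnegative coordinates
coords = Xd.T @ x                             # Gamma*_i' x
print(coords)                                 # [0.7 0.2], nonnegative since x in K
print(np.allclose(X @ coords, x))             # expansion reconstructs x
```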


2.13.7.1.2 Example. Expansion respecting nonpositive orthant.
Suppose x ∈ K, any orthant in R^n.^2.52 Then coordinates for biorthogonal expansion of x must be nonnegative; in fact, absolute value of the Cartesian coordinates.

Suppose, in particular, x belongs to the nonpositive orthant K = R^n_−. Then the biorthogonal expansion becomes an orthogonal expansion

    x = X Xᵀx = Σ_{i=1}^n −e_i(−e_iᵀx) = Σ_{i=1}^n −e_i |e_iᵀx| ∈ R^n_−    (320)

and the coordinates of expansion are nonnegative. For this orthant K we have orthonormality condition XᵀX = I where X = −I, e_i ∈ R^n is a standard basis vector, and −e_i is an extreme direction (§2.8.1) of K.

Of course, this expansion x = X Xᵀx applies more broadly to domain R^n, but then the coordinates each belong to all of R.

2.13.8 Biorthogonal expansion, derivation

Biorthogonal expansion is a means for determining coordinates in a pointed conic coordinate system characterized by a nonorthogonal basis. Study of nonorthogonal bases invokes pointed polyhedral cones and their duals; extreme directions of a cone K are assumed to constitute the basis while those of the dual cone K^* determine coordinates.

Unique biorthogonal expansion with respect to K depends upon existence of its linearly independent extreme directions: Polyhedral cone K must be pointed; then it possesses extreme directions. Those extreme directions must be linearly independent to uniquely represent any point in their span.

We consider nonempty pointed polyhedral cone K having possibly empty interior; id est, we consider a basis spanning a subspace. Then we need only observe that section of dual cone K^* in the affine hull of K because, by expansion of x, membership x ∈ aff K is implicit and because any breach of the ordinary dual cone into ambient space becomes irrelevant (§2.13.9.3). Biorthogonal expansion

    x = X X† x ∈ aff K = aff cone(X)    (321)

is expressed in the extreme directions {Γ_i} of K arranged columnar in

    X = [Γ₁ Γ₂ ⋯ Γ_N] ∈ R^{n×N}    (228)

Footnote 2.52: An orthant is simplicial and self-dual.


under assumption of biorthogonality

    X† X = I    (322)

We therefore seek, in this section, a vertex-description for K^* ∩ aff K in terms of linearly independent dual generators {Γ*_i} ⊂ aff K in the same finite quantity^2.53 as the extreme directions {Γ_i} of

    K = cone(X) = {Xa | a ≽ 0} ⊆ R^n    (240)

We assume the quantity of extreme directions N does not exceed the dimension n of ambient vector space because, otherwise, the expansion could not be unique; id est, assume N linearly independent extreme directions hence N ≤ n (X skinny^2.54-or-square full-rank). In other words, fat full-rank matrix X is prohibited by uniqueness because of the existence of an infinity of right-inverses;

• polyhedral cones whose extreme directions number in excess of the ambient space dimension are precluded in biorthogonal expansion.

2.13.8.1 x ∈ K

Suppose x belongs to K ⊆ R^n. Then x = Xa for some a ≽ 0. Vector a is unique only when {Γ_i} is a linearly independent set.^2.55 Vector a ∈ R^N can take the form a = Bx if R(B) = R^N. Then we require Xa = XBx = x and Bx = BXa = a. The pseudoinverse B = X† ∈ R^{N×n} (§E) is suitable when X is skinny-or-square and full-rank. In that case rank X = N, and for all c ≽ 0 and i = 1…N

    a ≽ 0 ⇔ X†Xa ≽ 0 ⇔ aᵀXᵀX^{†T}c ≥ 0 ⇔ Γ_iᵀX^{†T}c ≥ 0    (323)

Footnote 2.53: When K is contained in a proper subspace of R^n, the ordinary dual cone K^* will have more generators in any minimal set than K has extreme directions.
Footnote 2.54: "Skinny" meaning thin; more rows than columns.
Footnote 2.55: Conic independence alone (§2.10) is insufficient to guarantee uniqueness.


The penultimate inequality follows from the generalized inequality and membership corollary, while the last inequality is a consequence of that corollary's discretization (§2.13.4.2.1).^2.56 From (323) and (311) we deduce

    K^* ∩ aff K = cone(X^{†T}) = {X^{†T}c | c ≽ 0} ⊆ R^n    (324)

is the vertex-description for that section of K^* in the affine hull of K because R(X^{†T}) = R(X) by definition of the pseudoinverse. From (253), we know K^* ∩ aff K must be pointed if rel int K is logically assumed nonempty with respect to aff K.

Conversely, suppose full-rank skinny-or-square matrix

    X^{†T} ≜ [Γ*₁ Γ*₂ ⋯ Γ*_N] ∈ R^{n×N}    (325)

comprises the extreme directions {Γ*_i} ⊂ aff K of the dual-cone intersection with the affine hull of K.^2.57 From the discrete membership theorem and (257) we get a partial dual to (311); id est, assuming x ∈ aff cone X

    x ∈ K ⇔ γ*ᵀx ≥ 0 for all γ* ∈ {Γ*_i, i = 1…N} ⊂ ∂K^* ∩ aff K    (326)
          ⇔ X†x ≽ 0    (327)

that leads to a partial halfspace-description,

    K = { x ∈ aff cone X | X†x ≽ 0 }    (328)

For γ* = X^{†T}e_i, any x = Xa, and for all i, we have e_iᵀX†Xa = e_iᵀa ≥ 0 only when a ≽ 0. Hence x ∈ K.

Footnote 2.56: a ≽ 0 ⇔ aᵀXᵀX^{†T}c ≥ 0 ∀(c ≽ 0 ⇔ aᵀXᵀX^{†T}c ≥ 0 ∀a ≽ 0) ∀(c ≽ 0 ⇔ Γ_iᵀX^{†T}c ≥ 0 ∀i). Intuitively, any nonnegative vector a is a conic combination of the standard basis {e_i ∈ R^N}; a ≽ 0 ⇔ a_i e_i ≽ 0 for all i. The last inequality in (323) is a consequence of the fact that x = Xa may be any extreme direction of K, in which case a is a standard basis vector; a = e_i ≽ 0. Theoretically, because c ≽ 0 defines a pointed polyhedral cone (in fact, the nonnegative orthant in R^N), we can take (323) one step further by discretizing c:

    a ≽ 0 ⇔ Γ_iᵀΓ*_j ≥ 0 for i, j = 1…N ⇔ X†X ≥ 0

In words, X†X must be a matrix whose entries are each nonnegative.
Footnote 2.57: When closed convex cone K has empty interior, K^* has no extreme directions.


When X is full-rank, then the unique biorthogonal expansion of x ∈ K becomes (321)

    x = X X† x = Σ_{i=1}^N Γ_i Γ*_iᵀ x    (329)

whose coordinates Γ*_iᵀx must be nonnegative because K is assumed pointed closed and convex. Whenever X is full-rank, so is its pseudoinverse X†. (§E) In the present case, the columns of X^{†T} are linearly independent and generators of the dual cone K^* ∩ aff K; hence, the columns constitute its extreme directions. (§2.10) That section of the dual cone is itself a polyhedral cone (by (234) or the cone intersection theorem, §2.7.2.1.1) having the same number of extreme directions as K.

2.13.8.2 x ∈ aff K

The extreme directions of K and K^* ∩ aff K have a distinct relationship; because X†X = I, then for i, j = 1…N, Γ_iᵀΓ*_i = 1, while for i ≠ j, Γ_iᵀΓ*_j = 0. Yet neither set of extreme directions, {Γ_i} nor {Γ*_i}, is necessarily orthogonal. This is, precisely, a biorthogonality condition [238, §2.2.4] [125] implying each set of extreme directions is linearly independent. (§B.1.1.1)

The biorthogonal expansion therefore applies more broadly; meaning, for any x ∈ aff K, vector x can be uniquely expressed x = Xb where b ∈ R^N because aff K contains the origin. Thus, for any such x ∈ R(X) (confer §E.1.1), biorthogonal expansion (329) becomes x = X X† Xb = Xb.

2.13.9 Formulae, algorithm finding dual cone

2.13.9.1 Pointed K, dual, X skinny-or-square full-rank

We wish to derive expressions for a convex cone and its ordinary dual under the general assumptions: pointed polyhedral K denoted by its linearly independent extreme directions arranged columnar in matrix X such that

    rank(X ∈ R^{n×N}) = N ≜ dim aff K ≤ n    (330)

The vertex-description is given:

    K = {Xa | a ≽ 0} ⊆ R^n    (331)


from which a halfspace-description for the dual cone follows directly:

    K^* = {y ∈ R^n | Xᵀy ≽ 0}    (332)

By defining a matrix

    X^⊥ ≜ basis N(Xᵀ)    (333)

(a columnar basis for the orthogonal complement of R(X)), we can say

    aff cone X = aff K = {x | X^{⊥T}x = 0}    (334)

meaning K lies in a subspace, perhaps R^n. Thus we have a halfspace-description

    K = {x ∈ R^n | X†x ≽ 0, X^{⊥T}x = 0}    (335)

and from (257), a vertex-description^2.58

    K^* = { [X^{†T}  X^⊥  −X^⊥] b | b ≽ 0 } ⊆ R^n    (336)

These results are summarized for a pointed polyhedral cone, having linearly independent generators, and its ordinary dual:

    Cone Table 1             K               K^*
    vertex-description       X               X^{†T}, ±X^⊥
    halfspace-description    X†, X^{⊥T}      Xᵀ

2.13.9.2 Simplicial case

When a convex cone is simplicial (§2.12.3), Cone Table 1 simplifies because then aff cone X = R^n: For square X and assuming simplicial K such that

    rank(X ∈ R^{n×N}) = N ≜ dim aff K = n    (337)

we have

    Cone Table S             K       K^*
    vertex-description       X       X^{†T}
    halfspace-description    X†      Xᵀ

Footnote 2.58: These descriptions are not unique. A vertex-description of the dual cone, for example, might use four conically independent generators for a plane (§2.10.0.0.1) when only three would suffice.


For example, vertex-description (336) simplifies to

    K^* = {X^{†T}b | b ≽ 0} ⊂ R^n    (338)

Now, because dim R(X) = dim R(X^{†T}), (§E) the dual cone K^* is simplicial whenever K is.

2.13.9.3 Membership relations in a subspace

It is obvious by definition (246) of the ordinary dual cone K^*, in ambient vector space R, that its determination instead in subspace M ⊆ R is identical to its intersection with M; id est, assuming closed convex cone K ⊆ M and K^* ⊆ R,

    (K^* in ambient M) ≡ (K^* in ambient R) ∩ M    (339)

because

    {y ∈ M | ⟨y, x⟩ ≥ 0 for all x ∈ K} = {y ∈ R | ⟨y, x⟩ ≥ 0 for all x ∈ K} ∩ M    (340)

From this, a constrained membership relation for the ordinary dual cone K^* ⊆ R, assuming x, y ∈ M and closed convex cone K ⊆ M:

    y ∈ K^* ∩ M ⇔ ⟨y, x⟩ ≥ 0 for all x ∈ K    (341)

By closure in subspace M we have conjugation (§2.13.1.1):

    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K^* ∩ M    (342)

This means membership determination in subspace M requires knowledge of the dual cone only in M. For sake of completeness, for proper cone K with respect to subspace M (confer (267)):

    x ∈ int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K^* ∩ M, y ≠ 0    (343)
    x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ int K^* ∩ M    (344)

(By conjugation, we also have the dual relations.) Yet when M equals aff K for K a closed convex cone:

    x ∈ rel int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K^* ∩ aff K, y ≠ 0    (345)
    x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ rel int(K^* ∩ aff K)    (346)
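Cone Table 1 translates directly into a few lines of linear algebra. A sketch assuming NumPy, using the SVD to supply the basis X^⊥ for N(Xᵀ), applied to a hypothetical ray in R^2:

```python
# Generators of the ordinary dual K* per Cone Table 1: columns of X^{+T}
# together with +/- a basis of N(X'), assuming X has full column rank.
import numpy as np

def dual_cone_generators(X):
    n, N = X.shape
    Xdt = np.linalg.pinv(X).T                    # X^{+T}
    U, s, _ = np.linalg.svd(X, full_matrices=True)
    Xperp = U[:, N:]                             # basis for N(X') = R(X)^perp
    return np.hstack([Xdt, Xperp, -Xperp])

# K = cone{(1,1)} is a ray; K* should be the halfspace {y | y_1 + y_2 >= 0}
G = dual_cone_generators(np.array([[1.0], [1.0]]))
print(G)   # three columns: ~(0.5,0.5) and +/-(1,-1)/sqrt(2), generating it
```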


while

    K*_M = {y ∈ R^n | X*†y ≽ 0, X*^{⊥T}y = 0}    (359)

is the dual monotone cone halfspace-description.

2.13.9.5 More pointed cone descriptions with equality condition

Consider pointed polyhedral cone K having a linearly independent set of generators and whose subspace membership is explicit; id est, we are given the ordinary halfspace-description

    K = {x | Ax ≽ 0, Cx = 0} ⊆ R^n    (234a)

where A ∈ R^{m×n} and C ∈ R^{p×n}. This can be equivalently written in terms of nullspace of C and vector ξ:

    K = {Zξ ∈ R^n | AZξ ≽ 0}    (360)

where R(Z ∈ R^{n×(n−rank C)}) ≜ N(C). Assuming (330) is satisfied,

    rank X ≜ rank((ÂZ)† ∈ R^{(n−rank C)×(m−l)}) = m − l = dim aff K ≤ n − rank C    (361)

where l is the number of conically dependent rows in AZ (§2.10) that must be removed to make ÂZ before the cone tables become applicable.^2.60 Then the results collected in the cone tables admit the assignment X̂ ≜ (ÂZ)† ∈ R^{(n−rank C)×(m−l)}, where Â ∈ R^{(m−l)×n}, followed with linear transformation by Z. So we get the vertex-description, for (ÂZ)† skinny-or-square full-rank,

    K = {Z(ÂZ)†b | b ≽ 0}    (362)

From this and (292) we get a halfspace-description of the dual cone

    K^* = {y ∈ R^n | (ZᵀÂᵀ)†Zᵀy ≽ 0}    (363)

From this and Cone Table 1 (p.175) we get a vertex-description, (1443)

    K^* = {[Z^{†T}(ÂZ)ᵀ  Cᵀ  −Cᵀ] c | c ≽ 0}    (364)

Footnote 2.60: When the conically dependent rows are removed, the rows remaining must be linearly independent for the cone tables to apply.


2.13.10.0.2 Example. Optimality conditions for conic problem.
Consider a convex optimization problem having real differentiable convex objective function f̂(x) : R^n → R defined on domain R^n :

    minimize_x   f̂(x)
    subject to   x ∈ K    (370)

The feasible set is a pointed polyhedral cone K possessing a linearly independent set of generators and whose subspace membership is made explicit by fat full-rank matrix C ∈ R^{p×n} ; id est, we are given the halfspace-description

    K = {x | Ax ≽ 0 , Cx = 0} ⊆ R^n    (234a)

where A ∈ R^{m×n}. The vertex-description of this cone, assuming (ÂZ)^† skinny-or-square full-rank, is

    K = {Z(ÂZ)^† b | b ≽ 0}    (362)

where  ∈ R^{m−ℓ×n}, ℓ is the number of conically dependent rows in AZ (§2.10) that must be removed, and Z ∈ R^{n×n−rank C} holds basis N(C) columnar. From optimality condition (285),

    ∇f̂(x*)^T (Z(ÂZ)^† b − x*) ≥ 0  ∀ b ≽ 0    (371)

    −∇f̂(x*)^T Z(ÂZ)^† (b − b*) ≤ 0  ∀ b ≽ 0    (372)

because

    x* ≜ Z(ÂZ)^† b* ∈ K    (373)

From membership relation (368) and Example 2.13.10.0.1,

    ⟨−(Z^T Â^T)^† Z^T ∇f̂(x*) , b − b*⟩ ≤ 0 for all b ∈ R^{m−ℓ}_+
    ⇔
    −(Z^T Â^T)^† Z^T ∇f̂(x*) ∈ −R^{m−ℓ}_+ ∩ b*^⊥    (374)

Then the equivalent necessary and sufficient conditions for optimality of the conic program (370) with pointed polyhedral feasible set K are: (confer (291))

    (Z^T Â^T)^† Z^T ∇f̂(x*) ≽_{R^{m−ℓ}_+} 0 ,  b* ≽_{R^{m−ℓ}_+} 0 ,  ∇f̂(x*)^T Z(ÂZ)^† b* = 0    (375)


When K = R^n_+ in particular, then C = 0 and A = Z = I ∈ S^n ; id est,

    minimize_x   f̂(x)
    subject to   x ≽_{R^n_+} 0    (376)

The necessary and sufficient conditions become (confer [38, §4.2.3])

    ∇f̂(x*) ≽_{R^n_+} 0 ,  x* ≽_{R^n_+} 0 ,  ∇f̂(x*)^T x* = 0    (377)

2.13.10.0.3 Example. Linear complementarity. [171] [198]
Given matrix A ∈ R^{n×n} and vector q ∈ R^n, the complementarity problem is a feasibility problem:

    find        w , z
    subject to  w ≽ 0
                z ≽ 0
                w^T z = 0
                w = q + Az    (378)

Volumes have been written about this problem, most notably by Cottle [48]. The problem is not convex if both vectors w and z are variable. But if one of them is fixed, then the problem becomes convex with a very simple geometric interpretation: Define the affine subset

    A ≜ {y ∈ R^n | Ay = w − q}    (379)

For w^T z to vanish, there must be a complementary relationship between the nonzero entries of vectors w and z ; id est, w_i z_i = 0 ∀ i. Given w ≽ 0, then z belongs to the convex set of feasible solutions:

    z ∈ −K^⊥_{R^n_+}(w ∈ R^n_+) ∩ A = R^n_+ ∩ w^⊥ ∩ A    (380)

where K^⊥_{R^n_+}(w) is the normal cone to R^n_+ at w (369). If this intersection is nonempty, then the problem is solvable.
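Conditions (377) are easy to exercise numerically. In this small Matlab sketch (an illustration assumed here, not taken from the book), the objective is f̂(x) = ½‖x − a‖², whose minimizer over the nonnegative orthant is the projection x* = max(a, 0):

    % numeric check of optimality conditions (377) for
    % minimize 0.5*||x - a||^2 subject to x >= 0
    a    = randn(5,1);
    xs   = max(a,0);          % known minimizer: projection onto R^n_+
    grad = xs - a;            % gradient of the objective at x*
    all(grad >= 0)            % first condition of (377)
    all(xs >= 0)              % second condition
    abs(grad'*xs) < 1e-12     % complementarity, third condition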


2.13.11 Proper nonsimplicial K , dual, X fat full-rank

Assume we are given a set of N conically independent generators^2.61 (§2.10) of an arbitrary polyhedral proper cone K in R^n arranged columnar in X ∈ R^{n×N} such that N > n (fat) and rank X = n . Having found formula (338) to determine the dual of a simplicial cone, the easiest way to find a vertex-description of the proper dual cone K^* is to first decompose K into simplicial parts K_i so that K = ⋃ K_i .^2.62 Each component simplicial cone in K corresponds to some subset of n linearly independent columns from X . The key idea, here, is how the extreme directions of the simplicial parts must remain extreme directions of K . Finding the dual of K amounts to finding the dual of each simplicial part:

2.13.11.0.1 Theorem. Dual cone intersection. [210, §2.7]
Suppose proper cone K ⊂ R^n equals the union of M simplicial cones K_i whose extreme directions all coincide with those of K . Then proper dual cone K^* is the intersection of M dual simplicial cones K^*_i ; id est,

    K = ⋃_{i=1…M} K_i  ⇒  K^* = ⋂_{i=1…M} K^*_i    (381)
⋄

Proof. For X_i ∈ R^{n×n}, a complete matrix of linearly independent extreme directions (p.133) arranged columnar, corresponding simplicial K_i (§2.12.3.1.1) has vertex-description

    K_i = {X_i c | c ≽ 0}    (382)

Now suppose,

    K = ⋃_{i=1…M} K_i = ⋃_{i=1…M} {X_i c | c ≽ 0}    (383)

2.61 We can always remove conically dependent columns from X to construct K or to determine K^* . (§G.2)
2.62 That proposition presupposes, of course, that we know how to perform simplicial decomposition efficiently; also called "triangulation". [190] [100, §3.1] [101, §3.1] Existence of multiple simplicial parts means expansion of x ∈ K like (329) can no longer be unique because N , the number of extreme directions in K , exceeds n , the dimension of the space.


The union of all K_i can be equivalently expressed

    K = { [X_1 X_2 · · · X_M] [a ; b ; ⋮ ; c] | a, b, … , c ≽ 0 }    (384)

Because extreme directions of the simplices K_i are extreme directions of K by assumption, then by the extremes theorem (§2.8.1.1.1),

    K = { [X_1 X_2 · · · X_M] d | d ≽ 0 }    (385)

Defining X ≜ [X_1 X_2 · · · X_M] (with any redundant [sic] columns optionally removed from X), then K^* can be expressed, (292) (Cone Table S, p.175)

    K^* = {y | X^T y ≽ 0} = ⋂_{i=1…M} {y | X_i^T y ≽ 0} = ⋂_{i=1…M} K^*_i    (386)

To find the extreme directions of the dual cone, first we observe that some facets of each simplicial part K_i are common to facets of K by assumption, and the union of all those common facets comprises the set of all facets of K by design. For any particular polyhedral proper cone K , the extreme directions of dual cone K^* are respectively orthogonal to the facets of K . (§2.13.6.1) Then the extreme directions of the dual cone can be found among the inward-normals to facets of the component simplicial cones K_i ; those normals are extreme directions of the dual simplicial cones K^*_i . From the theorem and Cone Table S (p.175),

    K^* = ⋂_{i=1…M} K^*_i = ⋂_{i=1…M} {X_i^{†T} c | c ≽ 0}    (387)

The set of extreme directions {Γ^*_i} for proper dual cone K^* is therefore constituted by the conically independent generators, from the columns of all the dual simplicial matrices {X_i^{†T}}, that do not violate discrete definition (292) of K^* ;

    {Γ^*_1 , Γ^*_2 … Γ^*_N} = c.i. {X_i^{†T}(:, j) , i = 1…M , j = 1…n | X_i^†(j, :) Γ_ℓ ≥ 0 , ℓ = 1…N}    (388)


where c.i. denotes selection of only the conically independent vectors from the argument set, argument (:, j) denotes the j th column while (j, :) denotes the j th row, and {Γ_ℓ} constitutes the extreme directions of K . Figure 36(b) (p.132) shows a cone and its dual found via this formulation.

2.13.11.0.2 Example. Dual of K nonsimplicial in subspace aff K.
Given conically independent generators for pointed closed convex cone K in R^4 arranged columnar in

    X = [Γ_1 Γ_2 Γ_3 Γ_4] = [ 1  1  0  0
                              −1  0  1  0
                               0 −1  0  1
                               0  0 −1 −1 ]    (389)

having dim aff K = rank X = 3, then performing the most inefficient simplicial decomposition in aff K we find

    X_1 = [ 1  1  0       X_2 = [ 1  1  0
           −1  0  1              −1  0  0
            0 −1  0               0 −1  1
            0  0 −1 ]             0  0 −1 ]

    X_3 = [ 1  0  0       X_4 = [ 1  0  0
           −1  1  0               0  1  0
            0  0  1              −1  0  1
            0 −1 −1 ]             0 −1 −1 ]    (390)

The corresponding dual simplicial cones in aff K have generators respectively columnar in

    4X_1^{†T} = [ 2  1  1       4X_2^{†T} = [ 1  2  1
                 −2  1  1                    −3  2  1
                  2 −3  1                     1 −2  1
                 −2  1 −3 ]                   1 −2 −3 ]

    4X_3^{†T} = [ 3  2 −1       4X_4^{†T} = [ 3 −1  2
                 −1  2 −1                    −1  3 −2
                 −1 −2  3                    −1 −1  2
                 −1 −2 −1 ]                  −1 −1 −2 ]    (391)


Applying (388) we get

    [Γ^*_1 Γ^*_2 Γ^*_3 Γ^*_4] = (1/4) [ 1  2  3  2
                                        1  2 −1 −2
                                        1 −2 −1  2
                                       −3 −2 −1 −2 ]    (392)

whose rank is 3, and is the known result;^2.63 the conically independent generators for that pointed section of the dual cone K^* in aff K ; id est, K^* ∩ aff K .

2.63 These calculations proceed so as to be consistent with [62, §6]; as if the ambient vector space were the proper subspace aff K whose dimension is 3. In that ambient space, K may be regarded as a proper cone. Yet that author (from the citation) erroneously states the dimension of the ordinary dual cone to be 3 ; it is, in fact, 4.
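Formula (388) is mechanical enough to check by machine. The following Matlab sketch (hypothetical, not from the book) rebuilds the dual simplicial matrices of (391) with pinv and filters their columns by the discrete membership test X^T y ≽ 0 ; deduplication of repeated generators is omitted for brevity:

    X = [1 1 0 0; -1 0 1 0; 0 -1 0 1; 0 0 -1 -1];    % generators (389)
    cols = {[1 2 3],[1 2 4],[1 3 4],[2 3 4]};        % decomposition (390)
    cand = [];
    for i = 1:4
        cand = [cand, pinv(X(:,cols{i}))'];          % dual generators (391)
    end
    keep = all(X'*cand >= -1e-9, 1);                 % discrete test behind (388)
    disp(4*cand(:,keep))   % surviving columns reproduce (392), with repeats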




Chapter 3

Geometry of convex functions

    The link between convex sets and convex functions is via the epigraph: A function is convex if and only if its epigraph is a convex set.
    −Stephen Boyd & Lieven Vandenberghe [38, §3.1.7]

We limit our treatment of multidimensional functions^3.1 to finite-dimensional Euclidean space. Then the icon for the one-dimensional (real) convex function is bowl-shaped (Figure 54), whereas the concave icon is the inverted bowl; respectively characterized by a unique global minimum and maximum whose existence is assumed. Because of this simple relationship, usage of the term convexity is often implicitly inclusive of concavity in the literature. Despite the iconic imagery, the reader is reminded that the set of all convex, concave, quasiconvex, and quasiconcave functions contains the monotone [126] [133, §2.3.5] functions; e.g., [38, §3.6, exer. 3.46].

3.1 vector- or matrix-valued functions including the real functions. Appendix D, with its tables of first- and second-order gradients, is the practical adjunct to this chapter.


3.1 Convex function

3.1.1 Vector-valued function

Vector-valued function f(X) : R^{p×k} → R^M assigns each X in its domain dom f (a subset of ambient vector space R^{p×k}) to a specific element [160, p.3] of its range (a subset of R^M). Function f(X) is linear in X on its domain if and only if, for each and every Y, Z ∈ dom f and α, β ∈ R

    f(αY + βZ) = αf(Y) + βf(Z)    (393)

A vector-valued function f(X) : R^{p×k} → R^M is convex in X if and only if dom f is a convex set and, for each and every Y, Z ∈ dom f and 0 ≤ µ ≤ 1,

    f(µY + (1 − µ)Z) ≼_{R^M_+} µf(Y) + (1 − µ)f(Z)    (394)

As defined, continuity is implied but not differentiability. Reversing the sense of the inequality flips this definition to concavity. Linear functions are, apparently, simultaneously convex and concave.

Vector-valued functions are most often compared (145) as in (394) with respect to the M-dimensional self-dual nonnegative orthant R^M_+ , a proper cone.^3.2 In this case, the test prescribed by (394) is simply a comparison on R of each entry of the vector-valued function. (§2.13.4.2.2) The vector-valued function case is therefore a straightforward generalization of conventional convexity theory for a real function. [38, §3, §4] This conclusion follows from theory of generalized inequality (§2.13.2.0.1) which asserts

    f convex ⇔ w^T f convex ∀ w ∈ G(R^M_+)    (395)

shown by substitution of the defining inequality (394). Discretization (§2.13.4.2.1) allows relaxation of the semi-infinite number of conditions ∀ w ≽ 0 to:

    ∀ w ∈ G(R^M_+) = {e_i , i = 1…M}    (396)

(the standard basis for R^M and a minimal set of generators (§2.8.1.2) for R^M_+) from which the stated conclusion follows; id est, the test for convexity of a vector-valued function is a comparison on R of each entry.

3.2 The definition of convexity can be broadened to other (not necessarily proper) cones; referred to in the literature as K-convexity. [185]
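Because comparison with respect to R^M_+ is entrywise, definition (394) can be spot-checked numerically one coordinate at a time. A small Matlab illustration (assumed here for exposition) with f(x) = [x² ; |x|], both entries convex on R :

    % random spot-check of definition (394) for f(x) = [x^2; |x|]
    f  = @(x) [x.^2; abs(x)];
    Y  = randn;  Z = randn;  mu = rand;
    lhs = f(mu*Y + (1-mu)*Z);
    rhs = mu*f(Y) + (1-mu)*f(Z);
    all(lhs <= rhs)      % entrywise comparison realizes ≼ w.r.t. R^2_+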


Figure 50: Each convex real function has a unique minimizer x* but, for x ∈ R , (a) f̂_1(x) = x² is strictly convex whereas (b) f̂_2(x) = √(x²) = |x| is not. Strict convexity of a real function is therefore only a sufficient condition for minimizer uniqueness.

Relation (395) implies the set of all vector-valued convex functions in R^M is a convex cone. (Proof pending.) Indeed, any nonnegatively weighted sum of (strictly) convex functions remains (strictly) convex.^3.3 Interior to the cone would be the strictly convex functions.

3.1.1.1 Strict convexity

When f(X) instead satisfies, for each and every distinct Y and Z in dom f and all 0 < µ < 1,

    f(µY + (1 − µ)Z) ≺_{R^M_+} µf(Y) + (1 − µ)f(Z)    (397)

then f is strictly convex.


Any convex real function f̂(X) has unique minimum value over any convex subset of its domain. Yet solution to some convex optimization problem is, in general, not unique; e.g., given a minimization of a convex real function over some abstracted convex set C

    minimize_X   f̂(X)
    subject to   X ∈ C    (399)

any optimal solution X* comes from a convex set of optimal solutions

    X* ∈ {X | f̂(X) = inf_{Y ∈ C} f̂(Y)}    (400)

But a strictly convex real function has a unique minimizer X* ; id est, for the optimal solution set in (400) to be a single point, it is sufficient (Figure 50) that f̂(X) be a strictly convex real function and set C convex.

It is customary to consider only a real function for the objective of a convex optimization problem because vector- or matrix-valued functions can introduce ambiguity into the optimal value of the objective. (§2.7.2.2)

Quadratic real functions x^T P x + q^T x + r characterized by a symmetric positive definite matrix P are strictly convex. The vector 2-norm squared ‖x‖² (Euclidean norm squared) and Frobenius norm squared ‖X‖²_F , for example, are strictly convex functions of their respective argument (each norm is convex but not strictly convex). Figure 50(a) illustrates a strictly convex real function.

3.1.1.2 Affine function

A function f(X) is affine when it is continuous and has the dimensionally extensible form (confer §2.9.1.0.2)

    f(X) = AX + B    (401)

When B = 0 then f(X) is a linear function. All affine functions are simultaneously convex and concave.

The real affine function in Figure 51 illustrates hyperplanes in its domain constituting contours of equal function-value (level sets).


Variegated multidimensional affine functions are recognized by the existence of no multivariate terms in argument entries and no polynomial terms in argument entries of degree higher than 1 ; id est, entries of the function are characterized only by linear combinations of the argument entries plus constants.

For X ∈ S^M and matrices A, B, Q, R of any compatible dimensions, for example, the expression XAX is not affine in X whereas

    g(X) = [ R        B^T X
             X B      Q + A^T X + XA ]    (402)

is an affine multidimensional function. Such a function is typical in engineering control. [256, §2.2]^3.4 [37] [81]

3.1.1.2.1 Example. Linear objective.
Consider minimization of a real affine function f̂(z) = a^T z + b over the convex feasible set C in its domain R² illustrated in Figure 51. Since vector b is fixed, the problem posed is the same as the convex optimization

    minimize_z   a^T z
    subject to   z ∈ C    (403)

whose objective of minimization is a real linear function. Were convex set C polyhedral (§2.12), then this problem would be called a linear program. Were C a positive semidefinite cone, then this problem would be called a semidefinite program.

There are two distinct ways to visualize this problem: one in the objective function's domain R² , the other including the ambient space of the objective function's range as in [R² ; R]. Both visualizations are illustrated in Figure 51. Visualization in the function domain is easier because of lower dimension and because level sets of any affine function are affine (§2.1.9). In this circumstance, the level sets are parallel hyperplanes with respect to R² . One solves optimization problem (403) graphically by finding that hyperplane intersecting feasible set C furthest right (in the direction of negative gradient (§3.1.1.4)).

3.4 The interpretation from this citation of {X ∈ S^M | g(X) ≽ 0} as "an intersection between a linear subspace and the cone of positive semidefinite matrices" is incorrect. (See §2.9.1.0.2 for a similar example.) The conditions they state under which strong duality holds for semidefinite programming are conservative. (confer §6.2.3.0.1)
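Were C polyhedral, say C = {z | Gz ≼ h}, problem (403) is a linear program solvable in one call. A hypothetical Matlab sketch (assuming the Optimization Toolbox's linprog; the data are made up for illustration):

    % minimize a'z over a polyhedral set C = {z | G*z <= h}
    a = [-1; 0];                             % objective direction
    G = [1 1; -1 1; 0 -1];  h = [2; 2; 0];   % illustrative constraints
    z = linprog(a, G, h)                     % optimal vertex of C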


Figure 51: Drawn (truncated) are cartesian axes in R³ and three hyperplanes {z ∈ R² | a^T z = κ_1}, {z ∈ R² | a^T z = κ_2}, {z ∈ R² | a^T z = κ_3} intersecting convex set C ⊂ R² reproduced from Figure 18. Plotted with third dimension is affine set A = f̂(R²) a plane. Sequence of hyperplanes, w.r.t domain R² of an affine function f̂(z) = a^T z + b : R² → R , is increasing in direction of gradient a (§3.1.1.4.1) because affine function increases in normal direction (Figure 16).


Figure 52: Pointwise supremum of convex functions remains a convex function. Illustrated is a supremum sup_i (a_p^T z_i + b_i) of affine functions {a^T z_i + b_i | a ∈ R}, i = 1…5, in variable a evaluated for a particular argument a_p . Topmost affine function is supremum for each value of a .


3.1.1.2.2 Example. Support function. [38, §3.2]
For arbitrary set Y ⊆ R^n, its support function σ_Y(a) : R^n → R is defined

    σ_Y(a) ≜ sup_{z ∈ Y} a^T z    (404)

whose range contains ±∞ . [154, p.135] For each z ∈ Y , a^T z is a linear function of vector a . Because σ_Y(a) is the pointwise supremum of linear functions, it is convex in a . (Figure 52) Application of the support function is illustrated in Figure 19(a) for one particular normal a .

3.1.1.3 Epigraph, sublevel set

It is well established that a continuous real function is convex if and only if its epigraph makes a convex set. [123] [194] [231] [242] [154] Thereby, piecewise-continuous convex functions are admitted. Epigraph is the connection between convex sets and convex functions. Its generalization to a vector-valued function f(X) : R^{p×k} → R^M is straightforward: [185]

    epi f ≜ {(X, t) | X ∈ dom f , f(X) ≼_{R^M_+} t} ⊆ R^{p×k} × R^M    (405)

id est,

    f convex ⇔ epi f convex    (406)

Necessity is proven: [38, exer. 3.60] Given any (X, u), (Y, v) ∈ epi f , we must show for all µ ∈ [0, 1] that µ(X, u) + (1−µ)(Y, v) ∈ epi f ; id est, we must show

    f(µX + (1−µ)Y) ≼_{R^M_+} µu + (1−µ)v    (407)

Yet this holds by definition because f(µX + (1−µ)Y) ≼ µf(X) + (1−µ)f(Y). The converse also holds.

Sublevel sets of a real convex function are convex.


Likewise, sublevel sets

    L_ν f ≜ {X ∈ dom f | f(X) ≼_{R^M_+} ν} ⊆ R^{p×k}    (408)

of a vector-valued convex function are convex. As for real functions, the converse does not hold.

Figure 53: Gradient in R² evaluated on grid over some open disc in domain of convex quadratic bowl f̂(Y) = Y^T Y : R² → R illustrated in Figure 54. Circular contours are level sets; each defined by a constant function-value.

To prove necessity of convex sublevel sets: For any X, Y ∈ L_ν f we must show for each and every µ ∈ [0, 1] that µX + (1−µ)Y ∈ L_ν f . By definition,

    f(µX + (1−µ)Y) ≼_{R^M_+} µf(X) + (1−µ)f(Y) ≼_{R^M_+} ν    (409)

3.1.1.4 Gradient

The gradient ∇f (§D.1) of any differentiable multidimensional function f maps each entry f_i to a space having the same dimension as the ambient space of its domain. The gradient can be interpreted as a vector pointing in the direction of greatest change. [135, §15.6] The gradient can also be interpreted as that vector normal to a level set; e.g., Figure 55.
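Figure 53 is easy to reproduce: sample the gradient ∇f̂(Y) = 2Y of the quadratic bowl on a grid and overlay it on the circular level sets. A short Matlab sketch (plotting details assumed for illustration):

    % gradient field of f(Y) = Y'Y over a grid, as in Figure 53
    [Y1, Y2] = meshgrid(-2:0.25:2);
    F = Y1.^2 + Y2.^2;                 % function values
    contour(Y1, Y2, F), hold on        % circular level sets
    quiver(Y1, Y2, 2*Y1, 2*Y2)         % gradient 2Y, normal to each level set
    axis equal, hold off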


For the quadratic bowl in Figure 54, the gradient maps to R² ; illustrated in Figure 53. For a one-dimensional function of real variable, for example, the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. (confer §D.1.4.1)

3.1.1.4.1 Example. Hyperplane, line, described by affine function.
Consider the real affine function of vector variable,

    f̂(x) : R^p → R = a^T x + b    (410)

whose domain is R^p and whose gradient ∇f̂(x) = a is a constant vector (independent of x). This function describes the real line R (its range), and it describes a nonvertical [123, §B.1.2] hyperplane ∂H in the space R^p × R for any particular vector a (confer §2.4.2);

    ∂H = { [x ; a^T x + b] | x ∈ R^p } ⊂ R^p × R    (411)

having nonzero normal

    η = [a ; −1] ∈ R^p × R    (412)

This equivalence to a hyperplane holds only for real functions.^3.5 The epigraph of the real affine function f̂(x) is therefore a halfspace in [R^p ; R],

3.5 To prove that, consider a vector-valued affine function

    f(x) : R^p → R^M = Ax + b

having gradient ∇f(x) = A^T ∈ R^{p×M} : The affine set

    { [x ; Ax + b] | x ∈ R^p } ⊂ R^p × R^M

is perpendicular to

    η ≜ [∇f(x) ; −I] ∈ R^{p×M} × R^{M×M}

because

    η^T ( [x ; Ax + b] − [0 ; b] ) = 0  ∀ x ∈ R^p

Yet η is a vector (in R^p × R^M) only when M = 1.


so we have:

    The real affine function is to convex functions as the hyperplane is to convex sets.

Similarly, the matrix-valued affine function of real variable x , for any particular matrix A ∈ R^{M×N},

    h(x) : R → R^{M×N} = Ax + B    (413)

describes a line in R^{M×N} in direction A

    {Ax + B | x ∈ R} ⊆ R^{M×N}    (414)

and describes a line in R × R^{M×N}

    { [x ; Ax + B] | x ∈ R } ⊂ R × R^{M×N}    (415)

whose slope with respect to x is A .

3.1.1.5 First-order convexity condition, real function

Discretization of w ≽ 0 in (395) invites refocus to the real-valued function:

3.1.1.5.1 Theorem. Necessary and sufficient convexity condition.
[38, §3.1.3] [70, §I.5.2] [259, §1.2.3] [27, §1.2] [210, §4.2] [193, §3] For real differentiable function f̂(X) : R^{p×k} → R with matrix argument on open convex domain, the condition (§D.1.6)

    f̂(Y) ≥ f̂(X) + ⟨∇f̂(X) , Y − X⟩  for each and every X, Y ∈ dom f̂    (416)

is necessary and sufficient for convexity of f̂ . ⋄

When f̂(X) : R^p → R is a real differentiable convex function with vector argument on open convex domain, there is simplification of the first-order condition (416); for each and every X, Y ∈ dom f̂

    f̂(Y) ≥ f̂(X) + ∇f̂(X)^T (Y − X)    (417)
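Condition (417) is straightforward to exercise numerically; here is a minimal Matlab spot-check (assumed for illustration) using the convex function f̂(x) = x^T x with gradient 2x :

    % spot-check of first-order condition (417) for f(x) = x'*x
    f = @(x) x'*x;
    g = @(x) 2*x;                       % gradient
    X = randn(3,1);  Y = randn(3,1);
    f(Y) >= f(X) + g(X)'*(Y - X)        % holds for every pair X, Y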


Figure 54: When a real function f̂ is differentiable at each point in its open domain, there is an intuitive geometric interpretation of function convexity in terms of its gradient ∇f̂ and its epigraph: Drawn is a convex quadratic bowl in R² × R (confer Figure 96, p.505); f̂(Y) = Y^T Y : R² → R versus Y on some open disc in R². Supporting hyperplane ∂H_− ∈ R² × R (which is tangent, only partially drawn) and its normal vector [∇f̂(X)^T −1]^T at the particular point of support [X^T f̂(X)]^T are illustrated. The interpretation: At each and every coordinate Y , there is such a hyperplane containing [Y^T f̂(Y)]^T and supporting the epigraph.


From this we can find a unique [242, §5.5.4] nonvertical [123, §B.1.2] hyperplane ∂H_− (§2.4), expressed in terms of the function gradient, supporting epi f̂ at [X ; f̂(X)] : videlicet, defining f̂(Y ∉ dom f̂) ≜ ∞ [38, §3.1.7]

    [Y ; t] ∈ epi f̂ ⇔ t ≥ f̂(Y)  ⇒  [∇f̂(X)^T  −1] ([Y ; t] − [X ; f̂(X)]) ≤ 0    (418)

This means, for each and every point X in the domain of a real convex function f̂(X), there exists a hyperplane ∂H_− in R^p × R having normal [∇f̂(X) ; −1] supporting the function epigraph at [X ; f̂(X)] ∈ ∂H_−

    ∂H_− = { [Y ; t] ∈ R^p × R | [∇f̂(X)^T  −1] ([Y ; t] − [X ; f̂(X)]) = 0 }    (419)

One such supporting hyperplane (confer Figure 19(a)) is illustrated in Figure 54 for a convex quadratic.

From (417) we deduce, for each and every X, Y ∈ dom f̂

    ∇f̂(X)^T (Y − X) ≥ 0 ⇒ f̂(Y) ≥ f̂(X)    (420)

meaning, the gradient at X identifies a supporting hyperplane there in R^p

    ∂H_− ∩ R^p = {Y ∈ R^p | ∇f̂(X)^T (Y − X) = 0}    (421)

to the convex sublevel sets of convex function f̂ (confer (408))

    L_{f̂(X)} f̂ ≜ {Y ∈ dom f̂ | f̂(Y) ≤ f̂(X)} ⊆ R^p    (422)

illustrated for an arbitrary real convex function in Figure 55.


Figure 55: Shown is a plausible contour plot in R² of some arbitrary real convex function f̂(Z) at selected levels α , β , and γ (α ≥ β ≥ γ); contours of equal level f̂ (level sets) drawn in the function's domain, with level set {Z | f̂(Z) = α}, gradient ∇f̂(X), and hyperplane {Y | ∇f̂(X)^T (Y − X) = 0 , f̂(X) = α} labelled. A convex function has convex sublevel sets L_{f̂(X)} f̂ (422). [194, §4.6] The sublevel set whose boundary is the level set at α , for instance, comprises all the shaded regions. For any particular convex function, the family comprising all its sublevel sets is nested. [123, p.75] Were the sublevel sets not convex, we may certainly conclude the corresponding function is neither convex. Contour plots of real affine functions are illustrated in Figure 16 and Figure 51.


3.1.1.5.2 Theorem. Gradient monotonicity. [123, §B.4.1.4] [36, §3.1, exer. 20]
Given f̂(X) : R^{p×k} → R a real differentiable function with matrix argument on open convex domain, the condition

    ⟨∇f̂(Y) − ∇f̂(X) , Y − X⟩ ≥ 0  for each and every X, Y ∈ dom f̂    (423)

is necessary and sufficient for convexity of f̂ . Strict inequality and caveat distinct Y, X provide necessary and sufficient conditions for strict convexity. ⋄

3.1.1.6 First-order convexity condition, vector-valued function

Now consider the first-order necessary and sufficient condition for convexity of a vector-valued function: Differentiable function f(X) : R^{p×k} → R^M is convex if and only if dom f is open, convex, and for each and every X, Y ∈ dom f

    f(Y) ≽_{R^M_+} f(X) + df(X; Y−X) = f(X) + d/dt|_{t=0} f(X + t(Y − X))    (424)

where df(X; Y−X) denotes the directional derivative^3.6 [135] [213] of f at X in direction Y − X . This, of course, follows from the real-valued function case: by generalized inequality (§2.13.2.0.1),

    f(Y) − f(X) − df(X; Y−X) ≽_{R^M_+} 0 ⇔ ⟨f(Y) − f(X) − df(X; Y−X) , w⟩ ≥ 0 ∀ w ≽_{R^M_+} 0    (425)

where

    df(X; Y−X) = [ tr(∇f_1(X)^T (Y − X))
                   tr(∇f_2(X)^T (Y − X))
                   ⋮
                   tr(∇f_M(X)^T (Y − X)) ] ∈ R^M    (426)

3.6 We extend the traditional definition of directional derivative in §D.1.4 so that direction may be indicated by a vector or a matrix, thereby broadening the scope of the Taylor series (§D.1.6). The right-hand side of the inequality (424) is the first-order Taylor series expansion of f about X .


Necessary and sufficient discretization (§2.13.4.2.1) allows relaxation of the semi-infinite number of conditions w ≽ 0 instead to w ∈ {e_i , i = 1…M}, the extreme directions of the nonnegative orthant. Each extreme direction picks out an entry from the vector-valued function and its directional derivative, satisfying Theorem 3.1.1.5.1. The vector-valued function case (424) is therefore a straightforward application of the first-order convexity condition for real functions to each entry of the vector-valued function.

3.1.1.7 Second-order convexity condition, vector-valued function

Again, by discretization, we are obliged only to consider each individual entry f_i of a vector-valued function f . For f(X) : R^p → R^M, a twice differentiable vector-valued function with vector argument on open convex domain,

    ∇²f_i(X) ≽_{S^p_+} 0  ∀ X ∈ dom f , i = 1…M    (427)

is a necessary and sufficient condition for convexity of f . Strict inequality is a sufficient condition for strict convexity, but that is nothing new; videlicet, the strictly convex real function f_i(x) = x⁴ does not have positive second derivative at each and every x ∈ R . Quadratic forms constitute a notable exception where the strict-case converse is reliably true.

3.1.1.8 Second-order ⇒ first-order condition

For a twice-differentiable real function f_i(X) : R^p → R having open domain, a consequence of the mean value theorem from calculus allows compression of its complete Taylor series expansion about X ∈ dom f_i (§D.1.6) to three terms: On some open interval of ‖Y‖, so each and every line segment [X, Y] belongs to dom f_i , there exists an α ∈ [0, 1] such that [259, §1.2.3] [27, §1.1.4]

    f_i(Y) = f_i(X) + ∇f_i(X)^T (Y − X) + ½ (Y − X)^T ∇²f_i(αX + (1 − α)Y) (Y − X)    (428)

The first-order condition for convexity (417) follows directly from this and the second-order condition (427).
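A numeric illustration of (427), assumed here for exposition: the real function f(x) = log Σ_i e^{x_i} is convex, and its Hessian at any point, diag(p) − pp^T with p the normalized exponentials of x , tests positive semidefinite:

    % second-order condition (427) checked for f(x) = log(sum(exp(x)))
    x = randn(4,1);
    p = exp(x)/sum(exp(x));       % normalized exponentials
    H = diag(p) - p*p';           % Hessian of log-sum-exp at x
    min(eig(H)) >= -1e-12         % positive semidefinite, as (427) requires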


3.1.2 Matrix-valued function

We need different tools for matrix argument: We are primarily interested in continuous matrix-valued functions g(X). We choose symmetric g(X) ∈ S^M because matrix-valued functions are most often compared (429) with respect to the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices.^3.7

3.1.2.0.1 Definition. Convex matrix-valued function.

1) Matrix-definition.
A function g(X) : R^{p×k} → S^M is convex in X iff dom g is a convex set and, for each and every Y, Z ∈ dom g and all 0 ≤ µ ≤ 1 [133, §2.3.7]

    g(µY + (1 − µ)Z) ≼_{S^M_+} µg(Y) + (1 − µ)g(Z)    (429)

Reversing the sense of the inequality flips this definition to concavity. Strict convexity is defined less a stroke of the pen in (429) similarly to (397).

2) Scalar-definition.
It follows that g(X) : R^{p×k} → S^M is convex in X iff w^T g(X) w : R^{p×k} → R is convex in X for each and every ‖w‖ = 1 ; shown by substituting the defining inequality (429). By generalized inequality we have the equivalent but more broad criterion, (§2.13.5)

    g convex ⇔ ⟨W , g⟩ convex ∀ W ≽_{S^M_+} 0    (430)

Strict convexity on both sides requires caveat W ≠ 0. Because the set of all extreme directions for the positive semidefinite cone (§2.9.2.3) comprises a minimal set of generators for that cone, discretization (§2.13.4.2.1) allows replacement of matrix W with symmetric dyad ww^T as proposed. △

3.7 Function symmetry is not a necessary requirement for convexity; indeed, for A ∈ R^{m×p} and B ∈ R^{m×k}, g(X) = AX + B is a convex (affine) function in X on domain R^{p×k} with respect to the nonnegative orthant R^{m×k}_+ . Symmetric convex functions share the same benefits as symmetric matrices. Horn & Johnson [125, §7.7] liken symmetric matrices to real numbers, and (symmetric) positive definite matrices to positive real numbers.


3.1.2.1 First-order convexity condition, matrix-valued function

From the scalar-definition we have, for differentiable matrix-valued function g and for each and every real vector w of unit norm ‖w‖ = 1,

    w^T g(Y) w ≥ w^T g(X) w + w^T dg(X; Y−X) w    (431)

that follows immediately from the first-order condition (416) for convexity of a real function because

    w^T dg(X; Y−X) w = ⟨∇_X w^T g(X) w , Y − X⟩    (432)

where dg(X; Y−X) is the directional derivative (§D.1.4) of function g at X in direction Y − X . By discretized generalized inequality, (§2.13.5)

    g(Y) − g(X) − dg(X; Y−X) ≽_{S^M_+} 0 ⇔ ⟨g(Y) − g(X) − dg(X; Y−X) , ww^T⟩ ≥ 0 ∀ ww^T (≽_{S^M_+} 0)    (433)

Hence, for each and every X, Y ∈ dom g (confer (424))

    g(Y) ≽_{S^M_+} g(X) + dg(X; Y−X)    (434)

must therefore be necessary and sufficient for convexity of a matrix-valued function of matrix variable on open convex domain.

3.1.2.2 Epigraph of matrix-valued function, sublevel sets

We generalize the epigraph to a continuous matrix-valued function g(X) : R^{p×k} → S^M :

    epi g ≜ {(X, T) | X ∈ dom g , g(X) ≼_{S^M_+} T} ⊆ R^{p×k} × S^M    (435)

from which it follows

    g convex ⇔ epi g convex    (436)

Proof of necessity is similar to that in §3.1.1.3.


Sublevel sets of a matrix-valued convex function (confer (408))

    L_V g ≜ {X ∈ dom g | g(X) ≼_{S^M_+} V} ⊆ R^{p×k}    (437)

are convex. There is no converse.

3.1.2.3 Second-order convexity condition, matrix-valued function

3.1.2.3.1 Theorem. Line theorem. [38, §3.1.1]
Matrix-valued function g(X) : R^{p×k} → S^M is convex in X if and only if it remains convex on the intersection of any line with its domain. ⋄

Now we assume a twice differentiable function and drop the subscript S^M_+ from an inequality when apparent.

3.1.2.3.2 Definition. Differentiable convex matrix-valued function.
Matrix-valued function g(X) : R^{p×k} → S^M is convex in X iff dom g is an open convex set, and its second derivative g″(X + tY) : R → S^M is positive semidefinite on each point along every line X + tY that intersects dom g ; id est, iff for each and every X, Y ∈ R^{p×k} such that X + tY ∈ dom g over some open interval of t ∈ R

    d²/dt² g(X + tY) ≽ 0    (438)

Similarly, if

    d²/dt² g(X + tY) ≻ 0    (439)

then g is strictly convex; the converse is generally false. [38, §3.1.4]^3.8 △

3.8 Quadratic forms constitute a notable exception where the strict-case converse is reliably true.


3.1.2.3.3 Example. Matrix inverse.
The matrix-valued function X^µ is convex on int S^M_+ for −1 ≤ µ ≤ 0 or 1 ≤ µ ≤ 2, and concave for 0 ≤ µ ≤ 1. [38, §3.6.2] The function g(X) = X^{−1}, in particular, is convex on int S^M_+ . For each and every Y ∈ S^M (§D.2.1, §A.3.1.0.5)

    d²/dt² g(X + tY) = 2(X + tY)^{−1} Y (X + tY)^{−1} Y (X + tY)^{−1} ≽ 0    (440)

on some open interval of t ∈ R such that X + tY ≻ 0. Hence, g(X) is convex in X . This result is extensible;^3.9 tr X^{−1} is convex on that same domain. [125, §7.6, prob. 2] [36, §3.1, exer. 25]

3.1.2.3.4 Example. Matrix products.
Iconic real function f̂(x) = x² is strictly convex on R . The matrix-valued function g(X) = X² is convex on the domain of symmetric matrices; for X, Y ∈ S^M and any open interval of t ∈ R (§D.2.1)

    d²/dt² g(X + tY) = d²/dt² (X + tY)² = 2Y²    (441)

which is positive semidefinite when Y is symmetric because then Y² = Y^T Y (1062).^3.10 A more appropriate matrix-valued counterpart for f̂ is g(X) = X^T X which is a convex function on domain X ∈ R^{m×n}, and strictly convex whenever X is skinny-or-square full-rank.

3.9 d/dt tr g(X + tY) = tr d/dt g(X + tY).
3.10 By (1077) in §A.3.1, changing the domain instead to all symmetric and nonsymmetric positive semidefinite matrices, for example, will not produce a convex function.
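The second derivative (440) can be sampled numerically. A Matlab sketch with random data (assumed for illustration) confirms positive semidefiniteness at t = 0 :

    % check (440) at t=0: 2*inv(X)*Y*inv(X)*Y*inv(X) is PSD for X ≻ 0, Y ∈ S^M
    M  = 4;
    A  = randn(M);  X = A*A' + M*eye(M);   % a positive definite X
    Y  = randn(M);  Y = (Y + Y')/2;        % arbitrary symmetric direction
    Xi = inv(X);
    S  = 2*Xi*Y*Xi*Y*Xi;  S = (S + S')/2;  % symmetrize against roundoff
    min(eig(S)) >= -1e-10                  % true: second derivative ≽ 0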


3.1.2.3.5 Example. Matrix exponential.
The matrix-valued function g(X) = e^X : S^M → S^M is convex on the subspace of circulant [96] symmetric matrices. Applying the line theorem, for all t ∈ R and circulant X, Y ∈ S^M, from Table D.2.7 we have

    d²/dt² e^{X + tY} = Y e^{X + tY} Y ≽ 0 ,  (XY)^T = XY    (442)

because all circulant matrices are commutative and, for symmetric matrices, XY = YX ⇔ (XY)^T = XY (1076). Given symmetric argument, the matrix exponential always resides interior to the cone of positive semidefinite matrices in the symmetric matrix subspace; e^A ≻ 0 ∀ A ∈ S^M (1438). Then for any matrix Y of compatible dimension, Y^T e^A Y is positive semidefinite. (§A.3.1.0.5)

The subspace of circulant symmetric matrices contains all diagonal matrices. The matrix exponential of any diagonal matrix e^Λ exponentiates each individual entry on the main diagonal. [155, §5.3] So, changing the function domain to the subspace of real diagonal matrices reduces the matrix exponential to a vector-valued function in an isometrically isomorphic subspace R^M ; known convex (§3.1.1) from the real-valued function case [38, §3.1.5].

There are, of course, multifarious methods to determine function convexity, [38] [27] [70] each of them efficient when appropriate.

3.2 Quasiconvex

Quasiconvex functions [38, §3.4] [123] [210] [242] [151, §2] are useful in practical problem solving because they are unimodal (by definition when nonmonotone); a global minimum is guaranteed to exist over any convex set in the function domain. The scalar definition of convexity carries over to quasiconvexity:

3.2.0.0.1 Definition. Quasiconvex matrix-valued function.
g(X) : R^{p×k} → S^M is a quasiconvex function of matrix X iff dom g is a convex set and for each and every Y, Z ∈ dom g , 0 ≤ µ ≤ 1, and real vector w of unit norm,

    w^T g(µY + (1 − µ)Z) w ≤ max{w^T g(Y) w , w^T g(Z) w}    (443)

A quasiconcave function is characterized

    w^T g(µY + (1 − µ)Z) w ≥ min{w^T g(Y) w , w^T g(Z) w}    (444)

In either case, vector w becomes superfluous for real functions. △

Unlike convex functions, quasiconvex functions are not necessarily continuous. Although insufficient for convex functions, convexity of all sublevel sets is necessary and sufficient for quasiconvexity.


3.2.0.0.2 Definition. Quasiconvex function by sublevel sets.
Matrix-valued function g(X) : R^{p×k} → S^M is a quasiconvex function of matrix X if and only if dom g is a convex set and the sublevel set corresponding to each and every V ∈ S^M (confer (408))

    L_V g = {X ∈ dom g | g(X) ≼ V} ⊆ R^{p×k}    (437)

is convex. △

Necessity of convex sublevel sets is proven as follows: For any X, Y ∈ L_V g we must show for each and every µ ∈ [0, 1] that µX + (1−µ)Y ∈ L_V g . By definition we have, for each and every real vector w of unit norm,

    w^T g(µX + (1−µ)Y) w ≤ max{w^T g(X) w , w^T g(Y) w} ≤ w^T V w    (445)

Likewise, convexity of all superlevel sets^3.11 is necessary and sufficient for quasiconcavity.

3.2.1 Differentiable and quasiconvex

3.2.1.0.1 Definition. Differentiable quasiconvex matrix-valued function.
Assume function g(X) : R^{p×k} → S^M is twice differentiable, and dom g is an open convex set. Then g(X) is quasiconvex in X if, wherever in its domain the directional derivative (§D.1.4) becomes 0, the second directional derivative (§D.1.5) is positive definite there [38, §3.4.3] in the same direction Y ; id est, g is quasiconvex if for each and every point X ∈ dom g , all nonzero directions Y ∈ R^{p×k}, and for t ∈ R

    d/dt|_{t=0} g(X + tY) = 0 ⇒ d²/dt²|_{t=0} g(X + tY) ≻ 0    (446)

3.11 The superlevel set is similarly defined: L^S g = {X ∈ dom g | g(X) ≽ S}


If g(X) is quasiconvex, conversely, then for each and every X ∈ dom g and all Y ∈ R^{p×k}

    d/dt|_{t=0} g(X + tY) = 0 ⇒ d²/dt²|_{t=0} g(X + tY) ≽ 0    (447)
△

3.3 Salient properties of convex and quasiconvex functions

1. A convex (or concave) function is assumed continuous but not necessarily differentiable on the relative interior of its domain. [194, §10] A quasiconvex (or quasiconcave) function is not necessarily a continuous function.

2. convexity ⇒ quasiconvexity ⇔ convex sublevel sets
   concavity ⇒ quasiconcavity ⇔ convex superlevel sets

3. g convex ⇔ −g concave
   g quasiconvex ⇔ −g quasiconcave

4. The scalar-definition of matrix-valued function convexity (§3.1.2.0.1) and the line theorem (§3.1.2.3.1) [38, §3.4.2] translate identically to quasiconvexity (and quasiconcavity).

5. Composition g(h(X)) of a convex (concave) function g with any affine function h : R^{m×n} → R^{p×k}, such that h(R^{m×n}) ∩ dom g ≠ ∅ , becomes convex (concave) in X ∈ R^{m×n}. [123, §B.2.1] Likewise for the quasiconvex (quasiconcave) functions g .

6. Convexity, concavity, quasiconvexity, and quasiconcavity are invariant to nonnegative scaling.

7. A sum of (strictly) convex (concave) functions remains (strictly) convex (concave). A pointwise supremum (infimum) of convex (concave) functions remains convex (concave). [194, §5] Quasiconvexity (quasiconcavity) is invariant to pointwise supremum (infimum) of quasiconvex (quasiconcave) functions.




Chapter 4

Euclidean Distance Matrix

    These results were obtained by Schoenberg (1935), a surprisingly late date for such a fundamental property of Euclidean geometry.
    −John Clifford Gower [90, §3]

By itself, distance information between many points in Euclidean space is lacking. We might want to know more; such as, relative or absolute position or dimension of some hull. A question naturally arising in some fields (e.g., geodesy, economics, genetics, psychology, biochemistry, engineering) [57] asks what facts can be deduced given only distance information. What can we know about the underlying points that the distance information purports to describe? We also ask what it means when given distance information is incomplete; or suppose the distance information is not reliable, available, or specified only by certain tolerances (affine inequalities). These questions motivate a study of interpoint distance, well represented in any spatial dimension by a simple matrix from linear algebra.^4.1 In what follows, we will answer some of these questions via Euclidean distance matrices.

4.1 e.g., ◦√D ∈ R^{N×N}, a classical two-dimensional matrix representation of absolute interpoint distance because its entries (in ordered rows and columns) can be written neatly on a piece of paper. Matrix D will be reserved throughout to hold distance-square.


Figure 56: Convex hull of three points (N = 3) is shaded in R³ (n = 3). Dotted lines are imagined vectors to points; illustrated interpoint distances are 1, 2, and √5, tabulated (with rows and columns labelled 1, 2, 3) in

    D = [ 0 1 5
          1 0 4
          5 4 0 ]

4.1 EDM

Euclidean space R^n is a finite-dimensional real vector space having an inner product defined on it, hence a metric as well. [140, §3.1] A Euclidean distance matrix, an EDM in R^{N×N}_+ , is an exhaustive table of distance-square d_ij between points taken by pair from a list of N points {x_ℓ , ℓ = 1…N} in R^n ; the squared metric the measure of distance-square:

    d_ij = ‖x_i − x_j‖²_2 ≜ ⟨x_i − x_j , x_i − x_j⟩    (448)

Each point is labelled ordinally, hence the row or column index of an EDM, i or j = 1…N , individually addresses all the points in the list.

Consider the following example of an EDM for the case N = 3 :

    D = [d_ij] = [ d_11 d_12 d_13     [ 0   d_12 d_13     [ 0 1 5
                   d_21 d_22 d_23  =    d_12 0   d_23  =    1 0 4
                   d_31 d_32 d_33 ]     d_13 d_23 0   ]     5 4 0 ]    (449)

Matrix D has N² entries but only N(N−1)/2 pieces of information. In Figure 56 are shown three points in R³ that can be arranged in a list to correspond to D in (449).


Such a list is not unique because any rotation, reflection, or translation (§4.5) of the points in Figure 56 would produce the same EDM D .

4.2 First metric properties

For i, j = 1…N , the Euclidean distance between points x_i and x_j must satisfy the requirements imposed by any metric space: [140, §1.1] [160, §1.7]

1. √d_ij ≥ 0 , i ≠ j    (nonnegativity)
2. √d_ij = 0 , i = j    (self-distance)
3. √d_ij = √d_ji    (symmetry)
4. √d_ij ≤ √d_ik + √d_kj , i ≠ j ≠ k    (triangle inequality)

where √d_ij is the Euclidean metric in R^n (§4.4). Then all entries of an EDM must be in concord with these Euclidean metric properties: specifically, each entry must be nonnegative,^4.2 the main diagonal must be 0 ,^4.3 and an EDM must be symmetric. The fourth property provides upper and lower bounds for each entry. Property 4 is true more generally when there are no restrictions on indices i, j, k , but furnishes no new information.

4.3 ∃ fifth Euclidean metric property

The four properties of the Euclidean metric provide information insufficient to certify that a bounded convex polyhedron more complicated than a triangle has a Euclidean realization. [90, §2] Yet any list of points or the vertices of any bounded convex polyhedron must conform to the properties.

4.2 Implicit from the terminology, √d_ij ≥ 0 ⇔ d_ij ≥ 0 is always assumed.
4.3 What we call self-distance, Marsden calls nondegeneracy. [160, §1.6] Kreyszig calls these first metric properties axioms of the metric; [140, p.4] Blumenthal refers to them as postulates. [32, p.15]


4.3.0.0.1 Example. Triangle.
Consider the EDM in (449), but missing one of its entries:

    D = [ 0    1 d_13
          1    0 4
          d_31 4 0 ]    (450)

Can we determine unknown entries of D by applying the metric properties? Property 1 demands √d_13 , √d_31 ≥ 0, property 2 requires the main diagonal be 0, while property 3 makes √d_31 = √d_13 . The fourth property tells us

    1 ≤ √d_13 ≤ 3    (451)

Indeed, described over that closed interval [1, 3] is a family of triangular polyhedra whose angle at vertex x_2 varies from 0 to π radians. So, yes, we can determine the unknown entries of D , but they are not unique; nor should they be from the information given for this example.

4.3.0.0.2 Example. Small completion problem, I.
Now consider the polyhedron in Figure 57(b) formed from an unknown list {x_1, x_2, x_3, x_4}. The corresponding EDM less one critical piece of information, d_14 , is given by

    D = [ 0    1 5 d_14
          1    0 4 1
          5    4 0 1
          d_14 1 1 0 ]    (452)

From metric property 4 we may write a few inequalities for the two triangles common to d_14 ; we find

    √5 − 1 ≤ √d_14 ≤ 2    (453)

We cannot further narrow those bounds on √d_14 using only the four metric properties (§4.8.3.1.1). Yet there is only one possible choice for √d_14 because points x_2, x_3, x_4 must be collinear. All other values of √d_14 in the interval [√5 − 1, 2] specify impossible distances in any dimension; id est, in this particular example the triangle inequality does not yield an interval for √d_14 over which a family of convex polyhedra can be reconstructed.


Figure 57: (a) Complete dimensionless EDM graph. (b) Emphasizing obscured segments x_2x_4 , x_4x_3 , and x_2x_3 , now only five (2N − 3) absolute distances are specified. EDM so represented is incomplete, missing d_14 as in (452), yet the isometric reconstruction (§4.4.2.2.5) is unique as proved in §4.9.3.0.1 and §4.14.4.1.1. First four properties of Euclidean metric are not a recipe for reconstruction of this polyhedron.

We will return to this simple Example 4.3.0.0.2 to illustrate more elegant methods of solution in §4.8.3.1.1, §4.9.3.0.1, and §4.14.4.1.1. Until then, we can deduce some general principles from the foregoing examples:

• Unknown d_ij of an EDM are not necessarily uniquely determinable.
• The triangle inequality does not produce necessarily tight bounds.^4.4
• Four Euclidean metric properties are insufficient for reconstruction.

4.4 The term tight with reference to an inequality means equality is achievable.


4.3.1 Lookahead

There must exist at least one requirement more than the four properties of the Euclidean metric that makes them altogether necessary and sufficient to certify realizability of bounded convex polyhedra. Indeed, there are infinitely many more; there are precisely N + 1 necessary and sufficient Euclidean metric requirements for N points constituting a generating list (§2.3.2). Here is the fifth requirement:

4.3.1.0.1 Fifth Euclidean metric property. Relative-angle inequality.
(confer §4.14.2.1.1) Augmenting the four fundamental properties of the Euclidean metric in R^n : for all i, j, ℓ ≠ k ∈ {1…N}, i < j < ℓ , and for N ≥ 4 distinct points {x_k}, the inequalities

    cos(θ_ikℓ + θ_ℓkj) ≤ cos θ_ikj ≤ cos(θ_ikℓ − θ_ℓkj)
    0 ≤ θ_ikℓ , θ_ℓkj , θ_ikj ≤ π    (454)

where θ_ikj = θ_jki represents the angle between vectors at vertex x_k (Figure 58), must be satisfied at each point x_k regardless of affine dimension.


Figure 58: Nomenclature for fifth Euclidean metric property. Each angle θ is made by a vector pair at vertex k while i , j , k , ℓ index four points. The fifth property is necessary for realization of four or more points reckoned by three angles in any dimension.


Now we develop some invaluable concepts, moving toward a link of the Euclidean metric properties to matrix criteria.

4.4 EDM definition

Ascribe points in a list {x_ℓ ∈ R^n , ℓ = 1…N} to the columns of a matrix X ;

    X = [x_1 · · · x_N] ∈ R^{n×N}    (64)

where N is regarded as cardinality of list X . When matrix D = [d_ij] is an EDM, its entries must be related to those points constituting the list by the Euclidean distance-square: for i, j = 1…N (§A.1.1 no.22)

    d_ij = ‖x_i − x_j‖² = (x_i − x_j)^T (x_i − x_j) = ‖x_i‖² + ‖x_j‖² − 2 x_i^T x_j
         = [x_i^T  x_j^T] [ I −I ; −I I ] [x_i ; x_j]
         = vec(X)^T (Φ_ij ⊗ I) vec X = ⟨Φ_ij , X^T X⟩    (456)

where

    vec X = [x_1 ; x_2 ; ⋮ ; x_N] ∈ R^{nN}    (457)

and where Φ_ij ⊗ I has I ∈ S^n in its ii th and jj th block of entries while −I ∈ S^n fills its ij th and ji th block; id est,

    Φ_ij ≜ δ((e_i e_j^T + e_j e_i^T) 1) − (e_i e_j^T + e_j e_i^T) ∈ S^N_+
         = e_i e_i^T + e_j e_j^T − e_i e_j^T − e_j e_i^T
         = (e_i − e_j)(e_i − e_j)^T    (458)

where {e_i ∈ R^N , i = 1…N} is the set of standard basis vectors, and ⊗ signifies the Kronecker product (§D.1.2.1). Thus each entry d_ij is a convex quadratic function [38, §3, §4] of vec X (29). [194, §6]


The collection of all Euclidean distance matrices EDM^N is a convex subset of R^{N×N}_+ called the EDM cone (§5, Figure 89, p.390);

    0 ∈ EDM^N ⊆ S^N_h ∩ R^{N×N}_+ ⊂ S^N    (459)

An EDM D must be expressible as a function of some list X ; id est, it must have the form

    D(X) ≜ δ(X^T X) 1^T + 1 δ(X^T X)^T − 2 X^T X ∈ EDM^N    (460)
         = [vec(X)^T (Φ_ij ⊗ I) vec X , i, j = 1…N]    (461)

Function D(X) will make an EDM given any X ∈ R^{n×N}, conversely, but D(X) is not a convex function of X (§4.4.1). Now the EDM cone may be described:

    EDM^N = { D(X) | X ∈ R^{N−1×N} }    (462)

Expression D(X) is a matrix definition of EDM and so conforms to the Euclidean metric properties:

Nonnegativity of EDM entries (property 1, §4.2) is obvious from the distance-square definition (456), so holds for any D expressible in the form D(X) in (460).

When we say D is an EDM, reading from (460), it implicitly means the main diagonal must be 0 (property 2, self-distance) and D must be symmetric (property 3); δ(D) = 0 and D^T = D or, equivalently, D ∈ S^N_h are necessary matrix criteria.

4.4.0.1 Homogeneity

Function D(X) is homogeneous in the sense, for ζ ∈ R

    ◦√D(ζX) = |ζ| ◦√D(X)    (463)

where the positive square root is entrywise.

Any nonnegatively scaled EDM remains an EDM; id est, the matrix class EDM is invariant to nonnegative scaling (αD(X) for α ≥ 0) because all EDMs of dimension N constitute a convex cone EDM^N (§5, Figure 75).
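Definition (460) is a one-liner in software. A minimal Matlab sketch (the point list is assumed for illustration) building the EDM of Figure 56 from its list:

    % build D(X) per (460) from a point list arranged columnar
    X = [0 1 1; 0 0 2];                       % x1, x2, x3 in R^2
    G = X'*X;                                 % Gram matrix X'X
    N = size(X,2);
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G
    % returns [0 1 5; 1 0 4; 5 4 0], the EDM (449)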


4.4.1 −V_N^T D(X) V_N convexity

We saw that EDM entries d_ij are convex quadratic functions of [x_i ; x_j]. Yet −D(X) (460) is not a quasiconvex function of matrix X ∈ R^{n×N} because the second directional derivative (§3.2)

    −d²/dt²|_{t=0} D(X + tY) = 2(−δ(Y^T Y) 1^T − 1 δ(Y^T Y)^T + 2 Y^T Y)    (464)

is indefinite for any Y ∈ R^{n×N} since its main diagonal is 0. [88, §4.2.8] [125, §7.1, prob. 2] Hence −D(X) can neither be convex in X .

The outcome is different when instead we consider

    −V_N^T D(X) V_N = 2 V_N^T X^T X V_N    (465)

where we introduce the full-rank skinny Schoenberg auxiliary matrix (§B.4.2)

    V_N ≜ (1/√2) [ −1 −1 · · · −1
                    1  0
                       1
                         ⋱
                    0       1 ]  =  (1/√2) [ −1^T ; I ] ∈ R^{N×N−1}    (466)

(N(V_N) = 0) having range

    R(V_N) = N(1^T) ,  V_N^T 1 = 0    (467)

Matrix-valued function (465) meets the criterion for convexity in §3.1.2.3.2 over its domain that is all of R^{n×N} ; videlicet, for any Y ∈ R^{n×N}

    −d²/dt² V_N^T D(X + tY) V_N = 4 V_N^T Y^T Y V_N ≽ 0    (468)

Quadratic matrix-valued function −V_N^T D(X) V_N is therefore convex in X , achieving its minimum, with respect to a positive semidefinite cone (§2.7.2.2), at X = 0. When the penultimate number of points exceeds the dimension of the space, n < N − 1, strict convexity of the quadratic (465) becomes impossible because (468) could not then be positive definite.
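Identity (465) and its convexity certificate (468) are quick to confirm numerically; a Matlab sketch with random data (assumed for illustration):

    % confirm (465): -VN'*D(X)*VN equals 2*VN'*(X'X)*VN
    n = 2;  N = 5;
    X  = randn(n,N);
    G  = X'*X;
    D  = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % D(X) per (460)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);               % Schoenberg matrix (466)
    norm(-VN'*D*VN - 2*VN'*G*VN, 'fro')                  % ~ 0
    min(eig(VN'*G*VN)) >= -1e-12                         % PSD, echoing (468)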


4.4.2 Gram-form EDM definition

Positive semidefinite matrix X^T X in (460), formed from inner product of the list, is known as a Gram matrix; [154, §3.6]

    G ≜ X^T X =
    [ ‖x_1‖²     x_1^T x_2  x_1^T x_3  · · ·  x_1^T x_N
      x_2^T x_1  ‖x_2‖²     x_2^T x_3  · · ·  x_2^T x_N
      x_3^T x_1  x_3^T x_2  ‖x_3‖²     ⋱      x_3^T x_N
      ⋮          ⋮          ⋱          ⋱      ⋮
      x_N^T x_1  x_N^T x_2  x_N^T x_3  · · ·  ‖x_N‖² ]  ∈ S^N_+

    = δ([‖x_1‖ ; ⋮ ; ‖x_N‖]) ·
    [ 1         cos ψ_12   cos ψ_13   · · ·  cos ψ_1N
      cos ψ_12  1          cos ψ_23   · · ·  cos ψ_2N
      cos ψ_13  cos ψ_23   1          ⋱      cos ψ_3N
      ⋮          ⋮          ⋱          ⋱      ⋮
      cos ψ_1N  cos ψ_2N   cos ψ_3N   · · ·  1 ]
    · δ([‖x_1‖ ; ⋮ ; ‖x_N‖])

    ≜ δ²(G)^{1/2} Ψ δ²(G)^{1/2}    (469)

where ψ_ij (489) is the angle between vectors x_i and x_j , and where δ² denotes a diagonal matrix in this case. Positive semidefiniteness of interpoint angle matrix Ψ implies positive semidefiniteness of Gram matrix G ; [38, §8.3]

    G ≽ 0 ⇐ Ψ ≽ 0    (470)

When δ²(G) is nonsingular, then G ≽ 0 ⇔ Ψ ≽ 0. (§A.3.1.0.5)

Distance-square d_ij (456) is related to Gram matrix entries G^T = G ≜ [g_ij] :

    d_ij = g_ii + g_jj − 2 g_ij = ⟨Φ_ij , G⟩    (471)

where Φ_ij is defined in (458). Hence the linear EDM definition

    D(G) ≜ δ(G) 1^T + 1 δ(G)^T − 2G ∈ EDM^N  ⇐  G ≽ 0
         = [⟨Φ_ij , G⟩ , i, j = 1…N]    (472)

The EDM cone may be described, (confer (538)(544))

    EDM^N = { D(G) | G ∈ S^N_+ }    (473)


4.4.2.1 First point at origin

Assume the first point x_1 in an unknown list X resides at the origin;

    X e_1 = 0 ⇔ G e_1 = 0    (474)

Consider the symmetric translation (I − 1 e_1^T) D(G) (I − e_1 1^T) that shifts the first row and column of D(G) to the origin; setting Gram-form EDM operator D(G) = D for convenience,

    −½ (D − (D e_1 1^T + 1 e_1^T D) + 1 e_1^T D e_1 1^T) = G − (G e_1 1^T + 1 e_1^T G) + 1 e_1^T G e_1 1^T    (475)

where

    e_1 ≜ [1 ; 0 ; ⋮ ; 0]    (476)

is the first vector from the standard basis. Then it follows for D ∈ S^N_h

    G = −½ (D − (D e_1 1^T + 1 e_1^T D)) ,  x_1 = 0
      = −½ [0 √2 V_N]^T D [0 √2 V_N]
      = [ 0  0^T
          0  −V_N^T D V_N ]

    V_N^T G V_N = −½ V_N^T D V_N ,  ∀ X    (477)

where

    I − e_1 1^T = [0 √2 V_N]    (478)

is a projector nonorthogonally projecting (§E.1) on

    S^N_1 = {G ∈ S^N | G e_1 = 0} = { [0 √2 V_N]^T Y [0 √2 V_N] | Y ∈ S^N }    (1565)

in the Euclidean sense. From (477) we get sufficiency of the first matrix criterion for an EDM proved by Schoenberg in 1935; [199]^4.7

4.7 From (467) we know R(V_N) = N(1^T) , so (479) is the same as (455). In fact, any matrix V in place of V_N will satisfy (479) whenever R(V) = R(V_N) = N(1^T). But V_N is the matrix implicit in Schoenberg's seminal exposition.


    D ∈ EDM^N  ⇔  { −V_N^T D V_N ∈ S^{N−1}_+
                     D ∈ S^N_h    (479)

We provide a rigorous, complete, more geometric proof of this Schoenberg criterion in §4.9.1.0.3.

By substituting G = [0 0^T ; 0 −V_N^T D V_N] (477) into D(G) (472), assuming x_1 = 0,

    D = [ 0 ; δ(−V_N^T D V_N) ] 1^T + 1 [ 0  δ(−V_N^T D V_N)^T ] − 2 [ 0  0^T
                                                                      0  −V_N^T D V_N ]    (480)

We provide details of this bijection in §4.6.2.

4.4.2.2 0 geometric center

Assume the geometric center (§4.5.1.0.1) of an unknown list X is the origin;

    X 1 = 0 ⇔ G 1 = 0    (481)

Now consider the calculation (I − (1/N) 1 1^T) D(G) (I − (1/N) 1 1^T) , a geometric centering or projection operation. (§E.7.2.0.2) Setting D(G) = D for convenience as in §4.4.2.1,

    G = −½ (D − (1/N)(D 1 1^T + 1 1^T D) + (1/N²) 1 1^T D 1 1^T) ,  X 1 = 0
      = −½ V D V

    V G V = −½ V D V ,  ∀ X    (482)

where more properties of the auxiliary (geometric centering, projection) matrix

    V ≜ I − (1/N) 1 1^T ∈ S^N    (483)

are found in §B.4. From (482) and the assumption D ∈ S^N_h we get sufficiency of the more popular form of Schoenberg's criterion:


    D ∈ EDM^N  ⇔  { −V D V ∈ S^N_+
                     D ∈ S^N_h    (484)

Of particular utility when D ∈ EDM^N is the fact, (§B.4.2 no.20) (456)

    tr(−½ V D V) = (1/(2N)) Σ_{i,j} d_ij = (1/(2N)) vec(X)^T (Σ_{i,j} Φ_ij ⊗ I) vec X = tr(V G V) ,  G ≽ 0
                 = tr G = Σ_{ℓ=1}^N ‖x_ℓ‖² = ‖X‖²_F ,  X 1 = 0    (485)

where Σ Φ_ij ∈ S^N_+ (458), therefore convex in vec X . We will find this trace useful as a heuristic to minimize affine dimension of an unknown list arranged columnar in X , (§7.2.2) but it tends to facilitate reconstruction of a list configuration having least energy; id est, it compacts a reconstructed list by minimizing total norm-square of the vertices.

By substituting G = −½ V D V (482) into D(G) (472), assuming X 1 = 0 (confer §4.6.1)

    D = δ(−½ V D V) 1^T + 1 δ(−½ V D V)^T − 2(−½ V D V)    (486)

These relationships will allow combination of distance and Gram constraints in any optimization problem we may pose:

• Constraining all main-diagonal entries of a Gram matrix to 1, for example, is equivalent to the constraint that all points lie on a hypersphere (§4.9.1.0.2) of radius 1 centered at the origin. This is equivalent to the EDM constraint: D 1 = 2N 1. [50, p.116] Any further constraint on that Gram matrix then applies only to interpoint angle Ψ .

• More generally, interpoint angle Ψ can be constrained by fixing all the individual point lengths δ(G)^{1/2} ; then

    Ψ = −½ δ²(G)^{−1/2} V D V δ²(G)^{−1/2}    (487)
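Criterion (484) doubles as a recipe: when the test passes, G = −½ V D V is a Gram matrix from which a centered realizing list is recovered by factorization. A Matlab sketch using the data of (449) (assumed here for illustration):

    % Schoenberg criterion (484) and list recovery, data from (449)
    D = [0 1 5; 1 0 4; 5 4 0];
    N = size(D,1);
    V = eye(N) - ones(N)/N;               % geometric centering matrix (483)
    G = -V*D*V/2;                         % Gram matrix of a centered list (482)
    isEDM = isequal(D,D') && all(diag(D)==0) && min(eig(G)) >= -1e-12
    [Q,L] = eig(G);
    X = sqrt(max(L,0))*Q';                % centered list with X'*X = G
    Drebuilt = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*(X'*X);
    norm(D - Drebuilt,'fro')              % ~ 0 : D is realized by list X
    % setting d12 = d21 = 25 violates the triangle inequality; isEDM fails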


4.4.2.2.1 Example. List member constraints via Gram matrix.
Capitalizing on identity (482) relating Gram and EDM D matrices, a constraint set such as

    tr(−½ V D V e_i e_i^T) = ‖x_i‖²
    tr(−½ V D V (e_i e_j^T + e_j e_i^T) ½) = x_i^T x_j
    tr(−½ V D V e_j e_j^T) = ‖x_j‖²    (488)

relates list member x_i to x_j to within an isometry through inner-product identity [249, §1-7]

    cos ψ_ij = x_i^T x_j / (‖x_i‖ ‖x_j‖)    (489)

For M list members, there are a total of M(M+1)/2 such constraints. Consider the academic problem of finding a Gram matrix subject to constraints on each and every entry of the corresponding EDM:

    find_{D ∈ S^N_h}   −½ V D V ∈ S^N
    subject to   ⟨D , (e_i e_j^T + e_j e_i^T) ½⟩ = ď_ij ,  i, j = 1…N ,  i < j
                 −V D V ≽ 0    (490)

where the ď_ij are given nonnegative constants. EDM D can, of course, be replaced with the equivalent Gram-form (472). Requiring only the self-adjointness property (1026) of the main-diagonal linear operator δ we get, for A ∈ S^N

    ⟨D , A⟩ = ⟨δ(G) 1^T + 1 δ(G)^T − 2G , A⟩ = 2 ⟨G , δ(A 1) − A⟩    (491)

Then the problem equivalent to (490) becomes, for G ∈ S^N_c ⇔ G 1 = 0 :


    find_{G ∈ S^N_c}   G ∈ S^N
    subject to   ⟨G , δ((e_i e_j^T + e_j e_i^T) 1) − (e_i e_j^T + e_j e_i^T)⟩ = ď_ij ,  i, j = 1…N ,  i < j
                 G ≽ 0    (492)

Barvinok's Proposition 2.9.3.0.1 predicts existence for either formulation (490) or (492) such that implicit equality constraints induced by subspace membership are ignored

    rank G , rank V D V ≤ ⌊(√(8(N(N−1)/2) + 1) − 1)/2⌋ = N − 1    (493)

because, in each case, the Gram matrix is confined to a face of positive semidefinite cone S^N_+ isomorphic with S^{N−1}_+ (§5.7.1). (§E.7.2.0.2) This bound is tight (§4.7.1.1) and is the greatest upper bound.^4.8

4.8 −V D V |_{N←1} = 0 (§B.4.1)

Figure 59: Arbitrary hexagon in R³ whose vertices x_1 … x_6 are labelled clockwise.


4.4.2.2.2 Example. Hexagon.
Barvinok [19, §2.6] poses a problem in geometric realizability of an arbitrary hexagon (Figure 59) having:

1. prescribed (one-dimensional) face-lengths
2. prescribed angles between the three pairs of opposing faces
3. a constraint on the sum of norm-square of each and every vertex

ten affine equality constraints in all on a Gram matrix G ∈ S⁶ (482). Let's realize this as a convex feasibility problem (with constraints written in the same order) also assuming 0 geometric center (481):

    find_{D ∈ S⁶_h}   −½ V D V ∈ S⁶
    subject to   tr(D (e_i e_j^T + e_j e_i^T) ½) = l²_ij ,  j−1 = (i = 1…6) mod 6
                 tr(−½ V D V (A_i + A_i^T) ½) = cos ϕ_i ,  i = 1, 2, 3
                 tr(−½ V D V) = 1
                 −V D V ≽ 0    (494)

where, for A_i ∈ R^{6×6}

    A_1 = (e_1 − e_6)(e_3 − e_4)^T / (l_61 l_34)
    A_2 = (e_2 − e_1)(e_4 − e_5)^T / (l_12 l_45)
    A_3 = (e_3 − e_2)(e_5 − e_6)^T / (l_23 l_56)    (495)

and where the first constraint on length-square l²_ij can be equivalently written as a constraint on the Gram matrix −½ V D V via (491). We show how to numerically solve such a problem by alternating projection in §E.10.2.1.1. Barvinok's Proposition 2.9.3.0.1 asserts existence of a list, corresponding to Gram matrix G solving this feasibility problem, whose affine dimension (§4.7.1.1) does not exceed 3 because the convex feasible set is bounded by the third constraint tr(−½ V D V) = 1 (485).

4.4.2.2.3 Example. Kissing number of ball packing.
Two Euclidean balls are said to kiss if they touch. An elementary geometrical problem can be posed: Given hyperspheres, each having the same diameter 1, how many hyperspheres can simultaneously kiss one central hypersphere? [262] The noncentral hyperspheres are allowed but not required to kiss.


Figure 60: Sphere-packing illustration from [244, kissing number]. Translucent balls illustrated all have the same diameter.

As posed, the problem seeks the maximum number of spheres K kissing a central sphere in a particular dimension. The total number of spheres is N = K + 1 . In one dimension the answer to this kissing problem is 2. In two dimensions, 6. (Figure 7)

The question was presented in three dimensions to Isaac Newton in the context of celestial mechanics, and became controversy with David Gregory on the campus of Cambridge University in 1694. Newton correctly identified the kissing number as 12 (Figure 60) while Gregory argued for 13. In 2003, Oleg Musin tightened the upper bound on kissing number K in four dimensions from 25 to K = 24 by refining a proof offered by Philippe Delsarte in 1973 based on linear programming. There are no proofs known for kissing number in higher dimension excepting dimensions eight and twenty four.

Ye proposes translating this problem to an EDM graph realization (Figure 57, Figure 61). Imagine the centers of each sphere are connected by line segments. Then the distance between centers must obey simple criteria: Each sphere touching the central sphere has a line segment of length exactly 1 joining its center to the central sphere's center. All spheres, excepting the central sphere, must have centers separated by a distance of at least 1. From this perspective, the kissing problem can be posed as a semidefinite program. Assign index 1 to the central sphere, and assume a total of N spheres:


    minimize_D   −tr(W V_N^T D V_N)
    subject to   D_1j = 1 ,   j = 2…N
                 D_ij ≥ 1 ,   2 ≤ i < j = 3…N        (496)
                 D ∈ EDM^N

Then the kissing number

    K = N_max − 1        (497)

is found from the maximum number of spheres N that solve this semidefinite program in a given affine dimension r . Matrix W can be interpreted as the direction of search through a positive semidefinite cone; it is constant, in this program, determined by a method discussed in... In two dimensions,

        ⎡  4  1  2 −1 −1  1 ⎤
        ⎢  1  4 −1 −1  2  1 ⎥
    W = ⎢  2 −1  4  1  1 −1 ⎥ (1/16)        (498)
        ⎢ −1 −1  1  4  1  2 ⎥
        ⎢ −1  2  1  1  4 −1 ⎥
        ⎣  1  1 −1  2 −1  4 ⎦

In three dimensions,

        ⎡  9  1 −2 −1  3 −1 −1  1  2  1 −2  1 ⎤
        ⎢  1  9  3 −1 −1  1  1 −2  1  2 −1 −1 ⎥
        ⎢ −2  3  9  1  2 −1 −1  2 −1 −1  1  2 ⎥
        ⎢ −1 −1  1  9  1 −1  1 −1  3  2 −1  1 ⎥
        ⎢  3 −1  2  1  9  1  1 −1 −1 −1  1 −1 ⎥
    W = ⎢ −1  1 −1 −1  1  9  2 −1  2 −1  2  3 ⎥ (1/12)        (499)
        ⎢ −1  1 −1  1  1  2  9  3 −1  1 −2 −1 ⎥
        ⎢  1 −2  2 −1 −1 −1  3  9  2 −1  1  1 ⎥
        ⎢  2  1 −1  3 −1  2 −1  2  9 −1  1 −1 ⎥
        ⎢  1  2 −1  2 −1 −1  1 −1 −1  9  3  1 ⎥
        ⎢ −2 −1  1 −1  1  2 −2  1  1  3  9 −1 ⎥
        ⎣  1 −1  2  1 −1  3 −1  1 −1  1 −1  9 ⎦

The four-dimensional solution has rational direction matrix as well.
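For concreteness, here is how program (496) might be parsed in CVX for the two-dimensional case, N = 7 total spheres, with W from (498); membership D ∈ EDM^N is imposed through the Schoenberg criterion (479). This is our own sketch of the formulation, assuming the CVX toolbox, not a verbatim transcription of the method that determines W .

    % CVX sketch of semidefinite program (496) in two dimensions: N = 7
    % spheres (K = 6 kissing), direction matrix W of (498) in S^6.
    N  = 7;
    W  = [ 4  1  2 -1 -1  1
           1  4 -1 -1  2  1
           2 -1  4  1  1 -1
          -1 -1  1  4  1  2
          -1  2  1  1  4 -1
           1  1 -1  2 -1  4 ]/16;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);   % auxiliary matrix V_N (466)
    cvx_begin sdp quiet
      variable D(N,N) symmetric
      maximize( trace(W*VN'*D*VN) )          % = minimize -tr(W V_N' D V_N)
      diag(D) == 0;                          % D in S^N_h
      -VN'*D*VN >= 0;                        % Schoenberg criterion (479)
      D(1,2:N) == 1;                         % kissing spheres touch central sphere
      for i = 2:N
        for j = i+1:N
          D(i,j) >= 1;                       % noncentral centers separated >= 1
        end
      end
    cvx_end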


Figure 61: (a) Given three distances indicated with absolute point positions x̌₂ , x̌₃ , x̌₄ known and noncollinear, absolute position of x₁ in R² can be precisely and uniquely determined by trilateration; solution to a system of nonlinear equations. Dimensionless EDM graphs (b) (c) (d) represent EDMs in various states of completion. Line segments represent known absolute distances and may cross without vertex at intersection. (b) Four-point list must always be embeddable in affine subset having dimension rank V_N^T D V_N not exceeding 3. To determine relative position of x₂ , x₃ , x₄ , triangle inequality is necessary and sufficient (§4.14.1). Knowing all distance information, then (by injectivity of D , §4.6) point position x₁ is uniquely determined to within an isometry in any dimension. (c) When fifth point is introduced, only distances to x₃ , x₄ , x₅ are required to determine relative position of x₂ in R². Graph represents first instance of missing distance information; √d₁₂ . (d) Three distances are absent (√d₁₂ , √d₁₃ , √d₂₃) from complete set of interpoint distances, yet unique isometric reconstruction (§4.4.2.2.5) of six points in R² is certain.


This next example shows how finding the common point of intersection for three circles in a plane, a nonlinear problem, has convex expression.

4.4.2.2.4 Example. Ye trilateration in small wireless sensor network.
Given three known absolute point positions in R² (three anchors x̌₂ , x̌₃ , x̌₄) and only one unknown point (one sensor x₁ ∈ R²), the sensor's absolute position is determined from its noiseless measured distance-square ď_i1 to each of three anchors (Figure 2, Figure 61(a)). This trilateration can be expressed as a convex optimization problem in terms of list X ≜ [x₁ x̌₂ x̌₃ x̌₄] ∈ R^{2×4} and Gram matrix G ∈ S⁴ (469):

    minimize_{G ∈ S⁴, X ∈ R^{2×4}}   tr G
    subject to   tr(G Φ_i1) = ď_i1 ,   i = 2, 3, 4
                 tr(G e_i e_i^T) = ‖x̌_i‖² ,   i = 2, 3, 4
                 tr(G (e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,   2 ≤ i < j = 3, 4
                 X(:, 2:4) = [x̌₂ x̌₃ x̌₄]
                 ⎡ I   X ⎤
                 ⎣ X^T G ⎦ ≽ 0        (500)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+        (458)

and where the constraint on distance-square ď_i1 is equivalently written as a constraint on the Gram matrix via (471). There are 9 independent affine equality constraints on that Gram matrix while the sensor is constrained only by dimensioning to lie in R². Although the objective of minimization 4.9 insures a solution on the boundary of positive semidefinite cone S⁴_+ , we claim that the set of feasible Gram matrices forms a line (§2.5.1.1) in isomorphic R^{10} tangent (§2.1.7.2) to the positive semidefinite cone boundary. (confer §6.2.1.3)

4.9 Trace (tr G = ⟨I , G⟩) minimization is a heuristic for rank minimization. (§7.2.2.1) It may be interpreted as squashing G which is bounded below by X^T X as in (501).
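Problem (500) parses almost verbatim in CVX. In the sketch below (assuming the CVX toolbox; the anchor positions and true sensor location are invented test data), X⋆(:,1) recovers the sensor under noiseless measurement.

    % CVX sketch of trilateration problem (500) with three anchors in R^2.
    anch  = [1 0; 0 1; -1 -1]';          % columns are anchors x̌2, x̌3, x̌4
    xtrue = [0.3; 0.2];                  % sensor position to be recovered
    Xf = [xtrue anch]; e = eye(4);
    cvx_begin sdp quiet
      variable G(4,4) symmetric
      variable X(2,4)
      minimize( trace(G) )               % rank heuristic (footnote 4.9)
      for i = 2:4
        Phi = (e(:,1)-e(:,i))*(e(:,1)-e(:,i))';
        trace(G*Phi) == norm(xtrue - Xf(:,i))^2;        % distance-square data
        trace(G*(e(:,i)*e(:,i)')) == norm(Xf(:,i))^2;   % anchor lengths
      end
      for i = 2:3
        for j = i+1:4
          trace(G*(e(:,i)*e(:,j)'+e(:,j)*e(:,i)')/2) == Xf(:,i)'*Xf(:,j);
        end
      end
      X(:,2:4) == anch;
      [eye(2) X; X' G] >= 0;             % Schur-complement relaxation (501)
    cvx_end
    disp(norm(X(:,1) - xtrue))           % ~ 0 under noiseless data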


By Schur complement (§A.4) any feasible G and X provide

    G ≽ X^T X        (501)

which is a convex relaxation (§2.9.1.0.1) of the desired (nonconvex) equality constraint

    ⎡ I   X ⎤   ⎡ I   ⎤
    ⎣ X^T G ⎦ = ⎣ X^T ⎦ [ I  X ]        (502)

expected positive semidefinite rank-2 under noiseless conditions. But by (1089), the relaxation admits

    (3 ≥) rank G ≥ rank X        (503)

(a third dimension corresponding to an intersection of three spheres (not circles) were there noise). If rank of an optimal solution

    ⎡ I    X⋆ ⎤
    ⎣ X⋆^T G⋆ ⎦

equals 2, then G⋆ = X⋆^T X⋆ by Theorem A.4.0.0.2.

As posed, this localization problem does not require affinely independent (Figure 17, three noncollinear) anchors. Assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§4.5.2) and assuming the sensor lies in that affine hull, then solution X⋆(:, 1) is unique under noiseless measurement. [204] This preceding transformation of trilateration to a semidefinite program works all the time (rank G⋆ = 2) despite relaxation (501) because the optimal solution set is a unique point.

Proof (sketch). Only the sensor location x₁ is unknown. The objective function together with the equality constraints make a linear system of equations in Gram matrix variable G

    tr G = ‖x₁‖² + ‖x̌₂‖² + ‖x̌₃‖² + ‖x̌₄‖²
    tr(G Φ_i1) = ď_i1 ,   i = 2, 3, 4        (504)
    tr(G e_i e_i^T) = ‖x̌_i‖² ,   i = 2, 3, 4
    tr(G (e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,   2 ≤ i < j = 3, 4


which is invertible:

             ⎡ svec(I)^T                   ⎤ −1 ⎡ ‖x₁‖² + ‖x̌₂‖² + ‖x̌₃‖² + ‖x̌₄‖² ⎤
             ⎢ svec(Φ₂₁)^T                 ⎥    ⎢ ď₂₁                            ⎥
             ⎢ svec(Φ₃₁)^T                 ⎥    ⎢ ď₃₁                            ⎥
             ⎢ svec(Φ₄₁)^T                 ⎥    ⎢ ď₄₁                            ⎥
    svec G = ⎢ svec(e₂e₂^T)^T              ⎥    ⎢ ‖x̌₂‖²                          ⎥        (505)
             ⎢ svec(e₃e₃^T)^T              ⎥    ⎢ ‖x̌₃‖²                          ⎥
             ⎢ svec(e₄e₄^T)^T              ⎥    ⎢ ‖x̌₄‖²                          ⎥
             ⎢ svec((e₂e₃^T + e₃e₂^T)/2)^T ⎥    ⎢ x̌₂^T x̌₃                        ⎥
             ⎢ svec((e₂e₄^T + e₄e₂^T)/2)^T ⎥    ⎢ x̌₂^T x̌₄                        ⎥
             ⎣ svec((e₃e₄^T + e₄e₃^T)/2)^T ⎦    ⎣ x̌₃^T x̌₄                        ⎦

That line in the ambient space S⁴ of G is traced by the right-hand side, whose only unknown is ‖x₁‖² . One must show this line to be tangential (§2.1.7.2) to S⁴_+ in order to prove uniqueness. Tangency is possible for affine dimension 1 or 2 while its occurrence depends completely on the known measurement data. ∎

But as soon as significant noise is introduced or whenever distance data is incomplete, such problems can remain convex although the set of all optimal solutions generally becomes a convex set bigger than a single point but containing the noiseless solution. We explore ramifications of noise and incomplete data throughout. Recourse to high-rank optimal solution under noiseless conditions is discussed in Example 4.4.2.2.6.

4.4.2.2.5 Definition. Isometric reconstruction. (confer §4.5.3)
Isometric reconstruction from an EDM means building a list X correct to within a rotation, reflection, and translation; in other terms, reconstruction of relative position, correct to within an isometry, or to within a rigid transformation. △

How much distance information is needed to uniquely localize a sensor? The narrative in Figure 61 helps dispel any notion of distance data proliferation in low affine dimension (r < N − 1).


Figure 62: Incomplete EDM corresponding to this dimensionless EDM graph (x₁…x₆) provides unique isometric reconstruction in R². (drawn freehand, no symmetry intended)

Pardalos and his coauthors [128, §4.2] claim O(2N) distances is a least lower bound (independent of affine dimension r) for unique isometric reconstruction; achievable under certain conditions on connectivity and point position. Alfakih shows how to ascertain uniqueness over all affine dimensions via Gale matrix. [7] [2] [3] Figure 57(b) (page 219, from small completion problem Example 4.3.0.0.2) is an example in R² requiring only 2N − 3 = 5 known symmetric entries for unique isometric reconstruction, although the four-point example in Figure 61(b) will not yield a unique reconstruction when any one of the distances is left unspecified.

The list represented by the particular dimensionless EDM graph in Figure 62, having only 2N − 3 = 9 absolute distances specified, is uniquely localizable in R² but has realizations in higher dimensions. For sake of reference, we provide the complete corresponding EDM:

        ⎡     0  50641  56129   8245  18457  26645 ⎤
        ⎢ 50641      0  49300  25994   8810  20612 ⎥
    D = ⎢ 56129  49300      0  24202  31330   9160 ⎥        (506)
        ⎢  8245  25994  24202      0   4680   5290 ⎥
        ⎢ 18457   8810  31330   4680      0   6658 ⎥
        ⎣ 26645  20612   9160   5290   6658      0 ⎦

We push lack of distance information past the envelope in this next example.


Figure 63: Two sensors • (x₁ , x₂) and three anchors ◦ (x̌₃ , x̌₄ , x̌₅) in R². (Ye) Connecting line-segments denote known absolute distances. Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R².

4.4.2.2.6 Example. Tandem trilateration in wireless sensor network.
Given three known absolute point-positions in R² (three anchors x̌₃ , x̌₄ , x̌₅) and two unknown sensors x₁ , x₂ ∈ R² , the sensors' absolute positions are determinable from their noiseless distances-square (as indicated in Figure 63) assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§4.5.2). This example differs from Example 4.4.2.2.4 in so far as trilateration of each sensor is now in terms of one unknown position: the other sensor. We express this localization as a convex optimization problem (a semidefinite program, §6.1) in terms of list X ≜ [x₁ x₂ x̌₃ x̌₄ x̌₅] ∈ R^{2×5} and Gram matrix G ∈ S⁵ (469) via relaxation (501):

    minimize_{G ∈ S⁵, X ∈ R^{2×5}}   tr G
    subject to   tr(G Φ_i1) = ď_i1 ,   i = 2, 4, 5
                 tr(G Φ_i2) = ď_i2 ,   i = 3, 5
                 tr(G e_i e_i^T) = ‖x̌_i‖² ,   i = 3, 4, 5
                 tr(G (e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,   3 ≤ i < j = 4, 5
                 X(:, 3:5) = [x̌₃ x̌₄ x̌₅]
                 ⎡ I   X ⎤
                 ⎣ X^T G ⎦ ≽ 0        (507)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+        (458)


Figure 64: Given in red are two discrete linear trajectories of sensors x₁ and x₂ in R² localized by algorithm (507) as indicated by blue bullets • . Anchors x̌₃ , x̌₄ , x̌₅ corresponding to Figure 63 are indicated by ⊗ . When targets and bullets • coincide under these noiseless conditions, localization is successful. On this run, two visible localization errors are due to rank-3 Gram optimal solutions. These errors can be corrected by choosing a different normal in objective of minimization.


This problem realization is fragile because of the unknown distances between sensors and anchors. Yet there is no more information we may include beyond the 11 independent equality constraints on the Gram matrix (nonredundant constraints not antithetical) to reduce the feasible set 4.11 . (By virtue of their dimensioning, the sensors are already constrained to R² , the affine hull of the anchors.)

Exhibited in Figure 64 are two errors in solution X⋆(:, 1:2) due to a rank-3 optimal Gram matrix G⋆ . The trace objective is a heuristic minimizing convex envelope of quasiconcave rank function 4.12 rank G . (§2.9.2.5.2, §7.2.2.1) A rank-2 optimal Gram matrix can be found by choosing a different normal for the linear objective function, now implicitly the identity matrix I ; id est,

    tr G = ⟨G , I⟩ ← ⟨G , δ(u)⟩        (508)

where vector u ∈ R⁵ is randomly selected. A random search for a good normal δ(u) in only a few iterations is quite easy and effective because the problem is small, an optimal solution is known a priori to exist in two dimensions, a good normal direction is not necessarily unique, and (we speculate) because the feasible affine-subset slices the positive semidefinite cone thinly in the Euclidean sense. 4.13

4.11 the presumably nonempty convex set of all points G and X satisfying the constraints.
4.12 Projection on that nonconvex subset of all N×N-dimensional positive semidefinite matrices whose rank does not exceed 2, in an affine subset, is a problem considered difficult to solve. [229, §4]
4.13 The log det rank-heuristic from §7.2.2.4 does not work here because it chooses the wrong normal. Rank reduction (§6.1.1.1) is unsuccessful here because Barvinok's upper bound (§2.9.3.0.1) on rank of G⋆ is 4.
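The random normal search (508) is a few lines atop the semidefinite program. The following Matlab sketch (assuming the CVX toolbox; anchor and sensor coordinates are invented test data) realizes tandem problem (507) and retries with objective ⟨G , δ(u)⟩ until a rank-2 optimal Gram matrix appears.

    % CVX sketch of tandem trilateration (507) plus random-normal search (508).
    anch = [2 0; -2 1; 0 -2]';           % columns are anchors x̌3, x̌4, x̌5
    s    = [0 0.5; 0.5 -0.5]';           % columns are true sensors x1, x2
    Xf = [s anch]; e = eye(5);
    d2  = @(i,j) norm(Xf(:,i)-Xf(:,j))^2;
    Phi = @(i,j) (e(:,i)-e(:,j))*(e(:,i)-e(:,j))';
    for trial = 1:10
      u = rand(5,1);                     % random normal; objective <G, delta(u)>
      cvx_begin sdp quiet
        variable G(5,5) symmetric
        variable X(2,5)
        minimize( trace(diag(u)*G) )
        trace(G*Phi(1,2)) == d2(1,2);    % inter-sensor measurement (i = 2)
        trace(G*Phi(1,4)) == d2(1,4);  trace(G*Phi(1,5)) == d2(1,5);
        trace(G*Phi(2,3)) == d2(2,3);  trace(G*Phi(2,5)) == d2(2,5);
        for i = 3:5
          trace(G*(e(:,i)*e(:,i)')) == norm(Xf(:,i))^2;
        end
        for i = 3:4
          for j = i+1:5
            trace(G*(e(:,i)*e(:,j)'+e(:,j)*e(:,i)')/2) == Xf(:,i)'*Xf(:,j);
          end
        end
        X(:,3:5) == anch;
        [eye(2) X; X' G] >= 0;
      cvx_end
      if rank(G,1e-6) == 2, break, end   % accept a rank-2 optimal solution
    end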


Figure 65: Regions of coverage by base stations ◦ in a cellular telephone network. The term cellular arises from packing of regions best covered by neighboring base stations. Illustrated is a pentagonal cell best covered by base station x̌₂ . Like a Voronoi diagram, cell geometry depends on base-station arrangement. In some US urban environments, it is not unusual to find base stations spaced approximately 1 mile apart. There can be as many as 20 base-station antennae capable of receiving signal from any given cell phone • ; practically, that number is closer to 6. Depicted is one cell phone x₁ whose signal power is automatically and repeatedly measured by 6 base stations ◦ nearby. (Cell phone signal power is typically encoded logarithmically with 1-decibel increment and 64-decibel dynamic range.) Those signal power measurements are transmitted from that cell phone to base station x̌₂ who decides whether to transfer (hand-off or hand-over) responsibility for that call should the user roam outside its cell. (Because distance to base station is quite difficult to infer from signal power measurements in an urban environment, localization of a particular cell phone • by distance geometry would be far easier were the whole cellular system instead conceived so cell phone x₁ also transmits (to base station x̌₂) its signal power as received by all other cell phones within range.)


4.4.2.2.7 Example. Cellular telephone network.
Utilizing measurements of distance, time of flight, angle of arrival, or signal power, multilateration is the process of localizing (determining absolute position of) a radio signal source • by inferring geometry relative to multiple fixed base stations ◦ whose locations are known. We consider localization of a regular cellular telephone by distance geometry, so we assume distance to any particular base station can be inferred by received signal power. On a large open flat expanse of terrain, signal-power measurement corresponds well with inverse distance. But it is not uncommon for measurement of signal power to suffer 20 decibels in loss caused by factors such as multipath interference (reflections), mountainous terrain, man-made structures, turning one's head, or rolling the windows up in an automobile. Consequently, contours of equal signal power are no longer circular; their geometry is irregular and would more aptly be approximated by translated ellipses of various orientation.

Due to noise, at least one distance measurement more than the minimum number of measurements is required for reliable localization in practice; 3 measurements are minimum in two dimensions, 4 in three. We are curious to know how these convex optimization algorithms fare in the face of measurement noise, and how they compare with traditional methods solving simultaneous hyperbolic equations; but we leave that question open for now as our purpose here is to illustrate how a problem in distance geometry can be solved without any measured distance data, just upper and lower bounds on it. Existence of noise precludes measured distance from the input data. We instead assign measured distance to a known range specified by individual upper and lower bounds; d̄_i1 is the upper bound on actual distance-square from the cell phone to the i-th base station, while d̲_i1 is the lower bound. These bounds become the input data. Each measurement range is presumed different from the others.

Then convex problem (500) takes the form:

    minimize_{G ∈ S⁷, X ∈ R^{2×7}}   tr G
    subject to   d̲_i1 ≤ tr(G Φ_i1) ≤ d̄_i1 ,   i = 2…7
                 tr(G e_i e_i^T) = ‖x̌_i‖² ,   i = 2…7
                 tr(G (e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,   2 ≤ i < j = 3…7
                 X(:, 2:7) = [x̌₂ x̌₃ x̌₄ x̌₅ x̌₆ x̌₇]
                 ⎡ I   X ⎤
                 ⎣ X^T G ⎦ ≽ 0        (509)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+        (458)

This semidefinite program realizes the problem illustrated in Figure 65. We take X⋆(:, 1) as solution, although measurement noise will often cause rank G⋆ to exceed 2. Randomized search for a rank-2 optimal solution is not so easy here as in Example 4.4.2.2.6. We revisit this problem in... To formulate the same problem in three dimensions, X is simply redimensioned in the semidefinite program.


4.4.3 Inner-product form EDM definition

We might, for example, realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars). [p.34]

In the rigid covalent geometry approximation, the bond lengths and angles are treated as completely fixed, so that a given spatial structure can be described very compactly indeed by a list of torsion angles alone... These are the dihedral angles between the planes spanned by the two consecutive triples in a chain of four covalently bonded atoms. [49, §1.1]

Equivalent to (456) is [249, §1-7] [212, §3.2]

    d_ij = d_ik + d_kj − 2√(d_ik d_kj) cos θ_ikj

         = [√d_ik  √d_kj] ⎡     1        −e^{ıθ_ikj} ⎤ ⎡ √d_ik ⎤        (510)
                          ⎣ −e^{−ıθ_ikj}      1      ⎦ ⎣ √d_kj ⎦

called the law of cosines, where ı ≜ √−1 , i, k, j are positive integers, and θ_ikj is the angle at vertex x_k formed by vectors x_i − x_k and x_j − x_k ;

    cos θ_ikj = (d_ik + d_kj − d_ij) / (2√(d_ik d_kj)) = (x_i − x_k)^T (x_j − x_k) / (‖x_i − x_k‖ ‖x_j − x_k‖)        (511)

where the numerator forms an inner product of vectors.
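Identities (510) and (511) are straightforward to verify numerically; a minimal Matlab check (base Matlab only, our variable names) for three arbitrary points with vertex x_k :

    % Numeric check of law of cosines (510) and inner-product identity (511).
    xi = randn(3,1); xk = randn(3,1); xj = randn(3,1);
    dik = norm(xi-xk)^2; dkj = norm(xj-xk)^2; dij = norm(xi-xj)^2;
    ct = (xi-xk)'*(xj-xk)/(norm(xi-xk)*norm(xj-xk));    % cos(theta_ikj), (511)
    disp(dij - (dik + dkj - 2*sqrt(dik*dkj)*ct))        % ~ 0 : law of cosines (510)
    disp(ct - (dik + dkj - dij)/(2*sqrt(dik*dkj)))      % ~ 0 : first equality in (511)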


Distance-square d_ij([√d_ik ; √d_kj]) is a convex quadratic function 4.14 on R²_+ whereas d_ij(θ_ikj) is quasiconvex (§3.2) minimized over domain −π ≤ θ_ikj ≤ π by θ⋆_ikj = 0 ; we get the Pythagorean theorem when θ_ikj = ±π/2 , and d_ij(θ_ikj) is maximized when θ⋆_ikj = ±π ;

    d_ij = (√d_ik + √d_kj)² ,   θ_ikj = ±π
    d_ij = d_ik + d_kj ,   θ_ikj = ±π/2        (512)
    d_ij = (√d_ik − √d_kj)² ,   θ_ikj = 0

so

    |√d_ik − √d_kj| ≤ √d_ij ≤ √d_ik + √d_kj        (513)

Hence the triangle inequality, Euclidean metric property 4, holds for any EDM D .

We may construct an inner-product form of the EDM definition for matrices by evaluating (510) for k = 1 : By defining

           ⎡ d₁₂                 √(d₁₂d₁₃) cos θ₂₁₃  √(d₁₂d₁₄) cos θ₂₁₄  ···  √(d₁₂d₁N) cos θ₂₁N ⎤
           ⎢ √(d₁₂d₁₃) cos θ₂₁₃  d₁₃                 √(d₁₃d₁₄) cos θ₃₁₄  ···  √(d₁₃d₁N) cos θ₃₁N ⎥
    Θ^TΘ ≜ ⎢ √(d₁₂d₁₄) cos θ₂₁₄  √(d₁₃d₁₄) cos θ₃₁₄  d₁₄                 ⋱    √(d₁₄d₁N) cos θ₄₁N ⎥ ∈ S^{N−1}        (514)
           ⎢ ⋮                   ⋮                   ⋱                   ⋱    ⋮                  ⎥
           ⎣ √(d₁₂d₁N) cos θ₂₁N  √(d₁₃d₁N) cos θ₃₁N  √(d₁₄d₁N) cos θ₄₁N  ···  d₁N                ⎦

then any EDM may be expressed

    D(Θ) ≜ ⎡ 0       ⎤ 1^T + 1 [0  δ(Θ^TΘ)^T] − 2 ⎡ 0  0^T  ⎤ = ⎡ 0        δ(Θ^TΘ)^T                         ⎤ ∈ EDM^N        (515)
           ⎣ δ(Θ^TΘ) ⎦                            ⎣ 0  Θ^TΘ ⎦   ⎣ δ(Θ^TΘ)  δ(Θ^TΘ)1^T + 1δ(Θ^TΘ)^T − 2Θ^TΘ  ⎦

    EDM^N = { D(Θ) | Θ ∈ R^{N−1×N−1} }        (516)

for which all Euclidean metric properties hold. Entries of Θ^TΘ result from vector inner-products as in (511); id est,

4.14 ⎡ 1  −e^{ıθ_ikj} ; −e^{−ıθ_ikj}  1 ⎤ ≽ 0 , having eigenvalues {0, 2}. Minimum is attained for [√d_ik ; √d_kj] = μ1 , μ ≥ 0 , when θ_ikj = 0 ; it is attained only at 0 when −π ≤ θ_ikj ≤ π , θ_ikj ≠ 0 . (§D.2.1, [38, exmp.4.5])


    Θ = [x₂ − x₁  x₃ − x₁  ···  x_N − x₁] = X√2 V_N ∈ R^{n×N−1}        (517)

Inner product Θ^TΘ is obviously related to a Gram matrix (469),

    G = ⎡ 0  0^T  ⎤ ,   x₁ = 0        (518)
        ⎣ 0  Θ^TΘ ⎦

For D = D(Θ) and no condition on the list X (confer (477) (482))

    Θ^TΘ = −V_N^T D V_N ∈ R^{N−1×N−1}        (519)

4.4.3.1 Relative-angle form

The inner-product form EDM definition is not a unique definition of Euclidean distance matrix; there are approximately five flavors distinguished by their argument to operator D . Here is another one: Like D(X) (460), D(Θ) will make an EDM given any Θ ∈ R^{n×N−1} , it is not a convex function of Θ (§4.4.3.2), and it is homogeneous in the sense (463). Scrutinizing Θ^TΘ (514) we find that because of the arbitrary choice k = 1 , distances therein are all with respect to point x₁ . Similarly, relative angles in Θ^TΘ are between all vector pairs having vertex x₁ . Yet picking arbitrary θ_i1j to fill Θ^TΘ will not necessarily make an EDM; inner product (514) must be positive semidefinite.

                           ⎡ √d₁₂       ⎤ ⎡ 1          cos θ₂₁₃  ···  cos θ₂₁N ⎤ ⎡ √d₁₂       ⎤
    Θ^TΘ = √δ(d) Ω √δ(d) ≜ ⎢    √d₁₃    ⎥ ⎢ cos θ₂₁₃   1         ⋱    cos θ₃₁N ⎥ ⎢    √d₁₃    ⎥        (520)
                           ⎢       ⋱    ⎥ ⎢ ⋮          ⋱         ⋱    ⋮        ⎥ ⎢       ⋱    ⎥
                           ⎣       √d₁N ⎦ ⎣ cos θ₂₁N   cos θ₃₁N  ···  1        ⎦ ⎣       √d₁N ⎦

Expression D(Θ) defines an EDM for any positive semidefinite relative-angle matrix

    Ω = [cos θ_i1j , i, j = 2…N] ∈ S^{N−1}        (521)

and any nonnegative distance vector

    d = [d_1j , j = 2…N] = δ(Θ^TΘ) ∈ R^{N−1}        (522)


because (§A.3.1.0.5)

    Ω ≽ 0   ⇒   Θ^TΘ ≽ 0        (523)

Decomposition (520) and the relative-angle matrix inequality Ω ≽ 0 lead to a different expression of an inner-product form EDM definition (515)

    D(Ω, d) ≜ ⎡ 0 ⎤ 1^T + 1 [0  d^T] − 2 ⎡ 0  0^T           ⎤ = ⎡ 0  d^T                          ⎤ ∈ EDM^N        (524)
              ⎣ d ⎦                      ⎣ 0  √δ(d) Ω √δ(d) ⎦   ⎣ d  d1^T + 1d^T − 2√δ(d) Ω √δ(d) ⎦

and another expression of the EDM cone:

    EDM^N = { D(Ω, d) | Ω ≽ 0 ,  √δ(d) ≽ 0 }        (525)

In the particular circumstance x₁ = 0 , we can relate interpoint angle matrix Ψ from the Gram decomposition in (469) to relative-angle matrix Ω in (520). Thus,

    Ψ ≡ ⎡ 1  0^T ⎤ ,   x₁ = 0        (526)
        ⎣ 0  Ω   ⎦

4.4.3.2 Inner-product form −V_N^T D(Θ) V_N convexity

We saw that each EDM entry d_ij is a convex quadratic function of [√d_ik ; √d_kj] and a quasiconvex function of θ_ikj . Here the situation for inner-product form EDM operator D(Θ) (515) is identical to that in §4.4.1 for list-form D(X) ; −D(Θ) is not a quasiconvex function of Θ by the same reasoning, and from (519)

    −V_N^T D(Θ) V_N = Θ^TΘ        (527)

is a convex quadratic function of Θ on domain R^{n×N−1} achieving its minimum at Θ = 0 .
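The relative-angle decomposition is easily exercised numerically. This Matlab sketch (a minimal check assuming only base Matlab) extracts d and Ω from a list via (517)-(522), confirms (527), and rebuilds the EDM through D(Ω, d) (524):

    % Numeric check of (520), (524), and (527) from an arbitrary list X.
    N = 5; X = randn(2,N);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    Theta = X*sqrt(2)*VN;                % = [x_2-x_1 ... x_N-x_1], (517)
    TT = Theta'*Theta;
    g  = diag(X'*X);
    D  = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);         % EDM of the list (460)
    disp(norm(TT - (-VN'*D*VN),'fro'))   % ~ 0 : identity (527)
    d  = diag(TT);                       % distance-square to x_1, (522)
    Om = diag(1./sqrt(d))*TT*diag(1./sqrt(d));          % relative angles, (520)
    B  = diag(sqrt(d))*Om*diag(sqrt(d)); % = sqrt(delta(d))*Omega*sqrt(delta(d))
    D2 = [0 d'; d  d*ones(1,N-1) + ones(N-1,1)*d' - 2*B];   % D(Omega,d), (524)
    disp(norm(D - D2,'fro'))             % ~ 0 up to roundoff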


4.4.3.3 Inner-product form, discussion

We deduce that knowledge of interpoint distance is equivalent to knowledge of distance and angle from the perspective of one point, x₁ in our chosen case. The total amount of information in Θ^TΘ , N(N−1)/2 , is unchanged 4.15 with respect to EDM D .

4.5 Invariance

When D is an EDM, there exist an infinite number of corresponding N-point lists X (64) in Euclidean space. All those lists are related by isometric transformation: rotation, reflection, and translation (offset or shift).

4.5.1 Translation

Any translation common among all the points x_l in a list will be cancelled in the formation of each d_ij . Proof follows directly from (456). Knowing that translation α in advance, we may remove it from the list constituting the columns of X by subtracting α1^T . Then it stands to reason by list-form definition (460) of an EDM, for any translation α ∈ R^n

    D(X − α1^T) = D(X)        (528)

In words, interpoint distances are unaffected by offset; EDM D is translation invariant. When α = x₁ in particular,

    [x₂ − x₁  x₃ − x₁  ···  x_N − x₁] = X√2 V_N ∈ R^{n×N−1}        (517)

and so

    D(X − x₁1^T) = D(X − Xe₁1^T) = D(X [0  √2 V_N]) = D(X)        (529)

4.15 The reason for the amount O(N²) of information is because of the relative measurements. The use of a fixed reference in the measurement of angles and distances would reduce the required information but is antithetical. In the particular case n = 2 , for example, ordering all points x_l in a length-N list by increasing angle of vector x_l − x₁ with respect to x₂ − x₁ , θ_i1j becomes equivalent to Σ_{k=i}^{j−1} θ_{k,1,k+1} ≤ 2π and the amount of information is reduced to 2N − 3 ; rather, O(N) .


4.5.1.0.1 Example. Translating geometric center to origin.
We might choose to shift the geometric center α_c of an N-point list {x_l} (arranged columnar in X) to the origin; [228] [91]

    α = α_c ≜ X b_c ≜ (1/N) X1 ∈ P ⊆ A        (530)

If we were to associate a point-mass m_l with each of the points x_l in the list, then their center of mass (or gravity) would be (Σ x_l m_l)/Σ m_l . The geometric center is the same as the center of mass under the assumption of uniform mass density across points. [135] The geometric center always lies in the convex hull P of the list; id est, α_c ∈ P because b_c^T 1 = 1 and b_c ≽ 0 . 4.16 Subtracting the geometric center from every list member,

    X − α_c1^T = X − (1/N)X11^T = X(I − (1/N)11^T) = XV ∈ R^{n×N}        (531)

So we have (confer (460))

    D(X) = D(XV) = δ(V^T X^T X V)1^T + 1δ(V^T X^T X V)^T − 2V^T X^T X V ∈ EDM^N        (532)

4.5.1.1 Gram-form invariance

Consequent to list translation invariance, the Gram-form EDM operator (472) exhibits invariance to translation by a doublet (§B.2) u1^T + 1u^T ;

    D(G) = D(G − (u1^T + 1u^T))        (533)

because, for any u ∈ R^N , D(u1^T + 1u^T) = 0 . The collection of all such doublets forms the nullspace to the operator (542); the translation-invariant subspace S^N⊥_c (1563) of the symmetric matrices S^N . This means matrix G can belong to an expanse more broad than a positive semidefinite cone; id est, G ∈ S^N_+ − S^N⊥_c . So explains Gram matrix sufficiency in EDM definition (472).

4.16 Any b from α = Xb chosen such that b^T 1 = 1 , more generally, makes an auxiliary V-matrix. (§B.4.5)
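Translation invariance (528) and the centering identity (532) admit a one-screen numeric confirmation (a minimal sketch in base Matlab):

    % Numeric check of translation invariance (528) and centering (531)-(532).
    N = 6; n = 3;
    X = randn(n,N);
    EDM = @(Y) diag(Y'*Y)*ones(1,N) + ones(N,1)*diag(Y'*Y)' - 2*(Y'*Y);  % (460)
    alpha = randn(n,1);
    disp(norm(EDM(X) - EDM(X - alpha*ones(1,N)),'fro'))   % (528): ~ 0
    V = eye(N) - ones(N)/N;              % geometric centering matrix (531)
    disp(norm(EDM(X) - EDM(X*V),'fro'))                   % (532): ~ 0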


4.5.2 Rotation/Reflection

Rotation of the list X ∈ R^{n×N} about some arbitrary point α ∈ R^n , or reflection through some affine subset containing α , can be accomplished via Q(X − α1^T) where Q is an orthogonal matrix (§B.5). We rightfully expect

    D(Q(X − α1^T)) = D(QX − β1^T) = D(QX) = D(X)        (534)

Because list-form D(X) is translation invariant, we may safely ignore offset and consider only the impact of matrices that premultiply X . Interpoint distances are unaffected by rotation or reflection; we say, EDM D is rotation/reflection invariant. Proof follows from the fact, Q^T = Q^{−1} ⇒ X^T Q^T Q X = X^T X . So (534) follows directly from (460).

The class of premultiplying matrices for which interpoint distances are unaffected is a little more broad than orthogonal matrices. Looking at EDM definition (460), it appears that any matrix Q_p such that

    X^T Q_p^T Q_p X = X^T X        (535)

will have the property

    D(Q_p X) = D(X)        (536)

An example is skinny Q_p ∈ R^{m×n} (m > n) having orthonormal columns. We call such a matrix orthonormal.

4.5.2.1 Inner-product form invariance

Likewise, D(Θ) (515) is rotation/reflection invariant;

    D(Q_p Θ) = D(QΘ) = D(Θ)        (537)

so (535) and (536) similarly apply.

4.5.3 Invariance conclusion

In the making of an EDM, absolute rotation, reflection, and translation information is lost. Given an EDM, reconstruction of point position (§4.12, the list X) can be guaranteed correct only in affine dimension r and relative position. Given a noiseless complete EDM, this isometric reconstruction is unique in so far as every realization of a corresponding list X is congruent:


4.6 Injectivity of D & unique reconstruction

Injectivity implies uniqueness of isometric reconstruction; hence, we endeavor to demonstrate it.

EDM operators list-form D(X) (460), Gram-form D(G) (472), and inner-product form D(Θ) (515) are many-to-one surjections (§4.5) onto the same range; the EDM cone (5): (confer (473) (544))

    EDM^N = { D(X) : R^{N−1×N} → S^N_h | X ∈ R^{N−1×N} }
          = { D(G) : S^N → S^N_h | G ∈ S^N_+ − S^N⊥_c }        (538)
          = { D(Θ) : R^{N−1×N−1} → S^N_h | Θ ∈ R^{N−1×N−1} }

where (§4.5.1.1)

    S^N⊥_c = { u1^T + 1u^T | u ∈ R^N } ⊆ S^N        (1563)

4.6.1 Gram-form bijectivity

Because linear Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G        (472)

has no nullspace [47, §A.1] on the geometric center subspace 4.17 (§E.7.2.0.2)

    S^N_c ≜ { G ∈ S^N | G1 = 0 }        (1561)
          = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }
          = { V Y V | Y ∈ S^N } ⊂ S^N        (1562)
          ≡ { V_N A V_N^T | A ∈ S^{N−1} }        (539)

then D(G) on that subspace is injective.

4.17 The equivalence ≡ in (539) follows from the fact: Given B = V Y V = V_N A V_N^T ∈ S^N_c with only matrix A ∈ S^{N−1} unknown, then V_N^† B V_N^{†T} = A or V_N^† Y V_N^{†T} = A .


Figure 66: Orthogonal complements in S^N abstractly oriented in isometrically isomorphic R^{N(N+1)/2} ; case N = 2 accurately illustrated in R³ . Indicated are basis S^N_c = V{E_ij}V (confer (49)), bases for S^N⊥_c and S^N_h , and members of basis S^N⊥_h ; dim S^N_c = dim S^N_h = N(N−1)/2 in R^{N(N+1)/2} while dim S^N⊥_c = dim S^N⊥_h = N in R^{N(N+1)/2} . Orthogonal projection of basis for S^N⊥_h on S^N⊥_c yields another basis for S^N⊥_c . (Basis vectors for S^N⊥_c are illustrated lying in a plane orthogonal to S^N_c in this dimension. Basis vectors for each ⊥ space outnumber those for its respective orthogonal complement; such is not the case in higher dimension.)


To prove injectivity of D(G) on S^N_c : Any matrix Y ∈ S^N can be decomposed into orthogonal components in S^N ;

    Y = V Y V + (Y − V Y V)        (540)

where V Y V ∈ S^N_c and Y − V Y V ∈ S^N⊥_c (1563). Because of translation invariance (§4.5.1.1) and linearity, D(Y − V Y V) = 0 hence N(D) ⊇ S^N⊥_c . It remains only to show

    D(V Y V) = 0   ⇔   V Y V = 0        (541)

(⇔ Y = u1^T + 1u^T for some u ∈ R^N). D(V Y V) will vanish whenever 2V Y V = δ(V Y V)1^T + 1δ(V Y V)^T . But this implies R(1) (§B.2) were a subset of R(V Y V) , which is contradictory. Thus we have

    N(D) = { Y | D(Y) = 0 } = { Y | V Y V = 0 } = S^N⊥_c        (542)

Since G1 = 0 ⇔ X1 = 0 (481) simply means list X is geometrically centered at the origin, and because the Gram-form EDM operator D is translation invariant and N(D) is the translation-invariant subspace S^N⊥_c , then EDM definition D(G) (538) on 4.18

    S^N_c ∩ S^N_+ = { V Y V ≽ 0 | Y ∈ S^N } ≡ { V_N A V_N^T | A ∈ S^{N−1}_+ } ⊂ S^N        (543)

(confer §5.6.1, §5.7.1) must be surjective onto EDM^N ; (confer (473))

    EDM^N = { D(G) | G ∈ S^N_c ∩ S^N_+ }        (544)

4.6.1.1 Gram-form operator D inversion

Define the linear geometric centering operator V ; (confer (482))

    V(D) : S^N → S^N ≜ −½VDV        (545)

[50, §4.3] 4.19 This orthogonal projector V has no nullspace on

    S^N_h = aff EDM^N        (791)

4.18 The equivalence ≡ in (543) follows from the fact: Given B = V Y V = V_N A V_N^T ∈ S^N_+ with only matrix A unknown, then V_N^† B V_N^{†T} = A and A ∈ S^{N−1}_+ must be positive semidefinite by positive semidefiniteness of B and Corollary A.3.1.0.5.
4.19 Critchley cites Torgerson (1958) [225, ch.11, §2] for a history and derivation of (545).


because the projection of −D/2 on S^N_c (1561) can be 0 if and only if D ∈ S^N⊥_c ; but S^N⊥_c ∩ S^N_h = 0 (Figure 66). Projector V on S^N_h is therefore injective hence invertible. Further, −V S^N_h V/2 is equivalent to the geometric center subspace S^N_c in the ambient space of symmetric matrices; a surjection,

    S^N_c = V(S^N) = V(S^N_h ⊕ S^N⊥_h) = V(S^N_h)        (546)

because (61)

    V(S^N_h) ⊇ V(S^N⊥_h) = V(δ²(S^N))        (547)

Because D(G) on S^N_c is injective, and aff D(V(EDM^N)) = D(V(aff EDM^N)) by property (103) of the affine hull, we find for D ∈ S^N_h (confer (486))

    D(−½VDV) = δ(−½VDV)1^T + 1δ(−½VDV)^T − 2(−½VDV)        (548)

id est,

    D = D(V(D))        (549)

or

    −VDV = V(D(−VDV))        (550)

    S^N_h = D(V(S^N_h))        (551)

    −V S^N_h V = V(D(−V S^N_h V))        (552)

These operators V and D are mutual inverses.

The Gram-form D(S^N_c) (472) is equivalent to S^N_h ;

    D(S^N_c) = D(V(S^N_h ⊕ S^N⊥_h)) = S^N_h + D(V(S^N⊥_h)) = S^N_h        (553)

because S^N_h ⊇ D(V(S^N⊥_h)) . In summary, for the Gram-form we have the isomorphisms [51, §2] [50, p.76, p.107] [5, §2.1] 4.20 [4, §2] [6, §18.2.1] [1, §2.1]

    S^N_h = D(S^N_c)        (554)
    S^N_c = V(S^N_h)        (555)

and from the bijectivity results in §4.6.1,

    EDM^N = D(S^N_c ∩ S^N_+)        (556)
    S^N_c ∩ S^N_+ = V(EDM^N)        (557)

4.20 [5, p.6, line 20] Delete sentence: Since G is also...not a singleton set. [5, p.10, line 11] x₃ = 2 (not 1).
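Mutual inversion (549)-(550) can be confirmed numerically in a few lines (a minimal sketch, base Matlab only):

    % Numeric check of mutual inversion: D = D(V(D)) on S^N_h, (549)-(550).
    N = 5; X = randn(3,N);
    g = diag(X'*X);
    D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);            % an EDM, hence D in S^N_h
    V = eye(N) - ones(N)/N;
    Dop = @(G) diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % Gram-form D(G) (472)
    Vop = @(D) -V*D*V/2;                                       % centering operator V (545)
    disp(norm(D - Dop(Vop(D)),'fro'))                     % (549): ~ 0
    disp(norm(-V*D*V - Vop(Dop(-V*D*V)),'fro'))           % (550): ~ 0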


4.6.2 Inner-product form bijectivity

The Gram-form EDM operator D(G) = δ(G)1^T + 1δ(G)^T − 2G (472) is an injective map, for example, on the domain that is the subspace of symmetric matrices having all zeros in the first row and column

    S^N_1 ≜ { G ∈ S^N | G e₁ = 0 } = { ⎡ 0  0^T ⎤ Y ⎡ 0  0^T ⎤ | Y ∈ S^N }        (1565)
                                       ⎣ 0  I   ⎦   ⎣ 0  I   ⎦

because it obviously has no nullspace there. Since G e₁ = 0 ⇔ X e₁ = 0 (474) means the first point in the list X resides at the origin, then D(G) on S^N_1 ∩ S^N_+ must be surjective onto EDM^N .

Substituting Θ^TΘ ← −V_N^T D V_N (527) into inner-product form EDM definition D(Θ) (515), it may be further decomposed: (confer (480))

    D(D) = ⎡ 0                ⎤ 1^T + 1 [0  δ(−V_N^T D V_N)^T] − 2 ⎡ 0  0^T           ⎤        (558)
           ⎣ δ(−V_N^T D V_N)  ⎦                                    ⎣ 0  −V_N^T D V_N  ⎦

This linear operator D is another flavor of inner-product form and an injective map of the EDM cone onto itself. Yet when its domain is instead the entire symmetric hollow subspace S^N_h = aff EDM^N , D(D) becomes an injective map onto that same subspace. Proof follows directly from the fact: linear D has no nullspace [47, §A.1] on S^N_h = aff D(EDM^N) = D(aff EDM^N) (103).

4.6.2.1 Inversion of D(−V_N^T D V_N)

Injectivity of D(D) suggests inversion of (confer (477))

    V_N(D) : S^N → S^{N−1} ≜ −V_N^T D V_N        (559)

a linear surjective 4.21 mapping onto S^{N−1} having nullspace 4.22 S^N⊥_c ;

    V_N(S^N_h) = S^{N−1}        (560)

4.21 Surjectivity of V_N(D) is demonstrated via the Gram-form EDM operator D(G) : Since S^N_h = D(S^N_c) (553), then for any Y ∈ S^{N−1} , −V_N^T D(V_N^{†T} Y V_N^† /2) V_N = Y .
4.22 N(V_N) ⊇ S^N⊥_c is apparent. There exists a linear mapping T(V_N(D)) ≜ V_N^{†T} V_N(D) V_N^† = −½VDV = V(D) such that N(T(V_N)) = N(V) ⊇ N(V_N) ⊇ S^N⊥_c = N(V) , where the equality S^N⊥_c = N(V) is known (§E.7.2.0.2).


injective on domain S^N_h because S^N⊥_c ∩ S^N_h = 0 . Revising the argument of this inner-product form (558), we get another flavor

    D(−V_N^T D V_N) = ⎡ 0                ⎤ 1^T + 1 [0  δ(−V_N^T D V_N)^T] − 2 ⎡ 0  0^T           ⎤        (561)
                      ⎣ δ(−V_N^T D V_N)  ⎦                                    ⎣ 0  −V_N^T D V_N  ⎦

and we obtain mutual inversion of operators V_N and D , for D ∈ S^N_h

    D = D(V_N(D))        (562)

or

    −V_N^T D V_N = V_N(D(−V_N^T D V_N))        (563)

    S^N_h = D(V_N(S^N_h))        (564)

    −V_N^T S^N_h V_N = V_N(D(−V_N^T S^N_h V_N))        (565)

Substituting Θ^TΘ ← Φ into inner-product form EDM definition (515), any EDM may be expressed by the new flavor

    D(Φ) ≜ ⎡ 0    ⎤ 1^T + 1 [0  δ(Φ)^T] − 2 ⎡ 0  0^T ⎤ ∈ EDM^N   ⇔   Φ ≽ 0        (566)
           ⎣ δ(Φ) ⎦                         ⎣ 0  Φ   ⎦

where this D is a linear surjective operator onto EDM^N by definition, injective because it has no nullspace on domain S^{N−1}_+ . More broadly, aff D(S^{N−1}_+) = D(aff S^{N−1}_+) (103),

    S^N_h = D(S^{N−1})
    S^{N−1} = V_N(S^N_h)        (567)

demonstrably isomorphisms, and by bijectivity of this inner-product form:

    EDM^N = D(S^{N−1}_+)        (568)
    S^{N−1}_+ = V_N(EDM^N)        (569)


4.7 Embedding in affine hull

The affine hull A (66) of a point list {x_l} (arranged columnar in X ∈ R^{n×N} (64)) is identical to the affine hull of that polyhedron P (71) formed from all convex combinations of the x_l ; [38, §2] [194, §17]

    A = aff X = aff P        (570)

Comparing hull definitions (66) and (71), it becomes obvious that the x_l and their convex hull P are embedded in their unique affine hull A ;

    A ⊇ P ⊇ {x_l}        (571)

Recall: affine dimension r is a lower bound on embedding, equal to dimension of the subspace parallel to that nonempty affine set A in which the points are embedded. (§2.3.1) We define dimension of the convex hull P to be the same as dimension r of the affine hull A [194, §2], but r is not necessarily equal to the rank of X (590).

For the particular example illustrated in Figure 56, P is the triangle plus its relative interior while its three vertices constitute the entire list X . The affine hull A is the unique plane that contains the triangle, so r = 2 in that example while the rank of X is 3. Were there only two points in Figure 56, then the affine hull would instead be the unique line passing through them; r would become 1 while the rank would then be 2.

4.7.1 Determining affine dimension

Knowledge of affine dimension r becomes important because we lose any absolute offset common to all the generating x_l in R^n when reconstructing convex polyhedra given only distance information. (§4.5.1) To calculate r , we first remove any offset that serves to increase dimensionality of the subspace required to contain polyhedron P ; subtracting any α ∈ A in the affine hull from every list member will work,

    X − α1^T        (572)

translating A to the origin: 4.23

    A − α = aff(X − α1^T) = aff(X) − α        (573)
    P − α = conv(X − α1^T) = conv(X) − α        (574)

4.23 The manipulation of hull functions aff and conv follows from their definitions.


Because (570) and (571) translate,

    R^n ⊇ A − α = aff(X − α1^T) = aff(P − α) ⊇ P − α ⊇ {x_l − α}        (575)

where from the previous relations it is easily shown

    aff(P − α) = aff(P) − α        (576)

Translating A changes neither its dimension nor the dimension of the embedded polyhedron P ; (65)

    r ≜ dim A = dim(A − α) ≜ dim(P − α) = dim P        (577)

For any α ∈ R^n , (573)-(577) remain true. [194, p.4, p.12] Yet when α ∈ A , the affine set A − α becomes a unique subspace of R^n in which the {x_l − α} and their convex hull P − α are embedded (575), and whose dimension is more easily calculated.

4.7.1.0.1 Example. Translating first list-member to origin.
Subtracting the first member α ≜ x₁ from every list member will translate their affine hull A and their convex hull P and, in particular, x₁ ∈ P ⊆ A to the origin in R^n ; videlicet,

    X − x₁1^T = X − Xe₁1^T = X(I − e₁1^T) = X [0  √2 V_N] ∈ R^{n×N}        (578)

where V_N is defined in (466), and e₁ in (476). Applying (575) to (578),

    R^n ⊇ R(X V_N) = A − x₁ = aff(X − x₁1^T) = aff(P − x₁) ⊇ P − x₁ ∋ 0        (579)

where X V_N ∈ R^{n×N−1} . Hence

    r = dim R(X V_N)        (580)

Since shifting the geometric center to the origin (§4.5.1.0.1) translates the affine hull to the origin as well, then it must also be true

    r = dim R(XV)        (581)


For any matrix whose range is R(V) = N(1^T) we get the same result; e.g.,

    r = dim R(X V_N^{†T})        (582)

because

    R(XV) = { Xz | z ∈ N(1^T) }        (583)

and R(V) = R(V_N) = R(V_N^{†T}) (§E). These auxiliary matrices (§B.4.2) are more closely related;

    V = V_N V_N^†        (1242)

4.7.1.1 Affine dimension r versus rank

Now, suppose D is an EDM as defined by

    D(X) ≜ δ(X^TX)1^T + 1δ(X^TX)^T − 2X^TX ∈ EDM^N        (460)

and we premultiply by −V_N^T and postmultiply by V_N . Then because V_N^T 1 = 0 (467), it is always true that

    −V_N^T D V_N = 2V_N^T X^T X V_N = 2V_N^T G V_N ∈ S^{N−1}        (584)

where G is a Gram matrix. (confer (482)) Similarly pre- and postmultiplying by V ,

    −VDV = 2V X^T X V = 2V G V ∈ S^N        (585)

always holds because V1 = 0 (1232). Likewise, multiplying inner-product form EDM definition (515), it always holds:

    −V_N^T D V_N = Θ^TΘ ∈ S^{N−1}        (519)

For any matrix A , rank A^TA = rank A = rank A^T . [125, §0.4] 4.24 So, by (583), affine dimension

    r = rank XV = rank X V_N = rank X V_N^{†T} = rank Θ
      = rank VDV = rank VGV = rank V_N^T D V_N = rank V_N^T G V_N        (586)

4.24 For A ∈ R^{m×n} , N(A^TA) = N(A) . [212, §3.3]
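The rank equivalences (586) are easy to observe numerically; below, a Matlab sketch (base Matlab, our variable names) draws a list from an r-dimensional affine subset and compares several expressions:

    % Numeric illustration of (586): equivalent rank expressions for affine
    % dimension r, using N points confined to an r-dimensional affine subset.
    n = 5; N = 8; r = 2;
    X = randn(n,r)*randn(r,N) + randn(n,1)*ones(1,N);    % affine dimension r list
    g = diag(X'*X);
    D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
    V  = eye(N) - ones(N)/N;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    disp([rank(X*VN) rank(X*V) rank(V*D*V) rank(VN'*D*VN)])   % all display r = 2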


By conservation of dimension, (§A.7.2.0.1)

    r + dim N(V_N^T D V_N) = N − 1        (587)
    r + dim N(VDV) = N        (588)

For D ∈ EDM^N

    −V_N^T D V_N ≻ 0   ⇔   r = N − 1        (589)

but −VDV ⊁ 0 . The general fact 4.25 (confer (493))

    r ≤ min{n , N − 1}        (590)

is evident from (578) but can be visualized in the example illustrated in Figure 56. There we imagine a vector from the origin to each point in the list. Those three vectors are linearly independent in R³ , but affine dimension r is 2 because the three points lie in a plane. When that plane is translated to the origin, it becomes the only subspace of dimension r = 2 that can contain the translated triangular polyhedron.

4.7.2 Précis

We collect expressions for affine dimension: for list X ∈ R^{n×N} and Gram matrix G ∈ S^N_+

    r ≜ dim(P − α) = dim P = dim conv X
      = dim(A − α) = dim A = dim aff X
      = rank(X − x₁1^T) = rank(X − α_c1^T)
      = rank Θ   (517)
      = rank X V_N = rank XV = rank X V_N^{†T}
      = rank X ,   X e₁ = 0  or  X1 = 0
      = rank V_N^T G V_N = rank VGV = rank V_N^† G V_N
      = rank G ,   G e₁ = 0  (477)  or  G1 = 0  (482)

and, for D ∈ EDM^N ,

      = rank V_N^T D V_N = rank VDV = rank V_N^† D V_N = rank V_N (V_N^T D V_N) V_N^T
      = rank Λ   (675)
      = N − 1 − dim N ⎡ 0  1^T ⎤ = rank ⎡ 0  1^T ⎤ − 2   (598)        (591)
                      ⎣ 1  −D  ⎦        ⎣ 1  −D  ⎦

4.25 rank X ≤ min{n , N}


4.7.3 Eigenvalues of −VDV versus −V_N^† D V_N

Suppose for D ∈ EDM^N we are given eigenvectors v_i ∈ R^N of −VDV and corresponding eigenvalues λ ∈ R^N so that

    −VDV v_i = λ_i v_i ,   i = 1…N        (592)

From these we can determine the eigenvectors and eigenvalues of −V_N^† D V_N : Define

    ν_i ≜ V_N^† v_i ,   λ_i ≠ 0        (593)

Then we have

    −VD V_N V_N^† v_i = λ_i v_i        (594)

    −V_N^† VD V_N ν_i = λ_i V_N^† v_i        (595)

    −V_N^† D V_N ν_i = λ_i ν_i        (596)

the eigenvectors of −V_N^† D V_N are given by (593) while its corresponding nonzero eigenvalues are identical to those of −VDV , although −V_N^† D V_N is not necessarily positive semidefinite. In contrast, −V_N^T D V_N is positive semidefinite but its nonzero eigenvalues are generally different.

4.7.3.0.1 Theorem. EDM rank versus affine dimension r . [91, §3] [109, §3] [90, §3]
For D ∈ EDM^N (confer (746))

1. r = rank(D) − 1  ⇔  1^T D^† 1 ≠ 0 . Points constituting a list X generating the polyhedron corresponding to D lie on the relative boundary of an r-dimensional circumhypersphere having

    diameter = √2 (1^T D^† 1)^{−1/2}
    circumcenter = X D^† 1 / (1^T D^† 1)        (597)

2. r = rank(D) − 2  ⇔  1^T D^† 1 = 0 . There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron.

3. In Cayley-Menger form [61, §6.2] [49, §3.3] [32, §40] (§4.11.2),

    r = N − 1 − dim N ⎡ 0  1^T ⎤ = rank ⎡ 0  1^T ⎤ − 2        (598)
                      ⎣ 1  −D  ⎦        ⎣ 1  −D  ⎦

Circumhyperspheres exist for r < rank(D) − 2 . [224, §7]   ⋄

For all practical purposes,

    max{0 , rank(D) − 2} ≤ r ≤ min{n , N − 1}        (599)
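The eigenvalue relationship of §4.7.3 checks out numerically; in this sketch (base Matlab; pinv supplies V_N^†), nonzero eigenvalues of −VDV and −V_N^† D V_N coincide:

    % Numeric check of (592)-(596): nonzero eigenvalues of -V*D*V equal those
    % of -pinv(VN)*D*VN, though the latter matrix is not symmetric.
    N = 6; X = randn(2,N);
    g = diag(X'*X);
    D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
    V  = eye(N) - ones(N)/N;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    e1 = sort(real(eig(-V*D*V)));         e1 = e1(abs(e1) > 1e-9);
    e2 = sort(real(eig(-pinv(VN)*D*VN))); e2 = e2(abs(e2) > 1e-9);
    disp(norm(e1 - e2))                  % ~ 0 : identical nonzero spectra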


4.8 Euclidean metric versus matrix criteria

4.8.1 Nonnegativity property 1

When D = [d_ij] is an EDM (460), then it is apparent from (584)

    2V_N^T X^T X V_N = −V_N^T D V_N ≽ 0        (600)

because for any matrix A , A^TA ≽ 0 . 4.26 We claim nonnegativity of the d_ij is enforced primarily by the matrix inequality (600); id est,

    −V_N^T D V_N ≽ 0 ,  D ∈ S^N_h   ⇒   d_ij ≥ 0 ,  i ≠ j        (601)

(The matrix inequality to enforce strict positivity differs by a stroke of the pen. (604))

We now support our claim: If any matrix A ∈ R^{m×m} is positive semidefinite, then its main diagonal δ(A) ∈ R^m must have all nonnegative entries. [88, §4.2] Given D ∈ S^N_h ,

                   ⎡ d₁₂              ½(d₁₂+d₁₃−d₂₃)  ···  ½(d₁₂+d₁N−d₂N) ⎤
    −V_N^T D V_N = ⎢ ½(d₁₂+d₁₃−d₂₃)  d₁₃              ···  ½(d₁₃+d₁N−d₃N) ⎥
                   ⎢ ⋮                ⋱               ⋱    ⋮              ⎥
                   ⎣ ½(d₁₂+d₁N−d₂N)  ½(d₁₃+d₁N−d₃N)  ···  d₁N            ⎦

                 = [ ½(d_{1,i+1} + d_{1,j+1} − d_{i+1,j+1}) ]
                 = ½ (1 D_{1,2:N} + D_{2:N,1} 1^T − D_{2:N,2:N}) ∈ S^{N−1}        (602)

4.26 For A ∈ R^{m×n} , A^TA ≽ 0 ⇔ y^T A^TA y = ‖Ay‖² ≥ 0 for all ‖y‖ = 1 . When A is full-rank skinny-or-square, A^TA ≻ 0 .


where row, column indices i, j ∈ {1…N−1} . [199] It follows:

    −V_N^T D V_N ≽ 0 ,  D ∈ S^N_h   ⇒   δ(−V_N^T D V_N) = ⎡ d₁₂ ⎤ ≽ 0        (603)
                                                          ⎢ d₁₃ ⎥
                                                          ⎢ ⋮   ⎥
                                                          ⎣ d₁N ⎦

Multiplication of V_N by any permutation matrix Ξ has null effect on its range and nullspace. In other words, any permutation of the rows or columns of V_N produces a basis for N(1^T) ; id est, R(Ξ_r V_N) = R(V_N Ξ_c) = R(V_N) = N(1^T) . Hence, −V_N^T D V_N ≽ 0 ⇔ −V_N^T Ξ_r^T D Ξ_r V_N ≽ 0 (⇔ −Ξ_c^T V_N^T D V_N Ξ_c ≽ 0). Various permutation matrices 4.27 will sift the remaining d_ij similarly to (603) thereby proving their nonnegativity. Hence −V_N^T D V_N ≽ 0 is a sufficient test for the first property (§4.2) of the Euclidean metric, nonnegativity.

When affine dimension r equals 1, in particular, nonnegativity, symmetry, and hollowness become necessary and sufficient criteria satisfying matrix inequality (600). (§5.6.0.0.1)

4.8.1.1 Strict positivity

Should we require the points in R^n to be distinct, then entries of D off the main diagonal must be strictly positive {d_ij > 0 , i ≠ j} and only those entries along the main diagonal of D are 0. By similar argument, the strict matrix inequality is a sufficient test for strict positivity of Euclidean distance-square;

    −V_N^T D V_N ≻ 0 ,  D ∈ S^N_h   ⇒   d_ij > 0 ,  i ≠ j        (604)

4.27 The rule of thumb is: If Ξ_r(i,1) = 1 , then δ(−V_N^T Ξ_r^T D Ξ_r V_N) ∈ R^{N−1} is some permutation of the i-th row or column of D excepting the 0 entry from the main diagonal.


4.8.2 Triangle inequality property 4

In light of Kreyszig's observation [140, §1.1, prob.15] that properties 2 through 4 of the Euclidean metric (§4.2) together imply property 1, the nonnegativity criterion (601) suggests that the matrix inequality −V_N^T D V_N ≽ 0 might somehow take on the role of triangle inequality; id est,

    δ(D) = 0 ,  D^T = D ,  −V_N^T D V_N ≽ 0   ⇒   √d_ij ≤ √d_ik + √d_kj ,  i ≠ j ≠ k        (605)

We now show that is indeed the case: Let T be the leading principal submatrix in S² of −V_N^T D V_N (upper left 2×2 submatrix from (602));

    T ≜ ⎡ d₁₂              ½(d₁₂+d₁₃−d₂₃) ⎤        (606)
        ⎣ ½(d₁₂+d₁₃−d₂₃)  d₁₃             ⎦

Submatrix T must be positive (semi)definite whenever −V_N^T D V_N is. (§A.3.1.0.4, §4.8.3) Now we have,

    −V_N^T D V_N ≽ 0  ⇒  T ≽ 0  ⇔  λ₁ ≥ λ₂ ≥ 0
    −V_N^T D V_N ≻ 0  ⇒  T ≻ 0  ⇔  λ₁ > λ₂ > 0        (607)

where λ₁ and λ₂ are the eigenvalues of T , real due only to symmetry of T :

    λ₁ = ½ ( d₁₂ + d₁₃ + √( d₂₃² − 2(d₁₂ + d₁₃)d₂₃ + 2(d₁₂² + d₁₃²) ) ) ∈ R
    λ₂ = ½ ( d₁₂ + d₁₃ − √( d₂₃² − 2(d₁₂ + d₁₃)d₂₃ + 2(d₁₂² + d₁₃²) ) ) ∈ R        (608)

Nonnegativity of eigenvalue λ₁ is guaranteed by only nonnegativity of the d_ij which in turn is guaranteed by matrix inequality (601). Inequality between the eigenvalues in (607) follows from only realness of the d_ij . Since λ₁ always equals or exceeds λ₂ , conditions for the positive (semi)definiteness of submatrix T can be completely determined by examining λ₂ , the smaller of its two eigenvalues. A triangle inequality is made apparent when we express T eigenvalue nonnegativity in terms of D matrix entries; videlicet,


    T ≽ 0  ⇔  det T = λ₁λ₂ ≥ 0 ,  d₁₂ , d₁₃ ≥ 0        (c)
          ⇔  λ₂ ≥ 0        (b)
          ⇔  |√d₁₂ − √d₂₃| ≤ √d₁₃ ≤ √d₁₂ + √d₂₃        (a)        (609)

Triangle inequality (609a) (confer (513) (621)), in terms of three rooted entries from D , is equivalent to metric property 4

    √d₁₃ ≤ √d₁₂ + √d₂₃
    √d₂₃ ≤ √d₁₂ + √d₁₃        (610)
    √d₁₂ ≤ √d₁₃ + √d₂₃

for the corresponding points x₁ , x₂ , x₃ from some length-N list. 4.28

4.8.2.1 Comment

Given D whose dimension N equals or exceeds 3, there are N!/(3!(N−3)!) distinct triangle inequalities in total like (513) that must be satisfied, of which each d_ij is involved in N−2 , and each point x_i is in (N−1)!/(2!(N−1−2)!) . We have so far revealed only one of those triangle inequalities; namely, (609a) that came from T (606). Yet we claim if −V_N^T D V_N ≽ 0 then all triangle inequalities will be satisfied simultaneously;

    |√d_ik − √d_kj| ≤ √d_ij ≤ √d_ik + √d_kj ,   i < k < j        (611)


4.8.2.1.1 Shore.
The columns of Ξ_r V_N Ξ_c hold a basis for N(1^T) when Ξ_r and Ξ_c are permutation matrices. In other words, any permutation of the rows or columns of V_N leaves its range and nullspace unchanged; id est, R(Ξ_r V_N Ξ_c) = R(V_N) = N(1^T) (467). Hence, two distinct matrix inequalities can be equivalent tests of the positive semidefiniteness of D on R(V_N) ; id est, −V_N^T D V_N ≽ 0 ⇔ −(Ξ_r V_N Ξ_c)^T D (Ξ_r V_N Ξ_c) ≽ 0 . By properly choosing permutation matrices, 4.29 the leading principal submatrix T_Ξ ∈ S² of −(Ξ_r V_N Ξ_c)^T D (Ξ_r V_N Ξ_c) may be loaded with the entries of D needed to test any particular triangle inequality (similarly to (602)-(609)). Because all the triangle inequalities can be individually tested using a test equivalent to the lone matrix inequality −V_N^T D V_N ≽ 0 , it logically follows that the lone matrix inequality tests all those triangle inequalities simultaneously. We conclude that −V_N^T D V_N ≽ 0 is a sufficient test for the fourth property of the Euclidean metric, triangle inequality.

4.8.2.2 Strict triangle inequality

Without exception, all the inequalities in (609) and (610) can be made strict while their corresponding implications remain true. The then strict inequality (609a) or (610) may be interpreted as a strict triangle inequality under which collinear arrangement of points is not allowed. [138, §24/6, p.322] Hence by similar reasoning, −V_N^T D V_N ≻ 0 is a sufficient test of all the strict triangle inequalities; id est,

    δ(D) = 0 ,  D^T = D ,  −V_N^T D V_N ≻ 0   ⇒   √d_ij < √d_ik + √d_kj ,  i ≠ j ≠ k        (612)

4.8.3 −V_N^T D V_N nesting

From (606) observe that T = −V_N^T D V_N |_{N←3} . In fact, for D ∈ EDM^N , the leading principal submatrices of −V_N^T D V_N form a nested sequence (by inclusion) whose members are individually positive semidefinite [88] [125] [212] and have the same form as T ; videlicet, 4.30

4.29 To individually test triangle inequality |√d_ik − √d_kj| ≤ √d_ij ≤ √d_ik + √d_kj for particular i, k, j , set Ξ_r(i,1) = Ξ_r(k,2) = Ξ_r(j,3) = 1 and Ξ_c = I .
4.30 −VDV |_{N←1} = 0 ∈ S⁰_+ (§B.4.1)


    −V_N^T D V_N |_{N←1} = [∅]        (o)

    −V_N^T D V_N |_{N←2} = [d₁₂] ∈ S_+        (a)

    −V_N^T D V_N |_{N←3} = ⎡ d₁₂              ½(d₁₂+d₁₃−d₂₃) ⎤ = T ∈ S²_+        (b)
                           ⎣ ½(d₁₂+d₁₃−d₂₃)  d₁₃             ⎦

    −V_N^T D V_N |_{N←4} = ⎡ d₁₂              ½(d₁₂+d₁₃−d₂₃)  ½(d₁₂+d₁₄−d₂₄) ⎤
                           ⎢ ½(d₁₂+d₁₃−d₂₃)  d₁₃              ½(d₁₃+d₁₄−d₃₄) ⎥        (c)
                           ⎣ ½(d₁₂+d₁₄−d₂₄)  ½(d₁₃+d₁₄−d₃₄)  d₁₄             ⎦
        ⋮

    −V_N^T D V_N |_{N←i} = ⎡ −V_N^T D V_N |_{N←i−1}  ν(i) ⎤ ∈ S^{i−1}_+        (d)
                           ⎣ ν(i)^T                  d₁ᵢ  ⎦
        ⋮

    −V_N^T D V_N = ⎡ −V_N^T D V_N |_{N←N−1}  ν(N) ⎤ ∈ S^{N−1}_+        (e)        (613)
                   ⎣ ν(N)^T                  d₁N  ⎦

where

    ν(i) ≜ ½ ⎡ d₁₂ + d₁ᵢ − d₂ᵢ       ⎤ ∈ R^{i−2} ,   i > 2        (614)
             ⎢ d₁₃ + d₁ᵢ − d₃ᵢ       ⎥
             ⎢ ⋮                     ⎥
             ⎣ d₁,ᵢ₋₁ + d₁ᵢ − dᵢ₋₁,ᵢ ⎦

Hence, the leading principal submatrices of EDM D must also be EDMs. 4.31

Bordered symmetric matrices in the form (613d) are known to have intertwined [212, §6.4] (or interlaced [125, §4.3] [209, §IV.4.1]) eigenvalues; (confer §4.11.1) that means, for the particular submatrices (613a) and (613b),

4.31 In fact, each and every principal submatrix of an EDM D is another EDM. [145, §4.1]


    λ₂ ≤ d₁₂ ≤ λ₁        (615)

where d₁₂ is the eigenvalue of submatrix (613a) and λ₁ , λ₂ are the eigenvalues of T (613b) (606). Intertwining in (615) predicts that should d₁₂ become 0, then λ₂ must go to 0. 4.32 The eigenvalues are similarly intertwined for submatrices (613b) and (613c);

    γ₃ ≤ λ₂ ≤ γ₂ ≤ λ₁ ≤ γ₁        (616)

where γ₁ , γ₂ , γ₃ are the eigenvalues of submatrix (613c). Intertwining likewise predicts that should λ₂ become 0 (a possibility revealed in §4.8.3.1), then γ₃ must go to 0. Combining results so far for N = 2, 3, 4 : (615) (616)

    γ₃ ≤ λ₂ ≤ d₁₂ ≤ λ₁ ≤ γ₁        (617)

The preceding logic extends by induction through the remaining members of the sequence (613).

4.8.3.1 Tightening the triangle inequality

Now we apply the Schur complement from §A.4 to tighten the triangle inequality in the case cardinality N = 4 . We find that the gains by doing so are modest. From (613) we identify:

    ⎡ A   B ⎤ ≜ −V_N^T D V_N |_{N←4}        (618)
    ⎣ B^T C ⎦

    A ≜ T = −V_N^T D V_N |_{N←3}        (619)

both positive semidefinite by assumption, B = ν(4) defined in (614), and C = d₁₄ . Using nonstrict CC†-form (1108), C ≽ 0 by assumption (§4.8.1) and CC† = I . So by the positive semidefinite ordering of eigenvalues theorem (§A.3.1.0.1),

    −V_N^T D V_N |_{N←4} ≽ 0   ⇔   T ≽ d₁₄^{−1} ν(4)ν(4)^T   ⇒   { λ₁ ≥ d₁₄^{−1} ‖ν(4)‖² ,  λ₂ ≥ 0 }        (620)

where {d₁₄^{−1} ‖ν(4)‖² , 0} are the eigenvalues of d₁₄^{−1} ν(4)ν(4)^T while λ₁ , λ₂ are the eigenvalues of T .

4.32 If d₁₂ were 0, eigenvalue λ₂ becomes 0 (608) because d₁₃ must then be equal to d₂₃ ; id est, d₁₂ = 0 ⇔ x₁ = x₂ . (§4.4)


4.8.3.1.1 Example. Small completion problem, II.
Applying the inequality for λ₁ in (620) to the small completion problem on page 218 Figure 57, the lower bound on √d₁₄ (1.236 in (453)) is tightened to 1.289 . The correct value of √d₁₄ to three significant figures is 1.414 .

4.8.4 Affine dimension reduction in two dimensions

(confer §4.14.4) The leading principal 2×2 submatrix T of −V_N^T D V_N has largest eigenvalue λ₁ (608) which is a convex function of D . 4.33 λ₁ can never be 0 unless d₁₂ = d₁₃ = d₂₃ = 0 . Eigenvalue λ₁ can never be negative while the d_ij are nonnegative. The remaining eigenvalue λ₂ is a concave function of D that becomes 0 only at the upper and lower bounds of inequality (609a) and its equivalent forms: (confer (611))

    |√d₁₂ − √d₂₃| ≤ √d₁₃ ≤ √d₁₂ + √d₂₃        (a)
    ⇔
    |√d₁₂ − √d₁₃| ≤ √d₂₃ ≤ √d₁₂ + √d₁₃        (b)        (621)
    ⇔
    |√d₁₃ − √d₂₃| ≤ √d₁₂ ≤ √d₁₃ + √d₂₃        (c)

In between those bounds, λ₂ is strictly positive; otherwise, it would be negative but prevented by the condition T ≽ 0 .

When λ₂ becomes 0, it means triangle △₁₂₃ has collapsed to a line segment; a potential reduction in affine dimension r . The same logic is valid for any particular principal 2×2 submatrix of −V_N^T D V_N , hence applicable to other triangles.

4.33 The maximum eigenvalue of any symmetric matrix is always a convex function of its entries, while the minimum eigenvalue is always concave. [38, exmp.3.10] In our particular case, say d ≜ [d₁₂ ; d₁₃ ; d₂₃] ∈ R³ . Then the Hessian (1343) ∇²λ₁(d) ≽ 0 certifies convexity whereas ∇²λ₂(d) ≼ 0 certifies concavity. Each Hessian has rank equal to 1. The respective gradients ∇λ₁(d) and ∇λ₂(d) are nowhere 0.


4.9 Bridge: Convex polyhedra to EDMs

The criteria for the existence of an EDM include, by definition (460) (515), the properties imposed upon its entries d_ij by the Euclidean metric. From §4.8.1 and §4.8.2, we know there is a relationship of matrix criteria to those properties. Here is a snapshot of what we are sure: for i, j, k ∈ {1…N} (confer §4.2)

    √d_ij ≥ 0 ,  i ≠ j
    √d_ij = 0 ,  i = j                                 ⇐   −V_N^T D V_N ≽ 0 ,  δ(D) = 0 ,  D^T = D        (622)
    √d_ij = √d_ji
    √d_ij ≤ √d_ik + √d_kj ,  i ≠ j ≠ k

all implied by D ∈ EDM^N . In words, these four Euclidean metric properties are necessary conditions for D to be a distance matrix. At the moment, we have no converse. As of concern in §4.3, we have yet to establish metric requirements beyond the four Euclidean metric properties that would allow D to be certified an EDM or might facilitate polyhedron or list reconstruction from an incomplete EDM. We deal with this problem in §4.14. Our present goal is to establish ab initio the necessary and sufficient matrix criteria that will subsume all the Euclidean metric properties and any further requirements 4.34 for all N > 1 (§4.8.3); id est,

    −V_N^T D V_N ≽ 0 ,  D ∈ S^N_h   ⇔   D ∈ EDM^N        (479)

or for EDM definition (524),

    Ω ≽ 0 ,  √δ(d) ≽ 0   ⇔   D = D(Ω, d) ∈ EDM^N        (623)

4.34 In 1935, Schoenberg [199, (1)] first extolled matrix product −V_N^T D V_N (602) (predicated on symmetry and self-distance) specifically incorporating V_N , albeit algebraically. He showed: nonnegativity −y^T V_N^T D V_N y ≥ 0 , for all y ∈ R^{N−1} , is necessary and sufficient for D to be an EDM. Gower [90, §3] remarks how surprising it is that such a fundamental property of Euclidean geometry was obtained so late.
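Criterion (479) translates directly into a numerical EDM test. The Matlab function below is a minimal sketch (our own helper, saved for example as isedm.m; the tolerances are arbitrary choices):

    function tf = isedm(D)
    % ISEDM  Schoenberg criterion (479): true iff D is symmetric hollow
    %        with -V_N'*D*V_N positive semidefinite (up to roundoff).
      N  = size(D,1);
      VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);       % auxiliary matrix V_N (466)
      hollow = norm(diag(D)) < 1e-12 && norm(D-D','fro') < 1e-10;
      tf = hollow && min(eig(-VN'*D*VN)) > -1e-10;
    end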


Figure 67: Elliptope E³ in isometrically isomorphic R⁶ (projected on R³) is a convex body. Elliptope relative boundary is not smooth and not polyhedral and comprises all set members (624) having at least one 0 eigenvalue. This elliptope has only four vertices but an infinity of extreme points; meaning, there are only four points belonging to this elliptope that can be isolated by a strictly supporting hyperplane. Each vertex is rank-1, corresponding to a dyad yy^T having binary vector y ∈ R³ with entries in {±1}. In this dimension, the elliptope resembles a malformed pillow in the shape of a bulging tetrahedron.


4.9.1 Geometric arguments

4.9.1.0.1 Definition. Elliptope: [147] [145, §2.3] [61, §31.5]
a unique bounded immutable convex Euclidean body in S^n ; intersection of positive semidefinite cone S^n_+ with that set of n hyperplanes defined by unity main diagonal;

    E^n ≜ S^n_+ ∩ { Φ ∈ S^n | δ(Φ) = 1 }        (624)

a.k.a the set of all correlation matrices, of dimension

    dim E^n = n(n−1)/2  in R^{n(n+1)/2}        (625)

The 2^{n−1} vertices (§2.6.1.0.1) of E^n are extreme directions yy^T of the positive semidefinite cone, where the entries of vector y ∈ R^n each belong to {±1} while the vector exercises every combination. △

The elliptope for dimension n = 2 is a line segment in isometrically isomorphic R^{n(n+1)/2} . The elliptope for dimension n = 3 is realized in Figure 67.

4.9.1.0.2 Lemma. Hypersphere. [13, §4] (confer bullet p.228)
Matrix A = [A_ij] ∈ S^N belongs to the elliptope in S^N iff there exist N points p on the boundary of a hypersphere having radius 1 in R^{rank A} such that

    ‖p_i − p_j‖ = √2 √(1 − A_ij) ,   i, j = 1…N        (626)

⋄

There is a similar theorem for Euclidean distance matrices: We derive matrix criteria for D to be an EDM, validating (479) using simple geometry; distance to the polyhedron formed by the convex hull of a list of points (64) in Euclidean space R^n .

4.9.1.0.3 EDM assertion.
D is a Euclidean distance matrix if and only if D ∈ S^N_h and distances-square from the origin

    { ‖p(y)‖² = −y^T V_N^T D V_N y | y ∈ S − β }        (627)

correspond to points p in some bounded convex polyhedron

    P − α = { p(y) | y ∈ S − β }        (628)


having N or fewer vertices embedded in an r-dimensional subspace A − α of R^n, where α ∈ A = aff P and where the domain of linear surjection p(y) is the unit simplex S ⊂ R^{N−1}_+ shifted such that its vertex at the origin is translated to −β in R^{N−1}. When β = 0, then α = x_1. ⋄

In terms of V_N, the unit simplex (241) in R^{N−1} has an equivalent representation:

$$\mathcal{S} = \{s \in \mathbb{R}^{N-1} \mid \sqrt{2}\,V_N\, s \succeq -e_1\} \qquad (629)$$

where e_1 is as in (476). Incidental to the EDM assertion, shifting the unit-simplex domain in R^{N−1} translates the polyhedron P in R^n. Indeed, there is a map from vertices of the unit simplex to members of the list generating P;

$$p\,:\,\mathbb{R}^{N-1} \to \mathbb{R}^n\,;\qquad
p\!\left(\begin{Bmatrix} -\beta\\ e_1 - \beta\\ e_2 - \beta\\ \vdots\\ e_{N-1} - \beta \end{Bmatrix}\right)
= \begin{Bmatrix} x_1 - \alpha\\ x_2 - \alpha\\ x_3 - \alpha\\ \vdots\\ x_N - \alpha \end{Bmatrix} \qquad (630)$$

4.9.1.0.4 Proof. EDM assertion.
(⇒) We demonstrate that if D is an EDM, then each distance-square ‖p(y)‖² described by (627) corresponds to a point p in some embedded polyhedron P − α. Assume D is indeed an EDM; id est, D can be made from some list X of N unknown points in Euclidean space R^n; D = D(X) for X ∈ R^{n×N} as in (460). Since D is translation invariant (§4.5.1), we may shift the affine hull A of those unknown points to the origin as in (572). Then take any point p in their convex hull (71);

$$\mathcal{P} - \alpha = \{p = (X - Xb\,\mathbf{1}^T)\,a \mid a^T\mathbf{1} = 1,\ a \succeq 0\} \qquad (631)$$

where α = Xb ∈ A ⇔ b^T 1 = 1. Solutions to a^T 1 = 1 are:^4.35

$$a \in \{e_1 + \sqrt{2}\,V_N\, s \mid s \in \mathbb{R}^{N-1}\} \qquad (632)$$

4.35 Since R(V_N) = N(1^T) and N(1^T) ⊥ R(1), then over all s ∈ R^{N−1}, V_N s is a hyperplane through the origin orthogonal to 1. Thus the solutions {a} constitute a hyperplane orthogonal to the vector 1, and offset from the origin in R^N by any particular solution; in this case, a = e_1.


where e_1 is as in (476). Similarly, b = e_1 + √2 V_N β.

$$\begin{array}{rl}
\mathcal{P} - \alpha &= \{p = X(I - (e_1 + \sqrt{2}V_N\beta)\mathbf{1}^T)(e_1 + \sqrt{2}V_N s) \mid \sqrt{2}V_N s \succeq -e_1\}\\[2pt]
&= \{p = X\sqrt{2}\,V_N(s - \beta) \mid \sqrt{2}V_N s \succeq -e_1\}
\end{array} \qquad (633)$$

that describes the domain of p(s) as the unit simplex

$$\mathcal{S} = \{s \mid \sqrt{2}\,V_N\, s \succeq -e_1\} \subset \mathbb{R}^{N-1}_+ \qquad (629)$$

Making the substitution s − β ← y

$$\mathcal{P} - \alpha = \{p = X\sqrt{2}\,V_N\, y \mid y \in \mathcal{S} - \beta\} \qquad (634)$$

Point p belongs to a convex polyhedron P − α embedded in an r-dimensional subspace of R^n because the convex hull of any list forms a polyhedron, and because the translated affine hull A − α contains the translated polyhedron P − α (575) and the origin (when α ∈ A), and because A has dimension r by definition (577). Now, any distance-square from the origin to the polyhedron P − α can be formulated

$$\{p^Tp = \|p\|^2 = 2\,y^T V_N^T X^T X V_N\, y \mid y \in \mathcal{S} - \beta\} \qquad (635)$$

Applying (584) to (635) we get (627).

(⇐) To validate the EDM assertion in the reverse direction, we prove: If each distance-square ‖p(y)‖² (627) on the shifted unit-simplex S − β ⊂ R^{N−1} corresponds to a point p(y) in some embedded polyhedron P − α, then D is an EDM. The r-dimensional subspace A − α ⊆ R^n is spanned by

$$p(\mathcal{S} - \beta) = \mathcal{P} - \alpha \qquad (636)$$

because A − α = aff(P − α) ⊇ P − α (575). So, outside the domain S − β of linear surjection p(y), the simplex complement \S − β ⊂ R^{N−1} must contain the domain of the distance-square ‖p(y)‖² = p(y)^T p(y) to remaining points in the subspace A − α; id est, to the polyhedron's relative exterior \P − α. For ‖p(y)‖² to be nonnegative on the entire subspace A − α, −V_N^T D V_N must be positive semidefinite and is assumed symmetric;^4.36

$$-V_N^T D V_N \triangleq \Theta_p^T \Theta_p \qquad (637)$$

4.36 The antisymmetric part (−V_N^T D V_N − (−V_N^T D V_N)^T)/2 is annihilated by ‖p(y)‖². By the same reasoning, any positive (semi)definite matrix A is generally assumed symmetric because only the symmetric part (A + A^T)/2 survives the test y^T A y ≥ 0. [125, §7.1]


where^4.37 Θ_p ∈ R^{m×N−1} for some m ≥ r. Because p(S − β) is a convex polyhedron, it is necessarily a set of linear combinations of points from some length-N list because every convex polyhedron having N or fewer vertices can be generated that way (§2.12.2). Equivalent to (627) are

$$\{p^Tp \mid p \in \mathcal{P} - \alpha\} = \{p^Tp = y^T\Theta_p^T\Theta_p\, y \mid y \in \mathcal{S} - \beta\} \qquad (638)$$

Because p ∈ P − α may be found by factoring (638), the list Θ_p is found by factoring (637). A unique EDM can be made from that list using inner-product form definition D(Θ)|_{Θ=Θ_p} (515). That EDM will be identical to D if δ(D) = 0, by injectivity of D (558).

4.37 A^T = A ⪰ 0 ⇔ A = R^T R for some real matrix R. [212, §6.3]

4.9.2 Necessity and sufficiency

From (600) we learned that the matrix inequality −V_N^T D V_N ⪰ 0 is a necessary test for D to be an EDM. In §4.9.1, the connection between convex polyhedra and EDMs was pronounced by the EDM assertion; the matrix inequality together with D ∈ S^N_h became a sufficient test when the EDM assertion demanded that every bounded convex polyhedron have a corresponding EDM. For all N > 1 (§4.8.3), the matrix criteria for the existence of an EDM in (479), (623), and (455) are therefore necessary and sufficient and subsume all the Euclidean metric properties and further requirements.

4.9.3 Example revisited

Now we apply the necessary and sufficient EDM criteria (479) to an earlier problem.

4.9.3.0.1 Example. Small completion problem, III. (confer §4.8.3.1.1)
Continuing Example 4.3.0.0.2 pertaining to Figure 57 where N = 4, distance-square d_14 is ascertainable from the matrix inequality −V_N^T D V_N ⪰ 0. Because all distances in (452) are known except √d_14, we may simply calculate the minimum eigenvalue of −V_N^T D V_N over a range of d_14 as in Figure 68. We observe a unique value of d_14 satisfying (479) where the abscissa is tangent to the hypograph of the minimum eigenvalue.
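The computation behind Figure 68 is easy to replicate. A hedged Matlab sketch follows; the fixed entries below are hypothetical placeholders, not the data of (452), since the point is only that the minimum eigenvalue traces a concave curve in d_14 whose tangency with the abscissa certifies membership.

```matlab
% Hedged sketch of Figure 68's computation; placeholder partial EDM,
% not the data of (452).
N = 4;
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
d14 = linspace(0,4,401);
lam = zeros(size(d14));
for k = 1:numel(d14)
    D = [ 0       1  4  d14(k)
          1       0  1  1
          4       1  0  1
          d14(k)  1  1  0 ];        % placeholder entries; d14 unknown
    lam(k) = min(eig(-VN'*D*VN));   % concave in d14 (4.8.4)
end
plot(d14,lam), xlabel('d_{14}'), ylabel('minimum eigenvalue')
```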


Since the minimum eigenvalue of a symmetric matrix is known to be a concave function (§4.8.4), we calculate its second partial derivative with respect to d_14 evaluated at 2 and find −1/3. We conclude there are no other satisfying values of d_14. Further, that value of d_14 does not meet an upper or lower bound of a triangle inequality like (611), so neither does it cause the collapse of any triangle. Because the minimum eigenvalue is 0, affine dimension r of any point list corresponding to D cannot exceed N − 2. (§4.7.1.1)

4.10 EDM-entry composition

Laurent [145, §2.3] applies results from Schoenberg (1938) [200] to show certain nonlinear compositions of individual EDM entries yield EDMs; in particular,

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ [1 - e^{-\alpha d_{ij}}] \in \mathrm{EDM}^N \ \ \forall \alpha > 0 \ \Leftrightarrow\ [e^{-\alpha d_{ij}}] \in \mathcal{E}^N \ \ \forall \alpha > 0 \qquad (639)$$

where D = [d_ij] and E^N is the elliptope (624). (Proof: §F.)

Schoenberg's results [200, §6, thm.5] (confer [140, p.108-109]) also suggest certain finite positive roots of EDM entries produce EDMs; specifically,

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ [d_{ij}^{1/\alpha}] \in \mathrm{EDM}^N \ \ \forall \alpha > 1 \qquad (640)$$

The special case α = 2 is of interest because it corresponds to absolute distance; e.g.,

$$D \in \mathrm{EDM}^N \ \Rightarrow\ \circ\sqrt{D} \in \mathrm{EDM}^N \qquad (641)$$

Assuming that points constituting a corresponding list X are distinct (604), then it follows: for D ∈ S^N_h

$$\lim_{\alpha\to\infty} [d_{ij}^{1/\alpha}] = \lim_{\alpha\to\infty} [1 - e^{-\alpha d_{ij}}] = -E \triangleq \mathbf{1}\mathbf{1}^T - I \qquad (642)$$

Negative elementary matrix −E (§B.3) is relatively interior to the EDM cone (§5.6) and terminal to the respective trajectories (639) and (640) as functions of α. Both trajectories are confined to the EDM cone; in engineering terms, the EDM cone is an invariant set [197] to either trajectory.
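Compositions (639) and (640) are likewise easy to probe numerically; a minimal Matlab sketch under the same V_N assumption as before:

```matlab
% Sketch: entrywise compositions (639) and (640) preserve EDM membership.
N = 6; X = randn(3,N); G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
for alpha = [0.5 1 2 10]
    D1 = 1 - exp(-alpha*D);          % (639); zero diagonal is preserved
    fprintf('alpha=%4.1f  min eig (639): %9.2e\n', alpha, min(eig(-VN'*D1*VN)))
end
for alpha = [1.5 2 4]                % (640) requires alpha > 1
    D2 = D.^(1/alpha);
    fprintf('alpha=%4.1f  min eig (640): %9.2e\n', alpha, min(eig(-VN'*D2*VN)))
end
```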


Figure 68: Minimum eigenvalue of −V_N^T D V_N as a function of d_14; positive semidefinite for only one value of d_14: 2.

Figure 69: Some entrywise EDM compositions of d_ij^{1/α}, log_2(1 + d_ij^{1/α}), and 1 − e^{−α d_ij}: (a) α = 2; concave nondecreasing in d_ij. (b) Trajectory convergence in α for d_ij = 2.


Further, if D is not an EDM but for some particular α_p it becomes an EDM, then for all greater values of α it remains an EDM.

These preliminary findings lead one to speculate whether any concave nondecreasing composition of individual EDM entries d_ij on R_+ will produce another EDM; e.g., empirical evidence suggests that given EDM D, for each fixed α ≥ 1 [sic] the composition [log_2(1 + d_ij^{1/α})] is also an EDM. Figure 69 illustrates that composition's concavity in d_ij together with functions from (639) and (640).

4.10.1 EDM by elliptope

For some κ ∈ R_+ and C ∈ S^N_+ in the elliptope E^N (§4.9.1.0.1), Alfakih asserts any given EDM D is expressible [7] [61, §31.5]

$$D = \kappa(\mathbf{1}\mathbf{1}^T - C) \in \mathrm{EDM}^N \qquad (643)$$

This expression exhibits nonlinear combination of variables κ and C. We therefore propose a different expression requiring redefinition of the elliptope (624) by parametrization;

$$\mathcal{E}^n_t \triangleq \mathbb{S}^n_+ \cap \{\Phi \in \mathbb{S}^n \mid \delta(\Phi) = t\mathbf{1}\} \qquad (644)$$

where, of course, E^n = E^n_1. Then any given EDM D is expressible

$$D = t\mathbf{1}\mathbf{1}^T - E \in \mathrm{EDM}^N \qquad (645)$$

which is linear in variables t ∈ R_+ and E ∈ E^N_t.

4.11 EDM indefiniteness

By the known result in §A.7.1 regarding a 0-valued entry on the main diagonal of a symmetric positive semidefinite matrix, there can be no positive nor negative semidefinite EDM except the 0 matrix, because EDM^N ⊆ S^N_h (459) and

$$\mathbb{S}^N_h \cap \mathbb{S}^N_+ = 0 \qquad (646)$$

the origin. So when D ∈ EDM^N, there can be no factorization D = A^T A nor −D = A^T A. [212, §6.3] Hence eigenvalues of an EDM are neither all nonnegative nor all nonpositive; an EDM is indefinite and possibly invertible.


4.11.1 EDM eigenvalues, congruence transformation

For any symmetric −D, we can characterize its eigenvalues by congruence transformation: [212, §6.3]

$$-W^T D W = -\begin{bmatrix} V_N^T\\ \mathbf{1}^T \end{bmatrix} D\,[\,V_N\ \ \mathbf{1}\,]
= -\begin{bmatrix} V_N^T D V_N & V_N^T D \mathbf{1}\\ \mathbf{1}^T D V_N & \mathbf{1}^T D \mathbf{1} \end{bmatrix} \in \mathbb{S}^N \qquad (647)$$

Because

$$W \triangleq [\,V_N\ \ \mathbf{1}\,] \in \mathbb{R}^{N\times N} \qquad (648)$$

is full-rank, then (1114)

$$\mathrm{inertia}(-D) = \mathrm{inertia}(-W^T D W) \qquad (649)$$

the congruence (647) has the same number of positive, zero, and negative eigenvalues as −D. Further, if we denote by {γ_i, i = 1...N−1} the eigenvalues of −V_N^T D V_N and denote eigenvalues of the congruence −W^T D W by {ζ_i, i = 1...N}, and if we arrange each respective set of eigenvalues in nonincreasing order, then by theory of interlacing eigenvalues for bordered symmetric matrices [125, §4.3] [212, §6.4] [209, §IV.4.1]

$$\zeta_N \le \gamma_{N-1} \le \zeta_{N-1} \le \gamma_{N-2} \le \cdots \le \gamma_2 \le \zeta_2 \le \gamma_1 \le \zeta_1 \qquad (650)$$

When D ∈ EDM^N, then γ_i ≥ 0 ∀i (1057) because −V_N^T D V_N ⪰ 0 as we know. That means the congruence must have N−1 nonnegative eigenvalues; ζ_i ≥ 0, i = 1...N−1. The remaining eigenvalue ζ_N cannot be nonnegative because then −D would be positive semidefinite, an impossibility; so ζ_N < 0. By congruence, nontrivial −D must therefore have exactly one negative eigenvalue;^4.38 [61, §2.4.5]

4.38 All the entries of the corresponding eigenvector must have the same sign with respect to each other [50, p.116] because that eigenvector is the Perron vector corresponding to the spectral radius; [125, §8.2.6] the predominant characteristic of negative [sic] matrices.


$$D \in \mathrm{EDM}^N \ \Rightarrow\ \begin{cases}
\lambda(-D)_i \ge 0\,,\quad i = 1 \ldots N-1\\[2pt]
\displaystyle\sum_{i=1}^{N} \lambda(-D)_i = 0\\[2pt]
D \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+
\end{cases} \qquad (651)$$

where the λ(−D)_i are nonincreasingly ordered eigenvalues of −D whose sum must be 0 only because tr D = 0 [212, §5.1]. The eigenvalue summation condition, therefore, can be considered redundant. Even so, all these conditions are insufficient to determine whether some given H ∈ S^N_h is an EDM, as shown by counterexample.^4.39

4.39 When N = 3, for example, the symmetric hollow matrix

$$H = \begin{bmatrix} 0 & 1 & 1\\ 1 & 0 & 5\\ 1 & 5 & 0 \end{bmatrix} \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+$$

is not an EDM, although λ(−H) = [5  0.3723  −5.3723]^T conforms to (651).

4.11.1.0.1 Exercise. Spectral inequality.
Prove whether it holds: for D = [d_ij] ∈ EDM^N

$$\lambda(-D)_1 \ \ge\ d_{ij} \ \ge\ \lambda(-D)_{N-1} \qquad \forall\, i \ne j \qquad (652)$$

4.11.2 Spectral cones

Denoting the eigenvalues of Cayley-Menger matrix $\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix} \in \mathbb{S}^{N+1}$ by

$$\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right) \in \mathbb{R}^{N+1} \qquad (653)$$

we have the Cayley-Menger form (§4.7.3.0.1) of necessary and sufficient conditions for D ∈ EDM^N from the literature; [109, §3]^4.40 [43, §3] [61, §6.2] (confer (479) (455))

4.40 Recall: for D ∈ S^N_h, −V_N^T D V_N ⪰ 0 subsumes nonnegativity property 1 (§4.8.1).


$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ 
\begin{cases}
\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right)_{\!i} \ge 0\,,\ \ i = 1 \ldots N\\[4pt]
D \in \mathbb{S}^N_h
\end{cases}
\ \Leftrightarrow\ 
\begin{cases}
-V_N^T D V_N \succeq 0\\
D \in \mathbb{S}^N_h
\end{cases} \qquad (654)$$

These conditions say the Cayley-Menger form has one and only one negative eigenvalue. When D is an EDM, the eigenvalues λ of the Cayley-Menger form belong to the particular orthant in R^{N+1} having the N+1th coordinate as sole negative coordinate^4.41;

$$\begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} = \mathrm{cone}\,\{e_1,\, e_2,\, \cdots\, e_N,\, -e_{N+1}\} \qquad (655)$$

4.11.2.1 Cayley-Menger versus Schoenberg

Connection to the Schoenberg criterion (479) is made when the Cayley-Menger form is further partitioned:

$$\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D \end{bmatrix}
= \begin{bmatrix}
\begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix} & \begin{bmatrix} \mathbf{1}^T\\ -D_{1,2:N} \end{bmatrix}\\[6pt]
[\,\mathbf{1}\ \ -D_{2:N,1}\,] & -D_{2:N,2:N}
\end{bmatrix} \qquad (656)$$

Matrix D ∈ S^N_h is an EDM if and only if the Schur complement of $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ in this partition (§A.4) is positive semidefinite; [13, §1] [134, §3] id est, (confer (602))

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ 
\begin{array}{l}
-D_{2:N,2:N} - [\,\mathbf{1}\ \ -D_{2:N,1}\,]\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}\mathbf{1}^T\\ -D_{1,2:N}\end{bmatrix} = -2\,V_N^T D V_N \succeq 0\\[6pt]
\text{and}\quad D \in \mathbb{S}^N_h
\end{array} \qquad (657)$$

4.41 Empirically, all except one entry of the corresponding eigenvector have the same sign with respect to each other.
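A quick numerical corroboration of (654), a sketch under the same random-list assumptions as before: the Cayley-Menger form of a bona fide EDM exhibits exactly one negative eigenvalue.

```matlab
% Sketch: the Cayley-Menger form (653) of an EDM has exactly one
% negative eigenvalue, corroborating (654).
N = 5; X = randn(3,N); G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
CM = [0 ones(1,N); ones(N,1) -D];
lam = sort(eig(CM),'descend');
fprintf('negative eigenvalues: %d of %d\n', sum(lam < -1e-9), N+1)
```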


Now we apply results from chapter 2 with regard to polyhedral cones and their duals.

4.11.2.2 Ordered eigenvalues

When eigenvalues λ are nonincreasingly ordered, conditions (654) specify their membership to the smallest pointed polyhedral spectral cone for $\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}$:

$$\begin{array}{rl}
\mathcal{K}_\lambda &\triangleq \{\zeta \in \mathbb{R}^{N+1} \mid \zeta_1 \ge \zeta_2 \ge \cdots \ge \zeta_N \ge 0 \ge \zeta_{N+1}\,,\ \mathbf{1}^T\zeta = 0\}\\[4pt]
&= \mathcal{K}_M \cap \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}\\[4pt]
&= \lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}\right)
\end{array} \qquad (658)$$

where

$$\partial\mathcal{H} = \{\zeta \in \mathbb{R}^{N+1} \mid \mathbf{1}^T\zeta = 0\} \qquad (659)$$

is a hyperplane through the origin, and K_M is the monotone cone (§2.13.9.4.2) which has nonempty interior but is not pointed;

$$\mathcal{K}_M = \{\zeta \in \mathbb{R}^{N+1} \mid \zeta_1 \ge \zeta_2 \ge \cdots \ge \zeta_{N+1}\} \qquad (354)$$

$$\mathcal{K}^*_M = \{[\,e_1 - e_2\ \ e_2 - e_3\ \ \cdots\ \ e_N - e_{N+1}\,]\,a \mid a \succeq 0\} \subset \mathbb{R}^{N+1} \qquad (355)$$

So because of the hyperplane,

$$\dim\,\mathrm{aff}\,\mathcal{K}_\lambda = \dim \partial\mathcal{H} = N \qquad (660)$$

indicating K_λ has empty interior. Defining

$$A \triangleq \begin{bmatrix} e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_N^T - e_{N+1}^T \end{bmatrix} \in \mathbb{R}^{N\times N+1},
\qquad B \triangleq \begin{bmatrix} e_1^T\\ e_2^T\\ \vdots\\ e_N^T\\ -e_{N+1}^T \end{bmatrix} \in \mathbb{R}^{N+1\times N+1} \qquad (661)$$


we have the halfspace-description:

$$\mathcal{K}_\lambda = \{\zeta \in \mathbb{R}^{N+1} \mid A\zeta \succeq 0\,,\ B\zeta \succeq 0\,,\ \mathbf{1}^T\zeta = 0\} \qquad (662)$$

From this and (362) we get a vertex-description for a pointed spectral cone having empty interior:

$$\mathcal{K}_\lambda = \left\{ V_N \left( \begin{bmatrix} \hat{A}\\ \hat{B} \end{bmatrix} V_N \right)^{\!\dagger} b \ \middle|\ b \succeq 0 \right\} \qquad (663)$$

where V_N ∈ R^{N+1×N}, and where [sic]

$$\hat{B} = e_N^T \in \mathbb{R}^{1\times N+1} \qquad (664)$$

and

$$\hat{A} = \begin{bmatrix} e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_{N-1}^T - e_N^T \end{bmatrix} \in \mathbb{R}^{N-1\times N+1} \qquad (665)$$

hold those rows of A and B corresponding to conically independent rows in $\begin{bmatrix}A\\ B\end{bmatrix} V_N$.

For eigenvalues arranged in nonincreasing order, conditions (654) can be restated in terms of a spectral cone for Euclidean distance matrices:

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ 
\begin{cases}
\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right) \in \mathcal{K}_M \cap \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}\\[6pt]
D \in \mathbb{S}^N_h
\end{cases} \qquad (666)$$

The vertex-description of the dual spectral cone is, (257)

$$\begin{array}{rl}
\mathcal{K}^*_\lambda &= \mathcal{K}^*_M + \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix}^{\!*} + \partial\mathcal{H}^* \ \subseteq\ \mathbb{R}^{N+1}\\[6pt]
&= \{[\,A^T\ \ B^T\ \ \mathbf{1}\ \ -\!\mathbf{1}\,]\,b \mid b \succeq 0\} = \{[\,\hat{A}^T\ \ \hat{B}^T\ \ \mathbf{1}\ \ -\!\mathbf{1}\,]\,a \mid a \succeq 0\}
\end{array} \qquad (667)$$


From (663) and (363) we get a halfspace-description:

$$\mathcal{K}^*_\lambda = \{y \in \mathbb{R}^{N+1} \mid (V_N^T[\hat{A}^T\ \hat{B}^T])^\dagger V_N^T\, y \succeq 0\} \qquad (668)$$

This polyhedral dual spectral cone K*_λ is closed, convex, has nonempty interior because K_λ is pointed, but is not pointed because K_λ has empty interior.

4.11.2.3 Unordered eigenvalues

Spectral cones are not unique. When eigenvalue ordering is of no concern,^4.42 things simplify: The conditions (654) specify eigenvalue membership to

$$\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}\right)
= \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}
= \{\zeta \in \mathbb{R}^{N+1} \mid B\zeta \succeq 0\,,\ \mathbf{1}^T\zeta = 0\} \qquad (669)$$

where B is defined in (661), and ∂H in (659). From (362) we get a vertex-description for a pointed spectral cone having empty interior:

$$\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}\right)
= \{V_N(\tilde{B}V_N)^\dagger b \mid b \succeq 0\}
= \left\{\begin{bmatrix} I\\ -\mathbf{1}^T \end{bmatrix} b \ \middle|\ b \succeq 0\right\} \qquad (670)$$

where V_N ∈ R^{N+1×N} and

$$\tilde{B} \triangleq \begin{bmatrix} e_1^T\\ e_2^T\\ \vdots\\ e_N^T \end{bmatrix} \in \mathbb{R}^{N\times N+1} \qquad (671)$$

holds only those rows of B corresponding to conically independent rows in B V_N.

4.42 Eigenvalue ordering (represented by a cone having monotone description such as (658)) would be of no concern in (958), for example, where projection of a given presorted set on the nonnegative orthant in a subspace is equivalent to its projection on the monotone nonnegative cone in that same subspace; equivalence is a consequence of presorting.


For unordered eigenvalues, (654) can be equivalently restated

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ 
\begin{cases}
\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right) \in \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}\\[6pt]
D \in \mathbb{S}^N_h
\end{cases} \qquad (672)$$

Vertex-description of the dual spectral cone is, (257)

$$\begin{array}{rl}
\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}\right)^{\!*}
&= \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} + \partial\mathcal{H}^* \ \subseteq\ \mathbb{R}^{N+1}\\[6pt]
&= \{[\,B^T\ \ \mathbf{1}\ \ -\!\mathbf{1}\,]\,b \mid b \succeq 0\} = \{[\,\tilde{B}^T\ \ \mathbf{1}\ \ -\!\mathbf{1}\,]\,a \mid a \succeq 0\}
\end{array} \qquad (673)$$

From (363) we get a halfspace-description:

$$\lambda\!\left(\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -\mathrm{EDM}^N\end{bmatrix}\right)^{\!*}
= \{y \in \mathbb{R}^{N+1} \mid (V_N^T\tilde{B}^T)^\dagger V_N^T\, y \succeq 0\}
= \{y \in \mathbb{R}^{N+1} \mid [\,I\ \ -\!\mathbf{1}\,]\,y \succeq 0\} \qquad (674)$$

This polyhedral dual spectral cone is closed, convex, has nonempty interior but is not pointed. (Notice that any nonincreasingly ordered eigenspectrum belongs to this dual spectral cone.)

4.11.2.4 Dual cone versus dual spectral cone

An open question regards the relationship of convex cones and their duals to the corresponding spectral cones and their duals. The positive semidefinite cone, for example, is self-dual. Both the nonnegative orthant and the monotone nonnegative cone are spectral cones for it. When we consider the nonnegative orthant, then the spectral cone for the self-dual positive semidefinite cone is also self-dual.


4.12 List reconstruction

The traditional term multidimensional scaling^4.43 [159] [58] [229] [57] [162] [50] refers to any reconstruction of a list X ∈ R^{n×N} in Euclidean space from interpoint distance information, possibly incomplete (§5.4), ordinal (§4.13.2), or specified perhaps only by bounding-constraints (§4.4.2.2.7) [227]. Techniques for reconstruction are essentially methods for optimally embedding an unknown list of points, corresponding to given Euclidean distance data, in an affine subset of desired or minimum dimension. The oldest known precursor is called principal component analysis [93], which analyzes the correlation matrix (§4.9.1.0.1); [34, §2.2] a.k.a. Karhunen-Loève transform in the digital signal processing literature. Isometric reconstruction (§4.5.3) of point list X is best performed by eigendecomposition of a Gram matrix; for then, numerical errors of factorization are easily spotted in the eigenvalues.

4.43 Scaling [225] means making a scale, i.e., a numerical representation of qualitative data. If the scale is multidimensional, it's multidimensional scaling. −Jan de Leeuw

We now consider how rotation/reflection and translation invariance factor into a reconstruction.

4.12.1 x_1 at the origin. V_N

At the stage of reconstruction, we have D ∈ EDM^N and we wish to find a generating list (§2.3.2) for P − α by factoring positive semidefinite −V_N^T D V_N (637) as suggested in §4.9.1.0.4. One way to factor −V_N^T D V_N is via diagonalization of symmetric matrices; [212, §5.6] [125] (§A.5.2, §A.3)

$$-V_N^T D V_N \triangleq Q\Lambda Q^T \qquad (675)$$

$$Q\Lambda Q^T \succeq 0 \ \Leftrightarrow\ \Lambda \succeq 0 \qquad (676)$$

where Q ∈ R^{N−1×N−1} is an orthogonal matrix containing eigenvectors while Λ ∈ R^{N−1×N−1} is a diagonal matrix containing corresponding nonnegative eigenvalues ordered by nonincreasing value. From the diagonalization, identify the list via positive square root (§A.5.2.1) using (584);

$$-V_N^T D V_N = 2V_N^T X^T X V_N \triangleq Q\sqrt{\Lambda}\,Q_p^T Q_p\sqrt{\Lambda}\,Q^T \qquad (677)$$

where √Λ Q_p^T Q_p √Λ ≜ Λ = √Λ√Λ and where Q_p ∈ R^{n×N−1} is unknown as is its dimension n.


Rotation/reflection is accounted for by Q_p, yet only its first r columns are necessarily orthonormal.^4.44 Assuming membership to the unit simplex y ∈ S (634), then point p = X√2 V_N y = Q_p√Λ Q^T y in R^n belongs to the translated polyhedron

$$\mathcal{P} - x_1 \qquad (678)$$

whose generating list constitutes the columns of (578)

$$\begin{array}{rl}
[\,0\ \ X\sqrt{2}V_N\,] &= \left[\,0\ \ \sqrt{-V_N^T D V_N}\,\right] = [\,0\ \ Q_p\sqrt{\Lambda}\,Q^T\,] \in \mathbb{R}^{n\times N}\\[2pt]
&= [\,0\ \ x_2 - x_1\ \ x_3 - x_1\ \cdots\ x_N - x_1\,]
\end{array} \qquad (679)$$

The scaled auxiliary matrix V_N represents that translation. A simple choice for Q_p has n set to N − 1; id est, Q_p = I. Ideally, each member of the generating list has at most r nonzero entries; r being, affine dimension

$$\mathrm{rank}\, V_N^T D V_N = \mathrm{rank}\, Q_p\sqrt{\Lambda}\,Q^T = \mathrm{rank}\,\Lambda = r \qquad (680)$$

Each member then has at least N − 1 − r zeros in its higher-dimensional coordinates because r ≤ N − 1. (590) To truncate those zeros, choose n equal to affine dimension, which is the smallest n possible because XV_N has rank r ≤ n (586).^4.45 In that case, the simplest choice for Q_p is [ I 0 ] having dimensions r × N−1.

4.44 Recall r signifies affine dimension. Q_p is not necessarily an orthogonal matrix. Q_p is constrained such that only its first r columns are necessarily orthonormal because there are only r nonzero eigenvalues in Λ when −V_N^T D V_N has rank r (§4.7.1.1). Remaining columns of Q_p are arbitrary.

4.45 If we write $Q^T = \begin{bmatrix} q_1^T\\ \vdots\\ q_{N-1}^T \end{bmatrix}$ as rowwise eigenvectors, Λ = diag(λ_1, ..., λ_r, 0, ..., 0) in terms of eigenvalues, and Q_p = [ q_{p1} ⋯ q_{p,N−1} ] as column vectors, then $Q_p\sqrt{\Lambda}\,Q^T = \sum_{i=1}^{r} \sqrt{\lambda_i}\, q_{pi}\, q_i^T$ is a sum of r linearly independent rank-one matrices (§B.1.1). Hence the summation has rank r.

We may wish to verify the list (679) found from the diagonalization of −V_N^T D V_N. Because of rotation/reflection and translation invariance (§4.5), EDM D can be uniquely made from that list by calculating: (460)


$$D(X) = D(X[\,0\ \ \sqrt{2}V_N\,]) = D(Q_p[\,0\ \ \sqrt{\Lambda}\,Q^T\,]) = D([\,0\ \ \sqrt{\Lambda}\,Q^T\,]) \qquad (681)$$

This suggests a way to find EDM D given −V_N^T D V_N ; (confer (562))

$$D = \begin{bmatrix} 0\\ \delta(-V_N^T D V_N) \end{bmatrix}\mathbf{1}^T
+ \mathbf{1}\,[\,0\ \ \delta(-V_N^T D V_N)^T\,]
- 2\begin{bmatrix} 0 & 0^T\\ 0 & -V_N^T D V_N \end{bmatrix} \qquad (480)$$

4.12.2 0 geometric center. V

Alternatively, we may perform reconstruction using instead the auxiliary matrix V (§B.4.1), corresponding to the polyhedron

$$\mathcal{P} - \alpha_c \qquad (682)$$

whose geometric center has been translated to the origin. Redimensioning Q, Λ ∈ R^{N×N} and Q_p ∈ R^{n×N}, (585)

$$-VDV = 2V X^T X V \triangleq Q\sqrt{\Lambda}\,Q_p^T Q_p\sqrt{\Lambda}\,Q^T \qquad (683)$$

where the geometrically centered generating list constitutes (confer (679))

$$\begin{array}{rl}
XV &= \sqrt{-VDV\,\tfrac{1}{2}} = \tfrac{1}{\sqrt{2}}\,Q_p\sqrt{\Lambda}\,Q^T \in \mathbb{R}^{n\times N}\\[2pt]
&= [\,x_1 - \tfrac{1}{N}X\mathbf{1}\ \ \ x_2 - \tfrac{1}{N}X\mathbf{1}\ \ \ x_3 - \tfrac{1}{N}X\mathbf{1}\ \cdots\ x_N - \tfrac{1}{N}X\mathbf{1}\,]
\end{array} \qquad (684)$$

where α_c = (1/N)X1. (§4.5.1.0.1)

Now EDM D can be uniquely made from the list found, by calculating: (460)

$$D(X) = D(XV) = D\!\left(\tfrac{1}{\sqrt{2}}\,Q_p\sqrt{\Lambda}\,Q^T\right) = D\!\left(\sqrt{\Lambda}\,Q^T\right)\tfrac{1}{2} \qquad (685)$$

This EDM is, of course, identical to (681). Similarly to (480), from −VDV we can find EDM D; (confer (549))

$$D = \delta\!\left(-VDV\,\tfrac{1}{2}\right)\mathbf{1}^T + \mathbf{1}\,\delta\!\left(-VDV\,\tfrac{1}{2}\right)^T - 2\left(-VDV\,\tfrac{1}{2}\right) \qquad (486)$$
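Reconstruction per §4.12.1 is a few lines of Matlab. The sketch below (assumptions as in earlier sketches; the rank tolerance is chosen arbitrarily) diagonalizes −V_N^T D V_N, truncates to affine dimension r, forms list (679) with Q_p = [ I 0 ], then verifies the EDM of the recovered list reproduces D as in (681).

```matlab
% Sketch of isometric reconstruction (675)-(679), then verification (681).
N = 6; n = 2; X = randn(n,N); G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
[Q,L] = eig(-VN'*D*VN);
[lam,idx] = sort(diag(L),'descend');  Q = Q(:,idx);   % nonincreasing order
r = sum(lam > 1e-9*max(lam));                         % affine dimension (680)
Xr = [zeros(r,1), diag(sqrt(lam(1:r)))*Q(:,1:r)'];    % list (679), Qp = [I 0]
Gr = Xr'*Xr;
Dr = diag(Gr)*ones(1,N) + ones(N,1)*diag(Gr)' - 2*Gr;
fprintf('reconstruction error: %g\n', norm(D - Dr,'fro'))
```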


Figure 70: Map of United States of America showing some state boundaries and the Great Lakes. All plots made using 5020 connected points. Any difference in scale in (a) through (d) is an artifact of plotting routine.
(a) shows original map made from decimated (latitude, longitude) data.
(b) Original map data rotated (freehand) to highlight curvature of Earth.
(c) Map isometrically reconstructed from the EDM.
(d) Same reconstructed map illustrating curvature.
(e)(f) Two views of one isotonic reconstruction; problem (694) with no sort constraint Ξd (and no hidden line removal).


4.13 Reconstruction examples

4.13.1 Isometric reconstruction

4.13.1.0.1 Example. Map of the USA.
The most fundamental application of EDMs is to reconstruct relative point position given only interpoint distance information. Drawing a map of the United States is a good illustration of isometric reconstruction from complete distance data. We obtained latitude and longitude information for the coast, border, states, and Great Lakes from the usalo atlas data file within the Matlab Mapping Toolbox; the conversion to Cartesian coordinates (x,y,z) via:

$$\begin{array}{l}
\phi \triangleq \pi/2 - \text{latitude}\\
\theta \triangleq \text{longitude}\\[4pt]
x = \sin(\phi)\cos(\theta)\\
y = \sin(\phi)\sin(\theta)\\
z = \cos(\phi)
\end{array} \qquad (686)$$

We used 64% of the available map data to calculate EDM D from N = 5020 points. The original (decimated) data and its isometric reconstruction via (677) are shown in Figure 70(a)-(d). The Matlab code is in §G.3.1. The eigenvalues computed for (675) are

$$\lambda(-V_N^T D V_N) = [\,199.8\ \ 152.3\ \ 2.465\ \ 0\ \ 0\ \ 0\ \cdots\,]^T \qquad (687)$$

The 0 eigenvalues have absolute numerical error on the order of 2E-13; meaning, the EDM data indicates three dimensions (r = 3) are required for reconstruction to nearly machine precision.

4.13.2 Isotonic reconstruction

Sometimes only comparative information about distance is known (Earth is closer to the Moon than it is to the Sun). Suppose, for example, the EDM D for three points is unknown:


$$D = [d_{ij}] = \begin{bmatrix} 0 & d_{12} & d_{13}\\ d_{12} & 0 & d_{23}\\ d_{13} & d_{23} & 0 \end{bmatrix} \in \mathbb{S}^3_h \qquad (449)$$

but the comparative data is available:

$$d_{13} \ \ge\ d_{23} \ \ge\ d_{12} \qquad (688)$$

With the vectorization d = [d_12 d_13 d_23]^T ∈ R^3, we express the comparative distance relationship as the nonincreasing sorting

$$\Xi\, d = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} d_{12}\\ d_{13}\\ d_{23} \end{bmatrix} = \begin{bmatrix} d_{13}\\ d_{23}\\ d_{12} \end{bmatrix} \in \mathcal{K}_{M+} \qquad (689)$$

where Ξ is a given permutation matrix expressing the known sorting action on the entries of unknown EDM D, and K_M+ is the monotone nonnegative cone (§2.13.9.4.1)

$$\mathcal{K}_{M+} \triangleq \{z \mid z_1 \ge z_2 \ge \cdots \ge z_{N(N-1)/2} \ge 0\} \ \subseteq\ \mathbb{R}^{N(N-1)/2}_+ \qquad (347)$$

where N(N−1)/2 = 3 for the present example. From the sorted vectorization (689) we create the sort-index matrix

$$O = \begin{bmatrix} 0 & 1^2 & 3^2\\ 1^2 & 0 & 2^2\\ 3^2 & 2^2 & 0 \end{bmatrix} \in \mathbb{S}^3_h \cap \mathbb{R}^{3\times 3}_+ \qquad (690)$$

generally defined

$$O_{ij} \triangleq k^2 \ \mid\ d_{ij} = (\Pi\,\Xi\, d)_k\,,\qquad j \ne i \qquad (691)$$

where Π is a permutation matrix completely reversing order of vector entries.

Replacing EDM data with indices-square of a nonincreasing sorting like this is, of course, a heuristic we invented and may be regarded as a nonlinear introduction of much noise into the Euclidean distance matrix. For large data sets, this heuristic makes an otherwise intense problem computationally tractable; we see an example in relaxed problem (695).
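Construction (691) mechanizes directly; a minimal Matlab sketch (the column-wise upper-triangle vectorization used here happens to match the dvec ordering (693)):

```matlab
% Sketch: build the sort-index matrix O (690)-(691) from an EDM D.
N = 4; X = randn(2,N); G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
mask = triu(true(N),1);
d = D(mask);                          % d12,d13,d23,d14,... as in (693)
[~,order] = sort(d,'ascend');         % Pi*Xi*d : nondecreasing sort
k = zeros(size(d));
k(order) = (1:numel(d))';             % k(i) = sort position of d(i)
O = zeros(N);  O(mask) = k.^2;  O = O + O';   % indices-square, hollow symmetric
disp(O)
```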


Figure 71: Largest ten eigenvalues of −V_N^T O V_N for map of USA, sorted by nonincreasing value. In the code (§G.3.2), we normalize O by (N(N−1)/2)².

Any process of reconstruction that leaves comparative distance information intact is called ordinal multidimensional scaling or isotonic reconstruction. Beyond rotation, reflection, and translation error, (§4.5) list reconstruction by isotonic reconstruction is subject to error in absolute scale (dilation) and distance ratio. Yet Borg & Groenen argue: [34, §2.2] reconstruction from complete comparative distance information for a large number of points is as highly constrained as reconstruction from an EDM; the larger the number, the better.

4.13.2.1 Isotonic map of the USA

To test Borg & Groenen's conjecture, suppose we make a complete sort-index matrix O ∈ S^N_h ∩ R^{N×N}_+ for the map of the USA and then substitute O in place of EDM D in the reconstruction process of §4.12. Whereas EDM D returned only three significant eigenvalues (687), the sort-index matrix O is generally not an EDM (certainly not an EDM with corresponding affine dimension 3) so returns many more. The eigenvalues, calculated with absolute numerical error approximately 5E-7, are plotted in Figure 71:

$$\lambda(-V_N^T O V_N) = [\,880.1\ \ 463.9\ \ 186.1\ \ 46.20\ \ 17.12\ \ 9.625\ \ 8.257\ \ 1.701\ \ 0.7128\ \ 0.6460\ \cdots\,]^T \qquad (692)$$


The extra eigenvalues indicate that affine dimension corresponding to an EDM near O is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM D closest to O in some sense (a problem explored more in §7) while maintaining the known comparative distance relationship; e.g., given permutation matrix Ξ expressing the known sorting action on the entries d of unknown D ∈ S^N_h, (62)

$$d \triangleq \frac{1}{\sqrt{2}}\,\mathrm{dvec}\,D = \begin{bmatrix} d_{12}\\ d_{13}\\ d_{23}\\ d_{14}\\ d_{24}\\ d_{34}\\ \vdots\\ d_{N-1,N} \end{bmatrix} \in \mathbb{R}^{N(N-1)/2} \qquad (693)$$

we can make the sort-index matrix O input to the optimization problem

$$\begin{array}{ll}
\underset{D}{\text{minimize}} & \|{-V_N^T(D - O)V_N}\|_F\\
\text{subject to} & \mathrm{rank}\,V_N^T D V_N \le 3\\
& \Xi\, d \in \mathcal{K}_{M+}\\
& D \in \mathrm{EDM}^N
\end{array} \qquad (694)$$

that finds the EDM D (corresponding to affine dimension not exceeding 3 in isomorphic dvec EDM^N ∩ Ξ^T K_M+) closest to O in the sense of Schoenberg (479).

Analytical solution to this problem, ignoring the sort constraint Ξd ∈ K_M+, is known [229]: we get the convex optimization [sic] (§7.1)

$$\begin{array}{ll}
\underset{D}{\text{minimize}} & \|{-V_N^T(D - O)V_N}\|_F\\
\text{subject to} & \mathrm{rank}\,V_N^T D V_N \le 3\\
& D \in \mathrm{EDM}^N
\end{array} \qquad (695)$$

Only the three largest nonnegative eigenvalues in (692) need be retained to make list (679); the rest are discarded. The reconstruction from EDM D found in this manner is plotted in Figure 70(e)(f), from which it becomes obvious that inclusion of the sort constraint is necessary for isotonic reconstruction.


That sort constraint demands: any optimal solution D⋆ must possess the known comparative distance relationship that produces the original ordinal distance data O (691). Ignoring the sort constraint, apparently, violates it. Yet even more remarkable is how much the map reconstructed using only ordinal data still resembles the original map of the USA after suffering the many violations produced by solving relaxed problem (695). This suggests the simple reconstruction techniques of §4.12 are robust to a significant amount of noise.

4.13.2.2 Isotonic solution with sort constraint

Because problems involving rank are generally difficult, we will partition (694) into two problems we know how to solve and then alternate their solution until convergence:

$$\begin{array}{ll}
\text{(a)}\quad & \begin{array}{ll}
\underset{D}{\text{minimize}} & \|{-V_N^T(D - O)V_N}\|_F\\
\text{subject to} & \mathrm{rank}\,V_N^T D V_N \le 3\\
& D \in \mathrm{EDM}^N
\end{array}\\[18pt]
\text{(b)}\quad & \begin{array}{ll}
\underset{\sigma}{\text{minimize}} & \|\sigma - \Xi d\|\\
\text{subject to} & \sigma \in \mathcal{K}_{M+}
\end{array}
\end{array} \qquad (696)$$

where the sort-index matrix O (a given constant in (a)) becomes an implicit vectorized variable o_i solving the ith instance of (696b)

$$o_i \triangleq \Xi^T\sigma^\star = \frac{1}{\sqrt{2}}\,\mathrm{dvec}\,O_i \in \mathbb{R}^{N(N-1)/2}\,,\qquad i \in \{1, 2, 3 \ldots\} \qquad (697)$$

As mentioned in discussion of relaxed problem (695), a closed-form solution to problem (696a) exists. Only the first iteration of (696a) sees the original sort-index matrix O whose entries are nonnegative whole numbers; id est, O_0 = O ∈ S^N_h ∩ R^{N×N}_+ (691). Subsequent iterations i take the previous solution of (696b) as input,

$$O_i = \mathrm{dvec}^{-1}(\sqrt{2}\,o_i) \in \mathbb{S}^N \qquad (698)$$

real successors to the sort-index matrix O.


New problem (696b) finds the unique minimum-distance projection of Ξd on the monotone nonnegative cone K_M+. By defining

$$Y^{\dagger T} \triangleq [\,e_1 - e_2\ \ e_2 - e_3\ \ e_3 - e_4\ \cdots\ e_m\,] \in \mathbb{R}^{m\times m} \qquad (348)$$

where m ≜ N(N−1)/2, we may rewrite (696b) as an equivalent quadratic program; a convex optimization problem [38, §4] in terms of the halfspace-description of K_M+:

$$\begin{array}{ll}
\underset{\sigma}{\text{minimize}} & (\sigma - \Xi d)^T(\sigma - \Xi d)\\
\text{subject to} & Y^\dagger \sigma \succeq 0
\end{array} \qquad (699)$$

This quadratic program can be converted to a semidefinite program via Schur form (§A.4.1); we get the equivalent problem

$$\begin{array}{ll}
\underset{t\in\mathbb{R},\,\sigma}{\text{minimize}} & t\\[2pt]
\text{subject to} & \begin{bmatrix} tI & \sigma - \Xi d\\ (\sigma - \Xi d)^T & 1 \end{bmatrix} \succeq 0\\[8pt]
& Y^\dagger \sigma \succeq 0
\end{array} \qquad (700)$$
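Projection (699) is a small dense quadratic program. A hedged Matlab sketch using quadprog (assuming the Optimization Toolbox is available; the data d and permutation Ξ below are hypothetical stand-ins):

```matlab
% Sketch of (699): project Xi*d on the monotone nonnegative cone K_M+.
d  = [0.9; 2.1; 1.4];                      % hypothetical dvec data, m = 3
Xi = [0 1 0; 0 0 1; 1 0 0];                % hypothetical sorting permutation
m  = numel(d);
Ydag = [-diff(eye(m)); [zeros(1,m-1) 1]];  % rows: e_i'-e_{i+1}', then e_m'
b = Xi*d;
% minimize (sigma - b)'(sigma - b)  subject to  Ydag*sigma >= 0
sigma = quadprog(2*eye(m), -2*b, -Ydag, zeros(m,1));
disp(sigma')                               % nonincreasing and nonnegative
```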


4.13.2.3 Convergence

In §E.10 we discuss convergence of alternating projection on intersecting convex sets in a Euclidean vector space; convergence to a point in their intersection. Here the situation is different for two reasons:

Firstly, sets of positive semidefinite matrices having an upper bound on rank are generally not convex. Yet in §7.1.4.0.1 we prove (696a) is equivalent to a projection of nonincreasingly ordered eigenvalues on a subset of the nonnegative orthant:

$$\begin{array}{ll}
\underset{D}{\text{minimize}} & \|{-V_N^T(D - O)V_N}\|_F\\
\text{subject to} & \mathrm{rank}\,V_N^T D V_N \le 3\\
& D \in \mathrm{EDM}^N
\end{array}
\quad\equiv\quad
\begin{array}{ll}
\underset{\Upsilon}{\text{minimize}} & \|\Upsilon - \Lambda\|_F\\[2pt]
\text{subject to} & \delta(\Upsilon) \in \begin{bmatrix}\mathbb{R}^3_+\\ 0\end{bmatrix}
\end{array} \qquad (701)$$

where −V_N^T D V_N ≜ UΥU^T ∈ S^{N−1} and −V_N^T O V_N ≜ QΛQ^T ∈ S^{N−1} are ordered diagonalizations (§A.5). It so happens: optimal orthogonal U⋆ always equals Q given. Linear operator T(A) = U^{⋆T}AU^⋆, acting on square matrix A, is a bijective isometry because the Frobenius norm is orthogonally invariant (39). This isometric isomorphism T thus maps a nonconvex problem to a convex one that preserves distance.

Secondly, the second half (696b) of the alternation takes place in a different vector space; S^N_h (versus S^{N−1}). From §4.6 we know these two vector spaces are related by an isomorphism, S^{N−1} = V_N(S^N_h) (567), but not by an isometry.

We have, therefore, no guarantee from theory of alternating projection that the alternation (696) converges to a point, in the set of all EDMs corresponding to affine dimension not in excess of 3, belonging to dvec EDM^N ∩ Ξ^T K_M+.

4.13.2.4 Interlude

We have not implemented the second half (699) of alternation (696) for USA map data because memory demands exceed the capability of our 32-bit laptop computer. We leave it an exercise to empirically demonstrate convergence on a smaller data set.

It would be remiss not to mention another method of solution to this isotonic reconstruction problem: Once again we assume only comparative distance data like (688) is available. Given known set of indices I,

$$\begin{array}{ll}
\underset{D}{\text{minimize}} & \mathrm{rank}\,VDV\\
\text{subject to} & d_{ij} \le d_{kl} \le d_{mn} \quad \forall\,(i,j,k,l,m,n) \in \mathcal{I}\\
& D \in \mathrm{EDM}^N
\end{array} \qquad (702)$$

this problem minimizes affine dimension while finding an EDM whose entries satisfy known comparative relationships. Suitable rank heuristics are discussed in §7.2.2 that will transform this to a convex optimization problem. Using contemporary computers, even with a rank heuristic in place of the objective function, this problem formulation is more difficult to compute than the relaxed counterpart problem (695). That is because there exist efficient algorithms to compute a selected few eigenvalues and eigenvectors from a very large matrix. Regardless, it is important to recognize: the optimal solution set for this problem (702) is practically always different from the optimal solution set for its counterpart, problem (694).


4.14 Fifth property of Euclidean metric

We continue now with the question raised in §4.3 regarding the necessity for at least one requirement more than the four properties of the Euclidean metric (§4.2) to certify realizability of a bounded convex polyhedron or to reconstruct a generating list for it from incomplete distance information. There we saw that the four Euclidean metric properties are necessary for D ∈ EDM^N in the case N = 3, but become insufficient when cardinality N exceeds 3 (regardless of affine dimension).

4.14.1 Recapitulate

In the particular case N = 3, −V_N^T D V_N ⪰ 0 (607) and D ∈ S^3_h are necessary and sufficient conditions for D to be an EDM. By (609), triangle inequality is then the only Euclidean condition bounding the necessarily nonnegative d_ij; and those bounds are tight. That means the first four properties of the Euclidean metric are necessary and sufficient conditions for D to be an EDM in the case N = 3; for i,j ∈ {1, 2, 3}

$$\left.\begin{array}{l}
\sqrt{d_{ij}} \ge 0\,,\ i \ne j\\
\sqrt{d_{ij}} = 0\,,\ i = j\\
\sqrt{d_{ij}} = \sqrt{d_{ji}}\\
\sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i \ne j \ne k
\end{array}\right\}
\ \Leftrightarrow\ 
\begin{array}{l}
-V_N^T D V_N \succeq 0\\
D \in \mathbb{S}^3_h
\end{array}
\ \Leftrightarrow\ D \in \mathrm{EDM}^3 \qquad (703)$$

Yet those four properties become insufficient when N > 3.

4.14.2 Derivation of the fifth

Correspondence between the triangle inequality and the EDM was developed in §4.8.2 where a triangle inequality (609a) was revealed within the leading principal 2×2 submatrix of −V_N^T D V_N when positive semidefinite. Our choice of the leading principal submatrix was arbitrary; actually, a unique triangle inequality like (513) corresponds to any one of the (N−1)!/(2!(N−1−2)!) principal 2×2 submatrices.^4.46

4.46 There are fewer principal 2×2 submatrices in −V_N^T D V_N than there are triangles made by four or more points because there are N!/(3!(N−3)!) triangles made by point triples. The triangles corresponding to those submatrices all have vertex x_1. (confer §4.8.2.1)


Assuming D ∈ S^4_h and −V_N^T D V_N ∈ S^3, then by the positive (semi)definite principal submatrices theorem (§A.3.1.0.4) it is sufficient to prove all d_ij are nonnegative, all triangle inequalities are satisfied, and det(−V_N^T D V_N) is nonnegative. When N = 4, in other words, that nonnegative determinant becomes the fifth and last Euclidean metric requirement for D ∈ EDM^N. We now endeavor to ascribe geometric meaning to it.

4.14.2.1 Nonnegative determinant

By (519) when D ∈ EDM^4, −V_N^T D V_N is equal to inner product (514),

$$\Theta^T\Theta = \begin{bmatrix}
d_{12} & \sqrt{d_{12}d_{13}}\cos\theta_{213} & \sqrt{d_{12}d_{14}}\cos\theta_{214}\\
\sqrt{d_{12}d_{13}}\cos\theta_{213} & d_{13} & \sqrt{d_{13}d_{14}}\cos\theta_{314}\\
\sqrt{d_{12}d_{14}}\cos\theta_{214} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & d_{14}
\end{bmatrix} \qquad (704)$$

Because Euclidean space is an inner-product space, the more concise inner-product form of the determinant is admitted;

$$\det(\Theta^T\Theta) = -d_{12}\,d_{13}\,d_{14}\left(\cos(\theta_{213})^2 + \cos(\theta_{214})^2 + \cos(\theta_{314})^2 - 2\cos\theta_{213}\cos\theta_{214}\cos\theta_{314} - 1\right) \qquad (705)$$

The determinant is nonnegative if and only if

$$\begin{array}{c}
\cos\theta_{214}\cos\theta_{314} - \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2} \ \le\ \cos\theta_{213} \ \le\ \cos\theta_{214}\cos\theta_{314} + \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2}\\
\Leftrightarrow\\
\cos\theta_{213}\cos\theta_{314} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2} \ \le\ \cos\theta_{214} \ \le\ \cos\theta_{213}\cos\theta_{314} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2}\\
\Leftrightarrow\\
\cos\theta_{213}\cos\theta_{214} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2} \ \le\ \cos\theta_{314} \ \le\ \cos\theta_{213}\cos\theta_{214} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2}
\end{array} \qquad (706)$$

which simplifies, for 0 ≤ θ_i1l, θ_l1j, θ_i1j ≤ π and all i ≠ j ≠ l ∈ {2, 3, 4}, to

$$\cos(\theta_{i1l} + \theta_{l1j}) \ \le\ \cos\theta_{i1j} \ \le\ \cos(\theta_{i1l} - \theta_{l1j}) \qquad (707)$$

Analogously to triangle inequality (621), the determinant is 0 upon equality on either side of (707), which is tight. Inequality (707) can be equivalently written linearly as a "triangle inequality", but between relative angles [261, §1.4];

$$\begin{array}{c}
|\theta_{i1l} - \theta_{l1j}| \ \le\ \theta_{i1j} \ \le\ \theta_{i1l} + \theta_{l1j}\\
\theta_{i1l} + \theta_{l1j} + \theta_{i1j} \ \le\ 2\pi\\
0 \ \le\ \theta_{i1l}\,,\ \theta_{l1j}\,,\ \theta_{i1j} \ \le\ \pi
\end{array} \qquad (708)$$
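Before generalizing (708) beyond vertex x_1, note it is easily confirmed numerically for any four points; a minimal Matlab sketch, with angles at vertex x_1 computed by the law of cosines implicit in identity (511):

```matlab
% Sketch: verify relative-angle inequality (708) for four random points.
X = randn(3,4);
d  = @(i,j) norm(X(:,i)-X(:,j))^2;                           % distance-square
ca = @(i,j) (d(1,i)+d(1,j)-d(i,j))/(2*sqrt(d(1,i)*d(1,j)));  % cos of angle at x1
t213 = acos(ca(2,3));  t214 = acos(ca(2,4));  t314 = acos(ca(3,4));
fprintf('%g <= %g <= %g\n', abs(t213-t314), t214, t213+t314) % (708): i=2, l=3, j=4
fprintf('angle sum %g <= 2*pi\n', t213 + t314 + t214)
```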


Generalizing this:

Figure 72: The relative-angle inequality tetrahedron (709) bounding EDM^4 is regular; drawn in entirety in coordinates θ_213, θ_214, θ_314 on [0, π]. Each angle θ (511) must belong to this solid to be realizable.

4.14.2.1.1 Fifth property of Euclidean metric - restatement.
Relative-angle inequality. (confer §4.3.1.0.1) [31] [32, p.17, p.107] [145, §3.1]
Augmenting the four fundamental Euclidean metric properties in R^n: for all i, j, l ≠ k ∈ {1...N}, i < j < l, the relative angles at each vertex x_k must satisfy

$$\begin{array}{c}
|\theta_{ikl} - \theta_{lkj}| \ \le\ \theta_{ikj} \ \le\ \theta_{ikl} + \theta_{lkj}\\
\theta_{ikl} + \theta_{lkj} + \theta_{ikj} \ \le\ 2\pi\\
0 \ \le\ \theta_{ikl}\,,\ \theta_{lkj}\,,\ \theta_{ikj} \ \le\ \pi
\end{array} \qquad (709)$$
⋄


In the case N = 4, the four Euclidean metric properties together with relative-angle inequality (709) become necessary and sufficient tests for realizability; the triangle inequality remains a necessary although penultimate test; (§4.4.3)

$$\begin{array}{l}
\text{Four Euclidean metric properties (§4.2).}\\
\text{Angle } \theta \text{ inequality (454) or (709).}
\end{array}
\ \Leftrightarrow\ 
\begin{array}{l}
-V_N^T D V_N \succeq 0\\
D \in \mathbb{S}^4_h
\end{array}
\ \Leftrightarrow\ D = D(\Theta) \in \mathrm{EDM}^4 \qquad (710)$$

The relative-angle inequality, for this case, is illustrated in Figure 72.

4.14.2.2 Beyond the fifth metric property

When cardinality N exceeds 4, the first four properties of the Euclidean metric and the relative-angle inequality together become insufficient conditions for realizability. In other words, the four Euclidean metric properties and relative-angle inequality remain necessary but become a sufficient test only for positive semidefiniteness of all the principal 3×3 submatrices [sic] in −V_N^T D V_N. Relative-angle inequality can be considered the ultimate test only for realizability at each vertex x_k of each and every purported tetrahedron constituting a hyperdimensional body.

When N = 5 in particular, relative-angle inequality becomes the penultimate Euclidean metric requirement while nonnegativity of the now unwieldy det(Θ^TΘ) corresponds (by the positive (semi)definite principal submatrices theorem in §A.3.1.0.4) to the sixth and last Euclidean metric requirement. Together these six tests become necessary and sufficient, and so on.

Yet for all values of N, only assuming nonnegative d_ij, relative-angle matrix inequality in (623) is necessary and sufficient to certify realizability; (§4.4.3.1)

$$\begin{array}{l}
\text{Euclidean metric property 1 (§4.2).}\\
\text{Angle matrix inequality } \Omega \succeq 0 \text{ (520).}
\end{array}
\ \Leftrightarrow\ 
\begin{array}{l}
-V_N^T D V_N \succeq 0\\
D \in \mathbb{S}^N_h
\end{array}
\ \Leftrightarrow\ D = D(\Omega,d) \in \mathrm{EDM}^N \qquad (711)$$

Like matrix criteria (455), (479), and (623), the relative-angle matrix inequality and nonnegativity property subsume all the Euclidean metric properties and further requirements.

4.14.3 Path not followed

As a means to test for realizability of four or more points, an intuitively appealing way to augment the four Euclidean metric properties


is to recognize generalizations of the triangle inequality: In the case N = 4, the three-dimensional analogue to triangle & distance is tetrahedron & facet-area, while in the case N = 5 the four-dimensional analogue is polychoron & facet-volume, ad infinitum. For N points, N + 1 metric properties are required.

4.14.3.1 N = 4

Each of the four facets of a general tetrahedron is a triangle and its relative interior. Suppose we identify each facet of the tetrahedron by its area-squared: c_1, c_2, c_3, c_4. Then analogous to metric property 4, we may write a tight^4.47 area inequality for the facets

$$\sqrt{c_i} \ \le\ \sqrt{c_j} + \sqrt{c_k} + \sqrt{c_l}\,,\qquad i \ne j \ne k \ne l \in \{1, 2, 3, 4\} \qquad (712)$$

which is a generalized "triangle" inequality [140, §1.1] that follows from

$$\sqrt{c_i} = \sqrt{c_j}\cos\varphi_{ij} + \sqrt{c_k}\cos\varphi_{ik} + \sqrt{c_l}\cos\varphi_{il} \qquad (713)$$

[150] [244, Law of Cosines] where φ_ij is the dihedral angle at the common edge between triangular facets i and j.

If D is the EDM corresponding to the whole tetrahedron, then area-squared of the ith triangular facet has a convenient formula in terms of D_i ∈ EDM^{N−1}, the EDM corresponding to that particular facet: From the Cayley-Menger determinant^4.48 for simplices, [244] [69] [90, §4] [49, §3.3] the ith facet area-squared for i ∈ {1...N} is (§A.4.2)

$$c_i = \frac{-1}{2^{N-2}(N-2)!^2}\,\det\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D_i \end{bmatrix} \qquad (714)$$

$$\phantom{c_i} = \frac{(-1)^N}{2^{N-2}(N-2)!^2}\,\det D_i\,\bigl(\mathbf{1}^T D_i^{-1}\mathbf{1}\bigr) \qquad (715)$$

$$\phantom{c_i} = \frac{(-1)^N}{2^{N-2}(N-2)!^2}\,\mathbf{1}^T\mathrm{cof}(D_i)^T\mathbf{1} \qquad (716)$$

4.47 The upper bound is met when all angles in (713) are simultaneously 0; that occurs, for example, if one point is relatively interior to the convex hull of the three remaining.

4.48 whose foremost characteristic is: the determinant vanishes if and only if affine dimension does not equal penultimate cardinality; id est, $\det\begin{bmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix} = 0 \ \Leftrightarrow\ r < N-1$ where D is any EDM (§4.7.3.0.1). Otherwise, the determinant is negative.

4.49 Every principal submatrix of an EDM remains an EDM. [145, §4.1]


where D_i is the ith principal N−1×N−1 submatrix^4.49 of D ∈ EDM^N, and cof(D_i) is the N−1×N−1 matrix of cofactors [212, §4] corresponding to D_i. The number of principal 3×3 submatrices in D is, of course, equal to the number of triangular facets in the tetrahedron; four (N!/(3!(N−3)!)) when N = 4.

The triangle inequality (property 4) and area inequality (712) are conditions necessary for D to be an EDM; we do not prove their sufficiency in conjunction with the remaining three Euclidean metric properties.

4.14.3.2 N = 5

Moving to the next level, we might encounter a Euclidean body called polychoron, a bounded polyhedron in four dimensions.^4.50 The polychoron has five (N!/(4!(N−4)!)) facets, each of them a general tetrahedron whose volume-squared c_i is calculated using the same formula; (714) where D is the EDM corresponding to the polychoron, and D_i is the EDM corresponding to the ith facet (the principal 4×4 submatrix of D ∈ EDM^N corresponding to the ith tetrahedron). The analogue to triangle & distance is now polychoron & facet-volume. We could then write another generalized "triangle" inequality like (712) but in terms of facet volume; [248, §IV]

$$\sqrt{c_i} \ \le\ \sqrt{c_j} + \sqrt{c_k} + \sqrt{c_l} + \sqrt{c_m}\,,\qquad i \ne j \ne k \ne l \ne m \in \{1 \ldots 5\} \qquad (717)$$

Now, for N = 5, the triangle (distance) inequality (§4.2) and the area inequality (712) and the volume inequality (717) are conditions necessary for D to be an EDM. We do not prove their sufficiency.

4.14.3.3 Volume of simplices

There is no known formula for the volume of a bounded general convex polyhedron expressed either by halfspace or vertex-description. [259, §2.1] [177, p.173] [142] [143] [100] [101] Volume is a concept germane to R^3; in higher dimensions it is called content. Applying the EDM assertion (§4.9.1.0.3) and a result given in [38, §8.3.1], a general nonempty simplex (§2.12.3) in R^{N−1} corresponding to an EDM D ∈ S^N_h has content

$$\sqrt{c} = \text{content}(\mathcal{S})\,\det^{1/2}(-V_N^T D V_N) \qquad (718)$$

4.50 The simplest polychoron is called a pentatope [244]; a regular simplex hence convex. (A pentahedron is a three-dimensional body having five vertices.)


Figure 73: Length of one-dimensional face a equals height h = a = 1 of this nonsimplicial pyramid in R^3 with square base inscribed in a circle of radius R centered at the origin. [244, Pyramid]

where the content-squared of the unit simplex S ⊂ R^{N−1} is proportional to its Cayley-Menger determinant;

$$\text{content}(\mathcal{S})^2 = \frac{-1}{2^{N-1}(N-1)!^2}\,\det\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D([\,0\ \ e_1\ \ e_2\ \cdots\ e_{N-1}\,]) \end{bmatrix} \qquad (719)$$

where e_i ∈ R^{N−1} and the EDM operator used is D(X) (460).

4.14.3.3.1 Example. Pyramid.
A formula for volume of a pyramid is known;^4.51 it is 1/3 the product of its base area with its height. [138] The pyramid in Figure 73 has volume 1/3. To find its volume using EDMs, we must first decompose the pyramid into simplicial parts. Slicing it in half along the plane containing the line segments corresponding to radius R and height h, we find the vertices of one simplex,

$$X = \begin{bmatrix} 1/2 & 1/2 & -1/2 & 0\\ 1/2 & -1/2 & -1/2 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{n\times N} \qquad (720)$$

where N = n + 1 for any nonempty simplex in R^n. The volume of this simplex is half that of the entire pyramid; id est, √c = 1/6, found by evaluating (718).

With that, we conclude digression of path.

4.51 Pyramid volume is independent of the paramount vertex position as long as its height remains constant.
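The pyramid computation is a short Matlab exercise once (718) and (719) are composed; a sketch reproducing √c = 1/6 for list (720), under the same V_N assumption as earlier sketches:

```matlab
% Sketch: content of the half-pyramid simplex (720) via (718), with
% content(S)^2 evaluated per the Cayley-Menger determinant (719).
X = [ 1/2  1/2 -1/2  0
      1/2 -1/2 -1/2  0
       0    0    0   1 ];                      % vertex list (720)
N = size(X,2);
edm = @(Y) diag(Y'*Y)*ones(1,size(Y,2)) + ones(size(Y,2),1)*diag(Y'*Y)' - 2*(Y'*Y);
D  = edm(X);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
Ds = edm([zeros(N-1,1) eye(N-1)]);             % EDM of unit-simplex vertices
contS2 = -det([0 ones(1,N); ones(N,1) -Ds]) / (2^(N-1)*factorial(N-1)^2);  % (719)
c = contS2 * det(-VN'*D*VN);                   % content-squared, (718)
fprintf('content = %g (expect 1/6)\n', sqrt(c))
```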


4.14.4 Affine dimension reduction in three dimensions

(confer §4.8.4) The determinant of any M×M matrix is equal to the product of its M eigenvalues. [212, §5.1] When N = 4 and det(Θ^TΘ) is 0, that means one or more eigenvalues of Θ^TΘ ∈ R^{3×3} are 0. The determinant will go to 0 whenever equality is attained on either side of (454), (709a), or (709b), meaning that a tetrahedron has collapsed to a lower affine dimension; id est, r = rank Θ^TΘ = rank Θ is reduced below N − 1 exactly by the number of 0 eigenvalues (§4.7.1.1).

In solving completion problems of any size N where one or more entries of an EDM are unknown, therefore, dimension r of the affine hull required to contain the unknown points is potentially reduced by selecting distances to attain equality in (454) or (709a) or (709b).

4.14.4.1 Exemplum redux

We now apply the fifth Euclidean metric property to an earlier problem:

4.14.4.1.1 Example. Small completion problem, IV. (confer §4.9.3.0.1)
Returning again to Example 4.3.0.0.2 that pertains to Figure 57 where N = 4, distance-square d_14 is ascertainable from the fifth Euclidean metric property. Because all distances in (452) are known except √d_14, then cos θ_123 = 0 and θ_324 = 0 result from identity (511). Applying (454),

$$\begin{array}{c}
\cos(\theta_{123} + \theta_{324}) \ \le\ \cos\theta_{124} \ \le\ \cos(\theta_{123} - \theta_{324})\\
0 \ \le\ \cos\theta_{124} \ \le\ 0
\end{array} \qquad (721)$$

It follows again from (511) that d_14 can only be 2. As explained in this subsection, affine dimension r cannot exceed N − 2 because equality is attained in (721).


Chapter 5

EDM cone

  For N > 3, the cone of EDMs is no longer a circular cone and the geometry becomes complicated...
  −Hayden, Wells, Liu, & Tarazaga (1991) [110, §3]

In the subspace of symmetric matrices S^N, we know the convex cone of Euclidean distance matrices EDM^N does not intersect the positive semidefinite (PSD) cone S^N_+ except at the origin, their only vertex; there can be no positive nor negative semidefinite EDM. (646) [145]

$$\mathrm{EDM}^N \cap \mathbb{S}^N_+ = 0 \qquad (722)$$

Even so, the two convex cones can be related. We prove the new equality

$$\mathrm{EDM}^N = \mathbb{S}^N_h \cap \bigl(\mathbb{S}^{N\perp}_c - \mathbb{S}^N_+\bigr) \qquad (810)$$

a resemblance to EDM definition (460), where

$$\mathbb{S}^N_h \triangleq \{A \in \mathbb{S}^N \mid \delta(A) = 0\} \qquad (55)$$

is the symmetric hollow subspace (§2.2.3) and where

$$\mathbb{S}^{N\perp}_c = \{u\mathbf{1}^T + \mathbf{1}u^T \mid u \in \mathbb{R}^N\} \qquad (1563)$$

is the orthogonal complement of the geometric center subspace (§E.7.2.0.2)

$$\mathbb{S}^N_c \triangleq \{Y \in \mathbb{S}^N \mid Y\mathbf{1} = 0\} \qquad (1561)$$

In §5.8.1.7 we show: the Schoenberg criterion (479) for discriminating Euclidean distance matrices is simply a membership relation (§2.13.2) between the EDM cone and its ordinary dual.


5.1 Defining the EDM cone

We invoke a popular matrix criterion to illustrate correspondence between the EDM and PSD cones belonging to the ambient space of symmetric matrices:

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ \begin{cases} -VDV \in \mathbb{S}^N_+\\ D \in \mathbb{S}^N_h \end{cases} \qquad (484)$$

where V ∈ S^N is the geometric centering projection matrix (§B.4). The set of all EDMs of dimension N×N forms a closed convex cone EDM^N because any pair of EDMs satisfies the definition of a convex cone (138); videlicet, for each and every ζ_1, ζ_2 ≥ 0 (§A.3.1.0.2)

$$\begin{array}{l}
\zeta_1 VD_1V + \zeta_2 VD_2V \succeq 0\\
\zeta_1 D_1 + \zeta_2 D_2 \in \mathbb{S}^N_h
\end{array}
\ \Leftarrow\ 
\begin{array}{l}
VD_1V \succeq 0\,,\ VD_2V \succeq 0\\
D_1 \in \mathbb{S}^N_h\,,\ D_2 \in \mathbb{S}^N_h
\end{array} \qquad (723)$$

and convex cones are invariant to inverse affine transformation [194, p.22].

5.1.0.0.1 Definition. Cone of Euclidean distance matrices.
In the subspace of symmetric matrices, the set of all Euclidean distance matrices forms a unique immutable pointed closed convex cone called the EDM cone: for N > 0

$$\begin{array}{rl}
\mathrm{EDM}^N &\triangleq \{D \in \mathbb{S}^N_h \mid -VDV \in \mathbb{S}^N_+\}\\[4pt]
&= \displaystyle\bigcap_{z\in\mathcal{N}(\mathbf{1}^T)} \{D \in \mathbb{S}^N \mid \langle zz^T, -D\rangle \ge 0\,,\ \delta(D) = 0\}
\end{array} \qquad (724)$$

The EDM cone in isomorphic R^{N(N+1)/2} [sic] is the intersection of an infinite number (when N > 2) of halfspaces about the origin and a finite number of hyperplanes through the origin in vectorized variable D = [d_ij]. Hence EDM^N has empty interior with respect to S^N because it is confined to the symmetric hollow subspace S^N_h. The EDM cone relative interior comprises

$$\begin{array}{rl}
\mathrm{rel\,int}\,\mathrm{EDM}^N &= \displaystyle\bigcap_{z\in\mathcal{N}(\mathbf{1}^T)} \{D \in \mathbb{S}^N \mid \langle zz^T, -D\rangle > 0\,,\ \delta(D) = 0\}\\[4pt]
&= \{D \in \mathrm{EDM}^N \mid \mathrm{rank}(VDV) = N-1\}
\end{array} \qquad (725)$$

while its relative boundary comprises

$$\begin{array}{rl}
\mathrm{rel}\,\partial\mathrm{EDM}^N &= \{D \in \mathrm{EDM}^N \mid \langle zz^T, -D\rangle = 0 \ \text{for some}\ z \in \mathcal{N}(\mathbf{1}^T)\}\\[4pt]
&= \{D \in \mathrm{EDM}^N \mid \mathrm{rank}(VDV) < N-1\}
\end{array} \qquad (726)$$
△


Figure 74: Relative boundary (tiled) of EDM cone EDM^3 drawn truncated in isometrically isomorphic subspace R^3, in coordinates d_12, d_13, d_23.
(a) EDM cone drawn in usual distance-square coordinates d_ij. View is from interior toward origin. Unlike positive semidefinite cone, EDM cone is not self-dual, neither is it proper in ambient symmetric subspace (dual EDM cone for this example belongs to isomorphic R^6).
(b) Drawn in its natural coordinates √d_ij (absolute distance), cone remains convex (confer §4.10); intersection of three halfspaces (610) whose partial boundaries each contain origin. Cone geometry becomes "complicated" (nonpolyhedral) in higher dimension. [110, §3]
(c) Two coordinate systems artificially superimposed. Coordinate transformation from d_ij to √d_ij appears a topological contraction.
(d) Sitting on its vertex 0, pointed EDM^3 is a circular cone having axis of revolution dvec(−E) = dvec(11^T − I) (642) (62). Rounded vertex is plot artifact.


This cone is more easily visualized in the isomorphic vector subspace R^{N(N−1)/2} corresponding to S^N_h:

In the case N = 1 point, the EDM cone is the origin in R^0.

In the case N = 2, the EDM cone is the nonnegative real line in R; a halfline in a subspace of the realization in Figure 82.

The EDM cone in the case N = 3 is a circular cone in R^3 illustrated in Figure 74(a)(d); rather, the set of all matrices

$$D = \begin{bmatrix} 0 & d_{12} & d_{13}\\ d_{12} & 0 & d_{23}\\ d_{13} & d_{23} & 0 \end{bmatrix} \in \mathrm{EDM}^3 \qquad (727)$$

makes a circular cone in this dimension. In this case, the first four Euclidean metric properties are necessary and sufficient tests to certify realizability of triangles; (703). Thus triangle inequality property 4 describes three halfspaces (610) whose intersection makes a polyhedral cone in R^3 of realizable √d_ij (absolute distance); an isomorphic subspace representation of the set of all EDMs D in the natural coordinates

$$\circ\sqrt{D} \triangleq \begin{bmatrix} 0 & \sqrt{d_{12}} & \sqrt{d_{13}}\\ \sqrt{d_{12}} & 0 & \sqrt{d_{23}}\\ \sqrt{d_{13}} & \sqrt{d_{23}} & 0 \end{bmatrix} \qquad (728)$$

illustrated in Figure 74(b).

5.2 Polyhedral bounds

The convex cone of EDMs is nonpolyhedral in d_ij for N > 2; e.g., Figure 74(a). Still we found necessary and sufficient bounding polyhedral relations consistent with EDM cones for cardinality N = 1, 2, 3, 4:

N = 3. Transforming distance-square coordinates d_ij by taking their positive square root provides the polyhedral cone in Figure 74(b); polyhedral because an intersection of three halfspaces in natural coordinates √d_ij is provided by triangle inequalities (610). This polyhedral cone implicitly encompasses necessary and sufficient metric properties: nonnegativity, self-distance, symmetry, and triangle inequality.


Figure 75: (a) In isometrically isomorphic subspace R^3, intersection of EDM^3 with hyperplane ∂H representing one fixed symmetric entry d_23 = κ (both drawn truncated; rounded vertex is artifact of plot). EDMs in this dimension corresponding to affine dimension 1 comprise relative boundary of EDM cone (§5.6). Since intersection illustrated includes a nontrivial subset of cone's relative boundary, then it is apparent there exist infinitely many EDM completions corresponding to affine dimension 1. In this dimension it is impossible to represent a unique nonzero completion corresponding to affine dimension 1, for example, using a single hyperplane because any hyperplane supporting relative boundary at a particular point Γ contains an entire ray {ζΓ | ζ ≥ 0} belonging to rel∂EDM^3 by Lemma 2.8.0.0.1. (b) d_13 = κ. (c) d_12 = κ.


N = 4. Relative-angle inequality (709) together with four Euclidean metric properties are necessary and sufficient tests for realizability of tetrahedra. (710) Albeit relative angles θ_ikj (511) are nonlinear functions of the d_ij, relative-angle inequality provides a regular tetrahedron in R^3 [sic] (Figure 72) bounding angles θ_ikj at vertex x_k consistently with EDM^4.^5.1

Yet were we to employ the procedure outlined in §4.14.3 for making generalized triangle inequalities, then we would find all the necessary and sufficient d_ij-transformations for generating bounding polyhedra consistent with EDMs of any higher dimension (N > 3).

5.3 √EDM cone is not convex

For some applications, like the molecular conformation problem (Figure 3) or multidimensional scaling, [56] [230] absolute distance √d_ij is the preferred variable. Taking square root of the entries in all EDMs D of dimension N, we get another cone but not a convex cone when N > 3 (Figure 74(b)): [50, §4.5.2]

$$\sqrt{\mathrm{EDM}^N} \triangleq \{\circ\sqrt{D} \mid D \in \mathrm{EDM}^N\} \qquad (729)$$

where ◦√D is defined as in (728). It is a cone simply because any cone is completely constituted by rays emanating from the origin: (§2.7) Any given ray {ζΓ ∈ R^{N(N−1)/2} | ζ ≥ 0} remains a ray under entrywise square root: {√ζ Γ ∈ R^{N(N−1)/2} | ζ ≥ 0}. Because of how √EDM^N is defined, it is obvious that (confer §4.10)

$$D \in \mathrm{EDM}^N \ \Leftrightarrow\ \circ\sqrt{D} \in \sqrt{\mathrm{EDM}^N} \qquad (730)$$

Were √EDM^N convex, then given ◦√D_1, ◦√D_2 ∈ √EDM^N we would expect their conic combination ◦√D_1 + ◦√D_2 to be a member of √EDM^N. That is easily proven false by counterexample via (730), for then (◦√D_1 + ◦√D_2) ◦ (◦√D_1 + ◦√D_2) would need to be a member of EDM^N.

Notwithstanding, in §7.2.1 we learn how to transform a nonconvex proximity problem in the natural coordinates √d_ij to a convex optimization.

5.1 Still, property-4 triangle inequalities (610) corresponding to each principal 3×3 submatrix of −V_N^T D V_N demand that the corresponding √d_ij belong to a polyhedral cone like that in Figure 74(b).


Figure 76: Neighborhood graph (dashed) with dimensionless EDM subgraph completion (solid) superimposed (but not covering dashed); (a) 2 nearest neighbors, (b) 3 nearest neighbors. Local view of a few dense samples from relative interior of some arbitrary Euclidean manifold whose affine dimension appears two-dimensional in this neighborhood. All line segments measure absolute distance. Dashed line segments help visually locate nearest neighbors; suggesting, best number of nearest neighbors can be greater than value of embedding dimension after topological transformation. (confer [131, §2]) Solid line segments represent completion of EDM subgraph from available distance data for an arbitrarily chosen sample and its nearest neighbors. Each distance from EDM subgraph becomes distance-square in corresponding EDM submatrix.


5.4 a geometry of completion

Intriguing is the question of whether the list in X ∈ R^{n×N} (64) may be reconstructed given an incomplete noiseless EDM, and under what circumstances reconstruction is unique. [1] [2] [3] [4] [6] [13] [128] [134] [144] [145] [146] If one or more entries of a particular EDM are fixed, then geometric interpretation of the feasible set of completions is the intersection of the EDM cone EDM^N in isomorphic subspace R^{N(N−1)/2} with as many hyperplanes as there are fixed symmetric entries. Unless the intersection is a point (assuming a nonempty intersection) then the number of completions is generally infinite, and those corresponding to particular affine dimension r < N − 1 belong to some generally nonconvex subset of that intersection.


the graph. The dimensionless EDM subgraph between each sample and its nearest neighbors is completed from available data and included as input; one such EDM subgraph completion is drawn superimposed upon the neighborhood graph in Figure 76.^5.2 The consequent dimensionless EDM graph comprising all the subgraphs is incomplete, in general, because the neighbor number is relatively small; incomplete even though it is a superset of the neighborhood graph. Remaining distances (those not graphed at all) are squared then made variables within the algorithm; it is this variability that admits unfurling.

To demonstrate, consider untying the trefoil knot drawn in Figure 77(a). A corresponding Euclidean distance matrix D = [d_ij, i,j = 1...N] employing only 2 nearest neighbors is banded having the incomplete form

$$D = \begin{bmatrix}
0 & \check{d}_{12} & \check{d}_{13} & ? & \cdots & ? & \check{d}_{1,N-1} & \check{d}_{1N}\\
\check{d}_{12} & 0 & \check{d}_{23} & \check{d}_{24} & \ddots & ? & ? & \check{d}_{2N}\\
\check{d}_{13} & \check{d}_{23} & 0 & \check{d}_{34} & \ddots & ? & ? & ?\\
? & \check{d}_{24} & \check{d}_{34} & 0 & \ddots & \ddots & ? & ?\\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & ?\\
? & ? & ? & \ddots & \ddots & 0 & \check{d}_{N-2,N-1} & \check{d}_{N-2,N}\\
\check{d}_{1,N-1} & ? & ? & ? & \ddots & \check{d}_{N-2,N-1} & 0 & \check{d}_{N-1,N}\\
\check{d}_{1N} & \check{d}_{2N} & ? & ? & ? & \check{d}_{N-2,N} & \check{d}_{N-1,N} & 0
\end{bmatrix} \qquad (731)$$

where ď_ij denotes a given fixed distance-square. The unfurling algorithm can be expressed as an optimization problem; constrained distance-square maximization:

$$\begin{array}{ll}
\underset{D}{\text{maximize}} & \langle -V,\, D\rangle\\
\text{subject to} & \langle D,\, e_ie_j^T + e_je_i^T\rangle\,\tfrac{1}{2} = \check{d}_{ij} \quad \forall\,(i,j) \in \mathcal{I}\\
& \mathrm{rank}(VDV) = 2\\
& D \in \mathrm{EDM}^N
\end{array} \qquad (732)$$

5.2 Local reconstruction of point position from the EDM submatrix corresponding to a complete dimensionless EDM subgraph is unique to within an isometry (§4.6, §4.12).
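Dropping the rank constraint in (732) leaves a semidefinite program (the relaxation discussed following Figure 77). A hedged sketch in CVX-style Matlab, assuming the CVX modeling package is installed; the index set and fixed distance-squares below are hypothetical stand-ins, chosen as a connected chain so the maximization stays bounded:

```matlab
% Hedged CVX sketch of the convex relaxation of (732): rank constraint
% dropped, Schoenberg criterion (479) enforcing D in EDM^N.
N = 4;
pairs = [1 2; 2 3; 3 4];         % hypothetical index pairs (i,j); connected chain
dchk  = [1; 1; 1];               % hypothetical fixed distance-squares
V  = eye(N) - ones(N)/N;         % geometric centering matrix
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
cvx_begin sdp
    variable D(N,N) symmetric
    maximize( trace(-V*D*V) )    % total distance-square (485)
    subject to
        diag(D) == 0;
        for t = 1:size(pairs,1)
            D(pairs(t,1),pairs(t,2)) == dchk(t);
        end
        -VN'*D*VN >= 0;          % D in EDM^N
cvx_end
```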


Figure 77: (a) Trefoil knot in $\mathbb{R}^3$ from Weinberger & Saul [243]. (b) Topological transformation algorithm employing 4 nearest neighbors and $N = 539$ samples reduces affine dimension of knot to $r = 2$. Choosing instead 2 nearest neighbors would make this embedding more circular.


where $e_i \in \mathbb{R}^N$ is the $i$th member of the standard basis, where set $\mathcal{I}$ indexes the given distance-square data like that in (731), where $V \in \mathbb{R}^{N\times N}$ is the geometric centering projection matrix (§B.4.1), and where

$$\langle -V,\ D\rangle = \operatorname{tr}(-VDV) = 2\operatorname{tr}G = \frac{1}{N}\sum_{i,j} d_{ij} \qquad (485)$$

where $G$ is the Gram matrix producing $D$ assuming $G\mathbf{1} = 0$.

If we ignore the (rank) constraint on affine dimension, then problem (732) becomes convex, a corresponding solution $D^\star$ can be found, and then a nearest rank-2 solution can be found by ordered eigen decomposition of $-VD^\star V$ followed by spectral projection (§7.1.3) on $\left[\begin{smallmatrix}\mathbb{R}^2_+\\ 0\end{smallmatrix}\right] \subset \mathbb{R}^N$. This two-step process is necessarily suboptimal. Yet because the decomposition for the trefoil knot reveals only two dominant eigenvalues, the spectral projection is nearly benign. Such a reconstruction of point position (§4.12) utilizing 4 nearest neighbors is drawn in Figure 77(b); a low-dimensional embedding of the trefoil knot.

This problem (732) can, of course, be written equivalently in terms of Gram matrix $G$, facilitated by (491); videlicet, for $\Phi_{ij}$ as in (458)

$$\begin{array}{ll}
\underset{G\in\,\mathbb{S}^N_c}{\text{maximize}} & \langle I,\ G\rangle\\
\text{subject to} & \langle G,\ \Phi_{ij}\rangle = \check d_{ij} \quad \forall\,(i,j)\in\mathcal{I}\\
& \operatorname{rank}G = 2\\
& G \succeq 0
\end{array} \qquad (733)$$

The advantage to converting EDM to Gram is: Gram matrix $G$ is a bridge between point list $X$ and EDM $D$; constraints on any or all of these three variables may now be introduced. (Example 4.4.2.2.4) Confining $G$ to the geometric center subspace suffers no loss of generality and serves no theoretical purpose; numerically, this implicit constraint $G\mathbf{1} = 0$ keeps $G$ independent of its translation-invariant subspace $\mathbb{S}^{N\perp}_c$ (§4.5.1.1, Figure 84) so as not to become unbounded.
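The convex relaxation of (732) (rank constraint dropped) can be prototyped as a semidefinite program. Here is a minimal Matlab sketch, assuming the CVX modeling package is installed; the synthetic closed curve standing in for the trefoil, the 2-nearest-neighbor mask, and the doubling of neighbors per side are all arbitrary choices, not the book's code. The suboptimal two-step rank reduction described above is appended: eigendecomposition of $-\tfrac{1}{2}VD^\star V$, truncation to the two dominant eigenvalues, then point recovery (§4.12).

    % synthetic stand-in for (731): N samples on a closed curve,
    % 2 nearest neighbors on either side give the fixed data
    N = 20;  t = 2*pi*(0:N-1)'/N;
    P = [cos(t) sin(t) 0.3*sin(2*t)]';            % 3 x N point list
    G0 = P'*P;
    Dcheck = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;
    mask = circshift(eye(N),1) | circshift(eye(N),-1) | ...
           circshift(eye(N),2) | circshift(eye(N),-2);
    V  = eye(N) - ones(N)/N;                      % geometric centering matrix
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    cvx_begin sdp
        variable D(N,N) symmetric
        maximize( trace(-V*D*V) )                 % <-V,D>, confer (485)
        subject to
            diag(D) == 0;                         % D in S^N_h
            D(mask) == Dcheck(mask);              % fixed distance-square data
            -VN'*D*VN >= 0;                       % Schoenberg criterion (479)
    cvx_end
    % suboptimal two-step rank reduction to r = 2 (spectral projection)
    [Q,L] = eig(-V*D*V/2);
    [lam,idx] = sort(diag(L),'descend');
    X = diag(sqrt(max(lam(1:2),0)))*Q(:,idx(1:2))';   % embedded points in R^{2 x N}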


Figure 78: Trefoil ribbon; courtesy, Kilian Weinberger. Same topological transformation algorithm with 5 nearest neighbors and $N = 1617$ samples.


5.5 EDM definition in $\mathbf{1}\mathbf{1}^T$

Any EDM $D$ corresponding to affine dimension $r$ has representation (confer (460))

$$D(V_X,\, y) \;\triangleq\; y\mathbf{1}^T + \mathbf{1}y^T - 2V_XV_X^T + \frac{\lambda}{N}\mathbf{1}\mathbf{1}^T \;\in\; \mathrm{EDM}^N \qquad (734)$$

where

$$\mathcal{R}(V_X \in \mathbb{R}^{N\times r}) \subseteq \mathcal{N}(\mathbf{1}^T)\,,\qquad V_X^TV_X = \delta^2(V_X^TV_X)\,,\qquad V_X \text{ is full-rank with orthogonal columns,} \qquad (735)$$

$$\lambda \triangleq 2\|V_X\|_F^2\,,\qquad y \triangleq \delta(V_XV_X^T) - \frac{\lambda}{2N}\mathbf{1} \qquad (736)$$

where $y = 0$ if and only if $\mathbf{1}$ is an eigenvector of EDM $D$. [110, §2] Scalar $\lambda$ becomes an eigenvalue when corresponding eigenvector $\mathbf{1}$ exists; e.g., when $X = I$ in EDM definition (460).

Formula (734) can be validated by substituting (736); we find

$$D(V_X) \;\triangleq\; \delta(V_XV_X^T)\mathbf{1}^T + \mathbf{1}\delta(V_XV_X^T)^T - 2V_XV_X^T \;\in\; \mathrm{EDM}^N \qquad (737)$$

is simply the standard EDM definition (460) where $X^TX$ has been replaced with the subcompact singular value decomposition (§A.6.2)

$$V_XV_X^T \equiv VX^T\!XV \qquad (738)$$

Then inner product $V_X^TV_X$ is an $r\times r$ diagonal matrix $\Sigma$ of nonzero singular values.$^{5.3}$

$^{5.3}$ Subcompact SVD: $V_XV_X^T = Q\Sigma^{1/2}\Sigma^{1/2}Q^T \equiv VX^T\!XV$. So $V_X$ is not necessarily $XV$ (§4.5.1.0.1), although affine dimension $r = \operatorname{rank}(V_X^T) = \operatorname{rank}(XV)$. (581)
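Definition (737) is easy to exercise numerically. Here is a minimal Matlab sketch (not from the book; cardinality and column norms are arbitrary choices): build a $V_X$ with orthogonal columns inside $\mathcal{N}(\mathbf{1}^T)$, form $D(V_X)$, then confirm membership to $\mathrm{EDM}^N$ and affine dimension $r$ via the Schoenberg criterion (479).

    N = 5;  r = 2;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    [Q,~] = qr(VN*randn(N-1,r),0);           % orthonormal columns inside N(1')
    VX = Q*diag([2 1]);                      % orthogonal columns: V_X'*V_X diagonal
    G  = VX*VX';
    D  = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % definition (737)
    min(eig(-VN'*D*VN))                      % >= 0 : D is an EDM
    rank(-VN'*D*VN)                          % = r = 2 : affine dimension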


Proof. Next we validate eigenvector $\mathbf{1}$ and eigenvalue $\lambda$.

(⇒) Suppose $\mathbf{1}$ is an eigenvector of EDM $D$. Then because

$$V_X^T\mathbf{1} = 0 \qquad (739)$$

it follows

$$D\mathbf{1} = \delta(V_XV_X^T)\mathbf{1}^T\mathbf{1} + \mathbf{1}\delta(V_XV_X^T)^T\mathbf{1} = N\,\delta(V_XV_X^T) + \|V_X\|_F^2\,\mathbf{1} \;\Rightarrow\; \delta(V_XV_X^T) \propto \mathbf{1} \qquad (740)$$

For some $\kappa \in \mathbb{R}_+$

$$\delta(V_XV_X^T)^T\mathbf{1} = N\kappa = \operatorname{tr}(V_X^TV_X) = \|V_X\|_F^2 \;\Rightarrow\; \delta(V_XV_X^T) = \frac{1}{N}\|V_X\|_F^2\,\mathbf{1} \qquad (741)$$

so $y = 0$.

(⇐) Now suppose $\delta(V_XV_X^T) = \frac{\lambda}{2N}\mathbf{1}$; id est, $y = 0$. Then

$$D = \frac{\lambda}{N}\mathbf{1}\mathbf{1}^T - 2V_XV_X^T \in \mathrm{EDM}^N \qquad (742)$$

$\mathbf{1}$ is an eigenvector with corresponding eigenvalue $\lambda$. ∎

5.5.1 Range of EDM D

From §B.1.1 pertaining to linear independence of dyad sums: If the transpose halves of all the dyads in the sum (737)$^{5.4}$ make a linearly independent set, then the nontranspose halves constitute a basis for the range of EDM $D$. Saying this mathematically: For $D \in \mathrm{EDM}^N$

$$\begin{array}{ll}
\mathcal{R}(D) = \mathcal{R}([\,\delta(V_XV_X^T)\ \ \mathbf{1}\ \ V_X\,]) & \Leftarrow\ \operatorname{rank}([\,\delta(V_XV_X^T)\ \ \mathbf{1}\ \ V_X\,]) = 2 + r\\
\mathcal{R}(D) = \mathcal{R}([\,\mathbf{1}\ \ V_X\,]) & \Leftarrow\ \text{otherwise}
\end{array} \qquad (743)$$

$^{5.4}$ Identifying the columns $V_X \triangleq [\,v_1\ \cdots\ v_r\,]$, then $V_XV_X^T = \sum_i v_iv_i^T$ is a sum of dyads.


Figure 79: Example of $V_X$ selection to make an EDM corresponding to cardinality $N = 3$ and affine dimension $r = 1$; $V_X$ is a vector in nullspace $\mathcal{N}(\mathbf{1}^T) \subset \mathbb{R}^3$. Nullspace of $\mathbf{1}^T$ is hyperplane in $\mathbb{R}^3$ (drawn truncated) having normal $\mathbf{1}$. Vector $\delta(V_XV_X^T)$ may or may not be in plane spanned by $\{\mathbf{1},\ V_X\}$, but belongs to nonnegative orthant which is strictly supported by $\mathcal{N}(\mathbf{1}^T)$.


To prove that, we need the condition under which the rank equality is satisfied: We know $\mathcal{R}(V_X) \perp \mathbf{1}$, but what is the relative geometric orientation of $\delta(V_XV_X^T)$? $\delta(V_XV_X^T) \succeq 0$ because $V_XV_X^T \succeq 0$, and $\delta(V_XV_X^T) \propto \mathbf{1}$ remains possible (740); this means $\delta(V_XV_X^T) \notin \mathcal{N}(\mathbf{1}^T)$ simply because it has no negative entries. (Figure 79) If the projection of $\delta(V_XV_X^T)$ on $\mathcal{N}(\mathbf{1}^T)$ does not belong to $\mathcal{R}(V_X)$, then that is a necessary and sufficient condition for linear independence (l.i.) of $\delta(V_XV_X^T)$ with respect to $\mathcal{R}([\,\mathbf{1}\ \ V_X\,])$; id est,

$$\begin{array}{c}
V\delta(V_XV_X^T) \neq V_Xa \quad \text{for any } a \in \mathbb{R}^r\\
(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T)\,\delta(V_XV_X^T) \neq V_Xa\\
\delta(V_XV_X^T) - \tfrac{1}{N}\|V_X\|_F^2\,\mathbf{1} \neq V_Xa\\
\delta(V_XV_X^T) - \tfrac{\lambda}{2N}\mathbf{1} = y \neq V_Xa \;\Leftrightarrow\; \{\mathbf{1},\ \delta(V_XV_X^T),\ V_X\} \text{ is l.i.}
\end{array} \qquad (744)$$

On the other hand when this condition is violated (when (736) $y = V_Xa_p$ for some particular $a_p \in \mathbb{R}^r$), then from (734) we have

$$\begin{aligned}
\mathcal{R}\big(D = y\mathbf{1}^T + \mathbf{1}y^T - 2V_XV_X^T + \tfrac{\lambda}{N}\mathbf{1}\mathbf{1}^T\big) &= \mathcal{R}\big((V_Xa_p + \tfrac{\lambda}{N}\mathbf{1})\mathbf{1}^T + (\mathbf{1}a_p^T - 2V_X)V_X^T\big)\\
&= \mathcal{R}([\,V_Xa_p + \tfrac{\lambda}{N}\mathbf{1}\ \ \ \mathbf{1}a_p^T - 2V_X\,])\\
&= \mathcal{R}([\,\mathbf{1}\ \ V_X\,])
\end{aligned} \qquad (745)$$

An example of such a violation is (742) where, in particular, $a_p = 0$. ∎

Then a statement parallel to (743) is, for $D \in \mathrm{EDM}^N$ (Theorem 4.7.3.0.1)

$$\begin{array}{l}
\operatorname{rank}(D) = r + 2 \;\Leftrightarrow\; y \notin \mathcal{R}(V_X)\ \big(\Leftrightarrow\ \mathbf{1}^TD^\dagger\mathbf{1} = 0\big)\\
\operatorname{rank}(D) = r + 1 \;\Leftrightarrow\; y \in \mathcal{R}(V_X)\ \big(\Leftrightarrow\ \mathbf{1}^TD^\dagger\mathbf{1} \neq 0\big)
\end{array} \qquad (746)$$

5.5.2 Boundary constituents of EDM cone

Expression (737) has utility in forming the set of all EDMs corresponding to affine dimension $r$:


Figure 80: (a) Vector $V_X \in \mathbb{R}^{3\times 1}$ from Figure 79 spirals in $\mathcal{N}(\mathbf{1}^T) \subset \mathbb{R}^3$ decaying toward origin. (Spiral is two-dimensional in vector space $\mathbb{R}^3$.) (b) Corresponding trajectory $\operatorname{dvec}D(V_X) \subset \operatorname{dvec}\operatorname{rel}\partial\,\mathrm{EDM}^3$ on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of $V_X$ about origin. (Vortex is three-dimensional in isometrically isomorphic $\mathbb{R}^3$.)


$$\{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = r\} = \{D(V_X) \mid V_X \in \mathbb{R}^{N\times r},\ \operatorname{rank}V_X = r,\ V_X^TV_X = \delta^2(V_X^TV_X),\ \mathcal{R}(V_X) \subseteq \mathcal{N}(\mathbf{1}^T)\} \qquad (747)$$

whereas $\{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) \leq r\}$ is the closure of this same set;

$$\{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) \leq r\} = \overline{\{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = r\}} \qquad (748)$$

For example,

$$\operatorname{rel}\partial\,\mathrm{EDM}^N = \{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) < N-1\} = \bigcup_{r=0}^{N-2} \{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = r\} \qquad (749)$$

None of these are necessarily convex sets, although

$$\begin{aligned}
\mathrm{EDM}^N &= \bigcup_{r=0}^{N-1} \{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = r\} = \overline{\{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = N-1\}}\\
\operatorname{rel\,int}\,\mathrm{EDM}^N &= \{D \in \mathrm{EDM}^N \mid \operatorname{rank}(VDV) = N-1\}
\end{aligned} \qquad (750)$$

are pointed convex cones.

When cardinality $N = 3$ and affine dimension $r = 2$, for example, the relative interior $\operatorname{rel\,int}\,\mathrm{EDM}^3$ is realized via (747). (§5.6)

When $N = 3$ and $r = 1$, the relative boundary of the EDM cone $\operatorname{dvec}\operatorname{rel}\partial\,\mathrm{EDM}^3$ is realized in isomorphic $\mathbb{R}^3$ as in Figure 74(d). This figure could be constructed via (748) by spiraling vector $V_X$ tightly about the origin in $\mathcal{N}(\mathbf{1}^T)$; as can be imagined with aid of Figure 79. Vectors close to the origin in $\mathcal{N}(\mathbf{1}^T)$ are correspondingly close to the origin in $\mathrm{EDM}^N$. As vector $V_X$ orbits the origin in $\mathcal{N}(\mathbf{1}^T)$, the corresponding EDM orbits the axis of revolution while remaining on the boundary of the circular cone $\operatorname{dvec}\operatorname{rel}\partial\,\mathrm{EDM}^3$. (Figure 80)


5.5.3 Faces of EDM cone

5.5.3.1 Isomorphic faces

In high cardinality $N$, any set of EDMs made via (747) or (748) with particular affine dimension $r$ is isomorphic with any set admitting the same affine dimension but made in lower cardinality. We do not prove that here.

5.5.3.2 Extreme direction of EDM cone

In particular, extreme directions (§2.8.1) of $\mathrm{EDM}^N$ correspond to affine dimension $r = 1$ and are simply represented: for any particular cardinality $N \geq 2$ (§2.8.2) and each and every nonzero vector $z$ in $\mathcal{N}(\mathbf{1}^T)$

$$\Gamma \;\triangleq\; (z \circ z)\mathbf{1}^T + \mathbf{1}(z \circ z)^T - 2zz^T \;=\; \delta(zz^T)\mathbf{1}^T + \mathbf{1}\delta(zz^T)^T - 2zz^T \;\in\; \mathrm{EDM}^N \qquad (751)$$

is an extreme direction corresponding to a one-dimensional face of the EDM cone $\mathrm{EDM}^N$ that is a ray in isomorphic subspace $\mathbb{R}^{N(N-1)/2}$.

Proving this would exercise the fundamental definition (149) of extreme direction. Here is a sketch: Any EDM may be represented

$$D(V_X) \;\triangleq\; \delta(V_XV_X^T)\mathbf{1}^T + \mathbf{1}\delta(V_XV_X^T)^T - 2V_XV_X^T \;\in\; \mathrm{EDM}^N \qquad (737)$$

where matrix $V_X$ (735) has orthogonal columns. For the same reason (1071) that $zz^T$ is an extreme direction of the positive semidefinite cone (§2.9.2.3) for any particular nonzero vector $z$, there is no conic combination of distinct EDMs (each conically independent of $\Gamma$) equal to $\Gamma$. ∎

5.5.3.2.1 Example. Biorthogonal expansion of an EDM.
(confer §2.13.7.1.1) When matrix $D$ belongs to the EDM cone, nonnegative coordinates for biorthogonal expansion are the eigenvalues $\lambda \in \mathbb{R}^N$ of $-\tfrac{1}{2}VDV$: For any $D \in \mathbb{S}^N_h$ it holds


$$D = \delta\big(-\tfrac{1}{2}VDV\big)\mathbf{1}^T + \mathbf{1}\delta\big(-\tfrac{1}{2}VDV\big)^T - 2\big(-\tfrac{1}{2}VDV\big) \qquad (548)$$

By diagonalization $-\tfrac{1}{2}VDV \triangleq Q\Lambda Q^T \in \mathbb{S}^N_c$ (§A.5.2) we may write

$$D = \delta\Big(\sum_{i=1}^N \lambda_i\,q_iq_i^T\Big)\mathbf{1}^T + \mathbf{1}\delta\Big(\sum_{i=1}^N \lambda_i\,q_iq_i^T\Big)^T - 2\sum_{i=1}^N \lambda_i\,q_iq_i^T = \sum_{i=1}^N \lambda_i\big(\delta(q_iq_i^T)\mathbf{1}^T + \mathbf{1}\delta(q_iq_i^T)^T - 2q_iq_i^T\big) \qquad (752)$$

where $q_i$ is the $i$th eigenvector of $-\tfrac{1}{2}VDV$ arranged columnar in orthogonal matrix

$$Q = [\,q_1\ q_2\ \cdots\ q_N\,] \in \mathbb{R}^{N\times N} \qquad (318)$$

and where $\{\delta(q_iq_i^T)\mathbf{1}^T + \mathbf{1}\delta(q_iq_i^T)^T - 2q_iq_i^T,\ i = 1\ldots N\}$ are extreme directions of some pointed polyhedral cone $\mathcal{K} \subset \mathbb{S}^N_h$ and extreme directions of $\mathrm{EDM}^N$. Invertibility of (752)

$$-\tfrac{1}{2}VDV = -\tfrac{1}{2}V\sum_{i=1}^N \lambda_i\big(\delta(q_iq_i^T)\mathbf{1}^T + \mathbf{1}\delta(q_iq_i^T)^T - 2q_iq_i^T\big)V = \sum_{i=1}^N \lambda_i\,q_iq_i^T \qquad (753)$$

implies linear independence of those extreme directions. Then the biorthogonal expansion is expressed

$$\operatorname{dvec}D = YY^\dagger\operatorname{dvec}D = Y\lambda\big(-\tfrac{1}{2}VDV\big) \qquad (754)$$

where

$$Y \;\triangleq\; \big[\operatorname{dvec}\big(\delta(q_iq_i^T)\mathbf{1}^T + \mathbf{1}\delta(q_iq_i^T)^T - 2q_iq_i^T\big),\ i = 1\ldots N\big] \;\in\; \mathbb{R}^{N(N-1)/2\times N} \qquad (755)$$

When $D$ belongs to the EDM cone in the subspace of symmetric hollow matrices, unique coordinates $Y^\dagger\operatorname{dvec}D$ for this biorthogonal expansion must be the nonnegative eigenvalues $\lambda$ of $-\tfrac{1}{2}VDV$. This means $D$ simultaneously belongs to the EDM cone and to the pointed polyhedral cone $\operatorname{dvec}\mathcal{K} = \operatorname{cone}(Y)$.
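Expansion (752) can be verified numerically. A minimal Matlab sketch (not from the book; the random point list is an arbitrary choice): eigendecompose $-\tfrac{1}{2}VDV$ for a known EDM, then rebuild $D$ as the nonnegatively weighted sum of extreme directions (751) formed from the eigenvectors.

    N = 4;  X = randn(3,N);  G0 = X'*X;      % random list and its Gram matrix
    D = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;  % an EDM
    V = eye(N) - ones(N)/N;
    [Q,L] = eig(-V*D*V/2);  lam = diag(L);   % eigenvalues = expansion coordinates
    Drec = zeros(N);
    for i = 1:N                              % weighted sum of extreme directions (752)
        g = Q(:,i)*Q(:,i)';
        Drec = Drec + lam(i)*(diag(g)*ones(1,N) + ones(N,1)*diag(g)' - 2*g);
    end
    norm(D - Drec,'fro')                     % ~ 0, and min(lam) >= 0 up to roundoff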


5.5.3.3 Smallest face

Now suppose we are given a particular EDM $D(V_{X_p}) \in \mathrm{EDM}^N$ corresponding to affine dimension $r$ and parametrized by $V_{X_p}$ in (737). The EDM cone's smallest face that contains $D(V_{X_p})$ is

$$\mathcal{F}\big(\mathrm{EDM}^N \ni D(V_{X_p})\big) = \{D(V_X) \mid V_X \in \mathbb{R}^{N\times r},\ \operatorname{rank}V_X = r,\ V_X^TV_X = \delta^2(V_X^TV_X),\ \mathcal{R}(V_X) \subseteq \mathcal{R}(V_{X_p})\} \qquad (756)$$

which is isomorphic$^{5.5}$ with the convex cone $\mathrm{EDM}^{r+1}$, hence of dimension

$$\dim\mathcal{F}\big(\mathrm{EDM}^N \ni D(V_{X_p})\big) = (r+1)r/2 \qquad (757)$$

in isomorphic $\mathbb{R}^{N(N-1)/2}$. Not all dimensions are represented; e.g., the EDM cone has no two-dimensional faces.

When cardinality $N = 4$ and affine dimension $r = 2$ so that $\mathcal{R}(V_{X_p})$ is any two-dimensional subspace of three-dimensional $\mathcal{N}(\mathbf{1}^T)$ in $\mathbb{R}^4$, for example, then the corresponding face of $\mathrm{EDM}^4$ is isometrically isomorphic with: (748)

$$\mathrm{EDM}^3 = \{D \in \mathrm{EDM}^3 \mid \operatorname{rank}(VDV) \leq 2\} \;\simeq\; \mathcal{F}\big(\mathrm{EDM}^4 \ni D(V_{X_p})\big) \qquad (758)$$

Each two-dimensional subspace of $\mathcal{N}(\mathbf{1}^T)$ corresponds to another three-dimensional face.

Because each and every principal submatrix of an EDM in $\mathrm{EDM}^N$ (§4.14.3) is another EDM [145, §4.1], for example, then each principal submatrix belongs to a particular face of $\mathrm{EDM}^N$.

5.5.3.4 Open question

This result (757) is analogous to that for the positive semidefinite cone, although the question remains open whether all faces of $\mathrm{EDM}^N$ (whose dimension is less than the dimension of the cone) are exposed like they are for the positive semidefinite cone. (§2.9.2.2) [224]

$^{5.5}$ The fact that the smallest face is isomorphic with another (perhaps smaller) EDM cone is implicit in [110, §2].


5.6 Correspondence to PSD cone $\mathbb{S}^{N-1}_+$

Hayden & Wells et alii [110, §2] assert one-to-one correspondence of EDMs with positive semidefinite matrices in the symmetric subspace. Because $\operatorname{rank}(VDV) \leq N-1$ (§4.7.1.1), the positive semidefinite cone corresponding to the EDM cone can only be $\mathbb{S}^{N-1}_+$. [6, §18.2.1] To clearly demonstrate that correspondence, we invoke inner-product form EDM definition

$$D(\Phi) \;\triangleq\; \begin{bmatrix}0\\ \delta(\Phi)\end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix}0 & \delta(\Phi)^T\end{bmatrix} - 2\begin{bmatrix}0 & \mathbf{0}^T\\ \mathbf{0} & \Phi\end{bmatrix} \in \mathrm{EDM}^N \;\Leftrightarrow\; \Phi \succeq 0 \qquad (566)$$

Then the EDM cone may be expressed

$$\mathrm{EDM}^N = \{D(\Phi) \mid \Phi \in \mathbb{S}^{N-1}_+\} \qquad (759)$$

Hayden & Wells' assertion can therefore be equivalently stated in terms of an inner-product form EDM operator

$$D(\mathbb{S}^{N-1}_+) = \mathrm{EDM}^N \qquad (568)$$
$$\mathcal{V}_N(\mathrm{EDM}^N) = \mathbb{S}^{N-1}_+ \qquad (569)$$

identity (569) holding because $\mathcal{R}(V_N) = \mathcal{N}(\mathbf{1}^T)$ (467), linear functions $D(\Phi)$ and $\mathcal{V}_N(D) = -V_N^TDV_N$ (§4.6.2.1) being mutually inverse.

In terms of affine dimension $r$, Hayden & Wells claim particular correspondence between PSD and EDM cones:

r = N−1: Symmetric hollow matrices $-D$ positive definite on $\mathcal{N}(\mathbf{1}^T)$ correspond to points relatively interior to the EDM cone.

r < N−1: Symmetric hollow matrices $-D$ positive semidefinite on $\mathcal{N}(\mathbf{1}^T)$, where $-V_N^TDV_N$ has at least one 0 eigenvalue, correspond to points on the relative boundary of the EDM cone.

r = 1: Symmetric hollow nonnegative matrices rank-one on $\mathcal{N}(\mathbf{1}^T)$ correspond to extreme directions (751) of the EDM cone; id est, for some nonzero vector $u$ (§A.3.1.0.7)

$$\left.\begin{array}{l}\operatorname{rank}V_N^TDV_N = 1\\ D \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+\end{array}\right\} \;\Leftrightarrow\; \begin{array}{l}D \in \mathrm{EDM}^N\\ D \text{ is an extreme direction}\end{array} \;\Leftrightarrow\; \left\{\begin{array}{l}-V_N^TDV_N \equiv uu^T\\ D \in \mathbb{S}^N_h\end{array}\right. \qquad (760)$$


5.6.0.0.1 Proof. Case $r = 1$ is easily proved: From the nonnegativity development in §4.8.1, extreme direction (751), and Schoenberg criterion (479), we need show only sufficiency; id est, prove

$$\left.\begin{array}{l}\operatorname{rank}V_N^TDV_N = 1\\ D \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+\end{array}\right\} \;\Rightarrow\; \begin{array}{l}D \in \mathrm{EDM}^N\\ D \text{ is an extreme direction}\end{array}$$

Any symmetric matrix $D$ satisfying the rank condition must have the form, for $z, q \in \mathbb{R}^N$ and nonzero $z \in \mathcal{N}(\mathbf{1}^T)$,

$$D = \pm(\mathbf{1}q^T + q\mathbf{1}^T - 2zz^T) \qquad (761)$$

because (§4.6.2.1, confer §E.7.2.0.2)

$$\mathcal{N}(\mathcal{V}_N(D)) = \{\mathbf{1}q^T + q\mathbf{1}^T \mid q \in \mathbb{R}^N\} \subseteq \mathbb{S}^N \qquad (762)$$

Hollowness demands $q = \delta(zz^T)$ while nonnegativity demands choice of positive sign in (761). Matrix $D$ thus takes the form of an extreme direction (751) of the EDM cone. ∎

The foregoing proof is not extensible in rank: An EDM with corresponding affine dimension $r$ has the general form, for $\{z_i \in \mathcal{N}(\mathbf{1}^T),\ i = 1\ldots r\}$ an independent set,

$$D = \mathbf{1}\delta\Big(\sum_{i=1}^r z_iz_i^T\Big)^T + \delta\Big(\sum_{i=1}^r z_iz_i^T\Big)\mathbf{1}^T - 2\sum_{i=1}^r z_iz_i^T \;\in\; \mathrm{EDM}^N \qquad (763)$$

The EDM so defined relies principally on the sum $\sum z_iz_i^T$ having positive summand coefficients ($\Leftrightarrow -V_N^TDV_N \succeq 0$)$^{5.6}$. Then it is easy to find a sum incorporating negative coefficients while meeting rank, nonnegativity, and symmetric hollowness conditions but not positive semidefiniteness on subspace $\mathcal{R}(V_N)$; e.g., from page 280,

$$-\tfrac{1}{2}V\begin{bmatrix}0 & 1 & 1\\ 1 & 0 & 5\\ 1 & 5 & 0\end{bmatrix}V = z_1z_1^T - z_2z_2^T \qquad (764)$$

$^{5.6}$ (⇐) For $a_i \in \mathbb{R}^{N-1}$, let $z_i = V_N^{\dagger T}a_i$.
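Correspondence (759)–(569) between $\Phi \in \mathbb{S}^{N-1}_+$ and $D \in \mathrm{EDM}^N$ is likewise easy to exercise. A minimal Matlab sketch (not from the book; the random $\Phi$ is an arbitrary choice) maps $\Phi$ through inner-product form (566) and recovers it by $\mathcal{V}_N(D) = -V_N^TDV_N$.

    N = 4;
    B = randn(N-1);  Phi = B*B';             % arbitrary Phi in S^{N-1}_+
    d = [0; diag(Phi)];
    D = d*ones(1,N) + ones(N,1)*d' - 2*blkdiag(0,Phi);  % inner-product form (566)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    norm(-VN'*D*VN - Phi,'fro')              % ~ 0 : V_N(D) recovers Phi (569)
    min(eig(-VN'*D*VN))                      % >= 0 : D belongs to EDM^N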


5.6.0.0.2 Example. Extreme rays versus rays on the boundary.
The EDM $D = \begin{bmatrix}0 & 1 & 4\\ 1 & 0 & 1\\ 4 & 1 & 0\end{bmatrix}$ is an extreme direction of $\mathrm{EDM}^3$ where $u = \begin{bmatrix}1\\ 2\end{bmatrix}$ in (760). Because $-V_N^TDV_N$ has eigenvalues $\{0,\ 5\}$, the ray whose direction is $D$ also lies on the relative boundary of $\mathrm{EDM}^3$.

In exception, EDM $D = \kappa\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$, for any particular $\kappa > 0$, is an extreme direction of $\mathrm{EDM}^2$ but $-V_N^TDV_N$ has only one eigenvalue: $\{\kappa\}$. Because $\mathrm{EDM}^2$ is a ray whose relative boundary (§2.6.1.3.1) is the origin, this conventional boundary does not include $D$ which belongs to the relative interior in this dimension. (§2.7.0.0.1)

5.6.1 Gram-form correspondence to $\mathbb{S}^{N-1}_+$

With respect to $D(G) = \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G$ (472), the linear Gram-form EDM operator, results in §4.6.1 provide [1, §2.6]

$$\mathrm{EDM}^N = D\big(\mathcal{V}(\mathrm{EDM}^N)\big) \equiv D\big(V_N\,\mathbb{S}^{N-1}_+\,V_N^T\big) \qquad (765)$$

$$V_N\,\mathbb{S}^{N-1}_+\,V_N^T \equiv \mathcal{V}\big(D(V_N\,\mathbb{S}^{N-1}_+\,V_N^T)\big) = \mathcal{V}(\mathrm{EDM}^N) \triangleq -\tfrac{1}{2}V\,\mathrm{EDM}^N\,V = \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (766)$$

a one-to-one correspondence between $\mathrm{EDM}^N$ and $\mathbb{S}^{N-1}_+$.

5.6.2 EDM cone by elliptope

Defining the elliptope parametrized by scalar $t > 0$

$$\mathcal{E}^N_t \;\triangleq\; \mathbb{S}^N_+ \cap \{\Phi \in \mathbb{S}^N \mid \delta(\Phi) = t\mathbf{1}\} \qquad (644)$$

then following Alfakih [7] we have

$$\mathrm{EDM}^N = \operatorname{cone}\{\mathbf{1}\mathbf{1}^T - \mathcal{E}^N_1\} = \{t(\mathbf{1}\mathbf{1}^T - \mathcal{E}^N_1) \mid t \geq 0\} \qquad (767)$$

Identification $\mathcal{E}^N = \mathcal{E}^N_1$ equates the standard elliptope (§4.9.1.0.1) to our parametrized elliptope.


$$\mathrm{EDM}^N = \operatorname{cone}\{\mathbf{1}\mathbf{1}^T - \mathcal{E}^N\} = \{t(\mathbf{1}\mathbf{1}^T - \mathcal{E}^N) \mid t \geq 0\} \qquad (767)$$

Figure 81: Three views of translated negated elliptope $\mathbf{1}\mathbf{1}^T - \mathcal{E}^3_1$ (confer Figure 67), drawn as $\operatorname{dvec}(\mathbf{1}\mathbf{1}^T - \mathcal{E}^3)$, shrouded by truncated EDM cone $\operatorname{dvec}\operatorname{rel}\partial\,\mathrm{EDM}^3$. Fractal on EDM cone relative boundary is numerical artifact belonging to intersection with elliptope relative boundary. The fractal is trying to convey existence of a neighborhood about the origin where the translated elliptope boundary and EDM cone boundary intersect.


5.6.2.0.1 Expository. Define $\mathcal{T}_\mathcal{E}(\mathbf{1}\mathbf{1}^T)$ to be the tangent cone to the elliptope at point $\mathbf{1}\mathbf{1}^T$; id est,

$$\mathcal{T}_\mathcal{E}(\mathbf{1}\mathbf{1}^T) \;\triangleq\; \{t(\mathcal{E}^N - \mathbf{1}\mathbf{1}^T) \mid t \geq 0\} \qquad (768)$$

The normal cone (§E.10.3.2.1) $\mathcal{K}^\perp_\mathcal{E}(\mathbf{1}\mathbf{1}^T)$ to the elliptope at $\mathbf{1}\mathbf{1}^T$ is defined

$$\mathcal{K}^\perp_\mathcal{E}(\mathbf{1}\mathbf{1}^T) \;\triangleq\; \{B \mid \langle B,\ \Phi - \mathbf{1}\mathbf{1}^T\rangle \leq 0\,,\ \Phi \in \mathcal{E}^N\} \qquad (769)$$

The polar cone of any set $\mathcal{K}$ is the closed convex cone (confer (246))

$$\mathcal{K}^\circ \;\triangleq\; \{B \mid \langle B,\ A\rangle \leq 0\ \text{ for all } A \in \mathcal{K}\} \qquad (770)$$

The normal cone is well known to be the polar of the tangent cone, and vice versa; [123, §A.5.2.4]

$$\mathcal{K}^\perp_\mathcal{E}(\mathbf{1}\mathbf{1}^T) = \mathcal{T}_\mathcal{E}(\mathbf{1}\mathbf{1}^T)^\circ \qquad (771)$$

$$\mathcal{K}^\perp_\mathcal{E}(\mathbf{1}\mathbf{1}^T)^\circ = \mathcal{T}_\mathcal{E}(\mathbf{1}\mathbf{1}^T) \qquad (772)$$

From Deza & Laurent [61, p.535] we have

$$\mathrm{EDM}^N = -\mathcal{T}_\mathcal{E}(\mathbf{1}\mathbf{1}^T) \qquad (773)$$

The polar EDM cone is also expressible in terms of the elliptope. From (771) we have

$$\mathrm{EDM}^{N\circ} = -\mathcal{K}^\perp_\mathcal{E}(\mathbf{1}\mathbf{1}^T) \qquad (774)$$

In §4.10.1 we proposed the expression for EDM $D$

$$D = t\mathbf{1}\mathbf{1}^T - E \;\in\; \mathrm{EDM}^N \qquad (645)$$

where $t \in \mathbb{R}_+$ and $E$ belongs to the parametrized elliptope. We further propose, for any particular $t > 0$,

$$\mathrm{EDM}^N = \operatorname{cone}\{t\mathbf{1}\mathbf{1}^T - \mathcal{E}^N_t\} \qquad (775)$$

Proof. Pending. ∎

Relationship of the translated negated elliptope with the EDM cone is illustrated in Figure 81. We speculate

$$\mathrm{EDM}^N = \lim_{t\to\infty}\; t\mathbf{1}\mathbf{1}^T - \mathcal{E}^N_t \qquad (776)$$


5.7 Vectorization & projection interpretation

In §E.7.2.0.2 we learn: $-VDV$ can be interpreted as orthogonal projection [4, §2] of vectorized $-D \in \mathbb{S}^N_h$ on the subspace of geometrically centered symmetric matrices

$$\begin{aligned}
\mathbb{S}^N_c &= \{G \in \mathbb{S}^N \mid G\mathbf{1} = 0\} \qquad &(1561)\\
&= \{G \in \mathbb{S}^N \mid \mathcal{N}(G) \supseteq \mathbf{1}\} = \{G \in \mathbb{S}^N \mid \mathcal{R}(G) \subseteq \mathcal{N}(\mathbf{1}^T)\}\\
&= \{VYV \mid Y \in \mathbb{S}^N\} \subset \mathbb{S}^N \qquad &(1562)\\
&\equiv \{V_NAV_N^T \mid A \in \mathbb{S}^{N-1}\} \qquad &(539)
\end{aligned}$$

because elementary auxiliary matrix $V$ is an orthogonal projector (§B.4.1). Yet there is another useful projection interpretation:

Revising the fundamental matrix criterion for membership to the EDM cone (455),$^{5.7}$

$$\left.\begin{array}{l}\langle zz^T,\ -D\rangle \geq 0 \quad \forall\, zz^T \mid \mathbf{1}\mathbf{1}^Tzz^T = 0\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (777)$$

this is equivalent, of course, to the Schoenberg criterion

$$\left.\begin{array}{l}-V_N^TDV_N \succeq 0\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (479)$$

because $\mathcal{N}(\mathbf{1}\mathbf{1}^T) = \mathcal{R}(V_N)$. When $D \in \mathrm{EDM}^N$, correspondence (777) means $-z^TDz$ is proportional to a nonnegative coefficient of orthogonal projection (§E.6.4.2, Figure 83) of $-D$ in isometrically isomorphic $\mathbb{R}^{N(N+1)/2}$ on the range of each and every vectorized (§2.2.2.1) symmetric dyad (§B.1) in the nullspace of $\mathbf{1}\mathbf{1}^T$; id est, on each and every member of

$$\mathcal{T} \;\triangleq\; \{\operatorname{svec}(zz^T) \mid z \in \mathcal{N}(\mathbf{1}\mathbf{1}^T) = \mathcal{R}(V_N)\} \subset \operatorname{svec}\partial\,\mathbb{S}^N_+ = \{\operatorname{svec}(V_N\upsilon\upsilon^TV_N^T) \mid \upsilon \in \mathbb{R}^{N-1}\} \qquad (778)$$

whose dimension is

$$\dim\mathcal{T} = N(N-1)/2 \qquad (779)$$

$^{5.7}$ $\mathcal{N}(\mathbf{1}\mathbf{1}^T) = \mathcal{N}(\mathbf{1}^T)$ and $\mathcal{R}(zz^T) = \mathcal{R}(z)$
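The nonnegative projection coefficients in (777) can be observed directly. A minimal Matlab sketch (not from the book; point list and trial count are arbitrary choices) computes the coefficient $\langle zz^T,\,-D\rangle/\langle zz^T,\,zz^T\rangle$ of orthogonal projection of $-D$ on several members of $\mathcal{T}$:

    N = 4;  X = randn(2,N);  G0 = X'*X;
    D = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;  % an EDM
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    for k = 1:5
        z = VN*randn(N-1,1);                 % z in N(11') = R(V_N)
        c = (-z'*D*z)/(z'*z)^2               % projection coefficient, always >= 0
    end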


The set of all symmetric dyads $\{zz^T \mid z \in \mathbb{R}^N\}$ constitute the extreme directions of the positive semidefinite cone (§2.8.1, §2.9) $\mathbb{S}^N_+$, hence lie on its boundary. Yet only those dyads in $\mathcal{R}(V_N)$ are included in the test (777), thus only a subset $\mathcal{T}$ of all vectorized extreme directions of $\mathbb{S}^N_+$ is observed.

In the particularly simple case $D \in \mathrm{EDM}^2 = \{D \in \mathbb{S}^2_h \mid d_{12} \geq 0\}$, for example, only one extreme direction of the PSD cone is involved:

$$zz^T = \begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix} \qquad (780)$$

Any nonnegative scaling of vectorized $zz^T$ belongs to the set $\mathcal{T}$ illustrated in Figure 82 and Figure 83.

5.7.1 Face of PSD cone $\mathbb{S}^N_+$ containing $V$

In any case, set $\mathcal{T}$ (778) constitutes the vectorized extreme directions of an $N(N-1)/2$-dimensional face of the PSD cone $\mathbb{S}^N_+$ containing auxiliary matrix $V$; a face isomorphic with $\mathbb{S}^{N-1}_+ = \mathbb{S}^{\operatorname{rank}V}_+$ (§2.9.2.2).

To show this, we must first find the smallest face that contains auxiliary matrix $V$ and then determine its extreme directions. From (181),

$$\begin{aligned}
\mathcal{F}\big(\mathbb{S}^N_+ \ni V\big) &= \{W \in \mathbb{S}^N_+ \mid \mathcal{N}(W) \supseteq \mathcal{N}(V)\} = \{W \in \mathbb{S}^N_+ \mid \mathcal{N}(W) \supseteq \mathbf{1}\}\\
&= \{VYV \succeq 0 \mid Y \in \mathbb{S}^N\} \equiv \{V_NBV_N^T \mid B \in \mathbb{S}^{N-1}_+\}\\
&\simeq \mathbb{S}^{\operatorname{rank}V}_+ = -V_N^T\,\mathrm{EDM}^N\,V_N
\end{aligned} \qquad (781)$$

where the equivalence $\equiv$ is from §4.6.1 while isomorphic equality $\simeq$ with transformed EDM cone is from (569). Projector $V$ belongs to $\mathcal{F}(\mathbb{S}^N_+ \ni V)$ because $V_NV_N^\dagger V_N^{\dagger T}V_N^T = V$. (§B.4.3) Each and every rank-one matrix belonging to this face is therefore of the form:

$$V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1} \qquad (782)$$

Because $\mathcal{F}(\mathbb{S}^N_+ \ni V)$ is isomorphic with a positive semidefinite cone $\mathbb{S}^{N-1}_+$, then $\mathcal{T}$ constitutes the vectorized extreme directions of $\mathcal{F}$, the origin constitutes the extreme points of $\mathcal{F}$, and auxiliary matrix $V$ is some convex combination of those extreme points and directions by the extremes theorem (§2.8.1.1.1). ∎


$$\mathcal{T} \;\triangleq\; \Big\{\operatorname{svec}(zz^T) \;\Big|\; z \in \mathcal{N}(\mathbf{1}\mathbf{1}^T) = \kappa\begin{bmatrix}1\\ -1\end{bmatrix},\ \kappa \in \mathbb{R}\Big\} \subset \operatorname{svec}\partial\,\mathbb{S}^2_+$$

Figure 82: Truncated boundary of positive semidefinite cone $\mathbb{S}^2_+$ in isometrically isomorphic $\mathbb{R}^3$ (via svec (46), coordinates $d_{11}$, $\sqrt{2}\,d_{12}$, $d_{22}$) is, in this dimension, constituted solely by its extreme directions. Truncated cone of Euclidean distance matrices $\mathrm{EDM}^2$ in isometrically isomorphic subspace $\mathbb{R}$. Relative boundary of EDM cone is constituted solely by matrix 0. Halfline $\mathcal{T} = \{\kappa^2[\,1\ \ -\!\sqrt{2}\ \ 1\,]^T \mid \kappa \in \mathbb{R}\}$ on PSD cone boundary depicts that lone extreme ray (780) on which orthogonal projection of $-D$ must be positive semidefinite if $D$ is to belong to $\mathrm{EDM}^2$. $\operatorname{aff}\operatorname{cone}\mathcal{T} = \operatorname{svec}\mathbb{S}^2_c$. (785) Dual EDM cone is halfspace in $\mathbb{R}^3$ whose bounding hyperplane has inward-normal $\operatorname{svec}\mathrm{EDM}^2$.


Projection of vectorized $-D$ on range of vectorized $zz^T$:

$$P_{\operatorname{svec}zz^T}\big(\operatorname{svec}(-D)\big) = \frac{\langle zz^T,\ -D\rangle}{\langle zz^T,\ zz^T\rangle}\,zz^T$$

$$\left.\begin{array}{l}\langle zz^T,\ -D\rangle \geq 0 \quad \forall\, zz^T \mid \mathbf{1}\mathbf{1}^Tzz^T = 0\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (777)$$

Figure 83: Candidate matrix $D$ is assumed to belong to symmetric hollow subspace $\mathbb{S}^2_h$; a line in this dimension. Negating $D$, we find its polar along $\mathbb{S}^2_h$. Set $\mathcal{T}$ (778) has only one ray member in this dimension; not orthogonal to $\mathbb{S}^2_h$. Orthogonal projection of $-D$ on $\mathcal{T}$ (indicated by half dot) has nonnegative projection coefficient. Candidate $D$ must therefore be EDM.


In fact, the smallest face that contains auxiliary matrix $V$ of the PSD cone $\mathbb{S}^N_+$ is the intersection with the geometric center subspace (1561) (1562);

$$\mathcal{F}\big(\mathbb{S}^N_+ \ni V\big) = \operatorname{cone}\{V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\} = \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (783)$$

In isometrically isomorphic $\mathbb{R}^{N(N+1)/2}$

$$\operatorname{svec}\mathcal{F}\big(\mathbb{S}^N_+ \ni V\big) = \operatorname{cone}\mathcal{T} \qquad (784)$$

related to $\mathbb{S}^N_c$ by

$$\operatorname{aff}\operatorname{cone}\mathcal{T} = \operatorname{svec}\mathbb{S}^N_c \qquad (785)$$

5.7.2 EDM criteria in $\mathbf{1}\mathbf{1}^T$

Laurent specifies an elliptope trajectory condition for EDM cone membership, [145, §2.3]

$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; [1 - e^{-\alpha d_{ij}}] \in \mathrm{EDM}^N \quad \forall\,\alpha > 0 \qquad (639)$$

and from the parametrized elliptope $\mathcal{E}^N_t$ in §5.6.2 we propose

$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; \exists\ t \in \mathbb{R}_+\,,\ E \in \mathcal{E}^N_t \ \text{ such that }\ D = t\mathbf{1}\mathbf{1}^T - E \qquad (786)$$

Chabrillac & Crouzeix [43, §4] prove a different criterion they attribute to Finsler (1937) [77]. We apply it to EDMs: for $D \in \mathbb{S}^N_h$ (589)

$$-V_N^TDV_N \succ 0 \;\Leftrightarrow\; \exists\,\kappa > 0 \ \text{ such that }\ -D + \kappa\mathbf{1}\mathbf{1}^T \succ 0 \;\Leftrightarrow\; D \in \mathrm{EDM}^N \text{ with corresponding affine dimension } r = N-1 \qquad (787)$$

This Finsler criterion has geometric interpretation in terms of the vectorization & projection already discussed in connection with (777). With reference to Figure 82, the offset $\mathbf{1}\mathbf{1}^T$ is simply a direction orthogonal to $\mathcal{T}$ in isomorphic $\mathbb{R}^3$. Intuitively, translation of $-D$ in direction $\mathbf{1}\mathbf{1}^T$ is like orthogonal projection on $\mathcal{T}$ in so far as similar information can be obtained.
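The Finsler criterion (787) suggests a simple numerical test when affine dimension is full. A minimal Matlab sketch (not from the book; the doubling search is an arbitrary strategy, and it terminates only when $-V_N^TDV_N \succ 0$, as is generically the case for a random full-dimensional point list):

    N = 4;  X = randn(3,N);  G0 = X'*X;      % affine dimension N-1 generically
    D = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;
    kappa = 1;
    while min(eig(-D + kappa*ones(N))) <= 0
        kappa = 2*kappa;                     % some finite kappa must succeed (787)
    end
    kappa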


When the Finsler criterion (787) is applied despite lower affine dimension, the constant $\kappa$ can go to infinity making the test $-D + \kappa\mathbf{1}\mathbf{1}^T \succeq 0$ impractical for numerical computation. Chabrillac & Crouzeix invent a criterion for the semidefinite case, but it is no more practical: for $D \in \mathbb{S}^N_h$

$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; \exists\,\kappa_p > 0 \ \text{ such that } \forall\,\kappa \geq \kappa_p\,,\ -D - \kappa\mathbf{1}\mathbf{1}^T \text{ [sic] has exactly one negative eigenvalue} \qquad (788)$$

5.8 Dual EDM cone

5.8.1 Ambient $\mathbb{S}^N$

We consider finding the ordinary dual EDM cone in ambient space $\mathbb{S}^N$ where $\mathrm{EDM}^N$ is pointed, closed, convex, but has empty interior. The set of all EDMs in $\mathbb{S}^N$ is a closed convex cone because it is the intersection of halfspaces about the origin in vectorized variable $D$ (§2.4.1.1.1, §2.7.2):

$$\mathrm{EDM}^N = \bigcap_{\substack{z\, \in\, \mathcal{N}(\mathbf{1}^T)\\ i\, =\, 1\ldots N}} \{D \in \mathbb{S}^N \mid \langle e_ie_i^T,\ D\rangle \geq 0\,,\ \langle e_ie_i^T,\ D\rangle \leq 0\,,\ \langle zz^T,\ -D\rangle \geq 0\} \qquad (789)$$

By definition (246), dual cone $\mathcal{K}^*$ comprises each and every vector inward-normal to a hyperplane supporting convex cone $\mathcal{K}$ (§2.4.2.6.1) or bounding a halfspace containing $\mathcal{K}$. The dual EDM cone in the ambient space of symmetric matrices is therefore expressible as the aggregate of every conic combination of inward-normals from (789):

$$\begin{aligned}
\mathrm{EDM}^{N*} &= \operatorname{cone}\{e_ie_i^T,\ -e_je_j^T \mid i,j = 1\ldots N\} - \operatorname{cone}\{zz^T \mid \mathbf{1}\mathbf{1}^Tzz^T = 0\}\\
&= \Big\{\sum_{i=1}^N \zeta_i\,e_ie_i^T - \sum_{j=1}^N \xi_j\,e_je_j^T \;\Big|\; \zeta_i,\,\xi_j \geq 0\Big\} - \operatorname{cone}\{zz^T \mid \mathbf{1}\mathbf{1}^Tzz^T = 0\}\\
&= \{\delta(u) \mid u \in \mathbb{R}^N\} - \operatorname{cone}\{V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1},\ (\|\upsilon\| = 1)\} \subset \mathbb{S}^N\\
&= \{\delta^2(Y) - V_N\Psi V_N^T \mid Y \in \mathbb{S}^N,\ \Psi \in \mathbb{S}^{N-1}_+\}
\end{aligned} \qquad (790)$$

The EDM cone is not self-dual in ambient $\mathbb{S}^N$ because its affine hull belongs to a proper subspace

$$\operatorname{aff}\mathrm{EDM}^N = \mathbb{S}^N_h \qquad (791)$$


The ordinary dual EDM cone cannot, therefore, be pointed. (§2.13.1.1)

When $N = 1$, the EDM cone is the point at the origin in $\mathbb{R}$. Auxiliary matrix $V_N$ is empty [∅], and dual cone $\mathrm{EDM}^*$ is the real line.

When $N = 2$, the EDM cone is a nonnegative real line in isometrically isomorphic $\mathbb{R}^3$; there $\mathbb{S}^2_h$ is a real line containing the EDM cone. Dual cone $\mathrm{EDM}^{2*}$ is the particular halfspace whose boundary has inward-normal $\mathrm{EDM}^2$. Diagonal matrices $\{\delta(u)\}$ in (790) are represented by a hyperplane through the origin $\{d \mid [\,0\ 1\ 0\,]d = 0\}$ while the term $\operatorname{cone}\{V_N\upsilon\upsilon^TV_N^T\}$ is represented by the halfline $\mathcal{T}$ in Figure 82 belonging to the positive semidefinite cone boundary. The dual EDM cone is formed by translating the hyperplane along the negative semidefinite halfline $-\mathcal{T}$; the union of each and every translation. (confer §2.10.2.0.1)

When cardinality $N$ exceeds 2, the dual EDM cone can no longer be polyhedral simply because the EDM cone cannot. (§2.13.1.1)

5.8.1.1 EDM cone and its dual in ambient $\mathbb{S}^N$

Consider the two convex cones

$$\begin{aligned}
\mathcal{K}_1 &\triangleq \mathbb{S}^N_h\\
\mathcal{K}_2 &\triangleq \bigcap_{y\, \in\, \mathcal{N}(\mathbf{1}^T)} \{A \in \mathbb{S}^N \mid \langle yy^T,\ -A\rangle \geq 0\}\\
&= \{A \in \mathbb{S}^N \mid -z^TVAVz \geq 0\ \ \forall\, zz^T\, (\succeq 0)\}\\
&= \{A \in \mathbb{S}^N \mid -VAV \succeq 0\}
\end{aligned} \qquad (792)$$

so

$$\mathcal{K}_1 \cap \mathcal{K}_2 = \mathrm{EDM}^N \qquad (793)$$

The dual cone $\mathcal{K}^*_1 = \mathbb{S}^{N\perp}_h \subseteq \mathbb{S}^N$ (61) is the subspace of diagonal matrices. From (790) via (257),

$$\mathcal{K}^*_2 = -\operatorname{cone}\{V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\} \subset \mathbb{S}^N \qquad (794)$$

Gaffke & Mathar [79, §5.3] observe that projection on $\mathcal{K}_1$ and $\mathcal{K}_2$ have simple closed forms: Projection on subspace $\mathcal{K}_1$ is easily performed by symmetrization and zeroing the main diagonal or vice versa, while projection of $H \in \mathbb{S}^N$ on $\mathcal{K}_2$ is

$$P_{\mathcal{K}_2}H = H - P_{\mathbb{S}^N_+}(VHV) \qquad (795)$$


Proof. First, we observe membership of $H - P_{\mathbb{S}^N_+}(VHV)$ to $\mathcal{K}_2$ because

$$P_{\mathbb{S}^N_+}(VHV) - H = \big(P_{\mathbb{S}^N_+}(VHV) - VHV\big) + (VHV - H) \qquad (796)$$

The term $P_{\mathbb{S}^N_+}(VHV) - VHV$ necessarily belongs to the (dual) positive semidefinite cone by Theorem E.9.2.0.1. $V^2 = V$, hence

$$-V\big(H - P_{\mathbb{S}^N_+}(VHV)\big)V \succeq 0 \qquad (797)$$

by Corollary A.3.1.0.5.

Next, we require

$$\langle P_{\mathcal{K}_2}H - H\,,\ P_{\mathcal{K}_2}H\rangle = 0 \qquad (798)$$

Expanding,

$$\langle -P_{\mathbb{S}^N_+}(VHV)\,,\ H - P_{\mathbb{S}^N_+}(VHV)\rangle = 0 \qquad (799)$$

$$\langle P_{\mathbb{S}^N_+}(VHV)\,,\ (P_{\mathbb{S}^N_+}(VHV) - VHV) + (VHV - H)\rangle = 0 \qquad (800)$$

$$\langle P_{\mathbb{S}^N_+}(VHV)\,,\ (VHV - H)\rangle = 0 \qquad (801)$$

Product $VHV$ belongs to the geometric center subspace; (§E.7.2.0.2)

$$VHV \in \mathbb{S}^N_c = \{Y \in \mathbb{S}^N \mid \mathcal{N}(Y) \supseteq \mathbf{1}\} \qquad (802)$$

Diagonalize $VHV \triangleq Q\Lambda Q^T$ (§A.5) whose nullspace is spanned by the eigenvectors corresponding to 0 eigenvalues by Theorem A.7.2.0.1. Projection of $VHV$ on the PSD cone (§7.1) simply zeros negative eigenvalues in diagonal matrix $\Lambda$. Then

$$\mathcal{N}\big(P_{\mathbb{S}^N_+}(VHV)\big) \supseteq \mathcal{N}(VHV)\ \big(\supseteq \mathcal{N}(V)\big) \qquad (803)$$

from which it follows:

$$P_{\mathbb{S}^N_+}(VHV) \in \mathbb{S}^N_c \qquad (804)$$

so $P_{\mathbb{S}^N_+}(VHV) \perp (VHV - H)$ because $VHV - H \in \mathbb{S}^{N\perp}_c$.

Finally, we must have $P_{\mathcal{K}_2}H - H = -P_{\mathbb{S}^N_+}(VHV) \in \mathcal{K}^*_2$. From §5.7.1 we know dual cone $\mathcal{K}^*_2 = -\mathcal{F}(\mathbb{S}^N_+ \ni V)$ is the negative of the positive semidefinite cone's smallest face that contains auxiliary matrix $V$. Thus $P_{\mathbb{S}^N_+}(VHV) \in \mathcal{F}(\mathbb{S}^N_+ \ni V) \Leftrightarrow \mathcal{N}(P_{\mathbb{S}^N_+}(VHV)) \supseteq \mathcal{N}(V)$ (§2.9.2.2) which was already established in (803). ∎
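Projection (795) is a one-line computation once the eigendecomposition of $VHV$ is available. A minimal Matlab sketch (not from the book; $H$ is an arbitrary symmetric matrix):

    N = 4;  H = randn(N);  H = (H+H')/2;     % arbitrary symmetric H
    V = eye(N) - ones(N)/N;
    [Q,L] = eig(V*H*V);
    PSD  = Q*max(L,0)*Q';                    % P_{S^N_+}(VHV): zero negative eigenvalues
    PK2H = H - PSD;                          % projection (795) of H on K_2
    min(eig(-V*PK2H*V))                      % >= 0 up to roundoff : PK2H in K_2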


$$\mathrm{EDM}^2 = \mathbb{S}^2_h \cap \big(\mathbb{S}^{2\perp}_c - \mathbb{S}^2_+\big)$$

Figure 84: Orthogonal complement $\mathbb{S}^{2\perp}_c$ (1563) (§B.2) of geometric center subspace (a plane in isometrically isomorphic $\mathbb{R}^3$; drawn is a tiled fragment) apparently supporting positive semidefinite cone $\operatorname{svec}\partial\,\mathbb{S}^2_+$. (Rounded vertex is artifact of plot.) Line $\operatorname{svec}\mathbb{S}^2_c = \operatorname{aff}\operatorname{cone}\mathcal{T}$ (785), also drawn in Figure 82, intersects $\operatorname{svec}\partial\,\mathbb{S}^2_+$; it runs along PSD cone boundary. (confer Figure 66)


$$\mathrm{EDM}^2 = \mathbb{S}^2_h \cap \big(\mathbb{S}^{2\perp}_c - \mathbb{S}^2_+\big)$$

Figure 85: EDM cone construction in isometrically isomorphic $\mathbb{R}^3$ by adding polar PSD cone $-\operatorname{svec}\partial\,\mathbb{S}^2_+$ to $\operatorname{svec}\mathbb{S}^{2\perp}_c$. Difference $\operatorname{svec}\big(\mathbb{S}^{2\perp}_c - \mathbb{S}^2_+\big)$ is halfspace partially bounded by $\operatorname{svec}\mathbb{S}^{2\perp}_c$. EDM cone is nonnegative halfline along $\operatorname{svec}\mathbb{S}^2_h$ in this dimension.


From the results in §E.7.2.0.2, we know matrix product $VHV$ is the orthogonal projection of $H \in \mathbb{S}^N$ on the geometric center subspace $\mathbb{S}^N_c$. Thus the projection product

$$P_{\mathcal{K}_2}H = H - P_{\mathbb{S}^N_+}P_{\mathbb{S}^N_c}H \qquad (805)$$

5.8.1.1.1 Lemma. Projection on PSD cone ∩ geometric center subspace.

$$P_{\mathbb{S}^N_+\cap\, \mathbb{S}^N_c} = P_{\mathbb{S}^N_+}P_{\mathbb{S}^N_c} \qquad (806)$$
⋄

Proof. For each and every $H \in \mathbb{S}^N$, projection of $P_{\mathbb{S}^N_c}H$ on the positive semidefinite cone remains in the geometric center subspace

$$\begin{aligned}
\mathbb{S}^N_c &= \{G \in \mathbb{S}^N \mid G\mathbf{1} = 0\} \qquad &(1561)\\
&= \{G \in \mathbb{S}^N \mid \mathcal{N}(G) \supseteq \mathbf{1}\} = \{G \in \mathbb{S}^N \mid \mathcal{R}(G) \subseteq \mathcal{N}(\mathbf{1}^T)\}\\
&= \{VYV \mid Y \in \mathbb{S}^N\} \subset \mathbb{S}^N \qquad &(1562)\\
&\equiv \{V_NAV_N^T \mid A \in \mathbb{S}^{N-1}\} \qquad &(539)
\end{aligned}$$

That is because: Eigenvectors of $P_{\mathbb{S}^N_c}H$ corresponding to 0 eigenvalues span its nullspace. (§A.7.2.0.1) To project $P_{\mathbb{S}^N_c}H$ on the positive semidefinite cone, its negative eigenvalues are zeroed. (§7.1.2) The nullspace is thereby expanded while the eigenvectors originally spanning $\mathcal{N}(P_{\mathbb{S}^N_c}H)$ remain intact. Because the geometric center subspace is invariant to projection on the PSD cone, then the rule for projection on a convex set in a subspace governs (§E.9.5; projectors do not commute), and statement (806) follows directly. ∎

From the lemma it follows

$$\{P_{\mathbb{S}^N_+}P_{\mathbb{S}^N_c}H \mid H \in \mathbb{S}^N\} = \{P_{\mathbb{S}^N_+\cap\,\mathbb{S}^N_c}H \mid H \in \mathbb{S}^N\} \qquad (807)$$

Then from (1583)

$$-\big(\mathbb{S}^N_c \cap \mathbb{S}^N_+\big)^* = \{H - P_{\mathbb{S}^N_+}P_{\mathbb{S}^N_c}H \mid H \in \mathbb{S}^N\} \qquad (808)$$

From (257) we get closure of a vector sum

$$\mathcal{K}_2 = -\big(\mathbb{S}^N_c \cap \mathbb{S}^N_+\big)^* = \mathbb{S}^{N\perp}_c - \mathbb{S}^N_+ \qquad (809)$$

therefore the new equality

$$\mathrm{EDM}^N = \mathcal{K}_1 \cap \mathcal{K}_2 = \mathbb{S}^N_h \cap \big(\mathbb{S}^{N\perp}_c - \mathbb{S}^N_+\big) \qquad (810)$$


whose veracity is intuitively evident, in hindsight, [50, p.109] from the most fundamental EDM definition (460). Formula (810) is not a matrix criterion for membership to the EDM cone, and it is not an equivalence between EDM operators. Rather, it is a recipe for constructing the EDM cone whole from large Euclidean bodies: the positive semidefinite cone, orthogonal complement of the geometric center subspace, and symmetric hollow subspace. A realization of this construction in low dimension is illustrated in Figure 84 and Figure 85.

The dual EDM cone follows directly from (810) by standard properties of cones (§2.13.1.1):

$$\mathrm{EDM}^{N*} = \mathcal{K}^*_1 + \mathcal{K}^*_2 = \mathbb{S}^{N\perp}_h - \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (811)$$

which bears strong resemblance to (790).

5.8.1.2 nonnegative orthant

That $\mathrm{EDM}^N$ is a proper subset of the nonnegative orthant is not obvious from (810). We wish to verify

$$\mathrm{EDM}^N = \mathbb{S}^N_h \cap \big(\mathbb{S}^{N\perp}_c - \mathbb{S}^N_+\big) \subset \mathbb{R}^{N\times N}_+ \qquad (812)$$

While there are many ways to prove this, it is sufficient to show that all entries of the extreme directions of $\mathrm{EDM}^N$ must be nonnegative; id est, for any particular nonzero vector $z = [z_i,\ i = 1\ldots N] \in \mathcal{N}(\mathbf{1}^T)$ (§5.5.3.2),

$$\delta(zz^T)\mathbf{1}^T + \mathbf{1}\delta(zz^T)^T - 2zz^T \geq 0 \qquad (813)$$

where the inequality denotes entrywise comparison. The inequality holds because the $i,j$th entry of an extreme direction is squared: $(z_i - z_j)^2$.

We observe that the dyad $2zz^T \in \mathbb{S}^N_+$ belongs to the positive semidefinite cone, the doublet

$$\delta(zz^T)\mathbf{1}^T + \mathbf{1}\delta(zz^T)^T \in \mathbb{S}^{N\perp}_c \qquad (814)$$

to the orthogonal complement (1563) of the geometric center subspace, while their difference is a member of the symmetric hollow subspace $\mathbb{S}^N_h$.

Here is an algebraic method provided by a reader to prove nonnegativity: Suppose we are given $A \in \mathbb{S}^{N\perp}_c$ and $B = [B_{ij}] \in \mathbb{S}^N_+$ and $A - B \in \mathbb{S}^N_h$. Then we have, for some vector $u$, $A = u\mathbf{1}^T + \mathbf{1}u^T = [A_{ij}] = [u_i + u_j]$ and $\delta(B) = \delta(A) = 2u$. Positive semidefiniteness of $B$ requires nonnegativity $A - B \geq 0$ because

$$(e_i - e_j)^TB(e_i - e_j) = (B_{ii} - B_{ij}) - (B_{ji} - B_{jj}) = 2(u_i + u_j) - 2B_{ij} \geq 0 \qquad (815)$$


5.8.1.3 Dual Euclidean distance matrix criterion

Conditions necessary for membership of a matrix $D^* \in \mathbb{S}^N$ to the dual EDM cone $\mathrm{EDM}^{N*}$ may be derived from (790): $D^* \in \mathrm{EDM}^{N*} \Rightarrow D^* = \delta(y) - V_NAV_N^T$ for some vector $y$ and positive semidefinite matrix $A \in \mathbb{S}^{N-1}_+$. This in turn implies $\delta(D^*\mathbf{1}) = \delta(y)$. Then, for $D^* \in \mathbb{S}^N$

$$D^* \in \mathrm{EDM}^{N*} \;\Leftrightarrow\; \delta(D^*\mathbf{1}) - D^* \succeq 0 \qquad (816)$$

where, for any symmetric matrix $D^*$

$$\delta(D^*\mathbf{1}) - D^* \in \mathbb{S}^N_c \qquad (817)$$

To show sufficiency of the matrix criterion in (816), recall Gram-form EDM operator

$$D(G) = \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G \qquad (472)$$

where Gram matrix $G$ is positive semidefinite by definition, and recall the self-adjointness property of the main-diagonal linear operator $\delta$ (§A.1):

$$\langle D,\ D^*\rangle = \big\langle \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G\,,\ D^*\big\rangle = 2\big\langle G,\ \delta(D^*\mathbf{1}) - D^*\big\rangle \qquad (491)$$

Assuming $\langle G,\ \delta(D^*\mathbf{1}) - D^*\rangle \geq 0$ (1082), then we have known membership relation (§2.13.2.0.1)

$$D^* \in \mathrm{EDM}^{N*} \;\Leftrightarrow\; \langle D,\ D^*\rangle \geq 0 \quad \forall\, D \in \mathrm{EDM}^N \qquad (818)$$

Elegance of this matrix criterion (816) for membership to the dual EDM cone is the lack of any other assumptions except that $D^*$ be symmetric. (Recall: Schoenberg criterion (479) for membership to the EDM cone requires membership to the symmetric hollow subspace.)

Linear Gram-form EDM operator has adjoint, for $Y \in \mathbb{S}^N$

$$D^T(Y) \;\triangleq\; 2\big(\delta(Y\mathbf{1}) - Y\big) \qquad (819)$$

Then we have: [50, p.111]

$$\mathrm{EDM}^{N*} = \{Y \in \mathbb{S}^N \mid \delta(Y\mathbf{1}) - Y \succeq 0\} \qquad (820)$$

the dual EDM cone expressed in terms of the adjoint operator. A dual EDM cone determined this way is illustrated in Figure 87.

We leave it an exercise to find a spectral cone as in §4.11.2 corresponding to $\mathrm{EDM}^{N*}$.
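Criterion (816) and membership relation (818) invite a direct numerical check. A minimal Matlab sketch (not from the book; the random $\Psi$, $u$, and point list are arbitrary choices) constructs a dual EDM cone member via (790), verifies (816), then confirms nonnegative inner product with an arbitrary EDM:

    N  = 4;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    B  = randn(N-1);  Psi = B*B';            % Psi in S^{N-1}_+
    u  = randn(N,1);
    Dstar = diag(u) - VN*Psi*VN';            % member of EDM^{N*} by (790)
    min(eig(diag(Dstar*ones(N,1)) - Dstar))  % >= 0 within roundoff : criterion (816)
    X = randn(3,N);  G0 = X'*X;
    D = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;  % any EDM
    trace(D*Dstar)                           % >= 0 : membership relation (818)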


$$D^\circ = \delta(D^\circ\mathbf{1}) + \big(D^\circ - \delta(D^\circ\mathbf{1})\big) \;\in\; \mathbb{S}^{N\perp}_h \oplus \mathbb{S}^N_c \cap \mathbb{S}^N_+ = \mathrm{EDM}^{N\circ}$$

Figure 86: Hand-drawn abstraction of polar EDM cone $\mathrm{EDM}^{N\circ}$ (drawn truncated), with cone $\mathbb{S}^N_c \cap \mathbb{S}^N_+$ and subspace $\mathbb{S}^{N\perp}_h$. Any member $D^\circ$ of polar EDM cone can be decomposed into two linearly independent nonorthogonal components: $\delta(D^\circ\mathbf{1})$ and $D^\circ - \delta(D^\circ\mathbf{1})$.


5.8.1.4 Nonorthogonal components of dual EDM

Now we tie construct (811) for the dual EDM cone together with the matrix criterion (816) for dual EDM cone membership. For any $D^* \in \mathbb{S}^N$ it is obvious:

$$\delta(D^*\mathbf{1}) \in \mathbb{S}^{N\perp}_h \qquad (821)$$

any diagonal matrix belongs to the subspace of diagonal matrices (56). We know when $D^* \in \mathrm{EDM}^{N*}$

$$\delta(D^*\mathbf{1}) - D^* \in \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (822)$$

this adjoint expression (819) belongs to that face (783) of the positive semidefinite cone $\mathbb{S}^N_+$ in the geometric center subspace. Any nonzero dual EDM

$$D^* = \delta(D^*\mathbf{1}) - \big(\delta(D^*\mathbf{1}) - D^*\big) \;\in\; \mathbb{S}^{N\perp}_h \ominus \mathbb{S}^N_c \cap \mathbb{S}^N_+ = \mathrm{EDM}^{N*} \qquad (823)$$

can therefore be expressed as the difference of two linearly independent nonorthogonal components. (Figure 66, Figure 86)

5.8.1.5 Affine dimension complementarity

From §5.8.1.3 we have, for some $A \in \mathbb{S}^{N-1}_+$ (confer (822))

$$\delta(D^*\mathbf{1}) - D^* = V_NAV_N^T \in \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (824)$$

if and only if $D^*$ belongs to the dual EDM cone. Call $\operatorname{rank}(V_NAV_N^T)$ dual affine dimension. Empirically, we find a complementary relationship in affine dimension between the projection of some arbitrary symmetric matrix $H$ on the polar EDM cone, $\mathrm{EDM}^{N\circ} = -\mathrm{EDM}^{N*}$, and its projection on the EDM


cone; id est, the optimal solution of$^{5.8}$

$$\begin{array}{ll}
\underset{D^\circ\in\,\mathbb{S}^N}{\text{minimize}} & \|D^\circ - H\|_F\\
\text{subject to} & D^\circ - \delta(D^\circ\mathbf{1}) \succeq 0
\end{array} \qquad (825)$$

has dual affine dimension complementary to affine dimension corresponding to the optimal solution of

$$\begin{array}{ll}
\underset{D\in\,\mathbb{S}^N_h}{\text{minimize}} & \|D - H\|_F\\
\text{subject to} & -V_N^TDV_N \succeq 0
\end{array} \qquad (826)$$

Precisely,

$$\operatorname{rank}\big(D^{\circ\star} - \delta(D^{\circ\star}\mathbf{1})\big) + \operatorname{rank}\big(V_N^TD^\star V_N\big) = N - 1 \qquad (827)$$

and $\operatorname{rank}\big(D^{\circ\star} - \delta(D^{\circ\star}\mathbf{1})\big) \leq N - 1$ because vector $\mathbf{1}$ is always in the nullspace of rank's argument. This is similar to the known result for projection on the self-dual positive semidefinite cone and its polar:

$$\operatorname{rank}P_{-\mathbb{S}^N_+}H + \operatorname{rank}P_{\mathbb{S}^N_+}H = N \qquad (828)$$

When low affine dimension is a desirable result of projection on the EDM cone, projection on the polar EDM cone should be performed instead. Convex polar problem (825) can be solved for $D^{\circ\star}$ by transforming to an equivalent Schur-form semidefinite program (§A.4.1). Interior-point methods for numerically solving semidefinite programs tend to produce high-rank solutions. (§6.1.1) Then $D^\star = H - D^{\circ\star} \in \mathrm{EDM}^N$ by Corollary E.9.2.2.1, and $D^\star$ will tend to have low affine dimension. This approach breaks when attempting projection on a cone subset discriminated by affine dimension or rank, because then we have no complementarity relation like (827) or (828). (§7.1.4.1)

$^{5.8}$ This dual projection can be solved quickly (without semidefinite programming) via Lemma 5.8.1.1.1; rewriting,

$$\begin{array}{ll}
\underset{D^\circ\in\,\mathbb{S}^N}{\text{minimize}} & \|(D^\circ - \delta(D^\circ\mathbf{1})) - (H - \delta(D^\circ\mathbf{1}))\|_F\\
\text{subject to} & D^\circ - \delta(D^\circ\mathbf{1}) \succeq 0
\end{array}$$

which is the projection of affinely transformed optimal solution $H - \delta(D^{\circ\star}\mathbf{1})$ on $\mathbb{S}^N_c \cap \mathbb{S}^N_+$;

$$D^{\circ\star} - \delta(D^{\circ\star}\mathbf{1}) = P_{\mathbb{S}^N_+}P_{\mathbb{S}^N_c}\big(H - \delta(D^{\circ\star}\mathbf{1})\big)$$

Foreknowledge of an optimal solution $D^{\circ\star}$ as argument to projection suggests recursion.


5.8.1.6 EDM cone duality

In §4.6.1.1, via Gram-form EDM operator

$$D(G) = \delta(G)\mathbf{1}^T + \mathbf{1}\delta(G)^T - 2G \;\in\; \mathrm{EDM}^N \;\Leftarrow\; G \succeq 0 \qquad (472)$$

we established clear connection between the EDM cone and that face (783) of positive semidefinite cone $\mathbb{S}^N_+$ in the geometric center subspace:

$$\mathrm{EDM}^N = D\big(\mathbb{S}^N_c \cap \mathbb{S}^N_+\big) \qquad (556)$$
$$\mathcal{V}(\mathrm{EDM}^N) = \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (557)$$

where

$$\mathcal{V}(D) = -\tfrac{1}{2}VDV \qquad (545)$$

In §4.6.1 we established

$$\mathbb{S}^N_c \cap \mathbb{S}^N_+ = V_N\,\mathbb{S}^{N-1}_+\,V_N^T \qquad (543)$$

Then from (816), (824), and (790) we can deduce

$$\delta(\mathrm{EDM}^{N*}\mathbf{1}) - \mathrm{EDM}^{N*} = V_N\,\mathbb{S}^{N-1}_+\,V_N^T = \mathbb{S}^N_c \cap \mathbb{S}^N_+ \qquad (829)$$

which, by (556) and (557), means the EDM cone can be related to the dual EDM cone by an equality:

$$\mathrm{EDM}^N = D\big(\delta(\mathrm{EDM}^{N*}\mathbf{1}) - \mathrm{EDM}^{N*}\big) \qquad (830)$$

$$\mathcal{V}(\mathrm{EDM}^N) = \delta(\mathrm{EDM}^{N*}\mathbf{1}) - \mathrm{EDM}^{N*} \qquad (831)$$

This means projection $-\mathcal{V}(\mathrm{EDM}^N)$ of the EDM cone on the geometric center subspace $\mathbb{S}^N_c$ (§E.7.2.0.2) is an affine transformation of the dual EDM cone: $\mathrm{EDM}^{N*} - \delta(\mathrm{EDM}^{N*}\mathbf{1})$. Secondarily, it means the EDM cone is not self-dual.


5.8.1.7 Schoenberg criterion is discretized membership relation

We show the Schoenberg criterion

$$\left.\begin{array}{l}-V_N^TDV_N \succeq 0\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (479)$$

to be a discretized membership relation (§2.13.4) between a closed convex cone $\mathcal{K}$ and its dual $\mathcal{K}^*$ like

$$\langle y,\ x\rangle \geq 0 \ \text{ for all } y \in \mathcal{G}(\mathcal{K}^*) \;\Leftrightarrow\; x \in \mathcal{K} \qquad (294)$$

where $\mathcal{G}(\mathcal{K}^*)$ is any set of generators whose conic hull constructs closed convex dual cone $\mathcal{K}^*$:

The Schoenberg criterion is the same as

$$\left.\begin{array}{l}\langle zz^T,\ -D\rangle \geq 0 \quad \forall\, zz^T \mid \mathbf{1}\mathbf{1}^Tzz^T = 0\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (777)$$

which, by (778), is the same as

$$\left.\begin{array}{l}\langle zz^T,\ -D\rangle \geq 0 \quad \forall\, zz^T \in \{V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\}\\ D \in \mathbb{S}^N_h\end{array}\right\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (832)$$

where the $zz^T$ constitute a set of generators $\mathcal{G}$ for the positive semidefinite cone's smallest face $\mathcal{F}(\mathbb{S}^N_+ \ni V)$ (§5.7.1) that contains auxiliary matrix $V$.

From the aggregate in (790) we get the ordinary membership relation, assuming only $D \in \mathbb{S}^N$ [123, p.58]

$$\begin{array}{c}
\langle D^*,\ D\rangle \geq 0 \quad \forall\, D^* \in \mathrm{EDM}^{N*} \;\Leftrightarrow\; D \in \mathrm{EDM}^N\\[4pt]
\langle D^*,\ D\rangle \geq 0 \quad \forall\, D^* \in \{\delta(u) \mid u \in \mathbb{R}^N\} - \operatorname{cone}\{V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N
\end{array} \qquad (833)$$

Discretization yields (294):

$$\langle D^*,\ D\rangle \geq 0 \quad \forall\, D^* \in \{e_ie_i^T,\ -e_je_j^T,\ -V_N\upsilon\upsilon^TV_N^T \mid i, j = 1\ldots N\,,\ \upsilon \in \mathbb{R}^{N-1}\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (834)$$


Because $\langle\{\delta(u) \mid u \in \mathbb{R}^N\},\ D\rangle \geq 0 \Leftrightarrow D \in \mathbb{S}^N_h$, we can restrict observation to the symmetric hollow subspace without loss of generality. Then for $D \in \mathbb{S}^N_h$

$$\langle D^*,\ D\rangle \geq 0 \quad \forall\, D^* \in \{-V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\} \;\Leftrightarrow\; D \in \mathrm{EDM}^N \qquad (835)$$

this discretized membership relation becomes (832); identical to the Schoenberg criterion.

Hitherto a correspondence between the EDM cone and a face of a PSD cone, the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual.

5.8.2 Ambient $\mathbb{S}^N_h$

When instead we consider the ambient space of symmetric hollow matrices (791), then still we find the EDM cone is not self-dual for $N > 2$. The simplest way to prove this is as follows:

Given a set of generators $\mathcal{G} = \{\Gamma\}$ (751) for the pointed closed convex EDM cone, the discrete membership theorem in §2.13.4.2.1 asserts that members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned via discretized membership relation:

$$\begin{aligned}
\mathrm{EDM}^{N*}\cap\,\mathbb{S}^N_h &\triangleq \{D^* \in \mathbb{S}^N_h \mid \langle\Gamma,\ D^*\rangle \geq 0 \ \ \forall\,\Gamma \in \mathcal{G}(\mathrm{EDM}^N)\}\\
&= \{D^* \in \mathbb{S}^N_h \mid \langle\delta(zz^T)\mathbf{1}^T + \mathbf{1}\delta(zz^T)^T - 2zz^T,\ D^*\rangle \geq 0 \ \ \forall\, z \in \mathcal{N}(\mathbf{1}^T)\}\\
&= \{D^* \in \mathbb{S}^N_h \mid \langle\mathbf{1}\delta(zz^T)^T - zz^T,\ D^*\rangle \geq 0 \ \ \forall\, z \in \mathcal{N}(\mathbf{1}^T)\}
\end{aligned} \qquad (836)$$

By comparison

$$\mathrm{EDM}^N = \{D \in \mathbb{S}^N_h \mid \langle -zz^T,\ D\rangle \geq 0 \ \ \forall\, z \in \mathcal{N}(\mathbf{1}^T)\} \qquad (837)$$

the term $\delta(zz^T)^TD^*\mathbf{1}$ foils any hope of self-duality in ambient $\mathbb{S}^N_h$.


To find the dual EDM cone in ambient $\mathbb{S}^N_h$ per §2.13.9.4 we prune the aggregate in (790) describing the ordinary dual EDM cone, removing any member having nonzero main diagonal:

$$\begin{aligned}
\mathrm{EDM}^{N*}\cap\,\mathbb{S}^N_h &= \operatorname{cone}\{\delta^2(V_N\upsilon\upsilon^TV_N^T) - V_N\upsilon\upsilon^TV_N^T \mid \upsilon \in \mathbb{R}^{N-1}\}\\
&= \{\delta^2(V_N\Psi V_N^T) - V_N\Psi V_N^T \mid \Psi \in \mathbb{S}^{N-1}_+\}
\end{aligned} \qquad (838)$$

When $N = 1$, the EDM cone and its dual in ambient $\mathbb{S}_h$ each comprise the origin in isomorphic $\mathbb{R}^0$; thus, self-dual in this dimension. (confer (80))

When $N = 2$, the EDM cone is the nonnegative real line in isomorphic $\mathbb{R}$. (Figure 82) $\mathrm{EDM}^{2*}$ in $\mathbb{S}^2_h$ is identical, thus self-dual in this dimension. This result is in agreement with (836), verified directly: for all $\kappa \in \mathbb{R}$, $z = \kappa\begin{bmatrix}1\\ -1\end{bmatrix}$ and $\delta(zz^T) = \kappa^2\begin{bmatrix}1\\ 1\end{bmatrix}$ $\Rightarrow$ $d^*_{12} \geq 0$.

The first case adverse to self-duality, $N = 3$, may be deduced from Figure 74; the EDM cone is a circular cone in isomorphic $\mathbb{R}^3$ corresponding to no rotation of the Lorentz cone (141) (the self-dual circular cone). Figure 87 illustrates the EDM cone and its dual in ambient $\mathbb{S}^3_h$; no longer self-dual.

5.9 Theorem of the alternative

In §2.13.2.1.1 we showed how alternative systems of generalized inequality can be derived from closed convex cones and their duals. This section is, therefore, a fitting postscript to the discussion of the dual EDM cone.

5.9.0.0.1 Theorem. EDM alternative. [91, §1]
Given $D \in \mathbb{S}^N_h$

$$D \in \mathrm{EDM}^N \qquad \text{or in the alternative} \qquad \exists\, z \ \text{ such that }\ \begin{cases}\mathbf{1}^Tz = 1\\ Dz = 0\end{cases} \qquad (839)$$

In words, either $\mathcal{N}(D)$ intersects hyperplane $\{z \mid \mathbf{1}^Tz = 1\}$ or $D$ is an EDM; the alternatives are incompatible. ⋄


$$D^* \in \mathrm{EDM}^{N*} \;\Leftrightarrow\; \delta(D^*\mathbf{1}) - D^* \succeq 0 \qquad (816)$$

Figure 87: Ordinary dual EDM cone projected on $\mathbb{S}^3_h$, $\operatorname{dvec}\operatorname{rel}\partial(\mathrm{EDM}^{3*}\cap\,\mathbb{S}^3_h)$, shrouds $\operatorname{dvec}\operatorname{rel}\partial\,\mathrm{EDM}^3$; drawn tiled in isometrically isomorphic $\mathbb{R}^3$. (It so happens: intersection $\mathrm{EDM}^{N*}\cap\,\mathbb{S}^N_h$ (§2.13.9.3) is identical to projection of dual EDM cone on $\mathbb{S}^N_h$.)


When $D$ is an EDM [161, §2]

$$\mathcal{N}(D) \subset \mathcal{N}(\mathbf{1}^T) = \{z \mid \mathbf{1}^Tz = 0\} \qquad (840)$$

Because [91, §2] (§E.0.1)

$$DD^\dagger\mathbf{1} = \mathbf{1}\,,\qquad \mathbf{1}^TD^\dagger D = \mathbf{1}^T \qquad (841)$$

then

$$\mathcal{R}(\mathbf{1}) \subset \mathcal{R}(D) \qquad (842)$$

5.10 postscript

We provided an equality (810) relating the convex cone of Euclidean distance matrices to the convex cone of positive semidefinite matrices. Projection on the positive semidefinite cone constrained by an upper bound on rank is easy and well known; [67] simply a matter of truncating a list of eigenvalues. Projection on the positive semidefinite cone with such a rank constraint is, in fact, a convex optimization problem. (§7.1.4)

Yet, it remains difficult to project on the EDM cone under a constraint on rank or affine dimension. One thing we can do is invoke the Schoenberg criterion (479) and then project on a positive semidefinite cone under a constraint bounding affine dimension from above.

It is our hope that the equality herein relating EDM and PSD cones will become a step toward understanding projection on the EDM cone under a rank constraint.


Chapter 6

Semidefinite programming

Prior to 1984,$^{6.1}$ linear and nonlinear programming, one a subset of the other, had evolved for the most part along unconnected paths, without even a common terminology. (The use of "programming" to mean "optimization" serves as a persistent reminder of these differences.)
−Forsgren, Gill, & Wright (2002) [78]

Given some application of convex analysis, it may at first seem puzzling why the search for its solution ends abruptly with a formalized statement of the problem itself as a constrained optimization. The explanation is: typically we do not seek analytical solution because there are relatively few. (§C) If a problem can be expressed in convex form, rather, then there exist computer programs providing efficient numerical global solution. [216] [95] [252] [253] [254] [259]

$^{6.1}$ nascence of interior-point methods of solution [236] [250]


The goal, then, becomes conversion of a given problem (perhaps a nonconvex problem statement) to an equivalent convex form or to an alternation of convex subproblems convergent to a solution of the original problem: A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. [38, §4.2.2] [193, §1] Given convex real objective function $g$ and convex feasible set $\mathcal{C} \subseteq \operatorname{dom}g$, which is the set of all variable values satisfying the problem constraints, we have the generic convex optimization problem

$$\begin{array}{ll}\underset{X}{\text{minimize}} & g(X)\\ \text{subject to} & X \in \mathcal{C}\end{array} \qquad (843)$$

where constraints are abstract here in the membership of variable $X$ to feasible set $\mathcal{C}$. Inequality constraint functions of a convex optimization problem are convex while equality constraint functions are conventionally affine, but not necessarily so. Affine equality constraint functions (necessarily convex), as opposed to the larger set of all convex equality constraint functions having convex level sets, make convex optimization tractable. Similarly, the problem

$$\begin{array}{ll}\underset{X}{\text{maximize}} & g(X)\\ \text{subject to} & X \in \mathcal{C}\end{array} \qquad (844)$$

is convex were $g$ a real concave function. As conversion to convex form is not always possible, there is much ongoing research to determine which problem classes have convex expression or relaxation. [24] [37] [81] [174] [223] [80]

6.1 Conic problem

Still, we are surprised to see the relatively small number of submissions to semidefinite programming (SDP) solvers, as this is an area of significant current interest to the optimization community. We speculate that semidefinite programming is simply experiencing the fate of most new areas: Users have yet to understand how to pose their problems as semidefinite programs, and the lack of support for SDP solvers in popular modelling languages likely discourages submissions.
−SIAM News, 2002. [63, p.9]


Consider a prototypical conic problem (p) and its dual (d): [184, §3.3.1] [148, §2.1]

$$\text{(p)}\quad \begin{array}{ll}\underset{x}{\text{minimize}} & c^Tx\\ \text{subject to} & x \in \mathcal{K}\\ & Ax = b\end{array} \qquad\qquad \text{(d)}\quad \begin{array}{ll}\underset{y,\,s}{\text{maximize}} & b^Ty\\ \text{subject to} & s \in \mathcal{K}^*\\ & A^Ty + s = c\end{array} \qquad (249)$$

where $\mathcal{K}$ is a closed convex cone, $\mathcal{K}^*$ is its dual, matrix $A$ is fixed, and the remaining quantities are vectors.

When $\mathcal{K}$ is a polyhedral cone (§2.12.1), then each conic problem becomes a linear program [53]. More generally, each optimization problem is convex when $\mathcal{K}$ is a closed convex cone. Unlike the optimal objective value, a solution to each problem is not necessarily unique; in other words, the optimal solution set $\{x^\star\}$ or $\{y^\star,\ s^\star\}$ is convex and may comprise more than a single point although the corresponding optimal objective value is unique when the feasible set is nonempty.

When $\mathcal{K}$ is the self-dual cone of positive semidefinite matrices in the subspace of symmetric matrices, then each conic problem is called a semidefinite program (SDP); [174, §6.4] primal problem (P) having matrix variable $X \in \mathbb{S}^n$ while corresponding dual (D) has matrix slack variable $S \in \mathbb{S}^n$ and vector variable $y = [y_i] \in \mathbb{R}^m$: [8] [9, §2] [259, §1.3.8]

$$\text{(P)}\quad \begin{array}{ll}\underset{X\in\,\mathbb{S}^n}{\text{minimize}} & \langle C,\ X\rangle\\ \text{subject to} & X \succeq 0\\ & A\operatorname{svec}X = b\end{array} \qquad\qquad \text{(D)}\quad \begin{array}{ll}\underset{y\in\mathbb{R}^m,\ S\in\,\mathbb{S}^n}{\text{maximize}} & \langle b,\ y\rangle\\ \text{subject to} & S \succeq 0\\ & \operatorname{svec}^{-1}(A^Ty) + S = C\end{array} \qquad (845)$$

where matrix $C \in \mathbb{S}^n$ and vector $b \in \mathbb{R}^m$ are fixed, as is

$$A \;\triangleq\; \begin{bmatrix}\operatorname{svec}(A_1)^T\\ \vdots\\ \operatorname{svec}(A_m)^T\end{bmatrix} \in \mathbb{R}^{m\times n(n+1)/2} \qquad (846)$$

where $A_i \in \mathbb{S}^n$, $i = 1\ldots m$, are given. Thus

$$A\operatorname{svec}X = \begin{bmatrix}\langle A_1,\ X\rangle\\ \vdots\\ \langle A_m,\ X\rangle\end{bmatrix}\,,\qquad \operatorname{svec}^{-1}(A^Ty) = \sum_{i=1}^m y_iA_i \qquad (847)$$


The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space $\mathbb{R}^{n(n+1)/2}$; id est,

$$\langle C,\ X\rangle \;\triangleq\; \operatorname{tr}(C^TX) = \operatorname{svec}(C)^T\operatorname{svec}X \qquad (30)$$

where $\operatorname{svec}X$ defined by (46) denotes symmetric vectorization.

Semidefinite programming has emerged recently to prominence primarily because it admits a new class of problem previously unsolvable by convex optimization techniques, [37] secondarily because it theoretically subsumes other convex techniques such as linear, quadratic, and second-order cone programming. Determination of the Riemann mapping function from complex analysis [178] [21, §8, §13], for example, can be posed as a semidefinite program.

6.1.1 Maximal complementarity

It has been shown that contemporary interior-point methods (developed circa 1990 [81]) [38, §11] [251] [181] [174] [9] [78] for numerical solution of semidefinite programs can converge to a solution of maximal complementarity; [103, §5] [258] [157] [87] not a vertex-solution but a solution of highest cardinality or rank among all optimal solutions.$^{6.2}$ [259, §2.5.3]

6.1.1.1 Reduced-rank solution

A simple rank reduction algorithm for construction of a primal optimal solution $X^\star$ to (845)(P) satisfying an upper bound on rank governed by Proposition 2.9.3.0.1 is presented in §6.3. That proposition asserts existence of feasible solutions with an upper bound on their rank; [17, §II.13.1] specifically, it asserts an extreme point (§2.6.0.0.1) of the primal feasible set $\mathcal{A}\cap\mathbb{S}^n_+$ satisfies upper bound

$$\operatorname{rank}X \leq \left\lfloor \frac{\sqrt{8m+1} - 1}{2}\right\rfloor \qquad (220)$$

where, given $A \in \mathbb{R}^{m\times n(n+1)/2}$ and $b \in \mathbb{R}^m$,

$$\mathcal{A} \;\triangleq\; \{X \in \mathbb{S}^n \mid A\operatorname{svec}X = b\} \qquad (848)$$

is the affine subset from primal problem (845)(P).

$^{6.2}$ This characteristic might be regarded as a disadvantage to this method of numerical solution, but this behavior is not certain and depends on solver implementation.
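Symmetric vectorization svec (46), underlying data matrix $A$ (846), is easily implemented. A minimal Matlab sketch (not from the book; the ordering follows the column-stacked upper triangle) scales strictly upper-triangular entries by $\sqrt{2}$ so that identity (30) holds:

    n = 3;
    C = randn(n); C = (C+C')/2;  X = randn(n); X = (X+X')/2;
    M = sqrt(2)*triu(C,1) + diag(diag(C));   % scale strict upper triangle by sqrt(2)
    svecC = M(triu(true(n)));                % stack upper triangle columnwise
    M = sqrt(2)*triu(X,1) + diag(diag(X));
    svecX = M(triu(true(n)));
    [trace(C*X), svecC'*svecX]               % equal : inner product (30) preserved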


Figure 88: Visualizing positive semidefinite cone in high dimension: Proper polyhedral cone $\mathcal{S}^3_+ \subset \mathbb{R}^3$ representing positive semidefinite cone $\mathbb{S}^3_+ \subset \mathbb{S}^3$; analogizing its intersection with hyperplane $\mathbb{S}^3_+\cap\,\partial\mathcal{H}$. Number of facets is arbitrary (analogy is not inspired by eigen decomposition). The rank-0 positive semidefinite matrix corresponds to the origin in $\mathbb{R}^3$, rank-1 positive semidefinite matrices correspond to the edges of the polyhedral cone, rank-2 to the facet relative interiors, and rank-3 to the polyhedral cone interior. $\Gamma_1$ and $\Gamma_2$ are extreme points of polyhedron $\mathcal{P} = \partial\mathcal{H}\cap\mathcal{S}^3_+$, and extreme directions of $\mathcal{S}^3_+$. A given vector $C$ is normal to another hyperplane (not illustrated but independent w.r.t $\partial\mathcal{H}$) containing line segment $\Gamma_1\Gamma_2$ minimizing real linear function $\langle C,\ X\rangle$ on $\mathcal{P}$. (confer Figure 16)


6.1.1.2 Coexistence of low- and high-rank solutions; analogy

That low-rank and high-rank optimal solutions $\{X^\star\}$ of (845)(P) coexist may be grasped with the following analogy: We compare a proper polyhedral cone $\mathcal{S}^3_+$ in $\mathbb{R}^3$ (illustrated in Figure 88) to the positive semidefinite cone $\mathbb{S}^3_+$ in isometrically isomorphic $\mathbb{R}^6$, difficult to visualize. The analogy is good:

- int $\mathbb{S}^3_+$ is constituted by rank-3 matrices; int $\mathcal{S}^3_+$ has three dimensions
- boundary $\partial\mathbb{S}^3_+$ contains rank-0, rank-1, and rank-2 matrices; boundary $\partial\mathcal{S}^3_+$ contains 0-, 1-, and 2-dimensional faces
- the only rank-0 matrix resides in the vertex at the origin
- Rank-1 matrices are in one-to-one correspondence with extreme directions of $\mathbb{S}^3_+$ and $\mathcal{S}^3_+$. The set of all rank-1 symmetric matrices in this dimension

$$\{G \in \mathbb{S}^3_+ \mid \operatorname{rank}G = 1\} \qquad (849)$$

is not a connected set.
- Rank of a sum of members $F + G$ in Lemma 2.9.2.5.1 and location of a difference $F - G$ in §2.9.2.8.1 similarly hold for $\mathbb{S}^3_+$ and $\mathcal{S}^3_+$.
- Euclidean distance from any particular rank-3 positive semidefinite matrix (in the cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary) is generally less than the distance to the closest rank-1 positive semidefinite matrix. (§7.1.2)
- distance from any point in $\partial\mathbb{S}^3_+$ to int $\mathbb{S}^3_+$ is infinitesimal (§2.1.7.1.1); distance from any point in $\partial\mathcal{S}^3_+$ to int $\mathcal{S}^3_+$ is infinitesimal


- faces of $\mathcal{S}^3_+$ correspond to faces of $\mathbb{S}^3_+$ (confer Table 2.9.2.2.1):

                 k    dim F(polyhedral S³₊)   dim F(PSD S³₊)   dim F(PSD S³₊ ∋ rank-k matrix)
    boundary     0            0                     0                  0
                 1            1                     1                  1
                 2            2                     3                  3
    interior     3            3                     6                  6

Integer $k$ indexes $k$-dimensional faces $\mathcal{F}$ of $\mathcal{S}^3_+$. Positive semidefinite cone $\mathbb{S}^3_+$ has four kinds of faces, including cone itself ($k = 3$, boundary + interior), whose dimensions in isometrically isomorphic $\mathbb{R}^6$ are listed under $\dim\mathcal{F}(\mathbb{S}^3_+)$. Smallest face $\mathcal{F}(\mathbb{S}^3_+ \ni \text{rank-}k\ \text{matrix})$ that contains a rank-$k$ positive semidefinite matrix has dimension $k(k+1)/2$ by (182).

- For $\mathcal{A}$ equal to intersection of $m$ hyperplanes having independent normals, and for $X \in \mathbb{S}^3_+\cap\mathcal{A}$, we have $\operatorname{rank}X \leq m$; the analogue to (220).

Proof. With reference to Figure 88: Assume one ($m = 1$) hyperplane $\mathcal{A} = \partial\mathcal{H}$ intersects the polyhedral cone. Every intersecting plane contains at least one matrix having rank less than or equal to 1; id est, from all $X \in \partial\mathcal{H}\cap\mathcal{S}^3_+$ there exists an $X$ such that $\operatorname{rank}X \leq 1$. Rank-1 is therefore an upper bound in this case.

Now visualize intersection of the polyhedral cone with two ($m = 2$) hyperplanes having linearly independent normals. The hyperplane intersection $\mathcal{A}$ makes a line. Every intersecting line contains at least one matrix having rank less than or equal to 2, providing an upper bound. In other words, there exists a positive semidefinite matrix $X$ belonging to any line intersecting the polyhedral cone such that $\operatorname{rank}X \leq 2$.

In the case of three independent intersecting hyperplanes ($m = 3$), the hyperplane intersection $\mathcal{A}$ makes a point that can reside anywhere in the polyhedral cone. The upper bound on a point in $\mathcal{S}^3_+$ is also the greatest upper bound: $\operatorname{rank}X \leq 3$. ∎


6.1.1.2.1 Example. Optimization on $\mathcal{A}\cap\mathcal{S}^3_+$.
Consider minimization of the real linear function $\langle C,\ X\rangle$ on a polyhedral feasible set

$$\mathcal{P} \;\triangleq\; \mathcal{A}\cap\mathcal{S}^3_+ \qquad (850)$$

$$f_0^\star \;\triangleq\; \begin{array}[t]{ll}\underset{X}{\text{minimize}} & \langle C,\ X\rangle\\ \text{subject to} & X \in \mathcal{A}\cap\mathcal{S}^3_+\end{array} \qquad (851)$$

As illustrated for particular $C$ and $\mathcal{A} = \partial\mathcal{H}$ in Figure 88, this linear function is minimized (confer Figure 16) on any $X$ belonging to the face of $\mathcal{P}$ containing extreme points $\{\Gamma_1,\ \Gamma_2\}$ and all the rank-2 matrices in between; id est, on any $X$ belonging to the face of $\mathcal{P}$

$$\mathcal{F}(\mathcal{P}) = \{X \mid \langle C,\ X\rangle = f_0^\star\}\cap\mathcal{A}\cap\mathcal{S}^3_+ \qquad (852)$$

exposed by the hyperplane $\{X \mid \langle C,\ X\rangle = f_0^\star\}$. In other words, the set of all optimal points $X^\star$ is a face of $\mathcal{P}$

$$\{X^\star\} = \mathcal{F}(\mathcal{P}) = \Gamma_1\Gamma_2 \qquad (853)$$

comprising rank-1 and rank-2 positive semidefinite matrices. Rank-1 is the upper bound on existence in the feasible set $\mathcal{P}$ for this case $m = 1$ hyperplane constituting $\mathcal{A}$. The rank-1 matrices $\Gamma_1$ and $\Gamma_2$ in face $\mathcal{F}(\mathcal{P})$ are extreme points of that face and (by transitivity (§2.6.1.2)) extreme points of the intersection $\mathcal{P}$ as well. As predicted by analogy to Barvinok's Proposition 2.9.3.0.1, the upper bound on rank of $X$ existent in the feasible set $\mathcal{P}$ is satisfied by an extreme point. The upper bound on rank of an optimal solution $X^\star$ existent in $\mathcal{F}(\mathcal{P})$ is thereby also satisfied by an extreme point of $\mathcal{P}$ precisely because $\{X^\star\}$ constitutes $\mathcal{F}(\mathcal{P})$;$^{6.3}$ in particular,

$$\{X^\star \in \mathcal{P} \mid \operatorname{rank}X^\star \leq 1\} = \{\Gamma_1,\ \Gamma_2\} \subseteq \mathcal{F}(\mathcal{P}) \qquad (854)$$

As all linear functions on a polyhedron are minimized on a face, [53] [156] [173] [175] by analogy we so demonstrate coexistence of optimal solutions $X^\star$ of (845)(P) having assorted rank. ∎

$^{6.3}$ and every face contains a subset of the extreme points of $\mathcal{P}$ by the extreme existence theorem (§2.6.0.0.2). This means: because the affine subset $\mathcal{A}$ and hyperplane $\{X \mid \langle C,\ X\rangle = f_0^\star\}$ must intersect a whole face of $\mathcal{P}$, calculation of an upper bound on rank of $X^\star$ ignores counting the hyperplane when determining $m$ in (220).


6.1.1.3 Previous work

Barvinok showed [18, §2.2] when given a positive definite matrix $C$ and an arbitrarily small neighborhood of $C$ comprising positive definite matrices, there exists a matrix $\tilde{C}$ from that neighborhood such that optimal solution $X^\star$ to (845)(P) (substituting $\tilde{C}$) is an extreme point of $\mathcal{A}\cap\mathbb{S}^n_+$ and satisfies upper bound (220).$^{6.4}$ Given arbitrary positive definite $C$, this means nothing inherently guarantees that an optimal solution $X^\star$ to problem (845)(P) satisfies (220); certainly nothing given any symmetric matrix $C$, as the problem is posed. This can be proved by example:

6.1.1.3.1 Example. (Ye) Maximal Complementarity.
Assume dimension $n$ to be an even positive number. Then the particular instance of problem (845)(P),

$$\begin{array}{ll}\underset{X\in\,\mathbb{S}^n}{\text{minimize}} & \left\langle\begin{bmatrix}I & 0\\ 0 & 2I\end{bmatrix},\ X\right\rangle\\ \text{subject to} & X \succeq 0\\ & \langle I,\ X\rangle = n\end{array} \qquad (855)$$

has optimal solution

$$X^\star = \begin{bmatrix}2I & 0\\ 0 & 0\end{bmatrix} \in \mathbb{S}^n \qquad (856)$$

with an equal number of twos and zeros along the main diagonal. Indeed, optimal solution (856) is a terminal solution along the central path taken by the interior-point method as implemented in [259, §2.5.3]; it is also a solution of highest rank among all optimal solutions to (855). Clearly, rank of this primal optimal solution exceeds by far a rank-1 solution predicted by upper bound (220). ∎

$^{6.4}$ Further, the set of all such $\tilde{C}$ in that neighborhood is open and dense.
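Ye's instance (855) can be handed to an interior-point solver to observe maximal complementarity. A minimal Matlab sketch, assuming the CVX modeling package; the particular optimal point returned depends on the underlying solver, so the high-rank outcome is typical rather than guaranteed:

    n = 4;
    C = blkdiag(eye(n/2), 2*eye(n/2));
    cvx_begin sdp
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        subject to
            X >= 0;                          % X positive semidefinite
            trace(X) == n;                   % <I,X> = n
    cvx_end
    eig(X)                                   % tends toward {2,2,0,0}: solution (856)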


6.1.1.4 Later developments

This rational example (855) indicates the need for a more generally applicable and simple algorithm to identify an optimal solution $X^\star$ satisfying Barvinok's Proposition 2.9.3.0.1. We will review such an algorithm in §6.3, but first we provide more background.

6.2 Framework

6.2.1 Feasible sets

Denote by $\mathcal{C}$ and $\mathcal{C}^*$ the convex sets of primal and dual points respectively satisfying the primal and dual constraints in (845), each assumed nonempty;

$$\begin{aligned}
\mathcal{C} &= \Bigg\{X \in \mathbb{S}^n_+ \;\Bigg|\; \begin{bmatrix}\langle A_1,\ X\rangle\\ \vdots\\ \langle A_m,\ X\rangle\end{bmatrix} = b\Bigg\} = \mathcal{A}\cap\mathbb{S}^n_+\\
\mathcal{C}^* &= \Big\{S \in \mathbb{S}^n_+\,,\ y = [y_i] \in \mathbb{R}^m \;\Big|\; \sum_{i=1}^m y_iA_i + S = C\Big\}
\end{aligned} \qquad (857)$$

These are the primal feasible set and dual feasible set in domain intersection of the respective constraint functions. Geometrically, primal feasible $\mathcal{A}\cap\mathbb{S}^n_+$ represents an intersection of the positive semidefinite cone $\mathbb{S}^n_+$ with an affine subset $\mathcal{A}$ of the subspace of symmetric matrices $\mathbb{S}^n$ in isometrically isomorphic $\mathbb{R}^{n(n+1)/2}$. The affine subset has dimension $n(n+1)/2 - m$ when the $A_i$ are linearly independent. Dual feasible set $\mathcal{C}^*$ is the Cartesian product of the positive semidefinite cone with its inverse image (§2.1.9.0.1) under affine transformation $C - \sum y_iA_i$.$^{6.5}$ Both sets are closed and convex and the objective functions on a Euclidean vector space are linear, hence (845)(P) and (845)(D) are convex optimization problems.

$^{6.5}$ The inequality $C - \sum y_iA_i \succeq 0$ follows directly from (845)(D) (§2.9.0.1.1) and is known as a linear matrix inequality. (§2.13.5.1.1) Because $\sum y_iA_i \preceq C$, matrix $S$ is known as a slack variable (a term borrowed from linear programming [53]) since its inclusion raises this inequality to equality.


6.2.1.1 A ∩ S^n_+ emptiness determination via Farkas' lemma

6.2.1.1.1 Lemma. Semidefinite Farkas' lemma.
Given an arbitrary set {A_i ∈ S^n , i = 1 … m} and a vector b = [b_i] ∈ R^m , define the affine subset

    A = {X ∈ S^n | ⟨A_i , X⟩ = b_i , i = 1 … m}        (848)

Primal feasible set A ∩ S^n_+ is nonempty if and only if yᵀb ≥ 0 holds for each and every vector y = [y_i] ∈ R^m such that Σ_{i=1}^m y_i A_i ≽ 0.

Equivalently, primal feasible set A ∩ S^n_+ is nonempty if and only if yᵀb ≥ 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0.  ⋄

Semidefinite Farkas' lemma follows directly from a membership relation (§2.13.2.0.1) and the closed convex cones from linear matrix inequality example 2.13.5.1.1; given convex cone K and its dual

    K  = {A svec X | X ≽ 0}        (301)
    K* = {y | Σ_{j=1}^m y_j A_j ≽ 0}        (307)

where

    A ≜ [ svec(A_1)ᵀ ; … ; svec(A_m)ᵀ ] ∈ R^{m×n(n+1)/2}        (846)

then we have membership relation

    b ∈ K ⇔ ⟨y , b⟩ ≥ 0 ∀ y ∈ K*        (260)

and equivalents

    b ∈ K ⇔ ∃ X ≽ 0 , A svec X = b ⇔ A ∩ S^n_+ ≠ ∅        (858)


Semidefinite Farkas' lemma provides the conditions required for a set of hyperplanes to have a nonempty intersection A ∩ S^n_+ with the positive semidefinite cone. While the lemma as stated is correct, Ye points out [259, §1.3.8] that a positive definite version of this lemma is required for semidefinite programming because any feasible point in the relative interior A ∩ int S^n_+ is required by Slater's condition^{6.6} to achieve 0 duality gap (primal-dual objective difference, §6.2.3). In our circumstance, assuming a nonempty intersection, a positive definite lemma is required to insure a point of intersection closest to the origin is not at infinity; e.g., Figure 31. Then given A ∈ R^{m×n(n+1)/2} having rank m , we wish to detect existence of a nonempty relative interior of the primal feasible set;^{6.7}

    b ∈ int K ⇔ ⟨y , b⟩ > 0 for all y ∈ K*, y ≠ 0 ⇔ A ∩ int S^n_+ ≠ ∅        (859)

A positive definite Farkas' lemma can easily be constructed from this membership relation (267) and these proper convex cones K (301) and K* (307):

6.2.1.1.2 Lemma. Positive definite Farkas' lemma.
Given a linearly independent set {A_i ∈ S^n , i = 1 … m} and a vector b = [b_i] ∈ R^m , define the affine subset

    A = {X ∈ S^n | ⟨A_i , X⟩ = b_i , i = 1 … m}        (848)

Primal feasible set relative interior A ∩ int S^n_+ is nonempty if and only if yᵀb > 0 holds for each and every vector y = [y_i] ≠ 0 such that Σ_{i=1}^m y_i A_i ≽ 0.

Equivalently, primal feasible set relative interior A ∩ int S^n_+ is nonempty if and only if yᵀb > 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0.  ⋄

^{6.6} Slater's sufficient condition is satisfied whenever any primal strictly feasible point exists; id est, any point feasible with the affine equality (or affine inequality) constraint functions and relatively interior to convex cone K. If cone K is polyhedral, then Slater's condition is satisfied when any feasible point exists relatively interior to K or on its relative boundary. [38, §5.2.3] [259, §1.3.8] [26, p.325]
^{6.7} Detection of A ∩ int S^n_+ by examining the interior of K is a trick that need not be lost.


6.2.1.1.3 Example. "New" Farkas' lemma.
In 1995, Lasserre [141, §III] presented an example originally offered by Ben-Israel in 1969 [23, p.378] as evidence of failure in semidefinite Farkas' Lemma 6.2.1.1.1:

    A ≜ [ svec(A_1)ᵀ ; svec(A_2)ᵀ ] = [ 0 1 0 ; 0 0 1 ] ,   b = [ 1 ; 0 ]        (860)

The intersection A ∩ S^n_+ is practically empty because the solution set

    {X ≽ 0 | A svec X = b} = { [ α 1/√2 ; 1/√2 0 ] ≽ 0 | α ∈ R }        (861)

is positive semidefinite only asymptotically (α → ∞). Yet the dual system Σ_{i=1}^m y_i A_i ≽ 0 ⇒ yᵀb ≥ 0 indicates nonempty intersection; videlicet, for ‖y‖ = 1

    y_1 [ 0 1/√2 ; 1/√2 0 ] + y_2 [ 0 0 ; 0 1 ] ≽ 0 ⇔ y = [ 0 ; 1 ] ⇒ yᵀb = 0        (862)

On the other hand, positive definite Farkas' Lemma 6.2.1.1.2 shows A ∩ int S^n_+ is empty; what we need to know for semidefinite programming.

Based on Ben-Israel's example, Lasserre suggested addition of another condition to semidefinite Farkas' Lemma 6.2.1.1.1 to make a "new" lemma. Ye recommends positive definite Farkas' Lemma 6.2.1.1.2 instead, which is simpler and obviates Lasserre's proposed additional condition.

6.2.1.2 Theorem of the alternative for semidefinite programming

Because these Farkas' lemmas follow from membership relations, we may construct alternative systems from them. From positive definite Farkas' lemma and using the method of §2.13.2.1.1, we get

    A ∩ int S^n_+ ≠ ∅
or in the alternative
    yᵀb ≤ 0 ,  Σ_{i=1}^m y_i A_i ≽ 0 ,  y ≠ 0        (863)


Any single vector y satisfying the alternative certifies A ∩ int S^n_+ is empty. Such a vector can be found as a feasible point of another semidefinite program: for linearly independent set {A_i ∈ S^n , i = 1 … m}

    minimize_y   yᵀb
    subject to   Σ_{i=1}^m y_i A_i ≽ 0        (864)

If y ≠ 0 is a feasible point and yᵀb ≤ 0, then the relative interior of the primal feasible set A ∩ int S^n_+ from (857) is empty.

6.2.1.3 Boundary-membership criterion

From the boundary-membership relation (271) for proper cones and from linear matrix inequality cones K (301) and K* (307)

    b ∈ ∂K ⇔ ∃ y  ⟨y , b⟩ = 0 , y ∈ K* , y ≠ 0 , b ∈ K ⇔ A ∩ S^n_+ ⊂ ∂S^n_+        (865)

Determination of whether b ∈ ∂K is expressible as a semidefinite program: assuming linearly independent set {A_i ∈ S^n , i = 1 … m}^{6.8}

    maximize_{y , X}   1ᵀy
    subject to   yᵀb = 0
                 Σ_{i=1}^m y_i A_i ≽ 0
                 ‖y‖² ≤ 1
                 A svec X = b
                 X ≽ 0        (866)

If such an optimal vector y⋆ ≠ 0 can be found, then affine subset A (848) intersects the positive semidefinite cone but only on its boundary.

^{6.8} From the results of Example 2.13.5.1.1, vector b on the boundary of K cannot be detected simply by looking for 0 eigenvalues in matrix X. Therefore, none of these constraints are redundant. We do not consider the case where matrix A has no nullspace because then the feasible set {X} is at most a single point.
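The Ben-Israel data (860) furnish a concrete instance of this boundary circumstance: there, b belongs to ∂K because intersection (861) is achieved only asymptotically. A short Matlab sketch verifying the assertions of Example 6.2.1.1.3 directly (it is not a solver for (866), only a check of (861) and (862)):

    % Ben-Israel example (860): svec(A1)' = [0 1 0], svec(A2)' = [0 0 1]
    A1 = [0 1/sqrt(2); 1/sqrt(2) 0];  A2 = [0 0; 0 1];  b = [1; 0];
    % solution set (861): X(alpha) is PSD only asymptotically
    for alpha = [1 1e2 1e4 1e8]
       X = [alpha 1/sqrt(2); 1/sqrt(2) 0];
       disp(min(eig(X)))               % negative, approaching 0 as alpha grows
    end
    % dual certificate (862): y = [0;1] gives y1*A1 + y2*A2 ≽ 0 with y'*b = 0
    y = [0; 1];
    disp(min(eig(y(1)*A1 + y(2)*A2)))  % 0  (PSD)
    disp(y'*b)                         % 0  consistent with b on boundary (865)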


6.2.2 Duals

The dual objective function evaluated at any feasible point represents a lower bound on the primal optimal objective value. We can see this by direct substitution: Assume the feasible sets A ∩ S^n_+ and C* are nonempty. Then it is always true:

    ⟨C , X⟩ ≥ ⟨b , y⟩
    ⟨ Σ_i y_i A_i + S , X ⟩ ≥ [ ⟨A_1 , X⟩ ⋯ ⟨A_m , X⟩ ] y
    ⟨S , X⟩ ≥ 0        (867)

The converse also follows because

    X ≽ 0 , S ≽ 0 ⇒ ⟨S , X⟩ ≥ 0        (1082)

The optimal value of the dual objective thus represents the greatest lower bound on the primal. This fact is known as the weak duality theorem for semidefinite programming [259, §1.3.8] and can be used to detect convergence in any primal/dual numerical method of solution.

6.2.3 Optimality conditions

When any primal feasible point exists relatively interior to A ∩ S^n_+ in S^n, or when any dual feasible point exists relatively interior to C* in S^n × R^m, then by Slater's sufficient condition these two problems (845)(P) and (845)(D) become strong duals. In other words, the primal optimal objective value becomes equivalent to the dual optimal objective value: there is no duality gap; id est, if ∃ X ∈ A ∩ int S^n_+ or ∃ S , y ∈ rel int C* then

    ⟨C , X⋆⟩ = ⟨b , y⋆⟩
    ⟨ Σ_i y⋆_i A_i + S⋆ , X⋆ ⟩ = [ ⟨A_1 , X⋆⟩ ⋯ ⟨A_m , X⋆⟩ ] y⋆
    ⟨S⋆ , X⋆⟩ = 0        (868)

where S⋆ , y⋆ denote a dual optimal solution.^{6.9} We summarize this:

^{6.9} Optimality condition ⟨S⋆ , X⋆⟩ = 0 is called a complementary slackness condition, in keeping with the tradition of linear programming [53] that forbids dual inequalities in (845) to simultaneously hold strictly. [193, §4]


6.2.3.0.1 Corollary. Optimality and strong duality. [232, §3.1] [259, §1.3.8]
For semidefinite programs (845)(P) and (845)(D), assume primal and dual feasible sets A ∩ S^n_+ ⊂ S^n and C* ⊂ S^n × R^m (857) are nonempty. Then

    X⋆ is optimal for (P)
    S⋆ , y⋆ are optimal for (D)
    the duality gap ⟨C , X⋆⟩ − ⟨b , y⋆⟩ is 0

if and only if

    i) ∃ X ∈ A ∩ int S^n_+ or ∃ S , y ∈ rel int C*
and
    ii) ⟨S⋆ , X⋆⟩ = 0        ⋄

For symmetric positive semidefinite matrices, requirement ii is equivalent to the complementarity (§A.7.3)

    ⟨S⋆ , X⋆⟩ = 0 ⇔ S⋆X⋆ = X⋆S⋆ = 0        (869)

Commutativity of diagonalizable matrices is a necessary and sufficient condition [125, §1.3.12] for these two optimal symmetric matrices to be simultaneously diagonalizable. Therefore

    rank X⋆ + rank S⋆ ≤ n        (870)

Proof. To see that, the product of symmetric optimal matrices X⋆ , S⋆ ∈ S^n must itself be symmetric because of commutativity. (1076) The symmetric product has diagonalization [9, cor.2.11]

    S⋆X⋆ = X⋆S⋆ = Q Λ_{S⋆} Λ_{X⋆} Qᵀ = 0 ⇔ Λ_{X⋆} Λ_{S⋆} = 0        (871)

where Q is an orthogonal matrix. The product of the nonnegative diagonal Λ matrices can be 0 if their main-diagonal zeros are complementary or coincide. Due only to symmetry, rank X⋆ = rank Λ_{X⋆} and rank S⋆ = rank Λ_{S⋆} for these optimal primal and dual solutions. (1064) So, because of the complementarity, the total number of nonzero diagonal entries from both Λ cannot exceed n.


When equality is attained in (870)

    rank X⋆ + rank S⋆ = n        (872)

there are no coinciding main-diagonal zeros in Λ_{X⋆} Λ_{S⋆}, and so we have what is called strict complementarity.^{6.10} Logically it follows that a necessary and sufficient condition for strict complementarity of an optimal primal and dual solution is

    X⋆ + S⋆ ≻ 0        (873)

The beauty of Corollary 6.2.3.0.1 is its conjugacy; id est, one can solve either the primal or dual problem and then find a solution to the other via the optimality conditions. When a dual optimal solution is known, for example, a primal optimal solution belongs to the hyperplane {X | ⟨S⋆ , X⟩ = 0}.

6.2.3.0.2 Example. Minimum cardinality Boolean. [52] [24, §4.3.4] [223]
Consider finding a minimum cardinality Boolean solution x to the classic linear algebra problem Ax = b given noiseless data A ∈ R^{m×n} and b ∈ R^m ;

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1 … n        (874)

where ‖x‖_0 denotes cardinality of vector x (a.k.a. 0-norm; not a convex function).

A minimum cardinality solution answers the question: "Which fewest linear combination of columns in A constructs vector b?" Cardinality problems have extraordinarily wide appeal, arising in many fields of science and across many disciplines. [203] [132] [98] [99] Yet designing an efficient algorithm to optimize cardinality has proved difficult. In this example, we also constrain the variable to be Boolean. The Boolean constraint forces an identical solution were the norm in problem (874) instead the 1-norm or 2-norm; id est, the two problems

^{6.10} distinct from maximal complementarity (§6.1.1).


    minimize_x   ‖x‖_0                minimize_x   ‖x‖_1
    subject to   Ax = b          =    subject to   Ax = b
                 x_i ∈ {0,1} ,                     x_i ∈ {0,1} ,
                 i = 1 … n  (874)                  i = 1 … n        (875)

are the same. The Boolean constraint makes the 1-norm problem nonconvex. Given data^{6.11}

    A = [ −1 1 8  1   1     0
          −3 2 8 1/2 1/3 1/2−1/3
          −9 4 8 1/4 1/9 1/4−1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]        (876)

the obvious and desired solution to the problem posed,

    x⋆ = e_4 ∈ R^6        (877)

has norm ‖x⋆‖_2 = 1 and minimum cardinality; the minimum number of nonzero entries in vector x. Though solution (877) is obvious, the simplest numerical method of solution is, more generally, combinatorial. The Matlab backslash command x=A\b , for example, finds

    x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ; 0 ]        (878)

having norm ‖x_M‖_2 = 0.7044. Coincidentally, x_M is a 1-norm solution; id est, an optimal solution to

    minimize_x   ‖x‖_1
    subject to   Ax = b        (879)

^{6.11} This particular matrix A is full-rank having three-dimensional nullspace (but the columns are not conically independent).
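Both traditional solutions mentioned here are reproducible in a few lines of Matlab; this sketch simply evaluates (878) and the pseudoinverse solution (880) for the data (876):

    % data (876)
    A = [-1 1 8  1   1     0      ;
         -3 2 8 1/2 1/3 1/2-1/3   ;
         -9 4 8 1/4 1/9 1/4-1/9  ];
    b = [1; 1/2; 1/4];
    xM = A\b;          % basic solution (878): support {1,3,5}, norm 0.7044
    xP = pinv(A)*b;    % minimum 2-norm solution (880): norm 0.5165
    disp([norm(xM) norm(xP)])
    % neither recovers the minimum cardinality Boolean solution e4 (877)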


The pseudoinverse solution (rounded)

    x_P = A†b = [ −0.0456 ; −0.1881 ; 0.0623 ; 0.2668 ; 0.3770 ; −0.1102 ]        (880)

has minimum norm ‖x_P‖_2 = 0.5165 ; id est, the optimal solution to

    minimize_x   ‖x‖_2
    subject to   Ax = b        (881)

Certainly, none of the traditional methods provide x⋆ = e_4 (877).

We can reformulate this minimum cardinality Boolean problem (874) as a semidefinite program: First transform the variable

    x ≜ (x̂ + 1)/2        (882)

so x̂_i ∈ {−1, 1} ; equivalently,

    minimize_x̂   ‖(x̂ + 1)/2‖_0
    subject to   A(x̂ + 1)/2 = b
                 δ(x̂)² = I        (883)

where δ is the main-diagonal linear operator (§A.1). By defining a matrix

    X ≜ [ x̂ ; 1 ][ x̂ᵀ 1 ] = [ x̂x̂ᵀ x̂ ; x̂ᵀ 1 ] ∈ S^{n+1}        (884)


problem (883) becomes equivalent to:

    minimize_{X∈S^{n+1}}   ⟨ X , [ 0 1 ; 1ᵀ 0 ] ⟩
    subject to   [ A 0 ; 0ᵀ 1 ] X [ Aᵀ 0 ; 0ᵀ 1 ] = [ (2b−A1)(2b−A1)ᵀ  2b−A1 ; (2b−A1)ᵀ  1 ]
                 δ(X) = 1
                 X ≽ 0
                 rank X = 1        (885)

where solution is confined to vertices of the elliptope in S^{n+1} (§4.9.1.0.1) by the rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The rank constraint makes this problem nonconvex. By removing it^{6.12} we get the semidefinite program

    minimize_{X∈S^{n+1}}   tr( X [ 0 1 ; 1ᵀ 0 ] )
    subject to   [ A 0 ; 0ᵀ 1 ] X [ Aᵀ 0 ; 0ᵀ 1 ] = [ (2b−A1)(2b−A1)ᵀ  2b−A1 ; (2b−A1)ᵀ  1 ]
                 δ(X) = 1
                 X ≽ 0        (886)

whose optimal solution X⋆ is identical to that of minimum cardinality Boolean problem (874) if and only if rank X⋆ = 1. The hope of acquiring a rank-1 solution is not ill-founded because all elliptope vertices have rank 1, and we are minimizing an affine function on a subset of the elliptope (Figure 67) containing vertices; id est, by assumption that the feasible set of the minimum cardinality Boolean problem (874) is nonempty, a desired solution resides on the elliptope relative boundary at a vertex. Confinement to the elliptope can be considered a kind of normalization akin to column normalization of A as suggested in [65] and explored in Example 6.2.3.0.3.

^{6.12} Relaxed problem (886) can also be derived via Lagrange duality; [191] [38, §5] it is a dual of a dual program [sic] to (885). The relaxed problem must therefore be convex.


For the data given in (876), our semidefinite program solver (accurate to approximately 1E-8)^{6.13} finds optimal solution to (886)

    round(X⋆) = [  1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                  −1 −1 −1  1 −1 −1  1
                   1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                  −1 −1 −1  1 −1 −1  1 ]        (887)

near a vertex of the elliptope in S^{n+1} ; its sorted eigenvalues,

    λ(X⋆) = [ 6.99999977799099 ; 0.00000022687241 ; 0.00000002250296 ; 0.00000000262974 ;
              −0.00000000999738 ; −0.00000000999875 ; −0.00000001000000 ]        (888)

The negative eigenvalues are undoubtedly finite-precision effects. Because the largest eigenvalue predominates by many orders of magnitude, we can expect to find a good approximation to a minimum cardinality Boolean solution by truncating all smaller eigenvalues. By so doing we find, indeed,

    x⋆ = round( [ 0.00000000127947 ; 0.00000000527369 ; 0.00000000181001 ;
                  0.99999997469044 ; 0.00000001408950 ; 0.00000000482903 ] ) = e_4        (889)

the desired result (877).

^{6.13} A typically ignored limitation of interior-point methods of solution is their relative accuracy of only about 1E-8 on a machine using 64-bit (double precision) arithmetic; id est, optimal solution cannot be more accurate than square root of machine epsilon (2.2204E-16).
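The truncation step leading from (887) to (889) is mechanical. A minimal Matlab sketch, assuming Xstar ∈ S^{n+1} has already been returned by some SDP solver; the sign of the rank-1 factor is fixed by its last entry, which plays the role of the appended 1 in (884):

    % recover Boolean x from a numerically rank-1 optimal X* of (886)
    [Q,L]    = eig(Xstar);
    [lmax,k] = max(diag(L));
    v = sqrt(lmax)*Q(:,k);         % rank-1 factor: Xstar ≈ v*v'
    v = v*sign(v(end));            % last entry corresponds to the 1 in (884)
    xhat = v(1:end-1);             % estimate of x̂ in {-1,1}^n
    x = round((xhat + 1)/2)        % transform (882) back; yields e4 here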


6.2.3.0.3 Example. Optimization on elliptope versus 1-norm polyhedron.
A minimum cardinality problem is typically formulated via, what is by now, the standard practice [65] of column normalization applied to a 1-norm problem surrogate like (879). Suppose we define a diagonal matrix

    Λ ≜ δ( [ ‖A(:,1)‖_2 ; ‖A(:,2)‖_2 ; … ; ‖A(:,6)‖_2 ] ) ∈ S^6        (890)

used to normalize the columns (assumed nonzero) of given noiseless data matrix A. Then approximate the minimum cardinality Boolean problem

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1 … n        (874)

as

    minimize_ỹ   ‖ỹ‖_1
    subject to   AΛ⁻¹ỹ = b
                 1 ≽ Λ⁻¹ỹ ≽ 0        (891)

where optimal solution

    y⋆ = round(Λ⁻¹ỹ⋆)        (892)

The inequality in (891) relaxes the Boolean constraint y_i ∈ {0, 1} from (874), serving to bound any solution y⋆ to a unit cube whose vertices are binary numbers. Convex problem (891) is one of many conceivable approximations, but Donoho concurs with this particular formulation equivalently expressible as a linear program via (1269).

Problem (891) is therefore equivalent to minimization of an affine function on a bounded polyhedron, whereas semidefinite program

    minimize_{X∈S^{n+1}}   tr( X [ 0 1 ; 1ᵀ 0 ] )
    subject to   [ A 0 ; 0ᵀ 1 ] X [ Aᵀ 0 ; 0ᵀ 1 ] = [ (2b−A1)(2b−A1)ᵀ  2b−A1 ; (2b−A1)ᵀ  1 ]
                 δ(X) = 1
                 X ≽ 0        (886)


minimizes an affine function on the elliptope intersected by hyperplanes. Although the same Boolean solution is obtained from approximation (891) as from semidefinite program (886) when given that particular data from Example 6.2.3.0.2, Singer confides a counter-example: Instead, given data

    A = [ 1 0 1/√2 ; 0 1 1/√2 ] ,   b = [ 1 ; 1 ]        (893)

then solving (891) yields

    y⋆ = round( [ 1 − 1/√2 ; 1 − 1/√2 ; 1 ] ) = [ 0 ; 0 ; 1 ]        (894)

(infeasible, with or without rounding, with respect to original problem (874)) whereas solving semidefinite program (886) produces

    round(X⋆) = [  1  1 −1  1
                   1  1 −1  1
                  −1 −1  1 −1
                   1  1 −1  1 ]        (895)

with sorted eigenvalues

    λ(X⋆) = [ 3.99999965057264 ; 0.00000035942736 ; −0.00000000000000 ; −0.00000001000000 ]        (896)

Truncating all but the largest eigenvalue, in place of y⋆ we obtain

    x⋆ = round( [ 0.99999999625299 ; 0.99999999625299 ; 0.00000001434518 ] ) = [ 1 ; 1 ; 0 ]        (897)

the desired minimum cardinality Boolean result.

We leave assessment of general performance of (891) as compared with semidefinite program (886) an open question.


6.3 Rank reduction

    …it is not clear generally how to predict rank X⋆ or rank S⋆ before solving the SDP problem.
        −Farid Alizadeh (1995) [9, p.22]

The premise of rank reduction in semidefinite programming is: an optimal solution found does not satisfy Barvinok's upper bound (220) on rank. The particular numerical algorithm solving a semidefinite program may have instead returned a high-rank optimal solution (§6.1.1; e.g., (856)) when a lower-rank optimal solution was expected.

6.3.1 Posit a perturbation of X⋆

Recall from §6.1.1.1 that there is an extreme point of A ∩ S^n_+ (848) satisfying upper bound (220) on rank. [18, §2.2] It is therefore sufficient to locate an extreme point of the intersection whose primal objective value (845)(P) is optimal:^{6.14} [61, §31.5.3] [148, §2.4] [5, §3] [182]

Consider again the affine subset

    A = {X ∈ S^n | A svec X = b}        (848)

where for A_i ∈ S^n

    A ≜ [ svec(A_1)ᵀ ; … ; svec(A_m)ᵀ ] ∈ R^{m×n(n+1)/2}        (846)

Given any optimal solution X⋆ to

    minimize_{X∈S^n}   ⟨C , X⟩
    subject to   X ∈ A ∩ S^n_+        (845)(P)

^{6.14} There is no known construction for Barvinok's tighter result (225).  −Monique Laurent


whose rank does not satisfy upper bound (220), we posit existence of a set of perturbations

    {t_j B_j | t_j ∈ R , B_j ∈ S^n , j = 1 … n}        (898)

such that, for some 0 ≤ i ≤ n and scalars {t_j , j = 1 … i},

    X⋆ + Σ_{j=1}^i t_j B_j        (899)

becomes an extreme point of A ∩ S^n_+ and remains an optimal solution of (845)(P). Membership of (899) to affine subset A is secured for the iᵗʰ perturbation by demanding

    ⟨B_i , A_j⟩ = 0 ,  j = 1 … m        (900)

while membership to the positive semidefinite cone S^n_+ is insured by small perturbation (909). In this manner feasibility is insured. Optimality is proved in §6.3.3.

The following simple algorithm has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

6.3.1.0.1 Procedure. Rank reduction.

    initialize: B_i = 0 ∀ i
    for iteration i = 1 … n
    {
        1. compute a nonzero perturbation matrix B_i of X⋆ + Σ_{j=1}^{i−1} t_j B_j
        2. maximize t_i
           subject to X⋆ + Σ_{j=1}^i t_j B_j ∈ S^n_+
    }


A rank-reduced optimal solution is then

    X⋆ ← X⋆ + Σ_{j=1}^i t_j B_j        (901)

6.3.2 Perturbation form

The perturbations are independent of constants C ∈ S^n and b ∈ R^m in primal and dual programs (845). Numerical accuracy of any rank-reduced result, found by perturbation of an initial optimal solution X⋆, is therefore quite dependent upon initial accuracy of X⋆.

6.3.2.0.1 Definition. Matrix step function. (confer §A.6.4.0.1)
Define the signum-like real function ψ : S^n → R

    ψ(Z) ≜ {  1 ,  Z ≽ 0
             −1 ,  otherwise        (902)

The value −1 is taken for indefinite or nonzero negative semidefinite argument.  △

Deza & Laurent [61, §31.5.3] prove: every perturbation matrix B_i , i = 1 … n , is of the form

    B_i = −ψ(Z_i) R_i Z_i R_iᵀ ∈ S^n        (903)

where

    X⋆ ≜ R_1 R_1ᵀ ,   X⋆ + Σ_{j=1}^{i−1} t_j B_j ≜ R_i R_iᵀ ∈ S^n        (904)

where the t_j are scalars and R_i ∈ R^{n×ρ} is full-rank and skinny where

    ρ ≜ rank( X⋆ + Σ_{j=1}^{i−1} t_j B_j )        (905)

and where matrix Z_i ∈ S^ρ is found at each iteration i by solving a very


simple feasibility problem:^{6.15}

    find Z_i ∈ S^ρ
    subject to ⟨Z_i , R_iᵀ A_j R_i⟩ = 0 ,  j = 1 … m        (906)

Were there a sparsity pattern common to each member of the set {R_iᵀ A_j R_i ∈ S^ρ , j = 1 … m}, then a good choice for Z_i has 1 in each entry corresponding to a 0 in the pattern; id est, a sparsity pattern complement. At iteration i

    X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i = R_i( I − t_i ψ(Z_i) Z_i )R_iᵀ        (907)

By fact (1057), therefore

    X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i ≽ 0 ⇔ 1 − t_i ψ(Z_i) λ(Z_i) ≽ 0        (908)

where λ(Z_i) ∈ R^ρ denotes the eigenvalues of Z_i.

Maximization of each t_i in step 2 of the Procedure reduces rank of (907) and locates a new point on the boundary ∂(A ∩ S^n_+).^{6.16} Maximization of t_i thereby has closed form;

^{6.15} A simple method of solution is closed-form projection of a random nonzero point on that proper subspace of isometrically isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5.0.0.6) Such a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m. Indeed, this geometric intuition about forming the perturbation is what bounds any solution's rank from below; m is fixed by the number of equality constraints in (845)(P) while rank ρ decreases with each iteration i. Otherwise, we might iterate indefinitely.
^{6.16} This holds because rank of a positive semidefinite matrix in S^n is diminished below n by the number of its 0 eigenvalues (1064), and because a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (156). Necessity and sufficiency are due to the facts: R_i can be completed to a nonsingular matrix (§A.3.1.0.5), and I − t_i ψ(Z_i) Z_i can be padded with zeros while maintaining equivalence in (907).


    (t⋆_i)⁻¹ = max{ ψ(Z_i) λ(Z_i)_j , j = 1 … ρ }        (909)

When Z_i is indefinite, the direction of perturbation (determined by ψ(Z_i)) is arbitrary. We may take an early exit from the Procedure were Z_i to become 0 or were

    rank[ svec(R_iᵀA_1R_i)  svec(R_iᵀA_2R_i)  ⋯  svec(R_iᵀA_mR_i) ] = ρ(ρ+1)/2        (910)

which characterizes the rank ρ of any [sic] extreme point in A ∩ S^n_+. [148, §2.4]

Proof. Assuming the form of every perturbation matrix is indeed (903), then by (906)

    svec Z_i ⊥ [ svec(R_iᵀA_1R_i)  svec(R_iᵀA_2R_i)  ⋯  svec(R_iᵀA_mR_i) ]        (911)

By orthogonal complement we have

    rank[ svec(R_iᵀA_1R_i) ⋯ svec(R_iᵀA_mR_i) ]^⊥ + rank[ svec(R_iᵀA_1R_i) ⋯ svec(R_iᵀA_mR_i) ] = ρ(ρ+1)/2        (912)

When Z_i can only be 0, then the perturbation is null because an extreme point has been found; thus

    [ svec(R_iᵀA_1R_i) ⋯ svec(R_iᵀA_mR_i) ]^⊥ = 0        (913)

from which the stated result (910) directly follows.
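One pass of Procedure 6.3.1.0.1 is short enough to sketch in Matlab. Feasibility problem (906) is solved here, per footnote 6.15, by projecting a random symmetric matrix on the nullspace of the constraints; svec below is an isometric vectorization (any fixed ordering works for expressing orthogonality). This is only a sketch: X is an optimal solution of (845)(P) and Ai is assumed to be a cell array holding the constraint matrices A_j:

    function X = rankreduce(X, Ai)    % sketch of Procedure 6.3.1.0.1
    m = numel(Ai);  n = size(X,1);  tol = 1e-9;
    for i = 1:n
       [Q,L] = eig(X);  lam = diag(L);
       keep = lam > tol;  rho = sum(keep);
       R = Q(:,keep)*diag(sqrt(lam(keep)));     % X = R*R', R skinny (904)
       G = zeros(m, rho*(rho+1)/2);             % constraints (906)
       for j = 1:m, G(j,:) = svec(R'*Ai{j}*R)'; end
       z = randn(rho*(rho+1)/2, 1);             % random point, footnote 6.15
       z = z - pinv(G)*(G*z);                   % project on nullspace of G
       if norm(z) < tol, break, end             % extreme point found (910)
       Z = smat(z, rho);
       psi = 1; if min(eig(Z)) < -tol, psi = -1; end  % step function (902)
       t = 1/max(psi*eig(Z));                   % closed-form maximum (909)
       X = R*(eye(rho) - t*psi*Z)*R';           % rank-reducing update (907)
    end
    end
    function z = svec(Z)             % isometric vectorization of symmetric Z
    k = triu(true(size(Z)),1);
    z = [diag(Z); sqrt(2)*Z(k)];
    end
    function Z = smat(z,n)           % inverse of svec above
    Z = diag(z(1:n));  k = triu(true(n),1);
    Z(k) = z(n+1:end)/sqrt(2);  Z = Z + triu(Z,1)';
    end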


6.3.3 Optimality of perturbed X⋆

We show that the optimal objective value is unaltered by perturbation (903); id est,

    ⟨C , X⋆ + Σ_{j=1}^i t_j B_j⟩ = ⟨C , X⋆⟩        (914)

Proof. From Corollary 6.2.3.0.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under the assumption of existence of a relatively interior feasible point:

    S⋆X⋆ = S⋆R_1R_1ᵀ = X⋆S⋆ = R_1R_1ᵀS⋆ = 0        (915)

This means R(R_1) ⊆ N(S⋆) and R(S⋆) ⊆ N(R_1ᵀ). From (904) and (907) we get the sequence:

    X⋆ = R_1R_1ᵀ
    X⋆ + t_1B_1 = R_2R_2ᵀ = R_1(I − t_1ψ(Z_1)Z_1)R_1ᵀ
    X⋆ + t_1B_1 + t_2B_2 = R_3R_3ᵀ = R_2(I − t_2ψ(Z_2)Z_2)R_2ᵀ = R_1(I − t_1ψ(Z_1)Z_1)(I − t_2ψ(Z_2)Z_2)R_1ᵀ
        ⋮
    X⋆ + Σ_{j=1}^i t_j B_j = R_1( Π_{j=1}^i (I − t_jψ(Z_j)Z_j) )R_1ᵀ        (916)

Substituting C = svec⁻¹(Aᵀy⋆) + S⋆ from (845),

    ⟨C , X⋆ + Σ_{j=1}^i t_j B_j⟩ = ⟨ svec⁻¹(Aᵀy⋆) + S⋆ , R_1( Π_{j=1}^i (I − t_jψ(Z_j)Z_j) )R_1ᵀ ⟩
        = ⟨ Σ_{k=1}^m y⋆_k A_k , X⋆ + Σ_{j=1}^i t_j B_j ⟩
        = ⟨ Σ_{k=1}^m y⋆_k A_k + S⋆ , X⋆ ⟩ = ⟨C , X⋆⟩        (917)

because ⟨B_i , A_j⟩ = 0 ∀ i, j by design (900).


6.3.3.0.1 Example. Aδ(X) = b.
This academic example demonstrates that a solution found by rank reduction can certainly have rank less than Barvinok's upper bound (220): Assume a given vector b ∈ R^m belongs to the conic hull of the columns of a given matrix A ∈ R^{m×n} ;

    A = [ −1 1 8  1   1
          −3 2 8 1/2 1/3
          −9 4 8 1/4 1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]        (918)

Consider the convex optimization problem

    minimize_{X∈S^5}   tr X
    subject to   X ≽ 0
                 Aδ(X) = b        (919)

that minimizes the 1-norm of the main diagonal; id est, problem (919) is the same as

    minimize_{X∈S^5}   ‖δ(X)‖_1
    subject to   X ≽ 0
                 Aδ(X) = b        (920)

that finds a solution to Aδ(X) = b. Rank-3 solution X⋆ = δ(x_M) is optimal, where

    x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ]        (921)

Yet upper bound (220) predicts existence of at most a

    rank-⌊ (√(8m+1) − 1)/2 ⌋ = 2        (922)

feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution to (919) (barring combinatorics), we invoke Procedure 6.3.1.0.1:


Initialize: C = I , ρ = 3 , A_j ≜ δ(A(j,:)) , j = 1, 2, 3 , X⋆ = δ(x_M) , m = 3 , n = 5.
{
Iteration i = 1:

Step 1:

    R_1 = [ √(2/128)    0        0
               0        0        0
               0     √(5/128)    0
               0        0        0
               0        0     √(90/128) ]

    find Z_1 ∈ S^3
    subject to ⟨Z_1 , R_1ᵀA_jR_1⟩ = 0 ,  j = 1, 2, 3        (923)

A nonzero randomly selected matrix Z_1 having 0 main diagonal is feasible and yields a nonzero perturbation matrix. Choose, arbitrarily,

    Z_1 = 11ᵀ − I ∈ S^3        (924)

then (rounding)

    B_1 = [ 0      0 0.0247 0 0.1048
            0      0 0      0 0
            0.0247 0 0      0 0.1657
            0      0 0      0 0
            0.1048 0 0.1657 0 0 ]        (925)

Step 2: t⋆_1 = 1 because λ(Z_1) = [−1 −1 2]ᵀ. So

    X⋆ ← δ(x_M) + B_1 = [ 2/128  0 0.0247 0 0.1048
                          0      0 0      0 0
                          0.0247 0 5/128  0 0.1657
                          0      0 0      0 0
                          0.1048 0 0.1657 0 90/128 ]        (926)

has rank ρ ← 1 and produces the same optimal objective value.
}
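This iteration is easily reproduced numerically. A Matlab sketch checking the stated claims: the perturbation B_1 from (903), continued feasibility Aδ(X) = b, the unchanged objective (914), and the final rank:

    % data (918) and initialization for Example 6.3.3.0.1
    A  = [-1 1 8 1 1; -3 2 8 1/2 1/3; -9 4 8 1/4 1/9];
    b  = [1; 1/2; 1/4];
    xM = [2; 0; 5; 0; 90]/128;
    X  = diag(xM);                  % rank-3 optimal solution of (919)
    R1 = [sqrt(2/128) 0 0; 0 0 0; 0 sqrt(5/128) 0; 0 0 0; 0 0 sqrt(90/128)];
    Z1 = ones(3) - eye(3);          % arbitrary feasible choice (924)
    B1 = R1*Z1*R1';                 % (903) with psi(Z1) = -1; matches (925)
    Xnew = X + B1;                  % t1 = 1 by (909)
    disp(rank(Xnew))                % 1
    disp(norm(A*diag(Xnew) - b))    % 0   (still feasible; B1 is hollow)
    disp([trace(X) trace(Xnew)])    % equal optimal objective values (914)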


Another demonstration of rank reduction Procedure 6.3.1.0.1 would apply to the maximal complementarity example (§6.1.1.3.1); a rank-1 solution can certainly be found (by Barvinok's Proposition 2.9.3.0.1) because there is only one equality constraint.

6.3.4 Final thoughts regarding rank reduction

Because the rank reduction procedure is guaranteed only to produce another optimal solution conforming to Barvinok's upper bound (220), the Procedure will not necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure can. Arbitrariness of search direction when matrix Z_i becomes indefinite, mentioned after (909), and the enormity of choices for Z_i (906) are liabilities for this algorithm.

6.3.4.1 Inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical form)^{6.17} has inequality constraints of the form

    α_iᵀ svec X ≤ β_i ,  i = 1 … k        (927)

where the β_i are scalars. One expedient way to handle this circumstance is to convert the inequality constraints to equality constraints by introducing a slack variable γ ; id est,

    α_iᵀ svec X + γ_i = β_i ,  i = 1 … k ,  γ ≽ 0        (928)

thereby converting the problem to prototypical form.

^{6.17} Contemporary numerical packages for solving semidefinite programs can solve a wider range of problems than our conic prototype (845). Generally, they do so by transforming a given problem into some prototypical form by introducing new constraints and variables. [9] [254] We are momentarily considering a departure from the primal prototype that augments the constraint set with affine inequalities.


Alternatively, we say an inequality constraint is active when it is met with equality; id est, when for particular i in (927), α_iᵀ svec X⋆ = β_i. An optimal high-rank solution X⋆ is, of course, feasible, satisfying all the constraints. But for the purpose of rank reduction, the inactive inequality constraints are ignored while the active inequality constraints are interpreted as equality constraints. In other words, we take the union of the active inequality constraints (as equalities) with the equality constraints A svec X = b to form a composite affine subset Â for the primal problem (845)(P). Then we proceed with rank reduction of X⋆ as though the semidefinite program were in prototypical form (845)(P).




Chapter 7

EDM proximity

    In summary, we find that the solution to problem [(936.3) p.393] is difficult and depends on the dimension of the space as the geometry of the cone of EDMs becomes more complex.
        −Hayden, Wells, Liu, & Tarazaga (1991) [110, §3]

A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDM^N closest in some sense to a given complete matrix of measurements H under a constraint on affine dimension 0 ≤ r ≤ N−1 (§2.3.1, §4.7.1.1); rather, r is bounded above by desired affine dimension ρ.


7.0.1 Measurement matrix H

Ideally, we want a given matrix of measurements H ∈ R^{N×N} to conform with the first three Euclidean metric properties (§4.2); to belong to the intersection of the orthant of nonnegative matrices R^{N×N}_+ with the symmetric hollow subspace S^N_h (§2.2.3.0.1). Geometrically, we want H to belong to the polyhedral cone (§2.12.1.0.1)

    K ≜ S^N_h ∩ R^{N×N}_+        (929)

Yet in practice, H can possess significant measurement uncertainty (noise).

Sometimes realization of an optimization problem demands that its input, the given matrix H, possess some particular characteristics; perhaps symmetry and hollowness or nonnegativity. When the given H does not possess the desired properties, we must impose them upon H prior to optimization:

— When measurement matrix H is not symmetric or hollow, taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace S^N_h.

— When measurements of distance in H are negative, zeroing negative entries effects unique minimum-distance projection on the orthant of nonnegative matrices R^{N×N}_+ in isomorphic R^{N²} (§E.9.2.2.3).

7.0.1.1 Order of imposition

Since convex cone K (929) is the intersection of an orthant with a subspace, we want to project on that subset of the orthant belonging to the subspace; on the nonnegative orthant in the symmetric hollow subspace that is, in fact, the intersection. For that reason alone, unique minimum-distance projection of H on K (that member of K closest to H in isomorphic R^{N²} in the Euclidean sense) can be attained by first taking its symmetric hollow part, and only then clipping negative entries of the result to 0; id est, there is only one correct order of projection, in general, on an orthant intersecting a subspace: project on the subspace, then project the result on the orthant in that subspace. (confer §E.9.5) In contrast, order of projection on an intersection of subspaces is arbitrary. A short sketch of this two-step projection follows.
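In Matlab, the unique minimum-distance projection of H on K (929), in the correct order just described, takes only a few lines (a sketch, assuming H square):

    % unique minimum-distance projection of H on K = S^N_h ∩ R^{NxN}_+ (929)
    function PH = projK(H)
    N  = size(H,1);
    PH = (H + H')/2;          % project on symmetric matrices
    PH(1:N+1:end) = 0;        % then on hollow subspace S^N_h (zero the diagonal)
    PH = max(PH, 0);          % only now clip negative entries (orthant)
    end

Reversing the last two steps would generally yield a different, incorrect result; the orthant projection must come after the subspace projection.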


That order-of-projection rule applies more generally, of course, to the intersection of any convex set C with any subspace. Consider the proximity problem^{7.1} over convex feasible set S^N_h ∩ C given nonsymmetric nonhollow H ∈ R^{N×N} :

    minimize_{B∈S^N_h}   ‖B − H‖²_F
    subject to   B ∈ C        (930)

a convex optimization problem. Because the symmetric hollow subspace is orthogonal to the antisymmetric antihollow subspace (§2.2.3), then for B ∈ S^N_h

    tr( Bᵀ( (H − Hᵀ)/2 + δ²(H) ) ) = 0        (931)

so the objective function is equivalent to

    ‖B − H‖²_F ≡ ‖ B − ( (H + Hᵀ)/2 − δ²(H) ) ‖²_F + ‖ (H − Hᵀ)/2 + δ²(H) ‖²_F        (932)

This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under the Frobenius norm; id est, minimization proceeds as though given the symmetric hollow part of H.

This action of the Frobenius norm (932) is effectively a Euclidean projection (minimum-distance projection) of H on the symmetric hollow subspace S^N_h prior to minimization. Thus minimization proceeds inherently following the correct order for projection on S^N_h ∩ C. Therefore we may either assume H ∈ S^N_h, or take its symmetric hollow part prior to optimization.

7.0.1.2 Egregious input error under nonnegativity demand

More pertinent to the optimization problems presented herein where

    C ≜ EDM^N ⊆ K = S^N_h ∩ R^{N×N}_+        (933)

^{7.1} There are two equivalent interpretations of projection (§E.9): one finds a set normal, the other, minimum distance between a point and a set. Here we realize the latter view.


[Figure 89: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant; EDM^N ⊆ K (459). EDM^N cannot exist outside S^N_h, but R^{N×N}_+ does.]

then should some particular realization of a proximity problem demand input H be nonnegative, and were we only to zero negative entries of a nonsymmetric nonhollow input H prior to optimization, then the ensuing projection on EDM^N would be guaranteed incorrect (out of order).

Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization, the ensuing projection on EDM^N will be incorrect whenever input H has negative entries and some proximity problem demands nonnegative input H.

This is best understood referring to Figure 89: Suppose nonnegative input H is demanded, and then the problem realization correctly projects its input first on S^N_h and then directly on C = EDM^N. That demand for nonnegativity effectively requires imposition of K on input H prior to optimization so as to obtain correct order of projection (on S^N_h first). Yet such an imposition prior to projection on EDM^N generally introduces an elbow into the path of projection (illustrated in Figure 90) caused by the technique itself; that being, a particular proximity problem realization requiring nonnegative input.


[Figure 90: Pseudo-Venn diagram from Figure 89 showing elbow placed in path of projection of H on EDM^N ⊂ S^N_h by an optimization problem demanding nonnegative input matrix H. The first two line segments leading away from H result from correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative, then its projection on S^N_h would instead belong to K; making the elbow disappear. (confer Figure 100)]


Any procedure for imposition of nonnegativity on input H can only be incorrect in this circumstance. There is no resolution unless input H is guaranteed nonnegative with no tinkering. Otherwise, we have no choice but to employ a different problem realization; one not demanding nonnegative input.

7.0.2 Lower bound

Most of the problems we encounter in this chapter have the general form:

    minimize_B   ‖B − A‖_F
    subject to   B ∈ C        (934)

where A ∈ R^{m×n} is given data. This particular objective denotes Euclidean projection (§E) of vectorized matrix A on the set C, which may or may not be convex. When C is convex, the projection is unique minimum-distance because the Frobenius norm when squared is a strictly convex function of variable B and because the optimal solution is the same regardless of the square (1268). When C is a subspace, the direction of projection is orthogonal to C.

Denoting by A = U_A Σ_A Q_Aᵀ and B = U_B Σ_B Q_Bᵀ their full singular value decompositions (whose singular values are always nonincreasingly ordered, §A.6), there exists a tight lower bound on the objective over the manifold of orthogonal matrices;

    ‖Σ_B − Σ_A‖_F ≤ inf_{U_A , U_B , Q_A , Q_B} ‖B − A‖_F        (935)

This least lower bound holds more generally for any orthogonally invariant norm on R^{m×n} (§2.2.1) including the Frobenius and spectral norm [209, §II.3]. [125, §7.4.51]

7.0.3 Three prevalent proximity problems

There are three statements of the closest-EDM problem prevalent in the literature, the multiplicity due primarily to choice of projection on the EDM versus positive semidefinite (PSD) cone and vacillation between the distance-square variable d_ij versus absolute distance √d_ij. In their most fundamental form, the three prevalent proximity problems are (936.1), (936.2), and (936.3): [221]


    (1)  minimize_D   ‖−V(D − H)V‖²_F          (2)  minimize_D   ‖D − H‖²_F
         subject to   rank V D V ≤ ρ                subject to   rank V D V ≤ ρ
                      D ∈ EDM^N                                  D ∈ EDM^N

    (3)  minimize_{◦√D}   ‖◦√D − H‖²_F         (4)  minimize_{◦√D}   ‖−V(◦√D − H)V‖²_F
         subject to   rank V D V ≤ ρ                subject to   rank V D V ≤ ρ
                      ◦√D ∈ √EDM^N                               ◦√D ∈ √EDM^N        (936)

where we have made explicit an imposed upper bound ρ on affine dimension

    r = rank V_Nᵀ D V_N = rank V D V        (591)

that is benign when ρ = N−1, and where D ≜ [d_ij] and ◦√D ≜ [√d_ij].

Problems (936.2) and (936.3) are Euclidean projections of a vectorized matrix H on an EDM cone (§5.3), whereas problems (936.1) and (936.4) are Euclidean projections of a vectorized matrix −VHV on a PSD cone. Problem (936.4) is not posed in the literature because it has limited theoretical foundation.^{7.2}

Analytical solution to (936.1) is known in closed form for any bound ρ although, as the problem is stated, it is a convex optimization only in the case ρ = N−1. We show in §7.1.4 how (936.1) becomes a convex optimization problem for any ρ when transformed to the spectral domain. When expressed as a function of point list in a matrix X, problem (936.2) is a variant of what is known in statistics literature as the stress problem. [34, p.34] [56] [230] Problems (936.2) and (936.3) are convex optimization problems in D for the case ρ = N−1. Even with the rank constraint removed from (936.2), we will see that the remaining convex problem inherently minimizes affine dimension.

Generally speaking, each problem in (936) produces a different result because there is no isometry relating them. Of the various auxiliary V-matrices (§B.4), the geometric centering matrix V (483) appears in the literature most often although V_N (466) is the auxiliary matrix naturally consequent to Schoenberg's seminal exposition [199]. Substitution of any auxiliary matrix or its pseudoinverse into these problems produces another valid problem.

^{7.2} D ∈ EDM^N ⇒ ◦√D ∈ EDM^N , −V ◦√D V ∈ S^N_+ (§4.10)
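Returning briefly to lower bound (935) from §7.0.2, it is easily corroborated numerically. A Matlab sketch with random data (svd returns singular values nonincreasingly ordered, so no explicit sort is needed):

    % numerical corroboration of lower bound (935)
    m = 5; n = 4;
    A = randn(m,n);  B = randn(m,n);
    lhs = norm(svd(B) - svd(A));    % ‖Σ_B − Σ_A‖_F
    rhs = norm(B - A, 'fro');       % ‖B − A‖_F at these particular U's and Q's
    disp([lhs rhs])                 % lhs ≤ rhs on every trial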


Substitution of V_Nᵀ for left-hand V in (936.1), in particular, produces a different result because

    minimize_D   ‖−V(D − H)V‖²_F
    subject to   D ∈ EDM^N        (937)

finds D to attain the Euclidean distance of vectorized −VHV to the positive semidefinite cone in ambient isometrically isomorphic R^{N(N+1)/2}, whereas

    minimize_D   ‖−V_Nᵀ(D − H)V_N‖²_F
    subject to   D ∈ EDM^N        (938)

attains Euclidean distance of vectorized −V_NᵀHV_N to the positive semidefinite cone in isometrically isomorphic subspace R^{N(N−1)/2} ; quite different projections^{7.3} regardless of whether affine dimension is constrained. But substitution of auxiliary matrix V_Wᵀ (§B.4.3) or V_N† yields the same result as (936.1) because V = V_W V_Wᵀ = V_N V_N† ; id est,

    ‖−V(D − H)V‖²_F = ‖−V_W V_Wᵀ(D − H)V_W V_Wᵀ‖²_F = ‖−V_Wᵀ(D − H)V_W‖²_F
                    = ‖−V_N V_N†(D − H)V_N V_N†‖²_F = ‖−V_N†(D − H)V_N‖²_F        (939)

We see no compelling reason to prefer one particular auxiliary V-matrix over another. Each has its own coherent interpretations; e.g., §4.4.2, §5.7. Neither can we say any particular problem formulation produces generally better results than another.

7.1 First prevalent problem: Projection on PSD cone

This first problem

    minimize_D   ‖−V_Nᵀ(D − H)V_N‖²_F
    subject to   rank V_Nᵀ D V_N ≤ ρ        Problem 1 (940)
                 D ∈ EDM^N

^{7.3} The isomorphism T(Y) = V_N†ᵀ Y V_N† onto S^N_c = {V X V | X ∈ S^N} relates the map in (938) to that in (937), but is not an isometry.


poses a Euclidean projection of −V_NᵀHV_N in subspace S^{N−1} on a generally nonconvex subset (when ρ < N−1) of the positive semidefinite cone S^{N−1}_+, whose analytical solution (942) is known in closed form in terms of the eigen decomposition

    −V_NᵀHV_N ≜ Σ_{i=1}^{N−1} λ_i v_i v_iᵀ ∈ S^{N−1}        (943)

whose eigenvalues λ_i are arranged in nonincreasing order.


7.1.1 Closest-EDM Problem 1, convex case

7.1.1.0.1 Proof. Solution (942), convex case.
When desired affine dimension is unconstrained, ρ = N−1, the rank function disappears from (940) leaving a convex optimization problem; a simple unique minimum-distance projection on the positive semidefinite cone S^{N−1}_+ :

    minimize_{D∈S^N_h}   ‖−V_Nᵀ(D − H)V_N‖²_F
    subject to   −V_Nᵀ D V_N ≽ 0        (945)

videlicet, by (479). Because

    S^{N−1} = −V_Nᵀ S^N_h V_N        (560)

then the necessary and sufficient conditions for projection in isometrically isomorphic R^{N(N−1)/2} on the self-dual (298) positive semidefinite cone S^{N−1}_+ are:^{7.6} (§E.9.2.0.1) (1179) (confer (1589))

    −V_Nᵀ D⋆ V_N ≽ 0
    −V_Nᵀ D⋆ V_N ( −V_Nᵀ D⋆ V_N + V_Nᵀ H V_N ) = 0
    −V_Nᵀ D⋆ V_N + V_Nᵀ H V_N ≽ 0        (946)

Symmetric −V_NᵀHV_N is diagonalizable, hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (943). Therefore (confer (942))

    −V_Nᵀ D⋆ V_N = Σ_{i=1}^{N−1} max{0 , λ_i} v_i v_iᵀ        (947)

satisfies (946), optimally solving (945). To see that, recall: these eigenvectors constitute an orthogonal set and

    −V_Nᵀ D⋆ V_N + V_Nᵀ H V_N = −Σ_{i=1}^{N−1} min{0 , λ_i} v_i v_iᵀ        (948)

^{7.6} These conditions for projection on a convex cone are identical to the Karush-Kuhn-Tucker (KKT) optimality conditions for problem (945).


7.1.2 Generic problem

Prior to determination of D⋆, analytical solution to Problem 1

    B⋆ ≜ −V_Nᵀ D⋆ V_N = Σ_{i=1}^ρ max{0 , λ_i} v_i v_iᵀ ∈ S^{N−1}        (942)

is equivalent to solution of a generic rank-constrained projection problem; a Euclidean projection on a subset of a PSD cone (on a generally nonconvex subset of the PSD cone boundary ∂S^{N−1}_+ when ρ < N−1): given A ≜ −V_NᵀHV_N ∈ S^{N−1},

    minimize_{B∈S^{N−1}}   ‖B − A‖²_F
    subject to   rank B ≤ ρ
                 B ≽ 0        (949)

That subset of the PSD cone is the set of all positive semidefinite matrices having rank no greater than desired affine


dimension^{7.7} ρ ; called rank ρ subset: (179) (207)

    S^{N−1}_+ \ S^{N−1}_+(ρ+1) = {X ∈ S^{N−1}_+ | rank X ≤ ρ}        (212)

7.1.3 Choice of spectral cone

In this section we develop a method of spectral projection, on polyhedral cones containing eigenspectra, for constraining rank of positive semidefinite matrices in a proximity problem like (949). We will see why an orthant turns out to be the best choice of spectral cone, and why presorting (π) is critical.

7.1.3.0.1 Definition. Spectral projection.
Let R be an orthogonal matrix and Λ a nonincreasingly ordered diagonal matrix of eigenvalues. Spectral projection means unique minimum-distance projection of a rotated (R , §B.5.4) nonincreasingly ordered (π) eigenspectrum

    π( δ(Rᵀ Λ R) ) ∈ R^{N−1}        (951)

on a polyhedral cone containing all eigenspectra corresponding to a rank ρ subset (§2.9.2.1) of the positive semidefinite cone S^{N−1}_+.  △

In the simplest and most common case, projection on the positive semidefinite cone, orthogonal matrix R equals I (§7.1.4.0.1) and diagonal matrix Λ is ordered during diagonalization. Then spectral projection simply means projection of δ(Λ) on a subset of the nonnegative orthant, as we shall now ascertain:

It is curious how nonconvex Problem 1 has such a simple analytical solution (942). Although solution to generic problem (949) is known since 1936 [67], Trosset [229, §2] first observed its equivalence in 1997 to spectral projection of an ordered eigenspectrum (in diagonal matrix Λ) on a subset of the monotone nonnegative cone (§2.13.9.4.1)

    K_{M+} = {υ | υ_1 ≥ υ_2 ≥ ⋯ ≥ υ_{N−1} ≥ 0} ⊆ R^{N−1}_+        (347)

Of interest, momentarily, is only the smallest convex subset of the monotone nonnegative cone K_{M+} containing every nonincreasingly ordered

^{7.7} Recall: affine dimension is a lower bound on embedding (§2.3.1), equal to dimension of the smallest affine set in which points from a list X corresponding to an EDM D can be embedded.


eigenspectrum corresponding to a rank ρ subset of the positive semidefinite cone S^{N−1}_+ ; id est,

    K^ρ_{M+} ≜ {υ ∈ R^ρ | υ_1 ≥ υ_2 ≥ ⋯ ≥ υ_ρ ≥ 0} ⊆ R^ρ_+        (952)

a pointed polyhedral cone, a ρ-dimensional convex subset of the monotone nonnegative cone K_{M+} ⊆ R^{N−1}_+ having property, for λ denoting eigenspectra,

    [ K^ρ_{M+} ; 0 ] = π( λ(rank ρ subset) ) ⊆ K^{N−1}_{M+} ≜ K_{M+}        (953)

For each and every elemental eigenspectrum

    γ ∈ λ(rank ρ subset) ⊆ R^{N−1}_+        (954)

of the rank ρ subset (ordered or unordered in λ), there is a nonlinear surjection π(γ) onto K^ρ_{M+}. There is no convex subset of K_{M+} smaller than K^ρ_{M+} containing every ordered eigenspectrum corresponding to the rank ρ subset; a fact not proved here.

7.1.3.0.2 Proposition. (Hardy-Littlewood-Pólya) [106, §X] [36, §1.2]
Any vectors σ and γ in R^{N−1} satisfy a tight inequality

    π(σ)ᵀ π(γ) ≥ σᵀγ ≥ π(σ)ᵀ Ξ π(γ)        (955)

where Ξ is the order-reversing permutation matrix defined in (1313), and permutator π(γ) is a nonlinear function that sorts vector γ into nonincreasing order thereby providing the greatest upper bound and least lower bound with respect to every possible sorting.  ⋄

7.1.3.0.3 Corollary. Monotone nonnegative sort.
Any given vectors σ , γ ∈ R^{N−1} satisfy a tight Euclidean distance inequality

    ‖π(σ) − π(γ)‖ ≤ ‖σ − γ‖        (956)

where nonlinear function π(γ) sorts vector γ into nonincreasing order thereby providing the least lower bound with respect to every possible sorting.  ⋄
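Corollary 7.1.3.0.3 is easily exercised numerically; a Matlab sketch with random vectors, where sort with 'descend' plays the role of π :

    % numerical check of Corollary 7.1.3.0.3 (monotone nonnegative sort)
    N = 8;
    sigma = randn(N-1,1);  gamma = randn(N-1,1);
    lhs = norm(sort(sigma,'descend') - sort(gamma,'descend'));  % ‖π(σ)−π(γ)‖
    rhs = norm(sigma - gamma);                                  % ‖σ−γ‖
    disp([lhs rhs])    % lhs ≤ rhs on every trial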


Given γ ∈ R^{N−1}

    inf_{σ∈R^{N−1}_+} ‖σ − γ‖ = inf_{σ∈R^{N−1}_+} ‖π(σ) − π(γ)‖ = inf_{σ∈R^{N−1}_+} ‖σ − π(γ)‖ = inf_{σ∈K_{M+}} ‖σ − π(γ)‖        (957)

Yet for γ representing an arbitrary eigenspectrum, because

    inf_{σ∈[R^ρ_+ ; 0]} ‖σ − γ‖² ≥ inf_{σ∈[R^ρ_+ ; 0]} ‖σ − π(γ)‖² = inf_{σ∈[K^ρ_{M+} ; 0]} ‖σ − π(γ)‖²        (958)

then projection of γ on the eigenspectra of a rank ρ subset can be tightened simply by presorting γ into nonincreasing order.

Proof. Simply because π(γ)_{1:ρ} ≽ π(γ_{1:ρ}),

    inf_{σ∈[R^ρ_+ ; 0]} ‖σ − γ‖² = γ_{ρ+1:N−1}ᵀ γ_{ρ+1:N−1} + inf_{σ∈R^ρ_+} ‖σ_{1:ρ} − γ_{1:ρ}‖²
        = γᵀγ + inf_{σ∈R^ρ_+} ( σ_{1:ρ}ᵀσ_{1:ρ} − 2σ_{1:ρ}ᵀγ_{1:ρ} )
        ≥ γᵀγ + inf_{σ∈R^ρ_+} ( σ_{1:ρ}ᵀσ_{1:ρ} − 2σ_{1:ρ}ᵀπ(γ)_{1:ρ} )
        = inf_{σ∈[R^ρ_+ ; 0]} ‖σ − π(γ)‖²        (959)

7.1.3.1 Orthant is best spectral cone for Problem 1

This means unique minimum-distance projection of γ on the nearest spectral member of the rank ρ subset is tantamount to presorting γ into nonincreasing order. Only then does unique spectral projection on a subset K^ρ_{M+} of the monotone nonnegative cone become equivalent to unique spectral projection on a subset R^ρ_+ of the nonnegative orthant (which is simpler); in other words, unique minimum-distance projection of sorted γ on the nonnegative orthant in a ρ-dimensional subspace of R^N is indistinguishable from its projection on the subset K^ρ_{M+} of the monotone nonnegative cone in that same subspace.


7.1.4 Closest-EDM Problem 1, "nonconvex" case

Trosset's proof of solution (942), for projection on a rank ρ subset of the positive semidefinite cone S^{N−1}_+, was algebraic in nature. [229, §2] Here we derive that known result but instead using a more geometric argument via spectral projection on a polyhedral cone (subsuming the proof in §7.1.1). In so doing, we demonstrate how Problem 1 is implicitly a convex optimization:

7.1.4.0.1 Proof. Solution (942), nonconvex case.
As explained in §7.1.2, we may instead work with the more facile generic problem (949). With diagonalization of unknown

    B ≜ UΥUᵀ ∈ S^{N−1}        (960)

given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalizable

    A ≜ QΛQᵀ ∈ S^{N−1}        (961)

having eigenvalues in Λ arranged in nonincreasing order, by (39) the generic problem is equivalent to

    minimize_{R , Υ}   ‖Υ − RᵀΛR‖²_F
    subject to   rank Υ ≤ ρ
                 Υ ≽ 0
                 R⁻¹ = Rᵀ        (962)

where

    R ≜ QᵀU ∈ R^{N−1×N−1}        (963)

in U on the set of orthogonal matrices is a linear bijection. By (1262), this is equivalent to the problem sequence:

    (a)  minimize_Υ   ‖Υ − RᵀΛR‖²_F
         subject to   rank Υ ≤ ρ
                      Υ ≽ 0

    (b)  minimize_R   ‖Υ⋆ − RᵀΛR‖²_F
         subject to   R⁻¹ = Rᵀ        (964)


Problem (964a) is equivalent to: (1) orthogonal projection of RᵀΛR on an N−1-dimensional subspace of isometrically isomorphic R^{N(N−1)/2} containing δ(Υ) ∈ R^{N−1}_+, (2) nonincreasingly ordering the result, (3) unique minimum-distance projection of the ordered result on [ R^ρ_+ ; 0 ]. (§E.9.5) Projection on that N−1-dimensional subspace amounts to zeroing RᵀΛR at all entries off the main diagonal; thus, the equivalent sequence leading with a spectral projection:

    (a)  minimize_Υ   ‖δ(Υ) − π( δ(RᵀΛR) )‖²
         subject to   δ(Υ) ∈ [ R^ρ_+ ; 0 ]

    (b)  minimize_R   ‖Υ⋆ − RᵀΛR‖²_F
         subject to   R⁻¹ = Rᵀ        (965)

Because any permutation matrix is an orthogonal matrix, it is always feasible that δ(RᵀΛR) ∈ R^{N−1} be arranged in nonincreasing order; hence, the permutation operator π. Unique minimum-distance projection of vector π( δ(RᵀΛR) ) on the ρ-dimensional subset [ R^ρ_+ ; 0 ] of the nonnegative orthant R^{N−1}_+ requires: (§E.9.2.0.1)

    δ(Υ⋆)_{ρ+1:N−1} = 0
    δ(Υ⋆) ≽ 0
    δ(Υ⋆)ᵀ( δ(Υ⋆) − π(δ(RᵀΛR)) ) = 0
    δ(Υ⋆) − π(δ(RᵀΛR)) ≼ δ(Υ⋆)  i.e.  δ(Υ⋆) − π(δ(RᵀΛR)) ≽ 0 on the support of δ(Υ⋆)        (966)

which are necessary and sufficient conditions. Any value Υ⋆ satisfying conditions (966) is optimal for (965a). So

    δ(Υ⋆)_i = { max{ 0 , π( δ(RᵀΛR) )_i } ,  i = 1 … ρ
                0 ,  i = ρ+1 … N−1        (967)

specifies an optimal solution. The lower bound on the objective with respect to R in (965b) is tight; by (935)


    ‖ |Υ⋆| − |Λ| ‖_F ≤ ‖Υ⋆ − RᵀΛR‖_F        (968)

where | · | denotes absolute entry-value. For selection of Υ⋆ as in (967), this lower bound is attained when

    R⋆ = I        (969)

which is the known solution.

7.1.4.1 Significance

Importance of this well-known [67] optimal solution (942) for projection on a rank ρ subset of the positive semidefinite cone should not be dismissed:

— This solution is closed-form and the only method known for enforcing a constraint on rank of an EDM in projection problems such as (936).

— This solution is equivalent to projection on a polyhedral cone in the spectral domain (projection on a spectral cone); a necessary and sufficient condition (§A.3.1) for membership of a symmetric matrix to a rank ρ subset (§2.9.2.1) of the positive semidefinite cone.

— This solution at once encompasses projection on a rank ρ subset (212) of the positive semidefinite cone (generally, a nonconvex subset of its boundary) from either the exterior or interior of that cone.^{7.8} By transforming the problem to the spectral domain, projection on a rank ρ subset became a convex optimization problem.

— For the convex case problem, this solution is always unique. Otherwise, distinct eigenvalues (multiplicity 1) in Λ guarantee uniqueness of this solution by the reasoning in §A.5.0.1.^{7.9}

^{7.8} Projection on the boundary from the interior of a convex Euclidean body is generally a nonconvex problem.
^{7.9} Uncertainty of uniqueness prevents the erroneous conclusion that a rank ρ subset (179) were a convex body by the Bunt-Motzkin theorem (§E.9.0.0.1).


— Because U⋆ = Q, a minimum-distance projection on a rank ρ subset of the positive semidefinite cone is a positive semidefinite matrix orthogonal (in the Euclidean sense) to direction of projection.^{7.10}

7.1.5 Problem 1 in spectral norm, convex case

When instead we pose the matrix 2-norm (spectral norm) in Problem 1 (940) for the convex case ρ = N−1, then the new problem

    minimize_D   ‖−V_Nᵀ(D − H)V_N‖_2
    subject to   D ∈ EDM^N        (970)

is convex although its solution is not necessarily unique;^{7.11} giving rise to oblique projection (§E) on the positive semidefinite cone S^{N−1}_+. Indeed, its solution set includes the Frobenius solution (942) for the convex case whenever −V_NᵀHV_N is a normal matrix. [109, §1] [104] [38, §8.1.1] Singular value problem (970) is equivalent to

    minimize_{µ , D}   µ
    subject to   −µI ≼ −V_Nᵀ(D − H)V_N ≼ µI
                 D ∈ EDM^N        (971)

where

    µ⋆ = max{ | λ( −V_Nᵀ(D⋆ − H)V_N )_i | , i = 1 … N−1 } ∈ R_+        (972)

the maximum absolute eigenvalue (due to matrix symmetry). For lack of a unique solution here, we prefer the Frobenius rather than spectral norm.

^{7.10} But Theorem E.9.2.0.1 for unique projection on a closed convex cone does not apply here because the direction of projection is not necessarily a member of the dual PSD cone. This occurs, for example, whenever positive eigenvalues are truncated.
^{7.11} For each and every |t| ≤ 2, [ 2 0 ; 0 t ] has the same spectral norm.
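Returning to Frobenius solution (942): it is only a few lines of Matlab. This sketch assumes the definition V_N = (1/√2)[ −1ᵀ ; I ] from (466) and returns the projection B⋆ of A = −V_NᵀHV_N on the rank ρ subset; only the optimal matrix B⋆ is computed here, not the corresponding D⋆:

    % closed-form projection (942) on rank-rho subset of PSD cone S^{N-1}_+
    function Bstar = psdRankProj(H, rho)
    N  = size(H,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);    % auxiliary matrix (466)
    [v,L] = eig(-VN'*H*VN);                   % A = -VN'*H*VN as in (943)
    lam = diag(L);
    [lam,k] = sort(lam,'descend');  v = v(:,k);  % nonincreasing eigenvalues
    lam = max(0, lam);  lam(rho+1:end) = 0;   % keep rho largest, clip at 0 (967)
    Bstar = v*diag(lam)*v';                   % (942)
    end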


7.2 Second prevalent problem: Projection on EDM cone in √d_ij

Let

    ◦√D ≜ [√d_ij] ∈ K = S^N_h ∩ R^{N×N}_+        (973)

be an unknown matrix of absolute distance; id est,

    D = [d_ij] ≜ ◦√D ∘ ◦√D ∈ EDM^N        (974)

where ∘ denotes Hadamard product. The second prevalent proximity problem is a Euclidean projection (in the natural coordinates √d_ij) of matrix H on a nonconvex subset of the boundary of the nonconvex cone of Euclidean absolute-distance matrices rel∂√EDM^N : (§5.3, confer Figure 74(b))

    minimize_{◦√D}   ‖◦√D − H‖²_F
    subject to   rank V_Nᵀ D V_N ≤ ρ        Problem 2 (975)
                 ◦√D ∈ √EDM^N

where

    √EDM^N = { ◦√D | D ∈ EDM^N }        (729)

This statement of the second proximity problem is considered difficult to solve because of the constraint on desired affine dimension ρ (§4.7.2) and because the objective function

    ‖◦√D − H‖²_F = Σ_{i,j} ( √d_ij − h_ij )²        (976)

is expressed in the natural coordinates; projection on a doubly nonconvex set.

Our solution to this second problem prevalent in the literature requires measurement matrix H to be nonnegative;

    H = [h_ij] ∈ R^{N×N}_+        (977)

If the given matrix H has negative entries, then the technique of solution presented here becomes invalid. As explained in §7.0.1, projection of H on K = S^N_h ∩ R^{N×N}_+ (929) prior to application of this proposed solution is incorrect.


7.2.1 Convex case

When ρ = N−1, the rank constraint vanishes and a convex problem emerges:^{7.12}

    minimize_{◦√D}   ‖◦√D − H‖²_F              minimize_D   Σ_{i,j} ( d_ij − 2h_ij√d_ij + h²_ij )
    subject to   ◦√D ∈ √EDM^N         ⇔       subject to   D ∈ EDM^N        (978)

For any fixed i and j, the argument of summation is a convex function of d_ij because (for nonnegative constant h_ij) the negative square root is convex in nonnegative d_ij and because d_ij + h²_ij is affine (convex). Because the sum of any number of convex functions in D remains convex [38, §3.2.1] and because the feasible set is convex in D, we have a convex optimization problem:

    minimize_D   1ᵀ( D − 2H ∘ ◦√D )1 + ‖H‖²_F
    subject to   D ∈ EDM^N        (979)

The objective function, being a sum of strictly convex functions, is moreover strictly convex in D on the nonnegative orthant. Existence of a unique solution D⋆ for this second prevalent problem depends upon nonnegativity of H and a convex feasible set (§3.1.1.1).^{7.13}

7.2.1.1 Equivalent semidefinite program, Problem 2, convex case

Convex problem (978) is numerically solvable for its global minimum using an interior-point method [38, §11] [259] [181] [174] [251] [9] [78]. We translate (978) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in §7.2.2.2 and because there exist readily available computer programs for numerical solution [216] [95] [232] [24] [252] [253] [254].

Substituting a new matrix variable Y ≜ [y_ij] ∈ R^{N×N}_+

    h_ij √d_ij ← y_ij        (980)

^{7.12} still thought to be a nonconvex problem as late as 1997 [230] even though discovered convex by de Leeuw in 1993. [56] [34, §13.6] Yet using methods from §3, it can be easily ascertained: ‖◦√D − H‖_F is not convex in D.
^{7.13} The transformed problem in variable D no longer describes Euclidean projection on an EDM cone. Otherwise we might erroneously conclude √EDM^N were a convex body by the Bunt-Motzkin theorem (§E.9.0.0.1).


7.2. SECOND PREVALENT PROBLEM: 407Boyd proposes: problem (978) is equivalent to the semidefinite program∑minimize d ij − 2y ij + h 2 ijD , Yi,j[ ]dij y ij(981)subject to≽ 0 , i,j = 1... Ny ij h 2 ijD ∈ EDM NTo see that, recall d ij ≥ 0 is implicit to D ∈ EDM N (4.8.1, (479)). Sowhen H ∈ R N×N+ is nonnegative as assumed,[ ]dij y √ij≽ 0 ⇔ h ij√d ij ≥ yij 2 (982)y ijh 2 ijMinimization of the objective function implies maximization of y ij thatis bounded above. √ Hence nonnegativity of y ij is implicit to (981) and, asdesired, y ij →h ij dij as optimization proceeds.If the given matrix H is now assumed symmetric and nonnegative,H = [h ij ] ∈ S N ∩ R N×N+ (983)then Y = H ◦ ◦√ D must belong to K= S N h ∩ R N×N+ (929). Then becauseY ∈ S N h (B.4.2 no.20)‖ ◦√ D − H‖ 2 F = ∑ i,jd ij − 2y ij + h 2 ij = −N tr(V (D − 2Y )V ) + ‖H‖ 2 F (984)so convex problem (981) is equivalent to the semidefinite programminimize − tr(V (D − 2Y )V )D , Y[ ]dij y ijsubject to≽ 0 ,j > i = 1... N −1y ij h 2 ij(985)Y ∈ S N hD ∈ EDM Nwhere the constants h 2 ij and N have been dropped arbitrarily from theobjective.


7.2.1.2 Gram-form semidefinite program, Problem 2, convex case

There is great advantage to expressing problem statement (985) in Gram-form because Gram matrix G is a bidirectional bridge between point list X and distance matrix D ; e.g., Example 4.4.2.2.4, Example 5.4.0.0.1. This way, problem convexity can be maintained while simultaneously constraining point list X, Gram matrix G, and distance matrix D at our discretion.

Convex problem (985) may be equivalently written via linear bijective (§4.6.1) EDM operator D(G) (472);

    minimize_{G∈S^N_c , Y∈S^N_h}   −tr( V(D(G) − 2Y)V )
    subject to   [ ⟨Φ_ij , G⟩  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1 … N−1
                 G ≽ 0        (986)

where distance-square D = [d_ij] ∈ S^N_h (456) is related to Gram matrix entries G = [g_ij] ∈ S^N_c ∩ S^N_+ by

    d_ij = g_ii + g_jj − 2g_ij = ⟨Φ_ij , G⟩        (471)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)ᵀ ∈ S^N_+        (458)

Confinement of G to the geometric center subspace provides numerical stability and no loss of generality (confer (733)); implicit constraint G1 = 0 is otherwise unnecessary.

To include constraints on the list X ∈ R^{n×N}, we would first rewrite (986)

    minimize_{G∈S^N_c , Y∈S^N_h , X∈R^{n×N}}   −tr( V(D(G) − 2Y)V )
    subject to   [ ⟨Φ_ij , G⟩  y_ij ; y_ij  h²_ij ] ≽ 0 ,  j > i = 1 … N−1
                 [ I  X ; Xᵀ  G ] ≽ 0
                 X ∈ C        (987)


and then add the constraints, realized here in abstract membership to some convex set C. This problem realization includes a relaxation of the nonconvex constraint G = XᵀX and, if desired, more constraints on G could be added. This technique is discussed in §4.4.2.2.4.

7.2.2 Minimization of affine dimension in Problem 2

When desired affine dimension ρ is diminished, the rank function becomes reinserted into problem (981), which is then rendered difficult to solve because the feasible set {D , Y} loses convexity in S^N_h × R^{N×N}. Indeed, the rank function is quasiconcave (§3.2) on the positive semidefinite cone; (§2.9.2.5.2) id est, its sublevel sets are not convex.

7.2.2.1 Rank minimization heuristic

A remedy developed in [73] [163] [74] [72] introduces the convex envelope (cenv) of the quasiconcave rank function:

7.2.2.1.1 Definition. Convex envelope. [122]
The convex envelope of a function f : C → R is defined as the largest convex function g such that g ≤ f on convex domain C ⊆ R^n.^{7.14}  △

The convex envelope of the rank function is proportional to the trace function when its argument is constrained to be symmetric and positive semidefinite. A properly scaled trace thus represents the best convex lower bound on rank. The idea, then, is to substitute the convex envelope for the rank of some variable A ∈ S^M_+

    rank A ← cenv(rank A) ∝ tr A = Σ_i σ(A)_i = Σ_i λ(A)_i = ‖λ(A)‖_1        (988)

equivalent to the sum of all eigenvalues or singular values.

^{7.14} Provided f ≢ +∞ and there exists an affine function h ≤ f on R^n, then the convex envelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convex conjugate of f ; id est, the conjugate-conjugate function f**. [123, §E.1]


7.2.2.2 Applying trace rank-heuristic to Problem 2

Substituting the rank envelope for the rank function in Problem 2, for D ∈ EDM^N (confer (591))

    cenv rank(−V_Nᵀ D V_N) = cenv rank(−V D V) ∝ −tr(V D V)        (989)

and for desired affine dimension ρ ≤ N−1 and nonnegative H [sic] we have the relaxed convex optimization problem

    minimize_D   ‖∘√D − H‖²_F
    subject to   −tr(V D V) ≤ κ ρ        (990)
                 D ∈ EDM^N

where κ ∈ R_+ is a constant determined by cut-and-try. 7.15 The equivalent semidefinite program makes κ variable: for nonnegative and symmetric H

    minimize_{D,Y,κ}   κ ρ + 2 tr(V Y V)
    subject to   [ d_ij  y_ij ; y_ij  h_ij² ] ≽ 0 ,   j > i = 1…N−1
                 −tr(V D V) ≤ κ ρ        (991)
                 Y ∈ S^N_h
                 D ∈ EDM^N

which is the same as (985), the problem with no explicit constraint on affine dimension. As the present problem is stated, the desired affine dimension ρ yields to the variable scale factor κ; ρ is effectively ignored.

Yet this result is an illuminant for problem (985) and its equivalents (all the way back to (978)): When the given measurement matrix H is nonnegative and symmetric, finding the closest EDM D as in problem (978), (981), or (985) implicitly entails minimization of affine dimension (confer §4.8.4, §4.14.4). Those non rank-constrained problems are each inherently equivalent to cenv(rank)-minimization problem (991), in other words, and their optimal solutions are unique because of the strictly convex objective function in (978).

7.15 cenv(rank A) on {A ∈ Sⁿ_+ | ‖A‖₂ ≤ κ} equals tr(A)/κ. [73] [72]


7.2.2.3 Rank-heuristic insight

Minimization of affine dimension by use of this trace rank-heuristic (989) tends to find the list configuration of least energy; rather, it tends to optimize compaction of the reconstruction by minimizing total distance. (485) It is best used where some physical equilibrium implies such an energy minimization; e.g., [228, §5].

For this Problem 2, the trace rank-heuristic arose naturally in the objective in terms of V. We observe: V (in contrast to V_Nᵀ) spreads energy over all available distances (§B.4.2 no.20, no.22) although the rank function itself is insensitive to the choice.

7.2.2.4 Rank minimization heuristic beyond convex envelope

Fazel, Hindi, and Boyd [74] propose a rank heuristic more potent than trace (988) for problems of rank minimization;

    rank Y ← log det(Y + εI)        (992)

the concave surrogate function log det in place of quasiconcave rank Y (§2.9.2.5.2) when Y ∈ Sⁿ_+ is variable and where ε is a small positive constant. They propose minimization of the surrogate by substituting a sequence comprising infima of a linearized surrogate about the current estimate Y_i; id est, from the first-order Taylor series expansion about Y_i on some open interval of ‖Y‖ (§D.1.6)

    log det(Y + εI) ≈ log det(Y_i + εI) + tr((Y_i + εI)⁻¹ (Y − Y_i))        (993)

we make the surrogate sequence of infima over bounded convex feasible set C

    arg inf_{Y∈C} rank Y ← lim_{i→∞} Y_{i+1}        (994)

where

    Y_{i+1} = arg inf_{Y∈C} tr((Y_i + εI)⁻¹ Y)        (995)

Choosing Y₀ = I, the first step becomes equivalent to finding the infimum of tr Y; the trace rank-heuristic (988). The intuition underlying (995) is the new term in the argument of trace; specifically, (Y_i + εI)⁻¹ weights Y so that relatively small eigenvalues of Y found by the infimum are made even smaller.


To see that, substitute into (995) the nonincreasingly ordered diagonalizations

    Y_i + εI ≜ Q(Λ + εI)Qᵀ        (996a)
    Y ≜ UΥUᵀ        (996b)

Then from (1290) we have,

    inf_{Υ∈U*ᵀCU*} δ((Λ + εI)⁻¹)ᵀ δ(Υ)  ≤  inf_{Rᵀ=R⁻¹} inf_{Υ∈UᵀCU} tr((Λ + εI)⁻¹ RᵀΥR)
                                         =  inf_{Y∈C} tr((Y_i + εI)⁻¹ Y)        (997)

where R ≜ QᵀU in U on the set of orthogonal matrices is a bijection. The role of ε is, therefore, to limit the maximum weight; the smallest entry on the main diagonal of Υ gets the largest weight.

7.2.2.5 Applying log det rank-heuristic to Problem 2

When the log det rank-heuristic is inserted into Problem 2, problem (991) becomes the problem sequence in i

    minimize_{D,Y,κ}   κ ρ + 2 tr(V Y V)
    subject to   [ d_jl  y_jl ; y_jl  h_jl² ] ≽ 0 ,   l > j = 1…N−1
                 −tr((−V D_i V + εI)⁻¹ V D V) ≤ κ ρ        (998)
                 Y ∈ S^N_h
                 D ∈ EDM^N

where D_{i+1} ≜ D* ∈ EDM^N, and D₀ ≜ 11ᵀ − I.

7.2.2.6 Tightening this log det rank-heuristic

Like the trace method, this log det technique for constraining rank offers no provision for meeting a predetermined upper bound ρ. Yet since the eigenvalues of the sum are simply determined, λ(Y_i + εI) = δ(Λ + εI), we may certainly force selected weights to ε⁻¹ by manipulating diagonalization (996a). Empirically we find this sometimes leads to better results, although affine dimension of a solution cannot be guaranteed.
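The weighting mechanism of (995) and (996a) is visible already for a diagonal iterate. A base-Matlab sketch (toy numbers assumed for illustration):

    epsilon = 1e-2;
    Y = diag([5 1 1e-3]);              % current estimate Y_i, nearly rank 2
    W = inv(Y + epsilon*eye(3));       % the weight on Y in (995)
    diag(W)'                           % approx [0.2  0.99  90.9]

The eigenvalue nearest 0 draws a weight approaching ε⁻¹ = 100, so the next infimum drives it smaller still; that is how the sequence (994) steers toward low rank.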


7.3 Third prevalent problem: Projection on EDM cone in d_ij

Reformulating Problem 2 (p.405) in terms of EDM D changes the problem considerably:

    minimize_D   ‖D − H‖²_F
    subject to   rank V_Nᵀ D V_N ≤ ρ        Problem 3 (999)
                 D ∈ EDM^N

This third prevalent proximity problem is a Euclidean projection of given matrix H on a generally nonconvex subset (when ρ < N−1) of the boundary of the convex cone of Euclidean distance matrices ∂EDM^N relative to subspace S^N_h (Figure 74(d)). Because coordinates of projection are distance-square and H presumably now holds distance-square measurements, numerical solution to Problem 3 is generally different than that of Problem 2. For the moment, we need make no assumptions regarding measurement matrix H.

7.3.1 Convex case

    minimize_D   ‖D − H‖²_F
    subject to   D ∈ EDM^N        (1000)

When the rank constraint disappears (for ρ = N−1), this third problem becomes obviously convex because the feasible set is then the entire EDM cone and because the objective function

    ‖D − H‖²_F = Σ_{i,j} (d_ij − h_ij)²        (1001)

is a strictly convex quadratic in D; 7.16

7.16 For nonzero Y ∈ S^N_h and some open interval of t ∈ R (§3.1.2.3.2, §D.2.3)

    d²/dt² ‖(D + tY) − H‖²_F = 2 tr(YᵀY) > 0


    minimize_D   Σ_{i,j} d_ij² − 2h_ij d_ij + h_ij²
    subject to   D ∈ EDM^N        (1002)

Optimal solution D* is therefore unique, as expected, for this simple projection on the EDM cone.

7.3.1.1 Equivalent semidefinite program, Problem 3, convex case

In the past, this convex problem was solved numerically by means of alternating projection. (Example 7.3.1.1.1) [85] [79] [110, §1] We translate (1002) to an equivalent semidefinite program because we have a good solver: Assume the given measurement matrix H to be nonnegative and symmetric; 7.17

    H = [h_ij] ∈ S^N ∩ R^{N×N}_+        (983)

We then propose: Problem (1002) is equivalent to the semidefinite program, for ∂ ≜ [d_ij²] = D ∘ D distance-square squared,

    minimize_{∂,D}   −tr(V(∂ − 2H∘D)V)
    subject to   [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0 ,   j > i = 1…N−1        (1003)
                 D ∈ EDM^N
                 ∂ ∈ S^N_h

where

    [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0  ⇔  ∂_ij ≥ d_ij²        (1004)

Symmetry of input H facilitates trace in the objective (§B.4.2 no.20), while its nonnegativity causes ∂_ij → d_ij² as optimization proceeds.

7.17 If that H given has negative entries, then the technique of solution presented here becomes invalid. Projection of H on K (929) prior to application of this proposed technique, as explained in §7.0.1, is incorrect.


7.3.1.1.1 Example. Alternating projection on nearest EDM.
By solving (1003) we confirm the result from an example given by Glunt, Hayden, et alii [85, §6] who found an analytical solution to the convex optimization problem (1000) for the particular cardinality N = 3 by using the alternating projection method of von Neumann (§E.10):

    H = [ 0  1  1 ; 1  0  9 ; 1  9  0 ] ,   D* = [ 0  19/9  19/9 ; 19/9  0  76/9 ; 19/9  76/9  0 ]        (1005)

The original problem (1000) of projecting H on the EDM cone is transformed to an equivalent iterative sequence of projections on the two convex cones (792) from §5.8.1.1. Using ordinary alternating projection, input H goes to D* with an accuracy of four decimal places in about 17 iterations. Affine dimension corresponding to this optimal solution is r = 1. Obviation of semidefinite programming's computational expense is the principal advantage of this alternating projection technique.
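Solving (1003) is unnecessary to check this particular answer; the analytic solution (1005) can be verified directly in base Matlab (a sketch):

    H  = [0 1 1; 1 0 9; 1 9 0];
    Ds = [0 19 19; 19 0 76; 19 76 0]/9;    % D* from (1005)
    N  = 3;  V = eye(N) - ones(N)/N;
    eig(-V*Ds*V/2)                         % exactly one positive eigenvalue => r = 1
    norm(Ds - H,'fro')^2                   % the minimal value of objective (1000)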


7.3.1.2 Schur-form semidefinite program, Problem 3 convex case

Semidefinite program (1003) can be reformulated by moving the objective function in

    minimize_D   ‖D − H‖²_F
    subject to   D ∈ EDM^N        (1000)

to the constraints. This makes an equivalent second-order cone program: for any measurement matrix H

    minimize_{t∈R, D}   t
    subject to   ‖D − H‖²_F ≤ t        (1006)
                 D ∈ EDM^N

We can transform this problem to an equivalent Schur-form semidefinite program; (§A.4.1)

    minimize_{t∈R, D}   t
    subject to   [ tI  vec(D − H) ; vec(D − H)ᵀ  1 ] ≽ 0        (1007)
                 D ∈ EDM^N

characterized by great sparsity and structure. The advantage of this SDP is lack of conditions on input H; e.g., negative entries would invalidate any solution provided by (1003). (§7.0.1.2)

7.3.1.3 Gram-form semidefinite program, Problem 3 convex case

Further, this problem statement may be equivalently written in terms of a Gram matrix via the linear bijective (§4.6.1) EDM operator D(G) (472);

    minimize_{G∈S^N_c, t∈R}   t
    subject to   [ tI  vec(D(G) − H) ; vec(D(G) − H)ᵀ  1 ] ≽ 0        (1008)
                 G ≽ 0

To include constraints on the list X ∈ R^{n×N}, we would rewrite this:

    minimize_{G∈S^N_c, t∈R, X∈R^{n×N}}   t
    subject to   [ tI  vec(D(G) − H) ; vec(D(G) − H)ᵀ  1 ] ≽ 0
                 [ I  X ; Xᵀ  G ] ≽ 0        (1009)
                 X ∈ C

where C is some abstract convex set. This technique is discussed in §4.4.2.2.4.

7.3.2 Minimization of affine dimension in Problem 3

When the desired affine dimension ρ is diminished, Problem 3 (999) is difficult to solve [110, §3] because the feasible set in R^{N(N−1)/2} loses convexity. By substituting the rank envelope (989) into Problem 3, then for any given H we have the relaxed convex problem

    minimize_D   ‖D − H‖²_F
    subject to   −tr(V D V) ≤ κ ρ        (1010)
                 D ∈ EDM^N

where κ ∈ R_+ is a constant determined by cut-and-try.


Given κ, problem (1010) is a convex optimization having unique solution in any desired affine dimension ρ; an approximation to Euclidean projection on that nonconvex subset of the EDM cone containing EDMs with corresponding affine dimension no greater than ρ.

The SDP equivalent to (1010) does not move κ into the variables as on page 410: for nonnegative symmetric input H

    minimize_{∂,D}   −tr(V(∂ − 2H∘D)V)
    subject to   [ ∂_ij  d_ij ; d_ij  1 ] ≽ 0 ,   j > i = 1…N−1
                 −tr(V D V) ≤ κ ρ        (1011)
                 D ∈ EDM^N
                 ∂ ∈ S^N_h

That means we will not see equivalence of this cenv(rank)-minimization problem to the non rank-constrained problems (1002) and (1003) like we saw for its counterpart (991) in Problem 2.

Another approach to affine dimension minimization is to project instead on the polar EDM cone; discussed in §5.8.1.5.

7.3.3 Constrained affine dimension, Problem 3

When one desires affine dimension diminished further below what can be achieved via cenv(rank)-minimization as in (1011), spectral projection is a natural consideration. Spectral projection is motivated by its successful application to projection on a rank ρ subset of the positive semidefinite cone in §7.1.4. Yet it is wrong here to zero eigenvalues of −V D V or −V G V or a variant to reduce affine dimension, because that particular method is germane to projection on a positive semidefinite cone; zeroing eigenvalues here would place an elbow in the projection path. (confer Figure 90) Problem 3 is instead a projection on the EDM cone whose associated spectral cone is considerably different. (§4.11.2.3)

We shall now show why direct application of spectral projection to the EDM cone is difficult. Although philosophically questionable to provide a negative example, it is instructive and will hopefully motivate future researchers to overcome the obstacles we encountered.


7.3.3.1 Cayley-Menger form

We use Cayley-Menger composition of the Euclidean distance matrix to solve a problem that is the same as Problem 3 (999): (§4.7.3.0.1)

    minimize_D   ‖ [ 0  1ᵀ ; 1  −D ] − [ 0  1ᵀ ; 1  −H ] ‖²_F
    subject to   rank [ 0  1ᵀ ; 1  −D ] ≤ ρ + 2        (1012)
                 D ∈ EDM^N

a projection of H on a generally nonconvex subset (when ρ < N−1) of the Euclidean distance matrix cone boundary rel ∂EDM^N; id est, projection from the EDM cone interior or exterior on a subset of its relative boundary (§5.6, (726)).

Rank of an optimal solution is intrinsically bounded above and below;

    2 ≤ rank [ 0  1ᵀ ; 1  −D* ] ≤ ρ + 2 ≤ N + 1        (1013)

Our proposed strategy for low-rank solution is projection on that subset of a spectral cone λ([ 0  1ᵀ ; 1  −EDM^N ]) (§4.11.2) corresponding to affine dimension not in excess of that ρ desired; id est, spectral projection on the convex cone

    [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H  ⊂  R^{N+1}        (1014)

where

    ∂H ≜ {λ ∈ R^{N+1} | 1ᵀλ = 0}        (1015)

This pointed polyhedral cone (1014), to which membership subsumes the rank constraint, has empty interior.

Given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalization (§A.5) of unknown EDM D

    [ 0  1ᵀ ; 1  −D ] ≜ UΥUᵀ ∈ S^{N+1}_h        (1016)


and given symmetric H in diagonalization

    [ 0  1ᵀ ; 1  −H ] ≜ QΛQᵀ ∈ S^{N+1}        (1017)

having eigenvalues arranged in nonincreasing order, then by (654) problem (1012) is equivalent to

    minimize_{Υ,R}   ‖Υ − RᵀΛR‖²_F
    subject to   rank Υ ≤ ρ + 2
                 δ²(Υ) = Υ        (1018)
                 Υ holds exactly one negative eigenvalue
                 δ(QRΥRᵀQᵀ) = 0
                 R⁻¹ = Rᵀ

where

    R ≜ QᵀU ∈ R^{N+1×N+1}        (1019)

in U on the set of orthogonal matrices is a bijection. Two eigenvalue constraints can be precisely and equivalently restated in terms of spectral cone (1014):

    minimize_{Υ,R}   ‖Υ − RᵀΛR‖²_F
    subject to   δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H        (1020)
                 δ(QRΥRᵀQᵀ) = 0
                 R⁻¹ = Rᵀ

where ∂H insures one negative eigenvalue. Diagonal constraint δ(QRΥRᵀQᵀ) = 0 makes problem (1020) difficult by making the two variables dependent. Without it, the problem could be solved exactly by (1262), reducing it to a sequence of two infima.

Our strategy is to instead divide problem (1020) into two and then alternate their solution:

    minimize_Υ   ‖δ(Υ) − π(δ(RᵀΛR))‖²
    subject to   δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H        (1021a)

    minimize_R   ‖Υ − RᵀΛR‖²_F
    subject to   δ(QRΥRᵀQᵀ) = 0        (1021b)
                 R⁻¹ = Rᵀ


where π is the permutation operator from §7.1.3 arranging its vector argument in nonincreasing order. (Recall, any permutation matrix is an orthogonal matrix.)

We justify disappearance of the equality constraint in convex optimization problem (1021a): From the arguments in §7.1.3 with regard to π the permutation operator, cone membership constraint δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H is equivalent to

    δ(Υ) ∈ [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H ∩ K_M        (1022)

where K_M is the monotone cone (§2.13.9.4.2). Membership to the polyhedral cone of majorization (§A.1.1 no.27)

    K*_λδ = ∂H ∩ K*_{M+}        (1033)

where K*_{M+} is the dual monotone nonnegative cone (§2.13.9.4.1), is a condition (in the absence of a main-diagonal δ constraint) insuring existence of a symmetric hollow matrix [ 0  1ᵀ ; 1  −D ]. Intersection of this feasible superset [ R^{ρ+1}_+ ; 0 ; R_− ] ∩ ∂H ∩ K_M with K*_λδ is a benign operation; id est,

    ∂H ∩ K*_{M+} ∩ K_M = ∂H ∩ K_M        (1023)

verifiable by observing conic dependencies (§2.10.3) among the aggregate of halfspace-description normals. The cone membership constraint in (1021a) therefore insures existence of a symmetric hollow matrix.

Optimization (1021b) would be a Procrustes problem (§C.5) were it not for the equality constraint; it is, instead, a minimization over the intersection of the nonconvex manifold of orthogonal matrices with another nonconvex set specified by the equality constraint. That is the first obstacle.

The second obstacle is deriving conditions for convergence of the iteration to the global minimum.


7.4 Conclusion

At the present time, the compounded difficulties presented by existence of rank constraints in projection problems appear insurmountable. [229, §4] There has been little progress since the discovery by Eckart & Young in 1936 [67] of a formula for projection on a rank ρ subset (§2.9.2.1) of the positive semidefinite cone. The only method presently available for solving proximity problems having constraints on rank is based on their discovery (Problem 1, §7.1, §4.13). The importance and application of solving rank-constrained projection problems are enormous, a conclusion generally accepted gratis by the mathematics and engineering communities. We, of course, encourage future research be directed to this area.




Appendix A

Linear algebra

A.1 Main-diagonal δ operator, trace, vec

We introduce notation δ denoting the main-diagonal linear self-adjoint operator. When linear function δ operates on a square matrix A ∈ R^{N×N}, δ(A) returns a vector composed of all the entries from the main diagonal in the natural order;

    δ(A) ∈ R^N        (1024)

Operating on a vector, δ naturally returns a diagonal matrix. Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ R^{N×N}, δ(δ(Λ)) returns Λ itself;

    δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ        (1025)

Defined in this manner, main-diagonal linear operator δ is self-adjoint [140, §3.10, §9.5-1]; A.1 videlicet, for y ∈ R^N (§2.2)

    δ(A)ᵀy = ⟨δ(A), y⟩ = ⟨A, δ(y)⟩ = tr(Aᵀδ(y))        (1026)

A.1 Linear operator T : R^{m×n} → R^{M×N} is self-adjoint when, for each and every X₁, X₂ ∈ R^{m×n}, ⟨T(X₁), X₂⟩ = ⟨X₁, T(X₂)⟩.


A.1.1 Identities

This δ notation is efficient and unambiguous as illustrated in the following examples where A ∘ B denotes Hadamard product [125] [88, §1.1.4] of matrices of like size, ⊗ the Kronecker product (§D.1.2.1), y a vector, X a matrix, e_i the i-th member of the standard basis for Rⁿ, S^N_h the symmetric hollow subspace, σ a vector of (nonincreasingly) ordered singular values, and λ denotes a vector of nonincreasingly ordered eigenvalues:

1. δ(A) = δ(Aᵀ)
2. tr(A) = tr(Aᵀ) = δ(A)ᵀ1
3. ⟨I, A⟩ = tr A
4. δ(cA) = c δ(A), c ∈ R
5. tr(c√(AᵀA)) = c tr√(AᵀA) = c 1ᵀσ(A), c ∈ R
6. tr(cA) = c tr(A) = c 1ᵀλ(A), c ∈ R
7. δ(A + B) = δ(A) + δ(B)
8. tr(A + B) = tr(A) + tr(B)
9. δ(AB) = (A ∘ Bᵀ)1 = (Bᵀ ∘ A)1
10. δ(AB)ᵀ = 1ᵀ(Aᵀ ∘ B) = 1ᵀ(B ∘ Aᵀ)
11. δ(uvᵀ) = [u₁v₁ ⋯ u_N v_N]ᵀ = u ∘ v ,  u, v ∈ R^N
12. tr(AᵀB) = tr(ABᵀ) = tr(BAᵀ) = tr(BᵀA) = 1ᵀ(A∘B)1 = 1ᵀδ(ABᵀ) = δ(AᵀB)ᵀ1 = δ(BAᵀ)ᵀ1 = δ(BᵀA)ᵀ1
13. For D = [d_ij] ∈ S^N_h, H = [h_ij] ∈ S^N_h, V = I − (1/N)11ᵀ ∈ S^N (confer §B.4.2 no.20)
    N tr(−V(D∘H)V) = tr(DᵀH) = 1ᵀ(D∘H)1 = tr(11ᵀ(D∘H)) = Σ_{i,j} d_ij h_ij
14. yᵀBδ(A) = tr(Bδ(A)yᵀ) = tr(δ(Bᵀy)A) = tr(Aδ(Bᵀy)) = δ(A)ᵀBᵀy = tr(yδ(A)ᵀBᵀ) = tr(Aᵀδ(Bᵀy)) = tr(δ(Bᵀy)Aᵀ)


15. δ²(AᵀA) = Σ_i e_i e_iᵀ AᵀA e_i e_iᵀ
16. δ(δ(A)1ᵀ) = δ(1δ(A)ᵀ) = δ(A)
17. δ(A1)1 = δ(A11ᵀ) = A1 ,  δ(y)1 = δ(y1ᵀ) = y
18. δ(I1) = δ(1) = I
19. δ(e_i e_jᵀ 1) = δ(e_i) = e_i e_iᵀ
20. vec(AXB) = (Bᵀ ⊗ A) vec X
21. vec(BXA) = (Aᵀ ⊗ B) vec X
22. tr(AXBXᵀ) = vec(X)ᵀ vec(AXB) = vec(X)ᵀ(Bᵀ ⊗ A) vec X [94]
23. tr(AXᵀBX) = vec(X)ᵀ vec(BXA) = vec(X)ᵀ(Aᵀ ⊗ B) vec X = δ(vec(X) vec(X)ᵀ(Aᵀ ⊗ B))ᵀ1
24. For ζ = [ζ_i] ∈ R^k and x = [x_i] ∈ R^k ,  Σ_i ζ_i/x_i = ζᵀδ(x)⁻¹1
25. Let λ(A) ∈ C^N denote the eigenvalues of A ∈ R^{N×N}. Then

    δ(A) = λ(I∘A)        (1027)

26. For any permutation matrix Ξ and dimensionally compatible vector y or matrix A

    δ(Ξy) = Ξδ(y)Ξᵀ        (1028)
    δ(ΞAΞᵀ) = Ξδ(A)        (1029)

So given any permutation matrix Ξ and any dimensionally compatible matrix B, for example,

    δ²(B) = Ξδ²(ΞᵀBΞ)Ξᵀ        (1030)


27. Theorem (Schur). Majorization. [261, §7.4] [125, §4.3] [126, §5.5]
Let λ ∈ R^N denote a vector of eigenvalues and let δ ∈ R^N denote a vector of main diagonal entries, both arranged in nonincreasing order. Then

    ∃ A ∈ S^N   λ(A) = λ and δ(A) = δ   ⇐   λ − δ ∈ K*_λδ        (1031)

and conversely

    A ∈ S^N   ⇒   λ(A) − δ(A) ∈ K*_λδ        (1032)

the pointed (empty interior) polyhedral cone of majorization (confer (256))

    K*_λδ ≜ K*_{M+} ∩ {ζ1 | ζ ∈ R}*        (1033)

where K*_{M+} is the dual monotone nonnegative cone (353), and where the dual of the line is a hyperplane; ∂H = {ζ1 | ζ ∈ R}* = 1⊥. ⋄

The majorization cone K*_λδ is naturally consequent to the definition of majorization; id est, vector y ∈ R^N majorizes vector x if and only if

    Σ_{i=1}^k x_i ≤ Σ_{i=1}^k y_i   ∀ 1 ≤ k ≤ N        (1034)

and

    1ᵀx = 1ᵀy        (1035)

Under these circumstances, rather, vector x is majorized by vector y.

In the particular circumstance δ(A) = 0, we get:

Corollary. Symmetric hollow majorization.
Let λ ∈ R^N denote a vector of eigenvalues arranged in nonincreasing order. Then

    ∃ A ∈ S^N_h   λ(A) = λ   ⇐   λ ∈ K*_λδ        (1036)

and conversely

    A ∈ S^N_h   ⇒   λ(A) ∈ K*_λδ        (1037)

where K*_λδ is defined in (1033). ⋄
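Several of the identities above can be spot-checked in base Matlab, where δ(·) acting on a matrix is diag(·) (a sketch; 4×4 random data assumed):

    A = randn(4); B = randn(4);
    norm(diag(A*B) - sum(A .* B', 2))              % no. 9: delta(AB) = (A o B')1
    abs(trace(A'*B) - ones(1,4)*(A.*B)*ones(4,1))  % no. 12: tr(A'B) = 1'(A o B)1
    D = A + A'; D(1:5:end) = 0;                    % symmetric hollow D
    H = B + B'; H(1:5:end) = 0;                    % symmetric hollow H
    V = eye(4) - ones(4)/4;
    abs(4*trace(-V*(D.*H)*V) - sum(sum(D.*H)))     % no. 13 with N = 4

Each printed value is 0 to machine precision.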


A.2 Semidefiniteness: domain of test

The most fundamental necessary, sufficient, and definitive test for positive semidefiniteness of matrix A ∈ R^{n×n} is: [126, §1]

    xᵀAx ≥ 0 for each and every x ∈ Rⁿ such that ‖x‖ = 1.        (1038)

Traditionally, authors demand evaluation over a broader domain; namely, over all x ∈ Rⁿ, which is sufficient but unnecessary. Indeed, that standard textbook requirement is far over-reaching because if xᵀAx is nonnegative for particular x = x_p, then it is nonnegative for any αx_p where α ∈ R. Thus, only normalized x in Rⁿ need be evaluated.

Many authors add the further requirement that the domain be complex; the broadest domain. By so doing, only Hermitian matrices (A^H = A where superscript H denotes conjugate transpose) A.2 are admitted to the set of positive semidefinite matrices (1041); an unnecessarily prohibitive condition.

A.2.1 Symmetry versus semidefiniteness

We call (1038) the most fundamental test of positive semidefiniteness. Yet some authors instead say, for real A and complex domain (x ∈ Cⁿ), the complex test x^H A x ≥ 0 is most fundamental. That complex broadening of the domain of test causes nonsymmetric real matrices to be excluded from the set of positive semidefinite matrices. Yet admitting nonsymmetric real matrices or not is a matter of preference A.3 unless that complex test is adopted, as we shall now explain.

Any real square matrix A has a representation in terms of its symmetric and antisymmetric parts; id est,

    A = (A + Aᵀ)/2 + (A − Aᵀ)/2        (43)

Because, for all real A, the antisymmetric part vanishes under real test,

    xᵀ((A − Aᵀ)/2)x = 0        (1039)

A.2 Hermitian symmetry is the complex analogue; the real part of a Hermitian matrix is symmetric while its imaginary part is antisymmetric. A Hermitian matrix has real eigenvalues and real main diagonal.
A.3 Golub & Van Loan [88, §4.2.2], for example, admit nonsymmetric real matrices.


only the symmetric part of A, (A + Aᵀ)/2, has a role determining positive semidefiniteness. Hence the oft-made presumption that only symmetric matrices may be positive semidefinite is, of course, erroneous under (1038). Because eigenvalue-signs of a symmetric matrix translate unequivocally to its semidefiniteness, the eigenvalues that determine semidefiniteness are always those of the symmetrized matrix. (§A.3) For that reason, and because symmetric (or Hermitian) matrices must have real eigenvalues, the convention adopted in the literature is that semidefinite matrices are synonymous with symmetric semidefinite matrices. Certainly misleading under (1038), that presumption is typically bolstered with compelling examples from the physical sciences where symmetric matrices occur within the mathematical exposition of natural phenomena. A.4 [76, §52]

Perhaps a better explanation of this pervasive presumption of symmetry comes from Horn & Johnson [125, §7.1] whose perspective A.5 is the complex matrix, thus necessitating the complex domain of test throughout their treatise. They explain, if A ∈ C^{n×n}

    ...and if x^H A x is real for all x ∈ Cⁿ, then A is Hermitian. Thus, the assumption that A is Hermitian is not necessary in the definition of positive definiteness. It is customary, however.

Their comment is best explained by noting, the real part of x^H A x comes from the Hermitian part (A + A^H)/2 of A;

    Re(x^H A x) = x^H ((A + A^H)/2) x        (1040)

rather,

    x^H A x ∈ R  ⇔  A^H = A        (1041)

because the imaginary part of x^H A x comes from the anti-Hermitian part (A − A^H)/2;

    Im(x^H A x) = x^H ((A − A^H)/2) x        (1042)

that vanishes for nonzero x if and only if A = A^H.

A.4 Symmetric matrices are certainly pervasive in our chosen subject as well.
A.5 A totally complex perspective is not necessarily more advantageous. The positive semidefinite cone, for example, is not self-dual (§2.13.5) in the ambient space of Hermitian matrices. [120, §II]


So the Hermitian symmetry assumption is unnecessary, according to Horn & Johnson, not because nonHermitian matrices could be regarded positive semidefinite, rather because nonHermitian (includes nonsymmetric real) matrices are not comparable on the real line under x^H A x. Yet that complex edifice is dismantled in the test of real matrices (1038) because the domain of test is no longer necessarily complex; meaning, xᵀAx will certainly always be real, regardless of symmetry, and so real A will always be comparable.

In summary, if we limit the domain of test to x in Rⁿ as in (1038), then nonsymmetric real matrices are admitted to the realm of semidefinite matrices because they become comparable on the real line. One important exception occurs for rank-one matrices Ψ = uvᵀ where u and v are real vectors: Ψ is positive semidefinite if and only if Ψ = uuᵀ. (§A.3.1.0.7) We might choose to expand the domain of test to x in Cⁿ so that only symmetric matrices would be comparable. The alternative to expanding the domain of test is to assume all matrices of interest to be symmetric; that is commonly done, hence the synonymous relationship with semidefinite matrices.

A.2.1.0.1 Example. Nonsymmetric positive definite product.
Horn & Johnson assert and Zhang agrees:

    If A, B ∈ C^{n×n} are positive definite, then we know that the product AB is positive definite if and only if AB is Hermitian. [125, §7.6, prob.10] [261, §6.2, §3.2]

Implicitly in their statement, A and B are assumed individually Hermitian and the domain of test is assumed complex. We prove that assertion to be false for real matrices under (1038) that adopts a real domain of test.

    A = Aᵀ = [ 3 0 −1 0 ; 0 5 1 0 ; −1 1 4 1 ; 0 0 1 4 ] ,   λ(A) = [5.9  4.5  3.4  2.0]ᵀ        (1043)

    B = Bᵀ = [ 4 4 −1 −1 ; 4 5 0 0 ; −1 0 5 1 ; −1 0 1 4 ] ,   λ(B) = [8.8  5.5  3.3  0.24]ᵀ        (1044)


    (AB)ᵀ ≠ AB = [ 13 12 −8 −4 ; 19 25 5 1 ; −5 1 22 9 ; −5 0 9 17 ] ,   λ(AB) = [36.  29.  10.  0.72]ᵀ        (1045)

    ½(AB + (AB)ᵀ) = [ 13 15.5 −6.5 −4.5 ; 15.5 25 3 0.5 ; −6.5 3 22 9 ; −4.5 0.5 9 17 ] ,   λ(½(AB + (AB)ᵀ)) = [36.  30.  10.  0.014]ᵀ        (1046)

Whenever A ∈ Sⁿ_+ and B ∈ Sⁿ_+, then λ(AB) = λ(A^{1/2}BA^{1/2}) will always be a nonnegative vector by (1067) and Corollary A.3.1.0.5. Yet positive definiteness of the product AB is certified instead by the nonnegative eigenvalues λ(½(AB + (AB)ᵀ)) in (1046) (§A.3.1.0.1) despite the fact AB is not symmetric. A.6 Horn & Johnson and Zhang resolve the anomaly by choosing to exclude nonsymmetric matrices and products; they do so by expanding the domain of test to Cⁿ.

A.6 It is a little more difficult to find a counterexample in R^{2×2} or R^{3×3}; which may have served to advance any confusion.
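The computation is easily reproduced in base Matlab (a sketch):

    A = [3 0 -1 0; 0 5 1 0; -1 1 4 1; 0 0 1 4];
    B = [4 4 -1 -1; 4 5 0 0; -1 0 5 1; -1 0 1 4];
    sort(eig(A*B),'descend')                   % cf. lambda(AB) in (1045)
    sort(eig((A*B + (A*B)')/2),'descend')      % cf. (1046): all positive
    norm(A*B - (A*B)','fro')                   % nonzero: AB is not symmetric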


A.3 Proper statements of positive semidefiniteness

Unlike Horn & Johnson and Zhang, we never adopt the complex domain of test in regard to real matrices. So motivated is our consideration of proper statements of positive semidefiniteness under real domain of test. This restriction, ironically, complicates the facts when compared to the corresponding statements for the complex case (found elsewhere, [125] [261]). We state several fundamental facts regarding positive semidefiniteness of real matrix A and the product AB and sum A + B of real matrices under fundamental real test (1038); a few require proof as they depart from the standard texts, while those remaining are well established or obvious.

A.3.0.0.1 Theorem. Positive (semi)definite matrix.
A ∈ S^M is positive semidefinite if and only if for each and every real vector x of unit norm, ‖x‖ = 1, A.7 we have xᵀAx ≥ 0 (1038);

    A ≽ 0  ⇔  tr(xxᵀA) = xᵀAx ≥ 0        (1047)

Matrix A ∈ S^M is positive definite if and only if for each and every ‖x‖ = 1 we have xᵀAx > 0;

    A ≻ 0  ⇔  tr(xxᵀA) = xᵀAx > 0        (1048)
⋄

Proof. Statements (1047) and (1048) are each a particular instance of dual generalized inequalities (§2.13.2) with respect to the positive semidefinite cone; videlicet, [233]

    A ≽ 0  ⇔  ⟨xxᵀ, A⟩ ≥ 0  ∀ xxᵀ (≽ 0)
    A ≻ 0  ⇔  ⟨xxᵀ, A⟩ > 0  ∀ xxᵀ (≽ 0), xxᵀ ≠ 0        (1049)

Relations (1047) and (1048) remain true when xxᵀ is replaced with "for each and every" X ∈ S^M_+ [38, §2.6.1] (§2.13.5) of unit norm ‖X‖ = 1 as in

    A ≽ 0  ⇔  tr(XA) ≥ 0  ∀ X ∈ S^M_+
    A ≻ 0  ⇔  tr(XA) > 0  ∀ X ∈ S^M_+ , X ≠ 0        (1050)

but this condition is far more than what is necessary. By the discrete membership theorem in §2.13.4.2.1, the extreme directions xxᵀ of the positive semidefinite cone constitute a minimal set of generators necessary and sufficient for discretization of dual generalized inequalities (1050) certifying membership to that cone. ∎

A.7 The traditional condition requiring all x ∈ R^M for defining positive (semi)definiteness is actually far more than what is necessary. The set of norm-1 vectors is necessary and sufficient to establish positive semidefiniteness; actually, any particular norm and any nonzero norm-constant will work.


A.3.1 Semidefiniteness, eigenvalues, nonsymmetric

When A ∈ R^{n×n}, let λ(½(A + Aᵀ)) ∈ Rⁿ denote eigenvalues of the symmetrized matrix A.8 arranged in nonincreasing order.

By positive semidefiniteness of A ∈ R^{n×n} we mean, A.9 [171, §1.3.1] (confer §A.3.1.0.1)

    xᵀAx ≥ 0 ∀x ∈ Rⁿ  ⇔  A + Aᵀ ≽ 0  ⇔  λ(A + Aᵀ) ≽ 0        (1051)

(§2.9.0.1)

    A ≽ 0  ⇒  Aᵀ = A        (1052)

    A ≽ B  ⇔  A − B ≽ 0  ⇏  A ≽ 0 or B ≽ 0        (1053)

    xᵀAx ≥ 0 ∀x  ⇏  Aᵀ = A        (1054)

Matrix symmetry is not intrinsic to positive semidefiniteness;

    Aᵀ = A, λ(A) ≽ 0  ⇒  xᵀAx ≥ 0 ∀x        (1055)

    λ(A) ≽ 0  ⇐  Aᵀ = A, xᵀAx ≥ 0 ∀x        (1056)

If Aᵀ = A then

    λ(A) ≽ 0  ⇔  A ≽ 0        (1057)

meaning, matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices if and only if its eigenvalues belong to the nonnegative orthant.

    ⟨A, A⟩ = ⟨λ(A), λ(A)⟩        (1058)

For µ ∈ R, A ∈ R^{n×n}, and vector λ(A) ∈ Cⁿ holding the ordered eigenvalues of A

    λ(I + µA) = 1 + λ(µA)        (1059)

Proof: A = MJM⁻¹ and I + µA = M(I + µJ)M⁻¹ where J is the Jordan form for A; [212, §5.6] id est, δ(J) = λ(A), so λ(I + µA) = δ(I + µJ) because I + µJ is also a Jordan form. ∎

A.8 The symmetrization of A is (A + Aᵀ)/2. λ(½(A + Aᵀ)) = λ(A + Aᵀ)/2.
A.9 Strang agrees [212, p.334] it is not λ(A) that requires observation. Yet he is mistaken by proposing the Hermitian part alone x^H(A + A^H)x be tested, because the anti-Hermitian part does not vanish under complex test unless A is Hermitian. (1042)


Similarly, λ(µI + A) = µ1 + λ(A). For vector σ(A) holding the singular values of any matrix A, σ(I + µAᵀA) = π(|1 + µσ(AᵀA)|) and σ(µI + AᵀA) = π(|µ1 + σ(AᵀA)|) where π is a nonlinear operator sorting its vector argument into nonincreasing order.

For A ∈ S^M and each and every ‖w‖ = 1 [125, §7.7, prob.9]

    wᵀAw ≤ µ  ⇔  A ≼ µI  ⇔  λ(A) ≼ µ1        (1060)

[125, §2.5.4] (confer (35))

    A is normal  ⇔  ‖A‖²_F = λ(A)ᵀλ(A)        (1061)

For A ∈ R^{m×n}

    AᵀA ≽ 0 ,  AAᵀ ≽ 0        (1062)

because, for dimensionally compatible vector x, xᵀAᵀAx = ‖Ax‖²₂ and xᵀAAᵀx = ‖Aᵀx‖²₂.

For A ∈ R^{n×n} and c ∈ R, tr(cA) = c tr(A) = c 1ᵀλ(A) (§A.1.1 no.6). Also, det A = Π_i λ(A)_i, and for m a nonnegative integer,

    tr(A^m) = Σ_{i=1}^n λ_i^m        (1063)

For A diagonalizable, (confer [212, p.255])

    rank A = rank δ(λ(A))        (1064)

meaning, rank is the same as the number of nonzero eigenvalues in vector λ(A) by the 0 eigenvalues theorem (§A.7.2.0.1).


(Fan) For A, B ∈ Sⁿ [36, §1.2] (confer (1326))

    tr(AB) ≤ λ(A)ᵀλ(B)        (1065)

with equality (Theobald) when A and B are simultaneously diagonalizable [125] with the same ordering of eigenvalues.

For A ∈ R^{m×n} and B ∈ R^{n×m}

    tr(AB) = tr(BA)        (1066)

and η eigenvalues of the product and commuted product are identical, including their multiplicity; [125, §1.3.20] id est,

    λ(AB)_{1:η} = λ(BA)_{1:η} ,   η ≜ min{m, n}        (1067)

Any eigenvalues remaining are zero. By the 0 eigenvalues theorem (§A.7.2.0.1),

    rank(AB) = rank(BA) ,  AB and BA diagonalizable        (1068)

For any compatible matrices A, B [125, §0.4]

    min{rank A, rank B} ≥ rank(AB)        (1069)

For A, B ∈ Sⁿ_+ (206)

    rank A + rank B ≥ rank(A + B) ≥ min{rank A, rank B} ≥ rank(AB)        (1070)

For A, B ∈ Sⁿ_+ linearly independent, each rank-1, (§B.1.1)

    rank A + rank B = rank(A + B) > min{rank A, rank B} ≥ rank(AB)        (1071)


For A ∈ R^{m×n} having no nullspace, and for any B ∈ R^{n×k}

    rank(AB) = rank(B)        (1072)

Proof. For any compatible matrix C, N(CAB) ⊇ N(AB) ⊇ N(B) is obvious. By assumption ∃ A†  A†A = I. Let C = A†; then N(AB) = N(B) and the stated result follows by conservation of dimension (1173). ∎

For A ∈ Sⁿ and any nonsingular matrix Y

    inertia(A) = inertia(YAYᵀ)        (1073)

a.k.a. Sylvester's law of inertia. (1114) [61, §2.4.3]

For A, B ∈ R^{n×n} square,

    det(AB) = det(BA)        (1074)

Yet for A ∈ R^{m×n} and B ∈ R^{n×m} [44, p.72]

    det(I + AB) = det(I + BA)        (1075)

For A, B ∈ Sⁿ, product AB is symmetric if and only if AB is commutative;

    (AB)ᵀ = AB  ⇔  AB = BA        (1076)

Proof. (⇒) Suppose AB = (AB)ᵀ. Then (AB)ᵀ = BᵀAᵀ = BA, so AB = (AB)ᵀ ⇒ AB = BA.
(⇐) Suppose AB = BA. Then BA = BᵀAᵀ = (AB)ᵀ, so AB = BA ⇒ AB = (AB)ᵀ. ∎

Commutativity alone is insufficient for symmetry of the product. [212, p.26]

Diagonalizable matrices A, B ∈ R^{n×n} commute if and only if they are simultaneously diagonalizable. [125, §1.3.12]


For A, B ∈ R^{n×n} and AB = BA

    xᵀAx ≥ 0, xᵀBx ≥ 0 ∀x  ⇒  λ(A+Aᵀ)_i λ(B+Bᵀ)_i ≥ 0 ∀i  ⇏  xᵀABx ≥ 0 ∀x        (1077)

the negative result arising because of the schism between the product of eigenvalues λ(A+Aᵀ)_i λ(B+Bᵀ)_i and the eigenvalues of the symmetrized matrix product λ(AB + (AB)ᵀ)_i. For example, X² is generally not positive semidefinite unless matrix X is symmetric; then (1062) applies. Simply substituting symmetric matrices changes the outcome:

For A, B ∈ Sⁿ and AB = BA

    A ≽ 0, B ≽ 0  ⇒  λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i  ⇔  AB ≽ 0        (1078)

Positive semidefiniteness of A and B is sufficient but not a necessary condition for positive semidefiniteness of the product AB.

Proof. Because all symmetric matrices are diagonalizable, (§A.5.2) [212, §5.6] we have A = SΛSᵀ and B = T∆Tᵀ, where Λ and ∆ are real diagonal matrices while S and T are orthogonal matrices. Because (AB)ᵀ = AB, then T must equal S, [125, §1.3] and the eigenvalues of A are ordered identically to those of B; id est, λ(A)_i = δ(Λ)_i and λ(B)_i = δ(∆)_i correspond to the same eigenvector.
(⇒) Assume λ(A)_i λ(B)_i ≥ 0 for i = 1…n. AB = SΛ∆Sᵀ is symmetric and has nonnegative eigenvalues contained in diagonal matrix Λ∆ by assumption; hence positive semidefinite by (1051). Now assume A, B ≽ 0. That, of course, implies λ(A)_i λ(B)_i ≥ 0 for all i because all the individual eigenvalues are nonnegative.
(⇐) Suppose AB = SΛ∆Sᵀ ≽ 0. Then Λ∆ ≽ 0 by (1051), and so all products λ(A)_i λ(B)_i must be nonnegative; meaning, sgn(λ(A)) = sgn(λ(B)). We may, therefore, conclude nothing about the semidefiniteness of A and B. ∎

For A, B ∈ Sⁿ and A ≽ 0, B ≽ 0 (Example A.2.1.0.1)

    AB = BA  ⇒  λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i  ⇒  AB ≽ 0        (1079)

    AB = BA  ⇒  λ(AB)_i ≥ 0, λ(A)_i λ(B)_i ≥ 0 ∀i  ⇔  AB ≽ 0        (1080)


For A, B ∈ Sⁿ [261, §6.2]

    A ≽ 0  ⇒  tr A ≥ 0        (1081)

    A ≽ 0, B ≽ 0  ⇒  tr A tr B ≥ tr(AB) ≥ 0        (1082)

Because A ≽ 0, B ≽ 0 ⇒ λ(AB) = λ(A^{1/2}BA^{1/2}) ≽ 0 by (1067) and Corollary A.3.1.0.5, then we have tr(AB) ≥ 0.

For A, B, C ∈ Sⁿ (Löwner)

    A ≽ 0  ⇔  tr(AB) ≥ 0  ∀ B ≽ 0        (299)
    A ≼ B , B ≼ C  ⇒  A ≼ C        (1083)
    A ≼ B  ⇔  A + C ≼ B + C        (1084)
    A ≼ B , A ≽ B  ⇒  A = B        (1085)

For A, B ∈ R^{n×n}

    xᵀAx ≥ xᵀBx ∀x  ⇒  tr A ≥ tr B        (1086)

Proof. xᵀAx ≥ xᵀBx ∀x ⇔ λ((A − B) + (A − B)ᵀ)/2 ≽ 0 ⇒ tr(A + Aᵀ − (B + Bᵀ))/2 = tr(A − B) ≥ 0. There is no converse. ∎

For A, B ∈ Sⁿ [261, §6.2, prob.1] (Theorem A.3.1.0.4)

    A ≽ B  ⇒  tr A ≥ tr B        (1087)
    A ≽ B  ⇒  δ(A) ≽ δ(B)        (1088)

There is no converse, and restriction to the positive semidefinite cone does not improve the situation. The all-strict versions hold. From [261, §6.2]

    A ≽ B ≽ 0  ⇒  rank A ≥ rank B        (1089)
    A ≽ B ≽ 0  ⇒  det A ≥ det B ≥ 0        (1090)
    A ≻ B ≽ 0  ⇒  det A > det B ≥ 0        (1091)

For A, B ∈ int Sⁿ_+ [24, §4.2] [125, §7.7.4]

    A ≽ B  ⇔  A⁻¹ ≼ B⁻¹        (1092)


For A, B ∈ Sⁿ [261, §6.2]

    A ≽ B ≽ 0  ⇒  A^{1/2} ≽ B^{1/2}        (1093)

For A, B ∈ Sⁿ and AB = BA [261, §6.2, prob.3]

    A ≽ B ≽ 0  ⇒  A^k ≽ B^k ,  k = 1, 2, …        (1094)

A.3.1.0.1 Theorem. Positive semidefinite ordering of eigenvalues.
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ(½(A+Aᵀ)), λ(½(B+Bᵀ)) ∈ R^M. Then, [212, §6]

    xᵀAx ≥ 0 ∀x  ⇔  λ(A + Aᵀ) ≽ 0        (1095)
    xᵀAx > 0 ∀x ≠ 0  ⇔  λ(A + Aᵀ) ≻ 0        (1096)

because xᵀ(A − Aᵀ)x = 0. (1039) Now arrange the entries of λ(½(A+Aᵀ)) and λ(½(B+Bᵀ)) in nonincreasing order so λ(½(A+Aᵀ))₁ holds the largest eigenvalue of symmetrized A while λ(½(B+Bᵀ))₁ holds the largest eigenvalue of symmetrized B, and so on. Then [125, §7.7, prob.1, prob.9] for κ ∈ R

    xᵀAx ≥ xᵀBx ∀x  ⇒  λ(A + Aᵀ) ≽ λ(B + Bᵀ)
    xᵀAx ≥ κ xᵀIx ∀x  ⇔  λ(½(A + Aᵀ)) ≽ κ1        (1097)

Now let A, B ∈ S^M have diagonalizations A = QΛQᵀ and B = UΥUᵀ with λ(A) = δ(Λ) and λ(B) = δ(Υ) arranged in nonincreasing order. Then

    A ≽ B  ⇔  λ(A − B) ≽ 0        (1098)
    A ≽ B  ⇒  λ(A) ≽ λ(B)        (1099)
    A ≽ B  ⇍  λ(A) ≽ λ(B)        (1100)
    SᵀAS ≽ B  ⇐  λ(A) ≽ λ(B)        (1101)

where S = QUᵀ. [261, §7.5] ⋄
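Two of the facts above admit a one-line numeric check in base Matlab (a sketch; random data assumed):

    A = randn(4); A = A + A';  B = randn(4); B = B + B';
    lA = sort(eig(A),'descend');  lB = sort(eig(B),'descend');
    trace(A*B) <= lA'*lB + 1e-9       % Fan's inequality (1065): always true
    F = randn(3,5);  G = randn(5,3);
    eig(F*G)                          % 3 eigenvalues...
    eig(G*F)                          % ...reappear here, padded by 2 zeros, per (1067)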


A.3.1.0.2 Theorem. (Weyl) Eigenvalues of sum. [125, §4.3]
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ(½(A+Aᵀ)), λ(½(B+Bᵀ)) ∈ R^M in nonincreasing order so λ(½(A+Aᵀ))₁ holds the largest eigenvalue of symmetrized A while λ(½(B+Bᵀ))₁ holds the largest eigenvalue of symmetrized B, and so on. Then, for any k ∈ {1…M}

    λ(A+Aᵀ)_k + λ(B+Bᵀ)_M  ≤  λ((A+Aᵀ) + (B+Bᵀ))_k  ≤  λ(A+Aᵀ)_k + λ(B+Bᵀ)₁        (1102)
⋄

Weyl's theorem establishes positive semidefiniteness of a sum of positive semidefinite matrices. Because S^M_+ is a convex cone (§2.9.0.0.1), then by (138)

    A, B ≽ 0  ⇒  ζA + ξB ≽ 0  for all ζ, ξ ≥ 0        (1103)

A.3.1.0.3 Corollary. Eigenvalues of sum and difference. [125, §4.3]
For A ∈ S^M and B ∈ S^M_+, place the eigenvalues of each matrix into the respective vectors λ(A), λ(B) ∈ R^M in nonincreasing order so λ(A)₁ holds the largest eigenvalue of A while λ(B)₁ holds the largest eigenvalue of B, and so on. Then, for any k ∈ {1…M}

    λ(A − B)_k  ≤  λ(A)_k  ≤  λ(A + B)_k        (1104)
⋄

When B is rank-one positive semidefinite, the eigenvalues interlace; id est, for B = qqᵀ

    λ(A)_{k−1} ≤ λ(A − qqᵀ)_k ≤ λ(A)_k ≤ λ(A + qqᵀ)_k ≤ λ(A)_{k+1}        (1105)


A.3.1.0.4 Theorem. Positive (semi)definite principal submatrices. A.10

A ∈ S^M is positive semidefinite if and only if all M principal submatrices of dimension M−1 are positive semidefinite and det A is nonnegative.

A ∈ S^M is positive definite if and only if any one principal submatrix of dimension M−1 is positive definite and det A is positive. ⋄

If any one principal submatrix of dimension M−1 is not positive definite, conversely, then A can neither be. Regardless of symmetry, if A ∈ R^{M×M} is positive (semi)definite, then the determinant of each and every principal submatrix is (nonnegative) positive. [171, §1.3.1]

A.3.1.0.5 Corollary. Positive (semi)definite symmetric products.

If A ∈ S^M is positive definite and any particular dimensionally compatible matrix Z has no nullspace, then ZᵀAZ is positive definite.

If matrix A ∈ S^M is positive (semi)definite then, for any matrix Z of compatible dimension, ZᵀAZ is positive semidefinite.

A ∈ S^M is positive (semi)definite if and only if there exists a nonsingular Z such that ZᵀAZ is positive (semi)definite. [125, p.399]

If A ∈ S^M is positive semidefinite and singular, it remains possible, for some skinny Z ∈ R^{M×N} with N < M, that ZᵀAZ becomes positive definite. ⋄


We can deduce from these, given nonsingular matrix Z and any particular dimensionally compatible Y: matrix A ∈ S^M is positive semidefinite if and only if [Z Y]ᵀ A [Z Y] is positive semidefinite. Products such as Z†Z and ZZ† are symmetric and positive semidefinite although, given A ≽ 0, Z†AZ and ZAZ† are neither necessarily symmetric nor positive semidefinite.

A.3.1.0.6 Theorem. Symmetric projector semidefinite. [15, §III] [16, §6] [137, p.55]
For symmetric idempotent matrices P and R

    P, R ≽ 0
    P ≽ R  ⇔  R(P) ⊇ R(R)  ⇔  N(P) ⊆ N(R)        (1106)

Projector P is never positive definite [214, §6.5, prob.20] unless it is the identity matrix. ⋄

A.3.1.0.7 Theorem. Symmetric positive semidefinite.
Given real matrix Ψ with rank Ψ = 1,

    Ψ ≽ 0  ⇔  Ψ = uuᵀ        (1107)

where u is a real vector. ⋄

Proof. Any rank-one matrix must have the form Ψ = uvᵀ. (§B.1) Suppose Ψ is symmetric; id est, v = u. For all y ∈ R^M, yᵀuuᵀy ≥ 0. Conversely, suppose uvᵀ is positive semidefinite. We know that can hold if and only if uvᵀ + vuᵀ ≽ 0 ⇔ for all normalized y ∈ R^M, 2yᵀuvᵀy ≥ 0; but that is possible only if v = u. ∎

The same does not hold true for matrices of higher rank, as Example A.2.1.0.1 shows.


A.4 Schur complement

Consider the block matrix G: Given Aᵀ = A and Cᵀ = C, then [37]

    G = [ A  B ; Bᵀ  C ] ≽ 0
    ⇔  A ≽ 0 ,  Bᵀ(I − AA†) = 0 ,  C − BᵀA†B ≽ 0        (1108)
    ⇔  C ≽ 0 ,  B(I − CC†) = 0 ,  A − BC†Bᵀ ≽ 0

where A† denotes the Moore-Penrose (pseudo)inverse (§E). In the first instance, I − AA† is a symmetric projection matrix orthogonally projecting on N(Aᵀ). (1471) It is apparently required

    R(B) ⊥ N(Aᵀ)        (1109)

which precludes A = 0 when B is any nonzero matrix. Note that A ≻ 0 ⇒ A† = A⁻¹; thereby, the projection matrix vanishes. Likewise, in the second instance, I − CC† projects orthogonally on N(Cᵀ). It is required

    R(Bᵀ) ⊥ N(Cᵀ)        (1110)

which precludes C = 0 for B nonzero. Again, C ≻ 0 ⇒ C† = C⁻¹. So we get, for A or C nonsingular,

    G = [ A  B ; Bᵀ  C ] ≽ 0
    ⇔  A ≻ 0 ,  C − BᵀA⁻¹B ≽ 0
        or
        C ≻ 0 ,  A − BC⁻¹Bᵀ ≽ 0        (1111)

When A is full-rank then, for all B of compatible dimension, R(B) is in R(A). Likewise, when C is full-rank, R(Bᵀ) is in R(C). Thus the flavor, for A and C nonsingular,

    G = [ A  B ; Bᵀ  C ] ≻ 0
    ⇔  A ≻ 0 ,  C − BᵀA⁻¹B ≻ 0
    ⇔  C ≻ 0 ,  A − BC⁻¹Bᵀ ≻ 0        (1112)

where C − BᵀA⁻¹B is called the Schur complement of A in G, while the Schur complement of C in G is A − BC⁻¹Bᵀ. [82, §4.8]
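A base-Matlab sketch of condition (1112) (random data; the 0.5 offset is an arbitrary choice made so the Schur complement is positive definite):

    n = 4; m = 3;
    A = randn(n); A = A*A' + eye(n);     % A > 0
    B = randn(n,m);
    C = B'*(A\B) + 0.5*eye(m);           % then C - B'*inv(A)*B = 0.5*I > 0
    G = [A B; B' C];
    min(eig(G)) > 0                      % true: G > 0, as (1112) predicts
    min(eig(C - B'*(A\B))) > 0           % the Schur complement of A in G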


A.4.0.0.1 Example. Sparse Schur conditions.
Setting matrix A to the identity simplifies the Schur conditions. One consequence relates the definiteness of three quantities:

    [ I  B ; Bᵀ  C ] ≽ 0  ⇔  C − BᵀB ≽ 0  ⇔  [ I  0 ; 0ᵀ  C − BᵀB ] ≽ 0        (1113)

Origin of the term Schur complement is from complementary inertia: [61, §2.4.4] Define

    inertia(G ∈ S^M) ≜ {p, z, n}        (1114)

where p, z, n respectively represent the number of positive, zero, and negative eigenvalues of G; id est,

    M = p + z + n        (1115)

Then, when C is invertible,

    inertia(G) = inertia(C) + inertia(A − BC⁻¹Bᵀ)        (1116)

and when A is invertible,

    inertia(G) = inertia(A) + inertia(C − BᵀA⁻¹B)        (1117)

When A = C = 0, denoting by σ(B) ∈ R^m the nonincreasingly ordered singular values of matrix B ∈ R^{m×m}, then we have the eigenvalues [36, §1.2, prob.17]

    λ(G) = λ([ 0  B ; Bᵀ  0 ]) = [ σ(B) ; −Ξσ(B) ]        (1118)

and

    inertia(G) = inertia(BᵀB) + inertia(−BᵀB)        (1119)

where Ξ is the order-reversing permutation matrix defined in (1313).


A.4.0.0.2 Theorem. Rank of partitioned matrices.
When symmetric matrix A is invertible and C is symmetric,

    rank [ A  B ; Bᵀ  C ] = rank [ A  0 ; 0ᵀ  C − BᵀA⁻¹B ]        (1120)

equals rank of a block on the main diagonal plus rank of its Schur complement. Similarly, when symmetric matrix C is invertible and A is symmetric,

    rank [ A  B ; Bᵀ  C ] = rank [ A − BC⁻¹Bᵀ  0 ; 0ᵀ  C ]        (1121)
⋄

Proof. The first assertion (1120) holds if and only if [125, §0.4.6(c)]

    ∃ nonsingular X, Y   X [ A  B ; Bᵀ  C ] Y = [ A  0 ; 0ᵀ  C − BᵀA⁻¹B ]        (1122)

Let [125, §7.7.6]

    Y = Xᵀ = [ I  −A⁻¹B ; 0ᵀ  I ]        (1123)
∎

A.4.0.0.3 Lemma. Rank of Schur-form block. [74] [72]
Matrix B ∈ R^{m×n} has rank B ≤ ρ if and only if there exist matrices A ∈ S^m and C ∈ Sⁿ such that

    rank A + rank C ≤ 2ρ   and   [ A  B ; Bᵀ  C ] ≽ 0        (1124)
⋄

A.4.1 Semidefinite program via Schur

Schur complement (1108) can be used to convert a projection problem to an optimization problem in epigraph form. Suppose, for example, we are presented with the constrained projection problem studied by Hayden & Wells in [109] (who provide an analytical solution):


Given A ∈ R^{M×M} and some full-rank matrix S ∈ R^{M×L} with L < M,

    minimize_{X∈S^M}   ‖A − X‖²_F
    subject to   SᵀXS ≽ 0        (1125)

Variable X is constrained to be positive semidefinite, but only on a subspace determined by S. First we write the epigraph form (§3.1.1.3):

    minimize_{X∈S^M, t∈R}   t
    subject to   ‖A − X‖²_F ≤ t        (1126)
                 SᵀXS ≽ 0

Next we use the Schur complement [174, §6.4.3] [153] and matrix vectorization (§2.2):

    minimize_{X∈S^M, t∈R}   t
    subject to   [ tI  vec(A − X) ; vec(A − X)ᵀ  1 ] ≽ 0        (1127)
                 SᵀXS ≽ 0

This semidefinite program is an epigraph form in disguise, equivalent to (1125). Were problem (1125) instead equivalently expressed without the square

    minimize_{X∈S^M}   ‖A − X‖_F
    subject to   SᵀXS ≽ 0        (1128)

then we get a subtle variation:

    minimize_{X∈S^M, t∈R}   t
    subject to   ‖A − X‖_F ≤ t        (1129)
                 SᵀXS ≽ 0

that leads to an equivalent for (1128)

    minimize_{X∈S^M, t∈R}   t
    subject to   [ tI  vec(A − X) ; vec(A − X)ᵀ  t ] ≽ 0        (1130)
                 SᵀXS ≽ 0
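A sketch in base Matlab (illustration only; A and X here are random symmetric stand-ins, not the Hayden & Wells data) confirming that the Schur block in (1130) certifies the norm constraint ‖A − X‖_F ≤ t:

    M = 4;
    A = randn(M); A = (A+A')/2;  X = randn(M); X = (X+X')/2;
    v = reshape(A - X, [], 1);            % vec(A - X)
    t = norm(v) + 0.1;                    % any t >= ||A - X||_F
    G = [t*eye(numel(v)) v; v' t];        % the block in (1130)
    min(eig(G)) >= -1e-10                 % true: G >= 0
    t = norm(v) - 0.1;                    % shrink t below the norm...
    G = [t*eye(numel(v)) v; v' t];
    min(eig(G)) >= -1e-10                 % ...and the block fails the test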


A.4.1.0.1 Example. Schur anomaly.
Consider the problem, abstract in the convex constraint, given symmetric matrix A

    minimize_{X∈S^M}   ‖X‖²_F − ‖A − X‖²_F
    subject to   X ∈ C        (1131)

a difference of two functions convex in matrix X. Observe equality

    ‖X‖²_F − ‖A − X‖²_F = 2 tr(XA) − ‖A‖²_F        (1132)

so problem (1131) is equivalent to the convex optimization

    minimize_{X∈S^M}   tr(XA)
    subject to   X ∈ C        (1133)

But this problem (1131) does not have a Schur form

    minimize_{X∈S^M, α, t}   t − α
    subject to   X ∈ C
                 ‖X‖²_F ≤ t        (1134)
                 ‖A − X‖²_F ≥ α

because the constraint in α is nonconvex. (§2.9.1.0.1)

A.4.2 Determinant

    G = [ A  B ; Bᵀ  C ]        (1135)

We consider again a matrix G partitioned similarly to (1108), but not necessarily positive (semi)definite, where A and C are symmetric.


When A is invertible,

    det G = det A det(C − BᵀA⁻¹B)        (1136)

When C is invertible,

    det G = det C det(A − BC⁻¹Bᵀ)        (1137)

When B is full-rank and skinny, C = 0, and A ≽ 0, then [38, §10.1.1]

    det G ≠ 0  ⇔  A + BBᵀ ≻ 0        (1138)

When B is a (column) vector, then for all C ∈ R and all A of dimension compatible with G

    det G = det(A)C − Bᵀ A_cofᵀ B        (1139)

while for C ≠ 0

    det G = C det(A − (1/C)BBᵀ)        (1140)

where A_cof is the matrix of cofactors [212, §4] corresponding to A.

When B is full-rank and fat, A = 0, and C ≽ 0, then

    det G ≠ 0  ⇔  C + BᵀB ≻ 0        (1141)

When B is a row vector, then for A ≠ 0 and all C of dimension compatible with G

    det G = A det(C − (1/A)BᵀB)        (1142)

while for all A ∈ R

    det G = det(C)A − B C_cofᵀ Bᵀ        (1143)

where C_cof is the matrix of cofactors corresponding to C.


A.5 eigen decomposition

When a square matrix X ∈ R^{m×m} is diagonalizable, [212, §5.6] then

    X = SΛS⁻¹ = [ s₁ ⋯ s_m ] Λ [ w₁ᵀ ; ⋮ ; w_mᵀ ] = Σ_{i=1}^m λ_i s_i w_iᵀ        (1144)

where s_i ∈ C^m are linearly independent (right-)eigenvectors A.12 constituting the columns of S ∈ C^{m×m} defined by

    XS = SΛ        (1145)

w_iᵀ ∈ C^m are linearly independent left-eigenvectors of X constituting the rows of S⁻¹ defined by [125]

    S⁻¹X = ΛS⁻¹        (1146)

and where λ_i ∈ C are eigenvalues (populating diagonal matrix Λ ∈ C^{m×m}) corresponding to both left and right eigenvectors.

There is no connection between diagonalizability and invertibility of X. [212, §5.2] Diagonalizability is guaranteed by a full set of linearly independent eigenvectors, whereas invertibility is guaranteed by all nonzero eigenvalues.

    distinct eigenvalues  ⇒  l.i. eigenvectors  ⇔  diagonalizable
    not diagonalizable  ⇒  repeated eigenvalue        (1147)

A.5.0.0.1 Theorem. Real eigenvector.
Eigenvectors of a real matrix corresponding to real eigenvalues must be real. ⋄

Proof. Ax = λx. Given λ = λ*, x^H A x = λ x^H x = λ‖x‖² = xᵀAx*  ⇒  x = x*, where x^H = x*ᵀ. The converse is equally simple. ∎

A.5.0.1 Uniqueness

From the fundamental theorem of algebra it follows: eigenvalues, including their multiplicity, for a given square matrix are unique; meaning, there is no other set of eigenvalues for that matrix. (Conversely, many different matrices may share the same unique set of eigenvalues.) Uniqueness of eigenvectors, in contrast, disallows multiplicity of the same direction.

A.12 Eigenvectors must, of course, be nonzero. The prefix eigen is from the German; in this context meaning, something akin to "characteristic". [209, p.14]


A.5.0.1.1 Definition. Unique eigenvectors.
When eigenvectors are unique, we mean unique to within a real nonzero scaling and their directions are distinct. △

If S is a matrix of eigenvectors of X as in (1144), for example, then −S is certainly another matrix of eigenvectors decomposing X with the same eigenvalues.

For any square matrix, [209, p.220]

    distinct eigenvalues  ⇒  eigenvectors unique        (1148)

Eigenvectors corresponding to a repeated eigenvalue are not unique for a diagonalizable matrix;

    repeated eigenvalue  ⇒  eigenvectors not unique        (1149)

Proof follows from the observation: any linear combination of distinct eigenvectors of diagonalizable X, corresponding to a particular eigenvalue, produces another eigenvector. For eigenvalue λ whose multiplicity A.13 dim N(X − λI) exceeds 1, in other words, any choice of independent vectors from N(X − λI) (of the same multiplicity) constitutes eigenvectors corresponding to λ.

Caveat diagonalizability insures linear independence which implies existence of distinct eigenvectors. We may conclude, for diagonalizable matrices,

    distinct eigenvalues  ⇔  eigenvectors unique        (1150)

A.5.1 Eigenmatrix

The (right-)eigenvectors {s_i} are naturally orthogonal to the left-eigenvectors {w_i} except, for i = 1…m, w_iᵀ s_i = 1; called a biorthogonality condition [238, §2.2.4] [125] because neither set of left or right eigenvectors is necessarily an orthogonal set. Consequently, each dyad from a diagonalization is an independent (§B.1.1) nonorthogonal projector because

A.13 For a diagonalizable matrix, algebraic multiplicity is the same as geometric multiplicity. [209, p.15]


    s_i w_iᵀ s_i w_iᵀ = s_i w_iᵀ        (1151)

(whereas the dyads of singular value decomposition are not inherently projectors (confer (1155))). The dyads of eigen decomposition can be termed eigenmatrices because

    X s_i w_iᵀ = λ_i s_i w_iᵀ        (1152)

A.5.2 Symmetric matrix diagonalization

The set of normal matrices is, precisely, that set of all real matrices having a complete orthonormal set of eigenvectors; [261, §8.1] [214, prob.10.2.31] e.g., orthogonal and circulant matrices [96]. All normal matrices are diagonalizable. A symmetric matrix is a special normal matrix whose eigenvalues must be real and whose eigenvectors can be chosen to make a real orthonormal set; [214, §6.4] [212, p.315] id est, for X ∈ S^m

    X = SΛSᵀ = [ s₁ ⋯ s_m ] Λ [ s₁ᵀ ; ⋮ ; s_mᵀ ] = Σ_{i=1}^m λ_i s_i s_iᵀ        (1153)

where δ²(Λ) = Λ ∈ R^{m×m} (§A.1) and S⁻¹ = Sᵀ ∈ R^{m×m} (§B.5) because of the symmetry: SΛS⁻¹ = S⁻ᵀΛSᵀ. Because the arrangement of eigenvectors and their corresponding eigenvalues is arbitrary, we almost always arrange the eigenvalues in nonincreasing order, as is the convention for singular value decomposition.

A.5.2.1 Positive semidefinite matrix square root

When X ∈ S^m_+, its unique positive semidefinite matrix square root is defined

    √X ≜ S√Λ Sᵀ ∈ S^m_+        (1154)

where the square root of nonnegative diagonal matrix √Λ is taken entrywise and positive. Then X = √X √X.
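A minimal base-Matlab sketch of the decomposition (1144)-(1146) and the biorthogonality condition from §A.5.1 (a random matrix is, almost surely, diagonalizable):

    X = randn(4);
    [S, L] = eig(X);         % right eigenvectors: X*S = S*L, as in (1145)
    W = inv(S);              % rows of inv(S) are the left eigenvectors w_i'
    norm(W*X - L*W)          % ~ 0: the left relation (1146)
    norm(X - S*L*W)          % ~ 0: reconstruction (1144)
    abs(W(2,:)*S(:,2) - 1)   % ~ 0: biorthogonality, w_i' * s_i = 1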


A.6 Singular value decomposition, SVD

A.6.1 Compact SVD

[88, §2.5.4] For any A ∈ R^{m×n}

    A = UΣQᵀ = [ u₁ ⋯ u_η ] Σ [ q₁ᵀ ; ⋮ ; q_ηᵀ ] = Σ_{i=1}^η σ_i u_i q_iᵀ        (1155)
    U ∈ R^{m×η} ,  Σ ∈ R^{η×η} ,  Q ∈ R^{n×η}

where U and Q are always skinny-or-square, each having orthonormal columns, and where

    η ≜ min{m, n}        (1156)

Square matrix Σ is diagonal (§A.1.1)

    δ²(Σ) = Σ ∈ R^{η×η}        (1157)

holding the singular values σ_i of A which are always arranged in nonincreasing order by convention and are related to eigenvalues by A.14

    σ(A)_i = { √λ(AᵀA)_i = √λ(AAᵀ)_i = λ(√(AᵀA))_i = λ(√(AAᵀ))_i > 0 ,  i = 1…ρ
             { 0 ,  i = ρ+1…η        (1158)

of which the last η − ρ are 0, A.15 where

    ρ ≜ rank A = rank Σ        (1159)

A point sometimes lost: any real matrix may be decomposed in terms of its real singular values σ(A) ∈ R^η and real matrices U and Q as in (1155), where [88, §2.5.3]

    R{u_i | σ_i ≠ 0} = R(A)
    R{u_i | σ_i = 0} ⊆ N(Aᵀ)
    R{q_i | σ_i ≠ 0} = R(Aᵀ)
    R{q_i | σ_i = 0} ⊆ N(A)        (1160)

A.14 When A is normal, σ(A) = |λ(A)|. [261, §8.1]
A.15 For η = n, σ(A) = √λ(AᵀA) = λ(√(AᵀA)), where λ denotes eigenvalues. For η = m, σ(A) = √λ(AAᵀ) = λ(√(AAᵀ)).
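A base-Matlab sketch of the compact SVD (1155) and relation (1158) (names local to the sketch):

    A = randn(5,3);
    [U,S,Q] = svd(A,'econ');            % compact SVD (1155)
    norm(A - U*S*Q')                    % ~ 0
    s = svd(A);                         % nonincreasing by convention
    sqrt(sort(eig(A'*A),'descend'))     % equals s, per (1158)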


A.6.2 Subcompact SVD

Some authors allow only nonzero singular values. In that case the compact decomposition can be made smaller; it can be redimensioned in terms of rank ρ because, for any A ∈ R^{m×n}

    ρ = rank A = rank Σ = max{i ∈ {1…η} | σ_i ≠ 0} ≤ η        (1161)

There are η singular values. Rank is equivalent to the number of nonzero singular values, on the main diagonal of Σ for any flavor SVD, as is well known. Now

    A = UΣQᵀ = [ u₁ ⋯ u_ρ ] Σ [ q₁ᵀ ; ⋮ ; q_ρᵀ ] = Σ_{i=1}^ρ σ_i u_i q_iᵀ        (1162)
    U ∈ R^{m×ρ} ,  Σ ∈ R^{ρ×ρ} ,  Q ∈ R^{n×ρ}

where the main diagonal of diagonal matrix Σ has no 0 entries, and

    R{u_i} = R(A)
    R{q_i} = R(Aᵀ)        (1163)

A.6.3 Full SVD

Another common and useful expression of the SVD makes U and Q square; making the decomposition larger than compact SVD. Completing the nullspace bases in U and Q from (1160) provides what is called the full singular value decomposition of A ∈ R^{m×n} [212, App.A]. Orthonormal matrices U and Q become orthogonal matrices (§B.5):

    R{u_i | σ_i ≠ 0} = R(A)
    R{u_i | σ_i = 0} = N(Aᵀ)
    R{q_i | σ_i ≠ 0} = R(Aᵀ)
    R{q_i | σ_i = 0} = N(A)        (1164)


For any matrix A having rank ρ (= rank Σ)

    A = UΣQᵀ = [ u₁ ⋯ u_m ] Σ [ q₁ᵀ ; ⋮ ; q_nᵀ ] = Σ_{i=1}^η σ_i u_i q_iᵀ

      = [ m×ρ basis R(A)   m×(m−ρ) basis N(Aᵀ) ] δ(σ₁, σ₂, …) [ (n×ρ basis R(Aᵀ))ᵀ ; (n×(n−ρ) basis N(A))ᵀ ]

    U ∈ R^{m×m} ,  Σ ∈ R^{m×n} ,  Q ∈ R^{n×n}        (1165)

where upper limit of summation η is defined in (1156). Matrix Σ is no longer necessarily square, now padded with respect to (1157) by m−η zero rows or n−η zero columns; the nonincreasingly ordered (possibly 0) singular values appear along its main diagonal as for compact SVD (1158).

An important geometrical interpretation of SVD is given in Figure 91 for m = n = 2: the image of the unit sphere under any m×n matrix multiplication is an ellipse. Considering the three factors of the SVD separately, note that Qᵀ is a pure rotation of the circle. Figure 91 shows how the axes q₁ and q₂ are first rotated by Qᵀ to coincide with the coordinate axes. Second, the circle is stretched by Σ in the directions of the coordinate axes to form an ellipse. The third step rotates the ellipse by U into its final position. Note how q₁ and q₂ are rotated to end up as u₁ and u₂, the principal axes of the final ellipse. A direct calculation shows that Aq_j = σ_j u_j. Thus q_j is first rotated to coincide with the j-th coordinate axis, stretched by a factor σ_j, and then rotated to point in the direction of u_j. All of this is beautifully illustrated for 2×2 matrices by the Matlab code eigshow.m (see [211]).

A direct consequence of the geometric interpretation is that the largest singular value σ₁ measures the "magnitude" of A (its 2-norm):

    ‖A‖₂ = sup_{‖x‖₂=1} ‖Ax‖₂ = σ₁        (1166)

This means that ‖A‖₂ is the length of the longest principal semiaxis of the ellipse.


[Figure 91: Geometrical interpretation of full SVD [170]: Image of circle {x ∈ R² | ‖x‖₂ = 1} under matrix multiplication Ax is, in general, an ellipse. For the example illustrated, U ≜ [u₁ u₂] ∈ R^{2×2}, Q ≜ [q₁ q₂] ∈ R^{2×2}.]


Expressions for U, Q, and Σ follow readily from (1165),

    AAᵀU = UΣΣᵀ   and   AᵀAQ = QΣᵀΣ        (1167)

demonstrating that the columns of U are the eigenvectors of AAᵀ and the columns of Q are the eigenvectors of AᵀA. (Neil Muller et alii [170])

A.6.4 SVD of symmetric matrices

A.6.4.0.1 Definition. Step function. (confer §6.3.2.0.1)
Define the signum-like entrywise vector-valued function ψ : Rⁿ → Rⁿ that takes the value 1 corresponding to an entry-value 0 in the argument:

    ψ(a) ≜ [ lim_{x_i→a_i} x_i/|x_i| = { 1, a_i ≥ 0 ; −1, a_i < 0 } ,  i = 1…n ] ∈ Rⁿ        (1168)
△

Eigenvalue signs of a symmetric matrix having diagonalization A = SΛSᵀ (1153) can be absorbed either into real U or real Q from the full SVD; [226, p.34] (confer §C.5.2.1)

    A = SΛSᵀ = Sδ(ψ(δ(Λ))) |Λ| Sᵀ ≜ U ΣQᵀ ∈ Sⁿ        (1169)

or

    A = SΛSᵀ = S|Λ| δ(ψ(δ(Λ))) Sᵀ ≜ UΣ Qᵀ ∈ Sⁿ        (1170)

where |Λ| denotes entrywise absolute value of diagonal matrix Λ.

A.6.5 Pseudoinverse by SVD

Matrix pseudoinverse (§E) is nearly synonymous with singular value decomposition because of the elegant expression, given A = UΣQᵀ,

    A† = QΣ†ᵀUᵀ ∈ R^{n×m}        (1171)

that applies to all three flavors of SVD, where Σ† simply inverts nonzero entries of matrix Σ.

Given symmetric matrix A ∈ Sⁿ and its diagonalization A = SΛSᵀ (§A.5.2), its pseudoinverse simply inverts all nonzero eigenvalues;

    A† = SΛ†Sᵀ        (1172)
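The pseudoinverse formula (1171) is easy to exercise in base Matlab (a sketch; the tolerance 1e-10 is an arbitrary choice):

    A = randn(4,6);  A(4,:) = A(1,:);        % force rank deficiency
    [U,S,Q] = svd(A);
    s = diag(S);  si = zeros(size(s));
    si(s > 1e-10) = 1./s(s > 1e-10);         % invert only nonzero sigma
    Sp = zeros(size(A'));  Sp(1:numel(si),1:numel(si)) = diag(si);
    norm(Q*Sp*U' - pinv(A))                  % ~ 0, agreeing with (1171)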


A.7 Zeros

A.7.1 0 entry

If a positive semidefinite matrix A = [A_ij] ∈ R^{n×n} has a 0 entry A_ii on its main diagonal, then A_ij + A_ji = 0 ∀j. [171, §1.3.1]

Any symmetric positive semidefinite matrix having a 0 entry on its main diagonal must be 0 along the entire row and column to which that 0 entry belongs. [88, §4.2.8] [125, §7.1, prob.2]

A.7.2 0 eigenvalues theorem

This theorem is simple, powerful, and widely applicable:

A.7.2.0.1 Theorem. Number of 0 eigenvalues.
For any matrix A ∈ R^{m×n}

    rank(A) + dim N(A) = n        (1173)

by conservation of dimension. [125, §0.4.4]

For any square matrix A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(A)

    dim N(A) ≤ number of 0 eigenvalues ≤ m        (1174)

while the eigenvectors corresponding to those 0 eigenvalues belong to N(A). [212, §5.1] A.16

For diagonalizable matrix A (§A.5), the number of 0 eigenvalues is precisely dim N(A) while the corresponding eigenvectors span N(A). The real and imaginary parts of the eigenvectors remaining span R(A).

A.16 We take as given the well-known fact that the number of 0 eigenvalues cannot be less than the dimension of the nullspace. We offer an example of the converse:

    A = [ 1 0 1 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 0 0 ]

dim N(A) = 2, λ(A) = [0 0 0 1]ᵀ; three eigenvectors in the nullspace but only two are independent. The right-hand side of (1174) is tight for nonzero matrices; e.g., (§B.1) dyad uvᵀ ∈ R^{m×m} has m 0-eigenvalues when u ∈ v⊥.


(TRANSPOSE.) Likewise, for any matrix A ∈ R^{m×n}

rank(A^T) + dim N(A^T) = m   (1175)

For any square A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(A^T) = dim N(A) while the left-eigenvectors (eigenvectors of A^T) corresponding to those 0 eigenvalues belong to N(A^T).

For diagonalizable A, the number of 0 eigenvalues is precisely dim N(A^T) while the corresponding left-eigenvectors span N(A^T). The real and imaginary parts of the left-eigenvectors remaining span R(A^T).  ⋄

Proof. First we show, for a diagonalizable matrix, the number of 0 eigenvalues is precisely the dimension of its nullspace while the eigenvectors corresponding to those 0 eigenvalues span the nullspace:

Any diagonalizable matrix A ∈ R^{m×m} must possess a complete set of linearly independent eigenvectors. If A is full-rank (invertible), then all m = rank(A) eigenvalues are nonzero. [212, §5.1]

Suppose rank(A) < m. Then dim N(A) = m − rank(A). Thus there is a set of m − rank(A) linearly independent vectors spanning N(A). Each of those can be an eigenvector associated with a 0 eigenvalue because A is diagonalizable ⇔ ∃ m linearly independent eigenvectors. [212, §5.2] Eigenvectors of a real matrix corresponding to 0 eigenvalues must be real.^{A.17} Thus A has at least m − rank(A) eigenvalues equal to 0.

Now suppose A has more than m − rank(A) eigenvalues equal to 0. Then there are more than m − rank(A) linearly independent eigenvectors associated with 0 eigenvalues, and each of those eigenvectors must be in N(A). Thus there are more than m − rank(A) linearly independent vectors in N(A); a contradiction.

Therefore diagonalizable A has rank(A) nonzero eigenvalues and exactly m − rank(A) eigenvalues equal to 0 whose corresponding eigenvectors span N(A).

By similar argument, the left-eigenvectors corresponding to 0 eigenvalues span N(A^T).

Next we show, when A is diagonalizable, that the real and imaginary parts of its eigenvectors (corresponding to nonzero eigenvalues) span R(A):

^{A.17} Let ∗ denote complex conjugation. Suppose A = A^∗ and A s_i = 0. Then s_i = s_i^∗ ⇒ A s_i = A s_i^∗ ⇒ A s_i^∗ = 0. Conversely, A s_i^∗ = 0 ⇒ A s_i = A s_i^∗ ⇒ s_i = s_i^∗.


The right-eigenvectors of a diagonalizable matrix A ∈ R^{m×m} are linearly independent if and only if the left-eigenvectors are. So, matrix A has a representation in terms of its right- and left-eigenvectors; from the diagonalization (1144), assuming 0 eigenvalues are ordered last,

A = Σ_{i=1}^{m} λ_i s_i w_i^T = Σ_{i=1, λ_i≠0}^{k≤m} λ_i s_i w_i^T   (1176)

From the linearly independent dyads theorem (§B.1.1.0.2), the dyads {s_i w_i^T} must be independent because each set of eigenvectors is; hence rank A = k, the number of nonzero eigenvalues. Complex eigenvectors and eigenvalues are common for real matrices, and must come in complex conjugate pairs for the summation to remain real. Assume that conjugate pairs of eigenvalues appear in sequence. Given any particular conjugate pair from (1176), we get the partial summation

λ_i s_i w_i^T + λ_i^∗ s_i^∗ w_i^{∗T} = 2 Re(λ_i s_i w_i^T)
 = 2( Re s_i Re(λ_i w_i^T) − Im s_i Im(λ_i w_i^T) )   (1177)

where^{A.18} λ_i^∗ ≜ λ_{i+1}, s_i^∗ ≜ s_{i+1}, and w_i^∗ ≜ w_{i+1}. Then (1176) is equivalently written

A = 2 Σ_{i: λ∈C, λ_i≠0} ( Re s_{2i} Re(λ_{2i} w_{2i}^T) − Im s_{2i} Im(λ_{2i} w_{2i}^T) ) + Σ_{j: λ∈R, λ_j≠0} λ_j s_j w_j^T   (1178)

The summation (1178) shows: A is a linear combination of real and imaginary parts of its right-eigenvectors corresponding to nonzero eigenvalues. The k vectors {Re s_i ∈ R^m, Im s_i ∈ R^m | λ_i ≠ 0, i ∈ {1…m}} must therefore span the range of diagonalizable matrix A.

The argument is similar regarding the span of the left-eigenvectors.  ∎

^{A.18} The complex conjugate of w is denoted w^∗, while its conjugate transpose is denoted by w^H = w^{∗T}.


A.7.3 0 trace and matrix product

For X, A ∈ S^M_+  [24, §2.6.1, exer.2.8] [232, §3.1]

tr(XA) = 0 ⇔ XA = AX = 0   (1179)

Proof. (⇐) Suppose XA = AX = 0. Then tr(XA) = 0 is obvious.
(⇒) Suppose tr(XA) = 0. tr(XA) = tr(A^{1/2} X A^{1/2}), whose argument is positive semidefinite by Corollary A.3.1.0.5. Trace of any square matrix is equivalent to the sum of its eigenvalues. Eigenvalues of a positive semidefinite matrix can total 0 if and only if each and every nonnegative eigenvalue is 0. The only feasible positive semidefinite matrix, having all 0 eigenvalues, resides at the origin; id est,

A^{1/2} X A^{1/2} = (X^{1/2} A^{1/2})^T X^{1/2} A^{1/2} = 0   (1180)

which in turn implies XA = 0. A similar argument shows AX = 0.  ∎

Symmetric matrices A and X are simultaneously diagonalizable if and only if they are commutative under multiplication; (1076) id est, they share a complete set of eigenvectors.

A.7.4 Zero definite

The domain over which an arbitrary real matrix A is zero definite can exceed its left and right nullspaces. For any positive semidefinite matrix A ∈ R^{M×M} (for A + A^T ≽ 0)

{x | x^T A x = 0} = N(A + A^T)   (1181)

because ∃ R such that A + A^T = R^T R, ‖Rx‖ = 0 ⇔ Rx = 0, and N(A + A^T) = N(R). For any positive definite matrix A (for A + A^T ≻ 0)

{x | x^T A x = 0} = 0   (1182)

Further, [261, §3.2, prob.5]

{x | x^T A x = 0} = R^M ⇔ A^T = −A   (1183)

while

{x | x^H A x = 0} = C^M ⇔ A = 0   (1184)
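A minimal numerical check of (1179), assuming two positive semidefinite matrices deliberately built with mutually orthogonal ranges so their inner product vanishes:

   u = randn(4,1);
   X = u*u';             % PSD, range R(u)
   P = null(u');         % orthonormal basis for the subspace u-perp
   A = P*P';             % PSD, range orthogonal to R(X)
   trace(X*A)            % ~ 0
   norm(X*A,'fro')       % ~ 0 : tr(XA)=0 forces XA = AX = 0, as (1179) asserts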


The positive semidefinite matrix

A = [1 2 ; 0 1]   (1185)

for example, has no nullspace. Yet

{x | x^T A x = 0} = {x | 1^T x = 0} ⊂ R²   (1186)

which is the nullspace of the symmetrized matrix. Symmetric matrices are not spared from the excess; videlicet,

B = [1 2 ; 2 1]   (1187)

has eigenvalues {−1, 3}, no nullspace, but is zero definite on^{A.19}

X ≜ {x ∈ R² | x_2 = (−2 ± √3) x_1}   (1188)

A.7.4.0.1 Proposition. (Sturm) Dyad-decompositions. [217, §5.2]
Given symmetric matrix A ∈ S^M, let positive semidefinite matrix X ∈ S^M_+ have rank ρ. Then ⟨A, X⟩ = 0 if and only if there is a dyad-decomposition

X = Σ_{j=1}^{ρ} x_j x_j^T   (1189)

satisfying

⟨A, x_j x_j^T⟩ = ⟨A, X⟩ = 0 for each and every j   (1190)  ⋄

A.7.4.0.2 Example. Dyad.
The dyad uv^T ∈ R^{M×M} (§B.1) is zero definite on all x for which either x^T u = 0 or x^T v = 0;

{x | x^T uv^T x = 0} = {x | x^T u = 0} ∪ {x | v^T x = 0}   (1191)

id est, on u⊥ ∪ v⊥. Symmetrizing the dyad does not change the outcome:

{x | x^T (uv^T + vu^T) x / 2 = 0} = {x | x^T u = 0} ∪ {x | v^T x = 0}   (1192)

^{A.19} These two lines represent the limit in the union of two generally distinct hyperbolae; id est, for matrix B and set X as defined

lim_{ε↓0} {x ∈ R² | x^T B x = ε} = X


Appendix B

Simple matrices

  Mathematicians also attempted to develop algebra of vectors but there was no natural definition of the product of two vectors that held in arbitrary dimensions. The first vector algebra that involved a noncommutative vector product (that is, v×w need not equal w×v) was proposed by Hermann Grassmann in his book Ausdehnungslehre (1844). Grassmann's text also introduced the product of a column matrix and a row matrix, which resulted in what is now called a simple or a rank-one matrix. In the late 19th century the American mathematical physicist Willard Gibbs published his famous treatise on vector analysis. In that treatise Gibbs represented general matrices, which he called dyadics, as sums of simple matrices, which Gibbs called dyads. Later the physicist P. A. M. Dirac introduced the term "bra-ket" for what we now call the scalar product of a "bra" (row) vector times a "ket" (column) vector and the term "ket-bra" for the product of a ket times a bra, resulting in what we now call a simple matrix, as above. Our convention of identifying column matrices and vectors was introduced by physicists in the 20th century.
  −Marie A. Vitulli [240]


B.1 Rank-one matrix (dyad)

Any matrix formed from the unsigned outer product of two vectors,

Ψ = uv^T ∈ R^{M×N}   (1193)

where u ∈ R^M and v ∈ R^N, is rank-one and called a dyad. Conversely, any rank-one matrix must have the form Ψ. [125, prob.1.4.1] The product −uv^T is a negative dyad. For matrix products AB^T, in general, we have

R(AB^T) ⊆ R(A),  N(AB^T) ⊇ N(B^T)   (1194)

with equality when B = A [212, §3.3, §3.6]^{B.1} or respectively when B is invertible and N(A) = 0. Yet for all nonzero dyads we have

R(uv^T) = R(u),  N(uv^T) = N(v^T) ≡ v⊥   (1195)

where dim v⊥ = N − 1.

It is obvious a dyad can be 0 only when u or v is 0;

Ψ = uv^T = 0 ⇔ u = 0 or v = 0   (1196)

The matrix 2-norm for Ψ is equivalent to the Frobenius norm;

‖Ψ‖_2 = ‖uv^T‖_F = ‖uv^T‖_2 = ‖u‖ ‖v‖   (1197)

When u and v are normalized, the pseudoinverse is the transposed dyad. Otherwise,

Ψ† = (uv^T)† = vu^T / (‖u‖² ‖v‖²)   (1198)

When dyad uv^T ∈ R^{N×N} is square, uv^T has at least N−1 0-eigenvalues and corresponding eigenvectors spanning v⊥. The remaining eigenvector u spans the range of uv^T with corresponding eigenvalue

λ = v^T u = tr(uv^T) ∈ R   (1199)

^{B.1} Proof. R(AA^T) ⊆ R(A) is obvious.
R(AA^T) = {AA^T y | y ∈ R^m} ⊇ {AA^T y | A^T y ∈ R(A^T)} = R(A) by (114).  ∎
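These dyad properties are all directly observable in Matlab; a sketch with arbitrary u and v:

   u = randn(4,1); v = randn(4,1);
   Psi = u*v';                                    % dyad (1193)
   rank(Psi)                                      % 1
   norm(Psi,2) - norm(u)*norm(v)                  % ~ 0   (1197)
   norm(pinv(Psi) - v*u'/(norm(u)^2*norm(v)^2))   % ~ 0   (1198)
   sort(eig(Psi))'        % N-1 zeros and one eigenvalue equal to v'*u   (1199)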


[Figure 92 appears here: the four fundamental subspaces of the dyad; R^N = R(v) ⊕ N(uv^T) on the domain side maps to N(u^T) ⊕ R(uv^T) = R^M on the codomain side, with N(Ψ) = N(v^T) and R(Ψ) = R(u).]

Figure 92: The four fundamental subspaces [214, §3.6] of any dyad Ψ = uv^T ∈ R^{M×N}. Ψ(x) ≜ uv^T x is a linear mapping from R^N to R^M. The map from R(v) to R(u) is bijective. [212, §3.1]

The determinant is the product of the eigenvalues; so, it is always true that

det Ψ = det(uv^T) = 0   (1200)

When λ = 1, the square dyad is a nonorthogonal projector projecting on its range (Ψ² = Ψ, §E.1). It is quite possible that u ∈ v⊥, making the remaining eigenvalue instead 0;^{B.2} λ = 0 together with the first N−1 0-eigenvalues; id est, it is possible that uv^T is nonzero while all its eigenvalues are 0. The matrix

[1 ; −1] [1 1] = [1 1 ; −1 −1]   (1201)

for example, has two 0-eigenvalues. In other words, the value of eigenvector u may simultaneously be a member of the nullspace and range of the dyad. The explanation is, simply, because u and v share the same dimension, dim u = M = dim v = N:

Proof. Figure 92 shows the four fundamental subspaces for the dyad. Linear operator Ψ : R^N → R^M provides a map between vector spaces that remain distinct when M = N;

u ∈ R(uv^T)
u ∈ N(uv^T) ⇔ v^T u = 0
R(uv^T) ∩ N(uv^T) = ∅   (1202)  ∎

^{B.2} The dyad is not always diagonalizable (§A.5) because the eigenvectors are not necessarily independent.


B.1.0.1 rank-one modification

If A ∈ R^{N×N} is any nonsingular matrix and 1 + v^T A^{−1} u ≠ 0, then [136, App.6] [261, §2.3, prob.16] [82, §4.11.2] (Sherman-Morrison)

(A + uv^T)^{−1} = A^{−1} − (A^{−1} u v^T A^{−1}) / (1 + v^T A^{−1} u)   (1203)

B.1.0.2 dyad symmetry

In the specific circumstance that v = u, then uu^T ∈ R^{N×N} is symmetric, rank-one, and positive semidefinite having exactly N−1 0-eigenvalues. In fact, (Theorem A.3.1.0.7)

uv^T ≽ 0 ⇔ v = u   (1204)

and the remaining eigenvalue is almost always positive;

λ = u^T u = tr(uu^T) > 0 unless u = 0   (1205)

The matrix

[Ψ u ; u^T 1]   (1206)

for example, is rank-1 positive semidefinite if and only if Ψ = uu^T.

B.1.1 Dyad independence

Now we consider a sum of dyads like (1193) as encountered in diagonalization and singular value decomposition:

R( Σ_{i=1}^{k} s_i w_i^T ) = Σ_{i=1}^{k} R( s_i w_i^T ) = Σ_{i=1}^{k} R(s_i) ⇐ w_i ∀ i are l.i.   (1207)

range of the summation is the vector sum of ranges.^{B.3} (Theorem B.1.1.1.1) Under the assumption the dyads are linearly independent (l.i.), then the vector sums are unique (p.610): for {w_i} l.i. and {s_i} l.i.

R( Σ_{i=1}^{k} s_i w_i^T ) = R(s_1 w_1^T) ⊕ … ⊕ R(s_k w_k^T) = R(s_1) ⊕ … ⊕ R(s_k)   (1208)

^{B.3} Move of range R to inside the summation depends on linear independence of {w_i}.
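The Sherman-Morrison formula (1203) is verified numerically below; a sketch, assuming generic random data (which makes A nonsingular and 1 + v^T A^{−1} u ≠ 0 almost surely):

   A = randn(5); u = randn(5,1); v = randn(5,1);
   lhs = inv(A + u*v');
   rhs = inv(A) - (A\u)*(v'/A)/(1 + v'*(A\u));   % (1203)
   norm(lhs - rhs,'fro')                         % ~ machine precision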


B.1.1.0.1 Definition. Linearly independent dyads. [130, p.29, thm.11] [219, p.2]
The set of k dyads

{ s_i w_i^T | i = 1…k }   (1209)

where s_i ∈ C^M and w_i ∈ C^N, is said to be linearly independent iff

rank( SW^T ≜ Σ_{i=1}^{k} s_i w_i^T ) = k   (1210)

where S ≜ [s_1 ⋯ s_k] ∈ C^{M×k} and W ≜ [w_1 ⋯ w_k] ∈ C^{N×k}.  △

As defined, dyad independence does not preclude existence of a nullspace N(SW^T), nor does it imply SW^T is full-rank. In absence of an assumption of independence, generally, rank SW^T ≤ k. Conversely, any rank-k matrix can be written in the form SW^T by singular value decomposition. (§A.6)

B.1.1.0.2 Theorem. Linearly independent (l.i.) dyads.
Vectors {s_i ∈ C^M, i = 1…k} are l.i. and vectors {w_i ∈ C^N, i = 1…k} are l.i. if and only if dyads {s_i w_i^T ∈ C^{M×N}, i = 1…k} are l.i.  ⋄

Proof. Linear independence of k dyads is identical to definition (1210).
(⇒) Suppose {s_i} and {w_i} are each linearly independent sets. Invoking Sylvester's rank inequality, [125, §0.4] [261, §2.4]

rank S + rank W − k ≤ rank(SW^T) ≤ min{rank S, rank W} (≤ k)   (1211)

Then k ≤ rank(SW^T) ≤ k, which implies the dyads are independent.
(⇐) Conversely, suppose rank(SW^T) = k. Then

k ≤ min{rank S, rank W} ≤ k   (1212)

implying the vector sets are each independent.  ∎


B.1.1.1 Biorthogonality condition, Range and Nullspace of Sum

Dyads characterized by a biorthogonality condition W^T S = I are independent; id est, for S ∈ C^{M×k} and W ∈ C^{N×k}, if W^T S = I then rank(SW^T) = k by the linearly independent dyads theorem because (confer §E.1.1)

W^T S = I ⇔ rank S = rank W = k ≤ M = N   (1213)

To see that, we need only show: N(S) = 0 ⇔ ∃ B such that BS = I.^{B.4}
(⇐) Assume BS = I. Then N(BS) = 0 = {x | BSx = 0} ⊇ N(S). (1194)
(⇒) If N(S) = 0 then S must be full-rank skinny-or-square. ∴ ∃ A, B, C such that [B ; C][S A] = I (id est, [S A] is invertible) ⇒ BS = I.
Left inverse B is given as W^T here. Because of reciprocity with S, it immediately follows: N(W) = 0 ⇔ ∃ S such that S^T W = I.  ∎

Dyads produced by diagonalization, for example, are independent because of their inherent biorthogonality. (§A.5.1) The converse is generally false; id est, linearly independent dyads are not necessarily biorthogonal.

B.1.1.1.1 Theorem. Nullspace and range of dyad sum.
Given a sum of dyads represented by SW^T where S ∈ C^{M×k} and W ∈ C^{N×k}

N(SW^T) = N(W^T) ⇐ ∃ B such that BS = I
R(SW^T) = R(S) ⇐ ∃ Z such that W^T Z = I   (1214)  ⋄

Proof. (⇒) N(SW^T) ⊇ N(W^T) and R(SW^T) ⊆ R(S) are obvious.
(⇐) Assume the existence of a left inverse B ∈ R^{k×M} and a right inverse Z ∈ R^{N×k}.^{B.5}

N(SW^T) = {x | SW^T x = 0} ⊆ {x | BSW^T x = 0} = N(W^T)   (1215)
R(SW^T) = {SW^T x | x ∈ R^N} ⊇ {SW^T Z y | Z y ∈ R^N} = R(S)   (1216)  ∎

^{B.4} Left inverse is not unique, in general.
^{B.5} By counterexample, the theorem's converse cannot be true; e.g., S = W = [1 0].


[Figure 93 appears here: the four fundamental subspaces of a doublet; R^N = R([u v]) ⊕ N(Π) maps to N(Π) ⊕ R([u v]) = R^N, with N(Π) = v⊥ ∩ u⊥ and R(Π) = R([u v]).]

Figure 93: Four fundamental subspaces [214, §3.6] of a doublet Π = uv^T + vu^T ∈ S^N. Π(x) = (uv^T + vu^T)x is a linear bijective mapping from R([u v]) to R([u v]).

B.2 Doublet

Consider a sum of two linearly independent square dyads, one a transposition of the other:

Π = uv^T + vu^T = [u v] [v^T ; u^T] = SW^T ∈ S^N   (1217)

where u, v ∈ R^N. Like the dyad, a doublet can be 0 only when u or v is 0;

Π = uv^T + vu^T = 0 ⇔ u = 0 or v = 0   (1218)

By assumption of independence, a nonzero doublet has two nonzero eigenvalues

λ_1 ≜ u^T v + ‖uv^T‖ ,  λ_2 ≜ u^T v − ‖uv^T‖   (1219)

where λ_1 > 0 > λ_2, with corresponding eigenvectors

x_1 ≜ u/‖u‖ + v/‖v‖ ,  x_2 ≜ u/‖u‖ − v/‖v‖   (1220)

spanning the doublet range. Eigenvalue λ_1 cannot be 0 unless u and v have opposing directions, but that is antithetical since then the dyads would no longer be independent. Eigenvalue λ_2 is 0 if and only if u and v share the same direction, again antithetical. Generally we have λ_1 > 0 and λ_2 < 0, so Π is indefinite.
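Formulas (1219)-(1220) can be confirmed numerically; recall from (1197) that ‖uv^T‖_2 = ‖u‖‖v‖. A sketch with arbitrary u and v:

   u = randn(5,1); v = randn(5,1);
   Pi  = u*v' + v*u';                          % doublet (1217)
   lam = sort(eig(Pi),'descend');
   lam(1)   - (u'*v + norm(u)*norm(v))         % ~ 0 : lambda_1   (1219)
   lam(end) - (u'*v - norm(u)*norm(v))         % ~ 0 : lambda_2
   norm(lam(2:end-1))                          % ~ 0 : the N-2 zero eigenvalues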


[Figure 94 appears here: the four fundamental subspaces of elementary matrix E, with R(E) = N(v^T) and N(E) = R(u); R^N = N(u^T) ⊕ N(E) maps to R(v) ⊕ R(E) = R^N.]

Figure 94: v^T u = 1/ζ. The four fundamental subspaces [214, §3.6] of elementary matrix E as a linear mapping E(x) = (I − uv^T/(v^T u)) x.

By the nullspace and range of dyad sum theorem, doublet Π has N−2 zero-eigenvalues remaining and corresponding eigenvectors spanning N([v u]^T). We therefore have

R(Π) = R([u v]) ,  N(Π) = v⊥ ∩ u⊥   (1221)

of respective dimension 2 and N−2.

B.3 Elementary matrix

A matrix of the form

E = I − ζ uv^T ∈ R^{N×N}   (1222)

where ζ ∈ R is finite and u, v ∈ R^N, is called an elementary matrix or a rank-one modification of the identity. [127] Any elementary matrix in R^{N×N} has N−1 eigenvalues equal to 1 corresponding to real eigenvectors that span v⊥. The remaining eigenvalue

λ = 1 − ζ v^T u   (1223)

corresponds to eigenvector u.^{B.6} From [136, App.7.A.26] the determinant:

det E = 1 − tr(ζ uv^T) = λ   (1224)

^{B.6} Elementary matrix E is not always diagonalizable because eigenvector u need not be independent of the others; id est, u ∈ v⊥ is possible.


If λ ≠ 0 then E is invertible; [82]

E^{−1} = I + (ζ/λ) uv^T   (1225)

Eigenvectors corresponding to 0 eigenvalues belong to N(E), and the number of 0 eigenvalues must be at least dim N(E) which, here, can be at most one. (§A.7.2.0.1) The nullspace exists, therefore, when λ = 0; id est, when v^T u = 1/ζ; rather, whenever u belongs to the hyperplane {z ∈ R^N | v^T z = 1/ζ}. Then (when λ = 0) elementary matrix E is a nonorthogonal projector projecting on its range (E² = E, §E.1) and N(E) = R(u); eigenvector u spans the nullspace when it exists. By conservation of dimension, dim R(E) = N − dim N(E). It is apparent from (1222) that v⊥ ⊆ R(E), but dim v⊥ = N−1. Hence R(E) ≡ v⊥ when the nullspace exists, and the remaining eigenvectors span it.

In summary, when a nontrivial nullspace of E exists,

R(E) = N(v^T) ,  N(E) = R(u) ,  v^T u = 1/ζ   (1226)

illustrated in Figure 94, which is opposite to the assignment of subspaces for a dyad (Figure 92). Otherwise, R(E) = R^N.

When E = E^T, the spectral norm is

‖E‖_2 = max{1, |λ|}   (1227)

B.3.1 Householder matrix

An elementary matrix is called a Householder matrix when it has the defining form, for nonzero vector u [88, §5.1.2] [82, §4.10.1] [212, §7.3] [125, §2.2]

H = I − 2 uu^T/(u^T u) ∈ S^N   (1228)

which is a symmetric orthogonal (reflection) matrix (H^{−1} = H^T = H, §B.5.2). Vector u is normal to an (N−1)-dimensional subspace u⊥ through which this particular H effects pointwise reflection; e.g., H u⊥ = u⊥ while Hu = −u.

Matrix H has N−1 orthonormal eigenvectors spanning that reflecting subspace u⊥ with corresponding eigenvalues equal to 1. The remaining eigenvector u has corresponding eigenvalue −1; so

det H = −1   (1229)
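A quick numerical exercise of (1228)-(1229); a sketch with an arbitrary nonzero u:

   u = randn(5,1);
   H = eye(5) - 2*(u*u')/(u'*u);   % Householder matrix (1228)
   norm(H - H','fro')              % 0 : symmetric
   norm(H*H - eye(5),'fro')        % ~ 0 : H^{-1} = H^T = H
   det(H)                          % -1   (1229)
   norm(H*u + u)                   % ~ 0 : Hu = -u, reflection through u-perp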


Due to symmetry of H, the matrix 2-norm (the spectral norm) is equal to the largest eigenvalue-magnitude. A Householder matrix is thus characterized:

H^T = H ,  H^{−1} = H^T ,  ‖H‖_2 = 1 ,  H ⋡ 0   (1230)

For example, the permutation matrix

Ξ = [1 0 0 ; 0 0 1 ; 0 1 0]   (1231)

is a Householder matrix having u = [0 1 −1]^T/√2. Not all permutation matrices are Householder matrices, although all permutation matrices are orthogonal matrices. [212, §3.4] Neither are all symmetric permutation matrices Householder matrices; e.g.,

Ξ = [0 0 0 1 ; 0 0 1 0 ; 0 1 0 0 ; 1 0 0 0]   (1313)

is not a Householder matrix.

B.4 Auxiliary V-matrices

B.4.1 Auxiliary projector matrix V

It is convenient to define a matrix V that arises naturally as a consequence of translating the geometric center α_c (§4.5.1.0.1) of some list X to the origin. In place of X − α_c 1^T we may write XV as in (531) where

V ≜ I − (1/N) 11^T ∈ S^N   (483)

is an elementary matrix called the geometric centering matrix.

Any elementary matrix in R^{N×N} has N−1 eigenvalues equal to 1. For the particular elementary matrix V, the N-th eigenvalue equals 0. The number of 0 eigenvalues must equal dim N(V) = 1, by the 0 eigenvalues theorem (§A.7.2.0.1), because V = V^T is diagonalizable. Because

V1 = 0   (1232)

the nullspace N(V) = R(1) is spanned by the eigenvector 1. The remaining eigenvectors span R(V) ≡ 1⊥ = N(1^T), which has dimension N−1.


Because

V² = V   (1233)

and V^T = V, elementary matrix V is also a projection matrix (§E.3) projecting orthogonally on its range N(1^T):

V = I − 1(1^T 1)^{−1} 1^T   (1234)

The {0, 1} eigenvalues also indicate diagonalizable V is a projection matrix. [261, §4.1, thm.4.1] Symmetry of V denotes orthogonal projection; from (1476),

V^T = V ,  V† = V ,  ‖V‖_2 = 1 ,  V ≽ 0   (1235)

Matrix V is also circulant. [96]

B.4.1.0.1 Example. Relationship of auxiliary to Householder matrix.
Let H ∈ S^N be a Householder matrix (1228) defined by

u = [1 ; ⋮ ; 1 ; 1+√N] ∈ R^N   (1236)

Then we have [85, §2]

V = H [I 0 ; 0^T 0] H   (1237)

Let D ∈ S^N_h and define

−HDH ≜ [A b ; b^T c]   (1238)

where b is a vector. Then because H is nonsingular (§A.3.1.0.5) [109, §3]

−VDV = H [A 0 ; 0^T 0] H ≽ 0 ⇔ A ≽ 0   (1239)

and affine dimension is r = rank A when D is a Euclidean distance matrix.
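The claimed properties of the geometric centering matrix are cheap to confirm; a Matlab sketch:

   N = 6;
   V = eye(N) - ones(N)/N;   % geometric centering matrix (483)
   norm(V*V - V,'fro')       % ~ 0 : idempotent (1233)
   norm(V*ones(N,1))         % 0 : V1 = 0   (1232)
   sort(eig(V))'             % one 0 eigenvalue, N-1 eigenvalues equal to 1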


B.4.2 Schoenberg auxiliary matrix V_N

1. V_N = (1/√2) [−1^T ; I] ∈ R^{N×(N−1)}
2. V_N^T 1 = 0
3. I − e_1 1^T = [0 √2 V_N]
4. [0 √2 V_N] V_N = V_N
5. [0 √2 V_N] V = V
6. V [0 √2 V_N] = [0 √2 V_N]
7. [0 √2 V_N] [0 √2 V_N] = [0 √2 V_N]
8. [0 √2 V_N]† = [0 0^T ; 0 I] V
9. [0 √2 V_N]† V = [0 √2 V_N]†
10. [0 √2 V_N] [0 √2 V_N]† = V
11. [0 √2 V_N]† [0 √2 V_N] = [0 0^T ; 0 I]
12. [0 √2 V_N] [0 0^T ; 0 I] = [0 √2 V_N]
13. [0 0^T ; 0 I] [0 √2 V_N] = [0 0^T ; 0 I]
14. [V_N  (1/√2)1]^{−1} = [V_N† ; (√2/N) 1^T]
15. V_N† = √2 [−(1/N)1  I − (1/N)11^T] ∈ R^{(N−1)×N} ,  (I − (1/N)11^T ∈ S^{N−1})
16. V_N† 1 = 0
17. V_N† V_N = I


18. V^T = V = V_N V_N† = I − (1/N)11^T ∈ S^N
19. −V_N† (11^T − I) V_N = I ,  (11^T − I ∈ EDM^N)
20. D = [d_ij] ∈ S^N_h (485)
    tr(−VDV) = tr(−VD) = tr(−V_N† D V_N) = (1/N) 1^T D 1 = (1/N) tr(11^T D) = (1/N) Σ_{i,j} d_ij
    Any elementary matrix E ∈ S^N of the particular form
    E = k_1 I − k_2 11^T   (1240)
    where k_1, k_2 ∈ R,^{B.7} will make tr(−ED) proportional to Σ d_ij.
21. D = [d_ij] ∈ S^N
    tr(−VDV) = (1/N) Σ_{i≠j} d_ij − ((N−1)/N) Σ_i d_ii = (1/N) 1^T D 1 − tr D
22. D = [d_ij] ∈ S^N_h
    tr(−V_N^T D V_N) = Σ_j d_1j
23. For Y ∈ S^N
    V(Y − δ(Y1))V = Y − δ(Y1)

B.4.3 Orthonormal auxiliary matrix V_W

The skinny matrix

V_W ≜ [ −(1/√N) 1^T ; I − 11^T/(N + √N) ] ∈ R^{N×(N−1)}   (1241)

(first row all −1/√N ; below it, diagonal entries 1 − 1/(N+√N) and off-diagonal entries −1/(N+√N))

^{B.7} If k_1 is 1−ρ while k_2 equals −ρ ∈ R, then all eigenvalues of E for −1/(N−1) < ρ < 1 are guaranteed positive and therefore E is guaranteed positive definite. [189]


has R(V_W) = N(1^T) and orthonormal columns. [4] We defined three auxiliary V-matrices: V, V_N (466), and V_W, sharing some attributes listed in Table B.4.4. For example, V can be expressed

V = V_W V_W^T = V_N V_N†   (1242)

but V_W^T V_W = I means V is an orthogonal projector (1473) and

V_W† = V_W^T ,  ‖V_W‖_2 = 1 ,  V_W^T 1 = 0   (1243)

B.4.4 Auxiliary V-matrix Table

        dim         rank    R(V)      N(V^T)   V^T V                 V V^T                       V V†
V       N×N         N−1     N(1^T)    R(1)     V                     V                           V
V_N     N×(N−1)     N−1     N(1^T)    R(1)     (1/2)(I + 11^T)       (1/2)[N−1 −1^T ; −1 I]      V
V_W     N×(N−1)     N−1     N(1^T)    R(1)     I                     V                           V

B.4.5 More auxiliary matrices

Mathar shows [161, §2] that any elementary matrix (§B.3) of the form

V_M = I − b 1^T ∈ R^{N×N}   (1244)

such that b^T 1 = 1 (confer [90, §2]), is an auxiliary V-matrix having

R(V_M^T) = N(b^T) ,  R(V_M) = N(1^T)
N(V_M) = R(b) ,  N(V_M^T) = R(1)   (1245)

Given X ∈ R^{n×N}, the choice b = (1/N)1 (V_M = V) minimizes ‖X(I − b1^T)‖_F. [92, §3.2.1]
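Relation (1242) is easy to exercise numerically. In this sketch V_W is replaced by an arbitrary orthonormal basis for N(1^T) obtained from a QR factorization; any such basis satisfies (1242)-(1243), since V is the orthogonal projector on N(1^T):

   N = 5;
   V  = eye(N) - ones(N)/N;
   VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);   % Schoenberg auxiliary matrix (B.4.2 no.1)
   [Q,~] = qr(ones(N,1));  VW = Q(:,2:N);   % orthonormal columns spanning N(1')
   norm(V - VN*pinv(VN),'fro')              % ~ 0   (1242)
   norm(V - VW*VW','fro')                   % ~ 0   (1242)
   norm(VW'*VW - eye(N-1),'fro')            % ~ 0 : orthonormal columns (1243)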


B.5 Orthogonal matrix

B.5.1 Vector rotation

The property Q^{−1} = Q^T completely defines an orthogonal matrix Q ∈ R^{n×n} employed to effect vector rotation; [212, §2.6, §3.4] [214, §6.5] [125, §2.1] for x ∈ R^n

‖Qx‖ = ‖x‖   (1246)

The orthogonal matrix is characterized:

Q^{−1} = Q^T ,  ‖Q‖_2 = 1   (1247)

Applying characterization (1247) to Q^T we see it too is an orthogonal matrix. Hence the rows and columns of Q respectively form an orthonormal set. All permutation matrices Ξ, for example, are orthogonal matrices. The largest magnitude entry of any orthogonal matrix is 1; for each and every j ∈ 1…n

‖Q(j, :)‖_∞ ≤ 1 ,  ‖Q(:, j)‖_∞ ≤ 1   (1248)

Each and every eigenvalue of a (real) orthogonal matrix has magnitude 1

λ(Q) ∈ C^n ,  |λ(Q)| = 1   (1249)

while only the identity matrix can be simultaneously positive definite and orthogonal.

A unitary matrix is a complex generalization of the orthogonal matrix. The conjugate transpose defines it: U^{−1} = U^H. An orthogonal matrix is simply a real unitary matrix.

B.5.2 Reflection

A matrix for pointwise reflection is defined by imposing symmetry upon the orthogonal matrix; id est, a reflection matrix is completely defined by Q^{−1} = Q^T = Q. The reflection matrix is an orthogonal matrix, characterized:

Q^T = Q ,  Q^{−1} = Q^T ,  ‖Q‖_2 = 1   (1250)

The Householder matrix (§B.3.1) is an example of a symmetric orthogonal (reflection) matrix.


[Figure 95 appears here.]

Figure 95: Gimbal: a mechanism imparting three degrees of dimensional freedom to a Euclidean body suspended at its center. Each ring is free to rotate about one axis. (Drawing courtesy of The MathWorks Inc.)

Reflection matrices have eigenvalues equal to ±1 and so det Q = ±1. It is natural to expect a relationship between reflection and projection matrices because all projection matrices have eigenvalues belonging to {0, 1}. In fact, any reflection matrix Q is related to some orthogonal projector P by [127, §1, prob.44]

Q = I − 2P   (1251)

Yet P is, generally, neither orthogonal nor invertible. (§E.3.2)

λ(Q) ∈ R^n ,  |λ(Q)| = 1   (1252)

Reflection is with respect to R(P)⊥. Matrix 2P − I represents antireflection. Every orthogonal matrix can be expressed as the product of a rotation and a reflection. The collection of all orthogonal matrices of particular dimension does not form a convex set.

B.5.3 Rotation of range and rowspace

Given orthogonal matrix Q, column vectors of a matrix X are simultaneously rotated by the product QX. In three dimensions (X ∈ R^{3×N}), the precise meaning of rotation is best illustrated in Figure 95 where the gimbal aids visualization of rotation achievable about the origin.


B.5.3.0.1 Example. One axis of revolution.
Partition an (n+1)-dimensional Euclidean space R^{n+1} ≜ [R^n ; R] and define an n-dimensional subspace

R ≜ {λ ∈ R^{n+1} | 1^T λ = 0}   (1253)

(a hyperplane through the origin). We want an orthogonal matrix that rotates a list in the columns of matrix X ∈ R^{(n+1)×N} through the dihedral angle between R^n and R: (R^n, R) = arccos(1/√(n+1)) radians. The vertex-description of the nonnegative orthant in R^{n+1} is

{[e_1 e_2 ⋯ e_{n+1}] a | a ≽ 0} = {a ≽ 0} ⊂ R^{n+1}   (1254)

Consider rotation of these vertices via orthogonal matrix

Q ≜ [ (1/√(n+1))1  ΞV_W ] Ξ ∈ R^{(n+1)×(n+1)}   (1255)

where permutation matrix Ξ ∈ S^{n+1} is defined in (1313), and V_W ∈ R^{(n+1)×n} is the orthonormal auxiliary matrix defined in §B.4.3. This particular orthogonal matrix is selected because it rotates any point in R^n about one axis of revolution onto R; e.g., rotation Qe_{n+1} aligns the last standard basis vector with the subspace normal R⊥ = R(1), and from these two vectors we get (R^n, R). The rotated standard basis vectors remaining are orthonormal spanning R.

Another interpretation of product QX is rotation/reflection of R(X). Rotation of X as in QXQ^T is the simultaneous rotation/reflection of range and rowspace.^{B.8}

Proof. Any matrix can be expressed as a singular value decomposition X = UΣW^T (1155) where δ²(Σ) = Σ, R(U) ⊇ R(X), and R(W) ⊇ R(X^T).  ∎

^{B.8} The product Q^T A Q can be regarded as a coordinate transformation; e.g., given linear map y = Ax : R^n → R^n and orthogonal Q, the transformation Qy = AQx is a rotation/reflection of the range and rowspace (113) of matrix A where Qy ∈ R(A) and Qx ∈ R(A^T) (114).


B.5.4 Matrix rotation

Orthogonal matrices are also employed to rotate/reflect, like vectors, other matrices: [sic] [88, §12.4.1] Given orthogonal matrix Q, the product Q^T A will rotate A ∈ R^{n×n} in the Euclidean sense in R^{n²} because the Frobenius norm is orthogonally invariant (§2.2.1);

‖Q^T A‖_F = √(tr(A^T Q Q^T A)) = ‖A‖_F   (1256)

(likewise for AQ). Were A symmetric, such a rotation would depart from S^n. One remedy is to instead form the product Q^T A Q because

‖Q^T A Q‖_F = √(tr(Q^T A^T Q Q^T A Q)) = ‖A‖_F   (1257)

Matrix A is orthogonally equivalent to B if B = S^T A S for some orthogonal matrix S. Every square matrix, for example, is orthogonally equivalent to a matrix having equal entries along the main diagonal. [125, §2.2, prob.3]

B.5.4.1 bijection

Any product AQ of orthogonal matrices remains orthogonal. Given any other dimensionally compatible orthogonal matrix U, the mapping g(A) = U^T A Q is a linear bijection on the domain of orthogonal matrices. [148, §2.1]
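Orthogonal invariance (1256)-(1257) in a short Matlab check; a sketch with a random orthogonal Q drawn from a QR factorization:

   A = randn(4);  [Q,~] = qr(randn(4));      % random orthogonal Q
   norm(Q'*A,'fro')   - norm(A,'fro')        % ~ 0   (1256)
   norm(Q'*A*Q,'fro') - norm(A,'fro')        % ~ 0   (1257)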


Appendix C

Some analytical optimal results

C.1 Properties of infima

inf ∅ ≜ ∞   (1258)
sup ∅ ≜ −∞   (1259)

Given f(x) : X → R defined on arbitrary set X [123, §0.1.2]

inf_{x∈X} f(x) = − sup_{x∈X} −f(x) ,  sup_{x∈X} f(x) = − inf_{x∈X} −f(x)   (1260)

arg inf_{x∈X} f(x) = arg sup_{x∈X} −f(x) ,  arg sup_{x∈X} f(x) = arg inf_{x∈X} −f(x)   (1261)

Given g(x, y) : X × Y → R with independent variables x and y defined on mutually independent arbitrary sets X and Y [123, §0.1.2]

inf_{x∈X, y∈Y} g(x, y) = inf_{x∈X} inf_{y∈Y} g(x, y) = inf_{y∈Y} inf_{x∈X} g(x, y)   (1262)

The variables are independent if and only if the corresponding sets are not interdependent. An objective function of partial infimum is not necessarily unique.


Given f(x) : X → R and g(x) : X → R defined on arbitrary set X [123, §0.1.2]

inf_{x∈X} (f(x) + g(x)) ≥ inf_{x∈X} f(x) + inf_{x∈X} g(x)   (1263)

Given f(x) : X ∪ Y → R and arbitrary sets X and Y [123, §0.1.2]

X ⊂ Y ⇒ inf_{x∈X} f(x) ≥ inf_{x∈Y} f(x)   (1264)
inf_{x∈X∪Y} f(x) = min{ inf_{x∈X} f(x) , inf_{x∈Y} f(x) }   (1265)
inf_{x∈X∩Y} f(x) ≥ max{ inf_{x∈X} f(x) , inf_{x∈Y} f(x) }   (1266)

Over some convex set C, given vector constant y or matrix constant Y

arg inf_{x∈C} ‖x − y‖_2 = arg inf_{x∈C} ‖x − y‖_2²   (1267)
arg inf_{X∈C} ‖X − Y‖_F = arg inf_{X∈C} ‖X − Y‖_F²   (1268)

C.2 Involving absolute value

Optimal solution is norm dependent. [38, p.297]

minimize_x ‖x‖_1 subject to x ∈ C  ≡  minimize_{t,x} 1^T t subject to −t ≼ x ≼ t , x ∈ C   (1269)

minimize_x ‖x‖_2 subject to x ∈ C  ≡  minimize_{t,x} t subject to [tI x ; x^T t] ≽ 0 , x ∈ C   (1270)

minimize_x ‖x‖_∞ subject to x ∈ C  ≡  minimize_{t,x} t subject to −t1 ≼ x ≼ t1 , x ∈ C   (1271)

In R^n the norms respectively represent: length measured along a grid, Euclidean length, maximum |coordinate|.


(Ye) Assuming existence of a minimum element (p.99):

minimize_x |x| subject to x ∈ C  ≡  minimize_{α,β,x} 1^T(α + β) subject to α, β ≽ 0 , x = α − β , x ∈ C   (1272)

All these problems are convex when C is.

C.3 Involving diagonal, trace, eigenvalues

For A ∈ R^{m×n} and σ(A) denoting its singular values, [38, §A.1.6] [73, §1]

Σ_i σ(A)_i = tr √(A^T A) = sup_{‖X‖_2≤1} tr(X^T A) = maximize_{X∈R^{m×n}} tr(X^T A) subject to [I X ; X^T I] ≽ 0   (1273)

For X ∈ S^m, Y ∈ S^n, A ∈ C ⊆ R^{m×n} for C convex, and σ(A) denoting the singular values of A [73, §3]

minimize_A Σ_i σ(A)_i subject to A ∈ C  ≡  minimize_{A,X,Y} tr X + tr Y subject to [X A ; A^T Y] ≽ 0 , A ∈ C   (1274)

For A ∈ S^N_+ and β ∈ R

β tr A = maximize_{X∈S^N} tr(XA) subject to X ≼ βI   (1275)


For A ∈ S^N having eigenvalues λ(A) ∈ R^N [148, §2.1] [28, §I.6.15]

min_i{λ(A)_i} = inf_{‖x‖=1} x^T A x = minimize_{X∈S^N_+} tr(XA) subject to tr X = 1
             = maximize_{t∈R} t subject to A ≽ tI   (1276)

max_i{λ(A)_i} = sup_{‖x‖=1} x^T A x = maximize_{X∈S^N_+} tr(XA) subject to tr X = 1
             = minimize_{t∈R} t subject to A ≼ tI   (1277)

The minimum eigenvalue of any symmetric matrix is always a concave function of its entries, while the maximum eigenvalue is always convex. [38, exmp.3.10] For v_1 a normalized eigenvector of A corresponding to the largest eigenvalue, and v_N a normalized eigenvector corresponding to the smallest eigenvalue,

v_N = arg inf_{‖x‖=1} x^T A x   (1278)
v_1 = arg sup_{‖x‖=1} x^T A x   (1279)

For A ∈ S^N having eigenvalues λ(A) ∈ R^N, consider the unconstrained nonconvex optimization that is a projection (§7.1.2) on the rank-1 subset (§2.9.2.1) of the boundary of positive semidefinite cone S^N_+: defining λ_1 ≜ max_i{λ(A)_i} and corresponding eigenvector v_1

minimize_x ‖xx^T − A‖_F² = minimize_x tr( xx^T(x^T x) − 2Axx^T + A^T A )
 = { ‖λ(A)‖² ,        λ_1 ≤ 0
   { ‖λ(A)‖² − λ_1² , λ_1 > 0   (1280)
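The variational characterizations (1276)-(1279) can be probed numerically; a sketch using an arbitrary symmetric matrix and the extreme eigenvectors returned by eig:

   A = randn(5);  A = (A + A')/2;
   [V,D] = eig(A);  [lam,idx] = sort(diag(D),'descend');  V = V(:,idx);
   v1 = V(:,1);  vN = V(:,end);
   v1'*A*v1 - max(lam)     % ~ 0 : supremum of the Rayleigh quotient  (1277),(1279)
   vN'*A*vN - min(lam)     % ~ 0 : infimum of the Rayleigh quotient   (1276),(1278)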


arg minimize_x ‖xx^T − A‖_F² = { 0 ,        λ_1 ≤ 0
                               { v_1 √λ_1 , λ_1 > 0   (1281)

Proof. From (1365) and §D.2.1 the gradient is

∇_x( (x^T x)² − 2x^T A x ) = 4(x^T x)x − 4Ax   (1282)

When the gradient is set to 0,

Ax = x(x^T x)   (1283)

becomes necessary for optimal solution. Suppose we replace vector x with any normalized eigenvector v_i (of A ∈ S^N) corresponding to a positive eigenvalue λ_i (positive by Theorem A.3.1.0.7). If we scale that eigenvector by the square root of the eigenvalue, then (1283) is satisfied:

x ← v_i √λ_i ⇒ A v_i = v_i λ_i   (1284)
xx^T = λ_i v_i v_i^T   (1285)

and the minimum is achieved when λ_i = λ_1 is the largest positive eigenvalue of A; if A has no positive eigenvalue then x = 0 yields the minimum. Given nonincreasingly ordered diagonalization A = QΛQ^T where Λ = δ(λ(A)) (§A.5), then (1280) has minimum value

minimize_x ‖xx^T − A‖_F² = { ‖QΛQ^T‖_F² = ‖δ(Λ)‖² ,                        λ_1 ≤ 0
                           { ‖Q(δ([λ_1 0 ⋯ 0]^T) − Λ)Q^T‖_F² = ‖[λ_1 0 ⋯ 0]^T − δ(Λ)‖² , λ_1 > 0   (1286)  ∎


Given some convex set C, maximum eigenvalue magnitude µ of A ∈ S^N is minimized over C by the semidefinite program (confer §7.1.5)

minimize_A ‖A‖_2 subject to A ∈ C  ≡  minimize_{µ,A} µ subject to −µI ≼ A ≼ µI , A ∈ C   (1287)

µ* ≜ max_i{ |λ(A*)_i| , i = 1…N } ∈ R_+   (1288)

(Fan) For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, and for 1 ≤ k ≤ N [9, §4.1] [133] [125, §4.3.18] [232, §2] [148, §2.1]

Σ_{i=N−k+1}^{N} λ(B)_i = inf_{U∈R^{N×k}, U^TU=I} tr(UU^T B)
 = minimize_{X∈S^N_+} tr(XB) subject to X ≼ I , tr X = k   (a)
 = maximize_{µ∈R, Z∈S^N_+} (k−N)µ + tr(B − Z) subject to µI + Z ≽ B   (b)

Σ_{i=1}^{k} λ(B)_i = sup_{U∈R^{N×k}, U^TU=I} tr(UU^T B)
 = maximize_{X∈S^N_+} tr(XB) subject to X ≼ I , tr X = k   (c)
 = minimize_{µ∈R, Z∈S^N_+} kµ + tr Z subject to µI + Z ≽ B   (d)
   (1289)

Given ordered diagonalization B = WΛW^T, then optimal U for the infimum is U* = W(:, N−k+1:N) ∈ R^{N×k} whereas U* = W(:, 1:k) ∈ R^{N×k} for the supremum. In both cases, X* = U*U*^T. Optimization (a) searches the convex hull of the outer product UU^T of all N×k orthonormal matrices. (§2.3.2.1.1)


For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, and for diagonal matrix Υ ∈ S^k whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N, we utilize the main-diagonal δ operator's property of self-adjointness (1026); [10, §4.2]

Σ_{i=1}^{k} Υ_ii λ(B)_{N−i+1} = inf_{U∈R^{N×k}, U^TU=I} tr(ΥU^T BU) = inf_{U∈R^{N×k}, U^TU=I} δ(Υ)^T δ(U^T BU)

 = minimize_{V_i∈S^N} tr( B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1}) V_i )
   subject to tr V_i = i ,       i = 1…k
              I ≽ V_i ≽ 0 ,     i = 1…k   (1290)

where Υ_{k+1,k+1} ≜ 0. We speculate,

Σ_{i=1}^{k} Υ_ii λ(B)_i = sup_{U∈R^{N×k}, U^TU=I} tr(ΥU^T BU) = sup_{U∈R^{N×k}, U^TU=I} δ(Υ)^T δ(U^T BU)   (1291)

Alizadeh shows: [9, §4.2]

Σ_{i=1}^{k} Υ_ii λ(B)_i = minimize_{µ∈R^k, Z_i∈S^N} Σ_{i=1}^{k} ( iµ_i + tr Z_i )
   subject to µ_i I + Z_i − (Υ_ii − Υ_{i+1,i+1}) B ≽ 0 ,  i = 1…k
              Z_i ≽ 0 ,  i = 1…k

 = maximize_{V_i∈S^N} tr( B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1}) V_i )
   subject to tr V_i = i ,     i = 1…k
              I ≽ V_i ≽ 0 ,   i = 1…k   (1292)

where Υ_{k+1,k+1} ≜ 0.


For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, let Ξλ(B) be a permutation of λ(B) such that their absolute value becomes arranged in nonincreasing order; |Ξλ(B)|_1 ≥ |Ξλ(B)|_2 ≥ ⋯ ≥ |Ξλ(B)|_N. Then, for 1 ≤ k ≤ N [9, §4.3]^{C.1}

Σ_{i=1}^{k} |Ξλ(B)|_i = minimize_{µ∈R, Y,Z∈S^N_+} kµ + tr(Y + Z)
   subject to µI + Y + B ≽ 0
              µI + Z − B ≽ 0

 = maximize_{V,W∈S^N_+} ⟨B, V − W⟩ subject to I ≽ V, W ;  tr(V + W) = k   (1293)

For diagonal matrix Υ ∈ S^k whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N

Σ_{i=1}^{k} Υ_ii |Ξλ(B)|_i = minimize_{µ∈R^k, Y_i,Z_i∈S^N} Σ_{i=1}^{k} ( iµ_i + tr(Y_i + Z_i) )
   subject to µ_i I + Y_i + (Υ_ii − Υ_{i+1,i+1}) B ≽ 0 ,  i = 1…k
              µ_i I + Z_i − (Υ_ii − Υ_{i+1,i+1}) B ≽ 0 ,  i = 1…k
              Y_i, Z_i ≽ 0 ,  i = 1…k

 = maximize_{V_i,W_i∈S^N} tr( B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1})(V_i − W_i) )
   subject to tr(V_i + W_i) = i ,  i = 1…k
              I ≽ V_i ≽ 0 ,  i = 1…k
              I ≽ W_i ≽ 0 ,  i = 1…k   (1294)

where Υ_{k+1,k+1} ≜ 0.

For A, B ∈ S^N whose eigenvalues λ(A), λ(B) ∈ R^N are respectively arranged in nonincreasing order, and for nonincreasingly ordered diagonalizations A = W_A Υ W_A^T and B = W_B Λ W_B^T [124] [148, §2.1]

λ(A)^T λ(B) = sup_{U∈R^{N×N}, U^TU=I} tr(A^T U^T B U) ≥ tr(A^T B)   (1312)

^{C.1} There exist typographical errors in [183, (6.49) (6.55)] for this minimization.


(confer (1317)) where optimal U is

U* = W_B W_A^T ∈ R^{N×N}   (1309)

We can push that upper bound higher using a result in §C.5.2.1:

|λ(A)|^T |λ(B)| = sup_{U∈C^{N×N}, U^HU=I} Re tr(A^T U^H B U)   (1295)

For step function ψ as defined in (1168), optimal U becomes

U* = W_B √δ(ψ(δ(Λ)))^H √δ(ψ(δ(Υ))) W_A^T ∈ C^{N×N}   (1296)

C.4 Orthogonal Procrustes problem

Given matrices A, B ∈ R^{n×N}, their product having full singular value decomposition (§A.6.3)

AB^T ≜ UΣQ^T ∈ R^{n×n}   (1297)

then an optimal solution R* to the orthogonal Procrustes problem

minimize_R ‖A − R^T B‖_F subject to R^T = R^{−1}   (1298)

maximizes tr(A^T R^T B) over the nonconvex manifold of orthogonal matrices: [125, §7.4.8]

R* = QU^T ∈ R^{n×n}   (1299)

A necessary and sufficient condition for optimality

AB^T R* ≽ 0   (1300)

holds whenever R* is an orthogonal matrix. [92, §4]

Solution to problem (1298) can reveal rotation/reflection (§4.5.2, §B.5) of one list in the columns of A with respect to another list B. Solution is unique if rank BV_N = n. [61, §2.4.1] The optimal value for the objective of minimization is

tr( A^T A + B^T B − 2AB^T R* ) = tr(A^T A) + tr(B^T B) − 2 tr(UΣU^T)
 = ‖A‖_F² + ‖B‖_F² − 2 δ(Σ)^T 1   (1301)
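A numerical sketch of the Procrustes recipe (1297)-(1299), manufacturing list A from list B by a known orthogonal transformation so the optimal residual is exactly zero:

   n = 3;  N = 10;
   [Rtrue,~] = qr(randn(n));   % a hypothetical "unknown" orthogonal matrix
   B = randn(n,N);
   A = Rtrue'*B;               % A is an exact rotation/reflection of B
   [U,S,Q] = svd(A*B');        % (1297)
   Rstar = Q*U';               % optimal solution (1299)
   norm(A - Rstar'*B,'fro')    % ~ 0 : perfect recovery in this noiseless case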


while the optimal value for corresponding trace maximization is

sup_{R^T=R^{−1}} tr(A^T R^T B) = tr(A^T R*^T B) = δ(Σ)^T 1 ≥ tr(A^T B)   (1302)

The same optimal solution R* solves

maximize_R ‖A + R^T B‖_F subject to R^T = R^{−1}   (1303)

C.4.1 Effect of translation

Consider the impact of dc offset in known lists A, B ∈ R^{n×N} on problem (1298). Rotation of B there is with respect to the origin, so better results may be obtained if offset is first accounted. Because the geometric centers of the lists AV and BV are the origin, instead we solve

minimize_R ‖AV − R^T BV‖_F subject to R^T = R^{−1}   (1304)

where V ∈ S^N is the geometric centering matrix (§B.4.1). Now we define the full singular value decomposition

AVB^T ≜ UΣQ^T ∈ R^{n×n}   (1305)

and an optimal rotation matrix

R* = QU^T ∈ R^{n×n}   (1299)

The desired result is an optimally rotated offset list

R*^T BV + A(I − V) ≈ A   (1306)

which most closely matches the list in A. Equality is attained when the lists are precisely related by a rotation/reflection and an offset. When R*^T B = A or B1 = A1 = 0, this result (1306) reduces to R*^T B ≈ A.


C.4.1.1 Translation of extended list

Suppose an optimal rotation matrix R* ∈ R^{n×n} were derived as before from matrix B ∈ R^{n×N}, but B is part of a larger list in the columns of [C B] ∈ R^{n×(M+N)} where C ∈ R^{n×M}. In that event, we wish to apply the rotation/reflection and translation to the larger list. The expression supplanting the approximation in (1306) makes 1^T of compatible dimension;

R*^T [ C − (1/N)B11^T   BV ] + (1/N)A11^T   (1307)

id est, C − (1/N)B11^T ∈ R^{n×M} and (1/N)A11^T ∈ R^{n×(M+N)}.

C.5 Two-sided orthogonal Procrustes

C.5.0.1 Minimization

Given symmetric A, B ∈ S^N, each having diagonalization (§A.5.2)

A ≜ Q_A Λ_A Q_A^T ,  B ≜ Q_B Λ_B Q_B^T   (1308)

where eigenvalues are arranged in their respective diagonal matrix Λ in nonincreasing order, then an optimal solution [68]

R* = Q_B Q_A^T ∈ R^{N×N}   (1309)

to the two-sided orthogonal Procrustes problem

minimize_R ‖A − R^T BR‖_F subject to R^T = R^{−1}
 = minimize_R tr( A^T A − 2A^T R^T BR + B^T B ) subject to R^T = R^{−1}   (1310)

maximizes tr(A^T R^T BR) over the nonconvex manifold of orthogonal matrices. Optimal product R*^T B R* has the eigenvectors of A but the eigenvalues of B. [92, §7.5.1] The optimal value for the objective of minimization is, by (39)

‖Q_A Λ_A Q_A^T − R*^T Q_B Λ_B Q_B^T R*‖_F = ‖Q_A(Λ_A − Λ_B)Q_A^T‖_F = ‖Λ_A − Λ_B‖_F   (1311)

while the corresponding trace maximization has optimal value

sup_{R^T=R^{−1}} tr(A^T R^T BR) = tr(A^T R*^T B R*) = tr(Λ_A Λ_B) ≥ tr(A^T B)   (1312)
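A sketch of the two-sided result: with ordered diagonalizations, R* = Q_B Q_A^T attains the residual ‖Λ_A − Λ_B‖_F exactly, as (1311) asserts:

   N = 4;
   A = randn(N); A = (A+A')/2;   B = randn(N); B = (B+B')/2;
   [QA,DA] = eig(A);  [la,i] = sort(diag(DA),'descend');  QA = QA(:,i);
   [QB,DB] = eig(B);  [lb,j] = sort(diag(DB),'descend');  QB = QB(:,j);
   Rstar = QB*QA';                                 % (1309)
   norm(A - Rstar'*B*Rstar,'fro') - norm(la - lb)  % ~ 0   (1311)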


C.5.0.2 Maximization

Any permutation matrix is an orthogonal matrix. Defining a row- and column-swapping permutation matrix (a reflection matrix, §B.5.2) having ones along its antidiagonal,

Ξ = Ξ^T = [0 ⋯ 0 1 ; ⋮ ⋰ ⋰ ⋮ ; 0 1 ⋯ 0 ; 1 0 ⋯ 0]   (1313)

then an optimal solution R* to the maximization problem [sic]

maximize_R ‖A − R^T BR‖_F subject to R^T = R^{−1}   (1314)

minimizes tr(A^T R^T BR): [124] [148, §2.1]

R* = Q_B Ξ Q_A^T ∈ R^{N×N}   (1315)

The optimal value for the objective of maximization is

‖Q_A Λ_A Q_A^T − R*^T Q_B Λ_B Q_B^T R*‖_F = ‖Q_A Λ_A Q_A^T − Q_A Ξ^T Λ_B Ξ Q_A^T‖_F
 = ‖Λ_A − ΞΛ_B Ξ‖_F   (1316)

while the corresponding trace minimization has optimal value

inf_{R^T=R^{−1}} tr(A^T R^T BR) = tr(A^T R*^T B R*) = tr(Λ_A Ξ Λ_B Ξ)   (1317)

C.5.1 Procrustes' relation to linear programming

Although these two-sided Procrustes problems are nonconvex, a connection with linear programming [53] was discovered by Anstreicher & Wolkowicz [10, §3] [148, §2.1]: Given A, B ∈ S^N, this semidefinite program in S and T

minimize_R tr(A^T R^T BR) = maximize_{S,T∈S^N} tr(S + T)   (1318)
subject to R^T = R^{−1}        subject to A^T ⊗ B − I ⊗ S − T ⊗ I ≽ 0


(where ⊗ signifies Kronecker product (§D.1.2.1)) has optimal objective value (1317). These two problems are strong duals (§2.13.1.0.1). Given ordered diagonalizations (1308), make the observation:

inf_R tr(A^T R^T BR) = inf_{R̂} tr(Λ_A R̂^T Λ_B R̂)   (1319)

because R̂ ≜ Q_B^T R Q_A on the set of orthogonal matrices (which includes the permutation matrices) is a bijection. This means, basically, diagonal matrices of eigenvalues Λ_A and Λ_B may be substituted for A and B, so only the main diagonals of S and T come into play;

maximize_{S,T∈S^N} 1^T δ(S + T)
subject to δ( Λ_A ⊗ (ΞΛ_B Ξ) − I ⊗ S − T ⊗ I ) ≽ 0   (1320)

a linear program in δ(S) and δ(T) having the same optimal objective value as the semidefinite program.

We relate their results to Procrustes problem (1310) by manipulating signs (1260) and permuting eigenvalues:

maximize_R tr(A^T R^T BR) = minimize_{S,T∈S^N} 1^T δ(S + T)
subject to R^T = R^{−1}        subject to δ( I ⊗ S + T ⊗ I − Λ_A ⊗ Λ_B ) ≽ 0

 = minimize_{S,T∈S^N} tr(S + T)
   subject to I ⊗ S + T ⊗ I − A^T ⊗ B ≽ 0   (1321)

This formulation has optimal objective value identical to that in (1312).

C.5.2 Two-sided orthogonal Procrustes via SVD

By making left- and right-side orthogonal matrices independent, we can push the upper bound on trace (1312) a little further: Given real matrices A, B each having full singular value decomposition (§A.6.3)

A ≜ U_A Σ_A Q_A^T ∈ R^{m×n} ,  B ≜ U_B Σ_B Q_B^T ∈ R^{m×n}   (1322)

then a well-known optimal solution R*, S* to the problem

minimize_{R,S} ‖A − SBR‖_F subject to R^H = R^{−1} , S^H = S^{−1}   (1323)


where step function ψ is defined in (1168). In this circumstance,

S* = U_A U_B^H = R*^T ∈ C^{n×n}   (1333)

optimal matrices (1324), now unitary, are related by transposition. The optimal value of objective (1325) is

‖U_A Σ_A Q_A^H − S* U_B Σ_B Q_B^H R*‖_F = ‖ |Υ| − |Λ| ‖_F   (1334)

while the corresponding optimal value of trace maximization (1326) is

sup_{R^H=R^{−1}, S^H=S^{−1}} Re tr(A^T SBR) = tr( |Υ| |Λ| )   (1335)

C.5.2.2 Diagonal matrices

Now suppose A and B are diagonal matrices

A = Υ = δ²(Υ) ∈ S^n ,  δ(Υ) ∈ K_M   (1336)
B = Λ = δ²(Λ) ∈ S^n ,  δ(Λ) ∈ K_M   (1337)

both having their respective main-diagonal entries arranged in nonincreasing order:

minimize_{R,S} ‖Υ − SΛR‖_F subject to R^H = R^{−1} , S^H = S^{−1}   (1338)

Then we have a symmetric decomposition from unitary matrices as in (1330) where

U_A ≜ √δ(ψ(δ(Υ))) ,  Q_A ≜ √δ(ψ(δ(Υ)))^H ,  Σ_A = |Υ|   (1339)
U_B ≜ √δ(ψ(δ(Λ))) ,  Q_B ≜ √δ(ψ(δ(Λ)))^H ,  Σ_B = |Λ|   (1340)

Procrustes solution (1324) again sees the transposition relationship

S* = U_A U_B^H = R*^T ∈ C^{n×n}   (1333)

but both optimal unitary matrices are now themselves diagonal. So,

S* Λ R* = δ(ψ(δ(Υ))) Λ δ(ψ(δ(Λ))) = δ(ψ(δ(Υ))) |Λ|   (1341)


Appendix D

Matrix calculus

  From too much study, and from extreme passion, cometh madnesse.
  −Isaac Newton [84, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function f(x) : R^K → R with respect to its vector domain is defined

∇f(x) ≜ [ ∂f(x)/∂x_1 ; ∂f(x)/∂x_2 ; ⋮ ; ∂f(x)/∂x_K ] ∈ R^K   (1342)

while the second-order gradient of the twice differentiable real function with respect to its vector domain is traditionally called the Hessian;

∇²f(x) ≜ [ ∂²f(x)/∂x_1²      ∂²f(x)/∂x_1∂x_2  ⋯  ∂²f(x)/∂x_1∂x_K ;
           ∂²f(x)/∂x_2∂x_1   ∂²f(x)/∂x_2²     ⋯  ∂²f(x)/∂x_2∂x_K ;
           ⋮                  ⋮                 ⋱  ⋮ ;
           ∂²f(x)/∂x_K∂x_1   ∂²f(x)/∂x_K∂x_2  ⋯  ∂²f(x)/∂x_K² ] ∈ S^K   (1343)


The gradient of vector-valued function v(x) : R → R^N on real domain is a row-vector

∇v(x) ≜ [ ∂v_1(x)/∂x  ∂v_2(x)/∂x  ⋯  ∂v_N(x)/∂x ] ∈ R^N   (1344)

while the second-order gradient is

∇²v(x) ≜ [ ∂²v_1(x)/∂x²  ∂²v_2(x)/∂x²  ⋯  ∂²v_N(x)/∂x² ] ∈ R^N   (1345)

Gradient of vector-valued function h(x) : R^K → R^N on vector domain is

∇h(x) ≜ [ ∂h_1(x)/∂x_1  ∂h_2(x)/∂x_1  ⋯  ∂h_N(x)/∂x_1 ;
          ∂h_1(x)/∂x_2  ∂h_2(x)/∂x_2  ⋯  ∂h_N(x)/∂x_2 ;
          ⋮ ;
          ∂h_1(x)/∂x_K  ∂h_2(x)/∂x_K  ⋯  ∂h_N(x)/∂x_K ]
 = [ ∇h_1(x)  ∇h_2(x)  ⋯  ∇h_N(x) ] ∈ R^{K×N}   (1346)

while the second-order gradient has a three-dimensional representation dubbed cubix;^{D.1}

∇²h(x) ≜ [ ∇(∂h_1(x)/∂x_1)  ∇(∂h_2(x)/∂x_1)  ⋯  ∇(∂h_N(x)/∂x_1) ;
           ⋮ ;
           ∇(∂h_1(x)/∂x_K)  ∇(∂h_2(x)/∂x_K)  ⋯  ∇(∂h_N(x)/∂x_K) ]
 = [ ∇²h_1(x)  ∇²h_2(x)  ⋯  ∇²h_N(x) ] ∈ R^{K×N×K}   (1347)

where the gradient of each real entry is with respect to vector x as in (1342).

^{D.1} The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.


The gradient of real function g(X) : R^{K×L} → R on matrix domain is

∇g(X) ≜ [ ∂g(X)/∂X_11  ∂g(X)/∂X_12  ⋯  ∂g(X)/∂X_1L ;
          ∂g(X)/∂X_21  ∂g(X)/∂X_22  ⋯  ∂g(X)/∂X_2L ;
          ⋮ ;
          ∂g(X)/∂X_K1  ∂g(X)/∂X_K2  ⋯  ∂g(X)/∂X_KL ] ∈ R^{K×L}

 = [ ∇_{X(:,1)} g(X)  ∇_{X(:,2)} g(X)  ⋯  ∇_{X(:,L)} g(X) ] ∈ R^{K×1×L}   (1348)

where the gradient ∇_{X(:,i)} is with respect to the i-th column of X. The strange appearance of (1348) in R^{K×1×L} is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix). The second-order gradient has representation

∇²g(X) ≜ [ ∇(∂g(X)/∂X_11)  ⋯  ∇(∂g(X)/∂X_1L) ;
           ⋮ ;
           ∇(∂g(X)/∂X_K1)  ⋯  ∇(∂g(X)/∂X_KL) ] ∈ R^{K×L×K×L}

 = [ ∇∇_{X(:,1)} g(X)  ∇∇_{X(:,2)} g(X)  ⋯  ∇∇_{X(:,L)} g(X) ] ∈ R^{K×1×L×K×L}   (1349)

where the gradient ∇ is with respect to matrix X.


Gradient of vector-valued function g(X) : R^{K×L} → R^N on matrix domain is a cubix

∇g(X) ≜ [ ∇_{X(:,1)} g_1(X)  ∇_{X(:,1)} g_2(X)  ⋯  ∇_{X(:,1)} g_N(X) ;
          ∇_{X(:,2)} g_1(X)  ∇_{X(:,2)} g_2(X)  ⋯  ∇_{X(:,2)} g_N(X) ;
          ⋮ ;
          ∇_{X(:,L)} g_1(X)  ∇_{X(:,L)} g_2(X)  ⋯  ∇_{X(:,L)} g_N(X) ]
 = [ ∇g_1(X)  ∇g_2(X)  ⋯  ∇g_N(X) ] ∈ R^{K×N×L}   (1350)

while the second-order gradient has a five-dimensional representation;

∇²g(X) ≜ [ ∇∇_{X(:,1)} g_1(X)  ⋯  ∇∇_{X(:,L)} g_N(X) ]
 = [ ∇²g_1(X)  ∇²g_2(X)  ⋯  ∇²g_N(X) ] ∈ R^{K×N×L×K×L}   (1351)

The gradient of matrix-valued function g(X) : R^{K×L} → R^{M×N} on matrix domain has a four-dimensional representation called quartix

∇g(X) ≜ [ ∇g_11(X)  ∇g_12(X)  ⋯  ∇g_1N(X) ;
          ⋮ ;
          ∇g_M1(X)  ∇g_M2(X)  ⋯  ∇g_MN(X) ] ∈ R^{M×N×K×L}   (1352)

while the second-order gradient has six-dimensional representation

∇²g(X) ≜ [ ∇²g_11(X)  ⋯  ∇²g_MN(X) ] ∈ R^{M×N×K×L×K×L}   (1353)

and so on.


D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X)

∇_X ( f(X)^T g(X) ) = ∇_X(f) g + ∇_X(g) f   (1354)

while [34, §8.3] [202]

∇_X tr( f(X)^T g(X) ) = ∇_X ( tr( f(X)^T g(Z) ) + tr( g(X) f(Z)^T ) ) |_{Z←X}   (1355)

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose f(X) : R^{2×2} → R² = X^T a and g(X) : R^{2×2} → R² = Xb. We wish to find

∇_X ( f(X)^T g(X) ) = ∇_X a^T X² b   (1356)

using the product rule. Formula (1354) calls for

∇_X a^T X² b = ∇_X(X^T a) Xb + ∇_X(Xb) X^T a   (1357)

Consider the first of the two terms:

∇_X(f) g = ∇_X(X^T a) Xb = [ ∇(X^T a)_1  ∇(X^T a)_2 ] Xb   (1358)

The gradient of X^T a forms a cubix in R^{2×2×2}; each of its two 2×2 slices, one per entry of the result, then multiplies the vector Xb:

∇_X(X^T a) Xb = [ ∂(X^T a)_1/∂X  ∂(X^T a)_2/∂X ] [ (Xb)_1 ; (Xb)_2 ] ∈ R^{2×1×2}   (1359)


Because gradient of the product (1356) requires total change with respect to change in each entry of matrix X, the Xb vector must make an inner product with each vector in the second dimension of the cubix (indicated by dotted line segments in the original typesetting);

∇_X(X^T a) Xb = [ a_1 0 ; 0 a_1 ∥ a_2 0 ; 0 a_2 ] [ b_1 X_11 + b_2 X_12 ; b_1 X_21 + b_2 X_22 ] ∈ R^{2×1×2}

 = [ a_1(b_1 X_11 + b_2 X_12)  a_1(b_1 X_21 + b_2 X_22) ;
     a_2(b_1 X_11 + b_2 X_12)  a_2(b_1 X_21 + b_2 X_22) ] ∈ R^{2×2}

 = ab^T X^T   (1360)

where the cubix appears as a complete 2×2×2 matrix. In like manner for the second term ∇_X(g) f

∇_X(Xb) X^T a = [ b_1 0 ; b_2 0 ∥ 0 b_1 ; 0 b_2 ] [ X_11 a_1 + X_21 a_2 ; X_12 a_1 + X_22 a_2 ] ∈ R^{2×1×2}

 = X^T ab^T ∈ R^{2×2}   (1361)

The solution

∇_X a^T X² b = ab^T X^T + X^T ab^T   (1362)

can be found from Table D.2.1 or verified using (1355).  ∎

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional representations, such as the cubix or quartix, is to first vectorize matrices as in (29). This device gives rise to the Kronecker product of matrices ⊗; a.k.a. direct product or tensor product. Although it sees reversal in the literature, [208, §2.1] we adopt the definition: for A ∈ R^{m×n} and B ∈ R^{p×q}

B ⊗ A ≜ [ B_11 A  B_12 A  ⋯  B_1q A ;
          B_21 A  B_22 A  ⋯  B_2q A ;
          ⋮ ;
          B_p1 A  B_p2 A  ⋯  B_pq A ] ∈ R^{pm×qn}   (1363)
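Result (1362) submits to a finite-difference check; this Matlab sketch compares the closed form against centered differences, one entry of X at a time:

   a = randn(2,1); b = randn(2,1); X = randn(2);
   G = a*b'*X' + X'*a*b';                    % claimed gradient (1362)
   h = 1e-6;  Gnum = zeros(2);
   for k = 1:2
     for l = 1:2
       E = zeros(2);  E(k,l) = h;            % perturb one entry of X
       Gnum(k,l) = (a'*(X+E)^2*b - a'*(X-E)^2*b)/(2*h);
     end
   end
   norm(G - Gnum,'fro')                      % ~ 1e-9 or smaller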


One advantage to vectorization is existence of a traditional two-dimensional matrix representation for the second-order gradient of a real function with respect to a vectorized matrix. For example, from §A.1.1 no.22 (§D.2.1) for square A, B ∈ R^{n×n} [94, §5.2] [10, §3]

∇²_{vec X} tr(AXBX^T) = ∇²_{vec X} vec(X)^T (B^T ⊗ A) vec X = B ⊗ A^T + B^T ⊗ A ∈ R^{n²×n²}   (1364)

To disadvantage is a large new but known set of algebraic rules and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X) [135, §15.7]

∇_X g( f(X)^T ) = ∇_X f^T ∇_f g   (1365)

∇²_X g( f(X)^T ) = ∇_X ( ∇_X f^T ∇_f g ) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f   (1366)

D.1.3.1 Two arguments

∇_X g( f(X)^T, h(X)^T ) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g   (1367)

D.1.3.1.1 Example. Chain rule for two arguments. [27, §1.1]

g( f(x)^T, h(x)^T ) = (f(x) + h(x))^T A (f(x) + h(x))   (1368)

f(x) = [x_1 ; εx_2] ,  h(x) = [εx_1 ; x_2]   (1369)

∇_x g( f(x)^T, h(x)^T ) = [1 0 ; 0 ε](A + A^T)(f + h) + [ε 0 ; 0 1](A + A^T)(f + h)   (1370)

∇_x g( f(x)^T, h(x)^T ) = [1+ε 0 ; 0 1+ε](A + A^T)( [x_1 ; εx_2] + [εx_1 ; x_2] )   (1371)


from Table D.2.1.

lim_{ε→0} ∇_x g( f(x)^T, h(x)^T ) = (A + A^T)x   (1372)

These formulae remain correct when the gradients produce hyperdimensional representations:

D.1.4 First directional derivative

Assume that a differentiable function g(X) : R^{K×L} → R^{M×N} has continuous first- and second-order gradients ∇g and ∇²g over dom g which is an open set. We seek simple expressions for the first and second directional derivatives in direction Y ∈ R^{K×L}; written →Y dg ∈ R^{M×N} and →Y dg² ∈ R^{M×N} respectively (the direction Y surmounts dg in the original typesetting).

Assuming that the limit exists, we may state the partial derivative of the mn-th entry of g with respect to the kl-th entry of X;

∂g_mn(X)/∂X_kl = lim_{Δt→0} ( g_mn(X + Δt e_k e_l^T) − g_mn(X) ) / Δt ∈ R   (1373)

where e_k is the k-th standard basis vector in R^K while e_l is the l-th standard basis vector in R^L. The total number of partial derivatives equals KLMN while the gradient is defined in their terms; the mn-th entry of the gradient is

∇g_mn(X) = [ ∂g_mn(X)/∂X_11  ⋯  ∂g_mn(X)/∂X_1L ;
             ⋮ ;
             ∂g_mn(X)/∂X_K1  ⋯  ∂g_mn(X)/∂X_KL ] ∈ R^{K×L}   (1374)

while the gradient is a quartix

∇g(X) = [ ∇g_11(X)  ∇g_12(X)  ⋯  ∇g_1N(X) ;
          ⋮ ;
          ∇g_M1(X)  ∇g_M2(X)  ⋯  ∇g_MN(X) ] ∈ R^{M×N×K×L}   (1375)


By simply rotating our perspective of the four-dimensional representation of the gradient matrix, we find one of three useful transpositions of this quartix (connoted T_1):

∇g(X)^{T_1} = [ ∂g(X)/∂X_11  ⋯  ∂g(X)/∂X_1L ;
                ⋮ ;
                ∂g(X)/∂X_K1  ⋯  ∂g(X)/∂X_KL ] ∈ R^{K×L×M×N}   (1376)

When the limit for Δt ∈ R exists, it is easy to show by substitution of variables in (1373)

( ∂g_mn(X)/∂X_kl ) Y_kl = lim_{Δt→0} ( g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ) / Δt ∈ R   (1377)

which may be interpreted as the change in g_mn at X when the change in X_kl is equal to Y_kl, the kl-th entry of any Y ∈ R^{K×L}. Because the total change in g_mn(X) due to Y is the sum of change with respect to each and every X_kl, the mn-th entry of the directional derivative is the corresponding total differential [135, §15.8]

dg_mn(X)|_{dX→Y} = Σ_{k,l} ( ∂g_mn(X)/∂X_kl ) Y_kl = tr( ∇g_mn(X)^T Y )   (1378)
 = Σ_{k,l} lim_{Δt→0} ( g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ) / Δt   (1379)
 = lim_{Δt→0} ( g_mn(X + Δt Y) − g_mn(X) ) / Δt   (1380)
 = d/dt |_{t=0} g_mn(X + tY)   (1381)

where t ∈ R. Assuming finite Y, equation (1380) is called the Gâteaux differential [26, App.A.5] [123, §D.2.1] [231, §5.28] whose existence is implied by the existence of the Fréchet differential, the sum in (1378). [154, §7.2] Each may be understood as the change in g_mn at X when the change in X is equal


in magnitude and direction to Y.^{D.2} Hence the directional derivative,

→Y dg(X) ≜ [ dg_11(X)  dg_12(X)  ⋯  dg_1N(X) ;
             ⋮ ;
             dg_M1(X)  dg_M2(X)  ⋯  dg_MN(X) ] ∈ R^{M×N} |_{dX→Y}

 = [ tr( ∇g_11(X)^T Y )  ⋯  tr( ∇g_1N(X)^T Y ) ;
     ⋮ ;
     tr( ∇g_M1(X)^T Y )  ⋯  tr( ∇g_MN(X)^T Y ) ]

 = [ Σ_{k,l} (∂g_11(X)/∂X_kl) Y_kl  ⋯  Σ_{k,l} (∂g_1N(X)/∂X_kl) Y_kl ;
     ⋮ ;
     Σ_{k,l} (∂g_M1(X)/∂X_kl) Y_kl  ⋯  Σ_{k,l} (∂g_MN(X)/∂X_kl) Y_kl ]   (1382)

from which it follows

→Y dg(X) = Σ_{k,l} ( ∂g(X)/∂X_kl ) Y_kl   (1383)

Yet for all X ∈ dom g, any Y ∈ R^{K×L}, and some open interval of t ∈ R

g(X + tY) = g(X) + t →Y dg(X) + o(t²)   (1384)

which is the first-order Taylor series expansion about X. [135, §18.4] [83, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second term of the expansion. Thus differentiating and zeroing g(X + tY) in t is an operation equivalent to individually differentiating and zeroing every entry g_mn(X + tY) as in (1381). So the directional derivative of g(X) in any direction Y ∈ R^{K×L} evaluated at X ∈ dom g becomes

→Y dg(X) = d/dt |_{t=0} g(X + tY) ∈ R^{M×N}   (1385)

[174, §2.1, §5.4.5] [24, §6.3.1] which is simplest.

^{D.2} Although Y is a matrix, we may regard it as a vector in R^{KL}.
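Identity (1385) reduces a matrix directional derivative to ordinary one-dimensional calculus. A Matlab sketch for the (hypothetical, not from the text) example g(X) = X^T X, whose directional derivative in direction Y is Y^T X + X^T Y:

   K = 3;  X = randn(K);  Y = randn(K);
   g  = @(Z) Z'*Z;
   dg = Y'*X + X'*Y;                         % analytic d/dt g(X+tY) at t = 0
   t = 1e-6;
   dgnum = (g(X + t*Y) - g(X - t*Y))/(2*t);  % centered difference in t  (1385)
   norm(dg - dgnum,'fro')                    % ~ 1e-9 or smaller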


[Figure 96 appears here: a convex quadratic bowl f̂(x) = x^T x over an open disc in R², sliced by plane ∂H; the tangent line T at (α, f̂(α)) has slope equal to the directional derivative; vector υ ≜ [ ∇_x f̂(α) ; (1/2) →∇_x f̂(α) df̂(α) ] is drawn at the point of tangency.]

Figure 96: Drawn is a convex quadratic bowl in R² × R; f̂(x) = x^T x : R² → R versus x on some open disc in R². Plane slice ∂H is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector y. Tangent line T slope at point (α, f̂(α)) is directional derivative value ∇_x f̂(α)^T y (1412) at α in slice direction y. Recall, negative gradient −∇_x f̂(x) ∈ R² is always steepest descent direction [245]. [135, §15.6] When vector υ ∈ R³ entry υ_3 is half directional derivative in gradient direction at α, and when [υ_1 ; υ_2] = ∇_x f̂(α), then −υ points directly toward bowl bottom. [174, §2.1, §5.4.5] [24, §6.3.1]

The derivative with respect to t makes the directional derivative (1385) resemble ordinary calculus (§D.2); e.g., when g(X) is linear, →Y dg(X) = g(Y). [154, §7.2]

D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function f̂(X) : R^{K×L} → R, the directional derivative of f̂(X) at X in any direction Y yields the slope of f̂ along the line X + tY through its domain (parametrized by t ∈ R) evaluated at t = 0. For higher-dimensional functions, by (1382), this slope interpretation can be applied to each entry of the directional derivative. Unlike the gradient, directional derivative does not expand dimension; e.g., directional derivative in (1385) retains the dimensions of g.


Figure 96, for example, shows a plane slice of a real convex bowl-shaped function f̂(x) along a line α + ty through its domain. The slice reveals a one-dimensional real function of t: f̂(α + ty). The directional derivative at x = α in direction y is the slope of f̂(α + ty) with respect to t at t = 0. In the case of a real function having vector argument h(X) : R^K → R, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (1412) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.1.1.4)

D.1.4.1.1 Theorem. Directional derivative condition for optimization. [154, §7.4]
Suppose f̂(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{p×k} by X*, and the directional derivative of f̂ exists there. Then for all X ∈ C

→X−X* df̂(X*) ≥ 0   (1386)  ⋄

D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 96)

f̂(x) : R^K → R ≜ (x − a)^T (x − a) − b   (1387)

has function offset −b ∈ R, axis of revolution at x = a, and positive definite Hessian (1343) everywhere in its domain (an open hyperdisc in R^K); id est, strictly convex quadratic f̂(x) has unique global minimum equal to −b at x = a. A vector −υ based anywhere in dom f̂ × R pointing toward the unique bowl-bottom is specified:

υ ∝ [ x − a ; f̂(x) + b ] ∈ R^K × R   (1388)

Such a vector is

υ = [ ∇_x f̂(x) ; (1/2) →∇_x f̂(x) df̂(x) ]   (1389)

since the gradient is

∇_x f̂(x) = 2(x − a)   (1390)


and the directional derivative in the direction of the gradient is (1412)

$$\overset{\to\nabla_{x}\hat f(x)}{d\hat f}(x) = \nabla_{x}\hat f(x)^{T}\,\nabla_{x}\hat f(x) = 4(x-a)^{T}(x-a) = 4\left(\hat f(x)+b\right) \qquad(1391)$$

D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ on open domain,

$$\nabla\frac{\partial g_{mn}(X)}{\partial X_{kl}} = \frac{\partial\nabla g_{mn}(X)}{\partial X_{kl}} = \begin{bmatrix} \frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\partial X_{11}} & \cdots & \frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\partial X_{1L}}\\ \vdots & & \vdots\\ \frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\partial X_{K1}} & \cdots & \frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L} \qquad(1392)$$

$$\nabla^{2}g_{mn}(X) = \begin{bmatrix} \nabla\frac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{1L}}\\ \vdots & & \vdots\\ \nabla\frac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{KL}} \end{bmatrix} = \begin{bmatrix} \frac{\partial\nabla g_{mn}(X)}{\partial X_{11}} & \cdots & \frac{\partial\nabla g_{mn}(X)}{\partial X_{1L}}\\ \vdots & & \vdots\\ \frac{\partial\nabla g_{mn}(X)}{\partial X_{K1}} & \cdots & \frac{\partial\nabla g_{mn}(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L} \qquad(1393)$$

Rotating our perspective, we get several views of the second-order gradient:

$$\nabla^{2}g(X) = \begin{bmatrix} \nabla^{2}g_{11}(X) & \cdots & \nabla^{2}g_{1N}(X)\\ \vdots & & \vdots\\ \nabla^{2}g_{M1}(X) & \cdots & \nabla^{2}g_{MN}(X) \end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L} \qquad(1394)$$


$$\nabla^{2}g(X)^{T_{1}} = \begin{bmatrix} \nabla\frac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{1L}}\\ \vdots & & \vdots\\ \nabla\frac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N\times K\times L} \qquad(1395)$$

$$\nabla^{2}g(X)^{T_{2}} = \begin{bmatrix} \frac{\partial\nabla g(X)}{\partial X_{11}} & \cdots & \frac{\partial\nabla g(X)}{\partial X_{1L}}\\ \vdots & & \vdots\\ \frac{\partial\nabla g(X)}{\partial X_{K1}} & \cdots & \frac{\partial\nabla g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L\times M\times N} \qquad(1396)$$

Assuming the limits exist, we may state the partial derivative of the $mn^{\text{th}}$ entry of $g$ with respect to the $kl^{\text{th}}$ and $ij^{\text{th}}$ entries of $X$:

$$\frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} = \lim_{\Delta\tau,\Delta t\to 0}\frac{g_{mn}(X+\Delta t\,e_{k}e_{l}^{T}+\Delta\tau\,e_{i}e_{j}^{T}) - g_{mn}(X+\Delta t\,e_{k}e_{l}^{T}) - \left(g_{mn}(X+\Delta\tau\,e_{i}e_{j}^{T}) - g_{mn}(X)\right)}{\Delta\tau\,\Delta t} \qquad(1397)$$

Differentiating (1377) and then scaling by $Y_{ij}$,

$$\frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij} = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\left(\frac{\partial g_{mn}(X+\Delta t\,Y_{kl}e_{k}e_{l}^{T})}{\partial X_{ij}} - \frac{\partial g_{mn}(X)}{\partial X_{ij}}\right)Y_{ij}$$
$$= \lim_{\Delta\tau,\Delta t\to 0}\frac{g_{mn}(X+\Delta t\,Y_{kl}e_{k}e_{l}^{T}+\Delta\tau\,Y_{ij}e_{i}e_{j}^{T}) - g_{mn}(X+\Delta t\,Y_{kl}e_{k}e_{l}^{T}) - \left(g_{mn}(X+\Delta\tau\,Y_{ij}e_{i}e_{j}^{T}) - g_{mn}(X)\right)}{\Delta\tau\,\Delta t} \qquad(1398)$$

which can be proved by substitution of variables in (1397). The $mn^{\text{th}}$ second-order total differential due to any $Y\in\mathbb{R}^{K\times L}$ is

$$d^{2}g_{mn}(X)\big|_{dX\to Y} = \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij} = \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g_{mn}(X)^{T}Y\right)^{T}Y\right) \qquad(1399)$$

$$= \lim_{\Delta t\to 0}\sum_{i,j}\frac{1}{\Delta t}\left(\frac{\partial g_{mn}(X+\Delta t\,Y)}{\partial X_{ij}} - \frac{\partial g_{mn}(X)}{\partial X_{ij}}\right)Y_{ij} \qquad(1400)$$

$$= \lim_{\Delta t\to 0}\frac{g_{mn}(X+2\Delta t\,Y) - 2g_{mn}(X+\Delta t\,Y) + g_{mn}(X)}{\Delta t^{2}} \qquad(1401)$$

$$= \left.\frac{d^{2}}{dt^{2}}\right|_{t=0} g_{mn}(X+tY) \qquad(1402)$$

Hence the second directional derivative,


$$\overset{\to Y}{dg^{2}}(X) \,\triangleq\, \begin{bmatrix} d^{2}g_{11}(X) & \cdots & d^{2}g_{1N}(X)\\ \vdots & & \vdots\\ d^{2}g_{M1}(X) & \cdots & d^{2}g_{MN}(X) \end{bmatrix}\Bigg|_{dX\to Y} \in \mathbb{R}^{M\times N}$$

$$= \begin{bmatrix} \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g_{11}(X)^{T}Y\right)^{T}Y\right) & \cdots & \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g_{1N}(X)^{T}Y\right)^{T}Y\right)\\ \vdots & & \vdots\\ \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g_{M1}(X)^{T}Y\right)^{T}Y\right) & \cdots & \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g_{MN}(X)^{T}Y\right)^{T}Y\right) \end{bmatrix}$$

$$= \begin{bmatrix} \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g_{11}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g_{1N}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij}\\ \vdots & & \vdots\\ \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g_{M1}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g_{MN}(X)}{\partial X_{kl}\partial X_{ij}}Y_{kl}Y_{ij} \end{bmatrix} \qquad(1403)$$

from which it follows

$$\overset{\to Y}{dg^{2}}(X) = \sum_{i,j}\sum_{k,l}\frac{\partial^{2}g(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij} = \sum_{i,j}\frac{\partial}{\partial X_{ij}}\,\overset{\to Y}{dg}(X)\,Y_{ij} \qquad(1404)$$

Yet for all $X\in\operatorname{dom}g$, any $Y\in\mathbb{R}^{K\times L}$, and some open interval of $t\in\mathbb{R}$,

$$g(X+tY) = g(X) + t\,\overset{\to Y}{dg}(X) + \frac{1}{2!}t^{2}\,\overset{\to Y}{dg^{2}}(X) + o(t^{3}) \qquad(1405)$$

which is the second-order Taylor series expansion about $X$. [135, §18.4] [83, §2.3.4] Differentiating twice with respect to $t$ and subsequent $t$-zeroing isolates the third term of the expansion. Thus differentiating and zeroing $g(X+tY)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X+tY)$ as in (1402). So the second directional derivative becomes

$$\overset{\to Y}{dg^{2}}(X) = \left.\frac{d^{2}}{dt^{2}}\right|_{t=0} g(X+tY) \;\in\; \mathbb{R}^{M\times N} \qquad(1406)$$

[174, §2.1, §5.4.5] [24, §6.3.1] which is again simplest. (confer (1385))
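As with (1385), the $t$-derivative form (1406) invites a quick numerical check. The sketch below is our own; the cubic test function and sizes are assumptions. Expanding $(X+tY)^{3}$ in $t$ shows its second directional derivative is $2(XY^{2}+YXY+Y^{2}X)$, which a second central difference recovers exactly up to roundoff.

\begin{verbatim}
% sketch (ours) of (1406) for g(X) = X^3, square X
n = 4;
X = randn(n); Y = randn(n);
d2_exact = 2*(X*Y^2 + Y*X*Y + Y^2*X);
g = @(Z) Z^3;
t = 1e-4;                                   % second central difference
d2_fd = (g(X + t*Y) - 2*g(X) + g(X - t*Y)) / t^2;
disp(norm(d2_exact - d2_fd, 'fro'))         % small, roundoff only
\end{verbatim}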


D.1.6 Taylor series

Series expansions of the differentiable matrix-valued function $g(X)$, of matrix argument, were given earlier in (1384) and (1405). Assuming $g(X)$ has continuous first-, second-, and third-order gradients over the open set $\operatorname{dom}g$, then for $X\in\operatorname{dom}g$ and any $Y\in\mathbb{R}^{K\times L}$ the complete Taylor series on some open interval of $\mu\in\mathbb{R}$ is expressed

$$g(X+\mu Y) = g(X) + \mu\,\overset{\to Y}{dg}(X) + \frac{1}{2!}\mu^{2}\,\overset{\to Y}{dg^{2}}(X) + \frac{1}{3!}\mu^{3}\,\overset{\to Y}{dg^{3}}(X) + o(\mu^{4}) \qquad(1407)$$

or on some open interval of $\|Y\|$,

$$g(Y) = g(X) + \overset{\to Y-X}{dg}(X) + \frac{1}{2!}\,\overset{\to Y-X}{dg^{2}}(X) + \frac{1}{3!}\,\overset{\to Y-X}{dg^{3}}(X) + o(\|Y\|^{4}) \qquad(1408)$$

which are third-order expansions about $X$. The mean value theorem from calculus is what ensures the finite order of the series. [27, §1.1] [26, App. A.5] [123, §0.4] [135]

In the case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$, all the directional derivatives are in $\mathbb{R}$:

$$\overset{\to Y}{dg}(X) = \operatorname{tr}\!\left(\nabla g(X)^{T}Y\right) \qquad(1409)$$

$$\overset{\to Y}{dg^{2}}(X) = \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g(X)^{T}Y\right)^{T}Y\right) = \operatorname{tr}\!\left(\nabla_{X}\,\overset{\to Y}{dg}(X)^{T}\,Y\right) \qquad(1410)$$

$$\overset{\to Y}{dg^{3}}(X) = \operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla g(X)^{T}Y\right)^{T}Y\right)^{T}Y\right) = \operatorname{tr}\!\left(\nabla_{X}\,\overset{\to Y}{dg^{2}}(X)^{T}\,Y\right) \qquad(1411)$$

In the case $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument, they further simplify:

$$\overset{\to Y}{dg}(X) = \nabla g(X)^{T}Y \qquad(1412)$$
$$\overset{\to Y}{dg^{2}}(X) = Y^{T}\nabla^{2}g(X)\,Y \qquad(1413)$$
$$\overset{\to Y}{dg^{3}}(X) = \nabla_{X}\!\left(Y^{T}\nabla^{2}g(X)\,Y\right)^{T}Y \qquad(1414)$$

and so on.

D.1.6.0.1 Exercise. log det. (confer [38, p.644])
Find the first two terms of the Taylor series expansion (1408) for $\log\det X$.


D.1.7 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between the gradient with respect to matrix $X$ and the derivative with respect to real variable $t$:

D.1.7.1 first-order

Removing from (1385) the evaluation at $t=0$,$^{\text{D.3}}$ we find an expression for the directional derivative of $g(X)$ in direction $Y$ evaluated anywhere along a line $X+tY$ (parametrized by $t$) intersecting $\operatorname{dom}g$:

$$\overset{\to Y}{dg}(X+tY) = \frac{d}{dt}\,g(X+tY) \qquad(1415)$$

In the general case $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$, from (1378) and (1381) we find

$$\operatorname{tr}\!\left(\nabla_{X}g_{mn}(X+tY)^{T}\,Y\right) = \frac{d}{dt}\,g_{mn}(X+tY) \qquad(1416)$$

which is valid at $t=0$, of course, when $X\in\operatorname{dom}g$. In the important case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$, from (1409) we have simply

$$\operatorname{tr}\!\left(\nabla_{X}g(X+tY)^{T}\,Y\right) = \frac{d}{dt}\,g(X+tY) \qquad(1417)$$

When, additionally, $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument,

$$\nabla_{X}g(X+tY)^{T}\,Y = \frac{d}{dt}\,g(X+tY) \qquad(1418)$$

D.3: Justified by replacing $X$ with $X+tY$ in (1378)-(1380); beginning,

$$dg_{mn}(X+tY)\big|_{dX\to Y} = \sum_{k,l}\frac{\partial g_{mn}(X+tY)}{\partial X_{kl}}\,Y_{kl}$$


D.1.7.1.1 Example. Gradient.
$g(X)=w^{T}X^{T}Xw$, $X\in\mathbb{R}^{K\times L}$, $w\in\mathbb{R}^{L}$. Using the tables in §D.2,

$$\operatorname{tr}\!\left(\nabla_{X}g(X+tY)^{T}\,Y\right) = \operatorname{tr}\!\left(2ww^{T}(X^{T}+tY^{T})Y\right) \qquad(1419)$$
$$= 2w^{T}(X^{T}Y + tY^{T}Y)\,w \qquad(1420)$$

Applying the equivalence (1417),

$$\frac{d}{dt}\,g(X+tY) = \frac{d}{dt}\,w^{T}(X+tY)^{T}(X+tY)\,w \qquad(1421)$$
$$= w^{T}\left(X^{T}Y + Y^{T}X + 2tY^{T}Y\right)w \qquad(1422)$$
$$= 2w^{T}(X^{T}Y + tY^{T}Y)\,w \qquad(1423)$$

which is the same as (1420); hence, equivalence is demonstrated.

It is easy to extract $\nabla g(X)$ from (1423) knowing only (1417):

$$\begin{aligned} \operatorname{tr}\!\left(\nabla_{X}g(X+tY)^{T}\,Y\right) &= 2w^{T}(X^{T}Y+tY^{T}Y)w = 2\operatorname{tr}\!\left(ww^{T}(X^{T}+tY^{T})Y\right)\\ \operatorname{tr}\!\left(\nabla_{X}g(X)^{T}\,Y\right) &= 2\operatorname{tr}\!\left(ww^{T}X^{T}Y\right)\\ &\Leftrightarrow\\ \nabla_{X}g(X) &= 2Xww^{T} \end{aligned} \qquad(1424)$$

D.1.7.2 second-order

Likewise removing the evaluation at $t=0$ from (1406),

$$\overset{\to Y}{dg^{2}}(X+tY) = \frac{d^{2}}{dt^{2}}\,g(X+tY) \qquad(1425)$$

we can find a similar relationship between the second-order gradient and the second derivative: in the general case $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$, from (1399) and (1402),

$$\operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla_{X}g_{mn}(X+tY)^{T}Y\right)^{T}Y\right) = \frac{d^{2}}{dt^{2}}\,g_{mn}(X+tY) \qquad(1426)$$

In the case of a real function $g(X):\mathbb{R}^{K\times L}\to\mathbb{R}$ we have, of course,

$$\operatorname{tr}\!\left(\nabla_{X}\operatorname{tr}\!\left(\nabla_{X}g(X+tY)^{T}Y\right)^{T}Y\right) = \frac{d^{2}}{dt^{2}}\,g(X+tY) \qquad(1427)$$
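Both correspondences are easy to exercise numerically on the function of Example D.1.7.1.1. The Matlab sketch below is ours (sizes, data, and step are assumptions): it checks the first-order identity (1417) against the gradient $2Xww^{T}$ from (1424), and the second-order identity (1427), whose double-trace expression evaluates to $2w^{T}Y^{T}Yw$ for this particular $g$.

\begin{verbatim}
% sketch (ours) of (1417) and (1427) for g(X) = w'*X'*X*w
K = 5; L = 3;
X = randn(K,L); Y = randn(K,L); w = randn(L,1);
g = @(t) w'*((X+t*Y)'*(X+t*Y))*w;
G = 2*X*(w*w');                        % gradient at X, from (1424)
h = 1e-5;
d1 = (g(h) - g(-h)) / (2*h);           % d/dt g(X+tY) at t=0
disp(abs(d1 - trace(G'*Y)))            % first-order match, (1417)
d2 = (g(h) - 2*g(0) + g(-h)) / h^2;    % d^2/dt^2 at t=0
disp(abs(d2 - 2*w'*(Y'*Y)*w))          % second-order match, (1427)
\end{verbatim}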


From (1413), the simpler case, where the real function $g(X):\mathbb{R}^{K}\to\mathbb{R}$ has vector argument,

$$Y^{T}\,\nabla_{X}^{2}\,g(X+tY)\,Y = \frac{d^{2}}{dt^{2}}\,g(X+tY) \qquad(1428)$$

D.1.7.2.1 Example. Second-order gradient.
Given real function $g(X)=\log\det X$ having domain $\operatorname{int}\mathbb{S}^{K}_{+}$, we want to find $\nabla^{2}g(X)\in\mathbb{R}^{K\times K\times K\times K}$. From the tables in §D.2,

$$h(X) \,\triangleq\, \nabla g(X) = X^{-1} \in \operatorname{int}\mathbb{S}^{K}_{+} \qquad(1429)$$

so $\nabla^{2}g(X)=\nabla h(X)$. By (1416) and (1384), for $Y\in\mathbb{S}^{K}$

$$\operatorname{tr}\!\left(\nabla h_{mn}(X)^{T}\,Y\right) = \left.\frac{d}{dt}\right|_{t=0} h_{mn}(X+tY) \qquad(1430)$$
$$= \left(\left.\frac{d}{dt}\right|_{t=0} h(X+tY)\right)_{mn} \qquad(1431)$$
$$= \left(\left.\frac{d}{dt}\right|_{t=0} (X+tY)^{-1}\right)_{mn} \qquad(1432)$$
$$= -\left(X^{-1}\,Y\,X^{-1}\right)_{mn} \qquad(1433)$$

Setting $Y$ to a member of the standard basis $E_{kl}=e_{k}e_{l}^{T}$, for $k,l\in\{1\ldots K\}$, and employing a property of the trace function (31), we find

$$\nabla^{2}g(X)_{mnkl} = \operatorname{tr}\!\left(\nabla h_{mn}(X)^{T}E_{kl}\right) = \nabla h_{mn}(X)_{kl} = -\left(X^{-1}E_{kl}X^{-1}\right)_{mn} \qquad(1434)$$

$$\nabla^{2}g(X)_{kl} = \nabla h(X)_{kl} = -\left(X^{-1}E_{kl}X^{-1}\right) \in \mathbb{R}^{K\times K} \qquad(1435)$$

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary $t$ (in some open interval) but only after the differentiation.
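Result (1435) also admits a direct numerical check. The sketch below is our own illustration (test point and indices are assumptions): perturbing $X$ in the direction $E_{kl}$ changes the gradient $h(X)=X^{-1}$ by $-X^{-1}E_{kl}X^{-1}$ to first order.

\begin{verbatim}
% sketch (ours) of (1435) for g(X) = log det X, h(X) = inv(X)
K = 4; k = 2; l = 3;
A = randn(K); X = A*A' + K*eye(K);     % a point in int S^K_+
Ekl = zeros(K); Ekl(k,l) = 1;
d_exact = -inv(X)*Ekl*inv(X);          % (1435)
t = 1e-6;
d_fd = (inv(X + t*Ekl) - inv(X - t*Ekl)) / (2*t);
disp(norm(d_exact - d_fd, 'fro'))      % ~1e-9
\end{verbatim}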


D.2 Tables of gradients and derivatives

[94] [42]

- When proving results for symmetric matrices algebraically, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward.

- $a,b\in\mathbb{R}^{n}$; $x,y\in\mathbb{R}^{k}$; $A,B\in\mathbb{R}^{m\times n}$; $X,Y\in\mathbb{R}^{K\times L}$; $t,\mu\in\mathbb{R}$; $i,j,k,l,K,L,m,n,M,N$ are integers, unless otherwise noted.

- $x^{\mu}$ means $\delta(\delta(x)^{\mu})$ for $\mu\in\mathbb{R}$; id est, entrywise vector exponentiation. $\delta$ is the main-diagonal linear operator (1024). $x^{0}\triangleq\mathbf{1}$, $X^{0}\triangleq I$ if square.

- $\dfrac{d}{dx} \triangleq \begin{bmatrix}\frac{d}{dx_{1}}\\ \vdots\\ \frac{d}{dx_{k}}\end{bmatrix}$; $\overset{\to y}{dg}(x)$, $\overset{\to y}{dg^{2}}(x)$ (directional derivatives, §D.1), $\log x$, $\operatorname{sgn}x$, $\sin x$, $x/y$ (Hadamard quotient), $\sqrt{x}$ (entrywise square root), etcetera, are maps $f:\mathbb{R}^{k}\to\mathbb{R}^{k}$ that maintain dimension; e.g., (§A.1.1)

$$\frac{d}{dx}\,x^{-1} \,\triangleq\, \nabla_{x}\,\mathbf{1}^{T}\delta(x)^{-1}\mathbf{1} \qquad(1436)$$

- The standard basis: $\{E_{kl}=e_{k}e_{l}^{T}\in\mathbb{R}^{K\times K} \mid k,l\in\{1\ldots K\}\}$

- For $A$ a scalar or matrix, we have the Taylor series [44, §3.6]

$$e^{A} = \sum_{k=0}^{\infty}\frac{1}{k!}A^{k} \qquad(1437)$$

Further, [212, §5.4]

$$e^{A}\succ 0 \quad \forall A\in\mathbb{S}^{m} \qquad(1438)$$

- For all square $A$ and integer $k$,

$$\det^{k}\!A = \det A^{k} \qquad(1439)$$

- Table entries with notation $X\in\mathbb{R}^{2\times 2}$ have been algebraically verified in that dimension but may hold more broadly.


D.2.1 Algebraic

$\nabla_{x}\,x = \nabla_{x}\,x^{T} = I \in\mathbb{R}^{k\times k}$ ; $\nabla_{X}X = \nabla_{X}X^{T} \triangleq I \in\mathbb{R}^{K\times L\times K\times L}$ (identity)

$\nabla_{x}(Ax-b) = A^{T}$
$\nabla_{x}\left(x^{T}A - b^{T}\right) = A$
$\nabla_{x}(Ax-b)^{T}(Ax-b) = 2A^{T}(Ax-b)$
$\nabla_{x}^{2}\,(Ax-b)^{T}(Ax-b) = 2A^{T}A$
$\nabla_{x}\left(x^{T}Ax + 2x^{T}By + y^{T}Cy\right) = (A+A^{T})x + 2By$
$\nabla_{x}^{2}\left(x^{T}Ax + 2x^{T}By + y^{T}Cy\right) = A+A^{T}$

$\nabla_{X}\,a^{T}Xb = \nabla_{X}\,b^{T}X^{T}a = ab^{T}$
$\nabla_{X}\,a^{T}X^{2}b = X^{T}ab^{T} + ab^{T}X^{T}$
$\nabla_{X}\,a^{T}X^{-1}b = -X^{-T}ab^{T}X^{-T}$
$\nabla_{X}(X^{-1})_{kl} = \dfrac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1}E_{kl}X^{-1}$, confer (1376)(1435)

$\nabla_{x}\,a^{T}x^{T}xb = 2xa^{T}b$ ; $\nabla_{X}\,a^{T}X^{T}Xb = X(ab^{T}+ba^{T})$
$\nabla_{x}\,a^{T}xx^{T}b = (ab^{T}+ba^{T})x$ ; $\nabla_{X}\,a^{T}XX^{T}b = (ab^{T}+ba^{T})X$
$\nabla_{x}\,a^{T}x^{T}xa = 2xa^{T}a$ ; $\nabla_{X}\,a^{T}X^{T}Xa = 2Xaa^{T}$
$\nabla_{x}\,a^{T}xx^{T}a = 2aa^{T}x$ ; $\nabla_{X}\,a^{T}XX^{T}a = 2aa^{T}X$
$\nabla_{x}\,a^{T}yx^{T}b = ba^{T}y$ ; $\nabla_{X}\,a^{T}YX^{T}b = ba^{T}Y$
$\nabla_{x}\,a^{T}y^{T}xb = yb^{T}a$ ; $\nabla_{X}\,a^{T}Y^{T}Xb = Yab^{T}$
$\nabla_{x}\,a^{T}xy^{T}b = ab^{T}y$ ; $\nabla_{X}\,a^{T}XY^{T}b = ab^{T}Y$
$\nabla_{x}\,a^{T}x^{T}yb = ya^{T}b$ ; $\nabla_{X}\,a^{T}X^{T}Yb = Yba^{T}$


Algebraic continued

$\dfrac{d}{dt}(X+tY) = Y$
$\dfrac{d}{dt}\,B^{T}(X+tY)^{-1}A = -B^{T}(X+tY)^{-1}\,Y\,(X+tY)^{-1}A$
$\dfrac{d}{dt}\,B^{T}(X+tY)^{-T}A = -B^{T}(X+tY)^{-T}\,Y^{T}(X+tY)^{-T}A$
$\dfrac{d}{dt}\,B^{T}(X+tY)^{\mu}A = \ldots$, $-1\le\mu\le 1$, $X,Y\in\mathbb{S}^{M}_{+}$
$\dfrac{d^{2}}{dt^{2}}\,B^{T}(X+tY)^{-1}A = 2B^{T}(X+tY)^{-1}\,Y\,(X+tY)^{-1}\,Y\,(X+tY)^{-1}A$
$\dfrac{d}{dt}\left((X+tY)^{T}A(X+tY)\right) = Y^{T}AX + X^{T}AY + 2tY^{T}AY$
$\dfrac{d^{2}}{dt^{2}}\left((X+tY)^{T}A(X+tY)\right) = 2Y^{T}AY$
$\dfrac{d}{dt}\left((X+tY)A(X+tY)\right) = YAX + XAY + 2tYAY$
$\dfrac{d^{2}}{dt^{2}}\left((X+tY)A(X+tY)\right) = 2YAY$

D.2.2 Trace Kronecker

$\nabla_{\operatorname{vec}X}\operatorname{tr}(AXBX^{T}) = \nabla_{\operatorname{vec}X}\operatorname{vec}(X)^{T}(B^{T}\!\otimes A)\operatorname{vec}X = (B\otimes A^{T} + B^{T}\!\otimes A)\operatorname{vec}X$
$\nabla^{2}_{\operatorname{vec}X}\operatorname{tr}(AXBX^{T}) = \nabla^{2}_{\operatorname{vec}X}\operatorname{vec}(X)^{T}(B^{T}\!\otimes A)\operatorname{vec}X = B\otimes A^{T} + B^{T}\!\otimes A$
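The Kronecker form above can be cross-checked in one line against the corresponding matrix-form entry of the trace table (§D.2.3). This Matlab sketch is our own consistency check (dimensions arbitrary); it relies only on the standard identity $\operatorname{vec}(AXB) = (B^{T}\!\otimes A)\operatorname{vec}X$.

\begin{verbatim}
% sketch (ours): Kronecker gradient of tr(A*X*B*X') versus
% the matrix form A'*X*B' + A*X*B from the trace table
m = 3; n = 4;
A = randn(m,m); B = randn(n,n); X = randn(m,n);
g_kron = (kron(B,A') + kron(B',A)) * X(:);
g_mat  = A'*X*B' + A*X*B;
disp(norm(g_kron - g_mat(:)))          % ~1e-14
\end{verbatim}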


D.2.3 Trace

$\nabla_{x}\,\mu x = \mu I$ ; $\nabla_{X}\operatorname{tr}\mu X = \nabla_{X}\,\mu\operatorname{tr}X = \mu I$
$\nabla_{x}\,\mathbf{1}^{T}\delta(x)^{-1}\mathbf{1} = \dfrac{d}{dx}x^{-1} = -x^{-2}$
$\nabla_{x}\,\mathbf{1}^{T}\delta(x)^{-1}y = -\delta(x)^{-2}y$
$\nabla_{X}\operatorname{tr}X^{-1} = -X^{-2T}$
$\nabla_{X}\operatorname{tr}(X^{-1}Y) = \nabla_{X}\operatorname{tr}(YX^{-1}) = -X^{-T}Y^{T}X^{-T}$
$\dfrac{d}{dx}x^{\mu} = \mu x^{\mu-1}$ ; $\nabla_{X}\operatorname{tr}X^{\mu} = \mu X^{(\mu-1)T}$, $X\in\mathbb{R}^{2\times 2}$
$\nabla_{X}\operatorname{tr}X^{j} = jX^{(j-1)T}$
$\nabla_{x}(b-a^{T}x)^{-1} = (b-a^{T}x)^{-2}a$ ; $\nabla_{X}\operatorname{tr}\!\left((B-AX)^{-1}\right) = \left((B-AX)^{-2}A\right)^{T}$
$\nabla_{x}(b-a^{T}x)^{\mu} = -\mu(b-a^{T}x)^{\mu-1}a$
$\nabla_{x}\,x^{T}y = \nabla_{x}\,y^{T}x = y$
$\nabla_{X}\operatorname{tr}(X^{T}Y) = \nabla_{X}\operatorname{tr}(YX^{T}) = \nabla_{X}\operatorname{tr}(Y^{T}X) = \nabla_{X}\operatorname{tr}(XY^{T}) = Y$
$\nabla_{X}\operatorname{tr}(AXBX^{T}) = \nabla_{X}\operatorname{tr}(XBX^{T}A) = A^{T}XB^{T} + AXB$
$\nabla_{X}\operatorname{tr}(AXBX) = \nabla_{X}\operatorname{tr}(XBXA) = A^{T}X^{T}B^{T} + B^{T}X^{T}A^{T}$
$\nabla_{X}\operatorname{tr}(AXAXAX) = \nabla_{X}\operatorname{tr}(XAXAXA) = 3(AXAXA)^{T}$
$\nabla_{X}\operatorname{tr}(YX^{k}) = \nabla_{X}\operatorname{tr}(X^{k}Y) = \sum_{i=0}^{k-1}\left(X^{i}YX^{k-1-i}\right)^{T}$
$\nabla_{X}\operatorname{tr}(Y^{T}XX^{T}Y) = \nabla_{X}\operatorname{tr}(X^{T}YY^{T}X) = 2\,YY^{T}X$
$\nabla_{X}\operatorname{tr}(Y^{T}X^{T}XY) = \nabla_{X}\operatorname{tr}(XYY^{T}X^{T}) = 2\,XYY^{T}$
$\nabla_{X}\operatorname{tr}\!\left((X+Y)^{T}(X+Y)\right) = 2(X+Y)$
$\nabla_{X}\operatorname{tr}\!\left((X+Y)(X+Y)\right) = 2(X+Y)^{T}$
$\nabla_{X}\operatorname{tr}(A^{T}XB) = \nabla_{X}\operatorname{tr}(X^{T}AB^{T}) = AB^{T}$
$\nabla_{X}\operatorname{tr}(A^{T}X^{-1}B) = \nabla_{X}\operatorname{tr}(X^{-T}AB^{T}) = -X^{-T}AB^{T}X^{-T}$
$\nabla_{X}\,a^{T}Xb = \nabla_{X}\operatorname{tr}(ba^{T}X) = \nabla_{X}\operatorname{tr}(Xba^{T}) = ab^{T}$
$\nabla_{X}\,b^{T}X^{T}a = \nabla_{X}\operatorname{tr}(X^{T}ab^{T}) = \nabla_{X}\operatorname{tr}(ab^{T}X^{T}) = ab^{T}$
$\nabla_{X}\,a^{T}X^{-1}b = \nabla_{X}\operatorname{tr}(X^{-T}ab^{T}) = -X^{-T}ab^{T}X^{-T}$
$\nabla_{X}\,a^{T}X^{\mu}b = \ldots$


Trace continued

$\dfrac{d}{dt}\operatorname{tr}g(X+tY) = \operatorname{tr}\dfrac{d}{dt}\,g(X+tY)$
$\dfrac{d}{dt}\operatorname{tr}(X+tY) = \operatorname{tr}Y$
$\dfrac{d}{dt}\operatorname{tr}^{j}(X+tY) = j\operatorname{tr}^{j-1}(X+tY)\operatorname{tr}Y$
$\dfrac{d}{dt}\operatorname{tr}(X+tY)^{j} = j\operatorname{tr}\!\left((X+tY)^{j-1}Y\right)$ $(\forall j)$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)Y\right) = \operatorname{tr}Y^{2}$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)^{k}Y\right) = \dfrac{d}{dt}\operatorname{tr}\!\left(Y(X+tY)^{k}\right) = k\operatorname{tr}\!\left((X+tY)^{k-1}Y^{2}\right)$, $k\in\{0,1,2\}$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)^{k}Y\right) = \dfrac{d}{dt}\operatorname{tr}\!\left(Y(X+tY)^{k}\right) = \operatorname{tr}\sum_{i=0}^{k-1}(X+tY)^{i}\,Y\,(X+tY)^{k-1-i}\,Y$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)^{-1}Y\right) = -\operatorname{tr}\!\left((X+tY)^{-1}Y(X+tY)^{-1}Y\right)$
$\dfrac{d}{dt}\operatorname{tr}\!\left(B^{T}(X+tY)^{-1}A\right) = -\operatorname{tr}\!\left(B^{T}(X+tY)^{-1}Y(X+tY)^{-1}A\right)$
$\dfrac{d}{dt}\operatorname{tr}\!\left(B^{T}(X+tY)^{-T}A\right) = -\operatorname{tr}\!\left(B^{T}(X+tY)^{-T}Y^{T}(X+tY)^{-T}A\right)$
$\dfrac{d}{dt}\operatorname{tr}\!\left(B^{T}(X+tY)^{-k}A\right) = \ldots$, $k>0$
$\dfrac{d}{dt}\operatorname{tr}\!\left(B^{T}(X+tY)^{\mu}A\right) = \ldots$, $-1\le\mu\le 1$, $X,Y\in\mathbb{S}^{M}_{+}$
$\dfrac{d^{2}}{dt^{2}}\operatorname{tr}\!\left(B^{T}(X+tY)^{-1}A\right) = 2\operatorname{tr}\!\left(B^{T}(X+tY)^{-1}Y(X+tY)^{-1}Y(X+tY)^{-1}A\right)$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)^{T}A(X+tY)\right) = \operatorname{tr}\!\left(Y^{T}AX + X^{T}AY + 2tY^{T}AY\right)$
$\dfrac{d^{2}}{dt^{2}}\operatorname{tr}\!\left((X+tY)^{T}A(X+tY)\right) = 2\operatorname{tr}\!\left(Y^{T}AY\right)$
$\dfrac{d}{dt}\operatorname{tr}\!\left((X+tY)A(X+tY)\right) = \operatorname{tr}(YAX + XAY + 2tYAY)$
$\dfrac{d^{2}}{dt^{2}}\operatorname{tr}\!\left((X+tY)A(X+tY)\right) = 2\operatorname{tr}(YAY)$
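Entries of this kind are valid at any $t$ in the open interval, not just $t=0$. The Matlab sketch below is our own spot-check of one representative entry at a nonzero $t$ (test point and data are assumptions chosen to keep $X+tY$ invertible).

\begin{verbatim}
% sketch (ours): d/dt tr(B'*inv(X+tY)*A) at t = t0
n = 4;
X = randn(n) + n*eye(n); Y = randn(n);
A = randn(n); B = randn(n);
t0 = 0.1; h = 1e-6;
f = @(t) trace(B'*inv(X + t*Y)*A);
lhs = (f(t0+h) - f(t0-h)) / (2*h);
Z = inv(X + t0*Y);
rhs = -trace(B'*Z*Y*Z*A);              % table entry
disp(abs(lhs - rhs))                   % ~1e-8
\end{verbatim}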


D.2.4 Log determinant

$x\succ 0$, $\det X>0$ on some neighborhood of $X$, and $\det(X+tY)>0$ on some open interval of $t$; otherwise, $\log(\,)$ would be discontinuous.

$\dfrac{d}{dx}\log x = x^{-1}$ ; $\nabla_{X}\log\det X = X^{-T}$
$\dfrac{d}{dx}\log x^{-1} = -x^{-1}$
$\dfrac{d}{dx}\log x^{\mu} = \mu x^{-1}$
$\nabla^{2}_{X}\log\det(X)_{kl} = \dfrac{\partial X^{-T}}{\partial X_{kl}} = -\left(X^{-1}E_{kl}X^{-1}\right)^{T}$, confer (1393)(1435)
$\nabla_{X}\log\det X^{-1} = -X^{-T}$
$\nabla_{X}\log\det^{\mu}X = \mu X^{-T}$
$\nabla_{X}\log\det X^{\mu} = \mu X^{-T}$, $X\in\mathbb{R}^{2\times 2}$
$\nabla_{X}\log\det X^{k} = \nabla_{X}\log\det^{k}X = kX^{-T}$
$\nabla_{X}\log\det^{\mu}(X+tY) = \mu(X+tY)^{-T}$
$\nabla_{x}\log(a^{T}x+b) = \dfrac{a}{a^{T}x+b}$
$\nabla_{X}\log\det(AX+B) = A^{T}(AX+B)^{-T}$
$\nabla_{X}\log\det(I\pm A^{T}XA) = \ldots$
$\nabla_{X}\log\det(X+tY)^{k} = \nabla_{X}\log\det^{k}(X+tY) = k(X+tY)^{-T}$
$\dfrac{d}{dt}\log\det(X+tY) = \operatorname{tr}\!\left((X+tY)^{-1}Y\right)$
$\dfrac{d^{2}}{dt^{2}}\log\det(X+tY) = -\operatorname{tr}\!\left((X+tY)^{-1}Y(X+tY)^{-1}Y\right)$
$\dfrac{d}{dt}\log\det(X+tY)^{-1} = -\operatorname{tr}\!\left((X+tY)^{-1}Y\right)$
$\dfrac{d^{2}}{dt^{2}}\log\det(X+tY)^{-1} = \operatorname{tr}\!\left((X+tY)^{-1}Y(X+tY)^{-1}Y\right)$
$\dfrac{d}{dt}\log\det\!\left(\delta(A(x+ty)+a)^{2}+\mu I\right) = \operatorname{tr}\!\left(\left(\delta(A(x+ty)+a)^{2}+\mu I\right)^{-1} 2\,\delta(A(x+ty)+a)\,\delta(Ay)\right)$
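The two $t$-derivatives of $\log\det(X+tY)$ tabulated above are easy to confirm numerically. This sketch is ours (the positive definite test point and symmetric direction are assumptions).

\begin{verbatim}
% sketch (ours): first and second t-derivatives of log det(X+tY) at t=0
n = 4;
R = randn(n); X = R*R' + n*eye(n);     % X > 0
S = randn(n); Y = S + S';              % symmetric direction
f = @(t) log(det(X + t*Y));
h = 1e-4;
d1 = (f(h) - f(-h)) / (2*h);
d2 = (f(h) - 2*f(0) + f(-h)) / h^2;
disp(abs(d1 - trace(X\Y)))             % tr((X)^-1 Y)
disp(abs(d2 + trace((X\Y)^2)))         % -tr((X^-1 Y)^2)
\end{verbatim}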


D.2.5 Determinant

$\nabla_{X}\det X = \nabla_{X}\det X^{T} = \det(X)\,X^{-T}$
$\nabla_{X}\det X^{-1} = -\det(X^{-1})\,X^{-T} = -\det(X)^{-1}X^{-T}$
$\nabla_{X}\det^{\mu}X = \mu\det^{\mu}(X)\,X^{-T}$
$\nabla_{X}\det X^{\mu} = \mu\det(X^{\mu})\,X^{-T}$, $X\in\mathbb{R}^{2\times 2}$
$\nabla_{X}\det X^{k} = k\det^{k-1}(X)\left(\operatorname{tr}(X)I - X^{T}\right)$, $X\in\mathbb{R}^{2\times 2}$
$\nabla_{X}\det X^{k} = \nabla_{X}\det^{k}X = k\det(X^{k})\,X^{-T} = k\det^{k}(X)\,X^{-T}$
$\nabla_{X}\det^{\mu}(X+tY) = \mu\det^{\mu}(X+tY)(X+tY)^{-T}$
$\nabla_{X}\det(X+tY)^{k} = \nabla_{X}\det^{k}(X+tY) = k\det^{k}(X+tY)(X+tY)^{-T}$
$\dfrac{d}{dt}\det(X+tY) = \det(X+tY)\operatorname{tr}\!\left((X+tY)^{-1}Y\right)$
$\dfrac{d^{2}}{dt^{2}}\det(X+tY) = \det(X+tY)\left(\operatorname{tr}^{2}\!\left((X+tY)^{-1}Y\right) - \operatorname{tr}\!\left((X+tY)^{-1}Y(X+tY)^{-1}Y\right)\right)$
$\dfrac{d}{dt}\det(X+tY)^{-1} = -\det(X+tY)^{-1}\operatorname{tr}\!\left((X+tY)^{-1}Y\right)$
$\dfrac{d^{2}}{dt^{2}}\det(X+tY)^{-1} = \det(X+tY)^{-1}\left(\operatorname{tr}^{2}\!\left((X+tY)^{-1}Y\right) + \operatorname{tr}\!\left((X+tY)^{-1}Y(X+tY)^{-1}Y\right)\right)$
$\dfrac{d}{dt}\det^{\mu}(X+tY) = \ldots$


D.2.6 Logarithmic

$\dfrac{d}{dt}\log(X+tY)^{\mu} = \ldots$, $-1\le\mu\le 1$, $X,Y\in\mathbb{S}^{M}_{+}$ [126, §6.6, prob.20]

D.2.7 Exponential

[44, §3.6, §4.5] [212, §5.4]

$\nabla_{X}\,e^{\operatorname{tr}(Y^{T}X)} = \nabla_{X}\det e^{Y^{T}X} = e^{\operatorname{tr}(Y^{T}X)}\,Y$ $(\forall X,Y)$
$\nabla_{X}\operatorname{tr}e^{YX} = e^{Y^{T}X^{T}}Y^{T} = Y^{T}e^{X^{T}Y^{T}}$
log-sum-exp & geometric mean [38, p.74] ...
$\dfrac{d^{j}}{dt^{j}}\,e^{\operatorname{tr}(X+tY)} = e^{\operatorname{tr}(X+tY)}\operatorname{tr}^{j}(Y)$
$\dfrac{d}{dt}\,e^{tY} = e^{tY}Y = Ye^{tY}$
$\dfrac{d}{dt}\,e^{X+tY} = e^{X+tY}Y = Ye^{X+tY}$, $XY=YX$
$\dfrac{d^{2}}{dt^{2}}\,e^{X+tY} = e^{X+tY}Y^{2} = Ye^{X+tY}Y = Y^{2}e^{X+tY}$, $XY=YX$
$e^{X}$ for symmetric $X$ of dimension less than 3 [38, p.110] ...




Appendix E

Projection

For all $A\in\mathbb{R}^{m\times n}$, the pseudoinverse [125, §7.3, prob.9] [154, §6.12, prob.19] [88, §5.5.4] [212, App.A]

$$A^{\dagger} = \lim_{t\to 0^{+}}(A^{T}A + tI)^{-1}A^{T} = \lim_{t\to 0^{+}}A^{T}(AA^{T} + tI)^{-1} \;\in\; \mathbb{R}^{n\times m} \qquad(1440)$$

is a unique matrix having$^{\text{E.1}}$ [166] [209, §III.1, exer.1]

$$\mathcal{R}(A^{\dagger}) = \mathcal{R}(A^{T}), \qquad \mathcal{R}(A^{\dagger T}) = \mathcal{R}(A) \qquad(1441)$$
$$\mathcal{N}(A^{\dagger}) = \mathcal{N}(A^{T}), \qquad \mathcal{N}(A^{\dagger T}) = \mathcal{N}(A) \qquad(1442)$$

and satisfying the Penrose conditions: [244, Moore-Penrose] [127, §1.3]

1. $AA^{\dagger}A = A$
2. $A^{\dagger}AA^{\dagger} = A^{\dagger}$
3. $(AA^{\dagger})^{T} = AA^{\dagger}$
4. $(A^{\dagger}A)^{T} = A^{\dagger}A$

The Penrose conditions are necessary and sufficient to establish the pseudoinverse, whose principal action is to injectively map $\mathcal{R}(A)$ onto $\mathcal{R}(A^{T})$.

E.1: Proof of (1441) and (1442) is by singular value decomposition (§A.6).
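The four Penrose conditions are immediate to exercise in Matlab, whose built-in pinv computes the pseudoinverse. The sketch below is our own illustration; the rank-deficient test matrix is an arbitrary assumption.

\begin{verbatim}
% sketch (ours): the four Penrose conditions for pinv(A)
m = 5; n = 3;
A = randn(m,2) * randn(2,n);           % rank 2 < min(m,n)
Ad = pinv(A);
disp(norm(A*Ad*A - A, 'fro'))          % 1. A*A+*A  = A
disp(norm(Ad*A*Ad - Ad, 'fro'))        % 2. A+*A*A+ = A+
disp(norm((A*Ad)' - A*Ad, 'fro'))      % 3. (A*A+)' = A*A+
disp(norm((Ad*A)' - Ad*A, 'fro'))      % 4. (A+*A)' = A+*A
\end{verbatim}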


The following relations are reliably true without qualification:

a. $A^{T\dagger} = A^{\dagger T}$
b. $A^{\dagger\dagger} = A$
c. $(AA^{T})^{\dagger} = A^{\dagger T}A^{\dagger}$
d. $(A^{T}A)^{\dagger} = A^{\dagger}A^{\dagger T}$
e. $(AA^{\dagger})^{\dagger} = AA^{\dagger}$
f. $(A^{\dagger}A)^{\dagger} = A^{\dagger}A$

Yet for arbitrary $A,B$ it is generally true that $(AB)^{\dagger}\ne B^{\dagger}A^{\dagger}$:

E.0.0.0.1 Theorem. Pseudoinverse of product. [97] [149, exer.7.23]
For $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times k}$,

$$(AB)^{\dagger} = B^{\dagger}A^{\dagger} \qquad(1443)$$

if and only if

$$\mathcal{R}(A^{T}AB) \subseteq \mathcal{R}(B) \quad\text{and}\quad \mathcal{R}(BB^{T}A^{T}) \subseteq \mathcal{R}(A^{T}) \qquad(1444)$$

⋄

For orthogonal matrices $U,Q$ and arbitrary $A$, [209, §III.1]

$$(UAQ^{T})^{\dagger} = QA^{\dagger}U^{T} \qquad(1445)$$

E.0.1 Logical deductions

When $A$ is invertible, $A^{\dagger}=A^{-1}$, of course; so $A^{\dagger}A = AA^{\dagger} = I$. Otherwise, for $A\in\mathbb{R}^{m\times n}$, [82, §5.3.3.1] [149, §7] [186]

g. $A^{\dagger}A = I$, $A^{\dagger} = (A^{T}A)^{-1}A^{T}$, $\operatorname{rank}A = n$
h. $AA^{\dagger} = I$, $A^{\dagger} = A^{T}(AA^{T})^{-1}$, $\operatorname{rank}A = m$
i. $A^{\dagger}A\omega = \omega$, $\omega\in\mathcal{R}(A^{T})$
j. $AA^{\dagger}\upsilon = \upsilon$, $\upsilon\in\mathcal{R}(A)$
k. $A^{\dagger}A = AA^{\dagger}$, $A$ normal
l. $A^{k\dagger} = A^{\dagger k}$, $A$ normal, $k$ an integer

When $A$ is symmetric, $A^{\dagger}$ is symmetric and (§A.6)

$$A \succeq 0 \;\Leftrightarrow\; A^{\dagger} \succeq 0 \qquad(1446)$$


E.1 Idempotent matrices

Projection matrices are square and defined by idempotence, $P^{2}=P$; [212, §2.6] [127, §1.3] equivalent to the condition: $P$ is diagonalizable [125, §3.3, prob.3] with eigenvalues $\phi_{i}\in\{0,1\}$. [261, §4.1, thm.4.1] Idempotent matrices are not necessarily symmetric. The transpose of an idempotent matrix remains idempotent; $P^{T}P^{T}=P^{T}$. Solely excepting $P=I$, all projection matrices are neither orthogonal (§B.5) nor invertible. [212, §3.4] The collection of all projection matrices of particular dimension does not form a convex set.

Suppose we wish to project nonorthogonally (obliquely) on the range of any particular matrix $A\in\mathbb{R}^{m\times n}$. All idempotent matrices projecting nonorthogonally on $\mathcal{R}(A)$ may be expressed:

$$P = A(A^{\dagger} + BZ^{T}) \qquad(1447)$$

where $\mathcal{R}(P)=\mathcal{R}(A)$,$^{\text{E.2}}$ where $B\in\mathbb{R}^{n\times k}$ for $k\in\{1\ldots m\}$ is otherwise arbitrary, and where $Z\in\mathbb{R}^{m\times k}$ is any matrix in $\mathcal{N}(A^{T})$; id est,

$$A^{T}Z = A^{\dagger}Z = 0 \qquad(1448)$$

Evidently, the collection of nonorthogonal projectors projecting on $\mathcal{R}(A)$ is an affine set

$$\mathcal{P}_{k} = \left\{A(A^{\dagger} + BZ^{T}) \,\middle|\, B\in\mathbb{R}^{n\times k}\right\} \qquad(1449)$$

When matrix $A$ in (1447) is full-rank ($A^{\dagger}A=I$) or even has orthonormal columns ($A^{T}A=I$), this characteristic leads to a biorthogonal characterization of nonorthogonal projection (§E.1.1).

E.2: Proof. $\mathcal{R}(P)\subseteq\mathcal{R}(A)$ is obvious [212, §3.6]. By (113) and (114),

$$\mathcal{R}(A^{\dagger}+BZ^{T}) = \{(A^{\dagger}+BZ^{T})y \mid y\in\mathbb{R}^{m}\} \supseteq \{(A^{\dagger}+BZ^{T})y \mid y\in\mathcal{R}(A)\} = \mathcal{R}(A^{T})$$
$$\mathcal{R}(P) = \{A(A^{\dagger}+BZ^{T})y \mid y\in\mathbb{R}^{m}\} \supseteq \{A(A^{\dagger}+BZ^{T})y \mid (A^{\dagger}+BZ^{T})y\in\mathcal{R}(A^{T})\} = \mathcal{R}(A)$$
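Before developing that characterization, the family (1447) can be built and tested directly. The following Matlab sketch is our own (dimensions and random data are assumptions): it constructs an oblique projector, then confirms idempotence, that it fixes $\mathcal{R}(A)$, and that it is generally nonsymmetric.

\begin{verbatim}
% sketch (ours): an oblique projector P = A*(pinv(A) + B*Z') per (1447)
m = 5; n = 3;
A = randn(m,n);                        % full column rank a.s.
Z = null(A');                          % orthonormal basis for N(A')
B = randn(n, size(Z,2));               % arbitrary
P = A*(pinv(A) + B*Z');
disp(norm(P*P - P, 'fro'))             % idempotent
disp(norm(P*A - A, 'fro'))             % fixes R(A), so R(P) = R(A)
disp(norm(P - P', 'fro'))              % generally nonzero: oblique
\end{verbatim}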


E.1.1 Biorthogonal characterization

Any nonorthogonal projector $P^{2}=P\in\mathbb{R}^{m\times m}$ projecting on nontrivial $\mathcal{R}(U)$ can be defined by a biorthogonality condition $Q^{T}U=I$; the biorthogonal decomposition of $P$ being (confer (1447))

$$P = UQ^{T}, \qquad Q^{T}U = I \qquad(1450)$$

where$^{\text{E.3}}$ (§B.1.1.1)

$$\mathcal{R}(P)=\mathcal{R}(U), \qquad \mathcal{N}(P)=\mathcal{N}(Q^{T}) \qquad(1451)$$

and where generally (confer (1476))$^{\text{E.4}}$

$$P^{T}\ne P, \qquad P^{\dagger}\ne P, \qquad \|P\|_{2}\ne 1, \qquad P\nsucceq 0 \qquad(1452)$$

and $P$ is not nonexpansive (1477).

(⇐) To verify assertion (1450) we observe: because idempotent matrices are diagonalizable (§A.5), [125, §3.3, prob.3] they must have the form (1144)

$$P = S\Phi S^{-1} = \sum_{i=1}^{m}\phi_{i}\,s_{i}w_{i}^{T} = \sum_{i=1}^{k\le m} s_{i}w_{i}^{T} \qquad(1453)$$

that is a sum of $k=\operatorname{rank}P$ independent projector dyads (idempotent dyads, §B.1.1), where $\phi_{i}\in\{0,1\}$ are the eigenvalues of $P$ [261, §4.1, thm.4.1] in diagonal matrix $\Phi\in\mathbb{R}^{m\times m}$ arranged in nonincreasing order, and where $s_{i},w_{i}\in\mathbb{R}^{m}$ are the right- and left-eigenvectors of $P$, respectively, which are independent and real.$^{\text{E.5}}$ Therefore

$$U \,\triangleq\, S(:,1\!:\!k) = \begin{bmatrix} s_{1} & \cdots & s_{k}\end{bmatrix} \in \mathbb{R}^{m\times k} \qquad(1454)$$

E.3: Proof. Obviously, $\mathcal{R}(P)\subseteq\mathcal{R}(U)$. Because $Q^{T}U=I$,

$$\mathcal{R}(P) = \{UQ^{T}x \mid x\in\mathbb{R}^{m}\} \supseteq \{UQ^{T}Uy \mid y\in\mathbb{R}^{k}\} = \mathcal{R}(U)$$

E.4: Orthonormal decomposition (1473) (confer §E.3.4) is a special case of biorthogonal decomposition (1450) characterized by (1476). So these characteristics (1452) are not necessary conditions for biorthogonality.

E.5: Eigenvectors of a real matrix corresponding to real eigenvalues must be real. (§A.5.0.0.1)


is the full-rank matrix $S\in\mathbb{R}^{m\times m}$ having $m-k$ columns truncated (corresponding to 0 eigenvalues), while

$$Q^{T} \,\triangleq\, S^{-1}(1\!:\!k,:) = \begin{bmatrix} w_{1}^{T}\\ \vdots\\ w_{k}^{T}\end{bmatrix} \in \mathbb{R}^{k\times m} \qquad(1455)$$

is matrix $S^{-1}$ having the corresponding $m-k$ rows truncated. By the 0 eigenvalues theorem (§A.7.2.0.1), $\mathcal{R}(U)=\mathcal{R}(P)$, $\mathcal{R}(Q)=\mathcal{R}(P^{T})$, and

$$\begin{aligned}
\mathcal{R}(P) &= \operatorname{span}\{s_{i} \mid \phi_{i}=1\;\forall i\} & \mathcal{N}(P) &= \operatorname{span}\{s_{i} \mid \phi_{i}=0\;\forall i\}\\
\mathcal{R}(P^{T}) &= \operatorname{span}\{w_{i} \mid \phi_{i}=1\;\forall i\} & \mathcal{N}(P^{T}) &= \operatorname{span}\{w_{i} \mid \phi_{i}=0\;\forall i\}
\end{aligned} \qquad(1456)$$

Thus biorthogonality $Q^{T}U=I$ is a necessary condition for idempotence, and so the collection of nonorthogonal projectors projecting on $\mathcal{R}(U)$ is the affine set $\mathcal{P}_{k}=U\mathcal{Q}_{k}^{T}$ where $\mathcal{Q}_{k}=\{Q \mid Q^{T}U=I,\; Q\in\mathbb{R}^{m\times k}\}$.

(⇒) Biorthogonality is a sufficient condition for idempotence;

$$P^{2} = \sum_{i=1}^{k}s_{i}w_{i}^{T}\sum_{j=1}^{k}s_{j}w_{j}^{T} = P \qquad(1457)$$

id est, if the cross-products are annihilated, then $P^{2}=P$.

Nonorthogonal projection of $x$ on $\mathcal{R}(P)$ has expression like a biorthogonal expansion,

$$Px = UQ^{T}x = \sum_{i=1}^{k}w_{i}^{T}x\;s_{i} \qquad(1458)$$

When the domain is restricted to the range of $P$, say $x=U\xi$ for $\xi\in\mathbb{R}^{k}$, then $x = Px = UQ^{T}U\xi = U\xi$. Otherwise, any component of $x$ in $\mathcal{N}(P)=\mathcal{N}(Q^{T})$ will be annihilated. The direction of nonorthogonal projection is orthogonal to $\mathcal{R}(Q)$ $\Leftrightarrow$ $Q^{T}U=I$; id est, for $Px\in\mathcal{R}(U)$,

$$Px - x \;\perp\; \mathcal{R}(Q) \quad\text{in } \mathbb{R}^{m} \qquad(1459)$$


Figure 97: Nonorthogonal projection of $x\in\mathbb{R}^{3}$ on $\mathcal{R}(U)=\mathbb{R}^{2}$ under biorthogonality condition; id est, $Px=UQ^{T}x$ such that $Q^{T}U=I$. Any point along imaginary line $\mathcal{T}$ connecting $x$ to $Px$ will be projected nonorthogonally on $Px$ with respect to the horizontal plane constituting $\mathbb{R}^{2}$ in this example. Extreme directions of $\operatorname{cone}(U)$ correspond to the two columns of $U$; likewise for $\operatorname{cone}(Q)$. For purpose of illustration, we truncate each conic hull by truncating coefficients of conic combination at unity. Conic hull $\operatorname{cone}(Q)$ is headed upward at an angle, out of the plane of the page. Nonorthogonal projection would fail were $\mathcal{N}(Q^{T})$ in $\mathcal{R}(U)$ (were $\mathcal{T}$ parallel to a line in $\mathcal{R}(U)$).


E.1.1.0.1 Example. Illustration of nonorthogonal projector.
Figure 97 shows $\operatorname{cone}(U)$, the conic hull of the columns of

$$U = \begin{bmatrix} 1 & 1\\ -0.5 & 0.3\\ 0 & 0 \end{bmatrix} \qquad(1460)$$

from nonorthogonal projector $P=UQ^{T}$. Matrix $U$ has a limitless number of left inverses because $\mathcal{N}(U^{T})$ is nontrivial. Similarly depicted is left inverse $Q^{T}$ from (1447):

$$Q = U^{\dagger T} + ZB^{T} = \begin{bmatrix} 0.3750 & 0.6250\\ -1.2500 & 1.2500\\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}\begin{bmatrix} 0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} 0.3750 & 0.6250\\ -1.2500 & 1.2500\\ 0.5000 & 0.5000 \end{bmatrix} \qquad(1461)$$

where $Z\in\mathcal{N}(U^{T})$ and matrix $B$ is selected arbitrarily; id est, $Q^{T}U=I$ because $U$ is full-rank.

Direction of projection on $\mathcal{R}(U)$ is orthogonal to $\mathcal{R}(Q)$. Any point along line $\mathcal{T}$ in the figure, for example, will have the same projection. Were matrix $Z$ instead equal to 0, then $\operatorname{cone}(Q)$ would become the relative dual to $\operatorname{cone}(U)$ (sharing the same affine hull; §2.13.8, confer Figure 41(a)). In that case, projection $Px=UU^{\dagger}x$ of $x$ on $\mathcal{R}(U)$ becomes orthogonal projection (and unique minimum-distance).

E.1.2 Idempotence summary

Summarizing, nonorthogonal subspace-projector $P$ is a linear operator defined by idempotence or biorthogonal decomposition (1450), but characterized neither by symmetry nor positive semidefiniteness nor nonexpansivity (1477).


E.2 I-P, Projection on algebraic complement

It follows from the diagonalizability of idempotent matrices that $I-P$ must also be a projection matrix because it too is idempotent, and because it may be expressed

$$I - P = S(I-\Phi)S^{-1} = \sum_{i=1}^{m}(1-\phi_{i})\,s_{i}w_{i}^{T} \qquad(1462)$$

where $(1-\phi_{i})\in\{1,0\}$ are the eigenvalues of $I-P$ (1059) whose eigenvectors $s_{i},w_{i}$ are identical to those of $P$ in (1453). A consequence of that complementary relationship of eigenvalues is the fact, [222, §2] [218, §2] for subspace projector $P=P^{2}\in\mathbb{R}^{m\times m}$,

$$\begin{aligned}
\mathcal{R}(P) &= \operatorname{span}\{s_{i} \mid \phi_{i}=1\;\forall i\} = \operatorname{span}\{s_{i} \mid (1-\phi_{i})=0\;\forall i\} = \mathcal{N}(I-P)\\
\mathcal{N}(P) &= \operatorname{span}\{s_{i} \mid \phi_{i}=0\;\forall i\} = \operatorname{span}\{s_{i} \mid (1-\phi_{i})=1\;\forall i\} = \mathcal{R}(I-P)\\
\mathcal{R}(P^{T}) &= \operatorname{span}\{w_{i} \mid \phi_{i}=1\;\forall i\} = \operatorname{span}\{w_{i} \mid (1-\phi_{i})=0\;\forall i\} = \mathcal{N}(I-P^{T})\\
\mathcal{N}(P^{T}) &= \operatorname{span}\{w_{i} \mid \phi_{i}=0\;\forall i\} = \operatorname{span}\{w_{i} \mid (1-\phi_{i})=1\;\forall i\} = \mathcal{R}(I-P^{T})
\end{aligned} \qquad(1463)$$

that is easy to see from (1453) and (1462). Idempotent $I-P$ therefore projects vectors on its range, $\mathcal{N}(P)$. Because all eigenvectors of a real idempotent matrix are real and independent, the algebraic complement of $\mathcal{R}(P)$ [140, §3.3] is equivalent to $\mathcal{N}(P)$;$^{\text{E.6}}$ id est,

$$\mathcal{R}(P)\oplus\mathcal{N}(P) = \mathcal{R}(P^{T})\oplus\mathcal{N}(P^{T}) = \mathcal{R}(P^{T})\oplus\mathcal{N}(P) = \mathcal{R}(P)\oplus\mathcal{N}(P^{T}) = \mathbb{R}^{m} \qquad(1464)$$

because $\mathcal{R}(P)\oplus\mathcal{R}(I-P)=\mathbb{R}^{m}$. For idempotent $P\in\mathbb{R}^{m\times m}$, consequently,

$$\operatorname{rank}P + \operatorname{rank}(I-P) = m \qquad(1465)$$

E.2.0.0.1 Theorem. Rank/Trace. [261, §4.1, prob.9] (confer (1481))

$$P^{2}=P \quad\Leftrightarrow\quad \operatorname{rank}P = \operatorname{tr}P \;\text{ and }\; \operatorname{rank}(I-P) = \operatorname{tr}(I-P) \qquad(1466)$$

⋄

E.6: The same phenomenon occurs with symmetric (nonidempotent) matrices, for example. When the summands in $\mathcal{A}\oplus\mathcal{B}=\mathbb{R}^{m}$ are orthogonal vector spaces, the algebraic complement is the orthogonal complement.


E.2.1 Universal projector characteristic

Although projection is not necessarily orthogonal and $\mathcal{R}(P)\not\perp\mathcal{R}(I-P)$ in general, still for any projector $P$ and any $x\in\mathbb{R}^{m}$,

$$Px + (I-P)x = x \qquad(1467)$$

must hold, where $\mathcal{R}(I-P)=\mathcal{N}(P)$ is the algebraic complement of $\mathcal{R}(P)$. The algebraic complement of closed convex cone $\mathcal{K}$, for example, is the negative dual cone $-\mathcal{K}^{*}$. (1582)

E.3 Symmetric idempotent matrices

When idempotent matrix $P$ is symmetric, $P$ is an orthogonal projector. In other words, the projection direction of point $x\in\mathbb{R}^{m}$ on subspace $\mathcal{R}(P)$ is orthogonal to $\mathcal{R}(P)$; id est, for $P^{2}=P\in\mathbb{S}^{m}$ and projection $Px\in\mathcal{R}(P)$,

$$Px - x \;\perp\; \mathcal{R}(P) \quad\text{in } \mathbb{R}^{m} \qquad(1468)$$

Perpendicularity is a necessary and sufficient condition for orthogonal projection on a subspace. [59, §4.9]

A condition equivalent to (1468) is: the norm of direction $x-Px$ is the infimum over all nonorthogonal projections of $x$ on $\mathcal{R}(P)$; [154, §3.3] for $P^{2}=P\in\mathbb{S}^{m}$, $\mathcal{R}(P)=\mathcal{R}(A)$, matrices $A,B,Z$ and integer $k$ as defined for (1447), and any given $x\in\mathbb{R}^{m}$,

$$\|x - Px\|_{2} = \inf_{B\in\mathbb{R}^{n\times k}} \|x - A(A^{\dagger}+BZ^{T})x\|_{2} \qquad(1469)$$

The infimum is attained for $\mathcal{R}(B)\subseteq\mathcal{N}(A)$ over any affine set of nonorthogonal projectors (1449) indexed by $k$.

The proof is straightforward: the vector 2-norm is a convex function. Applying §D.2, setting the gradient of the norm-square to 0,

$$\left(A^{T}ABZ^{T} - A^{T}(I - AA^{\dagger})\right)xx^{T}A = 0 \quad\Leftrightarrow\quad A^{T}ABZ^{T}xx^{T}A = 0 \qquad(1470)$$

Projection $Px=AA^{\dagger}x$ is therefore unique.


In any case, $P=AA^{\dagger}$, so the projection matrix must be symmetric. Then for any $A\in\mathbb{R}^{m\times n}$, $P=AA^{\dagger}$ projects any vector $x$ in $\mathbb{R}^{m}$ orthogonally on $\mathcal{R}(A)$. Under either condition (1468) or (1469), the projection $Px$ is unique minimum-distance; for subspaces, perpendicularity and minimum-distance conditions are equivalent.

E.3.1 Four subspaces

We summarize the orthogonal projectors projecting on the four fundamental subspaces: for $A\in\mathbb{R}^{m\times n}$,

$$\begin{aligned}
A^{\dagger}A &: \;\mathbb{R}^{n} \text{ on } \mathcal{R}(A^{\dagger}A) = \mathcal{R}(A^{T}) & AA^{\dagger} &: \;\mathbb{R}^{m} \text{ on } \mathcal{R}(AA^{\dagger}) = \mathcal{R}(A)\\
I-A^{\dagger}A &: \;\mathbb{R}^{n} \text{ on } \mathcal{R}(I-A^{\dagger}A) = \mathcal{N}(A) & I-AA^{\dagger} &: \;\mathbb{R}^{m} \text{ on } \mathcal{R}(I-AA^{\dagger}) = \mathcal{N}(A^{T})
\end{aligned} \qquad(1471)$$

For completeness:$^{\text{E.7}}$ (1463)

$$\begin{aligned}
\mathcal{N}(A^{\dagger}A) &= \mathcal{N}(A) & \mathcal{N}(AA^{\dagger}) &= \mathcal{N}(A^{T})\\
\mathcal{N}(I-A^{\dagger}A) &= \mathcal{R}(A^{T}) & \mathcal{N}(I-AA^{\dagger}) &= \mathcal{R}(A)
\end{aligned} \qquad(1472)$$

E.3.2 Orthogonal characterization

Any symmetric projector $P^{2}=P\in\mathbb{S}^{m}$ projecting on nontrivial $\mathcal{R}(Q)$ can be defined by the orthonormality condition $Q^{T}Q=I$. When skinny matrix $Q\in\mathbb{R}^{m\times k}$ has orthonormal columns, then $Q^{\dagger}=Q^{T}$ by the Penrose conditions. Hence, any $P$ having an orthonormal decomposition (§E.3.4)

$$P = QQ^{T}, \qquad Q^{T}Q = I \qquad(1473)$$

where [212, §3.3] (1194)

$$\mathcal{R}(P) = \mathcal{R}(Q), \qquad \mathcal{N}(P) = \mathcal{N}(Q^{T}) \qquad(1474)$$

E.7: Proof is by singular value decomposition (§A.6.2): $\mathcal{N}(A^{\dagger}A)\subseteq\mathcal{N}(A)$ is obvious. Conversely, suppose $A^{\dagger}Ax=0$. Then $x^{T}A^{\dagger}Ax = x^{T}QQ^{T}x = \|Q^{T}x\|^{2} = 0$, where $A=U\Sigma Q^{T}$ is the subcompact singular value decomposition. Because $\mathcal{R}(Q)=\mathcal{R}(A^{T})$, then $x\in\mathcal{N}(A)$, which implies $\mathcal{N}(A^{\dagger}A)\supseteq\mathcal{N}(A)$.


is an orthogonal projector projecting on $\mathcal{R}(Q)$ having, for $Px\in\mathcal{R}(Q)$ (confer (1459)),

$$Px - x \;\perp\; \mathcal{R}(Q) \quad\text{in } \mathbb{R}^{m} \qquad(1475)$$

From (1473), orthogonal projector $P$ is obviously positive semidefinite (§A.3.1.0.6); necessarily,

$$P^{T} = P, \qquad P^{\dagger} = P, \qquad \|P\|_{2} = 1, \qquad P \succeq 0 \qquad(1476)$$

and $\|Px\| = \|QQ^{T}x\| = \|Q^{T}x\|$ because $\|Qy\|=\|y\|$ $\forall y\in\mathbb{R}^{k}$. All orthogonal projectors are therefore nonexpansive because

$$\sqrt{\langle Px,\,x\rangle} = \|Px\| = \|Q^{T}x\| \;\leq\; \|x\| \quad \forall x\in\mathbb{R}^{m} \qquad(1477)$$

the Bessel inequality, [59] [140] with equality when $x\in\mathcal{R}(Q)$.

From the diagonalization of idempotent matrices (1453) on page 526,

$$P = S\Phi S^{T} = \sum_{i=1}^{m}\phi_{i}\,s_{i}s_{i}^{T} = \sum_{i=1}^{k\le m}s_{i}s_{i}^{T} \qquad(1478)$$

orthogonal projection of point $x$ on $\mathcal{R}(P)$ has expression like an orthogonal expansion, [59, §4.10]

$$Px = QQ^{T}x = \sum_{i=1}^{k}s_{i}^{T}x\;s_{i} \qquad(1479)$$

where

$$Q = S(:,1\!:\!k) = \begin{bmatrix}s_{1} & \cdots & s_{k}\end{bmatrix} \in \mathbb{R}^{m\times k} \qquad(1480)$$

and where the $s_{i}$ [sic] are orthonormal eigenvectors of symmetric idempotent $P$. When the domain is restricted to the range of $P$, say $x=Q\xi$ for $\xi\in\mathbb{R}^{k}$, then $x = Px = QQ^{T}Q\xi = Q\xi$. Otherwise, any component of $x$ in $\mathcal{N}(Q^{T})$ will be annihilated.

E.3.2.0.1 Theorem. Symmetric rank/trace. (confer (1466) (1061))

$$P^{T}=P,\; P^{2}=P \quad\Leftrightarrow\quad \operatorname{rank}P = \operatorname{tr}P = \|P\|_{F}^{2} \;\text{ and }\; \operatorname{rank}(I-P) = \operatorname{tr}(I-P) = \|I-P\|_{F}^{2} \qquad(1481)$$

⋄


Proof. We take as given Theorem E.2.0.0.1 establishing idempotence. We have left only to show $\operatorname{tr}P = \|P\|_{F}^{2} \Rightarrow P^{T}=P$, established in [261, §7.1].

E.3.3 Summary, symmetric idempotent

In summary, orthogonal projector $P$ is a linear operator defined [123, §A.3.1] by idempotence and symmetry, and characterized by positive semidefiniteness and nonexpansivity. The algebraic complement (§E.2) to $\mathcal{R}(P)$ becomes the orthogonal complement $\mathcal{R}(I-P)$; id est, $\mathcal{R}(P)\perp\mathcal{R}(I-P)$.

E.3.4 Orthonormal decomposition

When $Z=0$ in the general nonorthogonal projector $A(A^{\dagger}+BZ^{T})$ (1447), an orthogonal projector results (for any matrix $A$) characterized principally by idempotence and symmetry. Any real orthogonal projector may, in fact, be represented by an orthonormal decomposition such as (1473). [127, §1, prob.42]

To verify that assertion for the four fundamental subspaces (1471), we need only express $A$ by subcompact singular value decomposition (§A.6.2): from pseudoinverse (1171) of $A=U\Sigma Q^{T}\in\mathbb{R}^{m\times n}$,

$$\begin{aligned}
AA^{\dagger} &= U\Sigma\Sigma^{\dagger}U^{T} = UU^{T}, & A^{\dagger}A &= Q\Sigma^{\dagger}\Sigma Q^{T} = QQ^{T}\\
I - AA^{\dagger} &= I - UU^{T} = U^{\perp}U^{\perp T}, & I - A^{\dagger}A &= I - QQ^{T} = Q^{\perp}Q^{\perp T}
\end{aligned} \qquad(1482)$$

where $U^{\perp}\in\mathbb{R}^{m\times m-\operatorname{rank}A}$ holds columnar an orthonormal basis for the orthogonal complement of $\mathcal{R}(U)$, and likewise for $Q^{\perp}\in\mathbb{R}^{n\times n-\operatorname{rank}A}$. The existence of an orthonormal decomposition is sufficient to establish idempotence and symmetry of an orthogonal projector (1473).

E.3.4.1 Unifying trait of all projectors: direction

Relation (1482) shows: orthogonal projectors simultaneously possess a biorthogonal decomposition (confer §E.1.1) (for example, $AA^{\dagger}$ for skinny-or-square $A$ full-rank) and an orthonormal decomposition ($UU^{T}$ whence $Px=UU^{T}x$).
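The four projectors of (1471) are one line each in Matlab. This sketch is our own illustration (rank-deficient test matrix assumed); it confirms that every one of them is symmetric and idempotent, consistent with (1482).

\begin{verbatim}
% sketch (ours): the four orthogonal projectors of (1471)
m = 6; n = 4;
A = randn(m,2) * randn(2,n);           % rank 2
Ad = pinv(A);
Ps = {Ad*A, A*Ad, eye(n) - Ad*A, eye(m) - A*Ad};
for i = 1:4                            % on R(A'), R(A), N(A), N(A')
    P = Ps{i};
    fprintf('%g %g\n', norm(P - P','fro'), norm(P*P - P,'fro'))
end
\end{verbatim}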


E.3.4.2 orthogonal projector, orthonormal decomposition

Consider orthogonal expansion of $x\in\mathcal{R}(U)$:

$$x = UU^{T}x = \sum_{i=1}^{n}u_{i}u_{i}^{T}x \qquad(1483)$$

a sum of one-dimensional orthogonal projections (§E.6.3), where

$$U \,\triangleq\, [u_{1}\,\cdots\,u_{n}] \quad\text{and}\quad U^{T}U = I \qquad(1484)$$

and where the subspace projector has two expressions, (1482)

$$AA^{\dagger} \,\triangleq\, UU^{T} \qquad(1485)$$

where $A\in\mathbb{R}^{m\times n}$ has rank $n$. The direction of projection of $x$ on $u_{j}$ for some $j\in\{1\ldots n\}$, for example, is orthogonal to $u_{j}$ but parallel to a vector in the span of all the remaining vectors constituting the columns of $U$;

$$u_{j}^{T}(u_{j}u_{j}^{T}x - x) = 0$$
$$u_{j}u_{j}^{T}x - x = u_{j}u_{j}^{T}x - UU^{T}x \;\in\; \mathcal{R}(\{u_{i} \mid i=1\ldots n,\; i\ne j\}) \qquad(1486)$$

E.3.4.3 orthogonal projector, biorthogonal decomposition

We get a similar result for the biorthogonal expansion of $x\in\mathcal{R}(A)$. Define

$$A \,\triangleq\, [a_{1}\;a_{2}\;\cdots\;a_{n}] \in \mathbb{R}^{m\times n} \qquad(1487)$$

and the rows of the pseudoinverse

$$A^{\dagger} \,\triangleq\, \begin{bmatrix} a_{1}^{\dagger}\\ a_{2}^{\dagger}\\ \vdots\\ a_{n}^{\dagger} \end{bmatrix} \in \mathbb{R}^{n\times m} \qquad(1488)$$

under the biorthogonality condition $A^{\dagger}A=I$. In the biorthogonal expansion (§2.13.8)

$$x = AA^{\dagger}x = \sum_{i=1}^{n}a_{i}a_{i}^{\dagger}x \qquad(1489)$$


the direction of projection of $x$ on $a_{j}$ for some particular $j\in\{1\ldots n\}$, for example, is orthogonal to $a_{j}^{\dagger}$ and parallel to a vector in the span of all the remaining vectors constituting the columns of $A$;

$$a_{j}^{\dagger}(a_{j}a_{j}^{\dagger}x - x) = 0$$
$$a_{j}a_{j}^{\dagger}x - x = a_{j}a_{j}^{\dagger}x - AA^{\dagger}x \;\in\; \mathcal{R}(\{a_{i} \mid i=1\ldots n,\; i\ne j\}) \qquad(1490)$$

E.3.4.4 nonorthogonal projector, biorthogonal decomposition

Because the result in §E.3.4.3 is independent of symmetry $AA^{\dagger}=(AA^{\dagger})^{T}$, we must have the same result for any nonorthogonal projector characterized by a biorthogonality condition; namely, for nonorthogonal projector $P=UQ^{T}$ (1450) under biorthogonality condition $Q^{T}U=I$, in the biorthogonal expansion of $x\in\mathcal{R}(U)$,

$$x = UQ^{T}x = \sum_{i=1}^{k}u_{i}q_{i}^{T}x \qquad(1491)$$

where

$$U \,\triangleq\, [u_{1}\,\cdots\,u_{k}] \in \mathbb{R}^{m\times k}, \qquad Q^{T} \,\triangleq\, \begin{bmatrix} q_{1}^{T}\\ \vdots\\ q_{k}^{T} \end{bmatrix} \in \mathbb{R}^{k\times m} \qquad(1492)$$

the direction of projection of $x$ on $u_{j}$ is orthogonal to $q_{j}$ and parallel to a vector in the span of the remaining $u_{i}$:

$$q_{j}^{T}(u_{j}q_{j}^{T}x - x) = 0$$
$$u_{j}q_{j}^{T}x - x = u_{j}q_{j}^{T}x - UQ^{T}x \;\in\; \mathcal{R}(\{u_{i} \mid i=1\ldots k,\; i\ne j\}) \qquad(1493)$$

E.4 Algebra of projection on affine subsets

Let $P_{\mathcal{A}}x$ denote projection of $x$ on affine subset $\mathcal{A}\triangleq\mathcal{R}+\alpha$, where $\mathcal{R}$ is a subspace and $\alpha\in\mathcal{A}$. Then, because $\mathcal{R}$ is parallel to $\mathcal{A}$, it holds:

$$P_{\mathcal{A}}x = P_{\mathcal{R}+\alpha}x = (I - P_{\mathcal{R}})(\alpha) + P_{\mathcal{R}}x = P_{\mathcal{R}}(x-\alpha) + \alpha \qquad(1494)$$

Subspace projector $P_{\mathcal{R}}$ is a linear operator (§E.1.2), and $P_{\mathcal{R}}(x+y)=P_{\mathcal{R}}x$ whenever $y\perp\mathcal{R}$ and $P_{\mathcal{R}}$ is an orthogonal projector.


E.4.0.0.1 Theorem. Orthogonal projection on affine subset. [59, §9.26]
Let $\mathcal{A}=\mathcal{R}+\alpha$ be an affine subset where $\alpha\in\mathcal{A}$, and let $\mathcal{R}^{\perp}$ be the orthogonal complement of subspace $\mathcal{R}$. Then $P_{\mathcal{A}}x$ is the orthogonal projection of $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ if and only if

$$P_{\mathcal{A}}x \in \mathcal{A}, \qquad \langle P_{\mathcal{A}}x - x,\; a-\alpha\rangle = 0 \quad \forall a\in\mathcal{A} \qquad(1495)$$

or if and only if

$$P_{\mathcal{A}}x \in \mathcal{A}, \qquad P_{\mathcal{A}}x - x \in \mathcal{R}^{\perp} \qquad(1496)$$

⋄

E.5 Projection examples

E.5.0.0.1 Example. Orthogonal projection on orthogonal basis.
Orthogonal projection on a subspace can instead be accomplished by orthogonally projecting on the individual members of an orthogonal basis for that subspace. Suppose, for example, matrix $A\in\mathbb{R}^{m\times n}$ holds an orthonormal basis for $\mathcal{R}(A)$ in its columns; $A\triangleq[a_{1}\;a_{2}\;\cdots\;a_{n}]$. Then orthogonal projection of vector $x\in\mathbb{R}^{m}$ on $\mathcal{R}(A)$ is a sum of one-dimensional orthogonal projections

$$Px = AA^{\dagger}x = A(A^{T}A)^{-1}A^{T}x = AA^{T}x = \sum_{i=1}^{n}a_{i}a_{i}^{T}x \qquad(1497)$$

where each symmetric dyad $a_{i}a_{i}^{T}$ is an orthogonal projector projecting on $\mathcal{R}(a_{i})$. (§E.6.3) Because $\|x-Px\|$ is minimized by orthogonal projection, $Px$ is considered to be the best approximation (in the Euclidean sense) to $x$ from the set $\mathcal{R}(A)$. [59, §4.9]

E.5.0.0.2 Example. Orthogonal projection on span of nonorthogonal basis.
Orthogonal projection on a subspace can also be accomplished by projecting nonorthogonally on the individual members of any nonorthogonal basis for that subspace. This interpretation is in fact the principal application of the pseudoinverse we discussed. Now suppose matrix $A$ holds a nonorthogonal basis for $\mathcal{R}(A)$ in its columns,

$$A \,\triangleq\, [a_{1}\;a_{2}\;\cdots\;a_{n}] \in \mathbb{R}^{m\times n} \qquad(1498)$$


and define the rows of the pseudoinverse

$$A^{\dagger} \,\triangleq\, \begin{bmatrix} a_{1}^{\dagger}\\ a_{2}^{\dagger}\\ \vdots\\ a_{n}^{\dagger} \end{bmatrix} \in \mathbb{R}^{n\times m} \qquad(1499)$$

with $A^{\dagger}A=I$. Then orthogonal projection of vector $x\in\mathbb{R}^{m}$ on $\mathcal{R}(A)$ is a sum of one-dimensional nonorthogonal projections

$$Px = AA^{\dagger}x = \sum_{i=1}^{n}a_{i}a_{i}^{\dagger}x \qquad(1500)$$

where each nonsymmetric dyad $a_{i}a_{i}^{\dagger}$ is a nonorthogonal projector projecting on $\mathcal{R}(a_{i})$ (§E.6.1), idempotent because of the biorthogonality condition $A^{\dagger}A=I$.

The projection $Px$ is regarded as the best approximation to $x$ from the set $\mathcal{R}(A)$, as it was in Example E.5.0.0.1.

E.5.0.0.3 Example. Biorthogonal expansion as nonorthogonal projection.
Biorthogonal expansion can be viewed as a sum of components, each a nonorthogonal projection on the range of an extreme direction of a pointed polyhedral cone $\mathcal{K}$; e.g., Figure 98.

Suppose matrix $A\in\mathbb{R}^{m\times n}$ holds a nonorthogonal basis for $\mathcal{R}(A)$ in its columns as in (1498), and the rows of pseudoinverse $A^{\dagger}$ are defined as in (1499). Assuming the most general biorthogonality condition $(A^{\dagger}+BZ^{T})A=I$, with $BZ^{T}$ defined as for (1447), then biorthogonal expansion of vector $x$ is a sum of one-dimensional nonorthogonal projections; for $x\in\mathcal{R}(A)$,

$$x = A(A^{\dagger}+BZ^{T})x = AA^{\dagger}x = \sum_{i=1}^{n}a_{i}a_{i}^{\dagger}x \qquad(1501)$$

where each dyad $a_{i}a_{i}^{\dagger}$ is a nonorthogonal projector projecting on $\mathcal{R}(a_{i})$. (§E.6.1) The extreme directions of $\mathcal{K}=\operatorname{cone}(A)$ are $\{a_{1},\ldots,a_{n}\}$, the linearly independent columns of $A$, while $\{a_{1}^{\dagger T},\ldots,a_{n}^{\dagger T}\}$, the extreme directions of relative dual cone $\mathcal{K}^{*}\cap\operatorname{aff}\mathcal{K}=\operatorname{cone}(A^{\dagger T})$ (§2.13.9.4), correspond to the linearly independent (§B.1.1.1) rows of $A^{\dagger}$. The directions of


Figure 98: (confer Figure 46) Biorthogonal expansion of point $x\in\operatorname{aff}\mathcal{K}$ is found by projecting $x$ nonorthogonally on ranges of extreme directions of polyhedral cone $\mathcal{K}\subset\mathbb{R}^{2}$. Direction of projection on extreme direction $a_{1}$ is orthogonal to extreme direction $a_{1}^{\dagger T}$ of dual cone $\mathcal{K}^{*}$ and parallel to $a_{2}$ (§E.3.4.1); similarly, direction of projection on $a_{2}$ is orthogonal to $a_{2}^{\dagger T}$ and parallel to $a_{1}$. Point $x$ is sum of nonorthogonal projections: $x$ on $\mathcal{R}(a_{1})$ and $x$ on $\mathcal{R}(a_{2})$. Expansion is unique because extreme directions of $\mathcal{K}$ are linearly independent. Were $a_{1}$ orthogonal to $a_{2}$, then $\mathcal{K}$ would be identical to $\mathcal{K}^{*}$ and nonorthogonal projections would become orthogonal. (In the figure: $a_{1}\perp a_{2}^{\dagger T}$, $a_{2}\perp a_{1}^{\dagger T}$, and $x = y+z = P_{a_{1}}x + P_{a_{2}}x$.)


nonorthogonal projection are determined by the pseudoinverse; id est, direction of projection $a_{i}a_{i}^{\dagger}x - x$ on $\mathcal{R}(a_{i})$ is orthogonal to $a_{i}^{\dagger T}$.$^{\text{E.8}}$

Because the extreme directions of this cone $\mathcal{K}$ are linearly independent, the component projections are unique in the sense: there is only one linear combination of extreme directions of $\mathcal{K}$ that yields a particular point $x\in\mathcal{R}(A)$ whenever

$$\mathcal{R}(A) = \operatorname{aff}\mathcal{K} = \mathcal{R}(a_{1})\oplus\mathcal{R}(a_{2})\oplus\cdots\oplus\mathcal{R}(a_{n}) \qquad(1502)$$

E.5.0.0.4 Example. Nonorthogonal projection on elementary matrix.
Suppose $P_{\mathcal{Y}}$ is a linear nonorthogonal projector projecting on subspace $\mathcal{Y}$, and suppose the range of a vector $u$ is linearly independent of $\mathcal{Y}$; id est, for some other subspace $\mathcal{M}$ containing $\mathcal{Y}$, suppose

$$\mathcal{M} = \mathcal{R}(u)\oplus\mathcal{Y} \qquad(1503)$$

Assuming $P_{\mathcal{M}}x = P_{u}x + P_{\mathcal{Y}}x$ holds, then it follows for vector $x\in\mathcal{M}$

$$P_{u}x = x - P_{\mathcal{Y}}x, \qquad P_{\mathcal{Y}}x = x - P_{u}x \qquad(1504)$$

nonorthogonal projection of $x$ on $\mathcal{R}(u)$ can be determined from nonorthogonal projection of $x$ on $\mathcal{Y}$, and vice versa.

Such a scenario is realizable were there some arbitrary basis for $\mathcal{Y}$ populating a full-rank skinny-or-square matrix $A$,

$$A \,\triangleq\, [\,\text{basis}\,\mathcal{Y}\;\;\;u\,] \in \mathbb{R}^{m\times(n+1)} \qquad(1505)$$

Then $P_{\mathcal{M}}=AA^{\dagger}$ fulfills the requirements, with $P_{u}=A(:,n\!+\!1)A^{\dagger}(n\!+\!1,:)$ and $P_{\mathcal{Y}}=A(:,1\!:\!n)A^{\dagger}(1\!:\!n,:)$. Observe, $P_{\mathcal{M}}$ is an orthogonal projector, whereas $P_{\mathcal{Y}}$ and $P_{u}$ are nonorthogonal projectors.

Now suppose, for example, $P_{\mathcal{Y}}$ is an elementary matrix (§B.3); in particular,

$$P_{\mathcal{Y}} = I - e_{1}\mathbf{1}^{T} = \begin{bmatrix} 0 & \sqrt{2}V_{\mathcal{N}} \end{bmatrix} \in \mathbb{R}^{N\times N} \qquad(1506)$$

where $\mathcal{Y}=\mathcal{N}(\mathbf{1}^{T})$. We have $\mathcal{M}=\mathbb{R}^{N}$, $A=[\sqrt{2}V_{\mathcal{N}}\;\;e_{1}]$, and $u=e_{1}$. Thus $P_{u}=e_{1}\mathbf{1}^{T}$ is a nonorthogonal projector projecting on $\mathcal{R}(u)$ in a direction parallel to a vector in $\mathcal{Y}$ (§E.3.4.1), and $P_{\mathcal{Y}}x = x - e_{1}\mathbf{1}^{T}x$ is a nonorthogonal projection of $x$ on $\mathcal{Y}$ in a direction parallel to $u$.

E.8: This remains true in high dimension although only a little more difficult to visualize in $\mathbb{R}^{3}$; confer Figure 47.


E.5.0.0.5 Example. Projecting the origin on a hyperplane.
(confer §2.4.2.0.1) Given the hyperplane representation having $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$,

$$\partial\mathcal{H} = \{y \mid a^{T}y = b\} \subset \mathbb{R}^{m} \qquad(90)$$

the orthogonal projection of the origin $P0$ on that hyperplane is the solution to a minimization problem: (1469)

$$\|P0 - 0\|_{2} = \inf_{y\in\partial\mathcal{H}}\|y - 0\|_{2} = \inf_{\xi\in\mathbb{R}^{m-1}}\|Z\xi + x\|_{2} \qquad(1507)$$

where $x$ is any solution to $a^{T}y=b$, and where the columns of $Z\in\mathbb{R}^{m\times m-1}$ constitute a basis for $\mathcal{N}(a^{T})$ so that $y=Z\xi+x\in\partial\mathcal{H}$ for all $\xi\in\mathbb{R}^{m-1}$.

The infimum can be found by setting the gradient (with respect to $\xi$) of the strictly convex norm-square to 0. We find the minimizing argument

$$\xi^{\star} = -(Z^{T}Z)^{-1}Z^{T}x \qquad(1508)$$

so

$$y^{\star} = \left(I - Z(Z^{T}Z)^{-1}Z^{T}\right)x \qquad(1509)$$

and from (1471),

$$P0 = y^{\star} = a(a^{T}a)^{-1}a^{T}x = \frac{a}{\|a\|}\frac{a^{T}}{\|a\|}\,x \;\triangleq\; AA^{\dagger}x = a\,\frac{b}{\|a\|^{2}} \qquad(1510)$$

In words, any point $x$ in the hyperplane $\partial\mathcal{H}$ projected on its normal $a$ (confer (1535)) yields that point $y^{\star}$ in the hyperplane closest to the origin.

E.5.0.0.6 Example. Projection on affine subset.
The technique of Example E.5.0.0.5 is extensible. Given an intersection of hyperplanes

$$\mathcal{A} = \{y \mid Ay = b\} \subset \mathbb{R}^{n} \qquad(1511)$$

where each row of $A\in\mathbb{R}^{m\times n}$ is nonzero and $b\in\mathcal{R}(A)$, then the orthogonal projection $Px$ of any point $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ is the solution to a minimization problem:

$$\|Px - x\|_{2} = \inf_{y\in\mathcal{A}}\|y - x\|_{2} = \inf_{\xi\in\mathbb{R}^{n-\operatorname{rank}A}}\|Z\xi + y_{p} - x\|_{2} \qquad(1512)$$


where $y_{p}$ is any solution to $Ay=b$, and where the columns of $Z\in\mathbb{R}^{n\times n-\operatorname{rank}A}$ constitute a basis for $\mathcal{N}(A)$ so that $y=Z\xi+y_{p}\in\mathcal{A}$ for all $\xi\in\mathbb{R}^{n-\operatorname{rank}A}$.

The infimum is found by setting the gradient of the strictly convex norm-square to 0. The minimizing argument is

$$\xi^{\star} = -(Z^{T}Z)^{-1}Z^{T}(y_{p}-x) \qquad(1513)$$

so

$$y^{\star} = \left(I - Z(Z^{T}Z)^{-1}Z^{T}\right)(y_{p}-x) + x \qquad(1514)$$

and from (1471),

$$Px = y^{\star} = x - A^{\dagger}(Ax - b) = (I - A^{\dagger}A)x + A^{\dagger}Ay_{p} \qquad(1515)$$

which is a projection of $x$ on $\mathcal{N}(A)$ then translated perpendicularly with respect to the nullspace until it meets the affine subset $\mathcal{A}$.

E.5.0.0.7 Example. Projection on affine subset, vertex-description.
Suppose now we instead describe the affine subset $\mathcal{A}$ in terms of some given minimal set of generators arranged columnar in $X\in\mathbb{R}^{n\times N}$ (64); id est,

$$\mathcal{A} \,\triangleq\, \operatorname{aff}X = \{Xa \mid a^{T}\mathbf{1}=1\} \subseteq \mathbb{R}^{n} \qquad(1516)$$

Here minimal set means $XV_{\mathcal{N}} = [\,x_{2}-x_{1}\;\;x_{3}-x_{1}\;\cdots\;x_{N}-x_{1}\,]/\sqrt{2}$ (517) is full-rank (§2.4.2.2), where $V_{\mathcal{N}}\in\mathbb{R}^{N\times N-1}$ is the Schoenberg auxiliary matrix (§B.4.2). Then the orthogonal projection $Px$ of any point $x\in\mathbb{R}^{n}$ on $\mathcal{A}$ is the solution to a minimization problem:

$$\|Px - x\|_{2} = \inf_{a^{T}\mathbf{1}=1}\|Xa - x\|_{2} = \inf_{\xi\in\mathbb{R}^{N-1}}\|X(V_{\mathcal{N}}\xi + a_{p}) - x\|_{2} \qquad(1517)$$

where $a_{p}$ is any solution to $a^{T}\mathbf{1}=1$. We find the minimizing argument

$$\xi^{\star} = -(V_{\mathcal{N}}^{T}X^{T}XV_{\mathcal{N}})^{-1}V_{\mathcal{N}}^{T}X^{T}(Xa_{p}-x) \qquad(1518)$$

and so the orthogonal projection is [128, §3]

$$Px = Xa^{\star} = \left(I - XV_{\mathcal{N}}(XV_{\mathcal{N}})^{\dagger}\right)Xa_{p} + XV_{\mathcal{N}}(XV_{\mathcal{N}})^{\dagger}x \qquad(1519)$$

a projection of point $x$ on $\mathcal{R}(XV_{\mathcal{N}})$ then translated perpendicularly with respect to that range until it meets the affine subset $\mathcal{A}$.
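Formula (1515) is one line of Matlab. The sketch below is our own (a fat full-row-rank test matrix is assumed so that $b\in\mathcal{R}(A)$ automatically); it checks feasibility and the orthogonality condition (1496) of Theorem E.4.0.0.1.

\begin{verbatim}
% sketch (ours) of (1515): project x on A = {y | A*y = b}
m = 3; n = 6;
A = randn(m,n); b = randn(m,1);        % fat, full row rank a.s.
x = randn(n,1);
Px = x - pinv(A)*(A*x - b);            % (1515)
disp(norm(A*Px - b))                   % Px lies in A
yp = A\b;                              % another point of A
disp(dot(x - Px, Px - yp))             % ~0: x - Px is in R(A'), i.e. N(A)-perp
\end{verbatim}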


E.5.0.0.8 Example. Projecting on hyperplane, halfspace, slab.
Given the hyperplane representation having $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$,

$$\partial\mathcal{H} = \{y \mid a^{T}y = b\} \subset \mathbb{R}^{m} \qquad(90)$$

the orthogonal projection of any point $x\in\mathbb{R}^{m}$ on that hyperplane is

$$Px = x - a(a^{T}a)^{-1}(a^{T}x - b) \qquad(1520)$$

Orthogonal projection of $x$ on the halfspace parametrized by $b\in\mathbb{R}$ and nonzero normal $a\in\mathbb{R}^{m}$,

$$\mathcal{H}_{-} = \{y \mid a^{T}y \le b\} \subset \mathbb{R}^{m} \qquad(82)$$

is the point

$$Px = x - a(a^{T}a)^{-1}\max\{0,\;a^{T}x - b\} \qquad(1521)$$

Orthogonal projection of $x$ on the convex slab (Figure 8), for $c<b$,

$$\mathcal{B} \,\triangleq\, \{y \mid c \le a^{T}y \le b\} \subset \mathbb{R}^{m} \qquad(1522)$$

is the point [79, §5.1]

$$Px = x - a(a^{T}a)^{-1}\left(\max\{0,\;a^{T}x-b\} - \max\{0,\;c-a^{T}x\}\right) \qquad(1523)$$

E.6 Vectorization interpretation, projection on a matrix

E.6.1 Nonorthogonal projection on a vector

Nonorthogonal projection of vector $x$ on the range of vector $y$ is accomplished using a normalized dyad $P_{0}$ (§B.1); videlicet,

$$\frac{\langle z,x\rangle}{\langle z,y\rangle}\,y = \frac{z^{T}x}{z^{T}y}\,y = \frac{yz^{T}}{z^{T}y}\,x \;\triangleq\; P_{0}x \qquad(1524)$$

where $\langle z,x\rangle/\langle z,y\rangle$ is the coefficient of projection on $y$. Because $P_{0}^{2}=P_{0}$ and $\mathcal{R}(P_{0})=\mathcal{R}(y)$, rank-one matrix $P_{0}$ is a nonorthogonal projector projecting on $\mathcal{R}(y)$. The direction of nonorthogonal projection is orthogonal to $z$; id est,

$$P_{0}x - x \;\perp\; \mathcal{R}(P_{0}^{T}) \qquad(1525)$$
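The three closed forms of Example E.5.0.0.8 are worth exercising together, since the slab projection (1523) reduces to (1520) or (1521) depending on which constraint is active. This Matlab sketch is our own; the normal, bounds, and test point are assumptions.

\begin{verbatim}
% sketch (ours) of (1520), (1521), (1523)
m = 4;
a = randn(m,1); b = 1; c = -1; x = randn(m,1);
s = a/(a'*a);
Ph = x - s*(a'*x - b);                           % hyperplane a'y = b
Pm = x - s*max(0, a'*x - b);                     % halfspace  a'y <= b
Pb = x - s*(max(0,a'*x - b) - max(0,c - a'*x));  % slab c <= a'y <= b
disp(a'*Ph - b)                                  % ~0
disp(max(a'*Pm - b, 0))                          % ~0
disp(max([a'*Pb - b, c - a'*Pb, 0]))             % ~0: Pb feasible
\end{verbatim}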


E.6.2 Nonorthogonal projection on vectorized matrix

Formula (1524) is extensible. Given $X,Y,Z\in\mathbb{R}^{m\times n}$, we have the one-dimensional nonorthogonal projection of $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $Y$: (§2.2)

$$\frac{\langle Z,X\rangle}{\langle Z,Y\rangle}\,Y, \qquad \langle Z,Y\rangle \ne 0 \qquad(1526)$$

where $\langle Z,X\rangle/\langle Z,Y\rangle$ is the coefficient of projection. The inequality accounts for the fact: projection on $\mathcal{R}(\operatorname{vec}Y)$ is in a direction orthogonal to $\operatorname{vec}Z$.

E.6.2.1 Nonorthogonal projection on dyad

Now suppose we have nonorthogonal projector dyad

$$P_{0} = \frac{yz^{T}}{z^{T}y} \in \mathbb{R}^{m\times m} \qquad(1527)$$

Analogous to (1524), for $X\in\mathbb{R}^{m\times m}$,

$$P_{0}XP_{0} = \frac{yz^{T}}{z^{T}y}\,X\,\frac{yz^{T}}{z^{T}y} = \frac{z^{T}Xy}{(z^{T}y)^{2}}\,yz^{T} = \frac{\langle zy^{T},X\rangle}{\langle zy^{T},yz^{T}\rangle}\,yz^{T} \qquad(1528)$$

is a nonorthogonal projection of matrix $X$ on the range of vectorized dyad $P_{0}$; from which it follows:

$$P_{0}XP_{0} = \frac{z^{T}Xy}{z^{T}y}\,\frac{yz^{T}}{z^{T}y} = \left\langle\frac{zy^{T}}{z^{T}y},\,X\right\rangle\frac{yz^{T}}{z^{T}y} = \langle P_{0}^{T},X\rangle\,P_{0} = \frac{\langle P_{0}^{T},X\rangle}{\langle P_{0}^{T},P_{0}\rangle}\,P_{0} \qquad(1529)$$

Yet this relationship between matrix product and vector inner-product only holds for a dyad projector. When nonsymmetric projector $P_{0}$ is rank-one as in (1527), therefore,

$$\mathcal{R}(\operatorname{vec}P_{0}XP_{0}) = \mathcal{R}(\operatorname{vec}P_{0}) \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1530)$$

and

$$P_{0}XP_{0} - X \;\perp\; P_{0}^{T} \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1531)$$


E.6.2.1.1 Example. $\lambda$ as coefficients of nonorthogonal projection.
Any diagonalization (§A.5)

$$X = S\Lambda S^{-1} = \sum_{i=1}^{m}\lambda_{i}\,s_{i}w_{i}^{T} \in \mathbb{R}^{m\times m} \qquad(1144)$$

may be expressed as a sum of one-dimensional nonorthogonal projections of $X$, each on the range of a vectorized eigenmatrix $P_{j}\triangleq s_{j}w_{j}^{T}$;

$$\begin{aligned}
X &= \sum_{i,j=1}^{m}\langle(Se_{i}e_{j}^{T}S^{-1})^{T},\,X\rangle\,Se_{i}e_{j}^{T}S^{-1}\\
&= \sum_{j=1}^{m}\langle(s_{j}w_{j}^{T})^{T},\,X\rangle\,s_{j}w_{j}^{T} \;+ \sum_{\substack{i,j=1\\ j\ne i}}^{m}\langle(Se_{i}e_{j}^{T}S^{-1})^{T},\,S\Lambda S^{-1}\rangle\,Se_{i}e_{j}^{T}S^{-1}\\
&= \sum_{j=1}^{m}\langle(s_{j}w_{j}^{T})^{T},\,X\rangle\,s_{j}w_{j}^{T}\\
&\triangleq \sum_{j=1}^{m}\langle P_{j}^{T},\,X\rangle\,P_{j} = \sum_{j=1}^{m}s_{j}w_{j}^{T}Xs_{j}w_{j}^{T} = \sum_{j=1}^{m}\lambda_{j}\,s_{j}w_{j}^{T} = \sum_{j=1}^{m}P_{j}XP_{j}
\end{aligned} \qquad(1532)$$

This biorthogonal expansion of matrix $X$ is a sum of nonorthogonal projections because the term outside the projection coefficient $\langle\,\rangle$ is not identical to the inside-term. (§E.6.4) The eigenvalues $\lambda_{j}$ are coefficients of nonorthogonal projection of $X$, while the remaining $M(M-1)/2$ coefficients (for $i\ne j$) are zeroed by projection. When $P_{j}$ is rank-one as in (1532),

$$\mathcal{R}(\operatorname{vec}P_{j}XP_{j}) = \mathcal{R}(\operatorname{vec}s_{j}w_{j}^{T}) = \mathcal{R}(\operatorname{vec}P_{j}) \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1533)$$

and

$$P_{j}XP_{j} - X \;\perp\; P_{j}^{T} \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1534)$$

Were matrix $X$ symmetric, then its eigenmatrices would also be. So the one-dimensional projections would become orthogonal. (§E.6.4.1.1)


E.6.3 Orthogonal projection on a vector

The formula for orthogonal projection of vector $x$ on the range of vector $y$ (one-dimensional projection) is basic analytic geometry; [11, §3.3] [212, §3.2] [238, §2.2] [249, §1-8]

$$\frac{\langle y,x\rangle}{\langle y,y\rangle}\,y = \frac{y^{T}x}{y^{T}y}\,y = \frac{yy^{T}}{y^{T}y}\,x \;\triangleq\; P_{1}x \qquad(1535)$$

where $\langle y,x\rangle/\langle y,y\rangle$ is the coefficient of projection on $\mathcal{R}(y)$. An equivalent description is: vector $P_{1}x$ is the orthogonal projection of vector $x$ on $\mathcal{R}(P_{1})=\mathcal{R}(y)$. Rank-one matrix $P_{1}$ is a projection matrix because $P_{1}^{2}=P_{1}$. The direction of projection is orthogonal,

$$P_{1}x - x \;\perp\; \mathcal{R}(P_{1}) \qquad(1536)$$

because $P_{1}^{T}=P_{1}$.

E.6.4 Orthogonal projection on a vectorized matrix

From (1535), given instead $X,Y\in\mathbb{R}^{m\times n}$, we have the one-dimensional orthogonal projection of matrix $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $Y$: (§2.2)

$$\frac{\langle Y,X\rangle}{\langle Y,Y\rangle}\,Y \qquad(1537)$$

where $\langle Y,X\rangle/\langle Y,Y\rangle$ is the coefficient of projection.

For orthogonal projection, the term outside the vector inner-products $\langle\,\rangle$ must be identical to the terms inside in three places.

E.6.4.1 Orthogonal projection on dyad

There is opportunity for insight when $Y$ is a dyad $yz^{T}$ (§B.1): instead given $X\in\mathbb{R}^{m\times n}$, $y\in\mathbb{R}^{m}$, and $z\in\mathbb{R}^{n}$,

$$\frac{\langle yz^{T},X\rangle}{\langle yz^{T},yz^{T}\rangle}\,yz^{T} = \frac{y^{T}Xz}{y^{T}y\;z^{T}z}\,yz^{T} \qquad(1538)$$

is the one-dimensional orthogonal projection of $X$ in isomorphic $\mathbb{R}^{mn}$ on the range of vectorized $yz^{T}$. To reveal the obscured symmetric projection


matrices $P_{1}$ and $P_{2}$ we rewrite (1538):

$$\frac{y^{T}Xz}{y^{T}y\;z^{T}z}\,yz^{T} = \frac{yy^{T}}{y^{T}y}\,X\,\frac{zz^{T}}{z^{T}z} \;\triangleq\; P_{1}XP_{2} \qquad(1539)$$

So for projector dyads, projection (1539) is the orthogonal projection in $\mathbb{R}^{mn}$ if and only if projectors $P_{1}$ and $P_{2}$ are symmetric;$^{\text{E.9}}$ in other words, for orthogonal projection on the range of a vectorized dyad $yz^{T}$, the term outside the vector inner-products $\langle\,\rangle$ in (1538) must be identical to the terms inside in three places.

When $P_{1}$ and $P_{2}$ are rank-one symmetric projectors as in (1539), (29)

$$\mathcal{R}(\operatorname{vec}P_{1}XP_{2}) = \mathcal{R}(\operatorname{vec}yz^{T}) \quad\text{in } \mathbb{R}^{mn} \qquad(1540)$$

and

$$P_{1}XP_{2} - X \;\perp\; yz^{T} \quad\text{in } \mathbb{R}^{mn} \qquad(1541)$$

When $y=z$ then $P_{1}=P_{2}=P_{2}^{T}$ and

$$P_{1}XP_{1} = \langle P_{1},X\rangle\,P_{1} = \frac{\langle P_{1},X\rangle}{\langle P_{1},P_{1}\rangle}\,P_{1} \qquad(1542)$$

meaning, $P_{1}XP_{1}$ is equivalent to orthogonal projection of matrix $X$ on the range of vectorized projector dyad $P_{1}$. Yet this relationship between matrix product and vector inner-product does not hold for general symmetric projector matrices.

E.9: For diagonalizable $X\in\mathbb{R}^{m\times m}$ (§A.5), its orthogonal projection in isomorphic $\mathbb{R}^{m^{2}}$ on the range of vectorized $yz^{T}\in\mathbb{R}^{m\times m}$ becomes:

$$P_{1}XP_{2} = \sum_{i=1}^{m}\lambda_{i}\,P_{1}s_{i}w_{i}^{T}P_{2}$$

When $\mathcal{R}(P_{1})=\mathcal{R}(w_{j})$ and $\mathcal{R}(P_{2})=\mathcal{R}(s_{j})$, the $j^{\text{th}}$ dyad term from the diagonalization is isolated but only, in general, to within a scale factor because neither set of left- nor right-eigenvectors is necessarily orthonormal unless $X$ is normal [261, §3.2]. Yet when $\mathcal{R}(P_{2})=\mathcal{R}(s_{k})$, $k\ne j\in\{1\ldots m\}$, then $P_{1}XP_{2}=0$.


E.6.4.1.1 Example. Eigenvalues $\lambda$ as coefficients of orthogonal projection.
Let $\mathcal{C}$ represent any convex subset of subspace $\mathbb{S}^{M}$, and let $C_{1}$ be any element of $\mathcal{C}$. Then $C_{1}$ can be expressed as the orthogonal expansion

$$C_{1} = \sum_{i=1}^{M}\sum_{\substack{j=1\\ j\ge i}}^{M}\langle E_{ij},\,C_{1}\rangle\,E_{ij} \;\in\; \mathcal{C}\subset\mathbb{S}^{M} \qquad(1543)$$

where $E_{ij}\in\mathbb{S}^{M}$ is a member of the standard orthonormal basis for $\mathbb{S}^{M}$ (49). This expansion is a sum of one-dimensional orthogonal projections of $C_{1}$; each projection on the range of a vectorized standard basis matrix. Vector inner-product $\langle E_{ij},C_{1}\rangle$ is the coefficient of projection of $\operatorname{svec}C_{1}$ on $\mathcal{R}(\operatorname{svec}E_{ij})$.

When $C_{1}$ is any member of a convex set $\mathcal{C}$ whose dimension is $L$, Caratheodory's theorem [61] [194] [123] [26] [27] guarantees that no more than $L+1$ affinely independent members from $\mathcal{C}$ are required to faithfully represent $C_{1}$ by their linear combination.$^{\text{E.10}}$

Dimension of $\mathbb{S}^{M}$ is $L=M(M+1)/2$ in isometrically isomorphic $\mathbb{R}^{M(M+1)/2}$. Yet because any symmetric matrix can be diagonalized (§A.5.2), $C_{1}\in\mathbb{S}^{M}$ is a linear combination of its $M$ eigenmatrices $q_{i}q_{i}^{T}$ (§A.5.1) weighted by its eigenvalues $\lambda_{i}$;

$$C_{1} = Q\Lambda Q^{T} = \sum_{i=1}^{M}\lambda_{i}\,q_{i}q_{i}^{T} \qquad(1544)$$

where $\Lambda\in\mathbb{S}^{M}$ is a diagonal matrix having $\delta(\Lambda)_{i}=\lambda_{i}$, and $Q=[q_{1}\,\cdots\,q_{M}]$ is an orthogonal matrix in $\mathbb{R}^{M\times M}$ containing corresponding eigenvectors.

To derive eigen decomposition (1544) from expansion (1543), $M$ standard basis matrices $E_{ij}$ are rotated (§B.5) into alignment with the $M$ eigenmatrices $q_{i}q_{i}^{T}$ of $C_{1}$ by applying a similarity transformation; [212, §5.6]

$$\{QE_{ij}Q^{T}\} = \begin{cases} q_{i}q_{i}^{T}, & i=j=1\ldots M\\[3pt] \frac{1}{\sqrt{2}}\left(q_{i}q_{j}^{T} + q_{j}q_{i}^{T}\right), & 1\le i<j\le M \end{cases} \qquad(1545)$$

E.10: Caratheodory's theorem guarantees existence of a biorthogonal expansion for any element in $\operatorname{aff}\mathcal{C}$ when $\mathcal{C}$ is any pointed closed convex cone.


which remains an orthonormal basis for $\mathbb{S}^{M}$. Then remarkably

$$\begin{aligned}
C_{1} &= \sum_{\substack{i,j=1\\ j\ge i}}^{M}\langle QE_{ij}Q^{T},\,C_{1}\rangle\,QE_{ij}Q^{T}\\
&= \sum_{i=1}^{M}\langle q_{i}q_{i}^{T},\,C_{1}\rangle\,q_{i}q_{i}^{T} \;+ \sum_{\substack{i,j=1\\ j>i}}^{M}\langle QE_{ij}Q^{T},\,Q\Lambda Q^{T}\rangle\,QE_{ij}Q^{T}\\
&= \sum_{i=1}^{M}\langle q_{i}q_{i}^{T},\,C_{1}\rangle\,q_{i}q_{i}^{T}\\
&\triangleq \sum_{i=1}^{M}\langle P_{i},\,C_{1}\rangle\,P_{i} = \sum_{i=1}^{M}q_{i}q_{i}^{T}C_{1}q_{i}q_{i}^{T} = \sum_{i=1}^{M}\lambda_{i}\,q_{i}q_{i}^{T} = \sum_{i=1}^{M}P_{i}C_{1}P_{i}
\end{aligned} \qquad(1546)$$

this orthogonal expansion becomes the diagonalization; still a sum of one-dimensional orthogonal projections. The eigenvalues

$$\lambda_{i} = \langle q_{i}q_{i}^{T},\,C_{1}\rangle \qquad(1547)$$

are clearly coefficients of projection of $C_{1}$ on the range of each vectorized eigenmatrix. (confer §E.6.2.1.1) The remaining $M(M-1)/2$ coefficients ($i\ne j$) are zeroed by projection. When $P_{i}$ is rank-one symmetric as in (1546),

$$\mathcal{R}(\operatorname{svec}P_{i}C_{1}P_{i}) = \mathcal{R}(\operatorname{svec}q_{i}q_{i}^{T}) = \mathcal{R}(\operatorname{svec}P_{i}) \quad\text{in } \mathbb{R}^{M(M+1)/2} \qquad(1548)$$

and

$$P_{i}C_{1}P_{i} - C_{1} \;\perp\; P_{i} \quad\text{in } \mathbb{R}^{M(M+1)/2} \qquad(1549)$$

E.6.4.2 Positive semidefiniteness test as orthogonal projection

For any given $X\in\mathbb{R}^{m\times m}$, the familiar quadratic construct $y^{T}Xy\ge 0$, over broad domain, is a fundamental test for positive semidefiniteness. (§A.2) It is a fact that $y^{T}Xy$ is always proportional to a coefficient of orthogonal projection; letting $z$ in formula (1538) become $y\in\mathbb{R}^{m}$, then $P_{2}=P_{1}=yy^{T}/y^{T}y = yy^{T}/\|yy^{T}\|_{2}$ (confer (1197)) and formula (1539) becomes

$$\frac{\langle yy^{T},X\rangle}{\langle yy^{T},yy^{T}\rangle}\,yy^{T} = \frac{y^{T}Xy}{y^{T}y}\,\frac{yy^{T}}{y^{T}y} = \frac{yy^{T}}{y^{T}y}\,X\,\frac{yy^{T}}{y^{T}y} \;\triangleq\; P_{1}XP_{1} \qquad(1550)$$


By (1537), product $P_{1}XP_{1}$ is the one-dimensional orthogonal projection of $X$ in isomorphic $\mathbb{R}^{m^{2}}$ on the range of vectorized $P_{1}$ because, for $\operatorname{rank}P_{1}=1$ and $P_{1}^{2}=P_{1}\in\mathbb{S}^{m}$ (confer (1529)),

$$P_{1}XP_{1} = \frac{y^{T}Xy}{y^{T}y}\,\frac{yy^{T}}{y^{T}y} = \left\langle\frac{yy^{T}}{y^{T}y},\,X\right\rangle\frac{yy^{T}}{y^{T}y} = \langle P_{1},X\rangle\,P_{1} = \frac{\langle P_{1},X\rangle}{\langle P_{1},P_{1}\rangle}\,P_{1} \qquad(1551)$$

The coefficient of orthogonal projection $\langle P_{1},X\rangle = y^{T}Xy/(y^{T}y)$ is also known as Rayleigh's quotient.$^{\text{E.11}}$ When $P_{1}$ is rank-one symmetric as in (1550),

$$\mathcal{R}(\operatorname{vec}P_{1}XP_{1}) = \mathcal{R}(\operatorname{vec}P_{1}) \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1552)$$

and

$$P_{1}XP_{1} - X \;\perp\; P_{1} \quad\text{in } \mathbb{R}^{m^{2}} \qquad(1553)$$

The test for positive semidefiniteness, then, is a test for nonnegativity of the coefficient of orthogonal projection of $X$ on the range of each and every vectorized extreme direction $yy^{T}$ (§2.8.1) from the positive semidefinite cone in the ambient space of symmetric matrices.

E.11: When $y$ becomes the $j^{\text{th}}$ eigenvector $s_{j}$ of diagonalizable $X$, for example, $\langle P_{1},X\rangle$ becomes the $j^{\text{th}}$ eigenvalue: [120, §III]

$$\langle P_{1},X\rangle\big|_{y=s_{j}} = \frac{s_{j}^{T}\left(\sum_{i=1}^{m}\lambda_{i}s_{i}w_{i}^{T}\right)s_{j}}{s_{j}^{T}s_{j}} = \lambda_{j}$$

Similarly for $y=w_{j}$, the $j^{\text{th}}$ left-eigenvector,

$$\langle P_{1},X\rangle\big|_{y=w_{j}} = \frac{w_{j}^{T}\left(\sum_{i=1}^{m}\lambda_{i}s_{i}w_{i}^{T}\right)w_{j}}{w_{j}^{T}w_{j}} = \lambda_{j}$$

A quandary may arise regarding the potential annihilation of the antisymmetric part of $X$ when $s_{j}^{T}Xs_{j}$ is formed. Were annihilation to occur, it would imply the eigenvalue thus found came instead from the symmetric part of $X$. The quandary is resolved recognizing that diagonalization of real $X$ admits complex eigenvectors; hence, annihilation could only come about by forming $\operatorname{Re}(s_{j}^{H}Xs_{j}) = s_{j}^{H}(X+X^{T})s_{j}/2$ [125, §7.1], where $(X+X^{T})/2$ is the symmetric part of $X$ and $s_{j}^{H}$ denotes the conjugate transpose.
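The eigenvalues-as-projection-coefficients interpretation (1547) is directly observable numerically. The Matlab sketch below is ours (a random symmetric test matrix is assumed); it recovers each eigenvalue of $C_{1}$ as $\langle q_{i}q_{i}^{T},C_{1}\rangle$, where every eigenmatrix $q_{i}q_{i}^{T}$ is a unit-norm rank-one projector.

\begin{verbatim}
% sketch (ours) of (1544)/(1547): lambda_i = <q_i*q_i', C1>
M = 5;
R = randn(M); C1 = (R + R')/2;
[Q,Lam] = eig(C1);
coeffs = zeros(M,1);
for i = 1:M
    P = Q(:,i)*Q(:,i)';                % eigenmatrix; note <P,P> = 1
    coeffs(i) = trace(P'*C1);          % coefficient of projection
end
disp(norm(coeffs - diag(Lam)))         % ~1e-14
\end{verbatim}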


E.6.4.3 $PXP \succeq 0$

In some circumstances, it may be desirable to limit the domain of test $y^{T}Xy\ge 0$ for positive semidefiniteness; e.g., $\|y\|=1$. Another example of limiting domain-of-test is central to Euclidean distance geometry: for $\mathcal{R}(V)=\mathcal{N}(\mathbf{1}^{T})$, the test $-VDV\succeq 0$ determines whether $D\in\mathbb{S}^{N}_{h}$ is a Euclidean distance matrix. The same test may be stated: for $D\in\mathbb{S}^{N}_{h}$ (and optionally $\|y\|=1$),

$$D\in\mathrm{EDM}^{N} \;\Leftrightarrow\; -y^{T}Dy = \langle yy^{T},\,-D\rangle \ge 0 \quad \forall y\in\mathcal{R}(V) \qquad(1554)$$

The test $-VDV\succeq 0$ is therefore equivalent to a test for nonnegativity of the coefficient of orthogonal projection of $-D$ on the range of each and every vectorized extreme direction $yy^{T}$ from the positive semidefinite cone $\mathbb{S}^{N}_{+}$ such that $\mathcal{R}(yy^{T})=\mathcal{R}(y)\subseteq\mathcal{R}(V)$. (The validity of this result is independent of whether $V$ is itself a projection matrix.)

E.7 Projection on vectorized matrices of higher rank

E.7.1 $PXP$ misinterpretation for higher-rank $P$

For a projection matrix $P$ of rank greater than 1, $PXP$ is generally not commensurate with $\frac{\langle P,X\rangle}{\langle P,P\rangle}P$ as is the case for projector dyads (1551). Yet for a symmetric idempotent matrix $P$ of any rank we are tempted to say "$PXP$ is the orthogonal projection of $X\in\mathbb{S}^{m}$ on $\mathcal{R}(\operatorname{vec}P)$". The fallacy is: $\operatorname{vec}PXP$ does not necessarily belong to the range of vectorized $P$, the most basic requirement for projection on $\mathcal{R}(\operatorname{vec}P)$.

E.7.2 Orthogonal projection on matrix subspaces

With $A_{1},B_{1},Z_{1},A_{2},B_{2},Z_{2}$ as defined for nonorthogonal projector (1447), and defining

$$P_{1} \,\triangleq\, A_{1}A_{1}^{\dagger} \in \mathbb{S}^{m}, \qquad P_{2} \,\triangleq\, A_{2}A_{2}^{\dagger} \in \mathbb{S}^{p} \qquad(1555)$$

where $A_{1}\in\mathbb{R}^{m\times n}$, $Z_{1}\in\mathbb{R}^{m\times k}$, $A_{2}\in\mathbb{R}^{p\times n}$, $Z_{2}\in\mathbb{R}^{p\times k}$, then given any $X$,

$$\|X - P_{1}XP_{2}\|_{F} = \inf_{B_{1},B_{2}\in\mathbb{R}^{n\times k}} \|X - A_{1}(A_{1}^{\dagger}+B_{1}Z_{1}^{T})\,X\,(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}\|_{F} \qquad(1556)$$


As for all subspace projectors, the range of the projector is the subspace on which projection is made: $\{P_{1}YP_{2} \mid Y\in\mathbb{R}^{m\times p}\}$. Altogether, for projectors $P_{1}$ and $P_{2}$ of any rank, this means the projection $P_{1}XP_{2}$ is unique, orthogonal,

$$P_{1}XP_{2} - X \;\perp\; \{P_{1}YP_{2} \mid Y\in\mathbb{R}^{m\times p}\} \quad\text{in } \mathbb{R}^{mp} \qquad(1557)$$

and projectors $P_{1}$ and $P_{2}$ must each be symmetric (confer (1539)) to attain the infimum.

E.7.2.0.1 Proof. Minimum Frobenius norm (1556).
Defining $P\triangleq A_{1}(A_{1}^{\dagger}+B_{1}Z_{1}^{T})$,

$$\begin{aligned}
&\inf_{B_{1},B_{2}} \|X - A_{1}(A_{1}^{\dagger}+B_{1}Z_{1}^{T})X(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}\|_{F}^{2}\\
&= \inf_{B_{1},B_{2}} \|X - PX(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}\|_{F}^{2}\\
&= \inf_{B_{1},B_{2}} \operatorname{tr}\!\left(\left(X^{T} - A_{2}(A_{2}^{\dagger}+B_{2}Z_{2}^{T})X^{T}P^{T}\right)\left(X - PX(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}\right)\right)\\
&= \inf_{B_{1},B_{2}} \operatorname{tr}\!\Big(X^{T}X - X^{T}PX(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T} - A_{2}(A_{2}^{\dagger}+B_{2}Z_{2}^{T})X^{T}P^{T}X\\
&\qquad\qquad\quad + A_{2}(A_{2}^{\dagger}+B_{2}Z_{2}^{T})X^{T}P^{T}PX(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}\Big)
\end{aligned} \qquad(1558)$$

The Frobenius norm is a convex function. [38, §8.1] Necessary and sufficient conditions for the global minimum are $\nabla_{B_{1}}=0$ and $\nabla_{B_{2}}=0$. (§D.1.3.1) Terms not containing $B_{2}$ in (1558) will vanish from gradient $\nabla_{B_{2}}$; (§D.2.3)

$$\begin{aligned}
\nabla_{B_{2}}\operatorname{tr}&\Big(\!-X^{T}PXZ_{2}B_{2}^{T}A_{2}^{T} - A_{2}B_{2}Z_{2}^{T}X^{T}P^{T}X + A_{2}A_{2}^{\dagger}X^{T}P^{T}PXZ_{2}B_{2}^{T}A_{2}^{T}\\
&\quad + A_{2}B_{2}Z_{2}^{T}X^{T}P^{T}PXA_{2}^{\dagger T}A_{2}^{T} + A_{2}B_{2}Z_{2}^{T}X^{T}P^{T}PXZ_{2}B_{2}^{T}A_{2}^{T}\Big)\\
&= -2A_{2}^{T}X^{T}PXZ_{2} + 2A_{2}^{T}A_{2}A_{2}^{\dagger}X^{T}P^{T}PXZ_{2} + 2A_{2}^{T}A_{2}B_{2}Z_{2}^{T}X^{T}P^{T}PXZ_{2}\\
&= 2A_{2}^{T}\left(-X^{T} + A_{2}A_{2}^{\dagger}X^{T}P^{T} + A_{2}B_{2}Z_{2}^{T}X^{T}P^{T}\right)PXZ_{2}\\
&= 0\\
&\Leftrightarrow\quad \mathcal{R}(B_{1})\subseteq\mathcal{N}(A_{1}) \;\text{ and }\; \mathcal{R}(B_{2})\subseteq\mathcal{N}(A_{2})
\end{aligned} \qquad(1559)$$

The same conclusion is obtained were instead $P^{T}\triangleq(A_{2}^{\dagger T}+Z_{2}B_{2}^{T})A_{2}^{T}$ and the gradient with respect to $B_{1}$ observed. The projection $P_{1}XP_{2}$ (1555) is therefore unique.


E.7.2.0.2 Example. $PXP$ redux & $\mathcal{N}(\mathcal{V})$.
Suppose we define a subspace of $m\times n$ matrices, each elemental matrix having columns constituting a list whose geometric center (§4.5.1.0.1) is the origin in $\mathbb{R}^{m}$:

$$\begin{aligned}
\mathbb{R}^{m\times n}_{c} &\triangleq \{Y\in\mathbb{R}^{m\times n} \mid Y\mathbf{1}=0\}\\
&= \{Y\in\mathbb{R}^{m\times n} \mid \mathcal{N}(Y)\supseteq\mathbf{1}\} = \{Y\in\mathbb{R}^{m\times n} \mid \mathcal{R}(Y^{T})\subseteq\mathcal{N}(\mathbf{1}^{T})\}\\
&= \{XV \mid X\in\mathbb{R}^{m\times n}\} \subset \mathbb{R}^{m\times n}
\end{aligned} \qquad(1560)$$

the nonsymmetric geometric center subspace. Further suppose $V\in\mathbb{S}^{n}$ is a projection matrix having $\mathcal{N}(V)=\mathcal{R}(\mathbf{1})$ and $\mathcal{R}(V)=\mathcal{N}(\mathbf{1}^{T})$. Then linear mapping $T(X)=XV$ is the orthogonal projection of any $X\in\mathbb{R}^{m\times n}$ on $\mathbb{R}^{m\times n}_{c}$ in the Euclidean (vectorization) sense because $V$ is symmetric, $\mathcal{N}(XV)\supseteq\mathbf{1}$, and $\mathcal{R}(VX^{T})\subseteq\mathcal{N}(\mathbf{1}^{T})$.

Now suppose we define a subspace of symmetric $n\times n$ matrices each of whose columns constitute a list having the origin in $\mathbb{R}^{n}$ as geometric center,

$$\begin{aligned}
\mathbb{S}^{n}_{c} &\triangleq \{Y\in\mathbb{S}^{n} \mid Y\mathbf{1}=0\}\\
&= \{Y\in\mathbb{S}^{n} \mid \mathcal{N}(Y)\supseteq\mathbf{1}\} = \{Y\in\mathbb{S}^{n} \mid \mathcal{R}(Y)\subseteq\mathcal{N}(\mathbf{1}^{T})\}
\end{aligned} \qquad(1561)$$

the geometric center subspace. Further suppose $V\in\mathbb{S}^{n}$ is a projection matrix, the same as before. Then $VXV$ is the orthogonal projection of any $X\in\mathbb{S}^{n}$ on $\mathbb{S}^{n}_{c}$ in the Euclidean sense (1557) because $V$ is symmetric, $VXV\mathbf{1}=0$, and $\mathcal{R}(VXV)\subseteq\mathcal{N}(\mathbf{1}^{T})$. Two-sided projection is necessary only to remain in the ambient symmetric matrix subspace. Then

$$\mathbb{S}^{n}_{c} = \{VXV \mid X\in\mathbb{S}^{n}\} \subset \mathbb{S}^{n} \qquad(1562)$$

has $\dim\mathbb{S}^{n}_{c} = n(n-1)/2$ in isomorphic $\mathbb{R}^{n(n+1)/2}$. We find its orthogonal complement as the aggregate of all negative directions of orthogonal projection on $\mathbb{S}^{n}_{c}$: the translation-invariant subspace (§4.5.1.1)

$$\begin{aligned}
\mathbb{S}^{n\perp}_{c} &\triangleq \{X - VXV \mid X\in\mathbb{S}^{n}\} \subset \mathbb{S}^{n}\\
&= \{u\mathbf{1}^{T} + \mathbf{1}u^{T} \mid u\in\mathbb{R}^{n}\}
\end{aligned} \qquad(1563)$$


characterized by the doublet u1^T + 1u^T (§B.2).^{E.12} Defining the geometric center mapping V(X) = −V X V (1/2) consistently with (545), then N(V) = R(I − V) on domain S^n, analogously to vector projectors (§E.2); id est,

   N(V) = S^{n⊥}_c   (1564)

a subspace of S^n whose dimension is dim S^{n⊥}_c = n in isomorphic R^{n(n+1)/2}. Intuitively, operator V is an orthogonal projector; any argument duplicitously in its range is a fixed point. So this symmetric operator's nullspace must be orthogonal to its range.

Now compare the subspace of symmetric matrices having all zeros in the first row and column

   S^n_1 ≜ {Y ∈ S^n | Y e_1 = 0}
        = { [0 0^T ; 0 I] X [0 0^T ; 0 I] | X ∈ S^n }
        = { [0 √2 V_N]^T Z [0 √2 V_N] | Z ∈ S^N }   (1565)

where P = [0 0^T ; 0 I] is an orthogonal projector. Then, similarly, PXP is the orthogonal projection of any X ∈ S^n on S^n_1 in the Euclidean sense (1557), and

   S^{n⊥}_1 ≜ { X − [0 0^T ; 0 I] X [0 0^T ; 0 I] | X ∈ S^n } = { ue_1^T + e_1 u^T | u ∈ R^n } ⊂ S^n   (1566)

and obviously S^n_1 ⊕ S^{n⊥}_1 = S^n.

E.12 Proof.
   {X − V X V | X ∈ S^n} = {X − (I − (1/n)11^T) X (I − 11^T (1/n)) | X ∈ S^n}
                         = {(1/n)11^T X + X 11^T (1/n) − (1/n)11^T X 11^T (1/n) | X ∈ S^n}
Because {X1 | X ∈ S^n} = R^n,
   {X − V X V | X ∈ S^n} = {1ζ^T + ζ1^T − 11^T (1^T ζ /n) | ζ ∈ R^n}
                         = {1ζ^T (I − 11^T/(2n)) + (I − 11^T/(2n)) ζ1^T | ζ ∈ R^n}
where I − 11^T/(2n) is invertible.
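The two-sided projection V X V lends itself to quick numerical verification. The following is a minimal Matlab sketch (not from the book) checking that V X V lands in S^n_c and that the residual X − V X V is orthogonal to the projection:

% Sketch: verify V*X*V is orthogonal projection of X on S^n_c (1562).
n = 5;
V = eye(n) - ones(n)/n;        % geometric centering matrix: V = V', V^2 = V
X = randn(n); X = (X + X')/2;  % arbitrary symmetric test matrix
P = V*X*V;                     % two-sided projection
disp(norm(P*ones(n,1)))        % ~0 : P1 = 0, so P belongs to S^n_c (1561)
disp(trace((X - P)*P))         % ~0 : residual is orthogonal to projection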


E.8  Range/Rowspace interpretation

For idempotent matrices P_1 and P_2 of any rank, P_1 X P_2^T is a projection of R(X) on R(P_1) and a projection of R(X^T) on R(P_2) : For any given X = UΣQ^T ∈ R^{m×p}, as in compact singular value decomposition (1155),

   P_1 X P_2^T = Σ_{i=1}^η σ_i P_1 u_i q_i^T P_2^T = Σ_{i=1}^η σ_i (P_1 u_i)(P_2 q_i)^T   (1567)

where η ≜ min{m, p}. Recall u_i ∈ R(X) and q_i ∈ R(X^T) when the corresponding singular value σ_i is nonzero. (§A.6.1) So P_1 projects u_i on R(P_1) while P_2 projects q_i on R(P_2) ; id est, the range and rowspace of any X are respectively projected on the ranges of P_1 and P_2.^{E.13}

E.9  Projection on convex set

Thus far we have discussed only projection on subspaces. Now we generalize, considering projection on arbitrary convex sets in Euclidean space; convex because projection is, then, unique minimum-distance and a convex optimization problem:

For projection P_C x of point x on any closed set C ⊆ R^n it is obvious:

   C = {P_C x | x ∈ R^n}   (1568)

If C ⊆ R^n is a closed convex set, then for each and every x ∈ R^n there exists a unique point Px belonging to C that is closest to x in the Euclidean sense. Like (1469), unique projection Px (or P_C x) of a point x on convex set C is that point in C closest to x ; [154, §3.12]

   ‖x − Px‖_2 = inf_{y∈C} ‖x − y‖_2   (1569)

There exists a converse:

E.13 When P_1 and P_2 are symmetric and R(P_1) = R(u_j) and R(P_2) = R(q_j), then the j-th dyad term from the singular value decomposition of X is isolated by the projection. Yet if R(P_2) = R(q_ℓ), ℓ ≠ j ∈ {1...η}, then P_1 X P_2 = 0.


E.9.0.0.1 Theorem. (Bunt-Motzkin) Convex set if projections unique.
[242, §7.5] [121] If C ⊆ R^n is a nonempty closed set and if for each and every x in R^n there is a unique Euclidean projection Px of x on C belonging to C, then C is convex.  ⋄

Borwein & Lewis propose, for closed convex set C [36, §3.3, exer.12]

   ∇‖x − Px‖_2^2 = 2(x − Px)   (1570)

whereas, for x ∉ C

   ∇‖x − Px‖_2 = (x − Px) ‖x − Px‖_2^{−1}   (1571)

As they do, we leave its proof an exercise.

A well-known equivalent characterization of projection on a convex set is a generalization of the perpendicularity condition (1468) for projection on a subspace:

E.9.0.0.2 Definition. Normal vector. [194, p.15]
Vector z is normal to convex set C at point Px ∈ C if

   〈z , y − Px〉 ≤ 0  ∀ y ∈ C   (1572)
△

A convex set has a nonzero normal at each of its boundary points. [194, p.100] Hence, the normal or dual interpretation of projection:

E.9.1  Dual interpretation of projection on convex set

E.9.1.0.1 Theorem. Unique minimum-distance projection. [123, §A.3.1] [154, §3.12] [59, §4.1] [45] (Figure 103(b), p.572) Given a closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C (Px is that point in C nearest x) if and only if

   Px ∈ C ,  〈x − Px , y − Px〉 ≤ 0  ∀ y ∈ C   (1573)
⋄

As for subspace projection, operator P is idempotent in the sense: for each and every x ∈ R^n, P(Px) = Px. Yet operator P is not linear; projector P is a linear operator if and only if convex set C (on which projection is made) is a subspace. (§E.4)


E.9.1.0.2 Theorem. Unique projection via normal cone.^{E.14} [59, §4.3]
Given closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C if and only if

   Px ∈ C ,  Px − x ∈ (C − Px)^∗   (1574)

In other words, Px is that point in C nearest x if and only if Px − x belongs to that cone dual to translate C − Px.  ⋄

E.9.1.1  Dual interpretation as optimization

Luenberger [154, p.134] carries forward the dual interpretation of projection as solution to a maximization problem: Minimum distance from a point x ∈ R^n to a convex set C ⊂ R^n can be found by maximizing distance from x to hyperplane ∂H over the set of all hyperplanes separating x from C. Existence of a separating hyperplane (§2.4.2.7) presumes point x lies on the boundary or exterior to set C.

The optimal separating hyperplane is characterized by the fact it also supports C. Any hyperplane supporting C (Figure 19(a)) has form

   ∂H_− = { y ∈ R^n | a^T y = σ_C(a) }   (104)

where the support function is convex, defined

   σ_C(a) ≜ sup_{z∈C} a^T z   (404)

When point x is finite and set C contains finite points, under this projection interpretation, if the supporting hyperplane is a separating hyperplane then the support function is finite. From Example E.5.0.0.8, projection P_{∂H_−} x of x on any given supporting hyperplane ∂H_− is

   P_{∂H_−} x = x − a(a^T a)^{−1}( a^T x − σ_C(a) )   (1575)

With reference to Figure 99, identifying

   H_+ = { y ∈ R^n | a^T y ≥ σ_C(a) }   (83)

E.14 −(C − Px)^∗ is the normal cone to set C at point Px. (§E.10.3.2)


Figure 99: Dual interpretation of projection of point x on convex set C in R^2. (a) κ = (a^T a)^{−1}( a^T x − σ_C(a) ). (b) Minimum distance from x to C is found by maximizing distance to all hyperplanes supporting C and separating it from x. A convex problem for any convex set; the maximized distance is unique. E.g., exercise this paradigm on any convex polyhedral set.


then

   ‖x − Px‖ = sup_{∂H_− | x∈H_+} ‖x − P_{∂H_−} x‖ = sup_{a | x∈H_+} ‖a(a^T a)^{−1}(a^T x − σ_C(a))‖
            = sup_{a | x∈H_+} (1/‖a‖) |a^T x − σ_C(a)|   (1576)

which can be expressed as a convex optimization

   ‖x − Px‖ = maximize_a   a^T x − σ_C(a)
              subject to   ‖a‖ ≤ 1   (1577)

The unique minimum-distance projection on convex set C is therefore

   Px = x − a^⋆( a^{⋆T} x − σ_C(a^⋆) )   (1578)

where optimally ‖a^⋆‖ = 1.

In the circumstance set C is a closed convex cone K and there exists a hyperplane separating given point x from K, then optimal σ_K(a^⋆) has value 0 so problem (1577) becomes

   ‖x − Px‖ = maximize_a   a^T x
              subject to   ‖a‖ ≤ 1 ,  a ∈ K°   (1579)

Normals a to all hyperplanes supporting K belong to the polar cone K°. (§2.13.6.1) The norm inequality can be handled by Schur complement (§A.4). Projection on cone K is Px = (I − a^⋆ a^{⋆T})x while projection on the polar cone −K^∗ is x − Px = a^⋆ a^{⋆T} x (§E.9.2.2.1). Negating vector a, this maximization problem becomes a minimization (the same problem) and the polar cone becomes the dual cone.
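Problem (1577) can be prototyped directly. The following is a hypothetical sketch (assuming the CVX modeling package for Matlab; the set C here is a polytope given by an assumed vertex list Vp, so its support function is σ_C(a) = max(Vp'*a)):

% Sketch of (1577): distance from x to a polytope C = conv of columns of Vp.
n = 2;
Vp = [1 2 2; 1 1 2];              % hypothetical vertex list for C
x  = [-1; 3];                     % point exterior to C
cvx_begin
    variable a(n)
    maximize( a'*x - max(Vp'*a) ) % concave objective: affine minus convex
    subject to
        norm(a) <= 1
cvx_end
Px = x - a*(a'*x - max(Vp'*a));   % projection via (1578); ||a|| = 1 at optimum

The objective is concave because the support function of a finite vertex list, max(Vp'*a), is a convex function of a.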


E.9.2  Projection on cone

When convex set C is a cone, there is a finer statement of optimality conditions:

E.9.2.0.1 Theorem. Unique projection on cone. [123, §A.3.2]
Let K ⊆ R^n be a closed convex cone, and K^∗ its dual (§2.13.1). Then Px is the unique minimum-distance projection of x ∈ R^n on K if and only if

   Px ∈ K ,  〈Px − x , Px〉 = 0 ,  Px − x ∈ K^∗   (1580)
⋄

In words, Px is the unique minimum-distance projection of x on K if and only if

1) projection Px lies in K
2) direction Px − x is orthogonal to the projection Px
3) direction Px − x lies in the dual cone K^∗.

As the theorem is stated, it admits projection on K having empty interior; id est, on convex cones in a proper subspace of R^n. Projection on K of any point x ∈ −K^∗ belonging to the negative dual cone is on the origin.

E.9.2.1  Relation to subspace projection

Conditions 1 and 2 of the theorem are common with orthogonal projection on a subspace R(P) : Condition 1 is the most basic requirement; namely, Px ∈ R(P), the projection belongs to the subspace. Invoking perpendicularity condition (1468), we recall the second requirement for projection on a subspace:

   Px − x ⊥ R(P)  or  Px − x ∈ R(P)^⊥   (1581)

which corresponds to condition 2. Yet condition 3 is a generalization of subspace projection; id est, for unique minimum-distance projection on a closed convex cone K, polar cone −K^∗ plays the role R(P)^⊥ plays for subspace projection (P_R x = x − P_{R^⊥} x). Indeed, −K^∗ is the algebraic complement in the orthogonal vector sum (p.610) [168] [123, §A.3.2.5]

   K ⊞ −K^∗ = R^n  ⇔  cone K is closed and convex   (1582)

Also, given unique minimum-distance projection Px on K satisfying Theorem E.9.2.0.1, then by projection on the algebraic complement via I − P in §E.2 we have

   −K^∗ = {x − Px | x ∈ R^n} = {x ∈ R^n | Px = 0}   (1583)


consequent to Moreau (1585). Recalling any subspace is a closed convex cone^{E.15}

   K = R(P)  ⇔  −K^∗ = R(P)^⊥   (1584)

meaning, when a cone is a subspace R(P) then the dual cone becomes its orthogonal complement R(P)^⊥. [38, §2.6.1] In this circumstance, condition 3 becomes coincident with condition 2.

E.9.2.2  Salient properties: Projection Px on closed convex cone K
[123, §A.3.2] [59, §5.6] For x, x_1, x_2 ∈ R^n

1. P_K(αx) = α P_K x  ∀ α ≥ 0  (nonnegative homogeneity)
2. ‖P_K x‖ ≤ ‖x‖
3. P_K x = 0  ⇔  x ∈ −K^∗
4. P_K(−x) = −P_{−K} x
5. (Jean-Jacques Moreau (1962)) [168]
      x = x_1 + x_2 ,  x_1 ∈ K ,  x_2 ∈ −K^∗ ,  x_1 ⊥ x_2
      ⇔
      x_1 = P_K x ,  x_2 = P_{−K^∗} x   (1585)
6. K = {x − P_{−K^∗} x | x ∈ R^n} = {x ∈ R^n | P_{−K^∗} x = 0}
7. −K^∗ = {x − P_K x | x ∈ R^n} = {x ∈ R^n | P_K x = 0}   (1583)

E.9.2.2.1 Corollary. I − P for cones. (confer §E.2)
Denote by K ⊆ R^n a closed convex cone, and call K^∗ its dual. Then x − P_{−K^∗} x is the unique minimum-distance projection of x ∈ R^n on K if and only if P_{−K^∗} x is the unique minimum-distance projection of x on −K^∗ the polar cone.  ⋄

Proof. Assume x_1 = P_K x. Then by Theorem E.9.2.0.1 we have

   x_1 ∈ K ,  x_1 − x ⊥ x_1 ,  x_1 − x ∈ K^∗   (1586)

E.15 but a proper subspace is not a proper cone (§2.7.2.2.1).


Now assume x − x_1 = P_{−K^∗} x. Then we have

   x − x_1 ∈ −K^∗ ,  −x_1 ⊥ x − x_1 ,  −x_1 ∈ −K   (1587)

But these two assumptions are apparently identical. We must therefore have

   x − P_{−K^∗} x = x_1 = P_K x   (1588)
∎

E.9.2.2.2 Corollary. Unique projection via dual or normal cone.
[59, §4.7] (§E.10.3.2, confer Theorem E.9.1.0.2) Given point x ∈ R^n and closed convex cone K ⊆ R^n, the following are equivalent statements:

1. point Px is the unique minimum-distance projection of x on K
2. Px ∈ K ,  x − Px ∈ −(K − Px)^∗ = −K^∗ ∩ (Px)^⊥
3. Px ∈ K ,  〈x − Px , Px〉 = 0 ,  〈x − Px , y〉 ≤ 0  ∀ y ∈ K
⋄

E.9.2.2.3 Example. Unique projection on nonnegative orthant.
(confer (946)) From Theorem E.9.2.0.1, to project matrix H ∈ R^{m×n} on the self-dual orthant (§2.13.5.1) of nonnegative matrices R^{m×n}_+ in isomorphic R^{mn}, the necessary and sufficient conditions are:

   H^⋆ ≥ 0
   tr( (H^⋆ − H)^T H^⋆ ) = 0   (1589)
   H^⋆ − H ≥ 0

where the inequalities denote entrywise comparison. The optimal solution H^⋆ is simply H having all its negative entries zeroed;

   H^⋆_ij = max{H_ij , 0} ,  i,j ∈ {1...m} × {1...n}   (1590)

Now suppose the nonnegative orthant is translated by T ∈ R^{m×n} ; id est, consider R^{m×n}_+ + T. Then projection on the translated orthant is [59, §4.8]

   H^⋆_ij = max{H_ij , T_ij}   (1591)
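A minimal numerical sketch (not book code) of (1590), (1591), and of the Moreau decomposition (1585) for the self-dual nonnegative orthant:

% Projection on R^{m x n}_+ is entrywise clipping (1590); Moreau (1585)
% splits H into orthogonal parts on the cone and on its polar cone.
H  = randn(4,3);
Hp = max(H, 0);                   % H* : projection on nonnegative orthant
Hn = min(H, 0);                   % projection on polar (nonpositive) orthant
disp(norm(H - (Hp + Hn),'fro'))   % ~0 : H = Hp + Hn
disp(Hp(:)'*Hn(:))                % 0  : Hp orthogonal to Hn
T  = ones(4,3);                   % translated orthant R^{m x n}_+ + T
Hstar = max(H, T);                % projection per (1591)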


E.9.2.2.4 Example. Unique projection on truncated convex cone.
Consider the problem of projecting a point x on a closed convex cone that is artificially bounded; really, a bounded convex polyhedron having a vertex at the origin:

   minimize_{y∈R^N}   ‖x − Ay‖_2
   subject to   y ≽ 0 ,  ‖y‖_∞ ≤ 1   (1592)

where the convex cone has vertex-description (§2.12.2.0.1), for A ∈ R^{n×N}

   K = {Ay | y ≽ 0}   (1593)

and where ‖y‖_∞ ≤ 1 is the artificial bound. This is a convex optimization problem having no known closed-form solution, in general. It arises, for example, in the fitting of hearing aids designed around a programmable graphic equalizer (a filter bank whose only adjustable parameters are gain per band, each bounded above by unity). [54] The problem is equivalent to a Schur-form semidefinite program (§A.4.1)

   minimize_{y∈R^N, t∈R}   t
   subject to   [ tI          x − Ay
                  (x − Ay)^T  t      ] ≽ 0
                0 ≼ y ≼ 1   (1594)
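Problem (1592) is readily prototyped numerically. A sketch, assuming the CVX package for Matlab with hypothetical random data standing in for A and x:

% Sketch of (1592): projection of x on truncated cone K = {Ay | y >= 0}
% with artificial bound ||y||_inf <= 1 (since y >= 0, this is max(y) <= 1).
n = 3; N = 4;
A = randn(n,N); x = randn(n,1);   % hypothetical data
cvx_begin
    variable y(N)
    minimize( norm(x - A*y) )
    subject to
        y >= 0;
        y <= 1;
cvx_end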


E.9.3  Nonexpansivity

E.9.3.0.1 Theorem. Nonexpansivity. [102, §2] [59, §5.3]
When C ⊂ R^n is an arbitrary closed convex set, projector P projecting on C is nonexpansive in the sense: for any vectors x, y ∈ R^n

   ‖Px − Py‖ ≤ ‖x − y‖   (1595)

with equality when x − Px = y − Py.^{E.16}  ⋄

Proof. [35]

   ‖x − y‖^2 = ‖Px − Py‖^2 + ‖(I − P)x − (I − P)y‖^2
              + 2〈x − Px , Px − Py〉 + 2〈y − Py , Py − Px〉   (1596)

Nonnegativity of the last two terms follows directly from the unique minimum-distance projection theorem.  ∎

The foregoing proof reveals another flavor of nonexpansivity; for each and every x, y ∈ R^n

   ‖Px − Py‖^2 + ‖(I − P)x − (I − P)y‖^2 ≤ ‖x − y‖^2   (1597)

Deutsch shows yet another: [59, §5.5]

   ‖Px − Py‖^2 ≤ 〈x − y , Px − Py〉   (1598)

E.9.4  Easy projections

• Projecting any matrix H ∈ R^{n×n} in the Euclidean/Frobenius sense orthogonally on the subspace of symmetric matrices S^n in isomorphic R^{n²} amounts to taking the symmetric part of H ; (§2.2.2.0.1) id est, (H + H^T)/2 is the projection.

• To project any H ∈ R^{n×n} orthogonally on the symmetric hollow subspace S^n_h in isomorphic R^{n²} (§2.2.3.0.1), we may take the symmetric part and then zero all entries along the main diagonal, or vice versa (because this is projection on the intersection of two subspaces); id est, (H + H^T)/2 − δ²(H).

• To project on the nonnegative orthant R^{m×n}_+, simply clip all negative entries to 0. Likewise, projection on the nonpositive orthant R^{m×n}_− sees all positive entries clipped to 0. Projection on other orthants is equally simple with appropriate clipping.

• Clipping in excess of |1| each entry of a point x ∈ R^n is equivalent to unique minimum-distance projection of x on the unit hypercube centered at the origin. (confer §E.10.3.2)

• Projecting on hyperplane, halfspace, slab: §E.5.0.0.8.

E.16 This condition for equality corrects an error in [45] (where the norm is applied to each side of the condition given here) easily revealed by counterexample.


• Projection of x ∈ R^n on a hyper-rectangle: [38, §8.1.1]

   C = {y ∈ R^n | l ≼ y ≼ u , l ≺ u}   (1599)

   P(x)_k = l_k ,  x_k ≤ l_k
          = x_k ,  l_k ≤ x_k ≤ u_k   (1600)
          = u_k ,  x_k ≥ u_k

• Unique minimum-distance projection of H ∈ S^n on the positive semidefinite cone S^n_+ in the Euclidean/Frobenius sense is accomplished by eigen decomposition (diagonalization) followed by clipping all negative eigenvalues to 0.

• Unique minimum-distance projection on the generally nonconvex subset of all matrices belonging to S^n_+ having rank not exceeding ρ (§2.9.2.1) is accomplished by clipping all negative eigenvalues to 0 and zeroing the smallest nonnegative eigenvalues, keeping only the ρ largest. (§7.1.2)

• Unique minimum-distance projection of H ∈ R^{m×n} on the set of all m × n matrices of rank no greater than k in the Euclidean/Frobenius sense is the singular value decomposition (§A.6) of H having all singular values beyond the k-th zeroed. [209, p.208] This is also a solution to the projection in the sense of spectral norm. [38, §8.1]

• Projection on Lorentz cone: [38, exer.8.3(c)]

• P_{S^N_+ ∩ S^N_c} = P_{S^N_+} P_{S^N_c}   (806)

• P_{R^{N×N}_+ ∩ S^N_h} = P_{R^{N×N}_+} P_{S^N_h}   (§7.0.1.1)

• P_{R^{N×N}_+ ∩ S^N} = P_{R^{N×N}_+} P_{S^N}   (§E.9.5)

E.9.4.0.1 Exercise. Largest singular value.
Find the unique minimum-distance projection on the set of all m × n matrices whose largest singular value does not exceed 1.

• Unique minimum-distance projection on an ellipsoid (Figure 9) is not easy but there are several known techniques. [33] [79, §5.1]

• Youla [260, §2.5] lists eleven "useful projections" of square-integrable uni- and bivariate real functions on various convex sets, expressible in closed form.
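Two of these easy projections in a minimal Matlab sketch (not book code): the symmetric hollow subspace, and the PSD cone by eigenvalue clipping:

% Projection on S^n_h: symmetrize, then zero the main diagonal.
H = randn(5);
S = (H + H')/2;
Ph = S - diag(diag(S));
% Projection on S^n_+: eigen decomposition, clip negative eigenvalues to 0.
A = randn(5); A = (A + A')/2;
[Q, L] = eig(A);
Ppsd = Q*max(L,0)*Q';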


Figure 100: Closed convex set C belongs to subspace R^n (shown bounded in sketch and drawn without proper perspective). Point y is unique minimum-distance projection of x on C ; equivalent to product of orthogonal projection of x on R^n and minimum-distance projection of result z on C.


E.9.5  Projection on convex set in subspace

Suppose a convex set C is contained in some subspace R^n. Then unique minimum-distance projection of any point in R^n ⊕ R^{n⊥} on C can be accomplished by first projecting orthogonally on that subspace, and then projecting the result on C ; [59, §5.14] id est, the ordered product of two individual projections.

Proof. (⇐) To show that, suppose unique minimum-distance projection P_C x on C ⊂ R^n is y as illustrated in Figure 100;

   ‖x − y‖ ≤ ‖x − q‖  ∀ q ∈ C   (1601)

Further suppose P_{R^n} x equals z. By the Pythagorean theorem

   ‖x − y‖^2 = ‖x − z‖^2 + ‖z − y‖^2   (1602)

because x − z ⊥ z − y. (1468) [154, §3.3] Then point y = P_C x is the same as P_C z because

   ‖z − y‖^2 = ‖x − y‖^2 − ‖x − z‖^2 ≤ ‖z − q‖^2 = ‖x − q‖^2 − ‖x − z‖^2  ∀ q ∈ C   (1603)

which holds by assumption (1601).
(⇒) Now suppose z = P_{R^n} x and

   ‖z − y‖ ≤ ‖z − q‖  ∀ q ∈ C   (1604)

meaning y = P_C z. Then point y is identical to P_C x because

   ‖x − y‖^2 = ‖x − z‖^2 + ‖z − y‖^2 ≤ ‖x − q‖^2 = ‖x − z‖^2 + ‖z − q‖^2  ∀ q ∈ C   (1605)

by assumption (1604).  ∎

Projecting matrix H ∈ R^{n×n} on convex cone K = S^n ∩ R^{n×n}_+ in isomorphic R^{n²} can be accomplished, for example, by first projecting on S^n and only then projecting the result on R^{n×n}_+ (confer §7.0.1), because that projection product is equivalent to projection on the subset of the nonnegative orthant in the symmetric matrix subspace.
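For the closing example, that projection product reduces to one line; a sketch (not book code):

% Projection of H on K = S^n ∩ R^{n x n}_+ :
% symmetrize first (project on S^n), then clip negative entries to 0.
H  = randn(4);
PH = max((H + H')/2, 0);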


E.10  Alternating projection

Alternating projection is an iterative technique for finding a point in the intersection of a number of arbitrary closed convex sets C_k, or for finding the distance between two nonintersecting closed convex sets. Because it can sometimes be difficult or inefficient to compute the intersection or express it analytically, one naturally asks whether it is possible to instead project (unique minimum-distance) alternately on the individual C_k, often easier. Once a cycle of alternating projections (an iteration) is complete, we then iterate (repeat the cycle) until convergence. If the intersection of two closed convex sets is empty, then by convergence we mean the iterate (the result after a cycle of alternating projections) settles to a point of minimum distance separating the sets.

While alternating projection can find the point in the nonempty intersection closest to a given point b, it does not necessarily find it. Dependably finding that point is solved by an elegantly simple enhancement to the alternating projection technique: the Dykstra algorithm (1637) for projection on the intersection is one of the most beautiful projection algorithms ever discovered. It is accurately interpreted as the discovery of what alternating projection originally sought to accomplish: unique minimum-distance projection on the nonempty intersection of a number of arbitrary closed convex sets C_k. Alternating projection is, in fact, a special case of the Dykstra algorithm whose discussion we defer until §E.10.3.

E.10.0.1  Commutative projectors

Given two arbitrary convex sets C_1 and C_2 and their respective minimum-distance projection operators P_1 and P_2, if projectors commute for each and every x ∈ R^n then it is easy to show P_1 P_2 x ∈ C_1 ∩ C_2 and P_2 P_1 x ∈ C_1 ∩ C_2. When projectors commute (P_1 P_2 = P_2 P_1), a point in the intersection can be found in a finite number of steps; while commutativity is a sufficient condition, it is not necessary (§5.8.1.1.1 for example).

When C_1 and C_2 are subspaces, in particular, projectors P_1 and P_2 commute if and only if P_1 P_2 = P_{C_1 ∩ C_2} or iff P_2 P_1 = P_{C_1 ∩ C_2} or iff P_1 P_2 is the orthogonal projection on a Euclidean subspace. [59, lem.9.2] Subspace projectors will commute, for example, when P_1(C_2) ⊂ C_2 or P_2(C_1) ⊂ C_1 or C_1 ⊂ C_2 or C_2 ⊂ C_1 or C_1 ⊥ C_2. When subspace projectors commute, this means we can find a point in the intersection of those subspaces in a finite number of steps; we find, in fact, the closest point.


Figure 101: First several alternating projections (1615) in von Neumann-style projection of point b converging on closest point Pb in intersection of two closed convex sets in R^2 ; C_1 and C_2 are partially drawn in vicinity of their intersection. The pointed normal cone K^⊥ (1639) is translated to Pb, the unique minimum-distance projection of b on intersection. For this particular example, it is possible to start anywhere in a large neighborhood of b and still converge to Pb. The alternating projections are themselves robust with respect to some significant amount of noise because they belong to the translated normal cone.

E.10.0.2  Noncommutative projectors

Typically, one considers the method of alternating projection when projectors do not commute; id est, when P_1 P_2 ≠ P_2 P_1.

The iconic example for noncommutative projectors illustrated in Figure 101 shows the iterates converging to the closest point in the intersection of two arbitrary convex sets. Yet simple examples like Figure 102 reveal that noncommutative alternating projection does not always yield the closest point, although we shall show it always yields some point in the intersection or a point that attains the distance between two convex sets.

Figure 102: The sets {C_k} in this example comprise two halfspaces H_1 and H_2. The von Neumann-style alternating projection in R^2 quickly converges to P_1 P_2 b (feasibility). The unique minimum-distance projection on the intersection is, of course, Pb.

Alternating projection is also known as successive projection [105] [102] [40], cyclic projection [79] [162, §3.2], successive approximation [45], or projection on convex sets [206] [207, §6.4]. It is traced back to von Neumann (1933) [241] and later Wiener [246] who showed that higher iterates of a


product of two orthogonal projections on subspaces converge at each point in the ambient space to the unique minimum-distance projection on the intersection of the two subspaces. More precisely, if R_1 and R_2 are closed subspaces of a Euclidean space and P_1 and P_2 respectively denote orthogonal projection on R_1 and R_2, then for each vector b in that space,

   lim_{i→∞} (P_1 P_2)^i b = P_{R_1 ∩ R_2} b   (1606)

Deutsch [59, thm.9.8, thm.9.35] shows rate of convergence for subspaces to be geometric [259, §1.4.4]; bounded above by κ^{2i+1}‖b‖, i = 0, 1, 2..., where 0 ≤ κ < 1. Figure 101


illustrates one application where convergence is reasonably geometric and the result is the unique minimum-distance projection. Figure 102, in contrast, demonstrates convergence in one iteration to a fixed point (of the projection product)^{E.17} in the intersection of two halfspaces; a.k.a, feasibility problem. It was Dykstra who in 1983 [66] (§E.10.3) first solved this projection problem.

E.10.0.3  The bullets

Alternating projection has, therefore, various meaning dependent on the application or field of study; it may be interpreted to be: a distance problem, a feasibility problem (von Neumann), or a projection problem (Dykstra):

• Distance. Figure 103(a)(b). Find a unique point of projection P_1 b ∈ C_1 that attains the distance between any two closed convex sets C_1 and C_2 ;

   ‖P_1 b − b‖ = dist(C_1, C_2) ≜ inf_{z∈C_2} ‖P_1 z − z‖   (1608)

• Feasibility. Figure 103(c), ⋂C_k ≠ ∅. Given a number of indexed closed convex sets C_k ⊂ R^n, find any fixed point in their intersection by iterating (i) a projection product starting from b ;

   ( ∏_{i=1}^∞ ∏_k P_k ) b ∈ ⋂_k C_k   (1609)

• Optimization. Figure 103(c), ⋂C_k ≠ ∅. Given a number of indexed closed convex sets C_k ⊂ R^n, uniquely project a given point b on ⋂C_k ;

   ‖Pb − b‖ = inf_{x∈⋂C_k} ‖x − b‖   (1610)

E.17 A fixed point of a mapping T : R^n → R^n is a point x whose image is identical under the map; id est, Tx = x.


Figure 103:
(a) (distance) Intersection of two convex sets in R^2 is empty. Method of alternating projection would be applied to find that point in C_1 nearest C_2.
(b) (distance) Given b ∈ C_2, then P_1 b ∈ C_1 is nearest b iff (y − P_1 b)^T(b − P_1 b) ≤ 0 ∀ y ∈ C_1 by the unique minimum-distance projection theorem (§E.9.1.0.1). When P_1 b attains the distance between the two sets, hyperplane {y | (b − P_1 b)^T(y − P_1 b) = 0} separates C_1 from C_2. [38, §2.5.1]
(c) (0 distance) Intersection is nonempty.
(optimization) We may want the point Pb in ⋂C_k nearest point b.
(feasibility) We may instead be satisfied with a fixed point of the projection product P_1 P_2 b in ⋂C_k.


E.10.1  Distance and existence

Existence of a fixed point is established:

E.10.1.0.1 Theorem. Distance. [45]
Given any two closed convex sets C_1 and C_2 in R^n, then P_1 b ∈ C_1 is a fixed point of the projection product P_1 P_2 if and only if P_1 b is a point of C_1 nearest C_2.  ⋄

Proof. (⇒) Given fixed point a = P_1 P_2 a ∈ C_1 with b ≜ P_2 a ∈ C_2 in tandem so that a = P_1 b, then by the unique minimum-distance projection theorem (§E.9.1.0.1)

   (b − a)^T(u − a) ≤ 0  ∀ u ∈ C_1
   (a − b)^T(v − b) ≤ 0  ∀ v ∈ C_2   (1611)
   ⇔
   ‖a − b‖ ≤ ‖u − v‖  ∀ u ∈ C_1 and ∀ v ∈ C_2

by Schwarz inequality |〈x, y〉| ≤ ‖x‖ ‖y‖. [140] [194]
(⇐) Suppose a ∈ C_1 and ‖a − P_2 a‖ ≤ ‖u − P_2 u‖ ∀ u ∈ C_1. Now suppose we choose u = P_1 P_2 a. Then

   ‖u − P_2 u‖ = ‖P_1 P_2 a − P_2 P_1 P_2 a‖ ≤ ‖a − P_2 a‖  ⇔  a = P_1 P_2 a   (1612)

Thus a = P_1 b (with b = P_2 a ∈ C_2) is a fixed point in C_1 of the projection product P_1 P_2.^{E.18}  ∎

E.10.2  Feasibility and convergence

The set of all fixed points of any nonexpansive mapping is a closed convex set. [86, lem.3.4] [20, §1] The projection product P_1 P_2 is nonexpansive by Theorem E.9.3.0.1 because, for any vectors x, a ∈ R^n

   ‖P_1 P_2 x − P_1 P_2 a‖ ≤ ‖P_2 x − P_2 a‖ ≤ ‖x − a‖   (1613)

If the intersection of two closed convex sets C_1 ∩ C_2 is empty, then the iterates converge to a point of minimum distance, a fixed point of the projection product. Otherwise, convergence is to some fixed point in their intersection

E.18 Point b = P_2 a can be shown, similarly, to be a fixed point of the product P_2 P_1.


(a feasible point) whose existence is guaranteed by virtue of the fact that each and every point in the convex intersection is in one-to-one correspondence with fixed points of the nonexpansive projection product.

Bauschke & Borwein [20, §2] argue that any sequence monotone in the sense of Fejér is convergent.^{E.19}

E.10.2.0.1 Definition. Fejér monotonicity. [169]
Given closed convex set C ≠ ∅, then a sequence x_i ∈ R^n, i = 0, 1, 2..., is monotone in the sense of Fejér with respect to C iff

   ‖x_{i+1} − c‖ ≤ ‖x_i − c‖  for all i ≥ 0 and each and every c ∈ C   (1614)
△

Given x_0 ≜ b, if we express each iteration of alternating projection by

   x_{i+1} = P_1 P_2 x_i ,  i = 0, 1, 2...   (1615)

and define any fixed point a = P_1 P_2 a, then the sequence x_i is Fejér monotone with respect to fixed point a because

   ‖P_1 P_2 x_i − a‖ ≤ ‖x_i − a‖  ∀ i ≥ 0   (1616)

by nonexpansivity. The nonincreasing sequence ‖P_1 P_2 x_i − a‖ is bounded below hence convergent because any bounded monotone sequence in R is convergent; [160, §1.2] [27, §1.1] in the limit, P_1 P_2 x_{i+1} = P_1 P_2 x_i = x_{i+1}. Sequence x_i therefore converges to some fixed point. If the intersection C_1 ∩ C_2 is nonempty, convergence is to some point there by the distance theorem. Otherwise, x_i converges to a point in C_1 of minimum distance to C_2.

E.10.2.0.2 Example. Hyperplane/orthant intersection.
Find a feasible point (1609) belonging to the nonempty intersection of two convex sets: given A ∈ R^{m×n}, β ∈ R(A)

   C_1 ∩ C_2 = R^n_+ ∩ A = {y | y ≽ 0} ∩ {y | Ay = β} ⊂ R^n   (1617)

the nonnegative orthant with affine subset A an intersection of hyperplanes. Projection of an iterate x_i ∈ R^n on A is calculated

   P_2 x_i = x_i − A^T(AA^T)^{−1}(Ax_i − β)   (1515)

E.19 Other authors prove convergence by different means; e.g., [102] [40].


Figure 104: From Example E.10.2.0.2 in R^2, showing von Neumann-style alternating projection to find feasible point belonging to intersection of nonnegative orthant C_1 = R^2_+ with hyperplane C_2 = A = {y | [1 1]y = 1}. Point Pb lies at intersection of hyperplane with ordinate axis. In this particular example, the feasible point found is coincidentally optimal. The rate of convergence depends upon angle θ ; as it becomes more acute, convergence slows. [102, §3]

Figure 105: Geometric convergence of iterates in norm ‖x_i − (∏_{j=1}^∞ ∏_k P_k)b‖ versus iteration i, for Example E.10.2.0.2 in R^1000.


while, thereafter, projection of the result on the orthant is simply

   x_{i+1} = P_1 P_2 x_i = max{0, P_2 x_i}   (1618)

where the maximum is entrywise (§E.9.2.2.3).

One realization of this problem in R^2 is illustrated in Figure 104: For A = [1 1], β = 1, and x_0 = b = [−3 1/2]^T, the iterates converge to the feasible point Pb = [0 1]^T.

To give a more palpable sense of convergence in higher dimension, we do this example again but now we compute an alternating projection for the case A ∈ R^{400×1000}, β ∈ R^{400}, and b ∈ R^{1000}, all of whose entries are independently and randomly set to a uniformly distributed real number in the interval [−1, 1]. Convergence is illustrated in Figure 105.

This application of alternating projection to feasibility is extensible to any finite number of closed convex sets.
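A compact sketch of this alternating projection (not the book's code; random data stand in per the experiment above):

% von Neumann-style alternating projection for Example E.10.2.0.2:
% find y >= 0 with A*y = beta, iterating (1515) then (1618).
m = 400; n = 1000;
A = 2*rand(m,n) - 1;
beta = 2*rand(m,1) - 1;
x = 2*rand(n,1) - 1;               % x0 = b
AAinv = A'/(A*A');                 % factor A'(AA')^{-1} once
for i = 1:500
    x = x - AAinv*(A*x - beta);    % P2: project on affine subset A (1515)
    x = max(x, 0);                 % P1: project on nonnegative orthant (1618)
end
disp(norm(A*x - beta))             % residual; small when intersection nonempty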


E.10.2.1  Relative measure of convergence

Inspired by Fejér monotonicity, the alternating projection algorithm from the example of convergence illustrated by Figure 105 employs a redundant sequence: The first sequence (indexed by j) estimates point (∏_{j=1}^∞ ∏_k P_k)b in the presumably nonempty intersection, then the quantity

   ‖x_i − (∏_{j=1}^∞ ∏_k P_k)b‖   (1619)

in second sequence x_i is observed per iteration i for convergence. A priori knowledge of a feasible point (1609) is both impractical and antithetical. We need another measure:

Nonexpansivity implies

   ‖(∏_l P_l)x_{k,i−1} − (∏_l P_l)x_{ki}‖ = ‖x_{ki} − x_{k,i+1}‖ ≤ ‖x_{k,i−1} − x_{ki}‖   (1620)

where

   x_{ki} ≜ P_k x_{k+1,i} ∈ R^n   (1621)

represents unique minimum-distance projection of x_{k+1,i} on convex set k at iteration i. So a good convergence measure is the total monotonic sequence

   ε_i ≜ Σ_k ‖x_{ki} − x_{k,i+1}‖ ,  i = 0, 1, 2...   (1622)

where lim_{i→∞} ε_i = 0 whether or not the intersection is nonempty.

E.10.2.1.1 Example. Affine subset ∩ positive semidefinite cone.
Consider the problem of finding X ∈ S^n that satisfies

   X ≽ 0 ,  〈A_j , X〉 = b_j ,  j = 1...m   (1623)

given nonzero A_j ∈ S^n and real b_j. Here we take C_1 to be the positive semidefinite cone S^n_+ while C_2 is the affine subset of S^n

   C_2 = A ≜ {X | tr(A_j X) = b_j , j = 1...m}
      = {X | [svec(A_1)^T ; ... ; svec(A_m)^T] svec X = b}
      ≜ {X | A svec X = b}   (1624)

where b = [b_j] ∈ R^m, A ∈ R^{m×n(n+1)/2}, and symmetric vectorization svec is defined by (46). Projection of iterate X_i ∈ S^n on A is: (§E.5.0.0.6)

   P_2 svec X_i = svec X_i − A^†(A svec X_i − b)   (1625)

The Euclidean distance from X_i to A is therefore

   dist(X_i , A) = ‖X_i − P_2 X_i‖_F = ‖A^†(A svec X_i − b)‖_2   (1626)

Projection on the positive semidefinite cone (§7.1.2) is found from the eigen decomposition P_2 X_i = Σ_j λ_j q_j q_j^T ;

   P_1 P_2 X_i = Σ_{j=1}^n max{0, λ_j} q_j q_j^T   (1627)


The distance from P_2 X_i to the positive semidefinite cone is therefore

   dist(P_2 X_i , S^n_+) = ‖P_2 X_i − P_1 P_2 X_i‖_F = √( Σ_{j=1}^n min{0, λ_j}^2 )   (1628)

When the intersection is empty, A ∩ S^n_+ = ∅, the iterates converge to that positive semidefinite matrix closest to A in the Euclidean sense. Otherwise, convergence is to some point in the nonempty intersection.

Barvinok (§2.9.3.0.1) shows that if a point feasible with (1623) exists, then there exists an X ∈ A ∩ S^n_+ such that

   rank X ≤ ⌊ (√(8m + 1) − 1)/2 ⌋   (220)

E.10.2.1.2 Example. Semidefinite matrix completion.
Continuing Example E.10.2.1.1: When m ≤ n(n+1)/2 and the A_j matrices are distinct members of the standard orthonormal basis {E_lq ∈ S^n} (49)

   {A_j ∈ S^n , j = 1...m} ⊆ {E_lq} = { e_l e_l^T ,                    l = q = 1...n
                                        (1/√2)(e_l e_q^T + e_q e_l^T) ,  1 ≤ l < q ≤ n }   (1629)

and when the constants b_j are set to constrained entries of variable X ≜ [X_lq] ∈ S^n

   {b_j , j = 1...m} ⊆ { X_lq ,     l = q = 1...n
                         X_lq √2 ,  1 ≤ l < q ≤ n } = {〈X, E_lq〉}   (1630)

then the equality constraints in (1623) fix individual entries of X ∈ S^n. Thus the feasibility problem becomes a positive semidefinite matrix completion problem. Projection of iterate X_i ∈ S^n on A simplifies to (confer (1625))

   P_2 svec X_i = svec X_i − A^T(A svec X_i − b)   (1631)

From this we can see that orthogonal projection is achieved simply by setting corresponding entries of P_2 X_i to the known entries of X, while the remaining entries of P_2 X_i are set to corresponding entries of the current iterate X_i.


Figure 106: Distance dist(P_2 X_i , S^n_+) (confer (1628)), plotted against iteration i on a logarithmic scale, between PSD cone and iterate (1631) in affine subset A (1624) for Laurent's problem; initially, decreasing geometrically.

Using this technique, we find a positive semidefinite completion for

   [ 4 3 ? 2
     3 4 3 ?
     ? 3 4 3
     2 ? 3 4 ]   (1632)

Initializing the unknown entries to 0, they all converge geometrically to 1.5858 (rounded) after about 42 iterations.

Laurent gives a problem for which no positive semidefinite completion exists: [146]

   [ 1 1 ? 0
     1 1 1 ?
     ? 1 1 1
     0 ? 1 1 ]   (1633)

Initializing unknowns to 0, by alternating projection we find the constrained matrix closest to the positive semidefinite cone,


   [ 1      1      0.5454 0
     1      1      1      0.5454
     0.5454 1      1      1
     0      0.5454 1      1      ]   (1634)

and we find the positive semidefinite matrix closest to the affine subset A (1624):

   [ 1.0521 0.9409 0.5454 0.0292
     0.9409 1.0980 0.9451 0.5454
     0.5454 0.9451 1.0980 0.9409
     0.0292 0.5454 0.9409 1.0521 ]   (1635)

These matrices (1634) and (1635) attain the Euclidean distance dist(A , S^n_+). Convergence is illustrated in Figure 106.

E.10.3  Optimization and projection

Unique projection on the nonempty intersection of arbitrary convex sets to find the closest point therein is a convex optimization problem. The first successful application of alternating projection to this problem is attributed to Dykstra [66] [39] who in 1983 provided an elegant algorithm that prevails today. In 1988, Han [105] rediscovered the algorithm and provided a primal-dual convergence proof. A synopsis of the history of alternating projection^{E.20} can be found in [41] where it becomes apparent that Dykstra's work is seminal.

E.10.3.1  Dykstra's algorithm

Assume we are given some point b ∈ R^n and closed convex sets {C_k ⊂ R^n | k = 1...L}. Let x_{ki} ∈ R^n and y_{ki} ∈ R^n respectively denote a primal and dual vector (whose meaning can be deduced from Figure 107 and Figure 108) associated with set k at iteration i. Initialize

   y_{k0} = 0  ∀ k = 1...L  and  x_{1,0} = b   (1636)

E.20 For a synopsis of alternating projection applied to distance geometry, see [229, §3.1].


Figure 107: H_1 and H_2 are the same halfspaces as in Figure 102. Dykstra's alternating projection algorithm generates the alternations b, x_{21}, x_{11}, x_{22}, x_{12}, ... The path illustrated from b to x_{12} in R^2 terminates at the desired result, Pb. The alternations are not so robust in presence of noise as for the example in Figure 101.

Denoting by P_k t the unique minimum-distance projection of t on C_k, and for convenience x_{L+1,i} ≜ x_{1,i−1}, calculation of the iterates x_{1i} proceeds:^{E.21}

   for i = 1, 2, ... until convergence {
       for k = L ... 1 {
           t = x_{k+1,i} − y_{k,i−1}
           x_{ki} = P_k t
           y_{ki} = P_k t − t
       }
   }   (1637)

E.21 We reverse order of projection (k = L...1) in the algorithm for continuity of exposition.
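A minimal Matlab sketch of (1637) for two sets (not book code; here C_1 is the nonnegative orthant and C_2 the hyperplane from Figure 104, standing in for the halfspaces of Figure 107):

% Dykstra's algorithm (1637) with L = 2: C1 = R^2_+, C2 = {y | c'y = 1}.
c = [1; 1]; b = [-3; 0.5];
P1 = @(t) max(t, 0);                     % projection on C1
P2 = @(t) t - c*((c'*t - 1)/(c'*c));     % projection on C2
y1 = zeros(2,1); y2 = zeros(2,1);        % dual vectors, initialized per (1636)
x = b;
for i = 1:100
    t = x  - y2;  x2 = P2(t);  y2 = x2 - t;   % k = 2
    t = x2 - y1;  x  = P1(t);  y1 = x  - t;   % k = 1
end
disp(x')   % converges to Pb = [0 1], the projection of b on C1 ∩ C2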


Figure 108: Two examples (truncated): normal cone K^⊥_{H_1∩H_2}(0) to K ≜ H_1 ∩ H_2 at the origin, and translated normal cone K^⊥_{H_1∩H_2}(Pb) + Pb at point Pb on the boundary. H_1 and H_2 are the same halfspaces from Figure 107. The normal cone at the origin K^⊥_{H_1∩H_2}(0) is simply −K^∗.

Assuming a nonempty intersection, then the iterates converge to the unique minimum-distance projection of point b on that intersection; [59, §9.24]

   Pb = lim_{i→∞} x_{1i}   (1638)

In the case all the C_k are affine, then calculation of y_{ki} is superfluous and the algorithm becomes identical to alternating projection. [59, §9.26] [79, §1] Dykstra's algorithm is so simple and elegant, and represents such a tiny increment in computational intensity over alternating projection, that it is nearly always arguably cost-effective.

E.10.3.2  Normal cone

Glunt [85, §4] observes that the overall effect of Dykstra's iterative procedure is to drive t toward the translated normal cone to ⋂C_k at the solution Pb (translated to Pb). The normal cone gets its name from its graphical construction; which is, loosely speaking, to draw the outward-normals at Pb (Definition E.9.0.0.2) to all the convex sets C_k touching Pb. The relative interior of the normal cone subtends these normal vectors.


E.10.3.2.1 Definition. Normal cone. [167] [27, p.261] [123, §A.5.2] [36, §2.1] [193, §3] The normal cone to any set S ⊆ R^n at any particular point a ∈ R^n is defined as the closed cone

   K^⊥_S(a) ≜ {z ∈ R^n | z^T(y − a) ≤ 0 ∀ y ∈ S} = −(S − a)^∗   (1639)

an intersection of halfspaces about the origin in R^n hence convex regardless of the convexity of S ; the negative dual cone to the translate S − a.  △

Examples of normal cone construction are illustrated in Figure 108: The normal cone at the origin is the vector sum (§2.1.8) of two normal cones; [36, §3.3, exer.10] for H_1 ∩ int H_2 ≠ ∅

   K^⊥_{H_1 ∩ H_2}(0) = K^⊥_{H_1}(0) + K^⊥_{H_2}(0)   (1640)

This formula applies more generally to other points in the intersection.

The normal cone to any affine set A at α ∈ A, for example, is the orthogonal complement of A − α. Projection of any point in the translated normal cone K^⊥_C(a ∈ C) + a on convex set C is identical to a ; in other words, point a is that point in C closest to any point belonging to the translated normal cone K^⊥_C(a) + a ; e.g., Theorem E.4.0.0.1.

When set S is a convex cone K, then the normal cone to K at the origin

   K^⊥_K(0) = −K^∗   (1641)

is the negative dual cone. Any point belonging to −K^∗, projected on K, projects on the origin. More generally, [59, §4.5]

   K^⊥_K(a) = −(K − a)^∗   (1642)

   K^⊥_K(a ∈ K) = −K^∗ ∩ a^⊥   (1643)

The normal cone to ⋂C_k at Pb in Figure 102 is the ray {ξ(b − Pb) | ξ ≥ 0} illustrated in Figure 108. Applying Dykstra's algorithm to that example, convergence to the desired result is achieved in two iterations as illustrated in Figure 107. Yet applying Dykstra's algorithm to the example in Figure 101 does not improve rate of convergence, unfortunately, because the given point b and all the alternating projections already belong to the translated normal cone at the vertex of intersection.

From these few examples we speculate, unique minimum-distance projection on blunt polyhedral cones having nonempty interior may be found by Dykstra's algorithm in few iterations.




Appendix F

Proof of EDM composition

F.1  EDM-entry exponential

(§4.10)

   D ∈ EDM^n  ⇔  [1 − e^{−λd_ij}] ∈ EDM^n  ∀ λ > 0   (639)

Lemma 2.1. from A Tour d'Horizon ... on Completion Problems. [145]
The following assertions are equivalent: for D = [d_ij , i,j = 1...n] ∈ S^n_h and E^n the elliptope in S^n (§4.9.1.0.1),

(i) D ∈ EDM^n
(ii) e^{−λD} ≜ [e^{−λd_ij}] ∈ E^n for all λ > 0
(iii) 11^T − e^{−λD} ≜ [1 − e^{−λd_ij}] ∈ EDM^n for all λ > 0
⋄


Proof. [200] (confer [140])

Date: Fri, 06 Jun 2003 10:42:47 +0200
From: Monique Laurent
To: Jon Dattorro
Subject: Re: Tour

Hallo Jon,

I looked again at the paper of Schoenberg and what I can see is
the following:

1) the equivalence of Lemma 2.1 (i) (ii) (my paper) is stated in
Schoenberg's Theorem 1 (page 527).

2) (ii) implies (iii) can be seen from the statement in the beginning
of section 3, saying that a distance space embeds in $L_2$ iff
some associated matrix is PSD. Let me reformulate it:

Let $d=(d_{ij})_{i,j=0,1,..,n}$ be a distance space on $n+1$ points
(i.e., symmetric matrix of order $n+1$ with zero diagonal)
and let $p=(p_{ij})_{i,j=1,..,n}$ be the symmetric matrix of order
$n$ related by relations:

(A) $p_{ij} = {d_{0i}+d_{0j}-d_{ij} \over 2}$ for $i,j=1,..,n$

or equivalently

(B) $d_{0i} = p_{ii}, d_{ij} = p_{ii}+p_{jj}-2p_{ij}$
for $i,j=1,..,n$

Then, $d$ embeds in $L_2$ iff $p$ is positive semidefinite matrix
iff $d$ is of negative type
(see second half of page 525 and top of page 526)


For the implication from (ii) to (iii), set:
$p = exp(-\lambda d)$ and define $d'$ from $p$ using (B) above.
Then, $d'$ is a distance space on $n+1$ points that embeds
in $L_2$. Thus its subspace of $n$ points also embeds in $L_2$
and is precisely $1-exp(-\lambda d)$.

Note that (iii) implies (ii) cannot be read immediately from this
argument since (iii) involves the subdistance of $d'$ on $n$ points
(and not the full $d'$ on $n+1$ points).

3) Show (iii) implies (i) by using the series expansion of the function
$1 - exp(-\lambda d)$; the constant term cancels; $\lambda$ factors out;
remains a summation of $d$ plus a multiple of $\lambda$; letting
$\lambda$ go to 0 gives the result.

As far as I can see this is not explicitly written in Schoenberg.
But Schoenberg also uses such an argument of expansion of the
exponential function plus letting $\lambda$ go to 0
(see first proof in page 526).

I hope this helps. If not just ask again.

Best regards, Monique

> Hi Monique
>
> I'm reading your "A Tour d'Horizon..." from the AMS book "Topics in
> Semidefinite and Interior-Point Methods".
>
> On page 56, Lemma 2.1(iii), 1-exp(-\lambda D) is EDM <=> D is EDM.
> You cite Schoenberg 1938; a paper I have acquired. I am missing the
> connection of your Lemma(iii) to that paper; most likely because of my
> lack of understanding of Schoenberg's results. I am wondering if you
> might provide a hint how you arrived at that result in terms of
> Schoenberg's results.
>
> Jon Dattorro




Appendix G

Matlab programs

These programs are available on the author's website:
http://www.stanford.edu/~dattorro/tools.html

G.1  isedm()

%Is real D a Euclidean Distance Matrix. -Jon Dattorro
%
%[Dclosest,X,isisnot,r] = isedm(D,tolerance,verbose,dimension,V)
%
%Returns: closest EDM in Schoenberg sense (default output),
%         a generating list X,
%         string 'is' or 'isnot' EDM,
%         actual affine dimension r of EDM output.
%Input: candidate matrix D,
%       optional absolute numerical tolerance for EDM determination,
%       optional verbosity 'on' or 'off',
%       optional desired affine dim of generating list X output,
%       optional choice of 'Vn' auxiliary matrix (default) or 'V'.

function [Dclosest,X,isisnot,r] = isedm(D,tolerance_in,verbose,dim,V);

isisnot = 'is';
N = length(D);


if nargin < 2 | isempty(tolerance_in)
   tolerance_in = eps;
end
tolerance = max(tolerance_in, eps*N*norm(D));
if nargin < 3 | isempty(verbose)
   verbose = 'on';
end
if nargin < 5 | isempty(V)
   use = 'Vn';
else
   use = 'V';
end

%is empty
if N < 1
   if strcmp(verbose,'on'), disp('Input D is empty.'), end
   X = [ ];
   Dclosest = [ ];
   isisnot = 'isnot';
   r = [ ];
   return
end

%is square
if size(D,1) ~= size(D,2)
   if strcmp(verbose,'on'), disp('An EDM must be square.'), end
   X = [ ];
   Dclosest = [ ];
   isisnot = 'isnot';
   r = [ ];
   return
end

%is real
if ~isreal(D)
   if strcmp(verbose,'on'), disp('Because an EDM is real,'), end
   isisnot = 'isnot';
   D = real(D);
end


%is nonnegative
if sum(sum(chop(D,tolerance) < 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because an EDM is nonnegative,'), end
end

%is symmetric
if sum(sum(abs(chop((D - D')/2,tolerance)) > 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because an EDM is symmetric,'), end
   D = (D + D')/2;  %only required condition
end

%has zero diagonal
if sum(abs(diag(chop(D,tolerance))) > 0)
   isisnot = 'isnot';
   if strcmp(verbose,'on')
      disp('Because an EDM has zero main diagonal,')
   end
end

%is EDM
if strcmp(use,'Vn')
   VDV = -Vn(N)'*D*Vn(N);
else
   VDV = -Vm(N)'*D*Vm(N);
end
[Evecs Evals] = signeig(VDV);
if ~isempty(find(chop(diag(Evals),...
                 max(tolerance_in,eps*N*normest(VDV))) < 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because -VDV < 0,'), end
end
if strcmp(verbose,'on')
   if strcmp(isisnot,'isnot')
      disp('matrix input is not EDM.')
   elseif tolerance_in == eps
      disp('Matrix input is EDM to machine precision.')
   else
      disp('Matrix input is EDM to specified tolerance.')
   end


end

%find generating list
r = max(find(chop(diag(Evals),...
             max(tolerance_in,eps*N*normest(VDV))) > 0));
if isempty(r)
   r = 0;
end
if nargin < 4 | isempty(dim)
   dim = r;
else
   dim = round(dim);
end
t = r;
r = min(r,dim);
if r == 0
   X = zeros(1,N);
else
   if strcmp(use,'Vn')
      X = [zeros(r,1) diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)'];
   else
      X = [diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)']/sqrt(2);
   end
end
if strcmp(isisnot,'isnot') | dim < t
   Dclosest = Dx(X);
else
   Dclosest = D;
end


G.1.1  Subroutines for isedm()

G.1.1.1  chop()

%zeroing entries below specified absolute tolerance threshold
%-Jon Dattorro

function Y = chop(A,tolerance)

R = real(A);
I = imag(A);
if nargin == 1
   tolerance = max(size(A))*norm(A)*eps;
end
idR = find(abs(R) < tolerance);
idI = find(abs(I) < tolerance);
R(idR) = 0;
I(idI) = 0;
Y = R + i*I;

G.1.1.2  Vn()

function y = Vn(N)

y = [-ones(1,N-1);
     eye(N-1)]/sqrt(2);

G.1.1.3  Vm()

%returns EDM V matrix
function V = Vm(n)

V = [eye(n)-ones(n,n)/n];


G.1.1.4  signeig()

%Sorts signed real part of eigenvalues
%and applies sort to values and vectors.
%[Q, lam] = signeig(A)
%-Jon Dattorro

function [Q, lam] = signeig(A);

[q l] = eig(A);
lam = diag(l);
[junk id] = sort(real(lam));
id = id(length(id):-1:1);
lam = diag(lam(id));
Q = q(:,id);
if nargout < 2
   Q = diag(lam);
end

G.1.1.5  Dx()

%Make EDM from point list
function D = Dx(X)

[n,N] = size(X);
one = ones(N,1);
del = diag(X'*X);
D = del*one' + one*del' - 2*X'*X;
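A hypothetical usage example (assuming the routines above are on the Matlab path):

% Build an EDM from a random list of 5 points in R^3, then test it.
X = randn(3,5);
D = Dx(X);
[Dclosest, Xr, isisnot, r] = isedm(D);
disp(isisnot)   % 'is'
disp(r)         % affine dimension of the regenerated list, here 3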


G.2  Conic independence, conici()

The recommended subroutine lp() (§G.2.1) is a linear program solver from Matlab's Optimization Toolbox v2.0 (R11). Later releases of Matlab replace lp() with linprog() that we find quite inferior to lp() on a wide range of problems.

Given an arbitrary set of directions, this c.i. subroutine removes the conically dependent members. Yet a conically independent set returned is not necessarily unique. In that case, if desired, the set returned may be altered by reordering the set input.

% Test for c.i. of arbitrary directions in rows or columns of X.
% -Jon Dattorro
function [Xci, indep_str, how_many_depend] = conici(X,rowORcol,tol);

if nargin < 3
   tol=max(size(X))*eps*norm(X);
end
if nargin < 2 | strcmp(rowORcol,'col')
   rowORcol = 'col';
   Xin = X;
elseif strcmp(rowORcol,'row')
   Xin = X';
else
   disp('Invalid rowORcol input.')
   return
end
[n, N] = size(Xin);

indep_str = 'conically independent';
how_many_depend = 0;
if rank(Xin) == N
   Xci = X;
   return
end


count = 1;
new_N = N;

%remove zero rows or columns
for i=1:N
   if chop(Xin(:,count),tol)==0
      how_many_depend = how_many_depend + 1;
      indep_str = 'conically Dependent';
      Xin(:,count) = [ ];
      new_N = new_N - 1;
   else
      count = count + 1;
   end
end

%remove conic dependencies
count = 1;
newer_N = new_N;
for i=1:new_N
   if newer_N > 1
      A = [Xin(:,1:count-1) Xin(:,count+1:newer_N); -eye(newer_N-1)];
      b = [Xin(:,count); zeros(newer_N-1,1)];
      [a, lambda, how] = lp(zeros(newer_N-1,1),A,b,[ ],[ ],[ ],n,-1);
      if ~strcmp(how,'infeasible')
         how_many_depend = how_many_depend + 1;
         indep_str = 'conically Dependent';
         Xin(:,count) = [ ];
         newer_N = newer_N - 1;
      else
         count = count + 1;
      end
   end
end
if strcmp(rowORcol,'col')
   Xci = Xin;
else
   Xci = Xin';
end


G.2.1  lp()

LP     Linear programming.
       X=LP(f,A,b) solves the linear programming problem:

            min f'x    subject to:   Ax <= b


G.3  Map of the USA

G.3.1  EDM, mapusa()

%Find map of USA using only distance information.
% -Jon Dattorro
%Reconstruction from EDM.
clear all;
close all;

load usalo;  %From Matlab Mapping Toolbox
%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

%To speed-up execution (decimate map data), make
%'factor' bigger positive integer.
factor = 1;
Mg = 2*factor;  %Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);
statelat = decimate(statelat,Ms);
statelon = decimate(statelon,Ms);
uslat = decimate(uslat,Mu);
uslon = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);

%plot original data
plot3(x,y,z), axis equal, axis off


lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat statelon
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

%destroy input data
clear X

Vn = [-ones(1,N-1); speye(N-1)];
VDV = (-Vn'*D*Vn)/2;
clear D Vn
pack

[evec evals flag] = eigs(VDV, speye(size(VDV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

index = find(abs(evals) > eps*normest(VDV)*N);
n = sum(evals(index) > 0);
Xs = [zeros(n,1) diag(sqrt(evals(index)))*evec(:,index)'];

warning off; Xsplot=zeros(3,lengthNaN)*(0/0); warning on;
Xsplot(:,id) = Xs;
figure(2)
%plot map found via EDM.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


G.3.1.1  USA map input-data decimation, decimate()

function xd = decimate(x,m)

roll = 0;
rock = 1;
for i=1:length(x)
   if isnan(x(i))
      roll = 0;
      xd(rock) = x(i);
      rock=rock+1;
   else
      if ~mod(roll,m)
         xd(rock) = x(i);
         rock=rock+1;
      end
      roll=roll+1;
   end
end
xd = xd';

G.3.2  EDM using ordinal data, omapusa()

%Find map of USA using ORDINAL distance information.
% -Jon Dattorro
clear all;
close all;

load usalo;  %From Matlab Mapping Toolbox
%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

factor = 1;
Mg = 2*factor;  %Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);
statelat = decimate(statelat,Ms);
statelon = decimate(statelon,Ms);


uslat = decimate(uslat,Mu);
uslon = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);

%plot original data
plot3(x,y,z), axis equal, axis off

lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat statelon state stateborder greatlakes
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

%ORDINAL MDS - vectorize D
count = 1;
M = (N*(N-1))/2;
f = zeros(M,1);
for j=2:N
   for i=1:j-1
      f(count) = D(i,j);
      count = count + 1;
   end
end
%sorted is f(idx)
[sorted idx] = sort(f);
clear D sorted X


f(idx)=((1:M).^2)/M^2;

%Create ordinal data matrix
O = zeros(N,N);
count = 1;
for j=2:N
   for i=1:j-1
      O(i,j) = f(count);
      O(j,i) = f(count);
      count = count+1;
   end
end
clear f idx

Vn = sparse([-ones(1,N-1); eye(N-1)]);
VOV = (-Vn'*O*Vn)/2;
clear O Vn
pack

[evec evals flag] = eigs(VOV, speye(size(VOV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

Xs = [zeros(3,1) diag(sqrt(evals(1:3)))*evec(:,1:3)'];

warning off; Xsplot=zeros(3,lengthNaN)*(0/0); warning on;
Xsplot(:,id) = Xs;
figure(2)
%plot map found via Ordinal MDS.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


G.4  Rank reduction subroutine, RRf()

%Rank Reduction function -Jon Dattorro
%Inputs are:
%   Xstar matrix,
%   affine equality constraint matrix A whose rows are in svec format.
%
% Tolerance scheme needs revision...
function X = RRf(Xstar,A);

rand('seed',23);
m = size(A,1);
n = size(Xstar,1);
if size(Xstar,1)~=size(Xstar,2)
   disp('Rank Reduction subroutine: Xstar not square')
   pause
end
toler = norm(eig(Xstar))*size(Xstar,1)*1e-9;
if sum(chop(eig(Xstar),toler) < 0)


%try to find sparsity pattern for Z_i
tolerance = norm(X,'fro')*size(X,1)*1e-9;
Ztem = zeros(rho,rho);
pattern = find(chop(cumu,tolerance)==0);
if isempty(pattern)  %if no sparsity, do random projection
   ranp = svec(2*(rand(rho,rho)-0.5));
   Z(1:rho,1:rho,i)...
      =svecinv((eye(rho*(rho+1)/2)-pinv(svecRAR)*svecRAR)*ranp);
else
   disp('sparsity pattern found')
   Ztem(pattern)=1;
   Z(1:rho,1:rho,i) = Ztem;
end
phiZ = 1;
toler = norm(eig(Z(1:rho,1:rho,i)))*rho*1e-9;
if sum(chop(eig(Z(1:rho,1:rho,i)),toler) < 0)


G.4.1  svec()

%Map from symmetric matrix to vector
% -Jon Dattorro
function y = svec(Y,N)

if nargin == 1
   N=size(Y,1);
end
y = zeros(N*(N+1)/2,1);
count = 1;
for j=1:N
   for i=1:j
      if i~=j
         y(count) = sqrt(2)*Y(i,j);
      else
         y(count) = Y(i,j);
      end
      count = count + 1;
   end
end


G.4.2  svecinv()

%convert vector into symmetric matrix. m is dim of matrix.
% -Jon Dattorro
function A = svecinv(y)

m = round((sqrt(8*length(y)+1)-1)/2);
if length(y) ~= m*(m+1)/2
   disp('dimension error in svecinv()');
   pause
end
A = zeros(m,m);
count = 1;
for j=1:m
   for i=1:m
      if i <= j
         if i ~= j
            A(i,j) = y(count)/sqrt(2);
            A(j,i) = A(i,j);
         else
            A(i,j) = y(count);
         end
         count = count + 1;
      end
   end
end
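A quick consistency check (not from the book) that svecinv() inverts svec():

% Round trip through svec and svecinv should reproduce the matrix.
A = randn(4); A = (A + A')/2;
disp(norm(svecinv(svec(A)) - A, 'fro'))   % ~0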


Appendix H

Notation and a few definitions

b              vector, scalar, logical condition (italic abcdefghijklmnopqrstuvwxyz)
b_i            i-th element of vector b, or i-th b vector from a set or list {b_j}
b_{i:j}        truncated vector comprising i-th through j-th entry of vector b
b^T            transpose
b^H            Hermitian (conjugate) transpose
A^{T_1}        first of various transpositions of a cubix or quartix A
A              matrix, vector, scalar, or logical condition (italic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
skinny         a skinny matrix, meaning more rows than columns. When there are more equations than unknowns, we say that the system Ax = b is overdetermined. [88, §5.3]
fat            a fat matrix, meaning more columns than rows; underdetermined
A              some set (calligraphic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
F(C ∋ A)       smallest face (132) that contains element A of set C


G(K)           generators (§2.8.1.2) of set K ; any collection of points and directions whose hull constructs K
A^{−1}         inverse of matrix A
A^†            Moore-Penrose pseudoinverse of matrix A
√              positive square root
A^{1/2}, √A    A^{1/2} is any matrix such that A^{1/2}A^{1/2} = A. For A ∈ S^n_+, √A ∈ S^n_+ is unique and √A √A = A. [36, §1.2]
◦√D            ≜ [√d_ij] (973). Hadamard positive square root: D = ◦√D ◦ ◦√D
A_ij           ij-th element of matrix A
E_ij           member of standard orthonormal basis for symmetric (49) or symmetric hollow (63) matrices
E              elementary matrix
λ_i(X)         i-th entry of vector λ is function of X
λ(X)_i         i-th entry of vector-valued function of X
A(:,i)         i-th column of matrix A [88, §1.1.8]
A(j,:)         j-th row of matrix A
A_{i:j,k:l}    or A(i:j, k:l), submatrix taken from i-th through j-th row and k-th through l-th column
e.g.           exempli gratia, from the Latin meaning for sake of example
no.            number, from the Latin numero
a.i.           affinely independent
c.i.           conically independent
l.i.           linearly independent
w.r.t          with respect to


a.k.a          also known as
Re             real part
Im             imaginary part
⊂ ⊃ ∩ ∪        standard set theory: subset, superset, intersection, union
∈              membership, element belongs to, or element is a member of
∋              membership, contains as in C ∋ y (C contains element y); also, such that
∃              there exists
∴              therefore
∀              for all
∝              proportional to
∞              infinity
≡              equivalent to
≜              defined equal to
≈              approximately equal to
≃              isomorphic to or with
≅              congruent to or with
◦              Hadamard product of matrices
x/y            Hadamard quotient as in, for x, y ∈ R^n, x/y ≜ [x_i /y_i , i = 1...n] ∈ R^n
⊗              Kronecker product of matrices
×              Cartesian product


⊕              vector sum of sets X = Y ⊕ Z where every element x ∈ X has unique expression x = y + z where y ∈ Y and z ∈ Z ; [194, p.19] then the summands are algebraic complements. X = Y ⊕ Z ⇒ X = Y + Z. Now assume Y and Z are nontrivial subspaces. X = Y + Z ⇒ X = Y ⊕ Z ⇔ Y ∩ Z = 0 [195, §1.2] [59, §5.8]. Each element from a vector sum (+) of subspaces has a unique representation (⊕) when a basis from each subspace is linearly independent of bases from all the other subspaces.
⊖              vector difference of sets
⊞              orthogonal vector sum of sets X = Y ⊞ Z where every element x ∈ X has unique orthogonal expression x = y + z where y ∈ Y, z ∈ Z, and y ⊥ z. [210, p.51] X = Y ⊞ Z ⇒ X = Y + Z. If Z ⊆ Y^⊥ then X = Y ⊞ Z ⇔ X = Y ⊕ Z. [59, §5.8] If Z = Y^⊥ then the summands are orthogonal complements.
±              plus or minus
⊥              as in A ⊥ B meaning A is orthogonal to B (and vice versa), where A and B are sets, vectors, or matrices. When A and B are vectors (or matrices under Frobenius norm), A ⊥ B ⇔ 〈A,B〉 = 0 ⇔ ‖A + B‖² = ‖A‖² + ‖B‖²
\              as in \A means logical not A, or relative complement of set A ; e.g., B\A = {x ∈ B | x ∉ A}
⇒ or ⇐        sufficiency or necessity, implies; e.g., A ⇒ B ⇔ \A ⇐ \B
⇔              if and only if (iff) or corresponds to or necessary and sufficient or the same as
is             as in A is B means A ⇒ B ; conventional usage of English language by mathematicians
⇏              does not imply
→              goes to, or maps to
↓              goes to from above; e.g., above might mean positive in some context [123, p.2]


←              is replaced with; substitution
|              as in f(x) | x ∈ C , means with the condition(s), or such that, or evaluated for; or as in {f(x) | x ∈ C}, means evaluated for each and every x belonging to set C
g|_{x_p}       expression g evaluated at x_p
:              as in f : R^n → R^m, meaning f is a mapping; or a sequence of successive integers specified by bounds, as in i:j (if j < i then the sequence is descending)
f : M → R      meaning f is a mapping from ambient space M to ambient R , not necessarily denoting either domain or range
[A, B]         closed interval or line segment between A and B in R
(A, B)         open interval between A and B in R , or variable pair perhaps of disparate dimension
A, B           as in, for example, A, B ∈ S^N means A ∈ S^N and B ∈ S^N
( )            hierarchal, parenthetical, optional
{ }            curly braces denote a set or list; e.g., {Xa | a ≽ 0}, the set of all Xa for each and every a ≽ 0 where membership of a to some space is implicit; a union
⟨ ⟩            angle brackets denote vector inner-product
[ ]            matrix or vector, or quote insertion
[d_ij]         matrix whose ij-th entry is d_ij
[x_i]          vector whose i-th entry is x_i
x_p            particular value of x
x_0            particular instance of x , or initial value of a sequence x_i
x_1            first entry of vector x , or first element of a set or list {x_i}
x⋆             optimal value of variable x


x*             complex conjugate
f*             convex conjugate function
f̂              real function (one-dimensional)
K              cone
K*             dual cone
K°             polar cone; K* = −K°
P_C x or Px    projection of point x on set C ; P is operator or idempotent matrix
P_k x          projection of point x on set C_k
δ(A)           (§A.1) vector made from the main diagonal of A if A is a matrix; otherwise, diagonal matrix made from vector A
δ²(A)          ≡ δ(δ(A)). For vector or diagonal matrix Λ , δ²(Λ) = Λ
δ(A)²          = δ(A) δ(A), where A is a vector
λ(A)           vector of eigenvalues of matrix A , typically arranged in nonincreasing order
σ(A)           vector of singular values of matrix A (always arranged in nonincreasing order), or support function
π(γ)           permutation function (or presorting function): arranges vector γ into nonincreasing order
ψ(a)           signum-like step function (902) (1168)
V              N×N symmetric elementary, auxiliary, projector, geometric centering matrix: R(V) = N(1^T), N(V) = R(1), V² = V
V_N            N×(N−1) Schoenberg auxiliary matrix: R(V_N) = N(1^T), N(V_N^T) = R(1)
V_X            V_X V_X^T ≡ V^T X^T X V (738)
X              point list (having cardinality N) arranged columnar in R^{n×N}, or set of generators, or extreme directions, or matrix variable
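The listed properties of the geometric centering matrix V admit a one-line numerical check. This sketch (Python/NumPy, an illustration added here; it assumes the conventional explicit form V = I − (1/N)·11^T, consistent with R(V) = N(1^T), N(V) = R(1), and V² = V) also demonstrates the two readings of the δ operator.

```python
# Illustration: centering matrix V (assumed form I - (1/N) 1 1^T) and delta operator.
import numpy as np

N = 5
V = np.eye(N) - np.ones((N, N)) / N
print(np.allclose(V @ V, V))                 # V is a projector: V^2 = V
print(np.allclose(V @ np.ones(N), 0))        # N(V) contains R(1)

A = np.arange(9.).reshape(3, 3)
d = np.diag(A)                               # delta(A): main diagonal as a vector
Lam = np.diag(d)                             # delta of a vector: diagonal matrix
print(np.allclose(np.diag(np.diag(Lam)), Lam))   # delta^2(Lambda) = Lambda
```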


G              Gram matrix X^T X
D              symmetric hollow matrix of distance-square, or Euclidean distance matrix, or EDM candidate
D(X)           Euclidean distance matrix operator
D^T(X)         adjoint operator
D(X)^T         transpose of D(X)
D^{-1}(X)      inverse operator
D(X)^{-1}      inverse of D(X)
D⋆             optimal value of variable D
D*             dual to variable D
D°             polar variable D
∂              partial derivative, or matrix of distance-square squared, or, as in ∂K , boundary of set K
∂y             partial differential of y
V              geometric centering operator: V(D) = −½ V D V
V_N            V_N(D) = −V_N^T D V_N
r              affine dimension
n              dimension of list X , or integer
N              cardinality of list X , or integer
dom            function domain
on             function f(x) on A means A is dom f ; or projection of x on A means A is the Euclidean body on which projection of x is made
onto           function f(x) maps onto M means f over its domain is a surjection with respect to M
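Under the conventional reading of the EDM operator, d_ij = ‖x_i − x_j‖² (an assumption stated here for the sketch, together with the identity D = δ(G)1^T + 1δ(G)^T − 2G), the symbols X, G, D, and V(D) fit together numerically as follows (Python/NumPy, illustration only):

```python
# Illustration: point list X -> Gram matrix G -> EDM D, and V(D) recovering
# the Gram matrix of the geometrically centered list.
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 6
X = rng.standard_normal((n, N))           # point list, columnar in R^{n x N}

G = X.T @ X                               # Gram matrix
g = np.diag(G)[:, None]                   # delta(G) as a column
D = g + g.T - 2 * G                       # assumed EDM identity

D_check = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)]
                    for i in range(N)])   # brute-force pairwise distance-square
print(np.allclose(D, D_check))            # True

V = np.eye(N) - np.ones((N, N)) / N
print(np.allclose(-0.5 * V @ D @ V, V @ G @ V))   # V(D) = centered Gram matrix
```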


epi            function epigraph
span           as in span A = R(A) = {Ax | x ∈ R^n}
R(A)           the subspace: range of A , or span of basis R(A) ; R(A) ⊥ N(A^T)
basis R(A)     columnar basis for the range of A , or a minimal set constituting generators for the vertex-description of R(A) , or a linearly independent set of vectors spanning R(A)
N(A)           the subspace: nullspace of A ; N(A) ⊥ R(A^T)
ı or j         √−1
[R^m ; R^n]    R^m × R^n = R^{m+n}
R^{m×n}        Euclidean vector space of m by n dimensional real matrices
R^n            Euclidean n-dimensional real vector space
C^n or C^{n×n}   Euclidean complex vector space of respective dimension n and n×n
R^n_+ or R^{n×n}_+   nonnegative orthant in Euclidean vector space of respective dimension n and n×n
R^n_− or R^{n×n}_−   nonpositive orthant in Euclidean vector space of respective dimension n and n×n
S^n            subspace comprising all (real) symmetric n×n matrices; the symmetric matrix subspace
S^{n⊥}         orthogonal complement of S^n in R^{n×n}; the antisymmetric matrices
S^n_+          convex cone comprising all (real) symmetric positive semidefinite n×n matrices; the positive semidefinite cone
int S^n_+      interior of the convex cone comprising all (real) symmetric positive semidefinite n×n matrices; id est, the positive definite matrices
S^n_+(ρ)       convex set of all positive semidefinite n×n matrices whose rank equals or exceeds ρ
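The orthogonality R(A) ⊥ N(A^T) stated above is readily checked numerically: the SVD supplies orthonormal bases for the range and for the left nullspace. A sketch (Python/NumPy, standard linear algebra, illustration only):

```python
# Illustration: SVD yields basis R(A) and basis N(A^T); they are orthogonal
# and their dimensions sum to the number of rows.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # rank <= 3

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))               # numerical rank
range_basis = U[:, :r]                   # basis R(A)
lnull_basis = U[:, r:]                   # basis N(A^T)

print(np.allclose(range_basis.T @ lnull_basis, 0))   # R(A) ⊥ N(A^T)
print(r + lnull_basis.shape[1] == A.shape[0])        # dim R(A) + dim N(A^T) = m
```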


EDM^N          cone of N×N Euclidean distance matrices in the symmetric hollow subspace
PSD            positive semidefinite
SDP            semidefinite program
EDM            Euclidean distance matrix
S^n_1          subspace comprising all symmetric n×n matrices having all zeros in the first row and column (1565)
S^n_h          subspace comprising all symmetric hollow n×n matrices (0 main diagonal); the symmetric hollow subspace (55)
S^{n⊥}_h       orthogonal complement of S^n_h in S^n (56); the set of all diagonal matrices
S^n_c          subspace comprising all geometrically centered symmetric n×n matrices; geometric center subspace S^N_c ≜ {Y ∈ S^N | Y1 = 0} (1561)
S^{n⊥}_c       orthogonal complement of S^n_c in S^n (1563)
R^{m×n}_c      subspace comprising all geometrically centered m×n matrices
X^⊥            basis N(X^T)
x^⊥            N(x^T), {y | x^T y = 0}
R(P)^⊥         N(P^T)
R^⊥            orthogonal complement of R ⊆ R^n ; R^⊥ ≜ {y ∈ R^n | ⟨x, y⟩ = 0 ∀x ∈ R}
K^⊥            normal cone
K_{M+}         monotone nonnegative cone
K_M            monotone cone
K*_{λδ}        cone of majorization
H              halfspace
∂H             hyperplane; id est, partial boundary of halfspace
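Membership in the two subspaces S^N_h and S^N_c named above is easy to exhibit numerically. The sketch below (Python/NumPy, illustration only) assumes the standard orthogonal projector Y ↦ VYV onto the geometric center subspace, with V the centering matrix; projection onto the symmetric hollow subspace simply zeros the main diagonal.

```python
# Illustration: projecting a symmetric matrix onto S^N_c (assumed projector VYV)
# and onto S^N_h (zero main diagonal).
import numpy as np

N = 4
rng = np.random.default_rng(4)
Y = rng.standard_normal((N, N))
Y = (Y + Y.T) / 2                         # Y in S^N
V = np.eye(N) - np.ones((N, N)) / N

Yc = V @ Y @ V                            # projection on geometric center subspace
print(np.allclose(Yc @ np.ones(N), 0))    # Yc 1 = 0

Yh = Y - np.diag(np.diag(Y))              # projection on symmetric hollow subspace
print(np.allclose(np.diag(Yh), 0))        # zero main diagonal
```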


∂H             supporting hyperplane
∂H_−           a supporting hyperplane having outward-normal with respect to the set it supports
∂H_+           a supporting hyperplane having inward-normal with respect to the set it supports
d              vector of distance-square
d̲_ij           lower bound on distance-square d_ij
d̄_ij           upper bound on distance-square d_ij
AB             closed line segment AB
C̄              closure of set C
decomposition  orthonormal (1473) page 532, biorthogonal (1450) page 526
expansion      orthogonal (1483) page 535, biorthogonal (321) page 171
cubix          member of R^{M×N×L}
quartix        member of R^{M×N×L×K}
feasible set   most simply, the set of all variable values satisfying all constraints of an optimization problem
solution set   most simply, the set of all optimal solutions to an optimization problem; a subset of the feasible set and not necessarily a single point
natural order  with reference to stacking columns in a vectorization, means a vector made from superposing column 1 on top of column 2, then superposing the result on column 3, and so on; as in a vector made from entries of the main diagonal, δ(A) means taken from left to right and top to bottom
g′             first derivative of possibly multidimensional function with respect to real argument
g″             second derivative with respect to real argument


→Y dg          first directional derivative of possibly multidimensional function g in direction Y ∈ R^{K×L} (maintains dimensions of g)
→Y dg²         second directional derivative of g in direction Y
∇              gradient from calculus; ∇_X is gradient with respect to X , ∇² is second-order gradient
∆              distance scalar, or matrix of absolute distance, or difference operator, or diagonal matrix
△_ijk          triangle made by vertices i , j , and k
√d_ij          (absolute) distance scalar
d_ij           distance-square scalar, EDM entry
I              identity matrix
∅              the empty set, an implicit member of every set
0              real zero
0              vector or matrix of zeros
1              real one
1              vector of ones
e_i            vector whose i-th entry is 1 (otherwise 0), or i-th member of the standard basis for R^n
tight          with reference to a bound, means a bound that can be met; with reference to an inequality, means equality is achievable
max            maximum [123, §0.1.1]
maximize_x     find the maximum with respect to variable x
arg            argument of operator or function, or variable of optimization
sup            supremum, least upper bound [123, §0.1.1]
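A directional derivative is conveniently checked against a finite difference. The sketch below (Python/NumPy, illustration only; the test function g(X) = tr(X^T X), whose gradient is 2X, is an assumption chosen for simplicity) verifies that the first directional derivative in direction Y equals ⟨∇g(X), Y⟩.

```python
# Illustration: first directional derivative of g(X) = tr(X^T X) in direction Y,
# checked against a finite difference.
import numpy as np

def g(X):
    return np.trace(X.T @ X)

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 2))
Y = rng.standard_normal((3, 2))

grad = 2 * X                              # gradient of g with respect to X
dgY = np.sum(grad * Y)                    # directional derivative <grad, Y>
t = 1e-6
fd = (g(X + t * Y) - g(X)) / t            # finite-difference approximation
print(np.isclose(dgY, fd, rtol=1e-4, atol=1e-6))
```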


arg sup f(x)   argument x at the supremum of function f ; not necessarily a member of the function domain
subject to     specifies constraints to an optimization problem
min            minimum [123, §0.1.1]
minimize_x     find the minimum with respect to variable x
inf            infimum, greatest lower bound [123, §0.1.1]
arg inf f(x)   argument x at the infimum of function f ; not necessarily a member of the function domain
iff            if and only if; necessary and sufficient; also the meaning indiscriminately attached to appearance of the word "if" in the statement of a mathematical definition, an esoteric practice worthy of abolition because of the ambiguity thus conferred
rel int        relative interior
sgn            signum function
round          round to nearest integer
mod            modulus function
tr             matrix trace
rank           as in rank A , rank of matrix A ; dim R(A)
dim            dimension: dim R^n = n , dim(x ∈ R^n) = n , dim R(x ∈ R^n) = 1 , dim R(A ∈ R^{m×n}) = rank(A)
aff            affine hull
conv           convex hull
cenv           convex envelope
cone           conic hull


content        content of a high-dimensional bounded polyhedron: volume in 3 dimensions, area in 2, and so on
cof            matrix of cofactors corresponding to matrix argument
dist           distance between point or set arguments
vec            vectorization of an m×n matrix, Euclidean dimension mn
svec           vectorization of a symmetric n×n matrix, Euclidean dimension n(n+1)/2
dvec           vectorization of a symmetric hollow n×n matrix, Euclidean dimension n(n−1)/2
∠(x, y)        angle between vectors x and y , or dihedral angle between affine subsets
≽              generalized inequality; e.g., A ≽ 0 means vector or matrix A must be expressible in a biorthogonal expansion having nonnegative coordinates with respect to extreme directions of some implicit pointed closed convex cone K , or comparison to the origin with respect to some implicit pointed closed convex cone; or, when K = S^n_+, matrix A belongs to the positive semidefinite cone of symmetric matrices
≻              strict generalized inequality
⊁              not positive definite
≥              scalar inequality, greater than or equal to; comparison of scalars, or entrywise comparison of vectors or matrices with respect to R_+
nonnegative    for α ∈ R , α ≥ 0
>              greater than
positive       for α ∈ R , α > 0
⌊ ⌋            floor function: ⌊x⌋ is the greatest integer not exceeding x
| |            entrywise absolute value of scalars, vectors, and matrices
det            matrix determinant
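The dimension counts for vec, svec, and dvec above (mn, n(n+1)/2, and n(n−1)/2) are quickly confirmed in code. In the sketch below (Python/NumPy, illustration only), the √2 scaling on off-diagonal entries of svec is an assumption, a common convention that makes svec preserve the inner product ⟨A, B⟩ = tr(AB); dvec here simply lists strict-upper-triangle entries.

```python
# Illustration: svec (assumed sqrt(2) off-diagonal scaling) and dvec.
import numpy as np

def svec(A):
    n = A.shape[0]
    iu = np.triu_indices(n)
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return scale * A[iu]                  # n(n+1)/2 entries

def dvec(D):
    iu = np.triu_indices(D.shape[0], k=1) # strict upper triangle of hollow matrix
    return D[iu]                          # n(n-1)/2 entries

A = np.array([[1., 2.], [2., 3.]])
B = np.array([[0., 5.], [5., 1.]])
print(svec(A).size == 2 * 3 // 2)                        # dimension n(n+1)/2
print(np.isclose(svec(A) @ svec(B), np.trace(A @ B)))    # inner product preserved
```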


‖x‖            vector 2-norm, or Euclidean norm ‖x‖_2
‖x‖_ℓ          = ( Σ_{j=1}^n |x_j|^ℓ )^{1/ℓ} ; vector ℓ-norm
‖x‖_∞          = max{ |x_j| ∀j } ; infinity-norm
‖x‖_2²         = x^T x = ⟨x, x⟩
‖x‖_1          = 1^T |x| ; 1-norm, dual infinity-norm
‖x‖_0          0-norm, cardinality of vector x (0⁰ ≜ 0)
‖X‖_2          = sup_{‖a‖=1} ‖Xa‖_2 = σ_1 = √(λ(X^T X)_1) ; matrix 2-norm (spectral norm), largest singular value
‖X‖ = ‖X‖_F    Frobenius matrix norm
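A quick numerical check of the norm identities above (Python/NumPy, illustration only): the matrix 2-norm equals the largest singular value σ_1, which in turn equals √(λ_1(X^T X)), and generally differs from the Frobenius norm.

```python
# Illustration: spectral norm = sigma_1 = sqrt(lambda_1(X^T X)); vector norms.
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 3))

sigma = np.linalg.svd(X, compute_uv=False)      # singular values, nonincreasing
lam = np.linalg.eigvalsh(X.T @ X)[::-1]         # eigenvalues of X^T X, nonincreasing

print(np.isclose(np.linalg.norm(X, 2), sigma[0]))        # ||X||_2 = sigma_1
print(np.isclose(sigma[0], np.sqrt(lam[0])))             # sigma_1 = sqrt(lambda_1)
print(np.linalg.norm(X, 'fro') >= np.linalg.norm(X, 2))  # ||X||_F >= ||X||_2

x = np.array([3., -4., 0.])
print(np.linalg.norm(x, 1) == 7.0, np.linalg.norm(x, np.inf) == 4.0)
```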


Bibliography

[1] Suliman Al-Homidan and Henry Wolkowicz. Approximate and exact completion problems for Euclidean distance matrices using semidefinite programming, April 2004. orion.math.uwaterloo.ca/∼hwolkowi/henry/reports/edmapr04.ps .
[2] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions. Linear Algebra and its Applications, 370:1–14, 2003.
[3] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions: the case of points in general position. Linear Algebra and its Applications, 397:265–277, 2005.
[4] Abdo Y. Alfakih, Amir Khandani, and Henry Wolkowicz. Solving Euclidean distance matrix completion problems via semidefinite programming. Computational Optimization and Applications, 12(1):13–30, January 1999. http://citeseer.ist.psu.edu/alfakih97solving.html .
[5] Abdo Y. Alfakih and Henry Wolkowicz. On the embeddability of weighted graphs in Euclidean spaces. Research Report CORR 98-12, Department of Combinatorics and Optimization, University of Waterloo, May 1998. Erratum: p.254 herein. http://citeseer.ist.psu.edu/alfakih98embeddability.html .
[6] Abdo Y. Alfakih and Henry Wolkowicz. Matrix completion problems. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 18. Kluwer, 2000.


[7] Abdo Y. Alfakih and Henry Wolkowicz. Two theorems on Euclidean distance matrices and Gale transform. Linear Algebra and its Applications, 340:149–154, 2002.
[8] Farid Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, University of Minnesota, Minneapolis, Computer Science Department, October 1991.
[9] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, February 1995.
[10] Kurt Anstreicher and Henry Wolkowicz. On Lagrangian relaxation of quadratic matrix constraints. SIAM Journal on Matrix Analysis and Applications, 22(1):41–55, 2000.
[11] Howard Anton. Elementary Linear Algebra. Wiley, second edition, 1977.
[12] D. Avis and K. Fukuda. A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete and Computational Geometry, 8:295–313, 1992.
[13] Mihály Bakonyi and Charles R. Johnson. The Euclidean distance matrix completion problem. SIAM Journal on Matrix Analysis and Applications, 16(2):646–654, April 1995.
[14] Keith Ball. An elementary introduction to modern convex geometry. In Silvio Levy, editor, Flavors of Geometry, volume 31, chapter 1, pages 1–58. MSRI Publications, 1997. http://www.msri.org/publications/books/Book31/files/ball.pdf .
[15] George Phillip Barker. Theory of cones. Linear Algebra and its Applications, 39:263–291, 1981.
[16] George Phillip Barker and David Carlson. Cones of diagonally dominant matrices. Pacific Journal of Mathematics, 57(1):15–32, 1975.
[17] Alexander Barvinok. A Course in Convexity. American Mathematical Society, 2002.


[18] Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, 1995.
[19] Alexander I. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete & Computational Geometry, 25(1):23–31, 2001. http://citeseer.ist.psu.edu/304448.html .
[20] Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.
[21] Steven R. Bell. The Cauchy Transform, Potential Theory, and Conformal Mapping. CRC Press, 1992.
[22] Jean Bellissard and Bruno Iochum. Homogeneous and facially homogeneous self-dual cones. Linear Algebra and its Applications, 19:1–16, 1978.
[23] Adi Ben-Israel. Linear equations and inequalities on finite dimensional, real or complex, vector spaces: A unified theory. Journal of Mathematical Analysis and Applications, 27:367–389, 1969.
[24] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.
[25] Abraham Berman. Cones, Matrices, and Mathematical Programming, volume 79 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, 1973.
[26] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.
[27] Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.
[28] Rajendra Bhatia. Matrix Analysis. Springer-Verlag, 1997.


[29] Pratik Biswas, Tzu-Chen Liang, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based algorithms for sensor network localization. http://www.stanford.edu/∼yyye/combined rev3.pdf , 2005.
[30] Pratik Biswas and Yinyu Ye. Semidefinite programming for ad hoc wireless sensor network localization. http://www.stanford.edu/∼yyye/adhocn2.pdf , 2003.
[31] Leonard M. Blumenthal. On the four-point property. Bulletin of the American Mathematical Society, 39:423–426, 1933.
[32] Leonard M. Blumenthal. Theory and Applications of Distance Geometry. Oxford University Press, 1953.
[33] A. W. Bojanczyk and A. Lutoborski. The Procrustes problem for orthogonal Stiefel matrices. SIAM Journal on Scientific Computing, 21(4):1291–1304, February 1998. Date is electronic publication: http://citeseer.ist.psu.edu/500185.html .
[34] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling. Springer-Verlag, 1997.
[35] Jonathan M. Borwein and Heinz Bauschke. Projection algorithms and monotone operators. http://oldweb.cecm.sfu.ca/personal/jborwein/projections4.pdf , 1998.
[36] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, 2000.
[37] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.
[38] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. http://www.stanford.edu/∼boyd/cvxbook.html .
[39] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In R. Dykstra, T. Robertson, and F. T. Wright, editors, Advances in Order Restricted Statistical Inference, pages 28–47. Springer-Verlag, 1986.


[40] Lev M. Brègman. The method of successive projection for finding a common point of convex sets. Soviet Mathematics, 162(3):487–490, 1965. AMS translation of Doklady Akademii Nauk SSSR, 6:688-692.
[41] Lev M. Brègman, Yair Censor, Simeon Reich, and Yael Zepkowitz-Malachi. Finding the projection of a point onto the intersection of convex sets via projections onto halfspaces. http://www.optimization-online.org/DB FILE/2003/06/669.pdf , 2003.
[42] Mike Brookes. Matrix reference manual: Matrix calculus. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html , 2002.
[43] Yves Chabrillac and Jean-Pierre Crouzeix. Definiteness and semidefiniteness of quadratic forms revisited. Linear Algebra and its Applications, 63:283–292, 1984.
[44] Chi-Tsong Chen. Linear System Theory and Design. Oxford University Press, 1999.
[45] Ward Cheney and Allen A. Goldstein. Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10:448–450, 1959.
[46] Steven Chu. Autobiography from Les Prix Nobel, 1997. http://nobelprize.org/physics/laureates/1997/chu-autobio.html .
[47] John B. Conway. A Course in Functional Analysis. Springer-Verlag, second edition, 1990.
[48] Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone. The Linear Complementarity Problem. Academic Press, 1992.
[49] G. M. Crippen and T. F. Havel. Distance Geometry and Molecular Conformation. Wiley, 1988.
[50] Frank Critchley. Multidimensional scaling: a critical examination and some new proposals. PhD thesis, University of Oxford, Nuffield College, 1980.


[51] Frank Critchley. On certain linear mappings between inner-product and squared-distance matrices. Linear Algebra and its Applications, 105:91–107, 1988.
[52] Joachim Dahl, Bernard H. Fleury, and Lieven Vandenberghe. Approximate maximum-likelihood estimation using semidefinite programming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, pages 721–724, April 2003. www.ee.ucla.edu/faculty/papers/vandenberghe ICASSP03 apr03.pdf .
[53] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1998.
[54] Jon Dattorro. Constrained least squares fit of a filter bank to an arbitrary magnitude frequency response, 1991. http://www.stanford.edu/∼dattorro/Hearing.htm .
[55] Joel Dawson, Stephen Boyd, Mar Hershenson, and Thomas Lee. Optimal allocation of local feedback in multistage amplifiers via geometric programming. IEEE Circuits and Systems I: Fundamental Theory and Applications, 2001.
[56] Jan de Leeuw. Fitting distances by least squares. UCLA Statistics Series Technical Report No. 130, Interdivisional Program in Statistics, UCLA, Los Angeles, CA, 1993. gifi.stat.ucla.edu/work/courses/240/mds/distancia.pdf .
[57] Jan de Leeuw. Multidimensional scaling. In International Encyclopedia of the Social & Behavioral Sciences. Elsevier, 2001. http://preprints.stat.ucla.edu/274/274.pdf .
[58] Jan de Leeuw and Willem Heiser. Theory of multidimensional scaling. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, volume 2, chapter 13, pages 285–316. North-Holland Publishing, Amsterdam, 1982.
[59] Frank Deutsch. Best Approximation in Inner Product Spaces. Springer-Verlag, 2001.


[60] Frank Deutsch and Hein Hundal. The rate of convergence of Dykstra’s cyclic projections algorithm: The polyhedral case. Numerical Functional Analysis and Optimization, 15:537–565, 1994.
[61] Michel Marie Deza and Monique Laurent. Geometry of Cuts and Metrics. Springer-Verlag, 1997.
[62] Carolyn Pillers Dobler. A matrix approach to finding a set of generators and finding the polar (dual) of a class of polyhedral cones. SIAM Journal on Matrix Analysis and Applications, 15(3):796–803, July 1994. Erratum: p.189 herein.
[63] Elizabeth D. Dolan, Robert Fourer, Jorge J. Moré, and Todd S. Munson. Optimization on the NEOS server. SIAM News, 35(6):4,8,9, August 2002.
[64] Bruce Randall Donald. 3-D structure in chemistry and molecular biology, 1998. http://www.cs.dartmouth.edu/∼brd/Teaching/Bio/ .
[65] David L. Donoho, Michael Elad, and Vladimir Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise, February 2004. StableSparse-Donoho-etal.pdf .
[66] Richard L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.
[67] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, September 1936. http://www.stanford.edu/∼dattorro/eckart&young.1936.pdf .
[68] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[69] John J. Edgell. Graphics calculator applications on 4-D constructs. http://archives.math.utk.edu/ICTCM/EP-9/C47/pdf/paper.pdf , 1996.


[70] Ivar Ekeland and Roger Témam. Convex Analysis and Variational Problems. SIAM, 1999.
[71] J. Farkas. Theorie der einfachen ungleichungen. Journal für die Reine und Angewandte Mathematik, 124:1–27, 1902.
[72] Maryam Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, Department of Electrical Engineering, March 2002. http://www.cds.caltech.edu/∼maryam/thesis-final.pdf .
[73] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734–4739. American Automatic Control Council (AACC), June 2001. http://www.cds.caltech.edu/∼maryam/nucnorm.html .
[74] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference. American Automatic Control Council (AACC), June 2003. http://www.cds.caltech.edu/∼maryam/acc03 final.pdf .
[75] César Fernández, Thomas Szyperski, Thierry Bruyère, Paul Ramage, Egon Mösinger, and Kurt Wüthrich. NMR solution structure of the pathogenesis-related protein P14a. Journal of Molecular Biology, 266:576–593, 1997.
[76] Richard Phillips Feynman, Robert B. Leighton, and Matthew L. Sands. The Feynman Lectures on Physics: Commemorative Issue, volume I. Addison-Wesley, 1989.
[77] Paul Finsler. Über das Vorkommen definiter und semidefiniter Formen in Scharen quadratischer Formen. Commentarii Mathematici Helvetici, 9:188–192, 1937.
[78] Anders Forsgren, Philip E. Gill, and Margaret H. Wright. Interior methods for nonlinear optimization. SIAM Review, 44(4):525–597, 2002.


[79] Norbert Gaffke and Rudolf Mathar. A cyclic projection algorithm via duality. Metrika, 36:29–54, 1989.
[80] Jérôme Galtier. Semi-definite programming as a simple extension to linear programming: convex optimization with queueing, equity and other telecom functionals. In 3ème Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications (AlgoTel). INRIA, France, 2001. www-sop.inria.fr/mascotte/Jerome.Galtier/misc/Galtier01b.pdf .
[81] Laurent El Ghaoui and Silviu-Iulian Niculescu, editors. Advances in Linear Matrix Inequality Methods in Control. SIAM, 2000.
[82] Philip E. Gill, Walter Murray, and Margaret H. Wright. Numerical Linear Algebra and Optimization, volume 1. Addison-Wesley, 1991.
[83] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, 1999.
[84] James Gleik. Isaac Newton. Pantheon Books, 2003.
[85] W. Glunt, Tom L. Hayden, S. Hong, and J. Wells. An alternating projection algorithm for computing the nearest Euclidean distance matrix. SIAM Journal on Matrix Analysis and Applications, 11(4):589–600, 1990.
[86] K. Goebel and W. A. Kirk. Topics in Metric Fixed Point Theory. Cambridge University Press, 1990.
[87] D. Goldfarb and K. Scheinberg. Interior point trajectories in semidefinite programming. SIAM Journal on Optimization, 8(4):871–886, 1998.
[88] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.
[89] P. Gordan. Ueber die auflösung linearer gleichungen mit reellen coefficienten. Mathematische Annalen, 6:23–28, 1873.
[90] John Clifford Gower. Euclidean distance geometry. The Mathematical Scientist, 7:1–14, 1982.


[91] John Clifford Gower. Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications, 67:81–97, 1985.
[92] John Clifford Gower and Garmt B. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.
[93] John Clifford Gower and David J. Hand. Biplots. Chapman & Hall, 1996.
[94] Alexander Graham. Kronecker Products and Matrix Calculus: with Applications. Wiley, 1981.
[95] Michael Grant, Stephen Boyd, and Yinyu Ye. cvx: Matlab software for disciplined convex programming. http://www.stanford.edu/∼boyd/cvx/ , 2006. A symbolic front-end for SeDuMi and other optimization software.
[96] Robert M. Gray. Toeplitz and circulant matrices: A review. http://www-ee.stanford.edu/∼gray/toeplitz.pdf , 2002.
[97] T. N. E. Greville. Note on the generalized inverse of a matrix product. SIAM Review, 8:518–521, 1966.
[98] Rémi Gribonval and Morten Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure, October 2003. http://www.math.auc.dk/research/reports/R-2003-16.pdf .
[99] Rémi Gribonval and Morten Nielsen. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 49(12):1320–1325, December 2003. http://www.math.auc.dk/∼mnielsen/papers.htm .
[100] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. Volume and mixed volumes. Technical Report TR:94-31, DIMACS, Rutgers University, 1994. dimacs/TechnicalReports/TechReports/1994/94-31.ps .
[101] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. Volume and mixed volumes. In T. Bisztriczky, P. McMullen, R. Schneider, and A. Ivić Weiss, editors, Polytopes: Abstract, Convex and Computational, pages 373–466. Kluwer Academic Publishers, 1994.


[102] L. G. Gubin, B. T. Polyak, and E. V. Raik. The method of projections for finding the common point of convex sets. U.S.S.R. Computational Mathematics and Mathematical Physics, 7(6):1–24, 1967.
[103] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algorithms. Mathematical Programming, 60(2):215–228, 1993.
[104] P. R. Halmos. Positive approximants of operators. Indiana University Mathematics Journal, 21:951–960, 1972.
[105] Shih-Ping Han. A successive projection method. Mathematical Programming, 40:1–14, 1988.
[106] Godfrey H. Hardy, John E. Littlewood, and George Pólya. Inequalities. Cambridge University Press, second edition, 1952.
[107] Arash Hassibi and Mar Hershenson. Automated optimal design of switched-capacitor filters. Design Automation and Test in Europe Conference, 2001.
[108] Timothy F. Havel and Kurt Wüthrich. An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. Journal of Molecular Biology, 182:281–294, 1985.
[109] Tom L. Hayden and Jim Wells. Approximation by matrices positive semidefinite on a subspace. Linear Algebra and its Applications, 109:115–130, 1988.
[110] Tom L. Hayden, Jim Wells, Wei-Min Liu, and Pablo Tarazaga. The cone of distance matrices. Linear Algebra and its Applications, 144:153–169, 1991.
[111] Bruce Hendrickson. Conditions for unique graph realizations. SIAM Journal on Computing, 21(1):65–84, February 1992.
[112] T. Herrmann, Peter Güntert, and Kurt Wüthrich. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. Journal of Molecular Biology, 319(1):209–227, May 2002.


[113] Mar Hershenson. Design of pipeline analog-to-digital converters via geometric programming. International Conference on Computer Aided Design - ICCAD, 2002.
[114] Mar Hershenson. Efficient description of the design space of analog circuits. 40th Design Automation Conference, 2003.
[115] Mar Hershenson, Stephen Boyd, and Thomas Lee. Optimal design of a CMOS OpAmp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001.
[116] Mar Hershenson, Dave Colleran, Arash Hassibi, and Navraj Nandra. Synthesizable full custom mixed-signal IP. Electronics Design Automation Consortium - EDA, 2002.
[117] Mar Hershenson, Sunderarajan S. Mohan, Stephen Boyd, and Thomas Lee. Optimization of inductor circuits via geometric programming, 1999.
[118] Mar Hershenson and Xiling Shen. A new analog design flow. Cadence, 2002.
[119] Nick Higham. Matrix Procrustes problems. http://www.ma.man.ac.uk/∼higham/talks/ , 1995. Lecture notes.
[120] Richard D. Hill and Steven R. Waters. On the cone of positive semidefinite matrices. Linear Algebra and its Applications, 90:81–88, 1987.
[121] Jean-Baptiste Hiriart-Urruty. Ensembles de Tchebychev vs. ensembles convexes: l’état de la situation vu via l’analyse convexe non lisse. Annales des Sciences Mathématiques du Québec, 22(1):47–62, 1998. Association Mathématique du Québec, www.lacim.uqam.ca/∼annales/volumes/22-1/PDF/47-62.pdf .
[122] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag, second edition, 1996.


[123] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer-Verlag, 2001.
[124] Alan J. Hoffman and Helmut W. Wielandt. The variation of the spectrum of a normal matrix. Duke Mathematical Journal, 20:37–40, 1953.
[125] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1987.
[126] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1994.
[127] Alston S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 1975.
[128] Hong-Xuan Huang, Zhi-An Liang, and Panos M. Pardalos. Some properties for the Euclidean distance matrix and positive semidefinite matrix completion problems. Journal of Global Optimization, 25(1):3–21, 2003. http://portal.acm.org/citation.cfm?id=607918 .
[129] 5W Infographic. Wireless 911. Technology Review, 107(5):78–79, June 2004. http://www.technologyreview.com/ .
[130] Nathan Jacobson. Lectures in Abstract Algebra, vol. II - Linear Algebra. Van Nostrand, 1953.
[131] Viren Jain and Lawrence K. Saul. Exploratory analysis and visualization of speech and music by locally linear embedding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 984–987, 2004. http://www.cis.upenn.edu/∼lsaul/papers/lle icassp04.pdf .
[132] Joakim Jaldén, Cristoff Martin, and Björn Ottersten. Semidefinite programming for detection in linear systems − Optimality conditions and space-time decoding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, April 2003. http://www.s3.kth.se/signal/reports/03/IR-S3-SB-0309.pdf .


[133] Florian Jarre. Convex analysis on symmetric matrices. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 2. Kluwer, 2000.
[134] Charles R. Johnson and Pablo Tarazaga. Connections between the real positive semidefinite and distance matrix completion problems. Linear Algebra and its Applications, 223/224:375–391, 1995.
[135] George B. Thomas, Jr. Calculus and Analytic Geometry. Addison-Wesley, fourth edition, 1972.
[136] Thomas Kailath. Linear Systems. Prentice-Hall, 1980.
[137] Tosio Kato. Perturbation Theory for Linear Operators. Springer-Verlag, 1966.
[138] Paul J. Kelly and Norman E. Ladd. Geometry. Scott, Foresman and Company, 1965.
[139] Ron Kimmel. Numerical Geometry of Images: Theory, Algorithms, and Applications. Springer-Verlag, 2003.
[140] Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley, 1989.
[141] Jean B. Lasserre. A new Farkas lemma for positive semidefinite matrices. IEEE Transactions on Automatic Control, 40(6):1131–1133, June 1995.
[142] Jean B. Lasserre and Eduardo S. Zeron. A Laplace transform algorithm for the volume of a convex polytope. Journal of the Association for Computing Machinery, 48(6):1126–1140, November 2001.
[143] Jean B. Lasserre and Eduardo S. Zeron. A new algorithm for the volume of a convex polytope. arXiv.org, June 2001. http://arxiv.org/PS cache/math/pdf/0106/0106168.pdf .
[144] Monique Laurent. A connection between positive semidefinite and Euclidean distance matrix completion problems. Linear Algebra and its Applications, 273:9–22, 1998.


[145] Monique Laurent. A tour d’horizon on positive semidefinite and Euclidean distance matrix completion problems. In Panos M. Pardalos and Henry Wolkowicz, editors, Topics in Semidefinite and Interior-Point Methods, pages 51–76. American Mathematical Society, 1998.
[146] Monique Laurent. Matrix completion problems. In Christodoulos A. Floudas and Panos M. Pardalos, editors, Encyclopedia of Optimization, volume III (Interior - M), pages 221–229. Kluwer, 2001. http://homepages.cwi.nl/∼monique/files/opt.ps .
[147] Monique Laurent and Svatopluk Poljak. On the facial structure of the set of correlation matrices. SIAM Journal on Matrix Analysis and Applications, 17(3):530–547, July 1996.
[148] Monique Laurent and Franz Rendl. Semidefinite programming and integer programming. Optimization Online, 2002. http://www.optimization-online.org/DB HTML/2002/12/585.html .
[149] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems. SIAM, 1995.
[150] Jung Rye Lee. The law of cosines in a tetrahedron. Journal of the Korea Society of Mathematical Education Series B: The Pure and Applied Mathematics, 4(1):1–6, 1997.
[151] Vladimir L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2(1/2):167–172, 1995.
[152] Adrian S. Lewis. Eigenvalue-constrained faces. Linear Algebra and its Applications, 269:159–181, 1998.
[153] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, November 1998. Special Issue on Linear Algebra in Control, Signals and Image Processing. http://www.stanford.edu/∼boyd/socp.html .
[154] David G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.


[155] David G. Luenberger. Introduction to Dynamic Systems: Theory, Models, & Applications. Wiley, 1979.
[156] David G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, second edition, 1989.
[157] Zhi-Quan Luo, Jos F. Sturm, and Shuzhong Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming. SIAM Journal on Optimization, 8(1):59–81, 1998.
[158] K. V. Mardia. Some properties of classical multi-dimensional scaling. Communications in Statistics: Theory and Methods, A7(13):1233–1241, 1978.
[159] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, 1979.
[160] Jerrold E. Marsden and Michael J. Hoffman. Elementary Classical Analysis. Freeman, second edition, 1995.
[161] Rudolf Mathar. The best Euclidean fit to a given distance matrix in prescribed dimensions. Linear Algebra and its Applications, 67:1–6, 1985.
[162] Rudolf Mathar. Multidimensionale Skalierung. B. G. Teubner Stuttgart, 1997.
[163] Mehran Mesbahi and G. P. Papavassilopoulos. On the rank minimization problem over a positive semi-definite linear matrix inequality. IEEE Transactions on Automatic Control, 42(2):239–243, 1997.
[164] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Simple accurate expressions for planar spiral inductances. IEEE Journal of Solid-State Circuits, 1999.
[165] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Bandwidth extension in CMOS with optimized on-chip inductors. IEEE Journal of Solid-State Circuits, 2000.


[166] E. H. Moore. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society, 26:394–395, 1920. Abstract.
[167] B. S. Mordukhovich. Maximum principle in the problem of time optimal response with nonsmooth constraints. Journal of Applied Mathematics and Mechanics, 40:960–969, 1976.
[168] Jean-Jacques Moreau. Décomposition orthogonale d’un espace Hilbertien selon deux cônes mutuellement polaires. Comptes Rendus de l’Académie des Sciences, Paris, 255:238–240, 1962.
[169] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6:393–404, 1954.
[170] Neil Muller, Lourenço Magaia, and B. M. Herbst. Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Review, 46(3):518–545, September 2004.
[171] Katta G. Murty and Feng-Tien Yu. Linear Complementarity, Linear and Nonlinear Programming. Heldermann Verlag, internet edition, 1988. linear complementarity webbook/ .
[172] Navraj Nandra. Synthesizable analog IP. IP Based Design Workshop, 2002.
[173] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming. McGraw-Hill, 1996.
[174] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, 1994.
[175] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag, 1999.
[176] Royal Swedish Academy of Sciences. Nobel prize in chemistry, 2002. http://nobelprize.org/chemistry/laureates/2002/public.html .
[177] C. S. Ogilvy. Excursions in Geometry. Dover, 1990. Citation: Proceedings of the CUPM Geometry Conference, Mathematical Association of America, No.16 (1967), p.21.


[178] Brad Osgood. Notes on the Ahlfors mapping of a multiply connected domain, 2000. www-ee.stanford.edu/∼osgood/Ahlfors-Bergman-Szego.pdf .
[179] M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM Journal on Matrix Analysis and Applications, 13:41–45, 1992.
[180] Pythagoras Papadimitriou. Parallel Solution of SVD-Related Problems, With Applications. PhD thesis, Department of Mathematics, University of Manchester, October 1993.
[181] Panos M. Pardalos and Henry Wolkowicz, editors. Topics in Semidefinite and Interior-Point Methods. American Mathematical Society, 1998.
[182] Gábor Pataki. Cone-LP’s and semidefinite programs: Geometry and a simplex-type method. In William H. Cunningham, S. Thomas McCormick, and Maurice Queyranne, editors, Integer Programming and Combinatorial Optimization, Proceedings of the 5th International IPCO Conference, Vancouver, British Columbia, Canada, June 3-5, 1996, volume 1084 of Lecture Notes in Computer Science, pages 162–174. Springer-Verlag, 1996.
[183] Gábor Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Mathematics of Operations Research, 23(2):339–358, 1998. http://citeseer.ist.psu.edu/pataki97rank.html . Erratum: p.486 herein.
[184] Gábor Pataki. The geometry of semidefinite programming. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 3. Kluwer, 2000.
[185] Teemu Pennanen and Jonathan Eckstein. Generalized Jacobians of vector-valued convex functions. Technical Report RRR 6-97, RUTCOR, Rutgers University, May 1997. rutcor.rutgers.edu/pub/rrr/reports97/06.ps .


[186] R. Penrose. A generalized inverse for matrices. In Proceedings of the Cambridge Philosophical Society, volume 51, pages 406–413, 1955.
[187] Chris Perkins. A convergence analysis of Dykstra’s algorithm for polyhedral sets. SIAM Journal on Numerical Analysis, 40(2):792–804, 2002.
[188] Florian Pfender and Günter M. Ziegler. Kissing numbers, sphere packings, and some unexpected proofs. Notices of the American Mathematical Society, 51(8):873–883, September 2004.
[189] Alex Reznik. Problem 3.41, from Sergio Verdú. Multiuser Detection. Cambridge University Press, 1998. www.ee.princeton.edu/∼verdu/mud/solutions/3/3.41.areznik.pdf , 2001.
[190] Sara Robinson. Hooked on meshing, researcher creates award-winning triangulation program. SIAM News, 36(9), November 2003. http://siam.org/siamnews/11-03/meshing.pdf .
[191] R. Tyrrell Rockafellar. Conjugate Duality and Optimization. SIAM, 1974.
[192] R. Tyrrell Rockafellar. Lagrange multipliers in optimization. SIAM-American Mathematical Society Proceedings, 9:145–168, 1976.
[193] R. Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35(2):183–238, June 1993.
[194] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1997. First published in 1970.
[195] C. K. Rushforth. Signal restoration, functional analysis, and Fredholm integral equations of the first kind. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 1, pages 1–27. Academic Press, 1987.
[196] Peter Santos. Application platforms and synthesizable analog IP. Embedded Systems Conference, 2003.
[197] Shankar Sastry. Nonlinear Systems: Analysis, Stability, and Control. Springer-Verlag, 1999.


[198] Uwe Schäfer. A linear complementarity problem with a P-matrix. SIAM Review, 46(2):189–201, June 2004.
[199] I. J. Schoenberg. Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Annals of Mathematics, 36(3):724–732, July 1935. http://www.stanford.edu/∼dattorro/Schoenberg1.pdf .
[200] I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44:522–536, 1938.
[201] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.
[202] Peter H. Schönemann, Tim Dorcey, and K. Kienapple. Subadditive concatenation in dissimilarity judgements. Perception and Psychophysics, 38:1–17, 1985.
[203] Joshua A. Singer. Log-Penalized Linear Regression. PhD thesis, Stanford University, Department of Electrical Engineering, June 2004. http://www.stanford.edu/∼singer/Papers/papers.html .
[204] Anthony Man-Cho So and Yinyu Ye. Theory of semidefinite programming for sensor network localization, 2004. http://www.stanford.edu/∼yyye/local-theory.pdf .
[205] Anthony Man-Cho So and Yinyu Ye. A semidefinite programming approach to tensegrity theory and realizability of graphs, 2005. http://www.stanford.edu/∼yyye/d-real-soda2.pdf .
[206] Henry Stark, editor. Image Recovery: Theory and Application. Academic Press, 1987.
[207] Henry Stark. Polar, spiral, and generalized sampling and interpolation. In Robert J. Marks II, editor, Advanced Topics in Shannon Sampling and Interpolation Theory, chapter 6, pages 185–218. Springer-Verlag, 1993.
[208] Willi-Hans Steeb. Matrix Calculus and Kronecker Product with Applications and C++ Programs. World Scientific Publishing Co., 1997.


[209] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, 1990.
[210] Josef Stoer and Christoph Witzgall. Convexity and Optimization in Finite Dimensions I. Springer-Verlag, 1970.
[211] Gilbert Strang. Course 18.06: Linear algebra. http://web.mit.edu/18.06/www , 2004.
[212] Gilbert Strang. Linear Algebra and its Applications. Harcourt Brace, third edition, 1988.
[213] Gilbert Strang. Calculus. Wellesley-Cambridge Press, 1992.
[214] Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, second edition, 1998.
[215] S. Straszewicz. Über exponierte Punkte abgeschlossener Punktmengen. Fund. Math., 24:139–143, 1935.
[216] Jos F. Sturm. SeDuMi (self-dual minimization). http://sedumi.mcmaster.ca/ , 2005. Software for optimization over symmetric cones.
[217] Jos F. Sturm and Shuzhong Zhang. On cones of nonnegative quadratic functions. Optimization Online, April 2001. http://www.optimization-online.org/DB HTML/2001/05/324.html .
[218] George P. H. Styan. A review and some extensions of Takemura’s generalizations of Cochran’s theorem. Technical Report 56, Stanford University, Department of Statistics, September 1982.
[219] George P. H. Styan and Akimichi Takemura. Rank additivity and matrix polynomials. Technical Report 57, Stanford University, Department of Statistics, September 1982.
[220] Chen Han Sung and Bit-Shun Tam. A study of projectionally exposed cones. Linear Algebra and its Applications, 139:225–252, 1990.
[221] Yoshio Takane. On the relations among four methods of multidimensional scaling. Behaviormetrika, 4:29–43, 1977. http://takane.brinkster.net/Yoshio/p008.pdf .


[222] Akimichi Takemura. On generalizations of Cochran’s theorem and projection matrices. Technical Report 44, Stanford University, Department of Statistics, August 1980.
[223] Peng Hui Tan and Lars K. Rasmussen. The application of semidefinite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8), August 2001.
[224] Pablo Tarazaga. Faces of the cone of Euclidean distance matrices: Characterizations, structure and induced geometry. Linear Algebra and its Applications, 408:1–13, 2005.
[225] Warren S. Torgerson. Theory and Methods of Scaling. Wiley, 1958.
[226] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, 1997.
[227] Michael W. Trosset. Applications of multidimensional scaling to molecular conformation. Computing Science and Statistics, 29:148–152, 1998.
[228] Michael W. Trosset. Distance matrix completion by numerical optimization. Computational Optimization and Applications, 17(1):11–22, October 2000.
[229] Michael W. Trosset. Extensions of classical multidimensional scaling: Computational theory. www.math.wm.edu/∼trosset/r.mds.html , 2001. Revision of technical report entitled “Computing distances between convex sets and subsets of the positive semidefinite matrices” first published in 1997.
[230] Michael W. Trosset and Rudolf Mathar. On the existence of nonglobal minimizers of the STRESS criterion for metric multidimensional scaling. In Proceedings of the Statistical Computing Section, pages 158–162. American Statistical Association, 1997.
[231] Jan van Tiel. Convex Analysis, an Introductory Text. Wiley, 1984.
[232] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996.


[233] Lieven Vandenberghe and Stephen Boyd. Connections between semi-infinite and semidefinite programming. In R. Reemtsen and J.-J. Rückmann, editors, Semi-Infinite Programming, chapter 8, pages 277–294. Kluwer Academic Publishers, 1998.
[234] Lieven Vandenberghe and Stephen Boyd. Applications of semidefinite programming. Applied Numerical Mathematics, 29(3):283–299, March 1999.
[235] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19(2):499–533, April 1998.
[236] Robert J. Vanderbei. Convex optimization: Interior-point methods and applications, 1999. www.sor.princeton.edu/∼rvdb/pdf/talks/pumath/talk.pdf .
[237] Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, 2004.
[238] Martin Vetterli and Jelena Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.
[239] È. B. Vinberg. The theory of convex homogeneous cones. Transactions of the Moscow Mathematical Society, 12:340–403, 1963. American Mathematical Society and London Mathematical Society joint translation, 1965.
[240] Marie A. Vitulli. A brief history of linear algebra and matrix theory. darkwing.uoregon.edu/∼vitulli/441.sp04/LinAlgHistory.html , 2004.
[241] John von Neumann. Functional Operators, Volume II: The Geometry of Orthogonal Spaces. Princeton University Press, 1950. Reprinted from mimeographed lecture notes first distributed in 1933.
[242] Roger Webster. Convexity. Oxford University Press, 1994.
[243] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 988–995, 2004. http://www.cis.upenn.edu/∼lsaul/papers/sde cvpr04.pdf .


[244] Eric W. Weisstein. MathWorld – A Wolfram Web Resource. http://mathworld.wolfram.com/search/ , 2004.
[245] Bernard Widrow and Samuel D. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.
[246] Norbert Wiener. On factorization of matrices. Commentarii Mathematici Helvetici, 29:97–111, 1955.
[247] Michael P. Williamson, Timothy F. Havel, and Kurt Wüthrich. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology, 182:295–315, 1985.
[248] Willie W. Wong. Cayley-Menger determinant and generalized N-dimensional Pythagorean theorem. http://www.princeton.edu/∼wwong/papers/gp-r.pdf , November 2003. Application of Linear Algebra: Notes on Talk given to Princeton University Math Club.
[249] William Wooton, Edwin F. Beckenbach, and Frank J. Fleming. Modern Analytic Geometry. Houghton Mifflin, 1975.
[250] Margaret H. Wright. The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bulletin of the American Mathematical Society, 42(1):39–56, January 2005.
[251] Stephen J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.
[252] Shao-Po Wu. max-det Programming with Applications in Magnitude Filter Design. A dissertation submitted to the department of Electrical Engineering, Stanford University, December 1997.
[253] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programming and determinant maximization problems with matrix structure. User’s guide. http://www.stanford.edu/∼boyd/SDPSOL.html , 1996.
[254] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programs with matrix structure. In Laurent El Ghaoui and Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methods in Control, chapter 4, pages 79–91. SIAM, 2000. http://www.stanford.edu/∼boyd/sdpsol.html .


[255] Shao-Po Wu, Stephen Boyd, and Lieven Vandenberghe. FIR filter design via spectral factorization and convex optimization. http://www.stanford.edu/∼boyd/reports/magdes.pdf , 1997.
[256] David D. Yao, Shuzhong Zhang, and Xun Yu Zhou. Stochastic linear-quadratic control via primal-dual semidefinite programming. SIAM Review, 46(1):87–111, March 2004. Erratum: p.195 herein.
[257] Yinyu Ye. Semidefinite programming for Euclidean distance geometric optimization. Lecture notes, 2003. http://www.stanford.edu/class/ee392o/EE392o-yinyu-ye.pdf .
[258] Yinyu Ye. Convergence behavior of central paths for convex homogeneous self-dual cones, 1996. http://www.stanford.edu/∼yyye/yyye/ye.ps .
[259] Yinyu Ye. Interior Point Algorithms: Theory and Analysis. Wiley, 1997.
[260] D. C. Youla. Mathematical theory of image restoration by the method of convex projection. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 2, pages 29–77. Academic Press, 1987.
[261] Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Springer-Verlag, 1999.
[262] Günter M. Ziegler. Kissing numbers: Surprises in dimension four. Emissary, pages 4–5, Spring 2004. http://www.msri.org/publications/emissary/ .




Index

A searchable pdfBook is available online from Meboo Publishing.




Convex Optimization & Euclidean Distance Geometry, Dattorro. Meboo Publishing USA
