v2004.06.19 - Convex Optimization

Euclidean Distance Geometry via Convex Optimization

Jon Dattorro

June 2004


© 2001 Jon Dattorro, all rights reserved.


for Jennie Columba ♦ Antonio ♦ and Sze Wan


Acknowledgements

Reflecting on help given me all along the way, I am awed by my heavy debt of gratitude:

Anthony, Teppo Aatola, Bob Adams, Dave Andreas, Chuck Bagnaschi, Jeffrey Barish, Keith Barr, Florence Barker, David Beausejour, Leonard Bellin, Dave Berners, Barry Blesser, Richard Bolt, Stephen Boyd, Merril Bradshaw, Gary Brooks, Jack Brown, Patty Brown, Joe Bryan, Barb Burk, John Calderone, Kathryn Calí, Terri Carter, Reginald Centracchio, Albert Charpentier, Bonnie Charvez, Michael Chille, Jan Chomysin, John Cioffi, Robert Cogan, Sandra, Josie, and Freddy Colletta, Judy Copice, Kathy and Rita Copley, Gary Corsini, John, Christine, and Yvonne D’Attorro, Cosmo Dellagrotta, Robin DeLuca, Lou D’Orsaneo, Antonetta DeCotis, Tom Destefano, Peggy Donatelli, Jeannie Duguay, Pozzi Escot, Terry Fayer, Maryam Fazel, Carol Feola, Valery and Evelyn Ferrullo, Martin Fischer, Salvatore Fransosi, Stanley Freedman, Tony Garnett, Edouard Gauthier, Jeanette Genovese, Amelia Giglio, Chae Teok Goh, Gene Golub, Patricia Goodrich, Heather Graham, Robert Gray, David Griesinger, Robert Haas, George Heath, Bruce, John, Joy, and Xtine, Roy Hendrickson, Martin Henry, Mar Hershenson, Haitham Hindi, Evan Holland, Alison Kay Hottes, Leland Jackson, Rene Jaeger, Sung Dae Jun, Max Kamenetsky, Seymour Kass, Patricia Keefe, James Koch, Mary Kohut, Paul Kruger, Helen Kung, Lynn Lamberis, Paul Lansky, Karen Law, Francis Lee, Scott Levine, Hal Lipset, Nando Lopez-Lezcano, Pat Macdonald, Ginny Magee, Ed Mahler, Sylvia and David Martinelli, Bill Mauchly, Jamie Jacobs May, Paul and Angelia McLean, Joyce, Al, and Tilly Menchini, Teresa Meng, Tom Metcalf, J. Andy Moorer, Denise Murphy, Walter Murray, John O’Donnell, Ray Ouellette, Sal Pandozzi, Bob Peretti, Manny Pokotilow, Michael Radentz, Timothy Rapa, Doris Rapisardi, Katharina Reissing, Flaviu, Nancy, and Myron Romanul, Marvin Rous, Carlo Russo, Eric Safire, Jimmy Scialdone, Anthony, Susan, and Esther Scorpio, John Senior, Robin Silver, Josh, Richard, and Jeremy Singer, Terry Stancil, Timothy Stilson, Brigitte Strain, John Strawn, Albert Siu-Toung, David Taraborelli, Robin Tepper, Susan Thomas, Stavros Toumpis, Georgette Trezvant, Rebecca, Daniel, Jack, and Stella Tuttle, Maryanne Uccello, Joe Votta, Julia Wade, Shelley Marged Weber, Jennifer Westerlind, Bernard Widrow, Todd Wilson, George and MaryAnn Young, Donna and Joe Zagami


illustrations: Mayura Draw
typesetting by WinEdt


Contents

1 Prelude
  1.1 Overview

2 Convex geometry
  2.1 Convex set
    2.1.1 Vectorized matrix inner product
    2.1.2 Symmetric matrices
  2.2 Hulls
    2.2.1 Affine dimension, affine hull
    2.2.2 Convex hull
    2.2.3 Conic hull
  2.3 Halfspace, Hyperplane
    2.3.1 Halfspaces H+ and H−
    2.3.2 Hyperplane ∂H representations
  2.4 Subspace representations
    2.4.1 Subspace or affine set
    2.4.2 Intersection of subspaces
  2.5 Extreme, Exposed
    2.5.1 Exposure
  2.6 Cones
    2.6.1 Cone
    2.6.2 Convex cone
    2.6.3 Cone boundary
    2.6.4 Extreme direction
    2.6.5 Exposed direction
    2.6.6 Positive semidefinite (PSD) cone
    2.6.7 Conic independence (c.i.)
    2.6.8 When extreme means exposed
  2.7 Convex polyhedra
    2.7.1 Polyhedral cone
    2.7.2 Vertices of convex polyhedra
    2.7.3 Unit simplex
    2.7.4 Converting between descriptions
  2.8 Dual cone, generalized inequality
    2.8.1 Dual cone
    2.8.2 Abstractions of Farkas’ lemma
    2.8.3 Biorthogonal expansion, by example
  2.9 Formulae and algorithm finding dual cone
    2.9.1 Biorthogonal expansion, derivation
    2.9.2 Dual of pointed K, X skinny-or-square full-rank
    2.9.3 Dual of proper non-simplicial K, X fat full-rank
  2.10 Polyhedral inscription in PSD cone

3 Geometry of convex functions
  3.1 Convex function
    3.1.1 Vector-valued function
    3.1.2 Matrix-valued function
  3.2 Quasiconvex function
  3.3 Salient properties

4 Euclidean Distance Matrix
  4.1 EDM
  4.2 Metric requirements
  4.3 ∃ fifth Euclidean axiom
    4.3.1 Lookahead
  4.4 EDM definition
    4.4.1 −V_N^T D(X) V_N convexity
    4.4.2 Gram-form EDM definition
    4.4.3 Inner-product form EDM definition
  4.5 Invariance
    4.5.1 Translation
    4.5.2 Rotation/Reflection
    4.5.3 Invariance conclusion
  4.6 Injectivity of D. Unique reconstruction
    4.6.1 Inversion of D
  4.7 Embedding in the affine hull
    4.7.1 Determining affine dimension
    4.7.2 Affine dimension r versus rank
    4.7.3 Précis
  4.8 Euclidean metric versus matrix criteria
    4.8.1 Nonnegativity axiom 1
    4.8.2 Triangle inequality axiom 4
    4.8.3 −V_N^T D V_N nesting
    4.8.4 Affine dimension reduction in two dimensions
  4.9 Bridge: Convex polyhedra to EDMs
    4.9.1 Geometric criterion
    4.9.2 Necessity and sufficiency
    4.9.3 Example revisited
  4.10 List reconstruction
    4.10.1 via Gram
    4.10.2 x_1 at the origin. V_N
  4.11 Reconstruction examples
    4.11.1 Isometric reconstruction
    4.11.2 Isotonic reconstruction
    4.11.3 Isotonic reconstruction, alternating projection
  4.12 Fifth Euclidean requirement
    4.12.1 Recapitulate
    4.12.2 Derivation of the fifth
    4.12.3 Path not followed
    4.12.4 Affine dimension reduction in three dimensions
  4.13 EDM-entry composition
  4.14 EDM indefiniteness
    4.14.1 EDM eigenvalues, congruence transformation
    4.14.2 Spectral cone
    4.14.3 Eigenvalues of −V D V versus −V_N^† D V_N
  4.15 DFT
    4.15.1 DFT via EDM
  4.16 Self similarity

5 EDM cone
  5.1 Polyhedral bounds
  5.2 EDM definition in 11^T
    5.2.1 Range of EDM D
    5.2.2 Faces of EDM cone
  5.3 EDM cone by inverse image
  5.4 Correspondence to PSD cone S^{N−1}_+
    5.4.1 Gram-form correspondence to S^{N−1}_+
    5.4.2 Monotonicity
  5.5 Vectorization projection interpretation
    5.5.1 Face of PSD cone S^N_+ containing V
  5.6 Dual EDM cone
    5.6.1 Ambient S^N
    5.6.2 Ambient S^N_0
    5.6.3 Theorem of the alternative

6 Convex optimization
  6.1 Semidefinite programming
    6.1.1 Conic problem
    6.1.2 Framework
    6.1.3 Rank-reduced solution
  6.2 Examples

7 EDM proximity
    7.0.1 Lower bound
    7.0.2 Measurement matrix H
    7.0.3 Three prevalent fundamental problems
  7.1 First prevalent problem
    7.1.1 Closest EDM Problem 1, proof of (710), convex case
    7.1.2 rank ρ subset
    7.1.3 Closest EDM Problem 1, non-convex case
    7.1.4 Problem 1 in spectral norm, convex case
  7.2 Second prevalent problem
    7.2.1 Convex case
    7.2.2 Minimization of affine dimension in Problem 2
  7.3 Third prevalent problem
    7.3.1 Convex case
    7.3.2 Minimization of affine dimension in Problem 3
    7.3.3 Tightly constrained affine dimension, Problem 3

8 EDM completion
  8.1 Completion not necessarily unique
  8.2 Requirements for unique completion
    8.2.1 Conservative requirements in R^2
    8.2.2 Unique completion in any dimension r
  8.3 Completion problems
    8.3.1 Cattle problem
    8.3.2 Chemistry problem

9 Piano tuning

A Linear algebra
  A.1 Main diagonal δ operator, trace, vec
  A.2 Semidefiniteness: domain of test
    A.2.1 Symmetry versus semidefiniteness
  A.3 Proper statements
    A.3.1 Eigenvalues, semidefiniteness, nonsymmetric
  A.4 Schur complement
    A.4.1 Semidefinite program via Schur
    A.4.2 Determinant
  A.5 Eigen-decomposition
    A.5.1 Eigenmatrix
    A.5.2 Symmetric diagonalization
  A.6 Singular value decomposition, SVD
    A.6.1 Compact SVD
    A.6.2 Subcompact SVD
    A.6.3 Full SVD
    A.6.4 SVD of symmetric matrices
    A.6.5 Pseudoinverse by SVD
  A.7 Zeros
    A.7.1 0 entry
    A.7.2 0 eigenvalues theorem
    A.7.3 0 trace and product
    A.7.4 Zero definite

B Simple matrices
  B.1 Rank-one matrix (dyad)
    B.1.1 Dyad independence
  B.2 Doublet
  B.3 Elementary matrix
    B.3.1 Householder matrix
  B.4 Auxiliary V-matrices
    B.4.1 Auxiliary matrix V
    B.4.2 Schoenberg auxiliary matrix V_N
    B.4.3 Auxiliary matrix V_W
    B.4.4 Auxiliary V-matrix Table
    B.4.5 More auxiliary matrices
  B.5 Orthogonal matrix
    B.5.1 Reflection
    B.5.2 Matrix rotation
  B.6 Circulant matrices

C Some optimal analytical results
  C.1 involving diagonal, trace, eigenvalues
  C.2 Orthogonal Procrustes problem
    C.2.1 Two-sided orthogonal Procrustes problems

D Matrix calculus
  D.1 Directional derivative, Taylor series
    D.1.1 Gradients
    D.1.2 Product rules for matrix-functions
    D.1.3 Chain rules for composite matrix-functions
    D.1.4 First directional derivative
    D.1.5 Second directional derivative
    D.1.6 Taylor series
    D.1.7 Correspondence of gradient to derivative
  D.2 Tables of gradients and derivatives
    D.2.1 Algebraic
    D.2.2 Trace
    D.2.3 Trace Kronecker
    D.2.4 Log determinant
    D.2.5 Determinant
    D.2.6 Logarithmic
    D.2.7 Exponential

E Pseudoinverse, Projection
    E.0.1 Logical deductions
  E.1 Idempotent matrices
    E.1.1 Biorthogonal characterization
    E.1.2 Idempotence summary
  E.2 I − P, Projection on algebraic complement
  E.3 Symmetric idempotent matrices
    E.3.1 Four subspaces
    E.3.2 Orthogonal characterization
    E.3.3 Symmetric idempotence summary
    E.3.4 Orthonormal decomposition
  E.4 Algebra of projection on affine subsets
  E.5 Projection examples
  E.6 Vectorization interpretation
    E.6.1 Nonorthogonal projection on a vector
    E.6.2 Nonorthogonal projection on vectorized matrix
    E.6.3 Orthogonal projection on a vector
    E.6.4 Orthogonal projection on a vectorized matrix
  E.7 Range/Rowspace interpretation
    E.7.1 Projection on vectorized matrices of higher rank
  E.8 Projection on convex set
    E.8.1 Projection on cone
    E.8.2 Easy projections
    E.8.3 Projection on convex set in subspace
  E.9 Alternating projection
    E.9.1 Distance and existence
    E.9.2 Feasibility and convergence
    E.9.3 Optimization

F MATLAB programs
  F.1 isedm()
    F.1.1 Subroutines for isedm()
  F.2 conic independence
    F.2.1 lp()
  F.3 Map of the USA
    F.3.1 EDM
    F.3.2 EDM using ordinal data

G Notation and some definitions


Chapter 1

Prelude

People are so afraid of convex analysis.
        −Claude Lemaréchal, 2003

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of the mathematician while Optimization belongs to the engineer. It can be difficult for the engineer to apply theory without understanding Analysis. Boyd & Vandenberghe’s book Convex Optimization [1] is a marvelous bridge between the two disciplines; rigorous though accessible to non-mathematicians. Their text was conventionally published in 2004 having been taught at Stanford University and made freely available on the world-wide web for ten prior years;^{1.1} a dissemination motivated by the belief that a virtual flood of new applications would follow by epiphany that many problems hitherto presumed non-convex could be transformed or relaxed into convexity. [2] [3] [4] [5] [6] [7, §4.3, p.316-322] [8] [9] [10] Subsequently there were great advances, particularly in the electronic circuit design industry. [7, §4.7] [11] [12]-[21]

1.0 © 2001 Jon Dattorro, all rights reserved.
1.1 Its influence has already had great impact; the web holds countless copies in bits and pieces spanning the book’s evolution.


These pages constitute the author’s journal over a six year period filling some remaining gaps between mathematician and engineer. Consequently there is heavy cross-referencing and very much background material provided in the text, footnotes, and appendices so as to be more self-contained and to provide an understanding of fundamental concepts. Material contained in Boyd & Vandenberghe is, for the most part, not duplicated here although in some instances (e.g., matrix-valued functions and calculus (§3), linear algebra, geometry, nonorthogonal and orthogonal projection, semidefinite programming (§6), etcetera) we expand or elaborate topics found particularly useful in the study of Euclidean distance geometry, the main focus of the present work. In that sense, this journal may be regarded as a companion to Boyd & Vandenberghe.

1.1 Overview

The study of Euclidean distance matrices (EDMs) fundamentally asks what can be known geometrically given only distance information between points in Euclidean space. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space: thus inclusive of nearly all contemporary discrete signals and systems; e.g., the discrete Fourier transform (DFT).^{1.2} The answer to the question posed is that very much can be known; the mathematics is rich and deep. Throughout we cite beacons of historical accomplishment in this area of geometry. The application of EDMs has already proven invaluable in discerning biological molecular conformation. [22] The emerging practice of localization in wireless sensor networks or the global positioning system (GPS) will certainly simplify and benefit from this theory as well.

We study the pervasive convex bodies and their various representations. In particular, we make convex polyhedra, cones, and dual cones more visceral through illustration in chapter 2, and we study the geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion). The conic analogue to linear independence, called conic independence, is introduced as a new tool in the study of classical cone theory; the logical next step in the progression: linear, affine, conic.

1.2 Yet to date there is no recorded appearance of EDMs in this particular field of engineering except §4.15.1 herein.


The concept of faces and extreme points and directions of convex bodies is explained here in chapter 2; crucial to the understanding of convex optimization problems in later chapters, as all convex optimization problems have quite visual geometric analogues. Any convex optimization problem has a geometric interpretation. This is, in fact, the main attraction of convex optimization; the ability to visualize geometry of a problem. This chapter provides tools to make those visualizations easier.

Chapter 3 presents pertinent results for multidimensional convex functions ignored in the literature; tricks and tips for determining their convexity and discerning their geometry.

The Euclidean distance matrix (EDM) is studied in chapter 4, its properties and relationship to positive semidefinite (PSD) and Gram matrices. Problem types solvable via EDMs, and EDM problems posed as convex optimization, are discussed; e.g., we generate an isotonic map of the United States using only comparative distance information (no distance information, only distance inequalities). The DFT is related to EDMs, a connection to digital signal processing of which we were previously unaware.

In chapter 5 we illustrate the geometric relationship between EDM and PSD cones. In particular we explain geometric requirements for projection of a candidate matrix on the PSD cone that establish its membership to the EDM cone. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the PSD cone. The classic Schoenberg criterion relating EDM and PSD cones is revealed to be a discretized membership relation (a generalized inequality, a new Farkas’-like lemma) between the EDM cone and its ordinary dual.

Semidefinite programming is reviewed in chapter 6 with particular attention to optimality conditions of the primal and dual programs, their interplay, and the perturbation method of rank reduction of optimal solutions (extant but not as well known as it should be).

Chapter 7 explores methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems, emphasizing projection methods on polyhedral cones of eigenvalues (spectral projection) for rank minimization, and their relation to convex optimization and Procrustes techniques (§C.2). We also derive a novel expression for the EDM cone as an intersection and vector sum of two subspaces and the PSD cone.


Chapter 8 investigates Euclidean distance matrix completion problems and their relationship to convex optimization. What is the least amount of data contained in an incomplete EDM that makes the corresponding points in Euclidean space uniquely discernible to within an isometry? We demonstrate a novel procedure for partitioning a large problem into many small problems whose solutions are unique, easy to solve, and can be assembled into the solution of the original large problem.


Chapter 2

Convex geometry

Convexity has an immensely rich structure and numerous applications. On the other hand, almost every “convex” idea can be explained by a two-dimensional picture.
        −Alexander Barvinok [23]

There is relatively less published pertaining to matrix-valued convex sets and functions. [24] [25, §6.6] As convex geometry and linear algebra are inextricably bonded, we provide much linear algebra background material (especially in the appendices) although it is assumed the reader is comfortable with [26], [27], [28], or any other intermediate-level text. The essential references to convex analysis are [29] [30]. The reader is referred to [31] [23] [32] [33] [1] [34] [35] for a more comprehensive treatment of convexity.

2.0 © 2001 Jon Dattorro, all rights reserved.


2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1,

    µY + (1 − µ)Z ∈ C    (1)

Under that defining constraint on µ, the linear sum in (1) is called a convex combination of Y and Z. If Y and Z are points in Euclidean vector space R^n or R^{m×n}, then (1) represents the closed line segment joining them. All line segments are thereby convex sets. Apparent from the definition, a convex set is a connected set. [36, §3.4, §3.5] [33, p.2]

The idealized chicken egg, an ellipsoid (Figure 2.1(c)), is a good organic icon for a convex set in three dimensions R^3.

2.1.0.1 subspace

A subset of Euclidean vector space R^n is called a subspace (§2.4) if every vector of the form αx + βy, for α, β ∈ R, is in the subset whenever x and y are. [37, §2.3] A subspace is a convex set containing the origin, by definition. [30, p.4] It is not difficult to show

    R^n = −R^n    (2)

as is true for any subspace R, because x ∈ R^n ⇔ −x ∈ R^n.

Any subspace not constituting the entire ambient vector space is a proper subspace; e.g., any line through the origin in two-dimensional Euclidean space R^2. The vector space R^n is itself a conventional subspace, inclusively, [38, §2.1] although not proper.

2.1.0.2 affine set

An affine set (from the word affinity) is any subset of R^n that is a translation of some subspace, hence convex; e.g., ∅, point, line, plane, hyperplane (§2.3.2), subspace, etcetera. The intersection of an arbitrary collection of affine sets remains affine. The affine hull of a set C ⊆ R^n (§2.2.1) is the smallest affine set containing it.


2.1.0.3 dimension

Dimension of an arbitrary set S is the dimension of its affine hull; [32, p.14]

    dim S ∆= dim aff S    (3)

the same as dimension of the subspace parallel to that affine set. Hence dimension is synonymous with affine dimension. [29, A.2.1]

2.1.0.4 relative interior

Whenever a point x lies interior to some set C ⊆ R^n, then we say its interior int C is a neighborhood of x. [36, §2.1.1] We distinguish interior from relative interior throughout. [31] [32] [35] The relative interior of a convex set C is its interior relative to its affine hull.^{2.1} Thus defined, it is common (though confusing) for the interior of C to be empty while its relative interior is not: this happens whenever dimension of its affine hull is less than dimension of the ambient space (dim aff C < n, e.g., were C a flat piece of paper in R^3) or in the exception when C is a single point; [36, §2.2.1]

    rel int{x} ∆= aff{x} = {x},  int{x} = ∅,  x ∈ R^n    (4)

In any case, the closure of the relative interior of a convex set always yields the closure of the set itself; cl rel int C = cl C.

2.1.0.5 emptiness

Emptiness of a set ∅ is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; [29, §A.2.1] e.g., paper in the real world. Thus the term relative is the conventional fix to this ambiguous terminology:^{2.2}

An ordinary flat piece of paper is a nonempty convex set having relatively nonempty interior.

2.1 Likewise for relative boundary, although relative closure is superfluous. [29, §A.2.1]
2.2 Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence.


Figure 2.1: Showing intersection of line with boundary of convex Euclidean object can be a point: (a) Boundary consists of two points. Intersection of line with ellipsoid in R, (b) in R^2, (c) in R^3.

2.1.0.6 classical boundary

The classical definition of boundary of a set C is the closure of C less its interior presumed nonempty; [39, §1.1]

    ∂C = cl C \ int C    (5)

which follows from the fact cl int C = cl C assuming nonempty interior. One implication is: an open set has a boundary defined although not contained in the set.

2.1.0.7.1 Example. Orthant: Name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R^2. The most common is the nonnegative orthant R^n_+ or R^{n×n}_+ (analogue to quadrant I) to which membership denotes all nonnegative vector- or matrix-entries respectively;

    R^n_+ ∆= {x ∈ R^n | x_i ≥ 0 ∀i}    (6)

The nonpositive orthant R^n_− or R^{n×n}_− (III) denotes negative and 0 entries. The orthant R^n_{i−} for example, identifies the region in R^n whose members’ sole negative coordinate is their i th (II or IV). Orthant convexity^{2.3} is easily verified by definition (1).

2.1.0.8 sum, product, difference

The vector sum of two convex sets C_1 and C_2

    C_1 + C_2 = {x + y | x ∈ C_1, y ∈ C_2}    (7)

and Cartesian product

    C_1 × C_2 = { [x; y] | x ∈ C_1, y ∈ C_2 }    (8)

remain convex. By additive inverse, we can similarly define the vector difference of two convex sets

    C_1 − C_2 = {x − y | x ∈ C_1, y ∈ C_2}    (9)

which is convex. Applying this definition to nonempty convex C_1, the self-difference C_1 − C_1 is generally nonempty, nontrivial, and convex.

Convex results are also obtained for scaling κC, rotation/reflection QC, or translation C + α of a convex set C; all similarly defined.

Given any operator T and convex set C, we are prone to write T(C) meaning

    T(C) ∆= {T(x) | x ∈ C}    (10)

Given linear operator T, it therefore follows from (7),

    T(C_1 + C_2) = {T(x + y) | x ∈ C_1, y ∈ C_2}
                 = {T(x) + T(y) | x ∈ C_1, y ∈ C_2}
                 = T(C_1) + T(C_2)    (11)

2.1.0.9.1 Theorem. Intersection. [1, §2.3.1] [30, §2] The intersection of an arbitrary collection of convex sets is convex. ⋄

Note that the converse is implicitly false in so far as a convex set can be formed by the intersection of sets that are not.

2.3 All orthants are self-dual simplicial cones. (§2.8.3.2, §2.7.3.0.1)


Together with the theorems of §E.8.0.0.1 it can be shown, for example, that a line can intersect the boundary of a convex set in any dimension at a point. This is intuitively plausible because a line intersects the boundary of the similar convex objects in Figure 2.1 at a point in R, R^2, and R^3.

2.1.0.10.2 Theorem. Image, Inverse image. [30, §3] [1, §2.3.2] Let f be a mapping from R^{p×k} to R^{m×n}.

• The image of a convex set C under any affine function (§3.1.1.1)

      f(C) = {f(X) | X ∈ C} ⊆ R^{m×n}    (12)

  is convex.

• The inverse image^{2.4} of a convex set F,

      f^{−1}(F) = {X | f(X) ∈ F} ⊆ R^{p×k}    (13)

  a single- or many-valued mapping, under any affine function f is convex. ⋄

In particular, any affine transformation of an affine set remains affine. [30, p.8]

Each converse of this two-part theorem is generally false; id est,^{2.5} given f affine, a convex image f(C) does not imply that set C is convex, and neither does a convex inverse image f^{−1}(F) imply set F is convex. A counterexample is easy to visualize when the affine function is an orthogonal projector [26] [37]:

2.1.0.11.3 Corollary. Projection on subspace. [30, §3]^{2.6} Orthogonal projection of a convex set on a subspace is another convex set. ⋄

Again, the converse is false. Shadows, for example, are umbral projections that can be convex when the object providing the shade is not.

2.4 See §2.6.6.3.4 for an example.
2.5 The expansion of common Latin terms reflects the author’s disdain for acronyms; e.g., www. Yet we will substitute the abbreviation e.g. in place of the Latin exempli gratia.
2.6 The corollary holds more generally for projection on hyperplanes (§2.3.2); [32, §6.6] hence, for projection on affine subsets (§2.2.1, nonempty intersections of hyperplanes). Orthogonal projection on affine subsets is reviewed in §E.4.


2.1.1 Vectorized matrix inner product

Euclidean space R^n comes equipped with a linear vector inner product

    〈y, z〉 ∆= y^T z    (14)

Two vectors are orthogonal (perpendicular) to one another if and only if their inner product vanishes;

    y ⊥ z ⇔ 〈y, z〉 = 0    (15)

An inner product defines a norm

    ‖y‖_2 ∆= √(y^T y),  ‖y‖_2 = 0 ⇔ y = 0    (16)

When orthogonal vectors each have unit norm, then they are orthonormal. For linear operations on vectors represented by real matrices, the adjoint operation is transposition and defined for matrix operator A by [38, §3.10]

    〈Ay, z〉 = 〈y, A^T z〉    (17)

The vector inner product for matrices is calculated just as it is for vectors by first transforming a matrix in R^{p×k} to a vector in R^{pk} by concatenating its columns in the natural order. For lack of a better term, we shall call that linear bijective transformation vectorization. For example, the vectorization of Y = [y_1 y_2 · · · y_k] ∈ R^{p×k} [40] [41] is

    vec Y ∆= [y_1; y_2; ... ; y_k] ∈ R^{pk}    (18)

Then the vectorized-matrix inner product is the trace of the matrix inner product; for Z ∈ R^{p×k}, [1, §2.6.1] [29, §0.3.1] [42, §8] [43, §2.2]

    〈Y, Z〉 ∆= tr(Y^T Z) = vec(Y)^T vec Z    (19)

where

    tr(Y^T Z) = tr(Z Y^T) = tr(Y Z^T) = tr(Z^T Y) = 1^T (Y ◦ Z) 1    (20)

and where ◦ denotes the Hadamard product^{2.7} of matrices [28] [44, §1.1.4]. The adjoint operation can therefore be defined in like manner:

    〈AY, Z〉 = 〈Y, A^T Z〉    (21)

For example, take any element C_1 from a matrix-valued set in R^{p×k}, and consider any particular dimensionally compatible real vectors v and w. Then the vector inner product of C_1 with v w^T is

    〈v w^T, C_1〉 = v^T C_1 w = tr(w v^T C_1) = 1^T ( (v w^T) ◦ C_1 ) 1    (22)

Example. Application of the image theorem. Suppose the set C ⊆ R^{p×k} is convex. Then for any particular vectors v ∈ R^p and w ∈ R^k, the set of vector inner products

    Y ∆= v^T C w = 〈v w^T, C〉 ⊆ R    (23)

is convex. This result is a consequence of the image theorem. Yet it is easy to show directly that a convex combination of inner products from Y remains an element of Y.^{2.8}

More generally, v w^T in (23) may be replaced with any particular matrix Z ∈ R^{p×k} while convexity of the set 〈Z, C〉 ⊆ R persists. Further, replacing v and w with any particular respective matrices U and W of dimension compatible with convex set C, the set U^T C W is convex by the image theorem because it is a linear mapping of C.

2.7 The Hadamard product is a simple entry-wise product of corresponding entries from two matrices of like size; id est, not necessarily square.
2.8 To verify that, take any two elements C_1 and C_2 from the convex matrix-valued set C, and then form the vector inner products (23) that are two elements of Y by definition. Now make a convex combination of those inner products; videlicet, for 0 ≤ µ ≤ 1,

    µ 〈v w^T, C_1〉 + (1 − µ) 〈v w^T, C_2〉 = 〈v w^T, µ C_1 + (1 − µ) C_2〉    (24)

The two sides are equivalent by linearity of the inner product. The right-hand side remains a vector inner product of v w^T with an element µ C_1 + (1 − µ) C_2 from the convex set C; hence belongs to Y. Since that holds true for any two elements from Y, then it must be a convex set.
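As a quick numerical sanity check of identities (19) and (20), the following minimal MATLAB sketch compares the trace form, the vectorized form, and the Hadamard form of the matrix inner product; the dimensions and random data are illustrative assumptions, not taken from the text.

    % Numerical check of the vectorized-matrix inner product (19)-(20).
    p = 4;  k = 3;
    Y = randn(p,k);  Z = randn(p,k);
    ip_trace    = trace(Y'*Z);                    % tr(Y'Z)
    ip_vec      = Y(:)'*Z(:);                     % vec(Y)'vec(Z), column-major stacking (18)
    ip_hadamard = ones(1,p)*(Y.*Z)*ones(k,1);     % 1'(Y o Z)1, Hadamard product
    disp([ip_trace ip_vec ip_hadamard])           % all three agree to machine precision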


2.1.1.1 Frobenius’

When Z = Y ∈ R^{p×k} in (19), Frobenius’ norm is resultant;

    ‖Y‖_F^2 = ‖vec Y‖_2^2 = 〈Y, Y〉 = tr(Y^T Y) = δ(Y Y^T)^T 1
            = Σ_{i,j} Y_{ij}^2 = Σ_i λ(Y^T Y)_i = Σ_i σ(Y)_i^2    (25)

where λ(Y^T Y)_i is the i th eigenvalue of Y^T Y, and σ(Y)_i the i th singular value of Y. Were Y a normal matrix (§A.5.2), then σ(Y) = |λ(Y)| [45, §8.1] thus

    ‖Y‖_F^2 = Σ_i λ(Y)_i^2 = ‖λ(Y)‖_2^2    (26)

The converse (26) ⇒ normal Y also holds. [28, §2.5.4] Because the metrics

    ‖vec X − vec Y‖_2 = ‖X − Y‖_F    (27)

are equivalent and because vectorization (18) is a linear bijective map, then vector space R^{p×k} is isometrically isomorphic with vector space R^{pk} in the Euclidean sense and vec is an isometric isomorphism on R^{p×k} (but not on the vector space that is the range or nullspace associated with a particular matrix (§2.4, e.g., p.217)).^{2.9} Because of this Euclidean structure, all the known results from convex analysis in Euclidean space R^n carry over directly to the space of real matrices R^{p×k}.

The Frobenius norm is orthogonally invariant; meaning, for X, Y ∈ R^{p×k} and compatible orthonormal matrix U and orthogonal matrix Q,

    ‖U(X − Y)Q‖_F = ‖X − Y‖_F    (28)

2.9 An isometric isomorphism of a vector space is a linear bijective mapping T (one-to-one and onto [38, App.A1.2]) that preserves distance; id est, for all x, y ∈ dom T,

    ‖Tx − Ty‖ = ‖x − y‖

Unitary linear operator Q : R^n → R^n representing orthogonal matrix Q ∈ R^{n×n} (§B.5), for example, is an isometric isomorphism. Yet isometric operator T : R^2 → R^3 representing

    T = [1 0; 0 1; 0 0]

is injective on R^2 but not a surjective map [38, §1.6] to R^3.
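A minimal MATLAB sketch, with assumed dimensions and random data, illustrating the vec isometry (27) and the orthogonal invariance (28):

    % Check that vec is an isometry (27) and that the Frobenius norm is
    % orthogonally invariant (28).  Sizes and data are illustrative only.
    p = 5;  k = 3;
    X = randn(p,k);  Y = randn(p,k);
    d_vec  = norm(X(:) - Y(:));         % Euclidean norm of vectorized difference
    d_frob = norm(X - Y, 'fro');        % Frobenius norm
    [U,~] = qr(randn(p));               % orthogonal U (orthonormal columns)
    [Q,~] = qr(randn(k));               % orthogonal Q
    d_rot  = norm(U*(X - Y)*Q, 'fro');  % unchanged by rotation/reflection
    disp([d_vec d_frob d_rot])          % all equal to machine precision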


2.1.2 Symmetric matrices

Definition. Symmetric subspace. Define a subspace of R^{M×M}: the convex set of all symmetric M×M matrices;

    S^M ∆= { A ∈ R^{M×M} | A = A^T } ⊆ R^{M×M}    (29)

This subspace comprising symmetric matrices S^M is isomorphic [38, §2.8-8, §3.2-2]^{2.10} [46] with the vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. The orthogonal complement [26] [37] of S^M is

    S^{M⊥} ∆= { A ∈ R^{M×M} | A = −A^T } ⊂ R^{M×M}    (30)

the subspace of all antisymmetric matrices in R^{M×M}; id est,

    S^M ⊕ S^{M⊥} = R^{M×M}    (31)
                                                      △

All antisymmetric matrices are hollow by definition (have 0 main diagonal). Any square matrix A ∈ R^{M×M} can be written as the sum of its symmetric and antisymmetric parts: respectively,

    A = (1/2)(A + A^T) + (1/2)(A − A^T)    (32)

The symmetric part is orthogonal in R^{M^2} to the antisymmetric part; videlicet,

    tr( (A^T + A)(A − A^T) ) = 0    (33)

In the ambient space of real matrices, the antisymmetric subspace can be described

    S^{M⊥} ∆= { (1/2)(A − A^T) | A ∈ R^{M×M} } ⊂ R^{M×M}    (34)

because any matrix in S^M is orthogonal to any matrix in S^{M⊥}. Further confined to the ambient subspace of symmetric matrices, because of antisymmetry, S^{M⊥} would become trivial.

2.10 An isomorphism of a vector space is a transformation equivalent to a linear bijective mapping. The image and inverse image under the transformation operator are then called isomorphic vector spaces.
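The decomposition (32) and the orthogonality (33) are easy to confirm numerically; a minimal MATLAB sketch with an arbitrary random matrix (the size M is an illustrative assumption):

    % Split a square matrix into symmetric and antisymmetric parts (32)
    % and confirm their orthogonality (33).
    M = 4;
    A = randn(M);
    S = (A + A')/2;                        % symmetric part
    W = (A - A')/2;                        % antisymmetric part
    disp(norm(A - (S + W), 'fro'))         % 0 : decomposition is exact
    disp(trace((A' + A)*(A - A')))         % ~0 : parts are orthogonal in R^(M^2)
    disp(norm(diag(W)))                    % 0 : antisymmetric => hollow main diagonal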


2.1.2.1 Isomorphism on symmetric subspace

When a matrix is symmetric in S^M, we may still employ the vectorization transformation (18) to R^{M^2}; vec, an isometric isomorphism. We might instead choose to realize in the lower-dimensional subspace R^{M(M+1)/2} by ignoring redundant entries (below the main diagonal) during transformation. Such a realization would remain isomorphic but not isometric. Lack of isometry is a spatial distortion due now to disparity in metric between R^{M^2} and R^{M(M+1)/2}. To realize isometrically in R^{M(M+1)/2}, we must make a correction: For Y = [Y_ij] ∈ S^M we introduce the symmetric vectorization

    svec Y ∆= [ Y_11, √2 Y_12, Y_22, √2 Y_13, √2 Y_23, Y_33, ... , Y_MM ]^T ∈ R^{M(M+1)/2}    (35)

where all entries off the main diagonal have been scaled. Now for Z ∈ S^M,

    〈Y, Z〉 ∆= tr(Y^T Z) = vec(Y)^T vec Z = svec(Y)^T svec Z    (36)

Then because the metrics become equivalent

    ‖svec X − svec Y‖_2 = ‖X − Y‖_F    (37)

whenever X ∈ S^M, and because symmetric vectorization (35) is a linear bijective mapping, svec is an isometric isomorphism on the symmetric matrix subspace; in other words, S^M is isometrically isomorphic with R^{M(M+1)/2} in the Euclidean sense under the transformation svec.

The set of all symmetric matrices S^M forms a proper subspace in R^{M×M}, so for it there exists a standard orthonormal basis in isometrically isomorphic R^{M(M+1)/2};

    {E_ij ∈ S^M} = { e_i e_i^T ,                       i = j = 1 ... M
                     (1/√2)(e_i e_j^T + e_j e_i^T) ,   1 ≤ i < j ≤ M }    (38)

where M(M+1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M. Thus we have a basic orthogonal expansion for Y ∈ S^M,

    Y = Σ_{j=1}^{M} Σ_{i=1}^{j} 〈E_ij, Y〉 E_ij    (39)

whose coefficients

    〈E_ij, Y〉 = { Y_ii ,       i = 1 ... M
                   Y_ij √2 ,   1 ≤ i < j ≤ M }    (40)

correspond to entries of the symmetric vectorization (35).

2.1.2.2 Symmetric hollow matrices

Definition. Symmetric hollow subspace. [47] Define a subspace of R^{M×M}: the convex set of all symmetric M×M matrices having 0 main diagonal;

    S^M_0 ∆= { A ∈ S^M | δ(A) = 0 } ⊂ R^{M×M}    (41)

where the main diagonal of A ∈ R^{M×M} is denoted (§A.1)

    δ(A) ∈ R^M    (42)

This subspace comprising symmetric hollow matrices is isomorphic with subspace R^{M(M−1)/2}. The orthogonal complement of S^M_0 is

    S^{M⊥}_0 ∆= { A ∈ R^{M×M} | A = −A^T + 2δ^2(A) } ⊆ R^{M×M}    (43)

the subspace of all antisymmetric antihollow matrices in R^{M×M}; id est,

    S^M_0 ⊕ S^{M⊥}_0 = R^{M×M}    (44)

Yet defined instead as a proper subspace of S^M,

    S^M_0 ∆= { A ∈ S^M | δ(A) = 0 } ⊂ S^M    (45)

the orthogonal complement S^{M⊥}_0 of S^M_0 in ambient S^M

    S^{M⊥}_0 ∆= { A ∈ S^M | A = δ^2(A) } ⊆ S^M    (46)

is simply the set of all diagonal matrices; id est,

    S^M_0 ⊕ S^{M⊥}_0 = S^M    (47)
                                                      △

Any matrix A ∈ R^{M×M} can be written as the sum of its symmetric hollow and antisymmetric antihollow parts: respectively,

    A = ( (1/2)(A + A^T) − δ^2(A) ) + ( (1/2)(A − A^T) + δ^2(A) )    (48)

The symmetric hollow part is orthogonal in R^{M^2} to the antisymmetric antihollow part; videlicet,

    tr( ( (1/2)(A + A^T) − δ^2(A) ) ( (1/2)(A − A^T) + δ^2(A) ) ) = 0    (49)

In the ambient space of real matrices, the antisymmetric antihollow subspace is described

    S^{M⊥}_0 ∆= { (1/2)(A − A^T) + δ^2(A) | A ∈ R^{M×M} } ⊆ R^{M×M}    (50)

because any matrix in S^M_0 is orthogonal to any matrix in S^{M⊥}_0. Yet in the ambient space of symmetric matrices S^M, the antihollow subspace is nontrivial;

    S^{M⊥}_0 ∆= { δ^2(A) | A ∈ S^M } = { δ(u) | u ∈ R^M } ⊆ S^M    (51)

In anticipation of their utility with Euclidean distance matrices (EDMs) in §4, we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (35) for symmetric hollow matrices: For Y = [Y_ij] ∈ S^M_0,

    dvec Y ∆= √2 [ Y_12, Y_13, Y_23, Y_14, Y_24, Y_34, ... , Y_{M−1,M} ]^T ∈ R^{M(M−1)/2}    (52)


Like before, dvec is an isometric isomorphism on the symmetric hollow subspace.

The set of all symmetric hollow matrices S^M_0 forms a proper subspace in R^{M×M}, so for it there must be a standard orthonormal basis in isometrically isomorphic R^{M(M−1)/2};

    {E_ij ∈ S^M_0} = { (1/√2)(e_i e_j^T + e_j e_i^T) , 1 ≤ i < j ≤ M }    (53)

where M(M−1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M.
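A minimal MATLAB sketch of the scaled vectorizations svec (35) and dvec (52). The entry ordering below differs from (35) and (52), but being a permutation it leaves the isometries intact; dimensions and data are illustrative assumptions:

    % Isometry of svec on S^M (37) and of dvec on the hollow subspace S^M_0.
    M = 4;
    A = randn(M);  X = (A + A')/2;                  % two symmetric matrices
    B = randn(M);  Y = (B + B')/2;
    L = tril(true(M), -1);                          % strictly-lower-triangular mask
    svecX = [diag(X); sqrt(2)*X(L)];                % diagonal + scaled off-diagonal entries
    svecY = [diag(Y); sqrt(2)*Y(L)];
    disp([norm(svecX - svecY)  norm(X - Y,'fro')])  % equal: svec preserves distance
    Xh = X - diag(diag(X));  Yh = Y - diag(diag(Y));               % hollow parts
    disp([norm(sqrt(2)*(Xh(L) - Yh(L)))  norm(Xh - Yh,'fro')])     % dvec preserves distance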


2.2. HULLS 33Figure 2.2: <strong>Convex</strong> hull of a random list of points in R 3 . Some pointsfrom that generating list reside in the interior of this convex polyhedron.[48, <strong>Convex</strong> Polyhedron] (Avis-Fukuda-Mizukoshi)2.2 Hulls2.2.1 Affine dimension, affine hullAscribe the points in a list {x l ∈ R n , l=1... N} to the columns of matrix X ;X = [x 1 · · · x N ] ∈ R n×N (54)The affine dimension of any nonempty list (or set) in R n is the dimensionof the smallest affine set (empty set, point, line, plane, hyperplane (§2.3.2),subspace, R n ) that contains it. Affine dimension is the same as the dimensionof the subspace parallel to that affine set. [30, §1] [29, §A.2.1] In particular,we define the affine dimension r of the N-point list X as the dimension ofthe smallest affine set in Euclidean space R n that contains X ; r is a lowerbound sometimes called the embedding dimension [47] [49]:r ∆ = dim aff X (55)That affine set in which the points are embedded is unique and called theaffine hull [1, §2.1.2] [31, §2.1];A ∆ = aff {x l , l=1... N} = aff X = {Xa | a T 1 = 1} ⊆ R n (56)The subspace of all symmetric matrices S m , for example, is the affine hullof the cone of positive semidefinite matrices; (§2.6.6)aff S m + = S m (57)


Given some arbitrary set C and any x ∈ C,

    aff C = x + aff(C − x)    (58)

where aff(C − x) is a subspace. Affine transformations preserve affine hulls. Given any affine mapping T, [30, p.8]

    aff(T C) = T(aff C)    (59)

We analogize affine subset to subspace,^{2.11} defining it to be any nonempty affine set that is a subset of R^n and parallel to a subspace. All affine sets are convex.

2.2.2 Convex hull

The convex hull [29, §A.1.4] [1, §2.1.4] [30] of any bounded^{2.12} list (or set) of N points X ∈ R^{n×N} forms a unique convex polyhedron (§2.7.0.0.1) whose vertices constitute some subset of that list;

    P ∆= conv {x_l, l = 1 ... N} = conv X = {Xa | a^T 1 = 1, a ≽ 0} ⊆ R^n    (60)

The union of the relative interior and relative boundary of the polyhedron comprise the convex hull P, the smallest closed convex set that contains the list X; e.g., Figure 2.2. Given P, the generating list {x_l} is not unique.

Given some arbitrary set C,

    conv C ⊆ aff C = aff(cl C) = aff conv C    (61)

2.2.2.1 Comparison with respect to R^N_+

The notation a ≽ 0 means vector a belongs to the nonnegative orthant R^N_+, whereas a ≽ b denotes comparison of vector a to vector b on R^N with respect to the nonnegative orthant; id est, a ≽ b means a − b belongs to the nonnegative orthant. In particular, a ≽ b ⇔ a_i ≥ b_i ∀i. (198)

2.11 The popular term affine subspace is an oxymoron.
2.12 A set in R^n is bounded if and only if it can be contained in a Euclidean ball having finite radius. [50, §2.2] (confer §4.7.3.0.1)
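Definition (60) makes testing membership of a point y in conv X a small linear feasibility problem in the coefficient vector a. A minimal MATLAB sketch (requires linprog from the Optimization Toolbox; the data are illustrative assumptions):

    % Is y a convex combination of the columns of X?   y = Xa, 1'a = 1, a >= 0   (60)
    n = 2;  N = 6;
    X = randn(n,N);
    y = X*ones(N,1)/N;                        % the centroid, surely inside the hull
    f   = zeros(N,1);                         % no objective: pure feasibility
    Aeq = [X; ones(1,N)];  beq = [y; 1];      % Xa = y and 1'a = 1
    lb  = zeros(N,1);                         % a >= 0
    [a,~,exitflag] = linprog(f,[],[],Aeq,beq,lb,[]);
    disp(exitflag)                            % 1 signals a feasible a exists: y is in conv X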


2.2. HULLS 35Figure 2.3: A simplicial cone (§2.7.3.0.1) in R 3 whose boundary is drawntruncated; constructed using A∈ R 3×3 and C = 0 in (159). By the mostfundamental definition of a cone (§2.6.1), the entire boundary can be constructedfrom an aggregate of rays emanating exclusively from the origin.The extreme directions are the directions of the three edges (§2.5); they areconically and linearly independent for this cone. Because this set is polyhedral,the exposed directions are in one-to-one correspondence with theextreme directions; there are only three.2.2.2.1.1 Example. <strong>Convex</strong> hull of outer product. [51, §3] [52, §2.4]conv { XX T | X ∈ R n×k , X T X = I } = {A∈ S n | I ≽ A ≽ 0, 〈I , A 〉= k}(62)2.2.3 Conic hullIn terms of a finite-length point list (or set) arranged columnar in X ∈ R n×N(54), its conic hull is expressedK ∆ = cone {x l , l=1... N} = coneX = {Xa | a ≽ 0} ⊆ R n (63)The conic hull of any list forms a polyhedral cone [29, §A.4.3] (§2.7.1; e.g.,Figure 2.3); the smallest closed convex cone that contains the list.


Given some arbitrary set C, it is apparent

    conv C ⊆ cone C    (64)

2.2.3.1 Vertex-description

The constraints in (56), (60), and (63) respectively define an affine, convex, and conic combination of elements from the set or list. Whenever a Euclidean object can be described as some hull or span of a set of points, then that representation is loosely called a vertex-description.

2.3 Halfspace, Hyperplane

A two-dimensional affine set is called a plane. An (n−1)-dimensional affine set in R^n is called a hyperplane. [30] [29] Every hyperplane partially bounds a halfspace (which is convex but not affine).

2.3.1 Halfspaces H+ and H−

Euclidean space R^n is partitioned into two halfspaces by any hyperplane ∂H; id est, H− + H+ = R^n. The resulting (closed convex) halfspaces, both partially bounded by ∂H, may be described

    H− = {y | a^T y ≤ b} = {y | a^T (y − y_p) ≤ 0} ⊂ R^n    (65)
    H+ = {y | a^T y ≥ b} = {y | a^T (y − y_p) ≥ 0} ⊂ R^n    (66)

where nonzero vector a ∈ R^n is an outward-normal to the hyperplane partially bounding H− while an inward-normal with respect to H+. Visualization is easier if we say b = a^T y_p ∈ R. Then for any vector y − y_p that makes an obtuse angle with normal a, y will lie in the halfspace H− on one side (shaded in Figure 2.4) of the hyperplane, while acute angles denote y in H+ on the other side.

An equivalent more intuitive representation of a halfspace comes about when we consider all the points in R^n closer to point d than to point c or equidistant, in the Euclidean sense; from Figure 2.4,

    H− = {y | ‖y − d‖ ≤ ‖y − c‖}    (67)


Figure 2.4: Hyperplane illustrated ∂H is a line partially bounding halfspaces H− = {y | a^T y ≤ b} and H+ = {y | a^T y ≥ b} in R^2. Shaded is a rectangular piece of semi-infinite H− with respect to which vector a is outward-normal to bounding hyperplane; vector a is inward-normal with respect to H+. In this particular instance, H− contains nullspace N(a^T) (dashed line through origin) because b > 0. Hyperplane, halfspace, and nullspace are each drawn truncated. Points c and d are equidistant from hyperplane, and vector c − d is normal to it. ∆ is distance from origin to hyperplane.


This representation, in terms of proximity, is resolved with the more conventional representation of a halfspace (65) by squaring both sides of the inequality in (67);

    H− = { y | (c − d)^T y ≤ (‖c‖^2 − ‖d‖^2)/2 }
       = { y | (c − d)^T ( y − (c + d)/2 ) ≤ 0 }    (68)

A halfspace may be represented just as well using a matrix variable Y. For A, Y ∈ R^{m×n}, and b = 〈A, Y_p〉 ∈ R, (§2.1.1)

    H− = {Y ∈ R^{mn} | 〈A, Y〉 ≤ b} = {Y ∈ R^{mn} | 〈A, Y − Y_p〉 ≤ 0}    (69)
    H+ = {Y ∈ R^{mn} | 〈A, Y〉 ≥ b} = {Y ∈ R^{mn} | 〈A, Y − Y_p〉 ≥ 0}    (70)

2.3.1.1 PRINCIPLE 1: Halfspace-description of convex sets

The most fundamental principle in convex geometry follows from the geometric Hahn-Banach theorem [37, §5.12] [53, §1] that guarantees any closed convex set to be an intersection of halfspaces.

2.3.1.1.1 Theorem. Halfspaces. [1, §2.3.1] [30, §18] [29, §A.4.2(b)] [33, §2.4] A closed convex set in R^n is equivalent to the intersection of all halfspaces that contain it. ⋄

The intersection of multiple halfspaces may be represented using a matrix constant A;

    ∩_i H_i− = {y | A^T y ≼ b} = {y | A^T (y − y_p) ≼ 0}    (71)
    ∩_i H_i+ = {y | A^T y ≽ b} = {y | A^T (y − y_p) ≽ 0}    (72)

where b is now a vector, and the i th column of A is normal to a hyperplane ∂H_i partially bounding H_i. By the halfspaces theorem, intersections like this can describe interesting convex Euclidean objects such as polyhedra and cones, giving rise to the term halfspace-description.
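The equivalence between the proximity form (67) and the linear-inequality form (68) is easy to spot-check numerically; a minimal MATLAB sketch over random test points (c, d, and the samples are illustrative assumptions):

    % Check (67) <=> (68): points no farther from d than from c form the
    % halfspace with normal c - d passing through the midpoint (c + d)/2.
    c = [3; 1];  d = [1; 2];  K = 1000;
    Y = 3*randn(2,K);                                                  % random test points
    closer = sum((Y - d*ones(1,K)).^2) <= sum((Y - c*ones(1,K)).^2);   % (67), squared
    linear = (c - d)'*(Y - (c + d)/2*ones(1,K)) <= 0;                  % (68)
    disp(isequal(closer, linear))                                      % logical 1: identical sets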


Figure 2.5: Hyperplanes in R^2 (truncated); the panels show hyperplanes {y | a^T y = ±1} for normals a = [1; 1], b = [−1; −1], c = [−1; 1], and d = [1; −1]. Hyperplane movement in direction of normal increases inner product. This simple visual concept can be exploited to attain the analytic solution of some linear programs; e.g., [1, exer.4.8-4.20]


2.3.2 Hyperplane ∂H representations

Every hyperplane ∂H is an affine set parallel to an (n−1)-dimensional subspace of R^n; it is itself a subspace if and only if it contains the origin.

    dim ∂H = n − 1    (73)

Every hyperplane can be described as the intersection of complementary halfspaces; [30, §19]

    ∂H = H− ∩ H+ = {y | a^T y ≼ b, a^T y ≽ b} = {y | a^T y = b}    (74)

a halfspace-description. Assuming normal a ∈ R^n to be nonzero, then any hyperplane in R^n can be described as the solution set to the vector equation a^T y = b, illustrated in Figure 2.4 and Figure 2.5 for R^2;

    ∂H ∆= {y | a^T y = b} = {y | a^T (y − y_p) = 0} = {Zξ + y_p | ξ ∈ R^{n−1}} ⊂ R^n    (75)

All solutions y constituting the hyperplane are offset from the nullspace of a^T by the same constant vector y_p ∈ R^n that is any particular solution to a^T y = b; id est,

    y = Zξ + y_p    (76)

where the columns of Z ∈ R^{n×n−1} constitute a basis for the nullspace N(a^T).^{2.13}

Conversely, given any point y_p in R^n, the unique hyperplane containing it having normal a is the affine set ∂H (75) where b equals a^T y_p and R(Z) = N(a^T). The hyperplane dimension is apparent from the dimensions of Z; the hyperplane is parallel to the span of its columns.

2.3.2.1 Distance from origin to hyperplane

Given the (shortest) distance ∆ ∈ R_+ from the origin to a hyperplane having normal vector a, we can find its representation ∂H by dropping a perpendicular. The point thus found is the orthogonal projection of the origin on ∂H (§E.5.0.1.5), equal to a∆/‖a‖ if the origin is known a priori to belong to halfspace H− (Figure 2.4), or to −a∆/‖a‖ if the origin belongs to halfspace H+; id est, when H− ∋ 0

    ∂H = { y | a^T (y − a∆/‖a‖) = 0 } = { y | a^T y = ‖a‖∆ }    (77)

or when H+ ∋ 0

    ∂H = { y | a^T (y + a∆/‖a‖) = 0 } = { y | a^T y = −‖a‖∆ }    (78)

Knowledge of only distance ∆ and normal a thus introduces ambiguity into the hyperplane representation.

2.3.2.2 Matrix variable

Hyperplanes may be represented equally well using matrix variables instead. For A, Y ∈ R^{m×n}, and b = 〈A, Y_p〉 ∈ R, (§2.1.1)

    ∂H = {Y | 〈A, Y〉 = b} = {Y | 〈A, Y − Y_p〉 = 0} ⊂ R^{mn}    (79)

Vector a is normal to the hyperplane illustrated in Figure 2.4. Likewise in the case of matrix variables, for nonzero A we have

    A ⊥ ∂H in R^{mn}    (80)

2.3.2.3 Vertex-description of hyperplane

A hyperplane may be described as the affine hull of a minimal set of points {x_l} arranged columnar in a matrix X ∈ R^{n×n} (54):

    ∂H = aff{x_l ∈ R^n} | l = 1 ... n, dim aff{x_l} = n−1
       = aff X | dim aff X = n−1
       = x_1 + R{x_l − x_1} | l = 2 ... n, dim R{x_l − x_1} = n−1
       = x_1 + R(X − x_1 1^T) | dim R(X − x_1 1^T) = n−1    (81)

2.3.2.3.1 Affine independence, minimal set. For affine sets, a minimal set of points constituting a vertex-description is an affinely independent (a.i.) descriptive set and vice versa; {x_l, l = 1 ... n} is affinely independent if and only if {x_l − x_1, l = 2 ... n} is linearly independent (l.i.). [29, §A.1.3] [54, §3] We deduce

    l.i. ⇒ a.i.    (82)

2.13 We will later find this expression of y in terms of nullspace of a^T (more generally, of matrix A^T) to be a useful device for eliminating affine equality constraints, much as we did here.
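A quick numeric illustration of §2.3.2.1: for a hyperplane {y | a^T y = b} with b > 0 (so the origin lies in H−), the perpendicular foot a∆/‖a‖ from (77) is the projection of the origin and ∆ = b/‖a‖ is the shortest distance. The values below are illustrative assumptions:

    % Distance from origin to hyperplane {y | a'y = b} and projection of origin (77).
    a = [1; 2; 2];  b = 3;                    % b > 0, so the origin belongs to H-
    Delta = b/norm(a);                        % shortest distance from origin
    yfoot = a*Delta/norm(a);                  % orthogonal projection of origin on the hyperplane
    disp(a'*yfoot - b)                        % ~0 : the foot lies on the hyperplane
    disp(norm(yfoot) - Delta)                 % ~0 : its length equals the distance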


42 CHAPTER 2. CONVEX GEOMETRY(a)Yy paH +H −∂H −(b)Yy pãH −H +∂H +Figure 2.6: (a) Hyperplane ∂H − supporting closed set Y ∈ R 2 . (b) ∂H +unconventionally supporting Y . Tradition [29] [30] recognizes only positivenormal polarity in support function as in (84); id est, normal a figure (a).2.3.2.3.2 Vertex-description of halfspace. Logically consequent tohyperplane definition (81) comes a new representation of a halfspaceH = ⋃ζ≥0ζ x 1 + R{x l − x 1 } | l=2... n, dim R{x l − x 1 }=n−1= {ζ x 1 + R{x l − x 1 } | l=2... n, dim R{x l − x 1 }=n−1, ζ ≥0}(83)that is a union of parallel hyperplanes. Were x 1 normal to R{x l − x 1 } ,then this halfspace could be designated H + by our convention; H − wouldcorrespond with ζ ≤ 0.2.3.2.4 PRINCIPLE 2: Supporting hyperplaneThe second most fundamental principle of convex geometry also follows fromthe geometric Hahn-Banach theorem [37, §5.12] [53, §1] that guarantees existenceof at least one hyperplane in R n supporting a convex set (having


2.3. HALFSPACE, HYPERPLANE 43nonempty interior) 2.14 at each point on its boundary.2.3.2.4.1 Definition. Supporting hyperplane ∂H . The partial boundary∂H of a closed halfspace containing arbitrary set Y is called a supportinghyperplane ∂H to Y when it contains at least one point of Y . [30, §11] Specifically,given normal a ≠0 (belonging to H + by definition), the supportinghyperplane to Y at y p ∈ ∂Y [sic] is∂H − = { y | a T (y − y p ) = 0, y p ∈ Y , a T (z − y p ) ≤ 0 ∀z ∈ Y }= { y | a T y = sup{a T z |z∈Y} } (84)where normal a and set Y reside in opposite halfspaces. (Figure 2.6(a)) Thereal function σ Y (a) ∆ = sup{a T z |z∈Y} is called the support function of Y .An equivalent but non-traditional representation is obtained by reversingthe polarity of normal a ; (1166)∂H + = { y | ã T (y − y p ) = 0, y p ∈ Y , ã T (z − y p ) ≥ 0 ∀z ∈ Y }= { y | ã T y = − inf{ã T z |z∈Y} = sup{−ã T z |z∈Y} } (85)where normal ã and set Y now reside in H + . (Figure 2.6(b))When the supporting hyperplane contains only a single point of Y , thathyperplane is termed strictly supporting (and termed tangent to Y if thesupporting hyperplane is unique there [30, §18, p.169]).△There is no geometric difference 2.15 between supporting hyperplane ∂H +or ∂H − and an ordinary hyperplane ∂H coincident with them.2.3.2.5 PRINCIPLE 3: Separating hyperplaneThe third most fundamental principle of convex geometry again follows fromthe geometric Hahn-Banach theorem [37, §5.12] [53, §1] that guarantees existenceof a hyperplane separating two nonempty convex sets in R n whoserelative interiors are nonintersecting. Separation intuitively means each setbelongs to a halfspace on an opposing side of the hyperplane. There are twocases of interest:2.14 It is customary to speak of a hyperplane supporting set C but not containing C .[30, p.100]2.15 If vector-normal polarity is unimportant, we may instead represent a supporting hyperplaneby ∂H .


1) If the two sets intersect only at their relative boundaries, then there exists a separating hyperplane ∂H containing the intersection but containing no points relatively interior to either set. Conversely, if at least one of the two sets is open, then the existence of a separating hyperplane implies the two sets are nonintersecting. [1, §2.5.1]

2) A strictly separating hyperplane ∂H intersects the closure of neither set; its existence is guaranteed when the intersection of the closures is empty and at least one set is bounded. [29, §A.4.1]

2.4 Subspace representations

There are two common forms of expression for subspaces, both coming from elementary linear algebra: range form and nullspace form; a.k.a. vertex-description and halfspace-description.

The fundamental vector subspaces associated with a matrix A ∈ R^{m×n} [26, §3.1] are ordinarily related

   R(Aᵀ) ⊥ N(A) ,  N(Aᵀ) ⊥ R(A)    (86)

and of dimension:

   dim R(Aᵀ) = dim R(A) = rank A ≤ min{m, n}    (87)
   dim N(A) = n − rank A ,  dim N(Aᵀ) = m − rank A    (88)

From these four fundamental subspaces, the rowspace and range identify one form of subspace description (range form or vertex-description (§2.2.3.1)), respectively

   R(Aᵀ) ≜ { Aᵀy | y ∈ R^m } = { x ∈ R^n | Aᵀy = x , y ∈ R(A) }    (89)
   R(A) ≜ { Ax | x ∈ R^n } = { y ∈ R^m | Ax = y , x ∈ R(Aᵀ) }    (90)

while the nullspaces identify the second common form (nullspace form or halfspace-description (74)),

   N(A) ≜ { x ∈ R^n | Ax = 0 }    (91)
   N(Aᵀ) ≜ { y ∈ R^m | Aᵀy = 0 }    (92)
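The orthogonality and dimension relations (86)-(88) are easy to confirm numerically; the following MATLAB fragment is a minimal sketch (the rank-1 example matrix is assumed for illustration, not taken from the book):

    % Minimal sketch: verify (86)-(88) for a small example matrix A.
    A  = [1 2 3; 2 4 6];           % rank-1 matrix in R^(2x3)
    Z  = null(A);                  % orthonormal basis for N(A),  n - rank(A) columns
    Zt = null(A');                 % orthonormal basis for N(A'), m - rank(A) columns
    R  = orth(A);                  % orthonormal basis for R(A)
    Rt = orth(A');                 % orthonormal basis for R(A')
    disp(norm(Rt'*Z))              % ~0 : R(A') is orthogonal to N(A)   (86)
    disp(norm(R'*Zt))              % ~0 : R(A)  is orthogonal to N(A')  (86)
    disp([rank(A) size(Z,2) size(Zt,2)])   % 1 2 1, consistent with (87)(88)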


2.4. SUBSPACE REPRESENTATIONS 45The range forms (89) (90) are realized as the respective span of the columnvectors in matrices A T and A .The nullspace form (91) or (92) is the solution set to a linear equationsimilar to hyperplane definition (75). Yet because matrix A generally hasmultiple rows, halfspace-description N(A) is actually the intersection of asmany hyperplanes through the origin; for (91), each row of A is normal to ahyperplane while for (92), each row of A T is a normal.2.4.1 Subspace or affine set. ..Any particular vector subspace R p can be described as N(A) the nullspaceof some matrix A or as R(B) the range of some matrix B .More generally, we have the choice of expressing an n − m-dimensionalaffine subset in R n as the intersection of m hyperplanes, or as the offset spanof n − m vectors:2.4.1.1 as hyperplane intersectionAny affine subset A of dimension n−m can be described as an intersectionof m hyperplanes in R n ; given fat full-rank (rank = min{m,n}) matrixand vector b∈R m ,⎡a T1A =∆ ⎣ .a T m⎤⎦∈ R m×n (93)A ∆ = {x∈ R n | Ax=b} =m⋂ { }x | aTi x=b ii=1(94)a halfspace-description. (74)For example: The intersection of any two independent hyperplanes in R 3is a line, whereas three independent hyperplanes intersect at a point. In R 4 ,the intersection of two independent hyperplanes is a plane, whereas threehyperplanes intersect at a line, four at a point, and so on.From the result in §2.3.2.3, any affine subset A therefore also has a vertexdescription.


2.4.1.2 as span of nullspace basis

Alternatively, we may compute a basis for the nullspace of matrix A and then equivalently express the affine subset as its range plus an offset: Define

   Z ≜ basis N(A) ∈ R^{n×(n−m)}    (95)

so that AZ = 0. Then we have the vertex-description

   A = { Zξ + x_p | ξ ∈ R^{n−m} } ⊆ R^n    (96)

the offset span of n−m column vectors, where x_p is any particular solution to Ax = b .

2.4.2 Intersection of subspaces

The intersection of nullspaces associated with two matrices A ∈ R^{m×n} and B ∈ R^{k×n} can be expressed most simply as

   N(A) ∩ N(B) = N([ A ; B ]) ≜ { x ∈ R^n | [ A ; B ] x = 0 }    (97)

the nullspace of their row-wise concatenation.

Suppose the columns of a matrix Z constitute a basis for N(A) while the columns of a matrix W constitute a basis for N(BZ). Then [44, §12.4.2]

   N(A) ∩ N(B) = R(ZW)    (98)

If each basis is orthonormal, then the columns of ZW constitute an orthonormal basis for the intersection.

In the particular circumstance A and B are each positive semidefinite [55, §6], or in the circumstance A and B are two linearly independent dyads (§B.1.1), then

   N(A) ∩ N(B) = N(A + B) ,   A, B ∈ S^M_+   or   A + B = u_1 v_1ᵀ + u_2 v_2ᵀ  (l.i.)    (99)
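Both constructions above, the offset-span description (96) and the ZW basis (98) for an intersection of nullspaces, are one-liners numerically; a minimal MATLAB sketch under assumed example data (not from the book):

    % Minimal sketch: vertex-description (96) of the affine subset
    % {x | Ax = b}, and the ZW basis (98) for N(A) intersect N(B).
    A  = [1 1 1 1];  b = 1;        % one hyperplane in R^4
    Z  = null(A);                  % basis for N(A): AZ = 0
    xp = pinv(A)*b;                % a particular solution of Ax = b
    disp(norm(A*(Z*randn(3,1) + xp) - b))   % ~0 : any Z*xi + xp solves Ax = b

    B  = [1 -1 0 0];
    W  = null(B*Z);                % basis for N(B*Z)
    ZW = Z*W;                      % columns span N(A) intersect N(B)   (98)
    disp(norm([A; B]*ZW))          % ~0 : columns lie in both nullspaces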


2.5. EXTREME, EXPOSED 472.5 Extreme, ExposedDefinition. Extreme point. An extreme point x ε of a convex set Cis a point, belonging to its closure C [33, §3.3], that is not expressible as aconvex combination of points in C distinct from x ε ; id est, for x ε ∈ C andall x 1 ,x 2 ∈ C \x ε ,µx 1 + (1 − µ)x 2 ≠ x ε , µ ∈ [0, 1] (100)△In other words, x ε is an extreme point of C if and only if x ε is not a pointrelatively interior to any line segment in C . [35, §2.10]The set consisting of a single point C ={x ε } is itself an extreme point.Theorem. Extreme existence. [30, §18.5.3] [23, §II.3.5] A nonemptyclosed convex set containing no lines has at least one extreme point. ⋄Definition. Face, edge. [29, §A.2.3]• A face F of convex set C is a convex subset F ⊆ C such that every closedline segment x 1 x 2 in C , having an interior-point x∈rel intx 1 x 2 in F ,has both endpoints in F . The zero-dimensional faces of C constituteits extreme points. The empty set and C itself are conventional facesof C . [30, §18]• All faces F are extreme sets by definition; id est, for F ⊆ C and allx 1 ,x 2 ∈ C \F ,µx 1 + (1 − µ)x 2 /∈ F , µ ∈ [0, 1] (101)• A one-dimensional face of a convex set is called an edge.△The dimension of a face is the penultimate number of affinely independentpoints (§2.3.2.3.1) belonging to it;dim F = sup dim{x 2 − x 1 , x 3 − x 1 , ... x ρ − x 1 | x i ∈ F , i=1... ρ} (102)ρ


48 CHAPTER 2. CONVEX GEOMETRYABCDFigure 2.7: Closed convex set in R 2 . Point A is exposed hence extreme.Point B is extreme but not an exposed point. Point C is exposed and extreme.Point D is neither an exposed or extreme point although it belongs to a onedimensionalexposed face. [29, §A.2.4] [31, §3.6] Closed face AB is exposed;a facet. The arc is not a conventional face, yet it is composed entirely ofextreme points. The union of rotations of this entire set about its verticaledge produces another set in three dimensions having no edges; the same isfalse for rotation about the horizontal edge.The point of intersection in C with a strictly supporting hyperplane identifiesan extreme point, but not vice versa. The nonempty intersection ofany supporting hyperplane with C identifies a face, in general, but not viceversa. To acquire a converse, the concept exposed face requires introduction:2.5.1 Exposure2.5.1.0.1 Definition. Exposed face, exposed point, vertex, facet.[29, §A.2.3, A.2.4]• F is an exposed face of an n-dimensional convex set C iff there is asupporting hyperplane ∂H to C such thatF = C ∩ ∂H (103)Only faces of dimension −1 through n−1 can be exposed by a hyperplane.• An exposed point, the definition of vertex, is equivalent to a zerodimensionalexposed face; the point of intersection with a strictly supportinghyperplane.


2.5. EXTREME, EXPOSED 49• A facet is an (n − 1)-dimensional exposed face of an n-dimensionalconvex set C ; in one-to-one correspondence with the (n−1)-dimensionalfaces. 2.16• {exposed points} = {extreme points}{exposed faces} ⊆ {faces}△2.5.1.1 Density of exposed pointsFor any closed convex set C , its exposed points constitute a dense subset ofits extreme points; [30, §18] [56] [31, §3.6, p.115] dense in the sense [48] thatclosure of that subset yields the set of extreme points.For the convex set illustrated in Figure 2.7, point B cannot be exposedbecause it relatively bounds both the facet AB and the closed quarter circle,each bounding the set. Since B is not relatively interior to any line segmentin the set, then B is an extreme point by definition. Point B may be regardedas the limit of some sequence of exposed points beginning at C .2.5.1.2 Face transitivity and algebraFaces enjoy a transitive relationship. If F 1 is a face (an extreme set) of F 2which, in turn, is a face of F 3 , then it is always true that F 1 is a face ofF 3 . [30, §18] [57, def.115/6, p.358] For example, any extreme point of F 2 isan extreme point of F 3 . (The parallel statement for exposed faces is false.)Yet it is erroneous to presume that a face, of dimension 1 or more, consistsentirely of extreme points, nor is a face of dimension 2 or more entirelycomposed of edges, and so on.For the polyhedron in R 3 from Figure 2.2, for example, the nonemptyfaces exposed by a hyperplane are the vertices, edges, and facets; there areno more. The zero-, one-, and two-dimensional faces are in one-to-one correspondencewith the exposed faces in that example.Define the smallest face F of a convex set C containing some element G :F(C ∋G) (104)2.16 This coincidence occurs simply because the facet’s dimension is the same as the dimensionof the supporting hyperplane exposing it.


50 CHAPTER 2. CONVEX GEOMETRYAn affine set has no faces except itself and the empty set. The smallest faceof the intersection of C with an affine set A is [52, §2.4]F((C ∩ A)∋G) = F(C ∋G) ∩ A (105)2.5.1.3 BoundaryThe classical definition of boundary of a set C presumes nonempty interior:∂ C = C \ int C (5)More suitable for the study of convex sets is the relative boundary[29, §A.2.1.2]which is conventionally equivalent to:rel∂C = C \ rel int C (106)2.5.1.3.1 Definition. Conventional boundary of convex set. [29, §C.3.1]The relative boundary ∂ C of a nonempty convex set C is the union of all theexposed faces of C .△Equivalence of these two definitions comes about because it is conventionallypresumed that any supporting hyperplane, central to the definitionof exposure, does not contain C . [30, p.100]A bounded convex polyhedron (§2.7.0.0.1) having nonempty interior, forexample, in R has a boundary constructed from two points, in R 2 fromat least three line segments, in R 3 from convex polygons, while a convexpolychoron (a bounded polyhedron in R 4 [48]) has a boundary constructedfrom three-dimensional convex polyhedra.By Definition 2.5.1.3.1, an affine set has no relative boundary. Analogousto the result (105) for faces of a convex set C intersecting an affine set A ,rel ∂(C ∩ A) = rel∂(C) ∩ A (107)


2.6. CONES 51X(a)00(b)XFigure 2.8: (a) Two-dimensional non-convex cone drawn truncated. Boundaryof this cone is itself a cone. [37, §2.4] (b) This convex cone (drawntruncated) is a line through the origin in any dimension.X0Figure 2.9: Truncated non-convex cone in R 2 . Boundary is also a cone.[37, §2.4]2.6 ConesDefinition. Ray. The one-dimensional set{ζΓ + B | ζ ≥ 0, Γ ≠ 0} ⊂ R n (108)defines a half-line called a ray in direction Γ∈ R n having base B ∈ R n . WhenB=0, a ray is the conic hull of direction Γ ; hence a convex cone.△The conventional boundary of a single ray, base 0, in any dimension isthe origin because that is the union of all exposed faces not containing theentire set.


52 CHAPTER 2. CONVEX GEOMETRYXXFigure 2.10: Truncated non-convex cone X = {x ∈ R 2 | x 1 ≥ x 2 , x 1 x 2 ≥ 0}.Boundary is also a cone. [37, §2.4] Cartesian axes drawn for reference.2.6.1 ConeA set X is called, simply, cone if and only ifΓ ∈ X ⇒ ζΓ ∈ X for all ζ ≥ 0 (109)An example of such a cone is the union of two opposing quadrants; e.g.,X = { x∈ R 2 | x 1 x 2 ≥0 } which is not convex. [32, §2.5] Similar examples areshown in Figure 2.8 and Figure 2.10.Not all cones are necessarily convex, but they can all be defined by anaggregate of rays emanating exclusively from the origin.Hence all cones contain the origin and are unbounded, excepting thesimplest cone {0}. The empty set ∅ is not a cone, but [31, §2.1]cone ∅ ∆ = {0} (110)2.6.2 <strong>Convex</strong> coneWe call the set K ⊆ R M a convex cone iffΓ 1 , Γ 2 ∈ K ⇒ ζΓ 1 + ξΓ 2 ∈ K for all ζ,ξ ≥ 0 (111)


Figure 2.11: Not a cone, ironically, the three-dimensional flared horn (with or without its interior) resembling the mathematical symbol ≻ denoting cone membership and partial order.

Obvious from this definition, ζΓ_1 ∈ K and ξΓ_2 ∈ K for all ζ, ξ ≥ 0. The set K is convex since, for any particular ζ, ξ ≥ 0,

   µζΓ_1 + (1 − µ)ξΓ_2 ∈ K   ∀ µ ∈ [0, 1]    (112)

because µζ, (1 − µ)ξ ≥ 0. Obviously,

   {X} ⊃ {K}    (113)

the set of all convex cones is a proper subset of all cones. The set of convex cones is a narrower but more familiar class of cone, any member of which can be equivalently described as the intersection of a possibly (but not necessarily) infinite number of hyperplanes (through the origin) and halfspaces whose bounding hyperplanes pass through the origin; a halfspace-description (§2.3). The interior of a convex cone is possibly empty.

Familiar examples of convex cones include an unbounded ice-cream cone (a.k.a. second-order, quadratic, or Lorentz cone [1, exmp.2.3 & 2.25]) united with its interior,

   K_l = { [x ; t] ∈ R^n × R | ‖x‖_2 ≤ t }    (114)

and any polyhedral cone (§2.7.1); e.g., any orthant generated by the Cartesian axes (§2.1.0.7.1). Esoteric examples of convex cones include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R , any halfspace partially bounded by a hyperplane through the origin, the positive semidefinite


54 CHAPTER 2. CONVEX GEOMETRYcone S M + (120), the cone of Euclidean distance matrices EDM N (335) (§5),any subspace, and R n .More Euclidean objects are cones, it seems, than are not.(confer Figure 2.8 - Figure 2.11)Theorem. Cone intersection. [30, §2, §19] The intersection of an arbitrarycollection of convex cones is a convex cone. The intersection of a finitenumber of polyhedral cones (§2.7.1, Figure 2.16, p.71) is polyhedral.⋄The property pointedness is associated with a convex cone.2.6.2.0.1 Definition. Pointed convex cone. (confer §2.7.2.1.1)A convex cone K is pointed iff it contains no line. Equivalently, K is notpointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K .[1, §2.4.1] If the origin is an extreme point of K or, equivalently, ifthen K is pointed, and vice versa. [31, §2.10]K ∩ −K = {0} (115)Thus the simplest convex cone K = {0} ⊆ R n is pointed by convention,but has empty interior.If closed convex K is not pointed, then it has no extreme point. Yeta pointed closed convex cone has only one extreme point; it resides at theorigin. [33, §3.3]From the cone intersection theorem it follows that an intersection of convexcones is pointed if at least one of the cones is.△2.6.2.0.2 Definition. Proper cone: [1, §2.4.1] A cone that is• convex• closed• pointed• and has nonempty interior.△Examples of proper cones are the positive semidefinite cone S M + in theambient space of symmetric matrices (§2.6.6), the nonnegative real line R +in vector space R , and any orthant in R n .
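Membership to the Lorentz cone (114) and the pointedness criterion (115) both reduce to one-line numerical checks; a minimal MATLAB sketch with assumed example data (not from the book):

    % Minimal sketch: membership of [x; t] to the second-order (Lorentz)
    % cone (114), and a consequence of pointedness (115): for nonzero
    % [x; t] in K, the negated point cannot also belong to K.
    x = [3; 4];  t = 6;
    inLorentz        = (norm(x,2) <=  t)    % true : ||x|| = 5 <= 6
    negatedInLorentz = (norm(-x,2) <= -t)   % false, as pointedness requires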


2.6. CONES 552.6.3 Cone boundaryEvery hyperplane supporting a convex cone contains the origin. [29, §A.4.2]Because any supporting hyperplane to a convex cone must therefore be itselfa cone, then from the cone intersection theorem it follows:2.6.3.0.1 Lemma. Cone faces. [23, §II.8]Each nonempty exposed face of a convex cone is a convex cone. ⋄2.6.3.0.2 Theorem. Proper-cone boundary. Suppose a nonzero pointΓ lies on the boundary ∂K of proper cone K in R n . Then it follows that theray {ζΓ | ζ ≥ 0} also belongs to ∂K .⋄Proof. By virtue of its propriety, a proper cone guarantees the existenceof a strictly supporting hyperplane at the origin. [30, Cor.11.7.3] 2.17 Hencethe origin belongs to the boundary of K because it is the zero-dimensionalexposed face. The origin belongs to the ray through Γ , and the ray belongsto K by definition (109). By the cone faces lemma, each and every nonemptyexposed face must include the origin. Hence the closed line segment 0Γ mustlie in an exposed face of K because both endpoints do by Definition 2.5.1.3.1.That means there exists a supporting hyperplane ∂H to K containing 0Γ .So the ray through Γ belongs both to K and to ∂H . ∂H must thereforeexpose a face of K that contains the ray; id est,{ζΓ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K (116)Proper cone {0} in R 0 has no boundary (106) becauserel int{0}={0}. (4)The boundary of any proper cone in R is the origin.The boundary of any proper cone whose dimension exceeds 1 can beconstructed entirely from an aggregate of rays emanating exclusively fromthe origin.2.17 Rockafellar’s corollary yields a supporting hyperplane at the origin to any convex conein R n not equal to R n .


56 CHAPTER 2. CONVEX GEOMETRY2.6.4 Extreme directionThe property extreme direction arises naturally in connection with thepointed closed convex cone K ⊂ R n , being analogous to extreme point.[30, §18, p.162] 2.18 An extreme direction Γ ε of pointed K corresponds to anedge that is a ray emanating from the origin. 2.19 Nonzero direction Γ ε inpointed K is extreme if and only if,ζ 1 Γ 1 +ζ 2 Γ 2 ≠ Γ ε ∀ ζ 1 , ζ 2 ≥ 0, ∀ Γ 1 , Γ 2 ∈ K\{ζΓ ε ∈ K | ζ ≥0} (117)In words, an extreme direction in a pointed closed convex cone is the directionof a ray that cannot be expressed as a conic combination of directions of raysin the cone distinct from the extreme ray.By (64), extreme direction Γ ε is not a point relatively interior to anyline segment in K\{ζΓ ε ∈ K | ζ ≥ 0} . An extreme direction is unique,but its vector representation Γ ε is not because any positive scaling of itproduces another vector in the same (extreme) direction. Thus, by analogy,the corresponding extreme ray {ζΓ ε | ζ ≥ 0} is not a ray relatively interiorto any plane segment 2.20 in K .The extreme directions of the positive semidefinite cone (§2.6.6),for example, comprise the set of all symmetric rank-one matrices{zz T ∈ S M | ‖z‖=1}. [55, §6] [58, §III]If closed convex K is not pointed, then it has no extreme directions andno vertex. [55, §1] Conversely, pointed closed convex K is the convex hull ofits extreme directions and vertex. [30, §18, p.167] That is the practical utilityof extreme direction; to facilitate construction of polyhedral sets, apparentfrom the extremes theorem:2.6.4.0.1 Theorem (Klee). Extremes. [31, §3.6] [30, §18, p.166](confer §2.2.2, §2.7.2) Any closed convex set containing no lines can beexpressed as the convex hull of all its extreme points and directions.⋄2.18 Indeed, Rockafellar suggests the mnemonic “extreme point at infinity”.2.19 An edge of a convex cone is not necessarily a ray. A convex cone may contain an edgethat is a line; e.g., a wedge-shaped polyhedral cone (K ∗ in Figure 2.21, p.108).2.20 A planar fragment; in this context, a planar cone.


2.6. CONES 57It follows that any element of a convex set containing no lines maybe expressed as a linear combination of its extreme elements; e.g.,Example 2.6.6.3.2.2.6.4.1 GeneratorsWhen the extremes theorem applies, the extreme points and directions aretermed generators of a convex set. An arbitrary collection of generators fora convex set includes its extreme elements as a subset; the set of all extremeelements of a convex set is a minimal set of generators for that set. Moregenerally, generators for a convex set comprise any collection of points anddirections whose convex hull constructs the set.When the convex set under scrutiny is a pointed closed convex cone,conic combination of the generators during its construction is implicit:Example. Application of extremes theorem. Given an extreme pointat the origin and N extreme directions, denoting the i th extreme directionby Γ i ∈ R n , then the convex hull is (60)P = { [0 Γ 1 Γ 2 · · · Γ N ]aζ | a T 1 = 1, a ≽ 0, ζ ≥ 0 }{= [Γ1 Γ 2 · · · Γ N ] aζ | a T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }{= [Γ1 Γ 2 · · · Γ N ] b | b ≽ 0 } (118)⊂ R na closed convex set that is simply a conic hull like (63).2.6.5 Exposed directionDefinition. Exposed direction, exposed point of a pointed convex cone.[30, §18] (confer §2.5.1.0.1)• When a convex cone has a vertex, an exposed point, it resides at theorigin; there can be only one.• In the closure of a pointed convex cone, an exposed direction is thedirection of a one-dimensional exposed face that is a ray emanatingfrom the origin.• {exposed directions} = {extreme directions}△


58 CHAPTER 2. CONVEX GEOMETRYBCADFigure 2.12: Four rays on boundary of conic hull of closed convex set fromFigure 2.7 lifted to R 3 . Properties of extreme points carry over to extremedirections. [30, §18] Ray through point A is exposed hence extreme. Extremedirection B on cone boundary is not an exposed direction, although it belongsto the exposed face cone{A,B} . Extreme ray through C is exposed. PointD is neither an exposed or extreme direction although it belongs to a twodimensionalexposed face of the conic hull.0It follows from Lemma 2.6.3.0.1 for any pointed closed convex cone, thereis one-to-one correspondence of one-dimensional exposed faces with exposeddirections; id est, there is no one-dimensional exposed face that is not a raybase 0.The pointed closed convex cone EDM 2 , for example, is a ray in isomorphicR whose relative boundary (§2.5.1.3.1) is the origin. The conventionally exposeddirections of EDM 2 constitute the empty set ∅ ⊂ {extreme direction}.2.6.5.1 Connection between boundary and extremes2.6.5.1.1 Theorem. Exposed. [30, §18.7] (confer §2.6.4.0.1)Any closed convex set C containing no lines can be expressed as the closureof the convex hull of all its exposed points and directions.⋄


2.6. CONES 59From Theorem 2.6.4.0.1,rel∂C = C \ rel int C (106)= conv{exposed points and directions} \ rel int C= conv{extreme points and directions} \ rel int C(119)Thus each and every extreme point and direction of a convex set containingno lines (a set that is not a half-line) resides on its relative boundary becauseextreme points and directions do not belong to the relative interior bydefinition.The relationship between extreme sets and the relative boundary actuallygoes deeper: Any face F of convex set C (that is not C itself) belongs torel ∂ C , so dim F < dim C . [30, §18.1.3]2.6.5.1.2 Converse caveatYet this result does not mean each and every extreme point and directionis necessarily exposed, as might be erroneously inferred from the conventionalboundary definition (§2.5.1.3.1); although it does mean each and everyextreme point and direction belongs to some exposed face.Arbitrary points residing on the relative boundary of a convex set arenot necessarily exposed or extreme points. Similarly, the direction of anarbitrary ray, base 0, on the boundary of a convex cone is not necessarily anexposed or extreme direction. For the convex cone illustrated in Figure 2.3,for example, there are three two-dimensional exposed faces constituting theentire boundary, each composed of an infinity of rays. Yet there are onlythree exposed directions.Neither is an extreme direction on the boundary of a pointed convex conenecessarily an exposed direction. Lift the two-dimensional set in Figure 2.7,for example, into three dimensions such that no two points in the set arecollinear with the origin. Then its conic hull can have an extreme direction Bon the boundary that is not an exposed direction, illustrated in Figure 2.12.


2.6.6 Positive semidefinite (PSD) cone

Definition. Positive semidefinite (PSD) cone. The set of all symmetric positive semidefinite matrices of particular dimension M is called the positive semidefinite cone:

   S^M_+ ≜ { A ∈ S^M | A ≽ 0 }
        = { A ∈ S^M | yᵀA y ≥ 0 , ‖y‖ = 1 }
        = ⋂_{‖y‖=1} { A ∈ S^M | 〈yyᵀ, A〉 ≥ 0 }    (120)

formed by the intersection of an infinite number of halfspaces (§2.3.1.1) in variable A ,^2.21 each halfspace having partial boundary containing the origin in isometrically isomorphic R^{M(M+1)/2}. It is a unique immutable proper cone in the ambient space of symmetric matrices S^M.

The positive definite (full-rank) matrices comprise the cone interior,^2.22

   int S^M_+ = { A ∈ S^M | A ≻ 0 }    (121)

while all singular positive semidefinite matrices (having at least one 0 eigenvalue) reside on the cone boundary (e.g., Figure 2.13); (§A.7.4)

   ∂ S^M_+ = { A ∈ S^M_+ | 〈zzᵀ, A〉 = 0 for any ‖z‖ = 1 }
           = { A ∈ S^M | min{ λ(A)_i , i = 1 ... M } = 0 }    (122)

where λ(A) ∈ R^M holds the eigenvalues of A . The only symmetric positive semidefinite matrix having M 0-eigenvalues resides at the origin. (§A.7.2.0.1) △

2.21 In higher dimensions, this cone is non-polyhedral because it is the intersection of an infinite number of halfspaces whose partial boundaries have independent normals yyᵀ. Because yᵀA y = yᵀAᵀy , matrix A is almost always assumed symmetric. (§A.2.1)
2.22 The remaining inequalities in (120) also become strict for membership to the cone interior.
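The boundary characterization (122), via the minimum eigenvalue, doubles as a practical membership test for the PSD cone; a minimal MATLAB sketch with an assumed example matrix (not from the book):

    % Minimal sketch: classify a symmetric matrix relative to S^M_+
    % using its eigenvalues, per (120)-(122).
    A = [2 1; 1 2];                    % example symmetric matrix
    lambda = eig((A + A')/2);          % symmetrize defensively, then eigenvalues
    tol = 1e-12;
    isMember   = all(lambda >= -tol)                 % A in S^M_+           (120)
    isInterior = all(lambda >   tol)                 % A positive definite  (121)
    isBoundary = isMember && (min(lambda) <= tol)    % singular PSD         (122)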


Figure 2.13 (axes α , √2 β , γ of the matrix variable [ α β ; β γ ] under svec): Truncated boundary of PSD cone in S² plotted in isometrically isomorphic R³ (via svec, (35)); courtesy, Alexandre W. d’Aspremont. Plotted is the 0-contour of minimum eigenvalue (122). Entire boundary can be constructed from an aggregate of rays (§2.6) emanating exclusively from the origin, { ζ [ z_1²  √2 z_1 z_2  z_2² ]ᵀ | ‖z‖ = 1 , ζ ≥ 0 }. In this dimension, each and every ray on the boundary is an extreme direction, but that is not the case in any higher dimension (confer Figure 2.3). (In any dimension, an extreme direction is a symmetric dyad; yyᵀ.) PSD cone geometry is not as simple in higher dimensions [23, §II.12], although for real matrices it is self-dual (199) in the ambient space of symmetric matrices. [58, §II] The PSD cone has no two-dimensional faces in any dimension, and its only extreme point resides at the origin.
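The svec coordinates used in Figure 2.13 (and throughout for visualizing S²_+) are reproducible in a few lines; a minimal MATLAB sketch assuming the √2-weighted off-diagonal convention of (35), under which svec is an isometry onto R^{M(M+1)/2}:

    % Minimal sketch: svec for S^2 with sqrt(2)-weighted off-diagonal,
    % so that <A,B> = svec(A)'*svec(B).
    A = [2 1; 1 3];  B = [1 0; 0 5];
    svec2 = @(X) [X(1,1); sqrt(2)*X(1,2); X(2,2)];
    disp(trace(A*B))             % matrix inner product <A,B> = 17
    disp(svec2(A)'*svec2(B))     % identical value computed in R^3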


62 CHAPTER 2. CONVEX GEOMETRY2.6.6.1 MembershipObserve the notation A ≽ 0 ; meaning, 2.23 matrix A is symmetric and belongsto the positive semidefinite cone in the subspace of symmetric matrices,whereas A ≻ 0 denotes membership to that cone’s interior. (§2.8.2)2.6.6.2 PSD cone is convexThe set of all positive semidefinite matrices forms a convex cone in the ambientspace of symmetric matrices because any pair satisfies definition (111);[28, §7.1] videlicet, for all ζ 1 , ζ 2 ≥ 0 and each and every A 1 , A 2 ∈ S M ,ζ 1 A 1 + ζ 2 A 2 ≽ 0 ⇐ A 1 ≽ 0, A 2 ≽ 0 (123)a fact easily verified by the definitive test for positive semidefiniteness of asymmetric matrix (§A):A ≽ 0 ⇔ x T Ax ≥ 0 for each and every ‖x‖ = 1 ; (124)id est, for A 1 , A 2 ≽ 0 and each and every ζ 1 , ζ 2 ≥ 0,ζ 1 x T A 1 x + ζ 2 x T A 2 x ≥ 0 for each and every normalized x ∈ R M . (125)The convex cone S M + is more easily visualized in the isomorphic vectorspace R M(M+1)/2 whose dimension is the number of free variables in a symmetricM×M matrix. When M = 2 the PSD cone is semi-infinite in expansein R 3 , having boundary illustrated in Figure 2.13. When M = 3 the PSDcone is six-dimensional, and so on.2.6.6.3 Faces of PSD cone and their dimension versus rankEach and every face of the positive semidefinite cone, having dimension lessthan that of the cone, is exposed. [59, §6] [24, §2.3.4] Because each and everyface of the positive semidefinite cone contains the origin (§2.6.3.0.1), eachface belongs to a subspace of the same dimension.2.23 For matrices, the notation A ≽ B denotes comparison on S M with respect to thepositive semidefinite cone; id est, A ≽B ⇔ A −B ∈ S M + , a generalization of comparisonon the real line. The symbol ≥ is reserved for scalar comparison on the real line R withrespect to the nonnegative real line R + as in a T y ≥ b, while a ≽ b denotes comparisonof vectors on R M with respect to the nonnegative orthant R M + .


Given positive semidefinite matrix A ∈ S^M_+ , define F( S^M_+ ∋ A ) as the smallest face of S^M_+ containing A . Then, [50, §31.5.3] [52, §2.4]

   F( S^M_+ ∋ A ) = { X ∈ S^M_+ | N(X) ⊇ N(A) }    (126)

which is isomorphic with the convex cone S^{rank A}_+ . Thus the dimension of the smallest face containing given matrix A is

   dim F( S^M_+ ∋ A ) = rank(A)(rank(A) + 1)/2    (127)

in isomorphic R^{M(M+1)/2} , and each and every face of S^M_+ is isomorphic with a positive semidefinite cone of dimension ≤ M(M+1)/2 . Observe not all dimensions are represented, and the only zero-dimensional face is the origin. The PSD cone has no facets, for example.

For the positive semidefinite cone in isometrically isomorphic R³ depicted in Figure 2.13, for example, rank 2 matrices belong to the interior of the face having dimension 3 (the entire closed cone), while rank 1 matrices belong to the relative interior of a face having dimension 1 (the boundary constitutes all the one-dimensional faces, in this dimension, which are rays emanating from the origin), and the only rank 0 matrix is the point at the origin (the zero-dimensional face).

2.6.6.3.1 Extreme directions of the PSD cone

Because the positive semidefinite cone is pointed (§2.6.2.0.1), there is a one-to-one correspondence of one-dimensional faces with extreme directions in any dimension M ; id est, from the mentioned one-to-one correspondence of extreme and exposed faces for S^M_+ and the cone faces lemma (§2.6.3.0.1) it follows there is no one-dimensional face of the PSD cone that is not a ray emanating from the origin. Symmetric dyads, therefore, constitute the set of all extreme directions.

Each and every extreme direction yyᵀ makes the same angle with the identity matrix in isomorphic R^{M(M+1)/2} , dependent only on dimension; videlicet, [49, p.162]

   ∡(yyᵀ, I) = arccos( 〈yyᵀ, I〉 / (‖yyᵀ‖_F ‖I‖_F) ) = arccos( 1/√M )   ∀ y ∈ R^M    (128)
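Both the face-dimension formula (127) and the constant angle (128) are easy to confirm numerically; a minimal MATLAB sketch with assumed random data (not from the book):

    % Minimal sketch: check face dimension (127) and constant angle (128)
    % for an extreme direction yy' of S^M_+.
    M = 4;
    y = randn(M,1);  Y = y*y';          % an extreme direction (symmetric dyad)
    r = rank(Y);                        % = 1
    dimF = r*(r+1)/2                    % dimension of smallest face (127): 1
    angleWithI = acos( trace(Y*eye(M)) / (norm(Y,'fro')*norm(eye(M),'fro')) )
    acos(1/sqrt(M))                     % same value, per (128)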


Figure 2.14 (two panels; axes α , √2 β for svec S²_+ with matrix variable [ α β ; β γ ]): (a) Projection of the PSD cone S²_+ , truncated above γ = 1, on the αβ-plane in isometrically isomorphic R³. View is from above with respect to Figure 2.13. (b) Truncated above γ = 2. From these plots we may infer, for example, that the line { [ 0  1/√2  γ ]ᵀ | γ ∈ R } intercepts the PSD cone at some large value of γ ; in fact, γ = ∞ .


2.6.6.3.2 Example. Positive semidefinite matrix from extreme directions. Diagonalizability (§A.5) of symmetric matrices yields the following results: Any symmetric positive semidefinite matrix (845) can be written in the form

   A = ∑_i λ_i z_i z_iᵀ = ∑_i a_i a_iᵀ ≽ 0 ,   λ ≽ 0    (129)

a conic combination of extreme directions (z_i z_iᵀ or a_i a_iᵀ), where λ is a vector of eigenvalues.

If we limit consideration to all symmetric positive semidefinite matrices bounded such that tr A = 1, then any matrix from that set may be expressed as a convex combination of extreme directions;

   A = ∑_i λ_i z_i z_iᵀ ,   1ᵀλ = 1 ,  λ ≽ 0    (130)

2.6.6.3.3 Example. Sets from the PSD cone. The set

   C = { X ∈ S^n × x ∈ R^n | X ≽ xxᵀ }    (131)

is convex because it has a Schur form; (§A.4)

   X − xxᵀ ≽ 0  ⇔  f(X, x) ≜ [ X  x ; xᵀ  1 ] ≽ 0    (132)

Set C is the inverse image (§2.1.0.10.2) of S^{n+1}_+ under the affine mapping f . The set { X ∈ S^n × x ∈ R^n | X ≼ xxᵀ } is not convex, in contrast, having no Schur form. Yet for fixed x = x_p , the set

   { X ∈ S^n | X ≼ x_p x_pᵀ }    (133)

is simply the negative semidefinite cone shifted to x_p x_pᵀ .

2.6.6.3.4 Example. Inverse image of the PSD cone. Now consider finding the set of all matrices X ∈ S^N satisfying

   AX + B ≽ 0    (134)

given A, B ∈ S^N . Define the set

   X ≜ { X | AX + B ≽ 0 } ⊆ S^N    (135)


which is the inverse image of the positive semidefinite cone under the affine transformation g(X) ≜ AX + B . Set X must therefore be convex by Theorem 2.1.0.10.2.

Yet we would like a less amorphous characterization of this set, so instead we consider its vectorization (18) which is easier to visualize:

   vec g(X) = vec(AX) + vec B = (I ⊗ A) vec X + vec B    (136)

where

   I ⊗ A ≜ QΛQᵀ ∈ S^{N²}    (137)

is a block-diagonal matrix formed by Kronecker product (§A.1 no.16, §D.1.2.1). Assign

   x ≜ vec X ∈ R^{N²}
   b ≜ vec B ∈ R^{N²}    (138)

then make the equivalent problem: Find

   vec X = { x ∈ R^{N²} | (I ⊗ A)x + b ∈ K }    (139)

where

   K ≜ vec S^N_+    (140)

is a proper cone isometrically isomorphic with the positive semidefinite cone in the subspace of symmetric matrices; the vectorization of every element of S^N_+ . Utilizing the diagonalization (137),

   vec X = { x | ΛQᵀx ∈ Qᵀ(K − b) }
         = { x | ΦQᵀx ∈ Λ†Qᵀ(K − b) } ⊆ R^{N²}    (141)

where

   Φ ≜ Λ†Λ    (142)

is a diagonal projection matrix whose entries are either 1 or 0 (§E.3). We have the complementary sum

   ΦQᵀx + (I − Φ)Qᵀx = Qᵀx    (143)

So, adding (I − Φ)Qᵀx to both sides of the membership within (141) admits

   vec X = { x ∈ R^{N²} | Qᵀx ∈ Λ†Qᵀ(K − b) + (I − Φ)Qᵀx }
         = { x | Qᵀx ∈ Φ( Λ†Qᵀ(K − b) ) ⊕ (I − Φ)R^{N²} }
         = { x ∈ QΛ†Qᵀ(K − b) ⊕ Q(I − Φ)R^{N²} }
         = (I ⊗ A)†(K − b) ⊕ N(I ⊗ A)    (144)

where we used the facts: linear function Qᵀx in x is a bijection on R^{N²} , and ΦΛ† = Λ† .

In words, set vec X is the vector sum of the translated PSD cone (mapped linearly onto the rowspace of I ⊗ A (§E)) and the nullspace of I ⊗ A (synthesis of facts from §A.6.3 and §A.7.2.0.1). Should I ⊗ A have no nullspace, then vec X = (I ⊗ A)⁻¹(K − b) which is the expected result.
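The vectorization identity behind (136) is simple to confirm numerically (MATLAB stores matrices columnwise, so X(:) plays the role of vec X); a minimal sketch with assumed random data, not from the book:

    % Minimal sketch: confirm vec(AX) = (I kron A) vec X, the identity
    % used in (136), on random symmetric data.
    N = 4;
    A = randn(N);  A = (A + A')/2;
    X = randn(N);  X = (X + X')/2;
    lhs = A*X;  lhs = lhs(:);             % vec(AX)
    rhs = kron(eye(N), A) * X(:);         % (I kron A) vec X, block-diagonal as in (137)
    disp(norm(lhs - rhs))                 % ~0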


2.6.6.4 Barvinok's proposition

The following theory quantifies the rank of a positive semidefinite matrix whose rank is the least of all matrices belonging to the intersection of the PSD cone with an affine subset; it is a least upper bound on rank:

2.6.6.4.1 Proposition. Affine intersection with PSD cone. [23, §II.13] [60, §2.2] Consider finding a matrix G ∈ S^N satisfying

   G ≽ 0 ,  〈A_j , G〉 = b_j ,  j = 1 ... m    (145)

given nonzero linearly independent A_j ∈ S^N and real b_j . Define the affine subset

   A ≜ { G | 〈A_j , G〉 = b_j , j = 1 ... m } ⊆ S^N    (146)

If the feasible set A ∩ S^N_+ is nonempty, then there exists a matrix G ∈ A ∩ S^N_+ such that, given a number of equalities m ,

   rank G (rank G + 1)/2 ≤ m    (147)

whence the least upper bound

   rank G ≤ ⌊ ( √(8m + 1) − 1 )/2 ⌋    (148)

or, given desired rank instead, equivalently

   m < (rank G + 1)(rank G + 2)/2    (149)

An extreme point of A ∩ S^N_+ satisfies (148) and (149). A matrix G = XᵀX is an extreme point if and only if the dimension of the smallest face of A ∩ S^N_+ containing G is 0 ; [52, §2.4] id est, iff

   dim F( (A ∩ S^N_+) ∋ G )
      = rank(G)(rank(G)+1)/2 − rank[ svec XA_1Xᵀ  svec XA_2Xᵀ  · · ·  svec XA_mXᵀ ]    (150)


equals 0 in isomorphic R^{N(N+1)/2} .

Now the feasible set A ∩ S^N_+ is assumed bounded: Assume a given nonzero least upper bound ρ on rank, a number of equalities

   m = (ρ + 1)(ρ + 2)/2    (151)

and matrix dimension N ≥ ρ + 2 ≥ 3 . If the feasible set is nonempty and bounded, then there exists a matrix G ∈ A ∩ S^N_+ such that

   rank G ≤ ρ    (152)

This represents a tightening of the least upper bound; a reduction by exactly 1 of the bound provided by (148) given the same specified number of equalities; id est,

   rank G ≤ ( √(8m + 1) − 1 )/2 − 1    (153)
                                                                         ⋄

When the feasible set A ∩ S^N_+ is known a priori to consist of only a single point, then Barvinok's proposition provides the greatest upper bound on its rank. The feasible set can be a single nonzero point only if the number of linearly independent hyperplanes m constituting A satisfies^2.24

   N(N + 1)/2 − 1 ≤ m ≤ N(N + 1)/2    (154)

2.6.7 Conic independence (c.i.)

In contrast to extreme direction, the property conically independent direction is more generally applicable, inclusive of all closed convex cones (not only pointed closed convex cones). Similar to the definition for linear independence, arbitrary given directions {Γ_i ∈ R^n , i = 1 ... N} are conically independent if and only if, for all ζ_i ≥ 0 ,

   Γ_i ζ_i + · · · + Γ_j ζ_j − Γ_l ζ_l = 0 ,   i ≠ · · · ≠ j ≠ l = 1 ... N    (155)

has only the trivial solution; in words, iff no direction from the given set can be expressed as a conic combination of those remaining. (Figure 2.15, for example; a MATLAB implementation of test (155) is given in §F.2.)

2.24 N(N + 1)/2 − 1 independent hyperplanes in R^{N(N+1)/2} can make a line tangent to svec ∂S^N_+ at a point because all one-dimensional faces of S^N_+ are exposed. Because a pointed convex cone has only one vertex, the origin, there can be no intersection of svec ∂S^N_+ with any higher-dimensional affine subset A that will make a nonzero point.


Figure 2.15: Vectors in R² : (a) conically independent, (b) conically dependent. Neither example exhibits linear independence.

It is evident that linear independence (l.i.) of N directions implies their conic independence;

• l.i. ⇒ c.i.

Arranging any set of generators for a particular convex cone in a matrix columnar,

   X ≜ [ Γ_1 Γ_2 · · · Γ_N ] ∈ R^{n×N}    (156)

then the relationship l.i. ⇒ c.i. implies the number of l.i. generators in the columns of X cannot exceed the number of c.i. generators. Denoting by k the number of conically independent generators contained in X , we have the most fundamental rank inequality for convex cones

   dim aff K = dim aff[0 X] = rank X ≤ k ≤ N    (157)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n , conic independence remains possible:

2.6.7.1 Table: Maximum number of conically independent directions

   n     sup k (pointed)     sup k (not pointed)
   0            0                     0
   1            1                     2
   2            2                     4
   3            ∞                     ∞
   ⋮            ⋮                     ⋮


70 CHAPTER 2. CONVEX GEOMETRYAssuming veracity of this table, there is an apparent vastness betweentwo and three dimensions. Conic independence is certainly one convex ideathat cannot be completely explained by a two-dimensional picture. [23, p.vii]From this table it is also evident that dimension of Euclidean space cannotexceed the number of conically independent directions possible;• n ≤ sup kWe suspect the number of conically independent columns (rows) of X tobe the same for X †T .• The columns (rows) of X are c.i. ⇔ the columns (rows) of X †T are c.i.Proof. Pending.2.6.7.2 Pointed closed convex K and conic independenceThe following bullets can be derived from definitions (117) and (155) inconjunction with the extremes theorem (§2.6.4.0.1):The set of all extreme directions from a pointed closed convex coneK ⊂ R n is not necessarily a linearly independent set, yet it must be a conicallyindependent set; (compare Figure 2.3, p.35, with Figure 2.16(a))• extreme directions ⇒ c.i.Yet any collection of n or fewer extreme directions from pointed closedconvex cone K must be linearly independent;• ≤ n extreme directions ⇒ l.i.Conversely, when a conically independent set of directions from pointedclosed convex cone K is known a priori to comprise generators, then alldirections from that set must be extreme directions of the cone;• c.i. generators of pointed closed convex K ⇒ extreme directionsBarker & Carlson [55, §1] call the extreme directions a minimal generatingset. A minimal set of generators is therefore a conically independent set ofgenerators, and vice versa for a pointed closed convex cone. 2.252.25 The converse does not hold for a non-pointed closed convex cone as Table 2.6.7.1implies.
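The conic-independence test (155) admits a straightforward computational form: direction Γ_l is conically dependent on the others iff Γ_l equals a nonnegative combination of the remaining columns, a small linear feasibility problem. The book's own implementation is in §F.2; the MATLAB fragment below is an independent minimal sketch (requires the Optimization Toolbox's linprog), with an assumed example of conic dependence:

    % Minimal sketch: test conic independence of the columns of X per (155).
    % Column l is conically dependent iff X(:,others)*z = X(:,l) admits z >= 0.
    X = [1 0 1;
         0 1 1];                 % three directions in R^2 (conically dependent)
    N = size(X,2);  ci = true;
    opts = optimset('Display','off');
    for l = 1:N
        others = setdiff(1:N, l);
        f  = zeros(numel(others),1);                 % feasibility only
        lb = zeros(numel(others),1);
        [z,~,flag] = linprog(f, [],[], X(:,others), X(:,l), lb, [], [], opts);
        if flag == 1, ci = false; end                % feasible => dependent
    end
    ci                            % false for this example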


2.6. CONES 71K(a)K(b)∂K ∗Figure 2.16: (a) A pointed polyhedral cone (drawn truncated) in R 3 havingsix facets. The extreme directions, corresponding to six edges emanatingfrom the origin, are generators for this cone; not linearly independent butthey must be conically independent. (b) The boundary of dual cone K ∗(drawn truncated) is now added to the drawing of same K . K ∗ is polyhedral,proper, and has the same number of extreme directions as K has facets.


72 CHAPTER 2. CONVEX GEOMETRY2.6.8 When extreme means exposedFor any convex polyhedral set in R n having nonempty interior, the distinctionbetween the terms extreme and exposed vanishes [31, §2.4] [50, §2.2] for facesof all dimensions except n ; their meanings become equivalent as we saw inFigure 2.2 (discussed in §2.5.1.2). In other words, each and every face of anypolyhedral set (except the set itself) can be exposed by a hyperplane, andvice versa; e.g., Figure 2.3.Lewis [59, §6] [24, §2.3.4] claims nonempty extreme proper subsets and theexposed subsets coincide for S n + ; id est, each and every face of the positivesemidefinite cone, whose dimension is less than the dimension of the cone,is exposed. A more general discussion of cones having this property can befound in [61]; e.g., the Lorentz cone (114) [62, §II.A].2.7 <strong>Convex</strong> polyhedraEvery polyhedron, such as the convex hull (60) of a bounded list X , canbe expressed as the solution set of a finite system of linear equalities andinequalities, and vice versa. [50, §2.2]2.7.0.0.1 Definition. <strong>Convex</strong> polyhedra, halfspace-description. [1, §2.2.4]A convex polyhedron is the intersection of a finite number of halfspaces andhyperplanes;P = {y | Ay ≽ b, Cy = d} ⊆ R n (158)where the coefficients A and C generally denote matrices. Each row ofC is a vector normal to a hyperplane, while each row of A is a vectorinward-normal to a hyperplane partially bounding a halfspace. △By the halfspaces theorem in §2.3.1.1.1, a polyhedron thus described isa closed convex set having possibly empty interior; e.g., Figure 2.2. <strong>Convex</strong>polyhedra 2.26 are finite-dimensional comprising all affine sets (§2.2.1),polyhedral cones, line segments, rays, halfspaces, convex polygons, solids[57, def.104/6, p.343], polychora, polytopes, 2.27 etcetera.2.26 We consider only convex polyhedra throughout, but acknowledge the existence ofconcave polyhedra. [48, Kepler-Poinsot Solid]2.27 Some authors distinguish bounded polyhedra via the designation polytope. [50, §2.2]
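Halfspace-description (158) makes membership testing immediate: check the inequalities and equalities row by row. A minimal MATLAB sketch with assumed example data (a unit box in R² , not an example from the book):

    % Minimal sketch: membership test for a polyhedron in halfspace-
    % description (158), P = {y | A*y >= b (elementwise), C*y = d}.
    A = [eye(2); -eye(2)];   b = [0; 0; -1; -1];   % unit box [0,1]^2
    C = zeros(0,2);          d = zeros(0,1);       % no equality constraints here
    y = [0.25; 0.5];
    tol = 1e-9;
    inP = all(A*y >= b - tol) && all(abs(C*y - d) <= tol)   % true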


2.7. CONVEX POLYHEDRA 73It follows from definition (158) by exposure that each face of a convexpolyhedron is a convex polyhedron.The projection of any polyhedron on a subspace remains a polyhedron.More generally, the image of a polyhedron under any linear transformationis a polyhedron. [23, §I.9]When b and d in (158) are 0, the resultant is a polyhedral cone. The setof all polyhedral cones is a subset of convex cones:2.7.1 Polyhedral coneFrom §2.6, the number of hyperplanes and halfspaces constituting aconvex cone is possibly but not necessarily infinite. When the number isfinite, the convex cone is termed polyhedral. That is the primary distinguishingfeature between the set of all convex cones and polyhedra; allpolyhedra, including polyhedral cones, are finitely generated [30, §19]. Wedistinguish polyhedral cones in the set of all convex cones for this very reason.Definition. Polyhedral cone, halfspace-description. 2.28 (confer (165))A polyhedral cone is the intersection of a finite number of halfspaces andhyperplanes about the origin;K = {y | Ay ≽ 0, Cy = 0} ⊆ R n (a)= {y | Ay ≽ 0, Cy ≽ 0, Cy ≼ 0} (b)⎧ ⎡ ⎤ ⎫⎨ A ⎬=⎩ y | ⎣ C ⎦y ≽ 0(c)⎭−C(159)where coefficients A and C generally denote matrices of finite dimension.Each row of C is a vector normal to a hyperplane containing the origin,while each row of A is a vector inward-normal to a hyperplane containingthe origin and partially bounding a halfspace.△A polyhedral cone thus defined is closed, convex, possibly has emptyinterior, and only a finite number of generators (§2.6.4.1), and vice versa.(Minkowski/Weyl) [31, §2.8]2.28 Rockafellar [30, §19] proposes affine sets be handled via complementary pairs of affineinequalities; e.g., Cy ≽d and Cy ≼d .


74 CHAPTER 2. CONVEX GEOMETRYFrom the definition it follows that any single hyperplane through theorigin, or any halfspace partially bounded by a hyperplane through the originis a polyhedral cone. The most familiar example of polyhedral cone is anyquadrant (or orthant, §2.1.0.7.1) generated by the Cartesian axes. Esotericexamples of polyhedral cone include the point at the origin, any line throughthe origin, any ray having the origin as base such as the nonnegative realline R + in subspace R , polyhedral variations of the proper Lorentz cone(confer (114)), (for l=1 or ∞){[K ∆ xl =t]}∈ R n × R | ‖x‖ l ≤ t(160)any subspace, and R n . More examples are illustrated in Figure 2.16 andFigure 2.3.2.7.2 Vertices of convex polyhedraBy definition, a vertex (§2.5.1.0.1) always lies on the relative boundary of aconvex polyhedron. [57, def.115/6, p.358] In Figure 2.2, each vertex of thepolyhedron is located at the intersection of three or more facets, and everyedge belongs to precisely two facets [23, §VI.1, p.252]. In Figure 2.3, the onlyvertex of that polyhedral cone lies at the origin.The set of all polyhedral cones is clearly a subset of convex polyhedraand a subset of convex cones. Not all convex polyhedra are bounded,evidently, neither can they all be described by the convex hull of abounded set of points as we defined it in (60). Hence we propose a universalvertex-description of polyhedra in terms of that same finite-length list X (54):Definition. <strong>Convex</strong> polyhedra, vertex-description. (confer §2.6.4.0.1)Denote the truncated a-vector,[ ]aia i:l = .(161)a lBy discriminating a suitable finite-length generating list (or set) arrangedcolumnar in X ∈ R n×N , then any particular polyhedron may be described,P = { Xa | a T 1:k1 = 1, a m:N ≽ 0, {1... k} ∪ {m ...N} = {1... N} } (162)


2.7. CONVEX POLYHEDRA 75where 0 ≤ k ≤ N and 1 ≤ m ≤ N + 1 . Setting k=0 eliminates the affineequality constraint. Setting m = N + 1 eliminates the inequality. △The coefficient indices in (162) may or may not be overlapping, but allthe coefficients are assumed constrained. From (56), (60), and (63), wesummarize how the constraints may be applied;affine sets −→ a T 1:k 1 = 1polyhedral cones −→ a m:N ≽ 0}←− convex hull (m ≤ k) (163)It is always possible to describe a convex hull in the region of overlappingindices because, for 1 ≤ m ≤ k ≤ N ,{a m:k | a T m:k1 = 1, a m:k ≽ 0} ⊆ {a m:k | a T 1:k1 = 1, a m:N ≽ 0} (164)Members of a generating list are not necessarily vertices of the correspondingpolyhedron; certainly true for (60) and (162), some subset of list membersreside in the polyhedron’s relative interior. Conversely, when boundedness(60) applies, the convex hull of the vertices is a polyhedron identical to theconvex hull of the generating list.2.7.2.1 Vertex-description of polyhedral coneGiven closed convex cone K in a subspace of R n having any set of generatorsfor it arranged in a matrix X ∈ R n×N as in (156), then that cone is describedsetting m=1 and k=0 in vertex-description (162):K = cone(X) = {Xa | a ≽ 0} ⊆ R n (165)a conic hull, like (63), of N generators.2.7.2.1.1 Pointedness. [31, §2.10] Assuming all generators constitutingthe columns of X ∈ R n×N are nonzero, K is pointed (§2.6.2.0.1) if and onlyif there is no nonzero a ≽ 0 that solves Xa=0; id est, iff N(X) ∩ R N + = 0.(If rankX = n , then the dual cone K ∗ is pointed. (176))A polyhedral proper cone in R n must have at least n generators that arelinearly independent, or be the intersection of at least n halfspaces whosepartial boundaries have normals that are linearly independent. Otherwise,the cone will contain at least one line and there can be no vertex; id est, thecone cannot otherwise be pointed.


For any pointed polyhedral cone, there is a one-to-one correspondence of one-dimensional faces with extreme directions.

Examples of pointed closed convex cones K are not limited to polyhedral cones: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, the Lorentz (ice-cream) cone (including the interior) and its polyhedral variations, the cone of Euclidean distance matrices EDM^N in S^N_0 , the proper cones: S^M_+ in ambient S^M , any orthant in R^n or R^{m×n} ; e.g., the nonnegative real line R_+ in vector space R .

2.7.3 Unit simplex

A peculiar convex subset of the nonnegative orthant having halfspace-description

   S ≜ { s | s ≽ 0 , 1ᵀs ≤ 1 } ⊆ R^n_+    (166)

is a unique bounded convex polyhedron called unit simplex (Figure 2.17) having nonempty interior, n + 1 vertices, and dimension [1, §2.2.4]

   dim S = n    (167)

The origin supplies one vertex while heads of the standard basis [28] [26] {e_i , i = 1 ... n} in R^n constitute those remaining;^2.29 thus its vertex-description:

   S = conv{ 0, {e_i , i = 1 ... n} }
     = { [0 e_1 e_2 · · · e_n ] a | aᵀ1 = 1 , a ≽ 0 }    (168)

The unit simplex comes from a class of general polyhedra called simplex, having vertex-description: [63] [30] [32] [50]

   conv{ x_l ∈ R^n | l = 1 ... k+1 } ,  dim aff{x_l} = k ,  n ≥ k    (169)

So defined, a simplex is a closed bounded convex set having possibly empty interior.

2.29 In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0, 1], in R² it is a triangle and its relative interior, in R³ it is the convex hull of a tetrahedron (Figure 2.17), in R⁴ it is the convex hull of a pentatope [48], and so on.
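The two descriptions of the unit simplex, halfspace-description (166) and vertex-description (168), can be cross-checked numerically: every convex combination of the n + 1 vertices satisfies (166). A minimal MATLAB sketch (assumed example, not from the book):

    % Minimal sketch: a random point of S built from vertex-description (168)
    % satisfies halfspace-description (166).
    n = 3;
    V = [zeros(n,1) eye(n)];          % vertices: origin and standard-basis heads
    a = rand(n+1,1);  a = a/sum(a);   % convex-combination coefficients, a'1 = 1
    s = V*a;                          % a point of S per (168)
    inS = all(s >= 0) && (sum(s) <= 1 + 1e-12)    % true, per (166)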


2.8. DUAL CONE, GENERALIZED INEQUALITY, 772.7.3.0.1 Definition. Simplicial cone. A polyhedral proper (§2.6.2.0.2)cone K in R n is called simplicial iff K has exactly n extreme directions;[62, §II.A] equivalently, iff proper K has exactly n linearly independentgenerators contained in any given set of generators.△There are an infinite variety of simplicial cones in R n ; e.g., Figure 2.3,Figure 2.18, Figure 2.24. Any orthant is simplicial.2.7.4 Converting between descriptionsConversion between halfspace-descriptions (158) (159) and equivalent vertexdescriptions(60) (162) is nontrivial, in general, [64] [50, §2.2] but the conversionis easy for simplices. [1, §2.2] Nonetheless, we tacitly assume the twodescriptions to be equivalent. [30, §19, thm.19.1] We explore conversions in§2.8.2.1 and §2.9:2.8 Dual cone, generalized inequality,biorthogonal expansionThese three concepts, dual cone, generalized inequality, and biorthogonalexpansion, are inextricably melded; meaning, it is difficult to completelydiscuss one without mentioning the others. Knowledge of the dual cone iscrucial to efficient methods for projection on convex cones (§E.8.1.0.1) suchas might occur in schemes for rank minimization (§7). The dual cone iscritical in tests for convergence in contemporary primal/dual methods fornumerical solution of conic problems. [4] [6, §4.5]One way to think of a pointed closed convex cone is as a new kind of coordinatesystem whose basis is generally nonorthogonal; a conic system verymuch like the Cartesian system whose analogous cone is the nonnegativeorthant. The generalized inequality ≽ K is a formalized means to determinemembership to any pointed closed convex cone, while the biorthogonal expansionis simply a formulation for determining coordinates in any pointed conicsystem. When the underlying cone is the nonnegative orthant, then thesethree concepts come into alignment with the familiar Cartesian prototype.


78 CHAPTER 2. CONVEX GEOMETRY2.8.1 Dual coneFor any cone K (convex or not), the dual cone [1, §2.6.1]K ∗ = {y ∈ R n | 〈y, x〉 ≥ 0 for all x ∈ K} (170)is a unique cone 2.30 that is always closed and convex because it is an intersectionof halfspaces (halfspaces theorem, §2.3.1.1.1) whose partial boundarieseach contain the origin, each having inward-normal x belonging to K ; e.g.,Figure 2.19(a).When cone K is convex, the dual cone K ∗ is the union of each and everyvector y inward-normal to a hyperplane supporting or containing K ; e.g.,Figure 2.19(b). When K is represented by a halfspace description such as(159), for example, where⎡a T1A =∆ ⎣ .⎤⎡c T1⎦∈ R m×n , C =∆ ⎣ .⎤⎦∈ R k×n (171)a T mc T kthen the dual cone can be represented as the conic hullK ∗ = cone{a 1 ,..., a m , ±c 1 ,..., ±c k } (172)a vertex-description, because each and every conic combination of normalsfrom the halfspace description of K yields another inward-normal to a hyperplanesupporting or containing K .As defined, dual cone K ∗ exists even when the affine hull of the originalcone is a proper subspace; id est, even when the original cone has emptyinterior. Rockafellar formulates the dimension of K and K ∗ . [30, §14] 2.31To further motivate our understanding of the dual cone, consider theease with which convergence can be observed in the following optimizationproblem (p):Example. Dual problem. Essentially, duality theory concerns therepresentation of a given optimization problem as half a minimax problem2.30 The dual cone is the negative of the polar cone defined by some authors; K ∗ = −K ◦ .[29] [30] [65] [23] [31]2.31 His monumental work <strong>Convex</strong> Analysis has not one figure or illustration. See[23, §II.16] for a good illustration of Rockafellar’s recession cone [33].


(inf_x sup_y f(x, y)) [30, §36] [1, §5.4] whose saddle-value [66] exists. [34, p.3] Consider primal conic problem (p) and its corresponding dual problem (d): [67, §3.3.1] [52, §2.1] for matrix constant C ,

   (p)  minimize_x   αᵀx                 (d)  maximize_{y,z}  βᵀz
        subject to   x ∈ K                    subject to      y ∈ K*
                     Cx = β                                   Cᵀz + y = α    (173)

Observe the dual problem is also conic, and its objective^2.32 never exceeds that of the primal;

   αᵀx ≥ βᵀz
   xᵀ(Cᵀz + y) ≥ (Cx)ᵀz
   xᵀy ≥ 0    (174)

which is true by definition (170). Under the sufficient condition that (p) is convex and satisfies Slater's condition,^2.33 each problem (p) and (d) achieves the same optimal value of its objective and so (p) and (d) are together equivalent to the minimax problem

   minimize_{x,y,z}  αᵀx − βᵀz
   subject to        x ∈ K ,  y ∈ K*
                     Cx = β ,  Cᵀz + y = α    (175)

Each problem (p) and (d) is called a strong dual to the other because the duality gap is 0 ; the optimal value of the objective in the minimax problem (175) is always the saddle-value 0 (regardless of the particular convex cone K and other problem parameters). [69, §3.2] (A small numerical illustration of this dual pair follows the list of key properties in §2.8.1.1 below.)

2.8.1.1 Key properties of dual cone

• For any cone, (−K)* = −K*

• For any cones K_1 and K_2 , K_1 ⊆ K_2 ⇒ K*_1 ⊇ K*_2 [31, §2.7]

2.32 The objective is the function that is argument to minimization or maximization.
2.33 In this context, (p) is convex if K is a convex cone. Slater's condition is satisfied whenever any primal strictly feasible point exists; id est, for any point feasible with the affine equality (or affine inequality) constraint functions and relatively interior to K . If K is polyhedral, then Slater's condition is satisfied when any feasible point exists. [1, §5.2] [4, §1.3.8] [65, p.325] [68, p.485]


80 CHAPTER 2. CONVEX GEOMETRY• (Symmetry) When K is any convex cone, the dual of the dual coneis the closure of the original cone; K ∗∗ = K . When K is closed andconvex, then the dual of the dual cone is the original cone; K ∗∗ = K .[30, §14]• If any cone K has nonempty interior, then K ∗ is pointed;K nonempty interior ⇒ K ∗ pointed (176)Conversely, if the closure of any convex cone K is pointed, then K ∗ hasnonempty interior;K pointed ⇒ K ∗ nonempty interior (177)• [30, §16.4.2] [70, §4.6] For convex cones K 1 and K 2(dual vector-sum)(K 1 + K 2 ) ∗ = K ∗ 1 ∩ K ∗ 2 (178)For closed convex cones K 1 and K 2(closure of vector sum of duals). 2.34• K is proper if and only if K ∗ is proper.(K 1 ∩ K 2 ) ∗ = K ∗ 1 + K ∗ 2 (179)• K is polyhedral if and only if K ∗ is polyhedral. [31, §2.8]• K is simplicial if and only if K ∗ is simplicial. (§2.9.2.1)• K ⊞ −K ∗ = R n ⇒ K is closed convex [31, thm.2.7.7] (p.447)• K ⊞ −K ∗ = R n ⇐ K is polyhedral [31, §2.8.7]2.34 These parallel the analogous results for subspaces R 1 , R 2 ⊆ R n , [70, §4.6]R ⊥⊥ = R for any subspace R.(R 1 + R 2 ) ⊥ = R ⊥ 1 ∩ R ⊥ 2(R 1 ∩ R 2 ) ⊥ = R ⊥ 1 + R⊥ 2
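For the self-dual nonnegative orthant K = K* = R^n_+ , the conic dual pair (173) reduces to an ordinary linear-programming pair, so weak duality (174) and the vanishing duality gap can be observed directly with a generic LP solver. A minimal MATLAB sketch with assumed data (requires the Optimization Toolbox's linprog; not from the book):

    % Minimal sketch: the dual pair (173) with K = K* = R^n_+ (a linear program).
    alpha = [2; 3; 4];            % cost vector, as in (173)
    C     = [1 1 1];  beta = 1;   % single equality constraint Cx = beta
    opts  = optimset('Display','off');
    % primal (p): minimize alpha'x  s.t.  Cx = beta, x >= 0
    x = linprog(alpha, [],[], C, beta, zeros(3,1), [], [], opts);
    % dual (d): maximize beta'z  s.t.  y = alpha - C'z >= 0  (linprog minimizes -beta'z)
    z = linprog(-beta, C', alpha, [],[],[],[],[], opts);
    disp([alpha'*x  beta'*z])     % primal value >= dual value; equal here (both 2)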


2.8. DUAL CONE, GENERALIZED INEQUALITY, 812.8.1.1.1 Examples of dual coneWhen cone K is a halfspace in R n with n>0 (Figure 2.20 for example),the dual cone K ∗ is a ray (base 0) belonging to that halfspace but orthogonalto its bounding hyperplane (that contains the origin), and vice versa.When K is a subspace, K ∗ is its orthogonal complement, and vice versa.(§E.8.1.1)When K is R n , K ∗ is the point at the origin, and vice versa.When convex cone K is a closed halfplane in R 3 (Figure 2.22), it is neitherpointed or of nonempty interior; hence, the dual cone K ∗ can be neither ofnonempty interior or pointed.When K is any particular orthant in R n , the dual cone is identical; id est,K = K ∗ .When K is any quadrant in subspace R 2 , K ∗ is a wedge-shaped polyhedralcone in R 3 ; e.g., for K equal to quadrant I , K ∗ is the union of two orthants:R 3 ++ R 3 3- .When K is a polyhedral variation of the Lorentz cone K l (160), the dualis the polyhedral proper cone K q : for l=1 or ∞ ,{[ ]}K q = K ∗ xl = ∈ R n × R | ‖x‖tq ≤ t (180)where ‖x‖ q is the dual norm and 1/l + 1/q = 1.2.8.2 Abstractions of Farkas’ lemma2.8.2.0.1 Corollary. Generalized inequality and membership relation.[29, §A.4.2] Let K be any closed convex cone and K ∗ its dual, and let x andy belong to a vector space R n . Thenx ∈ K ⇔ 〈y, x〉 ≥ 0 for all y ∈ K ∗ (181)which is a simple translation of the Farkas lemma as in [30, §22] to thelanguage of convex cones, and a generalization of the well-known Cartesianfactx ≽ 0 ⇔ 〈y, x〉 ≥ 0 for all y ≽ 0 (182)for which implicitly K = K ∗ = R n + the nonnegative orthant. By closure wehave symmetry:y ∈ K ∗ ⇔ 〈y, x〉 ≥ 0 for all x ∈ K (183)


82 CHAPTER 2. CONVEX GEOMETRYmerely, a statement of fact by definition of the dual cone (170).When K and K ∗ are pointed closed convex cones, membership relation(181) is often written as the generalized inequalityx ≽K0 ⇔ 〈y, x〉 ≥ 0 for all y ≽ 0 (184)K ∗meaning that the coordinates for biorthogonal expansion of x [43] can benonnegative when x belongs to K , and must be nonnegative when x belongsto simplicial K . 2.35 By symmetryy ≽K ∗0 ⇔ 〈y, x〉 ≥ 0 for all x ≽K0 (185)⋄When pointed closed convex K is clear from context, shorthand is prevalent:x ≽ 0 ⇔ x ∈ K(186)x ≻ 0 ⇔ x ∈ rel int KStrict versions are also useful; e.g., for any closed convex cone K havingnonempty interior,x ∈ int K ⇔ 〈y, x〉 > 0 for all y ∈ K ∗ , y ≠ 0x ∈ K , x ≠ 0 ⇔ 〈y, x〉 > 0 for all y ∈ int K ∗ (187)By symmetry, we also have their duals.2.8.2.0.2 Null certificateIf x p /∈ K , the construction in Figure 2.19(b) suggests there exists a hyperplanehaving inward-normal belonging to dual cone K ∗ separating x p fromK ; indeed, for closed convex cone K ,x p /∈ K ⇔ ∃ y ∈ K ∗ 〈y, x p 〉 < 0 (188)The existence of any one such y is a certificate of null membership.2.35 A simplicial cone K and its (simplicial) dual K ∗ ⊂ R n are polyhedral proper cones(e.g., Figure 2.24, p.111), but not the converse; id est, polyhedral proper cones are notnecessarily simplicial.


2.8. DUAL CONE, GENERALIZED INEQUALITY, 832.8.2.1 Discretization2.8.2.1.1 Dual halfspace-description. The halfspace-description ofthe corresponding dual cone is equally simple as vertex-description (165) fora closed convex cone: By definition (170),K ∗ = { y ∈ R n | z T y ≥ 0 for all z ∈ K }= { y ∈ R n | z T y ≥ 0 for all z = Xa , a ≽ 0 }= { y ∈ R n | a T X T y ≥ 0, a ≽ 0 }= { y ∈ R n | X T y ≽ 0 } (189)(confer (159)) that follows from the generalized inequality and membershipcorollary (182). The semi-infinity of tests specified by all z ∈ K has beenreduced to a discrete set of generators constituting the columns of X; id est,the test has been discretized.Whenever K is known to be closed and convex, then the converse mustalso hold; id est, given any set of generators for K ∗ arranged columnar in Y ,then the consequent vertex-description of the dual cone connotes a halfspacedescriptionfor K : [31, §2.8]K ∗ = {Y a | a ≽ 0} ⇔ K ∗∗ = K = { y | Y T y ≽ 0 } (190)2.8.2.1.2 First dual-cone formula. From these two results we deducea general principle: From any given vertex-description of a convex cone K ,a halfspace-description of the dual cone K ∗ is immediate by matrix transposition.Various converses are just a little bit trickier. (§2.9)We deduce further: For any polyhedral cone K , the dual cone K ∗ is alsopolyhedral and K ∗∗ = K . [31, §2.8]The generalized inequality and membership corollary is discretized in thefollowing theorem [55, §1] 2.36 that follows directly from (189) and (190):2.8.2.1.3 Theorem. Discrete membership.Given any discrete set of generators (§2.6.4.1) denoted by G(K) for closedconvex cone K , and denoted by G(K ∗ ) for its dual, let x and y belongto vector space R n . Then discretization of the generalized inequality andmembership corollary is necessary and sufficient for certifying membership:x ∈ K ⇔ 〈γ ∗ , x〉 ≥ 0 for all γ ∗ ∈ G(K ∗ ) (191)2.36 Barker & Carlson state the theorem only for the pointed closed convex case.


84 CHAPTER 2. CONVEX GEOMETRYy ∈ K ∗ ⇔ 〈γ , y〉 ≥ 0 for all γ ∈ G(K) (192)⋄2.8.2.1.4 Dual of pointed polyhedral coneIn a subspace of R n , now we consider a pointed polyhedral cone K given interms of its extreme directions Γ i arranged columnar in X ;X = [ Γ 1 Γ 2 · · · Γ N ] ∈ R n×N (156)The extremes theorem (§2.6.4.0.1) provides the vertex-description of apointed polyhedral cone in terms of its finite number of extreme directionsand its lone vertex at the origin:


2.8. DUAL CONE, GENERALIZED INEQUALITY, 85
Definition. Pointed polyhedral cone, vertex-description encore.
(confer (165) (118)) Given pointed polyhedral cone K in a subspace of R n , denoting its i th extreme direction by Γ i ∈ R n arranged in a matrix X as in (156), then that cone may be described, (60) (166)
K = { [0 X] a ζ | a T 1 = 1, a ≽ 0, ζ ≥ 0 }
  = { Xa ζ | a T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }
  = { Xb | b ≽ 0 } ⊆ R n (193)
that is simply a conic hull (like (63)) of a finite number N of directions. △
Whenever K is pointed, closed, and convex, the dual cone K ∗ has a halfspace-description in terms of the extreme directions Γ i in K :
K ∗ = { y | γ T y ≥ 0 for all γ ∈ {Γ i , i = 1... N} ⊆ rel ∂K } (194)
    = { y | X T y ≽ 0 } ⊆ R n (195)
because when {Γ i } constitutes any set of generators for K , the discretization result (189) allows relaxation of the requirement ∀ x ∈ K in (170) to ∀ γ ∈ {Γ i } in (194) directly. 2.37 That dual cone so defined is unique, identical to (170), and has nonempty interior because K is assumed pointed; but K ∗ is not necessarily pointed unless K has nonempty interior (§2.8.1.1), and K ∗ is polyhedral whenever the number of generators N is finite.
2.8.2.2 Facet normal
We see from (195) that the conically independent generators of cone K (namely, the extreme directions of pointed closed convex K constituting the columns of X) each define an inward-normal to a hyperplane supporting K ∗ and exposing a dual facet when N is finite. Were K ∗ pointed and finitely generated, then by symmetry the dual statement would also hold; id est, the extreme directions of pointed K ∗ each define a hyperplane that supports K and exposes a facet.
We may conclude the extreme directions of polyhedral proper K are respectively orthogonal to the facets of K ∗ ; likewise, the extreme directions of polyhedral proper K ∗ are respectively orthogonal to the facets of K .
2.37 We learned from (189) that any discrete set of generators for K including its extreme directions can be used to make a halfspace-description of K ∗ . The extreme directions constitute a minimal set of generators. Apply this result to Figure 2.23, for example.
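A brief numerical illustration (not from the original text) of the dual halfspace-description (195) and the discrete membership test (191): with extreme directions of a pointed cone K ⊂ R 2 arranged columnar in X , a point belongs to K ∗ exactly when X T y ≽ 0 , and membership in K reduces to finitely many inner products with generators of K ∗ , here taken as the columns of the inverse transpose of X (this simplicial case). The specific generators below are hypothetical, chosen only for the demonstration.

    import numpy as np

    X = np.array([[2.0, 1.0],
                  [1.0, 2.0]])          # columns Gamma_1, Gamma_2 generate K (hypothetical)
    Xdt = np.linalg.inv(X).T            # columns generate K* for this simplicial cone

    def in_dual(y):                     # halfspace-description (195): X^T y >= 0
        return np.all(X.T @ y >= -1e-12)

    def in_K(x):                        # discrete membership (191): inner products with generators of K*
        return np.all(Xdt.T @ x >= -1e-12)

    rng = np.random.default_rng(1)
    for _ in range(1000):
        a = rng.random(2)
        assert in_K(X @ a)              # conic combinations of columns of X lie in K
        assert in_dual(Xdt @ a)         # conic combinations of dual generators lie in K*
    assert not in_K(np.array([1.0, -1.0]))   # a point outside K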


86 CHAPTER 2. CONVEX GEOMETRY2.8.2.3 Pointed closed convex cone and partial orderA pointed closed convex cone K induces a partial order [48] on R n or R m×n[55, §1], respectively defined by vector or matrix inequality;x ≼ z ⇔ z − x ∈ K (196)x ≺ z ⇔ z − x ∈ rel int K (197)Strict inequality x ≻ 0 means that coordinates for biorthogonal expansionof x can be positive when x belongs to K , and must be positive wheneverx belongs to simplicial K .From the discrete membership theorem it directly follows, for example,x ≼ z ⇔ x i ≤ z i ∀i (198)when comparison is with respect to the nonnegative orthant K= R n + .Comparable points (illustrated in Figure 2.23) and the minimumelement 2.38 of some vector- or matrix-valued set are thus well defined. Moreproperties of partial ordering: such as reflexivity (x ≼ x), antisymmetry(x ≼ z , z ≼ x ⇒ x = z), transitivity (x ≼ y , y ≼ z ⇒ x ≼ z), additivity(x ≼ z , u ≼ v ⇒ x+u ≼ z +v), strict properties, and preservation undernonnegative scaling or limiting operations are cataloged in [1, §2.4.1] withrespect to proper cones. 2.392.8.2.4 Dual PSD cone and generalized inequalityThe dual positive semidefinite cone K ∗ is confined to S M by convention;S M +∗ ∆ = {Y ∈ S M | 〈X,Y 〉 ≥ 0 for all X ∈ S M + } = S M + (199)The positive semidefinite cone is self-dual in the ambient space of symmetricmatrices [71] [58, §II] [1, §2.6.1], K = K ∗ .Generalized inequality with respect to the positive semidefinite cone inthe ambient space of symmetric matrices can therefore be simply stated:(Fejér)X ≽ 0 ⇔ tr(Y T X) ≥ 0 for all Y ≽ 0 (200)2.38 We say x ∈ C is the (unique) minimum element of C with respect to K if and only iffor each and every z ∈ C we have x ≼ K z ; equivalently, iff C ⊆ x + K . [1, §2.6.3]2.39 We distinguish pointed closed convex cones here because the generalized inequalityand membership corollary remains intact. [29, §A.4.2.7]
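The self-duality claim (199)–(200) is easy to spot-check numerically. The Python/NumPy sketch below is an illustration added here (not part of the text): it verifies tr(Y T X) ≥ 0 for randomly generated positive semidefinite X and Y , and shows the inequality can fail once X leaves the cone, the failing Y serving as a certificate of non-membership.

    import numpy as np

    rng = np.random.default_rng(2)
    def random_psd(m):
        A = rng.standard_normal((m, m))
        return A @ A.T                      # A A^T is always positive semidefinite

    for _ in range(1000):
        X, Y = random_psd(4), random_psd(4)
        assert np.trace(Y.T @ X) >= -1e-9   # Fejer (200): X >= 0 <=> tr(Y^T X) >= 0 for all Y >= 0

    X_indef = np.diag([1.0, -1.0, 0.0, 0.0])   # not positive semidefinite
    Y = np.diag([0.0, 1.0, 0.0, 0.0])          # a PSD witness
    assert np.trace(Y.T @ X_indef) < 0         # certifies X_indef lies outside S^4_+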


2.8. DUAL CONE, GENERALIZED INEQUALITY, 87
The trace is introduced because membership to this cone can be determined in the isometrically isomorphic Euclidean space R M² via (19). (§2.1.1.1) By the two interpretations in §2.8.1, matrix Y can be interpreted as inward-normal to a hyperplane supporting the positive semidefinite cone.
The fundamental statement of positive semidefiniteness, y T Xy ≥ 0 ∀ y (§A.3.0.0.1), is a particular instance of this generalized inequality with respect to the positive semidefinite cone (200):
X ≽ 0 ⇔ 〈yy T , X〉 ≥ 0 ∀ yy T (≽ 0) (201)
Discretization (§2.8.2.1.3) allows replacement of matrix Y with this minimal set of generators comprised of all the extreme directions.
2.8.2.4.1 Example. Linear matrix inequality.
Consider a peculiar vertex-description for a convex cone defined over the positive semidefinite cone (instead of the nonnegative orthant as in definition (63)): for X ∈ S n given A j ∈ S n , j = 1... m ,
K = { [ 〈A 1 , X〉 ; ... ; 〈A m , X〉 ] | X ≽ 0 } ⊆ R m
  = { [ vec(A 1 ) T ; ... ; vec(A m ) T ] vec X | X ≽ 0 }
  ∆= {A vec X | X ≽ 0} (202)
where A ∈ R m×n² , and vectorization vec is defined in (18). K is indeed a convex cone because by (111) (§2.6.2)
A vec X p1 , A vec X p2 ∈ K ⇒ A(ζ vec X p1 + ξ vec X p2 ) ∈ K for all ζ , ξ ≥ 0 (203)
since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. (881) Now consider the dual cone:
K ∗ = { y | 〈A vec X , y〉 ≥ 0 for all X ≽ 0 } ⊆ R m
    = { y | 〈vec X , A T y〉 ≥ 0 for all X ≽ 0 }
    = { y | vec −1 (A T y) ≽ 0 } (204)


88 CHAPTER 2. CONVEX GEOMETRY
that follows from (200) and leads to an equally peculiar halfspace-description
K ∗ = { y | y 1 A 1 + y 2 A 2 + · · · + y m A m ≽ 0 } (205)
The summation inequality with respect to the positive semidefinite cone is known as a linear matrix inequality. [3] [5] [72]
2.8.3 Biorthogonal expansion, by example
2.8.3.0.1 Example. Relationship to dual polyhedral cone.
The proper cone K illustrated in Figure 2.23 induces a partial order on R 2 . All points greater than x with respect to K , for example, are contained in the translated cone x + K . The extreme directions Γ 1 and Γ 2 of K do not make an orthogonal set; neither do extreme directions Γ 3 and Γ 4 of dual cone K ∗ ; rather, we have the biorthogonality condition, [43]
Γ T 4 Γ 1 = Γ T 3 Γ 2 = 0
Γ T 3 Γ 1 ≠ 0 , Γ T 4 Γ 2 ≠ 0 (206)
The biorthogonal expansion of x ∈ K is then
x = Γ 1 (Γ T 3 x)/(Γ T 3 Γ 1 ) + Γ 2 (Γ T 4 x)/(Γ T 4 Γ 2 ) (207)
where Γ T 3 x/(Γ T 3 Γ 1 ) is the nonnegative coefficient of nonorthogonal projection (§E.6.1) of x on Γ 1 in the direction orthogonal to Γ 3 , and where Γ T 4 x/(Γ T 4 Γ 2 ) is the nonnegative coefficient of nonorthogonal projection of x on Γ 2 in the direction orthogonal to Γ 4 ; they are the coordinates in this nonorthogonal system. Those coefficients must be nonnegative x ≽ K 0 because x ∈ K (186) and K is simplicial.
If we ascribe the extreme directions of K to the columns of a matrix X ,
X ∆= [ Γ 1 Γ 2 ] (208)
then we find
X †T = [ Γ 3 /(Γ T 3 Γ 1 )   Γ 4 /(Γ T 4 Γ 2 ) ] (209)
Therefore,
x = XX † x (210)


2.8. DUAL CONE, GENERALIZED INEQUALITY, 89is the biorthogonal expansion (207) (§E.0.1), and the biorthogonality condition(206) can be expressed succinctly (§E.1.1) 2.40X † X = I (211)The expansion XX † w for any w ∈ R 2 is unique if and only if the extremedirections of K are linearly independent; id est, iff X has no nullspace.2.8.3.1 Pointed cones and biorthogonalityThe biorthogonality condition (211) X † X = I means Γ 1 and Γ 2 are linearlyindependent generators of K . (§B.1.1.1) From §2.6.7 we know that meansΓ 1 and Γ 2 must be extreme directions of K .A biorthogonal expansion is necessarily associated with a pointed closedconvex cone, otherwise there can be no extreme directions (§2.6.4). We willaddress biorthogonal expansion with respect to a pointed polyhedral conehaving empty interior in §2.9.1.2.8.3.1.1 Example. Expansions implied by diagonalization.When matrix X ∈ R M×M is diagonalizable,⎡ ⎤w T1 M∑X = SΛS −1 = [s 1 · · · s M ] Λ⎣. ⎦ = λ i s i wi T (913)coordinates λ i (contained in diagonal matrix Λ) for its biorthogonal expansion⎡ ⎤wX = SS −1 1 T X M∑X = [ s 1 · · · s M ] ⎣ . ⎦ = λ i s i wi T (212)wM T X i=1depend upon the geometric relationship of X to its linearly independenteigenmatrices s i wiT (§A.5.1, §B.1.1) which are dyads constituted by rightand left eigenvectors and are generators of some simplicial cone. When S isreal and X belongs to that simplicial cone in R M×M , for example, then thecoefficients of expansion, the eigenvalues λ i , must be nonnegative.2.40 Possibly confusing is the fact that formula XX † x is simultaneously the orthogonalprojection of x on R(X) (1202), and the sum of nonorthogonal projections of x ∈ R(X)on the range of each column of full-rank X skinny-or-square (§E.5.0.1.2).w T Mi=1


90 CHAPTER 2. CONVEX GEOMETRYWhen X is symmetric, its eigenmatrices are extreme directions of thepositive semidefinite cone S M + (120) in the subspace of symmetric matrices,although X does not necessarily belong to that cone. The coordinatesfor biorthogonal expansion of X = QΛQ T are its eigenvalues; id est,for X ∈ S M ,X = QQ T X =M∑q i qi T X =i=1M∑λ i q i qi T ∈ S M (213)i=1is an orthogonal expansion with orthonormality condition Q T Q = I whereλ i is the i th eigenvalue of X , q i is the i th corresponding eigenvector of Xarranged columnar in orthogonal matrixQ = [q 1 q 2 · · · q M ] ∈ R M×M (214)and eigenmatrix q i q T i is the i th member of an orthonormal basis for S M (1275)and is an extreme direction of S M + .Similarly, when X belongs to the positive semidefinite cone in the subspaceof symmetric matrices, the coordinates for biorthogonal expansion ofX = QΛQ T can be its nonnegative eigenvalues (845); id est, for X ∈ K= S M + ,X = QQ T X =M∑q i qi T X =i=1M∑λ i q i qi T ∈ S M + (215)i=1is an orthogonal expansion where λ i ≥0 is the i th eigenvalue of X . 2.8.3.2 Self-dual cones and biorthogonalityWhenever a pointed closed convex cone is self-dual, K = K ∗ , an associatedbiorthogonal expansion can become an orthogonal expansion; id est, thebiorthogonality condition X † X = I can instead become the orthonormalitycondition X T X = I . The biorthogonal expansion (212), for example, becamethe orthogonal expansions (213) and (215); id est, extreme directionsof some simplicial cone and its dual became extreme directions of the positivesemidefinite self-dual cone.The most prominent examples of self-dual cones are the orthants, thepositive semidefinite cone S M + in the ambient space of symmetric matrices(199), and the second-order (Lorentz) cone (114) [62, §II.A] [1, exmp.2.25].
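As a numerical aside (not in the original), the orthogonal expansion (213) and its positive semidefinite specialization (215) can be checked directly with an eigendecomposition; the Python/NumPy sketch below uses randomly generated symmetric matrices.

    import numpy as np

    rng = np.random.default_rng(3)
    M = 5
    A = rng.standard_normal((M, M))
    X = A + A.T                              # symmetric, not necessarily PSD
    lam, Q = np.linalg.eigh(X)               # X = Q diag(lam) Q^T with Q^T Q = I

    # orthogonal expansion (213): X = sum_i lam_i q_i q_i^T
    X_rebuilt = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
    assert np.allclose(X, X_rebuilt)
    assert np.allclose(Q.T @ Q, np.eye(M))   # orthonormality condition

    # (215): when X is PSD, the expansion coordinates (eigenvalues) are nonnegative
    Xpsd = A @ A.T
    assert np.all(np.linalg.eigvalsh(Xpsd) >= -1e-10)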


2.8. DUAL CONE, GENERALIZED INEQUALITY, 912.8.3.2.1 Example. Biorthogonal expansion respecting nonpositive orthant.Suppose x ∈ K any orthant in R n . 2.41 Then the coordinates for biorthogonalexpansion of x must be nonnegative; in fact, the absolute value of theCartesian coordinates.Suppose, in particular, x belongs to the nonpositive orthant K = R ṉ .Then the biorthogonal expansion becomes an orthogonal expansion and, forx∈ R ṉ n∑n∑x = XX T x = −e i (−e T i x) = −e i |e T i x| ∈ R ṉ (216)i=1the coefficients of expansion are nonnegative. For this orthant K we haveorthonormality condition X T X = I where X = −I , e i ∈ R n is a standardbasis vector, and −e i is an extreme direction (§2.6.4) of K .Of course, this expansion x=XX T x applies more broadly to domain R n ,but then the coefficients belong to R .i=12.41 An orthant is simplicial and self-dual.
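Before the general derivation in §2.9, here is a small numerical sketch (added for illustration, with made-up generators) of the biorthogonal expansion (207)–(211): the dual directions Γ 3 , Γ 4 are read off the rows of X † up to positive scaling, the biorthogonality condition X † X = I holds, and the expansion coefficients of any x ∈ K are nonnegative.

    import numpy as np

    G1, G2 = np.array([1.0, 0.2]), np.array([0.3, 1.0])   # hypothetical extreme directions of K
    X = np.column_stack([G1, G2])
    Xd = np.linalg.pinv(X)                                 # X dagger
    assert np.allclose(Xd @ X, np.eye(2))                  # biorthogonality condition (211)

    G3, G4 = Xd[0], Xd[1]          # rows of X dagger: dual directions, scaled as in (209)
    assert abs(G3 @ G2) < 1e-12 and abs(G4 @ G1) < 1e-12   # biorthogonality (206)

    rng = np.random.default_rng(4)
    a = rng.random(2)
    x = X @ a                                              # any x in K
    coeffs = Xd @ x                                        # coordinates of expansion (207)
    assert np.all(coeffs >= -1e-12)                        # nonnegative because x belongs to K
    assert np.allclose(X @ coeffs, x)                      # x = X X† x, as in (210)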


92 CHAPTER 2. CONVEX GEOMETRY2.9 Formulae and algorithm finding dual cone2.9.1 Biorthogonal expansion, derivationUnique biorthogonal expansion with respect to polyhedral cone K dependsupon existence of its linearly independent extreme directions; K must thereforebe pointed, closed, and convex to uniquely represent any point in theirspan.In this section, we consider nonempty pointed polyhedral cone K havingpossibly empty interior. Hence we restrict observation to that section of thedual cone K ∗ in the affine hull of K because we are interested in biorthogonalexpansion of x ∈ aff K = aff coneX for X ∈ R n×N as in (156); weseek a vertex-description for K ∗ ∩ aff K in terms of a set of dual generators{Γ ∗ i }⊂aff K in the same finite quantity 2.42 as the extreme directions {Γ i } ofarranged columnar in X :K = cone(X) = {Xa | a ≽ 0} ⊆ R n (165)X = [ Γ 1 Γ 2 · · · Γ N ] ∈ R n×N (156)We assume the quantity of extreme directions N does not exceed the dimensionn of the ambient vector space because, otherwise, the expansion couldnot be unique; id est, assume N linearly independent extreme directions,hence N ≤ n (X skinny 2.43 or square, full-rank).2.9.1.1 x ∈ KSuppose x belongs to K ⊆ R n . Then x =Xa for some a ≽ 0. Vector a isunique only when {Γ i } is a linearly independent set. 2.44 Vector a ∈ R N cantake the form a =Bx if R(B)= R N . Then we require Xa =XBx = x andBx = BXa = a . The pseudoinverse B = X † ∈ R N×n (§E) is suitable whenX is skinny-or-square and full-rank. In that case rankX = N , and for allc ≽ 0 and i=1... N ,a ≽ 0 ⇔ X † Xa ≽ 0 ⇔ a T X T X †T c ≥ 0 ⇔ Γ T i X †T c ≥ 0 (217)2.42 When K is contained in a proper subspace of R n , the ordinary dual cone K ∗ will havemore (conic) generators in any minimal set than K has extreme directions.2.43 “Skinny” meaning thin; more rows than columns.2.44 Conic independence alone (§2.6.7) is insufficient to guarantee uniqueness.


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 93The penultimate inequality follows from the generalized inequality and membershipcorollary, while the last inequality is a consequence of that corollary’sdiscretization (§2.8.2.1.3). 2.45 From (217) and (194) we deduceK ∗ ∩ aff K = cone(X †T ) = {X †T c | c ≽ 0} ⊆ R n (218)is the vertex-description for that section of K ∗ in the affine hull of K becauseR(X †T ) = R(X) by definition of the pseudoinverse. From (176), we knowK ∗ ∩ aff K must be pointed if rel int K is logically assumed nonempty withrespect to aff K .Conversely, suppose full-rank skinny-or-square matrix[ ]X †T =∆ Γ ∗ 1 Γ ∗ 2 · · · Γ ∗ N ∈ R n×N (219)comprises the extreme directions {Γ ∗ i } ⊂ aff K of the dual cone section inthe affine hull of K . 2.46 From the discrete membership theorem and (179) weget a partial dual to (194); id est, assuming x∈aff cone X,{ }x ∈ K ⇔ γ ∗T x ≥ 0 for all γ ∗ ∈ Γ ∗ i , i=1... N ⊂ ∂K ∗ ∩ aff K (220)⇔ X † x ≽ 0 (221)that leads to a partial halfspace-description,K = { x∈aff cone X | X † x ≽ 0 } (222)2.45a ≽ 0 ⇔ a T X T X †T c ≥ 0 ∀(c ≽ 0 ⇔ a T X T X †T c ≥ 0 ∀a ≽ 0)∀(c ≽ 0 ⇔ Γ T i X †T c ≥ 0 ∀i)Intuitively, any nonnegative vector a is a conic combination of the standard basis{e i ∈ R N }; a≽0 ⇔ a i e i ≽ 0 for all i. The last inequality in (217) is a consequenceof the fact that x=Xa may be any extreme direction of K , in which case a is a standardbasis vector; a = e i ≽ 0. Theoretically, because c ≽0 defines a pointed polyhedralcone (in fact, the nonnegative orthant in R N ), we can take (217) one step further bydiscretizing c:a ≽ 0 ⇔ Γ T i Γ ∗ j ≥ 0 for i,j =1... N ⇔ X † X ≥ 0In words, X † X must be a matrix whose entries are each nonnegative.2.46 When closed convex K has empty interior, K ∗ has no extreme directions.


94 CHAPTER 2. CONVEX GEOMETRYFor γ ∗ = X †T e i , any x = Xa , and for all i we have e T i X † Xa = e T i a ≥ 0only when a ≽ 0. Hence x∈ K .When X is full-rank, then the unique biorthogonal expansion of x ∈ Kbecomes (210)x = XX † x =N∑i=1Γ i Γ ∗Ti x (223)whose coefficients must be nonnegative because K is relatively simplicial byassumption. Whenever X is full-rank, so is its pseudoinverse X † . (§E) Inthe present case, the columns of X †T are linearly independent and generatorsof the dual cone K ∗ ∩ aff K ; hence, the columns constitute its extremedirections. (§2.6.7) That section of the dual cone is itself a polyhedral cone(by (159) or the cone intersection theorem, §2.6.2) having the same numberof extreme directions as K .2.9.1.2 x ∈ aff KThe extreme directions of K and K ∗ ∩ aff K have a distinct relationship;because X † X = I , then for i,j = 1... N , Γ T i Γ ∗ i = 1, while for i ≠ j ,Γ T i Γ ∗ j = 0. Yet neither set of extreme directions, {Γ i } nor {Γ ∗ i } , is necessarilyorthogonal. This is the exact description of a biorthogonality condition,[43, §2.2.4] [28] implying each set of extreme directions is linearly independent.(§B.1.1.1)The biorthogonal expansion therefore applies more broadly; meaning, forany x ∈ aff K , vector x can be uniquely expressed x =Xb where b∈R N becauseaff K contains the origin. Thus, for any such x∈ R(X) (confer §E.1.1),the biorthogonal expansion (223) becomes x =XX † Xb =Xb .2.9.2 Dual of pointed K, X skinny-or-square full-rankWe wish to derive expressions for the convex cone and its ordinary dual underthe same general assumptions: pointed polyhedral K denoted by its extremedirections arranged columnar in matrix X such thatThe vertex-description is given:rank(X ∈ R n×N ) = N ∆ = dim aff K ≤ n (224)K = {Xa | a ≽ 0} ⊆ R n (225)


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 95from which a halfspace-description for the dual cone follows directly:By defining a matrixK ∗ = {y ∈ R n | X T y ≽ 0} (226)X ⊥ ∆ = basis N(X T ) (227)(a columnar basis for the orthogonal complement of R(X)), we can sayaff cone X = aff K = {x | X ⊥T x = 0} (228)meaning K lies in a subspace, perhaps R n . Thus we have a halfspacedescriptionK = {x ∈ R n | X † x ≽ 0, X ⊥T x = 0} (229)and from (179), a vertex-description 2.47K ∗ = { [X †T X ⊥ −X ⊥ ]b | b ≽ 0 } ⊆ R n (230)These results are summarized for a pointed polyhedral cone, having linearlyindependent generators, and its ordinary dual:Cone Table 1 K K ∗vertex-description X X †T , ±X ⊥halfspace-description X † , X ⊥T X T2.9.2.0.1 Example. Theorem of the alternative for linear inequality.A myriad of alternative systems of linear inequalities can now be explained interms of polyhedral cones and their duals. Consider the polyhedral cone K ,for example, constrained to lie in a subspace of R n specified by an intersectionof hyperplanes through the origin {x∈ R n |A T x=0} where A∈ R n×m . Thenfrom (183) we may write,b − Ay ∈ K ∗ ⇔ x T (b − Ay)≥ 0 ∀x ∈ K (231)A T x=0, b − Ay ∈ K ∗ ⇒ x T b ≥ 0 ∀x ∈ K (232)2.47 These descriptions are not unique. A vertex-description of the dual cone, for example,might use four conically independent generators for a plane (§2.6.7.1) when only threewould suffice.


96 CHAPTER 2. CONVEX GEOMETRYConversely, suppose we are given the membership relation (231). (§2.8.2.0.1)Then because x T Ay is unbounded below over all y ∈ R m , x T (b − Ay) ≥ 0implies A T x=0; id est, K lies in a subspace: for y ∈ R m ,A T x=0, b − Ay ∈ K ∗ ⇐ x T (b − Ay)≥ 0 ∀x ∈ K (233)In toto, for any y ∈ R m ,b − Ay ∈ K ∗ ⇔ x T b ≥ 0, A T x=0 ∀x ∈ K (234)From this, the alternative systems of inequalities for pointed polyhedral conesK and K ∗ [30, p.201]Ay ≼K ∗bexclusive or(235)x T b < 0, A T x=0 ∀x ≽K0derived from (234) simply by taking the complementary sense of the inequalityin x T b . These two systems are incompatible; only one is feasible.By invoking a strict membership relation (187), we can construct a moreexotic interdependency;b − Ay ≻K ∗0 ⇔ x T b > 0, A T x=0 ∀x ≽K0, x ≠ 0 (236)From this, the alternative systems of inequalities [1, pp.50, 54, 262]Ay ≺K ∗bexclusive or(237)x T b ≤ 0, A T x=0 ∀x ≽K0, x ≠ 0derived from (236) taking the complementary sense of the inequality in x T b .


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 972.9.2.1 Simplicial caseWhen a convex cone is simplicial (§2.7.3), Cone Table 1 simplifies becausethen aff coneX = R n : For square X and assuming simplicial K such thatrank(X ∈ R n×N ) = N ∆ = dim aff K = n (238)we haveCone Table S K K ∗vertex-description X X †Thalfspace-description X † X TFor example, vertex-description (230) simplifies toK ∗ = {X †T b | b ≽ 0} ⊂ R n (239)Now, because dim R(X) = dim R(X †T ) , (§E) the dual cone K ∗ is simplicialwhenever K is.2.9.2.2 Ambient space = aff KIt is obvious by definition (170) of the ordinary dual cone K ∗ in ambient vectorspace S that its determination instead in ambient vector space M ⊆ S isidentical to its intersection with M ; id est, assuming cone K ⊆ M ,becauseK ∗ in ambient M ⊆ S = K ∗ ∩ M (240){y ∈ M | 〈y, x〉 ≥ 0 for all x ∈ K} = {y ∈ S | 〈y, x〉 ≥ 0 for all x ∈ K} ∩ M(241)From this, a constrained and new membership relation (§2.8.2.0.1) for theordinary dual cone K ∗ ⊆ S , assuming x,y ∈ M and K ⊆ M ,y ∈ K ∗ ∩ M ⇔ 〈y, x〉 ≥ 0 for all x ∈ K (242)By closure in ambient M we have symmetry (§2.8.1.1):x ∈ K ⇔ 〈y, x〉 ≥ 0 for all y ∈ K ∗ ∩ M (243)This means membership determination in ambient M requires knowledge ofthe dual cone only in M .
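The descriptions collected in Cone Table 1 are straightforward to exercise numerically for a cone lying in a proper subspace; the Python/NumPy sketch below (illustrative only, with arbitrarily chosen generators) takes a pointed cone with two extreme directions in a two-dimensional subspace of R 3 , builds X ⊥ as a basis for N(X T ) as in (227), and confirms that points of K ∗ from vertex-description (230) have nonnegative inner product with points of K from (225), and that the halfspace-descriptions (226) and (229) hold.

    import numpy as np

    X = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])                 # two extreme directions of K in a 2-dim subspace of R^3
    Xd = np.linalg.pinv(X)                     # X dagger
    Xperp = np.array([[1.0], [1.0], [-1.0]])   # a basis for N(X^T), as in (227)

    rng = np.random.default_rng(5)
    for _ in range(1000):
        x = X @ rng.random(2)                                   # x in K, vertex-description (225)
        y = np.hstack([Xd.T, Xperp, -Xperp]) @ rng.random(4)    # y in K*, vertex-description (230)
        assert x @ y >= -1e-9
        assert np.all(X.T @ y >= -1e-9)                         # halfspace-description (226) of K*
        assert np.all(Xd @ x >= -1e-9) and np.allclose(Xperp.T @ x, 0)   # (229) for K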


98 CHAPTER 2. CONVEX GEOMETRYAssume now an ambient space M that is the affine hull of cone K : Consideragain a pointed polyhedral cone K denoted by its extreme directionsarranged columnar in matrix X such thatrank(X ∈ R n×N ) = N ∆ = dim aff K ≤ n (224)We want expressions for the convex cone and its dual in ambient M=aff K :Cone Table A K K ∗ ∩ aff Kvertex-description X X †Thalfspace-description X † , X ⊥T X T , X ⊥TWhen dim aff K = n , this table reduces to Cone Table S. These descriptionsfacilitate work in an ambient space that may be a proper subspace. Thesubspace of symmetric matrices S N , for example, often serves as ambientspace. 2.482.9.2.2.1 Example. The monotone nonnegative cone.[1, exer.2.33] [73, §2] Simplicial cone (§2.7.3.0.1) K M+ is the cone of allnonnegative vectors having their entries sorted in nonincreasing order:K M+ ∆ = {x | x 1 ≥ x 2 ≥ · · · ≥ x n ≥ 0} ⊆ R n += {x | (e i − e i+1 ) T x ≥ 0, i = 1... n−1, e T nx ≥ 0}= {x | X † x ≽ 0}(244)a halfspace-description where e i is the i th standard basis vector, and whereX †T ∆ = [ e 1 −e 2 e 2 −e 3 · · · e n ] ∈ R n×n (245)(With X † in hand, we might concisely scribe the remaining vertex andhalfspace-descriptions from the tables for K M+ and its dual. Instead weuse generalized inequality in their derivation.) For any x and y , simplealgebra demandsn∑x T y = x i y i = (x 1 − x 2 )y 1 + (x 2 − x 3 )(y 1 + y 2 ) + (x 3 − x 4 )(y 1 + y 2 + y 3 ) + · · ·i=1+ (x n−1 − x n )(y 1 + · · · + y n−1 ) + x n (y 1 + · · · + y n )(246)2.48 The dual cone of positive semidefinite matrices S N∗+ = S N + remains in S N by convention,whereas the ordinary dual cone would venture into R N×N .


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 99
Because x i − x i+1 ≥ 0 ∀ i by assumption whenever x ∈ K M+ , we can employ generalized inequality (185) with respect to the self-dual nonnegative orthant R n + to find the halfspace-description of the dual cone K ∗ M+ . We can say x T y ≥ 0 for all X † x ≽ 0 [sic] if and only if
y 1 ≥ 0 , y 1 + y 2 ≥ 0 , ... , y 1 + y 2 + · · · + y n ≥ 0 (247)
id est,
x T y ≥ 0 ∀ X † x ≽ 0 ⇔ X T y ≽ 0 (248)
where
X = [ e 1   e 1 + e 2   e 1 + e 2 + e 3   · · ·   1 ] ∈ R n×n (249)
Because X † x ≽ 0 connotes membership of x to pointed K M+ , then by (170) the dual cone we seek comprises all y for which (248) holds; thus its halfspace-description,
K ∗ M+ = { y | y 1 + y 2 + · · · + y k ≥ 0 , k = 1... n } = { y | X T y ≽ 0 } ⊂ R n (250)
The monotone nonnegative cone and its dual are simplicial, illustrated for two Euclidean spaces in Figure 2.24.
From §2.8.2.2, the extreme directions of proper K M+ are respectively orthogonal to the facets of K ∗ M+ . Because K ∗ M+ is simplicial, the inward-normals to its facets constitute the linearly independent rows of X T by (250). Hence the vertex-description for K M+ employs the columns of X in agreement with Cone Table S because X † = X −1 . Likewise, the extreme directions of proper K ∗ M+ are respectively orthogonal to the facets of K M+ whose inward-normals are contained in the rows of X † by (244). So the vertex-description for K ∗ M+ employs the columns of X †T .
2.9.2.2.2 Example. Monotone cone. (Figure 2.25)
Of nonempty interior but not pointed, the monotone cone is polyhedral and defined by the halfspace-description
K M ∆= {x ∈ R n | x 1 ≥ x 2 ≥ · · · ≥ x n } = {x ∈ R n | X ∗T x ≽ 0} (251)
Its dual is therefore pointed but of empty interior, having vertex-description
K ∗ M = { X ∗ b ∆= [ e 1 −e 2   e 2 −e 3   · · ·   e n−1 −e n ] b | b ≽ 0 } ⊂ R n (252)


100 CHAPTER 2. CONVEX GEOMETRY
where the columns of X ∗ comprise the extreme directions of K ∗ M . Because K ∗ M is pointed and satisfies
rank(X ∗ ∈ R n×N ) = N ∆= dim aff K ∗ ≤ n (253)
where N = n−1, and because K M is closed and convex, we may adapt Cone Table 1 as follows:
Cone Table 1*           K ∗              K ∗∗ = K
vertex-description      X ∗              X ∗†T , ±X ∗⊥
halfspace-description   X ∗† , X ∗⊥T     X ∗T
The vertex-description for K M is therefore
K M = { [X ∗†T X ∗⊥ −X ∗⊥ ] a | a ≽ 0 } ⊂ R n (254)
where X ∗⊥ = 1 and
X ∗† = (1/n) ×
[ n−1   −1    −1   · · ·   −1      −1
  n−2   n−2   −2   · · ·   −2      −2
  n−3   n−3   n−3  · · ·   −3      −3
   .     .     .    . .     .       .
   2     2     2   · · ·  −(n−2)  −(n−2)
   1     1     1   · · ·    1     −(n−1) ] ∈ R n−1×n (255)
(its i , j entry equals (n−i)/n for j ≤ i and −i/n for j > i), while
K ∗ M = {y ∈ R n | X ∗† y ≽ 0 , X ∗⊥T y = 0} (256)
is the dual monotone cone halfspace-description.
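The matrices in these two examples can be validated mechanically; the following Python/NumPy sketch (not part of the original text) checks that X in (249) and X †T in (245) are inverse transposes of one another, and that the closed form (255) agrees with the Moore–Penrose pseudoinverse of X ∗ from (252), with 1 lying in N(X ∗T ) as claimed.

    import numpy as np

    n = 6
    # monotone nonnegative cone: X from (249), X daggerT from (245)
    X = np.triu(np.ones((n, n)))                       # columns e1, e1+e2, ..., 1
    XdT = np.eye(n) - np.eye(n, k=-1)                  # columns e1-e2, ..., e_{n-1}-e_n, e_n
    assert np.allclose(XdT.T @ X, np.eye(n))           # X dagger = X inverse, Cone Table S applies

    # monotone cone: X* from (252) and the closed form (255) for its pseudoinverse
    Xs = (np.eye(n) - np.eye(n, k=-1))[:, :n-1]        # columns e1-e2, ..., e_{n-1}-e_n
    closed = np.array([[(n - i)/n if j <= i else -i/n
                        for j in range(1, n + 1)] for i in range(1, n)])   # (255)
    assert np.allclose(np.linalg.pinv(Xs), closed)
    assert np.allclose(Xs.T @ np.ones(n), 0)           # 1 lies in N(X*^T), consistent with X*perp = 1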


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 101where R(Z ∈ R n×n−rank C ) ∆ = N(C) . Assuming (224) is satisfiedrankX ∆ = rank ( (AZ) † ∈ R n−rank C×m) = m − l = dim aff K ≤ n − rankC(258)where l is the number of conically dependent rows in AZ (§2.6.7) thatmust be removed before the cone tables become applicable. Then the resultscollected in the cone tables imply the assignment ˆX ∆ =(ÂZ)† ∈ R n−rank C×m−l ,where Â∈ Rm−l×n , can be followed with linear transformation by Z . So, weget the vertex-descriptionK = {Z(ÂZ)† b | b ≽ 0} (259)From this and (189) we get a halfspace-description of the dual cone,K ∗ = {y ∈ R n | (Z T Â T ) † Z T y ≽ 0} (260)From this and Cone Table 1 we get a vertex-description, (1175)Yet becauseK ∗ = {[Z †T (ÂZ)T C T −C T ]c | c ≽ 0} (261)K = {x | Ax ≽ 0} ∩ {x | Cx = 0} (262)then, by (179), we get an equivalent vertex-description for the dual coneK ∗ = {x | Ax ≽ 0} ∗ + {x | Cx = 0} ∗= {[A T C T −C T ]b | b ≽ 0}(263)from which the conically dependent columns may, of course, be removed.2.9.3 Dual of proper non-simplicial K, X fat full-rankHaving found formula (239) to determine the dual of a simplicial cone, theeasiest way to find the vertex-description for the dual of an arbitrary polyhedralproper cone K in R n is to first decompose it into simplicial parts K iso that K = ⋃ K i . 2.49 The existence of multiple simplicial parts means the2.49 That proposition presupposes, of course, that we know how to perform the simplicialdecomposition efficiently; also called triangulation. [74] [75, §3.1] [76, §3.1]


102 CHAPTER 2. CONVEX GEOMETRYbiorthogonal expansion of x∈ K like (223) can no longer be unique becausethe number of extreme directions in K exceeds n the dimension of the space.Assume we are given a set of N conically independent generators 2.50(§2.6.7) of proper K arranged columnar in X ∈ R n×N such that n


2.9. FORMULAE AND ALGORITHM FINDING DUAL CONE 103Defining X = ∆ [X 1 X 2 · · · X M ] (with any redundant columns optionally removedfrom X), then K ∗ can be expressed, (195) (Cone Table S)M⋂M⋂K ∗ = {y | X T y ≽ 0} = {y | Xi T y ≽ 0} = K ∗ i (269)i=1To find the extreme directions of the dual cone, first we observe that somefacets of each simplicial part K i are common to facets of K by assumption,and the union of all those common facets comprises the set of all facets of Kby design. For any polyhedral proper cone K , the extreme directions of thedual cone K ∗ are respectively orthogonal to the facets of K . (§2.8.2.2) Thenthe extreme directions of the dual cone can be found among the inwardnormalsto facets of the component simplicial cones K i ; those normals areextreme directions of the dual simplicial cones K ∗ i . From the theorem andCone Table S,K ∗ =M⋂K ∗ i =i=1M⋂i=1i=1{X †Ti c | c ≽ 0} (270)The set of extreme directions {Γ ∗ i } for the proper dual cone K ∗ is thereforeconstituted by the conically independent generators, from the columns of allthe dual simplicial matrices {X †Ti }, that do not violate its discrete definition(195):{ } {}Γ ∗ 1 , Γ ∗ 2 ... Γ ∗ N = c.i. X †Ti (:,j), i =1... M, j =1...n | X † i (j,:)Γ l ≥ 0, l =1... N(271)where c.i. denotes the selection of only the conically independent vectorsfrom the argument set, argument (:,j) denotes the j th column while (j,:)denotes the j th row, and {Γ l } constitutes the extreme directions of K .Figure 2.16(b) (p.71) shows a cone and its dual found via this formulation.Example. Dual of K non-simplicial with respect to ambient space aff K .Given conically independent generators for pointed closed convex K in R 4 ,arranged columnar inX = [ Γ 1 Γ 2 Γ 3 Γ 4 ] =⎡⎢⎣1 1 0 0−1 0 1 00 −1 0 10 0 −1 −1⎤⎥⎦ (272)


X 1 =4X †T1 =⎡⎢⎣⎡⎢⎣104 CHAPTER 2. CONVEX GEOMETRYhaving dim aff K =rankX = 3, then performing the most inefficient simplicialdecomposition in aff K we find⎤ ⎡ ⎤ ⎡ ⎤ ⎡1 1 0 1 0 0⎥⎦ , X 2 = ⎢ −1 0 0⎥⎣ 0 −1 1 ⎦ , X 3 = ⎢ −1 1 0⎥⎣ 0 0 1 ⎦ , X 4 = ⎢⎣0 0 −1 0 −1 −11 1 0−1 0 10 −1 00 0 −12 1 1−2 1 12 −3 1−2 1 −31 0 00 1 0−1 0 10 −1 −1(273)The corresponding dual simplicial cones in aff K have generators respectivelycolumnar in⎤ ⎡ ⎤ ⎡ ⎤ ⎡3 2 −13 −1 2⎥⎦ , 4X†T 2 = ⎢ ⎥⎣ ⎦ , 4X†T 3 = ⎢ −1 2 −1⎥⎣ ⎦ , 4X†T 4 = ⎢ −1 3 −2⎣Applying (271) we get1 2 1−3 2 11 −2 11 −2 −3[ ]Γ ∗ 1 Γ ∗ 2 Γ ∗ 3 Γ ∗ 4⎡= 1 ⎢4⎣−1 −2 3−1 −2 −11 2 3 21 2 −1 −21 −2 −1 2−3 −2 −1 −2⎤(274)⎥⎦ (275)whose rank is 3, and is the known result; 2.51 the conically independent generatorsfor that pointed section of the dual cone K ∗ in aff K ; id est, K ∗ ∩aff K .2.10 Polyhedral inscription in PSD coneprimal and dual inscribed conesobjective is bounded means number of faces not exponential as aspremontpredicts.inscription example in §2.8.3.1.1.2.51 These calculations proceed so as to be consistent with [77, §6]; as if the ambient vectorspace were the proper subspace aff K whose dimension is 3. In that ambient space, Kmay be regarded as a proper cone. Yet that author (from the citation) erroneously statesthe dimension of the ordinary dual cone to be 3 ; it is, in fact, 4.⎤⎥⎦−1 −1 2−1 −1 −2⎤⎥⎦


2.10. POLYHEDRAL INSCRIPTION IN PSD CONE 105
Figure 2.17: The unit simplex in R 3 is a unique solid tetrahedron, but is not regular.


106 CHAPTER 2. CONVEX GEOMETRY
Figure 2.18: Two views of a simplicial cone and its dual in R 3 . The semi-infinite boundary of each cone is truncated for illustration. The Cartesian axes are drawn for reference.


2.10. POLYHEDRAL INSCRIPTION IN PSD CONE 107
Figure 2.19: Two equivalent constructions of dual cone K ∗ : (a) Showing construction by intersection of halfspaces. Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R 2 are drawn (truncated). (b) Suggesting construction by union of inward-normals y to each and every hyperplane supporting K . This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed (§2.3.2.4).
Figure 2.20: K is a halfspace about the origin in R 2 . K ∗ is a ray (base 0), hence has empty interior in R 2 ; so K cannot be pointed. (Both convex cones appear truncated.)


108 CHAPTER 2. CONVEX GEOMETRY
Figure 2.21: K is a pointed empty-interior polyhedral cone in R 3 (drawn truncated and parallel to the floor on which you stand). K ∗ is a wedge whose truncated boundary is illustrated (drawn perpendicular to the floor). In this particular instance, K ⊂ int K ∗ (excepting the origin). Cartesian coordinate axes drawn for reference.


2.10. POLYHEDRAL INSCRIPTION IN PSD CONE 109
Figure 2.22: K and K ∗ are halfplanes in R 3 . Both semi-infinite convex cones appear truncated. Each cone is like K in Figure 2.20, but embedded in a two-dimensional subspace of R 3 . Cartesian coordinate axes drawn for reference.


110 CHAPTER 2. CONVEX GEOMETRY
Figure 2.23: Polyhedral proper cone K in R 2 and its dual K ∗ in R 2 drawn truncated. Conically independent generators Γ 1 and Γ 2 constitute extreme directions of K while Γ 3 and Γ 4 constitute extreme directions of K ∗ (Γ 1 ⊥ Γ 4 and Γ 2 ⊥ Γ 3 , as annotated). Point x comparable to point z (and vice versa) but not to y ; z ≽ x ⇔ z − x ∈ K ⇔ z − x ≽ K 0 iff coordinates for biorthogonal expansion of z − x are nonnegative. Point y not comparable to z because z does not belong to y ± K . Points need not belong to K to be comparable; e.g., all points greater than w belong to w + K . Were w lifted to three dimensions (K in R 2 and its ordinary dual K ∗ in R 3 for example), all members of the translated cone w + K would remain comparable to w ∈ R 3 .


2.10. POLYHEDRAL INSCRIPTION IN PSD CONE 111
Figure 2.24: Simplicial cones. (a) Monotone nonnegative cone K M+ and its dual K ∗ M+ (drawn truncated) in R 2 . (b) Monotone nonnegative cone and boundary of its dual (both drawn truncated) in R 3 ; the cone is generated by the columns of X = [ e 1  e 1 +e 2  e 1 +e 2 +e 3 ] , and the extreme directions of K ∗ M+ , the columns of X †T = [ e 1 −e 2  e 2 −e 3  e 3 ] , are indicated.


112 CHAPTER 2. CONVEX GEOMETRY
Figure 2.25: Two views of the monotone cone K M and its dual K ∗ M (drawn truncated) in R 3 . The monotone cone is not pointed. The dual monotone cone has empty interior. The Cartesian coordinate axes are drawn for reference. In R 2 , the monotone cone and its dual resemble Figure 2.20.


Chapter 3Geometry of convex functionsThe link between convex sets and convex functions is via theepigraph: A [real] function is convex if and only if its epigraph isa convex set.−Stephen Boyd & Lieven Vandenberghe [1, §3.1.7]We limit our treatment of multidimensional functions to finitedimensionalEuclidean space. Then the icon for the one-dimensional convexfunction is bowl-shaped (Figure 3.2), whereas the concave icon is the invertedbowl; respectively characterized by a unique global minimum andmaximum whose existence is assumed. Because of this simple relationship,the usage of the term convexity is often implicitly inclusive of concavity inthe literature. Despite the iconic imagery, the reader is reminded that theset of all convex, concave, quasiconvex, and quasiconcave functions containsthe monotone [25] [24, §2.3.5] functions; e.g., [1, §3.6, exer.3.46].3.0 c○ 2001 Jon Dattorro, all rights reserved.113


114 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS
3.1 Convex function
3.1.1 Vector-valued function
The vector-valued continuous function f(X) : R p×k → R M is convex in X if and only if dom f is a convex set and, for each and every Y , Z ∈ dom f and all 0 ≤ µ ≤ 1,
f(µY + (1 − µ)Z) ≼ R M + µf(Y ) + (1 − µ)f(Z) (276)
Reversing the sense of the inequality flips this definition to concavity. Since comparison of vectors here is with respect to R M + (196), the M-dimensional nonnegative orthant, the test prescribed by (276) is simply a comparison on R of each entry of the vector function. The vector-valued function case is therefore a straightforward generalization of conventional convexity theory for a real function. [1, §3, §4]
This same conclusion also follows from theory of generalized inequality (§2.8.2.0.1) that implies
f convex ⇔ w T f convex ∀ w ≽ 0 (277)
shown by substituting the defining inequality (276). Discretization (§2.8.2.1.3) allows relaxation of the semi-infinite number of constraints w ≽ 0 to just the standard basis vectors w ∈ {e i , i = 1... M} (a minimal set of generators (§2.6.4.1) for R M + ), from which the stated conclusion follows.
Relation (277) further implies the space of all vector-valued convex functions is a closed convex cone. Indeed, any nonnegatively weighted sum of convex functions remains convex. Certainly any nonnegative sum of real convex functions remains convex.
When f(X) instead satisfies, for each and every distinct Y and Z in dom f and all 0 < µ < 1,
f(µY + (1 − µ)Z) ≺ R M + µf(Y ) + (1 − µ)f(Z) (278)
then f is called strictly convex; this strict sense guarantees uniqueness of the global solution to the minimization


3.1. CONVEX FUNCTION 115problem inf f(X) over some abstract convex set C . The vector (Euclidean)X∈ C2-norm ‖x‖ and Frobenius norm ‖X‖ F , for example, are strictly convexfunctions of their respective argument. [1, §8.1]3.1.1.1 Affine functionA function f(X) is affine when it is continuous and has the dimensionallyextensible form (confer §2.6.6.3.4)f(X) = AX + B (279)When B = 0 then f(X) is a linear function. Variegated multidimensionalaffine functions are recognized by the existence of no multivariate terms inargument entries and no polynomial terms in argument entries of degreehigher than 1 ; id est, entries of the function are characterized only by linearcombinations of the argument entries plus constants.All affine functions are simultaneously convex and concave.For X ∈ S M and matrices A,B, Q, R of any compatible dimensions, forexample, the expression XAX is not affine in X , whereas[ ]R Bg(X) =T XXB Q + A T (280)X + XAis an affine multidimensional function. Such a function is typical in engineeringcontrol. [78, §2.2] 3.1 [3] [5]3.1.1.2 EpigraphIt is well established that a continuous real function ˆf(X) : R p×k →R isconvex if and only if its epigraph makes a convex set;ˆf convex ⇔ epi ˆf convex (281)where the epigraph is defined, [29] [30] [35] [32] [37]epi ˆf ∆ = {(X,t) | X ∈dom ˆf , ˆf(X) ≤ t } ⊆ R p×k × R (282)3.1 The interpretation from this citation of {X ∈ S M | g(X) ≽ 0} as “an intersectionbetween a linear subspace and the cone of positive semidefinite matrices” is incorrect.(See §2.6.6.3.4 for a similar example.) The conditions they state under which strongduality holds for semidefinite programming are conservative. (confer §6.1.2.3.1)


116 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS
Figure 3.1: Gradient in R 2 evaluated on grid over some open disc in domain of convex quadratic bowl ˆf(Y ) = Y T Y : R 2 → R illustrated in Figure 3.2. Circular contours are level sets; the coordinate axes are Y 1 and Y 2 .
Thus the epigraph is a classical connection between convex sets and real convex functions.
Generalization to a vector-valued function f(X) : R p×k → R M yields only a partial result; from (277): 3.2
epi f = {(X , t) | X ∈ dom f , w T f(X) ≤ t , ∀ w ≽ 0} ⊆ R p×k × R
      = {(X , t) | X ∈ dom f , f(X) ≼ R M + t1} (283)
id est,
f convex ⇒ epi f convex (284)
This partial result follows because a convex set can be formed from the intersection of non-convex sets, and (283) describes an intersection. Yet every real function constituting each entry of the convex vector-valued function must, of course, correspond to a convex epigraph.
3.2 By generalized inequality (§2.8.2.0.1), w T (t1 − f) ≥ 0 ∀ w ≽ 0 ⇔ t1 − f ≽ 0.
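A quick numerical sanity check (added here, not from the text) of definition (276) and the equivalence (277), using a hypothetical vector-valued function f(x) = (x T x , ‖x‖ 1 ) whose two entries are each convex:

    import numpy as np

    def f(x):                                  # a sample vector-valued convex function, R^3 -> R^2
        return np.array([x @ x, np.abs(x).sum()])

    rng = np.random.default_rng(6)
    for _ in range(1000):
        Y, Z = rng.standard_normal(3), rng.standard_normal(3)
        mu = rng.random()
        lhs = f(mu*Y + (1 - mu)*Z)
        rhs = mu*f(Y) + (1 - mu)*f(Z)
        assert np.all(lhs <= rhs + 1e-12)      # (276): componentwise, i.e. with respect to R^2_+
        w = rng.random(2)                      # any w >= 0
        assert w @ lhs <= w @ rhs + 1e-12      # (277): w^T f is a convex real function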


3.1. CONVEX FUNCTION 1173.1.1.3 GradientThe gradient ∇f (§D.1) of a multidimensional function f maps each entryf i to a space having the same dimension as the ambient space of its domain;it can be considered a vector pointing in the direction of greatest change.[79, §15.6] For a one-dimensional function of real variable, for example, thegradient is just the slope of that function evaluated at any point in thedomain. For the quadratic bowl in Figure 3.2, the gradient maps to R 2 ;illustrated in Figure 3.1.Example. Hyperplane, line, described by affine function. Consider thereal affine function of vector variable,ˆf(x) : R p →R = a T x + b (285)whose domain is R p and whose gradient ∇ ˆf(x) = a is a constant vector(independent of x). This function describes the real line R , its range, andit describes a non-vertical [29, §B.1.2] hyperplane ∂H in the space R p × Rfor any vector a (confer §2.3.2);{[∂H =xa T x + b] }| x∈ R p ⊂ R p ×R (286)having nonzero normalη =[ a−1]∈ R p ×R (287)This equivalence to a hyperplane holds only for real functions. 3.3• The epigraph of the real affine function ˆf(x) is therefore a halfspace,and so the real affine function is to convex functions as the halfspaceis to convex sets.3.3 To prove that, consider a vector-valued affine functionf(x) : R p →R M = Ax + bhaving gradient ∇f(x)=A T ∈ R p×M : The affine set{[ ] }x| x∈ R p ⊂ R p ×R MAx + b


118 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSSimilarly, the matrix-valued affine function of real variable x , for anymatrix A∈ R M×N ,h(x) : R→R M×N = Ax + B (288)describes a line in R M×N in direction Aand describes a line in R×R M×N{[xAx + B]whose slope with respect to x is A .{Ax + B | x∈ R} ⊆ R M×N (289)}| x∈ R ⊂ R×R M×N (290)3.1.1.3.1 First-order convexity condition, vector-valued functionWhen a real function ˆf is differentiable at each point in its open domain,there is an intuitive geometric interpretation of function convexity in termsof its gradient ∇ ˆf and its epigraph.Consider the first-order necessary and sufficient condition for convexity:Differentiable vector-valued function f(X) : R p×k →R M is convex if and onlyif domf is open, convex, and for all X,Y ∈ domf ,f(Y ) ≽R M +f(X) +→Y −Xdf(X)= f(X) + d dt∣ f(X+ t (Y − X)) (291)t=0→Y −Xwhere df(X) is the directional derivative 3.4 [79] [80] of f at X in directionY − X , and where the right-hand side of the inequality is the first-orderTaylor series expansion of f about X.is perpendicular tobecauseη T ([η ∆ =[∇f(x)−IxAx + b]∈ R p×M × R M×M] [0−b])= 0, ∀x ∈ R pYet η is a vector (in R p ×R M ) only when M = 1.3.4 We extend the traditional definition of directional derivative in §D.1.4 so that directionmay be indicated by a vector or a matrix, thereby broadening the scope of the Taylor series(§D.1.6).
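Condition (291), specialized to a real convex quadratic, can be verified numerically; the Python/NumPy sketch below is illustrative only and uses an arbitrarily generated positive semidefinite A , so the quadratic is convex and must lie above its first-order Taylor expansion everywhere.

    import numpy as np

    rng = np.random.default_rng(7)
    B = rng.standard_normal((4, 4))
    A = B @ B.T                                 # positive semidefinite, so f is convex
    b = rng.standard_normal(4)

    f = lambda x: x @ A @ x + b @ x
    grad = lambda x: 2*A @ x + b                # gradient of f

    for _ in range(1000):
        X, Y = rng.standard_normal(4), rng.standard_normal(4)
        # first-order condition: f lies above its tangent plane at X
        assert f(Y) >= f(X) + grad(X) @ (Y - X) - 1e-9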


3.1. CONVEX FUNCTION 119
Figure 3.2: Drawn is a convex quadratic bowl in R 3 (confer Figure D.1, p.361); ˆf(Y ) = Y T Y : R 2 → R versus Y on some open disc in R 2 . The supporting hyperplane ∂H − ∈ R 2 × R (which is tangent, only partially drawn) and its normal vector [ ∇ ˆf(X) T −1 ] T at the point of support [X T ˆf(X) ] T are also illustrated.
Necessary and sufficient discretization (§2.8.2.1.3) of w ≽ 0 in (277) invites refocus to the real-valued function: When ˆf(X) : R p×k → R is a real differentiable convex function with matrix argument on open convex domain, from (1138)
ˆf(Y ) ≥ ˆf(X) + tr ( ∇ ˆf(X) T (Y − X) ) (292)
is the first-order necessary and sufficient condition for convexity.
When ˆf(X) : R p → R is a real differentiable convex function with vector argument on open convex domain, there is further simplification of this first-order condition (291); [1, §3.1.3] [33, §1.2] [4, §1.2.3]
ˆf(Y ) ≥ ˆf(X) + ∇ ˆf(X) T (Y − X) (293)
From this we can find a unique [32, §5.5] non-vertical [29, §B.1.2] hyperplane ∂H − (§2.3), expressed in terms of the function gradient, supporting epi ˆf ; videlicet, [1, §3.1.7] defining ˆf(Y /∈ dom ˆf) = ∞ ,
[ Y ; t ] ∈ epi ˆf ⇔ t ≥ ˆf(Y ) ⇒ [ ∇ ˆf(X) T −1 ] ( [ Y ; t ] − [ X ; ˆf(X) ] ) ≤ 0 (294)


120 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS
Figure 3.3: Shown is a plausible contour plot in R 2 of some arbitrary real convex function ˆf(Y ) at selected levels α , β , and γ (with α ≥ β ≥ γ ); contours in the function’s domain of equal level ˆf (level sets). A convex function has convex sublevel sets L X ˆf(X) (296). [30, §4.6]
This means for each and every point X in the domain of a real convex function, there exists a hyperplane in R p × R having normal [ ∇ ˆf(X) ; −1 ] supporting the function epigraph at [ X ; ˆf(X) ] ∈ ∂H − . One such supporting hyperplane is illustrated in Figure 3.2 for a convex quadratic.
From (293) we deduce for all X , Y ∈ dom ˆf :
∇ ˆf(X) T (Y − X) ≥ 0 ⇒ ˆf(Y ) ≥ ˆf(X) (295)
meaning, the gradient at X identifies a supporting hyperplane there (in R p ) to the sublevel sets of the function ˆf ,
L X ˆf(X) = {Y ∈ dom ˆf | ˆf(Y ) ≤ ˆf(X)} ⊆ R p (296)
illustrated for an arbitrary real convex function in Figure 3.3.


3.1. CONVEX FUNCTION 1213.1.1.4 Second-order convexity condition, vector-valued functionAgain, by discretization, we are obliged to consider only each separate entryof a vector function. For f(X) : R p →R M , a twice differentiable vectorfunction with vector argument on open convex domain,∇ 2 f i (X) ≽0 ∀X ∈ domf , i=1... M (297)S p +is a necessary and sufficient condition for convexity of f . Strict inequalityis a sufficient condition for strict convexity.For a twice-differentiable real function ˆf(X) : R p →R having opendomain, a consequence of the mean value theorem from calculus allows preciseexpression of its complete Taylor series expansion about X ∈ dom ˆf(§D.1.6) using only three terms: On some open interval of ‖Y ‖ so each andevery line segment [X,Y ] belongs to dom ˆf , there exists an α ∈ [0, 1] suchthat [4, §1.2.3] [33, §1.1.4]ˆf(Y ) = ˆf(X) + ∇ ˆf(X) T (Y −X) + 1 2 (Y −X)T ∇ 2ˆf(αX + (1 − α)Y )(Y −X)(298)The first-order condition for convexity (293) follows directly from this andthe second-order condition (297).We need different tools for matrix argument:3.1.2 Matrix-valued functionWe are primarily interested in continuous matrix-valued functions g(X).We choose symmetric g(X) ∈ S M because matrix-valued functions are mostoften compared (299) with respect to the positive semidefinite cone S M + inthe ambient space of symmetric matrices. 3.53.5 Function symmetry is not a necessary requirement for convexity; indeed, for A∈R m×pand B ∈R m×k , g(X) = AX + B is a convex (affine) function in X on domain R p×k withrespect to the nonnegative orthant R m×k+ . Symmetric convex functions share the samebenefits as symmetric matrices. Horn & Johnson [28, §7.7] liken symmetric matrices toreal numbers, and (symmetric) positive definite matrices to positive real numbers.


122 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS
Definition. Convex matrix-valued function.
1) Matrix-definition.
A function g(X) : R p×k → S M is convex in X iff dom g is a convex set and, for each and every Y , Z ∈ dom g and all 0 ≤ µ ≤ 1, [24, §2.3.7]
g(µY + (1 − µ)Z) ≼ S M + µg(Y ) + (1 − µ)g(Z) (299)
Reversing the sense of the inequality flips this definition to concavity. Strict convexity is defined less a stroke of the pen in (299) similarly to (278).
2) Scalar-definition.
It follows that g(X) : R p×k → S M is convex in X iff w T g(X)w : R p×k → R is convex in X for each and every ‖w‖ = 1 ; shown by substituting the defining inequality (299). By generalized inequality (§2.8.2.4) we have the equivalent but more general criterion,
g convex ⇔ 〈W , g〉 convex ∀ W ≽ 0 (300)
Because the set of all extreme directions for the positive semidefinite cone comprises a minimal set of generators for that cone, discretization (§2.8.2.1.3) allows replacement of matrix W with symmetric dyad ww T as proposed. △
The generalized inequality (300) from this scalar-definition implies the space of all matrix-valued convex functions is a closed convex cone.
3.1.2.1 First-order convexity condition, matrix-valued function
From the scalar-definition we have, for each and every real vector w of unit norm ‖w‖ = 1 ,
w T g(Y )w ≥ w T g(X)w + w T ( →Y−X dg(X) ) w (301)
which follows immediately from the first-order condition for convexity of a real function (293). By generalized inequality (confer (291))
g(Y ) ≽ S M + g(X) + →Y−X dg(X) (302)
must therefore be necessary and sufficient for convexity of a matrix-valued function of matrix variable.


3.1. CONVEX FUNCTION 1233.1.2.2 Epigraph of matrix-valued functionObserving for each and every ‖w‖=1,w T g(X)w ≤ t ⇔ g(X) ≼tI (303)S M +[28, §7.7, prob.9] and by definingepig = { (X,t) | X ∈domg , w T g(X)w ≤ t , ‖w‖=1 } ⊆ R p×k × R= {(X,t) | X ∈domg , g(X) ≼ tI } (304)S M +it followsg convex ⇒ epig convex (305)Given a continuous matrix-valued function g(X) : R p×k →S M consequent tothe scalar-definition, we defined the epigraph of g(X) (304) in terms of thecorresponding real function w T g(X)w ; id est,epig = ⋂‖w‖=1epi ( w T gw ) (306)An epigraph made from an intersection of non-convex sets is possibly convex,hence the partial result (305).3.1.2.3 Line Theoremg(X) : R p×k →S M is convex in X if and only if it remains convex on theintersection of any line with its domain. [1, §3.1.1]⋄3.1.2.4 Second-order convexity condition, matrix-valued functionNow we assume a twice differentiable function and drop the subscript S M +from the inequality when it is apparent.


124 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSDefinition. Differentiable convex matrix-valued function.g(X) : R p×k → S M is convex in X iff dom g is an open convex set, and itssecond derivative g ′′ (X+ t Y ) : R→S M is positive semidefinite on each pointalong every line X+ t Y that intersects domg ; id est, iff for each and everyX, Y ∈ R p×k such that X+ t Y ∈ domg over some open interval of t ∈ R ,Similarly, ifd 2g(X+ t Y ) ≽ 0 (307)dt2 d 2g(X+ t Y ) ≻ 0 (308)dt2 then g is strictly convex; the converse is generally false. [1, §3.1.4] 3.6 △Example. Matrix inverse. The matrix-valued function g(X) = X −1 isconvex on int S M + . For each and every Y ∈ S M , (§D.2.1.1, §A.3.1.0.5)d 2dt 2 g(X+ t Y ) = 2(X+ t Y )−1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 ≽ 0 (309)on some open interval of t ∈ R such that X + t Y ≻ 0. Hence, g(X) isconvex in X. This result is extensible; 3.7 trX −1 is convex on that samedomain. [28, §7.6, prob.2]Example. Matrix squared. The iconic real function ˆf(x)= x 2 is strictlyconvex on R . Its matrix-valued counterpart g(X) = X 2 is convex on thedomain of symmetric matrices; for X, Y ∈ S M and any open interval oft ∈ R , (§D.2.1.1)d 2d2g(X+ t Y ) =dt2 dt 2(X+ t Y )2 = 2Y 2 (310)which is positive semidefinite when Y is symmetric because then Y 2 = Y T Y(848). 3.8 3.6 Quadratic forms constitute a notable exception where the strict-case converse is reliablytrue.3.7 d/dt trg(X+ tY ) = trd/dt g(X+ tY ).3.8 By (857) in §A.3.1, changing the domain instead to all symmetric and nonsymmetricpositive semidefinite matrices, for example, will not produce a convex function.
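The matrix-squared example lends itself to a direct numerical check of the line theorem (§3.1.2.3) and condition (307); the Python/NumPy sketch below (added for illustration) approximates d²/dt² (X + tY)² at t = 0 by central differences and confirms it matches 2Y² of (310) and is positive semidefinite for symmetric Y .

    import numpy as np

    rng = np.random.default_rng(8)
    def sym(m):
        A = rng.standard_normal((m, m))
        return (A + A.T)/2

    X, Y = sym(4), sym(4)
    g = lambda t: (X + t*Y) @ (X + t*Y)              # g(t) = (X + tY)^2

    h = 1e-3
    second = (g(h) - 2*g(0.0) + g(-h)) / h**2        # central-difference second derivative at t = 0
    assert np.allclose(second, 2*Y @ Y, atol=1e-6)   # matches (310)
    assert np.all(np.linalg.eigvalsh(2*Y @ Y) >= -1e-10)   # positive semidefinite, per (307)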


3.2. QUASICONVEX FUNCTION 125Example. Matrix exponential. The matrix-valued functiong(X) = e X : S M → S M is convex on the subspace of circulant symmetric matrices[81]. Applying the line theorem, for all t ∈ R and for circulantX,Y ∈ S M we haved 2dt 2eX+ t Y = Y e X+ t Y Y ≽ 0, (XY ) T = XY (311)because circulant matrices are commutative (§B.6) and, for symmetric matrices,XY = Y X ⇔ (XY ) T = XY (856). Given symmetric argument, thematrix exponential always resides interior to the cone of positive semidefinitematrices in the symmetric subspace; e A ≻ 0, ∀A ∈ S M (1170). If matrixe A ∈ S M is positive (semi)definite then, for any matrix Y of compatibledimension, Y T e A Y is positive semidefinite. (§A.3.1.0.5)The second derivative, given in Table D.2.7, depends only upon commutativity;so, the domain of convex g may be broadened to all commutativesymmetric matrices.Changing the function domain to the subspace of all real diagonalmatrices reduces the matrix exponential to a vector-valued function in anisometrically isomorphic subspace R M ; [82, §5.3] 3.9 known convex (276) fromthe real-valued function case [1, §3.1.5].There are, of course, multifarious methods to determine function convexity,[1] each of them efficient when appropriate.3.2 Quasiconvex functionQuasiconvex functions [1, §3.4] [29] [31] [32] are useful in practical problemsolving because they are unimodal (by definition when non-monotone); aglobal minimum is guaranteed to exist over any convex set in the functiondomain. The scalar definition of convexity carries over to quasiconvexity:3.9 The matrix exponential of a diagonal matrix exponentiates each individual diagonalentry.


126 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSDefinition. Quasiconvex matrix-valued function.g(X) : R p×k →S M is a quasiconvex function of matrix X iff domg is a convexset and for each and every: Y,Z ∈ domg , 0 ≤ µ ≤ 1, and real vector w ofunit norm,w T g(µY + (1 − µ)Z)w ≤ max{w T g(Y )w, w T g(Z)w} (312)A quasiconcave function is characterizedw T g(µY + (1 − µ)Z)w ≥ min{w T g(Y )w, w T g(Z)w} (313)In either case, vector w becomes superfluous for real functions.△When g(X) : R p×k →S M is a quasiconvex function of matrix X, then foreach and every ν ∈R the corresponding sublevel setL ν g = { X ∈ dom g | w T g(X)w ≤ ν , ‖w‖=1 } ⊆ R p×k (314)= {X ∈ domg | g(X) ≼ νI } (315)is convex. <strong>Convex</strong>ity of all the sublevel sets is a necessary and sufficientcondition for quasiconvexity of a real function ˆf , [83, §2] but it is not sufficientfor a higher-dimensional function because the sublevel set L ν g of amatrix-valued function g is an intersection of sublevel sets of real functionsw T gw (314).Likewise, convexity of all superlevel sets 3.10 is a necessary and sufficientcondition for quasiconcavity of a real function:3.2.0.0.1 Example. Rank function quasiconcavity.For A,B ∈ R m×n [28, §0.4]that follows from the fact [26, §3.6]rankA + rankB ≥ rank(A + B) (316)dim R(A) + dim R(B) = dim R(A + B) + dim(R(A) ∩ R(B)) (317)3.10 The superlevel set is similarly defined:L s g = {X ∈ dom g | g(X) ≽ sI }


3.2. QUASICONVEX FUNCTION 127For A,B ∈ S M + [1, §3.4.2]rankA + rankB ≥ rank(A + B) ≥ min{rankA, rankB} (318)that follows from the factN(A + B) = N(A) ∩ N(B), A,B ∈ S M + (99)Rank is a quasiconcave function on S M + because the right-hand side of (318)has the concave form of (312); videlicet,rank(A + B) = rank(µA + (1 −µ)B) (319)over the open interval (0, 1) of µ , which follows from the linearly independentdyads definition in §B.1.1.From this example we see, unlike convex functions, quasiconvex functionsare not necessarily continuous.Definition. Differentiable quasiconvex matrix-valued function.Assume that function g(X) : R p×k →S M is twice differentiable, and domgis an open convex set.Then g(X) is quasiconvex in X if wherever in its domain the directionalderivative (§D.1.4) becomes 0, the second directional derivative (§D.1.5)is positive definite there [1, §3.4.3] in the same direction Y ; id est, g isquasiconvex if for each and every point X ∈ dom g , all nonzero directionsY ∈ R p×k , and for t ∈ R ,ddt∣ g(X+ t Y ) = 0 ⇒t=0d2dt 2 ∣∣∣∣t=0g(X+ t Y ) ≻ 0 (320)If g(X) is quasiconvex, conversely, then for each and every X ∈ domgand all Y ∈ R p×k ,ddt∣ g(X+ t Y ) = 0 ⇒t=0d2dt 2 ∣∣∣∣t=0g(X+ t Y ) ≽ 0 (321)△
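The rank inequalities (316) and (318) are easy to exercise numerically; the sketch below (illustrative only, with randomly generated low-rank positive semidefinite matrices) checks both bounds.

    import numpy as np

    rng = np.random.default_rng(9)
    def psd_rank(m, r):                         # random m x m PSD matrix of rank r
        F = rng.standard_normal((m, r))
        return F @ F.T

    rank = np.linalg.matrix_rank
    for _ in range(200):
        A = psd_rank(6, rng.integers(1, 4))
        B = psd_rank(6, rng.integers(1, 4))
        s = rank(A + B)
        assert rank(A) + rank(B) >= s           # (316)
        assert s >= min(rank(A), rank(B))       # (318), valid on the PSD cone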


128 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.3 Salient propertiesof convex and quasiconvex functions1. • A convex (or concave) function is assumed continuous on the relativeinterior of its domain. [30, §10]• A quasiconvex (or quasiconcave) function is not necessarily a continuousfunction.2. g convex ⇔ −g concave.g quasiconvex ⇔ −g quasiconcave.3. convexity ⇒ quasiconvexity ⇒ convex sublevel sets.concavity ⇒ quasiconcavity ⇒ convex superlevel sets . 3.114. The scalar-definition of matrix-valued function convexity and the linetheorem (§3.1.2.3) [1, §3.4.2] translate identically to quasiconvexity(and quasiconcavity).5. Composition g(h(X)) of a convex (concave) function g with any affinemapping h : R m×n → R p×k , such that h(R m×n ) ∩ domg ≠ ∅ , becomesconvex (concave) in X ∈ R m×n . [29, §B.2.1] Likewise for the quasiconvex(quasiconcave) functions g .6. • A nonnegatively weighted sum of convex (concave) functions remainsconvex (concave).• A nonnegatively weighted maximum of quasiconvex functionsremains quasiconvex. A nonnegatively weighted minimum ofquasiconcave functions remains quasiconcave.3.11 For real functions, convex sublevel (superlevel) sets are necessary and sufficient forquasiconvexity (quasiconcavity).


Chapter 4Euclidean Distance MatrixThese results were obtained by Schoenberg (1935), a surprisinglylate date for such a fundamental property of Euclidean Geometry.−John Clifford Gower [84, §3]By itself, distance information between many points in Euclidean space islacking. We might want to know more; such as, relative or absolute positionor dimension of some hull. A question naturally arising in some fields (e.g.,geodesy, economics, genetics, psychology, biochemistry, engineering) [85] askswhat facts can be deduced given only distance information. What can weknow about the underlying points that the distance information purports todescribe? We also ask what happens when the given distance informationis incomplete; or suppose the distance information is not reliable, available,or specified only by certain tolerances. These questions motivate a study ofinter-point distance, well represented in any spatial dimension by a simplematrix from linear algebra. 4.1 In what follows, we will answer some of thesequestions via Euclidean distance matrices.4.0 c○ 2001 Jon Dattorro, all rights reserved.4.1 e.g., √ D ∈ R N×N , a classical two-dimensional matrix representation of absolute interpointdistance because its entries (in ordered rows and columns) can be written neatly ona piece of paper. Matrix D will be reserved throughout to hold distance-square.129


[Figure 4.1: Convex hull of three points (N = 3) is shaded in $\mathbb{R}^3$ (n = 3). Dotted lines are imagined vectors to points; labelled edge lengths are $\sqrt{5}$, 2, and 1.]

4.1 EDM

Euclidean space $\mathbb{R}^n$ is a finite-dimensional real vector space having an inner product defined on it, hence a metric as well. [38, §3.1] A Euclidean distance matrix, an EDM in $\mathbb{R}^{N\times N}_+$, is an exhaustive table of distance-square $d_{ij}$ between points taken by pair from a list of $N$ points $\{x_\ell\,,\ \ell = 1\ldots N\}$ in $\mathbb{R}^n$; the squared metric, the measure of distance-square:
$$d_{ij} = \|x_i - x_j\|_2^2 \;\overset{\Delta}{=}\; \langle x_i - x_j \,,\, x_i - x_j \rangle \tag{322}$$
Each point is labelled ordinally, hence the row or column index of an EDM, $i$ or $j = 1\ldots N$, individually addresses all the points in the list.

Consider the following example of an EDM for the case $N = 3$:
$$D = [d_{ij}] = \begin{bmatrix} d_{11} & d_{12} & d_{13}\\ d_{21} & d_{22} & d_{23}\\ d_{31} & d_{32} & d_{33} \end{bmatrix} = \begin{bmatrix} 0 & d_{12} & d_{13}\\ d_{12} & 0 & d_{23}\\ d_{13} & d_{23} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 5\\ 1 & 0 & 4\\ 5 & 4 & 0 \end{bmatrix} \tag{323}$$
Matrix $D$ has $N^2$ entries but only $N(N-1)/2$ pieces of information. In Figure 4.1 we show three points in $\mathbb{R}^3$ that can be arranged in a list to correspond to $D$ in (323).
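To make the example concrete, here is a brief numerical check, a sketch in Python/NumPy; the particular list below is merely one of many that realizes (323):

```python
import numpy as np

# One possible list realizing the EDM in (323); columns of X are the points x_l.
X = np.array([[0., 1., 1.],
              [0., 0., 2.],
              [0., 0., 0.]])   # X in R^{3x3}, n = N = 3

N = X.shape[1]
# Entrywise distance-square d_ij = ||x_i - x_j||^2, per (322)
D = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)]
              for i in range(N)])
print(D)
# [[0. 1. 5.]
#  [1. 0. 4.]
#  [5. 4. 0.]]
```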


Such a list is not unique because any rotation, reflection, or translation (§4.5) of the points in Figure 4.1 would produce the same EDM $D$.

4.2 Metric requirements

For $i, j = 1\ldots N$, the Euclidean distance between points $x_i$ and $x_j$ must satisfy the axiomatic requirements imposed by any metric space: [38, §1.1] [36, §1.7]

1. $\sqrt{d_{ij}} \geq 0\,,\ i \neq j$  (nonnegativity)
2. $\sqrt{d_{ij}} = 0\,,\ i = j$  (self-distance)
3. $\sqrt{d_{ij}} = \sqrt{d_{ji}}$  (symmetry)
4. $\sqrt{d_{ij}} \leq \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i \neq j \neq k$  (triangle inequality)

where $\sqrt{d_{ij}}$ is the Euclidean metric in $\mathbb{R}^n$ (§4.4). Then all entries of an EDM must be in concord with these Euclidean axioms: specifically, each entry must be nonnegative,^{4.2} the main diagonal must be 0,^{4.3} an EDM must be symmetric. The fourth axiom provides upper and lower bounds for each entry. Axiom 4 is true more generally when there are no restrictions on indices $i, j, k$, but furnishes no new information.

4.3 ∃ fifth Euclidean axiom

The four axioms of the Euclidean metric provide information insufficient to certify that a bounded convex polyhedron more complicated than a triangle has a Euclidean realization. [84, §2] Yet any list of points or the vertices of any bounded convex polyhedron must conform to the axioms.

^{4.2} Implicit from the terminology, $\sqrt{d_{ij}} \geq 0 \Leftrightarrow d_{ij} \geq 0$ is always assumed.
^{4.3} What we call zero self-distance, Marsden calls nondegeneracy. [36, §1.6]


Example. Triangle. Consider the EDM in (323), but missing one of its entries:
$$D = \begin{bmatrix} 0 & 1 & d_{13}\\ 1 & 0 & 4\\ d_{31} & 4 & 0 \end{bmatrix} \tag{324}$$
Can we determine the unknown entries of $D$ by applying the axioms? Axiom 1 demands $\sqrt{d_{13}}\,, \sqrt{d_{31}} \geq 0$, axiom 2 requires the main diagonal be 0, while axiom 3 makes $\sqrt{d_{31}} = \sqrt{d_{13}}$. The fourth axiom tells us
$$1 \;\leq\; \sqrt{d_{13}} \;\leq\; 3 \tag{325}$$
Indeed, described over that closed interval $[1, 3]$ is a family of triangular polyhedra whose angle at vertex $x_2$ varies from 0 to $\pi$ radians. So, yes, we can determine the unknown entries of $D$, but they are not unique; nor should they be from the information given for this example.

4.3.0.0.2 Example. Small completion problem, I. Now consider the polyhedron in Figure 4.2 formed from an unknown list $\{x_1, x_2, x_3, x_4\}$. The corresponding EDM less one critical piece of information, $d_{14}$, is given by
$$D = \begin{bmatrix} 0 & 1 & 5 & d_{14}\\ 1 & 0 & 4 & 1\\ 5 & 4 & 0 & 1\\ d_{14} & 1 & 1 & 0 \end{bmatrix} \tag{326}$$
From axiom 4 we may write a few inequalities for the two triangles common to $d_{14}$; we find
$$\sqrt{5} - 1 \;\leq\; \sqrt{d_{14}} \;\leq\; 2 \tag{327}$$
We cannot further narrow those bounds on $\sqrt{d_{14}}$ using only the four axioms (§4.8.3.1.1). Yet there is only one possible choice for $\sqrt{d_{14}}$ because points $x_2, x_3, x_4$ must be collinear. All other values of $\sqrt{d_{14}}$ in the interval $[\sqrt{5}-1\,,\ 2]$ specify impossible distances in any dimension; id est, in this particular example the triangle inequality axiom does not yield an interval for $\sqrt{d_{14}}$ over which a family of convex polyhedra can be reconstructed.
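The bounds (327) come from intersecting the triangle-inequality intervals of the two triangles that contain edge 1-4 (triangle 1-2-4 and triangle 1-3-4). A minimal NumPy sketch of that interval intersection, using only the known entries of (326):

```python
import numpy as np

# Known square-distances from (326); only d14 is missing.
d12, d13, d23, d24, d34 = 1., 5., 4., 1., 1.

# Axiom 4 applied to each triangle containing edge (1,4):
#   |sqrt(d_1k) - sqrt(d_k4)| <= sqrt(d14) <= sqrt(d_1k) + sqrt(d_k4)
bounds = []
for d1k, dk4 in [(d12, d24), (d13, d34)]:        # k = 2 and k = 3
    lo = abs(np.sqrt(d1k) - np.sqrt(dk4))
    hi = np.sqrt(d1k) + np.sqrt(dk4)
    bounds.append((lo, hi))

lower = max(lo for lo, hi in bounds)             # intersect the two intervals
upper = min(hi for lo, hi in bounds)
print(lower, upper)   # approximately 1.2361 and 2.0, i.e. sqrt(5)-1 <= sqrt(d14) <= 2
```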


[Figure 4.2: Four axioms of the Euclidean metric are not a recipe for reconstruction of this polyhedron. Vertices $x_1, x_2, x_3, x_4$ with labelled edge lengths 1, 1, 1, 2, and $\sqrt{5}$.]

We will return to this simple Example 4.3.0.0.2 to illustrate more elegant methods of solution in §4.9.3, §4.12.4.1, and §4.8.3.1.1. Until then, we can deduce some general principles from the foregoing examples:

• Unknown $d_{ij}$ of an EDM are not necessarily uniquely determinable.
• The triangle inequality does not necessarily produce tight bounds.^{4.4}
• The Euclidean axioms are insufficient for reconstruction.

4.3.1 Lookahead

There must exist at least one requirement more than the four axioms of the Euclidean metric that makes them altogether necessary and sufficient to certify realizability of bounded convex polyhedra. Indeed, there are infinitely many more; there are precisely $N + 1$ necessary and sufficient Euclidean requirements for $N$ points constituting a generating list (§2.2.2). Here is the fifth requirement:

^{4.4} The term tight with reference to an inequality means equality is achievable.


134 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXFifth Euclidean axiom. Relative-angle inequality. [86, §3.1](confer §4.12.2.1) Augmenting the axioms of the Euclidean metric in R n ,for all i,j,l ≠ k ∈{1... N}, i


4.4 EDM definition

Ascribe points in a list $\{x_\ell \in \mathbb{R}^n,\ \ell = 1\ldots N\}$ to the columns of a matrix $X$;
$$X = [\,x_1\ \cdots\ x_N\,] \in \mathbb{R}^{n\times N} \tag{54}$$
where $N$ is regarded as the cardinality of list $X$. When matrix $D = [d_{ij}]$ is an EDM, its entries must be related to those points constituting the list by the Euclidean distance-square: for $i, j = 1\ldots N$,
$$d_{ij} = \|x_i - x_j\|^2 = (x_i - x_j)^T(x_i - x_j) = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i^T x_j
= \begin{bmatrix} x_i^T & x_j^T \end{bmatrix} \begin{bmatrix} I & -I\\ -I & I \end{bmatrix} \begin{bmatrix} x_i\\ x_j \end{bmatrix}
= \operatorname{vec}(X)^T \Phi_{ij} \operatorname{vec} X \tag{330}$$
where $\Phi_{ij}$ has $I \in \mathbb{S}^n$ in its $ii$th and $jj$th block of entries while $-I \in \mathbb{S}^n$ fills its $ij$th and $ji$th block; id est,
$$\Phi_{ij} = (e_i e_i^T + e_j e_j^T - e_i e_j^T - e_j e_i^T) \otimes I \;\in\; \mathbb{S}^{nN}_+ \tag{331}$$
where $\{e_i \in \mathbb{R}^N,\ i = 1\ldots N\}$ is the set of standard basis vectors, and $\otimes$ signifies the Kronecker product (§D.1.2.1). Thus each entry $d_{ij}$ is a convex quadratic function [1, §3, §4] of $\operatorname{vec} X \in \mathbb{R}^{nN}$ (18). [30, §6]

The collection of all Euclidean distance matrices $\mathrm{EDM}^N$ is a convex subset of $\mathbb{R}^{N\times N}_+$ called the EDM cone (§5), hence not a subspace; (Figure 7.1, p.246)
$$0 \in \mathrm{EDM}^N \subseteq \mathbb{S}^N_0 \cap \mathbb{R}^{N\times N}_+ \subset \mathbb{S}^N \tag{332}$$
An EDM $D$ must be expressible as a function of some list $X$; id est, it must have the form
$$D(X) \overset{\Delta}{=} \delta(X^T X)\mathbf{1}^T + \mathbf{1}\delta(X^T X)^T - 2X^T X \;\in\; \mathrm{EDM}^N \tag{333}$$
$$\phantom{D(X)} = (\mathbf{1} \otimes \operatorname{vec} X)^T \begin{bmatrix} 0 & \Phi_{12} & \cdots & \Phi_{1N}\\ \Phi_{12} & 0 & \ddots & \Phi_{2N}\\ \vdots & \ddots & \ddots & \vdots\\ \Phi_{1N} & \Phi_{2N} & \cdots & 0 \end{bmatrix} (\mathbf{1} \otimes \operatorname{vec} X) \tag{334}$$
Conversely, $D(X)$ will make an EDM for any $X \in \mathbb{R}^{n\times N}$, but $D(X)$ is not a convex function of $X$ (§4.4.1).
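A small NumPy sketch of the list-form definition (333), checking it against direct pairwise computation of (322); the random list and dimensions here are arbitrary, chosen only for illustration:

```python
import numpy as np

def D_of_X(X):
    """List-form EDM definition (333): D(X) = delta(X'X) 1' + 1 delta(X'X)' - 2 X'X."""
    g = np.sum(X * X, axis=0)                 # delta(X'X), the squared point norms
    e = np.ones(X.shape[1])
    return np.outer(g, e) + np.outer(e, g) - 2 * X.T @ X

rng = np.random.default_rng(0)
n, N = 3, 5
X = rng.standard_normal((n, N))               # arbitrary list, points as columns

D = D_of_X(X)
D_direct = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)] for i in range(N)])
print(np.allclose(D, D_direct))               # True: (333) agrees with (322) entrywise
```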


Now the EDM cone may be described:
$$\mathrm{EDM}^N = \{\, D(X) \mid X \in \mathbb{R}^{n\times N} \,\} \tag{335}$$
Expression $D(X)$ is a matrix definition of EDM and so conforms to the Euclidean axioms:

Nonnegativity of EDM entries (axiom 1, §4.2) is obvious from the distance-square definition (330), and so assumed to hold for any $D$ expressible in the form $D(X)$ in (333).

When we say $D$ is an EDM, reading from (333), it implicitly means the main diagonal must be 0 (axiom 2, self-distance) and $D$ must be symmetric (axiom 3); $\delta(D) = 0$ and $D^T = D$ or, equivalently, $D \in \mathbb{S}^N_0$ are necessary matrix criteria.

The mapping $D(X)$ is homogeneous in the sense, for $\zeta \in \mathbb{R}$,
$$\sqrt{D(\zeta X)} = |\zeta| \sqrt{D(X)} \tag{336}$$
where the square root is entry-wise.

4.4.1 $-V_N^T D(X)V_N$ convexity

We saw that the EDM entries $d_{ij}$ are convex quadratic functions of $\begin{bmatrix} x_i\\ x_j \end{bmatrix}$. Yet $-D(X)$ (333) is not a quasiconvex function of matrix $X \in \mathbb{R}^{n\times N}$ because the second directional derivative (§3.2)
$$-\left.\frac{d^2}{dt^2}\right|_{t=0} D(X + tY) = 2\left( -\delta(Y^T Y)\mathbf{1}^T - \mathbf{1}\delta(Y^T Y)^T + 2Y^T Y \right) \tag{337}$$
is indefinite for any $Y \in \mathbb{R}^{n\times N}$ since its main diagonal is 0. [44, §4.2.8] [28, §7.1, prob.2] Hence $-D(X)$ cannot be convex in $X$ either.

The outcome is different when instead we consider
$$-V_N^T D(X)V_N = 2V_N^T X^T X V_N \tag{338}$$
where we introduce the full-rank skinny auxiliary matrix (§B.4.2)
$$V_N \overset{\Delta}{=} \frac{1}{\sqrt{2}} \begin{bmatrix} -1 & -1 & \cdots & -1\\ 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} -\mathbf{1}^T\\ I \end{bmatrix} \in \mathbb{R}^{N\times N-1} \tag{339}$$


($\mathcal{N}(V_N) = 0$) having range
$$\mathcal{R}(V_N) = \mathcal{N}(\mathbf{1}^T)\,, \qquad V_N^T \mathbf{1} = 0 \tag{340}$$
Matrix-valued function (338) meets the criterion for convexity in §3.1.2.4 over its domain that is all of $\mathbb{R}^{n\times N}$; videlicet, for any $Y \in \mathbb{R}^{n\times N}$,
$$-\frac{d^2}{dt^2}\, V_N^T D(X + tY)V_N = 4V_N^T Y^T Y V_N \succeq 0 \tag{341}$$
Quadratic matrix function $-V_N^T D(X)V_N$ is therefore convex in $X$, achieving its minimum, with respect to the positive semidefinite cone (§2.8.2.3), at $X = 0$. When the penultimate number of points exceeds the dimension of the space, $n < N-1$, strict convexity of the quadratic (338) becomes impossible because (341) could not then be positive definite.

4.4.2 Gram-form EDM definition

Positive semidefinite matrix $X^T X$ in (333), formed from an inner product of the list, is known as the Gram matrix; [37, §3.6]
$$G \overset{\Delta}{=} X^T X = \begin{bmatrix}
\|x_1\|^2 & x_1^T x_2 & x_1^T x_3 & \cdots & x_1^T x_N\\
x_2^T x_1 & \|x_2\|^2 & x_2^T x_3 & \cdots & x_2^T x_N\\
x_3^T x_1 & x_3^T x_2 & \|x_3\|^2 & \ddots & x_3^T x_N\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
x_N^T x_1 & x_N^T x_2 & x_N^T x_3 & \cdots & \|x_N\|^2
\end{bmatrix} \in \mathbb{S}^N_+$$
$$= \delta\!\left(\begin{bmatrix}\|x_1\|\\ \|x_2\|\\ \vdots\\ \|x_N\|\end{bmatrix}\right)
\begin{bmatrix}
1 & \cos\psi_{12} & \cos\psi_{13} & \cdots & \cos\psi_{1N}\\
\cos\psi_{12} & 1 & \cos\psi_{23} & \cdots & \cos\psi_{2N}\\
\cos\psi_{13} & \cos\psi_{23} & 1 & \ddots & \cos\psi_{3N}\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\cos\psi_{1N} & \cos\psi_{2N} & \cos\psi_{3N} & \cdots & 1
\end{bmatrix}
\delta\!\left(\begin{bmatrix}\|x_1\|\\ \|x_2\|\\ \vdots\\ \|x_N\|\end{bmatrix}\right)
\;\overset{\Delta}{=}\; \delta^2(G)^{1/2}\, \Psi\, \delta^2(G)^{1/2} \tag{342}$$


138 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXmatrix entries G T = G ∆ = [g ij ] ,hence the linear EDM definitiond ij = g ii + g jj − 2g ij (343)D(G) ∆ = δ(G)1 T + 1δ(G) T − 2G ∈ EDM N⇔G ≽ 0(344)We note from (342) that positive semidefiniteness of the inter-point anglematrix Ψ implies positive semidefiniteness of the Gram matrix G ; [1, §8.3]Ψ ≽ 0 ⇒ G ≽ 0 (345)The EDM cone can be described,EDM N = { D(G) | G ∈ S N +}(346)4.4.2.1 First point at the originConsider the calculation (I − 1e T 1 )D(G)(I − e 1 1 T ) ; setting D(G) = D forconvenience,where− ( D − (De 1 1 T + 1e T 1D) ) 12 = G − (Ge 11 T + 1e T 1G) + 1e T 1Ge 1 1 T (347)e 1 ∆ =⎡⎢⎣10.0⎤⎥⎦ (348)is the first vector from the standard basis. Under the assumption the firstpoint x 1 in the unknown list X resides at the origin,then it follows for D an EDM,Xe 1 = 0 ⇔ Ge 1 = 0 (349)G = − ( D − (De 1 1 T + 1e T 1D) ) 12 , x 1 = 0= − [ 0 √ ] TD [ √ ]2V N 0 1 2VN2[ ] 0 0T=0 −VN TDV N(350)


where
$$I - e_1\mathbf{1}^T = \begin{bmatrix} 0 & \sqrt{2}\,V_N \end{bmatrix} \tag{351}$$
is a nonorthogonal projector (§E.1). From this we get the first matrix criterion for an EDM, first proved by Schoenberg in 1935; [87]
$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; \begin{cases} -V_N^T D V_N \succeq 0\\ D \in \mathbb{S}^N_0 \end{cases} \tag{352}$$
We provide a rigorous, more geometric proof in §4.9.

By substituting $G = \begin{bmatrix} 0 & 0^T\\ 0 & -V_N^T D V_N \end{bmatrix}$ (350) into $D(G)$ (344), assuming $x_1 = 0$,
$$D = \begin{bmatrix} 0\\ \delta(-V_N^T D V_N) \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & \delta(-V_N^T D V_N)^T \end{bmatrix} - 2\begin{bmatrix} 0 & 0^T\\ 0 & -V_N^T D V_N \end{bmatrix} \tag{353}$$

4.4.2.2 0 geometric center

Now consider the calculation $(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T)\,D(G)\,(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T)$ [88, §1] under the assumption that the geometric center (§4.5.1.0.1) of a corresponding list is the origin;
$$X\mathbf{1} = 0 \;\Leftrightarrow\; G\mathbf{1} = 0 \tag{354}$$
Setting $D(G) = D$ for convenience,
$$G = -\left( D - \tfrac{1}{N}(D\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T D) + \tfrac{1}{N^2}\mathbf{1}\mathbf{1}^T D\mathbf{1}\mathbf{1}^T \right)\tfrac{1}{2}\,, \quad X\mathbf{1} = 0
\;=\; -V D V \tfrac{1}{2} \tag{355}$$
where properties of the auxiliary matrix
$$V \overset{\Delta}{=} I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T \tag{996}$$
are found in §B.4. From this, the more popular form of Schoenberg's criterion:
$$D \in \mathrm{EDM}^N \;\Leftrightarrow\; \begin{cases} -V D V \succeq 0\\ D \in \mathbb{S}^N_0 \end{cases} \tag{356}$$
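A small NumPy sketch of both forms of Schoenberg's criterion, (352) and (356), applied to the example EDM (323); the tolerance and the second, non-EDM test matrix are illustrative choices only:

```python
import numpy as np

def is_edm(D, tol=1e-9):
    """Schoenberg criterion (356): D is an EDM iff -V D V >= 0 and D is symmetric hollow."""
    N = D.shape[0]
    if not (np.allclose(D, D.T) and np.allclose(np.diag(D), 0)):   # D in S^N_0
        return False
    V = np.eye(N) - np.ones((N, N)) / N                            # (996)
    return np.min(np.linalg.eigvalsh(-V @ D @ V)) >= -tol

def is_edm_VN(D, tol=1e-9):
    """Equivalent form (352) using the skinny auxiliary matrix V_N of (339)."""
    N = D.shape[0]
    VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)
    hollow = np.allclose(D, D.T) and np.allclose(np.diag(D), 0)
    return hollow and np.min(np.linalg.eigvalsh(-VN.T @ D @ VN)) >= -tol

D = np.array([[0., 1., 5.],
              [1., 0., 4.],
              [5., 4., 0.]])            # the EDM (323)
print(is_edm(D), is_edm_VN(D))          # True True

D_bad = np.array([[0., 1., 9.],
                  [1., 0., 1.],
                  [9., 1., 0.]])        # violates the triangle inequality (3 > 1 + 1)
print(is_edm(D_bad), is_edm_VN(D_bad))  # False False
```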


140 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXOf particular utility when D ∈ EDM N is the fact, (§B.4.2 no.11) (330)tr ( −V DV 1 2)=12N∑i,j(∑d ij = 12N vec(X)T Φ ij)vec Xi,j∑= trG = N ‖x l ‖ 2 , X1 = 0l=1(357)where ∑ Φ ij ∈ S nN+ (331), therefore convex in vecX . We will find thistrace useful as a heuristic to minimize affine dimension of an unknown listarranged columnar in X , (§7.2.2) but it tends to facilitate realization of alist configuration having least energy; id est, it compacts a reconstructed listby minimizing total norm-square of the vertices.By substituting G=−V DV 1 (355) into D(G) (344), assuming X1=0,2(D = δ −V DV 1 ) (1 T + 1δ −V DV 1 T (− 2 −V DV22) 1 )(358)2These relationships will allow combination of distance and Gram constraintsin any optimization problem we may pose:• Constraining all diagonal entries of the Gram matrix to 1, for example,is equivalent to the constraint that all points lie on a hypersphere centeredat the origin. Any further constraint on the Gram matrix thenapplies only to inter-point angle Ψ .• More generally, constraint on inter-point angle Ψ can be accomplishedby fixing all the individual point lengths δ(G) 1/2 ; thenΨ = − 1 2 δ2 (G) −1/2 V DV δ 2 (G) −1/2 (359)When δ 2 (G) is nonsingular, then Ψ ≽ 0 ⇔ G ≽ 0. (§A.3.1.0.5)4.4.2.2.1 Example. Vector constraints via Gram matrix.Capitalizing again on identity (355) relating Gram and EDM matrices as inExample 4.4.2.2.3, a constraint set such astr ( − 1V DV e )2 ie T i = ‖xi ‖ 2 = κ i⎫⎪tr ( ⎬− 1V DV (e 2 ie T j + e j e T i ) 2) 1 = xTi x j = κ ijtr ( (360)− 1V DV e ) ⎪2 je T j = ‖xj ‖ 2 = κ ⎭j


4.4. EDM DEFINITION 141(given constants κ i , κ ij , κ j ) unambiguously relates list member x i to x j towithin an isometry through the inner product identitycosψ ij =xT i x j‖x i ‖ ‖x j ‖(361)Consider the academic problem of finding a Gram matrix subject to someconstraints on the entries of the corresponding EDM:find −V DV 1 2 ∈ SNsubject to 〈 (e i e T j + e j e T i ) 1 2 , D〉 = ˆd ij , i,j=1... N , i < j−V DV ≽ 0(362)where the ˆd ij are given nonnegative constants; EDM D can, of course, bereplaced with the equivalent Gram form (358) in that constraint. Barvinok’sproposition predicts 4.7 for N >1 , 4.8⌊√ ⌋8(N(N −1)/2) + 1 − 1rankV DV ≤= N − 1 (363)2Of interest in this problem formulation (362) is the lack of explicit constraintson the main diagonal of EDM D ∈ S N 0 ; finding the Gram matrix is logicallydisjoint from this EDM constraint.Choice and number of constraints can affect uniqueness and affine dimensionof solution as shown in this next example.4.4.2.2.2 Example. Trilateration in small wireless sensor network.Given three known absolute point positions in R 2 (three anchors ˆx 2 , ˆx 3 , ˆx 4 )and only one unknown point (one sensor x 1 ∈ R 2 ), it is easy to unambiguouslydetermine the sensor’s relative position when given its noiselessdistance-square ˆd i1 from only two of the three anchors under the assumptionx 1 = 0 (without loss of generality). The problem can be expressed as afeasibility problem in terms of the Gram matrix (350):4.7 Conditions required for tightening the upper bound as in (153) are not met herebecause the feasible set is not bounded.4.8 −V DV | N←1 = 0 (§B.4.1).


142 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXfind −VN TDV N ∈ S 3subject to tr ( )D(e i e T 1 + e 1 e T i ) 1 = ˆd 2i1 , i = 2, 3([ ] )0 0Ttr0 −VN TDV (e i e T j + e j e T i ) 1 = ˆx T 2 i ˆx j , 2 ≤ i < j = 3, 4N([ ])0 0T∑tr0 −VN TDV = 4 ‖ˆx i ‖ 2Ni=2[ ] 0 0T0 −VN TDV ≽ 0N(364)having a total of six affine equality constraints, where the constraint ondistance-square ˆd i1 can be equivalently written as a constraint on the Grammatrix by substituting (353).This Platonic feasibility problem explores the convex intersection of a faceS 3 + [23, §II.12] of the positive semidefinite cone in S 4 with an affine set in theproper subspace of symmetric matrices S 3 , the intersection of six hyperplanesin isomorphic R 6 . Because of the bounding Σ constraint (whose normalis I), there exists a feasible Gram matrix whose rank is bounded above byBarvinok’s Proposition 2.6.6.4.1 (153); more precisely, if the feasible set isnonempty and nontrivial, then there exists a solution to (364) having affinedimension 1 or 2 . 4.9 The sensor is thereby constrained to lie in the affinehull of the anchors. Having found the Gram matrix, we can locate the sensorrelative to the anchors simply by using a Cholesky factorization (§4.10).The sensor’s relative location is unique when the six hyperplanes intersectat a point in isomorphic R 6 belonging to the positive semidefinite cone.The hyperplane intersection is a point whenever their normals are linearlyindependent in R 6 . To find each normal for the two ˆd i1 hyperplanes, wesubstitute the Gram-form definition (344) for D and then utilize the selfadjointnessproperty (813) of the linear diagonal operator δ ; for i=2, 3,〈δ(G)1 T + 1δ(G) T − 2G , (e i e T 1 + e 1 e T i ) 1 2〉=〈G, δ((ei e T 1 + e 1 e T i )1 ) − (e i e T 1 + e 1 e T i ) 〉(365)The first row and column for these and the remaining four constraint normalsin S 4 are truncated because, for n>m,{x∈ R n | a T x=b} ∩ R m = {x∈ R m | a(1 : m) T x=b} (366)4.9 Intuitively, the bounding constraint might, for example, be met by rotating a planarconfiguration of four points in R 3 .


[Figure 4.3: Arbitrary hexagon in $\mathbb{R}^3$ whose vertices $x_1,\ldots,x_6$ are labelled clockwise.]

These six truncated normals in isometrically isomorphic $\mathbb{R}^6$ are arranged columnar in a matrix
$$\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \tag{367}$$
whose rank is obviously 6 independent of the problem data. This means the isometric realization is always unique when it exists, and if the nonempty feasible set consisting of one positive semidefinite point is nonzero, then the affine dimension (§4.7.2) of this solution must be 1 or 2 by Barvinok's proposition.

4.4.2.2.3 Example. Hexagon. Barvinok [89, §2.6] poses a problem in geometric realizability of an arbitrary hexagon (Figure 4.3) having:

1. prescribed (one-dimensional) face-lengths,
2. prescribed angles between the three pairs of opposing faces, and
3. a constraint on the sum of norm-square of each and every vertex;


144 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXten affine equality constraints in all on the Gram matrix G ∈ S 6 . Let’srealize this as a convex feasibility problem (with constraints written in thesame order) under the assumption of 0 geometric center:find −V DV 1 ∈ 2 S6subject to tr ( )D(e i e T j + e j e T i ) 1 2 = l2ij , j−1 = (i = 1... 6) mod 6tr ( − 1V DV (A 2 i + A T i ) 2) 1 = cosϕi , i = 1... 3tr(− 1 2 V DV ) = 1−V DV ≽ 0where, for A i ∈ R 6×6 ,A 1 = (e 1 − e 6 )(e 3 − e 4 ) T /(l 61 l 34 )A 2 = (e 2 − e 1 )(e 4 − e 5 ) T /(l 12 l 45 )A 3 = (e 3 − e 2 )(e 5 − e 6 ) T /(l 23 l 56 )(368)(369)and where the first constraint on length-square l 2 ij can be equivalently writtenas a constraint on the Gram matrix by substituting (358). We show how tonumerically solve such a problem by alternating projection in §E.9.2.1.1.Barvinok’s Proposition 2.6.6.4.1 asserts the existence of a list correspondingto Gram matrix G solving this feasibility problem whose corresponding affinedimension (§4.7.2) does not exceed 3 because the convex feasible set 4.10 isbounded by the third constraint.4.4.3 Inner-product form EDM definitionEquivalent to (330) is [90, §1-7] [26, §3.2]d ij = d ik + d kj − 2 √ d ik d kj cos θ ikj= [√ d ik√dkj] [ 1 −e ıθ ikj−e −ıθ ikj1] [√ ]d ik√dkj(370)called the law of cosines, where ı ∆ = √ −1 , i,k,j are positive integers, andθ ikj is the angle at vertex x k formed by vectors x i − x k and x j − x k ;cos θ ikj =1(d 2 ik + d kj − d ij )√ = (x i − x k ) T (x j − x k )dik d kj ‖x i − x k ‖ ‖x j − x k ‖4.10 the presumably nonempty set of all points D solving this feasibility problem.(371)


4.4. EDM DEFINITION 145where ([√the numerator ]) forms an inner product of vectors. Distance-squaredikd ij √dkj is a convex quadratic function 4.11 on R 2 + whereas d ij (θ ikj ) isquasiconvex (§3.2) minimized over domain −π ≤θ ikj ≤π by θ ⋆ ikj =0, we getthe Pythagorean theorem when θ ikj =±π/2, and d ij (θ ikj ) is maximized whenθ ⋆ ikj =±π ; d ij = (√ d ik + √ ) 2,d kj θikj = ±πsod ij = d ik + d kj , θ ikj = ± π 2d ij = (√ d ik − √ d kj) 2,θikj = 0(372)| √ d ik − √ d kj | ≤ √ d ij ≤ √ d ik + √ d kj (373)Hence the triangle inequality, axiom 4 of the Euclidean metric, holds for anyEDM D .We may construct the inner-product form of the EDM definition for matricesby evaluating (370) for k=1: By defining⎡√ √ √ ⎤d 12 d12 d 13 cos θ 213 d12 d 14 cos θ 214 · · · d12 d 1N cos θ 21N√ √ √ d12 dΘ T Θ =∆ 13 cosθ 213 d 13 d13 d 14 cos θ 314 · · · d13 d 1N cos θ 31N√ √ √ d12 d 14 cosθ 214 d13 d 14 cos θ 314 d 14... d14 d 1N cos θ 41N∈ S N−1⎢⎥⎣ ........√ √. ⎦√d12 d 1N cosθ 21N d13 d 1N cos θ 31N d14 d 1N cos θ 41N · · · d 1Nthen any EDM may be expressed[D(Θ) =∆ 0δ(Θ T Θ)]1 T + 1 [ 0 δ(Θ T Θ) T ] − 2[ 0 0T](374)∈ EDM N0 Θ T Θ(375)EDM N = { D(Θ) | Θ ∈ R n×N−1} (376)for which all Euclidean axioms hold. The entries of Θ T Θ result from innerproducts as in (371); id est,Θ = [x 2 − x 1 x 3 − x 1 · · · x N − x 1 ] ∈ R n×N−1 (377)[]4.11 1 −e ıθ ikj−e −ıθ ≽ 0, having eigenvalues {0,2}. Minimum is achieved forikj1[ √ ] {dik√dkjµ1, µ ≥ 0, θikj = 0=. (§D.2.1, [1, exmp.4.5])0, −π ≤ θ ikj ≤ π , θ ikj ≠ 0
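A short NumPy check of the inner-product form (370)-(377) on an arbitrary list (the random data below is purely illustrative): $\Theta = [\,x_2{-}x_1\ \cdots\ x_N{-}x_1\,]$ gives $\Theta^T\Theta$ whose entries are the inner products of (371), and the definition (375) built from $\Theta^T\Theta$ reproduces the EDM.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 5
X = rng.standard_normal((n, N))
D = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)] for i in range(N)])

Theta = X[:, 1:] - X[:, [0]]                      # (377): columns x_l - x_1
G1 = Theta.T @ Theta                              # Theta' Theta as in (374)

# Law of cosines (370): d_ij = d_ik + d_kj - 2 sqrt(d_ik d_kj) cos(theta_ikj) with k = 1,
# where, by (371), sqrt(d_i1 d_1j) cos(theta_i1j) = (x_i - x_1)'(x_j - x_1) = (Theta'Theta)_{ij}
i, j = 2, 4                                       # any pair of points other than x_1
lhs = D[i, j]
rhs = D[i, 0] + D[0, j] - 2 * G1[i - 1, j - 1]
print(np.isclose(lhs, rhs))                       # True

# Inner-product form EDM definition (375) reproduces D
g = np.diag(G1)
D_theta = np.zeros((N, N))
D_theta[1:, 1:] = np.outer(g, np.ones(N - 1)) + np.outer(np.ones(N - 1), g) - 2 * G1
D_theta[0, 1:], D_theta[1:, 0] = g, g
print(np.allclose(D_theta, D))                    # True
```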


146 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXThe inner product Θ T Θ is obviously related to the Gram matrix (342),[ ] 0 0TG =0 Θ T , xΘ 1 = 0 (378)For D(Θ) = D and no constraint on the list X, (confer (350) (355))4.4.3.1 Relative-angle formΘ T Θ = −V T NDV N ∈ R N−1×N−1 (379)Like D(X) (333), D(Θ) will make an EDM for any Θ∈R n×N−1 , it is neithera convex function of Θ (§4.4.3.2), and it is homogeneous in the sense (336).Scrutinizing Θ T Θ we find that because of the arbitrary choice k=1, distancestherein are all with respect to point x 1 . Similarly, relative angles in Θ T Θ arebetween all vector pairs having vertex x 1 . Yet picking arbitrary θ i1j to fillΘ T Θ will not necessarily make an EDM; Θ T Θ must be positive semidefinite.Θ T Θ = δ( √ d) Ω δ( √ d) =∆⎡√ ⎤⎡⎤⎡√ ⎤d12 0 1 cos θ 213 · · · cos θ 21N d12 0√ d13 ⎢⎥cosθ 213 1... cos θ √31Nd13 ⎣ ... ⎦⎢⎥⎢⎥√ ⎣ ....... . ⎦⎣... ⎦√0d1N cos θ 21N cos θ 31N · · · 1 0d1N(380)Expression D(Θ) defines an EDM for any positive semidefinite relative-anglematrixΩ = [cos θ i1j , i,j = 2...N] ∈ S N−1 (381)and any nonnegative distance vector√d = [√d1j , j = 2...N] = √ δ(Θ T Θ) ∈ R N−1 (382)because (§A.3.1.0.5)Ω ≽ 0 ⇒ Θ T Θ ≽ 0 (383)The decomposition (380) and the relative-angle matrix inequality Ω ≽ 0 leadto a different expression of the inner-product form EDM definition (375):[ ]D(Ω,d) =∆ 01 T +1 [ 0 d ] ([ ])[ ] ([ ])0T − 2δ √d 0 0Tδ √d 0∈ EDM Nd0 Ω(384)


4.5. INVARIANCE 147EDM N ={D(Ω,d) | Ω ≽ 0, √ }d ≽ 0(385)In the particular circumstance x 1 = 0, we can relate inter-point angle matrixΨ from the Gram-form in (342) to relative-angle matrix Ω in (380). Thus,[ ] 0 0TΨ ≡ , x0 Ω 1 = 0 (386)4.4.3.2 Inner-product form −VN TD(Θ)V N convexity[√ ]dikWe saw that d ij is a convex quadratic function of √dkj and a quasiconvexfunction of θ ikj . Here the situation for the inner-product form D(Θ) (375) ofthe EDM definition is identical to that in §4.4.1: −D(Θ) is not a quasiconvexfunction of Θ by the same reasoning, and from (379)− V T N D(Θ)V N = Θ T Θ (387)is a convex quadratic function of Θ on the domain R n×N−1 achieving itsminimum at Θ = 0.4.4.3.3 Inner-product form conclusionWe deduce that knowledge of inter-point distance is equivalent to knowledgeof distance and angle from the perspective of one point, x 1 in our chosencase. The total amount of information in Θ T Θ , N(N−1)/2, is unchanged 4.12with respect to EDM D .4.5 InvarianceWhen D is an EDM, there exist an infinite number of corresponding N-point lists X (54) in Euclidean space. All those lists are related by isometrictransformation: rotation, reflection, and translation (offset or shift).4.12 The reason for the amount O(N 2 ) information is because of the relative measurements.The use of a fixed reference in the measurement of angles and distances would reducethe required information but is antithetical. In the particular case n = 2, for example,ordering all points x l in a length-N list by increasing angle of vector x l −x 1 with respectto x 2 − x 1 , θ i1j becomes equivalent to j−1 ∑θ k,1,k+1 ≤ 2π and the amount of informationis reduced to 2N −3; rather, O(N).k=i


148 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX4.5.1 TranslationAny translation common among all the points x l in a list will be cancelled inthe formation of each d ij . Proof follows directly from (330). Knowing thattranslation α in advance, we may remove it from the list constituting thecolumns of X by subtracting α1 T . Then it stands to reason by definition(333) of an EDM, for any translation α∈ R n ,D(X − α1 T ) = D(X) (388)In words, inter-point distances are unaffected by offset; EDM D is translationinvariant. When α = x 1 in particular,D(X − x 1 1 T ) = D(X − Xe 1 1 T ) = D ( X [ 0 √ 2V N])= D(X) (389)4.5.1.0.1 Example. Translating geometric center to origin.We might choose to shift the geometric center α g of an N-point list {x l }arranged columnar in X to the origin; [47] [91]α = α ∆ g = Xb ∆ g = 1 X1 ∈ P ⊆ A (390)NIf we were to associate a point-mass m l with each of the points x l in thelist, then their center of mass (or gravity) would be ( ∑ x l m l )/ ∑ m l . Thegeometric center is the same as the center of mass under the assumption ofuniform mass density across points. [79] The geometric center always lies inthe convex hull P of the list; id est, α g ∈ P because b T g 1 = 1 and b g ≽ 0.Subtracting the geometric center from every list member,X − α g 1 T = X − 1 N X11T = X(I − 1 N 11T ) = XV ∈ R n×N (391)So we have (confer (333))D(X) = D(XV ) = δ(V T X T XV )1 T +1δ(V T X T XV ) T −2V T X T XV ∈ EDM N(392)More generally, any b from α=Xb chosen such that b T 1=1, makes anauxiliary V -matrix. (§B.4.5)
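A quick NumPy confirmation of translation invariance (388) and of geometric centering (391)-(392); the random data is for illustration only:

```python
import numpy as np

def D_of_X(X):
    """List-form EDM definition (333)."""
    g = np.sum(X * X, axis=0)
    e = np.ones(X.shape[1])
    return np.outer(g, e) + np.outer(e, g) - 2 * X.T @ X

rng = np.random.default_rng(3)
n, N = 3, 6
X = rng.standard_normal((n, N))
alpha = rng.standard_normal((n, 1))

# (388): inter-point distances are unaffected by any common offset alpha
print(np.allclose(D_of_X(X - alpha), D_of_X(X)))      # True

# (391): X V shifts the geometric center to the origin, and (392): D(XV) = D(X)
V = np.eye(N) - np.ones((N, N)) / N
XV = X @ V
print(np.allclose(XV @ np.ones(N), 0))                # True: centered list
print(np.allclose(D_of_X(XV), D_of_X(X)))             # True
```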


4.5. INVARIANCE 149For the inner-product form EDM definition D(Θ) (375) and α ∈ R n , itgenerally holdsD(Θ − α1 T ) ≠ D(Θ) (393)But because (377)Θ = X √ 2V N (394)and for the list form EDM definition (333)D(X − x 1 1 T ) = D ( X [ 0 √ ])2V N = D([0 Θ ]) = D(X) (395)then we have translation invariance in the following sense:D ([ −α Θ − α1 ]) T = D([0 Θ ]) (396)4.5.1.1 Gram form invarianceThe Gram form EDM definition (344) exhibits invariance to translation bya doublet (§B.2) u1 T + 1u T ;because, for any u∈ R N , D(u1 T + 1u T )=0.D(G) = D(G − (u1 T + 1u T )) (397)4.5.2 Rotation/ReflectionRotation of the list X ∈ R n×N about some arbitrary point α∈ R n , or reflectionthrough some affine subset containing α is accomplished via QX −α1 T ,where Q is an orthogonal matrix (§B.5).We rightfully expectD(QX − α1 T ) = D ( Q(X − α1 T ) ) = D(QX) = D(X) (398)Because D(X) is translation invariant, we may safely ignore offset and consideronly the impact of matrices that pre-multiply X . Inter-point distancesare unaffected by rotation or reflection; we say, EDM D is rotation/reflectioninvariant. Proof follows from the fact, Q T = Q −1 ⇒ X T Q T QX = X T X . So(398) follows directly from (333).


150 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXThe class of pre-multiplying matrices for which inter-point distances areunaffected is a little more broad than orthogonal matrices. Looking at EDMdefinition (333), it appears that any matrix Q p such thatwill have the propertyX T Q T pQ p X = X T X (399)D(Q p X) = D(X) (400)An example is skinny Q p ∈ R m×n (m>n) having orthonormal columns.4.5.2.1 Inner-product form invarianceLikewise, D(Θ) (375) is rotation/reflection invariant;so (399) and (400) would similarly apply.4.5.3 Invariance conclusionD(QΘ) = D(Θ) (401)In the construction of an EDM, absolute rotation, reflection, and translationinformation is lost. Given an EDM, reconstruction of point position(§4.10, the list X) can be guaranteed correct only in affine dimension r ;id est, in relative position. Given a noiseless complete EDM, this isometricreconstruction is unique in so far as every realization of a corresponding listX is congruent: (§4.6)4.6 Injectivity of D. Unique reconstructionInjectivity implies uniqueness of an isometric reconstruction; hence we endeavorto demonstrate it.EDM definitions D(X) (333), D(G) (344), and D(Θ) (375) are many-toonesurjections (§4.5) onto the same range; the EDM cone (§5): Independentof Euclidean dimension n ,EDM N = { D(X) : R n×N → S N 0 | X ∈ R } n×N= { }D(G) : S N → S N 0 | G ∈ S N += { (402)D(Θ) : R n×N−1 → S N 0 | Θ ∈ R n×N−1}


4.6. INJECTIVITY OF D . UNIQUE RECONSTRUCTION 1514.6.0.1 Inner-product form injectivity and surjectivitySubstituting Θ T Θ ← −VN TDV N (387) into the inner-product form EDMdefinition (375), D(Θ) may be further decomposed: (confer (353))[0D(D) =δ ( −VN TDV )N] [1 T + 1 0 δ ( −VN TDV ) ] TN[ 0 0T− 20 −VN TDV N(403)This linear operator D is an injective (one-to-one) map of the EDM coneonto itself. Yet when its domain is instead the entire symmetric hollowsubspace S N 0 = aff EDM N (633), D(D) becomes an injective map onto thatsame subspace. Proof follows directly from the fact: linear D has no nullspaceon S N 0 . [92, §A.1]]4.6.0.2 Gram-form bijectivityThe Gram-formD(G) = δ(G)1 T + 1δ(G) T − 2G (344)is an injective map, for example, on the subspace of symmetric matriceshaving all zeros in the first row and columnS N 1∆= {G∈ S N | Ge 1 = 0}{[ ] [ 0 0T 0 0T= Y0 I 0 I] }| Y ∈ S N(1296)because it obviously has no nullspace there. Since Ge 1 =0 (349) means thefirst point in the list X resides at the origin (342), then D(G) on S N 1 ∩ S N +must be surjective onto EDM N .Because it has no nullspace on the geometric center subspace (§E.7.1.0.2),D(G) is also injective on that subspaceS N g∆= {G∈ S N | G1 = 0}= {G∈ S N | N(G) ⊇ 1} = {G∈ S N | R(G) ⊆ N(1 T )} (1292)= {V YV | Y ∈ S N } ⊂ S N (1293)


152 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXwhere V ∈ S N is the auxiliary orthogonal projector having R(V )= N(1 T )(B.4.1).To prove injectivity of D(G) on S N g (1292) (1293): Any matrix Y ∈ S Ncan be decomposed into orthogonal components in S NY = V Y V + (Y − V Y V ) (404)where V Y V ∈ S N g and Y −V Y V ∈ S N⊥g ={u1 T + 1u T | u∈ R N } (1294). Becauseof translation invariance (§4.5) and linearity, D(Y −V Y V )=0 henceN(D(Y )) ⊇ S N⊥g . It remains only to show D(V Y V ) = 0 ⇔ V Y V = 0(⇔ Y = u1 T + 1u T for some particular u∈ R N) . D(V Y V ) will vanish whenever2V Y V = δ(V Y V )1 T +1δ(V Y V ) T . But this implies R(1) were a subsetof R(V Y V ) (§B), which is contradictory. We thus haveN(D(Y )) = S N⊥g (405)Since G1 = 0 ⇔ X1 = 0 means the list X is geometrically centered atthe origin, and because N(D(G)) is the orthogonal complement S N⊥g , thenD(G) on S N g ∩ S N + must be surjective onto EDM N .4.6.1 Inversion of DInjectivity of linear operator D(D) (403) suggests inversion of the linearoperatorV N (D) ∆ = −V T NDV N (406)whose nullspace is {u1 T + 1u T | u ∈ R N } = S N⊥g (1294). Thus V N (D) isinjective on S N 0 because S N⊥g ∩ S N 0 = 0. Indeed, revising the argument ofthat linear operator D ,D ( [−VN TDV )N =0δ ( −VN TDV )Nwe have, for D ∈ S N 0 ,] [1 T + 1 0 δ ( −VN TDV ) ] TN[ 0 0T− 20 −VN TDV N(407)D = D ( V N (D) ) (408)− V T NDV N = V N(D(−VTN DV N ) ) (409)]


[Figure 4.4: Two pairs of orthogonal complements in $\mathbb{S}^N$ (the labelled subspaces are $\mathbb{S}^{N\perp}_g$ with a basis for $\mathbb{S}^N_0$, and $\mathbb{S}^{N\perp}_0$ with a basis for $\mathbb{S}^N_g$). $\dim\mathbb{S}^N_g = \dim\mathbb{S}^N_0 = N(N-1)/2$ in isomorphic $\mathbb{R}^{N^2}$. Orthogonal projection of the basis for $\mathbb{S}^N_0$ on $\mathbb{S}^N_g$ yields another basis for $\mathbb{S}^N_g$.]

or
$$\mathbb{S}^N_0 = D\!\left( V_N(\mathbb{S}^N_0) \right) \tag{410}$$
$$-V_N^T\, \mathbb{S}^N_0\, V_N = V_N\!\left( D(-V_N^T\, \mathbb{S}^N_0\, V_N) \right) \tag{411}$$
These operators $V_N$ and $D$ are therefore mutual inverses.

Now instead we apply the injective Gram-form operator (344) $D(G)$. Define the linear geometric centering operator (§E.7.1.0.2)
$$V(D) \overset{\Delta}{=} -V D V \tfrac{1}{2} \tag{412}$$
This orthogonal projector $V(D)$ has no nullspace on $\mathbb{S}^N_0$ because the projection of $-D/2$ on $\mathbb{S}^N_g$ (1292) can be 0 if and only if $D \in \mathbb{S}^{N\perp}_g$; but $\mathbb{S}^{N\perp}_g \cap \mathbb{S}^N_0 = 0$. Projector $V(D)$ is therefore injective hence invertible on $\mathbb{S}^N_0$. We find, for $D \in \mathbb{S}^N_0$, (confer (358))
$$D(-V D V \tfrac{1}{2}) = \delta(-V D V \tfrac{1}{2})\mathbf{1}^T + \mathbf{1}\delta(-V D V \tfrac{1}{2})^T - 2(-V D V \tfrac{1}{2}) \tag{413}$$
id est,


154 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXorD = D ( V(D) ) (414)− V DV = V ( D(−V DV ) ) (415)S N 0 = D ( V(S N 0 ) ) (416)− V S N 0 V = V ( D(−V S N 0 V ) ) (417)reestablished in §4.10.2.1. These operators V and D are mutual inverses.Further, −V S N 0 V/2 is equivalent to the geometric center subspace S N g inthe ambient space of symmetric matrices;because (51)S N g = −V S N V/2 = −V ( S N 0 ⊕ S N⊥0)V/2 = −V SN0 V/2 (418)− V S N 0 V/2 ⊇ −V S N⊥0 V/2 = −V δ 2 (S N )V/2 (419)which holds if and only if S N⊥0 ∩ S N g = 0. The geometry is abstracted inFigure 4.4. Gram-form D ( )S N g (344) is equivalent to SN0 ; by (416)D ( S N g)= D(−V (SN0 ⊕ S N⊥0 )V/2 ) = S N 0 + D ( −V S N⊥0 V/2 ) = S N 0 (420)because S N 0 ⊇ D ( −V δ 2 (S N )V/2 ) . We have, finally, [93, §2.1] [94, §2][95, §18.2.1] [96, §2.1]S N 0 = D(S N g ) (421)and from the bijectivity results in §4.6.0.2,S N g = V(S N 0 ) (422)EDM N = D(S N g ∩ S N +) (423)S N g ∩ S N + = V(EDM N ) (424)4.7 Embedding in the affine hullThe affine hull A (56) of a point list {x l }, arranged columnar in X ∈ R n×N(54), is identical to the affine hull of that polyhedron P (60) formed from allconvex combinations of the x l ; [1, §2] [30, §17]A = aff X = aff P (425)


4.7. EMBEDDING IN THE AFFINE HULL 155Comparing definitions (56) and (60), it becomes obvious that the x l andtheir convex hull P are embedded in their unique affine hull A ;A ⊇ P ⊇ {x l } (426)Recall that affine dimension r is a lower bound on embedding, equal tothe dimension of the subspace parallel to that nonempty affine set A in whichthe points are embedded. (§2.2.1) We define the dimension of the convex hullP to be the same as the dimension r of the affine hull A [30, §2], but r is notnecessarily equal to the rank of X (444).For the particular example illustrated in Figure 4.1, P is the triangle plusits relative interior while its three vertices constitute the entire list X. Theaffine hull A is the unique plane that contains the triangle, so r=2 in thatexample while the rank of X is 3. Were there only two points in Figure 4.1,then the affine hull would instead be the unique line passing through them;r would become 1 while the rank would then be 2.4.7.1 Determining affine dimensionKnowledge of affine dimension r is important because we lose any absoluteoffset component common to all the generating x l in R n when reconstructingconvex polyhedra given only distance information. (§4.5.1) To calculate r ,we first eliminate any offset that serves to increase dimensionality of thesubspace required to contain P ; subtracting α ∈ A from every list memberwill work,X − α1 T (427)translating A to the origin:A − α = aff(X − α1 T ) = aff(X) − α (428)P − α = conv(X − α1 T ) = conv(X) − α (429)which follow from their definitions. Because (425) and (426) translate,R n ⊇ A−α = aff(X −α1 T ) = aff(P −α) ⊇ P −α ⊇ {x l −α} (430)where from the previous relations it is easily shownaff(P − α) = aff(P) − α (431)


156 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXTranslating A neither changes its dimension or the dimension of the embeddedpolyhedron P ; (55)r ∆ = dim A = dim(A − α) ∆ = dim(P − α) = dim P (432)For any α ∈ R n , (428)-(432) remain true. [30, p.4, p.12] Yet when α ∈ A ,the affine set A − α becomes a unique subspace of R n in which the {x l − α}and their convex hull P − α are embedded (430), and whose dimension ismore easily calculated.4.7.1.0.1 Example. Translating first list-member to origin.Subtracting the first member α ∆ = x 1 from every list member will translatetheir affine hull A and their convex hull P and, in particular, x 1 ∈ P ⊆ A tothe origin in R n ; videlicet,X −x 1 1 T = X −Xe 1 1 T = X(I −e 1 1 T ) = X [ 0 √ 2V N]∈ R n×N (433)where V N is defined in (339), and e 1 in (348). Applying (430) to (433),R n ⊇ R(XV N ) = A−x 1 = aff(X −x 1 1 T ) = aff(P −x 1 ) ⊇ P −x 1 ∋ 0(434)where XV N ∈ R n×N−1 . Hencer = dim R(XV N ) (435)Since shifting the geometric center to the origin (§4.5.1.0.1) translates theaffine hull to the origin as well, then it must also be truer = dim R(XV ) (436)For any matrix whose range is R(V )= N(1 T ) we get the same result; e.g.,r = dim R(XV †TN ) (437)because R(XV ) = {Xz | z ∈ N(1 T )} and R(V ) = R(V N ) = R(V †TN) (§E).These auxiliary matrices (§B.4.2) are more closely related;V = V N V † N(438)


4.7. EMBEDDING IN THE AFFINE HULL 1574.7.2 Affine dimension r versus rankNow, suppose D is an EDM as in (333) and we pre-multiply by −VNTpost-multiply by V N . Then because VN T1 = 0 (340),and− V T NDV N = 2V T NX T XV N = 2V T N GV N ∈ S N−1 (439)where G is the Gram matrix. Similarly pre- and post-multiplying by V ,− V DV = 2V X T XV = 2V GV ∈ S N (440)because V 1 = 0 (997). Likewise, multiplying the inner-product form EDMdefinition (375),−V T NDV N = Θ T Θ ∈ S N−1 (379)For any matrix A , rankA T A = rankA = rankA T . [28, §0.4] 4.13 HencerankV DV = rankV GV = rankV T N DV N = rankV T N GV N= rankXV = rankXV N = rank Θ = r(441)By conservation of dimension, (§A.7.2.0.1)The general fact 4.14 (confer (363))r + dim N(V T NDV N ) = N −1 (442)r + dim N(V DV ) = N (443)r ≤ min{n, N −1} (444)is evident from (433) but can be visualized in the example illustrated inFigure 4.1. There we imagine a vector from the origin to each point inthe list. Those three vectors are linearly independent in R 3 , but the affinedimension r is 2 because the three points lie in a plane. When that planeis translated to the origin, it becomes the only subspace of dimension r=2that can contain the translated triangular polyhedron.4.13 For A∈R m×n , N(A T A) = N(A). [26, §3.3]4.14 rankX ≤ min{n , N}
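A small NumPy illustration of (441): the assumed list below realizes the example EDM (323), lifted off the origin so that, as in Figure 4.1, $\operatorname{rank} X = 3$ while the affine dimension is $r = 2$.

```python
import numpy as np

X = np.array([[0., 1., 1.],
              [0., 0., 2.],
              [1., 1., 1.]])                     # three points in R^3 realizing (323)
N = X.shape[1]
D = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)] for i in range(N)])

V = np.eye(N) - np.ones((N, N)) / N
VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)

print(np.linalg.matrix_rank(X))                  # 3 (rank of the list itself)
print(np.linalg.matrix_rank(X @ V))              # 2 = r, per (436)
print(np.linalg.matrix_rank(-VN.T @ D @ VN))     # 2 = r, per (441)
print(np.linalg.matrix_rank(-V @ D @ V))         # 2 = r, per (441)
```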


4.7.3 Précis

We collect expressions for the affine dimension: for $X \in \mathbb{R}^{n\times N}$ and $D \in \mathrm{EDM}^N$,
$$\begin{aligned}
r \;&\overset{\Delta}{=}\; \dim(\mathcal{P} - \alpha) = \dim \mathcal{P} = \dim \operatorname{conv} X\\
&= \dim(\mathcal{A} - \alpha) = \dim \mathcal{A} = \dim \operatorname{aff} X\\
&= \operatorname{rank}(X - x_1\mathbf{1}^T) = \operatorname{rank}(X - \alpha_g\mathbf{1}^T)\\
&= \operatorname{rank} \Theta \quad (377)\\
&= \operatorname{rank} XV_N = \operatorname{rank} XV = \operatorname{rank} XV_N^{\dagger T}\\
&= \operatorname{rank} V_N^T G V_N = \operatorname{rank} V G V = \operatorname{rank} V_N^\dagger G V_N\\
&= \operatorname{rank} V_N^T D V_N = \operatorname{rank} V D V = \operatorname{rank} V_N^\dagger D V_N \quad (\text{§4.14.3})\\
&= \operatorname{rank} \Lambda \quad (486)\\
&= N - 1 - \dim\mathcal{N}\!\left( \begin{bmatrix} -D & \mathbf{1}\\ \mathbf{1}^T & 0 \end{bmatrix} \right) = \operatorname{rank}\begin{bmatrix} -D & \mathbf{1}\\ \mathbf{1}^T & 0 \end{bmatrix} - 2 \quad (447)
\end{aligned} \tag{445}$$

4.7.3.0.1 Theorem. EDM rank versus affine dimension $r$. [91, §3] [97, §3] For $D \in \mathrm{EDM}^N$, (confer (604))

1. $r = \operatorname{rank}(D) - 1 \;\Leftrightarrow\; \mathbf{1}^T D^\dagger \mathbf{1} \neq 0$
Points constituting a generating list for the corresponding polyhedron lie on the relative boundary of an $r$-dimensional circumhypersphere having
$$\text{diameter} = \sqrt{2}\,\left( \mathbf{1}^T D^\dagger \mathbf{1} \right)^{-1/2} \tag{446}$$

2. $r = \operatorname{rank}(D) - 2 \;\Leftrightarrow\; \mathbf{1}^T D^\dagger \mathbf{1} = 0$
There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron.

3. In Cayley-Menger form [22, §3.3] [98, §40] (§4.12.3.1),
$$r = N - 1 - \dim\mathcal{N}\!\left( \begin{bmatrix} -D & \mathbf{1}\\ \mathbf{1}^T & 0 \end{bmatrix} \right) = \operatorname{rank}\begin{bmatrix} -D & \mathbf{1}\\ \mathbf{1}^T & 0 \end{bmatrix} - 2 \tag{447}$$
⋄

For all practical purposes,
$$\max\{0\,,\ \operatorname{rank}(D) - 2\} \;\leq\; r \;\leq\; \min\{n\,,\ N-1\} \tag{448}$$


4.8 Euclidean metric versus matrix criteria

4.8.1 Nonnegativity axiom 1

When $D$ is an EDM (333), then it is apparent from (439) that
$$2V_N^T X^T X V_N = -V_N^T D V_N \succeq 0 \tag{449}$$
because for any $A$, $A^T A \succeq 0$.^{4.15} We claim that nonnegativity of the $d_{ij}$ is enforced primarily by the matrix inequality (449); id est,
$$\left.\begin{array}{r} -V_N^T D V_N \succeq 0\\ D \in \mathbb{S}^N_0 \end{array}\right\} \;\Rightarrow\; d_{ij} \geq 0\,,\ i \neq j \tag{450}$$
(The matrix criterion to enforce strict positivity differs by a stroke of the pen. (453))

We now support our claim: If any matrix $A \in \mathbb{R}^{m\times m}$ is positive semidefinite, then its main diagonal $\delta(A) \in \mathbb{R}^m$ must have all nonnegative entries. [44, §4.2] Given $D \in \mathbb{S}^N_0$,
$$-V_N^T D V_N = \begin{bmatrix}
d_{12} & \tfrac{1}{2}(d_{12}+d_{13}-d_{23}) & \cdots & \tfrac{1}{2}(d_{12}+d_{1N}-d_{2N})\\
\tfrac{1}{2}(d_{12}+d_{13}-d_{23}) & d_{13} & \cdots & \tfrac{1}{2}(d_{13}+d_{1N}-d_{3N})\\
\vdots & \ddots & \ddots & \vdots\\
\tfrac{1}{2}(d_{12}+d_{1N}-d_{2N}) & \tfrac{1}{2}(d_{13}+d_{1N}-d_{3N}) & \cdots & d_{1N}
\end{bmatrix} \in \mathbb{S}^{N-1} \tag{451}$$
with generic entry $\tfrac{1}{2}(d_{1,i+1}+d_{1,j+1}-d_{i+1,j+1})$, where row, column indices $i, j \in \{1\ldots N-1\}$. [87] It follows:
$$\left.\begin{array}{r} -V_N^T D V_N \succeq 0\\ D \in \mathbb{S}^N_0 \end{array}\right\} \;\Rightarrow\; \delta(-V_N^T D V_N) = \begin{bmatrix} d_{12}\\ d_{13}\\ \vdots\\ d_{1N} \end{bmatrix} \succeq 0 \tag{452}$$

^{4.15} For $A \in \mathbb{R}^{m\times n}$, $A^T A \succeq 0 \Leftrightarrow y^T A^T A y = \|Ay\|^2 \geq 0$ for all $\|y\| = 1$. When $A$ is full-rank skinny-or-square, $A^T A \succ 0$.
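A brief NumPy check of (451) and (452) on a random list (illustrative only): the entries of $-V_N^T D V_N$ match $\tfrac{1}{2}(d_{1,i+1}+d_{1,j+1}-d_{i+1,j+1})$, the main diagonal recovers $d_{12},\ldots,d_{1N}$, and the matrix is positive semidefinite as (449) requires.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 5
X = rng.standard_normal((n, N))
D = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)] for i in range(N)])

VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)   # (339)
M = -VN.T @ D @ VN                                                    # the matrix (451)

# Entry formula from (451): M[i,j] = (d_{1,i+1} + d_{1,j+1} - d_{i+1,j+1}) / 2
M_formula = np.array([[(D[0, i + 1] + D[0, j + 1] - D[i + 1, j + 1]) / 2
                       for j in range(N - 1)] for i in range(N - 1)])

print(np.allclose(M, M_formula))               # True
print(np.allclose(np.diag(M), D[0, 1:]))       # True: (452), diagonal is d_12 ... d_1N
print(np.min(np.linalg.eigvalsh(M)) >= -1e-9)  # True: (449), M is positive semidefinite
```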


160 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXMultiplication of V N by any permutation matrix Ξ has null effect on its rangeand nullspace. In other words, any permutation of the rows or columns of V Nproduces a basis for N(1 T ); id est, R(Ξ r V N )= R(V N Ξ c )= R(V N )= N(1 T ).Hence, −VN TDV N ≽ 0 ⇔ −VN TΞT rDΞ r V N ≽ 0 (⇔ −Ξ T c VN TDV N Ξ c ≽ 0). Variouspermutation matrices 4.16 will sift the remaining d ij similarly to (452)thereby proving their nonnegativity. Hence −VN TDV N ≽ 0 is a sufficient testof the first axiom (§4.2) of the Euclidean metric, nonnegativity. 4.8.1.1 Strict positivityShould we require the points in R n to be distinct, then entries of D offthe main diagonal must be strictly positive {d ij > 0, i ≠ j}, and only thoseentries along the main diagonal of D are 0. By similar argument, the strictmatrix inequality is a sufficient test for strict positivity of Euclidean distancesquare;−VN TDV }N ≻ 0D ∈ S N ⇒ d ij > 0, i ≠ j (453)04.8.2 Triangle inequality axiom 4In light of Kreyszig’s observation [38, §1.1, prob.15] that axioms 2 through4 of the Euclidean metric (§4.2) together imply axiom 1, the nonnegativitycriterion (450) suggests that the matrix inequality −V T N DV N ≽ 0 mightsomehow take on the role of triangle inequality; id est,δ(D) = 0D T = D−V T N DV N ≽ 0⎫⎬⎭ ⇒ √ d ij ≤ √ d ik + √ d kj , i≠j ≠k (454)We now show that is indeed the case: Let T be the leading principal submatrixin S 2 of −V T N DV N (upper left 2×2 submatrix from (451));[T =∆1d 12 (d ]2 12+d 13 −d 23 )1(d 2 12+d 13 −d 23 ) d 13(455)4.16 The rule of thumb is: If Ξ r (i,1)=1, then δ(−V T N ΞT rDΞ r V N )∈R N−1 is some permutationof the i th row or column of D excepting the 0 entry from the main diagonal.


4.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 161Submatrix T must be positive (semi)definite whenever −VN TDV N(§A.3.1.0.4, §4.8.3) Now we have,is.−V T N DV N ≽ 0 ⇒ T ≽ 0 ⇔ σ 1 ≥ σ 2 ≥ 0−V T N DV N ≻ 0 ⇒ T ≻ 0 ⇔ σ 1 > σ 2 > 0(456)where σ 1 and σ 2 are the eigenvalues of T , real due only to symmetry of T :(σ 1 = 1 d2 12 + d 13 + √ )d23 2 − 2(d 12 + d 13 )d 23 + 2(d12 2 + d13)2 ∈ R(σ 2 = 1 d2 12 + d 13 − √ ) (457)d23 2 − 2(d 12 + d 13 )d 23 + 2(d12 2 + d13)2 ∈ RNonnegativity of eigenvalue σ 1 is guaranteed by only the nonnegativity ofthe d ij which in turn is guaranteed by the matrix inequality (450). The inequalitybetween the eigenvalues in (456) follows from only the realness ofthe d ij . Since σ 1 always exceeds or equals σ 2 , conditions for the positive(semi)definiteness of submatrix T can be completely determined by examiningσ 2 , the smaller of its two eigenvalues. A triangle inequality is madeapparent when we express T eigenvalue nonnegativity in terms of D matrixentries; videlicet,T ≽ 0 ⇔ detT = σ 1 σ 2 ≥ 0, d 12 ,d 13 ≥ 0⇔σ 2 ≥ 0⇔| √ d 12 − √ d 23 | ≤ √ d 13 ≤ √ d 12 + √ d 23 (a)(458)Triangle inequality (458a) (confer (373) (470)), in terms of three entries fromD, is equivalent to axiom 4√d13 ≤ √ d 12 + √ √d23d 23≤ √ d 12 + √ √d12d 13≤ √ d 13 + √ (459)d 23for the corresponding points x 1 ,x 2 ,x 3 from some length-N list. 4.174.17 Accounting for symmetry axiom 3, the fourth axiom demands three inequalities besatisfied per one of type (458a). The first of those inequalities in (459) is self evident from(458a), while the two remaining follow from the left-hand side of (458a) and the fact forscalars, |a| ≤ b ⇔ a ≤ b and −a ≤ b.


162 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX4.8.2.1 CommentGiven D whose dimension N equals or exceeds 3, there are N!/(3!(N − 3)!)distinct triangle inequalities in total like (373) that must be satisfied, of whicheach d ij is involved in N−2, and each point x i is in (N −1)!/(2!(N −1 − 2)!).We have so far revealed only one of those triangle inequalities; namely, (458a)that came from T (455). Yet we claim if −VN TDV N ≽ 0 then all triangleinequalities will be satisfied simultaneously;| √ d ik − √ d kj | ≤ √ d ij ≤ √ d ik + √ d kj , i


4.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 163which collinear arrangement of points is not allowed. [57, §24/6, p.322] Henceby similar reasoning, −VN TDV N ≻ 0 is a sufficient test of all the strict triangleinequalities; id est,δ(D) = 0D T = D−V T N DV N ≻ 0⎫⎬4.8.3 −V T N DV N nesting⎭ ⇒ √ d ij < √ d ik + √ d kj , i≠j ≠k (461)From (455) observe that T = −VN TDV N | N←3 . In fact, for D ∈ EDM N ,the leading principal submatrices of −VN TDV N form a nested sequence (byinclusion) whose members are individually positive semidefinite [44] [28] [26]and have the same form as T ; videlicet,[ − V T N DV N | N←1 = [ ∅ ] , (o)−V T N DV N | N←2 = [d 12 ] ∈ S + , (a)−V T N DV N | N←3 =−V T N DV N | N←4 =[⎡⎢⎣1d 12 (d ]2 12+d 13 −d 23 )1(d 2 12+d 13 −d 23 ) d 13= T ∈ S 2 + , (b)1d 12 (d 12 12+d 13 −d 23 ) (d ⎤2 12+d 14 −d 24 )1(d 12 12+d 13 −d 23 ) d 13 (d 2 13+d 14 −d 34 )1(d 12 12+d 14 −d 24 ) (d 2 13+d 14 −d 34 ) d 14⎥⎦, (c).−VN TDV N | N←i =.−VN TDV N =]⎡⎣⎡⎣−VN TDV ⎤N | N←i−1 ν(i)⎦ ∈ S i−1ν T + , (d)(i) d 1i−VN TDV ⎤N | N←N−1 ν(N)⎦ ∈ S N−1ν T + (e)(N) d 1N(462)


164 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXwhere 4.19 ν(i) ∆ = 1 2⎡⎢⎣⎤d 12 +d 1i −d 2id 13 +d 1i −d 3i⎥. ⎦ ∈ Ri−2 , i > 2 (463)d 1,i−1 +d 1i −d i−1,iHence, the leading principal submatrices of EDM D must also be EDMs. 4.20Bordered symmetric matrices in the form (462d) are known to haveintertwined [26, §6.4] (or interlaced [28, §4.3] [99, §IV.4.1]) eigenvalues;(confer §4.14.1) that means, for the particular submatrices (462a) and(462b),σ 2 ≤ d 12 ≤ σ 1 (464)where d 12 is the eigenvalue of the submatrix (462a), and σ 1 , σ 2 are theeigenvalues of T (462b) (455). Intertwining in (464) predicts that should d 12become 0, then σ 2 must go to 0. 4.21 The eigenvalues are similarly intertwinedfor submatrices (462b) and (462c);γ 3 ≤ σ 2 ≤ γ 2 ≤ σ 1 ≤ γ 1 (465)where γ 1 , γ 2 ,γ 3 are the eigenvalues of submatrix (462c). Intertwining likewisepredicts that should σ 2 become 0 (a possibility revealed in §4.8.3.1),then γ 3 must go to 0. Combining results so far for N = 2, 3, 4: (464) (465)γ 3 ≤ σ 2 ≤ d 12 ≤ σ 1 ≤ γ 1 (466)The preceding logic extends by induction through the remaining membersof the sequence (462).4.8.3.1 Tightening the triangle inequalityNow we apply the Schur complement from §A.4 to tighten the triangle inequality.We find that the gains by doing so are modest. From (462) weidentifyG ∆ = −V T NDV N | N←4 (467)A ∆ = T = −V T NDV N | N←3 (468)4.19 −V DV | N←1 = 0 ∈ S 0 + (§B.4.1)4.20 In fact, each and every principal submatrix of an EDM D is another EDM. [86, §4.1]4.21 If d 12 were 0, eigenvalue σ 2 becomes 0 (457) because d 13 must then be equal to d 23 ;id est, d 12 = 0 ⇔ x 1 = x 2 . (54)


4.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 165both positive semidefinite by assumption, B = ν(4) defined in (463), andC = d 14 . Using the latter non-strict form of (886), C ≥ 0 by assumption(§4.8.1) and CC † = I . So by the positive semidefinite ordering of eigenvaluestheorem (§A.3.1.0.1),− V T NDV N | N←4 ≽ 0 ⇔ T ≽ d −114 ν(4)ν T (4) ⇒{σ1 ≥ d −114 ‖ν(4)‖ 2σ 2 ≥ 0(469)where {d −114 ‖ν(4)‖ 2 , 0} are the eigenvalues of d −114 ν(4)ν T (4) and σ 1 , σ 2 arethe eigenvalues of T .4.8.3.1.1 Example. Small completion problem, IV.Applying the inequality for σ 1 in (469) to the small completion problem onpage 132, Figure 4.2 (confer §4.9.3, §4.12.4.1), the lower bound on √ d 14 ,1.236 in (327), is tightened to 1.289 . The correct value of √ d 14 to threesignificant figures is 1.414 .4.8.4 Affine dimension reduction in two dimensions(confer §4.12.4) The leading principal 2×2 submatrix T of −VN TDV N haslargest eigenvalue σ 1 (457) which is a convex function of D . 4.22 σ 1 can neverbe 0 unless d 12 = d 13 = d 23 = 0. σ 1 can never be negative while the d ij arenonnegative. The remaining eigenvalue σ 2 is a concave function of D thatbecomes 0 only at the upper and lower bounds of inequality (458a) and itsequivalent forms: (confer (460))| √ d 12 − √ d 23 | ≤ √ d 13 ≤ √ d 12 + √ d 23 (a)⇔| √ d 12 − √ d 13 | ≤ √ d 23 ≤ √ d 12 + √ d 13 (b)⇔| √ d 13 − √ d 23 | ≤ √ d 12 ≤ √ d 13 + √ d 23 (c)(470)4.22 The maximum eigenvalue of any symmetric matrix is always a convex function of itsentries, while the ⎡ minimum ⎤ eigenvalue is always concave. [1, exmp.3.10] In our particulard 12case, say d =∆ ⎣d 13⎦ ∈ R 3 . Then the Hessian (1072) ∇ 2 σ 1 (d) ≽ 0 certifies convexityd 23whereas ∇ 2 σ 2 (d)≼0 certifies concavity. Each Hessian has rank equal to 1. The respectivegradients ∇σ 1 (d) and ∇σ 2 (d) are nowhere 0.


166 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXIn between those bounds, σ 2 is strictly positive; otherwise, it would be negativebut prevented by the condition T ≽ 0.When σ 2 becomes 0, it means that triangle △ 123 has collapsed to a linesegment; a potential reduction in affine dimension r . The same logic is validfor any principal 2×2 submatrix of −V T N DV N , hence applicable to othertriangles.4.9 Bridge: <strong>Convex</strong> polyhedra to EDMsThe criteria for the existence of an EDM include, by definition (333) (375),the axioms imposed upon its entries d ij by the Euclidean metric. From§4.8.1 and §4.8.2, we know there is a relationship of matrix criteria to thoseaxioms. Here is a snapshot of what we are sure thus far: for i,j,k ∈{1... N},(confer §4.2)√dij ≥ 0, i ≠ j√dij = 0, i = j√dij = √ d ji √dij≤ √ d ik + √ d kj , i≠j ≠k⇐−V T N DV N ≽ 0δ(D) = 0D T = D⇐ D ∈ EDM N(471)In words, these Euclidean axioms are necessary conditions for D to be adistance matrix. At this moment, we have no converse for (471). As of concernin §4.3, we have yet to establish metric requirements beyond the fourEuclidean axioms that would allow D to be certified an EDM or might facilitatepolyhedron or list reconstruction from an incomplete EDM. Our presentgoal is to establish ab initio the necessary and sufficient matrix criteria thatwill subsume all the Euclidean axioms and any further requirements 4.23 forall N > 0 (§4.8.3); id est,−V T N DV N ≽ 0D ∈ S N 0⇔ D ∈ EDM N (352)4.23 In 1935, Schoenberg [87, (1)] first extolled expansion (451) (predicated on symmetryand zero self-distance) specifically incorporating V N , albeit algebraically. He showedthat nonnegativity −y T V T N DV N y ≥ 0, for all (normalized) y ∈R N−1 , is necessary andsufficient for D to be an EDM. Gower [84, §3] remarks how surprising it is that such afundamental property of Euclidean geometry was obtained so late.


4.9. BRIDGE: CONVEX POLYHEDRA TO EDMS 167or since −V T N DV N = Θ T Θ (379), then for relative-angle matrix Ω and distancevector √ d as in (380) and EDM definition (384),Ω ≽ 0√d ≽ 0⇔ D = D(Ω,d) ∈ EDM N (472)From (340) we know R(V N )=N(1 T ), so (352) is the same as (329). In fact,any matrix V in place of V N will satisfy (352) whenever R(V ) = R(V N ) =N(1 T ).4.9.1 Geometric criterionWe derive matrix criteria for D to be an EDM, validating (352) using simplegeometry: distance to the polyhedron formed by the convex hull of a list ofpoints (54) in Euclidean space R n .EDM assertion. D is a Euclidean distance matrix if and only if D ∈ S N 0and distances-square from the origin{‖p(y)‖ 2 = −y T V T NDV N y | y ∈ S − β} (473)correspond to points p in some bounded convex polyhedronP − α = {p(y) | y ∈ S − β} (474)having N or fewer vertices embedded in an r-dimensional subspace A − αof R n , where α ∈ A = aff P , and where the domain of linear surjectionp(y) is the unit simplex S ⊂ R N−1+ shifted such that its vertex at the originis translated to −β in R N−1 . When β = 0, α = x 1 .⋄In terms of V N , the unit simplex (166) in R N−1 has an equivalent representation:S = {s ∈ R N−1 | √ 2V N s ≽ −e 1 } (475)where e 1 is as in (348). Incidental to the EDM assertion, shifting the unitsimplexdomain in R N−1 translates the polyhedron P in R n . Indeed, there is aone-to-one correspondence between vertices of the unit simplex and membersof the list generating P ;


168 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXp : R N−1⎛⎧⎪⎨p⎜⎝⎪⎩−βe 1 − βe 2 − β.e N−1 − β⎫⎞⎪⎬⎟⎠⎪⎭→=⎧⎪⎨⎪⎩R nx 1 − αx 2 − αx 3 − α.x N − α⎫⎪⎬⎪⎭(476)4.9.1.0.2 Proof. EDM assertion.(=⇒) We demonstrate that if D is an EDM, then each distance-square‖p(y)‖ 2 described by (473) corresponds to a point p in some embeddedpolyhedron P − α . Assume D is indeed an EDM; id est, D can be madefrom some list X of N unknown points in Euclidean space R n ; D = D(X)for X ∈ R n×N as in (333). Since D is translation invariant (§4.5.1), we mayshift the affine hull A of those unknown points to the origin as in (427). Thentake any point p in their convex hull (60);P − α = {p = (X − Xb1 T )a | a T 1 = 1, a ≽ 0} (477)where α = Xb ∈ A ⇔ b T 1 = 1. Solutions to a T 1 = 1 are: 4.24a = e 1 + √ 2V N s (478)where s ∈ R N−1 and e 1 is as in (348). Similarly, b = e 1 + √ 2V N β .P − α = {p = X(I − (e 1 + √ 2V N β)1 T )(e 1 + √ 2V N s) | √ 2V N s ≽ −e 1 }= {p = X √ 2V N (s − β) | √ 2V N s ≽ −e 1 }(479)that describes the domain of p(s) as the unit simplex S ;Making the substitution y ← s − β ,S = {s | √ 2V N s ≽ −e 1 } ⊂ R N−1+ (480)P − α = {p = X √ 2V N y | y ∈ S − β} (481)4.24 Since R(V N )= N(1 T ) and N(1 T )⊥ R(1), then over all s∈R N−1 , V N s is a hyperplanethrough the origin orthogonal to 1. Thus the solutions a constitute a hyperplaneorthogonal to the vector 1, and offset from the origin in R N by any particular solution;in this case, a = e 1 .


4.9. BRIDGE: CONVEX POLYHEDRA TO EDMS 169Point p belongs to a convex polyhedron P−α embedded in an r-dimensionalsubspace of R n because the convex hull of any list forms a polyhedron, andbecause the translated affine hull A − α contains the translated polyhedronP − α (430) and the origin (when α∈A), and because A has dimension r bydefinition (432). Now, any distance-square from the origin to the polyhedronP − α can be formulated{p T p = ‖p‖ 2 = 2y T V T NX T XV N y | y ∈ S − β} (482)Applying (439) to (482) we get (473).(⇐=) To validate the EDM assertion in the reverse direction, we prove: Ifeach distance-square ‖p(y)‖ 2 (473) on the shifted unit-simplex S −β ⊂ R N−1corresponds to a point p(y) in some embedded polyhedron P − α , then Dis an EDM. The r-dimensional subspace A − α ⊆ R n is spanned byp(S − β) = P − α (483)because A − α = aff(P − α) ⊇ P − α (430). So, outside the domain S − βof linear surjection p(y) , the simplex complement \S − β ⊂ R N−1 mustcontain the domain of the distance-square ‖p(y)‖ 2 = p(y) T p(y) to remainingpoints in the subspace A − α ; id est, to the polyhedron’s relative exterior\P − α . For ‖p(y)‖ 2 to be nonnegative on the entire subspace A − α ,−V T N DV N must be positive semidefinite and is assumed symmetric; 4.25− V T NDV N ∆ = Φ T Φ (484)where 4.26 Φ ∈ R m×N−1 for some m ≥ r . Because p(S − β) is a convexpolyhedron, it is necessarily a set of linear combinations of points from somelength-N list because every convex polyhedron having N or fewer verticescan be generated that way (§2.7.2). Equivalent to (473) are{p T p | p ∈ P − α} = {p T p = y T Φ T Φ y | y ∈ S − β} (485)Because p ∈ P − α may be found by factoring (485), the list Φ is found byfactoring (484). A unique EDM can be constructed from that list using theinner-product form definition D(Θ)| Θ=Φ (375). That EDM will be identicalto D if δ(D)=0, by injectivity of D (403).4.25 The antisymmetric part ( −V T N DV N − (−V T N DV N) T) /2 is annihilated by ‖p(y)‖ 2 . Bythe same reasoning, any positive (semi)definite matrix A is generally assumed symmetricbecause only the symmetric part (A +A T )/2 survives the test y T Ay ≥ 0. [28, §7.1]4.26 A T = A ≽ 0 ⇔ A = R T R for some real matrix R. [26, §6.3]


[Figure 4.5: Minimum eigenvalue of $-V_N^T D V_N$ versus $d_{14}$ (horizontal axis roughly 1 to 4, vertical axis 0 down to about $-0.6$); positive semidefinite for only one value of $d_{14}$: 2.]

4.9.2 Necessity and sufficiency

From (449) we learned that the matrix inequality $-V_N^T D V_N \succeq 0$ is a necessary test for $D$ to be an EDM. In §4.9.1, the connection between convex polyhedra and EDMs was pronounced by the EDM assertion; the matrix inequality together with $D \in \mathbb{S}^N_0$ became a sufficient test when the EDM assertion demanded that every bounded convex polyhedron have a corresponding EDM. For all $N > 0$ (§4.8.3), the matrix criteria for the existence of an EDM in (352), (472), and (329) are therefore necessary and sufficient and subsume all the Euclidean axioms and further requirements.

4.9.3 Example revisited

Now we apply the necessary and sufficient EDM criteria (352) to an earlier problem.

Example. Small completion problem, II. Continuing the Example on page 132 that pertains to Figure 4.2 where $N = 4$, $d_{14}$ is ascertainable from the matrix inequality $-V_N^T D V_N \succeq 0$. Because all distances in (326) are known except $\sqrt{d_{14}}$, we may simply calculate the minimum eigenvalue of $-V_N^T D V_N$ over a range of $d_{14}$ as in Figure 4.5. We observe a unique value of $d_{14}$ satisfying (352) where the abscissa is tangent to the hypograph of the minimum eigenvalue. Since the minimum eigenvalue of a symmetric matrix is known to be a concave function (§4.8.4), we calculate its second partial derivative with respect to $d_{14}$ evaluated at 2 and find $-1/3$. We conclude there are no other satisfying values of $d_{14}$. Further, that value of $d_{14}$ does not meet an upper or lower bound of a triangle inequality like (460), so neither does it cause the collapse of any triangle. Because the minimum eigenvalue is 0, the affine dimension $r$ of any point list corresponding to $D$ cannot exceed $N - 2$. (§4.7.2)
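The sweep behind Figure 4.5 is easy to reproduce numerically; a minimal NumPy sketch, in which the grid range and spacing are illustrative choices:

```python
import numpy as np

def min_eig(d14):
    """Minimum eigenvalue of -V_N^T D V_N for the completion problem (326)."""
    D = np.array([[0., 1., 5., d14],
                  [1., 0., 4., 1.],
                  [5., 4., 0., 1.],
                  [d14, 1., 1., 0.]])
    N = 4
    VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)   # (339)
    return np.min(np.linalg.eigvalsh(-VN.T @ D @ VN))

grid = np.linspace(1.0, 4.0, 3001)
vals = np.array([min_eig(d) for d in grid])
best = grid[np.argmax(vals)]
print(best, min_eig(best))        # approximately 2.0 and 0.0: the unique EDM completion
print(np.sqrt(best))              # approximately 1.414, the value of sqrt(d14) in §4.8.3.1.1
```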


4.10. LIST RECONSTRUCTION 171minimum eigenvalue. Since the minimum eigenvalue of a symmetric matrixis known to be a concave function (§4.8.4), we calculate its second partialderivative with respect to d 14 evaluated at 2 and find −1/3. We concludethere are no other satisfying values of d 14 . Further, that value of d 14 does notmeet an upper or lower bound of a triangle inequality like (460), so neitherdoes it cause the collapse of any triangle. Because the minimum eigenvalue is0, the affine dimension r of any point list corresponding to D cannot exceedN −2. (§4.7.2)4.10 List reconstructionIsometric reconstruction (§4.5.3) of point list X is generally performed byfactorization of some quantity involving inter-point distance-square data.4.10.1 via GramFor quick reconstruction of a generating list, we may simply performa Cholesky factorization 4.27 of the Gram matrix (350) or (355) fromgiven D ∈ EDM N ; id est, G = X T X where reconstruction X providedby the factorization is upper triangular. Alternatively, we may factorize−VN TDV N =Θ T Θ (379) to find upper triangular reconstruction Θ (377).We now consider how rotation/reflection and translation invariance factorinto a reconstruction. Alternatively, the reader may skip ahead to theexamples.4.10.2 x 1 at the origin. V NAt the stage of reconstruction, we have D ∈ EDM N and we wish to find a generatinglist (§2.2.2) for P − α by factoring positive semidefinite −V T N DV N(484) as suggested in §4.9.1.0.2. One way to factor −V T N DV N is viadiagonalization of symmetric matrices; [26, §5.6] [28] (§A.5, §A.3)− V T NDV N ∆ = QΛQ T (486)QΛQ T ≽ 0 ⇔ Λ ≽ 0 (487)4.27 A very stable numerical algorithm (not requiring definiteness) is given in §F.1.1.5.


172 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXwhere Q ∈ R N−1×N−1 is an orthogonal matrix containing eigenvectors whileΛ ∈ R N−1×N−1 is a diagonal matrix containing corresponding nonnegativeeigenvalues ordered by nonincreasing value. From the diagonalization, identifythe list using (439);− V T NDV N = 2V T NX T XV N ∆ = Q √ ΛQ T pQ p√ΛQT(488)where √ ΛQ T pQ p√Λ ∆ = Λ = √ Λ √ Λ , and where Q p ∈ R n×N−1 is unknownas is its dimension n . Rotation/reflection is accounted for by Q p yet onlyits first r columns are necessarily orthonormal. 4.28 Assuming membership tothe unit simplex y ∈S (481), then point p = X √ 2V N y = Q p√ΛQ T y in R nbelongs to the polyhedronwhose generating list constitutes the columns of (433)[0 X √ 2V N]=P − x 1 (489)[ √ ]0 Q p ΛQT∈ R n×N (490)The simplest choice of Q p has n set to N−1; id est, Q p = I . Each memberof the generating list then has N −1 − r zeros in its higher-dimensionalcoordinates because r ≤ N −1. (444) To truncate those zeros, choose n toberankV T NDV N = rankQ p√ΛQ T = rank Λ = r (491)which is the smallest n possible because XV N has rank r ≤ n (441). 4.29 Inthat case, the simplest choice for Q p is [ I 0 ] having dimensions r×N−1.4.28 Q p is not necessarily an orthogonal matrix. Q p is constrained such that only its firstr columns are necessarily orthonormal because there are only r nonzero eigenvalues in Λwhen VN TDV N has rank r (§4.7.2). The remaining columns of Q p are arbitrary.⎡⎤⎡q T1 ⎤λ 1 0. .. 4.29 If we write Q T = ⎣. .. ⎦ as row-wise eigenvectors, Λ = ⎢ λr ⎥qN−1T ⎣ 0 ... ⎦0 0in terms of eigenvalues, and Q p = [ ]q p1 · · · q pN−1 as column vectors, then√Q p Λ Q T ∑= r √λi q pi qiT is a sum of r linearly independent rank-one matrices (§B.1.1).i=1Hence the summation has rank r .
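Before verifying the list formally, the procedure (486)-(491) can be exercised numerically. The following is a minimal MATLAB sketch (not the code of §F.3; dimensions and the rank tolerance are arbitrary choices, and the auxiliary matrix is assumed to be V_N = [−1^T ; I ]/√2 as used throughout):

% minimal sketch of list reconstruction via diagonalization (486)-(491);
% assumes the auxiliary matrix V_N = [-1'; I]/sqrt(2)
n = 3;  N = 6;
Xtrue = randn(n,N);                                   % hidden generating list
G  = Xtrue'*Xtrue;  dg = diag(G);
D  = dg*ones(1,N) + ones(N,1)*dg' - 2*G;              % EDM definition (333)

Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
B  = -Vn'*D*Vn;                                       % positive semidefinite by (449)
[Q,L] = eig((B+B')/2);                                % diagonalization (486)
[lam,idx] = sort(diag(L),'descend');  Q = Q(:,idx);   % nonincreasing eigenvalues
r  = sum(lam > 1e-9*max(lam));                        % affine dimension (491)

Xrec = [zeros(r,1), diag(sqrt(lam(1:r)))*Q(:,1:r)'];  % list (490) with Qp = [I 0]
Grec = Xrec'*Xrec;  dgr = diag(Grec);
Drec = dgr*ones(1,N) + ones(N,1)*dgr' - 2*Grec;
norm(D - Drec,'fro')                                  % ~0 : the list regenerates D (492)

Because the reconstruction places x_1 at the origin and is arbitrary up to rotation/reflection, only the interpoint distances (not Xtrue itself) are recovered.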


4.10. LIST RECONSTRUCTION 173We may wish to verify the list (490) found from the diagonalization of−V T N DV N . Because of rotation/reflection and translation invariance (§4.5),EDM D can be uniquely constructed from that list by calculating: (333)D(X) = D(X[0 √ 2V N ]) = D(Q p [0 √ ΛQ T ]) = D([0 √ ΛQ T ])(492)This suggests a way to find EDM D given −VN TDV N ; (confer (408))[ ]0D =δ ( [−VN TDV ) 1 T + 1 0 δ ( −VN TDV ) ] [ ]T 0 0TN − 2N0 −VN TDV N4.10.2.1 0 geometric center. V(353)Alternatively, we may perform reconstruction using instead the auxiliarymatrix V (§B.4.1), corresponding to the polyhedronP − α g (493)whose geometric center has been translated to the origin. RedimensioningQ, Λ∈R N×N and Q p ∈ R n×N , (440)− V DV = 2V X T XV ∆ = Q √ Λ Q T pQ p√ΛQT(494)where the generating list now constitutes (confer (490))XV = 1 √2Q p√Λ Q T ∈ R n×N (495)Now, EDM D can be uniquely constructed from the list found by calculating:(333)D(X) = D(XV ) = D( 1 √2Q p√ΛQ T ) = D( √ ΛQ T ) 1 2(496)This EDM is, of course, identical to (492). Similarly to (353), from −V DVwe can find EDM D ; (confer (414))D = δ(−V DV 1 2 )1T + 1δ(−V DV 1 2 )T − 2(−V DV 1 2 ) (358)


174 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX

Figure 4.6: Map of United States of America showing some state boundaries and the Great Lakes. All plots made using 5020 connected points. Any difference in scale in (a) through (d) is an artifact of plotting routine.
(a) shows original map made from decimated (latitude, longitude) data.
(b) shows original map data rotated to highlight curvature of Earth.
(c) Map isometrically reconstructed from the EDM.
(d) Same reconstructed map illustrating curvature.
(e)(f) Two views of one isotonic reconstruction; problem (506) with no sort constraint Ξd (and no removal of hidden lines).


4.11. RECONSTRUCTION EXAMPLES 1754.11 Reconstruction examples4.11.1 Isometric reconstruction4.11.1.0.1 Example. Map of the USA.The most fundamental application of EDMs is to reconstruct relative pointposition given only inter-point distance information. Drawing a map of theUnited States is a good illustration of isometric reconstruction from completedistance data. We obtained latitude and longitude information for the coast,border, states, and Great Lakes from the usalo atlas data file within theMATLAB Mapping Toolbox; the conversion to Cartesian coordinates (x,y,z)via:φ ∆ = π/2 − latitudeθ ∆ = longitudex = sin(φ) cos(θ)y = sin(φ) sin(θ)z = cos(φ)(497)We used 64% of the available map data to calculate EDM D from N = 5020points. The original (decimated) data and its isometric reconstruction areshown in Figure 4.6(a)-(d). The MATLAB code is in §F.3.1. The eigenvaluescomputed for (486) areλ(−V T NDV N ) = {199.8, 152.3, 2.465, 0, 0, 0, ...} (498)The 0 eigenvalues have numerical error on the order of 2E-13 ; meaning, theEDM data indicates three dimensions (r = 3) are required for reconstructionto nearly machine precision.4.11.2 Isotonic reconstructionSometimes only comparative information about distance is known (e.g., theEarth is closer to the Moon than it is to the Sun). Suppose, for example,the EDM D for three points is unknown:D = [d ij ] =⎡⎣0 d 12 d 13d 12 0 d 23d 13 d 23 0⎤⎦ ∈ S 3 0 (323)


176 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXbut the comparative data is available:d 12 ≤ d 23 ≤ d 13 (499)With the vectorization d = [d 12 d 13 d 23 ] T ∈ R 3 , we express the comparativedistance relationship as the nondecreasing sorting⎡ ⎤⎡⎤ ⎡ ⎤1 0 0 d 12 d 12Ξd = ⎣ 0 0 1 ⎦⎣d 13⎦ = ⎣ d 23⎦ ∈ K +M (500)0 1 0 d 23 d 13where Ξ is a given permutation matrix expressing the known sorting actionon the entries of unknown EDM D , and K +M is the monotone nonnegativecone (§2.9.2.2.1) with reversed indices,K +M ∆ = {z | 0 ≤ z 1 ≤ z 2 ≤ · · · ≤ z N(N−1)/2 } ⊆ R N(N−1)/2+ (501)where N(N − 1)/2 = 3 for the present example. From the sorted vectorization(500) we create the sort-index matrix⎡ ⎤0 1 2 3 2O = ⎣ 1 2 0 2 2 ⎦ ∈ S 33 2 2 2 0 ∩ R 3×3+ (502)0where, for j ≠ i ,O ij = k 2 | d ij = ( Ξd ) k(503)Replacing EDM data with indices-square of a nondecreasing sorting likethis is, of course, a heuristic we invented. Any process of reconstructionthat leaves comparative distance information intact is called ordinalmultidimensional scaling or isotonic reconstruction. Beyond rotation, reflection,and translation error, (§4.5) list reconstruction by isotonic reconstructionis subject to error in absolute scale (dilation) and distance ratio. YetBorg and Groenen argue: [100, §2.2] reconstruction from complete comparativedistance information for a large number of points is as highly constrainedas reconstruction from an EDM; the larger the number, the better.4.11.2.0.1 Example. Isotonic map of the USA.To test that conjecture, suppose we make a complete sort-index matrixO ∈ S N 0 ∩ R N×N+ for the map of the USA and then substitute O in place of


4.11. RECONSTRUCTION EXAMPLES 177

Figure 4.7: Largest ten eigenvalues λ(−V_N^T O V_N)_j , plotted versus index j , for map of USA, sorted by nonincreasing value. In the code (§F.3.2), we normalize O by (N(N−1)/2)².

EDM D in the reconstruction process of §4.10. Whereas EDM D returned only three significant eigenvalues (498), the sort-index matrix O is generally not an EDM, certainly not an EDM with corresponding affine dimension 3, so returns many more. The eigenvalues, calculated with numerical error 5E-7, are plotted in Figure 4.7:

λ(−V_N^T O V_N) = {880.1, 463.9, 186.1, 46.20, 17.12, 9.625, 8.257, 1.701, 0.7128, 0.6460, ...}   (504)

The extra eigenvalues indicate that affine dimension corresponding to an EDM near O is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM D closest to O in some sense (a problem explored more in §7) while maintaining the known comparative distance relationship; e.g., given permutation matrix Ξ that expresses the known sorting action on the entries d of unknown D ∈ S^N_0 , (52)

d ≜ (1/√2) dvec D = [ d_12  d_13  d_23  d_14  d_24  d_34  · · ·  d_{N−1,N} ]^T ∈ R^{N(N−1)/2}   (505)


178 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXwe can make the sort-index matrix O input to the optimization problemminimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3(506)Ξd ∈ K +MD ∈ EDM Nthat finds the EDM D corresponding to affine dimension not exceeding 3 in√12dvec EDM N ∩ Ξ T K +M closest to O in the sense of Schoenberg (352).The analytical solution to this problem, ignoring the sort constraint(Ξd∈K +M ), is known [73] (§7.1): Only the three largest nonnegative eigenvaluesin (504) need be retained to make (490); the rest are discarded.The reconstruction from the EDM D found in this manner is plotted inFigure 4.6(e)(f) from which it becomes clear that inclusion of the sort constraintis necessary for isotonic reconstruction. Ignoring the sort constraint,apparently, violates it. Yet it is remarkable, in light of the many violations,how much the map reconstructed using only ordinal data so resembles theoriginal map.4.11.2.1 Isotonic solution with sort constraintBecause problems involving rank are generally difficult, we will partition(506) into two problems we know how to solve and alternate their solutionuntil convergence:minimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3D ∈ EDM Nminimize o T (o − Ξd)osubject to o ∈ K +Mo − Ξd ∈ K ∗ +M(a)(b)(507)where K ∗ +M is the dual of the reversed monotone nonnegative cone (501),and where the sort-index matrix O (a given constant in (a)) becomes avectorized variable o in (b) related byo ∆ = 1 √2dvecO ∈ R N(N−1)/2 (508)


4.11. RECONSTRUCTION EXAMPLES 179A closed-form solution to problem (507a) exists. (§7.1) Only the first iterationof (507a) sees the original sort-index matrix O whose entries are nonnegativewhole numbers; id est, O 0 =O (503). Subsequent iterations i takeO i = dvec −1 ( √ 2o i ) ∈ R N×N (509)as input, where O i is a real successor to the sort-index matrix O at the i thiteration, and where o i ∈ R N(N−1)/2 is a new vector variable that solves thei th instance of (507b).The new problem (507b) we devised from the necessary and sufficientconditions in §E.8.1.0.1 for unique projection of Ξd on convex cone K +M .Its objective function o T (o − Ξd) cannot fall below the optimal value 0 by(183) because simplicial K +M is closed and convex. By defining (§2.9.2.2.1)Y ∆ = [e m e m +e m−1 e m + e m−1 +e m−2 · · · 1] ∈ R m×mY †T ∆ = [e m − e m−1 e m−1 −e m−2 e m−2 −e m−3 · · · e 1 ] ∈ R m×m (510)where m ∆ = N(N − 1)/2, we may rewrite (507b) as an equivalent quadraticprogram; a convex optimization problem [1, §4] in terms of halfspacedescriptionsof the cone and its dual:minimize o T (o − Ξd)osubject to Y † o ≽ 0Y T (o − Ξd) ≽ 0(511)4.11.2.2 ConvergenceIn §E.9 we discuss convergence of alternating projection on intersecting convexsets; convergence to a point in their intersection. Here the situation is alittle different because sets of positive semidefinite matrices having an upperbound on rank are generally not convex. Yet in §7.1.3.0.1 we prove (507a) isequivalent to a projection of eigenvalues on a polyhedral cone K 3 M+ (720):minimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3D ∈ EDM N≡minimize ‖Υ − Λ‖ FΥsubject to δ(Υ) ∈ K 3 M+(512)
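Problem (511) is a standard quadratic program. A minimal MATLAB sketch follows, assuming quadprog from the Optimization Toolbox; the vector Xid is a hypothetical input holding the sorted data Ξd , and Y is built directly from its definition (510):

% sketch of the quadratic program (511); Xid is assumed given (it stands for Xi*d)
m    = numel(Xid);                   % m = N(N-1)/2
Y    = flipud(triu(ones(m)));        % Y of (510): columns e_m, e_m+e_{m-1}, ..., 1
Ydag = inv(Y);                       % Y dagger; its rows are the differences listed in (510)
H = 2*eye(m);   f = -Xid;            % o'(o - Xid) = (1/2) o'(2I)o - Xid'o
A = [-Ydag; -Y'];                    % encodes  Ydag*o >= 0  and  Y'*(o - Xid) >= 0
b = [zeros(m,1); -Y'*Xid];
o = quadprog(H, f, A, b);            % optimal o for one instance of (507b)

The first constraint block enforces o ∈ K_+M (nonnegative and nondecreasing entries), the second enforces o − Ξd ∈ K*_+M , exactly as in (511).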


180 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX∆where −VN TDV N =UΥU T ∈ S N−1 and −VN TOV ∆N =QΛQ T ∈ S N−1 are ordereddiagonalizations (§A.5). It so happens: optimal orthogonal U always equalsQ given. An operator T acting on matrix A , as in T(A) = U T AU , is linearand bijective, hence an isomorphism; an isometry because the Frobeniusnorm is orthogonally invariant. (28) This isometric isomorphism T thus mapsa non-convex problem to a convex one that preserves distance. The alternation(507) therefore converges to a point, in the set of all EDMs correspondingto affine dimension not in excess of 3, belonging to √ 12dvec EDM N ∩Ξ T K +M .We have not implemented the second half of the alternation (511) becausememory-demands exceed the capability of our laptop computer.4.11.3 Isotonic reconstruction, alternating projectionPerhaps a computationally simpler approach is to find a feasible point;(Boyd)find D ∈ EDM Nsubject to rankVN TDV N ≤ 3〈A j , D〉 ≤ 0 , j = 1... N(N −1)/2 − 1(513)Pick an arbitrary starting point somewhere in the EDM cone correspondingto affine dimension 3, then alternately project (§E.9) on all the halfspacesspecified by the inequalities and on the subset of the EDM cone correspondingto affine dimension not in excess of 3. The inequalities denote the knownordinal information; e.g., for D ∈ EDM 3 , ordinal datad 12 ≤ d 23 ≤ d 13 (499)would be represented by two inequalities where...⎡ ⎤0 1 −1A 1 = ⎣ 1 0 0 ⎦ 1 2−1 0 0⎡ ⎤0 0 −1A 2 = ⎣ 0 0 1 ⎦ 1 2−1 1 0(514)


4.11. RECONSTRUCTION EXAMPLES 181

4.11.3.0.1 Example... Global positioning system (GPS).
Twenty four satellites orbit the earth constantly transmitting...
http://www.trimble.com/gps/what.html
http://205.236.5.89/~eric/tril.html

Here is a related problem: suppose each of n nodes transmits a pseudo-random code, and hears all others. The clocks are not synchronized, but each one is off by a fixed bias from a common clock. Then each node gets the distances to all other nodes, plus the difference in clock offsets. Can you solve the problem of figuring out the clock offsets?


184 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX

Figure 4.8: The relative-angle inequality tetrahedron (521) bounding EDM^4 is regular; drawn in entirety. Each angle θ (371) must belong to the solid to be realizable.

Generalizing this:

Fifth Euclidean axiom - restatement. [86, §3.1] (confer §4.3.1)
Relative-angle inequality. Augmenting the axioms of the Euclidean metric in R^n , for all i,j,l ≠ k ∈ {1... N}, i < j < l , the relative angles at each position x_k must satisfy (confer (520))

|θ_ikl − θ_lkj| ≤ θ_ikj ≤ θ_ikl + θ_lkj      (521a)
θ_ikl + θ_lkj + θ_ikj ≤ 2π                   (521b)
0 ≤ θ_ikl , θ_lkj , θ_ikj ≤ π


4.12. FIFTH EUCLIDEAN REQUIREMENT 183and −VN TDV N ∈ S 3 , then by the positive (semi)definite principal submatricestheorem (§A.3.1.0.4) it is sufficient to prove all d ij are nonnegative, alltriangle inequalities are satisfied, and det(−VN TDV N) is nonnegative. WhenN = 4, in other words, that nonnegative determinant becomes the fifth andlast Euclidean requirement for D ∈ EDM N . We now endeavor to ascribegeometric meaning to it.4.12.2.1 Nonnegative determinantBy (379) when D ∈EDM 4 , −VN TDV N is equal to the inner product (374),⎡√ √ ⎤d 12 d12 d 13 cosθ 213 d12 √d12 √d 14 cosθ 214Θ T Θ = ⎣ d 13 cosθ 213 d 13 d13 √d12d 14 cosθ 314⎦√(516)d 14 cosθ 214 d13 d 14 cosθ 314 d 14Because Euclidean space is an inner-product space, the more concise innerproductform of the determinant is admitted;det(Θ T Θ) = −d 12 d 13 d 14(cos 2 θ 213 +cos 2 θ 214 +cos 2 θ 314 − 2 cos θ 213 cos θ 214 cos θ 314 − 1 )The determinant is nonnegative if and only if(517)cos θ 214 cos θ 314 − √ sin 2 θ 214 sin 2 θ 314 ≤ cos θ 213 ≤ cos θ 214 cos θ 314 + √ sin 2 θ 214 sin 2 θ 314⇔cos θ 213 cos θ 314 − √ sin 2 θ 213 sin 2 θ 314 ≤ cos θ 214 ≤ cos θ 213 cos θ 314 + √ sin 2 θ 213 sin 2 θ 314⇔cos θ 213 cos θ 214 − √ sin 2 θ 213 sin 2 θ 214 ≤ cos θ 314 ≤ cos θ 213 cos θ 214 + √ sin 2 θ 213 sin 2 θ 214(518)which simplifies, for 0 ≤ θ i1l ,θ l1j ,θ i1j ≤ π and all i≠j ≠l ∈{2, 3, 4}, tocos(θ i1l + θ l1j ) ≤ cos θ i1j ≤ cos(θ i1l − θ l1j ) (519)Analogously to triangle inequality (470), the determinant is 0 upon equalityon either side of (519) which is tight. Inequality (519) can be equivalentlywritten linearly as a “triangle inequality”, but between relative angles[45, §1.4];|θ i1l − θ l1j | ≤ θ i1j ≤ θ i1l + θ l1jθ i1l + θ l1j + θ i1j ≤ 2π(520)0 ≤ θ i1l ,θ l1j ,θ i1j ≤ π


184 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXθ 213θ 214θ 314Figure 4.8: The relative-angle inequality tetrahedron (521) bounding EDM 4is regular; drawn in entirety. Each angle θ (371) must belong to the solid tobe realizable.πGeneralizing this:Fifth Euclidean axiom - restatement. [86, §3.1] (confer §4.3.1)Relative-angle inequality. Augmenting the axioms of the Euclidean metricin R n , for all i,j,l ≠ k ∈{1... N}, i


4.12. FIFTH EUCLIDEAN REQUIREMENT 185The relative-angle inequality, for this case, is illustrated in Figure 4.8.4.12.2.2 Beyond the fifth axiomWhen the cardinality N exceeds 4, the four axioms of the Euclidean metricand the relative-angle inequality together become insufficient conditionsfor realizability. In other words, the Euclidean axioms and relative-angle inequalityremain necessary but become a sufficient test of only the positivesemidefiniteness of all the principal 3 ×3 submatrices in −V T N DV N . Therelative-angle inequality can be considered a sufficient test of integrity ateach point x k for every purported tetrahedron.When N = 5 in particular, the relative-angle inequality becomes thepenultimate Euclidean requirement while nonnegativity of then unwieldydet(Θ T Θ) corresponds (by the positive (semi)definite principal submatricestheorem in §A.3.1.0.4) to the sixth and last Euclidean requirement, and soon.Yet for all values of N , only assuming nonnegative d ij , the relative-anglematrix inequality in (472) is necessary and sufficient to certify realizability;(§4.4.3.1)Euclidean axiom 1 (§4.2).Angle matrix inequality Ω ≽ 0 (380).⇔ −V T N DV N ≽ 0D ∈ S N 0⇔ D = D(Ω,d) ∈ EDM N(523)Like matrix criteria (329), (352), and (472), the relative-angle matrix inequalityand nonnegativity axiom subsume all the Euclidean axioms and furtherrequirements.4.12.3 Path not followedAn alternate and intuitively appealing way to augment the Euclidean axiomsis to recognize, in the case N = 4, the three-dimensional analogue to triangle& distance is tetrahedron & facet-area, while in the case N = 5 the fourdimensionalanalogue to triangle & distance is polychoron & facet-volume,ad infinitum.4.12.3.1 N = 4Each of the four facets of a general tetrahedron is a triangle and its relative interior.Suppose we identify each facet of the tetrahedron by its area-squared:


186 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXc 1 , c 2 , c 3 , c 4 . Then analogous to axiom 4, we may write a tight 4.31 areainequality for the facets√ci ≤ √ c j + √ c k + √ c l , i≠j ≠k≠l∈{1, 2, 3, 4} (524)which is a generalized “triangle” inequality [38, §1.1] that follows from√ci = √ c j cos ϕ ij + √ c k cos ϕ ik + √ c l cos ϕ il (525)[101] [48, Law of Cosines] where ϕ ij is the dihedral angle at the commonedge between triangular facets i and j .If D is the EDM corresponding to the whole tetrahedron, then areasquaredof the i th triangular facet has a convenient formula in terms ofD i ∈ EDM N−1 the EDM corresponding to that particular facet: From theCayley-Menger determinant 4.32 for simplices, [48] [102] [84, §4] [22, §3.3] thei th facet area-squared for i∈{1... N} is (§A.4.2)c i ===[−12 N−2 (N −2)! 2 det −Di 11 T 0(−1) N2 N−2 (N −2)! 2 detD i](526)(1 T D −1i 1 ) (527)(−1) N2 N−2 (N −2)! 2 1T cof(D i ) T 1 (528)where D i is the i th principal N−1×N−1 submatrix 4.33 of D ∈ EDM N , andcof(D i ) is the N − 1 × N − 1 matrix of cofactors 4.34 [26, §4] correspondingto D i . The number of principal 3 × 3 submatrices in D is, of course, equalto the number of triangular facets in the tetrahedron; four (N!/(3!(N −3)!))when N = 4.The triangle inequality (axiom 4) and area inequality (524) are conditionsnecessary for D to be an EDM; we do not prove their sufficiency inconjunction with the remaining three axioms.4.31 The upper bound is met when all angles in (525) are simultaneously 0; that occurs,for example, if one point is relatively interior to the convex hull of the three remaining.4.32 whose foremost characteristic is: the determinant vanishes if and[ only if affine ] dimensiondoes not equal or exceed penultimate cardinality; id est, det −D 11 T = 0 ⇔0r < N −1, where D is any EDM (§4.7.3.0.1). Otherwise, the determinant is negative.4.33 Every principal submatrix of an EDM remains an EDM. [86, §4.1]4.34 Hence c i is a linear function of the entries of D i ; even when D i is singular.


4.12. FIFTH EUCLIDEAN REQUIREMENT 1874.12.3.2 N = 5Moving to the next level, we might encounter a Euclidean object calledpolychoron, a bounded polyhedron in four dimensions. 4.35 The polychoronhas five (N!/(4!(N −4)!)) facets, each of them a general tetrahedron whosevolume-squared c i is calculated using the same formula; (526) where D is theEDM corresponding to the polychoron, and D i is the EDM corresponding tothe i th facet (the principal 4×4 submatrix of D ∈ EDM N corresponding to thei th tetrahedron). The analogue to triangle & distance is now polychoron &facet-volume. We could then write another generalized “triangle” inequalitylike (524) but in terms of facet volume; [103, §IV]√ci ≤ √ c j + √ c k + √ c l + √ c m , i≠j ≠k≠l≠m∈{1... 5} (529)Now, for N = 5, the triangle (distance) inequality (§4.2) and the areainequality (524) and the volume inequality (529) are conditions necessary forD to be an EDM. We do not prove their sufficiency.4.12.3.3 Volume of simplicesThere is no known formula for the volume of a bounded convex polyhedronexpressed either by halfspace or vertex-description. [4, §2.1] [104, p.173] [105][75] [76] Volume is a concept germane to R 3 ; in higher dimensions it iscalled content. Applying the EDM assertion (§4.9.1) and a result givenin [1, §8.3.1], a general nonempty simplex (§2.7.3) in R N−1 corresponding toan EDM D ∈ S N 0 has content√ c = content(S) det 1/2 (−V T NDV N ) (530)where the content-squared of the unit simplex S ⊂ R N−1 is proportional toits Cayley-Menger determinant;[ ]content(S) 2 −1 −D([0=2 N−1 (N −1)! 2 det e1 e 2 · · · e N−1 ]) 11 T 0(531)where e i ∈ R N−1 and the EDM definition used is D(X) (333).4.35 The simplest polychoron is called a pentatope [48]; a regular simplex hence convex.(A pentahedron is a three-dimensional object having five vertices.)


188 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX.Figure 4.9: The length of one-dimensional face a equals the height h=a=1of this non-simplicial pyramid in R 3 with square base inscribed in a circle ofradius R centered at the origin. [48, Pyramid]Example. Pyramid. A formula for volume of a pyramid is known; 4.36it is 1/3 the product of its base area with its height. [57] The pyramidin Figure 4.9 has volume 1/3. To find its volume using EDMs, we mustfirst decompose the pyramid into simplicial parts. Slicing it along the planecontaining the line segments corresponding to radius R and height h we findthe vertices of one simplex,⎡X = ⎣1/2 1/2 −1/2 01/2 −1/2 −1/2 00 0 0 1⎤⎦ ∈ R n×N (532)where N = n+1 for any nonempty simplex in R n . The volume of this simplexis half that of the entire pyramid; id est, √ c =1/6 found by evaluating (530).With that, we conclude the digression of path.4.36 Pyramid volume is independent of the paramount vertex position as long as its heightremains constant.
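As a postscript to the pyramid example, the volume computation is easy to check numerically. A short MATLAB sketch follows; it uses the standard Cayley-Menger relation for a tetrahedron (288·volume² = determinant of the bordered distance matrix) rather than the general formula (530), and compares against a direct determinant computation:

% numerical check of the pyramid example: volume of the simplex (532)
X = [ 1/2  1/2 -1/2  0 ;
      1/2 -1/2 -1/2  0 ;
       0    0    0   1 ];                       % vertex list (532), one vertex per column
N = size(X,2);
G = X'*X;  dg = diag(G);
D = dg*ones(1,N) + ones(N,1)*dg' - 2*G;         % EDM of the simplex
CM = [0 ones(1,N); ones(N,1) D];                % bordered Cayley-Menger matrix
vol_CM     = sqrt(det(CM)/288);                 % standard relation: 288*vol^2 = det(CM)
vol_direct = abs(det(X(:,2:4) - X(:,1)*ones(1,3)))/factorial(3);
disp([vol_CM vol_direct])                       % both equal 1/6

Doubling, the whole pyramid of Figure 4.9 has volume 1/3 as claimed.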


4.13. EDM-ENTRY COMPOSITION 1894.12.4 Affine dimension reduction in three dimensions(confer §4.8.4) The determinant of any M×M matrix is equal to the productof its M eigenvalues. [26, §5.1] When N = 4 and det(Θ T Θ) is 0, that meansone or more eigenvalues of Θ T Θ ∈ R 3×3 are 0. The determinant will go to0 whenever equality is attained on either side of (328), (521a), or (521b),meaning that a tetrahedron has collapsed to a lower affine dimension; id est,r = rank Θ T Θ = rank Θ is reduced below N −1 exactly by the number of 0eigenvalues (§4.7.2).Therefore, in solving completion problems of any size N where one ormore entries of an EDM are unknown, the dimension r of the affine hullrequired to contain the unknown points is potentially reduced by selectingdistances to attain equality in (328) or (521a) or (521b).4.12.4.1 Exemplum reduxWe now apply the fifth Euclidean axiom to an earlier problem:Example. Small completion problem, III. Returning again to the Exampleon page 132 that pertains to Figure 4.2 where N = 4 (confer §4.9.3),d 14 is ascertainable from the fifth Euclidean metric requirement. Because alldistances in (326) are known except √ d 14 , cos θ 123 =0 and θ 324 = 0 resultfrom identity (371). Applying (328),cos(θ 123 + θ 324 ) ≤ cos θ 124 ≤ cos(θ 123 − θ 324 )0 ≤ cos θ 124 ≤ 0(533)It follows from (371) that d 14 can only be 2. Because equality is attained in(533), the affine dimension r cannot exceed N −2, as explained. 4.13 EDM-entry compositionResults of Schoenberg from 1938 can be used to show that certain nonlinearcompositions of individual EDM entries yield more EDMs; Laurent [86, §2.3]claims, in particular,D ∈ EDM N ⇔ 1 − e −αD ∆ = [1 − e −αd ij] ∈ EDM N ∀α > 0 (534)
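Claim (534) is easy to probe numerically. A small MATLAB sketch follows (random data, arbitrary values of α , and V_N again taken as [−1^T ; I ]/√2); it checks that the entrywise composition of a randomly generated EDM passes the Schoenberg test (352):

% numerical probe of composition (534): 1 - exp(-alpha*D) should remain an EDM
n = 3;  N = 6;
X  = randn(n,N);  G = X'*X;  dg = diag(G);
D  = dg*ones(1,N) + ones(N,1)*dg' - 2*G;          % an EDM by construction (333)
Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
for alpha = [0.1 1 10]
    C = 1 - exp(-alpha*D);                        % entrywise composition (534)
    fprintf('alpha = %4.1f   min eig = %g\n', alpha, min(eig(-Vn'*C*Vn)));  % >= 0 up to roundoff
end

Replacing C with D.^(1/alpha) for α ≥ 1 probes (535) in the same way.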


190 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX

Figure 4.10: Some entry-wise EDM compositions d_ij^{1/α} , log_2(1 + d_ij^{1/α}) , 1 − e^{−α d_ij} : (a) α = 2. Concave nondecreasing in d_ij . (b) Trajectory convergence in α for d_ij = 2.


4.14. EDM INDEFINITENESS 191Schoenberg’s results suggest that certain finite positive roots of EDM entriesproduce another EDM; [106, §6, thm.5] (confer [38, p.108-109]) specifically,D ∈ EDM N ⇔ D 1/α ∆ = [d 1/αij ] ∈ EDM N ∀α > 1 (535)The special case α = 2 is of interest because it corresponds to absolutedistance.Assuming that points constituting a corresponding list X are distinct(453) and because D ∈ S N 0 , then it follows:limα→∞ D1/α = lim 1 − e −αD = −E = ∆ 11 T − I (536)α→∞Negative elementary matrix −E (§B.3) is interior to the EDM cone (§5.4)and terminal to the respective trajectories (534) and (535) as functions ofα . Both trajectories never leave the EDM cone; in engineering terms, theEDM cone is an invariant set [107] to either trajectory. Further, if D is notan EDM but for some particular α p it becomes an EDM, then for all greatervalues of α it will remain an EDM.These preliminary findings lead one to speculate whether any concavenondecreasing composition of individual EDM entries d ij on R + will produceanother EDM; e.g., empirical evidence suggests that given EDM D , foreach fixed α ≥ 1 [sic] the composition [log 2 (1 + d 1/αij )] is also an EDM.Figure 4.10 illustrates its concavity in d ij together with functions (534) and(535).4.14 EDM indefinitenessBy the known result in §A.7.1 regarding a 0-valued entry on the main diagonalof a symmetric positive semidefinite matrix, there can be no positivenor negative semidefinite EDM except the 0 matrix because EDM N ⊆ S N 0(332) andS N 0 ∩ S N + = 0 (537)the origin. So when D ∈ EDM N , there can be no factorization D =A T Anor −D =A T A . [26, §6.3] Hence the eigenvalues of an EDM are neither allnonnegative or all nonpositive; an EDM is indefinite and possibly invertible.


192 CHAPTER 4. EUCLIDEAN DISTANCE MATRIX4.14.1 EDM eigenvalues, congruence transformationFor any symmetric −D , we can characterize its eigenvalues by congruencetransformation: [26, §6.3][ ]VT− W T NDW = − D [ V N1 T1 ] = −[VTN DV N V T N D11 T DV N 1 T D1](538)Becauseis full-rank, then (891)W ∆ = [V N 1 ] ∈ R N×N (539)inertia(−D) = inertia ( −W T DW ) (540)the congruence (538) has the same number of positive, zero, and negativeeigenvalues as −D . Further, if we denote by σ i , i=1... N −1 , the eigenvaluesof −V T N DV N , and the eigenvalues of the congruence −W T DW byζ i , i=1... N , and if we arrange each respective set of eigenvalues in nonincreasingorder, then by theory of interlacing eigenvalues for bordered symmetricmatrices, [28, §4.3] [26, §6.4] [99, §IV.4.1]ζ N ≤ σ N−1 ≤ ζ N−1 ≤ σ N−2 ≤ · · · ≤ σ 2 ≤ ζ 2 ≤ σ 1 ≤ ζ 1 (541)When D ∈ EDM N , σ i ≥ 0 for all i (845) because −V T N DV N ≽ 0 as we nowknow. That means the congruence must have N −1 nonnegative eigenvalues;ζ i ≥ 0, i=1... N −1. The remaining eigenvalue ζ N cannot be nonnegativebecause then −D would be positive semidefinite, an impossibility; so ζ N < 0.By congruence, nontrivial −D must therefore have one and only one negativeeigenvalue; 4.37 [50, §2.4.5]D ∈ EDM N ⇒⎧⎪⎨⎪⎩λ(−D) i ≥ 0,N∑λ(−D) i = 0i=1D ∈ S N 0i=1... N −1(542)4.37 All the entries of the corresponding eigenvector must have the same sign with respectto each other because that eigenvector is the Perron vector corresponding to the spectralradius; [28, §8.2.6] the predominant characteristic of negative [sic] matrices.


4.14. EDM INDEFINITENESS 193where the λ(−D) i are eigenvalues of −D whose sum must be 0 only becausetrD = 0. [26, §5.1] Yet this condition is insufficient to determine whethersome given H ∈ S N 0 is an EDM, as shown by counter-example. 4.384.14.2 Spectral coneHere we draw heavily from Chapter 2 with regard to polyhedral cones andtheir duals.[ ] −D 1Denoting the eigenvalues of1 T ∈ S N+1 by0([ ]) −D 1λ1 T ∈ R N+1 (543)0we have the Cayley-Menger form (confer §4.12.3.1) of necessary and sufficientconditions for D ∈ EDM N from the literature; [97, §3] 4.39 [108, §3](confer (352) (329))⎧ ([ ]) −D 1λ1⎪⎨T ≥ 0, i = 1... N0iD ∈ EDM N ([ ])⇔T−D 1(544)λ1 T 1 = 00⎪⎩D ∈ S N 0([ ]) −D 1When D is an EDM, the eigenvalues λ1 T belong to the particularorthant in R N+1 having the N + 1 th coordinate as the sole negative0coordinate; 4.40 R N+1N+1- = ∆ cone {e 1 , e 2 , · · · e N , −e N+1 } (545)4.38 When N = 3, for example, the symmetric hollow matrix⎡ ⎤0 1 1H = ⎣ 1 0 5 ⎦ ∈ S N 01 5 0is not an EDM, although λ(−H) = {5, 0.3723, −5.3723} conforms to (542).4.39 Recall that for D ∈ S N 0 , −V T N DV N ≽ 0 subsumes nonnegativity axiom 1 (§4.8.1).4.40 We observe, empirically, all except one entry of the corresponding eigenvector havethe same sign with respect to each other.
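The counter-example of footnote 4.38 can be confirmed in a few lines of MATLAB (with V_N written out explicitly for N = 3):

% footnote 4.38: H satisfies the eigenvalue conditions (542) yet is not an EDM
H  = [0 1 1; 1 0 5; 1 5 0];
eig(-H)                              % approx {5, 0.372, -5.372}: conforms to (542)
Vn = [-1 -1; 1 0; 0 1]/sqrt(2);
eig(-Vn'*H*Vn)                       % one eigenvalue is negative, so -Vn'*H*Vn is not PSD
                                     % and H fails the Schoenberg test (352); indeed
                                     % sqrt(5) > sqrt(1)+sqrt(1) violates the triangle inequality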


194 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXWhen the eigenvalues are ordered, the constraints [ (544) ] specify membership−EDM N 1to a pointed polyhedral spectral cone for1 T :0whereK λ = {ζ ∈ R N+1 | ζ 1 ≥ ζ 2 ≥ · · · ≥ ζ N ≥ 0 ≥ ζ N+1 , 1 T ζ = 0}= K M ∩ R N+1N+1-([ ∩ ∂H]) (546)−EDM N 1= λ1 T 0∂H = {ζ ∈ R N+1 | 1 T ζ = 0} (547)is a hyperplane through the origin, and K M is the monotone cone (§2.9.2.2.2)which has nonempty interior but is not pointed;K M ∆ = {ζ ∈ R N+1 | ζ 1 ≥ ζ 2 ≥ · · · ≥ ζ N+1 } (251)So,K ∗ M = { [e 1 −e 2 e 2 −e 3 · · · e N −e N+1 ]a | a ≽ 0 } ⊂ R N+1 (252)indicating K λ has empty interior. Defining⎡⎡e T 1 − e T ⎤2A =∆ ⎢e T 2 − e T 3 ⎥⎣ . ⎦ ∈ RN×N+1 , B ∆ =⎢e T N − ⎣eT N+1we have the halfspace-description:dim aff K λ = dim ∂H = N (548)e T 1e T2 .e T N−e T N+1⎤⎥⎦ ∈ RN+1×N+1 (549)K λ ∆ = {ζ ∈ R N+1 | Aζ ≽ 0, Bζ ≽ 0, 1 T ζ = 0} (550)From (550) and (259) we get a vertex-description for the pointed spectralcone having empty interior:{ ([ ] ) †}ÂK λ = Ṽ N Ṽ N b | b ≽ 0(551)ˆB


4.14. EDM INDEFINITENESS 195where ṼN ∈ R N+1×N and where ˆB = e T N ∈ R1×N+1 and⎡e T 1 − e T ⎤2Â = ⎢e T 2 − e T 3 ⎥⎣ . ⎦ ∈ RN−1×N+1 (552)e T N−1 − eT Nhold [ those ] rows of A and B corresponding to conically independent rowsAin ṼB N . The vertex-description of the non-pointed dual spectral conehaving nonempty interior is, (179)K ∗ λ = K∗ M + RN+1∗ N+1- + ∂H∗ ⊆ R N+1= {[ A T B T 1 −1 ] b | b ≽ 0 } =From (551) and (260) we get a halfspace-description:{ }[ÂT ˆBT 1 −1]a | a ≽ 0(553)K ∗ λ = {y ∈ R N+1 | (Ṽ T N [ÂT ˆBT ]) † Ṽ T N y ≽ 0} (554)The polyhedral dual spectral cone K ∗ λ is closed, convex, has nonempty interiorbecause K λ is pointed, but is not pointed because K λ has empty interior.For eigenvalues arranged in nonincreasing order, (544) can be restated⎧ ([ ])⎪⎨ −D 1λD ∈ EDM N ⇔ 1 T ∈ K0 λ(555)⎪⎩D ∈ S N 04.14.3 Eigenvalues of −V DV versus −V † N DV NSuppose for D ∈ EDM N we are given eigenvectors v i ∈ R N of −V DV andcorresponding eigenvalues λ∈ R N so that− V DV v i = λ i v i , i = 1... N (556)where V = V N V † N = V W V W T ∈ SN is the auxiliary geometric centering matrix(§B.4). From these we can determine the eigenvectors and eigenvalues of−V † N DV N and −VW T DV W : Defineν i ∆ = V † N v i , λ i ≠ 0 (557)


196 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXThen we have− V DV N V † N v i = λ i v i (558)−V † N V DV N ν i = λ i V † N v i (559)−V † N DV N ν i = λ i ν i (560)the eigenvectors of −V † N DV N are given by (557) while its correspondingnonzero eigenvalues are identical to those of −V DV although −V † N DV N isnot necessarily positive semidefinite.Similarly we can show the eigenvectors of −VW T DV W are VW T v i correspondingto its nonzero eigenvalues which are identical to those of −V DV .4.15 DFTThe discrete Fourier transform (DFT) is a staple of the digital signal processingcommunity. [109] In essence, the DFT is a correlation of a rectangularlywindowed sequence (or discrete signal) with exponentials whose frequenciesare equispaced on the unit circle. 4.41 The DFT of the sequencef = ∆ [f i , i=0... n−1]∈ C n is, in traditional form 4.42 for k ∈{0... n − 1},∑n−1F k = f i e −j i2πk/n (561)i=0where j ∆ = √ −1 . Sequence F ∆ = [F k , k=0... n−1]∈ C n is considered a frequencyspectral analysis of the signal sequence f ; id est, the F k are amplitudesof exponentials which, when combined, give back the original sequence,f i = 1 ∑n−1F k e j i2πk/n (562)nk=0Index k in F k references the discrete frequencies 2πk/n of the exponentiale j i2πk/n in the synthesis equation (562).The matrix form of the DFT is writtenF = W H f (563)4.41 on the unit circle in the z plane; z = e sT where s = σ +jω is the traditional Laplacefrequency, ω is the Fourier frequency in radians, and T is the sample period in seconds.4.42 The convention is: lowercase for the sequence and uppercase for its transform.


4.15. DFT 197where the DFT matrix is 4.43⎡1 1 1 · · · 11 e j2π/n e j4π/n · · · e j(n−1)2π/nW =∆ 1 e j4π/n e j8π/n · · · e j(n−1)4π/n⎢ 1 e j6π/n e j12π/n · · · e j(n−1)6π/n⎣ . . . · · · .1 e j(n−1)2π/n e j(n−1)4π/n · · · e j(n−1)2 2π/n⎤∈ C n×n (564)⎥⎦characterized byW T = W (565)W −1 = 1 n W H (566)where superscript H denotes conjugate transpose. Similarly, the IDFT isf = 1 WF (567)nDirect computation of (563) or (567) would require O(n 2 ) operations for largen . <strong>Optimization</strong> of computational intensity for the DFT culminated in thedevelopment of the fast Fourier transform (FFT) algorithm whose intensityis O(n log 2 (n)) , as is well known. [110]4.15.1 DFT via EDMThere is a relationship between EDMs and the DFT. The norm of the differenceof any two complex vectors x,y ∈ C n is‖x − y‖ 2 = (x − y) H (x − y) = ‖Re(x − y)‖ 2 + ‖Im(x − y)‖ 2 (568)Consider applying this Euclidean metric to a list comprising the Fourierexponentials and the signal sequence; id est, to {x l ∈ C n , l=1... N} wherex l = [ e −j i2π(l−1)/n , i=0... n−1 ] ∈ C n , l ∈ {1... N −1}x N = [f i , i=0... n−1] ∈ C n (569)whereN = n + 1 (570)4.43 conjugate to that of some other authors.
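Before relating the DFT to EDMs, the definitions (561)-(567) can be checked numerically; a short MATLAB sketch (parameters arbitrary) compares the matrix form against the built-in fft:

% sketch of the DFT matrix (564) and its basic identities
n = 8;
k = 0:n-1;
W = exp(1j*2*pi*(k')*k/n);      % DFT matrix (564)
f = randn(n,1);
F = W'*f;                       % (563): F = W^H f
norm(F - fft(f))                % ~0 : agrees with the FFT of the same sequence (561)
norm(W*F/n - f)                 % ~0 : IDFT (567)
norm(W.' - W, 'fro')            % = 0 : W is symmetric (565)
norm(inv(W) - W'/n, 'fro')      % ~0 : (566)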


198 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXThe EDMD ∆ = [ ‖x l − x m ‖ 2 , l,m=1... N × 1... N ] (571)is then constituted, for l,m=1... N −1 × 1... N −1, by‖Re(x l − x m )‖ 2 = n−1 ∑(cos(i2π(l−1)/n) − cos(i2π(m−1)/n)) 2i=0‖Im(x l − x m )‖ 2 = n−1 ∑(572)(sin(i2π(l−1)/n) − sin(i2π(m−1)/n)) 2i=0()∑n−1d lm = 2 n − cos(2i(l − m)π/n) =i=0{ 2n, l ≠ m0, otherwisewhile the last row is constituted, for m=1... N −1, by‖Re(x N − x m )‖ 2 = n−1 ∑i=0= n−1 ∑i=0n−1= 14n‖Im(x N − x m )‖ 2 = n−1 ∑i=0= n−1 ∑(Ref i − cos(i2π(m−1)/n)) 2(573)(Ref i ) 2 − 2Ref i cos(i2π(m−1)/n) + cos 2 (i2π(m−1)/n)∑k=0i=0n−1= 14n|F k + Fn−k ∗ |2 − Re(F m−1 + Fn−m+1) ∗ + n−1 ∑(Imf i + sin(i2π(m−1)/n)) 2i=0cos 2 (i2π(m−1)/n)(574)(Imf i ) 2 + 2Imf i sin(i2π(m−1)/n) + sin 2 (i2π(m−1)/n)∑k=0k=0|F k − Fn−k ∗ |2 + Re(F m−1 − Fn−m+1) ∗ + n−1 ∑i=0sin 2 (i2π(m−1)/n)(575)d Nm = 1 ∑n−1|F4n k + Fn−k| ∗ 2 + |F k − Fn−k| ∗ 2 − 2ReF n−m+1 + n (576)where F n ∆ =F 0 due to transform periodicity, and where Parseval’s relation 4.44[111] [43] is substituted as well as sine and cosine transforms [112] of real andimaginary signals 4.45 [109, §8.8]. Thus exists the relationship between EDM4.44 The Fourier summation ∑ |F k | 2 /n is equivalent to ∑ |f i | 2 .4.45 The complex conjugate of F is indicated by F ∗ . Some physical systems, such as MagneticResonance Imaging (MRI) devices [113] [114], naturally produce signals originatingin the Fourier domain.


4.15. DFT 199and DFT. Only the last row and column of the EDM depends on the sequencef itself. The remaining entries (573) depend only on sequence length n .4.15.1.0.1 Example. IDFT. To relate the DFT to its associated EDMD in a useful way, consider finding the inverse discrete Fourier transform(IDFT). Assuming the signal is real, then we have[](11 T − I)2n [d Nm , m=1...N−1]D =∈ EDM N (577)[d Nm , m=1...N−1] T 0whered Nm = 1 ∑n−1|F k | 2 − 2ReF m−1 + n (578)nk=0Assume n=4, N = 5, and we are given the DFT F =[4 0 0 0 ] T . Then⎡ ⎤0 8 8 8 08 0 8 8 8D =⎢ 8 8 0 8 8⎥(579)⎣ 8 8 8 0 8 ⎦0 8 8 8 0We seek to recover the signal vector x 5 =f . Cholesky factorization (§F.1.1.5)Θ T Θ=−V ∆ N TDV N ∈ R N−1×N−1 yields (377);⎡2 √ 2 √ √ ⎤√2√ 2 0Θ = ⎢ 0 6 2/3 0⎣ 0 0 4/ √ ⎥3 0 ⎦ = ∆ [ ˜x 2 − ˜x 1 ˜x 3 − ˜x 1 ˜x 4 − ˜x 1 ˜x 5 − ˜x 1 ]0 0 0 0(580)In this circumstance, x 1 through x 4 are always known from (569); id est, wealways know, for any length-4 DFT,⎡⎤0 0 0[ x 2 − x 1 x 3 − x 1 x 4 − x 1 ] = ⎢ −1 − j −2 −1 + j⎥⎣ −2 0 −2 ⎦ (581)−1 + j −2 −1 − jThe idea is to rotate the first three columns of Θ so as to become alignedwith (581) using the Procrustes technique in §C.2, and to then apply that


200 CHAPTER 4. EUCLIDEAN DISTANCE MATRIXsame rotation to the last column of Θ . Having found the optimal rotation,all that is left is to add x 1 to the rotated fourth column of Θ to recover x 5 ;in this particular case, rotation of the fourth column is superfluous becauseit is 0. Matrix Θ indicates x 5 = x 1 = [ 1 1 1 1 ] T which is, in fact, theIDFT.Finding an inverse DFT via Cholesky factorization may appear to involvemore computation than the FFT. Yet for D as in (577), we always have theblock matrix⎡1(d ⎤2 1N − d 2N )+ n1−VNDV T (d 2 1N − d 3N )+ nN =(11⎢T 1+ I)n (d 2 1N − d 4N )+ n⎥⎣. ⎦1(d 2 1N − d 2N )+ n 1(d 2 1N − d 3N )+ n 1(d 2 1N − d 4N )+ n · · · d 1N(582)whose first principal submatrix in R n−1×n−1 is fixed whenever n is. Wespeculate this structure reduces normally cubic computational intensity toat most O(n 2 ).4.16 Self similarity...The self similarity function A(l) is related to the autocorrelation functionR(l); in fact, self-similarity is the progenitor of autocorrelation. Whenn→∞ , A(l) = κ − 2R(l) where, for some constant κ . Both functionsare used to determine periodicity.Given the length-M real sequence [f i , i=0... M −1] ∈ R M , we definethe self-similarity function;where n≤M is the window length.A(l) = ∆ 1 ∑n−1(f i − f i−l ) 2 (583)2i=0x i+1 ∆ = f(i) ∈ R (584)d ij ={ (xi − x j ) 2 , 1 ≤ i,j ≤ M0 otherwise(585)


4.16. SELF SIMILARITY... 201A(l) = 1 2n∑d i,i−l (586)which is a sum of some diagonal of the EDM D. To select the ij th entry ofmatrix ∆ ,i=1√dij ∆ = e T i∆e j = e T j∆e i = tr(e j e T i ∆) = 〈e i e T j , ∆〉 (587)where here,D ∆ = ∆ ◦ ∆ (588)and where ◦ denotes the Hadamard (entry-wise) product. Each scaled entry√2dij = √ 2 tr(e j e T i ∆) = 1 √2tr ( (e i e T j + e j e T i )∆ ) (589)is a coefficient of orthogonal projection of ∆ (§E.6.4) on the range of a vectorized(§2.1.1) member of the orthonormal basis for the vector space S M 0 :E ij = √ 1 ( )ei e T j + e j e T i , 1 ≤ i < j ≤ M (53)2The self-similarity function is therefore equivalent to the Parseval relation,[43] [109] [110] [111] giving a total energy of projection:A(l) =n∑tr(e i e T i−lD) = tri=1i > ln∑e i e T i−l D (590)i=1i > l
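A small MATLAB sketch illustrates the equivalence of (583) and the diagonal sum (586); here the window is taken equal to the full sequence length, and indices are shifted to MATLAB's 1-based convention:

% self-similarity A(l): direct definition (583) versus EDM diagonal sum (586)
M = 64;  f = randn(M,1);            % real sequence; window n = M assumed
x = f;                              % x_(i+1) = f(i) per (584), here simply x = f
D = (x - x.').^2;                   % d_ij = (x_i - x_j)^2  (585)
l = 5;
A_direct = 0.5*sum((f(1+l:M) - f(1:M-l)).^2);   % (583)
A_edm    = 0.5*sum(diag(D,-l));                 % (586): sum along the l-th subdiagonal of D
disp([A_direct A_edm])                          % identical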




Chapter 5EDM coneFor N > 3, the cone of EDMs is no longer a circular cone andthe geometry becomes complicated...−Hayden, Wells, Liu, & Tarazaga (1991) [49, §3]From (537), we know the EDM cone does not intersect the positive semidefinite(PSD) cone S N + except at the origin, their only vertex; there can beno positive nor negative semidefinite EDM.EDM N ∩ S N + = 0 (591)Even so, the two cones can be related. In §7.3.1.1.1 we prove the resemblanceto EDM definition (333):EDM N = S N 0 ∩ ( )S N⊥g − S N + (779)where S N 0 is the symmetric hollow subspace and S N⊥g ={u1 T +1u T | u∈ R N }is the orthogonal complement of the geometric center subspace (§E.7.1.0.2).5.0 c○ 2001 Jon Dattorro, all rights reserved.203


204 CHAPTER 5. EDM CONEFor now we invoke the popular matrix criterion (356) to illustrate thetraditional correspondence between the EDM and PSD cones, both belongingto the same ambient space of symmetric matrices:D ∈ EDM N⇔{−V DV ∈ SN+D ∈ S N 0(356)The set of all EDMs of dimension N×N forms a closed convex cone EDM Nin S N 0 , a subspace of S N , because any pair of EDMs satisfies definition (111);videlicet, for all ζ 1 , ζ 2 ≥ 0, (§A.3.1.0.2)ζ 1 V D 1 V + ζ 2 V D 2 V ≽ 0ζ 1 D 1 + ζ 2 D 2 ∈ S N 0⇐ V D 1V ≽ 0, V D 2 V ≽ 0D 1 ∈ S N 0 , D 2 ∈ S N 0(592)5.0.0.0.1 Definition. Cone of Euclidean distance matrices.(confer (402)) In the subspace of symmetric hollow matrices S N 0 (45), the setof all Euclidean distance matrices EDM N forms a unique immutable pointedclosed convex cone called the EDM cone. For N > 0 (§4.8.3),EDM N = ∆ { }D ∈ S N 0 | −V DV ∈ S N += ⋂ {D ∈ S N | 〈zz T , −D〉 ≥ 0, 〈e i e T i , D〉 = 0 } (593)z∈N(1 T )i=1...Nthe EDM cone in isomorphic R N 2 is the intersection in variable D = [d ij ]of an infinite number of halfspaces about the origin and a finite numberof hyperplanes through the origin. Hence EDM N has empty interior withrespect to S N .△The EDM cone is more easily visualized in the isomorphic vector subspaceR N(N−1)/2 . In the case N = 1, the EDM cone is the origin in R 0 . Inthe case N = 2, the EDM cone is the nonnegative real line in R illustratedin Figure 5.4. In the case N = 3 points, the Euclidean axioms are necessaryand sufficient tests to certify realizability of triangles; (515). Hencethe triangle inequality (axiom 4) describes three halfspaces (459) whose intersectionmakes a polyhedral cone of realizable √ d ij in R 3 ; an isomorphicsubspace representation of EDM 3 in the natural coordinates illustrated inFigure 5.1(b).
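Definition (593) translates directly into a numerical membership test. A minimal MATLAB sketch follows (tolerance arbitrary; V is the geometric centering matrix I − 11^T/N of §B.4.1):

% sketch of a numerical EDM-cone membership test per (356)/(593)
function tf = isEDM(D, tol)
    if nargin < 2, tol = 1e-10; end
    N = size(D,1);
    V = eye(N) - ones(N)/N;                     % geometric centering matrix
    tf = norm(D - D','fro') <= tol ...          % symmetric
      && norm(diag(D))      <= tol ...          % hollow: D in S^N_0
      && min(eig(-V*D*V))   >= -tol;            % -VDV positive semidefinite
end

For example, isEDM([0 1 4; 1 0 1; 4 1 0]) returns true (three collinear points 0, 1, 2).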


205

Figure 5.1: Relative boundary (tiled) of cone of Euclidean distance matrices EDM^N (N = 3) drawn truncated in isomorphic subspace R^3 ; id est, dvec ∂EDM^3 . (a) EDM cone drawn in usual distance-square coordinates d_ij . View is from inside looking toward origin. (Imagine it continuing out of page.) Unlike positive semidefinite cone, EDM cone is neither self-dual nor proper in S^N ; dual EDM cone for this example belongs to ambient space of symmetric matrices in isomorphic R^6 . (b) Drawn in its natural coordinates √d_ij (absolute distance), cone remains convex; intersection of three halfspaces (459) whose partial boundaries each contain origin. Cone geometry becomes complicated (non-polyhedral) in higher dimension. [49, §3] (c) Two coordinate systems artificially superimposed. Coordinate transformation from d_ij to √d_ij is apparently a topological contraction. (d) Sitting on its vertex, pointed EDM^3 is a circular cone having axis of revolution dvec(−E) = dvec(11^T − I) ((52), §4.13). Rounded vertex is artifact of plot.


206 CHAPTER 5. EDM CONE5.1 Polyhedral boundsBecause the EDM cone comprises the intersection of an infinite number ofindependent halfspaces, the convex cone of EDMs is non-polyhedral in d ijfor N > 2; [sic] (e.g., Figure 5.1(a)).Albeit relative angles θ ikj (371) are nonlinear functions of the d ij , stillwe found polyhedral relations bounding the set of all EDMs for N = 1, 2, 3, 4:In the case N = 2 we found the EDM cone to be a half-line (the polyhedronillustrated in Figure 5.4), while for N = 3 we have the polyhedral cone inFigure 5.1(b). In the case N = 4, relative-angle inequality (521) togetherwith the four Euclidean axioms are necessary and sufficient tests for realizabilityof tetrahedra. (522) The relative-angle inequality provides a regulartetrahedron in R 3 [sic] bounding angles θ ikj at vertex x k belonging to EDM 4(Figure 4.8, p.184). 5.1Yet if we were to employ the procedure outlined in §4.12.3 for makinggeneralized triangle inequalities, then there we would find all the d ij -transformations necessary for generating polyhedral objects bounding EDMsof any higher dimension; N > 4.5.2 EDM definition in 11 TAny EDM D corresponding to affine dimension r has representation(confer (333))D(V X , y) ∆ = y1 T + 1y T − 2V X V T X + λ N 11T ∈ EDM N (594)where R(V X ∈ R N×r ) ⊆ N(1 T ) and VX TV Xzeros along the main diagonal defines V X ,a diagonal matrix having noλ ∆ = 2‖V X ‖ 2 F , and y ∆ = δ(V X V T X ) − λ2N 1 (595)where y =0 if and only if 1 is an eigenvector of EDM D . [49, §2] Scalar λbecomes an eigenvalue when corresponding eigenvector 1 exists; e.g., when5.1 Still, the axiom-4 triangle inequalities corresponding to each principal 3×3 submatrixof −VN TDV N demand that the corresponding √ d ij belong to a polyhedral cone like thatin Figure 5.1(b).


5.2. EDM DEFINITION IN 11 T 207X = I in EDM definition (333).Formula (594) can be validated by substituting (595); we findD(V X ) ∆ = δ(V X V T X )1 T + 1δ(V X V T X ) T − 2V X V T X ∈ EDM N (596)is simply the standard EDM definition (333) where X T X has been replacedwith the subcompact singular value decomposition (§A.6.2) 5.2V X V T X ≡ V T X T XV (597)Then the inner product VX TV X is an r ×r diagonal matrix Σ of nonzerosingular values. Next we validate eigenvector 1 and eigenvalue λ .(=⇒) Suppose 1 is an eigenvector of EDM D . Then becauseV T X 1 = 0 (598)it followsD1 = δ(V X V T X )1T 1 + 1δ(V X V T X )T 1 = N δ(V X V T X ) + ‖V X ‖ 2 F 1⇒ δ(V X V T X ) ∝ 1 (599)δ(V X VX T)T 1 = Nκ = tr(VX TV X) = ‖V X ‖ 2 F ⇒ δ(V X V X T )=κ1 and y=0.(⇐=) Now suppose δ(V X VX T)= λ 1 ; id est, y=0. Then2ND = λ N 11T − 2V X V T X ∈ EDM N (600)1 is an eigenvector with corresponding eigenvalue λ . ∆5.2 Subcompact SVD: V X VXT =QΣ 1/2 Σ 1/2 Q T ≡V T X T XV . So VXT is not necessarily XV(§4.5.1.0.1), although affine dimension r = rank(VX T ) = rank(XV ). (436)
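Formula (596) is easy to exercise numerically. A minimal MATLAB sketch (arbitrary N and r , with V_X built so that R(V_X) ⊆ N(1^T) and V_X^T V_X diagonal, as the definition requires):

% sketch of EDM construction (596) from V_X with R(V_X) contained in N(1')
N = 5;  r = 2;
Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);             % R(Vn) = N(1')
[U,S,~] = svd(Vn*randn(N-1,r), 'econ');
Vx = U*S;                                          % Vx'*Vx = S^2 is diagonal, R(Vx) in N(1')
dgv = diag(Vx*Vx');
D = dgv*ones(1,N) + ones(N,1)*dgv' - 2*(Vx*Vx');   % (596)
V = eye(N) - ones(N)/N;
min(eig(-V*D*V))                                   % >= 0, and diag(D) = 0 : D is an EDM
rank(-V*D*V, 1e-9)                                 % = r : affine dimension (confer (605))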


208 CHAPTER 5. EDM CONE5.2.1 Range of EDM DFrom §B.1.1 pertaining to linear independence of dyad sums: if the transposehalves of all the dyads in the sum (596) 5.3 make a linearly independent set,then the non-transpose halves constitute a basis for the range of EDM D .We have, for D ∈ EDM N ,R(D)= R([δ(V X V T X ) 1 V X ]) ⇐ rank([1 δ(V X V T X ) V X ])= 2 + rR(D)= R([1 V X ]) ⇐ otherwise(601)To show that, we need the condition under which the rank equality is satisfied:We know R(V X ) ⊥ 1, but what is the relative geometric orientationof δ(V X VX T) ? δ(V X V X T) ≽ 0 because V X V X T ≽ 0, and δ(V X V X T ) ∝1 remainspossible (599); this means δ(V X VX T) /∈ N(1T ) simply because it has no negativeentries. (Figure 5.2) If the projection of δ(V X VX T) on N(1T ) does notbelong to R(V X ), then that is a necessary and sufficient condition for linearindependence (l.i.) of δ(V X VX T) with respect to R([1 V X ]) ; id est,V δ(V X V T X ) ≠ V X a for any a∈ R r(I − 1 N 11T )δ(V X V T X ) ≠ V X aδ(V X V T X ) − 1 N ‖V X ‖ 2 F 1 ≠ V X aδ(V X V T X ) − λ2N 1 = y ≠ V X a ⇔ {1, δ(V X V T X ), V X } is l.i.(602)On the other hand when this condition is violated (when (595) y = V X a p ),then from (594) we haveR ( D = y1 T + 1y T − 2V X V T X + λ N 11T) = R ( (V X a p + λ N 1)1T + (1a T p − 2V X )V T X= R([V X a p + λ N 1 1aT p − 2V X ])= R([1 V X ])(603)An example of such a violation is (600) where, in particular, a p = 0. )5.3 Identifying the columns V X∆= [v1 · · · v r ] , then V X V T X = ∑ iv i v T iis a sum of dyads.


5.2. EDM DEFINITION IN 11^T 209

Figure 5.2: Example of V_X selection to make an EDM corresponding to cardinality N = 3 and affine dimension r = 1 ; V_X is a vector in N(1^T) ⊂ R^3 . Nullspace of 1^T (containing V_X) is hyperplane in R^3 (drawn truncated) having normal 1. Vector δ(V_X V_X^T) may or may not be in plane spanned by {1, V_X} , but belongs to nonnegative orthant which is strictly supported by N(1^T).


210 CHAPTER 5. EDM CONEA statement parallel to (601) is therefore, for D ∈ EDM N ,(Theorem 4.7.3.0.1)rank(D) = r + 2 ⇔ y /∈ R(V X ) ( ⇔ 1 T D † 1 = 0 )rank(D) = r + 1 ⇔ y ∈ R(V X ) ( ⇔ 1 T D † 1 ≠ 0 ) (604)5.2.2 Faces of EDM coneExpression (596) has utility constructing the set of all EDMs correspondingto affine dimension r :{D ∈ EDM N | rank(V DV )= r }= { δ(V X V T X )1T + 1δ(V X V T X )T − 2V X V T X | V X ∈ R N×r , V T X V X = δ2 (V T X V X ), R(V X)⊆ N(1 T ) }(605)whereas {D ∈ EDM N | rank(V DV )≤ r} is just the closure of this same set;{D ∈ EDM N | rank(V DV )≤ r } = { D ∈ EDM N | rank(V DV )= r } (606)None of these are necessarily convex sets.When cardinality N = 3 and affine dimension r = 2, for example, theinterior rel int EDM 3 is realized via (605). (§5.4)When N = 3 and r = 1, the relative boundary of the EDM conedvec ∂EDM 3 is realized in isomorphic R 3 as in Figure 5.1(d). This figurecould be constructed via (606) by spiraling vector V X tightly about the originin N(1 T ) ; as can be imagined with aid of Figure 5.2. Vectors closeto the origin in N(1 T ) are correspondingly close to the origin in EDM N .As vector V X orbits the origin in N(1 T ) , the corresponding EDM orbitsthe axis of revolution while remaining on the boundary of the circular conedvec ∂EDM 3 . (Figure 5.3)5.2.2.1 Isomorphic facesIn high cardinality N , any set of EDMs constructed via (605) or (606) withparticular affine dimension r is isomorphic with any set constructed in lowercardinality that admits the same affine dimension. We do not prove thathere.


5.2. EDM DEFINITION IN 11^T 211

Figure 5.3: (a) Vector V_X ∈ R^{3×1} from Figure 5.2 spirals in N(1^T) ⊂ R^3 decaying toward origin. (Spiral is two-dimensional in vector-space R^3 .) (b) Corresponding trajectory D(V_X) ⊂ dvec ∂EDM^3 on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of V_X about origin. (Vortex is three-dimensional in isometrically isomorphic R^3 .)


212 CHAPTER 5. EDM CONEIn particular, extreme directions of EDM N correspond to affine dimensionr =1 and are simply represented: for any particular N and each and everyz ∈ R N in N(1 T ) ,Γ = (z ◦ z)1 T + 1(z ◦ z) T − 2zz T (607)is an extreme direction corresponding to a one-dimensional face of the EDMcone EDM N that is a ray in isomorphic R N(N−1)/2 .Now suppose we are given a particular EDM D p ∈ EDM N correspondingto affine dimension r and parameterized by V Xp in (596). The smallest faceof EDM N containing D p isF ( EDM N ∋ D p)={δ(VX V T X )1T +1δ(V XV T X )T −2V XV T X |V X ∈ R N×r , V T X V X = δ2 (V T X V X ), R(V X)⊆ R(V Xp ) }which is isomorphic 5.4 with the convex cone EDM r+1 , hence of dimension(608)dim F ( EDM N ∋ D p)= (r + 1)r/2 (609)in isomorphic R N(N−1)/2 . Not all dimensions are represented; e.g., the EDMcone has no two-dimensional faces.When cardinality N = 4, and affine dimension r=2 so that R(V Xp ) is anytwo-dimensional subspace of three-dimensional N(1 T ) in R 4 , for example,then the corresponding face of EDM 4 is isometrically isomorphic with: (606)EDM 3 = {D ∈ EDM 3 | rank(V DV )≤ 2} ≃ F(EDM 4 ∋ D(V Xp )) (610)Each two-dimensional subspace of N(1 T ) corresponds to another threedimensionalface.Because each and every principal submatrix of an EDM in EDM N(§4.12.3) is another EDM [86, §4.1], for example, then each principal submatrixbelongs to a particular face of EDM N .5.2.2.2 Open questionThis result (609) is analogous to that for the positive semidefinite cone in§2.6.6.3, although the question remains open whether all faces of EDM N areexposed like they are for the PSD cone.5.4 The fact that the smallest face is isomorphic with another (perhaps smaller) EDMcone is implicit in [49, §2].
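Representation (607) invites a quick numerical check; a MATLAB sketch (arbitrary N , with z drawn from N(1^T)):

% sketch of extreme-direction construction (607)
N  = 5;
Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
z  = Vn*randn(N-1,1);                                        % 1'*z = 0
Gam = (z.*z)*ones(1,N) + ones(N,1)*(z.*z)' - 2*(z*z');       % (607)
V  = eye(N) - ones(N)/N;
min(eig(-V*Gam*V))                   % >= 0 : Gam is an EDM
rank(-V*Gam*V, 1e-9)                 % = 1  : affine dimension r = 1, so Gam is an extreme direction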


5.3. EDM CONE BY INVERSE IMAGE... 2135.3 EDM cone by inverse image...Use technique of §2.6.6.3.4...


214 CHAPTER 5. EDM CONE5.4 Correspondence to PSD cone S N−1+Hayden & Wells et alii [49, §2] assert a one-to-one correspondence with matricespositive semidefinite on N(1 T ). Because rank(V DV )≤N−1 (§4.7.2),the corresponding positive semidefinite cone can only be S N−1+ . [95, §18.2.1]To demonstrate that correspondence more clearly, we follow the innerproductform EDM definition (375): Any EDM may be expressed[ ]D(Φ) =∆ 01δ(Φ)T + 1 [ 0 δ(Φ) ] [ ] 0 0 T T− 2∈ EDM N0 Φ⇔(611)Φ ≽ 0where this D is a surjective linear operator (onto EDM N ), injective becauseit has no nullspace [92, §A.1] on domain S N−1+ . (confer §4.6) Then the EDMcone may be expressed,EDM N = { }D(Φ) | Φ ∈ S N−1+(612)Hayden & Wells’ assertion can therefore be equivalently stated in terms ofthe inner-product form EDM definition,D(S N−1+ ) = EDM N (613)−VN T EDM N V N = S N−1+ (614)because R(V N )= N(1 T ).In terms of affine dimension r , Hayden & Wells claim particular correspondencebetween the PSD and EDM cones:r = N −1: Symmetric hollow matrices D negative definite on N(1 T ) correspondto points relatively interior to the EDM cone.r < N −1: Symmetric hollow matrices D negative semidefinite on N(1 T ) , whereVN TDV N has at least one 0 eigenvalue, correspond to points on therelative boundary of the EDM cone.r = 1: Symmetric hollow nonnegative matrices rank-one on N(1 T ) correspondto extreme directions of the EDM cone; id est, for some vector u ,(§A.3.1.0.7)rankV T N DV N =1D ∈ S N 0 ∩ R N×N+}⇔{D ∈ EDM N−VTD is an extreme direction ⇔ N DV N ≡ uu TD ∈ S N 0(615)


5.4. CORRESPONDENCE TO PSD CONE S N−1+ 2155.4.0.2.1 Example. ⎡ Extreme ⎤ rays versus rays on the boundary.0 1 4The EDM D = ⎣ 1 0 1 ⎦ is an extreme direction of EDM 3 where[ ]4 1 01u = in (615). Because −VN 2TDV N has eigenvalues {0, 5}, the ray whosedirection is D also lies on the [ relative ] boundary of EDM 3 .0 1In contrast, EDM D = is an extreme direction of EDM 2 but1 0−VN TDV N has only one eigenvalue: {1}. Because EDM 2 is a ray whoserelative boundary (§2.5.1.3.1) is the origin, this conventional boundary doesnot include D .5.4.1 Gram-form correspondence to S N−1+Equivalence (614) coincides with 5.5yielding− V T W EDM N V W = S N−1+ (616)− V EDM N V = V W S N−1+ V T W (617)where V W ∈ R N×N−1 is an auxiliary V -matrix (§B.4.3) having propertiesV W VW T =V , V W T V W =I , R(V W)= R(V N )= N(1 T ) , N(VW T )= R(1). Withrespect to the linear Gram-form EDM definitionD(G) = δ(G)1 T + 1δ(G) T − 2G (344)the mutual inverse relation (414) produces [93, §2.1] [94, §2] [95, §18.2.1][96, §2.6]EDM N = D ( V(EDM N ) )= D ( )−V EDM N V 1 2= D ( ) ( )V W S N−1+ VW T 1 2 ≡ D VN S N−1+ VN T 1 2a one-to-one correspondence between EDM N and S N−1+ .(618)5.5 The isomorphism T(Y )=V T N V WY V T W V N relates this necessarily positive semidefinitemap to (614).
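Returning to Example 5.4.0.2.1, its claims are quickly verified numerically (V_N written out explicitly for N = 3 and N = 2):

% numerical check of Example 5.4.0.2.1
D3  = [0 1 4; 1 0 1; 4 1 0];
Vn3 = [-1 -1; 1 0; 0 1]/sqrt(2);
eig(-Vn3'*D3*Vn3)            % approx {0, 5}: rank one, so D3 generates an extreme ray (615);
                             % the 0 eigenvalue also places that ray on rel. boundary of EDM^3
D2  = [0 1; 1 0];
Vn2 = [-1; 1]/sqrt(2);
eig(-Vn2'*D2*Vn2)            % = 1 : single positive eigenvalue, as in the second part of the example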


5.4.2 Monotonicity

Consider the linear function g: S^N → S^{N−1},

    g(D) = −V_N^T D V_N   (619)

having dom g = S^N_0 and superlevel sets (confer (315))

    L_ν = { D ∈ S^N_0 | −V_N^T D V_N ≽ νI }   (620)

that are simply translations, in isomorphic R^{N(N+1)/2}, of the EDM cone that belongs to subspace R^{N(N−1)/2}; videlicet, for each and every ν ∈ R,

    −V_N^T D V_N ≽ νI  ⇔  −V_N^T (D − ν V_N^{†T} V_N^†) V_N ≽ 0   (621)

Because g is concave,^{5.6} all its superlevel sets are convex.

The difference D_2 − D_1 belongs to the EDM cone if and only if −V_N^T(D_2 − D_1)V_N ≽ 0 by (352);^{5.7} id est,

    D_1 ≼_{EDM^N} D_2   ⇔   { −V_N^T D_1 V_N ≼_{S^{N−1}_+} −V_N^T D_2 V_N ,   D_2 − D_1 ∈ S^N_0 }   (622)

This correspondence between the EDM cone and the positive semidefinite cone connotes monotonicity [25] of g (619); in particular, g(D) is a nondecreasing linear function on domain S^N_0.

5.5 Vectorization projection interpretation

In §E.7.1.0.2 we learn that −V D V can be interpreted as orthogonal projection [94, §2] of −D ∈ S^N_0 on the subspace of geometrically centered symmetric matrices

    S^N_g ≜ {Y ∈ S^N | Y1 = 0} = {Y ∈ S^N | N(Y) ⊇ 1} = {Y ∈ S^N | R(Y) ⊆ N(1^T)}   (1292)
         = {V X V | X ∈ S^N} ⊂ S^N   (1293)

Footnote 5.6: Any linear function must, of course, be simultaneously concave and convex. (The sublevel sets of g are simply translations of the negative EDM cone.)
Footnote 5.7: From (329), any matrix V in place of V_N will satisfy (622) if R(V) = R(V_N) = N(1^T).


because elementary auxiliary matrix V is an orthogonal projector (§B.4.1). Yet there is another useful projection interpretation:

Revising the fundamental matrix criterion for membership to the EDM cone (329),^{5.8}

    { 〈zz^T, −D〉 ≥ 0  ∀ zz^T | 11^T zz^T = 0 ,   D ∈ S^N_0 }   ⇔   D ∈ EDM^N   (623)

When D ∈ EDM^N, this correspondence means −z^T D z is proportional to a nonnegative coefficient of orthogonal projection (§E.6.4.2) of −D, in isometrically isomorphic R^{N(N+1)/2}, on the range of each and every vectorized (§2.1.2.1) symmetric dyad (§B.1) in the nullspace of 11^T;^{5.9} id est, on each and every member of

    T = { svec(zz^T) | z ∈ N(11^T) = R(V_N) } ⊂ svec ∂S^N_+ = { svec(V_N υυ^T V_N^T) | υ ∈ R^{N−1} }   (624)

whose dimension is

    dim T = N(N − 1)/2   (625)

The set of all symmetric dyads {zz^T | z ∈ R^N} constitute the extreme directions of the positive semidefinite cone (§2.6.4, §2.6.6) S^N_+, hence lie on its boundary. Yet only those dyads in R(V_N) are included in the test (623), thus only a subset T of all vectorized extreme directions of S^N_+ is observed.

In the particularly simple case D ∈ EDM^2 = {D ∈ S^2_0 | d_12 ≥ 0}, for example, there is only one extreme direction of the PSD cone involved (illustrated in Figure 5.4):

    zz^T = (1/2) [[1, −1],[−1, 1]]   (626)

Footnote 5.8: N(11^T) = N(1^T) and R(zz^T) = R(z).
Footnote 5.9: The range of the vectorized dyads (624) cannot be isomorphic with subspace R(V_N) simply because operator T(z ∈ R(V_N)) ≜ svec(zz^T) is nonlinear in z. [38, §2.8-8] Orthogonal projection on T cannot, therefore, be orthogonal on R(V_N).
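The projection-coefficient interpretation can be checked numerically (a Python/NumPy sketch; the svec convention shown, lower triangle with off-diagonal entries scaled by √2, is one common choice satisfying 〈svec A, svec B〉 = 〈A, B〉 and is assumed here): for an EDM D, the coefficient of orthogonal projection of −D on the ray through any member of T is nonnegative, consistent with (623).

import numpy as np

def svec(A):
    # symmetric vectorization: lower triangle stacked, off-diagonal scaled by sqrt(2)
    i, j = np.tril_indices(A.shape[0])
    return np.where(i == j, 1.0, np.sqrt(2.0)) * A[i, j]

rng = np.random.default_rng(2)
N = 4
X = rng.standard_normal((2, N))                  # a list of N points in R^2
G = X.T @ X
D = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G    # an EDM

ok = True
for _ in range(1000):
    z = rng.standard_normal(N)
    z -= z.mean()                                # z in N(1^T), so 11^T zz^T = 0
    dyad = np.outer(z, z)
    # coefficient of orthogonal projection of -D on the ray through svec(zz^T)
    coeff = (svec(-D) @ svec(dyad)) / (svec(dyad) @ svec(dyad))
    ok &= coeff >= -1e-9                         # nonnegative, per (623)
print(ok)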


[Figure 5.4: three-dimensional plot with axes d_11, d_22, √2 d_12, showing svec ∂S^2_+, dvec EDM^2, the origin, and half-line −T.]

Figure 5.4: Truncated boundary of positive semidefinite cone S^2_+ in isometrically isomorphic R^3 (via svec (35)) is, in this dimension, constituted solely by its extreme directions. Truncated cone of Euclidean distance matrices EDM^2 in isometrically isomorphic subspace R. Relative boundary of EDM cone is constituted solely by matrix 0. Half-line T = { ζ [1  −√2  1]^T | ζ ≥ 0 } on PSD cone boundary depicts that lone extreme ray (626) on which orthogonal projection of −D must be positive semidefinite if D is to belong to EDM^2. Dual EDM cone is halfspace in R^3 whose partially bounding hyperplane has inward-normal dvec EDM^2.


5.5.1 Face of PSD cone S^N_+ containing V

In any case, set T (624) constitutes the vectorized extreme directions of an N(N−1)/2-dimensional face of the PSD cone S^N_+ containing auxiliary matrix V and isomorphic with S^{N−1}_+ = S^{rank V}_+ (§2.6.6.3).

To show this, we must first find the smallest face containing auxiliary matrix V and then determine its extreme directions. From (126),

    F(S^N_+ ∋ V) = { W ∈ S^N_+ | N(W) ⊇ N(V) } = { W ∈ S^N_+ | N(W) ⊇ R(1) }
                 = { V_W A V_W^T | A ∈ S^{N−1}_+ } ≡ { V_N B V_N^T | B ∈ S^{N−1}_+ }
                 ≃ S^{rank V}_+ = −V_N^T EDM^N V_N  (614)
    (with relative interior { V_W A V_W^T | A ∈ int S^{N−1}_+ })   (627)

Because V_W V_W^T = V_N V_N^† = V (§B.4.3), then V belongs to F(S^N_+ ∋ V). The isomorphism T(X) = V_N V_W^T X V_W V_N^T is an injective map from S^N_g onto S^N_g relating {V_W A V_W^T} to {V_N B V_N^T}. Every rank-one matrix belonging to this face is therefore of the form:

    V_N υυ^T V_N^T  |  υ ∈ R^{N−1}   (628)

Because F(S^N_+ ∋ V) is isomorphic with a positive semidefinite cone S^{N−1}_+, then T constitutes the extreme directions of F, the origin constitutes the extreme points of F, and auxiliary matrix V is some convex combination of those extreme points and directions by the extremes theorem (§2.6.4.0.1).

In fact, the smallest face of the PSD cone S^N_+ containing auxiliary matrix V is the intersection with the geometric center subspace (1292) (1293);

    F(S^N_+ ∋ V) = cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} } = S^N_g ∩ S^N_+   (confer (632))   (629)

In isometrically isomorphic R^{N(N+1)/2},

    F(S^N_+ ∋ V) = cone T   (630)


5.6 Dual EDM cone

5.6.1 Ambient S^N

We consider finding the ordinary dual EDM cone in ambient space S^N where EDM^N is pointed, closed, and convex, but has empty interior. The set of all EDMs in subspace S^N is a closed convex cone because it is the intersection of infinitely many halfspaces about the origin in variable D:

    EDM^N = ⋂_{i=1...N, z ∈ N(1^T)} { D ∈ S^N | 〈e_i e_i^T, D〉 ≥ 0, 〈e_i e_i^T, D〉 ≤ 0, 〈zz^T, −D〉 ≥ 0 }   (631)

By definition (170), a dual cone K* comprises all vectors inward-normal to each and every hyperplane ∂H_+ (§2.3.2.4.1) supporting (at the origin) or containing convex K. The dual EDM cone in the ambient space of symmetric matrices is therefore expressible as the aggregate of every conic combination of inward-normals from (631): [1, exer. 2.36]

    EDM^{N*} = { Σ_{i=1}^N ζ_i e_i e_i^T − Σ_{j=1}^N ξ_j e_j e_j^T | ζ_i, ξ_j ≥ 0 } − cone{ zz^T | 11^T zz^T = 0 }
             = { δ(u) | u ∈ R^N } − cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1}, (‖υ‖ = 1) } ⊂ S^N
             = S^{N⊥}_0 − S^N_g ∩ S^N_+   (confer (779))   (632)

The EDM cone is not self-dual in ambient S^N because its affine hull belongs to a proper subspace

    aff EDM^N = S^N_0   (633)

The ordinary dual EDM cone cannot, therefore, be pointed.

When N = 1, for example, the EDM cone is the point at the origin in R. The auxiliary matrix V_N is empty [∅], and the dual cone EDM* is the real line.

When N = 2, the EDM cone is a nonnegative real line in isometrically isomorphic R^3. EDM^{2*} is the halfspace whose partial boundary has inward-normal EDM^2. The diagonal matrices {δ(u)} are represented by a hyperplane through the origin {d | [0 1 0] d = 0} while the term cone{V_N υυ^T V_N^T} is represented by the half-line T in Figure 5.4 belonging to the PSD cone. The dual EDM cone is formed by translating the hyperplane along the negative semidefinite half-line; the union of each and every translation. (confer §2.3.2.3.2)


5.6.2 Ambient S^N_0

When instead we consider the ambient space of symmetric hollow matrices (633), then still we find the EDM cone is not self-dual for N > 2. This is proven as follows:

Given the minimal set of generators {Γ} (607) for the pointed closed convex EDM cone, the discrete membership theorem in §2.8.2.1.3 asserts that members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned by testing a discretized generalized inequality:

    EDM^{N*} ∩ S^N_0 ≜ { D* ∈ S^N_0 | 〈Γ, D*〉 ≥ 0 ∀ Γ ∈ G(EDM^N) }
                     = { D* ∈ S^N_0 | 〈(z ∘ z)1^T + 1(z ∘ z)^T − 2zz^T, D*〉 ≥ 0,  z ∈ N(1^T) }
                     = { D* ∈ S^N_0 | (z ∘ z)^T D* 1 − z^T D* z ≥ 0,  z ∈ N(1^T) }   (634)

The term (z ∘ z)^T D* 1 foils any hope, for N > 2, of self-duality in ambient S^N_0. The first adverse case N = 3, for example, may be deduced from Figure 5.1; a circular cone corresponding to no rotation of the Lorentz cone (the self-dual circular cone).

To find the dual EDM cone in ambient S^N_0 per §2.9.2.2 we prune aggregate (632) describing the ordinary dual EDM cone, removing any member having nonzero main diagonal:

    EDM^{N*} ∩ S^N_0 = cone{ δ(V_N υυ^T V_N^T) − V_N υυ^T V_N^T | υ ∈ R^{N−1} }
                     = (S^{N⊥}_0 − S^N_g ∩ S^N_+) ∩ S^N_0   (635)

When N = 1, for example, the EDM cone and its dual in ambient S^N_0 each comprise the origin in isomorphic R^0 by convention (110).

When N = 2, the EDM cone is the nonnegative real line in isomorphic R. (Figure 5.4) EDM^{2*} is identical, thus self-dual in this dimension. This result is in agreement with (634), verified directly: for all κ ∈ R, z = κ[1 ; −1] and z ∘ z = κ^2[1 ; 1]  ⇒  d_12 ≥ 0.
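A numerical check of (635) (a Python/NumPy sketch; δ(·) is taken here as the diagonal matrix formed from the main diagonal of its argument, so each generator is symmetric hollow, and all names are illustrative): every generator so constructed makes a nonnegative inner product with an arbitrary EDM, consistent with membership to the dual cone.

import numpy as np

rng = np.random.default_rng(5)
N = 4
VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)

# an arbitrary EDM D from a random point list
X = rng.standard_normal((3, N))
G = X.T @ X
D = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G

ok = True
for _ in range(1000):
    u = rng.standard_normal(N - 1)
    Y = VN @ np.outer(u, u) @ VN.T
    Z = np.diag(np.diag(Y)) - Y                  # generator per (635), hollow by construction
    ok &= (np.sum(Z * D) >= -1e-9)               # <Z, D> >= 0 for every EDM D
print(ok)

The inner product reduces to u^T(−V_N^T D V_N)u because 〈δ(Y), D〉 vanishes on hollow D, which is why nonnegativity follows from the Schoenberg criterion.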


5.6.2.1 Schoenberg criterion is discretized membership relation

Criterion (623) is a discretized membership relation (§2.8.2); id est, the classic Schoenberg criterion

    { −V_N^T D V_N ≽ 0 ,  D ∈ S^N_0 }   ⇔   D ∈ EDM^N   (352)

is the same as

    { 〈zz^T, −D〉 ≥ 0  ∀ zz^T | 11^T zz^T = 0 ,  D ∈ S^N_0 }   ⇔   D ∈ EDM^N   (623)

which, by (624), is the same as

    { 〈zz^T, −D〉 ≥ 0  ∀ zz^T ∈ { V_N υυ^T V_N^T | υ ∈ R^{N−1} } ,  D ∈ S^N_0 }   ⇔   D ∈ EDM^N   (636)

where the zz^T constitute a set of generators for the face of the positive semidefinite cone containing auxiliary matrix V; F(S^N_+ ∋ V) (§5.5.1). Criterion (636) is equivalent to the following constrained membership relation like (243), assuming Z, D ∈ aff EDM^N,

    〈Z, D〉 ≥ 0  ∀ Z ∈ EDM^{N*} ∩ S^N_0   ⇔   D ∈ EDM^N
    〈Z, D〉 ≥ 0  ∀ Z ∈ cone{ δ(V_N υυ^T V_N^T) − V_N υυ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N   (637)

This membership relation must, of course, hold more broadly; specifically, over the ordinary dual cone breaching aff EDM^N into the larger ambient space S^N: For D ∈ S^N_0,

    〈Z, D〉 ≥ 0  ∀ Z ∈ EDM^{N*}   ⇔   D ∈ EDM^N
    〈Z, D〉 ≥ 0  ∀ Z ∈ { δ(u) | u ∈ R^N } − cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N
    〈Z, D〉 ≥ 0  ∀ Z ∈ −cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N   (638)

because 〈δ(u), D〉 = 0. When discretized (§2.8.2.1.3), this membership relation becomes

    〈zz^T, −D〉 ≥ 0  ∀ zz^T ∈ { V_N υυ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N   (639)


the same as the classic Schoenberg criterion.

Hitherto a correspondence between the EDM cone and a face of the PSD cone, the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual.

5.6.3 Theorem of the alternative

Theorem. EDM alternative. [91, §1]
Given D ∈ S^N_0,

    D ∉ EDM^N   ⇔   ∃ z such that  { 1^T z = 1 ,  Dz = 0 }   (640)

In words, either N(D) intersects hyperplane {z | 1^T z = 1} or D is an EDM; the alternatives are incompatible.   ⋄

When D is an EDM, on the other hand, [115, §2]

    N(D) ⊂ N(1^T) = {z | 1^T z = 0}   (641)

Because [91, §2] (§E.0.1)

    DD^† 1 = 1 ,   1^T D^† D = 1^T   (642)

then

    R(1) ⊂ R(D)   (643)




Chapter 6

Convex optimization

  Prior to 1984,^{6.1} linear and nonlinear programming, one a subset of the other, had evolved for the most part along unconnected paths, without even a common terminology. (The use of "programming" to mean "optimization" serves as a persistent reminder of these differences.)
  −Forsgren, Gill, & Wright (2002) [116]

It may seem puzzling, at first, why given some application the search for its solution ends abruptly with a formalized statement of the problem as a constrained optimization. The explanation is: typically, we do not seek analytical solution because there are few; rather, if the problem can be expressed in convex form, then there exist computer programs providing efficient numerical global solution. [11] [117] [118] [4]

Footnote 6.0: © 2001 Jon Dattorro, all rights reserved.
Footnote 6.1: nascence of interior-point methods.


The goal, then, becomes conversion of a given problem to convex form:^{6.2} Given convex objective function g and convex feasible set^{6.3} C, we have the generic convex optimization problem

    minimize_X   g(X)
    subject to   X ∈ C   (644)

where constraints are abstract in the membership of variable X to C. Inequality constraint functions of a convex optimization problem are convex, while equality constraint functions are conventionally affine. [2, §1] [1, §4.2.1] The problem remains convex for maximization of a concave objective function (but not for maximization of a convex function). As conversion to convex form is not always possible, there is much ongoing research to determine what classes of problem have convex expression or relaxation. [7] [3] [5] [6] [119] [8]

6.1 Semidefinite programming

  Still, we are surprised to see the relatively small number of submissions to semidefinite programming (SDP) solvers, as this is an area of significant current interest to the optimization community. We speculate that semidefinite programming is simply experiencing the fate of most new areas: Users have yet to understand how to pose their problems as semidefinite programs, and the lack of support for SDP solvers in popular modelling languages likely discourages submissions.
  −SIAM News, 2002. [120, p.9]

Footnote 6.2: This means, essentially, the original problem statement might be a minimization of a constrained objective having many local minima. The equivalent or relaxed convex problem possesses one minimum, a unique global minimum ideally corresponding to a solution of the original problem; e.g., §C.2.1.1.
Footnote 6.3: The feasible set of an optimization problem is the set of all variable values (belonging to the objective function domain) satisfying all the constraints.


6.1.1 Conic problem

Consider a prototypical conic problem (p) and its dual (d): [67, §3.3.1] [52, §2.1]

    (p)  minimize_x   c^T x            (d)  maximize_{y,s}   b^T y
         subject to   x ∈ K                 subject to       s ∈ K*
                      Ax = b                                 A^T y + s = c   (173)

where K is a closed convex cone, K* is its dual, matrix A is fixed, and the remaining quantities are vectors.

When K is a polyhedral cone (§2.7.1), then each conic problem becomes a linear program. More generally, each optimization problem is convex when K is a closed convex cone. Unlike the optimal objective value, a solution to each is not necessarily unique; in other words, the optimal solution set {x*} or {y*, s*} may each comprise more than a single point although the corresponding optimal objective value is unique when the feasible set is nonempty.

When K is the self-dual cone of positive semidefinite matrices in the subspace of symmetric matrices, then each conic problem is called a semidefinite program; primal problem (P) having matrix variable X ∈ S^n while corresponding dual (D) has matrix slack variable S ∈ S^n and vector variable y ∈ R^m: [4, §1.3.8]

    (P)  minimize_X   〈C, X〉            (D)  maximize_{y,S}   〈b, y〉
         subject to   X ≽ 0                  subject to       S ≽ 0
                      A vec X = b                             vec^{−1}(A^T y) + S = C   (645)

where matrix C ∈ S^n and vector b ∈ R^m are fixed, as is

    A ≜ [ vec(A_1)^T ; ... ; vec(A_m)^T ] ∈ R^{m×n²}   (646)

where A_i ∈ S^n, i = 1 ... m, are given. Thus

    A vec X = [ 〈A_1, X〉 ; ... ; 〈A_m, X〉 ]
    vec^{−1}(A^T y) = Σ_{i=1}^m y_i A_i   (647)
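The data operator A of (646) and identities (647) are easy to realize numerically (a Python/NumPy sketch using column-stacking vectorization; the data are random and all names are illustrative):

import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
A_list = []
for _ in range(m):
    W = rng.standard_normal((n, n))
    A_list.append((W + W.T) / 2)                 # symmetric data matrices A_i

vec = lambda M: M.reshape(-1, order="F")         # column-stacking vectorization (18)
A = np.vstack([vec(Ai) for Ai in A_list])        # A in R^{m x n^2}, per (646)

X = rng.standard_normal((n, n)); X = X @ X.T
y = rng.standard_normal(m)

# (647): A vec X collects the inner products <A_i, X> ...
print(np.allclose(A @ vec(X), [np.trace(Ai @ X) for Ai in A_list]))
# ... and vec^{-1}(A^T y) reassembles sum_i y_i A_i
print(np.allclose((A.T @ y).reshape((n, n), order="F"),
                  sum(yi * Ai for yi, Ai in zip(y, A_list))))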


The vector inner product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space R^{n²}; id est, 〈C, X〉 ≜ tr(C^T X) = vec(C)^T vec X, where vec X (18) denotes vectorization by stacking columns in the natural order.

It has been shown that interior-point methods [1, §11] [42] [121] [6] can converge to a solution of maximal complementarity; [122] (§6.1.2.3.1) in that case, not a vertex-solution but a solution of high cardinality or rank. [4, §2.5.3]

We desire a simple algorithm for construction of a primal optimal solution X* to (P) satisfying a least upper bound on rank governed by Proposition 2.6.6.4.1 (Barvinok) that asserts existence of low-rank feasible solutions. [23, §II.13.1] Specifically, the proposition asserts an extreme point (§2.5) of the primal feasible set A ∩ S^n_+ satisfies least upper bound

    rank X ≤ ⌊ (√(8m + 1) − 1)/2 ⌋   (148)

where

    A ≜ { X ∈ S^n | A vec X = b }   (648)

is the affine set from primal problem (P). Barvinok showed [60, §2.2] when given a positive definite matrix C and an arbitrarily small neighborhood of C comprising positive definite matrices, there exists a matrix C̃ from that neighborhood such that optimal solution X* to (P) (substituting C̃) is an extreme point of A ∩ S^n_+ and satisfies least upper bound (148).^{6.4}

This means given arbitrary positive definite C, there is no guarantee an optimal solution X* to (P) (using C) satisfies (148). To prove that by example: (Ye) Assume dimension n to be an even positive number. Then the particular instance of problem (P),

    minimize_X   〈 [[I, 0],[0, 2I]] , X 〉
    subject to   X ≽ 0
                 〈I, X〉 = n   (649)

has optimal solution

    X* = [[2I, 0],[0, 0]] ∈ S^n   (650)

Footnote 6.4: Further, the set of all such C̃ in that neighborhood is open and dense.


with an equal number of ones and zeros along the main diagonal. Indeed, optimal solution (650) is the terminal solution along the central path taken by the interior-point method as implemented in [4, §2.5.3]. Clearly, rank of this primal optimal solution exceeds by far the rank-1 solution predicted by least upper bound (148).

This rational example (649) illustrates the need for a more generally applicable and simple algorithm to identify an optimal solution X* satisfying the proposition. We will review such an algorithm in §6.1.3, but first we provide some background.

6.1.2 Framework

6.1.2.1 Feasible sets

Respectively denote by F_p and F_d the sets of primal and dual points satisfying the primal and dual constraints, each assumed nonempty;

    F_p = { X ∈ S^n_+ | [ 〈A_1, X〉 ; ... ; 〈A_m, X〉 ] = b }
    F_d = { S ∈ S^n_+ , y ∈ R^m | Σ_{i=1}^m y_i A_i + S = C }   (651)

These are the primal and dual feasible sets in the domain intersection of the respective constraint functions. Geometrically, primal feasible F_p represents an intersection of the positive semidefinite cone with an affine subset of the subspace of symmetric matrices S^n in isometrically isomorphic R^{n(n+1)/2}. The affine subset has dimension n(n+1)/2 − m when the A_i are linearly independent. Dual feasible F_d is the Cartesian product of the positive semidefinite cone with its inverse image (§2.1.0.10.2) under affine transformation C − Σ y_i A_i.^{6.5} Hence both sets are closed and convex, and (P) and (D) are convex optimization problems.

Footnote 6.5: The inequality C − Σ y_i A_i ≽ 0 follows directly from (D) and is known as a linear matrix inequality. (§2.8.2.4.1) Because Σ y_i A_i ≼ C, matrix S is known as a slack variable (a term borrowed from linear programming [63]) since its inclusion raises this inequality to equality.


6.1.2.2 Duals

The dual objective function evaluated at any feasible point represents a lower bound on the primal optimal objective value. We can see this by direct substitution: Assume the feasible sets F_p and F_d are nonempty. Then it is always true that

    〈C, X〉 ≥ 〈b, y〉
    〈 Σ_i y_i A_i + S , X 〉 ≥ [ 〈A_1, X〉 · · · 〈A_m, X〉 ] y
    〈S, X〉 ≥ 0   (652)

The converse also follows because

    X ≽ 0, S ≽ 0  ⇒  〈S, X〉 ≥ 0   (862)

The optimal value of the dual objective thus represents the greatest lower bound on the primal. This fact is known as the weak duality theorem for semidefinite programming, [4, §1.3.8] and can be used to detect convergence in any primal/dual numerical method of solution.

6.1.2.3 Classical optimality conditions

When any primal feasible point exists interior to F_p in S^n, or when any dual feasible point exists interior to F_d in S^n × R^m, then by Slater's condition [1, §5.2.3] these two problems (645)(P) and (D) become strong duals. In other words, the primal optimal objective value becomes equivalent to the dual optimal objective value: there is no duality gap; id est, if ∃X ∈ int F_p or ∃S, y ∈ int F_d, then

    〈C, X*〉 = 〈b, y*〉
    〈 Σ_i y*_i A_i + S* , X* 〉 = [ 〈A_1, X*〉 · · · 〈A_m, X*〉 ] y*
    〈S*, X*〉 = 0   (653)

where S*, y* denote the dual optimal solution.^{6.6} We summarize this:

Footnote 6.6: The optimality condition 〈S*, X*〉 = 0 is called the complementary slackness condition in keeping with the tradition of linear programming. [63]


6.1.2.3.1 Corollary. Strong duality and optimality. [69, §3.1] [4, §1.3.8]
For semidefinite programs (645)(P) and (D), assume primal and dual feasible sets F_p ⊂ S^n and F_d ⊂ S^n × R^m (651) are nonempty. Then
• X* is optimal for (P)
• S*, y* are optimal for (D)
• the duality gap 〈C, X*〉 − 〈b, y*〉 is 0
if and only if
i) ∃X ∈ int F_p or ∃S, y ∈ int F_d, and
ii) 〈S*, X*〉 = 0   ⋄

For symmetric positive semidefinite matrices, requirement ii is equivalent to the complementarity (§A.7.3)

    S* X* = X* S* = 0   (654)

Because these two optimal symmetric matrices are simultaneously diagonalizable due to their commutativity,

    rank X* + rank S* ≤ n   (655)

To see that, the product of the symmetric optimal matrices X*, S* ∈ S^n must itself be symmetric because of commutativity. (856) The symmetric product has diagonalization

    S* X* = X* S* = Q Λ_{S*} Λ_{X*} Q^T = 0  ⇔  Λ_{X*} Λ_{S*} = 0   (656)

where Q is an orthogonal matrix. The product of the nonnegative diagonal Λ matrices can be 0 if their diagonal zeros are complementary or coincide. Due only to symmetry, rank X* = rank Λ_{X*} and rank S* = rank Λ_{S*} for these optimal primal and dual solutions. (846) So, because of the complementarity, the total number of nonzero diagonal entries from both Λ cannot exceed n.


When equality is attained in (655),

    rank X* + rank S* = n   (657)

there are no coinciding diagonal zeros in Λ_{X*} Λ_{S*}, and then we have what is called strict complementarity. Logically it follows that a necessary and sufficient condition for strict complementarity of the optimal primal and dual solution is

    X* + S* ≻ 0   (658)

The beauty of Corollary 6.1.2.3.1 is its symmetry; id est, one can solve either the primal or dual problem and then find a solution to the other via the optimality conditions. When the dual optimal solution is known, for example, the primal optimal solution belongs to the hyperplane {X | 〈S*, X〉 = 0}.

6.1.3 Rank-reduced solution

Recall, there is an extreme point of A ∩ S^n_+ (648) satisfying least upper bound (148) on rank. [60, §2.2] It is therefore sufficient to locate an extreme point whose primal objective value (645)(P) is optimal:^{6.7}

For affine set A as in (648), given any optimal solution X* to (645)(P)

    (P)  minimize_X   〈C, X〉
         subject to   X ∈ A ∩ S^n_+   (659)

whose rank does not satisfy least upper bound (148), we posit the existence of a set of perturbations

    { t_j B_j | t_j ∈ R , B_j ∈ S^n , j = 1 ... n }   (660)

such that for some 0 ≤ i ≤ n

    X* + Σ_{j=1}^i t_j B_j   (661)

becomes an extreme point of A ∩ S^n_+ and remains an optimal solution of (P). Membership of X* + Σ_{j=1}^i t_j B_j to affine set A is secured for the i-th

Footnote 6.7: Similar strategies can be found in [50, §31.5.3] [52, §2.4] [93, §3] [123]. There is no known construction for Barvinok's tighter result (153).


perturbation by demanding

    〈B_i, A_j〉 = 0 ,   j = 1 ... m   (662)

Membership to the positive semidefinite cone S^n_+ is insured by a small perturbation. (671) In this manner feasibility is insured. Optimality is proved in §6.1.3.2.

The following simple procedure has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

6.1.3.0.1 Rank reduction procedure
initialize: B_i = 0 ∀i
for iteration i = 1 ... n
{
  1. compute a nonzero perturbation matrix B_i of X* + Σ_{j=1}^{i−1} t_j B_j
  2. maximize t_i such that X* + Σ_{j=1}^i t_j B_j ∈ S^n_+
}
The rank-reduced optimal solution is then

    X* ← X* + Σ_{j=1}^i t_j B_j   (663)

6.1.3.0.2 Definition. Matrix step function.
Define the signum-like real function Ψ: S^n → R,

    Ψ(A) ≜ { 1, A ≽ 0 ;  −1, otherwise }   (664)

The value −1 is taken for indefinite or nonzero negative semidefinite argument.   △


6.1.3.1 Perturbation form

Deza & Laurent [50, §31.5.3] prove every perturbation matrix B_i, i = 1 ... n, is of the form

    B_i = −Ψ(Z_i) R_i Z_i R_i^T   (665)

where

    X* + Σ_{j=1}^{i−1} t_j B_j ≜ R_i R_i^T   (666)

where R_i ∈ R^{n×ρ} is full-rank and skinny where

    ρ ≜ rank( X* + Σ_{j=1}^{i−1} t_j B_j )   (667)

and where matrix Z_i ∈ S^ρ is found at each iteration i by solving a very simple feasibility problem:^{6.8}

    find        Z_i ∈ S^ρ
    subject to  〈Z_i, R_i^T A_j R_i〉 = 0 ,   j = 1 ... m   (668)

At iteration i,

    X* + Σ_{j=1}^{i−1} t_j B_j + t_i B_i = R_i (I − t_i Ψ(Z_i) Z_i) R_i^T   (669)

Maximization of each t_i in step 2 reduces the rank of (669) and locates a new point on the boundary ∂(A ∩ S^n_+) because, by fact (845),

    1 − t_i Ψ(Z_i) λ(Z_i) ≽ 0   ⇔   X* + Σ_{j=1}^{i−1} t_j B_j + t_i B_i ≽ 0   (670)

where λ(Z_i) denotes the eigenvalues of Z_i.

Footnote 6.8: whose simplest method of solution projects a random nonzero point on the proper subspace of isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5) The solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m. Indeed, this geometric intuition about forming the perturbation is what bounds the solution's rank from below; m is fixed by the number of equality constraints in (645)(P) while rank ρ decreases with each iteration i. Otherwise, we might iterate indefinitely.


Rank of a positive semidefinite matrix in S^n is diminished below n by the number of 0 eigenvalues (846), and a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (122). Necessity and sufficiency are due to the facts: R_i can be completed to a nonsingular matrix (§A.3.1.0.5), and I − t_i Ψ(Z_i) Z_i can be padded with zeros while maintaining equivalence in (669). The maximization of t_i thereby has closed form;

    (t*_i)^{−1} = max{ Ψ(Z_i) λ(Z_i)_j ,  j = 1 ... ρ }   (671)

When Z_i is indefinite, the direction of perturbation (determined by Ψ(Z_i)) is arbitrary. We may take an early exit from the Procedure under the circumstance

    rank[ svec(R_i^T A_1 R_i)  svec(R_i^T A_2 R_i)  · · ·  svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2   (672)

which characterizes the rank ρ of any [sic] extreme point in A ∩ S^n_+. [52, §2.4]

To see that, assuming the form of every perturbation matrix is indeed (665), then by (668)

    svec Z_i ⊥ [ svec(R_i^T A_1 R_i)  svec(R_i^T A_2 R_i)  · · ·  svec(R_i^T A_m R_i) ]   (673)

By orthogonal complement we have

    rank[ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ]^⊥ + rank[ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2   (674)

When Z_i can only be 0, then the perturbation is null because an extreme point has been found; thus

    [ svec(R_i^T A_1 R_i) · · · svec(R_i^T A_m R_i) ]^⊥ = 0   (675)

from which the stated result (672) directly follows.
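Gathering (665) through (671) into code, the following Python/NumPy/SciPy sketch outlines Procedure 6.1.3.0.1 under the assumptions of this section (X* is an optimal feasible solution supplied by some SDP solver, and Z_i is chosen deterministically as a null-space basis vector rather than by the random projection of footnote 6.8); the helper names rank_reduce, _svec, _smat are illustrative only.

import numpy as np
from scipy.linalg import null_space

def _svec(S):
    i, j = np.tril_indices(S.shape[0])
    return np.where(i == j, 1.0, np.sqrt(2.0)) * S[i, j]

def _smat(v, n):
    S = np.zeros((n, n))
    i, j = np.tril_indices(n)
    S[i, j] = np.where(i == j, 1.0, 1.0 / np.sqrt(2.0)) * v
    return S + np.tril(S, -1).T

def rank_reduce(Xstar, A_list, tol=1e-9):
    X = Xstar.copy()
    n = X.shape[0]
    for _ in range(n):
        w, Q = np.linalg.eigh(X)                 # factor X = R R^T, (666)-(667)
        keep = w > tol
        rho = int(keep.sum())
        if rho == 0:
            break
        R = Q[:, keep] * np.sqrt(w[keep])
        # feasibility problem (668): nonzero Z in S^rho with <Z, R^T A_j R> = 0
        M = np.vstack([_svec(R.T @ Aj @ R) for Aj in A_list])
        Nsp = null_space(M)
        if Nsp.size == 0:
            break                                # extreme point reached, cf. (672),(675)
        Z = _smat(Nsp[:, 0], rho)
        lam = np.linalg.eigvalsh(Z)
        psi = 1.0 if lam.min() >= -tol else -1.0 # matrix step function (664)
        t = 1.0 / np.max(psi * lam)              # closed form (671)
        X = R @ (np.eye(rho) - t * psi * Z) @ R.T  # apply (669); rank drops by >= 1
    return X

Feasibility of each step follows from (662) and (670); that the returned matrix remains optimal rests on the argument of §6.1.3.2.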


6.1.3.2 Optimality of perturbed X*

We show that the optimal objective value is unaltered by perturbation (665); id est,

    〈C, X* + Σ_{j=1}^i t_j B_j〉 = 〈C, X*〉   (676)

From Corollary 6.1.2.3.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under the assumption of existence of an interior feasible point:

    S* X* = S* R_1 R_1^T = X* S* = R_1 R_1^T S* = 0   (677)

This means R(R_1) ⊆ N(S*) and R(S*) ⊆ N(R_1^T). From (666) and (669) we get the sequence:

    X* = R_1 R_1^T
    X* + t_1 B_1 = R_2 R_2^T = R_1 (I − t_1 Ψ(Z_1) Z_1) R_1^T
    X* + t_1 B_1 + t_2 B_2 = R_3 R_3^T = R_2 (I − t_2 Ψ(Z_2) Z_2) R_2^T = R_1 (I − t_1 Ψ(Z_1) Z_1)(I − t_2 Ψ(Z_2) Z_2) R_1^T
      ⋮
    X* + Σ_{j=1}^i t_j B_j = R_1 ( Π_{j=1}^i (I − t_j Ψ(Z_j) Z_j) ) R_1^T   (678)

Substituting C = vec^{−1}(A^T y*) + S* from (645),

    〈C, X* + Σ_{j=1}^i t_j B_j〉 = 〈 vec^{−1}(A^T y*) + S* , R_1 ( Π_{j=1}^i (I − t_j Ψ(Z_j) Z_j) ) R_1^T 〉
      = 〈 Σ_{k=1}^m y*_k A_k , X* + Σ_{j=1}^i t_j B_j 〉
      = 〈 Σ_{k=1}^m y*_k A_k + S* , X* 〉 = 〈C, X*〉   (679)

because 〈B_i, A_j〉 = 0 ∀ i, j by design (662).

6.2 Examples

6.2.0.0.1 Example. Rank reduction.
Given data

    A = [[−1, 1, 8, 1,   1  ],
         [−3, 2, 8, 1/2, 1/3],
         [−9, 4, 8, 1/4, 1/9]] ,    b = [1 ; 1/2 ; 1/4]   (680)


consider the convex optimization problem

    minimize_{X ∈ S^5}   tr X
    subject to           X ≽ 0
                         A δ(X) = b   (681)

where the number of equality constraints is m = 3. X* = δ(x_M) is optimal, where

    x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ]   (682)

a rank-3 solution, although least upper bound (148) predicts existence of at most a rank-⌊(√(8m+1) − 1)/2⌋ = 2 feasible solution from three equality constraints. To find a lower rank ρ solution to (681) (barring combinatorics), we invoke Procedure 6.1.3.0.1:

Initialize: C = I, ρ = 3, A_j ≜ δ(A(j,:)), j = 1, 2, 3, X* = δ(x_M), m = 3, n = 5.
{
Iteration i = 1:

Step 1:  R_1 = [ √(2/128), 0, 0 ; 0, 0, 0 ; 0, √(5/128), 0 ; 0, 0, 0 ; 0, 0, √(90/128) ].

    find        Z_1 ∈ S^3
    subject to  〈Z_1, R_1^T A_j R_1〉 = 0 ,   j = 1, 2, 3   (683)

A nonzero random matrix Z_1 having 0 main diagonal is feasible and yields a nonzero perturbation matrix. Arbitrarily choose

    Z_1 = 11^T − I ∈ S^3   (684)


then (rounding)

    B_1 = [[0,      0, 0.0247, 0, 0.1048],
           [0,      0, 0,      0, 0     ],
           [0.0247, 0, 0,      0, 0.1657],
           [0,      0, 0,      0, 0     ],
           [0.1048, 0, 0.1657, 0, 0     ]]   (685)

Step 2:  t*_1 = 1 because λ(Z_1) = [−1  −1  2]^T. So,

    X* ← δ(x_M) + B_1 = [[2/128,  0, 0.0247, 0, 0.1048],
                         [0,      0, 0,      0, 0     ],
                         [0.0247, 0, 5/128,  0, 0.1657],
                         [0,      0, 0,      0, 0     ],
                         [0.1048, 0, 0.1657, 0, 90/128]]   (686)

having rank ρ ← 1 and producing the same optimal objective value.
}
This example demonstrates that the solution found by rank reduction can certainly be of rank less than Barvinok's least upper bound (148).

6.2.0.0.2 Example. Boolean. [10] [7, §4.3.4] [119]
Consider another approach to finding a solution of Platonic linear problem

    Ax = b   (687)

given data (680). Matrix A is full-rank having a two-dimensional nullspace (now different from A in the prototype (645)(P), as is b). The obvious and desired solution to this problem,

    x* = e_4 ∈ R^5   (688)

has norm equal to 1 and minimum cardinality;^{6.9} the minimum number of nonzero entries in x. A minimum cardinality solution answers the question: "Which fewest linear combination of columns in A most closely resembles

Footnote 6.9: dim x = 5 while dim R(x) = 1.


vector b?" Such a problem has universal appeal, arising in many fields of science and across many disciplines. [124] [125]

Designing an efficient algorithm to optimize cardinality has proved difficult. Though solution (688) is obvious, the simplest method of solution is combinatorial in general. The MATLAB backslash command x=A\b, for example, finds x_M (682) having norm 0.7044. The pseudoinverse solution (rounded)

    x_P = A^† b = [ −0.0505 ; −0.1975 ; 0.0590 ; 0.2796 ; 0.3955 ]   (689)

has minimum norm 0.5288. Certainly, none of the traditional methods provide x* = e_4.

Formulate the problem Ax = b using x ≜ (x̂ + 1)/2, for A ∈ R^{m×n}:

    minimize_x̂   ‖ A(x̂ + 1)(1/2) − b ‖_2^2
    subject to   δ(x̂)^2 = I   (690)

which is the same as

    minimize_x̂   x̂^T A^T A x̂ (1/2) − 2(A^T b − A^T A 1 (1/2))^T x̂
    subject to   δ(x̂)^2 = I   (691)

Neither problem is convex because of the nonlinear equality constraints. Thus the problem is made difficult by the quantized variable, having two allowable states. By defining

    X ≜ [x̂ ; 1][x̂^T  1] ∈ R^{n+1×n+1}   (692)

these two problems become equivalent to:

    minimize_{X ∈ S^{n+1}}   tr( X [[ (1/2)A^T A ,  (1/2)A^T A 1 − A^T b ],[ ((1/2)A^T A 1 − A^T b)^T ,  0 ]] )
    subject to               δ(X) = 1
                             X ≽ 0
                             rank X = 1   (693)


The rank constraint is non-convex. By eliminating it we get the convex optimization^{6.10}

    minimize_{X ∈ S^{n+1}}   tr( X [[ (1/2)A^T A ,  (1/2)A^T A 1 − A^T b ],[ ((1/2)A^T A 1 − A^T b)^T ,  0 ]] )
    subject to               δ(X) = 1
                             X ≽ 0   (694)

whose solution X* is identical to that of the original problem (690) if and only if rank X* = 1; in that case, we are done. Otherwise we may invoke our rank reduction procedure which is guaranteed only to produce another optimal solution conforming to Barvinok's least upper bound (148); in the case n = 5 with six constraints δ(X) = 1, that upper bound on rank is 3.

Our semidefinite program solver produces X* with the following eigenvalues (accurate to four decimal places):

    λ(X*) = {0.0000, 0.0000, 0.0000, 0.0001, 0.0010, 5.9989}   (695)

The rank reduction procedure will not produce solutions of arbitrarily low rank. To force a lower rank solution, spectral projection methods must be employed as in §7.

Footnote 6.10: This relaxed problem can also be derived via Lagrange duality; it is the dual of the dual program to (690). Therefore it must be convex.
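A quick numerical companion to this example (a Python/NumPy sketch using data (680); the printed norms are whatever NumPy returns, quoted values being those of (682) and (689)): the fourth column of A equals b, so x* = e_4 solves Ax = b exactly, while the minimum-norm pseudoinverse solution spreads energy over all entries and misses the sparse answer.

import numpy as np

A = np.array([[-1., 1., 8., 1.,    1.   ],
              [-3., 2., 8., 1/2,   1/3  ],
              [-9., 4., 8., 1/4,   1/9  ]])
b = np.array([1., 1/2, 1/4])

e4 = np.zeros(5); e4[3] = 1.0                    # desired minimum-cardinality solution (688)
print(np.allclose(A @ e4, b))                    # True: e_4 reproduces b exactly

x_pinv = np.linalg.pinv(A) @ b                   # minimum-norm solution, cf. (689)
print(np.round(x_pinv, 4), np.linalg.norm(x_pinv))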


6.2.0.0.3 Riemann mapping theorem... Techniques for reconstruction we studied ...thus far... essentially are methods for optimally embedding in a subspace of desired dimension an unknown list of points corresponding to given Euclidean distance data. While these techniques are themselves important and have found useful contemporary application, we turn now to optimal embedding of sampled non-affine manifolds in subspaces.

6.2.0.0.4 Manifold... A smooth manifold is... A smooth manifold topologically homeomorphic with a vector space is embeddable in that vector space. [Whitney, H., "Differentiable Manifolds", Annals of Mathematics (1936)] Contemporary applications generally seek that embedding having lowest dimension because it is believed that obscured information is thereby revealed. [Saul]

6.2.0.0.5 Ye projection for dimension reduction...

6.2.0.0.6 Linear Complementarity Problem as SDP... cite SIAM Review vol.46 no.2 June 2004...




Chapter 7

EDM proximity

  In summary, we find that the solution to problem [(703.3) p.248] is difficult and depends on the dimension of the space as the geometry of the cone of EDMs becomes more complex.
  −Hayden, Wells, Liu, & Tarazaga (1991) [49, §3]

A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDM^N closest in some sense to a given matrix of measurements H under the constraint that affine dimension 0 ≤ r ≤ N−1 (§2.2.1, §4.7.2) is predetermined or, rather, constrained by upper bound ρ. That commonality was unified by de Leeuw and Heiser in 1982 [126] [73] who denoted by the term they coined, multidimensional scaling, [85] any reconstruction of a list X in Euclidean space from only inter-point distance information, possibly incomplete (§8) or ordinal (§4.11.2).

Footnote 7.0: © 2001 Jon Dattorro, all rights reserved.


7.0.1 Lower bound

Most of the problems we will encounter in this chapter have the general form:

    minimize_{B ∈ C}   ‖B − A‖_F   (696)

where A ∈ R^{m×n} is given data. This particular objective denotes Euclidean projection of A on the set C (§E.3) which may or may not be convex. When C is convex, the projection is unique; when C is a subspace, then the projection is orthogonal.

Denoting by A = U_A Σ_A Q_A^T and B = U_B Σ_B Q_B^T their full singular value decompositions (§A.6), there exists a tight lower bound on the objective;

    ‖Σ_B − Σ_A‖_F ≤ inf_{U_A, U_B, Q_A, Q_B} ‖B − A‖_F   (697)

This lower bound holds more generally for any orthogonally invariant norm on R^{m×n} (§2.1.1.1) including the Frobenius. [28, §7.4.51]

7.0.2 Measurement matrix H

Ideally, we want a given measurement matrix H ∈ R^{N×N} to conform with the first three Euclidean axioms (§4.2), to belong to the intersection of the orthant of nonnegative matrices R^{N×N}_+ with the symmetric hollow subspace S^N_0 (§2.1.2.2); id est, we want H to belong to the polyhedral cone (§2.7.1)

    K ≜ S^N_0 ∩ R^{N×N}_+   (698)

Yet in practice, H can possess significant measurement uncertainty.

Sometimes an optimization problem demands that its input, the given matrix H, possess some particular characteristics; perhaps symmetry and hollowness or nonnegativity. When that H given does not have the desired properties, then we must impose them upon H prior to optimization:

• When measurement matrix H is not symmetric or hollow, taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace S^N_0.

• When measurements of distance in H are negative, zeroing negative entries effects unique (minimum distance) projection on the orthant of nonnegative matrices R^{N×N}_+ in isomorphic R^{N²} (§E.8.1.2.1).


7.0.2.1 Order of imposition

Since convex cone K (698) is the intersection of an orthant with a subspace, we want to project on that subset of the orthant belonging to the subspace; on the nonnegative orthant in the symmetric hollow subspace that is, in fact, the intersection. For that reason alone, the unique projection of H on K (that member of K closest to H in R^{N²} in the Euclidean sense) can be obtained by first taking its symmetric hollow part, and only then clipping negative entries of the result to 0; id est, there is only one correct order of projection, in general, on an orthant intersecting a subspace:

• project on the subspace, then project the result on the orthant. (§E.8.3)

In contrast, the order of projection on an intersection of subspaces is arbitrary.

That order-of-projection rule applies more generally, of course, to the intersection of any convex set C with any subspace.^{7.1} Consider the projective optimization^{7.2} over convex set S^N_0 ∩ C given nonsymmetric nonhollow H:

    minimize_{B ∈ S^N_0}   ‖B − H‖_F^2
    subject to             B ∈ C   (699)

Because the symmetric hollow subspace is orthogonal to the antisymmetric antihollow subspace (§2.1.2.2), then for B ∈ S^N_0,

    tr( B^T ( (1/2)(H − H^T) + δ^2(H) ) ) = 0   (700)

so the objective function is equivalent to

    ‖B − H‖_F^2 ≡ ‖ B − ( (1/2)(H + H^T) − δ^2(H) ) ‖_F^2 + ‖ (1/2)(H − H^T) + δ^2(H) ‖_F^2   (701)

Footnote 7.1: There are special cases where order of projection is arbitrary; e.g., when all the faces of C are perpendicular to the intersecting subspace.
Footnote 7.2: There are two equivalent interpretations of projection (§E.3, §E.8): one finds the perpendicular, the other, minimum distance between a point and a set. In projective optimization we realize the latter view.


[Figure 7.1: pseudo-Venn diagram drawn within R^{N×N}, showing S^N, S^N_0, the EDM cone EDM^N, and K = S^N_0 ∩ R^{N×N}_+ about the origin.]

Figure 7.1: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant; EDM^N ⊆ K (332). EDM^N cannot exist outside S^N_0, but R^{N×N}_+ does.

This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under the Frobenius norm; id est, minimization proceeds as though given the symmetric hollow part of H.

This action of the Frobenius norm (701) is effectively a projection of H on the symmetric hollow subspace S^N_0 prior to minimization. Thus the minimization proceeds inherently following the correct order for projection on S^N_0 ∩ C. Therefore we may assume H ∈ S^N_0 or take its symmetric hollow part prior to optimization.

7.0.2.2 Egregious input error under nonnegativity demand

More pertinent to the optimization problems presented herein where

    C ≜ EDM^N ⊆ K = S^N_0 ∩ R^{N×N}_+   (702)

then should some particular realization of a projective optimization problem demand input H be nonnegative, and were we only to zero negative entries of


a nonsymmetric nonhollow input H prior to optimization, then the ensuing projection on EDM^N would be guaranteed incorrect (out of order).

[Figure 7.2: pseudo-Venn diagram labelled with H, S^N_0, the origin, EDM^N, and K = S^N_0 ∩ R^{N×N}_+.]

Figure 7.2: Pseudo-Venn diagram from Figure 7.1 showing elbow placed in path of projection of H on EDM^N ⊂ S^N_0 by an optimization problem demanding nonnegative input matrix H. The first two line segments leading away from H result from correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative, then its projection on S^N_0 would instead belong to K; making the elbow disappear. (confer Figure E.3)

Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization, then the ensuing projection on EDM^N will still be incorrect whenever the original H has negative entries. This is best understood referring to Figure 7.1: Suppose some projective optimization problem demands a nonnegative input H. Further suppose that same optimization problem correctly projects its input first on S^N_0 and then directly on C = EDM^N. Knowing this, then the demand for nonnegative input H effectively requires imposition of K on H prior to optimization so as to obtain correct order of projection; id est, S^N_0 first. Yet such an imposition prior to projection on EDM^N generally introduces an elbow into the path of projection (illustrated in Figure 7.2) caused by the technique itself; a


particular optimization problem formulation requiring nonnegative input. Any procedure for imposition of nonnegativity on input H is incorrect in this circumstance. There is no resolution unless input H is guaranteed nonnegative with no tinkering. Otherwise, we have no choice but to employ a different optimization technique; one not demanding nonnegative input.

7.0.3 Three prevalent fundamental problems

There are three statements of the closest-EDM problem prevalent in the literature, the multiplicity due primarily to concessions for hard problems and vacillation between the distance-square variable d_ij versus absolute distance √d_ij; they are (703.1), (703.2), and (703.3):

    (1)  minimize_D   ‖−V(D − H)V‖_F^2          (2)  minimize_D   ‖√D − H‖_F^2
         subject to   rank(V D V) ≤ ρ                subject to   rank(V D V) ≤ ρ
                      D ∈ EDM^N                                   D ∈ EDM^N

    (3)  minimize_D   ‖D − H‖_F^2               (4)  minimize_D   ‖−V(√D − H)V‖_F^2
         subject to   rank(V D V) ≤ ρ                subject to   rank(V D V) ≤ ρ
                      D ∈ EDM^N                                   D ∈ EDM^N
                                                                               (703)

where ρ is the upper bound on affine dimension and D ≜ [d_ij]. Problem (703.2) has the associated moniker stress [127] ascribed by the mathematical community,^{7.3} while (703.1) has the nickname strain; [100] [22] two nebulous mnemonics. Problem (703.4) is not posed in the literature because it has limited theoretical foundation. (§4.13)

Generally speaking, each problem produces different results. Of the various auxiliary matrices (§B.4), the geometric centering matrix V (996) ap-

Footnote 7.3: I have been interested in the concept of stress in a mathematical setting for some time. It is true that the way I use it is not exactly the same as what it means in physics or elasticity theory. But it is closely related. I did not want to take the time to motivate the word in this paper. I tend to use the word "stress" as the scalar coefficients for the squared distances in an energy functional, not as a force. The physical sense of stress could be used, but it is distracting from the simple mathematical ideas that are involved. Again one can translate, but my motivation is from geometry not engineering or even physics. Nevertheless, the idea is quite useful. Indeed, "stress matrices" explain a lot about the "rigidity" of many tensegrity structures, and they do it in a reasonable way, in my opinion.   −Robert Connelly


pears in the literature most often although V_N (339) is the auxiliary matrix naturally consequent to Schoenberg's seminal exposition [87].

Substitution of V_N^T for left-hand V in (703.1) produces a different result because minimization of

    ‖−V(D − H)V‖_F^2   (704)

selects D to minimize the distance of −VHV to the positive semidefinite cone in ambient S^N, whereas minimization of

    ‖−V_N^T(D − H)V_N‖_F^2   (705)

minimizes the distance of −V_N^T H V_N to the positive semidefinite cone in S^{N−1}; quite different mappings.^{7.4} But substitution of V_W^T (§B.4.3) or V_N^† in (703.1) instead yields the same result because V = V_W V_W^T = V_N V_N^†; id est,

    ‖−V(D − H)V‖_F^2 = ‖−V_W V_W^T(D − H)V_W V_W^T‖_F^2 = ‖−V_W^T(D − H)V_W‖_F^2
                     = ‖−V_N V_N^†(D − H)V_N V_N^†‖_F^2 = ‖−V_N^†(D − H)V_N‖_F^2   (706)

Problems (703.2) and (703.3) are convex optimization problems in the case ρ = N−1. When the rank constraint is removed from (703.2), we will see the convex problem remaining inherently minimizes affine dimension.

7.1 First prevalent problem: Projection on PSD cone

This first problem

    minimize_D   ‖−V_N^T(D − H)V_N‖_F^2
    subject to   rank(V_N^T D V_N) ≤ ρ          Problem 1   (707)
                 D ∈ EDM^N


poses a projection [37, §3.12] of −V_N^T H V_N in subspace S^{N−1} on a non-convex subset (when ρ < N−1) of the positive semidefinite cone S^{N−1}_+ whose matrices have rank no greater than desired affine dimension ρ. Problem 1 finds the closest EDM D in the sense of Schoenberg. (352) [87] This optimization problem is convex only when desired affine dimension (§4.7.2) is largest, ρ = N−1, although its analytical solution is known [128] for all ρ ≤ N−1, being first pronounced in the context of multidimensional scaling by Mardia [129] in 1978: We assume only that the given measurement matrix H is symmetric;^{7.5}

    H ∈ S^N   (708)

Arranging the eigenvalues λ_i of −V_N^T H V_N in nonincreasing order for all i, λ_i ≥ λ_{i+1}, with v_i the corresponding i-th eigenvector,

    −V_N^T H V_N ≜ Σ_{i=1}^{N−1} λ_i v_i v_i^T   (709)

then the optimal solution to Problem 1 is [73, §2]

    −V_N^T D* V_N = Σ_{i=1}^ρ max{0, λ_i} v_i v_i^T   (710)

7.1.1 Closest EDM Problem 1, proof of (710), convex case

When desired affine dimension is largest, ρ = N−1, the rank function disappears from (707); videlicet

    minimize_D   ‖−V_N^T(D − H)V_N‖_F^2
    subject to   D ∈ EDM^N   (711)

In those terms, assuming D ∈ S^N [sic], the necessary and sufficient conditions (§E.8.1.0.1) for unique projection in isomorphic R^{(N−1)²} on the self-dual (199)

Footnote 7.5: Projection in Problem 1 is on a rank ρ subset (§7.1.2) of the positive semidefinite cone in the ambient space of symmetric matrices. It is wrong here to zero the main diagonal of given H because projecting first on the symmetric hollow subspace places an elbow in the path of projection. (confer Figure 7.2)


positive semidefinite cone are:^{7.6} (943) (confer (1306))

    −V_N^T D* V_N ≽ 0
    −V_N^T D* V_N ( −V_N^T D* V_N + V_N^T H V_N ) = 0
    −V_N^T D* V_N + V_N^T H V_N ≽ 0   (712)

Symmetric −V_N^T H V_N is diagonalizable hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (709). Therefore, (confer (710))

    −V_N^T D* V_N = Σ_{i=1}^{N−1} max{0, λ_i} v_i v_i^T   (713)

satisfies (712), optimally solving (711). To see that, recall: these eigenvectors constitute an orthogonal set and

    −V_N^T D* V_N + V_N^T H V_N = −Σ_{i=1}^{N−1} min{0, λ_i} v_i v_i^T   (714)

7.1.2 rank ρ subset

Prior to determination of D*, analytical solution (710) is equivalent to solution of the generic problem: given A ∈ S^{N−1} and desired affine dimension ρ,

    minimize_{B ∈ S^{N−1}}   ‖B − A‖_F^2
    subject to               rank B ≤ ρ
                             B ≽ 0   (715)

where B ≜ −V_N^T D V_N ∈ S^{N−1} and A ≜ −V_N^T H V_N ∈ S^{N−1}. Once optimal B* is found, the technique of §4.10 can be used to determine the optimal distance matrix

    D* ∈ EDM^N   (716)

Hence Problem 1 is truly a projection of −V_N^T H V_N on that non-convex subset of symmetric matrices, belonging to the positive semidefinite cone, having rank no greater than desired affine dimension^{7.7} ρ; called the rank ρ subset of the positive semidefinite cone.

Footnote 7.6: These conditions for projection on a convex cone are identical to the Karush-Kuhn-Tucker (KKT) optimality conditions for problem (711).
Footnote 7.7: Recall that affine dimension is a lower bound on embedding (§2.2.1), equal to the dimension of the smallest affine set in which points from a list X corresponding to an EDM D can be embedded.
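The analytical solution is simple to implement (a Python/NumPy sketch; the map from optimal B* back to D* shown here uses the inner-product form (611), one realization of the technique referenced from §4.10, and all names are illustrative):

import numpy as np

def closest_edm_problem1(H, rho):
    # solution (710): project -V_N^T H V_N on the rank-rho subset of S^{N-1}_+,
    # then map B* back to an EDM via (611) with Phi = B* (x_1 at the origin)
    N = H.shape[0]
    VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)   # (339)
    lam, U = np.linalg.eigh(-VN.T @ H @ VN)
    order = np.argsort(lam)[::-1]                # nonincreasing, as in (709)
    lam, U = lam[order], U[:, order]
    lam_clipped = np.where(np.arange(N - 1) < rho, np.maximum(lam, 0), 0)
    B = (U * lam_clipped) @ U.T                  # -V_N^T D* V_N per (710)
    G = np.zeros((N, N)); G[1:, 1:] = B
    d = np.diag(G)
    return d[:, None] + d[None, :] - 2 * G       # D*, an EDM with affine dimension <= rho

# usage: a noisy EDM of planar points projected back with rho = 2
rng = np.random.default_rng(4)
X = rng.standard_normal((2, 6))
G = X.T @ X
D = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
E = rng.standard_normal((6, 6)); E = (E + E.T) / 2; np.fill_diagonal(E, 0)
Dstar = closest_edm_problem1(D + 0.01 * E, rho=2)
print(np.abs(Dstar - D).max())                   # small residual

Any hollow symmetric D* having the same image −V_N^T D* V_N = B* attains the same objective value, which is why the particular reconstruction above suffices.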


7.1.3 Closest EDM Problem 1, non-convex case

It is curious how non-convex Problem 1 has such a simple analytical solution (710). Solution to the generic problem (715) was presented in [128, thm.14.4.2] in 1979. In 1997 Trosset [73] first observed equivalence of this rank-constrained problem to spectral projection on the monotone nonnegative cone K_{M+}. He generalized the problem and its solution by admitting projection on any convex subset of K_{M+}. The nature of Trosset's proof was algebraic. Here we derive the known solution (710) for a rank ρ subset of the positive semidefinite cone using instead a more geometric argument (subsuming the proof in §7.1.1):

7.1.3.0.1 Proof. (710), non-convex case.
With diagonalization of unknown

    B ≜ UΥU^T ∈ S^{N−1}   (717)

given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalizable

    A ≜ QΛQ^T ∈ S^{N−1}   (718)

having eigenvalues arranged in Λ in nonincreasing order, problem (715) is equivalent to a spectral projection: (28)

    minimize_{U,Υ}   ‖Υ − U^T QΛQ^T U‖_F^2
    subject to       δ(Υ) ∈ K^ρ_{M+}
                     U^{−1} = U^T   (719)

where δ is the linear main-diagonal operator (§A.1), and where

    K^ρ_{M+} ≜ { υ ∈ R^{N−1} | υ_1 ≥ υ_2 ≥ · · · ≥ υ_ρ ≥ υ_{ρ+1} = υ_{ρ+2} = · · · = υ_{N−1} = 0 } ⊆ R^ρ_+   (720)

is a pointed polyhedral cone, a ρ-dimensional subset of the monotone nonnegative cone K_{M+} in R^{N−1} (§2.9.2.2.1),

    K_{M+} = { υ | υ_1 ≥ υ_2 ≥ · · · ≥ υ_{N−1} ≥ 0 } ⊆ R^{N−1}_+   (244)


where K^{N−1}_{M+} ≜ K_{M+}. Problem (719) is equivalent to the problem sequence: [29, §0.1.2]

    (a)  minimize_Υ   ‖Υ − R^TΛR‖_F^2          (b)  minimize_R   ‖Υ − R^TΛR‖_F^2
         subject to   δ(Υ) ∈ K^ρ_{M+}               subject to   R^{−1} = R^T   (721)

where

    R ≜ Q^T U   (722)

is a bijection in U on the set of orthogonal matrices. Problem (721a) is a unique projection of R^TΛR on a convex subset of the monotone nonnegative cone; on δ(K^ρ_{M+}) in isometrically isomorphic R^{(N−1)²}.^{7.8} Projection can be accomplished by first projecting R^TΛR orthogonally on the ρ-dimensional subspace containing K^ρ_{M+}, and then projecting the result on K^ρ_{M+} itself: (§E.8.3) Projection on that subspace amounts to zeroing all entries of R^TΛR off its main diagonal and the last N−1−ρ entries along its main diagonal. Then we have the equivalent problem sequence:

    (a)  minimize_Υ̃   ‖δ(Υ̃) − δ(R̃^TΛR̃)‖^2          (b)  minimize_R̃   ‖δ(Υ̃) − δ(R̃^TΛR̃)‖^2
         subject to   δ(Υ̃) ∈ K̃_{M+}                      subject to   R̃^T R̃ = I   (723)

where

    δ(Υ̃) ≜ δ(Υ)_{1:ρ} ∈ R^ρ   (724)

and δ(Υ)_{ρ+1:N−1} = 0, and where

    K̃_{M+} ≜ { υ | υ_1 ≥ υ_2 ≥ · · · ≥ υ_ρ ≥ 0 } ⊆ R^ρ_+   (244)

Footnote 7.8: Isometry is a consequence of the fact, for diagonal Υ_1, Υ_2 ∈ S^{N−1}, [38, §3.2]
    〈δ(Υ_1), δ(Υ_2)〉 = 〈Υ_1, Υ_2〉
meaning, distances are preserved in the map from R^ρ to the vectorization in R^{(N−1)²}.


is dimensionally redefined hence a simplicial cone with respect to R^ρ, and where

    R̃ ≜ [r_1  r_2  · · ·  r_ρ] ∈ R^{N−1×ρ}   (725)

constitutes the first ρ columns of orthogonal variable R. Because any permutation matrix is an orthogonal matrix, we may always assume δ(R̃^TΛR̃) ∈ R^ρ to be arranged in nonincreasing order; id est,

    δ(R̃^TΛR̃)_i ≥ δ(R̃^TΛR̃)_{i+1} ,   i = 1 ... ρ−1   (726)

Unique projection of vector δ(R̃^TΛR̃) on convex cone K̃_{M+} requires: (§E.8.1.0.1, §2.9.2.2.1)

    Y^† δ(Υ̃*) ≽ 0
    δ(Υ̃*)^T ( δ(Υ̃*) − δ(R̃^TΛR̃) ) = 0
    Y^T ( δ(Υ̃*) − δ(R̃^TΛR̃) ) ≽ 0   (727)

which are necessary and sufficient conditions, where

    Y^{†T} ≜ [e_1−e_2  e_2−e_3  · · ·  e_ρ] ∈ R^{ρ×ρ}   (245)
    Y = [e_1  e_1+e_2  e_1+e_2+e_3  · · ·  1] ∈ R^{ρ×ρ}   (249)

Any value Υ̃* that satisfies conditions (727) is optimal for (723a). Then the relationship

    δ(Υ*)_i = { max{0, δ(R̃^TΛR̃)_i} ,  i = 1 ... ρ ;   0 ,  i = ρ+1 ... N−1 }   (728)

specifies an optimal solution. The lower bound on the objective with respect to R in (721b) is tight; (697) (§C.2.1.2.1)

    ‖ |Υ*| − |Λ| ‖_F ≤ ‖Υ* − R^TΛR‖_F   (729)

where |·| denotes absolute entry-value. For selection of Υ* as in (728), this lower bound is achieved when

    R* = I   (730)

the known solution.


7.1.3.1 Significance...

The importance of this result cannot be overstated:

Firstly, the solution is closed form, having been discovered only in 1979 (relatively late in the development of linear algebra that has its roots in the 17th century [130]).

Secondly, the solution is equivalent to projection on a polyhedral cone in the spectral domain; a necessary and sufficient condition for membership to the positive semidefinite cone...

Thirdly, the solution at once includes projection on a rank ρ subset (part of the boundary of the positive semidefinite cone) from the exterior or interior of the cone. Projection on the boundary from the interior is a non-convex problem...

7.1.4 Problem 1 in spectral norm, convex case

When instead we pose the matrix 2-norm (spectral norm) in Problem 1 (707) for the convex case ρ = N−1, then the new problem

    minimize_D   ‖−V_N^T(D − H)V_N‖_2
    subject to   D ∈ EDM^N   (731)

is convex although its solution is not necessarily unique because the spectral norm is not a strictly convex function (§3.1);^{7.9} giving rise to oblique projection (§E) on the positive semidefinite cone. Indeed, its solution set includes the Frobenius solution (710) for the convex case whenever −V_N^T H V_N is a normal matrix. [97, §1] [131] [1, §8.1.1] Singular value problem (731) is equivalent to

    minimize_{µ,D}   µ
    subject to       −µI ≼ −V_N^T(D − H)V_N ≼ µI
                     D ∈ EDM^N   (732)

where

    µ* = max{ |λ(−V_N^T(D* − H)V_N)_i| ,  i = 1 ... N−1 }   (733)

the maximum absolute eigenvalue (due to matrix symmetry).

Footnote 7.9: For each and every |t| ≤ 2, the matrix [[2, 0],[0, t]], for example, has the same spectral norm.


For lack of a unique solution here, we prefer the Frobenius rather than spectral norm.

7.2 Second prevalent problem: Projection on EDM cone in √d_ij

Let

    √D ≜ [√d_ij] ∈ K   (734)

be an unknown matrix of absolute distance; id est,

    D = [d_ij] ≜ √D ∘ √D ∈ EDM^N   (735)

where ∘ denotes the Hadamard product. The second prevalent problem is a projection of H in the natural coordinates (absolute distance) on a non-convex subset (when ρ < N−1) of the convex cone of Euclidean distance matrices EDM^N in subspace S^N_0: (confer Figure 5.1(b))

    minimize_D   ‖√D − H‖_F^2
    subject to   rank(V_N^T D V_N) ≤ ρ          Problem 2   (736)
                 D ∈ EDM^N

This statement of the second closest-EDM problem is considered difficult to solve because of the constraint on affine dimension ρ (445) and because the objective function

    ‖√D − H‖_F^2 = Σ_{i,j} (√d_ij − h_ij)^2   (737)

is expressed distinctly in the natural coordinates with respect to the constraints.

Our solution to this second problem prevalent in the literature generally requires measurement matrix H to be nonnegative;

    H = [h_ij] ∈ R^{N×N}_+   (738)

If the H given has negative entries, then the technique of solution presented here becomes invalid. As explained in §7.0.2, projection of H on K (698) prior to application of this proposed solution is incorrect.


7.2.1 Convex case

When ρ = N−1, the rank constraint vanishes and a convex problem appears: [132] [100, §13.6]

    minimize_D   ‖√D − H‖_F^2          ⇔          minimize_D   Σ_{i,j} ( d_ij − 2h_ij√d_ij + h_ij^2 )
    subject to   D ∈ EDM^N                        subject to   D ∈ EDM^N   (739)

For any fixed i and j, the argument of summation is a convex function of d_ij because, for nonnegative h_ij, the term −2h_ij√d_ij is convex in nonnegative d_ij (the negative square root being convex) and because d_ij + h_ij^2 is affine (hence convex). Because the sum of any number of convex functions in D remains convex [1, §3.2.1] and because the feasible set is convex in D, we have a convex problem. The existence of a unique (global) solution D* for this second prevalent problem depends upon nonnegativity of H.

7.2.1.1 Equivalent semidefinite program, Problem 2, convex case

Convex problem (739) is solvable for its global minimum by any interior-point method [1, §11] [4] [121] [6] [42] and requires no further modification. Nevertheless, we translate (739) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in §7.2.2, and because there exist readily available and efficient computer programs for numerical solution. [69] [7] [11] [117] [118]

Substituting a new matrix variable Y ≜ [y_ij] ∈ R^{N×N},

    y_ij ← h_ij √d_ij   (740)

we propose that (739) is equivalent to the SDP

    minimize_{D,Y}   Σ_{i,j} ( d_ij − 2y_ij + h_ij^2 )
    subject to       [[d_ij, y_ij],[y_ij, h_ij^2]] ≽ 0 ,   i,j = 1 ... N
                     D ∈ EDM^N   (741)

To see that, recall d_ij ≥ 0 is implicit to D ∈ EDM^N (§4.8.1, (352)). So when


H is nonnegative as assumed,

    [[d_ij, y_ij],[y_ij, h_ij^2]] ≽ 0   ⇔   h_ij √d_ij ≥ √(y_ij^2)   (742)

Because negative y_ij will not minimize the objective function, nonnegativity of y_ij is implicit in (741). Further, minimization of the objective function implies maximization of y_ij that is bounded above. Hence, as desired, y_ij → h_ij √d_ij as optimization proceeds.

If the given matrix H is now assumed nonnegative and symmetric,

    H = [h_ij] ∈ R^{N×N}_+ ∩ S^N   (743)

then Y = H ∘ √D must belong to K = S^N_0 ∩ R^{N×N}_+ (698). Then because Y ∈ S^N_0, (§B.4.2 no.11)

    ‖√D − H‖_F^2 = Σ_{i,j} ( d_ij − 2y_ij + h_ij^2 ) = −N tr( V_N^†(D − 2Y)V_N ) + Σ_{i,j} h_ij^2   (744)

convex problem (741) is equivalent to the SDP

    minimize_{D,Y}   −tr( V_N^†(D − 2Y)V_N )
    subject to       [[d_ij, y_ij],[y_ij, h_ij^2]] ≽ 0 ,   j > i = 1 ... N−1
                     Y ∈ S^N_0
                     D ∈ EDM^N   (745)

where the constants h_ij^2 and N have been dropped arbitrarily from the objective.
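For reference, the convex case (739) can also be posed directly in a modeling language (a sketch assuming the cvxpy package with an SDP-capable solver such as SCS installed, and a measurement matrix H that is nonnegative, symmetric, and hollow; the function name and the auxiliary PSD variable B are illustrative only):

import numpy as np
import cvxpy as cp

def closest_edm_absolute(H):
    # convex case of Problem 2, formulation (739)
    N = H.shape[0]
    VN = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)
    D = cp.Variable((N, N), symmetric=True)
    B = cp.Variable((N - 1, N - 1), PSD=True)        # enforces Schoenberg criterion (352)
    # expanded objective: sum_ij (d_ij - 2 h_ij sqrt(d_ij) + h_ij^2), convex for H >= 0
    objective = cp.sum(D) - 2 * cp.sum(cp.multiply(H, cp.sqrt(D))) + np.sum(H ** 2)
    constraints = [cp.diag(D) == 0, B == -VN.T @ D @ VN]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return D.value

The explicit 2×2 linear matrix inequalities of (741)/(745) serve the same purpose when such a modeling layer is unavailable.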


7.2.2 Minimization of affine dimension in Problem 2

When desired affine dimension ρ is diminished, the rank function becomes reintroduced into problem (741) that is then rendered difficult to solve because the feasible set loses convexity in S^N_0 × R^{N×N}. Indeed, the rank function is quasiconcave on the positive semidefinite cone. (§3.2.0.0.1)

7.2.2.1 Rank minimization heuristic

A remedy developed in [133] [72] [134] introduces the convex envelope (cenv) of the rank function:

Definition. Convex envelope. [135] The convex envelope of a function f: C → R is defined as the largest convex function g such that g ≤ f on convex domain C ⊆ R^n.^{7.10}   △

• The convex envelope of the rank function is proportional to the trace function when its argument is constrained to be symmetric and positive semidefinite.

A properly scaled trace thus represents the best convex lower bound on rank. The idea, then, is to substitute the convex envelope for the rank of some variable A ∈ S^M_+;

    rank A ← cenv(rank A) ∝ tr A   (746)

7.2.2.1.1 Applying the trace heuristic to Problem 2

Substituting the rank envelope for the rank function (445), for D ∈ EDM^N,

    cenv rank(−V_N^T D V_N) = cenv rank(−V_N^† D V_N) ∝ −tr(V_N^† D V_N)   (747)

and for ρ ≤ N−1 and nonnegative H we have the convex optimization problem

    minimize_D   ‖√D − H‖_F^2
    subject to   −tr(V_N^† D V_N) ≤ κρ
                 D ∈ EDM^N   (748)

Footnote 7.10: Provided f ≢ +∞ and there exists an affine function h ≤ f on R^n, then the convex envelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convex conjugate of f; id est, the conjugate-conjugate function f**. [29, §E.1]


where κ ∈ R_+ is a constant determined by cut and try. 7.11 The equivalent SDP makes κ variable: for nonnegative and symmetric H,

   minimize_{D,Y,κ}   κρ + 2 tr( V†_N Y V_N )
   subject to   [ d_ij  y_ij ; y_ij  h²_ij ] ≽ 0 ,   j > i = 1...N−1   (749)
                −tr( V†_N D V_N ) ≤ κρ
                Y ∈ S^N_0
                D ∈ EDM^N

which is the same as (745). As the problem is stated, the desired affine dimension ρ yields to the variable scale factor κ ; ρ is effectively ignored.

Yet the result is an illuminant for (745) and its equivalents; when the given measurement matrix H is nonnegative and symmetric, finding the closest EDM D as in problem (739), (741), or (745) implicitly entails minimization of affine dimension (confer §4.8.4, §4.12.4). In other words, those non rank-constrained problems are each inherently equivalent to cenv(rank)-minimization problem (749).

7.2.2.2 Heuristic insight

Minimization of affine dimension by using this trace heuristic tends to find the list configuration of least energy; rather, it tends to optimize compaction of the reconstruction by minimizing total distance. (357) It is best used where some physical equilibrium implies such an energy minimization.

In the case of Problem 2, the trace heuristic arose naturally in the objective. Our preference for V†_N over V^T_N spreads the energy over all available distances (§B.4.2 no.11, no.13) although the rank function itself is insensitive to the choice (445).

7.11 cenv(rankA) on {A ∈ S^n_+ | ‖A‖_2 ≤ κ} = tr(A)/κ [133]

7.2.2.3 Rank minimization heuristic beyond convex envelope

Fazel, Hindi, & Boyd [134] propose a rank heuristic more potent than trace (747) for problems of rank minimization; the concave surrogate function

   log det(Y + εI) ← rank Y                                       (750)


7.2. SECOND PREVALENT PROBLEM: 261in place of quasiconcave rankY (§3.2.0.0.1) when Y ∈ S n + is variable andε is a small positive constant. They propose minimization of the surrogateby substituting a sequence comprising infima of a linearized surrogate aboutthe current estimate Y i ; id est, from the first-order Taylor series expansionabout Y i on some open interval of ‖Y ‖ , (§D.1.6)log det(Y + εI) ≈ log det(Y i + εI) + tr ( (Y i + εI) −1 (Y − Y i ) ) (751)we make the surrogate sequence of infima over bounded convex feasible set C ,wherearg inf rankY ← lim Y i+1 (752)Y ∈ C i→∞Y i+1 = arg infY ∈ C tr( (Y i + εI) −1 Y ) (753)Choosing Y 0 = I , the first iteration becomes equivalent to a minimizationof trY . (747) The intuition underlying (753) is the new term in theargument of trace; specifically, (Y i + εI) −1 weights Y so that relatively smalleigenvalues of Y found by the infimum are made even smaller.To see that, substitute into (753) the non-increasingly ordered diagonalizationsY i + εI = ∆ Q(Λ + εI)Q T (a)(754)Y = ∆ UΥU T(b)Then from (1034) we have,inf δ((Λ + εI) −1 ) T δ(Υ) =Υ∈U ⋆T CU ⋆≤infΥ∈U T CUinf tr ( (Λ + εI) −1 R T ΥR )R T =R −1inf tr((Y i + εI) −1 Y )Y ∈ C(755)where R = ∆ Q T U is a bijection in U on the set of orthogonal matrices. Therole of ε is, therefore, to limit the maximum weight; the smallest diagonalentry of Υ gets the largest weight.7.2.2.3.1 Tightening this rank heuristicThis log det technique is therefore inherently different from projection methodsof rank constraint (such as §7.1.3) because the number of eigenvalues to


262 CHAPTER 7. EDM PROXIMITYbe zeroed is not determined a priori. As it stands, there is no provision formeeting an upper bound ρ on rank. Yet since the eigenvalues of the sum aresimply determined, λ(Y i + εI)=δ(Λ + εI) , we may certainly force selectedweights to ε −1 by manipulating diagonalization (754a). Empirically we findthis sometimes leads to better results, although affine dimension of a solutioncannot be guaranteed.7.3 Third prevalent problem:Projection on EDM cone in d ijReformulating Problem 2 (p.256) in terms of EDM D changes the problemconsiderably:⎫minimize ‖D − H‖ 2 F ⎪⎬Dsubject to rankVN TDV N ≤ ρ Problem 3 (756)⎪D ∈ EDM N ⎭This third prevalent problem is a projection of H on a non-convex subset(when ρ < N − 1) of the convex cone of Euclidean distance matrices EDM Nin subspace S N 0 (Figure 5.1(a)). Because the coordinates of projection aredistance-square and H presumably now holds distance-square measurements,the solution to Problem 3 is generally different than that of Problem 2 . 7.12For the moment, we need make no assumptions regarding measurementmatrix H .7.3.1 <strong>Convex</strong> caseminimize ‖D − H‖ 2 FD(757)subject to D ∈ EDM NWhen the rank constraint disappears (for ρ = N −1), this third problembecomes obviously convex because the feasible set is then the entire EDMcone, and because the objective function‖D − H‖ 2 F = ∑ i,j(d ij − h ij ) 2 (758)7.12 We speculate this third problem to be prevalent because it was once thought easier tosolve than Problem 2.


7.3. THIRD PREVALENT PROBLEM: 263is a strictly convex quadratic in D ; 7.13∑minimize d 2 ij − 2h ij d ij + h 2 ijDi,j(759)subject to D ∈ EDM N7.3.1.1 Equivalent semidefinite programs, convex caseIn the past, this convex problem was solved numerically by means of alternatingprojection. (Example 7.3.1.1.1) [136] [137] [49, §1] We translate (759)to an equivalent semidefinite program because we have a good SDP solver:Now assume the given measurement matrix H to be positive, symmetric,and hollow; 7.14H = [h ij ] ∈ S N 0 ∩ int R N×N+ (760)For Y = [y ij ] , and ∂ = ∆ [d 2 ij ]=D ◦D distance-square squared, we substitutey ij ← h ij d ij (761)Similarly to the development in §7.2.1.1, we then propose: Problem (759) isequivalent to the SDP( )minimize − tr V † N (∂ − 2Y )V N∂ , Y[ ]∂ij ysubject to ijy ij h 2 ≽ 0, j > i = 1... N − 1ij(762)YH ∈ EDMN∂ ∈ S N 0 ∩ R N×N+where Y/H = ∆ [y ij /h ij ] for i≠j , h ij ≠0 for i≠j , and[ ]√∂ij y ij≽ 0 ⇔ h ij d ij ≥ yij 2 , ∂ ij ≥ 0 (763)y ijh 2 ij7.13 For nonzero Y ∈ S N 0 and some open interval of t∈R , (§3.1.2.4, §D.2.2.1)d 2dt 2 ‖(D + tY ) − H‖2 F = 2 tr Y T Y > 07.14 If that H given has negative entries, then the technique of solution presented herebecomes invalid. As explained in §7.0.2, projection of H on K (698) prior to applicationof this proposed solution is incorrect.


For the same reasons as before, y_ij → h_ij d_ij as optimization proceeds. The similarity of problem (762) to (745) cannot go unnoticed, but the possibility of numerical instability due to division by small numbers here is troublesome.

7.3.1.1.1 Example. Alternating projection on nearest EDM.
By solving (762) we confirm the result from an example given by Glunt, Hayden, et alii [136, §6] who found an analytical solution to (757) for the particular cardinality N = 3 by using the alternating projection method of von Neumann (§E.9):

   H = [ 0  1  1 ;  1  0  9 ;  1  9  0 ] ,   D⋆ = [ 0  19/9  19/9 ;  19/9  0  76/9 ;  19/9  76/9  0 ]        (764)

Let the alternate convex sets be the two cones

   K_1 = S^N_0
   K_2 = ⋂_{y ∈ N(1^T)} { A ∈ S^N | ⟨yy^T, −A⟩ ≥ 0 }
       = { A ∈ S^N | −y^T V A V y ≥ 0  ∀ yy^T (≽ 0) }                 (765)
       = { A ∈ S^N | −V A V ≽ 0 }

where auxiliary matrix V ∈ S^N is the geometric centering matrix (996), so

   K_1 ∩ K_2 = EDM^N                                                  (766)

The dual cone K*_1 = S^{N⊥}_0 ⊆ S^N (51) is an orthogonal subspace, and from the dual EDM cone development in §5.6.1,

   K*_2 = −cone{ V_N υυ^T V^T_N | υ ∈ R^{N−1} } ⊂ S^N                  (767)

In [137, §5.3] Gaffke & Mathar observe that projections on K_1 and K_2 have simple closed forms: projection on subspace K_1 is easily performed by symmetrization and zeroing the main diagonal or vice versa, while projection of H ∈ S^N on K_2 is

   P_{K_2} H = H − P_{S^N_+}(V H V)                                    (768)

Thus the original problem (757) of projecting H on the EDM cone is transformed, basically, to an equivalent sequence of projections on the PSD cone.
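The two closed-form projections above are easy to carry out numerically. A minimal numpy sketch of the alternation follows, assuming nothing beyond the formulas just given (symmetrize and zero the main diagonal for K_1, and (768) for K_2); whether plain alternation reproduces D⋆ of (764) is the claim reported next in the text, which this sketch merely carries out. Function names are illustrative.

    import numpy as np

    def proj_K1(A):                      # projection on S^N_0: symmetrize, zero diagonal
        A = (A + A.T) / 2
        return A - np.diag(np.diag(A))

    def proj_psd(A):                     # projection on the PSD cone: zero negative eigenvalues
        w, Q = np.linalg.eigh((A + A.T) / 2)
        return Q @ np.diag(np.maximum(w, 0)) @ Q.T

    def proj_K2(A):                      # (768): A - P_PSD(V A V), V the centering matrix
        N = A.shape[0]
        V = np.eye(N) - np.ones((N, N)) / N
        return A - proj_psd(V @ A @ V)

    H = np.array([[0., 1., 1.],
                  [1., 0., 9.],
                  [1., 9., 0.]])
    D = H.copy()
    for _ in range(100):                 # alternate the two projections
        D = proj_K1(proj_K2(D))
    print(np.round(D, 4))                # per the text, approaches D* of (764)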


7.3. THIRD PREVALENT PROBLEM: 265Using ordinary alternating projections, input H goes to D ⋆ with anaccuracy of four decimal places in about 17 iterations. Obviation of semidefiniteprogramming’s computational expense is the principal advantage ofthis alternating projection technique.To prove (768), first we observe membership of H −P S N+(V H V ) to K 2becauseP S N+(V H V ) − H = (P S N+(V H V ) − V H V ) + (V H V − H) (769)The term P S N+(V H V ) − V H V must belong to the (dual) positivesemidefinite ( cone by ) Theorem E.8.1.0.1, and V 2 = V . Hence,−V H −P S N+(V H V ) V ≽ 0 by Corollary A.3.1.0.5. Next, we requireExpanding,〈P K2 H −H , P K2 H 〉 = 0 (770)〈−P S N+(V H V ) , H −P S N+(V H V )〉 = 0 (771)〈P S N+(V H V ) , (P S N+(V H V ) − V H V ) + (V H V − H)〉 = 0 (772)〈P S N+(V H V ) , (V H V − H)〉 = 0 (773)Product V H V belongs to the geometric center subspace; (§E.7.1.0.2)V H V ∈ S N g = {Y ∈ S N | N(Y )⊇1} (774)Diagonalize V H V ∆ =QΛQ T (§A.5) whose nullspace is spanned by the eigenvectorscorresponding to 0 eigenvalues by Theorem A.7.2.0.1. Projection ofV H V on the PSD cone (§7.1) simply zeros negative eigenvalues in diagonalmatrix Λ . ThenN(P S N+(V H V )) ⊇ N(V H V ) (⊇ N(V )) (775)from which it follows:P S N+(V H V ) ∈ S N g (776)so P S N+(V H V ) ⊥ (V H V −H) because V H V −H ∈ S N⊥g . Finally, we musthave P K2 H − H = −P S N+(V H V ) ∈ K ∗ 2 . From §5.5.1 we know dual cone


266 CHAPTER 7. EDM PROXIMITYK ∗ 2 =−F ( S N + ∋V ) is the negative of the smallest face of the positive semidefinitecone containing auxiliary matrix V . Thus P S N+(V H V )∈ F ( S N + ∋V ) ⇔N(P S N+(V H V ))⊇ N(V ) (§2.6.6.3) which was already established above.From the results in §E.7.1.0.2, we know V H V is an orthogonal projectionof H ∈ S N on the geometric center subspace S N g . Thus we haveand by (1305) and (179) we getP K2 H = H − P S N+P S N gH (777)ThereforeK 2 = − ( S N + ∩ S N g) ∗= SN⊥g − S N + (778)EDM N = K 1 ∩ K 2 = S N 0 ∩ ( )S N⊥g − S N +(779)7.3.1.2 Schur-form semidefinite program, Problem 3Potential instability in problem (762) motivates another formulation: Movingthe objective function in (759) to the constraints makes an equivalent secondordercone program: for any measurement matrix H ,minimize tD, tsubject to ‖D − H‖ F ≤ t(780)D ∈ EDM NWe can transform this problem to an equivalent Schur-form semidefinite program;(§A.4.1)minimizeD, tsubject tot[]tI vec(D − H)vec T ≽ 0 (781)(D − H) tD ∈ EDM Ncharacterized by great sparsity and structure. The advantage of this SDP isthe lack of requirements on input H ; e.g., nonpositive entries would invalidatethe solution provided by (762). (§7.0.2.2)
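Like (762), the Schur form is only a few lines in a modeling language. A minimal CVXPY sketch of (780) follows, again assuming the Schoenberg-style encoding of EDM^N used earlier and an SDP-capable solver installed; the helper name and the particular choice of V_N are illustrative.

    import numpy as np
    import cvxpy as cp

    def nearest_edm(H):                  # no sign assumption on H is needed here
        N = H.shape[0]
        VN = np.vstack((-np.ones((1, N - 1)), np.eye(N - 1)))
        D = cp.Variable((N, N), symmetric=True)
        G = cp.Variable((N - 1, N - 1), PSD=True)     # G = -V_N^T D V_N
        cons = [cp.diag(D) == 0, G == -VN.T @ D @ VN]
        cp.Problem(cp.Minimize(cp.norm(D - H, 'fro')), cons).solve()
        return D.value

    H = np.array([[0., 1., 1.], [1., 0., 9.], [1., 9., 0.]])
    print(np.round(nearest_edm(H), 4))   # should agree with D* of Example 7.3.1.1.1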


7.3. THIRD PREVALENT PROBLEM: 2677.3.2 Minimization of affine dimension in Problem 3When the desired affine dimension ρ is diminished, Problem 3 (756) is difficultto solve [49, §3] because the feasible set in R N(N−1)/2 loses convexity.By substituting the rank envelope (747) into Problem 3, then for any givenH we have the convex problemminimize ‖D − H‖ 2 FDsubject to − tr(V † N DV N) ≤ κρ(782)D ∈ EDM Nwhere κ ∈ R + is a constant determined by cut and try. Problem (782) is aconvex optimization problem in any affine dimension ρ ; a convex approximationto projection on that non-convex subset of the EDM cone containingEDMs with corresponding affine dimension no greater than ρ .The SDP equivalent to (782) does not move κ into the variables as before(p.259): for positive symmetric hollow input H ,( )minimize − tr V † N (∂ − 2Y )V N∂ , Y[ ]∂ij ysubject to ij≽ 0, j > i = 1... N − 1y ijh 2 ij− tr(V † Y NH V N) ≤ κρYH ∈ EDMN∂ ∈ S N 0 ∩ R N×N+(783)That means we will not see equivalence of this cenv(rank)-minimization problemto the non rank-constrained problems (759) and (762) like we saw for itscounterpart (749) in Problem 2.7.3.3 Tightly constrained affine dimension, Problem 3We use the Cayley-Menger form (§4.14.2) to solve a problem that is the sameas Problem 3 (756):[ ] [ ]∥ minimize−D 1 −H 1 ∥∥∥2D∥ 1 T −0 1 T 0Fsubject to rankVN TDV (784)N ≤ ρD ∈ EDM N


K ρ+2λ268 CHAPTER 7. EDM PROXIMITYa projection of H on a non-convex subset (when ρ < N − 1) of the coneof Euclidean distance matrices EDM N . Relationship (555) provides necessaryand sufficient conditions for membership of D to the whole EDM cone(ρ =N −1); expressed in terms of the spectral coneK λ = {λ∈ R N+1 | λ 1 ≥ λ 2 ≥ · · · ≥ λ N ≥ 0 ≥ λ N+1 , 1 T λ = 0} ⊂ R N+1 (546)[ ]−EDM N 1for1 T . When one desires affine dimension ρ diminished further0below what can be achieved via cenv(rank)-minimization as in (783), spectralprojection methods become attractive. Our strategy for low-rank solution isprojection on that subset K ρ+2λof the spectral cone corresponding to affinedimension not in excess of that desired; id est, on∆= {λ∈ R N+1 | λ 1 ≥ λ 2 ≥ · · · ≥ λ ρ+1 ≥ λ ρ+2 = λ ρ+3 = · · · = λ N = 0 ≥ λ N+1 , 1 T λ = 0}= K M ∩ R N+1N+1- ∩ ∂H ∩ {λ | eT i λ = 0, i=ρ + 2...N} ⊂ R ρ+2 (785)a pointed polyhedral cone having empty interior, to which membership subsumesthe rank constraint; 7.15 a (ρ + 2)-dimensional convex subset of thespectral cone K λ in R N+1 where K N+1 ∆λ= K λ .Given desired affine dimension 0 ≤ ρ ≤ N −1 with unknown EDM D indiagonalization [ −D 11 T 0]∆= UΥU T ∈ S N+1 (786)and symmetric H in diagonalization[ ] −H 1 ∆1 T = QΛQ T ∈ S N+1 (787)0having eigenvalues arranged in nonincreasing order, then problem (784) isequivalent to:minimize ‖Υ − R T ΛR‖ 2 FΥ,Rsubject to δ(Υ) ∈ K ρ+2λ(788)δ(QRΥR T Q T ) = 0R T R = I7.15 K M is the monotone cone (251), R N+1N+1-is that orthant whose members’ sole negativecoordinate is their N +1 th (§2.1.0.7.1), and ∂H ∆ = {λ | 1 T λ = 0}.


7.3. THIRD PREVALENT PROBLEM: 269whereR ∆ = Q T U ∈ R N+1×N+1 (789)is a bijection in U on the set of orthogonal matrices, and where (§4.7.3.0.1)[ −D 12 ≤ rank1 T 0]≤ ρ + 2 ≤ N + 1 (790)The diagonal constraint δ(QRΥR T Q T ) = 0 makes problem (788) difficultto solve because it makes the two variables dependent. Without it, theproblem could be solved exactly by (1168) reducing it to the following finitesequence (a) (b): Initialize R to the identity I ,(a)(b)minimize ‖Υ − R T ΛR‖ 2 FΥsubject to δ(Υ) ∈ K ρ+2λ∩ K ∗ λδH ⋆ = P SN+1(QR ⋆ Υ ⋆ R ⋆T Q T )0minimize ‖Υ ⋆ − R T ΛR‖ 2 FRsubject to R T R = I (791)(c)Ignoring K ∗ λδ for a moment, subproblem (791a) is equivalent to unique projectionof R T ΛR on δ(K ρ+2λ) , a pointed polyhedral cone belonging to a(ρ +2)-dimensional subspace of isometrically isomorphic R (N+1)2 .Subproblem (791b) is a Procrustes problem whose analytical solution isknown (§C.2.1); a minimization on the non-convex manifold of orthogonalmatrices:R ⋆ = √ δ(ψ(δ(Λ))) H √δ(ψ(δ(Υ⋆ ))) ∈ C N+1×N+1 (792)a complex diagonal matrix, where the step function ψ is defined in (933).SoR ⋆ Υ ⋆ R ⋆T = δ(ψ(δ(Λ))) |Υ ⋆ | ∈ S N+1 (793)where the absolute value is entry-wise. Intersection of the feasible set insubproblem (a) with the polyhedral [ cone of]majorization K ∗ λδ (§A.1 no.23)−D 1insures the existence of a matrix1 T in S N+10 having eigenvalues Υ0⋆solving (a). We attempt to find that matrix by solving the sequence (a) (b),and then projecting the symmetric result on S N+10 in subproblem (791c) by


270 CHAPTER 7. EDM PROXIMITYsimply resetting all main diagonal entries of QR ⋆ Υ ⋆ R ⋆T Q T to 0 ; id est,∥[ ∥ ∥∥∥H ⋆ = ∆ −H 1∥∥∥2arg minimize]−H∈S N 1 T QR0⋆ Υ ⋆ R ⋆T Q T = QR ⋆ Υ ⋆ R ⋆T Q T − δ 2 (QR ⋆ Υ ⋆ R ⋆T Q T )subject to δ(H) = 0This produces a new input H we call H ⋆ :1. (791a) produces feasible Υ ⋆2. (791b) calculates R ⋆ in closed form3. (791c) provides H ⋆ by zeroing diagonal entries7.3.3.1 Alternation convergenceF(794)We have no theoretical guarantee the alternation (791)(a) (b) (c) will converge,in the sense of §E.9, because Procrustes problem (791b) is a projectionon a non-convex set. Because every matrix is orthogonally equivalent to amatrix having equal diagonal entries [28, §2.2, prob.3], an alternative butequally valid interpretation of subproblem (c) at convergence is a rotation(§B.5.2) of QR ⋆ Υ ⋆ R ⋆T Q T into S N+10 ; an adjustment to the rotation found in(b). A test for convergence might therefore be detection of a fixed point Υ ⋆of projection product.7.3.3.2 (791a) conversion to SDPSubproblem (791a) is easily converted to a semidefinite program with affineconstraints. Since variable Υ is a diagonal matrix and R T ΛR fixed, onlydiagonal entries participate in the minimization because δ(K ρ+2λ) belongs toa subspace. (§E.8.3) Hence the objective is equivalently stated‖δ(Υ) − δ(R T ΛR)‖ 2 2 (795)and then converted to a Schur-form semidefinite program: (§A.4.1)minimizet,Υsubject tot[]tI δ(Υ) − δ(R T ΛR)(δ(Υ) − δ(R T ΛR)) T≽ 0 (796)tδ(Υ) ∈ K ρ+2λ∩ K ∗ λδ ...


7.3. THIRD PREVALENT PROBLEM: 271Because any permutation matrix is an orthogonal matrix, we must insureδ(R T ΛR) ∈ K M ⊂ R N+1 membership to the monotone cone (§2.9.2.2.2);arrangement in nonincreasing order.7.3.3.2.1 Solution of (791)(b) and (c) by Lagrange multipliers...




Chapter 8EDM completionIt is not known how to proceed if one wishes to restrict the dimensionof the Euclidean space in which the configuration of pointsmay be constructed.−Michael W. Trosset, 2000 [47]Intriguing is the question of whether the list in X ∈ R n×N (54) may bereconstructed given an incomplete EDM, and under what circumstances thereconstruction is unique. In this chapter we assume an EDM is only partiallyknown, and the known entries are noiseless. 8.1 Due to translation invarianceof an EDM (§4.5), we need only consider reconstruction in subspace R r wherer is affine dimension.We examined this completion problem for small Example 4.3.0.0.2(p.132); revisited in §4.9.3 and §4.12.4.1. When the cardinality N exceeds 4,it is no longer sufficient or convenient to use the fifth Euclidean axiom as wedid in §4.12.4.1; we need a universal theory.8.0 c○ 2001 Jon Dattorro, all rights reserved.8.1 Unknown entries may, for example, be constrained by upper and lower bounds; indeed,all the entries might be unknown having bound constraints.273


Figure 8.1: (a) Intersection of hyperplane ∂H (representing EDM entry d_23 = 1/5) with EDM cone EDM^3 in isomorphic R^3 (both drawn truncated). EDMs in this dimension corresponding to affine dimension 1 comprise the relative boundary of the EDM cone (§5.4) which, by itself, is not a convex set. Since the intersection illustrated includes a subset of the cone's relative boundary, then it is obvious there exist many EDM completions corresponding to affine dimension 1. (Rounded vertex of cone is artifact of plotting routine.) (b) d_13 = 1/5. (c) d_12 = 1/5.


8.1. COMPLETION NOT NECESSARILY UNIQUE 2758.1 Completion not necessarily uniqueIt remains somewhat of an open question as how to efficiently identify whichmatrices can be completed to positive semidefinite or Euclidean distance.[86] [54] [138] [139] Yet the semidefinite version of Farkas’ lemma, originallyregarding only linear inequalities (polyhedral cones, circa 1902), provides aclue:8.1.0.2.1 Lemma. Semidefinite Farkas’ lemma. 8.2Given an arbitrary set {A i ∈ S N , i=1... m} and a vector b∈R m , define theaffine setA ∆ = {X ∈ S N | 〈A i , X〉 = b i , i=1... m} (797)Then A ∩ S N + is nonempty if and only if, for each and every ‖y‖=1,m∑y i A i ≽ 0 ⇒ y T b ≥ 0 (798)i=18.1.0.3 PSD cone intersectionSemidefinite Farkas’ lemma follows directly from a membership relation(§2.8.2.0.1) and the convex cones from Example 2.8.2.4.1. It provides theconditions required for a set of hyperplanes to share a common intersectionwith the positive semidefinite cone. To apply the lemma to a semidefinitecompletion problem, one simply substitutes members of the standard orthonormalbasis 8.3 in isomorphic R N(N+1)/2 (38) for the A i corresponding toknown entries b i . Mere existence of any particular y =y p , failing to fulfillimplication (798), is a certificate attesting: no positive semidefinite completionexists.8.2 While the lemma as stated is correct, Ye [4, §1.3.8] points out that a positive definiteversion of this lemma is required for semidefinite programming because a feasible pointin A ∩ int S N + is required by Slater’s condition [1, §5.2.3] to achieve 0 duality gap. Inour circumstance, assuming a nonempty intersection, a positive definite lemma is requiredto insure a point of intersection closest to the origin is not at infinity; e.g., Figure 2.14.A positive definite Farkas’ lemma can easily be constructed from a membership relation(187).8.3 When m=N(N +1)/2 and {A i } constitutes the standard orthonormal basis for S N(38), then semidefinite Farkas’ lemma reduces to a theorem of Fejér (200).⋄
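Operationally, the test just described is a semidefinite feasibility problem: fix entries through equality hyperplanes and ask whether the affine set meets the positive semidefinite cone. A minimal CVXPY sketch, with entry positions and values chosen purely for illustration (an SDP-capable solver is assumed installed):

    import cvxpy as cp

    def psd_completable(N, fixed):       # fixed: dict {(i, j): value}
        X = cp.Variable((N, N), symmetric=True)
        cons = [X >> 0] + [X[i, j] == v for (i, j), v in fixed.items()]
        prob = cp.Problem(cp.Minimize(0), cons)
        prob.solve()
        return prob.status in ('optimal', 'optimal_inaccurate')

    # a unit diagonal forces |X_01| <= 1 in any PSD completion
    print(psd_completable(3, {(0, 0): 1, (1, 1): 1, (2, 2): 1, (0, 1): 2}))    # False
    print(psd_completable(3, {(0, 0): 1, (1, 1): 1, (2, 2): 1, (0, 1): 0.5}))  # True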


276 CHAPTER 8. EDM COMPLETION8.1.0.4 EDM cone intersectionWe know any EDM is a member of a pointed closed convex cone EDM N .(§5) If one or more entries of a particular EDM are fixed, then the geometricinterpretation of the feasible set of completions is like the semidefinitecompletion problem; it is the intersection of the EDM cone in isomorphicR N(N−1)/2 with as many hyperplanes as there are fixed symmetric entries.Unless the common intersection is a point (assuming it is nonempty) thenthe number of completions is generally infinite, and those corresponding toparticular affine dimension r < N − 1 belong to some non-convex subset ofthat intersection (confer §3.2.0.0.1).Depicted in Figure 8.1(a) is an intersection of the EDM cone EDM 3 witha hyperplane representing the set of all EDMs having one fixed symmetricentry. In this dimension, it is impossible to represent a unique nonzerocompletion corresponding to affine dimension 1, for example, using a singlehyperplane because any hyperplane supporting the relative boundaryat a particular point Γ contains an entire ray {ζΓ | ζ ≥ 0} belonging torel ∂EDM 3 by lemma 2.6.3.0.1. Yet it is easy to realize a unique intersectionof two hyperplanes with the relative boundary at Γ . 8.4The sheer number of possible completions may be minimized, of course,by introducing more constraints or by optimizing some objective functionover the feasible set which is the intersection A ∩ S N + .Non-uniqueness of a completed EDM translates to non-uniqueness of anisometric reconstruction; 8.5 in other words, a multiplicity of completions correspondto incongruent realizations.8.4 Because any affine transformation of a hyperplane remains affine (§2.1.0.10.2),we may instead realize the intersection with the associated positive semidefinite coneS 2 +=−VN T EDM3 V N (614) for a rank-one solution. Barvinok (§2.6.6.4.1) provides a leastupper bound on the affine dimension (rankVN TDV N) in which there can exist a corresponding(not necessarily unique) completion given a number m of fixed symmetric entries; thisbound increases as √ m . When the feasible set of completions consists of a single point,then Barvinok’s bound becomes the greatest upper bound; in this case m = 2, the boundequals affine dimension 1.8.5 Isometric reconstruction from an EDM means building a list X correct to within arotation, reflection, and translation. (§4.5.3)


Figure 8.2: (a) Given three distances indicated with absolute point positions x_1, x_2, x_3 known and not collinear, then absolute position of x_4 can be precisely and uniquely determined in R^2 by trilateration. If list is realizable in R^2, then realization is unique. Ignoring x_3 for example, position of x_4 acquires ambiguity. Yet four-point list must always be embeddable in subspace having affine dimension rank V^T_N D V_N not exceeding 3. EDM graphs (b) (c) and (d) represent EDMs in various states of completion. Line segments represent known distances, and may cross without vertex at intersection. (b) Now only relative position of non-collinear x_1, x_2, x_3 is known; indicated by heavy reference triangle. Knowing all distance information, relative point position x_4 can be precisely and uniquely determined. If list is realizable in R^2, then isometric realization is unique. (c) When fifth point is introduced, only distances to x_1, x_2, x_3 are required to unambiguously determine relative position of x_5 in R^2 with respect to reference triangle. Graph represents first instance of missing distance information: √d_45. (d) For six points in R^2, there are three distances absent from complete set of inter-point distances: √d_45, √d_46, √d_56. Yet unique isometric reconstruction is certain.


278 CHAPTER 8. EDM COMPLETION8.2 Requirements for unique completion8.2.1 Conservative requirements in R 2The well-known geometric technique trilateration requires only three noncollinearabsolute point-positions x 1 ,x 2 ,x 3 in R 2 to uniquely determine absoluteposition of a fourth point x 4 from distance information √ d 14 , √ d 24 ,√d34 . (Figure 8.2(a)) The classical calculation finds the common intersection√of√three circles respectively centered at x 1 , x 2 , and x 3 having radiid14 , d24 , and √ d 34 . This technique is used by GPS receivers (globalpositioning system, §4.11.3.0.1).If only isometric reconstruction is desired, then we need have only relativeposition of the three non-collinear points; and then trilateration is no longerpossible. To determine relative position of x 1 ,x 2 ,x 3 , the triangle inequalityaxiom of the Euclidean metric is necessary and sufficient. (§4.12.1) Combiningwith √ d 14 , √ d 24 , √ d 34 the new distances between unknown x 1 ,x 2 ,x 3represented by the heavy reference triangle in Figure 8.2(b), then we have acomplete EDM for four points;⎡⎢⎣0 d 12 d 13 d 14d 12 0 d 23 d 24d 13 d 23 0 d 34d 14 d 24 d 34 0⎤⎥⎦ (799)(The bold entries represent the reference triangle in a principal 3 × 3 submatrix;all entries here are known.) If a realization exists in R 2 , then theirisometric reconstruction is unique by injectivity of D ; (§4.6) meaning, if theEDM describing the four-point list is unique, then so will be its correspondingisometric realization.Now consider augmentation of our four-point list with a fifth point x 5illustrated in Figure 8.2(c). To unambiguously describe the relative positionof x 5 in R 2 , we need only the reference triangle and √ √ √d 15 , d25 , d35 .If a realization exists in R 2 , then it exists in the plane of the referencetriangle. This means the two isometric realizations thus far ˜x 1 , ˜x 2 , ˜x 3 , ˜x 4and ˜x 1 , ˜x 2 , ˜x 3 , ˜x 5 coexist in R 2 . Since each realization is unique with respectto the reference triangle, then the combined realization is unambiguous inR 2 . Relative position of five points in R 2 is therefore specified uniquely by


only 5(5−1)/2 − 1 distances;

   [ 0     d_12  d_13  d_14  d_15
     d_12  0     d_23  d_24  d_25
     d_13  d_23  0     d_34  d_35
     d_14  d_24  d_34  0     ?
     d_15  d_25  d_35  ?     0   ]                                (800)

one distance-square shy of the total amount of information N(N−1)/2 for an EDM in EDM^N.

Augmenting now with a sixth point in R^2, we count 6(6−1)/2 − 3 required distances illustrated in Figure 8.2(d). The emergent pattern becomes clear and we find for N > 3, only

   N(N−1)/2 − (N−3)(N−4)/2 = 3N − 6                               (801)

distances need be known for unique isometric reconstruction in R^2. In terms of the noiseless EDM D ∈ EDM^N, assuming the reference triangle has nonempty interior 8.6 and vertices corresponding to the first three elements from a corresponding list X, the known distances-square need constitute only the first three rows of D ;

   [ 0     d_12  d_13  d_14  d_15  d_16
     d_12  0     d_23  d_24  d_25  d_26
     d_13  d_23  0     d_34  d_35  d_36
     d_14  d_24  d_34  0     ?     ?
     d_15  d_25  d_35  ?     0     ?
     d_16  d_26  d_36  ?     ?     0   ]                          (802)

8.2.1.1 Other EDM completion patterns

The tie between a planar reference triangle and particular rows of D is unnecessary. We may justifiably assert without loss of generality: If a list is realizable in R^2, then its unique isometric reconstruction is certain given at least any three complete rows of the corresponding noiseless EDM that contain a representation of a reference triangle (having nonempty interior) in a 3×3 principal submatrix. [Caetano...]

8.6 non-degenerate; planar; the vertices are non-collinear.


8.2.2 Unique completion in any dimension r

Extension to affine dimension r greater than 2 is now clear. Continuing any inductive proof by beginning with unique absolute reconstruction of five points in R^3 via trilateration using four spheres, we jump to the logical conclusion:

• Given a reference simplex in R^r (§2.7.3) whose vertices correspond to any complete principal submatrix D_i ∈ EDM^{r+1} of an incomplete noiseless EDM D in EDM^N, if a list is realizable in R^r then its unique isometric reconstruction in R^r is certain knowing at least any r+1 complete rows containing submatrix D_i.

That lower bound on the total number of entries dispels any notion of distance data proliferation in low affine dimension. For unique isometric reconstruction in subspace R^r, the lower bound on the total number of known distances is O(N) when affine dimension r is less than N−2; id est, for N > r + 1 [sic], 8.7 [140]

   N(N−1)/2 − (N−(r+1))(N−(r+2))/2 = (r+1)N − (r+1)(r+2)/2        (803)

is a sufficient number of known distances for unique isometric reconstruction. When affine dimension r reaches N−2, then all distances-square in the EDM must be known for unique isometric reconstruction in R^r. Going the other way, when r = 1 then bound (803) is equivalent to the constraint that the EDM graph be connected. [141, §2.2]

8.2.2.1 Connecting instead to other simplices

Assuming the existence of multiple r+1-tuples of points from a list, each comprising vertices of a simplex in R^r, then some freedom is gained in construction of a corresponding EDM having a minimal number of known entries. Assembly of an EDM graph is equally simple as Figure 8.2:

1. Vertices of a simplex are chosen and assigned the smallest indices as before. All inter-point distances between vertices {x_1, x_2 ... x_{r+1}} of this initial simplex are assumed known.

8.7 The strict inequality follows from trilateration and eliminates local ambiguity due to reflective isometry.


8.2. REQUIREMENTS FOR UNIQUE COMPLETION 281Figure 8.3: EDM graphs having 2N −3 known distances. (a) Redrawing ofFigure 4.2 to emphasize obscured segments x 2 x 4 , x 4 x 3 , and x 2 x 3 . Fivedistances are specified, while the reference simplex is the triangle. The EDMso represented is incomplete, missing d 14 as in (326), yet the isometric reconstructionis unique as shown in §4.9.3 and §4.12.4.1. (b) The correspondingincomplete EDM has a unique isometric reconstruction.2. Each successive point chosen is labelled ordinally (beginning x r+2 ), andthen connected to any set of r +1 points of lower index whose convexhull would make a simplex. (Inter-vertex distances of this simplex neednot necessarily be known.)3. Step 2 is repeated until the last point is connected. 8.8This means we may redistribute the (803) known distances-square in anincomplete EDM so that each column beyond the r+1 th contains at least r+1known and noiseless entries above the main diagonal. [Caetano...] There are,for example, forty matrices like (802) for unique isometric reconstruction in8.8 Proof of this procedure is left pending.


282 CHAPTER 8. EDM COMPLETIONR 2 having only three entries per column above the main diagonal in columns4, 5, and 6; here is one:⎡⎤0 d 12 d 13 d 14 ? ?d 12 0 d 23 d 24 d 25 ?d 13 d 23 0 d 34 d 35 d 36⎢ d 14 d 24 d 34 0 d 45 d 46(804)⎥⎣ ? d 25 d 35 d 45 0 d 56⎦? ? d 36 d 46 d 56 08.2.2.2 Greatest lower boundThe lower bound on the number of known distances for unique isometricreconstruction in low affine dimension remains O((r + 1)N) as in bound(803). Yet as claimed by Huang, Liang, & Pardalos in [54, §4.2], bound(803) is the greatest lower bound; in that light, we may regard it as aconservative but reliable estimate of the minimum number of entries requiredin an incomplete EDM for unique isometric reconstruction. Furtherreduction in the number of distances required for unique isometric reconstructionis achievable under certain conditions on connectivity and arrangementof points; namely, O(2N) independent of affine dimension r ; e.g.,Figure 8.3(b). Figure 8.3(a), originally presented on page 133 from smallcompletion problem Example 4.3.0.0.2, is an example of such a reductionin R 2 with N = 4 points and only 2N − 3 = 5 known symmetric entries.The incomplete four-point EDM that Figure 8.3(a) represents correspondsto a unique realization, although the complete four-point examplein Figure 8.4(b) will not yield a unique realization when any one of the distancesis left unspecified. Figure 8.4(c) and (d) represent further reductionwith respect to bound (803) corresponding to the already incomplete EDMsrepresented in Figure 8.2(c) and (d).
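For reference, bound (803) is easy to tabulate. The short check below assumes only the closed form (r+1)N − (r+1)(r+2)/2 given in §8.2.2 and nothing more:

    def known_distances(N, r):                       # bound (803)
        return (r + 1) * N - (r + 1) * (r + 2) // 2

    assert known_distances(6, 2) == 3 * 6 - 6        # the 3N - 6 count of (801)
    assert known_distances(4, 1) == 2 * 4 - 3        # the 2N - 3 count of Figure 8.3
    assert known_distances(6, 4) == 6 * 5 // 2       # r = N - 2: every distance needed
    for N in range(4, 8):
        print(N, [known_distances(N, r) for r in range(1, N - 1)])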


Figure 8.4: EDM graphs. (a) Minimum connectivity per vertex (say, 3 per vertex (3N) in R^2) is an insufficient rule for unique isometric reconstruction; illustrated by the dangling diamonds. Yet (b) (c) and (d) each produce a unique isometric reconstruction.


Figure 8.5: Actual positions of fifty sensors in R^2, only three of which (the anchors) have position × known precisely.

8.3 Completion problems

8.3.1 Cattle problem

The following scenario is motivated by the ancient problem of determining the whereabouts of livestock. Each cow is outfitted with a low-power transceiver from which distance information can be determined to all the others to within a specified accuracy. Because of the requirement for very low power consumption and operation over long periods measured in years, each transmitter is necessarily limited in its effective range and its accuracy is modest.

A large number of sensors correspond to N points in R^2. We are given absolute position with high precision of each point in a small subset of anchors {x_l ∈ R^2, l = 1...M} where without loss of generality we make x_1 ≜ 0 and 0 ≤ M ≤ N. For each point pair, known and unknown, is given an upper and lower bound on their squared separation; respectively, d̄_ij and d̲_ij, i ≠ j. If one sensor is in radio-range ϱ of another, then the actual distance √d_ij between them can be determined only to within a constant factor η of the measured distance √d̂_ij ;

   d̲_ij = d̂_ij (1 − η)² ≤ d_ij ≤ d̂_ij (1 + η)² = d̄_ij ,   i ≠ j        (805)


8.3. COMPLETION PROBLEMS 285id est, all actual distances √ d ij are unknown except those between the anchors.If one sensor is out of radio-range of another, then̺2 ≤ d ij < ∞ , i≠j (806)is all that is known. From this information we must determine absoluteposition of the remaining N −M sensors.8.3.1.1 Problem insightFrom the parameters of this problem, it is apparent that a low-dimensionalsolution exists because the origin of the data is earthly terrain. Even with theuncertainty caused by the accuracy tolerance η , we know a priori that withinthe bounds (805) there must lie a two- or at most three-dimensional solution.We therefore design the problem statement to have this low dimensionalitybuilt-in.For convenience, we normalize all quantities; id est, distance is unitless.


286 CHAPTER 8. EDM COMPLETION8.3.1.1.1 Example. Radio-wave sensor network localization... [142, §5][143] We are given N sensors and M anchors (as already explained), andincomplete noisy distance information in R 2 . This problem can be posedin terms of the Gram matrix (378) having (M −1)M/2 affine equality constraintson it[ ] 0 0G = X T TX =0 −VN TDV , x 1 = 0 (807)Naccounting for the M known sensor positions. We substitute the convex rankenvelope tr(−V † N DV N) (§7.2.2) for affine dimension:whereminimize tr(−V † N DV N)Dsubject to tr ( −VN TDV )NE i−1,j−1 = bij , 2 ≤ i ≤ j =2... M(808)D ≤ D ≤ DD ∈ EDM Ncenv rank(−V † N DV N) ∝ tr(−V † N DV N) (809)where E ij ∈ S N−1 is a member of the standard orthonormal basis for S N−1(38), where b ii = G ii and b ij = √ 2G ij , i ≠ j , and where D = [ ]]d ij andD =[d ij . Using this approach, there is no guarantee that affine dimensionof the reconstruction is 2.Figure 8.5 shows N = 50 sensors. Only M = 3 sensor positions areknown. D and D are known and respectively within ±10% of actual distancesquare.Solving (808), we findλ(−V T NDV N ) = {11.72, 3.397, 0.06208, 0.01160, 0.00005193...} (810)meaning, the list reconstructed from D thus found does not have affine dimension2 in the Platonic sense. Calculating the Cholesky factorization of−V T N DV N (§4.10.1), we find the list depicted by o in Figure 8.6 by projectingthe result on the R 2 plane. 8.9There is significant energy in the higher-dimensional coordinates of thefactorization (not illustrated) that can account for the discrepancy between8.9 Projecting instead −V T N DV N on the positive semidefinite cone produces no betterresult.


Figure 8.6: Solution o to problem (808) having Figure 8.5 input overlaid. Solution data required translation and rotation compensation (§C.2).

actual and estimated position; meaning, our only choice is to seek another algorithm that produces a result having affine dimension equal to 2. We will return to this example.

8.3.2 Chemistry problem

One of the most successful applications of Euclidean distance geometry appears to be in computational chemistry. [49, in cit.] In [144] [127], for example, Trosset applies distance geometry to an important problem called molecular conformation [22] where H represents measurements of interatomic distance, each with specified tolerance.

8.3.2.0.2 Example. Molecular conformation.
Here we reproduce the work of....


288 CHAPTER 8. EDM COMPLETION8.3.2.0.3 notes...inequality constraints only; no fixed data... Of special interest is thecase when there are no equality constraints; in effect, we are asked to synthesizethe entire EDM based only upon the known upper and lower bounds...solution of completion problem by alternating projection; e.g., appendix...applications in [86, in cit.]...physical barrier problem...Other researchers [47] [94] have formulated this completion problem ina non-convex way... We will utilize convexity of −VN TD(X)V N (§4.4.1) toreconstruct the list.Objective is real (not matrix-valued) when you use the equivalentformulation (329). z becomes another variable, though.Minimize affine dimension §7.2.2... this works (see comp.sdp and Trosset’sexample)minimize − tr(V † N DV N)Dsubject to 〈E ij , D〉 = h ij , i,j = 1... M(811)or D ◦ Φ = HD ∈ EDM NUse the vector form (330) in a sum of d ij that is trace, to express theproblem in X .
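A minimal CVXPY sketch of completion problem (811) follows, assuming a symmetric 0/1 mask Φ marking the known entries of H and the Schoenberg-style encoding of EDM^N; the trace surrogate of §7.2.2 serves as objective. The usage example appended is synthetic, not from the text: four collinear points with d_14 withheld, where the remaining entries force d_14 = 9. All names are illustrative.

    import numpy as np
    import cvxpy as cp

    def complete_edm(H, Phi):
        N = H.shape[0]
        VN = np.vstack((-np.ones((1, N - 1)), np.eye(N - 1)))
        VNp = np.linalg.pinv(VN)                          # V_N^dagger
        D = cp.Variable((N, N), symmetric=True)
        G = cp.Variable((N - 1, N - 1), PSD=True)         # G = -V_N^T D V_N
        cons = [cp.diag(D) == 0,
                G == -VN.T @ D @ VN,
                cp.multiply(Phi, D) == Phi * H]           # fix the known entries
        cp.Problem(cp.Minimize(cp.trace(-(VNp @ D @ VN))), cons).solve()
        return D.value

    H = np.array([[0., 1., 4., 0.],                        # points 0, 1, 2, 3 on a line
                  [1., 0., 1., 4.],
                  [4., 1., 0., 1.],
                  [0., 4., 1., 0.]])                       # d_14 unknown (entered as 0)
    Phi = np.ones((4, 4)) - np.eye(4)
    Phi[0, 3] = Phi[3, 0] = 0                              # mask out d_14
    print(np.round(complete_edm(H, Phi), 3))               # recovers d_14 = 9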


Chapter 9Piano tuning9.0 c○ 2001 Jon Dattorro, all rights reserved.289




Appendix ALinear algebraA.1 Main diagonal δ operator, trace, vecWhen linear function δ operates on a square matrix A∈ R N×N , δ(A) returnsa vector composed of all the entries from the main diagonal in the naturalorder;δ(A) ∈ R N (42)Operating on a vector, δ naturally returns a diagonal matrix. Operatingrecursively on a diagonal matrix Λ∈ R N×N , δ(δ(Λ)) returns Λ itself;δ 2 (Λ) ∆ = δ(δ(Λ)) ∆ = Λ ∈ R N×N (812)Defined in this manner, linear operator δ is self-adjoint [38, §3.10, §9.5-1]; A.1videlicet, for y ∈ R N , (§2.1.1)δ(A) T y = 〈δ(A), y〉 = 〈A,δ(y)〉 = tr ( A T δ(y) ) (813)This δ notation is efficient and unambiguous as illustrated in the followingexamples where A ◦ B denotes the Hadamard product [28] [44, §1.1.4] ofA.0 c○ 2001 Jon Dattorro, all rights reserved.A.1 Linear operator T : R m×n → R M×N is self-adjoint when, for each and everyX 1 , X 2 ∈ R m×n ,〈T(X 1 ), X 2 〉 = 〈X 1 , T(X 2 )〉291


292 APPENDIX A. LINEAR ALGEBRAmatrices of like size, σ denotes a vector of non-increasingly ordered singularvalues, and λ denotes a vector of non-increasingly ordered eigenvalues: forA,B∈ R N×N ,1. δ(A) = δ(A T )2. tr(A) = tr(A T ) = δ(A) T 13. 〈I, A〉 = trA4. δ(cA) = cδ(A), c∈R5. tr(c √ A T A) = c tr √ A T A = c1 T σ(A), c∈R6. tr(cA) = c tr(A) = c1 T λ(A), c∈R7. δ(A + B) = δ(A) + δ(B)8. tr(A + B) = tr(A) + tr(B)9. δ(AB) = (A ◦ B T )1 = (B T ◦ A)110. δ(AB) T = 1 T (A T ◦ B) = 1 T (B ◦ A T )⎡ ⎤u 1 v 111. δ(uv T ⎢ ⎥) = ⎣ . ⎦ = u ◦ v , u,v ∈ R Nu N v N12. tr(A T B) = tr(AB T ) = tr(BA T ) = tr(B T A)= 1 T (A ◦ B)1 = 1 T δ(AB T ) = δ(A T B) T 1 = δ(BA T ) T 1 = δ(B T A) T 113. y T Bδ(A) = tr ( Bδ(A)y T) = tr ( δ(B T y)A ) = tr ( Aδ(B T y) )= δ(A) T B T y = tr ( yδ(A) T B T) = tr ( A T δ(B T y) ) = tr ( δ(B T y)A T)14. δ 2 (A T A) = ∑ ie i e T iA T Ae i e T i15. δ ( δ(A)1 T) = δ ( 1δ(A) T) = δ(A)16. vec(AXB) = (B T ⊗ A) vec X17. vec(BXA) = (A T ⊗ B) vec X


A.1. MAIN DIAGONAL δ OPERATOR, TRACE, VEC 29318. tr(AXBX T ) = vec(X) T vec(AXB) = vec(X) T (B T ⊗ A) vec X [40]19. tr(AX T BX) = vec(X) T vec(BXA) = vec(X) T (A T ⊗ B) vec X20. For ζ =[ζ i ]∈ R k and x=[x i ]∈ R k ∑, ζ i /x i = ζ T δ(x) −1 1.21. Let λ(A)∈ C N denote the eigenvalues of A ∈ R N×N . Theniδ(A) = λ(I ◦A) (814)22. For any permutation matrix Ξ and dimensionally compatible vector yor matrix A ,δ(Ξy) = Ξδ(y) Ξ T (815)δ(ΞAΞ T ) = Ξδ(A) (816)So given any permutation matrix Ξ and any compatible matrix B ,for example,δ 2 (B) = Ξδ 2 (Ξ T B Ξ)Ξ T (817)23. Theorem (Schur). Majorization. [45, §7.4] [28, §4.3] [25, §5.5]Let λ ∈ R N denote a vector of eigenvalues, and let δ ∈ R N denote avector of main diagonal entries, both arranged in nonincreasing order.Then∃A∈ S N λ(A)=λ and δ(A)= δ ⇐ λ − δ ∈ K ∗ λδ (818)and converselyA∈ S N ⇒ λ(A) − δ(A) ∈ K ∗ λδ (819)the pointed empty-interior polyhedral cone of majorization(confer (178))∆= K ∗ M+ ∩ {ζ1 | ζ ∈ R} ∗ (820)K ∗ λδwhere K ∗ M+ is the dual monotone nonnegative cone (250), and wherethe dual of the line is a hyperplane; ∂H = {ζ1 | ζ ∈ R} ∗ = 1 ⊥ . ⋄In the particular circumstance δ(A)=0, we get:


294 APPENDIX A. LINEAR ALGEBRACorollary. Symmetric hollow majorization.Let λ ∈ R N denote a vector of eigenvalues arranged in nonincreasingorder. Then∃A∈ S N 0 λ(A)=λ ⇐ λ ∈ K ∗ λδ (821)and converselywhere K ∗ λδis defined in (820).⋄A∈ S N 0 ⇒ λ(A) ∈ K ∗ λδ (822)The majorization cone K ∗ λδ is a natural consequence of the traditionaldefinition of majorization; id est, vector y ∈ R N majorizes vectorx if and only ifk∑ k∑x i ≤ y i ∀ 1 ≤ k ≤ N (823)andi=1i=11 T x = 1 T y (824)Under these circumstances, rather, vector x is majorized by vector y .A.2 Semidefiniteness: domain of testThe most fundamental necessary, sufficient, and definitive test for positivesemidefiniteness of matrix A ∈ R n×n is: [25, §1]x T Ax ≥ 0 for each and every x ∈ R n such that ‖x‖ = 1. (825)Traditionally, authors demand evaluation over broader domain; namely, overall x ∈ R n which is sufficient but unnecessary. Indeed, that standard textbookrequirement is far over-reaching because if x T Ax is nonnegative forparticular x=x p , then it is nonnegative for any αx p where α ∈R . Thus,only normalized x in R n need be evaluated.Many authors add the further requirement that the domain be complex;the broadest domain. By so doing, only Hermitian matrices (A H = A wheresuperscript H denotes conjugate transpose) A.2 are admitted to the set ofpositive semidefinite matrices (829); an unnecessary prohibitive constraint.A.2 Hermitian symmetry is the complex analogue; the real part of a Hermitian matrixis symmetric while its imaginary part is antisymmetric. A Hermitian matrix has realeigenvalues and real main diagonal.
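A small numerical check of (825)-(827), using a random matrix chosen only for illustration: for real nonsymmetric A the antisymmetric part contributes nothing to x^T A x, so the real test over unit-norm x is decided entirely by the symmetrized matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))                 # generally nonsymmetric
    sym = (A + A.T) / 2
    for _ in range(5):
        x = rng.standard_normal(4)
        x /= np.linalg.norm(x)                      # only unit-norm x need be tested
        assert np.isclose(x @ A @ x, x @ sym @ x)   # antisymmetric part vanishes, (827)
    print(np.linalg.eigvalsh(sym))                  # signs of these decide x^T A x >= 0 for all x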


A.2. SEMIDEFINITENESS: DOMAIN OF TEST 295A.2.1Symmetry versus semidefinitenessWe call (825) the most fundamental test of positive semidefiniteness. Yetsome authors instead say, for real A and complex domain (x ∈ C n ), thecomplex test x H Ax ≥ 0 is most fundamental. That complex broadeningof the domain of test causes nonsymmetric real matrices to be excludedfrom the set of positive semidefinite matrices. Yet admitting nonsymmetricreal matrices or not is a matter of preference A.3 unless that complex test isadopted, as we shall now explain.Any real square matrix A has a representation in terms of its symmetricand antisymmetric parts; id est,A = (A +AT )2+ (A −AT )2Because, for all real A, the antisymmetric part vanishes under real test,x T (A −AT )2(826)x = 0 (827)only the symmetric part of A, (A +A T )/2, has a role determining positivesemidefiniteness. Hence the oft-made presumption that only symmetricmatrices may be positive semidefinite is, of course, erroneous under (825).Because eigenvalue-signs of a symmetric matrix translate unequivocally to itssemidefiniteness, the eigenvalues that determine semidefiniteness are alwaysthose of the symmetrized matrix. (§A.3) For that reason, and because symmetric(or Hermitian) matrices must have real eigenvalues, the conventionadopted in the literature is that semidefinite matrices are synonymous withsymmetric semidefinite matrices. Certainly misleading under (825), that presumptionis typically bolstered with compelling examples from the physicalsciences where symmetric matrices occur within the mathematical expositionof natural phenomena. A.4 [145, §52]Perhaps a better explanation of this pervasive presumption of symmetrycomes from Horn & Johnson [28, §7.1] whose perspective A.5 is the complexmatrix, thus necessitating the complex domain of test throughout their treatise.They explain, if A∈C n×nA.3 Golub & Van Loan [44, §4.2.2], for example, admit nonsymmetric real matrices.A.4 Symmetric matrices are certainly pervasive in the our chosen subject as well.A.5 A totally complex perspective is not necessarily more advantageous. The positivesemidefinite cone, for example, is not self-dual (§2.8.2.4) in the ambient space of Hermitianmatrices. [58, §II]


296 APPENDIX A. LINEAR ALGEBRA...and if x H Ax is real for all x ∈ C n , then A is Hermitian.Thus, the assumption that A is Hermitian is not necessary in thedefinition of positive definiteness. It is customary, however.Their comment is best explained by noting, the real part of x H Ax comesfrom the Hermitian part (A +A H )/2 of A ;Re(x H Ax) = x H A +AH2x (828)rather,x H Ax ∈ R ⇔ A H = A (829)because the imaginary part of x H Ax comes from the anti-Hermitian part(A −A H )/2 ;Im(x H Ax) = x H A −AH2x (830)that vanishes for nonzero x if and only if A = A H . So the Hermitiansymmetry assumption is unnecessary, according to Horn & Johnson, not becausenon-Hermitian matrices could be regarded positive semidefinite, ratherbecause non-Hermitian (includes nonsymmetric real) matrices are not comparableon the real line under x H Ax . Yet that complex edifice is dismantledin the test of real matrices (825) because the domain of test is no longernecessarily complex; meaning, x T Ax will certainly always be real, regardlessof symmetry, and so real A will always be comparable.In summary, if we limit the domain of test to x in R n as in (825), then nonsymmetricreal matrices are admitted to the realm of semidefinite matricesbecause they become comparable on the real line. One important exceptionoccurs for rank-one matrices Ψ=uv T where u and v are real vectors: Ψ ispositive semidefinite if and only if Ψ=uu T . (§A.3.1.0.7)We might choose to expand the domain of test to x in C n so that onlysymmetric matrices would be comparable. The alternative to expandingthe domain of test is to exclusively assume all matrices of interest to besymmetric; that is commonly done, hence the synonymous relationship withsemidefinite matrices.


A.2. SEMIDEFINITENESS: DOMAIN OF TEST 297A.2.1.0.1 Example. Nonsymmetric positive definite product.Horn & Johnson assert and Zhang agrees:If A,B ∈ C n×n are positive definite, then we know that theproduct AB is positive definite if and only if AB is Hermitian.[28, §7.6, prob.10] [45, §6.2, §3.2]Implicitly in their statement, A and B are assumed individually Hermitianand the domain of test is assumed complex.We prove that assertion to be false for real matrices under (825) thatadopts a real domain of test.A T = A =⎡⎢⎣3 0 −1 00 5 1 0−1 1 4 10 0 1 4⎤⎥⎦ ,⎡λ(A) = ⎢⎣5.94.53.42.0⎤⎥⎦ (831)B T = B =(AB) T ≠ AB =⎡⎢⎣⎡⎢⎣4 4 −1 −14 5 0 0−1 0 5 1−1 0 1 4⎤⎥⎦ ,13 12 −8 −419 25 5 1−5 1 22 9−5 0 9 17⎤⎥⎦ ,⎡λ(B) = ⎢⎣8.85.53.30.24⎡λ(AB) = ⎢⎣⎤⎥⎦ (832)36.29.10.0.72⎤⎥⎦ (833)1(AB + 2 (AB)T ) =⎡⎢⎣13 15.5 −6.5 −4.515.5 25 3 0.5−6.5 3 22 9−4.5 0.5 9 17⎤⎥⎦ , λ( 12 (AB + (AB)T ) ) =⎡⎢⎣36.30.10.0.014(834)Whenever A ∈ S n + and B ∈ S n + , then λ(AB)= λ(A 1/2 BA 1/2 ) will always bea nonnegative vector by (852) and Corollary A.3.1.0.5. Yet positive definitenessof the product AB is certified instead by the nonnegative eigenvalues⎤⎥⎦


298 APPENDIX A. LINEAR ALGEBRAλ ( 1(AB + 2 (AB)T ) ) in (834) (§A.3.1.0.1) despite the fact AB is not symmetric.A.6 Horn & Johnson and Zhang resolve the anomaly by choosing toexclude nonsymmetric matrices and products; they do so by expanding thedomain of test to C n .A.3 Proper statementsof positive semidefinitenessUnlike Horn & Johnson and Zhang, we never adopt the complex domain oftest in regard to real matrices. So motivated is our consideration of properstatements of positive semidefiniteness under real domain of test. This restriction,ironically, complicates the facts when compared to the correspondingstatements for the complex case (found elsewhere, [28] [45]).We state several fundamental facts regarding positive semidefiniteness ofreal matrix A and the product AB and sum A +B of real matrices underfundamental real test (825); a few require proof as they depart from thestandard texts, while those remaining are well established or obvious.A.3.0.0.1 Theorem. Positive (semi)definite matrix.A∈ S M is positive semidefinite if and only if for each and every real vectorx of unit norm, ‖x‖ = 1 , A.7 we have x T Ax ≥ 0 (825);A ≽ 0 ⇔ tr(xx T A) = x T Ax ≥ 0 (835)Matrix A ∈ S M is positive definite if and only if for each and every ‖x‖ = 1we have x T Ax > 0;A ≻ 0 ⇔ tr(xx T A) = x T Ax > 0 (836)⋄A.6 It is a little more difficult to find a counter-example in R 2×2 or R 3×3 ; which mayhave served to advance any confusion.A.7 The traditional condition requiring all x ∈R M for defining positive (semi)definitenessis actually far more than what is necessary. The restriction to norm-1 vectors is necessaryand sufficient; actually, any particular norm and any nonzero norm-constant will work.


A.3. PROPER STATEMENTS 299Proof. Statements (835) and (836) are each a particular instance ofgeneralized inequality (§2.8.2) with respect to the positive semidefinite cone;videlicet,A ≽ 0 ⇔ 〈xx T , A〉 ≥ 0 ∀xx T (≽ 0)(837)A ≻ 0 ⇔ 〈xx T , A〉 > 0 ∀xx T (≽ 0), xx T ≠ 0Relations (835) and (836) remain true when xx T is replaced with “for eachand every” X ∈ S M + [1, §2.6.1] (§2.8.2.4) of unit norm ‖X‖=1 as inA ≽ 0 ⇔ tr(XA) ≥ 0 ∀X ∈ S M +A ≻ 0 ⇔ tr(XA) > 0 ∀X ∈ int S M +(838)but this condition is far far more than what is necessary. By the discretemembership theorem in §2.8.2.1.3, the extreme directions xx T of the positivesemidefinite cone constitute the minimal set of generators required for necessaryand sufficient discretization of the generalized inequality (838) certifyingmembership to that cone.A.3.1Eigenvalues, semidefiniteness, nonsymmetricWhen A∈R n×n , let λ ( 12 (A +AT ) ) ∈ R n denote eigenvalues of the symmetrizedmatrix A.8 arranged in nonincreasing order.• By positive semidefiniteness of A∈ R n×n we mean, A.9 [146, §1.3.1](confer §A.3.1.0.1)x T Ax ≥ 0 ∀x∈ R n ⇔ A +A T ≽ 0 ⇔ λ(A +A T ) ≽ 0 (839)• (§2.6.6.1)A ≽ 0 ⇒ A T = A (840)A ≽ B ⇔ A −B ≽ 0 A ≽ 0 or B ≽ 0 (841)x T Ax≥0 ∀x A T = A (842)A.8 The symmetrization of A is (A +A T )/2. λ ( 12 (A +AT ) ) = λ(A +A T )/2.A.9 Strang agrees [26, p.334] it is not λ(A) that requires observation. Yet he is mistaken byproposing the Hermitian part alone x H (A + A H )x be tested, because the anti-Hermitianpart does not vanish under complex test unless A is Hermitian. (830)


300 APPENDIX A. LINEAR ALGEBRA• Matrix symmetry is not intrinsic to positive semidefiniteness;A T = A, λ(A) ≽ 0 ⇒ x T Ax ≥ 0 ∀x (843)λ(A) ≽ 0 ⇐ A T = A, x T Ax ≥ 0 ∀x (844)• If A T = A thenλ(A) ≽ 0 ⇔ A ≽ 0 (845)meaning, matrix A belongs to the positive semidefinite cone in thesubspace of symmetric matrices if and only if its eigenvalues belong tothe nonnegative orthant.• For A diagonalizable,rankA = rankδ(λ(A)) (846)meaning, the rank is the same as the number of nonzero eigenvalues bythe 0 eigenvalues theorem (§A.7.2.0.1).• [28, §2.5.4] (confer (25))A is normal ⇔ ‖A‖ 2 F = λ(A) T λ(A) (847)• For A∈ R m×n ,A T A ≽ 0, AA T ≽ 0 (848)because, for dimensionally compatible vector x , x T A T Ax = ‖Ax‖ 2 2 ,x T AA T x = ‖A T x‖ 2 2 .• For A∈ R n×n and c∈R ,tr(cA) = c tr(A) = c1 T λ(A)(§A.1 no.6)detA = ∏ iλ(A) i


A.3. PROPER STATEMENTS 301• For µ ∈ R , all A ∈ R n×n , and vector λ(A) ∈ C n holding the orderedeigenvalues of A ,λ(I + µA) = 1 + λ(µA) (849)Proof: A=MJM −1 and I + µA = M(I + µJ)M −1 where J is theJordan form for A ; [26, §5.6] id est, δ(J) = λ(A) , so λ(I + µA) =δ(I + µJ) because I + µJ is also a Jordan form. Similarly, λ(µI + A) = µ1 + λ(A). For σ(A) holding the singularvalues of any matrix A , σ(I + µA T A) = Ξ |1 + µσ(A T A)| andσ(µI + A T A)=Ξ|µ1 + σ(A T A)| where Ξ is a permutation matrix sorting| · | into nonincreasing order.• For A,B∈S n [39, §1.2] (Fan) (confer (1059))tr(AB) ≤ λ(A) T λ(B) (850)with equality (Theobald) when A and B are simultaneously diagonalizablewith the same ordering of eigenvalues.• For A∈ R m×n and B ∈ R n×m ,tr(AB) = tr(BA) (851)and the nonzero eigenvalues of the product and commuted product areidentical, including their multiplicity; [45, §2.5] [28, §1.3.20]λ 1:η (AB) = λ 1:η (BA), η ∆ =min{m,n} (852)By the 0 eigenvalues theorem (§A.7.2.0.1),rank(AB) = rank(BA), AB and BA diagonalizable (853)


302 APPENDIX A. LINEAR ALGEBRA• For A∈ S n and any nonsingular matrix Y ,inertia(A) = inertia(YAY T ) (854)Known as Sylvester’s law of inertia. (891) [50, §2.4.3]• For A,B∈R n×n square,det(AB) = det(BA) (855)• For A,B ∈ S n , product AB is symmetric if and only if AB is commutative;(AB) T = AB ⇔ AB = BA (856)Proof. (=⇒) Suppose AB = (AB) T . (AB) T = B T A T = BA .AB=(AB) T ⇒ AB=BA .(⇐=) Suppose AB = BA . BA = B T A T = (AB) T . AB = BA ⇒AB=(AB) T .Commutativity alone is insufficient for symmetry of the product.[26, p.26] Diagonalizable matrices A,B∈R n×n commute if and only ifthey are simultaneously diagonalizable. [28, §1.3.12]• For A,B ∈ R n×n and AB = BA ,x T Ax ≥ 0, x T Bx ≥ 0 ∀x ⇒ λ(A+A T ) i λ(B+B T ) i ≥ 0 ∀i x T ABx ≥ 0 ∀x(857)the negative result arising because of the schism between the productof eigenvalues λ(A +A T ) i λ(B +B T ) i and the eigenvalues of the symmetrizedmatrix product λ(AB + (AB) T ) i. For example, X 2 is generallynot positive semidefinite unless matrix X is symmetric; then (848)applies. Simply substituting symmetric matrices changes the outcome:


A.3. PROPER STATEMENTS 303• For A,B ∈ S n and AB = BA ,A ≽ 0, B ≽ 0 ⇒ λ(AB) i =λ(A) i λ(B) i ≥ 0 ∀i ⇔ AB ≽ 0 (858)Positive semidefiniteness of A and B is sufficient but not a necessarycondition for positive semidefiniteness of the product AB .Proof. Because all symmetric matrices are diagonalizable, [26, §5.6]we have A=SΛS T and B=T∆T T , where Λ and ∆ are real diagonalmatrices while S and T are orthogonal matrices. Because (AB) T =AB ,then T must equal S , [28, §1.3] and the eigenvalues of A are orderedidentically to those of B ; id est, λ(A) i = δ(Λ) i and λ(B) i = δ(∆) icorrespond to the same eigenvector.(=⇒) Assume λ(A) i λ(B) i ≥ 0 for i=1... n . AB=SΛ∆S T is symmetricand has nonnegative eigenvalues contained in diagonal matrixΛ∆ by assumption; hence positive semidefinite by (839). Now assumeA,B ≽0. That, of course, implies λ(A) i λ(B) i ≥0 for all i because allthe individual eigenvalues are nonnegative.(⇐=) Suppose AB=SΛ∆S T ≽ 0. Then Λ∆≽0 by (839), and soall products λ(A) i λ(B) i must be nonnegative; meaning, sgn(λ(A))=sgn(λ(B)). We may, therefore, conclude nothing about the semidefinitenessof A and B .• For A,B ∈ S n and A ≽ 0, B ≽ 0 , (Example A.2.1.0.1)AB = BA ⇒ λ(AB) i =λ(A) i λ(B) i ≥ 0 ∀i ⇒ AB ≽ 0 (859)AB = BA ⇒ λ(AB) i ≥ 0, λ(A) i λ(B) i ≥ 0 ∀i ⇔ AB ≽ 0 (860)• For A,B ∈ S n , [45, §6.2]A ≽ 0 ⇒ trA ≥ 0 (861)A ≽ 0, B ≽ 0 ⇒ trA trB ≥ tr(AB) ≥ 0 (862)We have tr(AB) ≥ 0 because A ≽ 0, B ≽ 0 ⇒ λ(AB) =λ(A 1/2 BA 1/2 ) ≽ 0 by (852) and Corollary A.3.1.0.5.A ≽ 0 ⇔ tr(AB)≥ 0, ∀B ≽ 0 (200)


304 APPENDIX A. LINEAR ALGEBRA• For A,B,C ∈ S n (Löwner)A ≼ B , B ≼ C ⇒ A ≼ C (863)A ≼ B ⇔ A + C ≼ B + C (864)A ≼ B , A ≽ B ⇒ A = B (865)• For A,B ∈ R n×n x T Ax ≥ x T Bx ∀x ⇒ trA ≥ trB (866)Proof. x T Ax ≥ x T Bx ∀x ⇔ λ((A − B) + (A − B) T )/2 ≽ 0 ⇒tr(A+A T −(B+B T ))/2 = tr(A −B) ≥ 0. There is no converse.• For A,B ∈ S n [45, §6.2, prob.1]There is no converse. From [45, §6.2],A ≽ B ⇒ trA ≥ trB (867)A ≽ B ≽ 0 ⇒ rankA ≥ rankB (868)A ≽ B ≽ 0 ⇒ detA ≥ detB ≥ 0 (869)• For A,B ∈ int S n + [7, §4.2] [28, §7.7.4],A ≽ B ⇔ A −1 ≼ B −1 (870)• For A,B ∈ S n [45, §6.2]A ≽ B ≽ 0 ⇒ A 1/2 ≽ B 1/2 (871)• For A,B ∈ S n and AB = BA [45, §6.2, prob.3]A ≽ B ≽ 0 ⇒ A k ≽ B k , k=1, 2,... (872)
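A quick numerical spot-check of two of the facts above, (850) and (852), using random matrices; nothing here is specific to the text.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
    B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
    lamA = np.sort(np.linalg.eigvalsh(A))[::-1]          # nonincreasing order
    lamB = np.sort(np.linalg.eigvalsh(B))[::-1]
    assert np.trace(A @ B) <= lamA @ lamB + 1e-9         # Fan inequality (850)

    M = rng.standard_normal((5, 3))                      # rectangular factors
    W = rng.standard_normal((3, 5))
    # nonzero eigenvalues of MW and WM coincide (852); MW has two extra (near-)zeros
    print(np.round(np.sort(np.abs(np.linalg.eigvals(M @ W))), 6))
    print(np.round(np.sort(np.abs(np.linalg.eigvals(W @ M))), 6))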


A.3. PROPER STATEMENTS 305A.3.1.0.1 Theorem. Positive semidefinite ordering of eigenvalues.For A,B∈ R M×M , place the eigenvalues of each symmetrized matrix intothe respective vectors λ ( 1(A 2 +AT ) ) , λ ( 1(B 2 +BT ) ) ∈ R M . Then, [26, §6]x T Ax ≥ 0 ∀x ⇔ λ ( A +A T) ≽ 0 (873)x T Ax > 0 ∀x ⇔ λ ( A +A T) ≻ 0 (874)because x T (A − A T )x = 0. (827) Now arrange the entries of λ ( 1(A 2 +AT ) )and λ ( 1(B 2 +BT ) ) in nonincreasing order so λ ( 1(A 2 +AT ) ) holds the1largest eigenvalue of symmetrized A while λ ( 1(B 2 +BT ) ) holds the largest1eigenvalue of symmetrized B , and so on. Then [28, §7.7, prob.1, prob.9] forκ ∈ R ,x T Ax ≥ x T Bx ∀x ⇒ λ ( A +A T) ≽ λ ( B +B T)x T Ax ≥ x T Ixκ ∀x ⇔ λ ( 1(A 2 +AT ) ) (875)≽ κ1Now let A,B ∈ S M have diagonalizations A=QΛQ T and B =UΥU T withλ(A)=δ(Λ) and λ(B)=δ(Υ) arranged in nonincreasing order. Thenwhere S = QU T . [45, §7.5]A ≽ B ⇔ λ(A−B) ≽ 0 (876)A ≽ B ⇒ λ(A) ≽ λ(B) (877)A ≽ B λ(A) ≽ λ(B) (878)S T AS ≽ B ⇐ λ(A) ≽ λ(B) (879)A.3.1.0.2 Theorem (Weyl). Eigenvalues of sum. [28, §4.3]For A,B∈ R M×M , place the eigenvalues of each symmetrized matrix intothe respective vectors λ ( 1(A 2 +AT ) ) , λ ( 1(B 2 +BT ) ) ∈ R M in nonincreasingorder so λ ( 1(A 2 +AT ) ) holds the largest eigenvalue of symmetrized A while1λ ( 1(B 2 +BT ) ) holds the largest eigenvalue of symmetrized B , and so on.1Then, for any k ∈{1... M} ,λ ( A +A T) k + λ( B +B T) M ≤ λ( (A +A T ) + (B +B T ) ) k ≤ λ( A +A T) k + λ( B +B T) 1(880)⋄Weyl’s theorem establishes positive semidefiniteness of a sum of positivesemidefinite matrices. In fact because S M + is a convex cone, then by (111)A,B ≽ 0 ⇒ ζA + ξB ≽ 0 for all ζ,ξ ≥ 0 (881)⋄
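A numerical spot-check of Weyl's inequality (880), taking A and B already symmetric so that symmetrization is trivial; indices follow the nonincreasing ordering used in the theorem, and the matrices are random.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 6)); A = A + A.T
    B = rng.standard_normal((6, 6)); B = B + B.T
    la = np.sort(np.linalg.eigvalsh(A))[::-1]
    lb = np.sort(np.linalg.eigvalsh(B))[::-1]
    ls = np.sort(np.linalg.eigvalsh(A + B))[::-1]
    M = 6
    for k in range(M):
        assert la[k] + lb[M - 1] - 1e-9 <= ls[k] <= la[k] + lb[0] + 1e-9
    print("Weyl bounds hold for every k")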


A.3.1.0.3 Corollary. Eigenvalues of sum and difference. [28, §4.3]
For A ∈ S^M and B ∈ S^M_+, place the eigenvalues of each matrix into the respective vectors λ(A), λ(B) ∈ R^M in nonincreasing order, so λ(A)_1 holds the largest eigenvalue of A while λ(B)_1 holds the largest eigenvalue of B, and so on. Then, for any k ∈ {1... M},

   λ(A − B)_k ≤ λ(A)_k ≤ λ(A + B)_k                       (882)

When B is rank-one positive semidefinite, the eigenvalues interlace; id est, for B = qq^T,

   λ(A)_{k+1} ≤ λ(A − qq^T)_k ≤ λ(A)_k ≤ λ(A + qq^T)_k ≤ λ(A)_{k−1}    (883)
                                                            ⋄

A.3.1.0.4 Theorem. Positive (semi)definite principal submatrices.^{A.10}

• A ∈ S^M is positive definite if and only if any one principal submatrix of dimension M−1 is positive definite and detA is positive.

• A ∈ S^M is positive semidefinite if and only if all M principal submatrices of dimension M−1 are positive semidefinite and detA is nonnegative.

• A ∈ S^M is positive semidefinite if and only if each and every principal submatrix is positive semidefinite. [45, §6.1]
                                                            ⋄

Regardless of symmetry, if A ∈ R^{M×M} is positive (semi)definite, then the determinant of each and every principal submatrix is (nonnegative) positive. [146, §1.3.1]

^{A.10} A recursive condition for positive (semi)definiteness, this theorem is a synthesis of facts from [28, §7.2] [26, §6.3] (confer [146, §1.3.1]). Principal submatrices are formed by discarding any subset of rows and columns having the same indices. There are M!/(1!(M−1)!) principal 1×1 submatrices, M!/(2!(M−2)!) principal 2×2 submatrices, and so on, totaling 2^M − 1 principal submatrices including A itself. By loading y in y^T Ay with various patterns of ones and zeros, it follows that any principal submatrix must be positive (semi)definite whenever A is.
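A small numerical illustration, not part of the original, of the interlacing (883) under a rank-one positive semidefinite update; seed and size are arbitrary:

import numpy as np

rng = np.random.default_rng(3)
M = 5
A = rng.standard_normal((M, M)); A = (A + A.T) / 2
q = rng.standard_normal(M)

lam = lambda X: np.sort(np.linalg.eigvalsh(X))[::-1]      # nonincreasing
lA, lAp = lam(A), lam(A + np.outer(q, q))
tol = 1e-9
for k in range(M):
    assert lA[k] - tol <= lAp[k]                          # lambda(A)_k <= lambda(A+qq^T)_k
    if k > 0:
        assert lAp[k] <= lA[k - 1] + tol                  # lambda(A+qq^T)_k <= lambda(A)_{k-1}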


A.3. PROPER STATEMENTS 307A.3.1.0.5Corollary. Positive (semi)definite symmetric products.• If A∈ S M is positive definite and any particular dimensionally compatiblematrix Z has no nullspace, then Z T AZ is positive definite.• If matrix A∈ S M is positive (semi)definite then, for any matrix Z ofcompatible dimension, Z T AZ is positive semidefinite.• A∈ S M is positive (semi)definite if and only if there exists a nonsingularZ such that Z T AZ is positive (semi)definite. [28, p.399]• If A ∈ S M is positive semidefinite and singular it remains possible, forsome Z ∈ R M×N with N < M , that Z T AZ becomes positive definite.[28, p.399] A.11 ⋄Given nonsingular matrix Z , it[is easy]to deduce from these: A ∈ S M isZTpositive (semi)definite if and only ifY T A[Z Y ] is positive semidefinitefor any particular compatible Y .Products such as Z † Z and ZZ † are symmetric and positive semidefinitealthough, given A ≽ 0, Z † AZ and ZAZ † are neither necessarily symmetricor positive semidefinite.A.3.1.0.6 Theorem. Symmetric projector semidefinite. [62, §III][55, §6] [147, p.55] For symmetric idempotent matrices P and R ,P,R ≽ 0P ≽ R ⇔ R(P ) ⊇ R(R) ⇔ N(P ) ⊆ N(R)(884)Projector P is never positive definite [27, §6.5, prob.20] unless it is theidentity matrix.⋄A.11 Using the interpretation in §E.6.4.2.1, this means that the coefficients of orthogonalprojection of vectorized A on a subset of extreme directions from S M + determined by Zcan be positive.
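A brief check (mine) of the first two bullets: congruence by any compatible Z preserves positive semidefiniteness, and preserves definiteness when Z has no nullspace; the random Z below has full column rank generically:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5)); A = A @ A.T + np.eye(5)   # A > 0
Z = rng.standard_normal((5, 3))                            # no nullspace (generically)

W = Z.T @ A @ Z
assert np.allclose(W, W.T)
assert np.all(np.linalg.eigvalsh(W) > 0)                   # Z^T A Z positive definite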


A.3.1.0.7 Theorem. Symmetric positive semidefinite.
Given real matrix Ψ with rank Ψ = 1,

   Ψ ≽ 0 ⇔ Ψ = uu^T                                       (885)
                                                            ⋄

Proof. Any rank-one matrix must have the form Ψ = uv^T. (§B.1) Suppose Ψ is symmetric; id est, v = u. Then for all y ∈ R^M, y^T uu^T y = (u^T y)^2 ≥ 0. Conversely, suppose uv^T is positive semidefinite. That can hold if and only if uv^T + vu^T ≽ 0 ⇔ 2y^T uv^T y ≥ 0 for all normalized y ∈ R^M; but that is possible only if v = u (to within a positive scale factor that may be absorbed into u).

The same does not hold true for matrices of higher rank, as the Example in §A.2.1.0.1 shows.


A.4. SCHUR COMPLEMENT 309A.4 Schur complementConsider the block matrix G : Given A T = A and C T = C , then [3][ ] A BG =B T ≽ 0C⇔ A ≽ 0, B T (I −AA † ) = 0, C −B T A † B ≽ 0⇔ C ≽ 0, B(I −CC † ) = 0, A−BC † B T ≽ 0(886)where A † denotes the Moore-Penrose (pseudo)inverse (§E). In the first instance,I − AA † is a symmetric projection matrix orthogonally projectingon N(A T ). (1202) It is apparently required thatR(B) ⊥ N(A T ) (887)which precludes A = 0 when B is any nonzero matrix. Note that A ≻ 0 ⇒A † =A −1 ; thereby, the projection matrix vanishes. Likewise, in the secondinstance, I − CC † projects orthogonally on N(C T ). It is required thatR(B T ) ⊥ N(C T ) (888)which precludes C = 0 for B nonzero. Again, C ≻ 0 ⇒ C † = C −1 . So weget, for A and C nonsingular,[ ] A BG =B T ≽ 0C(889)⇔ A ≻ 0, C −B T A −1 B ≽ 0⇔ C ≻ 0, A−BC −1 B T ≽ 0When A is full-rank then, for all B of compatible dimension, R(B) is inR(A). Likewise, when C is full-rank, R(B T ) is in R(C). Thus the variation,[ ] A BG =B T ≻ 0C(890)⇔ A ≻ 0, C −B T A −1 B ≻ 0⇔ C ≻ 0, A−BC −1 B T ≻ 0where C − B T A −1 B is called the Schur complement of A in G , while theSchur complement of C in G is A − BC −1 B T . [148, §4.8]
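The block test (889) is easy to exercise numerically; this sketch (not from the text) builds C so the Schur complement is positive definite, with ad hoc dimensions:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)); A = A @ A.T + np.eye(3)        # A > 0
B = rng.standard_normal((3, 2))
C = B.T @ np.linalg.inv(A) @ B + 0.5 * np.eye(2)                # forces PD Schur complement

G = np.block([[A, B], [B.T, C]])
schur = C - B.T @ np.linalg.solve(A, B)
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)                  # G >= 0
assert np.all(np.linalg.eigvalsh(schur) >= -1e-10)              # C - B^T A^{-1} B >= 0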


310 APPENDIX A. LINEAR ALGEBRAThe origin of the term Schur complement is from complementary inertia:[50, §2.4.4] Defineinertia ( G∈ S M) ∆ = {p,z,n} (891)where p,z,n respectively represent the number of positive, zero, and negativeeigenvalues of G ; id est,Then, when C is invertible,and when A is invertible,M = p + z + n (892)inertia(G) = inertia(C) + inertia(A − BC −1 B T ) (893)inertia(G) = inertia(A) + inertia(C − B T A −1 B) (894)When A = C = 0, denoting by σ(B) ∈ R m the non-increasingly orderedsingular values of matrix B ∈ R m×m , then we have the eigenvalues[39, §1.2, prob.17]([ ]) [ ]0 B σ(B)λ(G) = λB T =(895)0 −Ξσ(B)where Ξ is the order-reversing permutation matrix defined in (1046), andwe haveinertia(G) = inertia(B T B) + inertia(−B T B) (896)A.4.0.0.1 Example. Rank of partitioned matrices. [28, §0.4.6(c), §7.7.6]When symmetric matrix A is invertible and C is symmetric,[ ] [ ]A B A 0rankB T = rankC 0 T C −B T A −1 B⇔[ A B∃ nonsingular X,Y XB T C] [ A 0Y =0 T C −B T A −1 B](897)Proof. LetY = X T =[ I −A −1 B0 T I] (898)


A.4. SCHUR COMPLEMENT 311Similarly, when symmetric matrix C is invertible and A is symmetric,[ ] [ ]A B A − BCrankB T = rank−1 B T 0C0 T (899)CSetting matrix A to the identity simplifies the Schur conditions; one consequencerelates the definiteness of three quantities:[ ][ ]I 0I B0 T C −B T ≽ 0 ⇔ C − B T B ≽ 0 ⇔BB T ≽ 0 (900)CA.4.1Semidefinite program via SchurThe Schur complement (886) can be used to convert a projection problemto an optimization problem in epigraph form. Suppose, for example, we arepresented with the constrained projection problem studied by Wells in [97](who gives an analytical solution): Given A ∈ R M×M and some full-ranknonzero matrix S ∈ R M×L with L < M ,minimize ‖A − X‖ 2X∈ S MFsubject to S T XS ≽ 0(901)Variable X is constrained to be positive semidefinite, but only on a subspacedetermined by S . First we write the epigraph form (§3.1.1.2):minimize tX∈ S M , t∈Rsubject to ‖A − X‖ F ≤ tS T XS ≽ 0(902)Next we use the Schur complement [6, §6.4.3] [149] and matrix vectorization(§2.1.1):minimize tX∈ S M , t∈R[]tI vec(A − X)subject tovec T ≽ 0 (903)(A − X) tS T XS ≽ 0This semidefinite program is an epigraph form in disguise, equivalent to (901).
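The Schur-complement step from (902) to (903) can be verified numerically for fixed X and t; the check below is mine, with ad hoc data, and rests on the fact that the block matrix in (903) has Schur complement t − vec(A−X)^T vec(A−X)/t:

import numpy as np

rng = np.random.default_rng(28)
A = rng.standard_normal((3, 3)); X = rng.standard_normal((3, 3))
v = (A - X).reshape(-1)                    # vec(A - X)
n2 = v @ v

for t in (0.5 * np.sqrt(n2), 2.0 * np.sqrt(n2)):        # infeasible, then feasible
    G = np.block([[t * np.eye(v.size), v[:, None]],
                  [v[None, :], np.array([[t]])]])
    psd = np.all(np.linalg.eigvalsh(G) >= -1e-9)
    assert psd == (np.sqrt(n2) <= t)                    # PSD block  <=>  ||A - X||_F <= t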


312 APPENDIX A. LINEAR ALGEBRAA.4.2Determinant[ ] A BG =B T C(904)We consider again a matrix G partitioned similarly to (886), but not necessarilypositive (semi)definite, where A,C ∈ S M .• When A is invertible,When C is invertible,detG = detA det(C − B T A −1 B) (905)detG = detC det(A − BC −1 B T ) (906)• When B is full-rank and skinny, C = 0, and A ≽ 0, then [1, §10.1.1]detG ≠ 0 ⇔ A + BB T ≻ 0 (907)When B is a (column) vector, then for all C ∈ R and all A of dimensioncompatible with G,detG = det(A)C − B T A T cofB (908)while for C ≠ 0,detG = C det(A − 1 C BBT ) (909)• When B is full-rank and fat, A = 0, and C ≽ 0, thendetG ≠ 0 ⇔ C + B T B ≻ 0 (910)When B is a row vector, then for A≠0 and all C of dimension compatiblewith G,while for all A∈ R ,detG = A det(C − 1 A BT B) (911)detG = det(C)A − BC T cofB T (912)where A cof and C cof are matrices of cofactors [26, §4] (the adjugates) respectivelycorresponding to A and C .
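A quick numerical confirmation (not from the book) of the determinant formula (905):

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 3)); A = (A + A.T) / 2 + np.eye(3)   # symmetric, invertible
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2)); C = (C + C.T) / 2

G = np.block([[A, B], [B.T, C]])
lhs = np.linalg.det(G)
rhs = np.linalg.det(A) * np.linalg.det(C - B.T @ np.linalg.solve(A, B))
assert np.isclose(lhs, rhs)                                      # (905)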


A.5 Eigen-decomposition

When a matrix X ∈ R^{m×m} is diagonalizable, [26, §5.6] then

   X = SΛS^{-1} = [ s_1 ··· s_m ] Λ [ w_1^T ; ... ; w_m^T ] = Σ_{i=1}^m λ_i s_i w_i^T     (913)

where s_i ∈ C^m are linearly independent (right-)eigenvectors constituting the columns of S ∈ C^{m×m} defined by

   XS = SΛ                                                (914)

w_i^T ∈ C^m are linearly independent left-eigenvectors of X constituting the rows of S^{-1} defined by [28]

   S^{-1}X = ΛS^{-1}                                      (915)

and where λ_i ∈ C are eigenvalues (in diagonal matrix Λ ∈ C^{m×m}) corresponding to both left and right eigenvectors.

There is no connection between diagonalizability and invertibility of X. [26, §5.2] Diagonalizability is guaranteed by a full set of linearly independent eigenvectors, whereas invertibility is guaranteed by all nonzero eigenvalues.

   distinct eigenvalues ⇒ diagonalizable
   not diagonalizable ⇒ repeated eigenvalue               (916)

A.5.0.0.1 Theorem. Real eigenvector.
Eigenvectors of a real matrix corresponding to real eigenvalues must be real.

Proof. Ax = λx. Given λ = λ*, x^H Ax = λ x^H x = λ‖x‖² = x^T Ax* ⇒ x = x*, where x^H = x*^T. The converse is equally simple.

A.5.0.1 Uniqueness

From the fundamental theorem of algebra it follows: eigenvalues, including their multiplicity, for a given matrix are unique. When eigenvectors are unique, we mean their directions are unique to within a real linear scaling. Eigenvectors corresponding to a repeated eigenvalue of a diagonalizable matrix are not unique. [99]

   diagonalizable, repeated eigenvalue ⇒ eigenvectors not unique       (917)
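A short NumPy illustration (mine, not the book's) of (913)-(915): the rows of S^{-1} act as left-eigenvectors, and the dyad expansion reproduces X; the random test matrix is diagonalizable generically:

import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((4, 4))
lam, S = np.linalg.eig(X)
W = np.linalg.inv(S)                            # rows w_i^T are left-eigenvectors

assert np.allclose(S @ np.diag(lam) @ W, X)     # (913)
assert np.allclose(W @ X, np.diag(lam) @ W)     # (915)
recon = sum(lam[i] * np.outer(S[:, i], W[i, :]) for i in range(4))
assert np.allclose(recon, X)                    # dyad expansion of (913)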


314 APPENDIX A. LINEAR ALGEBRAThe proof follows from the observation that any linear combination of distincteigenvectors, corresponding to a particular eigenvalue, produces anothereigenvector.A.5.1EigenmatrixThe (right-)eigenvectors {s i } are naturally orthogonal to the left-eigenvectors{w i } except, for i = 1... m , w T i s i = 1 ; called a biorthogonality condition[43, §2.2.4] [28] because neither set of left or right eigenvectors is necessarilyan orthogonal set. Consequently, each dyad from a diagonalization is anindependent (§B.1.1) nonorthogonal projector becauses i w T i s i w T i = s i w T i (918)(whereas the dyads of singular value decomposition are not inherently projectors(confer(922))).The dyads of eigen-decomposition can be termed eigenmatrices becauseA.5.2Symmetric diagonalizationX s i w T i = λ i s i w T i (919)The set of normal matrices is, precisely, that set of all real matrices havinga complete orthonormal set of eigenvectors; [45, §8.1] [27, prob.10.2.31] e.g.,orthogonal and circulant matrices [81]. All normal matrices are diagonalizable.A symmetric matrix is a special normal matrix, whose eigenvaluesmust be real and whose eigenvectors can be chosen to make a real orthonormalset; id est, for X ∈ S m ,⎡ ⎤s T1 m∑X = SΛS T = [ s 1 · · · s m ] Λ⎣. ⎦ = λ i s i s T i (920)where δ 2 (Λ) = Λ∈ R m×m (§A.1) and S −1 = S T ∈ R m×m (§B.5). Because thearrangement of eigenvectors and their corresponding eigenvalues is arbitrary,we almost always arrange the eigenvalues in nonincreasing order as is theconvention for singular value decomposition.When X ∈ S m + , its unique matrix square root is defined√X ∆ = S √ ΛS T ∈ S m + (921)s T mi=1


A.6. SINGULAR VALUE DECOMPOSITION, SVD 315where the square root of the nonnegative diagonal matrix √ Λ is taken entrywise.Then X = √ X √ X .A.6 Singular value decomposition, SVDA.6.1Compact SVDFor any A∈ R m×n ,⎡q T1A = UΣQ T = [ u 1 · · · u η ] Σ⎣.⎤∑⎦ = η σ i u i qiTqηT i=1 (922)U ∈ R m×η , Σ ∈ R η×η , Q ∈ R n×ηwhere U and Q are always skinny-or-square each having orthonormalcolumns, and whereη ∆ = min{m,n} (923)Square matrix Σ is diagonal (§A.1)δ 2 (Σ) = Σ ∈ R η×η (924)holding the singular values σ i of A which are always arranged in nonincreasingorder by convention and are related to eigenvalues by A.12⎧ √⎨ λ(AT A) i = √ (√ )λ(AA T ) i = λ AT A)= λ(√AAT> 0, i = 1... ρσ(A) i =ii⎩0, i = ρ+1... η(925)of which the last η −ρ are 0 , A.13 whereρ ∆ = rankA = rank Σ (926)A point sometimes lost is that any real matrix may be decomposed in termsof its real singular values σ(A)∈ R η and real matrices U and Q as in (922),where [44, §2.5.3]A.12 When A is normal, σ(A) = |λ(A)|. [45, §8.1]A.13 For η = n, σ(A) = √ )λ(A T A) = λ(√AT A where λ denotes eigenvalues. For η = m,σ(A) = √ )λ(AA T ) = λ(√AAT.


316 APPENDIX A. LINEAR ALGEBRAR{u i |σ i ≠0} = R(A)R{u i |σ i =0} ⊆ N(A T )R{q i |σ i ≠0} = R(A T )R{q i |σ i =0} ⊆ N(A)(927)A.6.2Subcompact SVDSome authors allow only nonzero singular values. In that case the compactdecomposition can be made smaller; it can be redimensioned in terms of rankρ because, for any A∈ R m×n ,ρ = rankA = rank Σ = max {i∈{1... η} | σ i ≠ 0} ≤ η (928)rank is equivalent to the number of nonzero singular values, as is well known.Now⎡q T1 ⎤∑A = UΣQ T = [ u 1 · · · u ρ ] Σ⎣. ⎦ = ρ σ i u i qiTqρT i=1 (929)U ∈ R m×ρ , Σ ∈ R ρ×ρ , Q ∈ R n×ρwhere the main diagonal of diagonal matrix Σ has no 0 entries, andR{u i } = R(A)R{q i } = R(A T )(930)A.6.3Full SVDAnother common and useful expression of the SVD makes U and Q square;making the decomposition larger than compact SVD. Completing the nullspacebases in U and Q from (927) provides what is called the full singularvalue decomposition of A∈ R m×n [26, App.A]. Orthonormal matrices U andQ become orthogonal matrices (§B.5):R{u i |σ i ≠0} = R(A)R{u i |σ i =0} = N(A T )R{q i |σ i ≠0} = R(A T )R{q i |σ i =0} = N(A)(931)
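The subspace assignments (925) and (931) are easy to observe numerically; the rank-2 test matrix below is ad hoc, not from the text:

import numpy as np

rng = np.random.default_rng(27)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4, rank 2
U, s, Qt = np.linalg.svd(A)                                     # full SVD
rho = np.sum(s > 1e-10)

ev = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(np.clip(ev, 0, None)))            # (925), here eta = n
assert np.allclose(A.T @ U[:, rho:], 0)                         # trailing u_i span N(A^T)
assert np.allclose(A @ Qt[rho:, :].T, 0)                        # trailing q_i span N(A)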


A.6. SINGULAR VALUE DECOMPOSITION, SVD 317For any matrix A with rank ρ (= rank Σ) ,⎡q T1A = UΣQ T = [u 1 · · · u m ] Σ⎣.q T n⎤∑⎦ = η σ i u i qiT⎡σ 1= [ m×ρ basis R(A) m×m−ρ basis N(A T ) ] σ 2⎢⎣...i=1⎤⎡⎥⎣⎦(n×ρ basis R(A T ) ) ⎤T⎦(n×n−ρ basis N(A)) TU ∈ R m×m , Σ ∈ R m×n , Q ∈ R n×n (932)where the upper limit of summation η is defined in (923). Matrix Σ is nolonger necessarily square, now padded with respect to (924) by m−η zerorows or n−η zero columns; the nonincreasingly ordered (possibly 0) singularvalues appear along its main diagonal as for compact SVD (925).A.6.4SVD of symmetric matricesDefinition. Step function. Define the signum-like entry-wise vectorvaluedfunction ψ : R n → R n that takes the value 1 corresponding to givenargument entry 0. For a∈ R ,{ψ(a) = ∆ x 1, a ≥ 0limx→a |x| = −1, a < 0(933)Were argument a instead a vector, then |x| denotes absolute value of eachindividual entry while the division is also entry-wise.△Signs of the real eigenvalues of a symmetric matrix, for example, can beabsorbed into either real U or Q from the SVD; [150, p.34]A = U|Λ|δ(ψ(δ(Λ)))U T = Qδ(ψ(δ(Λ))) |Λ|Q T ∈ S n (934)where |Λ| denotes entry-wise absolute value of diagonal matrix Λ . Asymmetric but complex SVD of a (real) symmetric matrix is given in§C.2.1.2.1.
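A numerical note (not part of the text): for symmetric A the singular values are the absolute eigenvalues, and absorbing the signs as in (934) turns the eigendecomposition into a valid SVD:

import numpy as np

rng = np.random.default_rng(24)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
lam, W = np.linalg.eigh(A)

assert np.allclose(np.sort(np.abs(lam)),
                   np.sort(np.linalg.svd(A, compute_uv=False)))  # sigma(A) = |lambda(A)|

signs = np.where(lam >= 0, 1.0, -1.0)             # the step function psi of (933)
U = W * signs                                     # U = W delta(psi(delta(Lambda)))
assert np.allclose(U @ np.diag(np.abs(lam)) @ W.T, A)   # A = U |Lambda| Q^T with Q = W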


318 APPENDIX A. LINEAR ALGEBRAA.6.5Pseudoinverse by SVDThe matrix pseudoinverse (§E) is nearly synonymous with singular valuedecomposition because of the elegant expression,A † = QΣ †T U T ∈ R n×m (935)where Σ † simply inverts the nonzero entries of matrix Σ .Given symmetric matrix A and its diagonalization, then its pseudoinversesimply inverts the nonzero eigenvalues; for A∈ S n ,A = QΛQ T , A † = QΛ † Q T (936)A.7 ZerosA.7.10 entryIf a positive semidefinite matrix A = [A ij ] ∈ R n×n has a 0 entry A ii on itsmain diagonal, then A ij + A ji = 0 ∀j . [146, §1.3.1]Any symmetric positive semidefinite matrix having a 0 entry on its maindiagonal must be 0 along the entire row and column to which that 0 entrybelongs. [44, §4.2.8] [28, §7.1, prob.2]A.7.2 0 eigenvalues theoremA.7.2.0.1 Theorem. Number of 0 eigenvalues.For any matrix A∈ R m×n ,rank(A) + dim N(A) = n (937)by conservation of dimension. [28, §0.4.4]For any square matrix A∈ R m×m , the number of 0 eigenvalues is at leastequal to dim N(A) ,dim N(A) ≤ number of 0 eigenvalues ≤ m (938)while the eigenvectors corresponding to those 0 eigenvalues belong to N(A).[26, §5.1] A.14A.14 We take as given the well-known fact that the number of 0 eigenvalues cannot be less


A.7. ZEROS 319For diagonalizable matrix A (§A.5), the number of 0 eigenvalues is preciselydim N(A) while the corresponding eigenvectors span N(A). The realand imaginary parts of the eigenvectors remaining span R(A).TRANSPOSE.Likewise, for any matrix A∈ R m×n ,rank(A T ) + dim N(A T ) = m (939)For any square A∈ R m×m , the number of 0 eigenvalues is at least equalto dim N(A T ) = dim N(A) while the left-eigenvectors (eigenvectors of A T )corresponding to those 0 eigenvalues belong to N(A T ).For diagonalizable A , the number of 0 eigenvalues is preciselydim N(A T ) while the corresponding left-eigenvectors span N(A T ). The realand imaginary parts of the left-eigenvectors remaining span R(A T ).⋄Proof. First we show, for a diagonalizable matrix, the number of 0eigenvalues is precisely the dimension of its nullspace while the eigenvectorscorresponding to those 0 eigenvalues span the nullspace:Any diagonalizable matrix A∈ R m×m must possess a complete set oflinearly independent eigenvectors. If A is full-rank (invertible), then allm=rank(A) eigenvalues are nonzero. [26, §5.1]Suppose rank(A) < m . Then dim N(A) = m−rank(A). Thus there isa set of m−rank(A) linearly independent vectors spanning N(A). Each ofthose can be an eigenvector associated with a 0 eigenvalue because A is diagonalizable⇔ ∃ m linearly independent eigenvectors. [26, §5.2] Eigenvectorsof a real matrix corresponding to 0 eigenvalues must be real. A.15 Thus A hasat least m−rank(A) eigenvalues equal to 0.than the dimension of the nullspace. We offer an example of the converse:⎡ ⎤1 0 1 0A = ⎢ 0 0 1 0⎥⎣ 0 0 0 0 ⎦1 0 0 0dim N(A) = 2, λ(A) = [0 0 0 1] T ; three eigenvectors in the nullspace but only two areindependent. The right-hand side of (938) is tight for nonzero matrices; e.g., (§B.1) dyaduv T ∈ R m×m has m 0-eigenvalues when u ∈ v ⊥ .A.15 Let ∗ denote complex conjugation. Suppose A=A ∗ and As i = 0. Then s i = s ∗ i ⇒As i =As ∗ i ⇒ As ∗ i =0. Conversely, As∗ i =0 ⇒ As i=As ∗ i ⇒ s i= s ∗ i .


320 APPENDIX A. LINEAR ALGEBRANow suppose A has more than m−rank(A) eigenvalues equal to 0. Thenthere are more than m−rank(A) linearly independent eigenvectors associatedwith 0 eigenvalues, and each of those eigenvectors must be in N(A). Thusthere are more than m−rank(A) linearly independent vectors in N(A) ; acontradiction.Therefore diagonalizable A has rank(A) nonzero eigenvalues and exactlym−rank(A) eigenvalues equal to 0 whose corresponding eigenvectors spanN(A).By similar argument, the left-eigenvectors corresponding to 0 eigenvaluesspan N(A T ).Next we show when A is diagonalizable, the real and imaginary parts ofits eigenvectors (corresponding to nonzero eigenvalues) span R(A):The right-eigenvectors of a diagonalizable matrix A∈ R m×m are linearlyindependent if and only if the left-eigenvectors are. So, matrix A has a representationin terms of its right and left-eigenvectors; from the diagonalization(913), assuming 0 eigenvalues are ordered last,A =m∑λ i s i wi T =i=1k∑≤ mi=1λ i ≠0λ i s i w T i (940)From the linearly independent dyads theorem (§B.1.1.0.1), the dyads {s i w T i }must be independent because each set of eigenvectors are; hence rankA=k ,the number of nonzero eigenvalues. Complex eigenvectors and eigenvaluesare common for real matrices, and must come in complex conjugate pairs forthe summation to remain real. Assume that conjugate pairs of eigenvaluesappear in sequence. Given any particular conjugate pair from (940), we getthe partial summationλ i s i w T i + λ ∗ i s ∗ iw ∗Ti = 2Re(λ i s i w T i )= 2 ( Re s i Re(λ i w T i ) − Im s i Im(λ i w T i ) ) (941)where A.16 λ ∗ iwrittenA = 2 ∑ iλ ∈ Cλ i ≠0∆=λ i+1 , s ∗ ∆i = s i+1 , and wi∗∆= w i+1 . Then (940) is equivalentlyRe s 2i Re(λ 2i w T 2i) − Im s 2i Im(λ 2i w T 2i) + ∑ jλ ∈ Rλ j ≠0λ j s j w T j (942)A.16 The complex conjugate of w is denoted w ∗ , while its conjugate transpose is denotedby w H = w ∗T .


A.7. ZEROS 321The summation (942) shows that A is a linear combination of real and imaginaryparts of its right-eigenvectors corresponding to nonzero eigenvalues.Therefore, the k vectors {Re s i , Im s i | λ i ≠ 0, i∈{1... m}} must span therange of diagonalizable matrix A .The argument is similar regarding the span of the left-eigenvectors.A.7.30 trace and productFor X,A∈ S M + [7, §2.6.1, exer.2.8] [69, §3.1],tr(XA) = 0 ⇔ XA = AX = 0 (943)Symmetric matrices A and X are simultaneously diagonalizable because theycommute; (856) [28, p.50] id est, they share a complete set of eigenvectors.Proof. (⇐=) Suppose XA = AX = 0. Then tr(XA)=0 is obvious.(=⇒) Suppose tr(XA) = 0. tr(XA) = tr(A 1/2 XA 1/2 ) whose argument ispositive semidefinite by Corollary A.3.1.0.5. Trace of any square matrix isequivalent to the sum of its eigenvalues. Eigenvalues of a positive semidefinitematrix can total 0 if and only if each and every nonnegative eigenvalue is0. The only feasible positive semidefinite matrix, having all 0 eigenvalues,resides at the origin; id est,A 1/2 XA 1/2 = ( X 1/2 A 1/2) TX 1/2 A 1/2 = 0 (944)which in turn implies XA=0. A similar argument shows AX = 0. A.7.4Zero definiteThe domain over which an arbitrary real matrix A is zero definite can exceedits left and right nullspaces. The positive semidefinite matrix[ ] 1 2A =(945)0 1for example, has no nullspace. Yet{x | x T Ax = 0} = {x | 1 T x = 0} ⊂ R 2 (946)


322 APPENDIX A. LINEAR ALGEBRAwhich is the nullspace of the symmetrized matrix. Symmetric matrices arenot spared from the excess; videlicet,[ ] 1 2B =(947)2 1has eigenvalues {−1, 3}, no nullspace, but is zero definite on A.17X = {x∈ R 2 | x 2 = (−2 ± √ 3)x 1 } (948)For any positive semidefinite matrix A∈ R M×M , in particular,{x | x T Ax = 0, A +A T ≽ 0} = N(A +A T ) (949)because ∃R A+A T =R T R , ‖Rx‖=0 ⇔ Rx=0, and N(A+A T )= N(R).For a positive definite matrix A ,Further, [45, §3.2, prob.5]{x | x T Ax = 0, A +A T ≻ 0} = 0 (950){x | x T Ax = 0} = R M ⇔ A T = −A (951)while{x | x H Ax = 0} = C M ⇔ A = 0 (952)A.7.4.0.1 Lemma. Dyad-decompositions... [151]Given symmetric matrix A ∈ S M , let X ∈ S M + be a symmetric positivesemidefinite matrix having rank ρ such that 〈A, X〉 = 0. Then there is adyad-decomposition of Xρ∑X = x j x T j (953)satisfyingProof...j=1〈A, x j x T j 〉 = 〈A, X〉 = 0 for all j (954)A.17 These two lines represent the limit in the union of two generally distinct hyperbolae;id est,lim{x∈ R 2 | x T Bx = ε} = Xε↓0
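The example (945)-(946) and the identity (949) can be checked directly; this snippet is mine, not the book's:

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])    # the positive semidefinite matrix of (945)
x = np.array([1.0, -1.0])                 # spans {x : 1^T x = 0}

assert np.isclose(x @ A @ x, 0)           # x^T A x = 0 although A has no nullspace
assert np.allclose((A + A.T) @ x, 0)      # x lies in N(A + A^T), as (949) requires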


A.7. ZEROS 323A.7.4.0.2 Example. Dyad.The dyad uv T ∈ R M×M (§B.1) is zero definite on all x for which eitherx T u=0 or x T v=0;{x | x T uv T x = 0} = {x | x T u=0} ∪ {x | v T x=0} (955)id est, on u ⊥ ∪ v ⊥ . Symmetrizing the dyad does not change the outcome:{x | x T (uv T + vu T )x/2 = 0} = {x | x T u=0} ∪ {x | v T x=0} (956)




Appendix BSimple matricesMathematicians also attempted to develop algebra of vectors butthere was no natural definition of the product of two vectorsthat held in arbitrary dimensions. The first vector algebra thatinvolved a noncommutative vector product (that is, v×w need notequal w×v) was proposed by Hermann Grassmann in his bookAusdehnungslehre (1844). Grassmann’s text also introduced theproduct of a column matrix and a row matrix, which resulted inwhat is now called a simple or a rank-one matrix. In the late19th century the American mathematical physicist Willard Gibbspublished his famous treatise on vector analysis. In that treatiseGibbs represented general matrices, which he called dyadics, assums of simple matrices, which Gibbs called dyads. Later thephysicist P. A. M. Dirac introduced the term “bra-ket” for whatwe now call the scalar product of a “bra” (row) vector times a“ket” (column) vector and the term “ket-bra” for the product of aket times a bra, resulting in what we now call a simple matrix, asabove. Our convention of identifying column matrices and vectorswas introduced by physicists in the 20th century.−Marie A. Vitulli, [130]B.0 c○ 2001 Jon Dattorro, all rights reserved.325


326 APPENDIX B. SIMPLE MATRICESB.1 Rank-one matrix (dyad)Any matrix formed from the unsigned outer product of two vectors,Ψ = uv T ∈ R M×N (957)where u ∈ R M and v ∈ R N , is rank-one and called a dyad. Conversely, anyrank-one matrix must have the form Ψ . [28, prob.1.4.1] The product −uv Tis a negative dyad. For matrix products AB T , in general, we haveR(AB T ) ⊆ R(A), N(AB T ) ⊇ N(B T ) (958)with equality when B = A [26, §3.3, §3.6] B.1 or respectively when B is invertibleand N(A)=0. Yet for all nonzero dyads we haveR(uv T ) = R(u), N(uv T ) = N(v T ) ≡ v ⊥ (959)where dim v ⊥ =N −1.It is obvious a dyad can be 0 only when u or v is 0;Ψ = uv T = 0 ⇔ u = 0 or v = 0 (960)The matrix 2-norm for Ψ is equivalent to the Frobenius norm;‖Ψ‖ = ‖uv T ‖ F = ‖uv T ‖ 2 = ‖u‖ ‖v‖ (961)When u and v are normalized, the pseudoinverse is the transposed dyad.Otherwise,Ψ † = (uv T ) † vu T=(962)‖u‖ 2 ‖v‖ 2When dyad uv T ∈R N×N is square, uv T has at least N −1 0-eigenvaluesand corresponding eigenvectors spanning v ⊥ . The remaining eigenvector uspans the range of uv T with corresponding eigenvalueB.1 Proof. R(AA T ) ⊆ R(A) is obvious.λ = v T u = tr(uv T ) ∈ R (963)R(AA T ) = {AA T y | y ∈ R m }⊇ {AA T y | A T y ∈ R(A T )} = R(A) by (90).
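A few dyad facts, (959) and (961)-(963), verified numerically (sketch, not from the book; the vectors are random):

import numpy as np

rng = np.random.default_rng(14)
u = rng.standard_normal(4); v = rng.standard_normal(4)
Psi = np.outer(u, v)

assert np.linalg.matrix_rank(Psi) == 1
assert np.isclose(np.linalg.norm(Psi, 2),
                  np.linalg.norm(u) * np.linalg.norm(v))                            # (961)
assert np.allclose(np.linalg.pinv(Psi),
                   np.outer(v, u) / (np.linalg.norm(u)**2 * np.linalg.norm(v)**2))  # (962)
assert np.isclose(np.trace(Psi), v @ u)                                             # (963)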


B.1. RANK-ONE MATRIX (DYAD) 327R(v)0 0 R(Ψ) = R(u)N(Ψ)= N(v T )N(u T )R N = R(v) ⊕ N(uv T )N(u T ) ⊕ R(uv T ) = R MFigure B.1: The four fundamental subspaces [27, §3.6] of any dyadΨ = uv T ∈R M×N . Ψ(x) = uv T x is a linear mapping from R N to R M . Themap from R(v) to R(u) is one-to-one and onto (bijective). [26, §3.1]The determinant is the product of the eigenvalues; so, it is always true thatdet Ψ = det(uv T ) = 0 (964)When λ = 1, the square dyad is a nonorthogonal projector on its range(Ψ 2 =Ψ , §E.1). It is quite possible that u∈v ⊥ making the remaining eigenvalueinstead 0 ; B.2 λ=0 together with the first N −1 0-eigenvalues; id est,it is possible uv T were nonzero while all its eigenvalues are 0. The matrix[ 1−1] [ 1 1]=[ 1 1−1 −1](965)for example, has two 0-eigenvalues. In other words, the value of eigenvectoru may simultaneously be a member of the nullspace and range of the dyad.The explanation is, simply, because u and v share the same dimension,dimu = M = dimv = N :Figure B.1 shows the four fundamental subspaces for the dyad. Linearoperator Ψ(x) : R N → R M is a mapping between vector spaces that remaindistinct when M =N ;u ∈ R(uv T )u ∈ N(uv T ) ⇔ v T u = 0R(uv T ) ∩ N(uv T ) = ∅(966)B.2 The dyad is not always diagonalizable (§A.5) because the eigenvectors are not necessarilyindependent.


328 APPENDIX B. SIMPLE MATRICESB.1.0.1rank-one modificationIf A is any nonsingular compatible matrix and 1 + v T A −1 u ≠ 0, then[152, App.6] [45, §2.3, prob.16] [148, §4.11.2] (Sherman-Morrison)(A + uv T ) −1 = A −1 − A−1 uv T A −11 + v T A −1 u(967)B.1.0.2dyad symmetryIn the specific circumstance that v = u , then uu T ∈ R N×N is symmetric,rank-one, and positive semidefinite having exactly N −1 0-eigenvalues. Infact, (§A.3.1.0.7)uv T ≽ 0 ⇔ v = u (968)and the remaining eigenvalue is almost always positive;λ = u T u = tr(uu T ) > 0 unless u=0 (969)The matrix [ Ψ uu T 1for example, is rank 1 positive semidefinite if and only if Ψ = uu T .](970)B.1.1Dyad independenceNow we consider a sum of dyads like (957) as encountered in diagonalizationand singular value decomposition:( k∑)k∑R s i wiT = R ( )k∑s i wiT = R(s i ) ⇐ w i ∀i are l.i. (971)i=1i=1i=1range of the summation is the vector sum of ranges. B.3 (Theorem B.1.1.1.1)Under the assumption the dyads are linearly independent (l.i.), then thevector sums are unique (p.447): ∀i, w i l.i. and s i l.i.,( k∑)R s i wiT = R ( ) ( )s 1 w1 T ⊕ ... ⊕ R sk wk T = R(s1 ) ⊕ ... ⊕ R(s k ) (972)i=1B.3 Movement of range R inside the summation depends on linear independence of {w i }.
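The Sherman-Morrison identity (967) admits a one-screen numerical check (mine); the shift 4I merely keeps the random A nonsingular:

import numpy as np

rng = np.random.default_rng(15)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
u = rng.standard_normal(4); v = rng.standard_normal(4)
Ainv = np.linalg.inv(A)
denom = 1 + v @ Ainv @ u
assert abs(denom) > 1e-12                 # identity requires 1 + v^T A^{-1} u != 0

lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / denom
assert np.allclose(lhs, rhs)              # (967)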


B.1. RANK-ONE MATRIX (DYAD) 329B.1.1.0.1 Definition. Linearly independent dyads. [153, p.29, thm.11][154, p.2] The set of k dyads{si wi T | i=1... k } (973)where s i ∈C M and w i ∈C N , is said to be linearly independent iff()k∑rank SW T =∆ s i wiT = k (974)i=1where S ∆ = [s 1 · · · s k ] ∈ C M×k and W ∆ = [w 1 · · · w k ] ∈ C N×k .△As defined, dyad independence does not preclude existence of a nullspaceN(SW T ) , nor does it imply SW T is full-rank. In the absence of an assumptionof independence, generally, rankSW T ≤ k . Conversely, any rank-kmatrix can be written in the form SW T by singular value decomposition.(§A.6)Theorem. Linearly independent dyads. The vectors{si ∈ C M , i=1... k } are linearly independent and the vectors{wi ∈ C N , i=1... k } are linearly independent if and only if the dyads{si w T i , i=1... k } are linearly independent. ⋄Proof. Linear independence of k dyads is identical to definition (974).(=⇒) Suppose {s i } and {w i } are each linearly independent sets. InvokingSylvester’s rank inequality, [28, §0.4] [45, §2.4]rankS+rankW− k ≤ rank(SW T ) ≤ min{rankS , rankW } (≤ k) (975)Then k ≤rank(SW T )≤k that implies the dyads are independent.(⇐=) Conversely, suppose rank(SW T )=k . Thenk ≤ min{rankS , rankW } ≤ k (976)implying the vector sets are each independent.B.1.1.1Biorthogonality condition, Range, NullspaceDyads characterized by a biorthogonality condition W T S = I are independent;id est, for S ∈C M×k and W ∈ C N×k , if W T S =I then rank(SW T )=k


330 APPENDIX B. SIMPLE MATRICESby the linearly independent dyads theorem because (e.g., §E.1.1)W T S =I ⇔ rankS=rankW =k≤M =N (977)To see that, we need only show: N(S)=0 ⇔ ∃ B BS =I . B.4(⇐=) Assume BS =I . Then N(BS)=0={x | BSx = 0} ⊇ N(S). (958)(=⇒) If N(S)=0 [ then ] S must be full-rank skinny-or-square.B∴ ∃ A,B,C [ S A] = I (id est, [ S A] is invertible) ⇒ BS =I .CLeft inverse B is given as W T here. Because of reciprocity with S , itimmediately follows: N(W)=0 ⇔ ∃ S S T W =I . Dyads produced by diagonalization, for example, are independent becauseof their inherent biorthogonality. (§A.5.1) The converse is generally false;id est, linearly independent dyads are not necessarily biorthogonal.B.1.1.1.1 Theorem. Nullspace and range of dyad sum.Given a sum of dyads represented by SW T where S ∈C M×k and W ∈ C N×k ,N(SW T ) = N(W T ) ⇐ ∃ B BS = IR(SW T ) = R(S) ⇐ ∃ Z W T Z = I(978)⋄Proof. (=⇒) N(SW T )⊇ N(W T ) and R(SW T )⊆ R(S) are obvious.(⇐=) Assume the existence of a left inverse B ∈ R k×N and a right inverseZ ∈ R N×k . B.5N(SW T ) = {x | SW T x = 0} ⊆ {x | BSW T x = 0} = N(W T ) (979)R(SW T ) = {SW T x | x∈R N } ⊇ {SW T Zy | Zy ∈ R N } = R(S) (980)Under the biorthogonality condition W T S = I of diagonalization,for example, nullspace N(SW T ) must be equivalent to N(W T ) andR(SW T )≡ R(S).B.4 Left inverse is not unique, in general.B.5 By counter example, the converse cannot be true; e.g., S = W = [1 0].


B.2. DOUBLET 331R([u v ])R(Π)= R([u v])0 0 N(Π)=v ⊥ ∩ u ⊥([ ]) vR N T= R([u v ]) ⊕ Nu Tv ⊥ ∩ u ⊥([ ]) vTNu T ⊕ R([u v ]) = R NFigure B.2: Four fundamental subspaces [27, §3.6] of a doubletΠ = uv T + vu T ∈ S N . Π(x) = (uv T + vu T )x is a linear bijective mappingfrom R([u v ]) to R([u v ]).B.2 DoubletConsider a sum of two linearly independent square dyads, one a transpositionof the other:Π = uv T + vu T = [ u v ] [ ]v Tu T = SW T ∈ S N (981)where u,v ∈ R N . Like the dyad, a doublet can be 0 only when u or v is 0;Π = uv T + vu T = 0 ⇔ u = 0 or v = 0 (982)By assumption of independence, a nonzero doublet has two nonzero eigenvaluesλ 1 ∆ = u T v + ‖uv T ‖ , λ 2 ∆ = u T v − ‖uv T ‖ (983)where λ 1 > 0 >λ 2 , with corresponding eigenvectorsx 1 ∆ = u‖u‖ + v‖v‖ , x 2∆= u‖u‖ −v‖v‖(984)spanning the doublet range. Eigenvalue λ 1 cannot be 0 unless u and v haveopposing directions, but that is antithetical since then the dyads would nolonger be independent. Eigenvalue λ 2 is 0 if and only if u and v share thesame direction, again antithetical. Generally we have λ 1 > 0 and λ 2 < 0, soΠ is indefinite.
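The doublet eigenvalues (983), and the remaining N−2 zero eigenvalues discussed next, can be observed numerically; this check is not from the text and uses random u, v:

import numpy as np

rng = np.random.default_rng(16)
u = rng.standard_normal(5); v = rng.standard_normal(5)
Pi = np.outer(u, v) + np.outer(v, u)

w = np.sort(np.linalg.eigvalsh(Pi))
lam1 = u @ v + np.linalg.norm(u) * np.linalg.norm(v)     # positive eigenvalue, (983)
lam2 = u @ v - np.linalg.norm(u) * np.linalg.norm(v)     # negative eigenvalue
assert np.isclose(w[-1], lam1) and np.isclose(w[0], lam2)
assert np.allclose(w[1:-1], 0)                           # the other N-2 eigenvalues vanish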


332 APPENDIX B. SIMPLE MATRICESN(u T )R(E) = N(v T )0 0 N(E)=R(u)R(v)R N = N(u T ) ⊕ N(E)R(v) ⊕ R(E) = R NFigure B.3: v T u = 1/ζ . The four fundamental subspaces)[27, §3.6] of elementarymatrix E as a linear mapping E(x)=(I − uvT x .v T uBy the nullspace and range of dyad sum theorem, doublet Π hasN −2 ([ 0-eigenvalues ]) remaining and corresponding eigenvectors spanningvTNu T . We therefore haveR(Π) = R([u v ]), N(Π) = v ⊥ ∩ u ⊥ (985)of respective dimension 2 and N −2.B.3 Elementary matrixA matrix of the formE = I − ζuv T ∈ R N×N (986)where ζ ∈ R is finite and u,v ∈ R N , is called an elementary matrix or arank-one modification of the identity. [155] Any elementary matrix in R N×Nhas N −1 eigenvalues equal to 1 corresponding to real eigenvectors thatspan v ⊥ . The remaining eigenvalueλ = 1 − ζv T u (987)corresponds to eigenvector u . B.6 From [152, App.7.A.26] the determinant:detE = 1 − tr ( ζuv T) = λ (988)B.6 Elementary matrix E is not always diagonalizable because eigenvector u need not beindependent of the others; id est, u∈v ⊥ is possible.


B.3. ELEMENTARY MATRIX 333If λ ≠ 0 then E is invertible; [148]E −1 = I + ζ λ uvT (989)Eigenvectors corresponding to 0 eigenvalues belong to N(E), and thenumber of 0 eigenvalues must be at least dim N(E) that, here, can beat most one. (§A.7.2.0.1) The nullspace exists, therefore, when λ=0;id est, when v T u=1/ζ , rather, whenever u belongs to the hyperplane{z ∈R N | v T z=1/ζ}. Then (when λ=0) elementary matrix E is a nonorthogonalprojector on its range (E 2 =E , §E.1) and N(E)= R(u) ; eigenvectoru spans the nullspace when it exists. By conservation of dimension,dim R(E)=N −dim N(E). It is apparent from (986) that v ⊥ ⊆ R(E) ,but dimv ⊥ =N −1. Hence R(E)≡v ⊥ when the nullspace exists, and theremaining eigenvectors span it.In summary, when a nontrivial nullspace of E exists,R(E) = N(v T ), N(E) = R(u), v T u = 1/ζ (990)illustrated in Figure B.3, which is opposite to the assignment of subspacesfor a dyad (Figure B.1). Otherwise, R(E)= R N .When E =E T , the spectral norm is‖E‖ 2 = max{1, |λ|} (991)B.3.1Householder matrixAn elementary matrix E is called a Householder matrix H when, for nonzerovector u , E has the defining form [44, §5.1.2] [148, §4.10.1] [26, §7.3][28, §2.2]H = I − 2 uuTu T u ∈ SN (992)which is a symmetric orthogonal (reflection) matrix (H −1 =H T =H (§B.5)).Vector u is normal to an N −1-dimensional subspace u ⊥ through which thisparticular H effects point-wise reflection; e.g., Hu ⊥ =u ⊥ while Hu = −u .Matrix H has N − 1 orthonormal eigenvectors spanning that reflectingsubspace u ⊥ , with corresponding eigenvalues equal to 1. The remainingeigenvector u has corresponding eigenvalue −1. Due to symmetry of H ,


334 APPENDIX B. SIMPLE MATRICESthe matrix 2-norm (the spectral norm) is equal to the largest eigenvaluemagnitude.A Householder matrix is thus characterized,H T = H , H −1 = H T , ‖H‖ 2 = 1, H 0 (993)For example, the permutation matrix⎡ ⎤1 0 0Ξ = ⎣ 0 0 1 ⎦ (994)0 1 0is a Householder matrix having u = [ 0 1 −1 ] T / √ 2 . Not all permutationmatrices are Householder matrices, although all permutation matricesare orthogonal matrices. [26, §3.4] Neither are all symmetric permutationmatrices Householder matrices; e.g.,is not a Householder matrix.Ξ =⎡⎢⎣0 0 0 10 0 1 00 1 0 01 0 0 0B.4 Auxiliary V -matricesB.4.1Auxiliary matrix V⎤⎥⎦ (995)It is convenient to define a matrix V that arises naturally as a consequence oftranslating the geometric center α g (§4.5.1.0.1) of some list X to the origin.In place of X − α g 1 T we may write XV as in (391) whereV ∆ = I − 1 N 11T ∈ S N (996)is an elementary matrix called the geometric centering matrix.Any elementary matrix in R N×N has N−1 eigenvalues equal to 1. For theparticular elementary matrix V , the N th eigenvalue equals 0. The numberof 0 eigenvalues must equal dim N(V ) = 1, by the 0 eigenvalues theorem(§A.7.2.0.1), because V =V T is diagonalizable. BecauseV 1 = 0 (997)


B.4. AUXILIARY V -MATRICES 335the nullspace N(V )=R(1) is spanned by the eigenvector 1. The remainingeigenvectors span R(V ) ≡ 1 ⊥ = N(1 T ).BecauseV 2 = V (998)and V T = V , the elementary matrix V is also a projection matrix (§E.3) projectingorthogonally on its range R(V ) = N(1 T ) that has dimension N −1.V = I − 1(1 T 1) −1 1 T (999)The {0, 1} eigenvalues also indicate that diagonalizable V is a projectionmatrix. [45, §4.1, thm.4.1] The symmetry of V denotes orthogonal projection;[48] hence, (1207)V T = V , V † = V , ‖V ‖ 2 = 1, V ≽ 0 (1000)Matrix V is also circulant. (§B.6) [81] Circulant matrices form a subspace,symmetric circulant matrices form another. Yet the set of all (symmetric)circulant projection matrices cannot be convex because the eigenvalues ofsome convex combination of such matrices will not remain 0 or 1.• Let H ∈ S N be a Householder matrix (992) defined by⎡ ⎤u = ⎢1.⎥⎣ 11 + √ ⎦ ∈ RN (1001)NThen we have [136, §2]Let D ∈ S N 0 and define[ I 0V = H0 T 0[− HDH = ∆ A b−b T c]H (1002)](1003)where b is a vector. Then because H is nonsingular (§A.3.1.0.5) [97, §3][ ] A 0− V DV = −H0 T H ≽ 0 ⇔ −A ≽ 0 (1004)0and affine dimension is r = rankA when D is an EDM.
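A compact numerical check (mine) of the Householder reflector (992) and of the geometric centering matrix V from (996)-(998):

import numpy as np

rng = np.random.default_rng(17)
N = 5
u = rng.standard_normal(N)
H = np.eye(N) - 2 * np.outer(u, u) / (u @ u)
assert np.allclose(H, H.T) and np.allclose(H @ H, np.eye(N))   # H^T = H = H^{-1}
assert np.allclose(H @ u, -u)                                   # reflection of u

V = np.eye(N) - np.ones((N, N)) / N
assert np.allclose(V @ np.ones(N), 0)                           # (997)
assert np.allclose(V @ V, V) and np.allclose(V, V.T)            # (998), orthogonal projector
assert np.linalg.matrix_rank(V) == N - 1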


336 APPENDIX B. SIMPLE MATRICESB.4.2Schoenberg auxiliary matrix V N1. V N = 1 √2[ −1TI2. V T N 1 = 03. I − e 1 1 T = [ 0 √ 2V N]]∈ R N×N−14. [ 0 √ ]2V N VN = V N5. [ 0 √ ][ √ ] [ √ ]2V N 0 2VN = 0 2VN6. V † N = √ 2 [ − 1 1 I − 1 11T] ∈ R N−1×N ,N N7. V † N 1 = 0(I −1N 11T ∈ S N−1)8. V † N V N = I9. [V N1 √21 ] −1 =[ ]V†N√2N 1T10. V T = V = V N V † N = I − 1 N 11T ∈ S N11. D = [d ij ] ∈ S N 0tr(−V DV ) = tr(−V D) = tr(−V † N DV N) = 1 N 1T D 1 = 1 N tr(11T D) = 1 NAny elementary matrix E ∈ S N of the particular form∑d iji,jE = k 1 I − k 2 11 T (1005)where k 1 , k 2 ∈ R , B.7 will make tr(−ED) proportional to ∑ d ij .12. D = [d ij ] ∈ S Ntr(−V DV ) = 1 N∑i,ji≠jd ij − N−1N∑iB.7 If k 1 is 1−ρ while k 2 equals −ρ ∈ R , then, for −1/(N −1) < ρ < 1, all the eigenvaluesof E are guaranteed positive and therefore E is guaranteed positive definite. [156]d ii


B.4. AUXILIARY V -MATRICES 33713. D = [d ij ] ∈ S N 0tr(−V T N DV N) = ∑ jd 1jB.4.3The skinny matrixV ∆ W =⎢⎣Auxiliary matrix V W⎡−1 √N1 + −1N+ √ N−1N+ √ N.−1N+ √ N√−1−1N· · · √N−1N+ √ N.... . .−1N+ √ N· · ·−1N+ √ N...−1N+ √ N. . .· · · 1 + −1N+ √ N.⎤∈ R N×N−1 (1006)⎥⎦has R(V W )= N(1 T ) and orthonormal columns. [94] We defined three auxiliaryV -matrices: V , V N (339), and V W sharing some attributes listed inTable B.4.4. For example, V can be expressedV = V W VW T = V N V † N(1007)but V T W V W= I means V is an orthogonal projector (1204) andV † W = V T W , ‖V W ‖ 2 = 1 , V T W1 = 0 (1008)B.4.4Auxiliary V -matrix TabledimV rankV R(V ) N(V T ) V T V V V T V V †V N ×N N −1 N(1 T ) R(1) V V V[ ]V N N ×(N −1) N −1 N(1 T 1) R(1) (I + 2 11T 1 N −1 −1T) V2 −1 IV W N ×(N −1) N −1 N(1 T ) R(1) I V V
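Item 1 of §B.4.2 and identity (1007) in a short NumPy check (not from the book); V_N is built exactly as in item 1, and its pseudoinverse product recovers V:

import numpy as np

N = 5
V_N = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)]) / np.sqrt(2)   # item 1
V = np.eye(N) - np.ones((N, N)) / N

assert np.allclose(V_N.T @ np.ones(N), 0)                  # item 2: V_N^T 1 = 0
assert np.allclose(V_N @ np.linalg.pinv(V_N), V)           # item 10 / (1007): V = V_N V_N^dagger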


338 APPENDIX B. SIMPLE MATRICESB.4.5More auxiliary matricesMathar shows [115, §2] that any elementary matrix (§B.3) of the formV M = I − b1 T ∈ R N×N (1009)such that b T 1 = 1 (confer [84, §2]), is an auxiliary V -matrix havingR(V T M ) = N(bT ), R(V M ) = N(1 T )N(V M ) = R(b), N(V T M ) = R(1) (1010)Given X ∈ R n×N , the choice b= 1 N 1 minimizes ‖X(I−b1T )‖ F . [157, §3.2.1]B.5 Orthogonal matrixThe property Q −1 = Q T completely defines an orthogonal matrix Q ∈ R n×nemployed to effect vector rotation; [26, §2.6, §3.4] [27, §6.5] [28, §2.1] forx ∈ R n ,‖Qx‖ = ‖x‖ (1011)The orthogonal matrix is characterized:Q −1 = Q T , ‖Q‖ 2 = 1 (1012)Applying characterization (1012) to Q T we see it too is an orthogonal matrix.Hence the rows and columns of Q respectively form an orthonormal set.All permutation matrices Ξ , for example, are orthogonal matrices. Thelargest magnitude entry of any orthogonal matrix is 1; for each and everyj ∈1... n ,‖Q(j,:)‖ ∞ ≤ 1(1013)‖Q(:, j)‖ ∞ ≤ 1Each and every eigenvalue of an orthogonal matrix has magnitude 1,λ(Q) ∈ C n , |λ(Q)| = 1 (1014)while only the identity matrix can be simultaneously positive definite andorthogonal.A unitary matrix is a complex generalization of the orthogonal matrix.The conjugate transpose defines it: U −1 = U H . An orthogonal matrix issimply a real unitary matrix.
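Properties (1011)-(1014) of an orthogonal matrix, observed numerically (sketch, mine; Q comes from a QR factorization of a random matrix):

import numpy as np

rng = np.random.default_rng(26)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
x = rng.standard_normal(5)

assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))   # (1011)
assert np.isclose(np.linalg.norm(Q, 2), 1)                    # (1012)
assert np.max(np.abs(Q)) <= 1 + 1e-12                         # (1013)
assert np.allclose(np.abs(np.linalg.eigvals(Q)), 1)           # (1014)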


B.5. ORTHOGONAL MATRIX 339B.5.1ReflectionA matrix for point-wise reflection is defined by imposing symmetry uponthe orthogonal matrix; id est, a reflection matrix is completely defined byQ −1 = Q T = Q . The reflection matrix is an orthogonal matrix, characterized:Q T = Q, Q −1 = Q T , ‖Q‖ 2 = 1 (1015)The Householder matrix (§B.3.1) is an example of a symmetric orthogonal(reflection) matrix.Reflection matrices have eigenvalues equal to ±1, so it is natural to expecta relationship between reflection and projection matrices because allprojection matrices have eigenvalues belonging to {0, 1} . In fact, any reflectionmatrix Q is related to some orthogonal projector P by [155, §1, prob.44]Q = I − 2P (1016)Yet P is, generally, neither orthogonal or invertible (1207).λ(Q) ∈ R n , |λ(Q)| = 1 (1017)Reflection is with respect to R(P ) ⊥ . Matrix 2P −I represents antireflection.Every orthogonal matrix can be expressed as the product of a rotation anda reflection. The collection of all orthogonal matrices of particular dimensiondoes not form a convex set.B.5.2Matrix rotationOrthogonal matrices are also employed to rotate or reflect other matrices:[sic] [44, §12.4.1] Given orthogonal matrix Q , the product Q T A will rotateA ∈ R n×n in the vectorization sense in R n2 because the Frobenius norm isorthogonally invariant (§2.1.1.1);‖Q T A‖ F = √ tr(A T QQ T A) = ‖A‖ F (1018)(likewise for AQ). Were A symmetric, such a rotation would depart fromS n . One remedy is to instead form the product Q T AQ because‖Q T AQ‖ F =√tr(Q T A T QQ T AQ) = ‖A‖ F (1019)


340 APPENDIX B. SIMPLE MATRICESThis rotation of A in the vectorization sense has an added benefit: the simultaneousrotation/reflection of range and rowspace. B.8 We see that by recalling,any matrix can be expressed in terms of its singular value decompositionA = UΣW T (922) where δ 2 (Σ) = Σ , R(U) ⊇ R(A), and R(W) ⊇ R(A T ).B.5.2.1bijectionAny product of orthogonal matrices remains orthogonal. Given a secondorthogonal matrix U, the mapping g(A) = U T AQ is a linear bijection onthe domain of orthogonal matrices. [52, §2.1]B.6 Circulant matrices...Define circulant and symmetric circulant matrices. Eigenvectors alwayssame. Means image theorem may be applied because set of all circulantmatrices makes a subspace. Idea is: Interpolating between any two circulantmatrices would interpolate the known eigenvalues.From the image theorem in §2.1.0.10.2, it follows that 〈E ij , C〉 in (1273)is a convex set whenever C is. The set of all circulant matrices forms asubspace in R M×M whose members all have the same eigenvectors; theorthogonal basis from the discrete Fourier transform. So each eigenvalue(1277) of a circulant matrix comes from a convex set of eigenvalues by theimage theorem.Apply this to the previous DFT by making a circulant matrix C frominput f.Justify use of circulant input using applications from DSP.Because circulant matrices are diagonalizable [81], any circulant matrix Cmay be represented,where Λ = δ(W H f).which is in the final form of (1276),...show this.C = 1 n WΛW H (1020)B.8 The product Q T AQ can be regarded as a coordinate transformation; e.g., given linearmap y =Ax : R n → R n , for orthogonal Q the transformation Qy =AQx is a rotation ofthe range and rowspace (90) (89) of matrix A.


B.6. CIRCULANT MATRICES... 341The set of all circulant matrices forms a subspace. Hence it must followthat any linear combination of circulant matrices remains circulant; for allµ,ζ ∈ R ,µC 1 + ζC 2 = 1 n W(µΛ 1 + ζΛ 2 )W H (1021)is circulant. We know that each eigenvalue comes from a convex set. Therelation (1021) indicates that each convex set of eigenvalues is itself asubspace. The same comments apply to the subspace of symmetric circulantmatrices.Positive definite circulant matrices form a polyhedral (Gallego Circulant.pspaper) cone in the subspace of circulant matrices. -[oldReader,p.35C]Circulant.ps... set of circulant EDMs is a polyhedral cone.For C circulant, where the first row is some time sequence [c 0 ,c 1 ,c 2 ...],let X =C T . Then (333)D(C T ) = k i 11 T − 2CC T (1022)where k i = 2δ(CC T ) i is any one of the diagonal entries, all identical forcirculant matrices. This is classical relationship between autocorrelation andsimilarity function where CC T takes on the role of autocorrelation.
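A hedged numerical sketch of the diagonalization (1020): the DFT vectors are eigenvectors of any circulant matrix, with eigenvalues given by the DFT of its first column. The circulant is assembled by hand here from cyclic shifts; the size and data are arbitrary:

import numpy as np

rng = np.random.default_rng(18)
n = 6
c = rng.standard_normal(n)
C = np.column_stack([np.roll(c, j) for j in range(n)])   # C[i, j] = c[(i - j) mod n]
lam = np.fft.fft(c)                                       # eigenvalues

idx = np.arange(n)
for m in range(n):
    v = np.exp(2j * np.pi * idx * m / n)                  # m-th DFT basis vector
    assert np.allclose(C @ v, lam[m] * v)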




Appendix CSome optimal analytical resultsC.1 involving diagonal, trace, eigenvalues• For A ∈ R m×n and σ(A) denoting its singular values, [1, §A.1.6][133, §1]∑iσ(A) i = tr √ A T A = sup tr(X T A) = maximize tr(X T A)‖X‖ 2 ≤1X∈R m×n [ ] I Xsubject toX T ≽ 0I(1023)• For X ∈ S m , Y ∈ S n , A ∈ C ⊆ R m×n for C convex and σ(A) denotingthe singular values of A , [133, §3]minimizeA∈R m×n∑σ(A) iiA ∈ C≡minimizeA , X , Ysubject totrX + trY[ ] X AA T ≽ 0YA ∈ C(1024)• For A∈ S N + and β ∈ R ,β trA = maximize tr(XA)X∈ S Nsubject to X ≼ βI(1025)C.0 c○ 2001 Jon Dattorro, all rights reserved.343
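The first result (1023) can be checked at its analytic maximizer X = U Q^T from the SVD A = U Σ Q^T, a feasible point since its spectral norm is 1; this numerical note is mine, not the book's:

import numpy as np

rng = np.random.default_rng(19)
A = rng.standard_normal((4, 3))
U, s, Qt = np.linalg.svd(A, full_matrices=False)
X = U @ Qt

assert np.isclose(np.linalg.norm(X, 2), 1)        # feasible for (1023)
assert np.isclose(np.trace(X.T @ A), s.sum())     # attains the sum of singular values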


344 APPENDIX C. SOME OPTIMAL ANALYTICAL RESULTS• For A∈ S N having eigenvalues λ(A)∈ R N , [52, §2.1] [158, §I.6.15]min{λ(A) i } = inf‖x‖=1 xT Ax = minimizeX∈ S N +subject to trX = 1max{λ(A) i } = sup x T Ax = maximize‖x‖=1X∈ S N +subject to trX = 1tr(XA) = maximize tt∈Rsubject to A ≽ tI(1026)tr(XA) = minimize tt∈Rsubject to A ≼ tI(1027)The minimum eigenvalue of any symmetric matrix is always a concavefunction of its entries, while the maximum eigenvalue is always convex.[1, exmp.3.10]• For B ∈ S N whose eigenvalues λ(B)∈ R N are arranged in nonincreasingorder, and for 1≤k ≤N , [24] [28, §4.3.18] [69, §2] [52, §2.1] (Fan)N∑λ(B) i = infi=N−k+1 U∈ R N×kU T U=Ik∑λ(B) i = supi=1U∈ R N×kU T U=Itr(U T BU) = minimizeX∈ S N +tr(XB)subject to X ≼ Itr(U T BU) = maximizeX∈ S N +trX = ktr(XB)subject to X ≼ ItrX = k(a)(b)= minimize kµ + trZ (c)µ∈R , Z∈S N +subject to µI + Z ≽ B(1028)Optimal U are respectively U ⋆ = W(:, N − k +1 : N) ∈ R N×k andU ⋆ = W(:, 1:k)∈ R N×k where B = WΛW T is an ordered diagonalization.• For B ∈ S N whose eigenvalues λ(B) ∈ R N are arranged in nonincreasingorder, let Ξλ(B) be a permutation of λ(B) such that|Ξλ(B)| 1 ≥ |Ξλ(B)| 2 ≥ · · · ≥ |Ξλ(B)| N . Then, for 1 ≤ k ≤ N ,


C.1. INVOLVING DIAGONAL, TRACE, EIGENVALUES 345[51, §6]k∑|Ξλ(B)| ii=1= minimizez∈R , V,W,S,T ∈S N +kz + tr(V + W)subject to zI + V − W = BzI + S − T = −B= maximizeX,Y ∈S N +〈B , X −Y 〉subject to I ≽ X,Ytr(X + Y )=k(1029)• For A,B ∈ S N whose eigenvalues λ(A), λ(B) ∈ R N are respectivelyarranged in nonincreasing order, and for non-increasingly ordered diagonalizationsB = W B ΛWBT and A = W AΥWA T , [159] [52, §2.1]λ(A) T λ(B) = supU∈ R N×NU T U=I(confer (1050)) where optimal U istr(A T U T BU) ≥ tr(A T B) (1030)U ⋆ = W B W TA ∈ R N×N (1031)We push that bound a little higher using a result in §C.2.1.2.1:|λ(A)| T |λ(B)| = supU∈ C N×NU T U=Itr(A T U T BU) (1032)where U T denotes ordinary transposition (1066). Optimal U becomesU ⋆ = W B√δ(ψ(δ(Λ)))H√δ(ψ(δ(Υ)))WTA ∈ C N×N (1033)where the step function ψ is defined in (933).• For B ∈ S N whose eigenvalues λ(B) ∈ R N are arranged in nonincreasingorder, and for diagonal matrix Υ ∈ S k whose diagonal entries arearranged in nonincreasing order where 1≤k ≤N , we utilize the maindiagonalδ operator’s property of self-adjointness (813); [160, §4.2]k∑i=1Υ ii λ(B) N−i+1 = inf tr(ΥU T BU) = infU∈ R N×kU T U=IU∈ R N×kU T U=Iδ(Υ) T δ(U T BU)(1034)


346 APPENDIX C. SOME OPTIMAL ANALYTICAL RESULTSWe speculate,k∑Υ ii λ(B) i = sup tr(ΥU T BU) = sup δ(Υ) T δ(U T BU) (1035)i=1U∈ R N×kU∈ R N×kU T U=IU T U=IC.2 Orthogonal Procrustes problem[157] Given matrices A,B∈ R n×N , their product having full singular valuedecomposition (§A.6.3)AB T ∆ = UΣQ T ∈ R n×n (1036)then an optimal solution R ⋆ to the orthogonal Procrustes problemminimize ‖A − R T B‖ 2 FR(1037)subject to R T = R −1maximizes tr(A T R T B) over the non-convex manifold of orthogonal matrices:[28, §7.4.8]The optimal value of the objective is therefore,R ⋆ = QU T ∈ R n×n (1038)andtr ( A T A + B T B − 2A(B T R ⋆ ) ) = tr(A T A) + tr(B T B) − 2 tr(UΣU T )= ‖A‖ 2 F + ‖B‖2 F − 2δ(Σ)T 1(1039)sup tr(A T R T B) = tr(A T R ⋆T B) = δ(Σ) T 1 ≥ tr(A T B) (1040)R T =R −1This optimization problem is useful for discovering the rotation/reflectionof one list A with respect to another list B . (§4.5) The solution is unique ifrankBV N = n . [50, §2.4.1]
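A numerical sketch (not from the text) of the Procrustes solution (1038); the random-rotation loop merely probes that no tested orthogonal R beats R⋆:

import numpy as np

rng = np.random.default_rng(21)
n, N = 3, 10
A = rng.standard_normal((n, N)); B = rng.standard_normal((n, N))
U, s, Qt = np.linalg.svd(A @ B.T)                # A B^T = U Sigma Q^T  (1036)
R_star = Qt.T @ U.T                              # R* = Q U^T           (1038)

assert np.allclose(R_star.T @ R_star, np.eye(n))
best = np.linalg.norm(A - R_star.T @ B, 'fro')
for _ in range(50):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert best <= np.linalg.norm(A - Q.T @ B, 'fro') + 1e-9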


C.2. ORTHOGONAL PROCRUSTES PROBLEM 347C.2.1Two-sided orthogonal Procrustes problemsGiven symmetric A,B∈ S N , each having diagonalizationA ∆ = Q A Λ A Q T A , B ∆ = Q B Λ B Q T B (1041)where eigenvalues are arranged in their respective diagonal matrix Λ innonincreasing order, then an optimal solution to the two-sided orthogonalProcrustes problemminimize ‖A − R T BR‖ FR= minimize tr ( A T A − 2A T R T BR + B T B )subject to R T = R −1 Rsubject to R T = R −1 (1042)maximizes tr(A T R T BR) over the non-convex manifold of orthogonal matrices:[161]R ⋆ = Q B Q T A ∈ R N×N (1043)which is an optimal orthogonal matrix. The optimal value of the objectiveis therefore, (28)‖Q A Λ A Q T A −R ⋆T Q B Λ B Q T BR ⋆ ‖ F = ‖Q A (Λ A −Λ B )Q T A ‖ F = ‖Λ A −Λ B ‖ F (1044)andsup tr(A T R T BR) = tr(A T R ⋆T BR ⋆ ) = tr(Λ A Λ B ) ≥ tr(A T B) (1045)R T =R −1Any permutation matrix is an orthogonal matrix. Defining a row andcolumn swapping permutation matrix (B.5.1)Ξ = Ξ T =⎡⎢⎣0 1··11 0then an optimal solution to the maximization problem⎤⎥⎦(1046)maximize ‖A − R T BR‖ FR(1047)subject to R T = R −1


348 APPENDIX C. SOME OPTIMAL ANALYTICAL RESULTSminimizes tr(A T R T BR) : [159] [52, §2.1]The optimal value of the objective is therefore,andR ⋆ = Q B ΞQ T A ∈ R N×N (1048)‖Q A Λ A Q T A − R⋆T Q B Λ B Q T B R⋆ ‖ F = ‖Q A Λ A Q T A − Q A ΞT Λ B ΞQ T A ‖ F= ‖Λ A − ΞΛ B Ξ‖ F(1049)C.2.1.1inf tr(A T R T BR) = tr(A T R ⋆T BR ⋆ ) = tr(Λ A ΞΛ B Ξ) (1050)R T =R −1Procrustes’ relation to linear programmingWhile these two-sided Procrustes problems are non-convex, a connection withlinear programming was recently found by Anstreicher & Wolkowicz: givenA,B∈ S N , [160, §3] [52, §2.1]minimizeRtr(A T R T BR) = maximizeS,T ∈S N tr(S + T)subject to R T = R −1 subject to A T ⊗ B − I ⊗ S − T ⊗ I ≽ 0(1051)where ⊗ signifies the Kronecker product (§D.1.2.1), has solution (1050).These two problems are strong duals (§2.8.1) to each other. Given ordereddiagonalizations (1041), make the observation:infR tr(AT R T BR) = inf tr(Λ ˆRT A Λ B ˆR) (1052)ˆRsince ˆR = QB TRQ A is a bijection on the set of orthogonal matrices (whichincludes the permutation matrices). This means, basically, diagonal eigenvaluematrices Λ A and Λ B may be substituted for A and B , so only the maindiagonals of S and T come into play;maximizeS,T ∈S N 1 T δ(S + T)subject to δ(Λ A ⊗ (ΞΛ B Ξ) − I ⊗ S − T ⊗ I) ≽ 0(1053)a linear program in δ(S) and δ(T) having the same optimal objective valueas the semidefinite program.


C.2. ORTHOGONAL PROCRUSTES PROBLEM 349We relate their results to the Procrustes problem (1042) by manipulatingsigns (1166) and permuting eigenvalues:maximize tr(A T R T BR) = minimize 1 T δ(S + T)RS,T ∈S Nsubject to R T = R −1 subject to δ(I ⊗ S + T ⊗ I − Λ A ⊗ Λ B ) ≽ 0whose solution is (1045).= minimizeS,T ∈S N tr(S + T)subject to I ⊗ S + T ⊗ I − A T ⊗ B ≽ 0(1054)C.2.1.2Asymmetric two-sided orthogonal ProcrustesBy making the left and right-side orthogonal matrices independent, we canpush the upper bound on the optimal trace a little further: Given arbitraryA,B each having full singular value decomposition (§A.6.3)A ∆ = U A Σ A Q T A ∈ R m×n , B ∆ = U B Σ B Q T B ∈ R m×n (1055)then the well-known optimal solution to the problemmaximizes Re tr(A T SBR) : [162] [163] [164] [165]minimize ‖A − SBR‖ FR,Ssubject to R H = R −1(1056)S H = S −1S ⋆ = U A U H B ∈ C m×m , R ⋆ = Q B Q H A ∈ C n×n (1057)are optimal unitary matrices (§B.5) which are not necessarily unique[28, §7.4.13] because the feasible set is not convex. The optimal value ofthe objective is therefore, (28)‖U A Σ A Q H A − S ⋆ U B Σ B Q H B R ⋆ ‖ F = ‖U A (Σ A − Σ B )Q H A ‖ F = ‖Σ A − Σ B ‖ F (1058)and, [158, §III.6.12]sup | tr(A T SBR)| = sup Re tr(A T SBR) = Re tr(A T S ⋆ BR ⋆ ) = tr(ΣAΣ T B ) ≥ tr(A T B)R H =R −1R H =R −1S H =S −1 S H =S −1(1059)


350 APPENDIX C. SOME OPTIMAL ANALYTICAL RESULTSA T S ⋆ BR ⋆ ≽ 0, BR ⋆ A T S ⋆ ≽ 0 (1060)The lower bound on the inner product of singular values in (1059) is due tovon Neumann. Equality is achieved if U HA U B =I and QH B Q A =I .C.2.1.2.1 Symmetric matrices. Suppose A,B have diagonalizations(§A.5)A = W A ΥWA T ∈ S n , δ(Υ) ∈ K M (1061)B = W B ΛWB T ∈ S n , δ(Λ) ∈ K M (1062)with their respective eigenvalues in diagonal matrices Υ, Λ ∈ S n arranged innonincreasing order (membership to the monotone cone K M (251)). Thenwe have a symmetric complex singular value decompositionA ∆ = U A Σ A Q H A ∈ S n , B ∆ = U B Σ B Q H B ∈ S n (1063)with U A , U B , Q A , Q B ∈ C n×n by definingU A ∆ = W A√δ(ψ(δ(Υ))) , QA ∆ = W A√δ(ψ(δ(Υ)))H, ΣA = |Υ| (1064)U B ∆ = W B√δ(ψ(δ(Λ))) , QB ∆ = W B√δ(ψ(δ(Λ)))H, ΣB = |Λ| (1065)where step function ψ is defined in (933). In this circumstance,S ⋆ = R ⋆T ∈ C n×n (1066)the optimal unitary matrices (1057) are related by transposition.C.2.1.2.2 Diagonal matrices. Now supposeA = Υ = δ 2 (Υ) ∈ S n , δ(Υ) ∈ K M (1067)B = Λ = δ 2 (Λ) ∈ S n , δ(Λ) ∈ K M (1068)both having their respective entries arranged in nonincreasing order. Thenusing the step function (933) we have the symmetric complex singular valuedecomposition (1063) whereU A ∆ = √ δ(ψ(δ(Υ))), Q A ∆ = √ δ(ψ(δ(Υ))) H , Σ A = |Υ| (1069)U B ∆ = √ δ(ψ(δ(Λ))), Q B ∆ = √ δ(ψ(δ(Λ))) H , Σ B = |Λ| (1070)The Procrustes solution (1057) again sees S ⋆ = R ⋆T ∈ C n×n but both optimalmatrices are now themselves diagonal.


Appendix DMatrix calculusFrom too much study, and from extreme passion, comethmadnesse.−Isaac Newton, [166, §5]D.1 Directional derivative, Taylor seriesD.1.1GradientsTraditionally, the gradient of a differentiable real function f(x) : R K →Rwith respect to its vector domain is defined⎡∇f(x) =∆ ⎢⎣∂f(x)∂x 1∂f(x)∂x 2.∂f(x)∂x K⎤⎥⎦ ∈ RK (1071)D.0 c○ 2001 Jon Dattorro, all rights reserved.351


352 APPENDIX D. MATRIX CALCULUSwhile the second-order gradient of the twice differentiable real function withrespect to its vector domain is traditionally called the Hessian ;⎡⎤∂ 2 f(x)∂x 1 ∂x 2· · ·∇ 2 f(x) =∆ ⎢⎣∂ 2 f(x)∂ 2 x 1∂ 2 f(x)∂x 2 ∂x 1.∂ 2 f(x)∂x K ∂x 1∂ 2 f(x)∂ 2 x 2· · ·....∂ 2 f(x)∂x K ∂x 2· · ·∂ 2 f(x)∂x 1 ∂x K∂ 2 f(x)∂x 2 ∂x K.∂ 2 f(x)∂ 2 x K∈ S K (1072)⎥⎦The gradient of vector-valued function v(x) : R→R N on real domain isa row-vector[]∇v(x) =∆ ∂v 1 (x) ∂v 2 (x) ∂v· · · N (x)∈ R N (1073)∂x ∂x ∂xwhile the second-order gradient is[∇ 2 v(x) =∆∂ 2 v 1 (x)∂x 2∂ 2 v 2 (x)∂x 2 · · ·]∂ 2 v N (x)∈ R N (1074)∂x 2∇ 2 h(x) =∆ ⎢⎣The gradient of vector function h(x) : R K →R N on vector domain is⎡⎤∂h 2 (x)∂x 1· · ·∇h(x) =∆ ⎢⎣∂h 1 (x)∂x 1∂h 1 (x)∂x 2.∂h 1 (x)∂x K∂h 2 (x)∂x 2.· · ·∂h 2 (x)∂x K· · ·∂h N (x)∂x 1∂h N (x)∂x 2.∂h N (x)∂x K= [ ∇h 1 (x) ∇h 2 (x) · · · ∇h N (x) ] ∈ R K×N⎥⎦(1075)while the second-order gradient has a three-dimensional representationdubbed cubix ; D.1⎡⎤∇ ∂h 1(x)∂x 1∇ ∂h 2(x)∂x 1· · · ∇ ∂h N(x)∂x 1∇ ∂h 1(x)∂x 2.∇ ∂h 1(x)∂x K∇ ∂h 2(x)∂x 2· · · ∇ ∂h N(x)∂x 2. .∇ ∂h 2(x)∂x K· · · ∇ ∂h N(x)∂x K= [ ∇ 2 h⎥ 1 (x) ∇ 2 h 2 (x) · · · ∇ 2 h N (x) ] ∈ R K×N×K⎦(1076)D.1 The word matrix comes from the Latin for womb ; related to the prefix matri- derivedfrom mater meaning mother.


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 353where the gradient of each real entry is with respect to vector x as in (1071).The gradient of real function g(X) : R K×L →R on matrix domain is⎡∇g(X) =∆ ⎢⎣=∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)[∇X(:,1) g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤∈ R K×L⎥⎦∇ X(:,2) g(X)...∇ X(:,L) g(X) ] ∈ R K×1×L (1077)where the gradient ∇ X(:,i) is with respect to the i th column of X. The strangeappearance of (1077) in R K×1×L is meant to suggest a third dimension perpendicularto the page (not a diagonal matrix). The second-order gradienthas representation,⎡∇ 2 g(X) =∆ ⎢⎣=∇ ∂g(X)∂X 11∇ ∂g(X)∂X 21.∇ ∂g(X)∂X K1[∇∇X(:,1) g(X)∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ ∂g(X)∂X 22· · · ∇ ∂g(X)∂X 2L. .∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL⎤∈ R K×L×K×L⎥⎦∇∇ X(:,2) g(X)...∇∇ X(:,L) g(X) ] ∈ R K×1×L×K×L (1078)where the gradient ∇ is with respect to matrix X.The gradient of vector function g(X) : R K×L →R N on matrix domain is


354 APPENDIX D. MATRIX CALCULUSa cubix,[∇X(:,1) g 1 (X) ∇ X(:,1) g 2 (X) · · · ∇ X(:,1) g N (X)∇g(X) ∆ =∇ X(:,2) g 1 (X) ∇ X(:,2) g 2 (X) · · · ∇ X(:,2) g N (X)∈ R.........K×N×L∇ X(:,L) g 1 (X) ∇ X(:,L) g 2 (X) · · · ∇ X(:,L) g N (X) ]= [ ∇g 1 (X) ∇g 2 (X) · · · ∇g N (X) ] (1079)while the second-order gradient has a five-dimensional representation;[∇∇X(:,1) g 1 (X) ∇∇ X(:,1) g 2 (X) · · · ∇∇ X(:,1) g N (X)∇ 2 g(X) ∆ =∇∇ X(:,2) g 1 (X) ∇∇ X(:,2) g 2 (X) · · · ∇∇ X(:,2) g N (X)∈ R.........K×N×L×K×L∇∇ X(:,L) g 1 (X) ∇∇ X(:,L) g 2 (X) · · · ∇∇ X(:,L) g N (X) ]= [ ∇ 2 g 1 (X) ∇ 2 g 2 (X) · · · ∇ 2 g N (X) ] (1080)The gradient of matrix-valued function g(X) : R K×L →R M×N on matrixdomain has a four-dimensional representation called quartix ,⎡∇g(X) =∆ ⎢⎣∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g 21 (X) ∇g 22 (X) · · · ∇g 2N (X). .∇g M1 (X) ∇g M2 (X) · · ·.∇g MN (X)⎤⎥⎦ ∈ RM×N×K×Lwhile the second-order gradient has six-dimensional representation⎡∇ 2 g(X) =∆ ⎢⎣and so on.∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X). .∇ 2 g M1 (X) ∇ 2 g M2 (X) · · ·.∇ 2 g MN (X)⎤(1081)⎥⎦ ∈ RM×N×K×L×K×L(1082)


D.1.2 Product rules for matrix-functions

Given compatible matrix-valued functions of matrix variable f(X) and g(X) ,

∇_X ( f(X)^T g(X) ) = ∇_X(f) g + ∇_X(g) f   (1083)

while [100, §8.3] [167]

∇_X tr( f(X)^T g(X) ) = ∇_X ( tr( f(X)^T g(Z) ) + tr( g(X) f(Z)^T ) ) |_{Z←X}   (1084)

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose f(X) : R^{2×2} → R^2 = X^T a and g(X) : R^{2×2} → R^2 = Xb . We wish to find

∇_X ( f(X)^T g(X) ) = ∇_X a^T X² b   (1085)

using the product rule. Formula (1083) calls for

∇_X a^T X² b = ∇_X(X^T a) Xb + ∇_X(Xb) X^T a   (1086)

Consider the first of the two terms:

∇_X(f) g = ∇_X(X^T a) Xb = [ ∇(X^T a)_1   ∇(X^T a)_2 ] Xb   (1087)

The gradient of X^T a forms a cubix in R^{2×2×2} ; the first term (1088) is that cubix acting on the vector Xb ∈ R² , an object in R^{2×1×2} . Because the gradient of the product (1085) requires total change with respect to change in each entry of matrix X , the Xb vector must make an inner product with each vector in the second dimension of the cubix (indicated by dotted line segments in the original figure);


carrying out that inner product entrywise,

∇_X(X^T a) Xb = [ a_1(b_1 X_11 + b_2 X_12)   a_1(b_1 X_21 + b_2 X_22) ;
                  a_2(b_1 X_11 + b_2 X_12)   a_2(b_1 X_21 + b_2 X_22) ] ∈ R^{2×2} = ab^T X^T   (1089)

where the cubix appears as a complete 2×2×2 matrix. In like manner for the second term ∇_X(g) f ,

∇_X(Xb) X^T a = X^T ab^T ∈ R^{2×2}   (1090)

The solution

∇_X a^T X² b = ab^T X^T + X^T ab^T   (1091)

can be found from Table D.2.1 or verified using (1084).

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional representations, such as the cubix or quartix, is to first vectorize matrices as in (18). This device gives rise to the Kronecker product of matrices ⊗ ; a.k.a. direct product or tensor product. Its definition is subject to slight variation, [41, §2.1] but in its most conventional form it appears: for A ∈ R^{m×n} and B ∈ R^{p×q} ,

B ⊗ A ≜ [ B_11 A   B_12 A   ···   B_1q A ;
          B_21 A   B_22 A   ···   B_2q A ;
            ⋮        ⋮                ⋮   ;
          B_p1 A   B_p2 A   ···   B_pq A ] ∈ R^{pm×qn}   (1092)
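As a numerical aside (an addition of mine, not in the original text; NumPy and arbitrary test data assumed), the solution (1091) is readily confirmed by finite differences, and numpy.kron reproduces the blocking convention of (1092) when called as kron(B, A):

    import numpy as np

    # Sketch: finite-difference check of (1091), grad_X a'X²b = ab'X' + X'ab'.
    rng = np.random.default_rng(1)
    n = 3
    X = rng.standard_normal((n, n))
    a, b = rng.standard_normal(n), rng.standard_normal(n)

    g = lambda X: a @ X @ X @ b          # a'X²b
    h = 1e-6
    grad_fd = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            E = np.zeros((n, n)); E[k, l] = h
            grad_fd[k, l] = (g(X + E) - g(X - E)) / (2*h)

    grad_analytic = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)
    print(np.allclose(grad_fd, grad_analytic, atol=1e-5))        # True

    # numpy.kron(B, A) has blocks B_ij·A, matching the convention of (1092).
    A2, B2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
    print(np.allclose(np.kron(B2, A2)[:2, :2], B2[0, 0] * A2))   # top-left block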


One advantage is the existence of a traditional two-dimensional matrix representation for the second-order gradient of a scalar function with respect to a vectorized matrix. For example, from §A.1 no.18 (§D.2.1) for square A,B ∈ R^{n×n} , [40, §5.2] [160, §3]

∇²_{vec X} tr(AXBX^T) = ∇²_{vec X} vec^T(X)(B^T ⊗ A) vec X = B ⊗ A^T + B^T ⊗ A ∈ R^{n²×n²}   (1093)

A disadvantage is a large new but known set of algebraic rules, [40] [41] and the fact that mere use of the Kronecker product does not generally guarantee a two-dimensional representation of gradients.

D.1.3 Chain rules for composite matrix-functions

Given compatible matrix-valued functions of matrix variable f(X) and g(X) , [79, §15.7]

∇_X g( f(X)^T ) = ∇_X f^T ∇_f g   (1094)

∇²_X g( f(X)^T ) = ∇_X ( ∇_X f^T ∇_f g ) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f   (1095)

D.1.3.0.1 Example. Need better example...

D.1.3.1 Two arguments

∇_X g( f(X)^T , h(X)^T ) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g   (1096)

D.1.3.1.1 Example. Chain rule for two arguments. [33, §1.1]

g( f(x)^T , h(x)^T ) = ( f(x) + h(x) )^T A ( f(x) + h(x) )   (1097)

f(x) = [ x_1 ; εx_2 ] ,   h(x) = [ εx_1 ; x_2 ]   (1098)

∇_x g( f(x)^T , h(x)^T ) = [ 1 0 ; 0 ε ](A + A^T)(f + h) + [ ε 0 ; 0 1 ](A + A^T)(f + h)   (1099)


358 APPENDIX D. MATRIX CALCULUS∇ x g ( f(x) T , h(x) T) =[ 1 + ε 00 1 + εthat can be found in Table D.2.1.] ([ ] [ ])(A +A T x1 εx1) +εx 2 x 2(1100)limε→0 ∇ xg ( f(x) T , h(x) T) = (A +A T )x (1101)These formulae remain correct when the gradients produce hyperdimensionalrepresentations:D.1.4First directional derivativeAssume that a differentiable function g(X) : R K×L →R M×N has continuousfirst- and second-order gradients ∇g and ∇ 2 g over domg which is an openset. We seek simple expressions for the first and second directional derivativesin direction Y ∈R K×L →Y, dg ∈ R M×N and dg →Y2 ∈ R M×N respectively.Assuming that the limit exists, we may state the partial derivative of themn th entry of g with respect to the kl th entry of X ;∂g mn (X)∂X klg mn (X + ∆t e= limk e T l ) − g mn(X)∆t→0 ∆t∈ R (1102)where e k is the k th standard basis vector in R K while e l is the l th standardbasis vector in R L . The total number of partial derivatives equals KLMNwhile the gradient is defined in their terms; the mn th entry of the gradient is⎡∇g mn (X) =⎢⎣∂g mn(X)∂X 11∂g mn(X)∂X 21.∂g mn(X)∂X K1∂g mn(X)∂X 12· · ·∂g mn(X)∂X 22· · ·.∂g mn(X)∂X K2· · ·∂g mn(X)∂X 1L∂g mn(X)∂X 2L.∂g mn(X)∂X KL⎤∈ R K×L (1103)⎥⎦while the gradient is a quartix⎡⎤∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g 21 (X) ∇g 22 (X) · · · ∇g 2N (X)∇g(X) = ⎢⎥⎣ . .. ⎦ ∈ RM×N×K×L (1104)∇g M1 (X) ∇g M2 (X) · · · ∇g MN (X)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 359By simply rotating our perspective of the four-dimensional representation ofthe gradient matrix, we find one of three useful transpositions of this quartix:⎡∇g(X) T 1=⎢⎣∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤⎥⎦ ∈ RK×L×M×N (1105)When the limit for ∆t ∈ R exists, it is easy to show by substitution ofvariables in (1102)∂g mn (X) g mn (X + ∆t Y kl eY kl = limk e T l ) − g mn(X)∂X kl ∆t→0 ∆t∈ R (1106)which may be interpreted as the change in g mn at X when the change in X klis equal to Y kl , the kl th entry of any Y ∈ R K×L . Because the total changein g mn (X) due to Y is the sum of change with respect to each and everyX kl , the mn th entry of the directional derivative is the corresponding totaldifferential [79, §15.8]dg mn (X)| dX→Y= ∑ k,l∂g mn (X)∂X klY kl = tr ( ∇g mn (X) T Y ) (1107)= ∑ g mn (X + ∆t Y kl elimk e T l ) − g mn(X)(1108)∆t→0 ∆tk,lg mn (X + ∆t Y ) − g mn (X)= lim(1109)∆t→0 ∆t= d dt∣ g mn (X+ t Y ) (1110)t=0where t ∈ R . Assuming finite Y , equation (1109) is called the Gateauxdifferential [65, App.A.5] [29, §D.2.1] [35, §5.28] whose existence is impliedby the existence of the Fréchet differential, the sum in (1107). [37, §7.2] Eachmay be understood as the change in g mn at X when the change in X is equal


in magnitude and direction to Y . D.2  Hence the directional derivative,

→Y dg(X) ≜ [ dg_mn(X) ]_{m,n} |_{dX→Y} = [ tr( ∇g_mn(X)^T Y ) ]_{m,n} = [ Σ_{k,l} (∂g_mn(X)/∂X_kl) Y_kl ]_{m,n} ∈ R^{M×N}   (1111)

from which it follows

→Y dg(X) = Σ_{k,l} (∂g(X)/∂X_kl) Y_kl   (1112)

Yet for all X ∈ dom g , any Y ∈ R^{K×L} , and some open interval of t ∈ R ,

g(X + tY) = g(X) + t →Y dg(X) + o(t²)   (1113)

which is the first-order Taylor series expansion about X . [79, §18.4] [66, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second term of the expansion. Thus differentiating and zeroing g(X + tY) in t is an operation equivalent to individually differentiating and zeroing every entry g_mn(X + tY) as in (1110). So the directional derivative of g(X) in any direction Y ∈ R^{K×L} evaluated at X ∈ dom g becomes

→Y dg(X) = d/dt |_{t=0} g(X + tY) ∈ R^{M×N}   (1114)

[6, §2.1, §5.4.5] [7, §6.3.1] which is simplest.

D.2 Although Y is a matrix, we may regard it as a vector in R^{KL}.


[Figure D.1: Drawn is a convex quadratic bowl in R³ ; f̂(x) = x^T x : R² → R versus x on some open disc in R² . Plane slice ∂H is perpendicular to the domain. Intersection of the slice with the domain connotes slice direction y in the domain. Slope of tangent line T at point (f̂(α),α) is the value of the directional derivative at α in the slice direction; equivalent to ∇_x f̂(α)^T y (1141). Recall, the negative gradient is always the direction of steepest descent [168]. [79, §15.6] For this function, the gradient maps to R² . When vector entry υ_3 is half the directional derivative in the direction of the gradient at α , and when [υ_1 υ_2]^T ≜ ∇_x f̂(α) as in (1118), then −υ ∈ R³ points directly toward the bottom of the bowl.]

The derivative with respect to t makes the directional derivative (1114) resemble ordinary calculus (§D.2); e.g., when g(X) is linear, →Y dg(X) = g(Y) . [37, §7.2]

D.1.4.1 Interpretation

In the case of a real function f̂(X) : R^{K×L} → R , the directional derivative of f̂(X) at X in any direction Y yields the slope of f̂ along the line X + tY through its domain, parametrized by t ∈ R , evaluated at t = 0. Figure D.1, for example, shows a plane slice of a real convex bowl-shaped function f̂(x) along a line α + ty through its domain. The slice reveals a one-dimensional real function of t ; f̂(α + ty). The directional derivative at x = α in direction y is the slope of f̂(α + ty) with respect to t at t = 0.


Notice, unlike the gradient, the directional derivative does not expand dimension; the directional derivative in (1114) retains the dimensions of g . For higher-dimensional functions, the foregoing slope interpretation can be applied to each entry of the directional derivative by (1111). In the case of a real function having vector argument h(X) : R^K → R , the directional derivative in the normalized direction of the gradient is the magnitude of the gradient. (1141)

Theorem. Directional derivative condition for optimization. [37, §7.4]
Suppose f̂(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{p×k} by X⋆ , and the directional derivative of f̂ exists there. Then for all X ∈ C ,

→X−X⋆ d f̂(X) ≥ 0   (1115)
⋄

D.1.4.1.1 Example. Simple bowl.
Bowl function (Figure D.1)

f̂(x) : R^K → R ≜ (x − a)^T (x − a) − b   (1116)

has function offset −b ∈ R , axis of revolution at x = a , and positive definite Hessian (1072) everywhere in its domain (an open hyperdisc in R^K ); id est, strictly convex quadratic f̂(x) has global minimum equal to −b at x = a . A vector −υ based anywhere in dom f̂ × R pointing toward the unique bowl bottom is specified:

υ ∝ [ x − a ; f̂(x) + b ] ∈ R^K × R   (1117)

Such a vector is

υ ≜ [ ∇_x f̂(x) ; ½ →∇f̂ d f̂(x) ]   (1118)

since the gradient is

∇_x f̂(x) = 2(x − a)   (1119)

and the directional derivative in the direction of the gradient is (1141)

→∇f̂ d f̂(x) = ∇_x f̂(x)^T ∇_x f̂(x) = 4(x − a)^T (x − a) = 4( f̂(x) + b )   (1120)
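A short numerical illustration of (1119) and (1120), added here as a sketch (NumPy assumed; a and b are arbitrary):

    import numpy as np

    # Sketch: verify (1119)-(1120) for the bowl f(x) = (x-a)'(x-a) - b.
    rng = np.random.default_rng(3)
    K = 5
    a, b = rng.standard_normal(K), 2.0
    f = lambda x: (x - a) @ (x - a) - b

    x = rng.standard_normal(K)
    grad = 2 * (x - a)                               # (1119)
    # directional derivative in the direction of the gradient: d/dt f(x + t·grad) at t = 0
    t = 1e-6
    dir_deriv = (f(x + t * grad) - f(x - t * grad)) / (2 * t)
    print(np.isclose(dir_deriv, grad @ grad))        # equals |grad f|², per (1141)
    print(np.isclose(dir_deriv, 4 * (f(x) + b)))     # equals 4(f(x)+b), per (1120)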


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 363D.1.5Second directional derivativeBy similar argument, it so happens: the second directional derivative isequally simple. Given g(X) : R K×L →R M×N on open domain,∇ ∂g mn(X)∂X kl= ∂∇g mn(X)∂X kl=⎡∇ 2 g mn (X) =⎢⎣⎡=⎢⎣⎡⎢⎣∇ ∂gmn(X)∂X 11∇ ∂gmn(X)∂X 21.∇ ∂gmn(X)∂X K1∂∇g mn(X)∂X 11∂∇g mn(X)∂X 21.∂∇g mn(X)∂X K1∂ 2 g mn(X)∂X kl ∂X 11∂ 2 g mn(X)∂X kl ∂X 21∂ 2 g mn(X)∂X kl ∂X K1∂ 2 g mn(X)∂X kl ∂X 12· · ·∂ 2 g mn(X)∂X kl ∂X 1L∂ 2 g mn(X) ∂∂X kl ∂X 22· · ·2 g mn(X)∂X kl ∂X 2L∈ R K×L⎥. . . ⎦∂ 2 g mn(X) ∂∂X kl ∂X K2· · ·2 g mn(X)∂X kl ∂X KL(1121)⎤∇ ∂gmn(X)∂X 12· · · ∇ ∂gmn(X)∂X 1L∇ ∂gmn(X)∂X 22· · · ∇ ∂gmn(X)∂X 2L∈ R K×L×K×L⎥. . ⎦∇ ∂gmn(X)∂X K2· · · ∇ ∂gmn(X)∂X KL∂∇g mn(X)∂X 12· · ·∂∇g mn(X)∂X 22.· · ·∂∇g mn(X)∂X K2· · ·∂∇g mn(X)∂X 1L∂∇g mn(X)∂X 2L.∂∇g mn(X)∂X KL(1122)Rotating our perspective, we get several views of the second-order gradient:⎡⎤∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 ∇ 2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X)g(X) = ⎢⎥⎣ . .. ⎦ ∈ RM×N×K×L×K×L∇ 2 g M1 (X) ∇ 2 g M2 (X) · · · ∇ 2 g MN (X)(1123)⎡⎤∇ ∂g(X)∂X 11∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ 2 g(X) T 1=∇ ∂g(X)∂X⎢21∇ ∂g(X)∂X 22· · · ∇ ∂g(X)∂X 2L⎥⎣ . . . ⎦ ∈ RK×L×M×N×K×L (1124)∇ ∂g(X)∂X K1∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL⎤⎥⎦⎤


364 APPENDIX D. MATRIX CALCULUS⎡∇ 2 g(X) T 2=⎢⎣∂∇g(X)∂X 11∂∇g(X)∂X 21.∂∇g(X)∂X K1∂∇g(X)∂X 12· · ·∂∇g(X)∂X 22.· · ·∂∇g(X)∂X K2· · ·∂∇g(X)∂X 1L∂∇g(X)∂X 2L.∂∇g(X)∂X KL⎤⎥⎦ ∈ RK×L×K×L×M×N (1125)Assuming the limits exist, we may state the partial derivative of the mn thentry of g with respect to the kl th and ij th entries of X ;∂ 2 g mn(X)∂X kl ∂X ij=g mn(X+∆t elimk e T l +∆τ e i eT j )−gmn(X+∆t e k eT l )−(gmn(X+∆τ e i eT )−gmn(X))j∆τ,∆t→0∆τ ∆t(1126)Differentiating (1106) and then scaling by Y ij ,∂ 2 g mn(X)∂X kl ∂X ijY kl Y ij = lim∆t→0∂g mn(X+∆t Y kl e k e T l )−∂gmn(X)∂X ijY∆tijg mn(X+∆t Y kl e= limk e T l +∆τ Y ij e i e T j )−gmn(X+∆t Y kl e k e T l )−(gmn(X+∆τ Y ij e i e T )−gmn(X))j∆τ,∆t→0∆τ ∆t(1127)that can be proved by substitution of variables in (1126). The mn th secondordertotal differential due to any Y ∈R K×L isd 2 g mn (X)| dX→Y= ∑ i,j∑k,l∂ 2 g mn (X)Y kl Y ij = tr(∇ X tr ( ∇g mn (X) T Y ) )TY (1128)∂X kl ∂X ij= ∑ ∂g mn (X + ∆t Y ) − ∂g mn (X)limY ij∆t→0 ∂Xi,jij ∆t(1129)g mn (X + 2∆t Y ) − 2g mn (X + ∆t Y ) + g mn (X)= lim∆t→0∆t 2 (1130)= d2dt 2 ∣∣∣∣t=0g mn (X+ t Y ) (1131)Hence the second directional derivative,


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 365→Ydg 2 (X) ∆ =⎡⎢⎣⎡tr(∇tr ( ∇g 11 (X) T Y ) )TY =tr(∇tr ( ∇g 21 (X) T Y ) )TY ⎢⎣ .tr(∇tr ( ∇g M1 (X) T Y ) )TY⎡∑ ∑i,j k,l∑ ∑=i,j k,l⎢⎣ ∑ ∑i,jk,l∂ 2 g 11 (X)∂X kl ∂X ijY kl Y ij∂ 2 g 21 (X)∂X kl ∂X ijY kl Y ij.∂ 2 g M1 (X)∂X kl ∂X ijY kl Y ijd 2 g 11 (X) d 2 g 12 (X) · · · d 2 g 1N (X)d 2 g 21 (X) d 2 g 22 (X) · · · d 2 g 2N (X). .d 2 g M1 (X) d 2 g M2 (X) · · ·.d 2 g MN (X)⎤⎥∈ R M×N⎦∣∣dX→Ytr(∇tr ( ∇g 12 (X) T Y ) )TY · · · tr(∇tr ( ∇g 1N (X) T Y ) ) ⎤TYtr(∇tr ( ∇g 22 (X) T Y ) )TY · · · tr(∇tr ( ∇g 2N (X) T Y ) )T Y ..tr(∇tr ( ∇g M2 (X) T Y ) )TY · · · tr(∇tr ( ∇g MN (X) T Y ) ⎥) ⎦TY∑ ∑∂ 2 g 12 (X)∂X kl ∂X ijY kl Y ij · · ·i,ji,jk,l∑ ∑∂ 2 g 22 (X)∂X kl ∂X ijY kl Y ij · · ·k,l.∑ ∑∂ 2 g M2 (X)∂X kl ∂X ijY kl Y ij · · ·from which it follows→Ydg 2 (X) = ∑ ∑ ∂ 2 g(X)Y kl Y ij = ∑ ∂Xi,j kl ∂X iji,jk,li,jk,l∑∑⎤∂ 2 g 1N (X)∂X kl ∂X ijY kl Y iji,j k,l∑∑∂ 2 g 2N (X)∂X kl ∂X ijY kl Y iji,j k,l. ⎥∑ ∑ ∂ 2 g MN (X) ⎦∂X kl ∂X ijY kl Y iji,jk,l(1132)∂∂X ij→Ydg(X)Y ij (1133)Yet for all X ∈ domg, any Y ∈R K×L , and some open interval of t∈R ,g(X+ t Y ) = g(X) + t dg(X) →Y+ 1 →Yt2dg 2 (X) + o(t 3 ) (1134)2!which is the second-order Taylor series expansion about X. [79, §18.4][66, §2.3.4] Differentiating twice with respect to t and subsequent t-zeroingisolates the third term of the expansion. Thus differentiating and zeroingg(X+tY ) in t is an operation equivalent to individually differentiating andzeroing every entry g mn (X + t Y ) as in (1131). So the second directionalderivative becomes→Ydg 2 (X) = d2dt 2 ∣∣∣∣t=0g(X+ t Y ) ∈ R M×N (1135)[6, §2.1, §5.4.5] [7, §6.3.1] which is again simplest. (confer (1114))


D.1.6 Taylor series

Series expansions of the differentiable matrix-valued function g(X) , of matrix argument, were given earlier in (1113) and (1134). Assuming g(X) has continuous first-, second-, and third-order gradients over the open set dom g , then for X ∈ dom g and any Y ∈ R^{K×L} the complete Taylor series on some open interval of µ ∈ R is expressed

g(X + µY) = g(X) + µ →Y dg(X) + (µ²/2!) →Y dg²(X) + (µ³/3!) →Y dg³(X) + o(µ⁴)   (1136)

or on some open interval of ‖Y‖ ,

g(Y) = g(X) + →Y−X dg(X) + (1/2!) →Y−X dg²(X) + (1/3!) →Y−X dg³(X) + o(‖Y‖⁴)   (1137)

which are third-order expansions about X . The mean value theorem from calculus is what insures the finite order of the series. [33, §1.1] [65, App.A.5] [29, §0.4] [79]

In the case of a real function g(X) : R^{K×L} → R , all the directional derivatives are in R :

→Y dg(X) = tr( ∇g(X)^T Y )   (1138)

→Y dg²(X) = tr( ∇_X tr( ∇g(X)^T Y )^T Y ) = tr( ∇_X →Y dg(X)^T Y )   (1139)

→Y dg³(X) = tr( ∇_X tr( ∇_X tr( ∇g(X)^T Y )^T Y )^T Y ) = tr( ∇_X →Y dg²(X)^T Y )   (1140)

In the case g(X) : R^K → R has vector argument, they further simplify:

→Y dg(X) = ∇g(X)^T Y   (1141)

→Y dg²(X) = Y^T ∇²g(X) Y   (1142)

→Y dg³(X) = ∇_X ( Y^T ∇²g(X) Y )^T Y   (1143)

and so on.

D.1.6.0.1 Example. log det X . [1, p.644]
We want the first two terms of the Taylor series (1137)...
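The real-function formulas (1141) and (1142) give a convenient second-order Taylor check. The following sketch is an addition of mine (NumPy assumed; the test function log(1 + x'x) is arbitrary):

    import numpy as np

    # Sketch: second-order Taylor expansion (1136) built from the directional
    # derivatives (1141)-(1142), for g(x) = log(1 + x'x) of vector argument.
    rng = np.random.default_rng(4)
    K = 4
    g = lambda x: np.log(1 + x @ x)
    grad = lambda x: 2 * x / (1 + x @ x)
    hess = lambda x: 2 * np.eye(K) / (1 + x @ x) - 4 * np.outer(x, x) / (1 + x @ x) ** 2

    x, y = rng.standard_normal(K), rng.standard_normal(K)
    mu = 1e-2
    taylor2 = g(x) + mu * (grad(x) @ y) + 0.5 * mu**2 * (y @ hess(x) @ y)
    print(abs(g(x + mu * y) - taylor2))   # residual is O(mu³), i.e. tiny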


D.1.7 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between the gradient with respect to matrix X and the derivative with respect to real variable t :

D.1.7.1 first-order

Removing from (1114) the evaluation at t = 0 , D.3 we find an expression for the directional derivative of g(X) in the direction Y evaluated anywhere along a line X + tY intersecting dom g , parametrized by t ;

→Y dg(X + tY) = d/dt g(X + tY)   (1144)

In the general case g(X) : R^{K×L} → R^{M×N} , from (1107) and (1110) we find

tr( ∇_X g_mn(X + tY)^T Y ) = d/dt g_mn(X + tY)   (1145)

which is valid at t = 0 , of course, when X ∈ dom g . In the important case of a real function g(X) : R^{K×L} → R , from (1138) we have simply

tr( ∇_X g(X + tY)^T Y ) = d/dt g(X + tY)   (1146)

When, additionally, g(X) : R^K → R has vector argument,

∇_X g(X + tY)^T Y = d/dt g(X + tY)   (1147)

D.3 Justified by replacing X with X + tY in (1107)-(1109); beginning,

dg_mn(X + tY)|_{dX→Y} = Σ_{k,l} ( ∂g_mn(X + tY)/∂X_kl ) Y_kl


Example. Gradient.
g(X) = w^T X^T X w ,  X ∈ R^{K×L} ,  w ∈ R^L . Using the tables in §D.2,

tr( ∇_X g(X + tY)^T Y ) = tr( 2ww^T (X^T + tY^T) Y )   (1148)
                        = 2w^T (X^T Y + tY^T Y) w   (1149)

Applying the equivalence (1146),

d/dt g(X + tY) = d/dt w^T (X + tY)^T (X + tY) w   (1150)
               = w^T ( X^T Y + Y^T X + 2tY^T Y ) w   (1151)
               = 2w^T (X^T Y + tY^T Y) w   (1152)

which is the same as (1149); hence, the equivalence is demonstrated. It is easy to extract ∇g(X) from (1152) knowing only (1146):

tr( ∇_X g(X + tY)^T Y ) = 2w^T (X^T Y + tY^T Y) w = 2 tr( ww^T (X^T + tY^T) Y )
tr( ∇_X g(X)^T Y ) = 2 tr( ww^T X^T Y )
⇔
∇_X g(X) = 2Xww^T   (1153)

D.1.7.2 second-order

Likewise removing the evaluation at t = 0 from (1135),

→Y dg²(X + tY) = d²/dt² g(X + tY)   (1154)

we can find a similar relationship between the second-order gradient and the second derivative: In the general case g(X) : R^{K×L} → R^{M×N} , from (1128) and (1131),

tr( ∇_X tr( ∇_X g_mn(X + tY)^T Y )^T Y ) = d²/dt² g_mn(X + tY)   (1155)

In the case of a real function g(X) : R^{K×L} → R , we have, of course,

tr( ∇_X tr( ∇_X g(X + tY)^T Y )^T Y ) = d²/dt² g(X + tY)   (1156)
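A finite-difference confirmation of (1153), added as a sketch (NumPy assumed; X and w are arbitrary):

    import numpy as np

    # Sketch: check grad_X (w'X'Xw) = 2Xww'  (1153) by central differences.
    rng = np.random.default_rng(5)
    K, L = 4, 3
    X, w = rng.standard_normal((K, L)), rng.standard_normal(L)
    g = lambda X: w @ X.T @ X @ w

    h = 1e-6
    grad_fd = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            E = np.zeros((K, L)); E[k, l] = h
            grad_fd[k, l] = (g(X + E) - g(X - E)) / (2 * h)

    print(np.allclose(grad_fd, 2 * X @ np.outer(w, w), atol=1e-5))   # True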


From (1142), the simpler case, where the real function g(X) : R^K → R has vector argument,

Y^T ∇²_X g(X + tY) Y = d²/dt² g(X + tY)   (1157)

D.1.7.2.1 Example. Second-order gradient.
Given real function g(X) = log det X , having domain X ∈ int S^K_+ , we want to find ∇²g(X) ∈ R^{K×K×K×K} . From the tables in §D.2,

h(X) ≜ ∇g(X) = X^{−1} ∈ int S^K_+   (1158)

so ∇²g(X) = ∇h(X) . By (1145) and (1113), for Y ∈ S^K ,

tr( ∇h_mn(X)^T Y ) = d/dt |_{t=0} h_mn(X + tY)   (1159)
                   = ( d/dt |_{t=0} h(X + tY) )_mn   (1160)
                   = ( d/dt |_{t=0} (X + tY)^{−1} )_mn   (1161)
                   = −( X^{−1} Y X^{−1} )_mn   (1162)

Setting Y to a member of the standard basis E_kl = e_k e_l^T , for k,l ∈ {1... K} , and employing a property of the trace function (20), we find

∇²g(X)_mnkl = tr( ∇h_mn(X)^T E_kl ) = ∇h_mn(X)_kl = −( X^{−1} E_kl X^{−1} )_mn   (1163)

∇²g(X)_kl = ∇h(X)_kl = −( X^{−1} E_kl X^{−1} ) ∈ R^{K×K}   (1164)

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary t , in some open interval, but only after the differentiation.
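The derivative (1161)-(1162) of the matrix inverse, and the gradient (1158) through d/dt log det(X + tY)|_{t=0} = tr(X^{−1}Y) , can be checked numerically; the sketch below is an addition (NumPy assumed; X ≻ 0 and symmetric Y are arbitrary):

    import numpy as np

    # Sketch: d/dt (X+tY)^{-1}|_{t=0} = -X^{-1} Y X^{-1}   (1161)-(1162)
    #         d/dt log det(X+tY)|_{t=0} = tr(X^{-1} Y)     (consistent with (1158))
    rng = np.random.default_rng(6)
    K = 4
    M = rng.standard_normal((K, K)); X = M @ M.T + K * np.eye(K)   # X positive definite
    N = rng.standard_normal((K, K)); Y = (N + N.T) / 2             # symmetric direction

    t = 1e-6
    Xinv = np.linalg.inv(X)
    d_inv = (np.linalg.inv(X + t*Y) - np.linalg.inv(X - t*Y)) / (2*t)
    print(np.allclose(d_inv, -Xinv @ Y @ Xinv, atol=1e-4))          # True

    d_logdet = (np.linalg.slogdet(X + t*Y)[1] - np.linalg.slogdet(X - t*Y)[1]) / (2*t)
    print(np.isclose(d_logdet, np.trace(Xinv @ Y), atol=1e-6))      # True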


D.2 Tables of gradients and derivatives

• [40] [169] When proving results for symmetric matrices algebraically, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward.

• a,b ∈ R^n , x,y ∈ R^k , A,B ∈ R^{m×n} , X,Y ∈ R^{K×L} ; i,j,k,l,K,L,m,n,M,N are integers, t,µ ∈ R , unless otherwise noted.

• x^µ means δ(δ(x)^µ) for µ ∈ R ; id est, entry-wise exponentiation. δ is the main-diagonal operator (42) (§A.1). x⁰ ≜ 1 , X⁰ ≜ I .

• d/dx ≜ [ d/dx_1 , ... , d/dx_k ]^T ; →y dg(x) , →y dg²(x) (directional derivatives, §D.1), log x , sgn x , sin x , x/y (entry-wise division), etcetera, are maps f : R^k → R^k that maintain dimension; e.g., (§A.1)

  d/dx x^{−1} ≜ ∇_x 1^T δ(x)^{−1} 1   (1165)

• Given f(x) : X → R defined on arbitrary set X , [29, §0.1.2]

  inf_{x∈X} f(x) = −sup_{x∈X} −f(x) ,   sup_{x∈X} f(x) = −inf_{x∈X} −f(x)   (1166)

  arg inf_{x∈X} f(x) = arg sup_{x∈X} −f(x) ,   arg sup_{x∈X} f(x) = arg inf_{x∈X} −f(x)   (1167)

• Given g(x,y) : X × Y → R defined on arbitrary sets X and Y , [29, §0.1.2] and independent variables x and y ,

  inf_{x∈X, y∈Y} g(x,y) = inf_{x∈X} inf_{y∈Y} g(x,y) = inf_{y∈Y} inf_{x∈X} g(x,y)   (1168)

  The respective arguments of infima are not necessarily unique.

• The standard basis: { E_kl = e_k e_l^T for k,l ∈ {1... K} } in R^{K×K} .


D.2. TABLES OF GRADIENTS AND DERIVATIVES 371• For A a scalar or matrix, we have the Taylor series [170, §3.6]e A =∞∑k=01k! Ak (1169)Further, [26, §5.4]e A ≻ 0, ∀A ∈ S m (1170)• For all square A and integer k ,det k A = detA k (1171)Try to generalize 2X2 cases...Try d f(X + t Y dt )µ = ...,[24, §2.3.5]−1 ≤ µ ≤ 1, X,Y ∈ S M + ... f monotoneThe function X µ is convex on S M + for −1 ≤ µ ≤ 0 or 1 ≤ µ ≤ 2 andconcave for 0≤µ≤1... [1, §3.6.2]




D.2.1 Algebraic

∇_x x = ∇_x x^T = I ∈ R^{k×k}          ∇_X X = ∇_X X^T ≜ I ∈ R^{K×L×K×L} (the identity)

∇_x (Ax − b) = A^T
∇_x ( x^T A − b^T ) = A
∇_x (Ax − b)^T (Ax − b) = 2A^T (Ax − b)
∇²_x (Ax − b)^T (Ax − b) = 2A^T A
∇_x ( x^T Ax + 2x^T By + y^T Cy ) = (A + A^T)x + 2By
∇²_x ( x^T Ax + 2x^T By + y^T Cy ) = A + A^T

∇_X a^T Xb = ∇_X b^T X^T a = ab^T
∇_X a^T X²b = X^T ab^T + ab^T X^T
∇_X a^T X^{−1} b = −X^{−T} ab^T X^{−T}
∇_X (X^{−1})_kl = ∂X^{−1}/∂X_kl = −X^{−1} E_kl X^{−1} ,  confer (1105)(1164)

∇_x a^T x^T xb = 2xa^T b                ∇_X a^T X^T Xb = X(ab^T + ba^T)
∇_x a^T xx^T b = (ab^T + ba^T)x         ∇_X a^T XX^T b = (ab^T + ba^T)X
∇_x a^T x^T xa = 2xa^T a                ∇_X a^T X^T Xa = 2Xaa^T
∇_x a^T xx^T a = 2aa^T x                ∇_X a^T XX^T a = 2aa^T X
∇_x a^T yx^T b = ba^T y                 ∇_X a^T YX^T b = ba^T Y
∇_x a^T y^T xb = yb^T a                 ∇_X a^T Y^T Xb = Y ab^T
∇_x a^T xy^T b = ab^T y                 ∇_X a^T XY^T b = ab^T Y
∇_x a^T x^T yb = ya^T b                 ∇_X a^T X^T Y b = Y ba^T
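Any entry of these tables can be spot-checked by finite differences; the following sketch (an addition of mine, NumPy assumed, test data arbitrary) verifies ∇_X a^T X^{−1} b and ∇_X a^T X^T X b :

    import numpy as np

    # Sketch: spot-check two table entries by central differences:
    #   grad_X a'X^{-1}b = -X^{-T} a b' X^{-T}    and    grad_X a'X'Xb = X(ab' + ba')
    rng = np.random.default_rng(7)
    n = 4
    X = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably invertible
    a, b = rng.standard_normal(n), rng.standard_normal(n)

    def fd_grad(f, X, h=1e-6):
        G = np.zeros_like(X)
        for k in range(X.shape[0]):
            for l in range(X.shape[1]):
                E = np.zeros_like(X); E[k, l] = h
                G[k, l] = (f(X + E) - f(X - E)) / (2 * h)
        return G

    Xinv = np.linalg.inv(X)
    g1 = lambda X: a @ np.linalg.inv(X) @ b
    print(np.allclose(fd_grad(g1, X), -Xinv.T @ np.outer(a, b) @ Xinv.T, atol=1e-4))

    g2 = lambda X: a @ X.T @ X @ b
    print(np.allclose(fd_grad(g2, X), X @ (np.outer(a, b) + np.outer(b, a)), atol=1e-4))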


374 APPENDIX D. MATRIX CALCULUSD.2.1.1Algebraic cont.d(X+ t Y ) = Ydtddt BT (X+ t Y ) −1 A = −B T (X+ t Y ) −1 Y (X+ t Y ) −1 Addt BT (X+ t Y ) −T A = −B T (X+ t Y ) −T Y T (X+ t Y ) −T Addt BT (X+ t Y ) µ A = ..., −1 ≤ µ ≤ 1, X,Y ∈ S M +d 2dt 2 B T (X+ t Y ) −1 A = 2B T (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 Addt((X+ t Y ) T A(X+ t Y ) ) = Y T AX + X T AY + 2tY T AYd 2dt 2 ((X+ t Y ) T A(X+ t Y ) ) = 2Y T AYd((X+ t Y )A(X+ t Y )) = YAX + XAY + 2tYAYdtd 2dt 2 ((X+ t Y )A(X+ t Y )) = 2YAY




376 APPENDIX D. MATRIX CALCULUSD.2.2Trace∇ x µx = µI ∇ X trµX = ∇ X µ trX = µI∇ x 1 T δ(x) −1 1 = ddx x−1 = −x −2∇ x 1 T δ(x) −1 y = −δ(x) −2 y (§A.1)∇ X trX −1 = −X −2T∇ X tr(X −1 Y ) = ∇ X tr(Y X −1 ) = −X −T Y T X −Tddx xµ = µx µ−1 ∇ X trX µ = µX (µ−1)T , X ∈ R 2×2∇ X trX j = jX (j−1)T∇ x (b − a T x) −1 = (b − a T x) −2 a∇ X tr((B − AX) −1 ) = ((B − AX) −2 A) T∇ x (b − a T x) µ = −µ(b − a T x) µ−1 a∇ x x T y = ∇ x y T x = y∇ X tr(X T Y ) = ∇ X tr(Y X T ) = ∇ X tr(Y T X) = ∇ X tr(XY T ) = Y∇ X tr(AXBX T ) = ∇ X tr(XBX T A) = A T XB T + AXB∇ X tr(AXBX) = ∇ X tr(XBXA) = A T X T B T + B T X T A T∇ X tr(AXAXAX) = ∇ X tr(XAXAXA) = 3(AXAXA) T∇ X tr(Y X k ) = ∇ X tr(X k Y ) = k−1 ∑( X i Y X k−1−i) Ti=0∇ X tr(Y T XX T Y ) = ∇ X tr(X T Y Y T X) = 2Y Y T X∇ X tr(Y T X T XY ) = ∇ X tr(XY Y T X T ) = 2XY Y T∇ X tr ( (X + Y ) T (X + Y ) ) = 2(X + Y )∇ X tr((X + Y )(X + Y )) = 2(X + Y ) T∇ X tr(A T XB) = ∇ X tr(X T AB T ) = AB T∇ X tr(A T X −1 B) = ∇ X tr(X −T AB T ) = −X −T AB T X −T∇ X a T Xb = ∇ X tr(ba T X) = ∇ X tr(Xba T ) = ab T∇ X b T X T a = ∇ X tr(X T ab T ) = ∇ X tr(ab T X T ) = ab T∇ X a T X −1 b = ∇ X tr(X −T ab T ) = −X −T ab T X −T∇ X a T X µ b =


D.2. TABLES OF GRADIENTS AND DERIVATIVES 377D.2.2.1Trace cont.dtrg(X+ t Y ) = tr d g(X+ t Y )dt dtdtr(X+ t Y ) = trYdtddt trj (X+ t Y ) = j tr j−1 (X+ t Y ) trYdtr(X+ t Y dt )j = j tr((X+ t Y ) j−1 Y )(∀j)d2tr((X+ t Y )Y ) = trYdtdtr( (X+ t Y ) k Y ) = d tr(Y (X+ t Y dt dt )k ) = k tr ( (X+ t Y ) k−1 Y 2) , k ∈{0, 1, 2}dtr( (X+ t Y ) k Y ) = d tr(Y (X+ t Y dt dt )k ) = tr k−1 ∑(X+ t Y ) i Y (X+ t Y ) k−1−i Ydtr((X+ t Y dt )−1 Y ) = − tr((X+ t Y ) −1 Y (X+ t Y ) −1 Y )dtr( B T (X+ t Y ) −1 A ) = − tr ( B T (X+ t Y ) −1 Y (X+ t Y ) −1 A )dtdtr( B T (X+ t Y ) −T A ) = − tr ( B T (X+ t Y ) −T Y T (X+ t Y ) −T A )dtdtr( B T (X+ t Y ) −k A ) = ... k>0dtdtr( B T (X+ t Y ) µ A ) = ..., −1 ≤ µ ≤ 1, X,Y ∈ S M dt +d 2tr ( B T (X+ t Y ) −1 A ) = 2 tr ( B T (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 A )dt 2dtr( (X+ t Y ) T A(X+ t Y ) ) = tr ( Y T AX + X T AY + 2tY T AY )dtd 2tr ( (X+ t Y ) T A(X+ t Y ) ) = 2 tr ( Y T AY )dt 2dtr((X+ t Y )A(X+ t Y )) = tr(YAX + XAY + 2tYAY )dtd 2dt 2 tr((X+ t Y )A(X+ t Y )) = 2 tr(YAY )i=0


D.2.3 Trace Kronecker

∇_{vec X} tr(AXBX^T) = ∇_{vec X} vec^T(X)(B^T ⊗ A) vec X = (B ⊗ A^T + B^T ⊗ A) vec X

∇²_{vec X} tr(AXBX^T) = ∇²_{vec X} vec^T(X)(B^T ⊗ A) vec X = B ⊗ A^T + B^T ⊗ A
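These two identities can be verified against finite differences of tr(AXBX^T) taken with respect to the column-stacked vec X ; the sketch below is mine (NumPy assumed, A and B arbitrary):

    import numpy as np

    # Sketch: second-order gradient of phi(X) = tr(A X B X') with respect to
    # vec X (column-major) equals B ⊗ A' + B' ⊗ A, as in (1093) and this table.
    rng = np.random.default_rng(2)
    n = 3
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

    def phi_vec(v):                       # phi as a function of vec X (column-major)
        X = v.reshape((n, n), order='F')
        return np.trace(A @ X @ B @ X.T)

    v = rng.standard_normal(n * n)
    h = 1e-4
    I = np.eye(n * n)
    H_fd = np.array([[(phi_vec(v + h*I[i] + h*I[j]) - phi_vec(v + h*I[i] - h*I[j])
                       - phi_vec(v - h*I[i] + h*I[j]) + phi_vec(v - h*I[i] - h*I[j])) / (4*h*h)
                      for j in range(n * n)] for i in range(n * n)])

    H_analytic = np.kron(B, A.T) + np.kron(B.T, A)    # B ⊗ A' + B' ⊗ A
    print(np.allclose(H_fd, H_analytic, atol=1e-5))   # True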


D.2.4 Log determinant

x ≻ 0 , det X > 0 on some neighborhood of X , det(X + tY) > 0 on some open interval of t ; otherwise, log(·) is discontinuous.

d/dx log x = x^{−1}
d/dx log x^{−1} = −x^{−1}
d/dx log x^µ = µx^{−1}

∇_X log det X = X^{−T}
∇²_X log det(X)_kl = ∂X^{−T}/∂X_kl = −(X^{−1} E_kl X^{−1})^T ,  confer (1122)(1164)
∇_X log det X^{−1} = −X^{−T}
∇_X log det^µ X = µX^{−T}
∇_X log det X^µ = µX^{−T} ,  X ∈ R^{2×2}
∇_X log det X^k = ∇_X log det^k X = kX^{−T}
∇_X log det^µ (X + tY) = µ(X + tY)^{−T}
∇_x log(a^T x + b) = a · 1/(a^T x + b)
∇_X log det(AX + B) = A^T (AX + B)^{−T}
∇_X log det(I ± A^T XA) = ...
∇_X log det(X + tY)^k = ∇_X log det^k (X + tY) = k(X + tY)^{−T}

d/dt log det(X + tY) = tr( (X + tY)^{−1} Y )
d²/dt² log det(X + tY) = −tr( (X + tY)^{−1} Y (X + tY)^{−1} Y )
d/dt log det(X + tY)^{−1} = −tr( (X + tY)^{−1} Y )
d²/dt² log det(X + tY)^{−1} = tr( (X + tY)^{−1} Y (X + tY)^{−1} Y )
d/dt log det( δ(A(x + ty) + a)² + µI ) = tr( ( δ(A(x + ty) + a)² + µI )^{−1} 2δ(A(x + ty) + a) δ(Ay) )
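The first and second t-derivatives of log det(X + tY) in this table admit a quick numerical check; this sketch is an addition (NumPy assumed; X ≻ 0 and symmetric Y arbitrary, t chosen so X + tY remains positive definite):

    import numpy as np

    # Sketch: d/dt  log det(X+tY) =  tr((X+tY)^{-1} Y)
    #         d²/dt² log det(X+tY) = -tr((X+tY)^{-1} Y (X+tY)^{-1} Y)
    rng = np.random.default_rng(8)
    n, t = 4, 0.3
    M = rng.standard_normal((n, n)); X = M @ M.T + n * np.eye(n)
    N = rng.standard_normal((n, n)); Y = (N + N.T) / 2

    ld = lambda s: np.linalg.slogdet(X + s * Y)[1]
    h = 1e-4
    d1_fd = (ld(t + h) - ld(t - h)) / (2 * h)
    d2_fd = (ld(t + h) - 2 * ld(t) + ld(t - h)) / h**2

    W = np.linalg.inv(X + t * Y) @ Y
    print(np.isclose(d1_fd, np.trace(W), atol=1e-6))        # True
    print(np.isclose(d2_fd, -np.trace(W @ W), atol=1e-4))   # True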


D.2.5 Determinant

∇_X det X = ∇_X det X^T = det(X) X^{−T}
∇_X det X^{−1} = −det(X^{−1}) X^{−T} = −det(X)^{−1} X^{−T}
∇_X det^µ X = µ det^µ(X) X^{−T}
∇_X det X^µ = µ det(X^µ) X^{−T} ,  X ∈ R^{2×2}
∇_X det X^k = k det^{k−1}(X) ( tr(X) I − X^T ) ,  X ∈ R^{2×2}
∇_X det X^k = ∇_X det^k X = k det(X^k) X^{−T} = k det^k(X) X^{−T}
∇_X det^µ (X + tY) = µ det^µ(X + tY)(X + tY)^{−T}
∇_X det(X + tY)^k = ∇_X det^k (X + tY) = k det^k(X + tY)(X + tY)^{−T}

d/dt det(X + tY) = det(X + tY) tr( (X + tY)^{−1} Y )
d²/dt² det(X + tY) = det(X + tY) ( tr²( (X + tY)^{−1} Y ) − tr( (X + tY)^{−1} Y (X + tY)^{−1} Y ) )
d/dt det(X + tY)^{−1} = −det(X + tY)^{−1} tr( (X + tY)^{−1} Y )
d²/dt² det(X + tY)^{−1} = det(X + tY)^{−1} ( tr²( (X + tY)^{−1} Y ) + tr( (X + tY)^{−1} Y (X + tY)^{−1} Y ) )
d/dt det^µ (X + tY) = ...


D.2. TABLES OF GRADIENTS AND DERIVATIVES 381D.2.6Logarithmicdlog(X+ t Y dt )µ = ...,−1 ≤ µ ≤ 1, X,Y ∈ S M + [25, §6.6, prob.20]D.2.7Exponential[170, §3.6, §4.5] [26, §5.4]∇ X e tr(Y T X) = ∇ X dete Y TX = e tr(Y T X) Y (∀X,Y )∇ X tre Y X = e Y T X T Y T = Y T e XT Y Tlog-sum-exp & geometric mean [1, p.74]...d jdt j e tr(X+ t Y ) = e tr(X+ t Y ) tr j (Y )ddt etY = e tY Y = Y e tYddt eX+ t Y = e X+ t Y Y = Y e X+ t Y ,d 2dt 2 e X+ t Y = e X+ t Y Y 2 = Y e X+ t Y Y = Y 2 e X+ t Y ,XY = Y XXY = Y Xe X for symmetric X of dimension less than 3 [1, pg.110]...




Appendix E

Pseudoinverse, Projection

For all A ∈ R^{m×n} , the pseudoinverse [28, §7.3, prob.9] [44, §5.5.4] [26, App.A] [37, §6.12, prob.19]

A† = lim_{t→0⁺} (A^T A + tI)^{−1} A^T = lim_{t→0⁺} A^T (AA^T + tI)^{−1} ∈ R^{n×m}   (1172)

is a unique matrix, having E.1 [171] [99, §III.1, exer.1]

R(A†) = R(A^T) ,   R(A†^T) = R(A)   (1173)
N(A†) = N(A^T) ,   N(A†^T) = N(A)   (1174)

that satisfies the Penrose conditions: [48] [155, §1.3]

1. AA†A = A
2. A†AA† = A†
3. (AA†)^T = AA†
4. (A†A)^T = A†A

The Penrose conditions are necessary and sufficient to establish the pseudoinverse, whose principal action is to injectively map R(A) onto R(A^T). The following relations are reliably true without qualification:

E.1 Proof of (1173) and (1174) is by singular value decomposition (§A.6).


a. A^{T†} = A^{†T}
b. A^{††} = A
c. (AA^T)† = A^{†T} A†
d. (A^T A)† = A† A^{†T}
e. (AA†)† = AA†
f. (A†A)† = A†A

Yet for arbitrary A,B it is generally true that (AB)† ≠ B†A† :

Theorem. Pseudoinverse of product. For A ∈ R^{m×n} and B ∈ R^{n×k} , [172] [173, exer.7.23]

(AB)† = B†A†   (1175)

if and only if

R(A^T AB) ⊆ R(B)  and  R(BB^T A^T) ⊆ R(A^T)   (1176)
⋄

For orthogonal matrices U,Q and arbitrary A , [99, §III.1]

(UAQ^T)† = QA†U^T   (1177)

E.0.1 Logical deductions

When A is invertible, A† = A^{−1} , of course; so A†A = AA† = I . Otherwise, for A ∈ R^{m×n} , [148, §5.3.3.1] [173, §7] [174]

g. A†A = I ,  A† = (A^T A)^{−1} A^T ,  rank A = n
h. AA† = I ,  A† = A^T (AA^T)^{−1} ,  rank A = m
i. A†Aω = ω ,  ω ∈ R(A^T)
j. AA†υ = υ ,  υ ∈ R(A)
k. A†A = AA† ,  A normal
l. A^{k†} = A^{†k} ,  A normal, k an integer

When A is symmetric, A† is symmetric and (§A.6)

A ≽ 0 ⇔ A† ≽ 0   (1178)
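Numerically, numpy.linalg.pinv realizes A† ; the sketch below (an addition of mine, not part of the text) confirms the four Penrose conditions and relation c above on an arbitrary rank-deficient matrix:

    import numpy as np

    # Sketch: the four Penrose conditions and relation c, (AA')† = A†' A†.
    rng = np.random.default_rng(9)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5x3, rank 2
    P = np.linalg.pinv(A)                                           # A†

    print(np.allclose(A @ P @ A, A))            # 1. A A† A  = A
    print(np.allclose(P @ A @ P, P))            # 2. A† A A† = A†
    print(np.allclose((A @ P).T, A @ P))        # 3. (A A†)' = A A†
    print(np.allclose((P @ A).T, P @ A))        # 4. (A† A)' = A† A
    print(np.allclose(np.linalg.pinv(A @ A.T), P.T @ P))   # c. (AA')† = A†' A†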


E.1. IDEMPOTENT MATRICES 385E.1 Idempotent matricesProjection matrices are square and defined by idempotence, P 2 =P ;[26, §2.6] [155, §1.3] equivalent to the condition that P be diagonalizable[28, §3.3, prob.3] with eigenvalues φ i ∈ {0, 1}. [45, §4.1, thm.4.1] Idempotentmatrices are not necessarily symmetric. The transpose of an idempotentmatrix remains idempotent; P T P T = P T . Solely excepting P = I , all projectionmatrices are neither orthogonal (§B.5) or invertible. [26, §3.4] Thecollection of all projection matrices of particular dimension does not form aconvex set.Suppose we wish to project nonorthogonally (obliquely) on the range ofany particular matrix A∈ R m×n . All idempotent matrices projecting nonorthogonallyon R(A) may be expressed:P = A(A † + BZ T ) (1179)where R(P ) = R(A) , E.2 B ∈ R n×k for k ∈ {1... m} is otherwise arbitrary,and Z ∈ R m×k is any matrix in N(A T ); id est,A T Z = A † Z = 0 (1180)Evidently, the collection of nonorthogonal projectors on R(A) is an affine setP k = { A(A † + BZ T ) | B ∈ R n×k} (1181)When matrix A in (1179) is full-rank (A † A=I) or even has orthonormalcolumns (A T A =I), this characteristic leads to a biorthogonal characterizationof nonorthogonal projection:E.2 Proof. R(P )⊆ R(A) is obvious [26, §3.6]. By (89) and (90),R(A † + BZ T ) = {(A † + BZ T )y | y ∈ R m }⊇ {(A † + BZ T )y | y ∈ R(A)} = R(A T )R(P ) = {A(A † + BZ T )y | y ∈R m }⊇ {A(A † + BZ T )y | (A † + BZ T )y ∈ R(A T )} = R(A)


386 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.1.1Biorthogonal characterizationAny nonorthogonal projector P 2 = P ∈ R m×m on R(U) can be defined bya biorthogonality condition Q T U =I ; the biorthogonal decomposition of Pbeing (confer (1179))where E.3 (§B.1.1.1)and where generally (confer (1207)) E.4P = UQ T , Q T U = I (1182)R(P )= R(U) , N(P)= N(Q T ) (1183)P T ≠ P , P † ≠ P , ‖P ‖ 2 ≠ 1, P 0 (1184)and P is not non-expansive (1208).(⇐=) To verify assertion (1182) we observe: because idempotent matricesare diagonalizable (§A.5), [28, §3.3, prob.3] they must have the form (913)P = SΦS −1 =m∑φ i s i wi T =i=1k∑≤ mi=1s i w T i (1185)that is a sum of k = rankP independent projector dyads (idempotentdyad §B.1.1) where φ i ∈ {0, 1} are the eigenvalues of P [45, §4.1, thm.4.1]in diagonal matrix Φ∈ R m×m arranged in nonincreasing order, and wheres i ,w i ∈R m are the right and left-eigenvectors of P , respectively, which areindependent and real. E.5 ThereforeU ∆ = S(:,1:k) = [ s 1 · · · s k]∈ Rm×k(1186)E.3 Proof. Obviously, R(P) ⊆ R(U). Because Q T U = I ,R(P) = {UQ T x | x ∈ R m }⊇ {UQ T Uy | y ∈ R k } = R(U)E.4 Orthonormal decomposition (1204) (§E.3.4) is a special case of biorthogonal decomposition(1182), so these characteristics (1184) are not necessary conditions for biorthogonality.E.5 Eigenvectors of a real matrix corresponding to real eigenvalues must be real.(§A.5.0.0.1)


E.1. IDEMPOTENT MATRICES 387is the full-rank matrix S ∈ R m×m having m − k columns truncated(corresponding to 0 eigenvalues), while⎡Q T = ∆ S −1 (1:k, :) = ⎣w T1.w T k⎤⎦ ∈ R k×m (1187)is matrix S −1 having the corresponding m − k rows truncated. By the0 eigenvalues theorem (§A.7.2.0.1), R(U)= R(P ) , R(Q)= R(P T ) , andR(P ) = span {s i | φ i = 1, ∀i}N(P) = span {s i | φ i = 0, ∀i}R(P T ) = span {w i | φ i = 1, ∀i}N(P T ) = span {w i | φ i = 0, ∀i}(1188)Thus biorthogonality Q T U = I is a necessary condition for idempotence,and so the collection of nonorthogonal projectors on R(U) is the affine setP k =UQ T k where Q k = {Q | Q T U = I , Q∈ R m×k }.(=⇒) Biorthogonality is a sufficient condition for idempotence;P 2 =k∑s i wiTi=1k∑s j wj T = P (1189)j=1id est, if the cross-products are annihilated, then P 2 =P .The nonorthogonal projection of x on R(P ) can be expressed in abiorthogonal expansion of the projection,Px = UQ T x =k∑wi T xs i (1190)i=1When the domain is restricted to the range of P , say x = Uξ for ξ ∈ R k ,then x = Px = UQ T Uξ = Uξ and the expansion is unique because theeigenvectors are linearly independent. Otherwise, any component of x inN(Q T ) will be ignored by the expansion. The direction of nonorthogonalprojection is orthogonal to R(Q) ⇔ Q T U =I ; id est, for Px∈ R(U) ,Px − x ⊥ R(Q) in R m (1191)


[Figure E.1: Nonorthogonal projection of x ∈ R³ on R(U) = R² under biorthogonality condition; Px = UQ^T x such that Q^T U = I . Any point along imaginary line T connecting x to Px will be projected nonorthogonally on Px with respect to the horizontal plane constituting R² in this example. Extreme directions of cone(U) correspond to the two columns of U ; likewise for cone(Q). For purpose of illustration, we truncate each conic hull by truncating coefficients of the conic combination at unity. Conic hull cone(Q) is headed upward at an angle, out of the plane of the page. Nonorthogonal projection would fail were N(Q^T) in R(U) (were T parallel to R(U)).]


E.2. I − P , PROJECTION ON ALGEBRAIC COMPLEMENT 389E.1.1.0.1 Example. Illustration of nonorthogonal projector.Figure E.1 shows cone(U) , the conic hull of the columns of⎡ ⎤1 1U = ⎣−0.5 0.3 ⎦ (1192)0 0from nonorthogonal projector P = UQ T . Matrix U has a limitless numberof left inverses because N(U T ) is nontrivial. Left inverse Q T from (1179),⎡⎤ ⎡ ⎤0.3750 0.6250 0Q = U †T + ZB T = ⎣ −1.2500 1.2500 ⎦ + ⎣ 0 ⎦[0.5 0.5]0 0 1⎡= ⎣0.3750 0.6250−1.2500 1.25000.5000 0.5000⎤⎦(1193)where Z ∈ N(U T ) and matrix B is selected arbitrarily, is similarly depicted;id est, Q T U =I because U is full-rank.Direction of projection on R(U) is orthogonal to R(Q). Any point alongline T in the figure, for example, will have the same projection. Were matrixZ instead equal to 0, then cone(Q) would become the relative dual tocone(U) sharing the same affine hull. (§2.9.1, confer Figure 2.23) In thatcase, projection Px =UU † x of x on R(U) becomes orthogonal (unique inthe sense, minimum distance).E.1.2Idempotence summarySummarizing, nonorthogonal projector P is a linear operator defined byidempotence or biorthogonal decomposition (1182), but characterized not bysymmetry nor positive semidefiniteness nor non-expansiveness (1208).E.2 I−P , Projection on algebraic complementIt follows from the diagonalizability of idempotent matrices that I−P mustalso be a projection matrix because it too is idempotent, and because it maybe expressedm∑I − P = S(I − Φ)S −1 = (1 − φ i )s i wi T (1194)i=1


390 APPENDIX E. PSEUDOINVERSE, PROJECTIONwhere (1 − φ i ) ∈ {1, 0} are the eigenvalues of I−P (849) whose eigenvectorss i ,w i are identical to those of P in (1185). A consequence of that complementaryrelationship is the fact, [175, §2] [176, §2] for subspace projectorP =P 2 ∈ R m×m ,R(P ) = span {s i | φ i = 1, ∀i} = span {s i | (1 − φ i ) = 0, ∀i} = N(I − P )N(P ) = span {s i | φ i = 0, ∀i} = span {s i | (1 − φ i ) = 1, ∀i} = R(I − P )R(P T ) = span {w i | φ i = 1, ∀i} = span {w i | (1 − φ i ) = 0, ∀i} = N(I − P T )N(P T ) = span {w i | φ i = 0, ∀i} = span {w i | (1 − φ i ) = 1, ∀i} = R(I − P T )(1195)that is easy to see from (1185) and (1194). Idempotent I − P thereforeprojects vectors on its range, N(P ). Because all eigenvectors of a real idempotentmatrix are real and independent, the algebraic complement [38, §3.3]of R(P ) is equivalent to N(P ) ; E.6 id est,R(P )⊕N(P ) = R(P T )⊕N(P T ) = R(P T )⊕N(P ) = R(P )⊕N(P T ) = R m(1196)because R(P)⊕ R(I −P) = R m . For idempotent P ∈ R m×m , consequently,rankP + rank(I − P ) = m (1197)E.2.0.0.1 Theorem. Rank/Trace. [45, §4.1, prob.9] (confer (1212))P 2 = P⇔rankP = trP and rank(I − P ) = tr(I − P )(1198)⋄E.3 Symmetric idempotent matricesWhen idempotent matrix P is symmetric, P is an orthogonal projector. Inother words, the projection Px of vector x ∈ R m on subspace R(P ) isorthogonal; [48] id est, for P 2 =P ∈ S m and Px∈ R(P) ,Px − x ⊥ R(P ) in R m (1199)E.6 The same phenomenon occurs with symmetric (non-idempotent) matrices, for example.When the summands in A ⊕ B = R m are orthogonal vector spaces, the algebraiccomplement is called an orthogonal complement.


E.3. SYMMETRIC IDEMPOTENT MATRICES 391which is a necessary and sufficient perpendicularity condition for orthogonalprojection on a subspace.Any norm is a convex function. [37, §7.8] A condition equivalent to (1199)is: The norm of vector x −Px is the infimum of all nonorthogonal projectionsof x on R(P ) ; [37, §3.3] for P 2 =P ∈ S m , R(P )= R(A) , matricesA,B, Z and integer k as defined for (1179), and any given x∈ R m ,‖x − Px‖ 2 =inf ‖x − A(A † + BZ T )x‖ 2 (1200)B∈R n×kThe infimum is attained for R(B) ⊆ N(A) over any affine set of nonorthogonalprojectors (1181) indexed by k . The proof is straightforward, applyinggradients from §D.2, setting the gradient of the norm-square to 0,(A T ABZ T − A T (I − AA † ) ) xx T A = 0⇔A T ABZ T xx T A = 0(1201)In any case, P =AA † so the projection matrix must be symmetric. Thenfor any A∈ R m×n , P =AA † projects any vector x in R m orthogonally onR(A). Under either condition (1199) or (1200), the projection Px is unique(minimum distance).E.3.1Four subspacesWe summarize the orthogonal projectors on the four fundamental subspaces:For completeness: E.7 (1195)A † A : R n on R(A † A) = R(A T )AA † : R m on R(AA † ) = R(A)I −A † A : R n on R(I −A † A) = N(A)I −AA † : R m on R(I −AA † ) = N(A T )N(A † A) = N(A)N(AA † ) = N(A T )N(I −A † A) = R(A T )N(I −AA † ) = R(A)(1202)(1203)E.7 Proof is by singular value decomposition (§A.6.2): N(A † A) ⊆ N(A) is obvious.Conversely, suppose A † Ax=0. Then x T A † Ax=x T QQ T x=‖Q T x‖ 2 = 0 where A=UΣQ Tis the subcompact singular value decomposition. Because R(Q)= R(A T ), then x∈N(A)that implies N(A † A)⊇ N(A).


E.3.2 Orthogonal characterization

Any symmetric projector P² = P ∈ S^m on R(Q) can be defined by the orthonormality condition Q^T Q = I . When skinny matrix Q ∈ R^{m×k} has orthonormal columns, then Q† = Q^T by the Penrose conditions. Hence, any P having an orthonormal decomposition (§E.3.4)

P = QQ^T ,   Q^T Q = I   (1204)

where [26, §3.3] (958)

R(P) = R(Q) ,   N(P) = N(Q^T)   (1205)

is an orthogonal projector on R(Q) having, for Px ∈ R(Q) , (confer (1191))

Px − x ⊥ R(Q) in R^m   (1206)

From (1204), orthogonal projector P is obviously positive semidefinite (§A.3.1.0.6); necessarily,

P^T = P ,   P† = P ,   ‖P‖₂ = 1 ,   P ≽ 0   (1207)

and ‖Px‖ = ‖QQ^T x‖ = ‖Q^T x‖ because ‖Qy‖ = ‖y‖ ∀y ∈ R^k . All orthogonal projectors are therefore non-expansive because, from Bessel's inequality [38],

‖Px‖ = ‖Q^T x‖ ≤ ‖x‖   ∀x ∈ R^m   (1208)

with equality when x ∈ R(Q) . From the diagonalization of idempotent matrices (1185),

P = SΦS^T = Σ_{i=1}^m φ_i s_i s_i^T = Σ_{i=1}^{k≤m} s_i s_i^T   (1209)

orthogonal projection of point x on R(P) can be expressed in an orthogonal expansion of the projection,

Px = QQ^T x = Σ_{i=1}^k s_i^T x s_i   (1210)

where

Q = S(:,1:k) = [ s_1 ··· s_k ] ∈ R^{m×k}   (1211)

and where the s_i [sic] are orthonormal eigenvectors of symmetric idempotent P . When the domain is restricted to the range of P , say x = Qξ for ξ ∈ R^k , then x = Px = QQ^T Qξ = Qξ and the expansion is unique because the eigenvectors are linearly independent. Otherwise, any component of x in N(Q^T) will be ignored by the expansion.
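Orthogonal projectors of the form (1204) are easily constructed from any matrix by orthonormalizing its columns; the following sketch (an addition; NumPy assumed, A arbitrary) confirms (1207), (1208), and agreement with AA† from (1213):

    import numpy as np

    # Sketch: build P = QQ' (1204) on R(A) from a thin QR factorization.
    rng = np.random.default_rng(10)
    A = rng.standard_normal((6, 3))
    Q, _ = np.linalg.qr(A)            # columns of Q: orthonormal basis for R(A)
    P = Q @ Q.T

    print(np.allclose(P @ P, P))                      # idempotent
    print(np.allclose(P.T, P))                        # symmetric
    print(np.isclose(np.linalg.norm(P, 2), 1.0))      # spectral norm 1, (1207)
    print(np.all(np.linalg.eigvalsh(P) > -1e-12))     # P ⪰ 0
    x = rng.standard_normal(6)
    print(np.linalg.norm(P @ x) <= np.linalg.norm(x) + 1e-12)   # non-expansive (1208)
    print(np.allclose(P, A @ np.linalg.pinv(A)))      # agrees with AA†, (1213)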


E.3. SYMMETRIC IDEMPOTENT MATRICES 393E.3.2.0.1 Theorem. Symmetric rank/trace. (confer (1198) (847))P T = P , P 2 = P⇔rankP = trP = ‖P ‖ 2 F and rank(I − P ) = tr(I − P ) = ‖I − P ‖ 2 F(1212)⋄Proof. We take as given Theorem E.2.0.0.1 that establishes idempotence.We only have left to show trP =‖P ‖ 2 F ⇒ P T = P . [45, §7.1] E.3.3Symmetric idempotence summaryIn summary, orthogonal projector P is a linear operator defined [29, §A.3.1]by idempotence and symmetry, and characterized by positive semidefinitenessand non-expansiveness.E.3.4Orthonormal decompositionWhen Z = 0 in the general nonorthogonal projector A(A † + BZ T ) (1179),an orthogonal projector results (for any matrix A) characterized principallyby idempotence and symmetry. Any real orthogonal projector may, infact, be represented by an orthonormal decomposition such as (1204).[155, §1, prob.42]To verify that assertion for the four fundamental subspaces (1202), weneed only to express A using the compact singular value decomposition(§A.6.1) [44, §2.5.4]: From (935) and the compact SVD we haveAA † = UΣΣ † U T = ÛÛ T ,I − AA † = I − ÛÛ T = Û ⊥ Û ⊥T ,A † A = QΣ † ΣQ T T= ˆQ ˆQI − A † A = I − ˆQ ˆQ T ⊥= ˆQ ˆQ⊥T(1213)where Û ∈ Rm×rank A is the SVD matrix U having η −rankA of its orthonormalcolumns (corresponding to 0 singular values) truncated, likewisefor ˆQ ∈ Rn×rank A , and where Û ⊥ ∈ R m×m−rank A holds a columnar orthonormalbasis for the orthogonal complement of R(Û) , and likewise forˆQ ⊥ ∈ R n×n−rank A . The existence of an orthonormal decomposition is sufficientto establish idempotence and symmetry of P (1204).


394 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.3.4.1Unifying trait of all projectors: directionThe relation (1213) shows that orthogonal projectors simultaneously possessa biorthogonal decomposition (§E.1.1; e.g., AA † for skinny-or-square A fullrank)and an orthonormal decomposition (e.g., ÛÛ T , whence the orthogonalexpansion of Px = ÛÛ T x).E.3.4.1.1 orthogonal projector, orthonormal decompositionConsider orthogonal expansion of x∈ R(Û) :x = ÛÛT x =n∑û i û T i x (1214)i=1a sum of one-dimensional orthogonal projections (§E.6.3), whereÛ ∆ = [û 1 · · · û n ] (1215)and where the orthogonal projector has two expressions, (1213)AA † ∆ = ÛÛ T (1216)where A ∈ R m×n has rank n . The direction of projection of x on û j forsome j ∈{1... n} , for example, is orthogonal to û j but parallel to the spanof all the remaining vectors constituting the columns of Û ;û T j(û j û T j x − x) = 0û j û T j x − x = û j û T j x − ÛÛT x ∈ R({û i |i=1... n, i≠j})(1217)E.3.4.1.2 orthogonal projector, biorthogonal decompositionWe get a similar result for the biorthogonal expansion of x∈R(A). Defineand the rows of the pseudoinverse⎡A ∆ = [a 1 a 2 · · · a n ] ∈ R m×n (1218)A † =∆ ⎢⎣a † 1a †2.a † n⎤⎥⎦ ∈ Rn×m (1219)


E.3. SYMMETRIC IDEMPOTENT MATRICES 395under the biorthogonality condition A † A=I . In the biorthogonal expansion(§2.9.1)n∑x = AA † x = a i a † i x (1220)the direction of projection of x on a j for some particular j ∈ {1... n} , forexample, is orthogonal to a † j and parallel to the span of all the remainingvectors constituting the columns of A ;i=1a † j (a ja † j x − x) = 0a j a † j x − x = a ja † j x − AA† x ∈ R({a i |i=1... n, i≠j})(1221)E.3.4.1.3 nonorthogonal projector, biorthogonal decompositionBecause this foregoing result is independent of symmetry AA † =(AA † ) T , wemust have the same result for any nonorthogonal projector characterized bya biorthogonality condition; namely, for nonorthogonal projector P = UQ T(1182) under biorthogonality condition Q T U = I and x ∈ R(U) , in thebiorthogonal expansionx = UQ T x =k∑u i qi T x (1222)i=1whereU = ∆ [ ]u 1 · · · u k ∈ Rm×k⎡ ⎤q T1 (1223)Q T =∆ ⎣ . ⎦ ∈ R k×mq T kthe direction of projection of x on u j is orthogonal to q j and parallel tothe span of the remaining u i :q T j (u j q T j x − x) = 0u j q T j x − x = u j q T j x − UQ T x ∈ R({u i |i=1... k , i≠j})(1224)


396 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.4 Algebra of projection on affine subsetsLet P A x denote projection of x on affine subset A ∆ = R + α where R is asubspace and α ∈ A . Then, because R is parallel to A , it holds:P A x = P R+α x = (I − P R )(α) + P R x= P R (x − α) + α(1225)Subspace projector P R is a linear operator, and P R (x + y)=P R x whenevery ⊥R and P R is an orthogonal projector.Theorem. Orthogonal projection on affine subset. [70, §9.26]Let A = R + α be an affine subset where α ∈ A , and let R ⊥ be the orthogonalcomplement of subspace R . Then P A x is the orthogonal projectionof x∈ R n on A if and only ifP A x ∈ A , 〈P A x − x, a − α〉 = 0, ∀a ∈ A (1226)or if and only ifP A x ∈ A , P A x − x ∈ R ⊥ (1227)⋄E.5 Projection examplesE.5.0.1.1 Example. Orthogonal projection on orthogonal basis.Orthogonal projection on a subspace can instead be accomplished by orthogonallyprojecting on the individual members of an orthogonal basis for thatsubspace. Suppose, for example, matrix A∈ R m×n holds an orthonormal basisfor R(A) in its columns; A = ∆ [a 1 a 2 · · · a n ] . Then orthogonal projectionof vector x on R(A) is a sum of one-dimensional orthogonal projectionsPx = AA † x = A(A T A) −1 A T x = AA T x =n∑a i a T i x (1228)i=1where each symmetric dyad a i a T i is an orthogonal projector on R(a i ).(§E.6.3)


E.5. PROJECTION EXAMPLES 397E.5.0.1.2 Example. Orthogonal projection on span of nonorthogonal basis.Orthogonal projection on a subspace can also be accomplished by projectingnonorthogonally on the individual members of any nonorthogonal basis forthat subspace. This interpretation is in fact the principal function of thepseudoinverse we discussed. Now suppose matrix A holds a nonorthogonalbasis for R(A) in its columns;A ∆ = [a 1 a 2 · · · a n ] ∈ R m×n (1229)and define the rows of the pseudoinverse⎡ ⎤a † 1A † =∆ a †2⎢ ⎥⎣ . ⎦ ∈ Rn×m (1230)a † nwith A † A=I . Then orthogonal projection of vector x on R(A) is a sum ofone-dimensional nonorthogonal projectionsPx = AA † x =n∑a i a † i x (1231)i=1where each nonsymmetric dyad a i a † i is a nonorthogonal projector on R(a i ) ,(§E.6.1) idempotent because of the biorthogonality condition A † A=I .E.5.0.1.3 Example. Biorthogonal expansion as nonorthogonal projection.Biorthogonal expansion can be viewed as a sum of components, each a nonorthogonalprojection on the range of an extreme direction of a pointed polyhedralcone K ; e.g., Figure E.2.Suppose matrix A ∈ R m×n holds a nonorthogonal basis for R(A)in its columns as in (1229), and the rows of pseudoinverse A † are definedas in (1230). Assuming the most general biorthogonality condition(A † + BZ T )A=I with BZ T defined as for (1179), then biorthogonal expansionof vector x is a sum of one-dimensional nonorthogonal projections; forx∈ R(A) ,n∑x = A(A † + BZ T )x = AA † x = a i a † i x (1232)i=1


[Figure E.2: (confer Figure 2.23) Biorthogonal expansion of point x ∈ aff K is found by projecting x nonorthogonally on the range of extreme directions of polyhedral cone K ⊂ R² . Direction of projection on extreme direction a_1 is orthogonal to extreme direction a_1†^T of dual cone K* and parallel to a_2 (§E.3.4.1); similarly, direction of projection on a_2 is orthogonal to a_2†^T and parallel to a_1 . Point x is the sum of nonorthogonal projections, x = y + z = P_{a_1} x + P_{a_2} x : x on R(a_1) and x on R(a_2) . Expansion is unique because the extreme directions of K are linearly independent. Were a_1 orthogonal to a_2 , then K would be identical to K* and the nonorthogonal projections would become orthogonal.]


E.5. PROJECTION EXAMPLES 399where each dyad a i a † i is a nonorthogonal projector on R(a i ) . (§E.6.1) Theextreme directions of K=cone(A) are {a 1 ,..., a n } the linearly independentcolumns of A while {a †T1 ,..., a †Tn } the extreme directions of relative dualcone K ∗ ∩aff K= cone(A †T ) (§2.9.1) correspond to the linearly independent(§B.1.1.1) rows of A † . The directions of nonorthogonal projection are determinedby the pseudoinverse; id est, direction of projection a i a † i x−x onR(a i ) is orthogonal to a †Ti . E.8Because the extreme directions of this cone K are linearly independent,the component projections are unique in the sense:• there is only one linear combination of extreme directions of K thatyields a particular point x∈ R(A) wheneverR(A) = aff K = R(a 1 ) ⊕ R(a 2 ) ⊕ ... ⊕ R(a n ) (1233)E.5.0.1.4 Example. Nonorthogonal projection on elementary matrix.Suppose P Y is a linear nonorthogonal projector on subspace Y ⊂ M , andsuppose the range of a vector u is linearly independent of Y ; id est, forsome other subspace M ,M = R(u) ⊕ Y (1234)Assuming P M x = P u x + P Y x holds, then it follows for vector x∈M ,P u x = x − P Y x , P Y x = x − P u x (1235)the nonorthogonal projection of x on R(u) can be determined from thenonorthogonal projection of x on Y , and vice versa.Such a scenario is realizable were there some arbitrary basis for Y populatinga full-rank skinny-or-square matrix A ,A ∆ = [ basis Y u ] ∈ R n+1 (1236)Then P M =AA † fulfills the requirements, with P u =A(:,n + 1)A † (n + 1,:)and P Y = A(:,1 : n)A † (1 : n,:). Observe, P M is an orthogonal projectorwhereas P Y and P u are nonorthogonal projectors.E.8 This remains true in high dimension although only a little more difficult to visualizein R 3 ; confer , Figure 2.24.


Now suppose, for example, P_Y is an elementary matrix (§B.3); in particular,

P_Y = I − e_1 1^T = [ 0   √2 V_N ] ∈ R^{N×N}   (1237)

where Y = N(1^T) . We have M = R^N , A = [ √2 V_N   e_1 ] , and u = e_1 . Thus P_u = e_1 1^T is a nonorthogonal projector on R(u) projecting in a direction parallel to Y (§E.3.4.1), and P_Y x = x − e_1 1^T x is a nonorthogonal projection of x on Y in a direction parallel to R(u) .

E.5.0.1.5 Example. Projecting the origin on a hyperplane.
(confer §2.3.2.1) Given the hyperplane representation having b ∈ R and nonzero normal a ∈ R^m ,

∂H = {y | a^T y = b} ⊂ R^m   (75)

the orthogonal projection of the origin P0 on that hyperplane is the solution to a minimization problem: (1200)

‖P0 − 0‖₂ = inf_{y ∈ ∂H} ‖y − 0‖₂ = inf_{ξ ∈ R^{m−1}} ‖Zξ + x‖₂   (1238)

where x is any solution to a^T y = b , and where the columns of Z ∈ R^{m×m−1} constitute a basis for N(a^T) so that y = Zξ + x ∈ ∂H for all ξ ∈ R^{m−1} . The infimum can be found by setting the gradient (with respect to ξ) of the strictly convex norm-square to 0. We find the minimizing argument

ξ⋆ = −(Z^T Z)^{−1} Z^T x   (1239)

so

y⋆ = ( I − Z(Z^T Z)^{−1} Z^T ) x   (1240)

and from (1202),

P0 = y⋆ = a(a^T a)^{−1} a^T x = (a a^T/‖a‖²) x ≜ AA† x = a b/‖a‖²   (1241)

In words, any point x in the hyperplane ∂H projected on its normal a (confer (1265)) yields that point y⋆ in the hyperplane closest to the origin.
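A numerical confirmation of (1240) and (1241), added as a sketch (NumPy assumed; a and b are arbitrary, Z is built from an SVD of a^T):

    import numpy as np

    # Sketch: project the origin on the hyperplane {y | a'y = b} two ways:
    # the nullspace route (1238)-(1240) and the closed form (1241).
    rng = np.random.default_rng(11)
    m = 5
    a, b = rng.standard_normal(m), 1.7

    _, _, Vt = np.linalg.svd(a[None, :])
    Z = Vt[1:].T                                           # columns: orthonormal basis for N(a')
    x = a * b / (a @ a) + Z @ rng.standard_normal(m - 1)   # any point satisfying a'x = b

    y_star = (np.eye(m) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T) @ x    # (1240)
    print(np.allclose(y_star, a * b / (a @ a)))            # (1241): P0 = a·b/‖a‖²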


E.5. PROJECTION EXAMPLES 401E.5.0.1.6 Example. Projection on affine subset.The technique of Example E.5.0.1.5 is extensible. Given an intersection ofhyperplanesA = {y | Ay = b} ⊂ R m (1242)where each row of A ∈ R m×n is nonzero and b ∈ R(A) , then the orthogonalprojection Px of any point x ∈ R n on A is the solution to a minimizationproblem:‖Px − x‖ 2 = infy∈A‖y − x‖ 2= infξ∈R n−rank A ‖Zξ + y p − x‖ 2(1243)where y p is any solution to Ay = b , and where the columns ofZ ∈ R n×n−rank A constitute a basis for N(A) so that y = Zξ + y p ∈ A for allξ ∈ R n−rank A .The infimum is found by setting the gradient of the strictly convex normsquareto 0. The minimizing argument isξ ⋆ = −(Z T Z) −1 Z T (y p − x) (1244)soand from (1202),y ⋆ = ( I − Z(Z T Z) −1 Z T) (y p − x) + x (1245)Px = y ⋆ = x − A † (Ax − b)= (I − A † A)x + A † Ay p(1246)which is a projection of x on N(A) then translated perpendicularly withrespect to the nullspace until it meets the affine subset A . E.5.0.1.7 Example. Projection on affine subset, vertex-description.Suppose now we instead describe the affine subset A in terms of some givenminimal set of generators arranged columnar in X ∈ R n×N (54); id est,A ∆ = aff X = {Xa | a T 1=1} ⊆ R n (1247)Here a minimal set means XV N = [x 2 − x 1 , x 3 − x 1 · · · x N − x 1 ]/ √ 2 isfull-rank (§2.3.2.3) where V N ∈ R N×N−1 is the auxiliary matrix from §B.4.2.


Then the orthogonal projection Px of any point x ∈ R^n on A is, as before, the solution to a minimization problem:

‖Px − x‖₂ = inf_{a^T 1 = 1} ‖Xa − x‖₂ = inf_{ξ ∈ R^{N−1}} ‖X(V_N ξ + a_p) − x‖₂   (1248)

where a_p is any solution to a^T 1 = 1 . We find the minimizing argument

ξ⋆ = −(V_N^T X^T X V_N)^{−1} V_N^T X^T (X a_p − x)   (1249)

and so the orthogonal projection is [54, §3]

Px = Xa⋆ = ( I − X V_N (X V_N)† ) X a_p + X V_N (X V_N)† x   (1250)

a projection of point x on R(X V_N) then translated perpendicularly with respect to that range until it meets the affine subset A .

E.5.0.1.8 Example. Projecting on hyperplane, halfspace.
Given the hyperplane representation having b ∈ R and nonzero normal a ∈ R^m ,

∂H = {y | a^T y = b} ⊂ R^m   (75)

the orthogonal projection of any point x ∈ R^m on that hyperplane is

Px = x − a(a^T a)^{−1} (a^T x − b)   (1251)

Similarly, orthogonal projection of x on the halfspace

H_− = {y | a^T y ≤ b} ⊂ R^m   (65)

is the point

Px = x − a(a^T a)^{−1} max{0, a^T x − b}   (1252)
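The closed forms (1251) and (1252) are easily verified; the sketch below is an addition (NumPy assumed; a , b , and x are arbitrary):

    import numpy as np

    # Sketch: projection on the hyperplane (1251) and the halfspace (1252).
    rng = np.random.default_rng(12)
    m = 4
    a, b = rng.standard_normal(m), 0.5
    x = rng.standard_normal(m)

    Px_hyp = x - a * (a @ x - b) / (a @ a)              # (1251)
    print(np.isclose(a @ Px_hyp, b))                    # lands on the hyperplane
    print(np.isclose((x - Px_hyp) @ (Px_hyp - a*b/(a@a)), 0, atol=1e-12))  # residual ⊥ ∂H

    Px_half = x - a * max(0.0, a @ x - b) / (a @ a)     # (1252)
    print(a @ Px_half <= b + 1e-12)                     # feasible for the halfspace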


E.6. VECTORIZATION INTERPRETATION, 403E.6 Vectorization interpretation,projection on a matrixE.6.1Nonorthogonal projection on a vectorNonorthogonal projection of vector x on the range of vector y is accomplishedusing a normalized dyad P 0 (§B.1); videlicet,〈z,x〉〈z,y〉 y = zT xz T y y = yzTz T y x ∆ = P 0 x (1253)where 〈z,x〉/〈z,y〉 is the coefficient of projection on y . Because P 2 0 =P 0and R(P 0 )= R(y) , rank-one matrix P 0 is a nonorthogonal projector on y .The direction of nonorthogonal projection is orthogonal to z ; id est,E.6.2P 0 x − x ⊥ R(P T 0 ) (1254)Nonorthogonal projection on vectorized matrixFormula (1253) is extensible. Given X,Y,Z ∈ R m×n , we have the onedimensionalnonorthogonal projection of X in isomorphic R mn on the rangeof vectorized Y : (§2.1.1)〈Z , X〉Y , 〈Z , Y 〉 ≠ 0 (1255)〈Z , Y 〉where 〈Z , X〉/〈Z , Y 〉 is the coefficient of projection. The inequality accountsfor the fact: projection on vecY is in a direction orthogonal to vecZ .E.6.2.1Nonorthogonal projection on dyadNow suppose we have nonorthogonal projector dyadAnalogous to (1253), for X ∈ R m×m ,P 0 = yzTz T y ∈ Rm×m (1256)P 0 XP 0 =yzTz T y X yzTz T y = zT Xy(z T y) 2 yzT = 〈zyT , X〉〈zy T , yz T 〉 yzT (1257)


404 APPENDIX E. PSEUDOINVERSE, PROJECTIONis the nonorthogonal projection of matrix X on the range of vectorized dyadP 0 ; from which it follows:P 0 XP 0 = zT Xyz T y〈 〉yz T zyT yzTz T y = z T y , X z T y = 〈P 0 T , X〉 P 0 = 〈P 0 T , X〉〈P0 T , P 0 〉 P 0When nonsymmetric projector P 0 is rank-one as in (1256), therefore,(1258)R(vecP 0 XP 0 ) = R(vec P 0 ) in R m2 (1259)andP 0 XP 0 − X ⊥ P T 0 in R m2 (1260)E.6.2.1.1 Example. λ as coefficients of nonorthogonal projection.Any diagonalization (913)X = SΛS −1 =m∑λ i s i wi T ∈ R m×m (1261)i=1may be expressed as a sum of nonorthogonal projections on the range of itsvectorized eigenmatrices P j ∆ = s j w T j ;∑X = m 〈(Se i e T jS −1 ) T , X〉 Se i e T jS −1i,j=1∑= m ∑〈(s j wj T ) T , X〉 s j wj T + m 〈(Se i e T jS −1 ) T , SΛS −1 〉 Se i e T jS −1j=1∑= m 〈(s j wj T ) T , X〉 s j wjTj=1i,j=1j ≠ i∆ ∑= m ∑〈Pj T , X〉 P j = m s j wj T Xs j wjTj=1∑= m λ j s j wjTj=1j=1∑= m P j XP jj=1(1262)Matrix X is a sum of one-dimensional nonorthogonal projections becausethe term outside the projection coefficient 〈·〉 differs from the term inside.(§E.6.4) The eigenvalues λ j are coefficients of nonorthogonal projection of


E.6. VECTORIZATION INTERPRETATION, 405X , while the remaining M(M −1)/2 coefficients (for i≠j) are zeroed by theprojection. When P j is rank-one as in (1262),andR(vecP j XP j ) = R(vec s j w T j ) = R(vec P j ) in R m2 (1263)P j XP j − X ⊥ P T j in R m2 (1264)Were X a symmetric matrix, then the eigenmatrices would also be symmetric.So the one-dimensional projections would become orthogonal.E.6.3Orthogonal projection on a vectorThe formula for orthogonal projection of vector x on the range of vector y(one-dimensional projection) is basic analytic geometry; [177, §3.3] [26, §3.2][43, §2.2] [90, §1-8]〈y,x〉〈y,y〉 y = yT xy T y y = yyTy T y x = ∆ P 1 x (1265)where 〈y,x〉/〈y,y〉 is the coefficient of projection on y . An equivalentdescription is: Vector P 1 x is the orthogonal projection of vector x onR(P 1 )= R(y). Rank-one matrix P 1 is a projection matrix because P1 2 =P 1 .The projection is orthogonalbecause P T 1 = P 1 .E.6.4P 1 x − x ⊥ R(P 1 ) (1266)Orthogonal projection on a vectorized matrixFrom (1265), given instead X,Y ∈ R m×n , we have the one-dimensional orthogonalprojection of matrix X in isomorphic R mn on the range of vectorized Y :(§2.1.1)〈Y , X〉〈Y , Y 〉 Y (1267)where 〈Y , X〉/〈Y , Y 〉 is the coefficient of projection.For orthogonal projection, the term outside the inner products 〈·〉 mustequal the terms inside in three places.


406 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.6.4.1Orthogonal projection on dyadThere is opportunity for insight when Y is a dyad yz T (§B.1): Instead givenX ∈ R m×n , y ∈R m , and z ∈R n ,〈yz T , X〉〈yz T , yz T 〉 yzT = yT Xzy T y z T z yzT (1268)is the one-dimensional orthogonal projection of X in isomorphic R mn onthe range of vectorized yz T . To reveal the obscured symmetric projectionmatrices P 1 and P 2 we rewrite (1268):y T Xzy T y z T z yzT =yyTy T y X zzTz T z∆= P 1 XP 2 (1269)So for projector dyads, the projection (1269) in R mn is orthogonal if and onlyif projectors P 1 and P 2 are symmetric; E.9and• for orthogonal projection on the range of a vectorized dyad yz T , inother words, the term outside the inner products 〈·〉 in (1268) mustequal the terms inside in three places.When P 1 and P 2 are rank-one symmetric projectors as in (1269), (18)R(vecP 1 XP 2 ) = R(vec yz T ) in R mn (1270)When y=z then P 1 =P 2 =P T 2 andP 1 XP 2 − X ⊥ yz T in R mn (1271)P 1 XP 1 = 〈P 1 , X〉 P 1 = 〈P 1 , X〉〈P 1 , P 1 〉 P 1 (1272)meaning, P 1 XP 1 is equivalent to orthogonal projection of matrix X on thevectorized range of projector dyad P 1 .E.9 For diagonalizable X ∈ R m×m (§A.5), its orthogonal projection in isomorphic R m2 onyz T ∈ R m×m becomes:P 1 XP 2 =m∑λ i P 1 s i wi T P 2i=1When R(P 1 ) = R(w j ) and R(P 2 ) = R(s j ), the j th dyad term from the diagonalizationis isolated but only, in general, to within a scale factor because neither set of left orright eigenvectors is necessarily orthonormal unless X is normal [45, §3.2]. Yet whenR(P 2 )= R(s k ) , k≠j ∈{1... m}, then P 1 XP 2 =0.


E.6. VECTORIZATION INTERPRETATION, 407E.6.4.1.1 Example. Eigenvalues λ as coefficients of orthogonal projection.Let C represent any convex subset of subspace S M , and let C 1 be any elementof C . Then C 1 can be written as the orthonormal series:C 1 =M∑i=1M∑〈E ij , C 1 〉 E ij ∈ C ⊂ S M (1273)j=1j ≥ iwhere E ij ∈ S M is a member of the standard orthonormal basis for S M (38);〈E ij , E ij 〉 = 1. This series is a sum of one-dimensional orthogonal projections;each projection on the range of a vectorized standard basis matrix.The vector inner product 〈E ij , C 1 〉 (§2.1.1) is the coefficient of projection ofvec C 1 on vec E ij .When C 1 is any member of a convex set C whose dimension is L ,Caratheodory’s theorem [50] [30] [29] [65] [33] guarantees that no more thanL +1 affinely independent members from C are required to faithfully representit by some linear combination of those members. Because any symmetricmatrix can be diagonalized [27, §6.4], for example, C 1 ∈ S M has a decompositionin terms of its eigenmatrices q i q T i (§A.5) and eigenvalues λ i ;C 1 = QΛQ T =M∑λ i q i qi T ∈ S M (1274)i=1where Λ∈ S M is a diagonal matrix having δ(Λ) i =λ i , and Q=[q 1 · · · q M ] isan orthogonal matrix in R M×M containing corresponding eigenvectors. Yetthe dimension of S M is M(M+1)/2 in isometrically isomorphic R M(M+1)/2 .To derive eigen-decomposition (1274) from (1273), M standard basis matricesE ij are rotated (§B.5) into alignment with the M eigenmatrices q i q T iof C 1 by applying a traditional similarity transformation; [26, §5.6]{QE ij Q T } ={qi q T i , i = j = 1... M}( )√1qi 2qj T + q j qiT , 1 ≤ i < j ≤ M(1275)This set remains an orthonormal basis for S M , and a remarkable thing hap-


408 APPENDIX E. PSEUDOINVERSE, PROJECTIONpens to the series:∑C 1 = M 〈QE ij Q T , C 1 〉 QE ij Q Ti,j=1j ≥ i∑= M ∑〈q i qi T , C 1 〉 q i qi T + M 〈QE ij Q T , QΛQ T 〉 QE ij Q Ti=1∑= M 〈q i qi T , C 1 〉 q i qiTi=1i,j=1j > i∆ ∑= M ∑〈P i , C 1 〉 P i = M q i qi T C 1 q i qiTi=1∑= M λ i q i qiTi=1i=1∑= M P i C 1 P ii=1(1276)The eigenvaluesλ i = 〈q i q T i , C 1 〉 (1277)are clearly coefficients of orthogonal projection of C 1 on the range of its vectorizedeigenmatrices; (confer §E.6.2.1.1) C 1 still is a sum of one-dimensionalprojections. The remaining M(M −1)/2 coefficients (i≠j) are zeroed by theprojection. When P i is rank-one symmetric as it is in (1276),R(vecP i C 1 P i ) = R(vec q i q T i ) = R(vecP i ) in R M2 (1278)andP i C 1 P i − C 1 ⊥ P i in R M2 (1279)E.6.4.2Positive semidefiniteness test as orthogonal projectionFor any given X ∈ R m×m , the familiar quadratic construct y T Xy ≥ 0, overbroad domain, is a fundamental test of positive semidefiniteness. (§A.2)It is a fact that y T Xy is always proportional to a coefficient of orthogonalprojection; letting z in formula (1268) become y ∈ R m , thenP 2 =P 1 =yy T /y T y=yy T /‖yy T ‖ 2 (confer (961)) and formula (1269) becomes〈yy T , X〉〈yy T , yy T 〉 yyT = yT Xyy T yyy Ty T y = yyTy T y X yyTy T y∆= P 1 XP 1 (1280)


E.6. VECTORIZATION INTERPRETATION, 409By (1267), the product P 1 XP 1 is the one-dimensional orthogonal projectionof X in isomorphic R m2 on the range of vectorized P 1 because, for rankP 1 =1and P 21 =P 1 ∈ S m , (confer (1258))P 1 XP 1 = yT Xyy T y〈 〉yy T yyT yyTy T y = y T y , X y T y = 〈P 1 , X〉 P 1 = 〈P 1 , X〉〈P 1 , P 1 〉 P 1(1281)The coefficient of orthogonal projection 〈P 1 , X〉= y T Xy/(y T y) is also knownas Rayleigh’s quotient. E.10 When P 1 is rank-one symmetric as in (1280),R(vecP 1 XP 1 ) = R(vec P 1 ) in R m2 (1282)andP 1 XP 1 − X ⊥ P 1 in R m2 (1283)The test for positive semidefiniteness, then, is a test for nonnegativity ofthe coefficient of orthogonal projection of X on the range of each and everyvectorized extreme direction yy T (§2.6.4) from the positive semidefinite conein the ambient space of symmetric matrices.E.10 When y becomes the j th eigenvector s j of diagonalizable X , for example, 〈P 1 , X〉becomes the j th eigenvalue: [58, §III]〈P 1 , X〉| y=sj=s T j( m∑λ i s i wiTi=1s T j s j)s j= λ jSimilarly for y = w j , the j th left-eigenvector,〈P 1 , X〉| y=wj=w T j( m∑λ i s i wiTi=1w T j w j)w j= λ jA quandary may arise regarding the potential annihilation of the antisymmetric part ofX when s T j Xs j is formed. Were annihilation to occur, it would imply the eigenvalue thusfound came instead from the symmetric part of X . The quandary is resolved recognizingthat diagonalization of real X admits complex eigenvectors; hence, annihilation could onlycome about by forming Re(s H j Xs j) = s H j (X +XT )s j /2 [28, §7.1] where (X +X T )/2 isthe symmetric part of X, and s H j denotes the conjugate transpose.
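A numerical illustration of the Rayleigh quotient as a coefficient of orthogonal projection, sketched in generic MATLAB with an arbitrary symmetric test matrix:

% Coefficient of orthogonal projection <P1,X> = y'*X*y/(y'*y), confer (1280)-(1281).
m = 5;
X = randn(m,m);  X = (X + X')/2;   % symmetric test matrix (for nonsymmetric X
                                   % only the symmetric part would contribute)
y = randn(m,1);
P1 = y*y'/(y'*y);
coeff = trace(P1'*X);              % <P1,X>
disp(coeff - y'*X*y/(y'*y))        % ~0, Rayleigh's quotient
% when y is an eigenvector, the coefficient is the corresponding eigenvalue
[Q,L] = eig(X);
j = 3;  s = Q(:,j);
disp(s'*X*s/(s'*s) - L(j,j))       % ~0
disp(min(eig(X)))                  % X is positive semidefinite iff this is nonnegative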


410 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.6.4.2.1 PXP ≽ 0In some circumstances, it may be desirable to limit the domain of testy T Xy ≥ 0 for positive semidefiniteness; e.g., ‖y‖=1. Another example oflimiting the domain-of-test is central to Euclidean distance geometry: ForR(V ) = N(1 T ), the test −V DV ≽ 0 determines whether D ∈ S N 0 is aEuclidean distance matrix. The same test may be stated: For D ∈ S N 0 (andoptionally ‖y‖=1),D ∈ EDM N ⇔ −y T Dy = 〈yy T , −D〉 ≥ 0 ∀y ∈ R(V ) (1284)The test −V DV ≽ 0 is therefore equivalent to a test for nonnegativity of thecoefficient of orthogonal projection of −D on the range of each and everyvectorized extreme direction yy T from the positive semidefinite cone S N + suchthat R(yy T )= R(y) ⊆ R(V ). (The validity of this result is independent ofwhether V is itself a projection matrix.)E.6.4.3 PXP misinterpretation for higher rank PFor a projection matrix P of rank greater than 1, PXP is generally notcommensurate with 〈P,X〉 P as is the case for projector dyads (1281). Yet〈P,P 〉for a symmetric idempotent matrix P of any rank we are tempted to say“ PXP is an orthogonal projection of X ∈ S m on vec P ”. The fallacy is:vec PXP does not necessarily belong to the range of vectorized P ; the mostbasic requirement for projection on vecP .E.7 Range/Rowspace interpretationFor projection matrices P 1 and P 2 of any rank, P 1 XP2T is a projectionof R(X) on R(P 1 ) and a projection of R(X T ) on R(P 2 ) : For anyX = UΣQ T ∈ R m×p as in compact singular value decomposition (922) wherehere η ∆ = min{m,p} ,P 1 XP T 2 =η∑σ i P 1 u i qi T P2 T =i=1η∑σ i P 1 u i (P 2 q i ) T (1285)i=1Recall u i ∈ R(X) and q i ∈ R(X T ) when the corresponding singular valuesare nonzero. (§A.6.1) So P 1 projects u i on R(P 1 ) while P 2 projects q i on


E.7. RANGE/ROWSPACE INTERPRETATION 411R(P 2 ) ; id est, the range and rowspace of any X are respectively projectedon the ranges of P 1 and P 2 . E.11E.7.1Projection on vectorized matrices of higher rankWith A 1 ,B 1 ,Z 1 ,A 2 ,B 2 ,Z 2 as defined for nonorthogonal projector (1179),for P 1 ∆ = A 1 A † 1 ∈ S m where A 1 ∈ R m×n , Z 1 ∈ R m×k , and for P 2 ∆ =A 2 A † 2 ∈ S pwhere A 2 ∈ R p×n , Z 2 ∈ R p×k , and any given X ,‖X−P 1 XP 2 ‖ F = inf ‖X−A 1 (A † 1+B 1 Z TB 1 , B 2 ∈R n×k 1 )X(A †T2 +Z 2 B2 T )A T 2 ‖ F (1286)As for all projectors, the range of the projector is the subspace upon whichprojection is made; {P 1 Y P 2 } . Altogether this meansP 1 XP 2 − X ⊥ {P 1 Y P 2 | Y ∈ R m×p } in R mp (1287)and projectors P 1 and P 2 must each be symmetric (confer (1269)) to achievethe infimum, but may be of any rank:E.7.1.0.1 Proof. Minimum Frobenius norm (1286).inf ‖X − A 1 (A † 1 + B 1 Z1 T )X(A †T2 + Z 2 B2 T )A T 2 ‖ F (1288)B 1 , B 2Defining P ∆ = A 1 (A † 1 + B 1 Z T 1 ) ,inf ‖X − PX(A †T2 + Z 2 B2 T )A T 2 ‖ 2 FB 1 , B 2()= inf tr (X T − A 2 (A † 2 + B 2 Z2 T )X T P T )(X − PX(A †T2 + Z 2 B2 T )A T 2 )B 1 , B 2(= inf tr X T X −X T PX(A †T2 +Z 2 B2 T )A T 2 −A 2 (A †B 1 , B 22+B 2 Z2 T )X T P T X)+A 2 (A † 2+B 2 Z2 T )X T P T PX(A †T2 +Z 2 B2 T )A T 2(1289)E.11 When P 1 and P 2 are symmetric and R(P 1 )=R(u j ) and R(P 2 )=R(q j ), then the j thdyad term from the singular value decomposition of X is isolated by the projection. Yetif R(P 2 )= R(q l ), l≠j ∈{1... η}, then P 1 XP 2 =0.


412 APPENDIX E. PSEUDOINVERSE, PROJECTIONThe Frobenius norm is a convex function. [1, §8.1] Necessary and sufficientconditions for a global minimum are ∇ B1 =0 and ∇ B2 =0. (§D.1.3.1) Termsnot containing B 2 in (1289) will vanish from the gradient ∇ B2 ; (§D.2.2)(∇ B2 tr −X T PXZ 2 B2A T T 2 −A 2 B 2 Z2X T T P T X+A 2 A † 2X T P T PXZ 2 B2A T T 2+A 2 B 2 Z T 2X T P T PXA †T2 A T 2+A 2 B 2 Z T 2X T P T PXZ 2 B T 2A T 2= −2A T 2X T PXZ 2 + 2A T 2A 2 A † 2X T P T PXZ 2 +)2A T 2A 2 B 2 Z2X T T P T PXZ 2= A T 2(−X T + A 2 A † 2X T P T + A 2 B 2 Z2X T T P T PXZ 2= 0⇔R(B 1 )⊆ N(A 1 ) and R(B 2 )⊆ N(A 2 ))(1290)The same conclusion is obtained were instead P T = ∆ (A †T2 + Z 2 B2 T )A T 2 andthe gradient with respect to B 1 observed.E.7.1.0.2 Example. PXP redux, nullspace.Suppose we define a subspace of all m × n matrices, each matrix havingcolumns constituting a list whose geometric center (§4.5.1.0.1) is the originin R m :R m×ng∆= {Y ∈ R m×n | Y 1 = 0}= {Y ∈ R m×n | N(Y ) ⊇ 1} = {Y ∈ R m×n | R(Y T ) ⊆ N(1 T )}(1291)Further suppose V ∈ S n is a projection matrix having N(V ) = R(1) andR(V ) = N(1 T ). Then linear operator T(X)=XV is an orthogonal projector,projecting any X ∈ R m×n on R m×ng in the sense (1287) because V issymmetric, XV 1=0, N(XV )⊇1, and R(VX T )⊆ N(1 T ).Now suppose we define a subspace of all symmetric n ×n matrices eachof whose columns constitute a list having the origin in R n as their geometriccenter,S n g∆= {Y ∈ S n | Y 1 = 0}= {Y ∈ S n | N(Y ) ⊇ 1} = {Y ∈ S n | R(Y ) ⊆ N(1 T )}(1292)the geometric center subspace. Further suppose V ∈ S n is a projection matrix,the same as before. Then geometric centering operator V(X) = V XV


E.7. RANGE/ROWSPACE INTERPRETATION 413(§4.6.1) is an orthogonal projector, projecting any X ∈ S n on S n g in thesense (1287) because V is symmetric, V XV 1 = 0, N(V XV ) ⊇ 1, andR(V XV )⊆ N(1 T ). Two-sided projection is necessary only to remain in theambient symmetric subspace. ThenS n g = {V XV | X ∈ S n } ⊂ S n (1293)We find its orthogonal complement as the aggregate of all directions of orthogonalprojection on S n g :S n⊥g∆= {V XV − X | X ∈ S n } ⊂ S n= {u1 T + 1u T | u∈ R n }(1294)characterized by the doublet u1 T + 1u T (§B.2). E.12 Analogously to vectorprojectors (§E.2), N(V)= R(I − V) ; id est,N(V) = S n⊥g (1295)Now compare the subspace of symmetric matrices having all zeros in thefirst row and columnS n 1∆= {Y ∈ S n | Y e 1 = 0}{[ ] [ 0 0T 0 0T= X0 I 0 I] }| X ∈ S n(1296)E.12 Proof.{V X V − X | X ∈ S n } = {(I − 1 n 11T )X(I − 11 T 1 n ) − X | X ∈ Sn }Because {X1 | X ∈ S n }= R n ,= {− 1 n 11T X − X11 T 1 n + 1 n 11T X11 T 1 n | X ∈ Sn }{V X V − X | X ∈ S n } = {−1ζ T − ζ1 T + 11 T (1 T ζ 1 n ) | ζ ∈Rn }(={−1ζ T I − 11 T 1 ) (− I − 1 ) }2n 2n 11T ζ1 T | ζ ∈R nwhere I − 12n 11T is invertible.


414 APPENDIX E. PSEUDOINVERSE, PROJECTION[ 0 0Twhere0 IS n⊥1∆=]is an orthogonal projector. Then similarly,{[ ] [ ] }0 0T 0 0TX − X | X ∈ S n ⊂ S n0 I 0 I= { (1297)ue T 1 + e 1 u T | u∈ R n}and obviously S n 1 ⊕ S n⊥1 = S n . E.8 Projection on convex setThus far we have discussed only projection on subspaces. Now we generalize,considering projection on arbitrary convex sets in Euclidean space; convexbecause projection is then unique.If C ⊆ R n is a closed convex set, then for each and every x ∈ R n thereexists a unique point Px belonging to C that is closest to x in the Euclideansense. Like before (1200), the unique projection Px of a point x on convexset C is that point in C closest to x ; [37, §3.12]There exists a converse:‖x − Px‖ 2 = infy∈C ‖x − y‖ 2 (1298)Bunt-Motzkin Theorem. <strong>Convex</strong> set if projections unique. [32, §7.5][178] If C ⊆ R n is a nonempty closed set and if for each and every x in R nthere is a unique Euclidean projection of x on C belonging to C , then C isconvex.⋄Also like before, there is a well-known equivalent characterization of projectionon a convex set; a generalization of the perpendicularity condition(1199) for projection on a subspace:


E.8. PROJECTION ON CONVEX SET 415E.8.0.0.1 Theorem. Unique projection.[29, §A.3.1] [37, §3.12] [70, §4.1] [179] (Figure E.6(b), p.425) Point Px isthe unique projection of a point x ∈ R n on the closed convex set C ⊆ R n ifand only if,Px ∈ C , 〈x − Px, y − Px〉 ≤ 0 ∀y ∈ C (1299)In other words, Px is that point in C nearest some given x .⋄Yet unlike before, the operator P is not linear; projector P is a linearoperator if and only if convex set C (on which projection is made) is a subspace.E.8.0.0.2 Fact. Non-expansivity. When C ⊂ R n is an arbitrary closedconvex set, projector P on C is non-expansive in the sense: [180, §2] for anyx,y ∈ R n ,‖Px − Py‖ ≤ ‖x − y‖ (1300)with equality when x −Px = y −Py . E.13⋄Proof. [181]‖x − y‖ 2 = ‖Px − Py‖ 2 + ‖(I − P)x − (I − P)y‖ 2+ 2〈x − Px, Px − Py〉 + 2〈y − Py , Py − Px〉(1301)Nonnegativity of the last two terms follows directly from the uniqueprojection theorem.E.8.1Projection on coneWhen the convex set C is a cone, there is a finer statement of optimalityconditions:E.13 This condition for equality corrects an error in [179] (where the norm is applied toeach side of the condition given here) easily revealed by counter-example.


416 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.8.1.0.1 Theorem. Unique projection on cone. [29, §A.3.2]Let K ⊆ R n be a closed convex cone, and K ∗ its dual (§2.8.1). Then Px isthe unique (minimum distance) projection of x∈ R n on K if and only ifPx ∈ K , 〈Px − x, Px〉 = 0, Px − x ∈ K ∗ (1302)⋄In words, Px is the unique projection of x on K if and only if1) projection Px lies in K ,2) direction Px−x is orthogonal to the projection Px ,3) direction Px−x lies in the dual cone K ∗ .As stated, the theorem admits projection on K having empty interior; id est,convex cones in a proper subspace of R n . Projection on K of any pointx∈−K ∗ belonging to the negative dual cone is on the origin.E.8.1.1Relation to subspace projectionThe first and second conditions of the theorem are common with orthogonalprojection on a subspace R(P ) : The first is the most basic requirement;namely, Px∈ R(P) , the projection belongs to the subspace. Invoking perpendicularitycondition (1199), we recall the second requirement for projectionon a subspace:Px − x ⊥ R(P ) or Px − x ∈ R(P ) ⊥ (1303)Yet condition 3 is a generalization of subspace projection; id est, for uniqueprojection on a closed convex cone, K ∗ plays the role R(P ) ⊥ plays for subspaceprojection. Indeed, orthogonal vector sum (p.447) K ⊞ −K ∗ = R n ⇒cone K is closed and convex. [31, §2.7] Recalling that any subspace is a closedconvex cone (but a proper subspace is not a proper cone (§2.6.2.0.2)),K = R(P ) ⇔ K ∗ = R(P ) ⊥ (1304)meaning, when a cone is a subspace R(P ) then the dual cone becomes itsorthogonal complement R(P ) ⊥ . [1, §2.6.1] In this circumstance, condition 3becomes coincident with condition 2.By analogy to projection on the algebraic complement via I − P in §E.2,given unique projection Px on convex cone K satisfying Theorem E.8.1.0.1,K ∗ = {Px − x | x∈ R n } (1305)
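These three conditions are easy to verify numerically for the nonnegative orthant K = R^n_+ , which is self-dual and whose projection is entry-wise clipping; a minimal sketch in generic MATLAB:

% Verify the unique-projection-on-cone conditions for K = R^n_+ (K* = R^n_+).
n = 6;
x  = randn(n,1);
Px = max(x,0);                 % projection on K : clip negative entries
disp(all(Px >= 0))             % 1) Px lies in K
disp((Px - x)'*Px)             % 2) Px - x orthogonal to Px : ~0
disp(all(Px - x >= 0))         % 3) Px - x lies in the dual cone K*
% and Px is indeed the closest point of K : compare against a few feasible points
for k = 1:3
   y = abs(randn(n,1));        % arbitrary point in K
   disp(norm(x - Px) <= norm(x - y))
end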


E.8. PROJECTION ON CONVEX SET 417E.8.1.2 Salient properties: projection Px on closed convex cone K[29, §A.3.2]1. Px = 0 ⇔ x ∈ −K ∗2. Pαx = α Px ∀α ≥ 03. P(−x) (on K) = −(Px on −K)4. (Jean-Jacques Moreau)x = x 1 + x 2 , x 1 ∈ K , x 2 ∈−K ∗ , x 1 ⊥ x 2⇔x 1 = Px (on K) and x 2 = Px on −K ∗E.8.1.2.1 Example. Unique projection on nonnegative orthant.(confer (712)) From the unique projection on cone theorem, to uniquelyproject matrix H ∈ R m×n on the self-dual orthant (§2.8.3.2) of nonnegativematrices R m×n+ in isomorphic R mn , the necessary and sufficient conditionsare:H ⋆ ≥ 0tr ( (H ⋆ − H) T H ⋆) = 0(1306)H ⋆ − H ≥ 0where the inequalities denote entry-wise comparison. The optimal solutionH ⋆ is simply H having all its negative entries zeroed.Example. Unique projection on truncated convex cone. Consider theproblem of projecting a point x on a pointed closed convex cone that isartificially bounded; really, a bounded convex polyhedron having a vertex atthe origin:minimize ‖x − Ay‖ 2y≽0(1307)subject to ‖y‖ ∞ ≤ 1where the (unbounded) cone has vertex-description (§2.7.2),K = {Ay | y ≽ 0} (1308)This is an optimization problem having no closed-form solution, in general.It arises, for example, in the fitting of hearing aids that are designed around


418 APPENDIX E. PSEUDOINVERSE, PROJECTIONa programmable graphic equalizer (a filter bank whose only adjustable parametersare gain). [182] The problem is equivalent to a Schur-form semidefiniteprogram (§A.4.1),minimizey≽0, t∈Rsubject tot[tIvec T (x − Ay)vec(x − Ay)t]≽ 0(1309)y ≼ 1perhaps easier to solve numerically.E.8.2Easy projections• Projecting any matrix H ∈ R n×n orthogonally in theEuclidean/Frobenius sense on the subspace of symmetric matricesS n in isomorphic R n2 amounts to taking the symmetric part of H ;(§2.1.2) id est, (H+H T )/2 is the projection.• To project any H ∈ R n×n orthogonally on the symmetric hollow subspaceS n 0 in isomorphic R n2 (§2.1.2.2), we may take the symmetricpart and then zero all entries along the main diagonal, or vice versa(because this is projection on the intersection of two subspaces); id est,(H + H T )/2 − δ 2 (H) .• To project uniquely on the nonnegative orthant R m×n+ , simply clip allnegative entries to 0.• Clipping in excess of |1| each entry of a point x ∈ R n is equivalent tounique projection of x on the unit hypercube centered at the origin.(confer §E.9.3.1.1)• Projection of x∈ R n on a hyper-rectangle: [1, §8.1.1]C = {y ∈ R n | l ≼ y ≼ u, l ≺ u} (1310)⎧⎨ l k , x k ≤ l kP(x) k = x k , l k ≤ x k ≤ u k (1311)⎩u k , x k ≥ u k


E.8. PROJECTION ON CONVEX SET 419• Unique projection of H ∈ S n in the Euclidean/Frobenius sense on thepositive semidefinite cone S n + is accomplished by eigen-decomposition(diagonalization) followed by clipping all negative eigenvalues to 0.Unique projection on all rank ρ matrices belonging to S n + is accomplishedby clipping all negative eigenvalues to 0 and zeroing the smallestnonnegative eigenvalues keeping only ρ largest. (§7.1.2)• Unique projection of H ∈ R m×n in the Euclidean/Frobenius sense onthe set of all m×n matrices of rank no greater than k is the singularvalue decomposition (§A.6) of H having all singular values beyond thek th zeroed. [99, p.208] This solution is identical to the projection in thesense of spectral norm. [1, §8.1]• Unique projection on ellipsoid (Figure 2.1(c)) [164]...• Unique projection on set of all matrices whose largest singular valuedoes not exceed 1...• Projection on Lorentz cone... [1, exer.8.3(c)]Yet projecting matrix H ∈ R n×n uniquely on convex cone K = R n×n+ ∩ S nin isomorphic R n2 can be accomplished by first projecting on S n and onlythen projecting the result on R n×n+ : (confer §7.0.2)E.8.3Projection on convex set in subspaceSuppose a convex set C is contained in some subspace R n . Then unique(minimum distance) projection of any point in R n ⊕ R n⊥ on C can beaccomplished by first projecting orthogonally on that subspace, and thenuniquely projecting the result on C ; [70, §5.14] id est, the ordered productof two individual projections.To show that, suppose Px on C is y as illustrated in Figure E.3;‖x − y‖ ≤ ‖x − q‖ ∀q ∈ C (1312)Further suppose Px on R n equals z . By the Pythagorean theorem‖x − y‖ 2 = ‖x − z‖ 2 + ‖z − y‖ 2 (1313)


Figure E.3: Closed convex set C , whose relative boundary is drawn, belongs to subspace R^n represented in sketch by a diamond (drawn without proper perspective). Point y is unique projection of x on C ; equivalent to product of orthogonal projection of x on R^n and unique projection of result z on C .


because x − z ⊥ z − y . (1199) [37, §3.3] Then point y is also the projection of z on C because

   ‖z − y‖² = ‖x − y‖² − ‖x − z‖² ≤ ‖z − q‖² = ‖x − q‖² − ‖x − z‖²   ∀ q ∈ C      (1314)

The converse also holds.
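The easy projections itemized in §E.8.2, and the subspace-then-set composition of §E.8.3, are each one or two lines of generic MATLAB; a minimal sketch with an arbitrary test matrix:

% Easy Euclidean/Frobenius projections of an arbitrary square H (sec. E.8.2).
n = 5;
H = randn(n,n);
Hsym    = (H + H')/2;                    % projection on the symmetric subspace S^n
Hhollow = Hsym - diag(diag(Hsym));       % projection on the symmetric hollow subspace
Hnonneg = max(H,0);                      % projection on the nonnegative orthant
% composition (sec. E.8.3): the PSD cone lies in S^n, so project H on S^n first,
% then project that result on S^n_+ by clipping negative eigenvalues to 0.
[Q,L]   = eig(Hsym);
Hpsd    = Q*max(L,0)*Q';
disp(min(eig(Hpsd)))                     % nonnegative (to machine precision)
disp(norm(Hpsd - Hpsd','fro'))           % symmetric: ~0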


Figure E.4: First several iterations (1324) in the von Neumann-style projection of point b converging on the closest point Pb in the intersection of two closed convex sets in R^2 ; C_1 and C_2 are partially drawn in the vicinity of their intersection. The pointed normal cone K^⊥ (1345) is translated to Pb , the unique projection of b on the intersection. For this particular example, it is possible to start anywhere in a large neighborhood of b and still converge to Pb . The iterates are themselves robust with respect to some significant amount of noise because they belong to the translated normal cone.

E.9 Alternating projection

The method of alternating projection is an iterative technique for finding a point in the intersection of a number of arbitrary closed convex sets C_k , or for finding the distance between two nonintersecting closed convex sets. Because it can sometimes be difficult or inefficient to compute the intersection or express it analytically, one naturally asks whether it is possible to instead sequentially project uniquely (minimum distance) on the individual C_k , often easier. Once a sequence of projections (an iteration) is complete, we then cyclically repeat (iterate) the sequence until convergence, as we shall show. If the intersection of two closed convex sets is empty, then by convergence we mean the iterates settle to a point of minimum distance. (§E.9.2)

Given, in particular, two convex sets C_1 and C_2 and their respective projection operators P_1 and P_2 , one considers alternating projection whenever those projectors do not commute; id est, when P_1 P_2 ≠ P_2 P_1 . When C_1 and C_2 are subspaces, projectors P_1 and P_2 commute if and only if P_1 P_2 = P_{C_1 ∩ C_2} or if and only if P_1 P_2 is the orthogonal projection on a Euclidean subspace. [70, lem.9.2] Subspace projectors will commute, for example, when C_1 ⊂ C_2


or C_2 ⊂ C_1 or C_1 ⊥ C_2 . When the projectors commute, this means we can find a point in the intersection in a finite number of steps; in fact, the closest point.

Figure E.5: The sets {C_k} in this example comprise two halfspaces H_1 and H_2 . The von Neumann-style alternating projection in R^2 quickly converges to P_1 P_2 b (feasibility). The unique projection on the intersection is, of course, Pb .

The iconic example for non-commutative projectors illustrated in Figure E.4 shows the iterates converging to the closest point in the intersection of two arbitrary convex sets. Yet simple examples like Figure E.5 reveal that non-commutative alternating projection does not always yield the closest point, although we shall show it always yields some point in the intersection or a point that attains the distance between two convex sets.

Alternating projection is also known as successive projection [183] [180] [184], cyclic projection [137], successive approximation [179], or simply projection on convex sets [185, §6.4]. It is traced back to von Neumann (1933) [186] and later Wiener [187] who showed that higher iterates of a product of two orthogonal projections on subspaces converge at each point in the ambient space to the unique projection on the intersection of the two subspaces. More precisely, if R_1 and R_2 are closed subspaces of a Euclidean space and P_1 and P_2 respectively denote orthogonal projection on R_1 and


R_2 , then for each vector b in that space,

   lim_{i→∞} (P_1 P_2)^i b = P_{R_1 ∩ R_2} b      (1315)

Deutsch [70, thm.9.8, thm.9.35] shows that rate of convergence for subspaces is geometric [4, §1.4.4]; bounded above by κ^{2i+1} ‖b‖ , i = 0, 1, 2..., where 0 ≤ κ < 1


Figure E.6:
(a) (distance) Intersection of two convex sets in R^2 is empty. Method of alternating projection would be applied to find that point in C_1 nearest C_2 .
(b) (distance) Given b ∈ C_2 , then P_1 b ∈ C_1 is nearest b iff (y − P_1 b)^T (b − P_1 b) ≤ 0 ∀ y ∈ C_1 by the unique projection theorem (§E.8.0.0.1). When b attains the distance between the two sets, the hyperplane {y | (b − P_1 b)^T (y − P_1 b) = 0} separates C_1 from C_2 . [1, §2.5.1]
(c) (0 distance) Intersection is nonempty. (optimization) We may want the point Pb in ⋂ C_k nearest point b , or (feasibility) we may instead be satisfied with a fixed point x = P_1 P_2 b in ⋂ C_k of the projection product.
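The distance case sketched in Figure E.6(a,b) is easy to simulate; a minimal sketch in generic MATLAB where, purely for illustration, C_1 is the unit disc and C_2 a halfspace disjoint from it:

% Alternating projection between two disjoint convex sets in R^2:
% C1 = unit disc centered at the origin, C2 = halfspace {y | y(1) >= 2}.
P1 = @(x) x/max(1,norm(x));                  % projection on the disc
P2 = @(x) [max(x(1),2); x(2)];               % projection on the halfspace
x = [5; 4];                                  % arbitrary starting point b
for i = 1:100
   x = P1(P2(x));                            % one alternation
end
disp(x')                                     % converges to [1 0], the point of C1 nearest C2
disp(norm(x - P2(x)))                        % attained distance between the two sets: 1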


426 APPENDIX E. PSEUDOINVERSE, PROJECTIONE.9.1 Distance and existenceThe existence of a fixed point is established:Theorem. Distance. [179] Given any two closed convex sets C 1 and C 2in R n , then P 1 b ∈ C 1 is a fixed point of the projection product P 1 P 2 if andonly if P 1 b is a point of C 1 nearest C 2 .⋄Proof. (=⇒) Given fixed point a = P 1 P 2 a ∈ C 1 with b ∆ = P 2 a ∈ C 2 intandem so that a =P 1 b , then by the unique projection theorem (§E.8.0.0.1)(u − a) T (b − a) ≤ 0 ∀u∈ C 1(v − b) T (a − b) ≤ 0 ∀v ∈ C 2⇔‖a − b‖ ≤ ‖u − v‖(1320)by the Schwarz inequality [38] [30].(⇐=) Suppose a ∈ C 1 and ‖a − P 2 a‖ ≤ ‖u − P 2 u‖ ∀u∈ C 1 . Now supposewe choose u =P 1 P 2 a . Then‖u − P 2 u‖ = ‖P 1 P 2 a − P 2 P 1 P 2 a‖ ≤ ‖a − P 2 a‖ ⇔ a = P 1 P 2 a (1321)Thus a = P 1 b (with b =P ∆ 2 a∈ C 2 ) is a fixed point in C 1 of the projectionproduct P 1 P 2 . E.15E.9.2Feasibility and convergenceThe set of all fixed points of any non-expansive mapping is a closed convexset. [188, lem.3.4] [189, §1] The projection product P 1 P 2 is non-expansiveby Fact E.8.0.0.2 because, for any x,a∈ R n ,‖P 1 P 2 x − P 1 P 2 a‖ ≤ ‖P 2 x − P 2 a‖ ≤ ‖x − a‖ (1322)If the intersection of two closed convex sets C 1 ∩ C 2 is empty, then the iteratesconverge to a point of minimum distance, a fixed point of the projectionproduct. Otherwise, convergence is to some fixed point in their intersection(a feasible point) whose existence is guaranteed by virtue of the fact that eachE.15 Point b=P 2 a can be shown similarly to be a fixed point of the product P 2 P 1 .


E.9. ALTERNATING PROJECTION 427and every point in the convex intersection is in one-to-one correspondencewith fixed points of the non-expansive projection product.Bauschke & Borwein [189, §2] argue that any sequence monotone in thesense of Fejér is convergent. E.16Definition. Fejér monotonicity. [190] Given closed convex set C ≠ ∅,then a sequence [ x i ∈ R n | i≥0 ] is monotone in the sense of Fejér with respectto C iff‖x i+1 − c‖ ≤ ‖x i − c‖ for every i≥0 and ∀c∈ C (1323)Given x 0 ∆ = b , if we express each iteration of alternating projection byx i+1 = P 1 P 2 x i , i=0, 1, 2... (1324)△and define any fixed point a = P 1 P 2 a , then the sequence x imonotone with respect to fixed point a becauseis Fejér‖P 1 P 2 x i − a‖ ≤ ‖x i − a‖ ∀i ≥ 0 (1325)by non-expansivity. The nonincreasing sequence ‖P 1 P 2 x i − a‖ is boundedbelow hence convergent because any bounded monotone sequence in R is convergent;[36, §1.2] [33, §1.1] P 1 P 2 x i+1 = P 1 P 2 x i = x i+1 . Sequence x i thereforeconverges to some fixed point. If the intersection C 1 ∩ C 2 is nonempty,convergence is to some point there by the distance theorem. Otherwise, x iconverges to a point in C 1 of minimum distance to C 2 .E.9.2.0.1 Example. Hyperplane/orthant intersection. Find a feasiblepoint ( ∏ P k )b belonging to the nonempty intersection of two convex sets:C 1 ∩ C 2 = R n + ∩ A = {y | y ≽ 0} ∩ {y | Ay = β} ⊂ R n (1326)the nonnegative orthant with affine subset A an intersection of hyperplanes(A∈ R m×n , β ∈ R(A)). Projection of an iterate x i ∈ R n on A is calculatedP 2 x i = x i − A T (AA T ) −1 (Ax i − β) (1246)while, thereafter, projection of the result on the orthant is simplyx i+1 = P 1 P 2 x i = max{0,P 2 x i } (1327)


Figure E.7: From Example E.9.2.0.1 in R^2 , showing von Neumann-style iterations to find a feasible point belonging to the intersection of the nonnegative orthant C_1 = R^2_+ with the hyperplane C_2 = A = {y | [ 1 1 ] y = 1} . Point Pb lies at the intersection of the hyperplane with the ordinate. In this particular example, the feasible point found is coincidentally optimal. The rate of convergence depends upon angle θ ; as it becomes more acute, convergence slows. [180, §3]

Figure E.8: Geometric convergence of iterates in norm, ‖x_i − (∏_{i,k} P_k) b‖ , for Example E.9.2.0.1 in R^1000 .


E.9. ALTERNATING PROJECTION 429where the maximum is entry-wise (§E.8.1.2.1).One realization of this problem in R 2 is illustrated in Figure E.7: ForA = [ 1 1 ] , β = 1, and x 0 = b = [ −3 1/2 ] T , the iterates converge to thefeasible point Pb = [ 0 1 ] T .To give a more palpable sense of convergence in higher dimension, wedo this example again but now we compute an alternating projection forthe case A ∈ R 400×1000 , β ∈ R 400 , and b ∈ R 1000 , all of whose entries areindependently and randomly set to a uniformly distributed real number inthe interval [−1, 1] . The sequence ‖x i − ( ∏ i,kP k )b‖ is plotted in Figure E.8.This application of alternating projection to feasibility is extensible toany finite number of closed convex sets.E.9.2.1Relative measure of convergenceThe algorithm we used in the Example illustrated in Figure E.8 required twopasses; the first estimates point lim( ∏ P l )b in the presumably nonempty intersection,a technique motivated by Fejér monotonicity. A priori knowledgei→∞i,lof a feasible point to monitor convergence is impractical and antithetical, sowe need an alternative measure. Non-expansiveness implies( k∏( k∏∥ ∥∥∥∥ P l)x k,i−1 − P l)x ki = ‖x ki − x k,i+1 ‖ ≤ ‖x k,i−1 − x ki ‖ (1328)∥llwhere x ki ∈ R n represents unique projection of x k+1,i on convex set k atiteration i . So a good convergence measure is the total monotonic sequenceε i ∆ = ∑ k‖x ki − x k,i+1 ‖ (1329)where limi→∞ε i = 0 whether or not the intersection is nonempty.E.9.2.1.1 Example. Affine subset of positive semidefinite cone.Consider the problem of finding X ∈ S n that satisfiesX ≽ 0, 〈A j ,X〉 = b j , j =1... m (1330)E.16 Other authors prove convergence by different means. [180] [184]


430 APPENDIX E. PSEUDOINVERSE, PROJECTIONgiven nonzero A j ∈ S n and real b j . Here we take C 1 to be the positivesemidefinite cone S n + while C 2 is the affine subset of S nC 2 = A = ∆ {X | tr(A j X)=b j , j =1... m} ⊆ S n⎡ ⎤vec(A 1 ) T= {X | ⎣ . ⎦vec X = b}vec(A m ) T∆= {X | A vecX = b}(1331)where b = [b j ] ∈ R m , A ∈ R m×n2 , and vectorization vec is defined in (18).Projection of iterate X i ∈ S n on A is: (§E.5.0.1.6)P 2 vec X i = vec X i − A † (A vec X i − b) (1332)The Euclidean distance from X i to A is thereforedist(X i , A) = ‖X i − P 2 X i ‖ F = ‖A † (A vec X i − b)‖ 2 (1333)Projection on the positive semidefinite cone (§7.1) is found from the eigendecompositionP 2 X i = ∑ λ j q j qj T ;jn∑P 1 P 2 X i = max{0,λ j }q j qj T (1334)j=1The distance from P 2 X i to the positive semidefinite cone is thereforen∑dist(P 2 X i , S n +) = ‖P 2 X i − P 1 P 2 X i ‖ F = √ min{0,λ j } 2 (1335)When the intersection is empty A ∩ S n + = ∅ , the alternating projectionsconverge to that positive semidefinite matrix closest to A in the Euclideansense. Otherwise, the alternation converges to some point in the nonemptyintersection.Barvinok (§2.6.6.4.1) shows that if a point feasible with (1330) exists,then there exists an X ∈ A ∩ S n + such that⌊√ ⌋ 8m + 1 − 1rankX ≤(148)2j=1
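Example E.9.2.1.1 is easy to run end-to-end; a minimal numerical sketch of the alternation using projections (1332) and (1334), with randomly generated data chosen, as an assumption of the sketch, so that a feasible point exists:

% Alternating projection between affine subset A = {X | A*vec(X) = b} and S^n_+ .
n = 5;  m = 4;
A = zeros(m, n*n);
for j = 1:m
   Aj = randn(n);  Aj = (Aj + Aj')/2;        % nonzero A_j in S^n
   A(j,:) = Aj(:)';                          % row j is vec(A_j)'
end
Xfeas = randn(n);  Xfeas = Xfeas*Xfeas';     % some PSD matrix
b = A*Xfeas(:);                              % b chosen so the intersection is nonempty
X = zeros(n);                                % initial iterate
for i = 1:200
   x  = X(:) - pinv(A)*(A*X(:) - b);         % projection (1332) on A
   X2 = reshape(x, n, n);
   X2 = (X2 + X2')/2;                        % numerical symmetrization
   [Q,L] = eig(X2);
   X  = Q*max(L,0)*Q';                       % projection (1334) on the PSD cone
end
disp(norm(A*X(:) - b))                       % distance to A , confer (1333): ~0
disp(min(eig(X)))                            % distance to S^n_+ : ~0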


E.9.2.1.2 Example. Semidefinite matrix completion.
Continuing Example E.9.2.1.1: When m ≤ n(n+1)/2 and the A_j matrices are unique members of the standard orthonormal basis {E_lq ∈ S^n} (38),

   {A_j ∈ S^n , j = 1... m} ⊆ {E_lq} = { e_l e_l^T ,  l = q = 1... n ;   (1/√2)(e_l e_q^T + e_q e_l^T) ,  1 ≤ l < q ≤ n }

and when the constants b_j are set to constrained entries of variable X ,

   {b_j , j = 1... m} ⊆ { X_lq ,  l = q = 1... n ;   X_lq √2 ,  1 ≤ l < q ≤ n }      (1336)
                      = {⟨X , E_lq⟩}      (1337)

then the equality constraints in (1330) fix individual entries of X ∈ S^n . Thus the feasibility problem becomes a positive semidefinite matrix completion problem. Projection of iterate X_i ∈ S^n on A simplifies to (confer (1332))

   P_2 vec X_i = vec X_i − A^T (A vec X_i − b)      (1338)

From this we can see that orthogonal projection is achieved simply by setting corresponding entries of P_2 X_i to the known entries of X , while the remaining entries of P_2 X_i are set to corresponding entries of the current iterate X_i . Using this technique, we find a positive semidefinite completion for

   [ 4  3  ?  2
     3  4  3  ?
     ?  3  4  3
     2  ?  3  4 ]      (1339)

Initializing the unknown entries to 0, they all converge geometrically to 1.5858 (rounded) after about 42 iterations.

Laurent gives a problem for which no positive semidefinite completion exists: [191]

   [ 1  1  ?  0
     1  1  1  ?
     ?  1  1  1
     0  ?  1  1 ]      (1340)


Figure E.9: Distance dist(P_2 X_i , S^n_+) between iterate in A (1331) and the PSD cone for Laurent's problem; initially decreasing geometrically.

By alternating projection we find the constrained matrix closest to the positive semidefinite cone,

   [ 1       1       0.5454  0
     1       1       1       0.5454
     0.5454  1       1       1
     0       0.5454  1       1      ]      (1341)

and we find the positive semidefinite matrix closest to the affine subset A (1331):

   [ 1.0521  0.9409  0.5454  0.0292
     0.9409  1.0980  0.9451  0.5454
     0.5454  0.9451  1.0980  0.9409
     0.0292  0.5454  0.9409  1.0521 ]      (1342)

These matrices (1341) and (1342) achieve the Euclidean distance dist(A , S^n_+) . Convergence is illustrated in Figure E.9.

E.9.3 Optimization

Unique projection on the nonempty intersection of arbitrary convex sets to find the closest point therein is a convex optimization problem. The first successful application of alternating projection to this problem is attributed to Dykstra [192] [193] who in 1983 provided an elegant algorithm that prevails


today. In 1988, Han [183] rediscovered the algorithm and provided a primal-dual convergence proof. A synopsis of the history of alternating projection can be found in [194] where it becomes apparent that Dykstra's work is seminal. E.17

E.17 For a synopsis of alternating projection applied to distance geometry, see [73, §3.1].

Figure E.10: H_1 and H_2 are the same halfspaces as in Figure E.5. Dykstra's alternating projection algorithm generates the sequence [ b, x_21 , x_11 , x_22 , x_12 ... ]. The path illustrated from b to x_12 in R^2 terminates at the desired result, Pb . The iterates are not so robust in the presence of noise as for the example in Figure E.4.

E.9.3.1 Dykstra's algorithm

Given some point b ∈ R^n and closed convex sets {C_k ⊂ R^n | k = 1... L} , let x_ki ∈ R^n and y_ki ∈ R^n respectively denote a primal and dual vector associated with set k at iteration i . Initialize y_k0 = 0 ∀ k = 1... L , and x_{1,0} = b . Denoting by P_k t the unique projection of t on C_k , and for convenience


x_{L+1,i} ≜ x_{1,i−1} , calculation of the iterates proceeds: E.18

   for i = 1, 2, ... until convergence {
      for k = L ... 1 {
         t = x_{k+1,i} − y_{k,i−1}
         x_ki = P_k t
         y_ki = P_k t − t
      }
   }
      (1343)

Assuming a nonempty intersection, then the iterates converge to the unique projection of point b on that intersection; [70, §9.24]

   Pb = lim_{i→∞} x_{1i}      (1344)

In the case all the C_k are affine, then calculation of y_ki is superfluous and the algorithm becomes identical to alternating projection. [70, §9.26] [137, §1] Dykstra's algorithm is so simple, elegant, and represents such a tiny increment in computational intensity over alternating projection, that it is nearly always arguably cost-effective.

E.9.3.1.1 Normal cone. Glunt [136, §4] observes that the overall effect of this iterative procedure is to drive t toward the translated normal cone to ⋂ C_k at the solution Pb (translated to Pb). The normal cone to any set S ⊆ R^n at any particular a ∈ R^n is defined as the closed cone [29, §A.5]

   K^⊥_S (a) ≜ {z ∈ R^n | z^T (y − a) ≤ 0 , ∀ y ∈ S}      (1345)

an intersection of halfspaces about the origin, hence convex regardless of the convexity of S . Projection on S of any point in the translated normal cone K^⊥_S (a) + a (translated to any a ∈ S) is identical to a .

The normal cone to ⋂ C_k at Pb in Figure E.5 is the ray {ξ(b − Pb) | ξ ≥ 0} . Applying Dykstra's algorithm to that example, convergence to the desired result is achieved in two iterations as illustrated in Figure E.10. Yet applying Dykstra's algorithm to the example in Figure E.4 does not improve the rate of convergence, unfortunately, because the given point b and all the iterates already belong to the translated normal cone at the vertex of intersection.

E.18 We reverse order of projection (k = L ... 1) for continuity of exposition.
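The procedure (1343) translates almost line-for-line into code; a minimal sketch for L = 2 in generic MATLAB, where the projectors are supplied by the caller as function handles (an assumption of this sketch):

% Dykstra's alternating projection (1343) specialized to L = 2 closed convex sets.
% P1 and P2 are function handles returning the unique projection on C_1, C_2.
function x = dykstra2(b, P1, P2, niter)
y1 = zeros(size(b));  y2 = zeros(size(b));   % dual vectors y_k0 = 0
x = b;                                       % x_{1,0} = b
for i = 1:niter
   t = x  - y2;  x2 = P2(t);  y2 = x2 - t;   % k = 2
   t = x2 - y1;  x1 = P1(t);  y1 = x1 - t;   % k = 1
   x = x1;                                   % x_{1,i}
end

% e.g., two halfspaces in the spirit of Figure E.10 (hypothetical data):
%   PH1 = @(z) [z(1); min(z(2),1)];             % H_1 = {y | y(2) <= 1}
%   PH2 = @(z) z - min(0,[1 1]*z-3)/2*[1;1];    % H_2 = {y | y(1)+y(2) >= 3}
%   Pb  = dykstra2([0;2], PH1, PH2, 50)         % converges to Pb = [2;1]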


Appendix F

MATLAB programs

...

These programs are available on the author's website:
http://www.stanford.edu/~dattorro/

...

F.1 isedm()

...

F.1.1 Subroutines for isedm()

F.1.1.1 chop()
F.1.1.2 Vn()
F.1.1.3 signeig()
F.1.1.4 Dx()

F.0 © 2001 Jon Dattorro, all rights reserved.


F.1.1.5 modchol() [148]

% modified Cholesky factorization.
% -Walter Murray
% A = R^T R - E
function [r,e] = modchol(a)
n = size(a,2);
r = zeros(n,n); e = zeros(n,n);
delta = eps*n*norm(a);
gamma = max(abs(diag(a,0)));
xi = max(max(abs(a - diag(diag(a,0)))));
beta = sqrt(max([gamma, xi/sqrt(n^2-1), eps]));
for k = 1:n
if k+1
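Assuming the complete routine from [148] with the stated convention A = R^T R − E, a hypothetical usage check might look like:

% Hypothetical check of modchol() on an indefinite symmetric matrix.
% Assumes the complete routine with convention  A = R'*R - E .
A = [ 1  1  2
      1  1  3
      2  3  1 ];                 % symmetric but indefinite
[r,e] = modchol(A);
disp(norm(r'*r - (A + e),'fro')) % factorization identity: ~0
disp(min(eig(A + e)))            % A + E expected positive (semi)definite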


F.2 conic independence

The recommended subroutine lp() is a linear program solver from MATLAB's Optimization Toolbox v2.0 (R11). Later releases of MATLAB replace lp() with linprog() that we find quite inferior to lp() on a wide range of problems.

% Test for c.i. of arbitrary directions.  -Jon Dattorro 2001
function [indep, how_many_depend, Xci] = conici(X);
[n, N] = size(X);
indep = 'conically independent';
how_many_depend = 0;
if rank(X) == N, Xci = X; return, end
count = 1;
new_N = N;
for i=1:N
   A = [X(:,1:count-1) X(:,count+1:new_N); -eye(new_N-1)];
   b = [X(:,count); zeros(new_N-1,1)];
   [a, lambda, how] = lp(zeros(new_N-1,1),A,b,[ ],[ ],[ ],n,-1);
   if ~strcmp(how,'infeasible')
      how_many_depend = how_many_depend + 1;
      indep = 'conically Dependent';
      X(:,count) = [ ];
      new_N = new_N - 1;
   else
      count = count + 1;
   end
end
Xci = X;
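A hypothetical call of conici(), assuming the lp() solver described below is on the path; the third direction here is deliberately a conic combination of the first two:

% Hypothetical test of conici(): column 3 equals column 1 plus column 2,
% so exactly one conic dependence should be reported.
X = [ 1  0  1
      0  1  1
      0  0  0 ];
[indep, how_many_depend, Xci] = conici(X)
% expected: indep = 'conically Dependent', how_many_depend = 1, Xci keeps 2 columns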


F.2.1 lp()

LP     Linear programming.
       X=LP(f,A,b) solves the linear programming problem:

            min f'x    subject to:   Ax <= b


F.3. MAP OF THE USA 439F.3 Map of the USAF.3.1EDM%Find map of USA using only distance information.% -Jon Dattorro 2001%EDM reconstruction problem.clear all;close all;load usalo; %From Matlab Mapping Toolbox%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html%To speed-up execution (decimate map data), make%’factor’ bigger positive integer.factor = 2;Mg = 2*factor; %Relative decimation factorsMs = factor;Mu = 2*factor;gtlakelat = decimate(gtlakelat,Mg);gtlakelon = decimate(gtlakelon,Mg);statelat = decimate(statelat,Ms);statelon = decimate(statelon,Ms);uslat = decimate(uslat,Mu);uslon = decimate(uslon,Mu);lat = [gtlakelat; statelat; uslat]*pi/180;lon = [gtlakelon; statelon; uslon]*pi/180;phi = pi/2 - lat;theta = lon;x = sin(phi).*cos(theta);y = sin(phi).*sin(theta);z = cos(phi);%plot original dataplot3(x,y,z), axis equal, axis off


440 APPENDIX F. MATLAB PROGRAMS...lengthNaN = length(lat);id = find(isfinite(x));X = [x(id)’; y(id)’; z(id)’];N = length(X(1,:))% Construct the distance matrixclear gtlakelat gtlakelon statelat statelonclear factor x y z phi theta conusclear uslat uslon Mg Ms Mu lat lonD = diag(X’*X)*ones(1,N) + ones(N,1)*diag(X’*X)’ - 2*X’*X;%destroy input dataclear XVn = [-ones(1,N-1); speye(N-1)];VDV = (-Vn’*D*Vn)/2;clear D Vnpack[evec evals flag] = eigs(VDV, speye(size(VDV)), 10, ’LR’);if flag, disp(’convergence problem’), return, end;evals = real(diag(evals));index = find(abs(evals) > eps*normest(VDV)*N);n = sum(evals(index) > 0);Xs = [zeros(n,1) diag(sqrt(evals(index)))*evec(:,index)’];warning off; Xsplot=zeros(3,lengthNaN)*(0/0); warning on;Xsplot(:,id) = Xs;figure(2)%plot map found via EDM.plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))axis equal, axis off


F.3. MAP OF THE USA 441F.3.1.1USA map input-data decimation subroutinefunction xd = decimate(x,m)roll = 0;rock = 1;for i=1:length(x)if isnan(x(i))roll = 0;xd(rock) = x(i);rock=rock+1;elseif ~mod(roll,m)xd(rock) = x(i);rock=rock+1;endroll=roll+1;endendxd = xd’;F.3.2EDM using ordinal data%Find map of USA using ORDINAL distance information.% -Jon Dattorro 2003%EDM reconstruction problem.clear all;close all;load usalo; %From Matlab Mapping Toolbox%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.htmlfactor = 2; %Execution time factor=2 approx. 18 minutes.Mg = 2*factor; %Relative decimation factorsMs = factor;Mu = 2*factor;gtlakelat = decimate(gtlakelat,Mg);gtlakelon = decimate(gtlakelon,Mg);


442 APPENDIX F. MATLAB PROGRAMS...statelat = decimate(statelat,Ms);statelon = decimate(statelon,Ms);uslat = decimate(uslat,Mu);uslon = decimate(uslon,Mu);lat = [gtlakelat; statelat; uslat]*pi/180;lon = [gtlakelon; statelon; uslon]*pi/180;phi = pi/2 - lat;theta = lon;x = sin(phi).*cos(theta);y = sin(phi).*sin(theta);z = cos(phi);%plot original dataplot3(x,y,z), axis equal, axis offlengthNaN = length(lat);id = find(isfinite(x));X = [x(id)’; y(id)’; z(id)’];N = length(X(1,:))% Construct the distance matrixclear gtlakelat gtlakelon statelat statelonclear factor x y z phi theta conusclear uslat uslon Mg Ms Mu lat lonD = diag(X’*X)*ones(1,N) + ones(N,1)*diag(X’*X)’ - 2*X’*X;%ORDINAL MDS - vectorize Dcount = 1;f = zeros(N*(N-1)/2,1);for j=1:Nfor i=1:Nif i


F.3. MAP OF THE USA 443%sorted = f(idx)[sorted idx] = sort(f);clear D sorted XM = (N*(N-1))/2;f(idx)=((1:M).^2)/M^2;%Create ordinal data matrixO = zeros(N,N);count = 1;for j=1:Nfor i=1:Nif i




Appendix GNotation and some definitionsbg ′g ′′b Tb H→Ydg→Ydg 2b i:jvector, scalar, or logical conditionfirst derivative of possibly multidimensional function with respect toreal argumentsecond derivative with respect to real argumentvector b transposevector b Hermitian (conjugate) transposefirst directional derivative of possibly multidimensional function g indirection Y ∈R K×L (maintains dimensions of g)second directional derivative of g in direction Ytruncated vector comprising i th through j th entry of vector bA matrix, vector, scalar, or logical conditionfat a fat matrix, meaning more columns than rows; [ ]skinny⎡a skinny matrix, meaning more rows than columns; ⎣⎤⎦G.0 c○ 2001 Jon Dattorro, all rights reserved.445


446 APPENDIX G. NOTATION AND SOME DEFINITIONSAF(C ∋A)A †some set (calligraphic ABCDEFGHIJ KLMN OPQRST UVWX YZ)smallest face of set C containing element AMoore-Penrose pseudoinverse of matrix AA 1/2 or √ A any matrix such that √ A √ A =A . For A ∈ S n + ,[39, §1.2]√A ∈ Sn+ is unique.A(:,i) i th column of matrix A [44, §1.1.8]A(j,:)a.i.c.i.l.i.ReImPSDSDPEDMEDM N∈∋j th row of matrix Aaffinely independentconically independentlinearly independentreal partimaginary partpositive semidefinitesemidefinite programEuclidean distance matrixcone of N × N Euclidean distance matrices in symmetric hollow subspacemembership, belongs tomembership, contains as in C ∋ y⊂ ⊃ ∩ ∪ from standard set theory, subset, superset, intersection, union≡equivalent to∆= defined equal to≈approximately equal to


447≃∼ =∴isomorphic to or withcongruent to or withtherefore◦ Hadamard product of matrices⊗Kronecker product of matrices⊕ vector sum of sets X = Y ⊕ Z where every element x ∈ X has aunique representation x = y + z where y ∈ Y and z ∈ Z . [30, p.19]X = Y ⊕Z ⇒ X = Y +Z . Each element in a vector sum of a collectionof subspaces has a unique representation when the basis from eachsubspace is linearly independent with respect to all the others.⊞A ⊥ Borthogonal vector sum of sets X = Y ⊞ Z where every element x ∈ Xhas a unique orthogonal decomposition x=y +z where y ∈ Y , z ∈ Z ,and y ⊥ z . [31, p.51] X = Y ⊞ Z ⇒ X = Y + Z .A is orthogonal to B , where A and B are sets, vectors, or matrices\A logical not A , or relative complement of A ; e.g.,B\A = {x∈B | x /∈A}⇔if and only if or corresponds to or necessary and sufficient⇒ or ⇐ implies; e.g., A ⇒ B ⇔ \A ⇐ \B or →↓←does not implygoes togoes to from above; e.g., above might mean positive in some context[29, p.2]is replaced with| as in f(x) | x ∈ C means with the condition(s) or such that orevaluated at, or as in {f(x) | x∈ C} means evaluated at each and everyx belonging to Cg| xpexpression g evaluated at x p


448 APPENDIX G. NOTATION AND SOME DEFINITIONS: as in f : R n → R m meaning f is a mappingf : A → B∃∀AB[A,B ]meaning f is a mapping from ambient space A to ambient B ; notnecessarily denoting either domain or rangesuch thatthere existsfor allclosed line segment ABclosed interval or line segment between A and B in R[ ] square brackets denote a matrix or sequence; e.g., respectively [A B ]or [A i ]⌊ ⌋ floor function, ⌊x⌋ is greatest integer not exceeding x| | entry-wise absolute value of scalars, vectors, and matricesdet(A,B)matrix determinantopen interval between A and B in R{ } curly braces denote a set or list, e.g., {Xa | a ≽ 0} the set of all Xafor each and every a ≽ 0 where membership of a to some space isimplicit, a union∅empty set〈 , 〉 angle brackets denote vector inner productx pparticular value of xx 0 particular instance of x , or initial value of a sequence [x i ]x 1 first entry of vector x , or first element or a set or list {x i }x ⋆x ∗optimal value of variable xcomplex conjugate


449f ∗K ∗P C x or PxP k xδ(A)λ(A)√dijdistd ijconvex conjugate functiondual cone or setprojection of point x on set Cprojection of point x on set C kvector made from the main diagonal of A if A is a matrix; otherwise,the diagonal matrix made from the vector A . δ 2 (·) ≡ δδ(·)vector of eigenvalues of matrix A , typically arranged in nonincreasingorder(absolute) distance scalardistance between point or set argumentsdistance-square scalard ijCupper bound on distance-square d ijclosure of set Cd ijd∆VV NXDDrlower bound on distance-square d ijvector of distance-squaredistance scalar, or matrix of absolute distance, or difference operator,or diagonal matrixN×N symmetric elementary, auxiliary, and geometric centering matrixN ×N −1 Schoenberg auxiliary matrixpoint list, set of generators, or extreme directions in R n×N , or matrixvariablematrix of distance square, or Euclidean distance matrixEuclidean distance matrix function, EDM definitionaffine dimension


450 APPENDIX G. NOTATION AND SOME DEFINITIONSnN∂∂y∂KdomepiR(A)basis R(A)N(A)R n or R n×nı or jC n or C n×nlist X dimension, or integercardinality of list X , or integerboundary or partial derivative or matrix of distance-square squaredpartial differential of yboundary of set Kdomain of function argumentepigraph of functionrange of Acolumnar basis for range of Anullspace of AEuclidean vector space√ −1complex Euclidean vector spaceR n + or R n×n+ nonnegative orthant in Euclidean vector spaceR ṉ or R n×n- nonpositive orthantR n i-S nS n⊥S n +int S n +orthant whose only negative coordinate is the i thsubspace comprising all real symmetric n×n matrices, the symmetricsubspaceorthogonal complement of S n in R n×nconvex cone comprising all real symmetric positive semidefinite n×nmatrices, the positive semidefinite coneinterior of convex cone comprising all real symmetric positive semidefiniten×n matrices; id est, positive definite matrices


451S n 1S n 0R m×ngS n gS n⊥gsubspace comprising all symmetric n×n matrices having all zeros infirst row and columnsubspace comprising all real symmetric hollow n×n matrices (0 maindiagonal), the symmetric hollow subspacesubspace comprising all geometrically centered m×n matricessubspace comprising all geometrically centered symmetric n×n matrices,the geometric center subspaceorthogonal complement of S n g in S nX ⊥ basis N(X T )x ⊥ N(x T )R(P ) ⊥ N(P T )R ⊥ set orthogonal to set R⊆ R n ; R ⊥ ={y ∆ ∈ R n | 〈x,y〉=0 ∀x∈ R}K ⊥K M+K +MK ∗ λδH∂H∂H∂H +∂H −Inormal conemonotone nonnegative conemonotone nonnegative cone with reversed indicescone of majorizationhalfspacehyperplane; id est, partial boundary of halfspacesupporting hyperplanesupporting hyperplane having inward-normal with respect to Hsupporting hyperplane having outward-normal with respect to Hidentity matrix0 vector or matrix of zeros1 vector of ones


452 APPENDIX G. NOTATION AND SOME DEFINITIONSe iargsupvector whose i th entry is 1, otherwise 0, or member of the standardbasis for R nargument of operator or functionsupremum, least upper bound [29, §0.1.1] (this bound, as in boundary(5), is not necessarily a member of the set that is argument)max maximum [29, §0.1.1]infinfimum, greatest lower bound [29, §0.1.1] (this bound, as in boundary(5), is not necessarily a member of the set that is argument)min minimum [29, §0.1.1]iffrelintsgnmodtrrankAif and only if, necessary and sufficient, meaning typically attachedto appearance of the word “if ” in a definition; a practice requiringabolition because of ambiguity thus conferredrelativeinteriorsignum functionmodulus operatormatrix tracerank of matrix A ; dim R(A)dim dimension, dim R n = n , dim(x ∈ R n ) = n , dim R(x ∈ R n ) = 1,dim R(A∈ R m×n )= rank(A)affconvcenvconecontentaffine hullconvex hullconvex envelopeconic hullcontent of high-dimensional bounded polyhedron, it is volume in 3 dimensions,area in 2, and so on


453cofmatrix of cofactors corresponding to matrix argumentvec vectorization of matrix, Euclidean dimension n 2svec vectorization of symmetric matrix, Euclidean dimension n(n + 1)/2dvec≽≥vectorization of symmetric hollow matrix, Euclidean dimensionn(n − 1)/2generalized inequality, for example, A ≽ 0 means vector or matrix Acan be expressed in a biorthogonal expansion having nonnegative coordinateswith respect to some implicit pointed closed convex cone K , orcomparison to the origin with respect to some implicit pointed closedconvex cone, or when K = S n + matrix A belongs to the positive semidefinitecone of symmetric matricesgreater than or equal to; comparison of scalars or entry-wise comparisonof vectors or matrices with respect to R‖x‖ vector 2-norm or Euclidean norm ‖x‖ 2√n∑‖x‖ l= l |x j | l vector l-normj=1‖x‖ 2 2= x T x‖x‖ ∞ = max{|x j |,∀j} infinity-norm‖x‖ 1 = 1 T |x| 1-norm, dual infinity-norm‖X‖ 2= sup ‖Xa‖ 2 = σ 1 = √ λ(X T X) 1‖a‖=1greatest singular valuematrix 2-norm (spectral norm),‖X‖ = ‖X‖ F Frobenius’ matrix norm


454 APPENDIX G. NOTATION AND SOME DEFINITIONS


Bibliography[1] Stephen Boyd and Lieven Vandenberghe. <strong>Convex</strong> <strong>Optimization</strong>.http://www.stanford.edu/~boyd/cvxbook[2] R. Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAMReview, 35(2):183–238, June 1993.[3] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and VenkataramananBalakrishnan. Linear Matrix Inequalities in System and Control Theory.SIAM, 1994.[4] Yinyu Ye. Interior Point Algorithms: Theory and Analysis. Wiley,1997.[5] Laurent El Ghaoui and Silviu-Iulian Niculescu, editors. Advances inLinear Matrix Inequality Methods in Control. SIAM, 2000.[6] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point PolynomialAlgorithms in <strong>Convex</strong> Programming. SIAM, 1994.[7] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern <strong>Convex</strong><strong>Optimization</strong>: Analysis, Algorithms, and Engineering Applications.SIAM, 2001.[8] Jérôme Galtier. Semi-definite programming as a simple extension tolinear programming: convex optimization with queueing, equity andother telecom functionals. In 3ème Rencontres Francophones sur lesAspects Algorithmiques des Télécommunications (AlgoTel). INRIA,France, 2001.www-sop.inria.fr/mascotte/Jerome.Galtier/misc/Galtier01b.pdf455


456 BIBLIOGRAPHY[9] Günter M. Ziegler. Kissing numbers: Surprises in dimension four.Emissary, pages 4–5, Spring 2004.http://www.msri.org/communications/emissary[10] Joachim Dahl, Bernard H. Fleury, and Lieven Vandenberghe. Approximatemaximum-likelihood estimation using semidefinite programming.In Proceedings of the IEEE International Conference on Acoustics,Speech, and Signal Processing, volume VI, pages 721–724, April 2003.[11] Shao-Po Wu. max-det Programming with Applications in MagnitudeFilter Design. A dissertation submitted to the department of ElectricalEngineering, Stanford University, December 1997.[12] Mar Hershenson. Efficient description of the design space of analogcircuits. In Proceedings of the 40 th ACM/IEEE Design AutomationConference, pages 970–973, June 2003.[13] Peter Santos. Application platforms and synthesizable analog IP.In Proceedings of the Embedded Systems Conference, San FranciscoCalifornia USA, 2003.http://www.convexoptimization.com/TOOLS/Santos.pdf[14] Mar Hershenson. Design of pipeline analog-to-digital converters viageometric programming. In Proceedings of the IEEE/ACM InternationalConference on Computer Aided Design (ICCAD), pages 317–324, November 2002.[15] Mar Hershenson, Dave Colleran, Arash Hassibi, and Navraj Nandra.Synthesizable full custom mixed-signal IP. Electronics Design AutomationConsortium (EDA), 2002.http://www.eda.org/edps/edp02/PAPERS/edp02-s6 2.pdf[16] Arash Hassibi and Mar Hershenson. Automated optimal design ofswitched-capacitor filters. In Proceedings of the Conference on Design,Automation, and Test in Europe, page 1111, March 2002.[17] Mar Hershenson, Sunderarajan S. Mohan, Stephen Boyd, and ThomasLee. <strong>Optimization</strong> of inductor circuits via geometric programming. InProceedings of the 36 th ACM/IEEE Design Automation Conference,pages 994–998, 1999.http://www.stanford.edu/~boyd/papers/inductor opt.html






Index
