From Matrix to Tensor:
The Transition to Numerical Multilinear Algebra

Lecture 6. The Tucker Representation

Charles F. Van Loan
Cornell University

The Gene Golub SIAM Summer School 2010
Selva di Fasano, Brindisi, Italy

⊗ Transition to Computational Multilinear Algebra ⊗ Lecture 6. The Tucker Representation
Where We Are

Lecture 1. Introduction to Tensor Computations
Lecture 2. Tensor Unfoldings
Lecture 3. Transpositions, Kronecker Products, Contractions
Lecture 4. Tensor-Related Singular Value Decompositions
Lecture 5. The CP Representation and Tensor Rank
Lecture 6. The Tucker Representation
Lecture 7. Other Decompositions and Nearness Problems
Lecture 8. Multilinear Rayleigh Quotients
Lecture 9. The Curse of Dimensionality
Lecture 10. Special Topics
What is This Lecture About?

Approximate A ∈ IR^(n1×···×nd).

Find S ∈ IR^(r1×r2×r3) and orthonormal U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), and U3 ∈ IR^(n3×r3) so that

    A ≈ Σ_{j1=1:r1} Σ_{j2=1:r2} Σ_{j3=1:r3} S(j1, j2, j3) · U1(:, j1) ◦ U2(:, j2) ◦ U3(:, j3)

Compare with the CP representation from the last lecture:

Find λ ∈ IR^r and U1 ∈ IR^(n1×r), U2 ∈ IR^(n2×r), and U3 ∈ IR^(n3×r) so that

    A ≈ Σ_{j=1:r} λ_j · U1(:, j) ◦ U2(:, j) ◦ U3(:, j)

The SVD Ambition: Illuminating Sums of Rank-1 Tensors
What is This Lecture About?

Approximate A ∈ IR^(n1×···×nd).

Find S ∈ IR^(r1×r2×r3) and orthonormal U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), and U3 ∈ IR^(n3×r3) so that

    A ≈ Σ_{j1=1:r1} Σ_{j2=1:r2} Σ_{j3=1:r3} S(j1, j2, j3) · U1(:, j1) ◦ U2(:, j2) ◦ U3(:, j3)

Compare with the SVD of a matrix:

Find S ∈ IR^(r1×r2) and orthonormal U1 ∈ IR^(n1×r1) and U2 ∈ IR^(n2×r2) so that

    A ≈ Σ_{j1=1:r1} Σ_{j2=1:r2} S(j1, j2) · U1(:, j1) ◦ U2(:, j2)

But we will not be able to make S(1:r1, 1:r2, 1:r3) diagonal.
What is This Lecture About?

The Plan...

Review the Tucker representation and the HOSVD introduced in Lecture 4.

Develop an Alternating Least Squares framework for minimizing

    ‖ A − Σ_{j=1:r} S(j) · U1(:, j1) ◦ U2(:, j2) ◦ U3(:, j3) ‖_F

Re-examine the tensor rank issue.

Use the order-3 case to motivate the main ideas.
The Tucker Product Representation (Brief Review)

Mode-k Multiplication

Apply a matrix to all of the mode-k fibers of a tensor. For example, if S ∈ IR^(r1×r2×r3) and U2 ∈ IR^(n2×r2), then

    X = S ×_2 U2  ⇔  X_(2) = U2 · S_(2)
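The Tensor Toolbox does this with its ttm function; as a language-neutral illustration, here is a small NumPy sketch (the helper names unfold, fold, and mode_mult are my own, and the column ordering is MATLAB-style column-major so the unfoldings match the slides):

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding: the mode-k fibers become the columns
    # (column-major ordering, matching MATLAB/Tensor Toolbox).
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

def fold(M, k, shape):
    # Inverse of unfold for a tensor of the given shape.
    full = (shape[k],) + tuple(s for i, s in enumerate(shape) if i != k)
    return np.moveaxis(M.reshape(full, order='F'), 0, k)

def mode_mult(S, U, k):
    # X = S x_k U, i.e. X_(k) = U * S_(k).
    shape = list(S.shape)
    shape[k] = U.shape[0]
    return fold(U @ unfold(S, k), k, tuple(shape))

S = np.random.randn(4, 6, 2)
U2 = np.random.randn(8, 6)
X = mode_mult(S, U2, 1)   # mode-2 product (0-based mode index 1)
print(X.shape)            # (4, 8, 2)
```

By construction X_(2) = U2 · S_(2): applying U2 to every mode-2 fiber of S is one matrix-matrix product with the mode-2 unfolding.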
The Tucker Product Representation (Brief Review)

The Tucker Product

A succession of mode-k products. For example, if S ∈ IR^(r1×r2×r3), U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), and U3 ∈ IR^(n3×r3), then

    X = S ×_1 U1 ×_2 U2 ×_3 U3 = ((S ×_1 U1) ×_2 U2) ×_3 U3 = [[ S ; U1, U2, U3 ]]

The tensor X ∈ IR^(n1×n2×n3) is represented in Tucker form.
The Tucker Product Representation (Brief Review)

As a Sum of Rank-1 Tensors...

If S ∈ IR^(r1×r2×r3), U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), U3 ∈ IR^(n3×r3), and X = [[ S ; U1, U2, U3 ]], then

    X = Σ_{j1=1:r1} Σ_{j2=1:r2} Σ_{j3=1:r3} S(j1, j2, j3) · U1(:, j1) ◦ U2(:, j2) ◦ U3(:, j3)
The Tucker Product Representation (Brief Review)

As a Scalar Summation...

If S ∈ IR^(r1×r2×r3), U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), U3 ∈ IR^(n3×r3), and X = [[ S ; U1, U2, U3 ]], then

    X(i1, i2, i3) = Σ_{j1=1:r1} Σ_{j2=1:r2} Σ_{j3=1:r3} S(j1, j2, j3) · U1(i1, j1) · U2(i2, j2) · U3(i3, j3)
The Tucker Product Representation (Brief Review)

As a Matrix-Vector Product...

If S ∈ IR^(r1×r2×r3), U1 ∈ IR^(n1×r1), U2 ∈ IR^(n2×r2), U3 ∈ IR^(n3×r3), and X = [[ S ; U1, U2, U3 ]], then

    vec(X) = (U3 ⊗ U2 ⊗ U1) · vec(S)
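This identity is easy to confirm numerically. A small NumPy check (my own illustration, not Toolbox code; vec(·) here is the column-major vectorization, matching MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = (5, 4, 3), (2, 3, 2)
S = rng.standard_normal(r)
U1, U2, U3 = (rng.standard_normal((n[i], r[i])) for i in range(3))

# X = [[S; U1, U2, U3]], written as the scalar summation from the slides
X = np.einsum('abc,ia,jb,kc->ijk', S, U1, U2, U3)

# vec(.) is the column-major (MATLAB-style) vectorization
lhs = X.reshape(-1, order='F')
rhs = np.kron(U3, np.kron(U2, U1)) @ S.reshape(-1, order='F')
print(np.allclose(lhs, rhs))   # True
```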
The Tucker Product Representation (Brief Review)

When the U's are Orthogonal...

If A ∈ IR^(n1×n2×n3) is given and U1 ∈ IR^(n1×n1), U2 ∈ IR^(n2×n2), and U3 ∈ IR^(n3×n3) are orthogonal, then it is possible to determine

    X = [[ S ; U1, U2, U3 ]],    S = A ×_1 U1^T ×_2 U2^T ×_3 U3^T

so that A = X.
The Tucker Product Representation (Brief Review)

When the U's are from the Modal Unfolding SVDs...

Suppose A ∈ IR^(n1×n2×n3) is given. If

    A_(1) = U1 Σ1 V1^T,   A_(2) = U2 Σ2 V2^T,   A_(3) = U3 Σ3 V3^T

are SVDs and

    S = A ×_1 U1^T ×_2 U2^T ×_3 U3^T,

then

    A = [[ S ; U1, U2, U3 ]]

is the higher-order SVD (HOSVD) of A.
Matlab Tensor Toolbox: Tucker Tensor Set-Up

n = [5 8 3]; m = [4 6 2];
F = randn(n(1),m(1)); G = randn(n(2),m(2)); H = randn(n(3),m(3));
S = tenrand(m);
X = ttensor(S,{F,G,H});
Fsize = size(X.U{1}); Gsize = size(X.U{2}); Hsize = size(X.U{3});
Ssize = size(X.core); s = size(X);

A ttensor is a structure with two fields that is used to represent a tensor in Tucker form. In the above, X.core houses the core tensor S while X.U is a cell array of the matrices F, G, and H that define the tensor X.

Variable   Value
Fsize      [5,4]
Gsize      [8,6]
Hsize      [3,2]
Ssize      [4 6 2]
s          [5 8 3]
Problem 6.1. Suppose

    A = [[ S; M1, M2, M3 ]]

and that each Mi has more rows than columns. If Mi = QiRi is a QR factorization, then

    A = [[ S ×_1 R1 ×_2 R2 ×_3 R3 ; Q1, Q2, Q3 ]]

can be regarded as a QR factorization of A. Write a Matlab function [Q,R] = tensorQR(A) that carries out this decomposition where A, Q, and R are ttensors.
The Tucker Product Approximation Problem

Definition

Given A ∈ IR^(n1×n2×n3) and r ≤ n, determine

    S ∈ IR^(r1×r2×r3)   (the "core tensor")
    U1 ∈ IR^(n1×r1)     (orthonormal columns)
    U2 ∈ IR^(n2×r2)     (orthonormal columns)
    U3 ∈ IR^(n3×r3)     (orthonormal columns)

such that ‖A − X‖_F is minimized where

    X = [[ S; U1, U2, U3 ]] = Σ_{j=1:r} S(j) · U1(:, j1) ◦ U2(:, j2) ◦ U3(:, j3)

We say that X is a length-r Tucker tensor.

In the matrix case, we solve this problem by truncating A's SVD.
The Truncated HOSVD

Definition

If

    A = [[ S; U1, U2, U3 ]]

is the HOSVD of A ∈ IR^(n1×n2×n3) and r ≤ n, then

    Ar = [[ S(1:r1, 1:r2, 1:r3); U1(:, 1:r1), U2(:, 1:r2), U3(:, 1:r3) ]]

is a truncated HOSVD of A.

How good is X = Ar as a minimizer of ‖A − X‖_F? Not optimal, but...
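The truncated HOSVD is simple to prototype: one SVD per modal unfolding, then one contraction for the core. A NumPy sketch (helper and function names are my own):

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding, column-major ordering (matches MATLAB)
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

def truncated_hosvd(A, r):
    # Uk = leading r[k] left singular vectors of the mode-k unfolding A_(k)
    Us = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r[k]]
          for k in range(3)]
    # Core: S = A x_1 U1^T x_2 U2^T x_3 U3^T, written as one contraction
    S = np.einsum('ijk,ia,jb,kc->abc', A, *Us)
    return S, Us

A = np.random.default_rng(1).standard_normal((5, 8, 3))
S, Us = truncated_hosvd(A, (2, 2, 2))
Ar = np.einsum('abc,ia,jb,kc->ijk', S, *Us)   # Ar = [[S; U1, U2, U3]]
print(S.shape)                                 # (2, 2, 2)
```

With r = n (no truncation) the construction reproduces A exactly, which is the HOSVD statement from the previous slide.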
The Truncated HOSVD

Look at A ≈ Ar at the scalar level...

    A(i1, i2, i3)  = Σ_{j1=1:n1} Σ_{j2=1:n2} Σ_{j3=1:n3} S(j1, j2, j3) · U1(i1, j1) · U2(i2, j2) · U3(i3, j3)

    Ar(i1, i2, i3) = Σ_{j1=1:r1} Σ_{j2=1:r2} Σ_{j3=1:r3} S(j1, j2, j3) · U1(i1, j1) · U2(i2, j2) · U3(i3, j3)

What can we say about the "thrown away" terms?
The Truncated HOSVD

The Core Tensor S is Graded

Recall from Lecture 4 that ‖S_(k)(j, :)‖ is the j-th singular value of A_(k). This means that

    ‖S(j, :, :)‖_F = σ_j(A_(1))   j = 1:n1
    ‖S(:, j, :)‖_F = σ_j(A_(2))   j = 1:n2
    ‖S(:, :, j)‖_F = σ_j(A_(3))   j = 1:n3

The entries in S tend to get smaller as you move away from the (1,1,1) entry.
Problem 6.2. Does this inequality hold?

    ‖A − Ar‖_F² ≤ Σ_{j=r1+1:n1} σ_j(A_(1))² + Σ_{j=r2+1:n2} σ_j(A_(2))² + Σ_{j=r3+1:n3} σ_j(A_(3))²

Can you do better?

Problem 6.3. Show that

    |A(i1, i2, i3) − Ar(i1, i2, i3)| ≤ min{ σ_{r1+1}(A_(1)), σ_{r2+1}(A_(2)), σ_{r3+1}(A_(3)) }
The Tucker Product Approximation Problem

Once again, the problem...

Given A ∈ IR^(n1×n2×n3) and r ≤ n, determine

    S ∈ IR^(r1×r2×r3)   (the "core tensor")
    U1 ∈ IR^(n1×r1)     (orthonormal columns)
    U2 ∈ IR^(n2×r2)     (orthonormal columns)
    U3 ∈ IR^(n3×r3)     (orthonormal columns)

such that

    ‖ A − [[ S; U1, U2, U3 ]] ‖_F = ‖ vec(A) − (U3 ⊗ U2 ⊗ U1) vec(S) ‖

is minimized.

Note that vec(S) solves an ordinary least squares problem...
The Tucker Product Approximation Problem

Reformulation...

Since S must minimize

    ‖ vec(A) − (U3 ⊗ U2 ⊗ U1) · vec(S) ‖

and U3 ⊗ U2 ⊗ U1 has orthonormal columns, we see that

    vec(S) = (U3^T ⊗ U2^T ⊗ U1^T) · vec(A)

and so our goal is to choose the Ui so that

    ‖ ( I − (U3 ⊗ U2 ⊗ U1)(U3^T ⊗ U2^T ⊗ U1^T) ) vec(A) ‖

is minimized.
The Tucker Product Approximation Problem

Reformulation...

If Q has orthonormal columns, then

    ‖(I − QQ^T) a‖₂² = ‖a‖₂² − ‖Q^T a‖₂²

Since U3 ⊗ U2 ⊗ U1 has orthonormal columns, it follows that our goal is to choose orthonormal Ui so that

    ‖ (U3^T ⊗ U2^T ⊗ U1^T) · vec(A) ‖

is maximized.
The Tucker Product Approximation Problem

Reshaping...

    ‖ (U3^T ⊗ U2^T ⊗ U1^T) · vec(A) ‖ = ‖ U1^T · A_(1) · (U3 ⊗ U2) ‖_F
                                      = ‖ U2^T · A_(2) · (U3 ⊗ U1) ‖_F
                                      = ‖ U3^T · A_(3) · (U2 ⊗ U1) ‖_F

Sets the stage for an alternating least squares solution approach...
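The three reshaping equalities can be confirmed numerically. A NumPy check (my own illustration; vec(·) is column-major and the unfoldings use the matching ordering):

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding, column-major ordering (matches MATLAB)
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4, 3))
# Random matrices with orthonormal columns via thin QR
U1 = np.linalg.qr(rng.standard_normal((5, 2)))[0]
U2 = np.linalg.qr(rng.standard_normal((4, 2)))[0]
U3 = np.linalg.qr(rng.standard_normal((3, 2)))[0]

lhs = np.linalg.norm(np.kron(U3.T, np.kron(U2.T, U1.T)) @ A.reshape(-1, order='F'))
m1 = np.linalg.norm(U1.T @ unfold(A, 0) @ np.kron(U3, U2))
m2 = np.linalg.norm(U2.T @ unfold(A, 1) @ np.kron(U3, U1))
m3 = np.linalg.norm(U3.T @ unfold(A, 2) @ np.kron(U2, U1))
print(np.allclose([m1, m2, m3], lhs))   # True
```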
Alternating Least Squares Framework

A Sequence of Three Linear Problems...

    ‖ (U3^T ⊗ U2^T ⊗ U1^T) · vec(A) ‖
        = ‖ U1^T · A_(1) · (U3 ⊗ U2) ‖_F   ⇐ 1. Fix U2 and U3 and maximize with U1.
        = ‖ U2^T · A_(2) · (U3 ⊗ U1) ‖_F   ⇐ 2. Fix U1 and U3 and maximize with U2.
        = ‖ U3^T · A_(3) · (U2 ⊗ U1) ‖_F   ⇐ 3. Fix U1 and U2 and maximize with U3.

These max problems are SVD problems...
Alternating Least Squares Framework

A Sequence of Three Linear Problems...

Repeat:

    1. Compute the SVD A_(1) · (U3 ⊗ U2) = Ũ1 Σ1 V1^T and set U1 = Ũ1(:, 1:r1).
    2. Compute the SVD A_(2) · (U3 ⊗ U1) = Ũ2 Σ2 V2^T and set U2 = Ũ2(:, 1:r2).
    3. Compute the SVD A_(3) · (U2 ⊗ U1) = Ũ3 Σ3 V3^T and set U3 = Ũ3(:, 1:r3).

This is the Higher-Order Orthogonal Iteration (HOOI).
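The iteration translates directly into NumPy. A minimal order-3 sketch (function and helper names are my own; it uses the truncated-HOSVD starting guess suggested in Problem 6.4, and forms each A_(k)(·⊗·) product as a contraction rather than an explicit Kronecker product):

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding, column-major ordering (matches MATLAB)
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

def hooi(A, r, maxiters=25):
    # Initial guess: truncated-HOSVD factors
    Us = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r[k]]
          for k in range(3)]
    # Contractions equivalent to A_(1)(U3 kron U2), A_(2)(U3 kron U1), A_(3)(U2 kron U1)
    specs = ['ijk,jb,kc->ibc', 'ijk,ia,kc->ajc', 'ijk,ia,jb->abk']
    for _ in range(maxiters):
        for k in range(3):
            others = [Us[m] for m in range(3) if m != k]
            Y = np.einsum(specs[k], A, *others)
            Us[k] = np.linalg.svd(unfold(Y, k), full_matrices=False)[0][:, :r[k]]
    S = np.einsum('ijk,ia,jb,kc->abc', A, *Us)   # core tensor
    return S, Us

A = np.random.default_rng(2).standard_normal((6, 5, 4))
S, Us = hooi(A, (2, 2, 2))
X = np.einsum('abc,ia,jb,kc->ijk', S, *Us)       # X = [[S; U1, U2, U3]]
print(np.linalg.norm(A - X) <= np.linalg.norm(A))   # True
```

Each inner step is exactly one of the three SVD problems above: projecting A onto the current subspaces of the other two modes, then taking leading left singular vectors.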
Problem 6.4. Write a Matlab function X = MyTucker3(A,r,itmax) that performs itmax steps of the HOOI algorithm to obtain a best length-r Tucker approximation to the order-3 tensor A. The output tensor X should be a ttensor. Use the truncated HOSVD to obtain an initial guess. Justify its use over a random starting guess. To make your implementation efficient, use the Kronecker product fact that if

    B = A · (Y ⊗ Z)

then B(i, :) is a reshaping of a matrix product that involves Y, Z, and a reshaping of A(i, :).

Problem 6.5. How does MyTucker3 behave if it is based on QR-with-column-pivoting instead of the SVD?
The Order-d Case

The Tucker Product Approximation Problem

Given A ∈ IR^(n1×···×nd) and r ≤ n, compute S ∈ IR^(r1×···×rd) and orthonormal matrices Uk ∈ IR^(nk×rk) for k = 1:d such that if

    X = [[ S; U1, . . . , Ud ]] = Σ_{j=1:r} S(j) · U1(:, j1) ◦ · · · ◦ Ud(:, jd)

then ‖A − X‖_F is minimized.
The Order-d Case

The ALS Framework Involves a Sequence of d SVD Problems...

Repeat:

    for k = 1:d
        Compute the SVD  A_(k) · (Ud ⊗ · · · ⊗ U_{k+1} ⊗ U_{k−1} ⊗ · · · ⊗ U1) = Ũk Σk Vk^T
        and set Uk = Ũk(:, 1:rk)
    end

What about the choice of r = [ r1, . . . , rd ]?
Matlab Tensor Toolbox: The Function tucker_als

n = [5 6 7];
% Generate a random tensor...
A = tenrand(n);
for r = 1:min(n)
    % Find the closest length-[r r r] ttensor...
    X = tucker_als(A,[r r r]);
    % Display the fit...
    E = double(X) - double(A);
    fit = norm(reshape(E,prod(n),1));
    fprintf('r = %1d, fit = %5.3e\n',r,fit);
end

The function tucker_als returns a ttensor. Default values for the number of iterations and the termination criterion can be modified:

    X = tucker_als(A,r,'maxiters',20,'tol',.001)
Problem 6.6. Compare the efficiency of MyTucker3 and tucker_als.

Problem 6.7. Do cp_als(A,1) and tucker_als(A,1) return the same rank-1 tensor?
More on Tensor Rank

k-Rank

If A ∈ IR^(n1×···×nd) and 1 ≤ k ≤ d, then its k-rank is defined by

    rank_k(A) = rank(A_(k))

Problem 6.8. Show that if A ∈ IR^(n1×···×nd), then there exists

    X = [[ S ; U1, . . . , Ud ]]

with A = X, where for k = 1:d, Uk ∈ IR^(nk×rk) has orthonormal columns and rk = rank_k(A). This can be thought of as a "thin" HOSVD.
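For a quick numerical illustration of the definition (my own NumPy sketch, not Toolbox code): build a Tucker tensor from a full-rank random 2 × 3 × 2 core, so that each k-rank equals the corresponding core dimension:

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding, column-major ordering (matches MATLAB)
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

rng = np.random.default_rng(3)
S = rng.standard_normal((2, 3, 2))
U1, U2, U3 = (rng.standard_normal((n, r)) for n, r in [(5, 2), (6, 3), (4, 2)])
A = np.einsum('abc,ia,jb,kc->ijk', S, U1, U2, U3)   # A = [[S; U1, U2, U3]]

# rank_k(A) = rank(A_(k)); for random data this is [2, 3, 2] with probability 1
kranks = [int(np.linalg.matrix_rank(unfold(A, k))) for k in range(3)]
print(kranks)
```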
More on Tensor Rank

rank(Any Unfolding of A) ≤ rank(A)

Suppose Uk ∈ IR^(nk×r) and that

    A = Σ_{k=1:r} U1(:, k) ◦ · · · ◦ Ud(:, k).

If p is any permutation of 1:d and 1 ≤ j < d, then

    tenmat(A, p(1:j), p(j+1:d)) = ( U_p(j) ⊙ · · · ⊙ U_p(1) ) ( U_p(d) ⊙ · · · ⊙ U_p(j+1) )^T
        = Σ_{k=1:r} ( U_p(j)(:, k) ⊗ · · · ⊗ U_p(1)(:, k) ) ( U_p(d)(:, k) ⊗ · · · ⊗ U_p(j+1)(:, k) )^T
Problem 6.9. Does it follow that the "most square" unfolding of A ∈ IR^(n1×···×nd) has the highest rank?

Problem 6.10. Suppose M is an unfolding of A ∈ IR^(n1×···×nd). How might you construct a good ktensor or ttensor approximation to A from the SVD of M?
Summary of Lecture 6

Key Words

The Tucker approximation problem, for a given tensor A and a given integer vector r, involves finding the nearest length-r Tucker tensor to A in the Frobenius norm.

The alternating least squares framework is used by tucker_als to solve the Tucker approximation problem. It proceeds by solving a sequence of SVD problems.

The k-rank of a tensor is the matrix rank of its mode-k unfolding.