
Noname manuscript No.
(will be inserted by the editor)

Greville's Method for Preconditioning Least Squares Problems

Xiaoke Cui · Ken Hayami · Jun-Feng Yin

Received: date / Accepted: date

Abstract In this paper, we propose a preconditioning algorithm for least squares problems min_{x∈R^n} ‖b − Ax‖₂, where A may have any shape and rank. The preconditioner is constructed to be a sparse approximation to the Moore-Penrose inverse of the coefficient matrix A. For this preconditioner, we provide a theoretical analysis showing that, under our assumption, the least squares problem preconditioned by this preconditioner is equivalent to the original problem, and that the GMRES method can determine a solution to the preconditioned problem before breakdown happens. At the end of the paper, we also give numerical examples showing the performance of the method.

Keywords Least Squares Problem · Preconditioning · Moore-Penrose Inverse · Greville Algorithm · GMRES

Mathematics Subject Classification (2000) 65F08 · 65F10

This research was supported by the Grants-in-Aid for Scientific Research (C) of the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Xiaoke Cui
Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (Sokendai), 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan, 101-8430.
Presently Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, Japan, 277-8568.
E-mail: xkcui@sml.k.u-tokyo.ac.jp

Ken Hayami
National Institute of Informatics, Japan; also Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (Sokendai), 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan, 101-8430.
Tel.: +81-3-4212-2553
E-mail: hayami@nii.ac.jp

Jun-Feng Yin
Department of Mathematics, Tongji University, 1239 Siping Road, Shanghai 200092, P. R. China.
E-mail: yinjf@tongji.edu.cn



1 Introduction

Consider the least squares problem

    min_{x∈R^n} ‖b − Ax‖₂,                      (1.1)

where A ∈ R^{m×n}, b ∈ R^m.

To solve least squares problems, there are two main kinds of methods: direct methods and iterative methods [4]. When A is large and sparse, iterative methods, especially Krylov subspace methods, are preferred for solving (1.1), since they require less memory and computational work than direct methods. In [14], Hayami et al. proposed using GMRES [16] to solve least squares problems with suitable preconditioners. Using a preconditioner B ∈ R^{n×m} to precondition (1.1) from the left, we can transform problem (1.1) into

    min_{x∈R^n} ‖Bb − BAx‖₂.                    (1.2)

On the other hand, we can also precondition problem (1.1) from the right and transform it into

    min_{y∈R^m} ‖b − ABy‖₂.                     (1.3)

In [14], necessary and sufficient conditions on B for (1.2) and (1.3), respectively, to be equivalent to (1.1) were given.

There are different methods to construct the preconditioner B. The simplest choice is B = A^T. If rank(A) = n, one may use B = CA^T, where C = diag(A^TA)^{−1} or, better, where C is constructed by the Robust Incomplete Factorization (RIF) [2,3], as in [14]. Another possibility is to use the incomplete Givens orthogonalization method to construct B [19]. However, so far no efficient preconditioner has been proposed for the rank deficient case. In this paper, we construct a preconditioner which also works when A is rank deficient. The idea is to use an approximate inverse of the coefficient matrix of a linear system

    Ax = b                                      (1.4)

with A ∈ R^{n×n} as a preconditioner. This kind of preconditioner was originally developed for solving nonsingular linear systems [10,15]. Since A is now a general matrix, we construct a matrix M ∈ R^{n×m} which is an approximation to the Moore-Penrose inverse [9,17] of A, and use M to precondition the least squares problem (1.1). When A is rank deficient, the preconditioner M can also be rank deficient. Note, in passing, that in [20] it was shown that for singular linear systems, singular preconditioners are sometimes better than nonsingular preconditioners.

The main contribution of this paper is to give a new way to precondition general least squares problems from the perspective of the approximate Moore-Penrose inverse. Similar to the RIF preconditioner [2,3], our method also includes an A^TA-orthogonalization process when the coefficient matrix has full column rank. When A is rank deficient, our method tries to orthogonalize the linearly independent part of A. We also give a theoretical analysis of the equivalence between the preconditioned problem and the original problem, and discuss the possibility of breakdown when using GMRES to solve the preconditioned system.

The rest of the paper is organized as follows. In Section 2, we first introduce some properties of the Moore-Penrose inverse of a rank-one updated matrix and the original Greville method. In Section 3, based on Greville's method, we propose a global algorithm and a vector-wise algorithm for constructing the preconditioner M, which is an approximate generalized inverse of A. In Section 4, we show that for a full column rank matrix A, our algorithm is similar to the RIF preconditioning algorithm [3] and includes an A^TA-orthogonalization process. In Sections 5 and 6, we prove that under a certain assumption, using our preconditioner M, the preconditioned problem is equivalent to the original problem, and the GMRES method can determine a solution to the preconditioned problem before breakdown happens. In Section 7, we consider some details of the implementation of our algorithms. Numerical results are presented in Section 8. Section 9 concludes the paper.

2 Greville's Method

Given a rectangular matrix A ∈ R^{m×n} with rank(A) = r ≤ min{m, n}, assume that the Moore-Penrose inverse of A is known. We are interested in how to compute the Moore-Penrose inverse of

    A + cd^T,  c ∈ R^m,  d ∈ R^n,               (2.1)

which is a rank-one update of A [9,12]. In [9], the following six logical possibilities are considered:

1. c ∉ R(A), d ∉ R(A^T) and 1 + d^TA†c arbitrary,
2. c ∈ R(A), d ∉ R(A^T) and 1 + d^TA†c = 0,
3. c ∈ R(A), d arbitrary and 1 + d^TA†c ≠ 0,
4. c ∉ R(A), d ∈ R(A^T) and 1 + d^TA†c = 0,
5. c arbitrary, d ∈ R(A^T) and 1 + d^TA†c ≠ 0,
6. c ∈ R(A), d ∈ R(A^T) and 1 + d^TA†c = 0.

Here R(A) denotes the range space of A. For each possibility, an expression for the Moore-Penrose inverse of the rank-one update of A is given by the following theorem [9].

Theorem 1 For A ∈ R^{m×n}, c ∈ R^m, d ∈ R^n, let k = A†c, h = d^TA†, u = (I − AA†)c, v = d^T(I − A†A), and β = 1 + d^TA†c. Note that

    c ∈ R(A) ⇔ u = 0,                           (2.2)
    d ∈ R(A^T) ⇔ v = 0.                         (2.3)

Then the generalized inverse of A + cd^T is given as follows.

1. If u ≠ 0 and v ≠ 0, then (A + cd^T)† = A† − ku† − v†h + βv†u†.
2. If u = 0, v ≠ 0 and β = 0, then (A + cd^T)† = A† − kk†A† − v†h.
3. If u = 0 and β ≠ 0, then (A + cd^T)† = A† + (1/β̄) v^Tk^TA† − (β̄/σ₁) p₁q₁^T, where p₁ = −(‖k‖₂²/β̄) v^T + k, q₁^T = −(‖v‖₂²/β̄) k^TA† + h, and σ₁ = ‖k‖₂²‖v‖₂² + |β|².
4. If u ≠ 0, v = 0 and β = 0, then (A + cd^T)† = A† − A†h†h − ku†.
5. If v = 0 and β ≠ 0, then (A + cd^T)† = A† + (1/β̄) A†h^Tu^T − (β̄/σ₂) p₂q₂^T, where p₂ = −(‖h‖₂²/β̄) A†h^T + k, q₂^T = −(‖u‖₂²/β̄) u^T + h, and σ₂ = ‖h‖₂²‖u‖₂² + |β|².
6. If u = 0, v = 0 and β = 0, then (A + cd^T)† = A† − kk†A† − A†h†h + (k†A†h†)kh.
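As a quick sanity check of Case 1 (the case used later when a_i ∉ R(A_{i−1})), the following NumPy sketch compares the update formula with a directly computed pseudoinverse; the random test matrix and the vectors c, d are arbitrary choices made only for this illustration.

    import numpy as np

    # Numerical check of Case 1 of Theorem 1 (u != 0 and v != 0).
    rng = np.random.default_rng(0)
    m, n, r = 7, 5, 3
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-deficient test matrix
    c = rng.standard_normal(m)        # not in R(A) with probability one
    d = rng.standard_normal(n)        # not in R(A^T) with probability one

    Ad = np.linalg.pinv(A)
    k = Ad @ c                        # k = A^+ c
    h = d @ Ad                        # h = d^T A^+ (row vector)
    u = c - A @ (Ad @ c)              # u = (I - A A^+) c
    v = d - (d @ Ad) @ A              # v = d^T (I - A^+ A) (row vector)
    beta = 1.0 + d @ Ad @ c

    u_dag = u / (u @ u)               # pseudoinverse of the column vector u
    v_dag = v / (v @ v)               # pseudoinverse of the row vector v

    formula = Ad - np.outer(k, u_dag) - np.outer(v_dag, h) + beta * np.outer(v_dag, u_dag)
    print(np.linalg.norm(formula - np.linalg.pinv(A + np.outer(c, d))))   # ~ machine precision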



To utilize the above theorem, let

    A = Σ_{i=1}^{n} a_i e_i^T,                  (2.4)

where a_i is the i-th column of A. Further let A_i = [a_1, . . . , a_i, 0, . . . , 0] ∈ R^{m×n}. Hence we have

    A_i = Σ_{k=1}^{i} a_k e_k^T,  i = 1, . . . , n,     (2.5)

and, if we denote A_0 = 0_{m×n},

    A_i = A_{i−1} + a_i e_i^T,  i = 1, . . . , n.       (2.6)

Thus every A_i, i = 1, . . . , n, is a rank-one update of A_{i−1}. Noticing that A_0† = 0_{n×m}, we can utilize Theorem 1 to compute the Moore-Penrose inverse of A step by step and obtain A† = A_n† in the end.

In Theorem 1, substituting a_i for c and e_i for d, we can rewrite Equation (2.2) as follows:

    a_i ∈ R(A_{i−1}) ⇔ u = (I − A_{i−1}A_{i−1}†)a_i = 0.   (2.7)

No matter whether a_i is linearly dependent on the previous columns or not, we have, for any i = 1, 2, . . . , n,

    β = 1 + e_i^T A_{i−1}† a_i = 1.             (2.8)

The vector v in Equation (2.3) can be written as

    v = e_i^T (I − A_{i−1}† A_{i−1}) ≠ 0,       (2.9)

which is nonzero for any i = 1, . . . , n. Hence, we can use Case 1 and Case 3 for the cases a_i ∉ R(A_{i−1}) and a_i ∈ R(A_{i−1}), respectively.

Then from Theorem 1, denoting A_0 = 0_{m×n}, we obtain a method to compute A_i† from A_{i−1}†:

    A_i† = A_{i−1}† + (e_i − A_{i−1}†a_i)((I − A_{i−1}A_{i−1}†)a_i)†              if a_i ∉ R(A_{i−1}),
    A_i† = A_{i−1}† + (1/σ_i)(e_i − A_{i−1}†a_i)(A_{i−1}†a_i)^T A_{i−1}†          if a_i ∈ R(A_{i−1}),      (2.10)

where σ_i = 1 + ‖A_{i−1}†a_i‖₂². This method was proposed by Greville in the 1960s [13].
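For concreteness, the sketch below carries out the recursion (2.10) in NumPy without any dropping and compares the result with a directly computed pseudoinverse; the small test matrix and the tolerance used to decide whether a_i ∈ R(A_{i−1}) are assumptions of the sketch only.

    import numpy as np

    # Exact Greville recursion (2.10): build A^+ one column at a time.
    def greville_pinv(A, tol=1e-12):
        m, n = A.shape
        Ad = np.zeros((n, m))                     # A_0^+ = 0_{n x m}
        A_prev = np.zeros((m, n))                 # A_0 = 0_{m x n}
        for i in range(n):
            a = A[:, i]
            k = Ad @ a                            # k = A_{i-1}^+ a_i
            u = a - A_prev @ k                    # u = (I - A_{i-1} A_{i-1}^+) a_i
            e = np.zeros(n); e[i] = 1.0
            if np.linalg.norm(u) > tol * np.linalg.norm(a):    # a_i not in R(A_{i-1})
                Ad = Ad + np.outer(e - k, u / (u @ u))
            else:                                              # a_i in R(A_{i-1})
                sigma = 1.0 + k @ k
                Ad = Ad + np.outer(e - k, Ad.T @ k) / sigma
            A_prev[:, i] = a                      # A_i = A_{i-1} + a_i e_i^T
        return Ad

    A = np.array([[1., 2., 3.], [4., 5., 9.], [7., 8., 15.], [1., 0., 1.]])  # 3rd column = 1st + 2nd
    print(np.linalg.norm(greville_pinv(A) - np.linalg.pinv(A)))              # ~ machine precision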

3 Preconditioning Algorithm

In this section, we construct our preconditioning algorithm according to Greville's method in Section 2. First of all, we notice that in Equation (2.10) the difference between the case a_i ∉ R(A_{i−1}) and the case a_i ∈ R(A_{i−1}) lies in the second term. If we define vectors k_i, u_i, v_i and a scalar f_i for i = 1, . . . , n as

    k_i = A_{i−1}† a_i,                                      (3.1)
    u_i = a_i − A_{i−1}k_i = (I − A_{i−1}A_{i−1}†)a_i,       (3.2)
    f_i = ‖u_i‖₂²        if a_i ∉ R(A_{i−1}),
          1 + ‖k_i‖₂²    if a_i ∈ R(A_{i−1}),                (3.3)
    v_i = u_i                if a_i ∉ R(A_{i−1}),
          (A_{i−1}†)^T k_i   if a_i ∈ R(A_{i−1}),            (3.4)


then we can express A_i† in a unified form for general matrices as

    A_i† = A_{i−1}† + (1/f_i)(e_i − k_i)v_i^T,   (3.5)

and we have

    A† = Σ_{i=1}^{n} (1/f_i)(e_i − k_i)v_i^T.    (3.6)

If we define matrices K ∈ R^{n×n}, V ∈ R^{m×n}, F ∈ R^{n×n} as

    K = [k_1, . . . , k_n],                      (3.7)
    V = [v_1, . . . , v_n],                      (3.8)
    F = diag(f_1, . . . , f_n),                  (3.9)

we obtain a matrix factorization of A† as follows.

Theorem 2 Let A ∈ R^{m×n} and rank(A) ≤ min{m, n}. Using the above notations, the Moore-Penrose inverse of A has the factorization

    A† = (I − K)F^{−1}V^T.                       (3.10)

Here I is the identity matrix of order n, K is a strictly upper triangular matrix, and F is a diagonal matrix whose diagonal elements are all positive.

If A has full column rank, then

    V = A(I − K),                                (3.11)
    A† = (I − K)F^{−1}(I − K)^T A^T.             (3.12)

Proof Denote Ā_i = [a_1, . . . , a_i]. Since

    k_i = A_{i−1}† a_i                                       (3.13)
        = [a_1, . . . , a_{i−1}, 0, . . . , 0]† a_i           (3.14)
        = [Ā_{i−1}, 0, . . . , 0]† a_i                        (3.15)
        = [Ā_{i−1}†; 0] a_i                                   (3.16)
        = (k_{i,1}, . . . , k_{i,i−1}, 0, . . . , 0)^T,       (3.17)

K = [k_1, . . . , k_n] is a strictly upper triangular matrix.

Since u_i = 0 ⇔ a_i ∈ R(A_{i−1}),

    f_i = ‖u_i‖₂²        if a_i ∉ R(A_{i−1}),
          1 + ‖k_i‖₂²    if a_i ∈ R(A_{i−1}).                 (3.18)

Thus f_i (i = 1, . . . , n) are always positive, which implies that F is a diagonal matrix with positive diagonal elements.

If A has full column rank, we have

    V = [u_1, . . . , u_n]                                        (3.19)
      = [(I − A_0A_0†)a_1, . . . , (I − A_{n−1}A_{n−1}†)a_n]      (3.20)
      = [a_1 − A_0k_1, . . . , a_n − A_{n−1}k_n]                  (3.21)
      = A − [A_0k_1, . . . , A_{n−1}k_n]                          (3.22)
      = A − [A_1k_1, . . . , A_nk_n]                              (3.23)
      = A(I − K).                                                 (3.24)

The second equality from the bottom follows from the fact that K is a strictly upper triangular matrix. Now, when A has full column rank, A† can be decomposed as

    A† = (I − K)F^{−1}V^T = (I − K)F^{−1}(I − K)^T A^T.   ✷      (3.25)

Remark 1 From the above proof, it is easy to see that when A has full column rank, (I − K)^{−T}F(I − K)^{−1} is an LDL^T decomposition of A^TA.

Based on Greville's method, we can construct a preconditioning algorithm. We want to construct a sparse approximation to the Moore-Penrose inverse of A. Hence, we perform some numerical droppings in the middle of the algorithm to maintain the sparsity of the preconditioner. We call the following algorithm the global Greville preconditioning algorithm, since it forms or updates the whole matrix at a time rather than column by column.

Algorithm 1 Global Greville preconditioning algorithm
1. set M_0 = 0
2. for i = 1 : n
3.   k_i = M_{i−1}a_i
4.   u_i = a_i − A_{i−1}k_i
5.   if ‖u_i‖₂ is not small
6.     f_i = ‖u_i‖₂²
7.     v_i = u_i
8.   else
9.     f_i = 1 + ‖k_i‖₂²
10.    v_i = M_{i−1}^T k_i
11.  end if
12.  M_i = M_{i−1} + (1/f_i)(e_i − k_i)v_i^T
13.  perform numerical droppings on M_i
14. end for
15. Obtain M_n ≈ A†.
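The following NumPy sketch of Algorithm 1 is meant only as an illustration: the dropping rule (zeroing entries of M_i below a fixed threshold) is a simplification assumed here, whereas the experiments in Section 8 use the rule (8.1), and the switching test is the one discussed later in Section 7.1.

    import numpy as np

    # Sketch of the global Greville preconditioning algorithm (Algorithm 1).
    def global_greville(A, drop_tol=1e-3, tau_s=1e-10):
        m, n = A.shape
        M = np.zeros((n, m))                      # M_0 = 0
        normA = 0.0                               # running ||A_{i-1}||_F
        for i in range(n):
            a = A[:, i]
            k = M @ a                             # k_i = M_{i-1} a_i
            u = a - A[:, :i] @ k[:i]              # u_i = a_i - A_{i-1} k_i
            e = np.zeros(n); e[i] = 1.0
            if np.linalg.norm(u) > tau_s * normA * np.linalg.norm(a):   # ||u_i||_2 "not small"
                f = u @ u
                v = u
            else:                                 # column judged linearly dependent
                f = 1.0 + k @ k
                v = M.T @ k
            M = M + np.outer(e - k, v) / f        # M_i = M_{i-1} + (1/f_i)(e_i - k_i) v_i^T
            M[np.abs(M) < drop_tol] = 0.0         # simplified numerical dropping
            normA = np.hypot(normA, np.linalg.norm(a))                  # ||A_i||_F
        return M                                  # M_n ~ A^+

    A = np.random.default_rng(1).standard_normal((30, 20))
    print(np.linalg.norm(global_greville(A, drop_tol=0.0) - np.linalg.pinv(A)))  # exact when nothing is dropped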

Remark 2 In Algorithm 1, we do not need to store k_i, v_i, f_i, i = 1, . . . , n, because we form M_i explicitly.

Remark 3 In Algorithm 1, we need to update M_i at every step, but we do not need to update the whole matrix, since only the first i − 1 rows of M_{i−1} can be nonzero. Hence, to compute M_i, we update the first i − 1 rows of M_{i−1} and then add one new nonzero row as the i-th row.
one new nonzero row to be the i-th row.



Remark 4 If k_i is sparse, then when we update M_{i−1}, we only need to update the rows which correspond to the nonzero elements of k_i. Hence, the rank-one update can be efficient.

Remark 5 When A is nonsingular, the inverse of A can also be computed by the Sherman-Morrison formula, which is likewise a rank-one update algorithm. Following the Sherman-Morrison formula, an incomplete factorization preconditioner can also be developed; we refer readers to [7,8].

If we want to construct the matrices K, F and V without forming M_i explicitly, we can use a vector-wise version of the above algorithm. In Algorithm 1, the column vectors of K are constructed one column per step, and u_i, f_i, v_i are defined according to k_i. Hence, it is possible to rewrite Algorithm 1 in a vector-wise form.

Since u_i can be computed as a_i − A_{i−1}k_i, which does not refer to M_{i−1} explicitly, to vectorize Algorithm 1 we only need to form k_i, and v_i = M_{i−1}^T k_i when linear dependence occurs, without using M_{i−1} explicitly.

Consider the case when numerical droppings are not used. Since we already know that

    A† = (I − K)F^{−1}V^T
       = (I − [k_1 . . . k_n]) diag(f_1^{−1}, . . . , f_n^{−1}) [v_1, . . . , v_n]^T
       = Σ_{i=1}^{n} (e_i − k_i)(1/f_i)v_i^T,

for any integer 1 ≤ p ≤ n it is easy to see that

    A_p† = Σ_{i=1}^{p} (e_i − k_i)(1/f_i)v_i^T.                  (3.26)

Therefore, we have

    v_i = (A_{i−1}†)^T k_i                                       (3.27)
        = (Σ_{p=1}^{i−1} (e_p − k_p)(1/f_p)v_p^T)^T k_i          (3.28)
        = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i,           (3.29)

and

    k_i = A_{i−1}† a_i                                                                               (3.30)
        = Σ_{p=1}^{i−1} (e_p − k_p)(1/f_p)v_p^T a_i                                                  (3.31)
        = Σ_{p=1}^{i−2} (e_p − k_p)(1/f_p)v_p^T a_i + (e_{i−1} − k_{i−1})(1/f_{i−1})v_{i−1}^T a_i    (3.32)
        = A_{i−2}† a_i + (e_{i−1} − k_{i−1})(1/f_{i−1})v_{i−1}^T a_i.                                (3.33)


To make this clearer, starting from the last column of K, the chain of required quantities can be shown as follows:

    k_n = A_{n−1}† a_n  requires  A_{n−2}† a_n  and  k_{n−1} = A_{n−2}† a_{n−1};
    A_{n−2}† a_n  requires  A_{n−3}† a_n  and  k_{n−2} = A_{n−3}† a_{n−2};
    k_{n−1} = A_{n−2}† a_{n−1}  requires  A_{n−3}† a_{n−1}  and  k_{n−2} = A_{n−3}† a_{n−2};
    and so on.

In other words, we need to compute every A_i† a_k, k = i + 1, . . . , n. Denote A_i† a_j, j > i, by k_{i,j}; in this sense, k_i = k_{i−1,i}. In the algorithm, k_{i,j}, j > i, is stored in the j-th column of K. If j = i + 1, then k_{i,j} = k_j and it is not updated any more; if j > i + 1, k_{i,j} is updated to k_{i+1,j} and still stored in the same position.

Based on the above discussion, and adding the numerical dropping strategy, we obtain the following algorithm, in which we omit the first subscript of k_{i,j}.

Algorithm 2 Vector-wise Greville preconditioning algorithm
1. set K = 0_{n×n}
2. for i = 1 : n
3.   u = a_i − A_{i−1}k_i
4.   if ‖u‖₂ is not small
5.     f_i = ‖u‖₂²
6.     v_i = u
7.   else
8.     f_i = ‖k_i‖₂² + 1
9.     v_i = (A_{i−1}†)^T k_i = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i
10.  end if
11.  for j = i + 1, . . . , n
12.    k_j = k_j + (v_i^T a_j / f_i)(e_i − k_i)
13.    perform numerical droppings on k_j
14.  end for
15. end for
16. Obtain K = [k_1, . . . , k_n], F = Diag{f_1, . . . , f_n}, V = [v_1, . . . , v_n].
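A NumPy sketch of the vector-wise construction without numerical dropping is given below; it also checks the factorization A† = (I − K)F^{−1}V^T of Theorem 2 on a small rank deficient matrix. The dependence test via the tolerance tol and the test matrix itself are assumptions of the sketch.

    import numpy as np

    # Vector-wise Greville construction (Algorithm 2 without dropping).
    def vector_wise_greville(A, tol=1e-10):
        m, n = A.shape
        K = np.zeros((n, n))
        V = np.zeros((m, n))
        f = np.zeros(n)
        for i in range(n):
            a = A[:, i]
            k = K[:, i].copy()                    # k_i = A_{i-1}^+ a_i, accumulated in earlier steps
            u = a - A[:, :i] @ k[:i]
            if np.linalg.norm(u) > tol * np.linalg.norm(a):
                f[i] = u @ u                      # independent column
                V[:, i] = u
            else:                                 # dependent column: v_i = (A_{i-1}^+)^T k_i
                f[i] = 1.0 + k @ k
                for p in range(i):
                    e_p = np.zeros(n); e_p[p] = 1.0
                    V[:, i] += (V[:, p] / f[p]) * ((e_p - K[:, p]) @ k)
            e_i = np.zeros(n); e_i[i] = 1.0
            for j in range(i + 1, n):             # line 12: k_j += (v_i^T a_j / f_i)(e_i - k_i)
                K[:, j] += (V[:, i] @ A[:, j]) / f[i] * (e_i - k)
        return K, f, V

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 6))    # rank 4, two dependent columns
    K, f, V = vector_wise_greville(A)
    M = (np.eye(6) - K) @ np.diag(1.0 / f) @ V.T                     # Theorem 2: (I - K) F^{-1} V^T
    print(np.linalg.norm(M - np.linalg.pinv(A)))                     # ~ machine precision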

4 The Special Case When A is Full Column Rank

In this section, we take a closer look at the full column rank case. When A has full column rank, Algorithm 2 can be simplified as follows.

Algorithm 3 Vector-wise Greville preconditioning algorithm for full column rank matrices
1. set K = 0_{n×n}
2. for i = 1 : n
3.   u_i = a_i − A_{i−1}k_i
4.   f_i = ‖u_i‖₂²
5.   for j = i + 1, . . . , n
6.     k_j = k_j + (u_i^T a_j / f_i)(e_i − k_i)
7.     perform numerical droppings on k_j
8.   end for
9. end for
10. Obtain K = [k_1, . . . , k_n], F = Diag{f_1, . . . , f_n}.

In Algorithm 3,

    u_i = a_i − A_{i−1}k_i
        = [a_1, . . . , a_i, 0, . . . , 0] (−k_{i,1}, . . . , −k_{i,i−1}, 1, 0, . . . , 0)^T
        = A_i(e_i − k_i)
        = A(e_i − k_i).

If we denote e_i − k_i by z_i, then u_i = Az_i.

Line 6 of Algorithm 3 can then be rewritten as

    k_j = k_j + (u_i^T a_j / ‖u_i‖₂²)(e_i − k_i),

that is,

    e_j − k_j = e_j − k_j − (u_i^T a_j / ‖u_i‖₂²)(e_i − k_i),
    z_j = z_j − (u_i^T a_j / ‖u_i‖₂²) z_i.

Denote d_i = ‖u_i‖₂² and θ = u_i^T a_j / d_i. Then, combining all the new notations, we can rewrite the algorithm as follows.

Algorithm 4
1. set Z = I_{n×n}
2. for i = 1 : n
3.   u_i = Az_i
4.   d_i = (u_i, u_i)
5.   for j = i + 1, . . . , n
6.     θ = (u_i, a_j)/d_i
7.     z_j = z_j − θz_i
8.   end for
9. end for
10. Obtain Z = [z_1, . . . , z_n], D = Diag{d_1, . . . , d_n}.

Remark 6 Since z_i = e_i − k_i, we have Z = I − K. Denoting D = Diag{d_1, . . . , d_n}, the factorization of A† in Theorem 2 can be rewritten as

    A† = ZD^{−1}Z^TA^T.                         (4.1)


Remark 7 In Algorithm 4, the definition of θ can be rewritten as

    θ = (u_i, a_j)/d_i                          (4.2)
      = (Az_i, Ae_j)/(Az_i, Az_i)               (4.3)
      = (z_i, e_j)_{A^TA} / (z_i, z_i)_{A^TA}.  (4.4)

When A has full column rank, A^TA is symmetric positive definite, so (·, ·)_{A^TA} defines an inner product. Hence, Algorithm 4 includes an A^TA-orthogonalization procedure. Based on this observation, we conclude that for the special case in which A has full column rank, our algorithm is very similar to the RIF preconditioner proposed by Benzi and Tůma [3]. The difference lies in the implementation: our algorithm looks for an incomplete factorization of A†, while RIF looks for an LDL^T decomposition of A^TA.
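A small numerical check of this observation: for a random full column rank matrix (an arbitrary test matrix assumed here), the vectors z_i produced by Algorithm 4 are mutually A^TA-orthogonal and reproduce the factorization (4.1).

    import numpy as np

    # Algorithm 4 as an A^T A-orthogonalization of e_1, ..., e_n.
    rng = np.random.default_rng(3)
    A = rng.standard_normal((12, 5))              # full column rank with probability one

    n = A.shape[1]
    Z = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        u = A @ Z[:, i]                           # u_i = A z_i
        d[i] = u @ u                              # d_i = (u_i, u_i)
        for j in range(i + 1, n):
            theta = (u @ A[:, j]) / d[i]          # theta = (u_i, a_j) / d_i
            Z[:, j] -= theta * Z[:, i]            # z_j = z_j - theta z_i

    print(np.linalg.norm(Z.T @ (A.T @ A) @ Z - np.diag(d)))                      # Z^T A^T A Z = D
    print(np.linalg.norm(Z @ np.diag(1.0 / d) @ Z.T @ A.T - np.linalg.pinv(A)))  # checks (4.1)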

5 Equivalence Condition

Consider solving the least squares problem (1.1) by transforming it into the left-preconditioned form

    min_{x∈R^n} ‖Mb − MAx‖₂,                    (5.1)

where A ∈ R^{m×n}, M ∈ R^{n×m}, and b ∈ R^m is a right-hand-side vector.

When preconditioning a least squares problem, one important issue is whether the solution of the preconditioned problem is a solution of the original problem. For nonsingular linear systems, the condition for this equivalence is that the preconditioner M should be nonsingular. Since we are dealing with general rectangular matrices, we need other conditions to ensure that the preconditioned problem (5.1) is equivalent to the original least squares problem (1.1).

First, note the following lemma [14].

Lemma 1 The conditions

    ‖b − Ax‖₂ = min_{x∈R^n} ‖b − Ax‖₂   and   ‖Mb − MAx‖₂ = min_{x∈R^n} ‖Mb − MAx‖₂

are equivalent for all b ∈ R^m if and only if R(A) = R(M^TMA).

To analyze the equivalence between the original problem (1.1) and the preconditioned problem (5.1), where M comes from any of our algorithms, we first consider the simple case in which A has full column rank. After numerical droppings, we have

    A† ≈ M = ZF^{−1}Z^TA^T.                     (5.2)

Note that Z is a unit upper triangular matrix and F is a diagonal matrix with positive elements. Hence, we have

    M = CA^T,                                   (5.3)


where C is a nonsingular matrix. Hence, we have

    R(M^T) = R(A).                              (5.4)

For the general case, K and F are still nonsingular. However, the expression for V is not straightforward. To simplify the problem, we make the following assumption.

Assumption 1
1. There is no zero column in A.
2. Our preconditioning algorithm can detect all the linear independence in A.

The definition of v_i can be rewritten as follows:

    v_i = a_i − Ak_i ∈ R(A_i) \ R(A_{i−1})          if a_i ∉ R(A_{i−1}),
          (A_{i−1}†)^T k_i ∈ R(A_{i−1}) = R(A_i)    if a_i ∈ R(A_{i−1}).    (5.5)

Hence, we have

    span{v_1, v_2, . . . , v_i} = span{a_1, a_2, . . . , a_i}.   (5.6)

On the other hand, note line 12 of Algorithm 1,

    M_i = M_{i−1} + (1/f_i)(e_i − k_i)v_i^T,    (5.7)

which implies that every row of M_i is a linear combination of the vectors v_k^T, 1 ≤ k ≤ i, i.e.,

    R(M_i^T) = span{v_1, . . . , v_i}.          (5.8)

Based on the above discussion, we obtain the following theorem.

Theorem 3 Let A ∈ R^{m×n}, m ≥ n. If Assumption 1 holds, then

    R(M^T) = R(V) = R(A),                       (5.9)

where M is the approximate Moore-Penrose inverse constructed by Algorithm 1.

Remark 8 In Assumption 1, we assume that our algorithms can detect all the linear independence in the columns of A. Hence, we allow mistakes in which a linearly dependent column is recognized as a linearly independent column. An extreme case is that we recognize all the columns of A as linearly independent, i.e., we take A to be a full column rank matrix. In this sense, our assumption can always be satisfied.

Hence, for any matrix A ∈ R^{m×n} we have R(A) = R(M^T), and we obtain the following theorem.

Theorem 4 If Assumption 1 holds, then for all b ∈ R^m, the preconditioned least squares problem (5.1), where M is constructed by Algorithm 1, is equivalent to the original least squares problem (1.1).
original least squares problem (1.1).


Proof If Assumption 1 holds, we have

    R(A) = R(M^T).                              (5.10)

Then there exists a nonsingular matrix C ∈ R^{n×n} such that A = M^TC. Hence,

    R(M^TMA) = R(M^TMM^TC)                      (5.11)
             = R(M^TMM^T)                       (5.12)
             = R(M^TM)                          (5.13)
             = R(M^T)                           (5.14)
             = R(A).                            (5.15)

In the above equalities we used the relationships R(MM^T) = R(M) and R(M^TM) = R(M^T). By Lemma 1 we complete the proof. ✷

6 Breakdown-free Condition

In this section, we assume without loss of generality that the first r columns of A are linearly independent, so that

    R(A) = span{a_1, . . . , a_r},              (6.1)

where rank(A) = r and a_i (i = 1, . . . , r) is the i-th column of A. This is no restriction, because a column pivoting can easily be incorporated into Algorithm 1. Then we have

    a_i ∈ span{a_1, . . . , a_r},  i = r + 1, . . . , n.   (6.2)

In this case, after performing Algorithm 1 with numerical dropping, the matrix V can be written in the form

    V = [v_1, . . . , v_r, v_{r+1}, . . . , v_n].   (6.3)

If we denote [v_1, . . . , v_r] by U_r, then

    U_r = A(I − K)I_r,   I_r = [I_{r×r}; 0] ∈ R^{n×r}.   (6.4)

There exists a matrix H ∈ R^{r×(n−r)}, possibly rank deficient, such that

    [v_{r+1}, . . . , v_n] = U_rH = A(I − K)I_rH.   (6.5)

Then V is given by

    V = [v_1, . . . , v_r, v_{r+1}, . . . , v_n]    (6.6)
      = [U_r, U_rH]                                 (6.7)
      = U_r [I_{r×r}  H]                            (6.8)
      = A(I − K)I_r [I_{r×r}  H]                    (6.9)
      = A(I − K) [I_{r×r}  H; 0  0].                (6.10)


Hence,

    M = (I − K)F^{−1} [I_{r×r}  0; H^T  0] (I − K)^T A^T.   (6.11)

From the above equation, we can also see that the difference between the full column rank case and the rank deficient case lies in the block matrix

    [I_{r×r}  0; H^T  0],                       (6.12)

which would be an identity matrix if A had full column rank.

If there is no numerical dropping, M is the Moore-Penrose inverse of A,

    A† = (I − K̂)F̂^{−1} [I_{r×r}  0; Ĥ^T  0] (I − K̂)^T A^T.   (6.13)

Comparing Equation (6.11) and Equation (6.13), we have the following theorem.

Theorem 5 Let A ∈ R^{m×n}. If Assumption 1 holds, then

    R(M) = R(A†) = R(A^T),                      (6.14)

where M denotes the approximate Moore-Penrose inverse constructed by Algorithm 1.

Based on Theorem 3 and Theorem 5, we have the following theorem, which ensures that the GMRES method can determine a solution to the preconditioned problem MAx = Mb before breakdown happens, for any b ∈ R^m.

Theorem 6 Let A ∈ R^{m×n}. If Assumption 1 holds, then GMRES can determine a least squares solution to

    min_{x∈R^n} ‖MAx − Mb‖₂                     (6.15)

before breakdown happens, for all b ∈ R^m, where M is constructed by Algorithm 1.

Proof According to Theorem 2.1 in Brown and Walker [6], we only need to prove

    N(MA) = N(A^TM^T),                          (6.16)

which is equivalent to

    R(MA) = R(A^TM^T).                          (6.17)

Using the result of Theorem 3, there exists a nonsingular matrix T ∈ R^{n×n} such that A = M^TT. Hence,

    R(MA) = R(MM^TT)                            (6.18)
          = R(MM^T)                             (6.19)
          = R(M).                               (6.20)

On the other hand,

    R(A^TM^T) = R(A^TAT^{−1})                   (6.21)
              = R(A^TA)                         (6.22)
              = R(A^T).                         (6.23)

The proof is completed using Theorem 5.


Remark 9 From the proof of the above theorem, we have R(MA) = R(M), which means that the preconditioned least squares problem (5.1) is a consistent problem.

Thus, by Theorem 4 and Theorem 6, we finally obtain our main result for our Greville preconditioner M.

Theorem 7 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m, preconditioned GMRES determines a least squares solution of

    min_{x∈R^n} ‖MAx − Mb‖₂                     (6.24)

before breakdown, and this solution attains min_{x∈R^n} ‖b − Ax‖₂, where M is computed by Algorithm 1.

Remark 10 Since all the proofs are based on the factorization M = (I − K)F^{−1}V^T, the theorems also hold for Algorithm 2.

7 Implementation Considerations

7.1 Detecting Linear Dependence

In Algorithm 1 and Algorithm 2, one important issue is how to recognize the columns which are linearly dependent on the previous columns. In the algorithms, we check whether ‖u_i‖₂ is small to detect rank deficiency. Simply speaking, we can set a tolerance τ in advance and switch to the "else" branch when ‖u_i‖₂ < τ. However, is this good enough to detect the linearly dependent columns of A when we perform numerical droppings? To address this issue, we first take a look at the RIF preconditioning algorithm.

The RIF preconditioner was developed for full rank matrices. However, numerical experiments showed that it also works for rank deficient matrices. Our equivalent Algorithm 3 gives a good insight into this phenomenon, since the only possibility for the RIF preconditioner to break down is when f_i = 0, which implies that u_i is a zero vector. From Algorithm 1 and Algorithm 2 we have

    u_i = a_i − A_{i−1}k_i                      (7.1)
        = a_i − A_{i−1}A_{i−1}†a_i              (7.2)
        = (I − A_{i−1}A_{i−1}†)a_i.             (7.3)

Thus, u_i is the projection of a_i onto R(A_{i−1})^⊥. Hence, in exact arithmetic u_i = 0 if and only if a_i ∈ R(A_{i−1}). Algorithm 1 and Algorithm 2 have an alternative branch for a_i ∈ R(A_{i−1}), i.e. when u_i = 0 they turn to the "else" case. However, this is not always necessary, because of the numerical droppings. With numerical droppings, the computed vector, which we denote by ū_i, is actually

    ū_i = a_i − A_{i−1}M_{i−1}a_i               (7.4)
        = a_i − A_{i−1}(A_{i−1}† + ΔE)a_i       (7.5)
        = a_i − A_{i−1}A_{i−1}†a_i − A_{i−1}ΔEa_i   (7.6)
        = −A_{i−1}ΔEa_i                         (7.7)
        ≠ 0,                                    (7.8)

where ΔE is an error matrix and (7.7) holds when a_i ∈ R(A_{i−1}). Hence, even though a_i ∈ R(A_{i−1}), since ū_i is not the exact projection of a_i onto R(A_{i−1})^⊥, the RIF algorithm will not necessarily break down when linear dependence happens.

The RIF preconditioner does not take the rank deficient or nearly rank deficient columns into consideration. Hence, if we can capture the rank deficient columns, we might achieve better acceleration of the convergence of the preconditioned problem. Assume that the M we compute by Algorithm 1 or Algorithm 2 can be viewed as an approximation to A† with error matrix E ∈ R^{n×m}, that is,

    M = A† + E.                                 (7.9)

First note a theoretical result about the perturbation lower bound of the generalized inverse.

Theorem 8 [18] If rank(A + E) ≠ rank(A), then

    ‖(A + E)† − A†‖₂ ≥ 1/‖E‖₂.                  (7.10)

By Theorem 8, if the rank of M = A† + E from our algorithm is not equal to the rank of A† (or A, since they have the same rank), then

    ‖M† − (A†)†‖₂ ≥ 1/‖E‖₂                      (7.11)
    ⇒ ‖M† − A‖₂ ≥ 1/‖E‖₂.                       (7.12)

The above inequality says that, if we denote M† = A + ΔA, then ‖ΔA‖₂ ≥ 1/‖E‖₂. Hence, when ‖E‖₂ is small, which means M is a good approximation to A†, M can be the exact generalized inverse of another matrix which is far from A, and the smaller ‖E‖₂ is, the further M† is from A. In this sense, if the rank of M is not the same as that of A, M may not be a good preconditioner.
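The discontinuity expressed by Theorem 8 is easy to observe numerically; in the sketch below, the rank-3 test matrix and the generic perturbation E (which raises the rank) are arbitrary choices made only for the illustration.

    import numpy as np

    # Illustration of Theorem 8: a rank-changing perturbation makes the
    # pseudoinverse jump by at least 1/||E||_2.
    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # rank 3
    E = 1e-6 * rng.standard_normal((6, 4))                          # generic E: rank(A + E) = 4

    lhs = np.linalg.norm(np.linalg.pinv(A + E) - np.linalg.pinv(A), 2)
    rhs = 1.0 / np.linalg.norm(E, 2)
    print(lhs >= rhs, lhs, rhs)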

Thus, it is important to keep the rank of M equal to rank(A). Hence, when we perform our algorithm, we need to sparsify the preconditioner M, but at the same time we also want to capture as many rank deficient columns as possible, and so maintain the rank of M. To achieve this, it is clearly important to decide whether the exact value ‖u_i‖₂ = ‖(I − A_{i−1}A_{i−1}†)a_i‖₂ is close to zero or not, based on the computed value ‖ū_i‖₂ = ‖(I − A_{i−1}M_{i−1})a_i‖₂.

From the previous discussion, we have

    ū_i = u_i − A_{i−1}ΔEa_i.                   (7.13)

When a_i ∈ R(A_{i−1}), u_i = a_i − A_{i−1}A_{i−1}†a_i = 0. Then

    ū_i = −A_{i−1}ΔEa_i,                        (7.14)
    ‖ū_i‖₂ ≤ ‖A_{i−1}‖_F ‖a_i‖₂ ‖ΔE‖_F.         (7.15)

If we require ΔE to be small, we can use a tolerance τ_s: if

    ‖ū_i‖₂ ≤ τ_s ‖A_{i−1}‖_F ‖a_i‖₂,            (7.16)

we suppose that we have detected a column a_i which is in the range space of A_{i−1}. From now on we call τ_s the switching tolerance.


An appropriate switching tolerance τ_s is important for the efficiency of the preconditioner. When we choose a small τ_s, e.g. 10^{−8}, we have less chance of detecting the linearly dependent columns. On the other hand, when a relatively large τ_s, e.g. 10^{−3}, is chosen, the preconditioning algorithm tends to find more linearly dependent columns. When a large τ_s is taken, many linearly dependent columns can be found; however, it is also possible that Assumption 1 is violated, so that the original least squares problem cannot be solved. When a relatively small τ_s is taken, it is possible that few or no linearly dependent columns are found, and the preconditioner can lose some efficiency. When τ_s is fixed, a larger dropping tolerance τ_d makes it less likely that linearly dependent columns are detected, and a smaller τ_d makes it more likely. Choosing optimal τ_s and dropping tolerance τ_d to achieve a balance between them remains an open problem.

Another issue related to the dropping tolerance τ_d is how much memory is needed to store the preconditioner. From the previous discussion, we know that R(M^T) = span{v_1, . . . , v_n}. When numerical dropping is performed on V, this relation may no longer hold. Hence, in Algorithm 2 we only perform numerical droppings on the columns of K. In this way, as long as Assumption 1 holds, we have R(M^T) = span{v_1, . . . , v_n}. However, on the other hand, we cannot control the sparsity of V directly. We will discuss this issue again at the end of this section.

7.2 Right-Preconditioning

So far we have assumed A ∈ R^{m×n} with m ≥ n and only discussed left-preconditioning. When m ≥ n, it is better to perform left-preconditioning, since the size of the preconditioned problem is smaller. When m ≤ n, right-preconditioning is better, i.e., solving the preconditioned problem

    min_{y∈R^m} ‖b − ABy‖₂.                     (7.17)

When m ≤ n, Theorem 3 and Theorem 5 still hold, since their proofs did not use the fact that m ≥ n. Then, according to the following lemma from [14], it is easy to obtain analogous conclusions for right-preconditioning.

Lemma 2 min_{x∈R^n} ‖b − Ax‖₂ = min_{y∈R^m} ‖b − ABy‖₂ holds for any b ∈ R^m if and only if R(A) = R(AB).

Theorem 9 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m we have min_{x∈R^n} ‖b − Ax‖₂ = min_{y∈R^m} ‖b − AMy‖₂, where M is a preconditioner constructed by Algorithm 1.

Proof From Theorem 5, we know that R(M) = R(A^T), which implies that there exists a nonsingular matrix C ∈ R^{m×m} such that M = A^TC. Hence,

    R(AM) = R(AA^TC)                            (7.18)
          = R(AA^T)                             (7.19)
          = R(A).                               (7.20)

Using Lemma 2 we complete the proof.


Theorem 10 Let A ∈ R^{m×n}. If Assumption 1 holds, then GMRES determines a solution to the right-preconditioned problem

    min_{y∈R^m} ‖b − AMy‖₂                      (7.21)

before breakdown happens.

Proof The proof is the same as that of Theorem 6. ✷

Theorem 11 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m, GMRES determines a least squares solution of

    min_{y∈R^m} ‖b − AMy‖₂                      (7.22)

before breakdown, and this solution attains min_{x∈R^n} ‖b − Ax‖₂, where M is computed by Algorithm 1.
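As a minimal illustration of the right-preconditioned formulation, the sketch below applies SciPy's GMRES to the m × m operator AB and recovers x = By. Here B = A^T is used as a simple stand-in for the Greville preconditioner (the simplest choice mentioned in the introduction), so this is not the paper's implementation.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    # Right preconditioning (7.17): solve min ||b - A B y||_2 with GMRES, then x = B y.
    rng = np.random.default_rng(5)
    m, n = 30, 50                                  # underdetermined case, m < n
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    B = A.T                                        # stand-in preconditioner

    op = LinearOperator((m, m), matvec=lambda y: A @ (B @ y))
    y, info = gmres(op, b)
    x = B @ y
    print(info, np.linalg.norm(A @ x - b))         # consistent system: small residual (up to the GMRES tolerance)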

We would like to remark that when m < n it is preferable to apply Algorithm 1 and Algorithm 2 to A^T rather than to A, for the following reasons. By doing so, we construct M̂, an approximate generalized inverse of A^T, and we can then use M̂^T as the preconditioner for the original least squares problem.

1. In Algorithm 1 and Algorithm 2, the approximate generalized inverse is constructed row by row. Hence, we perform a loop which goes through all the columns of A once. When m ≥ n, this loop is relatively short. However, when m < n, this loop could become very long, and the preconditioning would be more time-consuming.

2. Another reason is that linear dependence will always occur in this case: if m < n, then even when A has full row rank, at least n − m columns of A are linearly dependent on the previous columns.

In our implementation we use the following left-looking variant of Algorithm 2, in which each column k_i receives all of its updates at once.

Algorithm 5 Left-looking vector-wise Greville preconditioning algorithm
1. set K = 0_{n×n}
2. for i = 1 : n
3.   for j = 1, . . . , i − 1
4.     k_i = k_i + (v_j^T a_i / f_j)(e_j − k_j)
5.     perform numerical droppings on k_i
6.   end for
7.   u = a_i − A_{i−1}k_i
8.   if ‖u‖₂ > τ_s ∗ ‖A_{i−1}‖_F ∗ ‖a_i‖₂
9.     f_i = ‖u‖₂²
10.    v_i = u
11.  else
12.    f_i = ‖k_i‖₂² + 1
13.    v_i = (A_{i−1}†)^T k_i = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i
14.  end if
15. end for
16. Obtain K = [k_1, . . . , k_n], F = Diag{f_1, . . . , f_n}, V = [v_1, . . . , v_n].

In Algorithm 2, all the columns k_j, j = i + 1, . . . , n, are updated within one outer loop, whereas in Algorithm 5 all the updates for each column k_i are finished at one time. This left-looking approach is suggested in [1,5]; according to [1], a left-looking version is sometimes more robust and efficient. In Section 8, our preconditioners are computed by Algorithm 5.

Since we cannot control the sparsity of V, and V can be very dense, we try not to store V. Note that when a column a_j is recognized as linearly independent, the corresponding column is v_j = A(e_j − k_j). Hence, we do not have to store v_j: when we need v_j in the 4th line of Algorithm 5, we can compute θ = a_i^T A(e_j − k_j), where the row vector a_i^T A is computed once, used for the i − 1 values of j, and then discarded. For a v_j corresponding to a linearly dependent column, we do need to store it.

8 Numerical Examples

In this section, we use matrices from the University of Florida Sparse Matrix Collection [11] to test our algorithms, where zero rows are omitted, and if the original matrix has more columns than rows we use its transpose. All computations were run on a Dell Precision 690 with a 3 GHz CPU and 16 GB of memory; the programming language and compiling environment was GNU C/C++ 3.4.3 under Red Hat Linux. Detailed information on the matrices is given in Table 1.


Table 1 Information on the test matrices

Name       m      n      rank   density(%)   cond
vtp base   346    198    198    1.53         3.55 × 10^7
capri      482    271    271    1.45         8.09 × 10^3
scfxm1     600    330    330    1.38         2.42 × 10^4
scfxm2     1200   660    660    0.69         2.42 × 10^4
perold     1560   625    625    0.65         5.13 × 10^5
scfxm3     1800   990    990    0.51         1.39 × 10^3
maros      1966   846    846    0.61         1.95 × 10^6
pilot we   2928   722    722    0.44         4.28 × 10^5
stocfor2   3045   2157   2157   0.14         2.84 × 10^4
bnl2       4486   2324   2324   0.14         7.77 × 10^3
pilot      4860   1441   1441   0.63         2.66 × 10^3
d2q06c     5831   2171   2171   0.26         1.33 × 10^5
80bau3b    11934  2262   2262   0.09         567.23
beaflw     500    492    460    21.71        1.52 × 10^9
beause     505    492    459    17.93        6.74 × 10^8
lp cycle   3371   1890   1875   0.33         1.46 × 10^7
Pd rhs     5804   4371   4368   0.02         3.36 × 10^8
Model9     10939  2879   2787   0.18         7.90 × 10^20
landmark   71952  2673   2671   0.60         1.02 × 10^8

In Table 1, the matrices in the first group have full column rank, and the matrices in the second group are rank deficient. The condition number of A is given by σ_1(A)/σ_r(A), where r = rank(A). We construct the preconditioner M by Algorithm 5 and perform the BA-GMRES method [14] given below, where B is the preconditioner.

Algorithm 6 BA-GMRES
1. Choose x_0, r̃_0 = B(b − Ax_0)
2. v_1 = r̃_0/‖r̃_0‖₂
3. for i = 1, 2, . . . , k
4.   w_i = BAv_i
5.   for j = 1, 2, . . . , i
6.     h_{j,i} = (w_i, v_j)
7.     w_i = w_i − h_{j,i}v_j
8.   end for
9.   h_{i+1,i} = ‖w_i‖₂
10.  v_{i+1} = w_i/h_{i+1,i}
11.  Find y_i ∈ R^i which minimizes ‖r̃_i‖₂ = ‖ ‖r̃_0‖₂ e_1 − H̄_i y ‖₂
12.  x_i = x_0 + [v_1, . . . , v_i]y_i
13.  r_i = b − Ax_i
14.  if ‖A^Tr_i‖₂ < ε stop
15. end for
16. x_0 = x_k
17. Go to 1.
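A condensed sketch of the idea behind BA-GMRES is given below: GMRES is run on the n × n preconditioned system BAx = Bb. SciPy's restarted GMRES is used in place of Algorithm 6 and its ‖A^Tr‖-based stopping rule, and B = A^T again stands in for the Greville preconditioner, so this only illustrates the left-preconditioned formulation, not the implementation used in the experiments.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    # Left-preconditioned GMRES for min ||b - A x||_2: solve B A x = B b.
    rng = np.random.default_rng(6)
    m, n = 200, 80
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)                     # random b: the problem is inconsistent
    B = A.T                                        # stand-in for the Greville preconditioner

    op = LinearOperator((n, n), matvec=lambda x: B @ (A @ x))
    x, info = gmres(op, B @ b)
    print(info, np.linalg.norm(A.T @ (b - A @ x)) / np.linalg.norm(A.T @ b))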

BA-GMRES is a method that solves least squares problems with GMRES by preconditioning the original problem with a suitable left preconditioner.

In the following examples, the right-hand-side vectors b are generated randomly by Matlab, so that the least squares problems we solve are inconsistent. In this section, our dropping rule is to drop the i-th element of k_j, i.e., k_{j,i}, when the following two inequalities hold:

    |k_{j,i}| ‖a_i‖₂ < τ_d   and   |k_{j,i}| < τ_d′ · max_{l=1,...,i−1} |k_{l,i}|.   (8.1)

Here we have another tolerance τ_d′; in the following examples, we simply fix it to 10.0 ∗ τ_d. We use

    ‖u_i‖₂ ≤ τ_s ‖A_{i−1}‖_F ‖a_i‖₂             (8.2)

as the criterion to judge whether column a_i is linearly dependent on the previous columns, where τ_s is our switching tolerance. The stopping rule for BA-GMRES is

    ‖A^T(b − Ax)‖₂ ≤ 10^{−8} · ‖A^Tb‖₂.         (8.3)

In the following tables, we use our preconditioner with GMRES and compare it with GMRES preconditioned by RIF, denoted by "RIF-GMRES"; CGLS [4] preconditioned by the Incomplete Modified Gram-Schmidt orthogonalization preconditioner (IMGS) [15], denoted by "IMGS-CGLS"; CGLS preconditioned by diagonal scaling, denoted by "D-CGLS"; and CGLS preconditioned by the Incomplete Cholesky Factorization preconditioner [4], denoted by "IC-CGLS", all solving the original least squares problem min_{x∈R^n} ‖b − Ax‖₂. The columns τ_d and τ_s list the dropping tolerance and the switching tolerance used in our preconditioning algorithm. The column "ITS" gives the number of iterations. The columns "Pre. T" and "Tot. T" give the preconditioning time and the total time in seconds, respectively. For "RIF-GMRES", "IMGS-CGLS", "D-CGLS" and "IC-CGLS", each cell gives the number of iterations and the total CPU time; for "RIF-GMRES" and "IMGS-CGLS", we give the parameters we used in brackets. In the "nnzP/nnzA" column, we give the ratio of the number of nonzero elements in our preconditioner to that in A, where nnzP is the number of nonzero elements in K plus the number of nonzero elements in the diagonal matrix F. The ∗ indicates the fastest method for each problem.

Table 2 Numerical Results for Full Rank Matrices

matrix     τ_d    nnzP/nnzA  ITS  Pre. T  Tot. T   RIF-GMRES           IMGS-CGLS           D-CGLS        IC-CGLS
vtp base   1.e-6  18.0       4    0.03    *0.03    5, 0.04 (1.e-9)     9, 0.10 (1.e-5)     3667, 0.29    1348, 0.12
capri      1.e-3  10.0       17   0.03    0.04     9, 0.07 (1.e-4)     371, 0.04 (1.0)     496, 0.06     195, *0.03
scfxm1     1.e-3  13.3       17   0.05    0.06     7, *0.05 (1.e-5)    116, 0.08 (1.e-1)   761, 0.11     318, 0.06
scfxm2     1.e-3  16.6       27   0.12    *0.18    12, 0.26 (1.e-5)    734, 0.28 (1.0)     1332, 0.42    680, 0.23
perold     1.e-2  12.4       17   0.17    0.21     45, 0.36 (1.e-4)    134, 0.26 (1.e-1)   1058, 0.36    394, *0.14
scfxm3     1.e-4  27.0       7    0.30    *0.34    12, 0.46 (1.e-5)    878, 0.53 (1.0)     1583, 0.72    847, 0.41
maros      1.e-6  28.8       3    1.02    *1.05    10, 1.13 (1.e-7)    383, 1.47 (1.e-1)   13714, 6.71   3602, 1.8
pilot we   1.e-1  4.5        41   0.10    0.20     12, 0.4 (1.e-5)     536, 0.37 (1.0)     736, 0.38     287, *0.14
stocfor2   1.e-5  152.2      5    10.55   10.68    16, 11.5 (1.e-7)    138, 3.14 (1.e-1)   2147, 1.59    1362, *1.12
bnl2       1.e-1  3.0        114  0.4     1.22     127, 2.79 (1.e-2)   598, 1.24 (1.0)     657, 0.64     240, *0.36
pilot      1.e-2  2.9        18   1.54    1.72     359, 4.57 (1.e-1)   741, 1.49 (1.0)     822, 1.24     269, *0.75
d2q06c     1.e-3  18.7       32   4.27    5.15     19, 10.68 (1.e-5)   1209, 3.6 (1.0)     2743, 4.05    1077, *1.63
80bau3b    5.e-1  0.5        28   0.46    0.52     26, 0.54 (5.e-1)    47, 1.34 (1.0)      56, *0.09     21, 0.21

For full rank matrices, we did not list τ_s, because we simply set τ_s to a very small number and do not try to detect any linearly dependent columns in the coefficient matrices. From the numerical results in Table 2, we conclude that our preconditioner performs competitively when the matrices are ill-conditioned. We can also see that our preconditioner is usually much denser than the original matrix A when the density of A is larger than 0.5%. When the density of A is less than 0.3%, the density of our preconditioner is usually similar to or less than that of A. In Table 3 we show the numerical results for rank deficient matrices. Column "D" gives the number of linearly dependent columns that we detected.

Table 3 Numerical Results for Rank Deficient Matrices

matrix     τ_d, τ_s        nnzP/nnzA  D    ITS   Pre. T   Tot. T
beaflw     1.e-5, 1.e-10   2.25       26   209   1.22     *2.25
beause     1.e-5, 1.e-11   2.71       31   20    1.16     *1.22
lp cycle   1.e-4, 1.e-7    33.5       15   27    2.17     *2.68
Pd rhs     1.e-4, 1.e-8    0.75       3    3     0.79     0.80
model09    1.e-2, 1.e-6    2.92       0    12    0.97     *1.12
landmark   1.e-1, 1.e-8    0.05       0    143   2.83     10.37

matrix     RIF-GMRES          IMGS-CGLS            D-CGLS       IC-CGLS
beaflw     †                  †                    †            †
beause     †                  †                    †            †
lp cycle   †                  5832, 4.87 (100.0)   6799, 6.74   †
Pd rhs     †                  25, 0.93 (0.1)       242, *0.29   89, 0.54
model09    46, 2.53 (1.e-3)   †                    1267, 3.09   412, 1.22
landmark   239, 38.43 (10.0)  252, 36.68 (10.0)    252, *8.77   †

†: GMRES did not converge in 2,000 steps or D-CGLS did not converge in 25,000 steps.

From the results in Table 3 we conclude that our preconditioner works for all the tested problems and performs competitively with the other preconditioners. The density of our preconditioner is about 3 times that of the original matrix A or less, except for lp cycle.

For the matrix lp cycle, we know that the rank deficient columns are

    182   184   216   237   253
    717   754   961   1221  1239
    1260  1261  1278  1640  1859,                (8.4)

15 columns in all. Since the rank deficient columns are not unique, the columns listed above are the columns which are linearly dependent on their previous columns. In the following example, we can see that our preconditioning algorithm can detect many of them.

Table 4 Numerical Results for lp cycle

τ_d     τ_s      deficiency detected        ITS   Pre. T   Its. T   Tot. T
1.e-6   1.e-10   −1239, −1261, −1278        6     3.65     0.16     3.81
1.e-6   1.e-7    detected all exactly       4     3.7      0.10     3.8
1.e-6   1.e-5    detected 95 columns        †                       4.7

In the above table, we tested our preconditioning algorithm with the dropping tolerance fixed to τ_d = 10^{−6} and different switching tolerances τ_s. For this problem lp cycle, RIF-preconditioned GMRES did not converge in 2000 steps for all the dropping tolerances τ_d we tried. The column "deficiency detected" lists the linearly dependent columns detected by our algorithm; an entry −j means that the j-th column, which is a rank deficient column, was missed by our algorithm. For example, when τ_d = 10^{−6} and τ_s = 10^{−10}, our preconditioning algorithm detected 12 rank deficient columns, missed the 1239-th, 1261-th and 1278-th columns, and did not wrongly label any linearly independent column as dependent; hence, Assumption 1 is satisfied. When τ_d = 10^{−6} and τ_s = 10^{−7}, our preconditioning algorithm found exactly the 15 linearly dependent columns. When τ_d = 10^{−6} and τ_s = 10^{−5}, many more than 15 columns were detected; thus Assumption 1 is not satisfied, and the preconditioned GMRES did not converge. From this example we can see that the numerical results agree with our theory very well. Figure 1 gives a better insight into this situation.

[Fig. 1 Convergence curves for lp cycle with τ_d = 10^{−6}: log_{10} ‖A^Tr‖₂/‖A^Tb‖₂ versus the iteration number, for switching tolerances τ_s = 10^{−5}, 10^{−7} and 10^{−10}.]

We observe that when the switching tolerance is τ_s = 10^{−5}, ‖A^Tr‖₂/‖A^Tb‖₂ stays at the level 10^{−1} in the end, which means that the computed solution is not a solution to the original least squares problem. This phenomenon illustrates that Assumption 1 is necessary.

9 Conclusion

In this paper, we proposed a new preconditioner for least squares problems. When the matrix A has full column rank, our preconditioning method is similar to the RIF preconditioner [3]. When A is rank deficient, our preconditioner is also rank deficient. We proved that under Assumption 1, using our preconditioners, the preconditioned problems are equivalent to the original problems. Also, under the same assumption, we showed that GMRES determines a solution to the preconditioned problem. Numerical experiments confirmed our theory and Assumption 1.

Numerical results showed that our preconditioner performed competitively both for ill-conditioned full column rank matrices and for ill-conditioned rank deficient matrices. For rank deficient problems, the RIF preconditioner may not work, whereas our preconditioner worked efficiently.

References

1. M. Benzi and M. Tůma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM Journal on Scientific Computing, 19 (1998), pp. 968–994.
2. M. Benzi and M. Tůma, A robust incomplete factorization preconditioner for positive definite matrices, Numerical Linear Algebra with Applications, 10 (2003), pp. 385–400.
3. M. Benzi and M. Tůma, A robust preconditioner with low memory requirements for large sparse least squares problems, SIAM Journal on Scientific Computing, 25 (2003), pp. 499–512.
4. Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.
5. M. Bollhöfer and Y. Saad, On the relations between ILUs and factored approximate inverses, SIAM Journal on Matrix Analysis and Applications, 24 (2002), pp. 219–237.
6. P. N. Brown and H. F. Walker, GMRES on (nearly) singular systems, SIAM Journal on Matrix Analysis and Applications, 18 (1997), pp. 37–51.
7. R. Bru, J. Cerdán, J. Marín, and J. Mas, Preconditioning sparse nonsymmetric linear systems with the Sherman–Morrison formula, SIAM Journal on Scientific Computing, 25 (2003), pp. 701–715.
8. R. Bru, J. Marín, J. Mas, and M. Tůma, Balanced incomplete factorization, SIAM Journal on Scientific Computing, 30 (2008), pp. 2302–2318.
9. S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations, Pitman, London, 1979. Reprinted by Dover, New York, 1991.
10. E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM Journal on Scientific Computing, 19 (1998), pp. 995–1023.
11. T. A. Davis, University of Florida sparse matrix collection, NA Digest, 92 (1994).
12. J. A. Fill and D. E. Fishkind, The Moore–Penrose generalized inverse for sums of matrices, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 629–635.
13. T. N. E. Greville, Some applications of the pseudoinverse of a matrix, SIAM Review, 2 (1960), pp. 15–22.
14. K. Hayami, J.-F. Yin, and T. Ito, GMRES methods for least squares problems, Tech. Report NII-2007-009E, National Institute of Informatics, Tokyo, July 2007 (also to appear in SIAM Journal on Matrix Analysis and Applications).
15. Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, second ed., 2003.
16. Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
17. G. Wang, Y. Wei, and S. Qiao, Generalized Inverses: Theory and Computations, Science Press, Beijing, 2003.
18. P.-Å. Wedin, Perturbation theory for pseudo-inverses, BIT Numerical Mathematics, 13 (1973), pp. 217–232.
19. J.-F. Yin and K. Hayami, Preconditioned GMRES methods with incomplete Givens orthogonalization method for large sparse least-squares problems, Journal of Computational and Applied Mathematics, 226 (2009), pp. 177–186.
20. N. Zhang and Y. Wei, On the convergence of general stationary iterative methods for range-Hermitian singular linear systems, Numerical Linear Algebra with Applications, published online (2009).
