Noname manuscript No.
(will be inserted by the editor)

Greville’s Method for Preconditioning Least Squares Problems

Xiaoke Cui · Ken Hayami · Jun-Feng Yin

Received: date / Accepted: date
Abstract In this paper, we propose a preconditioning algorithm for least squares problems min_{x∈R^n} ‖b − Ax‖_2, where the matrix A may have any shape and any rank. The preconditioner is constructed to be a sparse approximation to the Moore-Penrose inverse of the coefficient matrix A. For this preconditioner, we give a theoretical analysis showing that, under our assumption, the least squares problem preconditioned by this preconditioner is equivalent to the original problem, and that the GMRES method can determine a solution to the preconditioned problem before breakdown happens. At the end of the paper, we also give some numerical examples showing the performance of the method.

Keywords Least Squares Problem · Preconditioning · Moore-Penrose Inverse · Greville Algorithm · GMRES

Mathematics Subject Classification (2000) 65F08 · 65F10
This research was supported by the Grants-in-Aid for Scientific Research (C) of the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Xiaoke Cui
Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (Sokendai), 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan, 101-8430.
Presently Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, Japan, 277-8568.
E-mail: xkcui@sml.k.u-tokyo.ac.jp

Ken Hayami
National Institute of Informatics, Japan, also Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (Sokendai), 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan, 101-8430.
Tel.: +81-3-4212-2553
E-mail: hayami@nii.ac.jp

Jun-Feng Yin
Department of Mathematics, Tongji University, 1239 Siping Road, Shanghai 200092, P. R. China.
E-mail: yinjf@tongji.edu.cn
1 Introduction

Consider the least squares problem

min_{x∈R^n} ‖b − Ax‖_2, (1.1)

where A ∈ R^{m×n} and b ∈ R^m.

To solve least squares problems, there are two main kinds of methods: direct methods and iterative methods [4]. When A is large and sparse, iterative methods, especially Krylov subspace methods, are preferred for solving (1.1), since they require less memory and computational work than direct methods. In [14], Hayami et al. proposed solving least squares problems with GMRES [16] combined with suitable preconditioners. Preconditioning (1.1) from the left with a matrix B ∈ R^{n×m} transforms problem (1.1) into

min_{x∈R^n} ‖Bb − BAx‖_2. (1.2)

On the other hand, we can also precondition problem (1.1) from the right and transform it into

min_{y∈R^m} ‖b − ABy‖_2. (1.3)

In [14], necessary and sufficient conditions on B such that (1.2) and (1.3), respectively, are equivalent to (1.1) were given.
There are different ways to construct the preconditioner B. The simplest choice is B = A^T. If rank(A) = n, one may use B = CA^T where C = diag(A^T A)^{−1}, or better, use the Robust Incomplete Factorization (RIF) [2,3] to construct C [14]. Another possibility is to use the incomplete Givens orthogonalization method to construct B [19]. However, so far no efficient preconditioner has been proposed for the rank deficient case. In this paper, we construct a preconditioner which also works when A is rank deficient. The idea comes from preconditioning a nonsingular linear system

Ax = b (1.4)

with A ∈ R^{n×n} by an approximate inverse of the coefficient matrix; this kind of preconditioner was originally developed for solving nonsingular linear systems [10,15]. Since A is now a general matrix, we construct a matrix M ∈ R^{n×m} which is an approximation to the Moore-Penrose inverse [9,17] of A, and use M to precondition the least squares problem (1.1). When A is rank deficient, the preconditioner M can also be rank deficient. Note, in passing, that in [20] it was shown that for singular linear systems, singular preconditioners are sometimes better than nonsingular preconditioners.

The main contribution of this paper is to give a new way to precondition general least squares problems from the perspective of the approximate Moore-Penrose inverse. Similar to the RIF preconditioner [2,3], our method also includes an A^T A-orthogonalization process when the coefficient matrix has full column rank. When A is rank deficient, our method tries to orthogonalize the linearly independent part of A. We also give a theoretical analysis of the equivalence between the preconditioned problem and the original problem, and discuss the possibility of breakdown when GMRES is used to solve the preconditioned system.
The rest of the paper is organized as follows. In Section 2, we first introduce some properties of the Moore-Penrose inverse of a rank-one updated matrix and the original Greville method. In Section 3, based on Greville's method, we propose a global algorithm and a vector-wise algorithm for constructing the preconditioner M, an approximate generalized inverse of A. In Section 4, we show that for a full column rank matrix A, our algorithm is similar to the RIF preconditioning algorithm [3] and includes an A^T A-orthogonalization process. In Section 5 and Section 6, we prove that, under a certain assumption, the problem preconditioned with M is equivalent to the original problem, and that the GMRES method can determine a solution to the preconditioned problem before breakdown happens. In Section 7, we consider some implementation details of our algorithms. Numerical results are presented in Section 8. Section 9 concludes the paper.
2 Greville’s Method

Given a rectangular matrix A ∈ R^{m×n} with rank(A) = r ≤ min{m, n}, assume that the Moore-Penrose inverse of A is known. We are interested in how to compute the Moore-Penrose inverse of

A + cd^T, c ∈ R^m, d ∈ R^n, (2.1)

which is a rank-one update of A [9,12]. In [9], the following six logical possibilities are considered:

1. c ∉ R(A), d ∉ R(A^T) and 1 + d^T A^† c arbitrary,
2. c ∈ R(A), d ∉ R(A^T) and 1 + d^T A^† c = 0,
3. c ∈ R(A), d arbitrary and 1 + d^T A^† c ≠ 0,
4. c ∉ R(A), d ∈ R(A^T) and 1 + d^T A^† c = 0,
5. c arbitrary, d ∈ R(A^T) and 1 + d^T A^† c ≠ 0,
6. c ∈ R(A), d ∈ R(A^T) and 1 + d^T A^† c = 0.

Here R(A) denotes the range space of A. For each possibility, an expression for the Moore-Penrose inverse of the rank-one update of A is given by the following theorem [9].
Theorem 1 For A ∈ R^{m×n}, c ∈ R^m, d ∈ R^n, let k = A^† c, h = d^T A^†, u = (I − AA^†)c, v = d^T(I − A^† A), and β = 1 + d^T A^† c. Note that

c ∈ R(A) ⇔ u = 0, (2.2)
d ∈ R(A^T) ⇔ v = 0. (2.3)

Then, the Moore-Penrose inverse of A + cd^T is given as follows.

1. If u ≠ 0 and v ≠ 0, then (A + cd^T)^† = A^† − k u^† − v^† h + β v^† u^†.
2. If u = 0, v ≠ 0 and β = 0, then (A + cd^T)^† = A^† − kk^† A^† − v^† h.
3. If u = 0 and β ≠ 0, then (A + cd^T)^† = A^† + (1/β̄) v^T k^T A^† − (β̄/σ_1) p_1 q_1^T, where p_1 = −(‖k‖_2^2/β̄) v^T + k, q_1^T = −(‖v‖_2^2/β̄) k^T A^† + h, and σ_1 = ‖k‖_2^2 ‖v‖_2^2 + |β|^2.
4. If u ≠ 0, v = 0 and β = 0, then (A + cd^T)^† = A^† − A^† h^† h − k u^†.
5. If v = 0 and β ≠ 0, then (A + cd^T)^† = A^† + (1/β̄) A^† h^T u^T − (β̄/σ_2) p_2 q_2^T, where p_2 = −(‖h‖_2^2/β̄) A^† h^T + k, q_2^T = −(‖u‖_2^2/β̄) u^T + h, and σ_2 = ‖h‖_2^2 ‖u‖_2^2 + |β|^2.
6. If u = 0, v = 0 and β = 0, then (A + cd^T)^† = A^† − kk^† A^† − A^† h^† h + (k^† A^† h^†) kh.
To utilize the above theorem, let

A = Σ_{i=1}^n a_i e_i^T, (2.4)

where a_i is the i-th column of A. Further let A_i = [a_1, ..., a_i, 0, ..., 0] ∈ R^{m×n}. Hence we have

A_i = Σ_{k=1}^i a_k e_k^T, i = 1, ..., n, (2.5)

and, if we denote A_0 = 0_{m×n},

A_i = A_{i−1} + a_i e_i^T, i = 1, ..., n. (2.6)

Thus every A_i, i = 1, ..., n, is a rank-one update of A_{i−1}. Noticing that A_0^† = 0_{n×m}, we can utilize Theorem 1 to compute the Moore-Penrose inverse of A step by step and obtain A^† = A_n^† in the end.

In Theorem 1, substituting a_i for c and e_i for d, we can rewrite Equation (2.2) as

a_i ∈ R(A_{i−1}) ⇔ u = (I − A_{i−1} A_{i−1}^†) a_i = 0. (2.7)

No matter whether a_i is linearly dependent on the previous columns or not, we have for any i = 1, 2, ..., n

β = 1 + e_i^T A_{i−1}^† a_i = 1. (2.8)

The vector v in Equation (2.3) can be written as

v = e_i^T (I − A_{i−1}^† A_{i−1}) ≠ 0, (2.9)

which is nonzero for any i = 1, ..., n. Hence, we can use Case 1 and Case 3 for the case a_i ∉ R(A_{i−1}) and the case a_i ∈ R(A_{i−1}), respectively.

Then from Theorem 1, denoting A_0 = 0_{m×n}, we obtain a method to compute A_i^† from A_{i−1}^†:

A_i^† = A_{i−1}^† + (e_i − A_{i−1}^† a_i)((I − A_{i−1} A_{i−1}^†) a_i)^†       if a_i ∉ R(A_{i−1}),
A_i^† = A_{i−1}^† + (1/σ_i)(e_i − A_{i−1}^† a_i)(A_{i−1}^† a_i)^T A_{i−1}^†    if a_i ∈ R(A_{i−1}), (2.10)

where σ_i = 1 + ‖A_{i−1}^† a_i‖_2^2. This method was proposed by Greville in 1960 [13].
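For concreteness, the following short Python/NumPy sketch implements the recursion (2.10) in dense arithmetic and without any dropping; the tolerance used to decide whether u_i vanishes is an illustrative choice on our part, not part of Greville's method.

```python
import numpy as np

def greville_pinv(A, tol=1e-12):
    """Greville's column-by-column construction of the Moore-Penrose inverse,
    following recursion (2.10); dense arithmetic, no dropping."""
    m, n = A.shape
    Apinv = np.zeros((n, m))                 # A_0^dagger = 0
    Ai = np.zeros((m, n))                    # A_i, columns filled in one at a time
    for i in range(n):
        a = A[:, i]
        k = Apinv @ a                        # k_i = A_{i-1}^dagger a_i
        u = a - Ai @ k                       # u_i = (I - A_{i-1} A_{i-1}^dagger) a_i
        e = np.zeros(n); e[i] = 1.0
        if np.linalg.norm(u) > tol * max(np.linalg.norm(a), 1.0):
            # a_i linearly independent of the previous columns: Case 1 of Theorem 1
            Apinv = Apinv + np.outer(e - k, u) / (u @ u)
        else:
            # a_i in R(A_{i-1}): Case 3 of Theorem 1
            Apinv = Apinv + np.outer(e - k, Apinv.T @ k) / (1.0 + k @ k)
        Ai[:, i] = a
    return Apinv

# sanity check against the SVD-based pseudoinverse
A = np.random.rand(7, 4)
A[:, 3] = A[:, 0] + A[:, 1]                  # make A rank deficient
print(np.allclose(greville_pinv(A), np.linalg.pinv(A)))
```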
3 Preconditioning Algorithm

In this section, we construct our preconditioning algorithm from Greville's method of Section 2. First of all, we notice that in Equation (2.10) the difference between the case a_i ∉ R(A_{i−1}) and the case a_i ∈ R(A_{i−1}) lies in the second term. If we define vectors k_i, u_i, v_i and a scalar f_i for i = 1, ..., n as

k_i = A_{i−1}^† a_i, (3.1)
u_i = a_i − A_{i−1} k_i = (I − A_{i−1} A_{i−1}^†) a_i, (3.2)
f_i = ‖u_i‖_2^2        if a_i ∉ R(A_{i−1}),
      1 + ‖k_i‖_2^2    if a_i ∈ R(A_{i−1}), (3.3)
v_i = u_i                  if a_i ∉ R(A_{i−1}),
      (A_{i−1}^†)^T k_i    if a_i ∈ R(A_{i−1}), (3.4)
we can express A_i^† in a unified form for general matrices as

A_i^† = A_{i−1}^† + (1/f_i)(e_i − k_i) v_i^T, (3.5)

and we have

A^† = Σ_{i=1}^n (1/f_i)(e_i − k_i) v_i^T. (3.6)

If we define matrices K ∈ R^{n×n}, V ∈ R^{m×n}, F ∈ R^{n×n} as

K = [k_1, ..., k_n], (3.7)
V = [v_1, ..., v_n], (3.8)
F = diag(f_1, ..., f_n), (3.9)

we obtain a matrix factorization of A^† as follows.
Theorem 2 Let A ∈ R^{m×n} with rank(A) = r ≤ min{m, n}. Using the above notation, the Moore-Penrose inverse of A has the factorization

A^† = (I − K) F^{−1} V^T. (3.10)

Here I is the identity matrix of order n, K is a strictly upper triangular matrix, and F is a diagonal matrix whose diagonal elements are all positive.

If A has full column rank, then

V = A(I − K), (3.11)
A^† = (I − K) F^{−1} (I − K)^T A^T. (3.12)

Proof Denote Ā_i = [a_1, ..., a_i]. Then, since

k_i = A_{i−1}^† a_i (3.13)
    = [a_1, ..., a_{i−1}, 0, ..., 0]^† a_i (3.14)
    = [Ā_{i−1}, 0, ..., 0]^† a_i (3.15)
    = [Ā_{i−1}^† ; 0] a_i (3.16)
    = (k_{i,1}, ..., k_{i,i−1}, 0, ..., 0)^T, (3.17)

K = [k_1, ..., k_n] is a strictly upper triangular matrix.

Since u_i = 0 ⇔ a_i ∈ R(A_{i−1}),

f_i = ‖u_i‖_2^2        if a_i ∉ R(A_{i−1}),
      1 + ‖k_i‖_2^2    if a_i ∈ R(A_{i−1}). (3.18)

Thus the f_i (i = 1, ..., n) are always positive, which implies that F is a diagonal matrix with positive diagonal elements.

If A has full column rank, we have

V = [u_1, ..., u_n] (3.19)
  = [(I − A_0 A_0^†) a_1, ..., (I − A_{n−1} A_{n−1}^†) a_n] (3.20)
  = [a_1 − A_0 k_1, ..., a_n − A_{n−1} k_n] (3.21)
  = A − [A_0 k_1, ..., A_{n−1} k_n] (3.22)
  = A − [A_1 k_1, ..., A_n k_n] (3.23)
  = A(I − K). (3.24)

The last two equalities follow from the fact that K is strictly upper triangular: the entries of k_i from position i onward are zero, so A_{i−1} k_i = A_i k_i = A k_i. Now, when A has full column rank, A^† can be decomposed as

A^† = (I − K) F^{−1} V^T = (I − K) F^{−1} (I − K)^T A^T. ✷ (3.25)

Remark 1 From the above proof it is easy to see that, when A has full column rank, (I − K)^{−T} F (I − K)^{−1} is an LDL^T decomposition of A^T A.
Based on Greville's method, we can construct a preconditioning algorithm. We want to construct a sparse approximation to the Moore-Penrose inverse of A. Hence, we perform numerical dropping during the algorithm to maintain the sparsity of the preconditioner. We call the following algorithm the global Greville preconditioning algorithm, since it forms or updates the whole matrix at each step rather than column by column.
Algorithm 1 Global Greville preconditioning algorithm
1. set M_0 = 0
2. for i = 1 : n
3.   k_i = M_{i−1} a_i
4.   u_i = a_i − A_{i−1} k_i
5.   if ‖u_i‖_2 is not small
6.     f_i = ‖u_i‖_2^2
7.     v_i = u_i
8.   else
9.     f_i = 1 + ‖k_i‖_2^2
10.    v_i = M_{i−1}^T k_i
11.  end if
12.  M_i = M_{i−1} + (1/f_i)(e_i − k_i) v_i^T
13.  perform numerical dropping on M_i
14. end for
15. Obtain M_n ≈ A^†.
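The following Python/NumPy sketch mirrors Algorithm 1 for dense matrices. The switching test on line 5 and the dropping rule on line 13 are deliberately simple placeholders (a relative norm test and an absolute threshold), and the tolerances tau_s and tau_d are illustrative parameters of ours, not values prescribed by the algorithm.

```python
import numpy as np

def greville_precond_global(A, tau_s=1e-10, tau_d=1e-3):
    """Sketch of Algorithm 1: builds M_n ~ A^dagger with crude numerical dropping."""
    m, n = A.shape
    M = np.zeros((n, m))
    Ai = np.zeros((m, n))                            # A_{i-1}, filled column by column
    for i in range(n):
        a = A[:, i]
        k = M @ a                                    # line 3
        u = a - Ai @ k                               # line 4
        if np.linalg.norm(u) > tau_s * np.linalg.norm(Ai) * np.linalg.norm(a):
            f, v = u @ u, u                          # lines 6-7: treated as independent
        else:
            f, v = 1.0 + k @ k, M.T @ k              # lines 9-10: treated as dependent
        e = np.zeros(n); e[i] = 1.0
        M = M + np.outer(e - k, v) / f               # line 12: rank-one update
        M[np.abs(M) < tau_d] = 0.0                   # line 13: illustrative dropping
        Ai[:, i] = a
    return M                                         # M_n ~ A^dagger
```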
Remark 2 In Algorithm 1, we do not need to store k_i, v_i, f_i, i = 1, ..., n, because we form M_i explicitly.

Remark 3 In Algorithm 1, we need to update M_i at every step, but we do not need to update the whole matrix, since only the first i − 1 rows of M_{i−1} can be nonzero. Hence, to compute M_i, we update the first i − 1 rows of M_{i−1} and then add one new nonzero row as the i-th row.

Remark 4 If k_i is sparse, then when we update M_{i−1} we only need to update the rows which correspond to the nonzero elements of k_i. Hence, the rank-one update can be efficient.

Remark 5 When A is nonsingular, the inverse of A can also be computed by the Sherman-Morrison formula, which is likewise a rank-one update algorithm. Following the Sherman-Morrison approach, an incomplete factorization preconditioner can also be developed; we refer readers to [7,8].
If we want to construct the matrices K, F and V without forming M_i explicitly, we can use a vector-wise version of the above algorithm. In Algorithm 1, the columns of K are constructed one per step, and u_i, f_i, v_i are defined in terms of k_i. Hence, it is possible to rewrite Algorithm 1 in a vector-wise form. Since u_i can be computed as a_i − A_{i−1} k_i, which does not refer to M_{i−1} explicitly, to vectorize Algorithm 1 we only need to form k_i, and to form v_i = M_{i−1}^T k_i when linear dependence occurs, without using M_{i−1} explicitly.

Consider the case when numerical dropping is not used. Since we already know that

A^† = (I − K) F^{−1} V^T
    = (I − [k_1, ..., k_n]) diag(f_1^{−1}, ..., f_n^{−1}) [v_1, ..., v_n]^T
    = Σ_{i=1}^n (e_i − k_i) (1/f_i) v_i^T,

for any integer 1 ≤ p ≤ n it is easy to see that

A_p^† = Σ_{i=1}^p (e_i − k_i) (1/f_i) v_i^T. (3.26)

Therefore, we have

v_i = (A_{i−1}^†)^T k_i (3.27)
    = ( Σ_{p=1}^{i−1} (e_p − k_p) (1/f_p) v_p^T )^T k_i (3.28)
    = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i, (3.29)

and

k_i = A_{i−1}^† a_i (3.30)
    = Σ_{p=1}^{i−1} (e_p − k_p) (1/f_p) v_p^T a_i (3.31)
    = Σ_{p=1}^{i−2} (e_p − k_p) (1/f_p) v_p^T a_i + (e_{i−1} − k_{i−1}) (1/f_{i−1}) v_{i−1}^T a_i (3.32)
    = A_{i−2}^† a_i + (e_{i−1} − k_{i−1}) (1/f_{i−1}) v_{i−1}^T a_i. (3.33)
To make this clearer, the quantities required to form the last column of K can be traced back as follows: k_n = A_{n−1}^† a_n requires A_{n−2}^† a_n and k_{n−1} = A_{n−2}^† a_{n−1}; A_{n−2}^† a_n in turn requires A_{n−3}^† a_n and k_{n−2} = A_{n−3}^† a_{n−2}, while k_{n−1} requires A_{n−3}^† a_{n−1} and k_{n−2} = A_{n−3}^† a_{n−2}; and so on.

In other words, we need to compute every A_i^† a_k, k = i + 1, ..., n. Denote A_i^† a_j, j > i, by k_{i,j}. In this notation, k_i = k_{i−1,i}. In the algorithm, k_{i,j}, j > i, is stored in the j-th column of K; if j = i + 1, then k_{i,j} = k_j and it is not updated any more, while if j > i + 1, then k_{i,j} is updated to k_{i+1,j} and kept in the same position.

Based on the above discussion, and adding the numerical dropping strategy, we have the following algorithm. In the algorithm, we omit the first subscript of k_{i,j}.
Algorithm 2 Vector-wise Greville preconditioning algorithm
1. set K = 0_{n×n}
2. for i = 1 : n
3.   u = a_i − A_{i−1} k_i
4.   if ‖u‖_2 is not small
5.     f_i = ‖u‖_2^2
6.     v_i = u
7.   else
8.     f_i = ‖k_i‖_2^2 + 1
9.     v_i = (A_{i−1}^†)^T k_i = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i
10.  end if
11.  for j = i + 1, ..., n
12.    k_j = k_j + (v_i^T a_j / f_i)(e_i − k_i)
13.    perform numerical dropping on k_j
14.  end for
15. end for
16. Obtain K = [k_1, ..., k_n], F = diag(f_1, ..., f_n), V = [v_1, ..., v_n].
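A dense Python/NumPy sketch of Algorithm 2 is given below. As in the earlier sketch, the linear-dependence test and the dropping threshold are illustrative assumptions of ours; a practical implementation would work with sparse data structures.

```python
import numpy as np

def greville_precond_vectorwise(A, tau_s=1e-10, tau_d=1e-3):
    """Sketch of Algorithm 2: builds K, F (as a vector f) and V column by column
    without forming any M_i explicitly."""
    m, n = A.shape
    K = np.zeros((n, n))
    V = np.zeros((m, n))
    f = np.zeros(n)
    for i in range(n):
        a = A[:, i]
        u = a - A[:, :i] @ K[:i, i]                       # line 3: u = a_i - A_{i-1} k_i
        if np.linalg.norm(u) > tau_s * np.linalg.norm(A[:, :i]) * np.linalg.norm(a):
            f[i], V[:, i] = u @ u, u                      # lines 5-6
        else:
            f[i] = 1.0 + K[:, i] @ K[:, i]                # line 8
            # line 9: v_i = sum_p (1/f_p) v_p (e_p - k_p)^T k_i
            coeff = (np.eye(n)[:, :i] - K[:, :i]).T @ K[:, i] / f[:i]
            V[:, i] = V[:, :i] @ coeff
        for j in range(i + 1, n):                         # lines 11-12: right-looking updates
            theta = (V[:, i] @ A[:, j]) / f[i]
            K[:, j] += theta * (np.eye(n)[:, i] - K[:, i])
            K[np.abs(K[:, j]) < tau_d, j] = 0.0           # line 13: illustrative dropping
    return K, f, V
```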
4 The Special Case When A is Full Column Rank

In this section, we take a closer look at the full column rank case. When A is full column rank, Algorithm 2 can be simplified as follows.

Algorithm 3 Vector-wise Greville preconditioning algorithm for full column rank matrices
1. set K = 0_{n×n}
2. for i = 1 : n
3.   u_i = a_i − A_{i−1} k_i
4.   f_i = ‖u_i‖_2^2
5.   for j = i + 1, ..., n
6.     k_j = k_j + (u_i^T a_j / f_i)(e_i − k_i)
7.     perform numerical dropping on k_j
8.   end for
9. end for
10. Obtain K = [k_1, ..., k_n], F = diag(f_1, ..., f_n).
In Algorithm 3,

u_i = a_i − A_{i−1} k_i
    = [a_1, ..., a_i, 0, ..., 0] (−k_{i,1}, ..., −k_{i,i−1}, 1, 0, ..., 0)^T
    = A_i (e_i − k_i)
    = A (e_i − k_i).

If we denote e_i − k_i by z_i, then u_i = A z_i.

Then, line 6 of Algorithm 3 can be rewritten as

k_j = k_j + (u_i^T a_j / ‖u_i‖_2^2)(e_i − k_i),

or equivalently

e_j − k_j = e_j − k_j − (u_i^T a_j / ‖u_i‖_2^2)(e_i − k_i),
z_j = z_j − (u_i^T a_j / ‖u_i‖_2^2) z_i.

Denote d_i = ‖u_i‖_2^2 and θ = u_i^T a_j / d_i. Then, combining all the new notation, we can rewrite the algorithm as follows.
Algorithm 4
1. set Z = I_{n×n}
2. for i = 1 : n
3.   u_i = A_i z_i
4.   d_i = (u_i, u_i)
5.   for j = i + 1, ..., n
6.     θ = (u_i, a_j) / d_i
7.     z_j = z_j − θ z_i
8.   end for
9. end for
10. Z = [z_1, ..., z_n], D = diag(d_1, ..., d_n).
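In exact arithmetic (no dropping), Algorithm 4 is a complete A^T A-orthogonalization, and by Remark 6 below it reproduces A^† exactly when A has full column rank. The following Python/NumPy sketch, with an illustrative random test matrix, is only meant to make that connection concrete.

```python
import numpy as np

def ata_orthogonalize(A):
    """Dense sketch of Algorithm 4 for a full column rank A: produces a unit upper
    triangular Z and a vector d such that the columns of AZ are A^T A-orthogonal
    and A^dagger = Z D^{-1} Z^T A^T (Remark 6)."""
    m, n = A.shape
    Z = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        u = A @ Z[:, i]                    # u_i = A z_i
        d[i] = u @ u                       # d_i = (u_i, u_i)
        for j in range(i + 1, n):
            theta = (u @ A[:, j]) / d[i]   # theta = (z_i, e_j)_{A^T A} / (z_i, z_i)_{A^T A}
            Z[:, j] -= theta * Z[:, i]     # A^T A-orthogonalization step, no dropping
    return Z, d

# illustration: recover the pseudoinverse of a full column rank matrix
A = np.random.rand(6, 3)
Z, d = ata_orthogonalize(A)
print(np.allclose(Z @ np.diag(1.0 / d) @ Z.T @ A.T, np.linalg.pinv(A)))
```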
Remark 6 Since z_i = e_i − k_i, we have Z = I − K. Denoting D = diag(d_1, ..., d_n), the factorization of A^† in Theorem 2 can be rewritten as

A^† = Z D^{−1} Z^T A^T. (4.1)

Remark 7 In Algorithm 4, the definition of θ can be rewritten as

θ = (u_i, a_j) / d_i (4.2)
  = (A z_i, A e_j) / (A z_i, A z_i) (4.3)
  = (z_i, e_j)_{A^T A} / (z_i, z_i)_{A^T A}. (4.4)

When A has full column rank, A^T A is symmetric positive definite, so (·, ·)_{A^T A} defines an inner product, and Algorithm 4 includes an A^T A-orthogonalization procedure. Based on this observation, we conclude that for the special case in which A has full column rank, our algorithm is very similar to the RIF preconditioner proposed by Benzi and Tůma [3]. The difference lies in the implementation: our algorithm looks for an incomplete factorization of A^†, while RIF looks for an LDL^T decomposition of A^T A.
5 Equivalence Condition

Consider solving the least squares problem (1.1) by transforming it into the left preconditioned form

min_{x∈R^n} ‖Mb − MAx‖_2, (5.1)

where A ∈ R^{m×n}, M ∈ R^{n×m}, and b ∈ R^m is the right-hand-side vector.

When preconditioning a least squares problem, one important issue is whether the solution of the preconditioned problem is a solution of the original problem. For nonsingular linear systems, the condition for this equivalence is that the preconditioner M be nonsingular. Since we are dealing with general rectangular matrices, we need other conditions to ensure that the preconditioned problem (5.1) is equivalent to the original least squares problem (1.1).

First, note the following lemma [14].

Lemma 1 The problems

‖b − Ax‖_2 = min_{x∈R^n} ‖b − Ax‖_2

and

‖Mb − MAx‖_2 = min_{x∈R^n} ‖Mb − MAx‖_2

are equivalent for all b ∈ R^m if and only if R(A) = R(M^T M A).
To analyze the equivalence between the original problem (1.1) and the preconditioned problem (5.1), where M comes from any of our algorithms, we first consider the simple case in which A has full column rank. After numerical dropping, we have

A^† ≈ M = Z F^{−1} Z^T A^T. (5.2)

Note that Z is a unit upper triangular matrix and F is a diagonal matrix with positive diagonal elements. Hence we have

M = C A^T, (5.3)

where C is a nonsingular matrix, and consequently

R(M^T) = R(A). (5.4)

For the general case, I − K and F are still nonsingular, but the expression for V is not as straightforward. To simplify the problem, we make the following assumption.

Assumption 1
1. There is no zero column in A.
2. Our preconditioning algorithm detects all the linear independence in A.

The definition of v_i can be rewritten as follows:

v_i = a_i − A k_i ∈ R(A_i) \ R(A_{i−1})        if a_i ∉ R(A_{i−1}),
      (A_{i−1}^†)^T k_i ∈ R(A_{i−1}) = R(A_i)  if a_i ∈ R(A_{i−1}). (5.5)

Hence, we have

span{v_1, v_2, ..., v_i} = span{a_1, a_2, ..., a_i}. (5.6)
On the other hand, note line 12 of Algorithm 1,

M_i = M_{i−1} + (1/f_i)(e_i − k_i) v_i^T, (5.7)

which implies that every row of M_i is a linear combination of the vectors v_k^T, 1 ≤ k ≤ i, i.e.,

R(M_i^T) = span{v_1, ..., v_i}. (5.8)

Based on the above discussion, we obtain the following theorem.

Theorem 3 Let A ∈ R^{m×n}, m ≥ n. If Assumption 1 holds, then we have the following relationship, where M is the approximate Moore-Penrose inverse constructed by Algorithm 1:

R(M^T) = R(V) = R(A). (5.9)

Remark 8 In Assumption 1, we assume that our algorithms detect all the linear independence in the columns of A. Hence, we allow the kind of mistake in which a linearly dependent column is recognized as a linearly independent column. An extreme case is that we recognize all the columns of A as linearly independent, i.e., we take A to be a full column rank matrix. In this sense, our assumption can always be satisfied.

Hence, we have proved that for any matrix A ∈ R^{m×n}, R(A) = R(M^T). We have the following theorem.

Theorem 4 If Assumption 1 holds, then for all b ∈ R^m, the preconditioned least squares problem (5.1), where M is constructed by Algorithm 1, is equivalent to the original least squares problem (1.1).
Proof If Assumption 1 holds, we have

R(A) = R(M^T). (5.10)

Then there exists a nonsingular matrix C ∈ R^{n×n} such that A = M^T C. Hence,

R(M^T M A) = R(M^T M M^T C) (5.11)
           = R(M^T M M^T) (5.12)
           = R(M^T M) (5.13)
           = R(M^T) (5.14)
           = R(A). (5.15)

In the above equalities we used the relationship R(M M^T) = R(M).

By Lemma 1 we complete the proof. ✷
6 Breakdown-free Condition

In this section, we assume without loss of generality that the first r columns of A are linearly independent, so that

R(A) = span{a_1, ..., a_r}, (6.1)

where rank(A) = r and a_i (i = 1, ..., r) is the i-th column of A. This is no restriction, because column pivoting can easily be incorporated into Algorithm 1. Then we have

a_i ∈ span{a_1, ..., a_r}, i = r + 1, ..., n. (6.2)

In this case, after performing Algorithm 1 with numerical dropping, the matrix V can be written in the form

V = [v_1, ..., v_r, v_{r+1}, ..., v_n]. (6.3)

If we denote [v_1, ..., v_r] by U_r, then

U_r = A(I − K) I_r, where I_r = [I_{r×r}; 0]. (6.4)

There exists a matrix H ∈ R^{r×(n−r)} such that

[v_{r+1}, ..., v_n] = U_r H = A(I − K) I_r H, (6.5)

and H can be rank deficient. Then, V is given by

V = [v_1, ..., v_r, v_{r+1}, ..., v_n] (6.6)
  = [U_r, U_r H] (6.7)
  = U_r [I_{r×r}, H] (6.8)
  = A(I − K) [I_{r×r}; 0] [I_{r×r}, H] (6.9)
  = A(I − K) [I_{r×r}, H; 0, 0]. (6.10)
Hence,

M = (I − K) F^{−1} [I_{r×r}, 0; H^T, 0] (I − K)^T A^T. (6.11)

From the above equation we can also see that the difference between the full column rank case and the rank deficient case lies in the factor

[I_{r×r}, 0; H^T, 0], (6.12)

which is the identity matrix when A has full column rank.

If there is no numerical dropping, M is the Moore-Penrose inverse of A,

A^† = (I − K̂) F̂^{−1} [I_{r×r}, 0; Ĥ^T, 0] (I − K̂)^T A^T. (6.13)

Comparing Equation (6.11) and Equation (6.13), we have the following theorem.

Theorem 5 Let A ∈ R^{m×n}. If Assumption 1 holds, then the following relationship holds, where M denotes the approximate Moore-Penrose inverse constructed by Algorithm 1:

R(M) = R(A^†) = R(A^T). (6.14)
Based on Theorem 3 and Theorem 5, we have the following theorem, which ensures that the GMRES method can determine a solution to the preconditioned problem MAx = Mb before breakdown happens for any b ∈ R^m.

Theorem 6 Let A ∈ R^{m×n}. If Assumption 1 holds, then GMRES can determine a least squares solution to

min_{x∈R^n} ‖MAx − Mb‖_2 (6.15)

before breakdown happens for all b ∈ R^m, where M is constructed by Algorithm 1.

Proof According to Theorem 2.1 in Brown and Walker [6], we only need to prove

N(MA) = N(A^T M^T), (6.16)

which is equivalent to

R(MA) = R(A^T M^T). (6.17)

Using the result of Theorem 3, there exists a nonsingular matrix T ∈ R^{n×n} such that A = M^T T. Hence,

R(MA) = R(M M^T T) (6.18)
      = R(M M^T) (6.19)
      = R(M). (6.20)

On the other hand,

R(A^T M^T) = R(A^T A T^{−1}) (6.21)
           = R(A^T A) (6.22)
           = R(A^T). (6.23)

The proof is completed using Theorem 5. ✷
Remark 9 From the proof of the above theorem we have R(MA) = R(M), which means that the preconditioned least squares problem (5.1) is a consistent problem.

Thus, by Theorem 4 and Theorem 6, we finally obtain our main result for our Greville preconditioner M.

Theorem 7 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m, preconditioned GMRES determines a least squares solution of

min_{x∈R^n} ‖MAx − Mb‖_2 (6.24)

before breakdown, and this solution attains min_{x∈R^n} ‖b − Ax‖_2, where M is computed by Algorithm 1.

Remark 10 Since all the proofs are based on the factorization M = (I − K) F^{−1} V^T, the theorems also hold for Algorithm 2.
7 Implementation Considerations

7.1 Detecting Linear Dependence

In Algorithm 1 and Algorithm 2, one important issue is how to recognize the columns which are linearly dependent on the previous columns. In the algorithms we check whether ‖u_i‖_2 is small to detect rank deficiency. Simply speaking, we can set a tolerance τ in advance and switch to the "else" branch when ‖u_i‖_2 < τ. However, is this good enough to detect the linearly dependent columns of A when we perform numerical dropping? To address this issue, we first take a look at the RIF preconditioning algorithm.

The RIF preconditioner was developed for full rank matrices. However, numerical experiments showed that it also works for rank deficient matrices. Our equivalent Algorithm 3 gives a good insight into this phenomenon, since the only possibility for the RIF preconditioner to break down is f_i = 0, which implies that u_i is a zero vector. From Algorithm 1 and Algorithm 2 we have

u_i = a_i − A_{i−1} k_i (7.1)
    = a_i − A_{i−1} A_{i−1}^† a_i (7.2)
    = (I − A_{i−1} A_{i−1}^†) a_i. (7.3)

Thus, u_i is the projection of a_i onto R(A_{i−1})^⊥. Hence, in exact arithmetic u_i = 0 if and only if a_i ∈ R(A_{i−1}). Algorithm 1 and Algorithm 2 have an alternative branch for a_i ∈ R(A_{i−1}): when u_i = 0, they switch to the "else" case. However, this is not always necessary, because of the numerical dropping. With numerical dropping, when a_i ∈ R(A_{i−1}) the computed vector, which we denote by ū_i, is actually

ū_i = a_i − A_{i−1} M_{i−1} a_i (7.4)
    = a_i − A_{i−1} (A_{i−1}^† + ΔE) a_i (7.5)
    = a_i − A_{i−1} A_{i−1}^† a_i − A_{i−1} ΔE a_i (7.6)
    = −A_{i−1} ΔE a_i (7.7)
    ≠ 0, (7.8)
where ΔE is an error matrix. Hence, even though a_i ∈ R(A_{i−1}), since ū_i is not the exact projection of a_i onto R(A_{i−1})^⊥, the RIF algorithm does not necessarily break down when linear dependence happens.

The RIF preconditioner does not take the rank deficient or nearly rank deficient columns into consideration. Hence, if we can capture the rank deficient columns, we might be able to accelerate the convergence of the preconditioned problem further. Assume that the M we compute from Algorithm 1 or Algorithm 2 can be viewed as an approximation to A^† with error matrix E ∈ R^{n×m}, that is,

M = A^† + E. (7.9)

First note a theoretical result on the perturbation lower bound of the generalized inverse.

Theorem 8 [18] If rank(A + E) ≠ rank(A), then

‖(A + E)^† − A^†‖_2 ≥ 1/‖E‖_2. (7.10)

By Theorem 8, if the rank of M = A^† + E from our algorithm is not equal to the rank of A^† (or A, since they have the same rank), then

‖M^† − (A^†)^†‖_2 ≥ 1/‖E‖_2 (7.11)
⇒ ‖M^† − A‖_2 ≥ 1/‖E‖_2. (7.12)

The above inequality says that, if we denote M^† = A + ΔA, then ‖ΔA‖_2 ≥ 1/‖E‖_2. Hence, when ‖E‖_2 is small, which means that M is a good approximation to A^†, M is the exact generalized inverse of another matrix which is far from A, and the smaller ‖E‖_2 is, the further M^† is from A. In this sense, if the rank of M is not the same as that of A, M may not be a good preconditioner.

Thus, it is important to keep the rank of M equal to rank(A). Hence, when we perform our algorithm, we need to sparsify the preconditioner M, but at the same time we also want to capture as many rank deficient columns as possible and maintain the rank of M. To achieve this, it is clearly important to decide whether the exact value ‖u_i‖_2 = ‖(I − A_{i−1} A_{i−1}^†) a_i‖_2 is close to zero or not based on the computed value ‖ū_i‖_2 = ‖(I − A_{i−1} M_{i−1}) a_i‖_2.

From the previous discussion, we have

ū_i = u_i − A_{i−1} ΔE a_i. (7.13)

When a_i ∈ R(A_{i−1}), u_i = (I − A_{i−1} A_{i−1}^†) a_i = 0. Then,

ū_i = −A_{i−1} ΔE a_i, (7.14)
‖ū_i‖_2 ≤ ‖A_{i−1}‖_F ‖a_i‖_2 ‖ΔE‖_F. (7.15)

If we require ΔE to be small, we can use a tolerance τ_s: if

‖ū_i‖_2 ≤ τ_s ‖A_{i−1}‖_F ‖a_i‖_2, (7.16)

we judge that we have detected a column a_i which lies in the range space of A_{i−1}. From now on we call τ_s the switching tolerance.
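As a small illustration, the switching test (7.16) is straightforward to express in code; the following Python helper is a direct transcription, with the function and argument names chosen here purely for illustration.

```python
import numpy as np

def is_dependent(u_bar, A_prev, a_i, tau_s):
    """Switching test (7.16): treat a_i as linearly dependent on the previous
    columns when the computed residual u_bar is small relative to the data.
    A_prev holds the previous columns [a_1, ..., a_{i-1}]."""
    return np.linalg.norm(u_bar) <= tau_s * np.linalg.norm(A_prev) * np.linalg.norm(a_i)
```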
An appropriate switching tolerance τ_s is important for the efficiency of the preconditioner. When we choose a small τ_s, e.g. 10^{−8}, we have less chance of detecting the linearly dependent columns. On the other hand, when a relatively large τ_s, e.g. 10^{−3}, is chosen, the preconditioning algorithm tends to find more linearly dependent columns. When a large τ_s is taken, many linearly dependent columns can be found; however, it is also possible that Assumption 1 is violated, so that the original least squares problem cannot be solved. When a relatively small τ_s is taken, it is possible that few or no linearly dependent columns are found, and the preconditioner may lose some of its efficiency. When τ_s is fixed, a larger dropping tolerance τ_d gives a smaller chance of detecting linearly dependent columns, and a smaller τ_d gives a larger chance of detecting them. Choosing optimal τ_s and dropping tolerance τ_d to achieve a balance between them remains an open problem.

Another issue related to the dropping tolerance τ_d is how much memory is needed to store the preconditioner. From the previous discussion, we know that R(M^T) = R(V). When numerical dropping is performed on V, this relation may no longer hold. Hence, in Algorithm 2 we only perform numerical dropping on the columns of K. In this way, as long as Assumption 1 holds, we have R(M^T) = R(V). However, we then cannot control the sparsity of V directly. We will discuss this issue again at the end of this section.
7.2 Right-Preconditioning

So far we have assumed A ∈ R^{m×n}, m ≥ n, and only discussed left-preconditioning. When m ≥ n, it is better to perform left-preconditioning, since the size of the preconditioned problem is smaller. When m ≤ n, right-preconditioning is better, i.e., solving the preconditioned problem

min_{y∈R^m} ‖b − ABy‖_2. (7.17)

When m ≤ n, Theorem 3 and Theorem 5 still hold, since their proofs did not use the fact that m ≥ n. Then, according to the following lemma from [14], it is easy to obtain analogous conclusions for right-preconditioning.

Lemma 2 min_{x∈R^n} ‖b − Ax‖_2 = min_{y∈R^m} ‖b − ABy‖_2 holds for all b ∈ R^m if and only if R(A) = R(AB).

Theorem 9 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m we have min_{x∈R^n} ‖b − Ax‖_2 = min_{y∈R^m} ‖b − AMy‖_2, where M is a preconditioner constructed by Algorithm 1.

Proof From Theorem 5 we know that R(M) = R(A^T), which implies that there exists a nonsingular matrix C ∈ R^{m×m} such that M = A^T C. Hence,

R(AM) = R(AA^T C) (7.18)
      = R(AA^T) (7.19)
      = R(A). (7.20)

Using Lemma 2 we complete the proof. ✷
Theorem 10 Let A ∈ R^{m×n}. If Assumption 1 holds, then GMRES determines a solution to the right-preconditioned problem

min_{y∈R^m} ‖b − AMy‖_2 (7.21)

before breakdown happens.

Proof The proof is the same as that of Theorem 6. ✷

Theorem 11 Let A ∈ R^{m×n}. If Assumption 1 holds, then for any b ∈ R^m, GMRES determines a least squares solution of

min_{y∈R^m} ‖b − AMy‖_2 (7.22)

before breakdown, and this solution attains min_{x∈R^n} ‖b − Ax‖_2, where M is computed by Algorithm 1.

We would like to remark that when m < n it is preferable to apply Algorithm 1 and Algorithm 2 to A^T rather than to A, for the following reasons. By doing so, we construct M̂, an approximate generalized inverse of A^T, and then use M̂^T as the preconditioner for the original least squares problem.

1. In Algorithm 1 and Algorithm 2, the approximate generalized inverse is constructed row by row. Hence, we perform a loop which goes through all the columns of A once. When m ≥ n, this loop is relatively short. However, when m < n, this loop could become very long, and the preconditioning will be more time-consuming.
2. Another reason is that linear dependence will always happen in this case even though the matrix A has full row rank: if m < n, at least n − m columns of A are linearly dependent on the previous columns.

Algorithm 5 Left-looking vector-wise Greville preconditioning algorithm
…
9.   if ‖u‖_2 > τ_s ‖A_{i−1}‖_F ‖a_i‖_2
10.    f_i = ‖u‖_2^2
11.    v_i = u
12.  else
13.    f_i = ‖k_i‖_2^2 + 1
14.    v_i = (A_{i−1}^†)^T k_i = Σ_{p=1}^{i−1} (1/f_p) v_p (e_p − k_p)^T k_i
15.  end if
16. end for
17. K = [k_1, ..., k_n], F = diag(f_1, ..., f_n), V = [v_1, ..., v_n].
In Algorithm 2, all the columns k_j, j = i, ..., n, are updated in one outer loop. In Algorithm 5, all the updates for each column k_i are finished at one time. This left-looking approach is suggested in [1,5]; according to [1], a left-looking version is sometimes more robust and efficient. In Section 8 our preconditioners are computed by Algorithm 5.

Since we cannot control the sparsity of V, and V can be very dense, we try not to store V. Note that when a column a_j is recognized as a linearly independent column, the corresponding column becomes v_j = A(e_j − k_j). Hence, we do not have to store v_j: when we need v_j in the 4th line of Algorithm 5, we can compute θ = a_i^T A(e_j − k_j), where the row vector a_i^T A is computed once, used i − 1 times, and then discarded. For a v_j corresponding to a linearly dependent column, we do need to store it.
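The following dense Python/NumPy sketch illustrates the two ideas just described: a left-looking organization in which each column k_i receives all of its updates at once, and an implicit treatment of v_j = A(e_j − k_j) for columns judged linearly independent, so that only the v_j of dependent columns are stored. It is a sketch under these assumptions, not a line-by-line transcription of Algorithm 5, and the dropping and switching tests are plain thresholds chosen for illustration.

```python
import numpy as np

def greville_precond_leftlooking(A, tau_s=1e-10, tau_d=1e-3):
    """Left-looking sketch: finish each k_i at once; keep v_j implicit for
    independent columns and store v_j only for dependent columns."""
    m, n = A.shape
    K = np.zeros((n, n))
    f = np.zeros(n)
    dep = np.zeros(n, dtype=bool)      # columns treated as linearly dependent
    v_dep = {}                         # explicit v_j, only for dependent columns
    for i in range(n):
        aiA = A[:, i] @ A              # the row vector a_i^T A, reused for all j < i
        for j in range(i):             # apply every previous update to k_i
            ej_minus_kj = np.eye(n)[:, j] - K[:, j]
            if dep[j]:
                theta = (v_dep[j] @ A[:, i]) / f[j]
            else:                      # v_j = A(e_j - k_j), used implicitly
                theta = (aiA @ ej_minus_kj) / f[j]
            K[:, i] += theta * ej_minus_kj
        K[np.abs(K[:, i]) < tau_d, i] = 0.0          # dropping on the finished column
        u = A[:, i] - A[:, :i] @ K[:i, i]
        if np.linalg.norm(u) > tau_s * np.linalg.norm(A[:, :i]) * np.linalg.norm(A[:, i]):
            f[i] = u @ u               # independent: v_i = A(e_i - k_i), left implicit
        else:
            dep[i] = True
            f[i] = 1.0 + K[:, i] @ K[:, i]
            v = np.zeros(m)            # v_i = (A_{i-1}^dagger)^T k_i, built explicitly
            for p in range(i):
                vp = v_dep[p] if dep[p] else A @ (np.eye(n)[:, p] - K[:, p])
                v += vp * (((np.eye(n)[:, p] - K[:, p]) @ K[:, i]) / f[p])
            v_dep[i] = v
    return K, f, dep, v_dep
```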
8 Numerical Examples

In this section, we use matrices from the University of Florida Sparse Matrix Collection [11] to test our algorithms; zero rows are omitted, and if the original matrix has more columns than rows we use its transpose. All computations were run on a Dell Precision 690 with a 3 GHz CPU and 16 GB of memory; the programming language and compiling environment was GNU C/C++ 3.4.3 under Red Hat Linux. Detailed information on the test matrices is given in Table 1.
Table 1 Information on the test matrices

Name       m       n      rank   density(%)   cond
vtp base   346     198    198    1.53         3.55 × 10^7
capri      482     271    271    1.45         8.09 × 10^3
scfxm1     600     330    330    1.38         2.42 × 10^4
scfxm2     1200    660    660    0.69         2.42 × 10^4
perold     1560    625    625    0.65         5.13 × 10^5
scfxm3     1800    990    990    0.51         1.39 × 10^3
maros      1966    846    846    0.61         1.95 × 10^6
pilot we   2928    722    722    0.44         4.28 × 10^5
stocfor2   3045    2157   2157   0.14         2.84 × 10^4
bnl2       4486    2324   2324   0.14         7.77 × 10^3
pilot      4860    1441   1441   0.63         2.66 × 10^3
d2q06c     5831    2171   2171   0.26         1.33 × 10^5
80bau3b    11934   2262   2262   0.09         567.23
beaflw     500     492    460    21.71        1.52 × 10^9
beause     505     492    459    17.93        6.74 × 10^8
lp cycle   3371    1890   1875   0.33         1.46 × 10^7
Pd rhs     5804    4371   4368   0.02         3.36 × 10^8
Model9     10939   2879   2787   0.18         7.90 × 10^20
landmark   71952   2673   2671   0.60         1.02 × 10^8
In Table 1, the matrices in the first group are full column rank matrices and the matrices in the second group are rank deficient. The condition number of A is given by σ_1(A)/σ_r(A), where r = rank(A). We construct the preconditioner M by Algorithm 5 and then run BA-GMRES [14], given below, where B is the preconditioner.
Algorithm 6 BA-GMRES
1. Choose x_0, r̃_0 = B(b − Ax_0)
2. v_1 = r̃_0 / ‖r̃_0‖_2
3. for i = 1, 2, ..., k
4.   w_i = BAv_i
5.   for j = 1, 2, ..., i
6.     h_{j,i} = (w_i, v_j)
7.     w_i = w_i − h_{j,i} v_j
8.   end for
9.   h_{i+1,i} = ‖w_i‖_2
10.  v_{i+1} = w_i / h_{i+1,i}
11.  Find y_i ∈ R^i which minimizes ‖r̃_i‖_2 = ‖ ‖r̃_0‖_2 e_1 − H̄_i y ‖_2
12.  x_i = x_0 + [v_1, ..., v_i] y_i
13.  r_i = b − Ax_i
14.  if ‖A^T r_i‖_2 < ε stop
15. end for
16. x_0 = x_k
17. Go to 1.
BA-GMRES solves the least squares problem with GMRES by preconditioning the original problem from the left with a suitable preconditioner B.
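For experimentation, the following Python/NumPy routine is a minimal sketch of Algorithm 6 without the restart step (lines 16-17): it builds the Arnoldi basis for BA, solves the small least squares problem of line 11 with numpy, and uses the ‖A^T r‖_2-based stopping rule. A and B may be dense arrays or sparse matrices supporting the @ operator; this is not the authors' C/C++ implementation.

```python
import numpy as np

def ba_gmres(A, B, b, x0=None, maxiter=100, tol=1e-8):
    """Sketch of BA-GMRES (Algorithm 6) without restarts: GMRES applied to B A x = B b."""
    m, n = A.shape
    x = x0 = np.zeros(n) if x0 is None else x0
    r0t = B @ (b - A @ x0)                  # preconditioned residual \tilde r_0
    beta = np.linalg.norm(r0t)
    V = np.zeros((n, maxiter + 1))
    H = np.zeros((maxiter + 1, maxiter))
    V[:, 0] = r0t / beta
    for i in range(maxiter):
        w = B @ (A @ V[:, i])               # w_i = B A v_i
        for j in range(i + 1):              # modified Gram-Schmidt
            H[j, i] = w @ V[:, j]
            w -= H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        if H[i + 1, i] > 0:
            V[:, i + 1] = w / H[i + 1, i]
        # line 11: solve the small problem min || beta e_1 - \bar H_i y ||_2
        e1 = np.zeros(i + 2); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:i + 2, :i + 1], e1, rcond=None)
        x = x0 + V[:, :i + 1] @ y
        r = b - A @ x
        if np.linalg.norm(A.T @ r) < tol * np.linalg.norm(A.T @ b):
            return x, i + 1                 # stopping rule (8.3)
        if H[i + 1, i] == 0:                # breakdown: current x is returned
            return x, i + 1
    return x, maxiter
```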
In the following examples, the right-hand side vectors b are generated randomly with Matlab, so that the least squares problems we solve are inconsistent. In this section, our dropping rule is to drop the i-th element of k_j, i.e. k_{j,i}, when the following two inequalities hold:

|k_{j,i}| ‖a_i‖_2 < τ_d   and   |k_{j,i}| < τ_d′ max_{l=1,...,i−1} |k_{l,i}|. (8.1)

Here we have another tolerance τ_d′; in the following examples we simply fix it to 10.0 × τ_d. We use

‖ū_i‖_2 ≤ τ_s ‖A_{i−1}‖_F ‖a_i‖_2 (8.2)

as the criterion to judge whether column a_i is linearly dependent on the previous columns, where τ_s is our switching tolerance. The stopping rule for BA-GMRES is

‖A^T(b − Ax)‖_2 ≤ 10^{−8} ‖A^T b‖_2. (8.3)
In the following tables, we use our preconditioner with GMRES and compare it with GMRES preconditioned by RIF, denoted by "RIF-GMRES"; CGLS [4] preconditioned by the Incomplete Modified Gram-Schmidt orthogonalization preconditioner (IMGS) [15], denoted by "IMGS-CGLS"; CGLS with diagonal scaling preconditioning, denoted by "D-CGLS"; and CGLS preconditioned by the Incomplete Cholesky factorization preconditioner [4], denoted by "IC-CGLS"; all solving the original least squares problem min_{x∈R^n} ‖b − Ax‖_2. The columns τ_d and τ_s list the dropping tolerance and the switching tolerance used in our preconditioning algorithm. The column "ITS" gives the number of iterations. The columns "Pre. T" and "Tot. T" give the preconditioning time and the total time in seconds, respectively. For "RIF-GMRES", "IMGS-CGLS", "D-CGLS" and "IC-CGLS", each cell gives the number of iterations and the total CPU time; for "RIF-GMRES" and "IMGS-CGLS", the parameters we used are given in brackets. The "nnzP/nnzA" column gives the ratio of the number of nonzero elements in our preconditioner to that in A, where nnzP is the number of nonzero elements in K plus the number of nonzero elements in the diagonal matrix F. The symbol ∗ indicates the fastest method for each problem.
Table 2 Numerical results for full rank matrices

matrix     τ_d    nnzP/nnzA  ITS  Pre. T  Tot. T  RIF-GMRES          IMGS-CGLS          D-CGLS        IC-CGLS
vtp base   1.e-6  18.0       4    0.03    *0.03   5, 0.04 (1.e-9)    9, 0.10 (1.e-5)    3667, 0.29    1348, 0.12
capri      1.e-3  10.0       17   0.03    0.04    9, 0.07 (1.e-4)    371, 0.04 (1.0)    496, 0.06     195, *0.03
scfxm1     1.e-3  13.3       17   0.05    0.06    7, *0.05 (1.e-5)   116, 0.08 (1.e-1)  761, 0.11     318, 0.06
scfxm2     1.e-3  16.6       27   0.12    *0.18   12, 0.26 (1.e-5)   734, 0.28 (1.0)    1332, 0.42    680, 0.23
perold     1.e-2  12.4       17   0.17    0.21    45, 0.36 (1.e-4)   134, 0.26 (1.e-1)  1058, 0.36    394, *0.14
scfxm3     1.e-4  27.0       7    0.30    *0.34   12, 0.46 (1.e-5)   878, 0.53 (1.0)    1583, 0.72    847, 0.41
maros      1.e-6  28.8       3    1.02    *1.05   10, 1.13 (1.e-7)   383, 1.47 (1.e-1)  13714, 6.71   3602, 1.8
pilot we   1.e-1  4.5        41   0.10    0.20    12, 0.4 (1.e-5)    536, 0.37 (1.0)    736, 0.38     287, *0.14
stocfor2   1.e-5  152.2      5    10.55   10.68   16, 11.5 (1.e-7)   138, 3.14 (1.e-1)  2147, 1.59    1362, *1.12
bnl2       1.e-1  3.0        114  0.4     1.22    127, 2.79 (1.e-2)  598, 1.24 (1.0)    657, 0.64     240, *0.36
pilot      1.e-2  2.9        18   1.54    1.72    359, 4.57 (1.e-1)  741, 1.49 (1.0)    822, 1.24     269, *0.75
d2q06c     1.e-3  18.7       32   4.27    5.15    19, 10.68 (1.e-5)  1209, 3.6 (1.0)    2743, 4.05    1077, *1.63
80bau3b    5.e-1  0.5        28   0.46    0.52    26, 0.54 (5.e-1)   47, 1.34 (1.0)     56, *0.09     21, 0.21
For full rank matrices we do not list τ_s, because we simply set τ_s to a very small number and do not try to detect any linearly dependent columns in the coefficient matrices. From the numerical results in Table 2, we conclude that our preconditioner performs competitively when the matrices are ill-conditioned. We can also see that our preconditioner is usually much denser than the original matrix A when the density of A is larger than 0.5%. When the density of A is less than 0.3%, the density of our preconditioner is usually similar to or less than that of A. In Table 3 we show the numerical results for rank deficient matrices; column "D" gives the number of linearly dependent columns that were detected.
Table 3 Numerical results for rank deficient matrices

matrix     τ_d, τ_s        nnzP/nnzA  D    ITS  Pre. T  Tot. T
beaflw     1.e-5, 1.e-10   2.25       26   209  1.22    *2.25
beause     1.e-5, 1.e-11   2.71       31   20   1.16    *1.22
lp cycle   1.e-4, 1.e-7    33.5       15   27   2.17    *2.68
Pd rhs     1.e-4, 1.e-8    0.75       3    3    0.79    0.80
model09    1.e-2, 1.e-6    2.92       0    12   0.97    *1.12
landmark   1.e-1, 1.e-8    0.05       0    143  2.83    10.37

matrix     RIF-GMRES          IMGS-CGLS           D-CGLS       IC-CGLS
beaflw     †                  †                   †            †
beause     †                  †                   †            †
lp cycle   †                  5832, 4.87 (100.0)  6799, 6.74   †
Pd rhs     †                  25, 0.93 (0.1)      242, *0.29   89, 0.54
model09    46, 2.53 (1.e-3)   †                   1267, 3.09   412, 1.22
landmark   239, 38.43 (10.0)  252, 36.68 (10.0)   252, *8.77   †

†: GMRES did not converge within 2,000 steps, or CGLS did not converge within 25,000 steps.
From the results in Table 3 we can conclude that our preconditioner works for all the tested problems and performs competitively with the other preconditioners. The density of our preconditioner is about three times that of the original matrix A or less, except for lp cycle.

For the matrix lp cycle, the rank deficient columns are

182, 184, 216, 237, 253, 717, 754, 961, 1221, 1239, 1260, 1261, 1278, 1640, 1859, (8.4)

15 columns in all. Since the rank deficient columns are not unique, the columns listed above are those which are linearly dependent on their previous columns. In the following example, we can see that our preconditioning algorithm detects many of them.
Table 4 Numerical results for lp cycle

τ_d    τ_s     deficiency detected     ITS  Pre. T  Its. T  Tot. T
1.e-6  1.e-10  −1239, −1261, −1278     6    3.65    0.16    3.81
1.e-6  1.e-7   detected all exactly    4    3.7     0.10    3.8
1.e-6  1.e-5   detected 95 columns     †    —       —       4.7
In Table 4, we test our preconditioning algorithm with a fixed dropping tolerance τ_d = 10^{−6} and different switching tolerances τ_s. For this problem, lp cycle, RIF-preconditioned GMRES did not converge within 2000 steps for any of the dropping tolerances τ_d we tried. The column "deficiency detected" gives the linearly dependent columns detected by our algorithm; an entry such as −1260 means that the 1260th column, which is a rank deficient column, is missed by our algorithm. For example, when τ_d = 10^{−6} and τ_s = 10^{−10}, our preconditioning algorithm detected 12 rank deficient columns, missed the 1239th, 1261st and 1278th columns, and did not mistakenly flag any linearly independent column as dependent; hence Assumption 1 is satisfied. When τ_d = 10^{−6} and τ_s = 10^{−7}, our preconditioning algorithm found exactly the 15 linearly dependent columns. When τ_d = 10^{−6} and τ_s = 10^{−5}, many more than 15 columns were detected; thus Assumption 1 is not satisfied, and the preconditioned GMRES did not converge. From this example we can see that the numerical results agree with our theory very well. Figure 1 gives further insight into this situation.
[Fig. 1 Convergence curves for lp cycle with τ_d = 10^{−6}: log_10 ‖A^T r‖_2 / ‖A^T b‖_2 versus the iteration number, for τ_s = 10^{−5}, 10^{−7}, 10^{−10}.]

We observe that when the switching tolerance is τ_s = 10^{−5}, ‖A^T r‖_2 / ‖A^T b‖_2 stagnates at the level 10^{−1}, which means that the computed solution is not a solution of the original least squares problem. This phenomenon illustrates that Assumption 1 is necessary.
9 Conclusion

In this paper, we proposed a new preconditioner for least squares problems. When the matrix A has full column rank, our preconditioning method is similar to the RIF preconditioner [3]. When A is rank deficient, our preconditioner is also rank deficient. We proved that, under Assumption 1, the problems preconditioned with our preconditioners are equivalent to the original problems. Also, under the same assumption, we showed that GMRES determines a solution to the preconditioned problem. Numerical experiments confirmed our theoretical results and the role of Assumption 1.

Numerical results showed that our preconditioner performs competitively both for ill-conditioned full column rank matrices and for ill-conditioned rank deficient matrices. For rank deficient problems, the RIF preconditioner may not work, whereas our preconditioner worked efficiently.
References

1. M. Benzi and M. Tůma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM Journal on Scientific Computing, 19 (1998), pp. 968–994.
2. M. Benzi and M. Tůma, A robust incomplete factorization preconditioner for positive definite matrices, Numerical Linear Algebra with Applications, 10 (2003), pp. 385–400.
3. M. Benzi and M. Tůma, A robust preconditioner with low memory requirements for large sparse least squares problems, SIAM Journal on Scientific Computing, 25 (2003), pp. 499–512.
4. Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
5. M. Bollhöfer and Y. Saad, On the relations between ILUs and factored approximate inverses, SIAM Journal on Matrix Analysis and Applications, 24 (2002), pp. 219–237.
6. P. N. Brown and H. F. Walker, GMRES on (nearly) singular systems, SIAM Journal on Matrix Analysis and Applications, 18 (1997), pp. 37–51.
7. R. Bru, J. Cerdán, J. Marín, and J. Mas, Preconditioning sparse nonsymmetric linear systems with the Sherman–Morrison formula, SIAM Journal on Scientific Computing, 25 (2003), pp. 701–715.
8. R. Bru, J. Marín, J. Mas, and M. Tůma, Balanced incomplete factorization, SIAM Journal on Scientific Computing, 30 (2008), pp. 2302–2318.
9. S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations, Pitman, London, 1979. Reprinted by Dover, New York, 1991.
10. E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM Journal on Scientific Computing, 19 (1998), pp. 995–1023.
11. T. A. Davis, University of Florida sparse matrix collection, NA Digest, 92 (1994).
12. J. A. Fill and D. E. Fishkind, The Moore–Penrose generalized inverse for sums of matrices, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 629–635.
13. T. N. E. Greville, Some applications of the pseudoinverse of a matrix, SIAM Review, 2 (1960), pp. 15–22.
14. K. Hayami, J.-F. Yin, and T. Ito, GMRES methods for least squares problems, Tech. Report NII-2007-009E, National Institute of Informatics, Tokyo, July 2007 (also to appear in SIAM Journal on Matrix Analysis and Applications).
15. Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, second ed., 2003.
16. Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
17. G. Wang, Y. Wei, and S. Qiao, Generalized Inverses: Theory and Computations, Science Press, Beijing, 2003.
18. P.-Å. Wedin, Perturbation theory for pseudo-inverses, BIT Numerical Mathematics, 13 (1973), pp. 217–232.
19. J.-F. Yin and K. Hayami, Preconditioned GMRES methods with incomplete Givens orthogonalization method for large sparse least-squares problems, J. Comput. Appl. Math., 226 (2009), pp. 177–186.
20. N. Zhang and Y. Wei, On the convergence of general stationary iterative methods for range-Hermitian singular linear systems, Numerical Linear Algebra with Applications, published online (2009).