02.11.2014 Views

Gerard L.G. Sleijpen, Peter Sonneveld and Martin B. van Gijzen, Bi ...

Gerard L.G. Sleijpen, Peter Sonneveld and Martin B. van Gijzen, Bi ...

Gerard L.G. Sleijpen, Peter Sonneveld and Martin B. van Gijzen, Bi ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Author's personal copy<br />

G.L.G. <strong>Sleijpen</strong> et al. / Applied Numerical Mathematics 60 (2010) 1100–1114 1109<br />

Determine the singular value decomposition of σ k , σ k = Q Σ V ∗ with Q <strong>and</strong> V s × s unitary matrices, Σ =<br />

diag(ν 1 ,...,ν s ), with singular values ν j ordered such that ν 1 ··· ν s 0. Find p < s such that ν p >δ>ν p+1 for some<br />

appropriate (small) δ>0. Now, replace ˜Rk by ˜Rk Q ( : , 1: p) (hereweuseMATLAB’snotation)<strong>and</strong>C k by C k ( : , 1: p). Then the<br />

‘new’ σ k has singular values ν 1 ,...,ν p .<br />

7. <strong>Bi</strong>-CGSTAB<br />

We are now ready to formulate the <strong>Bi</strong>-CGSTAB version of IDR for s > 1.<br />

As before, for each k, select a polynomial p k of exact degree k, <strong>and</strong> put P k ≡ p k (A). For ease of notation, replace<br />

C k = C <strong>Bi</strong>-CG , U<br />

k k = U <strong>Bi</strong>-CG , <strong>and</strong> r<br />

k<br />

k = r <strong>Bi</strong>-CG in (11) by C<br />

k<br />

k , U k , <strong>and</strong> r k , respectively. Now (11) can be reformulated as (take<br />

˜R k = ¯p k (A ∗ ) ˜R0 )<br />

P k r k+1 = P k r k − P k C k ⃗α k ⊥ ˜R0<br />

v = P k r k+1<br />

for j = 1,...,s<br />

s = Av<br />

P k C k+1 e j = s − P k C k<br />

⃗β j ⊥ ˜R0<br />

v = P k C k+1 e j<br />

end for (12)<br />

Note that, for obtaining a formula for computing ⃗α k <strong>and</strong> ⃗β j = ⃗β (k) there is no need to refer to (11): it suffices to refer to<br />

j<br />

the orthogonality conditions in (12).<br />

To repeat the k-loop, represented in (12), k has to be increased, that is, P k+1 r k+1 <strong>and</strong> P k+1 C k+1 have to be computed<br />

from P k r k+1 <strong>and</strong> P k C k+1 , respectively. If P k+1 = (I − ω k A)P k , then these quantities can be obtained simply by multiplication<br />

by I − ω k A. This naive approach would require s + 1 additional multiplications by A. However, in the process of updating<br />

P k C k+1 , the vectors AP k r k+1 <strong>and</strong> AP k C k+1 e j ( j = 1,...,s − 1) are computed. The following loop exploits this information.<br />

P k r k+1 = P k r k − AP k U k α k ⊥ ˜R0<br />

v = P k r k+1 , s = Av, AP k r k+1 = s<br />

for j = 1,...,s<br />

AP k U k+1 e j = s − AP k U k β j ⊥ ˜R0<br />

P k U k+1 e j = v − P k U k β j<br />

v = AP k U k+1 e j , s = Av, A 2 P k U k+1 e j = s<br />

end for<br />

Select an ω k , P k+1 r k+1 = P k r k+1 − ω k AP k r k+1<br />

for i = 0, 1, A i P k+1 U k+1 = A i P k U k+1 − ω k A i+1 P k U k+1 , end for (13)<br />

Here, AU k equals C k .WeusedU k in our formulation here since we also need A 2 U k <strong>and</strong> we want to limit the number of<br />

different symbols.<br />

Computing P k+1 r k+1 as soon as P k r k+1 is available, <strong>and</strong> computing P k+1 C k+1 e j as soon as P k C k+1 e j , is available leads to<br />

the following variant that requires less memory.<br />

P k r k+1 = P k r k − P k C k α k ⊥ ˜R0<br />

v = P k r k+1 , s = Av<br />

Select an ω k , P k+1 r k+1 = v − ω k s<br />

for j = 1,...,s<br />

v = s − P k C k β j ⊥ ˜R0<br />

s = Av, P k+1 C k+1 e j = v − ω k s<br />

end for (14)<br />

The additional matrix P k C k+1 need not to be stored. The formation of P k C k+1 <strong>and</strong> AP k C k+1 in the last step of (13) may<br />

be superfluous if P k+1 r k+1 = P k r k+1 − ω k AP k r k+1 turns out to be small enough. Unfortunately, this step cannot be avoided<br />

in (13), whereas (14) allows a ‘break’ after the computation of P k+1 r k+1 . Nevertheless, the approach in (13) may be useful,<br />

since it may allow easier generalization to other hybrid <strong>Bi</strong>-CG variants. Note that in coding (13) the matrix update of<br />

P k+1 C k+1 can exploit BLAS2 subroutines with better parallelization properties then the sequence of vector updates in (14).<br />

With r k ≡ P k r k , S k ≡ P k C k <strong>and</strong> v k ≡ P k r k+1 , we obtain the algorithm as displayed in the left panel of Algorithm 4.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!