
AIR Tools - A MATLAB Package
for Algebraic Iterative
Reconstruction Techniques

Maria Saxild-Hansen

Kongens Lyngby 2010


Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 45882673
reception@imm.dtu.dk
www.imm.dtu.dk


Summary

In this master thesis a MATLAB package, AIR Tools, with implementations of several iterative algebraic reconstruction methods for discretized tomography problems is developed. The focus is mainly on two classes of methods: Simultaneous Iterative Reconstruction Technique (SIRT) and Algebraic Reconstruction Techniques (ART). The package also includes three simplified test problems from medical and seismic tomography.

For each iterative method a number of strategies for choosing the relaxation parameter and the stopping rule are presented and implemented. The relaxation parameter can be chosen as a fixed parameter or adaptively in each iteration. For the fixed case a training strategy is developed for finding the optimal parameter for a given test problem. The stopping rules provided in the package are the Discrepancy Principle, the Monotone Error Rule and the NCP criterion. For the first two rules a training strategy is also provided for finding an optimal stopping parameter.

In addition, simulation studies and comparisons of the performance of the available methods and strategies are presented and discussed.

The thesis also includes manual pages describing the use of each implemented routine.

KEYWORDS: ART methods, SIRT methods, iterative methods, semi-convergence, relaxation parameters, stopping rules, tomography.


Resumé

In this thesis a MATLAB program package, AIR Tools, is developed with implementations of several iterative algebraic reconstruction methods for discretized tomography problems. The primary focus is on two classes of methods: Simultaneous Iterative Reconstruction Technique (SIRT) and Algebraic Reconstruction Techniques (ART). The package also contains three simple test problems from medical and seismic tomography.

For each iterative method a number of strategies for choosing the relaxation parameter, as well as stopping rules, are presented and implemented. The relaxation parameter can either be chosen as a constant or adaptively in each iteration. For the constant case a training strategy has been developed for finding the optimal value for a given test problem. The stopping rules available in the package are the Discrepancy Principle, the Monotone Error Rule and the NCP criterion. For the first two rules a training strategy is provided for finding the optimal value of the stopping parameter.

Furthermore, studies and comparisons of the behaviour of the methods and strategies are presented and discussed.

The thesis also contains manual pages for each implemented function, describing its use.

KEYWORDS: ART methods, SIRT methods, iterative methods, semi-convergence, relaxation parameters, stopping rules, tomography.


Preface

This master thesis was prepared at the Department of Informatics and Mathematical Modelling, Technical University of Denmark (DTU), and marks the completion of the master's degree in Mathematical Modelling and Computation. It represents a workload of 35 ECTS points and has been prepared during a seven-month period from August 31 to March 31. The study has been conducted under the supervision of Professor Per Christian Hansen.

I would like to thank a few people for helping me with this project. I would like to thank Tommy Elfving, Professor of Scientific Computing at the Department of Mathematics, Linköping University, who, through a visit to DTU in November 2009, provided valuable insight into the theory of iterative methods. I would also like to thank Klaus Mosegaard, Professor at DTU Informatics, for his assistance in creating a seismic tomography problem and a useful test phantom, and Ph.D. student Jakob Heide Jørgensen for assistance in creating an algorithm for the tomography test problems. Finally, I would like to thank my family and friends, especially Katrine Lange and Elin A. Larsen, for their assistance and for keeping up my spirits.

Kgs. Lyngby, 31st March 2010

Maria Saxild-Hansen


List of Symbols

The following is a list of symbols used throughout the thesis. Be aware that the list only contains symbols which are used for the same purpose throughout the thesis; it is therefore not complete, since only frequently used symbols are included. Also be aware that some symbols have multiple meanings; the meaning will, however, be clear from the context.

Symbol       Quantity                                                  Dimension
A            coefficient matrix                                        m × n
a^i          i'th row of the matrix A                                  n
a_j          j'th column of the matrix A                               m
a_ij         element in the i'th row and the j'th column of A          scalar
b            right-hand side                                           m
b̄            exact right-hand side                                     m
b_i          i'th element of the vector b                              scalar
δ            the noise level                                           scalar
I            identity matrix
k            iteration number                                          scalar
λ_k          relaxation parameter                                      scalar
M            symmetric positive definite matrix for the SIRT methods   m × m
m, n         matrix dimensions                                         scalars
Φ^k(σ, λ)    iteration-error                                           scalar
φ_i          filter factor                                             scalar
Φ            diagonal matrix of filter factors                         n × n
ϖ            average number of nonzero elements in a row               scalar
Ψ^k(σ, λ)    noise-error                                               scalar
ρ            spectral radius                                           scalar
Σ            diagonal matrix with all singular values                  m × n
σ_i          singular value of a matrix                                scalar
s_j          number of nonzero elements in the j'th column             scalar
τ            the stopping parameter                                    scalar
τ_1          parameter for the modified Ψ1-based relaxation            scalar
τ_2          parameter for the modified Ψ2-based relaxation            scalar
T            symmetric positive definite matrix for the SIRT methods   n × n
U            matrix with all left singular vectors                     m × m
u_i          i'th left singular vector                                 m
V            matrix with all right singular vectors                    n × n
v_i          i'th right singular vector                                n
w            weighting vector                                          m
w_i          i'th element of the weighting vector                      scalar
x^k          solution in the k'th iteration                            n
x̄            exact solution                                            n
H_i          the i'th hyperplane
P_i(·)       projection
R_i(·)       reflection
⟨·, ·⟩       inner product, i.e. ⟨x, y⟩ = x^T y
‖·‖_2        2-norm
NNZ(·)       number of nonzero elements                                scalar


Contents

Summary
Resumé
Preface
List of Symbols
List of Figures

1 Introduction
  1.1 Structure of the Thesis

2 Theory of Inverse Problems and Regularization
  2.1 Discrete Ill-Posed Problems
  2.2 SVD and Picard Condition
  2.3 Spectral Filtering
  2.4 Iterative Methods and Semi-Convergence
  2.5 Resolution Limit

3 Iterative Methods for Reconstruction
  3.1 Simultaneous Iterative Reconstructive Technique (SIRT)
  3.2 Algebraic Reconstruction Techniques (ART)
  3.3 Considerations Towards the Package
  3.4 Block-Iterative Methods

4 Semi-Convergence and Choice of Relaxation Parameter
  4.1 Semi-Convergence for SIRT Methods
  4.2 Choice of Relaxation Parameter

5 Stopping Rules
  5.1 Stopping Rules with Training
  5.2 Normalized Cumulative Periodogram

6 Test Problems

7 Testing the Methods
  7.1 Convergence of DROP
  7.2 Symmetric Kaczmarz as a SIRT Method
  7.3 Test of the Choice of Relaxation Parameter
  7.4 Stopping Rules
  7.5 Relaxation Strategies Combined with Stopping Rules

8 Manual Pages

9 Conclusion and Future Work
  9.1 Future Work

A Appendix
  A.1 Orthogonal Projection on a Hyperplane
  A.2 Investigation of the Roots
  A.3 Work Units for the SIRT and ART Methods

Bibliography


List of Figures

2.1 SVD basis
2.2 Picard plot
2.3 Illustration of basic semi-convergence
3.1 Cimmino's reflection method
3.2 Cimmino's projection method
3.3 Kaczmarz's method
3.4 Symmetric Kaczmarz
4.1 Behaviour of Φ^k(σ, λ) and Ψ^k(σ, λ)
4.2 Ψ^k(σ, λ) as a function of σ
4.3 Relative error histories for nine values of λ
4.4 The minimum relative errors for different λ-values
4.5 Optimal number of iterations for a SIRT method
4.6 Relative error histories for an ART method
4.7 The minimum relative errors for different λ-values for an ART method
4.8 Optimal number of iterations for an ART method
4.9 Relative error histories for a SIRT method with maximum number of iterations
4.10 Minimum relative error for a SIRT method with maximum number of iterations
4.11 Optimal number of iterations for a SIRT method with maximum number of iterations
4.12 Illustration of line search
6.1 Parallel beam illustration
6.2 Fan beam illustration
6.3 Seismic tomography illustration
6.4 The two exact phantoms
7.1 Relative error histories for test of DROP
7.2 Relative error histories for test of DROP using weighting
7.3 Ψ-based relaxations for symmetric Kaczmarz
7.4 Training of relaxation parameter using Cimmino's projection method
7.5 Training of relaxation parameter using Kaczmarz's method
7.6 Training of relaxation parameter using randomized Kaczmarz
7.7 Relative errors for the SIRT methods with trained λ
7.8 Relative errors for the ART methods with trained λ
7.9 Training of relaxation parameter using Cimmino's projection method with maximum number of iterations
7.10 Training of relaxation parameter using Kaczmarz's method with maximum number of iterations
7.11 Training of relaxation parameter using randomized Kaczmarz method with maximum number of iterations
7.12 Relative error for the SIRT methods using line search
7.13 Relative error using the Ψ-based relaxations
7.14 Relative error using the modified Ψ-based relaxations
7.15 Relative errors for the SNARK test problem with different relaxation strategies
7.16 Training of stopping rule for Cimmino's projection method
7.17 Training of stopping rule for DROP
7.18 Training of stopping rule for Kaczmarz's method
7.19 Illustration of the stopping rules for the SIRT methods
7.20 Illustration of the stopping rules for the ART methods
7.21 Ψ-based relaxation with stopping rules
7.22 Line search with stopping rules
7.23 Training λ with stopping rules for SIRT methods
7.24 Training λ with stopping rules for ART methods
A.1 Illustration of projection on a hyperplane where the origin is in the hyperplane
A.2 Illustration of projection on a hyperplane where the origin is not in the hyperplane
A.3 Illustration of the roots
A.4 Zoom of the roots


Chapter 1

Introduction

In the first half of the 20th century the Polish mathematician Stefan Kaczmarz [27] and the Italian mathematician Gianfranco Cimmino [8] independently developed iterative algorithms for solving linear systems. In 1970 Gordon, Bender and Herman rediscovered Kaczmarz's method and applied it in medical imaging [17]. They called the method ART (Algebraic Reconstruction Technique), and when Hounsfield patented the first CT scanner in 1972, which earned him, together with Cormack, the Nobel Prize in 1979, the classical methods found their practical purpose in tomography [24]. The word tomography means reconstruction from slices. After the invention of the CT scanner several new methods related to the old classical ones were developed.

This master thesis deals with the classical methods of Kaczmarz and Cimmino, but also with the methods related to them. We divide the gathered methods into two main categories, the SIRT and the ART methods, and present strategies for choosing the relaxation parameter as well as different stopping rules. We will compare the performance of the different methods and strategies on a test problem derived from medical tomography.


1.1 Structure of the Thesis

The goal of the project is to develop and implement a MATLAB package containing a number of iterative methods for algebraic reconstruction used in tomography problems. This includes describing the methods in a common framework, such that the methods are presented in the same notation and the created functions have similar interfaces. Furthermore, strategies for choosing the relaxation parameter must be available, just as different stopping rules must be included. A few test problems relevant for this kind of method must also be implemented. A critical comparison of the different methods and strategies applied to different test problems will be produced. Finally, the thesis will have the form of an extended manual, containing chapters with theory as well as manual pages for each implemented routine.

The chapters of the thesis are organized in the following way:

• Chapter 2: We begin by giving a short presentation of inverse problem theory and defining the concepts of semi-convergence for iterative methods and of resolution limit.

• Chapter 3: In this chapter we introduce the theory of the gathered SIRT and ART methods which this package concerns. We also provide a brief overview of block-iterative methods.

• Chapter 4: In the next chapter we examine the semi-convergence behaviour of a subset of the SIRT methods. After this examination we introduce different strategies for choosing the relaxation parameter, where one of the strategies is based on the examination of semi-convergence.

• Chapter 5: In this chapter we introduce three strategies for the stopping rules. To devise effective stopping rules, a training strategy is introduced for two of the stopping rules.

• Chapter 6: We introduce in this chapter three different test problems, where two of the test problems arise from medical tomography and the third arises from seismic tomography.

• Chapter 7: This chapter discusses the performance of the methods. We also examine the performance of the methods when the different strategies for choosing the relaxation parameter and the different stopping rules are used. Furthermore, we compare the performance of the SIRT and the ART methods.


• Chapter 8: This chapter contains an overview of the implemented routines, followed by an individual manual page for each function in the package. The manual pages are arranged alphabetically.

• Chapter 9: This chapter contains the conclusion and suggestions for future work.

All the routines have been implemented in MATLAB 7.8. To produce the test results, examples and figures, a large number of scripts have been created, but only the relevant functions are included in the package.




Chapter 2

Theory of Inverse Problems and Regularization

Inverse problems arise in many applications in science and technology, for example in medical imaging (e.g. CT scanning), in geophysical prospecting, and in image deblurring. We will in this chapter introduce some of the fundamental concepts of inverse problems. We first introduce the concept of an inverse problem and describe what defines an ill-posed problem. Then the important tools of the SVD and the discrete Picard condition are defined, followed by a few examples of spectral filtering. Finally, we give a short description of semi-convergence for iterative methods and define the concept of resolution limit.

2.1 Discrete Ill-Posed Problems

Inverse problems arise when we need to compute information that is either internal or hidden. In the forward problem we have a known input and a known system, and we can then compute the output. In the inverse problem the output is often known with errors, and we then have to compute either the system or the input, the other one being known. For linear problems we let the system be represented by the matrix A ∈ R^{m×n}, the output by the right-hand side b ∈ R^m, which is the known data, and the input by the solution x ∈ R^n. The problem can be formulated as a system of linear equations:

    Ax = b,    (2.1)

where the matrix A typically is a discretization of an ill-posed problem, e.g. the Radon transform. The system (2.1) is said to be overdetermined when m > n and underdetermined when m < n.

The definition of a well-posed problem goes back to Hadamard, who stated that a problem is well-posed if it satisfies the following requirements:

Existence: There exists a solution to the problem.
Uniqueness: There exists only one solution to the problem.
Stability: The solution must depend continuously on the data.

If one of the three conditions is not satisfied, the problem is said to be ill-posed.

2.2 SVD and Picard Condition

An important tool in analysing inverse problems is the singular value decomposition (SVD). The SVD is defined for any matrix A ∈ R^{m×n} as

    A = ∑_{i=1}^{min{m,n}} u_i σ_i v_i^T,

where the vectors u_i and v_i are orthonormal, and

    σ_1 ≥ σ_2 ≥ ... ≥ σ_{min{m,n}} ≥ 0.

The elements σ_i are the singular values, and the rank of the matrix A is equal to the number of positive singular values. Assuming that the inverse of A exists, it is given as

    A^{-1} = ∑_{i=1}^{min{m,n}} (1/σ_i) v_i u_i^T.


[Figure 2.1: The first 9 left singular vectors u_i for the test problem shaw.]

[Figure 2.2: The Picard plot for the test problem shaw, showing σ_i, |u_i^T b| and |u_i^T b|/σ_i. (a) Picard plot with no noise; (b) Picard plot with noise level δ = 10^{-3}.]


Using this we can write the naive solution as

    x = A^{-1} b = ∑_{i=1}^{min{m,n}} (u_i^T b / σ_i) v_i.

Figure 2.1 shows the first nine left singular vectors u_i for the test problem shaw from Regularization Tools [21] with white noise level δ = 10^{-3}. We see that the singular vectors have more oscillations as i increases, while the corresponding singular values σ_i decrease.

We will now investigate the behaviour of the SVD coefficients ⟨u_i, b⟩ and ⟨u_i, b⟩/σ_i. We call a plot of these coefficients together a Picard plot. Figure 2.2 shows the Picard plot for the test problem shaw with n = 50. The left plot (a) shows the Picard plot when no noise is added to the right-hand side. We notice that the SVD coefficients |⟨u_i, b⟩| decay faster than the singular values σ_i. This continues until i ≥ 18, where the coefficients level off; we recognize the reached level as the machine precision. We also notice that the solution coefficients ⟨u_i, b⟩/σ_i decay as well, but for i ≥ 18 they increase due to the inaccuracy of the coefficients ⟨u_i, b⟩. We therefore cannot expect to get a meaningful solution to the inverse problem, since the influence of the rounding errors destroys the computed solution.

The plot to the right (b) shows the same problem, but with a noisy right-hand side. In this plot the SVD coefficients |⟨u_i, b⟩| also decay until a certain level where they level off; this level is determined by the added noise. The solution coefficients ⟨u_i, b⟩/σ_i also decay in the beginning, but increase again when the SVD coefficients |⟨u_i, b⟩| level off. In this case the computed solution is totally dominated by the SVD components corresponding to the smaller singular values.

In this connection we introduce the discrete Picard Condition.

Definition 2.1 (Discrete Picard Condition) The discrete Picard Condition is satisfied if, for all singular values σ_i greater than τ, the corresponding coefficients |⟨u_i, b⟩| on average decay faster than σ_i, where τ denotes the level at which the computed singular values level off due to rounding errors.

Notice that the Picard Condition concerns the decay, not the size, of the singular values and the coefficients |⟨u_i, b⟩|. If the discrete Picard condition is not satisfied, we cannot expect to solve a discrete ill-posed problem.
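To make the Picard plot concrete, here is a minimal MATLAB sketch of how figure 2.2 can be reproduced. It assumes that the function shaw from Regularization Tools is on the path, and the noise scaling shown is one simple choice of white noise with relative level 10^{-3}, not necessarily the exact one used for the figures:

    n = 50;
    [A,bbar,xbar] = shaw(n);                 % test problem with exact data
    e = randn(n,1);                          % white Gaussian noise
    b = bbar + 1e-3*norm(bbar)*e/norm(e);    % noisy right-hand side
    [U,S,V] = svd(A);
    sigma = diag(S);
    beta  = abs(U'*b);                       % SVD coefficients |<u_i,b>|
    semilogy(1:n, sigma, '.-', 1:n, beta, 'x', 1:n, beta./sigma, 'o')
    legend('\sigma_i', '|u_i^T b|', '|u_i^T b| / \sigma_i'), xlabel('i')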


2.3 Spectral Filtering

Due to the difficulties associated with discrete inverse problems, the naive solution x = A^{-1} b is useless, since it becomes dominated by the rounding errors. We will in this section introduce two spectral filtering methods, which can be expressed as a filtered SVD expansion of the form

    x_filter = ∑_{i=1}^{min{m,n}} φ_i (⟨u_i, b⟩/σ_i) v_i,

where φ_i are the filter factors for the corresponding method. We first introduce the truncated SVD method (TSVD).

We realised that the large errors in the naive solution came from the noisy SVD coefficients corresponding to the smallest singular values, but we also noticed that the SVD coefficients for large singular values were useful, since these coefficients fulfilled ⟨u_i, b⟩/σ_i ≃ ⟨u_i, b̄⟩/σ_i, where b is the noisy right-hand side and b̄ is the right-hand side without noise. This leads to the truncated SVD (TSVD) method, where we choose to include only the first k components of the naive solution x. With this method we therefore cut off those SVD coefficients that are dominated by inverted noise. We define the TSVD solution as

    x_k = ∑_{i=1}^{k} (⟨u_i, b⟩/σ_i) v_i,

where k is called the truncation parameter and must be chosen such that all the noise-dominated SVD coefficients are discarded. This corresponds to the following filter factors for the TSVD method:

    φ_i = 1 for i ≤ k,  φ_i = 0 for i > k.

The second method we introduce is Tikhonov regularization. For this method the filter factors are defined as

    φ_i = σ_i^2 / (σ_i^2 + ω^2), i = 1, ..., n,

where ω is the regularization parameter, which in a sense corresponds to the truncation parameter k. Tikhonov regularization corresponds to the following minimization problem:

    min_x { ‖Ax − b‖_2^2 + ω^2 ‖x‖_2^2 }.


We notice that for σ_i ≫ ω the filter factors are close to 1, and the corresponding SVD components contribute to x_filter with almost full strength. On the other hand, when σ_i ≪ ω the filter factors are close to σ_i^2/ω^2, and the SVD components are damped, or filtered.

[Figure 2.3: The basic concept of semi-convergence: the iterates x^0, x^1, x^2, ... first approach the exact solution, with an optimal iterate x^{k_opt}, before converging to the naive solution A^{-1} b.]
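As an illustration, both filtered solutions can be formed directly from the SVD. The following sketch assumes a square A, as in the shaw example above, and the values of k and ω are arbitrary choices for the example, not recommendations:

    [U,S,V] = svd(A);
    sigma = diag(S);
    c = U'*b;                                % SVD coefficients u_i^T b
    k = 10;                                  % TSVD truncation parameter (example value)
    x_tsvd = V(:,1:k)*(c(1:k)./sigma(1:k));
    omega = 1e-2;                            % Tikhonov parameter (example value)
    phi = sigma.^2./(sigma.^2 + omega^2);    % Tikhonov filter factors
    x_tikh = V*(phi.*c./sigma);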

2.4 Iterative Methods and Semi-Convergence

For large problems where it is not feasible to compute the SVD, we need other methods than the introduced TSVD and Tikhonov regularization. This leads us to iterative methods, which require a user-specified starting vector x^0; from this vector the method produces a sequence of iterates x^1, x^2, ... that converge to some solution.

For iterative methods Natterer [31] introduced the concept of semi-convergence, which describes the behaviour of the iterates x^k. The first iterates tend to be better and better approximations of the exact solution, but at some point the iterates start to deteriorate and instead converge to the naive solution x = A^{-1} b; see figure 2.3. For iterative methods the regularization parameter is therefore the number of iterations.
regularization parameter is there<strong>for</strong>e the number of iterations.


2.5 Resolution Limit

When exploring the iterative methods which this package concerns, we need to define the concept of resolution limit. For a better understanding of this concept, recall that the relative error is defined as

    ‖x^k − x̄‖_2 / ‖x̄‖_2,

where x^k is the solution at the k'th iterate and x̄ is the exact solution.

The bound on how accurate a solution one can obtain is determined by the noise in the data, and it can be studied in terms of the SVD. We define the resolution limit to be this bound. The resolution limit depends not only on the noise, but also on the method used and on the given problem. We define the resolution limit as

    RL(A, b, method) = min_k ‖x^k − x̄‖_2 / ‖x̄‖_2.

With this definition the resolution limit depends on the method used and on the problem.
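In practice the resolution limit of a method on a given problem can be estimated by recording the smallest relative error over the iterations. A sketch, where onestep is a stand-in for one iteration of whichever method is used and is not a function from the package:

    x = zeros(n,1);  RL = inf;
    for k = 1:K
        x  = onestep(A, b, x);                       % one iteration of the method
        RL = min(RL, norm(x - xbar)/norm(xbar));     % smallest relative error so far
    end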




Chapter 3

Iterative Methods for Reconstruction

In this chapter we give a brief introduction to the theory of some iterative methods called SIRT and ART methods. The need for iterative methods arises when the dimensions of the matrix A become so large that direct factorization methods become infeasible, which is usually the case in two and three dimensions. This is typically the case when A is a discretization arising from a real-world problem. One can then use iterative methods instead of the well-known Tikhonov regularization or TSVD described in section 2.3. Whereas Tikhonov regularization has the regularization parameter ω, for the iterative methods the number of iterations k plays the role of the regularization parameter.

In the following theory we will assume that all elements of the matrix A are nonnegative. In the articles where the methods are defined they do not include user-defined weights, but we have chosen to include them in both the description and the implementation.
the description and the implementation.


3.1 Simultaneous Iterative Reconstructive Technique (SIRT)

In this section we present the class of iterative methods which we call Simultaneous Iterative Reconstructive Technique (SIRT). As the name indicates, all the methods of this class are simultaneous, which means that information from all the equations is used at the same time.

In the literature the class of SIRT methods is also referred to as Landweber-type methods, since the Landweber iteration is one of the classical methods of the SIRT class. The common property of the SIRT methods is that they can be written in the following general form:

    x^{k+1} = x^k + λ_k T A^T M (b − A x^k), k = 0, 1, ...    (3.1)

where x^k denotes the current iteration vector, x^{k+1} denotes the new iteration vector, λ_k is the relaxation parameter, and the matrices M and T are symmetric positive definite. The different methods correspond to different choices of the matrices M and T. In most of the presented methods we have T = I.

For the methods given on the form (3.1) with T = I, the following convergence theorem has been shown [4], [25].

Theorem 3.1 The iterates of the form (3.1) with T = I converge to a solution x̂ of min_x ‖Ax − b‖_M if and only if

    0 < ε ≤ λ_k ≤ 2/σ_1^2 − ε,

where ε is an arbitrarily small but fixed constant and σ_1 is the largest singular value of M^{1/2} A. If in addition x^0 ∈ R(A^T), then x̂ is the unique solution of minimum 2-norm.

Theorem 3.1 is useful since it ensures convergence of the SIRT methods in general. The condition was originally only proved to be sufficient for convergence, but in [35] it is shown that it is also necessary, as stated in the theorem.
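The general form (3.1) translates almost directly into MATLAB. The following generic loop, with T = I and a fixed relaxation parameter, is a simplified sketch for illustration, not the package's actual implementation:

    function x = sirt_general(A, M, b, lambda, K, x0)
    % Generic SIRT iteration (3.1) with T = I:
    % x^{k+1} = x^k + lambda*A'*M*(b - A*x^k).
    x = x0;
    for k = 1:K
        x = x + lambda*(A'*(M*(b - A*x)));
    end
    end

The specific SIRT methods below then differ only in how the diagonal matrix M (and, for DROP and SART, also T) is constructed.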

3.1.1 Classical Landweber Method

The classical Landweber method was first introduced by Landweber in [29], and it has often been used for image reconstruction. It can be written as follows:

    x^{k+1} = x^k + λ_k A^T (b − A x^k), k = 0, 1, ...,    (3.2)

which corresponds to setting M = T = I in (3.1).

The iterates x^k from (3.2) can be expressed as filtered SVD solutions. If we let the SVD of the matrix A take the form

    A = U Σ V^T = ∑_{i=1}^{n} u_i σ_i v_i^T,

then the filtered solution can be written as

    x^k = V Φ^k Σ^{-1} U^T b,

where Φ^k is given as

    Φ^k = diag(φ^k_1, ..., φ^k_n).

The filter factors φ^k_i for i = 1, ..., n are given as

    φ^k_i = 1 − (1 − λ σ_i^2)^k.

For small singular values σ_i we have φ^k_i ≈ k λ σ_i^2, showing that they decay at the same rate as the Tikhonov filter factors described in section 2.3.

3.1.2 Generalized Landweber

Another classical method is the generalized Landweber iteration, described in [20] and [33]. It has the following form:

    x^{k+1} = x^k + λ T A^T (b − A x^k), k = 0, 1, ...,

where λ is a constant relaxation parameter and T is a "shaping matrix" given by

    T = F(A^T A),

where F is a rational function of A^T A. We obtain the classical Landweber method when F = I.

The filter factors φ^k_i for the generalized Landweber method are given by

    φ^k_i = 1 − (1 − σ_i^2 F(σ_i^2))^k,

since the eigenvalue decomposition of F(A^T A) is given as

    F(A^T A) = ∑_{i=1}^{n} v_i F(σ_i^2) v_i^T.

We see that the generalized Landweber method gives a further impact on the filter factors, since the function F occurs in them. It is also possible to choose the function in such a way that the method approximates, say, the TSVD or the Tikhonov regularization.

3.1.3 Cimmino's Method

Another method in the SIRT class is Cimmino's method, which was introduced in [8]. Cimmino's method was originally based on reflections in hyperplanes, but there also exists a version with projections.

To introduce the two versions of Cimmino's method, we define H_i to be the hyperplane for the linear equation ⟨a^i, x⟩ = b_i:

    H_i = {x ∈ R^n | ⟨a^i, x⟩ = b_i}, for i = 1, ..., m.

We will introduce both versions of Cimmino's method, starting with the original one that uses reflections.
the original that uses reflections.


The idea behind Cimmino's reflection method is that the next iterate is found using an equal weighting of the reflections of x^k in the hyperplanes H_i. The reflection of a point z in the hyperplane H_i is

    R_i(z) = z + 2 (b_i − ⟨a^i, z⟩)/‖a^i‖_2^2 a^i.

The reflection method then uses the average of the reflections of x^k in the hyperplanes H_i to determine the direction of the step to the new iterate; figure 3.1 illustrates the concept in R^2 for a consistent problem.

[Figure 3.1: Cimmino's reflection method: the reflections R_1(z) and R_2(z) of a point z in the hyperplanes H_1 and H_2 in R^2.]

The method can then be written as follows:

    x^{k+1} = x^k + λ_k (1/m) ∑_{i=1}^{m} w_i (R_i(x^k) − x^k),

where the relaxation parameter λ_k determines how much of the step is taken from x^k to the new iterate x^{k+1}, and w_i > 0 are user-defined weights. Using the definition of the reflections we get:

    x^{k+1} = x^k + λ_k (2/m) ∑_{i=1}^{m} w_i (b_i − ⟨a^i, x^k⟩)/‖a^i‖_2^2 a^i, for k = 0, 1, ....

Cimmino's reflection method can be written using matrix notation on the form (3.1), with M = (2/m) diag(w_i/‖a^i‖_2^2) for i = 1, ..., m and T = I.

We will now introduce Cimmino's projection method. Using an equal weighting of all the equations, the next iterate in Cimmino's projection method is found using the orthogonal projections of x^k onto the hyperplanes H_i. As shown in appendix A.1, the orthogonal projection of a vector z onto the hyperplane H_i is

    P_i(z) = z + (b_i − ⟨a^i, z⟩)/‖a^i‖_2^2 a^i.    (3.3)

Cimmino's projection method uses the average of the projections of x^k onto the hyperplanes H_i to determine the direction of the step to the new iterate. Figure 3.2 illustrates the concept in R^2 for a consistent problem.

[Figure 3.2: Cimmino's projection method: the projections P_1(z) and P_2(z) of a point z onto the hyperplanes H_1 and H_2 in R^2.]

The new iterate can then be described as the current iterate plus a contribution from the average of the found step directions. We can therefore write Cimmino's projection method as

    x^{k+1} = x^k + λ_k (1/m) ∑_{i=1}^{m} w_i (P_i(x^k) − x^k),

where the relaxation parameter λ_k determines how much of the step is taken from x^k to the new iterate x^{k+1}, and w_i > 0 for i = 1, ..., m are user-defined weights.

Using the definition of the orthogonal projection (3.3) we can rewrite the expression:

    x^{k+1} = x^k + λ_k (1/m) ∑_{i=1}^{m} w_i (b_i − ⟨a^i, x^k⟩)/‖a^i‖_2^2 a^i, for k = 0, 1, ....

Using matrix notation, Cimmino's projection method has the general form (3.1) with M = (1/m) diag(w_i/‖a^i‖_2^2) for i = 1, ..., m and T = I.
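In MATLAB the matrix M for Cimmino's projection method can be built as a sparse diagonal matrix and passed to a generic loop such as sirt_general above; the unit weights and the values of λ and the iteration count are just example choices:

    [m,n]   = size(A);
    w       = ones(m,1);                     % user-defined weights w_i > 0
    rownorm = full(sum(A.^2,2));             % ||a^i||_2^2 for each row
    M = spdiags(w./(m*rownorm), 0, m, m);    % M = (1/m) diag(w_i/||a^i||_2^2)
    x = sirt_general(A, M, b, 1, 50, zeros(n,1));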

3.1.4 Component Averaging (CAV)

Component averaging (CAV) was introduced in [6] and is an extension of Cimmino's method. In Cimmino's method we use an equal weighting of the contributions from the projections. In the case where the matrix A is dense, it seems fair that all contributions P_i(x^k) − x^k are equally weighted, but for a sparse A this is not the case.

The heuristic in CAV therefore includes a factor proportional to the number of nonzero elements. We let s_j denote the number of nonzero elements of column j:

    s_j = NNZ(a_j), for j = 1, ..., n,

and define ‖a^i‖_S^2 = ∑_{j=1}^{n} a_ij^2 s_j. Using this, the CAV algorithm is as follows:

    x^{k+1}_j = x^k_j + λ_k ∑_{i=1}^{m} w_i (b_i − ⟨a^i, x^k⟩)/‖a^i‖_S^2 a^i_j, for k = 0, 1, ...,

where w_i > 0 are user-defined weights.

We see that when A is dense we recover the original Cimmino method, since s_j = m for all j = 1, ..., n, and hence ‖a^i‖_S^2 = m ‖a^i‖_2^2.

To rewrite the CAV algorithm in matrix form we define S = diag(s_1, s_2, ..., s_n), with the s_j-values defined as above. We then let

    D_S = diag(w_i/‖a^i‖_S^2) for i = 1, ..., m,

where ‖a^i‖_S^2 = (a^i)^T S a^i, and the CAV algorithm takes the matrix form

    x^{k+1} = x^k + λ_k A^T D_S (b − A x^k),

which we recognize as (3.1) with M = D_S and T = I.
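The only change from Cimmino's projection method is the column-count weighting in the row norms; a sketch, reusing w and sirt_general from above:

    s  = full(sum(A ~= 0, 1))';              % s_j = NNZ(a_j), nonzeros per column
    normS = (A.^2)*s;                        % ||a^i||_S^2 = sum_j a_ij^2 * s_j
    DS = spdiags(w./normS, 0, m, m);
    x  = sirt_general(A, DS, b, 1, 50, zeros(n,1));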

3.1.5 Diagonally Relaxed Orthogonal Projections (DROP)

Another method in the SIRT class is the diagonally relaxed orthogonal projections (DROP) method, described in [5]. This method is another extension of Cimmino's method, inspired by the CAV method. In the DROP method we also introduce a user-defined weighting of the equations, denoted w_i > 0.

The DROP method can be written as

    x^{k+1} = x^k + λ_k ∑_{i=1}^{m} w_i S^{-1} (P_i(x^k) − x^k),

where P_i(x^k) is defined as in (3.3) and S is defined as above for CAV. Using (3.3) we can rewrite the DROP algorithm into the following form:

    x^{k+1}_j = x^k_j + λ_k (1/s_j) ∑_{i=1}^{m} w_i (b_i − ⟨a^i, x^k⟩)/‖a^i‖_2^2 a^i_j,

for all j = 1, 2, ..., n. Recall that w_i > 0 for all i = 1, ..., m are user-chosen weights. When w_i = 1 for all i and the matrix A is dense, i.e. s_j = m for all j = 1, ..., n, we recover Cimmino's method.

The DROP method has the following matrix form:

    x^{k+1} = x^k + λ_k S^{-1} A^T D (b − A x^k),    (3.4)

which we recognize as the general form with T = S^{-1} and M = D = diag(w_i/‖a^i‖_2^2).

Since the DROP method has T ≠ I, we cannot use theorem 3.1, and we therefore make a further investigation of the convergence theory. By defining y^k = S^{1/2} x^k and Ā = A S^{-1/2} we can rewrite (3.4) in another matrix form:

    y^{k+1} = y^k + λ_k Ā^T D (b − Ā y^k).

For this form it is known that λ_k must lie between 0 and 2/ρ(Ā^T D Ā). Using the definition of Ā we get that ρ(Ā^T D Ā) = ρ(S^{-1} A^T D A). In [5] it is shown that for the DROP method with w_i > 0 for all i = 1, ..., m, if D = diag(w_i/‖a^i‖_2^2) ∈ R^{m×m} and S^{-1} = diag(1/s_j) ∈ R^{n×n}, where s_j ≠ 0, then ρ(S^{-1} A^T D A) ≤ max{w_i | i = 1, ..., m}. We therefore have the following convergence theorem, which replaces theorem 3.1 for the DROP method, where ‖z‖_D = ⟨z, Dz⟩^{1/2} denotes the D-norm:

Theorem 3.2 Assume that w_i > 0 for all i = 1, ..., m. If for all k ≥ 0

    0 < ε ≤ λ_k ≤ (2 − ε)/max{w_i | i = 1, ..., m},

where ε is an arbitrarily small but fixed constant, then any sequence generated by (3.4) converges to a weighted least squares solution x* = argmin{‖Ax − b‖_D | x ∈ R^n}. If in addition x^0 ∈ R(S^{-1} A^T), then x* is the unique solution of minimum S-norm.
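Since DROP has T = S^{-1} ≠ I, the generic loop above needs one extra diagonal scaling. A sketch, assuming A has no zero columns so that all s_j > 0:

    D    = spdiags(w./rownorm, 0, m, m);     % M = diag(w_i/||a^i||_2^2)
    Sinv = spdiags(1./s, 0, n, n);           % T = S^{-1} = diag(1/s_j)
    x = zeros(n,1);
    for k = 1:K
        x = x + lambda*(Sinv*(A'*(D*(b - A*x))));   % the DROP iteration (3.4)
    end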

3.1.6 Simultaneous Algebraic Reconstruction Technique (SART)

The Simultaneous Algebraic Reconstruction Technique (SART) was developed in the ART setting [1], but it can be written in the general SIRT form (3.1), and we therefore categorize it as a SIRT method.

The SART method is written in the following matrix form:

    x^{k+1} = x^k + λ_k V^{-1} A^T W (b − A x^k),

where V = diag(ς_j) and W = diag(1/ς^i), with ς^i and ς_j denoting the row and column sums:

    ς^i = ∑_{j=1}^{n} a^i_j for i = 1, ..., m,
    ς_j = ∑_{i=1}^{m} a^i_j for j = 1, ..., n.

For this method we assume that a^i ≠ 0 and a_j ≠ 0, such that A does not contain any zero rows or columns.

Since the SART method has T ≠ I, we cannot use theorem 3.1. The convergence of SART was analysed independently by Censor and Elfving in [4] and by Jiang and Wang in [26]. Both showed that SART converges for relaxation parameters within the interval (0, 2).
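With a nonnegative A that has no zero rows or columns, the SART scalings are just row and column sums; a sketch:

    rowsum = full(sum(A,2));                 % row sums
    colsum = full(sum(A,1))';                % column sums
    W    = spdiags(1./rowsum, 0, m, m);
    Vinv = spdiags(1./colsum, 0, n, n);
    x = zeros(n,1);
    for k = 1:K
        x = x + lambda*(Vinv*(A'*(W*(b - A*x))));   % the SART iteration
    end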

3.2 Algebraic Reconstruction Techniques (ART)

We now introduce a different class of methods, which we denote algebraic reconstruction techniques (ART). All methods in the ART class are fully sequential, i.e., the equations are treated one at a time, each step depending on the result of the previous one.

3.2.1 Kaczmarz's Method

The classical and best-known method of the ART class is Kaczmarz's method [27]. The method is a so-called row-action method, since each iteration consists of a "sweep" through all the rows of the matrix A. Since the method uses one equation in each step, an iteration consists of m steps. Figure 3.3 shows an example of a sweep for the consistent case with relaxation parameter λ_k = 1.

The algorithm for Kaczmarz's method updates x^k in the following way:

    x^{k,0} = x^k,
    x^{k,i} = x^{k,i−1} + λ_k (b_i − ⟨a^i, x^{k,i−1}⟩)/‖a^i‖_2^2 a^i, i = 1, 2, ..., m,
    x^{k+1} = x^{k,m}.
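One sweep of Kaczmarz's method is a simple loop over the rows; the following sketch is written for clarity, not speed, and is not the package's implementation:

    rownorm = full(sum(A.^2,2));             % ||a^i||_2^2 for each row
    for k = 1:K                              % K iterations of one sweep each
        for i = 1:m
            ai = A(i,:)';
            x  = x + lambda*((b(i) - ai'*x)/rownorm(i))*ai;
        end
    end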


[Figure 3.3: Kaczmarz's method: a sweep from x^k to x^{k+1} through the hyperplanes H_1, ..., H_6.]

If the linear system (2.1) is consistent, Kaczmarz's method converges to a solution of the system. If the system is inconsistent, every subsequence of iterates converges, but not necessarily to a least squares solution.

In the literature Kaczmarz's method is also referred to as ART, which can be confusing since ART is also the name for algebraic reconstruction techniques in general.

Experiments have shown that Kaczmarz's method converges fast in the first iterations, after which the convergence becomes very slow. This is perhaps one of the reasons why this method was often used for tomography problems, where the solution is often found within few iterations.

By using SOR theory it can be shown that Kaczmarz's method with λ constant in each iteration can be written in the form (3.1), but then M_A is no longer symmetric [13]:

    x^{k+1} = x^k + λ A^T M_A (b − A x^k),    (3.5)

where M_A = (D + λL)^{-1}. Since M_A is not symmetric, we cannot use the theory derived for the SIRT methods. It can on the other hand be proved that for 0 < λ < 2 the iterations of Kaczmarz's method (3.5) converge to a solution of

    A^T M_A (b − A x) = 0.

[Figure 3.4: Symmetric Kaczmarz: a forward and a backward sweep through the hyperplanes H_1, ..., H_6, from x^k to x^{k+1}.]

3.2.2 Symmetric Kaczmarz

A variant of the Kaczmarz method is symmetric Kaczmarz. This method is also fully sequential, and it consists of one "sweep" of Kaczmarz's method followed by another "sweep" where the equations are used in reverse order. One iteration of the symmetric Kaczmarz method therefore consists of 2m − 2 steps. Figure 3.4 shows an example of an iteration for the consistent case with relaxation parameter λ_k = 1.

The algorithm for the symmetric Kaczmarz method is the following:

    x^{k,0} = x^k,
    x^{k,i} = x^{k,i−1} + λ_k (b_i − ⟨a^i, x^{k,i−1}⟩)/‖a^i‖_2^2 a^i, i = 1, ..., m, m−1, ..., 2,    (3.6)
    x^{k+1} = x^{k,1},

where x^{k,1} denotes the result of the last step in (3.6).


Symmetric Kaczmarz was introduced in [3], and as for Kaczmarz's method it can be rewritten in the form of the SIRT methods [14], with λ_k = λ:

    x^{k+1} = x^k + λ A^T M_SA (b − A x^k),

where M_SA is symmetric. This means that the theory for the SIRT methods is valid, but it is not practical to implement the method in this way.
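A sketch of one iteration of symmetric Kaczmarz, i.e. a forward sweep followed by the reverse sweep over rows m−1 down to 2, reusing rownorm from the Kaczmarz sketch above:

    for i = [1:m, m-1:-1:2]                  % 2m-2 steps per iteration
        ai = A(i,:)';
        x  = x + lambda*((b(i) - ai'*x)/rownorm(i))*ai;
    end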

3.2.3 Randomized Kaczmarz

The next method we introduce is the randomized Kaczmarz method. Experience has shown that Kaczmarz's method can converge very slowly to the solution. The method presented here was proposed in [36] and is proved to have an exponential expected rate of convergence; moreover, the rate does not depend on the number of equations in the system. The randomized Kaczmarz method has the following form:

    x^{k+1} = x^k + (b_{r(i)} − ⟨a^{r(i)}, x^k⟩)/‖a^{r(i)}‖_2^2 a^{r(i)},

where the index r(i) is chosen from the set {1, 2, ..., m} randomly, with probability proportional to ‖a^{r(i)}‖_2^2.

For the randomized Kaczmarz method we cannot speak of iterations but only of the number of steps.

In the definition of the randomized Kaczmarz method in [36] the method is presented without a relaxation parameter λ_k, but in our implemented algorithm this relaxation parameter is present. We emphasize that no convergence results exist for this parameter, and a safe choice is therefore λ_k = 1, since this recovers the method as originally presented.
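A sketch of m random steps of the method, drawing row indices with probability proportional to the squared row norms; the safe choice λ_k = 1 mentioned above is used implicitly:

    p = rownorm/sum(rownorm);                % sampling probabilities
    for step = 1:m
        i  = find(rand <= cumsum(p), 1);     % draw row index i with probability p(i)
        ai = A(i,:)';
        x  = x + ((b(i) - ai'*x)/rownorm(i))*ai;
    end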

3.2.4 Extended Kaczmarz Method

As mentioned earlier, Kaczmarz's method cannot provide a least squares solution in the inconsistent case, and therefore an extended Kaczmarz method was proposed. In this method we also consider orthogonal projections onto hyperplanes defined with respect to the columns of A. We let a_j denote the j'th column of A. The extended Kaczmarz method is given both in a version with and a version without relaxation parameters; we will here only consider the version with relaxation parameters. We let λ denote the constant relaxation parameter for the orthogonal projections on the rows of A, and we let α denote the constant relaxation parameter for the orthogonal projections using the columns.

The extended Kaczmarz method has the following algorithm, where x^0 ∈ R^n and y^0 = b:

    y^{k,0} = y^k,
    y^{k,j} = y^{k,j−1} − α (⟨a_j, y^{k,j−1}⟩/‖a_j‖_2^2) a_j, j = 1, ..., n,
    y^{k+1} = y^{k,n},
    b^{k+1} = b − y^{k+1},
    x^{k,0} = x^k,
    x^{k,i} = x^{k,i−1} + λ (b^{k+1}_i − ⟨a^i, x^{k,i−1}⟩)/‖a^i‖_2^2 a^i, i = 1, ..., m,
    x^{k+1} = x^{k,m}.

For the extended Kaczmarz method it is proved in [34] that for any x^0 ∈ R^n and for any λ, α ∈ (0, 2) the method converges to a least squares solution. This method is not implemented in the package.

3.2.5 Multiplicative ART

Another method in the ART class is the multiplicative ART (MART) method, proposed in [17]. For this method we assume that x^0 is an n-dimensional vector of all ones and that all elements of A lie between 0 and 1, i.e. 0 ≤ a_ij ≤ 1. The multiplicative ART method is given as

    x^{k+1}_j = (b_i / ⟨a^i, x^k⟩)^{a_ij} x^k_j,

where i = (k mod m) + 1. Originally, when the method was presented, it was assumed that all elements of A are either 0 or 1, but it has later been shown that if

• all the entries of A are between 0 and 1,
• A does not have zero rows,
• the system (2.1) has a nonnegative solution,

then multiplicative ART converges to the maximum-entropy solution of Ax = b, where the entropy is defined as

    maxent(x) = − ∑_{j=1}^{n} (x_j/(n x̄)) ln(x_j/(n x̄)),

with x̄ the average value of the x_j.

Table 3.1: Work units for one iteration of the SIRT and the ART methods.

Method                 WU
Landweber              2
Cimmino                2
CAV                    2
DROP                   2
SART                   2
Kaczmarz               4
Symmetric Kaczmarz     8
Randomized Kaczmarz    4

3.3 Considerations Towards the Package

We have now introduced some SIRT and ART methods. For the package we will only use some of them; the methods left out of the package were nevertheless interesting enough that they should be described and mentioned. In the package we use the SIRT methods Landweber, Cimmino (both the reflection and the projection version), CAV, DROP and SART. We have not implemented generalized Landweber, since there is no specific description of the T matrix. For the ART methods we have implemented Kaczmarz's method, symmetric Kaczmarz and randomized Kaczmarz. Extended Kaczmarz is not implemented, since it requires a choice of two relaxation parameters. The method MART is also left out of the package, since its algorithm is very different from the other methods.

We have two classes of methods which cannot be directly compared with respect to computational work, since they have different properties. Therefore we introduce the concept of a work unit (WU). We define a work unit to be one matrix-vector multiplication. In appendix A.3 the total work units per iteration are calculated for each of the implemented methods. The results are collected in table 3.1. We notice that all SIRT methods use 2 WU per iteration, while both Kaczmarz's method and randomized Kaczmarz use 4 WU per iteration, since we define one iteration of randomized Kaczmarz to be m random selections of a row. Since symmetric Kaczmarz uses twice as many steps per iteration as Kaczmarz's method, its work units per iteration is 8. This result will be used later to compare the performance of the SIRT and the ART methods.

When comparing the methods implemented in this package, the user should notice that due to the MATLAB implementation the SIRT methods are much faster than the ART methods. The user should also be aware that this is only the case because the implementation is done in MATLAB, where loops are slow; in another language there would not be this difference in running time. When implementing the SIRT methods in MATLAB a dilemma occurs between speed and memory. When creating the matrices M and T we have mostly chosen the fastest implementation, but in case of memory trouble most of the SIRT methods also have an alternative implementation which requires less memory at the cost of a slower running time. Where such alternative code exists, it can be found in the comments in the code.

In the following chapters it might seem to the user that we prefer the SIRT methods, since most of the remaining theory is for the SIRT methods, but this is only because a corresponding theory cannot be found for the ART methods.

3.4 Block-Iterative Methods

We will now look into the field of block-iterative methods, although they are not a part of this package. The idea of this class of methods is to partition the system (2.1) into so-called blocks of equations and treat each block according to the given iterative method by passing cyclically over all the blocks. Most of the theory for block-iterative methods is based on the assumption that equations can appear in more than one block, but in the following we will always look at the case of disjoint partitioning, i.e. every equation appears in exactly one block.

For the case of disjoint partitioning we have the following structure of the system:

$$A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_p \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix}, \qquad A^T = \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_q \end{pmatrix},$$


where p denotes the number of blocks for the linear system and q denotes the number of blocks for A^T.

For t = 1,...,p we let the index block B_t ⊆ {1,...,m} be an ordered subset of the form

$$B_t = \{\, i_1^t, i_2^t, \ldots, i_{m(t)}^t \,\},$$

where m(t) is the number of elements in B_t.

We will now introduce some block-iterative methods, but since this software package does not include block-iterative methods, we will only look at a small selection. Other block-iterative methods can be found in for example [34], [18].

3.4.1 Block-Iteration

The first block-iterative method we will introduce is called the Block-Iteration. This method was first proposed by Elfving and later generalized by Eggermont, Herman and Lent. The method is also known as the ordinary Block-Kaczmarz method. For x^0 ∈ R^n the algorithm can be written as:

$$x^{k,0} = x^k$$
$$x^{k,t} = x^{k,t-1} + \lambda_t A_t^T M_t (b^t - A_t x^{k,t-1}), \qquad t = 1, 2, \ldots, p$$
$$x^{k+1} = x^{k,p},$$

where the λ_t are relaxation parameters and the M_t are given symmetric positive definite matrices. In the algorithm originally proposed by Elfving we had M_t = (A_t A_t^T)^{-1} and λ_t = λ.

For p = 1, i.e. only one block, the method is given on the standard SIRT form (3.1) with T = I, and this is called a fully simultaneous iteration. With p = m we on the other hand have a fully sequential iteration, since each block consists of only one equation.

In [14] it is proven that if

$$0 < \epsilon \le \lambda_t \le \frac{2 - \epsilon}{\rho(A_t^T M_t A_t)},$$

for t = 1,...,p, then the Block-Iteration method converges.
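As an illustration (block-iterative methods are not in the package), one block-iteration of the Block-Kaczmarz method with Elfving's original choice M_t = (A_t A_t^T)^{-1} and a common λ could be sketched in MATLAB as follows, assuming the disjoint partition is given as a cell array idx of row-index sets:

% One block-iteration of the Block-Kaczmarz method (illustrative sketch).
% idx{t} holds the row indices of block t; the dense solve is only
% suitable for small blocks and serves purely as an illustration.
p = numel(idx);
for t = 1:p
    At = A(idx{t},:);
    bt = b(idx{t});
    rt = bt - At*x;                        % residual of block t
    x  = x + lambda*(At'*((At*At')\rt));   % M_t = (A_t*A_t')^{-1}
end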

One block-iteration is defined as a pass through all data, and since the Block-Iteration method uses a single block in each block-step, every block-iteration consists of p steps. One block-iteration of the Block-Iteration with the relaxation parameter λ_k can be written as:

$$x^{k+1} = x^k + A^T \bar{M}_B (b - A x^k), \qquad \bar{M}_B = (\bar{D} + L)^{-1}, \tag{3.7}$$

where D̄ is block-diagonal and L is block-lower triangular, defined as:

$$L = \begin{pmatrix} 0 & & & \\ A_2 A_1^T & 0 & & \\ \vdots & \ddots & \ddots & \\ A_p A_1^T & \cdots & A_p A_{p-1}^T & 0 \end{pmatrix}, \qquad \bar{D} = \begin{pmatrix} \lambda_1^{-1} M_1^{-1} & & 0 \\ & \ddots & \\ 0 & & \lambda_p^{-1} M_p^{-1} \end{pmatrix}. \tag{3.8}$$

The sequence defined by (3.7) converges towards the solution of

$$A^T \bar{M}_B (b - A x) = 0.$$

3.4.2 Symmetric Block-Iteration

In Symmetric Block-Iteration one block-iteration consists of one block-iteration of the above Block-Iteration method followed by another block-iteration where the blocks appear in reverse order. This gives the algorithm the control order t = 1, 2, ..., p−1, p, p−1, ..., 1.

The algorithm for the symmetric block-iteration for x^0 ∈ R^n looks as follows:

$$x^{k,0} = x^k$$
$$x^{k,t} = x^{k,t-1} + \lambda_t A_t^T M_t (b^t - A_t x^{k,t-1}), \tag{3.9}$$
$$x^{k+1} = x^{k,1},$$

where t = 1,...,p−1, p, p−1,...,1 and x^{k,1} denotes the last step in (3.9).

One block-iteration of the Symmetric Block-Iteration method can be written in a general form, where we let

$$A A^T = L + D + L^T$$

be the splitting of AA^T into its lower block triangular, block diagonal and upper block triangular parts. The block-iteration can then be written as:

$$x^{k+1} = x^k + A^T \bar{M}_{SB} (b - A x^k). \tag{3.10}$$

Using (3.8) and D̃ = 2D̄ − D we get

$$\bar{M}_{SB} = (\bar{D} + L^T)^{-1} \tilde{D} (\bar{D} + L)^{-1},$$

where M̄_SB is symmetric positive definite.

From [14] we have that the block-iterations of Symmetric Block-Iteration (3.10) converge to a solution x of the weighted least squares problem

$$\min_x \|A x - b\|_{\bar{M}_{SB}}.$$

If in addition x^0 ∈ R(A^T), then x is the unique solution of minimal 2-norm, and the corresponding normal equations are

$$A^T \bar{M}_{SB} (b - A x) = 0.$$

3.4.3 Block-Iterative Component Averaging Methods (BICAV)

Earlier we defined the CAV method as one of the SIRT methods. The Block-Iterative Component Averaging method (BICAV), introduced in [7], is the block version of the CAV method. As for the CAV method we define the factor s_j^t; in the BICAV case s_j^t is the number of nonzero elements in the j'th column of A_t for t = 1, 2, ..., p. The BICAV method can then be written on the following form:

$$x_j^{k+1} = x_j^k + \lambda_k \sum_{i \in B_{t(k)}} \frac{b_i - \langle a^i, x^k \rangle}{\|a^i\|_S^2}\, a_j^i,$$

where ‖a^i‖_S^2 = Σ_{j=1}^n s_j^{t(k)} (a_j^i)^2, t(k) = (k mod p) + 1 and k ≥ 0. This leads us to the following matrix form:

$$x^{k+1} = x^k + \lambda_k A_{t(k)}^T M_{t(k)} (b^{t(k)} - A_{t(k)} x^k), \tag{3.11}$$

where M_{t(k)} = diag(1/‖a^i‖_S^2) over the rows i of the block.

In [4] the following convergence theorem is proven for the BICAV method:

Theorem 3.3 Let

$$0 < \epsilon \le \lambda_k \le (2 - \epsilon)/\rho(A_{t(k)}^T M_{t(k)} A_{t(k)}),$$

where ε is an arbitrarily small but fixed constant and the M_{t(k)} are given symmetric and positive definite matrices with the control t(k). Then any sequence generated by (3.11) converges to a solution of (2.1). If in addition x^0 ∈ R(A^T), then x^k converges to the solution of minimum 2-norm.

The BICAV method has the property that for p = 1 it becomes fully simultaneous, i.e. it becomes the CAV method. For p = m, on the other hand, BICAV becomes the well-known Kaczmarz's method.
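The matrix form (3.11) translates directly into MATLAB. A sketch of a single BICAV step with a fixed relaxation parameter, where the block partition given as a cell array idx and the iteration index k are assumptions of the illustration:

% One BICAV step (illustrative sketch): x <- x + lambda*At'*Mt*(bt - At*x).
t  = mod(k, p) + 1;                 % cyclic block control t(k) = (k mod p) + 1
At = A(idx{t},:);
bt = b(idx{t});
s  = full(sum(At ~= 0, 1))';        % s_j^t: nonzeros in column j of A_t
w  = (At.^2)*s;                     % ||a^i||_S^2 for each row i of the block
x  = x + lambda*(At'*((bt - At*x)./w));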

3.4.4 Block-Iterative Diagonally Relaxed Orthogonal Projections (BIDROP)

For the general SIRT methods we described a method called DROP, and we will now introduce its block-iterative generalization, which we will call Block-Iterative Diagonally Relaxed Orthogonal Projections (BIDROP).

If we let W_t be positive definite diagonal matrices and U_t be symmetric positive definite matrices for t = 1, 2, ..., p, then the algorithm for the BIDROP method looks as follows:

$$x^{k+1} = x^k + \lambda_k U_{t(k)} A_{t(k)}^T W_{t(k)} (b^{t(k)} - A_{t(k)} x^k), \tag{3.12}$$

where t(k) = (k mod p) + 1.

The following convergence theorem is derived for the BIDROP method:

Theorem 3.4 Let U be a given symmetric and positive definite matrix, and let W_t be given positive definite diagonal matrices. If for all k ≥ 0,

$$0 < \epsilon \le \lambda_k \le (2 - \epsilon)/\rho(U A_{t(k)}^T W_{t(k)} A_{t(k)}),$$

where ε is an arbitrarily small but fixed constant, then any sequence generated by (3.12) converges to a solution. If in addition x^0 ∈ R(U A^T), then the solution has minimal U^{-1}-norm.

With only one block, i.e. p = 1, and U_1 = S and W_1 = W, we have the standard DROP method.

The BIDROP method is a general method, since U_t and W_t are not specifically given. One of the variants of BIDROP is introduced in [5] and is called BIDROP1.


This method has the following scheme:

$$x^{k+1} = x^k + \lambda_k U \sum_{q=1}^{m(t(k))} \mu_q^{t(k)} \left( b_{i_q}^{t(k)} - \langle a^{i_q^{t(k)}}, x^k \rangle \right) a^{i_q^{t(k)}},$$

where μ_q^{t(k)} is defined as

$$\mu_q^{t(k)} = \frac{w_q^{t(k)}}{\|a^{i_q^{t(k)}}\|_2^2}, \quad \text{where} \quad \sum_{q=1}^{m(t(k))} w_q^{t(k)} = 1,$$

for q = 1, 2, ..., m(t). The matrix U is fixed for each block, i.e. U_t = U, and is given as

$$U = \mathrm{diag}\!\left(\frac{1}{\tau_j}\right), \qquad \tau_j = \max\{\, s_j^t \mid t = 1, \ldots, p \,\},$$

where s_j^t is the number of nonzero elements in column j of the block A_t.


Chapter 4

Semi-Convergence and Choice of Relaxation Parameter

4.1 Semi-Convergence for SIRT Methods

For the SIRT methods on the form (3.1) with T = I, theorem 3.1 ensures convergence to a solution of the least squares problem min_x ‖Ax − b‖_M, but when solving linear ill-posed problems with iterative methods we are typically more interested in the earlier mentioned semi-convergence behaviour. We will now take a closer look at the semi-convergence of the SIRT methods [16]. To make the presentation simpler we assume that m ≥ n, but the theory can be applied regardless of the dimensions.

We assume that the noise in the right-hand side is additive, i.e.,

$$b = \bar{b} + \delta b,$$

where b̄ is the noise-free right-hand side and δb is the noise component, which can be caused by discretization errors and measurement errors.

We want to analyze the semi-convergence behaviour of the SIRT scheme with T = I. To do this we assume that the relaxation parameter λ is constant for all iterations. For convenience we introduce

$$B = A^T M A \quad \text{and} \quad c = A^T M b,$$


and let the singular value decomposition (SVD) of M^{1/2}A be

$$M^{1/2} A = U \Sigma V^T,$$

where Σ = diag(σ_1, ..., σ_p, 0, ..., 0) with σ_1 ≥ σ_2 ≥ ... ≥ σ_p > 0, and rank(A) = p.

From the SIRT scheme we get the following:

$$x^k = x^{k-1} + \lambda A^T M (b - A x^{k-1}) = x^{k-1} + \lambda A^T M b - \lambda A^T M A x^{k-1} = x^{k-1} + \lambda c - \lambda B x^{k-1} = (I - \lambda B) x^{k-1} + \lambda c.$$

By direct insertion we obtain, for k = 1,

$$x^1 = (I - \lambda B) x^0 + \lambda c.$$

Similarly, for k = 2 we get:

$$x^2 = (I - \lambda B) x^1 + \lambda c = (I - \lambda B)\left[(I - \lambda B) x^0 + \lambda c\right] + \lambda c = (I - \lambda B)^2 x^0 + \left[(I - \lambda B) + I\right] \lambda c.$$

Similarly, for k = 3 we get:

$$x^3 = (I - \lambda B) x^2 + \lambda c = (I - \lambda B)^3 x^0 + \left[(I - \lambda B)^2 + (I - \lambda B) + I\right] \lambda c.$$

It can then be seen that the k'th iterate can be written as

$$x^k = (I - \lambda B)^k x^0 + \lambda \sum_{j=0}^{k-1} (I - \lambda B)^j c.$$
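This closed form is easy to verify numerically; a small self-contained MATLAB check on random data (an illustration, not part of the package):

% Check x^k = (I - lambda*B)^k*x0 + lambda*sum_{j=0}^{k-1}(I - lambda*B)^j*c.
A = rand(8,5);  M = eye(8);  b = rand(8,1);
B = A'*M*A;  c = A'*M*b;
lambda = 1/norm(B);                 % inside the convergence interval
k = 25;  n = size(B,1);
x = zeros(n,1);                     % x^0 = 0, so the first term vanishes
for j = 1:k
    x = x + lambda*(c - B*x);       % the SIRT recursion
end
S = zeros(n,n);
for j = 0:k-1
    S = S + (eye(n) - lambda*B)^j;  % the matrix geometric sum
end
norm(x - lambda*S*c)                % should be near machine precision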

Using the SVD of M^{1/2}A we can rewrite B:

$$B = \left(M^{1/2} A\right)^T \left(M^{1/2} A\right) = V \Sigma^T \Sigma V^T = V F V^T, \tag{4.1}$$

where

$$F = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2, 0, \ldots, 0\right).$$


By using (4.1) we can then write

$$\sum_{j=0}^{k-1} (I - \lambda B)^j = \sum_{j=0}^{k-1} \left(V V^T - \lambda V F V^T\right)^j = \sum_{j=0}^{k-1} \left(V (I - \lambda F) V^T\right)^j = \sum_{j=0}^{k-1} V (I - \lambda F)^j V^T = V \left(\sum_{j=0}^{k-1} (I - \lambda F)^j\right) V^T = V E_k V^T,$$

where the i'th diagonal element of E_k is

$$\sum_{j=0}^{k-1} (1 - \lambda \sigma_i^2)^j = 1 + (1 - \lambda \sigma_i^2) + (1 - \lambda \sigma_i^2)^2 + \ldots + (1 - \lambda \sigma_i^2)^{k-1} = \frac{1 - (1 - \lambda \sigma_i^2)^k}{1 - (1 - \lambda \sigma_i^2)} = \frac{1 - (1 - \lambda \sigma_i^2)^k}{\lambda \sigma_i^2},$$

where the formula for geometric series is used to obtain the last result. The matrix E_k then has the following form:

$$E_k = \mathrm{diag}\left(\frac{1 - (1 - \lambda \sigma_1^2)^k}{\lambda \sigma_1^2}, \ldots, \frac{1 - (1 - \lambda \sigma_p^2)^k}{\lambda \sigma_p^2}, 0, \ldots, 0\right).$$

Assuming that x^0 = 0 we can then write x^k as

$$x^k = V (\lambda E_k) V^T c = V (\lambda E_k) V^T A^T M b = V (\lambda E_k) \Sigma^T U^T M^{1/2} (\bar{b} + \delta b) \tag{4.2}$$
$$= \sum_{i=1}^{p} \left(1 - (1 - \lambda \sigma_i^2)^k\right) \frac{u_i^T M^{1/2} (\bar{b} + \delta b)}{\sigma_i}\, v_i,$$

where u_i and v_i are the columns of U and V respectively, and φ_i^k = 1 − (1 − λσ_i^2)^k for i = 1, 2, ..., p are the filter factors [20, p. 138].

The minimum-norm solution to the weighted least squares problem with the noise-free right-hand side, x̄ = argmin_x ‖Ax − b̄‖_M, can, using the SVD, be written as

$$\bar{x} = V E \Sigma^T U^T M^{1/2} \bar{b}, \tag{4.3}$$

where

$$E = \mathrm{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \ldots, \frac{1}{\sigma_p^2}, 0, \ldots, 0\right).$$


The error in the k'th iterate can then be expressed as

$$x^k - \bar{x} = V (\lambda E_k) \Sigma^T U^T M^{1/2} (\bar{b} + \delta b) - V E \Sigma^T U^T M^{1/2} \bar{b} = V \left[ (\lambda E_k - E) \Sigma^T U^T M^{1/2} \bar{b} + \lambda E_k \Sigma^T U^T M^{1/2} \delta b \right].$$

We then define D_1^k and D_2^k as

$$D_1^k = (\lambda E_k - E) \Sigma^T = -\mathrm{diag}\left(\frac{(1 - \lambda \sigma_1^2)^k}{\sigma_1}, \ldots, \frac{(1 - \lambda \sigma_p^2)^k}{\sigma_p}, 0, \ldots, 0\right) \tag{4.4}$$

and

$$D_2^k = \lambda E_k \Sigma^T = \mathrm{diag}\left(\frac{1 - (1 - \lambda \sigma_1^2)^k}{\sigma_1}, \ldots, \frac{1 - (1 - \lambda \sigma_p^2)^k}{\sigma_p}, 0, \ldots, 0\right). \tag{4.5}$$

Putting

$$\hat{b} = U^T M^{1/2} \bar{b}, \qquad \delta\hat{b} = U^T M^{1/2} \delta b,$$

we can then write the projected error e^{V,k} as

$$e^{V,k} \equiv V^T (x^k - \bar{x}) = D_1^k \hat{b} + D_2^k \delta\hat{b}.$$

For the later analysis we define the following functions:

$$\Phi^k(\sigma, \lambda) = \frac{(1 - \lambda \sigma^2)^k}{\sigma}, \qquad \Psi^k(\sigma, \lambda) = \frac{1 - (1 - \lambda \sigma^2)^k}{\sigma}. \tag{4.6}$$

We can then write the j'th component of e^{V,k} as

$$e_j^{V,k} = -\Phi^k(\sigma_j, \lambda)\, \hat{b}_j + \Psi^k(\sigma_j, \lambda)\, \delta\hat{b}_j,$$

where the first term is an iteration-error and the second term is a noise-error. It is the interplay between the iteration-error and the noise-error that explains the semi-convergence behaviour. Figure 4.1 shows Φ^k(σ, λ) and Ψ^k(σ, λ) for fixed λ and various σ as functions of the iteration index k. It can be seen that for small values of k the noise-error is negligible and the iteration seems to converge to the exact solution. When the noise-error reaches the order of magnitude of the approximation error, the propagated noise-error is no longer hidden in the regularized solution, and the total error starts to increase.

We now want to investigate the behaviour of the functions Φ^k(σ, λ) and Ψ^k(σ, λ).


[Figure 4.1: The behaviour of Φ^k(σ, λ) and Ψ^k(σ, λ) for fixed λ and various σ (σ = 0.0468, 0.0353, 0.0247, 0.0035) as function of the iteration index k.]

Proposition 4.1 Let

$$0 < \epsilon \le \lambda \le 2/\sigma_1^2 - \epsilon, \quad \text{and} \quad 0 < \sigma_p \le \sigma < \frac{1}{\sqrt{\lambda}}. \tag{4.7}$$

a) For λ and σ fixed, Φ^k(σ, λ) is decreasing and convex and Ψ^k(σ, λ) is increasing and concave as functions of k.

b) For all integers k > 0 it holds that Φ^k(σ, λ), Ψ^k(σ, λ) ≥ 0 and Φ^k(σ, 0) = 1/σ, Ψ^k(σ, 0) = 0.

c) For λ fixed and k > 0, Φ^k(σ, λ) is decreasing as a function of σ.

The proof of this proposition can be found in [16].

Remark 4.2 The upper bound for σ in (4.7) is σ̂ = 1/√λ. When 0 < ε ≤ λ ≤ 1/σ_1^2 then σ̂ ≥ σ_1, and when 1/σ_1^2 < λ < 2/σ_1^2 then σ̂ ≥ 1/√(2/σ_1^2) = σ_1/√2. Hence σ̂ ≥ σ_1/√2 for all relaxation parameters λ satisfying (4.7).

For small values of k the noise-errors expressed via Ψ^k(σ, λ) are negligible and the iteration approaches the exact solution. When the noise-error reaches the same order of magnitude as the approximation error, the propagated noise-error is no longer hidden in the iteration vector and the total error starts to increase.


Proposition 4.3 Assume that (4.7) of Proposition 4.1 holds, and let λ be fixed. For k ≥ 2 there exists a unique point σ_k* ∈ (0, 1/√λ) such that

$$\sigma_k^* = \arg\max_{0 < \sigma < 1/\sqrt{\lambda}} \Psi^k(\sigma, \lambda).$$


4.2 Choice of Relaxation Parameter

[Figure 4.2: The function Ψ^k(σ, λ) as function of σ for λ = 100 and k = 10, 30, 90 and 270. The dashed line illustrates 1/σ; the black dots denote the maxima of the functions.]

4.2.1 Training to Optimal Choice

The purpose of this strategy is to find a constant relaxation parameter λ = λ_k of optimal choice when the exact solution x̄ is known. But how do we define the concept of an "optimal λ-value"? Since the ART and the SIRT methods have different properties, we will treat each class of methods separately.

SIRT Methods

Usually the goal of a reconstruction method is to minimize the relative error. The challenge is to do this when the exact solution is unknown; but we can study the behaviour of the methods for problems with known solutions. Figure 4.3 shows the relative error as function of the iteration number k for 9 values of λ for three different noise levels. For all three noise levels it holds that the minimum relative error reaches the same resolution limit for many different values of λ.

Figure 4.4 illustrates the minimum relative error for different λ-values for δ = 0.03. The green lines illustrate an interval that includes ±0.015% of the resolution limit. From this we observe that for almost all λ-values the minimum relative error lies inside this interval. The only exception is when λ is close to either 0 or 2/σ_1^2. We are now convinced that the minimum relative error reaches the resolution limit for many different values of λ, and we then need another way to distinguish between the different λ-values.

[Figure 4.3: The relative error histories for nine values of λ (10, 30, 60, 80, 100, 110, 120, 130, 150) using a SIRT method; one subfigure per noise level δ = 0.03, 0.05, 0.08.]

[Figure 4.4: Illustration of the minimum relative error for different λ-values for a SIRT method. The dots denote the relative errors, while the green dashed lines show the interval of ±0.015% of the resolution limit.]

[Figure 4.5: The optimal number of iterations k_opt as function of the λ-values for a SIRT method.]

We therefore take a second look at figure 4.3. The difference between the error histories for different λ-values is the iteration number at which the minimum relative error is reached. From this we define the optimal λ-value as the λ which gives rise to the fastest convergence to the smallest relative error in the solution. "Training" is a strategy that selects the optimal λ from a test problem with a known solution. The hope is that the λ chosen this way is also a good choice for a real problem. This is the case if the test problem is chosen to reflect the properties of the real problem.

This definition leads us to a strategy in two parts, where the first part is to determine the resolution limit and the second part is to determine the λ-value which reaches the resolution limit using the smallest number of iterations. From figures 4.3 and 4.4 we conclude that λ = 1/σ_1^2 is a safe choice of relaxation parameter for determining the resolution limit, since it represents the midpoint of the convergence interval. We therefore find the minimum relative error and define the upper bound of the resolution limit to be this relative error plus 1%. We denote the upper bound of the resolution limit by ub.

For the second part of the strategy we use a modified version of the golden section search to find the value of λ that reaches the resolution limit within the smallest number of iterations [38]. The requirement for using golden section search is that the function we want to minimize is unimodal. Figure 4.5 illustrates the optimal number of iterations k_opt as a function of λ. From this


figure it seems reasonable to assume that we have a unimodal function. We also notice that the λ-value we seek lies in the right part of the interval.

In our modified golden section search we denote the search interval (a, b), which is the convergence interval for the given SIRT method. For this method we also need two interior points, which we define to be c = a + r(b − a) and d = a + (1 − r)(b − a), where r = (3 − √5)/2. The reason for this choice can be found in [38].

We then define the function values f_c and f_d of the interior points c and d to be the iteration number which corresponds to the solution with the smallest relative error with λ equal to c and d respectively. We also define the smallest relative error for each of the interior points as x_c and x_d.

In the ordinary golden section search the function values are used to reduce the interval. In our modified version we also use the knowledge of the value of the smallest relative error. We therefore reduce the interval according to the following properties, in the given order (a compact MATLAB sketch of the resulting search loop is given after the list):

If x_c > ub: The relative error for λ = c has not reached the resolution limit, and since tests have shown that the optimal value lies in the right part of the interval, we can reduce the interval to (c, b).

If x_d > ub: In this case the relative error for λ = d is outside the resolution interval. When we reach this point we know that λ = c is inside the resolution interval, and using this information we can remove the right part of the interval, such that our new interval is (a, d).

If f_c ≥ f_d: In this case both c and d are allowed values of λ, and our objective is to determine the minimum number of iterations used. If f_c is greater than or equal to f_d, then according to the unimodality we can reduce the interval to (c, b). We choose this case as the tiebreaker if f_c = f_d, since we have assumed that the optimal value lies in the right part of the interval.

If f_d > f_c: In the last case both c and d are again allowed values of λ. We reduce the interval to (a, d) according to the assumption of unimodality.

The reductions continue until the difference between c and d is very small, and the optimal value of λ is then chosen to be λ = (c + d)/2.
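A compact MATLAB sketch of this interval reduction follows. The helper [f, xmin] = trainrun(lambda), which runs the method for a given λ and returns the iteration count f at which the smallest relative error xmin is attained, is hypothetical here, and a, b, ub and tol are assumed given:

% Modified golden section search for the training strategy (sketch).
% trainrun is a hypothetical helper; (a, b) is the convergence interval.
r = (3 - sqrt(5))/2;
c = a + r*(b - a);  d = a + (1 - r)*(b - a);
while abs(d - c) > tol
    [fc, xc] = trainrun(c);   % (re-evaluated each pass for clarity)
    [fd, xd] = trainrun(d);
    if xc > ub                % c has not reached the resolution limit
        a = c;
    elseif xd > ub            % d outside the resolution interval
        b = d;
    elseif fc >= fd           % both allowed; tiebreaker keeps (c, b)
        a = c;
    else                      % fd > fc
        b = d;
    end
    c = a + r*(b - a);  d = a + (1 - r)*(b - a);
end
lambda = (c + d)/2;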


[Figure 4.6: The relative error histories for nine values of λ (0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.4, 1.7, 2) using an ART method; one subfigure per noise level δ = 0.03, 0.05, 0.08.]

ART Methods

Inspired by the modified golden section search for the SIRT algorithms, we look at figure 4.6, which shows the relative error as function of the iteration number k for nine values of λ for three different noise levels. We notice that not all values of λ reach the resolution limit. From figure 4.7 we clearly see that only a small number of λ-values reaches the so-called resolution limit. We would like to keep the definition of the optimal λ-value and the overall structure of the strategy to find it, but we need to make some changes that make the strategy fit the ART methods.

Again we keep our strategy in two parts, where the first part is to determine the resolution limit and the second part is to determine the λ-value which reaches the resolution limit using the fewest number of iterations.


[Figure 4.7: The minimum relative error as function of λ for an ART method.]

[Figure 4.8: The optimal number of iterations k_opt as function of the λ-values for an ART method.]

From figures 4.6 and 4.7 we conclude that using λ = 0.25 would be an appropriate choice. Note that the convergence interval for all ART methods is (0, 2). Again we find the smallest relative error and define the upper bound of the resolution limit ub to be this relative error plus 1%.

For the second part of the strategy we use another modified version of golden section search. From figures 4.8 and 4.7 we conclude that it is reasonable to assume that the function is unimodal, since most of the interval will be discarded because the relative error there is above the upper bound of the resolution limit. We notice that for the ART methods the λ-value we seek lies in the left part of the interval.

As before we denote the search interval (a, b), where a = 0 and b = 2. The interior points c and d, the function values f_c and f_d and the values x_c and x_d are defined as above. The reduction of the interval follows the given order:

If x_d > ub: The relative error for λ = d has not reached the resolution limit, and since tests have shown that the optimal value lies in the left part of the interval, we can reduce the interval to (a, d).

If x_c > ub: In this case the relative error for λ = c is outside the resolution interval. When we reach this point we know that λ = d is inside the resolution interval, and using this information we can remove the left part of the interval, such that our new interval is (c, b).

If f_c > f_d: In this case both c and d are allowed values of λ, and our objective is to determine the minimum number of iterations used. If f_c is greater than f_d, then according to the unimodality we can reduce the interval to (c, b).

If f_d ≥ f_c: In the last case both c and d are again allowed values of λ. We reduce the interval to (a, d) according to the assumption of unimodality. We choose this case as the tiebreaker if f_c = f_d, since we have assumed that the optimal value lies in the left part of the interval.

Again the reductions continue until the difference between c and d is very small, and the optimal value of λ is chosen to be λ = (c + d)/2.

Introducing Maximum Number of Iterations

In both the implementation of the strategy for SIRT methods and for ART methods a default number of iterations is used when the resolution limit is determined. For some problems it could be the case that, with the default number of iterations, we do not reach the point in the semi-convergence where the relative error starts to increase. It is therefore possible for the user to increase the maximum number of iterations by use of an input parameter.

This input parameter can, on the other hand, also be decreased if the user will only allow a smaller number of iterations. In this case a possible consequence could be that with the given number of iterations the solution does not reach the point in the semi-convergence where the relative error again starts to increase. If this point is not reached for λ = 1/σ_1^2 for the SIRT methods and λ = 0.25 for the ART methods, then our introduced method does not find the actual resolution limit. The problem will then not have the earlier shown properties, and the problem to solve is completely different.


[Figure 4.9: The relative error histories for nine values of λ using a SIRT method when the maximum number of iterations is 7.]

[Figure 4.10: The minimum relative error for different λ-values for a SIRT method when the maximum number of iterations is 7. The dots denote the relative errors, while the green dashed lines show the interval of ±0.015% of the resolution limit found with λ = 1/σ_1^2.]


[Figure 4.11: The optimal number of iterations k_opt as function of the λ-values for a SIRT method when the maximum number of iterations is 7.]

Figure 4.9 shows the relative errors for nine different values of λ, i.e. the same setting as in figure 4.3; the only difference is that the allowed number of iterations is 7. We observe that the minimum relative error for all nine values of λ is attained at iteration 7, which indicates that the actual minimum is not found. Figure 4.10 illustrates the minimum relative error for different values of λ. We now notice that the interval for the resolution limit no longer contains most of the relative errors. Figure 4.11 shows the optimal number of iterations as a function of λ. We observe that all λ-values give rise to the same number of iterations, 7, which is the maximum number of iterations. In this case we cannot rely on the introduced strategy for finding λ to return a reasonable result.

The implemented versions of the defined strategies contain a check that can determine whether the actual resolution limit is reached. If it is, then the original strategy is used; otherwise the program uses a different approach. In the case where the resolution limit is not reached, the used number of iterations is the same for almost every λ-value, as can be seen in figure 4.11. The relative errors at this point differ, however, and the golden section search will then consider the relative error instead of the number of iterations.

4.2.2 Line Search

The next strategy we will present is based on picking λ_k such that the error ‖x̄ − x^k‖_2 is minimized in each iteration. This type of method is also known as line search and is only derived for SIRT methods where T = I [2], [9], [10], [11]. In the following we will derive the line-search strategy for the different SIRT methods, but we will assume that the problem is consistent, i.e. Ax̄ = b,

where x̄ denotes the exact solution.

[Figure 4.12: Illustration of line search.]

In general we can write all the SIRT methods as

$$x^{k+1} = x^k + \lambda_k p^k, \tag{4.10}$$

where p^k then varies with the method. When using line search the aim is to find the minimum Euclidean distance from the next iterate to the exact solution:

$$\min \|x^{k+1} - \bar{x}\|_2.$$

By looking at figure 4.12 we see that the minimizer can also be found by choosing x^{k+1} such that the direction p^k from the existing step is orthogonal to the vector x^{k+1} − x̄, i.e.

$$\langle p^k, x^{k+1} - \bar{x} \rangle = 0.$$

Using the expression for the method (4.10) we then get:

$$\langle p^k, x^k + \lambda_k p^k - \bar{x} \rangle = \langle p^k, x^k - \bar{x} \rangle + \lambda_k \langle p^k, p^k \rangle = 0.$$

From this it follows that

$$\lambda_k = \frac{\langle p^k, \bar{x} - x^k \rangle}{\|p^k\|_2^2}.$$

We will now derive the formula for all the SIRT methods where T = I, i.e. we have that p^k = A^T M(b − Ax^k). For the numerator we get:

$$\langle A^T M (b - A x^k), \bar{x} - x^k \rangle = \langle M (b - A x^k), A(\bar{x} - x^k) \rangle = \langle M (b - A x^k), A\bar{x} - A x^k \rangle.$$


We then use that Ax̄ = b and define r^k = b − Ax^k. This gives us the following for the numerator:

$$\langle M (b - A x^k), b - A x^k \rangle = \langle M r^k, r^k \rangle.$$

For the denominator we get:

$$\|p^k\|_2^2 = \|A^T M (b - A x^k)\|_2^2 = \|A^T M r^k\|_2^2.$$

This gives us the following rule to determine λ_k:

$$\lambda_k = \frac{\langle M r^k, r^k \rangle}{\|A^T M r^k\|_2^2}. \tag{4.11}$$
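Formula (4.11) is cheap to evaluate inside the iteration. A minimal MATLAB sketch of a SIRT iteration with line-search relaxation, where A, b, a starting vector x, an iteration count kmax, and a given SPD matrix M (e.g. Cimmino's diagonal matrix) are assumptions of the illustration:

% SIRT iteration with line-search relaxation (4.11) (illustrative sketch).
for k = 1:kmax
    r      = b - A*x;                  % residual r^k
    Mr     = M*r;
    p      = A'*Mr;                    % search direction p^k = A'*M*r^k
    lambda = (r'*Mr)/(p'*p);           % lambda_k = <M r^k, r^k>/||A' M r^k||^2
    x      = x + lambda*p;
end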

4.2.3 Relaxation to Control Noise Propagation

We will now introduce two strategies for choosing the relaxation parameter λ_k. Both arise from the analysis of the semi-convergence behaviour and are only derived for the SIRT methods where T = I; in the software package the strategies can also be used for SIRT methods where T ≠ I, although the theory is then not valid. The motivation for these methods is to monitor and control the noise-part of the error. The methods are presented in [16], where all the used proofs can also be found.

The first strategy we denote Ψ1-based relaxation, and it takes the following form:

$$\lambda_k = \begin{cases} \sqrt{2}/\sigma_1^2 & \text{for } k = 0, 1 \\[4pt] \dfrac{2}{\sigma_1^2}\,(1 - \zeta_k) & \text{for } k \ge 2 \end{cases} \tag{4.12}$$

where ζ_k is the unique root in (0, 1) of the polynomial (4.9).

The following theorem ensures that the iterates produced with the strategy (4.12) converge towards the weighted least squares solution:

Theorem 4.6 The iterates produced using the Ψ1-based relaxation strategy (4.12) converge toward a solution of min_x ‖Ax − b‖_M.

We first assume that λ is fixed in the first k iterations,

$$\lambda_j = \lambda, \qquad j = 0, 1, \ldots, k-1.$$


With this assumption we can use the theory of semi-convergence from section 4.1. We let x^k and x̄^k denote the iterates from (3.1) with noisy and noise-free data respectively. The error in the k'th iterate satisfies

$$\|x^k - \bar{x}\|_2 \le \|\bar{x}^k - \bar{x}\|_2 + \|x^k - \bar{x}^k\|_2,$$

and the error is decomposed into two parts: the iteration error x̄^k − x̄ and the noise error x^k − x̄^k. Using (4.2), (4.3), (4.4) and (4.5) we get

$$\bar{x}^k - \bar{x} = V (\lambda E_k) \Sigma^T U^T M^{1/2} \bar{b} - V E \Sigma^T U^T M^{1/2} \bar{b} = V (\lambda E_k - E) \Sigma^T U^T M^{1/2} \bar{b} = V D_1^k U^T M^{1/2} \bar{b},$$

$$x^k - \bar{x}^k = V (\lambda E_k) \Sigma^T U^T M^{1/2} b - V (\lambda E_k) \Sigma^T U^T M^{1/2} \bar{b} = V D_2^k U^T M^{1/2} (b - \bar{b}) = V D_2^k U^T M^{1/2} \delta b.$$

The noise-error is then bounded by

$$\|x^k - \bar{x}^k\|_2 \le \max_{1 \le i \le p} \Psi^k(\sigma_i, \lambda)\, \|M^{1/2} \delta b\|_2.$$

We then assume that λ ∈ (0, 1/σ_1^2]; using Remark 4.2 we have σ̂ ≥ σ_1, and it then follows that for k ≥ 2

$$\max_{1 \le i \le p} \Psi^k(\sigma_i, \lambda) \le \max_{0 \le \sigma \le \sigma_1} \Psi^k(\sigma, \lambda) \le \max_{0 \le \sigma \le \hat{\sigma}} \Psi^k(\sigma, \lambda) = \Psi^k(\sigma_k^*, \lambda). \tag{4.13}$$

It then follows, using (4.6) and (4.8), that

$$\|x^k - \bar{x}^k\|_2 \le \Psi^k(\sigma_k^*, \lambda)\, \|M^{1/2} \delta b\|_2 = \frac{1 - \left(1 - \lambda\,\frac{1-\zeta_k}{\lambda}\right)^k}{\sqrt{\frac{1-\zeta_k}{\lambda}}}\, \|M^{1/2} \delta b\|_2 = \sqrt{\lambda}\, \frac{1 - \zeta_k^k}{\sqrt{1 - \zeta_k}}\, \|M^{1/2} \delta b\|_2. \tag{4.14}$$

Then consider the k'th iteration and choose λ_k from (4.12). With the assumption that λ_{j+1}/λ_j ≈ 1, which holds for (4.12), we can assume that (4.14) holds approximately. By substituting (4.12) into (4.14) we get for k ≥ 2

$$\|x^k - \bar{x}^k\|_2 \le \sqrt{\lambda_k}\, \frac{1 - \zeta_k^k}{\sqrt{1 - \zeta_k}}\, \|M^{1/2} \delta b\|_2 \approx \frac{\sqrt{2}}{\sigma_1} \sqrt{1 - \zeta_k}\, \frac{1 - \zeta_k^k}{\sqrt{1 - \zeta_k}}\, \|M^{1/2} \delta b\|_2 = \frac{\sqrt{2}}{\sigma_1} (1 - \zeta_k^k)\, \|M^{1/2} \delta b\|_2.$$


This implies that the Ψ1-based strategy gives an upper bound for the noise-part. For the case λ ∈ (1/σ_1^2, 2/σ_1^2) equation (4.13) only holds approximately. However, for the Ψ1-based relaxation we have that λ ≤ 1/σ_1^2 for small values of k.

The second strategy we denote Ψ2-based relaxation, and it takes the following form:

$$\lambda_k = \begin{cases} \sqrt{2}/\sigma_1^2 & \text{for } k = 0, 1 \\[4pt] \dfrac{2}{\sigma_1^2}\, \dfrac{1 - \zeta_k}{(1 - \zeta_k^k)^2} & \text{for } k \ge 2 \end{cases} \tag{4.15}$$

We use the same approach as for the Ψ1-based relaxation and substitute (4.15) into (4.14); we then get the following bound for the noise error using Ψ2-based relaxation:

$$\|x^k - \bar{x}^k\|_2 \le \frac{\sqrt{2}}{\sigma_1}\, \|M^{1/2} \delta b\|_2.$$

In [16] it is shown that iterates produced with the Ψ2-based relaxation converge towards the weighted least squares solution.

In [16] the possibility of using an accelerated modification of the strategies Ψ1 and Ψ2 is discussed. The idea is to choose λ̄_k = τ_k λ_k for k ≥ 2, where τ_k is the parameter to be chosen. For the Ψ1 strategy this modification means that

$$\bar{\lambda}_k = \tau_{k,1}\, \frac{2}{\sigma_1^2}\, (1 - \zeta_k), \qquad k \ge 2. \tag{4.16}$$

For τ_{k,1} < (1 − ζ_k)^{-1} we stay inside the convergence interval. By choosing the parameter τ_{k,1} to be constant for all iterations k we must use τ_{k,1} = τ_1 = (1 − ζ_1)^{-1} ≃ 1.5. For the Ψ2 strategy the modification takes the following form:

$$\bar{\lambda}_k = \tau_{k,2}\, \frac{2}{\sigma_1^2}\, \frac{1 - \zeta_k}{(1 - \zeta_k^k)^2}, \qquad k \ge 2, \tag{4.17}$$

and with τ_{k,2} < (1 − ζ_k^k)^2/(1 − ζ_k) the convergence is maintained. For a constant value of τ_{k,2} we have the upper bound τ_2 ≃ 1.18.

Even though the theory shows that the upper bounds of the constant parameters τ_1 and τ_2 are 1.5 and 1.18 respectively, experiments in [16] illustrate that it pays to allow larger values. We therefore choose τ_1 = 2 and τ_2 = 1.5 as reasonable values.
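A sketch of the resulting schedules with the accelerated factors τ_1 = 2 and τ_2 = 1.5 follows; sigma1 is the largest singular value of M^{1/2}A, and the helper zetaroot(k), returning the root ζ_k ∈ (0, 1) of the polynomial (4.9), is hypothetical here:

% Psi1- and Psi2-based relaxation schedules (illustrative sketch).
tau1 = 2;  tau2 = 1.5;
if k <= 1
    lam1 = sqrt(2)/sigma1^2;           % starting value for k = 0, 1
    lam2 = lam1;
else
    zk   = zetaroot(k);                % hypothetical root finder for (4.9)
    lam1 = tau1*(2/sigma1^2)*(1 - zk);              % accelerated Psi1 (4.16)
    lam2 = tau2*(2/sigma1^2)*(1 - zk)/(1 - zk^k)^2; % accelerated Psi2 (4.17)
end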


Chapter 5

Stopping Rules

In the previous chapter we discussed methods for choosing the relaxation parameter. In this chapter we will look at strategies for determining the optimal number of iterations k∗. We will present three strategies. The first two require some kind of knowledge of the noise level δ and also a user-chosen parameter τ; for both of these strategies we present a training strategy to choose a reasonable value of τ. In the following chapter we let ‖·‖ denote the 2-norm ‖·‖_2.

5.1 Stopping Rules with Training

In this section we will introduce a general rule to determine the appropriate stopping index k∗, and from this general rule we will focus on two already known special cases, which are all described in [15].

As in section 4.1 we assume the following additive noise model:

$$b = \bar{b} + \delta b,$$

where b̄ is the noise-free right-hand side and δb is the noise component, which may come from both discretization errors and measurement errors. We also assume that the norm of the error is known:

$$\delta = \|\delta b\|.$$

For notational convenience we assume that λ = λ_k.

Proposition 5.1 Let {x^k} be given by (3.1), where T = I, and let r^k = M^{1/2}(b − Ax^k). Put Q = M^{1/2} A A^T M^{1/2} and W = I − (λβ/(2(1−α))) Q, where α, β are given real numbers. Let b̄ ∈ R(A), let x̄ be any solution to Ax = b̄, and let −1 ≤ τ_k ≤ 1. Put e_k = x̄ − x^k and t_1 = 2λ(1−α)⟨r^k, W r^k⟩. Then

$$\|e_{k+1}\|^2 = \|e_k\|^2 - \lambda\left(d_{\alpha,\beta} - 2\tau_k \delta \|M^{1/2}\| \|r^k\|\right) - t_1, \tag{5.1}$$

where

$$d_{\alpha,\beta} = \langle r^k, (2\alpha + \beta - 1) r^k + (1 - \beta) r^{k+1} \rangle. \tag{5.2}$$

The proof can be found in [15]. From (5.1) we get

$$\|e_{k+1}\|^2 \le \|e_k\|^2 - \lambda\left(d_{\alpha,\beta} - \tau \delta \|M^{1/2}\| \|r^k\|\right) - t_1, \tag{5.3}$$

where τ = 2 max_k |τ_k|, such that τ ∈ (0, 2). This means that the error is decreasing as long as t_1 ≥ 0 and d_{α,β} − τδ‖M^{1/2}‖‖r^k‖ ≥ 0.

This leads us to the following general rule:

α, β-rule:

$$\frac{d_{\alpha,\beta}}{\|r^k\|} \le \tau \delta \|M^{1/2}\|. \tag{5.4}$$

Using the α, β-rule we search for the smallest iteration number k = k_{α,β} satisfying (5.4), i.e. the first index beyond which the monotone decrease ‖x̄ − x^{k+1}‖ < ‖x̄ − x^k‖ is no longer guaranteed. (If d_{α,β}/‖r^0‖ ≤ τδ‖M^{1/2}‖ then k_{α,β} = 0.)

Proposition 5.2 Let α, β ∈ (0, 1). Then

$$\lambda \le \lambda_1 = \frac{2(1 - \alpha)}{\beta \sigma_1^2} \;\Rightarrow\; t_1 \ge 0,$$

and

$$\lambda \le \lambda_2 = \frac{2\alpha}{(1 - \beta)\sigma_1^2} \;\Rightarrow\; d_{\alpha,\beta} \ge 0.$$

The proof can be found in [15]. Using this proposition we should take λ ≤ λ_max = min(λ_1, λ_2). It can now be seen that λ_1 ≤ 2/σ_1^2 ⇒ α + β ≥ 1 and λ_2 ≤ 2/σ_1^2 ⇒ α + β ≤ 1. This means that λ_1 ≤ λ_2 ⇒ α + β ≥ 1. From this it follows that

$$\lambda_{\max} = \begin{cases} \lambda_1 \le 2/\sigma_1^2 & \text{if } \alpha + \beta \ge 1 \\ \lambda_2 \le 2/\sigma_1^2 & \text{if } \alpha + \beta \le 1 \\ = 2/\sigma_1^2 & \text{if } \alpha + \beta = 1. \end{cases} \tag{5.5}$$

The rule corresponding to λ_max = 2/σ_1^2 is obtained with β = 1 − α, for which

$$d_{\alpha,1-\alpha} = \langle r^k, (2\alpha + 1 - \alpha - 1) r^k + (1 - 1 + \alpha) r^{k+1} \rangle = \langle r^k, \alpha r^k + \alpha r^{k+1} \rangle.$$

The ME-rule, which we will describe later, is a rule of this form.

5.1.1 The Discrepancy Principle

We will now introduce a specific variant of the α, β-rule (5.4): the well-known discrepancy principle (DP) of Morozov. To obtain the DP-rule we let α = 0.5, β = 1, and then by (5.2), d_{0.5,1} = ‖r^k‖^2 = d_DP. The stopping index k = k_{0.5,1} = k_DP is then the first index for which

DP-rule:

$$\|r^k\| \le \tau \delta \|M^{1/2}\|. \tag{5.6}$$

We note from proposition 5.2 that λ_2 = +∞ and λ_1 = 1/σ_1^2. Hence for the DP-rule the error e_k is monotonically decreasing for k = 1, 2, ..., k_DP, assuming that λ ∈ (0, 1/σ_1^2).

Since we introduced DP as a specific variant of the α, β-rule, formula (5.6) is only valid for the SIRT methods where T = I. By using the original version of the discrepancy principle we can also formulate the discrepancy principle for the remaining methods. For these methods the stopping index k = k_DP is the first index for which

$$\|A x^k - b\|_2 \le \tau \delta.$$
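Inside an iteration loop the DP check is a one-liner; a sketch of both variants, where tau, delta, and (for the SIRT case) the matrix M are assumed given:

% Discrepancy principle stopping check (illustrative sketch).
% SIRT variant (5.6): the residual is weighted by M^(1/2).
M12 = sqrtm(M);                       % M^(1/2); diagonal in practice
rk  = M12*(b - A*x);                  % r^k = M^(1/2)(b - A*x^k)
stopDP = norm(rk) <= tau*delta*norm(M12);
% ART variant: stopDP = norm(b - A*x) <= tau*delta;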

5.1.2 The Monotone Error Rule

Another specific variant of the α, β-rule (5.4) is the monotone error rule (ME) by Hämarik and Tautenhahn [23]. We let α = 1, β = 0 and get d_{1,0} = d_ME = ⟨r^k, r^k + r^{k+1}⟩. The stopping index k = k_{1,0} = k_ME is the first index for which

ME-rule:

$$\frac{d_{ME}}{\|r^k\|} \le \tau \delta \|M^{1/2}\|. \tag{5.7}$$


From proposition 5.2 we get that λ_2 = 2/σ_1^2. The expression for λ_1 cannot be used directly from proposition 5.2, and we must therefore look at the definition of t_1 given in proposition 5.1. We then have

$$t_1 = 2\lambda(1 - \alpha)\langle r^k, W r^k \rangle = 2\lambda \langle r^k, (1 - \alpha) W r^k \rangle = 2\lambda \left\langle r^k, \left((1 - \alpha) I - \frac{\lambda \beta}{2} Q\right) r^k \right\rangle.$$

In this case t_1 = 0, and it follows that λ_max = 2/σ_1^2, in accordance with (5.5).

For the ME-rule the error e_k monotonically decreases for k = 1, 2, ..., k_ME, assuming that λ ∈ (0, 2/σ_1^2). The ME-rule in this form is only valid for the SIRT methods where T = I.

A further investigation and comparison of the rules (5.6) and (5.7) can be found in [15].

5.1.3 The Training Part

To generate effective stopping rules for the DP-rule and the ME-rule we will use training to teach the rule when to stop for a certain data set, the training sample. Our hope is that it will also be successful when used on different data sets not too distant from the training sample.

From the inequality (5.3) we have that

$$\|e_k\|^2 - \|e_{k+1}\|^2 \ge P_k,$$

where

$$P_k = \lambda\left(d_{\alpha,\beta} - \tau \delta \|M^{1/2}\| \cdot \|r^k\|\right).$$

Thus P_k acts as a predictor for ‖e_k‖^2 − ‖e_{k+1}‖^2. As long as P_k > 0 the iterations should be continued, and we spot the first time where

$$P_{k-1} > 0, \qquad P_k \le 0.$$


Using this we obtain the following bounds and acceptance interval for τ:
\[
R_k = \frac{d_{\alpha,\beta}(k)}{\delta M^{1/2}\,\|r^k\|}
\;\le\; \tau \;<\;
\frac{d_{\alpha,\beta}(k-1)}{\delta M^{1/2}\,\|r^{k-1}\|} = R_{k-1}. \tag{5.8}
\]

The training process consists of the following steps. We assume that the matrix A is given.

1. Choose a test solution x̄.

2. Generate the right-hand side b̄.

3. Generate noisy samples of b̄: b^i = b̄ + δb^i, i = 1, ..., s.

4. For each sample b^i, i = 1, ..., s, compute {x^k(b^i)} by using the algorithm described by equation (3.1), where T = I. Find the index k = k* = k*(i) such that the relative error
\[
E_k(i) = \frac{\|x^k(b^i) - \bar{x}\|}{\|\bar{x}\|}
\]
is minimal.

5. Use formula (5.8) to find the corresponding interval for τ: τ = τ_i ∈ [R_{k*(i)}, R_{k*(i)−1}). Put τ̄_i = mid[R_{k*(i)}, R_{k*(i)−1}) and define τ̄ = (1/s) Σ_{i=1}^{s} τ̄_i.

6. Use τ = τ̄ in the stopping rule.

In [15] an alternative training scheme is also introduced. In this scheme points 5 and 6 above are replaced with points that use the lengths of the τ intervals instead of τ itself.

Even though the theory for this training scheme arises from SIRT methods of the form (3.1) where T = I, we will in this software package also use the strategy for the remaining methods. This requires some changes in the acceptance interval (5.8) for the ART methods, since M does not exist for these methods.
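A minimal sketch of steps 3-6 is given below, assuming a helper run_sirt that returns the iterates x^k (as columns of X) and the values R_k from (5.8) for one noisy sample; all names are illustrative, not the package's interface.

    % Minimal training sketch for tau (assumed names and helper run_sirt).
    s = 20;  taui = zeros(s,1);
    for i = 1:s
        bi = bbar + delta_b(:,i);             % step 3: noisy sample b^i
        [X, R] = run_sirt(A, bi, kmax);       % iterates x^k and R_k from (5.8)
        relerr = sqrt(sum((X - xbar).^2, 1)) / norm(xbar);
        [~, kstar] = min(relerr);             % step 4: minimal relative error
        taui(i) = (R(kstar) + R(max(kstar-1,1))) / 2;  % step 5: midpoint
    end
    tau = mean(taui);                         % step 6: trained tau-bar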



5.2 Normalized Cumulative Periodogram

When we first introduced the iterative methods in chapter 3, we mentioned that the number of iterations k plays a role comparable to Tikhonov's regularization parameter ω and the truncation parameter k for TSVD. For the choice of an optimal number of iterations for the iterative methods we therefore got the idea to use the Normalized Cumulative Periodogram (NCP), which is already used to determine the regularization parameters for Tikhonov and TSVD.

The motivation for the NCP method was to find a way to choose the regularization parameter without calculating the SVD or looking at the Picard plot.

In the NCP approach we view the residual vector r^k = b − Ax^k as a time series and consider the exact right-hand side b̄ as a signal which appears clearly different from the noise vector δb. We can do this since we know that b̄ is a smooth function. We then want to find the regularization parameter where the residual changes from being signal-like, dominated by components from b̄, to being noise-like, dominated by components of δb.

In [22] it is discussed that the singular functions are similar to the Fourier basis functions, and the discrete Fourier transform (DFT) is therefore used in the NCP method.

We let r̂^k denote the DFT of the residual vector r^k for the iterative method,
\[
\hat{r}^k = \operatorname{dft}(r^k) = \big((\hat{r}^k)_1, (\hat{r}^k)_2, \ldots, (\hat{r}^k)_m\big)^T \in \mathbb{C}^m .
\]
The power spectrum of r^k is defined as the real vector
\[
p^k = \big(|(\hat{r}^k)_1|^2, |(\hat{r}^k)_2|^2, \ldots, |(\hat{r}^k)_{q+1}|^2\big)^T,
\qquad q = \lfloor m/2 \rfloor,
\]
where q denotes the largest integer such that q ≤ m/2.

We then define the normalized cumulative periodogram (NCP) for the residual vector r^k as the vector c(r^k) ∈ R^q with elements
\[
c(r^k)_i = \frac{(p^k)_2 + \cdots + (p^k)_{i+1}}{(p^k)_2 + \cdots + (p^k)_{q+1}},
\qquad i = 1, \ldots, q .
\]

If the residual vector consists of white noise, then by definition the expected power spectrum is flat, i.e. E((p^k)_2) = E((p^k)_3) = ... = E((p^k)_{q+1}). Hence the points (i, E(c(r^k)_i)) on the NCP curve lie on the straight line from (0, 0) to (q, 1). Actual noise does not have an ideally flat spectrum, but we can still expect the NCP to be close to a straight line. A statistical test of whether the NCP follows a straight line is that, at a 5% significance level, the NCP curve must lie inside the Kolmogorov-Smirnoff limits ±1.35 q^{−1/2} around the straight line.

In practice it can be difficult to stay within the Kolmogorov-Smirnoff limits, and we will instead choose the regularization parameter for which the residual r^k best represents white noise, in the sense that the NCP is closest to a straight line. We measure the 2-norm between the NCP and the vector c_white = (1/q, 2/q, ..., 1)^T. We then define the NCP method as choosing k* = k_NCP as the minimizer of
\[
d(k) = \| c(r^k) - c_{\text{white}} \|_2 .
\]
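For a single residual vector r (a column vector) the distance d(k) could be computed as in the sketch below; the variable names are illustrative. The stopping index k_NCP then minimizes d over the computed iterations.

    % Minimal sketch of the NCP distance d(k) for one residual vector r.
    m = length(r);
    q = floor(m/2);
    p = abs(fft(r)).^2;                    % power spectrum of r^k
    c = cumsum(p(2:q+1)) / sum(p(2:q+1));  % NCP c(r^k), DC component excluded
    cwhite = (1:q)' / q;                   % NCP of ideal white noise
    d = norm(c - cwhite);                  % distance to the straight line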




Chapter 6

Test Problems

This software package includes three tomography test problems: parallel- and fan beam tomography and seismic tomography. Both parallel- and fan beam tomography arise from transmission tomography [32], [31], [28], where we study an object with nondiffractive radiation, i.e. X-rays. The loss of intensity of the X-rays is recorded by a detector and used to produce a two-dimensional image of the irradiated object. If we let I₀ denote the intensity of beam L from the source, f(x) denote the linear attenuation coefficient at the point x, and I denote the intensity of the beam after having passed the object, then
\[
\int_L f(x)\, dx = \log \frac{I_0}{I},
\]
which can also be written as
\[
\frac{I}{I_0} = \exp\Big( -\int_L f(x)\, dx \Big) .
\]
This provides us with the line integrals of the function f along the lines L. The transform that maps a function on R² into the set of its line integrals is called the Radon transform [31].

The difference between parallel- and fan beam tomography lies in the arrangement of the rays L. For parallel beam tomography the rays arise from sources arranged in parallel and with equal spacing. To get a better representation of the radiated domain, the sources can be rotated around the domain using different angles θ in such a way that the rays remain parallel. Figure 6.1 illustrates an example of a discretized domain with parallel rays for a given angle of the sources.


[Figure 6.1: Illustration of parallel beam tomography for a specific angle of the sources (a discretized domain with cells x1, ..., x36 and a ray i passing through it).]

For fan beam tomography we only have a single source. From this source a number of rays are arranged like a fan. There are two types of fan beam tomography, depending on whether the rays are equiangular or equispaced. Figure 6.2 illustrates a discretized case of fan beam tomography with equiangular rays, where the green circle illustrates the source and the red lines the rays. To get a better representation of the domain, the source can be rotated around the domain, keeping the distance to the center of the domain constant.

Seismic tomography belongs to the class of geophysical tomography problems. In seismic tomography the travel time through a domain of the subsurface of the earth is observed. From inversions of the line integrals along the seismic waves, the structure of the subsurface is estimated. The travel time t_L for ray L can be expressed as
\[
t_L = \int_L s(l)\, dl ,
\]
where s(l) is the slowness, which is the reciprocal of the velocity.


[Figure 6.2: Illustration of fan beam tomography.]

In our seismic tomography problem we consider a 2-dimensional subsurface slice. On the right side of the subsurface, s sources are positioned such that the distance between the sources is constant, and the distance from the top source to the surface and from the bottom source to the boundary of the domain is half the distance between two sources. On the left side of the subsurface and on the surface, a total of p seismographs or receivers are located under the same conditions as the sources. From each of the s sources, p rays are transmitted such that all receivers are hit. Figure 6.3 illustrates the set-up of the seismic tomography problem, where the green circles denote the sources, the blue squares denote the receivers, and the red lines denote the rays from one of the sources.

To apply the three test problems we need a formulation as a linear system of the form Ax = b. This can be done similarly for all three test problems, since only the arrangement of the rays differs. To avoid confusion we consider a domain described by the function f, which is either the object from parallel or fan beam tomography or the structure of the subsurface. We start by dividing the domain into N parts of unit length in each of the dimensions. This gives us N² square cells. All cells are numbered from 1 to N², starting with the cell in the upper left corner and ending with the cell in the bottom right corner, running along the columns, i.e. the numbering shown in figures 6.1, 6.2 and 6.3.

63


64 Test Problems<br />

x<br />

1<br />

x<br />

2<br />

x<br />

3<br />

x<br />

4<br />

x<br />

5<br />

x<br />

6<br />

x<br />

7<br />

x<br />

8<br />

x<br />

9<br />

x<br />

10<br />

x<br />

11<br />

x<br />

12<br />

x<br />

13<br />

x<br />

14<br />

x<br />

15<br />

x<br />

16 16 16 16 16 16 16 16 16 16 16 16<br />

x<br />

17<br />

x<br />

18<br />

x<br />

19<br />

x<br />

20<br />

x<br />

21<br />

x<br />

22 22 22 22 22 22 22 22 22 22 22 22<br />

x<br />

23<br />

x<br />

24<br />

x<br />

25<br />

x<br />

26<br />

x<br />

27 27 27 27 27 27 27 27 27 27 27 27<br />

x<br />

28<br />

x<br />

29<br />

x<br />

30<br />

x<br />

31<br />

x<br />

32<br />

x<br />

33<br />

x<br />

34 34 34 34 34 34 34 34 34 34 34 34<br />

x<br />

35<br />

x<br />

36<br />

Figure 6.3: Illustration of seismic tomography.<br />

Each cell j is assigned a constant value x_j, which is an approximation of the average of the function f within the j'th cell. In this way the reshaped vector x is a discretized version of the "true" function f.

For illustration we consider the i'th ray in figure 6.1, which passes through cells in the domain. We define the element a_ij as the length of the i'th ray through cell j, i.e. a_ij = 0 if ray i does not pass through cell j. The contribution from ray i through cell j is then the length multiplied by the value of cell j, i.e. a_ij · x_j. The measurement b_i is then
\[
b_i = \sum_{j=1}^{N^2} a_{ij} x_j, \qquad i = 1, \ldots, M,
\]
where M is the number of rays.
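To make the construction of the a_ij concrete, the sketch below computes the intersection lengths of a single ray with an N × N grid of unit cells by collecting the ray's crossings with the grid lines. The ray geometry, the helper variables, and the simplified cell numbering are illustrative assumptions, not the package's actual matrix generator.

    % Illustrative sketch: the nonzeros a_ij of one row of A for one ray.
    N    = 4;                              % assumed grid size (unit cells)
    xvec = ones(N^2, 1);                   % assumed cell values x_j
    p0   = [0; 1.3];                       % assumed ray entry point
    v    = [1; 0.4];  v = v / norm(v);     % assumed ray direction
    t    = unique([((0:N)-p0(1))/v(1), ((0:N)-p0(2))/v(2)]);
    t    = t(t >= 0 & isfinite(t));        % forward crossings of grid lines
    len  = diff(t);                        % segment length inside each cell
    mid  = p0 + v*(t(1:end-1) + len/2);    % midpoint of each segment
    col  = floor(mid(1,:)) + 1;            % cell column of each midpoint
    row  = floor(mid(2,:)) + 1;            % cell row of each midpoint
    in   = col>=1 & col<=N & row>=1 & row<=N & len>0;
    j    = (col(in)-1)*N + row(in);        % simplified column-wise numbering
    aij  = len(in);                        % the lengths a_ij for this ray
    bi   = aij * xvec(j);                  % b_i = sum_j a_ij * x_j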

The exact solution used depends on the chosen test problem. For the parallel and fan beam test problems the exact solution is the modified Shepp-Logan head phantom defined in [37]. The Shepp-Logan phantom is a famous model of the brain based on ellipses, often used for medical tomography, and it can be scaled for different discretizations. In this modified version the contrast is improved for a better visualization. Figure 6.4 (a) illustrates the modified Shepp-Logan phantom for N = 100.
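In MATLAB the modified Shepp-Logan phantom is available through the built-in phantom function (Image Processing Toolbox); a small sketch:

    % Generate and display the modified Shepp-Logan phantom for N = 100.
    N = 100;
    P = phantom('Modified Shepp-Logan', N);  % N-by-N head phantom image
    x = P(:);                                % column-wise reshape to a vector
    imagesc(P), axis image, colormap gray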


[Figure 6.4: The exact solutions for the test problems: (a) the modified Shepp-Logan phantom; (b) the seismic phantom subsurface (both for N = 100).]

For the seismic tomography test problem we have chosen to create our own phantom. This phantom illustrates a 2-dimensional subsurface with simple convergent boundaries of two tectonic plates with different slowness. We have chosen the case where the plates create a subduction zone, where one plate moves underneath the other. This test phantom can also be scaled for different discretizations. Figure 6.4 (b) illustrates the tectonic phantom for N = 100.
65




Chapter 7

Testing the Methods

In this chapter we will investigate the performance of the implemented iterative methods and the corresponding strategies. When performing these investigations we must pay attention to the term inverse crime. An inverse crime arises when the same model is used both to produce the simulated data and to invert them, or when the same discretization is used to simulate and to invert. Inverse crime often results in problems that are easier to solve than problems arising from real data, but if the algorithms do not work on inverse crime problems, we cannot hope that they will work on real data. In this chapter we will use a standard test problem with inverse crime.

The standard test problem will be used for almost every test case. We choose the parallel beam tomography test problem with the discretization N = 100. The angles of the sources start at 0 degrees and end at 179 degrees with a gap of 5 degrees. For each of these 36 angles we use 150 parallel rays. The generated matrix A then has the dimension 5400 × 10000, which means that the system is underdetermined. We create a noisy right-hand side by adding white Gaussian noise with noise level δ = 0.05.
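A sketch of generating this standard test problem with the package's paralleltomo routine (interface as described in the manual pages) is given below; whether the noise level is interpreted relative to ‖b̄‖ is an assumption here.

    % Sketch: the standard test problem with a noisy right-hand side.
    N = 100;  theta = 0:5:179;  p = 150;           % 36 angles, 150 rays each
    [A, bbar, xbar] = paralleltomo(N, theta, p);   % A is 5400-by-10000
    delta = 0.05;
    e = randn(size(bbar));
    b = bbar + delta * norm(bbar) * (e / norm(e)); % white Gaussian noise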


[Figure 7.1: Relative error histories for the DROP method for four different test problems (SNARK, paralleltomo, fanbeamtomo, seismictomo), together with the corresponding optimal numbers of iterations as functions of λ.]


[Figure 7.2: The minimum relative error as a function of the relaxation parameter, and the corresponding optimal number of iterations, for the SNARK head phantom and paralleltomo, when the weighting w consists of random numbers from 0 to 50.]

7.1 Convergence of DROP

In section 3.1.5, where we derived the SIRT method DROP, we mentioned that the upper bound of the convergence interval for DROP can be estimated by 2/max(w_i) for i = 1, ..., m, where w_i > 0 denotes the user-defined weighting of the equations. In this test we will look at the consequences of choosing this interval instead of the originally derived interval (0, 2/ρ(S⁻¹AᵀDA)). The advantage of using the simplified upper bound 2/max(w_i) is that we then do not have to compute the spectral radius ρ(S⁻¹AᵀDA), since this can be very expensive.
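The two bounds could be compared as in the sketch below, where D and S are the diagonal matrices from section 3.1.5 and the spectral radius is estimated iteratively with MATLAB's eigs; the variable names are illustrative.

    % Sketch (assumed names): the simplified versus the original upper bound.
    bound_simple = 2 / max(w);               % cheap bound 2/max(w_i)
    Afun = @(v) S \ (A' * (D * (A * v)));    % v -> S^(-1) A' D A v, matrix-free
    rho  = abs(eigs(Afun, size(A,2), 1));    % estimate of the spectral radius
    bound_exact = 2 / rho;                   % original interval (0, 2/rho)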

For this test we will not only use the standard test problem. We will also use a test problem from fan beam tomography, one from seismic tomography, and a special variant of the SNARK head phantom, which is not available in the software package.

The size of the convergence interval influences the choice of the relaxation parameter λ = λ_k. For the different test problems mentioned we will therefore study the relative error histories for different values of λ and look at the optimal number of iterations. In this way it should be clear what is lost by using the simplified version of the convergence interval.

Figure 7.1 illustrates the four different test problems and, for each problem, the minimum relative errors and the corresponding numbers of iterations. The vertical dotted line illustrates the upper bound 2/max(w_i). In section 4.2.1 we defined the optimal value of the relaxation parameter λ as the λ which gives rise to the fastest convergence to the smallest relative error in the solution. We notice that only for the SNARK test problem does the simplified convergence interval contain the optimal value of λ. For all the other test problems the optimal value of λ is cut off.

To get a better idea of the performance of the interval, we chose to include a weighting not identically equal to 1. The vector w is created as random numbers between 0 and 50. Figure 7.2 illustrates the minimum relative errors and the corresponding numbers of iterations for the SNARK test problem and the standard test problem. Again, the vertical dotted line illustrates the simplified upper bound of the convergence interval. For this example we see that a large part of the original convergence interval is removed, and again the optimal value of λ is cut off.

Based on these observations we conclude that using the simplified convergence interval is not a good idea if one is interested in finding an optimal value of the relaxation parameter λ, since we risk losing the optimal value. We have therefore chosen to let our implementation of the DROP method use the original but more expensive convergence interval.

7.2 Symmetric Kaczmarz as a SIRT Method

When we introduced the symmetric Kaczmarz method in section 3.2.2, we mentioned that it can be rewritten in the SIRT form (3.1) in such a way that the matrix M_SA is symmetric, which means that the derived theory for the SIRT methods is also valid for the symmetric Kaczmarz method. Since we are not interested in computing the matrix M_SA, the only strategies we can use for choosing the relaxation parameter λ_k are the Ψ1- and Ψ2-based relaxation strategies. We will not include the modified Ψ1 and Ψ2 strategies, since we do not have any good choice of the parameter τ from the paper [16], and it is not a part of this project to explore the performance of this parameter.

Figure 7.3 illustrates the relative error histories for the solutions with three different choices of λ_k. The red circles denote the relative error of the solutions when the Ψ1-based relaxation is chosen, the blue triangles when the Ψ2-based relaxation is chosen, and the pink diamonds when the constant value λ = 0.25 is chosen. We notice that for both the Ψ1- and Ψ2-based relaxations the relative error decreases and levels out as the number of iterations increases, which is the behaviour we would expect. We also notice that the relative errors for the Ψ1- and Ψ2-based relaxations do not reach the same level as the relative errors for the solutions with a constant value of λ chosen in the part of the interval where we would expect the optimal value to be. We will later compare the performance of the strategies for choosing the relaxation parameter.

[Figure 7.3: Ψ-based relaxation for symkaczmarz.]

7.3 Test of the Choice of Relaxation Parameter

In section 4.2 we introduced several methods and strategies for selecting the relaxation parameter λ_k in a reasonable way. In this section we will investigate the performance of each of them.

Training

We start by investigating our developed strategies for finding the optimal value of λ = λ_k using training. In this test case we give the algorithm the best


[Figure 7.4: Illustration of the minimum relative errors for different λ values and the corresponding numbers of iterations for Cimmino's projection method.]

[Figure 7.5: Illustration of the minimum relative errors for different λ values and the corresponding numbers of iterations for Kaczmarz's method.]

[Figure 7.6: Illustration of the minimum relative errors for different λ values and the corresponding numbers of iterations for the randomized Kaczmarz method.]

[Figure 7.7: The relative errors for the SIRT methods with the optimal λ value.]

[Figure 7.8: The relative errors for the ART methods with the optimal λ value.]

conditions for determining the optimal value of the constant relaxation parameter λ, since we use the training method on the very problem we want to solve. Since our implementation makes it possible for the user to choose a maximum number of iterations, we will investigate the behaviour of the training methods both when the maximum number of iterations is chosen in a sensible way and when it is chosen too small.

In the following investigations we have chosen not to visualize the behaviour of all the implemented iterative methods, since the performance of all methods can be represented by a few examples. Figure 7.4 illustrates the minimum relative error as a function of λ (left) and the corresponding number of iterations needed to obtain it (right) for Cimmino's projection method. In the left figure we observe that the minimum relative errors, as expected, are almost equal except at the beginning and the end of the convergence interval. The red square denotes the λ value found by the training strategy and the corresponding relative error. As we would expect, the relative error is smaller than the upper bound of the resolution limit, which is denoted by the green dashed lines. In the right figure we observe that the found value, denoted by the red diamond, is very close to the minimum number of iterations used. As mentioned, this example illustrates the typical behaviour of the SIRT methods, and from this we are very satisfied with the performance of the training strategy for the SIRT methods.
of the training strategy <strong>for</strong> the SIRT methods.


We then take a look at figure 7.5. Again the figure illustrates the minimum relative error as a function of λ (left) and the corresponding number of iterations needed to obtain it (right), but now for Kaczmarz's method. As expected, only a small interval of λ values has minimum relative errors below the upper bound of the resolution limit, and we notice that the λ found by the training strategy (the red square) lies just below this upper bound. When also considering the number of iterations, we notice that the found λ value (the red diamond) is in fact the value which is below the upper bound on the relative error and uses the minimum number of iterations. Many λ values need fewer iterations, but based on the minimum relative errors they can be eliminated, since they are above the upper bound of the resolution limit.

The behaviour of Kaczmarz's method is similar to that of the symmetric Kaczmarz method, but for the randomized Kaczmarz method we observe deviations. Figure 7.6 illustrates the behaviour of the randomized Kaczmarz method. We see that the left figure is similar to the one for Kaczmarz's method, but the right figure looks different. Since this method involves a random selection of the rows, we get a stochastic result and can only discuss the performance of the method as an average. From the right figure we can then see that the overall performance is close to that of Kaczmarz's method.

Figure 7.7 illustrates the relative errors for all the SIRT methods when the optimal value of λ is used. We notice that the performance of the methods is almost equal, except for Landweber's method, which has slower semi-convergence than the other methods. SART also returns a result which is slightly better than most methods. We also notice that Cimmino's projection method and Cimmino's reflection method return the exact same solutions, but the relaxation parameter is exactly twice as big for the projection method as for the reflection method. Returning to the formulations of the two methods, we notice that they differ only by a factor of 2. Therefore, when the optimal value can be found for one method, the optimal value for the other follows by a factor of 2.

Figure 7.8 illustrates the relative errors for all the ART methods when the optimal value of λ is used. From this we notice that Kaczmarz's method and symmetric Kaczmarz behave similarly, while randomized Kaczmarz seems to reach semi-convergence later than the other methods, but it also seems to stay at the semi-convergence level.

As mentioned in section 4.2.1, the implemented strategies take a different approach to finding the optimal value of λ if too few iterations are used. Figure 7.9 (a) illustrates the minimum relative error, and figure 7.9 (b) the relative error histories for nine different values of λ, for Cimmino's projection method. From (b) we clearly see that the minimum relative error is found after 20 iterations, which in this case is also the allowed maximum number of iterations. This obviously has an effect on figure (a), since semi-convergence is not reached for the λ values. In this case the optimal value of λ is found based on the relative error alone, and from the red square in figure (b) we conclude that the found λ is reasonable, and that our developed strategy for finding the optimal value of λ performed as expected.

[Figure 7.9: Illustration of the minimum relative errors for different λ values and the corresponding relative error histories for Cimmino's projection method when the maximum number of iterations is 20.]

[Figure 7.10: Illustration of the minimum relative errors for different λ values and the corresponding relative error histories for Kaczmarz's method when the maximum number of iterations is 4.]

[Figure 7.11: Illustration of the minimum relative errors for different λ values and the corresponding relative error histories for the randomized Kaczmarz method when the maximum number of iterations is 4.]

Figure 7.10 illustrates the minimum relative error in figure (a), and figure (b) illustrates the relative error histories for nine different values of λ for Kaczmarz's method. Again we see from figure (b) that the maximum number of iterations is reached for each value of λ, and again the optimal value of λ is found based on the minimum in figure (a). The found value, the red square, seems to be reasonable. Figure 7.11 illustrates the minimum relative error and the relative error histories for the randomized Kaczmarz method. We notice that the found value of λ has a relative error below the upper bound of the resolution limit. We also notice that the curve of minimum relative errors is flatter for randomized Kaczmarz than for Kaczmarz's method, which can make it difficult for the algorithm to determine which of the values to choose.

Line Search

The next strategy for choosing the relaxation parameter that we will investigate is line search. As mentioned when we introduced line search in section 4.2.2, this strategy can only be used for SIRT methods where T = I. Figure 7.12 illustrates the relative error histories for Landweber's method, Cimmino's projection method, Cimmino's reflection method, and the CAV method, when λ is chosen using line search.


[Figure 7.12: Relative error histories for the relaxation parameter λ chosen with line search.]

We notice that, besides the semi-convergence behaviour, the error for both of the Cimmino methods and for CAV has a zigzagging behaviour. Experience shows that this behaviour depends on the noise in the data. For small noise levels the zigzagging is almost invisible, but for larger noise levels the erratic behaviour increases. The explanation seems to be that line search assumes consistent data, which is not the case in our test problem. The conclusion on the performance of the line search strategy is thus that we get good performance for small noise levels, but not for larger noise levels.
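For the simplest case (Landweber's method with T = I and M = I) an exact line search has a closed form: λ_k minimizes ‖b − A(x^k + λAᵀr^k)‖₂ over λ. A minimal sketch of one such step, with illustrative names:

    % One Landweber step with exact line search (sketch; assumed names).
    r = b - A*x;                        % current residual r^k
    d = A' * r;                         % search direction A^T r^k
    lambda = (d' * d) / norm(A*d)^2;    % minimizer of ||r - lambda*A*d||_2
    x = x + lambda * d;                 % updated iterate x^(k+1)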

Relaxation to Control Noise Propagation

The last of the introduced strategies for choosing the relaxation parameter actually consists of four different strategies, since it includes both the Ψ1- and the Ψ2-based relaxation and their modified versions. Since we saw earlier in this chapter that the symmetric Kaczmarz method can also be used together with these strategies, we will test them on the SIRT methods and the symmetric Kaczmarz method.

Figure 7.13 illustrates the relative error histories when λ_k is chosen using the Ψ1- and Ψ2-based relaxations. We notice that the relative error remains almost constant after the minimum has been obtained for both Ψ1 and Ψ2, which shows that these strategies indeed dampen the influence of the noise-error; this reduces the sensitivity of the solution, such that the effect of choosing too many iterations is dampened.


[Figure 7.13: The relative error histories for the Ψ1- and Ψ2-based relaxations.]

[Figure 7.14: The relative error histories for the modified Ψ1- and Ψ2-based relaxations.]


Minimum relative error    Training      Line search    Modified Ψ2
Landweber                 0.3874 (70)   0.3875 (49)    0.4519 (150)
Cimmino (projection)      0.4240 (40)   0.4267 (33)    0.4503 (150)
Cimmino (reflection)      0.4240 (40)   0.4267 (33)    0.4503 (150)
CAV                       0.4235 (40)   0.4266 (33)    0.4504 (150)
DROP                      0.4262 (39)   -              0.4513 (150)
SART                      0.4048 (45)   -              0.4399 (150)
Kaczmarz                  0.4246 (3)    -              -
Symmetric Kaczmarz        0.4247 (2)    -              0.5010 (7)*
Randomized Kaczmarz       0.3957 (9)    -              -

Table 7.1: The minimum relative error for each SIRT or ART method combined with the different strategies for choosing λ. The numbers in brackets are the used numbers of iterations. The * for symmetric Kaczmarz denotes that for this method the modified Ψ2 strategy is not used; instead the Ψ1 strategy is used, since this strategy gave the best result for the symmetric Kaczmarz method.

We notice that for the SIRT methods the performance of Ψ2 is better than that of Ψ1, while for the symmetric Kaczmarz method the performance of Ψ1 is better than that of Ψ2.

For the modified versions of the Ψ1- and Ψ2-based relaxations the parameter τ is chosen based on the results from [16]. Figure 7.14 illustrates the relative error histories when λ_k is chosen using the modified Ψ1- and Ψ2-based relaxations. Again we notice that the influence of the noise-error is dampened. Comparing the modified versions with the original versions, we notice that the modified strategies reach a lower level of relative errors within the same number of iterations. The conclusion is therefore that the acceleration built into these strategies seems to be a good idea. As mentioned, we have only used a constant value of the parameter τ, but it could be interesting to see whether choosing τ_k depending on the iteration could give an even better result. This investigation, and a closer investigation of how to determine a constant "optimal" value of the parameter τ, is not a part of this project and will therefore not be pursued further.

Comparison of the Relaxation Strategies

By observing figures 7.7, 7.12 and 7.14 we can compare the performance of the different relaxation strategies, since they are applied to the same problem. When comparing the methods with the different strategies we will consider both the minimum relative error and the number of iterations used to reach this minimum.


[Figure 7.15: The relative error histories for the SNARK test problem using the different relaxation strategies: trained λ for the SIRT and ART methods, Ψ1- and Ψ2-based relaxations, their modified versions, and line search.]

The minimum relative errors and the used numbers of iterations are gathered in table 7.1. For the Ψ-based strategies we have only shown the best result, which for all the SIRT methods was obtained with the modified Ψ2 strategy, but for symmetric Kaczmarz with the Ψ1 strategy.

By looking at figure 7.7 and table 7.1 we notice that most of the methods are almost equally good when the optimal relaxation parameter is found for each method. The only method that has a smaller minimum, but found with more iterations, is the Landweber method. From figure 7.12 and the table, where line search is used to compute the relaxation parameter, we notice that again Landweber has a smaller minimum relative error than the other methods but uses more iterations. Comparing the minimum relative errors obtained with the training strategy and the line search strategy, we see that they in general give almost the same relative errors. Regarding the number of iterations, line search uses a few less than with an optimal value of the relaxation parameter.

We then compare with figure 7.14 (b), since we have already concluded that the modified Ψ2-based relaxation gives the best results of the Ψ-based relaxations. For the modified Ψ2-based relaxation we see that all methods perform equally well with this strategy. Comparing this with the other strategies, we conclude that the minimum relative error is almost the same as for the other strategies. Concerning the number of iterations, the modified Ψ2 strategy has not found the minimum after 150 iterations, since the strategy dampens the noise-error, but we also notice that not much has happened with the relative error over the last 50 iterations.

The conclusion for the SIRT methods with this test problem must be that all three introduced strategies give satisfactory results. The risk when using line search is that the method assumes consistency, which we cannot guarantee for large noise levels, and in this case the modified Ψ2 strategy seems to be a good alternative. It is interesting that the training strategy gives the best result, but one must also keep in mind that the training strategy is given optimal conditions, since training and solving are performed on the same problem. The difference between training and the other strategies is that training gives a constant relaxation parameter, where the other methods have adaptive relaxation parameters. We can therefore conclude that both a constant and an adaptive relaxation parameter can be a good choice.

For the ART methods we only have the strategy of finding an optimal relaxation parameter by using training; only for symmetric Kaczmarz are the Ψ1- and Ψ2-based relaxations defined. For this method we compare the result from figure 7.8 with figure 7.13. We notice that using the Ψ1-based relaxation we get the highest minimum relative error, obtained after 7 iterations. The number of iterations is therefore larger for the Ψ1-based relaxation than when using training to find an optimal value, where the minimum relative error was found after 2 iterations. Even though the constant relaxation strategy performs better, we must keep in mind that the Ψ-based relaxations give good results without the need of knowing the exact solution, which is required by the training strategy. For the remaining ART methods, where we only have the training strategy, one could wish that an adaptive strategy existed, since it seems to give better results.

As mentioned when the test problem was introduced, we have committed inverse crime when creating the test problem. To investigate the performance of the different relaxation strategies when the test problem is not created with inverse crime, we use the previously used SNARK test problem. Figure 7.15 illustrates the relative error histories for the different methods using the different relaxation strategies when the SNARK test problem is used. By looking at figure (a) we notice that for this test problem some methods perform better than others, meaning that the minimum relative error is smaller for some methods than for others. We notice that this is also the case for the other relaxation strategies. We also notice that the minimum relative error is almost the same whatever relaxation strategy is used. From our own results and the results in [16], which also use the SNARK test problems, we can conclude that for small noise levels line search is a very effective method, but for larger noise levels, where line search shows erratic behaviour, the Ψ-based relaxations are preferred, since the performance is almost equal but the dampening of the error is better.

We find it very interesting that the comparisons are the same for the two test problems when looking at the training strategy. Even though the training strategy did not seem to be a bad idea, the problem with this strategy is still that one must have a similar test problem to train on. The line search method is only defined for a few of the SIRT methods, while the Ψ-based relaxations seem to perform well on all SIRT methods, even though the theory is only valid for SIRT methods where T = I.

7.4 Stopping Rules

To complete the testing of the introduced strategies and methods, we also want to take a closer look at the performance of the different stopping rules introduced in chapter 5. Since two of the rules require training of a parameter, we start by looking at the performance of this training.


[Figure 7.16: The trained value of τ for different numbers of samples, for both the discrepancy principle (DP) and the monotone error rule (ME), using Cimmino's projection method.]

[Figure 7.17: The trained value of τ for different numbers of samples, for both the discrepancy principle (DP) and the monotone error rule (ME), using the DROP method.]

[Figure 7.18: The trained value of τ for different numbers of samples for the discrepancy principle (DP) using Kaczmarz's method.]

Training

As already mentioned, the stopping rules DP and ME require training of the parameter τ, but when training this parameter the user must select the number of samples s on which the parameter τ will be based. It therefore makes sense first to investigate the influence of the number of samples s.

Figure 7.16 illustrates the value of the trained parameter τ for different numbers of samples s using Cimmino's projection method. The blue circles denote the value of the parameter τ for DP and the red squares denote the value for ME. We see that, except when using only 10 samples, the trained values hardly vary. This is the case for both the DP and the ME parameter. Figure 7.17 illustrates the trained parameter τ for both DP and ME when using the DROP method. We notice that the behaviour from figure 7.16 repeats, and that only the value estimated from 10 samples varies a lot. We let the results from the DROP method and Cimmino's projection method be representative examples of the SIRT methods and conclude that using 15-20 samples is a good choice, since the running time also increases as s increases. Figure 7.18 illustrates the variation of the τ parameter for DP when using Kaczmarz's method. Using this representative example of the ART methods, we come to the same conclusion as for the SIRT methods: using 15-20 samples for the ART methods is a good choice.
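The stability study behind figures 7.16-7.18 could be reproduced with a sketch like the following, assuming a hypothetical helper train_tau that runs the training procedure from section 5.1.3 for a given number of samples:

    % Sketch (hypothetical helper train_tau): stability of tau versus s.
    svals = 10:10:50;
    tauDP = zeros(size(svals));
    for i = 1:numel(svals)
        tauDP(i) = train_tau(A, bbar, xbar, delta, svals(i), 'DP');
    end
    plot(svals, tauDP, 'o-'), xlabel('number of samples s'), ylabel('\tau')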


[Figure 7.19: Illustration of the stopping rules for the SIRT methods: (a) Landweber, (b) Cimmino's projection, (c) Cimmino's reflection, (d) CAV, (e) DROP, (f) SART. Each panel shows the minimum relative error and the stopping points for DP, ME and NCP.]
Figure 7.19: Illustration of the stopping rules <strong>for</strong> the SIRT methods.



Stopping index k∗          kopt    NCP     DP     ME
Landweber                   135     84     133    134†
Cimmino (projection)         73     67†      9     19
Cimmino (reflection)         73     67†      9     19
CAV                          74     66†      8     16
DROP                         72     67†      9     19
SART                         84     62†     37     48
Kaczmarz                      7      6†      5      -
Symmetric Kaczmarz            3      3†      2      -
Randomized Kaczmarz           6      5†      2      -

Table 7.2: The stopping index k∗ for all iterative methods. For each method the stopping rule which is closest to kopt is marked with †.

Figure 7.20: Illustration of the stopping rules for the ART methods. Each panel plots the relative error against the iteration number k, with markers for the minimum relative error and for the DP and NCP stopping indices: (a) Kaczmarz, (b) Symmetric Kaczmarz, (c) Randomized Kaczmarz.



Testing the Stopping Rules

Having determined the number of samples to use when training for the stopping rules DP and ME, we can observe the performance of the stopping rules on the different iterative methods. We again give the training method optimal conditions, since we train on and solve the same problem. For this test we use the built-in default relaxation parameter for each method.

For each method we solve the problem with only a maximum number of iterations as a stopping criterion. For all the iterations we compute the relative errors and find the minimum relative error. We then solve the same problem with each of the stopping rules and compare the result with the number of iterations for the minimum relative error. Table 7.2 contains the stopping index for each of the stopping rules for each method, together with the number of iterations kopt used to reach the minimum relative error. Figures 7.19 and 7.20 illustrate the relative error histories for the methods, and for each method it is marked where the stopping rules stopped the method.
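As a sketch of this measurement (the noisy paralleltomo problem and Cimmino's projection method with its default relaxation parameter are assumptions for illustration), the relative error history and the optimal stopping index kopt can be computed as follows:

[A b_ex x] = paralleltomo(50,0:5:179,150);
e = randn(size(b_ex)); e = e/norm(e);
b = b_ex + 0.05*norm(b_ex)*e;               % noisy right-hand side
X = cimminoProj(A,b,1:100);                 % store all 100 iterates as columns
relerr = sqrt(sum((X - repmat(x,1,100)).^2,1))/norm(x);
[minerr,kopt] = min(relerr);                % minimum relative error and its index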

We start by looking at the results for Landweber's method. From the table and from figure 7.19 (a) we see that for Landweber's method both the ME and the DP stopping rules are very close to the optimal stopping index. The NCP stopping rule stops after only 84 iterations. From the figure this can be explained by the behaviour of the relative errors, since the change in the relative error is very small after 80 iterations. This implies that even though both ME and DP are very close to the optimal stopping index, the result for NCP is not bad, since it stops after fewer iterations but with almost the same information.

For both of Cimmino's methods the NCP stopping rule is closest to the optimal stopping index. For this example the DP stopping rule allows only 9 iterations, and from figure 7.19 (b) and (c) we notice that the relative errors after this point are still decreasing significantly. The ME stopping rule allows 19 iterations, which is just before the relative errors start to level out. The same behaviour is seen for both the CAV method (figure (d)) and the DROP method (figure (e)).

For the SART method NCP again gives the stopping index closest to the optimal stopping index kopt, but in this case both DP and ME are close to the point on the relative error history where the error levels out. This means that for this method the different stopping rules return solutions of almost equal quality with respect to the error.

A conclusion on the stopping rules for the SIRT methods, based on the table and the figure, must be that the only stopping rule that gives really bad results is DP, and only for some of the SIRT methods. According to this small test a safe choice of stopping rule is the NCP, since it always stops the iterations after the relative error has leveled off. An advantage of the NCP method is also that it does not require any knowledge of the problem. Both DP and ME require training, where information about the noise level must be known, and in this test we gave them optimal conditions for determining the stopping index, since the training problem is the same as the problem being solved. Despite this advantage they perform more poorly than the NCP method.

Figure 7.20 illustrates the relative error histories for the ART methods. For the ART methods we can only use DP and NCP. We first look at Kaczmarz's method in figure (a). From this figure we notice that the NCP stopping rule is closest to the optimal stopping index kopt, but both stopping rules have reached almost the same level of relative error as at the kopt index. Figure (b) illustrates the relative error histories for symmetric Kaczmarz. In this case the NCP stopping index is actually the same as kopt, while the DP index is only one iteration smaller with the error at almost the same level as at the optimal index. Figure (c) illustrates the relative error histories for randomized Kaczmarz, and in this case the DP stopping index is closest to the optimal index kopt. The NCP index is in this case a bad stopping index, since the errors have not yet reached the point where they level out. For the ART methods we conclude that in most cases the NCP stopping rule is the most effective, but the DP stopping rule is not a bad choice. Again we must keep in mind that DP was given optimal conditions and that it requires training and knowledge of the noise level.

7.5 Relaxation Strategies Combined with Stopping Rules

We have earlier tested the performance of the stopping rules and of the strategies for determining the relaxation parameter separately, and will now investigate the performance when the strategies are used together.

Relaxation to Control Noise Propagation with Stopping Rules

Since we concluded in section 7.3 that the relaxation strategies Ψ1 and Ψ2 were good choices of relaxation strategy, we test the performance when using the modified Ψ2 strategy and the stopping rules together.



Figure 7.21: The modified Ψ2 relaxation strategy combined with the stopping rules: (a) the modified Ψ2 strategy for Cimmino's projection method, (b) the modified Ψ2 strategy for the DROP method. Each panel plots the relative error against the iteration number k, with markers for the minimum relative error and for the DP, ME and NCP stopping indices.

Figure 7.21 illustrates the relative error histories for Cimmino's projection method and for the DROP method; the stopping indices for the stopping rules are indicated by different markers. From this figure we notice that even though we allow 1000 iterations, the minimum of the relative error is not reached, since the strategy dampens the noise error and hence the semi-convergence behaviour. This implies that NCP does not stop the iterations, and that DP and ME stop after only a few iterations, where we clearly see that the relative error has not yet come close to the level reached after 1000 iterations. Since we have earlier shown that the Ψ-based relaxations are good choices of relaxation strategy, we conclude that a stopping rule is needed that can find an appropriate stopping index for these strategies.
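A sketch of the setup behind figure 7.21, using the options fields documented in the manual pages (the test problem A, b is assumed built as in the examples there):

x0 = zeros(size(A,2),1);                    % the documented default starting vector
options.lambda = 'psi2mod';                 % modified Psi_2-based relaxation
options.stoprule.type = 'NCP';
[X info] = cimminoProj(A,b,1:1000,x0,options);
info(1)    % 0 when, as observed above, NCP never fires and all 1000 iterations are used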

Line Search with Stopping Rules

Since line search turned out to give good results when the noise level is low, we are interested in investigating whether the introduced stopping rules can be used together with line search. Figure 7.22 illustrates the relative error histories and the stopping indices for all stopping rules on the SIRT methods for which line search is defined. We notice that for all four methods the NCP stopping rule stops the iterations too early, since there is still a significant decay after the NCP stopping index. On the other hand, both DP and ME give very bad results, since they stop the iterations either well before or well after the optimal index. We conclude that with this relaxation strategy none of the stopping rules gives satisfactory results, which could be caused by the zigzagging behaviour described earlier.



Figure 7.22: Illustration of the stopping rules for the SIRT methods using line search. Each panel plots the relative error against the iteration number k, with markers for the minimum relative error and for the DP, ME and NCP stopping indices: (a) Landweber, (b) Cimmino's projection, (c) Cimmino's reflection, (d) CAV.



Total Work Units           Stopping index k∗     WU
Landweber                          69∗           138
Cimmino (projection)               36             72
Cimmino (reflection)               36             72
CAV                                 7∗            14
DROP                               36             72
SART                               29∗            58
Kaczmarz                            3∗            12
Symmetric Kaczmarz                  2             16
Randomized Kaczmarz                 6∗            24

Table 7.3: The ∗ denotes that the stopping rule DP is used, while all other methods use NCP.

Comparing the Performance of SIRT and ART

We want to compare the performance of the SIRT and the ART methods, and to give the methods equal opportunity to perform well we use the training strategy for the relaxation parameter, since we showed in section 7.3 that all methods are almost equally good when the relaxation parameter is trained. Figures 7.23 and 7.24 illustrate the relative error histories for the SIRT and the ART methods. For each method the stopping indices for the different stopping rules are marked. We notice from figure 7.23 that the NCP method does not work well for the Landweber method, the CAV method and the SART method. For Landweber we notice that the DP stopping rule is very close to the minimum relative error, and we will therefore use the DP rule for Landweber. For the CAV method none of the stopping rules returns satisfactory results, but the DP rule is the closest. For the SART method we also choose DP, since it returns a result with fewer iterations than the minimum relative error but with the error at almost the same level. For the rest of the SIRT methods we choose NCP as the stopping rule, since NCP stops the iterations for these methods when the relative errors level out, and the information gained by iterating further is very small. As mentioned earlier, NCP is also the easiest stopping rule to use, since it does not require training or knowledge of the noise level. For the ART methods we choose DP for Kaczmarz, NCP for symmetric Kaczmarz and DP for randomized Kaczmarz. In all the mentioned choices we have chosen the stopping rule which is closest to the minimum relative error with the error at the same level.

To compare the SIRT and the ART methods we recall the work unit WU introduced in section 3.3. The total work of a method is then the number of used iterations multiplied by the work units per iteration for the given method. Table 7.3 shows the chosen stopping index and the total work of each method.
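As a small sketch of this bookkeeping (the per-iteration WU values are defined in section 3.3; the factor 2 used here for a SIRT method is only inferred from Table 7.3):

x0 = zeros(size(A,2),1);
options.stoprule.type = 'NCP';
[X info] = cimminoProj(A,b,[],x0,options);   % stopped by the NCP rule
totalWork = info(2)*2;                       % info(2) = number of used iterations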



Figure 7.23: Illustration of the stopping rules for the SIRT methods with a trained value of λ. Each panel plots the relative error against the iteration number k, with markers for the minimum relative error and for the DP, ME and NCP stopping indices: (a) Landweber, λ = 0.00052968; (b) Cimmino's projection, λ = 244.6826; (c) Cimmino's reflection, λ = 122.3413; (d) CAV, λ = 2.2216; (e) DROP, λ = 2.1673; (f) SART, λ = 1.8541.



Figure 7.24: Illustration of the stopping rules for the ART methods with a trained value of λ. Each panel plots the relative error against the iteration number k, with markers for the minimum relative error and for the DP and NCP stopping indices: (a) Kaczmarz, λ = 0.43769; (b) Symmetric Kaczmarz, λ = 0.32624; (c) Randomized Kaczmarz, λ = 1.



From these results we clearly see that the ART methods use fewer work units to obtain a solution of the same quality as the SIRT methods. Only the CAV method uses almost the same amount of work units, but we also recall that for this method the quality of the solution is not as good as for the other methods.

In this package the SIRT methods have an advantage, since the implementation is done in MATLAB, where the structure of the SIRT methods can be exploited to speed up the running time; this, however, is specific to MATLAB implementations.

The good performance of the ART methods presents a dilemma. Through the project we have experienced that the theory and understanding of the SIRT methods are better developed than for ART; in particular, we do not have theory for semi-convergence and adaptive relaxation strategies for the ART methods. The experiments in this chapter have shown that ART produced the fastest solutions, but by choosing the relaxation parameter adaptively the SIRT methods could produce just as accurate solutions without the need for training, although this requires more computational work. For future work one could hope for an adaptive method for the ART methods, such that they could produce results without the need for training. One could also hope for a stopping rule for the SIRT methods with adaptive relaxation parameter that is able to stop the iterations when the curve of the relative errors starts to level out, since this would minimize the computational work for these methods. In general one could hope for a better stopping rule, since our results through this chapter have shown that all the known stopping rules are unstable in finding the optimal stopping index.




Chapter 8

Manual Pages

ITERATIVE SIRT METHODS

cav            Component Averaging (CAV) iterative method
cimminoProj    Cimmino's iterative projection method
cimminoRefl    Cimmino's iterative reflection method
drop           Diagonally Relaxed Orthogonal Projections (DROP) iterative method
landweber      The Classical Landweber iterative method
sart           The Simultaneous Algebraic Reconstruction Technique (SART) iterative method

ITERATIVE ART METHODS

kaczmarz       Kaczmarz's iterative method, also known as the algebraic reconstruction technique (ART)
randkaczmarz   The randomized Kaczmarz iterative method
symkaczmarz    The symmetric Kaczmarz iterative method



TRAINING ROUTINES

trainDPME        Training strategy to estimate the best parameter when the discrepancy principle or the monotone error rule is used as stopping rule
trainLambdaART   Training strategy to find the best constant relaxation parameter λ for a given ART method
trainLambdaSIRT  Training strategy to find the best constant relaxation parameter λ for a given SIRT method

TEST PROBLEMS

fanbeamtomo      Creates a two-dimensional fan beam tomography test problem
paralleltomo     Creates a two-dimensional parallel beam tomography test problem
seismictomo      Creates a two-dimensional seismic tomography test problem

DEMO ROUTINES

ARTdemo          Demo illustrating the simple use of the ART methods
SIRTdemo         Demo illustrating the simple use of the SIRT methods
trainingdemo     Demo illustrating the use of the training routines and the subsequent use of the SIRT and ART methods

AUXILIARY ROUTINES

calczeta         Calculates the roots of a specific polynomial g(z) of degree k


The Demo functions

This MATLAB package includes three demo functions which illustrate the use of the remaining functions in the package.

The demo function ARTdemo illustrates the use of the ART methods kaczmarz, symkaczmarz and randkaczmarz. First the demo function creates a parallel beam tomography test problem using the test problem paralleltomo. Noise is then added to the right-hand side, and the noisy problem is solved using the ART methods with 10 iterations. The result is shown as four images, where one contains the exact solution and the remaining images illustrate the solutions found using the three ART methods.

The demo function SIRTdemo illustrates the use of the SIRT methods landweber, cimminoProj, cimminoRefl, cav, drop, and sart. First the demo function creates a parallel beam tomography test problem using the test problem paralleltomo. Noise is then added to the right-hand side, and the noisy problem is solved using the SIRT methods with 50 iterations. The result is shown as seven images, where one contains the exact solution and the remaining images illustrate the solutions found using the six SIRT methods.

The demo function trainingdemo illustrates the use of the training functions trainLambdaART, trainLambdaSIRT, and trainDPME, followed by solving with an ART or a SIRT method. In this demo the SIRT method used is cimminoProj and the ART method used is kaczmarz. First the demo function creates a parallel beam tomography test problem using the test problem paralleltomo, and noise is added to the right-hand side. Then the training strategy trainLambdaSIRT is used to find the relaxation parameter for cimminoProj, and trainLambdaART is used to find the relaxation parameter for kaczmarz. With this information the stopping parameter is found for each of the methods, where cimminoProj uses the ME stopping rule and kaczmarz uses the DP stopping rule. After this the problem is solved with the specified relaxation parameter and stopping rule. The result is shown as three images, where one contains the exact image and the remaining images illustrate the found solutions.
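A sketch of this workflow follows. The argument lists of the two training calls below are assumptions for illustration only; the authoritative signatures are given on their respective manual pages:

[A b_ex x] = paralleltomo(50,0:5:179,150);
e = randn(size(b_ex)); e = e/norm(e);
delta = 0.05*norm(b_ex);                    % noise level
b = b_ex + delta*e;                         % noisy right-hand side
lambda = trainLambdaSIRT(A,b_ex,x,'cimminoProj');        % assumed call
tau = trainDPME(A,b_ex,x,'ME',delta,20,'cimminoProj');   % assumed call, s = 20 samples
x0 = zeros(size(A,2),1);                    % the documented default starting vector
options.lambda = lambda;
options.stoprule.type = 'ME';
options.stoprule.taudelta = tau*delta;      % taudelta = tau*delta, as documented
[X info] = cimminoProj(A,b,[],x0,options);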




calczeta

Purpose:

Calculates the roots of a specific polynomial g(z) of degree k.

Synopsis:

z = calczeta(k)

Description:

This function calculates, by means of Newton-Raphson's method and Horner's rule, the unique root in the interval (0, 1) of the polynomial of degree k:

    g(z) = (2k − 1)z^{k−1} − (z^{k−2} + ... + z + 1) = 0.

The input k can be given as either a scalar or a vector, and the corresponding root or roots are returned in the output z.

The function calczeta is used in the functions cav, cimminoProj, cimminoRefl, drop, landweber, sart and symkaczmarz.

Algorithm:

See appendix A.2 for a further description of the algorithm used.

Examples:

Calculate the roots for the degrees 2 up to 100 and plot the found roots:

k = 2:100;
z = calczeta(k);
figure, plot(k,z,'bo')


See also:

cav, cimminoProj, cimminoRefl, drop, landweber, sart, symkaczmarz.

References:

1. See appendix A.2.

2. L. Eldén, L. Wittmeyer-Koch and H. B. Nielsen, Introduction to Numerical Computation, Studentlitteratur AB, 2004.



cav

Purpose:

Component Averaging (CAV) iterative method.

Synopsis:

[X info restart] = cav(A,b,K)
[X info restart] = cav(A,b,K,x0)
[X info restart] = cav(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm for cav takes the following form:

    x^{k+1} = x^k + λ_k A^T D_S (b − A x^k),

where D_S = diag(w_1 / Σ_{j=1}^{n} s_j a_{1j}^2, ..., w_m / Σ_{j=1}^{n} s_j a_{mj}^2), S = diag(s_1, ..., s_n), and s_j is the number of nonzero elements in column j of A.
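For reference, a minimal sketch of this iteration (a dense A, unit weights w_i = 1 and a fixed relaxation parameter lambda are assumptions for illustration):

s = full(sum(A~=0,1));         % s_j: number of nonzeros in column j of A
dS = 1./((A.^2)*s');           % diagonal of D_S, entry i = 1/sum_j s_j*a_ij^2
x = zeros(size(A,2),1);
for k = 1:50
    x = x + lambda*(A'*(dS.*(b - A*x)));
end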

Description:

The function implements the Component Averaging (CAV) iterative method for solving the linear system Ax = b. The starting vector is x0; if no starting vector is given, then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 1/σ̃_1^2, where σ̃_1 is an estimate of the largest singular value of D_S^{1/2} A.

The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns a vector containing the diagonal of the matrix D_S in the field M and an empty vector in the field T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:

The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2/σ_1^2. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'linesearch', where the method linesearch is used to compute the value for λ_k in each iteration using (4.11) from section 4.2.
  - options.lambda = 'psi1', where the method psi1 computes the values for λ_k using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λ_k using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 353 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λ_k using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λ_k using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 400 in the code.



- options.restart
  - options.restart.M = a vector with the diagonal of D_S.
  - options.restart.s1 = σ̃_1, where σ̃_1 is the estimated largest singular value of D_S^{1/2} A.

- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'NONE', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is the default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k∗ is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index k∗ is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index k∗ is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

- options.w
  - options.w = w, where w is an m-dimensional vector.

Examples:

Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 cav iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = cav(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off
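A further sketch (not from the thesis) combining the documented options fields: stop cav by the discrepancy principle, with a user-chosen τ = 1.02 and the noise level δ taken from the noise model above:

delta = 0.05*norm(b);                    % approximately the noise level in the example above
x0 = zeros(size(A,2),1);                 % the documented default starting vector
options.lambda = 'psi2';                 % adaptive Psi_2-based relaxation
options.stoprule.type = 'DP';
options.stoprule.taudelta = 1.02*delta;
[X info] = cav(A,b,[],x0,options);       % K = [ ], so X holds the last iterate only
info(1)                                  % 2 if the DP-rule stopped the iterations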


See also:

cimminoProj, cimminoRefl, drop, landweber, sart.

References:

1. Y. Censor, D. Gordon and R. Gordon, Component averaging: An efficient iterative parallel algorithm for large sparse unstructured problems, Parallel Computing, 27 (2001), p. 777-808.



cimminoProj

Purpose:

Cimmino's iterative projection method.

Synopsis:

[X info restart] = cimminoProj(A,b,K)
[X info restart] = cimminoProj(A,b,K,x0)
[X info restart] = cimminoProj(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm for cimminoProj takes the following form:

    x^{k+1} = x^k + λ_k A^T M (b − A x^k),

where M = diag(w_i / (m ‖A(i,:)‖_2^2)) for i = 1, ..., m.

Description:

The function implements Cimmino's iterative projection method for solving linear systems Ax = b. The starting vector is x0; if no starting vector is given, then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 1/σ̃_1^2, where σ̃_1 is an estimate of the largest singular value of M^{1/2} A.



The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns a vector containing the diagonal of the matrix M in the field M and an empty vector in the field T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:

The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2/σ_1^2. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'linesearch', where the method linesearch is used to compute the value for λ_k in each iteration using (4.11) from section 4.2.
  - options.lambda = 'psi1', where the method psi1 computes the values for λ_k using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λ_k using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 362 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λ_k using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λ_k using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 409 in the code.

- options.restart
  - options.restart.M = a vector with the diagonal of M.
  - options.restart.s1 = σ̃_1, where σ̃_1 is the estimated largest singular value of M^{1/2} A.

- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is the default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k∗ is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index k∗ is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index k∗ is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

- options.w
  - options.w = w, where w is an m-dimensional vector.

Examples:

Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 cimminoProj iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = cimminoProj(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off
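A further sketch (not from the thesis): reuse the restart output on a longer second run, so that the largest singular value and the diagonal of M are not recomputed (recommended only for experienced users, as noted above):

[X info restart] = cimminoProj(A,b,1:50);
x0 = zeros(size(A,2),1);      % the documented default starting vector
options.restart = restart;    % hand the precomputed values back in
X2 = cimminoProj(A,b,1:100,x0,options);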

See also:

cav, cimminoRefl, drop, landweber, sart.


References:

1. G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, La Ricerca Scientifica, XVI, Series II, Anno IX, 1 (1938), p. 326-333.



cimminoRefl

Purpose:

Cimmino's iterative reflection method.

Synopsis:

[X info restart] = cimminoRefl(A,b,K)
[X info restart] = cimminoRefl(A,b,K,x0)
[X info restart] = cimminoRefl(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm for cimminoRefl takes the following form:

    x^{k+1} = x^k + λ_k A^T M (b − A x^k),

where M = diag(2 w_i / (m ‖A(i,:)‖_2^2)) for i = 1, ..., m.

Description:

The function implements Cimmino's iterative reflection method for solving linear systems Ax = b. The starting vector is x0; if no starting vector is given, then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 1/σ̃_1^2, where σ̃_1 is an estimate of the largest singular value of M^{1/2} A.



The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns a vector containing the diagonal of the matrix M in the field M and an empty vector in the field T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:

The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2/σ_1^2. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'linesearch', where the method linesearch is used to compute the value for λ_k in each iteration using (4.11) from section 4.2.
  - options.lambda = 'psi1', where the method psi1 computes the values for λ_k using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λ_k using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 356 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λ_k using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λ_k using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 403 in the code.

- options.restart
  - options.restart.M = a vector with the diagonal of M.
  - options.restart.s1 = σ̃_1, where σ̃_1 is the estimated largest singular value of M^{1/2} A.

- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is the default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k∗ is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index k∗ is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index k∗ is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

- options.w
  - options.w = w, where w is an m-dimensional vector.

Examples:

Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 cimminoRefl iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = cimminoRefl(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:

cav, cimminoProj, drop, landweber, sart.


References:

1. G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, La Ricerca Scientifica, XVI, Series II, Anno IX, 1 (1938), p. 326-333.



drop

Purpose:

Diagonally Relaxed Orthogonal Projections (DROP) iterative method.

Synopsis:

[X info restart] = drop(A,b,K)
[X info restart] = drop(A,b,K,x0)
[X info restart] = drop(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm for the drop method takes the following form:

    x^{k+1} = x^k + λ_k S^{−1} A^T D (b − A x^k),

where S^{−1} = diag(s_j^{−1}), s_j is the number of nonzero elements in column j of A, and D = diag(w_i / ‖A(i,:)‖_2^2) for i = 1, ..., m.

Description:

The function implements the Diagonally Relaxed Orthogonal Projections (DROP) iterative method for solving the linear system Ax = b. The starting vector is x0; if no starting vector is given, then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

struct options. If K is empty a stopping rule must be specified.<br />

The relaxation parameter is given in the field lambda in the struct options,<br />

either as a constant or as a string that determines the method to compute


115<br />

lambda. As default lambda is set to 1/ˆρ, where ˆρ is an estimate of the spectial<br />

radius of S −1 A T DA.<br />

The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns a vector containing the diagonal of the matrix D in the field M and the diagonal of the matrix S in the field T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:

The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2/ρ. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'psi1', where the method psi1 computes the values for λ_k using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λ_k using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 350 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λ_k using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λ_k using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 397 in the code.

- options.restart
  - options.restart.M = a vector containing the diagonal of D.
  - options.restart.T = a vector containing the diagonal of S^{−1}.
  - options.restart.s1 = σ̂_1, where σ̂_1 = √ρ̂.

- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is the default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k∗ is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

- options.w
  - options.w = w, where w is an m-dimensional vector.

Examples:

Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 drop iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = drop(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off
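A further sketch (not from the thesis) combining the documented options fields: the Ψ1-based relaxation together with the NCP stopping rule, which requires no knowledge of the noise level:

x0 = zeros(size(A,2),1);      % the documented default starting vector
options.lambda = 'psi1';
options.stoprule.type = 'NCP';
[X info] = drop(A,b,[],x0,options);
k_used = info(2);             % number of iterations actually used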


See also:

cav, cimminoProj, cimminoRefl, landweber, sart.

References:

1. Y. Censor, T. Elfving, G. Herman and T. Nikazad, On diagonally relaxed orthogonal projection methods, SIAM J. Sci. Comput., 30 (2007/08), p. 473-504.



fanbeamtomo

Purpose:

Creates a two-dimensional fan beam tomography test problem.

Synopsis:

[A b x theta p R w] = fanbeamtomo(N)
[A b x theta p R w] = fanbeamtomo(N,theta)
[A b x theta p R w] = fanbeamtomo(N,theta,p)
[A b x theta p R w] = fanbeamtomo(N,theta,p,R)
[A b x theta p R w] = fanbeamtomo(N,theta,p,R,w)
[A b x theta p R w] = fanbeamtomo(N,theta,p,R,w,isDisp)

Description:

This function creates a two-dimensional tomography test problem using fan beams. A 2-dimensional domain is divided into N equally spaced intervals in both dimensions, creating N^2 cells. For each specified angle theta, in degrees, a source is located at distance R·N from the center of the domain. From each source, p equiangular rays penetrate the domain with a span of w between the first and the last ray. The default value for the angles is theta = 0:359. The number of rays p has the default value round(sqrt(2)*N). The distance from the center of the domain to the sources is given in units of side lengths, and the default value of R is 2. The default value of the span w is calculated such that from (0,RN) the first ray hits the point (-N/2,N/2) and the last hits (N/2,N/2). If the input isDisp is different from 0 then the function also creates an illustration of the problem with the used angles, rays etc. As default isDisp is 0.

The function returns a coefficient matrix A with dimension nA·p × N^2, where nA is the number of used angles, the right-hand side b, and the phantom head reshaped as a vector x. The figure below illustrates the phantom head for N = 100. In case default values are used, the function also returns the used angles theta, the number of rays used for each angle p, the used distance R from the source to the center of the domain given in side lengths, and the used span of the rays w.
of the rays w.


Algorithm:

The element a_ij is defined as the length of the i'th ray through the j'th cell, with a_ij = 0 if ray i does not go through cell j. The exact solution of the head phantom is reshaped as a vector, and the i'th element of the right-hand side b is

    b_i = Σ_{j=1}^{N^2} a_{ij} x_j,   i = 1, ..., nA·p.

For further information see chapter 6.
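A quick sanity check of this relation (a sketch): the noise-free right-hand side is the forward projection of the phantom, b = A*x:

[A b x] = fanbeamtomo(64);
norm(b - A*x)/norm(b)   % should be at rounding-error level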

Examples:

Create a test problem and visualize the solution:

N = 64; theta = 0:5:359; p = 2*N; R = 2;
[A b x] = fanbeamtomo(N,theta,p,R);
imagesc(reshape(x,N,N))
colormap gray, axis image off

See also:

paralleltomo, seismictomo.

References:

1. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, SIAM, 2001.

[Figure: Shepp-Logan phantom, N = 100.]



kaczmarz

Purpose:

Kaczmarz's iterative method, also known as the algebraic reconstruction technique (ART).

Synopsis:

[X info] = kaczmarz(A,b,K)
[X info] = kaczmarz(A,b,K,x0)
[X info] = kaczmarz(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm kaczmarz takes the following form:

    x^{k,0} = x^k
    x^{k,i} = x^{k,i−1} + λ_k ((b_i − ⟨a^i, x^{k,i−1}⟩)/‖a^i‖_2^2) a^i,   i = 1, ..., m
    x^{k+1} = x^{k,m},

where a^i denotes the i'th row of A.

Description:

The function implements Kaczmarz's iterative method for solving the linear system Ax = b. The starting vector is x0; if no starting vector is given, then x0 = 0 is used.
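For reference, a minimal sketch of one sweep of the update above (a dense A and a fixed relaxation parameter are assumptions for illustration):

x = zeros(size(A,2),1);    % starting vector (the documented default)
lambda = 0.25;             % the documented default relaxation parameter
for i = 1:size(A,1)
    ai = A(i,:);                                  % i'th row of A
    x = x + lambda*((b(i) - ai*x)/(ai*ai'))*ai';  % project towards hyperplane i
end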

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options as a constant. As default lambda is set to 0.25.



The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations and 2 denotes that the DP-rule stopped the iterations. The second element in info is the number of used iterations.

Use of options:

The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2. A warning is given if this requirement is estimated to be violated.

- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is the default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k∗ is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index is determined according to the discrepancy principle (DP) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule type DP.

Examples:

Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 10 kaczmarz iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = kaczmarz(A,b,1:10);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:

randkaczmarz, symkaczmarz.

References:

1. S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bulletin de l'Académie Polonaise des Sciences et Lettres, A35 (1937), p. 355-357.


landweber

Purpose:

The Classical Landweber iterative method.

Synopsis:

[X info restart] = landweber(A,b,K)
[X info restart] = landweber(A,b,K,x0)
[X info restart] = landweber(A,b,K,x0,options)

Algorithm:

For arbitrary x^0 ∈ R^n the algorithm for landweber takes the following form:

    x^{k+1} = x^k + λ_k A^T (b − A x^k).

Description:

The function implements the Classical Landweber iterative method <strong>for</strong> solving<br />

the linear system Ax= b. The starting vector is x0; if no starting vector is given<br />

then x0 = 0 is used.<br />

The numbers given in the vector K are iteration numbers, that specify which<br />

iterations are stored in the output matrix X. If a stopping rule is selected (see<br />

below) and K = [ ], then X contains the last iterate only.<br />

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 1/σ̃1², where σ̃1 is an estimate of the largest singular value of A.
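To make the update concrete, here is a minimal sketch of the plain Landweber iteration with a fixed relaxation parameter (illustrative only; the package's landweber function adds stopping rules, relaxation strategies and the restart mechanism):

[A b x] = paralleltomo(50,0:5:179,150);
lambda = 1/normest(A)^2;              % fixed lambda from an estimate of sigma_1
xk = zeros(size(A,2),1);              % starting vector x0 = 0
for k = 1:50
    xk = xk + lambda*(A'*(b - A*xk)); % x^{k+1} = x^k + lambda*A^T(b - A*x^k)
end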

The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns an empty vector in both the fields M and T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:
The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2/σ̃1². A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'linesearch', where the method linesearch is used to compute the value for λk in each iteration using (4.11) from section 4.2.
  - options.lambda = 'psi1', where the method psi1 computes the values for λk using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λk using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 299 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λk using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λk using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 344 in the code.
- options.restart
  - options.restart.s1 = σ̂1, where σ̂1 is the estimated largest singular value of A.
- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k* is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index k* is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index k* is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

Examples:
We generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 landweber iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = landweber(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
cav, cimminoProj, cimminoRefl, drop, sart.

References:
1. L. Landweber, An iteration formula for Fredholm integral equations of the first kind, American Journal of Mathematics, 73 (1951), p. 615-624.


paralleltomo

Purpose:
Creates a two-dimensional parallel beam tomography test problem.

Synopsis:
[A b x theta p w] = paralleltomo(N)
[A b x theta p w] = paralleltomo(N,theta)
[A b x theta p w] = paralleltomo(N,theta,p)
[A b x theta p w] = paralleltomo(N,theta,p,w)
[A b x theta p w] = paralleltomo(N,theta,p,w,isDisp)

Description:

This function creates a two-dimensional tomography test problem using parallel beams. A two-dimensional domain is divided into N equally spaced intervals in both dimensions, creating N² cells. For each specified angle theta in degrees, p parallel rays, arranged symmetrically around the center of the domain such that the width from the first to the last ray is w, penetrate the domain. The default values for the angles are theta = 0:179. The number of rays p has the default value round(√2·N). The default value of the width w between the first and the last ray is √2·N. If the input isDisp is different from 0 then the function also creates an illustration of the problem with the used angles and rays etc. As default isDisp is 0.

The function returns a coefficient matrix A with the dimension n_A·p × N², where n_A is the number of used angles, the right-hand side b, and the phantom head reshaped as a vector x. The figure below illustrates the phantom head for N = 100. In case the default values are used, the function also returns the used angles theta, the number of used rays for each angle p, and the used width of the rays w.

Algorithm:
The element a_ij is defined as the length of the i'th ray through the j'th cell, with a_ij = 0 if ray i does not go through cell j. The exact solution of the head phantom is reshaped as a vector, and the i'th element of the right-hand side b_i is
\[ b_i = \sum_{j=1}^{N^2} a_{ij} x_j, \qquad i = 1, \ldots, n_A \cdot p. \]
For further information see chapter 6.
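Since b is generated from the phantom by this sum, a generated problem can be sanity-checked directly (a quick check, not part of the package):

[A b x] = paralleltomo(50,0:5:179,150);
norm(A*x - b)    % should be zero up to rounding errors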

Examples:
Create a test problem and visualize the solution:

N = 64; theta = 0:5:179; p = 2*N;
[A b x] = paralleltomo(N,theta,p);
imagesc(reshape(x,N,N))
colormap gray, axis image off

See also:
fanbeamtomo, seismictomo.

References:
1. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, SIAM, 2001.

[Figure: Shepp-Logan phantom, N = 100.]


randkaczmarz

Purpose:
The randomized Kaczmarz iterative method.

Synopsis:
[X info] = randkaczmarz(A,b,K)
[X info] = randkaczmarz(A,b,K,x0)
[X info] = randkaczmarz(A,b,K,x0,options)

Algorithm:
For arbitrary x^0 ∈ R^n the algorithm for randkaczmarz takes the following form:
\[ x^{k+1} = x^k + \lambda \, \frac{b_{r(i)} - \langle a^{r(i)}, x^k \rangle}{\|a^{r(i)}\|_2^2}\, a^{r(i)}, \]
where r(i) is chosen from the set {1, ..., m} randomly with probability proportional to ‖a^{r(i)}‖₂².
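The weighted row selection can be illustrated in a few lines of MATLAB; a sketch with assumed variables A, b, x and lambda (not the package code):

w = full(sum(A.^2,2));          % squared row norms of A
P = cumsum(w)/sum(w);           % cumulative distribution over the rows
r = find(rand <= P, 1);         % draw r(i) with probability proportional to w(r)
x = x + lambda*(b(r) - A(r,:)*x)/w(r) * A(r,:)';   % Kaczmarz step on row r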

Description:
The function implements the Randomized Kaczmarz iterative method for solving the linear system Ax = b. The starting vector is x0; if no starting vector is given then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options as a constant. As default lambda is set to 1, since this corresponds to the original method.

The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations and 2 denotes that the DP-rule stopped the iterations.

Use of options:
The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant.
- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k* is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index is determined according to the discrepancy principle (DP) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule type DP.

Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 10 randkaczmarz iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = randkaczmarz(A,b,1:10);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
kaczmarz, symkaczmarz.

References:
1. T. Strohmer and R. Vershynin, A randomized solver for linear systems with exponential convergence, Lecture Notes in Computer Science 4110 (2006), p. 499-507.


sart

Purpose:
The Simultaneous Algebraic Reconstruction Technique (SART) iterative method.

Synopsis:
[X info restart] = sart(A,b,K)
[X info restart] = sart(A,b,K,x0)
[X info restart] = sart(A,b,K,x0,options)

Algorithm:
For arbitrary x^0 ∈ R^n the algorithm for sart takes the following form:
\[ x^{k+1} = x^k + \lambda_k V^{-1} A^T W (b - A x^k), \]
where \( V = \operatorname{diag}\big(\sum_{i=1}^{m} a_{ij}\big) \) for j = 1, ..., n and \( W = \operatorname{diag}\big(1 \big/ \sum_{j=1}^{n} a_{ij}\big) \) for i = 1, ..., m.

Description:

The function implements the SART (Simultaneous Algebraic Reconstruction Technique) iterative method for solving the linear system Ax = b. The starting vector is x0; if no starting vector is given then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 1/ρ̂, where ρ̂ is an estimate of the spectral radius of V⁻¹AᵀWA.
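The diagonals of W and V⁻¹ (returned in the restart fields M and T, see below) can be formed directly from the row and column sums of A. A minimal sketch, assuming all a_ij ≥ 0 and no zero rows or columns, with x, b and lambda given (not the package code):

Wd = 1./full(sum(A,2));         % diagonal of W: inverse row sums
Td = 1./full(sum(A,1))';        % diagonal of V^{-1}: inverse column sums
x = x + lambda * Td .* (A'*(Wd.*(b - A*x)));   % one SART step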

The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, 2 denotes that the DP-rule stopped the iterations and 3 denotes that the ME-rule stopped the iterations. The second element in info is the number of used iterations.

The struct restart, which can be given as output, contains in the field s1 the estimated largest singular value. restart also returns a vector containing the diagonal of the matrix W in the field M and the diagonal of the matrix V⁻¹ in the field T. The struct restart can also be given as input in the struct options, such that the program does not have to recompute the contained values. We recommend using this only if the user has good knowledge of MATLAB and is completely sure of the use of restart as input.

Use of options:
The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'psi1', where the method psi1 computes the values for λk using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi1mod', where the method psi1mod computes the values for λk using the modified Ψ1-based relaxation (4.16) with τ1 = 2 from section 4.2. The parameter τ1 can be changed in line 325 in the code.
  - options.lambda = 'psi2', where the method psi2 computes the values for λk using the Ψ2-based relaxation (4.15) from section 4.2.
  - options.lambda = 'psi2mod', where the method psi2mod computes the values for λk using the modified Ψ2-based relaxation (4.17) with τ2 = 1.5 from section 4.2. The parameter τ2 can be changed in line 374 in the code.
- options.restart
  - options.restart.M = a vector containing the diagonal of W.
  - options.restart.T = a vector containing the diagonal of V⁻¹.
  - options.restart.s1 = σ̂1, where σ̂1 = √ρ̂.
- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k* is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index is determined according to the discrepancy principle (DP) described in section 5.1.
    - options.stoprule.type = 'ME', where the stopping index is determined according to the monotone error rule (ME) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule types DP and ME.

Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 50 sart iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = sart(A,b,1:50);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
cav, cimminoProj, cimminoRefl, drop, landweber.


References:
1. A. H. Andersen and A. C. Kak, Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm, Ultrasonic Imaging, 6 (1984), p. 81-94.


seismictomo

Purpose:
Creates a two-dimensional seismic tomography test problem.

Synopsis:
[A b x s p] = seismictomo(N)
[A b x s p] = seismictomo(N,s)
[A b x s p] = seismictomo(N,s,p)
[A b x s p] = seismictomo(N,s,p,isDisp)

Description:

This function creates a two-dimensional seismic tomography test problem. A two-dimensional domain illustrating a cross section of the subsurface is divided into N equally spaced intervals in both dimensions, creating N² cells. On the right boundary s sources are located, and each source transmits waves to the p seismographs or receivers, which are scattered on the surface and on the left boundary. As default N sources and 2N receivers are chosen. If the input isDisp is different from 0 then the function also creates an illustration of the problem with the used sources and rays etc. As default isDisp is 0.

The function returns a coefficient matrix A with the dimensions p·s × N², the right-hand side b, and a created phantom of a subsurface reshaped as the vector x. The figure below illustrates the subsurface created when N = 100. In case the default values are used, the function also returns the used number of sources s and the used number of receivers p.

[Figure: Seismic phantom, N = 100.]


Algorithm:
The element a_ij is defined as the length of the i'th ray through the j'th cell, with a_ij = 0 if ray i does not go through cell j. The exact solution of the subsurface phantom is reshaped as a vector, and the i'th element of the right-hand side b_i is
\[ b_i = \sum_{j=1}^{N^2} a_{ij} x_j, \qquad i = 1, \ldots, s \cdot p. \]
For further information see chapter 6.

Examples:
Create a test problem and visualize the solution:

N = 100; s = N; p = 2*N;
[A b x] = seismictomo(N,s,p);
imagesc(reshape(x,N,N))
colormap gray, axis image off

See also:
fanbeamtomo, paralleltomo.

References:
1. See chapter 6.


symkaczmarz

Purpose:
The symmetric Kaczmarz iterative method.

Synopsis:
[X info] = symkaczmarz(A,b,K)
[X info] = symkaczmarz(A,b,K,x0)
[X info] = symkaczmarz(A,b,K,x0,options)

Algorithm:
For arbitrary x^0 ∈ R^n the algorithm for symkaczmarz takes the following form:
\[
x^{k,0} = x^k, \qquad
x^{k,i} = x^{k,i-1} + \lambda_k \, \frac{b_i - \langle a^i, x^{k,i-1} \rangle}{\|a^i\|_2^2}\, a^i,
\quad i = 1, \ldots, m-1,\, m,\, m-1, \ldots, 1, \qquad
x^{k+1} = x^{k,1}.
\]

Description:

The function implements the symmetric Kaczmarz iterative method for solving the linear system Ax = b. The starting vector is x0; if no vector is given then x0 = 0 is used.

The numbers given in the vector K are iteration numbers that specify which iterations are stored in the output matrix X. If a stopping rule is selected (see below) and K = [ ], then X contains the last iterate only.

The maximum number of iterations is determined either by the maximum number in the vector K or by the stopping rule specified in the field stoprule in the struct options. If K is empty a stopping rule must be specified.

The relaxation parameter is given in the field lambda in the struct options, either as a constant or as a string that determines the method to compute lambda. As default lambda is set to 0.25.
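One symmetric sweep can be sketched as follows, assuming A, b, x and a constant lambda are given (illustrative; the package function additionally handles storage of iterates and stopping rules):

m = size(A,1);
w = full(sum(A.^2,2));            % squared row norms
for i = [1:m, m-1:-1:1]           % forward sweep, then backward sweep
    x = x + lambda*(b(i) - A(i,:)*x)/w(i) * A(i,:)';
end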


The second output info is a vector with two elements. The first element is an indicator that denotes why the iterations were stopped. The number 0 denotes that the iterations were stopped because the maximum number of iterations was reached, 1 denotes that the NCP-rule stopped the iterations, and 2 denotes that the DP-rule stopped the iterations.

Use of options:
The following fields in options are used in this function:

- options.lambda:
  - options.lambda = c, where c is a constant satisfying 0 ≤ c ≤ 2. A warning is given if this requirement is estimated to be violated.
  - options.lambda = 'psi1', where the method psi1 computes the values for λk using the Ψ1-based relaxation (4.12) from section 4.2.
  - options.lambda = 'psi2', where the method psi2 computes the values for λk using the Ψ2-based relaxation (4.15) from section 4.2.
- options.stoprule
  - options.stoprule.type
    - options.stoprule.type = 'none', where no stopping rule is given and only the maximum number of iterations is used to stop the algorithm. This choice is default.
    - options.stoprule.type = 'NCP', where the optimal number of iterations k* is chosen according to the Normalized Cumulative Periodogram described in section 5.2.
    - options.stoprule.type = 'DP', where the stopping index is determined according to the discrepancy principle (DP) described in section 5.1.
  - options.stoprule.taudelta = τδ, where δ is the noise level and τ is user-chosen. This parameter is only needed for the stoprule type DP.

Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem, compute 10 symkaczmarz iterations and show the last iterate:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
X = symkaczmarz(A,b,1:10);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
kaczmarz, randkaczmarz.

References:
1. Å. Björck and T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations, BIT Vol. 19 issue 2 (1979), p. 145-163.


trainDPME

Purpose:
Training strategy to estimate the best parameter when the discrepancy principle or the monotone error rule is used as stopping rule.

Synopsis:
tau = trainDPME(A,b,x_exact,method,type,delta,s)
tau = trainDPME(A,b,x_exact,method,type,delta,s,options)

Description:<br />

This function implements the training strategy <strong>for</strong> estimation of the parameter<br />

τ, when using the discrepancy principle or the monotone error rule as stopping<br />

rule. From test solution x exact and the corresponding noise free right-hand<br />

side b s noisy samples are generated with noise level delta. From each sample<br />

the solutions <strong>for</strong> the given methodmethod are calculated and according to which<br />

type of stopping rule is chosen in type an estimate of tau is calculated and<br />

returned.<br />

A default maximum number of iterations is chosen <strong>for</strong> the SIRT methods to<br />

be 1000 and <strong>for</strong> the ART methods to 100. If the this is not enough it can be<br />

changed in line 74 <strong>for</strong> the SIRT methods and in line 87 <strong>for</strong> the ART methods.<br />

Algorithm:
See section 5.1.

Use of options:
The following fields in options are used in this function.

- options.lambda: See the chosen method method for the choices of this parameter.
- options.restart: Only available when method is a SIRT method. See the specific method for correct use.
- options.w: If the chosen method method allows weights this parameter can be set.

Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem. Then the parameter tau is found by training for the ME rule, and this parameter is used with ME to stop the iterations; the last iterate is shown.

[A b x] = paralleltomo(50,0:5:179,150);
delta = 0.05;
tau = trainDPME(A,b,x,@cimminoProj,'ME',delta,20);
e = randn(size(b)); e = e/norm(e);
b = b + delta*norm(b)*e;
options.stoprule.type = 'ME';
options.stoprule.taudelta = tau*delta;
[X info] = cimminoProj(A,b,200,[],options);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
cav, cimminoProj, cimminoRefl, drop, kaczmarz, landweber, randkaczmarz, sart, symkaczmarz.

References:
1. T. Elfving and T. Nikazad, Stopping rules for Landweber-type iteration, Inverse Problems, Vol. 23 (2007), p. 1417-1432.


trainLambdaART

Purpose:
Strategy to find the best constant relaxation parameter λ for a given ART method.

Synopsis:
lambda = trainLambdaART(A,b,x_exact,method)
lambda = trainLambdaART(A,b,x_exact,method,kmax)

Description:<br />

This function implements the training strategy <strong>for</strong> finding the optimal constant<br />

relaxation parameter λ <strong>for</strong> a given ART method, that solves the linear system<br />

Ax = b. The training strategy builts on a two part strategy.<br />

In the first part the resolution limit is calculated using kmax iterations of the<br />

iteration ART method given as a function handle in method. If kmax is not<br />

given or empty, the default value is 100.<br />

The first part of the strategy is to determine the resolution limit <strong>for</strong> the a specific<br />

value of λ.<br />

The second part of the stratgy is a modified version of a golden section search<br />

in which the optimal value of λ is found within the convergence interval of the<br />

specified iterative method. The method returns the optimal value in the output<br />

lambda.<br />
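To illustrate the second part, a generic golden section search over λ may look as follows. Here err is an assumed (hypothetical) function handle returning the error measure for a given λ; the package routine uses a modified variant of this scheme:

err = @(lam) someErrorMeasure(lam);  % hypothetical error function of lambda
phi = (sqrt(5)-1)/2;                 % golden ratio factor
lo = 0; hi = 2;                      % e.g. the convergence interval for ART
x1 = hi - phi*(hi-lo); f1 = err(x1);
x2 = lo + phi*(hi-lo); f2 = err(x2);
for it = 1:25                        % shrink the bracket around the minimum
    if f1 < f2
        hi = x2; x2 = x1; f2 = f1;
        x1 = hi - phi*(hi-lo); f1 = err(x1);
    else
        lo = x1; x1 = x2; f1 = f2;
        x2 = lo + phi*(hi-lo); f2 = err(x2);
    end
end
lambda = (lo + hi)/2;                % the returned relaxation parameter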

Algorithm:
See section 4.2.1.


Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem, train to find the optimal value of λ for the ART method kaczmarz, and use the found value when 10 iterations of the method are computed. Finally the last iterate is shown:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
lambda = trainLambdaART(A,b,x,@kaczmarz);
options.lambda = lambda;
X = kaczmarz(A,b,1:10,[],options);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
trainLambdaSIRT

References:
1. See section 4.2.1.


trainLambdaSIRT

Purpose:
Strategy to find the best constant relaxation parameter λ for a given SIRT method.

Synopsis:
lambda = trainLambdaSIRT(A,b,x_exact,method)
lambda = trainLambdaSIRT(A,b,x_exact,method,kmax)
lambda = trainLambdaSIRT(A,b,x_exact,method,kmax,options)

Description:<br />

This function implements the training strategy <strong>for</strong> finding the optimal constant<br />

relaxation parameter λ <strong>for</strong> a given SIRT method, that solves the linear system<br />

Ax = b. The training strategy builds on a two part strategy.<br />

In the first step the resolution limit is calculated using kmax iterations of the<br />

iteration SIRT method given as a function handle in method. If kmax is not<br />

given or empty, the default value is 1000.<br />

To determine the resolution limit the default value of λ is used together with<br />

the contents of options. See below <strong>for</strong> correct use of options.<br />

The second part of the stratgy is a modified version of a golden section search<br />

in which the optimal value of λ is found within the convergence interval of the<br />

specified iterative method. The method returns the optimal value in the output<br />

lambda.<br />

Algorithm:
See section 4.2.1.

Use of options:
The following fields in options are used in this function.

- options.restart: See the specific method for correct use.
- options.w: If the chosen method method allows weights this parameter can be set.

Examples:
Generate a "noisy" 50 × 50 parallel beam tomography problem, train to find the optimal value of λ for the SIRT method cimminoProj, and use the found value when 50 iterations of the method are computed. Finally the last iterate is shown:

[A b x] = paralleltomo(50,0:5:179,150);
e = randn(size(b)); e = e/norm(e);
b = b + 0.05*norm(b)*e;
lambda = trainLambdaSIRT(A,b,x,@cimminoProj);
options.lambda = lambda;
X = cimminoProj(A,b,1:50,[],options);
imagesc(reshape(X(:,end),50,50))
colormap gray, axis image off

See also:
trainLambdaART

References:
1. See section 4.2.1.




Chapter 9

Conclusion and Future Work

The goal of this thesis was to develop and implement a MATLAB package containing a number of iterative methods for algebraic reconstruction and to describe the methods individually, and we believe that we have completed the task successfully.

We have described the implemented methods and the corresponding theory. Furthermore, the theory behind the strategies for choosing the relaxation parameter is described, and for each of the implemented methods the relevant strategies are available. We have also discussed and implemented a few stopping rules, and we introduced three tomography test problems from parallel beam tomography, fan beam tomography and seismic tomography. Furthermore, manual pages for each function in the package have been created.

In our studies of the implemented methods and strategies we concluded that all the implemented strategies for choosing the relaxation parameter gave good results. One should be aware that each method has its own advantages and disadvantages. The training strategy, which we developed, requires knowledge of the exact solution, but at the same time keeps the relative error very small. Line search can only be used on a small selection of the SIRT methods, and for larger noise levels it shows erratic behaviour, but for small noise levels the performance is good. The last strategy, which arose from the studies of semi-convergence, has the advantage that the noise-error is dampened, which keeps the relative error small when the resolution limit is reached. The disadvantage is that, because of this damping, it requires many iterations to reach the same level for the relative error as the other strategies.

The studies of the stopping rules showed very unstable results, since the same stopping rule did not work equally well for all methods. The studies where we combined the relaxation parameter and the stopping rules confirmed the conclusion that none of the stopping rules produced a stable result. The NCP stopping rule often gave the best result, but when it did not, the result was far off.

We also compared the performance of the ART and the SIRT methods, where we included the workload. We concluded that the ART methods in general used fewer work units to obtain a result of the same quality as the SIRT methods. This poses a dilemma, since the SIRT methods are better understood, as more theory is available for these methods.

9.1 Future Work

Finally we will discuss how the work in this thesis can be extended, and how we think the performance of the methods could be improved.

An obvious way to continue the work from this thesis would be to look further into the block-iterative methods. We have only discussed a few of these, and perhaps the implementation of the block-iterative methods could lead to an overall better performance.

Another way to continue the work could be to investigate the area of preconditioning methods for the already implemented methods. It could be interesting to observe the effect of this extension.

If we should advise how future development in this field should proceed, we would advise the development of more stable stopping rules. As discussed earlier, the existing stopping rules are very unstable, and to obtain a good result without knowing the exact solution the choice of stopping index is very important. Another direction of development could be an adaptive strategy for choosing the relaxation parameter for the ART methods. This is as yet an unexplored field, and the results for the SIRT methods suggest that this could be a good idea.
be a good idea.


Appendix A

Appendix

A.1 Orthogonal Projection on a Hyperplane

When defining the orthogonal projection onto a hyperplane, we will first look at the case where the origin lies in the hyperplane H_i, and then at the general case where the origin does not necessarily lie in the hyperplane H_i. We recall from [30] that the hyperplane H_i is defined as
\[ H_i = \{ x \in \mathbb{R}^n \mid \langle a^i, x \rangle = b_i \}, \]
and the case where the origin lies in the hyperplane is the case b_i = 0.

Figure A.1 shows the case b_i = 0 ⇒ O ∈ H_i. In the figure O denotes the origin and z is the point z ∈ R^n which is projected onto the hyperplane; P_i(z) denotes the projection of z onto the hyperplane. We want to derive a relation for P_i(z) = z*. Since the projection is orthogonal, we can write P_i(z) as z minus


the orthogonal projection along a^i, which gives the following:
\[
P_i(z) = z - \|z - z^*\|_2 \,\frac{a^i}{\|a^i\|_2}
       = z - \cos\theta \,\|z\|_2 \,\frac{a^i}{\|a^i\|_2}
       = z - \frac{\langle a^i, z \rangle}{\|a^i\|_2^2}\, a^i .
\]
To obtain this result we have used that \( \cos\theta = \langle a^i, z \rangle / (\|a^i\|_2 \|z\|_2) \).

[Figure A.1: Projection onto the hyperplane H_i in the case where the origin lies in the hyperplane.]

We will now derive the orthogonal projection onto a hyperplane in the case where the origin O does not lie in the hyperplane. This case is illustrated in figure A.2, where we want to project z onto the hyperplane H_i. We introduce the vector z_0, which ends in the same point as z. However, z_0 does not start at the origin but at the intersection between the hyperplane H_i and the vector a^i, orthogonal to the hyperplane and passing through the origin. We denote this point x, and this gives us the following relation between z_0 and z:
\[ z = x + z_0 \quad\Longleftrightarrow\quad z_0 = z - x. \]
We define x = αa^i. This leads to the following:
\[ \langle a^i, x \rangle = \langle a^i, \alpha a^i \rangle = \alpha \|a^i\|_2^2 = b_i. \]


[Figure A.2: Projection onto the hyperplane H_i in the case where the origin does not lie in the hyperplane.]

From this we get that
\[ \alpha = \frac{b_i}{\|a^i\|_2^2}. \tag{A.1} \]
We can now determine the orthogonal projection onto the hyperplane for z_0 as:
\[ P_i(z_0) = z_0 - \frac{\langle a^i, z_0 \rangle}{\|a^i\|_2^2}\, a^i. \]
We then use that z_0 = z − x = z − αa^i:
\[ P_i(z_0) = z - \alpha a^i - \frac{\langle a^i, z - \alpha a^i \rangle}{\|a^i\|_2^2}\, a^i. \]
The projection of z onto the hyperplane is then P_i(z) = αa^i + P_i(z_0), and using


this and (A.1) we get:
\[
P_i(z) = \alpha a^i + P_i(z_0)
       = \alpha a^i + z - \alpha a^i - \frac{\langle a^i, z - \alpha a^i \rangle}{\|a^i\|_2^2}\, a^i
       = z - \frac{\langle a^i, z \rangle - \alpha \|a^i\|_2^2}{\|a^i\|_2^2}\, a^i
       = z + \frac{b_i - \langle a^i, z \rangle}{\|a^i\|_2^2}\, a^i .
\]
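The final formula is easy to verify numerically; the sketch below checks that P_i(z) lies on the hyperplane H_i (illustrative MATLAB with arbitrary data):

n = 5; a = randn(n,1); bi = randn; z = randn(n,1);
P = z + (bi - a'*z)/norm(a)^2 * a;   % the derived projection P_i(z)
abs(a'*P - bi)                       % ~0: P satisfies <a,P> = b_i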

A.2 Investigation of the Roots

This section contains an investigation of the polynomial
\[ g_{k-1}(y) = (2k-1)y^{k-1} - (y^{k-2} + \cdots + y + 1) = 0, \tag{A.2} \]
and a description of the most suitable approach to calculate the unique root in the interval (0, 1).

To investigate the behaviour of all the roots of the polynomial (A.2), we create a figure showing all the roots of the polynomials for k = 10, ..., 30; see figure A.3. In the figure every polynomial is specified by a specific color and a specific marker type. This means that roots only belong to the same polynomial if both the color and the marker type are the same. This figure illustrates that every polynomial has a real root in the interval [0, 1]. The rest of the roots are either a real root in the interval [−1, 0] or complex roots. The complex roots create a circle that lies inside the unit circle in the complex plane.

[Figure A.3: Illustration of the roots of the polynomials for k = 10, ..., 30.]

Since we are interested in the unique root in the interval [0, 1], we look at a zoom of these roots in figure A.4. We see that the unique roots are isolated from the other real roots, but some of the complex roots are rather close. We will now investigate whether this can cause problems when using Newton-Raphson's iterative method to find the unique root.

[Figure A.4: Zoom of the roots of the polynomials for k = 10, ..., 30 near the root 1.]


In each step, Newton-Raphson's iterative method is defined as:
\[ y_{k+1} = y_k - \frac{g(y_k)}{g'(y_k)}. \]

We see that when finding a complex root with Newton-Raphson's method, the starting guess y_0 has to be complex, since a polynomial maps the real numbers into the real numbers. Newton-Raphson's method will therefore, for our function (A.2), only find real roots if the starting guess is real. And if we further give the starting guess y_0 = 1, then we have isolated the unique root in the interval [0, 1]. In our implementation of Newton's method we always use 6 iterations, since experience has shown that 6 is a good choice.


To use Newton-Raphson's method we need to calculate the derivative of the function, but since the function is a polynomial, we can use Horner's algorithm to determine both the function value and the derivative [12].
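In MATLAB, the combined Horner evaluation of g and g' and the six Newton-Raphson steps can be sketched as follows (illustrative, not the package code):

k = 15;                       % example value of k
c = [2*k-1, -ones(1,k-1)];    % coefficients of g_{k-1}, highest power first
y = 1;                        % real starting guess isolating the root in (0,1)
for it = 1:6
    g = c(1); dg = 0;         % Horner's scheme for g(y) and g'(y)
    for j = 2:length(c)
        dg = dg*y + g;
        g  = g*y + c(j);
    end
    y = y - g/dg;             % Newton-Raphson step
end
y                             % the unique root in (0,1)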

A.3 Work Units for the SIRT and ART methods

To compare the performance of the SIRT and the ART methods we look at the work load of one iteration of each of the methods. We define a work unit WU to be one sparse matrix-vector multiplication. We let ϖ denote the average number of non-zero elements in a row. Since the SIRT methods can all be written in the same form, we find the work load for one iteration in the following way:

SIRT:
    r^k = b − A x^k             m + 2m·ϖ
    z^k = M r^k                 m
    v^k = A^T z^k               2m·ϖ
    q^k = T v^k                 n
    x^{k+1} = x^k + λ q^k       2n
    Total: (4ϖ + 2)m + 3n ≈ 2 · 2ϖ·m  ⇒  2 WU.

For Kaczmarz's method one step can be written in the following way:

Kaczmarz:
    r_i = b_i − ⟨a^i, x^{k,i−1}⟩                    2ϖ
    x^{k,i} = x^{k,i−1} + λ (r_i/‖a^i‖₂²) a^i       2ϖ
    Total: 4ϖ·m  ⇒  4 WU.

Kaczmarz's method requires 4 WU, since one iteration consists of m steps.

Since one iteration of symmetric Kaczmarz consists of 2m − 2 steps, the work units are:

sym. Kaczmarz:
    r_i = b_i − ⟨a^i, x^{k,i−1}⟩                    2ϖ
    x^{k,i} = x^{k,i−1} + λ (r_i/‖a^i‖₂²) a^i       2ϖ
    Total: 4ϖ·(2m − 2)  ⇒  8 WU.

Since the randomized Kaczmarz method has the same formula as Kaczmarz's method, except for the selection of the row, the calculation of the work load for one step is the same as for Kaczmarz's method. In the implementation of the randomized Kaczmarz method we define one iteration to be m steps. This means that for randomized Kaczmarz we have a work load of 4 WU for one iteration.
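For a concrete test problem, ϖ and the flop count behind one WU are easy to estimate (illustrative; assuming one sparse matrix-vector multiplication costs roughly 2·nnz(A) flops):

[A b x] = paralleltomo(50,0:5:179,150);
m = size(A,1);
varpi = nnz(A)/m;       % average number of non-zeros per row
flopsPerWU = 2*m*varpi; % roughly 2*nnz(A) flops per work unit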


Bibliography

[1] A. H. Andersen and A. C. Kak, Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm, Ultrasonic Imaging, 6 (1984), p. 81-94.

[2] G. Appleby and D. C. Smolarski, A linear acceleration row action method for projecting onto subspaces, Electron. Trans. Numer. Anal., 20 (2005), p. 253-275.

[3] Å. Björck and T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations, BIT Vol. 19 issue 2 (1979), p. 145-163.

[4] Y. Censor and T. Elfving, Block-iterative algorithms with diagonally scaled oblique projections for the linear feasibility problem, SIAM Vol. 24 (2002), p. 40-58.

[5] Y. Censor, T. Elfving, G. T. Herman and T. Nikazad, On diagonally relaxed orthogonal projection methods, SIAM Journal on Scientific Computing, Vol. 30 issue 1 (2007), p. 473-504.

[6] Y. Censor, D. Gordon and R. Gordon, Component averaging: An efficient iterative parallel algorithm for large sparse unstructured problems, Parallel Computing Vol. 27 issue 6 (2001), p. 777-808.

[7] Y. Censor, D. Gordon and R. Gordon, BICAV: A block-iterative parallel algorithm for sparse systems with pixel-related weighting, IEEE Transactions on Medical Imaging, Vol. 20 (2001), p. 1050-1060.

[8] G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, La Ricerca Scientifica, XVI, Series II, Anno IX, 1 (1938), p. 326-333.

[9] A. Dax, Line search acceleration of iterative methods, Linear Algebra Appl., 130 (1990), p. 43-63.

[10] A. R. De Pierro, Métodos de projeção para a resolução de sistemas gerais de equações algébricas lineares, Thesis (tese de Doutoramento), Instituto de Matemática da UFRJ, Cidade Universitária, Rio de Janeiro, Brasil, 1981.

[11] L. T. Dos Santos, A parallel subgradient projections method for the convex feasibility problem, J. Comput. Appl. Math., 18 (1987), p. 307-320.

[12] L. Eldén, L. Wittmeyer-Koch and H. B. Nielsen, Introduction to Numerical Computation, Studentlitteratur AB, 2004.

[13] T. Elfving and T. Nikazad, Some properties of ART-type reconstruction algorithms, accepted for publication in Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT).

[14] T. Elfving and T. Nikazad, Some block-iterative methods used in image reconstruction, unpublished article.

[15] T. Elfving and T. Nikazad, Stopping rules for Landweber-type iteration, Inverse Problems, Vol. 23 (2007), p. 1417-1432.

[16] T. Elfving, T. Nikazad and P. C. Hansen, Semi-convergence and relaxation parameters for a class of SIRT algorithms, submitted to ETNA.

[17] R. Gordon, R. Bender and G. T. Herman, Algebraic reconstruction techniques for 3 dimensional electron microscopy and X-ray photography, Journal of Theoretical Biology, Vol. 29 (1970), p. 471-481.

[18] D. Gordon and R. Gordon, Component-averaged row projections: A robust, block-parallel scheme for sparse linear systems, SIAM Journal on Scientific Computing, Vol. 27, No. 3, p. 1092-1117.

[20] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, 1998.

[21] P. C. Hansen, Regularization Tools version 4.0 for Matlab 7.3, Numerical Algorithms (2007), p. 189-194.

[22] P. C. Hansen, Discrete Inverse Problems: Insight and Algorithms, SIAM, 2010.

[23] U. Hämarik and U. Tautenhahn, On the monotone error rule for parameter choice in iterative and continuous regularization methods, BIT, 41 (2001), p. 1029-1038.

[24] G. N. Hounsfield, Computerized transverse axial scanning tomography: Part I, description of the system, Br. J. Radiol., 46 (1973), p. 1016-1022.

[25] M. Jiang and G. Wang, Convergence studies on iterative algorithms for image reconstruction, IEEE Transactions on Medical Imaging, 22 (2003), p. 569-579.

[26] M. Jiang and G. Wang, Convergence of the Simultaneous Algebraic Reconstruction Technique (SART), IEEE Transactions on Image Processing, Vol. 12 (2003), p. 957-961.

[27] S. Kaczmarz, Angenäherte auflösung von systemen linearer gleichungen, Bulletin de l'Académie Polonaise des Sciences et Lettres, A35 (1937), p. 355-357.

[28] A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, SIAM, 2001.

[29] L. Landweber, An iteration formula for Fredholm integral equations of the first kind, American Journal of Mathematics, Vol. 73 (1951), p. 615-624.

[30] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000.

[31] F. Natterer, The Mathematics of Computerized Tomography, SIAM, 2001.

[32] F. Natterer and F. Wübbeling, Mathematical Methods in Image Reconstruction, SIAM, 2001.

[33] T. S. Pan, Acceleration and filtering in the Generalized Landweber iteration using a variable shaping matrix, IEEE Transactions on Medical Imaging, Vol. 12 (1993), p. 278-286.

[34] C. Popa, Extensions of block-projections methods with relaxation parameters to inconsistent and rank-deficient least-squares problems, BIT 38 (1998), p. 151-176.

[35] G. Qu, C. Wang and M. Jiang, Necessary and sufficient convergence conditions for algebraic image reconstruction algorithms, IEEE Transactions on Image Processing, Vol. 18 issue 2 (2009), p. 435-440.

[36] T. Strohmer and R. Vershynin, A randomized solver for linear systems with exponential convergence, Lecture Notes in Computer Science 4110 (2006), p. 499-507.

[37] P. Toft, The Radon Transform, Theory and Implementation, unpublished dissertation, p. 199-201.

[38] C. F. Van Loan, Introduction to Scientific Computing - A Matrix-Vector Approach Using MATLAB, Pearson Higher Education, 1996.
