

where

L^T L = \begin{pmatrix} C_v^{-1} & 0 \\ 0 & C_\theta^{-1} \end{pmatrix}    (3.17)

Thus generalized Tikhonov regularization is the optimal mean square estimator if v ∼ N(0, W^{-1}) and θ ∼ N(θ^*, α^{-2}(L_2^T L_2)^{-1}).
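To see why, here is a brief sketch of the reasoning; it assumes that the generalized Tikhonov functional of (3.15), not repeated here, has the standard form ‖z − Hθ‖_W^2 + α^2 ‖L_2(θ − θ^*)‖^2. Under the stated Gaussian model the negative log-posterior is, up to an additive constant,

-2 \log p(\theta \mid z) = (z - H\theta)^T W (z - H\theta) + \alpha^2 (\theta - \theta^*)^T L_2^T L_2 (\theta - \theta^*),

which is exactly that functional. Since the posterior is Gaussian, its maximizer coincides with the posterior mean, that is, with the optimal mean square estimate.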

Sometimes we do not have any particular assumptions about the parameters θ themselves. They may not even have any physical meaning, as in the case of generic basis functions in linear least squares. More often we have beliefs about what the measurements should be. In this case we have to regularize the prediction of the model, that is, we want Hθ to have some desirable properties. The regularized solution for W = I is then

\hat{\theta}_\alpha = \arg\min_{\theta} \left\{ \|H\theta - z\|^2 + \alpha^2 \|L_2 (H\theta - H\theta^*)\|^2 \right\}    (3.18)

and the solution to this is clearly

\hat{\theta}_\alpha = (H^T H + \alpha^2 H^T L_2^T L_2 H)^{-1} (H^T z + \alpha^2 H^T L_2^T L_2 H \theta^*)    (3.19)

This is of course of the same form as (3.15), with regularization matrix L' = L_2 H. However, in the form (3.19) standard regularization matrices are directly applicable.
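As a concrete numerical illustration, the estimate (3.19) can be evaluated for example as in the following sketch; the design matrix H, the data z, the prior prediction Hθ^*, the value of α, and the choice of a second-difference matrix for L_2 are all hypothetical placeholders, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical forward model: m = 40 measurements, n = 10 parameters.
    m, n = 40, 10
    H = rng.standard_normal((m, n))
    theta_true = rng.standard_normal(n)
    z = H @ theta_true + 0.1 * rng.standard_normal(m)

    # A common choice for L2: second-difference operator of size (m-2) x m,
    # which penalizes roughness of the prediction H @ theta.
    L2 = np.zeros((m - 2, m))
    for i in range(m - 2):
        L2[i, i:i + 3] = [1.0, -2.0, 1.0]

    theta_star = np.zeros(n)   # prior guess for theta (hypothetical)
    alpha = 0.5                # regularization parameter (hypothetical)

    # Equation (3.19):
    # theta_hat = (H^T H + a^2 H^T L2^T L2 H)^{-1} (H^T z + a^2 H^T L2^T L2 H theta_star)
    A = H.T @ H + alpha**2 * H.T @ L2.T @ L2 @ H
    b = H.T @ z + alpha**2 * H.T @ L2.T @ L2 @ H @ theta_star
    theta_hat = np.linalg.solve(A, b)
    print(theta_hat)

Solving the normal equations with np.linalg.solve instead of forming the inverse explicitly gives the same estimate as (3.19) but is numerically better behaved.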

3.3 Principal component based regularization<br />

In [23] it is emphasized that the notion of regularization has its origin in approximation theory and many of the regularization methods there can be put in the form

\hat{\theta} = G H^T z    (3.20)

This estimator clearly covers the ordinary least squares solution with the choice G = (H^T H)^{-1}. The selection of G can thus be seen as an approximation of the inverse of H^T H. The form (3.20) also includes Tikhonov regularization. The name ridge regression has been used for Tikhonov regularization with L = I, and the parameter α is then called the ridge parameter. In the general ridge regression problem we have some criterion, e.g. |θ| = c for some given c, for selecting the parameter α [69].
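A minimal sketch of how both of these estimators fit the form (3.20) is given below; H, z and α are hypothetical placeholders, and the only difference between the two estimators is the matrix G.

    import numpy as np

    rng = np.random.default_rng(1)
    H = rng.standard_normal((30, 5))      # hypothetical design matrix
    z = rng.standard_normal(30)           # hypothetical measurements
    alpha = 1.0                           # hypothetical ridge parameter

    # Ordinary least squares: G = (H^T H)^{-1}
    G_ols = np.linalg.inv(H.T @ H)

    # Ridge regression, i.e. Tikhonov with L = I: G = (H^T H + alpha^2 I)^{-1}
    G_ridge = np.linalg.inv(H.T @ H + alpha**2 * np.eye(H.shape[1]))

    # Both estimators are of the form (3.20): theta_hat = G H^T z
    theta_ols = G_ols @ H.T @ z
    theta_ridge = G_ridge @ H.T @ z
    print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))

Printing the two norms shows the shrinkage controlled by the ridge parameter; in practice α would be chosen by a criterion such as |θ| = c, as noted above.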

It is possible to select

G = \sum_{j \in S} \frac{1}{\alpha_j} v_j v_j^T    (3.21)

where v_j and α_j are the eigenvectors and eigenvalues of the matrix H^T H, respectively. S is an index set S ⊂ I, where I = (1, ..., rank(H^T H)). The selection of the index set S can be based on the eigenvalues α_j or on the correlation of the eigenvectors with the data z. In the first approach the goal is to eliminate the large variances in the regression by deleting the components for which α_j is small. This approach is sometimes called traditional principal component regression [23]. In the second approach it is undesirable to delete components that have large correlations with the data z.
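The eigenvalue-based selection rule can be sketched as follows; the matrix H, the data z, the near-collinearity used to produce a small eigenvalue, and the eigenvalue threshold are hypothetical choices made only for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    H = rng.standard_normal((50, 8))            # hypothetical design matrix
    H[:, 7] = H[:, 6] + 1e-3 * rng.standard_normal(50)  # nearly collinear columns
    z = rng.standard_normal(50)                 # hypothetical measurements

    # Eigendecomposition of H^T H: columns of V are the eigenvectors v_j,
    # entries of alphas are the eigenvalues alpha_j (eigh sorts them ascending).
    alphas, V = np.linalg.eigh(H.T @ H)

    # Traditional principal component regression: drop the components with
    # small eigenvalues to avoid large variances (hypothetical threshold).
    threshold = 1e-1
    S = [j for j in range(len(alphas)) if alphas[j] > threshold]

    # Equation (3.21): G = sum_{j in S} (1/alpha_j) v_j v_j^T
    G = sum(np.outer(V[:, j], V[:, j]) / alphas[j] for j in S)

    theta_hat = G @ H.T @ z                     # estimator of the form (3.20)
    print(S, theta_hat)

The second, correlation-based rule would instead rank the components by their correlation with the data, for example by a measure such as |v_j^T H^T z|; that particular measure is an assumption here, not specified in the text.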
