
System Identification and Optimization Methods Based on Derivatives

Chapters 5 & 6 from Jang


Neuro-Fuzzy and Soft Computing

[Figure: soft computing spans a model space (adaptive networks, neural networks, fuzzy inference systems) and an approach space (derivative-free and derivative-based optimization).]


Outline

• System Identification
• Least-Squares Optimization
• Optimization Based on Differentiation
  • First-order differentiation – steepest descent
  • Second-order differentiation


System Identification: Introduction

Goal
• Determine a mathematical model for an unknown system (or target system) by observing its input-output data pairs


System Identification: Introduction

Purpose:
• To predict a system's behavior, as in time-series prediction and weather forecasting
• To explain the interactions and relationships between the inputs and outputs of a system
• To design a controller based on the model of a system, as in aircraft or ship control
• To simulate the system under control once the model is known


System Identification: Introduction

There are two main steps involved:
• Structure identification
• Parameter identification


System Identification: Introduction

• Structure identification
Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is to be conducted. This class of models is denoted by a function y = f(u, θ), where:
  y is the model output,
  u is the input vector,
  θ is the parameter vector,
  f depends on the problem at hand, on the designer's experience, and on the laws of nature governing the target system.


System Identification: Introduction

• Parameter identification
The structure of the model is known; however, we need to apply optimization techniques to determine the parameter vector θ̂ such that the resulting model ŷ = f(u, θ̂) describes the system appropriately:

  y_i − ŷ_i → 0   for each training pair, where y_i is the desired output assigned to the input u_i
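To make this fitting objective concrete, here is a minimal sketch; the particular model form and the quadratic error measure below are illustrative choices of mine, not prescribed by the slide:

import numpy as np

def f(u, theta):
    # Hypothetical model class y = f(u, theta); here simply linear in the parameters.
    return theta[0] + theta[1] * u

def sse(theta, u_data, y_data):
    # Sum of squared errors between observed outputs y_i and model outputs f(u_i, theta).
    return np.sum((y_data - f(u_data, theta)) ** 2)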




System Identification: Introduction

The data set composed of m desired input-output pairs (u_i; y_i), i = 1, …, m, is called the training data.

System identification performs both structure and parameter identification repeatedly until a satisfactory model is found. It proceeds as follows:
1) Specify and parameterize a class of mathematical models representing the system to be identified
2) Perform parameter identification to choose the parameters that best fit the training data set
3) Conduct a validation test to see whether the identified model responds correctly to an unseen data set
4) Terminate the procedure once the results of the validation test are satisfactory; otherwise, select another class of models and repeat steps 2 to 4
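The four steps might be sketched as a loop; fit_parameters and validate below are hypothetical helpers standing in for steps 2 and 3:

def identify(model_classes, train_data, val_data, fit_parameters, validate):
    # Try successively chosen model classes until validation is satisfactory (steps 1-4).
    for model_class in model_classes:                       # step 1: pick a parameterized class
        theta = fit_parameters(model_class, train_data)     # step 2: parameter identification
        if validate(model_class, theta, val_data):          # step 3: validation test
            return model_class, theta                       # step 4: terminate when satisfactory
    raise RuntimeError("no model class passed validation")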


Least-Squares Estimators

General form:

  y = θ_1 f_1(u) + θ_2 f_2(u) + … + θ_n f_n(u)    (*)

where:
• u = (u_1, …, u_p)^T is the model input vector
• f_1, …, f_n are known functions of u
• θ_1, …, θ_n are unknown parameters to be estimated


Least-Squares Estimators

The task of fitting data using a linear model is referred to as linear regression.

We collect a training data set {(u_i; y_i), i = 1, …, m}. Equation (*) then becomes the system

$$
\begin{cases}
f_1(u_1)\,\theta_1 + f_2(u_1)\,\theta_2 + \cdots + f_n(u_1)\,\theta_n = y_1 \\
f_1(u_2)\,\theta_1 + f_2(u_2)\,\theta_2 + \cdots + f_n(u_2)\,\theta_n = y_2 \\
\qquad\vdots \\
f_1(u_m)\,\theta_1 + f_2(u_m)\,\theta_2 + \cdots + f_n(u_m)\,\theta_n = y_m
\end{cases}
$$

which is equivalent to A θ = y.
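For illustration, the design matrix A can be assembled row by row from the known basis functions f_1, …, f_n; a small sketch with example basis functions of my own choosing:

import numpy as np

def design_matrix(basis_functions, u_data):
    # Row i holds [f_1(u_i), ..., f_n(u_i)], so that A @ theta stacks the model outputs.
    return np.array([[f(u) for f in basis_functions] for u in u_data])

# Example with n = 3 basis functions of a scalar input u:
basis = [lambda u: 1.0, lambda u: u, lambda u: np.sin(u)]
A = design_matrix(basis, u_data=[0.5, 1.0, 2.0, 3.5])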


Least-Squares Estimators

where A is the m×n matrix

$$
A = \begin{bmatrix}
f_1(u_1) & \cdots & f_n(u_1) \\
\vdots   &        & \vdots   \\
f_1(u_m) & \cdots & f_n(u_m)
\end{bmatrix},
$$

θ is the n×1 unknown parameter vector and y is the m×1 output vector,

$$
\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_n \end{bmatrix},
\qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix},
$$

and the i-th row of A is a_i^T = [ f_1(u_i), …, f_n(u_i) ].

When m = n and A is nonsingular, A θ = y has the exact solution θ = A⁻¹ y.
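A tiny illustration of this square, nonsingular case (the numbers are made up):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # square and nonsingular
y = np.array([5.0, 6.0])
theta = np.linalg.solve(A, y)     # exact solution; preferable to forming A^{-1} explicitly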


Least-Squares Estimators

We have m outputs and n fitting parameters to find (i.e., m equations in n unknowns).

Usually m is greater than n; since the model is only an approximation of the target system and the observed data may be corrupted by noise, an exact solution is not always possible.

To overcome this inherent conceptual problem, an error vector e is added to compensate:

  A θ + e = y


Least-Squares Estimators

The goal now consists of finding θ̂ that reduces the error between y_i and ŷ_i = a_i^T θ̂ for every i, or, equivalently, that minimizes the total squared error

$$
\min_{\theta} \; \sum_{i=1}^{m} \left( y_i - a_i^T \theta \right)^2 .
$$


Least-Squares Estimators

If e = y − A θ, then

$$
E(\theta) = \sum_{i=1}^{m} \left( y_i - a_i^T \theta \right)^2
          = e^T e
          = (y - A\theta)^T (y - A\theta).
$$

We need to compute

$$
\min_{\theta} \; (y - A\theta)^T (y - A\theta).
$$
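As a bridging step (not written out on the slides), setting the gradient of E(θ) to zero yields the normal equation used in the theorem that follows:

$$
\nabla_\theta E(\theta) = -2\,A^T (y - A\theta) = 0
\quad\Longrightarrow\quad
A^T A\,\hat{\theta} = A^T y .
$$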


Least-Squares Estimators

Theorem [least-squares estimator]
The squared error is minimized when θ = θ̂ (called the least-squares estimator, LSE) satisfies the normal equation

$$
A^T A\,\hat{\theta} = A^T y .
$$

If AᵀA is nonsingular, θ̂ is unique and is given by

$$
\hat{\theta} = (A^T A)^{-1} A^T y .
$$
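A minimal numerical sketch of this estimator (the function name is mine); in practice numpy.linalg.lstsq is usually preferred over forming (AᵀA)⁻¹ explicitly:

import numpy as np

def lse(A, y):
    # Least-squares estimate: solve the normal equations A^T A theta = A^T y.
    # Assumes A^T A is nonsingular.
    return np.linalg.solve(A.T @ A, A.T @ y)

# Numerically safer equivalent:
# theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)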


Least-Squares Estimators: Example

Example
The relationship between the spring length and the applied force is L = k_1 f + k_0 (linear model).

• Goal: find θ̂ = (k_0, k_1)^T that best fits the data, so that for a given force f_0 we can determine the corresponding spring length L_0.

- Solution 1: take two pairs (L_0, f_0) and (L_1, f_1) and solve a linear system of 2 equations in the 2 unknowns k_0 and k_1 to obtain k̂_0 and k̂_1. However, because the data are noisy, this solution is not reliable.

- Solution 2: use a larger training set (L_i, f_i).


Least-Squares Estimators: Example

Since A θ + e = y, we can write

$$
\underbrace{\begin{bmatrix}
1 & 1.1 \\
1 & 1.9 \\
1 & 3.2 \\
1 & 4.4 \\
1 & 5.9 \\
1 & 7.4 \\
1 & 9.2
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} k_0 \\ k_1 \end{bmatrix}}_{\theta}
+
\underbrace{\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \end{bmatrix}}_{e}
=
\underbrace{\begin{bmatrix}
1.5 \\ 2.1 \\ 2.5 \\ 3.3 \\ 4.1 \\ 4.6 \\ 5.0
\end{bmatrix}}_{y}
$$


Least-Squares Estimators: Example

Therefore, the LSE of [k_0, k_1]^T which minimizes

$$
e^T e = \sum_{i=1}^{7} e_i^2
$$

is equal to

$$
\begin{bmatrix} \hat{k}_0 \\ \hat{k}_1 \end{bmatrix}
= (A^T A)^{-1} A^T y
= \begin{bmatrix} 1.20 \\ 0.44 \end{bmatrix}.
$$
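A quick numerical check of this result (the variable names are mine):

import numpy as np

f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])   # applied forces
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])   # measured spring lengths

A = np.column_stack([np.ones_like(f), f])            # design matrix [1, f]
k_hat = np.linalg.solve(A.T @ A, A.T @ L)            # normal-equation solution
print(k_hat)                                          # approximately [1.20, 0.44]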


Least-Squares Estimators: Example

We rely on this estimate because we have more data.

If we are not happy with the LSE, we can increase the model's degrees of freedom:

  L = k_0 + k_1 f + k_2 f² + … + k_n fⁿ    (least-squares polynomial)
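The same least-squares machinery applies, since each power of f is just another known basis function; a sketch using the spring data above (numpy.polyfit returns coefficients from the highest power down):

import numpy as np

f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

coeffs = np.polyfit(f, L, deg=4)       # 4th-order least-squares polynomial
L_fit = np.polyval(coeffs, f)          # fitted lengths at the training forces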


Least-Squares Estimators: Example

Higher-order models fit the data better, but they do not always reflect the inner law that governs the system.

For example, when f increases toward 10 N, the fitted length decreases.




Derivative-Based Optimization

Based on first derivatives:
• Steepest descent
• Conjugate gradient method
• Gauss-Newton method
• Levenberg-Marquardt method
• And many others

Based on second derivatives:
• Newton method
• And many others


Steepest Descent

We work in an n-dimensional parameter space θ = [θ_1, θ_2, …, θ_n]^T and want to find

$$
\theta^{*} \quad\text{such that}\quad E(\theta^{*}) = \min_{\theta} E(\theta).
$$

We explore the search space iteratively,

$$
\theta_{k+1} = \theta_k + \eta_k d_k, \qquad k = 1, 2, 3, \ldots
$$

such that

$$
E(\theta_{k+1}) = E(\theta_k + \eta_k d_k) < E(\theta_k),
$$

where d_k is the search direction and η_k > 0 the step size.


Steepest Descent

If the direction d_k is determined from the gradient of E(θ), i.e.,

$$
g(\theta) = \nabla E(\theta) \overset{\text{def}}{=}
\left[ \frac{\partial E(\theta)}{\partial \theta_1},
       \frac{\partial E(\theta)}{\partial \theta_2},
       \ldots,
       \frac{\partial E(\theta)}{\partial \theta_n} \right]^T ,
$$

so that

$$
\theta_{k+1} = \theta_k - \eta\, g(\theta_k),
$$

then we speak of the method of steepest descent.
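A minimal sketch of steepest descent applied to the squared-error objective E(θ) = (y − Aθ)ᵀ(y − Aθ) from the least-squares slides; the fixed step size and iteration count are illustrative choices of mine, not prescribed here:

import numpy as np

def steepest_descent(A, y, eta=1e-3, iters=5000):
    theta = np.zeros(A.shape[1])                 # initial guess
    for _ in range(iters):
        g = -2.0 * A.T @ (y - A @ theta)         # gradient of (y - A theta)^T (y - A theta)
        theta = theta - eta * g                  # step in the negative gradient direction
    return theta

f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])
A = np.column_stack([np.ones_like(f), f])
print(steepest_descent(A, L))                    # approaches the LSE, roughly [1.20, 0.44]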
