Slides for Fuzzy Sets, Ch. 2 of Neuro-Fuzzy and Soft Computing
System Identification and
Optimization Methods
Based on Derivatives
Chapters 5 & 6 from Jang
Neuro-Fuzzy and Soft Computing: Fuzzy Sets
Neuro-Fuzzy and Soft Computing
[Diagram: Soft Computing spans a model space (adaptive networks, neural networks, fuzzy inference systems) and an approach space (derivative-free and derivative-based optimization).]
Outline
• System Identification
• Least-Squares Optimization
• Optimization Based on Differentiation
  – First-order differentiation: steepest descent
  – Second-order differentiation
System Identification: Introduction
Goal:
• Determine a mathematical model for an unknown system (the target system) by observing its input-output data pairs
System Identification: Introduction
Purpose:
• To predict a system's behavior, as in time-series prediction and weather forecasting
• To explain the interactions and relationships between the inputs and outputs of a system
• To design a controller based on the model of the system, as in aircraft or ship control
• To simulate the system under control once the model is known
System Identification: Introduction
System identification involves two main steps:
• Structure identification
• Parameter identification
System Identification: Introduction
• Structure identification
Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is conducted. This class of models is denoted by a function y = f(u, θ), where:
  y is the model output
  u is the input vector
  θ is the parameter vector
f depends on the problem at hand, the designer's experience, and the laws of nature governing the target system.
System Identification: Introduction
• Parameter identification
The structure of the model is known; we apply optimization techniques to determine the parameter vector θ = θ̂ such that the resulting model

  ŷ = f(u, θ̂)

describes the system appropriately, i.e.

  yᵢ − ŷᵢ → 0,  where yᵢ is the desired output assigned to the input uᵢ
System Identification: Introduction
The data set composed of m desired input-output pairs (uᵢ, yᵢ), i = 1, …, m, is called the training data.
System identification repeats structure and parameter identification until a satisfactory model is found, as follows:
1) Specify and parameterize a class of mathematical models representing the system to be identified
2) Perform parameter identification to choose the parameters that best fit the training data set
3) Conduct validation tests to see whether the identified model responds correctly to an unseen data set
4) Terminate the procedure once the results of the validation test are satisfactory; otherwise, select another class of models and repeat steps 2 to 4
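The four-step loop above can be sketched in code. Everything below is an illustrative assumption, not from the slides: a polynomial model class, a synthetic target system, and a simple even/odd train-validation split.

```python
import numpy as np

# Step 1: assume a class of polynomial models of varying degree.
# The "unknown" target system here is a quadratic plus small noise.
rng = np.random.default_rng(0)
u = np.linspace(0, 5, 40)
y = 1.0 + 0.5 * u**2 + rng.normal(0, 0.05, u.size)

u_tr, y_tr = u[::2], y[::2]      # training data
u_va, y_va = u[1::2], y[1::2]    # unseen validation data

val_err = {}
for degree in (1, 2, 3, 4):                  # candidate model structures
    theta = np.polyfit(u_tr, y_tr, degree)   # step 2: parameter identification
    resid = y_va - np.polyval(theta, u_va)   # step 3: validation test
    val_err[degree] = np.mean(resid**2)

best = min(val_err, key=val_err.get)         # step 4: accept the best class
print(best, val_err[best])
```

The linear model (degree 1) cannot capture the quadratic target, so validation rejects it, while degrees 2 and above reach errors near the noise floor.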
Least-Squares Estimators
General form:

  y = θ₁f₁(u) + θ₂f₂(u) + … + θₙfₙ(u)   (*)

where:
• u = (u₁, …, uₚ)ᵀ is the model input vector
• f₁, …, fₙ are known functions of u
• θ₁, …, θₙ are unknown parameters to be estimated
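A minimal sketch of this general form, assuming a polynomial basis fⱼ(u) = u^(j−1); the point is that the model can be nonlinear in u while remaining linear in the parameters θⱼ.

```python
# Hypothetical basis f_j(u) = u**(j-1); the model output is a
# linear combination of the basis functions, weighted by theta.
def model(u, theta):
    # y = theta_1*f_1(u) + theta_2*f_2(u) + ... + theta_n*f_n(u)
    return sum(t * u**j for j, t in enumerate(theta))

print(model(2.0, [1.0, 0.5, 0.25]))  # 1 + 0.5*2 + 0.25*4 = 3.0
```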
Least-Squares Estimators
The task of fitting data using a linear model is referred to as linear regression.
We collect a training data set {(uᵢ, yᵢ), i = 1, …, m}, and equation (*) becomes:

  f₁(u₁)θ₁ + f₂(u₁)θ₂ + … + fₙ(u₁)θₙ = y₁
  f₁(u₂)θ₁ + f₂(u₂)θ₂ + … + fₙ(u₂)θₙ = y₂
    ⋮
  f₁(uₘ)θ₁ + f₂(uₘ)θ₂ + … + fₙ(uₘ)θₙ = yₘ

which is equivalent to the matrix equation Aθ = y.
Least-Squares Estimators
where A is the m×n matrix

  A = ⎡ f₁(u₁) ⋯ fₙ(u₁) ⎤
      ⎢   ⋮        ⋮    ⎥
      ⎣ f₁(uₘ) ⋯ fₙ(uₘ) ⎦

θ is the n×1 unknown parameter vector θ = [θ₁, …, θₙ]ᵀ, and y is the m×1 output vector y = [y₁, …, yₘ]ᵀ. The i-th row of A is

  aᵢᵀ = [f₁(uᵢ), …, fₙ(uᵢ)]

If A is square (m = n) and nonsingular, Aθ = y has the unique solution θ = A⁻¹y.
Least-Squares Estimators
We have m equations and n unknown fitting parameters.
Usually m is greater than n: the model is only an approximation of the target system, and the observed data may be corrupted by noise, so an exact solution is not always possible.
To account for this, an error vector e is introduced:

  Aθ + e = y
Least-Squares Estimators
The goal now is to find the θ = θ̂ that minimizes the error between each observed output yᵢ and the model output ŷᵢ = aᵢᵀθ:

  minimize over θ:  Σᵢ₌₁ᵐ (yᵢ − aᵢᵀθ)²
Least-Squares Estimators
If e = y − Aθ, then:

  E(θ) = Σᵢ₌₁ᵐ (yᵢ − aᵢᵀθ)² = eᵀe = (y − Aθ)ᵀ(y − Aθ)

We need to compute:

  minimize over θ:  (y − Aθ)ᵀ(y − Aθ)
Least-Squares Estimators
Theorem (least-squares estimator)
The squared error is minimized when θ = θ̂ (called the least-squares estimator, LSE) satisfies the normal equation

  AᵀAθ̂ = Aᵀy

If AᵀA is nonsingular, θ̂ is unique and is given by

  θ̂ = (AᵀA)⁻¹Aᵀy
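As a sketch, the normal equation can be solved directly with NumPy; the data set here is made up for demonstration (exactly y = 1 + 2u, so the estimate recovers the true parameters). In practice `np.linalg.lstsq` is preferred over forming AᵀA explicitly.

```python
import numpy as np

# Rows of A are [f_1(u_i), f_2(u_i)] = [1, u_i] for u = 0, 1, 2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])        # generated by y = 1 + 2u

# Solve the normal equation A^T A theta = A^T y.
theta = np.linalg.solve(A.T @ A, A.T @ y)
print(theta)  # approximately [1, 2]
```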
Least-Square Estimators: Example
Example: the relationship between spring length and applied force, modeled as L = k₁f + k₀ (linear model).
• Goal: find θ̂ = (k₀, k₁)ᵀ that best fits the data, so that for a given force f₀ we can determine the corresponding spring length L₀.
- Solution 1: take two pairs (L₀, f₀) and (L₁, f₁) and solve a linear system of 2 equations in the 2 unknowns, giving k̂₁ and k̂₀. However, because the data are noisy, this solution is not reliable.
- Solution 2: use a larger training set (Lᵢ, fᵢ).
Least-Square Estimators: Example
Since y = Aθ + e, we can write Aθ + e = y with

  A: the 7×2 matrix with rows [1, fᵢ] for f = (1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2)
  θ = [k₀, k₁]ᵀ
  e = [e₁, …, e₇]ᵀ
  y = [1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0]ᵀ
Least-Square Estimators: Example
Therefore, the LSE of [k₀, k₁]ᵀ which minimizes

  eᵀe = Σᵢ₌₁⁷ eᵢ²

is equal to:

  [k̂₀, k̂₁]ᵀ = (AᵀA)⁻¹Aᵀy = [1.20, 0.44]ᵀ
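The closed-form result can be checked numerically with the example's seven (f, L) data pairs; this is a sketch of the same (AᵀA)⁻¹Aᵀy computation, not a different method.

```python
import numpy as np

# The spring example's data: applied forces and measured lengths.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

A = np.column_stack([np.ones_like(f), f])    # 7x2 matrix, rows [1, f_i]
k = np.linalg.solve(A.T @ A, A.T @ L)        # normal-equation solution
print(k)  # close to the slide's [1.20, 0.44]
```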
Least-Square Estimators: Example
We can rely on this estimate because it is based on more data.
If the linear fit is not satisfactory, we can increase the model's degrees of freedom:

  L = k₀ + k₁f + k₂f² + … + kₙfⁿ

(a least-squares polynomial)
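A sketch of this using the example's seven (f, L) data pairs: each added degree of freedom can only lower the squared training error eᵀe, which is exactly why a good training fit need not mean a good model.

```python
import numpy as np

# Same spring data as in the example.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

def train_error(degree):
    # Fit a least-squares polynomial of the given degree and return
    # the squared residual e^T e on the training data.
    k = np.polyfit(f, L, degree)
    return np.sum((L - np.polyval(k, f))**2)

print([round(train_error(d), 4) for d in (1, 2, 4)])
```

The errors are non-increasing in the degree, since each model class contains the lower-degree ones.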
Least-Square Estimators: Example
Higher-order models fit the data better, but they do not always reflect the inner law that governs the system. For example, in the high-order fit the predicted length decreases as f increases toward 10 N, which is not plausible for a spring.
Derivative-Based Optimization
Based on first derivatives:
• Steepest descent
• Conjugate gradient method
• Gauss-Newton method
• Levenberg-Marquardt method
• and many others
Based on second derivatives:
• Newton's method
• and many others
Steepest Descent
We search an n-dimensional parameter space for

  θ = [θ₁, θ₂, …, θₙ]ᵀ

to find θ = θ* such that

  E(θ*) = min over θ of E(θ)

We explore the search space iteratively:

  θₖ₊₁ = θₖ + ηₖdₖ,  k = 1, 2, 3, …

such that

  E(θₖ₊₁) = E(θₖ + ηₖdₖ) < E(θₖ)
Steepest Descent
If the direction dₖ is determined from the gradient of E(θ), i.e.,

  g(θ) = ∇E(θ) = [∂E(θ)/∂θ₁, ∂E(θ)/∂θ₂, …, ∂E(θ)/∂θₙ]ᵀ

with dₖ = −g(θₖ), so that

  θₖ₊₁ = θₖ − η g(θₖ)

then we speak of the method of steepest descent.
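As a sketch, the spring-length LSE problem from the earlier example can also be solved iteratively with steepest descent on E(θ) = (y − Aθ)ᵀ(y − Aθ). The fixed step size η = 0.002 and iteration count are illustrative choices (η must be small enough for convergence), not values from the slides.

```python
import numpy as np

# Spring data from the least-squares example.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
y = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])
A = np.column_stack([np.ones_like(f), f])

theta = np.zeros(2)                 # initial guess theta_1
eta = 0.002                         # fixed step size (illustrative)
for _ in range(20000):
    g = -2 * A.T @ (y - A @ theta)  # gradient of E(theta) = ||y - A theta||^2
    theta = theta - eta * g         # theta_{k+1} = theta_k - eta * g(theta_k)

print(theta)  # converges to the closed-form LSE, about [1.20, 0.44]
```

Because E is quadratic with a unique minimum at the LSE, the iteration converges to the same [k̂₀, k̂₁] as the normal-equation solution.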