Slides for Fuzzy Sets, Ch. 2 of Neuro-Fuzzy and Soft Computing
System Identification and
Optimization Methods
Based on Derivatives
Chapters 5 & 6 from Jang
Neuro-Fuzzy and Soft Computing: Fuzzy Sets
Neuro-Fuzzy and Soft Computing
[Diagram: Soft Computing spans a model space (adaptive networks, neural networks, fuzzy inference systems) and an approach space (derivative-free and derivative-based optimization).]
Outline
• System Identification
• Least-Squares Optimization
• Optimization Based on Differentiation
  – First-order differentiation: steepest descent
  – Second-order differentiation
System Identification: Introduction
Goal:
• Determine a mathematical model for an unknown system (the target system) by observing its input-output data pairs
System Identification: Introduction
Purpose:
• To predict a system's behavior, as in time-series prediction and weather forecasting
• To explain the interactions and relationships between the inputs and outputs of a system
• To design a controller based on the model of the system, as in aircraft or ship control
• To simulate the system under control once the model is known
System Identification: Introduction
System identification involves two main steps:
• Structure identification
• Parameter identification
System Identification: Introduction
• Structure identification
Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is conducted. This class of models is denoted by a function y = f(u, θ), where:
  y is the model output
  u is the input vector
  θ is the parameter vector
f depends on the problem at hand, the designer's experience, and the laws of nature governing the target system.
System Identification: Introduction
• Parameter identification
The structure of the model is known; we apply optimization techniques to determine the parameter vector θ = θ̂ such that the resulting model

  ŷ = f(u, θ̂)

describes the system appropriately, i.e.

  yᵢ − ŷᵢ → 0,  where yᵢ is the desired output assigned to the input uᵢ
System Identification: Introduction
The data set composed of m desired input-output pairs (uᵢ, yᵢ), i = 1, …, m, is called the training data.
System identification repeats structure and parameter identification until a satisfactory model is found, as follows:
1) Specify and parameterize a class of mathematical models representing the system to be identified
2) Perform parameter identification to choose the parameters that best fit the training data set
3) Conduct validation tests to see whether the identified model responds correctly to an unseen data set
4) Terminate the procedure once the results of the validation test are satisfactory; otherwise, select another class of models and repeat steps 2 to 4
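The four-step loop above can be sketched in code. Everything below is an illustrative assumption, not from the slides: a polynomial model class, a synthetic target system, and a simple even/odd train-validation split.

```python
import numpy as np

# Step 1: assume a class of polynomial models of varying degree.
# The "unknown" target system here is a quadratic plus small noise.
rng = np.random.default_rng(0)
u = np.linspace(0, 5, 40)
y = 1.0 + 0.5 * u**2 + rng.normal(0, 0.05, u.size)

u_tr, y_tr = u[::2], y[::2]      # training data
u_va, y_va = u[1::2], y[1::2]    # unseen validation data

val_err = {}
for degree in (1, 2, 3, 4):                  # candidate model structures
    theta = np.polyfit(u_tr, y_tr, degree)   # step 2: parameter identification
    resid = y_va - np.polyval(theta, u_va)   # step 3: validation test
    val_err[degree] = np.mean(resid**2)

best = min(val_err, key=val_err.get)         # step 4: accept the best class
print(best, val_err[best])
```

The linear model (degree 1) cannot capture the quadratic target, so validation rejects it, while degrees 2 and above reach errors near the noise floor.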
Least-Squares Estimators
General form:

  y = θ₁f₁(u) + θ₂f₂(u) + … + θₙfₙ(u)   (*)

where:
• u = (u₁, …, uₚ)ᵀ is the model input vector
• f₁, …, fₙ are known functions of u
• θ₁, …, θₙ are unknown parameters to be estimated
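A minimal sketch of this general form, assuming a polynomial basis fⱼ(u) = u^(j−1); the point is that the model can be nonlinear in u while remaining linear in the parameters θⱼ.

```python
# Hypothetical basis f_j(u) = u**(j-1); the model output is a
# linear combination of the basis functions, weighted by theta.
def model(u, theta):
    # y = theta_1*f_1(u) + theta_2*f_2(u) + ... + theta_n*f_n(u)
    return sum(t * u**j for j, t in enumerate(theta))

print(model(2.0, [1.0, 0.5, 0.25]))  # 1 + 0.5*2 + 0.25*4 = 3.0
```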
Least-Squares Estimators
The task of fitting data using a linear model is referred to as linear regression.
We collect a training data set {(uᵢ, yᵢ), i = 1, …, m}, and equation (*) becomes:

  f₁(u₁)θ₁ + f₂(u₁)θ₂ + … + fₙ(u₁)θₙ = y₁
  f₁(u₂)θ₁ + f₂(u₂)θ₂ + … + fₙ(u₂)θₙ = y₂
    ⋮
  f₁(uₘ)θ₁ + f₂(uₘ)θ₂ + … + fₙ(uₘ)θₙ = yₘ

which is equivalent to the matrix equation Aθ = y.
Least-Squares Estimators
where A is the m×n matrix

  A = ⎡ f₁(u₁) ⋯ fₙ(u₁) ⎤
      ⎢   ⋮        ⋮    ⎥
      ⎣ f₁(uₘ) ⋯ fₙ(uₘ) ⎦

θ is the n×1 unknown parameter vector θ = [θ₁, …, θₙ]ᵀ, and y is the m×1 output vector y = [y₁, …, yₘ]ᵀ. The i-th row of A is

  aᵢᵀ = [f₁(uᵢ), …, fₙ(uᵢ)]

If A is square (m = n) and nonsingular, Aθ = y has the unique solution θ = A⁻¹y.
Least-Squares Estimators
We have m equations and n unknown fitting parameters.
Usually m is greater than n: the model is only an approximation of the target system, and the observed data may be corrupted by noise, so an exact solution is not always possible.
To account for this, an error vector e is introduced:

  Aθ + e = y
Least-Squares Estimators
The goal now is to find the θ = θ̂ that minimizes the error between each observed output yᵢ and the model output ŷᵢ = aᵢᵀθ:

  minimize over θ:  Σᵢ₌₁ᵐ (yᵢ − aᵢᵀθ)²
Least-Squares Estimators
If e = y − Aθ, then:

  E(θ) = Σᵢ₌₁ᵐ (yᵢ − aᵢᵀθ)² = eᵀe = (y − Aθ)ᵀ(y − Aθ)

We need to compute:

  minimize over θ:  (y − Aθ)ᵀ(y − Aθ)
Least-Squares Estimators
Theorem (least-squares estimator)
The squared error is minimized when θ = θ̂ (called the least-squares estimator, LSE) satisfies the normal equation

  AᵀAθ̂ = Aᵀy

If AᵀA is nonsingular, θ̂ is unique and is given by

  θ̂ = (AᵀA)⁻¹Aᵀy
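As a sketch, the normal equation can be solved directly with NumPy; the data set here is made up for demonstration (exactly y = 1 + 2u, so the estimate recovers the true parameters). In practice `np.linalg.lstsq` is preferred over forming AᵀA explicitly.

```python
import numpy as np

# Rows of A are [f_1(u_i), f_2(u_i)] = [1, u_i] for u = 0, 1, 2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])        # generated by y = 1 + 2u

# Solve the normal equation A^T A theta = A^T y.
theta = np.linalg.solve(A.T @ A, A.T @ y)
print(theta)  # approximately [1, 2]
```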
Least-Square Estimators: Example
Example: the relationship between spring length and applied force, modeled as L = k₁f + k₀ (linear model).
• Goal: find θ̂ = (k₀, k₁)ᵀ that best fits the data, so that for a given force f₀ we can determine the corresponding spring length L₀.
- Solution 1: take two pairs (L₀, f₀) and (L₁, f₁) and solve a linear system of 2 equations in the 2 unknowns, giving k̂₁ and k̂₀. However, because the data are noisy, this solution is not reliable.
- Solution 2: use a larger training set (Lᵢ, fᵢ).
Least-Square Estimators: Example
Since y = Aθ + e, we can write Aθ + e = y with

  A: the 7×2 matrix with rows [1, fᵢ] for f = (1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2)
  θ = [k₀, k₁]ᵀ
  e = [e₁, …, e₇]ᵀ
  y = [1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0]ᵀ
Least-Square Estimators: Example
Therefore, the LSE of [k₀, k₁]ᵀ which minimizes

  eᵀe = Σᵢ₌₁⁷ eᵢ²

is equal to:

  [k̂₀, k̂₁]ᵀ = (AᵀA)⁻¹Aᵀy = [1.20, 0.44]ᵀ
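The closed-form result can be checked numerically with the example's seven (f, L) data pairs; this is a sketch of the same (AᵀA)⁻¹Aᵀy computation, not a different method.

```python
import numpy as np

# The spring example's data: applied forces and measured lengths.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

A = np.column_stack([np.ones_like(f), f])    # 7x2 matrix, rows [1, f_i]
k = np.linalg.solve(A.T @ A, A.T @ L)        # normal-equation solution
print(k)  # close to the slide's [1.20, 0.44]
```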
Least-Square Estimators: Example
We can rely on this estimate because it is based on more data.
If the linear fit is not satisfactory, we can increase the model's degrees of freedom:

  L = k₀ + k₁f + k₂f² + … + kₙfⁿ

(a least-squares polynomial)
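A sketch of this using the example's seven (f, L) data pairs: each added degree of freedom can only lower the squared training error eᵀe, which is exactly why a good training fit need not mean a good model.

```python
import numpy as np

# Same spring data as in the example.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

def train_error(degree):
    # Fit a least-squares polynomial of the given degree and return
    # the squared residual e^T e on the training data.
    k = np.polyfit(f, L, degree)
    return np.sum((L - np.polyval(k, f))**2)

print([round(train_error(d), 4) for d in (1, 2, 4)])
```

The errors are non-increasing in the degree, since each model class contains the lower-degree ones.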
Least-Square Estimators: Example
Higher-order models fit the data better, but they do not always reflect the inner law that governs the system. For example, in the high-order fit the predicted length decreases as f increases toward 10 N, which is not plausible for a spring.
Derivative-Based Optimization
Based on first derivatives:
• Steepest descent
• Conjugate gradient method
• Gauss-Newton method
• Levenberg-Marquardt method
• and many others
Based on second derivatives:
• Newton's method
• and many others
Steepest Descent
We search an n-dimensional parameter space for

  θ = [θ₁, θ₂, …, θₙ]ᵀ

to find θ = θ* such that

  E(θ*) = min over θ of E(θ)

We explore the search space iteratively:

  θₖ₊₁ = θₖ + ηₖdₖ,  k = 1, 2, 3, …

such that

  E(θₖ₊₁) = E(θₖ + ηₖdₖ) < E(θₖ)
Steepest Descent
If the direction dₖ is determined from the gradient of E(θ), i.e.,

  g(θ) = ∇E(θ) = [∂E(θ)/∂θ₁, ∂E(θ)/∂θ₂, …, ∂E(θ)/∂θₙ]ᵀ

with dₖ = −g(θₖ), so that

  θₖ₊₁ = θₖ − η g(θₖ)

then we speak of the method of steepest descent.
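As a sketch, the spring-length LSE problem from the earlier example can also be solved iteratively with steepest descent on E(θ) = (y − Aθ)ᵀ(y − Aθ). The fixed step size η = 0.002 and iteration count are illustrative choices (η must be small enough for convergence), not values from the slides.

```python
import numpy as np

# Spring data from the least-squares example.
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
y = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])
A = np.column_stack([np.ones_like(f), f])

theta = np.zeros(2)                 # initial guess theta_1
eta = 0.002                         # fixed step size (illustrative)
for _ in range(20000):
    g = -2 * A.T @ (y - A @ theta)  # gradient of E(theta) = ||y - A theta||^2
    theta = theta - eta * g         # theta_{k+1} = theta_k - eta * g(theta_k)

print(theta)  # converges to the closed-form LSE, about [1.20, 0.44]
```

Because E is quadratic with a unique minimum at the LSE, the iteration converges to the same [k̂₀, k̂₁] as the normal-equation solution.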