
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 39, No. 1, JANUARY 1983

A Generalized Newton Algorithm Using Higher-Order Derivatives

R. KALABA¹ AND A. TISHLER²

¹ Professor of Economics and Biomedical Engineering, Department of Economics, University of Southern California, Los Angeles, California.
² Visiting Associate Professor of Economics, Department of Economics, University of Southern California, Los Angeles, California; on leave from the Faculty of Management, Tel Aviv University, Tel Aviv, Israel.

Abstract. This paper presents a general optimization algorithm using first-order to rth-order derivatives to find the optimum of an r-continuously differentiable function of many variables. This algorithm collapses to the Newton-Raphson algorithm when only first- and second-order derivatives are used. The required higher-order derivatives are readily available through the table algorithm. The generalized CES production function is used as an example.

Key Words. Optimization algorithms, generalized Newton method, higher-order derivatives, table method.

1. Introduction

The choice of optimization algorithms for nonlinear functions has been the subject of numerous studies in recent years [see Dennis and Moré (Ref. 1), Dennis and Mei (Ref. 2), Goldfeld and Quandt (Ref. 3), Pierre (Ref. 4), and Belsley (Ref. 5)]. Most of the effort in that area has been directed toward developing algorithms which use only first-order analytical derivatives. The main reason for that direction in research was the difficulty of obtaining analytical expressions for higher-order derivatives (see Ref. 5). Recently, Kalaba, Rasakhoo, and Tishler (Ref. 6) and Kalaba and Tishler (Ref. 7) developed and applied the table algorithm to calculate exact higher-order derivatives of functions of many variables without explicitly using analytical expressions for those derivatives [see also Wengert (Ref. 8) and Bellman, Kagiwada, and Kalaba (Ref. 9) on this issue]. The currently available software for the table algorithm makes the calculation of derivatives up to order 3 a very simple task.



Thus, there is no technological reason to avoid the use of higher-order derivatives in optimization algorithms. It is well known that, in important applications, the Newton-Raphson method is superior, in terms of computer costs, to lower-order algorithms [see, for example, Belsley (Ref. 5) on nonlinear maximum-likelihood estimation in economics]. Thus, it is reasonable to expect that an algorithm which uses third- or higher-order derivatives could be superior to the Newton-Raphson and lower-order algorithms in various applications.

This paper presents a general optimization algorithm using first-order up to rth-order derivatives to find the optimum of an r-continuously differentiable function of many variables. This algorithm collapses to the well-known Newton-Raphson algorithm when only first-order and second-order derivatives are used. Accordingly, we call the algorithm a generalized Newton algorithm of order r, GN(r), when the highest derivatives of F used in the optimization process are of order r + 1.

This paper does not attempt to settle the question of the best choice of optimization algorithm. Rather, since higher-order derivatives are readily available through the table algorithm, we suggest a simple algorithm for optimization using higher-order derivatives.

This paper proceeds as follows. Section 2 presents and proves the main theorem. In Section 3, we develop the optimization algorithm. The generalized CES production function is used to test the optimization algorithm with second-order and third-order derivatives in Section 4. In Section 5, we discuss and develop some extensions of the optimization algorithm.

2. Main Theorem

The optimization problem is given by

min_{x ∈ R^K} F(x),      (1)

where F(x) is r-continuously differentiable.

A necessary condition for a vector x* to be a local minimum of F(x) is

y ≡ (y_1, y_2, ..., y_K)' ≡ ∂F(x*)/∂x = 0.      (2)



Let the inverse functions of (2), assumed to exist in a neighborhood of x*, be

x_1 = h_1(y_1, y_2, ..., y_K),
  ⋮
x_K = h_K(y_1, y_2, ..., y_K),      (3)

where y_i, i = 1, ..., K, stands for y_i evaluated at x.

Theorem 2.1. If h_i, i = 1, ..., K, are polynomials of degree r, the optimal x for problem (1) is given by

x* = x^0 + δ,      (4)

where

x^0 = (x_1^0, ..., x_K^0)'

is the initial value for x in a neighborhood of x*, and δ consists of the second through the (K+1)st elements of the first row of the matrix A^{-1} (if it exists), evaluated at x = x^0, where

A = [(1); (y_1), ..., (y_K); (y_1^2), (y_1 y_2), ..., (y_K^2); ...; (y_1^r), (y_1^{r-1} y_2), ..., (y_K^r)]      (5)

and the expressions in parentheses are the following column vectors:

(z) = [z; ∂z/∂x_1, ..., ∂z/∂x_K; ∂²z/∂x_1², ∂²z/∂x_1∂x_2, ..., ∂²z/∂x_K²; ...; ∂^r z/∂x_1^r, ∂^r z/∂x_1^{r-1}∂x_2, ..., ∂^r z/∂x_K^r]'.      (6)

Proof. We first prove the theorem for r = 2 and later show how to extend it to the general case.

Write h_1 as a second-degree polynomial, that is,

x_1 = h_1(y) = a_0^1 + Σ_{i=1}^{K} a_i^1 y_i + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 y_i y_j.      (7)

Next, differentiate (7) on both sides sequentially with respect to x_1, x_2, ..., x_K, to get

∂h_1(y)/∂x_1 = 1 = Σ_{i=1}^{K} a_i^1 (∂y_i/∂x_1) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂(y_i y_j)/∂x_1],

∂h_1(y)/∂x_2 = 0 = Σ_{i=1}^{K} a_i^1 (∂y_i/∂x_2) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂(y_i y_j)/∂x_2],

  ⋮

∂h_1(y)/∂x_K = 0 = Σ_{i=1}^{K} a_i^1 (∂y_i/∂x_K) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂(y_i y_j)/∂x_K].      (8)



There are K equations [first-order derivatives of h_1(y) with respect to x_1, ..., x_K] in (8). The second-order derivatives of (7) with respect to x_1, x_2, ..., x_K are given by

∂²h_1(y)/∂x_1² = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_1²) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_1²],

∂²h_1(y)/∂x_1∂x_2 = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_1∂x_2) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_1∂x_2],

  ⋮

∂²h_1(y)/∂x_1∂x_K = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_1∂x_K) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_1∂x_K],

∂²h_1(y)/∂x_2² = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_2²) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_2²],

  ⋮

∂²h_1(y)/∂x_2∂x_K = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_2∂x_K) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_2∂x_K],

  ⋮

∂²h_1(y)/∂x_K² = 0 = Σ_{i=1}^{K} a_i^1 (∂²y_i/∂x_K²) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 [∂²(y_i y_j)/∂x_K²].      (9)

There are K(K+1)/2 equations [distinct second-order derivatives of h_1(y) with respect to x_1, ..., x_K] in (9). Thus, the system of equations (7)-(9) includes 1 + K + K(K+1)/2 equations in the same number of unknowns (a_0^1; a_1^1, ..., a_K^1; b_{11}^1, ..., b_{1K}^1, b_{22}^1, b_{23}^1, ..., b_{KK}^1). Rewriting system (7)-(9) in matrix form gives

A θ_1 = d_1,      (10)

where

d_1 = (x_1^0, 1, 0, ..., 0)',

θ_1 = (a_0^1; a_1^1, ..., a_K^1; b_{11}^1, b_{12}^1, ..., b_{KK}^1)',


and

A = [ 1    y_1              ...  y_K              y_1^2               y_1 y_2               ...  y_K^2
      0    ∂y_1/∂x_1        ...  ∂y_K/∂x_1        ∂(y_1^2)/∂x_1       ∂(y_1 y_2)/∂x_1       ...  ∂(y_K^2)/∂x_1
      ⋮        ⋮                     ⋮                  ⋮                     ⋮                        ⋮
      0    ∂y_1/∂x_K        ...  ∂y_K/∂x_K        ∂(y_1^2)/∂x_K       ∂(y_1 y_2)/∂x_K       ...  ∂(y_K^2)/∂x_K
      0    ∂²y_1/∂x_1²      ...  ∂²y_K/∂x_1²      ∂²(y_1^2)/∂x_1²     ∂²(y_1 y_2)/∂x_1²     ...  ∂²(y_K^2)/∂x_1²
      0    ∂²y_1/∂x_1∂x_2   ...  ∂²y_K/∂x_1∂x_2   ∂²(y_1^2)/∂x_1∂x_2  ∂²(y_1 y_2)/∂x_1∂x_2  ...  ∂²(y_K^2)/∂x_1∂x_2
      ⋮        ⋮                     ⋮                  ⋮                     ⋮                        ⋮
      0    ∂²y_1/∂x_K²      ...  ∂²y_K/∂x_K²      ∂²(y_1^2)/∂x_K²     ∂²(y_1 y_2)/∂x_K²     ...  ∂²(y_K^2)/∂x_K² ].

If A is nonsingular, the solution for the unknown θ_1 is given by

θ_1 = A^{-1} d_1.      (11)

Repeating the derivation of Eqs. (7)-(9) for h_2, we get a similar set of equations; that is,

A θ_2 = d_2,      (12)

where

d_2 = (x_2^0, 0, 1, 0, ..., 0)',

θ_2 = (a_0^2; a_1^2, ..., a_K^2; b_{11}^2, b_{12}^2, ..., b_{KK}^2)',

and A is given in (10). Clearly, nonsingularity of A implies

θ_2 = A^{-1} d_2.      (13)

Applying the same derivation for any h_i, i = 1, ..., K, yields

A θ_i = d_i,      (14)

where

d_i = (x_i^0, 0, ..., 0, 1, 0, ..., 0)',   with i − 1 zeros preceding the 1,

θ_i = (a_0^i; a_1^i, ..., a_K^i; b_{11}^i, b_{12}^i, ..., b_{KK}^i)',

and

θ_i = A^{-1} d_i,      (15)

where A is defined in (10). Thus, (11) gives the solution for all a_i^1 and b_{ij}^1 in (7), and (15) yields the unknown parameters of any h_i, i = 1, ..., K.



Choose

x_i* = a_0^i,   i = 1, ..., K.      (16)

It follows from (7) that

Σ_{i=1}^{K} a_i^1 y_i + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 y_i y_j = 0,

or, in general [see (15)],

Σ_{i=1}^{K} a_i^n y_i + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^n y_i y_j = 0,   for n = 1, ..., K.      (17)

If A in (10) is nonsingular, the K equations (17) hold if and only if

y_1 = y_2 = ... = y_K = 0.

From (2),

y = 0 at x = x*;

hence,

x* = (a_0^1, a_0^2, ..., a_0^K)'.

This implies that

x_i* = a_0^i

is the first component of the vector θ_i in (14). This component is the product of the first row of A^{-1} in (10) and d_i in (15); thus,

x_i* = a_0^i = (A^{-1})_{1,1} x_i^0 + (A^{-1})_{1,i+1},      (18)

where (A^{-1})_{i,j} is the ijth element of A^{-1}. The first column of A is

(1, 0, 0, ..., 0)';

this implies that

(A^{-1})_{1,1} = 1,

which gives the desired result in (4).

When r > 2, we rewrite (7) as follows:

x_1 = h_1(y) = a_0^1 + Σ_{i=1}^{K} a_i^1 y_i + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 y_i y_j + ... + Σ_{i=1}^{K} Σ_{j=i}^{K} ... Σ q_{ij...m}^1 y_i y_j ... y_m,      (19)

where the last sum runs over all monomials of degree r in y_1, ..., y_K. We compute all the first-order up to rth-order derivatives of (19) with respect to x_1, ..., x_K to get the same number of equations as there are unknowns (the a's, b's, ..., q's).

We form (10) again, where A is now given by (5). Moreover,

θ_1 = (a_0^1; a_1^1, ..., a_K^1; b_{11}^1, b_{12}^1, ..., b_{1K}^1; ...; q_{11...11}^1, q_{11...12}^1, ..., q_{KK...KK}^1)'

and

d_1 = (x_1^0, 1, 0, ..., 0)',

where the number of elements in d_1 is equal to the number of parameters in θ_1. The vector θ_1 is given by (11) and, in general, θ_i is given by (15). Again, choose

x_i* = a_0^i,   i = 1, ..., K,

which implies that (19) holds if and only if

y_1 = y_2 = ... = y_K = 0.

The rest of the proof is exactly analogous to the case r = 2.
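As a quick numerical check of Theorem 2.1, consider a minimal sketch for K = 1, r = 2, in which the inverse function h is exactly quadratic by construction, so one step of Eq. (4) should land exactly on the minimizer; all names and numbers below are purely illustrative.

```python
import numpy as np

# Choose F'(x) = y(x) so that the inverse x = h(y) = 1 + 2*y + 3*y**2 is
# exactly quadratic; the minimizer of F (where y = 0) is x* = h(0) = 1.
def y_and_derivs(x):
    y = (-1.0 + np.sqrt(1.0 + 3.0 * (x - 1.0))) / 3.0   # branch with y(1) = 0
    yp = 1.0 / (2.0 + 6.0 * y)                          # dy/dx = 1/h'(y)
    ypp = -6.0 / (2.0 + 6.0 * y) ** 3                   # d2y/dx2
    return y, yp, ypp

x0 = 6.0
y, yp, ypp = y_and_derivs(x0)

# Matrix A of Eq. (5) for K = 1, r = 2: columns (1), (y), (y^2),
# each stacked as [value; d/dx; d2/dx2] as in Eq. (6).
A = np.array([[1.0, y,   y * y],
              [0.0, yp,  2.0 * y * yp],
              [0.0, ypp, 2.0 * (yp * yp + y * ypp)]])

delta = np.linalg.inv(A)[0, 1]   # second element of the first row of A^{-1}
print(x0 + delta)                # 1.0: one step of Eq. (4) hits x* exactly
```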

In the next section, we shall use the results of Theorem 2.1, combined with the table algorithm given in Kalaba and Tishler (Ref. 7), to present a simple algorithm for solving problem (1).

3. Generalized Newton Algorithm and Table Algorithm

The theorem in Section 2 states that, if the functions h_i(y), i = 1, ..., K, are polynomials of degree r, then using Eq. (4) will yield the optimal x*, starting at an x^0 close to x*, in one step.

In general, one does not know the h_i(y), and we propose to approximate them by polynomials of degree r. A GN(r) algorithm for finding the optimal x in problem (1) is defined as follows (a sketch of the r = 2 case in code follows the list of steps):

Step a.0. Choose x^0 and form A according to Eq. (5).
Step a.i. Set i = 1, 2, 3, ...
Step a.i.1. Calculate x^i = x* according to Eq. (4). Go to Step a.i.2.
Step a.i.2. If ‖x^i − x^{i−1}‖ < ε, with ε > 0 some tolerance, go to Step a.i.3; otherwise, compute A using Eq. (5) at x = x^i and go to Step a.i.1.
Step a.i.3. x^i is a local minimum of F(x) if y(x^i) = 0 and the Hessian of F is positive definite.
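A minimal NumPy sketch of the r = 2 case, GN(2), assuming routines grad, hess, and third that return the exact first-, second-, and third-order derivatives of F (in the paper these values come from the table algorithm); the function and variable names are illustrative, not the authors' code.

```python
import numpy as np
from itertools import combinations_with_replacement

def gn2_step(x, grad, hess, third):
    """One GN(2) step, Eq. (4) with r = 2: build A of Eq. (5) from the exact
    derivatives of F at x, then read delta off the first row of A^{-1}."""
    K = len(x)
    y = grad(x)            # y_i = dF/dx_i,                    shape (K,)
    H = hess(x)            # H[i, k] = d y_i / d x_k,          shape (K, K)
    T = third(x)           # T[i, k, l] = d^2 y_i / dx_k dx_l, shape (K, K, K)
    pairs = list(combinations_with_replacement(range(K), 2))   # (i, j), i <= j

    def col(val, d1, d2):
        # One column of A, stacked as in Eq. (6): value, K first derivatives,
        # and the K(K+1)/2 distinct second derivatives.
        return np.concatenate(([val], d1, [d2[k, l] for (k, l) in pairs]))

    cols = [col(1.0, np.zeros(K), np.zeros((K, K)))]            # the (1) column
    for i in range(K):                                          # the (y_i) columns
        cols.append(col(y[i], H[i], T[i]))
    for (i, j) in pairs:                                        # the (y_i y_j) columns
        d1 = H[i] * y[j] + y[i] * H[j]
        d2 = T[i] * y[j] + np.outer(H[i], H[j]) + np.outer(H[j], H[i]) + y[i] * T[j]
        cols.append(col(y[i] * y[j], d1, d2))
    A = np.column_stack(cols)

    e1 = np.zeros(A.shape[0]); e1[0] = 1.0
    first_row = np.linalg.solve(A.T, e1)        # first row of A^{-1}
    return x + first_row[1:K + 1]               # Eq. (4): x* = x^0 + delta

def gn2(x0, grad, hess, third, tol=1e-10, max_iter=100):
    """Iterate Eq. (4) (Steps a.i.1-a.i.2) until successive iterates agree."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = gn2_step(x, grad, hess, third)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For r = 1 the same construction reduces to the ordinary Newton-Raphson step x − H^{-1}y, since the first row of A^{-1} is then (1, −(H^{-1}y)').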



Note that, if r = 1, the above algorithm is the Newton-Raphson algorithm. The GN(r) algorithm requires the values of higher-order derivatives for the computation of the matrix A in Step a.i.2. Analytical derivatives of order 2 or higher are, in general, very difficult to obtain; thus, we shall use the table algorithm to compute the exact values of these derivatives.

The precise description and properties of the table algorithm are given in Kalaba et al. (Ref. 6) and Kalaba and Tishler (Ref. 7). In this paper, we shall only briefly describe the table algorithm.

In principle, a table algorithm can be constructed to calculate the value and partial derivatives through order r of any admissible (r, n) function, defined as follows.

Definition 3.1. A function F: D → R, D ⊂ R^n, n ≥ 1, will be referred to as an admissible (r, n) function if, given any x = (x_1, ..., x_n) in D, the value F(x) can be sequentially calculated in a finite number of steps by means of the n initial conditions

s_1 = x_1, ..., s_n = x_n,      (20)

the two-variable algebraic special functions f: R² → R given by

w = u + v,   w = u − v,   w = uv,   w = u/v,   w = u^v,      (21)

and arbitrary one-variable, rth-order continuously differentiable special functions of the form

h: M → R,   M ⊂ R.      (22)

It is quite easy to observe that, if we can write a function F and its derivatives, up to order r, analytically, then F is an admissible (r, n) function.

Given any admissible (r, n) function F with domain point x, the general table algorithm for F(x) is constructed as follows:

Step 1. Form the admissible list for F(x), i.e., a list of initial conditions (20) and special functions (21) and (22) whose sequential evaluation yields F(x).

Step 2. Replace (using a calculus subroutine) the initial condition s_1 = x_1 in the admissible list for F(x) by a one-dimensional array (vector) containing the value and partial derivatives through order r of the function g: R^n → R defined by

g(x_1, ..., x_n) ≡ x_1.

Specify analogous arrays (vectors) for the functions

g(x_1, ..., x_n) ≡ x_i,   i = 2, ..., n,

to replace the remaining initial conditions

s_i = x_i,   i = 2, ..., n,

in the admissible list for F(x).

Step 3. Replace each special function (21) and (22) in the admissible list for F(x) by a calculus subroutine that outputs the value of the special function, together with all of its partial derivatives with respect to x through order r. For any two-variable special function w = f(u, v), the input to the calculus subroutine will be the previously calculated one-dimensional arrays

U ≡ (u, u_{x_1}, u_{x_2}, ...)

and

V ≡ (v, v_{x_1}, v_{x_2}, ...),

and the output will be a one-dimensional array

W ≡ (w, w_{x_1}, w_{x_2}, ...).

For any one-variable special function

z = h(y),

the input to the calculus subroutine will be the previously calculated one-dimensional array

Y ≡ (y, y_{x_1}, y_{x_2}, ...),

and the output will be a one-dimensional array

Z ≡ (z, z_{x_1}, z_{x_2}, ...).

The list of calculus subroutines resulting from these three steps is called the table algorithm for F at x. The proof that the table algorithm for F at x correctly calculates the value and all partial derivatives through order r of F at x follows by a straightforward complete induction argument.

The value of the function F and its derivatives (up to order r) are exact, provided that the sequential evaluation given in Steps 1 to 3 above is correct. In practice, as with the use of explicit analytical derivatives, roundoff and truncation errors will be introduced.

Currently, we have a program that evaluates the derivatives up to order 3 of functions with an arbitrary number of variables.
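The following is a minimal sketch in the spirit of the table algorithm: a calculus-subroutine class that propagates a value, its gradient, and its Hessian through +, −, *, and a constant power (i.e., only the r = 2 case and only a subset of the special functions (21)); the class and method names are illustrative, not the authors' software.

```python
import numpy as np

class Table:
    """Value, gradient and Hessian of an intermediate quantity w.r.t.
    x = (x_1, ..., x_n); a sketch of the r = 2 case only."""
    def __init__(self, val, grad, hess):
        self.v = float(val)
        self.g = np.asarray(grad, dtype=float)
        self.h = np.asarray(hess, dtype=float)

    @classmethod
    def variable(cls, val, i, n):
        g = np.zeros(n); g[i] = 1.0          # array for g(x) = x_i, as in Step 2
        return cls(val, g, np.zeros((n, n)))

    def __add__(self, o):
        o = _lift(o, len(self.g))
        return Table(self.v + o.v, self.g + o.g, self.h + o.h)

    def __sub__(self, o):
        o = _lift(o, len(self.g))
        return Table(self.v - o.v, self.g - o.g, self.h - o.h)

    def __mul__(self, o):                    # product rule for value, grad, Hessian
        o = _lift(o, len(self.g))
        return Table(self.v * o.v,
                     self.g * o.v + self.v * o.g,
                     self.h * o.v + np.outer(self.g, o.g)
                     + np.outer(o.g, self.g) + self.v * o.h)

    def __pow__(self, p):                    # u**p with a constant exponent p
        return Table(self.v ** p,
                     p * self.v ** (p - 1) * self.g,
                     p * self.v ** (p - 1) * self.h
                     + p * (p - 1) * self.v ** (p - 2) * np.outer(self.g, self.g))

def _lift(c, n):
    return c if isinstance(c, Table) else Table(c, np.zeros(n), np.zeros((n, n)))

# Usage: value, gradient and Hessian of F = x1*x2 + x1**3 at (2, 3).
x1, x2 = Table.variable(2.0, 0, 2), Table.variable(3.0, 1, 2)
F = x1 * x2 + x1 ** 3
print(F.v, F.g, F.h)   # value 14, gradient [15, 2], Hessian [[12, 1], [1, 0]]
```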

It is obvious that, to construct the matrix A in (5) for r = r^0, one needs the values of all the derivatives of F(x) up to order r^0 + 1. However, note that the construction of A requires the functions and all the derivatives (up to order r) of y_1, ..., y_K, y_1^2, y_1 y_2, y_1 y_3, ..., as is evident from the matrix A in (10). These vectors are already available from the computation of F(x) using the table algorithm; thus, the construction of A in (5) or (10) is straightforward.

4. Application: CES Production Function

In this section, we apply the table algorithm to the generalized CES production function given by [see Arrow et al. (Ref. 10)]

Q = f(K, L; x) = τ[αK^β + (1 − α)L^β]^{μ/β},   0 < α < 1,      (23)

where x = (α, β, τ, μ).


Table 1. Results of estimation, T = 25, GN(1) algorithm.

Iteration
number     α          β           τ          μ          F_α              F_β               —              —
0          0.65000    -0.45000    1.45000    0.85000    349.36           29.77             0.15 x 10^4    0.11 x 10^3
1          0.6476     -0.56873    1.45135    0.81543    72.68            6.13              0.89 x 10^3    0.78 x 10^2
2          0.61477    -0.54893    1.45128    0.80779    7.22             0.46              0.70 x 10^3    0.67 x 10^2
5          0.59530    -0.45344    1.45364    0.80687    -2.42            -0.22             0.67 x 10^3    0.65 x 10^2
7          0.59985    -0.49852    1.49851    0.80021    -0.11            -0.93 x 10^-2     0.67 x 10^3    0.64 x 10^2
9          0.60000    -0.50000    1.50000    0.80000    -0.75 x 10^-6    -0.67 x 10^-7     0.67 x 10^3    0.64 x 10^2

Iteration 0 gives the initial conditions. Computations were done in double precision on an IBM 370/168 at the University of Southern California.


Table 2. Results of estimation, T = 25, GN(2) algorithm.

Iteration
number     α          β           τ          μ          F_α              F_β               —              —              —
0          0.65000    -0.45000    1.45000    0.85000    349.36           29.77             0.15 x 10^4    0.11 x 10^3    0.32 x 10^2
1          0.63532    -0.61571    1.45240    0.80926    25.96            2.00              0.76 x 10^3    0.70 x 10^2    0.19 x 10^3
2          0.59309    -0.43161    1.43542    0.81037    -0.5 x 10^-2     0.3 x 10^-2       0.68 x 10^3    0.66 x 10^2    0.19 x 10^3
3          0.59998    -0.49977    1.49977    0.80002    -0.08            -0.07             0.67 x 10^3    0.64 x 10^2    0.19 x 10^3
4          0.60000    -0.50000    1.50000    0.80000    -0.71 x 10^-6    -0.67 x 10^-7     0.67 x 10^3    0.64 x 10^2    0.19 x 10^3

See the notes of Table 1.

The values of (K_t, L_t) are i.i.d. normal variates,

(K_t, L_t)' ~ N(·, ·),   t = 1, ..., T.      (26)

The true parameters are

x* = (α, β, τ, μ) = (0.6, -0.5, 1.5, 0.8).

The Q_t's were generated according to (23), given the values of x, K_t, and L_t. The starting point for the optimization is

x^0 = (0.65, -0.45, 1.45, 0.85).
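For concreteness, a sketch of how the Monte Carlo objective might be set up: it assumes the parameterization of (23), noise-free generation of the Q_t, a least-squares criterion F, and an illustrative distribution for (K_t, L_t), since these details are not fully specified here; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)               # illustrative design, not the paper's
T = 25
Kt = rng.normal(100.0, 10.0, T)              # assumed distribution for K_t
Lt = rng.normal(60.0, 10.0, T)               # assumed distribution for L_t

def ces(K, L, x):
    """CES of Eq. (23): Q = tau * [alpha*K^beta + (1-alpha)*L^beta]^(mu/beta)."""
    a, b, tau, mu = x
    return tau * (a * K**b + (1.0 - a) * L**b) ** (mu / b)

x_true = np.array([0.6, -0.5, 1.5, 0.8])
Qt = ces(Kt, Lt, x_true)                     # Q_t generated according to (23)

def F(x):
    """A least-squares criterion, used here purely for illustration."""
    return np.sum((Qt - ces(Kt, Lt, x)) ** 2)

x0 = np.array([0.65, -0.45, 1.45, 0.85])     # starting point of Tables 1 and 2
print(F(x0), F(x_true))                      # F(x_true) = 0 for noise-free data
```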

Table 1 presents the results for the GN(1) algorithm (which is the Newton-Raphson method).

The convergence to the optimum using GN(1) is fairly fast. Note that the function F is very steep far from the optimum, as can be seen from the values of F_α and F_β in iterations 0, 1, 2. It is very flat close to the optimum (the Hessian is almost singular), but we did not encounter any convergence problems once we got close to the optimum. In other experiments, starting farther away from x* (the true value), the algorithm sometimes failed, as is to be expected, to achieve convergence.

In Table 2, we present the results of using GN(2), that is, the GN algorithm using derivatives of F up to order 3. The data and initial conditions are the same as in Table 1. The number of iterations needed for convergence is, however, much smaller than for the Newton-Raphson, GN(1), algorithm. As was the case for GN(1), when we started farther away from x*, the algorithm sometimes failed to converge.

In our example, however, over a large number of experiments, every time convergence was achieved, GN(2) was much faster than GN(1) in terms of the number of iterations. For the limited number of examples (the CES function and other functions) that we tried, it seems that the domain of attraction of GN(1) may be slightly greater than that of GN(2). We shall elaborate further on this in the next section.

5. Extensions and Discussion

Section 3 derives the generalized Newton algorithm in general form. However, one can improve this algorithm by defining it as follows (i is the iteration index):

x^{i+1} = x^i + α_i δ^i,      (27)

where δ^i is defined in (4) and α_i, a scalar, is the stepsize along the search direction. For simplicity of exposition, we chose α_i = 1, but one can use any other method to determine α_i (Ref. 5).
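One simple possibility for choosing α_i is a backtracking rule; the sketch below is an illustrative choice under that assumption, not the rule used in the paper (x and delta are assumed to be NumPy arrays).

```python
def backtracking_alpha(F, x, delta, alpha0=1.0, shrink=0.5, max_tries=20):
    """Illustrative backtracking stepsize for Eq. (27): start at alpha0 and
    halve it until the objective F decreases along the direction delta."""
    f0 = F(x)
    alpha = alpha0
    for _ in range(max_tries):
        if F(x + alpha * delta) < f0:
            return alpha
        alpha *= shrink
    return alpha
```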

Additional research is needed to determine the optimal order of derivatives to use in different problems. In other words, what is the optimal GN(r)? The advantage of using a larger r is obvious if the functions to be approximated [the h's in (3)] behave like high-degree polynomials. However, there are two possible disadvantages. First, it can be costly, in terms of computer time, to evaluate high-order derivatives of functions with many variables. The limited number of experiments that we carried out using the table algorithm to evaluate derivatives up to order 3 needed negligible additional computer time. We feel that the additional computer cost of computing derivatives up to the fourth order for functions of up to 10 variables is not substantial. However, further experimentation is needed in that area. The second disadvantage can be described using a simple example.

Let Y = F(x), where x is a scalar. Let

y = ∂F(x)/∂x = f(x),

and let

x = h(y)

be the inverse of f. Approximate h(y) as follows, using GN(2):

x = a_0 + a_1 y + a_2 y^2 + R,      (28)

where R is the remainder term. Differentiate (28) twice with respect to x to get

1 = a_1 (∂y/∂x) + a_2 [∂(y^2)/∂x] + ∂R/∂x,      (29)

0 = a_1 (∂²y/∂x²) + a_2 [∂²(y^2)/∂x²] + ∂²R/∂x².      (30)

Applying Theorem 2.1, we obtain

x^{i+1} = x^i − (y/y')[1 + (1/2)(y/y')(y''/y')],      (31)

where

y' = ∂y/∂x   and   y'' = ∂²y/∂x².
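A minimal sketch of the scalar updates: gn1_update is the ordinary Newton-Raphson step and gn2_update is Eq. (31); y, y', y'' are supplied as callables, and the names are illustrative.

```python
def gn1_update(x, y, yp):
    """Newton-Raphson, i.e., GN(1): x_{i+1} = x_i - y/y'."""
    return x - y(x) / yp(x)

def gn2_update(x, y, yp, ypp):
    """GN(2) update of Eq. (31), which adds a curvature correction."""
    r = y(x) / yp(x)
    return x - r * (1.0 + 0.5 * r * (ypp(x) / yp(x)))
```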

The problem, as can be seen in (28) and (29), lies in neglecting ∂R/∂x and ∂²R/∂x². Differentiation is a roughening operation; even if R in (28) is very small, ∂R/∂x can be large relative to 1 and ∂²R/∂x² can be large relative to 0, thus introducing large noise into the approximation process (31). We expect, however, that close to the optimum, or when F(x) is smooth but nontrivial, higher-order methods will do better. We use an example to illustrate this last remark.



Consider the circle given by

(y − β)^2 + (x − α)^2 = R^2.      (32)

Solving for y, we have

y = β ± √[R^2 − (x − α)^2].      (33)

Assume that (33) is y = f(x) [see (2)], and we have to find x such that y = 0. The maximal value of x is

x = α + R.

If we choose x^0 very close to α + R and use the negative value of the square root in Eq. (33) (to ensure convergence), we expect a higher-order method of optimization to be superior, since it gives a better approximation to y = f(x). Table 3 gives the results of the optimization for α = 1, β = 2, R = 3.

As is clearly seen in Table 3, GN(2) locates the correct root for x much faster than GN(1). This is because a second-degree polynomial approximates the half circle much better than a first-degree polynomial.
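Using the scalar updates above on the negative branch of (33) with α = 1, β = 2, R = 3 gives a sketch of how the comparison in Table 3 can be reproduced (starting point as in the table):

```python
import math

def gn2_update(x, y, yp, ypp):               # Eq. (31), as in the sketch above
    r = y(x) / yp(x)
    return x - r * (1.0 + 0.5 * r * (ypp(x) / yp(x)))

alpha, beta, R = 1.0, 2.0, 3.0
y   = lambda x: beta - math.sqrt(R**2 - (x - alpha)**2)          # negative branch of (33)
yp  = lambda x: (x - alpha) / math.sqrt(R**2 - (x - alpha)**2)   # y'
ypp = lambda x: R**2 / (R**2 - (x - alpha)**2)**1.5              # y''

x = 3.999990                                 # starting point of Table 3
for _ in range(4):
    x = gn2_update(x, y, yp, ypp)            # GN(2); GN(1) would use x - y(x)/yp(x)
print(x)                                     # approaches the root 1 + sqrt(5) = 3.23606...
```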

Finally, we would like to point out two other extensions of our algorithm.

(a) Equation (7) could include terms other than y_i and the products y_i y_j. For example, one can specify (7) as follows:

x_1 = h_1(y) = a_0^1 + Σ_{i=1}^{K} a_i^1 y_i + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 √(y_i y_j),      (34)

Table 3. Results for Example (33) using GN(1) and GN(2).

                    GN(1)                          GN(2)
Iteration
number         x           y                  x           y
0           3.999990    1.992254           3.999990    1.992254
1           3.994846    1.824223           3.333327    0.114374
2           3.887776    1.187138           3.236264    0.000220
3           3.553617    0.425503           3.236068    0.000000
4           3.291262    0.063478
5           3.237612    0.001728
6           3.236069    0.000001
7           3.236068    0.000000

Iteration 0 gives the initial conditions when x = 3.999990 and α = 1, β = 2, R = 3.



or

x_1 = h_1(y) = a_0^1 + Σ_{i=1}^{K} a_i^1 √(y_i) + Σ_{i=1}^{K} Σ_{j=i}^{K} b_{ij}^1 sin(y_i y_j).      (35)

Other possibilities are obvious. The use of the algorithm is unaffected by these specifications, which may improve the optimization process.

(b) The basic general equation for computing x^{i+1} is given by (10) as follows:

A θ_1 = d_1.      (36)

One could get x^{i+1} as the first component of θ_1, using

θ_1 = A^{-1} d_1,      (37)

which solves for all the parameters in θ_1 [see (10)]. It is apparent, however, that we need to solve only for a_0^1. Thus, our problem can be redefined as follows: compute c'z, where

A θ_1 = d_1,      (38)

with

c = (a_1^1, ..., a_K^1; b_{11}^1, b_{12}^1, ..., b_{KK}^1)',

z = (y_1, ..., y_K; y_1^2, y_1 y_2, ..., y_K^2)'.

Then,

a_0^1 = x_1 − c'z,

and there is no need to solve for the whole vector of unknowns θ_1. Solving (38) using only forward Gaussian elimination is a more efficient technique than matrix inversion.
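A small illustration of this point (random entries stand in for A and d_1 of (10), purely as an assumption; names are hypothetical): solving the system directly avoids forming A^{-1} explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
A = rng.standard_normal((dim, dim))                              # stands in for A of (10)
d1 = np.concatenate(([rng.standard_normal()], [1.0], np.zeros(dim - 2)))

theta_via_inverse = np.linalg.inv(A) @ d1    # Eq. (37): explicit inversion
theta_via_solve = np.linalg.solve(A, d1)     # Gaussian elimination, no explicit inverse

a0 = theta_via_solve[0]                      # only the first component is actually needed
print(np.allclose(theta_via_inverse, theta_via_solve), a0)
```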

References

1. DENNIS, J. E., and MORÉ, J., Quasi-Newton Methods: Motivation and Theory, SIAM Review, Vol. 19, pp. 46-89, 1977.
2. DENNIS, J. E., and MEI, H. H. W., Two New Unconstrained Optimization Algorithms Which Use Function and Gradient Values, Journal of Optimization Theory and Applications, Vol. 28, pp. 453-482, 1979.
3. GOLDFELD, S. M., and QUANDT, R. E., Nonlinear Methods in Econometrics, North-Holland, Amsterdam, Holland, 1972.
4. PIERRE, D. A., Optimization Theory with Applications, John Wiley, New York, New York, 1969.
5. BELSLEY, D. A., On the Efficient Computation of the Nonlinear Full-Information Maximum-Likelihood Estimator, Journal of Econometrics, Vol. 14, pp. 203-225, 1980.
6. KALABA, R., RASAKHOO, N., and TISHLER, A., Nonlinear Least Squares via Automatic Derivative Evaluation, Applied Mathematics and Computation (to appear).
7. KALABA, R., and TISHLER, A., Optimization in Nonlinear Models via Automatic Derivative Evaluation (to appear).
8. WENGERT, R., A Simple Automatic Derivative Evaluation Program, Communications of the Association for Computing Machinery, Vol. 7, pp. 463-464, 1964.
9. BELLMAN, R., KAGIWADA, H., and KALABA, R., Wengert's Numerical Method for Partial Derivatives, Orbit Determination, and Quasilinearization, Communications of the Association for Computing Machinery, Vol. 8, pp. 231-232, 1965.
10. ARROW, K., CHENERY, H. B., MINHAS, B., and SOLOW, R. M., Capital-Labor Substitution and Economic Efficiency, Review of Economics and Statistics, Vol. 43, pp. 228-232, 1961.
11. HENDERSON, J. M., and QUANDT, R. E., Microeconomic Theory, McGraw-Hill, New York, New York, 1980.
