12.07.2015 Views

Adaptive Critic Designs - Neural Networks, IEEE ... - IEEE Xplore

Adaptive Critic Designs - Neural Networks, IEEE ... - IEEE Xplore

Adaptive Critic Designs - Neural Networks, IEEE ... - IEEE Xplore

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

PROKHOROV AND WUNSCH: ADAPTIVE CRITIC DESIGNS 1001Fig. 6. A simple network for the example of computing the second-orderderivatives @ 2 J(t)=@R(t)@W C in our GDHP design given in Fig. 4.where , , , and . Thus,based on (11), we can adapt weights in the network using thefollowing expression:(16)where the indexes and are chosen appropriately. Wealso assume that either,or since is a constantbias term.The example above can be easily generalized to largernetworks.It is clear that HDP and DHP can be readily obtainedfrom a GDHP design with the critic of Fig. 5. The simplicityand versatility of this GDHP design is very appealing, and itprompted us to a straightforward generalization of the criticof Fig. 5 for AD forms of ACD’s. Thus, we propose actiondependentGDHP (ADGDHP), to be discussed next.Fig. 7. Adaptation in ADGDHP. The critic network outputs the scalar Jand two vectors, R and A . The vector A (t +1)backpropagated throughthe action network is added to R (t +1). The vector R (t +1)propagatesback through the model, then it is is split in two vectors. One of them goestinto the square summator to be added to the vector @U(t)=@R(t) and tothe rightmost term in (18) (not shown). The second vector is added to thevector @U(t)=@R(t) in another summator. both of these summators producetwo appropriate error vectors E 2 (t), as in (19) and (20). According to (3),the right oval summator computes the error E 1 (t). Two error vectors E 2 (t)and the scalar E 1 (t) are used to train the critic network. The action networkis adap ted by the direct path A (t +1)between the critic and the actionnetworks.whereD. ADGDHPAs all AD forms of ACD’s, ADGDHP features a directconnection between the action and the critic networks. Fig. 7shows adaptation processes in ADGDHP. Although one couldutilize critics similar with those illustrated in Figs. 3 and 4, wefound ADGDHP easier to demonstrate when a critic akin toone of Fig. 5 is used. In addition, we gained versatility in thatthe design of Fig. 7 can be readily transformed into ADHDPor ADDHP.Consider training of the critic network. We can write(17)(18)and , are the numbers of outputs of the model and theaction networks, respectively.Based on (17) and (18), we obtain two error vectors,andfrom (6) as follows:(19)(20)As in GDHP, the critic network is additionally trained by thescalar error according to (3). If one applies the LMSalgorithm, it results in an update rule similar to (11).Fig. 7 also shows the direct adaptation pathbetween the action and the critic networks. We express the

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!