
Application of an Adaptive Differential Evolution Algorithm ... - Koszalin

SLOWIK: APPLICATION OF ADAPTIVE DIFFERENTIAL EVOLUTION ALGORITHM WITH MULTIPLE TRIAL VECTORS 3161

The arrangement of this paper is as follows: in Section II, gradient training methods are presented; in Section III, the DE algorithm and its application to ANN training are shown; in Section IV, the properties of the proposed DE-ANNT+ method are described; in Section V, the structure of the assumed ANN and neuron model is presented; in Section VI, the experiments are described; in Section VII, some conclusions are presented; and in the appendix, an example of the DE-ANNT+ method in operation is described in detail.

II. GRADIENT TRAINING METHODS

To use an ANN for any problem solving, it is first necessary to train the network. Training consists of an adaptation of the free network parameters, that is, of the proper choice of neural weight values [21], [22]. Specialized gradient learning algorithms are used for the adaptation of these weight values. Among these algorithms, the most popular are the error backpropagation (EBP) method [22] and the LM algorithm [20].

The EBP algorithm is based on the gradient method and permits efficient neural network training for solving difficult problems, which often involve nonseparable data [16]. It is a fundamental supervised training algorithm for multilayer feedforward neural networks. Unfortunately, the EBP algorithm has several disadvantages. Those mentioned most often are the huge number of iterations required to obtain satisfactory results and the sensitivity of the error function to local minima. Its operation also depends on the value of the learning coefficient: when the chosen value is too small, the algorithm runs for a long time, but when it is too high, the algorithm can oscillate [27].

Another neural network training algorithm is the LM algorithm [20].
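The influence of the learning coefficient described above can be illustrated on a one-dimensional quadratic error surface. This is a minimal sketch for illustration only; the function f(w) = w^2 and the step sizes below are arbitrary choices, not taken from the paper:

```python
# Illustrative sketch: effect of the learning coefficient (step size)
# in plain gradient descent on f(w) = w^2, whose gradient is 2*w.
# The update w <- w - lr * 2*w = w * (1 - 2*lr) converges monotonically
# for small lr, but oscillates around the minimum when lr is large.
def gradient_descent(w0, lr, steps):
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * 2.0 * w  # gradient of w^2 is 2*w
        history.append(w)
    return history

small = gradient_descent(1.0, 0.1, 50)  # small lr: slow, monotone descent
large = gradient_descent(1.0, 0.9, 50)  # large lr: sign flips each step
```

With lr = 0.1 the iterate shrinks by a factor 0.8 per step; with lr = 0.9 the factor is −0.8, so the iterate still converges but oscillates around the minimum, and any lr above 1.0 would diverge.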
This algorithm modifies the weight values in a grouped manner, after the application of all the training vectors. It is one of the most effective training algorithms for feedforward neural networks. However, it also has some disadvantages. The main shortcomings are closely linked to the computation of the error function and to the Jacobian inversion, which requires a matrix whose dimensions equal the total number of weights in the neural network. Therefore, the memory requirement is very high [23], [24]. The algorithm is also local, and there is no guarantee of finding a global minimum of the objective function. When the algorithm converges to a local minimum, there is no way of escape, and the solution obtained is not optimal [28].

Due to the disadvantages of these methods, research on different optimization techniques dedicated to ANN training is still required in both of the described cases. Therefore, an application of the adaptive DE algorithm with multiple trial vectors to ANN training, as described in this paper, is well founded. The proposed DE-ANNT+ algorithm requires fewer iterations than the EBP algorithm and does not oscillate; it requires less memory than the LM algorithm; and it allows neurons with nondifferentiable activation functions to be used in the ANN (whereas the EBP and LM algorithms can train only neurons with differentiable activation functions).

III. BACKGROUND

A. DE Algorithm

The DE algorithm was proposed by Price and Storn [1]. It has the following advantages over the traditional genetic algorithm: it is easy to use, and it has efficient memory utilization, lower computational complexity (it scales better when handling large problems), and lower computational effort (faster convergence) [26].
DE is quite effective in nonlinear constraint optimization and is also useful for optimizing multimodal problems [25]. Its pseudocode form is as follows:

a) Create an initial population consisting of PopSize individuals
b) While (termination criterion is not satisfied) Do Begin
c)   For each ith individual in the population Begin
d)     Randomly generate three integer numbers r1, r2, r3 ∈ [1; PopSize], where r1 ≠ r2 ≠ r3 ≠ i
e)     For each jth gene in the ith individual (j ∈ [1; n]) Begin
         v_i,j = x_r1,j + F · (x_r2,j − x_r3,j)
f)       Randomly generate one real number rand_j ∈ [0; 1)
g)       If rand_j < CR Then u_i,j = v_i,j Else u_i,j = x_i,j
       End
h)     If ERR(u_i) ≤ ERR(x_i) Then take u_i into the next generation, else keep x_i
     End
   End
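The classic DE scheme above can be sketched as runnable Python. This is an illustrative minimal implementation, not the paper's DE-ANNT+ method: F and CR are fixed rather than adapted, the objective (the sphere function) is an arbitrary test problem, and the forced-crossover gene used in some DE variants is omitted for brevity:

```python
import random

def de(objective, n, pop_size=20, F=0.5, CR=0.9, generations=200):
    """Minimal classic DE sketch: mutation v = x_r1 + F*(x_r2 - x_r3),
    binomial crossover with rate CR, and selection limited to the
    parent x_i and its child u_i."""
    pop = [[random.uniform(-1.0, 1.0) for _ in range(n)]
           for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct indices, all different from i (constraint r1 != r2 != r3 != i)
            r1, r2, r3 = random.sample([k for k in range(pop_size) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(n)]
            # binomial crossover between parent x_i and mutant v
            u = [v[j] if random.random() < CR else pop[i][j] for j in range(n)]
            # one-to-one selection: child replaces parent only if not worse
            if objective(u) <= objective(pop[i]):
                pop[i] = u
    return min(pop, key=objective)

def sphere(x):
    return sum(g * g for g in x)

random.seed(42)
best = de(sphere, n=5)
```

The selection step touching only the parent and its child is the locality property the paper highlights as making DE selection fast.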

3162 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 58, NO. 8, AUGUST 2011

where F ∈ [0, 2), and r1, r2, r3, i ∈ [1, PopSize] fulfill the constraint

r1 ≠ r2 ≠ r3 ≠ i.  (2)

Fig. 1. Part of (a) an ANN, corresponding to its (b) chromosome containing the weight values; weights w_i,0 represent bias weights [15].

… the optimized function [1], [10]. Another important property of this algorithm is a local limitation of the selection operator to only two individuals (the parent (x_i) and the child (u_i)); owing to this property, the selection operator is more effective and faster [10]. Also, to accelerate the convergence of the algorithm, it is assumed that the index r1 (occurring in the algorithm pseudocode) points to the best individual in the population.

B. DE Algorithm in ANN Training

In the literature, we can find several applications of the DE algorithm to ANN training, for example [30] and [31]. In these papers, the DE algorithm was used in ANN training without adaptive selection of its control parameters; therefore, the main problem was the tuning of the algorithm parameters. This problem was overcome in [15], in which the adapted DE algorithm [11] was used in ANN training (the DE-ANNT algorithm). With DE-ANNT, the tuning of the primary DE parameters, such as F and CR, is not needed.

IV. DE-ANNT+ METHOD

The proposed DE-ANNT+ method is based on the previously elaborated DE-ANNT method [15] and operates according to the following steps.

In the first step, a population of individuals is randomly created. The number of individuals in the population is stored in the parameter PopSize. Each individual x_i consists of k genes, where k is the number of weights in the trained ANN. In Fig. 1(a), a part of an ANN with neurons from n to m is shown. Additionally, in Fig. 1(b), the coding scheme for the weights in an individual x_i connected to the neurons from Fig.
1(a) is shown.

Each jth (j ∈ [1, k]) gene of an individual x_i can take values from a determined (double-sided closed) range of variability, from min_j to max_j. In the proposed method, the values min_j = −1 and max_j = 1 are assumed.

In the second step, NT (number of trial vectors) mutated individuals (trial vectors) V_i,m (m ∈ [1, NT]) are created for each individual x_i in the population, according to the formula

V_i,m = x_r1 + F · (x_r2 − x_r3).  (1)

The indexes r2 and r3 point to individuals randomly chosen from the population. The index r1 points to the best individual in the population, i.e., the one with the lowest value of the training error function ERR(.). This function is described as follows:

ERR = 1/2 · Σ_{i=1}^{T} (Correct_i − Answer_i)²  (3)

where i is the number of the actual training vector; T is the number of all training vectors; Correct_i is the required correct answer for the ith training vector; and Answer_i is the answer generated by the neural network when the ith training vector is applied to its input. The DE-ANNT+ method minimizes the value of the objective function ERR(.).

From the created set of mutated vectors V_i,m, only one vector (individual), the one having the lowest value of the objective function ERR(.), is chosen for each individual x_i, and it is assigned as the vector v_i.

In the third step, all individuals x_i are crossed over with their mutated individuals v_i. As a result of this crossover operation, an individual u_i is created.
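The error function (3) and the best-of-NT trial-vector selection described above can be sketched as follows. This is an illustrative sketch; the helper names `training_error` and `best_trial` and the toy target values are assumptions for demonstration:

```python
def training_error(correct, answer):
    """ERR = 1/2 * sum over the T training vectors of (Correct_i - Answer_i)^2,
    as in formula (3)."""
    return 0.5 * sum((c - a) ** 2 for c, a in zip(correct, answer))

def best_trial(trials, err_of):
    """From the NT mutated vectors V_{i,m}, keep the single one with the
    lowest objective value ERR(.) as the vector v_i."""
    return min(trials, key=err_of)

# toy example: T = 3 training vectors with targets Correct_i
correct = [1.0, 0.0, 1.0]
e = training_error(correct, [0.9, 0.2, 0.7])
# 0.5 * ((0.1)^2 + (0.2)^2 + (0.3)^2) = 0.5 * 0.14 = 0.07
```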
The crossover operates as follows: for a chosen individual x_i = (x_i,1, x_i,2, x_i,3, ..., x_i,k) and individual v_i = (v_i,1, v_i,2, v_i,3, ..., v_i,k), for each gene j ∈ [1; k] of individual x_i, randomly generate a number rand_j from the range [0; 1) and use the following rule:

If rand_j < CR Then u_i,j = v_i,j
Else u_i,j = x_i,j

where CR ∈ [0; 1).

In this paper, an adaptive selection of the control parameter values F and CR is introduced (similarly as in [11]), according to the formulas

A = TheBest_i / TheBest_{i−1}  (4)
F = 2 · A · random  (5)
CR = A · random  (6)

where random is a random number with a uniform distribution in the range [0; 1); TheBest_i is the value of the objective function for the best solution in the ith generation; and TheBest_{i−1} is the value of the objective function for the best solution in the (i−1)th generation.

From (5) and (6), we can see that, in the case of stagnation (a lack of change in the best solution), the F parameter takes random values from the range [0; 2), and the CR parameter takes random values from the range [0; 1). In such a case, the search of the solution space has a more global character, and the DE algorithm may more easily "get out" of the local extremum that is causing its stagnation. However, in the case where the results obtained by the DE algorithm are improving in subsequent generations, the F parameter takes random values from the range [0; 2 · A), and the CR parameter takes random values from the range [0; A). Obviously, the
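Formulas (4)–(6) can be sketched directly; in this sketch, `random.random()` stands for the uniform draw on [0; 1), and the function name and argument names are illustrative assumptions:

```python
import random

def adaptive_parameters(best_now, best_prev):
    """Adaptive control parameters from formulas (4)-(6):
    A = TheBest_i / TheBest_{i-1}; F = 2*A*random; CR = A*random.
    Under stagnation (best_now == best_prev) A = 1, so F ranges over [0; 2)
    and CR over [0; 1); when the best solution is improving (A < 1, since
    ERR is minimized), the ranges shrink to [0; 2*A) and [0; A)."""
    A = best_now / best_prev  # sketch only: assumes best_prev != 0
    F = 2.0 * A * random.random()
    CR = A * random.random()
    return F, CR

# improving run: best error halved, so A = 0.5, F in [0, 1), CR in [0, 0.5)
F, CR = adaptive_parameters(best_now=0.5, best_prev=1.0)
# stagnation: A = 1, so the full global ranges F in [0, 2), CR in [0, 1)
F_stag, CR_stag = adaptive_parameters(best_now=0.5, best_prev=0.5)
```

The shrinking ranges make the search more local while progress is being made, and stagnation automatically re-widens them.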
