A Delayed Reinforcement Learning Problem. In reinforcement learning (RL) problems, the optimal behavior of the learning system (called agent) has to be learned solely through interaction with the environment, which gives an immediate or delayed evaluation¹ $J$ (also called reward or reinforcement) [11]. The agent's behavior is defined by an associative mapping from situations to actions, $S : \mathcal{X} \mapsto \mathcal{A}$. Here, this associative mapping, which is typically called policy in the RL-related literature, is termed strategy. The optimal strategy $S^\star$ of the agent is defined as the one that maximizes the sum of positive reinforcements and minimizes the sum of negative reinforcements over time. If, given a situation $X \in \mathcal{X}$, the agent tries an action $A \in \mathcal{A}$ and the environment immediately returns an evaluation $J(X, A)$ of the $(X, A)$ pair, one has an immediate reinforcement learning problem. More difficult are delayed reinforcement learning problems, where the environment gives only a single evaluation $J$, collectively for the whole sequence of $(X, A)$ pairs occurring during the agent's operation.

From the perspective of machine learning, a spacecraft steering strategy may be defined as an associative mapping $S$ that gives – at any time along the trajectory – the current spacecraft control $u$ from some input $X$ that comprises the variables relevant for the optimal steering of the spacecraft (the current state of the relevant environment). Because the trajectory is the result of the spacecraft steering strategy, the trajectory optimization problem is actually the problem of finding the optimal spacecraft steering strategy $S^\star$. This is a delayed reinforcement problem, because a spacecraft steering strategy cannot be evaluated before its trajectory is known under the given environmental conditions (constellation of the initial and the target body, etc.) and a reward can be given according to the fulfillment of the optimization objective(s) and constraints. ANNs can be used to implement spacecraft steering strategies.

Evolutionary Neurocontrol. For the work described here, feedforward ANNs with a sigmoid neural transfer function have been used. Such an ANN can be considered as a continuous parameterized function (called network function) $N_w : X \subseteq \mathbb{R}^{n_i} \rightarrow Y \subseteq (0, 1)^{n_o}$ that maps from an $n_i$-dimensional input space $X$ onto an $n_o$-dimensional output space $Y$. The parameter set $w = \{w_1, \ldots, w_{n_w}\}$ of the network function comprises the $n_w$ internal parameters of the ANN, i.e., the weights of the neuron connections and the biases of the neurons. ANNs have already been successfully applied as neurocontrollers (NCs) for reinforcement learning problems [7]. The simplest way to apply an ANN for controlling a dynamical system is to let the ANN provide the control $u(\bar{t}) = Y(\bar{t}) \in Y$ from some input $X(\bar{t}) \in X$ that contains the relevant information for the control task. The NC's behavior is completely characterized by its network function $N_w$ (which is – for a given network topology – in turn completely characterized by its parameter set $w$). Learning algorithms that rely on a training set – like backpropagation – fail when the correct output for a given input is not known, as is the case for delayed reinforcement learning problems.
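To make the notion of a network function concrete, the following is a minimal sketch – not the implementation used here – of a single-hidden-layer feedforward ANN with sigmoid transfer functions; the class name, layer sizes, and random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid neural transfer function; squashes each output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

class FeedforwardNC:
    """Single-hidden-layer feedforward neurocontroller: a parameterized
    network function N_w mapping an n_i-dim input onto an n_o-dim output
    in (0, 1)^n_o. Layer sizes and initialization are illustrative."""

    def __init__(self, n_i, n_h, n_o, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Parameter set w: connection weights and neuron biases.
        self.W1 = rng.standard_normal((n_h, n_i))
        self.b1 = rng.standard_normal(n_h)
        self.W2 = rng.standard_normal((n_o, n_h))
        self.b2 = rng.standard_normal(n_o)

    def network_function(self, x):
        """Evaluate N_w(x): sigmoid hidden layer, then sigmoid output layer."""
        h = sigmoid(self.W1 @ x + self.b1)
        return sigmoid(self.W2 @ h + self.b2)
```

For example, with $n_i = 6$ and $n_o = 3$, `FeedforwardNC(6, 10, 3).network_function(x)` maps a six-dimensional input vector onto a three-dimensional output vector in $(0, 1)^3$, which can then be rescaled to the admissible control range.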
EAs can be employed for searching $N^\star$, because $w$ can be mapped onto a real-valued string $c$ (also called chromosome or individual) that provides an equivalent description of a network function (a minimal sketch of this mapping follows at the end of this section). If an EA is already employed for the optimization of the NC parameters, it is natural to use it also for the co-optimization of the initial conditions. This way, the initial conditions are made an explicit part of the optimization problem.

Neurocontroller Input and Output. Two fundamental questions arise concerning the application of an NC for spacecraft steering: what input the NC should get (what the NC should know to steer the spacecraft) and what output the NC should give (what the NC should do to steer the spacecraft). To be robust, a spacecraft steering strategy should be time-independent: to determine the currently optimal spacecraft control $u(\bar{t}_i)$, the spacecraft steering strategy should have to know – at any time step $\bar{t}_i$ – only the current spacecraft state $x_{SC}(\bar{t}_i)$ and the current state of the target.

¹ This evaluation is analogous to the cost function in optimal control theory. To emphasize this fact, it will also be denoted by the symbol $J$.
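How the parameter set $w$ can be serialized into such a chromosome $c$ and searched under delayed reinforcement is sketched below, under stated assumptions: `simulate_trajectory` is a hypothetical stand-in for the trajectory integrator, which returns the evaluation $J$ only once the complete trajectory is known, and the simple (1+λ)-style loop is a placeholder rather than the EA actually used for evolutionary neurocontrol.

```python
import numpy as np

def to_chromosome(nc):
    """Flatten the NC parameter set w into a real-valued string c."""
    return np.concatenate([p.ravel() for p in (nc.W1, nc.b1, nc.W2, nc.b2)])

def from_chromosome(nc, c):
    """Write a chromosome c back into the NC's weights and biases."""
    i = 0
    for p in (nc.W1, nc.b1, nc.W2, nc.b2):
        p[...] = c[i:i + p.size].reshape(p.shape)
        i += p.size

def fitness(nc, c, simulate_trajectory):
    """Delayed reinforcement: the reward J becomes available only after the
    whole trajectory produced by the steering strategy has been simulated."""
    from_chromosome(nc, c)
    return simulate_trajectory(nc)  # hypothetical integrator returning J

def evolve(nc, simulate_trajectory, offspring=20, generations=100, sigma=0.1):
    """Minimal (1+lambda)-style evolutionary search over chromosomes."""
    rng = np.random.default_rng()
    best = to_chromosome(nc)
    best_J = fitness(nc, best, simulate_trajectory)
    for _ in range(generations):
        for _ in range(offspring):
            trial = best + sigma * rng.standard_normal(best.size)
            J = fitness(nc, trial, simulate_trajectory)
            if J > best_J:  # maximize the (delayed) evaluation J
                best, best_J = trial, J
    from_chromosome(nc, best)  # leave the NC set to the best strategy found
    return best, best_J
```

In this representation, co-optimizing the initial conditions amounts to appending them to $c$, so that the same evolutionary operators act on the controller parameters and the launch conditions alike.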
