instantaneous gains (there are no gains or costs along the trajectories as long as they satisfy the given criterion at the last time step), which are thus defined by the function

$$L(\theta, x, d, t) = 0 \quad \forall\, \theta, x, d, t \tag{6.1}$$

and a final gain equal to the amount of energy reserves of a larva when it reaches the nursery ($x = 0$), and zero elsewhere

$$\Phi(\theta, x, T) = \theta \cdot \mathbf{1}_{\{x=0\}} \tag{6.2}$$

The function $\mathbf{1}_{\{x=0\}}$ equals one when the condition is satisfied ($x = 0$), and zero otherwise.
Eventually, from any initial point in time ($t = t_i$) and state ($\theta_{t_i}, x_{t_i}$), the optimisation problem can be written as the value function

$$V(\theta_{t_i}, x_{t_i}, t_i) = \max_{d_{t_i}, \ldots, d_{T-1}} \mathbb{E}\!\left( \sum_{\tau = t_i}^{T-1} L(\theta_\tau, x_\tau, d_\tau, \tau) + \Phi(\theta_T, x_T, T) \right) = \max_{d_{t_i}, \ldots, d_{T-1}} \mathbb{E}\!\left( \theta_T \cdot \mathbf{1}_{\{x_T = 0\}} \right) \tag{6.3}$$

meaning that, over all possible future decisions $(d_{t_i}, \ldots, d_{T-1})$, the final energy ($\theta_T$) is maximised, but only if the larva reaches the nursery (i.e. only if $x_T = 0$).
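To make this criterion concrete, the following minimal Python sketch estimates the objective $\mathbb{E}(\theta_T \cdot \mathbf{1}_{\{x_T = 0\}})$ by Monte Carlo simulation for a fixed decision sequence. The dynamics mirror the two decisions of equation (6.4) below; the survival probability `p`, the unit state increments, and the initial state are illustrative assumptions, not values from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_gain(theta_T, x_T):
    """Terminal gain Phi of eq. (6.2): reserves count only at the nursery."""
    return theta_T if x_T == 0 else 0.0

def simulate(decisions, theta0=15, x0=10, p=0.95):
    """One trajectory realisation under a fixed decision sequence (assumed dynamics)."""
    theta, x = theta0, x0
    for d in decisions:
        if rng.random() > p:          # with probability 1 - p, reserves drop to zero
            theta = 0
            continue
        if d == 0:                    # e.g. feed: gain reserves, drift from the nursery
            theta, x = theta + 1, x + 1
        else:                         # e.g. migrate: spend reserves, approach the nursery
            theta, x = max(theta - 1, 0), max(x - 1, 0)
    return final_gain(theta, x)

# Monte Carlo estimate of E[theta_T * 1{x_T = 0}] for an "always migrate" sequence
estimates = [simulate([1] * 12) for _ in range(10_000)]
print(np.mean(estimates))
```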
6.2.2 Stochastic dynamic programming equation

Backward induction of decisions

Now that the evolution of the state is described (by transition matrices) and an optimisation criterion is specified (maximise energy reserves at recruitment), optimal strategies have to be found. Optimal strategies are functions of state and time which give a sequence of optimal decisions $(d^{\#}_0, \ldots, d^{\#}_{T-1})$ for each state. They are computed by means of the stochastic dynamic programming equation (or Bellman's equation)^{243,244}, which is the backward induction
$$\left\{\begin{aligned}
V(\theta, x, T) &= \theta \cdot \mathbf{1}_{\{x=0\}} \\
V(\theta, x, t) &= \max\begin{pmatrix} (1-p)\,V(0, x, t+1) + p\,V(\theta + \Delta\theta_0,\, x + \Delta x_0,\, t+1), \\ (1-p)\,V(0, x, t+1) + p\,V(\theta - \Delta\theta_1,\, x - \Delta x_1,\, t+1) \end{pmatrix} \\
d^{\#}(\theta, x, t) &\in \operatorname*{argmax}\begin{pmatrix} (1-p)\,V(0, x, t+1) + p\,V(\theta + \Delta\theta_0,\, x + \Delta x_0,\, t+1), \\ (1-p)\,V(0, x, t+1) + p\,V(\theta - \Delta\theta_1,\, x - \Delta x_1,\, t+1) \end{pmatrix}
\end{aligned}\right. \tag{6.4}$$
where $V(\theta, x, T)$ is the final gain, and the first argument of the max and argmax functions is the mean gain associated with the first decision ($d = 0$), the second argument with the second decision ($d = 1$).
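To show how the backward induction of equation (6.4) can be carried out in practice, here is a short Python sketch that fills the value table $V$ and the optimal policy $d^{\#}$ backward from the final gain on a discretised state grid. The horizon, grid sizes, survival probability and unit increments $\Delta\theta_d$, $\Delta x_d$ are assumptions chosen for illustration, as are the absorbing grid boundaries.

```python
import numpy as np

# --- Illustrative discretisation (assumed values, not from the model) ---
T = 50                        # final time step
n_theta, n_x = 21, 31         # grid sizes for reserves theta and distance x
p = 0.95                      # per-step probability of keeping the reserves
d_theta = [+1, -1]            # reserve increment for decisions d = 0, 1
d_x     = [+1, -1]            # distance increment for decisions d = 0, 1

V = np.zeros((n_theta, n_x, T + 1))
policy = np.zeros((n_theta, n_x, T), dtype=int)

# Final gain (eq. 6.2): reserves count only if the nursery (x = 0) is reached
V[:, 0, T] = np.arange(n_theta)

def clip(i, n):
    """Keep grid indices in range (absorbing boundaries, an assumption)."""
    return min(max(i, 0), n - 1)

# Backward induction (eq. 6.4): from t = T - 1 down to t = 0
for t in range(T - 1, -1, -1):
    for i in range(n_theta):              # index of theta
        for j in range(n_x):              # index of x
            gains = []
            for d in (0, 1):
                i2 = clip(i + d_theta[d], n_theta)
                j2 = clip(j + d_x[d], n_x)
                # (1 - p): reserves drop to theta = 0; p: state moves as decided
                gains.append((1 - p) * V[0, j, t + 1] + p * V[i2, j2, t + 1])
            policy[i, j, t] = int(np.argmax(gains))
            V[i, j, t] = max(gains)

# e.g. optimal first decision and value for mid reserves at distance 10
print(policy[10, 10, 0], V[10, 10, 0])
```

A single backward pass over the grid thus yields the optimal decision for every state and time simultaneously, which is exactly what makes dynamic programming cheaper than enumerating all decision sequences.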