01.12.2012

c11.pdf

Temporal-difference (TD) learning

• The simplest TD method, TD(0), computes an estimate of the final reward (the discounted return) at every state and updates the state value V(s_t) after every step:

$$V(s_t) \leftarrow V(s_t) + \alpha \left( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right)$$

The term $r_{t+1} + \gamma V(s_{t+1})$ is the reward estimate (the TD target).

• $r_{t+1}$ is the reward observed at time t+1.

• $\gamma$ is the discount rate for the reward.

• $\alpha$ is the learning rate (step size).
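The update rule can be illustrated with a minimal Python sketch of TD(0) prediction. This is not the lecture's own implementation: it assumes a tabular value function stored in a dict, and the environment (a 5-state random walk that pays reward 1 when it exits on the right and 0 on the left), the function names `td0_update`, `random_walk_episode`, `td0_evaluate`, and all parameter values are hypothetical choices made for illustration.

```python
import random


def td0_update(V, s, r, s_next, alpha, gamma, terminal=False):
    """Apply one TD(0) update to the value table V in place and return the TD error.

    V(s) <- V(s) + alpha * (r + gamma * V(s_next) - V(s)),
    where the value of a terminal successor state is taken to be 0.
    """
    target = r if terminal else r + gamma * V[s_next]  # TD target (reward estimate)
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


def random_walk_episode(n_states, rng):
    """Hypothetical environment: random walk over states 1..n_states.

    Starts in the middle state and moves left or right with equal probability.
    Stepping off the right end gives reward 1, off the left end reward 0.
    Yields transitions (s, r, s_next, terminal).
    """
    s = (n_states + 1) // 2
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next == 0:
            yield s, 0.0, s_next, True
            return
        if s_next == n_states + 1:
            yield s, 1.0, s_next, True
            return
        yield s, 0.0, s_next, False
        s = s_next


def td0_evaluate(n_states=5, episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values of the random walk by applying TD(0) after every step."""
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, n_states + 1)}  # arbitrary initial estimates
    for _ in range(episodes):
        for s, r, s_next, terminal in random_walk_episode(n_states, rng):
            td0_update(V, s, r, s_next, alpha, gamma, terminal)
    return V


if __name__ == "__main__":
    for state, value in sorted(td0_evaluate().items()):
        print(state, round(value, 3))
```

With $\gamma = 1$ the true value of state $i$ in this walk is $i/6$, so the printed estimates should come out roughly 0.17, 0.33, 0.5, 0.67, 0.83, up to noise from the constant step size $\alpha$. Note that each update uses only the single observed transition and the current estimate $V(s_{t+1})$, so values are revised during the episode instead of waiting for the final reward.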