Here you will find experiments and results obtained when running the dynaQ and dynaQplus algorithms with various planning lengths on the "blocking" maze, where at the 1000th timestep the environment changes and blocks the previously found path/policy. The planning lengths chosen were 0 (no planning), 5, and 50. When planning 50 iterations ahead, it seems to be very important to specify the initial condition on the action-value function correctly; if it is specified incorrectly, convergence may be difficult. Learning curves for the various amounts of planning are plotted here. In all cases we see the benefit that dynaQplus provides, since it finds a way around the blockage more quickly.
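For reference, below is a minimal Python sketch of the Dyna-Q / Dyna-Q+ update loop being compared in these experiments. It is not the author's code; the environment interface (env.reset(), env.step(state, action) returning next state, reward, and a done flag), the parameter values, and the simplification of sampling only previously observed state-action pairs during planning are all assumptions made for illustration.

    import numpy as np

    def dyna_q(env, n_states, n_actions, n_steps=3000, n_planning=5,
               alpha=0.1, gamma=0.95, epsilon=0.1, kappa=0.0, seed=0):
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))           # initial action-value guess
        model = {}                                    # (s, a) -> (reward, next state)
        last_visit = np.zeros((n_states, n_actions))  # time of last real visit (used by Dyna-Q+)
        s = env.reset()
        for t in range(1, n_steps + 1):
            # epsilon-greedy action selection in the real environment
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(s, a)
            # direct reinforcement learning: one-step Q-learning update
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            # model learning: remember the most recent observed transition
            model[(s, a)] = (r, s2)
            last_visit[s, a] = t
            # planning: n_planning simulated backups from remembered pairs
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = list(model.items())[rng.integers(len(model))]
                # Dyna-Q+ adds an exploration bonus for long-untried pairs
                bonus = kappa * np.sqrt(t - last_visit[ps, pa])
                Q[ps, pa] += alpha * (pr + bonus + gamma * np.max(Q[ps2]) - Q[ps, pa])
            s = env.reset() if done else s2
        return Q

With kappa = 0 this reduces to plain dynaQ; a small positive kappa gives the dynaQplus behavior, and setting n_planning to 0, 5, or 50 corresponds to the planning lengths tried here. The exploration bonus is what lets dynaQplus rediscover a path after the maze is blocked at timestep 1000.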

This result looks quite similar to that presented in the book.
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005