Here you will find experiments and results obtained when running the Dyna-Q algorithm with different planning
lengths. The planning lengths chosen were 0 (no planning), 5, and 50. When planning 50 iterations
ahead, it seems to be very important to specify the initial condition on the action-value function correctly.
If it is specified incorrectly, convergence may be difficult. Learning curves for the various amounts of planning are
plotted here.
This result looks quite similar to that presented in the book.
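For readers who want to try the comparison themselves, the following is a minimal sketch of tabular Dyna-Q in Python. It is not the code used to produce the results above. The open 6 x 6 grid, the reward of 1 at the goal, the step-size, discount, and exploration values, and the names dyna_q, n_planning, and q_init are all assumptions made for illustration. The q_init argument sets the initial action values, so it can be used to explore the sensitivity to initial conditions mentioned above.

```python
import random

def dyna_q(n_planning, episodes=20, size=6, alpha=0.1, gamma=0.95,
           epsilon=0.1, q_init=0.0, seed=0, max_steps=10_000):
    """Tabular Dyna-Q on a hypothetical open grid; returns steps per episode."""
    rng = random.Random(seed)
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    start, goal = (0, 0), (size - 1, size - 1)     # assumed layout
    q = {}      # (state, action) -> value; unseen pairs default to q_init
    model = {}  # (state, action) -> (reward, next_state), deterministic

    def value(s, a):
        return q.get((s, a), q_init)

    def step(s, a):
        # Moves that would leave the grid keep the agent in place.
        r, c = s[0] + actions[a][0], s[1] + actions[a][1]
        nxt = (r, c) if 0 <= r < size and 0 <= c < size else s
        return (1.0 if nxt == goal else 0.0), nxt

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(value(s, a) for a in range(4))
        return rng.choice([a for a in range(4) if value(s, a) == best])

    def update(s, a, reward, nxt):
        # One-step Q-learning backup; the goal is terminal, so no bootstrap there.
        future = 0.0 if nxt == goal else gamma * max(value(nxt, b) for b in range(4))
        q[(s, a)] = value(s, a) + alpha * (reward + future - value(s, a))

    steps_per_episode = []
    for _ in range(episodes):
        s, steps = start, 0
        while s != goal and steps < max_steps:
            a = rng.randrange(4) if rng.random() < epsilon else greedy(s)
            reward, nxt = step(s, a)
            update(s, a, reward, nxt)          # direct learning from real experience
            model[(s, a)] = (reward, nxt)      # model learning
            seen = list(model)
            for _ in range(n_planning):        # planning from simulated experience
                ps, pa = rng.choice(seen)
                pr, pn = model[(ps, pa)]
                update(ps, pa, pr, pn)
            s, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

if __name__ == "__main__":
    for n in (0, 5, 50):
        print(n, dyna_q(n))
```

Each printed list is a learning curve (steps to reach the goal in each episode) for one planning length. Passing a different q_init, for example dyna_q(50, q_init=1.0), changes the initial action values while leaving everything else fixed.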
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005