Here you will find experiments and results obtained with several different planning lengths in the
  Dyna-Q algorithm.  The planning lengths chosen were 0 (no planning), 5, and 50.  When 50 planning steps are
  performed per real step, it appears to be very important to specify the initial condition of the action-value
  function correctly; if it is specified incorrectly, convergence may be difficult.  Learning curves with the
  various amounts of planning are plotted here.
  [Figure: Dyna-Q learning curves for 0, 5, and 50 planning steps]
  This result looks quite similar to that presented in the book.  
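For readers who want to reproduce this kind of experiment, here is a minimal sketch of tabular Dyna-Q with a configurable number of planning steps `n_planning` and a configurable initial action value `q_init`. It is not the code used to produce the plots above. The maze layout (walls, start, goal), the reward of +1 only on reaching the goal, and the parameters `alpha=0.1`, `gamma=0.95`, `epsilon=0.1` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random

# Hypothetical 6 x 9 maze; wall, start, and goal positions are assumptions.
ROWS, COLS = 6, 9
WALLS = {(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)}
START, GOAL = (2, 0), (0, 8)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, a):
    """Deterministic move; bumping into a wall or the border leaves the agent in place."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        r, c = state
    reward = 1.0 if (r, c) == GOAL else 0.0  # assumed reward: +1 only at the goal
    return (r, c), reward


def dyna_q(n_planning, episodes=50, alpha=0.1, gamma=0.95, epsilon=0.1,
           q_init=0.0, max_steps=10_000, seed=0):
    """Tabular Dyna-Q; returns the number of real steps taken in each episode."""
    rng = random.Random(seed)
    q = {}      # (state, action) -> value; missing entries default to q_init
    model = {}  # (state, action) -> (next_state, reward); deterministic model

    def Q(s, a):
        return q.get((s, a), q_init)

    def greedy(s):
        values = [Q(s, a) for a in range(len(ACTIONS))]
        best = max(values)
        # Break ties at random so an all-equal table does not always pick "up".
        return rng.choice([a for a, v in enumerate(values) if v == best])

    def update(s, a, r, s2):
        # The goal is terminal, so it contributes no bootstrapped value.
        if s2 == GOAL:
            target = r
        else:
            target = r + gamma * max(Q(s2, b) for b in range(len(ACTIONS)))
        q[(s, a)] = Q(s, a) + alpha * (target - Q(s, a))

    steps_per_episode = []
    for _ in range(episodes):
        s, steps = START, 0
        while s != GOAL and steps < max_steps:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy(s)                    # exploit
            s2, r = step(s, a)
            update(s, a, r, s2)                  # direct RL update from real experience
            model[(s, a)] = (s2, r)              # model learning
            keys = list(model)
            for _ in range(n_planning):          # planning from simulated experience
                ps, pa = rng.choice(keys)
                ps2, pr = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode


if __name__ == "__main__":
    for n in (0, 5, 50):
        print(n, dyna_q(n)[:5])
```

Passing a different `q_init` (for example, a value larger than any achievable return) is one way to experiment with the sensitivity to the initial action-value estimates mentioned above; the `max_steps` cap simply keeps a poorly converging run from looping forever.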
      
  
  John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005