Chapter 6 in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

A performance comparison of TD(0) vs. constant-alpha MC under batch training on the random walk problem, for various numbers of learning episodes. This plot is produced when the code .m is run. The code effectively reproduces the results presented in Figure 6.8 of the book.
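As a rough illustration of the batch procedure being compared (this is not the author's MATLAB code), the following Python sketch assumes the five-state random walk from the book, with all value estimates initialised to 0.5. The step size of 0.001, the stopping threshold of 1e-3 on the summed increments, and the names generate_episode, batch_update and run_batch are all assumptions chosen for illustration.

```python
import math
import random

N_STATES = 5                               # non-terminal states 1..5; 0 and 6 are terminal
TRUE_V = [i / 6 for i in range(1, 6)]      # true values of states 1..5


def generate_episode(rng):
    """Random walk from the centre state; returns (visited states, terminal reward)."""
    s, states = 3, []
    while 1 <= s <= N_STATES:
        states.append(s)
        s += rng.choice((-1, 1))
    return states, (1.0 if s == N_STATES + 1 else 0.0)


def batch_update(episodes, v, method, alpha=0.001, tol=1e-3, max_sweeps=100000):
    """Sweep over all stored episodes, summing increments and applying them once
    per sweep, until the summed increments are small (assumed stopping rule)."""
    for _ in range(max_sweeps):
        delta = [0.0] * (N_STATES + 2)
        for states, reward in episodes:
            for t, s in enumerate(states):
                if method == "td" and t + 1 < len(states):
                    target = v[states[t + 1]]   # TD(0): reward 0 plus next state's value
                else:
                    target = reward             # MC return, or TD step into a terminal state
                delta[s] += alpha * (target - v[s])
        for s in range(1, N_STATES + 1):
            v[s] += delta[s]
        if sum(abs(d) for d in delta) < tol:
            break
    return v


def run_batch(method, n_episodes, seed=0):
    """Return the RMS error against TRUE_V after each new episode is added to the batch."""
    rng = random.Random(seed)
    v = [0.0] + [0.5] * N_STATES + [0.0]        # terminal values are fixed at 0
    episodes, errors = [], []
    for _ in range(n_episodes):
        episodes.append(generate_episode(rng))
        batch_update(episodes, v, method)
        sq = sum((v[i + 1] - TRUE_V[i]) ** 2 for i in range(N_STATES))
        errors.append(math.sqrt(sq / N_STATES))
    return errors


if __name__ == "__main__":
    for method in ("td", "mc"):
        print(method, run_batch(method, 100)[-1])
```

A single run like this is noisy; the curves in the book's figure are averages over many independent runs, so averaging run_batch over several seeds would be needed before comparing the two methods.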

John Weatherwax

Last modified: Sun May 15 08:46:34 EDT 2005