A performance comparison of TD(0) vs. constant-alpha MC under batch learning on the random walk problem, for various numbers of learning episodes. This plot is produced when the code .m is run. The plot effectively duplicates Figure 6.8 from the book.
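
For readers who want to see the batch updating idea in code, the following is a minimal sketch in Python, not the author's .m code. It assumes a five-state random walk that starts in the center state, pays a reward of 1 only on exiting to the right, is undiscounted, and has true values 1/6 through 5/6. All names (`generate_episode`, `batch_update`, `rms_error`, `compare`), the initial value 0.5, the step size, and the stopping tolerance are illustrative choices rather than values taken from the original code.

```python
import random

N_STATES = 5  # assumed: nonterminal states A..E, indexed 0..4
TRUE_V = [(i + 1) / 6 for i in range(N_STATES)]  # true values under these assumptions


def generate_episode(rng):
    """Simulate one walk; return (visited nonterminal states, final reward)."""
    s = N_STATES // 2  # start in the center state
    states = []
    while 0 <= s < N_STATES:
        states.append(s)
        s += rng.choice((-1, 1))
    # Reward is 1 when the walk exits on the right, 0 on the left.
    return states, (1.0 if s == N_STATES else 0.0)


def batch_update(episodes, method, alpha=0.001, tol=1e-6, max_sweeps=100000):
    """Sweep the whole batch repeatedly, applying the summed increments
    only after each sweep, until the increments become negligible.

    method is "td" for TD(0) or "mc" for constant-alpha (every-visit) MC.
    alpha must be small relative to the number of visits per state,
    otherwise the summed update can diverge.
    """
    v = [0.5] * N_STATES
    for _ in range(max_sweeps):
        delta = [0.0] * N_STATES
        for states, ret in episodes:
            for t, s in enumerate(states):
                if method == "td" and t + 1 < len(states):
                    # Nonterminal transition: reward 0, bootstrap from next state.
                    target = v[states[t + 1]]
                else:
                    # TD on the terminal transition, or MC on any step:
                    # the target is the final reward (undiscounted return).
                    target = ret
                delta[s] += alpha * (target - v[s])
        v = [vi + di for vi, di in zip(v, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return v


def rms_error(v):
    """Root mean squared error against the true state values."""
    return (sum((a - b) ** 2 for a, b in zip(v, TRUE_V)) / N_STATES) ** 0.5


def compare(n_episodes, n_runs=10, seed=0):
    """Average RMS error of both methods over n_runs independent batches."""
    rng = random.Random(seed)
    err = {"td": 0.0, "mc": 0.0}
    for _ in range(n_runs):
        episodes = [generate_episode(rng) for _ in range(n_episodes)]
        for method in err:
            err[method] += rms_error(batch_update(episodes, method)) / n_runs
    return err
```

Calling `compare(n)` for a range of `n` gives one point per method on a curve of the kind described above. The sketch fits each batch once rather than refitting after every new episode, which keeps it short but should give the same value at each batch size, assuming the batch is drawn the same way.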


John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005