A performance comparison of TD(0) vs. constant-alpha MC under batch
updating on the random walk problem, for various numbers of learning
episodes. This plot is produced when the code .m is run. The code
effectively duplicates the results presented in Figure 6.8 of the
book.
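
For readers without the original code at hand, the batch procedure can be
sketched roughly as follows. This is a minimal Python illustration, not the
author's implementation. It assumes the five-state random walk with true
values 1/6 ... 5/6, and the step size, convergence tolerance, and all
function names (generate_episode, batch_update, run) are hypothetical
choices made for the sketch.

```python
import math
import random

N_STATES = 5  # non-terminal states, indexed 0..4 (assumed five-state walk)
TRUE_V = [(i + 1) / 6 for i in range(N_STATES)]

def generate_episode(rng):
    """Return (states, rewards): states visited and the reward after each step."""
    s = N_STATES // 2  # every episode starts in the centre state
    states, rewards = [], []
    while 0 <= s < N_STATES:
        states.append(s)
        s += rng.choice((-1, 1))
        # reward 1 only when the walk exits on the right, otherwise 0
        rewards.append(1.0 if s == N_STATES else 0.0)
    return states, rewards

def batch_update(episodes, v, method, alpha=0.001, tol=1e-3):
    """Sweep all stored episodes, applying summed increments, until they are small."""
    while True:
        delta = [0.0] * N_STATES
        for states, rewards in episodes:
            for t, s in enumerate(states):
                if method == "td":
                    # TD(0) target: reward plus value of the successor (0 if terminal)
                    v_next = v[states[t + 1]] if t + 1 < len(states) else 0.0
                    target = rewards[t] + v_next
                else:
                    # constant-alpha MC target: undiscounted return from time t
                    target = sum(rewards[t:])
                delta[s] += alpha * (target - v[s])
        for s in range(N_STATES):
            v[s] += delta[s]
        if sum(abs(d) for d in delta) < tol:
            return v

def rms_error(v):
    """Root-mean-squared error against the true values, averaged over states."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, TRUE_V)) / N_STATES)

def run(n_episodes, method, seed=0):
    """Return the RMS error after each episode for one run of batch learning."""
    rng = random.Random(seed)
    v = [0.5] * N_STATES  # assumed initial value estimate
    episodes, errors = [], []
    for _ in range(n_episodes):
        episodes.append(generate_episode(rng))
        batch_update(episodes, v, method)
        errors.append(rms_error(v))
    return errors

if __name__ == "__main__":
    for method in ("td", "mc"):
        print(method, [round(e, 3) for e in run(20, method, seed=0)])
```

A single run like this is noisy. A learning curve comparable to the figure
would need the errors averaged over many independent runs (different seeds)
for each method.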
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005