Here you will find experiments and results obtained when performing n-step TD learning on the random walk examples from this chapter. I ran the experiments suggested in the book with one thousand episodes rather than the suggested one hundred, and for one hundred values of alpha uniformly spaced between the limits given. For the online version of n-step TD learning I obtain

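For readers who want to reproduce curves like these, here is a minimal sketch of online n-step TD prediction, assuming the chapter's 19-state random walk with terminal rewards of -1 on the left and +1 on the right. The function name, the error measure (per-episode RMS error against the analytic true values), and the example sweep in the comment are my own choices and only approximate the book's exact setup.

    import numpy as np

    def run_nstep_td(n, alpha, n_states=19, episodes=10, gamma=1.0):
        """One run of online n-step TD on the random walk.

        Returns the RMS error between learned and true state values,
        averaged over the episodes of the run."""
        # Analytic true values for a 19-state walk with -1/+1 terminal rewards.
        true_v = np.arange(1, n_states + 1) / ((n_states + 1) / 2.0) - 1.0
        V = np.zeros(n_states + 2)   # states 1..n_states; 0 and n_states+1 terminal
        rms = []
        for _ in range(episodes):
            states = [n_states // 2 + 1]   # start in the middle state
            rewards = [0.0]                # dummy R_0 so rewards[t] holds R_t
            T = float('inf')
            t = 0
            while True:
                if t < T:
                    s = states[-1]
                    s2 = s + np.random.choice([-1, 1])
                    r = 1.0 if s2 == n_states + 1 else (-1.0 if s2 == 0 else 0.0)
                    states.append(s2)
                    rewards.append(r)
                    if s2 == 0 or s2 == n_states + 1:
                        T = t + 1
                tau = t - n + 1            # time whose state estimate is updated
                if tau >= 0:
                    # n-step return G_{tau:tau+n}
                    G = sum(gamma ** (i - tau - 1) * rewards[i]
                            for i in range(tau + 1, int(min(tau + n, T)) + 1))
                    if tau + n < T:
                        G += gamma ** n * V[states[tau + n]]
                    V[states[tau]] += alpha * (G - V[states[tau]])  # online update
                if tau == T - 1:
                    break
                t += 1
            rms.append(np.sqrt(np.mean((V[1:-1] - true_v) ** 2)))
        return float(np.mean(rms))

    # Example sweep over alpha for one n (hypothetical driver):
    # errs = [run_nstep_td(4, a) for a in np.linspace(0.0, 1.0, 100)]
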
The online result looks quite similar to the one presented in the book. For the offline version of the same algorithm I obtain

This looks qualitatively the same as the result found in the book, but the graphs for small n have a more "jagged" appearance. I believe this is due to the more stringent requirement that offline TD learning methods place on their alpha parameter: it must be small enough that convergence is guaranteed. A sketch of the offline variant is given below.
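To make the online/offline distinction concrete, here is a hedged sketch of the offline variant, assuming "offline" means the n-step increments are accumulated during an episode and applied to the value function only after the episode terminates, so every n-step return bootstraps from values held fixed for the whole episode. The function and variable names are my own.

    import numpy as np

    def offline_nstep_episode(V, n, alpha, n_states=19, gamma=1.0):
        """One episode of offline n-step TD: increments accumulate in
        `delta` and are applied to V only after the episode ends."""
        delta = np.zeros_like(V)
        states = [n_states // 2 + 1]   # start in the middle state
        rewards = [0.0]                # dummy R_0 so rewards[t] holds R_t
        T = float('inf')
        t = 0
        while True:
            if t < T:
                s = states[-1]
                s2 = s + np.random.choice([-1, 1])
                rewards.append(1.0 if s2 == n_states + 1
                               else (-1.0 if s2 == 0 else 0.0))
                states.append(s2)
                if s2 in (0, n_states + 1):
                    T = t + 1
            tau = t - n + 1
            if tau >= 0:
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, int(min(tau + n, T)) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]  # bootstrap from old V
                delta[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
        V += delta   # single batch update at the end of the episode
        return V

    # Example usage: V = np.zeros(21)
    # for _ in range(1000): V = offline_nstep_episode(V, n=4, alpha=0.05)

Because all of an episode's increments are applied at once, the effective step taken can be much larger than in the online case, which is consistent with needing a small alpha for stable behavior.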
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005