Here you will find experiments comparing TD(lambda) methods on the Markov chain example
with recurring states. We compare two versions of the TD algorithm: the first uses accumulating eligibility traces and
the second uses replacing eligibility traces. I believe that when the probability of transitioning from the current state to
the state to its right is 0.5, the true solution to this problem (for five states) is V = [0.6, 0.7, 0.8, 0.9, 1.0].
Using this as the target and comparing the two methods as we have done previously, we see that
with accumulating eligibility traces the algorithm does not converge
for as wide a range of alpha values. The replacing traces algorithm seems to converge for all lambda
in the range [0,1], while the accumulating eligibility traces algorithm converges over a much smaller range. In fact, for
larger values of alpha the error with replacing eligibility traces is less than that with accumulating
eligibility traces.
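
The two trace updates compared above differ in a single line, which the following minimal Python sketch tries to make concrete. It is not the code used for the experiments described here. The function name td_lambda, its parameters, and the environment details (each state either stays put or moves one step to the right, a reward of 1 is received on leaving the last state, and there is no discounting) are all assumptions made for illustration, so the sketch should not be expected to reproduce the V quoted above.

```python
import random


def td_lambda(trace="accumulating", alpha=0.1, lam=0.9, p=0.5,
              n_states=5, n_episodes=200, gamma=1.0, seed=0):
    """Tabular TD(lambda) on a small chain with recurring states (a sketch).

    Assumed dynamics (not taken from the original experiments): from state s
    the process moves right with probability p and otherwise stays in s.
    Leaving the last state ends the episode with an assumed reward of 1.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(n_episodes):
        e = [0.0] * n_states  # eligibility traces, reset every episode
        s = 0
        while s < n_states:
            s_next = s + 1 if rng.random() < p else s
            done = s_next == n_states
            r = 1.0 if done else 0.0
            v_next = 0.0 if done else V[s_next]
            delta = r + gamma * v_next - V[s]  # one-step TD error
            if trace == "accumulating":
                e[s] += 1.0   # revisits pile up on top of the decayed trace
            else:
                e[s] = 1.0    # replacing: a revisit resets the trace to 1
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam  # decay all traces before the next step
            s = s_next
    return V


if __name__ == "__main__":
    for kind in ("accumulating", "replacing"):
        print(kind, td_lambda(trace=kind, alpha=0.1, lam=0.9))
```

Because a state can be revisited many times in a row, the accumulating trace can grow well above 1 (up to roughly 1/(1 - gamma*lambda)), so the effective step size alpha*e can become large. The replacing trace is capped at 1. Sweeping alpha and lambda with a function like this one is one way to run a comparison of the kind described above.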
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005