Here you will find experiments with the suggested algorithm from exercise 9.4. As discussed in the notes, attempting to use this algorithm on the shortcut maze problem shows that it is unable to find the newly available path. To show this, we plot the final policies for both dynaQplus and the suggested algorithm. The dynaQplus algorithm ultimately computes the policy given by

The suggested algorithm, by contrast, computes the policy given by the following

We see that the suggested algorithm is not able to explore enough to find an improved path.
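
The difference between the two methods can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the results above. It assumes that the suggested algorithm is the variant in which the exploration bonus kappa*sqrt(tau) is used solely when selecting an action, while dynaQplus adds the same bonus to the modeled reward during planning backups. The function names, the table layout (Q[s] is a list of action values) and the parameter values are all hypothetical.

```python
import math

KAPPA = 0.1  # hypothetical bonus weight, chosen only for illustration


def bonus(tau, kappa=KAPPA):
    """Exploration bonus kappa * sqrt(tau) for a pair untried for tau steps."""
    return kappa * math.sqrt(tau)


def select_action_with_bonus(q_row, tau_row, kappa=KAPPA):
    """Assumed form of the suggested variant: the bonus only affects the
    choice of action in the current state; Q itself is never changed."""
    scores = [q + bonus(t, kappa) for q, t in zip(q_row, tau_row)]
    return max(range(len(scores)), key=scores.__getitem__)


def planning_backup_with_bonus(Q, s, a, r, s_next, tau,
                               alpha=0.1, gamma=0.95, kappa=KAPPA):
    """dynaQplus-style planning update (assumed form): the bonus is added
    to the modeled reward, so it is written into Q and can propagate."""
    target = r + bonus(tau, kappa) + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Under these assumptions, one plausible reading of the result above is that a bonus applied only at action selection influences the agent only in the state it currently occupies. A bonus applied in planning backups is stored in the action values, so it can be backed up to distant states and draw the agent toward long-untried parts of the maze.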
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005