Here you will find experiments and results obtained when applying R-learning to the server access control problem. The first plot is the learned policy, which matches the corresponding plot in the book quite well.

The corresponding plot of the state value function (as a function of the number of free servers) is also very similar to the corresponding plot in the book.
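
For readers who want to reproduce this kind of experiment, here is a minimal sketch of tabular R-learning on an access-control queuing task. It is not the code that produced the plots on this page. Everything concrete in it is an assumption made for illustration: the function and variable names, the task parameters (10 servers, four priorities paying 1, 2, 4 and 8, a release probability of 0.06 per busy server per step), the step sizes `alpha`, `beta` and `epsilon`, and the choice to let a newly assigned server be released in the same time step.

```python
import random


def best_value(q, free, pri):
    """Value of the best available action; only 'reject' exists when free == 0."""
    return q[free][pri][0] if free == 0 else max(q[free][pri])


def r_learning_access_control(n_servers=10, rewards=(1, 2, 4, 8), p_free=0.06,
                              alpha=0.1, beta=0.01, epsilon=0.1,
                              n_steps=200_000, seed=0):
    """Tabular R-learning sketch; all parameter defaults are assumed values."""
    rng = random.Random(seed)
    n_pri = len(rewards)
    # q[free][pri][action], with action 0 = reject and action 1 = accept
    q = [[[0.0, 0.0] for _ in range(n_pri)] for _ in range(n_servers + 1)]
    rho = 0.0                      # running estimate of the average reward
    free = n_servers               # start with every server free
    pri = rng.randrange(n_pri)     # priority index of the customer at the head

    for _ in range(n_steps):
        # Epsilon-greedy action selection (ties are broken toward reject).
        if free == 0:
            action = 0
        elif rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 1 if q[free][pri][1] > q[free][pri][0] else 0

        reward = rewards[pri] if action == 1 else 0

        # Assumed dynamics: each busy server is released with prob. p_free.
        busy = n_servers - free + action
        released = sum(rng.random() < p_free for _ in range(busy))
        next_free = n_servers - busy + released
        next_pri = rng.randrange(n_pri)

        # R-learning update of the relative action value.
        next_best = best_value(q, next_free, next_pri)
        delta = reward - rho + next_best - q[free][pri][action]
        q[free][pri][action] += alpha * delta

        # Update the average-reward estimate only when the action is greedy.
        current_best = best_value(q, free, pri)
        if q[free][pri][action] == current_best:
            rho += beta * (reward - rho + next_best - current_best)

        free, pri = next_free, next_pri

    # Greedy policy: policy[free][pri] is 1 for accept, 0 for reject.
    policy = [[0 if free == 0 or q[free][p][1] <= q[free][p][0] else 1
               for p in range(n_pri)] for free in range(n_servers + 1)]
    return q, rho, policy


if __name__ == "__main__":
    q, rho, policy = r_learning_access_control(n_steps=100_000)
    print("estimated average reward:", rho)
    for free, row in enumerate(policy):
        print(free, row)
```

Plotting `policy[free][pri]` against the number of free servers gives a policy plot of the kind described above, and plotting `best_value(q, free, pri)` for each priority gives the value plot. Results depend on the random seed and the number of steps, so individual entries of the policy may differ from run to run.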
John Weatherwax
Last modified: Sun May 15 08:46:34 EDT 2005