Chapter 5 in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

The plots show the final state-value function obtained when following the deterministic policy specified in the book. The first sequence of four plots is for 20,000 Monte Carlo trials. The second sequence of four plots is for 500,000 Monte Carlo trials. These plots closely match the corresponding plots in the book.
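
For readers who want to reproduce results of this kind, here is a minimal sketch of Monte Carlo policy evaluation. It assumes the text refers to the blackjack example of Chapter 5, in which the player sticks only on a sum of 20 or 21, the dealer hits below 17, and cards are drawn from an infinite deck. The function names (draw_card, add_card, play_episode, evaluate_policy) and the seed parameter are hypothetical and chosen for illustration; this is not the code that produced the plots described above.

```python
import random
from collections import defaultdict


def draw_card(rng):
    # Infinite deck (assumption): ace = 1, 2-9 at face value, 10/J/Q/K count as 10.
    return min(rng.randint(1, 13), 10)


def add_card(total, usable_ace, card):
    # `total` counts a usable ace as 11.
    if card == 1 and total + 11 <= 21:
        return total + 11, True
    total += card
    if total > 21 and usable_ace:
        # Demote the usable ace from 11 to 1 to avoid going bust.
        return total - 10, False
    return total, usable_ace


def play_episode(rng, stick_threshold=20):
    """Play one game; return (visited states, final reward)."""
    p_total, p_ace = 0, False
    for _ in range(2):
        p_total, p_ace = add_card(p_total, p_ace, draw_card(rng))
    dealer_show = draw_card(rng)
    d_total, d_ace = add_card(0, False, dealer_show)
    d_total, d_ace = add_card(d_total, d_ace, draw_card(rng))

    # A natural (21 on two cards) wins unless the dealer also has one.
    if p_total == 21:
        return [(21, dealer_show, True)], (0 if d_total == 21 else 1)

    states = []
    while True:
        # Sums below 12 are not recorded: hitting there can never bust.
        if p_total >= 12:
            states.append((p_total, dealer_show, p_ace))
        if p_total >= stick_threshold:
            break
        p_total, p_ace = add_card(p_total, p_ace, draw_card(rng))
        if p_total > 21:
            return states, -1

    # Dealer's fixed strategy: hit until the sum is 17 or more.
    while d_total < 17:
        d_total, d_ace = add_card(d_total, d_ace, draw_card(rng))
    if d_total > 21 or p_total > d_total:
        return states, 1
    return states, (0 if p_total == d_total else -1)


def evaluate_policy(num_episodes, seed=0):
    """Estimate V(s) as the average return observed after visiting s."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(num_episodes):
        states, reward = play_episode(rng)
        # No state repeats within a game, and the only nonzero reward
        # arrives at the end, so the return from every state is `reward`.
        for s in states:
            returns_sum[s] += reward
            visits[s] += 1
    return {s: returns_sum[s] / visits[s] for s in visits}


if __name__ == "__main__":
    V = evaluate_policy(20000)
    print(V.get((20, 10, False)))
```

Each key of the returned dictionary is a tuple (player sum, dealer's showing card, usable ace). Splitting the estimates by the usable-ace flag and plotting them over the other two coordinates gives surfaces of the kind described above. Estimates for rarely visited states, such as those with a usable ace, are noisier at lower episode counts.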