Chapter 2 in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

The average reward for several action-value methods when the standard deviation of rewards is 1, using softmax action selection.

The percentage of times the optimal action is selected when the standard deviation of rewards is 1, using softmax action selection.

The cumulative average reward for several action-value methods when the standard deviation of rewards is 1, using softmax action selection.

The cumulative percentage of times the optimal action is selected when the standard deviation of rewards is 1, using softmax action selection.
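
For readers unfamiliar with the two quantities these captions rely on, here is a minimal sketch of softmax action selection and of a cumulative (running) average. It assumes the Boltzmann form over estimated action values with a temperature parameter. The captions do not state which softmax variant or temperature was used, so the names `softmax_probabilities`, `select_action`, `cumulative_average`, and `temperature` are illustrative and not taken from the original experiments.

```python
import math
import random


def softmax_probabilities(q_values, temperature=1.0):
    """Turn estimated action values into selection probabilities.

    Assumed form: p(a) is proportional to exp(Q(a) / temperature).
    Subtracting the maximum first avoids overflow and leaves the result unchanged.
    """
    highest = max(q_values)
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def cumulative_average(values):
    """Running mean: element n is the average of the first n values."""
    averages = []
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        averages.append(total / n)
    return averages
```

Applied to a sequence of per-step rewards, `cumulative_average` gives a cumulative average reward curve. Applied to a 0/1 sequence marking whether the optimal action was chosen at each step, it gives the cumulative fraction of optimal selections (multiply by 100 for a percentage).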