The average reward for several action-value methods when the standard deviation of rewards is 1 and softmax action selection is used.

The percentage of times the optimal action is selected when the standard deviation of rewards is 1 and softmax action selection is used.

The cumulative average reward for several action-value methods when the standard deviation of rewards is 1 and softmax action selection is used.

The cumulative percentage of times the optimal action is selected when the standard deviation of rewards is 1 and softmax action selection is used.
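
The quantities named in these captions can be illustrated with a minimal sketch of a single bandit run. Everything not stated in the captions is an assumption made for illustration: the number of arms, the softmax temperature, the sample-average value update, and the helper names `softmax_probs` and `run_bandit` are hypothetical and are not taken from the original experiments.

```python
import numpy as np

def softmax_probs(q, temperature=1.0):
    """Softmax (Gibbs) action probabilities from action-value estimates q."""
    z = (q - np.max(q)) / temperature  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def run_bandit(n_arms=10, n_steps=1000, temperature=0.1, reward_std=1.0, seed=0):
    """One run; returns cumulative average reward and cumulative % optimal action."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, n_arms)  # assumed: true values drawn from N(0, 1)
    best = int(np.argmax(q_true))
    q_est = np.zeros(n_arms)               # sample-average estimates (assumed method)
    counts = np.zeros(n_arms, dtype=int)
    rewards = np.empty(n_steps)
    optimal = np.empty(n_steps)
    for t in range(n_steps):
        a = rng.choice(n_arms, p=softmax_probs(q_est, temperature))
        r = rng.normal(q_true[a], reward_std)   # reward noise with std reward_std
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]  # incremental sample average
        rewards[t] = r
        optimal[t] = (a == best)
    steps = np.arange(1, n_steps + 1)
    cum_avg_reward = np.cumsum(rewards) / steps
    cum_pct_optimal = 100.0 * np.cumsum(optimal) / steps
    return cum_avg_reward, cum_pct_optimal
```

Curves like those described in the captions would typically be obtained by averaging such runs over many independently drawn bandit problems; the non-cumulative versions average `rewards` and `optimal` per time step across runs rather than along a single run.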
John Weatherwax