The average reward for several action-value methods when the standard deviation of rewards is 1, using softmax action selection.
The percentage of times the optimal action is selected when the standard deviation of rewards is 1, using softmax action selection.
The cumulative average reward for several action-value methods when the standard deviation of rewards is 1, using softmax action selection.
The cumulative percentage of times the optimal action is selected when the standard deviation of rewards is 1, using softmax action selection.
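The quantities named in these captions can be illustrated with a minimal sketch. It is not the code that produced the figures. It assumes a single 10-armed bandit whose true action values are drawn from a unit normal, sample-average value estimates, and a softmax temperature of 0.1. The names `softmax_probs` and `run_bandit`, and all parameter defaults other than the reward standard deviation of 1, are hypothetical choices made for illustration.

```python
import math
import random


def softmax_probs(q, temperature):
    """Softmax (Gibbs) action probabilities for the estimates q."""
    m = max(q)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def run_bandit(n_arms=10, n_plays=1000, temperature=0.1,
               reward_std=1.0, seed=0):
    """Play one bandit task; return the cumulative average reward and
    the cumulative percentage of optimal actions after each play."""
    rng = random.Random(seed)
    # Assumption: true action values are drawn from N(0, 1).
    q_true = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    best = max(range(n_arms), key=lambda a: q_true[a])

    q_est = [0.0] * n_arms   # sample-average estimates
    counts = [0] * n_arms    # number of times each arm was played
    reward_sum, optimal_count = 0.0, 0
    cum_avg_reward, cum_pct_optimal = [], []

    for t in range(1, n_plays + 1):
        probs = softmax_probs(q_est, temperature)
        a = rng.choices(range(n_arms), weights=probs)[0]
        r = rng.gauss(q_true[a], reward_std)  # reward std is 1 by default

        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]  # incremental sample average

        reward_sum += r
        optimal_count += (a == best)
        cum_avg_reward.append(reward_sum / t)
        cum_pct_optimal.append(100.0 * optimal_count / t)

    return cum_avg_reward, cum_pct_optimal
```

Calling `run_bandit()` returns two lists of length `n_plays`. Averaging these over many independently seeded tasks would give smoother curves of the kind the captions describe.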
John Weatherwax