The average reward for several action-value methods when the standard deviation of the rewards is 1.

The percentage of times we select the optimal action when the standard deviation of the rewards is 1.

The cumulative average reward for several action-value methods when the standard deviation of the rewards is 1.

The cumulative percentage of times we select the optimal action when the standard deviation of the rewards is 1.

The average reward for several action-value methods when the standard deviation of the rewards is 10.

The percentage of times we select the optimal action when the standard deviation of the rewards is 10.

The cumulative average reward for several action-value methods when the standard deviation of the rewards is 10.

The cumulative percentage of times we select the optimal action when the standard deviation of the rewards is 10.
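
The four quantities named in the captions above can be computed along the following lines. This is only a minimal sketch, not the code used to produce the figures: the 10-armed testbed, the epsilon-greedy method with sample-average estimates, the unit-variance true action values, and all function and parameter names (`run_bandit`, `average_over_runs`, `cumulative_average`, `reward_std`) are assumptions introduced for illustration.

```python
import random


def run_bandit(n_arms=10, n_steps=1000, epsilon=0.1, reward_std=1.0, seed=0):
    """One epsilon-greedy run (assumed setup); returns rewards and optimal-action flags."""
    rng = random.Random(seed)
    # Assumption: true action values are drawn from a unit normal.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    optimal_arm = max(range(n_arms), key=lambda a: true_values[a])
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    rewards, optimal_flags = [], []
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # reward_std is the reward standard deviation (1 or 10 in the captions).
        reward = rng.gauss(true_values[arm], reward_std)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # sample average
        rewards.append(reward)
        optimal_flags.append(1.0 if arm == optimal_arm else 0.0)
    return rewards, optimal_flags


def average_over_runs(n_runs=200, **kwargs):
    """Per-step average reward and percentage of optimal actions across runs."""
    runs = [run_bandit(seed=s, **kwargs) for s in range(n_runs)]
    n_steps = len(runs[0][0])
    avg_reward = [sum(r[0][t] for r in runs) / n_runs for t in range(n_steps)]
    pct_optimal = [100.0 * sum(r[1][t] for r in runs) / n_runs
                   for t in range(n_steps)]
    return avg_reward, pct_optimal


def cumulative_average(values):
    """Running mean of a per-step curve: out[t] is the mean of values[0..t]."""
    out, total = [], 0.0
    for t, v in enumerate(values, start=1):
        total += v
        out.append(total / t)
    return out
```

Under these assumptions, `average_over_runs(reward_std=1.0)` gives the two per-step curves, and applying `cumulative_average` to each gives the cumulative versions; passing `reward_std=10.0` corresponds to the noisier setting.
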
John Weatherwax