The average reward for several action-value methods on a nonstationary reward distribution when the standard deviation of rewards is 1.

The percentage of time steps on which the optimal action is selected on a nonstationary reward distribution when the standard deviation of rewards is 1.

The average reward for several action-value methods on a nonstationary reward distribution when the standard deviation of rewards is 10.

The percentage of time steps on which the optimal action is selected on a nonstationary reward distribution when the standard deviation of rewards is 10.

The average reward for several action-value methods on a nonstationary reward distribution when the standard deviation of rewards is 30.

The percentage of time steps on which the optimal action is selected on a nonstationary reward distribution when the standard deviation of rewards is 30.
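
The two quantities named in the captions above, average reward and the fraction of steps on which the optimal action is chosen, could be computed along the following lines. This is only a minimal sketch and not the code behind the figures. The function name run_bandit, the epsilon-greedy selection rule, the random-walk drift of the true action values, and every default parameter value are assumptions introduced here for illustration; only the reward standard deviation (1, 10, or 30) and the nonstationary setting come from the captions.

```python
import numpy as np

def run_bandit(reward_std=1.0, n_arms=10, n_steps=1000, n_runs=200,
               epsilon=0.1, step_size=None, drift_std=0.01, seed=0):
    """Return (average reward, fraction of optimal actions) per time step.

    Hypothetical helper: all names and defaults are illustrative assumptions.
    step_size=None uses sample averages (1/n); a float uses a constant step size.
    """
    rng = np.random.default_rng(seed)
    avg_reward = np.zeros(n_steps)
    frac_optimal = np.zeros(n_steps)
    for _ in range(n_runs):
        q_true = np.zeros(n_arms)   # true action values (assumed to start equal)
        q_est = np.zeros(n_arms)    # estimated action values
        counts = np.zeros(n_arms)   # number of times each arm was pulled
        for t in range(n_steps):
            # Epsilon-greedy selection with random tie-breaking.
            if rng.random() < epsilon:
                a = int(rng.integers(n_arms))
            else:
                best = np.flatnonzero(q_est == q_est.max())
                a = int(rng.choice(best))
            # Reward noise uses the standard deviation varied across the figures.
            reward = rng.normal(q_true[a], reward_std)
            counts[a] += 1
            alpha = 1.0 / counts[a] if step_size is None else step_size
            q_est[a] += alpha * (reward - q_est[a])
            avg_reward[t] += reward
            frac_optimal[t] += (q_true[a] == q_true.max())
            # Nonstationarity (assumed form): each true value takes a random-walk step.
            q_true += rng.normal(0.0, drift_std, size=n_arms)
    return avg_reward / n_runs, frac_optimal / n_runs
```

Calling run_bandit(reward_std=10.0) and run_bandit(reward_std=10.0, step_size=0.1) would give one pair of curves for a sample-average method and one for a constant-step-size method; multiplying the second returned array by 100 gives a percentage of optimal actions.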
John Weatherwax