Why are the graphs for average reward vs. steps so noisy?


I am reading Reinforcement Learning: An Introduction by Sutton and Barto. They have several graphs that plot either average reward or % optimal action against the number of steps for an $n$-armed bandit problem. I don't understand why these graphs are so noisy. Why should the average reward over 2000 different trials differ so much from one step to the next? Would it be smoother with more trials?

Procedure: They ran 2000 10-armed bandit tasks. The action values, $q(a)$, were chosen from a $N[0,1]$ distribution, and the reward at the $t$th time step was given by a $N[q(a),1]$ distribution.
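To make the setup concrete, here is a minimal sketch of how I understand the procedure. It assumes an $\epsilon$-greedy agent with sample-average value estimates, which is my guess at the method and not something stated above; the function names, `epsilon=0.1`, and the step counts are my own choices for illustration. Since each reward carries unit-variance noise, I would expect an average over 2000 runs to still jitter by something on the order of $1/\sqrt{2000} \approx 0.02$ from step to step, and the sketch lets me check how that changes with the number of runs.

```python
import numpy as np


def average_reward_curve(n_runs=2000, n_arms=10, n_steps=1000, epsilon=0.1, seed=0):
    """Average reward per step across independent bandit tasks.

    Assumption: epsilon-greedy with sample-average estimates (hypothetical
    choice; the agent behind the plotted curves is not specified here).
    """
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, size=(n_runs, n_arms))  # q(a) ~ N(0, 1), one task per row
    q_est = np.zeros((n_runs, n_arms))                     # current value estimates
    counts = np.zeros((n_runs, n_arms))                    # pulls per arm
    rows = np.arange(n_runs)
    avg_reward = np.empty(n_steps)

    for t in range(n_steps):
        greedy = q_est.argmax(axis=1)
        random_arm = rng.integers(n_arms, size=n_runs)
        explore = rng.random(n_runs) < epsilon
        action = np.where(explore, random_arm, greedy)

        # Reward at this step ~ N(q(a), 1), drawn independently for every task.
        reward = rng.normal(q_true[rows, action], 1.0)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[rows, action] += 1
        q_est[rows, action] += (reward - q_est[rows, action]) / counts[rows, action]

        avg_reward[t] = reward.mean()  # one point on the plotted curve

    return avg_reward


def step_to_step_jitter(curve):
    """Standard deviation of consecutive differences over the second half of the curve."""
    tail = curve[len(curve) // 2:]
    return float(np.std(np.diff(tail)))


if __name__ == "__main__":
    for n_runs in (200, 2000, 20000):
        curve = average_reward_curve(n_runs=n_runs, n_steps=500)
        print(n_runs, step_to_step_jitter(curve))
```

If the noise is just sampling error in the per-step mean, the printed jitter should shrink roughly like $1/\sqrt{\text{runs}}$, so ten times more runs would only smooth the curve by a factor of about three.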
