Why are continuous functions, especially uniformly continuous functions, so important in generalization (estimating future values)?


I am reading a book about reinforcement learning and dynamic programming, and the author states somewhere that the $Q(s,a)$ function (for those of you who are not familiar with reinforcement learning, that function gives the quality of taking action $a$ in state $s$) is piecewise linear and convex on the finite state-action space $S \times A$, and thus uniformly continuous. He then announces a very important consequence: this property enables us to "generalize" values, i.e. $Q$s, from one state-action pair to another. In other words, it makes it possible to take previously experienced action-values for some state-action pairs $(s,a)$ and approximate the action-value of any other state-action pair.

I don't understand how this property of the $Q$ function supports his claim. Can someone please explain?

Best Answer

Continuity itself is the heart of this. If a function is not continuous, then it can jump around erratically - a tiny change in the input can result in a massive change in the output. But the very definition of continuity is that small enough changes in the input will result in only small changes in the output. Thus the value of the function at one point provides an estimate of the values of the function at nearby points.

The problem is what counts as "small enough". For example, $f(x) = \frac 1x$ is continuous on the interval $(0,\infty)$. For $x$ between $99$ and $101$, $f(x)$ differs from $f(100)$ by less than $0.0002$. But for $x$ between $0.0099$ and $0.0101$, $f(x)$ may differ from $f(0.01)$ by more than $1$.
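To make this concrete, here is a small numeric check of the two claims above: perturbing the input of $f(x) = 1/x$ near $x = 100$ barely moves the output, while a far smaller perturbation near $x = 0.01$ moves it by more than $1$.

```python
# f(x) = 1/x is continuous on (0, inf) but not uniformly continuous:
# the output change caused by an input perturbation depends heavily
# on where in the domain you are.

def f(x):
    return 1.0 / x

# Near x = 100, perturbing the input by 1 changes the output only slightly.
near_100 = abs(f(99) - f(100))
print(near_100)        # well below 0.0002

# Near x = 0.01, perturbing the input by a mere 0.0001 changes the
# output by more than 1.
near_001 = abs(f(0.0099) - f(0.01))
print(near_001)        # greater than 1
```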

This is where uniform continuity comes in. If a function is uniformly continuous, then for any allowable tolerance $\epsilon$, there is some $\delta$ such that if you change the input by a distance $< \delta$, the value will change by less than $\epsilon$, no matter where you are. If you know what the $\delta$ is that works for your allowable tolerance, and can find a grid of known values such that all points are within $\delta$ of some grid point, then you can use that grid to estimate the value of any point to within $\epsilon$.
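The grid idea above can be sketched in code. As a stand-in for a uniformly continuous value function I use $\sin$, which is $1$-Lipschitz, so $\delta = \epsilon$ works everywhere; the grid spacing and interval are my own illustrative choices, not anything from the book.

```python
import math

# Sketch: estimating a uniformly continuous function anywhere on an
# interval from a grid of "previously experienced" values.
# sin is 1-Lipschitz: |sin(x) - sin(g)| <= |x - g|, so for a tolerance
# eps we may take delta = eps.

eps = 0.01
delta = eps
# Grid over [0, 10] with spacing delta, so every point of the interval
# is within delta/2 of some grid point.
grid = [i * delta for i in range(int(10 / delta) + 1)]
known = {g: math.sin(g) for g in grid}   # values observed in advance

def estimate(x):
    # Look up the value at the nearest grid point; the error is at
    # most the distance to that point, hence below eps.
    g = round(x / delta) * delta
    return known[g]

x = 7.37731
print(abs(estimate(x) - math.sin(x)) < eps)   # True: within tolerance
```

The same reasoning is what lets experienced $Q$-values at some state-action pairs bound the $Q$-values at nearby pairs.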

Likely this is what they are referring to. (I am not well-enough versed in the particular subject to read that book without a lot of work, so I cannot say for certain.)