How good a function approximation can be achieved with $N$ bits?

Alice has a function $f: [0,1]\rightarrow [0,1]$ that is piecewise continuous. Bob wants to approximate $f$, but can only get $N$ bits of information. How good can his approximation $\hat{f}$ be?

The question needs some refinement, since the answer may depend on the protocol. In case 1, Bob asks Alice $N$ yes/no questions about $f$ and she answers truthfully; in case 2, he only gets to ask about function values and gets yes/no answers back (e.g. "Is $f(0.3)>0.5$?"); and in case 3, Alice gives Bob $N$ bits of information as answers to yes/no questions of her own choosing.

We also need to settle on how to measure the fit; a reasonable choice may be the norm $\int_0^1 |f(x)-\hat{f}(x)|\,dx$, but I am open to alternatives.

One upper bound on the approximation error in case 2 comes from Bob asking $N=M^2$ questions that fill in a binary grid: "Is $f(i/M)\leq j/(M+1)$?" (for $i,j$ from 1 to $M$). This produces a piecewise constant approximation whose total error can be as large as 1 for wildly varying functions: not so good. In that case just guessing $\hat{f}(x)=1/2$ is better.
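This grid scheme is easy to simulate. Below is a minimal Python sketch; the nearest-sample piecewise constant reconstruction and the two test functions are my own illustrative choices, not part of the scheme itself:

```python
import math

def grid_approximation(f, M):
    """Bob's M^2 precommitted yes/no questions: for each sample point i/M,
    the answers to "Is f(i/M) <= j/(M+1)?" for j = 1..M locate f(i/M) in
    one of M+1 bins of width 1/(M+1); Bob then uses the bin midpoint."""
    midpoints = []
    for i in range(1, M + 1):
        x = i / M
        # Number of "no" answers = index of the bin containing f(x).
        j = sum(1 for k in range(1, M + 1) if f(x) > k / (M + 1))
        midpoints.append((j + 0.5) / (M + 1))
    def fhat(x):
        i = min(max(round(x * M), 1), M)  # nearest sample point
        return midpoints[i - 1]
    return fhat

def l1_error(f, g, n=10_000):
    """Riemann-sum estimate of the integral of |f - g| over [0,1]."""
    return sum(abs(f(k / n) - g(k / n)) for k in range(n)) / n

M = 16
smooth = lambda x: x * x                        # slowly varying
wild = lambda x: 0.5 * (1 + math.sin(200 * x))  # varies faster than the grid
for name, f in [("smooth", smooth), ("wild", wild)]:
    print(name, round(l1_error(f, grid_approximation(f, M)), 3))
```

For the slowly varying function the error is small (on the order of the bin width), while the fast oscillation is aliased away entirely and the error lands near the average deviation of the oscillation itself.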

For "nice" functions that do not vary wildly, the total error will be $\approx 1/(2M)$. There Bob can instead spend $M$ questions doing interval halving for the best approximation in $[j/(M+1),(j+1)/(M+1)]$, arriving at an error scaling as $2^{-M}$: much better.
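The interval-halving step is just bisection driven by yes/no answers. A small Python illustration of this case-2 interaction (the example function and query count are arbitrary):

```python
def bisect_value(f, x, M):
    """Pin down f(x) to within 2^(-M) using M adaptive yes/no questions
    of the form "Is f(x) > mid?" (a case 2 interaction)."""
    lo, hi = 0.0, 1.0
    for _ in range(M):
        mid = (lo + hi) / 2
        if f(x) > mid:   # Alice answers truthfully
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2  # within 2^(-(M+1)) of f(x)

estimate = bisect_value(lambda x: 0.7, 0.3, 10)
print(abs(estimate - 0.7) <= 2 ** -11)  # → True
```

Each answer halves the interval known to contain $f(x)$, which is exactly where the $2^{-M}$ scaling comes from.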

In case 3, if Alice is feeling unhelpful, she can just tell Bob successive approximations to the value at a single point $x^*$, and Bob's $\hat{f}$ will presumably have to be $\hat{f}(x)=1/2$ for $x\neq x^*$ and $f(x^*)$ otherwise; the worst-case error will be $1/2$. But if she is helpful she might use the algorithm in (Konno & Kuno 1988), or something like it, to give approximations to an optimal answer. I am, however, uncertain what bounds can be placed on how good that answer would be.

In summary, it seems that the interactive questions in cases 1 and 2 are exponentially more valuable than following a pre-committed list of questions, and a helpful Alice is at least as good. But can Bob ask cleverer questions in case 1? Or do we need to place more constraints on $f$ to make the problem well-posed?


In case 1, Bob's procedure is some function from strings $s$ of $N$ bits to functions $\hat{f}_s$. Fixing any such mapping, we can WLOG make his questions "What is the $k$th binary digit of the string $s$ for which $\hat{f}_{s}$ achieves the best fit to your function?"

With this framing, it's clear that the problem is just a restriction of the general problem "Bob can precommit to a list of $n$ functions and wants to minimize the error of the best function $\hat{f}_i$ to an adversarially-selected $f$" to the case where $n$ is a power of two; I expect that looking at $n=1,2,3,\ldots$ is the more natural way to investigate the question.

However, without more restrictions on $f$ I think Bob can't learn anything, even in case 1: the idea is basically "Alice picks $0$ or $1$ at random for the value of her function on each of many small intervals". Then any specific $\hat{f}$ will have expected integral difference $\epsilon/2$ over a given interval of length $\epsilon$, so by picking enough intervals, we can make the distribution of the error so tight around $1/2$ that the probability that any of Bob's $n$ functions achieves error below, say, $0.49$ is less than 1. Hence, by the probabilistic method, some concrete choice on Alice's part must evade good approximation by Bob's specific list of $n$ functions. (The same sort of idea should work even if we require Alice's function to be continuous and differentiable, or to satisfy a lot of other "niceness" conditions - we just make the transitions between the $0$ regions and $1$ regions very steep.)
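The concentration step in this argument can be checked numerically. Here is a rough Monte Carlo sketch; the fixed ramp $\hat{f}$, the grid size, and the trial count are arbitrary illustrative choices:

```python
import random

def error_against_coinflips(fhat_vals, trials):
    """Alice sets f to an independent fair coin flip (0 or 1) on each of
    K = len(fhat_vals) intervals of width 1/K; returns the min, mean and
    max over `trials` draws of the L1 error of the fixed fhat."""
    K = len(fhat_vals)
    errs = []
    for _ in range(trials):
        f = [random.randint(0, 1) for _ in range(K)]
        errs.append(sum(abs(fi - v) for fi, v in zip(f, fhat_vals)) / K)
    return min(errs), sum(errs) / len(errs), max(errs)

random.seed(0)
K = 2000
fhat = [i / K for i in range(K)]  # any fixed fhat; a ramp as an example
lo, mean, hi = error_against_coinflips(fhat, trials=500)
print(round(lo, 3), round(mean, 3), round(hi, 3))  # all close to 0.5
```

Even the best draw out of 500 stays within a few thousandths of $1/2$, which is the tightness the probabilistic argument exploits: with enough intervals, no fixed $\hat{f}$ on Bob's list can beat $1/2$ by a noticeable margin against every choice of Alice's.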

I think a natural restriction on Alice to make this problem interesting would be to require that $f$ be differentiable, and bound the absolute value of the derivative of $f$. (Or perhaps relax the differentiability constraint and just say that $\left|\frac{f(y)-f(x)}{y-x}\right|<c$ for some $c>0$.)

Under these restrictions, one can make nontrivial improvements: when $n=2$ and $c=1$, taking $\hat{f}_1=1/3$ and $\hat{f}_2=2/3$ guarantees an error of at most $4/9$, though I think this isn't tight even for this specific choice of $\hat{f}_1$ and $\hat{f}_2$.
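As a rough numerical sanity check of this bound (not a proof), one can sample random $1$-Lipschitz functions and verify that the better of the two constants always comes within $4/9$; the piecewise-linear construction below is just one convenient way to generate test functions:

```python
import random

def random_lipschitz(n=1001, pieces=8, c=1.0):
    """A random c-Lipschitz function on an n-point grid over [0,1]:
    a fixed random slope in [-c, c] on each of `pieces` equal
    subintervals, clipped to [0,1] (clipping preserves the Lipschitz bound)."""
    slopes = [random.uniform(-c, c) for _ in range(pieces)]
    vals = [random.random()]
    for i in range(1, n):
        s = slopes[(i - 1) * pieces // (n - 1)]
        vals.append(min(1.0, max(0.0, vals[-1] + s / (n - 1))))
    return vals

random.seed(2)
worst = 0.0
for _ in range(500):
    vals = random_lipschitz()
    e1 = sum(abs(v - 1 / 3) for v in vals) / len(vals)
    e2 = sum(abs(v - 2 / 3) for v in vals) / len(vals)
    worst = max(worst, min(e1, e2))
print(worst <= 4 / 9)  # no counterexample found among these samples
```

Random samples like these are far from extremal, so this only rules out gross failures of the bound; the observed worst case also stays well below $4/9$, consistent with the bound not being tight.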