Why does this approximation constant work?


I found an algorithm that used an approximation of $\log_2(x + 1)$ from $0$ to $1$ which simply followed the line $y = x + k$ where $k$ was some constant they discovered to be something like $0.043$. I wondered where this constant came from, so I made a Desmos page to experiment, and I graphically figured out what it represents.

The Desmos page

My idea was that I'd have to take the integral of the absolute value of the difference between the two curves and then minimize that average height with respect to $k$. This did not yield the desired $0.043$ that the algorithm used. However, $0.043$ seems to be the value of $k$ where the error is spread out the most "evenly"; another way I could put it is that the maximum error is minimized (notice how the maxima all match up). Is there a statistical or algebraic way of expressing this? I'm just a high school student, and I'd love to know how to express my findings better. Cheers.


There are 4 answers below.

BEST ANSWER

Let me explain my comments.

Let’s call $\alpha:= \log_2(1/\ln(2))-1/\ln(2)+1$. In this answer, we want to find $k\in\mathbb R$ so that the following quantity is minimized: $$ \max_{[0,1]} |\log_2(x+1)-(x+k)|=:err(k) $$ (the second quantity in RobPratt’s answer), which I called the maximum error, and which I believe is the quantity that is being minimized in the algorithm mentioned by the OP.

By definition of absolute value, and by some properties of $\max$ and $\min$, we have that $$ err(k)=\max\left\{\;\max_{[0,1]} \log_2(x+1)-(x+k)\;\;,\;\; \max_{[0,1]} -\left(\log_2(x+1)-(x+k)\right)\;\right\}=$$ $$ =\max\left\{M-k,-m+k\right\}, $$ where $M$ and $m$ are respectively the maximum and minimum values of the function $f(x)=\log_2(x+1)-x$.

Now one can use some Analysis, computing the derivative of the above function and studying its graph (I will skip this as I think it should not be difficult), to conclude that $$ M=\alpha\qquad\text{and}\qquad m=0. $$ It follows that $$ err(k)=\max\{\alpha-k,k\}=|k-\alpha/2|+\alpha/2 $$ achieves its minimum value when $k=\alpha/2$, meaning $$err(\alpha/2)=\alpha/2.$$ As I mentioned, this value is approximately $\alpha/2= 0.04303566$, pretty close to the number $0.043$ in the question.
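This is easy to check numerically. The sketch below (my own, not part of the original algorithm) evaluates $err(k)$ on a fine grid and confirms that $k=\alpha/2$ both attains and locally minimizes the maximum error:

```python
import math

# alpha = log2(1/ln 2) - 1/ln 2 + 1, the maximum of f(x) = log2(x+1) - x on [0, 1]
alpha = math.log2(1 / math.log(2)) - 1 / math.log(2) + 1

def err(k, n=100_000):
    """Grid approximation of max over [0,1] of |log2(x+1) - (x + k)|."""
    return max(abs(math.log2(i / n + 1) - (i / n + k)) for i in range(n + 1))

k = alpha / 2
print(k)          # ~0.0430357, the constant from the algorithm
print(err(k))     # ~0.0430357 as well: err(alpha/2) = alpha/2
print(err(k) < err(k - 0.01) and err(k) < err(k + 0.01))  # local minimality
```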

Hence the division by $2$. It’s essentially because of the absolute value. If one looks at the graph, I think one can figure out why we have to choose $k$ to be half of $err(0)$.


Your first idea corresponds to finding $k\in\mathbb{R}$ that minimizes $$\int_0^1 \left|\log_2(x+1)-(x+k)\right| \mathrm{d}x.$$ Your second idea corresponds to finding $k\in\mathbb{R}$ that minimizes $$\max_{x\in[0,1]} \left|\log_2(x+1)-(x+k)\right|.$$
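For anyone who wants to see the two criteria side by side, here is a small brute-force comparison (my own sketch, using a coarse grid scan rather than exact minimization):

```python
import math

# f(x) = log2(x+1) - x sampled on a grid over [0, 1]
xs = [i / 2000 for i in range(2001)]
fs = [math.log2(x + 1) - x for x in xs]

def avg_err(k):   # discretized version of the integral (L1) criterion
    return sum(abs(v - k) for v in fs) / len(fs)

def max_err(k):   # discretized version of the minimax criterion
    return max(abs(v - k) for v in fs)

ks = [i / 1000 for i in range(101)]   # scan k = 0.000, 0.001, ..., 0.100
k_avg = min(ks, key=avg_err)
k_max = min(ks, key=max_err)
print(k_avg, k_max)   # roughly 0.064 and 0.043: the two ideas give different constants
```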


Starting from @RobPratt's answer, consider the function $$\Phi(k)=\int_0^1 \left|\log_2(x+1)-(x+k)\right| \,dx$$ which is not very pleasant to work with analytically.

Using numerical integration for a given value of $k$, we can generate the following table $$\left( \begin{array}{cc} k & \Phi(k)\\ 0.00 & 0.0573050 \\ 0.01 & 0.0479013 \\ 0.02 & 0.0397416 \\ 0.03 & 0.0329156 \\ 0.04 & 0.0275365 \\ 0.05 & 0.0237532 \\ 0.06 & 0.0217750 \\ 0.07 & 0.0219273 \\ 0.08 & 0.0248382 \\ 0.09 & 0.0326950 \\ 0.10 & 0.0426950 \\ \end{array} \right)$$ Zooming or interpolating, we should obtain $k_{\text{opt}}=0.0644465$ for which $\Phi=0.0215533$.
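There is also a way to pin down this $k_{\text{opt}}$ without a table: for an $L^1$ objective of this form, the optimal constant shift is the median of $f(x)=\log_2(x+1)-x$ over $[0,1]$ (a standard fact about minimizing mean absolute deviation; this observation is mine, not from the answer above). A quick check:

```python
import math
import statistics

# Sample f(x) = log2(x+1) - x on a fine uniform grid over [0, 1];
# the median of these samples approximates the L1-optimal shift k.
fs = [math.log2(i / 100_000 + 1) - i / 100_000 for i in range(100_001)]
print(statistics.median(fs))   # ~0.06445, matching k_opt = 0.0644465
```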

There is another possibility which is equivalent to a linear regression based on an infinite number of data points. Minimize $$\Psi(k)=\int_0^1 \big[\log_2(x+1)-(x+k)\big]^2\,dx$$ $$\Psi(k)=k^2+k \left(\frac{2}{\log (2)}-3\right)+\left(\frac{7}{3}+\frac{2}{\log ^2(2)}-\frac{9}{\log (4)}\right)$$ Differentiating $$k_{\text{opt}}=\frac{3}{2}-\frac{1}{\log (2)}=0.057305\quad \implies\quad \Psi=\frac{1}{12}+\frac{1}{\log ^2(2)}-\frac{3}{\log (4)}=0.000659753$$
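The closed form above can be sanity-checked with a direct quadrature of $\Psi$ (a midpoint rule here; the sketch is mine, not part of the answer):

```python
import math

ln2 = math.log(2)
k_opt = 1.5 - 1 / ln2          # the closed-form minimizer above

def Psi(k, n=50_000):
    """Midpoint-rule approximation of the squared-error integral."""
    return sum((math.log2((i + 0.5) / n + 1) - ((i + 0.5) / n + k)) ** 2
               for i in range(n)) / n

print(k_opt)        # ~0.057305
print(Psi(k_opt))   # ~0.000659753
print(Psi(k_opt) < Psi(k_opt - 0.01) and Psi(k_opt) < Psi(k_opt + 0.01))
```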

We could even make a small improvement considering $$\Psi(a,b)=\int_0^1 \big[\log_2(x+1)-(ax+b)\big]^2\,dx$$ $$\Psi(a,b)=\frac{a^2}{3}+a \left(b-\frac{1}{2 \log (2)}\right)+b^2+2 b \left(\frac{1}{\log (2)}-2\right)+\frac{2 (\log (2)-1)^2}{\log ^2(2)}$$ Differentiating with respect to $a$ and $b$ gives $$a=\frac{9}{\log (2)}-12=0.984255\quad \text{and} \quad b=8-\frac{11}{2 \log (2)}=0.0651773$$ giving $$\Psi=\frac{72 \log (2)-23}{4 \log ^2(2)}-14=0.000639095$$ which is a marginal improvement.
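Again, one can verify the two-parameter formulas numerically by solving the $2\times 2$ normal equations with quadrature (my own sketch; the exact integrals behind it are the same ones used in the answer):

```python
import math

N = 50_000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]   # midpoint-rule nodes on [0, 1]

def integral(g):
    return sum(g(x) for x in mid) * h

f = lambda x: math.log2(x + 1)
# Normal equations for min over (a, b) of  ∫ (f(x) - a x - b)^2 dx:
#   a ∫x^2 + b ∫x = ∫ x f(x),    a ∫x + b ∫1 = ∫ f(x)   (and ∫1 = 1)
Sxx, Sx = integral(lambda x: x * x), integral(lambda x: x)
Sxf, Sf = integral(lambda x: x * f(x)), integral(f)
det = Sxx - Sx * Sx
a = (Sxf - Sx * Sf) / det
b = (Sxx * Sf - Sx * Sxf) / det
print(a, b)   # ~0.984255 and ~0.0651773, matching 9/log(2) - 12 and 8 - 11/(2 log(2))
```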


I prefer to add a separate answer for @RobPratt's second case.

After @LorenzoPompili's comments

$$F(x,k)=\left|\log_2(x+1)-(x+k)\right|$$ As I wrote in the comments, the points where $\log_2(x+1)=x+k$ are

$$x_{\text{low}}=-1-\frac{W_{-1}\left(-2^{k-1} \log (2)\right)}{\log (2)}\qquad \text{and}\qquad x_{\text{high}}=-1-\frac{W_0\left(-2^{k-1} \log (2)\right)}{\log (2)}$$ where $W_0$ and $W_{-1}$ are the two real branches of the Lambert $W$ function. The solution of $x_{\text{low}}=x_{\text{high}}$ is $$k=1-\frac{1}{\log (2)}+\frac{\log \left(1+\frac{1-\log (2)}{\log (2)}\right)}{\log (2)}$$ which is exactly twice the famous number. $$F\left(\frac{1}{\log (2)}-1,1-\frac{1}{\log (2)}+\frac{\log \left(1+\frac{1-\log (2)}{\log (2)}\right)}{\log (2)}\right)=0$$ Why the factor of $2$?
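One can check numerically, without evaluating the Lambert $W$ function at all, that this $k$ makes the line tangent to the curve at $x=1/\log(2)-1$, which is why the two intersection points coincide there (a sketch of my own):

```python
import math

ln2 = math.log(2)
k = 1 - 1 / ln2 + math.log(1 + (1 - ln2) / ln2) / ln2   # the value derived above
x_star = 1 / ln2 - 1            # where x_low and x_high coincide

gap = abs(math.log2(x_star + 1) - (x_star + k))   # F(x_star, k)
slope = 1 / ((x_star + 1) * ln2)                  # derivative of log2(x+1) at x_star
print(k / 2)    # ~0.0430357: half of this k is the famous constant
print(gap)      # ~0: the line x + k touches the curve at x_star
print(slope)    # ~1: same slope as the line, i.e. a tangency, not a crossing
```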