Why does this approximation constant work?


I found an algorithm that used an approximation of $\log_2(x + 1)$ from $0$ to $1$ which simply followed the line $y = x + k$ where $k$ was some constant they discovered to be something like $0.043$. I wondered where this constant came from, so I made a Desmos page to experiment, and I graphically figured out what it represents.

The Desmos page

My idea was that I'd have to take the integral of the absolute value of the difference between the two curves and then minimize that average height with respect to $k$. This did not yield the desired $0.043$ that the algorithm used. However, $0.043$ seems to be the value of $k$ where the error is spread out the most "evenly"; another way I could put it is that the maximum error is minimized (notice how the maxima all match up). Is there a statistical or algebraic way of expressing this? I'm just a high school student, and I'd love to know how to express my findings better. Cheers.


There are 4 answers below.

BEST ANSWER

Let me explain my comments.

Let’s call $\alpha:= \log_2(1/\ln(2))-1/\ln(2)+1$. In this answer, we want to find $k\in\mathbb R$ so that the following quantity is minimized: $$ \max_{[0,1]} |\log_2(x+1)-(x+k)|=:err(k) $$ (the second quantity in RobPratt’s answer), which I called the maximum error, and which I believe is the quantity that is being minimized in the algorithm mentioned by the OP.

By definition of absolute value, and by some properties of $\max$ and $\min$, we have that $$ err(k)=\max\left\{\;\max_{[0,1]} \log_2(x+1)-(x+k)\;\;,\;\; \max_{[0,1]} -\left(\log_2(x+1)-(x+k)\right)\;\right\}=$$ $$ =\max\left\{M-k,-m+k\right\}, $$ where $M$ and $m$ are respectively the maximum and minimum values of the function $f(x)=\log_2(x+1)-x$.

Now one can use some Analysis, computing the derivative of the above function and studying its graph (I will skip this as I think it should not be difficult), to conclude that $$ M=\alpha\qquad\text{and}\qquad m=0. $$ It follows that $$ err(k)=\max\{\alpha-k,k\}=|k-\alpha/2|+\alpha/2 $$ achieves its minimum value when $k=\alpha/2$, meaning $$err(\alpha/2)=\alpha/2.$$ As I mentioned, this value is approximately $\alpha/2= 0.04303566$, pretty close to the number $0.043$ in the question.
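This is easy to check numerically. The sketch below (my own, not part of the original algorithm) evaluates $err(k)$ on a fine grid and confirms that $k=\alpha/2$ both attains and locally minimizes the maximum error:

```python
import math

# alpha = log2(1/ln 2) - 1/ln 2 + 1, the maximum of f(x) = log2(x+1) - x on [0, 1]
alpha = math.log2(1 / math.log(2)) - 1 / math.log(2) + 1

def err(k, n=100_000):
    """Grid approximation of max over [0,1] of |log2(x+1) - (x + k)|."""
    return max(abs(math.log2(i / n + 1) - (i / n + k)) for i in range(n + 1))

k = alpha / 2
print(k)          # ~0.0430357, the constant from the algorithm
print(err(k))     # ~0.0430357 as well: err(alpha/2) = alpha/2
print(err(k) < err(k - 0.01) and err(k) < err(k + 0.01))  # local minimality
```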

Hence the division by $2$. It’s essentially because of the absolute value. If one looks at the graph, I think one can figure out why we have to choose $k$ to be half of $err(0)$.


Your first idea corresponds to finding $k\in\mathbb{R}$ that minimizes $$\int_0^1 \left|\log_2(x+1)-(x+k)\right| \mathrm{d}x.$$ Your second idea corresponds to finding $k\in\mathbb{R}$ that minimizes $$\max_{x\in[0,1]} \left|\log_2(x+1)-(x+k)\right|.$$
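For anyone who wants to see the two criteria side by side, here is a small brute-force comparison (my own sketch, using a coarse grid scan rather than exact minimization):

```python
import math

# f(x) = log2(x+1) - x sampled on a grid over [0, 1]
xs = [i / 2000 for i in range(2001)]
fs = [math.log2(x + 1) - x for x in xs]

def avg_err(k):   # discretized version of the integral (L1) criterion
    return sum(abs(v - k) for v in fs) / len(fs)

def max_err(k):   # discretized version of the minimax criterion
    return max(abs(v - k) for v in fs)

ks = [i / 1000 for i in range(101)]   # scan k = 0.000, 0.001, ..., 0.100
k_avg = min(ks, key=avg_err)
k_max = min(ks, key=max_err)
print(k_avg, k_max)   # roughly 0.064 and 0.043: the two ideas give different constants
```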


Starting from @RobPratt's answer, consider the function $$\Phi(k)=\int_0^1 \left|\log_2(x+1)-(x+k)\right| \,dx$$ which is not very pleasant to work with analytically.

Using numerical integration for a given value of $k$, we can generate the following table $$\left( \begin{array}{cc} k & \Phi(k)\\ 0.00 & 0.0573050 \\ 0.01 & 0.0479013 \\ 0.02 & 0.0397416 \\ 0.03 & 0.0329156 \\ 0.04 & 0.0275365 \\ 0.05 & 0.0237532 \\ 0.06 & 0.0217750 \\ 0.07 & 0.0219273 \\ 0.08 & 0.0248382 \\ 0.09 & 0.0326950 \\ 0.10 & 0.0426950 \\ \end{array} \right)$$ Zooming or interpolating, we should obtain $k_{\text{opt}}=0.0644465$ for which $\Phi=0.0215533$.
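There is also a way to pin down this $k_{\text{opt}}$ without a table: for an $L^1$ objective of this form, the optimal constant shift is the median of $f(x)=\log_2(x+1)-x$ over $[0,1]$ (a standard fact about minimizing mean absolute deviation; this observation is mine, not from the answer above). A quick check:

```python
import math
import statistics

# Sample f(x) = log2(x+1) - x on a fine uniform grid over [0, 1];
# the median of these samples approximates the L1-optimal shift k.
fs = [math.log2(i / 100_000 + 1) - i / 100_000 for i in range(100_001)]
print(statistics.median(fs))   # ~0.06445, matching k_opt = 0.0644465
```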

There is another possibility which is equivalent to a linear regression based on an infinite number of data points. Minimize $$\Psi(k)=\int_0^1 \big[\log_2(x+1)-(x+k)\big]^2\,dx$$ $$\Psi(k)=k^2+k \left(\frac{2}{\log (2)}-3\right)+\left(\frac{7}{3}+\frac{2}{\log ^2(2)}-\frac{9}{\log (4)}\right)$$ Differentiating $$k_{\text{opt}}=\frac{3}{2}-\frac{1}{\log (2)}=0.057305\quad \implies\quad \Psi=\frac{1}{12}+\frac{1}{\log ^2(2)}-\frac{3}{\log (4)}=0.000659753$$
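The closed form above can be sanity-checked with a direct quadrature of $\Psi$ (a midpoint rule here; the sketch is mine, not part of the answer):

```python
import math

ln2 = math.log(2)
k_opt = 1.5 - 1 / ln2          # the closed-form minimizer above

def Psi(k, n=50_000):
    """Midpoint-rule approximation of the squared-error integral."""
    return sum((math.log2((i + 0.5) / n + 1) - ((i + 0.5) / n + k)) ** 2
               for i in range(n)) / n

print(k_opt)        # ~0.057305
print(Psi(k_opt))   # ~0.000659753
print(Psi(k_opt) < Psi(k_opt - 0.01) and Psi(k_opt) < Psi(k_opt + 0.01))
```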

We could even make a small improvement considering $$\Psi(a,b)=\int_0^1 \big[\log_2(x+1)-(ax+b)\big]^2\,dx$$ $$\Psi(a,b)=\frac{a^2}{3}+a \left(b-\frac{1}{2 \log (2)}\right)+b^2+2 b \left(\frac{1}{\log (2)}-2\right)+\frac{2 (\log (2)-1)^2}{\log ^2(2)}$$ Differentiating with respect to $a$ and $b$ gives $$a=\frac{9}{\log (2)}-12=0.984255\quad \text{and} \quad b=8-\frac{11}{2 \log (2)}=0.0651773$$ giving $$\Psi=\frac{72 \log (2)-23}{4 \log ^2(2)}-14=0.000639095$$ which is a marginal improvement.
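Again, one can verify the two-parameter formulas numerically by solving the $2\times 2$ normal equations with quadrature (my own sketch; the exact integrals behind it are the same ones used in the answer):

```python
import math

N = 50_000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]   # midpoint-rule nodes on [0, 1]

def integral(g):
    return sum(g(x) for x in mid) * h

f = lambda x: math.log2(x + 1)
# Normal equations for min over (a, b) of  ∫ (f(x) - a x - b)^2 dx:
#   a ∫x^2 + b ∫x = ∫ x f(x),    a ∫x + b ∫1 = ∫ f(x)   (and ∫1 = 1)
Sxx, Sx = integral(lambda x: x * x), integral(lambda x: x)
Sxf, Sf = integral(lambda x: x * f(x)), integral(f)
det = Sxx - Sx * Sx
a = (Sxf - Sx * Sf) / det
b = (Sxx * Sf - Sx * Sxf) / det
print(a, b)   # ~0.984255 and ~0.0651773, matching 9/log(2) - 12 and 8 - 11/(2 log(2))
```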


I prefer to add a separate answer for @RobPratt's second case.

After @LorenzoPompili's comments

$$F(x,k)=\left|\log_2(x+1)-(x+k)\right|$$ As I wrote in the comments, the points where $\log_2(x+1)=x+k$ are

$$x_{\text{low}}=-1-\frac{W_{-1}\left(-2^{k-1} \log (2)\right)}{\log (2)}\qquad \text{and}\qquad x_{\text{high}}=-1-\frac{W_0\left(-2^{k-1} \log (2)\right)}{\log (2)}$$ where $W_0$ and $W_{-1}$ are the two real branches of the Lambert $W$ function. The solution of $x_{\text{low}}=x_{\text{high}}$ is $$k=1-\frac{1}{\log (2)}+\frac{\log \left(1+\frac{1-\log (2)}{\log (2)}\right)}{\log (2)}$$ which is exactly twice the famous number. $$F\left(\frac{1}{\log (2)}-1,1-\frac{1}{\log (2)}+\frac{\log \left(1+\frac{1-\log (2)}{\log (2)}\right)}{\log (2)}\right)=0$$ Why the factor of $2$?
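One can check numerically, without evaluating the Lambert $W$ function at all, that this $k$ makes the line tangent to the curve at $x=1/\log(2)-1$, which is why the two intersection points coincide there (a sketch of my own):

```python
import math

ln2 = math.log(2)
k = 1 - 1 / ln2 + math.log(1 + (1 - ln2) / ln2) / ln2   # the value derived above
x_star = 1 / ln2 - 1            # where x_low and x_high coincide

gap = abs(math.log2(x_star + 1) - (x_star + k))   # F(x_star, k)
slope = 1 / ((x_star + 1) * ln2)                  # derivative of log2(x+1) at x_star
print(k / 2)    # ~0.0430357: half of this k is the famous constant
print(gap)      # ~0: the line x + k touches the curve at x_star
print(slope)    # ~1: same slope as the line, i.e. a tangency, not a crossing
```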