Approximating the error function $\text{erf}(x)$ with the hyperbolic tangent function $\text{tanh}\left(\dfrac{4x}{4-x^2}\right)$
I was plotting some functions and I found that the function $$f(x) = \begin{cases} -1,\quad x\leq -2\\ 1, \quad x\geq 2 \\ \text{tanh}\left(\dfrac{4x}{4-x^2}\right),\, -2<x<2\end{cases}$$ "looks" very similar to the graph of the Error function as it is shown in Wolfram-Alpha:
However, the Wikipedia page for the error function does not list this approximation, so I guess that, despite the similarity of the plots, $f(x)$ is considered a "bad approximation".
Why is it considered a poor approximation?
Also, a simpler hyperbolic tangent could fit even better as an approximation: $$g(x) = \text{tanh}\left(\dfrac{11}{9}x\right)$$
But Wikipedia lists no relation with the hyperbolic tangent function at all, so
Why are hyperbolic tangents considered bad approximations of the error function?
Here I left the plots in Desmos:
Added later (after some answers)
After two interesting answers, I got the idea of testing the series expansion of $\tanh^{-1}(\text{erf}(x))$ shown in Wolfram-Alpha, and just the first two terms give a simple approximation that works quite well: $$f(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{(4-\pi)x^3}{3\pi}\right)\right)$$
Here you can see it in Desmos, where the maximum absolute difference is below $0.0007$. Also note that it does not need to be defined as a piecewise function.
Is this approximation good enough for approximating probabilities?
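As a rough numerical check of the $0.0007$ figure, here is a minimal Python sketch. The interval $[-6,6]$, the grid size and the helper name `max_abs_error` are arbitrary choices for illustration, so the result is only a grid estimate of the true maximum:

```python
import math

def f(x):
    """tanh of the first two terms of the series of atanh(erf(x))."""
    a = (4 - math.pi) / (3 * math.pi)
    return math.tanh(2 / math.sqrt(math.pi) * (x + a * x**3))

def max_abs_error(approx, lo=-6.0, hi=6.0, steps=12000):
    """Largest |erf(x) - approx(x)| on a uniform grid (grid choice is an assumption)."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(math.erf(x) - approx(x)))
    return worst

err_f = max_abs_error(f)
print(f"max |erf - f| on the grid: {err_f:.6f}")
```

The same helper can be reused for any other candidate by passing a different function as `approx`.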
After these first two terms the Taylor expansion starts to converge more slowly. However, by sacrificing some accuracy near $x=0$ (the function is odd, so it is enough to look at $x\geq 0$), one can find approximations with a smaller maximum absolute difference. To keep the number of terms low, I chose the following coefficient (arbitrarily, by trial and error):
$$g(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{\pi}{35}x^3\right)\right)$$
which keeps the absolute difference below $0.0005$.
I don't know how to measure whether using $g(x)$ instead of the standard Gaussian CDF would introduce too much error when computing probabilities. What do you think?
My last attempt
By trial and error I found that: $$z(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{11}{123}x^3\right)\right)$$
keeps the difference $|\text{erf}(x)-z(x)|<0.00036$. Maybe someone could find an optimal $\hat{a}$ that gives the best possible fit to $\text{erf}(x)$ of the form $\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\hat{a}x^3\right)\right)$.
I also used Wolfram-Alpha to compare $z(x)$ against the standard Gaussian distribution when computing probabilities, and the maximum error appears to be below $0.018\%$, which is quite accurate!
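The probability error can be related to the erf error directly: the standard Gaussian CDF is $\frac12\left(1+\text{erf}\left(u/\sqrt2\right)\right)$, so replacing $\text{erf}$ by $z$ should give a maximum CDF error equal to half of the maximum erf error. A small Python sketch of that check follows; the grid on $[-8,8]$ and the names `cdf_exact` and `cdf_approx` are assumptions made for illustration:

```python
import math

def z(x):
    """Trial-and-error approximation of erf(x) with cubic coefficient 11/123."""
    return math.tanh(2 / math.sqrt(math.pi) * (x + 11 / 123 * x**3))

def cdf_exact(u):
    """Standard Gaussian CDF written through erf."""
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def cdf_approx(u):
    """Same formula with erf replaced by z."""
    return 0.5 * (1 + z(u / math.sqrt(2)))

# Uniform grid on [-8, 8]; an arbitrary choice, wide enough for both tails.
grid = [-8 + 16 * i / 16000 for i in range(16001)]
max_erf_err = max(abs(math.erf(x) - z(x)) for x in grid)
max_cdf_err = max(abs(cdf_exact(u) - cdf_approx(u)) for u in grid)

print(f"max erf error: {max_erf_err:.6f}")
print(f"max CDF error: {max_cdf_err:.6f} ({100 * max_cdf_err:.4f} percentage points)")
```

Since the CDF error is an absolute error, it matters most in the tails, where the probabilities themselves are small and the relative error can be much larger.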








In my first answer, I tried to stay as close as possible to your initial attempt.
Restarting from scratch, what we have is $$\tanh ^{-1}(\text{erf}(x))=t\sum_{n=0}^\infty a_n\,t^{2n} \qquad \text{where}\qquad t=\frac{2 }{\sqrt{\pi }}x$$ where the first coefficients are $$\left( \begin{array}{cc} n & a_n \\ 0 & 1 \\ 1 & \frac{4-\pi }{12} \\ 2 & \frac{96-40 \pi +3 \pi ^2}{480} \\ 3 & \frac{5760-3360 \pi +532 \pi ^2-15 \pi ^3}{40320} \\ 4 & \frac{645120-483840 \pi +116928 \pi ^2-9328 \pi ^3+105 \pi ^4}{5806080} \\ \end{array} \right)$$
for which the errors $$R_n=| \text{erf}(x)-\tanh (S_n)|,\qquad S_n=t\sum_{k=0}^{n} a_k\,t^{2k},$$ behave for small $t$ as $$R_1\sim\frac{t^5}{8744}\qquad R_2\sim \frac{t^7}{3946}\qquad R_3\sim \frac{t^9}{45963}\qquad R_4\sim \frac{t^{11}}{1082270}$$ With (admittedly arbitrary) integration bounds, consider the norms $$\Phi_n=\int_{-\pi}^{+\pi} \Big( \text{erf}(x)-\tanh\left(S_n \right) \Big)^2\,dx$$ $$\Phi_1=6.3\times 10^{-7}\quad \Phi_2=4.4\times 10^{-7}\quad \Phi_3=1.7\times 10^{-7}\quad \Phi_4=1.5\times 10^{-9}$$
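The constants in $R_1$, $R_2$, $R_3$ are just the reciprocals of the first neglected coefficients, $1/|a_{n+1}|$, which can be checked numerically from the table above. The following Python sketch does that and also compares $R_1/t^5$ with $|a_2|$ at one small test point; the point $x=0.05$ and the helper names `S` and `R` are arbitrary choices for illustration:

```python
import math

pi = math.pi
# Coefficients a_0 .. a_4 copied from the table above.
a = {
    0: 1.0,
    1: (4 - pi) / 12,
    2: (96 - 40 * pi + 3 * pi**2) / 480,
    3: (5760 - 3360 * pi + 532 * pi**2 - 15 * pi**3) / 40320,
    4: (645120 - 483840 * pi + 116928 * pi**2
        - 9328 * pi**3 + 105 * pi**4) / 5806080,
}

def S(n, t):
    """Partial sum t * sum_{k=0}^{n} a_k t^(2k)."""
    return t * sum(a[k] * t ** (2 * k) for k in range(n + 1))

def R(n, x):
    """Absolute error |erf(x) - tanh(S_n)| with t = 2x/sqrt(pi)."""
    t = 2 / math.sqrt(pi) * x
    return abs(math.erf(x) - math.tanh(S(n, t)))

# Reciprocal of the first neglected coefficient for n = 1, 2, 3.
constants = {n: 1 / abs(a[n + 1]) for n in (1, 2, 3)}
for n, c in constants.items():
    print(f"R_{n} ~ t^{2 * n + 3} / {c:.0f}")

# Small-t check of the leading behavior of R_1.
x0 = 0.05
t0 = 2 / math.sqrt(pi) * x0
ratio = R(1, x0) / (abs(a[2]) * t0**5)
print(f"R_1 / (|a_2| t^5) at x = {x0}: {ratio:.4f}")
```

The constant for $R_4$ needs $a_5$, which is not in the table, so it is not reproduced here.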
To improve the model, adding more terms is certainly one solution. However, replacing the series by its $[2n+1,2n]$ Padé approximant $P_n$ works better. For example, $$P_1=\tanh\left(t \,\frac{a_1+(a_1^2-a_2)\,t^2 } {a_1- a_2\,t^2 }\right)$$ leads to a maximum error of $0.00053$.
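For anyone who wants to reproduce the figure for $P_1$, here is a minimal Python sketch that evaluates it on a uniform grid. The grid on $[-6,6]$ is an arbitrary choice, so the printed value is only a grid estimate of the maximum error:

```python
import math

a1 = (4 - math.pi) / 12
a2 = (96 - 40 * math.pi + 3 * math.pi**2) / 480

def pade_p1(x):
    """tanh of the [3,2] Pade approximant built from a_1 and a_2."""
    t = 2 / math.sqrt(math.pi) * x
    # a2 is negative, so the denominator a1 - a2*t^2 stays positive for real t.
    return math.tanh(t * (a1 + (a1**2 - a2) * t**2) / (a1 - a2 * t**2))

# Uniform grid on [-6, 6]; an assumption made for this check.
grid = [-6 + 12 * i / 12000 for i in range(12001)]
max_err = max(abs(math.erf(x) - pade_p1(x)) for x in grid)
print(f"max |erf - P_1| on the grid: {max_err:.6f}")
```

Note that $P_1$ uses only $a_1$ and $a_2$, the same information as the three-term series $S_2$.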
Edit
$$\Phi(\hat a)=\int_{-\infty}^{+\infty} \Bigg(\text{erf}(x)-\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\hat{a}x^3\right)\right)\Bigg)^2\, dx$$ is minimized at $\hat{a}=0.0896929$, where its value is $2.73 \times 10^{-7}$.
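This minimization can be sketched with the Python standard library alone. The version below is one possible approach, not necessarily how the value above was obtained: it assumes the integral can be truncated to $[-6,6]$ (both functions are essentially $\pm1$ beyond that), uses composite Simpson integration, and runs a golden-section search on the bracket $[0.08,0.10]$, which is itself a guess based on the trial-and-error values in the question:

```python
import math

K = 2 / math.sqrt(math.pi)

def phi(a, half_width=6.0, n=2400):
    """Composite Simpson estimate of the squared-error integral on [-half_width, half_width]."""
    h = 2 * half_width / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        d = math.erf(x) - math.tanh(K * (x + a * x**3))
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * d * d
    return total * h / 3

def golden_min(func, lo, hi, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return (lo + hi) / 2

a_hat = golden_min(phi, 0.08, 0.10)
phi_min = phi(a_hat)
print(f"a_hat = {a_hat:.7f}, Phi(a_hat) = {phi_min:.3e}")
```

A least-squares optimum like this one is not the same as the minimax optimum, so the coefficient with the smallest maximum error may differ slightly.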