This is kind of a signal processing/programming/mathematics crossover question. At the moment it seems more math-related to me, but if the moderators feel it belongs elsewhere please feel free to migrate it.
I'm working on a project where I have limited computational power and need to make a speedy approximation of the hyperbolic tangent function over a fairly large range of input arguments. Assuming the numbers are stored in fixed point with an 8 bit fractional part then the approximation to $\tanh(x)$ should work to the limit implied by the resolution, or for arguments $\tanh^{-1}(\pm[1 - \frac{1}{2^8}]) \approx \pm3.1$. I know that Taylor series will not converge fast enough over this range, and I haven't fully looked into it but I don't believe that a Chebyshev polynomial approximation will converge rapidly enough either. For various reasons a table lookup is also not possible.
$\tanh(x)$ is algebraically equivalent to $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$. This looks promising; a series expansion of $e^x$ converges better than $\tanh(x)$. For a rapid numerical calculation on limited hardware the " + 1" in the denominator throws a wrench into the works - it means a division has to be carried out, and since many simple processors don't have "hardware divide" the division has to be done in software through repeated shifts and subtractions, which is achingly slow.
To get around this I'm considering the following approach. Instead of using $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$ to exactly represent tanh(x), define the following similar type of function: $\mathrm{sgn}(x)A(1 - e^{-B|x|})$ for unknown constants A and B. Then define a loss function:
$f(x) = \int_0^{3.1}[\tanh(x) - A(1 - e^{-Bx})]^2dx$
Then use a numerical method (gradient descent?) to find the values of A and B that minimize the loss. The problem of approximating $\tanh(x)$ rapidly is then converted into approximating $e^{-x}$ rapidly plus an extra subtraction and multiplication.
$e^{-x} = \dfrac{1}{e^x} = \dfrac{1}{2^{x \log_2(e)}} = \dfrac{2^{-x}}{2^y}$, where y is the integer part and x is the fractional part. Expressing $e^{-x}$ this way lets me reduce the range of the input argument; dividing an unsigned number by $2^y$ where y is a positive integer can be accomplished by logical shifting. For arguments between 0 and -1 just eyeballing a Taylor series expansion of $2^x$ looks pretty good for just 3 terms.
Does this approach seem reasonable, from a mathematical perspective? Pitfalls I haven't considered? Thanks for any advice.
Edit: I should add that in this application (audio processing) I'm willing to trade off absolute accuracy of the calculated value in exchange for speed of processing and a function that works "reasonably" well over the entire range of input arguments - the return value for every argument in the approximation doesn't have to be accurate to the $2^{-8}$ limit implied by the resolution.

You are certainly aware that the hyperbolic tangent has an asymptote; since no polynomial ever had a horizontal asymptote, it stands to reason that a polynomial will always poorly approximate the qualitative behavior of the hyperbolic tangent.
One viable approach would be to consider the rational functions that come from truncations of the continued fraction for the hyperbolic tangent:
$$\tanh\,z=\cfrac{z}{1+\cfrac{z^2}{3+\cfrac{z^2}{5+\cdots}}}$$
Here for instance are plots comparing $\tanh\,z$ and the convergent
$$R(z)=\cfrac{z}{1+\cfrac{z^2}{3+\cfrac{z^2}{5+\cfrac{z^2}{7+\cfrac{z^2}{9+\cfrac{z^2}{11}}}}}}$$
The left plot shows that the rational approximant and the actual function are almost visually indistinguishable, while the right plot depicts the function $\tanh\,z-R(z)$.
One other possibility you can use in conjunction with rational function approximation is the use of argument reduction; in particular, the identity
$$\tanh\,z=\frac{2\tanh\frac{z}{2}}{1+\tanh^2\frac{z}{2}}$$
is very useful here. To sketch a possible algorithm: scale $z$ by $2^{-k}$, where $k$ is appropriately chosen such that $\dfrac{z}{2^k}$ is "small enough", evaluate the truncated continued fraction at this reduced argument, and then repeatedly apply the double argument identity $k$ times.