tanh implementations for FPGA neural nets


In trying to put a neural network on my FPGA, I am running into the problem of describing my activation function. On my computer, I typically use the well-known tanh, so I am inclined to stick to my choice on the board too. However, I can't seem to find the best way to calculate tanh on an FPGA, so my question is:

Given a signed 16-bit word (with 8 bits of fraction length), what's the easiest way to implement the tanh function with reasonable accuracy?

Keep in mind that the target is an FPGA, so things like multiplications are fine (as long as I can keep the word length from growing too fast), but divisions are expensive and I would like to avoid them as much as possible. Also, the output word length can be optimized: since the range of tanh is (-1, 1), I can devote all but two bits of the word to the fractional part. And by reasonable accuracy, I mean correct to at least 5 decimal places.

The options I have researched already are:

1) Look-up tables: These need no explanation, I am sure.

2) CORDIC: I was able to write a tanh CORDIC implementation using details from Walther's paper, though I do not fully understand the 'convergence' of this algorithm or how to test for it (my implementation returns the right answers only for input values below about 1.13 and breaks down beyond that; where's the problem?)

3) Minimax approximation: This requires a division, and more importantly the input argument needs to be raised to powers up to n (the degree of the polynomial), which I think will cause problems in a fixed-point format like mine.

Are there other computationally cheap ways of calculating this function? There's plenty of literature out there on this very subject, using everything from stochastic machines to DCT interpolation filters, but none have clear details that I could understand.

Thanks in advance!

2 Answers

Answer 1

The best approach turns out to be breaking the input into a small number of ranges and using a Padé approximant on each. This method requires one division.

For example, when $|z| < 2$ use $$ \tanh(z) \approx z\,\frac{945+105 z^2 + z^4}{15(63+28z^2+z^4)}, $$ while when $2 \leq z < 4.5$ use $$ \tanh(z) \approx \frac{26.648 + 22.955(z-3) + 7.5759 (z-3)^2 + 1.0202(z-3)^3-0.002523(z-3)^4}{26.781+22.804(z-3)+7.6516(z-3)^2 + (z-3)^3}\\= \frac{0.995055 + 0.857143(z-3) +0.282885(z-3)^2+0.038095(z-3)^3-0.000094(z-3)^4}{1+0.85149(z-3)+0.285714(z-3)^2+0.03734(z-3)^3} $$ and when $4.5 \leq z$ use $$ \tanh(z) \approx \frac{15 + 15.0002(z-6) + 6 (z-6)^2 + 1.0001(z-6)^3}{15.0002+15(z-6)+6.00007(z-6)^2 + (z-6)^3}. $$ (For negative arguments, use the fact that $\tanh$ is odd.) For $z > 7$ I would just set $\tanh(z) = 1$, although the last expression works well all the way out to about $z = 200$.

Of course, to get good accuracy in that last range, you should handle the tiny deviations of the coefficients from integers (the $0.0002$ in $15.0002$, for example) separately.

You would have to adjust where your fixed point lies depending on which range you are in, to get good enough accuracy in your constants; or you can normalize to put your constants in your preferred range.
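As an illustration, here is a floating-point sketch of the piecewise scheme above; the function name `tanh_pade` and the cutoff at $|z| = 7$ are mine. Evaluating numerator and denominator by Horner's rule means no power above $z^2$ is ever formed explicitly, and only one division is needed per evaluation.

```c
#include <math.h>

/* Piecewise Pade approximation of tanh, one division per evaluation.
   Coefficients are the ones given above; Horner evaluation limits
   intermediate word growth in a fixed-point implementation. */
double tanh_pade (double z)
{
    double a = fabs (z), r;
    if (a < 2.0) {
        double z2 = a * a;
        r = a * (945.0 + z2 * (105.0 + z2))
              / (15.0 * (63.0 + z2 * (28.0 + z2)));
    } else if (a < 4.5) {
        double u = a - 3.0;
        double num = 26.648 + u * (22.955 + u * (7.5759 + u * (1.0202 - u * 0.002523)));
        double den = 26.781 + u * (22.804 + u * (7.6516 + u));
        r = num / den;
    } else if (a < 7.0) {
        double u = a - 6.0;
        double num = 15.0 + u * (15.0002 + u * (6.0 + u * 1.0001));
        double den = 15.0002 + u * (15.0 + u * (6.00007 + u));
        r = num / den;
    } else {
        r = 1.0;  /* tanh(7) differs from 1 by less than 2e-6 */
    }
    return (z < 0.0) ? -r : r;  /* tanh is odd */
}
```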

Answer 2

Asker mentioned in comments that the FPGA has 36 kbits of RAM available for table storage. If the output format is S1.16 fixed-point, it is only necessary to store 1598 16-bit table entries to cover those results that are not $1$ to within the precision of the output format. With this simple straight lookup a maximum error of $7.63 \cdot 10^{-6}$ is achieved, roughly in line with the asker's expectations. The required table storage is 25568 bits, well within the hardware limitations.

I coded a quick prototype of this approach in a simple ISO-C program:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

typedef int32_t int18_t; /* C doesn't provide an 18-bit type, fake it here */

uint16_t tab[0x63e]; // 1598 * 16 bits = 25568 bits

/* receives data in s7.8 fixed point, returns result in s1.16 fixed-point */
int18_t fxtanh (int16_t x)
{
    int18_t r;
    uint16_t absx = abs (x);

    r = (absx > 0x63d) ? (0x10000) : ((int18_t) tab [absx]);
    r = (x < 0) ? (-r) : (r);
    return r;
}

int main (void)
{
    int i;
    int18_t res;
    double resf, ref, err, maxerr = 0;
    for (i = 0; i < 0x63e; i++) {
        tab[i] = (int)((tanh (i / 256.0) * 65536.0 + 0.5));
    }

    for (i = -0x8000; i <= 0x7fff; i++) {
        res = fxtanh ((int16_t)i);
        resf = (double)res / 65536.0;
        ref = tanh (i / 256.0);
        err = fabs (resf - ref);
        if (err > maxerr) maxerr = err;
    }
    printf ("maxerr = %15.8e\n", maxerr);
    return EXIT_SUCCESS;
}

Two other alternatives I examined, which compute $\mathrm{tanh}(x) = 1 - \frac{2}{\exp(2x) +1}$, do not achieve the accuracy desired by the asker. The first uses quadratic interpolation to compute $2^{x}$ from a table of 130 32-bit entries, returning a 16.16 fixed-point result. The maximum absolute error in $\mathrm{tanh}$ with this method is $3.2 \cdot 10^{-5}$. The second computes $\exp$ as a 16.16 fixed-point number using Meggitt's pseudo-multiplication process, requiring 20 table entries. The maximum absolute error in $\mathrm{tanh}$ is $3.51 \cdot 10^{-5}$ with this method.