Most efficient way to compute products of polynomials of same degree n

152 Views Asked by At

Say we consider a finite field $GF(p)=F_p$ and define some polynomials $f,g\in F_p[x]$. Let $$f(B) := \sum\limits_{k=0}^{n} a_iB^i,\\ g(B)=\sum\limits_{k=0}^{n} b_iB^i $$ where $B$ might be a basis of any field extension or a power of 2. (In my specific case I consider $B=2^{32}$)

The aim is to compute "$f*g$" with carring "$a_ib_i > B$?" to the next state.

  • If I do the schoolbook method I will get $n^2$ products and $(n-1)^2$ sums. There will be some further sums in the "carry-case".

  • The Karatsuba method will have less multiplication but more additions.

  • The recursive Karatsuba could have less multiplication, than Kratsuba.

  • Montgomery multiplication is actually unknown to me.

  • Number-Theoretic Transform is also unknown to me.

Aims of this question

  • Determine the most powerful multiplication method for polynomials over finite fields
  • Consider a reduction polynomial and determine the most powerful multiplication in that case.
  • Learn how use this method

I know, that none of those algorithms is the best at all, but in a specific range, depending on $n$ and $p$. Say $16\geq n \geq 11$ and $p\sim2^{340}$ prime. Furthermore we could consider $p\equiv -3 \bmod 8$, if this has any effect.

So I would recommend to first determine, how many multiplications and additions have to be done, for computing the product and consider then a method.

I'm interested in learning those methods for applying in the cases n=11 and n=16 with reduction polynomial $x^{16}-2$. But we only have to consider the first case.

Edit: Schoolbook Multiplication

If I programm the multiplication, the result is allways false. It can easily be checked by sage.

So I will present you my inner logic on a small example, and if there is some more space, with a M(n)WE. [Minimal (not) Working Example :) ]

So suggest we have a number A to the base $g=2^{32}$ represented as $A=\sum^n_{k=0} a_kg^k$. Then a Product of two such numbers will lead to

$$A\cdot B = \sum\limits_{k=0}^{2n-2} g^k \sum\limits_{h=0}^k (a_h \cdot b_{k-h})$$

working example

This is my starting point. Suppose we have n=3 and do the equal ordering of coefficients. Storing it as an array in type "arr"

#include <stdint.h>
#define SOA 3
typedef uint32_t arr[SOA];

now we start to deal with the multiplication

void arr_multiplication(arr A, arr B, arr C)
{  
    /* C = A*B
        require two arrays of size of arr, to keep store the result in
        + arr array, carry_array
            + array stores the least 3 entries of 32-bit
            + carry_array stoes the first 3 entries. 
            => C    = carry_array[2] . ... . carry_array[0] . array[2] . ... . array[0]
                    = (c_5,...,c_0)
        require different sized, to deal with overflow:
        + uint64_t c, carry
        + uint32_t carry1, carry2, left, right
            + carry1 stores the overflow of 32-bit
            + carry2 stores the overflow of 64-bit
            + left and right are split-holder
        require counter
        + short i, h, j;
    */
    arr array = {0x0}, carry_array = {0x0};
    uint64_t c, carry;
    uint32_t carry1=0, carry2=0, left, right;
    short i,h,j;

    // fill "array"
    for(i=0; i < 3; i++)
    {
        h=0; 
        carry = 0;
        for(j=i; j>=0; j--)
        {
            c = (uint64_t) A[h] * B[j];     // cast for dealing with overflow.
            split32(c, &left, &right);      // handle overflow

            c = (uint64_t) right + carry1 + array[i];   // add carry and last round-storage
            carry1 = 0;                                 // reset carry1

            carry += (uint64_t) left;                   

            split32(c, &left, &right);

            carry += left;
            array[i] = right; 
            // increase and decrease the counters
            h+=1;
        }
        C[i]=array[i];
        carry += carry2;
        split32(carry, &carry2, &carry1);
    }
}

This should fill the lower cases, but it does not. A test will reveal that this goes wrong. Consider the given example-arrays.

void main()
{
    arr A = {0xd45a950a, 0xc81aad59, 0x6cf1d82e},
        B = {0xa1170e7d, 0x74fe38a9, 0x5e688b75},
        C = {0x0},
        array_sol = {0x524751e2, 0xa4732df2, 0x5b3a3eef},
        carry_sol = {0x215f0468, 0x4d2e108c, 0x282d4afa};

    arr_multiplication(A, B, C);
    print_hex(C); 
    print_hex(array_sol); 
}

split32 could be done by:

void split32(uint64_t element, uint32_t* left, uint32_t* right){
    *left = (element >> 32);
    *right = (uint32_t) element;
}

Do you see, where I did an mistake? I cant figure it out.

Fixed

Some useful sources

[NTT] http://www.apfloat.org/ntt.html