Arithmetic in GF$(2^{32})$ using GF$(2^{16})$ and extensions

782 Views Asked by At

Ultimately, I'm looking to implement arithmetic in GF$(2^{32})$. I have a library that implements arithmetic in GF$(2^{16})$ using look-up tables for log and anti-log to implement multiplication, and addition/subtraction are simply $\oplus$ (xor).

My understanding is that I can implement GF$(2^{32})$ as GF$((2^{16})^2)$. I have been looking at this paper which describes algorithms to do that; however, I'm struggling to implement the reduction step.

Following Algorithm 2 on pg 14, attempting to multiply 0x0xAABBCCDD and 0x99887766 in GF$(2^{32})$, I'm using the following algorithm where * denotes multiplication in GF$(2^{16})$:

c[0] = a[0] * b[0]
c[1] = (a[1] * b[0]) ^ (a[0] * b[0])
c[2] = a[1] * b[1]

Now my understanding is that I need to reduce this number ($c[0]*2^{32} + c[1]*2^{16} + c[2]$) by the polynomial listed in Table II on pg 13: $x^2 + x + 8192$. This is where I become lost... how do I do this? The paper explains doing this assuming the reduction polynomial is of the form $p(x) = x^m + x^3 + x + v$. However, the polynomial listed in the table is not of this form, or if it is, then I'm completely lost.

My understanding is that if c[0] is not zero, then I need to reduce c[1] and c[2] using the following algorithm which will give a result in GF$(2^{32})$:

result[0] = c[1] ^ (poly[0] * c[0])
result[1] = c[2] ^ (poly[1] * c[0])

result = (result[0] << 16) + result[1]

However, given the polynomial $x^2 + x + 8192$, I'm not sure what values to use for poly[0] and poly[1] in the above algorithm. Also, this algorithm might also be wrong.

Any help on this final reduction step is greatly appreciated!

Update Posting the full Python code that uses PyFinite:

from pyfinite import ffield

f_16 = ffield.FField(16, gen=0x1100B)
f_32 = ffield.FField(32)

# x^2 + x + 8192
reduce_poly = [1, 1, 8192]

a_32 = 0xAABBCCDD
b_32 = 0x99887766

a_16 = [a_32 >> 16, a_32 & 0xFFFF]
b_16 = [b_32 >> 16, b_32 & 0xFFFF]


c = [0 for _ in range(0, 3)]

c[0] = f_16.Multiply(a_16[0], b_16[0])
c[1] = f_16.Add(f_16.Multiply(a_16[1], b_16[0]), f_16.Multiply(a_16[0], b_16[1]))
c[2] = f_16.Multiply(a_16[1], b_16[1])

print("C[0]: 0x{:04X}".format(c[0]))
print("C[1]: 0x{:04X}".format(c[1]))
print("C[2]: 0x{:04X}".format(c[2]))

print("32: 0x{:08X}".format(f_32.Multiply(a_32, b_32)))
print("16: 0x{:04X}{:04X}".format(c[1] ^ c[0], c[2] ^ f_16.Multiply(reduce_poly[2], c[0])))

For those that find this, the above values will not match because the polynomial used by GF$(2^{16})^2)$ and GF$(2^{32})$ in PyFinite are not the same. I'm not sure how to set the generator for the GF$(2^{32})$ field to something such that the match; however, I believe the above code is correct for GF$(2^{16})^2)$.

1

There are 1 best solutions below

7
On BEST ANSWER

reduction step

Short answer - there is no reduction step in that paper. Instead all of the operations are performed in GF((2^16)^2). Since any GF(2^32) could be mapped to the GF((2^16)^2) used in that paper, GF(2^32) is unknown.

In general, for any prime number p, any GF(p^k) can be mapped to any GF((p^n)^m) where k = n · m, but in order for the two representations of the fields to be mathematically compatible, the two fields must be isomorphic in addition and multiplication:

map(a + b) = map(a) + map(b)
map(a · b) = map(a) · map(b)

The parameters to map() are in GF(p^k), the mapped values are in GF((p^n)^m). Typically, the primitive element for GF((p^n)^m) = β(x) = x + 0. An exhaustive search is done for any primitive element α(x) of GF(p^k) that will meet the isomorphism requirements. The mapping can be accomplished via matrix multiplication k by k matrix with elements in GF(p) time a GF(p^k) value treated as a k by 1 matrix. The inverse matrix can be used to map back.

To answer the updated question, I changed the parameters so that the GF((2^16)^2) parameters are the same as the question. For a GF(2^32) example in this answer, I chose the common one used for jerasure since it's mentioned in the article, but any GF(2^32) can be mapped to the GF((2^16)^2) in the question. In this case the primitive element of GF((2^16)^2) = β(x) = x + 0. For mapping purposes, any primitive element α(x) of GF(2^32) that results in the two fields being isomorphic in addition (map(a+b) = map(a)+map(b)) and multiplication (map(a b) = map(a) map(b)) can be used.

I wrote an optimized exhaustive search program to find a isomorphic compliant primitive element α(x) for GF(2^32), and used it along with β(x) (= x + 0) to generate a 32 row by 32 bit mapping matrix and it's inverse to map between GF(2^32) and GF((2^16)^2). The mapping matrix column indexes correspond to bits 31 to 0, or 2^31 to 2^0. Define an array of powers p{...} = logα(x){2^31, 2^30, ..., 2, 1}, then the mapping matrix column values = β(x)^p{...}. The two matrices and the mapping code are shown at the bottom of this answer. The first row below is GF(2^32) multiply, the second row mapped the parameters and multiplied in GF((2^16)^2). The third row mapped the GF((2^16)^2) product back to GF(2^32), matching the GF(2^32) product:

GF(2^32)                    :  5ad5f3ad * 98a2afcf = 45ae8041
GF(2^32) to GF((2^16)^2) map:  aabbccdd * 99887766 = b14fe0bb
GF((2^16)^2) to GF(2^32) map:             b14fe0bb = 45ae8041

Using a(x) to represent the primitive element used for each field
to perform mapping via a 32 by 32 bit matrix multiply:

GF(2^32) = x^32 + x^22 + x^2 + x + 1     = hex 100400007
a(x) = x^28+x^25+x^24+x^23+x^19+x^9+x^7+x^6+x^5+x^3+x^2+x 
     =                                   = hex  138802ee
     = 2^567056c6 in GF(2^32)
     = found by optimize exhaustive search for a(x)

mapped to

GF((2^16)^2) = x^2 + x + 8192            = hex 100012000
a(x) = x + 0   (normal primitive)        = hex     10000

GF(2^16) => x^16 + x^12 + x^3 + x + 1    = hex     1100b
a(x) = x + 0   (normal primitive)        = hex         2 

Note that since the carryless multiply instruction pclmulqdq (operates on xmm registers) was added to X86 processors since 2008, GF(2^32) multiply can be implemented using 3 pclmulqdq and 1 xor, so no need to use composite field for multiply. For inversion (1/x) in GF(2^32), mapping to a composite field to calculate an inverse and mapping back may be faster than using exponentiation via repeated squaring (30 loops) to calculate x^(2^32-2) in GF(2^32) which results in (1/x).


The multiplication process should have been stated as

$c[0] \ x^2 + c[1] \ x + c[2]$ modulo $x^2 + x + 8192$

Where the coefficients of the polynomials are elements of GF(2^16), where

GF(2^16) => x^16 + x^12 + x^3 + x + 1

The 3 sub-products are:

c[0] = a[0] * b[0]                     = 0x56b3
c[1] = (a[1] * b[0]) ^ (a[0] * b[1])   = 0xe7fc
c[2] = a[1] * b[1]                     = 0xdda0

Then for c[0] x^2 + c[1] x + c[2] modulo x^2 + x + 0x2000

                           56b3
               ----------------
0001 0001 2000 | 56b3 e7fc dda0
                 56b3 56b3 3d1b
                      ---------
                      b14f e0bb

The article also mentions an alternative mapping for GF(2^32)

GF((2^8)^4) => x^4 + x^2 + 6x + 1
GF(2^8)     => x^8 + x^4 + x^3 + x^2 + 1

On a X86, PSHUFB (xmm or zmm registers) can be used to multiply 16 (SSE3) or 64 (AVX512) bytes by a constant in parallel in GF(2^8). This can greatly speed up Reed Solomon code, such as a matrix multiply of a matrix with large rows of data, such as an erasure code used for the cloud. It would be used when encoding data or correcting erasures. For GF(2^8), two 16 or 64 byte tables per constant are needed. For GF(2^16), eight 16 or 64 byte tables per constant are needed.


Mapping tables (in hex):

static DWORD mtb[32] = {  /* map GF(2^32) => GF((2^16)^2) */
    0x9b2185f6,0xe0e734b3,0xa1fc2d7c,0xee9afb21,
    0x19c63c77,0x17770b53,0x5287742b,0x0379891c,
    0x15b48167,0xa96405ce,0xb5a5539a,0xedff4a47,
    0x0b0051d0,0x0a626ea1,0x03ad3e51,0x5e6290c6,
    0xa5091db6,0xb3b41224,0xffb584aa,0xfe96d027,
    0x4b46b715,0xbca3b9ab,0x17dbaafe,0x173b3ad2,
    0xa91b3da7,0x4271982d,0x4dfaa2ba,0x0e384248,
    0x0f819e89,0x16b6b13c,0x09ad54da,0x00000001};

static DWORD itb[32] = {  /* map GF((2^16)^2) => GF(2^32) */
    0xb5ff1217,0x24a007b0,0xa6be4407,0x0eb8e985,
    0x75db543b,0x20b2faea,0x01d01acd,0x131b5df1,
    0x4add8000,0x5ac0e17b,0x045e559e,0x7b4378d8,
    0x70a52415,0x5d46673e,0x1d46b550,0x138802ee,
    0x0fa1abe5,0x34dfb720,0x549751ba,0x130b4354,
    0x0766e40a,0x30e79a6c,0x664fe922,0x7d35a2b3,
    0x498a130c,0x1388e4ae,0x4fdb2d90,0x67fbb262,
    0x1a1907f3,0x5d2bf537,0x26fbb1b8,0x00000001};

Mapping functions:

/*----------------------------------------------------------------------*/
/*      M32to162                                                        */
/*----------------------------------------------------------------------*/
static DWORD M32to162(DWORD a)
{
DWORD r = 0;
DWORD d;
    while(a){
        _BitScanReverse(&d, a);
        a ^= (1ul<<d);
        r ^= mtb[31-d];
    }
    return r;
}

/*----------------------------------------------------------------------*/
/*      M162to32                                                        */
/*----------------------------------------------------------------------*/
static DWORD M162to32(DWORD a)
{
DWORD r = 0;
DWORD d;
    while(a){
        _BitScanReverse(&d, a);
        a ^= (1ul<<d);
        r ^= itb[31-d];
    }
    return r;
}