Computationally inexpensive redundancy for partial encoding


By partial encoding I mean an encoding that contains only some of the information.

An example involves using a set of prime numbers and moduli. A small and silly example:

  1. Choose a set of prime numbers, $\{2,3,5\}$
  2. Choose a number that is less than the number of prime numbers, $n=2$
  3. Compute the product of the $n$ smallest primes in the set, $basis=6$
  4. Encode the data as a string of digits in base $6$ (the $basis$)
  5. Reduce each digit modulo each prime from the set, and call each resulting sequence of remainders a partial encoding
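The five steps above can be sketched roughly as follows (the helper names and the sample input are my own, not from the question):

```python
PRIMES = [2, 3, 5]   # step 1: the chosen set of primes
N = 2                # step 2: n, less than the number of primes
BASIS = 2 * 3        # step 3: product of the n smallest primes = 6

def to_base6_digits(value):
    """Step 4: write a non-negative integer as base-6 digits, most significant first."""
    digits = []
    while value:
        digits.append(value % BASIS)
        value //= BASIS
    return digits[::-1] or [0]

def partial_encodings(digits):
    """Step 5: one remainder sequence per prime; each sequence is a partial encoding."""
    return {p: [d % p for d in digits] for p in PRIMES}

shares = partial_encodings(to_base6_digits(1234))
```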

Assembly

  1. Given a set of sequences of remainders modulo different primes, we can reconstruct the string from (4.) using the Chinese Remainder Theorem iff the product of the moduli used is greater than or equal to the base.
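A minimal sketch of the reconstruction, assuming a standard CRT helper (the function names are mine); any subset of the primes whose product is at least the base recovers each digit:

```python
from math import prod

def crt(remainders, moduli):
    """Chinese Remainder Theorem: the unique x < prod(moduli) with x % m_i == r_i,
    for pairwise-coprime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return x % M

def reconstruct_digits(partials, basis=6):
    """partials: {prime: remainder_sequence}. Works iff prod(primes) >= basis."""
    primes = list(partials)
    assert prod(primes) >= basis, "not enough partial encodings to reconstruct"
    length = len(next(iter(partials.values())))
    return [crt([partials[p][i] for p in primes], primes) for i in range(length)]
```

For example, the remainder sequences for primes $2$ and $3$ alone suffice, since $2\cdot3\ge 6$.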

The purpose of partial encoding is to take some data and split it into parts. If, for instance, you have $10$ users who each hold one complete copy of some data, there might be a significant risk that none of them are online. But if $100$ people each hold $10\%$ of the data, you would have a higher likelihood of reconstructing the original data. You could do this with the algorithm above, except you would need $100$ primes and might end up multiplying some really big numbers.

So I'm wondering if there is a more efficient way to partially encode data.

1 Answer

There is a protocol where you can give each user $10\%$ of the data in a very precise sense: when $10$ users are online, you can recover all the data, and when any fewer are online, you learn none of the data. This is Shamir's secret-sharing scheme, which is built on the same polynomial ideas as Reed-Solomon codes.

Suppose you are trying to split up $b$ bits of data so that any $n$ users will together have all the data, but any $n-1$ users will have none of the data. The following math will take place over the finite field $\mathbb F$ with $2^b$ elements. Choose a polynomial $$ p(x)=x^n+a_{n-1}x^{n-1}+a_{n-2}x^{n-2}+\dots+a_1x+a_0 $$ where $a_1,a_2,\dots,a_{n-1}$ are randomly chosen elements of $\mathbb F$, while $a_0$ represents the $b$ bits of data (i.e., choose a bijection from $\mathbb F$ to the set of binary strings of length $b$).

Then, for $i=1,2,\dots,2^{b}-1$, the $i^\text{th}$ user is given the value of $p(i)$, where we interpret $i$ to be an element of $\mathbb F$. The idea is that a monic polynomial of degree $n$ is uniquely determined by any $n$ of its values, so any $n$ users can recover the polynomial, and therefore the data $a_0$.
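A rough sketch of this scheme, with one simplification: it works over a prime field $\mathbb Z_p$ rather than $\mathbb F_{2^b}$, since prime-field arithmetic is easier to write down (the field choice and function names are my assumptions, not part of the answer):

```python
import random

P = 2**31 - 1  # a Mersenne prime; must exceed the number of potential users

def make_shares(secret, n, user_ids):
    """p(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + secret, with a_1..a_{n-1} random.
    User i receives the share (i, p(i))."""
    coeffs = [secret] + [random.randrange(P) for _ in range(n - 1)]  # a_0 .. a_{n-1}
    def p(x):
        acc = pow(x, n, P)  # the monic leading term x^n
        for k, a in enumerate(coeffs):
            acc = (acc + a * pow(x, k, P)) % P
        return acc
    return [(i, p(i)) for i in user_ids]

def recover_secret(shares, n):
    """Any n shares determine a_0: subtract the known x^n term, then
    Lagrange-interpolate the remaining degree-(n-1) polynomial at x = 0."""
    pts = [(x, (y - pow(x, n, P)) % P) for x, y in shares[:n]]
    secret = 0
    for j, (xj, yj) in enumerate(pts):
        num, den = 1, 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret
```

Note that recovery exploits the monic leading term: subtracting $x^n$ from each share leaves $n$ values of a degree-$(n-1)$ polynomial, which Lagrange interpolation pins down exactly.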

Some notes:

  • $b$ should not be too small, or else there will not be enough elements of $\mathbb F$ to give each user. You need $2^b$ to be larger than the number of potential users.
  • However, if $b$ is too large, the calculations will become cumbersome. If you need $b$ to be large, then it is probably best to encrypt the $b$ bits of data with a symmetric encryption scheme like AES-$128$, then use this method to share the $128$-bit key.