I want to split the Bush's Baked Beans recipe into $M$ parts so that any set of $N<M$ people can reconstruct the recipe, but with the following constraints:
- Each person knows only a yes or no question, and the answer to it.
- The size of the whole recipe is public.
It would also be nice if it were robust in the sense that some people might only know the probability that their answer is yes.
This seems like a common thing to do. Does anyone know any references for this?
If not for the binary and robustness constraints we could encode it to a polynomial in $\mathbb{R}$ (or a large finite field) and give each person a field element. The more people who grouped together, the more probable it would be that we have found the right recipe.
You're looking for erasure coding.
Number your group of people $1, \ldots, M$. Let's say your recipe $\mathbf{m}$ is $k$ bits long, i.e., $\mathbf{m} \in \mathbb{F}_2^k$. Pick a $[M, k, d > M-N]_2$ erasure code $c : \mathbb{F}_2^k \to \mathbb{F}_2^M$ and calculate the encoded recipe $\mathbf{t} = (t_1, \ldots, t_M) = c(\mathbf{m})$. Now hand out $t_i$ to person $i$, for each $1 \le i \le M$.
Later, gather some subset of size $\ge\!N$ of your original group. Ask each of them for their value $t_i$ and place it in position $i$ of the vector $\mathbf{r}$. Fill in the remaining spaces with a placeholder $\_$, so $\mathbf{r} \in (\mathbb{F}_2 \cup \{\_\})^M$. Pass this value $\mathbf{r}$ into your decoder $d : (\mathbb{F}_2 \cup \{\_\})^M \to \mathbb{F}_2^k \cup \{\text{err}\}$ to recover your original recipe $\mathbf{m} = d(\mathbf{r})$.
Some remarks:
The notation $[M,k,d]_2$ above refers to a (linear) code of block length $M$, dimension (message length) $k$, and minimum distance $d$ over an alphabet of size $2$.
Consider the set $C = \operatorname{im} c = \{ c(\mathbf{x}) \mid \mathbf{x} \in \mathbb{F}_2 \} \subseteq \mathbb{F}_2^M$. The minimum distance of $c$ is
$$d = \min_{\mathbf{x},\mathbf{y} \in C} d_H(\mathbf{x},\mathbf{y}),$$
where $d_H(\mathbf{x},\mathbf{y})$ is the number of places in which $\mathbf{x}$ and $\mathbf{y}$ differ ($d_H$ is called the Hamming distance).
I chose $d > M-N$ so that if up to $M-N$ people are missing (i.e., $M-N$ spaces in $\mathbf{r}$ are $\_$), there's still at least one position in $\mathbf{r}$ that allows us to distinguish which element of $C$ was "stored" by the original $M$ people.
There exist decoders that do this, but I've never studied them.
Designing an encoder $c$ and decoder $d$ is a large task, and is dependent on your parameters $k$, $M$, and $N$. You can find some examples at the bottom of the Wikipedia article I linked.
If you give an order-of-magnitude estimate for those parameters, someone can probably assist you with your choice.
Your intuition about encoding as a polynomial in a large finite field is basically a Reed-Solomon code.