I have $n$ data blocks and $k$ parity blocks distributed across $m$ boxes where each box can contain atmost $b$ blocks. Each parity block is Ex-or of some data blocks (for ease of understanding we can assume each data/parity block as a single bit) and each data block is involved in atleast one Ex-or operation to get some parity block. Each data block can be involved in more than one parity block operation, infact will be as that will increase the fault tolerance of the code. So, we have $k$ parity equations as follows:
$$\begin{aligned} P_1 &= D_{11} \oplus D_{12} \oplus \cdots \\ \;\vdots \\ P_k &= D_{k1} \oplus D_{k2} \oplus \cdots \end{aligned}$$
Now if some data or parity block fails and we cannot recover the lost data blocks from the non-failed data/parity blocks we call this situation a "dataloss" situation. Now, I will give a simple example to explain this. Assume that we have $4$ data blocks and $3$ parity blocks where the parity equations are as follows: $$\begin{aligned} P_1 &= D_{1} \oplus D_{3} \oplus D_{4}\ \\ P_2 &= D_{1} \oplus D_{2} \oplus D_{3}\ \\ P_3 &= D_{2} \oplus D_{3} \oplus D_{4} \end{aligned}$$
We represent them by a matrix called generator matrix which is a $4$x$7$ matrix which is as follows:
$$ G=\left(\begin{array}{ccccccc} 1&0&0&0&1&1&0\\ 0&1&0&0&0&1&1\\ 0&0&1&0&1&1&1\\ 0&0&0&1&1&0&1 \end{array}\right) $$
Now, suppose $D1$,$D2$,$D3$ fails and we want to check whether that causes dataloss. So, we remove the columns corresponding to $D1$,$D2$,$D3$ from G and call the matrix we get "recovery matrix" which is
$$ G'=\left(\begin{array}{cccc} 0&1&1&0\\ 0&0&1&1\\ 0&1&1&1\\ 1&1&0&1 \end{array}\right) $$ here. Now, we check the rank of $G'$. If this is less than $n$ then we have $dataloss$, otherwise not. Here $rank$(G')=4 and $n$=4 and we don't have any $dataloss$. Basically, if we Ex-or $P1$,$P2$,$P3$ then we can recover $D3$, from that we can recover $D1$,$D2$.
A data/parity block fails if and only if the box containing it fails and if a box fails then all the data and parity blocks inside it fails. Now we want to formulate the following optimization problem:
Given $n$, $k$, $m$ and the $k$ parity equations (we are not allowed to choose which data blocks are xor-ed to form a check block, we are given an erasure code system) we want an assignment of the disk and parity blocks across $m$ boxes such that we can minimise the number cases where a box failure causes "dataloss"; i.e I want to minimise the function $G=\sum_{i=0}^m\;g(i)$, where
$$g(i) = \begin{cases} 1 & \text{if failure of box }i\text{ causes "dataloss"} \\ 0 & \text{otherwise} \end{cases}$$
Now the assignment of disk/parity blocks across $m$ boxes can be represented by a matrix of dimension $(n+k) \times m$, lets call it $B$. The $(i,j)$-th entry of the matrix is $1$ if the $j$-th box contains $i$-th data block ($(i-n)$-th parity block) for $i \le n$ (for $i>n$), otherwise it is $0$. So, the function $G$ is a function of all those matrix entries and the constraints we have are:
- each matrix entry is either 0 or 1
- sum of the entries in a row is always 1 (i.e each data/parity block is present in exactly one box)
- sum of the entries in each column is at least 1 (i.e. no box is empty)
- summation of all the matrix entries is $n+k$
Now, the function $G$ is coming as a non-linear function of the matrix entries. Also the domain set of $G$ is set of all $n$-tuples where elements of a tuple are 0 or 1. So, the domain set is not a convex set, so convex optimization techniques can't be applied here. How to solve this optimization problem?
If I refine problem like this: How much of the $n+k$ blocks can be covered using m subsets each of size atmost b then I think it is possible to formulate it as a maximum independent set problem. We will form a graph of n+k vertices where each vertex corresponds to one block(data/parity) but the definition of edges is not clear to me (this is the main problem I am trying to solve). Then the maximum independent set of this graph will give us the maximum subset (of the n+k columns) that can be covered.
Somehow I feel that the problem is NP-Complete.
Hope I have described the problem clearly.
======
Edit (comment from Jyrki Lahtonen): This was also posted at MO. I try to redirect eventual interested MO-posters here, because at the moment I think we have a better description of the question here.
This is too long to be comment, but I need to ask this to get/give a better idea of the problem setting. I cannot tell for sure, but based on my exposure to related problems I think it looks like the following: If the $n\times (n+k)$ binary matrix describing the presence of a data block in a data or a check block is $A$, then $A$ has $I_n$ block in the left (data block $D_i$ copied to its own block) and then a `random' (or carefully designed) $n\times k$ binary block $B$ telling us which data blocks are XORed to form that check block: one column per check block, so an entry '1' on a row $i$ and column $j$ of the matrix $B$ tells that data block $\#i$ is among those XORed to form check block $\#j$.
Now distributing these $n+k$ blocks of data into boxes amounts to partitioning the columns of the matrix $A$ into $m$ subsets. AFAICT data loss occurs, if and only if removal of the columns belonging to the box in question leaves us with a matrix $A'$ that is no longer of full rank $n$. This is because, if the matrix $A'$ were of full rank, we could select a subset of $n$ linearly independent columns. As they form an invertible matrix, data recovery could be done (by e.g. Gaussian elimination).
Is this what you have in mind? This seems to be closely related to erasure correction/recovery codes, and you may benefit from using that as a buzzword, when searching for more information.
Before we get serious, let me describe a toy example of the above. Let $$ A=\left(\begin{array}{ccccccc} 1&0&0&0&1&1&0\\ 0&1&0&0&0&1&1\\ 0&0&1&0&1&1&1\\ 0&0&0&1&1&0&1 \end{array}\right) $$ be this matrix that some of you will recognize as the generaotr matrix of the the $[7,4,3]$ Hamming code. Here the last 3 columns tell that the first check block is constructed by XORring data blocks 1,3 and 4, the second check block by XORring data blocks 1,2,3 and the last check block by XORrin data blocks 2,3,4.
I claim that in this case using $m=3$ boxes allows us to always recover from the loss of any single box. Put data blocks 1,2 into the first box; data blocks 3,4 into the second box; and all the check blocks into the third box. If we lose the third box, then we obviously have all the data in tact. If we lose the first box, then we can recover data block #1 by XORring the first check block with data blocks 3 and 4, and similarly we can recover data block #2 by XORring the last check block with data blocks 3 and 4. Similarly if we lose the second box, then we can recover the missing data blocks 3 and 4 usigng the the saved check blocks and the saved data blocks 1 and 2.
Because this binary code has minimum Hamming distance 3, it can recover from any two erasures. Therefore I knew in advance that I woudl be able to recover the lost contents of either of the first two boxes.
Please comment. I realize that you may be looking for a different approach. May be one with simpler recovery schemes?