Generating a random matrix with prescribed conditions

263 Views Asked by At

I need to uniformly generate a random matrix $X$ with positive integer entries satisfying a number of prescribed conditions:

  • The matrix dimensions are prescribed, say $m\times n$
  • For each row, the sum of the entries of that row is prescribed, ie $\sum_{j=1}^{n} X(i,j) = f(i)$
  • For each column, the sum of the entries of that column is prescribed, ie $\sum_{i=1}^{m} X(i,j) = g(j)$
  • Each entry $X(i,j)$ takes values in a prescribed positive integer interval, say $[a, b]$

It's clear for instance that a solution requires $\sum_{i=1}^{m}f(i) = \sum_{j=1}^{n}g(j)$, for both equal $\sum_{i=1}^{m}\sum_{j=1}^{n}X(i,j)$. It's also important the prescribed positive integer interval is compatible with the prescribed functions $f$ and $g$, although there is room here.

How do I go about doing this concretely? If a numerical example is needed, take the prescriptions:

  • $m = 8$
  • $n = 18$
  • $f(1)=f(2)=\dots =f(7) = 34$ and $f(8) = 32$
  • $g(j)=15$ for all $j$
  • The positive integer interval is $[1,4]$

Notice $(7 \cdot 34)+32=15 \cdot 18 = 270$.

1

There are 1 best solutions below

14
On BEST ANSWER

Depending on how many of these you want to generate, a fairly simple approach is to use Markov Chain Monte Carlo sampling; this can be used in practice to uniformly sample many combinatorial structures such as Latin squares; see, for example, Jacobson and Matthews (1996).

Here's a solution which works in the more general situation that, for each cell in the matrix, you have a lower bound $l_{ij}$ and an upper bound $u_{ij}$ with $l_{ij}<u_{ij}$. You need two very simple pieces of code. One generates a valid matrix, not necessarily uniformly; this is easy just by starting off each matrix entry at $a$, and repeatedly picking a random incrementable entry and incrementing it. The second piece of code just needs to randomly "tweak" the matrix; the only requirement is that the set of all matrices is connected via the tweaks. In your case, the natural tweak is to pick two rows and columns, and add the $2\times2$ matrix $$T=\pmatrix{1&-1\\-1&1}$$ provided the bounds are respected; we will show below that this is sufficient to connect all legal matrices.

To generate a matrix uniformly at random: generate a non-uniformly chosen matrix, and tweak it a large number of times. The Markov chain Monte Carlo theory tells you how many times you need to tweak before the result is effectively uniformly random. See the references; that time should be a low degree polynomial in the diameter of the space. In practice it depends on how worried you are about uniformity; you can just run it for a reasonable amount of time and call it done. Here's the first one I generated for your example parameters:

$$\left( \begin{array}{cccccccccccccccccc} 4 & 2 & 1 & 1 & 2 & 3 & 4 & 3 & 2 & 1 & 1 & 1 & 1 & 3 & 1 & 1 & 2 & 1 \\ 2 & 1 & 4 & 1 & 1 & 1 & 1 & 2 & 4 & 3 & 2 & 1 & 4 & 1 & 2 & 2 & 1 & 1 \\ 1 & 3 & 1 & 1 & 2 & 1 & 1 & 2 & 2 & 1 & 4 & 3 & 3 & 1 & 1 & 3 & 1 & 3 \\ 3 & 2 & 1 & 2 & 3 & 2 & 3 & 1 & 2 & 3 & 2 & 1 & 1 & 1 & 2 & 1 & 1 & 3 \\ 1 & 2 & 2 & 3 & 2 & 3 & 2 & 1 & 1 & 1 & 2 & 1 & 1 & 4 & 2 & 2 & 2 & 2 \\ 1 & 3 & 2 & 2 & 3 & 2 & 1 & 1 & 1 & 3 & 1 & 3 & 1 & 2 & 1 & 3 & 3 & 1 \\ 1 & 1 & 2 & 4 & 1 & 1 & 1 & 3 & 2 & 2 & 1 & 2 & 1 & 2 & 4 & 2 & 2 & 2 \\ 2 & 1 & 2 & 1 & 1 & 2 & 2 & 2 & 1 & 1 & 2 & 3 & 3 & 1 & 2 & 1 & 3 & 2 \\ \end{array} \right)$$

Theorem: The $2\times2$ tweaks connect the space of matrices.

Let $A$ and $B$ be distinct legal matrices; we will show that you can easily find a tweak of either $A$ or $B$ so as to reduce the $l_1$-norm $||A-B||_1=\sum_{i,j}|a_{ij}-b_{ij}|$ by at least $2$. This implies that you can get from $A$ to $B$ in at most $||A-B||_1/2$ tweaks.

Look at the sign pattern of $A-B$. There must be a $+$ somewhere, and there must be a $-$ elsewhere in the same row and a $-$ in the same column. Thus, there exist two rows and two columns such that the difference pattern is $\pmatrix{+&-\\-&*}$. There are now two cases in which we can tweak this $2\times 2$ submatrix. If $a_*>l_*$ (if $a_*>b_*$ we’re in this case), then we can subtract $T$ from $A$, reducing $||A-B||_1$ by either $2$ or $4$. Similarly, if $b_*<u_*$, we can add $T$ to $B$, which also decreases the distance by either $2$ or $4$. If neither of these is an option, then we must have $l_*=a_*=b_*=u_*$, but this contradicts the requirement that $l_*<u_*$.$\hspace{1em}\square$