Random binary matrix with given rows and columns sums

2.5k Views Asked by At

I need to generate a random binary matrix $(n, n)$ whose rows sums and columns sums are $4$.

I don't manage to find a quite efficient algorithm to do this. Have you an idea please ?

NB : The distribution should be uniform.

2

There are 2 best solutions below

4
On BEST ANSWER

I'm not an expert in graph theory and I tried a direct method. First, I generate a matrix with the correct number of ones in the rows; then, I fix the columns.

bin <- function(n=6,k=4){ # creates a matrix w 4 ones in every row and col
A <- matrix(0,n,n) # 0 matrix
for(i in 1:n){
    pos <- sample(1:n,k) # sample k positions in i-th row
    A[i,pos] <- 1 # and set them to 1
}
# sumrows is now ok

sumcol <- apply(A,2,sum)
missing <- pmax(0,k-sumcol) # vector of n component, how many ones miss for each col
excess <- pmax(0,sumcol-k) # vector of n component, how many ones miss for each col
cat(sum(missing)," ",sum(excess),"\n")
#
# main idea 
# pick a random column with a missing one (r) and a random row with an excess one (s)
# swap randomly a one in s-th columns with a zero in r-th column
# till matrix is ok (i.e., no columns has less than k ones)

while(sum(missing)>0){
    rr <- which(excess>0)
    ss <- which(missing>0)
    r <- if(length(rr)==1) rr else sample(rr,1)
    s <- if(length(ss)==1) ss else sample(ss,1)
    ex <- which(A[,r]==1 & A[,s]==0)
    j <- if(length(ex)==1) ex else sample(ex,1)
    A[j,s] <- 1;A[j,r] <-0

    sumcol <- apply(A,2,sum)
    missing <- pmax(0,k-sumcol)
    excess <- pmax(0,sumcol-k)
}
# sumcolumns is now ok    

A

}

rr (ss) are vectors with the indexes of the columns with missing (excess) ones. r and s are random draw from rr and ss. there must be at least one way to randomly swap a one with a zero in columns r and s. The if statements are due to the behavior of sample in R: sample(3,1) samples 1 element from 1:3, and does not provides 3. Hence, I check for its length.

10 matrices where n=200 and k=100 are generated in about 10 seconds on my AirBook and every matrix requires on average 110 fixes (out of 100*100=10000 entries).

I'm not yet familiar with the editor, polish my text if you wish. Hope this helps.

8
On

Expanding on Joriki's answer, you can use the algorithm inhttp://www.columbia.edu/~mo2291/Publications/DirectedRandomGraphs_29Jan13.pdf on page 7 or 8 to create a directed graph with outdegree 4 and I degree 4 at each vertex, with possible self-loops and multiple edges. Then you can exclude those with multiple edges. The directed adjacency matrix (a one in the i,j position means there is an edge from vertex i to j) gives you what you need.

The algorithm basically boils down to adding four random permutation matrices and changing the matrices if you get a 2 or higher in any entry. This is more efficient the larger $n$ is.