Calculating a random "blob" in a 10 x 10 grid

Question

Calculating a random "blob" in a 10 x 10 grid

1.4k Views Asked by Bumbble Comm At 02 Apr 2026 - 5:57

The problem:

You have a 10 x 10 grid where each cell can be either occupied (1) or unoccupied (0).
All occupied cells in the grid are part of a single "blob": a single shape defined by a set of occupied coordinates for which each occupied coordinate has at least one occupied coordinate to either its top, left, bottom, and/or right.
- (From this definition it seems the blob must occupy at least two coordinates.)
How can you quickly generate a blob that is chosen randomly from the set of all possible such blobs? If it's not possible to generate such a blob quickly, why is it not possible?

The (probably-wrong) method I'm using right now:

I choose a random cell from the 10 x 10 grid to be the first cell of the blob.
I then choose randomly from the set of cells neighboring the first cell to create the second cell of the blob (to make sure every blob is valid).
I then iterate through an updated-as-it's-added-to list of all of the current cells of the blob, considering each neighboring cell once (and only once).
A certain percentage of the time, I add this neighboring candidate cell to the blob.
- Right now I'm using 50% as the percentage.
- If this method of generating the blob can work (i.e. it doesn't have any fundamental problems with it), I would need to know what this certain percentage should be.

My gut feelings of what the solution may look like:

It seems like the blob is generally going to occupy most of the 10 x 10 grid.
It seems like it might be better to first choose the size of the blob, and only then choose which cells are occupied.
I suspect that the ideal way of creating the blob if I already know how big it should be would be to choose a random cell and then to repeatedly choose a random neighbor to the existing blob and make it part of the blob.
I suspect it may be helpful to first create an algorithm for a simpler one-dimensional grid before trying to create one for a two-dimensional grid.

Intermediate conclusions I'm coming to:

It looks like for the simpler case of a one-dimensional grid, the number of possible blobs is equal to the $(n-1)th$ triangular number.
- See the pictures below that show a one-dimensional grid. The pattern of possible shapes as you move from $n=2$ to $n=3$ to $n=4$ is a triangular-number pattern.

Clarifications:

Two disconnected areas of occupied cells do not count as a single blob.
Each possible blob needs to have an equal chance of being picked.
Two blobs of the exact same size, shape, and orientation, but different location are considered different blobs.
There are answers below which show how to generate a randomly-picked blob but where the randomly-selected blobs can't be generated quickly. I'm looking for a way to do it quickly or a proof that it's not possible to do it quickly.

Some pictures I drew to wrap my head around the issue:

Disclaimer:

I'm dealing with an interview question where I'm allowed to use third party help as long as I disclose it. The interview question isn't to generate the blob, but by being able to correctly generate a randomly-chosen blob I can test several different possible solutions to the interview question.

Some clarifications of Misha's answer:

Q: Why do you say "this method is definitely biased"?

A: Because we already know that the unbiased method works exactly the same way except you only accept the outcome of the random selection when the rules for a blob are applied "strictly" (i.e. you only have a blob when there's a single connected group of occupied spaces), and so if you instead just pick an occupied space and use its connected occupied spaces as the blob, you'll at the very least be biased towards smaller blobs.

Q: What do the vertical bars represent?

A: They represent size / cardinality. So $B$ is the blob itself, and $|B|$ is how many cells it contains.

Q: Why use the $∂$ character?

A: In topology it refers to the boundary of a subset of a space.

Q: How did you figure out the probability for the biased method?

A: Well, $\frac{|B|}{100}$ is just the probability that you choose one of the cells of the blob, assuming it exists. So presumably the other part is the probability that it was created. To make it easier to understand, note that in the first formula you see that the odds of getting the strict blob you want (strict because you're not tolerating occupied cells not part of your blob) on any particular attempt is $2^{-100}$, because you need all 100 of those coin-flips to go exactly the way you want them. So in this less-strict situation, you need the coin-flips for your desired occupied spaces to go the way you want, and you also need the coin-flips for the unoccupied neighboring spaces to go the way you want, so that there's no chance you'll have a neighbor end up occupied and change the blob to a different one.

Q: How did you figure out the lower bound?

A: He just took some constraints he knew to be true and then used calculus(?) to combine those functions to determine what the lowest value could be. But to rationalize it: if you look at the formula for $p(B)$ and say "I want to make $p(B)$ as small as possible", you can see that you want to make the $(\frac 12)^{|B| + |∂B|}$ term as small as possible, and the way to do that is to make $|B|+|∂B|$ as big as possible. And the biggest that sum can be is 100, where the blob and its neighboring cells take up the entire grid. Then you want to make the $\frac {|B|}{100}$ term as small as possible, and the way to do that is to figure out the smallest blob that where the blob and its neighbors fill the entire grid. But since he didn't try to find an actual blob, the number he ended up with (33) may not actually correspond to a real blob (I tried creating a blob of size 33 that occupied or bordered every cell and I couldn't do it). I think the effect of that is to make it take longer for the program to run than if the number were more accurate. If the number was perfectly accurate, then you'd have a 100% chance of accepting the rarest blobs, but with the number less accurate, you have less than a 100% chance of accepting the rarest blobs, but it should still be unbiased.

Q: If the goal of multiplying by the acceptance probability is to get rid of the $p(B)$ term, why use $p^*$ in the numerator? Why not use "1"?

A: Using a numerator greater than the probability of getting the rarest blobs (which is roughly $p^*$) will result in a non-uniform distribution, and using a numerator that's smaller will make the program take longer to run. The reason using a numerator greater than $p^*$ will result in a non-uniform distribution is that it will lead us to accept some not-rarest blobs with the same probability as the rarest blobs. If we have the numerator as $p^*$, then when we actually come across one of those rarest blobs, the $p(B)$ will equal $p^*$, and so the acceptance probability will be "1" (we'll always accept that blob). But if the numerator is larger, not only will we always accept the rarest blobs, but we will also always accept the less-rare blobs whose $p(B)$ is such that dividing the larger numerator by $p(B)$ will result in an acceptance probability of at least 1.

Original Q&A

There are 3 best solutions below

Bumbble Comm On 04 Dec 2017 - 6:39

First, one method that immediately comes to mind is to randomly assigning all 100 cells to be occupied or non-occupied, and then just keep doing this until you get a pattern that is an actual blob.

EDIT ... I should have stopped there ...

Unfortunately, this will bias things towards blobs of a size proportional to the percentage with which you make the cells black or white. For example, if you turn a square black or white with a chance of $50$%, then of course, you'd get almost all blobs with a size of around $50$.

EDIT: Not true. By setting the percentage of turning a cell to black or white to $50$%, the likelihood of each specific blob being the outcome of this process is the same as for any other specific blob.

So, you really want to avoid that bias ... and note that the method described in the post has a very similar bias, in how the size of the resulting blobs very much depends on the percentage your algorithm uses.

So, what to do? Can we calculate how many blobs there are of a certain size? Well, the number of possible blobs is of course really big. And I don't even see a simple formula for calculating an exact number or, for that matter, how many there are of a certain size. So I don't think it will be practical either to exactly pre-calculate these kinds of numbers so that you could randomly pick blobs of a certain size (let alone shape) proportional to how many there are of that size (or shape)

So ... I think you'll have to do a more statistical method. So, for example, you could try to get some kind of estimation for how many blobs there are of various sizes by doing the following:

For a blob of size $n$: Randomly turn $n$ of the $100$ squares black or white, and see if you get a blob. Do this a large (but still practical) number of times, say $10000$, or maybe even $100000$ and just count how many times you get a blob. Do this for all $n$, and that should get you a rough idea for the proportions of blobs of a certain size, e.g if for $n=3$ you obtained a blob $20$ times out of $10000$, but for $n=40$ you got a blob $200$ times out of $10000$, then there should roughly be $10$ times as many blobs of size $40$ then blobs of size $3$, and that is because each specific blob (of whatever size, shape, or orientation) will be chosen with the exact same likelihood as any other blob.

(By the way, for some $n$ you will hardly get any blobs at all, e.g. I would not be surprised if you get $0$ blobs out of $100000$, or even $1000000$, for $n=20$ ... but that's ok: that just means that the number of blobs of size $20$, out of the whole space of possible blobs, is really, really small ... and so statistically you may as well set its proportion to $0$ relatively to other sizes. Also, for large $n$, say $n>80$, the chances of getting a blob become really good, so you can probably get away with just generating $100$ random square assignments. )

Once you have those proportions, then generate a random blob by first picking an $n$ relative to those proportions, and then randomly picking $n$ squares out of the $100$, and just keep repeating that until you get a blob.

Bumbble Comm On 04 Aug 2020 - 12:28

There's a "fast" algorithm for generating all blobs with equal probability, but it has a long startup time, and requires a lot of memory:

(Precomputation): sequentially generate all 100-bit binary numbers; convert each to a $10 \times 10$ grid of black/white cells; check to see if it's a blob. If so, append it to a blob-list.
When done, compute the number of items, $N$, in the blob list.
(Actual use): Generate a uniform random number $K$ between $1$ and $N$, and return the $K$th element of the precomputed blob-list.

This isn't even slightly practical, of course, but I write it down to show that one cannot prove that there's no "fast" algorithm for blob generation.

**Bumbble Comm** · Accepted Answer

A definitely correct but inefficient method is to

Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
Check if there is a single blob. If not, start over. If so, output this blob.

Step 1 is equally likely to generate each blob (as well as all the non-blobs), so the result (once a blob finally is generated) is equally likely to be any blob. More precisely, let $f$ be the probability that we get something that isn't a blob. Then the probability of generating your favorite blob $B$ on the $k^{\text{th}}$ trial is $f^{k-1} \cdot \frac{1}{2^{100}}$: we failed to get a blob $k-1$ times, and then finally got all the cells in $B$ and no others. So the overall probability of getting blob $B$ with this method is $\frac{2^{-100}}{1-f}$, which doesn't depend on $B$, and therefore all blobs are equally likely.

We choose a probability of $\frac12$ so that on each trial, any blob $B$ has the same probability of $2^{-100}$ of being generated. If the probability were $p \ne \frac12$, then the probability of generating a $k$-celled blob would be $p^k (1-p)^{100-k}$, which is not uniform.

Unfortunately, it will probably take a very long time to generate a blob with this method, since a random subset of the cells usually doesn't form a connected blob.

Next, here is an actually feasible method.

First, consider the following sampling method, which is not uniform:

Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
Choose a cell uniformly at random. If it is not occupied (or if it has no occupied neighbors, so it doesn't count as a blob), start over. Otherwise, let $B$ be the connected component containing the chosen cell.

Even though this method is definitely biased, it has the nice feature that we know exactly the probability with which a blob $B$ is returned in a single trial. That probability is $$ p(B) = \frac{|B|}{100}\left(\frac12\right)^{|B|+|\partial B|} $$ where $|B|$ is the number of occupied cells in $B$ and $|\partial B|$ is the number of unoccupied cells bordering $B$. Moreover, we can put a lower bound on $p(B)$ for any $B$: it is $p^* = \frac{33}{100} (\frac12)^{100}$. (This is the minimum of $\frac{x}{100} (\frac12)^{x+y}$ over all $x,y$ with $x\ge2$, $y \ge 0$, $x+y \le 100$, and $y \le 2x+2$: the constraints on $|B|$ and $|\partial B|$.)

Now modify the method above to make the following method:

Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
Choose a cell uniformly at random. If it is not occupied (or if it has no occupied neighbors, so it doesn't count as a blob), start over. Otherwise, let $B$ be the connected component containing the chosen cell.
With probability $p^*/p(B)$, return $B$. Otherwise, start over.

Now, on any given trial, the probability that $B$ is the blob we get after step 2 is is $p(B)$. But the probability that we actually return $B$ is that probability times the acceptance probability: $p(B) \cdot p^*/p(B)$, or $p^*$. So on any given trial, every blob has the same chance $p^*$ of being produced.

Experimentally, the average value of $\frac{p^*}{p(B)}$ seems to be between $0.001$ and $0.0001$, which means the average number of times steps 1-2 are repeated is not $2^{100}$ (as with the first method) but fewer than $10000$. This means we can actually run this algorithm in a few seconds as opposed to the lifetime of the universe.

Calculating a random "blob" in a 10 x 10 grid

There are 3 best solutions below

Related Questions in PROBABILITY

Related Questions in COMBINATORICS

Related Questions in ALGORITHMS

Related Questions in COMPUTATIONAL-MATHEMATICS

Trending Questions

Popular # Hahtags

Popular Questions