Are there more permutations of pixels in a picture or bases in the human genome?


An iPhone 7 takes pictures that have roughly 12 Megapixels. For simplicity, let's assert that the picture only encodes 256 values per red, green and blue channels such that a 1x1 pixel image has 256^3 unique representations (permutations). The number of permutations for the entire 12 Megapixel image is 256^(3 * number of pixels) = 256^(3 * 12,000,000) = 256^(36,000,000).

There are about 3 billion nucleotide bases in the human genome (which can be represented as a long string that looks like "ACATGACTTGAT..."). Since there are 4 bases by which all our DNA is expressed, the number of permutations in the human genome is 4^3,000,000,000.

For comparison, the estimated number of atoms in our observable universe is 10^80.

What's bigger: 4^(3,000,000,000) or 256^(36,000,000)?

I tried a couple of big-number calculators online (like this one from CASIO) but none of them seems to be able to handle numbers this big.

So if I try comparing their logarithms I get:

4^(3,000,000,000)
  =  4 ^ (3 * 10 ^ 9)
  = (3 * 10 ^ 9) * log(4)
  = (3 * 9 * log(10)) * log(4)
  = 27 * log(10) * log(4)
  ≃ 16

256^(36,000,000)
  = 256 ^ (36 * 10 ^ 6)
  = (36 * 10 ^ 6) * log(256)
  = (36 * 6 * log(10)) * log(256)
  = 216 * log(10) * log(256)
  ≃ 520

Did I do that right? If so, can it be said that there are about 33x more permutations in an iPhone picture than there are in the human genome?

Update

My math was indeed quite poor. Here's an updated attempt:

4^(3,000,000,000)
  =  4 ^ (3 * 10 ^ 9)

  log (4 ^ (3 * 10 ^ 9))
    = (3 * 10 ^ 9) * log(4)
    ≃ 1806179974

256^(36,000,000)
  = 256 ^ (36 * 10 ^ 6)

  log(256 ^ (36 * 10 ^ 6))
    = (36 * 10 ^ 6) * log(256)
    ≃ 86696639
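
A quick way to double-check these two base-10 logarithms is a few lines of Python. This is only a sketch; the variable names are made up for illustration, and the logarithms are computed directly so the huge numbers themselves are never built.

```python
import math

# log10(4 ** 3_000_000_000), using log(a ** b) = b * log(a)
log10_genome = 3_000_000_000 * math.log10(4)

# log10(256 ** 36_000_000)
log10_image = 36_000_000 * math.log10(256)

print(round(log10_genome))
print(round(log10_image))
print("genome count is larger:", log10_genome > log10_image)
```

Since the logarithm is increasing, whichever number has the larger logarithm is the larger number.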

So the number of permutations in the data that describes the human genome sequence, though most of them will not yield a human, is much larger than the number of permutations of pixels in an iPhone image (regardless of how imperceptibly different those permuted images may be). I was tempted to take the ratio of these two logarithms again but realized that it wouldn't represent the actual quotient. The actual quotient is:

(2^(6,000,000,000))/(2^(288,000,000))
  = 2^(6,000,000,000 - 288,000,000)
  = 2^(5,712,000,000)

So the data in the human genome has 2^(5,712,000,000) times as many permutations as the data in an iPhone picture.
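
Because 4 and 256 are both powers of two, the same comparison can be done with exact integer arithmetic on the exponents. The sketch below assumes nothing beyond the figures used above; `log2_of_power` is a hypothetical helper name, not a library function.

```python
def log2_of_power(base: int, exponent: int) -> int:
    """Return log2(base ** exponent) for a base that is an exact power of two."""
    bits_per_symbol = base.bit_length() - 1
    if base < 1 or (1 << bits_per_symbol) != base:
        raise ValueError("base must be a power of two")
    return bits_per_symbol * exponent


genome_log2 = log2_of_power(4, 3_000_000_000)   # exponent of 2 for the genome count
image_log2 = log2_of_power(256, 36_000_000)     # exponent of 2 for the image count

# Dividing powers of two subtracts their exponents.
quotient_log2 = genome_log2 - image_log2

print(f"genome   = 2^{genome_log2:,}")
print(f"image    = 2^{image_log2:,}")
print(f"quotient = 2^{quotient_log2:,}")
```

Working with the exponents avoids ever constructing the numbers themselves, which would each take hundreds of megabytes to hold in memory.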

There are 3 best solutions below

3
On

The other answers talk about the math, so I'll talk about whether you should be doing that math.

It's important to keep in mind that these are both very rough upper bounds. In the particular case of DNA, I know that there is a huge amount that is identical in virtually every human, which drastically cuts down the number of possibilities. Google tells me that $96\%$ of our DNA is shared with chimps (and therefore presumably at least that much is shared among humans), which would leave only about $4\%$ of the bases free to vary, reducing the exponent by a multiplicative factor of 0.04. In actuality, I suspect the proper factor is several orders of magnitude smaller.

On the flip side, I doubt any human can tell RGB $001100$ from RGB $001200$. I have no idea how to make the bound on this reasonably tight, but quick experimentation suggests an $8000$-fold reduction would be appropriate.

As is often the case in applied math, figuring out what math to do is a major challenge in and of itself. Drawing any kind of conclusion from these calculations would be wildly inappropriate, especially because my (vague) estimations actually switch which number is larger. I certainly wouldn't claim that either of my estimations is particularly close to the correct value.
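
To get a feel for how sensitive the comparison is, here is a rough Python sketch that recomputes both exponents (in bits) under one possible reading of the figures above. The assumptions are mine and only illustrative: that just 4% of the bases are free to vary, and that the 8000-fold reduction applies to each pixel's 256^3 colours separately. All variable names are made up.

```python
import math

GENOME_BASES = 3_000_000_000
PIXELS = 12_000_000
BITS_PER_BASE = 2      # 4 possible bases
BITS_PER_PIXEL = 24    # 256 values on each of 3 channels

# Assumption: only 4% of the bases actually vary between humans.
variable_bases = GENOME_BASES * 4 // 100
genome_bits = variable_bases * BITS_PER_BASE

# Assumption: each pixel has 8000 times fewer distinguishable colours.
image_bits = PIXELS * (BITS_PER_PIXEL - math.log2(8000))

# Fraction of variable bases at which the two exponents would be equal.
break_even_fraction = image_bits / (GENOME_BASES * BITS_PER_BASE)

print(f"genome exponent: about {genome_bits:,} bits")
print(f"image exponent:  about {image_bits:,.0f} bits")
print(f"break-even variable fraction: {break_even_fraction:.2%}")
```

Under these particular assumptions the genome exponent is still the larger one at 4%, and the ordering flips once the variable fraction drops below roughly 2.2%, so the conclusion depends entirely on which rough factor you believe.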

2
On

You've made a major arithmetic error, probably due in part to the fact you are using = to mean something more like "and then..." rather than to assert two things are equal.

For example,

$$ 256^{36,000,000} = 256^{36 \cdot 10^6}$$

You then wrote that

$$ 256^{36 \cdot 10^6} = 36 \cdot 10^6 \log(256)$$

which is clearly false; presumably, however, your intention was to say

$$ \log\left(256^{36 \cdot 10^6}\right) = 36 \cdot 10^6 \log(256)$$

so that the substitutions would let you conclude

$$ \log\left( 256^{36,000,000} \right) = 36 \cdot 10^6 \cdot \log(256)$$

Then, your big mistake was asserting

$$ 36 \cdot 10^6 \cdot \log(256) = 36 \cdot (6 \log(10)) \cdot \log(256) $$

Presumably, you were motivated by the fact that

$$ \log(10^6) = 6 \log(10) $$

But if you weren't using this weird shorthand, you would have realized that the quantity you are working with is

$$36 \cdot 10^6 \cdot \log(256) $$

which, in particular, does not have a $\log(10^6)$ to be rewritten.


To do the sort of rewriting you wanted to continue to do, you need to take logarithms again — that is, to compute

$$ \log(\log( 256^{36,000,000} ) ) = \ldots = \log( 36 \cdot 10^6 \cdot \log(256) )$$

The logarithm properties used to rewrite that last term will give something different than what you had written.
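
As a sketch of what that looks like, applying $\log(ab) = \log(a) + \log(b)$ together with $\log(10^6) = 6 \log(10)$ to the last term gives

$$ \log\left( 36 \cdot 10^6 \cdot \log(256) \right) = \log(36) + 6 \log(10) + \log(\log(256)) $$

so the $6 \log(10)$ is added to the other terms rather than multiplied by them.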

3
On

Well, $4 = 2^2$ so $$4^{3000000000} = 2^{2\times3000000000} = 2^{6000000000}$$

And, $256 = 2^8$ so $$256^{36000000} = 2^{8\times36000000} = 2^{288000000}$$

The first is a lot bigger.

Another way to look at this is how many bits would be needed to store the two data sets. One base of the genome can be encoded in $2$ bits so you need $6000000000$ bits to store the whole genome; that's 750MB which is about the capacity of a CD ROM. The image needs $12000000 \times 3 \times 8 = 288000000$ bits; that's 36MB and you can get about 20 on the same CD.

Also, I am not certain but I don't think that each of those pixels contains all three colour values; I think that they contain just one of them. So, the 36000000 in the second number should maybe be only 12000000.
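
The storage arithmetic can be sketched in Python as follows. It counts each channel value once at 8 bits, so the 3-channel case matches the exponent $288000000$ above, and it also covers the one-value-per-pixel case. The helper name `megabytes` is hypothetical, and 1MB is taken as 1,000,000 bytes.

```python
def megabytes(bits: int) -> float:
    """Convert a bit count to megabytes (1 MB = 1,000,000 bytes)."""
    return bits / 8 / 1_000_000


genome_bits = 3_000_000_000 * 2          # 2 bits per base
image_bits_rgb = 12_000_000 * 3 * 8      # 3 channel values per pixel, 8 bits each
image_bits_single = 12_000_000 * 8       # if each pixel holds only one colour value

genome_mb = megabytes(genome_bits)
image_mb_rgb = megabytes(image_bits_rgb)
image_mb_single = megabytes(image_bits_single)

print(f"genome:                  {genome_mb:.0f} MB")
print(f"image, 3 values/pixel:   {image_mb_rgb:.0f} MB")
print(f"image, 1 value/pixel:    {image_mb_single:.0f} MB")
```

Either way the uncompressed image is far smaller than the genome, so the ordering of the two exponents does not depend on which pixel layout is right.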