What is the probability of having the same (binary) datasets?


Suppose we have $m$ binary data points as the outcome of a specific experiment, so the values of those points are fixed. We save the data in a file of $m$ points, each with value $0$ or $1$ (for example, our data is $0101$).

Then we damage a subset of the data file so that $n<m$ of the outcomes are randomly reassigned. We know that $n$ bits may have changed.

We also know which $n$ bits may have taken a new value (or kept the same value). For example, if we damage $0101$ and know that only the first bit changes randomly, the result could be $0101$ or $1101$.

My question is:

Can we say that this probability is given by $$p(\text{binary datasets are the same})=\frac{2^m - 2^n}{2^m}$$?

Alternatively, suppose that each data point has $r \in \{2,3,4,\ldots\}$ possible outcomes. There are $m$ data points, of which $n < m$ are missing. Can we say that the probability that our datasets (the complete one and the incomplete one) are the same is expressed as follows?

$$ p(\text{datasets are the same}| r,m,n) = \frac{r^m-r^n}{r^m}$$

Best answer

If I understand you correctly, you're comparing the $m-n$ known bits with another $m-n$ bits. If those bits are completely random (either the original $m-n$, the second $m-n$, or both), the probability that they are the same is $1/2^{m-n}$.

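EDIT: For the new question, where $n$ bits are changed randomly, the probability that the new dataset is the same as the original (which I presume is what you mean by "this probability") is $1/2^n$. Assuming each affected bit is redrawn uniformly at random, it independently keeps its original value with probability $1/2$, and the $n$ factors multiply. For general $r$ it would be $1/r^n$.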
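If you want to check the $1/r^n$ result numerically, here is a minimal sketch that enumerates every equally likely way of refilling the damaged positions and counts how often the original is recovered. The function name `prob_unchanged` and its parameters are made up for illustration, and the code assumes each damaged position is redrawn uniformly and independently from the $r$ possible values.

```python
from fractions import Fraction
from itertools import product


def prob_unchanged(original, damaged_positions, r=2):
    """Exact probability that the damaged dataset equals the original.

    Assumption: every damaged position is redrawn uniformly and
    independently from the r possible values 0, ..., r-1.
    """
    original = list(original)
    positions = list(damaged_positions)
    matches = 0
    total = 0
    # Enumerate every equally likely refill of the damaged positions.
    for values in product(range(r), repeat=len(positions)):
        candidate = list(original)
        for pos, val in zip(positions, values):
            candidate[pos] = val
        total += 1
        if candidate == original:
            matches += 1
    return Fraction(matches, total)


print(prob_unchanged([0, 1, 0, 1], [0]))                 # m = 4, n = 1, r = 2 -> 1/2
print(prob_unchanged([0, 1, 0, 1], [0, 2]))              # m = 4, n = 2, r = 2 -> 1/4
print(prob_unchanged([2, 0, 1, 1, 2], [1, 3, 4], r=3))   # m = 5, n = 3, r = 3 -> 1/27
```

For the example in the question ($0101$ with only the first bit damaged, so $m=4$, $n=1$) this gives $1/2$, whereas the proposed formula $(2^m-2^n)/2^m$ would give $14/16 = 7/8$. Note that the result does not depend on $m$ at all, only on $n$ and $r$.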