Random sample S from a population: $P(s \in k)$?

28 Views Asked by At

Consider the population {1, ...,N}. We get sample S by drawing m times with replacement from the population. Show that $P(k \in S)=\frac{m}{N}+\mathcal O(\frac{1}{N^2})$, if $N$ is large.

I am given that $f(n)=\mathcal O(g(n))$ if there exsists a constant $M$ such that for all $n$ $f(n)/g(n)\leq M$.

Intuitively, I do understand that if you get a sample with replacement that the chance that element $k$ is part of sample $S$ is $\frac{m}{N}$ plus something extra because $k$ can be drawn more than once. However, I do not understand what that function $\mathcal O(1/N^2)$ means.

1

There are 1 best solutions below

0
On BEST ANSWER

Write $f(N)$ for the something extra, i.e. $$ f(N) = P(k\in S) - \frac mN \ . $$ The assertion is that $f(N)=\mathcal O({1\over N^2})$. Just as you described, this means that there exists $M$ such that $$ |N^2f(N)|\le M $$ for all $N$. (I'm putting the absolute value around the LHS in case $f(N)$ is negative.) In words, the claim is that the extra piece, as a function of $N$, grows no faster than $1/N^2$.

So how do you demonstrate that something is $\mathcal O(1/N^2)$? Here's an example. For fixed $m$, consider the expression $$ \left(1-\frac1N\right)^m\tag1 $$ viewed as a function of $N$. This is the product of $m$ terms, each of the form $1-1/N$. If you expand out (1) as a polynomial in $1/N$, you'll get $$ \left(1-\frac1N\right)^m = 1 + {c_1\over N} + {c_2\over N^2} + \cdots + {c_m\over N^m} $$ for some constants $c_1,\ldots,c_m$. (Try for $m=2$ or $m=3$). The point of the $\mathcal O$ notation is to lump together the smaller order terms into a tidier expression that summarizes their growth as a function of $N$. We write $$ \left(1-\frac1N\right)^m = 1 + {c_1\over N}+{\mathcal O}(\frac1{N^2})\ .\tag2 $$ In words, the expression (1) behaves like $1+{c_1\over N}$ plus a remainder that grows no faster than $1/N^2$. Why $\mathcal O(1/N^2)$? Because the terms that we've replaced with the $\mathcal O$ expression are indeed bounded by a constant times $1/N^2$: $$ \left|{c_2\over N^2}+{c_3\over N^3} +\cdots+{c_m\over N^m}\right|\le {M\over N^2}\tag3 $$ One way to see why (3) is true is to multiply the LHS of (3) by $N^2$ and take the limit as $N\to\infty$. When we do this, $N^2$ times the LHS tends to the constant $c_2$, which implies that $N^2\cdot\mbox{(LHS)}$ must be a bounded sequence. This is exactly the requirement for something to be $\mathcal O(1/N^2)$.

How does this apply to your problem? Hint: the probability that $k$ is not a member of $S$ is $(1-\frac1N)^m$.