Bernoulli of a Bernoulli variable.


Let $X_1, X_2, \ldots$ be iid Bernoulli($\theta$), where $\theta$ is unknown. We want to estimate $\theta$. However, we do not observe $X_1, \ldots$, but instead observe $Y_1, Y_2, \ldots$, where $Y_i=X_i$ with probability $p$ and $Y_i=1-X_i$ with probability $1-p$, where $p\in(0,1)$ is known. Find a consistent sequence of estimators for $\theta$ based on $Y_1, \ldots$.

Here is what I have. We can use MLE to estimate $\theta$. Then we need to know $P(Y_i|\theta)=\sum_{X_i}P(Y_i|X_i)P(X_i|\theta)=\sum_{X_i}p^{X_i}(1-p)^{1-X_i}\theta^{X_i}(1-\theta)^{1-X_i}=p\theta+(1-p)(1-\theta)$. It seems weird that $P(Y_i|\theta)$ has nothing to do with $Y_i$. Can somebody help me point out where I went wrong?

Best answer:

Let's take the $n=1$ case first:

$$P(Y=1) = pP(X=1) + (1-p)P(X=0)=p\theta + (1-p)(1-\theta) = 2p\theta-\theta - p +1$$

Collecting the terms in $\theta$, we get a linear function of $\theta$:

$$c(\theta;p) := P(Y=1|\theta,p) = \theta(2p-1)+1-p$$

This means that $Y$ has a $\textrm{Bernoulli}(c(\theta;p))$ distribution.
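As a quick sanity check (a hypothetical simulation, not part of the original answer; the names `c` and `sample_y` are illustrative), we can verify by Monte Carlo that $Y$ really is Bernoulli with parameter $c(\theta;p)$:

```python
import random

def c(theta, p):
    """P(Y = 1) as a function of theta and p: theta*(2p - 1) + 1 - p."""
    return theta * (2 * p - 1) + 1 - p

def sample_y(theta, p, n, rng):
    """Draw n observations: X_i ~ Bernoulli(theta), flipped w.p. 1 - p."""
    ys = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        ys.append(x if rng.random() < p else 1 - x)
    return ys

rng = random.Random(0)
theta, p, n = 0.3, 0.8, 200_000
ys = sample_y(theta, p, n, rng)
# Empirical P(Y = 1) vs. c(theta; p) = 0.3*0.6 + 0.2 = 0.38
print(sum(ys) / n, c(theta, p))
```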

Since each $Y_i$ depends only on $X_i$ (and an independent flip), and the $X_i$ are iid, the $Y_i$ are also iid.

Let $S_Y(n):=\sum_{i=1}^n Y_i$; then $S_Y(n) \sim \textrm{Binomial}(n,c(\theta;p))$.

The MLE of $c(\theta;p)$ is $\widehat{c(\theta;p)}=\frac{S_Y(n)}{n} := \bar{Y_n}$ as per the usual MLE of the binomial parameter.

However, you are looking for the MLE of $\theta$ not $c(\theta;p)$. Here's where the transformation invariance of the MLE comes in handy:

Given $p \neq \tfrac{1}{2}$, $c(\theta;p)$ is 1-to-1 in $\theta$, therefore $\hat{\theta}_{MLE} = c^{-1}(\widehat{c(\theta;p)})$, which we get as follows:

$$\widehat{c(\theta;p)} = \bar{Y_n} = \theta(2p-1)+1-p$$ $$c^{-1}(\bar{Y_n}) = \frac{\bar{Y_n}-(1-p)}{2p-1} = \hat{\theta}_{MLE}$$
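A minimal sketch of this estimator (hypothetical function names; assuming $p \neq 0.5$), with a consistency check on simulated data:

```python
import random

def estimate_theta(ys, p):
    """Plug-in MLE: invert c, giving (ybar - (1 - p)) / (2p - 1)."""
    ybar = sum(ys) / len(ys)
    return (ybar - (1 - p)) / (2 * p - 1)

def noisy_sample(theta, p, n, rng):
    """X_i ~ Bernoulli(theta), each reported correctly with probability p."""
    out = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        out.append(x if rng.random() < p else 1 - x)
    return out

rng = random.Random(1)
theta, p = 0.7, 0.9
for n in (100, 10_000, 1_000_000):
    est = estimate_theta(noisy_sample(theta, p, n, rng), p)
    print(n, round(est, 4))  # estimates approach theta = 0.7 as n grows
```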

This is the same result that @snoop got in the comments using method of moments -- but I wanted to address how you'd get this in a maximum likelihood framework.

Note some peculiarities with the MLE. If $p=0.5$ then $Y$ is effectively destroying all the information we get about $X$, so the MLE for $\theta$ is undefined. As $p\to 0.5$ the estimator becomes very erratic: the raw plug-in estimate often overshoots or undershoots $[0,1]$, pushing the MLE to the boundary $\{0,1\}$.

Basically, as you approach complete information destruction $(p=0.5)$ you will typically need an ever larger sample to get an MLE that falls in $(0,1)$.
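To illustrate this breakdown (a hypothetical simulation; the choice $p=0.52$, $n=50$ is just one setting where the effect is visible), we can count how often the raw plug-in estimate lands outside $[0,1]$ when $p$ is near $0.5$:

```python
import random

def estimate_theta(ys, p):
    """Raw plug-in estimate (ybar - (1 - p)) / (2p - 1), not clipped to [0, 1]."""
    ybar = sum(ys) / len(ys)
    return (ybar - (1 - p)) / (2 * p - 1)

rng = random.Random(2)
theta, p, n, trials = 0.7, 0.52, 50, 2000
outside = 0
for _ in range(trials):
    ys = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        ys.append(x if rng.random() < p else 1 - x)
    if not 0 <= estimate_theta(ys, p) <= 1:
        outside += 1
# With p this close to 0.5 and only n = 50 observations,
# a large fraction of raw estimates fall outside [0, 1].
print(outside / trials)
```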