Maximum likelihood of Bernoulli


I'm trying to solve this exercise:

[Exercise image not available; from the answers below, the task is to find the maximum likelihood estimate of the parameter $\theta \in (0,1)^d$ of a multivariate Bernoulli distribution, given samples $\mathbf x_1, \ldots, \mathbf x_n \in \{0,1\}^d$.]

But I can't get to the correct answer. This is what I did:

$$\log p(X \mid \theta) = \sum_{i=1}^{d}\left(\log\left(\theta_i^{x_i}\right) + \log\left((1-\theta_i)^{1-x_i}\right)\right) \\ = \sum_{i=1}^{d} \left(x_i \log(\theta_i) + (1-x_i) \log (1-\theta_i)\right)$$
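(As a quick numerical sanity check, this log-likelihood is easy to evaluate; a minimal sketch assuming numpy, where the vectors `x` and `theta` are made-up placeholders:)

```python
# Single-sample Bernoulli log-likelihood from the formula above
# (a sketch: x and theta are made-up placeholders).
import numpy as np

x = np.array([1.0, 0.0, 1.0])      # one observation in {0,1}^d, d = 3
theta = np.array([0.6, 0.3, 0.9])  # parameter in (0,1)^d

log_p = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))
print(log_p)  # log p(x | theta)
```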

I look for the maximum, so:

$$\frac{\partial}{\partial \theta} \log p(X \mid \theta) = \sum_{i=1}^{d} \frac{x_i}{\theta_i} + \sum_{i=1}^{d} \frac{1-x_i}{1-\theta_i}$$

And:

$$\sum_{i=1}^{d} \frac{x_i}{{\hat{\theta}}_i} + \sum_{i=1}^{d} \frac{1-x_i}{1-\hat{\theta}_i} = 0$$

But when I solve that I can't get the correct answer.


There are 2 answers below.

Answer 1:

For samples $\mathbf x_1, \mathbf x_2, \ldots, \mathbf x_n$, each of which is a $d$-dimensional vector with coordinates in $\{0,1\}$, the likelihood function is $$ f(\theta; \mathbf x_1, \ldots, \mathbf x_n) = \prod_{k=1}^n\prod_{i=1}^d \theta_i^{\mathbf x_{k}(i)}(1-\theta_i)^{1-\mathbf x_{k}(i)}=\prod_{i=1}^d \theta_i^{\sum_{k=1}^n\mathbf x_{k}(i)}(1-\theta_i)^{n-\sum_{k=1}^n\mathbf x_{k}(i)}, $$ where $\mathbf x_{k}(i)$ denotes the $i$th coordinate of the vector $\mathbf x_{k}$.

The log-likelihood function is $$ L(\theta; \mathbf x_1, \ldots, \mathbf x_n) = \sum_{i=1}^d \left(\sum_{k=1}^n\mathbf x_{k}(i) \cdot\log(\theta_i)+\biggl(n-\sum_{k=1}^n\mathbf x_{k}(i)\biggr)\cdot\log(1-\theta_i)\right). $$ When differentiating with respect to $\theta_i$, all terms except the one containing $\theta_i$ vanish: for $i=1,\ldots,d$, $$ \frac{\partial}{\partial \theta_i}L(\theta; \mathbf x_1, \ldots, \mathbf x_n) = \frac{\sum_{k=1}^n\mathbf x_{k}(i)}{\theta_i}-\frac{n-\sum_{k=1}^n\mathbf x_{k}(i)}{1-\theta_i}=\frac{\sum_{k=1}^n\mathbf x_{k}(i)-n\theta_i}{\theta_i(1-\theta_i)}. $$ Note that $\left(\log(1-x)\right)'=-\frac{1}{1-x}$; this minus sign is missing from the derivative in the question.
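(One can verify this derivative formula against a finite-difference approximation; a minimal sketch where the data matrix `X`, the sample size `n`, the dimension `d`, and the parameter `theta` are all made up:)

```python
# Finite-difference check of the per-coordinate derivative above
# (a sketch: the data matrix X and the parameter theta are made up).
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = (rng.random((n, d)) < 0.4).astype(float)  # n hypothetical samples in {0,1}^d
S = X.sum(axis=0)                             # S_i = sum_k x_k(i)
theta = np.array([0.3, 0.5, 0.7])

def L(theta):
    # log-likelihood written with the coordinate-wise counts S_i
    return np.sum(S * np.log(theta) + (n - S) * np.log(1 - theta))

analytic = (S - n * theta) / (theta * (1 - theta))
eps = 1e-6
numeric = np.array([(L(theta + eps * np.eye(d)[i]) - L(theta - eps * np.eye(d)[i]))
                    / (2 * eps) for i in range(d)])
print(np.allclose(analytic, numeric))  # True, up to floating-point error
```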

For each $i$, the MLE $\hat\theta_i$ is the solution of $$ \sum_{k=1}^n\mathbf x_{k}(i)-n\hat\theta_i =0, $$ that is, $$ \hat\theta_i=\frac1n\sum_{k=1}^n\mathbf x_{k}(i). $$ Collecting the coordinates, we obtain the MLE for the vector $\theta$: $$ \hat\theta=\frac1{n}\sum_{k=1}^n\mathbf x_{k}. $$
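(As a final sanity check, the closed-form estimate can be compared with a numerical maximizer of the log-likelihood; a sketch reusing the hypothetical `X`, `n`, `d`, and `L` from the previous snippet, and assuming scipy is available:)

```python
# Compare the closed-form MLE with a numerical maximizer of the
# log-likelihood (reuses X, n, d, L from the sketch above; scipy assumed).
from scipy.optimize import minimize

theta_closed_form = X.mean(axis=0)  # hat(theta) = (1/n) sum_k x_k
res = minimize(lambda t: -L(t), x0=np.full(d, 0.5),
               bounds=[(1e-9, 1 - 1e-9)] * d)
print(theta_closed_form)
print(res.x)  # should agree with the sample mean
```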

Answer 2:

If you have a sample of size $n$, then the likelihood function is

$$L(\theta)=\prod_{k=1}^n \prod_{i=1}^d \theta_i^{x_{ik}}\cdot(1-\theta_i)^{1-{x_{ik}}} $$

Taking logs

$$\log\left(L(\theta) \right)=\sum_{k=1}^n \sum_{i=1}^d \left(x_{ik}\cdot \log(\theta_i)+(1-x_{ik})\cdot \log(1-\theta_i) \right)$$

Now we focus on $i=1$ to calculate the estimator for $\theta_1$. We can do that because the partial derivative with respect to each $\theta_i$ has the same form, so the remaining coordinates follow by the same calculation.

$$\frac{\partial }{\partial \theta_1}\left(\sum\limits_{k=1}^n \left(x_{1k}\cdot \log(\theta_1)+(1-x_{1k})\cdot \log(1-\theta_1) \right)\right)$$

$$=\frac{1}{\hat\theta_1}\cdot \sum\limits_{k=1}^{n}x_{1k}-\frac{1}{1-\hat\theta_1}\cdot \sum\limits_{k=1}^{n} (1-x_{1k})=0$$

$$(1-\hat\theta_1)\cdot \sum\limits_{k=1}^{n}x_{1k}= \hat\theta_1\cdot \sum\limits_{k=1}^{n} (1-x_{1k})$$

Expanding both sides and cancelling $\hat\theta_1\cdot\sum\limits_{k=1}^n x_{1k}$ gives $\sum\limits_{k=1}^n x_{1k}=n\,\hat\theta_1$, so $\hat\theta_1=\frac{\sum\limits_{k=1}^n x_{1k}}{n}$.

Thus $\hat\theta=(\hat \theta_1, \hat \theta_2, \hat \theta_3,\ldots,\hat \theta_d)^T=\frac1n\cdot \left( \sum\limits_{k=1}^n x_{1k},\sum\limits_{k=1}^n x_{2k},\sum\limits_{k=1}^n x_{3k},\ldots ,\sum\limits_{k=1}^n x_{dk}\right)^T$ $=\frac1n \sum\limits_{k=1}^n \left(x_{1k}, x_{2k}, x_{3k},\ldots , x_{dk}\right)^T=\frac1n\cdot \sum\limits_{k=1}^n\mathbf{x}_k$
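(The coordinate-wise and vector forms can be checked against each other numerically; a minimal sketch where `X` is a made-up $4\times 3$ matrix of 0/1 observations:)

```python
# Coordinate-wise vs. vector form of the estimator
# (a sketch: X is a made-up 4-by-3 matrix of 0/1 observations).
import numpy as np

X = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 1, 1],
              [0, 0, 0]], dtype=float)
n, d = X.shape

theta_hat = np.array([X[:, i].sum() / n for i in range(d)])  # hat(theta)_i
print(theta_hat)                               # [0.5  0.25 0.75]
assert np.allclose(theta_hat, X.mean(axis=0))  # equals (1/n) sum_k x_k
```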