Maximum Likelihood Question (Bernoulli or Binomial p.d.f.?)



Bernoulli: $f(x \mid p) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$

Binomial: $f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x \in \{0, 1, \ldots, n\}$

I'm calculating the maximum likelihood estimate of the parameter $p$, the proportion of purchases made by women. Since there are 70 trials of the Bernoulli experiment, is the MLE a function of the 2nd p.d.f. $f(x \mid n, p)$ with $n = 70$? I know that I need to set the derivative of the log-likelihood (with respect to $p$) to 0 and solve for $p$, but when I take the derivative and plug in 70 for $n$, I get $(x/p) + (70-x)/(p-1)$, which still includes $x$. I'm not sure whether this $x$ is still an individual observation $x_i$, since the p.d.f. has the $\binom{n}{x}$ factor. This is where I get lost...

My second approach was to use the Bernoulli p.d.f., but that made my MLE of $p$ equal to the mean, which is 35, and that didn't seem right since there is definitely a skew.

My question is: **which p.d.f. do I use to figure out the MLE?**

2 Answers

Accepted Answer

The likelihood function is the PDF, with $x$ regarded as fixed (observed) and $p$ as a variable. You want to solve for $p$ in terms of $x$ and $n.$ The factor ${n \choose x}$ is often omitted because it is a constant (it does not depend on $p$).

Using the log-likelihood function, you have found the expression $(x/p) + (n - x)/(p - 1),$ i.e. $(x/p) - (n - x)/(1 - p).$ Set it to 0 and solve for $p$ to get the answer $\hat p = x/n,$ where the $\hat{}$ recognizes that $x/n$ is the maximum likelihood estimator of $p.$
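Spelled out, the algebra from the critical-point equation to the estimator is just:

$$\frac{x}{p} - \frac{n-x}{1-p} = 0 \;\Longrightarrow\; x(1-p) = (n-x)p \;\Longrightarrow\; x = np \;\Longrightarrow\; \hat p = \frac{x}{n}.$$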

A computer demonstration in R may be useful to visualize what is going on. In R, the binomial PDF is dbinom. For fixed $x = 58$ and $n = 70$, we regard it as a function of $p$ by evaluating it on a fine 'grid' of many values of $p$ and finding the value that maximizes the likelihood function.

n = 70;  x = 58;  p = seq(0, 1, by=.0001)
like = dbinom(x, n, p)
p[like == max(like)]   # MLE from grid search
## 0.8286
x/n                    # MLE from derived formula
## 0.8285714

plot(p, like, type="l", lwd = 3, col="blue")
abline(v = x/n, col="red")
abline(h = 0, col="green3")

The plot below shows the likelihood function, with its maximum value indicated by the vertical line.

[Plot: likelihood function for $n = 70,$ $x = 58,$ with a vertical line at $\hat p = x/n$]

Notice that the likelihood function is sharply curved in the vicinity of $\hat p = x/n.$ As you continue your study of maximum likelihood estimators you will find that this sharp curvature has to do with 'goodness' of the MLE as an estimate of $p.$ If $n = 35$ and $x = 29$ we have less information, $\hat p$ is the same, and the curvature is not quite so sharp.

[Plot: likelihood function for $n = 35,$ $x = 29,$ showing less sharp curvature at the maximum]
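To make the curvature comparison concrete (this is an added sketch, not part of the original answer): differentiating the log-likelihood twice and evaluating at the maximum gives

$$\ell''(p) = -\frac{x}{p^2} - \frac{n-x}{(1-p)^2}, \qquad \ell''(\hat p) = -\frac{n}{\hat p\,(1 - \hat p)},$$

so with $\hat p$ held fixed the curvature at the maximum scales linearly with $n$: the $n = 70$ likelihood is twice as sharply curved at its peak as the $n = 35$ one, matching the two plots.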

Second Answer

The short answer to your bolded question is, "either."

The way you write up the solution depends on which model you use. The difference is in how you regard the sample. With the Bernoulli model, you would regard the sample as a sequence of $n = 70$ observations $$\boldsymbol x = (x_1, x_2, \ldots, x_{70}),$$ where each $x_i \in \{0,1\}$ is an independently and identically distributed Bernoulli random variable such that $\Pr[x_i = 1] = p$ represents the probability that person $i$ in the sample is a woman. Since each $x_i$ has distribution $f_{x_i}(x \mid p) = p^x (1-p)^{1-x}$, the joint distribution of the entire sample is given by $$f_{\boldsymbol X}(\boldsymbol x \mid p) = \prod_{i=1}^{70} p^{x_i} (1-p)^{1-x_i} = p^{\sum_{i=1}^{70} x_i} (1-p)^{70 - \sum_{i=1}^{70} x_i}.$$ The log-likelihood is therefore $$\ell( p \mid \boldsymbol x) = S \log p + (70 - S) \log(1-p),$$ where $S = \sum_{i=1}^{70} x_i$ is the sum of the sample, which also happens to be the number of women that purchased the cereal in the sample. The maximum likelihood estimate $\hat p$ of $p$ then occurs at a critical point of $\ell$, and it is easy to verify that $$\hat p = \frac{S}{70} = \frac{1}{70} \sum_{i=1}^{70} x_i = \bar x,$$ the sample proportion.

Now, what if we had used the binomial model? Well, in such a case, the sample consists of a single observation $X$, namely the observed number of women that purchased the cereal out of the total number of people observed. That is to say, $$X \sim \operatorname{Binomial}(n = 70, p)$$ and the likelihood of $p$ given $X = x$ is $$\mathcal L(p \mid x) = f_X(x \mid p) = \binom{70}{x} p^x (1-p)^{70-x}.$$ The log-likelihood is $$\ell(p \mid x) = \log \binom{70}{x} + x \log p + (70-x) \log(1-p).$$ And you will immediately notice that this log-likelihood is similar to the one we calculated in the Bernoulli model, but with two differences: first, the sample total $S$ we used in the Bernoulli model is actually the random variable $X$ in the binomial model; and second, there is an additive term $\log \binom{70}{x}$ in the log-likelihood of the binomial model. This additive term, however, does not affect the calculation of the MLE, because it is constant with respect to the parameter to be estimated, namely $p$. Therefore, when calculating the critical points, this log-likelihood yields the same result: $$\hat p = \frac{x}{70} = \bar x,$$ the sample proportion of women.
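As a quick numerical sanity check (a Python sketch added here, mirroring the grid-search idea from the R demonstration in the accepted answer; the function names are my own), both log-likelihoods are maximized at the same grid point, because they differ only by the constant $\log \binom{n}{x}$:

```python
import math

n, x = 70, 58  # 58 of 70 purchases made by women, as in the accepted answer

def bernoulli_loglik(p):
    # log-likelihood of the Bernoulli model with sample total S = x
    return x * math.log(p) + (n - x) * math.log(1 - p)

def binomial_loglik(p):
    # binomial log-likelihood: same as above plus the constant log C(n, x)
    return math.log(math.comb(n, x)) + bernoulli_loglik(p)

grid = [i / 10000 for i in range(1, 10000)]  # open interval (0, 1)
p_bern = max(grid, key=bernoulli_loglik)
p_binom = max(grid, key=binomial_loglik)

print(p_bern, p_binom, x / n)  # both grid maxima agree with x/n = 0.82857...
```

The additive constant shifts every value of the binomial log-likelihood by the same amount, so the argmax cannot change; the grid search simply confirms this.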

This problem illustrates a number of interesting facts: first, that the binomial model is in some sense "related" to the Bernoulli model (the meaning and nature of this relationship is clearer in the context of a discussion of sufficient statistics, but is beyond the scope of this question). Second, a likelihood function is only unique up to proportionality (and by extension, a log-likelihood is only unique up to the addition of a scalar constant).