Estimation of population mean using probability density function


I'm currently learning about maximum likelihood estimation for the parameters of various distributions. I know that the MLE for the population mean of the normal distribution is the ordinary sample mean. But I can also compute the likelihood of a candidate value for the population mean by using the probability density function, which I multiply over the data points: $$L(\mu,\sigma \mid x_1,x_2,\dots,x_n) = L(\mu,\sigma \mid x_1)\cdot L(\mu,\sigma \mid x_2)\cdots L(\mu,\sigma \mid x_n)$$

$$ = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_1-\mu}{\sigma}\right)^2} \cdot \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_2-\mu}{\sigma}\right)^2} \cdots \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_n-\mu}{\sigma}\right)^2} $$

I would like to ask whether the code I wrote is a correct way to estimate the population mean: it plugs a series of candidate values into the mean parameter and computes the likelihood of each.

Thank you

set.seed(100)
sample <- rnorm(100, mean=30,sd=25)

#Likelihood of a tested pop_mean, computed as the product of the PDF values of the data points in sample
#sample -> vector of data points (x_1,...,x_n)
#mu -> candidate value for the population mean
#stdev -> a constant used for the standard deviation
estim_sample <- function(sample, mu, stdev){
  #dnorm is vectorized over its first argument, so lapply/unlist is unnecessary
  res_pdf <- dnorm(sample, mean=mu, sd=stdev)
  est <- prod(res_pdf)
  return(est)
}


#Create a vector of candidate values for the population mean
test_mu <- 1:100
#Run the estimation to see which candidate mean is most likely
#sapply returns a numeric vector directly, replacing lapply + unlist
lh_pop <- sapply(test_mu, function(m) estim_sample(sample, mu=m, stdev=24))

#Plot
df <- data.frame(test_val=test_mu, lh_estim=lh_pop)
#Named `best` rather than `max` to avoid masking base::max
best <- dplyr::filter(df, lh_estim==max(lh_estim))

plot(x=test_mu, y=log10(lh_pop))
abline(v=best$test_val)

BEST ANSWER

The code you wrote enables you to visually confirm that the mean is somewhere near the middle of the interval (20, 40). Since you know that you chose the mean to be 30, you can look at the figure and feel reassured that the code worked. Which it does!

But this code will have a hard time helping you identify the mean in general, since most of the time the maximum likelihood estimate of the mean will not be an integer. You're bringing in your knowledge of the problem's setup when you evaluate the output of the code; it is less useful in situations where you don't have that kind of advance knowledge.
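One partial workaround is simply a finer, non-integer grid. A sketch, reusing the question's setup (the `estim_sample` body is inlined so the snippet is self-contained):

```r
set.seed(100)
sample <- rnorm(100, mean = 30, sd = 25)

# Same likelihood as estim_sample in the question: product of normal densities
estim_sample <- function(sample, mu, stdev) {
  prod(dnorm(sample, mean = mu, sd = stdev))
}

# Step through candidate means in increments of 0.01 instead of whole numbers
fine_mu <- seq(20, 40, by = 0.01)
lh_fine <- sapply(fine_mu, function(m) estim_sample(sample, mu = m, stdev = 24))

# The grid maximizer now tracks mean(sample), the exact MLE, to within the grid spacing
fine_mu[which.max(lh_fine)]
```

The cost is that precision is still capped by the grid spacing, which motivates the calculus-based route described next.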

The value of the maximum likelihood approach lies in our ability to use differential calculus to estimate parameters without relying on a search method of the kind you're doing here. Typically, one takes the first derivative of the likelihood (or of a related function; more on that in a minute) and sets it equal to 0. That first-order condition, which identifies the critical values of the parameters, must be satisfied at the maximum if it lies in the interior of the parameter space, so starting there eliminates lots of possible values. It is not sufficient, however, since the critical values include more than the maximizer: they also include other things, like minimizers and saddle points. So, in general, we want to check the second derivative at each critical point and see whether it is negative; that is the second-order condition. When it holds, we have a maximizer, but not yet the maximizer, only a local one near that point. It could be global, or it could not be. We should therefore evaluate all the candidates and compare values: the critical point with a negative second derivative and the largest likelihood among all such candidates is the maximum likelihood estimate.

We sometimes know that our function is globally concave, i.e., has negative second derivatives everywhere. When that's true, we can skip the last two steps, since the only critical value will be the maximizer.
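For the normal-mean problem this is easy to check numerically. A sketch, assuming the same simulated sample as in the question (note it is the log-likelihood, not the raw likelihood, that is concave here): along an evenly spaced grid, the second differences of a concave function are negative.

```r
set.seed(100)
sample <- rnorm(100, mean = 30, sd = 25)

# Log-likelihood of mu, with the standard deviation held fixed at 24
loglik <- function(mu) sum(dnorm(sample, mean = mu, sd = 24, log = TRUE))

mus <- seq(10, 50, by = 1)
vals <- sapply(mus, loglik)

# Second differences approximate the second derivative; all are negative,
# consistent with the log-likelihood being globally concave in mu
all(diff(vals, differences = 2) < 0)
```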

In practice, we often do not want to work with the likelihood directly, since it is cumbersome to take the derivative of a really big product. Instead, we can first take the log of the likelihood function, turning the big product into a big sum, which is much easier to differentiate. This trick works because the log changes the shape of the function without moving its critical values around (log is a strictly increasing function). So we would really want to take the derivative of the log-likelihood.
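For the normal model in the question, with $\sigma$ treated as known, this works out in a few lines (a standard derivation, not part of the original answer):

$$\ell(\mu) = \log L(\mu,\sigma \mid x_1,\dots,x_n) = -n\log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

$$\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i$$

$$\frac{d^2\ell}{d\mu^2} = -\frac{n}{\sigma^2} < 0$$

So the sample mean satisfies the first-order condition, and since the second derivative is negative everywhere, the log-likelihood is globally concave in $\mu$ and the sample mean is the unique maximizer.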

Finally, it's worth saying that this procedure works best in simple examples. When likelihood functions get really complicated, and we therefore no longer have a globally concave function (or, worse, do not know whether we do), the first-order condition can cease to be the best tool for the job. Other optimizers, many of which still use information about derivatives, are often faster and more reliable.
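As a sketch of that numerical route in base R (the interval and fixed standard deviation mirror the question's setup): `optimize` performs one-dimensional minimization, so we hand it the negative log-likelihood.

```r
set.seed(100)
sample <- rnorm(100, mean = 30, sd = 25)

# Negative log-likelihood of mu, with the standard deviation held fixed at 24
negll <- function(mu) -sum(dnorm(sample, mean = mu, sd = 24, log = TRUE))

# One-dimensional minimization over the same interval as the grid search
fit <- optimize(negll, interval = c(0, 100))
fit$minimum   # numerically matches mean(sample), the closed-form MLE
```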