What is an unbiased estimator, and what is the utility of Fisher information?


I am self-learning estimation theory and finding it quite difficult to grasp the utility of the Cramér–Rao lower bound (CRLB). Textbooks and online tutorials always say that one should derive the CRLB for an estimator. The CRLB is the inverse of the Fisher information; the variance of any unbiased estimator is greater than or equal to it. If an estimator's variance equals the CRLB, the estimator is said to be efficient, and no better unbiased estimator exists. Intuitively, an estimator is nothing but a formula or expression used to find an unknown value/parameter from data.

1) Is there a more intuitive way to explain what the CRLB tells us and why we need it?

2) What is meant by an efficient estimator, and efficient at doing what? Against what do we compare the efficiency of an estimator? These questions may be trivial, but I find it very difficult to extract the key ideas from heavily mathematical material.

3) Do we discard an estimator if it is inefficient?

Please correct me if any information is wrong. Thank you.


BEST ANSWER

Please begin by looking at the main answer (by @Learner) to this question. Then I have a simple example that may help illustrate.

Consider the problem of estimating the success probability $\theta$ of the distribution $\mathsf{Binom}(n, \theta)$ using information from $n$ trials. The maximum likelihood estimator of $\theta$ is $\hat \theta = X/n,$ where $X$ is the observed number of successes.

The PMF of $X$ is $$f(x\,|\,\theta) = {n \choose x}\theta^x(1-\theta)^{n-x},$$ for $x = 0, 1, \dots, n.$ Given $n$ and $\theta,$ the PMF allows you to calculate $P(X = x).$
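As a concrete check, here is a minimal sketch in plain Python (standard library only; the helper name `binom_pmf` is mine, not from the answer) that evaluates $f(x\,|\,\theta)$ directly:

```python
from math import comb

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binom(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Example matching the answer below: n = 5 trials, theta = 0.4
p = binom_pmf(2, 5, 0.4)
print(p)  # 10 * 0.4^2 * 0.6^3 = 0.3456
```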

Viewed as a function of $\theta$ for observed $x,$ this same relationship is known as the likelihood function: $$\ell(x\,|\,\theta) = {n \choose x}\theta^x(1-\theta)^{n-x} \propto \theta^x(1-\theta)^{n-x},$$ for $0 \le \theta\le 1.$ The symbol $\propto$ ("proportional to") indicates that for observed $x$ the constant factor ${n \choose x}$ has been omitted. Many authors take the point of view that the likelihood function is defined only 'up to a positive constant'. Then the maximum likelihood estimator (MLE) of $\theta$ is found by maximizing $\ell(x\,|\,\theta)$.
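The maximization can be sketched numerically (plain Python, a simple grid search of my own devising; the closed-form answer is $\hat\theta = x/n$):

```python
def likelihood(theta, n, x):
    # proportional to ell(x | theta); the constant comb(n, x) is dropped
    return theta**x * (1 - theta)**(n - x)

n, x = 5, 2
grid = [i / 1000 for i in range(1001)]          # theta in {0, 0.001, ..., 1}
theta_hat = max(grid, key=lambda t: likelihood(t, n, x))
print(theta_hat)  # 0.4, i.e. x / n
```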

Suppose $n = 5$ and $x = 2.$ Then the MLE $\hat \theta = 0.4$ is the value of $\theta$ that maximizes the curve in the left panel of the following figure. Also, if $n=50$ and $x = 20,$ then again the MLE is $\hat \theta = 0.4.$

But there is more 'information' in 50 observations than in five, so the likelihood curve in the right panel has a 'sharper' maximum.

[Figure: likelihood curves $\ell(x\,|\,\theta)$ for $n=5,\ x=2$ (left) and $n=50,\ x=20$ (right); both peak at $\hat\theta = 0.4,$ but the right curve has a sharper maximum.]

The magnitude of the second derivative of the log-likelihood near $\hat \theta = 0.4$ is greater on the right. Very roughly speaking, the Fisher information is the expected value of this curvature: the sharper the peak, the more information the data carry about $\theta$.
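This can be made concrete with a small sketch (plain Python; the finite-difference helper is my own). For $X \sim \mathsf{Binom}(n, \theta)$ the Fisher information is $I(\theta) = n/(\theta(1-\theta)),$ and the numerically computed curvature of the log-likelihood at $\hat\theta = 0.4$ matches it, growing tenfold when $n$ goes from 5 to 50:

```python
from math import log

def loglik(theta, n, x):
    # log-likelihood up to an additive constant log(comb(n, x))
    return x * log(theta) + (n - x) * log(1 - theta)

def curvature(n, x, theta, h=1e-5):
    # negative second derivative of loglik via a central difference
    return -(loglik(theta + h, n, x) - 2 * loglik(theta, n, x)
             + loglik(theta - h, n, x)) / h**2

theta_hat = 0.4
for n, x in [(5, 2), (50, 20)]:
    fisher = n / (theta_hat * (1 - theta_hat))   # I(theta) = n / (theta(1-theta))
    print(n, round(curvature(n, x, theta_hat), 2), round(fisher, 2))
```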

ANOTHER ANSWER

The Cramér–Rao bound is fundamental to statistics; let me try to explain why without using many symbols. An estimator does exactly what its name says: it estimates. Given the data you have seen, it tells you what you think a certain parameter really is. Thinking about estimators, one can then ask whether some are better than others. People dislike high variance: there is something unsettling about a quantity that changes a lot from sample to sample, and much of statistics is, as a guiding principle, about reducing variance. Wouldn't it be nice if, among all possible unbiased estimators of your parameter, you could find one with minimal variance, i.e. one that is more solid/stable? That is where the result of Cramér and Rao comes in: not only does it tell you that there is a lower bound on the variance of any unbiased estimator, it also tells you how to compute that bound. (The bound need not always be attained; when an unbiased estimator does achieve it, that estimator is called efficient.)
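To see the bound in action, here is a minimal simulation sketch (plain Python; the parameter values are my own choices). For $X \sim \mathsf{Binom}(n, \theta)$ the CRLB for unbiased estimators of $\theta$ is $1/I(\theta) = \theta(1-\theta)/n,$ and the sample proportion $\hat\theta = X/n$ attains it exactly, so its simulated variance should sit right at the bound:

```python
import random

random.seed(1)
n, theta, reps = 50, 0.4, 20000

# Simulate the MLE theta_hat = X / n many times
estimates = [sum(random.random() < theta for _ in range(n)) / n
             for _ in range(reps)]
mean = sum(estimates) / reps
var = sum((e - mean)**2 for e in estimates) / reps

crlb = theta * (1 - theta) / n   # = 1 / I(theta) = 0.0048
print(round(mean, 3), round(var, 5), crlb)
```

The simulated variance matches the CRLB because $\hat\theta = X/n$ happens to be efficient here; for many other models the MLE only approaches the bound as $n$ grows.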