Unbiased data vis-a-vis unbiased estimator


I have recently started studying statistics on my own. Please clarify the following.

Does unbiased data mean the spread will be larger? Say, for a normal pdf curve: more spread ==> unbiased data ==> more variance? (Please correct me if I have understood this wrongly. My thinking is that if all the data are to be represented, there should be representation from all sections, so the spread of the data will be larger.)

Does biased data mean a normal pdf curve with less spread ==> less variance?

If I am thinking correctly, the definition of an efficient estimator is one whose variance approaches the Cramér–Rao lower bound (which means its variance is small).

But for a data sample to be unbiased, the spread is larger, which means the variance is larger.

These two seem to contradict each other. Please explain; I must have misunderstood something.
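To make the terms in the question concrete: "unbiased" refers to the expected value of an *estimator*, not to the spread of the data, and for the mean of a normal sample the Cramér–Rao lower bound is σ²/n. A minimal simulation sketch (the values of mu, sigma, n and the trial count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 50
trials = 20000

# Draw many independent samples and compute the sample mean each time.
means = np.array([rng.normal(mu, sigma, n).mean() for _ in range(trials)])

# Unbiased: the average of the estimates is close to the true mean mu,
# regardless of how spread out the data itself is.
print(means.mean())

# Efficient: the variance of the estimator is close to the
# Cramér–Rao lower bound sigma**2 / n for the mean of a normal sample.
print(means.var(), sigma**2 / n)
```

So a widely spread dataset (large σ²) and an unbiased, low-variance estimator are not in conflict: the data's variance and the estimator's variance are different quantities.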


I believe you are talking about the bias-variance tradeoff, which is a general concept in statistics and machine learning.

The bias-variance tradeoff in extremely simple terms:

If our model has only a single parameter, we can use all the "information" in the data to estimate that one parameter. We can then expect the estimate of that single parameter to be fairly accurate. However, reality is a complex place, and we do not expect this simple model to describe the real world accurately. This model is biased.

We can seek to reduce the bias by increasing the model complexity. This introduces additional parameters, whose true values must be estimated from the observed data. With several parameters, the "information" in the data is divided among them. Because each parameter now receives only a fraction of the available information, its estimate is more likely to deviate from the true value. This complex model is less biased but harder to "train" from data.
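The tradeoff described above can be simulated directly. A minimal sketch, assuming (purely for illustration) that the truth is y = x² plus Gaussian noise: a one-parameter constant model versus a degree-9 polynomial, each refit on many freshly sampled datasets, comparing bias and variance of the prediction at a fixed point.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
f = lambda t: t**2          # assumed true underlying function

# Refit each model on many resampled datasets and record
# the prediction at a fixed test point x0.
x0, preds_simple, preds_complex = 0.8, [], []
for _ in range(2000):
    y = f(x) + rng.normal(0, 0.3, x.size)
    preds_simple.append(np.mean(y))        # one-parameter model: a constant
    c = np.polyfit(x, y, 9)                # ten-parameter model: degree-9 polynomial
    preds_complex.append(np.polyval(c, x0))

preds_simple = np.array(preds_simple)
preds_complex = np.array(preds_complex)

# Simple model: low variance but biased (its average prediction misses f(x0)).
# Complex model: nearly unbiased but with much higher variance.
print(abs(preds_simple.mean() - f(x0)), preds_simple.var())
print(abs(preds_complex.mean() - f(x0)), preds_complex.var())
```

The printed numbers show the pattern from the paragraphs above: the constant model's predictions barely vary between datasets but are systematically wrong, while the polynomial's predictions are right on average but scatter widely from one dataset to the next.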