Difference between bias and variance


I am confused about the variance and bias of a function. How can one tell whether a function is overfitting or underfitting, and how can you write formulas that express that? In machine learning, if an approximator has better results on training data and worse results on unseen data, then it overfits (it learns individual points of the data); if it is doing terribly on the training set and the loss (MSE) is high, that means it underfits. Can you give me a formula, or concepts, or anything similar that would help me understand bias and variance in statistics?


Best Answer

The simple explanation is that you seem to be trying to predict some out-of-sample value $\theta$ where your estimator is $\hat \Theta$, and your loss will be proportional to $(\hat \Theta - \theta)^2$, so you should aim to minimise $\mathbb E\left[\left(\hat \Theta - \theta\right)^2\right]$.

You can rewrite this as a sum: $\mathbb E\left[\left(\hat \Theta - \theta\right)^2\right] = \left(\mathbb E\left[\hat \Theta - \theta\right]\right)^2 +\mathbb E\left[\left(\hat \Theta - \mathbb E\left[\hat \Theta\right] \right)^2\right]$ where

  • $\left(\mathbb E\left[\hat \Theta - \theta\right]\right)^2$ is the square of the expected bias $\mathbb E\left[\hat \Theta - \theta\right]$
  • $\mathbb E\left[\left(\hat \Theta - \mathbb E\left[\hat \Theta\right] \right)^2\right]$ is the variance of the estimator $\hat \Theta$, ignoring how accurate it is

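This decomposition can be checked numerically. The sketch below (my own illustration, not part of the answer) uses a deliberately shrunk estimator of a normal mean, $0.9\,\bar x$, which trades a little bias for lower variance, and confirms that the empirical MSE equals squared bias plus variance:

```python
import numpy as np

# Monte Carlo check that MSE = bias^2 + variance.
# Estimator: 0.9 * sample mean -- biased, but with reduced variance.
rng = np.random.default_rng(0)
theta, n, trials = 2.0, 20, 200_000

samples = rng.normal(theta, 1.0, size=(trials, n))
theta_hat = 0.9 * samples.mean(axis=1)  # shrunken estimator of theta

mse = np.mean((theta_hat - theta) ** 2)
bias_sq = np.mean(theta_hat - theta) ** 2
variance = np.var(theta_hat)  # ddof=0, matching the empirical decomposition

print(mse, bias_sq + variance)  # these agree (up to floating-point error)
```

The identity holds exactly for the empirical moments, not just in expectation, because $\frac1m\sum(z_i - c)^2 = (\bar z - c)^2 + \frac1m\sum(z_i - \bar z)^2$ for any constant $c$.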
This illustrates that you should not just try to make your estimator unbiased, nor just try to minimise the variance of your estimator, but should take both into account at the same time. Your machine learning model can be tuned to attempt this by methods such as cross-validation on your training set, and can do so without considering either the bias or the variance explicitly, by concentrating directly on $\mathbb E\left[\left(\hat \Theta - \theta\right)^2\right]$.

As an illustration of taking both into account at the same time, and of the fact that this is a broader question than over- or under-fitting: if you are trying to estimate the variance of a normally distributed random variable of unknown mean and variance, the estimator $\hat \sigma^2_{n-1} = \frac1{n-1} \sum (x_i-\bar x)^2$ has the merit of being unbiased and of having the smallest variance of all unbiased estimators. But it does not minimise $\mathbb E[(\hat \sigma^2 - \sigma^2)^2]$; on that criterion the best estimator would be $\hat \sigma^2_{n+1} = \frac1{n+1} \sum (x_i-\bar x)^2$, even though this is biased downwards with $\mathbb E[\hat \sigma^2_{n+1} - \sigma^2] = -\frac{2\sigma^2}{n+1}$. Most machine learning questions are more complicated than this and so do not lend themselves to simple analysis, but the concept of finding the best model to minimise out-of-sample error is similar.
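The closing example can also be verified by simulation. This sketch (my own, with arbitrary choices of $\sigma^2 = 4$ and $n = 10$) compares the mean squared error of the unbiased $\frac1{n-1}$ estimator against the biased $\frac1{n+1}$ estimator over many repeated samples:

```python
import numpy as np

# Monte Carlo comparison: for normal data with unknown mean, dividing the
# sum of squares by n+1 gives a biased variance estimator whose MSE is
# lower than that of the unbiased n-1 version.
rng = np.random.default_rng(1)
sigma2, n, trials = 4.0, 10, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

mse_unbiased = np.mean((ss / (n - 1) - sigma2) ** 2)  # theory: 2*sigma2^2/(n-1)
mse_shrunk = np.mean((ss / (n + 1) - sigma2) ** 2)    # theory: 2*sigma2^2/(n+1)

print(mse_unbiased, mse_shrunk)  # the n+1 divisor has the smaller MSE
```

With these settings the theoretical values are $2\sigma^4/(n-1) \approx 3.56$ versus $2\sigma^4/(n+1) \approx 2.91$, so the gap is well outside Monte Carlo noise.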