Definition of "bias" in machine learning models


In the estimation of a parameter, say the average of a population, the definition of "bias" is very clear: it is the difference between the expected value of the estimator (averaged over random samples) and the true value of the parameter.

In machine learning models the same term "bias" (as in the bias-variance tradeoff) is used, but I have seen many different definitions of it. Is there a standard definition?

I have seen it defined as:

  1. The bias at a specific point $x$ is the average difference (over samples) between $\hat f(x)$ and $f(x)$.
  2. As before, but the average difference is taken over all data points $x$ rather than at a specific point.
  3. The difference between the best $\hat f$ in the hypothesis class and $f$.
  4. The difference between $\hat f$ and $f$ as the number of records in the sample goes to infinity.
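Definitions 1 and 2 can be made concrete by Monte Carlo simulation. A minimal sketch of what I mean (the toy setup is my own: true $f(x) = x^2$, a deliberately misspecified linear fit, and an arbitrary grid and noise level):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2           # assumed true function
xs = np.linspace(-1, 1, 21)    # evaluation grid
n, sigma, reps = 50, 0.1, 2000

preds = np.empty((reps, xs.size))
for r in range(reps):
    x = rng.uniform(-1, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    coef = np.polyfit(x, y, deg=1)     # deliberately misspecified linear fit
    preds[r] = np.polyval(coef, xs)

pointwise_bias = preds.mean(axis=0) - f(xs)  # definition 1: bias at each x
overall_bias = pointwise_bias.mean()         # definition 2: averaged over x
```

In this toy case the two definitions disagree sharply: the pointwise bias is large and positive near $x=0$ and large and negative near $x=\pm 1$, while the bias averaged over the grid is close to zero.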

I would also like to know whether we should use different definitions for regression, where it makes sense to talk about a systematic error (because we can both underestimate and overestimate the value), and for classification, where every error is systematic because there is only one type of error that can be made.

A final question is how all this applies to classification with KNN. Conventional wisdom says that a higher $k$ gives a higher bias, but applying the above definitions I am not convinced this should always be true.


Best answer

Is there a standard definition?

As you showed yourself, there isn't; you will have to figure out what each author means by "bias" from context. To add to your list: in my experience, the most common thing people mean when they talk about model bias is actually inductive bias, which is

In machine learning, the term inductive bias refers to a set of (explicit or implicit) assumptions made by a learning algorithm in order to perform induction, that is, to generalize a finite set of observations (training data) into a general model of the domain. Without a bias of that kind, induction would not be possible, since the observations can normally be generalized in many ways. Treating all these possibilities equally, i.e., without any bias in the sense of a preference for specific types of generalization (reflecting background knowledge about the target function to be learned), predictions for new situations could not be made. — Hüllermeier et al.

This covers essentially any architectural design choice you make when constructing a learning algorithm.

Now, to address your other two points:

classification where every error is systematic because there is only one type of error that can be made.

I do not agree with the assertion that every classification error is systematic. Mislabelling, a kind of random error, is present in almost any larger-scale classification dataset.

Regarding KNN: at least for KNN regression there is a simple closed form for the bias-variance tradeoff:

$$ \operatorname {E} [(y-{\hat {f}}(x))^{2}\mid X=x]=\underbrace{\left(f(x)-{\frac {1}{k}}\sum _{i=1}^{k}f(N_{i}(x))\right)^{2}}_{\texttt{BIAS}^2}+\underbrace{\frac {\sigma ^{2}}{k}}_{\texttt{VAR}}+\underbrace{\sigma^{2}}_{\texttt{Bayes err.}} $$

Here the squared bias term typically grows with $k$, since the average $\frac{1}{k}\sum_{i=1}^{k} f(N_i(x))$ includes increasingly distant neighbours and drifts away from $f(x)$, while the variance $\sigma^2/k$ is monotonically decreasing in $k$.
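The decomposition can be checked numerically. A minimal simulation sketch (the true function, query point, and fixed design below are my own toy choices): with the training inputs held fixed and only the noise resampled, the squared bias at a query point grows with $k$ while the variance tracks $\sigma^2/k$.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)    # assumed true function
n, sigma, reps = 200, 0.3, 500
x0 = 0.25                              # query point
x_train = np.linspace(0, 1, n)         # fixed design: only the noise resamples

# neighbour ordering of x0 is fixed, since the design is fixed
order = np.argsort(np.abs(x_train - x0))

results = {}
for k in (1, 5, 25, 100):
    idx = order[:k]
    preds = np.array([(f(x_train) + rng.normal(0, sigma, n))[idx].mean()
                      for _ in range(reps)])
    bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
    var = preds.var()                     # variance, approximately sigma^2 / k
    results[k] = (bias2, var)
```

In this setup the estimated variance at $k=1$ is close to $\sigma^2$ and shrinks roughly like $\sigma^2/k$, while the squared bias is negligible for small $k$ and becomes substantial by $k=100$, where half the design enters the average.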