I'm currently studying the "Learning from Data" course by Professor Yaser Abu-Mostafa, and I don't get the "bias-variance tradeoff" part of it. Actually, the concepts are fine; the math is the problem.
In Lecture 08, he defines bias and variance as follows:
$\text{Bias} = \mathbb{E}_{\mathbf{x}}\left[(\bar{g}(\mathbf{x}) - f(\mathbf{x}))^2 \right]$, where $\bar{g}(\mathbf{x}) = \mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(\mathbf{x})\right]$
$\text{Var} = \mathbb{E}_{\mathbf{x}}\left[ \mathbb{E}_{\mathcal{D}}\left[( g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x}))^2\right] \right]$
To clarify the notation:
- $\mathcal{D}$ means the data set $(\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$.
- $g$ is the function that approximates $f$; i.e., I'm estimating $f$ by using $g$. In this case, $g$ is chosen by an algorithm $\mathcal{A}$ in the hypothesis set $\mathcal{H}$.
After that, he proposed an example that was stated in the following manner:
Example: Let $f(x) = \sin(\pi x)$ and a data set $\mathcal{D}$ of size $N = 2$. We sample $x$ uniformly on $[-1, 1]$ to generate $(\mathbf{x}_1, y_1)$ and $(\mathbf{x}_2, y_2)$, with $y_i = f(\mathbf{x}_i)$ (no noise). Now, suppose that I have two models, $\mathcal{H}_0$ and $\mathcal{H}_1$.
- $\mathcal{H}_0 : h(x) = b$
- $\mathcal{H}_1 : h(x) = ax + b$
For $\mathcal{H}_0$, let $b = \frac{y_1 + y_2}{2}$. For $\mathcal{H}_1$, choose the line that passes through $(\mathbf{x}_1, y_1)$ and $(\mathbf{x}_2, y_2)$.
Simulating the process as described, he states that:
- For $\mathcal{H}_0$: $\text{Bias} \approx 0.50$ and $\text{Var} \approx 0.25$.
- For $\mathcal{H}_1$: $\text{Bias} \approx 0.21$ and $\text{Var} \approx 1.69$.
Here is my main question: How can one get these results analytically?
I've tried to solve the integrals coming from the $\mathbb{E}[\cdot]$ operators, without success, and I'm not sure I'm matching each expectation with the right distribution. For example, how do I evaluate $\mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(\mathbf{x})\right]$? Is it the same as evaluating $\mathbb{E}_{\mathcal{D}}\left[b\right]$ or $\mathbb{E}_{\mathcal{D}}\left[ax+ b\right]$, for $\mathcal{H}_0$ and $\mathcal{H}_1$ respectively? Also, the random variable with uniform distribution over $[-1, 1]$ is $\mathbf{x}$, so $\mathbb{E}_{\mathbf{x}}[\cdot]$ is evaluated with respect to a random variable following the $U[-1, 1]$ distribution, right?
If anyone could help me understand at least one of the two scenarios by deriving the stated values of $\text{Bias}$ and $\text{Var}$, it would be extremely helpful.
Thanks in advance,
André
The answer to all your questions is “yes”. (Where you write “evaluating $\mathbb E_{\mathcal D}[b]$ or $\mathbb E_{\mathcal D}[ax+b]$ for $\mathcal H_0$ and $\mathcal H_1$”, $a$ and $b$ need to be computed from the data as given in the problem statement, e.g. $b=\frac{y_1+y_2}2$.)
I'll calculate the bias and variance for $\mathcal H_0$.
We have
\begin{eqnarray*} \bar g(x) &=& \mathbb E_{\mathcal D}\left[g^{(\mathcal D)}(x)\right] \\ &=&\int_{-1}^1\frac{\mathrm dx_1}2\int_{-1}^1\frac{\mathrm dx_2}2\frac{\sin\pi x_1+\sin\pi x_2}2 \\ &=& 0\;, \end{eqnarray*}
so the bias is
\begin{eqnarray*} \mathbb E_x\left[\left(\bar g(x)-f(x)\right)^2\right] &=& \mathbb E_x\left[f(x)^2\right] \\ &=& \int_{-1}^1\frac{\mathrm dx}2\sin^2\pi x \\ &=&\frac12 \end{eqnarray*}
and the variance is
\begin{eqnarray*} \mathbb E_x\left[\mathbb E_{\mathcal D}\left[\left(g^{(\mathcal D)}(x)-\bar g(x)\right)^2\right]\right] &=& \int_{-1}^1\frac{\mathrm dx}2\int_{-1}^1\frac{\mathrm dx_1}2\int_{-1}^1\frac{\mathrm dx_2}2\left(\frac{\sin\pi x_1+\sin\pi x_2}2\right)^2 \\ &=& \int_{-1}^1\frac{\mathrm dx_1}2\int_{-1}^1\frac{\mathrm dx_2}2\left(\frac{\sin\pi x_1+\sin\pi x_2}2\right)^2 \\ &=& \frac14\left(\int_{-1}^1\frac{\mathrm dx_1}2\sin^2\pi x_1+\int_{-1}^1\frac{\mathrm dx_2}2\sin^2\pi x_2\right) \\ &=& \frac14\;, \end{eqnarray*}
where the cross term $2\sin\pi x_1\sin\pi x_2$ vanishes because $\int_{-1}^1\sin\pi x_i\,\mathrm dx_i=0$, and each remaining square integrates to $\frac12$.
I don't know why they're given with $\approx$, as these are their exact values.
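If you want a numerical sanity check on these closed forms, a short Monte Carlo simulation reproduces them; this is a sketch of my own (variable names, sample sizes, and grid are my choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(np.pi * x)

# Draw many data sets D = {x1, x2}, each sampled uniformly on [-1, 1]
n_datasets = 200_000
x1 = rng.uniform(-1.0, 1.0, n_datasets)
x2 = rng.uniform(-1.0, 1.0, n_datasets)

# For H0, each data set yields the constant hypothesis g^(D)(x) = b
b = (f(x1) + f(x2)) / 2

# Grid of test points for the outer expectation E_x
xs = np.linspace(-1.0, 1.0, 1001)

g_bar = b.mean()                      # average hypothesis: a constant, close to 0
bias = np.mean((g_bar - f(xs)) ** 2)  # E_x[(g_bar(x) - f(x))^2]
var = np.mean((b - g_bar) ** 2)       # g is constant in x, so E_x is trivial

print(bias, var)  # close to the exact values 0.5 and 0.25
```

With $2\times10^5$ simulated data sets the estimates land well within a couple of percent of $\frac12$ and $\frac14$.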
For $\mathcal H_1$, you'll have more involved integrations, since you get $x_2-x_1$ in the denominator:
$$ a=\frac{y_2-y_1}{x_2-x_1}=\frac{\sin\pi x_2-\sin\pi x_1}{x_2-x_1} $$
and
$$ b=\frac{x_2y_1-x_1y_2}{x_2-x_1}=\frac{x_2\sin\pi x_1-x_1\sin\pi x_2}{x_2-x_1}\;. $$
Also, in this case you have an actual dependence on $x$, whereas for $\mathcal H_0$ the integration over $x$ for the variance was trivial since $g$ was constant.
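Since the $\mathcal H_1$ integrals are messy, you can at least confirm the lecture's numbers numerically. Here is a rough Monte Carlo sketch of my own (not course code); it fits the interpolating line to each simulated data set and averages over a grid of test points:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(np.pi * x)

n_datasets = 50_000
x1 = rng.uniform(-1.0, 1.0, n_datasets)
x2 = rng.uniform(-1.0, 1.0, n_datasets)
y1, y2 = f(x1), f(x2)

# Line through (x1, y1) and (x2, y2); a tie x1 == x2 is vanishingly
# unlikely with continuous draws, so no guard is needed in practice
a = (y2 - y1) / (x2 - x1)             # slope
b = (x2 * y1 - x1 * y2) / (x2 - x1)   # intercept

xs = np.linspace(-1.0, 1.0, 101)      # grid for E_x
G = a[:, None] * xs + b[:, None]      # row d = g^(D_d) evaluated on the grid

g_bar = G.mean(axis=0)                # \bar g(x), averaged over data sets
bias = np.mean((g_bar - f(xs)) ** 2)  # E_x[(g_bar(x) - f(x))^2]
var = np.mean((G - g_bar) ** 2)       # inner E_D, then outer E_x

print(bias, var)  # close to the lecture's 0.21 and 1.69
```

Note that the slope stays bounded (by the mean value theorem, $|a|\le\pi$ even as $x_2\to x_1$), so the simulation is numerically well behaved despite the $x_2-x_1$ denominator.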