Expected squared error of ensemble (bagging)


I am working through the Deep Learning book chapter on regularization (https://www.deeplearningbook.org/contents/regularization.html#pf5).

On page 253 there is a derivation of the expected squared error of the ensemble predictor where the predictor is composed of $k$ regression models.

We are given that each model makes an error $\epsilon_i$ on each example with errors drawn from a zero-mean multivariate normal distribution. We are also given that the variance is $E(\epsilon_i^2)=v$ and the covariance is $E(\epsilon_i \epsilon_j)=c$.

Hence the error made by the average prediction of all the ensemble models is $\frac{1}{k}\sum_i \epsilon_i$.

My question regards the following claim that the expected squared prediction error of the ensemble is thus:

$\begin{align} E((\frac{1}{k}\sum_i \epsilon_i)^2)&=\frac{1}{k^2}E(\sum_i(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j))\\ &=\frac{1}{k}v + \frac{k-1}{k}c. \end{align}$

I am fine with the first line, but how do we get from $\frac{1}{k^2}E(\sum_i(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j))$ to $\frac{1}{k}v + \frac{k-1}{k}c$?

It seems like (assuming we have $N$ examples) the correct equality should be

$$\frac{1}{k^2}E(\sum_i(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j))=\frac{N}{k^2}v +\frac{N(N-1)}{k^2}c.$$

I know that to make this match the equality given in the book, my $N$s should be $k$s... what am I missing here? We are summing over examples, not models.

There are 2 answers below.


I think the problem stems from the very confusing formulation in the text that "each model makes an error $\epsilon_i$ on each example," which leaves it unclear whether the index $i$ refers to models or examples. As I interpret the text, this index does indeed refer to models, and the expectation is taken with respect to examples, which are not indexed but treated as statistically distributed. The inner sums therefore run over the $k$ models, producing factors of $k$ and $k(k-1)$.
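Under that reading (index $i$ running over the $k$ models), the counting can be spelled out term by term:

$$\frac{1}{k^2}E\left(\sum_i\Big(\epsilon_i^2+\sum_{j\neq i}\epsilon_i\epsilon_j\Big)\right)=\frac{1}{k^2}\left(\sum_i v+\sum_i\sum_{j\neq i}c\right)=\frac{kv+k(k-1)c}{k^2}=\frac{1}{k}v+\frac{k-1}{k}c.$$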


I believe the number of examples $N$ plays no role here. We simply want the expected squared error, on a single given example, of an ensemble built from $k$ models.

$$ E\left[\left(\frac{1}{k}\sum_{i}\epsilon_{i}\right)^{2}\right]=E\left[\frac{1}{k^{2}}\sum_{i}\epsilon_{i}^{2}+\frac{1}{k^{2}}\sum_{i\ne j}\epsilon_{i}\epsilon_{j}\right]. $$ Here, we have $k$ variance ($\epsilon_i^2$) terms, and $2{k\choose2}$ covariance ($\epsilon_i\epsilon_j$) terms.
Now, $2{k\choose2} = k(k-1)$. Therefore, \begin{align*} E\left[\left(\frac{1}{k}\sum_{i}\epsilon_{i}\right)^{2}\right] & =\frac{1}{k^{2}}\sum_{i}\left(E\left[\epsilon_{i}^{2}\right]\right)+\frac{1}{k^{2}}\sum_{i\ne j}\left(E\left[\epsilon_{i}\epsilon_{j}\right]\right)\\ & =\frac{kv}{k^{2}}+\frac{k(k-1)c}{k^{2}}. \end{align*}

This gives the desired result.
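As a sanity check (not from the book), the identity $E[(\frac{1}{k}\sum_i\epsilon_i)^2]=\frac{1}{k}v+\frac{k-1}{k}c$ can be verified numerically by sampling correlated errors from a zero-mean multivariate normal whose covariance matrix has $v$ on the diagonal and $c$ off the diagonal. The parameter values below are arbitrary illustrations:

```python
import numpy as np

# Monte-Carlo check of E[(mean error)^2] = v/k + (k-1)/k * c.
# k, v, c are hypothetical values chosen so the covariance matrix
# is positive semi-definite (requires -v/(k-1) <= c <= v).
rng = np.random.default_rng(0)
k, v, c = 5, 2.0, 0.5

cov = np.full((k, k), c)   # covariance c between distinct models
np.fill_diagonal(cov, v)   # variance v for each model

# Draw many examples; each row holds the k models' errors on one example.
eps = rng.multivariate_normal(np.zeros(k), cov, size=1_000_000)

empirical = np.mean(eps.mean(axis=1) ** 2)  # E[(mean of errors)^2]
theory = v / k + (k - 1) / k * c

print(empirical, theory)  # the two values should agree closely
```

Note that as $c \to v$ (perfectly correlated models) the expression reduces to $v$, and as $c \to 0$ (independent models) it reduces to $v/k$, matching the book's discussion of when bagging helps.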