Minimizing the variance in a variant of bagging: Weighted Aggregation (Wagging)


In our machine learning course we have learned bagging, and a variant of bagging called Weighted Aggregation (Wagging) was introduced, where the result is a weighted sum of all the estimators instead of the average. Namely, each estimator produces $y_i = h_i(x)$, and the predicted label is $$\bar{Y} = \sum_i w_i y_i$$ where $\sum_i w_i = 1$ and $w_i \ge 0$.

Since bagging is used primarily to reduce variance, we have been asked how to choose the $w_i$'s to minimize $\text{Var}(\bar{Y})$ under the assumptions $\text{Var}(y_i)=\sigma^2$ and $\text{Cov}(y_i,y_j)=\rho \sigma^2$ for $i \ne j$. The result is quite straightforward and intuitive, $w_i=\frac{1}{m}$, where $m$ denotes the number of estimators. But how can it be proven mathematically? We have been asked to prove it both with and without a Lagrange multiplier. Can anybody shed some light? I've been working on it without any luck.
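To convince myself that uniform weights are at least empirically optimal, I ran a quick NumPy sanity check. The variance of the weighted prediction is $w^\top C w$, where $C$ has $\sigma^2$ on the diagonal and $\rho\sigma^2$ off the diagonal; the values of $m$, $\sigma^2$, $\rho$ below are arbitrary illustrative choices:

```python
import numpy as np

# Covariance matrix of the estimators: sigma^2 on the diagonal,
# rho * sigma^2 off the diagonal (m, sigma2, rho are arbitrary choices).
m, sigma2, rho = 5, 2.0, 0.3
C = rho * sigma2 * np.ones((m, m)) + (1 - rho) * sigma2 * np.eye(m)

def var_of(w):
    # Var(w^T y) = w^T C w
    return w @ C @ w

w_uniform = np.full(m, 1 / m)

# Every other weight vector on the simplex gives at least as much variance.
rng = np.random.default_rng(0)
for _ in range(1000):
    w = rng.dirichlet(np.ones(m))
    assert var_of(w) >= var_of(w_uniform) - 1e-12
```

For uniform weights this evaluates to $\frac{\sigma^2}{m} + \frac{m-1}{m}\rho\sigma^2$, which matches what the closed-form answer should give.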

What I have now is $$ \min\ \sigma^2 \sum_{i=1}^{m} w_i^2 + 2\rho\sigma^2 \sum_{1\le i < j \le m} w_i w_j, \quad \text{subject to } \sum_{i=1}^{m} w_i = 1. $$ The Lagrangian is $$ \mathcal{L}(w_1,\dots,w_m, \lambda) = \sigma^2 \sum_{i=1}^{m} w_i^2 + \rho \sigma^2 \sum_{i \ne j} w_i w_j - \lambda \left(1 - \sum_{i=1}^{m} w_i\right). $$ Taking the partial derivative w.r.t. each $w_i$ and $\lambda$ and setting them to $0$, I get $$ \frac{\partial \mathcal{L}}{\partial w_1} = 2w_1\sigma^2 + 2(w_2 + w_3 + \dots + w_m) \rho \sigma^2 + \lambda = 0 \\ \frac{\partial \mathcal{L}}{\partial w_2} = 2w_2\sigma^2 + 2(w_1 + w_3 + \dots + w_m) \rho \sigma^2 + \lambda = 0 \\ \vdots \\ \frac{\partial \mathcal{L}}{\partial \lambda} = \sum_{i=1}^{m} w_i - 1 = 0 $$ In matrix form this is $$ \begin{bmatrix} 2\sigma^2 & 2\rho\sigma^2 & \dots & 2\rho\sigma^2 & 1 \\ 2\rho\sigma^2 & 2\sigma^2 & \dots & 2\rho\sigma^2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 2\rho\sigma^2 & 2\rho\sigma^2 & \dots & 2\sigma^2 & 1 \\ 1 & 1 & \dots & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} $$ so the weights could in principle be obtained by inverting this matrix. Since the equations for the $w_i$'s all have the same form, it is easy to see that $w_1=w_2=\dots=w_m$ and thus $w_i=\frac{1}{m}$, but how should I justify this rigorously? I am stuck. Also, there is apparently a better way to arrive at the solution, but I am struggling to find it.
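As a numerical check of the linear system above, I built the $(m+1)\times(m+1)$ KKT matrix and solved it with NumPy (the values of $m$, $\sigma^2$, $\rho$ are again arbitrary choices), and the solution does come out uniform:

```python
import numpy as np

# KKT system from the Lagrangian: row i reads
#   2*sigma^2*w_i + 2*rho*sigma^2*sum_{j != i} w_j + lambda = 0,
# and the last row is the constraint sum_i w_i = 1.
# m, sigma2, rho below are arbitrary illustrative choices.
m, sigma2, rho = 4, 1.5, 0.25

A = np.zeros((m + 1, m + 1))
# Top-left block: 2*sigma^2 on the diagonal, 2*rho*sigma^2 off it.
A[:m, :m] = 2 * rho * sigma2 * np.ones((m, m)) + 2 * (1 - rho) * sigma2 * np.eye(m)
A[:m, m] = 1.0   # coefficient of lambda in each stationarity equation
A[m, :m] = 1.0   # constraint row: the weights sum to 1

b = np.zeros(m + 1)
b[m] = 1.0       # right-hand side of the constraint

sol = np.linalg.solve(A, b)
w = sol[:m]
assert np.allclose(w, 1 / m)  # uniform weights solve the system
```

This only verifies the claim for one choice of parameters, of course; the symmetry argument still needs to be made rigorous.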