Minimizing the variance in a variant of bagging: Weighted Aggregation (Wagging)


In our machine learning course we have learned bagging, and a variant of bagging called Weighted Aggregation (Wagging) was introduced, where the result is a weighted sum of all the estimators instead of the average. Namely, each estimator produces $y_i = h_i(x)$, and the predicted label is $$\bar{Y} = \sum_i w_i y_i$$ where $\sum_i w_i = 1$ and $w_i \ge 0$.

Since bagging is used primarily to reduce variance, we have been asked how to choose the $w_i$'s to minimize $\text{Var}(\bar{Y})$ under the assumptions $\text{Var}(y_i)=\sigma^2$ and $\text{Cov}(y_i,y_j)=\rho \sigma^2$ for $i \ne j$. The result is quite straightforward and intuitive, $w_i=\frac{1}{m}$, where $m$ denotes the number of estimators. But how can it be proven mathematically? We have been asked to prove it both with and without a Lagrange multiplier. Can anybody shed some light? I've been working on it without any luck.
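To convince myself that uniform weights are at least empirically optimal, I ran a quick NumPy sanity check. The variance of the weighted prediction is $w^\top C w$, where $C$ has $\sigma^2$ on the diagonal and $\rho\sigma^2$ off the diagonal; the values of $m$, $\sigma^2$, $\rho$ below are arbitrary illustrative choices:

```python
import numpy as np

# Covariance matrix of the estimators: sigma^2 on the diagonal,
# rho * sigma^2 off the diagonal (m, sigma2, rho are arbitrary choices).
m, sigma2, rho = 5, 2.0, 0.3
C = rho * sigma2 * np.ones((m, m)) + (1 - rho) * sigma2 * np.eye(m)

def var_of(w):
    # Var(w^T y) = w^T C w
    return w @ C @ w

w_uniform = np.full(m, 1 / m)

# Every other weight vector on the simplex gives at least as much variance.
rng = np.random.default_rng(0)
for _ in range(1000):
    w = rng.dirichlet(np.ones(m))
    assert var_of(w) >= var_of(w_uniform) - 1e-12
```

For uniform weights this evaluates to $\frac{\sigma^2}{m} + \frac{m-1}{m}\rho\sigma^2$, which matches what the closed-form answer should give.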

What I have now is $$ \min\ \sigma^2 \sum_{i=1}^{m} w_i^2 + 2\rho\sigma^2 \sum_{1\le i < j \le m} w_i w_j, \quad \text{subject to } \sum_{i=1}^{m} w_i = 1. $$ The Lagrangian is $$ \mathcal{L}(w_1,\dots,w_m, \lambda) = \sigma^2 \sum_{i=1}^{m} w_i^2 + \rho \sigma^2 \sum_{i \ne j} w_i w_j - \lambda \left(1 - \sum_{i=1}^{m} w_i\right). $$ Taking the partial derivative w.r.t. each $w_i$ and $\lambda$ and setting them to $0$, I get $$ \frac{\partial \mathcal{L}}{\partial w_1} = 2w_1\sigma^2 + 2(w_2 + w_3 + \dots + w_m) \rho \sigma^2 + \lambda = 0 \\ \frac{\partial \mathcal{L}}{\partial w_2} = 2w_2\sigma^2 + 2(w_1 + w_3 + \dots + w_m) \rho \sigma^2 + \lambda = 0 \\ \vdots \\ \frac{\partial \mathcal{L}}{\partial \lambda} = \sum_{i=1}^{m} w_i - 1 = 0 $$ In matrix form this is $$ \begin{bmatrix} 2\sigma^2 & 2\rho\sigma^2 & \dots & 2\rho\sigma^2 & 1 \\ 2\rho\sigma^2 & 2\sigma^2 & \dots & 2\rho\sigma^2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 2\rho\sigma^2 & 2\rho\sigma^2 & \dots & 2\sigma^2 & 1 \\ 1 & 1 & \dots & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} $$ so the weights could in principle be obtained by inverting this matrix. Since the equations for the $w_i$'s all have the same form, it is easy to see that $w_1=w_2=\dots=w_m$ and thus $w_i=\frac{1}{m}$, but how should I justify this rigorously? I am stuck. Also, there is apparently a better way to arrive at the solution, but I am struggling to find it.
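As a numerical check of the linear system above, I built the $(m+1)\times(m+1)$ KKT matrix and solved it with NumPy (the values of $m$, $\sigma^2$, $\rho$ are again arbitrary choices), and the solution does come out uniform:

```python
import numpy as np

# KKT system from the Lagrangian: row i reads
#   2*sigma^2*w_i + 2*rho*sigma^2*sum_{j != i} w_j + lambda = 0,
# and the last row is the constraint sum_i w_i = 1.
# m, sigma2, rho below are arbitrary illustrative choices.
m, sigma2, rho = 4, 1.5, 0.25

A = np.zeros((m + 1, m + 1))
# Top-left block: 2*sigma^2 on the diagonal, 2*rho*sigma^2 off it.
A[:m, :m] = 2 * rho * sigma2 * np.ones((m, m)) + 2 * (1 - rho) * sigma2 * np.eye(m)
A[:m, m] = 1.0   # coefficient of lambda in each stationarity equation
A[m, :m] = 1.0   # constraint row: the weights sum to 1

b = np.zeros(m + 1)
b[m] = 1.0       # right-hand side of the constraint

sol = np.linalg.solve(A, b)
w = sol[:m]
assert np.allclose(w, 1 / m)  # uniform weights solve the system
```

This only verifies the claim for one choice of parameters, of course; the symmetry argument still needs to be made rigorous.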