Variance of a Random Forest


I'm reading through Introduction to Statistical Learning (ISL) and I'm having trouble understanding the variance of a random forest. Does anyone know how this variance is derived? What is $\rho$?

[Image: the quoted passage from the book]


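The quote says $\rho$ is the pairwise correlation, i.e. the correlation between any two of the random variables (in a random forest, the predictions of the individual trees). In other words, if each $X_i$ has mean $\mu$ and variance $\sigma^2$, then $\mathbb E[(X_i-\mu)(X_j-\mu)]=\rho\sigma^2$ when $i \neq j$ and $\sigma^2$ when $i=j$.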
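
In matrix form, this says the covariance matrix of $(X_1,\dots,X_B)$ has $\sigma^2$ on the diagonal and $\rho\sigma^2$ everywhere else, and the variance of the average is the sum of all its entries divided by $B^2$. The Python sketch below builds that matrix and compares the sum against the closed form; the function names and the example values $B=10$, $\sigma^2=4$, $\rho=0.3$ are just illustrative choices, not anything from the book.

```python
def covariance_matrix(B, sigma2, rho):
    """B x B covariance matrix: sigma2 on the diagonal, rho * sigma2 off the diagonal."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(B)] for i in range(B)]


def variance_of_average(cov):
    """Var((1/B) * sum_i X_i) = (1/B^2) * (sum of all entries of the covariance matrix)."""
    B = len(cov)
    return sum(sum(row) for row in cov) / B**2


def closed_form(B, sigma2, rho):
    """rho * sigma^2 + (1 - rho) / B * sigma^2."""
    return rho * sigma2 + (1 - rho) * sigma2 / B


B, sigma2, rho = 10, 4.0, 0.3
print(variance_of_average(covariance_matrix(B, sigma2, rho)))
print(closed_form(B, sigma2, rho))
```

Both lines print (up to floating-point rounding) $1.48$: the diagonal contributes $10\cdot 4 = 40$, the $90$ off-diagonal entries contribute $90\cdot 1.2 = 108$, and $148/100 = 1.48$.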

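So if we have $B$ such random variables, the variance of their average is obtained by expanding the square into a double sum and splitting it into the $B^2-B$ cross terms with $i\neq j$ (each equal to $\rho\sigma^2$) and the $B$ diagonal terms with $i=j$ (each equal to $\sigma^2$):
$$\begin{aligned}
\mathbb E\left[\left(\frac1B\sum_{i=1}^B X_i-\mu\right)^2\right]
&= \frac1{B^2}\,\mathbb E\left[\sum_{i=1}^B (X_i-\mu)\sum_{j=1}^B (X_j-\mu)\right]\\
&= \frac1{B^2}\left(\sum_{i\neq j}\mathbb E[(X_i-\mu)(X_j-\mu)]+\sum_{i=1}^B\mathbb E[(X_i-\mu)^2]\right)\\
&= \frac1{B^2}\left((B^2-B)\rho\sigma^2+B\sigma^2\right)\\
&= \rho\sigma^2+\frac{1-\rho}{B}\sigma^2.
\end{aligned}$$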
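
To check the result numerically, here is a rough Monte Carlo sketch in Python. It assumes Gaussian variables constructed as $X_i=\mu+\sigma\left(\sqrt{\rho}\,Z+\sqrt{1-\rho}\,\varepsilon_i\right)$ with one shared $Z$ and independent $\varepsilon_i$, all standard normal. This construction is only an illustrative assumption (it requires $0\le\rho\le 1$); it gives each $X_i$ variance $\sigma^2$ and each pair covariance $\rho\sigma^2$. The derivation itself does not need normality. The function name and parameters are hypothetical.

```python
import random


def simulate_variance_of_average(B, sigma, rho, mu=0.0, n_trials=50_000, seed=0):
    """Estimate Var of the average of B equicorrelated variables by simulation.

    Hypothetical helper: X_i = mu + sigma * (sqrt(rho) * Z + sqrt(1 - rho) * eps_i),
    with Z shared across all i, so Var(X_i) = sigma^2 and Cov(X_i, X_j) = rho * sigma^2.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)  # common component shared by all B variables
        total = 0.0
        for _ in range(B):
            eps = rng.gauss(0.0, 1.0)  # independent component for this variable
            total += mu + sigma * (rho**0.5 * z + (1 - rho) ** 0.5 * eps)
        averages.append(total / B)
    mean = sum(averages) / n_trials
    # Unbiased sample variance of the simulated averages
    return sum((a - mean) ** 2 for a in averages) / (n_trials - 1)


B, sigma, rho = 10, 2.0, 0.3
estimate = simulate_variance_of_average(B, sigma, rho)
theory = rho * sigma**2 + (1 - rho) * sigma**2 / B
print(estimate, theory)
```

With these settings the estimate should land close to the theoretical $1.48$. Increasing `B` shrinks only the second term, so the variance levels off at $\rho\sigma^2$ rather than going to zero.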