Folks, I have a basic information theory question. I am fitting a highly parameterized model to some data. In general, the model has the form:
$$ y = \sum\limits_{i=1}^{13} \alpha_i X_i $$
Currently I use gradient descent to find $\alpha$. The usual assumption would be that the $X_i$ are i.i.d. However, I am not sure this holds for my features, so I need a method to analyse the correlation between the $\alpha_i$s. Also, if the feature space is highly correlated, what would be principled approaches to reducing model complexity? Thanks
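
To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It runs on synthetic data, since I can't share my real features. Everything in it is an assumption for illustration: the 500 samples, the near-duplicate feature, the 99% variance cutoff, and the choice of diagnostics (pairwise correlation, variance inflation factors, condition number, PCA-style eigenvalues). For least squares, the covariance of the fitted $\alpha$ is proportional to $(X^\top X)^{-1}$, so correlated features should show up as correlated coefficients.

```python
import numpy as np

# Synthetic stand-in for the real design matrix (assumption: 500 samples, 13 features).
rng = np.random.default_rng(0)
n, p = 500, 13
X = rng.normal(size=(n, p))
# Deliberately make feature 1 an almost exact copy of feature 0.
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)

# Pairwise Pearson correlation between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Variance inflation factors: the diagonal of the inverse correlation matrix.
# Values far above 1 flag features that are well explained by the others.
vif = np.diag(np.linalg.inv(corr))

# Condition number of the standardized design matrix; large values
# indicate near-collinearity and unstable coefficient estimates.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
cond = np.linalg.cond(Xs)

# PCA view: eigenvalues of the correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(corr)[::-1]
explained = np.cumsum(eigvals) / eigvals.sum()
# Number of components needed to retain 99% of the variance (hypothetical cutoff).
k = int(np.searchsorted(explained, 0.99) + 1)

print("corr(X_0, X_1):", round(corr[0, 1], 4))
print("VIF per feature:", np.round(vif, 1))
print("condition number:", round(cond, 1))
print("components for 99% variance:", k, "of", p)
```

If diagnostics like these do show strong collinearity, the options I'm aware of are dropping redundant features, projecting onto the leading principal components, or adding an L1/L2 penalty to the fit. I'd like to know which of these counts as principled here.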