I'm struggling with a part of a proof.
Let $A = \mathcal{N}(\mu, \Sigma)$ be an $n$-variate Gaussian, and let $R$ be an $n \times n$ rotation matrix. We can rotate this distribution by the rotation matrix via $\mathcal{N}(R \mu, R\Sigma R^T)$. Now I want to know: which rotation matrix minimizes the sum of the marginal standard deviations?
To formalize, we want to minimize the sum of the square roots of the diagonal elements: $$\text{argmin}_{R} \sum_{i=1}^n \sqrt{(R\Sigma R^T)_{i,i}}$$
Strong suspicion:
I have a strong suspicion that if we rotate our distribution such that it becomes uncorrelated (use the normalized principal component vectors as a new basis and construct $R$ such that we rotate to that basis), this sum is minimized. This appeared to be the case when I solved the problem numerically.
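For concreteness, here is a minimal NumPy sketch of that numerical check; the random covariance and the orthogonal samples are my own hypothetical test setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test instance: a random symmetric PSD covariance matrix.
n = 4
A = rng.normal(size=(n, n))
Sigma = A @ A.T

def sum_of_stds(R, Sigma):
    """Sum of marginal standard deviations after rotating by R."""
    return np.sqrt(np.diag(R @ Sigma @ R.T)).sum()

# Rotation to the principal-component basis: rows of R are the eigenvectors.
eigvals, eigvecs = np.linalg.eigh(Sigma)
R_pca = eigvecs.T
pca_value = sum_of_stds(R_pca, Sigma)  # equals sum of sqrt(eigenvalues)

# Compare against many random orthogonal matrices (QR of a Gaussian matrix);
# the PCA rotation should never be beaten.
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    assert sum_of_stds(Q, Sigma) >= pca_value - 1e-9
```

The loop samples orthogonal matrices rather than proper rotations only, but the objective is unchanged by reflections, so this does not affect the comparison.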
I'm getting stuck on the reasoning for why this holds.
Steps so far:
The trace of $R\Sigma R^T$ (i.e. the sum of marginal variances) is invariant under rotation: $\text{trace}(R\Sigma R^T) = \text{trace}(\Sigma R^T R) = \text{trace}(\Sigma)$.
Therefore, the problem "feels" a bit like minimizing $\sum_{i=1}^n |a_i|$ for a set of numbers under the constraint that $\sum_{i=1}^n a_i^2 = C$, which is typically achieved by making the $a_i$'s as unequal as possible (concentrating the mass in as few coordinates as possible), but that's as far as I got.
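A tiny numeric instance of that intuition, using two hypothetical vectors that share the same sum of squares:

```python
# With the sum of squares fixed, pushing the values apart shrinks the sum of
# absolute values: both vectors below satisfy sum(a_i^2) == 25.
even = [3.0, 4.0]     # sum of |a_i| is 7
spread = [5.0, 0.0]   # sum of |a_i| is 5
assert sum(x * x for x in even) == sum(x * x for x in spread) == 25.0
assert sum(abs(x) for x in spread) < sum(abs(x) for x in even)
```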
Maybe we can use the fact that the uncorrelated basis/principal component basis is used in PCA as it is the direction that explains the maximum amount of variance?
Another way I tried to look at it was to take an uncorrelated Gaussian and show that any rotation increases the sum of marginal standard deviations, but that didn't help much either.
Consider any $n\times n$ real symmetric PSD matrix $B=UDU^T$ where $U$ is orthogonal and $D$ is diagonal. Now collect the diagonal elements of $B$ in vector $\mathbf b$ and collect the eigenvalues of $B$ (diagonals of $D$) in vector $\mathbf d$.
1.) $\mathbf b\prec \mathbf d$

which reads: $\mathbf b$ is majorized by $\mathbf d$ (this is the Schur-Horn theorem).

It follows as a corollary of maximizing $\mathrm{tr}(Q^TBQ)$ subject to $Q^TQ=I$ for $n\times m$ matrices $Q$: the maximum equals the sum of the $m$ largest eigenvalues of $B$, while adding the constraint that the columns of $Q$ are standard basis vectors restricts the objective to sums of $m$ diagonal entries. This proves that for any $m\in \big\{1,2,\dots,n\big\}$
$\sum_{k=1}^m b_{[k]} \leq \sum_{k=1}^m d_{[k]}$
(where e.g. $b_{[k]}$ denotes the kth largest value in $\mathbf b$)
And recall for the case of $m=n$ that
$\sum_{k=1}^n d_{k}=\text{trace}\big(D\big)=\text{trace}\big(UDU^T\big)=\text{trace}\big(B\big)=\sum_{k=1}^n b_{k}$
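These partial-sum inequalities and the trace equality are easy to verify numerically; the random PSD matrix below is a hypothetical test instance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical test instance: a random symmetric PSD matrix B.
n = 5
A = rng.normal(size=(n, n))
B = A @ A.T

b = np.sort(np.diag(B))[::-1]             # diagonal entries, largest first
d = np.sort(np.linalg.eigvalsh(B))[::-1]  # eigenvalues, largest first

# Partial sums of the sorted diagonal are dominated by those of the eigenvalues,
for m in range(1, n + 1):
    assert b[:m].sum() <= d[:m].sum() + 1e-9
# while the full sums agree: both equal trace(B).
assert np.isclose(b.sum(), d.sum())
```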
2.) for $x\geq 0$ note that $x\mapsto x^\frac{1}{2}$ is concave (check the 2nd derivative), and a symmetric function built as a coordinatewise sum of a concave function is Schur-concave; thus $f:\mathbb R_{\geq 0}^n\longrightarrow \mathbb R$ given by

$f\big(\mathbf a\big)= \sum_{k=1}^n a_k^\frac{1}{2}$ is Schur-concave
Putting (1) and (2) together gives

$\text{trace}\big((B\circ I)^\frac{1}{2}\big)=f\big(\mathbf b\big)\geq f\big(\mathbf d\big) = \text{trace}\big(D^\frac{1}{2}\big)$

Applied to $B = R\Sigma R^T$ (whose eigenvalues equal those of $\Sigma$ for every rotation $R$), this says the sum of square-rooted diagonal entries is always at least the sum of square-rooted eigenvalues, with equality when $R$ diagonalizes $\Sigma$. This confirms your suspicion: the principal component basis minimizes the sum.
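A quick numerical sanity check of this final inequality, again on a hypothetical random PSD instance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test instance: a random symmetric PSD matrix B = U D U^T.
n = 6
A = rng.normal(size=(n, n))
B = A @ A.T

# trace((B o I)^(1/2)): sum of square roots of the diagonal entries.
lhs = np.sqrt(np.diag(B)).sum()
# trace(D^(1/2)): sum of square roots of the eigenvalues.
rhs = np.sqrt(np.linalg.eigvalsh(B)).sum()
assert lhs >= rhs - 1e-9
```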