If $X_1, \ldots, X_n$ are iid $N(0,1)$, or in other words $\mathbf{X}=(X_1, \ldots, X_n)$ is distributed $N(\mathbf{0}, \mathbf{I})$, then $A\mathbf{X}+\mu$ is distributed $N(\mu, AA^t)$. That the covariance matrix becomes $AA^t$ is a well-known mathematical result.
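(As a sanity check on the statement itself, here is a quick numerical sketch with numpy; the particular $A$ and $\mu$ are arbitrary choices of mine, used only for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# An arbitrary matrix A and shift mu, chosen only for illustration.
A = np.array([[2.0, 0.5, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.3, 1.5]])
mu = np.array([1.0, -2.0, 0.5])

# Draw many iid N(0, I) vectors X and transform them to AX + mu.
X = rng.standard_normal((1_000_000, n))  # rows are samples of X
Y = X @ A.T + mu                         # rows are samples of AX + mu

# The empirical covariance of AX + mu matches A A^t up to sampling error.
print(np.allclose(np.cov(Y, rowvar=False), A @ A.T, atol=0.05))  # True
```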
But I am having trouble seeing intuitively why the rigorous mathematics should end up with $AA^t$. For example, consider the related case of $X=(X_1,X_2)$ distributed $N(0,C)$, where
$$C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$ and $\sigma_1^2=1,\sigma_2^2=2$.
And consider the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$ and the random vector $AX$. The random vector $X$ has a density that looks like the left image (horribly drawn and not to scale). The effect of $A$ on $X$ should look something like the right image, and since $A$ acts differently on the two axes, why should the resulting covariance matrix be $ACA^t$?
The covariance matrix has the spectral decomposition $C=UDU^t$, with $U$ orthogonal and $D$ diagonal. How can one see that the new covariance matrix $ACA^t=AUDU^tA^t$ has the intuitively expected orthogonal basis in its spectral decomposition?
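(For concreteness, here is a small numpy sketch with the $C$ and $A$ above: it confirms the sample covariance of $AX$ matches $ACA^t$, and lets one inspect the orthonormal eigenbasis of $ACA^t$ directly. Note that in general this eigenbasis is *not* $AU$, since $AU$ need not be orthogonal.)

```python
import numpy as np

rng = np.random.default_rng(0)

C = np.diag([1.0, 2.0])        # sigma_1^2 = 1, sigma_2^2 = 2
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Sample X ~ N(0, C): scale iid standard normals by sqrt of the variances.
X = rng.standard_normal((1_000_000, 2)) * np.sqrt(np.diag(C))
Y = X @ A.T                    # rows are samples of AX

# Empirical covariance of AX agrees with A C A^t up to sampling error.
print(np.allclose(np.cov(Y, rowvar=False), A @ C @ A.T, atol=0.05))  # True

# Spectral decomposition of the new covariance A C A^t = U' D' U'^t.
eigvals, U_new = np.linalg.eigh(A @ C @ A.T)
print(eigvals)   # eigenvalues of A C A^t
print(U_new)     # its orthonormal eigenbasis (generally not A times the old U)
```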
