I'd like to preface this post by saying that this is my first post on Stack Exchange, so if there is anything to improve, be it the writing or the structure of the post, I'm more than willing to learn.
I've been experimenting with implementations of Varimax rotation (as described in [1], [2]) in R and Python ([3]) and trying to understand the mathematics behind them. In particular, the implementations in R and scikit-learn ([3]) seem to use the algorithm described in [4]. The optimization problem to be solved is $$\arg\max_{R}\ \operatorname{tr}\left(R^{T}Q(R)\right), \qquad Q(R) = \Phi^{T}\left[(\Phi R)\circ(\Phi R)\circ(\Phi R) - \frac{1}{p}\,(\Phi R)\,\operatorname{diag}\left((\Phi R)^{T}(\Phi R)\right)\right],$$ where $\Phi$ is a constant $p \times k$ matrix, $\circ$ is the elementwise (Hadamard) product, $\operatorname{diag}(\cdot)$ keeps only the diagonal of its argument, and $R$ is a $k \times k$ orthogonal rotation matrix (so this is a constrained optimization problem). The iterative solution proposed in [4] is:
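1. Start with $R = I$, where $I$ is the $k \times k$ identity matrix.
2. Evaluate $Q(R)$ at the current $R$ (initially $R = I$) and, keeping $Q(R)$ fixed, solve $\arg\max_{\tilde R} \operatorname{tr}\left(\tilde R^{T}Q(R)\right)$ over orthogonal $\tilde R$.
3. Compute the SVD of $Q(R)$: $Q(R) = USV^{T}$.
4. Update $R \leftarrow UV^{T}$; the optimal value of the subproblem in step 2 is $\operatorname{tr}(S)$.
5. Repeat steps 2 to 4 until the change in $\operatorname{tr}(S)$ falls below the specified tolerance.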
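The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration under my own assumptions, not the code used by R or scikit-learn: the function names (`varimax_Q`, `varimax_criterion`, `varimax`) and the relative stopping rule are hypothetical. Note that `np.linalg.svd` returns $V^{T}$ (not $V$) as its third output, so the update $R = UV^{T}$ becomes `U @ Vt`.

```python
import numpy as np


def varimax_Q(Phi, R):
    """Q(R) = Phi^T [L∘L∘L - (1/p) L diag(L^T L)], with L = Phi R."""
    p = Phi.shape[0]
    L = Phi @ R
    col_sq = np.sum(L**2, axis=0)           # diagonal of L^T L (column sums of squares)
    # L * (col_sq / p) scales column j of L by col_sq[j] / p, i.e. L diag(L^T L) / p
    return Phi.T @ (L**3 - L * (col_sq / p))


def varimax_criterion(Phi, R):
    """Objective tr(R^T Q(R))."""
    return np.trace(R.T @ varimax_Q(Phi, R))


def varimax(Phi, tol=1e-8, max_iter=500):
    """Hypothetical sketch of steps 1-5; returns (rotated loadings, R)."""
    k = Phi.shape[1]
    R = np.eye(k)                            # step 1: R = I
    prev = None
    for _ in range(max_iter):
        Q = varimax_Q(Phi, R)                # step 2: Q evaluated at the current R
        U, s, Vt = np.linalg.svd(Q)          # step 3: Q = U diag(s) V^T
        R = U @ Vt                           # step 4: argmax of tr(R^T Q) over orthogonal R
        trace_S = s.sum()                    # optimal value tr(S) of the subproblem
        if prev is not None and abs(trace_S - prev) <= tol * max(abs(prev), 1.0):
            break                            # step 5: tr(S) has stabilized
        prev = trace_S
    return Phi @ R, R


rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 3))           # random p x k loadings, for illustration only
L_rot, R = varimax(Phi)
print(np.allclose(R.T @ R, np.eye(3)))       # True: R stays orthogonal
```

Written out entrywise with $L = \Phi R$, the objective is $\sum_{i,j} L_{ij}^4 - \frac{1}{p}\sum_j \left(\sum_i L_{ij}^2\right)^2$, i.e. $p$ times the sum over columns of the variance of the squared loadings.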
I understand how to solve each individual subproblem (its optimal value is just $\operatorname{tr}(S)$), but I don't understand why updating $R$ this way guarantees an increasing sequence of traces until the maximum of the problem is reached. In other words, denoting the $i$-th iterate by $R_{i}$, with corresponding $S_{i}$, why does $\operatorname{tr}(S_{i+1}) \geq \operatorname{tr}(S_{i})$ hold?
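One way to probe this numerically is to record the relevant sequences on a random example. The sketch below (NumPy, hypothetical helper names; `np.linalg.svd` returns $V^{T}$ as its third output) stores $\operatorname{tr}(S_i)$ and the objective value $f(R_i) = \operatorname{tr}(R_i^{T}Q(R_i))$ at every iteration and reports whether each sequence is non-decreasing on that input. It only observes one random case and proves nothing.

```python
import numpy as np


def Q_of(Phi, R):
    """Q(R) = Phi^T [L∘L∘L - (1/p) L diag(L^T L)], with L = Phi R."""
    p = Phi.shape[0]
    L = Phi @ R
    return Phi.T @ (L**3 - L * (np.sum(L**2, axis=0) / p))


def trace_history(Phi, n_iter=50):
    """Run the SVD iteration from R = I; return tr(S_i) and f(R_i) per iteration."""
    R = np.eye(Phi.shape[1])
    traces, objectives = [], []
    for _ in range(n_iter):
        Q = Q_of(Phi, R)
        objectives.append(np.trace(R.T @ Q))   # f(R_i) = tr(R_i^T Q(R_i))
        U, s, Vt = np.linalg.svd(Q)
        traces.append(s.sum())                 # tr(S_i): max of tr(X^T Q(R_i)) over orthogonal X
        R = U @ Vt                             # R_{i+1}
    return np.array(traces), np.array(objectives)


rng = np.random.default_rng(1)
traces, objectives = trace_history(rng.standard_normal((20, 4)))
tol = 1e-9 * np.abs(traces).max()
print("tr(S_i) non-decreasing:", bool(np.all(np.diff(traces) >= -tol)))
print("f(R_i) non-decreasing:", bool(np.all(np.diff(objectives) >= -tol)))
# R_i is itself orthogonal, hence feasible for the i-th subproblem, so tr(S_i) >= f(R_i)
print("tr(S_i) >= f(R_i):", bool(np.all(traces >= objectives - tol)))
```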
Also, I'm not sure whether I may ask a multi-part question or should split it into separate threads, but if it's OK to do so here: I'm also unsure whether this procedure reaches a global maximum of the problem. My intuition says it doesn't, but I can't be sure. If it doesn't, that sounds like an interesting problem in itself (that is, adding some algorithm that tries to solve it globally).
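A simple (and by no means guaranteed) way to probe the global-maximum question empirically is a multi-start heuristic: run the same iteration from several random orthogonal starting matrices and compare the final objective values. The sketch below assumes NumPy and uses hypothetical function names. Starts that end with clearly different objective values would point to distinct local maxima, while solutions that differ only by permuting or flipping the sign of columns of $R$ give the same value, since the objective is invariant under those changes.

```python
import numpy as np


def objective(Phi, R):
    """tr(R^T Q(R)) written entrywise with L = Phi R."""
    p = Phi.shape[0]
    L = Phi @ R
    return np.sum(L**4) - np.sum(np.sum(L**2, axis=0) ** 2) / p


def svd_iteration(Phi, R0, tol=1e-10, max_iter=1000):
    """The R <- U V^T iteration, started from an arbitrary orthogonal R0."""
    p = Phi.shape[0]
    R, prev = R0, None
    for _ in range(max_iter):
        L = Phi @ R
        U, s, Vt = np.linalg.svd(Phi.T @ (L**3 - L * (np.sum(L**2, axis=0) / p)))
        R = U @ Vt
        if prev is not None and abs(s.sum() - prev) <= tol * max(abs(prev), 1.0):
            break
        prev = s.sum()
    return R


def random_orthogonal(k, rng):
    """Haar-distributed orthogonal matrix via QR with a sign correction."""
    Qm, Rm = np.linalg.qr(rng.standard_normal((k, k)))
    return Qm * np.sign(np.diag(Rm))


def multistart(Phi, n_starts=20, seed=0):
    """Return the best rotation found and the objective value from each start."""
    rng = np.random.default_rng(seed)
    k = Phi.shape[1]
    rotations = [svd_iteration(Phi, random_orthogonal(k, rng)) for _ in range(n_starts)]
    values = np.array([objective(Phi, R) for R in rotations])
    return rotations[int(np.argmax(values))], values


rng = np.random.default_rng(2)
Phi = rng.standard_normal((30, 5))
R_best, values = multistart(Phi)
print(np.round(np.sort(values)[::-1], 6))   # compare final objective values across starts
```

Multi-start only improves the odds of finding a good maximum; it does not certify that the best value found is global.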
Thanks in advance!
Citations:
[1] Kaiser, Henry F., "The varimax criterion for analytic rotation in factor analysis", Psychometrika 23, 187-200 (1958). ZBL0095.33603.
[2] Sherin, R. J., "A matrix formulation of Kaiser's varimax criterion", Psychometrika 31, 535-538 (1966). ZBL0152.18705.
[3] scikit-learn, `sklearn/decomposition/_factor_analysis.py`, https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/decomposition/_factor_analysis.py
P.S.: English is not my first language, so if anything is unclear, I'm happy to clarify wherever needed.