Let $N$ be a large number, and let $x,y \in \mathbb{R}^{2 \times N}$. The $i$-th column of each matrix is denoted with the subindex $i$: $x_i$ stands for the $i$-th column of $x$. Let $R$ be a $2 \times 2$ rotation matrix and $\Delta \in \mathbb{R}^2$ be such that: \begin{equation} R^Tx_i = y_i-\Delta \end{equation} In practice, there is noise $e_i \in \mathbb{R}^2$ on the signal $y_i$: \begin{equation} R^Tx_i = (y_i+e_i)-\Delta \end{equation} The covariance $\Sigma_i \in \mathbb{R}^{2 \times 2}$ of $e_i$ is given at each time sample.
The least squares problem that we are interested in is: \begin{equation} \min_{R,\Delta} \frac{1}{N}\sum_{i = 1}^N \| \Sigma_i^{-1/2}( R^T x_i - y_i + \Delta)\|_2^2 \end{equation} which accounts for the uncertainty in each sample $y_i$ through $\Sigma_i$. How can we solve for $R$ and $\Delta$?
Since the factor $1/N$ is a positive constant, it does not affect the minimizer, so we'll just minimize
$f(R, \Delta) = \displaystyle \sum_{i=1}^N (R^T x_i - y_i + \Delta)^T \sigma_i (R^T x_i - y_i + \Delta) $
The gradient of $f(R, \Delta)$ with respect to $\Delta$ is
$ \nabla_{\Delta} f = 2 \displaystyle \sum_{i=1}^N \sigma_i (R^T x_i - y_i + \Delta ) = 0 $
Here $\sigma_i > 0$ is the scalar weight of the $i$-th observation. This corresponds to the isotropic case $\Sigma_i^{-1} = \sigma_i I_2$; for a general matrix $\Sigma_i$ the weight would be the matrix $\Sigma_i^{-1}$, and the steps below, which rely on $\sigma_i$ being a scalar, would not go through unchanged.
And this means that at the minimum we will have
$ \Delta = - \dfrac{ \sum_{i=1}^N \sigma_i (R^T x_i - y_i)}{\sum_{i=1}^N \sigma_i} = - \dfrac{1}{\sum_{i=1}^N \sigma_i} \left( R^T \left(\sum_{i=1}^N \sigma_i x_i\right) - \left(\sum_{i=1}^N \sigma_i y_i \right) \right) = - \left( R^T x_{ave} - y_{ave} \right)$
where
$ x_{ave} = \dfrac{1}{\sum_{i=1}^N \sigma_i} \cdot \displaystyle \left( \sum_{i=1}^N \sigma_i x_i\right)$
$ y_{ave} = \dfrac{1}{\sum_{i=1}^N \sigma_i} \cdot \displaystyle \left(\sum_{i=1}^N \sigma_i y_i \right) $
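In code, these weighted averages are one line each (a minimal numpy sketch; the array names and the random data are made up for illustration):

```python
import numpy as np

# Hypothetical example: x, y are 2 x N, sigma holds the scalar weights.
rng = np.random.default_rng(0)
N = 5
x = rng.standard_normal((2, N))
y = rng.standard_normal((2, N))
sigma = rng.uniform(0.5, 2.0, size=N)   # one positive weight per sample

# x_ave = (sum_i sigma_i x_i) / (sum_i sigma_i), and likewise for y_ave.
x_ave = (x * sigma).sum(axis=1) / sigma.sum()
y_ave = (y * sigma).sum(axis=1) / sigma.sum()
```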
Substituting the expression for $\Delta$ above into $f(R, \Delta) $ yields
$ f(R) = \displaystyle \sum_{i=1}^N (R^T z_i - w_i)^T \sigma_i (R^T z_i - w_i) $
where $z_i = x_i - x_{ave} $ and $ w_i = y_i - y_{ave} $
Expanding $f(R)$:
$f(R) = \displaystyle \sum_{i=1}^N \sigma_i \left( z_i^T R R^T z_i + w_i^T w_i - 2 z_i^T R w_i \right) = \sum_{i=1}^N \sigma_i \left( z_i^T z_i + w_i^T w_i - 2 z_i^T R w_i \right) $, using $R R^T = I$.
The first two terms in the summand are constants independent of $R$, so they can be dropped from the objective function. Hence, we now want to minimize
$ \underset{R}{\text{min}} \hspace{10pt} \left(- 2 \displaystyle \sum_{i=1}^N \sigma_i z_i^T R w_i \right)$
Hence, we want to find
$ \underset{R}{\text{max}} \hspace{10pt} \left( \displaystyle \sum_{i=1}^N \sigma_i z_i^T R w_i \right)$
Now
$ \sigma_i z_i^T R w_i = \text{trace}( \sigma_i w_i z_i^T R )$
Therefore
$ \displaystyle \sum_{i=1}^N \left( \sigma_i z_i^T R w_i \right) = \text{trace}( W \Sigma Z^T R) $
where $W \in \mathbb{R}^{2 \times N} $ has its $i$-th column equal to $w_i$, $ Z \in \mathbb{R}^{2 \times N} $ has its $i$-th column equal to $z_i$, and $\Sigma \in \mathbb{R}^{N \times N} $ is diagonal with the $i$-th diagonal element equal to $\sigma_i $. (This $\Sigma$ is the diagonal weight matrix, not to be confused with the covariance matrices $\Sigma_i$.)
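The rank-one-sum and trace identities above are easy to sanity-check numerically (a small sketch with made-up data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
Z = rng.standard_normal((2, N))      # columns z_i
W = rng.standard_normal((2, N))      # columns w_i
sigma = rng.uniform(0.5, 2.0, N)     # scalar weights sigma_i

# W diag(sigma) Z^T equals the sum of rank-one terms sigma_i w_i z_i^T.
M = W @ np.diag(sigma) @ Z.T
M_sum = sum(s * np.outer(w, z) for s, w, z in zip(sigma, W.T, Z.T))
assert np.allclose(M, M_sum)

# And sum_i sigma_i z_i^T R w_i = trace(W diag(sigma) Z^T R) for any R.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
lhs = sum(s * z @ R @ w for s, w, z in zip(sigma, W.T, Z.T))
assert np.isclose(lhs, np.trace(M @ R))
```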
At this point, we can find the SVD (Singular Value Decomposition) of the matrix $ W \Sigma Z^T $ , so that
$ W \Sigma Z^T = U S V^T $
here $S$ is the diagonal matrix of the singular values resulting from the decomposition, and $U$ and $V$ are orthogonal matrices (note they need not be rotation matrices: either may have determinant $-1$). Now we have
$ \text{trace}( U S V^T R ) = \text{trace} ( S V^T R U ) $
Note that $M = V^T R U$ is an orthogonal matrix, so each of its entries satisfies $|M_{jj}| \le 1$, and hence $\text{trace}(SM) = \sum_j S_{jj} M_{jj} \le \text{trace}(S)$, with equality when $M$ is the identity matrix. Therefore for the maximum trace we must have

$ V^T R U = I $

i.e.

$ R = V U^T $

One caveat: $\det(V^T R U) = \det(V)\det(U)$ is fixed, so $V^T R U = I$ is attainable by a rotation $R$ only when $\det(V U^T) = +1$. If $\det(V U^T) = -1$, the best proper rotation is instead $R = V \, \text{diag}(1, \det(V U^T)) \, U^T$ (the usual Kabsch-style sign correction).
Once we have $R$, we can calculate $ \Delta = - \left( R^T x_{ave} - y_{ave} \right)$
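Putting the whole recipe together (weighted centroids, centering, SVD, determinant sign correction, and the back-substitution for $\Delta$), a minimal numpy sketch might look like this; the function and variable names are my own, not from the question:

```python
import numpy as np

def fit_rotation_translation(x, y, sigma):
    """Weighted least-squares fit of a 2x2 rotation R and offset Delta in the
    model R^T x_i = y_i - Delta, with one positive scalar weight sigma_i per
    sample (the isotropic-covariance case discussed above)."""
    w_sum = sigma.sum()
    x_ave = (x * sigma).sum(axis=1) / w_sum     # weighted centroid of x
    y_ave = (y * sigma).sum(axis=1) / w_sum     # weighted centroid of y
    Z = x - x_ave[:, None]                      # centered columns z_i
    W = y - y_ave[:, None]                      # centered columns w_i
    U, S, Vt = np.linalg.svd(W @ np.diag(sigma) @ Z.T)
    V = Vt.T
    # Sign correction so that det(R) = +1, i.e. R is a proper rotation.
    d = np.sign(np.linalg.det(V @ U.T))
    R = V @ np.diag([1.0, d]) @ U.T
    Delta = -(R.T @ x_ave - y_ave)
    return R, Delta

# Exact recovery on noise-free synthetic data (illustrative check).
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Delta_true = np.array([1.0, -2.0])
x = np.random.default_rng(2).standard_normal((2, 10))
y = R_true.T @ x + Delta_true[:, None]          # y_i = R^T x_i + Delta
R_fit, Delta_fit = fit_rotation_translation(x, y, np.ones(10))
```

On noise-free data this recovers $R$ and $\Delta$ exactly (up to floating-point error), which is a convenient way to test an implementation before feeding it noisy measurements.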