In theory I've got two sample sets of $N$ (row) vectors $A = \{\vec{a}_0, \dots, \vec{a}_{N-1}\}$ and $B = \{\vec{b}_0, \dots, \vec{b}_{N-1}\}$ with all $\vec{a}_n, \vec{b}_n \in \mathbb{R}^3$.
I can make sample sets of almost arbitrary length (arbitrary $N$), within certain limitations.
There's an unknown transformation $T$ such that $\vec{b}_n = \vec{a}_n \cdot T$.
Each vector $\vec{a}_n$ and $\vec{b}_n$ is also noisy so that $\vec{a'}_n = \vec{a}_n + \epsilon_a(n)$ and $\vec{b'}_n = \vec{b}_n + \epsilon_b(n)$. So the actual sets I'm working with are rather $A' = \{\vec{a'}_0, ..., \vec{a'}_{N-1}\}$ and $B' = \{\vec{b'}_0, ..., \vec{b'}_{N-1}\}$.
In practice I only have $A'$ and $B'$, so, only the noisy vectors.
Noise can be pretty strong, at least on $B$; the noise on $A$ is not as strong. The noise also differs between the two sets, i.e. $\epsilon_a(n) \neq \epsilon_b(n)$. For simplicity I assume both $\epsilon$ are normally distributed.
I need to find the transformation matrix $T$ based on $A'$ and $B'$.
Furthermore, I think I can assume that the optimal $T$ is unique for all samples.
## Current state
What I'm trying to do at the moment is to minimize the loss between a transformed $\vec{a'}_n$ and $\vec{b'}_n$. The loss is the reduced sum of the squared differences between all $\vec{a'}_n \cdot T$ and $\vec{b'}_n$.
I'm using Python and TensorFlow to optimize $T$, so I can't say much about the gradients. I'm also using the SGD optimizer (have tried others as well) of TensorFlow to optimize $T$.
Due to the noise on $A'$ and $B'$ I can imagine that the loss, and thus its gradient, might not be well behaved. Is that actually a problem? On second thought, probably not, since the loss is still a smooth quadratic in $T$ for any fixed data.
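For concreteness, my setup boils down to something like the following NumPy sketch (the sizes, noise scales, learning rate, and iteration count here are invented for illustration; in practice I use TensorFlow's SGD optimizer instead of this hand-rolled loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: a random 3x3 transformation.
T_true = rng.normal(size=(3, 3))

N = 1000
A = rng.normal(size=(N, 3))
B = A @ T_true

# Noisy observations; the noise on B is stronger than on A.
A_noisy = A + rng.normal(scale=0.01, size=A.shape)
B_noisy = B + rng.normal(scale=0.1, size=B.shape)

# Full-batch gradient descent on the summed squared error.
# The gradient of ||A T - B||_F^2 w.r.t. T is 2 A^T (A T - B).
T = np.zeros((3, 3))
lr = 1e-4
for _ in range(2000):
    grad = 2 * A_noisy.T @ (A_noisy @ T - B_noisy)
    T -= lr * grad
```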
So, my actual questions:
- Does this approach at all make sense?
- Which prerequisites would this approach have?
- Which pitfalls did I overlook?
- Which kind of preprocessing of $A'$ and $B'$ do you think might help to stabilize the approach?
- Can you think of other, better, more reliable approaches to get the matrix $T$ from the noisy datasets $A'$ and $B'$?
The approach makes sense.
It sounds like you're trying to find a matrix $T$ such that $AT \approx B$. Here's how I would approach it.
First, consider the equation $AT = B$:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots \\ a_{N1} & a_{N2} & \dots & a_{Nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ t_{21} & t_{22} & \dots & t_{2n} \\ \vdots \\ t_{n1} & t_{n2} & \dots & t_{nn} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots \\ b_{N1} & b_{N2} & \dots & b_{Nn} \end{bmatrix} \tag{1}$$
If $A$ were square and invertible, we could just take $T = A^{-1}B$. Unfortunately, with $N > n$ samples $A$ is not even square, so that's not an option.
However, we can find the least-squares solution of the linear system using the pseudoinverse:
$T \approx (A^TA)^{-1}A^TB$, which looks an awful lot like the solution to a multiple regression problem. We can also look at $(1)$ as:
$$A \begin{bmatrix} \vec t_{1} & \vec t_{2} & \dots & \vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$
$$\iff \begin{bmatrix} A\vec t_{1} & A\vec t_{2} & \dots & A\vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$
This motivates the idea that you can view the decomposition problem as $n$ separate least squares multiple regression problems:
Let $\vec b_i$ be the $i$-th column vector of $B$; then find a column vector $\vec t_i$ of $T$ such that $A\vec t_i \approx \vec b_i$, or:
$$\vec t_i^* = \underset{\vec t_i}{\operatorname{argmin}} \; \lVert A\vec t_i - \vec b_i \rVert^2$$
The solution to this problem is: $$\vec t_i^* = (A^TA)^{-1}A^T\vec b_i$$ which is exactly the expression proposed above.
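A minimal NumPy sketch of this closed-form solution, on synthetic data (the sample count, noise scale, and ground-truth $T$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: recover a 3x3 T from noisy samples.
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(500, 3))
B_noisy = A @ T_true + rng.normal(scale=0.1, size=(500, 3))

# Least-squares solution T = (A^T A)^{-1} A^T B.
# np.linalg.lstsq solves all n column problems at once.
T_hat, *_ = np.linalg.lstsq(A, B_noisy, rcond=None)
```

Note that `lstsq` solves the whole matrix problem in one call, which is numerically preferable to forming $(A^TA)^{-1}$ explicitly.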
To make your solution more stable, you can take a ridge regression approach to each problem, parametrized by some $\lambda > 0$ (just remember to center and scale your data):
$$\vec t_i^* = (A^TA + \lambda I)^{-1}A^T\vec b_i$$
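A sketch of the ridge variant, again on invented synthetic data; $\lambda = 1$ is an arbitrary choice here, and centering/scaling is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical noisy data generated from a random 3x3 T.
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(200, 3))
B_noisy = A @ T_true + rng.normal(scale=0.2, size=(200, 3))

lam = 1.0  # regularization strength, a tunable hyperparameter

# Ridge solution (A^T A + lambda I)^{-1} A^T B, solved for all
# columns of T simultaneously.
T_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ B_noisy)
```

In practice you would pick $\lambda$ by cross-validation rather than fixing it a priori.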
Therefore, you can break your problem up into solving $n$ separate (possibly regularized) multiple regression problems, either via TensorFlow with SGD or via direct numerical linear algebra. Either should work, and the columns of your $T$ matrix will be precisely the solutions to the individual regression problems.
Note that this approach breaks down when the transformation $T: A \to B$ is nonlinear.