Find transformation matrix between two noisy vector sets by optimization of loss function


In theory I've got two sample sets of $N$ (row) vectors $A = \{\vec{a}_0,...,\vec{a}_{N-1}\}$ and $B = \{\vec{b}_0,..., \vec{b}_{N-1}\}$ with all $\vec{a}_n, \vec{b}_n \in \mathbb{R}^3$.

I can generate sample sets of almost arbitrary length (arbitrary $N$), within certain limitations.

There's an unknown transformation $T$ so that $\vec{b}_n = \vec{a}_n \cdot T$

Each vector $\vec{a}_n$ and $\vec{b}_n$ is also noisy so that $\vec{a'}_n = \vec{a}_n + \epsilon_a(n)$ and $\vec{b'}_n = \vec{b}_n + \epsilon_b(n)$. So the actual sets I'm working with are rather $A' = \{\vec{a'}_0, ..., \vec{a'}_{N-1}\}$ and $B' = \{\vec{b'}_0, ..., \vec{b'}_{N-1}\}$.

In practice I only have $A'$ and $B'$, so, only the noisy vectors.

The noise can be fairly strong on $B$; the noise on $A$ is weaker. The noise also differs between the two sets, so $\epsilon_a(n) \neq \epsilon_b(n)$. For simplicity I assume that both $\epsilon$ are normally distributed.
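To make the setup concrete, here is a small NumPy sketch that generates noisy sample sets of this form. The particular `T_true` matrix and the noise scales are illustrative assumptions, not values from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500                               # number of sample vectors
T_true = np.array([[0.9, 0.1, 0.0],   # "unknown" transformation, chosen for illustration
                   [-0.2, 1.1, 0.3],
                   [0.0, -0.1, 0.8]])

A = rng.normal(size=(N, 3))           # clean row vectors a_n
B = A @ T_true                        # b_n = a_n . T

# Normally distributed noise, stronger on B than on A
A_noisy = A + rng.normal(scale=0.01, size=(N, 3))
B_noisy = B + rng.normal(scale=0.10, size=(N, 3))
```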

I need to find the transformation matrix $T$ based on $A'$ and $B'$.

Furthermore, I think I can assume that the optimal $T$ is unique for all samples.

Current state

What I'm trying to do at the moment is to minimize the loss between the transformed $\vec{a'}_n$ and $\vec{b'}_n$. The loss is the sum of squared differences between all $\vec{a'}_n \cdot T$ and $\vec{b'}_n$.

I'm using Python and TensorFlow to optimize $T$, so the gradients are computed automatically and I can't say much about them. I'm using TensorFlow's SGD optimizer (I have tried others as well).

Due to the noise on $A'$ and $B'$ I can imagine that the loss, and thus its gradient, is not smooth. That seemed like a problem at first. Is that correct? On reflection, for fixed data the loss is still a smooth quadratic function of $T$, so probably not.
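For reference, the optimization described above can be sketched without TensorFlow as plain gradient descent on the squared loss in NumPy. The synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Synthetic noisy data standing in for A' and B' (illustrative assumption)
rng = np.random.default_rng(1)
N = 500
T_true = rng.normal(size=(3, 3))
A_noisy = rng.normal(size=(N, 3))
B_noisy = A_noisy @ T_true + rng.normal(scale=0.1, size=(N, 3))

T = np.zeros((3, 3))                      # parameter being optimized
lr = 0.01                                 # learning rate (illustrative)
for _ in range(2000):
    residual = A_noisy @ T - B_noisy      # shape (N, 3)
    grad = (2.0 / N) * A_noisy.T @ residual   # gradient of (1/N) * ||A'T - B'||_F^2
    T -= lr * grad

loss = np.mean((A_noisy @ T - B_noisy) ** 2)  # converges toward the noise variance
```

With this much data the recovered `T` lands close to `T_true`; the residual loss bottoms out near the variance of the noise on $B'$.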

So, actual questions...

  1. Does this approach at all make sense?
  2. Which prerequisites would this approach have?
  3. Which pitfalls did I overlook?
  4. Which kind of preprocessing of $A'$ and $B'$ do you think might help to stabilize the approach?
  5. Can you think of other, better, more reliable approaches to get the matrix $T$ from the noisy datasets $A'$ and $B'$?
Best answer
The approach makes sense.

It sounds like you're trying to find a matrix $T$ such that $AT \approx B$, where the rows of $A$ and $B$ are your sample vectors. Here's how I would approach it.

First, consider the exact equation $AT = B$:

$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \dots & a_{Nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ t_{21} & t_{22} & \dots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{n1} & t_{n2} & \dots & t_{nn} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{N1} & b_{N2} & \dots & b_{Nn} \end{bmatrix} \tag{1}$$

If $A$ were square and invertible, we could just take $T = A^{-1}B$. Unfortunately, with $N > n$ samples $A$ is not even square, so this is not possible in general.

However, we can find the least-squares solution of the overdetermined system using the pseudoinverse:

$T \approx (A^TA)^{-1}A^TB$, which looks an awful lot like the solution of a multiple regression problem. We can also look at $(1)$ column-wise:

$$\begin{bmatrix} A \end{bmatrix} \begin{bmatrix} \vec t_{1} & \vec t_{2} & \dots & \vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$

$$\iff \begin{bmatrix} A\vec t_{1} & A\vec t_{2} & \dots & A\vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$

This motivates the idea that you can view the problem as $n$ separate least-squares multiple regression problems:

Let $\vec b_i$ be a column vector of $B$; then find a column vector $\vec t_i$ of $T$ such that $A\vec t_i \approx \vec b_i$, i.e.:

$$\vec t_i^{\,*} = \operatorname*{argmin}_{\vec t_i} \; \|A\vec t_i - \vec b_i\|^2$$

The solution to this problem is: $$\vec t_i^* = (A^TA)^{-1}A^T\vec b_i$$ which is exactly the expression proposed above.
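As a sketch under assumed synthetic data, this closed-form solution can be computed directly in NumPy; `np.linalg.lstsq` solves all $n$ column problems at once and is numerically preferable to forming $(A^TA)^{-1}$ explicitly:

```python
import numpy as np

# Synthetic data (illustrative assumption): B = A @ T_true + noise
rng = np.random.default_rng(2)
N = 500
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(N, 3))
B = A @ T_true + rng.normal(scale=0.1, size=(N, 3))

# Least-squares solution; lstsq handles all n columns of B in one call
T_hat, *_ = np.linalg.lstsq(A, B, rcond=None)

# Equivalent normal-equations form (A^T A)^{-1} A^T B, fine when A is well conditioned
T_normal = np.linalg.solve(A.T @ A, A.T @ B)
```

Both forms agree, and with enough samples the least-squares estimate lands close to the true transformation despite the noise.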

To make your solution more stable, you can take a ridge regression approach to each problem parametrized by some $\lambda > 0$. (Just remember to center and scale your data).

$$\vec t_i^* = (A^TA + \lambda I)^{-1}A^T\vec b_i$$
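The ridge variant can be sketched the same way; the data here is synthetic and already zero-mean with unit scale, so the centering/scaling caveat is satisfied by construction, and the value of `lam` is an illustrative assumption:

```python
import numpy as np

# Synthetic data (illustrative assumption); already zero-mean and unit-scale,
# so no extra centering/scaling is needed here
rng = np.random.default_rng(3)
N = 500
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(N, 3))
B = A @ T_true + rng.normal(scale=0.1, size=(N, 3))

lam = 1.0                                  # ridge penalty lambda, tune by validation
# T* = (A^T A + lambda I)^{-1} A^T B, solved for all columns of B at once
T_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ B)
```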

Therefore, you can break your problem up into $n$ separate (possibly regularized) multiple regression problems. You can solve them with SGD in TensorFlow or with numerical linear algebra; either should work, and the columns of your $T$ matrix will be precisely the solutions to the individual regression problems.

Note that this approach breaks down when the transformation $T: A \to B$ is nonlinear.