In theory I've got two sample sets of $N$ (row) vectors $A = \{\vec{a}_0, \dots, \vec{a}_{N-1}\}$ and $B = \{\vec{b}_0, \dots, \vec{b}_{N-1}\}$ with all $\vec{a}_n, \vec{b}_n \in \mathbb{R}^3$.
I can make sample sets of almost arbitrary length (arbitrary $N$), within certain limitations.
There's an unknown transformation $T$ such that $\vec{b}_n = \vec{a}_n \cdot T$.
Each vector $\vec{a}_n$ and $\vec{b}_n$ is also noisy so that $\vec{a'}_n = \vec{a}_n + \epsilon_a(n)$ and $\vec{b'}_n = \vec{b}_n + \epsilon_b(n)$. So the actual sets I'm working with are rather $A' = \{\vec{a'}_0, ..., \vec{a'}_{N-1}\}$ and $B' = \{\vec{b'}_0, ..., \vec{b'}_{N-1}\}$.
In practice I only have $A'$ and $B'$, so, only the noisy vectors.
Noise can be pretty strong, at least on $B$; the noise on $A$ is not as strong. The noise also differs between the two sets, i.e. $\epsilon_a(n) \neq \epsilon_b(n)$. For simplicity I assume both $\epsilon$ are normally distributed.
I need to find the transformation matrix $T$ based on $A'$ and $B'$.
Furthermore, I think I can assume that the optimal $T$ is unique for all samples.
## Current state
What I'm trying to do at the moment is to minimize the loss between a transformed $\vec{a'}_n$ and $\vec{b'}_n$. The loss is the reduced sum of the squared differences between all $\vec{a'}_n \cdot T$ and $\vec{b'}_n$.
I'm using Python and TensorFlow to optimize $T$, so I can't say much about the gradients. I'm also using the SGD optimizer (have tried others as well) of TensorFlow to optimize $T$.
Due to the noise on $A'$ and $B'$ I can imagine that the loss, and thus its gradient, might not be well behaved. Is that actually a problem? On second thought, probably not, since the loss is still a smooth quadratic in $T$ for any fixed data.
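For concreteness, my setup boils down to something like the following NumPy sketch (the sizes, noise scales, learning rate, and iteration count here are invented for illustration; in practice I use TensorFlow's SGD optimizer instead of this hand-rolled loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: a random 3x3 transformation.
T_true = rng.normal(size=(3, 3))

N = 1000
A = rng.normal(size=(N, 3))
B = A @ T_true

# Noisy observations; the noise on B is stronger than on A.
A_noisy = A + rng.normal(scale=0.01, size=A.shape)
B_noisy = B + rng.normal(scale=0.1, size=B.shape)

# Full-batch gradient descent on the summed squared error.
# The gradient of ||A T - B||_F^2 w.r.t. T is 2 A^T (A T - B).
T = np.zeros((3, 3))
lr = 1e-4
for _ in range(2000):
    grad = 2 * A_noisy.T @ (A_noisy @ T - B_noisy)
    T -= lr * grad
```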
So, my actual questions:
- Does this approach at all make sense?
- Which prerequisites would this approach have?
- Which pitfalls did I overlook?
- Which kind of preprocessing of $A'$ and $B'$ do you think might help to stabilize the approach?
- Can you think of other, better, more reliable approaches to get the matrix $T$ from the noisy datasets $A'$ and $B'$?
The approach makes sense.
It sounds like you're trying to find a matrix $T$ such that $AT \approx B$. Here's how I would approach it.
First, consider the equation $AT = B$:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots \\ a_{N1} & a_{N2} & \dots & a_{Nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ t_{21} & t_{22} & \dots & t_{2n} \\ \vdots \\ t_{n1} & t_{n2} & \dots & t_{nn} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots \\ b_{N1} & b_{N2} & \dots & b_{Nn} \end{bmatrix} \tag{1}$$
If $A$ were square and invertible, we could just take $T = A^{-1}B$. Unfortunately, with $N > n$ samples $A$ is not even square, so that's not an option.
However, we can find the least-squares solution of the linear system using the pseudoinverse:
$T \approx (A^TA)^{-1}A^TB$, which looks an awful lot like the solution to a multiple regression problem. We can also look at $(1)$ as:
$$A \begin{bmatrix} \vec t_{1} & \vec t_{2} & \dots & \vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$
$$\iff \begin{bmatrix} A\vec t_{1} & A\vec t_{2} & \dots & A\vec t_{n} \end{bmatrix} = \begin{bmatrix} \vec b_{1} & \vec b_{2} & \dots & \vec b_{n} \end{bmatrix}$$
This motivates the idea that you can view the decomposition problem as $n$ separate least squares multiple regression problems:
Let $\vec b_i$ be the $i$-th column vector of $B$; then find a column vector $\vec t_i$ of $T$ such that $A\vec t_i \approx \vec b_i$, or:
$$\vec t_i^* = \underset{\vec t_i}{\operatorname{argmin}} \; \lVert A\vec t_i - \vec b_i \rVert^2$$
The solution to this problem is: $$\vec t_i^* = (A^TA)^{-1}A^T\vec b_i$$ which is exactly the expression proposed above.
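A minimal NumPy sketch of this closed-form solution, on synthetic data (the sample count, noise scale, and ground-truth $T$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: recover a 3x3 T from noisy samples.
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(500, 3))
B_noisy = A @ T_true + rng.normal(scale=0.1, size=(500, 3))

# Least-squares solution T = (A^T A)^{-1} A^T B.
# np.linalg.lstsq solves all n column problems at once.
T_hat, *_ = np.linalg.lstsq(A, B_noisy, rcond=None)
```

Note that `lstsq` solves the whole matrix problem in one call, which is numerically preferable to forming $(A^TA)^{-1}$ explicitly.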
To make your solution more stable, you can take a ridge regression approach to each problem, parametrized by some $\lambda > 0$ (just remember to center and scale your data):
$$\vec t_i^* = (A^TA + \lambda I)^{-1}A^T\vec b_i$$
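A sketch of the ridge variant, again on invented synthetic data; $\lambda = 1$ is an arbitrary choice here, and centering/scaling is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical noisy data generated from a random 3x3 T.
T_true = rng.normal(size=(3, 3))
A = rng.normal(size=(200, 3))
B_noisy = A @ T_true + rng.normal(scale=0.2, size=(200, 3))

lam = 1.0  # regularization strength, a tunable hyperparameter

# Ridge solution (A^T A + lambda I)^{-1} A^T B, solved for all
# columns of T simultaneously.
T_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ B_noisy)
```

In practice you would pick $\lambda$ by cross-validation rather than fixing it a priori.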
Therefore, you can break your problem up into solving $n$ separate (possibly regularized) multiple regression problems, either via TensorFlow with SGD or via direct numerical linear algebra. Either should work, and the columns of your $T$ matrix will be precisely the solutions to the individual regression problems.
Note that this approach breaks down when the transformation $T: A \to B$ is nonlinear.