I am trying to implement a modified version of sift tracking. I found a matlab implementation of sift to track objects in which, basically, they take the indices of the matches and compute the following transform:
pts = [x;y]'; % pts contains the points x and their transform y
pt1 = [point,ones(size(pts,1),2),ones(size(pts,1),2)];
pt2 = [ones(size(pts,1),2),pts,ones(size(pts,1),2)];
m1= [1,0,0,0,0,0;0,1,0,0,0,0;0,0,0,0,0,0;0,0,0,0,0,0;0,0,0,0,1,0;0,0,0,0,0,0];
m2= [0,0,0,0,0,0;0,0,0,0,0,0;0,0,1,0,0,0;0,0,0,1,0,0;0,0,0,0,0,0;0,0,0,0,0,1];
t1=pt1*m1;
t2=pt2*m2;
A = reshape([t1(:) t2(:)]',size(t1,1)+size(t2,1), []);
b = reshape(npoint(:),2*size(npoint,2),1); % npoints are the indices of the points
transformation = inv(A'*A)*A'*b
I don't understand how the transform is computed nor what is represents.
Has anyone an idea on this?
inv(A'*A)*A'*blooks like a least squares approximation. You can approximate$$ A\cdot x \approx b $$
by solving
$$ \left(A^T\cdot A\right)\cdot x = A^T\cdot b $$
which can be solved for $x$ as
$$ x = \left(A^T\cdot A\right)^{-1}\cdot A^T\cdot b $$
You can also fiond this forumla in the section Model verification by linear least squares in the Wikipedia article on SIFT.
Let's also translate these matrices into math notation:
$$ M_1= \begin{bmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&0 \end{bmatrix} \qquad M_2= \begin{bmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&1 \end{bmatrix} $$
The matrices
pt1andpt2seem to look something like this:$$ \textit{pt}_1 = \begin{bmatrix} x_1 & y_1 & 1 & 1 & 1 & 1 \\ x_2 & y_2 & 1 & 1 & 1 & 1 \\ \vdots & & & & & \vdots \\ x_n & y_n & 1 & 1 & 1 & 1 \end{bmatrix} \qquad \textit{pt}_2 = \begin{bmatrix} 1 & 1 & x_1 & y_1 & 1 & 1 \\ 1 & 1 & x_2 & y_2 & 1 & 1 \\ \vdots & & & & & \vdots \\ 1 & 1 & x_n & y_n & 1 & 1 \end{bmatrix} $$
The above formulation disregards the difference between
pointandpts. Multiplying those $M$ matrices from the right sets certain columns to zero, so you end up with$$ t_1 = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \end{bmatrix} \qquad t_2 = \begin{bmatrix} 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$
The
reshapethen stacks these matrices.$$ A = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$
These are the same lines you also find in the Wikipedia article, although Wikipedia has them interleaved, so they are in a different order.
Now the right side is built from
npoint. Despite the comment, I assume that this is not point indices, but instead the new positions of the points, after the transformation was applied. Wikipedia calls this $u$ and $v$. So you have$$ b = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$
So what you are doing is you are solving the following least squares approximation problem:
$$ \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} \approx \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$
which is just another way of writing
$$ \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x_i \\ y_i\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} \approx\begin{bmatrix}u_i \\ v_i\end{bmatrix} $$
So in the end, your
transformationwill be a vector of six elements, namely $(m_1, m_2, m_3, m_4, t_x, t_y)$, which define the affine transformation$$ \begin{bmatrix}x \\ y\end{bmatrix} \mapsto \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x \\ y\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} $$
Note that I wouldn't program things that way. I'd rather treat this as two systems of equations, since rows related to $u$ don't influence those related to $v$. So I'd write something like
$$ \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & & \vdots \\ x_n & y_n & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 & m_3 \\ m_2 & m_4 \\ t_x & t_y \end{bmatrix} \approx \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ \vdots & \vdots \\ u_n & v_n \end{bmatrix} $$
This should translate into matlab like this: