Understanding an affine transformation

1k Views Asked by At

I am trying to implement a modified version of sift tracking. I found a matlab implementation of sift to track objects in which, basically, they take the indices of the matches and compute the following transform:

pts = [x;y]'; % pts contains the points x and their transform y 

pt1 = [point,ones(size(pts,1),2),ones(size(pts,1),2)]; 
pt2 = [ones(size(pts,1),2),pts,ones(size(pts,1),2)]; 
m1= [1,0,0,0,0,0;0,1,0,0,0,0;0,0,0,0,0,0;0,0,0,0,0,0;0,0,0,0,1,0;0,0,0,0,0,0];
m2= [0,0,0,0,0,0;0,0,0,0,0,0;0,0,1,0,0,0;0,0,0,1,0,0;0,0,0,0,0,0;0,0,0,0,0,1];

t1=pt1*m1; 
t2=pt2*m2; 

A = reshape([t1(:) t2(:)]',size(t1,1)+size(t2,1), []); 
b = reshape(npoint(:),2*size(npoint,2),1); % npoints are the indices of the points

transformation = inv(A'*A)*A'*b

I don't understand how the transform is computed nor what is represents.

Has anyone an idea on this?

1

There are 1 best solutions below

2
On BEST ANSWER

inv(A'*A)*A'*b looks like a least squares approximation. You can approximate

$$ A\cdot x \approx b $$

by solving

$$ \left(A^T\cdot A\right)\cdot x = A^T\cdot b $$

which can be solved for $x$ as

$$ x = \left(A^T\cdot A\right)^{-1}\cdot A^T\cdot b $$

You can also fiond this forumla in the section Model verification by linear least squares in the Wikipedia article on SIFT.

Let's also translate these matrices into math notation:

$$ M_1= \begin{bmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&0 \end{bmatrix} \qquad M_2= \begin{bmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&1 \end{bmatrix} $$

The matrices pt1 and pt2 seem to look something like this:

$$ \textit{pt}_1 = \begin{bmatrix} x_1 & y_1 & 1 & 1 & 1 & 1 \\ x_2 & y_2 & 1 & 1 & 1 & 1 \\ \vdots & & & & & \vdots \\ x_n & y_n & 1 & 1 & 1 & 1 \end{bmatrix} \qquad \textit{pt}_2 = \begin{bmatrix} 1 & 1 & x_1 & y_1 & 1 & 1 \\ 1 & 1 & x_2 & y_2 & 1 & 1 \\ \vdots & & & & & \vdots \\ 1 & 1 & x_n & y_n & 1 & 1 \end{bmatrix} $$

The above formulation disregards the difference between pointand pts. Multiplying those $M$ matrices from the right sets certain columns to zero, so you end up with

$$ t_1 = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \end{bmatrix} \qquad t_2 = \begin{bmatrix} 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$

The reshape then stacks these matrices.

$$ A = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$

These are the same lines you also find in the Wikipedia article, although Wikipedia has them interleaved, so they are in a different order.

Now the right side is built from npoint. Despite the comment, I assume that this is not point indices, but instead the new positions of the points, after the transformation was applied. Wikipedia calls this $u$ and $v$. So you have

$$ b = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

So what you are doing is you are solving the following least squares approximation problem:

$$ \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} \approx \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

which is just another way of writing

$$ \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x_i \\ y_i\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} \approx\begin{bmatrix}u_i \\ v_i\end{bmatrix} $$

So in the end, your transformation will be a vector of six elements, namely $(m_1, m_2, m_3, m_4, t_x, t_y)$, which define the affine transformation

$$ \begin{bmatrix}x \\ y\end{bmatrix} \mapsto \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x \\ y\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} $$

Note that I wouldn't program things that way. I'd rather treat this as two systems of equations, since rows related to $u$ don't influence those related to $v$. So I'd write something like

$$ \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & & \vdots \\ x_n & y_n & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 & m_3 \\ m_2 & m_4 \\ t_x & t_y \end{bmatrix} \approx \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ \vdots & \vdots \\ u_n & v_n \end{bmatrix} $$

This should translate into matlab like this:

pts = [x;y]';
A = [pts, ones(size(pts,1),1)];
transformation = (inv(A'*A)*A'*npoint)'