I am trying to implement the method described in "Background Subtraction for Freely Moving Camera" and I am having trouble with the first part, "Rank-Constraint for the Background".
Summary
The authors propose an approach to detect moving objects with a moving camera, where the background is the static part of the scene (i.e. static objects) and the foreground represents the moving objects. To do so, feature points are tracked across frames and form trajectories. The i-th trajectory can be represented as $w_i = \begin{bmatrix} x_{1i}^\top & ...& x_{Fi}^\top \end{bmatrix} \in \mathbb{R}^{1 \times 2F}$ where $x_{fi} = \begin{bmatrix}u_{fi} & v_{fi} \end{bmatrix} ^\top$ is the position of the point in frame $f$. The set of these trajectories can be arranged into a $2F \times P$ matrix:
$$ W_{2F \times P} = \begin{bmatrix} w_1^\top & ... & w_P^\top \end{bmatrix} = \begin{pmatrix} u_{11} & ... & u_{1P} \\ v_{11} & ... & v_{1P} \\ ... & & ... \\ u_{F1} & ... & u_{FP} \\ v_{F1} & ... & v_{FP} \\ \end{pmatrix} $$
where $P$ is the number of feature points. For some reason, the authors use a sliding window of 30 frames, so $W_{2F \times P} = W_{60 \times P}$.
Now, let's talk about the subspace:
The authors want to find a subspace that the background trajectories fit and the foreground trajectories do not. To do that, they use the RANSAC algorithm:
- Select 3 trajectories randomly $W_3 = \begin{bmatrix}w_i^\top & w_j^\top & w_k^\top\end{bmatrix} \in \mathbb{R}^{60 \times 3}$.
- Construct a projection matrix $P = W_3(W_3^\top W_3)^{-1} W_3^\top \in \mathbb{R}^{60 \times 60}$
- Measure the projection error $f(w_i|W_3) = || Pw_i^\top - w_i^\top ||_2$
- If $f(w_i|W_3) < \text{threshold}$, the trajectory belongs to the background; otherwise it belongs to the foreground.
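For reference, here is a minimal NumPy sketch of the four steps as I understand them. The function names (`projection_matrix`, `projection_errors`, `ransac_iteration`) are my own and not from the paper, and the sketch assumes `W` is stored as a $2F \times P$ array with one trajectory per column and that the 3 selected trajectories are linearly independent.

```python
import numpy as np

def projection_matrix(W3):
    """P = W3 (W3^T W3)^{-1} W3^T for a (2F, 3) matrix of basis trajectories."""
    # Solve (W3^T W3) X = W3^T instead of forming the inverse explicitly.
    return W3 @ np.linalg.solve(W3.T @ W3, W3.T)

def projection_errors(P, W):
    """Column-wise error ||P w - w||_2 for every trajectory (column) of W."""
    return np.linalg.norm(P @ W - W, axis=0)

def ransac_iteration(W, threshold, rng):
    """One RANSAC iteration: sample 3 trajectories, project, threshold."""
    idx = rng.choice(W.shape[1], size=3, replace=False)
    P = projection_matrix(W[:, idx])
    errors = projection_errors(P, W)
    is_background = errors < threshold
    return idx, errors, is_background
```

A full RANSAC loop would repeat `ransac_iteration` and keep the sample with the most background trajectories; that outer loop is omitted here.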
My problem
The 3 trajectories selected to construct the projection matrix have a very low projection error (as expected). The problem is that other trajectories with the same "shape" as the 3 basis trajectories have a high error (more than 1, whereas the authors set the RANSAC threshold to 0.01). If I translate one of the 3 basis trajectories, just for the projection error computation, I also get a high projection error.
The authors do not say whether they normalize the trajectories before computing the projection matrix, but I suppose they do, since I get a high projection error when I translate one of the 3 basis trajectories.
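The translation observation can be reproduced in isolation with the small sketch below. It uses synthetic trajectories (uniform random values in a hypothetical 640-pixel range, not real tracks) and a made-up offset of (5, -3) pixels. Because projection is linear, the error of a translated basis trajectory should equal the error of the offset vector alone, so it is expressed in pixel units and depends on whether that constant-offset vector lies in the span of the 3 basis trajectories.

```python
import numpy as np

def residual_norm(W3, w):
    """Distance from w to the column span of W3 (least-squares residual)."""
    coeffs, *_ = np.linalg.lstsq(W3, w, rcond=None)
    return np.linalg.norm(W3 @ coeffs - w)

rng = np.random.default_rng(0)
F = 30                                         # frames in the sliding window
W3 = rng.uniform(0.0, 640.0, size=(2 * F, 3))  # synthetic basis trajectories
w = W3[:, 0]                                   # one of the basis trajectories

# Translating a trajectory adds the same (du, dv) in every frame.
offset = np.tile([5.0, -3.0], F)               # hypothetical shift in pixels

err_original = residual_norm(W3, w)            # basis trajectory itself
err_translated = residual_norm(W3, w + offset) # translated copy
err_offset_only = residual_norm(W3, offset)    # offset vector alone

print(err_original, err_translated, err_offset_only)
```

This only illustrates the behaviour I am seeing; it does not tell me what normalization, if any, the authors apply.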
What do you think about that? Am I totally wrong?
Edit: I pushed the code to GitHub if someone wants to test it!
Thanks for your answers!
MN