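I am studying computer vision and learning about geometric transformations in 2D and 3D. I understand that the 8-point algorithm uses 8 2D point correspondences to estimate the fundamental matrix: its 9 entries are defined only up to scale, which leaves 8 unknowns when the rank-2 constraint is ignored. With additional information, such as enforcing that rank-2 constraint (the 7-point algorithm) or knowing the camera intrinsics (the 5-point algorithm for the essential matrix), fewer points are required.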
I also know that 4 point correspondences between two images of a plane (no three of them collinear) can be used to recover a homography using the DLT.
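To make the equation counting concrete: a homography has 8 DOF (9 entries up to scale), and each correspondence gives 2 linear equations, so 4 points in general position determine it. Below is a minimal DLT sketch in NumPy, assuming exact, noise-free correspondences; it skips the Hartley coordinate normalization you would want with real data, and `homography_dlt` is a hypothetical helper name.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 . p) / (h3 . p) and v = (h2 . p) / (h3 . p), with p = (x, y, 1),
        # rearranged into two equations that are linear in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the (1-D) null space of A: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes H[2, 2] != 0).
    return H / H[2, 2]
```

With exactly 4 points, `A` is 8x9, so for points in general position the null space is one-dimensional and the homography is unique up to scale.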
Given 3D correspondences, though, what is the minimum number of points required to correctly determine a similarity transform (uniform scale, rotation, and translation) between point sets?
Intuitively, I would think only 3 non-collinear point correspondences are necessary (ideally with the estimation done robustly, using RANSAC or something similar), but I can't back this up with a DOF argument. Can someone provide the intuition here?
Thanks!
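Think about it in terms of the degrees of freedom of the transformation. A similarity transform (a uniformly scaled rotation composed with a translation) has 7 degrees of freedom: 3 for the translation, 3 for the rotation (2 for the direction of the axis plus 1 for the angle), and 1 for the uniform scale factor. Each pair of points contributes three scalar equations, so two pairs give only 6 equations for 7 unknowns, which is not enough. Geometrically, once two points are matched, any rotation about the line through them still fits. Three non-collinear pairs give 9 equations and pin down all 7 parameters; the system is then overdetermined, so with noisy data it is solved in a least-squares sense. Three collinear pairs are degenerate for the same reason as two pairs. An appendix to the paper linked by Peter Sheldrick in the comment above describes a straightforward way to compute the transformation given exactly three pairs of points.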
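As an illustration (not the method from that appendix), here is a sketch of Umeyama's closed-form least-squares similarity estimate in NumPy, which handles both the minimal three-point case and larger point sets. The function name `similarity_umeyama` is hypothetical. The input is assumed to be two N x 3 arrays of corresponding points with N >= 3, not all collinear.

```python
import numpy as np

def similarity_umeyama(src, dst):
    """Return (s, R, t) minimizing sum ||dst_i - (s * R @ src_i + t)||^2."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d          # centered point sets
    cov = xd.T @ xs / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    # Three non-collinear points give a rank-2 cov (det(cov) == 0), so the
    # reflection check uses det(U) * det(Vt) rather than det(cov).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # force a proper rotation, det(R) = +1
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / n              # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s     # uniform scale
    t = mu_d - s * R @ mu_s                  # translation
    return s, R, t
```

The rank of the centered cross-covariance mirrors the DOF argument: with only two pairs (or three collinear ones) it has rank 1, and the SVD returns one rotation out of the whole family that spins about the line through the points. With three non-collinear pairs it has rank 2, and the rotation is unique. In a RANSAC loop you would fit hypotheses to random 3-point samples and then refit on all inliers with the same function.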