I'm estimating the camera matrix P (the full projection) from corresponding 3D and 2D points.
I have 2 questions:
1.) At least six non-collinear points are needed to calculate P. So if I have about 300 corresponding 3D and 2D points, can I select any six at random to calculate P, or is there some criterion I have to check to make sure that these points are non-collinear?
2.) After calculating P, suppose I apply it to the 3D points as:
[c; r; 1] = P * [X; Y; Z; 1];
A previous question I asked here was about the scale factor λ on the left-hand side of the above equation, which apparently does not matter: P is a homogeneous matrix, so any scaling of it preserves the transformation (as per @amd).
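Written out explicitly, with $\mathbf p_1^\top, \mathbf p_2^\top, \mathbf p_3^\top$ as (illustrative) notation for the rows of $P$ and $\mathbf X = [X, Y, Z, 1]^\top$, the scale factor turns out to be just the third component of $P\mathbf X$:

$$
\lambda \begin{bmatrix} c \\ r \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\quad\Longrightarrow\quad
\lambda = \mathbf p_3^\top \mathbf X,\qquad
c = \frac{\mathbf p_1^\top \mathbf X}{\mathbf p_3^\top \mathbf X},\qquad
r = \frac{\mathbf p_2^\top \mathbf X}{\mathbf p_3^\top \mathbf X}.
$$

Replacing $P$ by $kP$ ($k \neq 0$) multiplies every numerator and denominator by $k$, so $c$ and $r$ are unchanged while $\lambda$ becomes $k\lambda$.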
Now say I apply P to a 3D point [2,3,6]:
P * [2; 3; 6; 1] = [20; 10; 5]
So does that mean that to get the 2D point, I need to divide (normalize) the values by 5?
That is, [20; 10; 5] / 5 = [4; 2; 1] (this is what we typically do when computing coordinates from homogeneous transformations, and also the reason scaling P does not seem to matter), so that I get it in [c; r; 1] form?
OR
Do I simply set c = 20, r = 10?
EDIT:
I'm getting errors on the order of 10^4 after determining P with a linear method (SVD) and then mapping c to X. Is this expected?
Since you’re going to have to do a least-squares fit or use some other estimate to recover $P$ from the point correspondences, why not use all of them, or at least a much larger subset than the minimum five-and-a-half? (Each correspondence gives two equations and $P$ has 11 degrees of freedom, hence five and a half points.) Unless the data are somehow pathological, the larger sample will tend to reduce the overall error in the estimate and reduce the probability that you have a degenerate configuration of points. BTW, a set of collinear points isn’t the only degenerate configuration. If the camera center and all of the points lie on or near a twisted cubic, the solution will also be ambiguous. Another important degenerate configuration is the union of a plane and a line containing the camera center.
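A minimal sketch of a linear (DLT) fit that uses all correspondences at once, assuming NumPy. The names `dlt_camera_matrix` and `reprojection_errors`, the camera `P_true`, and the synthetic points are hypothetical illustrations, not the asker's setup. The singular values double as a rough degeneracy check: with a single well-defined solution only the last one is near zero, while a configuration such as collinear points leaves several near zero. Note that the reprojection error divides by the third coordinate before comparing with the measured points. Pixel coordinates are used unnormalized here; normalizing both point sets first (as Hartley recommends) usually improves the conditioning of the linear fit, and is left out only to keep the sketch short.

```python
import numpy as np


def dlt_camera_matrix(X, x):
    """Linear (DLT) estimate of a 3x4 camera matrix from n >= 6 correspondences.

    X: (n, 3) 3D points; x: (n, 2) image points (c, r).
    Returns P with unit Frobenius norm and the singular values of the design matrix.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(X)
    Xh = np.hstack([X, np.ones((n, 1))])  # homogeneous 3D points [X, Y, Z, 1]
    A = np.zeros((2 * n, 12))
    for i, (c, r) in enumerate(x):
        # With rows p1, p2, p3 of P:  p1.Xh - c * p3.Xh = 0  and  p2.Xh - r * p3.Xh = 0
        A[2 * i, 0:4] = Xh[i]
        A[2 * i, 8:12] = -c * Xh[i]
        A[2 * i + 1, 4:8] = Xh[i]
        A[2 * i + 1, 8:12] = -r * Xh[i]
    # Minimise ||A p|| subject to ||p|| = 1: the right singular vector
    # belonging to the smallest singular value.
    _, s, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4), s


def reprojection_errors(P, X, x):
    """Pixel distances between observed x and projections of X, after dividing by w."""
    Xh = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    proj = Xh @ P.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:3] - np.asarray(x, dtype=float), axis=1)


# Synthetic check with a made-up camera and 300 random points in front of it.
rng = np.random.default_rng(0)
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, -5.0],
                   [0.0, 0.0, 1.0, 2.0]])
X = rng.uniform(-1.0, 1.0, size=(300, 3)) + np.array([0.0, 0.0, 5.0])
proj = np.hstack([X, np.ones((300, 1))]) @ P_true.T
x = proj[:, :2] / proj[:, 2:3]

P_est, s = dlt_camera_matrix(X, x)
err = reprojection_errors(P_est, X, x)
print("max reprojection error:", err.max())
# A second near-zero singular value would signal an ambiguous configuration.
print("s[-2] / s[0]:", s[-2] / s[0])

# Collinear points: the null space of A is no longer one-dimensional.
t = np.linspace(-1.0, 1.0, 50)[:, None]
X_line = np.array([0.0, 0.0, 5.0]) + t * np.array([1.0, 0.5, 0.2])
proj_line = np.hstack([X_line, np.ones((50, 1))]) @ P_true.T
_, s_line = dlt_camera_matrix(X_line, proj_line[:, :2] / proj_line[:, 2:3])
print("collinear s[-2] / s[0]:", s_line[-2] / s_line[0])
```

With noise-free synthetic data the reprojection error should be essentially zero; with real measurements one would expect errors on the order of the measurement noise, not thousands of pixels.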
As for your second question, yes, you divide through by the third coordinate to convert the homogeneous coordinates to (inhomogeneous) Cartesian. If the last component is $0$, you’ve got a point at infinity, which you’ll get if you project points on the principal plane. For a real camera, such points are outside of the field of view, though. As you noted in another question, depth information is encoded in this third component, so depending on your application you might want to retain it somewhere.
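A minimal pure-Python sketch of the project-and-divide step. `project` is a hypothetical helper, and the matrix `P` below is made up only so that it reproduces the numbers from the question, $P\,[2, 3, 6, 1]^\top = [20, 10, 5]^\top$.

```python
def project(P, X):
    """Project a 3D point X = (X, Y, Z) with a 3x4 camera matrix P (list of 3 rows).

    Returns the Cartesian image point (c, r), or None when the third homogeneous
    coordinate is 0 (a point on the principal plane maps to a point at infinity).
    """
    Xh = [*X, 1.0]  # homogeneous 3D point [X, Y, Z, 1]
    u, v, w = (sum(p * q for p, q in zip(row, Xh)) for row in P)
    if w == 0:
        return None
    # Only the ratios u : v : w matter, so divide through by w.
    return (u / w, v / w)


# Hypothetical camera chosen so that P @ [2, 3, 6, 1] = [20, 10, 5].
P = [[4.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 0.0, 4.0],
     [0.0, 0.0, 1.0, -1.0]]
print(project(P, (2, 3, 6)))  # (4.0, 2.0), not (20, 10)
print(project(P, (2, 3, 1)))  # None: Z = 1 lies on this P's principal plane
```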
Generally speaking, for any homogeneous coordinates, the significant part is the ratios among the components, which is why any non-zero scalar multiple of the coordinate tuple represents the same object. Other examples of homogeneous coordinates are barycentric and trilinear coordinates. Similarly, for a homogeneous matrix, it’s the ratios between elements that are important, which is why $P$ has only 11 degrees of freedom and is unique up to a scalar multiple.
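To see the "up to scale" point concretely, here is a small pure-Python check, assuming hypothetical helpers `apply` and `dehomogenize` and a made-up `P`: multiplying $P$ by any non-zero $k$ scales the homogeneous image point but leaves the Cartesian point unchanged.

```python
def apply(P, X):
    """Homogeneous image point P @ [X, Y, Z, 1] for a 3x4 matrix P (list of rows)."""
    Xh = [*X, 1.0]
    return [sum(p * q for p, q in zip(row, Xh)) for row in P]


def dehomogenize(x):
    """Cartesian point from homogeneous coordinates: divide by the last component."""
    return tuple(xi / x[-1] for xi in x[:-1])


# Hypothetical camera matrix; any non-zero multiple k * P describes the same camera.
P = [[4.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 0.0, 4.0],
     [0.0, 0.0, 1.0, -1.0]]
X = (2.0, 3.0, 6.0)
for k in (1.0, 0.5, 3.0, -2.0):
    kP = [[k * p for p in row] for row in P]
    x_h = apply(kP, X)
    print(k, x_h, dehomogenize(x_h))
# The homogeneous point changes with k; the Cartesian point is (4.0, 2.0) every time.
```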