We use a pinhole camera model; the parameters we estimate for each camera are a rotation R, a translation t, a focal length f and two radial distortion parameters k1 and k2. The formula for projecting a 3D point X into a camera R, t, f, k1, k2 is:
P = R * X + t (conversion from world to camera coordinates)
p = -P / P.z (perspective division)
p' = f * r(p) * p (conversion to pixel coordinates)
where P.z is the third (z) coordinate of P. In the last equation, r(p) is a function that computes a scaling factor to undo the radial distortion: r(p) = 1.0 + k1 * ||p||^2 + k2 * ||p||^4. This gives a projection in pixels, where the origin of the image is the center of the image, the positive x-axis points right, and the positive y-axis points up (in addition, in the camera coordinate system, the positive z-axis points backwards, so the camera is looking down the negative z-axis, as in OpenGL).
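The three projection steps above can be sketched in NumPy as follows (the function name and example values are illustrative, not part of the original description):

```python
import numpy as np

def project(X, R, t, f, k1, k2):
    """Project a 3D world point X into pixel coordinates using the
    pinhole model described above. R is a 3x3 rotation matrix,
    t a translation 3-vector."""
    P = R @ X + t                        # world -> camera coordinates
    p = -P[:2] / P[2]                    # perspective division (camera looks down -z)
    r2 = p @ p                           # ||p||^2
    distortion = 1.0 + k1 * r2 + k2 * r2 * r2   # r(p)
    return f * distortion * p            # pixel coordinates

# Example: identity rotation, zero translation, no distortion,
# point one unit in front of the camera (negative z).
X = np.array([0.1, 0.2, -1.0])
print(project(X, np.eye(3), np.zeros(3), 500.0, 0.0, 0.0))  # -> [ 50. 100.]
```

With no distortion (k1 = k2 = 0) the scaling factor r(p) is 1 and the result is just f * p.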
where the camera and point indices start from 0. Each camera is a set of 9 parameters: R, t, f, k1 and k2. The rotation R is specified as a Rodrigues' vector.
Is the R in the pinhole model the same R as the one in the 9 camera parameters at the bottom? I can't make sense of how R * X + t can give a new 3x1 vector P if R is just a vector.
What part am I missing?
I would like to understand their way of using the pinhole model.
No, it's not the same.
From your equations:
R * X is not possible if R and X are both 3x1 vectors, so R has to be a 3x3 matrix.
Let's assume that R is the 3x3 rotation matrix in the pinhole model, whereas R_ is the 3x1 rotation vector obtained by applying Rodrigues' formula to R; the 9-parameter set stores R_. The focal length f belongs to the camera intrinsic matrix K, while k1 and k2 are radial distortion coefficients applied separately. The following is the proper way to solve this problem.
The code below is taken from OpenCV's bundle adjuster. It shows how to get a new vector P'.
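In the meantime, here is a minimal NumPy sketch of the missing step (this is not the actual OpenCV source): converting the stored Rodrigues vector back into the 3x3 matrix that R * X + t requires. With OpenCV installed, `cv2.Rodrigues(rvec)` does the same conversion.

```python
import numpy as np

def rodrigues_to_matrix(rvec):
    """Convert a 3x1 Rodrigues rotation vector to a 3x3 rotation matrix
    (equivalent to the first return value of cv2.Rodrigues(rvec))."""
    theta = np.linalg.norm(rvec)       # rotation angle is the vector's norm
    if theta < 1e-12:
        return np.eye(3)               # zero vector -> identity rotation
    k = rvec / theta                   # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # skew-symmetric cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# A rotation of pi about the z-axis flips the x and y axes:
R = rodrigues_to_matrix(np.array([0.0, 0.0, np.pi]))
P = R @ np.array([1.0, 0.0, 0.0])      # approximately [-1, 0, 0]
```

Once R is a matrix, P = R @ X + t is well-defined and the rest of the projection follows the equations quoted above.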