Given an image, of known size in pixels, of a rectangle with known world coordinates, find the position and orientation (rotation) of the camera with respect to the world.
Here's an example of a $4032 \times 3024$ image:
Vertices $A,B,C,D$ of the rectangle are the four red dots, and their locations in the image (in pixels) relative to the top left corner are given as shown.
In addition, we attach a world reference frame to the rectangle whose unit length is one centimeter, and its origin is at point $D$. From measurements taken of the object it is known that the world coordinates of points $C$ and $A$ are as follows
$ C = (15.2, 0,0) $
$ A = (0, 9.5,0) $
Given all that information, I would like to determine the location (position) as well as the orientation (rotation) of the camera relative to the world.
This problem was inspired by this problem
Your help is highly appreciated.
My attempt:
Based on my answer in the above-referenced question, the coordinates given in the image above have to be corrected with respect to the center of the image. As mentioned in the question, the size of the image is $4032 \times 3024$ pixels. Therefore, the coordinates of the center are $( \dfrac{4032}{2}, \dfrac{3024}{2}) = (2016, 1512) $. With this, the corrected coordinates become
$P_1 = A = (-1296, 696)$
$P_2 = B = (976, 828)$
$P_3 = C = (1604, -252) $
$P_4 = D = (-1616,-524) $
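The centering step can be sketched in a few lines of NumPy. This is only an illustration: the function name `center_pixels` is made up here, the sketch assumes the correction is a plain subtraction of the image center with no axis flip, and the raw pixel value in the example is back-computed from $P_1$ rather than read from the image.

```python
import numpy as np

def center_pixels(pixels, width, height):
    """Shift pixel coordinates so that the origin is at the image center."""
    center = np.array([width / 2.0, height / 2.0])
    return np.asarray(pixels, dtype=float) - center

# Hypothetical raw pixel location, back-computed from P_1 for illustration only.
raw_A = [[720, 2208]]
print(center_pixels(raw_A, 4032, 3024))
```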
Next, we have to find the focal length $z_0$ of the camera. This is possible because the four given points are the four corners of a rectangle. Take four points $Q_i$, one on each of the four rays originating from the camera's projection center, of the form
$ Q_i = t_i \begin{bmatrix} P_i \\ z_0 \end{bmatrix} $
If the $Q_i$'s are the vertices of a rectangle (which they are), then two conditions must be satisfied:
$Q_2 - Q_1 =Q_3 - Q_4 $
$ (Q_2 - Q_1) \cdot (Q_3 - Q_2) = 0 $
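The first of these equations gives a homogeneous linear system of three equations in the four unknown $t_i$'s. Its third row is $z_0 (t_1 - t_2 + t_3 - t_4) = 0$, so after dividing by $z_0$ the system does not depend on the focal length. Its solution is
$ (t_1, t_2, t_3, t_4) = \lambda \mathbf{v} $
where $\mathbf{v} \in \mathbb{R}^4 $ is a now-known vector spanning the null space of the system, and $\lambda$ is a scale factor still to be determined.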
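One possible way to obtain $\mathbf{v}$ numerically is to take the null space of the $3 \times 4$ coefficient matrix with an SVD. The sketch below assumes NumPy; the names `signs` and `M` and the normalization $v_4 = 1$ are choices made here for illustration (any nonzero multiple of $\mathbf{v}$ works, since the scale is absorbed by $\lambda$).

```python
import numpy as np

# Centered pixel coordinates P_1..P_4 (A, B, C, D) from the text.
P = np.array([[-1296.0, 696.0],
              [976.0, 828.0],
              [1604.0, -252.0],
              [-1616.0, -524.0]])

# Q_2 - Q_1 = Q_3 - Q_4 rearranges to t_1 d_1 - t_2 d_2 + t_3 d_3 - t_4 d_4 = 0
# with d_i = (P_i, z_0). Dividing the third row by z_0 leaves a matrix
# whose columns are +/- (P_i, 1), independent of z_0.
signs = np.array([1.0, -1.0, 1.0, -1.0])
M = np.vstack([P.T, np.ones(4)]) * signs   # shape (3, 4)

# For a rank-3 matrix, the last right singular vector spans the null space.
_, _, Vt = np.linalg.svd(M)
v = Vt[-1] / Vt[-1][3]                     # assumed normalization: v_4 = 1

print(v)
print(M @ v)   # residual of the linear system, should be numerically zero
```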
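The second equation, after substituting $t_i = \lambda v_i$, reads $ (v_2 P_2 - v_1 P_1) \cdot ( v_3 P_3 - v_2 P_2) + z_0^2 (v_2 - v_1)(v_3 - v_2) = 0 $. This is a quadratic equation in $z_0$, and gives
$ z_0 = - \sqrt{ - \dfrac{ (v_2 P_2 - v_1 P_1) \cdot ( v_3 P_3 - v_2 P_2) }{ (v_3 - v_2)(v_2 - v_1) }} $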
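From the given distances between the world points, we can calculate the value of $\lambda$:
$ \lambda = \dfrac{ 15.2 }{ \left\| v_3 \begin{bmatrix} P_3 \\ z_0 \end{bmatrix} - v_4 \begin{bmatrix} P_4 \\ z_0 \end{bmatrix} \right\| } = \dfrac{ 9.5 }{ \left\| v_1 \begin{bmatrix} P_1 \\ z_0 \end{bmatrix} - v_4 \begin{bmatrix} P_4 \\ z_0 \end{bmatrix} \right\| }$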
Now we have $Q_1, Q_2, Q_3, Q_4$, which are the position vectors of the four corners of the rectangle relative to the camera's coordinate frame.
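The steps up to this point (null vector, focal length, scale, corner positions) can be sketched as follows, assuming NumPy. The variable names are illustrative, $\mathbf{v}$ is normalized so that $v_4 = 1$, the negative root is taken for $z_0$ so that the radicand is $-\text{num}/\text{den}$, and $\lambda$ is taken from the $15.2$ cm edge $DC$ only; these are assumptions of the sketch, not necessarily the exact choices behind the numbers reported further down.

```python
import numpy as np

P = np.array([[-1296.0, 696.0], [976.0, 828.0],
              [1604.0, -252.0], [-1616.0, -524.0]])   # P_1..P_4 from the text

# Null vector v of the parallelogram condition, normalized so that v_4 = 1.
M = np.vstack([P.T, np.ones(4)]) * np.array([1.0, -1.0, 1.0, -1.0])
v = np.linalg.svd(M)[2][-1]
v = v / v[3]

# Perpendicularity: (v2 P2 - v1 P1).(v3 P3 - v2 P2) + z0^2 (v2 - v1)(v3 - v2) = 0
num = np.dot(v[1] * P[1] - v[0] * P[0], v[2] * P[2] - v[1] * P[1])
den = (v[2] - v[1]) * (v[1] - v[0])
z0 = -np.sqrt(-num / den)            # negative root (assumed sign convention)

# Unscaled ray points v_i * (P_i, z0), then the scale from the 15.2 cm edge DC.
rays = v[:, None] * np.column_stack([P, np.full(4, z0)])
lam = 15.2 / np.linalg.norm(rays[2] - rays[3])
Q = lam * rays                       # corners in the camera frame, in cm

print(z0, lam)
print(np.linalg.norm(Q[0] - Q[3]))   # compare with the measured 9.5 cm edge DA
```

The last printed value is the length of edge $DA$ implied by the scale taken from edge $DC$; any gap between it and $9.5$ shows how closely the two expressions for $\lambda$ agree for this particular image.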
Now the world coordinates $r_i$'s of the four corners are related to the $Q_i$'s by
$ r_i = O_c + R_c Q_i, \ i = 1, 2, 3, 4 $
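where $O_c$ is the position of the camera and $R_c$ is its rotation matrix, both expressed in the world frame. We have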
$ r_4 = (0, 0,0) , r_3 = (15.2, 0,0) , r_1 = ( 0, 9.5,0) $
Taking differences of these equations, we get
$ r_3 - r_4 = R_c (Q_3 - Q_4) $
$ r_1 - r_4 = R_c (Q_1 - Q_4) $
Now if we take the cross products
$ V = (r_3 - r_4) \times (r_1 - r_4) $ and $ V' = (Q_3 - Q_4) \times (Q_1 - Q_4)$, then $V = R_c V'$ because a rotation preserves cross products, and we have the following matrix equation from which we can determine $R_c$:
$[ r_3 - r_4, r_1 - r_4, V ] = R_c [ Q_3 - Q_4, Q_1 - Q_4, V' ] $
So that
$ R_c = [ r_3 - r_4, r_1 - r_4, V ][ Q_3 - Q_4, Q_1 - Q_4, V' ]^{-1} $
Taking any of the equations of the form
$ r_i = O_c + R_c Q_i $
we can determine $O_c$ because all the other variables are now known.
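The matrix equation for $R_c$ and the recovery of $O_c$ can be sketched as below, assuming NumPy. The function name `pose_from_corners` is made up for this illustration, and the check uses a synthetic, made-up pose rather than the data from the image. Note that if measurement noise makes the edge lengths of the $Q_i$ differ from the world edge lengths, the product below is only approximately orthogonal.

```python
import numpy as np

def pose_from_corners(Q1, Q3, Q4, r1, r3, r4):
    """Return (R_c, O_c) such that r_i = O_c + R_c Q_i."""
    a, b = Q3 - Q4, Q1 - Q4                      # edges in the camera frame
    A, B = r3 - r4, r1 - r4                      # same edges in the world frame
    Qmat = np.column_stack([a, b, np.cross(a, b)])
    rmat = np.column_stack([A, B, np.cross(A, B)])
    R_c = rmat @ np.linalg.inv(Qmat)
    O_c = r4 - R_c @ Q4                          # from r_4 = O_c + R_c Q_4
    return R_c, O_c

# Synthetic check with a made-up pose (not the values from the image).
theta = 0.3
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(theta), -np.sin(theta)],
                   [0.0, np.sin(theta), np.cos(theta)]])
O_true = np.array([1.0, -2.0, 3.0])
r = {1: np.array([0.0, 9.5, 0.0]), 3: np.array([15.2, 0.0, 0.0]), 4: np.zeros(3)}
Q = {i: R_true.T @ (ri - O_true) for i, ri in r.items()}

R_c, O_c = pose_from_corners(Q[1], Q[3], Q[4], r[1], r[3], r[4])
print(np.allclose(R_c, R_true), np.allclose(O_c, O_true))
```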
Using the above-outlined method, I obtained for this image
$ R_c = \begin{bmatrix} 0.993754 && 0.07518 && -0.08247 \\ -0.11157 && 0.686052 && -0.71895 \\ 0.00253 && 0.723657 && 0.690155 \end{bmatrix} $
and
$ O_c = \begin{bmatrix} 6.077108 \\ -11.5862 \\ 13.10689 \end{bmatrix} $
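A quick sanity check of these numbers, assuming NumPy: a proper rotation should satisfy $R_c^T R_c = I$ and $\det R_c = 1$ up to rounding, and mapping corner $A$ back into the camera frame via $Q_1 = R_c^T (r_1 - O_c)$ should give a point whose first two components are proportional to $P_1$. This is only a consistency check on the reported values, not a derivation.

```python
import numpy as np

R_c = np.array([[0.993754, 0.07518, -0.08247],
                [-0.11157, 0.686052, -0.71895],
                [0.00253, 0.723657, 0.690155]])
O_c = np.array([6.077108, -11.5862, 13.10689])

# Orthogonality error and determinant of the reported matrix.
print(np.abs(R_c.T @ R_c - np.eye(3)).max())
print(np.linalg.det(R_c))

# Corner A in the camera frame: Q_1 = R_c^T (r_1 - O_c).
r1 = np.array([0.0, 9.5, 0.0])
Q1 = R_c.T @ (r1 - O_c)
print(Q1[:2] / Q1[2])   # equals P_1 / z_0, so proportional to (-1296, 696)
```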
