I am trying to compute an image-to-world model for my thesis, which deals with road lanes. As a disclaimer, I have to say that linear algebra is not my strong suit.
The idea is that, given the yaw, pitch, and position of the camera, I can translate image pixels to real-world coordinates, which will be useful in a road recognition algorithm.
I managed to get a working pinhole-camera perspective projection. Here are the matrices used:
Extrinsic Matrix
Translates to camera position and rotates accordingly
$$\begin{pmatrix} 1 &0 &0 &-cx \\ 0 &1 &0 &-cy \\ 0 &0 &1 &-cz \\ 0 &0 &0 &1 \end{pmatrix} \begin{pmatrix} 1 &0 &0 &0 \\ 0 &\cos(\text{yaw}) &-\sin(\text{yaw}) &0 \\ 0 &\sin(\text{yaw}) &\cos(\text{yaw}) &0 \\ 0 &0 &0 &1 \end{pmatrix} \begin{pmatrix} \cos(\text{pitch}) &0 &\sin(\text{pitch}) &0 \\ 0 &1 &0 &0 \\ -\sin(\text{pitch}) &0 &\cos(\text{pitch}) &0 \\ 0 &0 &0 &1 \end{pmatrix} $$
Projection
$f$ is the focal length of the camera.
Based on https://en.wikipedia.org/wiki/3D_projection
$$\begin{pmatrix} Fx\\ Fy\\ Fz\\ Fw \end{pmatrix} = \begin{pmatrix} 1 &0 &1/f &0 \\ 0 &1 &1/f &0 \\ 0 &0 &1 &0 \\ 0 &0 &1/f &0 \end{pmatrix} \begin{pmatrix} dx\\ dy\\ dz\\ 1 \end{pmatrix}$$
$$p = \begin{pmatrix} Fx/Fw\\ Fy/Fw\\ 1 \end{pmatrix}$$
Intrinsic Matrix
Scales to pixel units and moves the origin to the center. $w$ is the width of the screen and $W$ is the width of the sensor; similarly, $h$ and $H$ are the corresponding heights.
$Fx = w/W$
$Fy = h/H$
$$\begin{pmatrix} Fx &0 &w/2 \\ 0 &Fy &h/2 \\ 0 &0 &1 \end{pmatrix} $$
In a typical projection, I first multiply the 3D point by the extrinsic matrix, then project it using the projection matrix, and then apply the intrinsic matrix.
But how can I reverse the process? I can assume that all points lie on the road plane ($Y = 0$), yet I am not sure how to fit that in with all these matrices. I know I can invert the intrinsic and extrinsic matrices, but I can't do that with the projection matrix, because it is singular.
Any lead would be useful. Thanks
The location on the image plane will give you a ray on which the object lies. You’ll need to use other information to determine where along this ray the object actually is, though. That information is lost when the object is projected onto the image plane. Assuming that the object is somewhere on the road plane is a huge simplification. Now, instead of trying to find the inverse of a perspective mapping, you only need to find a perspective projection of the image plane onto the road. That’s a fairly straightforward construction similar to the one used to derive the original perspective projection.
Start by working in camera-relative coordinates. A point $\mathbf p_i$ on the image plane has coordinates $(x_i,y_i,f)^T$. The original projection maps all points on the ray $\mathbf p_i t$ onto this point. Now, we’re assuming that the road is a plane, so it can be represented by an equation of the form $\mathbf n\cdot(\mathbf p_o-\mathbf r)=0$, where $\mathbf n$ is a normal to the plane and $\mathbf r$ is some known point on it. We seek the intersection of the ray and this plane, which will satisfy $\mathbf n\cdot(\mathbf p_i t-\mathbf r)=0$. Solving for $t$ and substituting gives $$\mathbf p_o = {\mathbf n\cdot \mathbf r \over \mathbf n\cdot \mathbf p_i}\mathbf p_i.$$ Moving to homogeneous coordinates, this mapping is the linear transformation represented by the matrix $$ M = \pmatrix{1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ {n_x \over \mathbf n\cdot\mathbf r} & {n_y \over \mathbf n\cdot\mathbf r} & {n_z \over \mathbf n\cdot\mathbf r} & 0}, $$ i.e., $$ \mathbf p_o = M\pmatrix{x_i \\ y_i \\ f \\ 1}. $$ Once you have this, it should be obvious how to complete the mapping back to world coordinates.
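To make the ray-plane construction concrete, here is a minimal NumPy sketch of the matrix $M$ above. The function name `backproject_to_plane` and the numbers in the example are my own illustrative choices, not anything from the question; everything is expressed in camera-relative coordinates, and the plane is assumed not to pass through the camera center (otherwise $\mathbf n\cdot\mathbf r=0$ and $M$ is undefined).

```python
import numpy as np

def backproject_to_plane(x_i, y_i, f, n, r):
    """Intersect the ray through the image point (x_i, y_i, f) with the
    plane n . (p - r) = 0. All inputs are in camera-relative coordinates."""
    n = np.asarray(n, dtype=float)
    r = np.asarray(r, dtype=float)

    d = n @ r  # n . r; zero if the plane contains the camera center
    if np.isclose(d, 0.0):
        raise ValueError("plane passes through the camera center")

    # Identity on the first three rows, n / (n . r) in the last row.
    M = np.eye(4)
    M[3, :3] = n / d
    M[3, 3] = 0.0

    p_h = M @ np.array([x_i, y_i, f, 1.0])  # homogeneous result
    if np.isclose(p_h[3], 0.0):
        raise ValueError("ray is parallel to the plane")

    # Dehomogenize. A negative scale factor would mean the intersection
    # lies behind the camera; that case is not filtered out here.
    return p_h[:3] / p_h[3]

# Hypothetical example: plane with normal (0, 1, 0) through (0, -2, 0),
# focal length 1, image point (0.1, -0.5).
p_o = backproject_to_plane(0.1, -0.5, 1.0, n=[0, 1, 0], r=[0, -2, 0])
print(p_o)  # approximately (0.4, -2.0, 4.0)
```

The result is the same as scaling $\mathbf p_i$ by $\mathbf n\cdot\mathbf r/\mathbf n\cdot\mathbf p_i$ directly; the matrix form is mainly useful when you want to chain it with the other homogeneous transformations.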
All that’s left is to find the parameters $\mathbf n$ and $\mathbf r$ that describe the road plane in camera coordinates. That’s also pretty simple. Since we’re taking the road to be the plane $y=0$ in world coordinates, its normal there is $(0,1,0)^T$. As for a known point on the road, the origin will do. Another reasonable choice is the point at which the camera’s optical axis meets the road, since the camera-relative coordinates of that point will be of the form $(0,0,z)^T$. Convert both of these into camera-relative coordinates, and you’re done.
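As a rough sketch of that conversion, suppose the world-to-camera transformation has the form $\mathbf X_{cam} = R(\mathbf X_{world} - \mathbf C)$, with $R$ a rotation matrix and $\mathbf C$ the camera position in world coordinates. That convention is an assumption on my part; if yours composes rotation and translation differently, adjust accordingly. The function name is hypothetical.

```python
import numpy as np

def road_plane_in_camera_frame(R, C):
    """Express the world plane y = 0 in camera coordinates, assuming
    X_cam = R @ (X_world - C) (an assumed convention)."""
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)

    n_world = np.array([0.0, 1.0, 0.0])  # normal of the plane y = 0
    r_world = np.zeros(3)                # world origin lies on the road

    n_cam = R @ n_world          # directions are only rotated
    r_cam = R @ (r_world - C)    # points are translated, then rotated
    return n_cam, r_cam

# Hypothetical example: camera 2 units above the road, tilted by 10
# degrees about the x-axis.
a = np.deg2rad(10.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
n_cam, r_cam = road_plane_in_camera_frame(R, C=[0.0, 2.0, 0.0])
```

A handy sanity check is that $\mathbf n\cdot\mathbf r$ in camera coordinates equals minus the camera's height above the road, whatever the rotation, because rotations preserve dot products.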
Note that you don’t necessarily need to know anything about the camera to compute a perspective transformation that will map from the image plane to the road plane. If you can somehow find four pairs of corresponding points on these two planes, with no three of the four collinear in either plane (i.e., a pair of corresponding quadrilaterals), a planar perspective transformation that relates them can be computed fairly easily. See here for details. Essentially, you calibrate the camera view by matching a region of the image to a known region in the road plane.
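One common way to compute such a transformation from four correspondences is the direct linear transform: each pair of points contributes two linear equations in the nine entries of a $3\times3$ matrix $H$, and $H$ is the null vector of the resulting $8\times9$ system. The sketch below is one possible implementation, not necessarily the construction the linked page uses; the function names and the pixel/road coordinates are made up for illustration.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H (3x3, defined up to scale) with dst ~ H @ src from four
    or more point pairs, using the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + h33), etc.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)

    # The right singular vector for the smallest singular value spans
    # the (approximate) null space of A.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / np.linalg.norm(H)  # fix the arbitrary scale

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Hypothetical calibration: a trapezoid in the image (pixels) that is
# known to be a 4 x 15 rectangle on the road (e.g. metres).
image_quad = [(100, 400), (540, 400), (400, 250), (240, 250)]
road_quad = [(-2, 5), (2, 5), (2, 20), (-2, 20)]

H = homography_from_points(image_quad, road_quad)
print(apply_homography(H, image_quad))  # should reproduce road_quad
```

With measured (noisy) points it is usually worth normalizing the coordinates before building the system and using more than four pairs, but for four exact correspondences the plain version above is enough.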
Update 2018.10.22: If you have the complete camera matrix $P$, which you do, there’s a fairly straightforward way to construct the back-mapping to points on the road with a few matrix operations. We choose a coordinate system for the road plane, which gives us a $4\times3$ matrix $M$ that maps from these plane coordinates to world coordinates, i.e., $\mathbf X = M\mathbf x$. The image of this point is $PM\mathbf x$. If $PM$ is invertible, which it will be unless the camera center is on the road plane, the matrix $(PM)^{-1}$ maps from image to plane coordinates, and so the back-mapping from image to world coordinates on the road is $M(PM)^{-1}$. For the plane $Y=0$, a natural choice for $M$ is $$M=\begin{bmatrix}1&0&0\\0&0&0\\0&1&0\\0&0&1\end{bmatrix},$$ which simply inserts a $Y$-coordinate of zero to obtain world coordinates. You can adjust the origin of this coordinate system by changing the last column of $M$.
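Here is a short NumPy sketch of this back-mapping. The way $P$ is assembled in the example, $P = K[R \mid -R\mathbf C]$ with made-up intrinsics, is only an assumption so that there is something to test with; substitute whatever $3\times4$ camera matrix you actually have. The function names are hypothetical too.

```python
import numpy as np

# Maps plane coordinates (X, Z, 1) to world coordinates (X, 0, Z, 1).
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def road_backmap(P):
    """Return the 4x3 matrix M (PM)^{-1} that maps homogeneous image
    points to homogeneous world points on the plane Y = 0."""
    PM = P @ M
    # PM is singular when the camera center lies on the road plane.
    return M @ np.linalg.inv(PM)

def image_to_road(B, u, v):
    """Back-project pixel (u, v) and dehomogenize to (X, Y, Z)."""
    X = B @ np.array([u, v, 1.0])
    return X[:3] / X[3]

# Hypothetical camera: focal length 800 px, principal point (320, 240),
# no rotation, camera center 2 units above the road.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
C = np.array([0.0, 2.0, 0.0])
P = K @ np.hstack([R, (-R @ C).reshape(3, 1)])

B = road_backmap(P)

# Round trip: project a point on the road, then map its image back.
X_world = np.array([1.0, 0.0, 10.0, 1.0])
x = P @ X_world
u, v = x[:2] / x[2]
print(image_to_road(B, u, v))  # should recover (1, 0, 10) up to rounding
```

Since $M(PM)^{-1}$ depends only on the camera, it can be computed once and then applied to every pixel of interest.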