Reconstructing a 3D point cloud from depth maps and known camera parameters

The Task

I have four depth maps of the same scene, one from each side (front, back, left, right). Given a depth map, I would like to reconstruct a 3D point cloud. For each depth map I have the following camera-specific matrices: intrinsic $(3\times3)$, extrinsic $(3\times4)$ and camera $(3\times4)$.

My attempt

Base

Let us denote the intrinsic matrix by $I_C$ and the extrinsic matrix by $E_C$ (for some camera $C$). The projection of a world point $(X, Y, Z)$ to the pixel $(u, v)$ is then:

$$s \begin{bmatrix}u\\v\\1\end{bmatrix} = I_C E_C \begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}$$

Multiplying by $I_C^{-1}$ from the left, we get:

$$s\cdot I_C^{-1} \begin{bmatrix}u\\v\\1\end{bmatrix} = E_C \begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}$$

Separating $Z$ (which we know from the depth map) from the rest of the vector, we derive:

$$s\cdot I_C^{-1} \begin{bmatrix}u\\v\\1\end{bmatrix} = E_C^{(-3)} \begin{bmatrix}X\\Y\\1\end{bmatrix}+Z\cdot E_C^{(3)}$$

where $E_C^{(-3)}$ is the $3\times3$ matrix obtained by omitting the third column of $E_C$, and $E_C^{(3)}$ denotes that third column. Assuming $E_C^{(-3)}$ is invertible, solving for the remaining unknowns gives:

$$\begin{bmatrix}X\\Y\\1\end{bmatrix} = (E_C^{(-3)})^{-1}\left(s\cdot I_C^{-1} \begin{bmatrix}u\\v\\1\end{bmatrix} -Z\cdot E_C^{(3)}\right)$$
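As a sanity check of the last equation, here is a minimal NumPy sketch of a round trip with synthetic values. The matrices `I_C` and `E_C`, the test point, and the helper names `project` and `recover_xy1` are all made up for illustration; they are not my real camera data. Note that $s$ is only known here because the point is projected forward first, and $Z$ is taken to be the world-frame coordinate, exactly as in the derivation.

```python
import numpy as np

# Hypothetical intrinsic matrix I_C (focal lengths and principal point in pixels).
I_C = np.array([[525.0, 0.0, 320.0],
                [0.0, 525.0, 240.0],
                [0.0, 0.0, 1.0]])

# Hypothetical extrinsic matrix E_C = [R | t]: rotation about the y-axis plus a translation.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 2.0])
E_C = np.hstack([R, t[:, None]])  # shape (3, 4)


def project(point_world):
    """Forward model: return (u, v, s) for a world point (X, Y, Z)."""
    p = I_C @ E_C @ np.append(point_world, 1.0)
    s = p[2]
    return p[0] / s, p[1] / s, s


def recover_xy1(u, v, s, Z):
    """Evaluate the last equation of the derivation; returns the vector (X, Y, 1)."""
    E_minus3 = E_C[:, [0, 1, 3]]  # E_C with its third column omitted (3x3)
    E_3 = E_C[:, 2]               # third column of E_C
    rhs = s * np.linalg.solve(I_C, np.array([u, v, 1.0])) - Z * E_3
    return np.linalg.solve(E_minus3, rhs)


# Round trip: project a known world point, then invert using the known s and Z.
point = np.array([0.3, -0.4, 1.5])
u, v, s = project(point)
xy1 = recover_xy1(u, v, s, point[2])
print(xy1)
```

With a known $s$ the formula reproduces $X$ and $Y$, and the third component comes out as 1, so the algebra itself seems consistent. What the sketch does not tell me is how to obtain $s$ when only $u$, $v$ and the depth value are given.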

My question(s)

  1. How can I determine $s$ analytically? Is $s$ different for every point or a global constant?
  2. Is there a more straightforward way to do this?
  3. My depth maps have a resolution of $640 \times 480$. Which coordinates do I need to supply as $u$ and $v$? Normalized to the unit interval?