Find the viewing angles of a cube from a projection of its edges


If I project a cube's corner onto a plane by photographing it from a distance, I get three lines radiating from a point. From the two angles $A$, $B$ made by these three lines in the plane, I should be able to calculate the viewing angles relative to the cube's ($\mathbf{\hat{x}}$, $\mathbf{\hat{y}}$, $\mathbf{\hat{z}}$) axes in spherical coordinates.

I should be able to, but I can't right now. I need a helpful hint on how to do this projection without going too deep into projective geometry.

I'm just asking about the simple math problem here; photogrammetry is a whole topic unto itself.

Cropped from 17-Oct-2018 NASA ICE tweet, (sadly, the dimensions are not in the ratio 1:4:9)



There are 2 best solutions below

10
On

Well, I gave it a shot but couldn't come up with an easy way to solve the equation for $\phi$ except guess and check. Someone more knowledgeable may know. Edit: Someone helped me with the equation (thanks Yves) so it's now easier to solve for $\phi$.


I've added some further explanation. I should clarify that angles $A$, $A'$, $B$ and $B'$ all lie on the top surface of the cube, between the $z$ axis projected onto the $xy$ plane and the $x$ and $y$ axes. We are trying to determine the polar coordinates of the line of sight, which involve two angles: the angle the line of sight makes with the positive $x$ axis ($\theta$, which can be determined from angle $A$) and the angle it makes with the $z$ axis ($\phi$). I've projected a true view of the top surface and shown how the two dark triangles, coplanar with the top surface of the cube, have been elongated along their adjacent sides from $a1$ to $a2$. Angle $\phi$ is the angle through which the top surface is tilted to become a true view. Sorry about the projection lines not matching up; I couldn't get it all in one image.


5
On

For a general perspective projection, knowing the directions of the images of the coordinate axes at some image point (or the angles between them) isn’t enough to recover the camera’s attitude. The problem is that their vanishing points are usually finite, so these angles depend on the location in the image. For the image in your question, it’s probably safe to assume that the object is far from the camera and that the range of depths in that part of the image is small relative to this distance. This allows us to approximate the perspective projection with an orthographic projection—i.e., to use an affine camera. (Hartley & Zisserman discuss the errors in this approximation in Multiple View Geometry In Computer Vision §6.3.) The nice thing about this approximation for our purposes is that vanishing points of lines that are not parallel to the projection direction are all at infinity—parallel lines are mapped to parallel lines—so that the directions of the coordinate axis images don’t depend on the position in the image.

An affine camera matrix can be decomposed as $$\mathtt P = \begin{bmatrix} \mathtt K_{2\times2} & \mathbf t \\ \mathbf 0^T & 1 \end{bmatrix} \begin{bmatrix} \mathtt R_2 & \mathbf 0 \\ \mathbf 0^T & 1 \end{bmatrix}.$$ Here, $\mathtt R_2$ is the first two rows of a $3\times3$ rotation matrix $\mathtt R$, the intrinsic matrix $\mathtt K_{2\times2}$ is upper-triangular, and $\mathbf t$ is the image coordinates of the world coordinate system origin. We’re essentially trying to recover the missing third row $\mathbf r^3$ of $\mathtt R$. This is the world-coordinate direction vector that points from the origin toward the camera.

The translation $\mathbf t$ only affects the last column of $\mathtt P$, which we don’t care about, so we can safely ignore it. If we assume square pixels and no skew, then, up to an irrelevant constant factor, $\mathtt K_{2\times2}=\mathtt I_{2\times2}$, which reduces $\mathtt P$ to the right-hand factor of the decomposition above. The first three columns of a camera matrix are the image coordinates of the world coordinate axes’ vanishing points, so if $\mathbf x$, $\mathbf y$ and $\mathbf z$ are the direction vectors measured from the image, we have $$\mathtt R_2 = \begin{bmatrix}\lambda\mathbf x & \mu\mathbf y & \tau\mathbf z\end{bmatrix}$$ for some positive scale factors $\lambda$, $\mu$ and $\tau$.

To find the scale factors, use the orthogonality of $\mathtt R$: $\mathtt R_2\mathtt R_2^T=\mathtt I_{2\times2}$. This produces a system of three linear equations in $\lambda^2$, $\mu^2$ and $\tau^2$ that you can solve using standard methods. Orthogonality also supplies the missing row: $\mathbf r^3 = \mathbf r^1\times\mathbf r^2$, where $\mathbf r^1$ and $\mathbf r^2$ are the two rows of $\mathtt R_2$. You can then convert this to spherical coordinates, if desired. Note that we’ve recovered the entire rotation matrix, so you also have the camera’s roll angle, should you want it.

I would simply substitute the positive square roots of the solution into $\mathtt R_2$ and then compute the cross product of the rows, but if you want a closed-form solution, it’s not too hard to substitute and simplify, although you should be careful to preserve the signs of the elements of $\mathtt R_2$ when canceling against terms under the radicals.
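For concreteness, here is the linear system written out. The component names $\mathbf x=(x_1,x_2)^T$, $\mathbf y=(y_1,y_2)^T$, $\mathbf z=(z_1,z_2)^T$ are notation introduced here for illustration; they are not used elsewhere in this answer. Expanding $\mathtt R_2\mathtt R_2^T=\mathtt I_{2\times2}$ entry by entry (unit length of each row, then orthogonality of the two rows) gives $$\begin{aligned} x_1^2\,\lambda^2 + y_1^2\,\mu^2 + z_1^2\,\tau^2 &= 1 \\ x_2^2\,\lambda^2 + y_2^2\,\mu^2 + z_2^2\,\tau^2 &= 1 \\ x_1x_2\,\lambda^2 + y_1y_2\,\mu^2 + z_1z_2\,\tau^2 &= 0, \end{aligned}$$ which is linear in the unknowns $\lambda^2$, $\mu^2$ and $\tau^2$. Only a solution with all three unknowns positive corresponds to a valid view.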

Working from the image in your question, I measure $\mathbf x = (178,20)^T$, $\mathbf y = (-162,58)^T$, $\mathbf z=(-3,166)^T$. Applying the above method yields $\mathbf r^{3T}\approx(-0.438, -0.879, 0.189)$, which has an azimuth of -116.5° and polar angle 79.12°. Using this projection direction produces this picture:

[Figure: projected axes]

which looks about right.
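The procedure above can be sketched in a few lines of NumPy. This is a minimal sketch under the same assumptions (affine camera, square pixels, no skew); the function name `view_direction` and the error handling are illustrative choices, not part of the original method. It solves the linear system for the squared scale factors, rebuilds $\mathtt R_2$, and takes the cross product of its rows.

```python
import numpy as np

def view_direction(x, y, z):
    """Return the 3x3 rotation matrix whose third row points toward the camera.

    x, y, z are the 2D image directions of the world axes (any length).
    Hypothetical helper: assumes an affine camera with square pixels and no skew.
    """
    V = np.array([x, y, z], dtype=float).T        # 2x3, one column per axis
    # R2 R2^T = I is linear in (lambda^2, mu^2, tau^2):
    #   row 1 has unit length, row 2 has unit length, rows are orthogonal.
    M = np.vstack([V[0] ** 2, V[1] ** 2, V[0] * V[1]])
    s2 = np.linalg.solve(M, np.array([1.0, 1.0, 0.0]))
    if np.any(s2 <= 0):
        raise ValueError("no positive scale factors: not a valid axis triple")
    R2 = V * np.sqrt(s2)                          # scale each column
    r3 = np.cross(R2[0], R2[1])                   # missing third row
    return np.vstack([R2, r3])

# Measurements quoted in the answer above.
R = view_direction((178, 20), (-162, 58), (-3, 166))
r3 = R[2]
azimuth = np.degrees(np.arctan2(r3[1], r3[0]))
polar = np.degrees(np.arccos(np.clip(r3[2], -1.0, 1.0)))
print(r3, azimuth, polar)
```

With the measured vectors this should reproduce the direction, azimuth and polar angle quoted above, up to rounding. `np.linalg.solve` raises `LinAlgError` when the system is singular, which is one way the degenerate configurations show up.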

The above isn’t quite the solution that you asked for, but only a small modification is required to work with a pair of angles instead. Essentially, we rotate the image coordinate system so that $\mathbf z = (0,1)^T$. We then start from $$\mathtt R_2 = \begin{bmatrix}\pm\lambda\sin A & \pm\mu\sin B & 0 \\ \pm\lambda\cos A & \pm\mu\cos B & \tau\end{bmatrix},$$ with the signs chosen to reflect the correct quadrants relative to the plumb line $\mathbf z$. Using the angles from the example in Phil H’s answer, this method produces $\mathbf r^{3T} \approx (-0.4449, -0.618222, 0.64797)$, which corresponds to an azimuth of -125.74° and polar angle 49.6113°, the same result as in that other answer.
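A hedged sketch of the two-angle variant follows. It assumes a sign convention that the text does not fix: $A$ and $B$ are signed angles measured clockwise from the image of the $z$ axis, so the image directions are $(\sin A,\cos A)$ and $(\sin B,\cos B)$ and the $\pm$ signs are absorbed into the angles. Both function names are hypothetical. The demo is a round trip: it computes the angles implied by the direction reported above and then recovers that direction from the angles alone.

```python
import numpy as np

def view_direction_from_angles(A, B):
    """Recover the unit vector toward the camera from two signed angles.

    A, B (radians) are measured clockwise from the image of the z axis to
    the images of the x and y axes (assumed convention, see lead-in).
    """
    sA, cA, sB, cB = np.sin(A), np.cos(A), np.sin(B), np.cos(B)
    # R2 R2^T = I with R2 = [[l*sA, m*sB, 0], [l*cA, m*cB, t]],
    # linear in (l^2, m^2, t^2).
    M = np.array([[sA * sA, sB * sB, 0.0],
                  [cA * cA, cB * cB, 1.0],
                  [sA * cA, sB * cB, 0.0]])
    l2, m2, t2 = np.linalg.solve(M, np.array([1.0, 1.0, 0.0]))
    if min(l2, m2, t2) <= 0:
        raise ValueError("angles do not correspond to a valid view")
    l, m, t = np.sqrt([l2, m2, t2])
    r1 = np.array([l * sA, m * sB, 0.0])
    r2 = np.array([l * cA, m * cB, t])
    return np.cross(r1, r2)

def angles_from_direction(d):
    """Inverse map, used only to build a test case: the signed angles seen
    when looking along unit vector d with the z axis drawn straight up."""
    dx, dy, dz = d
    A = np.arctan2(-dy, -dz * dx)
    B = np.arctan2(dx, -dz * dy)
    return A, B

# Round trip using the direction reported in the answer above.
d = np.array([-0.4449, -0.618222, 0.64797])
d /= np.linalg.norm(d)
A, B = angles_from_direction(d)
recovered = view_direction_from_angles(A, B)
print(np.degrees([A, B]), recovered)
```

The round trip only checks internal consistency of the sketch. Other sign conventions for $A$ and $B$ would need the corresponding sign changes in `r1` and `r2`.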

There are a few degenerate configurations to deal with. If any of the image lines coincide, there’s no way to retrieve the view direction without more information, such as known lengths along the axes that coincided. If the projection is parallel to one of the world coordinate axes, its image will be a single point instead of a line, so the above method doesn’t apply. It’s easy enough to work out an appropriate rotation from the remaining visible axis images, though. An interesting degenerate configuration is when the three vectors are equally spaced: the corresponding linear system has only the trivial solution. I suppose that this null result is reflective of the ambiguity that’s in the old “tiers of cubes” optical illusion. Since you know which edge is which, though, this can be disambiguated, but the above algorithm is of no help.

I haven’t looked into whether the aspect ratio or no-skew restrictions can be relaxed without requiring additional data from the image and scene.