Geometric Understanding of a Plane-Induced Homography


I've been working with Hartley and Zisserman's Multiple View Geometry. I am struggling to understand equation 13.2, in which they define a plane-induced homography:

$H_\pi=K'(R-tn^\top/d)K^{-1}$

I do understand how the intrinsic matrices $K$ and $K'$ work, so for the purpose of this post we can treat them as the identity and reduce the expression to:

$H_\pi=R-tn^\top/d$

So here are my questions:

1) What is the geometric interpretation of the outer product $tn^\top$? I understand that a cross product is usually the vector orthogonal to two vectors, but the transpose means that we get a matrix out of it. Intuition says that this is a rotation matrix, but then what is the rotation?

2) What is the geometric interpretation of subtracting the two matrices, $R - tn^\top$? My guess is that we are subtracting two rotations, but wouldn't this change the axis of rotation completely?

3) What is $d$? I have been told both depth from the camera to the point of intersection with the plane, and the disparity which I think is $\|t\|$.

There is 1 solution below

To parse this homography, it is more instructive to back up a step and look at the expression for the image of a point $\mathbf x$ on the first image plane (see the proof of equation 13.1 on the previous page): $$\mathbf x'=\mathtt R\mathbf x-(\mathbf t\mathbf n^T/d)\,\mathbf x.$$ The two image planes are related by an orientation-preserving rigid motion, that is, a rotation plus a translation. The rotation appears as $\mathtt R$ in the first term. If you rewrite the second term as $-{\mathbf n^T\mathbf x\over d}\mathbf t$, you can see that it’s a translation in the direction of $\mathbf t$ (the displacement of the second camera from the first), but the amount of translation isn’t fixed and depends on $\mathbf x$. So $\mathbf t\mathbf n^T$ is not a rotation: it is a rank-one matrix that sends every $\mathbf x$ to a multiple of $\mathbf t$. The plane $\mathbf\pi_E=(\mathbf n^T,d)^T$ is at a (signed) distance of ${d\over\|\mathbf n\|}$ from the origin, and ${\mathbf n^T\mathbf x\over\|\mathbf n\|}$ is the scalar projection of $\mathbf x$ onto $\mathbf n$, so you can consider the coefficient of $\mathbf t$ to be (minus) the ratio of these two values.
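As a sanity check of the expression above, here is a minimal Python sketch that maps an image point in two ways: directly with $\mathtt R-\mathbf t\mathbf n^T/d$, and by back-projecting onto the plane, moving the 3D point into the second camera frame, and projecting. It assumes $K=K'=I$ and the sign convention $\mathbf n^T\mathbf X+d=0$ for the plane. The numeric values and the helper names (`plane_homography`, `matvec`, `dot`, `dehomogenize`) are made up for illustration and do not come from the book.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def dehomogenize(v):
    # Scale a homogeneous 3-vector so that its last coordinate is 1.
    return [c / v[2] for c in v]

def plane_homography(R, t, n, d):
    # H = R - t n^T / d for the plane n.X + d = 0, with K = K' = I (assumed).
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]

# Hypothetical setup: second camera rotated 10 degrees about the y-axis.
theta = math.radians(10.0)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [0.5, 0.0, 0.1]
n = [0.0, 0.0, -1.0]   # together with d = 4 this is the plane z = 4
d = 4.0

x = [0.2, -0.3, 1.0]   # point on the first image plane (normalised coordinates)

# Route 1: apply the homography directly.
H = plane_homography(R, t, n, d)
x_h = matvec(H, x)

# Route 2: back-project onto the plane, then project into the second camera.
rho = -dot(n, x) / d                             # coefficient of t in x' = R x + rho t
X = [c / rho for c in x]                         # 3D point where the ray meets the plane
Xc2 = [a + b for a, b in zip(matvec(R, X), t)]   # R X + t

print(dehomogenize(x_h))
print(dehomogenize(Xc2))
```

The two printed vectors should agree up to rounding, because $\mathtt R\mathbf X+\mathbf t$ is just $(\mathtt R\mathbf x+\rho\,\mathbf t)/\rho$ with $\rho=-\mathbf n^T\mathbf x/d$, and homogeneous image points are only defined up to scale.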

Another way to see where the factor $-{\mathbf n^T\mathbf x\over d}$ comes from is to examine the back-projection from the first image plane onto $\mathbf\pi_E$. The proof of equation 13.1 gives you one way to find this point, but I’ll take a slightly different path. From Marsh’s Applied Geometry for Computer Graphics and CAD we have the following formula for a central projection with viewpoint $\mathbf V$ onto the plane $\mathbf\pi$, written here for column vectors: $$\mathtt M=\mathbf V\mathbf\pi^T-(\mathbf\pi^T\mathbf V)\mathtt I_4.$$ When back-projecting from the first image plane to $\mathbf\pi_E=(\mathbf n^T,d)^T$, the viewpoint is at the origin, so $\mathbf V=(0,0,0,1)^T$ and $$\mathtt M=\left[\begin{array}{c|c} -d\mathtt I_3 & \mathbf 0 \\ \hline \mathbf n^T & 0 \end{array}\right].$$ Then $$\mathtt M\mathbf X=\mathtt M(\mathbf x^T,1)^T=(-d\mathbf x^T,\mathbf n^T\mathbf x)^T$$ which, after dividing by $-d$, is equivalent to $\left(\mathbf x^T,-{\mathbf n^T\mathbf x\over d}\right)^T$. The back-projection of $\mathbf X$ is just the intersection point of the ray $(\mathbf x^T,\rho)^T$ and $\mathbf\pi_E$, which has inhomogeneous coordinates $\mathbf x/\rho$. The factor $\rho=-{\mathbf n^T\mathbf x\over d}$ is therefore inversely proportional to the distance from the camera centre to the back-projection of $\mathbf X$ onto $\mathbf\pi_E$. This is not the same as the distance from $\mathbf X$ to $\mathbf\pi_E$, however, since the distance is measured along a ray that in general is not perpendicular to $\mathbf\pi_E$.
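The back-projection can also be checked numerically. The sketch below builds the central-projection matrix in its column-vector form $\mathtt M=\mathbf V\mathbf\pi^T-(\mathbf\pi^T\mathbf V)\mathtt I_4$, which is the form that reproduces the block matrix shown above, and confirms that $\mathtt M\mathbf X$ lies on the plane. The plane, the image point, and the helper name `central_projection` are hypothetical values chosen for illustration, not taken from Marsh or from Hartley and Zisserman.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def central_projection(pi, V):
    # M = V pi^T - (pi^T V) I_4, acting on homogeneous column vectors.
    s = dot(pi, V)
    return [[V[i] * pi[j] - (s if i == j else 0.0) for j in range(4)]
            for i in range(4)]

# Hypothetical plane z = 4, written as n.X + d = 0 with n = (0, 0, -1), d = 4.
n = [0.0, 0.0, -1.0]
d = 4.0
pi = n + [d]                  # pi_E = (n^T, d)^T
V = [0.0, 0.0, 0.0, 1.0]      # viewpoint at the origin (first camera centre)

M = central_projection(pi, V)

x = [0.2, -0.3, 1.0]          # point on the first image plane
X = x + [1.0]                 # the same point as a homogeneous 3D point

MX = matvec(M, X)             # (-d x, n.x)
rho = MX[3] / -d              # rescale so the first three entries are x again
point_on_plane = [c / MX[3] for c in MX[:3]]   # inhomogeneous coordinates x / rho

print(M)
print(rho, point_on_plane)
```

Here `rho` equals $-\mathbf n^T\mathbf x/d$, and `point_on_plane` is $\mathbf x/\rho$, the point where the ray through $\mathbf x$ meets the plane.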