I'm a bit stuck on a problem that asks me to show that the relation between any image point and its corresponding 3D point can be represented by a 3x3 matrix. My idea was to use the general form of the camera model, in which the transposed 3D point is multiplied by the extrinsic and intrinsic matrices to get the image point, but there are quite a few unknowns (image center, focal length and the like). I'm having trouble bringing in the general form of a plane and using it to transform the image point into its 3D counterpart. I'm not entirely sure how the general form of a plane fits into this sort of transformation, and I believe that's my sticking point.
Update: I'm using the pinhole camera model in its most general form: $(x_1,x_2,x_3)^T = M_{int}M_{ext}\,(X_w, Y_w, Z_w, 1)^T$
First, some notation: upper-case bold letters for homogeneous coordinate vectors of points in $\mathbb{RP}^3$ and lower-case bold for points in $\mathbb{RP}^2$; a tilde over the symbol will indicate the corresponding inhomogeneous Cartesian coordinate vector in $\mathbb R^3$ and $\mathbb R^2$, respectively. We have the projection $\mathbf x = \mathtt P\mathbf X$ from the world to the image. I’m assuming a finite camera, so that $\mathtt P$ is a full-rank $3\times4$ matrix whose left-hand $3\times3$ block is invertible. The columns of this matrix are designated $\mathbf p_1$ through $\mathbf p_4$.
The back-projection of an image point $\mathbf x$ is a world ray that emanates from the camera center $\mathbf C$. (If you don’t have the center handy, you can compute it from $\mathtt P$ using the fact that $\mathtt P\mathbf C=0$.) By decomposing $\mathtt P$ into $[\mathtt M\mid\mathbf p_4]$, we find that $\tilde{\mathbf C}=-\mathtt M^{-1}\mathbf p_4$ and that $[(\mathtt M^{-1}\mathbf x)^T; 0]^T$ is the point at infinity that projects to $\mathbf x$. The back-projected ray is then the join of this point and the camera center, $\tilde{\mathbf C}+\lambda\mathtt M^{-1}\mathbf x = \mathtt M^{-1}(\lambda \mathbf x-\mathbf p_4)$ in inhomogeneous Cartesian coordinates. Because of the offset $\tilde{\mathbf C}$, this back-mapping can’t in general be represented by a $3\times3$ matrix, but if you assume that $\tilde{\mathbf C}$ is the origin (so that $\mathbf p_4=0$), the inhomogeneous direction vector of the ray is enough to describe it, and that’s just $\mathtt M^{-1}\mathbf x$.
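To make the ray formula concrete, here is a minimal NumPy sketch. The camera is hypothetical: I build $\mathtt P$ as $\mathtt K[\mathtt R\mid\mathbf t]$ with made-up numbers, and the helper names `back_project` and `project` are mine, not part of any library. It only checks that every point on the back-projected ray maps back to the same image point.

```python
import numpy as np

# Hypothetical finite camera P = K [R | t]; all numbers are illustrative.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)  # small rotation about the y-axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 2.0])
P = K @ np.hstack([R, t[:, None]])  # 3x4 projection matrix

# Decompose P = [M | p4]; M is invertible for a finite camera.
M, p4 = P[:, :3], P[:, 3]

# Inhomogeneous camera center: solves M C~ + p4 = 0, i.e. P C = 0.
C_tilde = -np.linalg.solve(M, p4)


def back_project(x, lam):
    """Point on the ray of image point x: C~ + lam M^{-1} x = M^{-1}(lam x - p4)."""
    return np.linalg.solve(M, lam * x - p4)


def project(X_tilde):
    """Project an inhomogeneous world point and normalize so the last entry is 1."""
    x = P @ np.append(X_tilde, 1.0)
    return x / x[2]


x = np.array([400.0, 300.0, 1.0])  # homogeneous image point (pixel 400, 300)
for lam in (0.5, 1.0, 3.0):
    print(lam, back_project(x, lam), project(back_project(x, lam)))
```

Each printed projection should agree with `x` up to rounding, since $\mathtt P$ applied to a ray point gives $\lambda\mathbf x$, which is the same image point in homogeneous coordinates.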
There is a different decomposition of $\mathtt P$ that connects it more transparently to the image plane, although it’s not nearly as convenient as the above decomposition. In case you didn’t know, a plane with implicit Cartesian equation $ax+by+cz+d=0$ can be represented by the homogeneous vector $\mathbf\Pi=[a,b,c,d]^T$ in $\mathbb{RP}^3$: just write the equation as $\mathbf\Pi^T\mathbf X=0$. Central projection onto $\mathbf\Pi$ relative to the viewpoint $\mathbf C$ is given by the matrix $$\mathtt M=\mathbf C\mathbf\Pi^T-(\mathbf C^T\mathbf\Pi)\mathtt I_4.$$ (Here $\mathtt M$ is a $4\times4$ matrix, not the $3\times3$ block from the decomposition above. When $\tilde{\mathbf C}=0$ it has a particularly simple form.) The camera projection transformation can then be viewed as central projection onto the image plane $\mathbf\Pi$ followed by an affine transformation $\mathtt A$ that maps the image plane onto the $x$-$y$ plane, and finally deletion of the $z$-coordinate, i.e., $$\mathtt P = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\end{bmatrix} \mathtt A \mathtt M.$$ To back-project an image point $\mathbf x$, we can reverse the last two steps, producing a point on the image plane, and then, assuming again that the camera is at the world origin, delete the last coordinate of the result to get the ray’s direction vector in $\mathbb R^3$. (Technically, we should project the point on the image plane onto the plane at infinity first, but that projection is just a matter of setting the last coordinate to zero.) This transformation cascade is accomplished by the $3\times3$ matrix $$\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix} \mathtt A^{-1} \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\\0&0&1\end{bmatrix},$$ which is just $\mathtt A^{-1}$ with its last row and third column deleted. $\mathtt A$ can be derived from the world-to-camera transformation and the camera’s intrinsic matrix, but I won’t go into the details here.
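As a sanity check on this second decomposition, here is a small NumPy sketch under assumptions of my own: the camera sits at the world origin looking down the $z$-axis, the image plane is $z=f$, and $\mathtt A$ is one plausible affine map (pixel scale `s`, principal point `(u0, v0)`, shift of the plane to $z=0$). None of these specifics come from the problem statement, and the variable names are purely illustrative.

```python
import numpy as np

# Illustrative values only: focal distance, pixels per unit length, principal point.
f, s, u0, v0 = 0.05, 16000.0, 320.0, 240.0

C = np.array([0.0, 0.0, 0.0, 1.0])   # camera center at the world origin
Pi = np.array([0.0, 0.0, 1.0, -f])   # image plane z = f as a homogeneous vector

# Central projection onto Pi from C: the 4x4 matrix C Pi^T - (C^T Pi) I_4.
M_central = np.outer(C, Pi) - (C @ Pi) * np.eye(4)

# Assumed affine map that scales to pixels, shifts by the principal point,
# and moves the plane z = f onto the x-y plane.
A = np.array([[s, 0.0, 0.0, u0],
              [0.0, s, 0.0, v0],
              [0.0, 0.0, 1.0, -f],
              [0.0, 0.0, 0.0, 1.0]])

# Deleting the z-coordinate is the 3x4 matrix from the formula for P.
drop_z = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
P = drop_z @ A @ M_central

# Back-projection: re-insert z = 0, undo A, then drop the last coordinate.
A_inv = np.linalg.inv(A)
keep_xyz = np.hstack([np.eye(3), np.zeros((3, 1))])
B = keep_xyz @ A_inv @ drop_z.T      # 3x3 back-projection matrix

# Compare against the familiar intrinsic matrix for this setup.
K = np.array([[s * f, 0.0, u0],
              [0.0, s * f, v0],
              [0.0, 0.0, 1.0]])
X_tilde = np.array([0.3, -0.2, 2.0])  # a world point in front of the camera
x = P @ np.append(X_tilde, 1.0)       # its homogeneous image point
direction = B @ x                     # should be parallel to X_tilde
print(direction, np.cross(direction, X_tilde))
```

In this particular setup $\mathtt P$ comes out as $\mathtt K[\mathtt I\mid 0]$ and the back-projection matrix as $f\,\mathtt K^{-1}$, so the two decompositions give the same ray direction up to scale.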