In the code for a 3D application, I'm converting a 3D point to a 2D point on a plane that represents the screen view, using the following method:
$x, y, z$ = given point in the 3d system.
$(X_u, Y_u, Z_u)$ = the view plane's Y-direction (upward) vector, expressed in the 3D system.
$(X_r, Y_r, Z_r)$ = the view plane's X-direction (rightward) vector, expressed in the 3D system.
View plane values: $x_1, y_1, z_1$
Equations:
$$x_1 = X_r x + Y_r y + Z_r z$$
$$y_1 = X_u x + Y_u y + Z_u z$$
$$z_1 = 0$$
Note that the software still tracks $z_1$ even though it always equals $0$.
Now, I need to determine how to reverse this and determine the 3d point $x, y, z$. But I do not have the original 3d point. I do have values for:
The transforms for the 2d plane: $(X_u, Y_u, Z_u)$ and $(X_r, Y_r, Z_r)$
I, of course, have: $x_1, y_1, z_1$
I also happen to have the 3D coordinates (say, $x_2, y_2, z_2$) of a second point on the 2D plane. These are untransformed 3D values, just like $x, y, z$. The second point might be useful because $z_1$ was made to equal $0$. Please note, though, that for my purposes it should be okay to treat $z_1 = 0$ as part of the transform, so that when it is reversed, the recovered 3D point actually lies on the 2D plane.
How can this be done?
A pure 3D rotation matrix is orthonormal,
$$\mathbf{R} = \left[ \begin{matrix} \hat{e}_1 & \hat{e}_2 & \hat{e}_3 \end{matrix} \right ] = \left [ \begin{matrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{matrix} \right ]$$
where the three (column) vectors form the basis:
$$\begin{array}{lll} \hat{e}_1\cdot\hat{e}_1 = 1, ~ & \hat{e}_1\times\hat{e}_1 = 0 \\ \hat{e}_1\cdot\hat{e}_2 = 0, ~ & \hat{e}_1\times\hat{e}_2 = \hat{e}_3 \\ \hat{e}_1\cdot\hat{e}_3 = 0, ~ & \hat{e}_1\times\hat{e}_3 = -\hat{e}_2 \\ \hat{e}_2\cdot\hat{e}_1 = 0, ~ & \hat{e}_2\times\hat{e}_1 = -\hat{e}_3 \\ \hat{e}_2\cdot\hat{e}_2 = 1, ~ & \hat{e}_2\times\hat{e}_2 = 0 \\ \hat{e}_2\cdot\hat{e}_3 = 0, ~ & \hat{e}_2\times\hat{e}_3 = \hat{e}_1 \\ \hat{e}_3\cdot\hat{e}_1 = 0, ~ & \hat{e}_3\times\hat{e}_1 = \hat{e}_2 \\ \hat{e}_3\cdot\hat{e}_2 = 0, ~ & \hat{e}_3\times\hat{e}_2 = -\hat{e}_1 \\ \hat{e}_3\cdot\hat{e}_3 = 1, ~ & \hat{e}_3\times\hat{e}_3 = 0 \\ \end{array}$$
The nice property of such matrices is that their inverse is their transpose:
$$\mathbf{R}^{-1} = \mathbf{R}^{T} = \left [ \begin{matrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{matrix} \right ]$$
which is also an orthonormal matrix itself. So, to reverse a pure rotation, you just apply the transpose of the matrix:
$$\vec{a} = \mathbf{R}\vec{b} \quad \iff \quad \vec{b} = \mathbf{R}^{T} \vec{a}$$
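As a quick sanity check of the transpose property, here is a minimal Python sketch. The helper names (`transpose`, `mat_vec`, `rotation_z`) and the 30° rotation about the z axis are arbitrary choices for illustration, not anything from your software.

```python
import math

def transpose(m):
    """Return the transpose of a 3x3 matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-component vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def rotation_z(angle):
    """Pure rotation by `angle` radians about the z axis (example matrix)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

R = rotation_z(math.radians(30.0))
b = [1.0, 2.0, 3.0]

a = mat_vec(R, b)                  # forward:  a = R b
b_back = mat_vec(transpose(R), a)  # reverse:  b = R^T a

# b_back matches b up to floating-point rounding
print(b_back)
```

No matrix inversion is needed; the transpose alone undoes the rotation, which is only valid as long as the matrix really is orthonormal.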
The problem with recovering a 3D position from a 2D image is that the third component (depth or distance or $z$) is unknown. This means that each 2D point can only be converted to a line, perpendicular to the original 2D plane, passing through the point on that 2D plane.
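Applied to the setup in the question, this can be sketched as follows. I am assuming that the right vector $(X_r, Y_r, Z_r)$ and the up vector $(X_u, Y_u, Z_u)$ are unit length and perpendicular to each other; if they are not, the simple transpose trick does not apply. The function names are hypothetical. The view normal is taken as right × up, every point on the line is $x_1\,\text{right} + y_1\,\text{up} + t\,\text{normal}$, and the known second point on the plane is used only to pick the depth $t$.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def project(p, right, up):
    """Forward transform from the question: returns (x1, y1)."""
    return dot(right, p), dot(up, p)

def unproject(x1, y1, right, up, depth):
    """Point on the line through (x1, y1), `depth` units along the normal."""
    normal = cross(right, up)
    return [x1 * right[i] + y1 * up[i] + depth * normal[i] for i in range(3)]

def unproject_onto_plane(x1, y1, right, up, plane_point):
    """Pick the depth so that the result lies in the plane through plane_point."""
    normal = cross(right, up)
    return unproject(x1, y1, right, up, dot(normal, plane_point))

# Example values (assumed): a view looking along the x axis.
right = [0.0, 1.0, 0.0]
up = [0.0, 0.0, 1.0]
p = [5.0, 2.0, 3.0]            # original 3D point, lies in the plane x = 5
p2 = [5.0, 7.0, -1.0]          # second known 3D point in the same plane

x1, y1 = project(p, right, up)
recovered = unproject_onto_plane(x1, y1, right, up, p2)
print(recovered)
```

Without the second point, only `unproject` with an unknown `depth` is available, which is exactly the line described above.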
When the same object is viewed from two different angles, and each point can be identified separately, the depth information can be obtained from the intersection of those two lines. This is also called stereoscopic vision or stereopsis, and is how depth perception works in us humans. A number of optical illusions rely on tricking the brain into treating two separate points, each seen by a separate eye, as the same one, and thus constructing an incorrect perception (of depth in particular).
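A rough sketch of that intersection, assuming two orthographic views, each given by perpendicular unit right and up vectors as in the question, and assuming the two view directions are not parallel. Because measured lines rarely intersect exactly, this takes the midpoint of the closest points on the two lines; the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def view_line(x1, y1, right, up):
    """Line of all 3D points that project to (x1, y1): (origin, direction)."""
    origin = [x1 * right[i] + y1 * up[i] for i in range(3)]
    return origin, cross(right, up)

def triangulate(line_a, line_b):
    """Midpoint of the closest points on two non-parallel lines."""
    (o1, n1), (o2, n2) = line_a, line_b
    w = [o1[i] - o2[i] for i in range(3)]
    a, b, c = dot(n1, n1), dot(n1, n2), dot(n2, n2)
    d, e = dot(n1, w), dot(n2, w)
    denom = a * c - b * b          # zero when the lines are parallel
    if abs(denom) < 1e-12:
        raise ValueError("view directions are parallel; depth is undetermined")
    t1 = (b * e - c * d) / denom   # parameter along line A
    t2 = (a * e - b * d) / denom   # parameter along line B
    p1 = [o1[i] + t1 * n1[i] for i in range(3)]
    p2 = [o2[i] + t2 * n2[i] for i in range(3)]
    return [(p1[i] + p2[i]) / 2.0 for i in range(3)]

# Example values (assumed): the point (5, 2, 3) seen in two views.
line_a = view_line(2.0, 3.0, [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])  # looking along x
line_b = view_line(5.0, 3.0, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])  # looking along y
point = triangulate(line_a, line_b)
print(point)
```

The hard part in practice is not this arithmetic but identifying which 2D point in one view corresponds to which 2D point in the other.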
Human depth perception is also aided by perceiving edges and overlaps, with the brain automatically assigning "depth" to each detail based on these cues. There are complex computer programs that do this too: they can recover enough depth information from a single photograph to reproduce a 3D analog of that picture. This works for things like rooms, walls, tables, and rectangular objects, but not very well for faces or softly curving objects. The mathematics and perception modeling are rather complex, though, and definitely not something one can explain, or even start to explain, in a StackExchange answer.