Let's assume we have a camera located at coordinate 190, 170, 150. It is facing YPR: -135, 45, 0. It produces a 1600x900 image, and it sees a point in the viewport 30 units away, off center by 20 pixels up and 10 pixels to the right of center.
I'm trying to find the world coordinate of the point that it sees.
In terms of steps, I am thinking about it in this way:
1. First, update the camera's YPR so that the point in question is at the center of the image. Use the known FoV of 87 degrees, center offset of (cx = -20, cy = -10), and image size of 1600 x 900 to update the camera's YPR:
yaw += (-10 / 1600) * 87 # convert pixel diff to angle diff around z axis (pixel x) and add it
pitch += (-20 / 900) * 87 # convert pixel diff to angle diff around x axis (pixel y) and add it
roll = roll # unchanged
2. Next, calculate a 3D rotation matrix? Perform a series of rotations given the known distance (i.e. radius)?
3. Apply the translation matrix [x1, y1, z1] and return the coordinates
Doing this for one axis of rotation is pretty straightforward:
import math

def calculate_point(x1, y1, angle, c):
    # Move a distance c from (x1, y1) in the direction given by angle (degrees)
    angle = math.radians(angle)
    a = c * math.cos(angle)
    b = c * math.sin(angle)
    x2 = x1 + a
    y2 = y1 + b
    return round(x2, 2), round(y2, 2)

# prints (8.66, 9.66)
print(calculate_point(3, 4, 45, 8))
Extending this to 2 and 3 dimensions is where I get lost. I want to create a function that takes as input x1, y1, z1, roll, pitch, yaw, distance, and returns x2, y2, z2 (the second point).

Suppose the camera is positioned at $C$ and its orientation is given by the $3 \times 3$ rotation matrix $R$, whose columns are the camera's local $x$ (horizontal), $y$ (vertical) and $z$ axes expressed in world coordinates. If $r$ is any point in space, and $p$ is the same point expressed in the camera reference frame coordinates, the two are related by
$ r = C + R p \hspace{15pt}(1)$
From which,
$ p = R^T (r - C) \hspace{15pt}(2)$
The generated image in the frame of reference of the camera is given by,
$ I = (I_x, I_y, I_z) = \left(-\dfrac{ f }{ p(3)}\right) p \hspace{15pt}(3) $
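Here $p(3)$ is the third ($z$) component of $p$ and $f$ is the distance from $C$ to the projection plane. Note that $I_z = - f $ for all points.
This point $I$ is converted to pixels on a screen with $N_x$ horizontal pixels and $N_y$ vertical pixels, where $W$ and $H$ are the width and height of the projection plane, as follows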
$ X = \begin{bmatrix} N_x / 2 \\ N_y / 2 \end{bmatrix} + \begin{bmatrix} \dfrac{N_x}{W} && 0 \\ 0 && -\dfrac{N_y}{H} \end{bmatrix} \begin{bmatrix} I_x \\ I_y \end{bmatrix} \hspace{15pt}(4)$
If you plug in $I$, this becomes
$ X = \begin{bmatrix} N_x / 2 \\ N_y / 2 \end{bmatrix} - \dfrac{1}{p(3)} \begin{bmatrix} \alpha_x && 0 \\ 0 && \alpha_y \end{bmatrix} \begin{bmatrix} p_x \\ p_y \end{bmatrix} \hspace{15pt}(5)$
where $\alpha_x = \dfrac{f \ N_x}{W} $ and $\alpha_y = - \dfrac{f \ N_y}{H} $
The two constants $\alpha_x$ and $\alpha_y$ are specific to the camera, and can be easily determined using a test image.
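As a rough illustration of equations $(2)$ and $(5)$, here is a minimal Python sketch of the forward projection. The function names `world_to_camera` and `project` are my own, and the numbers are the ones from the calibration test further down in this answer ($\alpha_x = 300$, $\alpha_y = -300$, camera at the origin in its unrotated orientation), so treat it as a sanity check rather than a definitive implementation.

```python
def world_to_camera(r, C, R):
    """Equation (2): p = R^T (r - C), with R given as a list of 3 rows."""
    diff = [r[i] - C[i] for i in range(3)]
    # Row j of R^T is column j of R.
    return [sum(R[i][j] * diff[i] for i in range(3)) for j in range(3)]


def project(r, C, R, alpha_x, alpha_y, nx, ny):
    """Equation (5): pixel coordinates of the world point r."""
    px, py, pz = world_to_camera(r, C, R)
    if pz >= 0:
        # The camera looks along its own negative z axis.
        raise ValueError("point is not in front of the camera (needs p_z < 0)")
    return (nx / 2 - alpha_x * px / pz, ny / 2 - alpha_y * py / pz)


# Unrotated camera orientation R_x(90 deg), camera at the origin.
R0 = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
pixel = project([15, 20, 20], [0, 0, 0], R0, 300, -300, 1600, 900)
print(pixel)  # (1025.0, 150.0)
```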
Now back to the problem. You are given an image with a point $X$ on it. The line connecting the center of the camera $C$ with the actual point $r$ in space corresponding to it, will cross the projection plane of the camera at a specific point, say $Q = [q_x, q_y, -f] $
Equation $(5)$ tells us that
$ \dfrac{1}{f} \begin{bmatrix} q_x \\ q_y \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\alpha_x} && 0 \\ 0 && \dfrac{1}{\alpha_y} \end{bmatrix} \left( X - \begin{bmatrix} N_x/2 \\ N_y/2 \end{bmatrix} \right) \hspace{15pt}(6)$
The distance between $C$ and the second point whose coordinates we want is known. Let this distance be $d$. And let's call the second point $G$. Then we know, from equation $(1)$, that
$ G = C + R p $
We also know that
$ p = t Q = t (q_x , q_y, -f) = (t f) (\dfrac{q_x}{f}, \dfrac{q_y}{f}, -1) \hspace{15pt}(7)$
Let $\lambda = t f$, and let $ v = (\dfrac{q_x}{f}, \dfrac{q_y}{f}, -1)$, then $v$ is completely known. Hence, we now have,
$ G = C + \lambda R v $ , where $\lambda \gt 0 \hspace{15pt}(8)$
We know the distance between $G$ and $C$ is $d$, hence
$ d = \| G - C \| = \lambda \| R v \| = \lambda \| v \|\hspace{15pt}(9) $
and this implies that
$ \lambda = \dfrac{d }{ \| v \| } = \dfrac{d}{\sqrt{((q_x/f)^2 + (q_y/f)^2 + 1)}} \hspace{15pt}(10) $
Note that $f$ is not known, but $\dfrac{q_x}{f}$ and $\dfrac{q_y}{f}$ are known (from equation $(6)$).
This completes the specification of the second point $G$: just plug the value of $\lambda$ found in $(10)$ into $(8)$.
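Equations $(6)$, $(8)$ and $(10)$ can be sketched in a few lines of Python. The function name `back_project` and its argument layout are assumptions of mine, and the pixel offset is passed relative to the image center. As a check, it inverts the calibration test used later in this answer: the pixel $(1025, 150)$ is offset $(225, -300)$ from the center, and the test point lies at distance $\sqrt{1025}$ from the origin.

```python
import math


def back_project(offset, C, R, d, alpha_x, alpha_y):
    """Return G = C + lambda * R v for a pixel offset from the image center."""
    # Equation (6): q_x / f and q_y / f follow from the pixel offset alone.
    v = [offset[0] / alpha_x, offset[1] / alpha_y, -1.0]
    # Equation (10): choose lambda so that |G - C| = d.
    lam = d / math.sqrt(sum(c * c for c in v))
    # Equation (8): rotate v into world coordinates and translate by C.
    return [C[i] + lam * sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


# Unrotated camera orientation R_x(90 deg), camera at the origin.
R0 = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
G = back_project((225, -300), [0, 0, 0], R0, math.sqrt(1025), 300, -300)
print([round(g, 6) for g in G])  # [15.0, 20.0, 20.0]
```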
As for the matrix $R$: the question specifies the orientation as $YPR = (-135^\circ, 45^\circ, 0^\circ) $. The world $xy$ plane is assumed to be horizontal, and the initial direction in which the camera is facing is also horizontal.
Therefore, initially, before any rotations, the camera local frame is such that its $x$ axis coincides with the world $x$ axis, its $y$ axis coincides with the world $z$ axis, and its $z$ axis coincides with the world $(-y)$ axis. This orientation of the camera (before any Yaw/Pitch/Roll rotations) is obtained from a camera whose axes coincide with the world axes by a rotation about the local $x$ axis by an angle of $+90^\circ$.
I assumed that Yaw is the first rotation, and it is about the local $y$ axis, followed by Pitch rotation which is about the local $x$ axis, and finally Roll rotation which is about the local $(-z)$ axis, which is a rotation about the local $z$ axis with a negative angle. This translates into the product of the following matrices
$ R = R_x(90^\circ) R_{y'}(\phi_y) R_{x''} (\phi_x) R_{z'''} (-\phi_z) $
with $\phi_y = -135^\circ, \phi_x = 45^\circ, \phi_z = 0^\circ $
These rotations are made with respect to the local frame of reference before each rotation.
When evaluating the above product, ignore that each rotation is about a local axis and just get the order correct: left to right is first to last (instead of last to first). These rotations are given here.
With this, the rotation matrix for the given YPR is
$R = \begin{bmatrix} - \dfrac{1}{\sqrt{2}} && -0.5 && - 0.5 \\ - \dfrac{1}{\sqrt{2}} && 0.5 && 0.5 \\ 0 && \dfrac{1}{\sqrt{2}} && -\dfrac{1}{\sqrt{2}} \end{bmatrix} $
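The product above can be checked numerically. The sketch below is only an illustration: it assumes the standard right-handed elementary rotation matrices, and the helper names (`rot_x`, `rot_y`, `rot_z`, `matmul`, `camera_rotation`) are my own.

```python
import math


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def camera_rotation(yaw, pitch, roll):
    """R = R_x(90) R_y(yaw) R_x(pitch) R_z(-roll), multiplied left to right."""
    R = rot_x(90)
    for M in (rot_y(yaw), rot_x(pitch), rot_z(-roll)):
        R = matmul(R, M)
    return R


R = camera_rotation(-135, 45, 0)
for row in R:
    print([round(x, 4) for x in row])
```

Up to floating-point rounding, the printed rows agree with the matrix $R$ above.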
Suppose we run a simple test to determine the parameters $\alpha_x$ and $\alpha_y$ of the camera. Suppose we place the camera at the origin, in the standard orientation parallel to the axes. And we place a point $A = (15, 20, 20)$ (expressed in world coordinates) in front of the camera (note that the direction the camera is pointing at is the positive world $y$ axis, which is the negative of its own $z$ axis).
This means that in local camera coordinates $p= (15, 20, -20) $
Further, suppose we observe that the image of this point on the screen of the camera is at $(1025, 150) $
Substituting these values in equation $(5)$, we get
$ \begin{bmatrix} 1025 \\ 150 \end{bmatrix} = \begin{bmatrix} 800 \\ 450 \end{bmatrix} + \dfrac{1}{20} \begin{bmatrix} \alpha_x && 0 \\ 0 && \alpha_y \end{bmatrix} \begin{bmatrix} 15 \\ 20 \end{bmatrix} \hspace{15pt}(5)$
And this reads,
$ 1025 = 800 + 0.75 \alpha_x $
$ 150 = 450 + \alpha_y$
From which we get $\alpha_x = 300 $, $\alpha_y = -300 $
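The same calibration can be written as a small helper that solves equation $(5)$ for the two constants. The name `calibrate` is hypothetical, and the sketch assumes a single test point whose camera-frame $x$ and $y$ coordinates are both nonzero (otherwise one of the constants cannot be recovered from it).

```python
def calibrate(p, pixel, nx, ny):
    """Solve equation (5) for (alpha_x, alpha_y) from one known test point.

    p is the test point in camera coordinates, pixel is where it appears
    on an nx-by-ny screen. Requires p[0] != 0 and p[1] != 0.
    """
    px, py, pz = p
    alpha_x = -(pixel[0] - nx / 2) * pz / px
    alpha_y = -(pixel[1] - ny / 2) * pz / py
    return alpha_x, alpha_y


alphas = calibrate((15, 20, -20), (1025, 150), 1600, 900)
print(alphas)  # (300.0, -300.0)
```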
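Back to the question. The image $X$ in pixels is taken to be $(10, 20)$ off the center. (Under equation $(4)$ the pixel $Y$ coordinate increases downward, so this offset places the point 20 pixels below the center; for a point 20 pixels above the center the offset would be $(10, -20)$.) Therefore,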
$ X - \begin{bmatrix} N_x / 2 \\ N_y / 2 \end{bmatrix} = \begin{bmatrix} 10 \\ 20 \end{bmatrix} $
Plugging this into equation $(6)$, gives us
$ \dfrac{1}{f} \begin{bmatrix} q_x \\ q_y \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\alpha_x} && 0 \\ 0 && \dfrac{1}{\alpha_y} \end{bmatrix} \begin{bmatrix} 10 \\ 20 \end{bmatrix} \hspace{15pt}(6)$
Using the found values of $\alpha_x$ and $\alpha_y$, gives us
$ \dfrac{1}{f} \begin{bmatrix} q_x \\ q_y \end{bmatrix} = \begin{bmatrix} \dfrac{1}{30} \\ - \dfrac{1}{15} \end{bmatrix} $
Therefore, vector $v$ is given by
$ v = ( \dfrac{1}{30} , - \dfrac{1}{15} , - 1) $
And this means that (see equation $(10)$)
$ \lambda = \dfrac{30}{\| v \|} = 29.9170122892$
Finally, the second point that we're after is
$ G = C + \lambda R v = \begin{bmatrix} 190 \\ 170 \\ 150 \end{bmatrix} + 29.9170122892 R v $
Direct calculation of this gives
$ G = (205.25, 153.34, 169.74) $
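For anyone who wants to reproduce the last step, here is a short numerical check in Python. It simply hard-codes $C$, $R$, $v$ and $d$ from above and evaluates equations $(10)$ and $(8)$; the variable names are mine.

```python
import math

C = [190.0, 170.0, 150.0]
s = 1 / math.sqrt(2)
R = [[-s, -0.5, -0.5],
     [-s, 0.5, 0.5],
     [0.0, s, -s]]
v = [1 / 30, -1 / 15, -1.0]
d = 30.0

# Equation (10): lambda = d / |v|
lam = d / math.sqrt(sum(c * c for c in v))
# Equation (8): G = C + lambda * R v
G = [C[i] + lam * sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

print(round(lam, 4), [round(g, 2) for g in G])
# 29.917 [205.25, 153.34, 169.74]
```

Because $R$ is a rotation, the resulting $G$ lies exactly $d = 30$ units from $C$, which is a quick way to catch a mistake in $R$ or $v$.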