A user is looking at his screen, interacting with a camera set in a world at position $(0, 0, 0)$, looking at $(0, 0, -1)$ and with an "up" direction of $(0, 1, 0)$.
The user's viewport width is $w$ and his viewport height is $h$.
The camera's focal length is $f$, the focal width is $w_f$ and focal height is $h_f$.
Vectors can be represented in four coordinate systems:
- the viewport coordinate system (in pixels) ($x_p$),
- the film coordinate system ($x_f$),
- the camera coordinate system ($x_c$),
- the world coordinate system ($x_w$).
And converted via the following transformation equations:
- $x_f = T_{f \leftarrow p} x_p$
- $x_c = T_{c\leftarrow f} x_f = T_{c\leftarrow f} T_{f\leftarrow p} x_p $
- $x_w = T_{w\leftarrow c}x_c = T_{w\leftarrow c}T_{c\leftarrow f} T_{f\leftarrow p} x_p$
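As a rough illustration of the chain above, here is a minimal Python sketch. The post does not define $T_{f\leftarrow p}$ or $T_{c\leftarrow f}$ explicitly, so the pinhole-style mapping below (pixel origin at the top-left with $y$ pointing down, film plane at $z = -f$) and the names `pixel_to_camera` and `mat_vec` are assumptions made only for illustration.

```python
def pixel_to_camera(px, py, w, h, w_f, h_f, f):
    """Map a pixel to a point on the film plane, in camera coordinates.

    Assumed convention: pixel (0, 0) is the top-left corner and y grows downward.
    """
    # T_{f<-p}: rescale pixels to film units, centre the origin, flip y.
    x_f = (px / w - 0.5) * w_f
    y_f = (0.5 - py / h) * h_f
    # T_{c<-f}: the film plane sits at distance f along the viewing direction -z.
    return (x_f, y_f, -f)


def mat_vec(T, v):
    """Apply a 3x3 matrix (tuple of rows) to a 3-vector."""
    return tuple(sum(T[r][c] * v[c] for c in range(3)) for r in range(3))


# Initial camera: at the origin, looking down -z, so T_{w<-c} is the identity.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

centre_c = pixel_to_camera(400, 300, 800, 600, 36.0, 27.0, 50.0)
centre_w = mat_vec(IDENTITY, centre_c)
corner_c = pixel_to_camera(0, 0, 800, 600, 36.0, 27.0, 50.0)
```

With these assumed conventions the viewport centre maps to the point straight ahead of the camera, and the top-left pixel maps to the upper-left corner of the film.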
As a user drags his mouse on the screen from the initial position $p_i$ to the final position $p_f$, I want to find the transformation $T_{w\leftarrow c}$ such that
- (1) $p_{i_{world}} = p_{f_{world}}$
- (2) The up direction is maintained
In other words, the point in the world under the mouse "follows" the mouse as it is being dragged, and the camera does not experience roll about the "look at" vector. This is similar to how your pointer follows your finger while panning on Google Maps, but in 3D with a fixed camera looking at a 3D picture.
Here's what I've got so far:
In this situation, $T_{c\leftarrow f}$ and $T_{f\leftarrow p}$ are constant since the camera's film size, the camera's focal length and the viewport size are constant.
This means that, provided I know the initial mouse position $p_{i_{pixels}}$ and the final mouse position $p_{f_{pixels}}$, condition (1) can be re-expressed as
$$ p_{i_{world}} = T_{f_{w \leftarrow c}} p_{f_{camera}} $$
Since I can calculate both $p_{i_{world}}$ and $p_{f_{camera}}$, I can find an axis-angle representation of $T_{f_{w \leftarrow c}}$:
$$ \theta = \cos^{-1} \left( \frac{ a \cdot b }{ |a| |b| } \right) $$
$$ \hat{n} = \frac {a \times b} {| a \times b |} $$
Where $a = p_{f_{camera}}$ and $b = p_{i_{world}}$, so that rotating by $\theta$ about $\hat{n}$ takes $a$ to $b$.
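The axis-angle computation can be sketched in a few lines of plain Python. This is only an illustration: the helper names are hypothetical, and `atan2` is swapped in for the $\cos^{-1}$ form because it is better behaved numerically for small angles while giving the same $\theta$ in $[0, \pi]$.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def axis_angle(a, b):
    """Return (unit axis n, angle theta) of the shortest rotation taking direction a to direction b."""
    c = cross(a, b)
    s = math.sqrt(dot(c, c))  # |a x b| = |a||b| sin(theta)
    if s == 0.0:
        raise ValueError("a and b are parallel; the rotation axis is undefined")
    # a . b = |a||b| cos(theta), so atan2 recovers theta without normalising a and b.
    theta = math.atan2(s, dot(a, b))
    n = tuple(x / s for x in c)
    return n, theta


# Looking straight ahead versus 45 degrees to the right.
n, theta = axis_angle((0, 0, -1), (1, 0, -1))
```

Applying this rotation directly is what introduces roll: the axis $\hat{n}$ is in general neither the world $y$ axis nor the camera's $x$ axis.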
I can turn that into a quaternion if I want, but if I apply that to my camera, I lose condition (2) and the user experiences some dizzying roll of his camera.
What kind of transformation should I do to get a $T_{w \leftarrow c}$ that maintains the up condition and $p_{i_{world}} = p_{f_{world}}$?
For those of you who can make do with an approximate solution, I found one using small-angle approximations and Euler angles.
First, $T_{w \leftarrow c}$ can be expressed using Tait-Bryan angles in the $YXZ$ order (yaw about $y$, pitch about $x'$, roll about $z^{\prime\prime}$).
In this order, condition (2) (maintaining the up direction) can be respected if we restrict the roll angle to $0$.
With $\theta$ the yaw angle about the $y$ axis and $\phi$ the pitch angle about the $x'$ axis, the corresponding transformation matrix is the following:
\begin{equation} T_{w \leftarrow c} = \begin{bmatrix} C_{\theta} & S_\theta S_\phi & S_\theta C_\phi \\ 0 & C_\phi & - S_\phi \\ -S_\theta & C_\theta S_\phi & C_\theta C_\phi \end{bmatrix} \end{equation}
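To make the no-roll property concrete, here is a small Python sketch that builds this matrix and checks that the camera's $x$ axis stays horizontal in world coordinates. The function names are hypothetical; the matrix entries are copied from the equation above.

```python
import math


def yaw_pitch_matrix(theta, phi):
    """T_{w<-c} for yaw theta about y followed by pitch phi about x', with zero roll."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return (
        (ct, st * sp, st * cp),
        (0.0, cp, -sp),
        (-st, ct * sp, ct * cp),
    )


def column(T, j):
    """Column j of T, i.e. camera axis j expressed in world coordinates."""
    return tuple(T[r][j] for r in range(3))


T = yaw_pitch_matrix(0.3, -0.2)
right = column(T, 0)                    # camera x axis in the world
look = tuple(-v for v in column(T, 2))  # camera -z axis in the world
```

The $y$ component of `right` is exactly zero for any yaw and pitch, which is one way to state that the horizon stays level.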
But because the notation is getting tedious, from now on
\begin{equation} \begin{aligned} T &= T_{w\leftarrow c} \\ p_{1_c} &= p_{i_{camera}} \\ p_{2_c} &= p_{f_{camera}} \\ p_w &= p_{world} \end{aligned} \end{equation}
Recalling condition (1),
\begin{equation} \label{eq:T} p_w = T_2 p_{2_c} = T_1 p_{1_c} \end{equation}
Now, since we know both $p_{1_c}$ (the corresponding mouse position at the last frame) and $p_{2_c}$ (the mouse position at the current frame), we can write
\begin{equation} \label{eq:d} p_{1_c} = D p_{2_c} \end{equation}
And thus condition (1) is satisfied if we choose $T_2 = T_1 D$:
\begin{equation} T_2 p_{2_c} = T_1 D p_{2_c} \quad \Leftarrow \quad T_2 = T_1 D \end{equation}
Great! If I can solve (or approximate) $D$, I can find $T_2$!
Since $p_{1_c}$ and $p_{2_c}$ are "not far away" from each other, we can assume that they are separated by some small yaw $\Delta \theta$ and small pitch $\Delta \phi$, so that $D$ has the same form as $T_{w \leftarrow c}$:
\begin{equation} D = \begin{bmatrix} C_{\Delta\theta} & S_{\Delta\theta} S_{\Delta\phi} & S_{\Delta\theta} C_{\Delta\phi} \\ 0 & C_{\Delta\phi} & - S_{\Delta\phi} \\ -S_{\Delta\theta} & C_{\Delta\theta} S_{\Delta\phi} & C_{\Delta\theta} C_{\Delta\phi} \end{bmatrix} \end{equation}
Using small-angle approximations, we know
$$ \begin{aligned} \sin x &\approx x \\ \cos x &\approx 1 - \frac{x^2}{2} \end{aligned} $$
Knowing this and $p_{1_c} = D p_{2_c}$, with $p_{1_c} = (x_1, y_1, z_1)$ and $p_{2_c} = (x_2, y_2, z_2)$, we can get second-degree polynomial equations for $\Delta\theta$ and $\Delta\phi$. In fact, from the second row of the matrix, $y_1 = C_{\Delta\phi}\, y_2 - S_{\Delta\phi}\, z_2$, we find that
$$ 0 = - \frac{y_2}{2} \Delta\phi^2 -z_2 \Delta\phi + (y_2 - y_1) $$
This has two solutions. We choose the one that respects the small-angle assumption, i.e. the root of smaller magnitude.
$$ \Delta\phi = \frac{-b \pm \sqrt{b^2 - 4 a c}}{2a}, \quad a = -\frac{y_2}{2}, \quad b = -z_2, \quad c = y_2 - y_1 $$
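Picking "the one that respects the small-angle assumption" can be made mechanical by always taking the root of smaller magnitude. The sketch below is one possible way to do that in Python; `small_root` and `pitch_step` are hypothetical names, and the rearranged form $c/q$ is a standard numerically stable variant of the quadratic formula rather than something taken from the derivation above.

```python
import math


def small_root(a, b, c):
    """Smallest-magnitude real root of a*x**2 + b*x + c = 0."""
    if a == 0.0:
        return -c / b  # the equation degenerates to a linear one
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real root; the step is too large for this approximation")
    # q has magnitude (|b| + sqrt(disc)) / 2, so q / a is the large root
    # and c / q is the small one.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    return 0.0 if q == 0.0 else c / q


def pitch_step(p1, p2):
    """Approximate delta-phi from the second row of D, given p1 = D p2."""
    y1 = p1[1]
    y2, z2 = p2[1], p2[2]
    return small_root(-y2 / 2.0, -z2, y2 - y1)


# Build a test pair with a known pitch of 0.05 rad: y1 = cos(dphi) y2 - sin(dphi) z2.
p2 = (0.1, 0.2, -1.0)
true_dphi = 0.05
y1 = math.cos(true_dphi) * p2[1] - math.sin(true_dphi) * p2[2]
dphi = pitch_step((0.0, y1, 0.0), p2)  # only the y component of p1 is used here
```

For this input the recovered `dphi` agrees with the true pitch up to the error of the small-angle approximation, which is of third order in the angle.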
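And finally, from the first row of the matrix, $x_1 = C_{\Delta\theta}\, x_2 + S_{\Delta\theta} \left( S_{\Delta\phi}\, y_2 + C_{\Delta\phi}\, z_2 \right)$, we get
$$ 0 = - \frac{x_2}{2} \Delta\theta^2 + \left(\Delta\phi\, y_2 + z_2 \left( 1 - \frac{\Delta\phi ^2}{2}\right)\right) \Delta\theta + (x_2 - x_1) $$
This again has two solutions. By choosing the one that respects the small-angle assumption, we have successfully approximated $D$!
$$ \Delta\theta = \frac{-b \pm \sqrt{b^2 - 4 a c}}{2a}, \quad a = -\frac{x_2}{2}, \quad b = \Delta\phi\, y_2 + z_2 \left( 1 - \frac{\Delta\phi^2}{2} \right), \quad c = x_2 - x_1 $$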
Now that $D$ is approximated, we can calculate $T_2$ by adding $\Delta\theta$ and $\Delta\phi$ to the angles of $T_1$.
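Putting the steps together, a per-frame update might look like the following Python sketch. All function names are hypothetical. The linear coefficient of the $\Delta\theta$ equation is taken with the sign obtained by expanding the first row of $D$, that is $+\left(\Delta\phi\, y_2 + z_2 (1 - \Delta\phi^2/2)\right)$, and the root of smaller magnitude is used for both angles.

```python
import math


def small_root(a, b, c):
    """Smallest-magnitude real root of a*x**2 + b*x + c = 0."""
    if a == 0.0:
        return -c / b
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real root; the drag step is too large")
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    return 0.0 if q == 0.0 else c / q


def apply_yaw_pitch(theta, phi, p):
    """Multiply p by the yaw-pitch matrix (same form as T and D)."""
    x, y, z = p
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return (
        ct * x + st * sp * y + st * cp * z,
        cp * y - sp * z,
        -st * x + ct * sp * y + ct * cp * z,
    )


def solve_step(p1, p2):
    """Approximate (dtheta, dphi) such that p1 = D(dtheta, dphi) p2."""
    x1, y1, _ = p1
    x2, y2, z2 = p2
    # Second row of D with small-angle approximations.
    dphi = small_root(-y2 / 2.0, -z2, y2 - y1)
    # First row of D; note the positive sign of the linear coefficient.
    lin = dphi * y2 + z2 * (1.0 - dphi * dphi / 2.0)
    dtheta = small_root(-x2 / 2.0, lin, x2 - x1)
    return dtheta, dphi


def drag_update(theta1, phi1, p1, p2):
    """New yaw and pitch of the camera after the mouse moved from p1 to p2 (camera coordinates)."""
    dtheta, dphi = solve_step(p1, p2)
    return theta1 + dtheta, phi1 + dphi


# Synthetic check: generate p1 from a known small rotation of p2.
p2 = (0.1, 0.2, -1.0)
p1 = apply_yaw_pitch(0.03, 0.05, p2)
dtheta, dphi = solve_step(p1, p2)
theta2, phi2 = drag_update(0.5, 0.1, p1, p2)
```

Because only yaw and pitch are ever changed, the roll stays at $0$ by construction; the price is that condition (1) holds only approximately, with an error that shrinks as the per-frame mouse movement gets smaller.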
We're done!