3D Math: How to make a camera


I was able, using code, to make/rotate/move around a 3D cube. Now I want to make a camera, that is movable. How exactly would I do that? What math is involved?
I have read posts like this, but I do not understand how they do it, and I do not want the type of camera they are talking about. I want to have up/downs.
Can someone please explain to me how they do it in the above post, or show me another way to do it? I am sorry if this is a stupid question.


You need a location for the camera, $\vec{c} = (c_x , c_y , c_z)$, and the orientation of the camera. The best way to describe the orientation of the camera is with a unit quaternion, $\mathbf{q} = (q_w, q_i, q_j, q_k)$. If someone talks to you about Euler angles or Tait-Bryan angles, or rotations around axes, they're trying to trap you and lead you astray; no self-respecting programmer uses those.

First, let's look at the operations on unit quaternions we need.

After any operation that modifies or produces a new orientation, you want to make sure it is a unit quaternion. You do this by normalizing the quaternion to unit length, dividing all four components by the square root of the sum of the squares of the components, $\sqrt{q_w^2 + q_i^2 + q_j^2 + q_k^2}$. This does not affect the orientation; it just makes sure it behaves correctly in future operations: $$\left\lbrace ~ \begin{aligned} q_w^\prime &= \frac{q_w}{\sqrt{q_w^2 + q_i^2 + q_j^2 + q_k^2}} \\ q_i^\prime &= \frac{q_i}{\sqrt{q_w^2 + q_i^2 + q_j^2 + q_k^2}} \\ q_j^\prime &= \frac{q_j}{\sqrt{q_w^2 + q_i^2 + q_j^2 + q_k^2}} \\ q_k^\prime &= \frac{q_k}{\sqrt{q_w^2 + q_i^2 + q_j^2 + q_k^2}} \\ \end{aligned} \right. \tag{1}\label{1}$$ In general, normalizing a vector to unit length also means dividing its components by the square root of the sum of their squares.
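As a minimal sketch of the normalization in $\eqref{1}$, assuming a quaternion is stored as a plain tuple `(w, i, j, k)` (the function name `normalize` is just an illustrative choice):

```python
import math

def normalize(q):
    """Scale a quaternion (w, i, j, k) to unit length, as in equation (1)."""
    w, i, j, k = q
    n = math.sqrt(w*w + i*i + j*j + k*k)
    if n == 0.0:
        # A zero quaternion has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a zero quaternion")
    return (w / n, i / n, j / n, k / n)

q = normalize((1.0, 2.0, 3.0, 4.0))
```

The zero-length check is an addition for safety; a valid orientation never has all four components zero.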

If we have an orientation described by $\mathbf{q}$, and we want to rotate it by $\mathbf{p}$, both unit quaternions, we calculate $\mathbf{q}^\prime = \mathbf{p}\mathbf{q}$ using the Hamilton product: $$\left\lbrace ~ \begin{aligned} q_w^\prime &= p_w q_w - p_i q_i - p_j q_j - p_k q_k \\ q_i^\prime &= p_w q_i + p_i q_w + p_j q_k - p_k q_j \\ q_j^\prime &= p_w q_j - p_i q_k + p_j q_w + p_k q_i \\ q_k^\prime &= p_w q_k + p_i q_j - p_j q_i + p_k q_w \\ \end{aligned} \right. \tag{2}\label{2}$$ Note that the original orientation is rightmost, and the rotation to be applied is leftmost.
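The product in $\eqref{2}$ translates almost line for line into code. This is a sketch assuming quaternions are tuples `(w, i, j, k)`; the name `hamilton` is hypothetical:

```python
def hamilton(p, q):
    """Hamilton product p*q, as in equation (2): apply rotation p to orientation q."""
    pw, pi, pj, pk = p
    qw, qi, qj, qk = q
    return (
        pw*qw - pi*qi - pj*qj - pk*qk,
        pw*qi + pi*qw + pj*qk - pk*qj,
        pw*qj - pi*qk + pj*qw + pk*qi,
        pw*qk + pi*qj - pj*qi + pk*qw,
    )

# The product is not commutative: i*j = k, but j*i = -k.
ij = hamilton((0, 1, 0, 0), (0, 0, 1, 0))
ji = hamilton((0, 0, 1, 0), (0, 1, 0, 0))
```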

If we have two orientations, $\mathbf{q}$ and $\mathbf{p}$, we can interpolate between them, using $0 \le t \le 1$, $$\left\lbrace ~ \begin{aligned} q_w^\prime &= (1-t)q_w + t p_w \\ q_i^\prime &= (1-t)q_i + t p_i \\ q_j^\prime &= (1-t)q_j + t p_j \\ q_k^\prime &= (1-t)q_k + t p_k \\ \end{aligned} \right. \tag{3}\label{3}$$ but you'll need to normalize the result $\mathbf{q}^\prime$ to unit length as discussed earlier in $\eqref{1}$. Also, if the four-component dot product is negative, $q_w p_w + q_i p_i + q_j p_j + q_k p_k \lt 0$, you need to negate all four components of $\mathbf{p}$ (or $\mathbf{q}$, if you like) first, or the interpolation will go the "long way around". (You can negate all four components of a unit quaternion, and it won't affect the orientation it describes at all.) For $t = 0$, $\mathbf{q}^\prime = \mathbf{q}$; for $t = 1$, $\mathbf{q}^\prime = \mathbf{p}$. If $\mathbf{q}^\prime$ is the direction of our camera or our eyes, this interpolation traces the direction change around a great circle, which looks very natural.
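A sketch of the interpolation in $\eqref{3}$ followed by the normalization in $\eqref{1}$. It assumes both inputs are unit quaternions stored as tuples `(w, i, j, k)`, and it uses the sign of the full four-component dot product to decide whether to negate $\mathbf{p}$; the name `nlerp` (normalized linear interpolation) is a common label, not something from the text above:

```python
import math

def nlerp(q, p, t):
    """Interpolate from unit quaternion q (t=0) to p (t=1), then renormalize."""
    dot = sum(a * b for a, b in zip(q, p))
    if dot < 0.0:
        # p and -p describe the same orientation; flip to take the short way.
        p = tuple(-c for c in p)
    r = tuple((1.0 - t) * a + t * b for a, b in zip(q, p))
    n = math.sqrt(sum(c * c for c in r))
    return tuple(c / n for c in r)

# Halfway between "no rotation" and a half turn around the z axis.
half = nlerp((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0), 0.5)
```

Because the dot product is non-negative after the flip, the interpolated quaternion can never have zero length, so the division is safe for unit inputs.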

For camera movement from one static orientation to another, you may wish to use a smoother $t$, as the above leads to "jerky" stop and start for the change. Using $$t^\prime = 3 t^2 - 2 t^3$$ will give a much smoother transition, and $$t^\prime = 6 t^5 - 15 t^4 + 10 t^3$$ even smoother start and stop, no jerk at all.
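The two easing polynomials can be written as small helper functions. The names `smoothstep` and `smootherstep` are the ones commonly used for these curves, not names from the text above, and the input is assumed to already lie in $0 \le t \le 1$:

```python
def smoothstep(t):
    """Cubic easing 3t^2 - 2t^3: zero slope at t=0 and t=1."""
    return 3.0 * t * t - 2.0 * t * t * t

def smootherstep(t):
    """Quintic easing 6t^5 - 15t^4 + 10t^3: zero slope and zero curvature at the ends."""
    return 6.0 * t**5 - 15.0 * t**4 + 10.0 * t**3

# Feed the eased value, rather than the raw t, into the interpolation.
eased = [smoothstep(k / 4) for k in range(5)]
```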

If you want to apply just a part of a rotation, interpolate as above between the current orientation and the fully rotated orientation, and normalize the result. This is very useful if you simulate Newtonian mechanics with a varying time step length $t$.

The "no rotation" quaternion is $(1, 0, 0, 0)$, i.e. $w$ component 1, and all other components zero.

To construct an orientation from scratch, pick a unit axis $(a_x, a_y, a_z)$, $a_x^2 + a_y^2 + a_z^2 = 1$ (or divide each component by $\sqrt{a_x^2 + a_y^2 + a_z^2}$ to make it a unit axis), and an angle $\theta$ around that axis. Then, $$\left\lbrace ~ \begin{aligned} q_w &= \cos\left(\frac{\theta}{2}\right) \\ q_i &= a_x \sin\left(\frac{\theta}{2}\right) \\ q_j &= a_y \sin\left(\frac{\theta}{2}\right) \\ q_k &= a_z \sin\left(\frac{\theta}{2}\right) \\ \end{aligned} \right . \tag{4}\label{4}$$ If you want to invert a rotation, just negate its $i$, $j$, and $k$ components.
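A sketch of $\eqref{4}$ and of the inversion rule, assuming the angle is in radians and quaternions are tuples `(w, i, j, k)`. The function names are hypothetical, and the axis is normalized inside the function so that any nonzero axis can be passed in:

```python
import math

def from_axis_angle(axis, theta):
    """Unit quaternion for a rotation of theta radians around axis, as in equation (4)."""
    ax, ay, az = axis
    n = math.sqrt(ax*ax + ay*ay + az*az)
    if n == 0.0:
        raise ValueError("rotation axis must be nonzero")
    s = math.sin(theta / 2.0) / n   # folds the axis normalization into the sine factor
    return (math.cos(theta / 2.0), ax * s, ay * s, az * s)

def inverse(q):
    """Inverse of a unit quaternion: negate the i, j, k components."""
    w, i, j, k = q
    return (w, -i, -j, -k)

quarter_turn = from_axis_angle((0.0, 0.0, 2.0), math.pi / 2.0)
undo = inverse(quarter_turn)
```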

The default orientation (in the projection scheme below) is that we view towards positive $z$ axis, with $x$ increasing right, and $y$ up, in the projection plane.

You can chain as many rotations as you like by multiplying them, oldest rightmost and newest leftmost, using the Hamilton product as shown in $\eqref{2}$, as long as you remember to occasionally normalize the result to unit length as shown in $\eqref{1}$. Floating-point numbers are not exact, so rounding errors creep in. For unit quaternions, the errors are distributed in such a fashion that unit normalization clears them out without biasing towards any direction, unlike, for example, trying to normalize a rotation matrix $\mathbf{R}$.

Just remember that the order of rotations, and therefore the order of multiplications, matters; it is not a commutative operation.

If you have, say, a multi-jointed limb, with the rotation of each joint described by a quaternion, you can "undo" the rotations by multiplying by their inverses (that is, negating the $i$, $j$, $k$ components) in the reverse order.

When we wish to rotate points, we construct a 3×3 rotation matrix based on the unit quaternion, $\mathbf{R}$: $$\mathbf{R} = \left[ \begin{matrix} 1 - 2 (q_j^2 + q_k^2) & 2 (q_i q_j - q_k q_w) & 2 (q_i q_k + q_j q_w) \\ 2 (q_i q_j + q_k q_w) & 1 - 2 (q_i^2 + q_k^2) & 2 (q_j q_k - q_i q_w) \\ 2 (q_i q_k - q_j q_w) & 2 (q_j q_k + q_i q_w) & 1 - 2 (q_i^2 + q_j^2) \\ \end{matrix} \right] \tag{5}\label{5}$$ This matrix is orthonormal, and its inverse is its transpose.

To apply a rotation to a point $\vec{p}$, we do $\vec{p}^\prime = \mathbf{R}\vec{p}$: $$\left[ \begin{matrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \\ \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ z \end{matrix} \right ] = \left[ \begin{matrix} r_{11} x + r_{12} y + r_{13} z \\ r_{21} x + r_{22} y + r_{23} z \\ r_{31} x + r_{32} y + r_{33} z \\ \end{matrix} \right] \tag{6} \label{6}$$
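Equations $\eqref{5}$ and $\eqref{6}$ can be sketched as two small functions. This assumes a unit quaternion stored as a tuple `(w, i, j, k)` and a matrix stored as three row tuples; the names `rotation_matrix` and `rotate` are illustrative only:

```python
import math

def rotation_matrix(q):
    """3x3 rotation matrix (tuple of rows) from a unit quaternion, as in equation (5)."""
    w, i, j, k = q
    return (
        (1 - 2*(j*j + k*k), 2*(i*j - k*w),     2*(i*k + j*w)),
        (2*(i*j + k*w),     1 - 2*(i*i + k*k), 2*(j*k - i*w)),
        (2*(i*k - j*w),     2*(j*k + i*w),     1 - 2*(i*i + j*j)),
    )

def rotate(R, p):
    """Matrix-vector product R p, as in equation (6)."""
    x, y, z = p
    return tuple(r[0]*x + r[1]*y + r[2]*z for r in R)

# A quarter turn around the z axis takes the x axis to the y axis.
s = math.sqrt(0.5)
R = rotation_matrix((s, 0.0, 0.0, s))
moved = rotate(R, (1.0, 0.0, 0.0))
```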

If you want to do a rotation around a point $\vec{c}$, it is easiest to do using $\vec{p}^\prime = \vec{c} + \mathbf{R}(\vec{p} - \vec{c})$. In other words, subtract the center of rotation coordinates from the points before rotation, and afterwards add them back.
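For illustration, the subtract, rotate, add-back sequence might look like this, assuming `R` is a 3×3 matrix given as three row tuples (the function name `rotate_about` is hypothetical):

```python
def rotate_about(R, p, c):
    """Rotate point p around center c: c + R (p - c)."""
    d = tuple(a - b for a, b in zip(p, c))                          # move c to the origin
    r = tuple(row[0]*d[0] + row[1]*d[1] + row[2]*d[2] for row in R)  # rotate
    return tuple(a + b for a, b in zip(r, c))                       # move back

# Quarter turn around the z axis, about the point (1, 1, 0).
R = ((0, -1, 0),
     (1,  0, 0),
     (0,  0, 1))
moved = rotate_about(R, (2, 1, 0), (1, 1, 0))
```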

The 3D projection is easiest to do if you change the coordinates so that your eye, or camera, is at the origin. This means that if your camera is at $\vec{c}$, with orientation $\mathbf{R}$, you calculate point coordinates using $$\vec{p}^\prime = \left[ \begin{matrix} x \\ y \\ z \end{matrix} \right] = \mathbf{R}(\vec{p} - \vec{c}) \tag{7}\label{7}$$ Then, to project the coordinates to 2D, you use $$\left\lbrace ~ \begin{aligned} x^\prime &= \displaystyle x \frac{d}{z} \\ y^\prime &= \displaystyle y \frac{d}{z} \\ \end{aligned} \right. \tag{8}\label{8}$$ where $d$ is the distance from the camera to the projection plane, and determines your field of view. It also means that any point with $z \lt d$ is "behind" the projection plane, and thus invisible.
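Putting $\eqref{7}$ and $\eqref{8}$ together, a rough sketch of the per-point camera pipeline could look like the following. It assumes `R` is a 3×3 matrix given as three row tuples, that $d > 0$, and that a culled point is signalled by returning `None`; the function names are hypothetical:

```python
def to_camera(R, c, p):
    """Camera-space coordinates R (p - c), as in equation (7)."""
    v = (p[0] - c[0], p[1] - c[1], p[2] - c[2])
    return tuple(row[0]*v[0] + row[1]*v[1] + row[2]*v[2] for row in R)

def project(p_cam, d):
    """Perspective projection of a camera-space point, as in equation (8).

    Returns None for points with z < d, which are not visible.
    """
    x, y, z = p_cam
    if z < d:
        return None
    return (x * d / z, y * d / z)

# Camera at (0, 0, -5) in the default orientation, looking towards +z.
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
cam = to_camera(identity, (0, 0, -5), (1, 2, 5))
screen = project(cam, 2.0)
```

A larger `d` narrows the field of view, because the same point lands farther from the center of the projection plane.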

There is one more operation, and that is constructing an orientation towards some specific thing. Usually, two vectors are supplied: the target towards which the camera points, and either an "up" or a "right" vector, directions in 3D space that will be vertical or horizontal in the camera view. This is easiest to do by constructing the rotation matrix $\mathbf{R}$ directly, then recovering the unit quaternion from the rotation matrix. For best numerical stability, three different formulas are used, depending on which of the diagonal elements in the rotation matrix is largest in magnitude. That is a bit tedious to write out, so I'll leave it out here.
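As a rough sketch only, here is one way this could be done. It assumes $\mathbf{R}$ is used as in $\eqref{7}$, so its rows are the camera's right, up, and forward directions, and it assumes the matrix layout of $\eqref{5}$. The quaternion recovery below is a commonly used four-branch variant (one branch for a positive trace, plus one per diagonal element), which may differ from the formulas the text has in mind. All function names are hypothetical:

```python
import math

def _unit(v):
    n = math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2])
    if n == 0.0:
        raise ValueError("zero-length vector (is 'up' parallel to the view direction?)")
    return (v[0] / n, v[1] / n, v[2] / n)

def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def look_at(c, target, up_hint):
    """Rotation matrix whose rows are the camera's right, up and forward axes."""
    forward = _unit((target[0] - c[0], target[1] - c[1], target[2] - c[2]))
    right = _unit(_cross(up_hint, forward))
    up = _cross(forward, right)   # already unit length, perpendicular to both
    return (right, up, forward)

def quaternion_from_matrix(R):
    """Recover (w, i, j, k) from a rotation matrix laid out as in equation (5)."""
    (r11, r12, r13), (r21, r22, r23), (r31, r32, r33) = R
    trace = r11 + r22 + r33
    if trace > 0.0:
        s = 2.0 * math.sqrt(1.0 + trace)            # s = 4 w
        return (s / 4.0, (r32 - r23) / s, (r13 - r31) / s, (r21 - r12) / s)
    if r11 >= r22 and r11 >= r33:
        s = 2.0 * math.sqrt(1.0 + r11 - r22 - r33)  # s = 4 i
        return ((r32 - r23) / s, s / 4.0, (r12 + r21) / s, (r13 + r31) / s)
    if r22 >= r33:
        s = 2.0 * math.sqrt(1.0 + r22 - r11 - r33)  # s = 4 j
        return ((r13 - r31) / s, (r12 + r21) / s, s / 4.0, (r23 + r32) / s)
    s = 2.0 * math.sqrt(1.0 + r33 - r11 - r22)      # s = 4 k
    return ((r21 - r12) / s, (r13 + r31) / s, (r23 + r32) / s, s / 4.0)

# Camera at the origin, looking along +x, with +y as the up hint.
R = look_at((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
q = quaternion_from_matrix(R)
```

Each branch divides by the largest available component, which is what keeps the recovery numerically stable near half-turn rotations.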

That is basically all there is to it, really. With these, you can do any 3D graphics you like, and not be subject to e.g. gimbal lock like you would be with Euler angles.


In 3D libraries, rotation and translation are often combined into a 4×4 matrix: $$\left[ \begin{matrix} x^\prime \\ y^\prime \\ z^\prime \\ 1 \end{matrix} \right] = \left[ \begin{matrix} X_x & Y_x & Z_x & T_x \\ X_y & Y_y & Z_y & T_y \\ X_z & Y_z & Z_z & T_z \\ 0 & 0 & 0 & 1 \\ \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ z \\ 1 \end{matrix} \right] \iff \left[ \begin{matrix} x^\prime \\ y^\prime \\ z^\prime \end{matrix} \right] = \left[ \begin{matrix} X_x & Y_x & Z_x \\ X_y & Y_y & Z_y \\ X_z & Y_z & Z_z \\ \end{matrix} \right ] \left[ \begin{matrix} x \\ y \\ z \end{matrix} \right] + \left[ \begin{matrix} T_x \\ T_y \\ T_z \end{matrix} \right]$$ For projection, the libraries can use something called homogeneous coordinates: $$\left[\begin{matrix} \frac{x}{z} \\ \frac{y}{z} \\ \frac{d}{z} \end{matrix} \right] = \left[ \begin{matrix} x \\ y \\ d \\ z \end{matrix} \right]$$ where the normal 3D coordinates are on the left, and the homogeneous coordinates for the same point are on the right. These have some useful properties when used with the 4×4 matrix above, but suffice it to say, it's just an easier way to write the same operations as above, and is a form that a lot of current display hardware can accelerate.
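To make the equivalence concrete, here is a small sketch that packs a 3×3 rotation and a translation into a 4×4 matrix and applies it to a point. It uses plain tuples rather than any particular 3D library, and the function names are hypothetical:

```python
def make_transform(R, T):
    """4x4 matrix (tuple of rows) combining rotation R (3x3) and translation T."""
    return (
        (R[0][0], R[0][1], R[0][2], T[0]),
        (R[1][0], R[1][1], R[1][2], T[1]),
        (R[2][0], R[2][1], R[2][2], T[2]),
        (0.0,     0.0,     0.0,     1.0),
    )

def transform_point(M, p):
    """Apply a 4x4 transform to a 3D point by appending a fourth coordinate of 1."""
    v = (p[0], p[1], p[2], 1.0)
    out = tuple(sum(m * c for m, c in zip(row, v)) for row in M)
    return out[:3]   # the fourth component stays 1 for this kind of matrix

# Quarter turn around the z axis, followed by a translation of (1, 2, 3).
R = ((0.0, -1.0, 0.0),
     (1.0,  0.0, 0.0),
     (0.0,  0.0, 1.0))
M = make_transform(R, (1.0, 2.0, 3.0))
moved = transform_point(M, (1.0, 0.0, 0.0))
```

The result is the same as computing $\mathbf{R}\vec{p} + \vec{T}$ with the 3×3 form.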