How do I get a view frame's axis directions given a camera's direction vector?


Say I have some custom view orientation in which the global coordinate frame looks like the picture below, with no axis aligned to the edges of the computer monitor's screen:

xyz-global

Say the direction vector from the origin away from the camera (away from you) is given by this vector:

Vector (-0.3159932494163513, 0.5070327520370483, -0.8019139766693115)

(Note that I am working in orthographic projections, if that matters.)

But this is only one direction vector, perpendicular to the computer's monitor.

How would I get the direction vector parallel to the top (or bottom) edge of the monitor (i.e., the horizontal direction)? Likewise, how would I get the direction vector parallel to the left (or right) side edge of the monitor (i.e., the vertical direction)?

Do I have enough information to find the rest of this frame's axis directions?


There are 2 best solutions below

On BEST ANSWER

The horizontal and vertical screen axes were obtained by a different method mentioned in the comments. They are drawn in black.

amd posted a method and gave his values for u and v for the camera direction I provided. These are the blue lines.

amd's u and v lines in blue

I followed the steps described by amd, projected my own points, and then read off the values. These gave the following red lines.

Mark's u and v lines in red

Something is off!

There is not enough information to find the remaining two axis directions that are parallel to the edges of the monitor. One cannot fix the XYZ-frame relative to the monitor just by making one of its axes parallel to the view direction through the origin. The moved XYZ-frame can still take any orientation around its view-directed axis. More information is needed to fix its orientation.
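The ambiguity described above can be illustrated with a small sketch. It is not part of any method discussed here: the function name `frame_with_roll`, the choice of helper axis, and the sample roll angles are all assumptions made for illustration. It builds one pair of unit vectors perpendicular to the view direction and then rotates that pair about the view direction; every roll angle yields an equally valid orthonormal frame.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def frame_with_roll(n, roll):
    """Hypothetical helper: return (u, v), both perpendicular to n,
    rotated by `roll` radians about n. Oriented so that u x v = -n."""
    n = normalize(n)
    # Any helper axis that is not parallel to n will do (assumed choice).
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u0 = normalize(cross(helper, n))
    v0 = cross(u0, n)  # unit length because u0 and n are orthogonal unit vectors
    c, s = math.cos(roll), math.sin(roll)
    u = tuple(c * a + s * b for a, b in zip(u0, v0))
    v = tuple(-s * a + c * b for a, b in zip(u0, v0))
    return u, v

# View direction from the question.
n = (-0.3159932494163513, 0.5070327520370483, -0.8019139766693115)
for degrees in (0, 30, 90):
    u, v = frame_with_roll(n, math.radians(degrees))
    print(degrees, [round(x, 4) for x in u], [round(x, 4) for x in v])
```

Each printed pair is perpendicular to the view direction, so the view direction alone cannot single out which pair lines up with the monitor edges.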

29
On

The short answer: Let $O$ be the screen image of the world-coordinate origin, so that we have $X-O=(x_h,x_v)$, $Y-O=(y_h,y_v)$ and $Z-O=(z_h,z_v)$ for the unit world-coordinate axis vectors in screen coordinates. Then the world coordinates of the direction vectors for the edges of the screen are simply $(x_h,y_h,z_h)$ and $(x_v,y_v,z_v)$.
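A minimal sketch of this recipe follows, assuming equal horizontal and vertical screen scale factors and a screen vertical axis that points up. The function name `screen_edge_directions` is hypothetical, and the sample inputs are the screen vectors quoted in the worked example later in this answer.

```python
import math

def screen_edge_directions(x_img, y_img, z_img):
    """Hypothetical helper. Each argument is the (horizontal, vertical)
    screen offset of a unit world axis from the image O of the origin.
    Returns unit vectors (u, v) in world coordinates."""
    # First screen coordinates of X, Y, Z give the horizontal edge direction.
    u_raw = (x_img[0], y_img[0], z_img[0])
    # Second screen coordinates give the vertical edge direction.
    v_raw = (x_img[1], y_img[1], z_img[1])

    def normalize(w):
        length = math.sqrt(sum(c * c for c in w))
        return tuple(c / length for c in w)

    return normalize(u_raw), normalize(v_raw)

u, v = screen_edge_directions((18.5907, -3.80073),
                              (6.69494, 15.8854),
                              (-3.09257, 11.5416))
print("u =", u)
print("v =", v)
```

Normalizing removes the overall screen scale, so the result does not depend on the units of the screen coordinates.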


Orthographic (a.k.a. orthogonal) projection makes this fairly easy to compute. This projects onto some plane perpendicular to the camera’s axis $\mathbf n$, which for the purposes of this problem we can take as passing through the origin. The projection can be represented as a change of basis from the global $xyz$ coordinate system to the camera’s $x'y'z'$ coordinate system, which is accomplished with a rotation, and then projecting orthogonally onto some plane perpendicular to the camera’s axis—the $z'$-axis. This projection amounts to deleting the $z'$ coordinate. There’s also a translation involved since the origin of the viewport coordinate system isn’t necessarily at the point that corresponds to the camera’s axis, but I’ll suppress this aspect of the projection since it has no effect on what you’re trying to compute.

Since orthographic projection is parallel to the camera’s axis, the actual location of the image plane is irrelevant—only its attitude, i.e., the camera’s orientation, matters. In terms of this orientation, which can be described via pitch, yaw and roll from some standard orientation, the unit vector $\mathbf n$ that gives the camera’s facing takes care of pitch and yaw, and we’re trying to recover the roll.

For simplicity, assume that the image plane passes through the origin. To avoid a change of orientation (handedness), the unit vector $\mathbf n$ that corresponds to the camera’s axis defines the negative $z'$-axis of the camera coordinate system. Let $\mathbf u$ and $\mathbf v$ be a pair of orthogonal unit vectors that define “right” and “up,” respectively, for the camera. These two vectors are parallel to the edges of the viewport, so they are the ones we’re trying to find. Together with $\mathbf n$, these vectors define the $x'y'z'$ camera coordinate system. They form an orthonormal basis for $\mathbb R^3$ and so are related to the world coordinate system by a rotation. (For a left-handed viewport coordinate system, as is common in computer graphics, take $\mathbf n$ as defining the positive $z'$-axis instead.)

[Figure: the camera axes and the image plane]

(The image plane is moved away from the world-coordinate origin for clarity.)

Recall that the columns of a transformation matrix are the images of the basis, so the matrix that maps from the camera coordinate system to the world coordinate system has $\mathbf u$, $\mathbf v$ and $-\mathbf n$ as its columns. The inverse of a rotation matrix is its transpose, so the matrix that maps from world coordinates to camera coordinates has these vectors as its rows. The projection is therefore given by the matrix product $$P=\left[\begin{array}{rrr}1&0&0\\0&1&0\end{array}\right]\begin{bmatrix}\mathbf u^T\\\mathbf v^T\\-\mathbf n^T\end{bmatrix}=\left[\begin{array}{r}\mathbf u^T\\\mathbf v^T\end{array}\right]=\left[\begin{array}{rrr}u_x&u_y&u_z\\v_x&v_y&v_z\end{array}\right]=\begin{bmatrix}P\mathbf e_x&P\mathbf e_y&P\mathbf e_z\end{bmatrix}.$$ The last equality highlights the fact that the columns of $P$ are the images of the world-coordinate basis vectors (expressed in viewport coordinates), so $\mathbf u$ and $\mathbf v$ can be read directly from the coordinates of these images.
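The column-reading property can be checked numerically with a short sketch. It assumes the $\mathbf u$ and $\mathbf v$ values from the worked example in this answer and the view direction from the question; the helper name `project` is hypothetical.

```python
def project(P, w):
    """Apply the 2x3 matrix P (a tuple of two rows) to the 3-vector w."""
    return tuple(sum(p * x for p, x in zip(row, w)) for row in P)

# Sample values: u and v from the worked example, n from the question.
u = (0.929535, 0.334747, -0.154629)
v = (-0.190036, 0.794268, 0.577082)
n = (-0.3159932494163513, 0.5070327520370483, -0.8019139766693115)

P = (u, v)  # rows of P are u^T and v^T

# Columns of P are the images of the world-coordinate basis vectors.
basis = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
columns = [project(P, e) for e in basis]

# Reading across the columns recovers u and v.
u_read = tuple(col[0] for col in columns)
v_read = tuple(col[1] for col in columns)

# The camera axis projects (up to rounding) to the zero vector.
n_image = project(P, n)

print(columns)
print(u_read, v_read)
print(n_image)
```

The first components of the three projected basis vectors reproduce $\mathbf u$ and the second components reproduce $\mathbf v$, while $\mathbf n$ collapses to approximately $(0,0)$ because the projection is parallel to the camera axis.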

Now, the image plane to screen mapping typically involves some scaling. If the horizontal and vertical scale factors are equal, then you can use the resulting vectors as is or normalize/scale them as needed. Otherwise (e.g., for non-square pixels), you’ll need to retrieve the scale factors from the camera’s intrinsic matrix.

For example, let’s say that with the unit normal in your question, the images of the unit world-coordinate basis vectors have the screen direction vectors $(18.5907, -3.80073)$, $(6.69494, 15.8854)$, and $(-3.09257, 11.5416)$, respectively. Taking the first coordinates of these vectors we have $(18.5907,6.69494,-3.09257)$, which when normalized is $\mathbf u=(0.929535, 0.334747, -0.154629)$. Similarly, the second coordinates produce $\mathbf v=(-0.190036,0.794268,0.577082)$. If you form the matrix $U=\begin{bmatrix}\mathbf u&\mathbf v&-\mathbf n\end{bmatrix}$, whose third column is the camera’s $z'$-axis, you can verify that $U^TU=I$, so that this is an orthonormal set of vectors, and that $\det U=1$, so that it is a right-handed basis. This computation also verifies that for the projection matrix $P$ formed from these vectors, $P\mathbf u=(1,0)$ and $P\mathbf v=(0,1)$, which is what was wanted.
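These checks can be reproduced with a few lines of plain Python. This is only a sketch under the stated values: the third basis vector is taken as $-\mathbf n$, matching the camera’s $z'$-axis defined earlier, and the variable names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

u = (0.929535, 0.334747, -0.154629)
v = (-0.190036, 0.794268, 0.577082)
n = (-0.3159932494163513, 0.5070327520370483, -0.8019139766693115)
neg_n = tuple(-c for c in n)

# Gram matrix U^T U for U = [u v -n]; it should be close to the identity.
frame = (u, v, neg_n)
gram = [[dot(a, b) for b in frame] for a in frame]

# Determinant of a matrix with columns a, b, c is a . (b x c).
det_camera = dot(u, cross(v, neg_n))  # columns u, v, -n
det_flipped = dot(u, cross(v, n))     # columns u, v, n: opposite sign

# P has rows u^T and v^T, so P w = (u . w, v . w).
P_u = (dot(u, u), dot(v, u))
P_v = (dot(u, v), dot(v, v))

for row in gram:
    print([round(x, 5) for x in row])
print(round(det_camera, 5), round(det_flipped, 5))
print(P_u, P_v)
```

Because the inputs are rounded to six digits, the results match the identity, the unit determinant, and the targets $(1,0)$ and $(0,1)$ only up to small rounding error. Using $\mathbf n$ itself as the third column flips the sign of the determinant.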


This works because of a reciprocity between orthonormal coordinate systems, which itself is a result of the symmetry of inner products. For any basis $\mathscr B=\{\mathbf e_1,\mathbf e_2,\mathbf e_3\}$ of $\mathbb R^3$, the coordinates of a vector $\mathbf v$ are the coefficients of the unique linear combination $\mathbf v=c_1\mathbf e_1+c_2\mathbf e_2+c_3\mathbf e_3$. If the basis is orthonormal, we have $\mathbf v=(\mathbf v\cdot\mathbf e_1)\mathbf e_1+(\mathbf v\cdot\mathbf e_2)\mathbf e_2+(\mathbf v\cdot\mathbf e_3)\mathbf e_3$, that is, the coordinates of $\mathbf v$ are the scalar projections of $\mathbf v$ onto the basis vectors. If we have another orthonormal basis $\mathscr B'=\{\mathbf e_1',\mathbf e_2',\mathbf e_3'\}$, we have $$\begin{align} \mathbf e_1 &= (\mathbf e_1\cdot\mathbf e_1')\mathbf e_1'+(\mathbf e_1\cdot\mathbf e_2')\mathbf e_2'+(\mathbf e_1\cdot\mathbf e_3')\mathbf e_3' \\ \mathbf e_2 &= (\mathbf e_2\cdot\mathbf e_1')\mathbf e_1'+(\mathbf e_2\cdot\mathbf e_2')\mathbf e_2'+(\mathbf e_2\cdot\mathbf e_3')\mathbf e_3' \\ \mathbf e_3 &= (\mathbf e_3\cdot\mathbf e_1')\mathbf e_1'+(\mathbf e_3\cdot\mathbf e_2')\mathbf e_2'+(\mathbf e_3\cdot\mathbf e_3')\mathbf e_3'.\end{align}$$ At the same time, we also have $$\begin{align} \mathbf e_1' &= (\mathbf e_1'\cdot\mathbf e_1)\mathbf e_1+(\mathbf e_1'\cdot\mathbf e_2)\mathbf e_2+(\mathbf e_1'\cdot\mathbf e_3)\mathbf e_3 \\ \mathbf e_2' &= (\mathbf e_2'\cdot\mathbf e_1)\mathbf e_1+(\mathbf e_2'\cdot\mathbf e_2)\mathbf e_2+(\mathbf e_2'\cdot\mathbf e_3)\mathbf e_3 \\ \mathbf e_3' &= (\mathbf e_3'\cdot\mathbf e_1)\mathbf e_1+(\mathbf e_3'\cdot\mathbf e_2)\mathbf e_2+(\mathbf e_3'\cdot\mathbf e_3)\mathbf e_3.\end{align}$$ Since $\mathbf a\cdot\mathbf b=\mathbf b\cdot\mathbf a$, we can see from the above that the $\mathscr B'$-coordinates of $\mathbf e_1$ are equal to the first $\mathscr B$-coordinates of the elements of $\mathscr B'$ and vice-versa, and similarly for the other basis vectors. 
This reciprocity is what allows us to read the world coordinates of $\mathbf u$ and $\mathbf v$ from the viewport/screen coordinates of $X$, $Y$ and $Z$, above.
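One compact way to restate the reciprocity (the symbol $R$ is introduced here only for illustration) is to collect the dot products into a single matrix: $$R=\begin{bmatrix}\mathbf e_1\cdot\mathbf e_1'&\mathbf e_1\cdot\mathbf e_2'&\mathbf e_1\cdot\mathbf e_3'\\\mathbf e_2\cdot\mathbf e_1'&\mathbf e_2\cdot\mathbf e_2'&\mathbf e_2\cdot\mathbf e_3'\\\mathbf e_3\cdot\mathbf e_1'&\mathbf e_3\cdot\mathbf e_2'&\mathbf e_3\cdot\mathbf e_3'\end{bmatrix}.$$ Row $i$ of $R$ holds the $\mathscr B'$-coordinates of $\mathbf e_i$, while column $j$ holds the $\mathscr B$-coordinates of $\mathbf e_j'$. Taking $\mathscr B$ to be the world basis and $\mathscr B'=\{\mathbf u,\mathbf v,-\mathbf n\}$, and assuming unit screen scale, the first two entries of the rows are the screen coordinates $(x_h,x_v)$, $(y_h,y_v)$ and $(z_h,z_v)$, so the first two columns are $\mathbf u=(x_h,y_h,z_h)$ and $\mathbf v=(x_v,y_v,z_v)$.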