
I have been reading about it for quite a while, but I still could not understand very well the steps it takes to compose a perspective projection matrix. Can anyone explain all the matrices and their respective jobs in composing a final perspective projection matrix?
I'm not familiar with that formula in the picture you provided, but here are my thoughts.
A perspective projection onto a horizontal plane $\{z = a\}$, with focus $(0,0,0)$ and focal length $a$, maps a point $(x,y,z)$ to $$P(x,y,z) = \left(\frac{ax}{z}, \frac{ay}{z}, a\right) = \frac az(x,y,z)$$ Verify that any point on the plane $(x,y,a)$ maps to itself.
In projective space, two points (excluding the origin itself) are equivalent if they're on the same line through the origin; for any number $k \neq 0$, $$(x,y,z,w) \sim (kx,ky,kz,kw)$$
So the projection becomes, for any point on the space $\{w = 1\}$, $$P(x,y,z,1) = \left(\frac{ax}{z}, \frac{ay}{z}, a, 1\right) = \frac az\left(x,y,z,\frac za\right) \sim \left(x,y,z,\frac za\right)$$ converting to matrix notation, $$= \begin{bmatrix} x \\ y \\ z \\ z/a \end{bmatrix} = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&1/a&0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$ (Notice, it doesn't matter whether $w$ is $1$ or anything else, because of the column of zeroes. I don't know if that's important here.)
So, for a projection through the point $(0,0,0,1)$ onto a horizontal plane $\{(x,y,a,1)\}$, the matrix is
$$P = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&1/a&0 \end{bmatrix}$$
Now, to make a projection with any focus and any plane orientation, we only need to apply translations and rotations to what we already have. If the focus is $\vec c$ (and the focal length and orientation are the same as before), then the projection of $\vec x$ is $P(\vec x - \vec c) + \vec c$.
Similarly, if the plane's orientation is different, so that the normal vector is $R(0,0,1)$ for some rotation matrix $R$ (and the focus is $(0,0,0)$), then the projection is $RP(R^{-1}\vec x)$.
If the focus and orientation are both different, then the above formulas are combined into $RP(R^{-1}(\vec x - \vec c)) + \vec c$.
But this is a 3D formula. In 4D, the translation can be represented by a matrix
$$T = \begin{bmatrix} 1&0&0&c_x \\ 0&1&0&c_y \\ 0&0&1&c_z \\ 0&0&0&1 \end{bmatrix}$$
and the rotation matrix gets extra $0$'s and a $1$ in the corner. Now the projection is
$$P'(\vec x) = TRPR^{-1}T^{-1}\vec x$$ $$P' = TRPR^{-1}T^{-1}$$