The above image shows how a 3D object is projected onto a 2D image by a camera, which makes perfect sense to me.
However, it is then said that division by $z$ is non-linear (why?), so homogeneous coordinates should be used instead of Euclidean coordinates. Why does this "trick" help?
When transforming from 3D object space to homogeneous coordinates, the final coordinate and the final column of the transformation matrix aren't even used, so why are they necessary?



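The first transformation (in red) is non-linear. A linear map $T$ must satisfy $T(a\mathbf{p}) = a\,T(\mathbf{p})$, but scaling $(x_s, y_s, z_s)$ by any $a \neq 0$ leaves $(f \frac{x_s}{z_s}, f \frac{y_s}{z_s})$ unchanged. So you cannot multiply a matrix with $(x_s, y_s, z_s)$ to get $(x_i, y_i) = (f \frac{x_s}{z_s}, f \frac{y_s}{z_s})$. However, if you use a homogeneous coordinate system, then you can represent such a transformation as a linear function (the matrix product colored in green in the question).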
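To see the non-linearity numerically, here is a small NumPy sketch. The function name `project`, the focal length `f = 2.0`, and the sample points are illustrative choices, not taken from the question. It checks the two defining properties of a linear map (homogeneity and additivity), and the perspective division breaks both.

```python
import numpy as np

def project(point, f=1.0):
    """Perspective projection (x, y, z) -> (f*x/z, f*y/z); assumes z != 0."""
    x, y, z = point
    return np.array([f * x / z, f * y / z])

f = 2.0                          # illustrative focal length
p = np.array([1.0, 2.0, 4.0])    # illustrative 3D points
q = np.array([3.0, -1.0, 2.0])
a = 3.0                          # illustrative scale factor

# Homogeneity fails: scaling the 3D point by a leaves its image unchanged,
# whereas a linear map would scale the image by a as well.
print(project(a * p, f), a * project(p, f))

# Additivity fails: the image of a sum is not the sum of the images.
print(project(p + q, f), project(p, f) + project(q, f))
```

With these numbers, `project(3p)` stays at (0.5, 1.0) while `3 * project(p)` is (1.5, 3.0), so no matrix can reproduce this mapping.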
Note that in a homogeneous coordinate system, for every $a \neq 0$, $(ax, ay, a)$ refers to the same point, which is represented by $(x, y, 1)$. So the matrix product in green transforms the point $(x_s, y_s, z_s)$ to $$ \left( \begin{array}{c} u = f x_s \\ v = f y_s \\ w = z_s \end{array} \right) \sim \left( \begin{array}{c} f \frac{x_s}{z_s} \\ f \frac{y_s}{z_s} \\ 1 \end{array} \right) $$ where $\sim$ means equality up to a nonzero scale factor (here $a = z_s$, assuming $z_s \neq 0$).
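The homogeneous route can be sketched as follows. The image with the green matrix is not reproduced here, so this assumes it has the usual $3 \times 4$ form with rows $(f, 0, 0, 0)$, $(0, f, 0, 0)$, $(0, 0, 1, 0)$, acting on $(x_s, y_s, z_s, 1)$. The helper names `projection_matrix`, `to_homogeneous` and `from_homogeneous` are hypothetical. The projection becomes a single matrix product; the division by $w$ is deferred to the very end, when converting back to Euclidean coordinates.

```python
import numpy as np

def projection_matrix(f):
    """Assumed 3x4 pinhole matrix for a camera at the origin looking along +z."""
    return np.array([
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ])

def to_homogeneous(point):
    """(x, y, z) -> (x, y, z, 1)."""
    return np.append(point, 1.0)

def from_homogeneous(h):
    """(u, v, w) -> (u/w, v/w); assumes w != 0."""
    return h[:2] / h[2]

f = 2.0                            # illustrative focal length
P = projection_matrix(f)
p_s = np.array([1.0, 2.0, 4.0])    # illustrative 3D point

uvw = P @ to_homogeneous(p_s)      # linear step: (f*x_s, f*y_s, z_s)
x_i = from_homogeneous(uvw)        # non-linear division, done only at the end

# Same image point as dividing by z directly.
direct = np.array([f * p_s[0] / p_s[2], f * p_s[1] / p_s[2]])
print(uvw, x_i, direct)

# In homogeneous space the map is linear: scaling the input scales (u, v, w) ...
a = 3.0
print(P @ (a * to_homogeneous(p_s)), a * uvw)
# ... and every nonzero multiple of (u, v, w) still denotes the same image point.
print(from_homogeneous(a * uvw))
```

Here the fourth column of `P` is zero because the camera sits at the origin. In the general pinhole model $P = K[R \mid t]$, that column equals $Kt$ and carries the camera translation, which is why the extra coordinate is kept even when it contributes nothing in this special case.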