Attaching the problem and the answer first
Problem:
Answer scheme:

I'm interested in the solution to part b of the question. Why does the distance from A to the line, after optimization, give the shortest distance between the line and the plane? Nothing special is said about the point, so I had assumed the method works for any point on the plane. But that is not the case, since $(0,0,-4)$ gives $\frac{7\sqrt{30}}{6}$. After thinking about it, it seems the proof skips the step of projecting the shortest distance $AB$ onto the normal vector of the plane, because [there already exists a normal line of the plane that intersects the line and happens to pass through A] $(*)$. My question is: am I missing something obvious here? How was it known in the proof that $(*)$ is true (perhaps from work done in part a)?
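A small numerical sketch of the situation in $(*)$. The plane $2x - y + 2z = 6$ and the line through $(3,0,6)$ with direction $(1,2,0)$ are hypothetical stand-ins, not the ones from the problem. For points A on the plane, the distance from A to the line changes with A, while the length of the projection of $\vec{AB}$ onto the normal does not. The two agree only when A sits directly below the line, at the foot of a perpendicular dropped from it.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

# Hypothetical example: plane n . r = c and a line r = P0 + t*d.
n, c = (2.0, -1.0, 2.0), 6.0
P0, d = (3.0, 0.0, 6.0), (1.0, 2.0, 0.0)
assert dot(n, d) == 0  # d is perpendicular to n, so the line is parallel to the plane

def dist_point_to_line(A):
    """Shortest distance from A to the line r = P0 + t*d."""
    return norm(cross(sub(A, P0), d)) / norm(d)

def projected_on_normal(A):
    """Length of the projection of the vector A -> P0 onto the plane's normal."""
    return abs(dot(sub(P0, A), n)) / norm(n)

# Foot of the perpendicular dropped from P0 onto the plane.
k = (dot(n, P0) - c) / dot(n, n)
foot = tuple(p - k * ni for p, ni in zip(P0, n))

for A in [(3.0, 0.0, 0.0), (0.0, 0.0, 3.0), foot]:
    assert abs(dot(n, A) - c) < 1e-12  # A lies on the plane
    print(A, round(dist_point_to_line(A), 4), round(projected_on_normal(A), 4))
```

The middle column (distance from A to the line) takes the values 6, about 4.0249, and 4, while the last column (projection onto the normal) is 4 every time; minimizing the first over A lands on the foot, where it equals the second.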
By definition, a line that is parallel to a plane never meets it, so every point on the line lies at the same perpendicular (shortest) distance from the plane.
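A quick check of this claim, assuming a hypothetical plane $2x - y + 2z = 6$ (normal $(2,-1,2)$) and a hypothetical line $(3,0,6) + t(1,2,0)$, whose direction is perpendicular to the normal. The point-to-plane distance $|\vec n\cdot P - c|/\|\vec n\|$ comes out the same for every parameter value $t$.

```python
import math

# Hypothetical plane n . r = c and line r = P0 + t*d, with d . n = 0 (parallel).
n, c = (2.0, -1.0, 2.0), 6.0
P0, d = (3.0, 0.0, 6.0), (1.0, 2.0, 0.0)

def point_on_line(t):
    return tuple(p + t * di for p, di in zip(P0, d))

def dist_to_plane(P):
    """Perpendicular distance from P to the plane n . r = c."""
    return abs(sum(ni * pi for ni, pi in zip(n, P)) - c) / math.sqrt(sum(ni * ni for ni in n))

distances = [dist_to_plane(point_on_line(t)) for t in (-5.0, -1.0, 0.0, 2.5, 10.0)]
print(distances)  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

Because $\vec n\cdot(P_0 + t\vec d) = \vec n\cdot P_0 + t\,(\vec n\cdot\vec d) = \vec n\cdot P_0$, the parameter $t$ drops out entirely.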
Another way to calculate the distance between the line and the plane is the following:
1. Find two linearly independent vectors $\vec u$ and $\vec v$ parallel to the plane.
2. Find the vector $\vec w$ from any point on the plane to any point on the line.
3. Calculate the determinant $\det\begin{pmatrix}\vec v\\ \vec u\\ \vec w\end{pmatrix}$ of the $3\times 3$ matrix whose rows are these three vectors.
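The absolute value of this determinant is the volume of the parallelepiped spanned by the three vectors. Dividing that volume by the area of its base, $\|\vec u\times\vec v\|$, gives the height of the parallelepiped, which is the shortest distance from the point on the line to the plane:
$$d = \frac{\left|\det\begin{pmatrix}\vec v\\ \vec u\\ \vec w\end{pmatrix}\right|}{\|\vec u\times\vec v\|}.$$
This always gives the shortest distance from a point to a plane.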
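A minimal sketch of the determinant method on hypothetical data (plane $2x - y + 2z = 6$, line through $(3,0,6)$ with direction $(1,2,0)$; these are illustrative, not taken from the problem). The helper names `det3`, `cross` and `line_plane_distance` are made up for this sketch. It computes the parallelepiped volume $|\det|$ and divides it by the base area $\|\vec u\times\vec v\|$ to turn the volume into a height.

```python
import math

def det3(rows):
    """Determinant of a 3x3 matrix given as three row vectors (cofactor expansion along row 1)."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def line_plane_distance(u, v, w):
    """|det[v; u; w]| / |u x v|: parallelepiped volume divided by its base area."""
    volume = abs(det3((v, u, w)))
    base_area = norm(cross(u, v))
    return volume / base_area

# Hypothetical data: plane 2x - y + 2z = 6, line through (3, 0, 6) with direction (1, 2, 0).
u = (1.0, 2.0, 0.0)   # parallel to the plane: 2*1 - 1*2 + 2*0 = 0
v = (0.0, 2.0, 1.0)   # parallel to the plane: 2*0 - 1*2 + 2*1 = 0
A = (3.0, 0.0, 0.0)   # a point on the plane
B = (3.0, 0.0, 6.0)   # a point on the line
w = tuple(b - a for a, b in zip(A, B))

print(line_plane_distance(u, v, w))  # 4.0
```

Here the determinant is $-12$, so the volume is $12$, and $\|\vec u\times\vec v\| = \|(2,-1,2)\| = 3$. The result $12/3 = 4$ matches the direct point-to-plane distance $|2\cdot 3 - 0 + 2\cdot 6 - 6|/3 = 4$.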