I know how to find a $2 \times 2$ Hessian matrix, but for machine learning I'm getting confused, since my multivariable calculus class only dealt with up to three dimensions, and in ML we're working with far more than three. How do I find the Hessian matrix in that case?
In particular, I get that if we have the gradient, taking the partial derivatives of each gradient entry will yield the Hessian. The thing is, this solution takes the derivative with respect to the whole vector $w$, rather than with respect to a single component, say $w_i$.
Also, how do we get from a column gradient vector to an $n \times n$ Hessian matrix? At what step does this happen? To be honest, I don't see how the final answer is an $n \times n$ matrix.

You seem to understand that if you take the partial derivative $\frac{\partial}{\partial w_i}$ of the $j$th entry of the gradient, $\frac{\partial}{\partial w_j}$, you get the second-order partial $\frac{\partial^2}{\partial w_i \partial w_j}$. You can also think of this as taking a gradient of $\frac{\partial}{\partial w_j}$ to obtain the partial derivatives $\frac{\partial^2}{\partial w_1 \partial w_j}, \frac{\partial^2}{\partial w_2 \partial w_j}, \ldots, \frac{\partial^2}{\partial w_n \partial w_j}$ all at once. (And then you do this repeatedly for each $j$.) Disregarding the way this is notated, they are still fundamentally doing what you understand: computing second-order partial derivatives. There are some shorthands they can use, since the Hessian has the partial derivatives arranged in a particular way.
Perhaps you should do it the way you understand (compute each second-order partial derivative separately, then arrange them together) to see that you can reproduce the expression that comes from the "matrix calculus" shorthand.
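As a concrete sanity check, here is a sketch of that comparison in NumPy. The specific objective is my own assumption for illustration: a least-squares loss $f(w) = \frac{1}{2}\sum_i (x_i^\top w - y_i)^2$, whose matrix-calculus Hessian is $\sum_i x_i x_i^\top = X^\top X$. We approximate each second-order partial one entry at a time with finite differences and compare against the shorthand expression.

```python
import numpy as np

# Assumed example objective: f(w) = 0.5 * sum_i (x_i^T w - y_i)^2,
# whose matrix-calculus Hessian is sum_i x_i x_i^T = X^T X.
rng = np.random.default_rng(0)
n, m = 4, 10                      # n parameters, m data points
X = rng.standard_normal((m, n))   # rows of X are the x_i
y = rng.standard_normal(m)

def f(w):
    r = X @ w - y
    return 0.5 * r @ r

# "The way you understand": approximate each second-order partial
# d^2 f / (dw_i dw_j) separately with a central finite difference,
# then arrange them into an n x n matrix.
h = 1e-4
w0 = np.zeros(n)
H_entrywise = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        e_i, e_j = np.eye(n)[i], np.eye(n)[j]
        H_entrywise[i, j] = (
            f(w0 + h * e_i + h * e_j) - f(w0 + h * e_i - h * e_j)
            - f(w0 - h * e_i + h * e_j) + f(w0 - h * e_i - h * e_j)
        ) / (4 * h**2)

# The matrix-calculus shorthand: the same n x n matrix in one expression.
H_shorthand = X.T @ X

print(np.allclose(H_entrywise, H_shorthand, atol=1e-4))  # True
```

The two routes agree entry for entry: the shorthand is just a compact way of writing down all $n^2$ second-order partials at once.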
One thing to note is that $x_i x_i^\top$ is an outer product (not an inner product) and produces an $n \times n$ matrix (not a scalar).
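A quick shape check makes the inner/outer distinction concrete (the vector here is just made-up data for illustration):

```python
import numpy as np

x = np.arange(1, 4).reshape(3, 1)   # a column vector, shape (3, 1)

inner = x.T @ x   # inner product: a 1 x 1 result (a scalar, 14)
outer = x @ x.T   # outer product: a 3 x 3 matrix

print(inner.shape)  # (1, 1)
print(outer.shape)  # (3, 3)
print(outer)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```

So a sum of terms like $x_i x_i^\top$ really is an $n \times n$ matrix, which is how a column gradient vector leads to an $n \times n$ Hessian.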