Assume V + Ax = b is the equation, where V is the vector of residuals, A is the matrix of coefficients, x is the vector of unknowns, and b is the vector of observations.
It is common to read something like "The least squares estimator is obtained by minimizing V. Therefore we set the partial derivative of V^(t)V with respect to x equal to zero..."
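For reference, the step in that quote can be written out as follows. This is only a sketch, assuming V = b - Ax (rearranged from the equation above) and writing the transpose V^(t) as V^T:

```latex
\begin{aligned}
V^{T}V &= (b - Ax)^{T}(b - Ax)
        = b^{T}b - 2\,x^{T}A^{T}b + x^{T}A^{T}A\,x, \\
\frac{\partial\,(V^{T}V)}{\partial x} &= -2\,A^{T}b + 2\,A^{T}A\,x = 0
\quad\Longrightarrow\quad A^{T}A\,x = A^{T}b .
\end{aligned}
```

Setting the derivative to zero only identifies a stationary point of V^(t)V, which is exactly what the question below is about.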
What I don't understand is that, by doing so, we may actually find a maximum point for V instead of a minimum, since we haven't checked the sign in both directions. Why can they just assume they have found a minimum point?
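One way to check the sign in every direction numerically is to look at the second derivative of V^(t)V with respect to x, which is the matrix 2 A^(t) A. The sketch below is only an illustration under stated assumptions: the data in `A` and `b` are random and made up, and the names `sum_sq`, `x_hat`, and `hessian` are hypothetical helpers, not taken from any textbook. It assumes A has full column rank.

```python
import numpy as np

# Made-up example data (assumption): 6 observations, 2 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
b = rng.normal(size=6)


def sum_sq(x):
    """Return V^T V for V = b - A x."""
    v = b - A @ x
    return float(v @ v)


# Stationary point: solve A^T A x = A^T b (derivative of V^T V set to zero).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Second derivative of V^T V with respect to x.
hessian = 2.0 * A.T @ A
eigenvalues = np.linalg.eigvalsh(hessian)

# Step away from x_hat in many random directions and record the change in V^T V.
steps = rng.normal(size=(100, 2))
increases = [sum_sq(x_hat + d) - sum_sq(x_hat) for d in steps]

print("eigenvalues of 2 A^T A:", eigenvalues)
print("smallest change in V^T V:", min(increases))
```

If every eigenvalue is positive and every change is positive, V^(t)V increases in all directions away from `x_hat`, so the stationary point is a minimum rather than a maximum, at least for this made-up data.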