$y^TX^TXy$ achieves minimum when its second derivative is positive definite.

1.1k Views Asked by At

Prove that $y^T X^T X y$ achieves minimum value when its matrix of second derivative is a positive definite matrix.

I differentiated with respect to $y$ and got that the second derivative of this term only depends on $X^T X$ and thus $X^T X$ is our matrix of second derivative but I don't see the connection between then minimum and positive definiteness in this case and am unable to proceed with the proof.

3

There are 3 best solutions below

0
On BEST ANSWER

Any critical point $\mathbf{y_0} \in \mathbb{R}^n$ (where the gradient $\nabla f(\mathbf{y})$ is zero) of a multi-variable function $f(\mathbf{y}): \mathbb{R}^n\rightarrow\mathbb{R}$ is a minimum if the Hessian (matrix of second derivatives) $Hf(\mathbf{y})$ of this function evaluated at that critical point (i.e. $Hf(\mathbf{y_0})$) is a positive definite matrix. Your function is no exception!

If you approximate your function $\mathbf{y}^TX^TX\mathbf{y}=f(\mathbf{y})$ (or any other one) using a second order approximation (Taylor expansion) you get (knowing that $\nabla f(\mathbf{y_0})=\mathbf{0}$):

$f(\mathbf{y})\approx F(\mathbf{y})=f(\mathbf{y_0})+\frac{1}{2}(\mathbf{y}-\mathbf{y_0})^T Hf(\mathbf{y_0})(\mathbf{y}-\mathbf{y_0})$.

If $\mathbf{y_0}$ is the point where the function is minimum then $f(\mathbf{y_0})$ is the value of this minimum (it's the minimum itself). Which means that the function $f(\mathbf{y})$ has to have values that are $>f(\mathbf{y_0})$ for this point $\mathbf{y_0}$ to be a minimum (by definition!).

In other words, the term (called quadratic form) $\frac{1}{2}(\mathbf{y}-\mathbf{y_0})^T Hf(\mathbf{y_0})(\mathbf{y}-\mathbf{y_0})$ has to be $>0$. And this is exactly what it means for $Hf(\mathbf{y_0})$ to be positive definite (i.e. its quadratic form is positive).

Now for your function, $f(\mathbf{y})=\mathbf{y}^TX^TX\mathbf{y}$. Its Hessian is equal to $2X^TX$ (because $X^TX$ is symmetric, you can check this!)

This means that for the Hessian to be positive definite, $X^TX$ has to be positive definite, and for $X^TX$ to be positive definite your function (which is also a quadratic form) has to be positive.

Conclusion:

  • For your question "Prove that $\mathbf{y}^TX^TX\mathbf{y}$ achieves minimum value when its matrix of second derivative is a positive definite matrix". The answer is that this holds for all multi-variable functions not only this one.
  • For the Hessian of your function to be positive definite at the critical point, your function itself has to be positive at that point.
5
On

If it's second derivative is positive definite, $X^TX$ is positive definite.

$y^TX^TXy > 0, \forall y \neq 0$

Choose $y=0$ to show that the minimum can be attained.

3
On

Let $X_{n\times p}$ and $y\in \mathbb{R}^p$. So your minimization problem is $\min_{y\in \mathbb{R}^p}||Xy||^2_2$. As such, $$ ||Xy||_2^2=y'X'Xy, $$ derivation w.r.t $y$ gives you the gradient of this quadratic form, i.e., the critical point is $$ 2X'Xy=0. $$ You get a homogeneous system of equation that has a unique trivial soluation iff $X'X$ is a full rank matrix. Thus, let $b\neq 0, b\in \mathbb{R}^p$, thus $b'X'=c, c\in \mathbb{R}^n$, hence $$ b'X'Xb=c'c=\sum_{i=1}^nc_i^2\ge0. $$ I.e., $X'X$ is positive semi-definite and if $X'X$ is a full rank matrix then it is strictly positive definite.

That is, $y'X'Xy$ attains its minimum if the Hessian, $X'X$, is strictly positive definite matrix.