Prove that for $f(x) = \frac{x^\textrm{T}Ax}{x^\textrm{T}x}$ and any $v$, $v^\textrm{T}Hv=0$


$f(x) = \frac{x^\textrm{T}Ax}{x^\textrm{T}x}$, where $A$ is a symmetric $n \times n$ matrix and $x \neq 0$. I need to prove that if $H=H(f)(v)$, where $H(f)(v)$ is the Hessian of $f$ evaluated at the vector $v$, then $v^\textrm{T}Hv=0$ for any $v$.

I have already proved that:

(a) $\frac{df}{dx}=0$ iff $x$ is an eigenvector of $A$;

(b) $f(cx) = f(x)$ for $c \in R$, $c \neq 0$;

(c) $\lambda_\min \leq f \leq \lambda_\max$ where $\lambda_\max$ and $\lambda_\min$ are the maximal and minimal eigenvalues of $A$.

The hint to the question states the following: “Hint: do not attempt to calculate the Hessian directly: the algebra is very messy. Use the intuition given by (b), that a certain function is constant.”
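Before looking at the answers, the claim itself can be sanity-checked numerically. The sketch below (NumPy; the dimension, seed, and step size $h$ are arbitrary test choices) approximates the Hessian of $f$ at a random $v$ by central differences and confirms $v^\textrm{T}Hv \approx 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                  # symmetric A, as in the problem
v = rng.standard_normal(n)

def f(x):
    # Rayleigh quotient f(x) = x^T A x / x^T x
    return (x @ A @ x) / (x @ x)

def hessian_fd(f, x, h=1e-4):
    # central second differences: H[i, j] ~ d^2 f / dx_i dx_j at x
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

H = hessian_fd(f, v)
print(v @ H @ v)                   # close to 0
```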


There are 3 best solutions below


Let $v_1,v_2,\dots, v_n$ be a basis of $R^n$ consisting of eigenvectors of $A$. By (a), $$ \nabla f(x)v_i=\frac{\partial f(x)}{\partial v_i}=0 \quad \forall x\in R^n. $$ So $$ v_j^TH(f)v_i=H(f)(v_i,v_j)=\lim_{t\to0}\frac{\nabla f(x+tv_i)(v_j)-\nabla f(x)(v_j)}{t}=0. $$ Hence for any $v\in R^n$, writing $v=\sum_{i=1}^nc_iv_i$, one has $$ v^TH(f)v=\sum_{i,j=1}^nc_ic_j\,v_j^TH(f)v_i=0. $$


I don't know how to use the hint; perhaps you have some expression involving the Hessian that can be used. Here is a brute-force approach. It works for any square matrix $A$: since $x^{\top}Ax = x^{\top}\big(\tfrac{A+A^{\top}}{2}\big)x$, we may assume without loss of generality that $A$ is symmetric, which justifies $\nabla(x^{\top}Ax)=2x^{\top}A$ below.

We have $$ f(x) = \frac{x^{\top}Ax}{x^{\top}x}. $$ Define $N(x)$ and $D(x)$ as the numerator and denominator functions: $$ N(x) = x^{\top}Ax, \quad D(x) = x^{\top}x. $$ Then $$ D(x) f(x) = x^{\top}Ax. $$

Taking the gradient of both sides, expressing the gradient as a row vector, gives $$ f(x) \nabla D(x) + D(x) \nabla f(x) = 2x^{\top}A. \quad (*)$$ From standard derivative formulas we know $\nabla D(x) = 2x^{\top}$. Substituting this into (*) gives $$ 2f(x) x^{\top} + D(x) \nabla f(x) = 2x^{\top} A. $$

Differentiating again gives $$ 2(\nabla f(x))^{\top}x^{\top} + 2f(x) I + (\nabla D(x))^{\top} \nabla f(x) + D(x) H(x) = 2A, $$ where $I$ is the identity matrix. Substituting $\nabla D(x)=2x^{\top}$ gives $$ 2(\nabla f(x))^{\top} x^{\top} + 2f(x) I + 2x \nabla f(x) + D(x) H(x) = 2A. $$ Substituting $D(x)=x^{\top} x$, using $x \neq 0$, and rearranging terms gives $$ \boxed{H(x) = 2\left(\frac{A - f(x) I - x \nabla f(x) - (\nabla f(x))^{\top}x^{\top}}{x^{\top} x}\right)}. $$

Thus \begin{align} x^{\top} H(x)x &= 2\left(\frac{x^{\top}Ax - f(x)\, x^{\top}x - x^{\top}x\, \nabla f(x)x - x^{\top}(\nabla f(x))^{\top}x^{\top}x }{x^{\top} x}\right)\\ &= -2 \nabla f(x)\, x - 2 x^{\top} (\nabla f(x))^{\top}\\ &= -4 \nabla f(x)\, x. \end{align}


It remains to show $\nabla f(x) x=0$ for all $x \neq 0$.

$$ f(x) = \frac{N(x)}{D(x)} $$ $$ \nabla f(x) = \frac{D(x)\nabla N(x) - N(x)\nabla D(x)}{D(x)^2} = \frac{D(x)2x^{\top}A - N(x) 2x^{\top}}{||x||^4}$$ $$ \nabla f(x) x = \frac{D(x)2x^{\top}Ax - N(x) 2x^{\top}x}{||x||^4}=0$$
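The boxed formula and the two identities above can be checked numerically. A sketch (NumPy; the matrix, vector, and step size are arbitrary test choices) builds $H(x)$ from the closed form, compares it with a finite-difference Hessian, and confirms $x^{\top}H(x)x = 0$ and $\nabla f(x)\,x = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # WLOG symmetric

x = rng.standard_normal(n)

def f(z):
    return (z @ A @ z) / (z @ z)

fx = f(x)
grad = 2 * (x @ A - fx * x) / (x @ x)      # row-vector gradient of f
outer = np.outer(x, grad)                  # x * grad(f);  outer.T = grad(f)^T x^T
H = 2 * (A - fx * np.eye(n) - outer - outer.T) / (x @ x)   # boxed formula

# finite-difference Hessian of f for comparison
h = 1e-5
E = h * np.eye(n)
H_fd = np.array([[(f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                   - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
                  for j in range(n)] for i in range(n)])

print(np.max(np.abs(H - H_fd)))            # small: closed form matches
print(x @ H @ x)                           # close to 0
print(grad @ x)                            # close to 0
```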


$\newcommand{\R}{\Bbb{R}}\newcommand{\x}{\mathbf{x}}\newcommand{\a}{\mathbf{a}}\newcommand{\y}{\mathbf{y}}\newcommand{\v}{\mathbf{v}}$You can do this by using some facts from multivariable calculus.

Fix $\v \in \R^n$. Let $\phi:\R \to \R$ be defined by $\phi(c)=f(c\v)$. By part (b), we know that $\phi$ is constant, so $\phi'(c)=0$ and $\phi''(c)=0$ for all $c\in \R$.

The result will now follow by doing some multivariable calculus computations to show that in fact $\phi''(c) = \v^T (H(f)(c\v))\v$ (then putting $c=1$).

Firstly, note that $$\begin{align*} \phi'(c) &= D_c (f(c\v))\\ &= \v^T \nabla f(c\v) \quad (\text{using fact 1 from the bottom}) \end{align*} $$

where $\nabla f(\x)$ is the gradient vector of $f$ evaluated at $\x$ and is expressed as a column vector.

Therefore, we have

$$\begin{align*} \phi''(c) &= D_c (\v^T \nabla f(c\v)) \\ &= \v^T \left( D_c \nabla f(c\v)\right) \quad (\text{using fact 2 from the bottom})\\ &= \v^T \left( (H(f)(c\v)) \v\right) \quad (\text{ using fact 3 and multivariable chain rule}) \\ &= \v^T (H(f)(c\v)) \v. \end{align*} $$

Put $c=1$ and recall that $\phi''(1)=0$ to get the desired result.
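The core of this argument — that $\phi(c)=f(cv)$ is constant, so $\phi''(1)=0$ — is easy to check numerically. A sketch (NumPy; the matrix, direction, and step size are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # symmetric A
v = rng.standard_normal(n)

def f(x):
    return (x @ A @ x) / (x @ x)

def phi(c):
    return f(c * v)

# phi is constant in c, by part (b) ...
cs = np.linspace(0.5, 3.0, 7)
vals = np.array([phi(c) for c in cs])
print(vals.max() - vals.min())             # close to 0

# ... so a central-difference estimate of phi''(1) vanishes,
# which is exactly v^T H(f)(v) v = 0
h = 1e-4
phi2 = (phi(1 + h) - 2 * phi(1) + phi(1 - h)) / (h * h)
print(phi2)                                # close to 0
```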


A summary of the multivariable calculus facts used is given below.

If $h:\R^{n}\to\R^{m}$, then denote by $D_{\x} (h)$ ("the Jacobian of $h$") the $m\times n$ matrix whose $i,j$ entry is equal to $\dfrac{\partial h_i}{\partial x_j}$ for $i=1,\ldots,m$ and $j=1,\ldots,n$. (Here $h(\x) = \begin{bmatrix} h_1(\x) \\ \vdots \\ h_m(\x)\end{bmatrix}$ where $h_i:\R^n \to \R$ are the components of $h$ and $\x = \begin{bmatrix} x_1 \\ \vdots \\ x_n\end{bmatrix} \in \R^n$.) If we evaluate the Jacobian at a point $\a\in \R^n$, we may denote this as $D_{\x} (h)(\a)$ or $D_{\x} (h(\a))$.

The multivariable chain rule tells us that if $h:\R^{n}\to\R^{m}$ and $g: \R^{p}\to\R^{n}$, with the composite function $h\circ g : \R^p \to \R^m$, denoting $\y = g(\x)$ where $\x\in \R^p$ and $\y\in\R^n$, we have $D_{\x}(h\circ g) = (D_{\y} h)(D_{\x} g)$ (matrix product). Note that in this formula, if $D_{\x}(h\circ g)$ is evaluated at $\a\in \R^p$, then $D_{\y} h$ should be evaluated at $\mathbf{b} = g(\a)\in \R^n$ and $D_{\x} g$ should be evaluated at $\a\in\R^p$.

I leave it as an exercise for you to show the following facts (with some hints), which were used in some form above:

Fact 1. Let $f:\R^n\to \R$ and $\v$ a fixed vector in $\R^n$. Let $\phi:\R\to \R, \quad \phi(c)=f(c\v)$. Then $(D_c \phi) (c)\equiv \phi'(c) = \v^T \nabla f(c\v)$, where $\nabla f(c\v)$ is the gradient vector of $f$ evaluated at $c\v$. (Use multivariable chain rule to show this.)

Fact 2. (Derivative can "move inside" product with fixed matrix) Let $h:\R^n\to \R^m$ and $B$ be a fixed $p\times m$ matrix. Let $g:\R^n \to \R^p$ be given by $g(\x) = B h(\x)$ for $\x\in \R^n$. Then $D_{\x} g = B \left(D_{\x} h\right)$. (You can show this very easily using the multivariable chain rule.)

Fact 3. (Hessian of a function is equal to Jacobian of gradient) Let $f:\R^n\to \R$ be a twice continuously differentiable function. Then $H(f)(\x) = D_{\x}(\nabla f(\x))$ for all $\x\in \R^n$.
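Fact 3 can likewise be illustrated numerically for this particular $f$: the Jacobian of the analytic gradient, estimated column by column by central differences, matches a direct second-difference Hessian of $f$. A sketch with arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # symmetric A
x = rng.standard_normal(n)

def f(z):
    return (z @ A @ z) / (z @ z)

def grad(z):
    # analytic gradient of f (column vector), valid for symmetric A
    return 2 * (A @ z - f(z) * z) / (z @ z)

h = 1e-4
E = np.eye(n)
# Jacobian of the gradient: column j is d(grad f)/dx_j
J = np.column_stack([(grad(x + h * E[j]) - grad(x - h * E[j])) / (2 * h)
                     for j in range(n)])
# Hessian of f directly from second differences of f
H = np.array([[(f(x + h*E[i] + h*E[j]) - f(x + h*E[i] - h*E[j])
                - f(x - h*E[i] + h*E[j]) + f(x - h*E[i] - h*E[j])) / (4*h*h)
               for j in range(n)] for i in range(n)])

print(np.max(np.abs(J - H)))               # small: D_x(grad f) equals H(f)
```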