Why does $\frac{\textbf{g}^T\textbf{d}}{\textbf{d}^T\textbf{H}\textbf{d}}$ give the maximum of function $\mathcal{D}(\textbf{x}+\lambda\textbf{d})$

59 Views Asked by At

Let say I have to find the value $\lambda^*$, that maximizes the following quantity:

$$\lambda^*=\underset{\lambda\in \Lambda}{\text{arg max}}\;\;\mathcal{D}(\textbf{x}+\lambda\textbf{d}),$$

where $\textbf{x},\textbf{d}\in \mathbb{R}^n,\;\lambda\geq0$ and $\mathcal{D}$ is a quadratic function. I read from a paper by Bottou and Lin (page 12) that:

Since $\mathcal{D}$ is a quadratic function, $\mathcal{D}(\textbf{x}+\lambda\textbf{d})$ is shaped like a parabola. The location of its maximum $\lambda^*$ is easily computed using Newton's formula:

$$\lambda^*=\frac{\frac{\partial\mathcal{D}(\textbf{x}+\lambda\textbf{d})}{\partial \lambda}\biggr\rvert_{\lambda=0}}{\frac{\partial^2\mathcal{D}(\textbf{x}+\lambda\textbf{d})}{\partial \lambda^2}\biggr\rvert_{\lambda=0}}=\frac{\textbf{g}^T\textbf{d}}{\textbf{d}^T\textbf{H}\textbf{d}},$$

where $\textbf{g}=\nabla\mathcal{D}(\textbf{x})$ is the gradient vector and $\textbf{H}$ is the hessian of $\mathcal{D}$.

I tried searching for a derivation for this but so far I didn't find any. So my question is: Why does

$$\underset{\lambda\in \Lambda}{\text{arg max}}\;\;\mathcal{D}(\textbf{x}+\lambda\textbf{d})=\frac{\textbf{g}^T\textbf{d}}{\textbf{d}^T\textbf{H}\textbf{d}}\;\;\;?$$

A derivation would be nice :) Thank you for any help!

1

There are 1 best solutions below

4
On BEST ANSWER

From the second order Taylor epansion we get $$ \mathcal D(x+\lambda d) = \mathcal D(x) + \lambda g^T d + (1/2)\lambda^2 d^T H d + O(\lambda^3) $$ Taking the partial derivative regarding $\lambda$ we get $$ F(\lambda) = \frac{\partial \mathcal D(x+\lambda d)}{ \partial \lambda} = g^Td + \lambda d^THd + O(\lambda^2) \quad (*) $$ It seems they continue with a Newton Raphson iteration step to find a root, using initial value $\lambda_0=0$: $$ \lambda_{n+1} = \lambda_{n} - \frac{F(\lambda_n)}{F'(\lambda_n)} \Rightarrow \\ \lambda_1 = -\frac{F(0)}{F'(0)} = -\frac{g^Td}{d^THd} $$ which is the claim except for the sign. If we neglect the $O(\lambda^2)$ terms in $(*)$, we get the same.