Chain Rule twice (second order condition for optimality)


While studying nonlinear programming (more specifically, second order conditions for optimality), I came across the following:

Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ and $d, x^{*} \in \mathbb{R}^{n}$. Also, let $g: \mathbb{R} \rightarrow \mathbb{R}$ be such that:

$$g(\alpha):=f\left(x^{*}+\alpha d\right)$$

The first derivative of $g$ with respect to $\alpha$ equals (by the chain rule):

$$ g^{\prime}(\alpha)=\nabla f\left(x^{*}+\alpha d\right) \cdot d $$
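As a quick sanity check (not part of the original question), this chain-rule formula can be verified numerically. The function $f$, the point $x^{*}$, and the direction $d$ below are hypothetical choices made purely for illustration:

```python
import numpy as np

# Hypothetical example f: R^2 -> R and its gradient, chosen for illustration.
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_f(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

x_star = np.array([1.0, 2.0])   # arbitrary base point
d = np.array([0.5, -1.0])       # arbitrary direction

def g(alpha):
    return f(x_star + alpha * d)

alpha, h = 0.3, 1e-6
fd = (g(alpha + h) - g(alpha - h)) / (2 * h)   # central finite difference of g
analytic = grad_f(x_star + alpha * d) @ d      # chain rule: grad f . d
print(abs(fd - analytic) < 1e-6)               # the two derivatives agree
```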

Now I would like to differentiate again with respect to $\alpha$ to obtain the following expression (this is where I am stuck):

$$ g^{\prime \prime}(\alpha)=\sum_{i, j=1}^{n} f_{x_{i} x_{j}}\left(x^{*}+\alpha d\right) d_{i} d_{j} $$

Attempt:

To make things easier, I considered $n = 2$ and changed notation a little. In that case, differentiating once gives:

$$ \left \langle (f_{x_{1}}\left(x^{*}+\alpha d\right), f_{x_{2}}\left(x^{*}+\alpha d\right)),\left(d_{1}, d_{2}\right)\right\rangle $$

Now, I tried differentiating $$ f_{x_{1}}\left(x^{*}+\alpha d\right) d_{1}+f_{x_{2}}\left(x^{*}+\alpha d\right) d_{2} $$

with respect to $\alpha$ using the product rule and the chain rule, resulting in:

$$ f_{x_{1}x_{1}}\left(x^{*}+\alpha d\right) d_{1} + f_{x_{2}x_{2}}\left(x^{*}+\alpha d\right) d_{2} $$

which is ALMOST: $$ g^{\prime \prime}(\alpha)=\sum_{i, j=1}^{2} f_{x_{i} x_{j}}\left(x^{*}+\alpha d\right) d_{i} d_{j} $$

I just cannot figure out where the $d_{j}$ comes from; it is the missing factor. It very likely comes from the chain rule, but I also cannot see why the result would be $d_{j}$. It looks like the first component $d_{1}$ of $d \in \mathbb{R}^{2}$.

Can someone help? I am really struggling.

Thanks, Lucas

Best Answer

As a general fact, if $\vec r, \vec s:\Bbb R \to \Bbb{R}^n$ are vector-valued functions then there is a product rule for dot products which you can easily prove in components:$$\frac{d (\vec r \cdot \vec s)}{dt} = \frac{d\vec r}{dt} \cdot \vec s(t) + \vec r(t) \cdot \frac{d \vec s}{dt}.$$
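This dot-product rule can also be checked numerically with two made-up curves (all functions below are illustrative assumptions, not from the question):

```python
import numpy as np

# Hypothetical curves r, s: R -> R^3 with hand-computed derivatives.
r  = lambda t: np.array([t ** 2, np.sin(t), t])
dr = lambda t: np.array([2 * t, np.cos(t), 1.0])
s  = lambda t: np.array([np.exp(t), t ** 3, 1.0])
ds = lambda t: np.array([np.exp(t), 3 * t ** 2, 0.0])

t, h = 0.7, 1e-6
# d(r . s)/dt by central finite difference
lhs = (r(t + h) @ s(t + h) - r(t - h) @ s(t - h)) / (2 * h)
# product rule for dot products
rhs = dr(t) @ s(t) + r(t) @ ds(t)
print(abs(lhs - rhs) < 1e-5)   # both sides agree
```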

In your case, $g'(\alpha) = \nabla f(x^* + \alpha \vec d) \cdot \vec d$, where $\vec d$ can be thought of as a constant vector-valued function of $\alpha$, so applying the above we get $$g''(\alpha) = \frac{d \nabla f(x^* + \alpha \vec d)}{d \alpha} \cdot \vec d + \nabla f(x^* + \alpha \vec d) \cdot \underbrace{\frac{d \vec d}{d \alpha}}_{0} = \boxed{\frac{d \nabla f(x^* + \alpha \vec d)}{d \alpha} \cdot \vec d}.$$

By the same reasoning as before, and denoting the first partial of $f$ by $f_1$ and so on, $$\frac{d \nabla f(x^* + \alpha \vec d)}{d \alpha} = \begin{pmatrix} \frac{d f_1(x^* + \alpha \vec d)}{d\alpha} \\ \frac{d f_2(x^* + \alpha \vec d)}{d\alpha} \\ \vdots \end{pmatrix} = \begin{pmatrix} \nabla f_1(x^* + \alpha \vec d) \cdot \vec d \\ \nabla f_2(x^* + \alpha \vec d) \cdot \vec d \\ \vdots \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n f_{1j}(x^* + \alpha \vec d) d_j \\ \sum_{j=1}^n f_{2j}(x^* + \alpha \vec d) d_j \\ \vdots \end{pmatrix} $$

so dotting this with $\vec d$ we get $$g''(\alpha) = \sum_{i=1}^n \left( \sum_{j=1}^n f_{ij}(x^* + \alpha \vec d)d_j \right)d_i = \sum_{i,j=1}^n f_{ij}(x^* + \alpha \vec d)d_i d_j.$$
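The final identity $g''(\alpha) = d^{\top} \nabla^2 f(x^{*} + \alpha d)\, d$ can likewise be checked numerically. The $f$, Hessian, $x^{*}$, and $d$ below are hypothetical choices for illustration only:

```python
import numpy as np

# Hypothetical f: R^2 -> R with Hessian worked out by hand:
# f(x) = x0^2 * x1 + sin(x1)
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def hess_f(x):
    return np.array([[2 * x[1],  2 * x[0]],
                     [2 * x[0], -np.sin(x[1])]])

x_star = np.array([1.0, 2.0])   # arbitrary base point
d = np.array([0.5, -1.0])       # arbitrary direction

def g(alpha):
    return f(x_star + alpha * d)

alpha, h = 0.3, 1e-4
# second derivative of g by a central second difference
fd2 = (g(alpha + h) - 2 * g(alpha) + g(alpha - h)) / h ** 2
# the quadratic form sum_{i,j} f_{ij}(x* + alpha d) d_i d_j
quad = d @ hess_f(x_star + alpha * d) @ d
print(abs(fd2 - quad) < 1e-5)   # the formula matches the finite difference
```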