Computing second order mixed partial derivatives


In this chapter (page 34 of the PDF) they give us the equation $$ \frac{\partial ^2F(x)}{\partial w_i \partial w_j} = g''(s) \frac{\partial s}{\partial w_i}\frac{\partial s}{\partial w_j} + g'(s) \left(\frac{\partial ^2F_{l_1q}(x)}{\partial w_i \partial w_j} + \dots + \frac{\partial ^2F_{l_mq}(x)}{\partial w_i \partial w_j}\right),$$ where $g(s)$ is the one-dimensional function at the output of the network $F(\cdot)$ for input $x$, and $s$ is the sum of the $F_{l_kq}(x)$ for $k = 1, \dots, m$. They claim this is simple differential calculus, but it is slightly lost on me. Luckily there is an example a few pages further down (page 38 of the PDF).

It's a simple two-unit network, given as $F(x, y) = g(w_3x + w_5y + w_4f(w_1x + w_2y))$.

Now I am trying to compute $\frac{\partial^2F(x,y)}{\partial w_1 \partial w_2}$ by hand so I can apply what I learn to the problem I actually have. My approach is:

$$\frac{\partial^2 g}{\partial w_1 \partial w_2} = \frac{\partial}{\partial w_1}\begin{bmatrix} \frac{\partial g}{\partial w_2}\end{bmatrix} = \frac{\partial}{\partial w_1}\begin{bmatrix} \frac{\partial g}{\partial s}\frac{\partial s}{\partial f}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2}\end{bmatrix}$$ Where $s$ is the input of $g$, and $a$ is the input of $f$.

Applying the product rule (the factor in brackets is the one being differentiated), I get as far as: \begin{align} \frac{\partial}{\partial w_1}\begin{bmatrix} \frac{\partial g}{\partial s} \end{bmatrix}\frac{\partial s}{\partial f}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2} &= \frac{\partial^2 g}{\partial s^2}\frac{\partial s}{\partial f}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_1}\frac{\partial s}{\partial f}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2}\\ \frac{\partial g}{\partial s} \frac{\partial}{\partial w_1} \begin{bmatrix}\frac{\partial s}{\partial f}\end{bmatrix}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2} &= \frac{\partial g}{\partial s}\frac{\partial^2 s}{\partial f^2}\frac{\partial f}{\partial a}\frac{\partial a}{\partial w_1} \frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2}\\ &+ \frac{\partial g}{\partial s}\frac{\partial^2 s}{\partial f \partial b}\frac{\partial b}{\partial w_1} \frac{\partial f}{\partial a}\frac{\partial a}{\partial w_2}\\ &+ \dots \end{align}

Here $b$ is just some other summand of $s$, say $w_3x$. Clearly the third line is $0$, but I'm wondering: is my approach correct? Do I just keep applying $\frac{\partial}{\partial w_1}$ to each factor in turn, and am I doing that correctly? That is, would I end up with a term containing the factor $\frac{\partial^2 a}{\partial w_1 \partial w_2}$?

Specifically, about the second line: as I understand it, the first line corresponds to the first summand of the first equation in this post. Would the second line then be part of the second summand?

I hope this is clear enough. I think I understood more of it while typing this out, but I would be grateful for some confirmation.

Best answer

Let's keep track of what you are talking about.

You've rewritten $F$ as

  • $F = g(s)$, where
  • $s = w_3x + w_5y + w_4f$, and
  • $f = f(a)$, and
  • $a = w_1x + w_2y$

where $x, y$ and all the $w_i$ are independent variables.

Now you express $$\begin{align}\frac{\partial F}{\partial w_1} &= \frac {dg}{ds}\frac{\partial s}{\partial w_1}\\ &=\frac {dg}{ds}\frac{\partial s}{\partial f}\frac{\partial f}{\partial w_1}\\ &=\frac {dg}{ds}\frac{\partial s}{\partial f}\frac{d f}{d a}\frac{\partial a}{\partial w_1}\end{align}$$ Note that in this calculation you've already used the fact that $f$ is the only quantity in $s$ that may itself depend on $w_1$.
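As a quick numerical sanity check of this chain rule for the concrete network, here is a minimal Python sketch. The choice $f = g = \tanh$ and the weight and input values are hypothetical, picked only for illustration; the sketch compares $g'(s)\,w_4 f'(a)\,x$ with a central finite difference of $F$ in $w_1$.

```python
import math

# Hypothetical activations for illustration only: f = g = tanh.
def f(a):
    return math.tanh(a)

def df(a):
    # derivative of tanh
    return 1.0 - math.tanh(a) ** 2

g, dg = f, df

def F(w, x, y):
    """Two-unit network g(w3*x + w5*y + w4*f(w1*x + w2*y)); w maps index -> weight."""
    a = w[1] * x + w[2] * y
    s = w[3] * x + w[5] * y + w[4] * f(a)
    return g(s)

def dF_dw1(w, x, y):
    """Chain rule: dg/ds * ds/df * df/da * da/dw1 = g'(s) * w4 * f'(a) * x."""
    a = w[1] * x + w[2] * y
    s = w[3] * x + w[5] * y + w[4] * f(a)
    return dg(s) * w[4] * df(a) * x

def central_diff_w1(w, x, y, h=1e-6):
    """Central finite-difference estimate of dF/dw1."""
    wp, wm = dict(w), dict(w)
    wp[1] += h
    wm[1] -= h
    return (F(wp, x, y) - F(wm, x, y)) / (2 * h)

# Hypothetical sample point.
w = {1: 0.3, 2: -0.7, 3: 0.5, 4: 1.2, 5: -0.4}
x, y = 0.8, -1.1
print(dF_dw1(w, x, y), central_diff_w1(w, x, y))
```

The two printed numbers should agree to many decimal places; any activation with a known derivative could be swapped in for `tanh`.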

Now $$\begin{align}\frac{\partial^2 F}{\partial w_2\partial w_1} &= \frac{\partial }{\partial w_2}\left[\frac{dg}{ds}\frac{\partial s}{\partial f}\frac{df}{da} \frac{\partial a}{\partial w_1}\right]\\ &=\left[\frac{\partial }{\partial w_2}\frac{dg}{ds}\right]\frac{\partial s}{\partial f}\frac{df}{da} \frac{\partial a}{\partial w_1} + \frac{dg}{ds}\left[\frac{\partial }{\partial w_2}\frac{\partial s}{\partial f}\right]\frac{df}{da} \frac{\partial a}{\partial w_1}\\ &\quad+ \frac{dg}{ds}\frac{\partial s}{\partial f}\left[\frac{\partial }{\partial w_2}\frac{df}{da}\right]\frac{\partial a}{\partial w_1} + \frac{dg}{ds}\frac{\partial s}{\partial f}\frac{df}{da}\left[\frac{\partial^2 a }{\partial w_2\partial w_1}\right]\end{align}$$ Where

  • $\dfrac{\partial }{\partial w_2}\dfrac{df}{da} = \dfrac{d^2f}{da^2}\dfrac{\partial a}{\partial w_2}$
  • $\dfrac{\partial }{\partial w_2}\dfrac{\partial s}{\partial f} = \dfrac{\partial^2 s}{\partial f^2} \dfrac{\partial f}{\partial w_2} = \dfrac{\partial^2 s}{\partial f^2}\dfrac{df}{da}\dfrac{\partial a }{\partial w_2}$
  • $\dfrac{\partial }{\partial w_2}\dfrac{dg}{ds} = \dfrac{d^2g}{ds^2}\dfrac{\partial s}{\partial w_2} = \dfrac{d^2g}{ds^2}\dfrac{\partial s}{\partial f}\dfrac{df}{da}\dfrac{\partial a}{\partial w_2}$

So $$\begin{align}\frac{\partial^2 F}{\partial w_2\partial w_1} &= \dfrac{d^2g}{ds^2}\dfrac{\partial s}{\partial f}\dfrac{df}{da}\dfrac{\partial a}{\partial w_2}\frac{\partial s}{\partial f}\frac{df}{da} \frac{\partial a}{\partial w_1}\\ &\quad+ \frac{dg}{ds}\dfrac{\partial^2 s}{\partial f^2}\dfrac{df}{da}\dfrac{\partial a }{\partial w_2}\frac{df}{da} \frac{\partial a}{\partial w_1}\\ &\quad+ \frac{dg}{ds}\frac{\partial s}{\partial f}\dfrac{d^2f}{da^2}\dfrac{\partial a}{\partial w_2}\frac{\partial a}{\partial w_1}\\ &\quad+ \frac{dg}{ds}\frac{\partial s}{\partial f}\frac{df}{da}\frac{\partial^2 a }{\partial w_2\partial w_1}\end{align}$$
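Grouping these four terms also connects the result to the chapter's formula. Writing $\frac{\partial s}{\partial w_i} = \frac{\partial s}{\partial f}\frac{df}{da}\frac{\partial a}{\partial w_i}$, the first term is $\frac{d^2g}{ds^2}$ times the product of the two first derivatives of $s$, and the other three terms are $\frac{dg}{ds}$ times the mixed second derivative of $s$ (this is just $\frac{\partial F}{\partial w_1} = \frac{dg}{ds}\frac{\partial s}{\partial w_1}$ differentiated once more with the product rule):

$$\begin{align}\frac{\partial^2 F}{\partial w_2\partial w_1} &= \frac{d^2g}{ds^2}\frac{\partial s}{\partial w_2}\frac{\partial s}{\partial w_1} + \frac{dg}{ds}\frac{\partial^2 s}{\partial w_2\partial w_1},\\ \frac{\partial^2 s}{\partial w_2\partial w_1} &= \frac{\partial^2 s}{\partial f^2}\frac{df}{da}\frac{\partial a}{\partial w_2}\frac{df}{da}\frac{\partial a}{\partial w_1} + \frac{\partial s}{\partial f}\frac{d^2f}{da^2}\frac{\partial a}{\partial w_2}\frac{\partial a}{\partial w_1} + \frac{\partial s}{\partial f}\frac{df}{da}\frac{\partial^2 a}{\partial w_2\partial w_1}.\end{align}$$

With $s = F_{l_1q}(x) + \dots + F_{l_mq}(x)$ as in the chapter, $\frac{\partial^2 s}{\partial w_i\partial w_j}$ is exactly the sum in the second summand of that formula. So the first line of your calculation belongs to the $g''(s)$ summand, and your second line (together with the terms hidden in the dots) belongs to the $g'(s)$ summand.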

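Here I have differentiated with respect to $w_1$ first and $w_2$ second, the reverse of your order; for twice continuously differentiable $f$ and $g$ the two mixed partials agree, so nothing is lost. In your calculation you worry about whether the other summands of $s$ (your $b$) depend on $w_1$, the variable of the outer derivative, but that ship of generality had already sailed when you assumed them independent of $w_2$ in the inner derivative. All of the $w_i$ should properly be independent of each other. One might let $x, y$ depend on them, but none of the calculations appears to assume that either.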

Now for your particular $s$ and $a$,

  • $\dfrac{\partial s}{\partial f} = w_4$
  • $\dfrac{\partial^2 s}{\partial f^2} = 0$
  • $\dfrac{\partial a}{\partial w_1} = x$
  • $\dfrac{\partial a}{\partial w_2} = y$
  • $\dfrac{\partial^2 a}{\partial w_2\partial w_1} = 0$

So $$\begin{align}\frac{\partial^2 F}{\partial w_2\partial w_1} &= g''(s)w_4f'(a)yw_4f'(a) x + 0 + g'(s)w_4f''(a)yx+ 0\\ &=w_4^2xyg''(s)[f'(a)]^2 + w_4xyg'(s)f''(a) \end{align}$$
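To check this closed form numerically, here is a minimal Python sketch that compares it with a four-point central finite-difference estimate of the mixed partial. As before, $f = g = \tanh$ and the sample weights and inputs are hypothetical choices for illustration, not part of the original example.

```python
import math

# Hypothetical activations: f = g = tanh, with their first two derivatives.
def tanh1(z):
    """First derivative of tanh."""
    t = math.tanh(z)
    return 1.0 - t * t

def tanh2(z):
    """Second derivative of tanh."""
    t = math.tanh(z)
    return -2.0 * t * (1.0 - t * t)

def F(w1, w2, w3, w4, w5, x, y):
    """Two-unit network g(w3*x + w5*y + w4*f(w1*x + w2*y)) with f = g = tanh."""
    a = w1 * x + w2 * y
    s = w3 * x + w5 * y + w4 * math.tanh(a)
    return math.tanh(s)

def mixed_closed_form(w1, w2, w3, w4, w5, x, y):
    """w4^2 x y g''(s) f'(a)^2 + w4 x y g'(s) f''(a)."""
    a = w1 * x + w2 * y
    s = w3 * x + w5 * y + w4 * math.tanh(a)
    return (w4 ** 2 * x * y * tanh2(s) * tanh1(a) ** 2
            + w4 * x * y * tanh1(s) * tanh2(a))

def mixed_fd(w1, w2, w3, w4, w5, x, y, h=1e-4):
    """Four-point central-difference estimate of d^2 F / (dw1 dw2)."""
    def Fw(d1, d2):
        return F(w1 + d1, w2 + d2, w3, w4, w5, x, y)
    return (Fw(h, h) - Fw(h, -h) - Fw(-h, h) + Fw(-h, -h)) / (4 * h * h)

# Hypothetical sample point: (w1, w2, w3, w4, w5, x, y).
args = (0.3, -0.7, 0.5, 1.2, -0.4, 0.8, -1.1)
print(mixed_closed_form(*args), mixed_fd(*args))
```

The stencil is symmetric in $w_1$ and $w_2$, which fits the fact that for smooth $f$ and $g$ the order of differentiation does not matter.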