Partial derivative with respect to a sum

Suppose there are functions $J = f(\sum_k y_k)$ and $y_k = g(x)$, $k \in \mathbb{R}$. Now suppose I want to apply the chain rule to calculate the partial derivative of $J$ w.r.t. $x$. I.e.,

\begin{equation} \frac{\partial J}{\partial x} = \frac{\partial J}{\partial (\sum_k y_k)} \frac{\partial (\sum_k y_k)}{\partial x}. \end{equation}

Because the partial derivative is a linear map, we can indeed write the second term on the right as

\begin{equation} \frac{\partial \sum_k y_k}{\partial x} = \sum_k \frac{\partial y_k}{\partial x} \end{equation} which is standard notation.

But what about the first term $\frac{\partial J}{\partial (\sum_k y_k)}$ on the right?

Does the partial derivative w.r.t. a sum make sense?

If it does, what is the result?

If it does not, why?

There are 2 best solutions below

There are two problems here.

One is that you’re taking a partial derivative with respect to one variable $x$ without specifying which other variables you’re regarding as independent of $x$ and keeping constant. That’s not entirely your fault; our suboptimal notation for partial derivatives makes it easy to do that. As long as you don’t have any other variables that you want to treat as independent of $x$, you should just write a normal (total) derivative.

The second problem is that you seem to be confusing formal arguments and actual arguments. The function $f$ takes one formal argument (for which you haven’t introduced a name). The actual argument, i.e. the value that you substitute for the formal argument in this specific case, is $\sum_ky_k$. Derivatives (both total and partial) are taken with respect to the formal arguments, not with respect to the actual arguments. A conventional way to write this would be to introduce a name for the formal argument of the function $f$, say $f:z\mapsto f(z)$, and then write

$$ \frac{\mathrm dJ}{\mathrm dx}=\frac{\mathrm d}{\mathrm dx}f\left(\sum_ky_k\right)=\frac{\mathrm df}{\mathrm dz}\left(\sum_ky_k\right)\cdot\frac{\mathrm d}{\mathrm dx}\left(\sum_ky_k\right)=\frac{\mathrm df}{\mathrm dz}\left(\sum_ky_k\right)\cdot\sum_k\frac{\mathrm dy_k}{\mathrm dx}\;. $$
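As a quick numerical sanity check of this formula, here is a minimal sketch. The concrete choices $f(z)=\sin z$ and $y_k(x)=x^k$ for $k=1,2,3$ are assumptions made purely for illustration (they are not from the question), and all function names are hypothetical. The point is that $\frac{\mathrm df}{\mathrm dz}$ is computed as a function of the formal argument $z$ and only then evaluated at the actual argument $\sum_k y_k$.

```python
import math

# Illustrative assumptions (not from the question):
# f(z) = sin(z) and y_k(x) = x**k for k = 1, 2, 3.
KS = (1, 2, 3)

def f(z):
    return math.sin(z)

def df_dz(z):
    # Derivative of f with respect to its formal argument z.
    return math.cos(z)

def y(k, x):
    return x ** k

def dy_dx(k, x):
    return k * x ** (k - 1)

def J(x):
    # J(x) = f(sum_k y_k(x))
    return f(sum(y(k, x) for k in KS))

def dJ_dx_chain(x):
    # Chain rule: f'(z) evaluated at the actual argument z = sum_k y_k(x),
    # multiplied by the derivative of the sum.
    z = sum(y(k, x) for k in KS)
    return df_dz(z) * sum(dy_dx(k, x) for k in KS)

def dJ_dx_numeric(x, h=1e-6):
    # Central finite difference, used only as an independent check.
    return (J(x + h) - J(x - h)) / (2 * h)

x0 = 0.7
print(dJ_dx_chain(x0), dJ_dx_numeric(x0))
```

The two printed values should agree to several decimal places, up to finite-difference error.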

Having said all that, people do sometimes write things like

$$ \frac{\mathrm df(abx)}{\mathrm d(ab)} $$

or

$$ \frac{\mathrm df\left(\sum_ky_k\right)}{\mathrm d\left(\sum_ky_k\right)}\;. $$

That’s OK as long as you know what you’re doing and it’s clear to everyone what you mean.

It helps me to use variables for the intermediate derivatives in a chain rule computation, rather than the function names. Let the functions be called $f(t)$ and $y_k(x)$, so $J(x) = f(y_1(x) + \dots + y_n(x))$. If you write it this way, you can see a single-variable-calculus derivation of the derivative.

However, let us use the multivariable chain rule to verify. We have $$ \frac{dJ}{dx} = \sum_k \frac{df}{dt} \frac{\partial t}{\partial y_k} \frac{dy_k}{dx} $$ If $t = y_1 + \dots + y_n$, then $\frac{\partial t}{\partial y_k} = 1$ for every $k$. So $$ \frac{dJ}{dx} = \frac{df}{dt} \sum_k \frac{dy_k}{dx} $$ In prime (Lagrange) notation, $$ J'(x) = f'(y_1(x) + \dots + y_n(x))(y_1'(x) + \dots + y_n'(x)) $$
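
To see the result on a concrete case, here is a small worked example. The choices $f(t) = t^2$, $n = 2$, $y_1(x) = x$ and $y_2(x) = x^2$ are assumptions picked only for illustration. Then $t = x + x^2$ and the formula gives $$ \frac{dJ}{dx} = \frac{df}{dt}\left(\frac{dy_1}{dx} + \frac{dy_2}{dx}\right) = 2(x + x^2)(1 + 2x) = 2x + 6x^2 + 4x^3 $$ Expanding first and differentiating directly gives the same thing: $$ J(x) = (x + x^2)^2 = x^2 + 2x^3 + x^4, \qquad J'(x) = 2x + 6x^2 + 4x^3 $$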