Confusion related to calculation of derivative


I have this function \begin{align} &s = f(\theta,x),\\ &s_1 = f(\theta,x_1),\\ &s_2 = f(\theta,x_2),\\ &P = A^T \left[\begin{array}{cc} s_1 & 0 \\ 0 & s_2 \end{array} \right] A. \end{align}

Now if I take the partial derivative of $P$ with respect to $\theta$, is it equal to $$\displaystyle\frac{\partial P}{\partial \theta} = \frac{\partial P}{\partial s_1} \frac{\partial s_1}{\partial \theta} + \frac{\partial P}{\partial s_2} \frac{\partial s_2}{\partial \theta}\quad ?$$


There are 2 best solutions below


Your notation suggests that $s_1$ and $s_2$ are functions of $\theta$ (alone) - is that correct?

Let's just consider $$P = \left(\begin{array}{cc}s_1 & 0 \\ 0 & s_2\end{array}\right)$$ to begin. (Assuming $A$ is a constant matrix, it only multiplies the result by $A^T$ on the left and by $A$ on the right, so it does not change the structure of this calculation.) Then

$$ \frac{\partial P}{\partial s_1} = \left(\begin{array}{cc}1 & 0 \\ 0 & 0\end{array}\right); \quad \frac{\partial P}{\partial s_2} = \left(\begin{array}{cc}0 & 0 \\ 0 & 1\end{array}\right) $$

so

$$ \begin{aligned} \frac{\partial P}{\partial s_1}\frac{\partial s_1}{\partial \theta} + \frac{\partial P}{\partial s_2}\frac{\partial s_2}{\partial \theta} &= \left(\begin{array}{cc}1 & 0 \\ 0 & 0\end{array}\right)\frac{\partial s_1}{\partial \theta} + \left(\begin{array}{cc}0 & 0 \\ 0 & 1\end{array}\right)\frac{\partial s_2}{\partial \theta}\\ &= \left(\begin{array}{cc}\frac{\partial s_1}{\partial \theta} & 0 \\ 0 & \frac{\partial s_2}{\partial \theta}\end{array}\right) \end{aligned} $$

so it's kind of a trivial application of the chain rule. It would be less trivial if $P$ depended on $s_1$ and $s_2$ in a more complicated way, e.g. $$ P = \left(\begin{array}{cc}s_1^2 & s_1 - s_2 \\ s_1s_2 & s_2\end{array}\right) $$
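
As a worked instance of that less trivial case (a sketch that uses only the example matrix above and assumes $s_1$ and $s_2$ depend on $\theta$ alone), the entrywise partial derivatives are $$ \frac{\partial P}{\partial s_1} = \left(\begin{array}{cc}2s_1 & 1 \\ s_2 & 0\end{array}\right); \quad \frac{\partial P}{\partial s_2} = \left(\begin{array}{cc}0 & -1 \\ s_1 & 1\end{array}\right) $$ and the chain rule then gives $$ \frac{\partial P}{\partial s_1}\frac{\partial s_1}{\partial \theta} + \frac{\partial P}{\partial s_2}\frac{\partial s_2}{\partial \theta} = \left(\begin{array}{cc}2s_1\frac{\partial s_1}{\partial \theta} & \frac{\partial s_1}{\partial \theta} - \frac{\partial s_2}{\partial \theta} \\ s_2\frac{\partial s_1}{\partial \theta} + s_1\frac{\partial s_2}{\partial \theta} & \frac{\partial s_2}{\partial \theta}\end{array}\right) $$ which agrees with differentiating each entry of $P$ directly (the lower-left entry is just the product rule applied to $s_1 s_2$).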

0
On

Yes.

You have the linear map $g(s) = A^T \begin{bmatrix} s_1 & 0 \\ 0 & s_2 \end{bmatrix} A$ with $s = (s_1, s_2)$, and the function $\sigma_x(\theta) = f(\theta,x)$ with components $f_i(\theta, x) = f(\theta, x_i) = s_i$, so that $P = g \circ \sigma_x$.

Then $DP(\theta) = Dg(\sigma_x(\theta)) D \sigma_x(\theta)$.

Since $g$ is linear, we have $Dg(s) = g$, and we have $D \sigma_x(\theta) = \frac{\partial f ( \theta, x)}{\partial \theta} = \begin{bmatrix} \frac{\partial f_1 ( \theta, x)}{\partial \theta} \\ \frac{\partial f_2 ( \theta, x)}{\partial \theta} \end{bmatrix}$, so you have $DP(\theta) = A^T \begin{bmatrix} \frac{\partial f_1 ( \theta, x)}{\partial \theta} & 0 \\ 0 & \frac{\partial f_2 ( \theta, x)}{\partial \theta} \end{bmatrix} A$.

To see it in the form you wrote, note that $Dg(s)(h) = g(h)$, that is $Dg(s)(h) = \frac{\partial g ( s)}{\partial s_1 } h_1 + \frac{\partial g ( s)}{\partial s_2 } h_2$, then $DP(\theta) = \frac{\partial g ( s)}{\partial s_1 } \frac{\partial f_1 ( \theta, x)}{\partial \theta} + \frac{\partial g ( s)}{\partial s_2 } \frac{\partial f_2 ( \theta, x)}{\partial \theta} = A^T \begin{bmatrix} 1 & 0 \\ 0 & 0\end{bmatrix} A \frac{\partial f_1 ( \theta, x)}{\partial \theta} + A^T \begin{bmatrix} 0 & 0 \\ 0 & 1\end{bmatrix} A \frac{\partial f_2 ( \theta, x)}{\partial \theta}$.
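
The formula can be sanity-checked numerically. The following is a minimal sketch, not part of the original answers: the choice $f(\theta, x) = \sin(\theta x)$, the matrix `A`, the points `x1`, `x2`, and all helper names are assumptions made purely for illustration. It compares the chain-rule expression $A^T \operatorname{diag}(\partial s_1/\partial\theta,\ \partial s_2/\partial\theta)\, A$ against a central finite difference of $P(\theta)$.

```python
import math

def f(theta, x):
    # Hypothetical choice of f, used only for this illustration.
    return math.sin(theta * x)

def df_dtheta(theta, x):
    # Analytic partial derivative of the hypothetical f with respect to theta.
    return x * math.cos(theta * x)

def matmul(X, Y):
    # Plain-Python matrix product of two nested-list matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1.0, 2.0], [3.0, 4.0]]   # arbitrary constant matrix (assumption)
x1, x2 = 0.5, 1.5              # arbitrary fixed inputs (assumption)

def sandwich(d1, d2):
    # Computes A^T diag(d1, d2) A.
    D = [[d1, 0.0], [0.0, d2]]
    return matmul(matmul(transpose(A), D), A)

def P(theta):
    return sandwich(f(theta, x1), f(theta, x2))

def dP_chain(theta):
    # Chain-rule result: A^T diag(ds1/dtheta, ds2/dtheta) A.
    return sandwich(df_dtheta(theta, x1), df_dtheta(theta, x2))

def dP_numeric(theta, h=1e-6):
    # Entrywise central finite difference of P.
    Pp, Pm = P(theta + h), P(theta - h)
    return [[(Pp[i][j] - Pm[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

theta0 = 0.7
max_err = max(abs(a - b)
              for row_a, row_b in zip(dP_chain(theta0), dP_numeric(theta0))
              for a, b in zip(row_a, row_b))
print("max entrywise difference:", max_err)
```

The largest entrywise difference should be at the level of finite-difference error, which is what one expects if the chain-rule formula is right for this particular choice of $f$ and $A$.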