Computing partial derivative of a matrix-valued function.


Suppose I have a function $\mathbf y = \mathbf{Xw} + b\mathbf 1$, where $\mathbf X$ is an $N \times D$ matrix, $\mathbf w$ is a $D$-dimensional vector, $b$ is a scalar, and $\mathbf 1$ is an $N$-dimensional vector of 1's.

Now I define another function $\xi = \frac{1}{2N}\|\mathbf y - \mathbf t\|^2$.
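For concreteness, here is a minimal numpy sketch of this setup (the sizes $N=5$, $D=3$ and the random data are illustrative assumptions, not from the question):

```python
import numpy as np

# Illustrative sizes and random data (assumed, not from the question)
rng = np.random.default_rng(0)
N, D = 5, 3
X = rng.standard_normal((N, D))   # N x D data matrix
w = rng.standard_normal(D)        # D-dimensional weight vector
b = 0.7                           # scalar bias
t = rng.standard_normal(N)        # N-dimensional target vector

y = X @ w + b * np.ones(N)            # y = Xw + b1 (note: 1 is N-dimensional)
xi = np.sum((y - t) ** 2) / (2 * N)   # xi = (1/2N)||y - t||^2
print(xi.shape)  # () -- xi is a scalar, while y is an N-vector
```

Note that the shapes only work out if $\mathbf 1$ has $N$ components, since $\mathbf{Xw}$ is an $N$-vector.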

Note that $\mathbf t$ is a vector of scalars (the targets), so $\xi$ itself is a scalar.

If I want to take the partial derivative $\frac{\partial\xi}{\partial \textbf{y}}$, I am a bit confused as to how to work this out.

Would I be computing the gradient?

$\frac{\partial\xi}{\partial \textbf{y}} = \left(\frac{\partial\xi}{\partial y_1}, \ldots, \frac{\partial\xi}{\partial y_N}\right)$

But then computing an arbitrary $i$-th partial in the above gradient gives:

$\frac{\partial\xi}{\partial y_i} = \frac{\partial}{\partial y_i}\left(\frac{1}{2N}\sum_{j=1}^N(y_j - t_j)^2\right) = \frac{1}{N}(y_j - t_j) \cdot \frac{\partial}{\partial y_i}(\textbf{Xw} + b\textbf{1})$

which is where I get stuck.

Any help appreciated!

2 Answers

BEST ANSWER

$\begin{align}\frac{1}{2N}\frac{\partial}{\partial y_i}\|\mathbf y-\mathbf t\|^2 &=\frac{1}{2N}\frac{\partial}{\partial y_i}\sum_{j=1}^N(y_j-t_j)^2\\ &=\frac{1}{N}(y_i-t_i) \end{align}$
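This can be sanity-checked with central finite differences (toy size and random data are assumed purely for illustration):

```python
import numpy as np

# Check d(xi)/dy_i = (y_i - t_i)/N against finite differences (toy data assumed)
rng = np.random.default_rng(1)
N = 4
y = rng.standard_normal(N)
t = rng.standard_normal(N)

xi = lambda y: np.sum((y - t) ** 2) / (2 * N)

analytic = (y - t) / N          # the answer above, componentwise

numeric = np.zeros(N)
eps = 1e-6
for i in range(N):
    e = np.zeros(N)
    e[i] = eps
    # central difference approximation of d(xi)/dy_i
    numeric[i] = (xi(y + e) - xi(y - e)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be ~0 up to floating-point error
```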


As gabriele cassese's answer indicated, if you are taking the derivative of $\xi$ with respect to $y_i$, you don't have to worry about where the $y_i$s came from at all. This answer is about how to find the derivative of $\xi$ with respect to $x_{ij}$, since you indicated an interest in that in a comment.

I find it easiest to approach these sorts of problems if I do everything with scalars and convert back to vectors and matrices later if necessary. So let's rewrite your original function for $y$, doing the matrix multiplication by hand: $$y_i = \sum_j x_{ij}w_j + b_i$$ We also have $$\xi = \frac1{2N}\sum_k(y_k-t_k)^2$$ (I have changed the index to $k$ for clarity in the following step.)

Now we can use the chain rule (keeping the $\frac1N$ that comes from $\frac{\partial\xi}{\partial y_k} = \frac1N(y_k - t_k)$): \begin{align} \frac{\partial\xi}{\partial x_{ij}} &= \sum_k \frac{\partial \xi}{\partial y_k}\frac{\partial y_k}{\partial x_{ij}} \\ &= \sum_k\frac1N(y_k-t_k)\frac{\partial}{\partial x_{ij}}\left(\sum_\ell x_{k\ell}w_\ell+b_k\right) \\ &= \frac1N\sum_k(y_k-t_k)w_j\delta_{ik} \\ &= \frac1N(y_i-t_i)w_j. \end{align} The third line works because the only way for $x_{ij}$ to have an effect on $y_k$ is if $i=k$, and the derivative of the inner sum picks out the $\ell=j$ term. (If you haven't encountered the Kronecker delta $\delta_{ik}$ before, it is equal to $1$ if $i=k$ and $0$ otherwise.)

You could now substitute the definition of $y_i$ back into that final equation (but be careful with your indices), if you wanted your final answer in terms of $\mathbf X$, $\mathbf w$, $b$, and $\mathbf t$.
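As a numerical sanity check, the formula $\frac{\partial\xi}{\partial x_{ij}} = \frac1N(y_i - t_i)w_j$ (note the $\frac1N$ factor inherited from $\xi$'s definition) agrees with finite differences; the sizes and random data below are illustrative assumptions:

```python
import numpy as np

# Check d(xi)/dx_ij = (1/N)(y_i - t_i) w_j against finite differences (toy data)
rng = np.random.default_rng(2)
N, D = 4, 3
X = rng.standard_normal((N, D))
w = rng.standard_normal(D)
b = 0.3
t = rng.standard_normal(N)

def xi(X):
    y = X @ w + b * np.ones(N)
    return np.sum((y - t) ** 2) / (2 * N)

y = X @ w + b * np.ones(N)
analytic = np.outer(y - t, w) / N   # entry (i, j) is (y_i - t_i) w_j / N

numeric = np.zeros((N, D))
eps = 1e-6
for i in range(N):
    for j in range(D):
        E = np.zeros((N, D))
        E[i, j] = eps
        # central difference approximation of d(xi)/dx_ij
        numeric[i, j] = (xi(X + E) - xi(X - E)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be ~0 up to floating-point error
```

The `np.outer` call is just the vectorized form of the scalar result: stacking $(y_i - t_i)w_j$ over all $i, j$ gives the $N \times D$ gradient matrix $\frac1N(\mathbf y - \mathbf t)\mathbf w^\top$.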