Suppose I have a function $\mathbf{y} = \mathbf{Xw} + b\mathbf{1}$, where $\mathbf{X}$ is an $N \times D$ matrix, $\mathbf{w}$ is a $D$-dimensional vector, $b$ is a scalar, and $\mathbf{1}$ is an $N$-dimensional vector of ones.
Now I define another function $\xi = \frac{1}{2N}\|\mathbf{y} - \mathbf{t}\|^2$.
Note that $\mathbf{t}$ is an $N$-dimensional vector of target scalars, so $\xi$ itself is a scalar.
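For concreteness, the setup can be sketched in NumPy (the sizes $N$, $D$ and the random values here are illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3                      # illustrative sizes

X = rng.standard_normal((N, D))  # N x D data matrix
w = rng.standard_normal(D)       # D-dimensional weight vector
b = 0.5                          # scalar bias
t = rng.standard_normal(N)       # N-dimensional target vector

y = X @ w + b * np.ones(N)       # y is N-dimensional
xi = np.sum((y - t) ** 2) / (2 * N)  # scalar loss
print(y.shape, xi.shape)         # (5,) ()
```

Note that $b\mathbf{1}$ must be $N$-dimensional for the addition to be well defined, since $\mathbf{Xw}$ has one entry per row of $\mathbf{X}$.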
If I want to take the partial derivative $\frac{\partial\xi}{\partial \textbf{y}}$, I am a bit confused as to how to work this out.
Would I be computing the gradient?
$\frac{\partial\xi}{\partial \textbf{y}} = \left(\frac{\partial\xi}{\partial y_1}, \ldots, \frac{\partial\xi}{\partial y_N}\right)$
But when I compute the $i$-th partial derivative in the gradient above, I get:
$\frac{\partial\xi}{\partial y_i} = \frac{\partial}{\partial y_i}\left(\frac{1}{2N}\sum_{j=1}^N(y_j - t_j)^2\right) = \frac{1}{N}(y_j - t_j) \cdot \frac{\partial}{\partial y_i}(\mathbf{Xw} + b\mathbf{1})$
which is where I get stuck.
Any help appreciated!
There is no chain-rule factor here: you are differentiating with respect to $y_i$, so $y_i$ is the variable itself and $\mathbf{Xw} + b\mathbf{1}$ never needs to be differentiated. Only the $j = i$ term of the sum depends on $y_i$, so

$\begin{align}\frac{\partial}{\partial y_i}\left(\frac{1}{2N}\|\mathbf{y}-\mathbf{t}\|^2\right) &= \frac{1}{2N}\frac{\partial}{\partial y_i}\sum_{j=1}^N(y_j-t_j)^2\\ &= \frac{1}{N}(y_i-t_i). \end{align}$
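You can sanity-check that $\frac{\partial\xi}{\partial y_i} = \frac{1}{N}(y_i - t_i)$ with a central finite-difference approximation (a sketch with arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
y = rng.standard_normal(N)
t = rng.standard_normal(N)

def xi(y):
    """Scalar loss xi = ||y - t||^2 / (2N)."""
    return np.sum((y - t) ** 2) / (2 * N)

analytic = (y - t) / N  # the gradient derived above

# Perturb each coordinate of y by +/- eps and difference the loss.
eps = 1e-6
numeric = np.zeros(N)
for i in range(N):
    e = np.zeros(N)
    e[i] = eps
    numeric[i] = (xi(y + e) - xi(y - e)) / (2 * eps)

print(np.allclose(analytic, numeric))  # True
```

Each numerical partial matches the analytic formula, confirming that the gradient of the loss with respect to $\mathbf{y}$ is simply $\frac{1}{N}(\mathbf{y} - \mathbf{t})$.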