I have been struggling to derive the gradient of the following function in matrix form (w.r.t. $w$):
$$ H(w) = \sum_{i = 1}^{n} p(y_i - x_i^T w) \log (p(y_i - x_i^T w)) $$
where $$p(x)$$ is the Gaussian,
$$ p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{x^2}{2 \sigma^2}\right) $$
$X$ is $n \times m$, $y$ is $n \times 1$, and $w$ is $m \times 1$.
So far I have applied the chain rule to compute the derivative, but the dimensions do not work out when I try to assemble the gradient with respect to $w$. How can I obtain this gradient?
Let $z = y - X w $ (where $y,z,w$ are column vectors)
Then $z_i = y_i - \sum_j w_j X_{i,j} = y_i - x_i^T w$ (where $x_i$ is the $i$-th row of $X$) and $$\frac{\partial z_i }{\partial w_j}=-X_{i,j}$$
Recall that $ (p \log p) ' = (\log p +1) p'$
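As a quick symbolic sanity check of this identity (a sketch using SymPy, with $p$ left as an abstract positive function):

```python
import sympy as sp

x = sp.symbols('x')
p = sp.Function('p', positive=True)(x)  # abstract density p(x) > 0

# Left side: d/dx [ p log p ]
lhs = sp.diff(p * sp.log(p), x)
# Right side: (log p + 1) p'
rhs = (sp.log(p) + 1) * sp.diff(p, x)

assert sp.simplify(lhs - rhs) == 0  # the identity holds
```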
Then $$\frac{\partial H(w)}{\partial w_j}= - \sum_i \big(\log p(z_i)+1\big) \frac{\partial p(z_i)}{\partial z_i} X_{i,j} $$
$$\frac{\partial H(w)}{\partial w} = - X^T s$$
where $s_i = \big(\log p(z_i)+1\big) \dfrac{\partial p(z_i)}{\partial z_i}$.
Can you go on from here?
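To make the result concrete, here is a numerical sketch (assuming the Gaussian density above with $x^2$ in the exponent, and using $p'(z) = -\tfrac{z}{\sigma^2}\,p(z)$) that computes $\nabla_w H = -X^T s$ and validates it against central finite differences:

```python
import numpy as np

def gaussian(x, sigma=1.0):
    # p(x) = exp(-x^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def H(w, X, y, sigma=1.0):
    z = y - X @ w
    p = gaussian(z, sigma)
    return np.sum(p * np.log(p))

def grad_H(w, X, y, sigma=1.0):
    z = y - X @ w
    p = gaussian(z, sigma)
    dp = -(z / sigma**2) * p          # p'(z_i) for the Gaussian
    s = (np.log(p) + 1) * dp          # s_i = (log p(z_i) + 1) p'(z_i)
    return -X.T @ s                   # gradient, shape (m,)

# Finite-difference check on random data
rng = np.random.default_rng(0)
n, m = 50, 3
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)
w = rng.standard_normal(m)

g = grad_H(w, X, y)
eps = 1e-6
g_fd = np.array([(H(w + eps * e, X, y) - H(w - eps * e, X, y)) / (2 * eps)
                 for e in np.eye(m)])
assert np.allclose(g, g_fd, atol=1e-5)  # analytic gradient matches
```

The `X.T @ s` product is exactly where the dimensions resolve: $s$ is $n \times 1$, so $X^T s$ is $m \times 1$, matching $w$.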