Gradient of Gaussian entropy w.r.t. $W$


I have been struggling to derive the gradient of the following function with respect to $w$ in matrix form:

$$ H(w) = \sum_{i = 1}^{n} p(y_i - x_i^T w) \log (p(y_i - x_i^T w)) $$

where $p(x)$ is the Gaussian density,

$$ p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{x^2}{2 \sigma^2}\right) $$

$X$ is $n \times m$, $y$ is $n \times 1$, and $w$ is $m \times 1$.

So far I have applied the chain rule to work out the derivative, but the dimensions do not add up when I try to assemble the gradient with respect to $w$. How can I obtain this gradient?

Let $z = y - X w$ (where $y$, $z$, and $w$ are column vectors).

Then $z_i = y_i - \sum_j w_j X_{i,j} = y_i - x_i^T w$ (where $x_i$ is the $i$-th row of $X$), and $$\frac{\partial z_i }{\partial w_j}=-X_{i,j}$$

Recall that $(p \log p)' = (\log p + 1)\, p'$.

Then $$\frac{\partial H(w)}{\partial w_j}= - \sum_i (\log p(z_i)+1) \frac{\partial p(z_i)}{\partial z_i} X_{i,j} $$

so, in matrix form, $$\frac{\partial H(w)}{\partial w} = - X^T s$$

where $s_i = (\log p(z_i)+1) \frac{\partial p(z_i)}{\partial z_i}$.
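If it helps to make $s$ fully explicit (this substitution is not in the answer above, just a direct consequence of it): for the Gaussian density in the question, $p'(x) = -\frac{x}{\sigma^2}\, p(x)$, so

$$ s_i = -(\log p(z_i) + 1)\, \frac{z_i}{\sigma^2}\, p(z_i). $$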

Can you go on from here?
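A quick way to build confidence in the closed form $\nabla_w H = -X^T s$ is to compare it against central finite differences. Here is a minimal NumPy sketch with made-up random data and an assumed $\sigma = 1$ (both are arbitrary choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 5, 3, 1.0
X = rng.normal(size=(n, m))
y = rng.normal(size=n)
w = rng.normal(size=m)

def p(x):
    # Gaussian density with mean 0 and variance sigma^2
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def H(w):
    # H(w) = sum_i p(z_i) log p(z_i) with z = y - Xw
    z = y - X @ w
    return np.sum(p(z) * np.log(p(z)))

def grad_H(w):
    # Closed form: -X^T s, with s_i = (log p(z_i) + 1) * p'(z_i)
    z = y - X @ w
    dp = -(z / sigma**2) * p(z)   # p'(z) for the Gaussian
    s = (np.log(p(z)) + 1) * dp
    return -X.T @ s

# Central finite differences, one coordinate of w at a time
eps = 1e-6
num = np.array([
    (H(w + eps * np.eye(m)[j]) - H(w - eps * np.eye(m)[j])) / (2 * eps)
    for j in range(m)
])
print(np.allclose(num, grad_H(w), atol=1e-5))
```

If the analytic gradient is right, `np.allclose` returns `True`; a dimension mistake (e.g. using $X$ instead of $X^T$) shows up immediately as a shape error or a mismatch.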