I would like to use matrix calculus to find the gradient of the following function with respect to $\mathbf{y}$:
$D_{KL}(\mathbf{x},\mathbf{Vy}) = \sum_i[x_i\log\frac{x_i}{(Vy)_i} - x_i + (Vy)_i]$
$\nabla_\mathbf{y}D_{KL} = ?$
Using differentials, I was able to make some progress:
$\mathbf{d}D_{KL} = \mathbf{d} \sum_ix_i\log x_i-\mathbf{d} \sum_i x_i \log (Vy)_i - \mathbf{d} \sum_i x_i + \mathbf{d} \sum_i (Vy)_i$
$ = \mathbf{d} (\mathbf{x^T} \log \mathbf{x}) - \mathbf{d} (\mathbf{x^T} \log(\mathbf{Vy} ) ) - \mathbf{d} \mathbf{(x^T1)} + \mathbf{d} (\mathbf{(Vy)^T1} ) $
$= - \mathbf{d} (\mathbf{x^T} \log(\mathbf{Vy} ) )+ \mathbf{d} (\mathbf{(Vy)^T1} )$
$= - \mathbf{x^T} \mathbf{d}(\log(\mathbf{Vy} ) )+ \mathbf{d} (\mathbf{(Vy)^T1} )$
But my knowledge of differentials stops here. How do I continue this derivation? Ideally, I want to isolate $\mathbf{dy}$ on the right-hand side, so that I can read off the gradient.
EDIT: with the help of @greg, I can continue.
$= - \mathbf{x^T} (\mathbf{d}(\mathbf{Vy}) \oslash \mathbf{Vy} )+ \mathbf{d} (\mathbf{(V^T1)^Ty} )$
$= - \mathbf{x^T} (\text{diag}^{-1}[\mathbf{Vy}]\mathbf{Vdy} )+ \mathbf{(V^T1)^T\mathbf{d} y}$
$\\$
$ \implies \nabla_{\mathbf{y}} D_{KL}= \left(-\mathbf{x^T}\text{diag}^{-1}[\mathbf{Vy}]\mathbf{V} +\mathbf{(V^T1)^T} \right)^T $
$=(-\mathbf{x^T}\text{diag}^{-1}[\mathbf{Vy}]\mathbf{V})^T +\mathbf{V^T1}$
$=-\mathbf{V}^T(\mathbf{x} \oslash (\mathbf{Vy})) +\mathbf{V^T1}$
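
As a sanity check, the closed-form gradient $-\mathbf{V}^T(\mathbf{x} \oslash \mathbf{Vy}) + \mathbf{V^T1}$ can be compared against central finite differences. This is only a rough sketch using NumPy. The function names (`kl_div`, `kl_grad`, `numeric_grad`), the dimensions, and the random positive test data are my own assumptions, chosen so that every entry of $\mathbf{Vy}$ stays positive.

```python
import numpy as np

def kl_div(x, V, y):
    """Generalized KL divergence: sum_i [x_i log(x_i / (Vy)_i) - x_i + (Vy)_i]."""
    z = V @ y
    return np.sum(x * np.log(x / z) - x + z)

def kl_grad(x, V, y):
    """Closed-form gradient -V^T (x / Vy) + V^T 1, written as V^T (1 - x / Vy)."""
    z = V @ y
    return V.T @ (1.0 - x / z)

def numeric_grad(f, y, eps=1e-6):
    """Central finite differences, one coordinate of y at a time."""
    g = np.zeros_like(y)
    for k in range(y.size):
        e = np.zeros_like(y)
        e[k] = eps
        g[k] = (f(y + e) - f(y - e)) / (2 * eps)
    return g

# Hypothetical test data: strictly positive so that log(x / Vy) is defined.
rng = np.random.default_rng(0)
m, n = 6, 4
x = rng.uniform(0.5, 2.0, size=m)
V = rng.uniform(0.5, 2.0, size=(m, n))
y = rng.uniform(0.5, 2.0, size=n)

analytic = kl_grad(x, V, y)
numeric = numeric_grad(lambda t: kl_div(x, V, t), y)
max_abs_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_err)
```

If the derivation is right, the printed difference should be tiny compared with the size of the gradient entries.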
One of the pieces that you are missing is the differential of an elementwise log function.
$$\eqalign{ d\log(z) &= dz\oslash z \\ }$$
where $\oslash$ denotes elementwise/Hadamard division. This can be converted into a regular matrix product using a diagonal matrix
$$\eqalign{ d\log(z) &= Z^{-1}dz \quad{\rm where}\quad Z = {\rm Diag}(z) \\ }$$
Another piece that you're missing is the differential of a product, i.e. for constant $V$
$$\eqalign{ z &= Vy \quad\implies\quad dz = V\,dy \\ }$$
And the final piece is the equivalence between the differential and the gradient.
$$\eqalign{ d\lambda &= g^Tdz \quad\iff\quad \frac{\partial\lambda}{\partial z} = g \\ }$$
Plus a reminder that $\;(Vy)^T{\tt1} = ({V^T\tt1})^Ty$
You should be able to take it from here.
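
For anyone who wants to convince themselves of the elementwise log rule numerically, here is a small NumPy sketch. The vector `z` and the perturbation `dz` are made-up test values (an assumption on my part); the check is that $\log(z+dz)-\log(z)$ agrees with $dz\oslash z$ to first order, and that the Hadamard form matches the ${\rm Diag}(z)^{-1}dz$ form.

```python
import numpy as np

# Hypothetical test values: a positive vector z and a small perturbation dz.
rng = np.random.default_rng(1)
z = rng.uniform(0.5, 2.0, size=5)
dz = 1e-6 * rng.standard_normal(5)

# Actual change in the elementwise log under the perturbation.
exact_change = np.log(z + dz) - np.log(z)

# First-order prediction in Hadamard form: dz divided elementwise by z.
hadamard_form = dz / z

# The same prediction as a matrix product with Z = Diag(z).
diag_form = np.linalg.inv(np.diag(z)) @ dz

hadamard_err = np.max(np.abs(exact_change - hadamard_form))
diag_err = np.max(np.abs(hadamard_form - diag_form))
print("first-order error:", hadamard_err)
print("Hadamard vs Diag form:", diag_err)
```

The first error is second order in `dz`, so it shrinks quadratically as the perturbation gets smaller; the second is just floating-point rounding.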