I have the following situation. I have a $1 \times n$ row vector $v$ and an $n \times 1$ column vector $a$. I would like to represent $v \cdot \frac{d(\sigma(a))}{da}$, where $\sigma$ is the sigmoid function applied to $a$ elementwise. Using the basic fact that $\sigma'(x) = \sigma(x)(1-\sigma(x))$ when $x$ is a scalar, we get that $\frac{d\sigma(a)}{da}$ is a diagonal $n \times n$ matrix whose $i$th diagonal entry is $\sigma(a_i)(1-\sigma(a_i))$. I would then like to represent the $1 \times n$ vector $v \cdot \frac{d\sigma(a)}{da}$ in terms of $v$, $a$, and the usual vector operations. But I can't think of a neat way without resorting to Hadamard products: $v \odot (\sigma(a) \odot (\mathbf{1}-\sigma(a)))^T$, where $\mathbf{1}$ is a vector of ones.
Is there a nicer way to do this? To express either $\frac{d\sigma(a)}{da}$ in terms of $a$, or $v \cdot \frac{d\sigma(a)}{da}$ in terms of $v, a$?
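For concreteness, here is a small NumPy sketch (with hypothetical random values) checking that multiplying $v$ by the diagonal Jacobian agrees with the Hadamard-product expression above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
v = rng.standard_normal((1, n))   # 1 x n row vector
a = rng.standard_normal((n, 1))   # n x 1 column vector

s = 1.0 / (1.0 + np.exp(-a))      # sigma(a), elementwise, shape (n, 1)
d = s * (1.0 - s)                 # diagonal entries sigma(a_i)(1 - sigma(a_i))
J = np.diag(d.ravel())            # diagonal n x n matrix d(sigma(a))/da

lhs = v @ J                       # v . d(sigma(a))/da, shape (1, n)
rhs = v * d.T                     # Hadamard form v ⊙ (s ⊙ (1 - s))^T
print(np.allclose(lhs, rhs))
```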
$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\D{{\rm Diag}}$Define the following variables (using element-wise functions) $$\eqalign{ e &= \exp(a) \\ s &= \sigma(a) = \frac{e}{{\tt1}+e} \\ S &= \D(s) \\ }$$ Then the differential and the Jacobian of interest can be calculated as $$\eqalign{ ds &= (s - s\odot s)\odot da \;=\; \big(S-S^2\big)\,da \\ \p{s}{a} &= S-S^2 \\ }$$ Multiplying this Jacobian by a row or column vector yields $$\eqalign{ v^T\left(\p{s}{a}\right) &\;=\; v^T\big(S-S^2\big) \\ \left(\p{s}{a}\right)v &\;=\; \big(S-S^2\big)\,v \\ }$$
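As a quick numerical sanity check of the Jacobian $S - S^2$ (my own sketch, with hypothetical values), one can compare it against a central finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.standard_normal(n)

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
S = np.diag(sigma(a))
J_analytic = S - S @ S            # the claimed Jacobian S - S^2

# Central finite differences, one coordinate of a at a time
eps = 1e-6
J_numeric = np.empty((n, n))
for j in range(n):
    da = np.zeros(n)
    da[j] = eps
    J_numeric[:, j] = (sigma(a + da) - sigma(a - da)) / (2 * eps)

print(np.allclose(J_analytic, J_numeric, atol=1e-8))
```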
Note that one can eliminate all element-wise computations from the function evaluations by utilizing diagonal matrices, i.e. $$\eqalign{ A &= \D(a) \\ E &= \D(e) = \exp(A) \\ S &= \D(s) = \sigma(A) \;=\; (I+E)^{-1}E \\ }$$
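This all-matrix formulation can also be checked numerically. In the sketch below (hypothetical values), the matrix exponential of the diagonal matrix $A = \D(a)$ is simply $\D(\exp(a))$, so I build $E$ directly that way and verify $(I+E)^{-1}E = \D(\sigma(a))$:

```python
import numpy as np

a = np.array([-1.0, 0.0, 0.5, 2.0])
n = a.size

A = np.diag(a)
E = np.diag(np.exp(a))            # exp(A) for diagonal A is Diag(exp(a))

# S = (I + E)^{-1} E, computed via a linear solve instead of an explicit inverse
S = np.linalg.solve(np.eye(n) + E, E)

S_expected = np.diag(1.0 / (1.0 + np.exp(-a)))   # Diag(sigma(a))
print(np.allclose(S, S_expected))
```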