Derivative d/dx of ||A*sigmoid(x)||^2

Here ||·||^2 is the squared norm, which computes the sum of squares of all elements. A is a matrix and x is a column vector. The sigmoid function is applied elementwise: it returns a list with the sigmoid of each element of its input. I have tried a lot of ways of computing the derivative, but none of them give me the correct answer.

There are 3 best solutions below

0
On

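If $A$ is constant, then (summing over the repeated indices $j$ and $k$, but not over the free index $i$) $$\partial_i(A\sigma(x))_j=A_{jk}\,\partial_i\sigma(x)_k=A_{jk}\,\delta_{ik}\,\sigma^\prime(x_i)=A_{ji}\,\sigma^\prime(x_i).$$ So $$\partial_i\Vert A\sigma(x)\Vert^2=2A_{jk}\,\sigma(x_k)\,A_{ji}\,\sigma^\prime(x_i)=2\left(A^TA\,\sigma(x)\right)_i\,\sigma^\prime(x_i),$$ that is, $\nabla_x\Vert A\sigma(x)\Vert^2=2\,\sigma^\prime(x)\odot\left(A^TA\,\sigma(x)\right)$, where $\odot$ is the elementwise product.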
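The componentwise formula can be checked against central finite differences. The following is a minimal sketch, assuming NumPy and assuming the sigmoid is the logistic function (the question does not say which sigmoid is meant); the helper names `phi`, `grad_phi`, and `numeric_grad` are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    # Logistic function applied elementwise (an assumption about "sigmoid").
    return 1.0 / (1.0 + np.exp(-x))

def phi(A, x):
    # phi(x) = ||A sigmoid(x)||^2
    r = A @ sigmoid(x)
    return r @ r

def grad_phi(A, x):
    # Component i: 2 * (A^T A sigmoid(x))_i * sigmoid'(x_i),
    # using sigmoid' = sigmoid * (1 - sigmoid) for the logistic function.
    s = sigmoid(x)
    return 2.0 * (A.T @ (A @ s)) * (s * (1.0 - s))

def numeric_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # arbitrary test matrix
x = rng.standard_normal(3)        # arbitrary test point

analytic = grad_phi(A, x)
numeric = numeric_grad(lambda v: phi(A, v), x)
print(np.max(np.abs(analytic - numeric)))  # should be tiny
```

The two gradients should agree up to finite-difference error, which is a quick way to catch a misplaced transpose or a missing factor of 2.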

0
On

What we have is the following

$$||\mathbf{A\sigma(x)}||^2 = \mathbf{\sigma^T(x)A^TA\sigma(x)}$$

Then we can use the following rule:

$$\mathbf{\frac{\partial}{\partial x}x^TAx} = \mathbf{x^T(A+A^T)}$$

in conjunction with the chain rule to get the following expression:

$$\mathbf{\frac{\partial}{\partial x}||A\sigma(x)||^2} = \mathbf{2\sigma^T(x)A^TA\cdot\frac{\partial \sigma}{\partial x}}$$

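However, because $\sigma$ is applied elementwise, each output depends only on the corresponding input, so the Jacobian $\frac{\partial \sigma}{\partial x}$ is the diagonal matrix $\operatorname{diag}\left(\sigma'(x_1),\dots,\sigma'(x_n)\right)$.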
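To make the diagonal Jacobian concrete, here is a small sketch, assuming NumPy and the logistic function as the sigmoid (both are assumptions, not part of the answer). It estimates the Jacobian of $\sigma$ numerically, compares it with the diagonal matrix, and then evaluates the chain-rule expression in row-vector form.

```python
import numpy as np

def sigmoid(x):
    # Logistic function applied elementwise (an assumption about "sigmoid").
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # arbitrary test matrix
x = rng.standard_normal(3)        # arbitrary test point

s = sigmoid(x)
J = np.diag(s * (1.0 - s))        # diagonal Jacobian of sigma at x

# Numerical Jacobian: column j holds d sigma / d x_j (central differences).
h = 1e-6
J_num = np.column_stack(
    [(sigmoid(x + h * e) - sigmoid(x - h * e)) / (2.0 * h) for e in np.eye(3)]
)

# Chain rule in row-vector form: 2 sigma^T A^T A (d sigma / dx).
row_form = 2.0 * s @ A.T @ A @ J

# Same quantity written with an elementwise product instead of a matrix.
elementwise = 2.0 * (A.T @ A @ s) * (s * (1.0 - s))

print(np.allclose(J_num, J, atol=1e-7), np.allclose(row_form, elementwise))
```

Since the Jacobian is diagonal, multiplying by it is the same as scaling each component by $\sigma'(x_i)$, so the full matrix never has to be formed in practice.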

0
On

Denote the derivative of the sigmoid $\sigma(\chi)$ as

$$\sigma' = \frac{d\sigma}{d\chi}$$

When these scalar functions are applied elementwise on a vector $x$, they produce vector results

$$s=\sigma(x),\qquad s'=\sigma'(x)$$

In such situations, it's usually more convenient to work with the differential quantity

$$ds = s'\odot dx$$

The $\odot$ symbol represents the elementwise/Hadamard product, but this can be eliminated in favor of multiplication by the diagonal matrix $S' = {\rm Diag}(s')$

$$ds = S'\,dx$$

Now we're ready to calculate the requested gradient.

$$\eqalign{ \phi &= \|As\|^2 \\ &= As:As \\ d\phi &= 2As:A\,ds \\ &= 2A^TAs:ds \\ &= 2A^TAs:S'dx \\ &= 2S'A^TAs:dx \\ \frac{\partial\phi}{\partial x} &= 2S'A^TAs \\ }$$


In some of the steps above, a colon is used to denote the trace/Frobenius product, i.e.

$$A:B = {\rm Tr}(A^TB)$$

The cyclic property of the trace allows such products to be rearranged in a number of ways, e.g.

$$\eqalign{ A:B &= A^T:B^T = B:A \\ A:BC &= B^TA:C = AC^T:B \\ }$$

NB: If your sigmoid function happens to be the logistic function, then there are nice formulas for the scalar derivative

$$\sigma' = \sigma - \sigma^2$$

the vector differential

$$ds = \left(S-S^2\right)dx \qquad{\rm where}\;\; S = {\rm Diag}(s)$$

and the gradient

$$\frac{\partial\phi}{\partial x} = 2(S-S^2)A^TAs$$
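The trace-product rearrangements used in the derivation can be spot-checked numerically. This is only a sketch, assuming NumPy; the helper `frob` and the random test matrices are illustrative choices, and `M` plays the role of $A$ in the gradient derivation so that it does not clash with the generic `A`, `B`, `C` of the identities.

```python
import numpy as np

def frob(X, Y):
    # X:Y = Tr(X^T Y), which equals the sum of elementwise products.
    return np.sum(X * Y)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))

# A:BC = B^T A : C = A C^T : B
lhs = frob(A, B @ C)
mid = frob(B.T @ A, C)
rhs = frob(A @ C.T, B)
trace_def = np.trace(A.T @ (B @ C))   # definition via the trace

# The step 2 A^T A s : S' dx = 2 S' A^T A s : dx from the derivation.
# It relies on S' being diagonal, hence symmetric.
M = rng.standard_normal((6, 3))       # stands in for A in the derivation
s = rng.random(3)                     # stand-in for sigma(x), values in [0, 1)
Sp = np.diag(s - s**2)                # S' for the logistic function
dx = rng.standard_normal(3)
u = 2.0 * M.T @ M @ s
step_lhs = frob(u, Sp @ dx)
step_rhs = frob(Sp @ u, dx)

print(np.isclose(lhs, mid), np.isclose(lhs, rhs), np.isclose(step_lhs, step_rhs))
```

The last check is the step that moves $S'$ from the right factor to the left one; for a non-symmetric matrix the transpose would have to appear explicitly.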