Gradient of a 2-layer network


I am considering a simple $2$-layer network with $m$ training pairs $(x^i, y^i)$, whose cost function is

$$\ell(w,\alpha, \beta) := \sum_{i=1}^{m} \left( y^i - \sigma(w^Tz^i) \right)^2$$

where

$$\sigma(x) := \frac{1}{1+e^{-x}}$$

is the sigmoid function, and $z^i := (z_1^i, z_2^i)^T$ with $z_1^i = \sigma(\alpha^T x^i)$ and $z_2^i = \sigma(\beta^T x^i)$. The gradient is

$$\nabla_w \ell(w, \alpha, \beta) = - \sum_{i=1}^m 2(y^i - \sigma(u^i))\sigma(u^i)(1-\sigma(u^i)) z^i$$

where $u^i := w^T z^i$.

I am trying to derive the gradient by differentiating the cost function, but am not able to do so. After applying the chain rule, I get stuck at the term $\sigma(w^Tz^i)$. Could anyone step me through it, please?


Best answer:

The cost function is

$$\ell(w,\alpha, \beta) := \sum_{i=1}^{m} \left( y^i - \sigma(w^Tz^i) \right)^2$$

Now take the derivative with respect to $w$, noting that differentiating $y^i - \sigma(w^Tz^i)$ produces a minus sign: $$\nabla_w \ell(w,\alpha,\beta) = -2 \sum_{i=1}^{m} \left( y^i - \sigma(w^Tz^i) \right)D(\sigma(w^Tz^i))$$

We note that $$D(\sigma(w^Tz^i))=\sigma(w^Tz^i)(1-\sigma(w^Tz^i))D(w^Tz^i)$$ $$=\sigma(w^Tz^i)(1-\sigma(w^Tz^i))z^i$$

Substituting the second display into the first recovers the stated gradient.

(Note that we used the usual identity for the sigmoid: since $\sigma(x) = (1+e^{-x})^{-1}$, we have $\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = \sigma(x)\bigl(1-\sigma(x)\bigr)$.)
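As a sanity check on the derivation, here is a minimal NumPy sketch that compares the analytic gradient $-2\sum_i (y^i - \sigma(u^i))\,\sigma(u^i)(1-\sigma(u^i))\,z^i$ against central finite differences. The shapes and variable names (an $m \times d$ data matrix `X`, two hidden units) are assumptions chosen to match the setup in the question, not anything specified by the original poster.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w, alpha, beta, X, y):
    # z^i = (sigma(alpha^T x^i), sigma(beta^T x^i)), stacked as rows of Z
    Z = np.column_stack([sigmoid(X @ alpha), sigmoid(X @ beta)])
    return np.sum((y - sigmoid(Z @ w)) ** 2)

def grad_w(w, alpha, beta, X, y):
    # Analytic gradient: -2 * sum_i (y^i - sigma(u^i)) sigma(u^i)(1 - sigma(u^i)) z^i
    Z = np.column_stack([sigmoid(X @ alpha), sigmoid(X @ beta)])
    s = sigmoid(Z @ w)  # sigma(u^i) for each sample i
    return -2.0 * Z.T @ ((y - s) * s * (1 - s))

# Random problem instance (sizes are illustrative assumptions)
rng = np.random.default_rng(0)
m, d = 5, 3
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
alpha, beta = rng.normal(size=d), rng.normal(size=d)
w = rng.normal(size=2)

# Central finite differences on each coordinate of w
eps = 1e-6
num = np.array([
    (loss(w + eps * e, alpha, beta, X, y)
     - loss(w - eps * e, alpha, beta, X, y)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(num, grad_w(w, alpha, beta, X, y), atol=1e-6))
```

If the sign in the derivation were dropped, this check would fail, which makes it a quick way to catch exactly the kind of chain-rule slip the question asks about.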