How can the change in a sigmoid neuron's output, $\Delta\text{output}$, be approximated using calculus?


Let $x$ be the input, $w$ the weights, and $b$ the bias.

Then define $z$ as

$$z = w \cdot x + b \equiv \sum_j w_j x_j + b$$

and define output as

$$\text{output} = \begin{cases} 0 & \text{if $w \cdot x + b ≤ 0$}\\[2ex] 1 & \text{if $w \cdot x + b > 0$} \end{cases}$$

Here $z=\sum_j w_j x_j+b$ is a linear (affine) function of the weights and the bias. The sigmoid function $\sigma$ is defined as $$\sigma(z)=\frac{1}{1+e^{-z}}$$

or, explicitly,

$$\sigma(z)=\frac{1}{1+\exp\left(-\sum_j w_jx_j-b\right)}$$

Replacing the step function with $\sigma$ gives a sigmoid neuron, whose output $\sigma(w \cdot x + b)$ is a smooth version of the piecewise step function.
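To make the relationship between the two outputs concrete, here is a minimal Python sketch. It assumes the standard logistic form $\sigma(z)=1/(1+e^{-z})$; the function names `step` and `sigmoid` and the sample values of $z$ are chosen only for illustration.

```python
import math

def step(z):
    """Perceptron output: 0 if z <= 0, otherwise 1."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)) (assumed form of sigma)."""
    return 1.0 / (1.0 + math.exp(-z))

# Far from z = 0 the two outputs nearly agree; near z = 0 the sigmoid
# varies smoothly where the step function jumps.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:+6.1f}  step={step(z):.0f}  sigmoid={sigmoid(z):.5f}")
```

The point of the comparison is that the step function has no useful derivative at $z=0$, while the sigmoid is differentiable everywhere.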

The smoothness of $\sigma$ means that small changes $\Delta w_j$ in the weights and $\Delta b$ in the bias will produce a small change $\Delta\text{output}$ in the output from the neuron.

In fact, $\Delta \text{output}$ is well approximated by

$$\Delta\text{output}\approx \sum_j \frac{\partial\text{ output}}{\partial w_j}\Delta w_j + \frac{\partial\text{ output}}{\partial b}\Delta b$$

Would anyone be able to give a detailed proof of why this expression approximates the change in the output?

The original article is from here


There are 1 best solutions below


This follows from a Taylor expansion of the perturbation.

Let the output be denoted by $\mathcal{O}(w,b)$, where $w\in\mathbb{R}^m$ and $b\in\mathbb{R}$.

We can then define the change in output due to a perturbation in $w$ and $b$ via: $$\Delta \mathcal{O} = \mathcal{O}(w+\Delta w,b+\Delta b) - \mathcal{O}(w,b) $$

Using the first-order Taylor expansion about $(w,b)$, we get: \begin{align} \mathcal{O}(w+\Delta w,b+\Delta b) &\approx \mathcal{O}(w,b) + [\Delta w,\Delta b]^T\,\nabla_{w,b} \mathcal{O}(w,b)\\ &= \mathcal{O}(w,b) + \sum_j \Delta w_j\,\partial_{w_j}\mathcal{O}+\Delta b\,\partial_{b}\mathcal{O} \end{align} Subtracting $\mathcal{O}(w,b)$ from both sides gives the perturbation: $$ \Delta \mathcal{O} \approx \sum_j \Delta w_j\,\partial_{w_j}\mathcal{O}+\Delta b\,\partial_{b}\mathcal{O} $$ where $\partial_x f = \partial f/\partial x$. The neglected terms are of second order in $(\Delta w,\Delta b)$, so the approximation improves as the perturbation shrinks.
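The first-order approximation above can be checked numerically. The following is a rough Python sketch, not part of the original answer: it assumes the output is $\mathcal{O}(w,b)=\sigma(w\cdot x+b)$ with the logistic $\sigma(z)=1/(1+e^{-z})$, for which the chain rule gives $\partial_{w_j}\mathcal{O}=\sigma'(z)\,x_j$ and $\partial_b\mathcal{O}=\sigma'(z)$ with $\sigma'(z)=\sigma(z)(1-\sigma(z))$. The helper names and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)) (assumed form of sigma)."""
    return 1.0 / (1.0 + math.exp(-z))

def output(w, b, x):
    """Sigmoid neuron output sigma(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

def gradient(w, b, x):
    """Analytic partials: dO/dw_j = sigma'(z) * x_j and dO/db = sigma'(z)."""
    s = output(w, b, x)
    ds = s * (1.0 - s)  # sigma'(z) = sigma(z) * (1 - sigma(z))
    return [ds * xj for xj in x], ds

def compare(w, b, x, dw, db):
    """Return (exact change in output, first-order Taylor estimate)."""
    w_new = [wj + d for wj, d in zip(w, dw)]
    exact = output(w_new, b + db, x) - output(w, b, x)
    grad_w, grad_b = gradient(w, b, x)
    linear = sum(g * d for g, d in zip(grad_w, dw)) + grad_b * db
    return exact, linear

# Hypothetical values, chosen only for illustration.
x = [0.5, -1.2, 2.0]
w = [0.3, 0.8, -0.5]
b = 0.1
dw = [0.02, -0.01, 0.03]
db = 0.01

# Shrink the perturbation and watch the gap between the two quantities.
for scale in (1.0, 0.5, 0.25):
    exact, linear = compare(w, b, x, [scale * d for d in dw], scale * db)
    err = abs(exact - linear)
    print(f"scale={scale:4.2f}  exact={exact:+.6e}  linear={linear:+.6e}  error={err:.2e}")
```

If the approximation is first order, halving the perturbation should cut the error by roughly a factor of four, since the leading neglected term is quadratic in $(\Delta w,\Delta b)$.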

Note: to make things easier, I suggest renaming $b$ to $w_0$ and letting it be part of $w$.
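One common way to read this note, stated here as an assumption rather than something spelled out in the answer, is to prepend a constant input $x_0=1$ so that $w_0x_0$ plays the role of $b$. A small Python sketch with hypothetical function names and values:

```python
def neuron_z(w, b, x):
    """z = w . x + b with a separate bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def neuron_z_augmented(w_aug, x):
    """z = w_aug . x_aug, where x_aug = [1, x_1, ..., x_m] and w_aug[0] acts as the bias."""
    x_aug = [1.0] + list(x)  # assumed constant input x_0 = 1
    return sum(wj * xj for wj, xj in zip(w_aug, x_aug))

# Hypothetical values, chosen only for illustration.
x = [0.5, -1.2, 2.0]
w = [0.3, 0.8, -0.5]
b = 0.1

print(neuron_z(w, b, x), neuron_z_augmented([b] + w, x))
```

With this convention the approximation becomes a single sum, $\Delta\mathcal{O}\approx\sum_{j\ge 0}\Delta w_j\,\partial_{w_j}\mathcal{O}$, with no separate bias term.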