Finding $w_1,w_2,b$ such that $w_2 \sigma(w_1 x + b) \approx x$


Let $\sigma(z) = 1/(1+e^{-z})$. How can I find $w_1, w_2, b$ such that $w_2 \sigma(w_1 x + b) \approx x$ for $x \in [0,1]$?

The hint provided was to rewrite $x = \frac{1}{2}+\Delta$, assume $w_1$ is small, and use a Taylor expansion in $w_1 \Delta$.

(This problem comes from chapter 5 of Neural Networks and Deep Learning.)

Attempt: Performing the substitution from the hint yields

$$ w_2 \sigma\left(\frac{1}{2} w_1 + w_1 \Delta + b\right) \approx \frac{1}{2} + \Delta. $$

Assuming $w_1$ is small enough that the $\frac{1}{2} w_1$ term can be neglected,

$$ w_2 \sigma(w_1 \Delta + b) \approx \frac{1}{2} + \Delta. $$

Taking a first-order Taylor expansion of the LHS about $w_1 \Delta = 0$ gives

$$ w_2 \sigma(b) + w_1 w_2 \sigma'(b) \Delta \approx \frac{1}{2} + \Delta, $$

so $w_2 \sigma(b) \approx \frac{1}{2}$ and $w_1 w_2 \sigma'(b) \approx 1$, thus $w_2 \approx \frac{1}{2 \sigma(b)}$ and $w_1 \approx \frac{2 \sigma(b)}{\sigma'(b)}$. However, when I tried a few values of $b$, none of them produced very good approximations of $x$.
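A quick numerical check of these formulas can be sketched as follows (a minimal sketch, not part of the original attempt; the helper names `first_order_params` and `max_abs_error` and the sample values of $b$ are assumptions chosen for illustration). Two things are worth noticing in the formulas themselves: since $\sigma'(b) = \sigma(b)(1-\sigma(b))$, the expression $\frac{2\sigma(b)}{\sigma'(b)}$ equals $2(1+e^{b})$, which is always larger than 2, and $w_2 \sigma(b) = \frac{1}{2}$ fixes the value at $x = 0$ rather than at $x = \frac{1}{2}$.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def first_order_params(b):
    """Parameters from the truncated expansion:
    w2 = 1 / (2 sigma(b)),  w1 = 2 sigma(b) / sigma'(b)."""
    s = sigmoid(b)
    ds = s * (1.0 - s)  # sigma'(b) = sigma(b) * (1 - sigma(b))
    w1 = 2.0 * s / ds
    w2 = 1.0 / (2.0 * s)
    return w1, w2


def max_abs_error(w1, w2, b, n=1001):
    """Largest |w2 * sigma(w1 * x + b) - x| on an evenly spaced grid over [0, 1]."""
    xs = (i / (n - 1) for i in range(n))
    return max(abs(w2 * sigmoid(w1 * x + b) - x) for x in xs)


# Illustrative values of b (an assumption; the question does not say which were tried).
first_order_results = {}
for b in (-3.0, -2.0, -1.0, 0.0, 1.0):
    w1, w2 = first_order_params(b)
    err = max_abs_error(w1, w2, b)
    first_order_results[b] = (w1, w2, err)
    print(f"b={b:+.1f}  w1={w1:.3f}  w2={w2:.3f}  max|error|={err:.3f}")
```

Because the output at $x = 0$ is pinned to $\frac{1}{2}$, the maximum error on $[0,1]$ cannot drop below $\frac{1}{2}$ for any $b$, which is consistent with none of the tried values working well.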

For what it's worth, the parameters $w_1=3.98, w_2=1.14, b=-2.27$ work pretty well (found via a grid search).

Best answer

I think the problem lies in the assumption that $w_1$ is small.

Here is what I tried. Expanding as a Taylor series around $x=\frac 12$ gives

$$w_2 \sigma (w_1 x+b)=w_2 \sigma \left(\frac{w_1}{2}+b\right)+w_1 w_2 \sigma '\left(\frac{w_1}{2}+b\right)\left(x-\frac{1}{2}\right)+O\left(\left(x-\frac{1}{2}\right)^2\right) $$

Ignoring the higher-order terms and collecting the constant and linear parts,

$$w_2 \sigma (w_1 x+b)\approx -\frac{1}{2} w_1 w_2 \sigma '\left(\frac{w_1}{2}+b\right)+w_2 \sigma \left(\frac{w_1}{2}+b\right)+w_1 w_2 \sigma '\left(\frac{w_1}{2}+b\right)x$$

So you want

$$w_1 w_2 \sigma '\left(\frac{w_1}{2}+b\right)=1$$

$$-\frac{1}{2} w_1 w_2 \sigma '\left(\frac{w_1}{2}+b\right)+w_2 \sigma \left(\frac{w_1}{2}+b\right)=0$$

The first equation gives $w_2$ as a function of $b$ and $w_1$. In the second equation, $w_2$ is a common factor and cancels, which leaves an implicit relation between $w_1$ and $b$. Of course, if we neglect the $\frac {w_1} 2$ terms inside the arguments of $\sigma$ and $\sigma'$, we recover your results.

Substituting the definition of $\sigma(z)$ and simplifying, the equations become

$$2 e^{\frac{w_1}{2}+b}-w_1+2=0$$

$$\frac{w_1 w_2 e^{-\frac{w_1}{2}-b}}{\left(e^{-\frac{w_1}{2}-b}+1\right)^2}-1=0$$

As functions of $w_1$, the solutions are $$w_2=\frac{1}{w_1-2}+\frac{1}{2}$$ $$b=\log \left(\frac{w_1-2}{2}\right)-\frac{w_1}{2}$$
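One way to see where these expressions come from is sketched below. The shorthand $c = \frac{w_1}{2}+b$ is introduced here only for brevity and is not part of the original notation. Combining the two matching conditions gives $w_1 w_2 \sigma'(c) = 1$ and $w_2 \sigma(c) = \frac{1}{2}$; dividing the first by the second and using $\sigma'(c) = \sigma(c)\,(1-\sigma(c))$:

$$w_1 \frac{\sigma'(c)}{\sigma(c)} = w_1 \left(1-\sigma(c)\right) = \frac{w_1}{1+e^{c}} = 2 \quad\Longrightarrow\quad e^{c} = \frac{w_1-2}{2}$$

$$w_2 = \frac{1}{2\,\sigma(c)} = \frac{1+e^{-c}}{2} = \frac{1}{2}+\frac{1}{w_1-2}, \qquad b = c-\frac{w_1}{2} = \log \left(\frac{w_1-2}{2}\right)-\frac{w_1}{2}$$

Note that $e^{c} > 0$ requires $w_1 > 2$, so this family has no member with small $w_1$.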

If we use the above expressions and push the Taylor expansion to higher order, we obtain

$$\frac{1}{2}+\left(x-\frac{1}{2}\right)+\left(2-\frac{w_1}{2}\right) \left(x-\frac{1}{2}\right)^2+O\left(\left(x-\frac{1}{2}\right)^3\right)$$

The quadratic term vanishes for $w_1=4$, which makes it the best choice in this local sense. It corresponds to $w_2=1$ and $b=-2$, which are very close to the solution you gave.
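To compare the Taylor-matched parameters with the grid-search values from the question, something like the following could be used (a rough sketch; `params_from_w1` and `errors` are hypothetical helper names, and the 1001-point grid is an arbitrary choice). Matching derivatives at $x=\frac{1}{2}$ controls the behaviour near the midpoint only, so there is no reason to expect it to minimise the error over the whole interval.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def params_from_w1(w1):
    """One-parameter family derived above:
    w2 = 1/(w1 - 2) + 1/2,  b = log((w1 - 2)/2) - w1/2  (requires w1 > 2)."""
    if w1 <= 2.0:
        raise ValueError("the family is only defined for w1 > 2")
    w2 = 1.0 / (w1 - 2.0) + 0.5
    b = math.log((w1 - 2.0) / 2.0) - w1 / 2.0
    return w2, b


def errors(w1, w2, b, n=1001):
    """Return (max absolute error, mean squared error) of
    w2 * sigma(w1 * x + b) against x on an evenly spaced grid over [0, 1]."""
    diffs = [w2 * sigmoid(w1 * (i / (n - 1)) + b) - i / (n - 1) for i in range(n)]
    return max(abs(d) for d in diffs), sum(d * d for d in diffs) / n


# Taylor-matched choice w1 = 4, which should give w2 = 1 and b = -2.
w2_t, b_t = params_from_w1(4.0)
taylor_max, taylor_mse = errors(4.0, w2_t, b_t)

# Grid-search parameters quoted in the question.
grid_max, grid_mse = errors(3.98, 1.14, -2.27)

print(f"Taylor: w2={w2_t:.3f} b={b_t:.3f} max|err|={taylor_max:.4f} mse={taylor_mse:.6f}")
print(f"Grid:   max|err|={grid_max:.4f} mse={grid_mse:.6f}")
```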

Edit

The problem can be solved more simply, without writing out the Taylor series, by considering the function $$f(x)=\frac{w_2}{1+e^{-w_1 x-b}}$$ Basically, what is required is $$f\left(\frac 12\right)=\frac 12 \quad , \quad f'\left(\frac 12\right)=1 \quad , \quad f''\left(\frac 12\right)=0$$ Here again, the first and second equations allow two of the unknowns to be eliminated as functions of the third; the last equation then fixes the remaining parameter. Of course, the same results are obtained.
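These three conditions can be checked numerically with closed-form derivatives of the sigmoid, as in the sketch below (the helper names `f_and_derivs` and `params_from_w1` are assumptions for illustration; the derivative formulas use $\sigma' = \sigma(1-\sigma)$ and $\sigma'' = \sigma(1-\sigma)(1-2\sigma)$). For members of the family $w_2=\frac{1}{w_1-2}+\frac{1}{2}$, $b=\log\left(\frac{w_1-2}{2}\right)-\frac{w_1}{2}$, the first two conditions should hold for any $w_1>2$, while $f''\left(\frac 12\right)$ should equal $4-w_1$, twice the quadratic Taylor coefficient $2-\frac{w_1}{2}$.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def f_and_derivs(x, w1, w2, b):
    """Return f(x), f'(x), f''(x) for f(x) = w2 * sigma(w1 * x + b)."""
    s = sigmoid(w1 * x + b)
    f = w2 * s
    df = w1 * w2 * s * (1.0 - s)                        # chain rule, sigma' = s(1-s)
    d2f = w1 ** 2 * w2 * s * (1.0 - s) * (1.0 - 2.0 * s)  # sigma'' = s(1-s)(1-2s)
    return f, df, d2f


def params_from_w1(w1):
    """w2 and b as functions of w1 (requires w1 > 2)."""
    w2 = 1.0 / (w1 - 2.0) + 0.5
    b = math.log((w1 - 2.0) / 2.0) - w1 / 2.0
    return w2, b


# Sample values of w1 chosen for illustration; w1 = 4 is the one that zeroes f''.
checks = {}
for w1 in (3.0, 4.0, 5.0):
    w2, b = params_from_w1(w1)
    checks[w1] = f_and_derivs(0.5, w1, w2, b)
    f, df, d2f = checks[w1]
    print(f"w1={w1:.1f}  f(1/2)={f:.6f}  f'(1/2)={df:.6f}  f''(1/2)={d2f:.6f}")
```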