multiplication with neural nets


I have two functions $f(x)$ and $g(x)$, each of which I can realize with a neural net, $\phi_f$ and $\phi_g$ respectively. My question is: how can I write a neural net for $f(x)g(x)$? For example, if $g(x)$ is constant and equal to $c$ and $\phi_f = ((A_1,b_1),\dots,(A_L,b_L))$, then $\phi_{fg} = ((A_1,b_1),\dots,(cA_L,cb_L))$.
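The constant case can be checked numerically. A minimal sketch with NumPy, assuming ReLU activations and the convention that the activation is applied after every layer except the last (the layer shapes and random weights are just for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(net, x):
    """Evaluate a net ((A_1,b_1),...,(A_L,b_L)): activation after
    every layer except the final affine one."""
    for A, b in net[:-1]:
        x = relu(A @ x + b)
    A, b = net[-1]
    return A @ x + b

rng = np.random.default_rng(0)
phi_f = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
         (rng.standard_normal((1, 3)), rng.standard_normal(1))]

c = 2.5
# scale ONLY the last affine layer by c
phi_cf = phi_f[:-1] + [(c * phi_f[-1][0], c * phi_f[-1][1])]

x = rng.standard_normal(2)
print(np.allclose(c * realize(phi_f, x), realize(phi_cf, x)))  # True
```

Since no activation follows the last layer, scaling $(A_L, b_L)$ by $c$ scales the whole output by $c$ exactly.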

Actually, I only need to show it for $f(x)=x$ and $g(x)=x^2$, if that makes anything easier.

1 Answer

The answer to your question depends a little bit on your activation function. However, the main idea is always the same: find a third network $\phi_{\rm mult}$ that realises the function $(x,y) \mapsto xy$. Then you obtain a neural network that realises $fg$ by composing $\phi_f$ and $\phi_g$ with $\phi_{\rm mult}$.

Two questions remain: 1. how do we compose neural networks, and 2. how do we construct the multiplication network $\phi_{\rm mult}$? The answer to the first question is spelled out in Definitions 2.9 and 2.10 of these lecture notes; I will not repeat it here, since it is a lot to write. As for how to construct $\phi_{\rm mult}$, things depend on the activation function.
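At the level of weights, composing two nets amounts to merging the last affine map of the inner net with the first affine map of the outer net, since no activation sits between them (this is the content the cited definitions make precise). A sketch, again assuming ReLU activations and the usual "no activation after the last layer" convention:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(net, x):
    # activation after every layer except the last
    for A, b in net[:-1]:
        x = relu(A @ x + b)
    A, b = net[-1]
    return A @ x + b

def compose(phi2, phi1):
    """Net realizing phi2 o phi1: the last affine map of phi1 and the
    first affine map of phi2 collapse into a single affine layer,
    A2 (A1 h + b1) + b2 = (A2 A1) h + (A2 b1 + b2)."""
    A1, b1 = phi1[-1]
    A2, b2 = phi2[0]
    return phi1[:-1] + [(A2 @ A1, A2 @ b1 + b2)] + phi2[1:]

rng = np.random.default_rng(1)
phi1 = [(rng.standard_normal((4, 2)), rng.standard_normal(4)),
        (rng.standard_normal((3, 4)), rng.standard_normal(3))]
phi2 = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
        (rng.standard_normal((1, 5)), rng.standard_normal(1))]

x = rng.standard_normal(2)
print(np.allclose(realize(compose(phi2, phi1), x),
                  realize(phi2, realize(phi1, x))))  # True
```

Note the composed net has $L_1 + L_2 - 1$ layers, one fewer than the sum, because of the merged layer.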

If your activation function $\rho: \mathbb{R} \to \mathbb{R}$ is smooth and has a non-zero second derivative somewhere, then the following trick yields an approximation to a multiplication network:

Let $\xi$ be a point where $\rho''(\xi) =: a \neq 0$. Then, by Taylor's theorem, $$ N \rho(x/N + \xi) - N \rho(\xi) \to \rho'(\xi)\, x $$ and $$ N^2 \rho(x/N + \xi) - N^2 \rho(\xi) - N \rho'(\xi)\, x \to \frac{a}{2} x^2 $$ as $N \to \infty$. Using this, you can construct a neural network $\phi_{\rm square}$ whose realisation approximates $x \mapsto x^2$ arbitrarily well. Then, using the polarisation identity $$ (x+y)^2 - (x-y)^2 = 4xy, $$ you can construct from the squaring network an approximation to the multiplication.
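These limits are easy to check numerically. A quick sketch using the logistic sigmoid as $\rho$ with $\xi = 1$ (where $\rho'' \neq 0$) and finite differences for the derivatives; both choices are just for illustration:

```python
import numpy as np

def rho(x):
    # logistic sigmoid: smooth, with rho'' != 0 away from the origin
    return 1.0 / (1.0 + np.exp(-x))

xi = 1.0
h = 1e-4   # central finite differences for rho'(xi) and rho''(xi)
d1 = (rho(xi + h) - rho(xi - h)) / (2 * h)
d2 = (rho(xi + h) - 2 * rho(xi) + rho(xi - h)) / h**2

N = 1e4

def approx_sq(t):
    # converges to (a/2) t^2 with a = rho''(xi) as N -> infinity
    return N**2 * rho(t / N + xi) - N**2 * rho(xi) - N * d1 * t

x, y = 0.7, -0.3
print(approx_sq(x), d2 / 2 * x**2)   # nearly equal
# polarisation: approx_sq(x+y) - approx_sq(x-y) ~ (a/2) * 4xy,
# so dividing by 2a recovers the product xy
print((approx_sq(x + y) - approx_sq(x - y)) / (2 * d2), x * y)
```

Since $\rho$, scalar multiplication, and sums are exactly what a one-hidden-layer net computes, `approx_sq` is (up to the final rescaling by $2/a$) the realisation of such a net.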

If, on the other hand, your activation function is piecewise linear, then you need to work a bit harder. In the case of $\rho$ being the ReLU, you will find a construction of $\phi_{\rm mult}$ in Proposition 3 of https://arxiv.org/pdf/1610.01145.pdf. The same statement holds for arbitrary piecewise linear activation functions.
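The squaring step of that ReLU construction iterates a "hat" function (itself a sum of three ReLUs); the weighted sum of the iterates approximates $x^2$ on $[0,1]$ with error at most $4^{-(m+1)}$ after $m$ iterations, and polarisation then gives multiplication as before. A numerical sketch of the function being realised (not the explicit weight matrices):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # piecewise-linear tooth: 2x on [0, 1/2], 2(1-x) on [1/2, 1], 0 outside
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, m=10):
    """Approximation of x^2 on [0, 1] by
    x - sum_{s=1}^m hat^{(s)}(x) / 4^s,
    i.e. the piecewise-linear interpolant of x^2 on a dyadic grid."""
    g, out = x, np.asarray(x, dtype=float)
    for s in range(1, m + 1):
        g = hat(g)
        out = out - g / 4**s
    return out

x = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(relu_square(x, m=10) - x**2)))  # about 4**-11
```

Each extra iteration of `hat` costs only a constant number of ReLU layers, which is why the resulting multiplication network has size logarithmic in the desired accuracy.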