Does adding a nonlinear function on top of another nonlinear function improve the nonlinearity?


Assume we have a combination of different functions as described below.

$X$ is the input; it is a 2D matrix.

$f(X)$ is a linear operation.

$g(Y)=\max(0,Y)$ is a nonlinear operation.

$k(Z)$ is a linear operation.

$h(Y)$ is a nonlinear operation that is more complex than $g(Y)$.

Now the relationship between the output $A$ and the input $X$ can be defined as $$A=\Gamma(X),$$ where $\Gamma(X)$ can include or exclude $h(Y)$.

Including $h(Y)$ gives us: $$\Gamma_a(X)=f(X)\circ g(Y)\circ k(Z)+h(Y)$$

And excluding $h(Y)$ yields:

$$\Gamma_b(X)=f(X)\circ g(Y)\circ k(Z)$$
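While the question asks for a general proof, the two pipelines are easy to probe numerically. Below is a minimal sketch with concrete stand-ins of my own choosing (not from the question): $f$ and $k$ are random linear maps, $g$ is ReLU, and $h$ is a grayscale erosion implemented as a sliding minimum. A superposition check confirms that both $\Gamma_a$ and $\Gamma_b$ are already nonlinear, which is why a sharper notion of "more nonlinear" is needed to separate them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the abstract maps (my assumptions, not from the question):
W_f = rng.standard_normal((4, 4))   # weights of the linear map f
W_k = rng.standard_normal((4, 4))   # weights of the linear map k

def f(X):
    return W_f @ X                  # linear operation

def g(Y):
    return np.maximum(0.0, Y)       # ReLU nonlinearity

def k(Z):
    return W_k @ Z                  # linear operation

def h(Y, size=3):
    # Grayscale erosion along each row: minimum over a sliding window.
    Yp = np.pad(Y, ((0, 0), (1, 1)), mode="edge")
    windows = [Yp[:, i:i + Y.shape[1]] for i in range(size)]
    return np.stack(windows).min(axis=0)

def gamma_b(X):                     # k(g(f(X))): excludes h
    return k(g(f(X)))

def gamma_a(X):                     # same pipeline plus the erosion branch
    Y = f(X)
    return k(g(Y)) + h(Y)

# Superposition test: a linear map would satisfy F(X1 + X2) = F(X1) + F(X2);
# both pipelines violate it, so "nonlinear vs. linear" cannot distinguish them.
X1, X2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
for gamma in (gamma_a, gamma_b):
    assert not np.allclose(gamma(X1 + X2), gamma(X1) + gamma(X2))
```

This does not prove anything in general, but it makes the point that any candidate proof must quantify *degree* of nonlinearity, not merely its presence.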

Is it possible to prove mathematically that adding a more complex nonlinear function $h(Y)$ to the system enhances the overall nonlinearity when a nonlinear function $g(Y)$ is already present? In other words, is the nonlinearity provided by $\Gamma_a(X)$ greater than that of $\Gamma_b(X)$?

Some details: I am looking for a general proof. The question is in the context of neural networks: $g$ is a ReLU activation function and $h$ is a morphological operation (such as erosion). The goal in a neural network is to fit a nonlinear function between input and output. I need to prove in general that adding $h$ on top of $g$ yields more nonlinearity than having $g$ alone. My idea was to approximate erosion by a continuous function and then take its second derivative, to show that the network contains more nonlinearity. But I am not sure whether this approach is correct.
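The smoothing idea can be sketched concretely. A common smooth surrogate for erosion's minimum is the soft-min $-\frac{1}{\beta}\log\sum_i e^{-\beta y_i}$, which tends to $\min_i y_i$ as $\beta\to\infty$. A finite-difference probe then shows the contrast: ReLU is piecewise linear, so its second derivative vanishes almost everywhere, while the soft-min has genuine curvature. Whether "nonzero second derivative" is the right formalization of "more nonlinearity" is exactly the open part of the question; the function names and the choice $\beta = 10$ below are mine:

```python
import numpy as np

def soft_min(y, beta=10.0):
    # Smooth surrogate for erosion's minimum; tends to min(y) as beta -> inf.
    y = np.asarray(y, dtype=float)
    return -np.log(np.exp(-beta * y).sum()) / beta

def second_derivative(func, t, eps=1e-3):
    # Central finite-difference estimate of func''(t).
    return (func(t + eps) - 2.0 * func(t) + func(t - eps)) / eps ** 2

relu = lambda t: max(0.0, t)             # piecewise linear: f'' = 0 a.e.
eroded = lambda t: soft_min([t, 1.0])    # 1-D slice through the soft-min

curv_relu = second_derivative(relu, t=0.5)
curv_soft = second_derivative(eroded, t=0.5)

assert abs(curv_relu) < 1e-6   # no curvature away from the kink at 0
assert abs(curv_soft) > 1e-3   # genuine curvature away from any kink
```

Note the caveat: the smoothed erosion has nonzero second derivative while ReLU's is zero almost everywhere, but a "zero second derivative a.e." function like ReLU can still make deep networks universal approximators, so curvature alone may not settle the comparison.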