How to compute this "smooth max operator"?


I was looking for an alternative way to activate each neuron of a neural network non-linearly. Eventually, I came up with the following binary operation: $$ x \lor y = \log (\exp x + \exp y) $$

With $-\infty$ as its identity element, and being associative and commutative, this operator behaves roughly like $\max$; I therefore call it *smooth max*. To give it learnable parameters, use it as $f(x_1,\cdots,x_n) = b_0 \lor (w_1x_1+b_1) \lor \cdots \lor (w_nx_n+b_n)$.
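As a concrete (hypothetical) sketch, here is the naive translation of the definition into Python; the function names `smooth_max_naive` and `neuron` are my own. It works for moderate inputs but, as discussed below, the direct `exp` calls overflow for large ones:

```python
import math

def smooth_max_naive(*args):
    # Direct translation of x ∨ y = log(exp x + exp y), extended to
    # n arguments by associativity. math.exp overflows once an
    # argument exceeds roughly 709, so this is only a sketch.
    return math.log(sum(math.exp(a) for a in args))

# Hypothetical neuron: f(x1,...,xn) = b0 ∨ (w1·x1+b1) ∨ ... ∨ (wn·xn+bn)
def neuron(xs, ws, bs, b0):
    return smooth_max_naive(b0, *(w * x + b for x, w, b in zip(xs, ws, bs)))
```

Note that `math.exp(-math.inf)` returns `0.0`, so the identity element $-\infty$ works as expected here.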

I think this provides a good way of activating the neurons of the next layer, for the following reasons:

  • This operator is smooth.

  • Since $\frac{\partial}{\partial x}(x \lor y) = (1 + \exp(y-x))^{-1}$ and $\frac{\partial}{\partial y}(x \lor y) = (\exp(x-y) + 1)^{-1}$ sum to $1$, at least one of the two partial derivatives is always at least $1/2$. So at least one parameter is effectively updated during backpropagation, and the gradient cannot vanish in both arguments simultaneously.

  • It provides some degree of non-linearity on its own, so it doesn't need a separate activation function.
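The gradient property from the second bullet is easy to check numerically. A small sketch (the helper name `d_dx` is mine); note that $\partial_x(x \lor y)$ is exactly the logistic sigmoid of $x - y$:

```python
import math

def d_dx(x, y):
    # ∂(x ∨ y)/∂x = 1 / (1 + exp(y − x)) = sigmoid(x − y)
    return 1.0 / (1.0 + math.exp(y - x))

# By symmetry, ∂(x ∨ y)/∂y = d_dx(y, x); the two partials sum to 1,
# so they can never both be small at the same time.
x, y = 2.0, -3.0
assert abs(d_dx(x, y) + d_dx(y, x) - 1.0) < 1e-12
```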

For experimental purposes, I planned to actually use this operator in neural networks I've built. However, I also see a problem with it: directly invoking the exp and log functions that most programming languages provide is very prone to floating-point overflow and loss of precision. How should I compute this operator safely?

Best answer:

Without loss of generality assume $x \ge y$. Then $$ x \vee y = x + \log(1+\exp(y-x)). $$ Since $y - x \le 0$, the exponential can never overflow, and the result degrades gracefully (the log term underflows to $0$) when $y \ll x$. This should already fix the biggest problems.