How to optimize this Smooth maximum function


I'm applying the smooth maximum function $\sqrt[n]{x^n + y^n}$ instead of the hard maximum function $\max(x, y)$ in order to find the maximum value in an array.

I need a smooth function so that I can apply gradient descent and backpropagate the error while taking every input value into account. The hard maximum has a gradient of $1$ for the maximum input and $0$ everywhere else, whereas the smooth maximum's gradient assigns a value to every input.
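To make the gradient property concrete, here is a minimal NumPy sketch of the $n$-norm smooth maximum and its gradient (the function names are my own, not from the question):

```python
import numpy as np

def smooth_max(x, n=4):
    """p-norm smooth maximum (sum x_i^n)^(1/n), for non-negative x_i."""
    return np.sum(x ** n) ** (1.0 / n)

def smooth_max_grad(x, n=4):
    """d/dx_i (sum x_j^n)^(1/n) = (x_i / smooth_max(x))^(n-1):
    every input receives a nonzero share of the gradient."""
    s = smooth_max(x, n)
    return (x / s) ** (n - 1)

x = np.array([1.0, 2.0, 3.0])
# a hard max would give gradient [0, 0, 1]; here all entries are nonzero
g = smooth_max_grad(x)
```

Note that the smooth maximum always overestimates: $\sqrt[n]{\sum_i x_i^n} \ge \max_i x_i$ for non-negative inputs.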

Testing showed that the function cannot be applied when the inputs approach zero, so I add a small offset of $10^{-5}$ to the input to avoid dividing by zero.
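The failure mode at zero shows up in the gradient, where the norm appears in the denominator; a sketch of the epsilon guard described above (the constant matches the question's $10^{-5}$, the rest is my own framing):

```python
import numpy as np

EPS = 1e-5  # the small offset the question adds to the input

def smooth_max_grad_safe(x, n=4, eps=EPS):
    x = x + eps                      # shift away from zero so the norm is never 0
    s = np.sum(x ** n) ** (1.0 / n)  # smooth maximum of the shifted input
    return (x / s) ** (n - 1)        # without eps this is 0/0 at an all-zero input

# all-zero input: the naive gradient would be NaN, the guarded one is finite
g = smooth_max_grad_safe(np.zeros(4))
```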

Something else I found, which the figures below show, is that the gap between the hard maximum and the soft maximum grows with the input size: for an array of size $(2,2)$ the difference is small, but for $(100,100)$ it is much larger. So there is a proportional link between the input size and the parameter $n$ in the smooth function: once you increase the input size you have to increase $n$ with it. (For $m$ non-negative inputs, $\max_i x_i \le \sqrt[n]{\sum_i x_i^n} \le m^{1/n} \max_i x_i$, so for fixed $n$ the worst-case overestimation factor $m^{1/n}$ grows with $m$.) But increasing $n$ then risks overflow or underflow when raising the inputs to the $n$-th power.
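One standard way to raise $n$ without overflowing is to factor the largest entry out of the norm before exponentiating, so every ratio is at most $1$; a sketch (function name is mine):

```python
import numpy as np

def smooth_max_stable(x, n):
    """(sum x_i^n)^(1/n) computed as m * (sum (x_i/m)^n)^(1/n)
    with m = max(x): each x_i/m <= 1, so the power never overflows."""
    m = np.max(x)
    if m == 0:
        return 0.0
    return m * np.sum((x / m) ** n) ** (1.0 / n)

# values this large would overflow float64 under a naive x**8
s = smooth_max_stable(np.full(100, 1e200), 8)
```

The result still overestimates the hard max by at most a factor $m^{1/\text{size}^{-1}\cdot\ldots}$, i.e. $\text{size}^{1/n}$, so choosing $n$ proportional to $\log(\text{size})$ keeps the gap bounded.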

Note: all my values are non-negative because I'm thresholding with a ReLU. So in my case this function works if I can solve this proportionality between the input size and the parameter $n$.

I also tested the function $\log\left(\sum_i e^{x_i} - (n-1)\right)$, where $n$ here is the number of inputs, but the implementation was more difficult because of both overflow and underflow. Wikipedia reference
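Reading that formula as the log-sum-exp variant $\log\left(\sum_i e^{x_i} - (n-1)\right)$ with $n$ the number of inputs (the correction term makes it exact when all but one input are zero), the usual shift-by-max trick avoids both the overflow and the underflow; a sketch, assuming non-negative inputs:

```python
import numpy as np

def lse_smooth_max(x):
    """log(sum(exp(x)) - (len(x) - 1)), shifted by max(x) so that
    exp never overflows; assumes all x >= 0."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    # log(sum e^{x_i} - (n-1)) = m + log(sum e^{x_i - m} - (n-1) e^{-m})
    inner = np.sum(np.exp(x - m)) - (len(x) - 1) * np.exp(-m)
    return m + np.log(inner)
```

Since each $e^{x_i - m} \ge e^{-m}$ for non-negative inputs, the argument of the final log stays strictly positive.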

The figures below plot the input size against the difference between the hard and the soft maximum.

[Figure 1: n = 4]
[Figure 2: n = 6]
[Figure 3: n = 4 fixed up to size 6, then n = np.ceil(i/2), where i is the input size]
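For reference, the gap-versus-size experiment behind the figures can be sketched in a few lines; the size grid and random inputs here are my own choices, not the original setup, and the norm is computed with the max factored out to avoid overflow:

```python
import numpy as np

rng = np.random.default_rng(0)
gaps = []
for i in [2, 10, 50, 100]:                        # input sizes (my own grid)
    x = rng.random(i)                             # non-negative, like post-ReLU
    n = 4 if i <= 6 else int(np.ceil(i / 2))      # adaptive rule from figure 3
    m = np.max(x)                                 # hard maximum
    soft = m * np.sum((x / m) ** n) ** (1.0 / n)  # overflow-safe smooth maximum
    gaps.append(soft - m)                         # gap between soft and hard max
print(gaps)
```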