One-sided decaying squared error


I'm trying to build a loss function for a machine learning regression model that behaves almost like a hybrid of classification and regression.

The idea is to decrease the penalty to the extent that the prediction is directionally similar to the true $y$ value (e.g., $\hat {y} = 1, y = 3$ is penalized less than $\hat {y} = 1, y = -1$ despite the distance being the same in both cases).

What I have so far is a function that will approach a simple squared error to the extent that $y$ and $\hat {y}$ are directionally dissimilar but will decay with directional similarity:

$$e^{-\hat{y}(y-\hat{y})}\,(y-\hat{y})^2$$
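As a quick sanity check, here is a minimal NumPy sketch of this loss (the function name `one_sided_sq_error` is my own, not established terminology):

```python
import numpy as np

def one_sided_sq_error(y_hat, y):
    """Squared error damped by an exponential factor that shrinks the
    penalty when y overshoots y_hat in the direction y_hat points."""
    resid = y - y_hat
    return np.exp(-y_hat * resid) * resid ** 2

# Directionally similar vs. dissimilar predictions at the same distance 2:
print(one_sided_sq_error(1.0, 3.0))   # small: y overshoots in the predicted direction
print(one_sided_sq_error(1.0, -1.0))  # large: y is on the wrong side of zero
```

This reproduces the example above: $\hat{y}=1, y=3$ is penalized far less than $\hat{y}=1, y=-1$, even though $|y-\hat{y}|=2$ in both cases.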

In each of the following examples, $y$ is represented on the $x$ axis and $\hat {y}$ is fixed. All of the plots were generated with https://www.desmos.com/calculator; if you add a slider for the value of $\hat {y}$, you can animate it and the effect becomes more obvious.


This is with $\hat {y}=0$. The function output is $y^2$, which is equal to $(y-\hat {y})^2$.


This is with $\hat {y}=0.5$. The right side starts to bend over, with the goal of rewarding predictions that are in the direction of the true $y$. As $y$ becomes more positive, the precision of $\hat {y}$ matters less.


And again with $\hat {y}=1$. Because $\hat {y}$ is larger than in the previous example, the loss is smaller at all $y > 1$ but larger at $y = 0$.


$\hat {y} = -0.5$ and $\hat {y} = -1$ result in mirror images of the last two plots.
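The behaviour described in the plots can be verified numerically. This sketch assumes the loss as defined above, restated so the snippet is self-contained:

```python
import numpy as np

def loss(y_hat, y):
    resid = y - y_hat
    return np.exp(-y_hat * resid) * resid ** 2

ys = np.linspace(1.01, 10, 50)

# With y_hat = 1 vs y_hat = 0.5: smaller loss for all y > 1 ...
print(bool(np.all(loss(1.0, ys) < loss(0.5, ys))))   # True

# ... but larger loss at y = 0.
print(bool(loss(1.0, 0.0) > loss(0.5, 0.0)))         # True

# Negating both y_hat and y mirrors the curve: loss(-y_hat, -y) == loss(y_hat, y).
print(bool(np.allclose(loss(-0.5, -ys), loss(0.5, ys))))  # True
```

The mirror symmetry follows algebraically: substituting $(-\hat{y}, -y)$ into $e^{-\hat{y}(y-\hat{y})}(y-\hat{y})^2$ leaves the expression unchanged.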


I think this is almost usable as it is, but it needs a modification. Specifically, I'd like the curve to bend over in a way that is proportional to the size of $y$ and/or $\hat y$. For instance, the output where $\hat {y} = 1, y = 5$ should be at the same point on the curve as the output where $\hat {y} = 0.1, y = 0.5$.
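To make the mismatch concrete, here is a check of those two example points under the current loss; under the desired scaling behaviour they would be equal, but they are not:

```python
import math

def loss(y_hat, y):
    resid = y - y_hat
    return math.exp(-y_hat * resid) * resid ** 2

# Scaling (y_hat, y) from (0.1, 0.5) up to (1, 5) should ideally land on the
# same relative point of the curve, but the current loss disagrees:
print(loss(1.0, 5.0))   # e^{-4}    * 16   ≈ 0.293
print(loss(0.1, 0.5))   # e^{-0.04} * 0.16 ≈ 0.154
```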

I've tried to modify the function, specifically dividing the base $e$ by the magnitude of the prediction, $|\hat{y}|$, but I don't think it's getting there, so I'm opening it up for ideas. Another extension would be for the right-hand side to remain monotonic but asymptotic to a constant, rather than decaying back to zero.

Any suggestions are welcome.