I have a set $S$ of coordinates $(x,y)$, and am estimating $f(x) = ax + b$ where $a > 0$. I also happen to know that $\forall x,y((x,y) \in S \implies y < f(x))$.
My question is: how can I use this knowledge of the upper bound to improve the regression result?
My intuition is to run an ordinary linear regression on all coordinates in $S$, giving $g(x)$, and then construct $g'(x) = g(x) + c$, with $c$ being the smallest number such that $\forall x,y((x,y) \in S \implies y \leq g'(x))$, i.e. such that $g'(x)$ lies as high as possible while still touching at least one point in $S$. I have no idea, however, whether this is the best way to do it, nor how to devise an algorithm that does it efficiently.
Any help would be greatly appreciated.
I played a bit with an alternative regression that used an exponential distribution for $y \mid \hat y$, so that $y$ is constrained to lie below the regression line. This was motivated in part by the histogram at the bottom of page 62 of this Master's thesis, which suggests that network delay looks roughly like a shifted exponential distribution. Unfortunately, this ended badly: the maximum likelihood estimate is not always unique, and even when it is unique, the slope of the line does not represent the data well (it ends up being estimated from only two points). I'm considering writing this up and posting it to Stats.SE, because it was kind of interesting.
I think your original suggestion to do an ordinary regression and then move it up is a fine way to go about it. This is pretty easy, especially if you have a decent statistics library:
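Here is a minimal sketch of that procedure in Python with NumPy (the data here is made up for illustration; substitute your own `x` and `y` arrays):

```python
import numpy as np

# Hypothetical sample data: points that all lie strictly below y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 - rng.exponential(1.5, 50)

# Ordinary least-squares fit g(x) = slope*x + intercept
slope, intercept = np.polyfit(x, y, 1)

# Smallest shift c such that every point satisfies y <= g(x) + c:
# c is simply the largest residual.
c = np.max(y - (slope * x + intercept))
intercept += c

# The shifted line g'(x) = slope*x + intercept is now on or above every point
assert np.all(y <= slope * x + intercept + 1e-9)
```

Computing $c$ is just a maximum over the residuals, so the whole thing is a single $O(n)$ pass on top of the regression itself.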
Your regression line now passes through the point with the largest residual and lies on or above all the other points.