I have a set $S$ of coordinates $(x,y)$, and am estimating $f(x) = ax + b$ where $a > 0$. I also happen to know that $\forall x,y((x,y) \in S \implies y < f(x))$.
My question is: how can I use this knowledge of the upper bound to improve the regression result?
My intuition is to run an ordinary linear regression on all coordinates in $S$, giving $g(x)$, and then construct $g'(x) = g(x) + c$, with $c$ being the smallest number such that $\forall x,y((x,y) \in S \implies y \leq g'(x))$, i.e. such that $g'(x)$ lies as high as possible while still touching at least one point in $S$. I have no idea, however, whether this is the best way to do it, nor how to devise an algorithm that does it efficiently.
Any help would be greatly appreciated.
I played a bit with an alternative regression that used an exponential distribution for $y \mid \hat y$, so that $y$ is constrained to lie below the regression line. This was motivated in part by the histogram at the bottom of page 62 of this Master's thesis, which suggests that network delay looks roughly like a shifted exponential distribution. Unfortunately, this ended badly: the maximum likelihood estimate is not always unique, and even when it is unique, the slope of the line does not represent the data well (it ends up being estimated from only two points). I'm considering writing this up and posting it to Stats.SE, because it was kind of interesting.
I think your original suggestion to do an ordinary regression and then move it up is a fine way to go about it. This is pretty easy, especially if you have a decent statistics library:
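Here is a minimal sketch of that procedure in Python with NumPy (the data here is made up for illustration; substitute your own `x` and `y` arrays):

```python
import numpy as np

# Hypothetical sample data: points that all lie strictly below y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 - rng.exponential(1.5, 50)

# Ordinary least-squares fit g(x) = slope*x + intercept
slope, intercept = np.polyfit(x, y, 1)

# Smallest shift c such that every point satisfies y <= g(x) + c:
# c is simply the largest residual.
c = np.max(y - (slope * x + intercept))
intercept += c

# The shifted line g'(x) = slope*x + intercept is now on or above every point
assert np.all(y <= slope * x + intercept + 1e-9)
```

Computing $c$ is just a maximum over the residuals, so the whole thing is a single $O(n)$ pass on top of the regression itself.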
Your regression line now passes through the point with the largest residual and lies on or above all the other points.