Long story short, I would like to efficiently:
Minimize $\|Xb - y\|^2$ subject to $b \ge 0$ and $Xb \ge y$
I have an observation that is a single curve, y, of signal intensity vs. frequency. I would like to model this signal as a linear combination of a series of predictor curves collected in a design matrix, X. My current implementation uses an ordinary least squares analysis, where the coefficient vector b of the predictor curves is given by
$b = (X^TX)^{-1}X^Ty$
and the fitted values are given by,
$\hat{y} = Xb$
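For concreteness, the OLS step above might look like this in NumPy (a minimal sketch; `X` and `y` here are made-up stand-ins for the real predictor curves and observed signal):

```python
import numpy as np

# Made-up stand-ins for the real data: X holds p predictor curves as
# columns, y is the observed signal (n frequency bins).
rng = np.random.default_rng(0)
X = rng.random((50, 3))
b_true = np.array([1.0, 2.0, 0.5])
y = X @ b_true + 0.01 * rng.standard_normal(50)

# b = (X^T X)^{-1} X^T y, via a linear solve rather than an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b  # fitted values

# With unconstrained OLS the residuals take both signs, i.e. the fitted
# curve crosses the data -- exactly the behaviour the question wants to avoid.
```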
However, this simple approach minimizes the sum of squared residuals, so the fitted curve will always cross the data at some point, which is not physically realistic for the system I am trying to model. My desired restrictions are therefore:
$Xb \ge y$,
$b \ge 0$
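One way to impose both restrictions is to treat the fit as a small quadratic program. A sketch using SciPy's SLSQP solver (the data here are made up for illustration, and this is one possible setup, not a definitive implementation):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data: non-negative predictor curves and a noisy observation.
rng = np.random.default_rng(1)
X = rng.random((40, 3)) + 0.1
b_true = np.array([0.5, 1.0, 0.2])
y = X @ b_true + 0.05 * rng.standard_normal(40)

def sse(b):
    """Sum of squared residuals ||Xb - y||^2."""
    r = X @ b - y
    return r @ r

res = minimize(
    sse,
    x0=np.full(X.shape[1], 2.0),        # start where Xb is well above y
    method="SLSQP",
    bounds=[(0, None)] * X.shape[1],    # b >= 0
    constraints=[{"type": "ineq", "fun": lambda b: X @ b - y}],  # Xb >= y
)
b_hat = res.x        # constrained coefficients
y_fit = X @ b_hat    # fitted curve, on or above y everywhere
```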
If anyone could point me in the right direction it would be greatly appreciated.
If you want the fitted line to always lie on or above the data points, then (in the simple case of fitting a straight line) it must pass through two of the data points while lying on or above all the others.
A simple, although computationally expensive, way of doing this is to form the line through each pair of data points. For each of these lines, check whether it lies on or above all the other points; among the lines that pass this check, the one with the smallest sum of squared residuals is the one you want.
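A sketch of that brute-force search in Python (assuming we are fitting a straight line `y = slope*x + intercept` to points `(x_i, y_i)`; the data and names are made up):

```python
import numpy as np
from itertools import combinations

def best_line_above(x, y):
    """Try the line through every pair of points; among the lines that lie
    on or above all points, return the (slope, intercept) with the smallest
    sum of squared residuals.  O(n^3) overall."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical line: skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        fit = slope * x + intercept
        if np.all(fit >= y - 1e-12):      # on or above every point?
            sse = np.sum((fit - y) ** 2)
            if best is None or sse < best[0]:
                best = (sse, slope, intercept)
    return best[1], best[2]

x = np.arange(5.0)
y = np.array([0.0, 1.0, 0.4, 1.5, 1.0])
slope, intercept = best_line_above(x, y)  # line through (1, 1) and (3, 1.5)
```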
If there are n points, this takes $O(n^3)$ time since there are $n(n-1)/2$ pairs of points and you have to check $n-2$ other points for each.
A more computationally effective method is to get the convex hull of the data points (which takes $O(n\log n)$ time). The optimal line lies along one of the edges of the upper hull; there are at most $n-1$ such edges, and checking the residuals for each costs $O(n)$, so the whole method runs in $O(n^2)$ time.
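A sketch of the hull-based variant using `scipy.spatial.ConvexHull` (same toy data as the brute-force version; any feasible pair line lies along an upper-hull edge, so only the hull edges are scored):

```python
import numpy as np
from scipy.spatial import ConvexHull

def best_line_above_hull(x, y):
    """Compute the convex hull (O(n log n)); the optimal line lies along an
    upper-hull edge, so score only the hull edges that sit on or above all
    points: O(n) edges at O(n) each."""
    hull = ConvexHull(np.column_stack([x, y]))
    best = None
    # hull.vertices are in counter-clockwise order; pair consecutive ones
    for i, j in zip(hull.vertices, np.roll(hull.vertices, -1)):
        if x[i] == x[j]:
            continue                      # vertical edge: skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        fit = slope * x + intercept
        if np.all(fit >= y - 1e-12):      # keeps only the upper-hull edges
            sse = np.sum((fit - y) ** 2)
            if best is None or sse < best[0]:
                best = (sse, slope, intercept)
    return best[1], best[2]

x = np.arange(5.0)
y = np.array([0.0, 1.0, 0.4, 1.5, 1.0])
slope, intercept = best_line_above_hull(x, y)
```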
It would be interesting if an $O(n\log n)$ or even $O(n)$ method could be found.