Linear regression with two possible slopes

Question

147 Views Asked by Bumbble Comm At 30 Mar 2026 - 3:27

Let's say I have a dataset with $X$ and $Y$ values, where $X$ is the monthly average temperature and $Y$ is the money spent on utilities. My underlying hypothesis is that the heating energy (and hence the utility bill) is proportional to the average monthly temperature, but that the slope (cost per $^\circ C$) differs depending on whether the house has gas or electric heating.

How can I use linear regression to extract these two slopes? A simple linear regression of $Y$ on $X$ gives only a single slope, which represents some average of the gas and electric slopes. In a scatter plot it is quite easy to see two distinct linear relations (as shown in the figure below), but I am lost as to how to extract the two slopes.

There are 3 best solutions below

Bumbble Comm On 06 May 2021 - 9:53

As far as I can tell, assuming you cannot label the points manually, you have three options:

1. Full-fledged optimization-based joint classification and regression
2. Two stages: an initial unsupervised classification followed by standard regression
3. Some other heuristic based on adding/removing points to two regression problems

For data as well separated as yours, I would go for the second alternative, since it should be very easy to separate these clusters (unless this is just a test and you have a solver framework available).
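To make the second alternative concrete, here is a minimal sketch in plain Python (no libraries), not a definitive implementation. It assumes the bill is roughly proportional to $X$, so that the ratio $y_i/x_i$ separates the two groups; the stage-one classifier is a one-dimensional 2-means on that ratio, which is my choice and not something the answer prescribes. The data and the names `two_means_1d` and `fit_line` are made up for illustration.

```python
def two_means_1d(values, iters=100):
    """Split 1-D values into two clusters with Lloyd's algorithm (k-means, k=2)."""
    lo, hi = min(values), max(values)      # initial cluster centres
    labels = None
    for _ in range(iters):
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        if new_labels == labels:           # assignments stopped changing
            break
        labels = new_labels
        group0 = [v for v, k in zip(values, labels) if k == 0]
        group1 = [v for v, k in zip(values, labels) if k == 1]
        if not group0 or not group1:       # degenerate case: only one cluster
            break
        lo = sum(group0) / len(group0)
        hi = sum(group1) / len(group1)
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented toy data: two groups with slopes 2 and 5, plus a small
# deterministic perturbation standing in for noise.
x_half = [float(v) for v in range(5, 15)]
noise = [0.05 * (-1) ** i for i in range(len(x_half))]
xs = x_half + x_half
ys = ([1 + 2 * x + e for x, e in zip(x_half, noise)]
      + [1 + 5 * x + e for x, e in zip(x_half, noise)])

# Stage 1: unsupervised classification on the ratio y/x.
labels = two_means_1d([y / x for x, y in zip(xs, ys)])

# Stage 2: one standard regression per cluster.
fits = []
for k in (0, 1):
    gx = [x for x, lab in zip(xs, labels) if lab == k]
    gy = [y for y, lab in zip(ys, labels) if lab == k]
    fits.append(fit_line(gx, gy))

for k, (slope, intercept) in enumerate(fits):
    print(f"cluster {k}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

If the two lines have large or very different intercepts, the ratio $y_i/x_i$ may no longer separate the groups, and a different clustering feature would be needed.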

I had some old MATLAB code illustrating the first option. The following code sets up a case similar to yours and encodes it as a mixed-integer QP (MIQP) using the YALMIP toolbox. The MIQP solver Gurobi, which I used for testing, already starts to struggle at around 100 data points. Essentially, you assign a binary variable to each data point and each line, and let that variable determine which residual is added to the objective.

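```matlab
%% Data: two noisy lines, n points each
n = 25;
x1 = sort(rand(n,1));
x2 = sort(rand(n,1));
y1 = 2+3*x1+.3*randn(n,1);
y2 = 1+6*x2+.3*randn(n,1);
x = [x1;x2];
y = [y1;y2];

%% Optimization (requires YALMIP and a MIQP solver such as Gurobi)
line1 = binvar(2*n,1);   % line1(i) = 1 if point i is assigned to line 1
line2 = binvar(2*n,1);   % line2(i) = 1 if point i is assigned to line 2
sdpvar a1 b1 a2 b2       % slopes (a) and intercepts (b) of the two lines
e = sdpvar(2*n,1);       % residual of each point w.r.t. its assigned line
Model = [implies(line1,e == y-(a1*x+b1))
         implies(line2,e == y-(a2*x+b2))
         line1+line2 == 1];   % every point belongs to exactly one line
optimize(Model,e'*e);

%% Evaluate
clf
hold on
t = (0:0.1:1);
plot(t,value(a1)*t+value(b1),'k-');
plot(t,value(a2)*t+value(b2),'k-');
i = find(value(line1) > 0.5);   % threshold guards against solver round-off
j = find(value(line2) > 0.5);
plot(x(i),y(i),'b*',x(j),y(j),'r*')   % assignment found by the solver
plot(x1,y1,'ro',x2,y2,'bo')           % true grouping, for comparison
```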
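For readers who prefer a formula to code, the idea described above (one binary variable per point selecting which line's residual enters the objective) can be written compactly as follows. The notation is mine, not the answer's: $N$ is the number of points, $z_i$ plays the role of `line1(i)`, and $1 - z_i$ that of `line2(i)`.

$$
\min_{a_1,\,b_1,\,a_2,\,b_2,\; z \in \{0,1\}^N} \;\sum_{i=1}^{N} \Big[ z_i\,(y_i - a_1 x_i - b_1)^2 + (1 - z_i)\,(y_i - a_2 x_i - b_2)^2 \Big]
$$

For fixed $z$ this is just two ordinary least-squares problems, but there are $2^N$ possible assignments $z$, which is consistent with the solver slowing down as the number of points grows.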

I just had to try the linear least-squares method from the answer by JJacquelin. It seems to work well on data that looks like yours. (I was too lazy to extract the asymptotes, so I simply plotted the fitted quadratic symbolically; the whole code is lazy, really.)

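```matlab
% x and y are the data vectors from the previous block
sdpvar a b c f g
e = x.^2*a + y.^2*b + 2*x.*y*c + f*x + g*y + 1;   % algebraic residual of the conic
optimize([],e'*e);
sdpvar x y   % from here on x and y are symbolic (this overwrites the data)
p = [a b c f g];
s = sdisplay(replace(x^2*a + y^2*b + 2*x*y*c + f*x + g*y + 1,p,value(p)));
l = ezplot([s{1} '= 0']);   % plot the fitted conic
```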

Bumbble Comm On 06 May 2021 - 10:57

If you had posted example data (numerical, not graphical), I would have tested the regression method given on page 19 of https://fr.scribd.com/doc/14819165/Regressions-coniques-quadriques-circulaire-spherique

Numerical examples are shown in the paper.

I am reluctant to propose this method without testing it on a representative example of your data because, as pointed out in the paper, its reliability depends heavily on the scatter of the data relative to the data's order of magnitude.

You can try it and see, but a priori I would not guarantee success.

NOTE: The first step of the method consists in fitting a hyperbola, as shown in the paper. If the fit is successful, the asymptotes can roughly be taken as the two straight lines. Then, for a better fit, the points can be separated into two sets with respect to the axis of the hyperbola, and a separate line fitted to each set.
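As a rough illustration of the hyperbola-then-asymptotes idea, here is a plain-Python sketch. It is not the procedure from the linked paper: it fits the same algebraic form as the YALMIP snippet in the earlier answer, $a x^2 + b y^2 + 2cxy + fx + gy + 1 = 0$, by ordinary linear least squares, and then reads off the asymptotes. It assumes the conic does not pass through the origin (the constant term is fixed to 1) and that neither line is vertical. The toy data are noise-free so the result can be checked by hand; all function names are invented.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_conic(xs, ys):
    """Least-squares fit of a x^2 + b y^2 + 2 c x y + f x + g y + 1 = 0."""
    rows = [[x * x, y * y, 2 * x * y, x, y] for x, y in zip(xs, ys)]
    # Normal equations (R^T R) p = R^T (-1), with p = [a, b, c, f, g].
    RtR = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    Rtb = [-sum(r[i] for r in rows) for i in range(5)]
    return solve_linear(RtR, Rtb)


def asymptotes(a, b, c, f, g):
    """Return [(slope, intercept), ...] of the two asymptotes, sorted by slope."""
    disc = c * c - a * b
    if disc <= 0 or b == 0:
        raise ValueError("fitted conic is not a hyperbola with two finite slopes")
    root = math.sqrt(disc)
    # Asymptote slopes m solve b m^2 + 2 c m + a = 0.
    slopes = sorted([(-c - root) / b, (-c + root) / b])
    # Both asymptotes pass through the centre, where the gradient vanishes.
    xc, yc = solve_linear([[2 * a, 2 * c], [2 * c, 2 * b]], [-f, -g])
    return [(m, yc - m * xc) for m in slopes]


# Invented noise-free data on the lines y = 2 + 3x and y = 1 + 6x.
grid = [i / 10 for i in range(11)]
xs = grid + grid
ys = [2 + 3 * x for x in grid] + [1 + 6 * x for x in grid]

lines = asymptotes(*fit_conic(xs, ys))
for slope, intercept in lines:
    print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

With noisy data the asymptotes are only a rough estimate, which is why the note above suggests using them to split the points and then refitting each set.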

**Bumbble Comm** · Accepted Answer

If the two groups are not labelled, just classify the observations manually, e.g., whenever $y_i/x_i$ is larger than some threshold, observation $i$ belongs to group $A$. Then estimate $$ y_i = \beta_0 + \beta_1 x_i + \beta_2 D_Ax_i + \epsilon_i, $$ where $D_A$ is the indicator of group $A$ (1 if observation $i$ is in group $A$, 0 otherwise). The slope of group $A$ is then $(\beta_1 + \beta_2)$, while for the other group it is just $\beta_1$. Note that this model gives both groups the same intercept $\beta_0$.
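A minimal plain-Python sketch of this regression, under stated assumptions: the threshold on $y_i/x_i$ is picked by eye (3.5 here), the data are invented and noise-free, and the function names are hypothetical. It builds the design matrix with columns $1$, $x_i$, $D_A x_i$ and solves the normal equations.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_interaction_model(xs, ys, threshold):
    """OLS for y = b0 + b1*x + b2*D*x, where D = 1 if y/x > threshold."""
    d = [1.0 if y / x > threshold else 0.0 for x, y in zip(xs, ys)]
    rows = [[1.0, x, di * x] for x, di in zip(xs, d)]
    # Normal equations (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(XtX, Xty), d


# Invented data: common intercept 1, slope 2 for one group and 5 for group A.
x_half = [float(v) for v in range(5, 15)]
xs = x_half + x_half
ys = [1 + 2 * x for x in x_half] + [1 + 5 * x for x in x_half]

beta, d = fit_interaction_model(xs, ys, threshold=3.5)
slope_other = beta[1]             # slope of the group with D = 0
slope_a = beta[1] + beta[2]       # slope of group A
print(f"intercept = {beta[0]:.3f}")
print(f"slope (other group) = {slope_other:.3f}, slope (group A) = {slope_a:.3f}")
```

If the two groups are expected to have different intercepts as well, a plain indicator column $D_A$ could be added to the design matrix in the same way.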
