Linear regression: zero indexed?

I have a set of data corresponding to profit per month:

Dec -> 1726
Jan -> 1252
Feb -> 1472
Mar -> 1165
...

And a linear regression algorithm that gives me a formula in the form of $y = mx + b$:

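sx = 0;   // sum of x values (the indices 0 .. n-1)
sy = 0;   // sum of y values (the profits)
stt = 0;  // sum of squared centered x values
sts = 0;  // sum of (centered x) * y
for (i = 0; i < count(months); ++i) {
    sx += i;
    sy += profits[i];
}
for (i = 0; i < count(months); ++i) {
    t = i - sx/count(months);  // x centered on its mean
    stt += t * t;
    sts += t * profits[i];
}
slope = sts/stt; //m
inter = (sy-(sx*slope))/count(months); //b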

months is an array of the previous months, with months[0] being December, months[1] being January, etc.; profits[i] holds the profit for months[i].

My question is: when plotting the linear regression, do I start with $1$ or $0$? If my graph has twelve months on it, is the line plotted over $0\ldots 11$ or $1\ldots 12$? Say the formula is $y = 127x + 720$; should the first point on my chart be $y = 127 + 720$ or $y = 720$?

My confusion comes from using months on the x-axis instead of numbers. There's no such thing as a zeroth month; I start at the first, which is my argument for using $1$ as the first point. But my algorithm is zero-indexed, and I think the plot should be consistent with it.

There are 2 best solutions below

0
On BEST ANSWER

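Your code uses $x$ values from $0$ to $11$, so the line should be plotted over $0\ldots 11$ and the first point (December) is $y = b$, which is $y = 720$ in your example.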

for (i = 0; i < count(months); ++i) {
  sx += i;  // sum of x values, x taken from i variable
  // ..  
}
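
To make the indexing question concrete, here is a minimal Python sketch of the same centered-sum fit, run once with $x = 0\ldots n-1$ and once with $x = 1\ldots n$. The function name fit_line and the variable names are illustrative assumptions, not part of the original code, and only the four profit values quoted in the question are used.

```python
def fit_line(xs, ys):
    """Least-squares fit y = m*x + b using centered x values,
    mirroring the sx/sy/stt/sts sums in the question."""
    xs = list(xs)
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    mean_x = sx / n          # float division, so no integer truncation
    stt = 0.0
    sts = 0.0
    for x, y in zip(xs, ys):
        t = x - mean_x       # x centered on its mean
        stt += t * t
        sts += t * y
    slope = sts / stt
    intercept = (sy - sx * slope) / n
    return slope, intercept


# Dec, Jan, Feb, Mar from the question (the remaining months are not given)
profits = [1726, 1252, 1472, 1165]
n = len(profits)

# Fit with zero-based x (0, 1, 2, 3) and with one-based x (1, 2, 3, 4)
m0, b0 = fit_line(range(n), profits)
m1, b1 = fit_line(range(1, n + 1), profits)

# Evaluate each line on the same x values it was fitted with
line0 = [m0 * x + b0 for x in range(n)]
line1 = [m1 * x + b1 for x in range(1, n + 1)]
```

Either convention should draw the same line through the same months: the slope is unchanged, and the one-based intercept is the zero-based intercept minus one slope. What matters is plotting with the same $x$ values that were used for the fit, so a zero-indexed fit puts December at $x = 0$.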
0
On

I see a major problem. Your loop has "sx += i". At the end, if c = count(months), sx = c(c-1)/2.

Therefore, at each step, sx/c = (c-1)/2, so t = i-(c-1)/2.

Therefore, the x-values you use for getting sx, which are i, are not the same x-values you use for getting stt and sts, which are t = i-(c-1)/2.
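
For reference, here is how the centered values relate to the usual least-squares formulas. This is a sketch assuming the standard definitions, with $x_i = i$, $y_i$ the profits, $c$ the number of points, $\bar{x} = sx/c$, $\bar{y} = sy/c$ and $t_i = x_i - \bar{x}$, so that $\sum_i t_i = 0$:

$$m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i t_i y_i - \bar{y}\sum_i t_i}{\sum_i t_i^2} = \frac{\sum_i t_i y_i}{\sum_i t_i^2} = \frac{sts}{stt}$$

$$b = \bar{y} - m\bar{x} = \frac{sy - m \cdot sx}{c}$$

Under that reading, slope and inter describe the line in terms of the uncentered $x = i$, because inter is computed from sx and sy rather than from the centered values.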