I have a data set that behaves like a straight line in the beginning, but deviates later.

I would like to determine where the straight behavior ends. To do so, I include more and more data stepwise while watching the fit quality: once it "goes bad", the straight behavior has ended.
In order to do that I need a measure that tells me the fit quality independently of the size of the data set.
The residual grows with the data set: $R := \sum_i [ y_i - f(x_i) ]^2$, so I considered $\frac{R}{n}$, where $n$ is the size of the data set. But in practice this value grows with the size as well, even though I always redo the fit when I include more points. I don't quite understand why. Here are my questions:
Why does $\frac{R}{n}$ grow with the size?
Is there an expression that grades my fit quality independently of the size of the data set?
Further detail: as the residual I use the value that numpy.polyfit() returns.
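For reference, here is a minimal sketch of how that residual can be obtained and normalized by the number of points (the data set here is made up for illustration; `polyfit` returns the sum of squared residuals when called with `full=True`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a straight line plus a little noise
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)

n = x.size
# With full=True, polyfit also returns the sum of squared residuals R
coeffs, residuals, rank, sv, rcond = np.polyfit(x, y, 1, full=True)
R = residuals[0] if residuals.size else 0.0  # empty array if the fit is exact
print(R / n)  # the size-normalized residual discussed above
```

For a degree-1 fit with noise of standard deviation $\sigma$, this value should hover around $\sigma^2$ as long as the data really is linear.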
Edit: It turns out $\frac{R}{n}$ does not grow, at least not on a significant scale. My problem is a ripped piece of paper: it has a straight edge, and at some point the rip starts. This is the outline of my example piece:
This is how $\frac{R}{n}$ grows with $n$:
and these are some sections for different n:

It turns out I hadn't fully understood what I did yesterday. $\frac{R}{n}$ seems to be a good indicator.
Whether $\frac{R}{n}$ grows with $n$ depends on the data set. In this case I would expect it to start growing significantly only once you add points that deviate from a straight line. That is exactly what you want, since it lets you locate where the linear fit stops being good. It is hard to judge your case without more information (can you add a plot?), but I ran a small numerical experiment to see how $\frac{R}{n}$ evolves with $n$ for a case that looks similar to your drawing.
I generated some data using the function $$y = \begin{cases} x & x < 1 \\ \frac{x}{10} & x > 1 \end{cases} + \text{random noise}$$
A plot is given below (blue is the data, purple depicts $y = x$).
I did a linear least-squares fit using only the first $n$ data points and calculated $\frac{R}{n} \equiv \frac{1}{n}\sum_{i=1}^n \left(a(n)\,x_i + b(n) - y_i\right)^2$. The result is given below as a function of $x$ to allow comparison with the plot above.
In this (idealized) test case we see that $\frac{R}{n}$ does not grow as long as we are in the regime where the linear fit is good, but as soon as we start adding data that do not fit a straight line well, there is a sudden increase in $\frac{R}{n}$.