Linearizing data points following a $f \propto 1/x$-like function


I have some data ($x$ and $y$ points). I think the data approximately follows the equation $y=\frac{a}{x}$ for some positive constant $a$.

I want to perform a linear regression on these data. To get a straight line, I took the reciprocal $1/x_i$ of each $x$ data point and used the results as my new $x$ axis. Let's call them the $x'$ points.

I would expect that when I plot $y$ vs. $x'$ I get an approximately linear graph with slope $a$. However, when I do this with my data, the result is very far from a straight line (I get a curve with positive but decreasing slope, and $R^2\simeq 0.7$ with a linear regression).

However, if I instead take the reciprocal $1/y_i$ of each $y$ point (let's call these the $y'$ points) and graph $y'$ vs. $x$, I do get something very close to a straight line ($R^2\simeq 0.9994$), with a slope of approximately $\frac{1}{a}$, as I expected. Why is this happening?


For my own sanity, I calculated $(x, f(x))$ points for $f(x)=25/x$, and plotted $(1/x, f(x))$ (I got a slope of $25$, as expected) and $(x, 1/f(x))$ (I got a slope of $0.04$, as expected).


The specific data points are these, if you want to play around with them:

$$
\begin{array}{rr}
x & y \\
\hline
2 & 133.2 \\
10 & 104.5 \\
100 & 33.87 \\
470 & 10.02 \\
1000 & 5.263 \\
4700 & 0.9433 \\
10000 & 0.4629
\end{array}
$$

Best answer

For your data points we get $xy=266.4,\ 1045,\ 3387,\ 4709.4,\ 5263,\ 4433.51,\ 4629$. These values are not very close to one another (though the last four aren't too far apart), so the relationship is not of the form $y=a/x$ over the whole range. Thus it should not be surprising that you get quite different results from your two regressions. To see why, notice that the regression you said fitted well was of the form

$$\frac{1}{y}=a+bx$$

This equation can be rearranged as

$$y=\frac{1}{a+bx}.$$

Since $a$ is nonzero for your data set (here $a$ is the intercept of this regression, not the constant in $y=a/x$), this explains why you found a positive but decreasing slope when you plotted $y$ against $1/x$ (mathematically, let $z=1/x$ and differentiate the equation above twice with respect to $z$; the second derivative is negative).
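Written out, and assuming $a>0$ and $b>0$ (an assumption made here for illustration, consistent with positive $y$ that decreases in $x$), the substitution $z=1/x$ gives

$$y=\frac{1}{a+b/z}=\frac{z}{az+b},\qquad \frac{dy}{dz}=\frac{b}{(az+b)^2}>0,\qquad \frac{d^2y}{dz^2}=-\frac{2ab}{(az+b)^3}<0.$$

So the plot of $y$ against $z$ rises but flattens out, and it is a straight line only in the special case $a=0$, where $y=z/b$.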

It also explains why you did not get a good fit from running a regression of the form

$$y=c+d\frac{1}{x}.$$

To see why, note that

$$\frac{1}{a+bx}=\frac{1/a}{1+\frac{b}{a}x}$$

which is approximately equal to

$$\frac{1/a}{\frac{b}{a}x}=\frac{1}{bx}$$

only if $(b/a)x$ is large relative to 1.
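As a rough numerical check of this argument, the following Python sketch (standard library only) computes the products $xy$, runs both regressions on the data from the question, and prints $a/b$, the scale that $x$ must greatly exceed for the approximation to hold. The helper `linear_fit` and all variable names are made up for this illustration; they are not part of the question or of any library.

```python
# Data from the question.
x = [2, 10, 100, 470, 1000, 4700, 10000]
y = [133.2, 104.5, 33.87, 10.02, 5.263, 0.9433, 0.4629]


def linear_fit(u, v):
    """Ordinary least squares for v = intercept + slope * u.

    Hypothetical helper for this sketch; returns (intercept, slope, r_squared).
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_vv = sum((vi - mean_v) ** 2 for vi in v)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    r_squared = s_uv ** 2 / (s_uu * s_vv)
    return intercept, slope, r_squared


# If y = k/x held exactly, every product x*y would equal the same constant k.
products = [xi * yi for xi, yi in zip(x, y)]

# Fit 1/y = a + b*x (the regression that worked well).
a, b, r2_inv_y = linear_fit(x, [1 / yi for yi in y])

# Fit y = c + d*(1/x) (the regression that worked poorly).
c, d, r2_inv_x = linear_fit([1 / xi for xi in x], y)

print("x*y products:", [round(p, 2) for p in products])
print(f"1/y = a + b*x : a = {a:.5f}, b = {b:.3e}, R^2 = {r2_inv_y:.4f}")
print(f"y = c + d/x   : c = {c:.2f}, d = {d:.1f}, R^2 = {r2_inv_x:.4f}")
# 1/(b*x) approximates 1/(a + b*x) only where x is much larger than a/b.
print(f"a/b = {a / b:.1f}")
```

The two $R^2$ values should come out close to the ones reported in the question (about $0.9994$ and about $0.7$). The data points whose $x$ is comparable to or smaller than $a/b$ are the ones that pull the data away from a pure $y\propto 1/x$ law.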