Line of best fit for $\{(n,n+\sin n) : n \in \mathbb{Z}\}$


It seems intuitive that the line of best fit for $\{(n,n+\sin n) : n\in \mathbb{Z}\}$ should be $y=x$.

More concretely, it seems like a reasonable conjecture would be:

If $y = m_k x + b_k$ is the line of best fit for the set of points $$\{ (n,n+\sin n) : n\in \mathbb{Z}, |n| \leq k \},$$ then $\lim_{k\to \infty} m_k = 1$ and $\lim_{k\to\infty} b_k = 0$.

Is this conjecture true, and if so, how would one go about proving it? More generally, if $\{a_n\}$ is a sequence in $\mathbb{R}$ which is uniformly distributed in some compact interval $[A,B]$, how does the line of best fit behave for the set $\{(n,n+a_n)\}$?

EDIT: Just to clarify, by "line of best fit" I mean using the method of least-squares.


There are 2 best solutions below

On BEST ANSWER

To start with, we're going to need a few identities:

$$\begin{eqnarray} \sum_{j = 1}^n j & = & \frac{n(n+1)}{2} \\ \sum_{j = 1}^n j^2 & = & \frac{n(n+1)(2n+1)}{6} \\ \sum_{j = 1}^n \sin j & = & \frac{\sin n - \sin (n+1) + \sin 1}{2(1 - \cos 1)} \\ \sum_{j = 1}^n j \sin j & = & \frac{(n + 1) \sin n - n \sin (n + 1)}{2(1 - \cos 1)} \end{eqnarray}$$

The first two are the triangular and square pyramidal numbers respectively, and the third is the sum of sines formula as expressed in this answer. The last is courtesy of Wolfram Alpha; you can also derive it with an approach similar to the sum of sines formula, applying a formula for $\sum_k k x^k$ (which in turn comes from differentiating the geometric series formula with respect to $x$).
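Since the last two identities are easy to mistype, here is a minimal Python sketch that compares each closed form with a brute-force sum. The function name `check_identities` and the tolerances are my own choices for illustration, not part of the derivation.

```python
import math

def check_identities(n):
    """Compare each closed form above against a brute-force sum for a given n."""
    js = range(1, n + 1)
    d = 2 * (1 - math.cos(1))  # common denominator of the two sine sums
    pairs = [
        (sum(js), n * (n + 1) / 2),
        (sum(j * j for j in js), n * (n + 1) * (2 * n + 1) / 6),
        (sum(math.sin(j) for j in js),
         (math.sin(n) - math.sin(n + 1) + math.sin(1)) / d),
        (sum(j * math.sin(j) for j in js),
         ((n + 1) * math.sin(n) - n * math.sin(n + 1)) / d),
    ]
    # Tolerances are an assumption; they only need to absorb floating-point noise.
    return all(math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
               for lhs, rhs in pairs)

for n in (1, 2, 10, 1000):
    print(n, check_identities(n))
```

This is a spot check for a handful of $n$, not a proof, but it does catch sign or index slips in the closed forms.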

Then, we're going to use this formula for the coefficients of a simple linear regression:

$$\begin{eqnarray} y & = & \alpha + \beta x \\ \beta & = & \frac{n \sum_j x_j y_j - \sum_j x_j \sum_j y_j}{n \sum_j x_j^2 - (\sum_j x_j)^2} \\ \alpha & = & \bar{y} - \beta \bar{x} \\ & = & \frac{1}{n}\left(\sum_j y_j - \beta \sum_j x_j \right) \end{eqnarray}$$
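For readers who want to experiment, the regression formulas above translate directly into a few lines of Python. This is only a sketch: the name `least_squares_line` is hypothetical, and it assumes the $x_j$ are not all equal (otherwise the denominator of $\beta$ is zero).

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Slope and intercept exactly as in the formulas above.
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    alpha = (sy - beta * sx) / n
    return alpha, beta

# Points lying exactly on y = 1 + 2x should recover alpha = 1, beta = 2.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))
```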

Now, we get to start substituting $x_j = j$ and $y_j = j + \sin j$ everywhere. Starting with the denominator of $\beta$:

$$\begin{eqnarray} n \sum_j j^2 - (\sum_j j)^2 & = & n \frac{n(n+1)(2n+1)}{6} - \left(\frac{n(n+1)}{2}\right)^2 \\ & = & \frac{n^2 (n+1)}{2}\left(\frac{2n+1}{3} - \frac{n+1}{2} \right) \\ & = & \frac{n^2 (n+1)(n-1)}{12}\end{eqnarray}$$

Next, the numerator:

$$\begin{eqnarray} n \sum_j j(j + \sin j) - \sum_j j \sum_j (j + \sin j) & = & n\left(\sum_j j^2 + \sum_j j \sin j\right) - \left( \left(\sum_j j\right)^2 + \sum_j j \sum_j \sin j \right) \\ & = & \frac{n^2 (n+1)(n-1)}{12} + n \frac{(n+1) \sin n - n \sin(n+1)}{2(1 - \cos 1)} \\ && - \frac{n(n+1)}{2} \frac{\sin n - \sin(n+1) + \sin 1}{2(1 - \cos 1)} \\ & = & \frac{n^2 (n+1)(n-1)}{12} \\ && + \frac{n \left((n+1) \sin n - (n - 1)\sin(n+1) - (n+1) \sin 1 \right)}{4(1 - \cos 1)} \end{eqnarray}$$

Putting those together, we get:

$$\begin{eqnarray} \beta & = & 1 + \frac{n \left((n+1) \sin n - (n - 1)\sin(n+1) - (n+1) \sin 1 \right)}{4(1 - \cos 1)}\frac{12}{n^2 (n+1)(n-1)} \\ & = & 1 + \frac{3 \left((n+1) \sin n - (n-1) \sin(n+1) - (n+1) \sin 1 \right)}{n(n-1)(n+1)(1 - \cos 1)} \end{eqnarray}$$

And:

$$\begin{eqnarray} \alpha & = & \frac{1}{n}\left(\sum_j (j + \sin j) - \beta \sum_j j\right) \\ & = & \frac{1}{n}\left(\frac{\sin n - \sin(n+1) + \sin 1}{2(1 - \cos 1)} \right. \\ && \left. - \frac{n(n+1)}{2} \frac{3 \left((n+1) \sin n - (n-1) \sin(n+1) - (n+1) \sin 1 \right)}{n(n-1)(n+1)(1 - \cos 1)} \right) \\ & = & \frac{(2n+1)\sin 1 + (n-1)\sin(n+1) - (n+2)\sin n}{n(n-1)(1 - \cos 1)} \end{eqnarray}$$

Or, at least, that's probably close to being right, but the probability of an algebraic error creeping in there is pretty high. However, assuming that that's all roughly accurate, we can see that $\beta = 1 + O(n^{-2})$ and $\alpha = O(n^{-1})$, so in the limit as $n \rightarrow \infty$ we do indeed get that $y = x$ is the least squares regression line.
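Given that caveat, the claimed rates can be checked numerically without relying on the closed forms at all. The sketch below fits the points $(j, j + \sin j)$ for $j = 1, \dots, n$ directly and prints $n^2(\beta - 1)$ and $n\alpha$; if the rates are right, both columns should stay bounded as $n$ grows. The helper name `fit` and the chosen values of $n$ are assumptions made for illustration.

```python
import math

def fit(n):
    """Least-squares (alpha, beta) for the points (j, j + sin j), j = 1..n."""
    xs = range(1, n + 1)
    ys = [j + math.sin(j) for j in xs]
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    alpha = (sy - beta * sx) / n
    return alpha, beta

for n in (10, 100, 1000, 10000):
    alpha, beta = fit(n)
    # Scaled deviations: bounded columns are consistent with
    # beta = 1 + O(n^-2) and alpha = O(n^-1).
    print(n, (beta - 1) * n ** 2, alpha * n)
```

Bounded output for a few values of $n$ is evidence rather than proof, but it is a cheap guard against an algebra slip changing the order of growth.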

On

We could also consider the continuous case of an infinite number of data points $(x,x+\sin(x))$ for $0 \leq x \leq a$ and minimize $$\Phi(m,b)=\int_0^a \Big[(mx+b)-(x+\sin(x))\Big]^2\,dx$$ which evaluates to $$\Phi(m,b)=\frac{1}{6} \left(a \left(2 a^2 (m-1)^2+6 a b (m-1)+6 b^2+3\right)-3 \sin (a) (\cos (a)+4 m-4)\right)+2 \cos (a) (a (m-1)+b)-2 b$$

$$\frac{\partial \Phi(m,b)}{\partial m}=\frac{1}{6} \left(a \left(4 a^2 (m-1)+6 a b\right)-12 \sin (a)\right)+2 a \cos (a)\tag 1$$ $$\frac{\partial \Phi(m,b)}{\partial b}=\frac{1}{6} a (6 a (m-1)+12 b)+2 \cos (a)-2\tag 2$$

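Setting both partial derivatives $(1)$ and $(2)$ to zero and solving the two linear equations in $(m,b)$ gives $$m=1-\frac{6 (a-2 \sin (a)+a \cos (a))}{a^3}\quad \to ~ 1$$ $$b=\frac{2 (2 a-3 \sin (a)+a \cos (a))}{a^2}\quad \to ~ 0^+$$ as $a \to \infty$. The intercept is positive for all $a > 3$, since $2a - 3\sin(a) + a\cos(a) \geq a - 3$. The slope is below $1$ for most $a$, but $a(1+\cos(a)) - 2\sin(a)$ is negative just below odd multiples of $\pi$, so $m$ slightly exceeds $1$ there and the approach to $1$ is not strictly from below.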
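As a sanity check on the continuous case, the following Python sketch evaluates the closed forms for $m$ and $b$ and plugs them back into the right-hand sides of $(1)$ and $(2)$, which should both vanish up to rounding. The function names `continuous_fit` and `gradient` are hypothetical, chosen only for this illustration.

```python
import math

def continuous_fit(a):
    """Closed-form minimizer (m, b) of Phi on [0, a], as given above."""
    m = 1 - 6 * (a - 2 * math.sin(a) + a * math.cos(a)) / a ** 3
    b = 2 * (2 * a - 3 * math.sin(a) + a * math.cos(a)) / a ** 2
    return m, b

def gradient(m, b, a):
    """Right-hand sides of equations (1) and (2)."""
    dm = (a * (4 * a ** 2 * (m - 1) + 6 * a * b) - 12 * math.sin(a)) / 6 \
        + 2 * a * math.cos(a)
    db = a * (6 * a * (m - 1) + 12 * b) / 6 + 2 * math.cos(a) - 2
    return dm, db

for a in (1.0, 5.0, 50.0, 1000.0):
    m, b = continuous_fit(a)
    # Both gradient components should be zero up to floating-point error.
    print(a, m, b, gradient(m, b, a))
```

Printing $m$ and $b$ for increasing $a$ also gives a feel for how quickly the fitted line settles toward $y = x$.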