In section 2.2 of this paper, Gelman and Park present the following identity for the gradient of the least squares line through a set of 2D points:
...we recall a simple algebraic identity that expresses the least-squares regression of $y$ on $x$ as a weighted average of all pairwise comparisons:
$$\begin{align} \hat\beta^{ls}&=\frac{\sum_i(y_i-\bar y)(x_i-\bar x)}{\sum_i(x_i-\bar x)^2}\\ &=\frac{\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\sum_{i,\,j}(x_i-x_j)^2}\\ &=\frac{\sum_{i,\,j}\frac{y_i-y_j}{x_i-x_j}(x_i-x_j)^2}{\sum_{i,\,j}(x_i-x_j)^2}\end{align}$$
In the first line, which is the standard least-squares result, the sums run over all $n$ points. In the second and third lines the sums run over all ordered pairs of points $(i,\,j)$.
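The identity checks out numerically. Here is a quick sketch in numpy (the data and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # arbitrary test data

# First line: the standard least-squares slope.
beta_classic = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Second line: sums over all ordered pairs (i, j).
dy = y[:, None] - y[None, :]  # dy[i, j] = y_i - y_j
dx = x[:, None] - x[None, :]  # dx[i, j] = x_i - x_j
beta_pairwise = np.sum(dy * dx) / np.sum(dx ** 2)

print(np.isclose(beta_classic, beta_pairwise))  # True
```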
It feels like I might be missing something obvious, but how do we go from the first line to the second?
Rewrite the numerator
Insert the means into the pairwise sum, using $y_i-y_j=(y_i-\bar y)-(y_j-\bar y)$ and $x_i-x_j=(x_i-\bar x)-(x_j-\bar x)$:
\begin{equation} \sum_{i,\,j}(y_i-y_j)(x_i-x_j) = \sum_{i,\,j}\big((y_i-\bar y)-(y_j-\bar y)\big)\big((x_i-\bar x)-(x_j-\bar x)\big) \end{equation}
Expanding the product gives four double sums. The two cross terms vanish because deviations from the mean sum to zero, e.g.
\begin{equation} \sum_{i,\,j}(y_i-\bar y)(x_j-\bar x) = \Big(\sum_i(y_i-\bar y)\Big)\Big(\sum_j(x_j-\bar x)\Big) = 0 \end{equation}
Each of the two remaining terms is free of one index, so summing over that index just contributes a factor of $n$. Hence
\begin{equation} \sum_{i,\,j}(y_i-y_j)(x_i-x_j) = 2n\sum_i(y_i-\bar y)(x_i-\bar x) \end{equation}
Rewrite the denominator
The same computation with $y$ replaced by $x$ gives
\begin{equation} \sum_{i,\,j}(x_i-x_j)^2 = \sum_{i,\,j}\big((x_i-\bar x)-(x_j-\bar x)\big)^2 = 2n\sum_i(x_i-\bar x)^2 \end{equation}
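Both intermediate identities, with their common factor of $\frac{1}{2n}$, can be checked numerically; a minimal numpy sketch (random data and variable names of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x, y = rng.normal(size=n), rng.normal(size=n)

dx = x[:, None] - x[None, :]  # dx[i, j] = x_i - x_j
dy = y[:, None] - y[None, :]  # dy[i, j] = y_i - y_j

# Numerator: sum_i (y_i - ybar)(x_i - xbar) = (1/2n) sum_{i,j} (y_i - y_j)(x_i - x_j)
print(np.isclose(np.sum((y - y.mean()) * (x - x.mean())), np.sum(dy * dx) / (2 * n)))  # True

# Denominator: sum_i (x_i - xbar)^2 = (1/2n) sum_{i,j} (x_i - x_j)^2
print(np.isclose(np.sum((x - x.mean()) ** 2), np.sum(dx ** 2) / (2 * n)))  # True
```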
Combine the two
The common factor $\frac{1}{2n}$ cancels: \begin{equation} \frac{\sum_i(y_i-\bar y)(x_i-\bar x)}{\sum_i(x_i-\bar x)^2} = \frac{\frac{1}{2n} \sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\frac{1}{2n} \sum_{i,\,j}(x_i-x_j)^2} = \frac{\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\sum_{i,\,j}(x_i-x_j)^2} \end{equation}
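As a final sanity check, the pairwise formula can be compared against a library fit; here is a sketch using np.polyfit (degree 1) on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)  # simulated line plus noise

# Slope from the pairwise form of the identity.
dy = y[:, None] - y[None, :]
dx = x[:, None] - x[None, :]
beta_pairwise = np.sum(dy * dx) / np.sum(dx ** 2)

slope, intercept = np.polyfit(x, y, 1)  # ordinary least-squares fit
print(np.isclose(beta_pairwise, slope))  # True
```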