Intuition for least square regression line involving joint distribution


Let $X$ and $Y$ be random variables of the continuous type having the joint pdf $f(x,y) = 8xy$, with $0 \leq x \leq y \leq 1$.

Determine the equation of the least square regression line.
Does the line make sense to you intuitively?

I have found the line to be $y = 0.361x + 0.607$ but I have no idea how to answer the intuition question.
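For a joint distribution, the least-squares line has slope $\mathrm{Cov}(X,Y)/\mathrm{Var}(X)$ and intercept $E[Y]-\text{slope}\cdot E[X]$. As a sanity check on the numbers above, here is a minimal numerical sketch (assuming `numpy`; the grid size `n` is just an accuracy choice) that integrates the given pdf directly:

```python
import numpy as np

# Midpoint-rule integration of g(x, y) * f(x, y) over 0 <= x <= y <= 1,
# where f(x, y) = 8xy is the joint pdf from the question.
n = 1500
t = (np.arange(n) + 0.5) / n            # midpoints of a uniform grid on [0, 1]
x, y = np.meshgrid(t, t, indexing="ij")
# Half-weight the diagonal cells x = y, which the support cuts in two.
weight = (x < y) + 0.5 * (x == y)
pdf = 8 * x * y * weight
cell = 1.0 / n ** 2

def E(g):
    """Numerical expectation E[g(X, Y)] under the joint pdf."""
    return float(np.sum(g * pdf) * cell)

slope = (E(x * y) - E(x) * E(y)) / (E(x * x) - E(x) ** 2)  # Cov(X,Y)/Var(X)
intercept = E(y) - slope * E(x)
print(slope, intercept)   # close to 4/11 and 20/33
```

The exact coefficients work out to $4/11\approx 0.364$ and $20/33\approx 0.606$, which agrees with the line found above to two decimal places.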

Appreciate any explanation. Thank you.


"Is this your homework, Larry?"

I. Look at the scatter plot

Find a relation such that $y_i\approx f(x_i),\ i=1,\dots,n$.

II. "You might want to watch out that front window Larry".

Choose a family of functions $F$ (e.g. linear functions) and a cost function $L$ such that:

$\sum_{i=1}^{n}L(y_i-f(x_i))$ is minimal for some function $f\in F$, where $n$ is the number of given data points.

$\Leftrightarrow \hat f =\arg\min_{f\in F}\sum_{i=1}^{n}L(y_i-f(x_i))$

for instance:

  • $L(x)=|x|$
  • $L(x)=x^2$
  • ...
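To see how the choice of $L$ matters, here is a toy comparison (a sketch assuming `numpy`; the residual values are made up, with one deliberate outlier):

```python
import numpy as np

residuals = np.array([-0.5, 0.1, 0.2, 3.0])  # made-up residuals; 3.0 is an outlier
l1 = np.abs(residuals).sum()   # cost with L(x) = |x|
l2 = (residuals ** 2).sum()    # cost with L(x) = x^2 (the least-squares choice)
print(l1, l2)                  # the squared cost is dominated by the outlier
```

This is the usual trade-off: the squared loss is very sensitive to outliers, while the absolute loss is more robust.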

Assume we have a sample of $n$ points $(x_i,y_i)$ satisfying $$y_i=\beta_1+\beta_2x_i+\epsilon_i,\quad\forall i=1,\dots,n$$ where $\epsilon_i$ models the "noise" and is assumed to be random (it is often taken to follow a normal law, so it stays near its mean value and only occasionally strays far from it).

$\beta_1$ and $\beta_2$ are the coefficients we are looking for.
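The model above can be simulated directly (a sketch assuming `numpy`; the coefficients $\beta_1=0.5$, $\beta_2=2$ and the noise scale $0.1$ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta1, beta2 = 0.5, 2.0                   # invented "true" coefficients
x = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.1, n)             # random noise, here following a normal law
y = beta1 + beta2 * x + eps               # y_i = beta1 + beta2 * x_i + eps_i
S = np.sum((y - beta1 - beta2 * x) ** 2)  # squared noise left even at the true line
```

Even at the true coefficients the residuals are not zero; least squares looks for the coefficients that make their squared sum as small as possible.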

III. "Look, Larry...Have you ever heard of Vietnam?"

-"Oh, for Christ's sake, Walter!"

-"You're entering a world of pain, son. We know that this is your homework."

$\hat f =\arg\min_{f\in F}\sum_{i=1}^{n}L(y_i-f(x_i))$

  • we set $L(x)$ to $x^2$ (the least square... you remember?)
  • and $f(x)=\beta_1+\beta_2x$

$(\hat\beta_1,\hat\beta_2)=\arg\min_{\beta_1,\beta_2}\sum_{i=1}^{n}(y_i-\beta_1-\beta_2x_i)^2$: this amounts to minimizing the square of the noise $\epsilon_i$ for each $i$,

$$\epsilon_i=y_i-\beta_1-\beta_2x_i=y_i-\hat y_i$$

where $y_i$ is the observed value and $\hat y_i=\beta_1+\beta_2x_i$ the theoretical (fitted) value. Hence

$$(\hat\beta_1,\hat\beta_2)=\arg\min_{\beta_1,\beta_2}\sum_{i=1}^{n}(y_i-\beta_1-\beta_2x_i)^2=\arg\min_{\beta_1,\beta_2}\sum_{i=1}^{n}(y_i-\hat y_i)^2=\arg\min_{\beta_1,\beta_2}\sum_{i=1}^{n}\epsilon_i^2=\arg\min_{\beta_1,\beta_2}\|\epsilon\|^2$$
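The arg-min can be made concrete by brute force (a toy sketch assuming `numpy`; the data-generating line $y=0.5+2x$ and the grid ranges are invented): evaluate the sum of squared residuals on a grid of candidate $(\beta_1,\beta_2)$ and keep the smallest.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 300)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.1, 300)   # invented "true" line: 0.5 + 2x

# Candidate grids for beta1 and beta2 around the true values.
b1, b2 = np.meshgrid(np.linspace(0.0, 1.0, 101),
                     np.linspace(1.0, 3.0, 101), indexing="ij")
# S(b1, b2) = sum_i (y_i - b1 - b2 * x_i)^2, for every grid point at once.
S = ((y - b1[..., None] - b2[..., None] * x) ** 2).sum(axis=-1)
i, j = np.unravel_index(np.argmin(S), S.shape)
print(b1[i, j], b2[i, j])   # near the coefficients that generated the data
```

A grid search only illustrates the definition; the calculus in the next section gives the exact minimizer.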

IV. "YOU SEE WHAT HAPPENS? YOU SEE WHAT HAPPENS LARRY!"

We will use $S(\beta_1,\beta_2)=\sum_{i=1}^{n}(y_i-\beta_1-\beta_2x_i)^2$.

$S(\beta_1,\beta_2)$ is a convex quadratic function, so (as long as the $x_i$ are not all equal) it admits a unique minimum at $(\hat\beta_1,\hat\beta_2)$.

We therefore solve for the point where the partial derivatives vanish:

$\begin{cases} \frac{\partial S}{\partial\beta_1}= -2\sum_{i=1}^{n}(y_i-\beta_1-\beta_2x_i)=0\\ \frac{\partial S}{\partial\beta_2}= -2\sum_{i=1}^{n}x_i(y_i-\beta_1-\beta_2x_i)=0\\ \end{cases}$

$$...Unbelievable\ calculations\ processing...$$

From there you find $\hat\beta_1$ and $\hat\beta_2$, giving the least-squares regression line:

$$\hat\beta_2=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},\qquad\hat\beta_1=\bar y-\hat\beta_2\bar x$$

where $\bar x$ and $\bar y$ are the sample means.
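Solving the system yields a closed form that is easy to check numerically (a sketch assuming `numpy`; the sample data are invented, and `np.polyfit` serves only as an independent least-squares solver):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 500)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.1, 500)   # invented example data

# Closed-form least-squares coefficients from the normal equations:
# beta2_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),
# beta1_hat = ybar - beta2_hat * xbar.
xbar, ybar = x.mean(), y.mean()
beta2_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta1_hat = ybar - beta2_hat * xbar

slope, intercept = np.polyfit(x, y, 1)   # solves the same least-squares problem
print(beta1_hat, beta2_hat)
```

Both routes solve the same minimization, so the coefficients should agree to numerical precision.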