Determine position of projected points onto a line?


I have a list of points $S$ in the form of $(p, q)$:

$$\begin{align} S = &(43, 58), (44, 60), (40, 60), (41, 61), \\ &(46, 60), (40, 57), (53, 62), (50, 61) \end{align}$$

I wish to center them on the origin $(0, 0)$. I would do this by subtracting the mean of each dimension, $(\bar{p}, \bar{q})$, from every point:

$$\begin{align} \bar{p} &= \frac{p_1 + p_2 + \dots + p_n}{n} \\ \bar{q} &= \frac{q_1 + q_2 + \dots + q_n}{n} \end{align}$$

I find $\bar{p} = 44.625$ and $\bar{q} = 59.875$. I find my new $S$ to be:

$$\begin{align} S_{\text{new}} = &(-1.625, -1.875), (-0.625, 0.125), (-4.625, 0.125), (-3.625, 1.125), \\ &(1.375, 0.125), (-4.625, -2.875), (8.375, 2.125), (5.375, 1.125) \end{align}$$
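The centering step can be checked with a minimal Python sketch. The variable names (`S`, `p_bar`, `q_bar`, `S_new`) are only illustrative and mirror the symbols used above.

```python
# Points from the question, as (p, q) pairs.
S = [(43, 58), (44, 60), (40, 60), (41, 61),
     (46, 60), (40, 57), (53, 62), (50, 61)]

n = len(S)
p_bar = sum(p for p, _ in S) / n  # mean of the first coordinate
q_bar = sum(q for _, q in S) / n  # mean of the second coordinate

# Subtract the mean from every point so the data is centered on the origin.
S_new = [(p - p_bar, q - q_bar) for p, q in S]

print(p_bar, q_bar)
for point in S_new:
    print(point)
```

After centering, the coordinates in each dimension sum to zero, which is a quick sanity check on the result.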

Using linear regression, I've found the line of best fit through the origin for this data set to be $y = 0.26x + 0$. This is the line onto which I want to project the data points at right angles.

enter image description here

My question is, how do I find these projected points (marked as red dots)? Taking point $(1.375, 0.125)$, I can make a triangle with vertices at the origin, the point, and the projected point like so:

enter image description here

I know the slope of $c$ ($0.26$), the position of vertex $ba$ ($(1.375, 0.125)$), and position of vertex $ca$ ($(0, 0)$), but how do I find the position of vertex $cb$?

This is for principal component analysis. To find the eigenvalue, I need the sum of squared distances from projected points to the origin. I've already found the eigenvector to be $\begin{bmatrix}0.96 \\ 0.25\end{bmatrix}$.


There are 2 best solutions below

2
On BEST ANSWER

One way to do this is to take a direction vector of the blue line, in this case $\begin{bmatrix} 1 \\ 0.26 \end{bmatrix}$. You want its norm to be $1$, so you divide it by its norm, $\sqrt{1 + 0.26^2} \approx 1.03$, to get $v = \begin{bmatrix} 0.97 \\ 0.25 \end{bmatrix}$.

Then treat every point as a vector. To get the coordinates of the projection of a point $A$ onto the blue line, you just have to calculate $(A \cdot v)\, v$: the dot product $A \cdot v$ is a scalar, which then scales $v$.

For example, for the point $(-1.625, -1.875)$, you would find:

$$\left( \begin{bmatrix} -1.625 \\ -1.875 \end{bmatrix} \cdot \begin{bmatrix} 0.97 \\ 0.25\end{bmatrix}\right) \begin{bmatrix} 0.97 \\ 0.25 \end{bmatrix} \approx -2.04 \begin{bmatrix} 0.97 \\ 0.25 \end{bmatrix} \approx \begin{bmatrix} -1.98 \\ -0.51 \end{bmatrix}$$
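The same calculation as a minimal Python sketch, assuming the line passes through the origin with slope $0.26$. The function name `project_onto_line` is made up for illustration.

```python
import math

def project_onto_line(point, slope):
    """Orthogonally project `point` onto the line y = slope * x through the origin."""
    norm = math.hypot(1.0, slope)           # length of the direction vector (1, slope)
    v = (1.0 / norm, slope / norm)          # unit direction vector of the line
    a = point[0] * v[0] + point[1] * v[1]   # scalar A . v
    return (a * v[0], a * v[1])             # (A . v) v

proj = project_onto_line((-1.625, -1.875), 0.26)
print(tuple(round(c, 2) for c in proj))  # (-1.98, -0.51)
```

The difference between the original point and its projection is perpendicular to the line, which is what "projecting at right angles" means here.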

0
On

The other answer is totally correct, but I would like to add a bit more explanation. What you would like to do is to transform the original data to a new coordinate frame. Let the new coordinate frame be represented by the vectors $\textbf{v}$ and $\textbf{w}$, as shown in the image below. Because $\textbf{v}$ and $\textbf{w}$ are perpendicular, their dot product is zero: $\textbf{v} \cdot \textbf{w} = 0$. Furthermore, I assume that $\textbf{v}$ is a unit vector (which is not the case in the image due to my bad drawing skills...), so $\textbf{v} \cdot \textbf{v}=1$.

New coordinate frame

Any datapoint $\textbf{x}$ can be represented in the new coordinate frame: $$ \textbf{x} = a \textbf{v} + b \textbf{w}. $$

As you are interested in the position on the line, you are only interested in $a$. To get $a$, we take the dot product of both sides of the equation with $\textbf{v}$: $$ \textbf{x} \cdot \textbf{v} = a \textbf{v} \cdot \textbf{v} + b \textbf{w} \cdot \textbf{v} = a, $$ where I used $\textbf{v} \cdot \textbf{w} = 0$ and $\textbf{v} \cdot \textbf{v}=1$.

The only thing you now need to do is to get the position on the line in the original frame by simply multiplying $a$ with $\textbf{v}$, i.e., $(\textbf{x} \cdot \textbf{v}) \textbf{v}$.

In your case, $\textbf{v} = \frac{1}{\sqrt{1+0.26^2}} \begin{bmatrix} 1 \\ 0.26\end{bmatrix} \approx \begin{bmatrix} 0.97 \\ 0.25 \end{bmatrix}$, and this results in the following projected points (one $(x, y)$ pair per row, in the same order as $S_{\text{new}}$):

-1.9787   -0.5145
-0.5550   -0.1443
-4.3017   -1.1184
-3.1215   -0.8116
 1.3184    0.3428
-5.0323   -1.3084
 8.3622    2.1742
 5.3086    1.3802

New points that are on a line
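A minimal Python sketch that reproduces the table above for all eight centered points, and also computes the sum of squared distances from the projected points to the origin that the question asks about. It assumes the slope $0.26$ from the question; the variable names are illustrative.

```python
import math

# Centered points from the question.
S_new = [(-1.625, -1.875), (-0.625, 0.125), (-4.625, 0.125), (-3.625, 1.125),
         (1.375, 0.125), (-4.625, -2.875), (8.375, 2.125), (5.375, 1.125)]

slope = 0.26
norm = math.hypot(1.0, slope)
v = (1.0 / norm, slope / norm)  # unit vector along the line

# a = x . v is the signed position of each point along the line.
coords = [x * v[0] + y * v[1] for x, y in S_new]

# (x . v) v gives the projected point in the original frame.
projected = [(a * v[0], a * v[1]) for a in coords]

# Because v is a unit vector, the squared distance from the origin
# to each projected point is simply a**2.
sum_sq = sum(a * a for a in coords)

for px, py in projected:
    print(f"{px:8.4f} {py:8.4f}")
print("sum of squared distances:", sum_sq)
```

Working with the scalar $a$ directly avoids recomputing distances from the projected coordinates, since $\lVert a \textbf{v} \rVert^2 = a^2$ when $\textbf{v}$ is a unit vector.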