In linear algebra, the eigenvectors of a matrix are the vectors whose direction doesn't change when the matrix is applied (as a transformation) to the space. But in machine learning (PCA, to be specific), the eigenvectors of a matrix are the directions of maximum variance of the data points. I can't connect the two ideas: how can I tell that the direction that doesn't change is the direction of maximum variance?
Can I say that eigenvectors have two properties: they keep the same direction after the transformation is applied, and they also describe the directions of maximum variance?
Thanks
PCA seeks the linear combination of your variables that has maximum variance, and the assertion is that the coefficients of that best combination constitute an eigenvector for the covariance matrix of the variables. (We assume the coefficients are normalized so that the sum of squares equals 1.)
You can see this in the two-dimensional case. Say you've observed a data set with two variables $x$ and $y$, and suppose the correlation between $x$ and $y$ is $\rho$. For simplicity, assume $x$ and $y$ each have variance $1$. Then the covariance matrix of $x,y$ is $$ \Sigma:=\begin{pmatrix}1&\rho\\\rho &1\end{pmatrix} $$ and the variance of the linear combination $ax+by$ is $$ \operatorname{var}(ax+by)=a^2\operatorname{var}x+b^2\operatorname{var}y + 2ab\operatorname{cov}(x,y)=a^2+b^2+2\rho ab.\tag1 $$

Suppose we want to maximize (1) over all $a,b$ subject to the constraint $a^2+b^2=1$. Using Lagrange multipliers, we form the objective function $$ L(a,b;\lambda):= a^2+b^2+2\rho ab-\lambda(a^2+b^2-1) $$ and take partials with respect to $a, b, \lambda$: $$ {\partial L\over\partial a}=2(a+\rho b -\lambda a)\\ {\partial L\over\partial b}=2(b+\rho a -\lambda b)\\ {\partial L\over\partial \lambda}=-(a^2+b^2-1) $$

The maximum occurs where these partials are all zero. Setting the first two to zero and rearranging into matrix form, we get $$ \begin{pmatrix}1-\lambda &\rho\\\rho&1-\lambda\end{pmatrix} \begin{pmatrix}a\\b\end{pmatrix}= \begin{pmatrix}0\\0\end{pmatrix}, $$ or $$ (\Sigma-\lambda I){\bf v}={\bf 0}\tag2 $$ where we write ${\bf v}$ for the column vector $(a,b)^T$. But (2) says precisely that $(a,b)^T$ is an eigenvector of the covariance matrix $\Sigma$, corresponding to eigenvalue $\lambda$.
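You can check this derivation numerically. A small sketch with numpy (the value $\rho=0.6$ is just an arbitrary example, not from the derivation above): it compares the top eigenvector of $\Sigma$ against a brute-force search for the unit vector $(a,b)=(\cos t,\sin t)$ maximizing formula (1).

```python
import numpy as np

rho = 0.6  # arbitrary example correlation (an assumption for illustration)
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# Eigendecomposition of the covariance matrix (eigh: symmetric matrices,
# eigenvalues returned in ascending order)
eigvals, eigvecs = np.linalg.eigh(Sigma)
v_max = eigvecs[:, np.argmax(eigvals)]  # eigenvector for the largest eigenvalue

# Brute-force search: parametrize unit vectors as (a, b) = (cos t, sin t)
t = np.linspace(0.0, np.pi, 100_000)
a, b = np.cos(t), np.sin(t)
var = a**2 + b**2 + 2 * rho * a * b     # formula (1)
best = np.array([a[np.argmax(var)], b[np.argmax(var)]])

print("eigenvector:", v_max, "eigenvalue:", eigvals.max())
print("brute-force maximizer:", best, "max variance:", var.max())
```

For this $\Sigma$ the eigenvalues are $1\pm\rho$, and both methods land on the direction $(1,1)/\sqrt2$ (possibly up to a sign flip, since eigenvectors are only determined up to sign).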
Note that the covariance matrix has two eigenvalues. The corresponding eigenvectors represent the linear combinations with maximum and minimum variance, and the eigenvalue itself is that variance: if $\Sigma{\bf v}=\lambda{\bf v}$ with $a^2+b^2=1$, then $\operatorname{var}(ax+by)={\bf v}^T\Sigma{\bf v}=\lambda$.
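The same picture holds for actual data, not just the idealized $\Sigma$. A minimal sketch (sample size, seed, and the true covariance are illustrative assumptions): draw correlated 2-D samples, take the sample covariance matrix, and verify that the variance of the data projected onto each eigenvector equals the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data; the true covariance [[1, .6], [.6, 1]] is an example choice
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=50_000)

Sigma = np.cov(X, rowvar=False)          # sample covariance matrix (ddof=1)
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order

for lam, v in zip(eigvals, eigvecs.T):
    proj_var = np.var(X @ v, ddof=1)      # variance of data projected onto v
    print(lam, proj_var)                  # eigenvalue equals projected variance
```

The smallest-eigenvalue direction gives the minimum-variance combination and the largest gives the maximum, exactly as in the two-variable derivation.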