Variance maximization in PCA: doubt about the correctness of an algebraic step.


I am trying to reproduce the steps omitted in Bishop's machine learning book (going from equation (12.4) to (12.5)):


There is only one step whose correctness I doubt, namely whether, given the unit vector $\vec{u}_{1}$ ($\vec{u}_{1}^{T} \vec{u}_{1} = 1$), I can consider the following expression valid:

$\vec{u}_{1}^{T} \vec{u}_{1} = (\vec{u}_{1})^{2}$

Here are all the steps:

$\frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} S \vec{u}_{1} + \lambda_{1} \left ( 1- \vec{u}_{1}^{T} \vec{u}_{1} \right )\right ] = 0$

so:

$= \frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} S \vec{u}_{1} + \lambda_{1}- \lambda_{1} \vec{u}_{1}^{T} \vec{u}_{1} \right ]$

$= \frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} S \vec{u}_{1} \right ] + \frac{\partial }{\partial \vec{u}_{1}}\left [\lambda_{1} \right ] - \lambda_{1} \frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} \vec{u}_{1} \right ]$

$ = \frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} S \vec{u}_{1} \right ] - \lambda_{1} \frac{\partial }{\partial \vec{u}_{1}}\left [\vec{u}_{1}^{T} \vec{u}_{1} \right ]$

$ = 2S \vec{u}_{1} - \lambda_{1} \frac{\partial }{\partial \vec{u}_{1}}\left [ \left ( \vec{u}_{1} \right )^{2}\right ]$

$= 2S \vec{u}_{1} - 2\lambda_{1} \vec{u}_{1} = 0$

so:

$S \vec{u}_{1} - \lambda_{1} \vec{u}_{1} = 0$

$S \vec{u}_{1} = \lambda_{1} \vec{u}_{1}$
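The conclusion of the derivation can be checked numerically. The following NumPy sketch (not part of Bishop's text; the random matrix `S` is a stand-in for the data covariance matrix) verifies that the eigenvector of the largest eigenvalue satisfies the stationarity condition and maximizes the quadratic form $\vec{u}^{T} S \vec{u}$ over unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
S = A @ A.T  # symmetric positive semi-definite, like a covariance matrix

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]   # eigenvector of the largest eigenvalue
lam1 = eigvals[-1]

# Stationarity condition derived above: S u1 = lambda1 u1
assert np.allclose(S @ u1, lam1 * u1)

# u1 attains at least as much variance as any random unit vector
for _ in range(1000):
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= u1 @ S @ u1 + 1e-9
```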

1 Answer
A good reference is *Matrix Algebra* (Econometric Exercises, Book 1) by Karim Abadir and Jan Magnus. Chapter 13, Section 2 treats scalar functions of a vector. In this case you have $\phi(u_1)=u_1^Tu_1$, so the differential is
$$ d\phi = (du_1)^T u_1 + u_1^T du_1 = u_1^T du_1 + u_1^T du_1 = 2u_1^T du_1. $$
In the same way, for $\theta(u_1)=u_1^T S u_1$ we have
$$ d\theta = (du_1)^T S u_1 + u_1^T S\, du_1 = u_1^T S^T du_1 + u_1^T S\, du_1 = u_1^T (S^T+S)\, du_1. $$
If $S^T=S$ (i.e. $S$ is symmetric), this reduces to
$$ d\theta = 2u_1^T S\, du_1. $$
The stationarity condition therefore requires
$$ 2u_1^T S - 2\lambda_1 u_1^T = 0, $$
or
$$ u_1^T S = \lambda_1 u_1^T. $$
Take the transpose of both sides and use the property $S^T=S$ to obtain
$$ S u_1 = \lambda_1 u_1. $$
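The differential identities above can also be confirmed numerically. This small sketch (my own check, not from Abadir and Magnus) compares the analytic gradient $(S^T + S)\,u$ of $\theta(u)=u^T S u$ against central finite differences, for a deliberately non-symmetric $S$, and then shows the symmetric case reduces to $2Su$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
S = rng.standard_normal((n, n))  # deliberately non-symmetric
u = rng.standard_normal(n)

def grad_quadratic(S, u):
    # Analytic gradient of theta(u) = u^T S u, valid for any square S
    return (S + S.T) @ u

# Independent check via central finite differences, coordinate by coordinate
eps = 1e-6
num = np.array([
    ((u + eps * e) @ S @ (u + eps * e)
     - (u - eps * e) @ S @ (u - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
assert np.allclose(num, grad_quadratic(S, u), atol=1e-5)

# When S is symmetric the gradient is 2 S u, as used in the derivation
Ssym = S + S.T
assert np.allclose(grad_quadratic(Ssym, u), 2 * Ssym @ u)
```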