Given a projection $M = I - \frac{ww^T}{\|w\|^2}$, why is the result of applying $M$ orthogonal to $w$?

I am reading the paper on weight normalization by Salimans & Kingma. In this paper, the weight vector is reparameterized as $w = \frac{g}{\|v\|}v$, where $g$ is the norm of $w$ and $v$ gives its direction. The gradient with respect to $v$ is calculated as

$$\nabla_v L = \frac{g}{\|v\|} M \nabla_wL$$

where $ M := I - \frac{ww^T}{\|w\|^2} $

The paper then states: because $\Delta v \propto \nabla_vL$ (steepest descent/ascent), $\Delta v$ is necessarily orthogonal to the weight $w$, since $M$ projects away from $w$ when calculating $\nabla_v L$.

I don't quite get this sentence. Why must $\Delta v$ 'necessarily' be orthogonal to the weight $w$? Does the projection matrix $M$ have some special form?

There are 2 best solutions below

0
On BEST ANSWER

After going through the paper to understand the notation: the update rule for $v$ is

$$ v' = v + \Delta v $$

We use vanilla gradient descent (referred to as steepest descent in the paper) for optimization. Assuming the step size is $\eta$,

$$\Delta v = -\eta\,\nabla_vL= -\eta\,\frac{g}{||v||} M \nabla_w L$$

$\Delta v$ is orthogonal to the weight $w$ because $M$ is a projection matrix that projects onto the orthogonal complement of the span of $w$. Mathematically,

$$ w^T\Delta v=-\eta\frac{g}{||v||}w^T\left( I - \frac{ww^T}{||w||^2}\right)\nabla_wL $$

$$ w^T\Delta v=-\eta\frac{g}{||v||}\left( w^T - \frac{||w||^2w^T}{||w||^2}\right)\nabla_wL=0 $$

Hence, $\Delta v$ is orthogonal to the original weight vector $w$.
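
The calculation above can also be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the dimension, the random stand-in for $\nabla_w L$, the scaling of $v$, and the step size `eta` are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

w = rng.normal(size=n)        # arbitrary nonzero weight vector
grad_w = rng.normal(size=n)   # random stand-in for nabla_w L (not a real loss gradient)

# Assumed parameterization consistent with w = g * v / ||v||:
v = 0.5 * w                   # v points in the same direction as w (scale is arbitrary)
g = np.linalg.norm(w)         # g = ||w||
eta = 0.1                     # hypothetical step size

# M = I - w w^T / ||w||^2
M = np.eye(n) - np.outer(w, w) / (w @ w)

# Delta v = -eta * (g / ||v||) * M * nabla_w L
delta_v = -eta * (g / np.linalg.norm(v)) * (M @ grad_w)

# Inner product of w with the update; zero up to floating-point rounding
dot = w @ delta_v
print(abs(dot) < 1e-12)
```

Any other nonzero `w` and any other `grad_w` should give the same result, since the argument does not depend on the particular values.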

3
On

Yes, the matrix $M$ is special. To see what it does, let's set $$P_w:=\frac{ww^T}{\|w\|^2}.$$ If you take any vector $a\in\mathbb{R}^n$, you have $$P_wa=\frac{ww^T}{\|w\|^2}a=\frac{w^Ta}{\|w\|^2}\,w,$$ where $w^Ta$ is the scalar product of $w$ and $a$. This tells you that, for every $a$, the vector $P_wa$ lives in the linear subspace generated by $w$; that is, $P_w$ projects vectors onto the linear subspace generated by $w$.
Now you are done, because $M = I - P_w$ projects vectors onto the linear subspace orthogonal to $w$: for every $a$, $$w^TMa = w^Ta - \frac{(w^Tw)(w^Ta)}{\|w\|^2} = 0.$$ Thus $\nabla_v L$ lives in the subspace orthogonal to $w$.
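
The decomposition described in this answer can be illustrated with a short NumPy sketch. The vectors `w` and `a` are random and purely hypothetical; the checks only confirm the properties stated above for this one example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

w = rng.normal(size=n)   # arbitrary nonzero vector defining the projection
a = rng.normal(size=n)   # arbitrary vector to be projected

P = np.outer(w, w) / (w @ w)   # P_w = w w^T / ||w||^2
M = np.eye(n) - P              # M = I - P_w

Pa = P @ a
Ma = M @ a

# P_w a is a scalar multiple of w, with coefficient (w^T a) / ||w||^2
coeff = (w @ a) / (w @ w)
parallel_ok = np.allclose(Pa, coeff * w)

# M a is orthogonal to w (up to floating-point rounding)
orthogonal_ok = np.isclose(w @ Ma, 0.0, atol=1e-12)

# The two pieces add back up to a, and applying M twice changes nothing
decomposition_ok = np.allclose(Pa + Ma, a)
idempotent_ok = np.allclose(M @ M, M)

print(parallel_ok, orthogonal_ok, decomposition_ok, idempotent_ok)
```

The last check reflects the defining property of a projection: once a vector has been moved into the subspace orthogonal to $w$, applying $M$ again leaves it unchanged.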