Partial derivatives of an optimization objective


I'm attempting to use stochastic gradient descent for a problem, but I am stuck deriving the update rules. Specifically, I am confused by the partial derivatives below; any help would be appreciated!

I will only show the part of the equation that I am stuck on, so as not to complicate my question unnecessarily.

$\mathcal{L} = ||\mathbf{Wu - v}||_{Fro}^{2}$

where $\mathbf{W} \in \mathbb{R}^{n \times n}$ and $\mathbf{u,v} \in \mathbb{R}^{n \times 1}$
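For concreteness, here is a minimal plain-Python sketch of this loss (the helper names `matvec` and `loss` and the example values are hypothetical, chosen only for illustration). For an $n \times 1$ vector, the squared Frobenius norm is just the sum of squared entries.

```python
# Hypothetical sketch: L = ||W u - v||^2 using plain lists (no libraries).
# For an n x 1 vector, the squared Frobenius norm equals the sum of squares.

def matvec(W, u):
    """Return W @ u for a list-of-rows matrix W and a list vector u."""
    return [sum(w_ij * u_j for w_ij, u_j in zip(row, u)) for row in W]

def loss(W, u, v):
    """Sum of squared entries of the residual r = W u - v."""
    r = [wu_i - v_i for wu_i, v_i in zip(matvec(W, u), v)]
    return sum(r_i * r_i for r_i in r)

W = [[1.0, 2.0], [3.0, 4.0]]
u = [1.0, 1.0]
v = [0.0, 0.0]
print(loss(W, u, v))  # W u = [3, 7], so L = 9 + 49 = 58.0
```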

I'm trying to compute the following:

$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = $ ?

$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = $ ?

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = $ ?

Thank you!


Edit: Below is my progress so far.

Let $f(\mathbf{x}) = ||\mathbf{x}||^2_{Fro}$ and $g(\mathbf{W}) = \mathbf{Wu -v}$

then $f' = 2\mathbf{x}$ (since $f(\mathbf{x}) = \mathbf{x}^\top\mathbf{x}$) and $g' = \mathbf{u}^\top$
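Componentwise, the step $f' = 2\mathbf{x}$ can be checked as follows, reading $f'$ as the gradient of $f$ (a column vector):

$$f(\mathbf{x}) = \mathbf{x}^\top\mathbf{x} = \sum_{i=1}^{n} x_i^2 \;\Longrightarrow\; \frac{\partial f}{\partial x_k} = 2x_k \;\Longrightarrow\; \nabla f(\mathbf{x}) = 2\mathbf{x} \in \mathbb{R}^{n \times 1}$$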

$\mathcal{L} = f(g(\mathbf{W}))$, so by the chain rule

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = f'(\mathbf{Wu -v})(\mathbf{u^\top}) = 2(\mathbf{Wu -v})(\mathbf{u^\top})$

...which makes sense, since this yields an $n \times n$ matrix with which to update $\mathbf{W}$.
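As a sanity check, a minimal plain-Python sketch (the helpers `grad_W` and `numeric_grad_W` and the test values are hypothetical) can compare $2(\mathbf{Wu - v})\mathbf{u}^\top$ with central finite differences of $\mathcal{L}$. Because $\mathcal{L}$ is quadratic in $\mathbf{W}$, central differences agree with the exact gradient up to rounding error.

```python
# Hypothetical sketch: compare the analytic gradient 2 (W u - v) u^T
# with a central-difference estimate of dL/dW, entry by entry.

def matvec(W, u):
    """Return W @ u for a list-of-rows matrix W and a list vector u."""
    return [sum(a * b for a, b in zip(row, u)) for row in W]

def loss(W, u, v):
    """L = ||W u - v||^2."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return sum(x * x for x in r)

def grad_W(W, u, v):
    """Analytic gradient: the n x n outer product 2 r u^T with r = W u - v."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return [[2.0 * r_i * u_j for u_j in u] for r_i in r]

def numeric_grad_W(W, u, v, h=1e-6):
    """Central-difference estimate of dL/dW_ij for every entry (i, j)."""
    n, m = len(W), len(W[0])
    G = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += h
            Wm[i][j] -= h
            G[i][j] = (loss(Wp, u, v) - loss(Wm, u, v)) / (2 * h)
    return G

W = [[1.0, 2.0], [3.0, 4.0]]
u = [0.5, -1.0]
v = [1.0, 2.0]
analytic = grad_W(W, u, v)
numeric = numeric_grad_W(W, u, v)
max_err = max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb))
print(analytic)        # [[-2.5, 5.0], [-4.5, 9.0]]
print(max_err < 1e-5)  # True
```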


Similarly, keep $f(\mathbf{x})$ but let $g(\mathbf{v})=\mathbf{Wu -v}$

then $f' = 2\mathbf{x}$ (since $f(\mathbf{x}) = \mathbf{x}^\top\mathbf{x}$) and $g' = -1$?

$\mathcal{L} = f(g(\mathbf{v}))$, so by the chain rule

$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = f'(\mathbf{Wu -v})(-1) = -2(\mathbf{Wu -v})$

...this results in an $n \times 1$ vector with which to update $\mathbf{v}$.
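The same kind of finite-difference check can be applied to $-2(\mathbf{Wu - v})$. This is a hypothetical plain-Python sketch: `grad_v`, `numeric_grad`, and the test values are illustrative assumptions, not part of the original question.

```python
# Hypothetical sketch: check dL/dv = -2 (W u - v) against central differences.

def matvec(W, u):
    """Return W @ u for a list-of-rows matrix W and a list vector u."""
    return [sum(a * b for a, b in zip(row, u)) for row in W]

def loss(W, u, v):
    """L = ||W u - v||^2."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return sum(x * x for x in r)

def grad_v(W, u, v):
    """Analytic gradient -2 r with r = W u - v (an n x 1 vector)."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return [-2.0 * r_i for r_i in r]

def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the list x."""
    g = []
    for k in range(len(x)):
        xp, xm = x[:], x[:]
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

W = [[1.0, 2.0], [3.0, 4.0]]
u = [0.5, -1.0]
v = [1.0, 2.0]
analytic = grad_v(W, u, v)
numeric = numeric_grad(lambda vv: loss(W, u, vv), v)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic)        # [5.0, 9.0]
print(max_err < 1e-5)  # True
```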


Similarly, keep $f(\mathbf{x})$ but let $g(\mathbf{u})=\mathbf{Wu -v}$

then $f' = 2\mathbf{x}$ (since $f(\mathbf{x}) = \mathbf{x}^\top\mathbf{x}$) and $g' = \mathbf{W}^\top$?

$\mathcal{L} = f(g(\mathbf{u}))$, so by the chain rule

$\frac{\partial \mathcal{L}}{\partial \mathbf{u}} = f'(\mathbf{Wu -v})(\mathbf{W^\top}) = 2(\mathbf{Wu -v})(\mathbf{W^\top})$

...however, this gives a vector-matrix product whose dimensions do not match ($n \times 1$ times $n \times n$).

If it were instead $2(\mathbf{W^\top})(\mathbf{Wu -v})$, it would give a properly formatted $n \times 1$ vector with which to update $\mathbf{u}$; however, I'm worried that I do not understand why this reordering is justified.
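Componentwise, writing $\mathbf{r} = \mathbf{Wu - v}$ so that $\mathcal{L} = \sum_i r_i^2$ and $\partial r_i / \partial u_j = W_{ij}$, one gets $\partial \mathcal{L} / \partial u_j = 2\sum_i W_{ij} r_i = 2(\mathbf{W}^\top \mathbf{r})_j$, which supports the reordered form. A minimal plain-Python sketch (helper names and test values are hypothetical) compares this with central finite differences and with the same product taken without the transpose.

```python
# Hypothetical sketch: check dL/du = 2 W^T (W u - v) against central differences,
# and show that dropping the transpose gives a different (wrong) vector.

def matvec(W, u):
    """Return W @ u for a list-of-rows matrix W and a list vector u."""
    return [sum(a * b for a, b in zip(row, u)) for row in W]

def loss(W, u, v):
    """L = ||W u - v||^2."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return sum(x * x for x in r)

def grad_u(W, u, v):
    """Analytic gradient 2 W^T r with r = W u - v; entry j sums over rows i."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    n, m = len(W), len(W[0])
    return [2.0 * sum(W[i][j] * r[i] for i in range(n)) for j in range(m)]

def grad_u_no_transpose(W, u, v):
    """For comparison only: 2 W r (transpose omitted), which is not the gradient."""
    r = [a - b for a, b in zip(matvec(W, u), v)]
    return [2.0 * x for x in matvec(W, r)]

def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the list x."""
    g = []
    for k in range(len(x)):
        xp, xm = x[:], x[:]
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

W = [[1.0, 2.0], [3.0, 4.0]]
u = [0.5, -1.0]
v = [1.0, 2.0]
analytic = grad_u(W, u, v)
numeric = numeric_grad(lambda uu: loss(W, uu, v), u)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic)                         # [-32.0, -46.0]
print(max_err < 1e-5)                   # True
print(grad_u_no_transpose(W, u, v))     # [-23.0, -51.0]
```

The mismatch in the last line shows that the transpose is not just a bookkeeping device for the dimensions: with a non-symmetric $\mathbf{W}$, only $\mathbf{W}^\top$ reproduces the numerical derivative.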