Stumbled across this example in a video:
minimise $x^Tx$
subject to $Ax = b$
First step:
$L(x,\nu) = x^Tx+\nu^T(Ax-b)$
$\nabla_xL(x,\nu) = 2x + A^T\nu = 0 \Rightarrow x = - \frac{1}{2}A^T\nu$
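(Presumably the derivation then continues by substituting this into the constraint: assuming $A$ has full row rank, so that $AA^T$ is invertible, $Ax = -\frac{1}{2}AA^T\nu = b$ gives $\nu = -2(AA^T)^{-1}b$ and thus $x = A^T(AA^T)^{-1}b$. But my question is about the gradient step itself.)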
Alright, I'm not used to calculating the gradient like that.
I can see
$\nabla_x\, x^Tx = \nabla_x(x_1^2 + x_2^2 + \cdots + x_n^2) = (2x_1, 2x_2, \ldots, 2x_n)^T = 2x$
And
$\nabla_x \nu^Tb = 0$
makes sense, I guess.
I don't immediately see
$\nabla_x\nu ^TAx = A^T\nu$
though.
I can try
$\nabla_x\,\nu^T Ax = \nabla_x\, \nu^T\begin{bmatrix}((A^T)_1)^T x\\ ((A^T)_2)^T x\\ \vdots\\ ((A^T)_n)^T x\end{bmatrix} = \nabla_x\big[((A^T)_1)^T x\cdot \nu_1 + ((A^T)_2)^T x\cdot \nu_2 + \cdots + ((A^T)_n)^T x\cdot \nu_n\big]$
But apart from my not knowing where to go from here, that also looks very wrong.
So what's happening here?
And are there calculation rules for vector formulae that one can follow without having to break everything down into individual components?
For $L(x,\nu) = x^Tx + \nu^T(Ax - b)$, with $\partial_i = \partial/\partial x_i$ we have
\begin{align}
\partial_i L(x, \nu)
&= \partial_i \left( \sum_k x_k^2 + \sum_j \nu_j \left( \sum_k a_{jk} x_k - b_j \right) \right) \\
&= \sum_k \partial_i x_k^2 + \sum_j \nu_j \left( \sum_k a_{jk}\,\partial_i x_k - \partial_i b_j \right) \\
&= \sum_k 2 x_k \delta_{ik} + \sum_j \nu_j \sum_k a_{jk} \delta_{ik} \\
&= 2 x_i + \sum_j \nu_j a_{ji} \\
&= 2 x_i + \sum_j (A^T)_{ij}\, \nu_j
\end{align}
or
$$\nabla_x L(x, \nu) = 2 x + A^T \nu.$$
Note: I usually omit the summations and use the Einstein summation convention (summing over indices that occur twice).
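As for calculation rules that avoid components entirely: the standard identity $\nabla_x(c^T x) = c$ for any constant vector $c$ settles the term in question, because the scalar $\nu^T A x$ equals its own transpose:
$$\nu^T A x = x^T A^T \nu = (A^T \nu)^T x \quad\Longrightarrow\quad \nabla_x\,\nu^T A x = A^T \nu.$$
Together with $\nabla_x(x^T x) = 2x$ (the special case $B = I$ of $\nabla_x(x^T B x) = (B + B^T)x$), this reproduces the gradient without any index bookkeeping.

And here is a quick numerical sanity check of the result, a minimal sketch in NumPy with randomly generated $A$, $b$, $x$, and $\nu$ (the dimensions and the step size `h` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5

# Arbitrary random problem data; any A, b, x, nu would do for the check.
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
nu = rng.standard_normal(m)

def L(x):
    # Lagrangian L(x, nu) = x^T x + nu^T (A x - b), with nu held fixed
    return x @ x + nu @ (A @ x - b)

# Analytic gradient: 2x + A^T nu
grad_analytic = 2 * x + A.T @ nu

# Central finite differences, one coordinate direction at a time
h = 1e-6
grad_numeric = np.array([(L(x + h * e) - L(x - h * e)) / (2 * h)
                         for e in np.eye(n)])

print(np.allclose(grad_analytic, grad_numeric))  # expect: True
```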