In the book "Sparse and Redundant Representations", the following optimization problem is stated:
$$\min_x J(x) \quad \text{subject to} \quad b=Ax$$
then, for $J(x)=\|x\|_2^2$, the following Lagrangian is defined using Lagrange multipliers:
$$\mathcal{L}(x)= \|x\|_2^2+\lambda^T (Ax-b)$$
where $\lambda$ is the vector of Lagrange multipliers for the constraint set. The following condition is then obtained by differentiating $\mathcal{L}(x)$ with respect to $x$:
$$\frac{\partial\mathcal{L}(x)}{\partial(x)}=2x+A^T\lambda$$
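As a quick sanity check (my own, not from either book), this gradient can be compared against central finite differences on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                        # A is n x m, x in R^m, lambda in R^n
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)
lam = rng.standard_normal(n)
x = rng.standard_normal(m)

def lagrangian(x):
    # L(x) = ||x||_2^2 + lambda^T (A x - b)
    return x @ x + lam @ (A @ x - b)

grad_analytic = 2 * x + A.T @ lam  # the claimed gradient

# central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(lagrangian(x + eps * e) - lagrangian(x - eps * e)) / (2 * eps)
                    for e in np.eye(m)])

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

The check passes with $A^T\lambda$ but fails if you replace that term with anything of the wrong shape or transpose, which previews the answer to the first question.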
- My first question is: why does the derivative contain $A^T\lambda$? Shouldn't it be $\lambda^TA$ instead?
The above optimization problem is treated differently in the book "Convex Optimization" by Stephen Boyd:
For the following optimization problem: $$\min_x f_0(x)\\\text{subject to}\quad h_i(x)=0, \quad i = 1, . . . , p$$
The Lagrangian associated with the above problem is:
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{p}\nu_ih_i(x)$$
where $\lambda_i$ is referred to as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x) \le 0$.
- My second question is: why is the Lagrangian defined differently in the two books?
Q1 (already answered by Michael Grant)
The derivative of $\langle \lambda ,A x\rangle $ with respect to $x$ is $A^T \lambda $. Maybe it becomes clearer if you look at the following:
$$ \langle \lambda , Ax \rangle = \sum_{i=1}^n \lambda_i \sum_{j=1}^m A_{ij} x_j$$
so the $j$th element of the gradient is the derivative of the above with respect to $x_j$, which is
$$ \sum_{i=1}^n \lambda_i A_{ij} = (A^T\lambda)_j$$
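This identity is easy to confirm numerically; a minimal illustration (my own example, using NumPy) that computes the double sum by hand and compares it to $A^T\lambda$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
A = rng.standard_normal((n, m))
lam = rng.standard_normal(n)

# j-th gradient entry built directly from the sum: sum_i lambda_i * A_ij
grad_from_sum = np.array([sum(lam[i] * A[i, j] for i in range(n))
                          for j in range(m)])

assert np.allclose(grad_from_sum, A.T @ lam)  # matches (A^T lambda)_j for every j
```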
Q2: For your second question, the two Lagrangians are essentially the same (except that the first book considers only an equality constraint, while in the second I suspect both equality and inequality constraints are considered (*)). Just write:
$$ h_i(x) = (Ax -b)_i$$
and $\nu_i=\lambda_i$. Then, in the first case, you have $\lambda^T(Ax-b) = \sum_{i=1}^p \lambda_i(Ax-b)_i$, and in the second, $\nu^T(Ax-b) = \sum_{i=1}^p \nu_i(Ax-b)_i = \sum_{i=1}^p \nu_i h_i(x)$: indeed, exactly the same thing.
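If it helps, the equivalence is just the definition of a dot product; a tiny numerical check (my own example):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 6
A = rng.standard_normal((p, m))
b = rng.standard_normal(p)
x = rng.standard_normal(m)
lam = rng.standard_normal(p)   # playing the role of both lambda and nu

h = A @ x - b                  # h_i(x) = (A x - b)_i

# lambda^T (A x - b)  versus  sum_i nu_i h_i(x)
assert np.isclose(lam @ h, sum(lam[i] * h[i] for i in range(p)))
```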
(*) By the way, your edits are a bit confusing here: either (a) add the sum corresponding to the inequality constraints and keep the sentence below it, or (b) remove the sum (as you have done) together with the sentence below it and the $\lambda$ in the Lagrangian.