Derivative of a quadratic form where the vector is exponentiated

46 Views Asked by At

$\newcommand{\vx}{\mathbf{x}}$ While solving a certain problem, I bumped into the following Lagrangian. $$ \mathcal{L}(\vx, \lambda) = \frac{(e^\vx) ^\top A e^\vx}{(e^\vx)^\top B e^\vx} - \lambda ((e^\vx)^\top \mathbf{1} - 1) $$ Here $\vx\in\mathbb{R}^d$, and $e^\vx$ means that we exponentiate each of its elements. $\lambda$ is the Lagrange multiplier for the condition that the elements of $e^\vx$ sum up to one. I would like to solve the optimization problem and I am therefore interested in $\nabla_\vx \mathcal{L}(\vx, \lambda)$.

Attempt at the derivative

I have never bumped into a problem like this, so I am unsure how to take derivatives properly.

$$ \begin{align} \nabla_\vx \left[(e^\vx) ^\top A e^\vx\right] &= (e^\vx \mathrm{I})(2 A e^\vx) = 2(e^\vx)^\top Ae^\vx \\ \nabla_\vx\left[(e^\vx) ^\top B e^\vx\right] &= 2 (e^\vx)^\top B e^\vx \end{align} $$ which would give $$ \nabla_\vx \mathcal{L}(\vx, \lambda) = \frac{2((e^\vx)^\top A e^\vx)((e^\vx)^\top B e^\vx) - 2((e^\vx)^\top B e^\vx)((e^\vx)^\top A e^\vx)}{4 ((e^\vx)^\top B e^\vx)^2} - \lambda e^\vx = -\lambda e^\vx $$ but surely this is incorrect

2

There are 2 best solutions below

0
On

$\newcommand{\vx}{\mathbf{x}}$ I think the problem arises with how you compute $\nabla_\vx e^\vx$.

Here is what I think is the correct way to compute the derivative: $$ \begin{align} \nabla_\vx \left[(e^\vx) ^\top A e^\vx\right] &= \frac{\partial e^\vx}{\partial\vx}A e^\vx + \frac{\partial A e^\vx}{\partial\vx} e^\vx \\ &= \begin{bmatrix} \frac{\partial e^{x_1}}{\partial x_1} & \frac{\partial e^{x_2}}{\partial x_1} & \cdots &\frac{\partial e^{x_n}}{\partial x_1} \\ \frac{\partial e^{x_1}}{\partial x_2} & \frac{\partial e^{x_2}}{\partial x_2} & \cdots &\frac{\partial e^{x_n}}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial e^{x_1}}{\partial x_n} & \frac{\partial e^{x_2}}{\partial x_n} & \cdots &\frac{\partial e^{x_n}}{\partial x_n} \end{bmatrix} A e^\vx + \frac{\partial e^\vx}{\partial\vx}A ^\top e^\vx \\ &= \begin{bmatrix} e^{x_1} & 0 & \cdots & 0 \\ 0 & e^{x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0& \cdots & e^{x_n} \end{bmatrix} A e^\vx + \begin{bmatrix} e^{x_1} & 0 & \cdots & 0 \\ 0 & e^{x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0& \cdots & e^{x_n} \end{bmatrix} A ^\top e^\vx \\ &=\left( \begin{bmatrix} e^{x_1} & 0 & \cdots & 0 \\ 0 & e^{x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0& \cdots & e^{x_n} \end{bmatrix} A + \begin{bmatrix} e^{x_1} & 0 & \cdots & 0 \\ 0 & e^{x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0& \cdots & e^{x_n} \end{bmatrix} A ^\top \right)e^\vx \\ \end{align} $$

0
On

$ \def\h{\odot} \def\o{{\tt1}} \def\a{\alpha} \def\b{\beta} \def\l{\lambda} \def\p{\partial} \def\L{{\large\cal L}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\bR#1{\big(#1\big)} \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\c#1{\color{red}{#1}} \def\cFp{\c{F'}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $A scalar function $\phi(z)$ and its derivative $\phi'(z)$ can be applied element-wise to a vector argument $x$ to generate vector-valued results $$\eqalign{ f = \phi(x), \qquad f' = \phi'(x) \\ }$$ The $\c{\rm differential}$ of such an element-wise function can be written using the element-wise product $(\h),\,$ which in turn can be replaced by a diagonal matrix
$$\eqalign{ F' &= \Diag{f'} \qiq f' = F'\o,\quad \o^TF' = \LR{f'}^T \\ \c{df} &= f'\odot dx \;\equiv\; \c{F'\,dx} \\ \o^Tdf &= \o^TF'\,dx \,\equiv\; \LR{f'}^Tdx \\ }$$ For the exponential function, things are somewhat simplified since $\:f'(x)=f(x)=e^x$

For typing convenience, assume that $(A,B)$ are symmetric and define the scalars $$\eqalign{ \a &= f^TAf &\qiq d\a = 2f^TA\,\c{df} &\,=\,2f^TA\c{F'\,dx} &\,=\,\LR{2F'Af}^Tdx \\ \b &= f^TBf &\qiq d\b = 2f^TB\,\c{df} &\,=\,2f^TB\c{F'\,dx} &\,=\,\LR{2F'Bf}^Tdx \\ }$$ Write the Lagrangian using the above notation, then calculate its gradient $$\eqalign{ \L &= \b^{-1}\a - \l\LR{\o^Tf-1} \\ d\L &= \fracLR{ \b\,d\a-\a\,d\b}{\b^2} \;-\; \l\LR{\o^Tdf} \\ &= \fracLR{ 2\b F'Af-2\a F'Bf}{\b^2}^Tdx \;-\; \LR{\l f'}^Tdx \\ &= 2\b^{-1}\BR{\cFp Af-\L\cFp Bf}^Tdx \;-\; \LR{\l\cFp \o}^Tdx \\ \grad{\L}{x} &= \fracLR2{f^TBf}\cdot\c{\Diag{e^x}}\cdot\BR{Af-\L Bf - \l\o} \\ }$$