How should I go about solving this Lagrange equation containing vectors and matrices?


I have a Lagrangian that depends on a multiplier $\lambda$: \begin{equation} f(\lambda)=a^HR^{-1}a + \lambda(||a-\bar{a}||^2-\varepsilon) \end{equation} I know $\bar{a}$, $R$ and $\varepsilon$. Here $a$ and $\bar{a}$ are $M$-dimensional complex column vectors, $R$ is a real, symmetric $M\times M$ matrix, and $x^H$ denotes the conjugate transpose of $x$.

I have tried the conventional approach of setting $\frac{\partial f}{\partial\lambda}=0$ and $\nabla_af=0$, which results in \begin{equation} a^HR^{-1} = \lambda(a-\bar{a}) \end{equation} \begin{equation} ||a-\bar{a}||^2=(a-\bar{a})^H (a-\bar{a})=\varepsilon \end{equation} The second equation is just my constraint, which I was already aware of. As far as I can tell this system can't be solved - at least not with any technique I know.

My aim is to eliminate $a$ so that I have an equation made up of known quantities and $\lambda$, which I can solve numerically for $\lambda$. Every approach I've tried has ended in a dead end, and I'm curious if there's a technique I'm unaware of that could help me out.
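For concreteness, here is the kind of numerical solve I have in mind, with random placeholder data for $R$, $\bar a$ and $\varepsilon$, and under the assumption that the gradient condition can be rearranged (conjugate-transposing, with $R$ real symmetric and $\lambda$ real) to $R^{-1}a = \lambda(a-\bar{a})$; that rearrangement gives $a-\bar{a}=(\lambda R-I)^{-1}\bar{a}$, so the constraint becomes a scalar equation in $\lambda$ alone:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
M = 4
# Placeholder data: R real symmetric positive definite, a_bar a known complex vector
B = rng.standard_normal((M, M))
R = B @ B.T + M * np.eye(M)
a_bar = rng.standard_normal(M) + 1j * rng.standard_normal(M)
eps = 0.1

def g(lam):
    # From R^{-1} a = lam (a - a_bar):  a - a_bar = (lam R - I)^{-1} a_bar,
    # so the constraint ||a - a_bar||^2 = eps depends on lam alone.
    d = np.linalg.solve(lam * R - np.eye(M), a_bar)
    return np.linalg.norm(d) ** 2

# On (1/gamma_min, inf), gamma_min the smallest eigenvalue of R, every eigenvalue
# of (lam R - I) is positive and increasing, so g decreases from +inf to 0 there.
lam_lo = (1.0 / np.linalg.eigvalsh(R)[0]) * (1 + 1e-9)
lam_hi = lam_lo + 1.0
while g(lam_hi) > eps:          # grow the bracket until g drops below eps
    lam_hi *= 2.0
lam = brentq(lambda t: g(t) - eps, lam_lo, lam_hi)

a = a_bar + np.linalg.solve(lam * R - np.eye(M), a_bar)   # recover a from lam
```

Since $g$ is monotone on that interval, the bracket contains exactly one root, so the bracketing root-finder is safe.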

The Lagrange equation comes from Appendix A of [1].

[1] https://ieeexplore.ieee.org/document/6381484

I'll show what I believe is the correct way to find the stationary points, since I think your system of equations isn't right. Note first that if you take the gradient with respect to $\mathbf a$ alone (holding $\mathbf a^H$ fixed), the Hessian comes out identically $\mathbf 0$: $\mathbf{a^HR^{-1}a}$ differentiated once with respect to strictly $\mathbf a$ gives $\mathbf{a^HR^{-1}}$, which differentiated again gives $\mathbf 0$. So, viewed as a function of $\mathbf a$ alone, it is essentially a hyperplane in $M$ dimensions, which has no absolute maxima or minima. Let's start:

Since $\mathbf{a}\in \mathbb{C}^M$, each component $a_i$ can be written as $x_i+iy_i$.

Your original Lagrangian was

$$f(\lambda)=\mathbf{a^HR^{-1}a}+\lambda(||\mathbf{a-\bar{a}}||^2-\varepsilon)$$

The Lagrange multiplier method tells us to impose

$$\boldsymbol{\nabla}_{\mathbf a}f=\mathbf 0\ \ \text{and}\ \ \dfrac{\partial f}{\partial\lambda}=0.$$

The latter just returns the constraint, so I'll focus on the gradient.

In index notation (summing over repeated indices), the first term can be expressed as $$\mathbf{a^HR^{-1}a}=R_{ij}^{-1}\bar{a_i}a_j,$$ where $i,j\in\{1,\dots,M\}$ index the matrix's rows and columns, respectively.

Now, taking the gradient of this, we get

$$(\boldsymbol{\nabla}_{\mathbf a}(\mathbf{a^HR^{-1}a}))_k=\dfrac{\partial}{\partial a_k}[R_{ij}^{-1}\bar{a_i}a_j]=R_{ij}^{-1}\bar{a_i}\dfrac{\partial a_j}{\partial a_k}=R_{ij}^{-1}\bar{a_i}\delta_k^j=R_{ik}^{-1}\bar{a_i},$$

for $k\in\{1,\dots,M\}.$
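As a sanity check (not part of the derivation), this index-notation gradient can be verified numerically, under the convention that $\partial/\partial a_k$ is the Wirtinger derivative $\tfrac12(\partial/\partial x_k - i\,\partial/\partial y_k)$ with $\bar a_i$ held fixed; the matrix below is random placeholder data standing in for $R^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3
B = rng.standard_normal((M, M))
Rinv = B @ B.T + M * np.eye(M)        # stand-in for R^{-1}: real, symmetric
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)

def f(x, y):
    # a^H R^{-1} a written in terms of real and imaginary parts; real-valued
    v = x + 1j * y
    return (np.conj(v) @ Rinv @ v).real

x, y, h = a.real, a.imag, 1e-6
grad_fd = np.zeros(M, dtype=complex)
for k in range(M):
    e = np.zeros(M)
    e[k] = h
    dfdx = (f(x + e, y) - f(x - e, y)) / (2 * h)   # central difference in x_k
    dfdy = (f(x, y + e) - f(x, y - e)) / (2 * h)   # central difference in y_k
    grad_fd[k] = 0.5 * (dfdx - 1j * dfdy)          # Wirtinger derivative wrt a_k

grad_idx = Rinv @ np.conj(a)   # R_{ik}^{-1} \bar a_i (Rinv symmetric)
```

The finite-difference gradient agrees with $R_{ik}^{-1}\bar a_i$ to within the discretization error.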

What's left is the gradient of $\lambda(||\mathbf{a-\bar{a}}||^2-\varepsilon)$, which is where I think you went wrong (you missed a factor of $2$):

$$ \begin{aligned} (\boldsymbol{\nabla}_{\mathbf a}\lambda(||\mathbf{a-\bar{a}}||^2-\varepsilon))_k&=\lambda(\boldsymbol{\nabla}_{\mathbf a}(||\mathbf{a-\bar{a}}||^2))_k\\ &=\lambda(\boldsymbol{\nabla}_{\mathbf a}((\mathbf{a-\bar a})^H(\mathbf{a-\bar a})))_k\\ &=\lambda(\boldsymbol{\nabla}_{\mathbf a}((\mathbf{\bar a-a})^T(\mathbf{a-\bar a})))_k\\ &=-\lambda(\boldsymbol{\nabla}_{\mathbf a}((a_i-\bar a_i)(a_i-\bar a_i)))_k\\ &=-\lambda\dfrac{\partial(a_i-\bar a_i)^2}{\partial a_k}\\ &=-2\lambda(a_i-\bar a_i)\underbrace{\dfrac{\partial a_i}{\partial a_k}}_{\delta^i_k}=-2\lambda(a_k-\bar a_k). \end{aligned}$$

So the system of equations we get is

$$ \left\{ \begin{aligned} &R_{ik}^{-1}\bar a_i-2\lambda(a_k-\bar a_k)=0;\\ &||\mathbf{a-\bar a}||^2=\varepsilon. \end{aligned} \right. $$

Now it's time to use $\mathbf a_i=x_i+iy_i$:

In the first $M$ equations (from the gradient) and in the constraint, we can now write each component out explicitly in real and imaginary parts:

$$ \begin{aligned} R_{ik}^{-1}\bar a_i-2\lambda(a_k-\bar a_k)&=R_{ik}^{-1}\bar a_i-2\lambda((x_k+iy_k)-(x_k-iy_k))\\ &=R_{ik}^{-1}\bar a_i-i4\lambda y_k\\ &=R_{ik}^{-1}(x_i-iy_i)-i4\lambda y_k\\ &=R_{ik}^{-1}x_i-i(R_{ik}^{-1}y_i+4\lambda y_k). \end{aligned} $$

$$ ||\mathbf{a-\bar a}||^2=-(a_i-\bar a_i)^2=-(2iy_i)^2=4y_i^2=\varepsilon. $$

Each gradient equation is now a complex number set equal to $0$, so we impose its real and imaginary parts to be $0$ separately:

$$ \left\{ \begin{aligned} &R_{ik}^{-1}x_i-i(R_{ik}^{-1}y_i+4\lambda y_k)=0\iff \left\{ \begin{aligned} &R_{ik}^{-1}x_i=0\\ &R_{ik}^{-1}y_i+4\lambda y_k=0 \end{aligned} \right. \\ &4y_i^2=\varepsilon. \end{aligned} \right. $$

For the real part, since $R^{-1}$ is invertible, $R_{ik}^{-1}x_i=0$ for every $k$ forces all $x_i=0$, so $a_i=iy_i$. That leaves the imaginary-part equations together with the constraint, which involve only the $y_i$. Basically, what we're left with is:

$$ \left\{ \begin{aligned} &R_{ik}^{-1}y_i+4\lambda y_k=0\\ &4y_i^2=\varepsilon. \end{aligned} \right. $$

The general approach would be to isolate $\lambda$ in one of the $k$ equations and substitute it into another. Finally, using the constraint equation, you would express one $y_i$ in terms of the others and back-substitute until all the $y_i$ are determined.
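One way to carry this out numerically (a sketch with random placeholder data, not the method of the cited paper): the first $M$ equations read $R^{-1}\mathbf y=-4\lambda\,\mathbf y$, so $\mathbf y$ must be an eigenvector of $R^{-1}$ with eigenvalue $-4\lambda$, and the constraint only fixes its scale:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
B = rng.standard_normal((M, M))
Rinv = B @ B.T + M * np.eye(M)   # placeholder for R^{-1}: real, symmetric
eps = 0.5

# R^{-1} y = -4*lam*y means y is an eigenvector of R^{-1};
# the constraint 4*||y||^2 = eps then sets its length.
gammas, V = np.linalg.eigh(Rinv)  # eigenvalues ascending, orthonormal eigenvectors
y = V[:, 0] * np.sqrt(eps) / 2    # scaled so that 4*||y||^2 = eps
lam = -gammas[0] / 4              # the multiplier paired with this eigenvector
```

Each eigenpair of $R^{-1}$ yields one candidate $(\mathbf y,\lambda)$; the one that optimizes the original objective is then selected by direct evaluation.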