derivative of $(Z\alpha-Y\alpha X\alpha)^2$ with respect to $\alpha$


I want to calculate the derivative of this function and solve the gradient equation

$$f(\alpha)=(Z\alpha-Y\alpha X\alpha)^2$$

where $Z,Y,X \in \mathbb{R}^n$ are row vectors and $\alpha^T\in \mathbb{R}^n$, so $\alpha$ is a column vector. Hence all the products $Z\alpha,Y\alpha, X\alpha$ are scalars, and so is $f(\alpha)$.

What I did was

$$\nabla_\alpha f(\alpha) = 2(Z-2Y\alpha X)(Z\alpha-Y\alpha X\alpha)=0$$ $$Y\alpha X=\frac{1}{2}Z$$

I am not sure how to proceed. Should I multiply by a vector of ones?

Best answer

First of all, let us rewrite all the vectors as column vectors (common convention) and use lower-case letters (also a common choice and easier to type): $$ f(\alpha) = (z^T\alpha - y^T\alpha\, x^T\alpha)^2 $$ where $\alpha, x, y, z \in \mathbb{R}^n$.

Taking the gradient is fairly easy: apply the chain rule to the square and the product rule to $y^T\alpha\, x^T\alpha$: \begin{align} \nabla_\alpha f(\alpha) &= 2(z^T\alpha - y^T\alpha\, x^T\alpha)\nabla_\alpha(z^T\alpha - y^T\alpha \, x^T\alpha)\\ &=2(z^T\alpha - y^T\alpha\, x^T\alpha)(z-\nabla_\alpha(y^T\alpha\, x^T\alpha))\\ &=2(z^T\alpha - y^T\alpha\, x^T\alpha)(z- y^T\alpha\, x - x^T\alpha\, y) \\ &=2(z^T\alpha - y^T\alpha\, x^T\alpha)(z- (x y^T + y x^T)\alpha). \end{align}

Note that $xy^T$ and $yx^T$ are both matrices in $\mathbb{R}^{n\times n}$ and, in general, $$ xy^T + yx^T \ne 2 xy^T \quad\text{and}\quad xy^T + yx^T \ne 2 yx^T. $$

Let us introduce the following matrices \begin{align} A &= \begin{pmatrix} x & y\end{pmatrix} \in\mathbb{R}^{n\times 2} & B &= \begin{pmatrix} y & x\end{pmatrix} \in\mathbb{R}^{n\times 2} \end{align} and note that $AB^T = xy^T + yx^T \in\mathbb{R}^{n\times n}$. Then, we can rewrite the gradient as $$ \nabla_\alpha f(\alpha)=2(z^T\alpha - y^T\alpha\, x^T\alpha)(z- AB^T\alpha). $$
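As a sanity check on the gradient formula, here is a minimal sketch that compares it with a central finite-difference approximation. The helper names (`dot`, `f`, `grad_f`, `numeric_grad`), the random test data, and the step size `h` are illustrative choices and not part of the derivation; plain Python lists are used so that nothing beyond the standard library is needed.

```python
import random


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def f(alpha, x, y, z):
    """f(alpha) = (z^T alpha - y^T alpha * x^T alpha)^2."""
    return (dot(z, alpha) - dot(y, alpha) * dot(x, alpha)) ** 2


def grad_f(alpha, x, y, z):
    """Closed form: 2 (z^T a - y^T a x^T a) (z - (y^T a) x - (x^T a) y)."""
    ya, xa = dot(y, alpha), dot(x, alpha)
    r = dot(z, alpha) - ya * xa
    return [2 * r * (zi - ya * xi - xa * yi) for xi, yi, zi in zip(x, y, z)]


def numeric_grad(alpha, x, y, z, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for i in range(len(alpha)):
        up = list(alpha)
        dn = list(alpha)
        up[i] += h
        dn[i] -= h
        g.append((f(up, x, y, z) - f(dn, x, y, z)) / (2 * h))
    return g


rng = random.Random(0)  # fixed seed so the check is reproducible
n = 4
x, y, z, alpha = ([rng.uniform(-1, 1) for _ in range(n)] for _ in range(4))

analytic = grad_f(alpha, x, y, z)
numeric = numeric_grad(alpha, x, y, z)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print("largest componentwise difference:", max_err)
```

The two gradients should agree up to finite-difference error; replacing $xy^T + yx^T$ by $2xy^T$ in `grad_f` makes the mismatch visible for generic $x$ and $y$.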

Next, even though you don't state it explicitly, it seems you're also interested in the stationary points of $f$, that is, the points where the gradient is zero. In other words, we'd like to find the values of $\alpha$ for which $$ (z^T\alpha - y^T\alpha\, x^T\alpha)(z- AB^T\alpha) = 0. $$ Since the first factor is a scalar and the second is a vector, this holds only if the scalar $z^T\alpha - y^T\alpha\, x^T\alpha$ is zero or the vector $z- AB^T\alpha$ is zero.

Case $z^T\alpha - y^T\alpha x^T\alpha=0$.

Besides the trivial solution $\alpha = 0$, this case is not straightforward, as we need to solve a single quadratic equation in $n$ variables. Apart from some special cases, the implicit equation of the solution set is all that we get (at least, I don't see other options at the moment).

Update: This case does have a simple solution (see @greg's comment below): $$ \alpha = \frac{z^T v}{x^T v\: y^T v}\,v\qquad\forall v \text{ such that } x^T v \ne 0 \text{ and } y^T v \ne 0. $$
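A quick check of this formula, using the shorthand $c = \dfrac{z^T v}{x^T v\: y^T v}$ (introduced here only for readability), so that $\alpha = c\,v$: \begin{align} z^T\alpha &= c\, z^T v, \\ y^T\alpha\: x^T\alpha &= c^2\, (y^T v)(x^T v) = c\,\frac{z^T v}{x^T v\: y^T v}\,(y^T v)(x^T v) = c\, z^T v, \end{align} hence $z^T\alpha - y^T\alpha\, x^T\alpha = 0$. Conversely, any solution $\alpha$ with $x^T\alpha \ne 0$ and $y^T\alpha \ne 0$ is recovered by choosing $v = \alpha$, because then $c = 1$.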

Case $z- AB^T\alpha=0$

When $n=2$ this is a simple linear system of two equations in two unknowns.

When $n > 2$, we can work this case out as follows. First, note that a solution exists only if $z$ is in the range of $A=(x\quad y)$, that is, $z=ax + by$ for some $a,b\in \mathbb{R}$. Then, assuming $x$ and $y$ are linearly independent, we can multiply both sides by the left inverse of $A$ and obtain $$ AB^T \alpha = z \iff B^T \alpha = \bigl(A^T A\bigr)^{-1} A^T z. $$ Now, note that we can't use the same approach again since $B^T$ is a fat matrix and, thus, it has no left inverse. However, we can use its right inverse and write $$ \alpha = B \bigl(B^T B\bigr)^{-1}\bigl(A^T A\bigr)^{-1} A^T z + P_B^\perp w $$ where $$ P_B^\perp = I - B \bigl(B^T B\bigr)^{-1} B^T \in \mathbb{R}^{n \times n} $$ is the projection matrix onto the space orthogonal to the range of $B$ and $w$ is any vector in $\mathbb{R}^n$.
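The construction can be checked numerically. The following is a minimal sketch, assuming $x$ and $y$ are linearly independent and $z = ax + by$; the function names (`solve_sym2`, `stationary_alpha`) and the sample vectors are hypothetical, chosen only for illustration. Because $A^T A$ and $B^T B$ are $2\times 2$, their inverses are applied with Cramer's rule instead of a linear-algebra library.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_sym2(g11, g12, g22, r1, r2):
    """Solve [[g11, g12], [g12, g22]] s = (r1, r2) by Cramer's rule."""
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("x and y must be linearly independent")
    return (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det


def stationary_alpha(x, y, z, w):
    """alpha = B (B^T B)^{-1} (A^T A)^{-1} A^T z + P_B^perp w,
    with A = (x y) and B = (y x)."""
    xx, xy, yy = dot(x, x), dot(x, y), dot(y, y)
    # c = (A^T A)^{-1} A^T z, where A^T A = [[xx, xy], [xy, yy]]
    c1, c2 = solve_sym2(xx, xy, yy, dot(x, z), dot(y, z))
    # d = (B^T B)^{-1} c, where B^T B = [[yy, xy], [xy, xx]]
    d1, d2 = solve_sym2(yy, xy, xx, c1, c2)
    # e = (B^T B)^{-1} B^T w, so that P_B^perp w = w - B e
    e1, e2 = solve_sym2(yy, xy, xx, dot(y, w), dot(x, w))
    # B d + w - B e, using B = (y x)
    return [(d1 - e1) * yi + (d2 - e2) * xi + wi for xi, yi, wi in zip(x, y, w)]


# Sample data (arbitrary): z is built inside the range of A = (x y).
x = [1.0, 2.0, 0.0, -1.0]
y = [0.0, 1.0, 1.0, 3.0]
a, b = 2.0, -3.0
z = [a * xi + b * yi for xi, yi in zip(x, y)]
w = [1.0, -1.0, 2.0, 0.5]  # free vector; any choice should work

alpha = stationary_alpha(x, y, z, w)
ya, xa = dot(y, alpha), dot(x, alpha)
# Residual of z - A B^T alpha = z - (y^T alpha) x - (x^T alpha) y
residual = [zi - ya * xi - xa * yi for xi, yi, zi in zip(x, y, z)]
print("largest residual entry:", max(abs(r) for r in residual))
```

With $z = ax + by$, the construction gives $y^T\alpha = a$ and $x^T\alpha = b$, so $f(\alpha) = (2ab - ab)^2 = a^2 b^2$ at these points: they are stationary but, unless $ab = 0$, not zeros of $f$.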

Remark: We can rewrite everything as a function of $A$ only, as opposed to using both $A$ and $B$. Indeed, it is enough to notice that $$ B = A J $$ where $$ J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} $$ is the $2\times 2$ exchange matrix, which swaps the two columns of $A$. Then, after simple algebra, $$ \alpha = A \bigl(A^T A\bigr)^{-1}J\bigl(A^T A\bigr)^{-1} A^T z + P_A^\perp w $$ where $$ P_A^\perp = I - A \bigl(A^T A\bigr)^{-1} A^T = P_B^\perp. $$
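For completeness, the algebra behind this remark can be spelled out as follows, using only $J^T = J$ and $J^2 = I$ (so $J^{-1} = J$): \begin{align} B^T B &= J A^T A J, \\ \bigl(B^T B\bigr)^{-1} &= J \bigl(A^T A\bigr)^{-1} J, \\ B \bigl(B^T B\bigr)^{-1} &= A J J \bigl(A^T A\bigr)^{-1} J = A \bigl(A^T A\bigr)^{-1} J, \\ B \bigl(B^T B\bigr)^{-1} B^T &= A \bigl(A^T A\bigr)^{-1} J J A^T = A \bigl(A^T A\bigr)^{-1} A^T. \end{align} The third line gives the first term of $\alpha$, and the last line shows that the two projection matrices coincide, as expected since $A$ and $B$ have the same range.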

You can even go further and express everything in terms of the vectors $x$ and $y$, but it doesn't seem to bring any additional insight.