Stationary points of a bivariate quadratic function along the standard unit circle

39 Views Asked by At

Where along the unit circle $\mathbf{x}(\theta)=\begin{bmatrix}cos\theta\\sin\theta\end{bmatrix}$ does the quadratic function $f(\mathbf{x})=(\mathbf{x}-\mathbf{\mu})^TP(\mathbf{x}-\mathbf{\mu})$ attain its stationary points?

My attempt

$$\frac{\partial f}{\partial \mathbf{x}}=(\mathbf{x}-\mathbf{\mu})^T(P+P^T)$$

$$\frac{\partial\mathbf{x}}{\partial\theta}=\begin{bmatrix}-sin\theta\\cos\theta\end{bmatrix}$$

$$\frac{df}{d\theta}=(\begin{bmatrix}cos\theta\\sin\theta\end{bmatrix}-\mathbf{\mu})^T(P+P^T)\begin{bmatrix}-sin\theta\\cos\theta\end{bmatrix}=0$$

How do I proceed to solve for $\theta$ interms of $\mu$ and $P$? Do I have to break up $P$ into its entries or can I get an explicit matrix/vector formula? (We can assume that $P$ is symmetric if it helps somehow.)

1

There are 1 best solutions below

0
On BEST ANSWER

Let us start by considering an example to build up intuition. Take $P$ to be the identity matrix, so that $f(x) = \|x-\mu\|^2$. It follows that $\nabla f = 2(x-\mu)$, so that for $\frac{df}{d\theta} = 0$ to have a solution, we have to have $\nabla f\left( \frac{dx}{d\theta} \right) = 0$. Since $x(\theta)$ and $\frac{dx}{d\theta}$ are always orthogonal, the stationary points are those for which $\frac{dx}{d\theta}$ and $\mu$ are orthogonal. Since a line and a circle intersect in two points, there are two solutions. If $\nu$ is a vector spanning the line orthogonal to $\mu \neq 0$, then the two solutions are $\frac{\pm \nu}{\|nu\|}$ (this also includes the case where $\mu$ is on the unit circle; in this case, $\mu$ is a stationary point because $(x-\mu) = 0$, so that $\nabla f = 0$ is orthogonal to $\frac{dx}{d\theta}$). If $\mu = 0$, all points are stationary and $f(x(\theta))$ is a constant function of $\theta$.

It's not so hard to picture this example: picture a paraboloid centered at $\mu$ and intersect it with a cylinder centered around the $z$-axis of radius $1$. The two surfaces will intersect, and the curve of intersection's $z$-coordinate will have minimal $z$-value closest to $\mu$ and maximal $z$-value farthest from $\mu$, given that $\mu$ is outside the unit circle. (At least that's the picture I have in my head.)


We can assume without loss of generality that $P$ is symmetric, because if we write $f(P,x,\mu) = (x-\mu)^{\top} P (x-\mu)$, then $f(P,x,\mu)$ is a real number, therefore $$ f(P,x,\mu) = f(P,x,\mu)^{\top} = f(P^{\top},x,\mu), $$ and by linearity we deduce $$ \frac {f(P, x,\mu) + f(P^{\top}, x, \mu)}2 = \frac 12 \left((x-\mu)^{\top} P (x-\mu) + (x-\mu)^{\top} P^{\top} (x-\mu) \right) \\ = \frac 12 (x-\mu)^{\top} \left( P(x-\mu) + P^{\top} (x-\mu) \right) \\ = \frac 12 (x-\mu)^{\top} \left( P + P^{\top} \right) (x-\mu) \\ = (x-\mu)^{\top} \left( \frac {P+P^{\top}}2 \right) (x-\mu) \\ = f((P+P^{\top})/2, x,\mu) $$ Since the matrix $\frac{P+P^{\top}}2$ is symmetric, the operation $P \mapsto \frac{P + P^{\top}}2$ has no effect on the problem and replaces $P$ by a symmetric matrix.


We assume without loss of generality that $P$ is a symmetric matrix. By the spectral theorem, $P$ is orthogonally diagonalizable, i.e. we can write $P = Q^{\top} D Q$ where $Q$ is an orthogonal matrix. It follows that we can re-write $$ f(P,x,\mu) = f(Q^{\top}DQ, x, \mu) \\ = (Q(x-\mu))^{\top} D Q(x-\mu) \\ = (Qx, Q\mu)^{\top} D (Qx - Q\mu) \\ = f(D, Qx, Q\mu) $$ so that up to a change of basis, we can assume without loss of generality that $D$ is diagonal, because the orthogonal matrix just "turns the problem around"; if $x$ is a stationary point of $x \mapsto f(D,Qx,Q\mu)$, then $Qx$ is a stationary point of $y \mapsto f(D,y,\mu)$. Note that the orthogonality of $Q$ is important; orthogonal matrices map the unit circle to itself.


Write $D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ and $x = (x_1,x_2)$ and $\mu = (\mu_1,\mu_2)$. We have $\nabla f = (\lambda_1(x_1-\mu_1), \lambda_2(x_2-\mu_2))$, and since $\frac{dx}{d\theta}$ is parallel to $(-x_2, x_1)$, we see that the stationary points will satisfy $$ \lambda_1(x_1-\mu_1)x_2 - \lambda_2(x_2-\mu_2)x_1 = 0 = (\lambda_1-\lambda_2) x_1x_2 - \lambda_1\mu_1 x_2 + \lambda_2\mu_2 x_1. $$ We have three cases, up to a permutation matrix (note that a permutation matrix maps the unit circle to itself because it is orthogonal). Note that the unit circle is compact and our function is continuous, so there is a minimum and a maximum, which means at least $2$ stationary points for sure.

I considered splitting into five cases

  • $\lambda_1, \lambda_2 > 0$.
  • $\lambda_1 > 0 > \lambda_2$.
  • $\lambda_1, \lambda_2 < 0$.
  • $\lambda_1 > 0, \lambda_2 = 0$
  • $\lambda_1 = \lambda_2 = 0$

but I couldn't get more specific than the polynomial equation I wrote above. In the last two cases (which you could call trivial), we can already say a little bit: set $\lambda_2 = 0$ and assume $\lambda_1 \neq 0$ (there is nothing to say when $\lambda_1 = 0$), so that $f(D,x,\mu) = \lambda_1 (x_1-\mu_1)^2$ and the equation for stationary points becomes $$ 0 = \lambda_1 x_1x_2 - \lambda_1\mu_1 x_2 = \lambda_1(x_1 - \mu_1) x_2 $$ so that either $x_2 = 0$ (and $x_1 = \pm 1$) or $x_1 = \mu_1$, which is only possible if $\mu_1 = \pm 1$, and in this case $x_1 = \mu$. In both cases, the solutions are $(\pm 1, 0)$. So there are two solutions in this case. The extrema $(\pm 1, 0)$ take the values $\lambda_1 (\pm 1 - \mu_1)^2$, so one can tell the maxima apart from the minima by comparing those two values.

In the first three cases, all I could say is that this is an equation of the form $$ a x_1 x_2 + b x_1 + c x_2 = 0 $$ which we can interpret as a linear equation in $x_1$ (or in $x_2$), and solving the system of polynomial equations $$ a x_1 x_2 + b x_1 + c x_2 = 0 \\ x_1^2 + x_2^2 = 1 $$ is an intersection of two quadrics, which has at most four solutions. More than that I could not say.


Hope that helps,