$w_t$ is the solution of $\max_z z^T w_{t - 1}$ s.t. $z^T \Sigma^{-1} z = 1$. Express $w_t$ as a function of $w_{t - 1}$


In our machine learning exam today, in an exercise on Lagrange multipliers, we had the following question.

Let $\Sigma \in \mathbb R^{d \times d}$ be a positive semidefinite matrix and $w_1, \ldots, w_T$ a sequence of vectors in $\mathbb R^d$, where $w_t$ is the solution of the constrained optimisation problem $$ \max_{z \in \mathbb R^d} z^T w_{t - 1} \quad \text{subject to} \quad z^T \Sigma^{-1} z = 1. $$ Find a closed form solution of $w_t$ as a function of $w_{t - 1}$.

Here's what I have done

The Lagrangian is \begin{equation*} L(z, \lambda) := z^T w_{t - 1} + \lambda(1 - z^T \Sigma^{-1} z). \end{equation*} In order for $z$ to be optimal, we require \begin{equation*} \frac{\partial L(z, \lambda)}{\partial z} = w_{t - 1} - \lambda \Sigma^{-1} z \overset{!}{=} 0 \iff w_{t - 1} = \lambda \Sigma^{-1} z \iff z = \frac{1}{\lambda} \Sigma w_{t - 1}. \end{equation*} Plugging the second last equality into the constraint $z^T \Sigma^{-1} z = 1$, we get \begin{equation*} z^T w_{t - 1} = \lambda z^T \Sigma^{-1} z = \lambda \end{equation*} and using the last equality we get \begin{equation*} \lambda = z^T w_{t - 1} = \frac{1}{\lambda} w_{t - 1}^T \Sigma w_{t - 1}, \end{equation*} implying \begin{equation*} \lambda = \sqrt{w_{t - 1}^T \Sigma w_{t - 1}}, \end{equation*} as $\Sigma$ is positive semidefinite (so we don't have to consider $- \sqrt{\ldots}$). We thus get \begin{equation*} w_t = z = \frac{\Sigma w_{t - 1}}{\sqrt{w_{t - 1}^T \Sigma w_{t - 1}}}. \end{equation*} I am not sure about this last step: Since we know that $w_{t - 1}$ fulfils $w_{t - 1}^T \Sigma w_{t - 1} = 1$ by construction (shouldn't it instead be $\Sigma^{-1}$??), this simplifies to $w_t = \Sigma w_{t - 1}$.
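As a numerical sanity check (not part of the exam), here is a short NumPy sketch that tests the closed form $w_t = \Sigma w_{t-1} / \sqrt{w_{t-1}^T \Sigma w_{t-1}}$ against randomly sampled feasible points. The matrix `Sigma` and vector `w_prev` are made-up test data; `Sigma` is built to be positive definite so that $\Sigma^{-1}$ exists:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data: a random positive definite Sigma and a previous iterate.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)   # positive definite, so Sigma^{-1} exists
w_prev = rng.standard_normal(4)

# Closed-form candidate: w = Sigma w_prev / sqrt(w_prev^T Sigma w_prev)
w = Sigma @ w_prev / np.sqrt(w_prev @ Sigma @ w_prev)

# It satisfies the constraint z^T Sigma^{-1} z = 1 ...
Sigma_inv = np.linalg.inv(Sigma)
assert np.isclose(w @ Sigma_inv @ w, 1.0)

# ... and its objective value beats random feasible points z
# (random directions rescaled onto the constraint set).
best_random = -np.inf
for _ in range(10_000):
    z = rng.standard_normal(4)
    z /= np.sqrt(z @ Sigma_inv @ z)   # project onto z^T Sigma^{-1} z = 1
    best_random = max(best_random, z @ w_prev)

assert w @ w_prev >= best_random
```

The random search never exceeds the closed form's objective value $\sqrt{w_{t-1}^T \Sigma w_{t-1}}$, which is consistent with it being the maximiser.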

My questions

Is what I have done (or which parts of it) correct? And if not, can you give me a hint on how to solve this problem?

The next question (in a previous exam, where this question was also asked) was: "What algorithm does this look similar to regarding PCA mentioned in the lecture", for which the only answer I found is power iteration, i.e. $w_t = \frac{\Sigma w_{t - 1}}{\| \Sigma w_{t - 1} \|}$, which is very different from my answer.
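For reference, standard power iteration with $\ell_2$ normalisation can be sketched as follows (a minimal illustration with a made-up random symmetric positive definite matrix, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example matrix: symmetric positive definite.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + np.eye(4)
w = rng.standard_normal(4)

# Power iteration: w_t = Sigma w_{t-1} / ||Sigma w_{t-1}||_2
for _ in range(1000):
    w = Sigma @ w
    w /= np.linalg.norm(w)

# The iterate converges (up to sign) to the top eigenvector of Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
top = eigvecs[:, -1]
assert np.isclose(abs(w @ top), 1.0, atol=1e-6)
```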


Best answer

In order for $z$ to be optimal, we require \begin{equation*} \frac{\partial L(z, \lambda)}{\partial z} = w_{t - 1} - \lambda \Sigma^{-1} z \overset{!}{=} 0 \iff ... \end{equation*}

This is not correct. Note that the derivative of $x^T Ax$ is $(A + A^T)x$ instead of $Ax$. Coincidentally though, your final result is correct:

\begin{equation*} w_t = \frac{\Sigma w_{t - 1}}{\sqrt{w_{t - 1}^T \Sigma w_{t - 1}}}, \end{equation*}

because you were implicitly working with some $\lambda' = 2\lambda$.

I am not sure about this last step: Since we know that $w_{t - 1}$ fulfils $w_{t - 1}^T \Sigma w_{t - 1} = 1$ by construction (shouldn't it instead be $\Sigma^{-1}$??), this simplifies to $w_t = \Sigma w_{t - 1}$.

By construction we have $w_{t}^T \Sigma^{-1} w_{t} = 1$, which is totally different from $w_{t}^T \Sigma w_{t} = 1$.
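To make the distinction concrete, here is a small check (with a made-up positive definite `Sigma`, since the update needs $\Sigma^{-1}$): the iterate lies on the constraint set defined by $\Sigma^{-1}$, but generically not on the one defined by $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical test data.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + np.eye(3)        # positive definite
Sigma_inv = np.linalg.inv(Sigma)
w_prev = rng.standard_normal(3)

w = Sigma @ w_prev / np.sqrt(w_prev @ Sigma @ w_prev)

# By construction, w lies on the constraint set w^T Sigma^{-1} w = 1 ...
assert np.isclose(w @ Sigma_inv @ w, 1.0)
# ... but in general it does NOT satisfy w^T Sigma w = 1.
assert not np.isclose(w @ Sigma @ w, 1.0)
```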

The next question (in a previous exam, where this question was also asked) was: "What algorithm does this look similar to regarding PCA mentioned in the lecture", for which the only answer I found is power iteration, i.e. $w_t = \frac{\Sigma w_{t - 1}}{\| \Sigma w_{t - 1} \|}$, which is very different from my answer.

Indeed, they are the same algorithm, just with respect to different norms. Standard power iteration normalises with the $\ell_2$ norm, while the above algorithm normalises with the norm induced by a positive definite matrix: $\|x\|^2_{A} = x^T Ax$. To see this, just notice that

$$w_t = \frac{\Sigma w_{t-1}}{\sqrt{w_{t-1}^T\Sigma w_{t-1}}} = \frac{\Sigma w_{t-1}}{\sqrt{w_{t-1}^T\Sigma^T \Sigma^{-1} \Sigma w_{t-1}}} = \frac{\Sigma w_{t-1}}{\sqrt{(\Sigma w_{t-1})^T \Sigma^{-1} (\Sigma w_{t-1})}} = \frac{\Sigma w_{t-1}}{\|\Sigma w_{t-1}\|_{\Sigma^{-1}}}.$$
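Since the two updates differ only in the length of each iterate, not its direction, they trace out the same sequence of directions. A quick numerical illustration (with a made-up random positive definite matrix, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test data: symmetric positive definite Sigma, shared start w0.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + np.eye(4)
Sigma_inv = np.linalg.inv(Sigma)
w0 = rng.standard_normal(4)

# Standard power iteration: normalise in the l2 norm.
u = w0.copy()
for _ in range(1000):
    u = Sigma @ u
    u /= np.linalg.norm(u)

# The exam's update: normalise in the Sigma^{-1} norm.
v = w0.copy()
for _ in range(1000):
    v = Sigma @ v
    v /= np.sqrt(v @ Sigma_inv @ v)   # ||Sigma v_old||_{Sigma^{-1}}

# The two iterates point in the same direction; only their lengths differ.
cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
assert np.isclose(abs(cos), 1.0)
```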