Geometry behind least squares under constraints


I'm trying to solve the following minimisation problem by Lagrange multipliers. My question is:

How to interpret the solutions geometrically?

Given a dataset $x_1, \dots, x_n \in \mathbb R^d$, a parameter vector $\theta \in \mathbb R^d$ and some fixed $b, c \in \mathbb R^d$, I want to minimise $$J(\theta) = \sum_{k=1}^n \lVert \theta - x_k \rVert^2$$ with respect to $\theta$ under the constraints

  1. $\theta^T b = 0$,
  2. $\lVert \theta - c \rVert^2 = 1$.

For 1., I put $\mathcal L(\theta,\lambda) = J(\theta) - \lambda \theta^T b$, $\lambda \in \mathbb R$, and set

$$\frac{\partial}{\partial \theta}\mathcal L(\theta,\lambda) =\sum_{k=1}^n \frac{\partial}{\partial \theta} \lVert \theta - x_k \rVert^2 - \frac{\partial}{\partial \theta} \lambda \theta^T b =\sum_{k=1}^n 2(\theta-x_k) - \lambda b = 0.$$

I'm somewhat shaky about the vector calculus, but I hope it's correct? Anyway, this is equivalent to $n \theta - n \bar x = \frac{1}{2} \lambda b$ and gives $\theta_0 = \frac{1}{2n} \lambda b + \bar x$, where $\bar x = \frac{1}{n}\sum_{k=1}^n x_k$ is the sample mean. $\theta_0$ satisfies $\nabla J(\theta) = \lambda \nabla (\theta^T b)$. Plugging it into the constraint ${\theta_0}^T b = 0$ gives $\frac{1}{2n} \lambda \lVert b \rVert^2 + {\bar x}^T b = 0$, hence $\lambda = -\frac{2n\, {\bar x}^T b}{\lVert b \rVert^2}$.
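Since I'm unsure about the vector calculus, here is a quick finite-difference check of the gradient $\frac{\partial}{\partial \theta}\lVert \theta - x \rVert^2 = 2(\theta - x)$ that I ran (plain numpy; the variable names are just mine):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
theta = rng.normal(size=d)
x = rng.normal(size=d)

def f(t):
    # f(theta) = ||theta - x||^2
    return ((t - x) ** 2).sum()

grad = 2 * (theta - x)  # claimed gradient

# central finite differences along each coordinate direction
eps = 1e-6
num = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
                for e in np.eye(d)])

assert np.allclose(grad, num, atol=1e-5)
```

The two gradients agree to numerical precision, so at least that step seems fine.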

Finally, this yields $$\theta_0 = \bar x - \frac{{\bar x}^T b}{\lVert b \rVert^2}\, b.$$
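As a sanity check that this $\theta_0$ really minimises $J$ on the constraint set, I compared it numerically against random feasible points (numpy, random data; this is only an experiment, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))   # rows are x_1, ..., x_n
b = rng.normal(size=d)
xbar = X.mean(axis=0)         # sample mean

# closed-form candidate from the Lagrange computation
theta0 = xbar - (xbar @ b) / (b @ b) * b

def J(theta):
    return ((theta - X) ** 2).sum()

# theta0 is feasible ...
assert abs(theta0 @ b) < 1e-8

# ... and beats random feasible competitors
for _ in range(1000):
    z = rng.normal(size=d)
    z = z - (z @ b) / (b @ b) * b   # project z onto the plane b^T theta = 0
    assert J(theta0) <= J(z) + 1e-9
```

So the formula at least survives a brute-force test.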

For 2., with $\mathcal L(\theta,\lambda) = J(\theta) - \lambda \left( \lVert \theta - c \rVert^2 - 1 \right)$, I get

$$\frac{\partial}{\partial \theta}\mathcal L(\theta,\lambda) =\sum_{k=1}^n \frac{\partial}{\partial \theta} \lVert \theta - x_k \rVert^2 - \lambda \frac{\partial}{\partial \theta} \left( \lVert \theta - c \rVert^2 - 1 \right) =\sum_{k=1}^n 2(\theta-x_k) - 2 \lambda (\theta - c).$$

Setting this equal to zero gives $$\theta_0 = \frac{1}{n-\lambda}\left(n \bar x - \lambda c\right).$$
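I did verify numerically that this $\theta_0$ satisfies the stationarity condition $\sum_k 2(\theta - x_k) = 2\lambda(\theta - c)$ for an arbitrary fixed $\lambda$ (numpy, random data, names mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))   # rows are x_1, ..., x_n
c = rng.normal(size=d)
xbar = X.mean(axis=0)

lam = 0.7                                   # any multiplier value
theta0 = (n * xbar - lam * c) / (n - lam)   # candidate stationary point

# stationarity: sum_k 2(theta - x_k) = 2 * lam * (theta - c)
lhs = 2 * (theta0 - X).sum(axis=0)
rhs = 2 * lam * (theta0 - c)
assert np.allclose(lhs, rhs)
```

So the stationary points are right; what's left is picking $\lambda$ so that the sphere constraint holds.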

It looks rather messy to feed this into the constraint to compute $\lambda$. Is there some trick I'm missing?

Anyway, my problem is that the exercise hinted at a geometric meaning behind all of this, and I have no clue what it is. Could someone explain to me what's going on?