Is there a closed-form analytical solution to: Maximize $y^T (X \beta) $ s.t. $(X \beta)^T (X \beta) = y^T y$.


Maximize $y^T (X \beta) $ s.t. $(X \beta)^T (X \beta) = y^T y$.

Here $y$ is a known vector of size $n$ and $X$ is a known $n \times m$ matrix. $\beta$ is the unknown vector of size $m$ we want to solve for. $X$ is "skinny", i.e. $n \gg m$.

I feel like intuitively there should be a closed-form solution. By the Cauchy–Schwarz inequality, $y^T (X \beta)$ is maximized when $y$ and $X \beta$ are linearly dependent, and the constraint $(X \beta)^T (X \beta) = y^T y$ just enforces that $X \beta$ and $y$ have the same norm. But I cannot derive it ...

3 Answers

BEST ANSWER

Lagrangian: $$y^\top X \beta + \lambda(\beta^\top X^\top X \beta - y^\top y)$$

KKT conditions:

  • Stationarity: $X^\top y + 2\lambda X^\top X \beta = 0$
  • Primal feasibility: $\beta^\top X^\top X \beta = y^\top y$

Stationarity implies $\beta = c(X^\top X)^{-1} X^\top y$ for some scalar $c$. Plugging this into the primal feasibility condition will help you figure out the correct scalar $c$.
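A quick numerical sanity check of this hint (a sketch using numpy and random data; the variable names and the choice $\lambda = -1/(2c)$ are mine, read off from the stationarity condition):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 4
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

XtX, Xty = X.T @ X, X.T @ y
base = np.linalg.solve(XtX, Xty)      # (X'X)^{-1} X'y
c = np.sqrt((y @ y) / (Xty @ base))   # scalar c fixed by primal feasibility
beta = c * base
lam = -1.0 / (2.0 * c)                # multiplier that makes stationarity hold

# Stationarity: X'y + 2*lambda*X'X beta = 0
assert np.allclose(Xty + 2 * lam * XtX @ beta, 0)
# Primal feasibility: beta' X'X beta = y'y
assert np.isclose(beta @ XtX @ beta, y @ y)
```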


Write the Lagrangian as $y'(X\beta) - \frac{\lambda}{2} \{ (X\beta)'(X\beta) - y'y \}$ and maximize over $\beta$.

This gives $\beta = \frac{1}{\lambda} (X'X)^{-1}X'y$.

Substituting in the constraint gives:

$\left(\frac{1}{\lambda} X(X'X)^{-1}X'y\right)'\left(\frac{1}{\lambda} X(X'X)^{-1}X'y\right) = y'y$

which simplifies to $\frac{1}{\lambda^2} = \frac{y'y}{y'X(X'X)^{-1}X'y}$

Hence, $\beta = \left( \frac{y'y}{y'X(X'X)^{-1}X'y}\right)^{1/2} (X'X)^{-1}X'y$.
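A numerical check of this closed form (a sketch with numpy and random data; variable names are mine): the formula satisfies the constraint exactly, and no randomly drawn point rescaled onto the constraint set achieves a larger objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 5
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

XtX, Xty = X.T @ X, X.T @ y
base = np.linalg.solve(XtX, Xty)          # (X'X)^{-1} X'y
scale = np.sqrt((y @ y) / (Xty @ base))   # the square-root factor from the answer
beta = scale * base

# Constraint holds: (X beta)'(X beta) = y'y
assert np.isclose((X @ beta) @ (X @ beta), y @ y)

# No random feasible point does better
best = y @ (X @ beta)
for _ in range(1000):
    b = rng.standard_normal(m)
    b *= np.sqrt((y @ y) / ((X @ b) @ (X @ b)))  # rescale onto the constraint set
    assert y @ (X @ b) <= best + 1e-9
```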


Plenty of answers have pointed out the Lagrange multipliers approach (which is certainly a good way to do this, and more general than the below). Here's an alternative.

Notice that the solution derived in the other answers differs from your Cauchy-Schwarz intuition: the optimum $\beta_*$ is not $\propto X^\top y,$ but is tilted away from that direction by the constraint. The Cauchy-Schwarz argument would work if the constraint were instead $\beta^\top \beta = y^\top y$, in which case we would recover the aligned solution. So why isn't C-S working for this question? The standard Cauchy-Schwarz inequality works with vectors that are norm bounded, i.e. have $u^\top u$ constrained, but this is misaligned with the constraint of this question, which is on $\beta^\top X^\top X \beta.$ What is essentially happening is that a different norm than the standard $\ell_2$ is being restricted, and we need to adapt our use of Cauchy-Schwarz to respect this.

Let $\|\cdot\|$ be the usual $\ell_2$-norm. Define, for a positive definite symmetric matrix $M,$ $\| v\|_M = \sqrt{v^\top M v} := \|M^{1/2} v\|,$ where for concreteness $M^{1/2}$ is the principal square root (but any one would do). Notice that $\|\cdot\| = \|\cdot \|_I$. Now observe via the standard Cauchy-Schwarz inequality, $$ u^\top v = u^\top M^{-1/2} M^{1/2} v \le \|M^{-1/2} u\| \cdot \|M^{1/2} v\| = \|u\|_{M^{-1}} \|v\|_M,$$ with equality iff $M^{-1/2} u\propto M^{1/2} v \iff v \propto M^{-1}u$. If you like you can think of this as an extended version of Cauchy-Schwarz. (BTW you should show that $\|\cdot\|_M$ is a norm for positive definite $M.$)
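This extended inequality and its equality case are easy to probe numerically (a sketch with numpy; the random $M$, $u$, $v$ and the eigendecomposition route to $M^{1/2}$ are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
A = rng.standard_normal((m, m))
M = A @ A.T + m * np.eye(m)        # symmetric positive definite
u = rng.standard_normal(m)
v = rng.standard_normal(m)

# Principal square root M^{1/2} (and M^{-1/2}) via eigendecomposition
w, Q = np.linalg.eigh(M)
M_half = Q @ np.diag(np.sqrt(w)) @ Q.T
M_neg_half = Q @ np.diag(1 / np.sqrt(w)) @ Q.T

# u'v <= ||u||_{M^{-1}} ||v||_M
lhs = u @ v
rhs = np.linalg.norm(M_neg_half @ u) * np.linalg.norm(M_half @ v)
assert lhs <= rhs + 1e-12

# Equality when v is proportional to M^{-1} u
v_star = np.linalg.solve(M, u)
assert np.isclose(u @ v_star,
                  np.linalg.norm(M_neg_half @ u) * np.linalg.norm(M_half @ v_star))
```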

Now, our constraint is $\beta^\top X^\top X \beta = \|\beta\|_{X^\top X}^2 = \|y\|^2.$ Here notice that $X^\top X$ is a symmetric positive definite matrix so long as it is invertible. So, we conclude that if $X^\top X$ is invertible, then $$ y^\top X \beta = (X^\top y)^\top \beta \le \|X^\top y\|_{(X^\top X)^{-1}} \|\beta\|_{X^\top X},$$ with optimality when $$ \beta \propto (X^\top X)^{-1} X^\top y.$$ The proportionality constant is simple to derive from $\|\beta\|_{X^\top X} = \|y\|.$
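For completeness, carrying out that last step (writing $\beta = t\,(X^\top X)^{-1} X^\top y$ and imposing the constraint) gives $$ t^2\, y^\top X (X^\top X)^{-1} X^\top y = y^\top y \quad\Longrightarrow\quad t = \left(\frac{y^\top y}{y^\top X (X^\top X)^{-1} X^\top y}\right)^{1/2},$$ matching the closed form obtained in the other answers.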

If $X^\top X$ is not invertible, things are messier - it's easier to go the Lagrange-multipliers route (where you'd end up with a pseudoinverse in the picture), but I think you can also develop solutions by adding a small identity perturbation to $X^\top X$ and then taking limits (I haven't worked this out, though, so caveat emptor!).