Yet another Least Squares matrix derivation


I understand the solution to the well-known least squares problem, as explained in the following post:

Least-squares solution to a matrix equation?

We solve for β so that the expression below is minimized.

$(Y - X\beta)'\,(Y - X\beta)$

In the above, let's assume that we have N samples, each with D features:

  • Y is N*1
  • X is N*D
  • β is D*1

I am wondering how the derivation steps would change had we assumed that the output Y has shape 1*N instead:

  • Y is 1*N

So the equation for Y would be

$Y = \beta' X'$

and not

$Y = X\beta$

as in the original derivation.

Again, while I completely understand the steps in the original derivation, I could not solve the problem once I assumed Y = β′X′, i.e. when minimizing

$(Y - \beta' X')'\,(Y - \beta' X')$

Expanding the above in order to differentiate with respect to β yields the expression below, but I could not proceed further:

$Y'Y - Y'\beta' X' - X\beta Y + X\beta\beta' X'$

Out of academic interest, I would like to understand whether it is at all possible to solve for β by this route and, if so, what the steps are.

There is 1 best solution below


In brief, the steps to solve a least squares problem are (assuming $X^TX$ is invertible)
$$\eqalign{ y &= X\beta \quad\implies \min_\beta\,\|X\beta-y\|^2 \quad\implies \beta = (X^TX)^{-1}X^Ty \\ }$$
The decision to use $Y=y^T$ instead of $y$ has no effect on these steps, i.e.
$$\eqalign{ Y^T &= X\beta \quad\implies \min_\beta\,\|X\beta-Y^T\|^2 \quad\implies \beta = (X^TX)^{-1}X^TY^T \\ }$$
The approach can be summarized as:
$\quad$Substitute the new $Y^T$ variable wherever the original $y$ variable appears.
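
As a quick numerical sanity check of this substitution, here is a minimal NumPy sketch. The sizes `N = 50`, `D = 3`, the random data, and all variable names are assumptions made for illustration; none of them come from the question. It solves the normal equations once with the column vector $y$ and once with $Y^T$ in its place, and also looks at the shapes of the two possible residual products.

```python
import numpy as np

# Hypothetical sizes and random data, chosen only for illustration.
rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))   # N x D design matrix
y = rng.normal(size=(N, 1))   # N x 1 column of targets
Y = y.T                       # 1 x N row holding the same targets

# Column form: beta = (X^T X)^{-1} X^T y, computed with a linear solve
# rather than an explicit inverse.
beta_col = np.linalg.solve(X.T @ X, X.T @ y)

# Row form: the same formula with Y^T substituted for y.
beta_row = np.linalg.solve(X.T @ X, X.T @ Y.T)
same_beta = np.allclose(beta_col, beta_row)

# The row-form residual Y - beta^T X^T is 1 x N.
r = Y - beta_row.T @ X.T

# r @ r.T is the 1 x 1 sum of squared residuals, while
# r.T @ r is an N x N outer product.
objective = (r @ r.T).item()
outer_shape = (r.T @ r).shape

print("same beta:", same_beta)
print("residual shape:", r.shape)
print("shape of r.T @ r:", outer_shape)
```

The shape check is the point of interest: with a $1\times N$ residual, the transpose has to sit on the right-hand factor for the product to be a scalar.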

Make this substitution in every expression, every formula, and every derivative.
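
As a sketch of what this looks like for the row-vector form in the question: write $r = Y - \beta^TX^T$ for the $1\times N$ residual (the symbol $r$ is introduced here for convenience and is not in the original post), and assume $X^TX$ is invertible. The scalar objective is then $rr^T$. The product in the other order, $r^Tr$, is an $N\times N$ matrix whose trace equals $rr^T$, so it is not itself a scalar to minimize.
$$\eqalign{
f(\beta) &= rr^T = (Y-\beta^TX^T)(Y^T-X\beta) \\
&= YY^T - YX\beta - \beta^TX^TY^T + \beta^TX^TX\beta \\
&= YY^T - 2\,\beta^TX^TY^T + \beta^TX^TX\beta \\
\frac{\partial f}{\partial\beta} &= -2\,X^TY^T + 2\,X^TX\beta = 0
\quad\implies\quad \beta = (X^TX)^{-1}X^TY^T \\
}$$
The third line uses the fact that $YX\beta$ is a scalar and therefore equals its own transpose $\beta^TX^TY^T$. The result matches the formula obtained by direct substitution.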