Why does least square method work? Is it related to Jacobian matrix?

There are two questions:

  1. Does there exist any relationship between the Jacobian and the least square "cost" function?
  2. If so, make a reasonable explanation.

In the least-squares method, to approximate $Y_{m \times 1}$ with input $X_{m \times n}$ and parameters $\theta_{n \times 1}$, the idea is to choose $\theta$ so that the cost

$J = ({X}{\theta} - Y)^T({X}{\theta} - Y)$ is minimized.
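Since $J$ is quadratic in $\theta$, setting its gradient to zero gives the normal equations $X^T X \theta = X^T Y$. A minimal NumPy sketch (with synthetic data of my own choosing) that checks this against a library solver:

```python
import numpy as np

# Synthetic toy data (my own example): m = 50 samples, n = 3 parameters.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
theta_true = np.array([2.0, -1.0, 0.5])
Y = X @ theta_true + 0.01 * rng.normal(size=m)

# Minimizing J = (X theta - Y)^T (X theta - Y) leads to the
# normal equations X^T X theta = X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```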

Something that comes to mind when I see this is the Taylor expansion,

$f({\textbf {x}}) = f(\textbf{x}_0) + (\textbf{x} - \textbf{x}_0)^T \nabla f(\textbf{x}_0) + \frac{1}{2}(\textbf{x} - \textbf{x}_0)^T \textbf{H}_f({\textbf x_0}) (\textbf{x} - \textbf{x}_0) + \textrm{(third-order terms and higher)}$

The second-order term looks very much like $J$. At first I thought it was the Jacobian, but it is really just part of the expansion of $f$ above. In my mind, if you can suppress the value of the second-order term, then the second-order and higher terms are suppressed, so that

$f({\textbf {x}}) = f(\textbf{x}_0) + (\textbf{x} - \textbf{x}_0)^T \nabla f(\textbf{x}_0)$

is a good approximation. But I can't prove this rigorously. Is there any good way to prove this? Or is this idea reasonable?
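A numerical check (not a proof) of that idea: with a toy quadratic $f$ of my own choosing, the error of the first-order expansion shrinks like $\|\textbf{x} - \textbf{x}_0\|^2$, i.e. it is dominated by the second-order term:

```python
import numpy as np

# Toy scalar field (my own example): f(x) = x_0^2 + x_0 * x_1.
def f(x):
    return x[0]**2 + x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + x[1], x[0]])

x0 = np.array([1.0, 2.0])
d = np.array([0.3, -0.4])   # fixed direction

# Error of the first-order expansion along h*d; since f is quadratic,
# each halving of h cuts the error by exactly 4x.
errs = []
for h in (1.0, 0.5, 0.25):
    step = h * d
    err = abs(f(x0 + step) - (f(x0) + grad_f(x0) @ step))
    errs.append(err)
    print(h, err)
```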

Then, since the second order part of $f$ can be written as $\frac{1}{2}(\textbf{x} - \textbf{x}_0)^T \nabla \textbf{J}_f({\textbf x_0}) (\textbf{x} - \textbf{x}_0)$

and I found this:

http://www.math.iit.edu/~fass/477577_Chapter_5.pdf

which says that least-squares problems can be solved using the Cholesky decomposition. My imagination continues with a Cholesky decomposition of the Jacobian matrix, for example,

$\nabla \textbf{J}_f({\textbf x_0}) = \textbf{K}^T\textbf{K}$, and then

$(\textbf{x} - \textbf{x}_0)^T \textbf{K}^T \textbf{K}(\textbf{x} - \textbf{x}_0)$ is even more similar to $J = ({X}{\theta} - Y)^T({X}{\theta} - Y)$.
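For reference, the Cholesky approach in that PDF applies to the normal equations: $X^T X$ is symmetric positive definite when $X$ has full column rank, so $X^T X = L L^T$ and two triangular solves give $\theta$. A sketch with synthetic data of my own choosing:

```python
import numpy as np

# Synthetic toy data (my own example).
rng = np.random.default_rng(1)
m, n = 40, 3
X = rng.normal(size=(m, n))
Y = rng.normal(size=m)

# Normal equations: (X^T X) theta = X^T Y. With X of full column rank,
# A = X^T X is symmetric positive definite, so A = L L^T (Cholesky).
A = X.T @ X
b = X.T @ Y
L = np.linalg.cholesky(A)        # lower triangular, A = L @ L.T
z = np.linalg.solve(L, b)        # forward substitution: L z = b
theta = np.linalg.solve(L.T, z)  # back substitution: L^T theta = z

# Agrees with NumPy's least-squares solver.
print(np.allclose(theta, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
```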

Without any rigorous reasoning, I am writing

${X}{\theta} - Y = \textbf{K}\textbf{x} - \textbf{K}\textbf{x}_0$

It seems so well matched because the first term on each side depends on the variable and the second term is a constant/reference. This is only an analogy, and similarity does not necessarily mean identity. But I want a clear picture of these terms, and an explanation of why they are similar would be very helpful.

Also, in some resources, the Jacobian $\textbf{J}$ is described as a derivative matrix, and the best affine approximation $\textbf{T}(\textbf {x})$ of a differentiable $\textbf{F}: \mathbb{R}^n \rightarrow \mathbb{R}^m$ at a point $\textbf {p}$ is

$\textbf{T}(\textbf {x}) = \textbf{F}(\textbf {p}) + \textbf{J}(\textbf {p})(\textbf {x}-\textbf {p})$.
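A quick numerical check of this affine approximation, with a toy map $\textbf{F}$ of my own choosing:

```python
import numpy as np

# Toy map F: R^2 -> R^3 (my own example, not from the post).
def F(x):
    return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

def J(x):
    # Jacobian: rows are outputs, columns are inputs.
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]],
                     [0.0,      np.cos(x[1])]])

p = np.array([1.0, 0.5])
x = p + np.array([0.01, -0.02])   # a point near p

# Best affine approximation of F at p, evaluated at x.
T = F(p) + J(p) @ (x - p)

# The residual is O(||x - p||^2), i.e. tiny for x this close to p.
print(np.max(np.abs(F(x) - T)))
```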

The least-squares method finds the best projection from a higher-dimensional space onto a lower-dimensional one. Is this affine approximation related to the least-squares method?

There is 1 answer below.

Whoops. I think the simplest answer is that I confused myself with the $J$, which serves here as notation for the cost functional. The Jacobian does not look like that, and it does not work like that.

Unless someone can prove there is some relationship, I already have my own answer: no, they are different. At least I spent some time thinking about it and working through the derivation.

I should read more books before asking any questions.