Why does least square method work? Is it related to Jacobian matrix?

There are two questions:

  1. Does there exist any relationship between the Jacobian and the least square "cost" function?
  2. If so, make a reasonable explanation.

In the least-squares method, to approximate $Y_{m \times 1}$ with input $X_{m \times n}$ and parameters $\theta_{n \times 1}$, the idea is to choose $\theta$ so that the cost

$J = ({X}{\theta} - Y)^T({X}{\theta} - Y)$ is minimized.
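Since $J$ is quadratic in $\theta$, setting its gradient to zero gives the normal equations $X^T X \theta = X^T Y$. A minimal NumPy sketch (with synthetic data of my own choosing) that checks this against a library solver:

```python
import numpy as np

# Synthetic toy data (my own example): m = 50 samples, n = 3 parameters.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
theta_true = np.array([2.0, -1.0, 0.5])
Y = X @ theta_true + 0.01 * rng.normal(size=m)

# Minimizing J = (X theta - Y)^T (X theta - Y) leads to the
# normal equations X^T X theta = X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```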

Something that comes to mind when I see this is the Taylor expansion,

$f({\textbf {x}}) = f(\textbf{x}_0) + (\textbf{x} - \textbf{x}_0)^T \nabla f(\textbf{x}_0) + \frac{1}{2}(\textbf{x} - \textbf{x}_0)^T \textbf{H}_f({\textbf x_0}) (\textbf{x} - \textbf{x}_0) + \textrm{(third-order terms and higher)}$

The second-order term looks very much like $J$. At first I thought it was the Jacobian, but it is really just part of the expansion of $f$ above. In my mind, if you can suppress the value of the second-order term, then the second-order and higher terms are suppressed, so that

$f({\textbf {x}}) = f(\textbf{x}_0) + (\textbf{x} - \textbf{x}_0)^T \nabla f(\textbf{x}_0)$

is a good approximation. But I can't prove this rigorously. Is there any good way to prove this? Or is this idea reasonable?
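A numerical check (not a proof) of that idea: with a toy quadratic $f$ of my own choosing, the error of the first-order expansion shrinks like $\|\textbf{x} - \textbf{x}_0\|^2$, i.e. it is dominated by the second-order term:

```python
import numpy as np

# Toy scalar field (my own example): f(x) = x_0^2 + x_0 * x_1.
def f(x):
    return x[0]**2 + x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + x[1], x[0]])

x0 = np.array([1.0, 2.0])
d = np.array([0.3, -0.4])   # fixed direction

# Error of the first-order expansion along h*d; since f is quadratic,
# each halving of h cuts the error by exactly 4x.
errs = []
for h in (1.0, 0.5, 0.25):
    step = h * d
    err = abs(f(x0 + step) - (f(x0) + grad_f(x0) @ step))
    errs.append(err)
    print(h, err)
```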

Then, since the second order part of $f$ can be written as $\frac{1}{2}(\textbf{x} - \textbf{x}_0)^T \nabla \textbf{J}_f({\textbf x_0}) (\textbf{x} - \textbf{x}_0)$

and I found this:

http://www.math.iit.edu/~fass/477577_Chapter_5.pdf

which says that least-squares problems can be solved using the Cholesky decomposition. My imagination continues with a Cholesky decomposition of the Jacobian matrix, for example,

$\nabla \textbf{J}_f({\textbf x_0}) = \textbf{K}^T\textbf{K}$, and then

$(\textbf{x} - \textbf{x}_0)^T \textbf{K}^T \textbf{K}(\textbf{x} - \textbf{x}_0)$ is even more similar to $J = ({X}{\theta} - Y)^T({X}{\theta} - Y)$.
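For reference, the Cholesky approach in that PDF applies to the normal equations: $X^T X$ is symmetric positive definite when $X$ has full column rank, so $X^T X = L L^T$ and two triangular solves give $\theta$. A sketch with synthetic data of my own choosing:

```python
import numpy as np

# Synthetic toy data (my own example).
rng = np.random.default_rng(1)
m, n = 40, 3
X = rng.normal(size=(m, n))
Y = rng.normal(size=m)

# Normal equations: (X^T X) theta = X^T Y. With X of full column rank,
# A = X^T X is symmetric positive definite, so A = L L^T (Cholesky).
A = X.T @ X
b = X.T @ Y
L = np.linalg.cholesky(A)        # lower triangular, A = L @ L.T
z = np.linalg.solve(L, b)        # forward substitution: L z = b
theta = np.linalg.solve(L.T, z)  # back substitution: L^T theta = z

# Agrees with NumPy's least-squares solver.
print(np.allclose(theta, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
```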

Without any rigorous reasoning, I am writing

${X}{\theta} - Y = \textbf{K}\textbf{x} - \textbf{K}\textbf{x}_0$

It seems so well matched because the first term on each side depends on the variable and the second term is a constant/reference. This is only an analogy, and similarity does not necessarily mean identity. But I want a clear picture of these terms, and an explanation of why they are similar would be very helpful.

Also, in some resources, the Jacobian $\textbf{J}$ is described as a derivative matrix, and the best affine approximation $\textbf{T}(\textbf {x})$ of a differentiable $\textbf{F}: \mathbb{R}^n \rightarrow \mathbb{R}^m$ at a point $\textbf {p}$ is

$\textbf{T}(\textbf {x}) = \textbf{F}(\textbf {p}) + \textbf{J}(\textbf {p})(\textbf {x}-\textbf {p})$.
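A quick numerical check of this affine approximation, with a toy map $\textbf{F}$ of my own choosing:

```python
import numpy as np

# Toy map F: R^2 -> R^3 (my own example, not from the post).
def F(x):
    return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

def J(x):
    # Jacobian: rows are outputs, columns are inputs.
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]],
                     [0.0,      np.cos(x[1])]])

p = np.array([1.0, 0.5])
x = p + np.array([0.01, -0.02])   # a point near p

# Best affine approximation of F at p, evaluated at x.
T = F(p) + J(p) @ (x - p)

# The residual is O(||x - p||^2), i.e. tiny for x this close to p.
print(np.max(np.abs(F(x) - T)))
```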

The least-squares method finds the best projection from a higher-dimensional space onto a lower-dimensional one. Is this affine approximation related to the least-squares method?

There is 1 answer below.

Whoops. I think the simplest answer is that I confused myself with the $J$, which serves here as notation for the cost functional. The Jacobian does not look like that, and it does not work like that.

Unless someone can prove there is some relationship, I already have my own answer: no, they are different. At least I spent some time thinking about it and working through the derivation.

I should read more books before asking any questions.