In least squares optimization,
$$A=\begin{pmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^k \\
1 & t_2 & t_2^2 & \cdots & t_2^k \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & t_n & t_n^2 & \cdots & t_n^k
\end{pmatrix},\qquad
b=\begin{pmatrix}
s_1 \\
s_2 \\
\vdots \\
s_n
\end{pmatrix},\qquad
x=\begin{pmatrix}
x_0 \\
x_1 \\
\vdots \\
x_k
\end{pmatrix}$$
and
$$f(x_0,x_1,\dots,x_k)=\left\lVert b-Ax\right\rVert^2=b\cdot b-2(A^Tb)\cdot x+x\cdot(A^TAx).$$
How is $\nabla f(x)$ calculated here? I know how to compute the gradient of a function of several variables, but with this matrix notation I don't understand how the book can simply write $\nabla f(x)=-2A^Tb+2A^TAx$.
They then go on to show that the Hessian $2A^TA$ is positive definite, and I don't follow this part either. Since $t_1,t_2,\dots,t_n$ are distinct, the columns of $A$ are linearly independent, so $Ax=0$ implies $x=0$. Then, since $x\cdot(A^TAx)=\left\lVert Ax\right\rVert^2$, they conclude that the Hessian is positive definite.
Why is it important that the columns of $A$ be linearly independent?
How does $x\cdot(A^TAx)=\left\lVert Ax\right\rVert^2$ show that the Hessian is positive definite?
Can someone please explain this to me?
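For concreteness, the setup in the question can be sketched numerically. This is a minimal example; the data values, the degree $k=2$, and the use of NumPy are my own illustration, not from the book:

```python
import numpy as np

# Hypothetical data (not from the question): n = 5 sample times, degree k = 2.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
s = np.array([1.0, 2.7, 5.8, 11.1, 17.9])
k = 2

# Vandermonde matrix A with columns 1, t, t^2, ..., t^k.
A = np.vander(t, k + 1, increasing=True)

# The objective f(x) = ||b - A x||^2, with b = s.
def f(x):
    r = s - A @ x
    return r @ r

# Minimizer from the normal equations A^T A x = A^T b,
# i.e. where the gradient -2 A^T b + 2 A^T A x vanishes.
x_hat = np.linalg.solve(A.T @ A, A.T @ s)
print(x_hat)
```

At `x_hat` the residual $b - Ax$ is orthogonal to the columns of $A$, which is the same statement as the gradient being zero.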
Let $$ f_1(x) = b\cdot b ~,~~ f_2(x) = (A^T b)\cdot x ~,~~ f_3(x) = x\cdot(A^T A x) ~, $$ so that $f(x) = f_1(x) - 2f_2(x) + f_3(x)$. We can write these in terms of components (with summation over repeated indices implied) as $$ f_1(x) = b_i b_i ~,~~ f_2(x) = A_{ij} b_i x_j ~,~~ f_3(x) = A_{ij} A_{ik} x_j x_k \,. $$
Gradient
The gradient is defined as $$ [\nabla f(x)]_p = \frac{\partial f}{\partial x_p} $$ Therefore $$ \begin{align} [\nabla f_1(x)]_p = \frac{\partial f_1}{\partial x_p} &= 0 \\ [\nabla f_2(x)]_p = \frac{\partial f_2}{\partial x_p} &= A_{ij} b_i \frac{\partial x_j}{\partial x_p} = A_{ij} b_i \delta_{jp} = A_{ip} b_i \\ [\nabla f_3(x)]_p = \frac{\partial f_3}{\partial x_p} &= A_{ij} A_{ik} \left[\frac{\partial x_j}{\partial x_p} x_k + x_j \frac{\partial x_k}{\partial x_p}\right]= A_{ij} A_{ik} \left[ \delta_{jp} x_k + x_j \delta_{kp}\right] = A_{ip} A_{ik} x_k + A_{ij} A_{ip} x_j \\ &= 2A_{ip} A_{ij} x_j \end{align} $$ Going back to matrix notation, $$ \nabla f_1(x) = 0 ~,~~ \nabla f_2(x) = A^T b ~,~~ \nabla f_3(x) = 2 A^TAx \,. $$ Therefore $$ \nabla f(x) = -2 A^T b + 2 A^T A x \,. $$
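This gradient formula is easy to verify with central finite differences. A small sketch on made-up random data (the names `A`, `b`, `x` are illustrative, assuming NumPy):

```python
import numpy as np

# Check grad f = -2 A^T b + 2 A^T A x against central differences.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x = rng.normal(size=3)

f = lambda x: (b - A @ x) @ (b - A @ x)
grad = -2 * A.T @ b + 2 * A.T @ A @ x

# Central differences, one coordinate x_p at a time.
h = 1e-6
num = np.zeros(3)
for p in range(3):
    e = np.zeros(3)
    e[p] = h
    num[p] = (f(x + e) - f(x - e)) / (2 * h)

print(np.max(np.abs(num - grad)))
```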
Hessian
The Hessian is defined as $$ [\nabla\nabla f(x)]_{pq} = \frac{\partial^2 f}{\partial x_p \partial x_q} = \frac{\partial}{\partial x_q}\left(\frac{\partial f}{\partial x_p}\right) $$ Repeating the process that we used for the gradient, $$ \begin{align} [\nabla\nabla f_2(x)]_{pq} &= \frac{\partial}{\partial x_q} (A_{ip} b_i) = 0 \\ [\nabla\nabla f_3(x)]_{pq} &= \frac{\partial}{\partial x_q} ( 2A_{ip} A_{ij} x_j) = 2A_{ip} A_{ij} \delta_{jq} = 2A_{ip} A_{iq} \end{align} $$ Therefore $$ \nabla\nabla f(x) = 2 A^T A \,. $$
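The same finite-difference trick confirms the Hessian. Since $f$ is quadratic, the second-order central difference recovers $2A^TA$ exactly up to rounding (again a sketch on made-up random data, assuming NumPy):

```python
import numpy as np

# Check the Hessian formula H = 2 A^T A numerically.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x = rng.normal(size=3)

f = lambda x: (b - A @ x) @ (b - A @ x)
H = 2 * A.T @ A

# Second-order central differences for d^2 f / (dx_p dx_q);
# exact for a quadratic, up to floating-point rounding.
h = 1e-3
H_num = np.zeros((3, 3))
for p in range(3):
    for q in range(3):
        ep = np.zeros(3); ep[p] = h
        eq = np.zeros(3); eq[q] = h
        H_num[p, q] = (f(x + ep + eq) - f(x + ep - eq)
                       - f(x - ep + eq) + f(x - ep - eq)) / (4 * h * h)

print(np.max(np.abs(H_num - H)))
```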
Positive definite
You have to show that $$ t = z \cdot ( A^T A z) > 0 \quad \text{for every } z \neq 0 \,. $$ Once again, going to component notation, $$ t = z_p A_{ip} A_{iq} z_q = (Az)\cdot(Az) = \lVert Az \rVert^2 ~, $$ which is positive unless $Az = 0$. So it remains to exclude the possibility that $Az = 0$ for some $z \neq 0$, and that is exactly what the linear independence of the columns of $A$ guarantees. Therefore, the Hessian is positive definite.
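This is also easy to see numerically: for distinct $t_i$ (and $n \ge k+1$) the Vandermonde matrix has full column rank and all eigenvalues of $2A^TA$ are strictly positive, while repeated $t_i$ make the columns dependent and the smallest eigenvalue drops to zero. A sketch with illustrative values of my own, assuming NumPy:

```python
import numpy as np

# Distinct sample times: columns 1, t, t^2 are independent.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.vander(t, 3, increasing=True)
H = 2 * A.T @ A
eigs = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric matrix H
print(eigs)                    # all strictly positive

# Repeated sample times: some z != 0 has A z = 0, so H is only
# positive *semi*definite (smallest eigenvalue is 0 up to rounding).
t_bad = np.array([0.0, 1.0, 1.0, 1.0])
A_bad = np.vander(t_bad, 3, increasing=True)
H_bad = 2 * A_bad.T @ A_bad
print(np.linalg.eigvalsh(H_bad)[0])
```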