In least squares optimization,
$$A=\begin{pmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^k \\
1 & t_2 & t_2^2 & \cdots & t_2^k \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & t_n & t_n^2 & \cdots & t_n^k
\end{pmatrix},\qquad
b=\begin{pmatrix}
s_1 \\
s_2 \\
\vdots \\
s_n
\end{pmatrix},\qquad
x=\begin{pmatrix}
x_0 \\
x_1 \\
\vdots \\
x_k
\end{pmatrix}$$
and
$$f(x_0,x_1,\dots,x_k)=\left\lVert b-Ax\right\rVert^2=b\cdot b-2(A^Tb)\cdot x+x\cdot(A^TAx).$$
How is $\nabla f(x)$ calculated here? I know how to compute the gradient of a function of several variables, but with this matrix notation I don't understand how the book can simply write $\nabla f(x)=-2A^Tb+2A^TAx$.
They then go on to show that the Hessian $2A^TA$ is positive definite, and I don't follow this part either. Since $t_1,t_2,\dots,t_n$ are distinct, the columns of $A$ are linearly independent, so $Ax=0$ implies $x=0$. Then, since $x\cdot(A^TAx)=\left\lVert Ax\right\rVert^2$, they conclude that the Hessian is positive definite.
Why is it important that the columns of $A$ be linearly independent?
How does $x\cdot(A^TAx)=\left\lVert Ax\right\rVert^2$ show that the Hessian is positive definite?
Can someone please explain this to me?
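For concreteness, the setup in the question can be sketched numerically. This is a minimal example; the data values, the degree $k=2$, and the use of NumPy are my own illustration, not from the book:

```python
import numpy as np

# Hypothetical data (not from the question): n = 5 sample times, degree k = 2.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
s = np.array([1.0, 2.7, 5.8, 11.1, 17.9])
k = 2

# Vandermonde matrix A with columns 1, t, t^2, ..., t^k.
A = np.vander(t, k + 1, increasing=True)

# The objective f(x) = ||b - A x||^2, with b = s.
def f(x):
    r = s - A @ x
    return r @ r

# Minimizer from the normal equations A^T A x = A^T b,
# i.e. where the gradient -2 A^T b + 2 A^T A x vanishes.
x_hat = np.linalg.solve(A.T @ A, A.T @ s)
print(x_hat)
```

At `x_hat` the residual $b - Ax$ is orthogonal to the columns of $A$, which is the same statement as the gradient being zero.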
Let $$ f_1(x) = b\cdot b ~,~~ f_2(x) = (A^T b)\cdot x ~,~~ f_3(x) = x\cdot(A^T A x) ~, $$ so that $f(x) = f_1(x) - 2f_2(x) + f_3(x)$. We can write these in terms of components (with summation over repeated indices implied) as $$ f_1(x) = b_i b_i ~,~~ f_2(x) = A_{ij} b_i x_j ~,~~ f_3(x) = A_{ij} A_{ik} x_j x_k \,. $$
Gradient
The gradient is defined as $$ [\nabla f(x)]_p = \frac{\partial f}{\partial x_p} $$ Therefore $$ \begin{align} [\nabla f_1(x)]_p = \frac{\partial f_1}{\partial x_p} &= 0 \\ [\nabla f_2(x)]_p = \frac{\partial f_2}{\partial x_p} &= A_{ij} b_i \frac{\partial x_j}{\partial x_p} = A_{ij} b_i \delta_{jp} = A_{ip} b_i \\ [\nabla f_3(x)]_p = \frac{\partial f_3}{\partial x_p} &= A_{ij} A_{ik} \left[\frac{\partial x_j}{\partial x_p} x_k + x_j \frac{\partial x_k}{\partial x_p}\right]= A_{ij} A_{ik} \left[ \delta_{jp} x_k + x_j \delta_{kp}\right] = A_{ip} A_{ik} x_k + A_{ij} A_{ip} x_j \\ &= 2A_{ip} A_{ij} x_j \end{align} $$ Going back to matrix notation, $$ \nabla f_1(x) = 0 ~,~~ \nabla f_2(x) = A^T b ~,~~ \nabla f_3(x) = 2 A^TAx \,. $$ Therefore $$ \nabla f(x) = -2 A^T b + 2 A^T A x \,. $$
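This gradient formula is easy to verify with central finite differences. A small sketch on made-up random data (the names `A`, `b`, `x` are illustrative, assuming NumPy):

```python
import numpy as np

# Check grad f = -2 A^T b + 2 A^T A x against central differences.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x = rng.normal(size=3)

f = lambda x: (b - A @ x) @ (b - A @ x)
grad = -2 * A.T @ b + 2 * A.T @ A @ x

# Central differences, one coordinate x_p at a time.
h = 1e-6
num = np.zeros(3)
for p in range(3):
    e = np.zeros(3)
    e[p] = h
    num[p] = (f(x + e) - f(x - e)) / (2 * h)

print(np.max(np.abs(num - grad)))
```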
Hessian
The Hessian is defined as $$ [\nabla\nabla f(x)]_{pq} = \frac{\partial^2 f}{\partial x_p \partial x_q} = \frac{\partial}{\partial x_q}\left(\frac{\partial f}{\partial x_p}\right) $$ Repeating the process that we used for the gradient, $$ \begin{align} [\nabla\nabla f_2(x)]_{pq} &= \frac{\partial}{\partial x_q} (A_{ip} b_i) = 0 \\ [\nabla\nabla f_3(x)]_{pq} &= \frac{\partial}{\partial x_q} ( 2A_{ip} A_{ij} x_j) = 2A_{ip} A_{ij} \delta_{jq} = 2A_{ip} A_{iq} \end{align} $$ Therefore $$ \nabla\nabla f(x) = 2 A^T A \,. $$
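The same finite-difference trick confirms the Hessian. Since $f$ is quadratic, the second-order central difference recovers $2A^TA$ exactly up to rounding (again a sketch on made-up random data, assuming NumPy):

```python
import numpy as np

# Check the Hessian formula H = 2 A^T A numerically.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
x = rng.normal(size=3)

f = lambda x: (b - A @ x) @ (b - A @ x)
H = 2 * A.T @ A

# Second-order central differences for d^2 f / (dx_p dx_q);
# exact for a quadratic, up to floating-point rounding.
h = 1e-3
H_num = np.zeros((3, 3))
for p in range(3):
    for q in range(3):
        ep = np.zeros(3); ep[p] = h
        eq = np.zeros(3); eq[q] = h
        H_num[p, q] = (f(x + ep + eq) - f(x + ep - eq)
                       - f(x - ep + eq) + f(x - ep - eq)) / (4 * h * h)

print(np.max(np.abs(H_num - H)))
```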
Positive definite
You have to show that $$ t = z \cdot ( A^T A z) > 0 \quad \text{for every } z \neq 0 \,. $$ Once again, going to component notation, $$ t = z_p A_{ip} A_{iq} z_q = (Az)\cdot(Az) = \lVert Az \rVert^2 ~, $$ which is positive unless $Az = 0$. So it remains to exclude the possibility that $Az = 0$ for some $z \neq 0$, and that is exactly what the linear independence of the columns of $A$ guarantees. Therefore, the Hessian is positive definite.
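This is also easy to see numerically: for distinct $t_i$ (and $n \ge k+1$) the Vandermonde matrix has full column rank and all eigenvalues of $2A^TA$ are strictly positive, while repeated $t_i$ make the columns dependent and the smallest eigenvalue drops to zero. A sketch with illustrative values of my own, assuming NumPy:

```python
import numpy as np

# Distinct sample times: columns 1, t, t^2 are independent.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.vander(t, 3, increasing=True)
H = 2 * A.T @ A
eigs = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric matrix H
print(eigs)                    # all strictly positive

# Repeated sample times: some z != 0 has A z = 0, so H is only
# positive *semi*definite (smallest eigenvalue is 0 up to rounding).
t_bad = np.array([0.0, 1.0, 1.0, 1.0])
A_bad = np.vander(t_bad, 3, increasing=True)
H_bad = 2 * A_bad.T @ A_bad
print(np.linalg.eigvalsh(H_bad)[0])
```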