I have the training set $\{(0,0), (1,1), (2,1), (1,2)\}$. I want to find the best quadratic polynomial of the form $f(x) = a + bx + cx^2$ that minimizes the sum of squared errors between $y$ and the regression function. I know that the coefficient vector is given by $\beta = (D^{T}D)^{-1}D^T\textbf{y}$; however, I don't know what the data matrix $D$ is supposed to be in this situation. How am I supposed to calculate $D$?
How can I perform polynomial regression with this dataset?
In your particular problem you are given data points $(x_j,y_j)$, $j=1,\ldots,N$, where $$y_j=a+ bx_j + c x^2_j+\varepsilon_j$$ In terms of vectors, this model can be expressed as $$\mathbf{y}=X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$ where $\mathbf{y}=[y_1,\ldots,y_N]^\intercal$, $$X=\begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots\\ 1 & x_N & x^2_N \end{pmatrix} $$ $\boldsymbol{\beta}=[a, b, c]^\intercal$, and $\boldsymbol{\varepsilon}=[\varepsilon_1,\ldots, \varepsilon_N]^\intercal$.
For your data, this is $$\begin{pmatrix} 0\\ 1\\1\\2\end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1\\ 1 & 2 & 4 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} + \begin{pmatrix} \varepsilon_1\\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4\end{pmatrix} $$
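As a quick numerical sketch (using NumPy, not part of the derivation above), the design matrix can be assembled directly from the data:

```python
import numpy as np

# Data points (x_j, y_j): (0,0), (1,1), (2,1), (1,2)
x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 1.0, 1.0, 2.0])

# Design matrix with columns 1, x, x^2
X = np.column_stack([np.ones_like(x), x, x**2])
print(X)
```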
In principle, there is always a solution to the problem \begin{align} \hat{\boldsymbol{\beta}}:=\operatorname{argmin}_{\mathbf{\beta}\in\mathbb{R}^p}\|\mathbf{y}-X\boldsymbol{\beta}\|_2\tag{1}\label{one} \end{align} where $\mathbf{y}\in \mathbb{R}^N$ and $X\in\operatorname{mat}_{\mathbb{R}}(N,p)$, and $\|\;\|_2$ is the Euclidean norm.
The solution $\hat{\boldsymbol{\beta}}$ is unique if $X$ is of full rank (i.e. $\operatorname{rank}(X)=p)$), and is given by $\hat{\boldsymbol{\beta}}=(X^\intercal X)^{-1}X^\intercal\mathbf{y}$.
In general, when $X$ is not necessarily of full rank, there is always a particular solution $\boldsymbol{\beta}_*$ to \eqref{one} such that $\|\boldsymbol{\beta}_*\|_2$ is minimum amongst all solutions to \eqref{one}. This is given by $\boldsymbol{\beta}_*=X^+\mathbf{y}$, where $X^+$ is the Moore-Penrose inverse of $X$.
Most scientific packages (R, Octave, etc.) have efficient implementations for finding the Moore-Penrose inverse $X^+$ of a matrix (e.g. `ginv(X)` from the MASS package in R).
When $\operatorname{rank}(X)=p$, then $X^+=(X^\intercal X)^{-1}X^\intercal$.
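This identity can be checked numerically (a sketch with NumPy; `np.linalg.pinv` computes the Moore-Penrose inverse via the SVD):

```python
import numpy as np

X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0],
              [1.0, 1.0, 1.0]])

X_pinv = np.linalg.pinv(X)                 # Moore-Penrose inverse X^+
X_normal = np.linalg.inv(X.T @ X) @ X.T    # (X^T X)^{-1} X^T, valid since rank(X) = p

# The two agree because X has full column rank
assert np.allclose(X_pinv, X_normal)
```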
Notice that in your problem $\operatorname{rank}(X)=3$, hence the solution to your problem is
$$\hat{\boldsymbol{\beta}}=(X^\intercal X)^{-1}X^\intercal\mathbf{y}=X^+\mathbf{y}=\begin{pmatrix} 1 & 0 & 0 &0\\ -1.5 & 1 & -.5 & 1\\ .5 & -.5 & .5 & -.5\end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1\\ 2\end{pmatrix}= \begin{pmatrix} 0 \\ 2.5\\ -1 \end{pmatrix} $$
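To sanity-check this result numerically (a NumPy sketch; the pseudoinverse route avoids forming $(X^\intercal X)^{-1}$ by hand):

```python
import numpy as np

X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0],
              [1.0, 1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 2.0])

beta_hat = np.linalg.pinv(X) @ y

# Recovers (a, b, c) = (0, 2.5, -1), i.e. f(x) = 2.5x - x^2
assert np.allclose(beta_hat, [0.0, 2.5, -1.0])
```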
For any observation $i \in \{1,\ldots,N\}$, dependent variable $y_i$, and vector of independent variables $\mathbf{x}_i = (x_{i1},\ldots,x_{iK})'$, you want to minimize the sum of squared errors in the model
$$ y_{i} = \mathbf{x}_{i}^{'} \mathbf{\beta} + \varepsilon_{i} \hspace{1cm} i = 1,\ldots, N. $$
Stacking the $N$ equations, you get
$$ \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{\varepsilon}, $$ where the $i$-th row of $\mathbf{X}$ is $\mathbf{x}_i^{'}$. From this, you get the usual estimator $\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. In your application, $\mathbf{x}_i = (1, x_i, x_i^2)'$. Can you now conclude what your matrix $D$ should look like?
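Concretely, one way to build $\mathbf{X}$ row by row and solve the least-squares problem (a NumPy sketch; `np.linalg.lstsq` minimizes $\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2$, which here coincides with the $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ formula since $\mathbf{X}$ has full column rank):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 1.0, 1.0, 2.0])

# Row i of X is x_i' = (1, x_i, x_i^2) -- this is the matrix D from the question
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimate of beta = (a, b, c)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```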