How can I perform polynomial regression with this dataset?


I have the training set $\{(0,0), (1,1), (2,1), (1,2)\}$. I want to find the best quadratic polynomial of the form $f(x) = a + bx + cx^2$ that minimizes the sum of squared errors between $y$ and the regression function. I know that this vector is given by $\beta = (D^{T}D)^{-1}D^T\textbf{y}$, but I don't know what the data matrix $D$ is supposed to be in this situation. How am I supposed to calculate $D$?



For any observation $i \in \{1,\ldots,N\}$, dependent variable $y_i$, and vector of independent variables $\mathbf{x}_i = (x_{i1},\ldots,x_{iK})'$, you want to minimize the sum of squared errors in the model

$$ y_{i} = \mathbf{x}_{i}^{'} \mathbf{\beta} + \varepsilon_{i} \hspace{1cm} i = 1,\ldots, N. $$

Stacking the $N$ equations, you get

$$ \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{\varepsilon}, $$ where the $i$-th row of $\mathbf{X}$ is $\mathbf{x}_i^{'}$. From this, you get the usual least-squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. In your application, $\mathbf{x}_i = (1, x_i, x_i^2)'$. Can you now conclude what your matrix $D$ should look like?
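As a concrete sketch of this recipe in Python with NumPy (using the dataset from the question), each row of the design matrix is $(1, x_i, x_i^2)$, and the normal equations give the coefficients:

```python
import numpy as np

# Training data from the question: (x, y) pairs
x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 1.0, 1.0, 2.0])

# Design matrix D: row i is (1, x_i, x_i^2)
D = np.column_stack([np.ones_like(x), x, x**2])

# Normal-equation estimator: solve (D^T D) beta = D^T y
beta = np.linalg.solve(D.T @ D, D.T @ y)
print(beta)  # -> [ 0.   2.5 -1. ]
```

Solving the linear system directly with `np.linalg.solve` is preferred over forming the explicit inverse $(D^TD)^{-1}$, though both give the same result here.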


In your particular problem you are given data points $(x_j,y_j)$, where $$y_j=a+ bx_j + c x^2_j+\varepsilon_j.$$ In terms of vectors, this model can be expressed as $$\mathbf{y}=X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$ where $\mathbf{y}=[y_1,\ldots,y_N]^\intercal$, $$X=\begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots\\ 1 & x_N & x^2_N \end{pmatrix} $$ $\boldsymbol{\beta}=[a, b, c]^\intercal$, and $\boldsymbol{\varepsilon}=[\varepsilon_1,\ldots, \varepsilon_N]^\intercal$.

This is $$\begin{pmatrix} 0\\ 1\\1\\2\end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1\\ 1 & 2 & 4 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} + \begin{pmatrix} \varepsilon_1\\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4\end{pmatrix} $$


  • In principle, there is always a solution to the problem \begin{align} \hat{\boldsymbol{\beta}}:=\operatorname{argmin}_{\mathbf{\beta}\in\mathbb{R}^p}\|\mathbf{y}-X\boldsymbol{\beta}\|_2\tag{1}\label{one} \end{align} where $\mathbf{y}\in \mathbb{R}^N$ and $X\in\operatorname{mat}_{\mathbb{R}}(N,p)$, and $\|\;\|_2$ is the Euclidean norm.

  • The solution $\hat{\boldsymbol{\beta}}$ is unique if $X$ is of full rank (i.e. $\operatorname{rank}(X)=p$), and is given by $\hat{\boldsymbol{\beta}}=(X^\intercal X)^{-1}X^\intercal\mathbf{y}$.

  • In general, when $X$ is not necessarily of full rank, there is always a particular solution $\boldsymbol{\beta}_*$ to \eqref{one} such that $\|\boldsymbol{\beta}_*\|_2$ is minimum amongst all solutions to \eqref{one}. This is given by $\boldsymbol{\beta}_*=X^+\mathbf{y}$, where $X^+$ is the Moore-Penrose inverse of $X$.

  • Most scientific packages (R, Octave, etc.) have efficient implementations for computing the Moore-Penrose inverse $X^+$ of a matrix (e.g. ginv(X) in R's MASS package).

  • When $\operatorname{rank}(X)=p$, then $X^+=(X^\intercal X)^{-1}X^\intercal$.
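The last point can be checked numerically. A minimal sketch in Python with NumPy (using the design matrix from this problem, which has full rank): `np.linalg.pinv` computes $X^+$ via the SVD, and for full-rank $X$ it agrees with the normal-equation formula.

```python
import numpy as np

# Design matrix from the problem; rank(X) = 3, so X has full column rank
X = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 4],
              [1, 1, 1]], dtype=float)

# Moore-Penrose pseudoinverse, computed via the SVD
X_pinv = np.linalg.pinv(X)

# For full-rank X this coincides with (X^T X)^{-1} X^T
X_pinv_normal = np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(X_pinv, X_pinv_normal))  # -> True
```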


Notice that in your problem $\operatorname{rank}(X)=3$; hence the solution to your problem is

$$\hat{\boldsymbol{\beta}}=(X^\intercal X)^{-1}X^\intercal\mathbf{y}=X^+\mathbf{y}=\begin{pmatrix} 1 & 0 & 0 &0\\ -1.5 & 1 & -.5 & 1\\ .5 & -.5 & .5 & -.5\end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1\\ 2\end{pmatrix}= \begin{pmatrix} 0 \\ 2.5\\ -1 \end{pmatrix} $$
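This final computation is easy to verify numerically. A short Python/NumPy check (using the data from the question), computing $\hat{\boldsymbol{\beta}}$ both as $X^+\mathbf{y}$ and via a least-squares solver:

```python
import numpy as np

X = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 4],
              [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 2.0])

# beta_hat = X^+ y, via the pseudoinverse
beta_pinv = np.linalg.pinv(X) @ y

# Same answer from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_pinv)  # -> [ 0.   2.5 -1. ]
```

So the fitted polynomial is $f(x) = 2.5x - x^2$, matching the closed-form result above.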