Least Squares Method (LSM) and Partial Differential Equations

Hello people,
I was reading a machine learning book and trying to understand the least squares method using partial differential equations.

$$ s = \sum( y_i - a_0 - a_1x_i)^2$$

Now the task is to find the coefficients

$$a_0 , a_1$$

that give the best fit for any given data set.

The book just says to take the partial differential equations, and the result is the following.

$$a_0 = \frac{\sum x_i^2 \sum y_i - \sum x_i y_i \sum x_i}{n \sum x_i^2 - (\sum x_i)^2}$$

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$$

I do not have deep knowledge of partial differential equations.
I would like to see the detailed steps that lead to these expressions for $a_0$ and $a_1$.

There are 2 best solutions below


I don't really see how partial differential equations come into play here. This is a least squares fit, which you can solve with a system of linear equations. You need to differentiate $s$ and then solve the system you get by setting the derivatives to zero. The equations do result from partial differentiation, but you wouldn't call them partial differential equations: that term refers to solving for an unknown function satisfying relations involving its derivatives.

$$ s = \sum( y_i - a_0 - a_1x_i)^2$$

If we differentiate with respect to $a_0$:

$$ \frac{\partial s}{\partial a_0} = -2\sum( y_i - a_0 - a_1x_i)$$

And then differentiate with respect to $a_1$: $$ \frac{\partial s}{\partial a_1} = -2\sum x_i( y_i - a_0 - a_1x_i)$$

Now set both of these equal to zero. Maybe you can get further with this start.
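Setting both derivatives to zero gives two linear equations in $a_0$ and $a_1$, which can be solved directly. A minimal numerical sketch, assuming NumPy and made-up sample data:

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
n = len(x)

# Setting both partial derivatives to zero yields the normal equations:
#   n * a_0       + sum(x)   * a_1 = sum(y)
#   sum(x) * a_0  + sum(x^2) * a_1 = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

a0, a1 = np.linalg.solve(A, b)
print(a0, a1)  # a0 ≈ 1, a1 ≈ 2 for this data
```

Since the data is exactly linear here, the fit recovers the intercept and slope up to floating-point rounding.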


I assume you are using the least squares method (in linear regression) to find the best-fitting straight line through a set of points representing the given training data. You might define $s$ as the error associated with $y=a_0 +a_1x$ by:

$$s = \sum_{i=1}^n( y_i - (a_0 +a_1x_i))^2$$

Your goal is to find $a_0$ and $a_1$ that minimize this error. In multivariate calculus, this means finding the values of $a_0$ and $a_1$ such that $\frac{\partial s}{\partial a_0} =0$ and $\frac{\partial s}{\partial a_1} =0$.

Differentiating $s(a_0,a_1)$ gives:

$$\frac{\partial s}{\partial a_0} = -2\sum_{i=1}^n( y_i - (a_0 +a_1x_i))$$

and,

$$\frac{\partial s}{\partial a_1} = -2\sum_{i=1}^n x_i( y_i - (a_0 +a_1x_i))$$

Setting $\frac{\partial s}{\partial a_0}=0$ and $\frac{\partial s}{\partial a_1}=0$ leads to:

$$\sum_{i=1}^n( y_i - (a_0 +a_1x_i))=0$$

and,

$$\sum_{i=1}^n x_i( y_i - (a_0 +a_1x_i))=0$$

By separating the terms, you can rewrite these equations as follows:

$$(\sum_{i=1}^n x_i^2)a_1 + (\sum_{i=1}^n x_i)a_0 = \sum_{i=1}^n x_iy_i$$

and,

$$(\sum_{i=1}^n x_i)a_1 + (\sum_{i=1}^n 1)a_0 = \sum_{i=1}^n y_i$$

The values of $a_0$ and $a_1$ that minimize the error therefore satisfy a linear system, which you can rewrite as the following matrix equation:

$$\begin{pmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i\\ \sum_{i=1}^n x_i & \sum_{i=1}^n 1 \\\end{pmatrix} \begin{pmatrix}a_1 \\ a_0 \\\end{pmatrix} = \begin{pmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i \\\end{pmatrix}$$
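This matrix equation can be verified numerically. A small sketch, assuming NumPy and arbitrary sample data, builds the system in the same $(a_1, a_0)$ ordering as above and cross-checks the solution against NumPy's own least-squares fit:

```python
import numpy as np

# Hypothetical noisy data, roughly y = 0.1 + 2x
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 8.2, 9.8])

# M @ (a1, a0) = rhs, matching the matrix equation above
M = np.array([[(x**2).sum(), x.sum()],
              [x.sum(),      len(x)]])
rhs = np.array([(x * y).sum(), y.sum()])

a1, a0 = np.linalg.solve(M, rhs)

# np.polyfit with degree 1 returns (slope, intercept)
a1_ref, a0_ref = np.polyfit(x, y, 1)
assert np.allclose([a1, a0], [a1_ref, a0_ref])
```

Both routes solve the same normal equations, so they agree up to floating-point precision.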

Let's denote the $2\times2$ matrix by $M$. You can check that $M$ is invertible, i.e. $\det(M)\neq0$, as long as the $x_i$ are not all equal:

$$\det(M) = (\sum_{i=1}^n x_i^2)(\sum_{i=1}^n 1) - (\sum_{i=1}^n x_i)(\sum_{i=1}^n x_i) = n(\sum_{i=1}^n x_i^2) - (\sum_{i=1}^n x_i)^2$$

thus,

$$\begin{pmatrix}a_1 \\ a_0 \\\end{pmatrix} = \begin{pmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i\\ \sum_{i=1}^n x_i & \sum_{i=1}^n 1 \\\end{pmatrix}^{-1}\begin{pmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i \\\end{pmatrix}$$

and,

$$M^{-1}=\begin{pmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i\\ \sum_{i=1}^n x_i & \sum_{i=1}^n 1 \\\end{pmatrix}^{-1}=\frac{1}{\det(M)}\,\mathrm{adj}(M)$$

where,

$$adj(M)=\begin{pmatrix} \sum_{i=1}^n 1 & -\sum_{i=1}^n x_i\\ -\sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \\\end{pmatrix}$$

refer to: https://en.wikipedia.org/wiki/Adjugate_matrix
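The $2\times2$ adjugate formula is easy to sanity-check numerically. A short sketch, assuming NumPy and arbitrary sample $x$-values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0])  # arbitrary sample x-values
n = len(x)

M = np.array([[(x**2).sum(), x.sum()],
              [x.sum(),      n]])

# det(M) and adj(M) for a 2x2 matrix [[a, b], [c, d]]:
# det = a*d - b*c, adj = [[d, -b], [-c, a]]
det = M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]
adj = np.array([[ M[1, 1], -M[0, 1]],
                [-M[1, 0],  M[0, 0]]])

# inv(M) = adj(M) / det(M)
assert np.allclose(adj / det, np.linalg.inv(M))
```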

Finally, your values of $a_0$ and $a_1$ are:

$$a_0 = \frac{\sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i - \sum_{i=1}^n x_i y_i \sum_{i=1}^n x_i}{n \sum_{i=1}^n x_i^2 - (\sum_{i=1}^n x_i)^2}$$

and,

$$a_1 = \frac{n \sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n \sum_{i=1}^n x_i^2 - (\sum_{i=1}^n x_i)^2}$$
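As a final sanity check, the closed-form expressions can be evaluated directly from the sums. A small sketch with NumPy and made-up data, compared against NumPy's built-in fit:

```python
import numpy as np

# Hypothetical data, roughly y = 1 + x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.1])
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x**2).sum(), (x * y).sum()
det = n * Sxx - Sx**2  # nonzero unless all x_i are equal

# Closed-form least squares solution
a0 = (Sxx * Sy - Sxy * Sx) / det
a1 = (n * Sxy - Sx * Sy) / det

# Agrees with NumPy's built-in degree-1 fit
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(a0, intercept) and np.isclose(a1, slope)
```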

I hope this helps.

Cheers!