The minimum value of $E(\mathbf{w})$ ("Pattern Recognition and Machine Learning" by Christopher M. Bishop)


I am reading "Pattern Recognition and Machine Learning" by Christopher M. Bishop.

$y(x,\mathbf{w}):=w_0+w_1x+w_2x^2+\dots+w_Mx^M$, where the polynomial coefficients $w_0,\dots,w_M$ are collectively denoted by the vector $\mathbf{w}.$

$E(\mathbf{w}):=\frac{1}{2}\sum_{n=1}^N(y(x_n,\mathbf{w})-t_n)^2,$ where $x_1,\dots,x_N$ and $t_1,\dots,t_N$ are real numbers.

The author wrote how to find the value of $\mathbf{w}$ for which $E(\mathbf{w})$ is as small as possible:

We can solve the curve fitting problem by choosing the value of $\mathbf{w}$ for which $E(\mathbf{w})$ is as small as possible. Because the error function is a quadratic function of the coefficients $\mathbf{w}$, its derivatives with respect to the coefficients will be linear in the elements of $\mathbf{w}$, and so the minimization of the error function has a unique solution, denoted by $\mathbf{w}^{*}$, which can be found in closed form. The resulting polynomial is given by the function $y(x, \mathbf{w}^{*}).$
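For concreteness, the closed-form solution the author alludes to comes from setting the gradient of $E$ to zero, which yields the normal equations $V^TV\mathbf{w}^{*}=V^T\mathbf{t}$ for the design matrix $V$ with entries $V_{nj}=x_n^j$. Here is a minimal NumPy sketch (the data and variable names are my own illustration, not from the book):

```python
import numpy as np

# Illustrative data (not from the book): N = 10 points, degree M = 3.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=10))
t = np.sin(np.pi * x) + 0.1 * rng.normal(size=10)
M = 3

# Design matrix V with entries V[n, j] = x_n ** j.
V = np.vander(x, M + 1, increasing=True)

def E(w):
    """Sum-of-squares error E(w) = (1/2) * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * np.sum((V @ w - t) ** 2)

# Closed-form minimizer: solve the normal equations V^T V w* = V^T t.
w_star = np.linalg.solve(V.T @ V, V.T @ t)

# The gradient vanishes at w*, and random perturbations only increase E.
assert np.allclose(V.T @ (V @ w_star - t), 0.0)
assert all(E(w_star) <= E(w_star + d) for d in rng.normal(size=(100, M + 1)))
```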

I think the following argument, by itself, is not rigorous:

Because $\frac{\partial E}{\partial w_i}(\mathbf{w}^{*})=0$ for any $i\in\{0,1,\dots,M\}$, $E(\mathbf{w}^{*})$ is the minimum value of $E.$

So, I tried to prove $E(\mathbf{w}^{*})$ is really the minimum value of $E.$

$E$ is a quadratic function of the coefficients $\mathbf{w}.$
So, we can write $E(\mathbf{w})=(\mathbf{w},H\mathbf{w})+(\mathbf{a},\mathbf{w})+c$, where $H$ is a symmetric matrix and $\mathbf{a}\in\mathbb{R}^{M+1}$ and $c\in\mathbb{R}$.
$$E(\mathbf{w}-\mathbf{p})=(\mathbf{w}-\mathbf{p},H(\mathbf{w}-\mathbf{p}))+(\mathbf{a},\mathbf{w}-\mathbf{p})+c\\=(\mathbf{w},H\mathbf{w})-(\mathbf{w},H\mathbf{p})-(\mathbf{p},H\mathbf{w})+(\mathbf{p},H\mathbf{p})+(\mathbf{a},\mathbf{w})-(\mathbf{a},\mathbf{p})+c\\=(\mathbf{w},H\mathbf{w})-2(H\mathbf{p},\mathbf{w})+(\mathbf{a},\mathbf{w})+(\mathbf{p},H\mathbf{p})-(\mathbf{a},\mathbf{p})+c.$$
So, if $2H\mathbf{p}=\mathbf{a}$, then $E(\mathbf{w}-\mathbf{p})-((\mathbf{p},H\mathbf{p})-(\mathbf{a},\mathbf{p})+c)=(\mathbf{w},H\mathbf{w})$ is a quadratic form.
$H=\frac{1}{2}\begin{pmatrix} x_1^0 & \cdots & x_1^M \\ \vdots & \ddots & \vdots \\ x_N^0 & \cdots & x_N^M \end{pmatrix}^T \begin{pmatrix} x_1^0 & \cdots & x_1^M \\ \vdots & \ddots & \vdots \\ x_N^0 & \cdots & x_N^M \end{pmatrix}.$
We assume that $N\geq M+1$ and that $x_i\neq x_j$ for all $i,j\in\{1,\dots,N\}$ with $i\neq j.$
Then $H$ is non-singular: the matrix $\begin{pmatrix} x_1^0 & \cdots & x_1^M \\ \vdots & \ddots & \vdots \\ x_N^0 & \cdots & x_N^M \end{pmatrix}$ has full column rank $M+1$, because any $M+1$ of its rows form an invertible Vandermonde matrix.
So, $2H\mathbf{p}=\mathbf{a}$ has a solution.
Let $2H\mathbf{p}_0=\mathbf{a}$.
Then, $E'(\mathbf{w}):=E(\mathbf{w}-\mathbf{p}_0)-((\mathbf{p}_0,H\mathbf{p}_0)-(\mathbf{a},\mathbf{p}_0)+c)=(\mathbf{w},H\mathbf{w})$ is a quadratic form.
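To double-check the algebra, matching $E(\mathbf{w})=\frac{1}{2}\|V\mathbf{w}-\mathbf{t}\|^2$ against $(\mathbf{w},H\mathbf{w})+(\mathbf{a},\mathbf{w})+c$ gives $H=\frac{1}{2}V^TV$, $\mathbf{a}=-V^T\mathbf{t}$, and $c=\frac{1}{2}\|\mathbf{t}\|^2$. A quick numerical sketch of the decomposition and of the completing-the-square step (the instance below is arbitrary, just for verification):

```python
import numpy as np

# Small arbitrary instance (values are illustrative, not from the book).
rng = np.random.default_rng(1)
N, M = 8, 3
x = rng.uniform(-1.0, 1.0, size=N)
t = rng.normal(size=N)
V = np.vander(x, M + 1, increasing=True)  # V[n, j] = x_n ** j

# Coefficients of the quadratic form E(w) = (w, Hw) + (a, w) + c.
H = 0.5 * V.T @ V
a = -V.T @ t
c = 0.5 * t @ t

def E(w):
    """E(w) = (1/2) |Vw - t|^2."""
    return 0.5 * np.sum((V @ w - t) ** 2)

w = rng.normal(size=M + 1)
assert np.isclose(E(w), w @ H @ w + a @ w + c)

# Completing the square: pick p0 with 2 H p0 = a, then
# E(w - p0) - ((p0, H p0) - (a, p0) + c) = (w, H w).
p0 = np.linalg.solve(2 * H, a)
const = p0 @ H @ p0 - a @ p0 + c
assert np.isclose(E(w - p0) - const, w @ H @ w)
```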
The minimum value of $E$ exists if and only if the minimum value of $E'$ exists, since $E'$ differs from $E$ only by a translation of the argument and an additive constant.
So, we prove that the minimum value of $E'$ exists.
Since $H=\frac{1}{2}V^T V$, where $V=\begin{pmatrix} x_1^0 & \cdots & x_1^M \\ \vdots & \ddots & \vdots \\ x_N^0 & \cdots & x_N^M \end{pmatrix}$, we have $(\mathbf{w},H\mathbf{w})=\frac{1}{2}\|V\mathbf{w}\|^2\geq 0$, with equality only if $V\mathbf{w}=0$; since $V$ has full column rank, this forces $\mathbf{w}=0$, so $H$ is positive-definite.
Hence $E'$ attains its minimum value $0$ at the unique point $\mathbf{w}=0$, and therefore $E$ attains its minimum at a unique point as well.
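As a sanity check on positive-definiteness (note that it requires $N\geq M+1$ in addition to the $x_i$ being distinct; a rough numerical illustration with values of my own choosing):

```python
import numpy as np

# H = (1/2) V^T V is positive-definite exactly when V has full column
# rank M + 1, which needs distinct x_i AND N >= M + 1.
def H_of(x, M):
    V = np.vander(np.asarray(x, dtype=float), M + 1, increasing=True)
    return 0.5 * V.T @ V

# N = 5 distinct points, M = 3: every eigenvalue is strictly positive.
assert np.linalg.eigvalsh(H_of([0.0, 0.5, 1.0, 1.5, 2.0], 3)).min() > 0

# N = 3 < M + 1 = 4: H is singular (smallest eigenvalue is 0), so the
# quadratic form is only positive semi-definite and the minimizer of E
# is not unique.
assert abs(np.linalg.eigvalsh(H_of([0.0, 1.0, 2.0], 3)).min()) < 1e-8
```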

Is my proof ok?