polynomial least squares derivation: normal equations


Suppose we have the problem $$ \min_{q \in \mathcal{P}_n} \|f - q\|_{L^2(w)}^2 $$ where $\mathcal{P}_n$ is the space of polynomials of degree at most $n$, $w$ is some weight function (a measure with continuous density), and $\|g\|_{L^2(w)}^2 = \int_a^b (g(x))^2 w(dx)$. I want to clarify the connection between this case and its finite-dimensional analog $$ \min_{\beta \in \mathbb{R}^{n}} \|y - X\beta\|_2^2. $$ In finite dimensions, we set the gradient equal to zero and recover the normal equations $$ 2X^T X \beta - 2X^T y = 0. $$ How can I recover the normal equations in the infinite-dimensional case above? I'm at \begin{align} \min_{q \in \mathcal{P}_n} \|f - q\|_{L^2(w)}^2 &= \min_{q \in \mathcal{P}_n} \int_a^b [f(x)^2 - 2q(x)f(x) + q(x)^2] w(dx) \\ &= \min_{q \in \mathcal{P}_n} \int_a^b [- 2q(x)f(x) + q(x)^2] w(dx) \end{align} but I don't know how to differentiate with respect to $q(x)$ (and am not sure that this is even the right step).


The normal equations actually hold in any Hilbert space as long as the subspace you are projecting onto is closed — in particular, finite-dimensional — and here $V=\mathcal P_n$ has $\dim V = n+1$.

To see this, observe by the Projection Theorem that the optimal vector $q=\sum_j a_j p_j$, where the $p_j$ form a basis of $V$, satisfies \begin{align*} \left\langle x -\sum_j a_j p_j \,\Big|\, p_i \right\rangle=0 \quad \text{for all } i. \end{align*} Use these conditions to derive expressions for the $a_j$ and you end up with an infinite-dimensional generalization of the normal equations (i.e., your result will reduce to the matrix form if your Hilbert space itself is finite-dimensional).
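Spelling out those orthogonality conditions in the basis gives a linear system in the coefficients — the Gram-matrix form of the normal equations:

\begin{align*}
\sum_{j=0}^{n} a_j \,\langle p_j \mid p_i \rangle = \langle x \mid p_i \rangle, \qquad i = 0, \dots, n,
\end{align*}

i.e. $Ga = b$ with $G_{ij} = \langle p_j \mid p_i \rangle$ and $b_i = \langle x \mid p_i \rangle$, which is the direct analogue of $X^T X \beta = X^T y$.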

Luenberger's classic *Optimization by Vector Space Methods* has a beautiful exposition of this topic in Chapter 3; see also the connections to projection operators and pseudoinverses.

Edit: in your case \begin{align*} \langle x \mid y\rangle = \int xy \, d\mu \end{align*} where $d\mu = w(dx)$.
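To make this concrete, here is a small numerical sketch (my own example, not part of the original answer): take $w(dx)=dx$ on $[0,1]$, $f(x)=\sin x$, and the monomial basis $p_j(x)=x^j$, then solve the Gram system $Ga = b$ with all inner products approximated by trapezoid-rule quadrature.

```python
import numpy as np

# Quadrature grid on [a, b] = [0, 1] with weight density w(x) = 1.
x = np.linspace(0.0, 1.0, 100_001)
w = np.ones_like(x)
f = np.sin(x)

def inner(g, h):
    """Weighted L^2 inner product <g | h> = int g h w dx (trapezoid rule)."""
    y = g * h * w
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

n = 2
P = [x**j for j in range(n + 1)]                          # basis p_j(x) = x^j
G = np.array([[inner(pi, pj) for pj in P] for pi in P])   # Gram matrix G_ij
b = np.array([inner(f, pi) for pi in P])                  # moments <f | p_i>

a = np.linalg.solve(G, b)                                 # normal equations G a = b
q = sum(aj * pj for aj, pj in zip(a, P))                  # best quadratic in L^2(w)

# Projection Theorem check: the residual f - q is orthogonal to every basis
# function, i.e. <f - q | p_i> = 0 up to rounding and quadrature error.
residuals = [inner(f - q, pi) for pi in P]
```

With $w \equiv 1$ and monomials, $G_{ij} = \int_0^1 x^{i+j}\,dx = 1/(i+j+1)$ is the Hilbert matrix, which is why orthogonal polynomial bases are preferred in practice for larger $n$.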

Edit 2: Your attempt to differentiate with respect to $q$ is essentially the calculus of variations; you could probably get somewhere with Euler–Lagrange as well if you still want to try that route. This illustrates that there are often (at least) two ways to solve an (easy) optimization problem: the direct route via differential techniques (here, Euler–Lagrange) or the geometric route via orthogonality (here, the Projection Theorem).
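For the direct route, a sketch of how the differentiation goes: writing $q = \sum_j a_j p_j$ turns the variational problem into ordinary calculus in the coefficients $a_i$, and differentiating under the integral gives

\begin{align*}
\frac{\partial}{\partial a_i} \int_a^b \Big(f(x) - \sum_j a_j p_j(x)\Big)^2 w(dx)
= -2 \int_a^b \Big(f(x) - \sum_j a_j p_j(x)\Big) p_i(x) \, w(dx) = 0,
\end{align*}

which is exactly the orthogonality condition $\langle f - q \mid p_i \rangle = 0$ — so both routes land on the same normal equations.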