Linear model of the parameters implies linear combination of the parameters?


In usual motivations of the ordinary least squares estimator (e.g. https://en.wikipedia.org/wiki/Least_squares), we aim to find a function $f_{\theta}(x)$, where $f_{\theta}$ is a function parametrized by $\theta \in \Theta$, mapping from an abstract space $\mathcal{X}$ to $\mathbb{R}$. We assume that $\Theta$ is a vector space; for simplicity, take it to be $\mathbb{R}^d$.

There's usually a claim that any function $f_{\theta}$ that is linear in the parameters $\theta$ implies that $f_{\theta}(x)$ can be written as a linear combination: $$ f_{\theta}(x) = \sum_{i = 1}^d \theta_i \phi_i(x) = \theta^{\top} \phi(x), $$ where $\phi(x)$ is a function $\phi: \mathcal{X} \rightarrow \mathbb{R}^d$ with components $\phi_i(x) \in \mathbb{R}$. I'm not sure how you prove this claim just from the linearity of $f_{\theta}$ in the parameters.
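For concreteness, a standard instance of such a model (my own illustrative choice, not from any particular reference) is polynomial regression, where $\phi(x) = (1, x, x^2)$: the model $f_{\theta}(x) = \theta^{\top}\phi(x)$ is nonlinear in $x$ but linear in $\theta$, which a quick numerical check confirms:

```python
import numpy as np

# Hypothetical feature map phi: R -> R^3 for polynomial regression.
def phi(x):
    return np.array([1.0, x, x**2])

# f_theta(x) = theta^T phi(x): nonlinear in x, linear in theta.
def f(theta, x):
    return theta @ phi(x)

theta1 = np.array([1.0, 2.0, 3.0])
theta2 = np.array([0.5, -1.0, 4.0])
x = 2.0

# Linearity in the parameters: additivity and homogeneity.
assert np.isclose(f(theta1 + theta2, x), f(theta1, x) + f(theta2, x))
assert np.isclose(f(3.0 * theta1, x), 3.0 * f(theta1, x))
```

The question below asks for the converse: starting only from these two linearity properties, why must $f_{\theta}$ have this inner-product form?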

How do just the properties of linearity, $f_{\theta_1 + \theta_2}(x) = f_{\theta_1}(x) + f_{\theta_2}(x)$ and $f_{c\theta}(x) = cf_{\theta}(x)$, imply that we can write $f_{\theta}$ as a linear combination of functions of $x$ as above?

Building on @Andrew's suggestion in the comments (thanks!), here is a full proof:

Let $\theta \in \mathbb{R}^d.$ We can write $\theta = \theta_1 \mathbf{e}_1 + \dots + \theta_d \mathbf{e}_d$, where $\mathbf{e}_1, \dots, \mathbf{e}_d$ is the standard basis of $\mathbb{R}^d$. Since each $\theta_i \mathbf{e}_i$ lies in $\Theta = \mathbb{R}^d$, each $f_{\theta_i \mathbf{e}_i}$ is a well-defined member of the parametrized family for $i = 1, \dots, d$.

Consider any $x \in \mathcal{X}$. By additivity and then homogeneity in $\theta$, we can write: $$ f_{\theta}(x) = f_{\theta_1 \mathbf{e}_1 + \dots + \theta_d \mathbf{e}_d}(x) = \sum_{i = 1}^d f_{\theta_i \mathbf{e}_i}(x) = \sum_{i = 1}^d \theta_i f_{\mathbf{e}_i}(x). $$ Then, choosing $\phi_i(x) := f_{\mathbf{e}_i}(x)$ for $i = 1, \dots, d$ completes the proof.
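The proof is constructive, so it can be checked numerically: given only black-box access to an $f$ that is linear in $\theta$, recover $\phi_i(x) = f_{\mathbf{e}_i}(x)$ by evaluating at the standard basis vectors, and verify the decomposition. (The particular black-box below, with hidden sinusoidal features, is just an assumed test case of my own.)

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# A black-box f(theta, x), linear in theta; its internal features are
# hypothetical and hidden from the "user" of the decomposition.
_hidden = rng.standard_normal(d)
def f(theta, x):
    feats = np.sin(np.arange(1, d + 1) * x) * _hidden
    return theta @ feats

x = 0.7
# Recover phi_i(x) := f(e_i, x) by evaluating at each standard basis vector.
phi_x = np.array([f(e, x) for e in np.eye(d)])

# The proof's conclusion: f_theta(x) = sum_i theta_i * phi_i(x) for any theta.
theta = rng.standard_normal(d)
assert np.isclose(f(theta, x), theta @ phi_x)
```

Note that the recovery uses exactly $d$ evaluations of $f$, one per basis vector, mirroring the proof.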

More generally, any basis of $\mathbb{R}^d$ could have been used to construct the $\phi_i(x)$, with $\theta$ expressed in the coordinates of that basis, and I think that's what leads to different "feature representations" in applications of OLS in machine learning, say.