General formulation of machine learning optimisation problem


I'm having some trouble understanding the first equation from this paper: https://arxiv.org/abs/1805.09545.

The equation in question (the paper's first display) is, as I read it,
$$\min_{\mu}\; J(\mu), \qquad J(\mu) = R\!\left( \int_\Theta \phi(\theta)\, d\mu(\theta) \right) + G(\mu),$$
where $\mu$ ranges over signed measures on $\Theta$.

I interpret this as searching for a minimiser $\phi^*$ in a Hilbert space, which must be a linear combination of "a few" elements from a parameterised set $\{\phi(\theta):\theta\in \Theta\}$. "Linear combination" is then replaced by "integral over a signed measure". Ignoring the regularisation term $G$, we have $$J(\mu) = R\left( \int \phi(\theta)\,d\mu(\theta) \right),$$ where the integral "mixes" all the $\phi(\theta)$ in the parameterised set, giving a candidate minimiser $\phi':=\int \phi\, d\mu$, and we may write $R(\phi')$, or just $R(\mu)$, instead of the above display.
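To make this interpretation concrete, here is a small numerical sketch (my own toy construction, not from the paper): for a discrete signed measure $\mu = \sum_i w_i \delta_{\theta_i}$, the integral collapses to a weighted sum, so $J(\mu) = R\big(\sum_i w_i \phi(\theta_i)\big)$. The particular choices of $\phi$ (a ReLU feature), $R$ (squared loss against targets), and the toy data are all my own assumptions.

```python
import numpy as np

def phi(theta, x):
    """One parameterised feature: a ReLU unit with theta = (a, b). (My choice.)"""
    a, b = theta
    return np.maximum(a * x + b, 0.0)

def J(weights, thetas, x, y):
    """J(mu) = R(int phi dmu) for a discrete signed measure mu = sum_i w_i delta_{theta_i},
    with R taken to be the mean squared error against targets y."""
    phi_mix = sum(w * phi(th, x) for w, th in zip(weights, thetas))
    return np.mean((phi_mix - y) ** 2)

# Toy data: fit y = |x| on a grid.
x = np.linspace(-1.0, 1.0, 50)
y = np.abs(x)

# A two-atom signed measure (note the negative weight is allowed).
weights = [1.0, -0.5]
thetas = [(1.0, 0.0), (-1.0, 0.0)]
print(J(weights, thetas, x, y))

# The measure with weights (1, 1) mixes max(x,0) + max(-x,0) = |x|,
# so it attains J = 0 on this toy problem.
print(J([1.0, 1.0], thetas, x, y))  # prints 0.0
```

The point of the sketch is only that $\mu$'s weights (which may be negative) decide how the elements of $\{\phi(\theta)\}$ are combined, and $R$ then scores the resulting mixture.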

  1. Is this interpretation correct?
  2. Why do we pass to signed measures? Does it give the optimisation problem theoretical guarantees it wouldn't have if we restricted to linear combinations?
  3. I don't see how training data is incorporated here - would it be included in the loss function $R$? E.g. say we are doing OLS ($Y=X\beta$), and write $\beta$ instead of $\int \phi d\mu$. Would we then have something like $$R(\beta) = R_{X,Y}(\beta) = \|Y-X\beta\|^2?$$
  4. Why is the minimiser $\phi^*$ a linear combination of elements in $\{\phi(\theta):\theta\in \Theta\}$, rather than the parameterised set itself including all such linear combinations? E.g. in the OLS case, is it more correct to let $\{\phi(\theta):\theta\in \Theta\}$ be the standard basis vectors of $\mathbb{R}^p$ (the space containing $\beta$), with $\mu$ deciding the appropriate linear combination of these basis vectors? And is this essentially why the regularisation is applied to $\mu$, not to $\int \phi\, d\mu$?
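To spell out the OLS reading of questions 3 and 4, here is a sketch of my own construction (not from the paper): take $\Theta = \{0,\dots,p-1\}$, let $\phi(j) = e_j$ be the $j$-th standard basis vector, and let $\mu = \sum_j \beta_j \delta_j$, so that $\int \phi\, d\mu = \beta$ and the data enter only through $R_{X,Y}(\beta) = \|Y - X\beta\|^2$. The dimensions and the noiseless data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true  # noiseless toy data, so the true beta attains zero loss

def phi(j, p=p):
    """phi(theta) for theta = j: the j-th standard basis vector of R^p."""
    e = np.zeros(p)
    e[j] = 1.0
    return e

def R(beta, X=X, Y=Y):
    """The data-dependent loss R_{X,Y}(beta) = ||Y - X beta||^2."""
    return np.linalg.norm(Y - X @ beta) ** 2

# The measure's weights ARE the regression coefficients:
# beta = int phi dmu = sum_j beta_j * phi(j).
beta = sum(b * phi(j) for j, b in enumerate(beta_true))
print(R(beta))  # prints 0.0: the mixture recovers beta_true exactly
```

On this reading, regularising $\mu$ rather than $\int \phi\, d\mu$ amounts to penalising the coefficients $\beta_j$ directly (e.g. the total-variation norm of $\mu$ here is $\|\beta\|_1$), which seems consistent with how $G$ is used in the paper.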