Suppose I have real data (e.g. from an experiment) $y_{exp}$ and a model for that data $y_{mod}(x)$, where $x$ is a vector of independent variables. In Bayesian (and frequentist) analyses, one often makes the assumption that
$y_{exp} - y_{mod}(x) \approx \epsilon(x)$, (1)
where $\epsilon(x)$ is a stochastic error term distributed in some way, e.g. $\epsilon(x) \sim N(0, \sigma^2(x))$.
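As a concrete instance of this setup, here is a minimal simulation sketch. The model $y_{mod}(x) = 2x + 1$ and the noise level $\sigma = 0.1$ are toy choices for illustration, not anything from the question; the point is just that under the additive assumption the residual $y_{exp} - y_{mod}(x)$ recovers $\epsilon(x)$ exactly.

```python
import random

# Hypothetical model y_mod(x) = 2x + 1 (an illustrative choice, not from the question).
def y_mod(x):
    return 2.0 * x + 1.0

def simulate_observation(x, rng, sigma=0.1):
    # y_exp = y_mod(x) + epsilon, with epsilon ~ N(0, sigma^2)
    return y_mod(x) + rng.gauss(0.0, sigma)

rng = random.Random(0)
xs = [i / 10 for i in range(100)]
# Under the additive-noise assumption, each residual is one draw of epsilon.
residuals = [simulate_observation(x, rng) - y_mod(x) for x in xs]
mean_res = sum(residuals) / len(residuals)  # should be near 0 for unbiased noise
```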
My question is whether there are any constraints from probability theory on the form that equation (1) above can take. I'd like to be able to say something like the error term must in general satisfy
$d(y_{exp},y_{mod}(x)) \approx \epsilon(x)$,
where $d(\cdot,\cdot)$ is a metric in the sense of metric-space theory. We can add some assumptions:
- Assume the model is the correct model and that there's no bias between the model and the data (a stretch in the real world).
- Assume both the model and the data are real valued.
- Assume the model is continuous (may not be important).
There's a field called functional data analysis and perhaps I should look there. Any guidance is appreciated.
I would say that there are unlikely to be any hard restrictions on the form of the expression of the error term. We can essentially formulate an error term any way we like, where the only real restrictions are that it is mathematically coherent and workable, and that it is physically realistic (which isn't such a concern for mathematicians).
The theory of metrics gives us a good language for generalizing the notion of distance, but that does not mean that all errors we can define must be metrics. You have referred to additive noise/error, $\varepsilon=y_{\mathrm{exp}}-y_{\mathrm{mod}}$, but notice that this is signed. So it cannot be a metric, since metrics are supposed to be nonnegative. Another well-studied and still physically realistic example of an error term is multiplicative noise, $\varepsilon=y_{\mathrm{exp}}/y_{\mathrm{mod}}$, which is certainly not a metric since the absence of error (identical model and observation) corresponds to $\varepsilon=1$, not $0$. (You could, however, view each of these as metrics by discarding the sign of additive noise, and by viewing multiplicative noise as additive noise of the log-transformed data.)

Moreover, we could compute the error after transforming the variables. We might choose to do so on measurements of angles, if we limited them to $[0,2\pi)$, by defining the error as $\varepsilon=\operatorname{mod}\left(y_{\mathrm{exp}},2\pi\right)-\operatorname{mod}\left(y_{\mathrm{mod}},2\pi\right)$. This too is not a metric since it has infinitely many distinct zeros.
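The additive/multiplicative distinction above can be sketched numerically. This is a toy example with made-up values (a constant model $y_{\mathrm{mod}}=5$ and log-normal noise with $\sigma=0.2$): the multiplicative error ratio sits around $1$ when there is no error, and taking logs turns it into an additive error sitting around $0$.

```python
import math
import random

rng = random.Random(1)

# Toy multiplicative-noise model (illustrative values, not from the post):
# y_exp = y_mod * eps, with log(eps) ~ N(0, sigma^2), so eps > 0.
y_model = 5.0
sigma = 0.2
eps = [math.exp(rng.gauss(0.0, sigma)) for _ in range(10_000)]
y_obs = [y_model * e for e in eps]

# Multiplicative error: the ratio equals eps and is centered at 1, not 0.
ratios = [yo / y_model for yo in y_obs]

# Log-transforming converts it to additive error centered at 0,
# which is the sense in which multiplicative noise "becomes" additive.
log_errors = [math.log(yo) - math.log(y_model) for yo in y_obs]
mean_log_err = sum(log_errors) / len(log_errors)
```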
So, we can really choose whatever formulation we want for the error term, provided it is reasonable for the situation.
As for the form of the distribution of $\varepsilon$, you might find Chebyshev's inequality useful. This places a probabilistic bound on the size of random variables, even in the absence of the assumption that they are normally distributed, but with the assumption that the random variable $\varepsilon$ has finite mean $\bar\varepsilon$ and finite nonzero variance $\sigma^2$. It states that the probability that $|\varepsilon-\bar\varepsilon|\ge k\sigma$ holds is less than or equal to $\frac1{k^2}$ for all positive real $k$. For instance, if we can assume that $\bar\varepsilon=0$, then $\mathrm{P}(|\varepsilon|\ge k\sigma)=\mathrm{P}\left(|y_{\text{exp}}-y_{\text{mod}}(x)|\ge k\sigma(x)\,\right)\le\frac1{k^2}$. This is very general but may not be a very tight bound compared to what we could find with stronger assumptions (e.g. that $\varepsilon$ is normal).
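A quick empirical sanity check of the Chebyshev bound, using a deliberately non-normal error distribution (uniform on $[-1,1]$, a toy choice) to illustrate its distribution-free character:

```python
import random
import statistics

rng = random.Random(42)

# Draw non-normal errors and verify P(|eps - mean| >= k*sigma) <= 1/k^2 empirically.
n = 100_000
eps = [rng.uniform(-1.0, 1.0) for _ in range(n)]
mu = statistics.fmean(eps)
sigma = statistics.pstdev(eps)  # about 1/sqrt(3) for uniform on [-1, 1]

for k in (1.5, 2.0, 3.0):
    frac = sum(abs(e - mu) >= k * sigma for e in eps) / n
    # The empirical tail fraction never exceeds the Chebyshev bound 1/k^2.
    assert frac <= 1.0 / k**2
```

Note that the bound is only informative for $k>1$; for $k\le 1$ it says nothing beyond $\mathrm{P}\le 1$.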