Regression with noise

Question

Regression with noise

347 Views Asked by Bumbble Comm At 30 Mar 2026 - 3:52

I have data acquired from simulation that is tainted by noise, which I know to a Wiener process (integration of white noise, with mean $\mu = 0$ and variance $\sigma^2$ known). I have good reasons to assume that the curve is a mixture of power laws, i.e., $y(x) \sim a_0 x^{-a}$ for $x< x_0$ and $y(x) \sim b_0 x^{-b}$ for $x>x_0$. The usual trick to this kind of function is to plot $\log(y(x))$ against $\log(x)$ and fit two linear curves.

However, if the noise on $y(x)$ becomes significant ($y(x) \approx \sigma$), the noise on the log-log plot becomes asymmetrically distributed. Currently, I use a Tuley's biweight functions instead of the usual least-squares but I was wondering if there was an established estimator for this kind of problem.

Here's an example of the problem :

Original Q&A

There are 2 best solutions below

Bumbble Comm On 12 Oct 2016 - 3:29

Perhaps it is not the same terminology or knowledge domain that you use, but from my point of view this is a problem of heteroscedasticity, i.e., the noise term does not seem to follow $N(0, \sigma^2)$, but rather $N(0,h(x_{4i})\sigma^2)$ where $h(x)$ is monotone increasing function of $x_4$. Thus, my suggestion would be to use the so-called FGLS. Namely, assume some pretty general structure on $h(x)$ and then estimate it, and using this estimator to compute a WLS estimators for $y=f(x_4)+\epsilon_i$, where $f$ can be some polynomial of $x_4$.

In details: Lets assume that your model is given $y_i = \beta_0 + \beta_1x_{4i} +\epsilon_i$, where $\epsilon_i \sim N(0,h(x_{4i})\sigma^2)$. You can assume that $h(x_4)=\exp\{\delta_0 + \delta_1x_4\}$, and estimate it using $$ \log(\epsilon^2)=\log(\sigma^2)+\delta_0+\delta_1x_4+\eta, $$
where $\log(\sigma^2)+\delta_0$ is the intercept term and noise term is following some $N(0, b^2)$ distribution. Finally, you can take $\hat{h}_i=\exp\{\hat{\delta}_0+\hat{\delta}_1x_4\}$ to be the corresponding weight $w_i$ for the $i$th data point, namely $$ y_i^*=y_i/\sqrt{\hat{h}_i} = \beta_0/\sqrt{\hat{h}_i} + \beta_1x_4/\sqrt{\hat{h}_i} + \epsilon_i/\sqrt{\hat{h}_i}. $$ Surely the desired properties of of the FGLS estimator are mainly hold asymptotically, i.e., $Var(y_i^*|x_4)=\sigma^2h(x_4)Var(1/\hat{h}^{1/2}(x_4))\to\sigma^2\frac{h(x_4)}{h(x_4)}=\sigma^2$. But for large enough $n$ it should suffice as well.

**Bumbble Comm** · Accepted Answer

Your picture demonstrates the bias nicely! And this is to be expected, since the Jensen inequality shows that your mean estimates will be biased if you take logs of additive Gaussian errors.

Since you should not take logs, you face a non-linear regression problem. In your case this should be no problem, since standard numerical software will provide a function to do this (in Matlab this would be "lsqnonlin"). But in addition to the non-linearity your error term does not fulfill the requirements for least squares.

Error term

Let $\epsilon_x$ be the white noise then your errors are $\eta(x_k) = \int_{x_1}^{x_k}{\epsilon_s \,ds}$ which means they are heteroskedastic $\text{Var}[\eta_k]=x_k \sigma^2$ and serially correlated $\text{corr}(\eta_k,\eta_{k-i})=\sqrt{\frac{x_{k-i}}{x_{k}}}$. Both features are clearly visible in the figure you posted.

To get rid of these issues I suggest not to fit to your data $y_k$ directly but to its increments $y_k-y_{k-1}$. This is because (assuming your observations are on a grid, i.e. $x_k=x_{k-1}+d$ with $d=\text{const}.$) then the error term is $\delta_k = \int_{x_{k-1}}^{x_k}{\epsilon_s \,ds}$ with $\text{Var}[\delta_k]=d \sigma^2$ and $\text{corr}(\delta_k,\delta_{k-i})=0$.

Fit with $x_0$ known

Assume now for a moment, that the location of the break $x_0$ would be known. Then your regression function for the increments is $$ f(x) = \begin{cases} a_0 \left( x^{-a} - (x -d)^{-a} \right) & \text{for $x<x_0$} \\ b_0 \left( x^{-b} - (x -d)^{-b} \right) & \text{for $x\geq x_0$}\end{cases}$$ which is a perfectly defined function with parameters, such that the $f(x_k)$ can be fit to the increments $y_k - y_{k-1}$ by least squares.

Finding $x_0$

Standard linear regression/time series analysis knows ways to find changes in trend. But this is always a linear trend. I am not sure something like this would work in the non-linear case. So I suggest a pragmatic solution: Just create a fit for every possible break $x_0$ and then choose the one you like best, for example using a criterion such as residual error.

Regression with noise

There are 2 best solutions below

Error term

Fit with $x_0$ known

Finding $x_0$

Related Questions in STATISTICS

Related Questions in NUMERICAL-METHODS

Related Questions in REGRESSION

Trending Questions

Popular # Hahtags

Popular Questions