Generating a synthetic dataset

I want to generate a synthetic dataset for a simple method that estimates the uncertainty in regression using a neural network. The dataset is generated by sampling the following function within the range $x \in [-\pi, \pi]$, with $100$ data points:

\begin{equation} y = f(x) = \sin(0.5x) + \epsilon \end{equation}

where $\epsilon \sim \mathcal{N}(0, \eta(x))$ and $\eta(x) = \begin{cases} 0.2 & x < 0 \\ 0.5 & x \geq 0 \end{cases}$
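
For concreteness, here is a minimal sketch of one possible reading of this specification, in which $\epsilon$ is a random number drawn afresh for each data point. Several things here are assumptions rather than part of the original description: that $\eta(x)$ is the variance (so the standard deviation is $\sqrt{\eta(x)}$), that the $100$ values of $x$ are drawn uniformly from $[-\pi, \pi]$ rather than placed on a grid, and the helper names `eta` and `make_dataset`, which are purely illustrative.

```python
import numpy as np


def eta(x):
    """Noise level from the description: 0.2 for x < 0, 0.5 for x >= 0."""
    return np.where(x < 0, 0.2, 0.5)


def make_dataset(n=100, seed=0):
    """Sample n points of y = sin(0.5 x) + eps with input-dependent noise."""
    rng = np.random.default_rng(seed)
    # Assumption: x is sampled uniformly; an evenly spaced grid would also fit the text.
    x = rng.uniform(-np.pi, np.pi, size=n)
    # Assumption: eta(x) is a variance, so the standard deviation is its square root.
    sigma = np.sqrt(eta(x))
    # One independent zero-mean normal draw per data point.
    eps = rng.normal(loc=0.0, scale=sigma)
    y = np.sin(0.5 * x) + eps
    return x, y


x, y = make_dataset()
```

Under this reading, what gets added to $\sin(0.5x)$ is a single sampled value per point, not a density function. If $\eta(x)$ were meant as the standard deviation instead, the `np.sqrt` call would be dropped.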

What exactly does $\epsilon$ represent? It certainly follows a $\textbf{normal distribution}$, with $\mu = 0$ and $\sigma^2 = 0.2$ for $x<0$, and $\mu = 0$ and $\sigma^2 = 0.5$ for $x \geq 0$. But how can $\epsilon$ be a direct summand to $\sin(0.5x)$? The $\textit{probability density function}$ of a normal distribution is defined as follows:

\begin{equation} \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} \end{equation}

So does that mean that the addition is performed by adding the PDF to the first summand, with the values of $\mu$ and $\sigma$ determined by the value of $x$?