KS normality test of Ornstein–Uhlenbeck process


I'm simulating a simple Ornstein-Uhlenbeck process

$dx=-x dt+\sqrt{2}dW$

which is well-known to have a steady state distribution of

$p_s(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$

Here's my matlab code to run the simulation

dt = 0.01; T = 5000;
tgrid = 0:dt:T;
X = zeros(size(tgrid));
for i = 1:numel(tgrid)-1
    % This is the Euler scheme
    X(i+1) = X(i) - X(i)*dt + sqrt(2*dt)*randn;
    % This is the exact formula
    % X(i+1) = X(i)*exp(-dt) + sqrt(1-exp(-2*dt))*randn;
end
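As a cross-check (a sketch in Python/NumPy rather than MATLAB), the exact one-step update can be iterated and the long-run variance compared against the stationary value of 1:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.01, 5000.0
n = int(round(T / dt))

# Exact one-step update for dx = -x dt + sqrt(2) dW:
# x(t+dt) = e^{-dt} x(t) + sqrt(1 - e^{-2 dt}) * N(0, 1)
a = np.exp(-dt)
s = np.sqrt(1.0 - np.exp(-2.0 * dt))

x = np.zeros(n + 1)
for i in range(n):
    x[i + 1] = a * x[i] + s * rng.standard_normal()

print(x.var())  # should settle near the stationary variance 1
```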

Then I do the Kolmogorov–Smirnov test to check the normality of the resulting distribution

[h,p] = kstest(X);

but I always get h = 1, which rejects the null hypothesis that the data come from a standard normal distribution.
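The rejection is reproducible outside MATLAB as well; a minimal sketch using `scipy.stats.kstest` (assuming SciPy is available) shows the same behaviour, because the KS p-value is computed as if the strongly autocorrelated path values were i.i.d. samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
dt, T = 0.01, 5000.0
n = int(round(T / dt))
a, s = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))

# Simulate one OU path with the exact update
x = np.zeros(n + 1)
for i in range(n):
    x[i + 1] = a * x[i] + s * rng.standard_normal()

# One-sample KS test of the whole path against N(0,1); the test's
# null distribution assumes independent samples, which does not hold here.
res = stats.kstest(x, "norm")
print(res.statistic, res.pvalue)
```

The p-value comes out far below 0.05 even though the marginal distribution is essentially standard normal.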

To find out the reason, I generate normally distributed random numbers x = randn(size(X)) and compare their cumulative distribution function with the standard normal one:

pd = makedist('normal',0,1);
[fX,t] = ecdf(X); y = cdf(pd,t);
plot(t,fX-y)
[fx,t] = ecdf(x);
hold on; plot(t,fx-y);

The simulation-generated X and the MATLAB-generated x show very similar shapes.
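The two curves do look alike, but the maximum deviation from the exact normal CDF (which is exactly the KS statistic) is much larger for the correlated path than for i.i.d. draws of the same length. A rough check (Python/NumPy sketch, with a smaller n for speed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100_000
dt = 0.01
a, s = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))

# Correlated OU path vs. i.i.d. standard normal draws
X = np.zeros(n)
for i in range(1, n):
    X[i] = a * X[i - 1] + s * rng.standard_normal()
x_iid = rng.standard_normal(n)

def max_cdf_gap(sample):
    """Max |ECDF - Phi| over the sample points (the KS statistic)."""
    z = np.sort(sample)
    cdf = stats.norm.cdf(z)
    k = np.arange(1, len(z) + 1)
    return np.max(np.maximum(k / len(z) - cdf, cdf - (k - 1) / len(z)))

print(max_cdf_gap(X), max_cdf_gap(x_iid))
```

The correlated path's gap is roughly an order of magnitude larger, which is why kstest rejects it while the plot looks harmless.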
Then I do the two-sample KS test,

[h,p] = kstest2(X,x)

It returns h = 0 (X and x are from the same distribution). So I'm really confused: why can't the simulated X pass the normality test, with a p-value far below the significance level 0.05?

1 Answer

The process $x(t)$ can be written as $$x(t)=x(s)e^{-(t-s)}+\sqrt{2}\int_{s}^{t}{e^{-(t-u)}dW_u},$$ with $s<t$ and $x(0)=0$.

Let $t_0<t_1<\dots<t_n$ be a subdivision of time; then each $x(t_k) \sim \mathcal{N}(0,1-e^{-2t_k})$.

Obviously, one would expect that if $t_n$ is large, then $x(t_n) \sim \mathcal{N}(0,1)$ approximately.

This can be tested by building a sample $U=\{x(t_n,\omega_1),...,x(t_n,\omega_N)\}$, with $x(t_k,\omega_l)$ the value of $x(t_k)$ in the $l^{\text{th}}$ scenario.

We can pick $t_n=5000$ as you did, and a decent number of scenarios, say $N=1000$. The MATLAB function kstest applied to $U$ should then accept normality (h = 0), because all elements of $U$ are drawn from one and the same distribution: that of $x(t_n)$.
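That ensemble construction can be sketched as follows (Python/SciPy rather than MATLAB, and with $t_n=50$ instead of $5000$ to keep it quick, which changes nothing since the transient is long gone by then):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
dt, t_end, N = 0.01, 50.0, 1000   # N independent scenarios
steps = int(round(t_end / dt))
a, s = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))

# One value per scenario, all starting from x(0) = 0, advanced in parallel
X = np.zeros(N)
for _ in range(steps):
    X = a * X + s * rng.standard_normal(N)

# The N values are i.i.d. draws from the law of x(t_end) ~ N(0, 1 - e^{-2 t_end}),
# indistinguishable from N(0,1) here, so the test should not reject.
res = stats.kstest(X, "norm")
print(res.statistic, res.pvalue)
```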

What you tried instead is to test the sample $U'=\{x(t_1,\omega_1),...,x(t_n,\omega_1)\}$, which is one occurrence of each of the $n$ distributions $x(t_k)$: the elements of $U'$ are not drawn from a single distribution but from $n$ different ones. For example, $\mathrm{var}(x(t_1)) \approx 0.0198$ while $\mathrm{var}(x(t_{50})) \approx 0.6321$; the variance only reaches the $0.999$-ish range around $t_{500}$.

That means that roughly $0.1\%$ of your sample (the initial transient, up to $t_{500}=5$ out of $T=5000$) comes from distributions with variance smaller than $0.999$.

Let us assume that this transient part of the sample is negligible. We still have an issue with the covariance.

Indeed, by definition, a sample must come from i.i.d. random variables. In your case, for $s<t$, $$\mathrm{cov}(x(t),x(s))=\mathrm{cov}\left(\sqrt{2}\int_{0}^{t}{e^{-(t-u)}dW_u},\sqrt{2}\int_{0}^{s}{e^{-(s-u)}dW_u}\right)$$ $$=2\int_{0}^{s}{e^{-(t-u)}e^{-(s-u)}du}=e^{-(t-s)}\left(1-e^{-2s}\right)\approx e^{-(t-s)}$$ for large $s$.

By construction of $U'$, there is non-negligible correlation between its elements, even at large time points. Take $x(1000)$ and $x(1001)$: both variances are essentially one, but the correlation is $e^{-1}\approx 0.3679$.
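This correlation is easy to confirm by Monte Carlo over independent scenarios (a Python/NumPy sketch): start each scenario in the stationary law $\mathcal{N}(0,1)$ and apply one exact step of length 1.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000  # number of independent scenarios

# x_t ~ N(0,1) in the stationary regime; one exact step of size 1 gives
# x_{t+1} = e^{-1} x_t + sqrt(1 - e^{-2}) * N(0,1)
x_t = rng.standard_normal(N)
x_t1 = x_t * np.exp(-1.0) + np.sqrt(1.0 - np.exp(-2.0)) * rng.standard_normal(N)

corr = np.corrcoef(x_t, x_t1)[0, 1]
print(corr)  # should be close to exp(-1) ≈ 0.3679
```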

The conclusion is that one should not expect kstest to accept normality on $U'$.

You can try to take a subset of your initial sample starting from $t=50$, for example, keeping values every $30$ time units: $U''=\{x(50,\omega_1),x(80,\omega_1),\dots,x(4970,\omega_1),x(5000,\omega_1)\}$. The correlation between consecutive kept points is then $e^{-30}\approx 10^{-13}$, effectively zero. It is just a suggestion; I do not guarantee results.
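A sketch of that suggestion (Python/SciPy; since the exact update is valid for any step size, one can jump directly from one kept time point to the next instead of thinning a fine-grained path):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
gap, t0, t_end = 30.0, 50.0, 5000.0
times = np.arange(t0, t_end + gap, gap)   # 50, 80, ..., 5000

# Exact update over a step of length `gap`: almost no memory survives
a = np.exp(-gap)                          # ≈ 1e-13
s_gap = np.sqrt(1.0 - np.exp(-2.0 * gap))

# x(t0) ~ N(0, 1 - e^{-2 t0}), essentially N(0,1)
x = np.sqrt(1.0 - np.exp(-2.0 * t0)) * rng.standard_normal()
sample = []
for _ in times:
    sample.append(x)
    x = a * x + s_gap * rng.standard_normal()
sample = np.asarray(sample)

# With the correlation removed, the one-sample KS test behaves as intended
res = stats.kstest(sample, "norm")
print(len(sample), res.statistic, res.pvalue)
```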