How can I demonstrate that my data is sampled from a Gaussian process?


I have an experiment that, I believe, produces data with Gaussian noise. That is, any subset of my data points has a joint multivariate normal distribution with covariance $\mathbf{K}$ (i.e., the points are sampled from a function with a Gaussian process prior). To simplify, suppose the mean function is zero, so the statistical model is:

$$ \mathbf{D} \sim N(\mathbf{0},\mathbf{K}) $$ where the covariance has a stationary kernel plus independent noise: $$ K_{ij} = k(|\mathbf{d}_i-\mathbf{d}_j|) + \sigma_n^2\delta_{ij} $$

For example, $k$ may be a squared-exponential kernel. I would like to compute a metric from my real data that indicates whether $k$ indeed has the correct functional form (including that it's not just zero everywhere!).

Just to emphasize: this is a very high dimensional problem because the size of D is huge.

There is 1 best solution below

A visual way to check is a Q-Q plot. Draw samples from $N(0,K)$ (MATLAB's randn only gives independent standard normals, so multiply its output by a Cholesky factor of $K$, or use mvnrnd) and plot the quantiles of your data against the quantiles of the simulated samples (the qqplot command in MATLAB). If the points line up on the $x=y$ line, that's good evidence the two samples come from the same distribution, and thus that the noise is normal.
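As a rough illustration of the Q-Q idea, here is a minimal Python sketch (standard library only). It uses a whitening variant rather than simulating from $N(0,K)$: if $K = LL^T$ and the model holds, then $z = L^{-1}D$ should look like i.i.d. $N(0,1)$ draws, so its sorted values can be compared directly with standard normal quantiles. The squared-exponential kernel, its hyperparameters, the 1-D inputs `xs`, and the synthetic data are all assumptions made up for the example, and the helper names are hypothetical. The dense Cholesky here costs $O(n^3)$, so for a huge $\mathbf{D}$ you would probably have to work on subsets.

```python
import math
import random
from statistics import NormalDist


def kernel_matrix(xs, length_scale=1.0, signal_var=1.0, noise_var=0.1):
    """K_ij = k(|x_i - x_j|) + noise_var * delta_ij with an assumed squared-exponential k."""
    return [
        [
            signal_var * math.exp(-0.5 * ((xi - xj) / length_scale) ** 2)
            + (noise_var if i == j else 0.0)
            for j, xj in enumerate(xs)
        ]
        for i, xi in enumerate(xs)
    ]


def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def whiten(L, d):
    """Solve L z = d by forward substitution; z is i.i.d. N(0, 1) if d ~ N(0, L L^T)."""
    z = []
    for i, row in enumerate(L):
        z.append((d[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def qq_points(z):
    """Theoretical N(0, 1) quantiles paired with the sorted sample, ready to plot."""
    n = len(z)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(z)


def correlation(a, b):
    """Pearson correlation, used here as a crude one-number summary of the Q-Q plot."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in for real data: draw d = L w with w ~ N(0, I), so d ~ N(0, K).
rng = random.Random(0)
xs = [0.1 * i for i in range(60)]
L = cholesky(kernel_matrix(xs))
w = [rng.gauss(0.0, 1.0) for _ in xs]
d = [sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(len(xs))]

z = whiten(L, d)
theoretical, sample = qq_points(z)
r = correlation(theoretical, sample)
print(f"Q-Q correlation of whitened data: {r:.3f}")
```

Plotting `sample` against `theoretical` gives the Q-Q plot; points far from the $x=y$ line suggest that the assumed $K$ (and hence $k$) does not match the data.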

Another way is to use hypothesis tests. You can try forms of the Kolmogorov-Smirnov test (you might need the two-sample version to deal with the covariance matrix). There are multivariate versions, or you can check that linear combinations of the data are normal (every linear combination of a multivariate normal vector is univariate normal, which is one way to characterize the distribution); checking all possible combinations is impractical, and checking many is difficult if the dimension is high. Another popular test for this is the Royston test. If you're using R, there's a good package for these types of tests called MVN. Other tests are Mardia's test, the BHEP test, the Cox-Small test, and Smith and Jain's adaptation of the Friedman-Rafsky test, but I haven't seen these "in the wild".
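To make the Kolmogorov-Smirnov suggestion concrete, here is a small Python sketch (standard library only) of a one-sample KS test against $N(0,1)$. It assumes the data have already been whitened, i.e. replaced by $z = L^{-1}D$ with $K = LL^T$, so that each entry of $z$ is a linear combination of the data and, under the model, the entries are i.i.d. standard normal. The function names are hypothetical, the samples are synthetic, and the p-value uses the asymptotic Kolmogorov series, which is only approximate for small samples. Note that this checks the marginal distribution of $z$ only, not independence between its entries.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(z):
    """One-sample KS distance between the empirical CDF of z and N(0, 1)."""
    n = len(z)
    cdf = NormalDist().cdf
    d = 0.0
    for i, value in enumerate(sorted(z), start=1):
        f = cdf(value)
        # The empirical CDF jumps from (i - 1) / n to i / n at this point.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value: 2 * sum_k (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Synthetic illustration: "good" mimics whitened data under a correct kernel,
# "bad" mimics what you would get if the assumed kernel amplitude were far too small.
rng = random.Random(1)
good = [rng.gauss(0.0, 1.0) for _ in range(200)]
bad = [3.0 * v for v in good]

d_good, d_bad = ks_statistic(good), ks_statistic(bad)
p_good, p_bad = ks_pvalue(d_good, len(good)), ks_pvalue(d_bad, len(bad))
print(f"good: D={d_good:.3f}, p={p_good:.3g}")
print(f"bad:  D={d_bad:.3f}, p={p_bad:.3g}")
```

A small p-value is evidence against the assumed $K$; a large one only says this particular check found no problem.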