Scaled conjugate gradient for maximum likelihood estimation of covariance matrix hyperparameters

I am studying Gaussian processes at the moment, and I am stuck on obtaining the maximum likelihood estimate of the kernel hyperparameters. I have been told to study the scaled conjugate gradient optimization method, but I am confused as to how to apply it to my problem.

I understand how solving $Ax = b$ translates into minimizing $f(x) = \frac{1}{2}x^TAx - b^Tx$, but I do not see how to apply this in the given scenario.
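
To spell out the step I am relying on: assuming $A$ is symmetric positive definite (an assumption needed for the equivalence), the gradient of the quadratic is

$$\nabla f(x) = Ax - b, \qquad \text{so} \qquad \nabla f(x^*) = 0 \iff Ax^* = b,$$

and the unique minimizer of $f$ is exactly the solution of the linear system.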

Since I am trying to estimate the parameters, should my $x$ be some $\theta = (\theta_0, \theta_1)$, where $\theta_0$ and $\theta_1$ are, say, the lengthscale parameter and the signal variance parameter?
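
For concreteness, here is a minimal sketch of that reading, in which the optimizer's variable is $\theta$ and the objective is the negative log marginal likelihood rather than a quadratic. Everything in it is an assumption made for illustration: the squared-exponential kernel, the fixed `noise_var`, the toy data, the function names, and optimizing over $\log\theta$ to keep both parameters positive. It also uses SciPy's `method="CG"`, which is a nonlinear conjugate gradient routine and not the scaled conjugate gradient algorithm itself; it only stands in for it here.

```python
import numpy as np
from scipy.optimize import minimize


def sq_exp_kernel(X1, X2, lengthscale, signal_var):
    # Squared-exponential kernel on 1-D inputs (assumed kernel choice).
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)


def neg_log_marginal_likelihood(log_theta, X, y, noise_var=1e-2):
    # theta = (lengthscale, signal_var); optimized in log space so both stay positive.
    lengthscale, signal_var = np.exp(log_theta)
    K = sq_exp_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    # 0.5 * y^T K^{-1} y + 0.5 * log|K| + 0.5 * n * log(2 * pi)
    return (
        0.5 * y @ alpha
        + np.sum(np.log(np.diag(L)))
        + 0.5 * len(X) * np.log(2 * np.pi)
    )


# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

# Start from lengthscale = signal_var = 1 (log_theta = 0).
log_theta0 = np.zeros(2)

# Nonlinear conjugate gradient; the gradient is approximated by finite
# differences because no `jac` is supplied.
result = minimize(neg_log_marginal_likelihood, log_theta0, args=(X, y), method="CG")
theta_hat = np.exp(result.x)  # estimated (lengthscale, signal_var)
```

In this sketch the quadratic $f(x)$ never appears: conjugate gradient is used only as a general-purpose minimizer of a non-quadratic function of $\theta$, and supplying the analytic gradient through `jac` would be the usual refinement.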

Ultimately, I struggle to see how to translate my problem into this framework.