Benefit of using GP prior for Deep Neural Networks


I've been reading some papers on Bayesian neural networks, and one that caught my attention is titled Deep Neural Networks as Gaussian Processes. The authors place Gaussian priors on the network weights and show that, as the hidden layers become wide enough, the distribution over the network's outputs converges to a Gaussian process.

By using this GP prior over the functions produced by the network, they can then perform exact Bayesian inference for regression tasks with deep neural networks.
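To make the construction concrete, here is a minimal sketch (not the paper's reference implementation) of the recursive NNGP kernel for an infinitely wide ReLU network, followed by standard GP regression with that kernel. The hyperparameters `depth`, `sigma_w2`, `sigma_b2`, and `noise` are illustrative choices, not values from the paper:

```python
import numpy as np

def nngp_kernel(X1, X2, depth=3, sigma_w2=1.0, sigma_b2=0.1):
    """NNGP kernel for an infinitely wide ReLU network (arc-cosine recursion)."""
    d = X1.shape[1]
    # Layer-0 covariance: scaled linear kernel on the inputs plus bias variance.
    K = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    K11 = sigma_b2 + sigma_w2 * np.sum(X1**2, axis=1) / d  # diagonal for X1
    K22 = sigma_b2 + sigma_w2 * np.sum(X2**2, axis=1) / d  # diagonal for X2
    for _ in range(depth):
        norm = np.sqrt(np.outer(K11, K22))
        cos_t = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # Closed-form E[relu(u) relu(v)] for (u, v) jointly Gaussian.
        K = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_t)
        # On the diagonal theta = 0, so the update simplifies.
        K11 = sigma_b2 + sigma_w2 * K11 / 2
        K22 = sigma_b2 + sigma_w2 * K22 / 2
    return K

def gp_regression(X_train, y_train, X_test, noise=0.1, **kw):
    """Exact GP posterior mean and covariance under the NNGP kernel."""
    K = nngp_kernel(X_train, X_train, **kw)
    Ks = nngp_kernel(X_test, X_train, **kw)
    Kss = nngp_kernel(X_test, X_test, **kw)
    Ky = K + noise * np.eye(len(X_train))
    mean = Ks @ np.linalg.solve(Ky, y_train)
    cov = Kss - Ks @ np.linalg.solve(Ky, Ks.T)
    return mean, cov

# Toy 1-D regression example on noisy sine data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
Xs = np.linspace(-3, 3, 50).reshape(-1, 1)
mu, cov = gp_regression(X, y, Xs)
```

The point is that the "network" never appears explicitly: all of its prior structure is compiled into the kernel recursion, and inference is the usual GP posterior.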

My question is: What is the benefit gained from this? Why not simply use a vanilla GP? Also, is there some advantage of their method over using popular methods such as MCMC or variational inference to approximate the posterior distribution of the network weights?

My knowledge on Bayesian Neural Networks is still green, so any clarity on this topic would be greatly appreciated.

1 Answer

I would suggest further reading of related papers on neural processes:

Neural Processes combine elements from neural networks (NNs) and Gaussian-process (GP)-like Bayesian models to capture distributions over functions. NNs are more flexible at modelling data, removing the need to, say, pre-process the data so that a GP can be applied effectively. In fact, with careful architecture choices one can achieve almost any desired model behaviour, e.g. GP-like predictive uncertainties. This is also closely related to how unsupervised learning with VAEs became popular, and more recently GANs, etc.

In a nutshell: if you have a huge amount of (possibly multi-modal) data, let the network do the thinking; if you instead have more domain knowledge, use a GP.