I am looking at a figure that shows how the prior/posterior distributions evolve in Bayesian linear regression.
I don't quite understand why the Gaussians become tilted and smaller as the sampled lines start to fit the data points.
Could someone explain this behavior? I'd like to understand the intuition behind it.
First, for simple Bayesian linear regression, we assume that the true model is $w_0 + w_1 x$, where $w_0$ and $w_1$ are the true parameter values, which we would like to find. This means that our parameter space is $\mathbb{R}^2$, and our parameter vector is $w = (w_0, w_1)$. So, every point in the first plot corresponds to a particular straight line (every $(w_0, w_1)$ coordinate gives a line $w_0 + w_1 x$).
In the Bayesian approach, we place a prior distribution on the parameter space, i.e. a probability distribution over $\mathbb{R}^2$. In this case, we don't really know where the parameters are, so we just use a zero-mean Gaussian with large variance.
If we sample a few times from this Gaussian, we get a bunch of coordinates, and each coordinate gives one line; we can plot these lines in the top right plot. See how these lines are all over the place: this is because we haven't seen any data yet, and the prior just spreads the probability widely over the parameter space.
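The sampling step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the figure's actual code; the prior precision `alpha` and the number of samples are assumed values chosen for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior over w = (w0, w1): zero-mean isotropic Gaussian with large variance.
# alpha is an assumed prior precision (a modeling choice, not from the figure).
alpha = 2.0
prior_mean = np.zeros(2)
prior_cov = (1.0 / alpha) * np.eye(2)

# Each sampled coordinate (w0, w1) defines exactly one line y = w0 + w1 * x.
samples = rng.multivariate_normal(prior_mean, prior_cov, size=6)

xs = np.linspace(-1, 1, 50)
lines = [w0 + w1 * xs for w0, w1 in samples]  # one curve per sample
```

Plotting each array in `lines` against `xs` reproduces the "lines all over the place" panel.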
Next, we observe a data point. We started off with our prior belief, but now we can incorporate the data into our understanding of the problem. This is called computing the posterior distribution. The posterior is a more concentrated distribution that uses the data to refine our estimate of where the true parameters $(w_0, w_1)$ actually are, so we get a tighter Gaussian. The tilt comes from the same source: a single observation $(x, y)$ mostly tells us that $w_0 + w_1 x \approx y$, which constrains one direction in parameter space much more than the other, so the posterior shrinks along that direction and becomes an elongated, tilted (correlated) Gaussian. Now, if we sample from this distribution a bunch of times, we again get a bunch of lines, which are starting to look more concentrated.
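For the curious, the posterior update has a closed form because the Gaussian prior is conjugate to the Gaussian likelihood. In Bishop-style notation (assumed here), with prior $\mathcal{N}(w \mid m_0, S_0)$, noise precision $\beta$, and design matrix $\Phi$ whose row for an input $x$ is $(1, x)$:

$$
S_N = \left(S_0^{-1} + \beta \Phi^\top \Phi\right)^{-1}, \qquad
m_N = S_N \left(S_0^{-1} m_0 + \beta \Phi^\top y\right).
$$

For a single data point, $\beta \Phi^\top \Phi$ is a rank-one matrix pointing along $(1, x)$, which is exactly why the variance collapses in one direction first and the Gaussian looks tilted.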
Now, we start the whole process again, but we use the old posterior as our new prior: we see some more data, compute the new posterior, and so on. As we see more data, our belief about where the true parameter values are becomes better and better, and the possible models (straight lines) that we consider probable become more and more concentrated.
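This sequential prior-to-posterior loop can be sketched with the conjugate Gaussian update. Everything here is an assumed toy setup (the true line, `alpha`, `beta`), purely to show the posterior covariance shrinking one point at a time:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground truth: y = -0.3 + 0.5 x plus Gaussian noise with precision beta.
w_true = np.array([-0.3, 0.5])
alpha, beta = 2.0, 25.0  # assumed prior precision and noise precision

# Start from the zero-mean Gaussian prior over w = (w0, w1).
S_inv = alpha * np.eye(2)  # prior precision matrix
m = np.zeros(2)            # prior mean

cov_volumes = []
for _ in range(20):
    # Observe one new data point.
    x = rng.uniform(-1, 1)
    phi = np.array([1.0, x])  # feature vector (1, x)
    y = w_true @ phi + rng.normal(scale=beta ** -0.5)

    # Conjugate update: the old posterior acts as the new prior.
    S_inv_new = S_inv + beta * np.outer(phi, phi)
    m = np.linalg.solve(S_inv_new, S_inv @ m + beta * y * phi)
    S_inv = S_inv_new

    # det of the covariance = "volume" of the uncertainty ellipse.
    cov_volumes.append(np.linalg.det(np.linalg.inv(S_inv)))
```

Each rank-one term `beta * np.outer(phi, phi)` only adds precision along the direction $(1, x)$, so early on the ellipse is squashed in one direction (tilted), and `cov_volumes` shrinks monotonically as data accumulates.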
That is the intuition; for the mathematical details of computing the posteriors, refer to the text you're reading :)