I am trying to understand REML estimation of variance. So far I understand its obvious advantage over maximum likelihood estimation (MLE), but I want to understand the mechanics of the process. I found an explanation in the book Linear Mixed Models for Longitudinal Data by Verbeke and Molenberghs.
I implemented a quick example in R to verify that it works, and of course it did. However, I still do not understand the mechanics.
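A check of this kind can be sketched as follows. This is an illustrative Python sketch, not the original R code, and it assumes the simplest possible case: an intercept-only model y_i = mu + e_i with a single variance sigma^2. It also assumes the error-contrast view of REML, where the data are first replaced by n - 1 combinations that do not depend on mu. The function names and the data are made up for illustration.

```python
import math


def ml_variance(y):
    """ML estimate of sigma^2 for y_i = mu + e_i: residual sum of squares / n."""
    n = len(y)
    mean = sum(y) / n
    return sum((v - mean) ** 2 for v in y) / n


def error_contrasts(y):
    """Helmert contrasts: n - 1 orthonormal combinations of y.

    The weights of each contrast sum to zero, so the unknown mean mu
    cancels and every contrast has mean 0 and variance sigma^2.
    """
    contrasts = []
    running_sum = 0.0
    for k in range(1, len(y)):
        running_sum += y[k - 1]  # y_1 + ... + y_k
        contrasts.append((running_sum - k * y[k]) / math.sqrt(k * (k + 1)))
    return contrasts


def reml_variance(y):
    """REML estimate: plain ML applied to the n - 1 error contrasts."""
    u = error_contrasts(y)
    return sum(c * c for c in u) / len(u)


# Hypothetical data, chosen only for illustration.
y = [1.0, 2.0, 3.0, 4.0]
print("ML  :", ml_variance(y))
print("REML:", reml_variance(y))
```

In this special case the sum of squared contrasts equals the residual sum of squares, so the REML estimate is the familiar divisor n - 1 version of the sample variance, while ML divides by n. The sketch reproduces the numbers but does not by itself give the geometric picture.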
For example, I remember taking an online lecture on linear algebra by Professor Gilbert Strang, after which I could understand how OLS estimation works. He explained it with diagrams, but I have found nothing comparable for the mechanics of REML estimation.
Can anyone please demystify it? Diagrams would be very helpful for grasping the intuition behind the idea.