The idea is that the equalizer of $f$ and $g$ is given by the intersection of $(1,f)$ and $(1,g)$.
With this in mind, the proof is straightforward, but I don't get the intuition behind it. What is this intersection, morally?
Maybe a concrete example over vector spaces helps. I feel there should be some geometric analogy but I'm not getting it.
Partial Answer.
In the particular case of $R$-Mod for some ring $R$, take $f,g : x\to y$.
The equalizer looks like $\{ t : x \mid f\,t = g\,t\}$,
and the intersection looks like $\{ (u,v) : x\oplus x \mid (1,f)\,u = (1,g)\,v\}$. Since $(1,f)\,u = (u, f\,u)$ and $(1,g)\,v = (v, g\,v)$, the condition forces $u=v$ and $f\,u = g\,u$, so we get the equalizer.
That's actually what the proof does, just in more categorical language.
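For a concrete picture over vector spaces, here is a small worked example of my own choosing (not from the book): take $x = \mathbb{R}^2$, $y = \mathbb{R}$, $f(a,b) = a$ and $g(a,b) = b$. Then $(1,f)$ and $(1,g)$ embed $x$ into $x\oplus y = \mathbb{R}^3$ as the graphs of $f$ and $g$:

$$
\begin{aligned}
\Gamma_f &= \operatorname{im}(1,f) = \{(a,b,c) \in \mathbb{R}^3 \mid c = a\},\\
\Gamma_g &= \operatorname{im}(1,g) = \{(a,b,c) \in \mathbb{R}^3 \mid c = b\},\\
\Gamma_f \cap \Gamma_g &= \{(a,a,a) \mid a \in \mathbb{R}\}.
\end{aligned}
$$

Projecting the intersection back to $x$ gives the line $\{(a,a)\}$, which is exactly $\{ t : x \mid f\,t = g\,t\}$. Geometrically, the two graphs are planes in $\mathbb{R}^3$, and they meet precisely over the points of $x$ where $f$ and $g$ take the same value.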
What's surprising is that what one would initially try, namely setting $\operatorname{eq}(f,g) := \ker\big(\operatorname{coker}(1,1)\circ(f,g)\big)$, doesn't seem to work, or at least I didn't find a direct proof. I assume there's a reason why the book takes the detour through intersections like this.
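For comparison, in $R$-Mod that formula can be unwound by hand. This is only a sketch, and it assumes that $(1,1)$ denotes the diagonal $y\to y\oplus y$ and $(f,g)$ the map $x\to y\oplus y$ (my reading of the notation):

$$
\begin{aligned}
(1,1) &: y \to y\oplus y, & b &\mapsto (b,b),\\
\operatorname{coker}(1,1) &: y\oplus y \to y, & (a,b) &\mapsto a-b,\\
\operatorname{coker}(1,1)\circ(f,g) &: x \to y, & t &\mapsto f\,t - g\,t,\\
\ker\big(\operatorname{coker}(1,1)\circ(f,g)\big) &= \{ t : x \mid f\,t = g\,t\}.
\end{aligned}
$$

So with elements available, the formula does describe the equalizer as $\ker(f-g)$. The computation relies on elements and on subtracting maps, so by itself it says nothing about a direct proof in an abstract abelian category.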
Furthermore, this answer doesn't explain how one would come up with the characterization via the intersection in the first place.
I feel that if I tried to prove this in a couple of months, I'd again try to go with $\ker\big(\operatorname{coker}(1,1)\circ(f,g)\big)$. So I still have no real intuition for how to come up with this trick or what's so fundamental about it.