I understand the steps of the proof in the book, but I don't see intuitively why the maximum rate of increase at a point $P$ must be given by $\|\nabla f(x, y)\|$. A graph has infinitely many directional derivatives at a point $P$, and I just don't see what is special about combining the directional derivatives in the $x$ direction and in the $y$ direction. If at a point $P$ the magnitude of the gradient is equal to the maximum rate of increase, can't we just rotate the graph by any amount around the point $P$ and get the same maximum rate of increase but a different gradient?
P.S. Please try to avoid using many advanced logic symbols.
The directional derivative is the dot product of the gradient and the unit vector along the given direction. That dot product attains its maximum value when the two vectors are parallel and point the same way. Draw the conclusion yourself.
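To spell that out, here is a short sketch of the computation, assuming $f$ is differentiable at $P$. The symbols $\mathbf{u} = (u_1, u_2)$ for the unit direction vector, $D_{\mathbf{u}} f$ for the directional derivative, and $\theta$ for the angle between $\nabla f$ and $\mathbf{u}$ are notation introduced here for illustration, not taken from the book.

$$
\begin{aligned}
D_{\mathbf{u}} f(P) &= f_x(P)\,u_1 + f_y(P)\,u_2 \\
&= \nabla f(P) \cdot \mathbf{u} \\
&= \|\nabla f(P)\|\,\|\mathbf{u}\| \cos\theta \\
&= \|\nabla f(P)\| \cos\theta \;\le\; \|\nabla f(P)\|.
\end{aligned}
$$

Equality holds exactly when $\cos\theta = 1$, that is, when $\mathbf{u}$ points in the direction of $\nabla f(P)$. So the two partial derivatives are not special directions in themselves; they are just the components needed to write down the dot product. As for rotating the graph about the vertical line through $P$: the gradient vector rotates along with it, so its direction changes while its length $\|\nabla f(P)\|$, and hence the maximum rate of increase, stays the same.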