I have a good intuition that $A$ is independent of $B$ if $P(A \vert B) = P(A)$, and I see how you can easily derive from this that it must hold that $P(A,B) = P(A)P(B)$.
But the first statement is not normally taken as a definition; instead the second is.
What is the intuition, or even the derivation, behind defining $A$ and $B$ as independent iff $P(A, B) = P(A)P(B)$?
The kind of explanation I am looking for is one similar to that given by Jaynes for the definition of conditional probability in the first chapter of Probability Theory: The Logic of Science; even a Kolmogorov-style axiomatic explanation would help.
Arguing from the intuitive idea of probability (be it frequentist, Bayesian, or à la Jaynes), what can we say about $P(AB)$? Let us assume that $P(A)\le P(B)$. Since $AB\subseteq B$, we can safely deduce that $P(AB)\le P(B)$. By looking at well-known and elementary examples, it is easy to be convinced that $P(AB)$ can attain any value between $0$ and $P(B)$. Examining these cases shows that the extreme values, close to $0$ or close to $P(B)$, are obtained when the information that $B$ has occurred either severely conflicts with $A$ occurring (to get close to $0$) or strongly correlates with $A$ occurring (to get close to $P(B)$).
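To make the "location in the range" idea concrete, here is a small sketch on one roll of a fair die. The events and the names `OMEGA`, `prob`, `candidates`, and `joint_and_product` are my own illustrative choices, not anything from the question. The event $B=\{1,2,3,4\}$ is fixed, and three different events $A$, each with $P(A)=1/2$, are chosen to overlap with $B$ as little as possible, "neutrally", and as much as possible.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (an assumed toy example).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Exact probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def joint_and_product(A, B):
    """Return the pair (P(AB), P(A)P(B)) as exact fractions."""
    return prob(A & B), prob(A) * prob(B)

B = frozenset({1, 2, 3, 4})  # P(B) = 2/3

# Three hypothetical choices of A, all with P(A) = 1/2 <= P(B).
candidates = {
    "conflicting": frozenset({4, 5, 6}),  # overlaps B as little as possible
    "neutral": frozenset({2, 4, 6}),      # "even": B tells us nothing about it
    "correlated": frozenset({1, 2, 3}),   # contained in B
}

for label, A in candidates.items():
    joint, product = joint_and_product(A, B)
    print(f"{label:12s} P(AB) = {joint}  P(A)P(B) = {product}  equal: {joint == product}")
```

In this toy case the product $P(A)P(B)=1/3$ is the same for all three choices of $A$, while $P(AB)$ moves from $1/6$ through $1/3$ to $1/2$; only the "neutral" choice lands exactly on the product.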
Now, more mathematically, one value in the range of $P(AB)$ that appears naturally is, of course, $P(A)P(B)$, so it is natural to investigate when that value occurs. Notice that it is symmetric in $A$ and $B$. Since the exact location of $P(AB)$ in its possible range seems to be highly sensitive to whether, and how, $A$ and $B$ influence each other, we are led to conclude that the special value $P(A)P(B)$, being symmetric in its arguments, means that the mutual influences are neutral. That neutrality is another way of thinking about independence. Thus, we turn the intuition into a definition and say that $A$ and $B$ are independent if, and only if, $P(AB)=P(A)P(B)$.
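For completeness, here is how the product form connects to the conditional form, as a sketch assuming the usual ratio definition $P(A\vert B)=P(AB)/P(B)$, which requires $P(B)>0$:

$$
P(A\vert B)=P(A)
\;\Longleftrightarrow\;
\frac{P(AB)}{P(B)}=P(A)
\;\Longleftrightarrow\;
P(AB)=P(A)P(B)
\qquad (P(B)>0).
$$

If $P(B)=0$, the conditional form is undefined under that definition, whereas the product form still makes sense: $AB\subseteq B$ gives $P(AB)\le P(B)=0$, so both sides equal $0$. This, together with the symmetry in $A$ and $B$, is one plausible reason the product form is preferred as the definition.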