Bayes' theorem can be proved using the definition of conditional probability. Given events $A, B$ with $P(B)>0$ we have:
$$P(A|B)=\frac{P(AB)}{P(B)}$$
where $AB$ means the intersection of $A$ and $B$.
Bayes' theorem can then be derived from the multiplication rule $P(AB)=P(A|B)P(B)$, which again assumes $P(B)>0$. Since $P(AB)=P(BA)$, we likewise have $P(AB)=P(B|A)P(A)$, and substituting this into the definition above directly gives:
$$P(A|B)=\frac{P(B|A)P(A)}{P(B)}.$$
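As a quick sanity check (my own illustrative example, not from the book), the identity can be verified with exact arithmetic on a finite sample space. Here I take one roll of a fair die, with $A$ = "the roll is even" and $B$ = "the roll is greater than $3$":

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die, each outcome with probability 1/6.
p = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    """Total probability of a set of outcomes."""
    return sum(p[w] for w in event)

def cond(a, b):
    """P(a | b) = P(ab) / P(b); only defined when P(b) > 0."""
    pb = prob(b)
    assert pb > 0, "conditioning event must have positive probability"
    return prob(a & b) / pb

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

# Direct computation vs. Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
lhs = cond(A, B)
rhs = cond(B, A) * prob(A) / prob(B)
print(lhs, rhs)  # both are 2/3
```

Note that `cond` refuses to condition on a probability-zero event, which is exactly the issue raised below.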
Now the problem is as follows. The book I use (Wasserman) does not state that $P(A)>0$, even though the derivation uses $P(B|A)$. My feeling is that both $P(A)$ and $P(B)$ should be larger than zero so that conditional probability is well-defined. Hopefully someone can tell me where I am wrong, or confirm that I am actually right and the book is imprecise on this point.
The only constraint is $P(B) > 0$: all probabilities lie in the interval $[0, 1]$, but $P(B)$ cannot be $0$ (otherwise, we would be dividing by zero!)
There is nothing forcing $P(A)$ to be positive, however. For example, let $A$ be the event that an apple starts flying, so $P(A) = 0$. Let $B$ be the event that I roll a $3$ on a die. Then, taking the numerator $P(B \mid A)\cdot P(A)$ to be $0$ whenever $P(A) = 0$ (the usual convention, since $P(B \mid A)$ is otherwise undefined),
$$P(A \mid B) = \frac{P(B \mid A)\cdot P(A)}{P(B)} = 0$$
This result is intuitive. An event with probability $0$ keeps probability $0$, even after we gain new information.
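One can also see this directly from the definition, without invoking $P(B \mid A)$ at all: since $AB \subseteq A$, monotonicity gives

$$0 \le P(A \mid B) = \frac{P(AB)}{P(B)} \le \frac{P(A)}{P(B)} = 0,$$

so $P(A \mid B) = 0$ whenever $P(A) = 0$ and $P(B) > 0$.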