Why does the probability function of a Bernoulli variable $X\sim Bern(p)$ have the form $f(x|p)=p^x(1-p)^{1-x}$?
As I understand it, the probability mass function $f(x|p)$ for $X\sim Bern(p)$ should satisfy $$f(0|p)=1-p,\ f(1|p)=p$$ Such functions are certainly not unique. In general, I would also require that $f(x|p)=0$ for $x\neq0,1$, so I would define it as $f(x|p)=(1-p)I_{\{0\}}(x)+pI_{\{1\}}(x)$. But if we only require that the conditions hold at $0$ and $1$, then $f(x|p)=(2p-1)x+1-p$ is also a valid choice.
Why is it only $f(x|p)=p^x(1-p)^{1-x}$ that is used, especially when maximum likelihood estimation is involved? Is there a theory behind this? Can it be generalized to other discrete distributions?
Suppose we observe a sample $x_1,\cdots ,x_n$ containing $a=\sum_{i=1}^nx_i$ successes and $n-a$ failures, and we want to estimate the probability of success. Given a probability $p$, the probability of getting $a$ successes and $n-a$ failures (in the particular order of the given sample) is $$p^a(1-p)^{n-a}$$ Maximum likelihood estimation asks you to maximize the likelihood function (here, the probability of observing the given sample, viewed as a function of $p$) by varying $p$. If the p.m.f. of a Bernoulli r.v. is written as $f(x|p)=p^x(1-p)^{1-x}$, then the likelihood function is $$L(p)=\prod_{i=1}^np^{x_i}(1-p)^{1-x_i}=p^a(1-p)^{n-a}$$ which is exactly the desired probability: each factor automatically selects $p$ when $x_i=1$ and $1-p$ when $x_i=0$. Another valid choice of p.m.f., such as $f(x|p)=(2p-1)x+1-p$, would give $$L(p)=\prod_{i=1}^n\big((2p-1)x_i+1-p\big)$$ Since each $x_i$ is $0$ or $1$, this is numerically the same likelihood, but it does not collapse into the convenient closed form $p^a(1-p)^{n-a}$, so the product form $p^x(1-p)^{1-x}$ is preferred for algebraic manipulation.
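As a quick numerical sanity check, here is a small Python sketch (the sample array `x` and the grid search are illustrative assumptions, not part of the question) confirming that the maximizer of $L(p)=p^a(1-p)^{n-a}$ is close to $\hat p = a/n$:

```python
import numpy as np

# Hypothetical Bernoulli sample: a successes out of n trials.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])
n, a = len(x), int(x.sum())

def log_likelihood(p):
    # log of prod_i p^{x_i} (1-p)^{1-x_i} = a*log(p) + (n-a)*log(1-p)
    return a * np.log(p) + (n - a) * np.log(1 - p)

# Maximize over a fine grid of p values in (0, 1).
grid = np.linspace(0.001, 0.999, 9999)
p_hat = grid[np.argmax(log_likelihood(grid))]

print(p_hat, a / n)  # the grid maximizer is within grid resolution of a/n
```

Maximizing the log-likelihood rather than $L(p)$ itself is the standard trick: the product becomes a sum, and setting the derivative $a/p-(n-a)/(1-p)$ to zero gives $\hat p = a/n$ analytically.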