I understand the concept behind finding a Maximum Likelihood Estimator, but when I'm setting up the likelihood function, I'm having trouble understanding whether I should start with a summation to create the joint probability function, or with a product.
I thought it depended on the original distribution - perhaps product for continuous random variables and summation for discrete. But then I saw in my notes we started with a product when finding the MLE for p in the binomial distribution, and used a summation for finding the MLE for $\theta$ in the exponential distribution.
Is there a guideline?
If the observations are independent, then for distributions with densities the joint density is the product of the marginal densities (similarly, for discrete random variables, the joint probability is the product of the marginal probabilities). This product can be taken as (proportional to) the likelihood you wish to maximise.
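For concreteness, here is a sketch of that setup for an i.i.d. sample $x_1,\dots,x_n$ from an exponential distribution with rate $\theta$ (the rate parameterisation here is an assumption; your notes may use the mean instead):

$$L(\theta)=\prod_{i=1}^{n}\theta e^{-\theta x_i}=\theta^{n}e^{-\theta\sum_{i=1}^{n}x_i},\qquad \log L(\theta)=n\log\theta-\theta\sum_{i=1}^{n}x_i.$$

The summation you saw in your notes is almost certainly this second expression: the likelihood always starts as a product, and the sum only appears after taking logarithms.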
You then have a choice: maximise this product directly, or take logarithms and maximise the sum of the log-likelihoods.
It makes little theoretical difference which you use, since the logarithm is a continuous, strictly increasing function. Taking the derivative of the product gives
$$\frac{d}{dx} \prod_i f_i(x) = \left(\sum_i \frac{f_i^\prime(x)}{f_i(x)}\right)\left( \prod_j f_j(x)\right),$$
while the derivative of the sum of the logarithms gives
$$\frac{d}{dx} \sum_i \log_e(f_i(x)) = \sum_i \frac{f_i^\prime(x)}{f_i(x)},$$
and these have the same zeros wherever the likelihood is positive. If the individual likelihoods involve multiplication, powers and exponentiation, then their logarithms may sometimes be easier to handle.
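As a minimal numeric sketch of this equivalence (the sample size and true rate below are made-up assumptions for illustration), maximising the raw product and maximising the sum of logs land on the same estimate, which for the exponential is the closed form $\hat\theta = n/\sum_i x_i$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated sample, assumed drawn from Exponential(theta) with
# density f(x) = theta * exp(-theta * x); true theta = 2 here.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=50)

# Likelihood as a product of the marginal densities
def neg_likelihood(theta):
    return -np.prod(theta * np.exp(-theta * x))

# Log-likelihood as a sum of the log densities
def neg_log_likelihood(theta):
    return -np.sum(np.log(theta) - theta * x)

mle_prod = minimize_scalar(neg_likelihood, bounds=(1e-6, 10), method="bounded").x
mle_log = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10), method="bounded").x

# All three agree: the two numeric maximisers and the closed form n / sum(x_i)
print(mle_prod, mle_log, 1 / x.mean())
```

Note that in practice only the log form scales: a raw product of many densities underflows to zero for even moderately large samples, which is a further practical reason to prefer the sum.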
TLDR: maximising the sum of the log-likelihoods gives the same result as maximising the product of the likelihoods, but might be easier to manipulate in some cases.