Chapter 7 of Boyd states:
I am wondering why we do not maximize the likelihood function directly and instead construct the log-likelihood function.
What is the fundamental reason that the product of densities is harder to maximize? Is it because it is difficult to test convexity, to compute the gradient, or something else?

For small $n$, optimizing the likelihood directly can be tractable, and it is sometimes done in practice. However, a likelihood that is a product of many terms (for instance $n \sim 10^8$) is computationally difficult to optimize: by the product rule, its derivative is a sum of $n$ terms, each of which is itself a product of $n-1$ factors. Because the logarithm is strictly increasing, the log-likelihood has the same maximizer, and it is a sum of $n$ terms whose derivative is simply the sum of the individual derivatives. That sum is much simpler to optimize.
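As a rough illustration, here is a minimal Python sketch under assumptions of my own: i.i.d. Gaussian data with unit variance and an unknown mean `mu`, and hypothetical helper names (`likelihood`, `log_likelihood`, `grad_log_likelihood`). It compares the raw product with the sum of logs. It also shows a related numerical symptom: in double precision the product underflows to zero well before $n$ gets anywhere near $10^8$, while the sum stays finite.

```python
import math
import random


def gaussian_pdf(x, mu):
    # Density of N(mu, 1) at x (unit variance is an assumption of this sketch).
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def likelihood(xs, mu):
    # Direct product of the n densities.
    prod = 1.0
    for x in xs:
        prod *= gaussian_pdf(x, mu)
    return prod


def log_likelihood(xs, mu):
    # Sum of the n log-densities, written out analytically.
    const = 0.5 * math.log(2.0 * math.pi)
    return sum(-0.5 * (x - mu) ** 2 - const for x in xs)


def grad_log_likelihood(xs, mu):
    # Derivative of the sum with respect to mu: one simple term per sample.
    return sum(x - mu for x in xs)


random.seed(0)
xs = [random.gauss(2.0, 1.0) for _ in range(5000)]
mu_hat = sum(xs) / len(xs)  # closed-form maximizer for this model

# For a handful of samples the two formulations agree.
small = xs[:5]
print(likelihood(small, 2.0), math.exp(log_likelihood(small, 2.0)))

# For all 5000 samples the product is no longer representable.
print(likelihood(xs, mu_hat))           # underflows to 0.0
print(log_likelihood(xs, mu_hat))       # finite, large negative number
print(grad_log_likelihood(xs, mu_hat))  # approximately 0 at the maximizer
```

Each density here is at most about $0.4$, so the product of 5000 of them is far below the smallest positive double, and the likelihood surface looks flat at zero to any optimizer. The log-likelihood and its gradient remain well scaled.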