The score is usually described as a measure of how sensitive the likelihood function is to its parameter. A natural choice for such a measure would be the derivative of the likelihood itself, yet the score is defined as the derivative of the log likelihood. Why?
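To fix notation (this is the standard setup, with $L(\theta; x) = f(x; \theta)$ the likelihood of a generic density $f$):

```latex
% The "natural" candidate: derivative of the likelihood itself
\frac{\partial}{\partial\theta}\, L(\theta; x)

% The score as actually defined: derivative of the log likelihood
s(\theta; x) \;=\; \frac{\partial}{\partial\theta}\, \log L(\theta; x)
```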
People commonly justify the "log" as a mathematical convenience that turns products of PDFs into sums, or point out that common distributions involve exponentials. But these arguments seem weak, since the score is defined for a single, generic PDF. In the context of optimization, the monotonicity of the log is also sometimes cited as an additional point in its favor.
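The "products into sums" convenience, spelled out for an i.i.d. sample $x_1, \dots, x_n$, is just:

```latex
\log \prod_{i=1}^{n} f(x_i; \theta)
  \;=\; \sum_{i=1}^{n} \log f(x_i; \theta),
\qquad\text{hence}\qquad
\frac{\partial}{\partial\theta}\, \log L(\theta; x_{1:n})
  \;=\; \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\, \log f(x_i; \theta)
```

That is, the score of the sample is the sum of the per-observation scores. But this says nothing about why the log belongs in the single-observation definition, which is what my question is about.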
I know that the score feeds into the Fisher information and the Cramér-Rao bound, where the reason for the "log" is evident. But in the definition of the score itself, the "log" seems unnecessary.
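For reference, these are the standard downstream definitions I have in mind (for a scalar parameter and an unbiased estimator $\hat\theta$, under the usual regularity conditions):

```latex
% Fisher information: second moment of the score
I(\theta) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\, \log f(X;\theta)\right)^{\!2}\right]

% Cramér-Rao bound for an unbiased estimator
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{I(\theta)}
```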
I have also seen the score defined as the normalized derivative of the likelihood, which by simple calculus equals the derivative of the log likelihood. Although I find this definition more comfortable, it seems secondary: in most presentations the emphasis is not on the normalization but on the log likelihood itself. Why?
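The "simple calculus" step I am referring to is just the chain rule:

```latex
\frac{\dfrac{\partial}{\partial\theta}\, L(\theta; x)}{L(\theta; x)}
  \;=\; \frac{\partial}{\partial\theta}\, \log L(\theta; x)
```

So the normalized-derivative definition and the log-likelihood definition are algebraically identical; my question is about which of the two is the conceptually primary one.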