I'm reading the very good All of Statistics by Larry Wasserman and I have a question: how do you define the score function?
Wasserman defines it as:
$$s(X; p) = \frac{\partial \log f(x; p)}{\partial p}$$
where $f(x; p)$ is the probability density, while Wikipedia defines it as:
$$s = \frac{\partial}{\partial p} \sum_{i=1}^{n}\log f(x_i; p)$$
Perhaps, as @Francisco noted below, there's the equivalence $f(x;p) = \prod_{i=1}^{n}f(x_i;p)$, but I'm not so sure Wasserman uses that convention. For instance, he gives the example of $X_1, X_2, \ldots, X_n \sim \text{Bernoulli}(p)$. The score is
$$s(X; p) = \frac{X}{p} - \frac{1 - X}{1 - p}.$$
This is the result you get if you differentiate with respect to $p$ for a single Bernoulli variable $X$. If you do the calculation with the full set of $X_i$ (i.e., using the Wikipedia definition), you get a different result. For instance, to compute the maximum likelihood estimator you need to take all the $X_i$ into account, so why use only one $X$ for the score function? This seems unnatural to me.
In other words: How would you do the case of the Bernoulli distribution above?
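To make my point concrete, here's a quick numerical check I tried (plain Python; the variable names are mine, not the book's). It computes the single-observation score $x_i/p - (1-x_i)/(1-p)$ for each draw and compares the sum with the derivative of the joint log-likelihood:

```python
import random

random.seed(0)
p = 0.3
xs = [1 if random.random() < p else 0 for _ in range(10)]  # Bernoulli(p) draws

# Per-observation score: d/dp log f(x_i; p) = x_i/p - (1 - x_i)/(1 - p)
per_obs = [xi / p - (1 - xi) / (1 - p) for xi in xs]

# Full-sample score (the Wikipedia form): derivative of the joint
# log-likelihood sum_i log f(x_i; p), computed analytically
k = sum(xs)
n = len(xs)
sample_score = k / p - (n - k) / (1 - p)

# The sum of single-observation scores equals the sample score
print(abs(sum(per_obs) - sample_score) < 1e-9)  # True
```

So numerically the two definitions differ only in whether you report the score per observation or summed over the sample.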
Thanks.
but $$f(x;p) = \prod_{i=1}^{n} f(x_i; p),$$ no? So both definitions are actually the same; I don't see the difference, since the log of a product is the sum of the logs...
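A quick numerical check of that identity for the Bernoulli pmf (my own sketch, not from the book):

```python
import math
import random

random.seed(1)
p = 0.4
xs = [1 if random.random() < p else 0 for _ in range(8)]

def f(x, p):
    # Bernoulli pmf: f(x; p) = p^x (1 - p)^(1 - x)
    return p ** x * (1 - p) ** (1 - x)

# Joint density as a product of the marginals
joint = 1.0
for xi in xs:
    joint *= f(xi, p)

# log of the product vs. sum of the logs
log_joint = math.log(joint)
sum_logs = sum(math.log(f(xi, p)) for xi in xs)
print(abs(log_joint - sum_logs) < 1e-9)  # True
```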