When and why do formulae involving sums over $x_i$ change to formulae involving $X$ in statistics? Specifically when dealing with likelihoods.


I've been reading up on stats recently and a question I'm working through involves calculating the log-likelihood of a distribution with respect to a parameter $\beta$.

From my understanding, for some probability density function $f(x)$ that depends on a parameter $\beta$, the likelihood is defined as $$[1] \qquad L(\beta) = f(x_1|\beta)\times f(x_2|\beta)\times \cdots \times f(x_n|\beta) = \prod_{i=1}^n f(x_i|\beta) $$

and the log-likelihood as $$[2] \qquad l(\beta) = \sum_{i=1}^n \log[f(x_i|\beta)] $$
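
As a concrete sketch of [1] and [2] (the density and data here are made up for illustration; the question never states $f$), the product and the sum agree once you take logs:

```python
import numpy as np

# Hypothetical i.i.d. sample and an assumed density, purely for illustration:
# an exponential density f(x | beta) = beta * exp(-beta * x), x > 0.
rng = np.random.default_rng(0)
beta = 2.0
x = rng.exponential(scale=1.0 / beta, size=5)  # five draws x_1, ..., x_n

def f(x, beta):
    """Density of a single observation, f(x | beta)."""
    return beta * np.exp(-beta * x)

# [1]: the likelihood is the product of the individual densities.
L = np.prod(f(x, beta))

# [2]: the log-likelihood is the sum of the individual log-densities.
l = np.sum(np.log(f(x, beta)))

# Taking the log of [1] turns the product into the sum in [2].
assert np.isclose(np.log(L), l)
```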

The answer to the question then goes on to declare $$[3] \qquad l(\beta) = n \log[f(X|\beta)] $$

My question is: why can you change from a sum over $x_i$ in [2] to just an $X$ in [3]? Is [3] just shorthand for [2], or is there an important statistical concept or convention that I've not encountered?

From reading books and online searches it seems to be something to do with considering the whole distribution of $X$, but I've not found a proper or intuitive explanation of this. My intuition is that [3] is wrong and that it could only equal [2] if $x_1=x_2=\cdots=x_n$, but then I'm still confused as to why the $x$'s would change to an $X$.

Thanks in advance.

--- Edit with more context --- Thanks for people's help so far. I think I need to explain my question a bit better, so I'm going to add some context for the problem I'm trying to solve.

The problem that led me to ask this question was about deriving the Cramér–Rao lower bound using a formula involving the second derivative of $\log[f(x|\beta)]$.

From the book I'm using, I have the CRLB as $$[4] \qquad V(\hat{\beta}) \geq \frac{1}{I(\beta)} $$

and the information as $$[5] \qquad I(\beta) = n i(\beta) = E[-l''(\beta)] = E[U(\beta)^2] $$

I also have some extra information from the question, $$[5a] \qquad \frac{d}{d\beta}\log[f(x|\beta)] = \frac{1}{\beta} + \log[x] $$

from this I can get the second derivative $$[6] \qquad \frac{d^2}{d\beta^2}\log[f(x|\beta)] = \frac{-1}{\beta^2} $$
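
(As a side note, these derivatives are consistent with, for example, the density $f(x|\beta) = \beta x^{\beta-1}$ on $(0,1)$, though the question doesn't state $f$ explicitly: $$\log[f(x|\beta)] = \log\beta + (\beta-1)\log x \;\Rightarrow\; \frac{d}{d\beta}\log[f(x|\beta)] = \frac{1}{\beta} + \log x \;\Rightarrow\; \frac{d^2}{d\beta^2}\log[f(x|\beta)] = \frac{-1}{\beta^2}.)$$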

This is where I got stuck. From looking at the information given to me, I'm pretty sure I have to use the $I(\beta) = E[-l''(\beta)]$ version of [5] to find the CRLB, and the derivative I've been given heavily implies I have to use [2] to find the answer.

My logic for the next step was $$[7] \qquad I(\beta) = E\left[-\frac{d^2}{d\beta^2}\left(\sum_{i=1}^n \log[f(x_i|\beta)]\right)\right] $$

and you can put the derivative inside the sum to get $$[8] \qquad I(\beta) = E\left[-\sum_{i=1}^n\left(\frac{d^2}{d\beta^2}\log[f(x_i|\beta)]\right)\right] $$

Here is where I got stuck: I don't know if I can use [6] to solve [8], as [6] involves an $x$ whereas [8] involves an $x_i$.

I have the answer for this question provided to me, so I looked there for guidance but it was pretty unhelpful. I've copied it below in case it's useful to you guys

Book answer:

The CRLB is $\frac{1}{I(\beta)}$ and $I(\beta)=E(-l''(\beta)) $.

so $$ CRLB = \frac{-1}{n E\left(\frac{d^2}{d\beta^2}\log[f(X|\beta)]\right)} = \frac{-1}{n\frac{-1}{\beta^2}} = \frac{\beta^2}{n} $$

I'll call the last equation above [BA] for "book answer". I have a few questions about [BA]

  • I've been dealing with $f(x|\beta)$ throughout the question, why does it change to $f(X|\beta)$ now?

    • Also, if [8] is correct, why does $x_i$ change to $X$?
  • Where does the $n$ in [BA] come from?

I tried working backwards from [BA] towards my equation [7], that's where I got [3] from originally.

I think that I'm not understanding some part of the notation regarding $x_i$, $x$, and $X$. My current thinking is that $X$ is a random variable with some associated p.d.f., $x_i$ is the $i^{th}$ "draw" from $X$, and $x$ is all of the "draws" from $X$ collected in a vector. But I'm pretty sure this must be wrong.

Thanks again for your help :-)


There are various issues here:

  • It is common to use uppercase $\mathbf X$ and $X_i$ to indicate random variables, and to use lowercase $\mathbf x$ and $x_i$ to indicate particular values

  • It is common to use $\mathbf X$ and $\mathbf x$ to represent vectors or tuples (the alternatives $\overrightarrow{X}$ and $\overrightarrow{x}$ are rarely used in statistics), and to use $X_i$ and $x_i$ to indicate particular elements of the vectors

  • If you are sampling with replacement or from a continuous distribution, then there is typically an assumption that the random variables for the elements of the sample are independent and identically distributed. If so, the joint probability of the sample is the product of the individual probabilities (or the equivalent statement on densities). So the likelihood of the parameter given the observations is proportional to the product of the individual likelihoods; taking the logarithm of this means the log-likelihood of the parameter given the observations is proportional to the sum of the individual log-likelihoods.

  • Thus you do not need the observations to all be equal ($x_1=x_2=\cdots=x_n$) for this to work, but you do need the random variables $X_1,X_2,\ldots,X_n$ to be independent and identically distributed.

I would read

  • $f(\mathbf x \mid \beta)$ as the joint density equal to $\prod f(x_i \mid \beta)$,
  • the likelihood of $\beta$ given the observation $\mathbf x$ as being proportional to this,
  • the log-likelihood of $\beta$ given the observation $\mathbf x$ as being a constant plus $\sum \log( f(x_i \mid \beta))$, which might also be written as $\log( f(\mathbf x \mid \beta))$,
  • as this is a function of $\mathbf x$, it can be applied to $\mathbf X$ and written as $\log( f(\mathbf X \mid \beta))$, noting that such a function of a random variable is also a random variable

I cannot see where the $n$ in your [3] came from.
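
As a numerical sanity check that the two forms of the per-observation information in [5] agree, here is a Monte Carlo sketch assuming the density $f(x\mid\beta) = \beta x^{\beta-1}$ on $(0,1)$ (a guess on my part, chosen only because it matches the derivative quoted in the question):

```python
import numpy as np

# Monte Carlo check that E[U(beta)^2] = E[-d^2/dbeta^2 log f(X|beta)]
# under the assumed density f(x|beta) = beta * x**(beta - 1) on (0, 1).
# (This density is hypothetical; the original question never states f.)
rng = np.random.default_rng(0)
beta = 2.0
n_draws = 200_000

# Inverse-CDF sampling: F(x) = x**beta, so X = U**(1/beta), U ~ Uniform(0, 1).
X = rng.uniform(size=n_draws) ** (1.0 / beta)

score = 1.0 / beta + np.log(X)       # d/dbeta log f(X|beta), from the question
info_score = np.mean(score ** 2)     # Monte Carlo estimate of E[U(beta)^2]
info_curvature = 1.0 / beta ** 2     # -E[d^2/dbeta^2 log f] = 1/beta^2, from [6]

assert abs(info_score - info_curvature) < 0.01
```

Under i.i.d. sampling every term in the sum over $i$ shares this expectation, so $I(\beta) = n\,i(\beta) = n/\beta^2$, giving the CRLB $\beta^2/n$ as in [BA].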