Can we replace $N$ iid Gaussian measurements with a single measurement that also has a Gaussian distribution? A book's treatment of this really confused me.
Suppose we have data from multiple related groups. For example, $x_{ij}$ can be the test score for student $i$ in school $j$, for $j = 1:D$ and $i = 1:N_j$. We want to estimate the mean score for each school, $\theta_j$, and we assume that the $\theta_j$ are drawn from a common distribution $N(\mu, r^2)$. The joint distribution has the form: $$p(\theta,X |\mu,r,\sigma) = \prod_{j=1}^D N(\theta_j|\mu,r^2) \prod_{i=1}^{N_j} N(x_{ij} | \theta_j,\sigma^2)$$
Once we have estimated $(\mu, r)$, we can compute the posteriors over the $\theta_j$'s. To do that, it simplifies matters to rewrite the joint distribution in the following form, exploiting the fact that $N_j$ Gaussian measurements with values $x_{ij}$ and variance $\sigma^2$ are equivalent to one measurement of value $y_j := (1/N_j)\sum_{i=1}^{N_j} x_{ij}$ with variance $\sigma_j^2 := \sigma^2/N_j$.
This yields $$p(\theta,X |\hat \mu,\hat r,\sigma) = \prod_{j=1}^D \left[ N(\theta_j|\hat\mu,\hat r^2)\, N(y_j | \theta_j,\sigma_j^2) \right]$$
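The claimed equivalence can be checked numerically: as a function of $\theta$, the log-likelihood of the full data differs from the log-likelihood of the single "pooled" measurement $y$ only by an additive constant. A minimal sketch (the variable names and the specific parameter values are my own choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, theta_true, sigma = 8, 2.0, 1.5
x = rng.normal(theta_true, sigma, size=N)  # N iid Gaussian measurements
y = x.mean()                               # the pooled measurement (sample mean)

# Log-likelihood of the full data vs. the single pooled measurement,
# evaluated on a grid of candidate means theta.
thetas = np.linspace(-2, 6, 201)
ll_full = np.array([norm.logpdf(x, t, sigma).sum() for t in thetas])
ll_mean = norm.logpdf(y, thetas, sigma / np.sqrt(N))

# The two curves differ by a constant that does not depend on theta,
# so they carry exactly the same information about theta.
diff = ll_full - ll_mean
print(np.allclose(diff, diff[0]))  # True
```

The constant offset is $-\frac{1}{2\sigma^2}\sum_i (x_i - \bar x)^2$ plus normalization terms, none of which involve $\theta$; it cancels once the posterior is normalized, which is why the equivalence is stated at the level of densities even though the two are not equal as numbers.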
But a particular event $A$, in which $X_{1j}= x_{1j},\dots, X_{N_j j} = x_{N_j j}$, is only a subset of the event $B = \{(x_{1j},\dots,x_{N_j j}) : (1/N_j)\sum_{i=1}^{N_j} x_{ij} = y_j\}$, so the two densities cannot literally be equal.
Can someone explain?
The real question here is why we can substitute the whole dataset with a sufficient statistic when computing the posterior.
This is best summarized by the Fisher–Neyman factorization: if $T$ is sufficient for $\theta$, the likelihood factors as $p(x|\theta) = h(x)\, g(T(x), \theta)$, so $$p(\theta|x) \propto p(\theta)\, p(x|\theta) \propto p(\theta)\, g(T(x), \theta),$$ which depends on the data only through $T(x)$.
As for computing the posterior: upon observing $(x_1,\dots,x_n)$, we can map the data to the statistic $t = T(x_1,\dots,x_n)$. Since sufficiency means the likelihood depends on the data only through $t$, this simplifies the calculation, and we can work with the pdf of the statistic, $p(t(x)|\theta)$, instead of the whole dataset.
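To make this concrete, here is a numeric check that conditioning on the sufficient statistic gives the same posterior as conditioning on all the data, for a Normal prior and Normal likelihood with known variance. The prior parameters and grid are hypothetical choices for the demonstration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, sigma = 10, 2.0
mu0, tau = 0.0, 3.0                    # prior: theta ~ N(mu0, tau^2)
x = rng.normal(1.0, sigma, size=N)     # observed data
y, s_y = x.mean(), sigma / np.sqrt(N)  # sufficient statistic and its std dev

thetas = np.linspace(-5.0, 5.0, 401)   # grid of candidate theta values

def normalize(logp):
    """Turn unnormalized log-probabilities on the grid into probabilities."""
    p = np.exp(logp - logp.max())
    return p / p.sum()

# Posterior on the grid using every single data point ...
post_full = normalize(
    norm.logpdf(thetas, mu0, tau)
    + np.array([norm.logpdf(x, t, sigma).sum() for t in thetas])
)
# ... and using only the sufficient statistic (y, sigma^2 / N).
post_stat = normalize(
    norm.logpdf(thetas, mu0, tau) + norm.logpdf(y, thetas, s_y)
)

print(np.allclose(post_full, post_stat))  # True
```

The $\theta$-independent factor $h(x)$ in the factorization is absorbed by `normalize`, which is exactly why replacing the full data with $(y_j, \sigma^2/N_j)$ in the school example leaves every posterior over $\theta_j$ unchanged.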