Comparing two datasets that have different distributions

I'm currently analysing two datasets. They report the same information, but in different ways. I am looking to draw comparisons between the way items fail in each of the datasets.

  • In the first dataset we observe raw data: Time between failures of an item, $T_{ij}$, and the number of failures, $N_i$. Here, $j= 1 \ldots N_i$ for item $i$ and there are $i = 1 \ldots n$ items. Thus, $i$ represents the item number and $j$ represents the event number (failure) of that particular item.

  • In the other dataset we only see aggregates of these values: $T$ and $n$, where

    • $T = \sum_{i=1}^{n} \sum_{j=1}^{N_i} T_{ij}$, the sum of all failure times.
    • $n$ is the number of units
    • We don't see any of the $N_i$s, so we don't know how many failures each item had.
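To make the difference between the two layouts concrete, here is a minimal sketch. The numbers are made-up toy values, and `aggregate` is a hypothetical helper name, not something from either dataset. It shows that two quite different raw datasets collapse to the same $(T, n)$, which is exactly why the $N_i$ cannot be read back from the second dataset.

```python
# Hypothetical toy data: raw[i][j] plays the role of T_ij.
raw = [
    [2.0, 0.5, 1.5],  # item 1, N_1 = 3
    [4.0],            # item 2, N_2 = 1
    [1.0, 1.0],       # item 3, N_3 = 2
]

def aggregate(raw):
    """Collapse raw per-item times to what the second dataset reports."""
    T = sum(sum(item) for item in raw)  # sum over all i and j of T_ij
    n = len(raw)                        # number of items
    return T, n

T, n = aggregate(raw)

# A different (also made-up) raw dataset with one failure per item
# gives the same aggregates, so the N_i are lost in aggregation.
other = [[5.0], [2.5], [2.5]]
T_other, n_other = aggregate(other)
```

Both calls return `(10.0, 3)` even though the failure counts per item differ.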

The assumptions we are using are: \begin{align} T_i \mid \lambda_i &\sim \operatorname{Exp}(\lambda_i) \qquad \text{($\lambda_i$ is the rate of the exponential distribution)} \\ \lambda_i &\sim \operatorname{Gamma}(\alpha, \beta) \\ \therefore T_i &\sim \operatorname{Pareto}(\alpha, \beta) \end{align}
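For reference, the step from the exponential and gamma assumptions to the Pareto marginal can be written out as follows. This is a sketch that assumes $\beta$ is the rate (not the scale) of the gamma distribution, which the post does not state explicitly; under that assumption the result is the Pareto type II (Lomax) form with shape $\alpha$ and scale $\beta$. \begin{align} P(T_i > t) &= \int_0^\infty e^{-\lambda t} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \, d\lambda \\ &= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{(\beta + t)^{\alpha}} = \left( \frac{\beta}{\beta + t} \right)^{\alpha}, \\ f_{T_i}(t) &= \frac{\alpha \beta^\alpha}{(\beta + t)^{\alpha + 1}}, \qquad t \ge 0. \end{align}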

or in terms of the Poisson process: \begin{align} N_i \mid \lambda_i &\sim \operatorname{Poisson}(\lambda_i) \\ \lambda_i &\sim \operatorname{Gamma}(\alpha, \beta) \end{align}
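Integrating out $\lambda_i$ in the Poisson formulation gives a negative binomial marginal for the counts. As a sketch, again assuming $\beta$ is the gamma rate and that every item is observed for one unit of time (so the Poisson mean is $\lambda_i$ itself): \begin{align} P(N_i = k) &= \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \, d\lambda \\ &= \frac{\Gamma(\alpha + k)}{k! \, \Gamma(\alpha)} \left( \frac{\beta}{1 + \beta} \right)^{\alpha} \left( \frac{1}{1 + \beta} \right)^{k}, \qquad k = 0, 1, 2, \ldots \end{align}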

Our goal is to estimate $\alpha$ and $\beta$, and use these estimates to compare the two datasets. This is trivial for the first dataset, where we get the raw measurements, but how would you do it for the second dataset, where we only see the aggregates? What would be the best way of making comparisons between the two?
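For concreteness, the raw-data case could be handled along these lines. This is only a sketch under stated assumptions, not necessarily how it was actually done: $\beta$ is taken to be the gamma rate, each $N_i$ is treated as fixed, and the times for item $i$ are i.i.d. exponential given $\lambda_i$. Integrating out $\lambda_i$ then gives the per-item likelihood $\beta^\alpha \Gamma(\alpha + N_i) / \{\Gamma(\alpha) (\beta + S_i)^{\alpha + N_i}\}$ with $S_i = \sum_j T_{ij}$. The function names, the simulated data, and the crude grid search (chosen to avoid any dependency beyond the standard library) are all illustrative.

```python
import math
import random

def log_likelihood(alpha, beta, counts, sums):
    """Marginal log-likelihood of the raw data with each lambda_i integrated out.

    counts[i] is N_i and sums[i] is S_i = sum_j T_ij. Assumes beta is the
    gamma *rate* and that the N_i are fixed (assumptions, see the text above).
    """
    total = 0.0
    for N, S in zip(counts, sums):
        total += (alpha * math.log(beta) - math.lgamma(alpha)
                  + math.lgamma(alpha + N) - (alpha + N) * math.log(beta + S))
    return total

def fit_grid(counts, sums, lo=-3.0, hi=3.0, steps=13, rounds=6):
    """Crude coarse-to-fine grid search over (log alpha, log beta)."""
    a_lo, a_hi, b_lo, b_hi = lo, hi, lo, hi
    best = None  # (log-likelihood, log alpha, log beta)
    for _ in range(rounds):
        da = (a_hi - a_lo) / (steps - 1)
        db = (b_hi - b_lo) / (steps - 1)
        for ka in range(steps):
            for kb in range(steps):
                a, b = a_lo + ka * da, b_lo + kb * db
                ll = log_likelihood(math.exp(a), math.exp(b), counts, sums)
                if best is None or ll > best[0]:
                    best = (ll, a, b)
        # Zoom in to one grid cell either side of the best point so far.
        _, a, b = best
        a_lo, a_hi, b_lo, b_hi = a - da, a + da, b - db, b + db
    return math.exp(best[1]), math.exp(best[2])

# Simulated stand-in for the first dataset (hypothetical parameter values).
rng = random.Random(0)
true_alpha, true_beta = 2.0, 3.0
raw = []
for _ in range(200):
    lam = rng.gammavariate(true_alpha, 1.0 / true_beta)  # shape, scale = 1/rate
    raw.append([rng.expovariate(lam) for _ in range(5)])  # 5 failures per item

counts = [len(item) for item in raw]  # N_i
sums = [sum(item) for item in raw]    # S_i
alpha_hat, beta_hat = fit_grid(counts, sums)
```

Note that this likelihood needs the per-item pairs $(N_i, S_i)$, which is precisely the information the aggregated dataset does not provide.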