Testing if two uniform random variables share the same distribution

I recently ran into this small probability/statistics problem, to which I could not find any immediate answers. I have a proposal for a robust solution to the problem and would like to know whether there is a more straightforward way to do this type of "hypothesis" testing. As background, I have a fair amount of study experience with hypothesis testing and frequentist statistics, but I don't practice them as a trade and it has been (too) long since I last discussed similar problems. It follows that whenever I move out of the normally distributed world and ''standard'' testing, things begin to feel weird.

Let us have two sets of data, $x_1,\dots,x_n$ and $y_1,\dots,y_m$, where $n$ and $m$ are the sample sizes of our data sets. We assume that the data sets are sampled from two uniform distributions. Under this assumption, I would like to test whether the uniform distributions are the same, i.e. whether they share the same support. For simplicity, I'm interested in the upper bounds of the supports, and thus a one-sided test is what we desire.

My method:

For clarity's sake, let's introduce two hypotheses $$H_0: b=d \quad \text{ and }\quad H_1: b< d, $$

where $b$ and $d$ correspond to the upper bounds of the supports of two sequences of random variables $X_1,\dots,X_n \sim U(a,b)$ and $Y_1,\dots,Y_m\sim U(a,d)$ with $a<b\leq d$. The random variables are independent. In other words, $H_0$ corresponds to the situation where in fact $X_i\sim U(a,d)$.

We estimate $b$ and $d$ with the reasonable estimates $\hat b := \max\{x_1,\dots,x_n\}$ and $\hat d :=\max\{y_1,\dots,y_m\}$, respectively. From our assumptions it follows that $\hat b \leq \hat d$. Similarly, $a$ is estimated with $\hat a := \min\{x_1,\dots,x_n\}$. We continue by using the corresponding estimator $\hat B :=\max\{X_1,\dots,X_n\}$ to estimate the probability $$P(\hat B > \hat b \vert H_0 \text{ holds}) \approx P(\hat B > \hat b \vert b = \hat d \text{ and }a=\hat a) = 1-\bigg(\frac{\hat b-\hat a}{\hat d-\hat a}\bigg)^n,$$

when $\hat a < \hat b < \hat d$. The last equality follows from the known cdf of the maximum of a sample of uniformly distributed random variables, with the unknown boundaries of the support replaced by $\hat a$ and $\hat d$.
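For concreteness, here is a small sketch of how this plug-in probability could be computed (the function name and the NumPy-based setup are my own choices, not part of the method above):

```python
import numpy as np

def max_test_probability(x, y):
    """Estimate P(max of n U(a, d) draws exceeds the observed max of x),
    plugging in a_hat = min(x) and d_hat = max(y) for the unknown support,
    i.e. 1 - ((b_hat - a_hat) / (d_hat - a_hat))**n."""
    x = np.asarray(x)
    y = np.asarray(y)
    a_hat, b_hat, d_hat = x.min(), x.max(), y.max()
    n = len(x)
    return 1.0 - ((b_hat - a_hat) / (d_hat - a_hat)) ** n

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)  # U(0, 1) sample
y = rng.uniform(0.0, 2.0, size=50)  # U(0, 2) sample, so b < d here
print(max_test_probability(x, y))   # close to 1 when b is well below d
```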

The probability above should give a fairly good estimate of the probability of observing a larger sample maximum than the one we gathered, if the support of the underlying distribution indeed has the upper bound $d$ instead of $b$. If the calculated probability is ''small enough'' or otherwise to my liking, it would be reasonable to argue that $\hat b \approx b$ is close enough to $\hat d \approx d$ for me to support the null hypothesis. (Alternatively, if the probability is high, it suggests that $\hat b$ is far from $d$.)
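To get a feel for how this behaves under the null hypothesis, one could run a quick Monte Carlo check. The sketch below (all parameter values are arbitrary choices of mine) uses the equivalent complementary quantity $P(\hat B \leq \hat b \mid H_0)$ as a p-value-like statistic and rejects when it is small; cases with $\hat b > \hat d$ are clamped to 1, i.e. treated as no evidence for $H_1$:

```python
import numpy as np

# Monte Carlo check of the test under H0: both samples drawn from U(0, 1).
rng = np.random.default_rng(1)
n, m, trials, alpha = 50, 50, 2000, 0.05
rejections = 0
for _ in range(trials):
    x = rng.uniform(0.0, 1.0, size=n)
    y = rng.uniform(0.0, 1.0, size=m)
    a_hat, b_hat, d_hat = x.min(), x.max(), y.max()
    # P(B_hat <= b_hat | H0), the complement of the probability in the text;
    # values exceeding 1 (when b_hat > d_hat) are clamped.
    p_stat = min(1.0, ((b_hat - a_hat) / (d_hat - a_hat)) ** n)
    if p_stat < alpha:
        rejections += 1
print(rejections / trials)  # empirical rejection rate under H0
```

In my understanding the plug-in estimates make this slightly conservative rather than exactly calibrated at level $\alpha$, which is part of why I am asking question 2 below.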

That is, my questions are as follows:

  1. Is this a sensible approach? Are there other or better ways to test whether two uniform distributions share the same support?
  2. Is the approximation in calculating the conditional probability justified?
  3. Somehow it seems that a more sensible approach would be to examine the difference $\hat d - \hat b$ directly, but I don't have a clear idea of how to tell when the difference is ''large enough'' given the sample sizes. Any clues?
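To illustrate what I mean in question 3, one could approximate the null distribution of $\hat d - \hat b$ by simulation; the sketch below fits the uniform support to the pooled sample range, which is my own ad hoc assumption rather than a justified procedure:

```python
import numpy as np

def diff_critical_value(x, y, alpha=0.05, trials=10000, seed=0):
    """Monte Carlo sketch: approximate the null distribution of
    d_hat - b_hat when both samples come from the same uniform
    distribution, using the pooled sample range as a stand-in for (a, d).
    Returns the (1 - alpha) quantile of the simulated differences."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(x), np.asarray(y)])
    a_hat, d_hat = pooled.min(), pooled.max()
    n, m = len(x), len(y)
    sim_b = rng.uniform(a_hat, d_hat, size=(trials, n)).max(axis=1)
    sim_d = rng.uniform(a_hat, d_hat, size=(trials, m)).max(axis=1)
    return np.quantile(sim_d - sim_b, 1.0 - alpha)
```

One would then call the observed difference ''large enough'' if it exceeds this quantile, but I am unsure whether the pooled-range plug-in is defensible, hence the question.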

Thanks!