Hypothesis testing when the statistic depends on the data


I am trying to understand why the usual p-value guarantee in hypothesis testing breaks down when the test statistic depends on the data. My understanding so far is the following: let $X$ be a random variable modeling the dataset. If I pick a statistic $\phi$ and evaluate it on an observed dataset $x$, the usual procedure is to reject the null hypothesis when $\mathbb{P}(\phi(X)\geq\phi(x)) \leq \alpha$ for some significance level $\alpha$, and not to reject it otherwise. If the true distribution really belongs to the null hypothesis $H_0$ (and the p-value $p(x) = \mathbb{P}(\phi(X)\geq\phi(x))$ is a continuous random variable), then $p(X) \sim \mathcal{U}[0,1]$, so the probability of a false discovery is $\mathbb{P}(p(X)\leq\alpha) \leq \alpha$.
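To convince myself of this numerically, here is a minimal simulation (my own sketch, not part of any standard reference: I assume a Gaussian null and take the sample mean as the fixed statistic $\phi$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 20000

# Null model: each dataset is n i.i.d. N(0,1) draws; the FIXED
# statistic is the sample mean, whose null distribution is N(0, 1/n).
data = rng.normal(size=(trials, n))
phi = data.mean(axis=1)

# Exact p-value p(x) = P(phi(X) >= phi(x)) under H0.
pvals = norm.sf(phi, scale=1 / np.sqrt(n))

# p(X) is Uniform[0,1] under H0, so the false-discovery rate at
# level alpha should come out close to alpha.
rate = (pvals <= alpha).mean()
```

In my runs `rate` stays very close to `alpha`, matching the uniformity argument.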

Now I consider some sort of dependence: in my setting I have a function $f$ that, given the dataset as input, returns the statistic $\phi$ to apply. Then the uniformity argument above no longer works, right?
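A toy example of such an $f$ (my own construction, to make the breakdown concrete): let $f$ choose between the right-tailed and the left-tailed mean test, picking whichever looks more extreme on the observed data, i.e. whichever yields the smaller p-value. Under $H_0$ the reported p-value is then $\min(p_1, p_2)$ with $p_1 + p_2 = 1$, so the rejection rate becomes $2\alpha$ instead of $\alpha$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, n, trials = 0.05, 30, 20000

data = rng.normal(size=(trials, n))
m = data.mean(axis=1)

# Two candidate statistics: phi1(x) = mean(x) and phi2(x) = -mean(x),
# each with an exact p-value under its own null distribution.
p1 = norm.sf(m, scale=1 / np.sqrt(n))
p2 = norm.sf(-m, scale=1 / np.sqrt(n))

# f(x) picks the statistic that looks more extreme on x, i.e. it
# reports the smaller p-value; this is no longer Uniform[0,1] under H0.
p_selected = np.minimum(p1, p2)

rate_fixed = (p1 <= alpha).mean()             # about alpha
rate_selected = (p_selected <= alpha).mean()  # about 2 * alpha
```

Choosing the statistic after seeing the data is exactly what makes the p-value's uniform null distribution fail here.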

I am trying to understand formally what happens here and why the guarantee fails. Let $$E=\{ (x,\phi) : \phi = f(x),\ H_0 \text{ is true},\ \mathbb{P}(\phi(X)\geq\phi(x))\leq \alpha\}.$$ By $E$ I denote the set of all pairs (dataset, statistic) that lead me to a wrong conclusion, and I want to understand the probability of this set (which should be the probability of making a false discovery).

My specific question is the following. If I fix the output of my function, $f(x)=y$, and consider the slice $E_y=\{x: (x,y)\in E\}$, can I say that $\mathbb{P}(X\in E_y)\leq \alpha$? The probability of $X$ lying in $E_y$ is exactly the probability of the event $\{H_0 \text{ is true}\} \cap \{\mathbb{P}(y(X')\geq y(X))\leq \alpha\}$ (writing $X'$ for an independent copy of $X$), which is at most $\mathbb{P}\big(\mathbb{P}(y(X')\geq y(X))\leq \alpha\big)$. Since I fixed $y$, do I have the guarantee? Or am I going in the wrong direction?
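For what it is worth, a small simulation of this fixed-$y$ slice (again my own construction: $f$ chooses between the right- and left-tailed mean test on Gaussian data) suggests that for each fixed $y$ the marginal probability $\mathbb{P}(X\in E_y)$ is indeed about $\alpha$; the inflation only appears for the aggregated event $\{X\in E_{f(X)}\}$, which is the union of the slices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
alpha, n, trials = 0.05, 30, 40000

data = rng.normal(size=(trials, n))
m = data.mean(axis=1)

# Candidate statistics: y1 = right-tailed mean test, y2 = left-tailed.
p_y1 = norm.sf(m, scale=1 / np.sqrt(n))
p_y2 = norm.sf(-m, scale=1 / np.sqrt(n))

# For each FIXED y, the slice E_y = {x : p_y(x) <= alpha} has
# probability about alpha, as the uniformity argument predicts.
rate_y1 = (p_y1 <= alpha).mean()
rate_y2 = (p_y2 <= alpha).mean()

# The actual false-discovery event uses y = f(x) chosen from the data
# (here: the test with the smaller p-value); it is the UNION of the
# slices, so its probability can be as large as 2 * alpha.
rate_union = (np.minimum(p_y1, p_y2) <= alpha).mean()
```

So under these assumptions the per-slice bound holds, but the overall false-discovery probability is only bounded by a union bound over the range of $f$.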