Suppose we are given an initial dataset $x$. From this dataset, we can compute a test statistic with an appropriate hypothesis test (e.g., a one-sample t-test yields the $t$ statistic).
What I am looking for is a way to change the test statistic $t$ to a different value $t_{new}$ and then generate as many new datasets $x^*$ as desired that attain $t_{new}$ while still being constrained by a descriptive statistic of the original dataset, such as the mean $\overline{x}$ or the standard deviation $\sigma_{x}$.
I am currently studying a method based on simulated annealing, which is flexible enough to include as many constraints as needed, but it is slow. Two other classes of algorithms I am looking into are evolutionary programming and, the one I am most curious about, an adversarial approach using constrained GANs.
My assumption is that one could train a network for one specific test by feeding it datasets together with their test statistic $t$, and teach the generator to produce data whose computed test statistic $t_{comput}$ is close to the input test statistic $t_{input}$:
$$ t_{comput} \simeq t_{input} $$
I am curious if any of you have had a similar problem, have any tips for me or would suggest a different approach altogether?
Thank you for your help.
I'm not sure I understand what you want or why. So my first guess may be far too simple.
Suppose we test $H_0:\mu = 30$ against $H_1: \mu \ne 30$ with $n = 100$ observations from $\mathsf{Norm}(\mu = 35, \sigma = 5).$
In R, I generate data `x`, rounded to four decimal places, for an example. I show the seed so you can get exactly the same dataset if you wish.

Now I want to constrain the situation to keep the sample mean $a = 35.53788,$ $t = 10.95725,$ $n = 100,$ and hypothetical mean 30. If I want a new dataset meeting these constraints, it must also have sample standard deviation $s = 5.054078,$ because $t = (a - 30)/(s/\sqrt{n})$ determines $s$ once $a,$ $t,$ and $n$ are fixed.
To do this, I sample `z` of length 100 from the standard normal distribution.

Thus, simply by rescaling and shifting `z`, I have gotten a new data vector `x1` with exactly the same t statistic as `x`. If I want it rounded to three places, that does not change the t statistic by much.

This seems too simple, so I await your comment.
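
For concreteness, the rescale-and-shift idea can be sketched as follows. This is a minimal Python sketch rather than the R code used above, and it rests on a few assumptions: the function names `t_statistic` and `dataset_with_t` and the seed are hypothetical, only the standard library is used, and $(a - \mu_0)/t$ is assumed positive so that the implied standard deviation is positive.

```python
import math
import random
import statistics


def t_statistic(data, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))


def dataset_with_t(n, mean, t, mu0, rng):
    """Return n values with sample mean `mean` and one-sample t statistic `t`.

    Assumes (mean - mu0) / t > 0, so the implied sample SD is positive.
    """
    # Solve t = (mean - mu0) / (s / sqrt(n)) for the required sample SD.
    s = math.sqrt(n) * (mean - mu0) / t
    # Draw from a standard normal, then standardize the draw exactly
    # (sample mean 0, sample SD 1) before rescaling and shifting.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_mean, z_sd = statistics.mean(z), statistics.stdev(z)
    return [mean + s * (zi - z_mean) / z_sd for zi in z]


rng = random.Random(2024)  # hypothetical seed, not the one from the answer
x1 = dataset_with_t(n=100, mean=35.53788, t=10.95725, mu0=30, rng=rng)
x1_rounded = [round(v, 3) for v in x1]  # rounding perturbs t only slightly

print(statistics.mean(x1), statistics.stdev(x1), t_statistic(x1, 30))
print(t_statistic(x1_rounded, 30))
```

Each call with a fresh draw of `z` gives a different dataset with the same mean, standard deviation, and t statistic, up to floating-point error. Passing a different `t` gives datasets for a new target $t_{new}$ with the mean held fixed.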