I am relatively new to statistical tests.
I have a hypothesis that I want to assess statistically, and I would like to know which metric (or statistical test) I should use for it. Below, I explain what I want:
I have a set $S$ of $M$ permutations, that is, $S = \{\Pi_{1}, \Pi_{2}, \ldots, \Pi_{M}\}$. The permutations in the set do not have equal lengths; however, their lengths are bounded by $N$. For example, $S = \{(2,1,3), (1,2,4,3,5), (1), (1)\}$. All we know is that each permutation in the set is a permutation of the numbers $\{1,2,3,\ldots,k\}$ for some $k \le N$.
My hypothesis asserts that every permutation in the set should be sorted in ascending order. I therefore want a single number that summarizes how far the set is from that, say, the average distance of the permutations in the set from being sorted in ascending order.
I want the metric to have these attributes:
- A sorted observation of greater length should be more valuable (more meaningful) than a sorted observation of smaller length. For example, $(1,2,3,4,5,6)$ is more meaningful than $(1,2,3)$.
- The length of the permutation should be involved in the metric. Specifically, since a sequence of length 1 is always sorted, the metric should not count such sequences.
- The metric should also consider the distance of the permutation from being sorted. For example, under the suggested metric, $(1,2,4,3,5,6)$ is better than $(1,6,2,3,4,5)$. I suspect this one may be achievable with the $l_{1}$ norm.
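
To make the third attribute concrete, here is a minimal sketch of what I mean by the $l_{1}$ idea: sum $|\Pi(j) - j|$ over all positions $j$. The function name `l1_distance_from_sorted` is only my own illustration, and I am assuming each permutation is given as a tuple of the numbers $1..k$ with 1-based positions.

```python
def l1_distance_from_sorted(perm):
    """Sum of |value - position| for a permutation of 1..k (1-based positions).

    This is an assumed reading of "distance from sorted via the l1 norm";
    it is 0 exactly when the permutation is in ascending order.
    """
    return sum(abs(value - position)
               for position, value in enumerate(perm, start=1))


# The two permutations from the third attribute above.
nearly_sorted = (1, 2, 4, 3, 5, 6)
less_sorted = (1, 6, 2, 3, 4, 5)

print(l1_distance_from_sorted(nearly_sorted))  # 2
print(l1_distance_from_sorted(less_sorted))    # 8
print(l1_distance_from_sorted((1,)))           # 0, a length-1 sequence is always sorted
```

On its own this only addresses the third attribute: it does not yet reward longer sorted permutations, and a length-1 permutation still scores a perfect 0.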
I was wondering if there is any statistical test (with correlation and p-value) or any metric that I could use.
One thing I was considering was derangements. For example, something like this: $\text{metric} = \frac{\sum_{i} (\text{number of elements not in place in } \Pi_{i})}{\sum_{i} d_{\operatorname{len}(\Pi_{i})}}$, where $d_{n}$ is the number of possible derangements of a permutation of length $n$. It satisfies the second property but not the first or the third.
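
A rough sketch of how I would compute this derangement-based ratio is below. The helper names `derangement_count` and `derangement_metric` are hypothetical, I am reading "numbers not in place" as a count of positions $j$ with $\Pi(j) \ne j$, and returning `None` when the denominator is zero is just my own choice for the degenerate case.

```python
from fractions import Fraction


def derangement_count(n):
    """Number of derangements d_n, using d_0 = 1, d_1 = 0,
    d_n = (n - 1) * (d_{n-1} + d_{n-2})."""
    prev, curr = 1, 0  # d_0, d_1
    if n == 0:
        return prev
    for m in range(2, n + 1):
        prev, curr = curr, (m - 1) * (curr + prev)
    return curr


def derangement_metric(perms):
    """Total count of out-of-place elements divided by the sum of d_len(perm)."""
    misplaced = sum(
        sum(1 for position, value in enumerate(perm, start=1) if value != position)
        for perm in perms
    )
    total_derangements = sum(derangement_count(len(perm)) for perm in perms)
    if total_derangements == 0:
        # Assumption: only length-1 (or empty) permutations, so the ratio is undefined.
        return None
    return Fraction(misplaced, total_derangements)


# The example set S from above.
S = [(2, 1, 3), (1, 2, 4, 3, 5), (1,), (1,)]
print(derangement_metric(S))  # 2/23, i.e. (2 + 2 + 0 + 0) / (2 + 44 + 0 + 0)
```

Since $d_{1} = 0$, permutations of length 1 add nothing to either the numerator or the denominator, which is why the second property holds.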
Finally, I should note that each $S$ is one observation from a larger set, $\text{observations} = \{S_{1}, S_{2}, \ldots\}$. Eventually, I want to apply the metric or statistical test to this set to see how close the permutations within each observation are to being sorted, and how statistically significant the hypothesis is.
Thanks.