I have a memory experiment where, on each trial, a 7-letter scrambled word is presented, and after a delay the participant is shown the intact word and has to type the previously seen scrambled word exactly as it appeared. They do this for 100 trials, and I collect 50 subjects' worth of data. I score each trial from 0 to 7, where 7 means they got the order of letters exactly correct, 5 means they made a mistake on 2 of the 7 letters, and 0 means they completely messed up the order of the letters.
What distribution would make the most sense for modeling this discrete and ordinal data? I'm assuming that it could be best characterized by a mixture of two underlying distributions: a random guessing distribution (randomly inputting letter orderings) and a non-guessing distribution (the participant has at least some memory of the correct ordering). I am having trouble figuring out what kind of distribution (e.g., negative binomial, Gaussian, etc.) would make the most sense to characterize these distributions.
Any thoughts are appreciated!
More details: As an example, it would be like showing COPPRON for a fraction of a second, and then showing POPCORN and asking the subject to correctly type the previously seen scrambled word. I am looking for a distribution that can characterize the 0-7 point scale data I would receive. So for the non-guessing distribution, I want a distribution that can be bounded from 0 to 7 (or 0 to 1 if I normalize the data) where there is likely a peak centered around the most common score that the subject showcased (probably 5 or 6 depending on the difficulty). For the random guessing distribution, it's the same, but it's the distribution that results if we assume a subject always randomly typing the ordering of the letters.
I guess what I want to know is what type of distribution may be appropriate, in general, for characterizing discrete, bounded, ordinal data? The exact distribution parameters (like scale or shape) are not what I am interested in, rather the type of probability distribution that might make sense if we assume some sort of bounded distribution with a peak not at the boundaries.
For words with seven unique letters, the guessing distribution (where the subject simply writes the seven letters in some arbitrary order) has scores given by the Rencontres numbers $D_{n, k}$, which count the permutations of $n$ items that leave exactly $k$ items in their original positions. Fortunately, the Wikipedia article on Rencontres numbers tabulates the values for $n = 7$, so that the probability of a score of $k$ is simply
$$ \frac{D_{7, k}}{7!} = \frac{D_{7, k}}{5040} $$
where the exclamation point represents the factorial function. This can be approximated fairly closely by
$$ \frac{1}{k!e} $$
where $e \approx 2.71828$ is the base of the natural logarithm. This approximation is the Poisson distribution with mean 1, so the distribution is, to first order, independent of the number of letters.
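As a rough check on the two formulas above, here is a minimal Python sketch that computes the exact guessing probabilities $D_{7,k}/7!$ and prints them next to the $1/(k!\,e)$ approximation. It uses the standard identity $D_{n,k} = \binom{n}{k} D_{n-k,0}$; the function names `derangements` and `rencontres` are just illustrative choices.

```python
from math import comb, exp, factorial


def derangements(m):
    """Number of permutations of m items with no fixed point (D_{m,0})."""
    a, b = 1, 0  # D_0 = 1, D_1 = 0
    if m == 0:
        return a
    for i in range(2, m + 1):
        # Recurrence: D_i = (i - 1) * (D_{i-1} + D_{i-2})
        a, b = b, (i - 1) * (a + b)
    return b


def rencontres(n, k):
    """Number of permutations of n items with exactly k fixed points."""
    # Choose which k positions stay fixed, then derange the remaining n - k.
    return comb(n, k) * derangements(n - k)


n = 7
for k in range(n + 1):
    exact = rencontres(n, k) / factorial(n)
    approx = 1 / (factorial(k) * exp(1))
    print(f"k={k}  D={rencontres(n, k):5d}  exact={exact:.5f}  approx={approx:.5f}")
```

The approximation is close for small $k$ and breaks down only at the top of the range: a score of exactly 6 is impossible, since if six letters are in place the seventh must be too.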
For words with repeated letters (e.g., where there are four permutations of P$_1$O$_1$P$_2$CO$_2$RN which leave the spelling unchanged), the distribution has no easy characterization. It's probably not too difficult to write a bit of code to brute force the exact probabilities.
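A brute-force version might look like the following sketch. It assumes the score is the number of positions at which the typed string matches the shown string, and that a guesser types a uniformly random arrangement of the seven letter tiles; `guessing_counts` is a hypothetical helper name, and COPPRON is the scrambled string from the question.

```python
from collections import Counter
from itertools import permutations


def guessing_counts(shown):
    """Tally scores over every arrangement of the letters in `shown`.

    Repeated letters are treated as distinct tiles, so all n! arrangements
    are enumerated and equally likely. The score of an arrangement is the
    number of positions where it agrees with `shown` (an assumption about
    the scoring rule).
    """
    n = len(shown)
    counts = Counter(
        sum(a == b for a, b in zip(typed, shown))
        for typed in permutations(shown)
    )
    return [counts.get(k, 0) for k in range(n + 1)]


counts = guessing_counts("COPPRON")
total = sum(counts)  # 7! = 5040 arrangements
for k, c in enumerate(counts):
    print(f"score {k}: {c:4d} arrangements, probability {c / total:.5f}")
```

With only 5040 arrangements per word this runs almost instantly, so it is cheap to repeat for every stimulus if the words differ in their repeated letters.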
As to a non-guessing distribution, that depends a lot on what you expect someone might remember. There's no standard for that. In your position, I'd probably just tabulate the observed scores and see what you get in comparison to the guessing distribution. You can use something like a chi-squared goodness-of-fit test if you want something non-parametric, though it might be interesting to see if you get something that's easily parametrizable.
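One way to set up that comparison is sketched below, under some assumptions: all seven letters are unique, the high-score cells are pooled so that the expected count in the pooled cell is at least 5 (a common rule of thumb, not something specific to this problem), and the "observed" tallies are simulated from a pure guesser with a seeded random generator rather than taken from real data. The helper names are hypothetical.

```python
import random
from math import comb, factorial


def guessing_probs(n):
    """Exact score probabilities for a random guesser, n unique letters."""
    d = [1, 0]  # derangement numbers D_0, D_1
    for i in range(2, n + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return [comb(n, k) * d[n - k] / factorial(n) for k in range(n + 1)]


def pooled_chi_squared(observed, probs, min_expected=5.0):
    """Pearson chi-squared statistic with the upper tail pooled.

    Scores >= cut are merged into one cell, with cut lowered until the
    pooled expected count reaches min_expected. Assumes the lower cells
    already have adequate expected counts.
    """
    total = sum(observed)
    cut = len(probs) - 1
    while cut > 0 and total * sum(probs[cut:]) < min_expected:
        cut -= 1
    obs = list(observed[:cut]) + [sum(observed[cut:])]
    expected = [total * p for p in probs[:cut]] + [total * sum(probs[cut:])]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return stat, len(obs) - 1  # statistic, degrees of freedom


# Simulated stand-in for one subject's 100 trials (pure guessing, NOT real data).
rng = random.Random(0)
observed = [0] * 8
for _ in range(100):
    perm = list(range(7))
    rng.shuffle(perm)
    observed[sum(i == p for i, p in enumerate(perm))] += 1

probs = guessing_probs(7)
stat, dof = pooled_chi_squared(observed, probs)
print("observed:", observed)
print(f"chi-squared = {stat:.3f} on {dof} degrees of freedom")
```

Replacing `observed` with a real subject's tallies gives a statistic to compare against a chi-squared reference distribution with `dof` degrees of freedom; a large value would suggest the subject is not purely guessing.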