*Added an Addendum at the end
Hopefully my Title isn't too vague, but I will try to elaborate here. I posted a similar question: Given 10 random letters where the number of repeated letters is known (i.e. 3,2,1,1,1,1,1), what's the formula for finding the number of combinations? And it looks like I understand the formula for getting permutations once I have the multiset, but I don't understand how to get the multisets to begin with. So this question is just about generating the multisets themselves.
I am not sure if I am using the correct terminology (distribution, possible, unique, combinations, etc.) as I am not a mathematician or student taking a class on this. I am a Software Performance Engineer trying to solve a problem, but I need to understand the problem first. So, please refrain from using terminology, symbols, or expressions that would only be understood by someone who has a deep understanding of probability and multisets to begin with.
My knowledge on this subject is only what I have been able to learn in the past 2-3 days. If you are going to use symbols, shorthand, or terminology specific to this type of math, then please explain what it means. As I found in my other question, (26 1) apparently means something way different than (5 3) and somehow (5 3) = 10!/3!2!, but (26 1) = 26/1... I don't understand how I am supposed to know that or understand it. Also, the Union, Element, and Sum symbols that I have seen in formulas related to this topic (i.e. $|A|=\sum_{x\in \operatorname{Supp}(A)} m_A(x)=\sum_{x\in U} m_A(x)$) don't make sense to me as they have completely different meanings and uses in my field of work. Please try to use math symbols and explanations that can be understood by anyone and don't skip steps when possible.
With that out of the way, given the 26 letters of the English alphabet, how do I find the number of possible multisets for 4 random letters, and then for 5 random letters? To understand it completely, I am looking for all possible combinations of letters. I think that if I can get formulas and explanations that can be applied to those situations, then I should be able to take those and apply them to the 10 letter problem that I am really trying to solve.
Starting with 4, if each letter is unique: 1111 (apxz). Then if there is a duplicate: 211 (aact). Then if there are 2 duplicates: 22 (nnii). Then if there is a triplet: 31 (oooy). And finally a quadruplet: 4 (rrrr). Then I need to do the same for 5 letters. So, 11111 (abcde); 2111 (uuakl); 221 (ppjjx); 311 (mmmsc); 32 (hhhww); 41 (qqqqz).
Please don't get hung up on the letter examples that I have given, as those are just one potential combination based on the distribution, but I am looking for all possible combinations for that distribution. For the 41 example, I could just as easily have used (aaaab), (zzzzq), (jjjjs), etc. because every letter combination is equally possible. I am just trying to figure out how many possible ways there are given a known number of repeated letters.
Remember, I am not looking for just the answer, but rather how you go about finding the answer. I need to know which formula to use and why. If I am only given answers that apply to specific scenarios, then there is no way that I can use them to solve for future problems. I am trying to learn to fish, not just be handed a fish.
Thank you in advance.
Addendum:
I'm including a few examples to hopefully illustrate what I am trying to ask. Let's say I have a four letter pattern of "evet". In the final answer to the real world problem I am looking to solve, the order will matter, but for this question the order doesn't matter. So, "veet", "eetv", "vtee" are all the same as "evet" for the purposes of this question.
So, I have that one set of 4 letters, but I need to know how many other variations fit the same pattern. Instead of "evet", I could have gotten "avat", "bbxy", or "ossp". And "avat" = "vaat", "tvaa", etc. "bbxy" = "bbyx", "xybb", "bxyb", etc. (I probably didn't need to reiterate that, but since it seems like people are going out of their way to misunderstand my question, it might have been necessary.) I could have gotten any combination of letters so long as one of them is a couple and the other two are monuples. So, how do I write a formula where I can take the number of possible letters (26), then apply the 211 distribution of letters to it so that I get the total number of possible 4 letter words with one letter doubled - the total number of variants that are in a different order, but still contain the same letter combination?
After I have that, I will need to be able to do the same if I started with "eett" instead. Will the same formula for 211 work for 22? If not, why not? Maybe the problem is that people are answering with simplified formulas for easier versions, but I need one that I can apply universally, where n=pool of possible letters (26), k=number of selected letters (4 in these examples, but it could just as easily be 5, 10, 20, 100, 1000, etc.), and z=the known number of repeated letters ({2,2} in this example, but it could just as easily be {4}, {3,1}, {1,1,1,1}, or {2,1,1} when k=4, or it could be {51,25,14,5,3,2}, {95,3,1,1}, etc. when k=100, or {22,17,8,2,1}, {37,5,5,1,1,1}, etc. when k=50).
I keep seeing answers like (26 1)(25 2) or (26 1)×(25 2)×(23 2), but don't understand where these are coming from. I see there is a formula for binomial coefficients: n!/(k!(n−k)!), which I can use to get the (26 1) if n=26 and k=1, so 26!/(1!(26-1)!) = (26×25!)/(1×25!) = 26/1. Then I get where the 25 comes from, as a letter is already accounted for from the previous step and thus is not in the pool of possible letters. But why 2? Where does that come from? If we are using 211, why is (25 2) used? Why not (25 1)(24 1)? That gets a different answer because nothing is divided by 2, but I don't see why a 1 was used for the 2 in 211, but a 2 is used for the 11 in 211. And I haven't seen the breakdown answer for k=4 with z=22, but I suspect that neither (26 4) nor (26 2) will work based on previous responses. It's probably closer to (26 1)*(25 2) or something like that, but I have no way of knowing currently.
Assume that the alphabet has $(26)$ characters, and that any Multiset will be drawing its letters from the alphabet.
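First, I need to define the term multiplicity, as it relates to a Multiset. The multiplicity of a letter is the number of times that the letter occurs in the Multiset. Consider the Multiset $G = \{A,A,A,A,B,B,C,C,D,E\}$. This Multiset has $1$ letter $(A)$ with multiplicity $4$, $2$ letters $(B$ and $C)$ with multiplicity $2$, and $2$ letters $(D$ and $E)$ with multiplicity $1$.
In order to enumerate how many Multisets there are, for a specific Multiset pattern, you have to do two things: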
Identify how many distinct multiplicities occur within the Multiset.
For each distinct multiplicity, identify how many letters have this multiplicity.
If you identify a Multiset pattern by its multiplicities, then the Multiset $G$ described above would follow the pattern $\{4,2,2,1,1\}$.
When enumerating the number of Multisets that follow this pattern, there are only two relevant characteristics:
How many distinct multiplicities there are.
Within each distinct multiplicity, how many letters have this multiplicity.
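For readers who think in code, the two characteristics above can be read off mechanically. The following is only a minimal sketch in Python; the variable names (`multiset`, `pattern`, `letters_per_multiplicity`) are illustrative choices and not part of the original explanation.

```python
from collections import Counter

# The example Multiset from above, written as a string of letters.
multiset = "AAAABBCCDE"

# Multiplicity of each letter: how many times it occurs.
letter_counts = Counter(multiset)

# The pattern is just the list of multiplicities, largest first.
pattern = sorted(letter_counts.values(), reverse=True)

# For each distinct multiplicity, how many letters share it.
letters_per_multiplicity = Counter(pattern)

print(pattern)                         # [4, 2, 2, 1, 1]
print(dict(letters_per_multiplicity))  # {4: 1, 2: 2, 1: 2}
```

The second printout says: one letter has multiplicity 4, two letters have multiplicity 2, and two letters have multiplicity 1. Only the values $1, 2, 2$ matter for the count.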
What this indicates is that (for example), the number of distinct Multisets that follow the pattern $\{4,2,2,1,1\}$ is the exact same as the number of distinct Multisets that (for example) follow the pattern $\{17, 13, 13, 11, 11\}$.
I will illustrate these ideas by enumerating the number of distinct Multisets that follow the pattern $\{4,2,2,1,1\}$ and distinct Multisets that follow the pattern $\{17,13,13, 11,11\}.$
For $\{4,2,2,1,1\}$, you have to choose $1$ letter that will serve as the letter in the Multiset that has multiplicity $(4)$.
Then, you have to choose $2$ letters from the remaining letters of the alphabet. These $2$ letters will each serve as the letters in the Multiset that have multiplicity $(2)$.
Then, you have to choose $2$ letters from the remaining letters of the alphabet. These last $2$ letters will serve as the letters in the Multiset that have multiplicity $(1)$.
So, you have to choose letters $(3)$ times, because you have $(3)$ distinct multiplicities in the $\{4,2,2,1,1\}$ pattern. Further, for each of Selection-1, Selection-2, Selection-3, you must choose $1,2,$ and $2$ letters respectively.
So, the enumeration is
$$\binom{26}{1} \times \binom{[26 - 1]}{2} \times \binom{[26 - 1 - 2]}{2}. \tag1 $$
So, the $3$ binomial factors represent that there are $(3)$ distinct multiplicities. In each binomial factor $~\displaystyle \binom{n}{k}~$, the $k$ component represents how many different letters share the same multiplicity.
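As a sanity check, expression (1) can be evaluated numerically. This sketch assumes Python 3.8 or later, where `math.comb(n, k)` computes $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, i.e. the number of ways to pick $k$ different items out of $n$ when the order of picking does not matter. The variable names are illustrative only.

```python
from math import comb

# One factor per distinct multiplicity in the pattern {4,2,2,1,1}.
first = comb(26, 1)           # pick the 1 letter that appears 4 times
second = comb(26 - 1, 2)      # pick the 2 letters that appear twice
third = comb(26 - 1 - 2, 2)   # pick the 2 letters that appear once

total = first * second * third

print(first, second, third)   # 26 300 253
print(total)                  # 1973400
```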
Now, consider the Multiset whose pattern is (for example) $\{17,13,13, 11,11\}.$ The enumeration in (1) above is the exact same for this Multiset pattern as it was for the $\{4,2,2,1,1\}$ pattern.
This is because each pattern involved $3$ distinct multiplicities, and each multiplicity had $1,2,$ and $2$ letters respectively that shared this multiplicity.
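The claim that only the shape $1,2,2$ matters can be checked by brute force on a small case. The sketch below is an assumption-laden illustration rather than part of the original argument: it uses a hypothetical 6-letter alphabet (so that listing every Multiset is fast) and compares $\{4,2,2,1,1\}$ with $\{3,2,2,1,1\}$, a smaller stand-in for $\{17,13,13,11,11\}$ that has the same shape. The helper name `brute_force_count` is made up for this example.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

ALPHABET = "ABCDEF"  # hypothetical small alphabet, 6 letters instead of 26


def brute_force_count(pattern, alphabet=ALPHABET):
    """List every Multiset of the right size and count those matching the pattern."""
    target = sorted(pattern)
    size = sum(pattern)
    return sum(
        1
        # Each tuple produced here is one Multiset (order is ignored).
        for letters in combinations_with_replacement(alphabet, size)
        if sorted(Counter(letters).values()) == target
    )


# Same reasoning as (1), but with 6 letters available instead of 26.
formula = comb(6, 1) * comb(6 - 1, 2) * comb(6 - 1 - 2, 2)

print(formula)                             # 180
print(brute_force_count([4, 2, 2, 1, 1]))  # 180
print(brute_force_count([3, 2, 2, 1, 1]))  # 180
```

Both patterns give the same count, even though the Multisets have different sizes (10 letters versus 9 letters), because in each case the choices are: 1 letter, then 2 letters, then 2 letters.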
A different perspective, when enumerating the $\{4,2,2,1,1\}$ pattern is that first you choose $1$ specific letter that will have multiplicity $(4)$. Suppose that you choose the letter $K$. Then, when you go to choose the next $3$ letters of the alphabet, for this Multiset, you have no choice. The next $3$ letters chosen must be $K,K,K$. This is because you chose the letter $K$ to have multiplicity $(4)$.
Then, suppose that the next two letters, each of which will have multiplicity $2$ are chosen to be $N$ and $C$. Then, the next two letters that you choose must also be $N$ and $C$, to (again) conform to the multiplicity assigned to the $N$ and $C$ letters.
Take a different example: to enumerate the Multiset that follows the pattern $\{3,2,1,1,1,1,1\}$ you would:
Count the number of distinct multiplicities.
In this case there are $(3)$.
Count how many letters share each of the multiplicities.
In this case, the number of letters are $1,1,5,$ respectively.
So the enumeration here would be
$$\binom{26}{1} \times \binom{25}{1} \times \binom{24}{5}.\tag2 $$
Again, note that in the $~\displaystyle \binom{n}{k}~$ factors, in (2) above, the $k$ components follow the pattern $1,1,5$ specifically because the number of letters that share each of the three distinct multiplicities is $1,1,5$ respectively.
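Expression (2) can be evaluated the same way. The sketch below (Python 3.8+, illustrative variable names) also shows what goes wrong if the five single letters are instead picked one at a time, as in $(24\ 1)(23\ 1)\cdots$: every group of five letters is then counted once for each of the $5! = 120$ orders in which it could have been picked, which is exactly the overcount that the division inside $\binom{24}{5}$ removes.

```python
from math import comb, factorial, perm

# Expression (2): pattern {3,2,1,1,1,1,1}.
total = comb(26, 1) * comb(25, 1) * comb(24, 5)
print(total)  # 27627600

# Picking the five single letters one at a time: 24 * 23 * 22 * 21 * 20.
one_at_a_time = perm(24, 5)
print(one_at_a_time)  # 5100480

# Each set of five letters was counted 5! = 120 times, once per picking order.
print(one_at_a_time // factorial(5))  # 42504
print(comb(24, 5))                    # 42504
```

The same reasoning explains why the $\{2,1,1\}$ pattern uses $\binom{25}{2}$ rather than $25 \times 24$: the two single letters are interchangeable, so picking $x$ then $y$ gives the same Multiset as picking $y$ then $x$.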
More generally, suppose that you have a Multiset with $r$ different multiplicities. Denote these distinct multiplicities as $m_1, m_2, \cdots, m_r$. Further suppose that for each element $i$ in $\{1,2,\cdots, r\}$ the number of letters that share multiplicity $m_i$ is $p_i.$
So, it is being assumed that you have $p_1$ letters with multiplicity $m_1$, $p_2$ letters with multiplicity $m_2$, and so on, up to $p_r$ letters with multiplicity $m_r$.
Because there are only $(26)$ letters in the alphabet, it is also being assumed that:
$p_1 + p_2 + \cdots + p_r \leq 26.$
Then, the enumeration of the number of distinct Multisets that follow this pattern will be:
$$\binom{26}{p_1} \times \binom{[26 - p_1] }{p_2} \times \binom{[26 - p_1 - p_2]}{p_3} \times \cdots $$
$$\times \binom{[26 - (p_1 + p_2 + \cdots + p_{r-1})]}{p_r}.$$
Note that in this generic case, which involved the distinct multiplicities $m_1, \cdots, m_r$ with $p_1, \cdots, p_r$ different letters assigned to $m_1, \cdots, m_r,$ respectively, it is irrelevant what the actual sizes of the multiplicities $m_1, m_2, \cdots, m_r$ happen to be. As a way of illustrating that, note that the variables $m_1, m_2, \cdots, m_r$ do not appear anywhere in the generic formula above.
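Since the question came from a software engineer, the generic formula can also be written as a short function. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `count_multisets` and its parameters are invented for this example, the pattern is passed as a list of multiplicities such as `[2, 1, 1]`, and Python 3.8+ is assumed for `math.comb`.

```python
from collections import Counter
from math import comb


def count_multisets(pattern, alphabet_size=26):
    """Count the Multisets whose multiplicities match `pattern`."""
    # p_1, ..., p_r: how many letters share each distinct multiplicity.
    group_sizes = list(Counter(pattern).values())

    # More distinct letters requested than the alphabet holds: impossible.
    if sum(group_sizes) > alphabet_size:
        return 0

    remaining = alphabet_size
    total = 1
    for p in group_sizes:
        total *= comb(remaining, p)  # choose p letters from those still unused
        remaining -= p               # those letters are no longer available
    return total


# The 4-letter patterns from the question.
print(count_multisets([1, 1, 1, 1]))  # 14950
print(count_multisets([2, 1, 1]))     # 7800
print(count_multisets([2, 2]))        # 325
print(count_multisets([3, 1]))        # 650
print(count_multisets([4]))           # 26

# Same shape, same count, regardless of the actual multiplicities.
print(count_multisets([4, 2, 2, 1, 1]))       # 1973400
print(count_multisets([17, 13, 13, 11, 11]))  # 1973400
```

Note that `[2, 2]` gives $\binom{26}{2} = 325$, not $\binom{26}{1}\binom{25}{1}$, because the two doubled letters share one multiplicity and are therefore chosen together. As a cross-check, the five 4-letter counts add up to $23751$, which equals $\binom{29}{4}$, the standard count of all size-4 Multisets drawn from 26 letters.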