I am looking for an explicit formula for a sequence. The sequence is generated as follows:
There is a tournament with $10$ teams. In the beginning, all teams have a 0-0 win-loss record. The teams are paired up, and when the first round concludes, $5$ of the teams have a 0-1 record and $5$ of the teams have a 1-0 record. The second round is arranged so that only teams with the same win-loss record are allowed to face each other, and when there are an odd number of teams having a certain win-loss record, the extra team from one win-loss record plays the extra team from the nearby win-loss record. This means that for the second round, four of the 0-1 teams are paired up with each other, $4$ of the 1-0 teams are paired up with each other, and there is one pairing between a 0-1 team and a 1-0 team. We also assume that when there is a pairing across different win-loss records, the team with the better record always wins. This means that at the end of round 2, there will be $3$ teams with a 0-2 record, there will be $4$ teams with a 1-1 record, and $3$ teams with a 2-0 record.
I am looking for an explicit formula giving the win-loss record distribution at round $r$. I am also looking for a general formula for $T$ teams (I used "10" earlier just for illustration purposes).
I have python code which explicitly calculates this sequence for $T$ teams and round $r$, but I would like a mathematical expression for this.
I have an explicit formula for the first $4$ rows using ceiling/floor functions, but the formula does not work beyond $4$ rows. I also have an explicit formula giving the win-loss record in the long term, but it only works after about $T^2$ rows.
In terms of problem solving techniques that I've tried, I've tried various generating series approaches. For example, for the case of $10$ teams, I wrote round $0$ as
$$f_0(x,y) = y^0 + y^1 + y^2 + ... + y^9$$
and round $1$ as
$$f_1(x,y) = y^0 + y^1 + y^2 + y^3 + y^ 4 + (y^5 + y^6 + y^7 + y^8 + y^9)x$$
and round $2$ as
$$f_2(x,y) = y^0 + y^1 + y^2 + (y^3 + y^4 + y^5 + y^6)x + (y^7 + y^8 + y^9)x^2$$
and so on. I don't have a method for moving from $f_k$ to $f_{k+1}$, but I have lots of near misses.
I have also tried using a probabilistic approach. I have started from several invariants, and I have a function that when minimized gives a (sometimes) close approximation. The invariants are as follows:
The number of teams $T$ does not change from round to round.
The number of total wins at each round is always $T r / 2$.
The win-loss record is symmetric around $r/2$.
These invariants restrict the number of possible choices for round $r$. I then took the choice that most nearly matches a binomial distribution (using the euclidean distance as a metric, although any similarity metric could work). This metric isn't best, since I believe the win-loss records become "less binomial" over time.
Another thing I tried was various kinds of graphing. I can post some graphs if requested.
I am open to any approach, I look forward to seeing your ideas.
Updates and extra information
Here are the first few terms of the sequence for $T=10$:
$$10$$
$$5, 5$$
$$3, 4, 3$$
$$2, 3, 3, 2$$
$$1, 3, 2, 3, 1$$
$$1, 1, 3, 3, 1, 1$$
$$1, 0, 3, 2, 3, 0, 1$$
$$1, 0, 1, 3, 3, 1, 0, 1$$
$$1, 0, 0, 3, 2, 3, 0, 0, 1$$
$$1, 0, 0, 1, 3, 3, 1, 0, 0, 1$$
$$...$$
Another clarification is that the number of teams $T$ must be even.
Also, I would like to add the formula for the long-term behavior of the sequences. The long-term behavior shows a 2-term repetition, and this repetition depends on the value of $T \pmod 8$. Here are the four cases:
Case 1. ($T \equiv 0 \pmod 8$): $1444...444...441$, $3444...4444....443$
Case 2. ($T \equiv 2 \pmod 8$): $3444...424...443$, $1444...4334....441$
Case 3. ($T \equiv 4 \pmod 8$): $3444...444...443$, $1444...4444....441$
Case 4. ($T \equiv 6 \pmod{8}$): $1444...424...441$, $3444...4334....443$
Note that I have left off the preceding $1000...$ and the trailing $...0001$, since the number of zeros can be easily calculated. Not also that the number of $4$s can be calculated easily from the original number of teams $T$.
Regarding the generating series idea previously, one thing I found is that if we have for example
$$f_2(x,y) = y^0 + y^1 + y^2 + (y^3 + y^4 + y^5 + y^6)x + (y^7 + y^8 + y^9)x^2$$
then we can calculate something very close to $f_3(x,y)$ by performing the following steps:
$$\frac{f_2(x,y) + f_2(x,-y)}{2} = y^0 + y^2 + (y^4 + y^6)x + (y^8)x^2$$
$$\frac{f_2(x,y) - f_2(x,-y)}{2} \cdot x = y^1 x + (y^3 + y^5) x^2 + (y^7 + y^9) x^3$$
$$\widetilde{f}_3(x,y) = \frac{f_2(x,y) + f_2(x,-y)}{2} + \frac{f_2(x,y) - f_2(x,-y)}{2} \cdot x = y^0 + y^2 + (y^1 + y^4 + y^6)x + (y^3 + y^5 + y^8)x^2 + (y^7 + y^9)x^3$$
which when evaluated at $y = 1$ gives the correct sequence $2,3,3,2$. It is not exactly correct, since the $y$ terms are no longer ordered in terms of exponents, so this process is not repeatable unless we improve on it somehow.
Another Update
Let's add a variant to the original rules. The variant will be called "underdog", whereas the original rules will be called "overdog". The "overdog" rules stipulate that the team with the higher win-loss record will always win when it crosses over and faces a team with a lower win-loss ratio; in contrast, the "underdog" rules stipulate that the team with the lower win-loss ratio will win.
Let's call the overdog sequence with $T$ teams the sequence $O(T)$, and each term in the sequence will be identified as $O_{r,w}(T)$ where $r$ denotes the round and $w$ denotes the number of wins at that round. For the underdog variant, we define $U(T)$ and $U_{r,w}(T)$ in the same way.
I just discovered the following properties:
$$O(2^k + 2) = U(2^k) + O(2)$$
$$O(2^k + 4) = U(2^k) + U(2) + O(2)$$
This is helpful since $U(2^k)$ and $O(2^k)$ are easy to calculate for $k$ rows, and $O(2)$ and $U(2)$ are easily calculated for all rows.
I'm wondering if we can use the binary representation of $T$ and some kind of summing procedure to improve the number of rows that we can directly calculate. It would be fascinating to try to reduce $O(T)$ and $U(T)$ to the sum of terms of the form $U(2^K)$ or $O(2^K)$ for any arbitrarily large $K >> 0$. This would give us a way to calculate arbitrarily many rows for any $T$.
Correction to Previous Update
I should edit my previous update to be more precise. The powers of two were a bit of a red herring — all that matters is $T \pmod{4}$. Here are the updated relations:
$O(4k + 2) = U(4k) + O(2)$.
$U(4k + 2) = U(4k) + U(2)$.
$O(4k + 4) = U(4k) + U(2) + O(2)$.
This means that the problem is essentially reduced to finding $U(4k)$ instead of finding $U(2k)$. Not much of a reduction, but it's progress!
Here is a generating function approach that works: Let $f_r(x)$ be the generating function for win-loss records after $r$ rounds. Then: \begin{align*} f_0(x) &= 1\\ f_1(x) &= 1 + x\\ f_r(x) &= f_{r-1}(x)(1 + x)(1 + x^2)\ldots(1 + x^{r-1}) \end{align*} The coefficient of $x^k$ in $f_r(x)$ is the number of teams with $k$ wins after $r$ rounds. This works because in each round, teams are paired up with others of the same win-loss record (when possible), and the team with more wins always wins. So you're really just multiplying the generating function by $(1+x)$ shifted versions of itself. From this you can extract explicit formulas for the number of teams with $k$ wins after $r$ rounds by expanding the generating function and collecting coefficients.
Here are some additional thoughts:
The generating function approach gives a compact formula, but evaluating it to get explicit terms quickly becomes tedious. An alternative is to think about the process directly:
After $r$ rounds, consider teams with $k$ wins. These came from teams with $k-1$ wins that won their match, and teams with $k$ wins that lost their match (if possible). So you have a recurrence like: \begin{align*} a_{r,k} &= a_{r-1,k-1} + a_{r-1,k}\\ a_{r,0} &= a_{r-1,1}\\ a_{r,T/2} &= a_{r-1,T/2-1} \end{align*} where $a_{r,k}$ is the number of teams with $k$ wins after round $r$. This is easier to evaluate directly, and generalizes easily to other numbers of teams/wins.
The long-term behavior also has a nice explanation: after enough rounds, almost all matches are between teams with nearly equal win counts (close to $T/2$). In that case, wins and losses essentially balance out, and you just see an oscillation between the two closest win counts to $T/2$. The period of this oscillation depends on the parity of $T$ (mod 8) because that determines how the win counts are arranged around $T/2$.
Those recurrence relations are very helpful. A few more thoughts:
The binary representation of $T$ isn't too useful here, since the recurrences relate sequences for $4k+2$ and $4k+4$ to those for $4k$. The powers of 4 are really what matter, not the individual bits. You can extend the recurrences to get formulas for arbitrary $T$: \begin{align*} O(T) &= O((T-2)/4) + O(2) + (T \mod 4 = 2)O(2)\\ U(T) &= U((T-4)/4) + U(2) + (T \mod 4 = 2)(U(2) + U(2)) \end{align*} These let you calculate the sequences for any $T$, not just powers of 4. The long-term behavior is similar, but the period doubles from 4 to 8 when switching from overdog to underdog. This is because in the underdog case, teams with win counts close to $T/2$ can switch between the two closest counts after either winning or losing, so you get oscillations at half the rate.