Minorization-Maximization (MM) Algorithm: Bradley-Terry Ranking


With the help of exercises in my Machine Learning class and the course notes of Prof. Kenneth Lange available there, I'm trying to understand the derivation of the Minorization-Maximization (MM) algorithm through the example of Bradley-Terry ranking:

Consider a sports league with $m$ teams. Assign team $i$ the skill level $\theta_i$. Bradley and Terry proposed the model $$ P(i\text{ beats } j) = \frac{\theta_i}{\theta_i + \theta_j}. $$ To ensure that the skill levels are identifiable, we set $\theta_1 = 1$. If $b_{ij}$ is the number of times $i$ beats $j$, then the likelihood of the data is $$ L(\theta) = \prod_{i,j}\Big(\frac{\theta_i}{\theta_i + \theta_j}\Big)^{b_{ij}}. $$ Estimate $\theta$ by maximizing the log-likelihood $f(\theta) = \log L(\theta)$. Construct the minorizing function $g(\theta|\theta^t)$ by using the supporting hyperplane, then maximize $g(\theta|\theta^t)$.

\textbf{Solution}

The log-likelihood function is $$ f(\theta) = \sum_{i,j}b_{ij}\big(\log(\theta_i) - \log(\theta_i + \theta_j)\big). $$ We can use the supporting-hyperplane property of a convex function $h$, $$ h(y) \ge h(x) + \nabla h(x)^\top(y-x). $$ For the convex function $h(x) = -\log(x)$ this gives \begin{align*} -\log(y) \ge -\log(x) -\frac{1}{x}(y-x) &=-\log(x) - \frac{y-x}{x}\\ &= -\log(x) - \frac{y}{x} + 1, \end{align*} with equality at $y = x$. Since $b_{ij} \ge 0$, multiplying both sides by $b_{ij}$ and summing over all pairs preserves the inequality. We can substitute $x = \theta_i^t + \theta_j^t$ (using the previous estimate) and $y = \theta_i + \theta_j$. We get \begin{align*} \sum_{i,j}b_{ij}\Big(-\log(\theta_i + \theta_j)\Big) &\ge\sum_{i,j}b_{ij}\Big( -\log(\theta_i^t + \theta_j^t)- \frac{\theta_i + \theta_j}{\theta_i^t + \theta_j^t} + 1\Big)\\ \underbrace{ \sum_{i,j}b_{ij}\Big(\log(\theta_i)-\log(\theta_i + \theta_j)\Big)}_{f(\theta)} &\ge\underbrace{\sum_{i,j}b_{ij}\Big(\log(\theta_i) -\log(\theta_i^t + \theta_j^t)- \frac{\theta_i + \theta_j}{\theta_i^t + \theta_j^t} + 1\Big)}_{g(\theta|\theta^t)},\tag{1} \end{align*} with equality at $\theta = \theta^t$, so $g$ minorizes $f$.

Notice that the surrogate objective is now separable w.r.t. the parameters $\theta_i$: collecting the terms of $g$ that involve $\theta_i$, we need to maximize $$ \sum_{j \ne i}b_{ij}\log(\theta_i) - \theta_i\sum_{j \ne i}\frac{b_{ij} + b_{ji}}{\theta_i^t + \theta_j^t}\tag{2} $$ w.r.t. $\theta_i$ for every $i$. We can put this in the form $a \log (\theta_i) - c\,\theta_i$, where $a, c > 0$. Because this function is concave, setting the derivative to zero (at $\theta_i = a/c$) yields the global maximum. The next estimate is therefore $$ \theta_i^{t + 1} = \frac{\sum_{j\ne i}b_{ij}}{\sum_{j\ne i}(b_{ij} + b_{ji})/(\theta_i^t + \theta_j^t)}.\tag{3} $$

My questions:

  • Is the derivation up to equation (1) correct?
  • What are the exact steps to go from (1) to (2)?
  • What are the exact steps to go from (2) to (3)?
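As a sanity check on the derivation, here is a minimal Python sketch that evaluates $f$, the surrogate $g(\theta|\theta^t)$ from (1), and the update (3), then numerically verifies the minorization and ascent properties. The win-count matrix `b` is a made-up example, not data from the source:

```python
import math

# Hypothetical 4-team win-count matrix: b[i][j] = number of times team i beat team j.
b = [
    [0, 3, 2, 4],
    [1, 0, 5, 2],
    [2, 1, 0, 3],
    [0, 2, 1, 0],
]
m = len(b)

def loglik(theta):
    # f(theta) = sum_{i != j} b_ij * (log theta_i - log(theta_i + theta_j))
    return sum(b[i][j] * (math.log(theta[i]) - math.log(theta[i] + theta[j]))
               for i in range(m) for j in range(m) if i != j)

def minorizer(theta, theta_t):
    # g(theta | theta_t) from equation (1)
    return sum(b[i][j] * (math.log(theta[i])
                          - math.log(theta_t[i] + theta_t[j])
                          - (theta[i] + theta[j]) / (theta_t[i] + theta_t[j])
                          + 1.0)
               for i in range(m) for j in range(m) if i != j)

def mm_update(theta_t):
    # Equation (3); afterwards rescale so theta_1 = 1, which is harmless
    # because the likelihood only depends on ratios of the theta_i.
    new = []
    for i in range(m):
        num = sum(b[i][j] for j in range(m) if j != i)
        den = sum((b[i][j] + b[j][i]) / (theta_t[i] + theta_t[j])
                  for j in range(m) if j != i)
        new.append(num / den)
    return [t / new[0] for t in new]

theta = [1.0] * m
for _ in range(200):
    theta = mm_update(theta)

# Minorization: g touches f at theta_t and lies below f elsewhere.
assert abs(minorizer(theta, theta) - loglik(theta)) < 1e-10
other = [0.7, 1.3, 0.9, 1.8]
assert minorizer(other, theta) <= loglik(other) + 1e-12
# Ascent property: one MM step never decreases the log-likelihood.
assert loglik(mm_update(other)) >= loglik(other) - 1e-10
```

The assertions encode exactly what the MM argument promises: $g(\theta^t|\theta^t) = f(\theta^t)$, $g(\theta|\theta^t) \le f(\theta)$ for any other $\theta$, and hence $f(\theta^{t+1}) \ge f(\theta^t)$.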