GLM for Poisson Regression for Soccer Ratings Not Converging

Question

268 Views Asked by Bumbble Comm At 01 Apr 2026 - 3:07

I have been trying to formulate a model of soccer matches to help me predict the outcomes. The model I'm trying to formulate involves using Poisson regression to assign attack and defence ratings to different teams.

Let's say that I have a set of results like this:

A v B 2 0
B v C 2 1
A v C 1 1

I'm trying to fit the attack and defence ratings in a vector B such that E[Y] = exp(X*B), where X is a matrix representing the results of the games.

The vector B is of the form:

B = [A_attack,A_defence,B_attack,B_defence,C_attack,C_defence]

From the above table of results the matrix X must look like this:

[1,0,0,-1,0,0]
[0,-1,1,0,0,0]
[0,0,1,0,0,-1]
[0,0,0,-1,1,0]
[1,0,0,0,0,-1]
[0,-1,0,0,1,0]

Finally Y represents the number of goals in all the matches. In this case Y = [2,0,2,1,1,1].
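For a full season it may be easier to generate X and Y from a list of results than to write them out by hand. The following is a minimal sketch, assuming each game is stored as a `(home, away, home_goals, away_goals)` tuple; the helper name `build_design` and the tuple layout are illustrative choices, not part of the original post.

```python
import numpy as np

# Results from the question: (home, away, home_goals, away_goals).
results = [("A", "B", 2, 0), ("B", "C", 2, 1), ("A", "C", 1, 1)]
teams = sorted({t for home, away, _, _ in results for t in (home, away)})

def build_design(results, teams):
    """Return (X, Y) with columns [T_attack, T_defence] per team, in `teams` order."""
    col = {team: 2 * i for i, team in enumerate(teams)}  # attack column; defence is +1
    X = np.zeros((2 * len(results), 2 * len(teams)))
    Y = np.zeros(2 * len(results))
    for g, (home, away, home_goals, away_goals) in enumerate(results):
        # Row for the home side's goals: +home attack, -away defence.
        X[2 * g, col[home]] = 1
        X[2 * g, col[away] + 1] = -1
        Y[2 * g] = home_goals
        # Row for the away side's goals: +away attack, -home defence.
        X[2 * g + 1, col[away]] = 1
        X[2 * g + 1, col[home] + 1] = -1
        Y[2 * g + 1] = away_goals
    return X, Y

X, Y = build_design(results, teams)
```

Each game contributes two rows, one per side's goal count, which reproduces the 6 x 6 matrix and the vector Y given above.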

Now I've been using statsmodels, which is a Python package for doing this kind of thing and I'm running into problems.

In case anyone is familiar with statsmodels the calls I'm using are:

import statsmodels.api as sm
res = sm.GLM(Y, X, family=sm.families.Poisson()).fit(method='bfgs')

Where X and Y are a numpy Matrix and Array respectively, as defined above.

The code will often not converge. There are 20 teams in the Premier League, so I need to fit 40 ratings. When the number of rows exceeds ~50, the convergence problems present themselves. For example, I often see a "Floating point exception: 8" message, which I believe means there has been a divide-by-zero error.

When the method does converge, the values are often nonsense, giving negative expected goals in a game.

What I would like to know is: is my modelling mathematically sound? Is there any way I could tweak the model to make it converge?

**Bumbble Comm** · Answer 1 · 2020-08-02 11:03:30

The problem here is that your model's parameters are not identifiable. Every row of X contains one +1 (an attack rating) and one -1 (a defence rating), so adding the same constant to all attack and defence ratings leaves X*B, and hence the predicted goals, unchanged. You can remove this degree of freedom by, e.g., fixing the defence rating of team $C$ at zero.
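The invariance can be checked numerically. This is a small sketch using the 6-column matrix from the question; the ratings `B` and the shift `3.7` are arbitrary illustrative values.

```python
import numpy as np

# Design matrix from the question, columns
# [A_attack, A_defence, B_attack, B_defence, C_attack, C_defence].
X = np.array([
    [1, 0, 0, -1, 0, 0],
    [0, -1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, -1],
    [0, 0, 0, -1, 1, 0],
    [1, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
B = rng.normal(size=6)      # arbitrary ratings
B_shifted = B + 3.7         # the same constant added to every rating

# Each row sums to zero, so the all-ones vector is mapped to zero...
null_image = X @ np.ones(6)
# ...and the shifted ratings predict exactly the same expected goals.
same_predictions = np.allclose(np.exp(X @ B), np.exp(X @ B_shifted))
```

Because the likelihood depends on B only through X*B, an optimizer has a flat direction to drift along, which is consistent with the convergence trouble described in the question.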

Try to estimate the ratings with the following matrix, which assumes C_defence = 0. You should now be able to find the ratings, as everything is relative to team $C$'s defensive rating:

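import numpy as np

# Columns: [A_attack, A_defence, B_attack, B_defence, C_attack].
# The C_defence column is dropped, which fixes C_defence at 0.
X = np.array([
    [1,0,0,-1,0],
    [0,-1,1,0,0],
    [0,0,1,0,0],
    [0,0,0,-1,1],
    [1,0,0,0,0],
    [0,-1,0,0,1]
])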
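With 20 teams the reference column would normally be dropped programmatically. A minimal sketch, assuming the full matrix uses the column order from the question (the names `X_full` and `reference_col` are illustrative):

```python
import numpy as np

# Full design matrix from the question, columns
# [A_attack, A_defence, B_attack, B_defence, C_attack, C_defence].
X_full = np.array([
    [1, 0, 0, -1, 0, 0],
    [0, -1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, -1],
    [0, 0, 0, -1, 1, 0],
    [1, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 1, 0],
])

reference_col = 5  # index of C_defence, the rating pinned to 0
X_reduced = np.delete(X_full, reference_col, axis=1)

# The full matrix is rank deficient; the reduced one has full column rank.
rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
```

Comparing `rank_reduced` with `X_reduced.shape[1]` is a quick way to confirm that no further redundant direction remains before fitting.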

Note that a better solution may be to impose $L_1$ or $L_2$ regularization on the model parameters. This also makes the parameters identifiable. Moreover, it is especially useful when modelling football data, which are quite noisy.
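To illustrate the $L_2$ case, here is a hand-rolled sketch that minimises the Poisson negative log-likelihood plus $\frac{\lambda}{2}\lVert B \rVert^2$ with damped Newton steps on the full 6-column matrix. The function name `fit_poisson_ridge` and the value `lam=1.0` are assumptions made for illustration; this is not how statsmodels fits the model, and statsmodels' own `GLM.fit_regularized` may be the more practical route.

```python
import numpy as np

X = np.array([
    [1, 0, 0, -1, 0, 0],
    [0, -1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, -1],
    [0, 0, 0, -1, 1, 0],
    [1, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 1, 0],
], dtype=float)
Y = np.array([2, 0, 2, 1, 1, 1], dtype=float)

def fit_poisson_ridge(X, Y, lam=1.0, max_iter=100, tol=1e-8):
    """Minimise sum(exp(Xb) - Y*Xb) + lam/2 * ||b||^2 by damped Newton steps."""
    def objective(b):
        eta = X @ b
        return np.sum(np.exp(eta) - Y * eta) + 0.5 * lam * (b @ b)

    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ b)
        grad = X.T @ (mu - Y) + lam * b
        if np.linalg.norm(grad) < tol:
            break
        # The penalty adds lam to the diagonal, so the Hessian is invertible
        # even though X itself is rank deficient.
        hess = X.T @ (mu[:, None] * X) + lam * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the objective stops increasing (guards against overshoot).
        while objective(b - t * step) > objective(b) and t > 1e-8:
            t *= 0.5
        b = b - t * step
    return b

B_hat = fit_poisson_ridge(X, Y, lam=1.0)
expected_goals = np.exp(X @ B_hat)
```

The penalty removes the flat direction without singling out a reference team: among all equivalent shifts of the ratings, it selects the one whose entries sum to zero. The price is that the ratings are shrunk towards zero, so `lam` would need tuning, for example by cross-validation.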

Finally, you may want to explicitly introduce an intercept and, if applicable, a home-advantage parameter in your model.
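A sketch of what such a design matrix could look like, assuming the first-named team in each result is the home side. Once an intercept is present, shifting every attack rating by a constant can be absorbed by the intercept, so this sketch pins both the attack and the defence rating of one reference team to zero. The column names and the choice of C as reference are illustrative.

```python
import numpy as np

# (home, away, home_goals, away_goals); the first-named team is assumed to be at home.
results = [("A", "B", 2, 0), ("B", "C", 2, 1), ("A", "C", 1, 1)]
teams = ["A", "B", "C"]
reference = "C"  # this team's attack and defence ratings are pinned to 0

# Columns: intercept, home advantage, then attack/defence for the other teams.
free = [t for t in teams if t != reference]
names = ["intercept", "home"] + [f"{t}_{k}" for t in free for k in ("attack", "defence")]
idx = {name: i for i, name in enumerate(names)}

rows, goals = [], []
for home, away, home_goals, away_goals in results:
    # One row per side: (scoring team, conceding team, playing at home?, goals).
    for scorer, conceder, at_home, y in ((home, away, 1, home_goals),
                                         (away, home, 0, away_goals)):
        row = np.zeros(len(names))
        row[idx["intercept"]] = 1
        row[idx["home"]] = at_home
        if scorer != reference:
            row[idx[f"{scorer}_attack"]] = 1
        if conceder != reference:
            row[idx[f"{conceder}_defence"]] = -1
        rows.append(row)
        goals.append(y)

X = np.array(rows)
Y = np.array(goals, dtype=float)
```

On the three-game example this yields six parameters for six observations, so it only illustrates the layout; a meaningful fit needs many more games than parameters.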
