Is this derivation of the gradient for the Conditional Logit model correct?


I've written a very simple Python implementation of the likelihood and gradient for the conditional logit model. The likelihood works fine, but the gradient vector is wrong. Is the derivation below correct? The implementation runs without errors, but its output is incorrect, which leads me to believe my derivation is wrong.

The gradient I've derived is (apologies for the plain text, can't post the image of the formula):

grad = SUM_i [ X_ic - SUM_j( X_ij * e^(X_ij . b) ) / SUM_j e^(X_ij . b) ]

Here, i indexes observations, j indexes the alternatives within observation i, c is the chosen alternative in observation i, X_ij is the feature vector for alternative j in observation i, and b is the vector of coefficients.
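For reference, the gradient above can be sketched in NumPy. This is a minimal illustration, not the poster's actual code; the data layout (a list of `(X, c)` pairs, where `X` has one row per alternative) and the function name `conditional_logit_grad` are assumptions for the example.

```python
import numpy as np

def conditional_logit_grad(beta, data):
    """Gradient of the conditional-logit log-likelihood.

    beta: coefficient vector of shape (K,).
    data: list of (X, c) pairs, where X has shape (J, K) -- one row of
          features per alternative -- and c is the index of the chosen
          alternative.
    """
    grad = np.zeros_like(beta)
    for X, c in data:
        # The exponent is the full dot product X_ij . b for each alternative.
        expu = np.exp(X @ beta)          # shape (J,)
        probs = expu / expu.sum()        # choice probabilities
        # Chosen features minus the probability-weighted average of features.
        grad += X[c] - probs @ X
    return grad
```

A quick sanity check against a finite-difference approximation of the log-likelihood is a good way to catch exactly the kind of bug described in this question.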

Link to my SO post which contains the actual implementation.

Best answer:

The derivation was incorrect as implemented. In the exponentiation, I was including only the single feature and coefficient corresponding to the partial derivative being taken. Instead, the exponent should be the dot product of the full feature vector with all of the coefficients, e^(X_ij . b), regardless of which coefficient is being differentiated.
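The contrast between the buggy exponent and the correct one can be sketched as follows. The matrix `X` and vector `b` are made-up values for illustration:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])   # 2 alternatives, 2 features (made-up data)
b = np.array([0.3, 0.7])

# Buggy version: for the partial derivative w.r.t. b[k], only feature k
# entered the exponent.
k = 0
wrong = np.exp(X[:, k] * b[k])

# Correct version: the exponent is always the full dot product X_ij . b,
# no matter which coefficient is being differentiated.
right = np.exp(X @ b)
```

The two versions coincide only in the degenerate single-feature case, which is why the bug is easy to miss in toy tests.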