Why is my calculating derivative of log-likelihood function a little different from that of the textbook?

The log-likelihood function of the logit model is

$$\mathcal{L}(\beta)=\sum_n\sum_iy_{ni}\ln P_{ni}$$

where $y_{ni}=1$ if person $n$ chose alternative $i$ and zero otherwise, and

$$P_{ni}=\frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}}$$

and the observed utility is $$V_{ni}=\beta x_{ni}$$
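For concreteness, here is how these definitions can be computed numerically (a minimal NumPy sketch with made-up data; the sizes, seed, and $\beta=0.5$ are arbitrary, not from any textbook):

```python
import numpy as np

# Hypothetical data: N = 4 persons, J = 3 alternatives, scalar beta.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))     # x[n, i] = x_ni, attribute of alternative i for person n
beta = 0.5

V = beta * x                    # observed utility V_ni = beta * x_ni
# Logit choice probabilities: P_ni = exp(V_ni) / sum_j exp(V_nj)
P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)

# y[n, i] = 1 for the (randomly picked) chosen alternative, 0 otherwise.
y = np.zeros_like(x)
y[np.arange(4), rng.integers(0, 3, size=4)] = 1

# Log-likelihood: L(beta) = sum_n sum_i y_ni * ln P_ni
loglik = (y * np.log(P)).sum()
```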

At the maximum of the log-likelihood function, its derivative with respect to each parameter is zero:

$$\frac{\partial\mathcal{L}(\beta)}{\partial\beta}=0$$

Now I calculate the derivative myself using the chain rule:

$$\frac{\partial\mathcal{L}(\beta)}{\partial\beta}=\frac{\partial\mathcal{L}(\beta)}{\partial P_{ni}}\cdot\frac{\partial P_{ni}}{\partial e^{V_{ni}}}\cdot\frac{\partial e^{V_{ni}}}{\partial V_{ni}}\cdot\frac{\partial V_{ni}}{\partial \beta}$$

$$=\sum_n\sum_i\bigg[\bigg(\frac{y_{ni}}{P_{ni}}\bigg)\cdot\bigg(\frac{\sum_je^{V_{nj}}-e^{V_{ni}}}{(\sum_je^{V_{nj}})^2}\bigg)\cdot\bigg(e^{V_{ni}}\bigg)\cdot\bigg(x_{ni}\bigg)\bigg]$$

$$=\sum_n\sum_i\bigg[\bigg(\frac{y_{ni}}{P_{ni}}\bigg)\cdot\bigg(\frac{1-P_{ni}}{\sum_je^{V_{nj}}}\bigg)\cdot\bigg(e^{V_{ni}}\bigg)\cdot\bigg(x_{ni}\bigg)\bigg]$$

$$=\sum_n\sum_i\bigg[\bigg(\frac{y_{ni}}{P_{ni}}\bigg)\cdot\bigg({1-P_{ni}}\bigg)\cdot\bigg(\frac{e^{V_{ni}}}{\sum_je^{V_{nj}}}\bigg)\cdot\bigg(x_{ni}\bigg)\bigg]$$

$$=\sum_n\sum_i\bigg[\bigg(\frac{y_{ni}}{P_{ni}}\bigg)\cdot\bigg({1-P_{ni}}\bigg)\cdot\bigg(P_{ni}\bigg)\cdot\bigg(x_{ni}\bigg)\bigg]$$

$$=\sum_n\sum_iy_{ni}({1-P_{ni}})x_{ni}$$

This is my result, but it differs from the textbook's result:

$$\text{textbook}=\sum_n\sum_i({y_{ni}-P_{ni}})x_{ni}$$

I have rechecked several times, but I still do not know how the textbook gets the $y_{ni}$ inside the parentheses. Any ideas?
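One way to check which expression is right is to compare both against a numerical derivative of $\mathcal{L}(\beta)$ on made-up data (everything below, including the seed and $\beta=0.7$, is hypothetical; this is a sanity check, not a derivation):

```python
import numpy as np

# Made-up data: N = 5 persons, J = 3 alternatives, scalar beta.
rng = np.random.default_rng(1)
N, J = 5, 3
x = rng.normal(size=(N, J))                        # x[n, i] = x_ni
y = np.zeros((N, J))
y[np.arange(N), rng.integers(0, J, size=N)] = 1    # one chosen alternative per person

def loglik(beta):
    """L(beta) = sum_n sum_i y_ni * ln P_ni, with P_ni the logit probability."""
    V = beta * x
    P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)
    return (y * np.log(P)).sum()

beta = 0.7
V = beta * x
P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)

grad_mine     = (y * (1 - P) * x).sum()   # sum_n sum_i y_ni (1 - P_ni) x_ni
grad_textbook = ((y - P) * x).sum()       # sum_n sum_i (y_ni - P_ni) x_ni

# Central finite difference as the reference value.
eps = 1e-6
grad_numeric = (loglik(beta + eps) - loglik(beta - eps)) / (2 * eps)
```

In runs like this, `grad_textbook` agrees with `grad_numeric` to numerical precision, while `grad_mine` in general does not, which suggests the discrepancy is in my derivation rather than in the textbook.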