I am working on the following problem:
Suppose that each individual $i$, for $i=1, \ldots, N$, can choose among 3 alternatives. To model the choices of the individuals we consider a Multinomial Logit [MNL] model. The random variable $Y_i \in\{1,2,3\}$ denotes the choice. The probability that individual $i$ chooses alternative $j$ is given by $$ \begin{aligned} & \operatorname{Pr}\left[Y_i=j \mid x_i\right]=\frac{\exp \left(\beta_{0, j}+\beta_{1, j} x_i\right)}{1+\sum_{l=1}^2 \exp \left(\beta_{0, l}+\beta_{1, l} x_i\right)} \text { for } j=1,2 \\ & \operatorname{Pr}\left[Y_i=3 \mid x_i\right]=\frac{1}{1+\sum_{l=1}^2 \exp \left(\beta_{0, l}+\beta_{1, l} x_i\right)}. \end{aligned} $$
The solution is:
The following expectations hold $$ \mathrm{E}\left[I\left[Y_i=j\right] \mid x_i\right]=\operatorname{Pr}\left[Y_i=j \mid x_i\right] \text { for } j=1,2 $$ and hence the moment conditions are $$ \mathrm{E}\left[I\left[Y_i=j\right]-\operatorname{Pr}\left[Y_i=j \mid x_i\right]\right]=0 \text { for } j=1,2 $$ and $$ \mathrm{E}\left[x_i\left(I\left[Y_i=j\right]-\operatorname{Pr}\left[Y_i=j \mid x_i\right]\right)\right]=0 \text { for } j=1,2 . $$
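The moment conditions can be checked numerically: simulate from the MNL model at some (assumed, illustrative) parameter values and verify that the sample analogues of the four moments are close to zero. The parameter values below are my own choice; the source does not specify any.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Illustrative (assumed) parameter values: beta_{0,j} and beta_{1,j} for j = 1, 2
b0 = np.array([0.5, -0.3])
b1 = np.array([1.0, -0.5])

x = rng.normal(size=N)

# Choice probabilities, with alternative 3 as the base category
eta = b0 + np.outer(x, b1)                 # N x 2 matrix of linear indices
denom = 1.0 + np.exp(eta).sum(axis=1)
p = np.column_stack([np.exp(eta) / denom[:, None], 1.0 / denom])  # N x 3

# Draw choices Y_i in {1, 2, 3} by inverse-CDF sampling
u = rng.random(N)
y = 1 + (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)

# Sample analogues of the moment conditions for j = 1, 2;
# both should be close to zero at the true parameters
for j in (1, 2):
    resid = (y == j).astype(float) - p[:, j - 1]
    print(f"j={j}: E[resid] ~ {resid.mean():.4f}, E[x*resid] ~ {(x * resid).mean():.4f}")
```

At the true parameter values all four sample moments are $O(1/\sqrt{N})$, which is what makes them usable as estimating equations.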
My question is:
I understand why the equations hold for $j=1,2$. But why are the moment conditions stated only for $j=1,2$? In other words: why is $j=3$ excluded?
I see that there are 4 unknown parameters, so 4 moment equations give the exactly identified case. However, if we have more moment conditions than parameters, the extra equations still contain information about the parameters, so we could apply a minimization criterion (as in GMM) to estimate them.
Is there a reason to "throw away" the equations for $j=3$?