I am using the following specification to estimate a binary choice Probit model: $$ P(y_t=1|x_t) = \Phi(\pi_t), $$ where $\Phi$ is the cumulative distribution function of the normal distribution. My goal is to use Maximum Likelihood to obtain the estimates of the coefficients. I write down the log likelihood function as follows: $$ L = \sum_{t=1}^{T} (1-y_t) \cdot \log(1-\Phi(\pi_t) ) + y_t \cdot \log(\Phi(\pi_t) ) $$ It is straightforward to derive the gradient for $\beta$ coefficients: $$ \frac{\partial L}{\partial \beta} = \sum_{t=1}^{T} [-\frac{1-y_t}{1-\Phi(.)}\cdot \Phi'(.) + \frac{y_t}{\Phi(.)}\Phi'(.)] \frac{\partial \pi_t}{\partial\beta} = \sum_{t=1}^{T} \Phi'(.) \frac{y_t-\Phi(.)}{(1-\Phi(.)) \cdot \Phi(.)} \frac{\partial \pi_t}{\partial\beta} = \sum_{t=1}^{T} u_t \frac{\partial \pi_t}{\partial\beta} $$ For an ordinary probit model $\pi_t = x'_t \cdot \beta$. Then it is obvious that $\frac{\partial \pi_t}{\partial\beta} = x'_t$ and the calculation is straightforward.
Then I modify the equation for $\pi_t$ and allow for autoregressive dynamics in it: $$\pi_t = x'_t \cdot \beta + \gamma \cdot \pi_{t-1}$$ I assume that $\pi_0$ is known. Then the gradient calculation is altered because 1) $\partial \pi_t/\partial\beta$ changes, and 2) it is also required to find $\partial L / \partial \gamma$
Here is what I try. I first solve the equation for $\pi_t$ backward and then calculate the derivatives: $$\pi_t = \sum_{i=1}^{t}x'_i \beta \gamma^{t-i} + \gamma^t \pi_0 \Rightarrow \\ \frac{\partial \pi_t}{\beta} = \sum_{i=1}^{t}x'_i \gamma^{t-i} \\ \frac{\partial \pi_t}{\gamma} = \sum_{i=1}^{t}(x'_i \beta (t-i) \gamma^{t-i-1}) + t\gamma^{t-1}\pi_0 $$ Finally, $$ \frac{\partial L}{\partial \beta} = \sum_{t=1}^{T} u_t \sum_{i=1}^{t}x'_i \gamma^{t-i} \\ \frac{\partial L}{\partial \gamma} = \sum_{t=1}^{T} u_t (\sum_{i=1}^{t}x'_i \beta (t-i) \gamma^{t-i-1}) + t\gamma^{t-1}\pi_0 $$ However, when I use these calculations inside an optimizing procedure, model parameters are estimated incorrectly. I further calculate numerical derivatives of $L$ using R and find that they are different from the values I get by using the expressions above. Obviously, there seems to be a mistake in what I do.
Any suggestions would be greatly appreciated!