My problem, simplified, is as follows: I'm estimating a binary-choice probit model whose error term follows a mixture of two normal distributions. That is:
$y_i = \mathbf{1} \left( \beta_0 + \beta_1 x_i + \epsilon_i > 0 \right)$
Where $\epsilon_i \sim 0.3\, \mathcal{N}(-1,1) + 0.7\, \mathcal{N}(1.2,0.2)$. I naively followed what I would do for other, simpler models. That is, I let the negative log-likelihood be:
$$ \text{NLL} = -\sum_{i=1}^{n} \left[ y_i \log(P(y_i=1|x_i; \Theta)) + (1-y_i) \log(1-P(y_i=1|x_i; \Theta)) \right] $$
Where $ P(y_i=1|x_i; \Theta) = w \Phi\left(\frac{\beta_0 + \beta_1 x_i - \mu_1}{1}\right) + (1-w) \Phi\left(\frac{\beta_0 + \beta_1 x_i - \mu_2}{\sigma_2}\right)$. The parameters to estimate are then $\Theta = \{\beta_0, \beta_1, \mu_1, \mu_2, w, \sigma_1^2, \sigma_2^2\}$.
For the implementation, I simulated the data-generating process. Then, to estimate the model, I imposed the following restrictions (in the hope of achieving identification): 1) $\sigma_1 = 1$; 2) the weights must sum to 1, $w + (1-w) = 1$. With this, I show an implementation in MATLAB below.
Results differ widely from the true parameters. I'm something of a novice in these topics, and I have the feeling this might be an identification problem. I tried changing the initial values (even setting them to the true ones), but the estimates are still far off. Am I wrong in my scale and location normalizations? Is my approach incorrect?
Thanks!
```matlab
% Data Generating Process
x=randn(50000,1);
n=length(x);
mu1 = -1; sigma1 = 1; weight1 = 0.3;
mu2 = 1.2; sigma2 = 0.2; weight2 = 0.7;
beta_0=1;
beta_1=-1;
epsilon_mixture = zeros(n, 1);
% Generate the mixture error term
for i = 1:n
    if rand() < weight1
        epsilon_mixture(i) = mu1 + sigma1 * randn();
    else
        epsilon_mixture(i) = mu2 + sigma2 * randn();
    end
end
y = (beta_0 + beta_1*x + epsilon_mixture) > 0;
function nll = probit_mixture_log_likelihood(params, x, y)
    % Unpack parameters
    beta_0 = params(1);
    beta_1 = params(2);
    mu1 = params(3);         % mean of the first normal component
    mu2 = params(4);         % mean of the second normal component
    p = params(5);           % unconstrained auxiliary for the mixture weight
    sigma2_aux = params(6);  % unconstrained auxiliary for sigma2

    % Transformations enforcing the identifiability constraints
    w = exp(p) / (1 + exp(p));  % logistic transform ensures 0 < w < 1
    sigma2 = exp(sigma2_aux);   % exponential transform ensures sigma2 > 0

    % Linear predictor for all observations
    eta = beta_0 + beta_1 * x;

    % Component probabilities (sigma1 fixed at 1 for the scale normalization)
    prob1 = normcdf(eta, mu1, 1);
    prob2 = normcdf(eta, mu2, sigma2);

    % Mixture probability and vectorized negative log-likelihood
    prob = w * prob1 + (1 - w) * prob2;
    nll = -sum(y .* log(prob + 1e-15) + (1 - y) .* log(1 - prob + 1e-15));
end
```
And then the maximization is done by minimizing the negative log-likelihood:
```matlab
initialParams = [0, 0, 0, 0, log(0.3/0.7), 0]; % Adjusted initial guesses
% Options for the optimization algorithm
options = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'Display', 'iter');
% Minimize the negative log-likelihood function
[paramEstimates, fval, exitflag, output] = fminunc(@(params) probit_mixture_log_likelihood(params, x, y), initialParams, options);
% Unpack the estimated parameters
beta_0_est = paramEstimates(1);
beta_1_est = paramEstimates(2);
mu1_est = paramEstimates(3);
mu2_est = paramEstimates(4);
p_est = paramEstimates(5);
sigma2_aux_est = paramEstimates(6);
% Recover w and sigma2 from the auxiliary parameters
w_est = exp(p_est) / (1 + exp(p_est)); % Mixture weight for the first component, transformed back from log-odds
sigma2_est = exp(sigma2_aux_est); % Recover sigma2, ensuring it's positive
```
Your derivation and MLE code look good to me (I tried it out, and the code works, apart from the fact that you also need to specify $\beta_0$ and $\beta_1$). It is indeed an identification issue.
Under this model specification with $\beta_0=0$ and $\beta_1=1$, try plotting the histogram of the latent variable $Y_i = \beta_0+\beta_1 x_i+\epsilon_i$. You will notice that, although $\epsilon_i$ has a bimodal shape due to the mixture design, the latent variable $Y_i$ is unimodal: $Y_i$ is the convolution of $\mathcal{N}(0,1)$ and $\epsilon_i$, and the convolution smooths out the bimodal features. But your binary choice variable $y_i$ is generated from this latent $Y_i$, in which the bimodal feature (the one that identifies $\mu_1$ and $\mu_2$) is no longer evident. For the purpose of estimating $\mu_1$ and $\mu_2$, $x_i$ acts as Gaussian noise, and in this case the signal-to-noise ratio is simply too small for the model to be well identified.
You can fix this issue by drawing $x_i\sim\mathcal{N}(0,0.01)$; then, with good initial guesses, your code seems to work. You can also verify this by increasing the sample size of the simulation.
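The smoothing argument above can be checked without any simulation: adding an independent normal to a two-component normal mixture yields another normal mixture, with each component variance inflated by the normal's variance. Below is a small illustrative sketch (in Python rather than MATLAB, purely so it is self-contained) that counts the modes of each density on a grid, using the question's DGP values $\beta_0 = 1$, $\beta_1 = -1$ rather than the $\beta_0=0$, $\beta_1=1$ used above; the qualitative picture is the same:

```python
import math

def norm_pdf(z, mu, sd):
    # Density of N(mu, sd^2) at z
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(z, comps):
    # comps is a list of (weight, mean, sd) triples
    return sum(w * norm_pdf(z, mu, sd) for (w, mu, sd) in comps)

def count_modes(comps, lo=-5.0, hi=6.0, n=2201):
    # Count strict local maxima of the mixture density on a grid
    grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    d = [mixture_pdf(z, comps) for z in grid]
    return sum(1 for i in range(1, n - 1) if d[i - 1] < d[i] > d[i + 1])

# epsilon ~ 0.3 N(-1, 1^2) + 0.7 N(1.2, 0.2^2): clearly bimodal
eps = [(0.3, -1.0, 1.0), (0.7, 1.2, 0.2)]

# Latent index beta_0 + beta_1*x + epsilon with beta_0 = 1, beta_1 = -1:
# beta_1*x ~ N(0, 1), so each component mean shifts by 1 and each
# component variance grows by 1
latent = [(0.3, 0.0, math.sqrt(1.0 + 1.0)), (0.7, 2.2, math.sqrt(1.0 + 0.04))]

# Same latent index, but with x ~ N(0, 0.01) as suggested above
latent_small = [(0.3, 0.0, math.sqrt(0.01 + 1.0)),
                (0.7, 2.2, math.sqrt(0.01 + 0.04))]

print(count_modes(eps), count_modes(latent), count_modes(latent_small))  # -> 2 1 2
```

With $x_i \sim \mathcal{N}(0,1)$, the convolution collapses the latent density to a single mode, so the binary outcomes carry almost no information about $\mu_1$ and $\mu_2$; shrinking the variance of $x_i$ to $0.01$ restores the two modes.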