Type I Error Rate Higher than Significance Level in Likelihood Ratio Test for Exponential Distribution

Problem Description

I am running a likelihood ratio test (LRT) to decide whether data are adequately described by a one-parameter exponential distribution (location parameter fixed at zero) or require a two-parameter exponential distribution. However, when I simulate data under the null hypothesis, the observed Type I error rate is substantially higher than the nominal significance level. I am looking for insights into why this discrepancy occurs.

Mathematical Formulation

The LRT compares two models:

  1. Full Model: A two-parameter exponential distribution, estimating both the rate $\lambda$ and location $\theta$ parameters.
  2. Reduced Model: A one-parameter exponential distribution, with the location parameter fixed at zero and only the rate parameter estimated.

The hypothesis tested is:

  • Null Hypothesis ($H_0$): The data follows a one-parameter exponential distribution ($\theta = 0$).
  • Alternative Hypothesis ($H_1$): The data follows a two-parameter exponential distribution ($\theta \neq 0$).

The test statistic is $\Lambda = -2\,(\text{LL}_{\text{reduced}} - \text{LL}_{\text{full}})$, where $\text{LL}_{\text{reduced}}$ and $\text{LL}_{\text{full}}$ are the maximized log-likelihoods of the reduced and full models, respectively. The p-value is then obtained from the chi-square distribution with 1 degree of freedom.
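For reference, both models have closed-form MLEs ($\hat\theta = \min_i x_i$, $\hat\lambda_{\text{full}} = 1/(\bar x - \hat\theta)$ for the full model, $\hat\lambda = 1/\bar x$ for the reduced model), so the statistic can also be computed without numerical optimization. A minimal sketch using those standard results:

```python
import numpy as np
from scipy.stats import chi2

def lrt_closed_form(data):
    """LRT statistic and chi-square(1) p-value using closed-form exponential MLEs."""
    n = len(data)
    xbar = np.mean(data)
    theta_hat = np.min(data)  # MLE of the location parameter
    # At the MLE, each log-likelihood simplifies to n*log(lambda_hat) - n
    ll_full = n * np.log(1.0 / (xbar - theta_hat)) - n
    ll_reduced = n * np.log(1.0 / xbar) - n
    stat = -2.0 * (ll_reduced - ll_full)  # equals 2n * log(xbar / (xbar - theta_hat))
    return stat, chi2.sf(stat, df=1)
```

This should agree with the optimization-based version below up to numerical tolerance, which is a useful check that any Type I error inflation is not an optimizer artifact.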

Python Implementation

Here's the code I'm using for the LRT and the simulation to calculate the Type I error rate:

import numpy as np
from scipy.stats import expon, chi2
from scipy.optimize import minimize

def log_likelihood_two_param(data, params):
    """Log-likelihood of the two-parameter (shifted) exponential distribution."""
    theta, lamb = params
    # Any observation below theta has zero density, so the likelihood is zero
    if np.any(data < theta) or lamb <= 0:
        return -np.inf
    return np.sum(np.log(lamb) - lamb * (data - theta))

def log_likelihood_one_param(data, lamb):
    """Log-likelihood of the one-parameter exponential distribution (theta = 0)."""
    if lamb <= 0:
        return -np.inf
    return np.sum(np.log(lamb) - lamb * data)

def likelihood_ratio_test(data):
    # Full model: theta cannot exceed the sample minimum (otherwise the likelihood is zero)
    res_full = minimize(lambda params: -log_likelihood_two_param(data, params),
                        x0=[np.min(data), 1 / np.mean(data)], method='L-BFGS-B',
                        bounds=[(None, np.min(data)), (1e-10, None)])
    LL_full = -res_full.fun
    # Reduced model: only the rate parameter is estimated
    res_reduced = minimize(lambda lamb: -log_likelihood_one_param(data, lamb),
                           x0=[1 / np.mean(data)], method='L-BFGS-B',
                           bounds=[(1e-10, None)])
    LL_reduced = -res_reduced.fun
    Lambda = -2 * (LL_reduced - LL_full)
    # p-value from the chi-square distribution with 1 degree of freedom
    p_value = chi2.sf(Lambda, 1)
    return p_value

def simulate_and_test(sample_size, theta_true, lambda_true, significance_level, num_simulations):
    """Count rejections of H0 across repeated simulated datasets."""
    type_1_errors = 0
    for _ in range(num_simulations):
        # Shifted exponential sample (theta_true = 0 under the null)
        data = np.random.exponential(1 / lambda_true, sample_size) + theta_true
        p_value = likelihood_ratio_test(data)
        if p_value < significance_level:
            type_1_errors += 1
    return type_1_errors

# Simulation parameters
num_simulations = 1000
significance_level = 0.05
sample_size = 100
theta_true = 0  # Null hypothesis
lambda_true = 0.1 # Rate parameter

type_1_error_count = simulate_and_test(sample_size, theta_true, lambda_true, significance_level, num_simulations)
type_1_error_rate = type_1_error_count / num_simulations
print(type_1_error_rate)

The output shows a Type I error rate of around 0.15, far above the nominal significance level of 0.05.
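One sanity check on the simulation itself (not part of the code above) is that 0.15 lies far outside Monte Carlo noise: if the true rejection rate were 0.05, the standard error of the estimated rate over 1000 replications would be only about 0.007, so the inflation cannot be explained by simulation variance. A sketch of that calculation:

```python
import math

def mc_standard_error(p, n):
    """Standard error of an estimated rejection rate from n Monte Carlo replications."""
    return math.sqrt(p * (1 - p) / n)

# Under a hypothetical true rate of 0.05 with 1000 replications:
se = mc_standard_error(0.05, 1000)  # roughly 0.007
```

An observed rate of 0.15 is therefore many standard errors away from the nominal 0.05.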

Question

Could anyone help me understand why the Type I error rate is higher than expected in this simulation? Are there any issues with the implementation of the LRT or the assumptions underlying the test?