MLE for the sum of a normally distributed variable and a constant added after a specific time


We start off with a normally distributed random variable $X$ with known $\mu=100$ and $\sigma^2=1$; after $\vartheta$ days, a constant $1$ gets added to the value each day. Given samples $X_1,\dots,X_n$, how can I estimate $\vartheta$ by maximum likelihood?


Writing $\theta$ for the question's $\vartheta$, your likelihood is $\prod_n \phi(X_n-100 - \mathbb I[n \ge \theta])$, where $\phi$ is the standard normal pdf and $\mathbb I$ is the indicator function, so the log-likelihood is $$\sum_n \log(\phi(X_n-100 - \mathbb I[n \ge \theta])).$$

Since $\theta$ is discrete, you cannot differentiate with respect to it, but you can look at the partial sums of $\log(\phi(X_n-101)) - \log(\phi(X_n-100))$: the MLE is the day immediately after the one where this partial sum is most negative (if it is never negative, then $\hat \theta =1$).
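To see why the partial sums are the right quantity to track, here is a short derivation, assuming as in the question that $\sigma^2=1$ and the shift is exactly $1$, with $N$ the number of observations. Since $\log\phi(z) = -\tfrac12 z^2 - \tfrac12\log(2\pi)$, each difference is linear in $X_n$:

$$d_n = \log\phi(X_n-101) - \log\phi(X_n-100) = \tfrac12(X_n-100)^2 - \tfrac12(X_n-101)^2 = X_n - 100.5.$$

Splitting the log-likelihood at $\theta$ and writing $S_k = \sum_{n=1}^{k} d_n$ with $S_0 = 0$:

$$\ell(\theta) = \sum_{n=1}^{N}\log\phi(X_n-100) + \sum_{n=\theta}^{N} d_n = \text{const} + S_N - S_{\theta-1},$$

so maximizing $\ell(\theta)$ over $\theta \in \{1,\dots,N+1\}$ is the same as minimizing $S_{\theta-1}$:

$$\hat\theta = 1 + \arg\min_{0 \le k \le N} S_k.$$

If no partial sum is negative, the minimum is $S_0 = 0$ and $\hat\theta = 1$; $\hat\theta = N+1$ would mean no shift is detected within the observed days.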

Here is a simulated example in R with $35$ samples and a true $\theta$ of $15$.

```r
set.seed(2023)
theta <- 15   # true change day
maxn  <- 35   # number of daily observations
jump  <- 1    # shift added from day theta onwards
X <- rnorm(maxn, mean = 100 + ifelse((1:maxn) >= theta, jump, 0), sd = 1)
X
# data in this example
# [1]  99.91622  99.01706  98.12493  99.81386  99.36651 101.09080  99.08627
# [8] 101.00164  99.60073  99.53188 100.32696  99.58725 100.56204 100.66336
#[15] 100.39710 101.69838 101.59585 101.45209 101.89674 101.57222 100.58835
#[22] 100.70567 102.21857 101.24411 100.55485  99.15220 100.37117 100.13892
#[29] 102.51492 103.73524 100.72512 102.27665 100.18902 100.95508 100.36059

# log-likelihood gain per observation from assuming the shift has already happened
diffloglike <- dnorm(X, 100 + jump, 1, log = TRUE) - dnorm(X, 100, 1, log = TRUE)
cumdiffloglike <- cumsum(diffloglike)
# MLE: the day after the most negative partial sum, or day 1 if none is negative
thetahat <- if (min(cumdiffloglike) > 0) 1 else which.min(cumdiffloglike) + 1
thetahat
# 13
```

This finds the minimum of the partial sums at $12$, as shown in the chart below, but it came close to doing so at $13$, $14$ or $15$ because three successive values were close to $100.5$ (each term of the partial sum equals $X_n - 100.5$, so such values barely move it). So the MLE with this data is $\hat \theta =13$, but it could easily have been as high as $16$ with a slight change to the observed data. Change the seed or parameters above to see how the MLE varies with the data. I deliberately picked an example which did not give the correct answer.
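To see that variability without re-running by hand, one can repeat the simulation many times and tabulate $\hat\theta$. This is a minimal sketch, assuming the same setup as the example ($\theta=15$, $35$ days, jump $1$, $\sigma=1$); `mle_theta` and `reps` are hypothetical names introduced here, and the seed is arbitrary.

```r
# Hypothetical helper: MLE of the change day via the partial-sum rule,
# assuming a known base mean, known sd and a known jump size.
mle_theta <- function(X, base = 100, jump = 1, sd = 1) {
  d <- dnorm(X, base + jump, sd, log = TRUE) - dnorm(X, base, sd, log = TRUE)
  # prepend S_0 = 0: index 1 means "never negative", i.e. theta-hat = 1
  which.min(c(0, cumsum(d)))
}

set.seed(1)   # arbitrary seed
theta <- 15
maxn  <- 35
jump  <- 1
reps  <- 2000 # number of simulated data sets

est <- replicate(reps, {
  X <- rnorm(maxn, mean = 100 + ifelse(seq_len(maxn) >= theta, jump, 0), sd = 1)
  mle_theta(X, jump = jump)
})

table(est)                   # sampling distribution of theta-hat
mean(est == theta)           # share of runs recovering the true day exactly
mean(abs(est - theta) <= 2)  # share of runs within two days of it
```

Prepending $S_0 = 0$ lets `which.min` return $\hat\theta$ directly, so the $\hat\theta = 1$ case needs no separate branch.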

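```r
# partial sums against day, with lines marking the minimum used for thetahat
plot(cumdiffloglike, main = "Cumulative difference in log-likelihood")
abline(v = thetahat - 1)                  # day of the most negative partial sum
abline(h = cumdiffloglike[thetahat - 1])  # value of that minimum
```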

[Figure: Cumulative difference in log-likelihood]
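As a check that the partial-sum shortcut really maximizes the likelihood, this sketch evaluates the full log-likelihood for every candidate day $1,\dots,36$ on the data printed in the example (copied to 5 decimal places, which is enough here because the two lowest partial sums, at days $12$ and $13$, differ by about $0.06$) and compares the two answers. `loglik` is a hypothetical helper name; candidate $36$ stands for "no shift within the observed days".

```r
# Data printed in the example (rounded to 5 decimal places)
X <- c(99.91622, 99.01706, 98.12493, 99.81386, 99.36651, 101.09080, 99.08627,
       101.00164, 99.60073, 99.53188, 100.32696, 99.58725, 100.56204, 100.66336,
       100.39710, 101.69838, 101.59585, 101.45209, 101.89674, 101.57222, 100.58835,
       100.70567, 102.21857, 101.24411, 100.55485, 99.15220, 100.37117, 100.13892,
       102.51492, 103.73524, 100.72512, 102.27665, 100.18902, 100.95508, 100.36059)
N <- length(X)

# Hypothetical helper: full log-likelihood if the shift starts on day th
loglik <- function(th, X, base = 100, jump = 1) {
  shifted <- seq_along(X) >= th
  sum(dnorm(X, mean = base + jump * shifted, sd = 1, log = TRUE))
}

candidates  <- 1:(N + 1)
ll          <- sapply(candidates, loglik, X = X)
theta_brute <- candidates[which.max(ll)]

# Partial-sum shortcut: prepend S_0 = 0, the MLE is the minimizing index
S <- cumsum(dnorm(X, 101, 1, log = TRUE) - dnorm(X, 100, 1, log = TRUE))
theta_partial <- which.min(c(0, S))

c(theta_brute, theta_partial)
# [1] 13 13
```

Up to a constant, `ll[th]` equals `-c(0, S)[th]`, which is exactly the identity behind the shortcut.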