Optimization methods for probability distributions with respect to percentiles (for example, medians)?


This question became quite a mess because of unclarities and the need for several edits, so I have made a new one over here. Please do not answer this question; go to the updated one instead.


I am aware of many methods in statistics to optimize with respect to moments such as the mean $\mu$ and variance $\sigma^2$, but which methods exist to optimize with respect to a given median ($50\%$) or, more generally, an $x\%$ percentile? I don't remember this being treated in any literature I have read.


Edit: Clarification. The intent was to ask about methods that estimate the parameters of a parametric distribution given the positions of prescribed percentiles. For a trivial example, prescribing $50\%$ at $x=1$ and $100\%$ at $x=2$ for a uniform distribution yields the uniform distribution with density constant at $0.5$ on the interval $x\in [0,2]$ and $0$ everywhere else.
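As a quick sanity check of this example in R (using the built-in punif and dunif):

punif(1, min=0, max=2)    # 0.5, so the 50% point is at x = 1
punif(2, min=0, max=2)    # 1, so the 100% point is at x = 2
dunif(1.5, min=0, max=2)  # density is the constant 0.5 on [0, 2]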


Edit 2

OK, I will try to rephrase again. How do I find the parameters of a distribution whose cumulative distribution function $F$ satisfies $F(x_k) = c_k$ for some $n$ prescribed pairs $(x_k,c_k)$, $k \in \{1,2,\cdots,n\}$, or, in terms of the density function $f$, $\int_{-\infty}^{x_k} f(t)\, dt=c_k$? Or, if this system is not solvable, how do I minimize the error in some suitable way? (Maybe this error-minimization procedure is where the effort should be spent?)
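One generic approach is to treat the prescribed pairs as data and minimize the sum of squared errors between the model CDF and the targets numerically. Here is a sketch in R using optim; the choice of the normal family and the particular $(x_k, c_k)$ values are purely illustrative assumptions:

xk = c(-1, 0, 2)           # hypothetical prescribed positions x_k
ck = c(0.25, 0.50, 0.90)   # hypothetical prescribed probabilities c_k

sse = function(par) {
  mu = par[1];  sigma = exp(par[2])   # log scale keeps sigma > 0
  sum((pnorm(xk, mu, sigma) - ck)^2)  # squared CDF error at the x_k
}
fit = optim(c(0, 0), sse)             # minimize over (mu, log sigma)
c(mu = fit$par[1], sigma = exp(fit$par[2]))

With three constraints and two parameters, the system is overdetermined and generally has no exact solution, so the least-squares fit is a natural compromise; with as many constraints as parameters, one could instead solve the equations exactly (when a solution exists).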

Best answer:

Not sure what you mean by "optimize with respect to." An obvious way to estimate the population median, sometimes denoted $\eta,$ is to find the sample median, sometimes denoted $H.$

For many symmetrical distributions, the population mean $\mu$ is the same as the population median $\eta.$ Then the 'best' estimator of $\eta$ from a sample $X_1, X_2, \dots, X_n$ may be the sample mean $\bar X.$ An example is a normal distribution. The unbiased estimator of the 'center' $\mu = \eta$ with the smallest variance is $\bar X.$

Without going into the distribution theory, here is a demonstration via simulation for samples of size $n = 10$ from a normal distribution, that both $\bar X$ (A in the R code) and $H$ are unbiased estimators of the center, but that the sample mean is a less-variable estimator of the center than is the sample median.

m = 10^6;  n = 10
x = rnorm(m*n, 100, 15)
DTA = matrix(x, nrow=m)   # each of m rows is a normal sample of size n
A = rowMeans(DTA)         # vector of m sample means
H = apply(DTA, 1, median) # vector of m sample medians
mean(A);  mean(H)         # A and H both unbiased estimators of center
## 99.99282                 # aprx E(A) = 100
## 99.99174                 # aprx E(H) = 100
sd(A);  sd(H)             # But A has less variability than H
## 4.747664                 # aprx SD(A) = 15/sqrt(10) = 4.743416
## 5.585213                 # indicates SD(H) > SD(A)

However, the Laplace (double exponential) distribution is also symmetrical and the best estimator of the center $\mu = \eta$ is the sample median $H.$

Here is an analogous demonstration that, for a Laplace distribution, the sample median is a better estimator of the center than the sample mean. I generated a Laplace random variable $X$ as $X = U - V + 100,$ where $U$ and $V$ are independent exponential random variables with rate $1$. Thus $E(X) = 100.$

m = 10^6;  n = 10
x = rexp(m*n)-rexp(m*n)+100
DTA = matrix(x, nrow=m)
A = rowMeans(DTA);  H = apply(DTA, 1, median)
mean(A);  mean(H)  # Both estimators unbiased
## 99.99987
## 99.99969
sd(A);  sd(H)      # But median H has less variability
## 0.4466381
## 0.3809883

Also, there are cases in which neither $\bar X$ nor $H$ is best. An example is uniform distributions of the form $\mathsf{Unif}(0, \theta),$ for which the center is $\mu = \eta = \theta/2.$ An unbiased estimator of $\theta$ is $\hat \theta = \frac{n+1}{n}X_{(n)},$ where $X_{(n)} = \max_i X_i$ is the sample maximum. Then the best estimator of the center is $\hat \theta/2.$
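A simulation in the same style as the demos above (the setup $\mathsf{Unif}(0,\theta)$ with $\theta = 2$ is my own choice) suggests how much $\hat\theta/2$ improves on both $\bar X$ and $H$:

m = 10^5;  n = 10;  theta = 2
DTA = matrix(runif(m*n, 0, theta), nrow=m)
A = rowMeans(DTA)                     # sample means
H = apply(DTA, 1, median)             # sample medians
T = (n+1)/n * apply(DTA, 1, max) / 2  # half the unbiased max-based estimator
mean(A);  mean(H);  mean(T)           # all approximately theta/2 = 1
sd(A);  sd(H);  sd(T)                 # T should show the smallest SD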

There are even some useful symmetrical distributions that do not have a population mean $\mu,$ and so one may use the median $\eta$ as the center and try to estimate it by $H$. An example is Student's t distribution with one degree of freedom (the Cauchy distribution).
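A brief demonstration for Student's t with one degree of freedom (the shift of $+100$ as the center is my own illustrative choice): the sample mean of Cauchy data is itself Cauchy, hence unstable, while the sample median behaves well.

m = 10^5;  n = 10
DTA = matrix(rt(m*n, df=1) + 100, nrow=m)  # t(1) shifted so the median is 100
A = rowMeans(DTA);  H = apply(DTA, 1, median)
median(A);  median(H)  # both near 100 (medians used because E(A) does not exist)
sd(A);  sd(H)          # sd(A) is huge and erratic across runs; sd(H) stays modest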

Finally, there are distributions in which $\mu \ne \eta,$ such as the family of gamma distributions, including the exponential. For these, it may be best to use $\bar X$ to estimate $\mu,$ find the relationship between $\mu$ and $\eta,$ and modify the estimator of $\mu$ to get an estimator of $\eta.$
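For instance, an exponential distribution with mean $\mu$ has median $\eta = \mu \ln 2,$ so $(\ln 2)\bar X$ is an unbiased estimator of $\eta.$ Here is a simulation sketch in the same style (the choice $\mu = 5$ is arbitrary):

m = 10^5;  n = 10;  mu = 5
DTA = matrix(rexp(m*n, rate=1/mu), nrow=m)
eta = mu * log(2)            # true median, about 3.466
E1 = log(2) * rowMeans(DTA)  # estimator via the mean-median relationship
H  = apply(DTA, 1, median)   # plain sample median
mean(E1);  mean(H)           # E1 is unbiased for eta; H is biased upward for small n
sd(E1);  sd(H)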

In a mathematical statistics course one important topic of discussion is methods of finding optimal estimators for various parameters. This is not the place for a full discussion. If you can say something about your background in statistics and the situation(s) in which you want to estimate population medians and quantiles, perhaps I or someone else can give an answer targeted on your primary interests.