Estimating the midpoint of an interval from $n$ uniform random variables

Let $X_1,\dots,X_n$ be independent random variables, uniformly distributed on the interval $[a,b]$ for unknown $a,b \in \mathbb{R}$ with $a < b$. The midpoint of the interval is to be estimated from these random variables.
The task is to write down an estimation problem and then check whether the following estimators are unbiased:
a) $T_1(x) = \sum\limits_{i=1}^{n} x_i$, $~~~~~$ b) $T_2(x) = \frac{1}{2} (\max_i x_i + \min_i x_i)$, $~~~~~$ c) $T_3(x) = x_1$.

Now, our definition of an estimation problem is the following.
An estimation problem consists of:
1) A sample space $(S,\mathcal{C})$ (a measurable space), so $\mathcal{C}$ is a $\sigma$-algebra.
2) A family $\{\mathbb{P}_\vartheta: \vartheta \in \theta\}$ of probability measures on $(S, \mathcal{C})$, where $\theta$ is an arbitrary parameter set.
3) $g: \theta \longrightarrow \Gamma \subset \mathbb{R}^{d}$, an estimating function (most of the time $g$ is the identity, so $\Gamma = \theta$ and $g(\vartheta) = \vartheta$).

Since I am quite new to statistics, this definition isn't completely clear to me.
1) To set up the estimation problem I chose $([a,b]^{n}, \mathcal{B}([a,b]^{n}))$ as the sample space ($\mathcal{B}([a,b]^{n})$ is the Borel $\sigma$-algebra on $[a,b]^{n}$), because every sample has to lie between $a$ and $b$. But I am not sure if this is correct.
2) This is the part I understand the least. What is the role of $\vartheta$? And how can I find the measure $\mathbb{P}_\vartheta(A)$ for $A \in \mathcal{B}([a,b]^{n})$?

I hope I have used the correct translations for the mathematical terms.

Best answer

In the spirit of @Henry's suggestion: Just to start the discussion, suppose $n = 10$ and you're taking ten observations from the probability distribution $\mathsf{Unif}(a, b).$ Then $\mu = E(X_i) = \frac{a+b}{2}.$ You want functions $T$ of the data for which $E(T) = \mu.$

Of course, $a, b$ with $a < b$ are unknown in your problem, but for the moment, let's suppose $a = 0,\, b=1.$ Here are ten observations from $\mathsf{Unif}(0, 1)$ ('standard uniform'), generated by R statistical software.

    x = runif(10);  x
    [1] 0.8271816 0.1792871 0.9898422 0.6193321 0.9893062
    [6] 0.9166923 0.2977837 0.3063521 0.6174639 0.6119303

For these data, values of the three 'estimators' $T_1, T_2, T_3,$ respectively, are shown below:

    sum(x); .5*(max(x) + min(x)); x[1]
    ## 6.355172
    ## 0.584565
    ## 0.8271816

In this case, $T_1 = 6.355172$ is not a promising estimator of $\mu = 1/2$: it does not even lie between $a = 0$ and $b = 1.$ (Dividing the total by $n = 10$ to get the sample mean might be an improvement.) The average of the max and min (often called the 'midrange'), $T_2 = 0.584565,$ seems more promising; at least it is somewhere near the middle of the support $(a,b) = (0,1)$ of the distribution from which we are sampling. Finally, $T_3 = 0.8271816$ also seems possible; it lies in $(0, 1)$ and obviously $E(T_3) = \mu = 1/2.$
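
If you want to recompute these quantities yourself, including the sample mean $T_1/n$, a minimal R sketch follows. The ten values are retyped from the printed output, so they are rounded to seven digits, and the variable names are only illustrative.

```r
# Ten observations retyped from the printed output (rounded to 7 digits)
x <- c(0.8271816, 0.1792871, 0.9898422, 0.6193321, 0.9893062,
       0.9166923, 0.2977837, 0.3063521, 0.6174639, 0.6119303)

t1      <- sum(x)                  # T_1: plain sum, grows with n
t1_mean <- t1 / length(x)          # T_1 / n: the sample mean
t2      <- (max(x) + min(x)) / 2   # T_2: midrange, average of max and min
t3      <- x[1]                    # T_3: first observation only

c(T1 = t1, T1_over_n = t1_mean, T2 = t2, T3 = t3)
```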

Intuitively, it seems that $T_1/n$ and $T_2$ might be better choices than $T_3$ because the first two use all the data and in ways that seem 'fair' (in the sense of unbiasedness).

It is not difficult to show that $E(T_1/n) = E(T_2) = E(T_3) = \mu = \frac{a+b}{2} = 1/2.$ No doubt, later in your course you will discuss which one of these 'unbiased estimators' is "best". One criterion for "best" is to choose the unbiased estimator with the smallest variance. (Roughly speaking, "aimed at the right target" and "aimed most precisely".)
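
For completeness, here is a sketch of why these expectations hold for general $a < b$. It relies on the standard formulas for the expected minimum and maximum of $n$ independent uniform variables, which are not stated in the question, so treat them as something to verify (for example via the density of the order statistics).

$$
\begin{aligned}
E(T_1) &= \sum_{i=1}^{n} E(X_i) = n\,\frac{a+b}{2}, \qquad E(T_1/n) = \frac{a+b}{2},\\
E\big(\min_i X_i\big) &= a + \frac{b-a}{n+1}, \qquad E\big(\max_i X_i\big) = b - \frac{b-a}{n+1},\\
E(T_2) &= \tfrac{1}{2}\Big(E\big(\max_i X_i\big) + E\big(\min_i X_i\big)\Big) = \frac{a+b}{2},\\
E(T_3) &= E(X_1) = \frac{a+b}{2}.
\end{aligned}
$$

So for $n \ge 2$ the sum $T_1$ itself is not unbiased for the midpoint, whereas $T_1/n$, $T_2$ and $T_3$ are.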

There are unbiased estimators in addition to the three mentioned here. (The sample median 0.618398 is one additional candidate). So I'll leave it to you to discover later whether it is possible to say that one of these four is the best of all possible unbiased estimators.
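
If you would like to explore this numerically before doing the theory, a simulation along the following lines may help. It is only a sketch: the seed, the number of replications `m`, and the choice $a = 0$, $b = 1$, $n = 10$ are my own assumptions, and simulated means and variances are approximations, not proofs.

```r
set.seed(2024)                 # arbitrary seed, only for reproducibility
a <- 0; b <- 1                 # assumed endpoints for the experiment
n <- 10                        # sample size
m <- 20000                     # number of simulated samples

# One simulated sample of size n per row
x <- matrix(runif(m * n, min = a, max = b), nrow = m)

# Four candidate estimators of the midpoint, computed for each sample
est <- cbind(
  mean     = rowMeans(x),
  midrange = (apply(x, 1, max) + apply(x, 1, min)) / 2,
  first    = x[, 1],
  median   = apply(x, 1, median)
)

sim_means <- colMeans(est)       # each should be close to (a + b) / 2
sim_vars  <- apply(est, 2, var)  # smaller variance means more precise

sim_means
sim_vars
```

All four simulated means should come out near $1/2$, while the variances differ noticeably, which is what the comparison of unbiased estimators is about.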

Note: In this discussion I am just taking an informal and intuitive approach in hopes it will help you understand the importance of your problem. I am ignoring the "foundational issues" here, not because they are unimportant, but because I don't have access to the notation, definitions, and development from your text. So it is best if you figure that part out on your own or ask your instructor.
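
For what it's worth, going only by the definition quoted in the question, one plausible way to write the estimation problem is the following. The choice of $\mathbb{R}^n$ as sample space and the symbol $\lambda^n$ for $n$-dimensional Lebesgue measure are my assumptions, so check them against your text.

$$
\begin{aligned}
(S,\mathcal{C}) &= \big(\mathbb{R}^{n}, \mathcal{B}(\mathbb{R}^{n})\big),\\
\theta &= \{(a,b) \in \mathbb{R}^{2} : a < b\}, \qquad \vartheta = (a,b),\\
\mathbb{P}_\vartheta(A) &= \frac{\lambda^{n}\big(A \cap [a,b]^{n}\big)}{(b-a)^{n}}, \qquad A \in \mathcal{B}(\mathbb{R}^{n}),\\
g(\vartheta) &= \frac{a+b}{2}.
\end{aligned}
$$

The sample space should not depend on the unknown $a, b$, which is why $\mathbb{R}^n$ appears here rather than $[a,b]^n$. The parameter $\vartheta$ simply labels which uniform distribution generated the data, and an estimator $T$ is unbiased if $E_\vartheta(T) = g(\vartheta)$ for every $\vartheta \in \theta$.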