Proving a martingale property of the empirical distribution function


Let $X_1,\ldots,X_n$ be iid random variables from the uniform distribution on $[0,1]$. Let $F_n(t)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{X_i\leq t\}}$ be the empirical distribution function.

I would like to prove that for $0\leq s\leq t <1$, $$E \left(\frac{F_n(t)-t}{1-t}\Big|F_n(u),u\leq s\right)=E \left(\frac{F_n(t)-t}{1-t}\Big|F_n(s)\right)=\frac{F_n(s)-s}{1-s}.$$

Any hint or reference is appreciated.


There are 2 best solutions below


Throughout my answer, I will consider iid random variables $X_i$ with a continuous cumulative distribution function $F$. For the particular case that the $X_i$ are uniformly distributed on $[0,1]$ we can apply the result below for $F(r) = r$, $r \in [0,1]$.

Step 1: Determine the distribution of the random vector $$(F_n(s_0),\ldots,F_n(s_k))$$ for any choice of $0 \leq s_0 \leq \ldots \leq s_k$.

Fix $0 \leq s_0 \leq \ldots \leq s_k$ for some $k \in \mathbb{N}$. For each $i \in \{1,\ldots,n\}$ the events $$\{X_i \leq s_0\}, \{s_0 < X_i \leq s_1\}, \ldots, \{s_{k-1} < X_i \leq s_{k}\}, \{X_i>s_{k}\}$$

are pairwise disjoint and their probabilities add up to $1$. Since the random variables $X_i$, $i=1,\ldots,n$, are independent and identically distributed, it follows that the random vector

$$n (F_n(s_0),F_n(s_1)-F_n(s_0),\ldots,F_n(s_{k})-F_n(s_{k-1}),1-F_n(s_{k})) \\ = \left( \sum_{i=1}^n 1_{\{X_i \leq s_0\}}, \sum_{i=1}^n 1_{\{s_0<X_i \leq s_1\}},\ldots, \sum_{i=1}^n 1_{\{s_{k-1} < X_i \leq s_{k}\}},\sum_{i=1}^n 1_{\{X_i>s_{k}\}} \right)$$

has a multinomial distribution with parameters $n$ and $$p := (F(s_0),F(s_1)-F(s_0),\ldots,F(s_{k})-F(s_{k-1}),1-F(s_{k}));$$ here $F$ denotes the cumulative distribution function of $X_1$. In particular, we have for any $0 \leq m_0 \leq \ldots \leq m_{k} \leq n$

$$\begin{align*} &\mathbb{P} \left( \forall j=0,\ldots,k: F_n(s_j) = \frac{m_j}{n} \right) \\ &= \mathbb{P} \bigg( F_n(s_0) = \frac{m_0}{n}, F_n(s_1)-F_n(s_0) =\frac{m_1-m_0}{n},\ldots,\\ &\qquad \quad F_n(s_k)-F_n(s_{k-1}) = \frac{m_k-m_{k-1}}{n}, 1-F_n(s_k) = \frac{n-m_k}{n} \bigg) \\ &= b_{n,m_0,\ldots,m_k} F(s_0)^{m_0} \prod_{j=1}^{k} (F(s_j)-F(s_{j-1}))^{m_j-m_{j-1}} (1-F(s_k))^{n-m_k} \tag{1} \end{align*}$$

where

$$b_{n,m_0,\ldots,m_k} := \begin{pmatrix} n \\ m_0, m_1-m_0, \ldots, m_k-m_{k-1}, n-m_k \end{pmatrix}$$

is a multinomial coefficient.
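In the uniform case $F(r)=r$, formula $(1)$ can be checked numerically. Below is a minimal sketch (the helper names `joint_pmf` and `simulate` are mine, not from any library): it evaluates the multinomial formula directly and compares it against a Monte Carlo estimate.

```python
import math
import random

# Direct evaluation of formula (1) in the uniform case F(r) = r.
def joint_pmf(n, s, m):
    """P(F_n(s_j) = m_j/n for all j), for nondecreasing s in [0,1] and m."""
    s_ext = [0.0] + list(s) + [1.0]
    m_ext = [0] + list(m) + [n]
    p = math.factorial(n)
    for j in range(1, len(s_ext)):
        dm = m_ext[j] - m_ext[j - 1]          # increment of the counts
        dp = s_ext[j] - s_ext[j - 1]          # probability of the slab
        p *= dp ** dm / math.factorial(dm)
    return p

def simulate(n, s, m, trials=200_000, seed=1):
    """Monte Carlo estimate of the same joint probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        if [sum(xi <= sj for xi in x) for sj in s] == list(m):
            hits += 1
    return hits / trials

n, s, m = 4, (0.3, 0.6), (1, 3)
print(joint_pmf(n, s, m), simulate(n, s, m))  # both close to 0.1296
```

The exact value $\binom{4}{1,2,1}\,0.3^1\,0.3^2\,0.4^1 = 0.1296$ agrees with the simulation to within Monte Carlo error.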

Step 2: Show that $$\mathbb{P}\left(F_n(t) = \frac{m}{n} \mid F_n(u), u \leq r \right) = \mathbb{P}\left(F_n(t) = \frac{m}{n} \mid F_n(r) \right)$$ for any $0 \leq m \leq n$ and $r \leq t$.

Fix $0 \leq r_0 < \ldots < r_i =: r \leq t $ for some $i \in \mathbb{N}$. To prove the assertion, we are going to apply Step 1 several times. First of all, it follows from Step 1 (applied twice) that for any $0 \leq m_0 \leq \ldots \leq m_{i} \leq m \leq n$

$$\begin{align*} & \mathbb{P}\left( F_n(t) = \frac{m}{n} \mid \forall j=0,\ldots,i: F_n(r_j) = \frac{m_j}{n} \right)\\ &= \frac{\mathbb{P}(\forall j=0,\ldots,i+1: F_n(r_j) = m_j/n)}{\mathbb{P}(\forall j=0,\ldots,i: F_n(r_j) = m_j/n)} \\ &= \begin{pmatrix} n-m_i \\ m-m_i \end{pmatrix} \left( \frac{F(t)-F(r_i)}{1-F(r_i)} \right)^{m-m_{i}} \left( \frac{1-F(t)}{1-F(r_i)} \right)^{n-m} \end{align*}$$ where, in the second line of the calculation, we have set $r_{i+1} := t$ and $m_{i+1} := m$. On the other hand, it follows from Step 1 that

$$\begin{align*} &\mathbb{P} \left( F_n(t)= \frac{m}{n}, F_n(r_i) = \frac{m_i}{n} \right)\\ &= \begin{pmatrix} n \\ m_i, m-m_i, n-m \end{pmatrix} F(r_i)^{m_i} (F(t)-F(r_i))^{m-m_i} (1-F(t))^{n-m}\end{align*}$$ and

$$\mathbb{P} \left( F_n(r_i) = \frac{m_i}{n} \right) = \begin{pmatrix} n \\ m_i \end{pmatrix} F(r_i)^{m_i} (1-F(r_i))^{n-m_i},$$

which yields that

$$\begin{align*} &\mathbb{P}\left( F_n(t)= \frac{m}{n} \mid F_n(r_i) = \frac{m_i}{n} \right) \\ &= \frac{\mathbb{P}(F_n(t)=m/n, F_n(r_i)= m_i/n)}{\mathbb{P}(F_n(r_i)=m_i/n)} \\ &= \begin{pmatrix} n-m_i \\ m-m_i \end{pmatrix} \left( \frac{F(t)-F(r_i)}{1-F(r_i)} \right)^{m-m_{i}} \left( \frac{1-F(t)}{1-F(r_i)} \right)^{n-m}. \end{align*}$$

Comparing this with the earlier computation we conclude that

$$\begin{align*} &\mathbb{P}\left( F_n(t)= \frac{m}{n} \mid F_n(r_i) = \frac{m_i}{n} \right)\\ &\quad = \mathbb{P}\left( F_n(t)= \frac{m}{n} \mid \forall j=0,\ldots,i: F_n(r_j) = \frac{m_j}{n} \right). \end{align*}$$

Since sets of the form $$\bigcap_{j=0}^i \{F_n(r_j) = m_j/n\}$$ with $0 \leq r_0 \leq \ldots \leq r_i \leq r$, $0 \leq m_0 \leq \ldots \leq m_i \leq n$, $i \in \mathbb{N}$, are a $\cap$-stable generator of $\sigma(F_n(u); u \leq r)$, this proves the assertion.
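As a numerical sanity check on Step 2 in the uniform case $F(r)=r$: conditionally on $F_n(r)=m_i/n$, the count $nF_n(t)-m_i$ should be binomial with parameters $n-m_i$ and $(t-r)/(1-r)$. A minimal sketch (`cond_pmf` and `simulate_cond` are ad-hoc helper names):

```python
import math
import random

# Check (uniform case F(r) = r): conditionally on F_n(r) = m_i/n, the count
# n*F_n(t) - m_i is Binomial(n - m_i, (t - r)/(1 - r)).
def cond_pmf(n, r, t, m_i, m):
    """Binomial formula from Step 2: P(F_n(t) = m/n | F_n(r) = m_i/n)."""
    p = (t - r) / (1 - r)
    return math.comb(n - m_i, m - m_i) * p ** (m - m_i) * (1 - p) ** (n - m)

def simulate_cond(n, r, t, m_i, m, trials=300_000, seed=2):
    """Monte Carlo estimate of the same conditional probability."""
    rng = random.Random(seed)
    cond = joint = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        if sum(xi <= r for xi in x) == m_i:
            cond += 1
            joint += sum(xi <= t for xi in x) == m
    return joint / cond

print(cond_pmf(5, 0.2, 0.5, 1, 3), simulate_cond(5, 0.2, 0.5, 1, 3))
```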

Step 3: Prove that $$\mathbb{E}(F_n(t) \mid F_n(u), u \leq s) = F_n(s) + (1-F_n(s)) \frac{F(t)-F(s)}{1-F(s)}$$ for any $s \leq t$ such that $F(s)<1$.

Fix $s \leq t$ and $m_1 \in \{0,\ldots,n\}$. Using Step 1, we find

$$\begin{align*} &\mathbb{E}\left(F_n(t) \mid F_n(s) = \frac{m_1}{n} \right) \\ &= \sum_{m_2=m_1}^n \frac{m_2}{n} \mathbb{P}\left(F_n(t) = \frac{m_2}{n} \mid F_n(s) = \frac{m_1}{n} \right) \\ &= \frac{1}{n} \sum_{m_2=m_1}^n m_2 \begin{pmatrix} n-m_1 \\ m_2-m_1 \end{pmatrix} \left( \frac{F(t)-F(s)}{1-F(s)} \right)^{m_2-m_1} \left( \frac{1-F(t)}{1-F(s)} \right)^{n-m_2} \\ &= \frac{1}{n} \sum_{m_2=0}^{n-m_1} (m_2+m_1) \begin{pmatrix} n-m_1 \\ m_2 \end{pmatrix} \left( \frac{F(t)-F(s)}{1-F(s)} \right)^{m_2} \left( \frac{1-F(t)}{1-F(s)} \right)^{n-m_2-m_1} \end{align*}$$

If $Y$ is a random variable which has a binomial distribution with parameters $\left( n-m_1, p \right)$ for $p := \frac{F(t)-F(s)}{1-F(s)}$, then we can write the sum on the right-hand side as

$$\frac{1}{n} \mathbb{E}(m_1+Y).$$

As $\mathbb{E}(Y) = (n-m_1) p$ we get

$$\begin{align*} \mathbb{E}\left(F_n(t) \mid F_n(s) = \frac{m_1}{n} \right) &= \frac{1}{n} (m_1+\mathbb{E}(Y)) \\ &= \frac{m_1}{n} + \left(1- \frac{m_1}{n} \right) \frac{F(t)-F(s)}{1-F(s)}. \end{align*}$$

Since $F_n(s)$ takes only the finitely many values $m_1/n$ for $m_1 \in \{0,\ldots,n\}$ we have shown that

$$\mathbb{E}(F_n(t) \mid F_n(s)) =F_n(s) + \left(1- F_n(s) \right) \frac{F(t)-F(s)}{1-F(s)}.$$

Combining this with Step 2, we get

$$\mathbb{E}(F_n(t) \mid F_n(u), u \leq s) =F_n(s) + \left(1- F_n(s) \right) \frac{F(t)-F(s)}{1-F(s)}.$$
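A quick Monte Carlo check of the Step 3 formula in the uniform case $F(r)=r$, where the right-hand side reduces to $m_1/n + (1-m_1/n)\frac{t-s}{1-s}$ (`mc_cond_mean` is an ad-hoc helper name):

```python
import random

# Monte Carlo check of the Step 3 formula in the uniform case F(r) = r:
# E[F_n(t) | F_n(s) = m1/n] = m1/n + (1 - m1/n)*(t - s)/(1 - s).
def mc_cond_mean(n, s, t, m1, trials=300_000, seed=3):
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        if sum(xi <= s for xi in x) == m1:       # condition on F_n(s) = m1/n
            count += 1
            total += sum(xi <= t for xi in x) / n
    return total / count

n, s, t, m1 = 5, 0.25, 0.5, 2
formula = m1 / n + (1 - m1 / n) * (t - s) / (1 - s)
print(formula, mc_cond_mean(n, s, t, m1))  # both close to 0.6
```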

Step 4: Prove the martingale property

Fix $s \leq t$. By Step 3, we have

\begin{align*}\mathbb{E}(F_n(t) \mid F_n(u), u \leq s) &= F_n(s) + \left(1- F_n(s) \right) \frac{F(t)-F(s)}{1-F(s)} \\ &= F_n(s) + (1-F_n(s)) \left(1- \frac{1-F(t)}{1-F(s)} \right) \\ &= 1- (1-F_n(s)) \frac{1-F(t)}{1-F(s)}. \tag{2} \end{align*} Hence,

$$\begin{align*} \mathbb{E} \left( \frac{F_n(t)-F(t)}{1-F(t)} \mid F_n(u), u \leq s \right) &= \frac{1}{1-F(t)} \left( \mathbb{E}(F_n(t) \mid F_n(u), u \leq s) -1 \right)+1 \\ &\stackrel{(2)}{=} - \frac{1-F_n(s)}{1-F(s)} +1 \\ &= \frac{F_n(s)-F(s)}{1-F(s)}. \end{align*}$$
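The whole chain of steps can be verified end to end by simulation in the uniform case $F(r)=r$: for each attainable value $k/n$ of $F_n(s)$, the conditional mean of $Z_n(t)=\frac{F_n(t)-t}{1-t}$ should equal $\frac{k/n-s}{1-s}$. A sketch (`check_martingale` is an ad-hoc helper name):

```python
import random

# Monte Carlo check of the final identity in the uniform case:
# E[(F_n(t) - t)/(1 - t) | F_n(s) = k/n] = (k/n - s)/(1 - s).
def check_martingale(n=4, s=0.3, t=0.7, trials=400_000, seed=4):
    rng = random.Random(seed)
    sums = [0.0] * (n + 1)    # sum of Z_n(t) per value k of n*F_n(s)
    counts = [0] * (n + 1)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        k = sum(xi <= s for xi in x)
        sums[k] += (sum(xi <= t for xi in x) / n - t) / (1 - t)
        counts[k] += 1
    return [(sums[k] / counts[k], (k / n - s) / (1 - s))
            for k in range(n + 1) if counts[k] > 0]

for estimate, target in check_martingale():
    print(round(estimate, 3), round(target, 3))  # each pair should agree
```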


To prove your statement, please refer to the paper: Al-Hussaini, A., Elliott, R. J., "The Single Jump Process with Some Statistical Applications", Theory Probab. Appl., 29(3), 1985, pp. 607-613. DOI: 10.1137/1129083.

In this paper the statement is proved in the following steps:

1 Let $$ \mathcal{F}^{(i)}_t:=\sigma\{1_{\{X_i\le u\}},u\le t\},\qquad L^{(i)}_t:=\frac{1_{\{X_i>t\}}}{1-t}, \qquad 0\le t<1. $$ Then, for $0\le s<1$, $\{X_i>s\}$ is an atom in $\mathcal{F}^{(i)}_s$ and $$ \mathsf{E}[L^{(i)}_t|\mathcal{F}^{(i)}_s]=L^{(i)}_s,\qquad 0\le s\le t<1, $$ i.e. $\{L^{(i)}_t,\mathcal{F}^{(i)}_t, 0\le t<1\}$ is a martingale.

2 Let $$ M^{(i)}_t:=1-L^{(i)}_t=\frac{1_{\{X_i\le t\}}-t}{1-t}. $$
Since $\{X_i, i\ge 1\}$ is an independent sequence, $$ Z_n(t):=\frac{F_n(t)-t}{1-t}=\frac1n\sum_{i=1}^{n}M^{(i)}_t $$ is a martingale w.r.t. $\{\sigma(\mathcal{F}^{(i)}_t,1\le i\le n),0\le t<1\}$ (and w.r.t. $\{\sigma(F_n(s),s\le t),0\le t<1\}$ as well).
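The identity $Z_n(t)=\frac1n\sum_{i=1}^n M^{(i)}_t$ is pointwise algebra (substitute $1-1_{\{X_i>t\}}=1_{\{X_i\le t\}}$), which a one-sample numerical illustration confirms (`decomposition_holds` is an ad-hoc helper name):

```python
import random

# Check the pointwise identity Z_n(t) = (1/n) * sum_i M_i(t),
# where M_i(t) = (1_{X_i <= t} - t)/(1 - t).
def decomposition_holds(n=7, t=0.42, seed=5):
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    f_n = sum(xi <= t for xi in x) / n               # F_n(t)
    z = (f_n - t) / (1 - t)                          # Z_n(t)
    m_avg = sum(((xi <= t) - t) / (1 - t) for xi in x) / n
    return abs(z - m_avg) < 1e-12

print(decomposition_holds())  # True
```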