Limit of an error function


Recently, I faced the problem below, and I think I found a solution. I would like to ask you to verify my proof. Does it work? Is there a better way to prove it?

Prove that $$\lim_{n\to \infty} f(n) = \dfrac{2}{3},$$ where $$ f(n) = \sum_{i=1}^{n} |X_i^{(n)}-Y_i^{(n)}|,$$ no matter what the vectors $X^{(n)} = (X_1^{(n)}, \dots, X_n^{(n)}) \in \mathbb{R}^n$ and $Y^{(n)} = (Y_1^{(n)}, \dots, Y_n^{(n)}) \in \mathbb{R}^n$ are.

We only know that $$0 \leq X_i^{(n)}, Y_i^{(n)} \leq 1 \quad \forall i,n$$ and $$\sum_{i=1}^{n}X_i^{(n)} = \sum_{i=1}^{n} Y_i^{(n)} = 1 \quad \forall n$$ (i.e. each vector is built randomly by drawing $n$ values from the same uniform distribution on a positive interval and then normalizing).

Just to clarify: we call $f$ the error function because, for a fixed $n$, we can think of it as a measure of how well we forecast the mix of some items (forecast vector $X$ vs. actual vector $Y$).

Here is my proof. Let's work in steps.

Step 1: Let $X \sim U_{[0,c]}$, $c>0$, be a random variable and let $a \in \mathbb{R}^{+}$ be a positive number. Then $Z = X/a \sim U_{[0,c/a]}$.

It's enough to show that the cdf of $Z$ is $ F_{Z}(z) = \begin{cases} 0 & z<0\\ az/c & 0\leq z \leq c/a\\ 1 & z>c/a \end{cases} $.

Now, $ F_Z (z) = P(Z \leq z ) = P (X/a \leq z ) = P (X \leq az).$ Since we know $F_X$, we know $P(X \leq az)$. Indeed $$P(X \leq az) = \begin{cases} 0 & az<0\\ az/c & 0\leq az \leq c\\ 1 & az>c \end{cases}$$ and the statement follows.
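As a quick numerical sanity check of Step 1 (a minimal sketch; the values $c=3$ and $a=2$ are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)
c, a = 3.0, 2.0                       # arbitrary values for the check
z = rng.uniform(0, c, 100_000) / a    # Z = X/a with X ~ U[0, c]
print(z.min(), z.max())               # should span roughly [0, c/a] = [0, 1.5]
print(np.mean(z <= c / (2 * a)))      # should be ~0.5, the median of U[0, c/a]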

Step 2: Let $X_i'^{(n)} \sim U_{[0,c]}$ be iid random variables. Define $ X_i^{(n)} = \dfrac{X_i'^{(n)}}{\sum_{j=1}^n X_j'^{(n)}}. $ Then, for $n$ sufficiently large, $X_i^{(n)} \sim U_{[0,2/n]}$ approximately.

By the law of large numbers, $\dfrac{\sum_{j=1}^n X_j'^{(n)}}{n} \cong \mathbb{E}(X_1'^{(n)}) = \dfrac{c}{2}$, i.e. $\sum_{j=1}^n X_j'^{(n)} \cong \dfrac{nc}{2}$. Hence, by Step 1 with $a = nc/2$, we get $X_i^{(n)} \sim U_{[0,c/a]} = U_{[0,2/n]}$ approximately.
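Here is a quick empirical check of Step 2 (a sketch; $c=1$ and $n=10{,}000$ are assumptions of mine): the normalized components should behave like $U_{[0,2/n]}$.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.uniform(0, 1, n)      # c = 1
x_norm = x / x.sum()          # the X_i^{(n)} of Step 2
print(x_norm.max(), 2 / n)    # max should be close to 2/n
print(x_norm.mean(), 1 / n)   # mean is exactly 1/n, the mean of U[0, 2/n]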

Step 3: If $X,Y \sim U_{[0,b]}$ for $b \in \mathbb{R}^{+}$ and they are independent, then $ \mathbb{E}[|X-Y|] = \dfrac{b}{3}$.

It's enough to apply LOTUS; after some calculations we land on $\mathbb{E}[|X-Y|] = b/3$, as shown below.
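Spelling out the calculation: by LOTUS, using the joint density $1/b^2$ on $[0,b]^2$ and the symmetry of $|x-y|$,

$$\mathbb{E}[|X-Y|] = \frac{1}{b^2}\int_0^b \int_0^b |x-y| \, dy \, dx = \frac{2}{b^2}\int_0^b \int_0^x (x-y) \, dy \, dx = \frac{2}{b^2}\int_0^b \frac{x^2}{2} \, dx = \frac{b}{3}.$$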

Step 4: Following the notation of the problem, $\lim_{n\to \infty} f(n) = \dfrac{2}{3}.$

By the law of large numbers together with Step 2 and Step 3 (with $b = 2/n$), for $n$ large we have $$\dfrac{f(n)}{n} = \dfrac{\sum_{i=1}^{n} |X_i^{(n)} - Y_i^{(n)}|}{n} \cong \mathbb{E}[|X-Y|] = \dfrac{2/n}{3} = \dfrac{2}{3n}.$$

Finally, $\lim_{n \to \infty} f(n) = \dfrac{2}{3} \approx 0.67.$

I also propose a numerical/visual check of this result.

[plot of the $f$ function]

For those who are curious about the code, here it is:

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

def f(X,Y):    # error function f for a fixed n
    X = np.array(X)
    Y = np.array(Y)
    return np.sum(np.abs(X-Y))

n_max = 100000   # evaluate f(n) for n in {2, 3, ..., n_max}
maxiter = 1      # for each n, compute f with maxiter independent draws of X and Y,
                 # then take the mean
dict_values = {} # maps each n to its mean f(n)
for n in tqdm(range(2, n_max + 1)):
    lst_values = [] # here we collect the f(n) values for different trials for a fixed n
    for trial in range(maxiter):   # 'trial' avoids shadowing the built-in iter()
        Y = np.random.uniform(0, 1, n)
        Y = Y / Y.sum()
        X = np.random.uniform(0, 1, n)
        X = X / X.sum()
        lst_values.append(f(X, Y))
    mean_error = sum(lst_values) / len(lst_values)
    dict_values[n] = mean_error


plt.plot(list(dict_values.keys()), list(dict_values.values()))  # dict views must be converted to lists for plt.plot
plt.axhline(2 / 3, color='red', linestyle='--')  # theoretical limit 2/3
plt.xlabel('n')
plt.ylabel('f(n)')
plt.show()
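With maxiter = 1 each point of the plot is a single random draw, so the curve is noisy for small $n$; increasing maxiter averages several trials per $n$ and smooths the curve around the limiting value $2/3$.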