I have designed an incremental algorithm whose complexity I would like to evaluate. The algorithm works on a sliding window of size $n$.
To study the complexity, the window is considered full and the data present are $\{x_{1},\dots,x_{n}\}$. They are assumed to be i.i.d. random variables, with no assumption on their distribution; in particular, they are not necessarily bounded. Finally, the most recent datum is $x_{n+1}$.
The complexity of the algorithm depends on the distance from $x_{n+1}$ to its “nearest lower value” $x_{l}$, whose index is $l=\max\left\{ k\in\{1,\dots,n\}\;\text{s.t.}\;x_{k}<x_{n+1}\right\}$. The distance $d$ is then $d=n+1-l$.
Clearly $d$ is a random variable, and I would like to determine its mean, which would give the algorithm's complexity.
I first thought $d$ was a geometric variable with parameter $1/2$, since $P\left(x_{n+1}\geq x_{n}\right)=1/2$ (which I cannot prove either), which would give $E\left[d\right]=2$; but this is not the case. Indeed, the successive comparison events $\{x_{j}\geq x_{n+1}\}$ are not independent, so $d$ does not follow a geometric law. Moreover, I ran some experiments in Excel to estimate $E\left[d\right]$ under different distributions, and it seems that it does not depend on the distribution of the $x_{i}$, only on $n$ (I cannot post the image, but it confirms this result).
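For what it's worth, the Excel experiment is easy to reproduce in Python. The sketch below (my own illustration code, not the algorithm itself) estimates $E[d]$ by Monte Carlo for a few distributions, setting $d=n+1$ when no lower value exists in the window; at fixed $n$ it returns roughly the same value for all of them:

```python
import random

def nearest_lower_distance(xs):
    """d = n + 1 - l, where l is the largest 1-based index with x_l < x_{n+1};
    d = n + 1 if no lower value exists in the window."""
    n = len(xs) - 1               # xs[:n] is the window, xs[n] is x_{n+1}
    for d in range(1, n + 1):
        if xs[n - d] < xs[n]:
            return d
    return n + 1

def mean_d(n, sampler, trials=50_000):
    """Monte Carlo estimate of E[d] for a window of size n."""
    total = 0
    for _ in range(trials):
        xs = [sampler() for _ in range(n + 1)]
        total += nearest_lower_distance(xs)
    return total / trials

if __name__ == "__main__":
    random.seed(0)
    for sampler in (random.random,                 # uniform on [0, 1)
                    lambda: random.expovariate(1.0),
                    lambda: random.gauss(0.0, 1.0)):
        print(mean_d(20, sampler))   # each estimate is close to 3.65 for n = 20
```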
Below is a track I tried to follow to determine $E\left[d\right]$, which proved unsuccessful:
$$E\left[d\right]=\sum_{k=1}^{n}k\,P\left(d=k\right)$$
and
$$\begin{eqnarray*} P\left(d=k\right) & = & P\left(x_{n-k}<x_{n+1}\cap x_{n-k+1}\geq x_{n+1}\cap\dots\cap x_{n}\geq x_{n+1}\right)\\ & = & \int_{-\infty}^{\infty}P\left(x_{n-k}<\lambda\right)\prod_{j=n-k+1}^{n}P\left(x_{j}\geq\lambda\right)\,dF_{X}\left(\lambda\right)\\ & = & \int_{-\infty}^{\infty}F_{X}\left(\lambda\right)\left(1-F_{X}\left(\lambda\right)\right)^{k}\,dF_{X}\left(\lambda\right) \end{eqnarray*}$$

where the second line conditions on $x_{n+1}=\lambda$ and uses the independence of the $x_{i}$ to turn the joint probability into a product.
But I cannot get any further. Do you have any ideas? Thanks in advance!
I just gave an answer at https://mathoverflow.net/questions/139706/computation-of-the-mean-of-a-random-variable-to-estimate-algorithm-complexity, which almost agrees with that of Wild Chan. I repost it here for comparison.
....
It seems to me that $P(d>k)=\tfrac1{k+1}$ for $k\in\{0,\dots,n\}$, so that $$E[d]=\sum_{k\ge 0}P(d>k)=H_{n+1}\simeq \ln n.$$ The reason is that the last $k+1$ terms of the sequence, $x_{n+1-k},\dots,x_{n+1}$, are in exchangeable random order, so the last one is the smallest among them with probability $\tfrac1{k+1}$, and this event is exactly $\{d>k\}$. When the last term is the smallest of all, I assume that $d$ is set equal to $n+1$. Does that make sense?
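Since the distribution has no atoms, only the relative order of the terms matters, so the claim $P(d>k)=\tfrac1{k+1}$ can be checked exactly by enumerating permutations of ranks (a small sketch of my own, with $d=n+1$ when the last term is the overall minimum):

```python
from fractions import Fraction
from itertools import permutations

def d_of(seq):
    """Distance from the last element to its nearest lower value on the left;
    len(seq) if the last element is the minimum of the whole sequence."""
    n = len(seq) - 1
    for d in range(1, n + 1):
        if seq[n - d] < seq[-1]:
            return d
    return n + 1

n = 6
perms = list(permutations(range(n + 1)))   # ranks stand in for atomless samples
for k in range(n + 1):
    p = Fraction(sum(d_of(s) > k for s in perms), len(perms))
    assert p == Fraction(1, k + 1)         # P(d > k) = 1/(k+1), exactly
print("verified for n =", n)
```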
For this to be valid, one needs to assume that the probability distribution has no atoms, which holds for instance whenever it has a probability density.
.....
Note that I obtain implicitly, for $1\le k\le n$, $$P(d=k)=P(d>k-1)-P(d>k)=\frac{1}{k(k+1)}$$ and $$P(d=n+1)=\frac{1}{n+1},$$ to be compared with Wild Chan's solution, $$P(d=k)=\frac{1}{k^2+3k+2}=\frac{1}{(k+1)(k+2)}$$ for $0\le k\le n-1$, which is not a probability distribution, since it does not sum to one. There is also a shift due to different readings of the definition of $d$: in mine the minimum value of $d$ is $1$, while the other solution seems to assume $P(d=0)=1/2$. This explains the difference $-2+\tfrac1{n+1}$ between the two solutions.
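As a sanity check, this distribution does sum to one and does give $E[d]=H_{n+1}$; exact arithmetic with fractions confirms it (a sketch for one value of $n$):

```python
from fractions import Fraction

n = 10
# P(d=k) = 1/(k(k+1)) for 1 <= k <= n, and P(d=n+1) = 1/(n+1)
probs = {k: Fraction(1, k * (k + 1)) for k in range(1, n + 1)}
probs[n + 1] = Fraction(1, n + 1)

assert sum(probs.values()) == 1                                # telescopes to 1
mean = sum(k * p for k, p in probs.items())
assert mean == sum(Fraction(1, j) for j in range(1, n + 2))    # E[d] = H_{n+1}
print(mean)   # H_{11} as an exact fraction
```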
Also note a shorter computation, writing $F=1-(1-F)$ in the first factor: $$\begin{eqnarray*} \int_{-\infty}^\infty F'(y)F(y)(1-F(y))^kdy &=& \int_{-\infty}^\infty F'(y)(1-F(y))^kdy - \int_{-\infty}^\infty F'(y)(1-F(y))^{k+1}dy \\ &=& \frac1{k+1}-\frac1{k+2}, \end{eqnarray*}$$ which agrees with Wild Chan's result too.
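With the substitution $u=F(y)$ the integral becomes $\int_0^1 u(1-u)^k\,du$, which can be checked exactly for small $k$ (a sketch; it just expands the binomial and integrates term by term):

```python
from fractions import Fraction
from math import comb

def beta_int(k):
    """Exact value of ∫_0^1 u (1-u)^k du, via (1-u)^k = Σ_j C(k,j)(-u)^j."""
    return sum(Fraction(comb(k, j) * (-1) ** j, j + 2) for j in range(k + 1))

for k in range(8):
    # matches 1/(k+1) - 1/(k+2) = 1/((k+1)(k+2))
    assert beta_int(k) == Fraction(1, (k + 1) * (k + 2))
```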