I have a score function $\xi$ based on Jaccard similarity like this:
$\xi(A, B) = \frac{M_{11}}{M_{11} + M_{10} + M_{01}}$,
where $A, B \in \{0,1\}^N$, and $M_{xy}$ is the number of elements with value $x \in \{0,1\}$ in $A$ and value $y \in \{0,1\}$ in $B$.
Suppose I obtain $B$ by randomly perturbing the elements of $A$ with perturbation probability $p$, i.e., independently for each element, with probability $p$ I invert the element of $A$ (from 0 to 1, or from 1 to 0). I would say that the expected score is:
$\mathbb{E}[\xi] = \frac{ N_1(1-p)}{N_1(1-p) + Np}$,
where $N_1$ is the number of elements equal to 1 in $A$.
- Is this correct?
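
To make the question concrete, here is a minimal simulation sketch I could use to compare the formula with an empirical average. The function names, the seed, and the convention that the score is 0 when both vectors are all zeros are my own assumptions, not part of the definition above.

```python
import numpy as np

def jaccard(a, b):
    """Score M11 / (M11 + M10 + M01) for two 0/1 vectors."""
    m11 = np.sum((a == 1) & (b == 1))
    union = np.sum((a == 1) | (b == 1))  # equals M11 + M10 + M01
    # Assumption: return 0 when both vectors are all zeros (score undefined).
    return float(m11 / union) if union > 0 else 0.0

def perturb(a, p, rng):
    """Flip each element of a independently with probability p."""
    flips = rng.random(a.shape) < p
    return np.where(flips, 1 - a, a)

def simulated_mean_score(a, p, trials, seed=0):
    """Average score over `trials` independent perturbations of a."""
    rng = np.random.default_rng(seed)
    return float(np.mean([jaccard(a, perturb(a, p, rng)) for _ in range(trials)]))

def proposed_formula(n, n1, p):
    """The candidate expression N1(1-p) / (N1(1-p) + Np) from the question."""
    return n1 * (1 - p) / (n1 * (1 - p) + n * p)

# Example: N = 1000 elements, N1 = 300 ones, p = 0.1.
a = np.zeros(1000, dtype=int)
a[:300] = 1
print(simulated_mean_score(a, 0.1, trials=2000), proposed_formula(1000, 300, 0.1))
```

Comparing the two printed values for several choices of $N$, $N_1$, and $p$ (small $N$ in particular) would show how closely the formula tracks the simulated mean.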
Moreover, if I independently repeat this perturbation of $A$, I obtain different vectors $B_1, B_2, \dots, B_n$. I would then like to define a value $n_{t^*}$ that represents the fraction of perturbed vectors whose score with $A$ exceeds a certain threshold $t^*$, i.e., (I suppose) $Pr[\xi(A, B_i) > t^*]$.
- Is it correct (and formally sound) to say that $\mathbb{E}(n_{t^*}) = Pr[\xi(A, B_i) > t^*]$?
- How can I compute the expected value $\mathbb{E}(n_{t^*})$?
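
For the threshold question, this is a rough sketch of how I would estimate the fraction numerically. It assumes $n_{t^*}$ means the empirical fraction of the $n$ perturbed vectors with score strictly above $t^*$; the function name `fraction_above` and the handling of the all-zeros case are hypothetical choices of mine.

```python
import numpy as np

def fraction_above(a, p, t_star, n, seed=0):
    """Empirical fraction of n perturbed copies of a with score > t_star."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    # Each row of b is one independent perturbation B_i of a.
    flips = rng.random((n, a.size)) < p
    b = np.where(flips, 1 - a, a)
    m11 = np.sum((b == 1) & (a == 1), axis=1)
    union = np.sum((b == 1) | (a == 1), axis=1)  # M11 + M10 + M01 per row
    # Assumption: score 0 when the union is empty (score undefined there).
    scores = np.where(union > 0, m11 / np.maximum(union, 1), 0.0)
    return float(np.mean(scores > t_star))

# Example: N = 1000, N1 = 300, p = 0.1, threshold t* = 0.72, n = 5000 copies.
a = np.zeros(1000, dtype=int)
a[:300] = 1
print(fraction_above(a, 0.1, 0.72, n=5000))
```

Running this with different seeds gives different values of the fraction, and their average over many runs is the quantity I am calling $\mathbb{E}(n_{t^*})$.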
Thank you for your answers.