Generally, the data processing inequality (DPI) says that entropy cannot increase when you apply a function $f$; to be precise, $H(f(X))\leq H(X)$. (There is also a partial converse when $f$ is known to be $k$-to-$1$: then $H(X)\leq H(f(X))+\log k$.) The mutual information form of the DPI is $I(X;f(Y))\leq I(X;Y)$. I was wondering: what happens if $f$ is a randomized/probabilistic function? Say, for example, $f$ flips each bit independently with some probability $p$ — does the inequality still hold? Can we claim anything more?
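As a quick sanity check of the bit-flip example (my own sketch, not part of the question): take a single biased bit $X$ with $P(X=1)=0.9$ and flip it with probability $p$, i.e. pass it through a binary symmetric channel. The numbers suggest the deterministic DPI $H(f(X))\leq H(X)$ can fail for randomized $f$, since flipping pushes the output toward uniform:

```python
import math

def h2(q):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

px = 0.9                                   # P(X = 1), an arbitrary biased bit
hx = h2(px)                                # H(X) ~ 0.469 bits

for p in (0.0, 0.1, 0.5):
    py = px * (1 - p) + (1 - px) * p       # P(f(X) = 1) after flipping w.p. p
    print(f"p={p}: H(f(X)) = {h2(py):.3f}, H(X) = {hx:.3f}")
```

For $p=0.1$ the output entropy is already about $0.68 > 0.469$ bits, and for $p=0.5$ the output is exactly uniform (1 bit), so a randomized $f$ can strictly increase entropy.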
Also references to papers/notes to read more on using probabilistic techniques are welcome.
Have you tried writing a randomized function $f_R$ as a distribution $F$ over deterministic functions, and comparing $H(f_R(X))$ with $\mathbb{E}_{f\sim F}[H(f(X))]$ to see where this leads? See e.g. Lemma 2 of these lecture notes for a related question (with total variation distance instead of entropy/mutual information).
-- Edit: as a small comment: in general, you will most likely need to assume that $f$ is independent of $X,Y$ (i.e., the randomized function has its "own coins", and cannot depend on the random variables it is applied to).
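To make the suggested decomposition concrete (my own sketch, with an arbitrary bias $P(X=1)=0.9$ and flip probability $p=0.1$): the random bit-flip $f_R$ is a mixture of two deterministic functions, the identity with probability $1-p$ and NOT with probability $p$, both independent of $X$. By concavity of entropy, $H(f_R(X)) \geq \mathbb{E}_{f\sim F}[H(f(X))]$; here each deterministic $f$ is a bijection, so the right-hand side equals $H(X)$:

```python
import math

def h2(q):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

px, p = 0.9, 0.1                          # P(X = 1) and flip probability

# Law of f_R(X): mixture of the laws of identity(X) and NOT(X)
py = (1 - p) * px + p * (1 - px)

# E_{f~F}[H(f(X))]: both identity and NOT are bijections, so each term is H(X)
avg_det = (1 - p) * h2(px) + p * h2(1 - px)

print(h2(py) >= avg_det)                  # concavity of entropy
print(avg_det == h2(px))                  # bijections preserve entropy
```

So in this example the averaged entropy over deterministic functions recovers exactly $H(X)$, and the extra entropy in $H(f_R(X))$ comes from the coins of $f_R$ itself.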