I am reading the article *Rainbow trees in uniformly edge-colored graphs*, but I have run into a step that puzzles me.
There is a random graph $G(n,\omega(1)/n)$ and a color set $\mathcal{A}_1$ of size at least $n/2$.
The authors perform a random sparsification procedure on $G[X_1]$ as follows, where $X_1$ is a subset of the vertex set of $G$:

1. Include each pair $e \in \binom{X_1}{2}$ as an edge independently at random with probability $\omega(1)/n$.
2. If $e \in \binom{X_1}{2}$ was included as an edge in step 1, assign $e$ a color from $[n]$ uniformly at random. (At this point, there is a theorem that a.a.s. $G[X_1]$ uses at least $n/4$ colors from $\mathcal{A}_1$.)
3. For every $c \in \mathcal{A}_1$, let $E_c$ be the set of edges colored $c$. If $E_c \neq \emptyset$, choose a single member of $E_c$ uniformly at random, that is, each with probability $1/|E_c|$.
4. Discard all edges not chosen in step 3. (This includes, in particular, all edges whose color is in $[n] \setminus \mathcal{A}_1$.)
5. From the remaining set of edges, choose a subset of size $n/4$ uniformly at random.
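To make sure I understand the procedure, here is a small simulation of steps 1-5 on toy parameters. (The function name `sparsify` and all of the parameter values below are mine, not from the paper; this is just a sketch of my reading of the five steps.)

```python
import itertools
import random

def sparsify(num_vertices, p, num_colors, A1, k, rng):
    """One run of the sparsification procedure (steps 1-5) on toy parameters.

    num_vertices : |X_1|
    p            : edge probability (omega(1)/n in the paper)
    num_colors   : size of the full palette [n]
    A1           : the color subset A_1
    k            : target size of the final edge set (n/4 in the paper)
    Returns a k-subset of pairs, or None if fewer than k edges survive step 4.
    """
    pairs = list(itertools.combinations(range(num_vertices), 2))

    # Step 1: include each pair independently with probability p.
    edges = [e for e in pairs if rng.random() < p]

    # Step 2: color each included edge uniformly from the palette [num_colors].
    color = {e: rng.randrange(num_colors) for e in edges}

    # Step 3: for every c in A1 with E_c nonempty, keep a single member of E_c
    # chosen uniformly at random.
    kept = []
    for c in A1:
        E_c = [e for e in edges if color[e] == c]
        if E_c:
            kept.append(rng.choice(E_c))

    # Step 4 is implicit: all edges not chosen in step 3 are discarded,
    # including every edge whose color lies outside A1.

    # Step 5: choose k of the surviving edges uniformly at random.
    if len(kept) < k:
        return None  # the a.a.s. event failed on this run
    return frozenset(rng.sample(kept, k))

# Tiny demo run: |X_1| = 5, p = 0.8, palette [6], A1 = {0,1,2}, k = 2.
rng = random.Random(0)
results = [sparsify(5, 0.8, 6, {0, 1, 2}, 2, rng) for _ in range(200)]
```

Note that the output is always a set of edges with $k$ *distinct* colors from `A1`, since step 3 keeps at most one edge per color.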
The authors say that the resulting graph distribution, which we denote by $\tilde{\mathbb{G}}_1$, coincides with the uniform graph distribution $\mathbb{G}(|X_1|, n/4)$, i.e., a uniformly random graph on $|X_1|$ vertices with $n/4$ edges. It seems intuitively true, but I don't know the exact proof. Can anyone help me?
If we condition on the event that $G[X_1]$ uses at least $n/4$ colors from $\mathcal A_1$ - which I assume is done in the paper, even if it is not mentioned in the question - then any set of $n/4$ edges is equally likely to be chosen in step 5.
I think the quickest way to see this formally is to expose $G[X_1]$ and the random decisions made inside it in three phases, keeping the edges anonymous until the very end:

1. Expose the number $m$ of edges included in step 1 (but not which pairs they are).
2. Expose the colors assigned to $m$ anonymous edges $e_1, \dots, e_m$, which are not yet identified with particular elements of $\binom{X_1}{2}$.
3. Expose the random choices made in steps 3 and 5, in terms of the indices $1, \dots, m$, together with a uniformly random identification of $e_1, \dots, e_m$ with $m$ distinct elements of $\binom{X_1}{2}$.
Let $\mathcal E$ be the event that at least $n/4$ colors from $\mathcal A_1$ were used at least once. The important thing to note is that, conditioned on the outcome of phase 1, $\mathcal E$ is independent of phase 3. It is "phase 2-measurable". So conditioning on $\mathcal E$ (which we are told happens a.a.s.) is conditioning on what happens in phases 1-2.
Next, we can say that conditioned on any single outcome of phases 1-2, **each set of $n/4$ edges is equally likely to be chosen in step 5**. This is true simply because phases 1-2 never treat any elements of $\binom{X_1}{2}$ differently from any others. More formally: a permutation $\sigma$ of $\binom{X_1}{2}$ induces a bijection between phase-3 outcomes that lead to an $n/4$-element set $S \subset \binom{X_1}{2}$ being chosen and phase-3 outcomes that lead to $\sigma(S)$ being chosen, so $S$ and $\sigma(S)$ must be equally likely. By varying $\sigma$, we can make $\sigma(S)$ be any $n/4$-element set we like.
By combining the statement in bold for all the outcomes of phases 1-2 where $\mathcal E$ occurs, we get the result we wanted: conditioned on $\mathcal E$, each set of $n/4$ edges is equally likely to be chosen in step 5.
(I would still be happier if we included a fictitious-continuation clause in the algorithm: if it turns out in step 5 that fewer than $n/4$ edges remain, disregard those remaining edges and just choose a set of $n/4$ edges from $\binom{X_1}{2}$ uniformly at random. Conditioning on $\mathcal E$, this fictitious continuation will never be used; but now we can say that $\tilde{\mathbb G}_1 \sim \mathbb G(|X_1|,n/4)$ in every outcome of phases 1-2, not just the outcomes where $\mathcal E$ occurs.)
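For what it's worth, the claim (with the fictitious continuation, so that it holds unconditionally) is easy to check numerically on toy parameters. Everything below - parameter values, names, the choice of $|X_1| = 4$ and $k = 2$ - is my own sketch, not the paper's; it just samples the procedure many times and verifies that each of the $\binom{6}{2} = 15$ possible $k$-subsets of pairs shows up with frequency close to $1/15$.

```python
import itertools
import random
from collections import Counter

def run_once(pairs, p, num_colors, A1, k, rng):
    """One run of steps 1-5, with the fictitious continuation: if fewer than
    k edges survive step 4, fall back to a uniform k-subset of all pairs."""
    # Steps 1-2: include pairs independently, color each included edge.
    edges = [e for e in pairs if rng.random() < p]
    color = {e: rng.randrange(num_colors) for e in edges}
    # Step 3: keep one uniformly random member of each nonempty E_c, c in A1.
    kept = []
    for c in A1:
        E_c = [e for e in edges if color[e] == c]
        if E_c:
            kept.append(rng.choice(E_c))
    # Steps 4-5, with the fictitious continuation when too few edges remain.
    if len(kept) < k:
        return frozenset(rng.sample(pairs, k))  # fictitious continuation
    return frozenset(rng.sample(kept, k))

rng = random.Random(1)
pairs = list(itertools.combinations(range(4), 2))  # |X_1| = 4, so 6 pairs
k = 2
trials = 150_000
counts = Counter(run_once(pairs, 0.7, 6, {0, 1, 2}, k, rng)
                 for _ in range(trials))

# Empirical frequency of every k-subset should be near 1/15.
target = 1 / len(list(itertools.combinations(pairs, k)))
for subset in itertools.combinations(pairs, k):
    assert abs(counts[frozenset(subset)] / trials - target) < 0.01
```

With these sample sizes the tolerance of $0.01$ is many standard deviations wide, so the check is not sensitive to the particular seed.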