I'm trying to understand the proof of strong consistency of distance covariance in metric spaces (Proposition 2.6 of *Distance covariance in metric spaces* by Russell Lyons, published in The Annals of Probability).
Let $(\mathcal{X},d_1)$ and $(\mathcal{Y},d_2)$ be two metric spaces and $((X_k,Y_k))_{k\in\mathbb{N}}$ be an i.i.d. sequence of random elements with values in $\mathcal{X}\times \mathcal{Y}$ such that $$ Ed_1(X_1,x')<\infty \quad \text{and} \quad Ed_2(Y_1,y')<\infty $$ for any $x'\in \mathcal{X}$ and $y'\in\mathcal{Y}$. The distance covariance $dcov(X,Y)$ is given by $$ dcov(X,Y) = Eh((X_{1},Y_1),...,(X_6,Y_6)), $$ where $$ h((x_{1},y_1),...,(x_6,y_6))=f_1(x_1,x_2,x_3,x_4)f_2(y_1,y_2,y_5,y_6) $$ and $$ f_i(z_1,z_2,z_3,z_4) =d_i(z_1,z_2)-d_i(z_1,z_3)-d_i(z_2,z_4)+d_i(z_3,z_4). $$ The empirical distance covariance hence becomes a $V$-statistic with (in general) non-symmetric kernel $h$ of degree $6$, i.e. $$ dcov_n(X,Y) = \frac{1}{n^6} \sum_{i_1=1}^n \cdots \sum_{i_6=1}^n h((X_{i_1},Y_{i_1}),...,(X_{i_6},Y_{i_6})). $$ Now it is stated, without reference to a specific SLLN, that $dcov_n(X,Y) \to dcov(X,Y)$ almost surely.
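(Not essential to the question, but to fix ideas, here is a small numerical sketch of $dcov_n$ with toy real-valued data and the Euclidean metric; the data, sample size, and seed are my own choices. One can check that for this particular degree-6 kernel the six sums factor, so $dcov_n$ coincides with the familiar average $\frac{1}{n^2}\sum_{i,j}A_{ij}B_{ij}$ of products of doubly centered distance matrices.)

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 4  # keep n tiny: the naive V-statistic below is O(n^6)
X = rng.normal(size=n)
Y = rng.normal(size=n) + 0.5 * X  # dependent toy data on the real line

d1 = np.abs(X[:, None] - X[None, :])  # pairwise distances in X
d2 = np.abs(Y[:, None] - Y[None, :])  # pairwise distances in Y

def f(d, i, j, k, l):
    return d[i, j] - d[i, k] - d[j, l] + d[k, l]

# naive evaluation of the degree-6 V-statistic
v = sum(
    f(d1, i1, i2, i3, i4) * f(d2, i1, i2, i5, i6)
    for i1, i2, i3, i4, i5, i6 in product(range(n), repeat=6)
) / n**6

# the six sums factor: averaging f over its last two slots doubly
# centers the distance matrix, so dcov_n = mean(A * B)
A = d1 - d1.mean(0) - d1.mean(1)[:, None] + d1.mean()
B = d2 - d2.mean(0) - d2.mean(1)[:, None] + d2.mean()
print(v, (A * B).mean())
```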
Problem: The weakest conditions for the SLLN for $V$-statistics that I know of require that the kernel $h$ be symmetric and that \begin{align*} E| h((X_{i_1},Y_{i_1}),...,(X_{i_6},Y_{i_6}))|^{\frac{\#\{i_1,...,i_6\}}{6}} < \infty \end{align*} for any $1 \leq i_1 \leq \cdots \leq i_6 \leq 6$.
So my approach was to let $\tilde{h}$ be the symmetrized version of $h$: $$\tilde{h}((x_1,y_1),...,(x_6,y_6))=\frac{1}{6!} \sum_{\sigma\in \Pi_6} h((x_{\sigma(1)},y_{\sigma(1)}),...,(x_{\sigma(6)},y_{\sigma(6)})),$$ where $\Pi_6$ is the set of all permutations of $\{1,...,6\}$. Then the above condition of the SLLN, applied to $\tilde{h}$, reduces to \begin{align*} E| h((X_{i_1},Y_{i_1}),...,(X_{i_6},Y_{i_6}))|^{\frac{\#\{i_1,...,i_6\}}{6}} < \infty \end{align*} for all $(i_1,...,i_6)\in \{1,...,6\}^6$.
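(As a sanity check on this step: symmetrizing the kernel does not change the value of the $V$-statistic, since the sum already runs over all index tuples. A toy numerical verification, with real-valued data and a sample size of my own choosing:)

```python
import itertools
import random

random.seed(0)
n = 2  # tiny sample: 2^6 index tuples, 6! permutations each
X = [random.uniform(0, 1) for _ in range(n)]
Y = [random.uniform(0, 1) for _ in range(n)]

def d(a, b):
    return abs(a - b)  # Euclidean metric on the line

def f(z1, z2, z3, z4):
    return d(z1, z2) - d(z1, z3) - d(z2, z4) + d(z3, z4)

def h(i1, i2, i3, i4, i5, i6):
    return f(X[i1], X[i2], X[i3], X[i4]) * f(Y[i1], Y[i2], Y[i5], Y[i6])

def h_sym(*i):
    # symmetrized kernel: average of h over all 6! argument permutations
    perms = list(itertools.permutations(i))
    return sum(h(*p) for p in perms) / len(perms)

tuples = list(itertools.product(range(n), repeat=6))
v_h = sum(h(*t) for t in tuples) / n**6
v_hs = sum(h_sym(*t) for t in tuples) / n**6
print(v_h, v_hs)  # the two V-statistics agree
```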
The only way (that I can see) to obtain upper bounds for $h$ is via the triangle inequality, which yields the following inequalities (suppressing the index $i$ on the metric):
\begin{align} \frac{|f_i(z_1,z_2,z_3,z_4)|}{2}\leq \left\{ \begin{array}{lll} d(z_1,z_4), & d(z_2,z_3), & d(z_1,z_2) \lor d(z_1,z_3), \\ d(z_1,z_2) \lor d(z_1,z_4), & d(z_1,z_2) \lor d(z_2,z_3), & d(z_1,z_2) \lor d(z_2,z_4), \\ d(z_1,z_4) \lor d(z_1,z_3), & d(z_1,z_4) \lor d(z_2,z_3), & d(z_1,z_4) \lor d(z_2,z_4), \\ d(z_2,z_3) \lor d(z_1,z_3), & d(z_2,z_3) \lor d(z_2,z_4), & d(z_3,z_4) \lor d(z_1,z_3), \\ d(z_3,z_4) \lor d(z_1,z_4), & d(z_3,z_4) \lor d(z_2,z_3), &d(z_3,z_4) \lor d(z_2,z_4). \\ \end{array} \right. \end{align} We see that if, for example, $i_1=i_2$, then every combination of the above inequalities results in $$ E| h((X_{i_1},Y_{i_1}),...,(X_{i_6},Y_{i_6}))|^{\frac{\#\{i_1,...,i_6\}}{6}} \leq 4E([\gamma(X_{i_1},...)\theta(Y_{i_1},...)]^{\frac{\#\{i_1,...,i_6\}}{6}}), $$ where $\gamma$ and $\theta$ are any of the above upper bounds.
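(The 15 bounds above can be checked numerically on random configurations; here is a quick randomized check with points on the real line, which is of course no proof but catches typos in the list. The point distribution and number of trials are arbitrary choices of mine.)

```python
import random

random.seed(1)

def d(p, q):
    return abs(p - q)  # metric on the real line

def f(z1, z2, z3, z4):
    return d(z1, z2) - d(z1, z3) - d(z2, z4) + d(z3, z4)

# each entry: the argument-index pair(s) whose maximum should bound |f|/2,
# transcribed row by row from the display above
bounds = [
    [(1, 4)], [(2, 3)], [(1, 2), (1, 3)],
    [(1, 2), (1, 4)], [(1, 2), (2, 3)], [(1, 2), (2, 4)],
    [(1, 4), (1, 3)], [(1, 4), (2, 3)], [(1, 4), (2, 4)],
    [(2, 3), (1, 3)], [(2, 3), (2, 4)], [(3, 4), (1, 3)],
    [(3, 4), (1, 4)], [(3, 4), (2, 3)], [(3, 4), (2, 4)],
]

violations = 0
for _ in range(10000):
    z = {i: random.uniform(-5, 5) for i in (1, 2, 3, 4)}
    lhs = abs(f(z[1], z[2], z[3], z[4])) / 2
    for b in bounds:
        if lhs > max(d(z[i], z[j]) for i, j in b) + 1e-12:
            violations += 1
print("violations:", violations)
```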
We know that $E\gamma(X_{i_1},...)<\infty$ and $E\theta(Y_{i_1},...)<\infty$; for example, $$ E[d(X_{i_2},X_{i_3}) \lor d(X_{i_1},X_{i_3})] \leq E[d(X_{i_2},x')+d(X_{i_3},x')+d(X_{i_1},x')+d(X_{i_3},x')] < \infty. $$ But we can't use independence to split up the expectation, since $X_{i_1}$ may not be independent of $Y_{i_1}$. I realize that if $\#\{i_1,...,i_6\}\leq 3$, then the Cauchy-Schwarz inequality can be used (together with $t^{p}\leq t^{1/2}+1$ for $0\leq p\leq 1/2$) to say that \begin{align*} 4E([\gamma(X_{i_1},...)\theta(Y_{i_1},...)]^{\frac{\#\{i_1,...,i_6\}}{6}})&\leq 4E([\gamma(X_{i_1},...)\theta(Y_{i_1},...)]^{1/2})+4 \\ &\leq 4[E\gamma(X_{i_1},...)]^{1/2}[E\theta(Y_{i_1},...)]^{1/2} +4< \infty, \end{align*} but if $3<\#\{i_1,...,i_6\}\leq 6$, then this argument no longer applies.
Thus my question: is there another version of the SLLN for $V$-statistics that is suitable in this situation, or is there another way to bound the above expectation?
You are right: this step is not properly explained in the article.
The problem is to show the integrability of $f_1(X_{i_1},X_{i_2},X_{i_3},X_{i_4})f_2(Y_{i_1},Y_{i_2},Y_{i_5},Y_{i_6})$ for all $i_1,i_2,\dots,i_6$, not just for distinct ones. First assume that $i_1,i_2,i_3,i_4$ are distinct and $i_1,i_2,i_5,i_6$ are distinct. Here one can employ other ideas of the article to establish the integrability. Namely, with $d_\mu$ the centered distance function from the article, it is easy to see (the centering terms cancel in the alternating sum) that $$ f_1(z_1,z_2,z_3,z_4) =d_\mu(z_1,z_2)-d_\mu(z_1,z_3)-d_\mu(z_2,z_4)+d_\mu(z_3,z_4).\tag{1} $$ Then we can estimate, using the Cauchy-Schwarz inequality, $$ \mathbb E\big[|f_1(X_{i_1},X_{i_2},X_{i_3},X_{i_4})f_2(Y_{i_1},Y_{i_2},Y_{i_5},Y_{i_6})|\big] \\\le \big(\mathbb E[f_1(X_{i_1},X_{i_2},X_{i_3},X_{i_4})^2]\,\mathbb E[f_2(Y_{i_1},Y_{i_2},Y_{i_5},Y_{i_6})^2]\big)^{1/2}. $$ Further, by (1) and the inequality $(a+b+c+d)^2\le 4(a^2+b^2+c^2+d^2)$, $$ \mathbb E[f_1(X_{i_1},X_{i_2},X_{i_3},X_{i_4})^2]\\\le 4\big(\mathbb E[d_\mu(X_{i_1},X_{i_2})^2] + \mathbb E[d_\mu(X_{i_1},X_{i_3})^2] + \mathbb E[d_\mu(X_{i_2},X_{i_4})^2] + \mathbb E[d_\mu(X_{i_3},X_{i_4})^2]\big). $$ But $\mathbb E[d_\mu(X_{i},X_{j})^2] <\infty$ for $i\neq j$ (Lemma 2.1 of the article). Similarly, $$ \mathbb E[f_2(Y_{i_1},Y_{i_2},Y_{i_5},Y_{i_6})^2]<\infty, $$ q.e.d.
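(To illustrate identity (1): the centering terms of $d_\mu$ cancel in the alternating sum, so the identity holds for any centering function. A toy numerical check, using an empirical measure on the line of my own choosing as a stand-in for $\mu$:)

```python
import random

random.seed(0)
pts = [random.uniform(0, 1) for _ in range(50)]  # sample standing in for mu

def d(x, y):
    return abs(x - y)

def a(x):  # empirical version of a_mu(x) = E d(x, X), X ~ mu
    return sum(d(x, p) for p in pts) / len(pts)

D = sum(a(p) for p in pts) / len(pts)  # empirical E d(X, X')

def d_mu(x, y):  # mu-centered distance (the constants cancel in f_1 anyway)
    return d(x, y) - a(x) - a(y) + D

def f1(z1, z2, z3, z4, dist):
    return dist(z1, z2) - dist(z1, z3) - dist(z2, z4) + dist(z3, z4)

z = [random.uniform(0, 1) for _ in range(4)]
lhs = f1(*z, d)     # f_1 with the raw metric
rhs = f1(*z, d_mu)  # f_1 with the centered metric: identical by (1)
print(lhs, rhs)
```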
When there are two coincidences, one can use Cauchy-Schwarz to proceed.
So suppose there is a single coincidence, i.e. exactly two of $i_1,i_2,i_3,i_4$ are equal. If $i_1\neq i_2$, then we can proceed as in the article: say, if $i_2=i_3$, we can estimate $f_1(X_{i_1},X_{i_2},X_{i_3},X_{i_4})\le g_1(X_{i_2},X_{i_3},X_{i_4})$ and $f_2(\dots)\le g_2(Y_{i_1},Y_{i_5},Y_{i_6})$.
So the only remaining case is $i_1=i_2$. Here $$ h = \big(d_1(X_{i_3},X_{i_4})-d_1(X_{i_1},X_{i_3})-d_1(X_{i_1},X_{i_4})\big) \big(d_2(Y_{i_5},Y_{i_6})-d_2(Y_{i_1},Y_{i_5})-d_2(Y_{i_1},Y_{i_6})\big). $$ Upon expanding, there are many well-behaved terms. The ill-behaved ones come with the same (positive) sign and are all of the same kind, so it is necessary and sufficient to establish the convergence of $$ \frac{1}{n^4}\sum_{i,j,k=1}^n d_1(X_i,X_j)d_2(Y_i,Y_k) $$ to zero. In turn, for the latter it is sufficient to have $$ \mathbb{E}[d_1(X,X')^{3/4}d_2(Y,Y'')^{3/4}]<\infty,\tag{2} $$ where $(X,Y)$ has the given joint distribution, $X'$ and $Y''$ have the corresponding marginal distributions, and $(X,Y)$, $X'$, $Y''$ are independent. Assumption (2) might be necessary too, since a similar condition is necessary for the classical Marcinkiewicz-Zygmund SLLN. It does not seem, though, that (2) follows from the assumptions of the article, so I suggest you ask the author how he deals with this.
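(A quick simulation of the problematic term, with toy dependent real-valued data of my own choosing: the sum has $n^3$ terms divided by $n^4$, so under suitable moment conditions it should decay roughly like $1/n$, which is what the output shows.)

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat(n):
    # T_n = n^{-4} * sum_{i,j,k} d1(X_i,X_j) d2(Y_i,Y_k),
    # with d1, d2 the absolute-value metric on the line
    x = rng.normal(size=n)
    y = rng.normal(size=n) + x  # dependent coordinates are allowed
    d1 = np.abs(x[:, None] - x[None, :])
    d2 = np.abs(y[:, None] - y[None, :])
    r = d1.sum(axis=1)  # sum_j d1(X_i, X_j)
    s = d2.sum(axis=1)  # sum_k d2(Y_i, Y_k)
    return (r * s).sum() / n**4

vals = [t_stat(n) for n in (100, 400, 1600)]
print(vals)  # decreasing, roughly by a factor 4 per step
```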
UPD (from the discussion): there is an erratum for the original paper, as well as a paper by Janson, which show that the consistency holds in the original formulation.