I am trying to understand Theorem 2.2 in Serfling (1968):
Proposition. Let $(\Omega,\mathcal A,P)$ be a probability space and $\mathcal F$ a sub-$\sigma$-algebra of $\mathcal A$. Let $X$ be a real random variable on $(\Omega,\mathcal A,P)$, and assume that $\|X\|_p<\infty$, where $1<p<\infty$. Then
$$ \|E[X|\mathcal F]-E[X]\|_p\leq 2\phi(\mathcal A,\mathcal F)^{1-\frac{1}{p}} \|X\|_p $$
where $\phi(\mathcal A,\mathcal F)=\sup_{A\in \mathcal A, F\in \mathcal F, P(F)>0} \bigg|P(A|F)-P(A)\bigg|$.
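To make the definition of $\phi$ concrete, here is a brute-force sketch on a finite probability space. It assumes, as an illustrative simplification that is not part of Serfling's or Davidson's setting, that $\Omega=\{0,\dots,m-1\}$, $\mathcal A$ is the power set, and $\mathcal F$ is generated by a partition of $\Omega$ into blocks, so every $F\in\mathcal F$ is a union of blocks. The function names are hypothetical.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of a finite collection, as tuples (including the empty tuple)."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def phi(probs, blocks):
    """Brute-force phi(A, F) on Omega = {0, ..., m-1} with point masses `probs`.

    Illustrative assumptions: A is the power set of Omega and F is the
    sigma-algebra generated by the partition `blocks` (a list of lists of points).
    """
    best = 0.0
    for chosen in subsets(blocks):
        F = set().union(*chosen)  # a member of F: a union of blocks
        p_F = sum(probs[w] for w in F)
        if p_F <= 0:
            continue  # the supremum only runs over P(F) > 0
        for A in subsets(range(len(probs))):
            p_A = sum(probs[w] for w in A)
            p_AF = sum(probs[w] for w in A if w in F)
            best = max(best, abs(p_AF / p_F - p_A))
    return best


uniform = [0.25] * 4
print(phi(uniform, [[0, 1, 2, 3]]))    # trivial F: 0.0
print(phi(uniform, [[0, 1], [2, 3]]))  # 0.5
```

The enumeration is exponential in $m$, so this is only meant for tiny examples; for the trivial $\sigma$-algebra $\mathcal F=\{\emptyset,\Omega\}$ it returns $0$, as it should.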
Proof. (Davidson (1994))
First assume that $X$ is simple with representation
$$X=\sum_{i=1}^n a_i 1_{A_i}$$
with $a_i\in\mathbb R$, $A_i\in\mathcal A$, $\cup_{i=1}^n A_i=\Omega$, and the $A_i$ pairwise disjoint. Let $q=\frac{p}{p-1}$, so that $\frac1p+\frac1q=1$. Then
$$\Big|E[X|\mathcal F]-E[X]\Big|^p=\Big|\sum_{i=1}^n a_i \Big(P[A_i|\mathcal F]-P[A_i]\Big)\Big|^p$$$$\leq \Big[\sum_{i=1}^n |a_i| \Big|P[A_i|\mathcal F]-P[A_i]\Big|\Big]^p$$
$$=\Big[\sum_{i=1}^n |a_i| \Big|P[A_i|\mathcal F]-P[A_i]\Big|^{1/p} \Big|P[A_i|\mathcal F]-P[A_i]\Big|^{1/q}\Big]^p$$
$$\leq \Big[\sum_{i=1}^n |a_i|^p \Big|P[A_i|\mathcal F]-P[A_i]\Big| \Big] \Big[\sum_{i=1}^n\Big|P[A_i|\mathcal F]-P[A_i]\Big|\Big]^{p/q}$$
$$\leq \Big[ E[|X|^p|\mathcal F]+E[|X|^p] \Big] \Big[\sum_{i=1}^n\Big|P[A_i|\mathcal F]-P[A_i]\Big|\Big]^{p/q}$$
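$P$-almost surely. Here the first inequality is the triangle inequality, the second is Hölder's inequality with exponents $p$ and $q$, and the third uses $\Big|P[A_i|\mathcal F]-P[A_i]\Big|\leq P[A_i|\mathcal F]+P[A_i]$ together with $\sum_{i=1}^n |a_i|^p P[A_i|\mathcal F]=E[|X|^p|\mathcal F]$ and $\sum_{i=1}^n |a_i|^p P[A_i]=E[|X|^p]$.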
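The chain of pointwise inequalities can be spot-checked numerically. The sketch below assumes, for illustration only, a finite $\Omega=\{0,\dots,m-1\}$ with positive point masses and $\mathcal F$ generated by a partition `blocks`, so that $P[S|\mathcal F](\omega)$ is the conditional probability of $S$ given the block containing $\omega$. It draws random simple $X=\sum_i a_i1_{A_i}$ and checks, at every $\omega$, that each expression in the chain is at most the next one. All names are hypothetical.

```python
import random


def le(x, y, rel=1e-9):
    """Return True if x <= y up to floating-point round-off."""
    return x <= y + rel * (1.0 + abs(y))


def block_of(w, blocks):
    """The block of the partition generating F that contains the point w."""
    return next(b for b in blocks if w in b)


def cond_prob(S, w, probs, blocks):
    """P[S | F](w): the conditional probability of S given the block containing w."""
    B = block_of(w, blocks)
    return sum(probs[v] for v in B if v in S) / sum(probs[v] for v in B)


def chain_holds(probs, blocks, cells, a, p):
    """Check the pointwise chain for X = sum_i a[i] * 1_{cells[i]} at every point w."""
    q = p / (p - 1)
    m = len(probs)
    X = [0.0] * m
    for a_i, cell in zip(a, cells):
        for w in cell:
            X[w] = a_i
    EX = sum(probs[w] * X[w] for w in range(m))
    EXp = sum(probs[w] * abs(X[w]) ** p for w in range(m))
    for w in range(m):
        B = block_of(w, blocks)
        p_B = sum(probs[v] for v in B)
        cond_X = sum(probs[v] * X[v] for v in B) / p_B
        cond_Xp = sum(probs[v] * abs(X[v]) ** p for v in B) / p_B
        # d[i] = |P[A_i | F](w) - P[A_i]|
        d = [abs(cond_prob(c, w, probs, blocks) - sum(probs[v] for v in c)) for c in cells]
        lhs = abs(cond_X - EX) ** p
        triangle = sum(abs(a_i) * d_i for a_i, d_i in zip(a, d)) ** p
        hoelder = sum(abs(a_i) ** p * d_i for a_i, d_i in zip(a, d)) * sum(d) ** (p / q)
        final = (cond_Xp + EXp) * sum(d) ** (p / q)
        if not (le(lhs, triangle) and le(triangle, hoelder) and le(hoelder, final)):
            return False
    return True


def random_partition(rng, m, k):
    """Randomly split {0, ..., m-1} into at most k nonempty cells."""
    labels = [rng.randrange(k) for _ in range(m)]
    return [[w for w in range(m) if labels[w] == lab] for lab in sorted(set(labels))]


rng = random.Random(0)
results = []
for _ in range(500):
    m = 6
    raw = [rng.random() + 0.05 for _ in range(m)]
    total = sum(raw)
    probs = [x / total for x in raw]
    cells = random_partition(rng, m, 4)
    a = [rng.uniform(-3.0, 3.0) for _ in cells]
    results.append(chain_holds(probs, random_partition(rng, m, 3), cells, a, rng.uniform(1.1, 5.0)))
print(all(results))  # expected: True
```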
Now Davidson writes: Let $A^+$ denote the union of all those $A_i$ for which $P[A_i|\mathcal F]-P[A_i]\geq 0$, and let $A^-=\Omega\setminus A^+$. Then
$$\sum_{i=1}^n\Big|P[A_i|\mathcal F]-P[A_i]\Big|=\Big|P[A^+|\mathcal F]-P[A^+]\Big|+\Big|P[A^-|\mathcal F]-P[A^-]\Big|$$
Why is this last equality true?
The next step is to claim
$$|P[A^+|\mathcal F]-P[A^+]|\leq \phi(\mathcal A,\mathcal F)$$
$$|P[A^-|\mathcal F]-P[A^-]|\leq \phi(\mathcal A,\mathcal F)$$
$P$-almost surely, using the inequality from here.
Thanks a lot for your help.
For completeness, I will try to finish the proof here. From the answers we have $$\sum_{i=1}^n\Big|P[A_i|\mathcal F]-P[A_i]\Big|\leq 2\phi (\mathcal A,\mathcal F) \quad\quad P\text{-almost surely}$$
Substituting back, we get
$$\Big|E[X|\mathcal F]-E[X]\Big|^p\leq \Big[ E[|X|^p|\mathcal F]+E[|X|^p] \Big] \Big[2\phi (\mathcal A,\mathcal F)\Big]^{p/q} \quad\quad P\text{-almost surely} $$
Integrating both sides gives
$$E\Big[\Big|E[X|\mathcal F]-E[X]\Big|^p\Big]\leq 2E[|X|^p] \Big[2\phi (\mathcal A,\mathcal F)\Big]^{p/q} $$
Raising both sides to the power $1/p$ and using $2^{1/p}\cdot 2^{1/q}=2$ and $\frac1q=1-\frac1p$ (both because $\frac1p+\frac1q=1$), we obtain
$$ \|E[X|\mathcal F]-E[X]\|_p\leq 2\phi(\mathcal A,\mathcal F)^{1-\frac{1}{p}} \|X\|_p $$
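As a sanity check of the inequality just obtained, one can compare both sides on random finite examples. This is a sketch under illustrative assumptions: $\Omega=\{0,\dots,m-1\}$ with positive rational point masses, $\mathcal A$ the power set, $\mathcal F$ generated by a partition (so $E[X|\mathcal F]$ is constant on each block), and $\phi(\mathcal A,\mathcal F)$ computed exactly by brute force from its definition. The function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import chain, combinations


def subsets(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def phi(probs, blocks):
    """Exact brute-force phi(A, F): sup over A subset of Omega and unions F of blocks with P(F) > 0."""
    best = Fraction(0)
    for chosen in subsets(blocks):
        F = set().union(*chosen)
        p_F = sum(probs[w] for w in F)
        if p_F == 0:
            continue
        for A in subsets(range(len(probs))):
            p_A = sum(probs[w] for w in A)
            p_AF = sum(probs[w] for w in A if w in F)
            best = max(best, abs(p_AF / p_F - p_A))
    return best


def both_sides(probs, blocks, X, p):
    """Return (||E[X|F] - E[X]||_p, 2 * phi**(1 - 1/p) * ||X||_p) as floats."""
    m = len(probs)
    EX = sum(probs[w] * X[w] for w in range(m))
    lhs = 0.0
    for B in blocks:
        p_B = sum(probs[w] for w in B)
        cond_X = sum(probs[w] * X[w] for w in B) / p_B  # E[X|F] is constant on block B
        lhs += float(p_B) * float(abs(cond_X - EX)) ** p
    norm_X = sum(float(probs[w]) * float(abs(X[w])) ** p for w in range(m)) ** (1 / p)
    return lhs ** (1 / p), 2 * float(phi(probs, blocks)) ** (1 - 1 / p) * norm_X


def random_instance(rng, m=5, k=3):
    """Random point masses (exact rationals), a partition into <= k blocks, and X."""
    weights = [Fraction(rng.randint(1, 20)) for _ in range(m)]
    total = sum(weights)
    probs = [x / total for x in weights]
    labels = [rng.randrange(k) for _ in range(m)]
    blocks = [[w for w in range(m) if labels[w] == lab] for lab in sorted(set(labels))]
    X = [Fraction(rng.randint(-50, 50), 10) for _ in range(m)]
    return probs, blocks, X


rng = random.Random(1)
violations = 0
for _ in range(200):
    probs, blocks, X = random_instance(rng)
    lhs, rhs = both_sides(probs, blocks, X, rng.uniform(1.1, 4.0))
    violations += lhs > rhs + 1e-9
print(violations)  # expected: 0
```

Exact rational arithmetic is used for $\phi$ and the conditional expectations so that degenerate cases (for example a single block, where $\phi=0$) are not distorted by round-off.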
Now assume $X$ is an arbitrary $\mathcal A$-measurable real random variable with $\|X\|_p<\infty$. Then there exists a sequence $(X_n)$ of simple $\mathcal A$-measurable random variables such that $|X_n|\leq|X_{n+1}|\leq |X|$ for each $n$ and $X_n\to X$ pointwise.
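One standard choice of such a sequence (offered here as an illustrative sketch, since the construction is not spelled out above) is the truncated dyadic approximation $X_n=\operatorname{sign}(X)\min\{n,\,2^{-n}\lfloor 2^n|X|\rfloor\}$. Each $X_n$ takes finitely many values and is $\mathcal A$-measurable; the sketch applies the map to sample values of $X(\omega)$ so the monotonicity and convergence can be checked pointwise. The function name is hypothetical.

```python
import math


def dyadic_approx(x, n):
    """n-th truncated dyadic approximation of the real number x:
    sign(x) * min(n, floor(2**n * |x|) / 2**n).

    Applied pointwise to X(omega), this gives a simple random variable X_n with
    finitely many values, |X_n| <= |X_{n+1}| <= |X| and X_n -> X.
    """
    level = min(n, math.floor(2 ** n * abs(x)) / 2 ** n)
    return math.copysign(level, x)


for x in [-3.7, 0.1, math.pi, 12.0]:
    approximations = [dyadic_approx(x, n) for n in range(1, 8)]
    print(x, approximations)
```

The truncation at level $n$ is what keeps each $X_n$ simple even when $X$ is unbounded, and the dyadic grid refines as $n$ grows, which gives the monotonicity $|X_n|\le|X_{n+1}|$.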
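By the dominated convergence theorem (DCT), with the dominating functions $|X|$ and $|X|^p$ (both integrable because $\|X\|_p<\infty$ and $P$ is a probability measure), we get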
$$ E[X_n]\to E[X]$$ $$E[|X_n|^p]\to E[|X|^p]$$
From the DCT for conditional expectations using the dominating functions $|X|$, $|X|^p$ we also have
$$ E[X_n|\mathcal F]\to E[X| \mathcal F] \quad \quad P\text{-almost surely}$$
$$ E[|X_n|^p|\mathcal F]\to E[|X|^p| \mathcal F] \quad \quad P\text{-almost surely}$$
From the first part we have
$$\Big|E[X_n|\mathcal F]-E[X_n]\Big|^p\leq \Big[ E[|X_n|^p|\mathcal F]+E[|X_n|^p] \Big] \Big[2\phi (\mathcal A,\mathcal F)\Big]^{p/q} \quad\quad P\text{-almost surely} $$
for each $n$. Since a countable union of null sets is null, we can combine all these statements and pass to the limit to obtain
$$\Big|E[X|\mathcal F]-E[X]\Big|^p\leq \Big[ E[|X|^p|\mathcal F]+E[|X|^p] \Big] \Big[2\phi (\mathcal A,\mathcal F)\Big]^{p/q} \quad\quad P\text{-almost surely} $$
Taking expectations on both sides and raising to the power $1/p$ as before we get
$$\|E[X|\mathcal F]-E[X]\|_p\leq 2\phi(\mathcal A,\mathcal F)^{1-\frac{1}{p}} \|X\|_p$$
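Regarding Davidson's proof: let $\kappa(\cdot,\cdot)$ denote a regular conditional probability (r.c.p.) with respect to $\mathcal{F}$ (only its restriction to the finite $\sigma$-algebra generated by $A_1,\ldots,A_n$ is needed, and such a version always exists), and let $$ \nu(A,\omega):=\kappa(A,\omega)-P(A). $$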
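Then, for each $\omega\in \Omega$, writing $A^+(\omega)$ for the union of those $A_i$ with $\nu(A_i,\omega)\geq 0$ and $A^-(\omega)=\Omega\setminus A^+(\omega)$, additivity of $\nu(\cdot,\omega)$ gives
$$ \sum_{i=1}^n|\nu(A_i,\omega)|=\nu(A^+(\omega),\omega)-\nu(A^-(\omega),\omega), $$
and
$$ |\nu(A^+(\omega),\omega)|\vee |\nu(A^-(\omega),\omega)| \le \max_{A\in \mathcal{A}_0}|\nu(A,\omega)|, $$
where $\mathcal{A}_0=\{\bigcup_{i\in \mathcal{I}} A_i:\mathcal{I}\subset \{1,\ldots, n\}\}$ is a finite family. Now you can bound the right-hand side by the $\phi$-mixing coefficient: for each fixed $A\in\mathcal{A}_0$ we have $|\nu(A,\cdot)|\le\phi(\mathcal A,\mathcal F)$ $P$-almost surely, and since $\mathcal{A}_0$ has only $2^n$ members, the exceptional null sets combine into a single null set. Hence $\sum_{i=1}^n|\nu(A_i,\cdot)|\le 2\phi(\mathcal A,\mathcal F)$ $P$-almost surely.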
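On a finite space the r.c.p. is explicit, which makes both displayed relations easy to check. The sketch below assumes, for illustration only, $\Omega=\{0,\dots,m-1\}$ with positive rational point masses, $\mathcal A$ the power set, and $\mathcal F$ generated by a partition `blocks`, so that $\kappa(A,\omega)$ is the conditional probability of $A$ given the block containing $\omega$. It uses exact rational arithmetic, so the identity can be tested with `==`, and computes $\phi$ by brute force. All names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import chain, combinations


def subsets(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def prob(S, probs):
    """P(S) for a finite set of points S."""
    return sum((probs[v] for v in S), Fraction(0))


def kappa(A, w, probs, blocks):
    """Regular conditional probability on a finite space: P(A | block containing w)."""
    B = next(b for b in blocks if w in b)
    return prob([v for v in B if v in A], probs) / prob(B, probs)


def nu(A, w, probs, blocks):
    """nu(A, w) = kappa(A, w) - P(A)."""
    return kappa(A, w, probs, blocks) - prob(A, probs)


def phi(probs, blocks):
    """Exact brute-force phi(A, F) with A the power set and F generated by `blocks`."""
    best = Fraction(0)
    for chosen in subsets(blocks):
        F = set().union(*chosen)
        if prob(F, probs) == 0:
            continue
        for A in subsets(range(len(probs))):
            cond = prob([v for v in A if v in F], probs) / prob(F, probs)
            best = max(best, abs(cond - prob(A, probs)))
    return best


def answer_holds(probs, blocks, cells):
    """For every w, check the identity and the bounds from the answer for the partition `cells`."""
    bound = phi(probs, blocks)
    for w in range(len(probs)):
        values = [nu(c, w, probs, blocks) for c in cells]
        A_plus = [v for c, x in zip(cells, values) if x >= 0 for v in c]
        A_minus = [v for c, x in zip(cells, values) if x < 0 for v in c]
        total = sum(abs(x) for x in values)
        nu_plus, nu_minus = nu(A_plus, w, probs, blocks), nu(A_minus, w, probs, blocks)
        # max of |nu(A, w)| over the finite family A_0 of unions of cells
        biggest = max(abs(nu([v for c in I for v in c], w, probs, blocks)) for I in subsets(cells))
        if total != nu_plus - nu_minus:
            return False
        if max(abs(nu_plus), abs(nu_minus)) > biggest or biggest > bound:
            return False
        if total > 2 * bound:
            return False
    return True


def random_partition(rng, m, k):
    """Randomly split {0, ..., m-1} into at most k nonempty cells."""
    labels = [rng.randrange(k) for _ in range(m)]
    return [[w for w in range(m) if labels[w] == lab] for lab in sorted(set(labels))]


rng = random.Random(2)
results = []
for _ in range(100):
    m = 5
    weights = [Fraction(rng.randint(1, 9)) for _ in range(m)]
    total_weight = sum(weights)
    probs = [x / total_weight for x in weights]
    results.append(answer_holds(probs, random_partition(rng, m, 3), random_partition(rng, m, 4)))
print(all(results))  # expected: True
```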