I wish to prove the affine invariance property of the following outlyingness measure: $$O(x,X) = \sup_{\lVert u \rVert=1} \dfrac{\lvert u^T x - \operatorname{Med}(u^T X) \rvert}{\operatorname{MAD}(u^T X)}$$
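The equality I want to prove is therefore $O(Ax+b, AX+b) = O(x, X)$ for any non-singular matrix $A \in \mathbb{R}^{d \times d}$ and any $b \in \mathbb{R}^d$, where $X$ is the data matrix containing the $n$ samples in $d$ dimensions. Here is how far I got, with $w=A^Tu$ and writing $O(x)$ for $O(x, X)$ and $O(Ax+b)$ for $O(Ax+b, AX+b)$: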
\begin{align} O(Ax+b) &= \sup_{\lVert u \rVert=1} \dfrac{\lvert u^T (Ax+b) - \operatorname{Med}(u^T (AX+b)) \rvert}{\operatorname{MAD}(u^T (AX+b))} \\ &= \sup_{\lVert u \rVert=1} \dfrac{\lvert u^T Ax - \operatorname{Med}(u^T AX) \rvert}{\operatorname{MAD}(u^T AX)} \\ &= \sup_{\lVert u \rVert=1} \dfrac{\lvert w^T x - \operatorname{Med}(w^T X) \rvert}{\operatorname{MAD}(w^T X)} \\ &= (...) \\ &\overset{?}{=} O(x) \end{align}
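For reference, the second equality above relies on how the median and the MAD behave under a shift. Here is a sketch of that step, using the shorthand $Y = u^T X$ for the projected sample and $c = u^T b$ for the projected shift (both introduced only for this remark), and assuming the MAD is defined as the median of the absolute deviations from the median:

\begin{align} \operatorname{Med}(Y + c) &= \operatorname{Med}(Y) + c, \\ \operatorname{MAD}(Y + c) &= \operatorname{Med}\big(\lvert (Y + c) - \operatorname{Med}(Y + c) \rvert\big) = \operatorname{Med}\big(\lvert Y - \operatorname{Med}(Y) \rvert\big) = \operatorname{MAD}(Y), \end{align}

so $c$ cancels in the numerator and the denominator is unchanged. Under the same definition, for a scalar $s \neq 0$ one has $\operatorname{Med}(sY) = s \operatorname{Med}(Y)$ and $\operatorname{MAD}(sY) = \lvert s \rvert \operatorname{MAD}(Y)$, which may be relevant for the remaining step since $w = A^T u$ is in general not a unit vector.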
This affine invariance is put forward in this document (p.11 of the pdf):
Theorem 2.9. The projection depth function PD(x;F) is a statistical depth function in the sense of Definition 2.1.
where the first property in Definition 2.1 is the desired affine invariance. The proof of this invariance is, however, described only as "straightforward" (p. 19 of the pdf):
(a) Affine invariance. Straightforward.
Note that this document discusses the affine invariance of the projection (statistical) depth $PD(x, X) = \frac{1}{1+O(x,X)}$, but $PD(Ax+b)=PD(x)$ (omitting $(\cdot,X)$ for conciseness) if and only if $O(Ax+b)=O(x)$. The latter equality, i.e. the affine invariance I'm looking for, is also stated in this paper: p.6 of the pdf says "outlyingness is affine invariant", and Lemma 2.1 on p.3 states that the statistical depth built from $O(x, X)$ is affine invariant as well. However, I don't find the argument I'm looking for in the proofs at the end of that document (see p.19 of the pdf).
It would seem that this affine invariance is related (equivalent?) to the affine equivariance of the location and scatter estimators used in $O(x, X)$, namely the Med (median) and the MAD (median absolute deviation) respectively, as suggested in this other math.SE post and in the introduction of this paper:
Furthermore, with a proper choice of (μ, σ), it can induce a lot of favorable estimators, such as Stahel-Donoho location and scatter estimators, which enjoy very high breakdown point robustness and affine equivariance (Zuo et al. 2004; Zuo 2006), and projection median, which has the highest breakdown point among all the existing affine equivariant multivariate location estimators (Zuo 2003).
This affine equivariance is also mentioned in this other paper (see p.3 of the pdf), for example (many papers mention the random projection depth and the associated outlyingness). The affine invariance of $O(x, X)$ is also mentioned here, where the loss of invariance for a finite set of random projections is pointed out as well (with a way to mitigate it):
For ∆ the set of all projections, this is the usual projection outlyingness (e.g., see Zuo, 2003) and is affine invariant. However, for ∆ finite, not even orthogonal invariance holds.
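The finite-direction remark in this quote can be explored numerically. Below is a minimal sketch, assuming NumPy, Gaussian toy data, and a hypothetical helper `outlyingness` that takes the maximum over a finite set of unit directions instead of the supremum (none of this comes from the cited papers). It compares three values: the outlyingness of $x$ over random directions $u$, the outlyingness of $Ax+b$ over the mapped directions $A^{-T}u / \lVert A^{-T}u \rVert$, and the outlyingness of $Ax+b$ over the same unmapped directions.

```python
import numpy as np


def outlyingness(x, X, directions):
    """Max over the given unit directions of |u^T x - Med(u^T X)| / MAD(u^T X).

    x: shape (d,), X: shape (n, d) with one sample per row,
    directions: shape (m, d) with one unit vector per row.
    """
    proj_X = X @ directions.T                 # (n, m): projected samples
    proj_x = directions @ x                   # (m,): projected query point
    med = np.median(proj_X, axis=0)           # median per direction
    mad = np.median(np.abs(proj_X - med), axis=0)  # MAD per direction
    return np.max(np.abs(proj_x - med) / mad)


rng = np.random.default_rng(0)
n, d, m = 200, 3, 500                         # toy sizes, chosen arbitrarily
X = rng.normal(size=(n, d))
x = rng.normal(size=d)
A = rng.normal(size=(d, d))                   # non-singular with probability 1
b = rng.normal(size=d)

# Random unit directions for the original data.
U = rng.normal(size=(m, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)

# Affinely transformed sample (rows A x_i + b) and query point.
X_t = X @ A.T + b
x_t = A @ x + b

# Mapped directions: rows are A^{-T} u, renormalised to unit length.
V = np.linalg.solve(A.T, U.T).T
V /= np.linalg.norm(V, axis=1, keepdims=True)

o_orig = outlyingness(x, X, U)                # original data, directions U
o_mapped = outlyingness(x_t, X_t, V)          # transformed data, mapped directions
o_same = outlyingness(x_t, X_t, U)            # transformed data, same directions

print(o_orig, o_mapped, o_same)
```

With the mapped directions, each ratio for the transformed data matches the corresponding ratio for the original data up to floating-point error, so `o_orig` and `o_mapped` should agree. Reusing the same finite set `U` on the transformed data gives no such guarantee, which is consistent with the quoted loss of invariance for finite ∆.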
My intuition is that, since the data are standardized along every direction anyway, the affine transformation has no impact on the outlyingness score $O(x, X)$, but I would like to see the complete and rigorous derivation.