Reading about invariance and equivariance properties occurring in the field of Machine Learning, I'm trying to understand the proof of the main theorem presented in the paper Deep Sets: https://arxiv.org/pdf/1703.06114.pdf
The idea is that given a set $X = \{x_1, \dots x_m\}$ that is finite or countable I wanna consider some function $f:2^X \rightarrow \mathbb{R}$ that is permutation invariant, i.e.
$$f(\{x_1, \dots, x_m\}) = f(\{x_{\pi(1)}, \dots , x_{\pi(m)}\})$$
for every permutation $\pi$.
Then the theorem states that:
A function $f:2^X \rightarrow\mathbb{R}$ operating on a set $X$ is permutation invariant iff it can be decomposed in the form
$$f(X) = \rho\left(\sum_{x \in X}\phi(x)\right)$$
for suitable $\rho, \phi$.
Now I'm trying to understand the $(\Rightarrow)$ implication which proeeds as follows:
Let $c : X \rightarrow \mathbb{N}$ and let $\phi(x) = 4^{-c(x)}$, then $\sum_{x \in X}\phi(x)$ constitutes a unique representation for every set $X \in 2^X$. Now a function $\rho$ can always be constructed to obtain $f(X)$.
Questions:
Why does $\sum_{x \in X}\phi(x)$ constitutes a unique representation for every set?
Why does they choose $\phi(x)$ to be $4^{-c(x)}$ ?
Why one can always find a such $\rho$?
Every function $2^\mathfrak{X} \to \mathbb{R}$ is "permutation invariant" so the statement here isn't quite right - rather they're proving that every function $f:2^\mathfrak{X} \to \mathbb{R}$ has that form.
(Some of these $X$s are being mixed up. The $X$ in $2^X$, which they called $\mathfrak{X}$, is not the same as the $X$ in $f(X)$, and that's different to their $\mathcal{X}$......................)
Q1 The point is that with the right choice of $\phi$, you can recover $X$ from $\sum_{x \in X} \phi(x)$. Without loss of generality, we are working with subsets of $\mathbb{N}$ (i.e. $\mathfrak{X} = \mathbb{N}$ in their language). You can do that by putting $\phi(x)=4^{-x}$ because of the almost-uniqueness of base 4 representations of numbers between 0 and 1 - ignoring infinite trailing 3s then there is one and only one way to write each $r \in [0,1]$ in base 4, just like there's one and only one way to write each $r \in [0,1]$ in decimal so long as we forbid recurring 9s. In a number of the form $\sum_{x \in X} 4^{-x}$ there can't be trailing 3s because all the base-4 digits of that number are 0 or 1, so if $\sum_{x \in X} 4^{-x} = \sum_{y \in Y} 4^{-y}$ then $X=Y$.
Q2 They choose that form because they can use the almost-uniqueness of base 4 expansions. $3^{-x}$ would have been fine too. $2^{-x}$ wouldn't because $0.1 = 0.011111\cdots$ in binary.
Q3 no two different $X \in 2^\mathfrak{X}$ have the same value of $\sum_{x \in X} 4^{-x}$, so we can just define $\rho$ by making it equal to $f(X)$ on a number of the form $\sum_{x \in X} 2^\mathfrak{X}$ and 42 on every other number.