Question:
Is the following a correct notion/generalization of "indicator function" and "positive operator valued measure" (POVM) to free probability? Either way, do you know of any references discussing the connection?
Let $\mathscr{A}$ be a (unital) *-algebra (with unit $1_{\mathscr{A}}$) (see Wikipedia link and/or definition below) and let $(X, \Sigma_X)$ be a measurable space. Then a function $\mathbb{I}: \Sigma_X \to \mathscr{A}$ is a "generalized indicator function" (or "generalized observable") if it satisfies the following axioms:
- $\mathbb{I}$ is finitely additive (and/or countably additive if/when $\mathscr{A}$ is also a Banach algebra i.e. so that infinite summation can be defined, or assume for simplicity that $X$ is finite).
- $\mathbb{I}(X) = 1_{\mathscr{A}}$ ("probabilities sum to $1$").
- for all "events" $E \in \Sigma_X$, $\mathbb{I}(E) = \alpha \alpha^*$ for some $\alpha \in \mathscr{A}$ ("non-negativity" and self-adjointness).
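For a finite $X$ these axioms can be checked concretely. Below is a minimal numerical sketch (my own illustration, not from any reference), taking $\mathscr{A}$ to be the $2 \times 2$ complex matrices and using the standard "trine" POVM on a three-point $X$ as the example; in a C*-algebra like this, "$\mathbb{I}(E) = \alpha\alpha^*$ for some $\alpha$" is equivalent to $\mathbb{I}(E)$ being positive semidefinite, which is what the code tests:

```python
import numpy as np

# Toy model: A = 2x2 matrices, X = {0, 1, 2} with the power-set sigma-algebra.
# The "trine" POVM: E_k = (2/3) |psi_k><psi_k| for three unit vectors in R^2
# separated by 120 degrees.
def trine_povm():
    effects = []
    for k in range(3):
        theta = 2 * np.pi * k / 3
        psi = np.array([np.cos(theta), np.sin(theta)])
        effects.append((2 / 3) * np.outer(psi, psi))
    return effects

def is_generalized_indicator(effects, tol=1e-12):
    """Check the axioms on the atoms of a finite X."""
    d = effects[0].shape[0]
    # Additivity + normalization: the atomic effects sum to 1_A = identity.
    sums_to_one = np.allclose(sum(effects), np.eye(d), atol=tol)
    # I(E) = a a* for some a  <=>  I(E) is self-adjoint and positive semidefinite.
    all_positive = all(
        np.allclose(E, E.conj().T, atol=tol)
        and np.min(np.linalg.eigvalsh(E)) >= -tol
        for E in effects
    )
    return sums_to_one and all_positive

print(is_generalized_indicator(trine_povm()))  # True
```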
Presumably one could also add the following restrictions to make these more closely resemble the "classical indicator function" $\Sigma_X \to \mathbb{R}^{X}$:
Recall that an "atomic event" $E \in \Sigma_X$ is one such that $E \not= \emptyset$ and, for every $E' \in \Sigma_X$, $E' \subseteq E$ implies that $E' = E$ or $E' = \emptyset$. Let $\mathcal{E}_X \subset \Sigma_X$ denote the set of all atomic events.
- The generalized indicator function $\mathbb{I}$ will be called "faithful" if the set $\{ \mathbb{I}(E) : E \in \mathcal{E}_X \}$ is linearly independent in $\mathscr{A}$. (In particular $\mathbb{I}$ will be injective, but this condition is strictly stronger.) (Should "linearly independent" be replaced with "orthogonal"?)
- The generalized indicator function $\mathbb{I}$ will be called "full" if the span of the set $\{ \mathbb{I}(E) : E \in \mathcal{E}_X \}$ equals all of $\mathscr{A}$.
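For finite $X$ and finite-dimensional $\mathscr{A}$, both conditions reduce to a rank computation on the images of the atoms. Here is a sketch (the function names `is_faithful`/`is_full` are mine, mirroring the definitions above), with $\mathscr{A} = \mathbb{R}^2$ under entrywise multiplication:

```python
import numpy as np

def is_faithful(atom_images):
    """The images of the atomic events are linearly independent in A."""
    M = np.array([np.asarray(a).reshape(-1) for a in atom_images])
    return np.linalg.matrix_rank(M) == len(atom_images)

def is_full(atom_images):
    """The images of the atomic events span all of A."""
    M = np.array([np.asarray(a).reshape(-1) for a in atom_images])
    return np.linalg.matrix_rank(M) == M.shape[1]

standard = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # classical indicator
fair     = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]  # "fair coin" POVM

print(is_faithful(standard), is_full(standard))  # True True
print(is_faithful(fair), is_full(fair))          # False False
```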
For example, my understanding is that projection-valued measures always have to be "faithful", but positive operator valued measures (POVMs) do not.
Intuitively, if the dimension of $\mathscr{A}$ is "too small" and the cardinality of $X$ is "too large", then it should be impossible for any "generalized indicator function" $\mathbb{I}: \Sigma_X \to \mathscr{A}$ to be faithful. Similarly, if the dimension of $\mathscr{A}$ is "too large" and the cardinality of $X$ is "too small", then it should be impossible for any "generalized indicator function" $\mathbb{I}: \Sigma_X \to \mathscr{A}$ to be full.
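In the finite-dimensional case this intuition is just linear algebra: more than $\dim \mathscr{A}$ vectors can never be linearly independent, and fewer than $\dim \mathscr{A}$ vectors can never span. A quick self-contained check (my own toy numbers) with three atoms mapping into the $2$-dimensional algebra $\mathbb{R}^2$:

```python
import numpy as np

# Three atoms mapping into A = R^2 (entrywise product): the axioms can hold,
# but faithfulness cannot, since rank is capped at dim(A) = 2.
atoms = [np.array([0.5, 0.2]), np.array([0.3, 0.3]), np.array([0.2, 0.5])]
assert np.allclose(sum(atoms), np.ones(2))   # I(X) = 1_A
assert all((a >= 0).all() for a in atoms)    # positivity

rank = np.linalg.matrix_rank(np.array(atoms))
print(rank)  # 2 -- at most dim(A), so the three images must be dependent
```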
Motivation:
To better understand free probability theory (and, relatedly, quantum information theory), I want to double-check whether I correctly understand the connection between the Kolmogorov/measure-theoretic formulation of "classical probability theory" and the free-probability/"(non)commutative algebra of random variables" formulation.
E.g. I'm fairly sure I understand the connection between the Kolmogorovian/measure-theoretic formulation of "classical probability theory" and the formulation in terms of indicator functions and expectation axioms/linear functionals, as described e.g. in Peter Whittle's book. I'm not entirely sure, but the nLab suggests this is on the right track.
Extra Definitions:
Modified from Wikipedia:
A *-ring $A$ is a unital ring (with unit $1_A$) with a map $*: A \to A$ such that:
- $(a_1 + a_2)^* = a_1^* + a_2^*$,
- $(a_1 a_2)^* = a_2^* a_1^*$,
- $1_A^* = 1_A$,
- $(a^*)^* = a$
for all $a, a_1, a_2 \in A$.
A $*$-algebra $\mathscr{A}$ is a (unital) $*$-ring with involution $*$ that is an associative algebra over a commutative $*$-ring $R$ with involution $^\dagger$, such that $(r \alpha)^* = r^\dagger \alpha^*$ for all $r \in R, \alpha \in \mathscr{A}$.
For simplicity we can assume that $R$ is the real numbers with $\dagger$ given by the identity function, or that $R$ is the complex numbers with $\dagger$ given by complex conjugation. But technically the definition is more general.
To go from "generalized indicator functions" or "generalized observables" to actual probability distributions, we of course need to consider linear functionals, i.e. "generalized expectations" or "generalized traces". What follows is adapted, with modifications, from Terry Tao's blog.
A "generalized expectation" or "generalized trace" is a linear functional $\tau: \mathscr{A} \to \mathbb{C}$ satisfying the following properties:
- for all $\alpha \in \mathscr{A}$, $\tau(\alpha^*) = \overline{\tau(\alpha)}$, i.e. the complex conjugate of $\tau(\alpha)$.
- $\tau(1_{\mathscr{A}}) = 1$.
- For all $\alpha \in \mathscr{A}$, $\tau(\alpha \alpha^*) \ge 0$.
Note that the first property implies that $\tau(\beta) \in \mathbb{R}$ for every self-adjoint element $\beta \in \mathscr{A}$, i.e. every $\beta$ with $\beta^* = \beta$. (Terry Tao calls the first property "$*$-linearity", although that risks confusion with conjugate-linearity/"antilinearity".)
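One standard example of such a functional is the normalized trace $\tau(a) = \operatorname{tr}(a)/d$ on $\mathscr{A} = d \times d$ complex matrices; a quick numerical sketch (random test matrix, my own illustration) confirming the three axioms and the self-adjoint-implies-real consequence:

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

def tau(a):
    """Normalized trace on d x d complex matrices."""
    return np.trace(a) / d

a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

assert np.isclose(tau(a.conj().T), np.conj(tau(a)))  # tau(a*) = conj(tau(a))
assert np.isclose(tau(np.eye(d)), 1.0)               # tau(1_A) = 1
assert tau(a @ a.conj().T).real >= 0                 # tau(a a*) >= 0

b = a + a.conj().T                                   # a self-adjoint element
print(np.isclose(tau(b).imag, 0.0))  # True: tau of self-adjoint is real
```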
Basically, one important way this perspective differs from "Kolmogorovian probability theory" is that, instead of considering individual probability distributions one at a time, given a single "generalized indicator function"/POVM we usually consider the whole family of probability distributions obtained by applying every possible "generalized expectation"/trace to it. I think the "generalized indicator function"/POVM needs to be "full" (in the sense defined above) in order for this family of distributions to be "complete", i.e. as large as possible.
For example, consider $(X, \Sigma_X) = (\{1,2\}, \mathcal{P}(\{1,2\}))$ and $\mathscr{A} = \mathbb{R}^2$ with entrywise multiplication (and unit the all-ones vector). If we define $\mathbb{I}(\{1\}) = \mathbb{I}(\{2\}) = (\frac{1}{2}, \frac{1}{2})$, then all of the axioms above appear to be satisfied, but the only probability distribution we can generate (regardless of the choice of $\tau$) is the "fair coin". Similarly, for any given $p \in (0,1)$, setting $\mathbb{I}(\{1\}) = (p,p)$ and $\mathbb{I}(\{2\}) = (1-p, 1-p)$ generates only the single (biased) "coin" distribution: by linearity $\mathbb{P}(\{2\}) = \tau((1-p, 1-p)) = \frac{1-p}{p} \, \tau((p,p)) = \frac{1-p}{p} \, \mathbb{P}(\{1\})$, which combined with the constraint $\mathbb{P}(\{1\}) + \mathbb{P}(\{2\}) = 1$ forces $\mathbb{P}(\{1\}) = p$ and $\mathbb{P}(\{2\}) = 1-p$ for every $\tau$.
On the other hand, for the "standard" indicator function $\mathcal{P}(\{1,2\}) \to \mathbb{R}^2$ with $\mathbb{I}(\{1\}) = (1,0)$ and $\mathbb{I}(\{2\}) = (0,1)$, the "full"-ness (and "faithful"-ness) condition is satisfied, and applying all possible $\tau$ then yields all possible probability distributions on $\{1, 2\}$. So this extra flexibility in choosing the "generalized indicator function" seems rather strange when applied to "classical probability theory", but it seems to be standard for POVMs, which is a source of confusion for me.
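The two-point example can be verified directly: if I am not mistaken, every "generalized expectation" on $\mathbb{R}^2$ with entrywise multiplication has the form $\tau(x) = q x_1 + (1-q) x_2$ for some $q \in [0,1]$, so we can sweep over $q$ and compare the two indicator functions (a sketch under that assumption):

```python
import numpy as np

def tau(x, q):
    """A generalized expectation on A = R^2 (entrywise product),
    parameterized by q in [0, 1]: tau(x) = q*x[0] + (1-q)*x[1]."""
    return q * x[0] + (1 - q) * x[1]

p = 0.3
degenerate = {1: np.array([p, p]), 2: np.array([1 - p, 1 - p])}  # non-full POVM
standard   = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}  # classical indicator

for q in [0.0, 0.25, 0.7, 1.0]:
    # The non-full POVM yields the same distribution (p, 1-p) for every tau:
    assert np.isclose(tau(degenerate[1], q), p)
    assert np.isclose(tau(degenerate[2], q), 1 - p)
    # The standard indicator yields (q, 1-q), i.e. all distributions on {1,2}:
    assert np.isclose(tau(standard[1], q), q)

print(tau(degenerate[1], 0.7))  # 0.3 -- independent of q
```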