Introduction. This question is based on the Ph.D. thesis of B.T. Vo, which can be found on this website ("Papers" section). More specifically, in the introduction of the Ph.D. thesis, on page 8, there is a sub-section called "Critique of Bayesian Association based Multi-Object Filtering" (Appendix B3 is also relevant). This is where my question comes from; my notation will follow the notation used there.
The main scope of the Ph.D. thesis is an innovative type of multi-target filtering which, under "some" (actually many, see later) circumstances, outperforms canonical approaches such as the very well-known JPDA (Joint Probabilistic Data Association) and MHT (Multiple Hypothesis Tracking). The interesting fact is that the work suggests that techniques based on Random Finite Sets (better known in mathematics as random point processes) consistently outperform the classical (JPDA and MHT) approaches. From this point of view, my question is only "marginal", in the sense that it regards only the mathematical aspects of classical Bayesian association and does not involve the core ideas of the Ph.D. thesis itself.
Problem. In subsection 1.2.3, page 8, one can read
Let $\Omega(Z)$ denote the space of all hypotheses defined from the measurement set $Z$. Association-based approaches compute the posterior probability of a hypothesis $\omega$ given the measurement set $Z$ using Bayes rule as follows $$ p(\omega|Z) = \frac{p(Z|\omega)p(\omega)}{p(Z)} \quad (1.1) $$ Since each hypothesis $\omega$ is an element of $\Omega(Z)$, $\omega$ itself depends on $Z$, and equation (1.1) should be written explicitly as $$ p(\omega|Z) = \frac{p(Z|\omega(Z))p(\omega(Z))}{p(Z)} \quad (1.2) $$ Closer examination reveals a number of conceptual issues with the application of Bayes rule in equation (1.2), which arise from the conditioning of the hypothesis on the measurement.
It can easily be shown that (1.2) is indeed conceptually inconsistent, and I don't disagree with this. However, it is a direct consequence of having defined $\Omega(Z)$ as "the space of all hypotheses derived from the measurement set $Z$". This definition is intrinsically ill-posed: if $\omega_0$ is a single hypothesis (for example, in single-target tracking, $\omega_0 = H_0$, i.e., only noise is measured), it does not depend on $Z$ (the measurements). On the contrary, it is correct to say that $Z$ depends on $\omega$ (in Appendix B3, this is made clear by the rigorous notation $Z(\omega)$).
Let us consider the simple case of single-target detection, i.e., $$ \begin{cases} H_0 : \quad z(k) = n(k) \\ H_1 : \quad z(k) = s(k) +n(k) \end{cases} \quad (*) $$ with $k=1,\dots,K$ denoting a discrete time index ($K>1$ is an integer), $s(k)$ the signal of interest, and $n(k)$ a sequence of i.i.d. random variables representing noise, typically zero-mean white Gaussian noise, $n(k) \sim \mathcal{N}(0, \sigma^2)$, with $\sigma^2>0$ being an unknown parameter. It can easily be seen that the collection of all measurements, $Z = [z(1), z(2), \dots, z(K)]$, does in fact depend on the actual scenario ($H_0$ or $H_1$) in force, and not the other way around. Thus, $\omega \neq \omega(Z)$, but $Z = Z(\omega)$, where $\omega$ may be $H_0$ or $H_1$.
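To make this concrete, here is a minimal numerical sketch of Bayes rule applied to the detection model $(*)$, over the fixed hypothesis space $\{H_0, H_1\}$. It uses simplifying assumptions that are stronger than the original model: $s(k)$ and $\sigma^2$ are taken as known, the priors are uniform, and all numerical values are purely illustrative.

```python
# Minimal sketch of Bayesian detection for the model (*).
# Assumptions (not in the original post): s(k) and sigma are known,
# p(H0) = p(H1) = 1/2; all values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

K = 50
sigma = 1.0
s = np.ones(K)  # hypothetical known signal s(k)

def log_likelihood(Z, mean, sigma):
    """log p(Z | hypothesis) for i.i.d. Gaussian noise."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (Z - mean) ** 2 / (2 * sigma**2))

# Simulate one measurement set under H1: z(k) = s(k) + n(k).
# Note: Z depends on the scenario in force, i.e. Z = Z(omega).
Z = s + rng.normal(0.0, sigma, K)

# Bayes rule over the FIXED hypothesis space {H0, H1}:
# p(H_i | Z) is proportional to p(Z | H_i) p(H_i);
# the hypothesis space itself does not depend on Z.
log_l0 = log_likelihood(Z, np.zeros(K), sigma)  # p(Z | H0)
log_l1 = log_likelihood(Z, s, sigma)            # p(Z | H1)
prior = np.array([0.5, 0.5])
log_post = np.log(prior) + np.array([log_l0, log_l1])
post = np.exp(log_post - np.max(log_post))      # stable normalization
post /= post.sum()

print(post)  # [p(H0 | Z), p(H1 | Z)]
```

With data generated under $H_1$, the posterior mass concentrates on $H_1$; nothing in this computation requires the hypotheses to be "defined from" $Z$.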
However, on page 9, one can read
It may be possible to reconcile the conceptual inconsistencies by considering hypotheses without conditioning on the measurement, but even so a new technical problem arises with this approach. Consider the space $\Omega$ of all possible hypotheses formed by the union of $\Omega(Z)$ over all possible measurement sets $Z$. In this case, a hypothesis in $\Omega$ has no dependence on the measurement and hence there are no conceptual inconsistencies in equation (1.1). However, $\Omega$ is uncountably infinite, because the $\Omega(Z)$s are disjoint (i.e. $\Omega(Z) \cap \Omega(Z') = 0$ if $Z' \neq Z$) and the set of all $Z$ is uncountably infinite. Moreover, $\Omega$ does not have all the nice properties of, say, a Euclidean space. Consequently, it is not clear that the notion of probability density and integration on $\Omega$ can be defined.
Another definition of $\Omega$ is given here. While it solves the problems of the old definition, it seems inappropriate. Why should we accept such a space as the reference, from a Bayesian standpoint? It seems unclear and, yet again, ill-posed. $\Omega$ (i.e., the space of all hypotheses) cannot be tied to the space generated by all possible measurements $Z$: the scenario in force ($H_0$ or $H_1$, in the single-target case) exists regardless of whether we make measurements or not.
What actually happens is that, if $H_0$ (no target) is in force and we start measuring, we get only noise; conversely, if $H_1$ (a target is present) is in force and we start measuring, we get signal+noise. The logical consequence is that the hypothesis space is $\Omega = \{ \omega_0, \omega_1 \}$, where $\omega_0$ = "no target is present" and $\omega_1$ = "a single target is present". Here "is present" is obviously a theoretical simplification of the physical reality (the target may be present but outside the instrument's field of view, or below its sensitivity threshold, and so on), which holds when the usual assumptions hold. Given the usual assumptions in simple single-target detection, $\omega_0$ and $\omega_1$ are (by assumption) the only two possible cases (target disappearance/(re)appearance is not considered in the simple, canonical formulation given above). Thus, when we start measuring (with an idealized measuring device), the measurement set $Z$ is described either by $H_0$ or by $H_1$.
$Z$ is still allowed to take uncountably many values: $s(k)$ is physically bounded, and thus must have finite energy (among other properties), but a realization of $n(k)$ can assume arbitrarily large values, albeit with small probability; indeed $z(k) \in \mathbb{R}$, so $Z \in \mathbb{R}^{K}$. However, this has no consequence on $\Omega$. Probability densities, integration, etc. are all naturally defined on the usual Euclidean space (in the example, $\mathbb{R}^K$); they are not directly based on $\Omega$, which is just the space of all hypotheses. (Note: I don't see a problem, highlighted on page 8 but for different reasons, with the fact that elements of $\Omega$ cannot be observed if we do not measure anything; I would see a real problem if this could be done, even in theory.) In fact, in Appendix B3, an observation space is defined.
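To illustrate that densities and integration live on the Euclidean observation space rather than on $\Omega$, here is a small numerical check (taking $K = 1$ for simplicity; the signal value and priors are illustrative choices of mine): the marginal density $p(z) = p(z|\omega_0)p(\omega_0) + p(z|\omega_1)p(\omega_1)$ is an ordinary density on $\mathbb{R}$ and integrates to 1, while the finite set $\Omega = \{\omega_0, \omega_1\}$ merely indexes the two mixture components.

```python
# Numerical check (K = 1 for simplicity): the marginal density
# p(z) = p(z|w0) p(w0) + p(z|w1) p(w1) is an ordinary density on R,
# so it integrates to 1; Omega = {w0, w1} only indexes the mixture.
# Signal value and priors below are illustrative assumptions.
import numpy as np

sigma, s = 1.0, 1.5
prior = {"w0": 0.5, "w1": 0.5}

def gauss(z, mean, sigma):
    """Gaussian density N(mean, sigma^2) evaluated at z."""
    return (np.exp(-(z - mean) ** 2 / (2 * sigma**2))
            / np.sqrt(2 * np.pi * sigma**2))

# Mixture density on the observation space R (truncated to [-10, 10],
# where essentially all of the mass lies for these parameters).
z = np.linspace(-10.0, 10.0, 200001)
p_z = prior["w0"] * gauss(z, 0.0, sigma) + prior["w1"] * gauss(z, s, sigma)

# Riemann-sum approximation of the integral over R.
total = np.sum(p_z) * (z[1] - z[0])
print(total)  # close to 1.0: integration lives on R, not on Omega
```

The same construction works for any $K$, with the integral taken over $\mathbb{R}^K$; $\Omega$ never needs a density of its own.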
Assuming that $\omega \neq \omega(Z)$ (i.e., we properly pick $\Omega$), I fail to see where the Bayesian association generates conceptual inconsistencies, but I may have missed the point the author was trying to make.
Question: do pages 8-9 (and possibly Appendix B3) of Vo's Ph.D. thesis actually prove anything regarding the mathematical soundness of classical Bayesian association?
The key idea of Bayesian statistics is (according to Wikipedia) that "probability is orderly opinion, and that inference from data is nothing other than the revision of such opinion in the light of relevant new information." Keeping this in mind, equation (1.2) in the thesis was derived from a wrong assumption. In the thesis, the author writes that "$\Omega(Z)$ denote[s] the space of all hypotheses defined from the measurement set Z". What is not mentioned is that hypotheses are formed a priori, so $Z$ is the set of prior measurements. The posterior probability is then computed using the new measurements as well, i.e., another measurement set $\tilde{Z}$.
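A minimal sketch of this reading, reusing the detection model $(*)$ from the question with $s(k)$ and $\sigma$ assumed known (illustrative values of mine): the hypothesis space $\{H_0, H_1\}$ is fixed a priori, the posterior computed from a first measurement batch $Z$ then serves as the prior when a new batch $\tilde{Z}$ arrives.

```python
# Sketch of the answer's point: hypotheses are fixed a priori; the
# posterior after a first batch Z becomes the prior for a new batch
# Z~ (Z tilde). Model, signal, and noise parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
s = 1.0  # hypothetical known constant signal level

def log_lik(Z, mean):
    # Log-likelihood up to a constant shared by both hypotheses
    # (same K, same sigma), which cancels in the normalization.
    return np.sum(-(Z - mean) ** 2 / (2 * sigma**2))

def update(prior, Z):
    """One Bayes update over the fixed space {H0, H1}."""
    log_post = np.log(prior) + np.array([log_lik(Z, 0.0), log_lik(Z, s)])
    post = np.exp(log_post - log_post.max())  # stable normalization
    return post / post.sum()

Z      = s + rng.normal(0.0, sigma, 50)  # first measurement batch
Ztilde = s + rng.normal(0.0, sigma, 50)  # new measurement batch

p1 = update(np.array([0.5, 0.5]), Z)     # posterior after Z
p2 = update(p1, Ztilde)                  # p1 acts as prior for Z~
print(p1, p2)
```

Note that the same two hypotheses index both updates; only the opinion about them is revised as measurements accumulate, which is exactly the Wikipedia formulation quoted above.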