Under which probabilistic assumptions is this estimation approach correct?


While trying to tackle a problem, I ended up building a model $ p(\boldsymbol{x}_{i}|\boldsymbol{y}_{i,j}, \boldsymbol{x}_{j}) $ which, for each observation $ \boldsymbol{y}_{i,j} $ made by $ \boldsymbol{x}_{j} $, gives me a new PDF of $ \boldsymbol{x}_{i} $. My ultimate goal is to obtain the best estimate of $ \boldsymbol{x}_{i} $, denoted $ \widehat{\boldsymbol{x}_{i}} $, given my observations $ \boldsymbol{y}_{i,j} $. The most typical approach I have seen (from others who faced the same problem and built other models) is:

\begin{equation} \widehat{\boldsymbol{x}_{i}}=\arg \max_{\boldsymbol{x}_{i}} f(\boldsymbol{x}_{i}) \prod_{j \in \mathcal{O}} f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{i}, \boldsymbol{x}_{j}) \end{equation}

However, I simply tried the following instead, and it worked just fine:

\begin{equation} \widehat{\boldsymbol{x}_{i}}=\arg \max_{\boldsymbol{x}_{i}} \prod_{j \in \mathcal{O}} p(\boldsymbol{x}_{i} \mid \boldsymbol{y}_{i,j}, \boldsymbol{x}_{j}) \end{equation}

Why is that happening? What does the fact that both appear to work suggest about their relationship? I did not state the underlying assumptions because I am not quite sure what they are, so I would like some help identifying them mathematically.
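To make the comparison concrete, here is a sketch of how the two objectives relate. It assumes that Bayes' rule can be applied per receiver and that $ \boldsymbol{x}_{i} $ is a priori independent of $ \boldsymbol{x}_{j} $, so that $ f(\boldsymbol{x}_{i} \mid \boldsymbol{x}_{j}) = f(\boldsymbol{x}_{i}) $. The cardinality $ |\mathcal{O}| $ is notation introduced here for the number of receivers. This is not a verified description of the model.

\begin{equation} p(\boldsymbol{x}_{i} \mid \boldsymbol{y}_{i,j}, \boldsymbol{x}_{j}) = \frac{f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{i}, \boldsymbol{x}_{j}) \, f(\boldsymbol{x}_{i})}{f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{j})} \end{equation}

\begin{equation} \prod_{j \in \mathcal{O}} p(\boldsymbol{x}_{i} \mid \boldsymbol{y}_{i,j}, \boldsymbol{x}_{j}) \propto f(\boldsymbol{x}_{i})^{|\mathcal{O}|} \prod_{j \in \mathcal{O}} f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{i}, \boldsymbol{x}_{j}) \end{equation}

The proportionality is with respect to $ \boldsymbol{x}_{i} $, since the denominators $ f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{j}) $ do not depend on it. Under these assumptions the second objective counts the prior $ |\mathcal{O}| $ times instead of once, so the two maximizers would coincide when $ f(\boldsymbol{x}_{i}) $ is flat over the region of interest or when there is a single receiver. The product over $ j $ in the first objective additionally presumes that the measurements are conditionally independent given the positions.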

Thank you for your time.

*Some further information regarding $ \boldsymbol{x}_{i} $, $ \boldsymbol{y}_{i,j} $, $ \boldsymbol{x}_j $ and their relation:

$ \boldsymbol{x}_{i} $ is the position of some node in 3D space. Assume that this node emits a single impulse signal. $ \boldsymbol{y}_{i,j} $ is the measurement of that signal by another node (namely node $ j $), whose position is "known" to us. A measurement depends only on the distance between the transmitting and the receiving node (the model of this dependency is the same for all receiving nodes). Therefore, given some measurement $ \boldsymbol{y}_{i,j} $ and the position of the receiver $ \boldsymbol{x}_j $, there is a distribution over the possible positions $ \boldsymbol{x}_{i} $.

The following graph depicts my system. My actual goal is to find the best position estimate for all nodes $ \boldsymbol{X} $ (their positions are independent of each other). In fact, I have no prior knowledge about them; the knowledge gets built iteratively by my optimization process.

[Figure: System]

So, after optimizing $ \boldsymbol{x}_{i} $, I continue by iteratively optimizing all the remaining $ \boldsymbol{x}_{j} $'s one by one (each becomes the new $ \boldsymbol{x}_{i} $ in its step), using the previous estimates of the other $ \boldsymbol{x}_{j} $'s when available (otherwise I use a random position). In practice, I have seen this method converge to a correct solution in which the nodes are placed in a relative reference system (since my initial positions are random).
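The node-by-node procedure described above can be sketched roughly as follows. This is only an illustration under assumptions that are not part of my actual model: each measurement is taken to be the distance itself with Gaussian noise, so maximizing $ \prod_{j} f(\boldsymbol{y}_{i,j} \mid \boldsymbol{x}_{i}, \boldsymbol{x}_{j}) $ reduces to minimizing squared range residuals, the prior is flat, and all nodes start at random positions. The names `stress`, `node_cost`, `update_node` and `localize` are hypothetical.

```python
import numpy as np


def stress(X, Y):
    """Sum of squared range residuals over all pairs i < j (Gaussian-noise assumption)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return float(np.sum((D[iu] - Y[iu]) ** 2))


def node_cost(xi, i, X, Y):
    """Negative log of prod_j f(y_ij | x_i, x_j) for node i, up to constants (assumed model)."""
    others = np.arange(len(X)) != i
    d = np.linalg.norm(X[others] - xi, axis=1)
    return float(np.sum((d - Y[i, others]) ** 2))


def update_node(i, X, Y, steps=20):
    """Improve x_i with all other nodes held fixed, using backtracking gradient descent."""
    xi = X[i].copy()
    others = np.arange(len(X)) != i
    for _ in range(steps):
        diff = xi - X[others]
        d = np.linalg.norm(diff, axis=1) + 1e-12  # avoid division by zero
        grad = 2.0 * np.sum(((d - Y[i, others]) / d)[:, None] * diff, axis=0)
        c0, t = node_cost(xi, i, X, Y), 1.0
        # Halve the step until the per-node cost does not increase.
        while t > 1e-8 and node_cost(xi - t * grad, i, X, Y) > c0:
            t *= 0.5
        if t <= 1e-8:
            break  # no acceptable step found; keep the current estimate
        xi = xi - t * grad
    return xi


def localize(Y, sweeps=50, seed=0):
    """Cycle through the nodes, re-estimating each one from the current estimates of the rest."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    X = rng.normal(size=(n, 3))  # random initial positions, no prior knowledge
    history = [stress(X, Y)]
    for _ in range(sweeps):
        for i in range(n):
            X[i] = update_node(i, X, Y)
        history.append(stress(X, Y))
    return X, history


# Synthetic, noise-free example: 6 nodes in 3D, Y holds the true pairwise distances.
rng = np.random.default_rng(1)
X_true = rng.uniform(-1.0, 1.0, size=(6, 3))
Y = np.linalg.norm(X_true[:, None, :] - X_true[None, :, :], axis=-1)
X_hat, history = localize(Y)
print("initial cost:", history[0], "final cost:", history[-1])
```

Because each accepted step cannot increase the cost of the node being updated, the total cost in `history` is non-increasing from sweep to sweep. The recovered `X_hat` can only match `X_true` up to a rotation, translation and reflection, which corresponds to the relative reference system mentioned above. The sketch does not guarantee that the global optimum is reached.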