I am currently working on the best strategies to find objects in the sea.
My situation is as follows: a sunken submarine lies somewhere in a region of the sea modeled by an $n \times n$ grid, where $n$ is arbitrary. So the submarine has to be found within one of the $n^2$ boxes of the grid.
At the beginning of the search we have no exact idea of the submarine's position.
Let $p(i,j)$ be the probability that the wreck is in box $(i,j)$.
Let $c(i,j)$ be the cost of searching box $(i,j)$. (We can assume, for example, that this cost depends only on the Euclidean distance between our boat's position and box $(i,j)$.)
The submarine is not moving, but the boat travels at each step, so the costs have to be reconsidered before each search.
Finally, let $q$ be the probability of successfully finding the wreck when searching the right box.
At each step of the search we look into one of the boxes, repeating this operation until we find the submarine.
If a search is unsuccessful, we update the probabilities using Bayes' theorem, as explained in this article:
https://en.wikipedia.org/wiki/Bayesian_search_theory
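To make the update rule concrete, here is a small sketch (variable names are my own) of the Bayesian update from that article: after an unsuccessful search of box $s$, the searched box's probability is downgraded by the factor $1-q$ and everything is renormalized by the probability $1 - p_s q$ of the search failing.

```python
def bayes_update(p, s, q):
    """Posterior probabilities after an unsuccessful search of box s.

    p : list of prior probabilities p_k that the wreck is in box k
    s : index of the box that was searched in vain
    q : probability of detecting the wreck when searching the right box
    """
    denom = 1.0 - p[s] * q                # P(search of box s is unsuccessful)
    post = [pk / denom for pk in p]       # unsearched boxes are upgraded
    post[s] = p[s] * (1.0 - q) / denom    # the searched box is downgraded
    return post

# Example: with priors [0.5, 0.3, 0.2] and q = 0.8, an unsuccessful search
# of box 0 lowers its probability to 0.5 * 0.2 / 0.6 = 1/6.
```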
My goal is to prove, if possible, the result described in the "Searching in boxes" section of the article above, which states that the optimal policy is to look, at each step, in the box that maximizes:
$\dfrac{p_{i,j}q}{c_{i,j}} $
Can anyone help me prove this result please?
Thank you for your time.
As discussed in the comments, the solution in the Wikipedia article assumes fixed costs, and thus doesn’t apply to your case of position-dependent costs.
Note also that $q$ depends on the box in that article. If it doesn’t, as in your case, there’s no need to include it in the ratio.
I'll combine the indices $i,j$ into a single index $k$ for conciseness. I'll denote the “attractiveness” $\frac{p_k}{c_k}$ by $a_k$.
If a strategy doesn't always search the most attractive box, i.e. the one that maximizes the current value of $a_k$, then it must at some point perform two consecutive searches in which the less attractive of the two boxes is searched first.
To see this, note that the strategy eventually returns to every box. Assume box $b$ is the most attractive at time $t_1$ but box $c$ is picked instead. Until $b$ eventually gets picked at time $t_2$, it receives the same Bayesian update factors as other boxes that aren’t picked receive, while boxes that are picked and searched in vain receive lower update factors. Thus, when $b$ gets picked at $t_2$, it has the attractiveness it had at $t_1$ multiplied by the maximal Bayesian update factors, and thus a higher attractiveness than that of $c$ at $t_1$ multiplied by these same factors. Thus, there must be a first time after $t_1$ at which a box is picked that has higher attractiveness than $c$ would have had with maximal update factors, and this box must have had higher attractiveness than the one searched immediately before it.
If neither of these two consecutive searches succeeds, it doesn’t matter in which order they were performed. So assume that one of them succeeds. This is the search in box $1$ with probability $p_1'=\frac{p_1}{p_1+p_2}$ and the search in box $2$ with probability $p_2'=\frac{p_2}{p_1+p_2}$ (where $p_1$ and $p_2$ are the values before the first search). If we search box $1$ first and then box $2$, the expected cost is $p_1'c_1+p_2'(c_1+c_2)$, whereas if we search box $2$ first and then box $1$, the expected cost is $p_2'c_2+p_1'(c_1+c_2)$. The difference is $p_2'c_1-p_1'c_2=\frac{c_1c_2}{p_1+p_2}(a_2-a_1)$. Thus, in two consecutive searches, the more attractive box must be searched first to minimize the expected cost. Since the strategy doesn’t do this, it cannot be optimal. Thus the optimal strategy always searches the currently most attractive box.
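As a numeric sanity check of the exchange argument (the values below are arbitrary, chosen only for illustration), the expected-cost difference between the two orderings does come out to $\frac{c_1 c_2}{p_1+p_2}(a_2-a_1)$:

```python
# Illustrative values: box 1 is the more attractive one (a1 > a2).
p1, p2 = 0.3, 0.2
c1, c2 = 2.0, 5.0

# Probabilities of success, conditional on one of the two searches succeeding.
p1c, p2c = p1 / (p1 + p2), p2 / (p1 + p2)

cost_12 = p1c * c1 + p2c * (c1 + c2)   # search box 1 first, then box 2
cost_21 = p2c * c2 + p1c * (c1 + c2)   # search box 2 first, then box 1

a1, a2 = p1 / c1, p2 / c2              # attractiveness of each box
diff = cost_12 - cost_21               # matches c1*c2/(p1+p2) * (a2 - a1)
formula = c1 * c2 / (p1 + p2) * (a2 - a1)
```

With these numbers `diff` is negative, confirming that searching the more attractive box first gives the lower expected cost.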