Consider a partially observable Markov decision process (POMDP); see here for a complete definition.
The general definition allows the reward function to be defined in terms of (pairs of) states and actions, e.g. $r(s_t, a_t, s_{t+1})$.
Consider an objective where one tries to be as certain of the underlying state as possible. What is the most appropriate reward function to achieve this goal? I was thinking that minimizing the entropy of the belief could be useful, but it's not clear to me that entropy is minimized around the true state.
Let us, for a moment, assume that you have access to the Bayes filter $p(s_t|x_{1:t})$, where $x_{1:t}$ are the observations until now.
Then minimising the entropy of that belief will give you what you want, i.e. the per-step reward is $r_t = -H\big(p(s_t \mid x_{1:t})\big)$. The optimal policy w.r.t. that reward will seek out states where the agent knows for certain that it actually is in that state. For example, walls and corners in gridworlds are typically less entropic, since the transition there has fewer successor states.
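To make this concrete, here is a minimal sketch of a discrete Bayes filter with a negative-belief-entropy reward. It assumes a finite state space with a known transition model `T[a, s, s']` and observation model `O[s, x]`; the function names and the toy two-state example are my own, not from any particular library.

```python
import numpy as np

def bayes_filter_update(belief, action, obs, T, O):
    """One step of the discrete Bayes filter p(s_t | x_{1:t}).

    belief: current belief over states, shape (S,)
    T: transition model, T[a, s, s'] = p(s' | s, a)
    O: observation model, O[s, x] = p(x | s)
    """
    predicted = belief @ T[action]   # predict: sum_s b(s) * p(s' | s, a)
    updated = predicted * O[:, obs]  # correct: weight by p(x | s')
    return updated / updated.sum()   # normalise to a distribution

def entropy_reward(belief, eps=1e-12):
    """Negative Shannon entropy of the belief: higher = more certain."""
    return float(np.sum(belief * np.log(belief + eps)))

# Toy example: 2 states, a 'stay' action, and a noisy state sensor.
T = np.array([np.eye(2)])                    # action 0 keeps the state
O = np.array([[0.9, 0.1], [0.1, 0.9]])       # observation mostly reveals the state
b = np.array([0.5, 0.5])                     # maximally uncertain prior
b_next = bayes_filter_update(b, action=0, obs=0, T=T, O=O)
# The informative observation concentrates the belief, so the
# entropy-based reward increases.
```

An entropy-seeking policy would then be trained on `entropy_reward(belief)` rather than on any function of the (hidden) state itself.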
If you don't have access to the Bayes filter as a state estimator, things can get more difficult. If you train your state estimator on the same objective, then, as @user497898 pointed out, the state estimator will simply produce erroneous estimates with high certainty to satisfy that objective. That is why you should not do that, and should instead make sure your state estimator follows the true Bayes filter.
But even if you approximate the Bayes filter itself, your estimate of the state uncertainty might be off. How badly depends on the approximation you chose, and on whether you put the emphasis on estimation accuracy or on accurate quantification of uncertainty.