Bayesian inference [1] tells us how to update a prior probability based on evidence. My question is: in the real world, we also update our prior probability of a hypothesis based on new information that is not necessarily evidence. But it seems that Bayesian inference cannot handle such cases.
Please see the following simple example from last year's soccer World Cup.
Suppose we have the following prior probability (which could be used for soccer betting):
$P(USA\; wins\; world\; cup\;)$
With new evidence "USA 2-2 Portugal", we now have the posterior probability:
$P(USA\; wins\; world\; cup\;| USA\; 2-2\; Portugal) = \dfrac{P(USA\; 2-2\; Portugal | USA\; wins\; world\; cup\;) P(USA\; wins\; world\; cup\;)}{P(USA\; 2-2\; Portugal)}$
This posterior probability can be calculated. In particular, $P(USA\; 2-2\; Portugal | USA\; wins\; world\; cup\;)$ can be calculated.
However, we sometimes also update our prior based on new information. For example, when I learn that "the goalkeeper is sick" (which is a piece of information, but not evidence), I will probably decrease the prior probability, because we clearly know that a team whose goalkeeper is sick will be much weaker. The posterior probability is:
$P(USA\; wins\; world\; cup\; | Goalkeeper\; is\; sick) = \dfrac{P(Goalkeeper\; is\; sick | USA\; wins\; world\; cup\;) P(USA\; wins\; world\; cup\;)}{P(Goalkeeper\; is\; sick)}$
But this posterior is hard to calculate in my opinion, because $P(Goalkeeper\; is\; sick | USA\; wins\; world\; cup\;)$ seems to make no sense. More specifically, I think these two events are independent, so the prior cannot be updated at all using Bayesian inference. But this contradicts our common practice.
So do you think Bayesian inference can handle this? And what other mathematical approaches are relevant to such problems? Thank you!
Update 1:
Thanks to jameselmore, here is one possible answer to my question. Basically, $P(Goalkeeper\; is\; sick | USA\; wins\; world\; cup\;)$ can be calculated, and it does have a meaning. The idea is that a strong team is unlikely to have a sick goalie. Since the goalie of the USA is sick, this is "negative" evidence indicating that the USA might not be a strong team.
We can assign some actual numbers. Suppose the prior is $P(USA\; wins\; world\; cup\;) = 0.5$, and the general likelihood of a sick goalie is $P(Goalkeeper\; is\; sick) = 0.1$. Now, if the USA is a strong team, the likelihood of a sick goalie should be much lower than average, so we have $P(Goalkeeper\; is\; sick | USA\; wins\; world\; cup\;) = 0.02$. If we suddenly hear that the USA's goalie is sick, we can calculate the posterior as follows:
$P(USA\; wins\; world\; cup\; | Goalkeeper\; is\; sick) = \dfrac{0.02 \times 0.5}{0.1} = 0.1$.
That is, we drastically decrease the probability (or belief) of USA winning the world cup.
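The update above is easy to check in a few lines of code. The numbers (0.5, 0.1, 0.02) are the assumed values from the example, not real estimates:

```python
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
def posterior(likelihood, prior, evidence):
    """Posterior probability of hypothesis H given evidence E."""
    return likelihood * prior / evidence

# Assumed values from the example above.
p_win = 0.5              # prior: P(USA wins world cup)
p_sick = 0.1             # P(Goalkeeper is sick)
p_sick_given_win = 0.02  # P(Goalkeeper is sick | USA wins world cup)

p_win_given_sick = posterior(p_sick_given_win, p_win, p_sick)
print(p_win_given_sick)  # approximately 0.1
```

Because the likelihood ratio $0.02 / 0.1 = 0.2$ is well below 1, the evidence cuts the prior belief by a factor of five.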
Firstly, I'd like to apologize because my comments definitely provided more criticism than constructive commentary. (Although hopefully a healthy blend of both).
Secondly, I think the general answer to your question is: yes, Bayesian inference can be used in this manner, where you take the known quantities $P($Goalie is sick$)$, $P($Goalie is sick | USA wins World Cup$)$, and $P($USA wins World Cup$)$ to make an updated statement of the likelihood of a win given the information that the USA goalie is in fact sick.
The issue here is not the mathematics behind the relationship of these events so much as your ability to accurately estimate the quantities you start from. Probabilities such as $P($USA wins World Cup$)$ are not easily quantifiable, and may require a great many assumptions to even arrive at a number (all teams are equal, all goalies have the same likelihood of sickness, etc.).
So, under the assumption that you truly know the probabilities of the underlying events, then yes, you may use Bayesian inference in this way.
Typically, Bayesian inference is used not for predictions in this manner, but for discrediting extreme observations in scenarios with small data sets. You assume that the random process you are observing comes from a distribution $P(X \mid \theta)$, and further assume that the parameter $\theta$ is itself a random variable, i.e. $\exists\, \theta_1, \theta_2$ s.t. $P(\theta_1), P(\theta_2) > 0$.
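As a minimal sketch of that more typical use (my own illustration, not part of the answer above): put a Beta prior on the success probability $\theta$ of a Bernoulli process, and an extreme small-sample estimate gets pulled back toward the prior. The Beta(2, 2) prior and the 3-trial sample are invented for illustration:

```python
# Shrinking an extreme small-sample estimate with a Beta prior.
# With a Beta(a, b) prior on theta, after s successes and f failures
# the posterior is Beta(a + s, b + f), whose mean is
# (a + s) / (a + b + s + f)  (conjugacy of Beta and Bernoulli).

def posterior_mean(a, b, successes, failures):
    return (a + successes) / (a + b + successes + failures)

# Three trials, all successes: the raw estimate of theta would be 1.0.
s, f = 3, 0
mle = s / (s + f)                    # 1.0, an extreme estimate
shrunk = posterior_mean(2, 2, s, f)  # Beta(2, 2) prior pulls it toward 0.5
print(mle, shrunk)                   # 1.0 vs. about 0.714
```

With more data the posterior mean converges to the raw frequency; with only three observations the prior dominates enough to discredit the extreme value 1.0.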
For example, consider trying to estimate the height of the next individual that you encounter. A naive approach would be to average the heights of all individuals you have encountered and use this as your estimate. If you had more information, say the gender of the individual, you could make much more accurate estimates, because the heights of humans vary considerably (in both mean and variance) between the two genders.
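A toy version of that height example (the sample heights are made up for illustration): conditioning on gender replaces the pooled average with a group-specific one:

```python
# Naive vs. conditional estimate of the next person's height (cm).
# The heights below are invented illustrative samples, not real data.
heights = {
    "female": [160.0, 165.0, 158.0, 170.0],
    "male":   [175.0, 182.0, 178.0, 185.0],
}

# Naive estimate: pool every observation and average, ignoring gender.
all_heights = [h for hs in heights.values() for h in hs]
naive = sum(all_heights) / len(all_heights)

def conditional_estimate(gender):
    """Average height among observed individuals of the given gender."""
    hs = heights[gender]
    return sum(hs) / len(hs)

print(naive)                         # pooled mean, between the groups
print(conditional_estimate("male"))  # sharper estimate given the gender
```

The pooled mean sits between the two group means, so it systematically overestimates one group and underestimates the other; the conditional estimate avoids that bias.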