Maximum Expected Long-Term Utility


I have the following question about dynamic games: the first player fully observes the state over the whole horizon $T$ and tries to send signals to the second player (who only observes the state of the previous stage) so that they can coordinate their actions and obtain the maximum long-term utility.

Consider the following game with two players, where $A_i$ is the action set of player $i$. The set of nature states is denoted by $A_0$, and the nature state is i.i.d. across stages. All sets are discrete and finite. The strategies of the long-term game are defined as follows:

$$A_{1,t} = \sigma_{1,t}(A_0^T, A_1^{t-1}, A_2^{t-1})$$ $$A_{2,t} = \sigma_{2,t}(A_0^{t-1}, A_1^{t-1}, A_2^{t-1})$$

where $A_j^t$ stands for the classical notation for a sequence of random variables $A_{j,1}, \dots, A_{j,t}$.

Now the question is:

By assuming $A_0 = A_1 = A_2 = \{a , b\}$ and the following team game utility: $$u(a_0, a_1, a_2) = \begin{cases} 1 & \text{if} \; a_0 = a_1 = a_2 \\ 0 & \text{otherwise} \end{cases}$$

Prove that the maximum expected long-term utility is about $0.81$.


Here is how I have attempted to answer this question.

First, there is this corollary: any implementable joint distribution $Q(a_0,a_1,a_2)$ must satisfy the following entropy condition: $$ H_Q(A_0,A_1,A_2) \geq H_Q(A_0) + H_Q(A_2) $$
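For concreteness, this entropy condition can be checked numerically for any candidate joint distribution $Q$ on $\{a,b\}^3$. A minimal sketch (the helper names and the example distribution are mine, chosen just for illustration):

```python
import math
from itertools import product

def entropy(p):
    """Shannon entropy (base 2) of a probability vector, ignoring zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def satisfies_condition(Q):
    """Check H(A0, A1, A2) >= H(A0) + H(A2) for a joint pmf Q
    indexed by triples (a0, a1, a2) over the alphabet {'a', 'b'}."""
    joint = list(Q.values())
    marg_a0 = [sum(Q[t] for t in Q if t[0] == s) for s in 'ab']
    marg_a2 = [sum(Q[t] for t in Q if t[2] == s) for s in 'ab']
    return entropy(joint) >= entropy(marg_a0) + entropy(marg_a2) - 1e-12

# Example: the fully independent uniform distribution on {a,b}^3,
# for which H(A0, A1, A2) = 3 >= 1 + 1 = H(A0) + H(A2).
uniform = {t: 1 / 8 for t in product('ab', repeat=3)}
print(satisfies_condition(uniform))
```

By contrast, a distribution putting all mass on the two perfectly matched profiles ($Q(aaa)=Q(bbb)=1/2$) fails the check, since its joint entropy is $1 < 2$ — which is exactly why full coordination ($q_1 = 1$) is not implementable here.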

In addition, the long-term utility is the expected stage utility under the induced distribution of action profiles:

$$\frac{1}{T} \sum_{t = 1}^{T} u(a(t)) = \mathbb{E}[u(a)] = q_1 u_1 + q_2 u_2 = q_1$$

where $u_1 = 1$ is the utility of the action profiles with $a_0 = a_1 = a_2$, $u_2 = 0$ is the utility of all other profiles, and $q_1$ is the total probability of the profiles that yield utility one.
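In code, the expected stage utility is just the probability mass that $Q$ places on the matching profiles. A small sketch (the value $q_1 = 0.8$ and the uniform spreading of the remaining mass over the six mismatching profiles are illustrative assumptions, not part of the problem statement):

```python
from itertools import product

def expected_utility(Q):
    """E[u] = total mass Q places on profiles with a0 == a1 == a2."""
    return sum(p for (a0, a1, a2), p in Q.items() if a0 == a1 == a2)

# Illustrative distribution: mass q1 split evenly over the two matching
# profiles, the rest spread uniformly over the six mismatching ones.
q1 = 0.8
Q = {t: (q1 / 2 if t[0] == t[1] == t[2] else (1 - q1) / 6)
     for t in product('ab', repeat=3)}
print(expected_utility(Q))  # 0.8
```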

In addition, what I know but cannot derive is the equation $\dfrac{h(q_1) - 1}{q_1 - 1} = \log_2 3$, whose solution gives the answer $q_1 \approx 0.81$; here $h(q_1)$ is the binary entropy $h(q_1) = -q_1\log_2 q_1 - (1-q_1)\log_2(1-q_1)$.
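As a sanity check on the claimed value, that equation can be solved numerically. A minimal bisection sketch (the bracket $[0.75, 0.9]$ is my own assumption, chosen so that the left-hand side minus the right-hand side changes sign):

```python
import math

def h(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def f(q):
    """(h(q) - 1)/(q - 1) - log2(3): the root of f is the sought q1."""
    return (h(q) - 1) / (q - 1) - math.log2(3)

# Bisection on [0.75, 0.9]; f(0.75) < 0 < f(0.9) and f is monotone there.
lo, hi = 0.75, 0.9
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid

print(round(lo, 2))  # 0.81
```

This confirms numerically that the solution of $\frac{h(q_1)-1}{q_1-1} = \log_2 3$ is about $0.81$, matching the value the exercise asks for.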