Characterization of the optimal strategy in a zero-sum two-player stochastic differential game

Let $(\Omega,\Sigma,(\mathcal{F}_t)_{0\leq t\leq 1},P)$ be a filtered probability space, where $\Omega$ is the space of continuous functions $f:[0, 1]\rightarrow \mathbb{R}^n$, $(\mathcal{F}_t)_{0\leq t\leq 1}$ is the filtration generated by a Brownian motion $(W_t)_{0\leq t\leq 1}$, and $P$ is the Wiener measure.

Consider a differential game with two players whose admissible controls are $\boldsymbol{\alpha^1}$ and $\boldsymbol{\alpha^2}$ respectively, and let $A = A_1 \times A_2$, where $A_i$ is the set of admissible strategies of player $i$.

Let $\sigma: [0, T]\times \Omega\times \mathbb{R}^{d} \rightarrow \mathbb{R}^{d \times d}$ (with $T = 1$) satisfy:

  • for each $x \in \mathbb{R}^{d}$, $\sigma(\cdot,\cdot,x)$ is $\mathcal{P}$-measurable, where $\mathcal{P}$ is the $\sigma$-algebra of progressively measurable subsets of $[0, 1]\times\Omega$;

  • $\exists c > 0$ such that $\forall t \in [0, T]$, $\forall \omega \in \Omega$, $\forall x, x' \in \mathbb{R}^{d}$: $|\sigma(t,\omega,x) - \sigma(t,\omega,x')| \leq c|x - x'|$;

  • $\forall (t,\omega,x) \in [0, T]\times \Omega\times \mathbb{R}^{d}$, $\sigma(t,\omega,x)$ is invertible and $|\sigma^{-1}(t,\omega,x)| \leq K(1 + |x|^{\delta})$ for some $K > 0$ and $\delta > 0$ independent of $t, \omega, x$;

  • $\forall (t,\omega,x) \in [0, T]\times \Omega\times \mathbb{R}^{d}$, $|\sigma(t,\omega,x)|\leq c(1 + |x|)$.

We assume that $b:[0, T]\times \Omega\times \mathbb{R}^{d}\times A\rightarrow \mathbb{R}^d$ satisfies:

  • $\forall x \in \mathbb{R}^d, \forall \alpha \in A$, $b(\cdot,\cdot,x,\alpha)$ is $\mathcal{P}$-measurable; moreover, there exists $c > 0$ such that $\forall (t,\omega, x, \alpha) \in [0, T]\times \Omega\times \mathbb{R}^{d}\times A$, $|b(t,\omega,x,\alpha)|\leq c(1 + |x|)$;

  • $b$ is continuous in $\boldsymbol{\alpha} = (\boldsymbol{\alpha ^1}, \boldsymbol{\alpha ^2})$.

Then, for each pair $(\boldsymbol{\alpha^1},\boldsymbol{\alpha^2})$ and fixed $x \in \mathbb{R}^n$, the measure $P_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}$ defined by $\dfrac{dP_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}}{dP} = \xi_{\sigma^{-1}b}(1)$, where $\xi_{\sigma^{-1}b}(1) = \exp\left(\int_{0}^{1}\sigma^{-1}(s,\omega)b(s,\omega)\, dW_s - \frac{1}{2}\int_{0}^{1}|\sigma^{-1}(s,\omega)b(s,\omega)|^2\, ds\right)$ is the stochastic exponential of $\sigma^{-1}b$ at time $1$, is a probability measure. Moreover, by Girsanov's theorem, the process $B^{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}_t = W_t - \int_{0}^{t}\sigma^{-1}(s,\omega)b(s,\omega)\,ds$, $0\leq t\leq 1$, is a Brownian motion under $P_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}$.
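As a rough numerical sanity check of this change of measure, here is a minimal Monte Carlo sketch. It rests on simplifying assumptions that are not part of the problem: the integrand $\sigma^{-1}b$ is replaced by a scalar, deterministic function `theta(s)`, and the function name `girsanov_check` and the example choice `theta(s) = 0.3 + 0.4 s` are purely illustrative. With density $\exp\left(\int_0^1\theta\, dW - \frac{1}{2}\int_0^1\theta^2\, ds\right)$, the standard form of Girsanov's theorem says that $W_t - \int_0^t \theta\, ds$ is a Brownian motion under the new measure, so the three estimates returned below should be close to $1$, $0$ and $1$.

```python
import math
import random


def girsanov_check(theta, n_paths=20000, n_steps=20, seed=0):
    """Estimate E_P[xi], E_Q[B_1] and E_Q[B_1^2] by reweighting paths with xi.

    theta : deterministic scalar function of time (a stand-in for sigma^{-1} b;
            this is an assumption made for illustration only).
    Here xi = exp(int theta dW - 0.5 * int theta^2 ds) and
    B_1 = W_1 - int_0^1 theta ds, with all integrals taken on a uniform grid.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    sum_xi = 0.0
    sum_xi_b = 0.0
    sum_xi_b2 = 0.0
    for _ in range(n_paths):
        stoch_int = 0.0  # discretized int_0^1 theta dW
        quad = 0.0       # discretized int_0^1 theta^2 ds
        drift = 0.0      # discretized int_0^1 theta ds
        w = 0.0          # W_1
        for k in range(n_steps):
            th = theta(k * dt)             # left-endpoint evaluation
            dw = rng.gauss(0.0, sqrt_dt)   # Brownian increment under P
            stoch_int += th * dw
            quad += th * th * dt
            drift += th * dt
            w += dw
        xi = math.exp(stoch_int - 0.5 * quad)  # density dQ/dP on this path
        b1 = w - drift                          # candidate Q-Brownian motion at t = 1
        sum_xi += xi
        sum_xi_b += xi * b1
        sum_xi_b2 += xi * b1 * b1
    return sum_xi / n_paths, sum_xi_b / n_paths, sum_xi_b2 / n_paths


if __name__ == "__main__":
    mean_xi, mean_b, second_moment_b = girsanov_check(lambda s: 0.3 + 0.4 * s)
    print(mean_xi, mean_b, second_moment_b)
```

The check only looks at the time-$1$ marginal, so it illustrates the statement rather than proving it; the estimates carry Monte Carlo error of order $n_{\text{paths}}^{-1/2}$.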

Let $g: \Omega\times\mathbb{R}^d\rightarrow \mathbb{R}$ be bounded with $g(X_1)$ $\mathcal{F}_1$-measurable, let $h$ satisfy the same assumptions as $b$, and define

$\Psi^{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}_t = \mathbb{E}_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}\left[g(X_1) + \int_{t}^{1}h(s,\omega,X_s,\boldsymbol{\alpha})\,ds \,\Big|\, \mathcal{F}_t\right]$,

where $\mathbb{E}_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}$ denotes expectation with respect to $P_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}$. Player 1 wants to maximize $\mathbb{E}_{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}[\Psi^{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}_0]$, whereas player 2 wants to minimize it.

Fix $\boldsymbol{\alpha^2}$ and $x \in \mathbb{R}^n$, and define $Z^{\boldsymbol{\alpha^2}}_t = \operatorname*{ess\,sup}_{\boldsymbol{\alpha^1}\in A_1}\Psi^{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}_t$ (the essential supremum over player 1's admissible strategies).

Question: I want to show that $\boldsymbol{\alpha^{1}_*}$ is an optimal reply of player 1 to $\boldsymbol{\alpha^2}$ $\Leftrightarrow$ $Z^{\boldsymbol{\alpha^2}}_t + \int^{t}_{0}h(s,\omega,x,(\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^{2}}))\,ds$ is a martingale under $P_{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}$. I have shown $\Rightarrow$ but could not show $\Leftarrow$; I would appreciate any hint.

A second question: why is $Z^{\boldsymbol{\alpha^2}}_t + \int^{t}_{0}h(s,\omega,x,(\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}))\,ds$ a supermartingale in general, i.e. for an arbitrary admissible $\boldsymbol{\alpha^1}$?

(The problem is based on the one presented in Kandethody M. Ramachandran & Chris P. Tsokos, "Stochastic Differential Games: Theory and Applications", University of South Florida, Atlantis Press, Tampa, 2012, pp. 47-49.)

Best answer

I was able to answer the first question. Since $g(X_1)$ is $\mathcal{F}_1$-measurable, $Z^{\boldsymbol{\alpha^2}}_1 = g(X_1)$ a.e. Therefore, by the martingale property under $P_{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}$:

$$\begin{aligned}
Z^{\boldsymbol{\alpha^2}}_0 &= Z^{\boldsymbol{\alpha^2}}_0 + \int_{0}^{0}h(s,\omega,x,(\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}))\,ds \\
&= \mathbb{E}_{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}\left[Z^{\boldsymbol{\alpha^2}}_1 + \int_{0}^{1}h(s,\omega,x,(\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}))\,ds \,\Big|\, \mathcal{F}_0\right] \\
&= \mathbb{E}_{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}\left[g(X_1) + \int_{0}^{1}h(s,\omega,x,(\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}))\,ds \,\Big|\, \mathcal{F}_0\right] \\
&= \Psi^{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}_0 \quad \text{a.e.}
\end{aligned}$$

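That is, $\Psi^{\boldsymbol{\alpha^{1}_*},\boldsymbol{\alpha^2}}_0 = Z^{\boldsymbol{\alpha^2}}_0$, and by definition $Z^{\boldsymbol{\alpha^2}}_0$ is the essential supremum of $\Psi^{\boldsymbol{\alpha^1},\boldsymbol{\alpha^2}}_0$ over $\boldsymbol{\alpha^1} \in A_1$. So $\boldsymbol{\alpha^{1}_*}$ attains the best payoff player 1 can achieve from $t = 0$ onwards, and is therefore an optimal reply of player 1 to $\boldsymbol{\alpha^2}$.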
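To make the martingale criterion concrete, here is a small discrete-time sketch. It is only an analogue of the continuous-time setting, not a proof: it assumes two periods, a coin-flip noise whose "up" probability depends on player 1's control (mimicking the change of measure), and player 2's strategy held fixed. All numbers and names (`P_UP`, `H`, `g`, `value`, `drift`) are hypothetical. The value $Z$ is computed by backward induction, and `drift` returns the one-step conditional drift of $Z_t + \sum h$ under a given control. It is zero for the maximizing control (martingale) and non-positive for every other control (supermartingale).

```python
# Hypothetical toy data: two periods, player 2's control is fixed and hidden
# inside these numbers; player 1 chooses a in {0, 1} at each node.
P_UP = {0: 0.5, 1: 0.7}   # control a changes the probability of an "up" move
H = {0: 0.0, 1: -0.1}     # running reward h collected per period under control a
N = 2                     # number of periods


def g(path):
    """Terminal payoff: number of up-moves along the path."""
    return float(sum(path))


def continuation(path, a):
    """Running reward plus expected next-step value when control a is used at `path`."""
    up = value(path + (1,))[0]
    down = value(path + (0,))[0]
    return H[a] + P_UP[a] * up + (1.0 - P_UP[a]) * down


def value(path):
    """Return (Z at node `path`, maximizing control) by backward induction."""
    if len(path) == N:
        return g(path), None
    best_a = max((0, 1), key=lambda a: continuation(path, a))
    return continuation(path, best_a), best_a


def drift(path, a):
    """One-step conditional drift of Z_t + (accumulated h) under control a."""
    return continuation(path, a) - value(path)[0]


if __name__ == "__main__":
    for path in [(), (0,), (1,)]:
        z, a_star = value(path)
        print(path, z, a_star, [drift(path, a) for a in (0, 1)])
```

In this toy model the converse direction is also visible: a control whose drift vanishes at every node reproduces $Z_0$ as its expected total payoff, which is the same computation as in the chain of equalities above.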