Question about the definition of Markov kernel


Let $(X,\mathcal A)$ and $(Y,\mathcal B)$ be measurable spaces. A "Markov kernel" with source $(X,\mathcal A)$ and target $(Y,\mathcal B)$ is a map $\kappa : \mathcal B \times X \to [0,1]$ with the following properties:
(1) For every (fixed) $B \in \mathcal B$, the map $x \mapsto \kappa(B, x)$ is $\mathcal A$-measurable.
(2) For every (fixed) $x \in X$, the map $B \mapsto \kappa(B, x)$ is a probability measure on $(Y, \mathcal B)$.
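To make the two conditions concrete, here is a minimal sketch of a hypothetical finite example: when $X = Y = \{0, 1\}$ with the discrete $\sigma$-algebras, a Markov kernel is nothing but a row-stochastic matrix $K$, where $K[x, y]$ plays the role of $\kappa(\{y\}, x)$.

```python
import numpy as np

# Hypothetical finite example: X = Y = {0, 1}, so a kernel is just a
# row-stochastic matrix K, with K[x, y] playing the role of kappa({y}, x).
K = np.array([
    [0.7, 0.3],   # kappa(., 0): a probability measure on {0, 1}
    [0.2, 0.8],   # kappa(., 1): another probability measure on {0, 1}
])

# Condition (2): each row is a probability measure
# (entries non-negative, each row sums to 1).
assert np.all(K >= 0) and np.allclose(K.sum(axis=1), 1.0)

# Condition (1) is automatic here: on a finite X with the discrete
# sigma-algebra, every map x -> kappa(B, x) is measurable.
def kappa(B, x):
    """kappa(B, x) for a subset B of {0, 1}: sum the entries of row x."""
    return sum(K[x, y] for y in B)

print(kappa({1}, 0))       # 0.3
print(kappa({0, 1}, 1))    # 1.0, since kappa(., 1) is a probability measure
```

On a general (uncountable) $X$, condition (1) is no longer automatic, which is why the definition has to demand it.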

Can someone please explain what this definition is saying? I am not getting the point of it. If someone could explain it with an example, that would be a great help. Thanks.




Markov kernels are just a way of expressing conditional distributions. The idea is that for each $x \in X$, you want to say that the conditional law of some random variable $\mathbf Y$ given an observation of $\mathbf X = x$ is $\kappa(\cdot,x)$ --- this is sometimes denoted $\kappa(\cdot|x)$ instead. This is precisely the point of condition (2): $\kappa(B,x)$ is meant to represent $P(\mathbf Y \in B \mid \mathbf X = x)$.
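This reading of $\kappa(B,x)$ as $P(\mathbf Y \in B \mid \mathbf X = x)$ can be checked by simulation. The sketch below uses a hypothetical two-state kernel (the matrix $K$ is made up for illustration): draw $\mathbf X$, then draw $\mathbf Y$ from the row of $K$ selected by $\mathbf X$, and compare the empirical conditional frequency to the kernel entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state setup: X in {0, 1} is observed, and Y | X = x
# is distributed according to row x of K, i.e. kappa(. | x).
K = np.array([[0.7, 0.3],
              [0.2, 0.8]])

x = rng.integers(0, 2, size=100_000)          # draw X uniformly on {0, 1}
u = rng.random(x.size)
y = (u < K[x, 1]).astype(int)                 # Y = 1 with probability K[x, 1]

# Empirical P(Y = 1 | X = 0) should approximate kappa({1}, 0) = 0.3.
est = (y[x == 0] == 1).mean()
print(round(est, 2))
```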

In order to work with objects like $\kappa(B,x)$ as $x$ varies, the technical structure of probability theory requires that these conditional probabilities are not wild, in the sense that at least measure-theoretic issues don't arise. We would need this, for instance, to ensure that sets like $E = \{x: \kappa(B,x) \ge \tau\}$ are actually events. Note that such events are something we would often like to be able to deal with --- for example, if you observe $\mathbf Y \in B$, and you want to decide which $x$ are plausible for $\mathbf X$ given this observation, one reasonable answer is precisely $E$ for some value of $\tau$. The condition (1) says (more or less) that no matter what $B \in \mathcal{B}, \tau$ we choose, the sets $E$ belong to $\mathcal{A},$ which is precisely what is needed for things like $P(\mathbf X \in E)$ to make sense.

To sum up, Markov kernels are a formal way to set up conditional distributions. (2) is precisely the part of the definition that captures this aspect, while (1) is needed for technical reasons to ensure that the conditional distributions we treat are nice to work with.


@stochasticboy321's answer is a very good general answer, and I just want to add one other perspective from Markov chains. In this case, let's just simplify by assuming $X=Y$.

A Markov transition kernel $\kappa$ induces two operators:

  1. the map $\mu \mapsto \mu \,\kappa$ on the space of probability measures defined by $$\mu \, \kappa(A) = \int_{X} \kappa(A,x) \ \mu(dx),$$
  2. the map $f \mapsto \kappa f$ on the space of bounded measurable functions defined by $$\kappa f(x) := \int_{X} f(y) \ \kappa(dy,x).$$

Both of these operators are fundamental to the analysis of Markov chains. Briefly:

  1. Given a distribution $\mu$, we can view $\mu$ as the "starting" distribution of the process, and $\mu \, \kappa$ is the subsequent distribution of running the process one step.
  2. Given a bounded measurable function $f$, $\kappa f(x)$ computes the expected value of $f$ conditioned on a starting state $x$ and running the process one step.
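On a finite state space both operators reduce to matrix multiplication, which makes them easy to see in action. In this sketch (the chain on $\{0,1,2\}$ is made up for illustration), $\mu \, \kappa$ is a row vector times the kernel matrix, and $\kappa f$ is the kernel matrix times a column vector:

```python
import numpy as np

# Hypothetical finite chain on X = {0, 1, 2}; the kernel is a
# row-stochastic matrix K with K[x, y] = kappa({y}, x).
K = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.5, 0.5]])

# 1. mu -> mu K: push a starting distribution forward one step.
mu = np.array([1.0, 0.0, 0.0])   # start deterministically at state 0
mu_next = mu @ K                 # distribution after one step: [0.5, 0.5, 0.0]

# 2. f -> K f: expected value of f after one step from each starting state,
#    i.e. (K f)(x) = E[ f(X_1) | X_0 = x ].
f = np.array([0.0, 1.0, 2.0])
Kf = K @ f                       # [0.5, 1.0, 1.5]

# The two operators are dual: integrating f against mu K equals
# integrating K f against mu.
assert np.isclose(mu_next @ f, mu @ Kf)

print(mu_next)
print(Kf)
```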

The assumptions we put on a Markov transition kernel are the minimum assumptions that make these operators well-defined.