I am studying stochastic differential equations from the book Stochastic Differential Equations: An Introduction with Applications by Bernt Oksendal, and even though I more or less understand the abstract definition of $E^{x}[f(X_t)]$, I get confused when it comes to calculations.
I have seen equalities like this several times:
$$E^{x}[f(X_t)]=\int_{\mathbb{R}^n}f(y)p_t(x,y)dy$$ where $p_t(x,y)$ is called the transition measure (strictly speaking, its density with respect to Lebesgue measure). Oksendal uses this throughout the exercises in Chapter 8, but he doesn't explain what a transition measure is until the next chapter.
The abstract definition of $E^{x}[f(X_t)]$ is that it equals $E[f(X_t^{x})]$, where the expectation is taken with respect to the measure $P^{0}$, the probability law of Brownian motion starting at $0$. It is also called the expectation with respect to the probability measure $Q^{x}$, i.e. the probability law of $X_t$ starting at $x$.
A thorough explanation of how all these concepts come together and how they're related to each other would be highly appreciated. Thanks.
Brownian motion is a (time-homogeneous) Markov process. The questions you're asking are basically questions about Markov processes. I'll let $X$ denote a general Markov process on $\mathbb{R}^{d}$.
One way to think of $X$ is as a collection of random vectors indexed by time $X = (X_{t})_{t \geq 0}$. For each $t$, $X_{t}$ is a random element of $\mathbb{R}^{d}$. A Markov process is completely specified by its transition kernel: in our case, I'll assume our Markov process is "continuous," meaning that for each $t > 0$ there is a transition function $p_{t} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$ such that $$\mathbb{P}\{X_{s + t} \in A \, \mid \, X_{s} = x\} = \int_{A} p_{t}(x,y) \, dy.$$ An intuitive way of reading this is: $p_{t}(x,y)$ is the probability (density) of hitting $y$ after a time $t$, given that you started at $x$. One way of understanding what it means to be Markov is through the Chapman-Kolmogorov equations, which can be written $$p_{t_{1} + t_{2}}(x,y) = \int_{\mathbb{R}^{d}} p_{t_{1}}(x,z) p_{t_{2}}(z,y) \, dz.$$ In other words, the probability of going from $x$ to $y$ in time $t_{1} + t_{2}$ is the probability of going from $z$ to $y$ in time $t_{2}$, averaged against the probability of going from $x$ to $z$ in time $t_{1}$.
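The Chapman-Kolmogorov identity is easy to check numerically for one-dimensional Brownian motion, whose transition function is the heat kernel $p_t(x,y) = (2\pi t)^{-1/2} e^{-(y-x)^2/(2t)}$. A minimal sketch (the particular times, points, and integration grid below are arbitrary choices of mine):

```python
import numpy as np

def p(t, x, y):
    """Heat kernel: transition density of 1-D Brownian motion."""
    return np.exp(-(y - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t1, t2 = 0.5, 1.5
x, y = 0.3, -0.7

# Left-hand side: go from x to y directly in time t1 + t2.
lhs = p(t1 + t2, x, y)

# Right-hand side: integrate over the intermediate point z
# (Riemann sum on a grid wide enough that the Gaussian tails are negligible).
z = np.linspace(-20.0, 20.0, 200_001)
dz = z[1] - z[0]
rhs = np.sum(p(t1, x, z) * p(t2, z, y)) * dz

print(lhs, rhs)  # the two sides agree
assert abs(lhs - rhs) < 1e-6
```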
Notice that nothing I did said anything about where $X$ started, i.e. the distribution of $X_{0}$. However, for each $x \in \mathbb{R}^{d}$, there is a unique probability measure $\mathbb{Q}^{x}$ such that $\mathbb{Q}^{x}\{X_{0} = x\} = 1$ and, under $\mathbb{Q}^{x}$, $(X_{t})_{t \geq 0}$ is a Markov process with transition function $p_{t}$. For example, this yields $$\mathbb{Q}^{x}\{X_{t_{1}} \in A_{1},\dots,X_{t_{n}} \in A_{n}\} = \int_{A_{1}} \dots \int_{A_{n}} p_{t_{1}}(x,z_{1}) p_{t_{2} - t_{1}}(z_{1},z_{2}) \dots p_{t_{n} - t_{n - 1}}(z_{n - 1},z_{n}) \, dz_{n} \dots dz_{1},$$ whenever $A_{1},\dots,A_{n}$ are Lebesgue measurable subsets of $\mathbb{R}^{d}$ and $0 < t_{1} < t_{2} < \dots < t_{n} < \infty$.
Notice that it follows from the previous equation that we can write $$\mathbb{E}^{x}(f(X_{t})) = \int_{\mathbb{R}^{d}} f(z) p_{t}(x,z) \, dz,$$ if $\mathbb{E}^{x}$ is our notation for the expectation with respect to $\mathbb{Q}^{x}$. Intuitively, $p_{t}(x,z) \, dz$ is the probability that $X_{t}$ lands in the infinitesimal ball $B(z,dz)$ given that it started at $x$ and, thus, $\int_{\mathbb{R}^{d}} f(z) p_{t}(x,z) \, dz$ is the average value of $f(X_{t})$.
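Concretely, for one-dimensional Brownian motion you can compare this integral against a Monte Carlo estimate of $\mathbb{E}^{x}(f(X_{t}))$. The choice $f(z) = z^2$ is convenient because the exact answer is known: under $\mathbb{Q}^{x}$, $X_t \sim N(x, t)$, so $\mathbb{E}^{x}(X_t^2) = x^2 + t$. A sketch (the test function, starting point, and time are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

x, t = 1.0, 2.0
f = lambda z: z ** 2

# Monte Carlo under Q^x: draw X_t ~ N(x, t) directly.
samples = x + np.sqrt(t) * rng.standard_normal(1_000_000)
mc = f(samples).mean()

# Integral of f against the transition density p_t(x, .).
z = np.linspace(x - 20.0, x + 20.0, 200_001)
dz = z[1] - z[0]
pt = np.exp(-(z - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)
integral = np.sum(f(z) * pt) * dz

print(mc, integral)  # both are close to the exact value x**2 + t = 3
assert abs(integral - 3.0) < 1e-4
assert abs(mc - integral) < 0.05
```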