Let $(B_t,\mathcal F_t)_{t\geq 0}$ be a $d$-dimensional Brownian motion. Assume that the coefficients $b:[0,\infty)\times \mathbb R^n\to\mathbb R^n$ and $\sigma:[0,\infty)\times \mathbb R^n\to\mathbb R^{n\times d}$ of the SDE
$$dX_t=b(t,X_t)dt+\sigma(t,X_t)dB_t$$
are Lipschitz continuous with Lipschitz constant $L$. Denote by $(X_t^x)_{t\geq 0}$ the solution of the SDE with initial condition $x\in\mathbb R^n$. Then for all $p\geq 2$, $s,t\in [0,T]$ and $x,y\in\mathbb R^n$ we have
$$...$$
(the conclusion of the theorem is not important in this case).
In one of the steps of the proof the author applies Burkholder's inequality in the following way:
$$\mathbb E\left[\left|\int_s^t \sigma(r,X_r^y)\,dB_r\right|^p\right]\leq C_p\,\mathbb E\left[\left(\int_s^t |\sigma(r,X_r^y)|^2\, dr\right)^{p/2}\right]$$
What I don't understand is why we can apply this inequality. As far as I know, it applies when $\sigma(\cdot,X_\cdot^y)\in L^2(\lambda\otimes\mathbb P)$, so that the Itô integral above is a martingale. But under our assumptions nothing seems to restrict us to that case. Certainly $r\mapsto\sigma(r,X_r^y)$ must be square integrable with respect to $\lambda$ (almost surely), otherwise the stochastic integral cannot be defined, but this does not imply that it is square integrable with respect to $\lambda\otimes\mathbb P$.
My idea would be to use stopping times (a localizing sequence) to ensure that the integrand is in $L^2(\lambda\otimes\mathbb P)$.
The author does similar things in other theorems about SDEs. For instance, in the proof of uniqueness of solutions of an SDE, he starts by assuming that $X_t$ and $Y_t$ are two solutions of a stochastic integral equation and puts $Z_t=X_t-Y_t$.
Then $$Z_t=\int_a^t \big(\sigma(s,X_s)-\sigma(s,Y_s) \big)dB(s)+\int_a^t \big(f(s,X_s)-f(s,Y_s)\big)ds$$
He then uses the inequality $(a+b)^2\leq 2(a^2+b^2)$ (a consequence of the Cauchy-Schwarz inequality) and takes expectations on both sides.
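For reference, the elementary inequality used here is the Cauchy-Schwarz inequality in $\mathbb R^2$ applied to the vectors $(1,1)$ and $(a,b)$:

$$(a+b)^2=(1\cdot a+1\cdot b)^2\leq (1^2+1^2)(a^2+b^2)=2(a^2+b^2).$$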
Taking the first term on the right we have
$$\mathbb E\bigg[\bigg(\int_a^t \big(\sigma(s,X_s)-\sigma(s,Y_s) \big)dB(s)\bigg)^2\bigg].$$
He then applies the Itô isometry, but we don't know whether $\sigma(\cdot,X_\cdot)-\sigma(\cdot,Y_\cdot)\in L^2(\lambda\otimes\mathbb P)$.
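As a numerical sanity check (not a proof), the sketch below simulates a one-dimensional SDE with the Euler-Maruyama scheme and compares Monte Carlo estimates of both sides of the Itô isometry. The coefficients `b(t, x) = -x` and `sigma(t, x) = 1 + 0.5*sin(x)` are hypothetical choices, not taken from the book; they are Lipschitz, and this `sigma` is also bounded, so the integrand is clearly in $L^2(\lambda\otimes\mathbb P)$ and the two sides should agree up to Monte Carlo error.

```python
import math
import random

# Hypothetical 1-d coefficients (n = d = 1): both Lipschitz, sigma also bounded in [0.5, 1.5].
def b(t, x):
    return -x

def sigma(t, x):
    return 1.0 + 0.5 * math.sin(x)

def ito_isometry_check(x0=1.0, T=1.0, n_steps=100, n_paths=10_000, seed=0):
    """Euler-Maruyama Monte Carlo estimate of both sides of the Ito isometry.

    Returns (lhs, rhs) with
      lhs ~ E[(sum_k sigma(t_k, X_k) dB_k)^2]  (discretised stochastic integral)
      rhs ~ E[sum_k sigma(t_k, X_k)^2 dt]       (discretised time integral)
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    lhs = rhs = 0.0
    for _ in range(n_paths):
        x, stoch_int, time_int = x0, 0.0, 0.0
        for k in range(n_steps):
            t = k * dt
            s = sigma(t, x)              # integrand evaluated at the left endpoint (adapted)
            dB = rng.gauss(0.0, sqrt_dt) # Brownian increment, independent of the past
            stoch_int += s * dB
            time_int += s * s * dt
            x += b(t, x) * dt + s * dB   # Euler-Maruyama step
        lhs += stoch_int ** 2
        rhs += time_int
    return lhs / n_paths, rhs / n_paths

lhs, rhs = ito_isometry_check()
print(f"E[(int sigma dB)^2] ~ {lhs:.3f},  E[int sigma^2 dt] ~ {rhs:.3f}")
```

In discrete time the cross terms $\mathbb E[\sigma_j\Delta B_j\,\sigma_k\Delta B_k]$, $j<k$, vanish because $\Delta B_k$ is independent of everything before $t_k$; this is exactly where square integrability of the integrand is used.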
Do you have any idea why the author does this? Is this just for the sake of simplicity?
Once you know that the (unique) solution to the SDE
$$dX_t = b(t,X_t) \, dt + \sigma(t,X_t) \, dB_t,$$
is square integrable, in the sense that
$$\sup_{t \leq T} \mathbb{E}(|X_t^x|^2) < \infty, \tag{$\star$}$$
everything is fine. Indeed, since $\sigma$ is Lipschitz continuous, it is in particular of at most linear growth: $|\sigma(t,x)|\leq|\sigma(t,0)|+L|x|$, and $t\mapsto\sigma(t,0)$ is bounded on $[0,T]$, so
$$|\sigma(t,x)| \leq C(1+|x|), \qquad t \in [0,T],\ x \in \mathbb{R}^n,$$
for some $C=C(T)>0$, and so
$$\mathbb{E}\left(\int_0^T |\sigma(s,X_s^x)|^2 \, ds \right) < \infty,$$
i.e. $\sigma(\cdot,X_\cdot^x) \in L^2(\lambda_T \otimes \mathbb{P})$, where $\lambda_T$ denotes Lebesgue measure on $[0,T]$. An analogous estimate holds for $b$, and consequently we may apply Burkholder's inequality.
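Spelled out, assuming the second-moment bound $\sup_{t\leq T}\mathbb E(|X_t^x|^2)<\infty$ and the linear growth bound $|\sigma(t,x)|\leq C(1+|x|)$, the estimate reads as follows. Tonelli's theorem lets us exchange $\mathbb E$ and $\int_0^T$ because the integrand is nonnegative, and $(1+|x|)^2\leq 2(1+|x|^2)$; the square is why a second moment, rather than a first, is what enters:

$$\mathbb{E}\left(\int_0^T |\sigma(s,X_s^x)|^2 \, ds \right)\leq C^2\int_0^T \mathbb E\big((1+|X_s^x|)^2\big)\,ds\leq 2C^2\int_0^T \big(1+\mathbb E(|X_s^x|^2)\big)\,ds\leq 2C^2T\Big(1+\sup_{s\leq T}\mathbb E(|X_s^x|^2)\Big)<\infty.$$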
If you do not (yet) know that $(\star)$ holds, you can use stopping, i.e. set $$\tau_r^x := \inf\{t \geq 0:\ |X_t^x-x| \geq r\}.$$ Since $\sigma$ is continuous and $|X_{t \wedge \tau_r^x}^x| \leq |x|+r$, the integrand $\sigma(t,X_{t \wedge \tau_r^x}^x)$ is uniformly bounded for $t\in[0,T]$. Do all the estimates for the stopped processes and then let $r \to \infty$ at the very end (e.g. using Fatou's lemma or monotone convergence).
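For Burkholder's inequality itself, the localization can go roughly as follows (a sketch; write $\tau_r:=\tau_r^x$ and $M_t:=\int_0^t\sigma(s,X_s^x)\,dB_s$, and the same argument works for $\int_s^t$). Since $M_{t\wedge\tau_r}=\int_0^t\mathbb 1_{\{s\leq\tau_r\}}\sigma(s,X_s^x)\,dB_s$ has an integrand that is bounded on $[0,T]$, Burkholder's inequality applies to it:

$$\mathbb E\big(|M_{t\wedge\tau_r}|^p\big)\leq C_p\,\mathbb E\left(\left(\int_0^{t\wedge\tau_r}|\sigma(s,X_s^x)|^2\,ds\right)^{p/2}\right)\leq C_p\,\mathbb E\left(\left(\int_0^{t}|\sigma(s,X_s^x)|^2\,ds\right)^{p/2}\right).$$

Because $X^x$ has continuous paths, $\tau_r\uparrow\infty$ almost surely as $r\to\infty$, so $M_{t\wedge\tau_r}\to M_t$ almost surely, and Fatou's lemma gives

$$\mathbb E\big(|M_t|^p\big)\leq\liminf_{r\to\infty}\mathbb E\big(|M_{t\wedge\tau_r}|^p\big)\leq C_p\,\mathbb E\left(\left(\int_0^{t}|\sigma(s,X_s^x)|^2\,ds\right)^{p/2}\right),$$

without any a priori integrability assumption (if the right-hand side is $+\infty$ there is nothing to prove).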