Let$B_t$ be the standard Brownian motion and $M_t$ be its running maximum $\sup_{u \le t} B_u$.
Then I am trying to prove that the joint density of $(B_t, M_t)$ is given by $$P(B_t \in dx, M_t \in dy) = \frac{2(2y-x)}{\sqrt{2\pi t^3}}e^{-(2y-x)^2/(2t)}dxdy.$$
I can get this result by using the fact that $P[M_t \ge y, B_t \le x] = P[B_t \ge 2y-x] = \int_{2y-x}^\infty (2\pi t)^{-1/2} e^{-z^2 /(2t)} dz$.
Now if I differentiate this integral in $x$ and $y$, then I get the result.
But I am not sure why this makes sense. I know this to be true in the single variable case by FTC, so if we just have $P[B_t \le x]$ given by an integral then differentiating with respect to $x$ should give the density of $B_t$.
But how does this work when we have a joint distribution with one set for a random variable given by ${M_t \ge y}$ and the other given by $B_t \le x$ and we have a single integral that has a function of $y$ and $x$ on the lower limit of the integral?
I am curious to find out the mathematical justification for this process. I would greatly appreciate some help.
It sounds like you accept the fact that $$ P[M_t \ge y, B_t \le x] = P[B_t \ge 2y-x] = \int_{2y-x}^\infty (2\pi t)^{-1/2} e^{-z^2 /(2t)} dz\tag{1} $$ holds (which is shown in Karatzas & Shreve's book). The CDF of the joint distribution of $M_t$ and $B_t$ is $$ F(y,x)=P\Big[M_t\le y,B_t\le x\Big]=P[B_t\le x]-P[M_t \ge y, B_t \le x]\,. $$ Since $P[B_t\le x]$ does not depend on $y$ we get as PDF $$ f(x,y)=\frac{\partial^2}{\partial x\partial y}F(y,x)=-\frac{\partial^2}{\partial x\partial y}P[M_t \ge y, B_t \le x]\,. $$