I'd like to sample random paths to approximate the solution to the heat equation on the unit sphere shell. Specifically, I'd like to solve $u_t = \alpha\Delta u$ with some initial condition $u_0$. In spherical coordinates, unless I have made an error, I believe the pde is $$ u_t = \alpha\left(u_{\theta\theta} + \cot(\theta) u_\theta + \csc^2(\theta) u_{\phi\phi}\right). $$
Assuming I have written the pde correctly, then we may write $$ u(\theta,\phi,t) = \mathbb{E}\left[u_0\left(X_t,Y_t\right)\,\middle|\,X_0 =\theta, Y_0 = \phi\right], $$ where $$ \mathrm{d}X_t = \cot(X_t)\mathrm{d}t + \sqrt{\alpha}\mathrm{d}W_t,\qquad\mathrm{d}Y_t = \sqrt{\alpha}\left|\csc(X_t)\right|\mathrm{d}W'_t. $$ I am assuming that $\theta\in[0,\pi]$ measures the angle from the north pole and $\phi\in[-\pi,\pi]$. $W$ and $W'$ are independent white noise processes.
I have two main (related) issues at this point. If I were to implement this numerically, how would I handle sampling paths when $X_t$ and $\theta$ are near zero or $\pi$?
Beyond this, how does one interpret the SDE? Intuitively we should have that for every point $p$ on the sphere, the probability of transitioning to a different area should be relatively the same. So if I were at the north or south pole, the probability transition to nearby areas should be the same as those anywhere else on the sphere. So I should be able to handle the poles by rotation calculations.
At the same time, the SDE for $X_t$ has a drift term that I cannot get an intuition for. Why would heat diffusion be biased to migrate around the sphere in a particular direction? Why wouldn't $Y_t$ show such a drift term too?
Any help you could lend would be greatly appreciated.
Edited to fix a variable typo.