I am having a question to a paper I am reading. In his introduction the author writes the following:
I am not sure why this should be the case. The central limit theorem does not give some form of convergence rate and therefore I am not seeing that the MC method is immune to the dimension because of that.
Could somebody elaborate why this is the case? Would still be very interested in it.
For interested readers - it is the following publication: Deep Primal-Dual Algorithm for BSDEs: Applications of Machine Learning to CVA and IM

Say you have a (linear or not) PDE equation in $\mathbb{R}^d$ in the form $$ \frac{\partial}{\partial t} \nu(t, x) = \mathcal{L} \nu (t, x) $$ and you want to solve it numerically, or at least compute some macroscopic quantities which depends on the solution, such as $$ a_t = \int_{\mathcal{D} \subset \mathbb{R}^d}{ f(x) \nu(t, x) }.$$
Standard methods such as Finite Volume method or Finite element method would require first to discretize the region of $\mathbb{R}^d$ on which you want to compute the solution. This does not scale well with the dimension $d$.
Now assume you can find a probabilistic interpretation of the PDE in the sense that you find a stochastic process $(X_t)$ such that the law of $X_t$ is a solution of your PDE (sometimes called the Fokker-Planck PDE associated to the process): $$\forall t \geq 0,\quad \mathcal{L}(X_t)(dx) = \nu(t, x) dx.$$
Then, it holds that $$ a_t = \mathbb{E} f(X_t), $$ and then you can compute $a_t$ using Monte-Carlos method, by first simulating i.i.d. trajectories $(X^i_t)$ and using the approximation $$ a_t \approx \frac{1}{N} \sum_{k= 1}^N {f(X^i_t)}.$$
The advantage of this method is that it is usually independent of the dimension of the problem (as long as the probabilistic interpretation of the PDE behaves nicely in large $d$, - that is $\text{Var} f(X_t)$ behaves reasonably well with $d$) and consequently scale well with $d$. However for small $d$ the method is usually not competitive with standard numerical PDE techniques. Also you have to find a probabilistic interpretation of the PDE, which is not always possible.