I am given a multidimensional Markovian stochastic process $X_1,X_2,\dots,X_n$ with continuous state space, and a function $V$ that is unknown to me. I want to approximate the conditional expectation $E(V(X_k)\mid X_{k-1} = x)$, which is a function of $x$.
Suppose that I can simulate the entire sequence $X_1,X_2,\dots,X_n$ using Monte Carlo, and that for each simulated $X_{k-1}^i$ a black box gives me the value of $V(X_k^i)$, where $i$ denotes the simulation index.
I understand that my question is too general, so I hope you can give me some references on this topic.
This type of problem can be approached by "curve fitting" or "least squares Monte Carlo". Both are essentially two names for the same thing: least squares regression. The approach is straightforward and requires only a basic understanding of regression, so no SDEs, functional analysis or measure-theoretic conditional expectation are needed.
The setting
To see how and why regression can be applied, set $x_i = X_{k-1}^i$ and $y_i=V(X_k^i)$. With this notation your simulation gives you iid pairs $(x_i,y_i)$. You know that $E[V(X_k)\mid X_{k-1}=x_i]$ is a function $f(x_i)$, but you cannot observe this function; you only observe a (crude) estimate $y_i$ of it. Call $e_i = y_i - f(x_i)$ the error of the estimate. Notice that
$$E[e_i \mid x_i] = E[V(X_k^i)\mid X_{k-1}^i = x_i] - f(x_i) = 0,$$
i.e. the errors have mean zero.
Now assume that the errors $e_i$ are normally distributed, and you are in the standard regression setting.
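To make the setting concrete, here is a minimal NumPy sketch. The Markov chain (a two-dimensional Gaussian AR(1) process $X_k = A X_{k-1} + \sigma Z_k$) and the black box $V(x) = \lVert x\rVert^2$ are hypothetical stand-ins, chosen only because for them $f(x) = \lVert A x\rVert^2 + d\sigma^2$ is known in closed form, so one can check that the errors $e_i$ average to roughly zero.

```python
import numpy as np

# Hypothetical toy model (an assumption, not from the question):
# X_k = A X_{k-1} + sigma * Z_k with Z_k standard normal in d = 2 dimensions,
# and a black-box V(x) = |x|^2. Then f(x) = |A x|^2 + d * sigma^2 exactly.
rng = np.random.default_rng(0)
d, n_paths, n_steps, k = 2, 20_000, 5, 3
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
sigma = 0.5


def V(x):
    """Hypothetical black box: squared Euclidean norm of each row of x."""
    return np.sum(x**2, axis=1)


def f_true(x):
    """Exact E[V(X_k) | X_{k-1} = x] for this toy model (rows of x are states)."""
    return np.sum((x @ A.T) ** 2, axis=1) + d * sigma**2


# Simulate n_paths independent paths X_0, ..., X_{n_steps}; X[t] has shape (n_paths, d).
X = np.empty((n_steps + 1, n_paths, d))
X[0] = rng.standard_normal((n_paths, d))
for t in range(1, n_steps + 1):
    X[t] = X[t - 1] @ A.T + sigma * rng.standard_normal((n_paths, d))

# One regression pair per path: x_i = X_{k-1}^i, y_i = V(X_k^i).
x = X[k - 1]
y = V(X[k])
e = y - f_true(x)  # the errors e_i
print(f"mean of e_i: {e.mean():.4f}")  # should be close to 0
```

In a real application `f_true` is of course unavailable; only `x` and `y` are, and they are what the regression works with.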
Will it work? It depends ...
How successful this is depends on how well you "guess" the structure of $f$. A standard (initial) guess is to assume $f(x)=\sum_j \beta_j p_j(x)$, where the $p_j(x)$ are polynomials. You choose the $\beta_j$ to minimise the squared error against your observations, i.e. $\hat\beta = \operatorname{argmin}_\beta \sum_i \bigl(y_i - \sum_j \beta_j p_j(x_i)\bigr)^2$.
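A minimal sketch of this fit in NumPy, assuming the basis $p_j$ consists of all monomials in the components of $x$ up to a chosen total degree (one common choice, not the only one). With $P_{ij} = p_j(x_i)$, the minimiser is $\hat\beta = (P^\top P)^{-1}P^\top y$ when $P^\top P$ is invertible; `np.linalg.lstsq` solves the same problem without forming $P^\top P$ explicitly. The data are synthetic, with a hypothetical quadratic $f$, so the fit should recover $f$ up to noise.

```python
import itertools

import numpy as np


def monomial_exponents(d, degree):
    """All exponent tuples (a_1, ..., a_d) with a_1 + ... + a_d <= degree."""
    return [a for a in itertools.product(range(degree + 1), repeat=d)
            if sum(a) <= degree]


def design_matrix(x, exponents):
    """P[i, j] = p_j(x_i) = prod_m x_im ** a_m for the j-th exponent tuple a."""
    return np.column_stack([np.prod(x ** np.array(a), axis=1) for a in exponents])


def fit_polynomial(x, y, degree):
    """Least squares coefficients beta_hat for the monomial basis."""
    exps = monomial_exponents(x.shape[1], degree)
    beta, *_ = np.linalg.lstsq(design_matrix(x, exps), y, rcond=None)
    return exps, beta


def predict(x, exps, beta):
    """Evaluate sum_j beta_j p_j(x) at each row of x."""
    return design_matrix(x, exps) @ beta


# Synthetic data with a hypothetical quadratic f, standing in for simulated pairs.
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-1, 1, size=(n, 2))


def f_true(x):
    return 1 + x[:, 0] - 2 * x[:, 0] * x[:, 1] + 0.5 * x[:, 1] ** 2


y = f_true(x) + 0.5 * rng.standard_normal(n)

exps, beta = fit_polynomial(x, y, degree=2)
x_test = rng.uniform(-1, 1, size=(200, 2))
max_err = np.max(np.abs(predict(x_test, exps, beta) - f_true(x_test)))
print(f"{len(exps)} basis functions, max error on test points: {max_err:.3f}")
```

In practice you would pass your simulated pairs $(x_i, y_i)$ instead of the synthetic ones. Note that there are $\binom{d+m}{m}$ monomials of total degree at most $m$ in $d$ variables, so the basis grows quickly with the dimension.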
Depending on the structure of $f$, other low-dimensional families of functions might work better. For example, if you know that $f$ is piecewise linear, you would approximate it with piecewise linear functions; if $f$ is periodic, use periodic ones, and so on.
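To illustrate why the choice of family matters, the sketch below fits a hypothetical piecewise linear $f(x) = |x|$ in one dimension, once with a quadratic polynomial and once with a piecewise linear "hinge" basis $1,\ x,\ \max(x - c, 0)$, assuming the kink location $c = 0$ is known. Both bases have three functions, so neither has more parameters than the other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.uniform(-2, 2, size=n)


def f_true(x):
    """Hypothetical piecewise linear conditional expectation with a kink at 0."""
    return np.abs(x)


y = f_true(x) + 0.5 * rng.standard_normal(n)


def hinge_basis(x, knots):
    """Columns 1, x and max(x - c, 0) for each knot c (continuous piecewise linear)."""
    return np.column_stack([np.ones_like(x), x] + [np.maximum(x - c, 0.0) for c in knots])


def poly_basis(x, degree):
    """Columns 1, x, ..., x**degree."""
    return np.column_stack([x**j for j in range(degree + 1)])


def lsq(P, y):
    """Least squares coefficients for design matrix P."""
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    return beta


knots = [0.0]  # assumes the kink location is known
beta_hinge = lsq(hinge_basis(x, knots), y)
beta_poly = lsq(poly_basis(x, 2), y)

grid = np.linspace(-2, 2, 401)
err_hinge = np.max(np.abs(hinge_basis(grid, knots) @ beta_hinge - f_true(grid)))
err_poly = np.max(np.abs(poly_basis(grid, 2) @ beta_poly - f_true(grid)))
print(f"max error, hinge basis (3 functions):      {err_hinge:.3f}")
print(f"max error, quadratic polynomial (3 terms): {err_poly:.3f}")
```

The hinge basis contains $|x| = -x + 2\max(x, 0)$ exactly, so its error is only estimation noise; the quadratic cannot bend at the kink, and its error near $0$ stays roughly $3/8$ (the error of the best $L^2$ quadratic approximation of $|x|$ on $[-2,2]$) no matter how many samples you draw.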
Another important issue is how finely you can sample $X_{k-1}$. If $f$ is very irregular (i.e. wiggles a lot) and you have only a few samples, you are in trouble: the variance of $e_i$ will be large, and neighbouring samples $x_i$ will lie far apart and each provide very little information. If $f$ is more or less linear, you need only a few samples and linear, or perhaps quadratic, polynomials.
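A small experiment illustrating the sample-size point, under assumed toy choices: noise level $0.3$, inputs uniform on $[-1,1]$, a hypothetical linear $f$ fitted with a degree-1 polynomial, and a hypothetical wiggly $f(x) = \sin(4x)$ fitted with a degree-7 polynomial. Errors are averaged over repetitions so that one lucky or unlucky draw does not dominate.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def avg_fit_error(f, degree, n, sigma=0.3, reps=50, seed=0):
    """Average (over reps) grid RMSE of a degree-`degree` polynomial least squares
    fit to n noisy samples y = f(x) + sigma * noise with x uniform on [-1, 1]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1, 1, 201)
    errors = []
    for _ in range(reps):
        x = rng.uniform(-1, 1, size=n)
        y = f(x) + sigma * rng.standard_normal(n)
        coef = P.polyfit(x, y, degree)  # least squares, coefficients lowest degree first
        errors.append(np.sqrt(np.mean((P.polyval(grid, coef) - f(grid)) ** 2)))
    return float(np.mean(errors))


def linear(x):
    return 1.0 + 0.5 * x  # a "more or less linear" f (hypothetical)


def wiggly(x):
    return np.sin(4 * x)  # an irregular f (hypothetical)


err_lin_small = avg_fit_error(linear, degree=1, n=15)
err_wig_small = avg_fit_error(wiggly, degree=7, n=15)
err_wig_large = avg_fit_error(wiggly, degree=7, n=2_000)
print(f"linear f, degree 1, n = 15:   {err_lin_small:.3f}")
print(f"wiggly f, degree 7, n = 15:   {err_wig_small:.3f}")
print(f"wiggly f, degree 7, n = 2000: {err_wig_large:.3f}")
```

With 15 samples the linear case is already reasonable, while the wiggly case, which needs eight coefficients, only becomes accurate once many more samples are available.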
There is a rich literature and many tools available for doing regression and analysing the results. For the pricing of American options specifically, have a look at this review
Finally, you should try to incorporate as much prior information as possible into your approximation. I have already mentioned the structure of $f$, and I explained the problem for a single time index $k$. But if the $V$ functions at different $k$ have a known relation, there are ways to incorporate this as well. To discuss this sensibly, though, you would need to provide a more detailed specification of your problem.