Suppose I have a function of $y$ given $x$, $f(y\mid x)$, and $N$ samples of $x$: $\{x_i\}_{i=1}^N$. Given $x_i$, I would like to define a distribution over the space of $y$ based on this function $f$, like:
$$ p(y\mid x_i) = \frac{\exp\{f(y\mid x_i)\}}{\sum_{y\in\mathcal{Y}} \exp\{f(y\mid x_i)\} } $$
Unfortunately, the computation of $\sum_{y\in\mathcal{Y}}$ is infeasible, for example because the space $\mathcal{Y}$ is too large. Also, assume that it is costly to generate a sample $y_i^{(j)}$ from $f$ given $x_i$, so I can only use a few samples of $y$, and a Monte Carlo approximation is difficult to apply. As the extreme case, assume I have only one sample of $y$ for each $x_i$: $y_{i}^{(j)}$.
Question: Can I simply use
\begin{align} \frac{\exp\{f(y_{i}^{(j)}\mid x_i)\}}{\sum_{i=1}^N \exp\{f(y_{i}^{(j)}\mid x_i)\}} \end{align}
instead of
\begin{align} \frac{\exp\{f(y_{i}^{(j)}\mid x_i)\}}{\sum_{j^\prime=1}^M \exp\{f(y_{i}^{(j^\prime)}\mid x_i)\}} \end{align}
as a rough approximation of $p(y\mid x_i)$ at $y_i^{(j)}$ given $x_i$? If not, what are the basic methods for this kind of situation, where only a small number of samples can be used?
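For concreteness, here is a small numerical sketch of the two candidate normalizations, using a toy score function and hypothetical names (`f`, `y_samples`, the grid `Y`), on a space small enough that the exact softmax is still computable for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 5, 3                   # toy sizes: N samples of x, M samples of y per x
Y = np.linspace(-2, 2, 100)   # small finite surrogate for the space of y

def f(y, x):
    # toy score function f(y | x); stands in for the expensive model
    return -(y - x) ** 2

x = rng.normal(size=N)
# pretend we can only afford M samples of y per x_i
y_samples = x[:, None] + rng.normal(size=(N, M))

i, j = 0, 0
# exact softmax over all of Y (infeasible in the real setting)
exact = np.exp(f(y_samples[i, j], x[i])) / np.exp(f(Y, x[i])).sum()

# first expression: normalize over the other x's samples ("in-batch")
approx_over_i = np.exp(f(y_samples[i, j], x[i])) / np.exp(f(y_samples[:, j], x[i])).sum()

# second expression: normalize over this x_i's own M samples
approx_over_j = np.exp(f(y_samples[i, j], x[i])) / np.exp(f(y_samples[i, :], x[i])).sum()

print(exact, approx_over_i, approx_over_j)
```

Both sampled expressions are valid probabilities over their respective finite sample sets (each set of ratios sums to $1$ by construction), but neither is guaranteed to be close to the exact value; that gap is exactly what the question is about.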
Thank you very much for reading this question!
Let me first say: your notation is overloaded; e.g. in your first equation, $y$ is both a free argument and the summation index.
In principle, it is possible to sum over a possibly uncountable space: just integrate your function against the counting measure. The problem remains in your case, though, because for the integral to exist, only countably many of the summands may be nonzero. Still, $f$ could be $-\infty$ (in an appropriate compactification of $\mathbb{R}$) everywhere except at countably many points, and in that case your first approach would actually work.
Note that in order for this to be a probability distribution, you would want the following to hold: $$ \forall y \in Y: p(y) = \sum_{i=1}^N p(x_i) p(y\mid x_i) \Rightarrow 1 = \sum_{y \in Y} \sum_{i=1}^N p(x_i) p(y\mid x_i) = \sum_{i=1}^N \sum_{y \in Y} p(x_i) p(y\mid x_i). $$ So you would need that for every $i$, $p(y\mid x_i)$ is nonzero at most at countably many places, yet nonzero somewhere for at least some $i$; this fails whenever the sum is infinite for all $i$. So for some $i$, $f$ must have the property indicated above. (At all other $i$, the sum could be infinite, and then $p$ would be zero there.)
Your approximation is again indexed incorrectly: it should read $$ p(y_m, x_k) = \frac{\exp\{f(y_m^{(j)}\mid x_k)\}}{\sum_{i=1}^N \exp\{f(y_i^{(j)}\mid x_k)\}}. $$ From what I understand, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, how good the sample $y^{(j)}$ is depends on the variance of $y$ with respect to $x$ (since I don't know which space $y$ lies in, take the variance to be some abstract measure of how much $y$ usually deviates from its "standard" value).
Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, the expression does not necessarily sum to $1$ when you sum over all $y$.
Your second expression is completely off: it should be the average of the expression above over $j$.
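As a minimal sketch of this correction (toy score function, hypothetical names; indices $k$, $m$ chosen arbitrarily), averaging the corrected expression over the $M$ available samples $y^{(j)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5, 3  # toy sizes

def f(y, x):
    # toy stand-in for the expensive score function f(y | x)
    return -(y - x) ** 2

x = rng.normal(size=N)
y = x[:, None] + rng.normal(size=(N, M))  # y[i, j]: j-th sample for x_i

k, m = 2, 1  # evaluate p(y_m, x_k)

def p_single(j):
    # corrected expression for one fixed j:
    # exp(f(y_m^(j) | x_k)) / sum_i exp(f(y_i^(j) | x_k))
    return np.exp(f(y[m, j], x[k])) / np.exp(f(y[:, j], x[k])).sum()

# average over the M samples, as suggested above
p_avg = np.mean([p_single(j) for j in range(M)])
print(p_avg)
```

Each `p_single(j)` lies in $(0, 1)$ because its numerator is one term of its denominator, so the average does too; whether it approximates the true $p(y\mid x_k)$ well still depends on the concerns about the remaining summands raised earlier.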