I am trying to learn about manifold learning techniques, a family of dimensionality reduction methods in machine learning. According to this idea, there is a low-dimensional ($d$) hidden space where the real data-generating mechanism lies, with $d$ degrees of variability, but we observe the data in a high-dimensional ($m$) space where $m > d$. There is a function $f:\mathbb{R}^d \to \mathbb{R}^m$, called the embedding function, which takes the data from the low-dimensional hidden space and maps it to the high-dimensional observable one as $x_i = f(\tau_i) + \epsilon_i$, where $\epsilon_i$ is a noise term. The aim here is to learn the function $f$. All of these ideas are summed up in the following slide:
What I don't quite get in this kind of method is the "embedding function" $f$. It is said that this function maps a $d$-dimensional space to a $d$-dimensional manifold in a higher, $m$-dimensional space. Is it a mathematical fact that such a function $f:\mathbb{R}^d \to \mathbb{R}^m$ must always generate a $d$-dimensional manifold in its range space? I think it is not the case, since such a function can be much more general, mapping the input space to the range space in a random, irrelevant way.
So is it just an assumption of the approach that this function $f$ is well behaved, in the sense that it maps the low-dimensional space more or less onto a manifold of the same dimension inside the high-dimensional observation space? Or is it a mathematical fact? How should I interpret this?
An arbitrary function $f: \Bbb R^d \to \Bbb R^m$ could indeed have a very nasty image. The key word is *embedding*: in the geometric context this has a precise meaning. An embedding is a smooth, injective immersion that is a homeomorphism onto its image, and this guarantees the image is a manifold of the same dimension as the domain. So it is not a fact about arbitrary functions; it is part of what "embedding" means.
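To make this concrete, here is a small illustrative sketch (the specific map and parameter ranges are my own choice, not from the slide): the classic "swiss roll" map $f:\mathbb{R}^2 \to \mathbb{R}^3$ is an embedding on a suitable domain, so its image is a genuine 2-dimensional surface inside $\mathbb{R}^3$, and noisy observations $x_i = f(\tau_i) + \epsilon_i$ scatter around that surface.

```python
import numpy as np

def f(tau):
    """Swiss-roll embedding: latent tau = (t, h) in R^2 -> point in R^3.

    On the domain below, f is injective with injective differential,
    so its image is a 2-dimensional manifold in R^3.
    """
    t, h = tau[..., 0], tau[..., 1]
    return np.stack([t * np.cos(t), h, t * np.sin(t)], axis=-1)

rng = np.random.default_rng(0)
# Sample latent points tau_i, then observe x_i = f(tau_i) + eps_i as in the question.
tau = rng.uniform([1.5 * np.pi, 0.0], [4.5 * np.pi, 10.0], size=(500, 2))
eps = rng.normal(scale=0.05, size=(500, 3))
x = f(tau) + eps
print(x.shape)  # (500, 3): observed in R^3, but only 2 degrees of freedom
```

A map that is *not* an embedding (say, one that sends every latent point to a dense, space-filling tangle) would give no such structure, which is exactly the worry raised in the question.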
In your case you have a kind of inverse problem: you're attempting to recover such an $f$ (or its latent coordinates) from a dataset. The slide makes no mention of how $f$ is produced, but whatever method is used, it should be tailored to produce an embedding.
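As a hedged sketch of what "solving the inverse problem" can look like in practice (using scikit-learn's Isomap as one representative method, not the one the slide necessarily refers to), a manifold learner takes the $m$-dimensional observations and, under the assumption that they lie near the image of an embedding, recovers $d$-dimensional coordinates:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Observations X live in R^3 but lie near a 2-dimensional embedded surface.
X, t = make_swiss_roll(n_samples=800, noise=0.05, random_state=0)

# Ask the method for d = 2 latent coordinates; this only makes sense
# because the data-generating map is assumed to be an embedding.
tau_hat = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X.shape, tau_hat.shape)  # (800, 3) (800, 2)
```

If the data were generated by a map with a genuinely pathological image, no choice of `n_components` would recover meaningful coordinates; the embedding assumption is what makes the recovery well posed.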