I'm told that if $\theta_{ML}$ is the MLE of parameter $\theta\in \mathbb{R}^n$ and $g:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is injective then $g(\theta_{ML})$ is the MLE of $g(\theta)$. This doesn't exactly make sense to me though. If we required that $g$ were strictly increasing or decreasing then I could understand, but if $g$ is discontinuous at $\theta_{ML}$ and increasing from the left, decreasing from the right, would it not be possible for $g(\theta_{ML})$ to fail to be an MLE of $g(\theta)$?
I'm thinking in terms of a maximizer of a function on the real line since I'm a little shaky on the idea of MLEs. But if you had a function like $1-x^2$ for instance, it of course has a max at $x=0$. If we take the injective function
$$g(x) = \begin{cases} x & \text{ if } \ \ x < 0\\ 1-x & \text{ if } \ \ 0\leq x \leq 1 \\ x+1 & \text{ if } \ \ x > 1 \end{cases}$$
then $0$ is not a maximizer of $g(1-x^2)$: evaluated at $x=0$ it equals $g(1)=0$, but evaluated at $x=1$ it equals $g(0)=1$.
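Just as a sanity check of the arithmetic above, here is a throwaway Python sketch of the composition (`g` and `f` are exactly the functions defined in this question):

```python
def g(x):
    # the injective, discontinuous function defined above
    if x < 0:
        return x
    elif x <= 1:
        return 1 - x
    else:
        return x + 1

def f(x):
    # 1 - x^2, maximized at x = 0
    return 1 - x**2

# g(f(0)) = g(1) = 0, while g(f(1)) = g(0) = 1,
# so x = 0 does not maximize the composition g∘f.
print(g(f(0)), g(f(1)))  # prints: 0 1
```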
I feel I may be misunderstanding what it means to have an MLE of a parameter without having specified the distribution with which it's associated. Is the distribution supposed to be assumed background to the problem?
It seems like you are using your intuition for what it takes to maximize the function $g.$ But that has nothing to do with maximizing the likelihood function $L(\theta;x)$. They are completely different functions.
In answer to your last paragraph: yes, we are supposed to assume that there is a distribution that depends on the parameter $\theta,$ but none is mentioned because it isn't important which one.
As you say, the maximum likelihood estimator for $\theta$ is the value at which the likelihood function $L(\theta;x)$ is maximized. What this question is really concerned with is reparametrization, i.e. choosing a different representation for the parameter.
Here's a simple example. You've probably seen two different parametrizations of the exponential distribution: $$ f(x) = \lambda e^{-\lambda x} \qquad \text{and} \qquad f(x) =\frac{1}{\theta}e^{-x/\theta}.$$ The first uses a rate parameter and the second a mean parameter. We can express $$\lambda = g(\theta) = \frac{1}{\theta},$$ where the reparametrization function is $g(t) = 1/t.$ Maximizing the likelihood function (assuming a single sample) $L(\theta;x) = \frac{1}{\theta}e^{-x/\theta}$ gives $\hat\theta = x.$ If we do the same thing for $L(\lambda;x)=\lambda e^{-\lambda x},$ we get $\hat\lambda = 1/x.$
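You can check this numerically. Here's a sketch (pure standard-library Python, using a crude grid search rather than calculus, with a single made-up observation $x = 2$):

```python
import math

x = 2.0  # single observation (assumed value for illustration)

def ll_theta(theta):
    # log-likelihood in the mean parametrization: log(1/theta) - x/theta
    return -math.log(theta) - x / theta

def ll_lam(lam):
    # log-likelihood in the rate parametrization: log(lambda) - lambda*x
    return math.log(lam) - lam * x

# maximize each log-likelihood over a grid of candidate parameter values
grid = [k / 100 for k in range(1, 1001)]  # 0.01, 0.02, ..., 10.00
theta_hat = max(grid, key=ll_theta)
lam_hat = max(grid, key=ll_lam)

print(theta_hat, lam_hat)  # prints: 2.0 0.5, i.e. theta_hat = x, lam_hat = 1/x
```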
If you think about this long enough, it should become obvious that the fact that $\hat\lambda = \frac{1}{\hat\theta}$ was no accident. If you switch from the $\lambda$ parametrization to the $\theta$ parametrization and take $\theta = 1/\lambda,$ the numerical value of the likelihood doesn't change: $L(\lambda;x) = L(1/\lambda;x)$ for every $\lambda,$ so a maximizer in one parametrization corresponds to a maximizer in the other.
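One way to see this concretely (a sketch, using the same single-observation exponential likelihoods as above): evaluating the $\theta$-likelihood at $\theta = 1/\lambda$ gives exactly the same number as evaluating the $\lambda$-likelihood at $\lambda$. The two parametrizations trace out the same curve with a relabeled axis, so the maxima must correspond.

```python
import math

x = 2.0  # single observation (assumed value for illustration)

def L_theta(theta):
    # likelihood, mean parametrization
    return (1 / theta) * math.exp(-x / theta)

def L_lam(lam):
    # likelihood, rate parametrization
    return lam * math.exp(-lam * x)

# For every lambda, L_lam(lambda) equals L_theta(1/lambda):
for lam in [0.25, 0.5, 1.0, 2.0, 4.0]:
    assert abs(L_lam(lam) - L_theta(1 / lam)) < 1e-12
```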
(I'm not satisfied with my explanation there, but I can't see how to improve it. This is one of those things that is frustratingly hard to articulate clearly even though it is blindingly obvious once you "get it".)
I'll leave it up to you to figure out why the result requires the reparametrization function $g$ to be injective. It shouldn't be hard to think of a non-injective $g$ that fails glaringly... just think of the least injective function imaginable.