I'm confused about the definition given in my class:
If $X=(X_1,\ldots,X_k)$ is a vector of $k$ discrete random variables, and $g$ is a function, then $$E(g(X))= \sum_{x_1\in \operatorname{Im}(X_1)} \cdots \sum_{x_k\in \operatorname{Im} X_k} g(x_1,\ldots,x_k)f_X(x_1,\ldots,x_k).$$
In the notes it is called a "formula", but I feel like this is a theorem rather than a definition, because it has to follow from the definition of $E(X)$, which was not given.
So what exactly is the definition of $E(X)$? Would it be $$E(X):= \sum_{x_1\in \operatorname{Im}(X_1)}\cdots\sum_{x_k\in \operatorname{Im} X_k} f_X(x_1,\ldots,x_k)\in\mathbb{R} \text{ ?}$$
Also, what is the codomain of the function $g$? Is it $\mathbb{R}$ or $\mathbb{R}^k$?
When you write down $E(X)$, remember that you are taking the expectation of a vector. Fortunately there is essentially only one thing this could mean: the expectation is itself a vector in $\mathbb{R}^k$, with components $$E(X) = (E(X_1),E(X_2),\ldots, E(X_k)).$$ Written in the same form as the formula in your notes, this is $$E(X) = \sum_{x_1\in \operatorname{Im}(X_1)}\cdots\sum_{x_k\in \operatorname{Im}(X_k)} (x_1,\ldots,x_k)\,f_X(x_1,\ldots,x_k),$$ where $(x_1,\ldots,x_k)$ is a vector and the sum is taken componentwise.

Similarly, because the sums can be interpreted as vector addition, the dimension of the codomain of $g$ doesn't really matter: it could be $\mathbb{R}^n$ for any $n$. In particular, $E(X)$ is the special case where $g$ is the identity function $\mathbb{R}^k\to \mathbb{R}^k$.
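As a concrete check (a small numerical sketch of my own, with a made-up joint PMF, not something from the notes), computing $E(X)$ componentwise from the marginals and computing it via the vector-valued sum give the same answer:

```python
from itertools import product

# Hypothetical joint PMF of (X1, X2) on {0,1} x {0,1,2}; probabilities sum to 1.
f = {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
     (1, 0): 0.2, (1, 1): 0.1, (1, 2): 0.3}

im1 = [0, 1]        # Im(X1)
im2 = [0, 1, 2]     # Im(X2)

# Componentwise: E(X) = (E(X1), E(X2)), each from the corresponding marginal PMF.
E1 = sum(x1 * sum(f[(x1, x2)] for x2 in im2) for x1 in im1)
E2 = sum(x2 * sum(f[(x1, x2)] for x1 in im1) for x2 in im2)

# Vector form: sum (x1, x2) * f(x1, x2) over the joint support, added componentwise.
EX = [0.0, 0.0]
for (x1, x2) in product(im1, im2):
    EX[0] += x1 * f[(x1, x2)]
    EX[1] += x2 * f[(x1, x2)]

print((E1, E2), EX)  # the two computations agree, up to floating-point rounding
```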
To address your main question: I would feel okay about taking that formula as a definition of the expected value of a (vector-valued) function of a vector of discrete random variables. It is a vector version of what is sometimes called the law of the unconscious statistician. There are certainly more abstract and general definitions from which it could instead be derived.
Edit
I guess the part that's a 'theorem' is that this formula is equivalent to what you would get by first deriving the distribution of $g(X)$ with a change of variables. Working directly with the formula is generally more straightforward than working with change-of-variables arguments. After all, the multivariable PMF of the vector $g(X)$ is just the expectation of a suitable indicator function: $P(g(X)=y)=E\big(\mathbf{1}\{g(X)=y\}\big)$.
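To illustrate that equivalence numerically (again a sketch of my own, with a made-up PMF and a made-up $g$), compare summing $g(x)f_X(x)$ directly with first deriving the PMF of $Y=g(X)$ and then computing $E(Y)$:

```python
# Hypothetical joint PMF of (X1, X2); probabilities sum to 1.
f = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def g(x1, x2):
    # An arbitrary real-valued function chosen for illustration.
    return (x1 - x2) ** 2

# Law of the unconscious statistician: sum g(x) f_X(x) over the joint support.
lotus = sum(g(x1, x2) * p for (x1, x2), p in f.items())

# Change of variables: build the PMF of Y = g(X), i.e.
# P(Y = y) = E[1{g(X) = y}], then compute E(Y) = sum of y * P(Y = y).
pmf_Y = {}
for (x1, x2), p in f.items():
    y = g(x1, x2)
    pmf_Y[y] = pmf_Y.get(y, 0.0) + p
via_pmf = sum(y * p for y, p in pmf_Y.items())

print(lotus, via_pmf)  # both equal 0.5 for this PMF
```

The two routes always agree; the direct sum simply skips the intermediate step of tabulating the distribution of $g(X)$.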