If you go through a book on introductory set theory you'll usually see something like "We take $f(x) = y$ as an abbreviation for $(x, y) \in f$."
Fair enough. If you then proceed by replacing any instance of $f(a) = b$ with $(a, b) \in f$ or vice versa, where $a$ and $b$ don't themselves involve function application, then we can expect everything to work out.
But this is not how any practitioner will actually work. Once that equality sign is there, the user is going to start substituting $f(a)$ for $b$, because that's what you do with equality. Furthermore, common expressions like $f(u) = g(v)$ or $g(f(x)) = y$ can't be expanded robotically without additional rules, since we aren't given a meaning for $(u, g(v)) \in f$ or $(f(x), y) \in g$. It's not tough to figure out what those must mean, but it seems like something worth mentioning when you introduce the notation.
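To make the gap concrete, the robotic expansions one presumably intends (my guess at the intended reading, since the definition itself doesn't supply them) would be something like $$f(u) = g(v)\ \ \leadsto\ \ (\exists z)[(u, z) \in f \land (v, z) \in g]$$ $$g(f(x)) = y\ \ \leadsto\ \ (\exists z)[(x, z) \in f \land (z, y) \in g]$$ i.e., every nested application gets pulled out through a fresh existential. These are exactly the extra rules that the bare $f(x) = y$ abbreviation never states.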
At this level, I don't think anyone really wants to go through a whole metatheoretical exposition just to justify the practical use of the $f(x) = y$ definition, but it does seem a bit cavalier to just throw it out there with no caveats. Especially when something like "define $f(x)$ as the big-Union of $f$'s image of $x$" is right there, and you get substitution, composition, etc. for free.
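For reference, that alternative is the standard trick of defining application as a total operation, something like $$f(x)\ \dot=\ \bigcup\,\{\, y \mid (x, y) \in f \,\},$$ so that $f(x)$ denotes a set no matter what $f$ and $x$ are: when $f$ is a function and $x$ is in its domain, the braces collect the single value and the union returns it; otherwise you harmlessly get $\varnothing$ (or the union of all the "values", if $f$ isn't single-valued).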
Am I missing some nice metatheorem about first-order logic that justifies using this definition of equality like actual equality and using $f(x)$ like a set (like a variant of how Extension by Definition is used to give names to the empty set, binary union, etc.) or is this really a "cheat"?
EDIT: Corrected what I was calling "Skolemization" to be "Extension by Definition".
After thinking about the comments for a while, I'll take a stab at answering my own question.
To clarify, this is about the seemingly common practice in expositions of axiomatic set theory of defining $f(x)\ \dot=\ y$ as mere notation for $(x, y) \in f$ instead of actually defining function application (I put a dot over the equals sign to clarify that this is a notational convention, not equality in the base language of FOL+ZFC), and then later "forgetting" that it's just notation, using $f(x)$ by itself as denoting a set or using $\dot=$ to justify substitutions.
For example, we might later see $(\forall f \in A\rightarrow B)(\forall g \in B\rightarrow C)(\forall x \in A)(g(f(x)) \in C)$ in the same development, which doesn't include $\dot=$ at all and uses $f(x)$ as a set directly.
So how do we justify this? If we attempt to use the fact $f$ is a function to extend our theory's first-order language to include $\dot f$ as a function symbol definitionally (I put a dot over it to clarify it's a new symbol in the language, not the variable $f$), then we run into an immediate problem: $\dot f$ is a function constant, like $\varnothing$ or $\cup(x,y)$. We can't quantify over it. Alternatively, if we expand our syntax to allow function variables, then we are no longer in first-order logic. So we are missing something in the transition from the $\dot =$ definition to actual use.
Kurt pointed out what it is in the comments: the definition of function application.
Let's try to actually apply extension by definition to function application. We want a proof of the form $(\forall x...)(\exists! y)R(x,y)$. Let's start with $(\forall A)(\forall B)(\forall f \in (A \rightarrow B))(\forall x \in A)(\exists! y \in B) ((x, y) \in f)$. This doesn't quite match the form because of the bounded quantification. $$(\forall A)(\forall B)(\forall f)(\forall x)(\exists! y)[(f \in (A \rightarrow B) \land x \in A \implies y \in B \land (x, y) \in f) \land (f \not \in (A \rightarrow B) \lor x \not \in A \implies y = \varnothing)]$$ matches the form and is essentially equivalent. It preserves the uniqueness of $y$ after the unique existential is moved past the bounds by giving $y$ a default value when $f$ is not a function $A \rightarrow B$ or $x$ is not in its domain.
We end up with a new function constant in the extension $\text{apply}(A,B,f,x)$ with the defining axiom $(\forall A)(\forall B)(\forall f \in (A \rightarrow B))(\forall x \in A)[\text{apply}(A,B,f,x) \in B \land (x, \text{apply}(A,B,f,x)) \in f]$. I ignore the specification of the value of $\text{apply}$ when $f$ isn't a matching function or $x$ is outside of the domain, because we don't want to use it that way anyway.
If you only require that $f$ is a function generally and don't care about the domain and codomain you can define a similar $\text{apply}(f,x)$ without $A$ and $B$.
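Spelled out, that version would rest on a theorem of the shape (writing $\operatorname{Fun}(f)$ for "$f$ is a function" and $\operatorname{dom} f$ for its domain; the default clause is again an arbitrary choice on my part) $$(\forall f)(\forall x)(\exists! y)[(\operatorname{Fun}(f) \land x \in \operatorname{dom} f \implies (x, y) \in f) \land (\lnot\operatorname{Fun}(f) \lor x \not\in \operatorname{dom} f \implies y = \varnothing)]$$ which is provable in ZFC by exactly the same default-value maneuver as before.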
Then you can define the notation $f(x)$ to mean $\text{apply}(f,x)$ without any problems and proceed normally.
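As a sanity check against the composition statement earlier: with $\text{apply}$ in the language, $g(f(x))$ unfolds to the ordinary first-order term $$\text{apply}(g, \text{apply}(f, x)),$$ the $=$ in $f(x) = y$ is genuine equality between terms, and substitution is just the usual equality rule of FOL, with no extra conventions needed.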
Which comes back to my initial question. It's easy enough to define function application in a way (like this, or the big-Union of the image of $x$ under $f$, etc.) that works when used in practice, and it seems fundamental to a working definition of functions. Trying to skip over it notationally with $f(x) \dot = y$ seems like it's skipping something that's actually important.