A common method for determining an inverse function $f^{-1}$ of a function $f$ is to write $f$ as an equation and solve for $x$. For example, if $f:\mathbb{R}_0^+\rightarrow\mathbb{R}_0^+$, $x\mapsto x^2$, then $g:\mathbb{R}_0^+\rightarrow\mathbb{R}_0^+$, $x\mapsto \sqrt{x}$ is an inverse function of $f$ because
$y=x^2$ iff $x=\sqrt{y}$.
(for another example see this post: https://math.stackexchange.com/a/2890915/493672)
Why does this method always work? The references I consulted seem to only state that this is how to do it, but not why it works. It's simply not evident to me. One problem I have is that functions are not equations, but in this case the authors write some function $f$ as an equation, treat it as an equation and get correct results anyway. It would be nice to have some justification for why inverse functions can be determined like this.
Edit: Fixed the range.
Let me answer on two levels.
First, if $y = f(x)$ for an invertible function $f$, then $x = f^{-1}(y)$. When you write out the equation $$y = f(x)$$ where $f(x)$ is some expression in $x$, and then solve this equation for $x$, the result has the form $$g(y) = x$$ or $x = g(y)$, for some other expression $g(y)$. That expression defines a function $g$. The very meaning of "solve for $x$" is that the resulting equation $x = g(y)$ is equivalent to the original equation $y = f(x)$. So if we substitute $y = f(x)$ into $x = g(y)$, and $x = g(y)$ into $y = f(x)$, we get $$x = g(f(x))\\y=f(g(y))$$ for every $x$ in the domain of $f$ and every $y$ in its codomain, which is exactly the condition that $f$ and $g$ are inverse functions.
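As a small worked illustration (the function $f(x) = 2x + 3$ on $\mathbb{R}$ is my own choice of example, not one from the question), solving $y = 2x + 3$ for $x$ gives $x = \frac{y-3}{2}$, so $g(y) = \frac{y-3}{2}$. Substituting each equation into the other gives $$g(f(x)) = \frac{(2x+3)-3}{2} = x\\f(g(y)) = 2\cdot\frac{y-3}{2}+3 = y$$ so both conditions hold for this particular $f$ and $g$.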
The second level is this. When you "solve $y = f(x)$ for $x$", what do you do? You make a series of changes to the equation to get an equivalent equation. The key rule is that if you do something to one side of the equation, you must do the same thing to the other side of the equation. But those "things" that you do can be thought of as applying some function to both sides:
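Add something? $$g(x,y) = f(x,y) \longrightarrow g(x,y) + a = f(x,y) + a$$ is another way of writing $$g(x,y) = f(x,y) \longrightarrow h(g(x,y))= h(f(x,y))$$ where $h(t) = t + a$. Multiplying is the same, with $h(t) = at$. Squaring is $h(t) = t^2$, etc. For the new equation to be equivalent to the old one, $h$ must be injective on the values involved: multiplying requires $a \neq 0$, and squaring is safe only when both sides are known to be non-negative, as on $\mathbb{R}_0^+$.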
So you start with $y = f(x)$, and step-by-step apply a sequence of functions: $$y = f(x)\\h_1(y) = h_1(f(x)) = h_1\circ f(x)\\h_2\circ h_1(y) = h_2\circ h_1\circ f(x)\\\vdots\\h_n\circ \dots \circ h_2\circ h_1(y) = h_n\circ \dots \circ h_2\circ h_1\circ f(x)$$
The point of all this: if you choose the steps well, in the end $h_n\circ \dots \circ h_2\circ h_1\circ f(x) = x$, so you end up with $$h_n\circ \dots \circ h_2\circ h_1(y) = x$$ Letting $g = h_n\circ \dots \circ h_2\circ h_1$, this is $$x = g(y)$$ Since each step produced an equivalent equation, $x = g(y)$ is equivalent to $y = f(x)$, so, as before, $g$ is the inverse of $f$.
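The step-by-step picture can also be sketched in code. This is only an illustration under assumptions of my choosing: the example $f(x) = 2x + 3$, the names `steps` and `compose`, and the sample points are all hypothetical. Checking finitely many points is a sanity check, not a proof.

```python
from fractions import Fraction
from functools import reduce


def f(x):
    # Example function (an assumption for illustration): f(x) = 2x + 3
    return 2 * x + 3


# The operations applied to both sides when solving y = 2x + 3 for x.
# Each one is an injective function h_i, so equivalence is preserved.
steps = [
    lambda t: t - 3,  # h_1: subtract 3 from both sides
    lambda t: t / 2,  # h_2: divide both sides by 2
]


def compose(funcs):
    """Return h_n o ... o h_1 for funcs = [h_1, ..., h_n]."""
    return lambda t: reduce(lambda acc, h: h(acc), funcs, t)


# g = h_2 o h_1, i.e. g(y) = (y - 3) / 2
g = compose(steps)

# Exact rational sample points, so no floating-point rounding is involved.
samples = [Fraction(n, 3) for n in range(-6, 7)]

left_ok = all(g(f(x)) == x for x in samples)   # g(f(x)) == x
right_ok = all(f(g(y)) == y for y in samples)  # f(g(y)) == y
print(left_ok, right_ok)
```

Here the list `steps` plays the role of $h_1, \dots, h_n$, and `compose(steps)` is the function $g = h_n\circ \dots \circ h_1$ that falls out of the solving process.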