I've recently been doing some reading on the Legendre transformation. I've found these notes to be a nice reference.
My current understanding is that the Legendre transformation is a way to change variables in a convex function $f(x)$, so that the function depends on its first derivative $f'(x)$ rather than on its original argument. As I understand it, this interpretation is in line with how the Legendre transformation is motivated in classical mechanics and thermodynamics.
The notes I refer to start by considering $f(x)$, differentiating to define $p = f'(x)$, and then inverting to get $x = f'^{-1}(p)$. They then substitute this back into the original function to arrive at $f(f'^{-1}(p))$. They explain that this doesn't contain the same amount of information as $f(x)$ by giving the example $f(x) = (x-x_0)^2/2$. Here $p = f'(x) = x - x_0$, so the transformed function is $f(f'^{-1}(p)) = p^2/2$, which clearly doesn't contain the same amount of information as the original function because it doesn't depend on $x_0$.
I think this works nicely as a heuristic argument; it's neat and simple, and it makes sense to me. However, it's not a proof, and it's not obvious to me how to extend it to a general function $f(x)$. Does anybody have a more complete argument for why the Legendre-transformed function $f^*(p) = (xp - f(x))|_{x = f'^{-1}(p)}$ contains the same amount of information as the original function $f(x)$, whereas $f(f'^{-1}(p))$ alone does not?
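To see the contrast in the quadratic example, here is the Legendre transform worked out explicitly (my own computation as a sketch, not taken from the notes). With $f(x) = (x-x_0)^2/2$ we have $p = f'(x) = x - x_0$, so $x = p + x_0$, and

$$f^*(p) = \big(xp - f(x)\big)\Big|_{x = p + x_0} = (p + x_0)\,p - \frac{p^2}{2} = \frac{p^2}{2} + x_0\,p .$$

Unlike $f(f'^{-1}(p)) = p^2/2$, this still depends on $x_0$, so at least in this example the extra $xp$ term is what keeps track of the shift.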
Here is where my confusion arises. Suppose I define a new variable $y = g(x)$, where $g$ is invertible, then invert and substitute to change variables from $f(x)$ to $f(g^{-1}(y))$. As far as I can see, given that $g$ is invertible, $f(g^{-1}(y))$ should contain the same amount of information as $f(x)$. For example, I believe that if I want to compute $\int_A f(x)\,dx$, where $A$ is some subset of the domain of $f$, I should be able to use either variable, $x$ or $y = g(x)$, equally well.
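Spelled out, the claim I have in mind is the following sketch, where the name $h$ is my own shorthand and I additionally assume that $g$ is monotonic with a differentiable inverse:

$$h(y) := f(g^{-1}(y)) \quad\Longrightarrow\quad f(x) = h(g(x)), \qquad \int_A f(x)\,dx = \int_{g(A)} h(y)\,\left|\frac{d g^{-1}}{dy}(y)\right| dy .$$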
So, given that $f'(x)$ is invertible when $f$ is strictly convex, I don't see why my argument for a general invertible $g$ doesn't work in this case, where $g = f'$.
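A more precise statement of how I interpret the claims about information:
Part 1: given $f^*(p)$, the original function $f(x)$ can be recovered uniquely.
Part 2: given only $f(f'^{-1}(p))$, the original function $f(x)$ cannot be recovered uniquely,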
where $f$ is assumed to be strictly convex. Part 1 is a standard result that is proven in the notes, and the example of $f(x)=(x-x_0)^2/2$ establishes Part 2.