I am referring to the formula $Z = \frac{X - \mu}{\sigma}$.
I understand that this formula provides a way of mapping a random variable $X \sim N(\mu,\sigma)$ to another variable $Z \sim N(0,1)$. I have already read up on the properties of $Z$ and how to show, using that formula, that it has the expected value and variance that it does. I also understand why it works in my statistics problems: I'm just transforming $P(X\leq k)$ into $P(Z\leq \frac{k - \mu}{\sigma})$, and we have values for the latter (in the z-table). This lets us work with any $\mu$ and $\sigma$ we want, rather than having to build a separate table for every possible pair of $\mu$ and $\sigma$.
I guess what I'm asking is more of a historical math question. Who was the first to discover this formula and its properties? How did they do it? Is there a way I could derive this formula purely from first principles and statistical intuitions or rules? I could not find anywhere online where someone arrived at this formula starting from something else. Was it discovered just by accident?
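I can't answer the historical part of your question, but the method you have described is based on two facts about the normal distribution:

- If $X$ is normal with mean $\mu$ and variance $\sigma^2$, then for any constant $a$, $X + a$ is normal with mean $\mu + a$ and variance $\sigma^2$.
- If $X$ is normal with mean $\mu$ and variance $\sigma^2$, then for any constant $a \neq 0$, $aX$ is normal with mean $a\mu$ and variance $a^2\sigma^2$.
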
You can find proofs of these facts in many statistics books.
As a consequence, $X- \mu$ has a normal distribution with mean zero and variance $\sigma^2$, and $(X- \mu) / \sigma$ has a normal distribution with mean zero and variance one. As you already know, it's more convenient to work with such a $N(0,1)$ variable because tables of its distribution are readily available.
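If it helps to see the standardization at work numerically, here is a minimal sketch in Python using only the standard library. The values of `mu`, `sigma`, and `k` are made up for illustration, and the helper names `standard_normal_cdf` and `normal_cdf_via_z` are my own, not from any textbook. It computes $P(X \leq k)$ by converting $k$ to a z-score and looking it up in the $N(0,1)$ distribution (here via the error function instead of a printed table), then compares the result with a simulation of $X$ itself.

```python
import math
import random
from statistics import NormalDist


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf_via_z(k: float, mu: float, sigma: float) -> float:
    """P(X <= k) for X normal with mean mu and standard deviation sigma.

    Standardize k to a z-score, then use the N(0, 1) CDF, which is
    what a z-table lookup does.
    """
    z = (k - mu) / sigma
    return standard_normal_cdf(z)


# Illustrative values only (assumptions, not from the question).
mu, sigma, k = 100.0, 15.0, 120.0

# Probability obtained through the z-score.
p_formula = normal_cdf_via_z(k, mu, sigma)

# Cross-check 1: the standard library's normal distribution with the
# original mu and sigma, no standardization involved.
p_direct = NormalDist(mu, sigma).cdf(k)

# Cross-check 2: simulate X directly and count how often X <= k.
rng = random.Random(0)
n = 100_000
p_simulated = sum(1 for _ in range(n) if rng.gauss(mu, sigma) <= k) / n

print(f"via z-score: {p_formula:.4f}")
print(f"direct CDF:  {p_direct:.4f}")
print(f"simulated:   {p_simulated:.4f}")
```

The first two numbers should agree to many decimal places, and the simulated frequency should be close to both, which is the practical content of the statement that $(X - \mu)/\sigma$ is $N(0,1)$.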