For example, one Hilbert system for propositional logic uses modus ponens along with three axioms:
I. $A \to (B \to A)$
II. $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$
III. $(\lnot B \to \lnot A) \to (A \to B)$
How did Hilbert come up with these axioms as enough to represent an entire logic system?
Is it mostly trial and error to come up with a set of axioms or is there a method to the madness?
This set of axioms was most likely found through a mixture of trial and error, adaptation of earlier work, and a desire to minimize the number of axioms. The connections to combinatory logic that I touch on below were probably also a factor.
For a (probably) ahistorical account, let me begin by slightly rewriting the first two axioms:
$A\to(\Gamma\to A)$
$(\Gamma\to(A\to B))\to(\Gamma\to A)\to(\Gamma\to B)$
If we scratch out the $\Gamma$s, we get $A\to A$ and $(A\to B)\to(A\to B)$ (the latter can be thought of as a curried form of $(A\to B)\land A \to B$). We can think of these as internalizations of the Identity and Cut rules: $$\dfrac{}{A\vdash A}\qquad \dfrac{A\vdash B\qquad \vdash A}{\vdash B}$$ Adding the $\Gamma$s back gives us internalizations of the above rules with contexts, i.e. the rules: $$\dfrac{}{\Gamma,A\vdash A}\qquad \dfrac{\Gamma,A\vdash B\qquad \Gamma\vdash A}{\Gamma\vdash B}$$
The upshot is that we can encode hypothetical reasoning by encoding $\Gamma\vdash A$ as $\Gamma\to A$ and then manipulating these with the internal identity and cut "rules". Sure enough, axioms I and II are exactly what is needed to prove the Deduction Theorem. From a proof-relevant perspective, axioms I and II correspond to $K$ and $S$ of combinatory logic, and the Deduction Theorem corresponds to bracket abstraction.
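To make the Deduction Theorem concrete, here is a minimal Python sketch of it as a proof transformation. The representation (strings for atoms, tuples for implications), the step tags `ax1`, `ax2`, `hyp`, `mp`, and the function names `check` and `deduce` are all my own choices for illustration, not anything standard. It only covers the implicational fragment with axioms I and II.

```python
# Formulas: atoms are strings, implications are tuples ("->", antecedent, consequent).
def imp(a, b):
    return ("->", a, b)

def ax1(a, b):
    """Axiom I: a -> (b -> a)."""
    return imp(a, imp(b, a))

def ax2(a, b, c):
    """Axiom II: (a -> (b -> c)) -> ((a -> b) -> (a -> c))."""
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def check(proof, hyps):
    """Return the formula proved by each step; raise ValueError on a bad step.

    Steps are ("ax1", a, b), ("ax2", a, b, c), ("hyp", f) or ("mp", i, j),
    where line i proves some d and line j proves d -> c.
    """
    lines = []
    for step in proof:
        kind = step[0]
        if kind == "ax1":
            f = ax1(*step[1:])
        elif kind == "ax2":
            f = ax2(*step[1:])
        elif kind == "hyp":
            f = step[1]
            if f not in hyps:
                raise ValueError(f"{f!r} is not a hypothesis")
        elif kind == "mp":
            minor, major = lines[step[1]], lines[step[2]]
            if not isinstance(major, tuple) or major[1] != minor:
                raise ValueError("modus ponens does not apply")
            f = major[2]
        else:
            raise ValueError(f"unknown step {step!r}")
        lines.append(f)
    return lines

def deduce(proof, hyps, a):
    """Turn a proof of B from hyps + [a] into a proof of a -> B from hyps."""
    lines = check(proof, hyps + [a])
    out = []    # the new proof
    where = []  # where[i] = index in `out` of the line proving a -> lines[i]
    for step, c in zip(proof, lines):
        n = len(out)
        if step[0] == "hyp" and c == a:
            # The discharged hypothesis becomes a -> a, i.e. the term S K K.
            out += [
                ("ax2", a, imp(a, a), a),  # (a->((a->a)->a)) -> ((a->(a->a)) -> (a->a))
                ("ax1", a, imp(a, a)),     # a -> ((a->a) -> a)
                ("mp", n + 1, n),          # (a->(a->a)) -> (a->a)
                ("ax1", a, a),             # a -> (a->a)
                ("mp", n + 3, n + 2),      # a -> a
            ]
        elif step[0] == "mp":
            # Axiom II plays the role of modus ponens under the hypothesis a.
            d = lines[step[1]]
            out += [
                ("ax2", a, d, c),               # (a->(d->c)) -> ((a->d) -> (a->c))
                ("mp", where[step[2]], n),      # (a->d) -> (a->c)
                ("mp", where[step[1]], n + 1),  # a -> c
            ]
        else:
            # Axioms and remaining hypotheses are weakened with axiom I.
            out += [step, ("ax1", c, a), ("mp", n, n + 1)]  # c, c->(a->c), a->c
        where.append(len(out) - 1)
    return out

# Example: from A -> B, B -> C and A we get C; discharging A yields A -> C.
hyps = [imp("A", "B"), imp("B", "C")]
proof = [
    ("hyp", "A"),            # 0: A
    ("hyp", imp("A", "B")),  # 1: A -> B
    ("mp", 0, 1),            # 2: B
    ("hyp", imp("B", "C")),  # 3: B -> C
    ("mp", 2, 3),            # 4: C
]
new_proof = deduce(proof, hyps, "A")
assert check(new_proof, hyps)[-1] == imp("A", "C")
```

The three branches of `deduce` line up with bracket abstraction: the abstracted variable becomes $SKK$, an application becomes an $S$, and anything not mentioning the variable gets a $K$ in front of it.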
If we stop here, we have positive implicational logic. If we continue on and keep using our idea of encoding hypothetical reasoning $\Gamma\vdash A$ as $\Gamma\to A$, then we should look at the rules for $\neg$. In the sequent calculus they are: $$\dfrac{\Gamma,\neg A\vdash\Delta}{\Gamma\vdash A,\Delta}\qquad\dfrac{\Gamma\vdash\neg A,\Delta}{\Gamma,A\vdash\Delta}$$ Of course, using both of these rules gives: $$\dfrac{\Gamma,\neg B\vdash\neg A,\Delta}{\Gamma,A\vdash B,\Delta}$$ This rule has the benefit of not changing the number of conclusions. We can roughly recover the original rules by choosing $A$ to be something provable or $B$ to be something refutable.
The first rule for $\neg$ suggests that we can represent the extra conclusions, $\Delta$, via extra negated assumptions in $\Gamma$. The third axiom then doesn't need to deal with the assumptions, since the Deduction Theorem based on the other two axioms means we can add them when we want. What I'm handwaving here is whether a rule like $\dfrac{\Gamma,\neg B\vdash\neg A}{\Gamma,A\vdash B}$ is strong enough, as the only rule for negation, to derive all classical results (in a context where we also have rules for implication). We could, for example, take $\neg\neg A\to A$ as an axiom instead, but this turns out to be too weak on its own. You can see why this might be so by viewing it as an encoding of a rule, as we've been doing: that (encoded) rule doesn't give us any way of dealing with singly negated hypotheses or (single) negations in conclusions.
In general, a lot of axioms in Hilbert-style systems can be understood by viewing them as encodings of rules. As an example, one axiom for $\lor$ is $(A\to C)\to(B\to C)\to (A\lor B \to C)$, which we can view as an encoding of the rule $\dfrac{A\vdash C\qquad B\vdash C}{A\lor B\vdash C}$. Of course, the encoding represents $\vdash$, meta-implication (i.e. the horizontal bar in rules), and $\to$ all as $\to$. For example, above I said the second axiom encoded the Cut rule, but I could just as well have said it encoded a modus-ponens-with-contexts rule.
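The encoding itself is mechanical enough to write down. Below is a small Python sketch, with names (`imp`, `encode_sequent`, `encode_rule`, `show`) that are just my own for illustration. A sequent is a pair of a hypothesis list and a conclusion, and a rule is a list of premise sequents plus a conclusion sequent. Everything collapses to right-nested implications, which is exactly the conflation of $\vdash$, the horizontal bar, and $\to$ described above.

```python
# Formulas: atoms are strings, implications are tuples ("->", antecedent, consequent).
def imp(*parts):
    """Right-nested implication: imp(a, b, c) is a -> (b -> c)."""
    *hyps, concl = parts
    for h in reversed(hyps):
        concl = ("->", h, concl)
    return concl

def encode_sequent(hyps, concl):
    """Encode h1, ..., hn |- c as h1 -> (... -> (hn -> c))."""
    return imp(*hyps, concl)

def encode_rule(premises, conclusion):
    """Encode a rule as premise_1 -> (... -> (premise_n -> conclusion))."""
    encoded = [encode_sequent(*p) for p in premises]
    return imp(*encoded, encode_sequent(*conclusion))

def show(f):
    """Print a formula, treating -> as right associative."""
    if isinstance(f, str):
        return f
    _, a, b = f
    left = show(a) if isinstance(a, str) else "(" + show(a) + ")"
    return left + " -> " + show(b)

# The rule for "or":  A |- C    B |- C   /   A ∨ B |- C
# (the disjunction is kept as an opaque atom, since only -> matters here)
or_rule = encode_rule([(["A"], "C"), (["B"], "C")], (["A ∨ B"], "C"))

# Cut with contexts:  Γ, A |- B    Γ |- A   /   Γ |- B
cut_rule = encode_rule([(["Γ", "A"], "B"), (["Γ"], "A")], (["Γ"], "B"))

print(show(or_rule))
print(show(cut_rule))
```

The second formula has the shape of axiom II with $\Gamma$ as the first variable. Reading the same formula as "from $\Gamma\to(A\to B)$ and $\Gamma\to A$ infer $\Gamma\to B$" gives the modus-ponens-with-contexts reading instead, so the encoding really does not distinguish the two.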