In MU puzzle: https://en.wikipedia.org/wiki/MU_puzzle#The_puzzle, We have "MU" string and 4 rules. Now when compared this to logic the wiki article says "The MI string is akin to a single axiom, and the four transformation rules are akin to rules of inference.".
What confuses me is why are the four transformation rules compared to rules of inference in logic and not to simply axioms? For me I would think that those 4 rules are axioms of that formal system, isn't it?
This may come down to a terminology issue - some authors use "axiom" fairly broadly. My stance, and I believe Hofstadter's as well, is the following:
It's at this point that a very annoying term crops up: "logical axiom". According to the previous paragraph, a logical axiom is really an inference rule that takes in no inputs and tells us that we can always infer a statement of a given form "for free."
At this point I should give an example of how this plays out. If I want to whip up a Hilbert system proof of some formula $Q$ from some formulas $P_1,...,P_n$, the situation is standardly described as follows:
We have a single inference rule, namely modus ponens: from $\varphi$ and $\varphi\rightarrow\psi$ we can infer $\psi$.
We have nine logical axioms (see here).
While the above two bulletpoints apply throughout the entire logical context we're in, in this specific case we also have the sentences $P_1,..., P_n$ as non-logical axioms.
By contrast, I would prefer to describe it as follows:
We have ten inference rules; one of these is binary (= takes in two inputs and spits out a third) while the other nine are nullary (= just declare a single type of sentence "free").
In this specific case, we have axioms $P_1,...,P_n$.
Sequent calculi basically take this stance. These consist of a series of basic rules for inductively determining a set of "valid sequents" (basically: assertions of the form "From $\Gamma$ we may deduce $\varphi$"). So everything we're given "for free" by the system is clearly declared as a rule, as opposed to a sentence, so there's no possible confusion. (Of course, sequent calculi are often in my experience thought of as less intuitive than Hilbert systems.)
The important feature is that in either case we have two pieces to any deduction: the ambient logical apparatus, and the specific hypotheses of the problem. Whether or not we want to distinguish between inference rules and logical axioms in the first category is purely superficial; the point is that in the Hofstadter example, we do have a "starting string" MI which isn't given an interpretation as a purely logical fact, so it makes sense to put it in that second category.