I am studying the definition of a monad in Saunders Mac Lane, Categories for the Working Mathematician, Springer-Verlag, 1971. I am having some difficulty getting an intuition for the relation to monoids explained on page 134. Here is the relevant content from pages 133 and 134, with the diagrams replaced by text.
Any endofunctor $T : X \rightarrow X$ has composites $T^2 = T T : X \rightarrow X$ and $T^3 = T^2 T : X \rightarrow X$. If $\mu : T^2\dot\rightarrow T$ is a natural transformation, with components $\mu_x : T^2 x \rightarrow T x$ for each $x\in X$, then $T\mu:T^3\dot\rightarrow T^2$ denotes the natural transformation with components $(T\mu)_x = T(\mu_x) : T^3 x \rightarrow T^2 x$, while $\mu T : T^3\dot\rightarrow T^2$ has components $(\mu T)_x = \mu_{T x}$.
Definition. A monad $T = \langle T, \eta, \mu\rangle$ in a category $X$ consists of a functor $T: X\rightarrow X$ and two natural transformations $$\eta: I_X\dot\rightarrow T, \qquad \mu : T^2 \dot\rightarrow T$$ such that $\mu \circ T\mu = \mu\circ \mu T$, $\mu\circ\eta T = id_T$ and $\mu\circ T\eta = id_T$.
Since I am not able to draw the diagrams, I have replaced them with three equations, where $\circ$ is the composition of natural transformations and $id_T : T\dot\rightarrow T$ is the identity natural transformation. I am not sure if this notation is correct. If it is not, please let me know so I can correct it.
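To make the whiskered transformations $T\mu$ and $\mu T$ concrete, here is a small sketch in Python. It assumes (this is not in Mac Lane's text) that $X$ is the category of sets, modelled loosely by Python values, and that $T$ is the list endofunctor, with $\eta$ the singleton list and $\mu$ concatenation; the helper names `fmap` and `compose` are hypothetical. It shows that $T\mu$ and $\mu T$ really are different maps $T^3 \dot\rightarrow T^2$, and spot-checks the three equations above, written in the same composition order, on sample inputs.

```python
from itertools import chain

# Illustrative assumption: X = Set (modelled by Python values) and T = the list
# endofunctor, so T x is the set of finite lists of elements of x.

def fmap(f):
    """T on morphisms: T f : T x -> T y applies f to every entry."""
    return lambda xs: [f(a) for a in xs]

def eta(a):
    """eta_x : x -> T x, the singleton list."""
    return [a]

def mu(xss):
    """mu_x : T^2 x -> T x, concatenation of one level of nesting."""
    return list(chain.from_iterable(xss))

def compose(g, f):
    """g ∘ f in the order used in the equations above: first f, then g."""
    return lambda v: g(f(v))

# Whiskerings. Python is untyped, so mu_{T x} and mu_x are the same function;
# only the object at which the component is used differs.
T_mu = fmap(mu)     # (T mu)_x = T(mu_x): flatten the inner levels
mu_T = mu           # (mu T)_x = mu_{T x}: flatten the outer levels
T_eta = fmap(eta)   # (T eta)_x = T(eta_x): wrap every entry
eta_T = eta         # (eta T)_x = eta_{T x}: wrap the whole list

xsss = [[[1], [2, 3]], [[4]]]   # an element of T^3 x
xs = [1, 2, 3]                  # an element of T x

print(T_mu(xsss))   # [[1, 2, 3], [4]]
print(mu_T(xsss))   # [[1], [2, 3], [4]]

# The three laws, checked on these sample elements only:
print(compose(mu, T_mu)(xsss) == compose(mu, mu_T)(xsss))  # True
print(compose(mu, eta_T)(xs) == xs)                         # True
print(compose(mu, T_eta)(xs) == xs)                         # True
```

Passing these checks is evidence, not a proof: the laws must hold at every object and for every element.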
The text continues on page 134 as follows.
Formally, the definition of a monad is like that of a monoid $M$ in sets, as described in the introduction. The set $M$ of elements of the monoid is replaced by the endofunctor $T : X \rightarrow X$, while the cartesian product $\times$ of two sets is replaced by the composite of two functors, the binary operation $\mu : M \times M \rightarrow M$ of multiplication by the transformation $\mu : T^2 \dot\rightarrow T$ and the unit (identity) element $\eta : 1 \rightarrow M$ by $\eta : I_X \dot\rightarrow T$. We shall thus call $\eta$ the unit and $\mu$ the multiplication of the monad $T$; the first commutative diagram of (2) is then the associative law for the monad, while the second and third diagrams express the left and right unit laws, respectively. All told, a monad in $X$ is just a monoid in the category of endofunctors of $X$, with product $\times$ replaced by composition of endofunctors and unit set by the identity endofunctor.
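Mac Lane's dictionary can be read off by writing the monoid side in the same morphism-only style. The sketch below assumes, purely for illustration, that $M$ is the set of Python strings under concatenation, with $1 = \{()\}$ as the one-point set; each comment names the monad-side map that replaces it when $\times$ becomes composition of endofunctors. The laws are stated as equations between composites of maps, which is the form that carries over to $T$.

```python
# Illustrative assumption: the monoid is M = Python strings with concatenation.
# Each map is labelled with its monad-side counterpart from Mac Lane's dictionary.

def mu(pair):            # mu : M x M -> M              (monad: mu : T^2 -> T)
    m, n = pair
    return m + n

def eta(_point):         # eta : 1 -> M, 1 = {()}       (monad: eta : I_X -> T)
    return ""

def mu_x_1(t):           # mu x 1 : M x M x M -> M x M  (monad: mu T)
    m, n, p = t
    return (mu((m, n)), p)

def one_x_mu(t):         # 1 x mu : M x M x M -> M x M  (monad: T mu)
    m, n, p = t
    return (m, mu((n, p)))

def eta_x_1(t):          # eta x 1 : 1 x M -> M x M     (monad: eta T)
    u, m = t
    return (eta(u), m)

def one_x_eta(t):        # 1 x eta : M x 1 -> M x M     (monad: T eta)
    m, u = t
    return (m, eta(u))

t = ("ab", "c", "de")
print(mu(mu_x_1(t)), mu(one_x_mu(t)))   # abcde abcde   (associative law)
print(mu(eta_x_1(((), "xy"))))           # xy            (left unit law)
print(mu(one_x_eta(("xy", ()))))         # xy            (right unit law)
```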
So, here are my questions.
- What does it mean to replace the set $M$ of elements of the monoid by the endofunctor $T$? What are the elements of the new monoid as defined by this functor $T$?
- If $\mu : T^2 \rightarrow T$ is the multiplication in the new monoid, I would expect to write the associative law for $\mu$ as $$\forall x, y, z \in T : \mu(\mu(x, y), z) = \mu(x, \mu(y, z))$$ But what are these $x, y, z\in T$? $T$ is not a set!
- Because of my difficulties with points 1 and 2 above, I fail to see how the three monad laws in the definition express the associative law and the unit laws of this monoid.
I have read the example of a monoid from the introduction (pages 2 and 3) and I see that the diagrams for a monoid are similar to those for a monad, but my intuition is still very poor. How can I simply use a functor instead of a set and replace the Cartesian product of sets with functor composition?
Regarding point 2, we can actually interpret such equations in categories. In categorical logic there is a notion of "generalized element": if we take the objects of a category as types, then a variable of a given type is interpreted as a morphism whose codomain is that object.
I'm going to do this in a somewhat ad hoc fashion, because this particular case has some extra features I've not studied in the context of logic, so I'm just going to write true things rather than develop it systematically.
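In the familiar case of sets the idea looks like this. The sketch assumes $M$ is the set of Python strings under concatenation, and the helper names (`pairing`, `mul`, `point`, `proj`) are hypothetical. Two generalized elements $x, y : S \to M$ with a common domain are multiplied by first pairing them to $\langle x, y\rangle : S \to M \times M$ and then applying $\mu$; ordinary elements are the case $S = 1$. Taking $x, y, z$ to be the three projections out of $M \times M \times M$ turns $\mu(\mu(x,y),z)$ and $\mu(x,\mu(y,z))$ into the two sides of the associativity diagram, so the element-wise law for all generalized elements contains the diagram as a special case.

```python
# Illustrative assumption: work in Set with M = Python strings under concatenation.
# A generalized element of M "at stage S" is any function x : S -> M.

def mu(pair):                  # mu : M x M -> M
    m, n = pair
    return m + n

def pairing(x, y):             # <x, y> : S -> M x M, for x, y with the same domain S
    return lambda s: (x(s), y(s))

def mul(x, y):                 # mu(x, y): first <x, y>, then mu; again a map S -> M
    return lambda s: mu(pairing(x, y)(s))

def point(m):                  # an ordinary element m, seen as a map 1 -> M with 1 = {()}
    return lambda _unit: m

def proj(i):                   # i-th projection M x M x M -> M
    return lambda triple: triple[i]

# Ordinary elements are generalized elements at the stage S = 1:
print(mul(point("ab"), point("c"))(()))   # abc

# The "generic" triple: stage S = M x M x M, elements = the three projections.
x, y, z = proj(0), proj(1), proj(2)
left = mul(mul(x, y), z)       # as a map, this is mu after (mu x 1)
right = mul(x, mul(y, z))      # as a map, this is mu after (1 x mu)
print(left(("ab", "c", "de")), right(("ab", "c", "de")))   # abcde abcde
```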
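Suppose you have two natural transformations $x : X \to P$ and $y : Y \to Q$, to be viewed as elements, and a natural transformation $\tau : PQ \to R$. By convention, we write $\tau (x,y)$ for the natural transformation $XY \to R$ given by $xy \circ \tau$, where $xy : XY \to PQ$ is the horizontal composite of $x$ and $y$. (Here and below, $\circ$ is written in diagrammatic order: $\alpha \circ \beta$ means first $\alpha$, then $\beta$. This is the reverse of the convention used in the equations of the question.)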
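Here is a sketch of this construction for concrete functors on sets. It assumes, purely for illustration, that $P = Q = R = T$ is the list functor with $\tau = \mu$ (concatenation), that $X$ is an "option" functor (a value is either `()` or a one-element tuple `(v,)`), and that $Y$ is the identity functor; `x`, `y`, `hcomp` and the other helpers are hypothetical names. The component of the horizontal composite $xy : XY \to TT$ can be computed along either side of a naturality square, and the sketch checks that both routes agree before applying $\mu$.

```python
# Hypothetical encodings for illustration:
#   T = list functor; X = "option" functor (() or (v,)); Y = identity functor.

def fmap_list(f):                 # T on morphisms
    return lambda xs: [f(v) for v in xs]

def fmap_opt(f):                  # X on morphisms
    return lambda o: tuple(f(v) for v in o)

def mu(xss):                      # mu : TT -> T, concatenation
    return [v for xs in xss for v in xs]

def x(o):                         # x : X -> T, option -> list of length 0 or 1
    return list(o)

def y(v):                         # y : Y -> T, duplicate into a 2-element list
    return [v, v]

def hcomp(alpha, beta, fmap_dom_alpha):
    """Horizontal composite of alpha : F -> G and beta : H -> K:
    component alpha_{K a} applied after F(beta_a)."""
    return lambda w: alpha(fmap_dom_alpha(beta)(w))

def hcomp_alt(alpha, beta, fmap_cod_alpha):
    """Same composite, other side of the square: G(beta_a) after alpha_{H a}."""
    return lambda w: fmap_cod_alpha(beta)(alpha(w))

xy = hcomp(x, y, fmap_opt)        # xy : XY -> TT
xy_alt = hcomp_alt(x, y, fmap_list)

def mu_x_y(w):                    # mu(x, y): first xy, then mu; a map XY -> T
    return mu(xy(w))

print(xy((5,)), xy_alt((5,)))     # [[5, 5]] [[5, 5]]
print(mu_x_y((5,)), mu_x_y(()))   # [5, 5] []
```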
Then if we have three natural transformations $x : X \to T$, $y : Y \to T$ and $z : Z \to T$, we can compute
$$ \begin{align*} xyz \circ T\mu \circ \mu &= \left( (x \circ T) (yz \circ \mu) \right) \circ \mu \\&= (x \mu(y, z)) \circ \mu \\&= \mu(x, \mu(y, z)) \end{align*} $$
where the first equality is the interchange law (identifying $T$ with its identity natural transformation), and the other two use $x \circ T = x$ and the definition of $\tau(x, y)$. Similarly,
$$ \begin{align*} xyz \circ \mu T \circ \mu &= \left( (xy \circ \mu)(z \circ T) \right) \circ \mu \\&= \mu(x, y) z \circ \mu \\&= \mu(\mu(x, y), z) \end{align*} $$
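The two computations can be spot-checked in a concrete setting. The sketch assumes $T$ is the list functor with $\mu$ concatenation, and picks hypothetical generalized elements with $X = O$, $Y = I$, $Z = T$: $x : O \to T$ (an "option" functor $O$ mapped to lists of length at most one), $y : I \to T$ (duplication, with $I$ the identity functor) and $z : T \to T$ (list reversal). For one sample input it evaluates both ends of each chain of equalities: the routes through $T\mu$ and through $\mu T$ pass through different elements of $T^2$, yet the final results agree, as associativity of $\mu$ requires.

```python
# Hypothetical encodings: T = list functor, O = option functor (() or (v,)),
# I = identity functor. Generalized elements x : O -> T, y : I -> T, z : T -> T.

def fmap_list(f):
    return lambda xs: [f(v) for v in xs]

def fmap_opt(f):
    return lambda o: tuple(f(v) for v in o)

def fmap_id(f):
    return f

def mu(xss):                        # mu : TT -> T, concatenation
    return [v for xs in xss for v in xs]

def hcomp(alpha, beta, fmap_dom_alpha):
    """Horizontal composite of alpha : F -> G and beta : H -> K:
    component alpha_{K a} applied after F(beta_a)."""
    return lambda w: alpha(fmap_dom_alpha(beta)(w))

def mu_of(alpha, beta, fmap_dom_alpha):
    """mu(alpha, beta): first the horizontal composite, then mu."""
    return lambda w: mu(hcomp(alpha, beta, fmap_dom_alpha)(w))

def x(o):                           # x : O -> T
    return list(o)

def y(v):                           # y : I -> T
    return [v, v]

def z(xs):                          # z : T -> T, list reversal
    return xs[::-1]

xyz = hcomp(x, hcomp(y, z, fmap_id), fmap_opt)     # xyz : O I T -> T T T
w = ([1, 2],)                                      # a sample element of O(I(T a))

via_T_mu = fmap_list(mu)(xyz(w))                   # [[2, 1, 2, 1]]
via_mu_T = mu(xyz(w))                              # [[2, 1], [2, 1]]

# Domain of mu(x, y) is O I, whose action on morphisms is just fmap_opt.
print(mu(via_T_mu), mu_of(x, mu_of(y, z, fmap_id), fmap_opt)(w))   # mu(x, mu(y, z))
print(mu(via_mu_T), mu_of(mu_of(x, y, fmap_opt), z, fmap_opt)(w))  # mu(mu(x, y), z)
```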
So, by the above theorem, the following are equivalent: