I'm familiar with the concept of a monad in computer programming, where it is used as a framework for expressing a variety of different types of computation.
I've been learning some basic category theory, and understand the corresponding formulation of a monad as a triple $(T, \mu, \eta)$. I'm curious as to how this idea of a monad arises in mathematics? Why are they interesting to mathematicians?
I agree with some of what goblin says. I also disagree with a couple of things. The main relevant ones are 1) the implicit suggestion that how programmers use monads will look like how mathematicians use them, and 2) that monads are used to "express data structures". Many "notions of computation" do not correspond to the colloquial meaning of "data structure", and even the ones that do I doubt many would describe them as "expressed" by monads unless they happen to be something like a free monad. I do agree with Ian's comment that a lot of the importance of monads to mathematicians has to do with their relationship to adjoint functors. I do think goblin is going to go in a good direction talking about universal algebra, and it is likely similar to what I'll talk about. I hope it goes into explicit concrete detail.
I'm mostly not going to talk about how they arose historically, but briefly, they were first defined by Roger Godement in 1958 in an article on sheaf theory in the context of algebraic topology. It sounds like he actually introduced the notion of a comonad as opposed to a monad, and he used the term "standard construction". I'm completely guessing here since I don't have access to the article, but I suspect he was defining a comonad to organize the degeneracy and face maps of a simplicial complex or similar. I would love it if someone could verify or correct this in the comments.

The idea I'm thinking of can be (excessively) compactly described as follows. The augmented simplex category, $\Delta_+$, can be characterized as the free monoidal category with a monoid object. An (augmented) simplicial object in a category $\mathcal{C}$ is a functor $\Delta_+^{op} \to \mathcal{C}$. The universal property of $\Delta_+$ means a monoidal functor from it to any monoidal category is the same thing as a monoid object in that category. You've probably heard monads described as "monoid objects in the category of endofunctors", and, sure enough, a monad on a category $\mathcal{C}$ is equivalent to a monoidal functor $\Delta_+ \to [\mathcal{C},\mathcal{C}]$. Dually, a comonad is a monoidal functor $\Delta_+^{op} \to [\mathcal{C},\mathcal{C}]$. The upshot is that every comonad on a category $\mathcal{C}$ gives rise to a simplicial object in $\mathcal{C}$ (though not every simplicial object arises this way).
If you go out and look at how monads are used by categorists, you'll probably find that it looks nothing like how monads are used in e.g. Haskell. Some of that is because they are being used for different things; very occasionally it's just differences in presentation. I want to touch on one aspect as it's a fairly concrete difference, it underscores some of the difference in emphasis between mathematicians and programmers, and it is one of the more important and impressive examples of monad theory.
As has been mentioned, and as you are probably aware, monads are related to adjunctions. Given any adjunction $F \dashv U$, we get a monad $T = UF$ (and a comonad $G = FU$). A natural question is: can we get an adjunction given a monad? The answer is "yes, in many ways", and there are two, not necessarily distinct, extreme approaches. One approach uses the Kleisli category, which you've probably heard of before. The other approach uses the Eilenberg-Moore category.
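To make "an adjunction gives a monad" a bit more tangible, here is a minimal Haskell sketch using the currying adjunction, with $F A = (A, S)$ and $U B = S \to B$. The composite $T = UF$ is then the familiar state monad. The names `unit`, `counit`, `leftAdjunct`, `rightAdjunct`, `T`, `mapT` and `joinT` are mine, chosen for illustration; this is not how any particular library defines things.

```haskell
-- Sketch only: the adjunction F -| U with F a = (a, s) and U b = s -> b.
-- All names here are illustrative assumptions, not a library API.

-- The hom-set bijection ((a, s) -> b) ~ (a -> (s -> b)) is just currying.
leftAdjunct :: ((a, s) -> b) -> (a -> s -> b)
leftAdjunct f = \a s -> f (a, s)

rightAdjunct :: (a -> s -> b) -> ((a, s) -> b)
rightAdjunct g = \(a, s) -> g a s

-- Unit of the adjunction, eta : a -> U (F a).
unit :: a -> (s -> (a, s))
unit a = \s -> (a, s)

-- Counit of the adjunction, epsilon : F (U b) -> b.
counit :: (s -> b, s) -> b
counit (f, s) = f s

-- The induced monad T = U . F, i.e. the state monad.
type T s a = s -> (a, s)

-- Functorial action of T.
mapT :: (a -> b) -> T s a -> T s b
mapT f t = \s -> let (a, s') = t s in (f a, s')

-- Multiplication mu = U epsilon F : T (T a) -> T a.
joinT :: T s (T s a) -> T s a
joinT tt = \s -> counit (tt s)
```

Here `unit` plays the role of `return` and `joinT` the role of `join`; the monad laws follow from the triangle identities of the adjunction rather than from anything specific to state.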
An object of the Eilenberg-Moore category for a monad, $T$, is called a $T$-algebra and consists of a pair of an object $A$ and an arrow $e : TA \to A$ that satisfies a bunch of laws that are unimportant for us. Now it turns out the Kleisli category is equivalent to the full subcategory of the Eilenberg-Moore category consisting of the objects $\mu_A : T^2 A \to TA$ (the free algebras), where $\mu : T^2 \to T$ is the monadic multiplication, i.e. `join` in Haskell.

For concreteness, and because it is a simple and relevant example, let's set $T$ to be the free monoid monad, i.e. the list monad. I'll use Haskell syntax, so $TA = [A]$. Now, it turns out the algebra, $e$, gives rise to a monoid structure on $A$ via $a*b \equiv e([a,b])$ and $1 \equiv e([])$. The $T$-algebra laws imply the monoid laws. So, every $T$-algebra is actually a monoid, and it's immediate from the definition of "free monoid" that every monoid gives rise to a $T$-algebra. In other words, the category of monoids is equivalent to the category of $T$-algebras where $T$ is the free monoid monad.

In general, if there exists an adjunction $F \dashv U : \mathcal{D} \to \mathcal{C}$ that leads to this situation, we say that $\mathcal{D}$ is monadic over $\mathcal{C}$. In our case, $U : \mathbf{Mon} \to \mathbf{Set}$, the underlying set functor, leads to the category of monoids being monadic over $\mathbf{Set}$. Where this gets interesting is that this works for any category of algebraic structures, e.g. rings, groups, lattices, algebras. So the general study of monadic categories can give results that apply to all algebraic structures. It's actually better than that though. Every monadic (over $\mathbf{Set}$ at least) category is "algebraic" in a somewhat more general sense than the traditional definition. This leads to algebraic presentations of things that may not obviously be algebraic, e.g. compact Hausdorff topological spaces.

It should be reasonably clear that the Kleisli category, in our example, corresponds to the subcategory of free monoids. Programmers tend to focus on algebraic structures with lawless presentations (which are necessarily free). Most algebraic data types (certainly polynomial ones) correspond to lawless algebraic theories. (The ones that don't fail to correspond to algebraic theories at all, for mostly technical reasons.) Most algebraic structures mathematicians care about are not lawless or even free.
Finite dimensional vector spaces are all free, and, sure enough, they are well studied by programmers.
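Going back to the list monad example, the correspondence between $T$-algebras and monoids can be sketched in Haskell roughly as follows. The type `ListAlgebra` and the functions `mult`, `unitOf`, `fromMonoid`, `freeAlg`, `unitLaw` and `assocLaw` are names I made up for illustration. The two law checkers only test the algebra laws on sample inputs; they prove nothing.

```haskell
-- Sketch only: algebras for the list monad, with hypothetical names.

-- A T-algebra on carrier a is a structure map e : [a] -> a.
newtype ListAlgebra a = ListAlgebra { runAlg :: [a] -> a }

-- The monoid structure induced by an algebra: a * b = e [a, b], 1 = e [].
mult :: ListAlgebra a -> a -> a -> a
mult (ListAlgebra e) x y = e [x, y]

unitOf :: ListAlgebra a -> a
unitOf (ListAlgebra e) = e []

-- Conversely, every monoid yields an algebra by multiplying out a list.
fromMonoid :: Monoid a => ListAlgebra a
fromMonoid = ListAlgebra mconcat

-- The free algebra on a is mu_a = concat : [[a]] -> [a]; these are the
-- algebras that the Kleisli category sees.
freeAlg :: ListAlgebra [a]
freeAlg = ListAlgebra concat

-- Example algebra: integers under addition.
sumAlg :: ListAlgebra Int
sumAlg = ListAlgebra sum

-- Unit law on a sample: e . return = id.
unitLaw :: Eq a => ListAlgebra a -> a -> Bool
unitLaw (ListAlgebra e) x = e [x] == x

-- Multiplication law on a sample: e . fmap e = e . join.
assocLaw :: Eq a => ListAlgebra a -> [[a]] -> Bool
assocLaw (ListAlgebra e) xss = e (map e xss) == e (concat xss)
```

For instance, `mult sumAlg 2 3` is `5` and `unitOf sumAlg` is `0`, recovering the additive monoid on `Int`. Something like `ListAlgebra head` would not even be total, let alone lawful, which is one way to see that the laws I called "unimportant" are doing real work.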