I am trying to understand adjoint functors in category theory. I like the idea of thinking of a left adjoint as the 'best approximation from above' and a right adjoint as the 'best approximation from below.'
More formally, this corresponds to the universal arrow construction: if one wants to find a left adjoint of $R:{\bf D}\rightarrow{\bf C}$, then for each $c\in{\bf C}$ one needs to find an initial object $(Lc,i_c:c\rightarrow RLc)$ in the comma category $c\Downarrow R$.
My question is: how should I think about the uniqueness requirement on the arrows out of the initial object? If I think of categories where all hom-sets are trivial (empty or a single element), the 'best approximation from above' metaphor works just fine, because when I say things like 'from above' I usually don't think about multiple arrows between objects. Is there a nice way to enhance this metaphor so that it also works in the general case?
EDIT: I shared some new thoughts in an answer below.
Take your favourite example of a universal arrow, say $\{a, b\} → U\{a, b\}^*$ (where $\{a, b\}^*$ is the free monoid generated by the letters $a$, $b$). Now $\{a, b\}^*$ is certainly big enough to factor every arrow $\{a, b\} → UM$, and it will remain so even if you add some clutter, making it into e.g. $\{a, b, c\}^*$. But now the factorization is not unique, because you can map $c$ to any element of $M$. So uniqueness seems to guarantee that the universal arrow is optimal in a sense.
This is of course very vague and imprecise. The biggest problem is that the size terminology is unpredictable: an arrow $A → B$ can intuitively make $A$ feel either smaller (e.g. $ℤ → ℝ$) or larger (e.g. $ℤ → ℤ/(2)$) than $B$. Still, it works fine for many basic examples, i.e. free constructions, reflections, and (co)limits in familiar categories.
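To make the free monoid example concrete, here is a small Python sketch. It relies on the standard fact that a monoid homomorphism out of a free monoid is determined by where it sends the letters. The target monoid $M = ℤ/3$ under addition, the particular function `f`, and the helper names `extend` and `extensions` are all my own choices for illustration, not anything canonical.

```python
from itertools import product

# Hypothetical target monoid M = (Z/3, +, 0), chosen only for illustration.
M = range(3)
UNIT = 0

def mul(x, y):
    return (x + y) % 3

def extend(assignment):
    """Monoid homomorphism from words over the assigned letters to M.

    A word is sent to the product in M of the images of its letters.
    """
    def hom(word):
        result = UNIT
        for letter in word:
            result = mul(result, assignment[letter])
        return result
    return hom

def extensions(alphabet, f):
    """All letter assignments alphabet -> M that agree with f on f's letters.

    Each one corresponds to a homomorphism alphabet^* -> M factoring f.
    """
    extra = [x for x in alphabet if x not in f]
    return [{**f, **dict(zip(extra, values))}
            for values in product(M, repeat=len(extra))]

f = {"a": 1, "b": 2}              # an arbitrary function {a, b} -> UM

exts_ab = extensions("ab", f)     # factorizations through {a, b}^*
exts_abc = extensions("abc", f)   # factorizations through {a, b, c}^*

print(len(exts_ab), len(exts_abc))
```

With only the letters $a$, $b$ there is a single factorization, while the cluttered $\{a, b, c\}^*$ gives one for every possible image of $c$. All of them agree on words that avoid $c$, so they differ only in how they treat the clutter.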
For initial/terminal objects, I would suggest remembering the classical cliché that it's the arrows that matter, not objects. If you are thinking about a terminal object as a sort of "best approximation from above", then every arrow $A → T$ to the terminal object needs to be "the best" one, which in the presence of parallel arrows to $T$ is impossible. In $\mathrm{Set}$, every non-empty set is "above", but only the singletons can be so in an optimal way.
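The $\mathrm{Set}$ case can be checked by brute force on small finite sets. The sketch below simply enumerates all functions between two finite sets; the helper `functions` and the sample sets are hypothetical names introduced for this illustration.

```python
from itertools import product

def functions(domain, codomain):
    """All functions domain -> codomain, each represented as a dict."""
    domain, codomain = list(domain), list(codomain)
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

A = ["x", "y", "z"]          # an arbitrary sample set
singleton = ["*"]
two_points = ["*", "!"]

# Number of arrows from A (and from the empty set) into each candidate T.
for T in (singleton, two_points, []):
    print(len(T), len(functions(A, T)), len(functions([], T)))
```

Every set, including the empty one, has exactly one function into the singleton. The two-point set also receives a function from every set, but from a three-element set there are $2^3 = 8$ of them, so none can be singled out as "the best". The empty set receives no function at all from a non-empty set, so it is not even "above".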
Of course, at the end of the day it's the formal properties of the uniqueness requirement that really matter. If $η : C → R(LC)$ is the universal arrow from $C$ to $R : \mathscr D → \mathscr C$, then for each object $D$ of $\mathscr D$ the existence of a factorization makes the function $\mathrm{Hom}(LC, D) → \mathrm{Hom}(C, RD)$ given by $g ↦ Rg ∘ η$ surjective, while the uniqueness makes it injective. I would say that the resulting isomorphism of functors $\mathrm{Hom}(LC, -) ≅ \mathrm{Hom}(C, R-)$ is the most important formulation of universality.