Given two (1-)categories $\mathcal{C}, \mathcal{D}$, and given the 0-category (class) of functors $\mathcal{C} \to \mathcal{D}$, denoted $Func(\mathcal{C} \to \mathcal{D})$, suppose we want to choose a (1-)category, denoted $[[\mathcal{C}, \mathcal{D}]]$, such that $Ob([[\mathcal{C}, \mathcal{D}]]) = Func(\mathcal{C} \to \mathcal{D})$.
Usually we choose $[[\mathcal{C}, \mathcal{D}]]$ such that $Mor([[\mathcal{C}, \mathcal{D}]])$ are natural transformations. But imagine we're from a universe where no one has discovered the concept of natural transformation, and so we start only with the definitions of category and functor.
To reflect the structure of $Ob([[\mathcal{C}, \mathcal{D}]]) = Func(\mathcal{C} \to \mathcal{D})$, whose elements act on both objects and morphisms of $\mathcal{C}$, we can ask the following of every $\eta \in Mor([[\mathcal{C}, \mathcal{D}]])$, say $\eta: F \to G$ for functors $F, G \in Func(\mathcal{C} \to \mathcal{D})$: that $\eta$ sends every object (0-morphism) $c$ of $\mathcal{C}$ to a (1-)morphism $h: F(c) \to G(c)$ with $h \in Mor(\mathcal{D})$, that it sends every (1-)morphism $f: X \to Y$ of $\mathcal{C}$ to a morphism of morphisms ("2-morphism") $F(f) \to G(f)$, and that it does so functorially.
So to get a choice of $Mor([[\mathcal{C}, \mathcal{D}]])$, we choose some category $Arr(\mathcal{D})$ ("arrows of $\mathcal{D}$") such that $Ob(Arr(\mathcal{D})) = Mor(\mathcal{D})$, and then choose elements of $Mor([[\mathcal{C}, \mathcal{D}]])$ to be functors $\mathcal{C} \to Arr(\mathcal{D})$ satisfying the above consistency conditions.
In other words, if we accept the above requirement, then to make some choice of definition for a "functor category" $[[\mathcal{C}, \mathcal{D}]]$, it suffices to make a choice of definition of "category whose objects are morphisms in $\mathcal{D}$".
Question: What is so special about the standard choice of $Arr(\mathcal{D})$, where the morphisms (of morphisms) are commutative squares in $\mathcal{D}$?
Can we explain the "specialness" without invoking natural transformations? (To avoid circular justification, because we are trying to use this choice to justify the choice of natural transformations as morphisms between functors.)
The choice of commutative squares as "morphisms of morphisms in $\mathcal{D}$" leads to (in the manner described above) the standard definition of functor category whose morphisms are natural transformations.
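To make that last claim concrete, here is the unpacking, as a sketch. The names $H$, $\eta_c$, $u$, $v$ and the functors $dom, cod: Arr(\mathcal{D}) \to \mathcal{D}$ are notation introduced here, and the "consistency conditions" are read as $dom \circ H = F$ and $cod \circ H = G$ for a functor $H: \mathcal{C} \to Arr(\mathcal{D})$. With the standard choice, a morphism in $Arr(\mathcal{D})$ from $h: A \to B$ to $h': A' \to B'$ is a pair $(u, v)$ with

$$u: A \to A', \qquad v: B \to B', \qquad h' \circ u = v \circ h,$$

and composition is componentwise. Writing $\eta_c = H(c): F(c) \to G(c)$, the consistency conditions force $H(f) = (F(f), G(f))$ for every $f: X \to Y$ in $\mathcal{C}$, so the only remaining content is that this pair is a morphism $\eta_X \to \eta_Y$ in $Arr(\mathcal{D})$:

$$\eta_Y \circ F(f) = G(f) \circ \eta_X.$$

Functoriality of $H$ then follows from functoriality of $F$ and $G$. The same square, read in the other direction, is the "2-morphism" $F(f) \to G(f)$ with sides $\eta_X$ and $\eta_Y$.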
Note that the objects of $Arr(\mathcal{D})$ can be identified with functors $\mathbb{2} \to \mathcal{D}$ (where $\mathbb{2}$ denotes the "walking arrow category"). So an answer to this question that would not be accepted is, "the morphisms of 'the' functor category $[[\mathbb{2}, \mathcal{D}]]$ work out to be commutative squares", because that answer assumes a priori that the morphisms we should choose for $[[\mathbb{2}, \mathcal{D}]]$ are natural transformations.
On the plus side, the above argument shows that a converse construction also holds, i.e. that to make a choice of definition of "category whose objects are morphisms in $\mathcal{D}$", it suffices to make a choice of definition of functor category $[[\mathcal{C}, \mathcal{D}]]$ for arbitrary $\mathcal{C}$ (and in particular $\mathcal{C} = \mathbb{2}$).
Note: For example, if we choose the morphisms of $Arr(\mathcal{D})$/$[[\mathbb{2}, \mathcal{D}]]$ to be invertible commutative squares, we get natural isomorphisms (as opposed to arbitrary natural transformations) as our morphisms of functors (and a groupoidal version of standard functor categories). Similarly, if we choose the morphisms of $Arr(\mathcal{D})$/$[[\mathbb{2}, \mathcal{D}]]$ to be the dual of the standard choice, then we should get (I think) contravariant natural transformations. So the standard choice is not the only consistent choice.
This question is long, so I've put my two guesses so far (defining whiskerings, making $Cat$ Cartesian closed) as a community wiki "answer" below. Related questions: (1) (2) (3) (4) (5)
Guess 1: We need the morphisms of $Arr(\mathcal{D})$/$[[\mathbb{2}, \mathcal{D}]]$ to in some way be "expressible" in terms of the objects, so that we can define "whiskerings". (Compositions of "1-morphisms" with "2-morphisms" to create new "2-morphisms".)
Commutative squares obviously allow this. But are they a necessary choice for this purpose?
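As a sketch of why commutative squares allow this (the functors $K: \mathcal{B} \to \mathcal{C}$ and $L: \mathcal{D} \to \mathcal{E}$, and the names $H$ and $Arr(L)$, are notation introduced here): write a commutative square from $h$ to $h'$ as a pair $(u, v)$ with $h' \circ u = v \circ h$. If a morphism $\eta: F \to G$ is given by a functor $H: \mathcal{C} \to Arr(\mathcal{D})$, then both whiskerings are plain composites of functors,

$$\eta K := H \circ K: \mathcal{B} \to Arr(\mathcal{D}), \qquad L\eta := Arr(L) \circ H: \mathcal{C} \to Arr(\mathcal{E}),$$

where $Arr(L)$ sends $h \mapsto L(h)$ and $(u, v) \mapsto (L(u), L(v))$. The second composite needs $Arr(L)$ to be well defined, and for commutative squares it is, since $h' \circ u = v \circ h$ implies $L(h') \circ L(u) = L(v) \circ L(h)$. This shows sufficiency only, not necessity.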
Also this is somewhat unsatisfying because apparently whiskerings can be used to axiomatize strict 2-categories, which were defined to describe the properties satisfied by natural transformations.
So the justification in terms of whiskerings seems to roughly reduce to "we should choose natural transformations because we want to use them to define whiskerings because we want to use them to define natural transformations", i.e. it seems circular.
On the other hand, if there is really some other reason (ideally stated solely in terms of functors and 1-categories) to believe that the conventional whiskering properties in the definition of strict 2-category are important / reasonable properties to demand of a definition of "2-morphism", regardless of whether that definition of "2-morphism" eventually leads to the definition of "natural transformation" or not, then maybe the justification isn't circular. It's not clear that any such reason exists, however.
Saying that the whiskering properties are necessary to get a category "enriched over $Cat$" isn't an acceptable answer, because enrichment is defined in terms of monoidal categories, and monoidal categories are defined in terms of natural transformations. (Although seemingly we can avoid this by considering only strict monoidal categories?)
Guess 2: We want the standard definition of $Arr(\mathcal{D})$ because the resulting definition of functor category makes the category of (small) categories into a Cartesian closed category. (Cf. the answers to these two related questions, here and here.)
Are commutative squares only sufficient for that purpose? Or are they necessary?
Also, what if we are happy to work with categories (like topological or measurable spaces) that aren't Cartesian closed? Then why should we care whether the category of small categories is Cartesian closed? Would it still make sense for natural transformations to be considered a more fundamental notion than other choices of morphism for functor categories?
Using the subset notation for potentially proper classes, I argued above why we might choose definitions such that $Mor([[\mathcal{C}, \mathcal{D}]]) \subseteq Ob([[\mathcal{C}, Arr(\mathcal{D})]]) \cong_{0-Cat} Ob([[\mathcal{C}, [[\mathbb{2}, \mathcal{D} ]] ]])$.
Making the same identification of morphisms with functors from $\mathbb{2}$ that we made when identifying $Arr(\mathcal{D})$ with (or defining it as) $[[\mathbb{2}, \mathcal{D} ]]$, we also have $Mor([[\mathcal{C}, \mathcal{D}]]) \cong_{0-Cat} Ob([[\mathbb{2}, [[\mathcal{C}, \mathcal{D}]] ]])$.
So we are always identifying (as 0-categories / classes, i.e. via bijections) $Ob([[\mathbb{2}, [[\mathcal{C}, \mathcal{D}]] ]])$ with some subclass of $Ob([[\mathcal{C}, [[\mathbb{2}, \mathcal{D} ]] ]])$, regardless of how we choose to define morphisms of functor categories.
Then perhaps for reasons of aesthetics / parsimony / "beauty" we might want to focus on definitions of morphism categories that additionally allow an equivalence of 1-category structure (i.e. via invertible functors). This would be largely the same as being Cartesian closed, and it is achieved by the standard definitions in terms of commutative squares / natural transformations. But this still doesn't resolve the issue of uniqueness.
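For concreteness, here is a sketch of the comparison meant above, assuming that $Cat$ is Cartesian closed with internal hom $[[-,-]]$ (the product $\times$ and the symbol $\cong$ for isomorphism of categories are notation not used above). Currying twice and swapping the factors of the product gives

$$[[\mathbb{2}, [[\mathcal{C}, \mathcal{D}]] ]] \;\cong\; [[\mathbb{2} \times \mathcal{C}, \mathcal{D}]] \;\cong\; [[\mathcal{C} \times \mathbb{2}, \mathcal{D}]] \;\cong\; [[\mathcal{C}, [[\mathbb{2}, \mathcal{D}]] ]].$$

Taking objects at both ends turns the inclusion of classes above into a bijection. This records only what Cartesian closedness would give; it does not by itself say whether other choices are possible.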