Lars Hormander, The Analysis of Linear Partial Differential Operators I, chapter II:
Let $X$ be an open set in $\mathbb {R}^n$. A distribution $u$ in $X$ is a linear form on $C_0^\infty(X)$ such that for every compact $K \subset X$ there exist constants $C$ and $k$ such that $$ |u(\phi)| \leq C \sum_{|\alpha| \leq k } \sup |\partial ^{\alpha} \phi|, \qquad \forall \phi \in C_0^\infty (K). $$
My question is: how does this definition even come to mind? Is there a reason to define a distribution like that?
A locally convex space $X$ (here an abstract space, not the open set above) is a linear space with a topology generated by a family of seminorms. A seminorm is a function $p:X\rightarrow [0,\infty)$ with $p(x+y)\leq p(x)+p(y)$ and $p(\lambda x) = |\lambda|\,p(x)$ for all scalars $\lambda$, but where $p(x) = 0$ does not necessarily imply that $x = 0$.
Now for such spaces, a linear functional $l:X\rightarrow \mathbb{C}$ is continuous if and only if
$$|l(x)|\leq C\sum_{k=1}^{n}p_k(x) \qquad \text{for all } x \in X$$
for some constant $C$ and some finite collection $p_1,\dots,p_n$ of seminorms from the family generating the topology. See for instance Theorem IV.3.1 in John B. Conway's book on functional analysis.
The principle is the same as for normed spaces (which are locally convex spaces whose topology is generated by a single norm), where the continuous linear functionals are those that satisfy
$$|l(x)|\leq C\|x\|$$ for some constant $C$ and all $x$.
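Here is a short sketch of why continuity forces such an estimate in the locally convex case. It is the standard scaling argument, written out here and not quoted from Conway. If $l$ is continuous at $0$, there are seminorms $p_1,\dots,p_n$ from the generating family and an $\varepsilon > 0$ such that $|l(y)| < 1$ whenever $p_k(y) < \varepsilon$ for all $k$. Given $x$, put $m = \max_k p_k(x)$. If $m > 0$, then $y = \frac{\varepsilon}{2m}x$ satisfies $p_k(y) \leq \varepsilon/2 < \varepsilon$, so
$$|l(x)| = \frac{2m}{\varepsilon}\,|l(y)| < \frac{2m}{\varepsilon} \leq \frac{2}{\varepsilon}\sum_{k=1}^{n} p_k(x).$$
If $m = 0$, then $p_k(tx) = 0$ for every $t > 0$, hence $t\,|l(x)| = |l(tx)| < 1$ for all $t > 0$, which forces $l(x) = 0$. So the estimate holds with $C = 2/\varepsilon$. Conversely, the estimate gives continuity at $0$, and hence everywhere by linearity.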
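Distributions are simply the linear functionals on the space of test functions $C_0^\infty(X)$ whose restriction to $C_0^\infty(K)$ is continuous for every compact $K \subset X$, where $C_0^\infty(K)$ carries the topology generated by the family of seminorms $\{p_\alpha\}$ with
$$p_{\alpha}(\phi) = \sup |\partial^\alpha \phi|.$$
Applying the continuity criterion above to $C_0^\infty(K)$ gives a bound by finitely many of the $p_\alpha$. Taking $k$ to be the largest $|\alpha|$ that occurs yields exactly the estimate in Hormander's definition, and it explains why $C$ and $k$ are allowed to depend on $K$.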
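Two small examples may help to see the estimate in action. They are illustrations chosen here, with $X = \mathbb{R}$ in the second one. For the Dirac delta at a point $x_0 \in X$, $\delta_{x_0}(\phi) = \phi(x_0)$, one has
$$|\delta_{x_0}(\phi)| = |\phi(x_0)| \leq \sup |\phi|,$$
so $C = 1$ and $k = 0$ work for every compact $K$. For
$$u(\phi) = \sum_{j=0}^{\infty} \phi^{(j)}(j), \qquad \phi \in C_0^\infty(\mathbb{R}),$$
the sum is finite for each $\phi$: if $K \subset [-N, N]$ and $\phi \in C_0^\infty(K)$, then $\phi^{(j)}(j) = 0$ for $j > N$, so
$$|u(\phi)| \leq \sum_{j=0}^{N} \sup |\partial^{j} \phi|.$$
Here $C = 1$ and $k = N$ work on $K$, but no single $k$ works for all compact sets, which is why the definition lets the constants depend on $K$.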
For more on this, one can have a look at Chapter IV in the book by John B. Conway.