I've just started to study modular forms and I was wondering about how one would motivate the definition.
I agree that $f\left( \frac{az + b}{c z + d} \right) = (cz + d)^k f(z)$ is an interesting property, although I do not quite see where it arises from. But I find the conditions that $f$ is holomorphic on $\mathbf{H}$ or that $f$ is holomorphic at the cusp a bit confusing.
Why wouldn't I assert that $f$ is holomorphic on the whole complex plane? And what is the motivation for being holomorphic at the cusp?
There are two different (but related!) contexts in which modular forms arise. Let me briefly describe both of them.
Homogeneous functions on lattices. Consider the family $\mathcal L$ of all lattices $\Lambda\subseteq\mathbb C$. A function $F:\mathcal L\to\mathbb C$ is called homogeneous of weight $k$ if, for any $\alpha\in\mathbb C^\times$ (i.e. we exclude $0$) and any $\Lambda\in\mathcal L$, we have $$F(\alpha\Lambda)=\alpha^{-k}F(\Lambda).$$ For any two $\omega_1,\omega_2\in\mathbb C$ that are linearly independent over $\mathbb R$ we may consider the lattice $\Lambda = \omega_1\mathbb Z\oplus\omega_2\mathbb Z$ and then define a new function, $F(\omega_1,\omega_2):=F(\omega_1\mathbb Z\oplus\omega_2\mathbb Z)$. This function is also homogeneous, namely $$F(\alpha\omega_1,\alpha\omega_2)=\alpha^{-k}F(\omega_1,\omega_2).$$ Swapping $\omega_1,\omega_2$ if necessary, we may always assume that $\omega_1/\omega_2$ has positive imaginary part (see below for some further explanation). Taking $\alpha=1/\omega_2$ above, we see that the function $F$ is completely determined by its values on lattices having a basis one element of which is $1$. Thus it is quite natural to consider the function $$f(\tau)=F(\tau,1),$$ where $\tau=\omega_1/\omega_2$. This $f$ cannot be an arbitrary function: many different bases give rise to the same lattice, but to different $\tau$. Let's see what this gives us.
Any basis of the lattice $\tau\mathbb Z\oplus\mathbb Z$ has the form $(a\tau+b,c\tau+d)$, where $\left(\begin{matrix} a & b \\ c & d\end{matrix}\right)$ is an integer matrix of determinant $\pm 1$. In fact, since $$\operatorname{Im}\frac{a\tau+b}{c\tau+d}=\frac{(ad-bc)\operatorname{Im}\tau}{|c\tau+d|^2},$$ the condition that $(a\tau+b)/(c\tau+d)$ has positive imaginary part is equivalent to that determinant being positive. Plugging things in, we get that $f$ should satisfy, for any $\left(\begin{matrix} a & b \\ c & d\end{matrix}\right)\in\mathrm{SL}_2(\mathbb Z)$, $$f(\tau)=F(\tau,1)=F(\tau\mathbb Z\oplus\mathbb Z)=F((a\tau+b)\mathbb Z\oplus(c\tau+d)\mathbb Z)=F(a\tau+b,c\tau+d)=(c\tau+d)^{-k}F\left(\frac{a\tau+b}{c\tau+d},1\right)=(c\tau+d)^{-k}f\left(\frac{a\tau+b}{c\tau+d}\right).$$ This is precisely the weight $k$ condition for modular forms! Such functions $f(\tau)$ are called weakly modular. To get modular forms, we require that they be holomorphic on the half-plane (a natural assumption) and at the cusps (not easy to justify from this approach).
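As a sanity check of the derivation above, one can take a concrete homogeneous lattice function and test the weight $k$ condition numerically. A standard choice is the Eisenstein sum $G_k(\Lambda)=\sum_{\omega\in\Lambda\setminus\{0\}}\omega^{-k}$ for even $k\ge 4$, which clearly satisfies $G_k(\alpha\Lambda)=\alpha^{-k}G_k(\Lambda)$. The sketch below is only an illustration: the function names `eisenstein` and `weak_modularity_defect`, the truncation bound `N`, and the sample point `tau` are assumptions made here, and the infinite sum is replaced by a finite one.

```python
def eisenstein(tau: complex, k: int = 4, N: int = 50) -> complex:
    """Truncated lattice sum of (m*tau + n)**(-k) over |m|, |n| <= N, (m, n) != (0, 0)."""
    total = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue  # skip the zero lattice vector
            total += (m * tau + n) ** (-k)
    return total


def weak_modularity_defect(tau, a, b, c, d, k=4, N=50):
    """Return |f(g.tau) - (c*tau + d)**k * f(tau)| for the truncated sum f."""
    assert a * d - b * c == 1, "matrix must lie in SL_2(Z)"
    lhs = eisenstein((a * tau + b) / (c * tau + d), k, N)
    rhs = (c * tau + d) ** k * eisenstein(tau, k, N)
    return abs(lhs - rhs)


tau = 0.3 + 1.1j  # arbitrary sample point in the upper half-plane
print(weak_modularity_defect(tau, 0, -1, 1, 0))  # tau -> -1/tau
print(weak_modularity_defect(tau, 1, 1, 0, 1))   # tau -> tau + 1
```

For $\tau\mapsto-1/\tau$ the square truncation is mapped to itself by $(m,n)\mapsto(n,-m)$, so the defect is at the level of rounding error. For $\tau\mapsto\tau+1$ the truncation is not preserved, so the defect is small but nonzero and shrinks as `N` grows.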
Here it is also rather clear why it is not natural to expect those functions to be defined on the whole complex plane -- the parameter $\tau$ is supposed to be a lattice generator together with $1$. However, when $\tau\in\mathbb R$, $\tau$ and $1$ don't generate a lattice! Hence there is some kind of a "singularity" happening on the real line. As for the other half-plane, we could extend the definition there easily: $f(\overline\tau)=\overline{f(\tau)}$. However, we then get a function on a disconnected domain, which often brings up problems when trying to apply results of complex analysis. Thus restricting to just one connected component (one half-plane) is much simpler.
Differentials on the modular curve. (This part is less elementary, but it does give a very useful point of view.) The matrix group $\mathrm{SL}_2(\mathbb Z)$ acts on the upper half-plane by $\left(\begin{matrix} a & b \\ c & d\end{matrix}\right)\tau=\frac{a\tau+b}{c\tau+d}$ (this can be seen as coming from lattices again, but we needn't refer to functions on lattices for this; it is enough to ask when $\tau,1$ and $\tau',1$ generate similar lattices). We can now construct the quotient of the half-plane by this action: essentially, we pick a single point from every orbit and build a new object out of these points. This object is a so-called Riemann surface, meaning e.g. that we can consider holomorphic functions defined on it. This surface is usually denoted by $Y$.
This surface, however, feels incomplete, in a way: it has "missing points", as if it were a punctured sphere. This is where cusps come in: we add to the half-plane a point at every rational number and one at $\infty$, and extend the above action to these points. Now, when we quotient the half-plane with cusps by this action, we get a surface $X$ which has these missing points filled in. We call it the modular curve. It is a compact, connected space.
We would like to say that modular forms correspond to holomorphic functions on $X$. However, for that we would want these functions to be invariant, i.e. satisfy $f\left(\frac{a\tau+b}{c\tau+d}\right)=f(\tau)$. This isn't exactly the condition for modular forms (unless $k=0$). Instead, it turns out that modular forms correspond to different objects on $X$, namely differentials (which are like higher-order differential forms). We denote them by $f(z)(\mathrm{d}z)^l$. The difference from plain functions on $X$ is how they behave under a change of coordinates: writing $z=g(w)$ for some holomorphic function $g$, we expect to have $$f(z)(\mathrm{d}z)^l=f(g(w))(\mathrm{d}g(w))^l=f(g(w))(g'(w)\mathrm{d}w)^l=f(g(w))g'(w)^l(\mathrm{d}w)^l.$$ In particular, for $g(w)=\frac{aw+b}{cw+d}$ with $\left(\begin{matrix} a & b \\ c & d\end{matrix}\right)\in\mathrm{SL}_2(\mathbb Z)$, we have $g'(w)=\frac{a(cw+d)-c(aw+b)}{(cw+d)^2}=(cw+d)^{-2}$ because $ad-bc=1$, hence $$f(z)(\mathrm{d}z)^l=(cw+d)^{-2l}f\left(\frac{aw+b}{cw+d}\right)(\mathrm{d}w)^l.$$ If we want this object to be well defined on $X$, it should be invariant under the change of variable $w\mapsto z=g(w)$. Thus we would like $$f(w)(\mathrm{d}w)^l=(cw+d)^{-2l}f\left(\frac{aw+b}{cw+d}\right)(\mathrm{d}w)^l,$$ i.e. $f(w)=(cw+d)^{-2l}f\left(\frac{aw+b}{cw+d}\right)$, which is again the modular form condition, but for weight $2l$. Hence, at least for even weights, we can interpret modular forms as analytic objects defined on a certain Riemann surface. This also justifies the desire to be holomorphic at the cusps: it turns out to be a regularity condition on this object at one of the points of the curve, namely a point that was added when passing from $Y$ to $X$.
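To make "holomorphic at the cusp" concrete, here is the usual formulation, sketched for the full group $\mathrm{SL}_2(\mathbb Z)$. The symbols $q$, $\tilde f$ and $a_n$ are notation introduced only for this remark. Taking the matrix $\left(\begin{matrix} 1 & 1 \\ 0 & 1\end{matrix}\right)$ in the modularity condition gives $f(\tau+1)=f(\tau)$, so $f$ factors through $q=e^{2\pi i\tau}$: $$f(\tau)=\tilde f(q),\qquad 0<|q|<1,$$ and $\operatorname{Im}\tau\to\infty$ corresponds to $q\to 0$. Saying that $f$ is holomorphic at the cusp means that $\tilde f$ extends holomorphically to $q=0$, i.e. $$f(\tau)=\sum_{n\ge 0}a_n q^n,$$ with no negative powers of $q$. Equivalently, $f(\tau)$ stays bounded as $\operatorname{Im}\tau\to\infty$.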
I will just briefly mention that this point of view lets us use tools of algebraic geometry like the Riemann-Roch theorem, which, for example, lets us compute exactly the dimension of the space of modular forms.
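For reference, the resulting count for the full group $\mathrm{SL}_2(\mathbb Z)$ is the following standard formula, where $M_k$ (notation introduced here) denotes the space of modular forms of weight $k$. For even $k\ge 0$, $$\dim M_k=\begin{cases}\lfloor k/12\rfloor & \text{if } k\equiv 2 \pmod{12},\\ \lfloor k/12\rfloor+1 & \text{otherwise,}\end{cases}$$ and $M_k=0$ when $k$ is odd or negative. For instance, $\dim M_2=0$, $\dim M_4=1$ and $\dim M_{12}=2$.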