I'm taking a course in General Relativity, and before getting into Physics itself, the required knowledge of Differential Geometry is being taught.
In that context, a linear connection on a smooth manifold $M$ was introduced as a collection of maps $\nabla^{(r,s)} :\Gamma(TM)\times \Gamma(T^r_s M)\to \Gamma(T^r_s M)$, each of which takes a vector field $X$ and an $(r,s)$-tensor field $T$ and produces an $(r,s)$-tensor field $\nabla^{(r,s)}_X T$. We, however, drop the $(r,s)$ and just denote all of these maps by $\nabla$. The definition requires $\nabla$ to satisfy:
It is $C^\infty(M)$-linear in the first entry, that is, $X\mapsto \nabla_X T$ is $C^\infty(M)$-linear for fixed $T$.
It is linear on the second entry, that is $T\mapsto \nabla_X T$ is linear for fixed $X$.
It obeys the Leibniz rule in the second entry, that is, $\nabla_X(T\otimes S)=(\nabla_X T)\otimes S+ T\otimes (\nabla_X S)$ for fixed $X$.
It reduces to $X$ itself on $(0,0)$-tensors, that is, $\nabla_X f = Xf$ for fixed $X$ and $f\in C^\infty(M)$.
Now given a smooth manifold with linear connection $(M,\nabla)$ one can define the Riemann Curvature Tensor as the tensor field $R : \Gamma(T^\ast M)\times \Gamma(TM)\times \Gamma(TM)\times \Gamma(TM)\to C^\infty(M)$ given by
$$R(\omega,Z,X,Y)=\omega(\nabla_X \nabla_Y Z - \nabla_Y\nabla_X Z-\nabla_{[X,Y]}Z).$$
The problem is that this thing was just defined with no motivation whatsoever. It is then said that it gives the change in a vector when it is parallel transported around a loop formed by $X$ and $Y$. It is also said that this characterizes the curvature of $\nabla$.
My question here is: how can one derive this tensor? I mean, given that we have a connection $\nabla$ and we want to define its curvature, how can we derive this expression and discover that this tensor field will do? I don't even know why it should be a tensor field, let alone how to follow some steps to arrive at the correct tensor field.
I just don't like the approach of "define this because it works". I want to be able to find out that this is the correct thing to do.
Since you're taking a general relativity course, let me answer this from the point of view of a physicist.
Have you heard of tidal forces in general relativity? The intuition is that, in a spacetime with gravity, a family of objects that are initially moving at the same velocity but are positioned slightly displaced from one another will deviate from one another as they move over time. Said another way, gravity is something that pulls objects apart. Think of astronauts being stretched like spaghetti when they fall into black holes.
To describe this physical intuition mathematically, let us parametrise the positions of our family of objects as $x^\mu(s,\tau)$, where $\tau$ is proper time and $s$ is a parameter that labels the various objects in our family (i.e. $s$ parametrises the different body parts of our poor astronaut). Assuming that these objects travel along geodesics, which is true of freely falling objects in general relativity, you can prove that the separation $\partial x^\mu/\partial s$ between neighbouring objects evolves according to $$\frac{D^2}{d\tau^2}\frac{\partial x^\mu}{\partial s} = R^\mu_{\nu\rho\sigma} \frac{\partial x^\nu}{\partial\tau} \frac{\partial x^\rho}{\partial\tau} \frac {\partial x^\sigma}{\partial s}.$$ In other words, the Riemann curvature tensor $R^\mu_{\nu\rho\sigma}$, as defined in your course, is precisely the thing that measures the strength of gravitational tidal forces! The bigger the Riemann curvature tensor, the more strongly neighbouring objects are pulled apart by gravity.
For a proof of this tidal force formula, look up "geodesic deviation" on Wikipedia.
Now, rather than starting with your definition of the Riemann curvature tensor and showing that it obeys this tidal force property, why not turn the argument the other way round and define the Riemann curvature tensor to be the quantity that appears in this tidal force equation, which, after some calculation, turns out to agree with the definition from your course?
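Here is a sketch of how that calculation goes. The names $u=\partial x/\partial\tau$ (the velocity) and $\xi=\partial x/\partial s$ (the separation) are my own shorthand, and I am assuming the connection is torsion-free, as the Levi-Civita connection used in GR is. Because $u$ and $\xi$ are coordinate vector fields on the surface swept out by the family, $[u,\xi]=0$, and torsion-freeness then gives $\nabla_u\xi=\nabla_\xi u$. The geodesic equation says $\nabla_u u=0$. Hence
$$\begin{aligned}
\nabla_u\nabla_u\xi &= \nabla_u\nabla_\xi u \\
&= \nabla_\xi\nabla_u u + \left(\nabla_u\nabla_\xi u-\nabla_\xi\nabla_u u-\nabla_{[u,\xi]}u\right) \\
&= \nabla_u\nabla_\xi u-\nabla_\xi\nabla_u u-\nabla_{[u,\xi]}u .
\end{aligned}$$
The second line just adds and subtracts $\nabla_\xi\nabla_u u$ and inserts $\nabla_{[u,\xi]}u$, which vanishes here. The combination that survives is exactly the expression inside $\omega(\cdot)$ in your definition with $Z=u$, $X=u$, $Y=\xi$, so in components
$$(\nabla_u\nabla_u\xi)^\mu = R^\mu_{\nu\rho\sigma}\,u^\nu u^\rho \xi^\sigma .$$
So if you ask "what controls the relative acceleration of neighbouring geodesics?", the commutator of covariant derivatives is forced on you.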
Finally, you asked why it is that the Riemann tensor must be a tensor. The tensorial property is crucial if you're going to use this quantity in general relativity. The whole point of GR is to write equations of motion that look the same in any coordinate system, and tensors are the mathematical quantities that do this for you.
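Concretely, "being a tensor" means being $C^\infty(M)$-linear in every slot, so that the value at a point depends only on the values of the arguments at that point. A sketch of the check for the $Z$ slot, using only the Leibniz rule and $\nabla_X f = Xf$ from your definition (so that $\nabla_X(fZ)=(Xf)Z+f\nabla_X Z$):
$$\begin{aligned}
\nabla_X\nabla_Y(fZ) &= (XYf)\,Z+(Yf)\,\nabla_X Z+(Xf)\,\nabla_Y Z+f\,\nabla_X\nabla_Y Z,\\
\nabla_Y\nabla_X(fZ) &= (YXf)\,Z+(Xf)\,\nabla_Y Z+(Yf)\,\nabla_X Z+f\,\nabla_Y\nabla_X Z,\\
\nabla_{[X,Y]}(fZ) &= ([X,Y]f)\,Z+f\,\nabla_{[X,Y]}Z .
\end{aligned}$$
Subtracting the second and third lines from the first, the mixed first-derivative terms cancel, and the terms multiplying $Z$ combine to $(XYf-YXf-[X,Y]f)\,Z=0$, leaving
$$\nabla_X\nabla_Y(fZ)-\nabla_Y\nabla_X(fZ)-\nabla_{[X,Y]}(fZ)=f\left(\nabla_X\nabla_Y Z-\nabla_Y\nabla_X Z-\nabla_{[X,Y]}Z\right).$$
This is what the $\nabla_{[X,Y]}$ term is there for: without it, the commutator of covariant derivatives alone would not be $C^\infty(M)$-linear. The $X$ and $Y$ slots work similarly, using $[fX,Y]=f[X,Y]-(Yf)X$, and linearity in $\omega$ is immediate.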