I think I now understand the construction of the tensor product of $R$-modules. One defines $$F_R(M \times N) := \bigoplus_{(m,n) \in M \times N} R\delta_{(m,n)}$$ and chooses the submodule $D$ so that $\otimes: M \times N \to M \otimes N = F_R(M \times N)\, / \, D$ is bilinear. By the universal mapping property of the free module, for any bilinear map $B: M \times N \to P$, with $P$ some $R$-module, we can find a linear map $\ell: F_R(M \times N) \to P$ with $\ell \circ \iota = B$, where $\iota: (m,n) \mapsto \delta_{(m,n)}$. Since we can show that $D \subseteq \ker{\ell}$ and $\ell$ is linear, $\ell$ descends to the quotient, so we may essentially disregard the submodule $D$ and write $\ell \circ \otimes = B$. Thus every such $B$ factors through the tensor product via a unique linear map $\ell$.
Now my question is as follows:
Does someone have a proof of why this linear map must exist? I understand how everything works if the universal property holds, but why and when does it hold?
My other question: why is it necessary to construct the tensor product at all? Why can't we just construct $F_R(M \times N)$ and use the linear map from it to $P$? Is it because we want the properties that come along with modding out $D$?
I'll assume $R$ is commutative so $M\otimes_RN$ is an $R$-module.
The elements of $M\otimes_RN$ look like sums of pure tensors of the form $m\otimes n$ with $m\in M,n\in N$, modulo the relations that describe bilinearity of the $\otimes$ symbol. The $R$-module structure of the tensor product is defined by $r(m\otimes n)=rm\otimes n$ (which equals $m\otimes rn$). In order to construct this module formally, we have to quotient the free module on the Cartesian product as we do.
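To make "the relations that describe bilinearity" concrete: in the notation of the question, with $\delta_{(m,n)}$ the basis element of $F_R(M\times N)$ attached to $(m,n)$, the usual choice of $D$ is the submodule generated by all elements of the following four shapes, for $m,m'\in M$, $n,n'\in N$, $r\in R$:

$$\begin{aligned}
&\delta_{(m+m',\,n)}-\delta_{(m,n)}-\delta_{(m',n)}, &\qquad &\delta_{(m,\,n+n')}-\delta_{(m,n)}-\delta_{(m,n')},\\
&\delta_{(rm,\,n)}-r\,\delta_{(m,n)}, &\qquad &\delta_{(m,\,rn)}-r\,\delta_{(m,n)}.
\end{aligned}$$

Writing $m\otimes n$ for the class of $\delta_{(m,n)}$ in the quotient, these relations say exactly that $(m+m')\otimes n=m\otimes n+m'\otimes n$, that $m\otimes(n+n')=m\otimes n+m\otimes n'$, and that $rm\otimes n=r(m\otimes n)=m\otimes rn$.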
One checks the map $\otimes:M\times N\to M\otimes_RN$ given by $(m,n)\mapsto m\otimes n$ is bilinear. Suppose we have any other bilinear map $f:M\times N\to P$ and we want to find out if it can be factored as
$$M\times N\xrightarrow{\otimes}M\otimes_RN\xrightarrow{\tilde{f}}P.$$
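Necessarily $\tilde{f}(m\otimes n)=f(m,n)$, and since the pure tensors generate $M\otimes_RN$, this determines $\tilde{f}$ uniquely. One also checks that the rule $\tilde{f}(m\otimes n)=f(m,n)$ really does give a well-defined $R$-module morphism. This is the only place where bilinearity of $f$ is used, and it works for every bilinear $f$ into every $R$-module $P$, so the factorization always exists.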
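Here is a sketch of the well-definedness check, assuming the construction $M\otimes_RN=F_R(M\times N)/D$ from the question, with $D$ generated by the bilinearity relations; the name $\ell$ for the auxiliary map is just a label. Because $F_R(M\times N)$ is free on the $\delta_{(m,n)}$, there is a unique linear map $\ell:F_R(M\times N)\to P$ with $\ell(\delta_{(m,n)})=f(m,n)$, and this needs no hypothesis on $f$. Bilinearity of $f$ is what makes $\ell$ vanish on the generators of $D$:

$$\begin{aligned}
\ell\big(\delta_{(m+m',\,n)}-\delta_{(m,n)}-\delta_{(m',n)}\big)&=f(m+m',n)-f(m,n)-f(m',n)=0,\\
\ell\big(\delta_{(rm,\,n)}-r\,\delta_{(m,n)}\big)&=f(rm,n)-r\,f(m,n)=0,
\end{aligned}$$

and likewise in the second variable. Hence $D\subseteq\ker\ell$, so $\ell$ descends to a linear map $\tilde{f}:F_R(M\times N)/D\to P$ satisfying $\tilde{f}(m\otimes n)=f(m,n)$.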
Why not just use the free module on the Cartesian product? Because it's big and unwieldy, and it has too much redundancy. More precisely, linear maps out of $F_R(M\times N)$ correspond to arbitrary functions $M\times N\to P$, bilinear or not, so the free module does not single out the bilinear maps at all. We want things as tight as possible; it's a sort of "minimal information needed to specify" situation. Any $R$-module homomorphism is determined by where it sends the elements of a chosen generating set. The generating set of $F_R(M\times N)$ is all of $M\times N$, and the values $f(m,n)$ for all $(m,n)\in M\times N$ are way, way more information than we need to pin down a bilinear $f$. Somehow we need to make it tighter in order to remove all of the ugly redundancy. The tensor product achieves that: any bilinear map $f:M\times N\to P$ corresponds to a linear map $\tilde{f}$ on $M\otimes_RN$, which is determined by where it sends a generating set of $M\otimes_RN$, for instance the tensors $m_i\otimes n_j$ where the $m_i$ generate $M$ and the $n_j$ generate $N$.
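A small example may make the size difference tangible (the particular field and spaces are chosen only for illustration). Take $R=\mathbb{F}_2$ and $M=N=\mathbb{F}_2^2$ with standard basis $e_1,e_2$. Then

$$\operatorname{rank}F_R(M\times N)=|M\times N|=4\cdot 4=16,\qquad \dim_{\mathbb{F}_2}\big(M\otimes_RN\big)=2\cdot 2=4,$$

where $M\otimes_RN$ has basis $\{e_i\otimes e_j\}_{i,j\in\{1,2\}}$. A bilinear map $f$ is pinned down by the four values $f(e_i,e_j)$, because

$$f\Big(\sum_i a_ie_i,\ \sum_j b_je_j\Big)=\sum_{i,j}a_ib_j\,f(e_i,e_j),$$

whereas a linear map out of $F_R(M\times N)$ requires sixteen independent values. Over an infinite field the gap is more dramatic still: the free module has infinite rank, while the tensor product of finite-dimensional spaces stays finite-dimensional.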
Indeed, if one wants a universal "way station" through which all bilinear maps factor, this $R$-module must be generated by the images of all the $(m,n)\in M\times N$, and hence contain their sums as well, but it must also be subject to the bilinearity relations, since the map from $M\times N$ to this way station needs to be bilinear. This exactly characterizes what $M\otimes_RN$ is!
The tensor product, and its universal property, is extremely useful. Multilinear maps become linear maps, hom spaces can be rearranged with tensor-hom adjunction, scalars can be extended.
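For reference, the first two of these uses can be written as natural isomorphisms, still assuming $R$ is commutative; the notation $\operatorname{Bil}_R(M\times N;P)$ for the set of bilinear maps is a label used here, not something fixed above:

$$\operatorname{Hom}_R(M\otimes_RN,\,P)\;\cong\;\operatorname{Bil}_R(M\times N;\,P)\;\cong\;\operatorname{Hom}_R\big(M,\,\operatorname{Hom}_R(N,P)\big),\qquad \tilde{f}\;\leftrightarrow\;f\;\leftrightarrow\;\big(m\mapsto f(m,-)\big).$$

For the third, given a homomorphism of commutative rings $R\to S$, the $R$-module $S\otimes_RM$ becomes an $S$-module via

$$s\cdot(s'\otimes m)=ss'\otimes m.$$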