I've practiced Gauss-Jordan elimination and understand most of the algorithm insofar as it represents the different manipulations one can apply to a system of equations.
However, when it comes to reduced row echelon form, I don't understand why the leading ones have to run along the diagonal from top left to bottom right, and cannot instead run along the diagonal from bottom left to top right (at least that's what I was taught).
Is there something about putting the ones on the bottom-left-to-top-right diagonal that would inherently break the algorithm, does it just make certain mistakes more likely, or is it purely convention?
It's purely convention. It corresponds to having an expression for each variable only written in terms of later variables (so your expression for $x_2$ could depend on $x_3$, $x_{17}$, and so on, but it can never depend on $x_1$).
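To make that concrete, here is a small made-up system (the numbers are chosen purely for illustration) that is already in reduced row echelon form:

$$\left[\begin{array}{ccc|c} 1 & 0 & 2 & 3 \\ 0 & 1 & -1 & 4 \end{array}\right] \quad\Longrightarrow\quad x_1 = 3 - 2x_3, \qquad x_2 = 4 + x_3.$$

Both $x_1$ and $x_2$ are written only in terms of the later variable $x_3$, which is free; neither expression refers back to an earlier variable.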
The key thing you want to do when you're simplifying a system of linear equations is get to a point where you don't have circular dependencies: if your expression for $x_1$ depends on $x_2$ and your expression for $x_2$ also depends on $x_1$, you haven't really made progress toward a solution. Arranging for each variable to depend only on later variables is a convenient way to stop that from happening, but arranging for each variable to depend only on earlier variables (which is what your bottom-to-top diagonal approach would do) would work just as well.
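If you want to check this for yourself, here is a minimal Python sketch, not a reference implementation. It assumes a square system with a unique solution, and the function name `gauss_jordan` and its `anti_diagonal` flag are made up for this illustration. It runs the same elimination steps either way; the only difference is which row each leading one is placed in.

```python
from fractions import Fraction

def gauss_jordan(aug, anti_diagonal=False):
    """Reduce an n x (n+1) augmented matrix of a uniquely solvable system.

    anti_diagonal=False puts the leading ones on the usual main diagonal.
    anti_diagonal=True puts the leading one for column j in row n-1-j.
    (Hypothetical helper for illustration only.)
    """
    m = [[Fraction(x) for x in row] for row in aug]
    n = len(m)
    done = []  # rows that already hold a leading one
    for col in range(n):
        target = n - 1 - col if anti_diagonal else col
        # Pick any unfinished row with a nonzero entry in this column.
        candidates = [r for r in range(n) if r not in done and m[r][col] != 0]
        if not candidates:
            raise ValueError("singular matrix: not handled by this sketch")
        src = candidates[0]
        m[target], m[src] = m[src], m[target]       # row swap
        pivot = m[target][col]
        m[target] = [x / pivot for x in m[target]]  # scale to a leading one
        for r in range(n):                          # clear the rest of the column
            if r != target and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[target])]
        done.append(target)
    return m

# x + 2y = 5, 3x + 4y = 11
system = [[1, 2, 5], [3, 4, 11]]
standard = gauss_jordan(system)                      # rows read: x = 1, y = 2
flipped = gauss_jordan(system, anti_diagonal=True)   # rows read: y = 2, x = 1
```

Both results encode the same solution, $x = 1$ and $y = 2$; the flipped version just lists the rows in the opposite order.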
On the other hand, you'll probably make fewer mistakes if you do the same thing every time, and it's useful (especially in a class setting) for everyone to agree on an algorithm. And if you have to pick one convention, the "depending on later variables" convention is kind of nice in that it often leads to you reducing to something that looks like an identity matrix, which may be easier to remember and/or help to reinforce the idea that the identity matrix is important.