I have searched high and low for an explanation of the general formulae for gradient, divergence and curl in orthogonal curvilinear coordinates, and I haven't found one which is both mathematically rigorous and doesn't require a background in tensors.
I was wondering if anyone could link to or explain a full derivation of the divergence that doesn't require a background in tensor analysis?
You ask for rigour, but do not give your background, so it is hard to estimate a suitable level of detail for you. I give a reasonably lengthy preamble to make sure that we are on the same page, as it were.
A curvilinear coordinate system is an injective smooth$^*$ map $ (u_i) \mapsto \mathbf{x}(u_i) $, taking $u$ in an open subset $U \subset \mathbb{R}^n$ to $\mathbf{x} \in \mathbb{R}^n$. $(u_i)$ are called the coordinates of a point. The tangent space at a point is the vector space of tangent vectors to curves in $\mathbb{R}^n$ passing through the point, which curves can be specified by parametrising the coordinates in $U$. Thus the vectors $$ \mathbf{E}_i = \frac{\partial \mathbf{x}}{\partial u_i} $$ span the tangent space (and if the above was not in your mathematical vocabulary, these basis vectors are a reasonable place to start: it is not trivial that the tangent space is a vector space, but the proof is both standard and irrelevant to the question).
$^*$ Strictly we can manage with much less, but the derivative of a smooth function is smooth, which is not the case for less restrictive function spaces.
The basis $\mathbf{E}_i$ and its variation from point to point tell us almost everything we want to know about the coordinate system. Firstly, the coordinates are called orthogonal if $\mathbf{E}_i \cdot \mathbf{E}_j = 0 $ when $i \neq j$ (here we use the standard inner product in $\mathbb{R}^n$, which we can get away with in this case). When this occurs we write $h_i = \sqrt{\mathbf{E}_i \cdot \mathbf{E}_i}$ for the scale factors, and define the normalised basis vectors by $\mathbf{E}_i = h_i \mathbf{e}_i$. (At this point we note that if we do this, we are forced to not use summation convention.) Thus a vector field (a smoothly varying function assigning each $\mathbf{x}$ a tangent vector) may be written as $\mathbf{F} = \sum_i F_i \mathbf{e}_i$, where $F_i = \mathbf{e}_i \cdot \mathbf{F}$.
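As a concrete sanity check (using spherical coordinates $(r,\theta,\varphi)$, which is my own choice of example), one can approximate the tangent basis $\mathbf{E}_i = \partial\mathbf{x}/\partial u_i$ by finite differences and confirm both the orthogonality condition and the familiar scale factors $h_r = 1$, $h_\theta = r$, $h_\varphi = r\sin\theta$:

```python
import math

# Spherical coordinates (r, theta, phi) -> Cartesian (x, y, z).
def sph_to_cart(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def tangent_basis(u, h=1e-6):
    """Central-difference approximation of E_i = dx/du_i."""
    E = []
    for i in range(3):
        up = list(u); up[i] += h
        um = list(u); um[i] -= h
        E.append(tuple((a - b) / (2 * h)
                       for a, b in zip(sph_to_cart(up), sph_to_cart(um))))
    return E

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u = (2.0, 0.7, 1.1)                  # an arbitrary point (r, theta, phi)
E = tangent_basis(u)

# E_i . E_j vanishes for i != j, so these coordinates are orthogonal,
# and the scale factors h_i = |E_i| come out as 1, r and r sin(theta).
hs = [math.sqrt(dot(Ei, Ei)) for Ei in E]
```

The same check works for any coordinate map you substitute for `sph_to_cart`.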
Gradient
The gradient is the vector dual to the linear map on vectors given by the directional derivative of a function, $$ (\nabla f(\mathbf{x})) \cdot \mathbf{v} = df_{\mathbf{x}}(\mathbf{v}) = \left.\frac{d}{dt}\right|_{t=0} f(\mathbf{x}+t\mathbf{v}). $$ The advantage of this definition is that it is independent of any particular coordinate system. Inter alia, this means that it can be used to derive an expression for the gradient in a particular coordinate system. We have by the multivariate chain rule $$ \frac{\partial f}{\partial u_i} = (\nabla f) \cdot \frac{\partial \mathbf{x}}{\partial u_i} = (\nabla f) \cdot (h_i \mathbf{e}_i), $$ and rearranging, we find that $$ (\nabla f) \cdot \mathbf{e}_i = \frac{1}{h_i} \frac{\partial f}{\partial u_i}, $$ and as noted above, since the $\mathbf{e}_i$ are orthonormal, this gives $$ \nabla f = \sum_i \frac{1}{h_i} \frac{\partial f}{\partial u_i} \mathbf{e}_i. $$
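This formula can be verified numerically, again in spherical coordinates (the test function and point below are my own illustrative choices): compute $\frac{1}{h_i}\partial f/\partial u_i$ in curvilinear coordinates, push the result back to Cartesian components via the $\mathbf{e}_i$, and compare with the ordinary Cartesian gradient:

```python
import math

def sph_to_cart(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def f_cart(p):
    X, Y, Z = p
    return X * Y + Z ** 2                  # arbitrary smooth test function

def partial_diff(g, v, i, h=1e-6):
    """Central difference of g along coordinate i."""
    vp = list(v); vp[i] += h
    vm = list(v); vm[i] -= h
    return (g(vp) - g(vm)) / (2 * h)

u = (2.0, 0.7, 1.1)
r, th, _ = u
h_scale = (1.0, r, r * math.sin(th))       # h_r, h_theta, h_phi

# Unit basis vectors e_i = (1/h_i) dx/du_i, by finite differences.
e = [tuple(partial_diff(lambda v, k=k: sph_to_cart(v)[k], u, i) / h_scale[i]
           for k in range(3))
     for i in range(3)]

# grad f from the curvilinear formula, expressed in Cartesian components.
f_sph = lambda v: f_cart(sph_to_cart(v))
g = [partial_diff(f_sph, u, i) / h_scale[i] for i in range(3)]
grad_formula = tuple(sum(g[i] * k_comp for i in range(3))
                     for k_comp in [0, 1, 2]) if False else \
               tuple(sum(g[i] * e[i][k] for i in range(3)) for k in range(3))

# grad f computed directly in Cartesian coordinates at the same point.
p = sph_to_cart(u)
grad_cart = tuple(partial_diff(f_cart, p, k) for k in range(3))
```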
Divergence
$\renewcommand{\div}{\operatorname{div}}$ There are several approaches one can take to the divergence. The first is to use the definition $$ \div{\mathbf{F}}(\mathbf{x}) = \lim_{\substack{\operatorname{vol}(S) \to 0 \\ \mathbf{x} \text{ inside } S}} \frac{1}{\operatorname{vol}(S)} \oint_S \mathbf{F} \cdot d\mathbf{S}, $$ where $\operatorname{vol}(S)$ denotes the volume enclosed by $S$, for a "reasonable" family of hypersurfaces $S$: commonly closed balls or hypercuboids are taken (the latter giving the Cartesian expression as a sum of $\partial F_i/\partial x_i$), but any family of compact rectifiable hypersurfaces shrinking to $\mathbf{x}$ will do.
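The definition can be tested directly: take a small cube centred at $\mathbf{x}$, approximate the flux out of each face by its midpoint value, and divide by the volume. A minimal numerical sketch (the test field and point are my own choices):

```python
import math

def F(p):
    X, Y, Z = p
    return (X * X, Y * Z, math.sin(X) + Z)    # arbitrary test field

def div_by_flux(F, p, a=1e-4):
    """Flux of F out of a cube of side a centred at p, divided by a^3.

    Each face integral is approximated by its midpoint value times a^2;
    as a -> 0 this tends to div F at p.
    """
    flux = 0.0
    for i in range(3):
        pp = list(p); pp[i] += a / 2          # face with outward normal +e_i
        pm = list(p); pm[i] -= a / 2          # face with outward normal -e_i
        flux += (F(pp)[i] - F(pm)[i]) * a * a
    return flux / a ** 3

p = (0.3, -1.2, 0.8)
approx = div_by_flux(F, p)
exact = 2 * p[0] + p[2] + 1                   # d/dx(x^2) + d/dy(yz) + d/dz(z)
```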
As may be apparent, however, while this definition is coordinate-free, it is sometimes difficult to specify the surface in a way that makes the calculation easy. Instead, we can exploit some other coordinate-free identities to give a much simpler derivation (and better mnemonic) of the expression for the divergence. We have the identity $$ \div{(\phi\mathbf{F})} = \nabla \phi \cdot \mathbf{F} + \phi \div{\mathbf{F}}, $$ and rearranging and integrating over a volume $V$ with smooth boundary $\partial V$ gives $$ \int_V \nabla \phi \cdot \mathbf{F} \, dV = \int_V (\div{(\phi\mathbf{F})} - \phi \div{\mathbf{F}} ) \, dV = \int_{\partial V} \phi\mathbf{F}\cdot d\mathbf{S} - \int_V \phi \div{\mathbf{F}} \, dV $$ for any smooth function $\phi$, where the second equality uses the divergence theorem. We can choose $\phi$ to be supported on a compact set in the interior of $V$, in which case the surface integral is zero, so $$ \int_V \nabla \phi \cdot \mathbf{F} \, dV = - \int_V \phi \div{\mathbf{F}} \, dV. $$ If we can find the volume element in our coordinate system, we know everything on the left and most of the right. Because the coordinates are orthogonal, $\{\mathbf{e}_i\}$ is an orthonormal basis, so using the Jacobian one finds that the volume element is $h_1 h_2 \dotsm h_n \, du_1 \, du_2 \dotsm du_n$. So writing the left-hand side in coordinates, $$ \int_V \sum_i \frac{1}{h_i} \frac{\partial \phi}{\partial u_i} F_i h_1 h_2 \dotsm h_n \, du_1 \, du_2 \dotsm du_n, $$ we can now use ordinary integration by parts to shift the derivatives off $\phi$. Since $\phi$ is only nonzero on the interior of $V$, again the boundary terms vanish and we obtain $$ -\int_V \phi\sum_i \frac{\partial}{\partial u_i} \left(\frac{h_1 h_2 \dotsm h_n}{h_i} F_i \right) du_1 \, du_2 \dotsm du_n. $$ In coordinates, the right-hand side is $$ \int_V \phi \div{\mathbf{F}} \, h_1h_2\dotsm h_n \, du_1\, du_2 \dotsm du_n, $$ so we now have both sides with no derivatives on $\phi$.
This means that we can apply the fundamental lemma of the calculus of variations: since the two integrands agree when integrated against every compactly supported smooth $\phi$, they must be equal.
We find therefore that $$ h_1h_2\dotsm h_n \div{\mathbf{F}} = \sum_i \frac{\partial}{\partial u_i} \left(\frac{h_1 h_2 \dotsm h_n}{h_i} F_i \right), $$ or if we write $h=h_1h_2\dotsm h_n$, $$ \div{\mathbf{F}} = \frac{1}{h} \sum_i \frac{\partial}{\partial u_i} \left(\frac{h}{h_i} F_i \right). $$ This method emphasises that the negative of the divergence is the adjoint of the gradient in the inner product $\int_V \mathbf{F} \cdot \mathbf{G} \, dV$.
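Here is a numerical check of this formula in spherical coordinates, where $h = h_r h_\theta h_\varphi = r^2\sin\theta$ (the test field and point below are my own choices, and the spherical components $F_i = \mathbf{e}_i\cdot\mathbf{F}$ are computed by finite differences):

```python
import math

def sph_to_cart(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def F_cart(p):
    X, Y, Z = p
    return (X * Z, Y * Y, X + Z)               # arbitrary test field

def partial_diff(g, v, i, h=1e-5):
    """Central difference of g along coordinate i."""
    vp = list(v); vp[i] += h
    vm = list(v); vm[i] -= h
    return (g(vp) - g(vm)) / (2 * h)

def scale(u):
    r, th, _ = u
    return (1.0, r, r * math.sin(th))          # h_r, h_theta, h_phi

def F_comp(u, i):
    """F_i = e_i . F at the point with coordinates u."""
    Ei = [partial_diff(lambda v, k=k: sph_to_cart(v)[k], u, i) for k in range(3)]
    Fc = F_cart(sph_to_cart(u))
    return sum(Ei[k] * Fc[k] for k in range(3)) / scale(u)[i]

h_prod = lambda v: scale(v)[0] * scale(v)[1] * scale(v)[2]   # h = r^2 sin(theta)

# div F = (1/h) sum_i d/du_i ( (h/h_i) F_i ), evaluated numerically.
u = (2.0, 0.7, 1.1)
div_formula = sum(
    partial_diff(lambda v, i=i: h_prod(v) / scale(v)[i] * F_comp(v, i), u, i)
    for i in range(3)
) / h_prod(u)

# Direct Cartesian divergence of the same field at the same point.
p = sph_to_cart(u)
div_cart = sum(partial_diff(lambda v, k=k: F_cart(v)[k], p, k) for k in range(3))
```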
Curl
$\DeclareMathOperator{\curl}{curl}$ Curl only exists in 3 dimensions, and is defined by $$ \mathbf{v} \cdot \curl{\mathbf{F}}(\mathbf{x}) = \lim_{\operatorname{area}(\gamma) \to 0} \frac{1}{\operatorname{area}(\gamma)} \oint_{\gamma} \mathbf{F} \cdot d\mathbf{l}, $$ where $\gamma$ is a rectifiable closed curve lying in the surface through $\mathbf{x}$ perpendicular to $\mathbf{v}$, $\mathbf{x}$ lies inside $\gamma$, and $\operatorname{area}(\gamma)$ denotes the area $\gamma$ encloses.
One has (at least) the same two options here: either calculate with this definition directly using the image of a rectangle with sides along which only one coordinate varies, or exploit another vector calculus identity, such as $$ \curl{(\phi \mathbf{F})} = \nabla \phi \times \mathbf{F} + \phi \curl{\mathbf{F}}, $$ integrated over an open connected subset $S$ of a surface on which one coordinate is constant. One uses Stokes's theorem this time to write $$ \int_S \nabla \phi \times \mathbf{F} \cdot d\mathbf{S} = -\int_S \phi\curl{\mathbf{F}} \cdot d\mathbf{S} $$ for a $\phi$ compactly supported on the interior of $S$. The surface element can be computed in the usual way using the cross product of the tangent vectors of the parametrisation, and one again applies integration by parts and the Fundamental Lemma to derive the expression for each component of $\curl{\mathbf{F}}$ separately; this is considerably aided by the coordinates being orthogonal, so that $\mathbf{e}_i$ is a unit normal to the surface $u_i=\text{const}$. We find $$ \curl{\mathbf{F}} = \sum_{i,j,k} \epsilon_{ijk} \, \mathbf{e}_i \, \frac{1}{h_jh_k} \frac{\partial}{\partial u_j} (h_k F_k), $$ where $\epsilon_{ijk}$ is the Levi-Civita symbol and the sum runs over all three indices.
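As with the divergence, this formula can be checked numerically in spherical coordinates (the test field and point are again my own choices): compute the curvilinear components from the formula, push them back to Cartesian components via the $\mathbf{e}_i$, and compare with the Cartesian curl.

```python
import math

def sph_to_cart(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def F_cart(p):
    X, Y, Z = p
    return (Y * Z, X * X, Y + Z)              # arbitrary test field

def partial_diff(g, v, i, h=1e-5):
    """Central difference of g along coordinate i."""
    vp = list(v); vp[i] += h
    vm = list(v); vm[i] -= h
    return (g(vp) - g(vm)) / (2 * h)

def scale(u):
    r, th, _ = u
    return (1.0, r, r * math.sin(th))          # h_r, h_theta, h_phi

def unit_basis(u):
    """Unit vectors e_i = (1/h_i) dx/du_i, by finite differences."""
    e = []
    for i in range(3):
        Ei = [partial_diff(lambda v, k=k: sph_to_cart(v)[k], u, i) for k in range(3)]
        e.append([comp / scale(u)[i] for comp in Ei])
    return e

def F_comp(u, i):
    """F_i = e_i . F at the point with coordinates u."""
    Fc = F_cart(sph_to_cart(u))
    return sum(a * b for a, b in zip(unit_basis(u)[i], Fc))

def eps(i, j, k):
    """Levi-Civita symbol for 0-based indices."""
    return (j - i) * (k - j) * (k - i) // 2

u = (2.0, 0.7, 1.1)
hs = scale(u)
e = unit_basis(u)

# Components of curl F in the e_i basis, from the curvilinear formula.
c = [sum(eps(i, j, k) / (hs[j] * hs[k]) *
         partial_diff(lambda v, k=k: scale(v)[k] * F_comp(v, k), u, j)
         for j in range(3) for k in range(3) if eps(i, j, k) != 0)
     for i in range(3)]

# Push to Cartesian components and compare with the direct Cartesian curl.
curl_formula = [sum(c[i] * e[i][m] for i in range(3)) for m in range(3)]
p = sph_to_cart(u)
D = lambda k, m: partial_diff(lambda v: F_cart(v)[k], p, m)   # dF_k/dx_m
curl_cart = [D(2, 1) - D(1, 2), D(0, 2) - D(2, 0), D(1, 0) - D(0, 1)]
```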