My question is about distributional meanings in PDEs. My specific question is at the bottom, but I'd be interested in a bit of general theory (even if this is a link to a particularly good set of notes). In my lectures, not much is said about them and I don't exactly understand what's going on!

(I believe that $|x|^{-1}$ is $\| x \|^{-1}$, not $|x_1|^{-1}$; this makes sense in the latter part of the question.)
I think that, given $T \in \mathcal S'(\Bbb R^3)$ and letting $f:\Bbb R^3 \rightarrow \Bbb R$ via $f(x) = \| x \|^{-1}$, we obtain $$-T(\Delta f) = 4 \pi \delta_0 \ \iff T({2 \over {\|x\|^3}}(3\|x\| + 4) = 4 \pi \delta_0,$$ by taking the Laplacian $\Delta$ of $f$. Is this correct?
If it is, then I still don't know how to do the next bit. By taking the divergence of $R_\epsilon := B_R \setminus B_\epsilon$, we should get $0$ on the LHS for all $\epsilon > 0$, but then $4\pi$ in the limit; how do I take the divergence with the $T$ there? Do I use the integral notation and then try using one of Green's identities?
In various branches of mathematics it turns out to be profitable to shift attention from what some objects are to what they do. In case of functions, we shift attention from (locally integrable) functions $f$ being some set of ordered pairs, etc to what it does: it defines a linear functional on the set of test functions, by
$$\varphi \mapsto \int f\varphi \tag1$$ When a derivative $f'$ exists, it also acts on test functions: $$\varphi \mapsto \int f'\varphi = -\int f\varphi' \tag2$$ Then we realize that the right hand side in (2) makes sense for any $f$, even discontinuous or nondifferentiable ones. So, even though we may not have a concrete picture of what $f'$ is in such cases, we can still say what it does to test functions.
Standard example is $f$ being the Heaviside function. For any test function $\varphi$ we have $$-\int f\varphi' = -\int_0^\infty \varphi' = \varphi(0)$$ This is written simply as $f'=\delta_0$, with $\delta_0$ being the name of the linear functional $\varphi\mapsto \varphi(0)$.
Similarly for the Laplacian: $\Delta f$ acts on test functions by $\varphi \mapsto \int f\Delta \varphi$. Another way to write this down: $\Delta f=T$ where $T$ is the distribution defined by $T(\varphi)=\int f\Delta \varphi$.
When you write $T(\Delta f) $, this is already a confusion. We don't plug $\Delta f$ into distributions, $\Delta f$ is a distribution, into which we plug test functions.
I don't understand where ${2 \over {\|x\|^3}}(3\|x\| + 4)$ came. Must be a computational mistake. The function $|x|^{-1}$ (yes, $|\cdot|$ means Euclidean norm; you can write it as $\|\cdot\|$ if you want) has zero Laplacian for $x\ne 0$, in the classical sense.
The calculation of $\Delta(|x|^{-1})$, following the hint in your question, is shown here (the answer talks about measures, but you can read it substituting "distribution" for measure")
As for a reference, Folland's Real Analysis has a chapter on distributions.