It is a known result that, given generically noncommuting operators $A,B$, we have $$ A^n B=\sum_{k=0}^n \binom{n}{k} \operatorname{ad}^k(A)(B) A^{n-k},\tag A $$ where $\operatorname{ad}^k(A)(B)\equiv[\underbrace{A,[A,[\dots,[A}_k,B]\dots]] $.
This can be proved, for example, by induction on $n$ without much work.
However, while trying to get a better understanding of this formula, I realised that there is a much easier way to derive it, at least on a formal, intuitive level.
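As a quick sanity check on (A) itself, independent of any derivation, here is a minimal Python sketch that compares both sides for small integer matrices. The helper names (`matmul`, `ad_power`, `lhs`, `rhs`) and the two test matrices are arbitrary choices made for this illustration, not part of the original statement.

```python
from math import comb

def matmul(X, Y):
    """Product of two square integer matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def sub(X, Y):
    """Entrywise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def power(X, p):
    """X**p for p >= 0, starting from the identity matrix."""
    n = len(X)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = matmul(result, X)
    return result

def ad_power(A, B, k):
    """ad^k(A)(B): the k-fold nested commutator [A, [A, ... [A, B]]]."""
    D = B
    for _ in range(k):
        D = sub(matmul(A, D), matmul(D, A))
    return D

def lhs(A, B, n):
    """Left-hand side of (A): A^n B."""
    return matmul(power(A, n), B)

def rhs(A, B, n):
    """Right-hand side of (A): sum_k binom(n, k) ad^k(A)(B) A^(n-k)."""
    size = len(A)
    total = [[0] * size for _ in range(size)]
    for k in range(n + 1):
        term = matmul(ad_power(A, B, k), power(A, n - k))
        total = [[t + comb(n, k) * x for t, x in zip(rt, rx)]
                 for rt, rx in zip(total, term)]
    return total

# Arbitrary non-commuting test matrices (assumed purely for illustration).
A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]

for n in range(6):
    print(n, lhs(A, B, n) == rhs(A, B, n))
```

Integer entries keep the arithmetic exact, so the comparison is a genuine equality test rather than a floating-point approximation.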
The trick
Let $\hat{\mathcal S}$ and $\hat{\mathcal C}$ (standing for "shift" and "commute", respectively) denote operators that act on expressions of the form $A^k D^j A^\ell$ (denoting for simplicity $D^j\equiv\operatorname{ad}^j(A)(B)$) as follows:
\begin{align} \hat{\mathcal S} (A^k D^j A^\ell) &= A^{k-1} D^j A^{\ell+1}, \\ \hat{\mathcal C} (A^{k} D^{j} A^\ell) &= A^{k-1} D^{j+1} A^{\ell}. \end{align} In other words, $\hat{\mathcal S}$ "moves" the central $D$ block one step to the left, past a single $A$, while $\hat{\mathcal C}$ makes it "eat" the $A$ factor immediately to its left.
It is not hard to see that $\hat{\mathcal S}+\hat{\mathcal C}=\mathbb 1$, which is but another way to state the identity $$A[A,B]=[A,B]A+[A,[A,B]].$$ Moreover, crucially, $\hat{\mathcal S}$ and $\hat{\mathcal C}$ commute. Because of this, I can write
$$A^n B=(\hat{\mathcal S}+\hat{\mathcal C})^n (A^n B)=\sum_{k=0}^n\binom{n}{k} \hat{\mathcal S}^{n-k} \hat{\mathcal C}^{k}(A^n B),$$ and since $\hat{\mathcal S}^{n-k}\hat{\mathcal C}^{k}(A^n B)=D^k A^{n-k}$, this immediately gives me (A) without any need for recursion or other tricks.
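The bookkeeping behind this expansion can be sketched in a few lines of Python. Here an expression $A^k D^j A^\ell$ is encoded as the triple `(k, j, l)`, and `shift` and `commute` are hypothetical stand-ins for $\hat{\mathcal S}$ and $\hat{\mathcal C}$ acting on such triples; this only mirrors the formal rules above and says nothing about why they are legitimate.

```python
from math import comb

def shift(term):
    """S: A^k D^j A^l -> A^(k-1) D^j A^(l+1)."""
    k, j, l = term
    return (k - 1, j, l + 1)

def commute(term):
    """C: A^k D^j A^l -> A^(k-1) D^(j+1) A^l."""
    k, j, l = term
    return (k - 1, j + 1, l)

def apply_sum(expr):
    """Apply (S + C) once to a formal integer combination of triples."""
    out = {}
    for term, coeff in expr.items():
        for image in (shift(term), commute(term)):
            out[image] = out.get(image, 0) + coeff
    return out

def expand(n):
    """Expand (S + C)^n applied to A^n B, i.e. to the triple (n, 0, 0)."""
    expr = {(n, 0, 0): 1}
    for _ in range(n):
        expr = apply_sum(expr)
    return expr

def expected(n):
    """Right-hand side of (A): coefficient binom(n, k) on D^k A^(n-k)."""
    return {(0, k, n - k): comb(n, k) for k in range(n + 1)}

print(expand(2))  # {(0, 0, 2): 1, (0, 1, 1): 2, (0, 2, 0): 1}
print(all(expand(n) == expected(n) for n in range(8)))
```

The binomial coefficients appear only because `shift` and `commute` commute on triples, so each path through the $n$ steps is determined by how many times `commute` was chosen.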
The question
Now, this is all fine and dandy, but it leaves me wondering why this kind of thing works. It looks like I am somehow bypassing the nuisance of having to deal with non-commuting operators by switching to a space of "superoperators", in which the same manipulation can be expressed in terms of commuting objects.
I am not even sure how one could go about formalising these "superoperators" $\hat{\mathcal S},\hat{\mathcal C}$, as they seem to act on "strings of operators" rather than on the elements of the operator algebra themselves.
Is there a way to formalise this way of handling the expressions? Is this a well-known method in this context (I had never seen it, but I am not well-versed in these kinds of manipulations)?
To fix some notation, suppose that the operators $A$ and $B$ belong to a vector space $V$ of operators that is closed under composition, and that we are working with strings of $m$ operators. (For example, in $A^k D^j A^\ell$ we have $m = k + j + \ell + 1$, since $D^j = \operatorname{ad}^j(A)(B)$ is a combination of words containing $j$ copies of $A$ and one $B$.) Rather than writing a product $A^k B^j$ as an element of $V$, we can instead write $$ A^{\otimes k} \otimes B^{\otimes j} = \underbrace{A \otimes \cdots \otimes A}_k \otimes \underbrace{B \otimes \cdots \otimes B}_j \in V^{\otimes m},$$ an element of the $m$-th tensor power of $V$ (here with $m = k + j$). We have a linear multiplication map $\mu: V^{\otimes m} \to V$, which is just composition of operators, so for example $\mu(A^{\otimes k} \otimes B^{\otimes j}) = A^k B^j$. So the idea will be to define $\hat{\mathcal{S}}$ and $\hat{\mathcal{C}}$ as linear operators $V^{\otimes m} \to V^{\otimes m}$, check that they commute and add up to the identity, and then finally apply them to the particular tensor $A^{\otimes n} \otimes B$, which will give an identity in $V^{\otimes (n+1)}$ much like the one you're after. Applying the multiplication $\mu$ will then give the exact identity.
Defining the operators is not too hard. We can take $\hat{\mathcal{S}}, \hat{\mathcal{C}} : V^{\otimes m} \to V^{\otimes m}$ to be defined by the formulas $$ \begin{aligned} \hat{\mathcal{S}}(v_1 \otimes v_2 \otimes \cdots \otimes v_m) &= v_2 \otimes \cdots \otimes v_m \otimes v_1, \\ \hat{\mathcal{C}}(v_1 \otimes v_2 \otimes \cdots \otimes v_m) &= v_1 \otimes v_2 \otimes \cdots \otimes v_m - v_2 \otimes \cdots \otimes v_m \otimes v_1, \end{aligned}$$ so that $\hat{\mathcal{S}}$ is a cyclic shift of the tensor factors. We then check that these formulas do the right thing. For example, we need to make sure that $\mu(\hat{\mathcal{C}}^k (A^{\otimes (m-1)} \otimes B)) = A^{m-k-1} D^k$, and so on.
With those definitions, it is easy to see that $\hat{\mathcal{C}} = \mathbb{1} - \hat{\mathcal{S}}$, and so they also commute, since $\hat{\mathcal{C}}\hat{\mathcal{S}} = \hat{\mathcal{S}}\hat{\mathcal{C}} = \hat{\mathcal{S}} - \hat{\mathcal{S}}^2$. So we can write $$ A^{\otimes n} \otimes B = (\hat{\mathcal{S}} + \hat{\mathcal{C}})^n (A^{\otimes n} \otimes B) = \sum_{k=0}^n \binom{n}{k} (\hat{\mathcal{S}}^{n-k} \hat{\mathcal{C}}^k) (A^{\otimes n} \otimes B)$$ and finally applying $\mu$ on both sides gives the formula you are after.
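To make the check concrete, here is a minimal Python sketch in the free algebra on two letters, where a tensor is stored as a dictionary from words (tuples of `"A"` and `"B"`) to integer coefficients and $\mu$ simply reads a word as a product. The function names (`S`, `C`, `mul`, `ad_power`, `term`, `binomial_sum`) are assumptions made for this illustration. It checks that $\mu(\hat{\mathcal S}^{n-k}\hat{\mathcal C}^k(A^{\otimes n}\otimes B))$ agrees with $\operatorname{ad}^k(A)(B)\,A^{n-k}$ for small $n$.

```python
from math import comb

def simplify(expr):
    """Drop words whose coefficient has cancelled to zero."""
    return {w: c for w, c in expr.items() if c != 0}

def S(expr):
    """Cyclic shift: v1 (x) v2 ... vm -> v2 ... vm (x) v1, extended linearly."""
    out = {}
    for word, coeff in expr.items():
        shifted = word[1:] + word[:1]
        out[shifted] = out.get(shifted, 0) + coeff
    return simplify(out)

def sub(x, y):
    """Difference x - y of two formal combinations of words."""
    out = dict(x)
    for word, coeff in y.items():
        out[word] = out.get(word, 0) - coeff
    return simplify(out)

def C(expr):
    """C = 1 - S."""
    return sub(expr, S(expr))

def mul(x, y):
    """Product in the free algebra: concatenate words, multiply coefficients."""
    out = {}
    for w1, c1 in x.items():
        for w2, c2 in y.items():
            out[w1 + w2] = out.get(w1 + w2, 0) + c1 * c2
    return simplify(out)

A = {("A",): 1}
B = {("B",): 1}

def ad_power(k):
    """ad^k(A)(B) as a combination of words in the free algebra."""
    D = B
    for _ in range(k):
        D = sub(mul(A, D), mul(D, A))
    return D

def term(n, k):
    """S^(n-k) C^k applied to the tensor A^(x)n (x) B."""
    expr = {("A",) * n + ("B",): 1}
    for _ in range(k):
        expr = C(expr)
    for _ in range(n - k):
        expr = S(expr)
    return expr

def binomial_sum(n):
    """sum_k binom(n, k) S^(n-k) C^k (A^(x)n (x) B)."""
    total = {}
    for k in range(n + 1):
        for word, coeff in term(n, k).items():
            total[word] = total.get(word, 0) + comb(n, k) * coeff
    return simplify(total)

for n in range(5):
    ok = all(term(n, k) == mul(ad_power(k), {("A",) * (n - k): 1})
             for k in range(n + 1))
    print(n, ok, binomial_sum(n) == {("A",) * n + ("B",): 1})
```

Working in the free algebra means no relation between $A$ and $B$ is assumed, so an agreement here carries over to any pair of operators once $\mu$ is applied.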