I read an answer to a question on Physics.StackExchange about how to perturbatively compute $\sqrt{A+B}$, where $A$ and $B$ are matrices, $A$ is positive definite, and $B$ is "sufficiently small and smaller than $A$". In the original question the poster assumed $A$ is diagonal; the answer does not assume this.
Using a Taylor expansion, the poster of the answer arrived at the following expression: $$ \sqrt{A+B} = \sqrt{A} + \sum_{n=1}^{\infty}{\binom {1/2} n} \left( (L_B + \mathrm{ad}_A )^{n-1} B \right) A^{\frac{1}{2}-n} $$ where $L_B(X) := BX$ and $\mathrm{ad}_A(X) := [A,X] = AX-XA$. Also, $$ \binom \alpha k = \frac{\alpha^{\underline k}}{k!} = \frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-k+1)}{k(k-1)(k-2)\cdots 1} \quad\text{for } k\in\mathbb{N} \text{ and arbitrary } \alpha. $$
Other than stating that it is based on a Taylor expansion at $X_0 = A$, the poster doesn't provide details on how to derive this formula (e.g., why the $\mathrm{ad}$ operator appears in exactly that form). Without a proof, I suspect the above formula may be incorrect.
Moreover, once I understand this formula I wish to implement it in code, specifically in Python. This is not trivial, since $L_B$ and $\mathrm{ad}_A$ do not commute, so I would be happy to get tips on that as well.
Edit: I found a way to implement this formula for general $n$ using recursion (equivalently, a loop that computes one term after another, each iteration applying $(L_B + \mathrm{ad}_A)$ to the previous iteration's result).
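That loop can be sketched as follows (a minimal sketch of my own, not the answer's code; the function name `sqrt_sum_series` and the truncation parameter are my inventions, and SciPy's `sqrtm`/`fractional_matrix_power` are assumed for the matrix powers):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, sqrtm

def sqrt_sum_series(A, B, n_terms=20):
    """Partial sum of sqrt(A) + sum_{n>=1} binom(1/2,n) ((L_B + ad_A)^{n-1} B) A^(1/2-n)."""
    S = sqrtm(A)
    T = B.copy()      # T holds (L_B + ad_A)^{n-1} B; for n = 1 it is just B
    coeff = 1.0       # binom(1/2, 0)
    for n in range(1, n_terms + 1):
        coeff *= (0.5 - (n - 1)) / n              # binom(1/2, n) from binom(1/2, n-1)
        S = S + coeff * T @ fractional_matrix_power(A, 0.5 - n)
        T = B @ T + A @ T - T @ A                 # apply (L_B + ad_A) once more
    return S
```

When $A$ and $B$ are both diagonal the commutator vanishes, $T$ reduces to $B^n$, and this is exactly the scalar binomial series, which converges for $\|B\| < \|A\|$; for non-commuting matrices each update multiplies $T$ by roughly $2\|A\|$, which is consistent with the explosion described below.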
However, for some matrices (in particular, when $A$ and $B$ do not commute) the series does not converge; worse, the error term diverges (in fact explodes). This happens even for a matrix $A$ with a condition number of about 36 and a determinant of about 3. The matrices are generated randomly as follows (in Python):
import numpy as np

N = 2
np.random.seed(seed=1) # fix a seed
X = np.random.rand(N,N) # generate random matrix ~ U(0,1)
A = X @ X.T # multiplying X by X transposed
A = 10*A/np.linalg.norm(A) # normalize A and make it "big enough"
Y = np.random.rand(N,N) # generate random matrix ~ U(0,1)
B = Y @ Y.T # multiplying Y by Y transposed
B = 0.01*B/np.linalg.norm(B) # normalize B and rescale to be sufficiently small
The seed is fixed so that the same matrices A, B are produced on every run, for debugging. These matrices seem to be well conditioned and positive definite, so there should be no reason for the algorithm (a Taylor expansion around $A$) to diverge. Why does this happen?
Edit 2: I think the formula $$ \sqrt{A+B} = \sqrt{A} + \sum_{n=1}^{\infty}{\binom {1/2} n} \left( (L_B + \mathrm{ad}_A )^{n-1} B \right) A^{\frac{1}{2}-n} $$ is incorrect. First, picking the $\mathrm{ad}_A$ part in all $n-1$ factors produces a term that is first order in $B$ whose total power of $A$ is $(n-1) + \left(\frac{1}{2}-n\right) = -\frac{1}{2}$, whereas in the scalar Taylor expansion the $n$-th term carries $A^{\frac{1}{2}-n}$. Second, I checked it numerically in Python.
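One way such a check can be run (my own sketch, not the code I originally used; `spd_sqrt` is a hypothetical helper): the exact first-order correction $D$ to $\sqrt{A+B}$ solves the Sylvester equation $\sqrt{A}\,D + D\sqrt{A} = B$, obtained by differentiating $S(t)^2 = A + tB$ at $t=0$, whereas the formula's $n=1$ term is $\tfrac{1}{2} B A^{-1/2}$; the two agree only when $B$ commutes with $\sqrt{A}$:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def spd_sqrt(M):
    """Square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
X = rng.random((2, 2))
A = X @ X.T + 2.0 * np.eye(2)     # positive definite, well conditioned
Y = rng.random((2, 2))
B = 1e-6 * (Y @ Y.T)              # tiny symmetric perturbation

sA = spd_sqrt(A)
D_sylvester = solve_sylvester(sA, sA, B)   # exact first-order term: sA D + D sA = B
D_formula = 0.5 * B @ np.linalg.inv(sA)    # the formula's n = 1 term

exact = spd_sqrt(A + B)
err_sylvester = np.linalg.norm(exact - (sA + D_sylvester))
err_formula = np.linalg.norm(exact - (sA + D_formula))
# err_sylvester is second order in B; err_formula keeps a first-order error
# unless B commutes with sqrt(A)
```

For generic (non-commuting) random matrices `err_formula` stays orders of magnitude above `err_sylvester`, which is what convinced me the $n=1$ term of the formula is already wrong.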
So what is the correct formula?