I've been working on this algorithm for awhile, and have made some progress, but have hit a stumbling block. For the cases I'm working on, square roots are guaranteed to exist (atleast trivial ones), so that's not an issue. No assumptions are made about prime/composite modulus. Since roots come in even and odd pairs, I try to find both.
If the number is an actual square, that can be returned immediately, at any point of the algorithm, though I'm debating finding the non-trivial square roots in those cases as well, it just hasn't been necessary yet. For the even side of the algorithm, even numbers have the form $4*n^2$, so I calculate $4^{-1} \pmod n$ and multiply, then try to find the even and odd roots of that number, returning both multiplied by $2$, no issues there.
On the odd side, it's a bit more complicated. Odd square numbers have the form $8*\frac{k*(k+1)}{2}+1$, where $n = 2*k+1$. By subtracting $1$ and multiplying by $4^{-1}$, we isolate $k*(k+1)$. For some numbers, it suffices to just calculate non-modular $(k*(k+1))^{1/2}$ and take the floor, but this only works if the value of $k$ is less than $n^{1/2}$. If $k$ is greater, you can add multiples of $n$ and take the non-modular square root of that number and keep testing if you found the correct $k$, but this takes the algorithm to a much higher complexity than I want to allow.
I'm trying a find a way to transform $k*(k+1)$ into a problem I can solve for $k$ efficiently, I just haven't had much luck. An algorithm to determine the multiple of $n$ that's needed to find the proper value of $k$ would also work if it's fast (logarithmic), but I haven't had luck with that either. Is there some obvious steps I could try that I'm missing here? An example number where this is actually important is trying to find $72^{1/2} \pmod{287}$, which has trivial roots of $143$ and $144$, but also has at least one non-trivial roots $102$ and $185$. To solve it, I have to find the odd square root of $18$, but $k*(k+1) = 76 \pmod{287}$, with actual value of $k = 25$ and square root therefore $51$.
I'm not very worried about finding all of the roots, so much as finding at least one non-trivial root if it exists. The specific numbers I'm finding non-trivial roots of are $4^{-1}$ for odd numbers (all odd numbers have trivial roots $\frac{n+/-1}{2}$, only composite odd numbers with multiple prime factors have non-trivial roots), since this can be used to find values for which $a^2 \equiv a \pmod{n}$, which gives us values for extended euclidean algorithm to find factors of $n$.
EDIT: So to follow up on this, I've solved the problems with the original algorithm. essentially what we're looking at are the different partitions of residues mod n. If we can find one non-trivial square root of any residue in the partition, we can find all of the rest using 2^t*root mod n, where t is the distance between the residues in the chain. The length of these chains is always a factor of totient n, which can sometimes be much smaller than totient n, not allowing enough numbers to check to find an 'easy' non-trivial root. Also, we can't usually just stop when finding a perfect square because it may still yield a trivial root after 2^t multiplication.
The next step if there simply isn't an easy non-trivial root is to go to the next odd square (1 being the 1st), and finding a non-trivial square of one of the numbers in that chain, so that you have the root of 4 times an odd square and multiply by the inverse of the odd square. This obviously means the inverse must exist, but if you can't find an inverse, you already know that it shares a factor with n and can reduce n to get a number which works. The first 4 numbers for which a non-trivial root cannot be found in the first chain are 205, 217, 511, and 513. Because the lengths of the chains are factors of totient n (actually Carmichael function of n), prime numbers will have a very long running time to actually prove them prime, essentially n/2, since that's how many quadratic residues there are mod n.
But in practice for composite numbers, you don't actually need to go through every quadratic residue, since there are atleast sqrt(n) numbers which will yield a non-trivial root. And there may be a cutoff point where a number can be said to be deterministically prime/prime power (or atleast with a high degree of confidence) without having to perform every single iteration of modular multiplication. I'm trying to find hard numbers for the probability that a given chain contains or does not contain a non-trivial root, since the complexity rises to sqrt(n)*log(n)^c in the worst case (prime/prime power) but the average case is usually log(n)^c (actually substantially less than this).