Background: I am back to an old hobby of mine: investigating the Mandelbrot set. This time I am doing it from a computational efficiency perspective. The Mandelbrot set can be defined through recursion or iteration:
$$\begin{cases}z_0 = 0\\z_{i+1} = z_i^2 + c\end{cases}$$
The Mandelbrot set is the set of points $c \in \mathbb C$ for which the sequence $(z_k)$ remains bounded for all $k \in \mathbb N$, i.e. never diverges.
For some points it is easy to prove that they belong to the Mandelbrot set, but for many it is incredibly hard. We can paint pretty pictures by coloring each $c$ according to the lowest number of iterations required to be sure our point diverges (once $|z_k| > 2$, the orbit is guaranteed to escape).
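A minimal escape-time sketch of this coloring idea (the iteration cap `max_iter` and the test values are illustrative choices, not from the question; the escape radius $2$ is the standard bound):

```python
# Minimal escape-time sketch: for each c, count iterations of z -> z^2 + c
# until |z| exceeds the escape radius 2, or give up after max_iter steps.
# max_iter is an illustrative cutoff, not part of the definition.

def escape_time(c: complex, max_iter: int = 100) -> int:
    """Smallest k with |z_k| > 2, or max_iter if the orbit stays bounded."""
    z = 0j
    for k in range(max_iter):
        if abs(z) > 2:
            return k
        z = z * z + c
    return max_iter

print(escape_time(0))   # c = 0 stays at 0, so it never escapes within the cap
print(escape_time(1))   # the orbit 0, 1, 2, 5, ... escapes quickly
```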
This question concerns the iteration, written out with $z = z_r + i\,z_i$ and $c = c_r + i\,c_i$:
$$z_r + i\,z_i \leftarrow (z_r + i\,z_i)^2 + c_r + i\,c_i$$
It is easy to expand by squaring out the real and imaginary parts, but if we want to do several iterations in one go, the polynomials become increasingly complex.
We can verify, either by hand or with e.g. Wolfram Alpha, that fusing two iterations into one step yields 9 real terms and 7 imaginary ones, averaging 2–3 multiplications per term.
Taking 3 iterations in one go produces an even larger number of terms in the expanded polynomial.
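The fused two-step map can be checked numerically. Below is a sketch with the expansion squared out by hand, writing $z = x + iy$ and $c = a + ib$; the 9-term real part and 7-term imaginary part match the counts above:

```python
def two_steps_complex(z: complex, c: complex) -> complex:
    """Two iterations of z -> z^2 + c done the straightforward way."""
    z = z * z + c
    return z * z + c

def two_steps_expanded(x, y, a, b) -> complex:
    """The same map fully expanded: 9 real terms and 7 imaginary terms."""
    real = (x**4 - 6*x**2*y**2 + y**4 + 2*a*x**2 - 2*a*y**2
            - 4*b*x*y + a**2 - b**2 + a)
    imag = (4*x**3*y - 4*x*y**3 + 4*a*x*y + 2*b*x**2 - 2*b*y**2
            + 2*a*b + b)
    return complex(real, imag)

# Spot check at an arbitrary point: both forms agree.
z, c = 0.1 + 0.2j, -0.5 + 0.3j
assert abs(two_steps_complex(z, c) - two_steps_expanded(0.1, 0.2, -0.5, 0.3)) < 1e-12
```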
This polynomial expansion does not seem like a very attractive way to save calculations, to say the least.
Or maybe there is something I am missing when it comes to simplification?

Expanding and using an algorithm for general polynomials is not the way to proceed, as the Mandelbrot polynomials have a special form: I think the $z \to z^2+c$ iteration is optimal on $1$ processor, since $d$ steps take $2d$ operations (one multiplication and one addition per step). Horner's method on the expanded degree-$2^d$ polynomial takes about $2^{d+1}$ operations in comparison, and is optimal for general polynomials on $1$ processor.
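A small sketch of this cost comparison, counting one complex multiply or add as one operation as in the answer's model. (With the convention $z_0 = 0$, $d$ iterations give a polynomial in $c$ of degree $2^{d-1}$; the exact power of two depends on where you start counting, but the ratio is the point here.)

```python
# Compare operation counts: d iterations of z -> z^2 + c versus Horner's
# method on the fully expanded polynomial in c. One complex multiply or
# add counts as one operation.

def iterate_count(c: complex, d: int):
    """Evaluate by d iterations; one multiply + one add per step = 2d ops."""
    z, ops = 0j, 0
    for _ in range(d):
        z = z * z + c
        ops += 2
    return z, ops

def square_plus_c(coeffs):
    """Coefficient list (lowest degree first) for p(c)^2 + c."""
    sq = [0] * max(2, 2 * len(coeffs) - 1)
    for i, u in enumerate(coeffs):
        for j, v in enumerate(coeffs):
            sq[i + j] += u * v
    sq[1] += 1  # the "+ c" term
    return sq

def horner_count(coeffs, c: complex):
    """Horner evaluation; one multiply + one add per coefficient."""
    acc, ops = 0j, 0
    for a in reversed(coeffs):
        acc = acc * c + a
        ops += 2
    return acc, ops

d = 4
coeffs = [0]                      # p_0 = 0
for _ in range(d):
    coeffs = square_plus_c(coeffs)

c = 0.3 + 0.4j
z_iter, ops_iter = iterate_count(c, d)
z_horner, ops_horner = horner_count(coeffs, c)
# Both evaluations agree; ops_iter = 2d = 8, while ops_horner scales
# with the degree of the expanded polynomial (18 here).
```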
For more processors, let $T^*_\infty(n)$ be the number of time steps taken to compute a general polynomial of degree $n$ given an unlimited number of processors, with the preconditioning step not included. The paper states:
$$T^*_\infty(n) \le \log_2 n + O(1)$$
Computing the Mandelbrot set polynomial of degree $2^d$ by doing $d$ iterations of $z^2 + c$ takes $2d$ time steps on $1$ processor; setting $n = 2^d$ in the bound above gives:
$$T^*_\infty(n) \le d + O(1)$$
So, expanding the polynomial and using a parallel algorithm with unlimited processors for general polynomials gains only a small improvement in the number of time steps (a factor of $2$). The overall amount of work performed will likely be much higher, and the cost model ignores many real-world issues such as memory access.
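To see why the preconditioning step matters, here is a back-of-the-envelope round count for a naive unlimited-processor scheme (my own illustrative model, not the paper's algorithm): compute all powers of $c$ by repeated doubling, multiply by coefficients in one round, then sum with a binary tree. This naive scheme needs about $2\log_2 n$ rounds, roughly the same as the $2d$ sequential steps; reaching $\log_2 n + O(1)$ requires the preconditioned algorithm.

```python
import math

def naive_parallel_rounds(n: int) -> int:
    """Rounds for a naive parallel evaluation of a degree-n polynomial:
    all powers of c by doubling, one coefficient-multiply round, then a
    binary-tree sum of the n + 1 terms. Illustrative cost model only."""
    power_rounds = math.ceil(math.log2(n))      # c^2, then c^3..c^4, ..., c^n
    multiply_round = 1                          # all a_i * c^i in parallel
    sum_rounds = math.ceil(math.log2(n + 1))    # tree-add the n + 1 terms
    return power_rounds + multiply_round + sum_rounds

d = 10
n = 2 ** d
print(naive_parallel_rounds(n))   # 22 here: roughly 2d, not d + O(1)
```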