Speedup achived by the use in many cases a square factor.
Notes: polynomial not given by the data, but constant, prepared earlier, may be useful in the evaluation of Chebyshev. To be profitable, the polynomial must be very high degree, otherwise the tests will show that Horner faster.
I don't know where I got the information about this algorithm from.
Calculation of polynomials
$p_n(x)=a_nx^n+a_{n-1}x^{n-1}+...a_1x+a_0$
Horner schema: $pn(x)=x*(x*(...(x*(a_{n}x+a_{n-1})+a{n-2})+...+a_2)+a_1)+a_0$
Horner schema – algorithm:
$w_n = a_n;$
$w_k = w_{k+1}*x+a_k (k = n-1, n-2,...0)$
cost: n+1
code:
w := A[n];
for k:=n-1 downto 0 do
w:=w*x+A[k];
There are faster formulae that can reduce the theoretical cost of calculations almost twice as fast. At first we bring the polynomial to the form for which the coefficients at the two highest powers are equal to one. Then we divide the polynomial by the square factors of the form $y^2-a_i$
Let us assume that the $p_n(x)$ polynomial has at its highest power a coefficient equal to one, i.e. it is a form: $p_n(x) = x_n+a_{n-1}x^{n-1}+a_{n-2}x^{n-2}+...a_{1}x+a_0$
after substitution $y = x + (a_{n-1}-1)/n$ ===> $x = y - (a_{n-1}-1)/n$
we'll get $q_{n}(y)$
$q_{n}(y) = y^n+y^{n-1}+b_{n-2}y^{n-2}+...b_{1}y+b_{0}$
If the given polynomial has at its highest power a coefficient different from unity, i.e. it is a polynomial $w_{n}(z)$ form:
$w_{n}(z) = c_{n}z^{n}+c_{n-1}z^{n-1}+c_{n-2}z^{n-2}+...c_{1}z+c_0$
then, in order to get $p_{n}(x)$ you either need to divide the whole polynomial by $c_n$ , or substitute
$x = Sqrt^{n}(c_{n})*z$ ===> $z = x/ Sqrt^{n}(c_n)$
depending on whether you want to scale the result or the argument of the function.
$q_{n}(y) = y^n+y^{n-1}+b_{n-2}y^{n-2}+...b_{1}y+b_{0}$
dividing $q_{n}(y)$ by $y^2-a_1$ we get for a certain $a_1$
$q_{n}(y) = (y^2-a_1)*( c_{n-2}*y^{n-2}+ c_{n-3}*y^{n-3}+c_{n-4}*y^{n-4}+...+c_1*y+c_0)+\nu_{1}y+\beta_1$
We want to choose such $a_1$ that after dividing the polynomial by $(y^2-a_1)$ we get a polynomial that will also have unity coefficients at the highest two powers and the rest of the polynomial division by $(y^2-a_1)$ will be a single number.
$q_{n}(y) = (y^{2}-a_{1})*(y^{n-2}+y^{n-3}+c_{n-4}*y^{n-4}+...+c_1*y+c_0)+\beta1$
where
$c_{n-2}=c_{n-3}=1 oraz \nu_1=0$
The recursive formula on $c_j$ has a form:
$c_j=b_{j+2}+a_{1}*c_{j+2}$ (j=n-4,n-5,....-2)
whereby $c_{-1} = \nu_{1}, c_{-2} = \beta_1$
Whereas $\nu_1 = 0$ we present $\nu_1$ as an expression dependent on $a_1$ and $b_i$
$\nu_1 = c_{-1} = b+a_{1}*c_{1} = b_{1}+a_{1}*(b_{3}+a_{1}*c_{3})$ =
$b_1+a_1*b_3+a_1^2(b_3+a_1*c_5)$ = $b_{-1}+a_1*b_3+a_1^2*b_5+...+a_n^{r-1}*b_{2r-1}+a_1^r$
(2r = n-2 for n even or n-1 for n odd substituting $\nu_1=0$, we obtain for $a_1$ equation: $b_1+a_1*b_3+a_1^2*b_5+...+a_n^{r-1}*b_{2r-1}+a_1^r = 0$
Let us assume that it has the real root, let $a_1$ be the root
then we'll get the decomposition
$q_{n}(y)=(y^n-a_{1})*(y^{n-2}+y^{n-3}+c_{m-4}*y^{m-4}+...c_{1}*y+c_0)+\beta_1$
dividing the polynomial in parentheses, if we find the root again, we will get
$q_n(y)=(y^2-a_1)*((y^2-a_2)*(y^{n-4}+y^{n-5}+d_{n-6}*y^{n-6}+...+a_0)+\beta_2)+\beta_1$
and when there's no root
$q_n(y) = (y^2-a_1)*(y*(y^{n-3}+y^{n-4}+....c_1)+c_0)+\beta_1$
When we always have found the element $a_i$, then $q_n(y)$ has a distribution:
$q_n(y) = (...(((\delta_{n}y^{2}+y+a_s)*(y^{2}-a_{s-1})+\beta_{s-1})*(y^2-a_{s-2})+\beta_{s-2})...)*(y^2-a_1)+\beta_1$ (*)
$\delta_n = 1$ for n-even; 0 for n-odd s = 1/2 *n for even n; 1/2 *(n+1) for odd n
Computation algorithm of $p_n(x)$
or $x = x*sqrt^n(a_n)$ when first scaling the function argument
$y = x+ (a_{n-1}-1)/n$
$z = y^2$
$w_s = \delta_{n}*z+y+a_s$
$w_j = w_{j+1}*(z-a{j})+\beta_j$ (j=s-1,s-2,...1)
$p_n(x) = w_1$
or $p_n(x) = a_{n}*w_1$ when at the end we scale the result of the function
example
$p_4(x) = x^4+5x^3+3x^2+2$
$a_3=5 => y = x+1 => x := y-1$
$q_4(y)= y^4+y^3-6y^2+5y+1$ substitute $\nu_1=0$
$0 = 5+a_1$
thereby $a_1 = -5$
and
$q_4(y)$ = $(y^2+5)*(y^2+y-11)+56$
the calculation algorithm:
$y := x+1$
$z := y*y$
$w := z+y-11$
$w := w*(z+5)+56$
1.>
$0.000204700z^7+0.001439274z^6+0.008328596z^5+0.041635012z^4+0.166667986z^3+0.500006347z^2+0.999999901z+0.999999801$
in the range <-1;1> the maximum difference is about 2e-7
$z:=x/0.297178123$ we will get
$x^7 + 2.0895004648·x^6 + 3.5932515591·x^5 + 5.3381571582·x^4 + 6.3504088014·x^3 + 5.6616347459·x^2 + 3.3649849203·x + 0.999999801$
$x = y-(2.08950044541-1)/7$ $x = y-0.155642921$
4. $y^7 + y^6 + 2.1506749053·y^5 + 3.1691354977·y^4 + 3.7604523590·y^3 + 3.3533330221·y^2 + 1.9930983727·y + 0.5923035272$
calculated:
$5.x^5+x4+1.4184012627x3+2.4368618551x2+2.7217944997x+1.5688833150$ $a: -0.732273642600117$ (root $x^3+2.1506749053x^2+3.760452359x+1.9930983727)$ remainder: $-0.5565483727$
this one has no root because $x^2+1.4184012627x+2.7217944997$ has no roots
and it's decomposed into
$x^4+x3+1.4184012627x2+2.4368618551x+2.7217944997$
and remaionder $1.5688833150$
is decomposed into $y^2-a$ for $a=-2.4368618551$ (root of the equation $x+2.4368618551$) and a second degree polynomial $x^2+x-1.0184605924$ and remainder $5.2036422682$
function horner(x:real):real;
begin
result:=0.0002047;
result:=result*x+0.001439274;
result:=result*x+0.008328596;
result:=result*x+0.041635012;
result:=result*x+0.166667986;
result:=result*x+0.500006347;
result:=result*x+0.999999901;
result:=result*x+0.999999801
end;
In the past, on the old Pascal compiler the following was faster, for newer compilers without any difference
function hornerfast(x:real):real;
begin
result:=((((((0.0002047*x+0.001439274)*x+0.008328596)*x
+0.041635012)*x+0.166667986)*x+0.500006347)*x+0.999999901)*x+0.999999801;
end;
And this is using a new method
function fast(z:real):real;
var y:real;
begin
y:=z*0.297178123+0.155642921;
z:=y*y;
result:=(z+0.732273642600117)*(y*((z+2.4368618551)*(z+y-1.0184605924)
+5.2036422682)+1.5688833150)-0.5565483727;
end;
Question: has anyone heard anything about this method? Any information on how to search online?