Sequential Evaluation of Polynomials If you don’t have any parallelism available, Horner’s scheme is a good option. And if you have the FMA instruction available, use it!

If your polynomial is of a high degree, Knuth showed a way (it is in the book) to reduce the number of operations needed than the naive Horner implementation. Evaluating Polynomials When Some Parallelism is Available

## Generalization of Horner’s Scheme

This method involves splitting up the polynomial into its odd and even powers, and then add them up.

Specifically, let \(y=x^{2}\) and get \(p_{e}\) and \(p_{o}\). When you evaluate both, then \(p=p_{e}+xp_{o}\). You can generalize this to \(y=x^{k}\).

## Estrin’s Method

You can use Estrin’s method. This method takes more operations than Horner’s method, but it is faster due to parallelism.

## Evaluating Polynomials on Modern Processors

In general, deciding what level of partitioning for evaluation of polynomials in parallel is nontrivial. There’s no general principle that will tell you which partitioning to use for a given order polynomial.

Evaluating in parallel may also impact the accuracy of your result. Computing Bounds on the Evaluation Error

Much of this is covered in the Handbook book. I will not repeat it here. Note that the error bounds there assume FMA is not used (i.e. separate error for addition and multiplication).

He gives a Maple program for computing the error bounds, as well as how
to use Gappa to get such a bound. Note that this error bound is *not*
the ones given by analytical formulae. It is an algorithm that usually
gives a much tighter bound. Use it only if you *need* a tighter bound.

## Evaluation Error With Methods Different From Horner’s Scheme

In general, just use Gappa to estimate the error bound. Don’t try to write your own Maple program unless you will use the same strategy for many functions.

## When High Accuracy Is Needed

We often need to compute at higher precision than our target precision (e.g. intermediate calculation).

Also, often \(a_{0}\), or \(a_{0}+a_{1}x\) will be much larger than the rest of the polynomial (perhaps due to range reduction). In these cases, evaluate the two separately, and store those values separately. Polynomial Evaluation By Specific Hardware I skipped this section.