Unless specified otherwise, everything in this chapter assumes no underflow.

We will often have many factors of \((1+\epsilon_{i})\), each with a different \(\epsilon_{i}\). This can become quite tedious in expressions, so here are some definitions to make life easier.

Let \(|\epsilon_{i}|\le\mathbf{u}\), and let \(n\mathbf{u}<1\). Then define \(\theta_{n}\) from the following:

\[\prod_{i=1}^{n}(1+\epsilon_{i})=1+\theta_{n}.\]

Now this condition is true:

\[|\theta_{n}|\le\gamma_{n}:=\frac{n\mathbf{u}}{1-n\mathbf{u}}.\]

Note that this inequality also serves as the definition of \(\gamma_{n}\).

To show this, note that the RHS is the sum of a geometric series, \(\gamma_{n}=\sum_{k\ge1}(n\mathbf{u})^{k}\), which converges since \(n\mathbf{u}<1\). The remaining step is to realize that \(|1+\epsilon_{i}|\le1+\mathbf{u}\), so \(|\theta_{n}|\le(1+\mathbf{u})^{n}-1\). Then expand \((1+\mathbf{u})^{n}\) binomially, and note that each term \(\binom{n}{k}\mathbf{u}^{k}\) is at most \((n\mathbf{u})^{k}\). And even though this sum is finite, it is bounded by the infinite sum, hence the geometric series bound.

An important property of this is that \(\gamma_{n}\le\gamma_{n+1}\). You could show this by expanding \(\gamma_{n+1}\) as a geometric series and noting that each term \((n\mathbf{u})^{k}\) in the expansion of \(\gamma_{n}\) is dominated by the corresponding term \(((n+1)\mathbf{u})^{k}\). Or you could just subtract the two and note that the difference is positive.

Of course, this inequality is true only if \((n+1)\mathbf{u}<1\).
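As a quick numerical sanity check (my own sketch, not from the book), we can draw random \(\epsilon_{i}\) with \(|\epsilon_{i}|\le\mathbf{u}\), accumulate the product \(\prod(1+\epsilon_{i})=1+\theta_{n}\), and verify that \(|\theta_{n}|\) stays within \(\gamma_{n}\):

```python
import random

u = 2.0 ** -53  # unit roundoff for IEEE binary64

def gamma(n: int) -> float:
    """gamma_n = n*u / (1 - n*u); only meaningful while n*u < 1."""
    assert n * u < 1.0
    return n * u / (1.0 - n * u)

# Accumulate prod(1 + eps_i) = 1 + theta_n for random |eps_i| <= u.
# (The product itself incurs rounding, but stays far inside the bound.)
random.seed(0)
n = 1000
prod = 1.0
for _ in range(n):
    prod *= 1.0 + random.uniform(-u, u)
theta_n = prod - 1.0

assert abs(theta_n) <= gamma(n)   # |theta_n| <= gamma_n
assert gamma(n) <= gamma(n + 1)   # monotonicity
```

In practice the random \(\epsilon_{i}\) partially cancel, so \(|\theta_{n}|\) sits well below the worst-case \(\gamma_{n}\).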

Now let’s consider a practical application of this. We want to calculate \(\sum_{i}a_{i}\), where the \(a_{i}\) are floating-point numbers, assuming no overflow. We will use the simple algorithm of adding the first two numbers, and then adding each successive number one at a time after that.

Now if you work it out, you’ll see that the final computed sum will be:

\[\hat{s}=a_{1}(1+\theta_{n-1})+a_{2}(1+\theta_{n-1}')+a_{3}(1+\theta_{n-2})+\cdots+a_{n}(1+\theta_{1}),\]

where the \(\theta_{k}\) (and \(\theta_{n-1}'\)) are different quantities, each satisfying \(|\theta_{k}|\le\gamma_{k}\).

If we want the final error, then:

\[\left|\hat{s}-\sum_{i=1}^{n}a_{i}\right|\le\gamma_{n-1}\sum_{i=1}^{n}|a_{i}|.\]

To get this, we used the fact that \(|\theta_{n}|\le\gamma_{n}\), and the monotonicity of \(\gamma_{n}\).
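A small numerical sketch of this (my own, not from the book): sum random numbers naively, and check the error against the \(\gamma_{n-1}\) bound, using `math.fsum` as a correctly rounded reference:

```python
import math
import random

u = 2.0 ** -53  # unit roundoff for IEEE binary64

def gamma(n: int) -> float:
    return n * u / (1.0 - n * u)

def recursive_sum(a):
    """Naive summation: add the first two, then each successive term."""
    s = a[0]
    for x in a[1:]:
        s += x
    return s

random.seed(1)
a = [random.uniform(0.0, 1.0) for _ in range(10_000)]

s_hat = recursive_sum(a)
s_ref = math.fsum(a)  # correctly rounded sum, used as the "exact" value
bound = gamma(len(a) - 1) * sum(abs(x) for x in a)

assert abs(s_hat - s_ref) <= bound
```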

A very important point: This holds even with underflow, because when you
have an underflow in \(a+b\), the result is *exact*.

The book didn’t go through the steps, but if we want to calculate the dot product \(x^{T}y\) (using the naive algorithm), then

\[\left|x^{T}y-\mathrm{fl}(x^{T}y)\right|\le\gamma_{n}\sum_{i=1}^{n}|x_{i}y_{i}|.\]
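The same kind of numerical check works for the dot product (again my own sketch, with an assumed helper `naive_dot`):

```python
import math
import random

u = 2.0 ** -53  # unit roundoff for IEEE binary64

def gamma(n: int) -> float:
    return n * u / (1.0 - n * u)

def naive_dot(x, y):
    """Multiply and accumulate left to right."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s

random.seed(2)
n = 5000
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

d_hat = naive_dot(x, y)
d_ref = math.fsum(xi * yi for xi, yi in zip(x, y))  # near-exact reference
bound = gamma(n) * sum(abs(xi * yi) for xi, yi in zip(x, y))

assert abs(d_hat - d_ref) <= bound
```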

And for the naive Horner’s rule applied to a polynomial \(p(x)=\sum_{i=0}^{n}a_{i}x^{i}\) of order \(n\):

\[\left|p(x)-\hat{p}(x)\right|\le\gamma_{2n}\sum_{i=0}^{n}|a_{i}||x|^{i}.\]

Note that you can use FMA for both the dot product and the polynomial evaluation, and you can reduce the bound. For example, for Horner’s rule, we get \(\gamma_{n}\) instead of \(\gamma_{2n}\). Do note that this doesn’t mean the error is reduced!
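Here is a small sketch (my own, not from the book) of naive Horner evaluation against the \(\gamma_{2n}\) bound, on a low-degree example with some cancellation:

```python
u = 2.0 ** -53  # unit roundoff for IEEE binary64

def gamma(n: int) -> float:
    return n * u / (1.0 - n * u)

def horner(coeffs, x):
    """Evaluate sum a_i x^i with coeffs = [a_n, ..., a_1, a_0]."""
    p = coeffs[0]
    for a in coeffs[1:]:
        p = p * x + a
    return p

# p(x) = (x - 1)^3 expanded; evaluating near x = 1 involves cancellation.
coeffs = [1.0, -3.0, 3.0, -1.0]
n = len(coeffs) - 1
x = 1.0001

p_hat = horner(coeffs, x)
p_ref = (x - 1.0) ** 3  # x - 1.0 is exact here, so this is a good reference
bound = gamma(2 * n) * sum(abs(a) * abs(x) ** (n - i) for i, a in enumerate(coeffs))

assert abs(p_hat - p_ref) <= bound
```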

We are often interested in the **backward error**. As an example, say we
want to calculate \(y=f(x)\). So we use an algorithm, and get
\(\hat{y}\). Now the error is \(|y-\hat{y}|\). But the backward
error is \(|x-\hat{x}|\), where \(\hat{x}\) is such that
\(f(\hat{x})=\hat{y}\). We often hope that if one error is small, so
is the other.

Similarly, we define the **relative backward error**:

\[\frac{|x-\hat{x}|}{|x|}.\]

The usual error is called the **forward** error.
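A hypothetical illustration (my own, not from the book): for \(f(x)=\sqrt{x}\), the computed \(\hat{y}=\mathrm{fl}(\sqrt{x})\) is the *exact* square root of \(\hat{x}=\hat{y}^{2}\), so \(|x-\hat{x}|/|x|\) estimates the relative backward error:

```python
import math

u = 2.0 ** -53  # unit roundoff for IEEE binary64

# f(x) = sqrt(x). The computed y_hat is the exact square root of
# x_hat = y_hat**2. (Squaring y_hat rounds once more, so this is an
# estimate of the backward error, not an exact value.)
x = 2.0
y_hat = math.sqrt(x)
x_hat = y_hat * y_hat
backward_rel = abs(x - x_hat) / abs(x)

assert backward_rel <= 4 * u  # on the order of the unit roundoff
```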

Now the book claims that \(\gamma_{n-1},\gamma_{n},\gamma_{2n}\) are the backward relative errors of the above algorithms, respectively. I can’t see why. Shouldn’t they be the forward errors?

In either case, the point is that the errors are small if \(n\mathbf{u}\ll1\), and for Horner’s rule, if \(2n\mathbf{u}\ll1\). The last condition is almost always true: on a 64-bit machine, we’d need a polynomial of order around \(2^{50}\) to get a sizeable error.

The condition numbers of each of these are as follows.

For summation:

\[\frac{\sum_{i}|a_{i}|}{\left|\sum_{i}a_{i}\right|}\]

For the dot product:

\[\frac{|x|^{T}|y|}{\left|x^{T}y\right|}\]

For Horner’s rule:

\[\frac{\sum_{i=0}^{n}|a_{i}||x|^{i}}{|p(x)|}\]

If these are too large, the forward relative errors can be very large.
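A concrete sketch of this (my own, not from the book): with heavy cancellation the summation condition number blows up, and the forward relative error becomes \(O(1)\) even though every individual rounding error is only about \(\mathbf{u}\):

```python
import math

# Condition number of summation: sum|a_i| / |sum a_i|. With heavy
# cancellation it is huge, and the forward relative error can be O(1).
a = [1.0, 1e-16, -1.0]

cond = sum(abs(x) for x in a) / abs(math.fsum(a))  # about 2e16

s_hat = (a[0] + a[1]) + a[2]   # naive sum: 1.0 + 1e-16 rounds back to 1.0
s_true = math.fsum(a)          # exact sum of these floats is 1e-16
forward_rel = abs(s_hat - s_true) / abs(s_true)

assert cond > 1e15
assert forward_rel >= 1.0      # 100% relative error
```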