Formats

Posted by Beetle B. on Mon 28 January 2019

The standard defines several interchange formats to allow for transferring floating point data between machines. The data could be exchanged as bit strings or as binary, but with the latter you have to be concerned about endianness.

A format is an arithmetic format if all the required operations defined by the standard are supported by the format.

The standard defines five basic formats, which are interchange formats that must also be arithmetic formats:

  • binary32 (32 bit)
  • binary64 (64 bit)
  • binary128 (128 bit)
  • decimal64 (64 bit)
  • decimal128 (128 bit)

An implementation is said to be conforming if it implements at least one of them.

Often, intermediate results of calculations are computed in a wider format to limit the likelihood of underflow or overflow. This usually gives better accuracy as well.
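A rough sketch of why the wider format helps, in Python. Python floats are 64 bit, so the 32 bit format is only emulated here by rounding through `struct`; the helper `to_f32` is my own name, not something from the standard. Computing \(\sqrt{x^2 + y^2}\) with every intermediate rounded to 32 bit overflows, while keeping the intermediates in 64 bit gives a result that fits comfortably in 32 bit.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit value (hypothetical helper)."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses finite values too large for 32 bit; treat that as overflow to infinity.
        return math.copysign(math.inf, x)

x = to_f32(3e20)
y = to_f32(4e20)

# Every intermediate result rounded to 32 bit: x*x is about 9e40, which overflows.
narrow = to_f32(math.sqrt(to_f32(to_f32(x * x) + to_f32(y * y))))

# Intermediates kept in 64 bit, only the final result rounded to 32 bit.
wide = to_f32(math.sqrt(x * x + y * y))

print(narrow)  # inf
print(wide)    # close to 5e20
```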

The standard also requires that conversions between any two supported formats be implemented.

Binary Encodings

The bias for the exponent is always \(e_{\mathrm{max}}\) (127 for the 32 bit format, 1023 for the 64 bit format).
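To see the bias in action, here is a small Python sketch that pulls apart the fields of a 32 bit value (1 sign bit, 8 exponent bits, 23 trailing significand bits). The function name `decode_binary32` and the constant `E_MAX` are just names I made up for illustration.

```python
import struct

E_MAX = 127  # e_max for the 32 bit format, which is also the exponent bias

def decode_binary32(x):
    """Return (sign, biased exponent, trailing significand) of x rounded to 32 bit."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction

sign, biased, fraction = decode_binary32(-6.5)  # -6.5 = -1.625 * 2^2
print(sign)             # 1
print(biased - E_MAX)   # 2, the true exponent after removing the bias
print(hex(fraction))    # 0x500000, i.e. the bits .101 of 1.101 in binary
```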

Decimal Encodings

There are two decimal encodings: Binary and Decimal. The binary encoding makes a software implementation easier. The decimal encoding makes a hardware implementation easier. The set of numbers representable by them is identical. Any conforming implementation must provide a way to convert from one to the other.

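Unlike in the binary encodings, there is no clear division between the exponent and significand bits: after the sign bit, a combination field encodes the leading bits of the exponent together with the leading part of the significand.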

There is no notion of normalizing a decimal representation (i.e. no requirement to minimize \(e\)). So when you do an arithmetic operation, you have to specify which exponent is preferred.
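Python's `decimal` module is not an implementation of the 64 or 128 bit decimal formats, but as far as I can tell its arithmetic treats exponents in the same spirit, so it works as a rough illustration. The same value can be stored with different exponents, and each operation picks a particular exponent for its result.

```python
from decimal import Decimal

a = Decimal("1.5")    # digits 15,  exponent -1
b = Decimal("1.50")   # digits 150, exponent -2

print(a == b)                  # True: same value
print(a.as_tuple().exponent)   # -1
print(b.as_tuple().exponent)   # -2: different representation

# Multiplication: the result exponent is the sum of the operand exponents.
product = Decimal("1.20") * Decimal("2")
print(product)                 # 2.40

# Addition: the result exponent is the smaller of the operand exponents.
total = Decimal("1.5") + Decimal("0.25")
print(total)                   # 1.75
```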

In BCD (binary coded decimal), each digit is represented by 4 bits. This was deemed wasteful, as 4 bits can hold 16 different values but only 10 of them are used.
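A quick Python sketch of the waste. The function `to_bcd` is a hypothetical helper that does plain BCD only; it does not implement the encoding the standard actually uses. Three decimal digits take 12 bits in BCD, while 10 bits are already enough to distinguish 1000 values.

```python
def to_bcd(n):
    """Encode a non-negative integer as plain BCD: one 4 bit nibble per decimal digit."""
    bits = 0
    for digit in str(n):
        bits = (bits << 4) | int(digit)
    return bits

print(hex(to_bcd(905)))    # 0x905: each hex digit is one decimal digit

bcd_bits = 4 * len("999")          # bits used by BCD for three digits
needed_bits = (999).bit_length()   # bits needed to count from 0 to 999
print(bcd_bits, needed_bits)       # 12 10
```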

The chapter goes into detail of the decimal encoding with examples. Return to it as needed.

Extended and Extendable Precisions

I skipped this for now. Read as needed. The specification of these formats in IEEE 754 is partial: they are optional, and their binary encodings are not specified.

Little Endian, Big Endian

On a big endian machine, the most significant byte has the lowest address. On a little endian machine, the least significant byte has the lowest address.
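A small Python check of the byte order, using `struct` to serialize the 64 bit value 1.0 both ways. The bit pattern of 1.0 is 0x3FF0000000000000, so the big endian form starts with the byte 0x3F and the little endian form ends with it.

```python
import struct

big = struct.pack(">d", 1.0)     # big endian: most significant byte first
little = struct.pack("<d", 1.0)  # little endian: least significant byte first

print(big.hex())      # 3ff0000000000000
print(little.hex())   # 000000000000f03f

# Reading bytes with the wrong byte order silently gives a different number.
(misread,) = struct.unpack("<d", big)
print(misread == 1.0)  # False
```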