IEEE Floating Point Standard 

  1. What is the IEEE floating point standard?
  2. Floating point number representation
  3. Special values in IEEE floating point standard

What is the IEEE Floating Point Standard?

The IEEE floating point standard (IEEE 754) is a floating point arithmetic system adopted by the Institute of Electrical and Electronics Engineers in 1985.

Requirements for machines adopting the IEEE floating point standard

  1. Arithmetic should be correctly rounded
  2. Floating point numbers should be consistently represented across machines
  3. Exception handling should be sensible and consistent

Floating point number representation

Single precision numbers in a 32-bit machine

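The bit pattern $b_1 b_2 b_3 \ldots b_9 b_{10} b_{11} \ldots b_{32}$ of a word in a 32-bit machine represents the real number

$$(-1)^s \times 2^{e-127} \times (1.f)_2$$

where $s = b_1$, $e = (b_2 \ldots b_9)_2$, and $f = b_{10} b_{11} \ldots b_{32}$. This formula applies to normalized numbers, i.e. when $0 < e < 255$; the exponent patterns $e = 0$ and $e = 255$ are reserved for the special values listed below.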

sign bit | biased exponent | fraction from normalized mantissa
1 bit    | 8 bits          | 23 bits
s        | e               | f

Only the fraction $f$ of the normalized mantissa is stored. Its leading 1 is implicit (the "hidden bit"), so the mantissa actually carries 24 binary digits: the hidden bit plus 23 stored bits.
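As a sketch of the single precision formula, the Python code below extracts $s$, $e$ and $f$ from a 32-bit pattern with shifts and masks and evaluates $(-1)^s \times 2^{e-127} \times (1.f)_2$, adding the hidden bit explicitly. It assumes the standard `struct` module's `"f"` format packs values as IEEE single precision (true on mainstream platforms); the helper names `decode_single` and `float_to_word` are hypothetical, chosen only for this example.

```python
import struct

def decode_single(word: int) -> float:
    """Evaluate (-1)^s * 2^(e-127) * (1.f)_2 for a 32-bit pattern.

    Only valid for normalized numbers, i.e. 0 < e < 255.
    """
    s = word >> 31                 # b1: sign bit
    e = (word >> 23) & 0xFF        # b2..b9: 8-bit biased exponent
    f = word & 0x7FFFFF            # b10..b32: 23-bit fraction
    if not 0 < e < 255:
        raise ValueError("not a normalized single precision number")
    mantissa = 1 + f / 2**23       # hidden bit 1 plus the stored fraction
    return (-1) ** s * 2.0 ** (e - 127) * mantissa

def float_to_word(x: float) -> int:
    """Round x to single precision and return its 32 bits as an integer."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word

word = float_to_word(1.75)
print(f"{word:032b}")        # 00111111111000000000000000000000
print(decode_single(word))   # 1.75
```

Decoding by hand rather than calling `struct.unpack(">f", ...)` keeps each term of the formula visible.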

Double precision numbers in a 32-bit machine

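The bit pattern $b_1 b_2 b_3 \ldots b_{12} b_{13} b_{14} \ldots b_{64}$ of two words in a 32-bit machine represents the real number

$$(-1)^s \times 2^{e-1023} \times (1.f)_2$$

where $s = b_1$, $e = (b_2 \ldots b_{12})_2$, and $f = b_{13} b_{14} \ldots b_{64}$. As in single precision, this formula applies to normalized numbers, i.e. when $0 < e < 2047$.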

sign bit | biased exponent | fraction from normalized mantissa
1 bit    | 11 bits         | 52 bits
s        | e               | f

Only the fraction $f$ of the normalized mantissa is stored. Its leading 1 is implicit (the "hidden bit"), so the mantissa actually carries 53 binary digits: the hidden bit plus 52 stored bits.
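A similar sketch for double precision, assuming Python's `float` is an IEEE double (as it is on mainstream platforms). The hypothetical helper `double_fields` splits a value into $s$, $e$ and $f$; the last two lines show the effect of the 53-digit mantissa, since $2^{53} + 1$ needs 54 binary digits and cannot be stored exactly.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE double into (sign, biased exponent, fraction) integers."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63                 # b1: sign bit
    e = (word >> 52) & 0x7FF       # b2..b12: 11-bit biased exponent
    f = word & ((1 << 52) - 1)     # b13..b64: 52-bit fraction
    return s, e, f

s, e, f = double_fields(-6.0)
print(s, e - 1023, f / 2**52)      # 1 2 0.5, i.e. -6 = (-1)^1 * 2^2 * 1.5

# 53 binary digits: 2**53 is exact, but 2**53 + 1 is not representable.
print(float(2**53) == 2**53)           # True
print(float(2**53 + 1) == 2**53 + 1)   # False
```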

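Decimal values of some key normalized and subnormal floating point numbers:

Quantity                     | Single precision                                     | Double precision
Machine epsilon              | $2^{-23} \approx 1.192 \times 10^{-7}$                | $2^{-52} \approx 2.220 \times 10^{-16}$
Smallest positive normalized | $2^{-126} \approx 1.175 \times 10^{-38}$              | $2^{-1022} \approx 2.225 \times 10^{-308}$
Largest positive             | $(2 - 2^{-23}) \times 2^{127} \approx 3.403 \times 10^{38}$ | $(2 - 2^{-52}) \times 2^{1023} \approx 1.798 \times 10^{308}$
Smallest positive subnormal  | $2^{-149} \approx 1.401 \times 10^{-45}$              | $2^{-1074} \approx 4.941 \times 10^{-324}$
Decimal precision            | 6 significant digits                                  | 15 significant digits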
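The entries in this table can be checked from Python. The sketch below builds the single precision values directly from their bit patterns (assuming `struct`'s `"f"` format is IEEE single precision; `single_from_bits` is a hypothetical helper) and reads the double precision values from `sys.float_info` and `math.ulp`, assuming Python's `float` is an IEEE double. Note that the smallest subnormals are $2^{-149}$ and $2^{-1074}$: the last fraction bit set, with an all-zero exponent field.

```python
import math
import struct
import sys

def single_from_bits(word: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single, widened to a Python float."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

# Single precision, built from bit patterns (each line prints True)
print(single_from_bits(0x34000000) == 2**-23)                  # machine epsilon
print(single_from_bits(0x00800000) == 2**-126)                 # smallest positive normalized
print(single_from_bits(0x7F7FFFFF) == (2 - 2**-23) * 2**127)   # largest positive
print(single_from_bits(0x00000001) == 2**-149)                 # smallest subnormal

# Double precision (Python floats), from the standard library (each prints True)
print(sys.float_info.epsilon == 2**-52)
print(sys.float_info.min == 2**-1022)
print(sys.float_info.max == (2 - 2**-52) * 2**1023)
print(math.ulp(0.0) == 2**-1074)     # smallest subnormal; math.ulp needs Python 3.9+
print(sys.float_info.dig)            # 15 decimal digits
```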

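Rounding in the IEEE standard

Round to nearest is the default rounding mode and the most common choice. Given a real number $x$, its correctly rounded value is the floating point number $\mathrm{fl}(x)$ that is closest to $x$. If $x$ lies exactly halfway between two floating point numbers, the tie is broken in favour of the one whose last fraction bit is 0 (round half to even). The standard also defines directed rounding modes: toward zero, toward $+\infty$ and toward $-\infty$.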
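Round to nearest can be observed from Python. This sketch assumes Python's `float` is an IEEE double and that the default rounding environment is in effect, in which `struct`'s `"f"` format rounds a double to the nearest single; `to_single` is a hypothetical helper, and `math.nextafter` requires Python 3.9 or later.

```python
import math
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round a double to IEEE single precision and widen it back to a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# fl(1/10): the stored double is closer to 1/10 than either neighbouring double.
x = Fraction(1, 10)
fl = Fraction(0.1)                          # exact value of the stored double
below = Fraction(math.nextafter(0.1, 0.0))
above = Fraction(math.nextafter(0.1, 1.0))
print(abs(fl - x) < min(abs(below - x), abs(above - x)))   # True

# Exact ties go to the neighbour whose last fraction bit is 0 (round half to even).
print(to_single(1 + 2**-24) == 1.0)               # True: tie between 1 and 1 + 2**-23
print(to_single(1 + 3 * 2**-24) == 1 + 2**-22)    # True: tie between 1 + 2**-23 and 1 + 2**-22
print(float(2**53 + 1) == 2**53)                  # True: tie between 2**53 and 2**53 + 2
```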

Special values in IEEE floating point standard

Single Precision representation

The exponent patterns 00000000 and 11111111 are reserved. An all-zero exponent encodes ±0 (zero fraction) or a subnormal number (nonzero fraction); an all-ones exponent encodes ±infinity (zero fraction) or NaN (nonzero fraction).

Value         | sign bit (1 bit) | biased exponent (8 bits) | fraction from normalized mantissa (23 bits)
7/4           | 0                | 01111111                 | 11000000000000000000000
-34.432175    | 1                | 10000100                 | 00010011011101010001100
-959818       | 1                | 10010010                 | 11010100101010010100000
+0            | 0                | 00000000                 | 00000000000000000000000
-0            | 1                | 00000000                 | 00000000000000000000000
macheps       | 0                | 01101000                 | 00000000000000000000000
"smallest"    | 0                | 00000001                 | 00000000000000000000000
"largest"     | 0                | 11111110                 | 11111111111111111111111
infinity      | 0                | 11111111                 | 00000000000000000000000
NaN           | 0                | 11111111                 | any nonzero pattern
$2^{-128}$**  | 0                | 00000000                 | 01000000000000000000000
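**This is a subnormal number. Its exponent field is all zeros, so it represents $(-1)^s \times 2^{-126} \times (0.f)_2$ with no hidden bit; here $(0.01)_2 \times 2^{-126} = 2^{-128}$. It is machine representable, but it has fewer significant bits than a normalized number and is therefore less accurate in computation.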
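The single precision rows above can be regenerated with a short Python sketch. It assumes `struct`'s `"f"` format produces IEEE single precision bits (true on mainstream platforms); the helper `fields` is hypothetical. NaN is handled separately because its exact fraction bits can vary by platform, so only the properties stated in the table (exponent all 1s, nonzero fraction) are checked.

```python
import math
import struct

def fields(x: float) -> tuple[str, str, str]:
    """Return the (sign, biased exponent, fraction) bit strings of x in single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]

rows = [("7/4", 7 / 4), ("-34.432175", -34.432175), ("-959818", -959818.0),
        ("+0", 0.0), ("-0", -0.0), ("macheps", 2**-23),
        ("smallest", 2**-126), ("largest", (2 - 2**-23) * 2**127),
        ("infinity", math.inf), ("2^-128", 2**-128)]
for label, x in rows:
    print(f"{label:>10}", *fields(x))

_, e, f = fields(math.nan)
print("NaN: exponent", e, "nonzero fraction:", "1" in f)
```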