DRAFT IEEE Standard for Binary Floating-Point Arithmetic - Sonic.net
DRAFT IEEE Standard for Floating-Point Arithmetic – 2003 August 12 10:20<br />
3.0. Background and Terminology<br />
Floating-point arithmetic is a systematic approximation of real arithmetic.<br />
Floating-point arithmetic can only represent a finite subset of the infinite set of<br />
real numbers. Additionally, many of the axioms of real arithmetic, such as<br />
associativity of addition, do not hold for floating-point arithmetic. The<br />
mathematical structure underpinning the arithmetic in this standard is the extended<br />
reals, that is, the set of real numbers together with positive and negative infinity.<br />
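As a small illustration of the failure of associativity, the following sketch adds the same three values in two groupings. It assumes Python floats are IEEE 754 binary64 values, which holds on common platforms but is not something this draft specifies; the variable names are illustrative only.

```python
# Assumption: Python's float is an IEEE 754 binary64 value on this platform.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # a + b is rounded first, then c is added
right = a + (b + c)  # b + c is rounded first, then a is added

# The two groupings round differently, so the results are not equal.
print(left == right)
print(repr(left), repr(right))
```

The two sums differ only in the last place, which is enough to show that regrouping an addition can change the rounded result.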
For a given format, the process of rounding (section 4) maps an element of the<br />
extended reals to a representable numerical value included in that format. A<br />
representable numerical value can be mapped to one or more floating-point<br />
values of a format. The set of floating-point values a numerical value maps to is<br />
called the numerical value’s cohort. The elements of a cohort are distinct<br />
representations of the same numerical value. For example, in a binary floating-point<br />
format, the numerical value zero has the cohort {-0, +0}. The floating-point<br />
values of a format consist of:<br />
• tuples (s, e, m); the numerical value of a tuple is $(-1)^s \times b^e \times b^{1-p} \times m$
• +infinity, -infinity<br />
• NaN<br />
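The tuple interpretation and the zero cohort can be sketched as follows. This is a minimal sketch, not part of the draft: it reads the tuple formula as (-1)^s × b^e × b^(1-p) × m, and it assumes b is the radix and p the precision in digits, neither of which is defined in this section. The helper name `tuple_value` and the binary64 defaults (b = 2, p = 53) are hypothetical choices for illustration.

```python
from fractions import Fraction
import math

def tuple_value(s, e, m, b=2, p=53):
    """Exact numerical value of the tuple (s, e, m).

    Assumed reading of the formula: (-1)^s * b^e * b^(1-p) * m,
    with b the radix and p the precision (both assumptions here).
    """
    return (-1) ** s * Fraction(b) ** e * Fraction(b) ** (1 - p) * m

# With p = 53, the significand m = 2^52 and exponent e = 0 give the value 1.
one = tuple_value(0, 0, 2 ** 52)

# Both signed zero tuples have the same numerical value: they form one cohort.
plus_zero = tuple_value(0, 0, 0)
minus_zero = tuple_value(1, 0, 0)

# Python floats show the same cohort {-0, +0}: equal in value,
# yet distinguishable by inspecting the sign.
same_value = (0.0 == -0.0)
signs = (math.copysign(1.0, 0.0), math.copysign(1.0, -0.0))
print(one, plus_zero == minus_zero, same_value, signs)
```

Exact rational arithmetic is used so that the sketch evaluates the formula without introducing rounding of its own.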
For nonzero values, binary formats have constraints on the relation between e and<br />
m which cause each numerical value representable in that format to map to a<br />
unique floating-point value in that format; in other words, nonzero numerical values<br />
have a unique representation in a binary format. Decimal formats do not have the<br />
same constraints; a nonzero numerical value’s cohort can have multiple elements.<br />
For example, if m is a multiple of 10 and e is not emax, (s, e, m) and<br />
(s, e + 1, m / 10) are two representations for the same numerical value.<br />
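Python's `decimal` module can serve as a rough illustration of a decimal cohort with more than one element. It is not an implementation of the formats in this draft, and its exponent convention (an integer coefficient scaled by a power of ten) differs from the (s, e, m) convention used here, so the sketch below shows only the general idea: dividing the coefficient by 10 and raising the exponent by 1 leaves the numerical value unchanged.

```python
from decimal import Decimal

# Two members of one cohort, built from (sign, digits, exponent) tuples.
# Note: this is Python's tuple convention, not the draft's (s, e, m).
x = Decimal((0, (1, 2, 0), -2))  # coefficient 120, exponent -2
y = Decimal((0, (1, 2), -1))     # coefficient 12,  exponent -1

same_value = (x == y)                                # compares numerical values
same_representation = (x.as_tuple() == y.as_tuple()) # compares representations

print(x, y, same_value, same_representation)
```

Comparison with `==` looks only at the numerical value, while `as_tuple()` exposes which member of the cohort is actually stored.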
With one exception, the numerical value of the result of a floating-point arithmetic<br />
operation is only a function of the numerical values of the operands (see section<br />
5). In other words, the representation of the operands may only influence the<br />
representation of the result; the result has the same cohort independent of the<br />
operands’ representations. The exception to this rule is division by zero, in<br />
which case the sign of the zero influences which infinity is returned (see section<br />
7.2); positive and negative infinity are not in the same cohort. Which<br />
representation is used for a result provides some information about the history of<br />
the computation; the decimal-specific operations (section 5.11) can be used to<br />
distinguish among the different representations.<br />
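Both points in this paragraph can be observed with Python's `decimal` module, again used only as a stand-in and not as an implementation of this draft. The sketch assumes the `DivisionByZero` trap is disabled so that division returns an infinity instead of raising an exception; the variable names are illustrative.

```python
from decimal import Decimal, DivisionByZero, localcontext

# Same numerical operands, different representations.
total_a = Decimal("1.20") + Decimal("0.10")
total_b = Decimal("1.2") + Decimal("0.1")

# The results are in the same cohort (equal values), but the chosen
# representation reflects the operands that produced it.
same_value = (total_a == total_b)
representations = (str(total_a), str(total_b))

# The exception: +0 and -0 are in one cohort, yet dividing by them
# yields infinities of opposite sign, which are not in the same cohort.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False  # assumption: return a result, do not raise
    pos_inf = Decimal(1) / Decimal("0")
    neg_inf = Decimal(1) / Decimal("-0")

print(same_value, representations, pos_inf, neg_inf)
```

Here the trailing zero kept in one sum is the kind of history information the paragraph refers to, and the two quotients show the single case where operand representation changes the numerical result.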
Copyright © 2003 by the Institute of Electrical and Electronics Engineers, Inc. This document is an unapproved<br />
draft of a proposed IEEE-SA Standard - USE AT YOUR OWN RISK. See statement on page 1.<br />