DRAFT IEEE Standard for Binary Floating-Point Arithmetic - Sonic.net

DRAFT IEEE Standard for Floating-Point Arithmetic – 2003 August 12 10:20

3.0. Background and Terminology

Floating-point arithmetic is a systematic approximation of real arithmetic. Floating-point arithmetic can represent only a finite subset of the infinitely many real numbers. Additionally, many of the axioms of real arithmetic, such as associativity of addition, do not hold for floating-point arithmetic.

The mathematical structure underpinning the arithmetic in this standard is the extended reals, that is, the set of real numbers together with positive and negative infinity. For a given format, the process of rounding (section 4) maps an element of the extended reals to a representable numerical value included in that format. A representable numerical value can be mapped to one or more floating-point values of a format. The set of floating-point values a numerical value maps to is called the numerical value’s cohort. The elements of a cohort are distinct representations of the same numerical value. For example, in a binary floating-point format, the numerical value zero has the cohort {-0, +0}.

The floating-point values of a format consist of:
• tuples (s, e, m); the numerical value of a tuple is $(-1)^s \times b^e \times b^{1-p} \times m$
• +infinity, -infinity
• NaN

For nonzero values, binary formats have constraints on the relation between e and m which cause each numerical value representable in that format to map to a unique floating-point value in that format; in other words, nonzero numerical values have a unique representation in a binary format. Decimal formats do not have the same constraints; a nonzero numerical value’s cohort can have multiple elements. For example, if m is a multiple of 10 and e is not emax, (s, e, m) and (s, e + 1, m / 10) are two representations of the same numerical value.

With one exception, the numerical value of the result of a floating-point arithmetic operation is a function only of the numerical values of the operands (see section 5).
In other words, the representation of the operands may influence only the representation of the result; the result has the same cohort independent of the operands’ representations. The exception to this rule is division by zero, in which case the sign of the zero influences which infinity is returned (see section 7.2); positive and negative infinity are not in the same cohort. Which representation is used for a result provides some information about the history of the computation; the decimal-specific operations (section 5.11) can be used to distinguish among the different representations.

Copyright © 2003 by the Institute of Electrical and Electronics Engineers, Inc. This document is an unapproved draft of a proposed IEEE-SA Standard - USE AT YOUR OWN RISK. See statement on page 1. Page 16
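The cohort rules above can be illustrated with a short Python sketch. It is not part of the draft: the helper `numerical_value` and its default precision `p=7` are hypothetical, and Python's `decimal` module is used only as a convenient stand-in for a decimal format that keeps distinct representations of equal values. It is not an implementation of this standard.

```python
from decimal import Context, Decimal
from fractions import Fraction


def numerical_value(s, e, m, b=10, p=7):
    """Exact value (-1)^s * b^e * b^(1-p) * m of the tuple (s, e, m).

    The radix b=10 and precision p=7 are arbitrary illustrative choices.
    """
    return (-1) ** s * Fraction(b) ** e * Fraction(b) ** (1 - p) * m


# Two members of one cohort: m is a multiple of 10, so (s, e, m) and
# (s, e + 1, m / 10) denote the same numerical value.
same_value = numerical_value(0, 2, 1230) == numerical_value(0, 3, 123)

# Decimal also keeps distinct representations of numerically equal values.
x, y = Decimal("1.230"), Decimal("1.23")
equal_operands = x == y
distinct_operand_reprs = x.as_tuple() != y.as_tuple()

# The operands' representations affect only the representation of the sum,
# not its numerical value.
sum_x, sum_y = x + 1, y + 1
equal_sums = sum_x == sum_y
distinct_sum_reprs = str(sum_x) != str(sum_y)

# The exception: -0 and +0 are in the same cohort, yet dividing by them
# yields infinities of opposite sign, which are not in the same cohort.
ctx = Context(traps=[])  # no traps, so division by zero returns an infinity
pos_zero, neg_zero = Decimal("0"), Decimal("-0")
quotient_pos = ctx.divide(Decimal(1), pos_zero)
quotient_neg = ctx.divide(Decimal(1), neg_zero)
```

Here `sum_x` and `sum_y` compare equal while printing differently, and `quotient_pos` and `quotient_neg` differ in sign even though `pos_zero == neg_zero` holds.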
An encoding maps a floating-point value to a bit string. An encoding may be able to map NaN and infinity values to more than one bit string. The multiple NaN bit strings may be used to store retrospective diagnostic information (see section 6.2).

Figure 0 --
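As a rough illustration of one floating-point value having several bit strings, the sketch below builds two different 64-bit patterns that both decode to NaN. It assumes the widely used IEEE 754 binary64 layout for Python floats, which is an assumption about the platform and not something this draft text specifies. The helper names and the payload value `0x123` are hypothetical.

```python
import math
import struct


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as a binary64 value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def double_to_bits(x: float) -> int:
    """Reinterpret a binary64 value as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


# Exponent field all ones and a nonzero fraction field encode a NaN.
QUIET_NAN = 0x7FF8000000000000
NAN_WITH_PAYLOAD = 0x7FF8000000000123  # low fraction bits are arbitrary

a = bits_to_double(QUIET_NAN)
b = bits_to_double(NAN_WITH_PAYLOAD)

# Both bit strings decode to NaN even though the patterns differ.
both_nan = math.isnan(a) and math.isnan(b)

# Whether the extra bits survive a round trip depends on the platform;
# this is typically True on common hardware but is not guaranteed here.
payload_survives = double_to_bits(b) == NAN_WITH_PAYLOAD
```

The spare fraction bits are where an implementation could store diagnostic information of the kind mentioned above.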