16.07.2014 Views

DRAFT IEEE Standard for Binary Floating-Point Arithmetic - Sonic.net

DRAFT IEEE Standard for Floating-Point Arithmetic – 2003 August 12 10:20

The smallest positive normal number is $b^{\mathit{emin}}$ and the largest is $b^{\mathit{emax}}(b - b^{1-p})$. The nonzero values with magnitude less than $b^{\mathit{emin}}$ are called subnormal because their magnitudes lie between zero and the smallest normal magnitude. One reason to distinguish between normal and subnormal binary numbers is that formats may encode these two kinds of numbers differently. Every finite representable number is an integral multiple of the smallest subnormal magnitude $b^{\mathit{emin}} \cdot b^{1-p}$.
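
As a rough illustration of the three magnitudes above, the following sketch evaluates them exactly for binary64 and compares them with the values Python reports. It assumes that Python's `float` is binary64 (true of CPython on common platforms); the parameter values come from Table 1a, and the helper name `is_multiple_of_smallest_subnormal` is invented for this example.

```python
import sys
from fractions import Fraction

# binary64 parameters as listed in Table 1a (assumes Python float is binary64)
b, p, emin, emax = 2, 53, -1022, 1023

# Exact rational values of the three magnitudes named in the text
smallest_normal = Fraction(b) ** emin
largest_finite = Fraction(b) ** emax * (b - Fraction(b) ** (1 - p))
smallest_subnormal = Fraction(b) ** emin * Fraction(b) ** (1 - p)


def is_multiple_of_smallest_subnormal(x: float) -> bool:
    """Return True if the finite float x is an integer multiple of the smallest subnormal."""
    return (Fraction(x) / smallest_subnormal).denominator == 1


# Compare against the limits exposed by the interpreter
matches_min = Fraction(sys.float_info.min) == smallest_normal
matches_max = Fraction(sys.float_info.max) == largest_finite
```

`Fraction` is used so that the comparison is exact rather than subject to rounding. Calling `is_multiple_of_smallest_subnormal(0.1)` exercises the last sentence of the paragraph on an ordinary value.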

For any variable that has the value zero, the sign bit s provides an extra bit of information. All formats have distinct representations for +0 and –0, but the sign is significant only in some circumstances, such as division by zero, and not in others (see section 6.3). In this standard, 0 and ∞ are written without a sign when the sign is not important.
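
The sketch below shows one way to observe when the sign of zero matters. It assumes Python's `float` is binary64. Python raises an exception on floating-point division by zero, so instead of dividing, the example reads the sign bit with `math.copysign` and uses `math.atan2`, whose result on the negative real axis depends on the sign of a zero argument.

```python
import math

pos_zero = 0.0
neg_zero = -0.0

# Comparison ignores the sign: the two zeros are equal.
zeros_compare_equal = (pos_zero == neg_zero)

# copysign transfers the sign bit, so it distinguishes the two zeros.
sign_of_pos_zero = math.copysign(1.0, pos_zero)
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# atan2 treats the two zeros differently along the negative real axis.
angle_from_pos_zero = math.atan2(pos_zero, -1.0)
angle_from_neg_zero = math.atan2(neg_zero, -1.0)
```

Here the comparison is a case where the sign is not significant, while `copysign` and `atan2` are cases where it is.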

[In this section, it may be appropriate to note and explain the differences between the real numbers (or real numbers extended with signed infinities) and a format's approximation to real numbers with respect to zero. In other words, while the real number system only has a single zero, floating-point numbers have both positive and negative zero for pragmatic reasons to make certain computations easier to formulate (branch cuts, etc.) and to allow more regular rules (e.g. defining sign of 1/0). -JDD]

Basic Format Parameters Defining Representable Values

| Parameter | binary32 | binary64 | binary128 | decimal32 | decimal64 | decimal128 |
| --- | --- | --- | --- | --- | --- | --- |
| b | 2 | 2 | 2 | 10 | 10 | 10 |
| p digits | 24 | 53 | 113 | 7 | 16 | 34 |
| emax | +127 | +1023 | +16383 | +96 | +384 | +6144 |
| emin | –126 | –1022 | –16382 | –95 | –383 | –6143 |

Table 1a: Representable Value Parameters

[Note: These choices for emin, emax, and p meet the constraints of section 3.1 of IEEE 854:

• (emax − emin)/p shall exceed 4 and should exceed 10

• $b^{p-1} \geq 10^5$.
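
The two constraints listed above can be checked mechanically against the parameters in Table 1a. The following is a minimal sketch of such a check; the dictionary layout and the function name `check_constraints` are assumptions made for this example, and the thresholds are the ones quoted in the note.

```python
from fractions import Fraction

# Parameters copied from Table 1a: name -> (b, p, emax, emin)
FORMATS = {
    "binary32": (2, 24, 127, -126),
    "binary64": (2, 53, 1023, -1022),
    "binary128": (2, 113, 16383, -16382),
    "decimal32": (10, 7, 96, -95),
    "decimal64": (10, 16, 384, -383),
    "decimal128": (10, 34, 6144, -6143),
}


def check_constraints(b, p, emax, emin):
    """Evaluate the constraints quoted in the note for one format."""
    ratio = Fraction(emax - emin, p)  # exact value of (emax - emin)/p
    return {
        "shall_exceed_4": ratio > 4,
        "should_exceed_10": ratio > 10,
        "precision_ok": b ** (p - 1) >= 10 ** 5,
    }


report = {name: check_constraints(*params) for name, params in FORMATS.items()}
```

Integer and rational arithmetic keep the comparisons exact, which matters for the 128-bit formats where `b ** (p - 1)` is far outside the range of a binary64 float.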

Copyright © 2003 by the Institute of Electrical and Electronics Engineers, Inc. This document is an unapproved draft of a proposed IEEE-SA Standard - USE AT YOUR OWN RISK. See statement on page 1.

