
How to Print Floating-Point Numbers Accurately

Guy L. Steele Jr.
Thinking Machines Corporation
245 First Street
Cambridge, Massachusetts 02142
gls@think.com

Abstract

We present algorithms for accurately converting floating-point numbers to decimal representation. The key idea is to carry along with the computation an explicit representation of the required rounding accuracy.

We begin with the simpler problem of converting fixed-point fractions. A modification of the well-known algorithm for radix-conversion of fixed-point fractions by multiplication explicitly determines when to terminate the conversion process; a variable number of digits are produced. The algorithm has these properties:

- No information is lost; the original fraction can be recovered from the output by rounding.
- No "garbage digits" are produced.
- The output is correctly rounded.
- It is never necessary to propagate carries on rounding.

We then derive two algorithms for free-format output of floating-point numbers. The first simply scales the given floating-point number to an appropriate fractional range and then applies the algorithm for fractions. This is quite fast and simple to code but has inaccuracies stemming from round-off errors and oversimplification. The second algorithm guarantees mathematical accuracy by using multiple-precision integer arithmetic and handling special cases. Both algorithms produce no more digits than necessary (intuitively, the "1.3 prints as 1.2999999" problem does not occur).

Finally, we modify the free-format conversion algorithm for use in fixed-format applications. Information may be lost if the fixed format provides too few digit positions, but the output is always correctly rounded. On the other hand, no "garbage digits" are ever produced, even if the fixed format specifies too many digit positions (intuitively, the "4/3 prints as 1.333333328366279602" problem does not occur).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1990 ACM 0-89791-364-7/90/0006/0112 $1.50


Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation. White Plains, New York, June 20-22, 1990.


Jon L White
Lucid, Inc.
707 Laurel Street
Menlo Park, California 94025
jonl@lucid.com

1 Introduction

Isn't it a pain when you ask a computer to divide 1.0 by 10.0 and it prints 0.0999999? Most often the arithmetic is not to blame, but the printing algorithm.

Most people use decimal numerals. Most contemporary computers, however, use a non-decimal internal representation, usually based on powers of two. This raises the problem of conversion between the internal representation and the familiar decimal form.

The external decimal representation is typically used in two different ways. One is for communication with people. This divides into two cases. For fixed-format output, a fixed number of digits is chosen before the conversion process takes place. In this situation the primary goal is controlling the precise number of characters produced so that tabular formats will be correctly aligned. Contrariwise, in free-format output the goal is to print no more digits than are necessary to communicate the value. Examples of such applications are Fortran list-directed output and such interactive language systems as Lisp and APL. Here the number of digits to be produced may be dependent on the value to be printed, and so the conversion process itself must determine how many digits to output.

The second way in which the external decimal representation is used is for inter-machine communication. The decimal representation is a convenient and conventional machine-independent format for transferring numerical information from one computer to another. Using decimal representation avoids incompatibilities of word length, field format, radix (binary versus hexadecimal, for example), sign representation, and so on. Here the main objective is to transport the numerical value as accurately as possible; that the representation is also easy for people to read is a bonus. (The IEEE 754 floating-point standard [IEEE85] has ameliorated this problem by providing standard binary formats, but the VAX, IBM 370, and Cray-1 are still with us and will be for yet a while longer.)


We deal here with the output problem: methods of converting from the internal representation to the external representation. The corresponding input problem is also of interest and is addressed by another paper in this conference [Clinger90]. The two problems are not quite identical because we make the asymmetrical assumption that the internal representation has a fixed number of digits but the external representation may have a varying number of digits. (Compare this with Matula's excellent work, in which both representations are assumed to be of fixed precision [Matula68] [Matula70].)

In the following discussion we assume the existence of a program that solves the input problem: it reads a number in external representation and produces the internal representation whose exact value is closest, of all possible internal representation values, to the exact numerical value of the external number. In other words, this ideal input routine is assumed to round correctly in all cases.
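For concreteness, here is an illustration of ours (not part of the paper): modern CPython's float constructor behaves as such an ideal input routine for decimal input and IEEE double precision, a property we assume below and check for one value.

```python
from fractions import Fraction

# CPython's float() parses a decimal string to the nearest IEEE double;
# that it rounds correctly is our (well-founded) assumption here.
x = float("0.1")
exact = Fraction(x)            # the exact rational value of that double

# 0.1 itself is not representable ...
assert exact != Fraction(1, 10)
# ... but doubles in [1/16, 1/8) are spaced 2**-56 apart, so correct
# rounding means the error is at most half of that spacing
assert abs(Fraction(1, 10) - exact) <= Fraction(1, 2 ** 57)
```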

We present algorithms for fixed-point-fraction output and floating-point output that decide dynamically what is an appropriate number of output digits to produce for the external representation. We recommend the last algorithm, Dragon4, for general use in compilers and run-time systems; it is completely accurate and accommodates a wide variety of output formats.

We express the algorithms in a general form for converting from any radix b (the internal radix) to any other radix B (the external radix); b and B may be any two integers greater than 1, and either may be larger than the other. The reader may find it helpful to think of B as being 10, and b as being either 2 or 16. Informally we use the terms "decimal" and "external radix" interchangeably, and similarly "binary" and "internal radix". We also speak interchangeably of digits being "output" or "printed".

2 Properties of Radix Conversion

We assume that a number to be converted from one radix to another is perfectly accurate and that the goal is to express that exact value. Thus we ignore issues of errors (such as floating-point round-off or truncation) in the calculation of that value.

The decimal representation used to communicate a numerical value should be accurate. This is especially important when transferring values between computers. In particular, if the source and destination computers use identical floating-point representations, we would like the value to be communicated exactly; the destination computer should reconstruct the exact bit pattern that the source computer had. In this case we require that conversion from internal form to external form and back be an identity function; we call this the internal identity requirement. We would also like conversion from external form to internal form and back to be an identity; we call this the external identity requirement. However, there are some difficulties with these requirements.

The first difficulty is that an exact representation is not always possible. A finite integer in any radix can be converted into a finite integer in any other radix. This is not true, however, of finite-length fractions; a fraction that has a finite representation in one radix may have an indefinitely repeating representation in another radix. This occurs only if there is some prime number p that divides the one radix but not the other; then 1/p (among other numbers) has a finite representation in the first radix but not in the second. (It should be noted, by the way, that this relationship between radices is different from the relationship of incommensurability as Matula defines it [Matula70]. For example, radices 10 and 20 are incommensurable, but there is no prime that divides one and not the other.)
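The prime-divisibility criterion is easy to check mechanically. The sketch below is our illustration (not from the paper): a reduced fraction p/q has a finite expansion in a given radix exactly when every prime factor of q divides the radix.

```python
from math import gcd

def finite_representation(p: int, q: int, radix: int) -> bool:
    """True iff p/q (q > 0) has a finite expansion in the given radix,
    i.e. every prime factor of the reduced denominator divides the radix."""
    q //= gcd(p, q)              # reduce the fraction to lowest terms
    g = gcd(q, radix)
    while g > 1:                 # repeatedly strip factors shared with the radix
        while q % g == 0:
            q //= g
        g = gcd(q, radix)
    return q == 1

# 1/2 terminates in decimal; 1/3 does not; 1/10 does not terminate in binary
assert finite_representation(1, 2, 10)
assert not finite_representation(1, 3, 10)
assert not finite_representation(1, 10, 2)
```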

It does happen that any binary, octal, or hexadecimal number has a finite decimal representation (because there is no prime that divides 2^k but not 10). Therefore when B = 10 and b is a power of two we can guarantee the internal identity requirement by printing an exact decimal representation of a given binary number and using an ideal rounding input routine.

This solution is gross overkill, however. The exact decimal representation of an n-bit binary number may require n decimal digits. However, as a rule fewer than n digits are needed by the rounding input routine to reconstruct the exact binary value. For example, to print a binary floating-point number with a 27-bit fraction requires up to 30 decimal digits. (A 27-bit fraction can represent a 30-bit binary fraction whose decimal representation has a non-zero leading digit; this is discussed further below.) However, 10 decimal digits always suffice to communicate the value accurately enough for the input routine to reconstruct the 27-bit binary representation [Matula68], and many particular values can be expressed accurately with fewer than 10 digits. For example, the 27-bit binary floating-point number .110011001100110011001100110_2 × 2^(-3), which is equal to the 29-bit binary fixed-point fraction .00011001100110011001100110011_2, is also equal to the decimal fraction .09999999962747097015380859375, which has 29 digits. However, the decimal fraction 0.1 is closer in value to the binary number than to any other 27-bit floating-point binary number, and so suffices to satisfy the internal identity requirement. Moreover, 0.1 is more concise and more readable.
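The arithmetic in this example can be checked with exact rational arithmetic (a sketch of ours, not part of the paper):

```python
from fractions import Fraction

# the 27-bit significand .110011001100110011001100110 (base 2) with exponent -3
m = 0b110011001100110011001100110
v = Fraction(m, 2 ** 30)          # = (m / 2**27) * 2**-3

# its exact decimal expansion has 29 digits
assert v == Fraction("0.09999999962747097015380859375")

# yet 0.1 lies within half an ulp (the ulp here is 2**-30), so an ideal
# rounding input routine maps "0.1" back to exactly this binary number
assert abs(Fraction(1, 10) - v) < Fraction(1, 2 ** 31)
```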

We would like to produce enough digits to preserve the information content of a binary number, but no more. The difficulty is that the number of digits needed depends not only on the precision of the binary number but on its value.

Consider for a moment the superficially easier problem of converting a binary floating-point number to octal. Suppose for concreteness that our binary numbers have a six-bit significand. It would seem that two octal digits should be enough to express the value of the binary number, because one octal digit expresses three bits of information. However, the precise octal representation of the binary floating-point number .101101_2 × 2^(-1) is .264_8 × 8^0. The exponent of the binary floating-point number specifies a shifting of the significand so that the binary point is "in the middle" of an octal digit. The first octal digit thus represents only two bits of the original binary number, and the third only one bit. We find that three octal digits are needed because of the difference in "graininess" of the exponent (shifting) specifications of the two radices.

The number N of radix-B digits that may be required in general to represent an n-digit radix-b floating-point number is derived in [Matula68]. The condition is:

    B^(N-1) > b^n

This may be reformulated as:

    N > n / log_b B + 1

and if N is to be minimized we therefore have:

    N = 2 + ⌊n / log_b B⌋

No more than this number of digits need ever be output, but there may be some numbers that require exactly that many.
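The minimal N can also be found by exact integer arithmetic, avoiding the logarithm entirely (an illustrative sketch of ours):

```python
def digits_needed(n: int, b: int, B: int) -> int:
    """Smallest N satisfying Matula's condition B**(N-1) > b**n: the number
    of radix-B digits that always suffices (and is sometimes needed) to
    distinguish every n-digit radix-b floating-point significand."""
    N = 1
    while B ** (N - 1) <= b ** n:
        N += 1
    return N

# the PDP-10 case discussed below: 27-bit significands need 10 decimal digits
assert digits_needed(27, 2, 10) == 10
# IEEE double (53 bits) and single (24 bits) give the familiar 17 and 9
assert digits_needed(53, 2, 10) == 17
assert digits_needed(24, 2, 10) == 9
```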

As an example, in converting a binary floating-point number to decimal on the PDP-10 [DEC73] we have b = 2, B = 10, and n = 27. One might naively suppose that 27 bits equals 9 octal digits, and if nine octal digits suffice surely 9 decimal digits should suffice also. However, using Matula's formula above, we find that to print such a number may require 2 + ⌊27/(log_2 10)⌋ = 2 + ⌊27/(3.3219...)⌋ = 2 + ⌊8.1278...⌋ = 10 decimal digits. There are indeed some numbers that require 10 digits, such as the one which is greater than the rounded 27-bit floating-point approximation to 0.1 by two units in the last place. On the PDP-10 this is the number represented in octal as 175631463150_8, which represents the value .631463150_8 × 8^(-1). It is intuitively easy to see why ten digits might be needed: splitting the number into an integer part (which is zero) and a fractional part must shift three zero bits into the significand from the left, producing a 30-bit fraction, and 9 decimal digits is not enough to represent all 30-bit fractions.

The condition specified above assumes that the conversions in both directions round correctly. There are other possibilities, such as using truncation (always rounding towards zero) instead of true rounding. If the input conversion rounds but the output conversion truncates, the internal identity requirement can still be met, but more external digits may be required; the condition is B^(N-1) > 2 × b^n - 1. If conversion truncates in both directions, however, it is possible that the internal identity requirement cannot be met, and that repeated conversions may cause a value to "drift" very far from its original value. This implies that if a data base is maintained in decimal, and is updated by converting it to binary, processing it, and then converting back to decimal, values not updated may nevertheless be changed, after many iterations, by a substantial amount [Matula70]. That is why we assume and require proper rounding. Rounding can guarantee the integrity of the data (in the form of the internal identity requirement) and also will minimize the number of output digits. Truncation cannot and will not.

The external identity requirement obviously cannot be strictly satisfied, because by assumption there are a finite number of internally representable values and an infinite number of external values. For every internal value there are many corresponding external representations. From among the several external representations that correspond to a given internal representation, we prefer to get one that is closest in value but also as short as possible.

Satisfaction of the internal identity requirement implies that an external consistency requirement be met: if an internal value is printed out, then reading that exact same external value and then printing it once more will produce the same external representation. This is important to the user of an interactive system. If one types PRINT 3.1415926535 and the system replies 3.1415926, the user might think "Well and good; the computer has truncated it, and that's all I need type from now on." The user would be rightly upset to then type in PRINT 3.1415926 and receive the response 3.1415925! Many language implementations indeed exhibit such undesirable "drifting" behavior.

Let us formulate our desires. First, in the interest of brevity and readability, conversion to external form should produce as few digits as possible. Now suppose, for some internal number, that there are several external representations that can be used: all have the same number of digits, all can be used to reconstruct the internal value precisely, and no shorter external number suffices for the reconstruction. In this case we prefer the one that is closest to the value of the binary number. (As we shall see, all the external numbers must be alike except in the last digit; satisfying this criterion therefore involves only the correct choice for the last digit to be output.)


On the other hand, suppose that the number of digits to be output has been predetermined (fixed-format output); then we desire that the printed value be as close as possible to the binary value. In short, the external form should always be correctly rounded. (The Fortran 77 standard requires that floating-point output be correctly rounded [ANSI76, 13.9.5.2]; however, many Fortran implementations do not round correctly, especially for numbers of very large or very small magnitude.)

We first present and prove an algorithm for printing fixed-point fractions that has four useful properties:

(a) Information preservation. No information is lost in the conversion. If we know the length n of the original radix-b fraction, we can recover the original fraction from the radix-B output by converting back to radix-b and rounding to n digits.

(b) Minimum-length output. No more digits than necessary are output to achieve (a). (This implies that the last digit printed is non-zero.)

(c) Correct rounding. The output is correctly rounded; that is, the output generated is the closest approximation to the original fraction among all radix-B fractions of the same length, or is one of two closest approximations, and in the latter case it is easy to choose either of the two by any arbitrary criterion.

(d) Left-to-right generation. It is never necessary to propagate carries. Once a digit has been generated, it is the correct one.

These properties are characterized still more rigorously in the description of the algorithm itself.

From this algorithm we then derive a similar method for printing integers. The two methods are then combined to produce two algorithms for printing floating-point numbers, one for free-format applications and one for fixed-format applications.

3 Fixed-Point Fraction Output

Following [Knuth68], we use the notation ⌊x⌋ to mean the greatest integer not greater than x. Thus ⌊3.5⌋ = 3 and ⌊-3.5⌋ = -4. If x is an integer, then ⌊x⌋ = x. Similarly, we use the notation ⌈x⌉ to mean the smallest integer not less than x; ⌈x⌉ = -⌊-x⌋. Also following [Knuth68], we use the notation {x} to mean x mod 1, or x - ⌊x⌋, the fractional part of x. We always indicate numerical multiplication of a and b explicitly as a × b, because we use simple juxtaposition to indicate digits in a place-value notation.

The idea behind the algorithm is very simple. The potential error in a fraction of limited precision may be described as being equal to one-half the minimal non-zero value of the least significant digit of the fraction. The algorithm merely initializes a variable M to this value to represent the precision of the fraction. Whenever the fraction is multiplied by B, the error M is also multiplied by B. Therefore M always has a value against which the residual fraction may be meaningfully compared to decide whether the precision has been exhausted.

Algorithm (FP)³:
Finite-Precision Fixed-Point Fraction Printout¹

Given:

- An n-digit radix-b fraction f, 0 < f < 1:

      f = .f_{-1}f_{-2}f_{-3}...f_{-n} = Σ_{i=1}^{n} f_{-i} × b^(-i)

- A radix B (an integer greater than 1).

Output: The digits F_{-i} of an N-digit (N ≥ 1) radix-B fraction F:

      F = .F_{-1}F_{-2}F_{-3}...F_{-N} = Σ_{i=1}^{N} F_{-i} × B^(-i)

such that:

(a) |F - f| < b^(-n)/2

(b) N is the smallest integer ≥ 1 such that (a) can be true.

(c) |F - f| ≤ B^(-N)/2

(d) Each digit is output before the next is generated; it is never necessary to "back up" for corrections.

Procedure: see Table 1. ∎

The algorithmic procedure is expressed in pseudo-ALGOL. We have used the loop-while-repeat ("n + ½ loop") construct credited to Dahl [Knuth74]. The loop is exited from the while-point if the while-condition is ever false; thus one may read "while B:" as "if not B then exitloop fi". The cases statement is to be taken as a guarded if construction [Dijkstra76]; the branch labeled by the true condition is executed, and if both conditions are true (here, if R = ½) either branch may be chosen. The ambiguity in the cases statement here reflects the point of ambiguity when two radix-B representations of length N are equidistant from f. A given implementation may use any decision method when R = ½, such as "always U", which effectively means to round down; "always U + 1", meaning to round up; or "if U ≡ 0 (mod 2) then U else U + 1", meaning to round so that the last digit is even. Note, however, that using "always U" does not always round down if rounding up will allow one fewer digit to be printed.

Table 1: Procedure for Finite-Precision Fixed-Point Fraction Printout ((FP)³)

begin
    k ← 0;
    R ← f;
    M ← b^(-n) / 2;
    loop
        k ← k + 1;
        U ← ⌊R × B⌋;
        R ← {R × B};
        M ← M × B;
    while R ≥ M and R ≤ 1 - M:
        F_{-k} ← U;
    repeat;
    cases
        R ≤ ½: F_{-k} ← U;
        R ≥ ½: F_{-k} ← U + 1;
    endcases;
    N ← k;
end

¹It is difficult to give short, meaningful names to each of a group of algorithms that are similar. Here we give each algorithm a long name that has enough adjectives and other qualifiers to distinguish it from the others, but these long names are unwieldy. For convenience, and as a bit of a joke, two kinds of abbreviated names are used here. Algorithms that are intermediate versions along a path of development have names such as "(FP)³" that are effectively contracted acronyms for the long names. Algorithms that are in "final form" have names of the form "Dragonk" for integers k; these algorithms have long names whose acronyms form sequences of letters "F" and "P" that specify the shape of so-called "dragon curves" [Gardner67].

This method is a generalization of the one presented by Taranto [Taranto59] and mentioned in exercise 4.4.3 of [Knuth69].²
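As an illustration (our sketch, not part of the paper), the procedure of Table 1 transcribes directly into executable form using exact rational arithmetic:

```python
from fractions import Fraction

def fp3(f: Fraction, n: int, b: int, B: int) -> list[int]:
    """Algorithm (FP)^3: emit the digits F_-1, F_-2, ... of the shortest
    correctly rounded radix-B rendering of the n-digit radix-b fraction f."""
    digits = []
    R = f                          # residual fraction still to be printed
    M = Fraction(1, 2 * b ** n)    # half an ulp of the input precision
    while True:
        R *= B
        M *= B
        U, R = divmod(R, 1)        # U = floor(R x B), new R = {R x B}
        if R < M or R > 1 - M:     # precision exhausted: stop and round
            break
        digits.append(int(U))
    # final digit: round down (U) or up (U + 1); a tie R == 1/2 may go either way
    digits.append(int(U) + (1 if R >= Fraction(1, 2) else 0))
    return digits

# the 27-bit approximation to 0.1 (n = 30, counting the 2**-3 shift) prints as "0.1"
assert fp3(Fraction(0b110011001100110011001100110, 2 ** 30), 30, 2, 10) == [1]
# 5/8 = .101 in binary needs only one decimal digit: "0.6" recovers it on input
assert fp3(Fraction(5, 8), 3, 2, 10) == [6]
```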

4 Proof of Algorithm (FP)³

(The reader who is more interested in practical applications than in details of proof may wish to skip this section.)

In order to demonstrate that Algorithm (FP)³ satisfies the four claimed properties (a)-(d), it is useful first to prove, by induction on k, the following invariants true at the top of the loop body:

    M_k = (b^(-n) × B^k) / 2                                (i)

    R_k × B^(-k) + Σ_{i=1}^{k} F_{-i} × B^(-i) = f          (ii)

²By the way, the paper that was forward-referenced in the answer to exercise 4.4.3 in [Knuth81] was an early draft of this paper that contained only Algorithm (FP)³, its proof, and a not-yet-correct generalization to floating-point numbers.

By M_k and R_k we mean the values of M and R at the top of the loop as a function of k (the values of M and R change if and only if k is incremented). Invariant (i) is easily verified; we shall take it for granted. Invariant (ii) is a little more complicated.

Basis. After the first two assignments in the procedure, k = 0 and R_0 = f. The summation in (ii) has no summands and is therefore zero, yielding

    R_0 × B^(-0) + 0 = f

which is true, because initially R = f.

Induction. Suppose (ii) is true for k. We note that the only path backward in the loop sets F_{-(k+1)} ← ⌊R_k × B⌋. It follows that:

    f = R_k × B^(-k) + Σ_{i=1}^{k} F_{-i} × B^(-i)
      = (B × R_k) × B^(-(k+1)) + Σ_{i=1}^{k} F_{-i} × B^(-i)
      = ({R_k × B} + ⌊R_k × B⌋) × B^(-(k+1)) + Σ_{i=1}^{k} F_{-i} × B^(-i)
      = (R_{k+1} + F_{-(k+1)}) × B^(-(k+1)) + Σ_{i=1}^{k} F_{-i} × B^(-i)
      = R_{k+1} × B^(-(k+1)) + Σ_{i=1}^{k+1} F_{-i} × B^(-i)

which establishes the desired invariant.

The procedure definitely terminates, because M initially has a strictly positive value and is multiplied by B (which is greater than 1) each time through the loop. Eventually we have M > ½, at which point R ≥ M and R ≤ 1 - M cannot both be true and the loop must terminate. (Of course, the loop may terminate before M > ½, depending on R.)

When the procedure terminates, there may be one of two cases, depending on R_N. Because of the rounding step, we have one of the following:

    f = F + R_N × B^(-N)          if R_N ≤ ½
    f = F - (1 - R_N) × B^(-N)    if R_N ≥ ½

Letting R* denote R_N in the first case and 1 - R_N in the second (so that R* ≤ ½ in either case), these become:

    f - F = R* × B^(-N)    if R_N ≤ ½
    F - f = R* × B^(-N)    if R_N ≥ ½

From this we have:

    |F - f| = R* × B^(-N) ≤ B^(-N)/2

proving property (c) (correct rounding).

To prove property (a) (information preservation), we consider the two cases M_N > ½ and M_N ≤ ½. If we have M_N > ½, then because R* ≤ ½ < M_N we have R* < M_N, and so

    |F - f| = R* × B^(-N) < M_N × B^(-N) = b^(-n)/2


Procedure: If v = 0 then simply print "0.0". Otherwise proceed as follows. First compute v' = v ⊗ B^(-x), where "⊗" denotes floating-point multiplication and where x is chosen so as to make v' < b^p. (If exponential format is to be used, one normally chooses x so that the result either is between 1/B and 1 or is between 1 and B.) Next, print the integer part of v' by any suitable method, such as the usual division-remainder technique; after that print a decimal point. Then take the fractional part f of v', let n = p - ⌊log_b v'⌋ - 1, and apply Algorithm (FP)³ to f and n. Finally, if x was not zero, the letter "E" and a representation of the scale factor x are printed. ∎
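A rough executable sketch of this procedure follows (ours, not the paper's; the variable names and the choice of IEEE double precision, p = 53 with b = 2, are assumptions, and the floating-point scaling step is exactly where the inaccuracy discussed below creeps in):

```python
import math
from fractions import Fraction

def dragon2(v: float, p: int = 53, B: int = 10) -> str:
    """Dragon2-style free-format output for v >= 0: scale into [1, B),
    then convert the fractional part with the (FP)^3 digit loop of Table 1."""
    if v == 0.0:
        return "0.0"
    x = math.floor(math.log(v, B))      # scale factor: v' should land in [1, B)
    vp = v * float(B) ** (-x)           # floating multiply: may round (inexact!)
    I = math.floor(vp)                  # integer part, printed directly
    f = Fraction(vp) - I                # fractional part, taken exactly
    n = p - math.floor(math.log2(vp)) - 1   # bits of precision left in f
    digits, R, M = [], f, Fraction(1, 2 ** (n + 1))
    while True:                         # the (FP)^3 loop
        R, M = R * B, M * B
        U, R = divmod(R, 1)
        if R < M or R > 1 - M:
            break
        digits.append(int(U))
    digits.append(int(U) + (1 if R >= Fraction(1, 2) else 0))
    out = f"{I}." + "".join(map(str, digits))
    return out + (f"E{x}" if x != 0 else "")

assert dragon2(1.0 / 10.0) == "1.0E-1"   # not 0.0999999...
assert dragon2(1.5) == "1.5"
```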

We note that if a floating-point number is the same size (in bits, or radix-b digits, or whatever) as an integer, then arithmetic on integers of that size is often sufficiently accurate, because the bits used for the floating-point exponent are usually more than enough for the extra digits of precision required.

This method for floating-point output is not entirely accurate, but is very simple to code, typically requiring only single-precision integer arithmetic. For applications where producing pleasant output is important, program and data space must be minimized, and the internal identity requirement may be relaxed, this method is excellent. An example of such an application might be an implementation of BASIC for a microcomputer, where pleasant output and memory conservation are more important than absolutely guaranteed accuracy. [The last sentence was written circa 1981. Nowadays all but the tiniest computers have the memory space for the full algorithm.] This algorithm was used for many years in the MacLisp interpreter and found to be satisfactory for most purposes.

There are two problems that prevent Algorithm Dragon2 from being completely accurate for floating-point output.

The first problem is that the spacing between adjacent floating-point numbers of a given precision is not uniform. Let v be a floating-point number, and let v⁻ and v⁺ be respectively the next-lower and next-higher floating-point numbers of the same precision. For most values of v, we have v⁺ - v = v - v⁻. However, if v is an integral power of b, then instead v⁺ - v = b × (v - v⁻); that is, the gap below v is smaller than the gap above v by a factor of b.

The second problem is that the scaling may cause<br />

loss of information. There may be two numbers, very<br />

close <strong>to</strong>gether in value, which when scaled by the same<br />

power of B result in the same scaled number, because of<br />

rounding. The problem is inherent in the floating-point<br />

representation. See [Matula68] for further discussion of<br />

this phenomenon.<br />


Table 2: Procedure for Indefinite-Precision Integer Printout ((IP)²)

begin
  k ← 0;
  R ← d;
  M ← b^n;
  S ← 1;
  loop
    S ← S × B;
    k ← k + 1;
  while (2 × R) + M > 2 × S :
  repeat;
  H ← k − 1;
  assert S = B^(H+1)
  loop
    k ← k − 1;
    S ← S/B;
    U ← ⌊R/S⌋;
    R ← R mod S;
  while 2 × R ≥ M and 2 × R ≤ (2 × S) − M :
    D_k ← U;
  repeat;
  cases
    2 × R ≤ S : D_k ← U;
    2 × R ≥ S : D_k ← U + 1;
  endcases;
  N ← k;
end

6 Conversion of Integers

To avoid the round-off errors introduced by scaling floating-point numbers, we develop a completely accurate floating-point output routine that performs no floating-point computations. As a first step, we exhibit an algorithm for printing integers that has a termination criterion similar to that of Algorithm (FP)³.

Algorithm (IP)²:
Indefinite-Precision Integer Printout

Given:
• An h-digit radix-b integer d, accurate to position n (n ≥ 0):
    d = d_h d_(h−1) … d_(n+1) d_n (n zeros). = Σ_{i=n}^{h} d_i × b^i
• A radix B.

Output: Integers H and N (H ≥ N ≥ 0) and an H-digit radix-B integer D:
    D = D_H D_(H−1) … D_(N+1) D_N (N zeros). = Σ_{i=N}^{H} D_i × B^i
such that:
(a) |D − d| < b^n / 2
(b) N is the largest integer, and H the smallest, such that (a) can be true.
(c) |D − d| ≤ B^N / 2
(d) Each digit is output before the next is generated.

Procedure: see Table 2. ∎

In Algorithm (IP)² all quantities are integers. We avoid the use of fractional quantities by introducing a scaling factor S and logically replacing the quantities R and M by R/S and M/S. Whereas in Algorithm (FP)³ the variables R and M were repeatedly multiplied by B, in Algorithm (IP)² the variable S is repeatedly divided by B.
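The loops of Table 2 translate almost line for line into ordinary arbitrary-precision integer arithmetic. The following sketch in Python is ours, not the paper's; variable names follow the table:

```python
def integer_printout(d, b, n, B):
    """Sketch of Algorithm (IP)^2: convert the radix-b integer d,
    accurate to position n (i.e. to within b**n / 2), into the
    shortest radix-B digit string that preserves that accuracy."""
    R, M, S, k = d, b ** n, 1, 0
    # First loop: find the weight of the leading digit.
    while (2 * R) + M > 2 * S:
        S *= B
        k += 1
    digits = []
    # Second loop: generate digits until the remainder is resolved.
    while True:
        k -= 1
        S //= B
        U, R = divmod(R, S)
        if 2 * R < M or 2 * R > (2 * S) - M:
            break
        digits.append(U)
    # Final cases statement: round the last digit correctly.
    digits.append(U if 2 * R <= S else U + 1)
    return digits, k  # digits D_H .. D_N, and the position N
```

With d = 314159 known exactly (n = 0), all six digits appear; known only to the hundreds position (n = 2), four digits suffice and the last is rounded up.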

7 Free-Format Floating-Point Output

From these two algorithms, (FP)³ and (IP)², it is now easy to synthesize an algorithm for "perfect" conversion of a (positive) fixed-precision floating-point number to a free-format floating-point number in another radix. We shall assume that a floating-point number f is represented as a tuple of three integers: the exponent e, the "fraction" or "significand" m, and the precision p. Together these integers represent the mathematical value f = m × b^(e−p). We require m < b^p; a normalized representation additionally requires m ≥ b^(p−1), but we do not depend on this.

We define the function shift_b of two integer arguments x and n (x ≥ 0):

    shift_b(x, n) ≡ ⌊x × b^n⌋

This function is intended to be trivial to implement in radix-b arithmetic: it is the result of shifting x by n radix-b digit positions, to the left for positive n (introducing low-order zeros) or to the right for negative n (discarding digits shifted to the right of the "radix-b point").
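In code, shift_b is a one-liner; this small sketch (ours, not the paper's) uses Python integers, where the floor in the definition is just integer division:

```python
def shift_b(x, n, b=2):
    """shift_b(x, n) = floor(x * b**n): shift x left by n radix-b
    digit positions when n >= 0, right (discarding digits) when n < 0."""
    return x * b ** n if n >= 0 else x // b ** (-n)
```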

Algorithm (FPP)²:
Fixed-Precision Positive Floating-Point Printout

Given:
• Three radix-b integers e, f, and p, representing the number v = f × b^(e−p), with p ≥ 0 and 0 < f < b^p.
• A radix B.

Output: Integers H and N and digits D_k (H ≥ k ≥ N) such that if one defines the value
    V = Σ_{i=N}^{H} D_i × B^i
then:
(a) (v⁻ + v)/2 < V < (v + v⁺)/2


Table 3: Fixed-Precision Positive Floating-Point Printout ((FPP)²)

begin
  assert f ≠ 0
  R ← shift_b(f, max(e − p, 0));
  S ← shift_b(1, max(0, −(e − p)));
  assert R/S = f × b^(e−p) = v
  M⁻ ← shift_b(1, max(e − p, 0));
  M⁺ ← M⁻;
  Simple-Fixup;
  H ← k − 1;
  loop
    assert (R/S) × B^k + Σ_{i=k}^{H} D_i × B^i = v
    k ← k − 1;
    U ← ⌊(R × B)/S⌋;
    R ← (R × B) mod S;
    M⁻ ← M⁻ × B;
    M⁺ ← M⁺ × B;
    low ← 2 × R < M⁻;
    high ← 2 × R > (2 × S) − M⁺;
  while (not low) and (not high) :
    D_k ← U;
  repeat;
  comment Let V_k = Σ_{i=k+1}^{H} D_i × B^i.
  assert low ⟹ (v⁻ + v)/2 < (U × B^k + V_k) ≤ v
  assert high ⟹ v ≤ ((U + 1) × B^k + V_k) < (v + v⁺)/2
  cases
    low and not high : D_k ← U;
    high and not low : D_k ← U + 1;
    low and high :
      cases
        2 × R ≤ S : D_k ← U;
        2 × R ≥ S : D_k ← U + 1;
      endcases;
  endcases;
  N ← k;
end;

Table 4: Procedure Simple-Fixup

procedure Simple-Fixup;
begin
  assert f ≠ 0
  assert R/S = v
  assert M⁻/S = M⁺/S = b^(e−p)
  if f = shift_b(1, p − 1) then
    M⁺ ← shift_b(M⁺, 1);
    R ← shift_b(R, 1);
    S ← shift_b(S, 1);
  fi;
  k ← 0;
  loop
    assert (R/S) × B^k = v
    assert (M⁻/S) × B^k = v − v⁻
    assert (M⁺/S) × B^k = v⁺ − v
  while R < ⌈S/B⌉ :
    k ← k − 1;
    R ← R × B;
    M⁻ ← M⁻ × B;
    M⁺ ← M⁺ × B;
  repeat;
  assert k = min(0, 1 + ⌊log_B v⌋)
  loop
    assert (R/S) × B^k = v
    assert (M⁻/S) × B^k = v − v⁻
    assert (M⁺/S) × B^k = v⁺ − v
  while (2 × R) + M⁺ > 2 × S :
    S ← S × B;
    k ← k + 1;
  repeat;
  assert k = 1 + ⌊log_B((v + v⁺)/2)⌋
end;


The formula for v⁺ does not have to be conditional, even when the representation for v⁺ requires a larger exponent e than v does in order to satisfy f < b^p. The formula for v⁻, however, must take into account the situation where f = b^(p−1), because v⁻ may then use the next-smaller value for e, and so the gap size is smaller by a factor of b. There may be some question as to whether this is the correct criterion for floating-point numbers that are not normalized. Observe, however, that this is the correct criterion for IEEE standard floating-point format [IEEE85], because all such numbers are normalized except those with the smallest possible exponent, so if v is denormalized then v⁻ must also be denormalized and have the same value for the exponent e.

To compensate for the phenomenon of unequal gaps, the variables M⁻ and M⁺ are given the value that M would have in Algorithm (IP)², and then adjustments are made to M⁺, R, and S if necessary. The simplest way to account for unequal gaps would be to divide M⁻ by b. However, this might cause M⁻ to have a non-integral value in some situations. To avoid this, we instead scale the entire algorithm by an additional factor of b, by multiplying M⁺, R, and S by b.

The presence of unequal gaps also induces an asymmetry in the termination test that reveals a previously hidden problem. In Algorithm (IP)², with M⁻ = M⁺ = M, it was the case that if 2 × R > (2 × S) − M and 2 × R < S, then necessarily 2 × R < M also. Here this is not necessarily so, because M⁻ may be smaller than M⁺ by a factor of b. The interpretation of this is that in some situations one may have 2 × R < S but nevertheless must round up to the digit U + 1, because the termination test succeeds relative to M⁺ but fails relative to M⁻.

This problem is solved by rewriting the termination test into two parts. The boolean variable low is true if the loop may be exited because M⁻ is satisfied (in which case the digit U may be used). The boolean variable high is true if the loop may be exited because M⁺ is satisfied (in which case the digit U + 1 may be used). If both variables are true, then either digit, U or U + 1, may be used as far as information-preservation is concerned; in this case, and this case only, the comparison of 2 × R to S should be done to satisfy the correct-rounding property.

Algorithm (FPP)² is conceptually divided into four parts. First, the variables R, S, M⁻, and M⁺ are initialized. Second, if the gaps are unequal then the necessary adjustment is made. Third, the radix-B weight H of the first digit is determined by two loops. The first loop is needed for positive H, and repeatedly multiplies S by B; the second loop is needed for negative H, and repeatedly multiplies R, M⁻, and M⁺ by B. (Divisions by B are of course avoided to prevent roundoff errors.) Fourth, the digits are generated by the last loop; this loop should be compared with the one in Table 1.
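The four parts compose into a compact sketch in Python (our illustration under the table's naming conventions, not code from the paper); m_minus and m_plus play the roles of M⁻ and M⁺:

```python
def float_printout(f, e, p, b, B):
    """Sketch of Algorithm (FPP)^2: free-format conversion of
    v = f * b**(e-p), with 0 < f < b**p, using integers only."""
    # Part 1: initialize so that R/S = v and m/S is the gap size.
    R = f * b ** max(e - p, 0)
    S = b ** max(0, -(e - p))
    m_minus = m_plus = b ** max(e - p, 0)
    # Part 2 (Simple-Fixup): if v sits just above a power of b, the gap
    # below is smaller by b; scale everything except m_minus up by b.
    if f == b ** (p - 1):
        m_plus *= b
        R *= b
        S *= b
    # Part 3: fix the weight of the first digit.
    k = 0
    while R < -(-S // B):            # R < ceil(S/B): v is small
        k -= 1
        R *= B; m_minus *= B; m_plus *= B
    while 2 * R + m_plus > 2 * S:    # v is large
        S *= B
        k += 1
    H = k - 1
    # Part 4: generate digits until low or high permits termination.
    digits = []
    while True:
        k -= 1
        U, R = divmod(R * B, S)
        m_minus *= B; m_plus *= B
        low = 2 * R < m_minus
        high = 2 * R > 2 * S - m_plus
        if low or high:
            break
        digits.append(U)
    if low and not high:
        digits.append(U)
    elif high and not low:
        digits.append(U + 1)
    else:
        digits.append(U if 2 * R <= S else U + 1)
    return digits, H  # leading digit D_H has weight B**H
```

For v = 13/16 in a four-bit binary format this yields the single digit 8 with H = −1, i.e. "0.8", the shortest decimal that reads back as 13/16; for v = 8/16 = 0.5 the fixup branch fires and the result is the digit 5.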

8 Fixed-Format Floating-Point Output

Algorithm (FPP)² simply generates digits, stopping only after producing the appropriate number of digits for precise free-format output. It needs more flexible means of cutting off digit generation for fixed-format applications. Moreover, it is desirable to intersperse formatting with digit generation rather than buffering digits and then re-traversing the buffer.

It is not difficult to adapt Algorithm (FPP)² for fixed-format output, where the number of digits to be output is predetermined independently of the floating-point value to be printed. Using this algorithm avoids giving the illusion of more internal precision than is actually present, because it cuts off the conversion after sufficiently many digits have been produced; the remaining character positions are then padded with blanks or zeros.

As an example, suppose that a Fortran program is written to calculate π, and that it indeed calculates a floating-point number that is the best possible approximation to π for that floating-point format, precise to, say, 27 bits. However, it then prints the value with format F20.18, producing the output "3.141592651605606079".

This is indeed accurately the value of that floating-point number to that many places, but the last ten digits are in some sense not justified, because the internal representation is not that precise. Moreover, this output certainly does not represent the value of π accurately; in format F20.18, π should be printed as "3.141592653589793238". The free-format printing procedure described above would cut off the conversion after nine digits, producing "3.14159265" and no more. The fixed-format procedure to be developed below would therefore produce "3.14159265          " or "3.141592650000000000". Either of these is much less misleading as to internal precision.
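The numbers in this example are easy to check with exact rational arithmetic; the following snippet (ours, for verification only) reproduces both the 27-bit significand and its 18-place rendering:

```python
from fractions import Fraction
import math

# Best 27-bit binary approximation to pi: significand f with
# f / 2**25 close to pi.  (math.pi carries 53 bits, far more than
# needed to round correctly to 27.)
f = round(math.pi * 2 ** 25)
v = Fraction(f, 2 ** 25)

# Render v to 18 decimal places, as Fortran format F20.18 would.
scaled = round(v * 10 ** 18)
text = f"{scaled // 10 ** 18}.{scaled % 10 ** 18:018d}"
```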

The fixed-format algorithm essentially uses the free-format output method, differing only in when conversion is cut off. If conversion is cut off by exhaustion of precision, then any remaining character positions are padded with blanks or zeros. If conversion is cut off by the format specification, the only problem is producing correctly rounded output. This is easily almost-solved by noting that the remainder R can correctly direct rounding at any digit position, not just the last. In terms of the programs, almost all that is necessary is to terminate a conversion loop prematurely if appropriate, and the following cases statement will properly round the last digit produced.


This doesn't solve the entire problem, however, because if conversion is cut off by the format specification then the early rounding may require propagation of carries; that is, the proof of property (d) fails to hold. This can be taken care of by adjusting the values of M⁻ and M⁺ appropriately. In principle this adjustment sometimes requires the use of non-integral values that cannot be represented exactly in radix b; in practice such values can be rounded to get the proper effect.
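One way to realize this adjustment (a sketch of ours, not code from the paper; the actual CutoffAdjust procedure differs in detail) is to widen M⁻ and M⁺ to at least the scaled weight of one unit in the last requested digit position, so that digit generation terminates at the cutoff on its own and the ordinary rounding cases apply there:

```python
def widen_for_cutoff(m_minus, m_plus, S, B, cutoff_place, k):
    """Widen the tolerances so generation stops at cutoff_place.
    y approximates S * B**(cutoff_place - k), the scaled weight of
    one unit in the last requested position; when that weight is not
    an integer it is rounded up, conservatively."""
    a = cutoff_place - k
    y = S
    if a >= 0:
        for _ in range(a):
            y *= B
    else:
        for _ in range(-a):
            y = -(-y // B)  # ceiling division
    return max(m_minus, y), max(m_plus, y)
```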

As noted above, it is indeed not difficult to adapt the simplified algorithm for a particular fixed format. However, straightforwardly adapting it to handle a variety of fixed formats is clumsy. We tried to do this, and found that we had to introduce many switches and flags to control the various aspects of formatting, such as total field width, number of digits before and after the decimal point, width of exponent field, scale factor (as for "P" format specifiers in Fortran), and so on. The result was a tangled mess, very difficult to understand and maintain.

We finally untangled the mess by completely separating the generation of digits from the formatting of the digits. We added a few interface parameters to produce a digit generator (called Dragon4) to which a variety of formatting processes could be easily interfaced. The generator and the formatter execute as coroutines, and it is assumed that the "user" process executes as a third coroutine.

For purposes of exposition we have borrowed certain features of Hoare's CSP (Communicating Sequential Processes) notation [Hoare78]. The statement

    GENERATE!(x₁, …, xₙ)

means that a message containing values x₁, …, xₙ is to be sent to the GENERATE process; the process that executes such a statement then continues its own execution without waiting for a reply. The statement

    FORMAT?(x₁, …, xₙ)

means that a message containing values x₁, …, xₙ is to be received from the FORMAT process; the process that executes such a statement waits if necessary for a message to arrive. (The notation is symmetrical in that the sender and receiver of a message must each know the other's name.) We do not borrow any of the control structure notations of CSP, and we do not worry about the matter of processes failing.

To perform floating-point output one must effectively execute three coroutines together. In the CSP notation one writes:

    [ USER :: user process
    ‖ FORMAT :: formatting process
    ‖ GENERATE :: Dragon4 ]


The interface convention is that the "user process" must first send the message

    FORMAT!(b, e, f, p, B)

to initiate floating-point conversion and then may receive characters from the FORMAT process until an "@" character is received. The "@" character serves as a terminator and should be discarded. Any of the formatting processes shown below may be used; to get free-format conversion, for example, one would execute

    [ USER :: user process
    ‖ FORMAT :: Free-Format
    ‖ GENERATE :: Dragon4 ]

and to get fixed-format exponential formatting one would execute

    [ USER :: user process
    ‖ FORMAT :: Fixed-Format Exponential
    ‖ GENERATE :: Dragon4 ]

We have found it easy to write new formatting routines.
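The coroutine structure is easy to mimic with generators; the sketch below is our illustration, with Python generators standing in for the CSP processes and invented helper names. digit_stream plays the role of GENERATE, and the formatter consumes it in the manner of the Free-Format Exponential process (str(expt) stands in for the tables' explicit exponent-digit loop):

```python
def digit_stream(digits, H):
    """Yield (digit, k) pairs, most significant first, then an
    endless supply of (-1, k) fillers, as Dragon4's tail loop does."""
    k = H
    for d in digits:
        yield d, k
        k -= 1
    while True:
        yield -1, k
        k -= 1

def free_format_exponential(digits, H):
    """Format a digit stream as D.DDD...E<expt> (terminator omitted)."""
    gen = digit_stream(digits, H)
    U, expt = next(gen)          # first digit fixes the exponent
    out = [str(U), "."]
    for U, k in gen:
        if U == -1:              # filler digit: precision exhausted
            break
        out.append(str(U))
    if k == expt - 1:            # ensure at least one fraction digit
        out.append("0")
    out.append("E")
    out.append(str(expt))
    return "".join(out)
```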

9 Implementations

In 1981 we coded a version of Algorithm Dragon4 in Pascal, including the formatting routines shown here as well as formatters for Fortran E, F, and G formats and for PL/I-style picture formats such as $$$,$$$,$$9.99CR. This suite has been tested thoroughly. This algorithm has also been used in various Lisp implementations for both free-format and fixed-format floating-point output. A portable and performance-tuned implementation in C is in progress.

10 Historical Note and Acknowledgments

The work reported here was done in the late 1970's. This is the first published appearance, but earlier versions of this paper have been circulated "unpublished" through the grapevine for the last decade. The algorithms have been used for years in a variety of language implementations, perhaps most notably in Zetalisp.

Why wasn't it published earlier? It just didn't seem like that big a deal; but we were wrong. Over the years floating-point printers have continued to be inaccurate and we kept getting requests for the unpublished draft. We thank Donald Knuth for giving us that last required push to submit it for publication.

The paper has been almost completely revised for presentation at this conference. An intermediate version of the algorithm, Dragon3 (Free-Format Perfect Positive Floating-Point Printout), has been omitted from this presentation for reasons of space; it is of interest only in being closest in form to what was actually first implemented in MacLisp. We have allowed a few anachronisms to remain in this presentation, such as the suggestion that a tiny version of the printing algorithm might be required for microcomputer implementations of BASIC. You will find that most of the references date back to the 1960's and 1970's.

Helpful and valuable comments on drafts of this paper were provided by Jon Bentley, Barbara K. Steele, and Daniel Weinreb. We are also grateful to Donald Knuth and Will Clinger for their encouragement.

The first part of this work was done in the mid to late 1970's while both authors were at the Massachusetts Institute of Technology, in the Laboratory for Computer Science and the Artificial Intelligence Laboratory. From 1978 to 1980, Steele was supported by a Fannie and John Hertz Fellowship.

This work was carried forward by Steele at Carnegie-Mellon University, Tartan Laboratories, and Thinking Machines Corporation, and by White at IBM T. J. Watson Research and Lucid, Inc. We thank these institutions for their support.

This work was supported in part by the Defense Advanced Research Projects Agency, Department of Defense, ARPA Order 3597, monitored by the Air Force Avionics Laboratory under contract F33615-78-C-1551. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.

11 References

[ANSI76] American National Standards Institute. Draft proposed ANS Fortran (BSR X3.9). Reprinted as ACM SIGPLAN Notices 11, 3 (March 1976).

[Clinger90] Clinger, William D. "How to Read Floating Point Numbers Accurately." Proc. ACM SIGPLAN '90 Conference on Programming Language Design and Implementation (White Plains, New York, June 1990).

[DEC73] Digital Equipment Corporation. DecSystem 10 Assembly Language Handbook. Third edition. (Maynard, Massachusetts, 1973).

[Dijkstra76] Dijkstra, Edsger W. A Discipline of Programming. Prentice-Hall (Englewood Cliffs, New Jersey, 1976).

[Gardner77] Gardner, Martin. "The Dragon Curve and Other Problems." In Mathematical Magic Show. Knopf (New York, 1977), 203-222.

[Hoare78] Hoare, C. A. R. "Communicating Sequential Processes." Communications of the ACM 21, 8 (August 1978), 666-677.

[IEEE81] IEEE Computer Society Standards Committee, Microprocessor Standards Subcommittee, Floating-Point Working Group. "A Proposed Standard for Binary Floating-Point Arithmetic." Computer 14, 3 (March 1981), 51-62.

[IEEE85] IEEE. IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985 (New York, 1985).

[Jensen74] Jensen, Kathleen, and Wirth, Niklaus. PASCAL User Manual and Report. Second edition. Springer-Verlag (New York, 1974).

[Knuth68] Knuth, Donald E. The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley (Reading, Massachusetts, 1968).

[Knuth69] Knuth, Donald E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. First edition. Addison-Wesley (Reading, Massachusetts, 1969).

[Knuth74] Knuth, Donald E. "Structured Programming with GO TO Statements." Computing Surveys 6, 4 (December 1974).

[Knuth81] Knuth, Donald E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Second edition. Addison-Wesley (Reading, Massachusetts, 1981).

[Matula68] Matula, David W. "In-and-Out Conversions." Communications of the ACM 11, 1 (January 1968), 47-50.

[Matula70] Matula, David W. "A Formalization of Floating-Point Numeric Base Conversion." IEEE Transactions on Computers C-19, 8 (August 1970), 681-692.

[Moon74] Moon, David A. MacLisp Reference Manual, Revision 0. Massachusetts Institute of Technology, Project MAC (Cambridge, Massachusetts, April 1974).

[Taranto59] Taranto, Donald. "Binary Conversion, with Fixed Decimal Precision, of a Decimal Fraction." Communications of the ACM 2, 7 (July 1959), 27.


Table 5: Procedure Dragon4 (Formatter-Feeding Process for Floating-Point Printout, Performing Free-Format Perfect Positive Floating-Point Printout)

process Dragon4;
begin
  FORMAT?(b, e, f, p, B, CutoffMode, CutoffPlace);
  assert CutoffMode = "relative" ⟹ CutoffPlace ≤ 0
  RoundUpFlag ← false;
  if f = 0 then FORMAT!(0, k) else
    R ← shift_b(f, max(e − p, 0));
    S ← shift_b(1, max(0, −(e − p)));
    M⁻ ← shift_b(1, max(e − p, 0));
    M⁺ ← M⁻;
    Fixup;
    loop
      k ← k − 1;
      U ← ⌊(R × B)/S⌋;
      R ← (R × B) mod S;
      M⁻ ← M⁻ × B;
      M⁺ ← M⁺ × B;
      low ← 2 × R < M⁻;
      if RoundUpFlag
        then high ← 2 × R ≥ (2 × S) − M⁺
        else high ← 2 × R > (2 × S) − M⁺ fi;
    while not low and not high and k ≠ CutoffPlace :
      FORMAT!(U, k);
    repeat;
    cases
      low and not high : FORMAT!(U, k);
      high and not low : FORMAT!(U + 1, k);
      (low and high) or (not low and not high) :
        cases
          2 × R ≤ S : FORMAT!(U, k);
          2 × R ≥ S : FORMAT!(U + 1, k);
        endcases;
    endcases;
  fi;
  comment Henceforth this process will generate as many "−1" digits as the caller desires, along with appropriate values of k.
  loop k ← k − 1; FORMAT!(−1, k) repeat;
end;


Table 6: Procedure Fixup

procedure Fixup;
begin
  if f = shift_b(1, p − 1) then
    comment Account for unequal gaps.
    M⁺ ← shift_b(M⁺, 1);
    R ← shift_b(R, 1);
    S ← shift_b(S, 1);
  fi;
  k ← 0;
  loop
  while R < ⌈S/B⌉ :
    k ← k − 1;
    R ← R × B;
    M⁻ ← M⁻ × B;
    M⁺ ← M⁺ × B;
  repeat;
  loop
    loop
    while (2 × R) + M⁺ ≥ 2 × S :
      S ← S × B;
      k ← k + 1;
    repeat;
    comment Perform any necessary adjustment of M⁻ and M⁺ to take into account the formatting requirements.
    case CutoffMode of
      "normal" : CutoffPlace ← k;
      "absolute" : CutoffAdjust;
      "relative" :
        CutoffPlace ← k + CutoffPlace;
        CutoffAdjust;
    endcase;
  while (2 × R) + M⁺ > 2 × S :
  repeat;
end;

Table 7: Procedure fill

procedure fill(k, c);
comment Send k copies of the character c to the USER process. No characters are sent if k ≤ 0.
for i from 1 to k do USER!(c) od;


Table 8: Procedure CutoffAdjust

procedure CutoffAdjust;
begin
  a ← CutoffPlace − k;
  y ← S;
  cases
    a ≥ 0 : for j ← 1 to a do y ← y × B od;
    a < 0 :

assert d ≥ 0 ∧ w ≥ max(d + 1, 2)
c ← w − d − 1;
GENERATE!(b, e, f, p, B, "absolute", −d);
GENERATE?(U, k);
if k < c then
  if k ≤ 0 then
    fill(c − 1, " ");
    USER!("0");
    USER!(".");
    fill(min(−k, d), "0");
  else fill(c − k − 1, " ") fi;
  loop
  while k > −d :
    DigitChar(U);
    if k = 0 then USER!(".") fi;
    GENERATE?(U, k);
  repeat;
else fill(w, "*") fi;
USER!("@");
end;


Table 12: Formatting process for free-format exponential output

process Free-Format Exponential;
begin
  USER?(b, e, f, p, B);
  GENERATE!(b, e, f, p, B, "normal", 0);
  GENERATE?(U, expt);
  DigitChar(U);
  USER!(".");
  loop
    GENERATE?(U, k);
  while U ≠ −1 :
    DigitChar(U);
  repeat;
  if k = expt − 1 then USER!("0") fi;
  USER!("E");
  if expt < 0 then USER!("−"); expt ← −expt fi;
  j ← 1;
  loop j ← j × B; while j ≤ expt : repeat;
  loop
    j ← ⌊j/B⌋;
    DigitChar(⌊expt/j⌋);
    expt ← expt mod j;
  while j > 1 :
  repeat;
  USER!("@");
end;
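The exponent-digit idiom at the end of Table 12 (used again in Table 13) first finds the smallest power of B exceeding expt, then emits digits most-significant-first, so each digit can be sent as soon as it is found. A sketch of ours in Python:

```python
def emit_exponent_digits(expt, B):
    """Emit the radix-B digits of a nonnegative exponent, most
    significant first, in the manner of the loops in Tables 12-13."""
    assert expt >= 0
    j = 1
    while True:        # loop j <- j * B while j <= expt repeat
        j *= B
        if j > expt:
            break
    out = []
    while True:        # peel off digits from the high end
        j //= B
        out.append(expt // j)
        expt %= j
        if j <= 1:
            break
    return out
```

Note that expt = 0 still yields one digit, since the first loop always runs its body at least once.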


Table 13: Formatting process for fixed-format exponential output

process Fixed-Format Exponential;
begin
  USER?(b, e, f, p, B, w, d, x);
  assert d ≥ 0 ∧ x ≥ 1 ∧ w > d + x + 3
  c ← w − d − x − 3;
  GENERATE!(b, e, f, p, B, "relative", −d);
  GENERATE?(U, expt);
  j ← 1;
  a ← 0;
  loop
    j ← j × B;
    a ← a + 1;
  while j ≤ |expt| :
  repeat;
  if a ≤ x then
    fill(c − 1, " ");
    DigitChar(U);
    USER!(".");
    for q ← 1 to d do
      GENERATE?(U, k);
      DigitChar(U);
    od;
    USER!("E");
    if expt < 0 then
      USER!("−")
    else
      USER!("+")
    fi;
    expt ← |expt|;
    fill(x − a, "0");
    loop
      j ← ⌊j/B⌋;
      DigitChar(⌊expt/j⌋);
      expt ← expt mod j;
    while j > 1 :
    repeat;
  else fill(w, "*") fi;
  USER!("@");
end
