A basic PDF writer in Tcl

Lars Hellström

February 3, 2005

Abstract

This file contains some basic routines that allow a Tcl script to write PDF files.

Contents

1 Usage
    1.1 File structure
    1.2 Direct objects
    1.3 Streams
    1.4 Pages
    1.5 Outline
    1.6 Contents
    1.7 Rectangles
2 PDF files and objects
    2.1 Building objects
    2.2 File structure
    2.3 Hello World
3 Contents and resources
    3.1 Resources representation
    3.2 Formatting content
    3.3 Hello again, World
4 Document pages
    4.1 The tree of pages
    4.2 Lengths and rectangles
    4.3 Paper sizes
    4.4 A multi-page example
5 Document outline
    5.1 Low-level stuff
    5.2 An outline of headings
    5.3 An outline example



1 Usage

The aim of the basic pdf package is to simplify the generation of well-formed PDF files. Programmers who intend to make use of it should first familiarize themselves with the actual PDF format specification, as it is not the aim of the basic pdf package¹ to substitute anything else for the raw expressive power of the PDF format. Newcomers should find [2] (version 1.5 of the PDF specification) a good reference and introduction to the details of the PDF format.

1.1 File structure

A PDF file is basically a (sometimes huge) data structure, consisting of a myriad of objects (which are quite comparable to Tcl Objs, i.e., to Tcl values, although PDF objects have types). An object can be direct (encoded at the position it is used) or indirect (encoded somewhere else in the file and referenced by number). The absolute positions in the file of all indirect objects have to be given in a cross-reference table at the end of the file, and getting this right is the first obstacle to generating a well-formed PDF file.
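
To make the bookkeeping concrete, here is a minimal standalone sketch of recording object offsets and formatting them as a cross-reference table. It is not the pdf package's implementation; the proc names add_indirect and xref_table are made up for this illustration, and the sketch assumes pure ASCII data so that string length equals byte count.

```tcl
# Sketch only: assemble a tiny PDF body in memory and record byte offsets.
# The proc names are hypothetical and not part of the pdf package.
proc add_indirect {bufVar offsetsVar num body} {
    upvar 1 $bufVar buf $offsetsVar offsets
    # The offset of an object is the number of bytes written before it
    # (ASCII data assumed, so characters and bytes coincide).
    lappend offsets [string length $buf]
    append buf "$num 0 obj\n$body\nendobj\n"
}

proc xref_table {offsets} {
    # Entry 0 heads the free list; every entry is exactly 20 bytes long.
    set n [expr {[llength $offsets] + 1}]
    set out "xref\n0 $n\n0000000000 65535 f \n"
    foreach off $offsets {
        append out [format "%010d %05d n \n" $off 0]
    }
    return $out
}

set buf "%PDF-1.3\n"
set offsets {}
add_indirect buf offsets 1 "<< /Type /Catalog /Pages 2 0 R >>"
add_indirect buf offsets 2 "<< /Type /Pages /Kids \[\] /Count 0 >>"
set startxref [string length $buf]
append buf [xref_table $offsets]
```

Every offset in the table must point at the first byte of the corresponding "n 0 obj" line, which is why any change to earlier data invalidates all later entries.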

The pdf package provides a model where indirect objects can be assigned arbitrary strings as labels. Actual object numbers are allocated as needed and positions needed for the cross-reference table are recorded. The two basic commands for dealing with indirect objects are

    pdf::obj_ref {file} {reference label}
    pdf::put_obj {file} {reference label} {object}

put_obj writes a PDF object to a file (thus making it available as an indirect object in that file), whereas obj_ref returns PDF code for a reference to an indirect object. obj_ref may occur before as well as after the put_obj for the object it refers to.
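
The reason forward references work can be illustrated with a small standalone sketch: a label is given an object number the first time it is mentioned, whether by a reference or by the object itself. The helper names label_number and label_ref and the array layout are assumptions for this illustration, not the package's internals.

```tcl
# Sketch only: map arbitrary label strings to object numbers on first use.
# "label_number" and "label_ref" are hypothetical, not pdf package commands.
proc label_number {tableVar label} {
    upvar 1 $tableVar table
    if {![info exists table($label)]} {
        # Object numbers start at 1; number 0 is reserved in the xref table.
        set table($label) [expr {[array size table] + 1}]
    }
    return $table($label)
}

proc label_ref {tableVar label} {
    upvar 1 $tableVar table
    # An indirect reference has the form "<number> <generation> R".
    return "[label_number table $label] 0 R"
}

array set labels {}
set forward [label_ref labels "Page 1"]   ;# referenced before it is written
set other   [label_ref labels "Catalog"]
set again   [label_ref labels "Page 1"]   ;# same label, same number
```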

Open PDF files are referenced via the usual identifier of the Tcl channel. To open a file for the purpose of creating a new PDF document, one uses

    pdf::rewrite_pdf {file name} 〈options〉

which returns the identifier of the new file. The 〈options〉 are zero or more of

    -permissions {integer}
    -header {string}

The permissions are the default permissions for the file in question. If this option is not given, then no such value is passed to open. The header is a string that will be put first in the file (as header). (The default header string declares the PDF version to be 1.3 [1], which is a good compromise between supporting old PDF consumers and providing PDF features.)

The command used to close a PDF file should be

    pdf::close_pdf {file} {catalog label} {key} {value} ∗

since this is what will output the cross-reference table and trailer to this file, before it is closed. {catalog label} is the label of the /Catalog object for the document. The {key} {value} arguments are PDF objects which will be inserted into the file's trailer dictionary. Each {key} must be a name object, and each {value} the corresponding value. (The /Size and /Root entries in this dictionary are generated automatically, so it is perfectly OK to only give two arguments to close_pdf.)

¹ But it is a likely aim of add-on packages.

It is part of the PDF specification how to make updates to an existing PDF document, but the pdf package currently offers no support for that. Should such support be added in the future, then one would use some other command than rewrite_pdf to open the file for modifications.

1.2 Direct objects

The pdf package commands that return {object}s (i.e., PDF code for an object) are

    pdf::boolean_obj {boolean}
    pdf::int_obj {integer}
    pdf::real_obj {value} {precision} ?
    pdf::string_obj {byte string}
    pdf::hexstring_obj {byte string}
    pdf::text_obj {string}
    pdf::name_obj {string}
    pdf::array_obj {object} ∗
    pdf::dict_obj {key object} {value object} ∗
    pdf::null_obj
    pdf::date_obj {clock value} {zonemode} ?
    pdf::length_obj {value} {unit} {precision} ?
    pdf::rect_obj {rectangle}
    pdf::int_rect_obj {rectangle}
    pdf::resource_dict_obj {array-name}
    pdf::obj_ref {file} {reference label}

All but the last of these return direct objects, whereas obj_ref as explained above returns a reference to an indirect object. In addition to using the above commands, an {object} can also be the explicit PDF code for an object; this is most common with name objects.

The int_obj command formats a Tcl integer as a PDF object. The real_obj command similarly formats a Tcl double. The {precision} is the number of decimals that will be included in the PDF code. When omitted, the current value of the pdf::precision variable is used instead. This variable is by default set to 3.

The string_obj command takes a {byte string} (a string consisting of characters in the range \x00–\xFF) and returns the corresponding PDF string object, delimited by parentheses. The hexstring_obj command does the same thing, but makes use of hexstring encoding (a sequence of hexadecimal digits delimited by < and >) instead.
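
The two string encodings can be sketched in a few lines of standalone Tcl. This is an illustration under assumptions, not the package's code: the names literal_string and hex_string are made up, and only the escapes that are strictly required in a literal string (backslash and the two parentheses) are handled.

```tcl
# Sketch only; "literal_string" and "hex_string" are hypothetical names.
proc literal_string {bytes} {
    # Backslash and parentheses are each escaped with a backslash.
    set escaped [string map [list \\ \\\\ ( \\( ) \\)] $bytes]
    return "($escaped)"
}

proc hex_string {bytes} {
    # Each byte becomes two hexadecimal digits between angle brackets.
    binary scan $bytes H* hex
    return "<$hex>"
}

set lit [literal_string "a(b)"]
set hex [hex_string "AB"]
```

The hexstring form is longer but never needs escaping, which makes it the safer choice for arbitrary binary data.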

Text objects and date objects are syntactically PDF string objects, but they are used in special contexts and are there given an interpretation that is slightly different from that of ordinary PDF strings. In particular, the character set for text objects is always full Unicode, whereas the encodings of ordinary PDF strings depend heavily on the context. The text_obj command takes an arbitrary Tcl string as argument and returns the corresponding text object. The date_obj command takes a {clock value} (as used by the clock command) and returns the corresponding PDF date object. The optional {zonemode} argument specifies how time zones are encoded in the object. An empty string (the default) or none means that no time zone specification should be included. utc or gmt means encode the time as UTC. local or full causes the offset from local time to UTC to be computed and included in the result.
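
As a rough standalone illustration of the three zone modes, the following sketch builds the text of a PDF date with the Tcl clock command. The name pdf_date is hypothetical, Tcl 8.5 or later is assumed, and the choice to use local time when no zone is requested is an assumption about behaviour that the text above does not spell out. The result would still have to be wrapped as a PDF string.

```tcl
# Sketch only; "pdf_date" is a hypothetical name, Tcl 8.5 or later assumed.
proc pdf_date {clockValue {zonemode ""}} {
    switch -- $zonemode {
        "" - none {
            # Assumption: local time, no time zone information.
            return [clock format $clockValue -format "D:%Y%m%d%H%M%S"]
        }
        utc - gmt {
            return [clock format $clockValue -format "D:%Y%m%d%H%M%SZ" -gmt 1]
        }
        local - full {
            # %z yields e.g. +0100, whereas a PDF date wants +01'00'.
            set z [clock format $clockValue -format %z]
            set stamp [clock format $clockValue -format "D:%Y%m%d%H%M%S"]
            return "$stamp[string range $z 0 2]'[string range $z 3 4]'"
        }
        default { error "unknown zonemode \"$zonemode\"" }
    }
}

set epoch [pdf_date 0 utc]
```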

The boolean_obj and null_obj commands return boolean and null objects, respectively. They're not that frequently used. The name_obj command returns the name object formed from a given string. This is most often used with variable strings, such as for example font names, that are not known when the program is written.

The array_obj command returns the array object (comparable to a Tcl list) that is formed from the given sequence of objects. The dict_obj command returns the dictionary object that is formed from the given sequence of keys and values. The {key object}s must all be name objects.

1.3 Streams

Much of the data in a PDF file is not stored in the above kind of objects, but in a special kind of indirect object called a stream. These are created using the commands

    pdf::begin_stream {file} {label} {key} {value} ∗
    pdf::end_stream {file}

The {label} is the one which will be used with obj_ref to refer to the stream object. Every stream comes with a stream dictionary that contains information about how the stream data should be decoded, e.g. decompressed. The {key} {value} arguments of begin_stream are placed in this dictionary; the /Length entry for the stream is however automatically generated.

Data written to a PDF file, using for example puts, between a begin_stream and the matching end_stream will go into that stream. Such data need in general not conform to the ordinary PDF syntax, but can be pretty much anything. It will depend on where in the document the stream object is referenced whether the data is correct or not. Note that files opened using rewrite_pdf are configured to be binary. It is an error to try to begin a new stream before ending a previous one, but it is possible to use put_obj even inside a stream; the object is then cached internally and written to file after the stream has been ended.
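
The layout of a stream object, with its computed /Length entry, can be sketched as follows. This standalone example builds the whole object in memory, which is only one simple way of learning the length; the helper name stream_object is an assumption, and the pdf package itself may well determine the length differently.

```tcl
# Sketch only; "stream_object" is a hypothetical helper, not a pdf command.
proc stream_object {data args} {
    # args holds extra {key} {value} pairs for the stream dictionary.
    # ASCII data is assumed, so string length equals the byte count.
    set dict "<< /Length [string length $data]"
    foreach {key value} $args {
        append dict " $key $value"
    }
    append dict " >>"
    return "$dict\nstream\n$data\nendstream"
}

set code [stream_object "0 0 m 100 100 l S"]
```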



A special, but very common type of stream is the contents stream; this is for example used for all the text and graphics on actual document pages. Contents streams are created using the commands

    pdf::begin_contents {resources-array} {file} {label} {key} {value} ∗
    pdf::end_contents {resources-array} {file}

The special thing about contents streams is that they are always associated with some resources dictionary, which maps names used in the contents stream to PDF objects outside it. The extra feature provided by the ..._contents commands as compared to the ..._stream commands is a mechanism for keeping track of the current set of resources and which permits extending this set when needed.

When not inside a contents stream, data for resources dictionaries are kept in a Tcl array. The data can be converted to a PDF object using the resource_dict_obj command, which takes the name of an array as argument. begin_contents similarly takes the name of an array as argument, and copies the data from this array to an internal (file-specific) storage. If the {resources-array} argument is empty then the internally stored resources dictionary starts out empty as well. end_contents conversely copies the resource dictionary entries from internal storage to the specified {resources-array} (note: it does not clear that array first). If several contents streams are to share the same resources array, then one should pass the array filled in by the previous end_contents to the next begin_contents.

Between begin_contents and the matching end_contents, one can use the name_resource command to get a name by which one can refer to a particular object from within this contents stream. The syntax is

    pdf::name_resource {variable} {file} {type} {object} {suggested name} ?

where the {variable} is the name of a variable that will be set to the wanted name object. {type} is the resource type, and should be one of ColorSpace, XObject, ExtGState, Font, Pattern, Properties, and Shading. {object} is the actual object (direct or indirect) and {file} is the PDF file.

The optional {suggested name} argument can be used to force use of a particular name; if this is not supplied, then an available name is automatically generated. (Forcing a particular name may be useful for backwards compatibility, as there are some known bugs in PDF readers which required using the same name in several different resource dictionaries.) Multiple calls for the same resource will reuse the same name, unless a suggested name is provided. The command returns 1 if a new name was added to the resource dictionary and 0 if an old name could be reused. An error is thrown if the {suggested name} is already assigned to some other object.
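
The return convention just described (1 for a new name, 0 for a reused one, an error on a clash) can be mimicked by a small standalone allocator. Everything in this sketch is an assumption made for illustration: the proc name sketch_name_resource, the array layout keyed by "type,name", and the generated names of the form /R1, /R2, and so on. Unlike the real command it takes an array name rather than a file.

```tcl
# Sketch only: a resource-name allocator following the return convention
# described above. The proc name and the array layout are assumptions.
proc sketch_name_resource {varName resVar type object {suggested ""}} {
    upvar 1 $varName name $resVar res
    # res maps "type,name" to the object registered under that name.
    foreach key [array names res "$type,*"] {
        if {$res($key) eq $object
                && ($suggested eq "" || $key eq "$type,$suggested")} {
            set name [string range $key [expr {[string length $type] + 1}] end]
            return 0
        }
    }
    if {$suggested ne ""} {
        if {[info exists res($type,$suggested)]} {
            error "name $suggested is already assigned to another object"
        }
        set name $suggested
    } else {
        # Generate the first unused name of the (assumed) form /R<n>.
        set n 1
        while {[info exists res($type,/R$n)]} { incr n }
        set name /R$n
    }
    set res($type,$name) $object
    return 1
}

set r1 [sketch_name_resource f1 resources Font "5 0 R"]        ;# new name
set r2 [sketch_name_resource f2 resources Font "5 0 R"]        ;# reused
set r3 [sketch_name_resource f3 resources Font "6 0 R" /Helv]  ;# forced name
```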

The PDF specification also defines ProcSet resources, but you need not worry about those. By default (i.e., if the ProcSet entry is not set), resource_dict_obj inserts an entry for the full set of procsets. Most PDF consumers never bothered about the procsets anyway.


1.4 Pages

PDF requires that all pages are arranged in a data structure called the pages tree. The pdf package has commands that can take care of building this tree for you; if you use them, then you only have to worry about generating the pages in the order you want them to appear in the document.

To finish a document page, one uses the command

    pdf::shipout {file} {label} {key} {object} +

{file} here is the PDF file identifier and {label} is the reference label you want to assign to the page object. (Links in a PDF file require a reference to the target page, so it is likely that you will want to obj_ref the page.) The {key} and {object} arguments are attributes for the page object (keys and values for the dictionary). These should not include the /Type and /Parent attributes, which are inserted automatically. An example:

    pdf::shipout $F "Page $n" \
        /Contents [pdf::obj_ref $F "Page $n contents"] \
        /Resources [pdf::obj_ref $F "Page $n resources"]

Before the first shipout, one must initialise the pages tree using the begin_pages command, and after the last shipout, one must use end_pages to complete the pages tree.

    pdf::begin_pages {file} {label prefix} {option} {value} ∗
    pdf::end_pages {file} {option} {value} ∗

{option}s that begin with a / are interpreted as names of entries to insert into the root node of the pages tree; in this case the {value} must be an object. This is useful if some attribute (e.g. page size) is the same for all pages, as one can then specify it once at the root and let it be inherited by all the pages.

Every node in the tree is given a reference label, so to avoid clashes with other objects, all /Pages nodes (but not the page nodes) are given labels that begin with the {label prefix} specified at begin_pages. The end_pages command returns the label that was given to the root node.

The pages tree constructed by end_pages is balanced and of minimal size with respect to its arity (number of kids per parent). The default arity is 5, but that can be overridden using the -arity {option} of begin_pages, in which case the corresponding {value} is the new arity.
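
To give a feeling for how the arity bounds the height of the tree, here is a standalone sketch that groups page labels bottom-up into nodes of at most arity kids. It is deliberately naive: the last node on each level may be less full than the others, so the shape need not match the balanced tree that end_pages builds. The proc names group_nodes and pages_tree are made up for this illustration.

```tcl
# Sketch only: group page labels bottom-up into nodes of at most $arity kids.
# The pdf package may shape its tree differently; names are hypothetical.
proc group_nodes {nodes arity} {
    set parents {}
    for {set i 0} {$i < [llength $nodes]} {incr i $arity} {
        lappend parents [lrange $nodes $i [expr {$i + $arity - 1}]]
    }
    return $parents
}

proc pages_tree {pages {arity 5}} {
    set level $pages
    set height 0
    # Group repeatedly until a single root node remains.
    while {$height == 0 || [llength $level] > 1} {
        set level [group_nodes $level $arity]
        incr height
    }
    return [list $height [lindex $level 0]]
}

set pages {}
for {set n 1} {$n <= 12} {incr n} { lappend pages p$n }
set tree [pages_tree $pages]
```

With 12 pages and the default arity of 5, the result has two levels of /Pages nodes: a root with three kids, which hold 5, 5, and 2 pages respectively.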

1.5 Outline

A similar mechanism exists for building the outline tree. Construction of this is begun at begin_outline and completed at end_outline. To begin_outline one must supply a string that will be used as prefix for all labels of nodes in the tree, and end_outline will return the label of the outline tree root node.

    pdf::begin_outline {file} {prefix}
    pdf::end_outline {file}



New items can be added to the outline using the outline_heading command. This has the syntax

    pdf::outline_heading {file} {level} {title} {option} {value} ∗

where {file} is the identifier of the PDF file, {level} is the nominal level of this item, and {title} is the title. The title is an ordinary Tcl string and there is no restriction on which characters it may contain.

An {option} {value} is either a pair of PDF objects, where the first is a name object, or

    -open {boolean}

The PDF objects will be placed in the dictionary object for the new item. These are what one should use to specify a destination or equivalent for the outline item. The -open option controls whether this item will be open by default, i.e., if its subitems (if there will be any) should be shown. It defaults to false (closed).

The {level} is relative, and can be an arbitrary string. The way it is used is that if {level} is greater than the current level, then a new level is begun. Else if {level} is greater than the previous level, the item is a sibling of the last item and the current level is updated. Otherwise the current level is ended and the issue is reexamined. This dynamically adapts to the set of {level}s actually used in a document, even if these are not consecutive. It also gracefully copes with inconsistencies such as forgetting some heading level at the beginning of a document.
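
The level rule is easiest to follow with a concrete trace. The standalone sketch below implements one reading of it, under two assumptions that go beyond the text: levels are compared numerically, and "the previous level" is taken to mean the level of the enclosing group. The proc name outline_depths is made up; it returns the nesting depth that each heading would end up at.

```tcl
# Sketch only: one reading of the level rule, with numeric levels assumed.
proc outline_depths {levels} {
    set stack {}      ;# levels of the currently open groups, innermost last
    set depths {}
    foreach level $levels {
        while 1 {
            set n [llength $stack]
            if {$n == 0 || $level > [lindex $stack end]} {
                lappend stack $level                        ;# begin a deeper level
                break
            } elseif {$n == 1 || $level > [lindex $stack end-1]} {
                set stack [lreplace $stack end end $level]  ;# sibling of last item
                break
            } else {
                set stack [lrange $stack 0 end-1]           ;# end the current level
            }
        }
        lappend depths [llength $stack]
    }
    return $depths
}

set a [outline_depths {2 1 2 3 1}]   ;# a level 2 heading comes first
set b [outline_depths {1 3 2 1}]     ;# levels are not consecutive
```

In the first trace the leading level 2 heading is simply placed at the top of the outline, and in the second the level 3 and level 2 headings become siblings under the first item.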

It is possible to create rather obnoxious outlines by hardwiring particular zoom factors into the outline. It is usually best to specify no more than the destination page and vertical position, as shown in this example:

    pdf::outline_heading $F 1 "Introduction" /Dest [
        pdf::array_obj [pdf::obj_ref $F "Page 1"] /XYZ null \
            [pdf::real_obj $ypos] null
    ]

There are also four lower level commands available, which may be useful if for some reason some information needed for an entry is not available until the end of it (e.g. the position of that end). The outline_node_set command can be used to set entries in the dictionary of the current outline item. Its syntax is

    pdf::outline_node_set {file} {option} {value} ∗

The outline_item command creates a new item in the current level of the outline. Its syntax is

    pdf::outline_item {file} {title} {option} {value} ∗

These options and values are handled as for outline_heading.

For beginning and ending lower level groups of items, there are the commands

    pdf::outline_begingroup {file} {option} {value} ∗
    pdf::outline_endgroup {file} {option} {value} ∗


The {option} and {value} arguments here affect the parent of the new group of items. Note that between an outline_begingroup and the first outline_item after it, there is no current item in the outline.

1.6 Contents

Once inside a contents stream, PDF is fairly similar to PostScript (although still more strict and structured) with sequences of operands followed by some operator. To simplify writing such code, there is a command printf which offers format-style formatting of data written to the file. The syntax is

    pdf::printf {file} {format list} {data} ∗

and as with format, each conversion specifier in the {format list} consumes one or several {data} items. (It is probably a good idea to limit the length of {format list}s to small enough chunks that you can instantly see what each {data} item is used for.) There is also a command sprintf with syntax

    pdf::sprintf {format list} {data} ∗

that returns the formatted code rather than writing it to a file.

The {format list}s are lists where every element is either explicit PDF code (typically an operator) or a conversion specifier. As with format, the conversion specifiers are recognised by the fact that their first character is a '%'. The contributions to the formatted PDF code from separate list elements will be separated by whitespace as necessary.

The second character of a conversion specifier determines the type of conversion to carry out. The basic conversions are

    b   Boolean, to be formatted by boolean_obj.
    i   Integer, to be formatted by int_obj.
    l   Length, to be formatted by length_obj. This consumes two {data} arguments: one for the value and one for the unit.
    n   Data is a string, to be formatted by name_obj.
    o   Already formatted PDF object.
    r   Real number, to be formatted by real_obj (with default precision according to the precision variable).
    s   PDF string, to be formatted by string_obj.

In addition, the corresponding upper case letters select the same formatting, but the (first) {data} argument is interpreted as a list of values to format in the specified way. The character may also be an &, in which case the {data} is interpreted as a list

    {format list} {data} ∗

which will be formatted by a recursive sprintf call and inserted into the result at that position. This is intended to simplify encoding structured data.

The exact format of a conversion specifier is

    %〈char〉(〈count〉(.〈precision〉)?)?

The 〈count〉 defaults to 1 and specifying a non-unit 〈count〉 value is equivalent to specifying that many separate conversion specifiers in sequence. Specifying a 〈precision〉 overrides the precision default for real and length conversions.
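
A specifier of this shape can be taken apart with a single regular expression. The standalone sketch below follows the reading that the count is optional and that a precision can only follow a count, as in %l2 or %L1.1; the helper name parse_spec is an assumption and not a pdf package command.

```tcl
# Sketch only; "parse_spec" is a hypothetical helper, not a pdf command.
proc parse_spec {spec} {
    # %<char>, optionally followed by <count>, optionally followed by
    # a dot and <precision>.
    if {![regexp {^%(.)(?:(\d+)(?:\.(\d+))?)?$} $spec -> char count precision]} {
        error "not a conversion specifier: $spec"
    }
    # A missing count means 1; a missing precision stays empty, meaning
    # that the default precision applies.
    if {$count eq ""} { set count 1 }
    return [list $char $count $precision]
}

set s1 [parse_spec %l2]
set s2 [parse_spec %L1.1]
set s3 [parse_spec %r]
```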

Page contents in PDF are primarily graphical, and thus there is a fair amount of coordinates involved. For manufacturing isolated coordinates, the length command, its object-making counterpart length_obj, and the printf counterpart %l are convenient, as they make it possible to express lengths in physical units and then have them automatically converted to the (default) PDF length unit. The syntax is

    pdf::length {value} {unit}
    pdf::length_obj {value} {unit} {precision} ?

where {value} is the numerical value, {unit} the name of the unit it is expressed in, and {precision} as with real_obj an optional precision that is specified if one wishes to override the default. The units known to the pdf package are

bp    PostScript point (1/72 in)
cc    cicero
cm    centimeter
dd    Didot point (European printer’s point)
in    inch
mm    millimeter
pc    pica
pt    (American) printer’s point

An example:

pdf::begin_contents "" $F "A page"
pdf::printf $F {%l2 m %L l S} 5 cm 5 cm {10 15} cm
pdf::name_resource times_font $F Font [pdf::dict_obj\
  /Type /Font /Subtype /Type1 /BaseFont /Times-Roman\
  /Encoding /MacRomanEncoding]
pdf::printf $F {BT}
pdf::printf $F {%o %l Tf 1 0 0 1 %L1.1 Tm} $times_font 12 dd {8 10} cm
pdf::printf $F {%s Tj} [encoding convertto macRoman "na\u00EFve"]
pdf::printf $F {ET}
pdf::end_contents resarr $F
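The unit conversion that length and %l perform can be pictured with a small stand-alone sketch. It is not the package's own code: the `sketch` namespace, its procedure names, and the default precision argument are hypothetical, and the conversion factors are the usual TeX definitions (72.27 pt per inch, 1157 dd = 1238 pt, 12 dd per cicero), which the pdf package may or may not use exactly.

```tcl
# Hypothetical stand-alone sketch; not part of the pdf package.
namespace eval sketch {
    variable bp_per_unit
    # Number of PostScript points (bp) in one of each unit.
    array set bp_per_unit [list \
        bp 1.0 \
        in 72.0 \
        cm [expr {72.0/2.54}] \
        mm [expr {72.0/25.4}] \
        pt [expr {72.0/72.27}] \
        pc [expr {12*72.0/72.27}] \
        dd [expr {1238.0/1157*72.0/72.27}] \
        cc [expr {12*1238.0/1157*72.0/72.27}]]
}

# Convert a value in the given unit to bp.
proc sketch::to_bp {value unit} {
    variable bp_per_unit
    if {![info exists bp_per_unit($unit)]} {
        error "Unknown unit '$unit'"
    }
    expr {$value * $bp_per_unit($unit)}
}

# Format the converted length with a fixed number of decimals.
proc sketch::length_obj {value unit {precision 3}} {
    format %.${precision}f [to_bp $value $unit]
}
```

With these definitions, `sketch::length_obj 5 cm` yields 141.732, which is the kind of number that ends up in front of the m operator in the example above.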

1.7 Rectangles

pdf::make_rect (proc)

pdf::offset_rect (proc)

pdf::inset_rect (proc)

Coordinates occur not only in page contents, but also in many other data structures in a PDF file. In particular, it is common that one has to specify some rectangle (e.g. the clickable area of a link, or the imageable area of a page), so the pdf package provides several commands for creating, modifying, and formatting rectangles.

The basic format for a {rectangle} that the pdf package uses is a list of four elements

{left} {bottom} {right} {top}

each of which is the coordinate of one side of the rectangle, in default PDF units (i.e., bp). Such lists are returned by the commands

pdf::make_rect ( {option} {value} {unit}? )+
pdf::offset_rect {rect} {dx} {dy} {unit}?
pdf::inset_rect {rect} {amount} {unit}
pdf::inset_rect {rect} {dx} {dy} {unit}
pdf::inset_rect {rect} {dl} {db} {dr} {dt} {unit}
pdf::standard_rect {rect}

make_rect is a generic “tell me what you know about the rectangle and I’ll figure out what its coordinates are” command. Each option specifies a value for one or two quantities that can be derived from the rectangle coordinates, and by combining the information the command calculates the rectangle coordinates.

-width     Distance from left to right
-height    Distance from bottom to top
-left      left
-right     right
-top       top
-bottom    bottom
-ll        {left bottom}
-lr        {right bottom}
-ul        {left top}
-ur        {right top}
-center    midpoint
-midx      x-coordinate of midpoint
-midy      y-coordinate of midpoint

The way it works is that the list of options is processed left to right, every option contributes some information about the wanted rectangle, and when all four coordinates are known the rectangle is returned. The {value} is, depending on the option, either a number or a point (list of two numbers). The {unit} is the unit of the {value}; it defaults to bp if omitted.
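The idea of combining partial information into four coordinates can be illustrated with a simplified sketch. This is not the implementation in the package: the names `sketch_solve_axis` and `sketch_make_rect` are hypothetical, units are ignored (everything is taken to be in bp), only some of the options are handled, and all options are collected before solving rather than returning as soon as the rectangle is determined.

```tcl
# Hypothetical helper: solve one axis from any two of lo, hi, size, mid.
proc sketch_solve_axis {known} {
    array set K $known
    if {[info exists K(lo)] && [info exists K(hi)]} {
        return [list $K(lo) $K(hi)]
    } elseif {[info exists K(lo)] && [info exists K(size)]} {
        return [list $K(lo) [expr {$K(lo) + $K(size)}]]
    } elseif {[info exists K(hi)] && [info exists K(size)]} {
        return [list [expr {$K(hi) - $K(size)}] $K(hi)]
    } elseif {[info exists K(lo)] && [info exists K(mid)]} {
        return [list $K(lo) [expr {2*$K(mid) - $K(lo)}]]
    } elseif {[info exists K(hi)] && [info exists K(mid)]} {
        return [list [expr {2*$K(mid) - $K(hi)}] $K(hi)]
    } elseif {[info exists K(size)] && [info exists K(mid)]} {
        return [list [expr {$K(mid) - $K(size)/2.0}]\
          [expr {$K(mid) + $K(size)/2.0}]]
    }
    error "not enough information for this axis"
}

# Hypothetical simplified make_rect: option/value pairs only, no units.
proc sketch_make_rect {args} {
    set x {}
    set y {}
    foreach {opt val} $args {
        switch -- $opt {
            -left   {lappend x lo $val}
            -right  {lappend x hi $val}
            -width  {lappend x size $val}
            -midx   {lappend x mid $val}
            -bottom {lappend y lo $val}
            -top    {lappend y hi $val}
            -height {lappend y size $val}
            -midy   {lappend y mid $val}
            -ll     {lappend x lo [lindex $val 0]; lappend y lo [lindex $val 1]}
            -ur     {lappend x hi [lindex $val 0]; lappend y hi [lindex $val 1]}
            -center {lappend x mid [lindex $val 0]; lappend y mid [lindex $val 1]}
            default {error "unknown option $opt"}
        }
    }
    foreach {l r} [sketch_solve_axis $x] break
    foreach {b t} [sketch_solve_axis $y] break
    list $l $b $r $t
}
```

For instance, `sketch_make_rect -ll {10 20} -width 100 -height 50` returns the list `10 20 110 70`.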

pdf::standard_rect (proc)

pdf::rect_obj (proc)

pdf::int_rect_obj (proc)

pdf::wh_rect (proc)

pdf::paper_rect (array)

Once a rectangle has been constructed, it can be modified using the other commands shown above. The offset_rect command moves the rectangle but preserves its size; the {unit} defaults to bp. The inset_rect command shrinks a rectangle by moving the sides inwards by the specified amount(s), or for a negative amount grows the rectangle by moving the sides outwards. One can specify a single {amount} for all sides, separate {dx} and {dy} amounts for horizontal and vertical coordinates respectively, or separate amounts for each of the four sides. A typical usage is to shrink a rectangle to leave a margin.
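The three argument forms of inset_rect amount to the following arithmetic. The sketch below is only an illustration: `sketch_inset_rect` is a hypothetical name, and it takes amounts directly in bp instead of accepting a {unit} argument.

```tcl
# Hypothetical sketch of the inset arithmetic; amounts are in bp.
proc sketch_inset_rect {rect args} {
    foreach {l b r t} $rect break
    switch -- [llength $args] {
        1 {
            # One amount for all four sides.
            set dl [lindex $args 0]
            set db $dl; set dr $dl; set dt $dl
        }
        2 {
            # Separate horizontal and vertical amounts.
            foreach {dl db} $args break
            set dr $dl; set dt $db
        }
        4 {
            # Separate amounts for left, bottom, right, top.
            foreach {dl db dr dt} $args break
        }
        default {
            error "expected 1, 2, or 4 amounts"
        }
    }
    list [expr {$l + $dl}] [expr {$b + $db}] [expr {$r - $dr}] [expr {$t - $dt}]
}
```

A margin of 10 bp inside the rectangle `{0 0 100 200}` is then `sketch_inset_rect {0 0 100 200} 10`, which gives `10 10 90 190`; a negative amount grows the rectangle again.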

It is possible to end up with a rectangle where the {bottom} is above the {top} or {left} is further right than {right} itself, i.e., a rectangle with negative height or width. PDF consumers typically normalise such rectangles by exchanging the sides as necessary, so this is often not a problem, but if you want to ensure that a rectangle has positive height and width then you may feed it through the standard_rect command. This might be necessary if you want to place a point below some given rectangle.
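The normalisation described here is a pair of conditional swaps. A minimal sketch, under the assumption that this is all standard_rect needs to do (the name `sketch_standard_rect` is hypothetical):

```tcl
# Hypothetical sketch: exchange sides so that left <= right and bottom <= top.
proc sketch_standard_rect {rect} {
    foreach {l b r t} $rect break
    if {$l > $r} {foreach {l r} [list $r $l] break}
    if {$b > $t} {foreach {b t} [list $t $b] break}
    list $l $b $r $t
}
```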

To get a rectangle into PDF code, there are three commands

pdf::rect_obj {rectangle}
pdf::int_rect_obj {rectangle}
pdf::wh_rect {rectangle}

rect_obj returns a rectangle object (a PDF array of four real numbers). int_rect_obj also returns a rectangle object, but rounds the coordinates to integers first to conserve space. wh_rect does not return PDF code, but simply the four-element Tcl list

{left} {bottom} {width} {height}

that corresponds to the {rectangle}. These are the operands of the re operator, and can be conveniently formatted using %R.
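The relation between the two list formats, and how the result feeds the re operator, can be sketched as follows. Both procedure names are hypothetical, and the second one only imitates what a %R conversion would produce, without any rounding or precision handling.

```tcl
# Hypothetical sketch: {left bottom right top} -> {left bottom width height}.
proc sketch_wh_rect {rect} {
    foreach {l b r t} $rect break
    list $l $b [expr {$r - $l}] [expr {$t - $b}]
}

# Hypothetical sketch: content-stream code that appends the rectangle to the path.
proc sketch_re_operator {rect} {
    foreach {x y w h} [sketch_wh_rect $rect] break
    return "$x $y $w $h re"
}
```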

Finally, there is an array paper_rect which contains the /MediaBox rectangles corresponding to some popular paper sizes: A4, A4R (landscape A4), letter, and legal.

Implementation

2 PDF files and objects

A Portable Document Format (PDF) file is, when compared with for example a PostScript file or HTML file, a rather disorganised document. This is because at the basic level, a PDF file is a heap rather than a text; it can be “disorganised” since its logical structure is based on cross-referencing rather than on sequentiality. The first step is therefore to provide support for writing well-formed heaps.

〈∗pkg〉
package require Tcl 8.3

Tcl 8.3 is required for array unset, and string equal is used in some places. It should be possible to make the code run on Tcl 8.1.1 (which is required for string map) if those two were worked around.

package provide pdf 0.2
namespace eval pdf {}


2.1 Building objects

The independent units in a PDF file are called objects. An object is essentially a value (which includes a type). The procedures below construct strings of PDF code that encode objects of various types. The strings returned are generally such that one must insert whitespace between two such strings if the data is to be properly encoded. The strings may contain newlines if some building routine thinks the lines would otherwise be too long.

pdf::boolean_obj (proc) The boolean_obj procedure returns a boolean object, corresponding to the string passed as its only argument. The argument can be any Tcl boolean value.

proc pdf::boolean_obj {value} {
    if {$value} then {return true} else {return false}
}

pdf::int_obj (proc) The int_obj procedure returns the PDF object corresponding to the integer supplied as argument.

proc pdf::int_obj {value} {format %d $value}

pdf::real_obj (proc)

pdf::precision (var.)

The real_obj procedure returns the PDF object corresponding to the real number supplied as argument. The syntax is

pdf::real_obj {value} {precision}?

where {precision} is the number of decimals that will be included in the object. If omitted, the value of the precision variable is used, and that defaults to 3.

9 set pdf::precision 3<br />

10 proc pdf::real_obj {value {precision -1}} {<br />

11 if {$precision


24 if {$code==92} then {<br />

25 append str \\<br />

26 <strong>in</strong>cr len<br />

27 cont<strong>in</strong>ue<br />

28 } elseif {$code=100} {<br />

47 lappend L [str<strong>in</strong>g map [list \\ \\\\ ( \\( ) \\) \r \\r \n \\n]\<br />

[str<strong>in</strong>g range $str 0 99]]<br />

49 set str [str<strong>in</strong>g range $str 100 end]<br />

50 }<br />

51 if {[str<strong>in</strong>g length $str]} then {<br />

52 lappend L\<br />

[str<strong>in</strong>g map [list \\ \\\\ ( \\( ) \\) \r \\r \n \\n] $str]<br />

54 }<br />

55 set str ([jo<strong>in</strong> $L \\\n])<br />

56 }<br />

57 return $str<br />

58 }<br />

pdf::hexstring_obj (proc) The hexstring_obj procedure returns the PDF string object, encoded as hexadecimal digits, that corresponds to the argument. If the string is longer than 31 characters then it will be broken over several lines.

59 proc pdf::hexstr<strong>in</strong>g_obj {str} {<br />

60 set hstr "


70 <strong>in</strong>cr len 2<br />

71 } else {<br />

72 error "Bad character $ch [format (U+%04x) $code] <strong>in</strong> <strong>PDF</strong>\<br />

str<strong>in</strong>g."<br />

74 }<br />

75 }<br />

76 append hstr ">"<br />

77 }<br />

pdf::text_obj (proc) The text_obj procedure returns the PDF text string object that corresponds to the argument string. The syntax is

pdf::text_obj {string}

where {string} is an arbitrary Tcl string. (Ordinary PDF strings are more like Tcl byte arrays.)

The greatest complication in the implementation is checking whether the {string} can be encoded in PDFDocEncoding or will have to be expressed in UTF-16BE. This is handled slightly sneakily, as in fact only the subset of PDFDocEncoding that coincides with iso8859-1 (and hence Unicode) is allowed; any character outside that set triggers conversion to UTF-16BE (as does a string that begins with the Byte Order Mark \xFE\xFF).

UTF-16BE-encoded strings are hexcoded, since they are probably easier to interpret that way. Strings not requiring UTF-16BE encoding are not hexcoded.

proc pdf::text_obj {str} {
    if {[regexp -- {[^ -~\241-\254\256-\377]|^\xFE\xFF} $str]} then {
        binary scan [encoding convertto unicode $str] H* uhex
        # Assumed reconstruction: a hex string that starts with the
        # byte order mark FEFF, broken into lines of 64 characters.
        regsub -all -- {\w{64}} "<FEFF$uhex>" "&\n" res
        return $res
    } else {
        return [string_obj $str]
    }
}
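The test that decides between the two encodings can be tried in isolation. The sketch below wraps the same regular expression in a hypothetical predicate, `sketch_needs_utf16`, which is not part of the package; it returns 1 when the string would be hexcoded as UTF-16BE and 0 when it can be passed on as an ordinary string.

```tcl
# Hypothetical predicate built around the regular expression used in text_obj.
proc sketch_needs_utf16 {str} {
    # Anything outside printable ASCII and the upper Latin-1 range,
    # or a leading \xFE\xFF, forces the UTF-16BE form.
    regexp -- {[^ -~\241-\254\256-\377]|^\xFE\xFF} $str
}
```

Under this test a word such as "na\u00EFve" stays in the single-byte form, whereas a string containing the euro sign \u20AC does not.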

pdf::name_obj (proc) The name_obj procedure returns the PDF name object corresponding to its argument. It is useful mainly for names with strange characters in them (non-ASCII characters or characters with special meaning in PDF syntax), but most names (e.g. dictionary keys) appearing in PDF files do not require any quoting and can therefore just as well be written as explicit PDF code.

proc pdf::name_obj {str} {
    if {[string bytelength $str]>126} then {
        error "String too long to be a PDF name."
    }
    set res /
    # Delimiters and bytes outside printable ASCII are written as #xx
    foreach ch [split [encoding convertto utf-8 $str] {}] {
        switch -glob -- $ch {
            ( - ) - < - > - \\[ - \\] - \{ - \} - / - % - # {
                scan $ch %c code
                append res [format #%02x $code]
            }
            [!-~] {append res $ch}
            default {
                scan $ch %c code
                append res [format #%02x $code]
            }
        }
    }
    return $res
}

pdf::array_obj (proc) The array_obj procedure builds an array object of the objects it is given as arguments. The syntax is

pdf::array_obj {object}∗

Newlines are inserted between the objects if it does not appear as if the object would fit on a single (100 character) line.

proc pdf::array_obj {args} {
    set res \[
    # len is the length of the current output line
    set len 1
    foreach item $args {
        if {[string length $item] + $len >= 100} then {
            append res \n
            set len 0
        } elseif {[string length $res]>1} then {
            append res " "
            incr len
        }
        append res $item
        incr len [string length $item]
    }
    if {$len >= 100} then {
        append res \n
    }
    append res \]
}

pdf::dict_obj (proc) The dict_obj procedure builds a dictionary object from its arguments. The syntax is

pdf::dict_obj ( {key} {value} )∗

where each {key} must be a name object and each {value} must be an object. It is checked that the number of elements is correct and that the keys begin with a slash.

126 proc pdf::dict_obj {args} {<br />

127 if {[llength $args] % 2 != 0} then {<br />

128 error "Not the same number <strong>of</strong> keys and values."<br />

129 }<br />



130 set res ">"<br />

142 }<br />

pdf::null_obj (proc) The null_obj procedure returns a null object. It has no arguments.

proc pdf::null_obj {} {return null}

pdf::date_obj (proc) The date_obj procedure formats a Tcl seconds value as a PDF date string object. The syntax is

pdf::date_obj {seconds} {local}?

where {seconds} is the time as returned by clock seconds and {local} controls how to deal with the issue of time zones. The possible values for this are (not case-sensitive)

none or an empty string    Express time in local timezone, but don’t include any time zone information in the result. This is the default.

UTC or gmt    Express time in UTC and use Z as timezone.

local or full    Express time in local timezone, compute its difference to UTC, and include that in the result.

proc pdf::date_obj {secs {local ""}} {
    switch -- [string tolower $local] none - "" {
        return [clock format $secs -format (D:%Y%m%d%H%M%S)]
    } utc - gmt {
        return [clock format $secs -format (D:%Y%m%d%H%M%SZ) -gmt 1]
    } full - local {
        set res [clock format $secs -format (D:%Y%m%d%H%M%S]
        # Reinterpret the local wall-clock time as UTC to get the zone offset
        set semilocal [clock format $secs -format "%Y%m%d %H:%M:%S"]
        set local [clock scan $semilocal -gmt 1]
        set offset [expr {$local - $secs}]
        if {$offset < 0} then {
            append res -
            set offset [expr {abs($offset)}]
        } else {
            append res +
        }
        append res [clock format $offset -format "%H'%M')" -gmt 1]
        return $res
    } default {
        error "Unknown locality setting '$local'"
    }
}

Objects can also be streams, but those have a special relation to the file structure and are therefore best treated in conjunction with that. In particular, streams cannot be used as arguments of array_obj or dict_obj. The arguments of these procedures can however be indirect references to objects of any type, but these too are best treated in the context of the basic PDF file structure.

2.2 File structure

The body of a PDF file consists of a sequence of indirect objects, which are mainly a sort of declaration: a pair of integers is associated with an object value. Since any composite object can (and in several cases must) contain a reference to any indirect object, this makes it possible to build up arbitrary data structures. It is however also a complication, since it requires that there is a mechanism for allocating these numbers.

pdf::file〈num〉 (array) Every file that Tcl opens gets a unique identifier which is used in calls to puts and such. This identifier is also used as the name of an array in the pdf namespace, in which the procedures below store all auxiliary information they need to create a proper PDF file.

pdf::file〈num〉(!〈reference label〉)

pdf::file〈num〉(last_object_num)

In this API, references to indirect objects can be arbitrary strings, called reference labels. The correspondence to the object numbers actually found in the file is given by the !〈reference label〉 entries in the array of the file in question. The entries in this array are lists with the structure

{object number} {generation number} {file position}?

where the {file position} is present only if the indirect object in question has been written to file already. The {object number} is the number of the object referred to. The {generation number} is currently always zero; it appears that it can only be nonzero for files that have been incrementally updated, and this API only supports creating a file from scratch. The {file position} is the position in the file of the beginning of the indirect object being referred to.

The last_object_num entry in the array holds the most recently allocated object number. It is incremented whenever a new reference label is encountered.

pdf::obj_ref (proc) The obj_ref procedure returns PDF code for an indirect reference to an object. The syntax is

pdf::obj_ref {file} {reference label}

where {file} is the identifier of the PDF file in question. If the {reference label} has not been encountered before for this particular file, then a new object number is allocated for it.

pdf::begin_stream (proc)

pdf::end_stream (proc)

pdf::file〈num〉(current_stream)

pdf::file〈num〉(?〈reference label〉)

proc pdf::obj_ref {F label} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    format {%d %d R} [lindex $A(!$label) 0] [lindex $A(!$label) 1]
}
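The effect of allocating object numbers on first use can be seen with a stand-alone imitation that keeps its bookkeeping in an ordinary array instead of a per-file array in the pdf namespace. The procedure `sketch_ref` and the array name `T` are hypothetical and exist only for this illustration.

```tcl
# Hypothetical stand-alone imitation of the label bookkeeping in obj_ref.
proc sketch_ref {arrName label} {
    upvar 1 $arrName A
    if {![info exists A(last_object_num)]} {
        set A(last_object_num) 0
    }
    if {![info exists A(!$label)]} {
        # First time this label is seen: allocate the next object number.
        set A(!$label) [list [incr A(last_object_num)] 0]
    }
    format {%d %d R} [lindex $A(!$label) 0] [lindex $A(!$label) 1]
}
```

Calling `sketch_ref T pages`, then `sketch_ref T font`, then `sketch_ref T pages` again returns `1 0 R`, `2 0 R`, and `1 0 R`: a label keeps the number it was given the first time it was mentioned, whether or not the object has been written yet.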

The begin_stream and end_stream procedures delimit the creation of a stream object. Between two such commands, it is possible to write arbitrary text (usually page descriptors or some sort of embedded data) to the PDF file and have it inserted correctly into the file as the data stored in the stream object.

The syntax for begin_stream is

pdf::begin_stream {file} {reference label} ( {key} {value} )∗

where {file} of course is the file to write to and {reference label} is the string that should be used to reference this object. Each stream consists of one dictionary part and one data part, where the primary task of the dictionary part is to specify how the data part should be interpreted. The most important element in the dictionary is the /Length key and its value. These are inserted by the begin_stream and end_stream commands, so one need not worry about them, but if the data part is encoded in some special way (for example, it might be compressed) then it is necessary to include additional elements in the dictionary. This is what the {key} and {value} arguments are for.

The current_stream entry in a PDF file array is set if and only if the current position in that file is inside a stream. It is not possible to begin a new stream when this entry is set. The value of this entry is a list with the structure

{reference label} {start}

where {reference label} is the reference label of the stream and {start} is the position in the file of the first byte in the stream data. Both of these are needed at end_stream to record the length of the stream data.

Entries of the form ?〈reference label〉 are used for the indirect objects that hold the length of the stream whose reference label is 〈reference label〉. They have the same syntax as their ordinary ! counterparts.

proc pdf::begin_stream {F label args} {
    upvar #0 [namespace current]::$F A
    if {[info exists A(current_stream)]} then {
        error "There is already a stream ([lindex $A(current_stream) 0])\
          being written to in this file."
    }
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    # Allocate a second object number for the stream length
    set A(?$label) [list [incr A(last_object_num)] 0]
    lappend A(!$label) [tell $F]
    puts $F\
      [format {%d %d obj} [lindex $A(!$label) 0] [lindex $A(!$label) 1]]
    puts $F [eval\
      [list dict_obj /Length [format {%d 0 R} $A(last_object_num)]]\
      $args]
    puts $F stream
    # Remember where the stream data starts
    set A(current_stream) [list $label [tell $F]]
}

The end_stream procedure takes the target file as its only argument. It finishes off the stream as necessary. It also evaluates everything that has been placed in the backlog of the file.

pdf::file〈num〉(backlog) It is not possible to output a new indirect object while a stream is being written to, but the need for such an object can still be discovered at such a time. The backlog entry provides a way around that limitation: this entry is a script that is evaluated (and cleared) at the end of every end_stream, hence commands can be delayed by appending them to this script instead of evaluating them immediately.

New commands are appended to the backlog, and must be preceded by a command separator.
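The backlog is an instance of a general Tcl idiom: build up a script as a string and evaluate it later. A minimal sketch of the idiom on its own, using a hypothetical global variable `sketch_backlog` and hypothetical procedure names, could look like this.

```tcl
# Hypothetical stand-alone sketch of a deferred-command script.
set sketch_backlog ""

# Queue a command; the newline acts as the command separator.
proc sketch_defer {cmd} {
    global sketch_backlog
    append sketch_backlog \n $cmd
}

# Clear the backlog first, then run what was queued, so that
# commands may queue further work while they are being run.
proc sketch_flush_backlog {} {
    global sketch_backlog
    set script $sketch_backlog
    set sketch_backlog ""
    uplevel #0 $script
}
```

Commands passed to `sketch_defer` should be well-formed lists (for example built with `list`), so that arguments containing spaces or newlines survive the round trip through the script string.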

proc pdf::end_stream {F} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(current_stream)]} then {
        error "There is no stream to end."
    }
    set length [expr {[tell $F] - [lindex $A(current_stream) 1]}]
    set label [lindex $A(current_stream) 0]
    unset A(current_stream)
    puts $F "endstream endobj"
    # Write the length as a separate indirect object
    lappend A(?$label) [tell $F]
    puts $F [format {%d %d obj %d endobj} [lindex $A(?$label) 0]\
      [lindex $A(?$label) 1] $length]
    # Clear the backlog, then evaluate the commands that were queued in it
    eval "set A(backlog) {}; $A(backlog)"
}
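The reason for the extra length object is that the amount of stream data is not known when the stream dictionary is written, so the dictionary refers to an object that is written only after the data. The following stand-alone sketch shows that technique for a single stream whose data is already available as a string; `sketch_write_stream` and its arguments are hypothetical, and object numbers are passed in explicitly instead of being allocated from labels.

```tcl
# Hypothetical sketch: write one stream object whose /Length is an
# indirect reference to a second object written after the data.
proc sketch_write_stream {chan objnum lenobjnum data} {
    puts $chan "$objnum 0 obj"
    puts $chan "<< /Length $lenobjnum 0 R >>"
    puts $chan stream
    set start [tell $chan]
    puts -nonewline $chan $data
    # Measure what was actually written, as end_stream does with tell.
    set length [expr {[tell $chan] - $start}]
    puts $chan "\nendstream endobj"
    puts $chan "$lenobjnum 0 obj $length endobj"
    return $length
}
```

The channel should be configured with `-translation binary`, as rewrite_pdf does, so that the measured length matches the number of bytes in the file.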

pdf::put_obj (proc) The put_obj procedure writes a direct object to a PDF file. The syntax is

pdf::put_obj {file} {reference label} {object}

proc pdf::put_obj {F label obj} {
    upvar #0 [namespace current]::$F A
    if {[info exists A(current_stream)]} then {
        # Inside a stream: postpone the write until end_stream
        append A(backlog) \n [list put_obj $F $label $obj]
        return
    }
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    lappend A(!$label) [tell $F]
    puts $F\
      [format {%d %d obj} [lindex $A(!$label) 0] [lindex $A(!$label) 1]]
    puts $F $obj
    puts $F endobj
}

pdf::rewrite_pdf (proc) The rewrite_pdf procedure opens a new PDF file for writing and initialises the associated data structures. The syntax is

pdf::rewrite_pdf {file name} 〈options〉

and the return value is the identifier of the file opened. The {file name} is of course the name of that file. The 〈options〉 is zero or more of

-permissions {integer}
-header {string}

The permissions are the default permissions for the file in question. If this is not specified, then no such value is passed to open. The header is a string that will be put first in the file (as header). It defaults to

%PDF-1.3
%åäö

(in UTF-8) where the first line is a standard header line, and the second line is there to help some software understand that the file should be treated as a binary file. Note that no newline is inserted after this string; be sure to include it in the string if necessary.

proc pdf::rewrite_pdf {name args} {
    set Opt(-header)\
      [encoding convertto utf-8 %PDF-1.3\n%\xe5\xe4\xf6\n]
    array set Opt $args
    if {[info exists Opt(-permissions)]} then {
        set F [open $name w $Opt(-permissions)]
    } else {
        set F [open $name w]
    }
    fconfigure $F -translation binary
    puts -nonewline $F $Opt(-header)
    # Start from a clean per-file array
    upvar #0 [namespace current]::$F A
    array unset A
    set A(last_object_num) 0
    set A(backlog) ""
    return $F
}


pdf::close_pdf (proc) The close_pdf procedure performs the non-trivial task of finishing off the PDF file and closing it. The syntax is

pdf::close_pdf {file} {catalog label} ( {key} {value} )∗

and the return value is a report detailing any problems encountered (such as objects that are referred to but never defined). This is a report rather than an error, because there is in many cases no sharp distinction. If the return value is non-empty, then there is probably a bug in your program that needs to be fixed.

The {file} is the identifier of the file to write. The {catalog label} is the reference label of the Catalog object in the document. The remaining arguments can be used to insert additional information (such as a reference to the Info dictionary of the document) in the trailer dictionary.

proc pdf::close_pdf {F label args} {
    upvar #0 [namespace current]::$F A
    set reportL [list]

The first step is to compile the cross-reference table of the document. I originally made one subsection for each range of defined indirect objects, giving the mandatory free entry #0 a separate subsection, but for some reason Adobe software didn’t like that at all.² Hence the current implementation is to make a cross-reference table with only one subsection, with an explicit free entry for every missing item.

The xrA array constructed below is a prototype for the cross-reference section. It is indexed by object number and the entries have the list structure

{file position} {generation number} {type}

Just as in a PDF file, the {type} is either f or n depending on whether the entry is “free” or “in use”. The {file position} and {generation number} are however not padded with zeros, and the {file position} is initially an empty string in the “free” entries.

This first round simply collects the information and detects collisions.

        set xrA(0) [list "" 65535 f]
        foreach lbl [array names A {[!?]*}] {
            set idx [lindex $A($lbl) 0]
            set ent [list [lindex $A($lbl) 2] [lindex $A($lbl) 1] n]
            if {[llength $A($lbl)]<3} then {
                ...
            } elseif {[llength $A($lbl)]>3} then {
                lappend reportL "Multiple indirect objects\
                  for label [string range $lbl 1 end]; at\
                  [join [lrange $A($lbl) 2 end]]."
            }
            if {![info exists xrA($idx)]} then {
                set xrA($idx) $ent
            } elseif {[lindex $xrA($idx) 2]=="f" && [lindex $ent 2]=="n"}\
              then {
                lappend reportL "This shouldn’t happen: There are several\
                  reference labels for indirect object $idx. Using that with\
                  label: [string range $lbl 1 end]"
                set xrA($idx) $ent
            } else {
                lappend reportL "This shouldn’t happen: There are several\
                  reference labels for indirect object $idx. Ignoring that\
                  with label: [string range $lbl 1 end]"
            }
        }

² Whether this means Adobe isn’t following their own standard I leave to others to decide. Neither GhostScript nor Quartz (the PDF-based graphics system in Mac OS X) seemed to have any problems with this arrangement.

The second round makes sure that there is a contiguous sequence of reference numbers and constructs the linked list of free entries.

        set last_free 0
        set maxidx [lindex [lsort -integer -decreasing [array names xrA]] 0]
        for {set n $maxidx} {$n>=0} {incr n -1} {
            if {![info exists xrA($n)]} then {
                set xrA($n) [list "" 0 f]
                lappend reportL "This shouldn’t happen: Object number $n was\
                  allocated, but not assigned a reference label."
            }
            if {[lindex $xrA($n) 2]=="f"} then {
                set xrA($n) [lreplace $xrA($n) 0 0 $last_free]
                set last_free $n
            }
        }
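To see what this second round does, here is a minimal self-contained sketch run on a made-up prototype array. The array name xr and its three sample entries are assumptions chosen for illustration; the loop body mirrors the code above, minus the reporting.

```tcl
# Hypothetical prototype: object 0 is the mandatory free entry,
# objects 1 and 3 are in use, object 2 was never defined.
array set xr {0 {"" 65535 f} 1 {17 0 n} 3 {95 0 n}}

set last_free 0
set maxidx [lindex [lsort -integer -decreasing [array names xr]] 0]
for {set n $maxidx} {$n >= 0} {incr n -1} {
    # Fill holes in the numbering with free entries.
    if {![info exists xr($n)]} {
        set xr($n) [list "" 0 f]
    }
    # Each free entry stores the number of the next free object.
    if {[lindex $xr($n) 2] eq "f"} {
        set xr($n) [lreplace $xr($n) 0 0 $last_free]
        set last_free $n
    }
}
```

Walking downwards from the highest object number means that each free entry ends up pointing at the next higher free object, and the highest free entry points back to object 0. Here entry 0 becomes {2 65535 f} and the filled-in entry 2 becomes {0 0 f}.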

Now the cross-reference section can be written to file.

        set startxref [tell $F]
        puts $F xref
        puts $F [format {%d %d} 0 [expr {$maxidx + 1}]]
        for {set n 0} {$n ...
        puts $F "startxref\n${startxref}\n%%EOF"

The final step is to close the file and compile the report.

        close $F
        join $reportL \n
    }
    〈/pkg〉
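The PDF format fixes the layout of each cross-reference entry at exactly 20 bytes: a 10-digit number (byte offset, or next free object number), a 5-digit generation number, the type letter, and a two-character end-of-line. The following is a minimal sketch of padding one prototype entry to that layout. The name xref_entry is a hypothetical helper, not a procedure of the pdf package, and this is not necessarily how close_pdf itself writes the entries.

```tcl
# Hypothetical helper: turn a prototype entry
# {position generation type} into one 20-byte xref line.
proc xref_entry {ent} {
    foreach {pos gen type} $ent break
    # 10 digits, space, 5 digits, space, type, then space + linefeed
    # as the two-character end-of-line.
    format "%010d %05d %s \n" $pos $gen $type
}
```

For instance, xref_entry {17 0 n} gives the line "0000000017 00000 n " followed by a linefeed. The file must be in binary translation mode (as rewrite_pdf arranges) for the linefeed to be written as a single byte.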

2.3 Hello World

The code below creates a PDF file matching the basic “Hello World” example [1, Sec. A.2].

    〈∗example1〉
    set F [pdf::rewrite_pdf hello.pdf]
    pdf::put_obj $F "The catalog" [pdf::dict_obj\
      /Type /Catalog\
      /Pages [pdf::obj_ref $F "The pages"]\
      /Outlines [pdf::obj_ref $F "The outlines"]]
    pdf::put_obj $F "The outlines"\
      [pdf::dict_obj /Type /Outlines /Count [pdf::int_obj 0]]
    pdf::put_obj $F "The pages" [pdf::dict_obj\
      /Type /Pages\
      /Count [pdf::int_obj 1]\
      /Kids [pdf::array_obj [pdf::obj_ref $F "Page 1"]]]
    pdf::put_obj $F "Page 1" [pdf::dict_obj\
      /Type /Page\
      /Parent [pdf::obj_ref $F "The pages"]\
      /Resources [pdf::dict_obj\
        /Font [pdf::dict_obj /F1 [pdf::obj_ref $F "Helvetica"]]\
        /ProcSet [pdf::obj_ref $F "The procs"]]\
      /MediaBox [pdf::array_obj [pdf::int_obj 0] [pdf::int_obj 0]\
        [pdf::int_obj 612] [pdf::int_obj 792]]\
      /Contents [pdf::obj_ref $F "Page 1 contents"]]
    pdf::begin_stream $F "Page 1 contents"
    puts $F {BT}
    puts $F {/F1 24 Tf}
    puts $F {100 100 Td (Hello World) Tj}
    puts $F {ET}
    pdf::end_stream $F
    pdf::put_obj $F "The procs" [pdf::array_obj /PDF /Text]
    pdf::put_obj $F "Helvetica" [pdf::dict_obj /Type /Font /Subtype /Type1\
      /Name /F1 /BaseFont /Helvetica /Encoding /MacRomanEncoding]
    pdf::close_pdf $F "The catalog"
    〈/example1〉

3 Contents and resources

Most of what one actually sees of a PDF document is part of a content stream, which is the side of PDF that most resembles a simplified PostScript file: a sequence of simple operators for drawing text and graphics, each preceded by its arguments. One difference, however, is that many types of data are not permitted within a content stream, because some aspects (indirect objects, dictionaries) of the required forms of such data are not permitted there. Instead the content stream has to be supplemented by a resources dictionary, which locally associates names to objects, and these names are what one may use in the content stream.

The model used here to overcome this is to equip the internal representation of a contents stream with a representation of the corresponding resources dictionary. Commands emitting operators that make use of such indirect resources should check whether these are present in the resources dictionary, and see to it that they are added if they are not. The resources dictionary is uniquely identified by the file identifier and stream object label.

3.1 Resources representation

Resources dictionaries are kept in arrays, where each resource type (or equivalently: entry in the dictionary) has a separate entry. These entries are key–value lists where the keys are PDF name objects and the values are the underlying resource objects (normally indirect references). (An exception is the ProcSet entry, which is a straight list of names.) The resource type names are the same as in the PDF file, e.g. XObject, Font, and ProcSet, except that they don’t include a leading slash.
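As an illustration of this layout, here is a small self-contained sketch of such an array and of how it maps to PDF dictionary syntax. The array name Res, the object numbers, and the helper sketch_resources are all made up for the example; the values are written as "5 0 R", which is the PDF syntax for an indirect reference, on the assumption that this is what an indirect reference object looks like as a string. The package's own resource_dict_obj is the real implementation.

```tcl
# Hypothetical resources array: one entry per resource type,
# each a key-value list of PDF name objects and resource objects.
array set Res {
    Font    {/F0 {5 0 R} /F1 {7 0 R}}
    XObject {/XO0 {9 0 R}}
}

# Hypothetical helper rendering the array as a PDF dictionary.
proc sketch_resources {arrname} {
    upvar 1 $arrname A
    set parts [list]
    # Sort the type names, since array names come in no fixed order.
    foreach type [lsort [array names A]] {
        set sub [list]
        foreach {name obj} $A($type) {lappend sub "$name $obj"}
        # The leading slash is added only when writing the type out.
        lappend parts "/$type << [join $sub] >>"
    }
    return "<< [join $parts] >>"
}
```

Calling sketch_resources Res returns << /Font << /F0 5 0 R /F1 7 0 R >> /XObject << /XO0 9 0 R >> >>.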

Since explicit declaration of procsets was declared obsolete in PDF 1.4 and wasn’t very useful earlier either, most of the support for specifying procsets has been removed from the pdf package, and the ProcSet entries instead default to listing all five procsets. If for some reason you wish to specify a smaller set of procsets, then set the ProcSet entry of your resources array to a list of the names of the procsets that you want to require.

pdf::resource_dict_obj (proc)

The resource_dict_obj procedure returns the PDF dictionary object for the data kept in an array. The call syntax is

    pdf::resource_dict_obj {array-name}

where the {array-name} refers to an array in the local context of the caller.

If the array does not contain any ProcSet entry, then for compatibility such an entry listing all five procsets is inserted.

    〈∗pkg〉
    proc pdf::resource_dict_obj {arrname} {
        upvar 1 $arrname A
        set call [list dict_obj]
        if {![info exists A(ProcSet)]} then {
            lappend call /ProcSet {[/PDF/Text/ImageB/ImageC/ImageI]}
        }
        foreach type [array names A] {
            lappend call [name_obj $type]
            if {$type == "ProcSet"} then {
                lappend call [eval [linsert $A(ProcSet) 0 array_obj]]
            } else {
                lappend call [eval [linsert $A($type) 0 dict_obj]]
            }
        }
        eval $call
    }

pdf::file〈num〉 (Resources/〈type〉)

When a content stream is being written to, the resources dictionary data of that stream is kept in the main array of that PDF file. The entry formats are the same as when kept in a separate array, but the entry names are prefixed by Resources/ to prevent name clashes.
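The prefixing can be sketched in a few self-contained lines. The arrays A, B and C below are stand-ins invented for the example (A plays the role of the main file array, B and C of caller-side resources arrays), and the entry values are made up.

```tcl
# Hypothetical caller-side resources array.
array set B {Font {/F0 {5 0 R}} ProcSet {/PDF /Text}}

# Copy into the (stand-in) file array under the Resources/ prefix.
foreach type [array names B] {
    set A(Resources/$type) $B($type)
}
# An unrelated bookkeeping entry that must not be mistaken for a resource.
set A(last_object_num) 7

# Copy back out: only Resources/* entries match, and the
# 10-character prefix "Resources/" is stripped from each name.
foreach index [array names A Resources/*] {
    set C([string range $index 10 end]) $A($index)
}
```

After this, C holds exactly the Font and ProcSet entries of B, and last_object_num has been left behind in A.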

pdf::begin_contents (proc)
pdf::end_contents (proc)

The begin_contents and end_contents procedures are specialised forms of begin_stream and end_stream that, in addition to delimiting the creation of a stream object, manage the associated resources dictionary.

The syntax for begin_contents is

    pdf::begin_contents {resources-array} {file} {reference label} {key} {value} ∗

where all arguments except {resources-array} are as for begin_stream. If this extra argument is nonempty then it is the name, in the local context of the caller, of an array representing a resources dictionary; the procedure copies the contents of that array to the current resources dictionary for this file. If {resources-array} is empty then the current resources dictionary is set to be empty.

    proc pdf::begin_contents {arr F label args} {
        eval [list begin_stream $F $label] $args
        upvar #0 [namespace current]::$F A
        array unset A Resources/*
        if {[string length $arr]} then {
            upvar 1 $arr B
            foreach type [array names B] {
                set A(Resources/$type) $B($type)
            }
        }
    }

The end_contents procedure has the syntax

    pdf::end_contents {resources-array} {file}

It copies the current resources dictionary data for the {file} to the {resources-array} (variable name in the local context of the caller) and then calls end_stream to end the current contents stream.

    proc pdf::end_contents {arr F} {
        upvar #0 [namespace current]::$F A
        if {![info exists A(current_stream)]} then {
            error "There is no stream to end."
        }
        upvar 1 $arr B
        foreach index [array names A Resources/*] {
            set type [string range $index 10 end]
            set B($type) $A($index)
        }
        end_stream $F
    }

pdf::has_resource? (proc)

The has_resource? procedure can be used to query whether a particular resource is present in the current resources dictionary. The syntax is

    pdf::has_resource? {file} {type} {object} {name-var} ?

and the return value is 1 if the {object} is one of the objects listed under the {type} in the current resources dictionary of the file {file} and 0 otherwise. If a {name-var} is specified and the return value is 1 then that variable in the local context of the caller will be set to the PDF name object associated with the given {object}.

    〈∗obsolete〉
    proc pdf::has_resource? {F type obj {namevar {}}} {
        upvar #0 [namespace current]::$F A
        if {![info exists A(Resources/$type)]} then {return 0}
        if {$type == "ProcSet"} then {
            if {[lsearch -exact $A(Resources/$type) $obj] >= 0} then {
                if {[string length $namevar]} then {
                    uplevel 1 [list ::set $namevar $obj]
                }
                return 1
            } else {
                return 0
            }
        }
        foreach {name resobj} $A(Resources/$type) {
            if {[string equal $resobj $obj]} then {
                if {[string length $namevar]} then {
                    uplevel 1 [list ::set $namevar $name]
                }
                return 1
            }
        }
        return 0
    }
    〈/obsolete〉

pdf::name_resource (proc)

The name_resource procedure provides a name object referring to an object and (if necessary) adds that object to the current resources dictionary of the file. The syntax is

    pdf::name_resource {var-name} {file} {type} {object} {suggested name} ?

where {var-name} is the name of a variable in the local context of the caller that will be set to the name object referring to the specified resource. The result is 0 if the resource was already present and 1 if an entry for it was added.

The {file} is the identifier of the PDF file containing the stream for which this resource is going to be made available. The {type} is the name (slash not included) of the resource dictionary entry where this resource should be placed, e.g. Font, XObject, etc. The {object} is the object that constitutes the resource to name. The {suggested name} argument can be used to request a particular name for the resource; it should be the PDF name object to give the resource. An error will be raised if that name is already used for some other resource of that type.

The {type} must not be ProcSet.
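When no name is suggested, a fresh one has to be generated. A simplified sketch of one way to do that is shown below: start from a short type prefix followed by the current number of entries, and count upwards past any name already taken. The procedure next_name is a hypothetical stand-in written for this illustration; the real name_resource additionally checks whether the object is already present and handles suggested names.

```tcl
# Hypothetical helper: pick an unused name for a new resource.
# prefix is e.g. /F for fonts; pairs is the key-value list
# currently stored for that resource type.
proc next_name {prefix pairs} {
    foreach {key val} $pairs {set Used($key) {}}
    # First candidate: prefix followed by the number of entries so far.
    set n [expr {[llength $pairs] / 2}]
    # Skip candidates that clash with names chosen by the user.
    while {[info exists Used($prefix$n)]} {incr n}
    return $prefix$n
}
```

With an empty list the first font is named /F0. With entries /F2 and /Body already present, the first candidate /F2 is taken, so the result is /F3.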

    proc pdf::name_resource {varname F type obj {name {}}} {
        upvar #0 [namespace current]::$F A
        switch -- $type ColorSpace {
            set short_type /CS
        } XObject {
            set short_type /XO
        } ExtGState {
            set short_type /GS
        } Font {
            set short_type /F
        } Pattern {
            set short_type /Pat
        } ProcSet {
            error {If you really think you need to bother about procsets,\
              then access the array directly.}
        } Properties {
            set short_type /Prop
        } Shading {
            set short_type /Sh
        } default {
            set short_type /$type
        }
        if {![info exists A(Resources/$type)]} then {
            if {![string length $name]} then {
                set name ${short_type}0
            }
            set A(Resources/$type) [list $name $obj]
            uplevel 1 [list ::set $varname $name]
            return 1
        }
        if {[string length $name]} then {
            foreach {key val} $A(Resources/$type) {
                if {[string equal $key $name]} then {
                    if {![string equal $obj $val]} then {
                        error "Name already in use for: $val"
                    }
                    uplevel 1 [list ::set $varname $name]
                    return 0
                }
            }
            lappend A(Resources/$type) $name $obj
            uplevel 1 [list ::set $varname $name]
            return 1
        }
        set name "${short_type}[expr {[llength $A(Resources/$type)]/2}]"
        regsub -all {[\[\]?*\\]} $short_type {\\&} pattern
        append pattern *
        set free 1
        foreach {key val} $A(Resources/$type) {
            if {[string equal $val $obj]} then {
                uplevel 1 [list ::set $varname $key]
                return 0
            }
            if {[string equal $key $name]} then {set free 0}
            if {[string match $pattern $key]} then {
                set Used([string range $key [string length $short_type] end])\
                  {}
            }
        }
        if {!$free} then {
            set n [expr {[llength $A(Resources/$type)]/2}]
            while {[info exists Used($n)]} {incr n}
            set name ${short_type}$n
        }
        lappend A(Resources/$type) $name $obj
        uplevel 1 [list ::set $varname $name]
        return 1
    }

pdf::require_procsets (proc)

The require_procsets procedure is called to make sure that certain ProcSets are listed in the current resources dictionary. The syntax is

    pdf::require_procsets {file} {name obj} ∗

where {file} is the relevant file and the {name obj}s are the PDF name objects of the required ProcSets.

    〈∗obsolete〉
    proc pdf::require_procsets {F args} {
        upvar #0 [namespace current]::$F A
        if {![info exists A(Resources/ProcSet)]} then {
            set A(Resources/ProcSet) $args
        } else {
            set A(Resources/ProcSet) [lsort -dictionary -unique [
                concat $A(Resources/ProcSet) $args
            ]]
        }
    }
    〈/obsolete〉



3.2 Formatting content

pdf::sprintf (proc)

The sprintf procedure formats data for writing to a PDF contents stream. The syntax is

    pdf::sprintf {format list} {data} ∗

and the return value is the resulting PDF code.

The {format list} is similar to the formatting string of format, but every conversion specifier must be a separate list element. List elements that are not conversion specifiers are copied verbatim to the result. Material from different list elements is always separated by whitespace in the result.

As with format, the first character of a conversion specifier is always a ‘%’. The exact format is

    %〈char〉(〈count〉(.〈precision〉)?)?

(a 〈precision〉 field requires specifying a 〈count〉 because the conversion specifiers are parsed using scan). The 〈count〉 defaults to 1, and specifying a non-unit 〈count〉 is equivalent to specifying that many separate conversion specifiers in sequence. The 〈precision〉 is only used by real and length conversions.

The conversion character 〈char〉 specifies how the {data} should be converted. The basic conversions are

    b  Boolean, to be formatted by boolean_obj.
    i  Integer, to be formatted by int_obj.
    l  Length, to be formatted by length_obj. This consumes two {data} arguments: one for the value and one for the unit.
    n  String, to be formatted by name_obj.
    o  Already formatted PDF object.
    r  Real number, to be formatted by real_obj (with default precision according to the precision variable).
    s  PDF string, to be formatted by string_obj.

In addition, the corresponding upper case letters select the same formatting, but the (first) {data} argument is interpreted as a list of things to process in the specified way. Finally, if the character is an & then the {data} is interpreted as a list

    {format list} {data} ∗

which will be formatted by a recursive sprintf call and inserted into the result at that position. This is intended to simplify encoding structured data.
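How a single element of the {format list} is told apart from literal text can be sketched with the same scan pattern that the procedure below uses. The wrapper parse_spec is a hypothetical helper written only for this illustration, and it returns an empty precision where the package would substitute its default.

```tcl
# Hypothetical helper: classify one element of a format list.
# Returns {literal <text>} or {<char> <count> <precision>}.
proc parse_spec {spec} {
    set count 1
    set prec ""
    # scan reports how many fields it converted; variables for
    # fields that were not converted keep their previous values.
    if {[scan $spec {%%%[bilnorsBILNORS&]%d.%d} code count prec] < 1} {
        return [list literal $spec]
    }
    return [list $code $count $prec]
}
```

For example, parse_spec Tf classifies the operator Tf as literal text, parse_spec %r2 gives code r with count 2, and parse_spec %R3.1 gives code R with count 3 and precision 1.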

    proc pdf::sprintf {format args} {
        variable precision
        set items [list]
        set n 0
        foreach spec $format {
            set count 1
            set prec $precision
            if {![scan $spec {%%%[bilnorsBILNORS&]%d.%d} code count prec]}\
              then {
                lappend items $spec
            } else {
                for {} {$count>=1} {incr count -1; incr n} {
                    set datum [lindex $args $n]
                    switch -- $code "b" {
                        lappend items [boolean_obj $datum]
                    } "i" {
                        lappend items [int_obj $datum]
                    } "l" {
                        lappend items [
                            length_obj $datum [lindex $args [incr n]] $prec
                        ]
                    } "n" {
                        lappend items [name_obj $datum]
                    } "o" {
                        lappend items $datum
                    } "r" {
                        lappend items [real_obj $datum $prec]
                    } "s" {
                        lappend items [string_obj $datum]
                    } "B" {
                        foreach d $datum {lappend items [boolean_obj $d]}
                    } "I" {
                        foreach d $datum {lappend items [int_obj $d]}
                    } "L" {
                        set unit [lindex $args [incr n]]
                        foreach d $datum {
                            lappend items [length_obj $d $unit $prec]
                        }
                    } "N" {
                        foreach d $datum {lappend items [name_obj $d]}
                    } "O" {
                        eval [list lappend items] $datum
                    } "R" {
                        foreach d $datum {
                            lappend items [real_obj $d $prec]
                        }
                    } "S" {
                        foreach d $datum {lappend items [string_obj $d]}
                    } "&" {
                        lappend items [eval [linsert $datum 0 sprintf]]
                    } default {
                        error "Bad pdf::sprintf format specifier ‘$spec’."
                    }
                }
            }
        }
        join $items
    }

pdf::printf (proc)

The printf procedure is an extension of sprintf that immediately writes the formatted string to a file rather than returning it. The syntax is

    pdf::printf {file} {format list} {data} ∗

    proc pdf::printf {F format args} {
        puts $F [eval [list sprintf $format] $args]
    }
    〈/pkg〉

3.3 Hello again, World

The code below is an example that achieves very much the same things as that in Subsection 2.3, but this time using the resource management and data formatting provided for content streams.

    〈∗example2〉
    set F [pdf::rewrite_pdf helloagain.pdf]
    pdf::put_obj $F "Helvetica" [pdf::dict_obj /Type /Font /Subtype /Type1\
      /BaseFont /Helvetica /Encoding /MacRomanEncoding]

(It turns out that the /Name entry, which is included in 〈example1〉, has been deprecated in PDF files for quite some time, although it is still in the “Hello world” example of the PDF 1.5 specification.)

With resource management, the page contents are merely the following.

    pdf::begin_contents "" $F "Page 1 contents"
    pdf::name_resource Helvetica $F Font [pdf::obj_ref $F "Helvetica"]
    pdf::printf $F {BT %o %i Tf %r2 Td %s Tj ET} $Helvetica 24 100 100 \
      {Hello again, World!}

Let’s also add some graphics: a green circle with midpoint (200, 200) and radius 50. PDF doesn’t have circular arcs, but the MetaFont four segment approximation should do nicely. This places the control points $\frac{4}{3}\left(1+\sqrt{2}\right)^{-1} \approx 0.552284749831$ of the radius from their nearest knot, and for a radius of 50 that is very nearly 27.6.
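The arithmetic can be checked with a few lines of plain Tcl. The variable names below are chosen for this illustration only; the numbers are the ones from the text (midpoint coordinate 200, radius 50).

```tcl
set r 50.0
# Control point distance as a fraction of the radius: (4/3)/(1+sqrt(2)).
set kappa [expr {4.0 / (3.0 * (1.0 + sqrt(2.0)))}]
# Offset of a control point from its nearest knot.
set d [expr {$r * $kappa}]
# Coordinates of the control points on either side of the midpoint
# coordinate 200, once moved in from the knots at 150 and 250.
set near [format %.1f [expr {150 + $d}]]
set far  [format %.1f [expr {250 - $d}]]
```

Rounded to one decimal, the offset d is 27.6, and the control point coordinates come out as 177.6 and 222.4 when measured in from the knots at 150 and 250, or equivalently 172.4 and 227.6 when measured out from the midpoint, which are the values used in the path that follows.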

    pdf::printf $F {%R rg} {0 1 0}
    pdf::printf $F {%R m %R3 c %R3 c %R3 c %R3 c f}\
      {200 150}\
      {227.6 150} {250 172.4} {250 200}\
      {250 227.6} {227.6 250} {200 250}\
      {172.4 250} {150 227.6} {150 200}\
      {150 172.4} {172.4 150} {200 150}
    pdf::end_contents Res1 $F
    pdf::put_obj $F "Page 1" [pdf::dict_obj\
      /Type /Page\
      /Parent [pdf::obj_ref $F "The pages"]\
      /MediaBox [pdf::array_obj [pdf::int_obj 0] [pdf::int_obj 0]\
        [pdf::int_obj 612] [pdf::int_obj 792]]\
      /Resources [pdf::resource_dict_obj Res1]\
      /Contents [pdf::obj_ref $F "Page 1 contents"]]
    pdf::put_obj $F "The pages" [pdf::dict_obj\
      /Type /Pages\
      /Count [pdf::int_obj 1]\
      /Kids [pdf::array_obj [pdf::obj_ref $F "Page 1"]]]
    pdf::put_obj $F "The catalog"\
      [pdf::dict_obj /Type /Catalog /Pages [pdf::obj_ref $F "The pages"]]

There is really no point in making an /Outlines dictionary that would be empty anyway.

What there is a point in making, however, is a document information dictionary.

    pdf::put_obj $F "Document info" [pdf::dict_obj\
      /Title [pdf::text_obj "Hello again, world!"]\
      /CreationDate [pdf::date_obj [clock seconds]] ]
    pdf::close_pdf $F "The catalog" /Info [pdf::obj_ref $F "Document info"]
    〈/example2〉

4 Document pages

4.1 The tree of pages

One of the quirks of PDF is the (very data structure) requirement that (amongst other things) pages have to be organised in a tree structure, where links go not only from parent to child, but also from child to parent. This is definitely something that programmers shouldn’t have to bother with, so the pdf package can take care of generating such a structure when pages are merely sequentially appended to the document.

pdf::file〈num〉 (Pages/〈num〉)

At the heart of the page tree generation lie the preliminary representations of Pages tree nodes that have to be constructed before actual code can be written to the file. Every Pages node has an entry in the array of the file, and the contents of these entries are lists with the structure

    {kid label} {kid count} +

where each pair of elements corresponds to one child node. The {kid label} is the reference label for this node and the {kid count} is the number of pages in that subtree.
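As a small illustration of this list structure, the sketch below stores a made-up node with two pages and one subtree, and sums the {kid count} elements, which is the kind of page total that a Pages node needs to report in PDF. The labels and counts are invented for the example.

```tcl
# Hypothetical preliminary node: two single pages and one
# child Pages node that holds five pages.
set node [list "Page 1" 1 "Page 2" 1 "pages-2" 5]

# The elements come in {kid label} {kid count} pairs.
set total 0
set kids [list]
foreach {label count} $node {
    lappend kids $label
    incr total $count
}
```

Here kids ends up holding the three labels and total is 7.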

pdf::file〈num〉 (Pages/prefix)

Building a Pages tree necessarily means that Pages nodes, which are indirect objects, have to be created. That in turn means that they will have to be assigned labels, and in order to avoid clashes with labels used elsewhere, the user is required to specify a label prefix for the Pages tree system to use. This prefix is stored in the Pages/prefix entry of the file array.

pdf::file〈num〉 (Pages/arity)
pdf::file〈num〉 (Pages/last)

The maximal number of children a node is allowed to have is kept in the Pages/arity entry. The number of the most recently created node is kept in the Pages/last entry.

pdf::file〈num〉 (Pages/attributes)

The Pages/attributes entry is a list of keys and values to insert into the root Pages node.

pdf::begin_pages (proc)

The begin_pages procedure initialises the Pages tree system for a PDF file. The syntax is

    pdf::begin_pages {file} {label prefix} {option} {value} ∗

where {file} is the identifier of the PDF file and {label prefix} will be used as prefix of all reference labels created by the Pages tree system. An {option} {value} is either

    -arity {arity}

or a pair of PDF objects, where the first is a name object. The {arity} is the maximal number of children a node is allowed to have; it defaults to 5. The PDF object pairs will be inserted into the root Pages node. Additional such items may be specified at end_pages.

〈∗pkg〉
proc pdf::begin_pages {F prefix args} {
   upvar #0 [namespace current]::$F A
   set A(Pages/arity) 5
   set A(Pages/attributes) [list]
   set A(Pages/prefix) $prefix
   foreach {option value} $args {
      switch -glob -- $option -arity {
         set A(Pages/arity) $value
      } /* {
         lappend A(Pages/attributes) $option $value
      } default {
         error "Unknown option: $option"
      }
   }
   set A(Pages/last) 1
   set A(Pages/1) [list]
}

pdf::shipout (proc)

This procedure writes a Page object to a file and inserts it into the Pages tree of that file after all pages previously inserted. The syntax is

pdf::shipout {file} {label} {key} {object} +

where {file} is the PDF file identifier and {label} is the reference label for the page object. The {key} and {object} arguments are attributes for the page object (keys and values for the dictionary). These should not include the /Type and /Parent attributes, which are inserted automatically.

proc pdf::shipout {F label args} {
   upvar #0 [namespace current]::$F A
   if {[llength $A(Pages/$A(Pages/last))]/2 >= $A(Pages/arity)} then {
      incr A(Pages/last)
      set A(Pages/$A(Pages/last)) [list]
   }
   put_obj $F $label [eval [linsert $args 0 dict_obj /Type /Page\
     /Parent [obj_ref $F $A(Pages/prefix)$A(Pages/last)]]]
   lappend A(Pages/$A(Pages/last)) $label 1
}
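The grouping that shipout performs can be modelled without any PDF machinery. The following is a minimal sketch and not part of the package: leaf_node_sizes is a hypothetical helper that only counts how many pages end up in each first-level Pages node, assuming the rule used above (start with one empty node, and open a new node whenever the current one already holds {arity} pages).

```tcl
# Hypothetical helper, for illustration only: returns the number of
# pages stored in each first-level Pages node after npages shipouts.
proc leaf_node_sizes {npages arity} {
    # begin_pages creates node 1 with no children.
    set sizes [list 0]
    for {set p 1} {$p <= $npages} {incr p} {
        # shipout opens a new node when the current one is full.
        if {[lindex $sizes end] >= $arity} {
            lappend sizes 0
        }
        # The page becomes a child of the most recent node.
        set sizes [lreplace $sizes end end \
            [expr {[lindex $sizes end] + 1}]]
    }
    return $sizes
}

puts [leaf_node_sizes 19 5]   ;# prints: 5 5 5 4
```

With 19 pages and the default arity 5, the last node holds only four pages. A partially filled last node of this kind is the special case that end_pages has to handle in its first level.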

pdf::end_pages (proc)

The end_pages procedure completes the Pages tree for a PDF file and returns the reference label of the root object of that tree. The syntax is

pdf::end_pages {file} 〈attributes〉

where {file} is the PDF file identifier and 〈attributes〉 are attributes to insert into the root node of the Pages tree.

pdf::make_pages_nodes (proc)

The make_pages_nodes procedure takes a list of numbers of Pages nodes that have not yet been written to file and writes objects for these nodes to file. The syntax is

pdf::make_pages_nodes {file} {node-list} {parent} ?

where {file} is the identifier of the PDF file and {node-list} is the list of node numbers. If there is a {parent} argument then the Pages node with this number will be made the parent of the listed nodes, and the return value is the list of reference labels and page counts that need to be included in the Pages/〈parent〉 entry of the file's array. If there is no {parent} argument then the procedure allocates a new Pages node and makes that the parent of the listed nodes; the result is then the number of the newly allocated parent.

proc pdf::make_pages_nodes {F nodeL {parent -1}} {
   upvar #0 [namespace current]::$F A
   if {$parent < 0} then {
      set p [incr A(Pages/last)]
   } else {
      set p $parent
   }
   set res [list]
   set parent_obj [obj_ref $F $A(Pages/prefix)$p]
   foreach i $nodeL {
      set count 0
      set kids [list array_obj]
      foreach {label c} $A(Pages/$i) {
         lappend kids [obj_ref $F $label]
         incr count $c
      }
      set label $A(Pages/prefix)$i
      lappend res $label $count
      put_obj $F $label [dict_obj /Type /Pages /Kids [eval $kids]\
        /Count [int_obj $count] /Parent $parent_obj]
   }
   if {$parent < 0} then {
      set A(Pages/$p) $res
      return $p
   } else {
      return $res
   }
}

The basic problem in end_pages is to construct the actual tree so that it is reasonably well balanced. The approach used below is to build the tree from the leaves to the root, always collect as many children as possible into each new node, and move nodes up (well, towards the root; there is some disagreement as to whether that is up or down) one level in the tree if the number of nodes in the current level is not divisible by the tree arity.

By allowing nodes to migrate to higher levels, one creates a risk that the tree becomes unbalanced. This is managed in the procedure below by keeping track of nodes that are saturated, i.e., nodes that have leaves at different depths in their subtrees. By not allowing saturated nodes to migrate to a higher level, one can ensure that the trees that are constructed are balanced. It is furthermore fairly easy to keep track of this, because one can choose nodes for migration in such a way that the only node in a level that may be saturated is the last one. This is possible because the maximal number of nodes that one may need to migrate is one less than the arity, and thus the nodes that migrated to the previous level and the saturated nodes in the previous level are always few enough that the last node of the new level can be parent of them all.

proc pdf::end_pages {F args} {
   upvar #0 [namespace current]::$F A
   array set Attr $A(Pages/attributes)
   array set Attr $args

In the first level of Pages nodes to create, there are some special complications one has to deal with, so on a first read-through, it is better to start with the main case.

In the first level, one faces two additional complications that are not present in the main case. The first is that the child nodes are Page nodes rather than Pages nodes; this means one cannot move them up to the forrest list being constructed. The second complication is that the parents of the Page nodes were fixed before all of their children had been created. This requires some special handling of the last node: if it does not have a full set of children, then it will have to be moved up in a slightly unconventional manner.

   set limit $A(Pages/last)
   if {[llength $A(Pages/$limit)]/2 >= $A(Pages/arity)} then {
      set saturated 0
   } else {
      set d [expr {[llength $A(Pages/$limit)]/2}]
      set L [list]
      while {$d < $A(Pages/arity) && [incr limit -1]>=1} {
         set L [linsert $L 0 $limit]
         incr d
      }
      set last $A(Pages/last)
      set A(Pages/$last)\
        [concat [make_pages_nodes $F $L $last] $A(Pages/$last)]
      incr limit -1
      set saturated 1
   }
   set forrest [list]
   set L [list]
   for {set n 1} {$n <= $limit} {incr n} {
      lappend L $n
      if {[llength $L] >= $A(Pages/arity)} then {
         lappend forrest [make_pages_nodes $F $L]
         set L [list]
      }
   }
   if {[llength $L]} then {eval [list lappend forrest] $L}
   if {$saturated} then {lappend forrest $last}

In the main case, the numbers of those Pages nodes that have not yet been given a parent are kept in the forrest list. The basic approach is to build the next level starting from the left of this list, assigning as many nodes as allowed to each new parent node that is created.

The first complication is that the length of forrest need not be divisible by the specified tree arity. In this case, some number of nodes (those that are in L below when the entire forrest has been processed) are simply moved up to the next level. This leads, however, to the next complication: if the last node in forrest is saturated, then it may not be moved up. The limit variable is used for reserving those nodes that will be made siblings of this last node.

   while {[llength $forrest] >= $A(Pages/arity)} {
      set newforrest [list]
      set limit\
        [expr {[llength $forrest] - ($saturated ? $A(Pages/arity) : 0)}]
      set L [list]
      foreach n $forrest {
         lappend L $n
         if {[llength $L] >= $A(Pages/arity)} then {
            lappend newforrest [make_pages_nodes $F $L]
            set L [list]
         }

         if {[incr limit -1] <= 0} then {break}
      }

      if {[llength $L]} then {eval [list lappend newforrest] $L}
      if {$saturated} then {
         lappend newforrest [make_pages_nodes $F [lrange $forrest\
           [format end-%d [expr {$A(Pages/arity)-1}]] end]]
      } elseif {[llength $L]} then {
         set saturated 1
      }
      set forrest $newforrest
   }

Here starts the endgame. The root node is special in that it has no parent but may receive many additional attributes.

   if {[llength $forrest] > 1} then {
      set root [make_pages_nodes $F $forrest]
   } else {
      set root [lindex $forrest 0]
   }
   set count 0
   set kids [list array_obj]
   foreach {label c} $A(Pages/$root) {
      lappend kids [obj_ref $F $label]
      incr count $c
   }
   set res $A(Pages/prefix)$root
   set Attr(/Count) [int_obj $count]
   set Attr(/Kids) [eval $kids]
   set Attr(/Type) /Pages
   put_obj $F $res [eval [list dict_obj] [array get Attr]]
   return $res
}

Although the above algorithm generates balanced trees of minimal size (minimal number of nodes), it does not always generate trees of minimal height; the height may be one more than the minimum. What decides this is, surprisingly enough, a kind of odd/even phenomenon: the remainder class of the total number of pages modulo the arity minus one! If one is lucky with this, the tree height attains the minimum, and if one is unlucky, it comes out one larger than the possible minimum.

The reason that the arity minus one turns up is that every node reduces the number of nodes without a parent by precisely one less than the number of children of that node. The algorithm keeps assigning the maximal number of children to each node, until the level is so small that all nodes can be made children of the root node. The catch is that the number of children of the root node is decided by the remainder class of the total number of pages modulo the arity minus one, and this may turn out to be too small to fit in the necessary number of pages unless the tree height is allowed to exceed the theoretical minimum.

Experiments indicate that by placing the node with the least number of children at the first level instead, it is always possible to fit the tree within the minimal height (while keeping balance and minimal size), but this is a bit tricky to do when one does not know the final number of pages from the start, and therefore the simpler algorithm above was chosen instead.
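The counting argument above can be turned into two small lower bounds. The following is a sketch only, with hypothetical helper names that are not part of the package, and it assumes at least two pages and an arity of at least two. Since a node with k children removes k − 1 parentless nodes, and the tree must end with a single parentless root, n pages need at least ⌈(n − 1)/(arity − 1)⌉ Pages nodes. The height must be at least the smallest h for which arity^h is not less than n.

```tcl
# Hypothetical helpers, for illustration only.

# Lower bound on the number of Pages nodes: each node with at most
# $arity children removes at most $arity-1 parentless nodes.
proc min_pages_nodes {n arity} {
    expr {($n - 1 + $arity - 2) / ($arity - 1)}
}

# Lower bound on the height: smallest h with arity**h >= n.
proc min_height {n arity} {
    set h 0
    set capacity 1
    while {$capacity < $n} {
        set capacity [expr {$capacity * $arity}]
        incr h
    }
    return $h
}

puts [min_pages_nodes 19 5]   ;# prints: 5
puts [min_height 19 5]        ;# prints: 2
```

According to the discussion above, the algorithm meets the first bound, whereas the height it produces may exceed the second bound by one.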

4.2 Lengths and rectangles<br />

The default “user space” coordinate system in a PDF file, which is also the coordinate system used for e.g. links and destinations, uses the PostScript (or “big”) point as length unit. Since this is not the unit which most people are most comfortable with, it is useful to provide conversion from other units.

pdf::unit_factor (array)

The unit_factor array is indexed by names of length units. Its entries are the lengths of these units in terms of PostScript points. The conversion factors are those of TeX [3, Ch. 10].

namespace eval pdf {
   set unit_factor(bp) 1.0
   set unit_factor(in) 72.0
   set unit_factor(pt) [expr {$unit_factor(in) / 72.27}]
   set unit_factor(pc) [expr {$unit_factor(pt) * 12}]
   set unit_factor(cm) [expr {$unit_factor(in) / 2.54}]
   set unit_factor(mm) [expr {$unit_factor(in) / 25.4}]
   set unit_factor(dd) [expr {$unit_factor(pt) * 1238 / 1157}]
   set unit_factor(cc) [expr {$unit_factor(dd) * 12}]
}

Additional units could be added, if need be. For example, in a context where the size of a screen pixel can be determined (and this size is unique, i.e., Tk is not operating against multiple screens with possibly different resolutions), it may be convenient to define a px or pixel entry for this unit.

pdf::length (proc)

This procedure handles conversion from a physical unit to PDF units. The syntax is

pdf::length {value} {unit}

where {unit} is a unit that has an entry in the unit_factor array and {value} is the numeric value in that unit.

proc pdf::length {value unit} {
   variable unit_factor
   return [expr {$value * $unit_factor($unit)}]
}
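As a worked example of the conversion, the following self-contained sketch repeats a few of the factors in a separate demo namespace (a name chosen here for illustration, so that the snippet runs without the package) and converts the A4 paper dimensions to PostScript points.

```tcl
# Stand-alone copy of part of the conversion table; the namespace
# name "demo" is an assumption made for this example.
namespace eval demo {
    variable unit_factor
    set unit_factor(bp) 1.0
    set unit_factor(in) 72.0
    set unit_factor(pt) [expr {$unit_factor(in) / 72.27}]
    set unit_factor(mm) [expr {$unit_factor(in) / 25.4}]
}

# Same computation as pdf::length: value times points-per-unit.
proc demo::length {value unit} {
    variable unit_factor
    expr {$value * $unit_factor($unit)}
}

puts [format %.2f [demo::length 210 mm]]   ;# prints: 595.28
puts [format %.2f [demo::length 297 mm]]   ;# prints: 841.89
```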

pdf::length_obj (proc)

This procedure combines the unit conversion of the length procedure with the formatting of real_obj. The syntax is

pdf::length_obj {value} {unit} {precision} ?

where {unit} is a unit that has an entry in the unit_factor array, {value} is the numeric value in that unit, and {precision} is as for real_obj.

proc pdf::length_obj {value unit args} {
   if {[llength $args]==0} then {
      real_obj [length $value $unit]
   } elseif {[llength $args]==1} then {
      real_obj [length $value $unit] [lindex $args 0]
   } else {
      error "Too many arguments."
   }
}

A data structure that is common in PDF documents is the rectangle. Below are some commands for operating on these in the form of a four element list

{left} {bottom} {right} {top}

pdf::rect_obj (proc)

This procedure returns the PDF object (a PDF array) for a rectangle. The syntax is

pdf::rect_obj {rectangle}

and the rectangle coordinates are encoded using real_obj with the default precision.

proc pdf::rect_obj {R} {
   array_obj [real_obj [lindex $R 0]] [real_obj [lindex $R 1]]\
     [real_obj [lindex $R 2]] [real_obj [lindex $R 3]]
}

pdf::int_rect_obj (proc)

This procedure returns the PDF object (a PDF array) for a rectangle, after having rounded its coordinates to integers. The syntax is

pdf::int_rect_obj {rectangle}

and the rectangle coordinates are encoded using int_obj.

proc pdf::int_rect_obj {R} {
   array_obj [int_obj [expr {round([lindex $R 0])}]]\
     [int_obj [expr {round([lindex $R 1])}]]\
     [int_obj [expr {round([lindex $R 2])}]]\
     [int_obj [expr {round([lindex $R 3])}]]
}

pdf::make_rect (proc)

The make_rect procedure is a generic tool for making rectangles with specified dimensions. The syntax is

pdf::make_rect {option} {value} {unit} ? +

where {option} is one of the following:

-width     Distance from left to right
-height    Distance from bottom to top
-left      left
-right     right
-top       top
-bottom    bottom
-ll        {left bottom}
-lr        {right bottom}
-ul        {left top}
-ur        {right top}
-center    midpoint
-midx      x-coordinate of midpoint
-midy      y-coordinate of midpoint

The way it works is that the list of options is processed left to right, every option contributes some information about the wanted rectangle, and when all four coordinates are known the rectangle is returned. The {value} is, depending on the option, either a number or a point (list of two numbers). The {unit} is the unit of the {value}; it defaults to bp if omitted.

In the first processing step, horizontal and vertical information is separated and values are converted to bp units. Information is collected in two arrays X and Y, where the entries have the following meanings:

lo     low coordinate (left or bottom)
hi     high coordinate (right or top)
mid    midpoint coordinate
sz     size (width or height)

proc pdf::make_rect {args} {
   variable unit_factor
   lappend args -break
   set i 0
   foreach a $args {
      if {[array size X]>=2 && [array size Y]>=2} then {break}
      if {$i == 0} then {
         set option $a
      } elseif {$i == 1} then {
         set value $a
      } else {
         if {[info exists unit_factor($a)]} then {
            set factor $unit_factor($a)
         } else {
            set i 0
            set factor 1.0
         }
         switch -- $option {
            -width {set X(sz) [expr {$value * $factor}]}
            -height {set Y(sz) [expr {$value * $factor}]}
            -left {set X(lo) [expr {$value * $factor}]}
            -right {set X(hi) [expr {$value * $factor}]}
            -bottom {set Y(lo) [expr {$value * $factor}]}
            -top {set Y(hi) [expr {$value * $factor}]}
            -midx {set X(mid) [expr {$value * $factor}]}
            -midy {set Y(mid) [expr {$value * $factor}]}
            -center {
               set X(mid) [expr {[lindex $value 0] * $factor}]
               set Y(mid) [expr {[lindex $value 1] * $factor}]
            }
            -ll {
               set X(lo) [expr {[lindex $value 0] * $factor}]
               set Y(lo) [expr {[lindex $value 1] * $factor}]
            }
            -lr {
               set X(hi) [expr {[lindex $value 0] * $factor}]
               set Y(lo) [expr {[lindex $value 1] * $factor}]
            }
            -ul {
               set X(lo) [expr {[lindex $value 0] * $factor}]
               set Y(hi) [expr {[lindex $value 1] * $factor}]
            }
            -ur {
               set X(hi) [expr {[lindex $value 0] * $factor}]
               set Y(hi) [expr {[lindex $value 1] * $factor}]
            }
            -end {
               error "Insufficient information"
            }
            default {
               error "Unknown option: $option"
            }
         }
         if {$i == 0} then {
            set option $a
         } else {
            set i -1
         }
      }
      incr i
   }

In the second processing step, the two pieces of information that have been specified are used for computing the ones that are needed.

   if {[array size X] > 2} then {
      error "More than two horizontal data given."
   }
   if {[array size Y] > 2} then {
      error "More than two vertical data given."
   }
   foreach a {X Y} {
      switch -- [lsort [array names $a]] {lo sz} {
         set ${a}(hi) [expr {[set ${a}(lo)] + [set ${a}(sz)]}]
      } {hi sz} {
         set ${a}(lo) [expr {[set ${a}(hi)] - [set ${a}(sz)]}]
      } {lo mid} {
         set ${a}(hi) [expr {2*[set ${a}(mid)] - [set ${a}(lo)]}]
      } {hi mid} {
         set ${a}(lo) [expr {2*[set ${a}(mid)] - [set ${a}(hi)]}]
      } {mid sz} {
         set ${a}(lo) [expr {[set ${a}(mid)] - 0.5*[set ${a}(sz)]}]
         set ${a}(hi) [expr {[set ${a}(mid)] + 0.5*[set ${a}(sz)]}]
      }
   }
   return [list $X(lo) $Y(lo) $X(hi) $Y(hi)]
}
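The second processing step treats the two axes identically, so it can be illustrated for a single axis in isolation. The following is a minimal sketch rather than the package's code: solve_axis is a hypothetical helper that takes a key/value list with exactly two of lo, hi, mid, and sz, and returns the pair of low and high coordinates using the same formulas as above.

```tcl
# Hypothetical helper, for illustration only: given two of
# lo, hi, mid, sz for one axis, compute {lo hi}.
proc solve_axis {known} {
    array set V $known
    if {[array size V] != 2} {
        error "Need exactly two of lo, hi, mid, sz"
    }
    # The sorted key names identify which formula applies.
    switch -- [lsort [array names V]] {
        {hi lo}  {}
        {lo sz}  {set V(hi) [expr {$V(lo) + $V(sz)}]}
        {hi sz}  {set V(lo) [expr {$V(hi) - $V(sz)}]}
        {lo mid} {set V(hi) [expr {2*$V(mid) - $V(lo)}]}
        {hi mid} {set V(lo) [expr {2*$V(mid) - $V(hi)}]}
        {mid sz} {
            set V(lo) [expr {$V(mid) - 0.5*$V(sz)}]
            set V(hi) [expr {$V(mid) + 0.5*$V(sz)}]
        }
    }
    list $V(lo) $V(hi)
}

puts [solve_axis {lo 0 sz 210}]     ;# prints: 0 210
puts [solve_axis {mid 50 sz 20}]    ;# prints: 40.0 60.0
```

Sorting the key names first means that each pair of inputs needs only one switch pattern, regardless of the order in which the options were given.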

pdf::standard_rect (proc)

The standard_rect procedure exchanges high and low coordinates of a rectangle as needed to ensure that height and width are non-negative. The syntax is

pdf::standard_rect {rect}

and the return value is the standardized rectangle.

proc pdf::standard_rect {R} {
   foreach {l b r t} $R {break}
   if {$l > $r} then {foreach {l r} [list $r $l] {break}}
   if {$b > $t} then {foreach {b t} [list $t $b] {break}}
   return [list $l $b $r $t]
}

pdf::inset_rect (proc)

The inset_rect procedure moves the sides of a rectangle by specified lengths. There are three syntaxes

pdf::inset_rect {rect} {amount} {unit}
pdf::inset_rect {rect} {dx} {dy} {unit}
pdf::inset_rect {rect} {dl} {db} {dr} {dt} {unit}

where {rect} is the rectangle to inset and {unit} is the length unit in which the inset amount is specified. Positive amounts make the rectangle smaller, negative amounts make it larger. The result is the new rectangle.

In the first form, all sides are moved by the same {amount}. In the second form, the left and right sides are moved by {dx} and the top and bottom sides are moved by {dy}. In the third form, the left, bottom, right, and top sides are moved by {dl}, {db}, {dr}, and {dt} respectively.

proc pdf::inset_rect {R args} {
   if {[llength $args] != 2 && [llength $args] != 3 && [llength $args]\
     != 5} then {
      error "Wrong number of arguments"
   }
   set factor [length 1 [lindex $args end]]
   set args [lrange $args 0 end-1]
   set D [lrange [concat $args $args $args $args] 0 3]
   set res [list]
   foreach a $R da $D sign {1 1 -1 -1} {
      lappend res [expr {$a + $da*$factor*$sign}]
   }
   return $res
}
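The way a single list of amounts serves all three syntaxes may be easier to see in a reduced form. The following sketch uses a hypothetical helper, inset_bp, which drops the unit argument and works directly in bp. Repeating the amounts four times and keeping the first four elements turns one amount into {a a a a}, two amounts into {dx dy dx dy}, and leaves four amounts unchanged.

```tcl
# Hypothetical helper, for illustration only: inset a rectangle
# {left bottom right top} by 1, 2, or 4 amounts given in bp.
proc inset_bp {R args} {
    if {[lsearch -exact {1 2 4} [llength $args]] < 0} {
        error "Expected 1, 2, or 4 inset amounts"
    }
    # Cycle the amounts to get one per side: left bottom right top.
    set D [lrange [concat $args $args $args $args] 0 3]
    set res [list]
    # Left and bottom move in the positive direction,
    # right and top in the negative direction.
    foreach a $R da $D sign {1 1 -1 -1} {
        lappend res [expr {$a + $da*$sign}]
    }
    return $res
}

puts [inset_bp {0 0 100 200} 10]         ;# prints: 10 10 90 190
puts [inset_bp {0 0 100 200} 10 20]      ;# prints: 10 20 90 180
puts [inset_bp {0 0 100 200} 1 2 3 4]    ;# prints: 1 2 97 196
```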

pdf::offset_rect (proc)

The offset_rect procedure moves a rectangle in the plane, but preserves its width and height. The syntax is

pdf::offset_rect {rect} {dx} {dy} {unit} ?

where {rect} is the rectangle, {dx} and {dy} are the horizontal and vertical displacement amounts, and {unit} is the unit (which defaults to bp) of these amounts. The return value is the offset rectangle.

proc pdf::offset_rect {R dx dy {unit bp}} {
   set factor [length 1 $unit]
   set res [list]
   foreach {x y} $R {
      lappend res [expr {$x + $factor*$dx}] [expr {$y + $factor*$dy}]
   }
   return $res
}

pdf::wh_rect (proc)

This procedure returns the list

{left} {bottom} {width} {height}

that corresponds to a rectangle. The syntax is

pdf::wh_rect {rect}

This procedure may be used to convert a rectangle to the list of operands required by the re PDF operator.

proc pdf::wh_rect {rect} {
   list [lindex $rect 0] [lindex $rect 1]\
     [expr {[lindex $rect 2] - [lindex $rect 0]}]\
     [expr {[lindex $rect 3] - [lindex $rect 1]}]
}

4.3 Paper sizes<br />

pdf::paper_rect (array)

It is convenient to have some standard paper sizes readily available as rectangles. The paper_rect array is initialised with a couple of these.

namespace eval pdf {
   set paper_rect(A4) [make_rect -ll {0 0} -width 210 mm -height 297 mm]
   set paper_rect(A4R)\
     [make_rect -ll {0 0} -width 297 mm -height 210 mm]
   set paper_rect(letter)\
     [make_rect -ll {0 0} -width 8.5 in -height 11 in]
   set paper_rect(legal)\
     [make_rect -ll {0 0} -width 8.5 in -height 14 in]
}
〈/pkg〉

4.4 A multi-page example<br />

The purpose of the following is mainly to generate a multipage document to test the page tree generation. Hence the actual document length (in pages) is factored out as a parameter set in the first line.

〈∗example3〉
set document_pages 19
set F [pdf::rewrite_pdf {pages.pdf}]

The next couple of lines determine the page layout. The rectangle paper determines the page size. Every page contains the rectangle frame as a graphic. foot_x and foot_y are coordinates for the page foot.

set paper $pdf::paper_rect(A4)
set frame [pdf::inset_rect $paper 41 60 41 30 mm]
set foot_y [expr {[lindex $frame 1] - [pdf::length 36 pt]}]
set foot_x [expr {0.5*[lindex $frame 0] + 0.5*[lindex $frame 2]}]

This is preparation for writing the page numbers. First, a font is needed. Second, I want the page numbers to be centered. This means I need to measure the width of the string to show before showing it. Luckily the digits in Times-Roman are all half an em wide. The 0.25*$size is thus half the width of a digit.

pdf::put_obj $F "Times" [pdf::dict_obj /Type /Font /Subtype /Type1\
  /Name /F1 /BaseFont /Times-Roman /Encoding /MacRomanEncoding]
proc put_page_no {F num} {
   global foot_x foot_y
   pdf::name_resource Times $F Font [pdf::obj_ref $F "Times"]
   set size [pdf::length 10 pt]
   set thepage [format %d $num]
   pdf::printf $F {BT %o %r Tf 1 0 0 1 %r2 Tm %s Tj ET} $Times $size\
     [expr {$foot_x - 0.25*$size*[string length $thepage]}] $foot_y\
     $thepage
}
pdf::begin_pages $F "Pages\#" /MediaBox [pdf::rect_obj $paper]
array unset Rez

892 for {set page 1} {$page


pdf::file〈num〉(Outlines/prefix)
pdf::file〈num〉(Outlines/last)
pdf::file〈num〉(Outlines/stack)
pdf::file〈num〉(Outline/〈string〉)
pdf::file〈num〉(Outline/parent)
pdf::file〈num〉(Outline/first)
pdf::file〈num〉(Outline/last)
pdf::file〈num〉(Outline/count)
pdf::file〈num〉(Outline/prev)
pdf::file〈num〉(Outline/open)

pdf::put_obj $F "The catalog" [pdf::dict_obj\
  /Type /Catalog\
  /Pages [pdf::obj_ref $F $Pages]]
pdf::close_pdf $F "The catalog"
〈/example3〉

5 Document outl<strong>in</strong>e<br />

The “outline” of a PDF document is the table of contents that one often sees in a separate pane next to the pane actually showing some page of the document. The procedures below handle building the data structure encoding this, while leaving it to the user to provide the links to actual document content.

5.1 Low-level stuff

As with the Pages tree, building an outline tree involves automatically creating nodes for the tree. (This node creation could have been made explicit, but there doesn’t seem to be much point in that.) To prevent the labels of these nodes from clashing with the labels of other objects, each outline node label begins with a special prefix, which is stored in the Outlines/prefix entry of the file array. The rest of the label is a decimal number which is assigned sequentially. The most recently assigned number is kept in the Outlines/last entry.

The information kept track of while building an outline tree belongs to one of two scopes. Things that are relevant only to the current position in the tree are kept in Outline/〈string〉 entries, whereas things that are more generally relevant are kept in Outlines/〈string〉 entries (note the extra s). There is a stack in the Outlines/stack entry onto which the current position can be pushed and later popped off. This stack is a list where the last element is topmost. The elements themselves are the results of an array get for all Outline/〈string〉 entries in the file array.
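The push and pop operations on Outlines/stack can be tried out on a plain Tcl array. The following is a minimal sketch and not part of the package: the array A and the values stored in it are made up for illustration, and only the array get, array unset, and array set idioms are taken from the description above.

```tcl
# Hypothetical stand-in for a pdf::file<num> array; only the outline keys matter.
array set A {
    Outlines/prefix TOC#
    Outlines/stack  {}
    Outline/parent  1
    Outline/count   2
    Outline/last    3
}

# Push: save every Outline/* entry (no Outlines/* entry matches the pattern)
# as one stack element, then start a fresh state below node 3.
lappend A(Outlines/stack) [array get A Outline/*]
array unset A Outline/*
set A(Outline/parent) 3
set A(Outline/count) 0

# Pop: discard the inner state and restore the topmost saved one.
array unset A Outline/*
array set A [lindex $A(Outlines/stack) end]
set A(Outlines/stack) [lreplace $A(Outlines/stack) end end]

puts [lsort [array names A Outline/*]]
```

Since the glob pattern Outline/* requires a slash directly after "Outline", the Outlines/prefix and Outlines/stack entries survive both the push and the pop untouched.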

The current state of the tree construction is, in a sense, located slightly below the level where nodes are being added. The links between this level and its parent are stored in the Outline/parent, Outline/first, and Outline/last entries in the file array. All three are numbers which when appended to the prefix produce the node labels.

Outline/parent is the parent node, Outline/first is the first child of the parent, and Outline/last is the (currently) last child of the parent.

Outline/count is the number of children of the parent, including any children of open child nodes. If this is zero then the current level in the outline hierarchy is empty, which amongst other things implies that Outline/first and Outline/last have not been initialised.

Outline/prev is, if it is set, the number of the predecessor (in the same level) of the node currently being constructed. It should not be set when the node is the first node on that level.


Outline/open is a boolean for whether the current node should be open (i.e., whether its children, if it gets any, will be visible by default).

Explicit PDF objects for outline dictionaries are also stored in Outline/ entries of the file array. In this case, the index suffix is the PDF name object for the dictionary key, which gives indices of the form Outline//〈name〉.

The put_outline_node procedure outputs the current node of an outline to file. The syntax is

pdf::put_outline_node {file} {option} {value} ∗

where {file} is the identifier of the PDF file. An {option} {value} is a pair of PDF objects, where the first is a name object. These objects will be placed in the PDF dictionary object for this node, possibly overriding a previously specified pair with the same {option}. There is no particular return value.

The procedure clears the Outline//〈name〉 part of the file array. It does not generate any /First, /Last, or /Count items. It does not increment Outline/count, because that is the responsibility of the procedure that allocated a node number for this node.

〈∗pkg〉
proc pdf::put_outline_node {F args} {
    upvar #0 [namespace current]::$F A
    # Collect the stored dictionary entries; index 8 onwards of an
    # Outline//〈name〉 index is the /〈name〉 key.
    foreach name [array names A Outline//*] {
        set N([string range $name 8 end]) $A($name)
    }
    # Explicit arguments override stored entries.
    foreach {name value} $args {
        if {[string match /* $name]} then {
            set N($name) $value
        } else {
            error "Bad option '$name'"
        }
    }
    if {[info exists A(Outline/prev)]} then {
        set N(/Prev) [obj_ref $F $A(Outlines/prefix)$A(Outline/prev)]
    }
    set N(/Parent) [obj_ref $F $A(Outlines/prefix)$A(Outline/parent)]
    put_obj $F $A(Outlines/prefix)$A(Outline/last) [
        eval [linsert [array get N] 0 dict_obj]
    ]
    array unset A Outline//*
}
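The way stored entries and explicit arguments are merged into one dictionary can be seen in isolation. The following is a minimal sketch under stated assumptions: merge_node_dict is a hypothetical helper that does not exist in the package, it returns the merged key/value pairs instead of writing a PDF object, and the entries in A are made up.

```tcl
# Hypothetical helper mirroring how put_outline_node assembles its dictionary.
proc merge_node_dict {arrName args} {
    upvar 1 $arrName A
    foreach name [array names A Outline//*] {
        # "Outline/" is 8 characters long, so index 8 onwards is the name object.
        set N([string range $name 8 end]) $A($name)
    }
    foreach {name value} $args {
        if {![string match /* $name]} {
            error "Bad option '$name'"
        }
        set N($name) $value    ;# explicit arguments win over stored entries
    }
    return [array get N]
}

# Made-up entries for illustration.
array set A {Outline//Title (Intro) Outline//Dest /old Outline/count 1}
array set R [merge_node_dict A /Dest /new]
puts "$R(/Title) $R(/Dest)"
```

The Outline/count entry is not picked up, because the pattern Outline//* requires two slashes; this is what keeps the bookkeeping entries apart from the dictionary entries in the same array.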

The outline_node_set procedure sets fields for the current outline node. The syntax is one of

pdf::outline_node_set {file} {args}

pdf::outline_node_set {file} {option} {value} ∗

where {file} is the identifier of the PDF file. The first form is merely a variant on the second form, where the {args} is treated as a list of the arguments that should have followed the {file}.

An {option} {value} is either a pair of PDF objects, where the first is a name object, or

-open {boolean}

The -open option sets the open state of the current node. Other options set an entry in the dictionary object for the current node. There is no particular return value.

proc pdf::outline_node_set {F args} {
    upvar #0 [namespace current]::$F A
    if {[llength $args] == 1} then {set args [lindex $args 0]}
    foreach {option value} $args {
        switch -glob -- $option /* {
            set A(Outline/$option) $value
        } -open {
            set A(Outline/open) $value
        } default {
            error "Bad option '$option'"
        }
    }
}
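The two calling forms rest on a small idiom: a lone argument is unpacked as a list. The following is a minimal sketch of that idiom only; option_pairs is a hypothetical procedure, not part of the package, and the option values are made up.

```tcl
# Hypothetical procedure using the same argument convention as outline_node_set.
proc option_pairs {args} {
    # A single argument is taken to be a list of the real arguments.
    if {[llength $args] == 1} {set args [lindex $args 0]}
    set result {}
    foreach {option value} $args {
        lappend result "$option=$value"
    }
    return $result
}

puts [option_pairs -open 1 /Dest /x]
puts [option_pairs [list -open 1 /Dest /x]]
```

Both calls give the same result. The list form lets a calling procedure pass on its own args variable as a single word, without resorting to eval.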

The outline_item procedure creates a new outline node at the current level. If there already was a current outline node then that is output first. The syntax is

pdf::outline_item {file} {title} {option} {value} ∗

where {file} is the identifier of the PDF file and {title} is the title of the new outline node. An {option} {value} is either a pair of PDF objects, where the first is a name object, or

-open {boolean}

The -open option sets the open state of the new child node. The default for this option is 0. The PDF objects will be placed in the dictionary object for the new child node. There is no particular return value.

proc pdf::outline_item {F title args} {
    upvar #0 [namespace current]::$F A

The first step deals with whatever should hold the link to the new node, usually the previous current node, which will be output. This involves allocating a number for the new node and therefore also incrementing Outline/count.

    incr A(Outlines/last)
    if {$A(Outline/count)} then {
        put_outline_node $F /Next\
          [obj_ref $F $A(Outlines/prefix)$A(Outlines/last)]
        set A(Outline/prev) $A(Outline/last)
    } else {
        set A(Outline/first) $A(Outlines/last)
    }
    incr A(Outline/count)

The second step is merely some entry initialisation for the new node.

    set A(Outline/last) $A(Outlines/last)
    set A(Outline//Title) [text_obj $title]
    set A(Outline/open) 0
    outline_node_set $F $args
}
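The effect of these two steps on the sibling links can be followed in a small model. The following is a sketch under stated assumptions, not the package code: add_item, the state array S, and the nodes array are all hypothetical, node dictionaries are plain Tcl lists, and /Prev is recorded when the node is created rather than when it is output.

```tcl
# Hypothetical model of the numbering and linking done by outline_item.
proc add_item {stateVar nodesVar title} {
    upvar 1 $stateVar S $nodesVar nodes
    incr S(next)                        ;# allocate a number for the new node
    if {$S(count)} {
        # The previous current node is finished: it gets /Next,
        # and it becomes the /Prev of the new node.
        lappend nodes($S(last)) /Next $S(next)
        set S(prev) $S(last)
    } else {
        set S(first) $S(next)
    }
    incr S(count)
    set S(last) $S(next)
    set nodes($S(last)) [list /Title $title]
    if {[info exists S(prev)]} {
        lappend nodes($S(last)) /Prev $S(prev)
    }
}

array set S {next 1 count 0}            ;# number 1 is taken by the root
foreach title {A B C} {
    add_item S nodes $title
}
foreach n [lsort -integer [array names nodes]] {
    puts "$n: $nodes($n)"
}
```

A node can only receive its /Next link once its successor has been allocated a number, which is why the real procedure delays output of the current node until the next item arrives.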

The outline_begingroup procedure pushes the current state onto the stack and makes the current node the parent for the new current state. The syntax is

pdf::outline_begingroup {file} {option} {value} ∗

where {file} is the identifier of the PDF file. An {option} {value} is either a pair of PDF objects, where the first is a name object, or

-open {boolean}

The -open option sets the open state of the parent node. The default for this option is 0. The PDF objects will be placed in the dictionary object for the parent node. There is no particular return value.

proc pdf::outline_begingroup {F args} {
    upvar #0 [namespace current]::$F A
    if {!$A(Outline/count)} then {
        error "There is no current node to make the parent of a new\
          group."
    }
    outline_node_set $F $args
    lappend A(Outlines/stack) [array get A Outline/*]
    set parent $A(Outline/last)
    array unset A Outline/*
    set A(Outline/parent) $parent
    set A(Outline/count) 0
}

The outline_endgroup procedure ends the current level of outline nodes and pops one element off the stack, thus turning the current parent back into the current node, as it was before the matching outline_begingroup. The syntax is

pdf::outline_endgroup {file} {option} {value} ∗

where {file} is the identifier of the PDF file. An {option} {value} is either a pair of PDF objects, where the first is a name object, or

-open {boolean}

The -open option can be used to override the open/closed state of the node, and can thus control whether the level of outline items that was ended will be open by default. The PDF objects will be placed in the dictionary object for the node popped off the stack, i.e., the previous parent node.

proc pdf::outline_endgroup {F args} {
    upvar #0 [namespace current]::$F A
    set count $A(Outline/count)
    if {$count} then {
        put_outline_node $F
        lappend args /First [
            obj_ref $F $A(Outlines/prefix)$A(Outline/first)
        ] /Last [
            obj_ref $F $A(Outlines/prefix)$A(Outline/last)
        ]
    }
    array unset A Outline/*
    array set A [lindex $A(Outlines/stack) end]
    set A(Outlines/stack) [lreplace $A(Outlines/stack) end end]
    outline_node_set $F $args
    if {$count} then {
        if {$A(Outline/open)} then {
            set A(Outline//Count) [int_obj $count]
            incr A(Outline/count) $count
        } else {
            set A(Outline//Count) [int_obj [expr {-$count}]]
        }
    }
}
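The /Count bookkeeping at the end of outline_endgroup can be checked against a recursive computation on a small tree. The following is a sketch under stated assumptions: visible_count is a hypothetical procedure, not part of the package, and the tree, written as nested {title open children} lists, is made up.

```tcl
# Hypothetical model of the /Count rule: a node with children gets the number
# of entries below it that count as visible, negated if the node is closed.
# The return value is what the node contributes to its parent's count.
proc visible_count {node countsVar} {
    upvar 1 $countsVar counts
    foreach {title open children} $node break   ;# unpack the three fields
    set n 0
    foreach child $children {
        incr n [visible_count $child counts]
    }
    if {$n > 0} {
        set counts($title) [expr {$open ? $n : -$n}]
    }
    # The node itself always counts; what is below it counts only if it is open.
    return [expr {1 + ($open ? $n : 0)}]
}

set tree {Root 1 {
    {A 1 {{A1 0 {}} {A2 0 {}}}}
    {B 0 {{B1 0 {}}}}
    {C 0 {}}
}}
visible_count $tree counts
foreach title [lsort [array names counts]] {
    puts "$title: $counts($title)"
}
```

Here the open node A passes its two children on to the root, while the closed node B records -1 and passes nothing on, so the root ends up with 3 + 2 = 5. This matches the incr A(Outline/count) $count step, which is only taken for open nodes.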

5.2 An outline of headings

One of the most common models for document structuring is to have a family of commands which say “make a level n heading” and are supposed to be used at the beginning of each section/subsection/. . . in the document. This model is also useful for constructing a table of contents such as the outline.

The Outlines/levels entry of the file array is the list of heading levels nested around the current outline node, with the last element being the level of that node. The list is empty before the first node has been inserted. Apart from that situation, the list length should always be one greater than that of the Outlines/stack entry.

The outline_heading procedure adds a new heading to the document outline. The syntax is

pdf::outline_heading {file} {level} {title} {option} {value} ∗

where {file} is the identifier of the PDF file, {level} is the nominal level of this item, and {title} is the title. An {option} {value} is either a pair of PDF objects, where the first is a name object, or

-open {boolean}

The -open option controls whether this item will be open by default, i.e., if its subitems (if there will be any) should be shown. It defaults to false (closed). The PDF objects will be placed in the dictionary object for the new item. These are what one should use to specify a destination or equivalent for the outline item.

The {level} is relative, and can be an arbitrary string. The way it is used is that if {level} is greater than the current level, then a new level is begun. Else if {level} is greater than the previous level, the item is a sibling of the last item and the current level is updated. Otherwise the current level is ended and the issue is reexamined. This dynamically adapts to the set of {level}s actually used in a document, even if these are not consecutive. It also gracefully copes with inconsistencies such as forgetting some heading level at the beginning of a document.

There is no particular return value.

proc pdf::outline_heading {F level title args} {
    upvar #0 [namespace current]::$F A
    if\
      {[llength $A(Outlines/levels)] > [llength $A(Outlines/stack)] + 1}\
    then {
        set A(Outlines/levels)\
          [lrange $A(Outlines/levels) 0 [llength $A(Outlines/stack)]]
    }
    # The text between "$level" and "1" in this condition is damaged in
    # the source. What follows is an assumed reconstruction from the prose
    # description: end groups while {level} is not greater than the
    # previous level.
    while {
        $level <= [lindex $A(Outlines/levels) end-1] &&
        [llength $A(Outlines/levels)] > 1
    } {
        outline_endgroup $F
        set A(Outlines/levels) [lreplace $A(Outlines/levels) end end]
    }
    if {$A(Outline/count) && $level > [lindex $A(Outlines/levels) end]}\
    then {
        lappend A(Outlines/levels) $level
        outline_begingroup $F
    } else {
        set A(Outlines/levels)\
          [lreplace $A(Outlines/levels) end end $level]
    }
    eval [linsert $args 0 outline_item $F $title]
}
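The level-matching rules can be traced without producing any PDF objects. The following is a minimal sketch under stated assumptions: place_heading is a hypothetical procedure, not part of the package; it keeps only a list of levels and a nesting depth (standing in for the length of Outlines/stack), and its loop condition is this sketch's reading of the prose description of outline_heading.

```tcl
# Hypothetical model: returns the nesting depth at which a heading lands.
proc place_heading {levelsVar depthVar level} {
    upvar 1 $levelsVar levels $depthVar depth
    # End groups while the heading is not greater than the previous level.
    while {[llength $levels] > 1 && $level <= [lindex $levels end-1]} {
        set levels [lreplace $levels end end]
        incr depth -1
    }
    if {[llength $levels] == 0} {
        set levels [list $level]                      ;# very first heading
    } elseif {$level > [lindex $levels end]} {
        lappend levels $level                         ;# begin a new group
        incr depth
    } else {
        set levels [lreplace $levels end end $level]  ;# sibling of the last item
    }
    return $depth
}

set levels {}
set depth 0
set trace {}
foreach level {1 2 2 3 1 3 2} {
    lappend trace [place_heading levels depth $level]
}
puts $trace
```

The last two headings show the adaptive behaviour: the level 3 heading directly after a level 1 heading opens a group at depth 1, and the following level 2 heading becomes its sibling, after which the current level is recorded as 2.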

The begin_outline procedure initialises the outline system for a PDF file. The syntax is

pdf::begin_outline {file} {prefix}

where {file} is the identifier of the PDF file and {prefix} is a prefix that will be used for all labels for indirect objects that the outline system creates. There is no particular return value.

proc pdf::begin_outline {F prefix} {
    upvar #0 [namespace current]::$F A
    set A(Outlines/prefix) $prefix
    set A(Outlines/last) 1
    set A(Outlines/stack) [list]
    set A(Outlines/levels) [list]
    set A(Outline/parent) 1
    set A(Outline/count) 0
}

The end_outline procedure finishes off the outline tree for a PDF file and returns the label of the root node. The syntax is

pdf::end_outline {file}

where {file} is the identifier of the PDF file.

proc pdf::end_outline {F} {
    upvar #0 [namespace current]::$F A
    while {[llength $A(Outlines/stack)]} {
        outline_endgroup $F
    }
    # Output the pending node, if any; an empty outline has no
    # Outline/last entry to output.
    if {$A(Outline/count)} then {
        put_outline_node $F
    }
    set label "$A(Outlines/prefix)1"
    set call [list dict_obj /Type /Outlines]
    if {[info exists A(Outline/first)]} then {
        lappend call /First [
            obj_ref $F $A(Outlines/prefix)$A(Outline/first)
        ] /Last [
            obj_ref $F $A(Outlines/prefix)$A(Outline/last)
        ] /Count [int_obj $A(Outline/count)]
    }
    put_obj $F $label [eval $call]
    return $label
}
〈/pkg〉

5.3 An outline example

The purpose of the following is to test the outline generation. The structure, which is perhaps somewhat atypical, is to first generate all the document contents and then generate an outline with links into the document.

〈∗example4〉
set F [pdf::rewrite_pdf {outline.pdf}]

The idea for the page contents is that this should consist of the numbers 1–12, each rather large, on a page of its own, and in a different font.

pdf::begin_pages $F "Pages\#"\
  /MediaBox [pdf::rect_obj $pdf::paper_rect(A4)]
set page 1
foreach font {
    Times-Roman Helvetica Courier
    Times-Bold Helvetica-Bold Courier-Bold
    Times-Italic Helvetica-Oblique Courier-Oblique
    Times-BoldItalic Helvetica-BoldOblique Courier-BoldOblique
} {
    pdf::put_obj $F $font [pdf::dict_obj /Type /Font /Subtype /Type1\
      /BaseFont [pdf::name_obj $font] /Encoding /MacRomanEncoding]
    pdf::begin_contents "" $F "Page $page contents"
    pdf::name_resource fid $F Font [pdf::obj_ref $F $font]
    pdf::printf $F {BT %o %r Tf 1 0 0 1 %r2 Tm %s Tj ET} $fid\
      [pdf::length 10 cm] [pdf::length 5 cm] [pdf::length 10 cm] $page
    pdf::end_contents Rez $F
    pdf::shipout $F "Page $page" /Contents\
      [pdf::obj_ref $F "Page $page contents"] /Resources\
      [pdf::resource_dict_obj Rez]
    unset Rez
    incr page
}
set Pages [pdf::end_pages $F]
pdf::begin_outline $F "TOC\#"
pdf::outline_heading $F 1 "Numeric" /Dest [
    pdf::array_obj [pdf::obj_ref $F "Page 1"] /Fit
]

1079 for {set page 1} {$page


        pdf::array_obj [pdf::obj_ref $F "Page $page"] /XYZ\
          [pdf::null_obj] [pdf::null_obj] [pdf::real_obj $page]
    ]
    incr page
}
pdf::outline_heading $F 1 "Russian" /Dest [
    pdf::array_obj [pdf::obj_ref $F "Page 1"] /FitV\
      [pdf::length_obj 5 cm]
]
set page 1
foreach {Ruslish name} {
    Odin \u041E\u0434\u0438\u043D
    Dva \u0414\u0432\u0430
    Tri \u0422\u0440\u0438
    !Cetyre \u0427\u0435\u0442\u044B\u0440\u0435
    P!ath \u041F\u044F\u0442\u044C
    !Sesth \u0428\u0435\u0441\u0442\u044C
    Semh \u0421\u0435\u043C\u044C
    Vosemh \u0412\u043E\u0441\u0435\u043C\u044C
    Dev!ath \u0414\u0435\u0432\u044F\u0442\u044C
    Des!ath \u0414\u0435\u0441\u044F\u0442\u044C
    Odinnadcath
    \u041E\u0434\u0438\u043D\u043D\u0430\u0434\u0446\u0430\u0442\u044C
    Dvenadcath
    \u0414\u0432\u0435\u043D\u0430\u0434\u0446\u0430\u0442\u044C
} {
    pdf::outline_heading $F 2 $name /Dest [
        pdf::array_obj [pdf::obj_ref $F "Page $page"] /XYZ\
          [pdf::null_obj] [pdf::null_obj] [pdf::null_obj]
    ]
    incr page
}
set outline [pdf::end_outline $F]
pdf::put_obj $F "The catalog" [pdf::dict_obj\
  /Type /Catalog\
  /Pages [pdf::obj_ref $F $Pages]\
  /PageMode /UseOutlines\
  /Outlines [pdf::obj_ref $F $outline]]
pdf::close_pdf $F "The catalog"
〈/example4〉


