A basic PDF writer in Tcl
Lars Hellström
February 3, 2005
Abstract
This file contains some basic routines that allow a Tcl script to write PDF files.
Contents
1 Usage
1.1 File structure
1.2 Direct objects
1.3 Streams
1.4 Pages
1.5 Outline
1.6 Contents
1.7 Rectangles
2 PDF files and objects
2.1 Building objects
2.2 File structure
2.3 Hello World
3 Contents and resources
3.1 Resources representation
3.2 Formatting content
3.3 Hello again, World
4 Document pages
4.1 The tree of pages
4.2 Lengths and rectangles
4.3 Paper sizes
4.4 A multi-page example
5 Document outline
5.1 Low-level stuff
5.2 An outline of headings
5.3 An outline example
1 Usage
The aim of the basic pdf package is to simplify the generation of well-formed PDF files. Programmers who intend to make use of it should first familiarize themselves with the actual PDF format specification, as it is not the aim of the basic pdf package¹ to substitute anything else for the raw expressive power of the PDF format. Newcomers should find [2] (version 1.5 of the PDF specification) a good reference and introduction to the details of the PDF format.
1.1 File structure
A PDF file is basically a (sometimes huge) data structure consisting of a myriad of objects (which are quite comparable to Tcl Objs, i.e., to Tcl values, although PDF objects have types). An object can be direct (encoded at the position where it is used) or indirect (encoded somewhere else in the file and referenced by number). The absolute positions in the file of all indirect objects have to be given in a cross-reference table at the end of the file, and getting this right is the first obstacle to generating a well-formed PDF file.
The pdf package provides a model where indirect objects can be assigned arbitrary strings as labels. Actual object numbers are allocated as needed, and the positions needed for the cross-reference table are recorded. The two basic commands for dealing with indirect objects are
pdf::obj_ref {file} {reference label}
pdf::put_obj {file} {reference label} {object}
put_obj writes a PDF object to a file (thus making it available as an indirect object in that file), whereas obj_ref returns PDF code for a reference to an indirect object. obj_ref may occur before as well as after the put_obj for the object it refers to.
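One way to picture the label model is as a table that hands out object numbers on first use. The block below is a hypothetical sketch in plain Tcl, not code from the pdf package. The names sketch::label_number and sketch::obj_ref are made up, and unlike the real obj_ref the sketch keeps one table for everything rather than one per file. A number is allocated the first time a label is seen, by whichever command sees it first, so a reference can be produced before the object itself is written.

```tcl
# Hypothetical sketch, not the pdf package's code: labels are mapped to
# object numbers on first use, so a reference may precede the object.
namespace eval sketch {
    variable objnum [dict create]   ;# label -> object number
    variable nextnum 1              ;# next unallocated object number
}

# Return the object number for a label, allocating one if necessary.
proc sketch::label_number {label} {
    variable objnum
    variable nextnum
    if {![dict exists $objnum $label]} {
        dict set objnum $label $nextnum
        incr nextnum
    }
    return [dict get $objnum $label]
}

# PDF code for an indirect reference, e.g. "1 0 R".
proc sketch::obj_ref {label} {
    return "[label_number $label] 0 R"
}

puts [sketch::obj_ref "Page 1"]   ;# 1 0 R (first use allocates 1)
puts [sketch::obj_ref "Catalog"]  ;# 2 0 R
puts [sketch::obj_ref "Page 1"]   ;# 1 0 R again
```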
Open PDF files are referenced via the usual Tcl channel identifier. To open a file for the purpose of creating a new PDF document, one uses
pdf::rewrite_pdf {file name} 〈options〉
which returns the identifier of the new file. The 〈options〉 are zero or more of
-permissions {integer}
-header {string}
The permissions are the default permissions for the file in question; if this option is not given, then no permissions value is passed to open. The header is a string that will be put first in the file (as its header). (The default header string declares the PDF version to be 1.3 [1], which is a good compromise between supporting old PDF consumers and providing PDF features.)
The command used to close a PDF file should be
pdf::close_pdf {file} {catalog label} {key} {value} ∗
since this is what outputs the cross-reference table and trailer to the file before it is closed. {catalog label} is the label of the /Catalog object for the document. The {key} {value} arguments are PDF objects which will be inserted into the file’s trailer dictionary. Each {key} must be a name object, and each {value} the corresponding value. (The /Size and /Root entries in this dictionary are generated automatically, so it is perfectly OK to give only two arguments to close_pdf.)
The PDF specification describes how to make updates to an existing PDF document, but the pdf package currently offers no support for that. Should such support be added in the future, then one would use some command other than rewrite_pdf to open the file for modifications.
¹ But it is a likely aim of add-on packages.
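To make concrete what close_pdf has to get right, here is a hypothetical sketch, not the package's implementation. It assembles a complete file in memory and records each object's byte offset as the object is appended. Those offsets form the cross-reference table, which is followed by the trailer and the startxref pointer. The procedure name sketch::build_pdf and the convention that object 1 is the /Catalog are assumptions of the sketch. The pdf package itself writes to a channel and works with reference labels rather than list positions.

```tcl
# Hypothetical sketch, not the pdf package's code: what close_pdf has
# to produce. The file is built in memory from ASCII text, so a string
# length is also a byte offset.
namespace eval sketch {}

# objects: list of object bodies; object number = position in list + 1.
# Object 1 is assumed to be the /Catalog.
proc sketch::build_pdf {objects} {
    set out "%PDF-1.3\n"
    set offsets {}
    set num 0
    foreach body $objects {
        incr num
        lappend offsets [string length $out]   ;# where "N 0 obj" starts
        append out "$num 0 obj\n$body\nendobj\n"
    }
    set xref [string length $out]
    set size [expr {$num + 1}]                 ;# objects 0 .. num
    append out "xref\n0 $size\n"
    # Each entry is exactly 20 bytes: 10-digit offset, space, 5-digit
    # generation, space, n or f, and a two-byte end of line (space + LF).
    append out "0000000000 65535 f \n"         ;# object 0 heads the free list
    foreach off $offsets {
        append out [format "%010d 00000 n \n" $off]
    }
    append out "trailer\n<< /Size $size /Root 1 0 R >>\n"
    append out "startxref\n$xref\n%%EOF\n"
    return $out
}

puts -nonewline [sketch::build_pdf {
    {<< /Type /Catalog /Pages 2 0 R >>}
    {<< /Type /Pages /Kids [] /Count 0 >>}
}]
```

Because every cross-reference entry must be exactly 20 bytes, the entries are produced with a fixed-width format rather than by joining numbers with spaces.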
1.2 Direct objects
The pdf package commands that return {object}s (i.e., PDF code for an object) are
pdf::boolean_obj {boolean}
pdf::int_obj {integer}
pdf::real_obj {value} {precision} ?
pdf::string_obj {byte string}
pdf::hexstring_obj {byte string}
pdf::text_obj {string}
pdf::name_obj {string}
pdf::array_obj {object} ∗
pdf::dict_obj {key object} {value object} ∗
pdf::null_obj
pdf::date_obj {clock value} {zonemode} ?
pdf::length_obj {value} {unit} {precision} ?
pdf::rect_obj {rectangle}
pdf::int_rect_obj {rectangle}
pdf::resource_dict_obj {array-name}
pdf::obj_ref {file} {reference label}
All but the last of these return direct objects, whereas obj_ref, as explained above, returns a reference to an indirect object. In addition to using the above commands, an {object} can also be the explicit PDF code for an object; this is most common with name objects.
The int_obj command formats a Tcl integer as a PDF object. The real_obj command similarly formats a Tcl double. The {precision} is the number of decimals that will be included in the PDF code. When it is omitted, the current value of the pdf::precision variable is used instead. This variable is set to 3 by default.
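The following sketches how a {precision} might be applied. It assumes, without confirmation from this manual, that trailing zeros are dropped from the result. sketch::real_obj is a hypothetical stand-in for real_obj, and the variable sketch::precision plays the role of pdf::precision. Fixed-point %f formatting is used because PDF has no exponent notation for real numbers.

```tcl
# Hypothetical sketch of real_obj-style formatting, not the package's
# own code: round to {precision} decimals with %f (PDF reals may not use
# exponent notation), then drop trailing zeros and a bare decimal point.
namespace eval sketch {
    variable precision 3   ;# default, like pdf::precision
}

proc sketch::real_obj {value {prec ""}} {
    variable precision
    if {$prec eq ""} {set prec $precision}
    set s [format "%.*f" $prec $value]
    if {[string first . $s] >= 0} {
        set s [string trimright [string trimright $s 0] .]
    }
    if {$s eq "-0"} {set s 0}   ;# e.g. -0.0001 rounds to -0.000
    return $s
}

puts [sketch::real_obj 3.14159]     ;# 3.142
puts [sketch::real_obj 3.14159 1]   ;# 3.1
puts [sketch::real_obj 2.0]         ;# 2
```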
The string_obj command takes a {byte string} (a string consisting of characters in the range \x00–\xFF) and returns the corresponding PDF string object, delimited by parentheses. The hexstring_obj command does the same thing, but uses the hexadecimal string encoding instead (a sequence of hexadecimal digits delimited by angle brackets).
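The following hypothetical sketch shows the difference between the two encodings. sketch::string_obj and sketch::hexstring_obj are illustrative stand-ins, and this manual does not say which characters the real string_obj chooses to escape. In a literal string the backslash and any unbalanced parentheses must be escaped, and escaping every parenthesis, as the sketch does, is always safe. A hexadecimal string needs no escaping at all, at the cost of two digits per byte.

```tcl
# Hypothetical sketches, not the package's code. A literal string must
# escape the backslash and parentheses; CR is escaped too, since a raw
# end of line inside a string is read back as LF. A hexadecimal string
# lists two hex digits per byte between angle brackets.
namespace eval sketch {}

proc sketch::string_obj {bytes} {
    return "([string map {\\ \\\\ ( \\( ) \\) \r \\r} $bytes])"
}

proc sketch::hexstring_obj {bytes} {
    binary scan $bytes H* hex
    return "<[string toupper $hex]>"
}

puts [sketch::string_obj {f(x) = \y}]   ;# (f\(x\) = \\y)
puts [sketch::hexstring_obj "PDF"]      ;# <504446>
```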
Text objects and date objects are syntactically PDF string objects, but they are used in special contexts and are there given an interpretation that is slightly different from that of ordinary PDF strings. In particular, the character set for text objects is always the full Unicode, whereas the encodings of ordinary PDF strings depend heavily on the context. The text_obj command takes an arbitrary Tcl string as argument and returns the corresponding text object. The date_obj command takes a {clock value} (as used by the clock command) and returns the corresponding PDF date object. The optional {zonemode} argument specifies how time zones are encoded in the object. An empty string (the default) or none means that no time zone specification should be included. utc or gmt means that the time is encoded as UTC. local or full causes the offset from local time to UTC to be computed and included in the result.
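The following hypothetical sketch shows one way of producing the two kinds of special strings. sketch::text_obj and sketch::date_obj are made-up names. The sketch makes two assumptions. First, it always writes text as UTF-16BE with a byte order mark; a text string may also use PDFDocEncoding, which the sketch ignores. Second, it uses the D:YYYYMMDDHHmmSSOHH'mm' date syntax of the PDF 1.3 to 1.5 specifications, with a trailing Z for UTC.

```tcl
# Hypothetical sketches, not the package's code.
namespace eval sketch {}

# Text string as UTF-16BE with the byte order mark FE FF, written as a
# hexadecimal string. Characters beyond U+FFFF become surrogate pairs.
proc sketch::text_obj {str} {
    set hex FEFF
    foreach ch [split $str ""] {
        scan $ch %c cp
        if {$cp > 0xFFFF} {
            set cp [expr {$cp - 0x10000}]
            append hex [format %04X%04X [expr {0xD800 | ($cp >> 10)}] \
                    [expr {0xDC00 | ($cp & 0x3FF)}]]
        } else {
            append hex [format %04X $cp]
        }
    }
    return "<$hex>"
}

# Date string: D:YYYYMMDDHHmmSS followed by an optional time zone.
proc sketch::date_obj {clockval {zonemode ""}} {
    switch -- $zonemode {
        "" - none {
            set s [clock format $clockval -format {D:%Y%m%d%H%M%S}]
        }
        utc - gmt {
            set s [clock format $clockval -format {D:%Y%m%d%H%M%SZ} \
                    -timezone :UTC]
        }
        local - full {
            set s [clock format $clockval -format {D:%Y%m%d%H%M%S}]
            set z [clock format $clockval -format %z]   ;# e.g. +0100
            append s [string range $z 0 2] ' [string range $z 3 4] '
        }
        default {error "unknown zonemode \"$zonemode\""}
    }
    return "($s)"
}

puts [sketch::text_obj "na\u00EFve"]   ;# <FEFF006E006100EF00760065>
puts [sketch::date_obj 0 utc]          ;# (D:19700101000000Z)
```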
The boolean_obj and null_obj commands return boolean and null objects, respectively. They are not that frequently used. The name_obj command returns the name object formed from a given string. This is most often used with variable strings, such as font names, that are not known when the program is written.
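The main work in forming a name object is escaping. In this hypothetical sketch, sketch::name_obj writes '#', the PDF delimiters, and every byte outside ! to ~ as #xx. Converting the string to UTF-8 bytes first is an assumption of the sketch, not something this manual specifies.

```tcl
# Hypothetical sketch, not the package's code. The string is converted
# to UTF-8 bytes (an assumption), and every byte that is '#', a PDF
# delimiter, or outside the range ! to ~ is written as #xx, as allowed
# since PDF 1.2. A NUL byte cannot appear in a name at all.
namespace eval sketch {}

proc sketch::name_obj {str} {
    set out /
    foreach ch [split [encoding convertto utf-8 $str] ""] {
        scan $ch %c b
        if {$b == 0} {error "a PDF name cannot contain a NUL byte"}
        if {$b < 0x21 || $b > 0x7E || [string first $ch "#()<>\[\]{}/%"] >= 0} {
            append out [format "#%02X" $b]
        } else {
            append out $ch
        }
    }
    return $out
}

puts [sketch::name_obj Times-Roman]   ;# /Times-Roman
puts [sketch::name_obj "My Font"]     ;# /My#20Font
```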
The array_obj command returns the array object (comparable to a Tcl list) that is formed from the given sequence of objects. The dict_obj command returns the dictionary object that is formed from the given sequence of keys and values. The {key object}s must all be name objects.
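Both constructions amount to joining already formatted PDF code. The dictionary version also checks that it has key/value pairs with name keys. This is a hypothetical sketch, and sketch::array_obj and sketch::dict_obj are illustrative names.

```tcl
# Hypothetical sketch, not the package's code: an array or a dictionary
# is the PDF code of its parts joined by spaces inside [ ] or << >>.
namespace eval sketch {}

proc sketch::array_obj {args} {
    return "\[[join $args " "]\]"
}

proc sketch::dict_obj {args} {
    if {[llength $args] % 2} {error "keys and values must come in pairs"}
    foreach {key value} $args {
        if {![string match /* $key]} {error "key \"$key\" is not a name object"}
    }
    return "<< [join $args " "] >>"
}

puts [sketch::array_obj 0 0 612 792]            ;# [0 0 612 792]
puts [sketch::dict_obj /Type /Page /Rotate 90]  ;# << /Type /Page /Rotate 90 >>
```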
1.3 Streams
Much of the data in a PDF file is not stored in the above kinds of objects, but in a special kind of indirect object called a stream. These are created using the commands
pdf::begin_stream {file} {label} {key} {value} ∗
pdf::end_stream {file}
The {label} is the one which will be used with obj_ref to refer to the stream object. Every stream comes with a stream dictionary that contains information about how the stream data should be decoded, e.g. decompressed. The {key} {value} arguments of begin_stream are placed in this dictionary; the /Length entry for the stream is, however, generated automatically.
Data written to a PDF file (using, for example, puts) between a begin_stream and the matching end_stream will go into that stream. In general, such data need not conform to the ordinary PDF syntax, but can be pretty much anything; whether the data is correct depends on where in the document the stream object is referenced. Note that files opened using rewrite_pdf are configured to be binary. It is an error to try to begin a new stream before ending the previous one, but it is possible to use put_obj even inside a stream; the object is then cached internally and written to the file after the stream has been ended.
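The automatically generated /Length entry is the number of bytes between the stream and endstream keywords. The hypothetical sketch below (sketch::stream_object is a made-up name) knows its data in advance and can simply count it. When the data is instead written directly to the file, as with puts above, the length is only known at the end. PDF permits /Length to be an indirect object written after the stream, which is one way to handle that, but this manual does not say which technique the pdf package uses.

```tcl
# Hypothetical sketch, not the package's code: one stream object whose
# data is known in advance, so /Length can simply be counted. The data
# is assumed to be a byte string, making its string length its length
# in bytes. The end of line before endstream is not part of the data.
namespace eval sketch {}

proc sketch::stream_object {num data args} {
    set dict [list /Length [string length $data] {*}$args]
    return "$num 0 obj\n<< [join $dict " "] >>\nstream\n$data\nendstream\nendobj\n"
}

puts -nonewline [sketch::stream_object 4 "0 0 m 100 100 l S"]
```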
A special, but very common, type of stream is the contents stream; it is used, for example, for all the text and graphics on actual document pages. Contents streams are created using the commands
pdf::begin_contents {resources-array} {file} {label} {key} {value} ∗
pdf::end_contents {resources-array} {file}
The special thing about contents streams is that they are always associated with some resources dictionary, which maps names used in the contents stream to PDF objects outside it. The extra feature provided by the ..._contents commands, compared to the ..._stream commands, is a mechanism for keeping track of the current set of resources, which permits extending this set when needed.
When not inside a contents stream, data for resources dictionaries are kept in a Tcl array. The data can be converted to a PDF object using the resource_dict_obj command, which takes the name of an array as argument. begin_contents similarly takes the name of an array as argument, and copies the data from this array to an internal (file-specific) storage. If the {resources-array} argument is empty, then the internally stored resources dictionary starts out empty as well. end_contents conversely copies the resource dictionary entries from internal storage to the specified {resources-array} (note: it does not clear that array first). If several contents streams are to share the same resources array, then one should pass the array filled in by the previous end_contents to the next begin_contents.
Between begin_contents and the matching end_contents, one can use the name_resource command to get a name by which one can refer to a particular object from within this contents stream. The syntax is
pdf::name_resource {variable} {file} {type} {object} {suggested name} ?
where {variable} is the name of a variable that will be set to the wanted name object. {type} is the resource type, and should be one of ColorSpace, XObject, ExtGState, Font, Pattern, Properties, and Shading. {object} is the actual object (direct or indirect) and {file} is the PDF file.
The optional {suggested name} argument can be used to force use of a particular name; if this is not supplied, then an available name is automatically generated. (Forcing a particular name may be useful for backwards compatibility, as there are some known bugs in PDF readers which require the same name to be used in several different resource dictionaries.) Multiple calls for the same resource will reuse the same name, unless a suggested name is provided. The command returns 1 if a new name was added to the resource dictionary and 0 if an old name could be reused. An error is thrown if the {suggested name} is already assigned to some other object.
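The rules above can be modelled with a small table per resource type. This is a hypothetical sketch, not the package's code. sketch::name_resource drops the {file} argument, and its generated names R1, R2, ... are an invention of the sketch.

```tcl
# Hypothetical sketch, not the package's code, of the bookkeeping that
# name_resource performs. The file argument is dropped and one table is
# kept per resource type: type -> (name -> object). Generated names
# (R1, R2, ...) are made up for this sketch.
namespace eval sketch {
    variable res [dict create]
}

proc sketch::name_resource {varName type object {suggested ""}} {
    variable res
    upvar 1 $varName name
    set table {}
    if {[dict exists $res $type]} {set table [dict get $res $type]}
    if {$suggested eq ""} {
        # Reuse the name from an earlier call for the same object.
        dict for {n obj} $table {
            if {$obj eq $object} {
                set name /$n
                return 0
            }
        }
        # Otherwise take the first free generated name.
        set i 1
        while {[dict exists $table R$i]} {incr i}
        set suggested R$i
    } elseif {[dict exists $table $suggested]} {
        if {[dict get $table $suggested] ne $object} {
            error "name $suggested is already assigned to another object"
        }
        set name /$suggested
        return 0
    }
    dict set res $type $suggested $object
    set name /$suggested
    return 1
}
```

For instance, two calls for the object 5 0 R both set their variable to /R1, returning 1 and then 0. Asking for the name F1 for a different object once F1 is taken raises an error.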
The PDF specification also defines ProcSet resources, but you need not worry about those. By default (i.e., if the ProcSet entry is not set), resource_dict_obj inserts an entry for the full set of procsets. Most PDF consumers never bothered about the procsets anyway.
1.4 Pages
PDF requires that all pages are arranged in a data structure called the pages tree. The pdf package has commands that can take care of building this tree for you; if you use them, then you only have to worry about generating the pages in the order you want them to appear in the document.
To finish a document page, one uses the command
pdf::shipout {file} {label} {key} {object} +
{file} here is the PDF file identifier and {label} is the reference label you want to assign to the page object. (Links in a PDF file require a reference to the target page, so it is likely that you will want to obj_ref the page.) The {key} and {object} arguments are attributes for the page object (keys and values for the dictionary). These should not include the /Type and /Parent attributes, which are inserted automatically. An example:
pdf::shipout $F "Page $n" \
    /Contents [pdf::obj_ref $F "Page $n contents"] \
    /Resources [pdf::obj_ref $F "Page $n resources"]
Before the first shipout, one must initialise the pages tree using the begin_pages command, and after the last shipout, one must use end_pages to complete the pages tree.
pdf::begin_pages {file} {label prefix} {option} {value} ∗
pdf::end_pages {file} {option} {value} ∗
{option}s that begin with a / are interpreted as names of entries to insert into the root node of the pages tree; in this case the {value} must be an object. This is useful if some attribute (e.g. page size) is the same for all pages, as one can then specify it once at the root and let it be inherited by all the pages.
Every node in the tree is given a reference label, so to avoid clashes with other objects, all /Pages nodes (but not the page nodes) are given labels that begin with the {label prefix} specified at begin_pages. The end_pages command returns the label that was given to the root node.
The pages tree constructed by end_pages is balanced and of minimal size with respect to its arity (number of kids per parent). The default arity is 5, but that can be overridden using the -arity {option} of begin_pages, in which case the corresponding {value} is the new arity.
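"Balanced and of minimal size" can be computed level by level: each level groups the one below it into as few nodes as the arity allows. The hypothetical sketch::tree_levels below returns only the number of nodes on each level, leaves first. This manual does not describe how the real end_pages distributes kids among the nodes.

```tcl
# Hypothetical sketch, not the package's code: the number of nodes on
# each level of a balanced pages tree of minimal size. Each level groups
# the level below into nodes of at most $arity kids, until a single root
# remains (there is always at least one /Pages node).
namespace eval sketch {}

proc sketch::tree_levels {npages {arity 5}} {
    set levels [list $npages]
    set n $npages
    while 1 {
        set n [expr {max(1, ($n + $arity - 1) / $arity)}]
        lappend levels $n
        if {$n == 1} break
    }
    return $levels
}

puts [sketch::tree_levels 12]     ;# 12 3 1
puts [sketch::tree_levels 26 5]   ;# 26 6 2 1
puts [sketch::tree_levels 26 10]  ;# 26 3 1
```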
1.5 Outline
A similar mechanism exists for building the outline tree. Construction of this tree is begun at begin_outline and completed at end_outline. To begin_outline one must supply a string that will be used as prefix for all labels of nodes in the tree, and end_outline will return the label of the outline tree root node.
pdf::begin_outline {file} {prefix}
pdf::end_outline {file}
New items can be added to the outline using the outline_heading command. This has the syntax
pdf::outline_heading {file} {level} {title} {option} {value} ∗
where {file} is the identifier of the PDF file, {level} is the nominal level of this item, and {title} is the title. The title is an ordinary Tcl string and there is no restriction on which characters it may contain.
An {option} {value} is either a pair of PDF objects, where the first is a name object, or
-open {boolean}
The PDF objects will be placed in the dictionary object for the new item. These are what one should use to specify a destination or equivalent for the outline item. The -open option controls whether this item will be open by default, i.e., whether its subitems (if there will be any) should be shown. It defaults to false (closed).
The {level} is relative, and can be an arbitrary string. The way it is used is that if {level} is greater than the current level, then a new level is begun. Else if {level} is greater than the previous level, the item is a sibling of the last item and the current level is updated. Otherwise the current level is ended and the issue is reexamined. This dynamically adapts to the set of {level}s actually used in a document, even if these are not consecutive. It also gracefully copes with inconsistencies such as forgetting some heading level at the beginning of a document.
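The following hypothetical sketch gives one possible reading of this rule. sketch::outline_depths keeps a stack of the levels of the currently open groups and returns the depth (0 for the top level) that each heading receives. Reading "previous level" as the level of the enclosing group is an assumption of the sketch.

```tcl
# Hypothetical sketch, not the package's code: one reading of the level
# rule. The stack holds the level of every open group, innermost last;
# the result is the depth (0 = top level) given to each heading.
namespace eval sketch {}

proc sketch::outline_depths {levels} {
    set stack {}
    set depths {}
    foreach level $levels {
        while 1 {
            set n [llength $stack]
            if {$n == 0 || $level > [lindex $stack end]} {
                # Greater than the current level: begin a new level.
                lappend stack $level
                break
            } elseif {$n == 1 || $level > [lindex $stack end-1]} {
                # Greater than the previous level: a sibling of the
                # last item, and the current level is updated.
                lset stack end $level
                break
            }
            # Otherwise end the current level and reexamine.
            set stack [lrange $stack 0 end-1]
        }
        lappend depths [expr {[llength $stack] - 1}]
    }
    return $depths
}

puts [sketch::outline_depths {1 2 2 1 3}]   ;# 0 1 1 0 1
puts [sketch::outline_depths {2 1 2}]       ;# 0 0 1
```

The second call models a document that starts with a level-2 heading. That heading still ends up at the top level, next to the level-1 heading that follows it.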
It is possible to create rather obnoxious outlines by hardwiring particular zoom factors into the outline. It is usually best to specify no more than the destination page and vertical position, as shown in this example:
pdf::outline_heading $F 1 "Introduction" /Dest [
    pdf::array_obj [pdf::obj_ref $F "Page 1"] /XYZ null \
        [pdf::real_obj $ypos] null
]
There are also four lower-level commands available, which may be useful if for some reason some information needed for an entry is not available until the end of it (e.g. the position of that end). The outline_node_set command can be used to set entries in the dictionary of the current outline item. Its syntax is
pdf::outline_node_set {file} {option} {value} ∗
The outline_item command creates a new item in the current level of the outline. Its syntax is
pdf::outline_item {file} {title} {option} {value} ∗
These options and values are handled as for outline_heading.
For beginning and ending lower-level groups of items, there are the commands
pdf::outline_begingroup {file} {option} {value} ∗
pdf::outline_endgroup {file} {option} {value} ∗
The {option} and {value} arguments here affect the parent of the new group of items. Note that between an outline_begingroup and the first outline_item after it, there is no current item in the outline.
1.6 Contents
Once inside a contents stream, PDF is fairly similar to PostScript (although still more strict and structured), with sequences of operands followed by some operator. To simplify writing such code, there is a command printf which offers format-style formatting of data written to the file. The syntax is
pdf::printf {file} {format list} {data} ∗
and, as with format, each conversion specifier in the {format list} consumes one or several {data} items. (It is probably a good idea to limit {format list}s to chunks small enough that you can instantly see what each {data} item is used for.) There is also a command sprintf with syntax
pdf::sprintf {format list} {data} ∗
that returns the formatted code rather than writing it to a file.
The {format list}s are lists where every element is either explicit PDF code (typically an operator) or a conversion specifier. As with format, the conversion specifiers are recognised by the fact that their first character is a ‘%’. The contributions to the formatted PDF code from separate list elements will be separated by whitespace as necessary.
The second character of a conversion specifier determines the type of conversion to carry out. The basic conversions are
b Boolean, to be formatted by boolean_obj.
i Integer, to be formatted by int_obj.
l Length, to be formatted by length_obj. This consumes two {data} arguments: one for the value and one for the unit.
n Data is a string, to be formatted by name_obj.
o Already formatted PDF object.
r Real number, to be formatted by real_obj (with default precision according to the precision variable).
s PDF string, to be formatted by string_obj.
In addition, the corresponding upper case letters select the same formatting, but the (first) {data} argument is interpreted as a list of values to format in the specified way. The character may also be an &, in which case the {data} is interpreted as a list
{format list} {data} ∗
which will be formatted by a recursive sprintf call and inserted into the result at that position. This is intended to simplify encoding structured data.
The exact format of a conversion specifier is
%〈char〉〈count〉?(.〈precision〉)?
The 〈count〉 defaults to 1, and specifying a non-unit 〈count〉 value is equivalent to specifying that many separate conversion specifiers in sequence. Specifying a 〈precision〉 overrides the default precision for real and length conversions.
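The following hypothetical sketch of a small subset of sprintf illustrates how a format list consumes its data. sketch::sprintf, sketch::fmt_one, and sketch::fmt_real are made-up names. The sketch has these limits:
- Only %i, %r, %n, %o, their upper-case list forms, and %& are handled.
- Names are not escaped.
- Parts are always separated by a single space, whereas the real command only adds whitespace where necessary.

```tcl
# Hypothetical subset of sprintf, not the package's code: it handles
# %i, %r, %n, %o, their upper-case list forms and %&, with an optional
# count and precision. Elements not starting with % are copied as
# explicit PDF code, and all parts are joined by single spaces.
namespace eval sketch {
    variable precision 3
}

# Real numbers: fixed decimals, trailing zeros and bare point dropped.
proc sketch::fmt_real {value prec} {
    set s [format "%.*f" $prec $value]
    if {[string first . $s] >= 0} {
        set s [string trimright [string trimright $s 0] .]
    }
    if {$s eq "-0"} {set s 0}
    return $s
}

# One value, by lower-case conversion character. Names are not
# escaped here, unlike with a real name_obj.
proc sketch::fmt_one {char value prec} {
    switch -- $char {
        i {return [format %d $value]}
        r {return [fmt_real $value $prec]}
        n {return /$value}
        o {return $value}
        default {error "conversion %$char is not in this sketch"}
    }
}

proc sketch::sprintf {formatList args} {
    variable precision
    set out {}
    foreach item $formatList {
        if {[string index $item 0] ne "%"} {
            lappend out $item                  ;# explicit PDF code
            continue
        }
        if {![regexp {^%([[:alpha:]&])(\d*)(?:\.(\d+))?$} $item -> char count prec]} {
            error "bad conversion specifier \"$item\""
        }
        if {$count eq ""} {set count 1}
        if {$prec eq ""} {set prec $precision}
        for {set k 0} {$k < $count} {incr k} {
            if {[llength $args] == 0} {error "too few data items for $item"}
            set args [lassign $args value]
            if {$char eq "&"} {
                # The data item is itself {format list} {data} ...
                lappend out [sprintf {*}$value]
            } elseif {[string is upper $char]} {
                foreach v $value {
                    lappend out [fmt_one [string tolower $char] $v $prec]
                }
            } else {
                lappend out [fmt_one $char $value $prec]
            }
        }
    }
    return [join $out " "]
}

puts [sketch::sprintf {%r2 m %R l S} 1.5 2 {3 4.25}]   ;# 1.5 2 m 3 4.25 l S
```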
Page contents <strong>in</strong> <strong>PDF</strong> are primarily graphical, and thus there is a fair amount<br />
<strong>of</strong> coord<strong>in</strong>ates <strong>in</strong>volved. For manufactur<strong>in</strong>g isolated coord<strong>in</strong>ates, the length com-<br />
mand, its object-mak<strong>in</strong>g counterpart length_obj, and the pr<strong>in</strong>tf counterpart %l<br />
are convenient, as they make it possible to express lengths <strong>in</strong> physical units and<br />
then have them automatically converted to the (default) <strong>PDF</strong> length unit. The<br />
syntax is<br />
pdf::length {value} {unit}<br />
pdf::length_obj {value} {unit} {precision} ?<br />
where {value} is the numerical value, {unit} the name <strong>of</strong> the unit it is expressed<br />
<strong>in</strong>, and {precision} as with real_obj an optimal precision that is specified if one<br />
wishes to override the default. The units known to the pdf package are<br />
An example:<br />
bp Postscript po<strong>in</strong>t (1/72 <strong>in</strong>)<br />
cc cicero<br />
cm centimeter<br />
dd Didot po<strong>in</strong>t (European pr<strong>in</strong>ter’s po<strong>in</strong>t)<br />
<strong>in</strong> <strong>in</strong>ch<br />
mm millimeter<br />
pc pica<br />
pt (American) pr<strong>in</strong>ter’s po<strong>in</strong>t<br />
pdf::begin_contents "" $F "A page"
pdf::printf $F {%l2 m %L l S} 5 cm 5 cm {10 15} cm
pdf::name_resource times_font $F Font [pdf::dict_obj\
  /Type /Font /Subtype /Type1 /BaseFont /Times-Roman\
  /Encoding /MacRomanEncoding]
pdf::printf $F {BT}
pdf::printf $F {%o %l Tf 1 0 0 1 %L1.1 Tm} $times_font 12 dd {8 10} cm
pdf::printf $F {%s Tj} [encoding convertto macRoman "na\u00EFve"]
pdf::printf $F {ET}
pdf::end_contents resarr $F
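The conversion that length performs amounts to multiplying the value by a per-unit factor. The following is a minimal sketch, assuming TeX's definitions of the units (1 in = 72 bp = 72.27 pt, 1 pc = 12 pt, 1157 dd = 1238 pt, 1 cc = 12 dd). The sketch namespace, its unit_bp array and sketch::length are hypothetical names, not part of the pdf package, whose own factors may be rounded differently.

```tcl
namespace eval sketch {
    # Hypothetical table: size of one unit expressed in bp (1/72 in).
    # The factors follow TeX's unit definitions; they are assumptions,
    # not values copied from the pdf package.
    variable unit_bp
    array set unit_bp [list \
        bp 1.0 \
        in 72.0 \
        cm [expr {72 / 2.54}] \
        mm [expr {72 / 25.4}] \
        pt [expr {72 / 72.27}] \
        pc [expr {12 * 72 / 72.27}] \
        dd [expr {1238.0 / 1157 * 72 / 72.27}] \
        cc [expr {12 * 1238.0 / 1157 * 72 / 72.27}] \
    ]
}

# Convert a length given in a physical unit to the default PDF unit (bp).
proc sketch::length {value unit} {
    variable unit_bp
    if {![info exists unit_bp($unit)]} {
        error "Unknown unit \"$unit\""
    }
    expr {$value * $unit_bp($unit)}
}
```

For instance, sketch::length 5 cm gives about 141.73 bp; a length_obj-style command would then format that number with real_obj.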
1.7 Rectangles<br />
Coordinates occur not only in page contents, but also in many other data structures
in a PDF file. In particular, one often has to specify some rectangle (e.g. the
clickable area of a link, or the imageable area of a page), so the pdf package
provides several commands for creating, modifying, and formatting rectangles.
pdf::make_rect (proc)
pdf::offset_rect (proc)
pdf::inset_rect (proc)
The basic format for a {rectangle} that the pdf package uses is a list of four
elements
{left} {bottom} {right} {top}
each of which is the coordinate of one side of the rectangle, in default PDF units
(i.e., bp). Such lists are returned by the commands
pdf::make_rect {option} {value} {unit} ? +
pdf::offset_rect {rect} {dx} {dy} {unit} ?
pdf::inset_rect {rect} {amount} {unit}
pdf::inset_rect {rect} {dx} {dy} {unit}
pdf::inset_rect {rect} {dl} {db} {dr} {dt} {unit}
pdf::standard_rect {rect}
make_rect is a generic “tell me what you know about the rectangle and I’ll figure
out what its coordinates are” command. Each option specifies a value for one
or two quantities that can be derived from the rectangle coordinates, and by
combining this information the command calculates the rectangle coordinates.
-width    distance from left to right
-height   distance from bottom to top
-left     left
-right    right
-top      top
-bottom   bottom
-ll       {left bottom}
-lr       {right bottom}
-ul       {left top}
-ur       {right top}
-center   midpoint
-midx     x-coordinate of midpoint
-midy     y-coordinate of midpoint
The list of options is processed from left to right: every option contributes
some information about the wanted rectangle, and when all four coordinates are
known the rectangle is returned. Depending on the option, the {value} is either
a number or a point (a list of two numbers). The {unit} is the unit of the {value};
it defaults to bp if omitted.
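The bookkeeping behind make_rect can be sketched as follows. Each axis is fixed by any two of four quantities (low side, high side, size, midpoint), so the sketch records what each option says about each axis and then solves the two axes separately. This is a simplified illustration, not the package's code: sketch::make_rect and sketch::solve_axis are hypothetical, every value is taken to be in bp (no {unit} arguments), all options are read before solving (so a later option overwrites an earlier one instead of the command returning as soon as four coordinates are known), and Tcl 8.5 is assumed for dict and lassign.

```tcl
namespace eval sketch {}

# Determine one axis from any two of: lo, hi, size, mid.
# Returns the list {lo hi}.
proc sketch::solve_axis {axis known} {
    foreach k {lo hi size mid} {
        set have($k) [dict exists $known $k]
        if {$have($k)} {set $k [dict get $known $k]}
    }
    if {$have(lo) && $have(hi)} {return [list $lo $hi]}
    if {$have(lo) && $have(size)} {return [list $lo [expr {$lo + $size}]]}
    if {$have(hi) && $have(size)} {return [list [expr {$hi - $size}] $hi]}
    if {$have(mid) && $have(size)} {
        return [list [expr {$mid - $size / 2.0}] [expr {$mid + $size / 2.0}]]
    }
    if {$have(lo) && $have(mid)} {return [list $lo [expr {2 * $mid - $lo}]]}
    if {$have(hi) && $have(mid)} {return [list [expr {2 * $mid - $hi}] $hi]}
    error "Not enough information to determine the $axis coordinates."
}

# Hypothetical make_rect: options as in the table above, values in bp only.
proc sketch::make_rect {args} {
    set x [dict create]
    set y [dict create]
    foreach {opt val} $args {
        switch -- $opt {
            -left   {dict set x lo $val}
            -right  {dict set x hi $val}
            -width  {dict set x size $val}
            -midx   {dict set x mid $val}
            -bottom {dict set y lo $val}
            -top    {dict set y hi $val}
            -height {dict set y size $val}
            -midy   {dict set y mid $val}
            -ll     {dict set x lo [lindex $val 0]; dict set y lo [lindex $val 1]}
            -lr     {dict set x hi [lindex $val 0]; dict set y lo [lindex $val 1]}
            -ul     {dict set x lo [lindex $val 0]; dict set y hi [lindex $val 1]}
            -ur     {dict set x hi [lindex $val 0]; dict set y hi [lindex $val 1]}
            -center {dict set x mid [lindex $val 0]; dict set y mid [lindex $val 1]}
            default {error "Unknown option \"$opt\""}
        }
    }
    lassign [solve_axis horizontal $x] left right
    lassign [solve_axis vertical $y] bottom top
    list $left $bottom $right $top
}
```

For example, sketch::make_rect -ll {0 0} -width 100 -height 50 yields the rectangle 0 0 100 50.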
Once a rectangle has been constructed, it can be modified using the other
commands shown above. The offset_rect command moves the rectangle but
preserves its size; the {unit} defaults to bp. The inset_rect command shrinks a
rectangle by moving the sides inwards by the specified amount(s), or, for a negative
amount, grows the rectangle by moving the sides outwards. One can specify a single
{amount} for all sides, separate {dx} and {dy} amounts for horizontal and vertical
coordinates respectively, or separate amounts for each of the four sides. A typical
usage is to shrink a rectangle to leave a margin.
pdf::standard_rect (proc)
pdf::rect_obj (proc)
pdf::int_rect_obj (proc)
pdf::wh_rect (proc)
pdf::paper_rect (array)
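The arithmetic of offset_rect and inset_rect is simple enough to show directly. This is a sketch only: sketch::offset_rect and sketch::inset_rect are hypothetical stand-ins that take all amounts in bp (no {unit} argument), and lassign requires Tcl 8.5.

```tcl
namespace eval sketch {}

# Move a rectangle {left bottom right top} by dx, dy without resizing it.
proc sketch::offset_rect {rect dx dy} {
    lassign $rect l b r t
    list [expr {$l + $dx}] [expr {$b + $dy}] [expr {$r + $dx}] [expr {$t + $dy}]
}

# Move the sides inwards (negative amounts move them outwards).
# Accepts one amount (all sides), two (dx dy) or four (dl db dr dt).
proc sketch::inset_rect {rect args} {
    switch -- [llength $args] {
        1 {set dl [set db [set dr [set dt [lindex $args 0]]]]}
        2 {lassign $args dl db; set dr $dl; set dt $db}
        4 {lassign $args dl db dr dt}
        default {error "Expected 1, 2, or 4 amounts."}
    }
    lassign $rect l b r t
    list [expr {$l + $dl}] [expr {$b + $db}] [expr {$r - $dr}] [expr {$t - $dt}]
}
```

Leaving a 10 bp margin inside a 100 by 50 rectangle is then sketch::inset_rect {0 0 100 50} 10, which gives 10 10 90 40.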
It is possible to end up with a rectangle where the {bottom} is above the {top}
or {left} is further right than {right}, i.e., a rectangle with negative height
or width. PDF consumers typically normalise such rectangles by exchanging
the sides as necessary, so this is often not a problem, but if you want to ensure
that a rectangle has positive height and width then you may feed it through the
standard_rect command. This might be necessary if you want to place a point
below some given rectangle.
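Normalising a rectangle only requires taking the smaller and larger coordinate on each axis. A minimal sketch, assuming Tcl 8.5 or later for lassign and the min/max functions of expr; sketch::standard_rect is a hypothetical name, not the package's procedure.

```tcl
namespace eval sketch {}

# Exchange sides where necessary so that left <= right and bottom <= top.
proc sketch::standard_rect {rect} {
    lassign $rect l b r t
    list [expr {min($l, $r)}] [expr {min($b, $t)}] \
         [expr {max($l, $r)}] [expr {max($b, $t)}]
}
```

After normalisation, the second element is guaranteed to be the lower edge, so a point 12 bp below the rectangle $r has y-coordinate [lindex [sketch::standard_rect $r] 1] - 12.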
To get a rectangle into PDF code, there are three commands
pdf::rect_obj {rectangle}
pdf::int_rect_obj {rectangle}
pdf::wh_rect {rectangle}
rect_obj returns a rectangle object (a PDF array of four real numbers).
int_rect_obj also returns a rectangle object, but rounds the coordinates to
integers first to conserve space. wh_rect does not return PDF code, but simply the
four-element Tcl list
{left} {bottom} {width} {height}
that corresponds to the {rectangle}. These are the operands of the re operator,
and can be conveniently formatted using %R.
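The two conversions can be sketched as follows. sketch::wh_rect and sketch::int_rect_obj are hypothetical stand-ins, and rounding each coordinate to the nearest integer is an assumption; the package might instead round outwards so that the integer rectangle always contains the original. Tcl 8.5 is assumed for lassign.

```tcl
namespace eval sketch {}

# {left bottom right top} -> {left bottom width height}, the operands of re.
proc sketch::wh_rect {rect} {
    lassign $rect l b r t
    list $l $b [expr {$r - $l}] [expr {$t - $b}]
}

# Round each coordinate to the nearest integer and emit a PDF array.
proc sketch::int_rect_obj {rect} {
    set coords {}
    foreach c $rect {lappend coords [expr {round($c)}]}
    return "\[[join $coords]\]"
}
```

For example, format {%g %g %g %g re} {*}[sketch::wh_rect {10 20 110 70}] produces the path operator 10 20 100 50 re.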
Finally, there is an array paper_rect which contains the /MediaBox rectangles
corresponding to some popular paper sizes: A4, A4R (landscape A4), letter, and
legal.
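Such a table follows directly from the nominal paper dimensions: A4 is 210 mm by 297 mm, US letter 8.5 in by 11 in, and US legal 8.5 in by 14 in. The sketch below computes the rectangles in bp; the array name sketch::paper_rect is hypothetical, and the package's own values may be rounded.

```tcl
namespace eval sketch {
    # Hypothetical /MediaBox table in bp; 1 mm = 72/25.4 bp, 1 in = 72 bp.
    variable mm [expr {72 / 25.4}]
    variable paper_rect
    array set paper_rect [list \
        A4     [list 0 0 [expr {210 * $mm}] [expr {297 * $mm}]] \
        A4R    [list 0 0 [expr {297 * $mm}] [expr {210 * $mm}]] \
        letter {0 0 612 792} \
        legal  {0 0 612 1008} \
    ]
}
```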
Implementation<br />
2 PDF files and objects
A Portable Document Format (PDF) file is, when compared with for example a
PostScript file or an HTML file, a rather disorganised document. This is because at
the basic level, a PDF file is a heap rather than a text; it can be “disorganised”
since its logical structure is based on cross-referencing rather than on sequentiality.
The first step is therefore to provide support for writing well-formed heaps.
〈∗pkg〉
package require Tcl 8.3
Tcl 8.3 is required for array unset, and string equal is used in some places. It
should be possible to make the code run on Tcl 8.1.1 (which is required for
string map) if those two were worked around.
package provide pdf 0.2
namespace eval pdf {}
2.1 Building objects
The independent units in a PDF file are called objects. An object is essentially
a value (which includes a type). The procedures below construct strings of PDF
code that encode objects of various types. The strings returned are generally
such that one must insert whitespace between two such strings if the data is to
be properly encoded. The strings may contain newlines if some building routine
thinks the lines would otherwise be too long.
pdf::boolean_obj (proc)
The boolean_obj procedure returns a boolean object corresponding to the string
passed as its only argument. The argument can be any Tcl boolean value.
proc pdf::boolean_obj {value} {
    if {$value} then {return true} else {return false}
}
pdf::int_obj (proc)
The int_obj procedure returns the PDF object corresponding to the integer supplied
as argument.
proc pdf::int_obj {value} {format %d $value}
pdf::real_obj (proc)
pdf::precision (var.)
The real_obj procedure returns the PDF object corresponding to the real number
supplied as argument. The syntax is
pdf::real_obj {value} {precision} ?
where {precision} is the number of decimals that will be included in the object.
If omitted, the value of the precision variable is used, and that defaults to 3.
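The body of real_obj is only partly preserved below, so here is a minimal sketch of what such a procedure has to do, under stated assumptions. PDF does not allow exponential notation for numbers, so the sketch formats with %f and then drops trailing zeros to save space. The name sketch::real_obj is hypothetical, and its fixed default of 3 stands in for the precision variable.

```tcl
namespace eval sketch {}

# Format a real number with at most $precision decimals, without
# exponential notation, and strip trailing zeros and a trailing point.
proc sketch::real_obj {value {precision 3}} {
    set s [format %.${precision}f $value]
    # Only trim when there is a decimal point; "100" must stay "100".
    if {[string first . $s] >= 0} {
        set s [string trimright [string trimright $s 0] .]
    }
    # Avoid emitting "-0" for tiny negative values.
    if {$s eq "-0"} {set s 0}
    return $s
}
```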
set pdf::precision 3
proc pdf::real_obj {value {precision -1}} {
    if {$precision
…
        if {$code==92} then {
            append str \\
            incr len
            continue
        } elseif {$code
…
=100} {
        lappend L [string map [list \\ \\\\ ( \\( ) \\) \r \\r \n \\n]\
          [string range $str 0 99]]
        set str [string range $str 100 end]
    }
    if {[string length $str]} then {
        lappend L\
          [string map [list \\ \\\\ ( \\( ) \\) \r \\r \n \\n] $str]
    }
    set str ([join $L \\\n])
  }
  return $str
}
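The surviving tail of string_obj shows the technique for literal strings: backslash, parentheses, CR and LF are escaped with string map, and long strings are cut into 100-character pieces joined by a backslash-newline, which a PDF reader ignores inside a literal string. A self-contained sketch of just that part, assuming the input is already a byte string; sketch::string_obj is a hypothetical name and does not reproduce the character checks of the original.

```tcl
namespace eval sketch {}

# Build a PDF literal string, escaping \ ( ) CR LF and splitting long
# strings into 100-character chunks joined by backslash-newline.
proc sketch::string_obj {str} {
    set map [list \\ \\\\ ( \\( ) \\) \r \\r \n \\n]
    set chunks {}
    while {[string length $str] > 100} {
        lappend chunks [string map $map [string range $str 0 99]]
        set str [string range $str 100 end]
    }
    lappend chunks [string map $map $str]
    return "([join $chunks \\\n])"
}
```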
pdf::hexstring_obj (proc)
The hexstring_obj procedure returns the PDF string object, encoded as hexadecimal
digits, that corresponds to the argument. If the string is longer than 31
characters, it will be broken across several lines.
proc pdf::hexstring_obj {str} {
    set hstr "
…
            incr len 2
        } else {
            error "Bad character $ch [format (U+%04x) $code] in PDF\
              string."
        }
    }
    append hstr ">"
}
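Since the middle of hexstring_obj is lost above, here is a compact sketch of the same idea: reject characters that do not fit in a byte, convert the rest to hexadecimal with binary scan, and break the digits into lines. Breaking after every 62 hex digits (31 bytes) is an assumption based on the 31-character remark, and sketch::hexstring_obj is a hypothetical name. The three-argument form of regsub used here needs Tcl 8.4 or later.

```tcl
namespace eval sketch {}

# Encode a byte string as a PDF hex string <...>, at most 31 bytes
# (62 hex digits) per line.
proc sketch::hexstring_obj {str} {
    foreach ch [split $str ""] {
        if {[scan $ch %c] > 255} {
            error "Bad character [format U+%04X [scan $ch %c]] in PDF string."
        }
    }
    binary scan [encoding convertto iso8859-1 $str] H* hex
    # Insert a newline after every 62 digits, but not at the very end.
    set hex [string trimright [regsub -all {.{62}} $hex "&\n"] \n]
    return "<$hex>"
}
```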
pdf::text_obj (proc)
The text_obj procedure returns the PDF text string object that corresponds to
the argument string. The syntax is
pdf::text_obj {string}
where {string} is an arbitrary Tcl string. (Ordinary PDF strings are more like Tcl
byte arrays.)
The greatest complication in the implementation is checking whether the
{string} can be encoded in PDFDocEncoding or will have to be expressed in
UTF-16BE. This is handled slightly sneakily: in fact only the subset of
PDFDocEncoding that coincides with iso8859-1 (and hence Unicode) is allowed;
any character outside that set triggers conversion to UTF-16BE (as does a string
that begins with the Byte Order Mark \xFE\xFF).
UTF-16BE-encoded strings are hexcoded, since they are probably easier to
interpret that way. Strings not requiring UTF-16BE encoding are not hexcoded.
proc pdf::text_obj {str} {
    if {[regexp -- {[^ -~\241-\254\256-\377]|^\xFE\xFF} $str]} then {
        # The "unicode" encoding of Tcl 8.x uses the host's native byte
        # order, so this yields UTF-16BE only on big-endian hosts.
        binary scan [encoding convertto unicode $str] H* uhex
        # Prepend the FE FF byte order mark and break every 64 digits.
        regsub -all -- {\w{64}} "<FEFF$uhex>" "&\n" res
        return $res
    } else {
        return [string_obj $str]
    }
}
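As noted in the comment above, encoding convertto unicode follows the host's byte order. A byte-order-independent variant reads the result as native 16-bit units and writes them back big-endian. This is a sketch, not the package's code: sketch::utf16be_text_obj is a hypothetical name, the t format of binary scan requires Tcl 8.5, and the line breaking of the original is left out.

```tcl
namespace eval sketch {}

# Hex-coded UTF-16BE text string with the FE FF byte order mark,
# independent of the host's byte order.
proc sketch::utf16be_text_obj {str} {
    # t* reads 16-bit integers in native order; S* writes them big-endian.
    binary scan [encoding convertto unicode $str] t* units
    binary scan [binary format S* $units] H* hex
    return "<FEFF[string toupper $hex]>"
}
```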
pdf::name_obj (proc)
The name_obj procedure returns the PDF name object corresponding to its argument.
It is useful mainly for names with strange characters in them (non-ASCII
characters or characters with special meaning in PDF syntax), but most names
(e.g. dictionary keys) appearing in PDF files do not require any quoting and can
therefore just as well be written as explicit PDF code.
proc pdf::name_obj {str} {
    if {[string bytelength $str]>126} then {
        error "String too long to be a PDF name."
    }
    set res /
    foreach ch [split [encoding convertto utf-8 $str] {}] {
        switch -glob -- $ch {
            ( - ) - < - > - \\[ - \\] - \{ - \} - / - % - # {
                scan $ch %c code
                append res [format #%02x $code]
            }
            [!-~] {append res $ch}
            default {
                scan $ch %c code
                append res [format #%02x $code]
            }
        }
    }
    return $res
}
pdf::array_obj (proc)
The array_obj procedure builds an array object of the objects it is given as
arguments. The syntax is
pdf::array_obj {object} ∗
Newlines are inserted between the objects whenever the next object would not
fit on the current line, so that lines stay below 100 characters where possible.
proc pdf::array_obj {args} {
    set res \[
    set len 1
    foreach item $args {
        if {[string length $item] + $len >= 100} then {
            append res \n
            set len 0
        } elseif {[string length $res]>1} then {
            append res " "
            incr len
        }
        append res $item
        incr len [string length $item]
    }
    if {$len >= 100} then {
        append res \n
    }
    append res \]
}
pdf::dict_obj (proc)
The dict_obj procedure builds a dictionary object from its arguments. The
syntax is
pdf::dict_obj {key} {value} ∗
where each {key} must be a name object and each {value} must be an object. It
is checked that the number of arguments is even and that the keys begin with a
slash.
proc pdf::dict_obj {args} {
    if {[llength $args] % 2 != 0} then {
        error "Not the same number of keys and values."
    }
    set res "
…
>"
}
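The middle of dict_obj is lost above; the two checks it describes and the << >> delimiters can be sketched as follows. sketch::dict_obj is a hypothetical name, and the single-line layout is an assumption (the original may break long dictionaries over several lines as array_obj does).

```tcl
namespace eval sketch {}

# Build << /Key value ... >> from alternating keys and values.
proc sketch::dict_obj {args} {
    if {[llength $args] % 2 != 0} {
        error "Not the same number of keys and values."
    }
    set res "<<"
    foreach {key value} $args {
        # Keys must be PDF name objects, i.e. begin with a slash.
        if {[string index $key 0] ne "/"} {
            error "Dictionary key \"$key\" does not begin with a slash."
        }
        append res " " $key " " $value
    }
    append res " >>"
}
```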
pdf::null_obj (proc)
The null_obj procedure returns a null object. It takes no arguments.
proc pdf::null_obj {} {return null}
pdf::date_obj (proc)
The date_obj procedure formats a Tcl seconds value as a PDF date string object.
The syntax is
pdf::date_obj {seconds} {local} ?
where {seconds} is the time as returned by clock seconds and {local} controls
how time zones are dealt with. The possible values for this are (not case-sensitive)
none or an empty string: Express time in the local time zone, but don’t include any
time zone information in the result. This is the default.
UTC or gmt: Express time in UTC and use Z as the time zone.
local or full: Express time in the local time zone, compute its difference to UTC,
and include that in the result.
proc pdf::date_obj {secs {local ""}} {
    switch -- [string tolower $local] none - "" {
        return [clock format $secs -format (D:%Y%m%d%H%M%S)]
    } utc - gmt {
        return [clock format $secs -format (D:%Y%m%d%H%M%SZ) -gmt 1]
    } full - local {
        set res [clock format $secs -format (D:%Y%m%d%H%M%S]
        set semilocal [clock format $secs -format "%Y%m%d %H:%M:%S"]
        set local [clock scan $semilocal -gmt 1]
        set offset [expr {$local - $secs}]
        if {$offset < 0} then {
            append res -
            set offset [expr {abs($offset)}]
        } else {
            append res +
        }
        # PDF dates require ASCII apostrophes in the +HH'mm' suffix.
        append res [clock format $offset -format "%H'%M')" -gmt 1]
        return $res
    } default {
        error "Unknown locality setting '$local'"
    }
}
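The time zone suffix that the full/local branch builds with clock format can also be computed with plain integer arithmetic, which makes the PDF format explicit: a sign, two-digit hours, an apostrophe, two-digit minutes and a closing apostrophe. A sketch with a hypothetical name, sketch::tz_suffix, taking the offset from UTC in seconds:

```tcl
namespace eval sketch {}

# Format a UTC offset in seconds as the PDF date suffix +HH'mm' or -HH'mm'.
proc sketch::tz_suffix {offset} {
    set sign [expr {$offset < 0 ? "-" : "+"}]
    set offset [expr {abs($offset)}]
    format {%s%02d'%02d'} $sign [expr {$offset / 3600}] [expr {$offset % 3600 / 60}]
}
```

So an offset of one hour east of UTC gives +01'00', which date_obj would append after the seconds field.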
Objects can also be streams, but those have a special relation to the file structure
and are therefore best treated in conjunction with that. In particular, streams
cannot be used as arguments of array_obj or dict_obj. The arguments of these
procedures can, however, be indirect references to objects of any type, but these
too are best treated in the context of the basic PDF file structure.
2.2 File structure<br />
The body of a PDF file consists of a sequence of indirect objects, which are mainly
a sort of declaration: a pair of integers is associated with an object value. Since
any composite object can (and in several cases must) contain a reference to any
indirect object, this makes it possible to build up arbitrary data structures. It
is however also a complication, since it requires a mechanism for allocating
these numbers.
pdf::file〈num〉 (array)
Every file that Tcl opens gets a unique identifier which is used in calls to puts and
similar commands. This identifier is also used as the name of an array in the pdf
namespace, in which the procedures below store all auxiliary information they need
to create a proper PDF file.
pdf::file〈num〉(!〈reference label〉)
pdf::file〈num〉(last_object_num)
In this API, references to indirect objects can be arbitrary strings, called reference
labels. The correspondence to the object numbers actually found in the file is given
by the !〈reference label〉 entries in the array of the file in question. The entries in
this array are lists with the structure
{object number} {generation number} {file position} ?
where the {file position} is present only if the indirect object in question has
already been written to file. The {object number} is the number of the object referred
to. The {generation number} is currently always zero; it appears that it can only
be nonzero for files that have been incrementally updated, and this API only supports
creating a file from scratch. The {file position} is the position in the file of the
beginning of the indirect object being referred to.
The last_object_num entry in the array holds the most recently allocated
object number. It is incremented whenever a new reference label is encountered.
pdf::obj_ref (proc)
The obj_ref procedure returns PDF code for an indirect reference to an object.
The syntax is
pdf::obj_ref {file} {reference label}
where {file} is the identifier of the PDF file in question. If the {reference label}
has not been encountered before for this particular file, then a new object number
is allocated for it.
proc pdf::obj_ref {F label} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    format {%d %d R} [lindex $A(!$label) 0] [lindex $A(!$label) 1]
}
pdf::begin_stream (proc)
pdf::end_stream (proc)
pdf::file〈num〉(current_stream)
pdf::file〈num〉(?〈reference label〉)
The begin_stream and end_stream procedures delimit the creation of a stream
object. Between two such commands, it is possible to write arbitrary text (usually
page content or some sort of embedded data) to the PDF file and have it
inserted correctly into the file as the data stored in the stream object.
The syntax for begin_stream is
pdf::begin_stream {file} {reference label} {key} {value} ∗
where {file} of course is the file to write to and {reference label} is the string that
should be used to reference this object. Each stream consists of one dictionary part
and one data part, where the primary task of the dictionary part is to specify how
the data part should be interpreted. The most important element in the dictionary
is the /Length key and its value; these are inserted by the begin_stream and
end_stream commands, so one need not worry about them. If, however, the data
part is encoded in some special way (for example, it might be compressed),
then it is necessary to include additional elements in the dictionary. This is what
the {key} and {value} arguments are for.
The current_stream entry in a PDF file array is set if and only if the current
position in that file is inside a stream. It is not possible to begin a new stream
when this entry is set. The value of this entry is a list with the structure
{reference label} {start}
where {reference label} is the reference label of the stream and {start} is the
position in the file of the first byte in the stream data. Both of these are needed
at end_stream to record the length of the stream data.
The ?〈reference label〉 entries are used for indirect objects that are lengths of the
stream whose reference label is the 〈reference label〉. They have the same syntax as
their ordinary ! counterparts.
proc pdf::begin_stream {F label args} {
    upvar #0 [namespace current]::$F A
    if {[info exists A(current_stream)]} then {
        error "There is already a stream ([lindex $A(current_stream) 0])\
          being written to in this file."
    }
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    set A(?$label) [list [incr A(last_object_num)] 0]
    lappend A(!$label) [tell $F]
    puts $F\
      [format {%d %d obj} [lindex $A(!$label) 0] [lindex $A(!$label) 1]]
    puts $F [eval\
      [list dict_obj /Length [format {%d 0 R} $A(last_object_num)]]\
      $args]
    puts $F stream
    set A(current_stream) [list $label [tell $F]]
}
The end_stream procedure takes the target file as its only argument. It finishes
off the stream as necessary. It also evaluates everything that has been placed in
the backlog of the file.
pdf::file〈num〉(backlog)
It is not possible to output a new indirect object while a stream is being written,
but the need for such an object may still be discovered at such a time. The
backlog entry provides a way around that limitation: this entry is a script that is
evaluated (and cleared) at the end of every end_stream, hence commands can be
delayed by appending them to this script instead of evaluating them immediately.
New commands are appended to the backlog, and must be preceded by a
command separator.
proc pdf::end_stream {F} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(current_stream)]} then {
        error "There is no stream to end."
    }
    set length [expr {[tell $F] - [lindex $A(current_stream) 1]}]
    set label [lindex $A(current_stream) 0]
    unset A(current_stream)
    puts $F "endstream endobj"
    lappend A(?$label) [tell $F]
    puts $F [format {%d %d obj %d endobj} [lindex $A(?$label) 0]\
      [lindex $A(?$label) 1] $length]
    eval "set A(backlog) {}; $A(backlog)"
}
pdf::put_obj (proc)
The put_obj procedure writes an object, given as PDF code, to a PDF file as the
indirect object with the given reference label. The syntax is
pdf::put_obj {file} {reference label} {object}
proc pdf::put_obj {F label obj} {
    upvar #0 [namespace current]::$F A
    if {[info exists A(current_stream)]} then {
        append A(backlog) \n [list put_obj $F $label $obj]
        return
    }
    if {![info exists A(!$label)]} then {
        incr A(last_object_num)
        set A(!$label) [list $A(last_object_num) 0]
    }
    lappend A(!$label) [tell $F]
    puts $F\
      [format {%d %d obj} [lindex $A(!$label) 0] [lindex $A(!$label) 1]]
    puts $F $obj
    puts $F endobj
}
pdf::rewrite_pdf (proc)
The rewrite_pdf procedure opens a new PDF file for writing and initialises the
associated data structures. The syntax is
pdf::rewrite_pdf {file name} 〈options〉
and the return value is the identifier of the file opened. The {file name} is of
course the name of that file. The 〈options〉 are zero or more of
-permissions {integer}
-header {string}
The permissions are the default permissions for the file in question. If this option
is not given, then no permissions argument is passed to open. The header is a string
that will be put first in the file (as header). It defaults to
%PDF-1.3
%åäö
(in UTF-8), where the first line is a standard header line, and the second line is
there to help some software understand that the file should be treated as a binary
file. Note that no newline is inserted after this string; be sure to include it in the
string if necessary.
proc pdf::rewrite_pdf {name args} {
    set Opt(-header) [encoding convertto utf-8 %PDF-1.3\n%\xe5\xe4\xf6\n]
    array set Opt $args
    if {[info exists Opt(-permissions)]} then {
        set F [open $name w $Opt(-permissions)]
    } else {
        set F [open $name w]
    }
    fconfigure $F -translation binary
    puts -nonewline $F $Opt(-header)
    upvar #0 [namespace current]::$F A
    array unset A
    set A(last_object_num) 0
    set A(backlog) ""
    return $F
}
pdf::close_pdf (proc)
The close_pdf procedure performs the non-trivial task of finishing off the PDF
file and closing it. The syntax is
pdf::close_pdf {file} {catalog label} {key} {value} ∗
and the return value is a report detailing any problems encountered (such as
objects that are referred to but never defined). This is a report rather than an
error, because in many cases there is no sharp distinction between a harmless
oddity and a fatal mistake. If the return value is non-empty, then there is probably
a bug in your program that needs to be fixed.
The {file} is the identifier of the file to write. The {catalog label} is the reference
label of the Catalog object in the document. The remaining arguments can be
used to insert additional information (such as a reference to the Info dictionary of
the document) in the trailer dictionary.
proc pdf::close_pdf {F label args} {
    upvar #0 [namespace current]::$F A
    set reportL [list]
The first step is to compile the cross-reference table <strong>of</strong> the document. I orig<strong>in</strong>ally<br />
made one subsection for each range <strong>of</strong> def<strong>in</strong>ed <strong>in</strong>direct objects, giv<strong>in</strong>g the<br />
mandatory free entry #0 a separate subsection, but for some reason Adobe s<strong>of</strong>tware<br />
didn’t like that at all. 2 Hence the current implementation is to make a<br />
cross-reference table with only one subsection, with an explicit free entry for every<br />
miss<strong>in</strong>g item.<br />

The xrA array constructed below is a prototype for the cross-reference section. It is indexed by object number and the entries have the list structure
{file position} {generation number} {type}
Just as in a PDF file, the {type} is either f or n depending on whether the entry is “free” or “in use”. The {file position} and {generation number} are, however, not padded with zeros, and the {file position} is initially an empty string in the “free” entries.

² Whether this means Adobe isn’t following their own standard I leave to others to decide. Neither GhostScript nor Quartz (the PDF-based graphics system in Mac OS X) seemed to have any problems with this arrangement.

This first round simply collects the information and detects collisions.

    set xrA(0) [list "" 65535 f]
    foreach lbl [array names A {[!?]*}] {
        set idx [lindex $A($lbl) 0]
        set ent [list [lindex $A($lbl) 2] [lindex $A($lbl) 1] n]
        if {[llength $A($lbl)]>3} then {
            # … (four lines of the original listing are missing here)
            lappend reportL "Multiple indirect objects\
              for label [string range $lbl 1 end]; at\
              [join [lrange $A($lbl) 2 end]]."
        }
        if {![info exists xrA($idx)]} then {
            set xrA($idx) $ent
        } elseif {[lindex $xrA($idx) 2]=="f" && [lindex $ent 2]=="n"}\
          then {
            lappend reportL "This shouldn't happen: There are several\
              reference labels for indirect object $idx. Using that with\
              label: [string range $lbl 1 end]"
            set xrA($idx) $ent
        } else {
            lappend reportL "This shouldn't happen: There are several\
              reference labels for indirect object $idx. Ignoring that\
              with label: [string range $lbl 1 end]"
        }
    }

The second round makes sure that there is a contiguous sequence of object numbers and constructs the linked list of free entries.

    set last_free 0
    set maxidx [lindex [lsort -integer -decreasing [array names xrA]] 0]
    for {set n $maxidx} {$n>=0} {incr n -1} {
        if {![info exists xrA($n)]} then {
            set xrA($n) [list "" 0 f]
            lappend reportL "This shouldn't happen: Object number $n was\
              allocated, but not assigned a reference label."
        }
        if {[lindex $xrA($n) 2]=="f"} then {
            set xrA($n) [lreplace $xrA($n) 0 0 $last_free]
            set last_free $n
        }
    }
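
To see what the second round produces, here is a small standalone sketch, not part of the pdf package, that applies the same gap filling and free-list linking to a Tcl dict instead of the xrA array. The proc name link_free_entries is hypothetical, and the sketch assumes Tcl 8.5 or later (for dict, {*} and tcl::mathfunc::max).

```tcl
# Hypothetical standalone sketch of the "second round" in pdf::close_pdf.
# entries: dict mapping object number -> {position generation type}
proc link_free_entries {entries} {
    # Object 0 always heads the free list, with generation 65535
    dict set entries 0 [list "" 65535 f]
    set maxidx [tcl::mathfunc::max {*}[dict keys $entries]]
    set last_free 0
    # Walk downwards, so each free entry points at the next higher free one
    # and the highest free entry points back to 0
    for {set n $maxidx} {$n >= 0} {incr n -1} {
        if {![dict exists $entries $n]} {
            # A missing object number becomes a free entry
            dict set entries $n [list "" 0 f]
        }
        if {[lindex [dict get $entries $n] 2] eq "f"} {
            dict set entries $n [lreplace [dict get $entries $n] 0 0 $last_free]
            set last_free $n
        }
    }
    return $entries
}

set xr [link_free_entries {1 {15 0 n} 3 {120 0 n}}]
puts [dict get $xr 0]   ;# 2 65535 f
puts [dict get $xr 2]   ;# 0 0 f
```

The free list here runs 0 → 2 → 0, which is the circular linking that a PDF cross-reference table expects.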

Now the cross-reference section can be written to file.

    set startxref [tell $F]
    puts $F xref
    puts $F [format {%d %d} 0 [expr {$maxidx + 1}]]
    for {set n 0} {$n …
    # … (twelve lines of the original listing are missing here)
    puts $F "startxref\n${startxref}\n%%EOF"

The final step is to close the file and compile the report.

    close $F
    join $reportL \n
}
〈/pkg〉
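
The lines that write the individual entries are missing from this copy of the listing, so the following is only a sketch based on the fixed layout the PDF specification prescribes for cross-reference entries: a 10-digit byte offset, a space, a 5-digit generation number, a space, the keyword f or n, and a two-character end-of-line, for exactly 20 bytes per entry. The procs xref_line and xref_section are hypothetical and are not claimed to match the missing code.

```tcl
# Hypothetical helpers: format cross-reference entries.
# Each entry is exactly 20 bytes: "oooooooooo ggggg t" plus space and LF.
proc xref_line {position generation type} {
    return [format "%010d %05d %s \n" $position $generation $type]
}

# entries: list of {position generation type} triples for objects 0, 1, 2, ...
proc xref_section {entries} {
    # A single subsection starting at object 0
    set out "xref\n0 [llength $entries]\n"
    foreach entry $entries {
        lassign $entry position generation type
        append out [xref_line $position $generation $type]
    }
    return $out
}

puts -nonewline [xref_section {{2 65535 f} {15 0 n} {0 0 f} {120 0 n}}]
```

This assumes the channel writes \n as a single LF (for example after fconfigure with -translation binary); with CR LF translation each entry would grow to 21 bytes, and the space before the newline would have to be dropped.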

2.3 Hello World

The code below creates a PDF file matching the basic “Hello World” example [1, Sec. A.2].

〈∗example1〉
set F [pdf::rewrite_pdf hello.pdf]
pdf::put_obj $F "The catalog" [pdf::dict_obj\
    /Type /Catalog\
    /Pages [pdf::obj_ref $F "The pages"]\
    /Outlines [pdf::obj_ref $F "The outlines"]]
pdf::put_obj $F "The outlines"\
    [pdf::dict_obj /Type /Outlines /Count [pdf::int_obj 0]]
pdf::put_obj $F "The pages" [pdf::dict_obj\
    /Type /Pages\
    /Count [pdf::int_obj 1]\
    /Kids [pdf::array_obj [pdf::obj_ref $F "Page 1"]]]
pdf::put_obj $F "Page 1" [pdf::dict_obj\
    /Type /Page\
    /Parent [pdf::obj_ref $F "The pages"]\
    /Resources [pdf::dict_obj\
        /Font [pdf::dict_obj /F1 [pdf::obj_ref $F "Helvetica"]]\
        /ProcSet [pdf::obj_ref $F "The procs"]]\
    /MediaBox [pdf::array_obj [pdf::int_obj 0] [pdf::int_obj 0]\
        [pdf::int_obj 612] [pdf::int_obj 792]]\
    /Contents [pdf::obj_ref $F "Page 1 contents"]]
pdf::begin_stream $F "Page 1 contents"
puts $F {BT}
puts $F {/F1 24 Tf}
puts $F {100 100 Td (Hello World) Tj}
puts $F {ET}
pdf::end_stream $F
pdf::put_obj $F "The procs" [pdf::array_obj /PDF /Text]
pdf::put_obj $F "Helvetica" [pdf::dict_obj /Type /Font /Subtype /Type1\
    /Name /F1 /BaseFont /Helvetica /Encoding /MacRomanEncoding]
pdf::close_pdf $F "The catalog"
〈/example1〉

3 Contents and resources

Most of what one actually sees of a PDF document is part of a content stream, which is the part of PDF most like a simplified PostScript file: a sequence of simple operators for drawing text and graphics, each preceded by its arguments. One difference, however, is that many types of data are not permitted within a content stream, because some aspects of the required forms of such data (indirect objects, dictionaries) are not permitted there. Instead, the content stream has to be supplemented by a resources dictionary, which locally associates names with objects, and these names are what one may use in the content stream.

The model used here to overcome this is to equip the internal representation of a content stream with a representation of the corresponding resources dictionary. Commands emitting operators that make use of such indirect resources should check whether these are present in the resources dictionary, and see to it that they are added if they are not. The resources dictionary is uniquely identified by the file identifier and the stream object label.

3.1 Resources representation

Resources dictionaries are kept in arrays, where each resource type (or equivalently: entry in the dictionary) has a separate entry. These entries are key–value lists where the keys are PDF name objects and the values are the underlying resource objects (normally indirect references). (An exception is the ProcSet entry, which is a straight list of names.) The resource type names are the same as in the PDF file, e.g. XObject, Font, and ProcSet; in other words, don’t include a leading slash.

Since explicit declaration of procsets was made obsolete in PDF 1.4, and wasn’t very useful earlier either, most of the support for specifying procsets has been removed from the pdf package, and the ProcSet entries instead default to listing all five procsets. If for some reason you wish to specify a smaller set of procsets, then set the ProcSet entry of your resources array to a list of the names of the procsets that you want to require.

The resource_dict_obj procedure returns the PDF dictionary object for the data kept in an array. The call syntax is
pdf::resource_dict_obj {array-name}
where the {array-name} refers to an array in the local context of the caller. If the array does not contain any ProcSet entry, then for compatibility such an entry listing all five procsets is inserted.

〈∗pkg〉
proc pdf::resource_dict_obj {arrname} {
    upvar 1 $arrname A
    set call [list dict_obj]
    if {![info exists A(ProcSet)]} then {
        lappend call /ProcSet {[/PDF/Text/ImageB/ImageC/ImageI]}
    }
    foreach type [array names A] {
        lappend call [name_obj $type]
        if {$type == "ProcSet"} then {
            lappend call [eval [linsert $A(ProcSet) 0 array_obj]]
        } else {
            lappend call [eval [linsert $A($type) 0 dict_obj]]
        }
    }
    eval $call
}
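
As an illustration of the structure resource_dict_obj builds, the hypothetical proc below formats the same kind of data directly as PDF source text. It is only a sketch: it takes a dict instead of an array (so that the output order is predictable), and it assumes the resource objects are already formatted PDF objects such as indirect references.

```tcl
# Hypothetical sketch: format resources data as a PDF dictionary string.
# res: dict mapping resource type (no slash) -> key-value list of name
# objects and resource objects; ProcSet maps to a plain list of names.
proc resource_dict_string {res} {
    set parts {}
    if {![dict exists $res ProcSet]} {
        # Same default as pdf::resource_dict_obj: all five procsets
        lappend parts /ProcSet {[/PDF /Text /ImageB /ImageC /ImageI]}
    }
    dict for {type entries} $res {
        lappend parts /$type
        if {$type eq "ProcSet"} {
            lappend parts "\[[join $entries]\]"
        } else {
            lappend parts "<< [join $entries] >>"
        }
    }
    return "<< [join $parts] >>"
}

puts [resource_dict_string [dict create Font {/F0 {5 0 R}}]]
# << /ProcSet [/PDF /Text /ImageB /ImageC /ImageI] /Font << /F0 5 0 R >> >>
```

The package itself builds the result through dict_obj, array_obj and name_obj, so the exact spacing of its output may differ from this sketch.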

When a content stream is being written to, the resources dictionary data of that stream is kept in the main array of that PDF file. The entry formats are the same as when kept in a separate array, but the entry names are prefixed by Resources/ to prevent name clashes.

The begin_contents and end_contents procedures are specialised forms of begin_stream and end_stream that, in addition to delimiting the creation of a stream object, manage the associated resources dictionary.

The syntax for begin_contents is
pdf::begin_contents {resources-array} {file} {reference label} {key} {value} ∗
where all arguments except {resources-array} are as for begin_stream. If this extra argument is nonempty, then it is the name (in the local context of the caller) of an array representing a resources dictionary; the procedure copies the contents of that array to the current resources dictionary for this file. If {resources-array} is empty, then the current resources dictionary is set to be empty.

proc pdf::begin_contents {arr F label args} {
    eval [list begin_stream $F $label] $args
    upvar #0 [namespace current]::$F A
    array unset A Resources/*
    if {[string length $arr]} then {
        upvar 1 $arr B
        foreach type [array names B] {
            set A(Resources/$type) $B($type)
        }
    }
}

The end_contents procedure has the syntax
pdf::end_contents {resources-array} {file}
It copies the current resources dictionary data for the {file} to the {resources-array} (a variable name in the local context of the caller) and then calls end_stream to end the current contents stream.

proc pdf::end_contents {arr F} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(current_stream)]} then {
        error "There is no stream to end."
    }
    upvar 1 $arr B
    foreach index [array names A Resources/*] {
        set type [string range $index 10 end]
        set B($type) $A($index)
    }
    end_stream $F
}
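
The bookkeeping in begin_contents and end_contents amounts to adding and stripping a Resources/ prefix; the number 10 in end_contents is simply the length of that prefix. The following self-contained sketch isolates that round trip. The names copy_in and copy_out are hypothetical, and the arrays stand in for the file’s main array and a caller’s resources array.

```tcl
# Hypothetical sketch of the prefix scheme used by pdf::begin_contents
# and pdf::end_contents.
proc copy_in {srcVar dstVar} {
    upvar 1 $srcVar src $dstVar dst
    # Start from an empty current resources dictionary
    array unset dst Resources/*
    foreach type [array names src] {
        set dst(Resources/$type) $src($type)
    }
}

proc copy_out {srcVar dstVar} {
    upvar 1 $srcVar src $dstVar dst
    foreach index [array names src Resources/*] {
        # "Resources/" has 10 characters, so the type starts at index 10
        set dst([string range $index 10 end]) $src($index)
    }
}

array set Res {Font {/F0 {5 0 R}}}
array set File {current_stream {Page 1 contents}}
copy_in Res File
copy_out File Out
puts $Out(Font)   ;# /F0 {5 0 R}
```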

The has_resource? procedure can be used to query whether a particular resource is present in the current resources dictionary. The syntax is
pdf::has_resource? {file} {type} {object} {name-var} ?
and the return value is 1 if the {object} is one of the objects listed under the {type} in the current resources dictionary of the file {file}, and 0 otherwise. If a {name-var} is specified and the return value is 1, then that variable in the local context of the caller will be set to the PDF name object associated with the given {object}.

〈∗obsolete〉
proc pdf::has_resource? {F type obj {namevar {}}} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(Resources/$type)]} then {return 0}
    if {$type == "ProcSet"} then {
        if {[lsearch -exact $A(Resources/$type) $obj] >= 0} then {
            if {[string length $namevar]} then {
                uplevel 1 [list ::set $namevar $obj]
            }
            return 1
        } else {
            return 0
        }
    }
    foreach {name resobj} $A(Resources/$type) {
        if {[string equal $resobj $obj]} then {
            if {[string length $namevar]} then {
                uplevel 1 [list ::set $namevar $name]
            }
            return 1
        }
    }
    return 0
}
〈/obsolete〉

The name_resource procedure provides a name object referring to an object and (if necessary) adds that object to the current resources dictionary of the file. The syntax is
pdf::name_resource {var-name} {file} {type} {object} {suggested name} ?
where {var-name} is the name of a variable in the local context of the caller that will be set to the name object referring to the specified resource. The result is 0 if the resource was already present and 1 if an entry for it was added.

The {file} is the identifier of the PDF file containing the stream for which this resource is to be made available. The {type} is the name (without the slash) of the resources dictionary entry where this resource should be placed, e.g. Font, XObject, etc. The {object} is the object that constitutes the resource to name. The {suggested name} argument can be used to request a particular name for the resource; it should be the PDF name object to give the resource. An error is raised if that name is already used for some other resource of that type. The {type} must not be ProcSet.

proc pdf::name_resource {varname F type obj {name {}}} {
    upvar #0 [namespace current]::$F A
    switch -- $type ColorSpace {
        set short_type /CS
    } XObject {
        set short_type /XO
    } ExtGState {
        set short_type /GS
    } Font {
        set short_type /F
    } Pattern {
        set short_type /Pat
    } ProcSet {
        error {If you really think you need to bother about procsets,\
          then access the array directly.}
    } Properties {
        set short_type /Prop
    } Shading {
        set short_type /Sh
    } default {
        set short_type /$type
    }
    if {![info exists A(Resources/$type)]} then {
        if {![string length $name]} then {
            set name ${short_type}0
        }
        set A(Resources/$type) [list $name $obj]
        uplevel 1 [list ::set $varname $name]
        return 1
    }
    if {[string length $name]} then {
        foreach {key val} $A(Resources/$type) {
            if {[string equal $key $name]} then {
                if {![string equal $obj $val]} then {
                    error "Name already in use for: $val"
                }
                uplevel 1 [list ::set $varname $name]
                return 0
            }
        }
        lappend A(Resources/$type) $name $obj
        uplevel 1 [list ::set $varname $name]
        return 1
    }
    set name "${short_type}[expr {[llength $A(Resources/$type)]/2}]"
    regsub -all {[\[\]?*\\]} $short_type {\\&} pattern
    append pattern *
    set free 1
    foreach {key val} $A(Resources/$type) {
        if {[string equal $val $obj]} then {
            uplevel 1 [list ::set $varname $key]
            return 0
        }
        if {[string equal $key $name]} then {set free 0}
        if {[string match $pattern $key]} then {
            set Used([string range $key [string length $short_type] end])\
              {}
        }
    }
    if {!$free} then {
        set n [expr {[llength $A(Resources/$type)]/2}]
        while {[info exists Used($n)]} {incr n}
        set name ${short_type}$n
    }
    lappend A(Resources/$type) $name $obj
    uplevel 1 [list ::set $varname $name]
    return 1
}
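
The name-picking rule in the last part of name_resource can be tried out in isolation. The hypothetical proc below applies the same rule to a plain key-value list: reuse the existing name if the object is already listed, otherwise try the type prefix followed by the number of existing entries, and if that name is taken, count upwards to the first unused number. Unlike the original, it tests the prefix with string first instead of a glob pattern, so no glob metacharacters need escaping.

```tcl
# Hypothetical standalone version of the naming rule in pdf::name_resource.
# entries: key-value list of name objects and resource objects.
proc pick_resource_name {entries short_type obj} {
    set n [expr {[llength $entries] / 2}]
    set name $short_type$n
    set free 1
    foreach {key val} $entries {
        # The object is already named: reuse that name
        if {$val eq $obj} {return $key}
        if {$key eq $name} {set free 0}
        # Record the numeric suffixes already used with this prefix
        if {[string first $short_type $key] == 0} {
            set used([string range $key [string length $short_type] end]) {}
        }
    }
    if {!$free} {
        while {[info exists used($n)]} {incr n}
        set name $short_type$n
    }
    return $name
}

puts [pick_resource_name {} /F {5 0 R}]                        ;# /F0
puts [pick_resource_name {/F0 {5 0 R} /F1 {6 0 R}} /F {6 0 R}] ;# /F1
puts [pick_resource_name {/F1 {5 0 R}} /F {9 0 R}]             ;# /F2
```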

The require_procsets procedure is called to make sure that certain ProcSets are listed in the current resources dictionary. The syntax is
pdf::require_procsets {file} {name obj} ∗
where {file} is the relevant file and the {name obj}s are the PDF name objects of the required ProcSets.

〈∗obsolete〉
proc pdf::require_procsets {F args} {
    upvar #0 [namespace current]::$F A
    if {![info exists A(Resources/ProcSet)]} then {
        set A(Resources/ProcSet) $args
    } else {
        set A(Resources/ProcSet) [lsort -dictionary -unique [
            concat $A(Resources/ProcSet) $args
        ]]
    }
}
〈/obsolete〉

3.2 Formatting content

The sprintf procedure formats data for writing to a PDF contents stream. The syntax is
pdf::sprintf {format list} {data} ∗
and the return value is the resulting PDF code.

The {format list} is similar to the formatting string of format, but every conversion specifier must be a separate list element. List elements that are not conversion specifiers are copied verbatim to the result. Material from different list elements is always separated by whitespace in the result.

As with format, the first character of a conversion specifier is always a ‘%’. The exact format is
%〈char〉(〈count〉(.〈precision〉)?)?
where (…)? marks an optional part; a 〈precision〉 field requires specifying a 〈count〉 because the conversion specifiers are parsed using scan. The 〈count〉 defaults to 1, and specifying a non-unit 〈count〉 is equivalent to specifying that many separate conversion specifiers in sequence. The 〈precision〉 is only used by real and length conversions.

The conversion character 〈char〉 specifies how the {data} should be converted. The basic conversions are
b  Boolean, to be formatted by boolean_obj.
i  Integer, to be formatted by int_obj.
l  Length, to be formatted by length_obj. This consumes two {data} arguments: one for the value and one for the unit.
n  String, to be formatted by name_obj.
o  Already formatted PDF object.
r  Real number, to be formatted by real_obj (with default precision according to the precision variable).
s  PDF string, to be formatted by string_obj.
In addition, the corresponding upper case letters select the same formatting, but the (first) {data} argument is interpreted as a list of things to process in the specified way. Finally, if the character is an & then the {data} is interpreted as a list
{format list} {data} ∗
which will be formatted by a recursive sprintf call and inserted into the result at that position. This is intended to simplify encoding structured data.

proc pdf::sprintf {format args} {
    variable precision
    set items [list]
    set n 0
    foreach spec $format {
        set count 1
        set prec $precision
        # scan returns 0 for an ordinary word, and -1 for an empty element
        # or a lone %; neither is a conversion specifier
        if {[scan $spec {%%%[bilnorsBILNORS&]%d.%d} code count prec] < 1}\
          then {
            lappend items $spec
        } else {
            for {} {$count>=1} {incr count -1; incr n} {
                set datum [lindex $args $n]
                switch -- $code "b" {
                    lappend items [boolean_obj $datum]
                } "i" {
                    lappend items [int_obj $datum]
                } "l" {
                    lappend items [
                        length_obj $datum [lindex $args [incr n]] $prec
                    ]
                } "n" {
                    lappend items [name_obj $datum]
                } "o" {
                    lappend items $datum
                } "r" {
                    lappend items [real_obj $datum $prec]
                } "s" {
                    lappend items [string_obj $datum]
                } "B" {
                    foreach d $datum {lappend items [boolean_obj $d]}
                } "I" {
                    foreach d $datum {lappend items [int_obj $d]}
                } "L" {
                    set unit [lindex $args [incr n]]
                    foreach d $datum {
                        lappend items [length_obj $d $unit $prec]
                    }
                } "N" {
                    foreach d $datum {lappend items [name_obj $d]}
                } "O" {
                    eval [list lappend items] $datum
                } "R" {
                    foreach d $datum {
                        lappend items [real_obj $d $prec]
                    }
                } "S" {
                    foreach d $datum {lappend items [string_obj $d]}
                } "&" {
                    lappend items [eval [linsert $datum 0 sprintf]]
                } default {
                    error "Bad pdf::sprintf format specifier '$spec'."
                }
            }
        }
    }
    join $items
}
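
To make the parsing of conversion specifiers concrete, here is a cut-down, self-contained sketch that supports only the o, i, r and s conversions, with counts and precisions parsed by the same scan pattern idea. The names mini_sprintf and mini_real are hypothetical, and their number and string formatting is an assumption standing in for int_obj, real_obj and string_obj, which may format differently.

```tcl
# Hypothetical number formatting: at most $prec decimals, trailing zeros dropped.
proc mini_real {x prec} {
    set s [format %.${prec}f $x]
    if {[string first . $s] >= 0} {
        set s [string trimright [string trimright $s 0] .]
    }
    return $s
}

# Hypothetical cut-down version of the pdf::sprintf idea.
proc mini_sprintf {format args} {
    set items {}
    set n 0
    foreach spec $format {
        set count 1
        set prec 3
        # scan reports how many of code/count/prec it could read
        if {[scan $spec {%%%[iors]%d.%d} code count prec] < 1} {
            lappend items $spec   ;# not a conversion: copy verbatim
            continue
        }
        for {} {$count >= 1} {incr count -1; incr n} {
            set datum [lindex $args $n]
            switch -- $code {
                i {lappend items [expr {int($datum)}]}
                o {lappend items $datum}
                r {lappend items [mini_real $datum $prec]}
                s {lappend items "([string map [list \\ \\\\ ( \\( ) \\)] $datum])"}
                default {error "bad conversion specifier '$spec'"}
            }
        }
    }
    return [join $items]
}

puts [mini_sprintf {BT %o %i Tf %r2 Td %s Tj ET} /F0 24 100 100 {Hello again, World!}]
# BT /F0 24 Tf 100 100 Td (Hello again, World!) Tj ET
```

The format list used here is the same as in the “Hello again” example, with /F0 standing for the font name that name_resource would assign.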

The printf procedure is an extension of sprintf that immediately writes the formatted string to a file rather than returning it. The syntax is
pdf::printf {file} {format list} {data} ∗

proc pdf::printf {F format args} {
    puts $F [eval [list sprintf $format] $args]
}
〈/pkg〉

3.3 Hello again, World

The code below achieves much the same thing as the example in Subsection 2.3, but this time using the resource management and data formatting provided for content streams.

〈∗example2〉
set F [pdf::rewrite_pdf helloagain.pdf]
pdf::put_obj $F "Helvetica" [pdf::dict_obj /Type /Font /Subtype /Type1\
    /BaseFont /Helvetica /Encoding /MacRomanEncoding]

(It turns out that the /Name entry of font dictionaries, which is included in 〈example1〉, has been deprecated for quite some time, although it is still present in the “Hello World” example of the PDF 1.5 specification.)

With resource management, the page contents are merely the following.

pdf::begin_contents "" $F "Page 1 contents"
pdf::name_resource Helvetica $F Font [pdf::obj_ref $F "Helvetica"]
pdf::printf $F {BT %o %i Tf %r2 Td %s Tj ET} $Helvetica 24 100 100 \
    {Hello again, World!}

Let’s also add some graphics: a green circle with centre (200, 200) and radius 50. PDF doesn’t have circular arcs, but the MetaFont four-segment approximation should do nicely. This places the control points
$\frac{4}{3}\bigl(1+\sqrt{2}\bigr)^{-1} \approx 0.552284749831$
of the radius from their nearest knot, and for a radius of 50 that is very nearly 27.6.

pdf::printf $F {%R rg} {0 1 0}
pdf::printf $F {%R m %R3 c %R3 c %R3 c %R3 c f}\
    {200 150}\
    {227.6 150} {250 172.4} {250 200}\
    {250 227.6} {227.6 250} {200 250}\
    {172.4 250} {150 227.6} {150 200}\
    {150 172.4} {172.4 150} {200 150}
pdf::end_contents Res1 $F
pdf::put_obj $F "Page 1" [pdf::dict_obj\
    /Type /Page\
    /Parent [pdf::obj_ref $F "The pages"]\
    /MediaBox [pdf::array_obj [pdf::int_obj 0] [pdf::int_obj 0]\
        [pdf::int_obj 612] [pdf::int_obj 792]]\
    /Resources [pdf::resource_dict_obj Res1]\
    /Contents [pdf::obj_ref $F "Page 1 contents"]]
pdf::put_obj $F "The pages" [pdf::dict_obj\
    /Type /Pages\
    /Count [pdf::int_obj 1]\
    /Kids [pdf::array_obj [pdf::obj_ref $F "Page 1"]]]
pdf::put_obj $F "The catalog"\
    [pdf::dict_obj /Type /Catalog /Pages [pdf::obj_ref $F "The pages"]]

There is really no point in making an /Outlines dictionary that would be empty anyway. Something there is a point in making, however, is a document information dictionary.

pdf::put_obj $F "Document info" [pdf::dict_obj\
    /Title [pdf::text_obj "Hello again, world!"]\
    /CreationDate [pdf::date_obj [clock seconds]] ]
pdf::close_pdf $F "The catalog" /Info [pdf::obj_ref $F "Document info"]
〈/example2〉
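
The circle coordinates in the example above were worked out by hand. As a sketch of how they follow from the control-point constant, the hypothetical proc circle_path computes the four-segment approximation for any centre and radius, rounding to one decimal as the example does (fmt1 is likewise a hypothetical helper). For centre (200, 200) and radius 50 it reproduces the path operators of the example.

```tcl
# Hypothetical helper: one decimal, with a trailing ".0" dropped.
proc fmt1 {x} {
    set s [format %.1f $x]
    regsub {\.0$} $s {} s
    return $s
}

# Hypothetical helper: path operators for a circle drawn as four cubic
# Bezier segments, counterclockwise starting from the bottom point.
proc circle_path {cx cy r} {
    # Control points lie (4/3)/(1+sqrt 2) of the radius from their knot
    set k [expr {4.0 / 3.0 / (1.0 + sqrt(2.0)) * $r}]
    # Offsets from the centre: start knot, then four (control, control, knot)
    set offsets [list \
        0 -$r \
        $k -$r    $r -$k    $r 0 \
        $r $k     $k $r     0 $r \
        -$k $r    -$r $k    -$r 0 \
        -$r -$k   -$k -$r   0 -$r]
    set coords {}
    foreach {dx dy} $offsets {
        lappend coords [fmt1 [expr {$cx + $dx}]] [fmt1 [expr {$cy + $dy}]]
    }
    set path "[lrange $coords 0 1] m"
    for {set i 2} {$i < [llength $coords]} {incr i 6} {
        append path " [lrange $coords $i [expr {$i + 5}]] c"
    }
    return "$path f"
}

puts [circle_path 200 200 50]
```

Its output could be written into a content stream with puts; the example itself passes the same numbers through pdf::printf instead.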

4 Document pages

4.1 The tree of pages

One of the quirks of PDF is the (very data-structure-oriented) requirement that, amongst other things, pages have to be organised in a tree structure, where links go not only from parent to child, but also from child to parent. This is definitely something that programmers shouldn’t have to bother with, so the pdf package can take care of generating such a structure when pages are merely appended sequentially to the document.

At the heart of the page tree generation lie the preliminary representations of Pages tree nodes, which have to be constructed before actual code can be written to the file. Every Pages node has an entry in the array of the file, and the contents of these entries are lists with the structure
{kid label} {kid count} +
where each pair of elements corresponds to one child node. The {kid label} is the reference label of that child and the {kid count} is the number of pages in its subtree.

Building a Pages tree necessarily means that Pages nodes, which are indirect objects, have to be created. That in turn means that they will have to be assigned labels, and in order to avoid clashes with labels used elsewhere, the user is required to specify a label prefix for the Pages tree system to use. This prefix is stored in the Pages/prefix entry of the file array.

The maximal number of children a node is allowed to have is kept in the Pages/arity entry. The number of the most recently created node is kept in the Pages/last entry.

The Pages/attributes entry is a list of keys and values to insert into the root Pages node.

The begin_pages procedure initialises the Pages tree system for a PDF file. The syntax is
pdf::begin_pages {file} {label prefix} {option} {value} ∗
where {file} is the identifier of the PDF file and {label prefix} will be used as the prefix of all reference labels created by the Pages tree system. An {option} {value} pair is either
-arity {arity}
or a pair of PDF objects, where the first is a name object. The {arity} is the maximal number of children a node is allowed to have; it defaults to 5. The PDF object pairs will be inserted into the root Pages node. Additional such items may be specified at end_pages.

〈∗pkg〉
proc pdf::begin_pages {F prefix args} {
    upvar #0 [namespace current]::$F A
    set A(Pages/arity) 5
    set A(Pages/attributes) [list]
    set A(Pages/prefix) $prefix
    foreach {option value} $args {
        switch -glob -- $option -arity {
            set A(Pages/arity) $value
        } /* {
            lappend A(Pages/attributes) $option $value
        } default {
            error "Unknown option: $option"
        }
    }
    set A(Pages/last) 1
    set A(Pages/1) [list]
}

The shipout procedure writes a Page object to a file and inserts it into the Pages tree of that file, after all pages previously inserted. The syntax is
pdf::shipout {file} {label} {key} {object} +
where {file} is the PDF file identifier and {label} is the reference label for the page object. The {key} and {object} arguments are attributes for the page object (keys and values for the dictionary). These should not include the /Type and /Parent attributes, which are inserted automatically.
proc pdf::shipout {F label args} {
    upvar #0 [namespace current]::$F A
    if {[llength $A(Pages/$A(Pages/last))]/2 >= $A(Pages/arity)} then {
        incr A(Pages/last)
        set A(Pages/$A(Pages/last)) [list]
    }
    put_obj $F $label [eval [linsert $args 0 dict_obj /Type /Page\
        /Parent [obj_ref $F $A(Pages/prefix)$A(Pages/last)]]]
    lappend A(Pages/$A(Pages/last)) $label 1
}
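You can check how shipout fills the leaf Pages nodes without writing any PDF. The standalone sketch below replays the test in pdf::shipout: whenever the current leaf node already holds {arity} children, a new one is opened. As a result, every leaf node except the last is full. The procedure name leaf_node_sizes is hypothetical and not part of the package.

```tcl
# Hypothetical helper, not part of the pdf package: replays the rule used by
# pdf::shipout and returns the number of Page children of each leaf Pages node.
proc leaf_node_sizes {pages arity} {
    set sizes [list 0]            ;# begin_pages starts with one empty node
    for {set p 1} {$p <= $pages} {incr p} {
        if {[lindex $sizes end] >= $arity} {
            lappend sizes 0       ;# current node is full: open a new one
        }
        lset sizes end [expr {[lindex $sizes end] + 1}]
    }
    return $sizes
}

puts [leaf_node_sizes 19 5]       ;# prints: 5 5 5 4
```

The multi-page example in section 4.4 uses 19 pages and the default arity 5. For that input this gives four leaf nodes, the last of which is one page short.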
pdf::end_pages (proc) The end_pages procedure completes the Pages tree for a PDF file and returns the reference label of the root object of that tree. The syntax is
pdf::end_pages {file} 〈attributes〉
where {file} is the PDF file identifier and 〈attributes〉 are attributes to insert into the root node of the Pages tree.
pdf::make_pages_nodes (proc) The make_pages_nodes procedure takes a list of numbers of Pages nodes that have not yet been written to file and writes objects for these nodes to file. The syntax is
pdf::make_pages_nodes {file} {node-list} {parent} ?
where {file} is the identifier of the PDF file and {node-list} is the list of node numbers. If there is a {parent} argument, then the Pages node with this number will be made the parent of the listed nodes, and the return value is the list of reference labels and page counts that need to be included in the Pages/〈parent〉 entry of the file's array. If there is no {parent} argument, then the procedure allocates a new Pages node and makes that the parent of the listed nodes; the result is then the number of the newly allocated parent.
proc pdf::make_pages_nodes {F nodeL {parent -1}} {
    upvar #0 [namespace current]::$F A
    if {$parent < 0} then {
        set p [incr A(Pages/last)]
    } else {
        set p $parent
    }
    set res [list]
    set parent_obj [obj_ref $F $A(Pages/prefix)$p]
    foreach i $nodeL {
        set count 0
        set kids [list array_obj]
        foreach {label c} $A(Pages/$i) {
            lappend kids [obj_ref $F $label]
            incr count $c
        }
        set label $A(Pages/prefix)$i
        lappend res $label $count
        put_obj $F $label [dict_obj /Type /Pages /Kids [eval $kids]\
            /Count [int_obj $count] /Parent $parent_obj]
    }
    if {$parent < 0} then {
        set A(Pages/$p) $res
        return $p
    } else {
        return $res
    }
}
The basic problem in end_pages is to construct the actual tree so that it is reasonably well balanced. The approach used below is to build the tree from the leaves to the root. Each new node gets as many children as possible. If the number of nodes in the current level is not divisible by the tree arity, the leftover nodes are moved up one level in the tree (well, towards the root; there is some disagreement as to whether that is up or down).
By allowing nodes to migrate to higher levels, one creates a risk that the tree becomes unbalanced. The procedure below manages this by keeping track of nodes that are saturated, i.e., nodes that have leaves at different depths in their subtrees. Saturated nodes are not allowed to migrate to a higher level, which ensures that the trees constructed are balanced. This is fairly easy to keep track of, because nodes can be chosen for migration so that the only node in a level that may be saturated is the last one. That works because at most arity minus one nodes ever need to migrate. Hence the nodes that migrated to the previous level, together with the saturated nodes in that level, are always few enough that the last node of the new level can be the parent of them all.
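The rule "collect as many as possible, move the rest up" can be illustrated on node counts alone. The sketch below starts from a given number of nodes and lists how many nodes each level has. It deliberately ignores the saturation rule and the first-level complications that end_pages handles. It therefore shows only how the level sizes shrink, not the exact tree end_pages builds. The name level_sizes is hypothetical.

```tcl
# Hypothetical helper: node counts per level when every full group of
# $arity nodes gets a new parent and the remainder (fewer than $arity
# nodes) is moved up unchanged. Saturation and the special first level
# of pdf::end_pages are ignored.
proc level_sizes {nodes arity} {
    set sizes [list $nodes]
    while {$nodes >= $arity} {
        # new parents plus the leftover nodes that migrate upwards
        set nodes [expr {$nodes / $arity + $nodes % $arity}]
        lappend sizes $nodes
    }
    if {$nodes > 1} {
        lappend sizes 1           ;# a single root collects what is left
    }
    return $sizes
}

puts [level_sizes 7 5]            ;# prints: 7 3 1
```

With seven nodes and arity 5, one full group gets a parent and two nodes migrate. The resulting three nodes then fit under the root.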
proc pdf::end_pages {F args} {
    upvar #0 [namespace current]::$F A
    array set Attr $A(Pages/attributes)
    array set Attr $args
The first level of Pages nodes to create has some special complications, so on a first read-through it is better to start with the main case.
In the first level, one faces two additional complications that are not present in the main case. The first is that the child nodes are Page nodes rather than Pages nodes, so they cannot be moved up to the forrest list being constructed. The second is that the parents of the Page nodes were fixed before all of their children had been created. This requires special handling of the last node: if it does not have a full set of children, it has to be moved up in a slightly unconventional manner.
    set limit $A(Pages/last)
    if {[llength $A(Pages/$limit)]/2 >= $A(Pages/arity)} then {
        set saturated 0
    } else {
        set d [expr {[llength $A(Pages/$limit)]/2}]
        set L [list]
        while {$d < $A(Pages/arity) && [incr limit -1]>=1} {
            set L [linsert $L 0 $limit]
            incr d
        }
        set last $A(Pages/last)
        set A(Pages/$last)\
            [concat [make_pages_nodes $F $L $last] $A(Pages/$last)]
        incr limit -1
        set saturated 1
    }
    set forrest [list]
    set L [list]
    for {set n 1} {$n <= $limit} {incr n} {
        lappend L $n
        if {[llength $L] >= $A(Pages/arity)} then {
            lappend forrest [make_pages_nodes $F $L]
            set L [list]
        }
    }
    if {[llength $L]} then {eval [list lappend forrest] $L}
    if {$saturated} then {lappend forrest $last}
In the main case, the numbers of those Pages nodes that have not yet been given a parent are kept in the forrest list. The basic approach is to build the next level starting from the left of this list, assigning as many nodes as allowed to each new parent node that is created.
The first complication is that the length of forrest need not be divisible by the specified tree arity. In this case, some nodes (those left in L below once the entire forrest has been processed) are simply moved up to the next level. This, however, leads to the next complication: if the last node in forrest is saturated, it may not be moved up. The limit variable reserves the nodes that will be made siblings of this last node.
    while {[llength $forrest] >= $A(Pages/arity)} {
        set newforrest [list]
        set limit\
            [expr {[llength $forrest] - ($saturated ? $A(Pages/arity) : 0)}]
        set L [list]
        # Group only the first $limit nodes; when the last node is saturated,
        # the final arity nodes are reserved for it (handled below).
        foreach n [lrange $forrest 0 [expr {$limit - 1}]] {
            lappend L $n
            if {[llength $L] >= $A(Pages/arity)} then {
                lappend newforrest [make_pages_nodes $F $L]
                set L [list]
            }
        }
        if {[llength $L]} then {eval [list lappend newforrest] $L}
        if {$saturated} then {
            lappend newforrest [make_pages_nodes $F [lrange $forrest\
                [format end-%d [expr {$A(Pages/arity)-1}]] end]]
        } elseif {[llength $L]} then {
            set saturated 1
        }
        set forrest $newforrest
    }
Here starts the endgame. The root node is special in that it has no parent but may receive many additional attributes.
    if {[llength $forrest] > 1} then {
        set root [make_pages_nodes $F $forrest]
    } else {
        set root [lindex $forrest 0]
    }
    set count 0
    set kids [list array_obj]
    foreach {label c} $A(Pages/$root) {
        lappend kids [obj_ref $F $label]
        incr count $c
    }
    set res $A(Pages/prefix)$root
    set Attr(/Count) [int_obj $count]
    set Attr(/Kids) [eval $kids]
    set Attr(/Type) /Pages
    put_obj $F $res [eval [list dict_obj] [array get Attr]]
    return $res
}
The above algorithm generates balanced trees of minimal size (minimal number of nodes). It does not, however, always generate trees of minimal height; the height may be one more than the minimum. Surprisingly, what decides this is a kind of odd/even phenomenon: the remainder of the total number of pages modulo the arity minus one. If one is lucky with this, the tree height attains the minimum; if one is unlucky, it comes out one larger than the possible minimum.
The arity minus one turns up because every node reduces the number of nodes without a parent by exactly one less than its number of children. The algorithm keeps assigning the maximal number of children to each node until the level is small enough that all its nodes can be made children of the root node. The catch is that the number of children of the root node is then determined by the remainder of the total number of pages modulo the arity minus one. This number may turn out too small to fit in the necessary number of pages unless the tree height is allowed to exceed the theoretical minimum.
Experiments indicate that placing the node with the fewest children at the first level instead always lets the tree fit within the minimal height, while keeping balance and minimal size. This is a bit tricky to do when the final number of pages is not known from the start, so the simpler algorithm above was chosen instead.
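The counting argument can be made concrete. Start with N pages that have no parent and end with a single root. A node with k children removes k - 1 parentless nodes, so the sum of k - 1 over all nodes equals N - 1. Assume, as described above, that every node except the root has exactly the arity a as its number of children. Then the root's child count k_r satisfies k_r - 1 ≡ N - 1 (mod a - 1), with 2 ≤ k_r ≤ a when N ≥ 2. The sketch below computes this k_r and, for comparison, the minimal height, i.e., the smallest h with a^h ≥ N. Both procedure names are hypothetical.

```tcl
# Hypothetical helpers for the counting argument above (not package procs).
# root_children: child count of the root when every other node has exactly
# $arity children; assumes pages >= 2 and arity >= 2.
proc root_children {pages arity} {
    return [expr {2 + ($pages - 2) % ($arity - 1)}]
}
# min_height: smallest h such that arity**h >= pages, i.e. the least number
# of Pages levels needed above the Page objects.
proc min_height {pages arity} {
    set h 1
    set capacity $arity
    while {$capacity < $pages} {
        set capacity [expr {$capacity * $arity}]
        incr h
    }
    return $h
}

puts [root_children 19 5]   ;# prints: 3
puts [min_height 19 5]      ;# prints: 2
```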
4.2 Lengths and rectangles
The default “user space” coordinate system in a PDF file, which is also the coordinate system used for e.g. links and destinations, uses the PostScript (or “big”) point as length unit. Since this is not the unit most people are comfortable with, it is useful to provide conversion from other units.
pdf::unit_factor (array) The unit_factor array is indexed by names of length units. Its entries are the lengths of these units in terms of PostScript points. The conversion factors are those of TeX [3, Ch. 10].
namespace eval pdf {
    # Declare the array so that [set] cannot resolve to a global variable.
    variable unit_factor
    set unit_factor(bp) 1.0
    set unit_factor(in) 72.0
    set unit_factor(pt) [expr {$unit_factor(in) / 72.27}]
    set unit_factor(pc) [expr {$unit_factor(pt) * 12}]
    set unit_factor(cm) [expr {$unit_factor(in) / 2.54}]
    set unit_factor(mm) [expr {$unit_factor(in) / 25.4}]
    set unit_factor(dd) [expr {$unit_factor(pt) * 1238 / 1157}]
    set unit_factor(cc) [expr {$unit_factor(dd) * 12}]
}
Additional units could be added, if need be. For example, in a context where the size of a screen pixel can be determined (and this size is unique, i.e., Tk is not operating against multiple screens with possibly different resolutions), it may be convenient to define a px or pixel entry for this unit.
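Such an entry could be registered as in the following sketch, which assumes a single screen resolution. The procedure name pdf::add_pixel_unit is hypothetical. Under Tk the resolution could be obtained with [winfo fpixels . 1i], which returns the number of pixels in one inch. Here it is a parameter instead, so the code also runs without Tk.

```tcl
namespace eval pdf {
    variable unit_factor
    set unit_factor(in) 72.0    ;# same value as in the table above
}
# Hypothetical helper: define px and pixel from a resolution given in
# pixels per inch (e.g. the result of [winfo fpixels . 1i] under Tk).
proc pdf::add_pixel_unit {pixels_per_inch} {
    variable unit_factor
    set unit_factor(px) [expr {$unit_factor(in) / $pixels_per_inch}]
    set unit_factor(pixel) $unit_factor(px)
}

pdf::add_pixel_unit 96.0
puts $pdf::unit_factor(px)       ;# prints: 0.75
```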
pdf::length (proc) This procedure handles conversion from a physical unit to PDF units. The syntax is
pdf::length {value} {unit}
where {unit} is a unit that has an entry in the unit_factor array and {value} is the numeric value in that unit.
proc pdf::length {value unit} {
    variable unit_factor
    return [expr {$value * $unit_factor($unit)}]
}
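The conversion is a single multiplication, so going the other way (for example, to report a coordinate in millimetres) is a single division. The sketch below shows this inverse. The procedure name pdf::from_bp is hypothetical. The two table entries it needs are repeated so that the sketch runs on its own.

```tcl
namespace eval pdf {
    variable unit_factor
    set unit_factor(in) 72.0
    set unit_factor(mm) [expr {$unit_factor(in) / 25.4}]
}
# Hypothetical inverse of pdf::length: express a length given in bp
# (PostScript points) in another unit from the unit_factor table.
proc pdf::from_bp {value unit} {
    variable unit_factor
    return [expr {$value / $unit_factor($unit)}]
}

puts [pdf::from_bp 72 in]                       ;# prints: 1.0
puts [format %.1f [pdf::from_bp 595.2756 mm]]   ;# prints: 210.0
```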
pdf::length_obj (proc) This procedure combines the unit conversion of the length procedure with the formatting of real_obj. The syntax is
pdf::length_obj {value} {unit} {precision} ?
where {unit} is a unit that has an entry in the unit_factor array, {value} is the numeric value in that unit, and {precision} is as for real_obj.
proc pdf::length_obj {value unit args} {
    if {[llength $args]==0} then {
        real_obj [length $value $unit]
    } elseif {[llength $args]==1} then {
        real_obj [length $value $unit] [lindex $args 0]
    } else {
        error "Too many arguments."
    }
}
A data structure that is common in PDF documents is the rectangle. Below are some commands for operating on rectangles in the form of a four-element list
{left} {bottom} {right} {top}
pdf::rect_obj (proc) This procedure returns the PDF object (a PDF array) for a rectangle. The syntax is
pdf::rect_obj {rectangle}
and the rectangle coordinates are encoded using real_obj with the default precision.
proc pdf::rect_obj {R} {
    array_obj [real_obj [lindex $R 0]] [real_obj [lindex $R 1]]\
        [real_obj [lindex $R 2]] [real_obj [lindex $R 3]]
}
pdf::int_rect_obj (proc) This procedure returns the PDF object (a PDF array) for a rectangle, after having rounded its coordinates to integers. The syntax is
pdf::int_rect_obj {rectangle}
and the rectangle coordinates are encoded using int_obj.
proc pdf::int_rect_obj {R} {
    array_obj [int_obj [expr {round([lindex $R 0])}]]\
        [int_obj [expr {round([lindex $R 1])}]]\
        [int_obj [expr {round([lindex $R 2])}]]\
        [int_obj [expr {round([lindex $R 3])}]]
}
pdf::make_rect (proc) The make_rect procedure is a generic tool for making rectangles with specified dimensions. The syntax is
pdf::make_rect {option} {value} {unit} ? +
where {option} is one of the following:
-width  Distance from left to right
-height  Distance from bottom to top
-left  left
-right  right
-top  top
-bottom  bottom
-ll  {left bottom}
-lr  {right bottom}
-ul  {left top}
-ur  {right top}
-center  midpoint
-midx  x-coordinate of midpoint
-midy  y-coordinate of midpoint
The options are processed left to right, and every option contributes some information about the wanted rectangle. As soon as enough information has been collected to determine all four coordinates, the rectangle is returned; any remaining options are ignored. Depending on the option, the {value} is either a number or a point (list of two numbers). The {unit} is the unit of the {value}; it defaults to bp if omitted.
In the first processing step, horizontal and vertical information is separated and values are converted to bp units. Information is collected in two arrays X and Y, whose entries have the following meanings:
lo  low coordinate (left or bottom)
hi  high coordinate (right or top)
mid  midpoint coordinate
sz  size (width or height)
proc pdf::make_rect {args} {
    variable unit_factor
    lappend args -break
    set i 0
    foreach a $args {
        if {[array size X]>=2 && [array size Y]>=2} then {break}
        if {$i == 0} then {
            set option $a
        } elseif {$i == 1} then {
            set value $a
        } else {
            if {[info exists unit_factor($a)]} then {
                set factor $unit_factor($a)
            } else {
                set i 0
                set factor 1.0
            }
            switch -- $option {
                -width {set X(sz) [expr {$value * $factor}]}
                -height {set Y(sz) [expr {$value * $factor}]}
                -left {set X(lo) [expr {$value * $factor}]}
                -right {set X(hi) [expr {$value * $factor}]}
                -bottom {set Y(lo) [expr {$value * $factor}]}
                -top {set Y(hi) [expr {$value * $factor}]}
                -midx {set X(mid) [expr {$value * $factor}]}
                -midy {set Y(mid) [expr {$value * $factor}]}
                -center {
                    set X(mid) [expr {[lindex $value 0] * $factor}]
                    set Y(mid) [expr {[lindex $value 1] * $factor}]
                }
                -ll {
                    set X(lo) [expr {[lindex $value 0] * $factor}]
                    set Y(lo) [expr {[lindex $value 1] * $factor}]
                }
                -lr {
                    set X(hi) [expr {[lindex $value 0] * $factor}]
                    set Y(lo) [expr {[lindex $value 1] * $factor}]
                }
                -ul {
                    set X(lo) [expr {[lindex $value 0] * $factor}]
                    set Y(hi) [expr {[lindex $value 1] * $factor}]
                }
                -ur {
                    set X(hi) [expr {[lindex $value 0] * $factor}]
                    set Y(hi) [expr {[lindex $value 1] * $factor}]
                }
                -end {
                    error "Insufficient information"
                }
                default {
                    error "Unknown option: $option"
                }
            }
            if {$i == 0} then {
                set option $a
            } else {
                set i -1
            }
        }
        incr i
    }
In the second processing step, the two pieces of information that have been specified for each axis are used to compute the ones that are needed.
    if {[array size X] > 2} then {
        error "More than two horizontal data given."
    }
    if {[array size Y] > 2} then {
        error "More than two vertical data given."
    }
    foreach a {X Y} {
        switch -- [lsort [array names $a]] {lo sz} {
            set ${a}(hi) [expr {[set ${a}(lo)] + [set ${a}(sz)]}]
        } {hi sz} {
            set ${a}(lo) [expr {[set ${a}(hi)] - [set ${a}(sz)]}]
        } {lo mid} {
            set ${a}(hi) [expr {2*[set ${a}(mid)] - [set ${a}(lo)]}]
        } {hi mid} {
            set ${a}(lo) [expr {2*[set ${a}(mid)] - [set ${a}(hi)]}]
        } {mid sz} {
            set ${a}(lo) [expr {[set ${a}(mid)] - 0.5*[set ${a}(sz)]}]
            set ${a}(hi) [expr {[set ${a}(mid)] + 0.5*[set ${a}(sz)]}]
        } {hi lo} {
            # Both ends were given directly; nothing to compute.
        } default {
            # Fewer than two data for this axis.
            error "Insufficient information"
        }
    }
    return [list $X(lo) $Y(lo) $X(hi) $Y(hi)]
}
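The second step works the same way on each axis: any two of lo, hi, mid, and sz determine the interval. The following one-axis sketch makes that rule explicit. It uses a dict instead of the X and Y arrays, which assumes Tcl 8.5 or later. The procedure name solve_interval is hypothetical.

```tcl
# Hypothetical one-axis version of the second step of pdf::make_rect:
# given a dict with exactly two of lo, hi, mid, sz, return {lo hi}.
proc solve_interval {known} {
    set keys [lsort [dict keys $known]]
    if {[llength $keys] != 2} {
        error "Need exactly two of lo, hi, mid, sz"
    }
    dict with known {}            ;# turns the given keys into local variables
    switch -- $keys {
        {lo sz}  {set hi [expr {$lo + $sz}]}
        {hi sz}  {set lo [expr {$hi - $sz}]}
        {lo mid} {set hi [expr {2*$mid - $lo}]}
        {hi mid} {set lo [expr {2*$mid - $hi}]}
        {mid sz} {
            set lo [expr {$mid - 0.5*$sz}]
            set hi [expr {$mid + 0.5*$sz}]
        }
        {hi lo}  {}
        default  {error "Insufficient information"}
    }
    return [list $lo $hi]
}

puts [solve_interval {mid 50 sz 20}]   ;# prints: 40.0 60.0
```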
pdf::standard_rect (proc) The standard_rect procedure exchanges high and low coordinates of a rectangle as needed to ensure that height and width are non-negative. The syntax is
pdf::standard_rect {rect}
and the return value is the standardized rectangle.
proc pdf::standard_rect {R} {
    foreach {l b r t} $R {break}
    if {$l > $r} then {foreach {l r} [list $r $l] {break}}
    if {$b > $t} then {foreach {b t} [list $t $b] {break}}
    return [list $l $b $r $t]
}
pdf::inset_rect (proc) The inset_rect procedure moves the sides of a rectangle by specified lengths. There are three syntaxes:
pdf::inset_rect {rect} {amount} {unit}
pdf::inset_rect {rect} {dx} {dy} {unit}
pdf::inset_rect {rect} {dl} {db} {dr} {dt} {unit}
where {rect} is the rectangle to inset and {unit} is the length unit in which the inset amount is specified. Positive amounts make the rectangle smaller; negative amounts make it larger. The result is the new rectangle.
In the first form, all sides are moved by the same {amount}. In the second form, the left and right sides are moved by {dx} and the top and bottom sides are moved by {dy}. In the third form, the left, bottom, right, and top sides are moved by {dl}, {db}, {dr}, and {dt} respectively.
proc pdf::inset_rect {R args} {
    if {[llength $args] != 2 && [llength $args] != 3 && [llength $args]\
        != 5} then {
        error "Wrong number of arguments"
    }
    set factor [length 1 [lindex $args end]]
    set args [lrange $args 0 end-1]
    set D [lrange [concat $args $args $args $args] 0 3]
    set res [list]
    foreach a $R da $D sign {1 1 -1 -1} {
        lappend res [expr {$a + $da*$factor*$sign}]
    }
    return $res
}
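The trick that lets inset_rect accept one, two, or four amounts is the line that repeats the amounts four times and keeps the first four elements. The sign list {1 1 -1 -1} then moves the left and bottom sides inwards by adding and the right and top sides inwards by subtracting. The sketch below isolates both steps for lengths already given in bp. The names expand_insets and inset_bp are hypothetical.

```tcl
# Hypothetical helper: turn 1, 2, or 4 amounts into {dl db dr dt} exactly
# as pdf::inset_rect does.
proc expand_insets {amounts} {
    lrange [concat $amounts $amounts $amounts $amounts] 0 3
}
# Hypothetical helper: inset_rect without unit conversion (amounts in bp).
proc inset_bp {rect amounts} {
    set res [list]
    foreach a $rect da [expand_insets $amounts] sign {1 1 -1 -1} {
        lappend res [expr {$a + $da*$sign}]
    }
    return $res
}

puts [expand_insets {3 4}]            ;# prints: 3 4 3 4
puts [inset_bp {0 0 100 50} {10 5}]   ;# prints: 10 5 90 45
```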
pdf::offset_rect (proc) The offset_rect procedure moves a rectangle in the plane, but preserves its width and height. The syntax is
pdf::offset_rect {rect} {dx} {dy} {unit} ?
where {rect} is the rectangle, {dx} and {dy} are the horizontal and vertical displacement amounts, and {unit} is the unit (which defaults to bp) of these amounts. The return value is the offset rectangle.
proc pdf::offset_rect {R dx dy {unit bp}} {
    set factor [length 1 $unit]
    set res [list]
    foreach {x y} $R {
        lappend res [expr {$x + $factor*$dx}] [expr {$y + $factor*$dy}]
    }
    return $res
}
pdf::wh_rect (proc) This procedure returns the list
{left} {bottom} {width} {height}
that corresponds to a rectangle. The syntax is
pdf::wh_rect {rect}
This procedure may be used to convert a rectangle to the list of operands required by the re PDF operator.
proc pdf::wh_rect {rect} {
    list [lindex $rect 0] [lindex $rect 1]\
        [expr {[lindex $rect 2] - [lindex $rect 0]}]\
        [expr {[lindex $rect 3] - [lindex $rect 1]}]
}
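In a content stream, the re operator takes x, y, width, and height operands and appends a rectangle to the current path. The sketch below does the same computation as wh_rect and formats the result as content-stream text. For brevity it uses Tcl's %g format, which keeps six significant digits. The procedure name re_operator is hypothetical, and the sketch assumes Tcl 8.5 or later for lassign.

```tcl
# Hypothetical helper: format a {left bottom right top} rectangle as the
# operands and operator of the PDF "re" path operator.
proc re_operator {rect} {
    lassign $rect l b r t
    format "%g %g %g %g re" $l $b [expr {$r - $l}] [expr {$t - $b}]
}

puts [re_operator {10 20 110 70}]   ;# prints: 10 20 100 50 re
```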
4.3 Paper sizes
pdf::paper_rect (array) It is convenient to have some standard paper sizes readily available as rectangles. The paper_rect array is initialised with a couple of these.
namespace eval pdf {
    variable paper_rect
    set paper_rect(A4) [make_rect -ll {0 0} -width 210 mm -height 297 mm]
    set paper_rect(A4R)\
        [make_rect -ll {0 0} -width 297 mm -height 210 mm]
    set paper_rect(letter)\
        [make_rect -ll {0 0} -width 8.5 in -height 11 in]
    set paper_rect(legal)\
        [make_rect -ll {0 0} -width 8.5 in -height 14 in]
}
〈/pkg〉
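Other ISO A sizes follow from the rule in ISO 216. A0 measures 841 × 1189 mm, and each following size halves the longer side, rounded down to whole millimetres. The standalone sketch below derives them instead of typing them in. The names iso_a_size_mm and iso_a_rect are hypothetical. With the package loaded, a size could of course also be added directly, e.g. set pdf::paper_rect(A5) [pdf::make_rect -ll {0 0} -width 148 mm -height 210 mm].

```tcl
# Hypothetical helper: width and height in mm of ISO A-series size n.
proc iso_a_size_mm {n} {
    set w 841
    set h 1189
    for {set i 0} {$i < $n} {incr i} {
        # halve the long side (integer division rounds down), then swap
        lassign [list [expr {$h / 2}] $w] w h
    }
    return [list $w $h]
}
# Hypothetical helper: portrait rectangle in bp with lower left corner at 0 0.
proc iso_a_rect {n} {
    lassign [iso_a_size_mm $n] w h
    set mm [expr {72.0 / 25.4}]
    return [list 0 0 [expr {$w * $mm}] [expr {$h * $mm}]]
}

puts [iso_a_size_mm 4]   ;# prints: 210 297
puts [iso_a_size_mm 5]   ;# prints: 148 210
```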
4.4 A multi-page example
The purpose of the following is mainly to generate a multipage document to test the page tree generation. Hence the actual document length (in pages) is factored out as a parameter set in the first line.
〈∗example3〉
set document_pages 19
set F [pdf::rewrite_pdf {pages.pdf}]
The next couple of lines determine the page layout. The rectangle paper determines the page size. Every page contains the rectangle frame as a graphic. foot_x and foot_y are the coordinates of the page foot.
set paper $pdf::paper_rect(A4)
set frame [pdf::inset_rect $paper 41 60 41 30 mm]
set foot_y [expr {[lindex $frame 1] - [pdf::length 36 pt]}]
set foot_x [expr {0.5*[lindex $frame 0] + 0.5*[lindex $frame 2]}]
This is preparation for writing the page numbers. First, a font is needed. Second, I want the page numbers to be centered, which means I need to measure the width of the string before showing it. Luckily the digits in Times-Roman are all half an em wide. The 0.25*$size is thus half the width of a digit.
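The centring arithmetic can be checked in isolation. A string of n digits at font size s is 0.5·s·n wide, so it must start 0.25·s·n to the left of the centre. The standalone sketch below uses a hypothetical procedure name and plain numbers instead of the foot_x and size values of the example.

```tcl
# Hypothetical standalone check of the centring in put_page_no: each
# Times-Roman digit is 0.5 em wide, so n digits start 0.25*size*n left
# of the centre.
proc page_no_start_x {center size num} {
    set text [format %d $num]
    expr {$center - 0.25*$size*[string length $text]}
}

puts [page_no_start_x 297.0 10.0 19]   ;# prints: 292.0
```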
pdf::put_obj $F "Times" [pdf::dict_obj /Type /Font /Subtype /Type1\
    /Name /F1 /BaseFont /Times-Roman /Encoding /MacRomanEncoding]
proc put_page_no {F num} {
    global foot_x foot_y
    pdf::name_resource Times $F Font [pdf::obj_ref $F "Times"]
    set size [pdf::length 10 pt]
    set thepage [format %d $num]
    pdf::printf $F {BT %o %r Tf 1 0 0 1 %r2 Tm %s Tj ET} $Times $size\
        [expr {$foot_x - 0.25*$size*[string length $thepage]}] $foot_y\
        $thepage
}
pdf::begin_pages $F "Pages\#" /MediaBox [pdf::rect_obj $paper]
array unset Rez
for {set page 1} {$page
pdf::put_obj $F "The catalog" [pdf::dict_obj\
    /Type /Catalog\
    /Pages [pdf::obj_ref $F $Pages]]
pdf::close_pdf $F "The catalog"
〈/example3〉
5 Document outline
The “outline” of a PDF document is the table of contents that one often sees in a separate pane next to the pane actually showing some page of the document. The procedures below handle building the data structure encoding this, while leaving it to the user to provide the links to actual document content.
5.1 Low-level stuff
As with the Pages tree, building an outline tree involves automatically creating nodes for the tree. (This node creation could have been made explicit, but there doesn't seem to be much point in that.) To prevent the labels of these nodes from clashing with the labels of other objects, all outline node labels begin with a special prefix, which is stored in the Outlines/prefix entry of the file array. The rest of the label is a decimal number which is assigned sequentially. The most recently assigned number is kept in the Outlines/last entry.
The information kept for building an outline tree is divided into two scopes. Things that are relevant only to the current position in the tree are kept in Outline/〈string〉 entries, whereas things that are more generally relevant are kept in Outlines/〈string〉 entries (note the extra s). The Outlines/stack entry holds a stack onto which the current position can be pushed and from which it can later be popped. This stack is a list whose last element is the topmost. The elements themselves are the results of an array get for all Outline/〈string〉 entries in the file array.
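A minimal sketch of such a stack follows. It assumes only what is stated above: the saved state is the array get of all Outline/* entries, and the last list element is the top. The procedure names are hypothetical, as is the prefix value. The package itself keeps the state in its per-file namespace array, whereas the sketch uses an ordinary array F.

```tcl
# Hypothetical push/pop helpers for the Outlines/stack entry.
# The glob pattern Outline/* matches Outline/parent etc. but not
# Outlines/stack, because of the extra s.
proc push_outline_state {arrName} {
    upvar 1 $arrName A
    lappend A(Outlines/stack) [array get A Outline/*]
}
proc pop_outline_state {arrName} {
    upvar 1 $arrName A
    set state [lindex $A(Outlines/stack) end]        ;# topmost element
    set A(Outlines/stack) [lrange $A(Outlines/stack) 0 end-1]
    array unset A Outline/*                          ;# drop current position
    array set A $state                               ;# restore saved one
}

array set F {Outlines/prefix Outline# Outlines/last 3 Outlines/stack {}
             Outline/parent 3 Outline/count 0}
push_outline_state F
set F(Outline/parent) 7
pop_outline_state F
puts $F(Outline/parent)           ;# prints: 3
```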
The current state of the tree construction is, in a sense, located slightly below
the level where nodes are being added. The links between this level and its parent
are stored in the Outline/parent, Outline/first, and Outline/last entries in the
file array. All three are numbers which, when appended to the prefix, produce the
node labels.
Outline/parent is the parent node, Outline/first is the first child of the
parent, and Outline/last is the (currently) last child of the parent.
Outline/count is the number of children of the parent, including the visible
descendants of any open child nodes. If this is zero then the current level in
the outline hierarchy is empty, which amongst other things implies that
Outline/first and Outline/last have not been initialised.
Outline/prev is, if it is set, the number of the predecessor (on the same level)
of the node currently being constructed. It should not be set when the node is
the first node on that level.
Outline/open is a boolean for whether the current node should be open (i.e.,
whether its children, if it gets any, are visible by default).
pdf::file〈num〉 (Outline//〈name〉)
Explicit PDF objects for outline dictionaries are also stored in Outline/ entries
of the file array. In this case, the index suffix is the PDF name object for the
dictionary key; the /Title of the current node, for example, is kept in the
Outline//Title entry.
pdf::put_outline_node (proc)
The put_outline_node procedure outputs the current node of an outline to the file.
The syntax is
pdf::put_outline_node {file} {option} {value} ∗
where {file} is the identifier of the PDF file. Each {option} {value} is a pair of
PDF objects, where the first is a name object. These objects are placed in the PDF
dictionary object for this node, overriding any pair previously stored with the
same {option}. There is no particular return value.
The procedure clears the Outline//〈name〉 part of the file array. It does not
itself generate any /First, /Last, or /Count items, and it does not increment
Outline/count, because that is the responsibility of the procedure that allocated
a node number for this node.
〈∗pkg〉
proc pdf::put_outline_node {F args} {
    upvar #0 [namespace current]::$F A
    foreach name [array names A Outline//*] {
        set N([string range $name 8 end]) $A($name)
    }
    foreach {name value} $args {
        if {[string match /* $name]} then {
            set N($name) $value
        } else {
            error "Bad option '$name'"
        }
    }
    if {[info exists A(Outline/prev)]} then {
        set N(/Prev) [obj_ref $F $A(Outlines/prefix)$A(Outline/prev)]
    }
    set N(/Parent) [obj_ref $F $A(Outlines/prefix)$A(Outline/parent)]
    put_obj $F $A(Outlines/prefix)$A(Outline/last) [
        eval [linsert [array get N] 0 dict_obj]
    ]
    array unset A Outline//*
}
pdf::outline_node_set (proc)
The outline_node_set procedure sets fields for the current outline node. The
syntax is one of
pdf::outline_node_set {file} {args}
pdf::outline_node_set {file} {option} {value} ∗
where {file} is the identifier of the PDF file. The first form is merely a variant
of the second, in which {args} is a single list holding the arguments that would
otherwise follow {file}.
An {option} {value} is either a pair of PDF objects, where the first is a name
object, or
-open {boolean}
The -open option sets the open state of the current node. Other options set an
entry in the dictionary object for the current node. There is no particular return
value.
proc pdf::outline_node_set {F args} {
    upvar #0 [namespace current]::$F A
    if {[llength $args] == 1} then {set args [lindex $args 0]}
    foreach {option value} $args {
        switch -glob -- $option /* {
            set A(Outline/$option) $value
        } -open {
            set A(Outline/open) $value
        } default {
            error "Bad option '$option'"
        }
    }
}
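The division of labour between these two procedures can be imitated on a plain array, as in the sketch below. Three hypothetical names are used for illustration:

- node_set mirrors outline_node_set.
- dict_text stands in for put_outline_node. Instead of calling pdf::dict_obj and writing an object, it returns the dictionary as text, with keys sorted for predictable output.
- set_via_wrapper shows why the one-argument form is convenient when forwarding an args list.

```tcl
# Hypothetical stand-in for pdf::outline_node_set working on a plain array.
proc node_set {arrName args} {
    upvar 1 $arrName A
    # A single argument is taken to be the complete option/value list
    if {[llength $args] == 1} {set args [lindex $args 0]}
    foreach {option value} $args {
        switch -glob -- $option {
            /*      {set A(Outline/$option) $value}
            -open   {set A(Outline/open) $value}
            default {error "Bad option '$option'"}
        }
    }
}

# Hypothetical stand-in for pdf::put_outline_node: merges the stored
# Outline//<name> entries with explicit name/value arguments and returns
# the dictionary as text (keys sorted) instead of writing a PDF object.
proc dict_text {arrName args} {
    upvar 1 $arrName A
    set N [dict create]
    foreach name [array names A Outline//*] {
        # "Outline/" is 8 characters, so index 8 is the "/" that starts
        # the PDF name: Outline//Title -> /Title
        dict set N [string range $name 8 end] $A($name)
    }
    foreach {name value} $args {
        if {![string match /* $name]} {error "Bad option '$name'"}
        dict set N $name $value   ;# explicit arguments win
    }
    array unset A Outline//*
    set parts {}
    foreach key [lsort [dict keys $N]] {lappend parts $key [dict get $N $key]}
    return "<< [join $parts { }] >>"
}

# A caller forwarding its own args uses the one-argument form.
proc set_via_wrapper {arrName args} {
    upvar 1 $arrName A
    node_set A $args
}

array set node {}
set_via_wrapper node /Title {(Chapter 1)} -open 1
puts [dict_text node /Next {7 0 R}]   ;# << /Next 7 0 R /Title (Chapter 1) >>
```

The wrapper also works when called without options. The empty args list is then passed on as a single empty argument, which node_set unpacks into no option/value pairs at all.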
pdf::outline_item (proc)
The outline_item procedure creates a new outline node at the current level. If
there already is a current outline node, then that is output first. The syntax is
pdf::outline_item {file} {title} {option} {value} ∗
where {file} is the identifier of the PDF file and {title} is the title of the new
outline node. An {option} {value} is either a pair of PDF objects, where the first
is a name object, or
-open {boolean}
The -open option sets the open state of the new node; it defaults to 0. The PDF
objects will be placed in the dictionary object for the new node. There is no
particular return value.
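Before the code, it may help to see the bookkeeping for a single level in isolation. The hypothetical procedure link_siblings below simulates repeated outline_item calls without writing anything:

- Numbers come from one shared counter that starts at 1 (the root).
- The first item is remembered like Outline/first.
- Each earlier sibling gets its /Next link when its successor is allocated.

It is a sketch of the idea, not the package's code.

```tcl
# Hypothetical simulation of repeated outline_item calls on one level.
# Node 1 is the outline root, so the first item gets number 2.
proc link_siblings {titles} {
    set last 1          ;# plays the role of Outlines/last
    set count 0         ;# plays the role of Outline/count
    set first ""
    set prev ""
    set nodes [dict create]
    foreach title $titles {
        incr last
        if {$count} {
            # The previous node gets /Next; the new one records /Prev
            dict set nodes $prev /Next $last
            dict set nodes $last /Prev $prev
        } else {
            set first $last     ;# like Outline/first
        }
        incr count
        dict set nodes $last /Title $title
        set prev $last
    }
    return [list $first $prev $nodes]
}

lassign [link_siblings {Alpha Beta Gamma}] first lastNode nodes
puts "first=$first last=$lastNode"   ;# first=2 last=4
puts [dict get $nodes 3]             ;# /Prev 2 /Title Beta /Next 4
```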
proc pdf::outline_item {F title args} {
    upvar #0 [namespace current]::$F A
The first step allocates a number for the new node, which also means
incrementing Outline/count. It also deals with whatever should hold the link to
the new node. Usually that is the previous current node, which is output with a
/Next entry pointing to the new node. If the current level is empty, the new node
instead becomes Outline/first.
    incr A(Outlines/last)
    if {$A(Outline/count)} then {
        put_outline_node $F /Next\
          [obj_ref $F $A(Outlines/prefix)$A(Outlines/last)]
        set A(Outline/prev) $A(Outline/last)
    } else {
        set A(Outline/first) $A(Outlines/last)
    }
    incr A(Outline/count)
The second step is merely some entry initialisation for the new node.
    set A(Outline/last) $A(Outlines/last)
    set A(Outline//Title) [text_obj $title]
    set A(Outline/open) 0
    outline_node_set $F $args
}
pdf::outline_begingroup (proc)
The outline_begingroup procedure pushes the current state onto the stack and
makes the current node the parent for the new current state. The syntax is
pdf::outline_begingroup {file} {option} {value} ∗
where {file} is the identifier of the PDF file. An {option} {value} is either a
pair of PDF objects, where the first is a name object, or
-open {boolean}
The -open option sets the open state of the current node, which becomes the
parent. If it is not given, the node keeps its existing state, which
outline_item initialises to 0. The PDF objects will be placed in the dictionary
object for the parent node. There is no particular return value.
proc pdf::outline_begingroup {F args} {
    upvar #0 [namespace current]::$F A
    if {!$A(Outline/count)} then {
        error "There is no current node to make the parent of a new\
          group."
    }
    outline_node_set $F $args
    lappend A(Outlines/stack) [array get A Outline/*]
    set parent $A(Outline/last)
    array unset A Outline/*
    set A(Outline/parent) $parent
    set A(Outline/count) 0
}
pdf::outline_endgroup (proc)
The outline_endgroup procedure ends the current level of outline nodes and pops
one element off the stack. This turns the current parent back into the current
node, as it was before the matching outline_begingroup. The syntax is
pdf::outline_endgroup {file} {option} {value} ∗
where {file} is the identifier of the PDF file. An {option} {value} is either a
pair of PDF objects, where the first is a name object, or
-open {boolean}
The -open option can be used to override the open/closed state of the node, and
can thus control whether the level of outline items that was ended will be open by
default. The PDF objects will be placed in the dictionary object for the node
popped off the stack (the former parent node), which is once again the current
node.
proc pdf::outline_endgroup {F args} {
    upvar #0 [namespace current]::$F A
    set count $A(Outline/count)
    if {$count} then {
        put_outline_node $F
        lappend args /First [
            obj_ref $F $A(Outlines/prefix)$A(Outline/first)
        ] /Last [
            obj_ref $F $A(Outlines/prefix)$A(Outline/last)
        ]
    }
    array unset A Outline/*
    array set A [lindex $A(Outlines/stack) end]
    set A(Outlines/stack) [lreplace $A(Outlines/stack) end end]
    outline_node_set $F $args
    if {$count} then {
        if {$A(Outline/open)} then {
            set A(Outline//Count) [int_obj $count]
            incr A(Outline/count) $count
        } else {
            set A(Outline//Count) [int_obj [expr {-$count}]]
        }
    }
}
5.2 An outline of headings
One of the most common models for document structuring is to have a family of
commands which say “make a level n heading” and are meant to be used at the
beginning of each section/subsection/… in the document. This model is also
useful for constructing a table of contents such as the outline.
pdf::file〈num〉 (Outlines/levels)
The Outlines/levels entry of the file array is the list of heading levels nested
around the current outline node, with the last element being the level of that
node. The list is empty before the first node has been inserted. Apart from that
situation, the list length should always be one greater than that of the
Outlines/stack entry.
pdf::outline_heading (proc)
The outline_heading procedure adds a new heading to the document outline. The
syntax is
pdf::outline_heading {file} {level} {title} {option} {value} ∗
where {file} is the identifier of the PDF file, {level} is the nominal level of
this item, and {title} is the title. An {option} {value} is either a pair of PDF
objects, where the first is a name object, or
-open {boolean}
The -open option controls whether this item will be open by default, i.e., whether
its subitems (if there will be any) should be shown. It defaults to false (closed).
The PDF objects will be placed in the dictionary object for the new item. These
are what one should use to specify a destination or equivalent for the outline
item.
The {level} is relative and can be an arbitrary string. Levels are compared with
the ordering of Tcl's expr, numerically for numbers and otherwise as strings.
They are used as follows:
- If {level} is greater than the level of the current node, then a new level is
  begun below that node.
- Else, if {level} is greater than the level of the current node's parent, the
  item becomes a sibling of the current node and the current level is updated to
  {level}.
- Otherwise, the current level is ended and the question is re-examined one
  level up.
This dynamically adapts to the set of {level}s actually used in a document, even
if these are not consecutive. It also copes gracefully with inconsistencies such
as omitting some heading level at the beginning of a document.
There is no particular return value.
proc pdf::outline_heading {F level title args} {
    upvar #0 [namespace current]::$F A
    if\
      {[llength $A(Outlines/levels)] > [llength $A(Outlines/stack)] + 1}\
    then {
        set A(Outlines/levels)\
          [lrange $A(Outlines/levels) 0 [llength $A(Outlines/stack)]]
    }
    # End groups while the new heading is not deeper than the parent level.
    while {
        [llength $A(Outlines/levels)] > 1 &&
        $level <= [lindex $A(Outlines/levels) end-1]
    } {
        outline_endgroup $F
        set A(Outlines/levels) [lreplace $A(Outlines/levels) end end]
    }
    if {$A(Outline/count) && $level > [lindex $A(Outlines/levels) end]}\
    then {
        lappend A(Outlines/levels) $level
        outline_begingroup $F
    } else {
        set A(Outlines/levels)\
          [lreplace $A(Outlines/levels) end end $level]
    }
    eval [linsert $args 0 outline_item $F $title]
}
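The level rules can be tried out without a PDF file. The sketch below is a hypothetical stand-alone rewrite of the logic of pdf::outline_heading. Instead of calling outline_begingroup, outline_endgroup, and outline_item, it keeps its own level list and a stack of saved counts, and it returns {depth title} pairs.

```tcl
proc nest_headings {headings} {
    set levels {}    ;# plays the role of Outlines/levels
    set stack {}     ;# saved counts of the enclosing levels
    set count 0      ;# plays the role of Outline/count at the innermost level
    set result {}
    foreach {level title} $headings {
        # End groups while the heading is not deeper than the parent level
        while {[llength $levels] > 1 && $level <= [lindex $levels end-1]} {
            set count [lindex $stack end]
            set stack [lreplace $stack end end]
            set levels [lreplace $levels end end]
        }
        if {$count && $level > [lindex $levels end]} {
            # Deeper than the current node: begin a new group below it
            lappend stack $count
            lappend levels $level
            set count 0
        } elseif {[llength $levels] == 0} {
            # The very first heading
            set levels [list $level]
        } else {
            # Sibling of the current node; remember its (possibly new) level
            lset levels end $level
        }
        incr count
        lappend result [llength $stack] $title
    }
    return $result
}

set headings {1 Intro 2 Background 2 Scope 1 Method 3 Detail 2 Sub 1 End}
foreach {depth title} [nest_headings $headings] {
    puts "[string repeat {  } $depth]$title"
}
```

With the sample headings, Detail (level 3) and Sub (level 2) both end up one step below Method. Sub is not deeper than Detail, but it is still deeper than Method's level 1, so it becomes Detail's sibling.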
pdf::begin_outline (proc)
The begin_outline procedure initialises the outline system for a PDF file. The
syntax is
pdf::begin_outline {file} {prefix}
where {file} is the identifier of the PDF file and {prefix} is a prefix that will
be used for all labels of indirect objects that the outline system creates. The
root of the outline gets the label {prefix}1. There is no particular return value.
proc pdf::begin_outline {F prefix} {
    upvar #0 [namespace current]::$F A
    set A(Outlines/prefix) $prefix
    set A(Outlines/last) 1
    set A(Outlines/stack) [list]
    set A(Outlines/levels) [list]
    set A(Outline/parent) 1
    set A(Outline/count) 0
}
pdf::end_outline (proc)
The end_outline procedure finishes off the outline tree for a PDF file and
returns the label of the root node. It ends any groups that are still open,
outputs the current node (if there is one), and writes the root dictionary of
type /Outlines. The syntax is
pdf::end_outline {file}
where {file} is the identifier of the PDF file.
proc pdf::end_outline {F} {
    upvar #0 [namespace current]::$F A
    while {[llength $A(Outlines/stack)]} {
        outline_endgroup $F
    }
    # An empty outline has no current node to output.
    if {$A(Outline/count)} then {
        put_outline_node $F
    }
    set label "$A(Outlines/prefix)1"
    set call [list dict_obj /Type /Outlines]
    if {[info exists A(Outline/first)]} then {
        lappend call /First [
            obj_ref $F $A(Outlines/prefix)$A(Outline/first)
        ] /Last [
            obj_ref $F $A(Outlines/prefix)$A(Outline/last)
        ] /Count [int_obj $A(Outline/count)]
    }
    put_obj $F $label [eval $call]
    return $label
}
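The /Count values written by outline_endgroup and end_outline follow one rule. A node's count is the number of items below it that are visible when it is open, namely its children plus the visible descendants of its open children. The count is stored positive for an open node and negative for a closed one. It is added to the count of the enclosing level only when the node is open.

The recursive sketch below computes the same numbers for a tree given as nested {title open children} lists. Both this representation and the procedure visible_below are hypothetical.

```tcl
# Each node is a list {title open children}. visible_below returns the
# number of items visible below a node when it is open (its children plus
# the visible descendants of open children) and records the signed /Count
# in the array named by countsVar: positive if open, negative if closed.
proc visible_below {node countsVar} {
    upvar 1 $countsVar counts
    lassign $node title open children
    set n 0
    foreach child $children {
        incr n                                   ;# the child itself
        set below [visible_below $child counts]
        if {[lindex $child 1]} {incr n $below}   ;# an open child shows its items
    }
    if {$n > 0} {
        set counts($title) [expr {$open ? $n : -$n}]
    }
    return $n
}

# The root (the /Outlines dictionary) is treated as always open.
set tree {Root 1 {
    {A 1 {{A1 0 {}} {A2 0 {{A2a 0 {}}}}}}
    {B 0 {{B1 0 {}}}}
}}
array set counts {}
visible_below $tree counts
foreach name [lsort [array names counts]] {puts "$name $counts($name)"}
```

For the sample tree this reports A 2, A2 -1, B -1, and Root 4. The root value corresponds to the /Count of the /Outlines dictionary: the visible items A, A1, A2, and B. Nodes without children get no /Count, just as outline_endgroup only sets one when the ended level was non-empty.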
〈/pkg〉
5.3 An outline example
The purpose of the following is to test the outline generation. Its structure,
which is perhaps somewhat atypical, is to first generate all the document contents
and then generate an outline with links into the document.
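The links in the example are destination arrays such as [page /Fit], [page /FitV left], and [page /XYZ left top zoom]. In an /XYZ destination, a null parameter keeps the current value.

The sketch below renders such arrays as PDF text, roughly as the /Dest entries appear in the finished file. cm_to_bp and dest_text are hypothetical helpers rather than parts of the package. pdf::length_obj may format lengths with a different precision.

```tcl
# Hypothetical helpers for illustration; not part of the pdf package.

# Centimetres to PostScript points (1 in = 2.54 cm = 72 bp), 3 decimals.
proc cm_to_bp {cm} {
    return [format %.3f [expr {$cm / 2.54 * 72}]]
}

# Render a destination array as PDF text. pageRef is an indirect
# reference such as "7 0 R"; the parameters are passed through verbatim.
proc dest_text {pageRef type args} {
    switch -- $type {
        /Fit    {set wanted 0}
        /FitV   {set wanted 1}
        /XYZ    {set wanted 3}
        default {error "unsupported destination type $type"}
    }
    if {[llength $args] != $wanted} {
        error "$type takes $wanted parameter(s)"
    }
    return "\[[join [concat [list $pageRef $type] $args] { }]\]"
}

puts [dest_text {3 0 R} /Fit]                ;# [3 0 R /Fit]
puts [dest_text {5 0 R} /XYZ null null 2]    ;# [5 0 R /XYZ null null 2]
puts [dest_text {3 0 R} /FitV [cm_to_bp 5]]  ;# [3 0 R /FitV 141.732]
```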
〈∗example4〉
set F [pdf::rewrite_pdf {outline.pdf}]
The idea is that the pages should show the numbers 1–12, each rather large, on a
page of its own, and in a different font.
pdf::begin_pages $F "Pages\#"\
  /MediaBox [pdf::rect_obj $pdf::paper_rect(A4)]
set page 1
foreach font {
    Times-Roman Helvetica Courier
    Times-Bold Helvetica-Bold Courier-Bold
    Times-Italic Helvetica-Oblique Courier-Oblique
    Times-BoldItalic Helvetica-BoldOblique Courier-BoldOblique
} {
    pdf::put_obj $F $font [pdf::dict_obj /Type /Font /Subtype /Type1\
      /BaseFont [pdf::name_obj $font] /Encoding /MacRomanEncoding]
    pdf::begin_contents "" $F "Page $page contents"
    pdf::name_resource fid $F Font [pdf::obj_ref $F $font]
    pdf::printf $F {BT %o %r Tf 1 0 0 1 %r2 Tm %s Tj ET} $fid\
      [pdf::length 10 cm] [pdf::length 5 cm] [pdf::length 10 cm] $page
    pdf::end_contents Rez $F
    pdf::shipout $F "Page $page" /Contents\
      [pdf::obj_ref $F "Page $page contents"] /Resources\
      [pdf::resource_dict_obj Rez]
    unset Rez
    incr page
}
set Pages [pdf::end_pages $F]
pdf::begin_outline $F "TOC\#"
pdf::outline_heading $F 1 "Numeric" /Dest [
    pdf::array_obj [pdf::obj_ref $F "Page 1"] /Fit
]
for {set page 1} {$page …
…
        pdf::array_obj [pdf::obj_ref $F "Page $page"] /XYZ\
          [pdf::null_obj] [pdf::null_obj] [pdf::real_obj $page]
    ]
    incr page
}
pdf::outline_heading $F 1 "Russian" /Dest [
    pdf::array_obj [pdf::obj_ref $F "Page 1"] /FitV\
      [pdf::length_obj 5 cm]
]
set page 1
foreach {Ruslish name} {
    Odin \u041E\u0434\u0438\u043D
    Dva \u0414\u0432\u0430
    Tri \u0422\u0440\u0438
    !Cetyre \u0427\u0435\u0442\u044B\u0440\u0435
    P!ath \u041F\u044F\u0442\u044C
    !Sesth \u0428\u0435\u0441\u0442\u044C
    Semh \u0421\u0435\u043C\u044C
    Vosemh \u0412\u043E\u0441\u0435\u043C\u044C
    Dev!ath \u0414\u0435\u0432\u044F\u0442\u044C
    Des!ath \u0414\u0435\u0441\u044F\u0442\u044C
    Odinnadcath
      \u041E\u0434\u0438\u043D\u043D\u0430\u0434\u0446\u0430\u0442\u044C
    Dvenadcath
      \u0414\u0432\u0435\u043D\u0430\u0434\u0446\u0430\u0442\u044C
} {
    pdf::outline_heading $F 2 $name /Dest [
        pdf::array_obj [pdf::obj_ref $F "Page $page"] /XYZ\
          [pdf::null_obj] [pdf::null_obj] [pdf::null_obj]
    ]
    incr page
}
set outline [pdf::end_outline $F]
pdf::put_obj $F "The catalog" [pdf::dict_obj\
  /Type /Catalog\
  /Pages [pdf::obj_ref $F $Pages]\
  /PageMode /UseOutlines\
  /Outlines [pdf::obj_ref $F $outline]]
pdf::close_pdf $F "The catalog"
〈/example4〉
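Two details of the Russian part are worth checking in isolation:

- The \u escapes sit inside a braced list, yet they are still substituted. This works because foreach parses that list, and list parsing performs backslash substitution on elements that are not themselves braced.
- Titles like these must become PDF text strings. One valid encoding is UTF-16BE with a leading byte order mark FE FF.

The procedure utf16be_hex_string is a hypothetical sketch of that encoding, limited to BMP characters. It does not describe what pdf::text_obj actually does.

```tcl
# Hypothetical helper (an assumption, not necessarily how pdf::text_obj
# works): encode text as a PDF hexadecimal string in UTF-16BE with the
# byte order mark FE FF. Only characters in the BMP are handled.
proc utf16be_hex_string {text} {
    set hex FEFF
    foreach char [split $text ""] {
        scan $char %c code
        if {$code > 0xFFFF} {error "character outside the BMP"}
        append hex [format %04X $code]
    }
    return "<$hex>"
}

# The \u escapes are substituted when foreach parses the braced list.
foreach {Ruslish name} {
    Odin \u041E\u0434\u0438\u043D
    Dva  \u0414\u0432\u0430
} {
    puts "$Ruslish -> [utf16be_hex_string $name]"
}
# Prints:
# Odin -> <FEFF041E04340438043D>
# Dva -> <FEFF041404320430>
```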
Index
All numbers in this index are page numbers. Underlined entries refer to places
where the item in question is defined.
A
array_obj (proc), pdf namespace 4, 15
B
begin_contents (proc), pdf namespace 5, 25
begin_outline (proc), pdf namespace 6, 50
begin_pages (proc), pdf namespace 6, 33
begin_stream (proc), pdf namespace 4, 18
boolean_obj (proc), pdf namespace 4, 12
C
close_pdf (proc), pdf namespace 2, 21
D
date_obj (proc), pdf namespace 4, 16
dict_obj (proc), pdf namespace 4, 15
E
end_contents (proc), pdf namespace 5, 25
end_outline (proc), pdf namespace 6, 51
end_pages (proc), pdf namespace 6, 34
end_stream (proc), pdf namespace 4, 18
F
file〈num〉 (array), pdf namespace 17
  ?〈reference label〉 18
  !〈reference label〉 17
  backlog 19
  current_stream 18
  last_object_num 17
  Outline//〈name〉 46
  Outline/count 45
  Outline/first 45
  Outline/last 45
  Outline/open 45
  Outline/parent 45
  Outline/prev 45
  Outline/〈string〉 45
  Outlines/last 45
  Outlines/levels 49
  Outlines/prefix 45
  Outlines/stack 45
  Pages/arity 32
  Pages/attributes 32
  Pages/last 32
  Pages/prefix 32
  Pages/〈num〉 32
  Resources/〈type〉 25
H
has_resource? (proc), pdf namespace 26
hexstring_obj (proc), pdf namespace 3, 13
I
inset_rect (proc), pdf namespace 10, 42
int_obj (proc), pdf namespace 3, 12
int_rect_obj (proc), pdf namespace 11, 39
L
length (proc), pdf namespace 9, 38
length_obj (proc), pdf namespace 9, 38
M
make_pages_nodes (proc), pdf namespace 34
make_rect (proc), pdf namespace 10, 39
N
name_obj (proc), pdf namespace 4, 14
name_resource (proc), pdf namespace 5, 26
null_obj (proc), pdf namespace 4, 16
O
obj_ref (proc), pdf namespace 2, 17
offset_rect (proc), pdf namespace 10, 43
outline_begingroup (proc), pdf namespace 7, 48
outline_endgroup (proc), pdf namespace 7, 48
outline_heading (proc), pdf namespace 7, 49
outline_item (proc), pdf namespace 7, 47
outline_node_set (proc), pdf namespace 7, 46
P
paper_rect (array), pdf namespace 11, 43
precision (var.), pdf namespace 3, 12
printf (proc), pdf namespace 8, 31
put_obj (proc), pdf namespace 2, 19
put_outline_node (proc), pdf namespace 46
R
real_obj (proc), pdf namespace 3, 12
rect_obj (proc), pdf namespace 11, 39
require_procsets (proc), pdf namespace 28
resource_dict_obj (proc), pdf namespace 5, 24
rewrite_pdf (proc), pdf namespace 2, 20
S
shipout (proc), pdf namespace 6, 33
sprintf (proc), pdf namespace 8, 29
standard_rect (proc), pdf namespace 11, 42
string_obj (proc), pdf namespace 3, 12
T
text_obj (proc), pdf namespace 4, 14
U
unit_factor (array), pdf namespace 38
W
wh_rect (proc), pdf namespace 11, 43