The Cassetin Project — Towards an Inventory of Ancient ... - TUG
sets. Yet, they are present in foundry specimens, books, or even in grammars ...
Unicode, Characters and Glyphs
Unicode [15] draws a sharp distinction between characters (abstract linguistic entities) and glyphs (a possible physical, stylistic representation or rendition of these entities). Very few papers explain these concepts well; let us cite one by Ken Whistler, the technical director of Unicode [2], and one by a typographer, John Hudson [12]. On the other hand, there are also good papers arguing that Unicode made the wrong choice and that characters and glyphs are not so easily separated [10, 11]. We would like to add that “types” (in the usual typographic meaning) are neither characters nor glyphs.¹¹
An important point is that the Unicode principle that separates glyphs and characters has historically been violated by another one: Unicode is based on previous encoding systems (proprietary or international standards) where ligatures were present. If Unicode were clean, even the sign “&” should not be there! However, one may wonder why “long s” and even “ligature st” have been added very recently, and not “ligature ct”!
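The compatibility characters mentioned above can be inspected with Python's standard `unicodedata` module. The following is only an illustrative sketch: the selection of code points and the helper name `describe` are assumptions made here, not something taken from the paper, and the results depend on the Unicode version shipped with the interpreter.

```python
import unicodedata

# Code points picked for illustration only (not a list taken from the paper).
SAMPLES = ["\u0026", "\u017F", "\uFB05", "\uFB06"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, official name, NFKD form) for one character."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    # NFKD applies compatibility decomposition: ligatures and the long s
    # are unfolded into the ordinary letters they stand for.
    unfolded = unicodedata.normalize("NFKD", ch)
    return code_point, name, unfolded


for ch in SAMPLES:
    print(*describe(ch), sep="  ")
```

The compatibility decomposition is what marks these characters as inherited from earlier encodings: Unicode keeps them, but treats them as presentation variants of ordinary letter sequences.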
Imagine the dialog:
– “How could I describe¹² Fertel’s case (figure 7) and its ę using Unicode?” I ask.
– Answer from the Unicode specialist: “Use LATIN SMALL LETTER E WITH OGONEK, U+0119.”
– “No,” I say, “Fertel’s character is not that character. There is the same glyph resemblance as between LATIN CAPITAL LETTER A and GREEK CAPITAL LETTER ALPHA, but they are different characters and Unicode encodes them separately.”
– “Why don’t you encode this character as the letter e and a combining diacritic ogonek?”
– “Because it is not an ogonek, but rather a kind of breve,” I answer.
– “OK,” he says, “your ę is a glyph of some LATIN SMALL LETTER WITH BREVE.”
I disagree: it is not the same breve as the one used by Fertel in another case, “ĕ”, so it is not the same character ...
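The distinctions at stake in this dialog can be made concrete with Python's standard `unicodedata` module. This is a minimal sketch under stated assumptions: the helper `canonically_equal` and the variable names are hypothetical, and the code only shows how Unicode treats the characters named in the dialog, not how Fertel's type should be encoded.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True when two strings are the same text after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


e_ogonek = "\u0119"        # LATIN SMALL LETTER E WITH OGONEK
e_then_ogonek = "e\u0328"  # letter e followed by COMBINING OGONEK
e_breve = "\u0115"         # LATIN SMALL LETTER E WITH BREVE
latin_a = "\u0041"         # LATIN CAPITAL LETTER A
greek_alpha = "\u0391"     # GREEK CAPITAL LETTER ALPHA

# The precomposed letter and the letter-plus-diacritic sequence are two
# spellings of one character ...
print(canonically_equal(e_ogonek, e_then_ogonek))
# ... whereas look-alike letters and different diacritics stay distinct.
print(canonically_equal(latin_a, greek_alpha))
print(canonically_equal(e_ogonek, e_breve))
```

Both answers of the Unicode specialist therefore lead to the same encoded character, which is why neither satisfies someone who holds that the printed type is a different character altogether.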
And now, if you look at the alphabet given by the same Baïf, you can see an “a with raising tail” that is rather a nasal O (its place in the alphabet is just before the letter P). Let us restart the same dialog ...
Last point: Unicode knows old scripts such as Runic or Ogham. Why should it ignore old European languages and their writing, used for centuries?
11. There are many stylistic variants of our “ę”! On the other hand, Unicode speaks about the rendition of abstract characters. However, what about the other way around? When scanning documents, printed characters exist before the corresponding “abstract” character; they are not only images of abstract characters, they are characters in themselves, at an intermediary level between glyphs and linguistic entities.
12. Even if “[t]he Unicode Standard is explicitly not aimed at being a system for facsimile representation of text” [2], one may need to quote such a character. Actually, it is not only a Unicode problem!
The Cassetin Project
Being involved in digitization projects such as Fournier’s Manuel typographique,¹³ I am continuously confronted with such problems of coding or naming old¹⁴ characters. Discussions with many people involved in such tasks recently pushed me to undertake a project¹⁵ to inventory these types and to try to establish a standardized list of names or ... codes.
Its main aims are:
Inventory of types: Prepare an inventory of all types used in texts¹⁶ printed in European¹⁷ languages.
Typical characters are
• Ligatures, such as the ones already quoted here (sh, si, st, ...) and many other ones (like the Hungarian gz ...).
• Vanished characters, such as the “ę,” the tailed A, etc.
• Accented characters (like the old Spanish consonants).
• Abbreviations.
• Special characters such as verset and respons (these two are in Unicode, but many other special characters are not).
• Historical typographical characters¹⁸ (that are not already in Unicode) such as raised letters.
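One step of such an inventory, separating types that already have a Unicode character from those that do not, could be sketched as follows. This is an assumption-laden illustration rather than the project's method: the function `split_by_encoding` is hypothetical, the two ligature names marked in the comments are invented for the example, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def split_by_encoding(names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Partition character names into those Unicode knows and those it lacks."""
    encoded: dict[str, str] = {}
    missing: list[str] = []
    for name in names:
        try:
            # lookup() raises KeyError when no character has this name.
            ch = unicodedata.lookup(name)
        except KeyError:
            missing.append(name)
        else:
            encoded[name] = f"U+{ord(ch):04X}"
    return encoded, missing


candidates = [
    "VERSICLE",                 # the "verset" sign
    "RESPONSE",                 # the "respons" sign
    "LATIN SMALL LIGATURE ST",
    "LATIN SMALL LIGATURE CT",  # hypothetical name, for illustration
    "LATIN SMALL LIGATURE GZ",  # hypothetical name, for illustration
]
encoded, missing = split_by_encoding(candidates)
print(encoded)
print(missing)
```

A lookup by name only tells us whether a character with exactly that name exists; deciding whether a printed type corresponds to an existing character under another name remains a matter for specialists.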
This inventory is based on
• Previous studies, such as [3, 4, 5, 7], including Web pages such as Bolton’s on cases [6].
• Specimens published by foundries.
• Ancient books.
• The MUFI project for manuscripts!
13. Like Moxon’s, a famous 18th-century book on type-cutting and typefounding. See [9, 13] and http://www.irisa.fr/faqtypo/BiViTy.
14. “Old” here means before DTP! A typical example is the use, still current in 1950, of the abbreviation “crossed K” that represents the Breton “ker” occurring in many names.
15. Temporarily called CASSETIN: “cassetin” is the French name for case boxes. It can stand for “CASSE Type encodINg” ... See also [1].
16. One problem not yet solved: should we consider all types, even the ones used outside of plain text, such as ornaments and rules? I do not think so; however, the limits are not yet fixed!
17. This is again an unsolved question: which languages do we consider? Latin ones? What about Cyrillic, Greek, Hebrew, Arabic, Syriac, etc.? Actually, today it is only a matter of which specialists are working on this project ...
18. We do not dare to speak about small caps!
TUGboat, Volume 24 (2003), No. 3: Proceedings of EuroTeX 2003, p. 317