
The Cassetin Project — Towards an Inventory of Ancient ... - TUG


sets. Yet they are present in foundry specimens, books, or even in grammars ...

Unicode, Characters and Glyphs

Unicode [15] makes a strong distinction between characters (abstract linguistic entities) and glyphs (a possible physical stylistic representation or rendition of these entities). Very few papers give a good explanation of those concepts; let us cite here one by Ken Whistler, the technical director of Unicode [2], and one by a typographer, John Hudson [12]. On the other hand, there are also good papers that say that Unicode made the wrong choice and that characters and glyphs are not so easily distinguished [10, 11]. We would like to add that “types” (with the usual typographic meaning) are neither characters nor glyphs (footnote 11).

An important point is that the Unicode principle that separates glyphs and characters has been historically violated by another one: Unicode is based on previous encoding systems (proprietary or international standards) where ligatures were present. If Unicode were clean, even the sign “&” should not be there! However, we can be suspicious of why “long s” and even “ligature st” have been very recently added, and not “ligature ct”!
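The point about inherited ligatures can be checked against the character database shipped with Python. The sketch below is only an illustration, not part of the original article: the code point values are taken from the Unicode charts, and the name `LATIN SMALL LIGATURE CT` is a hypothetical name formed by analogy with the existing ligature names.

```python
import unicodedata

# Characters mentioned in the paragraph above (code points from the Unicode charts).
samples = {
    "ampersand": "\u0026",
    "long s": "\u017F",
    "ligature long s + t": "\uFB05",
    "ligature st": "\uFB06",
}


def describe(ch):
    """Return (code point, Unicode name, NFKC compatibility form) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.normalize("NFKC", ch))


def is_encoded(name):
    """Report whether the local Unicode database knows a character with this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


report = {label: describe(ch) for label, ch in samples.items()}
for label, info in report.items():
    print(label, info)

# Hypothetical name, built by analogy; the result depends on the local database.
print("ct ligature encoded:", is_encoded("LATIN SMALL LIGATURE CT"))
```

The compatibility form returned by NFKC shows that these ligatures are treated as presentation variants of ordinary letter sequences, which is exactly the glyph-level information the character/glyph principle was meant to exclude.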

Imagine the dialog:

– “How could I describe (footnote 12) Fertel’s case (figure 7) and its ę using Unicode?” I ask.

– Answer from the Unicode specialist: “Use latin small letter e with ogonek, U+0119.”

– “No,” I say, “Fertel’s character is not that character. There is the same glyph resemblance as between latin capital a and greek capital alpha, but they are different characters and Unicode encodes them separately.”

– “Why don’t you encode this character as letter e and a combining diacritic ogonek?”

– “Because it is not an ogonek, rather a kind of breve,” I answer.

– “OK,” he says, “your ę is a glyph of some latin small letter with breve.”

I disagree: it is not the same breve as the one used by Fertel in another case, “ĕ”, so it is not the same character ...
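The specialist’s suggestions in this dialog correspond to concrete Unicode facts that can be inspected with Python’s standard `unicodedata` module. The following is a minimal sketch added for illustration; the helper `char_names` is our own name and not something from the article.

```python
import unicodedata


def char_names(text):
    """List the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# First answer: the precomposed character U+0119.
e_ogonek = "\u0119"
precomposed = char_names(e_ogonek)

# Second answer: letter e followed by a combining ogonek (canonical decomposition).
decomposed = char_names(unicodedata.normalize("NFD", e_ogonek))

# The comparison made in the dialog: near-identical glyphs, separate characters.
latin_a, greek_alpha = "\u0041", "\u0391"
lookalikes = char_names(latin_a + greek_alpha)

# The other breve mentioned after the dialog: e with breve, U+0115, decomposed.
e_breve = char_names(unicodedata.normalize("NFD", "\u0115"))

for label, names in [
    ("precomposed", precomposed),
    ("decomposed", decomposed),
    ("lookalikes", lookalikes),
    ("e with breve", e_breve),
]:
    print(f"{label}: {names}")
```

None of these encodings identifies Fertel’s type itself: each one names an abstract character, which is the gap the dialog is pointing at.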

And now, if you look at the alphabet given by the same Baïf, you can see an “a with rising tail” that is rather a nasal O (its place in the alphabet is just before the P letter). Let us restart the same dialog ...

Last point: Unicode knows old writing systems such as Runes or Ogham. Why should it ignore old European languages and their writing, used for centuries?

11. There are many stylistic variants of our “ę”! On the other hand, Unicode speaks about rendition of abstract characters. However, what about the other way: when scanning documents, printed characters exist before the corresponding “abstract” character; they are not only images of abstract characters, they are characters by themselves at an intermediary level between glyphs and linguistic entities.

12. Even if “[t]he Unicode Standard is explicitly not aimed at being a system for facsimile representation of text” [2], one may need to quote such a character. Actually, it is not only a Unicode problem!

The Cassetin Project

Being involved in digitization projects such as Fournier’s Manuel typographique (footnote 13), I am continuously confronted with such problems of coding or naming old (footnote 14) characters. Discussions with many people involved in such tasks pushed me recently to undertake a project (footnote 15) to inventory these types and try to establish a standardized list of names or ... codes.

Its main aims are:<br />

Inventory of types. Prepare an inventory of all types used in texts (footnote 16) printed in European (footnote 17) languages.

Typical characters are<br />

• Ligatures, such as the ones already quoted here (sh, si, st, ...) and many other ones (like the Hungarian gz ...).

• Vanished characters, such as the “ę,” the tailed A, etc.

• Accented characters (like the old Spanish consonants).

• Abbreviations.

• Special characters such as verset and respons (these two are in Unicode, but many other special characters are not).

• Historical typographical characters (footnote 18) that are not already in Unicode, such as raised letters.

This inventory is based on

• Previous studies, such as [3, 4, 5, 7], including Web pages such as Bolton’s on cases [6].

• Specimens published by foundries.

• Ancient books.

• The MUFI project for manuscripts!
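One way to picture such an inventory is as a list of records, each carrying a name, a category from the list above, and a Unicode code point where one exists. The sketch below is purely hypothetical: the class `TypeEntry`, its fields, and the entry names are assumptions made for illustration and are not the project’s actual naming scheme. Only the two code points for verset and respons (U+2123 and U+211F) come from the Unicode charts; the entries without a code point follow the article’s own statements.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeEntry:
    """Hypothetical inventory record for one piece of type."""
    name: str                        # provisional name, not a standardized one
    category: str                    # one of the categories listed above
    codepoint: Optional[int] = None  # Unicode code point, if the type is encoded

    def unicode_name(self):
        """Return the Unicode name, or None when the type has no code point."""
        if self.codepoint is None:
            return None
        return unicodedata.name(chr(self.codepoint))


inventory = [
    TypeEntry("verset", "special character", 0x2123),
    TypeEntry("respons", "special character", 0x211F),
    TypeEntry("ligature ct", "ligature"),        # not encoded, per the article
    TypeEntry("tailed A", "vanished character"),  # no code point assumed here
]

# Types that would need a project-level name or code.
unencoded = [entry.name for entry in inventory if entry.codepoint is None]

for entry in inventory:
    print(entry.name, "->", entry.unicode_name())
print("unencoded:", unencoded)
```

Keeping the Unicode code point optional reflects the situation described in the text: some types map onto existing characters, while others can only be identified by a name or code agreed on within the project.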

13. Like Moxon’s, a famous 18th century book on type-cutting and typefounding. See [9, 13] and http://www.irisa.fr/faqtypo/BiViTy.

14. Old means here before DTP! A typical example is the use, still current in 1950, of the abbreviation “crossed K” that represents the Breton “ker” occurring in many names.

15. Temporarily called CASSETIN: “cassetin” is the French name of case boxes. It can stand for “CASSE Type encodINg” ... See also [1].

16. One problem not yet solved: should we consider all types, even the ones used outside of plain text, such as ornaments and rules? I do not think so; however, the limits are not yet fixed!

17. This is again an unsolved question: Which languages do we consider? Latin ones? What about Cyrillic, Greek, Hebrew, Arabic, Syriac, etc.? Actually, today it is only a matter of specialists working in this project ...

18. We do not dare to speak about small caps!

TUGboat, Volume 24 (2003), No. 3, Proceedings of EuroTEX 2003, p. 317
