
Typesetting Rare Chinese Characters in LaTeX

Wai Wong, Candy L.K. Yiu, Kelvin C.F. Ng
Department of Computer Science
Hong Kong Baptist University
Kowloon Tong, Kowloon
Hong Kong
wwong,candyyiu,ncfkelvin@comp.hkbu.edu.hk
http://www.comp.hkbu.edu.hk/~wwong

Abstract

Written Chinese has tens of thousands of characters. But most available fonts contain only around 6 to 12 thousand common characters that can meet the needs of everyday users. However, in publications and information exchange in many professional fields, a number of rare characters that are not in common fonts are needed in each document. This paper describes a method of typesetting such rare characters in LaTeX. The document author describes a rare character in HanGlyph when such need arises. A Chinese character synthesis system renders the glyph according to this description and collects the newly created glyphs into a font so they are available in the body of the LaTeX document.

Résumé

Written Chinese has tens of thousands of ideographs, but most available fonts contain only some 6 to 12 thousand standard ideographs. Nevertheless, publications in the humanities, such as classical literature and archaeology, or simply municipal registers of names, require a large number of rare ideographs. This article describes a method for typesetting such rare characters in LaTeX. The document author describes the rare character in a special syntax called HanGlyph. A Chinese character synthesis system generates the glyph from this description and assembles the glyphs so produced into a font that LaTeX can use immediately in the body of the document.

Introduction

The Chinese written script is ideographic. Each ideograph, known as hanzi, or more commonly 'Chinese character', has its own visual structure and carries certain meaning. There are tens of thousands of hanzi or Chinese characters. The most notable difference between an ideographic script and a phonographic script, such as the Latin alphabet, Slavonic alphabet, and so on, is the huge number of characters that the former script contains.

Historically, the number of hanzi in existence increases with time. This is reflected in many dictionaries published in various times. For example, the first influential book on hanzi, 說文解字 (shuōwénjiězì¹), published around 100 AD, collected 10,516 characters. By the time of the Qing dynasty, the more famous dictionary 康熙字典 (kāngxīzìdiǎn), published in 1716, contained 47,043 characters. The largest contemporary dictionary, 中華字海 (zhōnghuázìhǎi), documents a staggering 86,000 characters [5].

¹ The word shuōwénjiězì following the hanzi is in pinyin, a phonetic transcription of Chinese characters.

In order to facilitate the machine processing of the Chinese script, coded character sets have been developed [8]. Commonly used coded Chinese character sets are GB 2312-80, CNS 11643, Big5, JIS X 0208-1983 and KS X 1001:1992. The number of Chinese characters encoded in these character sets varies from around 4,888 (in KS X 1001:1992) to 48,027 (in CNS 11643-1992).

The main reason behind the selection of characters in these encoded character sets is the frequency with which each character appears in common documents, such as newspapers, textbooks and business correspondence. A statistical study reveals that around 6,600 Chinese characters can cover 99.999% of daily use [5]. It seems that having a large enough encoded character set will solve the problem.

Recent international effort in character set standardization has resulted in the Unicode (version 4.0 released in May 2003) [12] and ISO/IEC 10646 standards. Unicode 3.0 [9] encoded 27,484 Chinese characters. A further 42,711 characters were added in version 3.1 [10].

TUGboat, Volume 24 (2003), No. 3, Proceedings of EuroTeX 2003

Although the new international standards include a huge number of Chinese characters, in practice, several problems remain unsolved. The first is that the design and creation of fonts of a huge character set is very expensive. It is also not economical to process and maintain such large fonts given that many of the characters in them are rarely used. Furthermore, a large number of existing systems may not be able to handle such a large character set. Transferring a document to such older systems will result in missing characters or even crash the system.

One possible solution is to synthesize the characters when needed. We have designed a Chinese character description language called HanGlyph. It is a high-level abstract description language which captures the essential features of characters. These features are the topology of the strokes and their relative size and location. We are developing a Chinese character synthesis system (CCSS) to render the characters from the HanGlyph description. The HanGlyph language and the method of synthesizing Chinese characters will be described in the next section.

Using the HanGlyph description and the character synthesis system, we can typeset rarely used Chinese characters in LaTeX documents. We developed a simple macro package to allow the document author to embed HanGlyph descriptions in a LaTeX document. The first time the LaTeX document is formatted, a HanGlyph file containing all character descriptions is generated. This file is then processed by the synthesis system to create a font. The next time the LaTeX document is formatted, the newly generated Chinese character font is accessible, so the characters can be typeset. Later sections will describe how to use this macro package and outline its implementation.

Chinese character synthesis

The HanGlyph language is a Chinese character description language. It provides a means of describing the topological arrangements of strokes in Chinese characters. Each Chinese character can be decomposed into a number of parts called components. Each component consists of a number of strokes. For example, as illustrated in Figure 1, the character 明 (míng, meaning bright) can be decomposed into two components, 日 (rì) and 月 (yuè). The first component consists of four strokes: s (豎 shù), i (橫折 héngzhé), h (橫 héng) and h.

A HanGlyph expression describes the arrangement of the strokes in an abstract fashion. This means that only topological information is captured and no geometric information is included. For example, the HanGlyph expression describing the character 二 is h h =, which means the character is composed of two héng strokes, one above the other. We do not need to specify the exact coordinates of the starting point of each stroke. These will be worked out by the character synthesizer.

FIG. 1: Composition of a Chinese character. The character 明 is composed of the components 日 (strokes s i h h) and 月 (strokes q j h h).

One of the criteria for writing a good HanGlyph expression is to be able to distinguish similar characters. For example, the characters 土 and 士 are composed of the same strokes and in the same arrangements. The only difference is the relative length of the two h strokes. The HanGlyph expressions to describe these characters are h h=< s+_ and h h=> s+_, respectively. The dimensional relation symbols < and > specify the relative length of the two h strokes.
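The postfix reading of these expressions can be sketched in code. The following Python fragment is only an illustration under stated assumptions, not the HanGlyph translator itself: it treats any letter-initial name as a stroke or component, treats =, | and + as binary operators that follow their two operands, and attaches any trailing <, > or _ to the operator as an uninterpreted modifier. Only = together with < and > is explained in the text above; the tokenisation rules and the tuple layout are hypothetical.

```python
import re

# Assumed token shapes (not taken from the HanGlyph specification):
#   operand  : a letter followed by letters, digits or underscores
#   operator : one of = | +, followed by optional modifier characters < > _
TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9_]*)|([=|+])([<>_]*)")


def parse(expr):
    """Reduce a postfix expression to a nested tuple
    (operator, modifiers, left_operand, right_operand)."""
    stack = []
    pos = 0
    while pos < len(expr):
        if expr[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        pos = match.end()
        if match.group(1):
            # A stroke or component name is pushed as an operand.
            stack.append(match.group(1))
        else:
            # An operator combines the two most recent operands.
            if len(stack) < 2:
                raise ValueError("operator needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append((match.group(2), match.group(3), left, right))
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single glyph")
    return stack[0]


er = parse("h h =")        # the expression given for 二
tu = parse("h h=< s+_")    # the expression given for 土
shi = parse("h h=> s+_")   # the expression given for 士
```

Under this reading, the trees for 土 and 士 differ only in the modifier attached to the = operator, which is exactly the distinction the relation symbols are meant to carry.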

Because the smallest building blocks of a Chinese character are the strokes, the HanGlyph language takes the strokes as its primitives. Drawing on many studies of Chinese characters, we selected 41 such primitive strokes. HanGlyph composes characters using a small set of five operations, which are illustrated in Figure 2:

• top-bottom,
• left-right,
• fully-enclosed,
• partially-enclosed, and
• crossing.

It would be very tedious if every character description contained details down to each single stroke. Using the fact that characters can be decomposed into components and many components appear in a number of characters, HanGlyph allows macros to be defined to stand for components. Thus, expressions can be very concise.
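The effect of component macros can be pictured as plain textual substitution. The Python sketch below is an assumption made for illustration: the macro table, the name tu and the expression tu tu = (two copies of the 土 component, one above the other) are hypothetical and are not taken from the HanGlyph macro library.

```python
import re

# Assumed shape of a name: a letter followed by letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def expand(expr, macros, depth=0):
    """Replace every macro name in expr by its definition, recursively,
    until only primitive stroke names and operators remain."""
    if depth > 20:
        raise ValueError("macro definitions appear to be recursive")

    def substitute(match):
        name = match.group(0)
        if name in macros:
            return expand(macros[name], macros, depth + 1)
        return name  # a primitive stroke name is left untouched

    return IDENT.sub(substitute, expr)


# Hypothetical macro table: tu stands for the expression given for 土.
macros = {"tu": "h h=< s+_"}
expanded = expand("tu tu =", macros)
```

The concise form names each component once, while the expanded form spells out all six strokes.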

In summary, HanGlyph provides an abstract way to describe Chinese characters which captures their essential characteristics so that they can be rendered. Details of the HanGlyph language can be found in our paper to be presented at TUG 2003 [14].

The Chinese character synthesis system (CCSS) is responsible for rendering the character glyphs from their HanGlyph descriptions. The core of CCSS is implemented in METAPOST [6]. It consists of a library of METAPOST macros implementing various HanGlyph operations. The input to CCSS is a sequence of HanGlyph expressions. A front end translator converts these HanGlyph expressions into a METAPOST program. Running METAPOST on this program will then generate a set of PostScript files, each of which contains a single glyph.

FIG. 2: Graphical representation of operators (panels: Top-bottom, Left-right, Enclose, Partially-enclose, Cross; sub-panels numbered 0 to 7).

FIG. 3: Basic stroke macros for the stroke J (panels (a), (b), (c), with labelled points 1 to 7).

The METAPOST macro library is organized into two major parts: the primitive strokes and the composition operations. Each primitive stroke is implemented as three METAPOST macros, namely a control point macro, a skeleton macro and an outline macro. The control point macro defines the control points, the skeleton macro specifies the skeletal path that connects the control points, and the outline macro draws the outline, which is defined relative to the control points and the skeletal path. Figure 3 shows a sample stroke.

The composition operation macros perform the composition. This is done by transforming the control points of each stroke and positioning them at the correct location within a character bounding box. When all strokes of a character are positioned at the correct location, the skeleton macros are called to define the skeletal strokes. Then, the outline macros are called to draw the outline.
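The idea of composing by transforming control points can be illustrated with a small sketch. CCSS does this in METAPOST; the Python below is a simplified stand-in under stated assumptions: each stroke is a list of control points in the unit square, a component is a list of strokes, the y axis points upwards, and the top-bottom operation splits the bounding box into two equal halves. The equal split and the function names place and top_bottom are hypothetical; the real system works out the proportions itself.

```python
def place(points, box):
    """Map control points given in the unit square into
    box = (x0, y0, x1, y1) by scaling and shifting."""
    x0, y0, x1, y1 = box
    return [(x0 + x * (x1 - x0), y0 + y * (y1 - y0)) for x, y in points]


def top_bottom(upper, lower, box=(0.0, 0.0, 1.0, 1.0)):
    """Stack two components: the first goes into the upper half of the
    box, the second into the lower half (assumed equal split)."""
    x0, y0, x1, y1 = box
    mid = (y0 + y1) / 2
    upper_box = (x0, mid, x1, y1)
    lower_box = (x0, y0, x1, mid)
    return ([place(stroke, upper_box) for stroke in upper] +
            [place(stroke, lower_box) for stroke in lower])


# A héng stroke modelled as two control points on a horizontal line.
HENG = [(0.0, 0.5), (1.0, 0.5)]

# Control points for "h h =", i.e. two héng strokes one above the other.
er = top_bottom([HENG], [HENG])
```

Only after every stroke has been placed in this way would the skeleton and outline steps run, which is why the composition operates on control points rather than on finished outlines.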

Using HanGlyph in LaTeX

The HanGlyph language provides a means of describing Chinese characters, and the CCSS system renders the characters so that we can have visual output. How can we then use them in the context of typesetting professional documents in LaTeX? Figure 4 illustrates the process flow.

First of all, the document author will embed the HanGlyph expressions of the required Chinese characters in the LaTeX source files. Typesetting the document will generate a HanGlyph file that contains all HanGlyph expressions in the source. This file is then processed by the CCSS fontmaker to generate a font in tfm and pk format so that the next time LaTeX is run it can find the font.

To allow the author to embed HanGlyph expressions and to use the synthesized characters in a LaTeX document, we developed a macro package hanglyph. This provides a very simple interface for authors to use HanGlyph. To define a new character, the author writes the HanGlyph expression in the preamble of a LaTeX document as below:

    \hgchar{tu}{h h=< s+_}
    \hgchar{shi}{h h=> s+_}

Each call to the macro \hgchar defines a new HanGlyph character. The macro takes two arguments: the first is a character name which can be used in the document to typeset the character; the second is the HanGlyph expression describing the character. For example, the first call to \hgchar above defines a new macro \tu to represent the character 土.

When processing the LaTeX document, each call to the macro \hgchar also writes a HanGlyph character definition to a HanGlyph file. Each character definition associates a character code with its HanGlyph expression. The hanglyph package automatically keeps track of the character code. By default, the first character, defined by the first call to the macro \hgchar, has the code 0.

FIG. 4: Typesetting process. LaTeX processes the LaTeX document with HanGlyph characters and produces a dvi file and a HanGlyph file; the CCSS font maker turns the HanGlyph file into a Chinese font in tfm and pk, which LaTeX uses on the next run.

In the body of the LaTeX document, the character names defined using the macro \hgchar can be used to typeset these characters. So the author can type

    The Chinese characters \tu{} and \shi{}
    are composed of the same strokes and in
    the same arrangements.

to typeset the following:

    The Chinese characters 土 and 士
    are composed of the same strokes and in
    the same arrangements.

To use the \hgchar macro, the author should include the hanglyph package in the LaTeX source file. After running LaTeX once, a HanGlyph file having the suffix hgc and the same base name as the LaTeX document is generated. The author should execute the CCSS fontmaker to generate the tfm and pk files needed by LaTeX and the dvi driver. The fontmaker should be executed each time the embedded HanGlyph expressions are changed so that the font is up-to-date. Figure 6 shows some sample characters generated by our system.

This approach of typesetting Chinese characters aims at applications where only a small number of rarely used characters appear in a document. By 'rarely used characters', we mean those characters that cannot be found in commonly available fonts, such as those that come with the popular operating systems or typesetting systems. Our method provides a simple and convenient way to solve this missing character problem. Because only a few such characters are expected in any one document, the current implementation of the hanglyph package handles up to 256 characters in a document, which is as many as a single tfm font can hold.

The implementation

The implementation is in two parts: the macro package hanglyph, and the CCSS fontmaker implemented as a suite of scripts and C programs.

The hanglyph package defines the macro \hgchar for the document author to specify the Chinese character and a number of auxiliary macros to manage the Chinese character font.

The package works as follows: All characters defined by calling \hgchar will be collected into a font with the default font name hgfont. Each character is assigned a character code in the order it is defined. A counter named hg@charcode keeps track of the number of characters that have been defined.

The character definition macro \hgchar takes two arguments, namely the character name and a HanGlyph expression. It performs two tasks: defining a character macro and writing the character description to the HanGlyph file.
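The bookkeeping described here can be modelled in a few lines. The Python class below is a sketch of the behaviour only, not the TeX code of the hanglyph package: it assigns codes in definition order starting from 0, records one definition per character for the HanGlyph file, and refuses a 257th character. The class name, the method name and the one-line record format "code expression" are assumptions made for this illustration; the actual layout of the hgc file is not given in the text.

```python
class HanGlyphRegistry:
    """Models the two tasks of \\hgchar: defining a named character
    and recording its description for the HanGlyph file."""

    MAX_CHARS = 256  # limit stated for the current implementation

    def __init__(self):
        self.codes = {}   # character name -> character code
        self.lines = []   # records destined for the HanGlyph file

    def hgchar(self, name, expression):
        if name in self.codes:
            raise ValueError(f"character {name!r} is already defined")
        code = len(self.codes)  # plays the role of the hg@charcode counter
        if code >= self.MAX_CHARS:
            raise OverflowError("at most 256 characters per document")
        self.codes[name] = code
        # Hypothetical record format: character code, then the expression.
        self.lines.append(f"{code} {expression}")
        return code


registry = HanGlyphRegistry()
registry.hgchar("tu", "h h=< s+_")
registry.hgchar("shi", "h h=> s+_")
hgc_text = "\n".join(registry.lines)
```

In the package itself the name would become a macro such as \tu that selects the synthesized font and typesets the character with the recorded code.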

In order to allow the character name macros to be used in the document body, the definitions must be placed in the preamble. At 'begin document', the HanGlyph file is closed; by then all character macros have been defined. Each character macro is defined to represent a single character in the default HanGlyph font.

By default, the font metric file and the glyph file have the same basename as the document file. The hanglyph package provides a macro \hgfontname for the author to change the font filename.

The CCSS fontmaker takes a HanGlyph file and generates a font to be used in the LaTeX document. The process is slightly long-winded but the philosophy is to use as many existing tools and file formats as possible, so that the generated files are compatible with existing systems. This process is depicted in Figure 5.

The core of the process is the Chinese character synthesis system. The output of CCSS is a set of encapsulated PostScript (eps) files. Each file contains a single glyph. They are then converted to bitmaps in portable bitmap format (pbm). This task is accomplished by Ghostscript. The set of pbm files is then merged into a large bitmap, which is then converted to a TeX font in a pair of gf and pl files. This pbm-to-gf conversion is performed by a utility called pbmtogf, developed by the first author several years ago [13] and since made available on CTAN. The program to merge a set of pbm files is pbmmerge, which is relatively simple to implement. The last two programs, pltotf and gftopk, are available in all TeX implementations. This process is easily integrated in a single script.
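The order of the steps can be written down as data and checked mechanically. The Python sketch below models only which tool consumes and produces which file format, as described in the text; it deliberately runs nothing and assumes no command-line options. The names STAGES and plan and the format labels are hypothetical choices for this illustration.

```python
# Each stage: (tool, formats it needs, formats it produces).
STAGES = [
    ("CCSS",        ["hgc"],        ["eps"]),         # one eps file per glyph
    ("Ghostscript", ["eps"],        ["pbm"]),         # rasterize each glyph
    ("pbmmerge",    ["pbm"],        ["merged pbm"]),  # one large bitmap
    ("pbmtogf",     ["merged pbm"], ["gf", "pl"]),    # generic font + properties
    ("pltotf",      ["pl"],         ["tfm"]),         # font metric for LaTeX
    ("gftopk",      ["gf"],         ["pk"]),          # packed font for the driver
]


def plan(stages, available):
    """Walk the stages in order, checking that every input format has
    already been produced; return the tool order and all formats made."""
    have = set(available)
    order = []
    for tool, needs, makes in stages:
        missing = [fmt for fmt in needs if fmt not in have]
        if missing:
            raise ValueError(f"{tool} is missing input: {missing}")
        have.update(makes)
        order.append(tool)
    return order, have


# Starting from the HanGlyph file written during the first LaTeX run.
order, formats = plan(STAGES, ["hgc"])
```

A driver script would run the tools in this order and stop at the first failure, since every later stage depends on the output of an earlier one.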

Discussion and conclusion

Using the HanGlyph description language and CCSS, one can generate Chinese character glyphs that are not found in commonly available fonts. To typeset such characters requires generating a font in the format understood by the typesetting system. The hanglyph package enables users of the LaTeX system to typeset rarely used Chinese characters.

At this stage, we have experimented with a small number of characters. The results (some of which are illustrated in Figure 6) show that this approach is feasible and very promising. We are planning to carry out more experiments to typeset a reasonably large set of characters. The purpose is to study the effect of rasterization and to improve the quality of the glyphs generated by CCSS. We are confident that our method of character synthesis can produce reasonably good quality and readable Chinese character glyphs.

It is obvious that the current font making process is a bit inefficient. Another shortcoming is the loss of scalability caused by converting the vector representation of character glyphs into a bitmap form. These weaknesses will certainly be eliminated as the development of CCSS progresses. The current stage of the development of CCSS concentrates on experimenting with and improving the glyph algorithms. It is planned that future versions of CCSS will generate outline fonts for HanGlyph input. This will certainly improve quality and usability.

This paper describes only one of many possible applications of HanGlyph and Chinese character synthesis. We envisage that CCSS will contribute greatly to easing a major difficulty in Chinese textual information exchange and presentation in a heterogeneous environment.

References

[1] 蘇培成 (sūpéichéng).《二十世紀的現代漢字研究》(èrshíshìjìde xiàndài hànzì yánjiū). 書海出版社 (shūhǎi chūbǎnshè), 2001. [SU Pei Cheng, 20th Century Research on Modern Chinese Characters, Shu Hai Press, 2001.]

[2] 劉連元 (liúliányuán).〈漢字拓扑結構分析〉(hànzì tuòpū jiégòu fēnxī). In《漢字》(hànzì). 上海教育出版社 (shànghǎi jiàoyù chūbǎnshè), 1993. [LIU Lian Yuan, Analysis of the Topological Structure of Chinese Characters, Shanghai Education Press, 1993.]

[3] 傅永和 (fùyǒnghé).《漢字結構和構造成分的基礎研究》(hànzì jiégòu hé gòuzào chéngfènde jīchǔ yánjiū). 上海教育出版社 (shànghǎi jiàoyù chūbǎnshè), 1993. [FU Yong He, Basic Research on the Structure of Chinese Characters and Their Constituents, Shanghai Education Press, 1993.]

[4] 傅永和 (fùyǒnghé).《中文信息處理》(zhōngwén xìnxī chǔlǐ). 廣東教育出版社 (guǎngdōng jiàoyù chūbǎnshè), 1999. [FU Yong He, Chinese Information Processing, Guangdong Education Press, 1999.]

[5] 馮志偉 (féngzhìwěi).《計算語言學基礎》(jìsuàn yǔyánxué jīchǔ). 商務印書館 (shāngwù yìnshūguǎn), 2001. [FENG Zhi Wei, Foundations of Computational Linguistics, The Commercial Press, 2001.]

[6] J. D. Hobby. A METAFONT-like system with PostScript output, TUGboat, Vol. 10, No. 4, pp. 505–512, December 1989.

[7] John D. Hobby. A User's Manual for METAPOST, AT&T Bell Laboratory Computer Science Technical Report, No. 162, 1992.

[8] Ken Lunde. CJKV information processing. O'Reilly, 1999.

[9] The Unicode Consortium. The Unicode Standard, Version 3.0. Addison-Wesley, 2000.

[10] The Unicode Consortium. Unicode Version 3.1, The Unicode Standard Annex No. 27. http://www.unicode.org/reports/tr27/

[11] The Unicode Consortium. Unicode Version 3.2, The Unicode Standard Annex No. 28. http://www.unicode.org/reports/tr28/

[12] The Unicode Consortium. The Unicode Standard, Version 4.0. Addison-Wesley, 2003.

[13] Wai Wong. pbmtogf: converting bitmap to font. http://www.comp.hkbu.edu.hk/~wwong/typeset/pbmtogf/

[14] Candy L.K. Yiu, Wai Wong. Chinese character synthesis using METAPOST. TUGboat, Vol. 24, No. 1, pp. 85–93, 2003 (TUG 2003 proceedings).

FIG. 5: The font making process. HanGlyph file → CCSS → character glyphs in eps → epstopbm → glyphs in pbm → pbmmerge → merged pbm → pbmtogf → generic font (gf file) and font property (pl file); pl file → pltotf → font metric (tfm file); gf file → gftopk → packed font (pk file).

Glyph      HanGlyph expression
[glyph]    p n |
[glyph]    hSt ba_1 | !~
[glyph]    wang_b_2 d q | ~ | !~ wang_a_2 | !~
[glyph]    zhou_1 shu_1 |
[glyph]    pian_4 /Pq kn ^7 _]/|
[glyph]    h s+ d Z|!~ h= @ ’
[glyph]    qih sih ^7 0.8 0.8 _ xin_1_b | hhs1 =
[glyph]    div qih sih ^7 0.8 0.8 _ xin_1_b | ^1
[glyph]    xin_1_b D q | ~ | xin_1_b |
[glyph]    ri_4 ri_4 = <
[glyph]    p h=0.4]e+#[
[glyph]    sih ph m = |!~ 0.5 0.5
[glyph]    sih qian_4 |!~ 0.5 0.5
[glyph]    she_2 xin_1_b |
[glyph]    zhi_3_a wp |
[glyph]    h S+
