Typesetting Rare Chinese Characters in LATEX - TUG
Typesetting Rare Chinese Characters in LATEX - TUG
Typesetting Rare Chinese Characters in LATEX - TUG
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Typesett<strong>in</strong>g</strong> <strong>Rare</strong> <strong>Ch<strong>in</strong>ese</strong> <strong>Characters</strong> <strong>in</strong> <strong>LATEX</strong><br />
Wai Wong, Candy L.K. Yiu, Kelv<strong>in</strong> C.F. Ng<br />
Department of Computer Science<br />
Hong Kong Baptist University<br />
Kowloon Tong, Kowloon<br />
Hong Kong<br />
wwong,candyyiu,ncfkelv<strong>in</strong>@comp.hkbu.edu.hk<br />
http://www.comp.hkbu.edu.hk/~wwong<br />
Abstract<br />
Written<strong>Ch<strong>in</strong>ese</strong>has tens of thousands of characters. But most available fonts conta<strong>in</strong> only around<br />
6to12thousandcommoncharactersthatcanmeettheneedsofeverydayusers. However,<strong>in</strong>publicationsand<strong>in</strong>formationexchange<strong>in</strong>manyprofessionalfields,anumberofrarecharactersthatare<br />
not <strong>in</strong> common fonts are needed <strong>in</strong> each document. This paper describes a method of typesett<strong>in</strong>gsuchrarecharacters<strong>in</strong>L<br />
A TEX. Thedocumentauthordescribesararecharacter<strong>in</strong>HanGlyph<br />
when such need arises. A <strong>Ch<strong>in</strong>ese</strong> character synthesis system renders the glyph accord<strong>in</strong>g to this<br />
description and collects the newly created glyphs <strong>in</strong>to a font so they are available <strong>in</strong> the body of<br />
theL A TEX document.<br />
Résumé<br />
Lech<strong>in</strong>oisécritpossèdedesdiza<strong>in</strong>esdemilliersd’idéogrammes.Maislaplupartdesfontesdisponibles<br />
ne contiennent que quelques 6 et 12 mille idéogrammes standard. Néanmo<strong>in</strong>s, les publications<br />
en sciences huma<strong>in</strong>es, comme la littérature classique, l’archéologie, ou, tout simplement, les<br />
registres municipaux denoms, nécessitentl’utilisationdegrand nombre d’idéogrammes rares.<br />
Cetarticledécrit uneméthode decomposition dedetels caractères rares sous L A TEX.L’auteur<br />
du document décrit le caractère rare dans une syntaxe spéciale, appelée HanGlyph. Un système<br />
desynthèsedecaractèresch<strong>in</strong>oisgénèreleglyphed’aprèscettedescriptionetassemblelesglyphes<br />
a<strong>in</strong>si produits dans une fonte pour être immédiatement utilisés par L A TEX dans le corps du document.<br />
Introduction<br />
The <strong>Ch<strong>in</strong>ese</strong> written script is ideographic. Each ideograph,<br />
known as hanzi, or more commonly ‘<strong>Ch<strong>in</strong>ese</strong><br />
character’,hasitsownvisualstructureandcarriescerta<strong>in</strong><br />
mean<strong>in</strong>g. There are tens of thousands of hanzi or <strong>Ch<strong>in</strong>ese</strong>characters.<br />
The most notable difference between an ideographic<br />
script and a phonographic script, such as the<br />
Lat<strong>in</strong> alphabet,Slavonic alphabet,and so on, isthe huge<br />
number of characters that the former scriptconta<strong>in</strong>s.<br />
Historically, the number of hanzi <strong>in</strong> existence <strong>in</strong>creases<br />
with time. This is reflected <strong>in</strong> many dictionariespublished<strong>in</strong>various<br />
times. Forexample,thefirst <strong>in</strong>fluential<br />
book on hanzi 說 文 解 字 (shuōwénjiězì 1 ) published<br />
around 100 AD collected 10,516 characters. By<br />
the time of the Q<strong>in</strong>g dynasty, the more famous dictionary<br />
康 熙 字 典 (kāngxīzìdiǎn) (published<strong>in</strong> 1716) conta<strong>in</strong>s<br />
47,043 characters. The largest contemporary dictionary<br />
中 華 字 海 (zhōnghuázìhǎi)documentsastagger-<br />
1. The word shuōwénjiězì follow<strong>in</strong>g the hanzi is <strong>in</strong> p<strong>in</strong>y<strong>in</strong>, a<br />
phonetic transcription of <strong>Ch<strong>in</strong>ese</strong> characters.<br />
<strong>in</strong>g86,000 characters [5].<br />
In order to facilitate the mach<strong>in</strong>e process<strong>in</strong>g of the<br />
<strong>Ch<strong>in</strong>ese</strong>script,codedcharactersetshavebeendeveloped<br />
[8]. Commonly used coded <strong>Ch<strong>in</strong>ese</strong> character sets are<br />
GB2312-80,CNS11643,Big5,JISX0208-1983andKS<br />
X 1001:1992. The number of <strong>Ch<strong>in</strong>ese</strong> characters encoded<br />
<strong>in</strong> these character sets varies from around 4,888<br />
(<strong>in</strong> KSX1001:1992)to48,027(<strong>in</strong> CNS11643-1992).<br />
The ma<strong>in</strong> reason beh<strong>in</strong>d the selection of characters<strong>in</strong>theseencodedcharactersetsisthefrequencywith<br />
which each character appears <strong>in</strong> common documents,<br />
such as newspapers, textbooks and bus<strong>in</strong>ess correspondence.<br />
Astatisticalstudyrevealsthataround6,600<strong>Ch<strong>in</strong>ese</strong><br />
characters can cover 99.999% of daily use [5]. It<br />
seems that hav<strong>in</strong>g a large enough encoded character set<br />
willsolvetheproblem.<br />
Recent <strong>in</strong>ternational effort <strong>in</strong> character set standardization<br />
results <strong>in</strong> the Unicode (version 4.0 released<br />
<strong>in</strong> May 2003) [12] and ISO/IEC 10646 standards.<br />
Unicode 3.0 [9] encoded 27,484 <strong>Ch<strong>in</strong>ese</strong> characters.<br />
42,711 characters wereadded<strong>in</strong> version3.2 [11].<br />
Althoughthenew<strong>in</strong>ternationalstandards <strong>in</strong>cludea<br />
582 <strong>TUG</strong>boat,Volume24 (2003), No. 3—Proceed<strong>in</strong>gsof EuroTEX2003
<strong>Typesett<strong>in</strong>g</strong><strong>Rare</strong> <strong>Ch<strong>in</strong>ese</strong><strong>Characters</strong> <strong>in</strong> L A TEX<br />
huge number of <strong>Ch<strong>in</strong>ese</strong> characters, <strong>in</strong> practice, several<br />
problems rema<strong>in</strong> unsolved. The first is that the design<br />
and creation of fonts of a huge character set is very expensive.<br />
Itisalsonoteconomicaltoprocessandma<strong>in</strong>ta<strong>in</strong><br />
suchlargefontsgiventhatmanyofthecharacters<strong>in</strong>them<br />
arerarelyused. Furthermore,alargenumberofexist<strong>in</strong>g<br />
systems maynot beableto handlesuchalargecharacter<br />
set. Transferr<strong>in</strong>g a document to such older systems will<br />
result <strong>in</strong>miss<strong>in</strong>gcharacters or evencrash thesystem.<br />
Onepossiblesolutionistosynthesizethecharacters<br />
whenneeded. Wehavedesigneda<strong>Ch<strong>in</strong>ese</strong>characterdescriptionlanguagecalledHanGlyph.<br />
Itisahigh-levelabstract<br />
description language which captures the essential<br />
features of characters. These features are the topology<br />
ofthestrokesandtheirrelativesizeandlocation. Weare<br />
develop<strong>in</strong>g a <strong>Ch<strong>in</strong>ese</strong> character synthesis system (CCSS)<br />
torenderthecharactersfromtheHanGlyphdescription.<br />
TheHanGlyphlanguageandthemethodofsynthesiz<strong>in</strong>g<br />
<strong>Ch<strong>in</strong>ese</strong>characterswillbedescribed<strong>in</strong>thenextsection.<br />
Us<strong>in</strong>g the HanGlyph description and the character<br />
synthesis system, we can typeset rarely used <strong>Ch<strong>in</strong>ese</strong><br />
characters <strong>in</strong> L A TEX documents. We developed a simple<br />
macro package to allow the document author to embed<br />
HanGlyph descriptions <strong>in</strong> a L A TEX document. Dur<strong>in</strong>g<br />
the first time the <strong>LATEX</strong> document is formatted, a Han-<br />
Glyph file conta<strong>in</strong><strong>in</strong>g all character descriptions is generated.<br />
This file is then processed by the synthesis system<br />
to create a font. The next time the L A TEX document is<br />
formatted, the newly generated <strong>Ch<strong>in</strong>ese</strong> character font<br />
is accessible. Thus, the character can be typeset. Later<br />
sections will describe how to use this macro and outl<strong>in</strong>e<br />
itsimplementation.<br />
<strong>Ch<strong>in</strong>ese</strong> character synthesis<br />
TheHanGlyphlanguage is a <strong>Ch<strong>in</strong>ese</strong> character description<br />
language. It provides a means of describ<strong>in</strong>g the<br />
topological arrangements of strokes <strong>in</strong> <strong>Ch<strong>in</strong>ese</strong> characters.<br />
Each <strong>Ch<strong>in</strong>ese</strong> character can be decomposed <strong>in</strong>to a<br />
numberofpartscalledcomponents. Eachcomponentconsistsofanumberofstrokes.<br />
Forexample,asillustrated<strong>in</strong><br />
Figure1,thecharacter 明 (míngmean<strong>in</strong>gbright)canbe<br />
decomposed <strong>in</strong>to two components 日 (rì) and 月 (yuè).<br />
Thefirstcomponentconsistsoffourstrokes:s( 豎 shù),<br />
i( 橫 折 hèngzhé),h( 橫 héng)andh.<br />
A HanGlyph expression describes the arrangement<br />
of the strokes <strong>in</strong> an abstract fashion. This means that<br />
only topological <strong>in</strong>formation is captured and no geometric<strong>in</strong>formationis<strong>in</strong>cluded.<br />
Forexample,theHanGlyph<br />
expression describ<strong>in</strong>g the character 二 is h h = which<br />
means the character is composed by two héng strokes,<br />
one above the other. We do not need to specify the exactcoord<strong>in</strong>atesofthestart<strong>in</strong>gpo<strong>in</strong>tofeachstroke.<br />
These<br />
willbeworkedout bythecharacter synthesizer.<br />
Oneofthecriteriaforwrit<strong>in</strong>gagoodHanGlyphex-<br />
<strong>Characters</strong><br />
✁<br />
✁<br />
✁<br />
✁✁<br />
Components 日<br />
✓✄❈ ❙<br />
✓ ✄ ❈❈❈❙❙❙<br />
✓<br />
✄<br />
明<br />
❆<br />
❆❆❆❆<br />
月<br />
✓✄❈ ❙<br />
✓ ✄ ❈❈❈❙❙❙<br />
✓<br />
✄<br />
Strokess i h h q j h h<br />
FIG. 1: Composition of a <strong>Ch<strong>in</strong>ese</strong>character<br />
pressionistobeabletodist<strong>in</strong>guishsimilarcharacters. For<br />
example, the characters 土 and 士 are composed of the<br />
same strokes and <strong>in</strong> the same arrangements. The only<br />
difference is the relative length of the twohstrokes.<br />
The HanGlyph expressions to describe these characters<br />
are h h=< s+_ and h h=> s+_, respectively. The dimensional<br />
relation symbols < and > specify the relative<br />
lengthof thetwohstrokes.<br />
Because the smallest build<strong>in</strong>g blocks of a <strong>Ch<strong>in</strong>ese</strong><br />
character are the strokes, the HanGlyph language takes<br />
the strokes as its primitives. Accord<strong>in</strong>g to many studiesof<br />
<strong>Ch<strong>in</strong>ese</strong>characters, weselected41 such primitive<br />
strokes. HanGlyphcomposescharactersus<strong>in</strong>gasmallset<br />
of fiveoperationswhichare illustrated<strong>in</strong> Figure2:<br />
• top-bottom,<br />
• left-right,<br />
• fully-enclosed,<br />
• partially-enclosed,and<br />
• cross<strong>in</strong>g.<br />
Itwouldbeverytediousifeverycharacterdescriptionconta<strong>in</strong>sdetailsdowntoeachs<strong>in</strong>glestroke.<br />
Us<strong>in</strong>gthe<br />
factthatcharacterscanbedecomposed<strong>in</strong>tocomponents<br />
andmanycomponentsappear<strong>in</strong>anumberofcharacters,<br />
HanGlyphallowsmacrostobedef<strong>in</strong>edtostandforcomponents.<br />
Thus,expressions can beveryconcise.<br />
Insummary, HanGlyphprovidesanabstractwayto<br />
describe <strong>Ch<strong>in</strong>ese</strong> characters which captures their essential<br />
characteristics so that they can be rendered. Details<br />
of the HanGlyph language can be found <strong>in</strong> our paper to<br />
bepresentedat <strong>TUG</strong>2003[14].<br />
The<strong>Ch<strong>in</strong>ese</strong>charactersynthesissystem (CCSS)isresponsible<br />
for render<strong>in</strong>g the character glyphs from their Han-<br />
Glyph descriptions. The core of CCSS is implemented<br />
<strong>in</strong> METAPOST [6]. It consists of a library of META-<br />
POST macros implement<strong>in</strong>g various HanGlyph operations.<br />
The <strong>in</strong>put to CCSS is a sequence of HanGlyph<br />
<strong>TUG</strong>boat,Volume24 (2003), No.3—Proceed<strong>in</strong>gsof EuroTEX2003 583
WaiWong,Candy L.K.Yiu,Kelv<strong>in</strong>C.F.Ng<br />
Top−bottom:<br />
Partially−<br />
enclose:<br />
0 1<br />
2 3<br />
Left−right:<br />
4 5 6 7<br />
Enclose:<br />
Cross:<br />
FIG. 2: Graphical representationof operators<br />
1 2<br />
3 4<br />
7<br />
6<br />
5<br />
(a) (b) (c)<br />
FIG. 3: Basic stroke macros for the strokeJ<br />
expressions. A front end translator converts these Han-<br />
Glyphexpressions<strong>in</strong>toaMETAPOSTprogram. Runn<strong>in</strong>g<br />
METAPOST on this programwillthengenerateaset of<br />
PostScript fileseach of whichconta<strong>in</strong>s a s<strong>in</strong>gleglyph.<br />
The METAPOST macro library is organized <strong>in</strong>to<br />
two major parts: the primitive strokes and the composition<br />
operations. Each primitive stroke is implemented<br />
as three METAPOST macros, namely a control po<strong>in</strong>t<br />
macro,askeletonmacroandanoutl<strong>in</strong>emacro. Thecontrol<br />
po<strong>in</strong>t macro def<strong>in</strong>es the control po<strong>in</strong>ts, the skeleton<br />
macrospecifiestheskeletalpaththatconnectsthecontrol<br />
po<strong>in</strong>ts, and the outl<strong>in</strong>e macro draws the outl<strong>in</strong>e which<br />
aredef<strong>in</strong>edrelativetothecontrolpo<strong>in</strong>tsandtheskeletal<br />
path. Figure3showsasample stroke.<br />
The composition operation macros perform the<br />
composition. This is done by transform<strong>in</strong>g the control<br />
po<strong>in</strong>tsofeachstrokeandpositionthemtothecorrectlocationwith<strong>in</strong>acharacterbound<strong>in</strong>gbox.<br />
Whenallstrokes<br />
of a character are positioned at the correct location, the<br />
skeleton macros are called to def<strong>in</strong>e the skeletal strokes.<br />
Then,theoutl<strong>in</strong>emacrosarecalledtodrawtheoutl<strong>in</strong>e.<br />
Us<strong>in</strong>g HanGlyph <strong>in</strong> L A TEX<br />
The HanGlyph language provides a means of describ<strong>in</strong>g<strong>Ch<strong>in</strong>ese</strong>characters,andthe<br />
CCSSsystemrendersthe<br />
characters so that we can have visual output. How can<br />
we then use them <strong>in</strong> the context of typesett<strong>in</strong>g professional<br />
documents <strong>in</strong> L A TEX? Figure4illustrates theprocess<br />
flow.<br />
First of all, the document author will embed the<br />
HanGlyph expressions of the required <strong>Ch<strong>in</strong>ese</strong> characters<strong>in</strong>theL<br />
A TEXsourcefiles. <strong>Typesett<strong>in</strong>g</strong>thedocument<br />
willgenerateaHanGlyphfilethatconta<strong>in</strong>sallHanGlyph<br />
expressions <strong>in</strong> the source. This file is then processed by<br />
theCCSSfontmakertogenerateafont<strong>in</strong>tfmandpkformatsothatthenexttimeL<br />
A TEXisrunitcanf<strong>in</strong>dthefont.<br />
To allow the author to embed HanGlyph expressions<br />
and to use the synthesized characters <strong>in</strong> a <strong>LATEX</strong><br />
document, we developed a macro package hanglyph.<br />
This provides a very simple <strong>in</strong>terface for authors to use<br />
HanGlyph. Todef<strong>in</strong>eanewcharacter,theauthorwrites<br />
the HanGlyph expression <strong>in</strong> the preamble of a L A TEX<br />
document as below:<br />
\hgchar{tu}{h h=< s+_}<br />
\hgchar{shi}{h h=> s+_}<br />
Eachcalltothemacro\hgchardef<strong>in</strong>esanewHanGlyph<br />
character. The macro takes two arguments: The first is<br />
a character name which can be used <strong>in</strong> the document to<br />
typeset the character; the second is the HanGlyph expression<br />
describ<strong>in</strong>gthe character. For example, the first<br />
call to \hgchar above def<strong>in</strong>es a new macro \tu to represent<br />
thecharacter 土 .<br />
When process<strong>in</strong>g the L A TEX document, each call to<br />
the macro \hgchar also writes a HanGlyph character<br />
def<strong>in</strong>ition to a HanGlyph file. Each character def<strong>in</strong>ition<br />
associates a character code to its HanGlyph expression.<br />
584 <strong>TUG</strong>boat,Volume24 (2003), No. 3—Proceed<strong>in</strong>gsof EuroTEX2003
<strong>Typesett<strong>in</strong>g</strong><strong>Rare</strong> <strong>Ch<strong>in</strong>ese</strong><strong>Characters</strong> <strong>in</strong> L A TEX<br />
LaTeX document<br />
with<br />
HanGlyph characters<br />
<strong>Ch<strong>in</strong>ese</strong> font<br />
<strong>in</strong><br />
tfm and pk<br />
LaTeX<br />
CCSS<br />
Font maker<br />
dvi file<br />
HanGlyph file<br />
FIG. 4: <strong>Typesett<strong>in</strong>g</strong>process<br />
The hanglyphpackageautomaticallykeepstrackofthe<br />
charactercode. Bydefault,thefirstcharacter,def<strong>in</strong>edby<br />
thefirst call to themacro \hgchar, has thecode0.<br />
In the body of the L A TEX document, the character<br />
names def<strong>in</strong>ed us<strong>in</strong>g the macro \hgchar can be used to<br />
typeset thesecharacters. So theauthor can type<br />
The <strong>Ch<strong>in</strong>ese</strong> characters \tu{} and \shi{}<br />
are composed of the same strokes and <strong>in</strong><br />
the same arrangements.<br />
to typeset thefollow<strong>in</strong>g:<br />
The<strong>Ch<strong>in</strong>ese</strong> characters 土 and 士<br />
are composed of thesame strokes and <strong>in</strong><br />
thesame arrangements.<br />
To use the \hgchar macro, the author should <strong>in</strong>clude<br />
the hanglyph package <strong>in</strong> the <strong>LATEX</strong> source file.<br />
After runn<strong>in</strong>g L A TEX once, a HanGlyph file hav<strong>in</strong>g the<br />
suffix hgc and the same base name as the L A TEX documentisgenerated.<br />
Theauthorshouldexecutethe CCSS<br />
fontmaker to generate the tfm and pk files needed by<br />
L A TEXandthedvidriver. Thefontmakershouldbeexecuted<br />
each time the embedded HanGlyph expressions<br />
arechangedsothatthefontisup-to-date. Figure6shows<br />
some sample characters generatedbyour system.<br />
This approach of typesett<strong>in</strong>g <strong>Ch<strong>in</strong>ese</strong> characters<br />
aimsatapplicationswhereonlyasmallnumberofrarely<br />
used characters appear <strong>in</strong> a document. By ‘rarely used<br />
characters’, we mean those characters that cannot be<br />
found <strong>in</strong> commonly available fonts, such as those that<br />
come with the popular operat<strong>in</strong>g systems or typesett<strong>in</strong>g<br />
systems. Our method provides a simple and convenient<br />
wayto solvethis miss<strong>in</strong>g character problem. Therefore,<br />
the current implementationof the hanglyph packageis<br />
ableto handleup to 256 characters <strong>in</strong>adocument.<br />
The implementation<br />
The implementation is <strong>in</strong> two parts: the macro package<br />
hanglyph, and the CCSS fontmaker implemented as a<br />
suiteof scriptsand Cprograms.<br />
The hanglyph package def<strong>in</strong>es the macro \hgchar for<br />
the document author to specify the <strong>Ch<strong>in</strong>ese</strong> character<br />
andanumberofauxiliarymacrostomanagethe<strong>Ch<strong>in</strong>ese</strong><br />
character font.<br />
The package works as follows: All characters def<strong>in</strong>edbycall<strong>in</strong>g\hgcharwillbecollected<strong>in</strong>toafontwith<br />
thedefaultfontnamehgfont. Eachcharacterisassigned<br />
a character code <strong>in</strong> the order it is def<strong>in</strong>ed. A counter<br />
namedhg@charcodekeepstrackofthenumberofcharacters<br />
that havebeen def<strong>in</strong>ed.<br />
Thecharacterdef<strong>in</strong>itionmacro \hgchartakestwo<br />
arguments, namely the character name and a HanGlyph<br />
expression. It performs two tasks: def<strong>in</strong><strong>in</strong>g a character<br />
macroandwrit<strong>in</strong>gthecharacterdescriptiontotheHan-<br />
Glyphfile.<br />
In order to allow the character name macros to<br />
be used <strong>in</strong> the document body, the def<strong>in</strong>itions must be<br />
placed <strong>in</strong> the preamble. At ‘beg<strong>in</strong> document’, the Han-<br />
Glyph file will be closed and all character macros have<br />
been def<strong>in</strong>ed. Each character macro is def<strong>in</strong>edto represent<br />
as<strong>in</strong>glecharacter <strong>in</strong> thedefaultHanGlyphfont.<br />
By default, the font metric file and the glyph file<br />
have the same basename as the document file. The<br />
hanglyph package provides a macro \hgfontname for<br />
theauthor to changethe font filename.<br />
TheCCSSfontmaker takesaHanGlyphfileandgenerates<br />
afonttobeused<strong>in</strong>theL A TEXdocument. Theprocessis<br />
slightlylong-w<strong>in</strong>dedbutthephilosophyistouseasmany<br />
<strong>TUG</strong>boat,Volume24 (2003), No.3—Proceed<strong>in</strong>gsof EuroTEX2003 585
WaiWong,Candy L.K.Yiu,Kelv<strong>in</strong>C.F.Ng<br />
exist<strong>in</strong>gtoolsandfileformatsaspossible,sothatthegenerated<br />
files are compatible with exist<strong>in</strong>g systems. This<br />
process isdepicted<strong>in</strong>Figure5.<br />
The core of the process is the <strong>Ch<strong>in</strong>ese</strong> character<br />
synthesis system. The output of CCSS is a set of encapsulated<br />
PostScript (eps) files. Each file conta<strong>in</strong>s a<br />
s<strong>in</strong>gle glyph. They are then converted to bitmaps <strong>in</strong><br />
portablebitmapformat(pbm). Thistaskisaccomplished<br />
byGhostscript. Thesetofpbmfilesisthenmerged<strong>in</strong>toa<br />
largebitmap,whichisthenconvertedtoaTEXfont<strong>in</strong>a<br />
pairofgfandplfiles. Thispbm-to-gfconversionisperformedbyautilitycalledpbmtogfdevelopedbythefirst<br />
authorseveralyearsago[13]ands<strong>in</strong>cebeenmadeavailableon<br />
CTAN. Theprogramtomergeasetofpbmfiles<br />
is pbmmerge, which is relatively simple to implement.<br />
The last two programs, pltotf and gftopk, are available<br />
<strong>in</strong> all TEX implementations. This process is easily<br />
<strong>in</strong>tegrated<strong>in</strong>as<strong>in</strong>glescript.<br />
Discussion and conclusion<br />
Us<strong>in</strong>gtheHanGlyphdescriptionlanguageandCCSS,one<br />
cangenerate<strong>Ch<strong>in</strong>ese</strong>characterglyphsthatarenotfound<br />
<strong>in</strong> commonly available fonts. To typeset such characters<br />
requires generat<strong>in</strong>g a font <strong>in</strong> the format understood by<br />
the typesett<strong>in</strong>g system. The hanglyph package enables<br />
usersoftheL A TEXsystemtotypesetrarelyused<strong>Ch<strong>in</strong>ese</strong><br />
characters.<br />
At this stage, we have experimented with a small<br />
numberofcharacters. Theresults(someofwhichareillustrated<strong>in</strong>Figure6)showthatthisapproachisfeasible,<br />
and very promis<strong>in</strong>g. We are plann<strong>in</strong>g to carry out more<br />
experiments to typeset a reasonably large set of characters.<br />
The purpose is to study the effect of rasterization<br />
and to improve the quality of the glyphs generated by<br />
CCSS. We are confident that our method of character<br />
synthesis can produce reasonably good quality and readable<strong>Ch<strong>in</strong>ese</strong><br />
character glyphs.<br />
Itisobviousthatthecurrentfont mak<strong>in</strong>gprocessis<br />
a bit <strong>in</strong>efficient. Another shortcom<strong>in</strong>g is the loss of scalability<br />
by convert<strong>in</strong>g the vector representation of character<br />
glyphs <strong>in</strong>to a bitmap form. These weaknesses will<br />
certa<strong>in</strong>lybeelim<strong>in</strong>atedasthedevelopmentofCCSSprogresses.<br />
ThecurrentstageofthedevelopmentofCCSSconcentrates<br />
on the experiment and improvement of glyph<br />
algorithms. ItisplannedthatfutureversionsofCCSSwill<br />
generateoutl<strong>in</strong>efontsforHanGlyph<strong>in</strong>put. Thiswillcerta<strong>in</strong>lyimprovequalityandusability.<br />
Thispaperdescribesonlyoneof many possibleapplicationsof<br />
HanGlyphand<strong>Ch<strong>in</strong>ese</strong> character synthesis.<br />
WeenvisagethatCCSSwillcontributegreatlytoeas<strong>in</strong>ga<br />
majordifficulty<strong>in</strong><strong>Ch<strong>in</strong>ese</strong>textual<strong>in</strong>formationexchange<br />
and presentation<strong>in</strong> aheterogenous environment.<br />
References<br />
[1] 蘇 培 成 (sūpéichéng).《 二 十 世 紀 的 現 代 漢 字 研<br />
究 》(èrshíshìjìde xiàndài hànzì xiānjiù) 書 海 出 版<br />
社 (shūhǎi chūbǎnshè),2001.[SUPeiCheng,20th<br />
Century Research on Modern <strong>Ch<strong>in</strong>ese</strong> <strong>Characters</strong>, Su<br />
HaiPress,2001.]<br />
[2] 劉 連 元 (liúliányuán).〈 漢 字 拓 扑 結 構 分 析 〉(hànzì<br />
tuōpǔ jiékoù fēnxī). In《 漢 字 》(hànzì). 上 海 教<br />
育 出 版 社 (shànghǎi jiàoyù chūbǎnshè), 1993.<br />
[LIU Lian Yuan, Analysis of the Topological Structureof<strong>Ch<strong>in</strong>ese</strong><strong>Characters</strong>,ShanghaiEducationPress,<br />
1993.]<br />
[3] 傅 永 和 (fùyónghé).《 漢 字 結 構 和 構 造 成 分 的 基<br />
楚 研 究 》(hànzì jiékoù he koùzào chéngfènde jī<br />
chǔ yānjiù). 上 海 教 育 出 版 社 (shànghǎi jiàoyù chū<br />
bǎnshè), 1993. [FU Yong He, BasicResearchonthe<br />
Structureof<strong>Ch<strong>in</strong>ese</strong>charactersandTheirConstituents,<br />
ShanghaiEducation Press,1993.]<br />
[4] 傅 永 和 (fùyónghé).《 中 文 信 息 處 理 》(zhōngwén<br />
xìnxī chǔlǐ). 廣 東 教 育 出 版 社 (guǎngdōng jiàoyù<br />
chūbǎnshè),1999. [FU Yong He, <strong>Ch<strong>in</strong>ese</strong>InformationProcess<strong>in</strong>g,GuangdongEducation<br />
Press,1999.]<br />
[5] 馮 志 偉 (féngzhìwěi).《 計 算 語 言 學 基 礎 》(jìsuàn<br />
yǔyánxué jīchǔ). 商 務 印 書 館 (shāngwù yìnshū<br />
guǎng), 2001. [FENG Zhi Wei, Foundations of<br />
Computational L<strong>in</strong>guistics, The Commercial Press,<br />
2001.]<br />
[6] J. D. Hobby. A METAFONT-likesystemwithPost-<br />
Script output, <strong>TUG</strong>boat, Vol. 10, No. 4, pp. 505–<br />
512, December1989.<br />
[7] John D. Hobby. AUser’sManualfor METAPOST,<br />
AT&T Bell Laboratory Computer Science TechnicalReport,<br />
No. 162, 1992.<br />
[8] Ken Lunde. CJKV<strong>in</strong>formationprocess<strong>in</strong>g. O’Reilly,<br />
1999.<br />
[9] The Unicode Consortium. The Unicode Standard,<br />
Version3.0. Addison-Wesley,2000.<br />
[10] The Unicode Consortium. Unicode Version<br />
3.1, The Unicode Standard Annex No. 27.<br />
http://www.unicode.org/reports/tr27/<br />
[11] The Unicode Consortium. Unicode Version<br />
3.2, The Unicode Standard Annex No. 28.<br />
http://www.unicode.org/reports/tr28/<br />
[12] The Unicode Consortium. TheUnicodeStandard,<br />
Version4.0. Addison-Wesley,2003.<br />
[13] Wai Wong. pbmtogf—convert<strong>in</strong>g bitmap to font.<br />
http://www.comp.hkbu.edu.hk/~wwong/<br />
typeset/pbmtogf/<br />
[14] CandyL.K.Yiu,WaiWong. <strong>Ch<strong>in</strong>ese</strong>charactersynthesisus<strong>in</strong>g<br />
METAPOST. <strong>TUG</strong>boat,Vol.24, No.1,<br />
pp.85–93, 2003 (<strong>TUG</strong>2003 proceed<strong>in</strong>gs).<br />
586 <strong>TUG</strong>boat,Volume24 (2003), No. 3—Proceed<strong>in</strong>gsof EuroTEX2003
<strong>Typesett<strong>in</strong>g</strong><strong>Rare</strong> <strong>Ch<strong>in</strong>ese</strong><strong>Characters</strong> <strong>in</strong> L A TEX<br />
HanGlyph<br />
file<br />
Font property<br />
pl file<br />
pltotf<br />
Font metric<br />
tfm file<br />
merged pbm<br />
pbmtogf<br />
CCSS<br />
pbmmerge<br />
Generic font<br />
gf file<br />
gftopk<br />
Packed font<br />
pk file<br />
Character<br />
glyphs <strong>in</strong> eps<br />
epstopbm<br />
glyphs <strong>in</strong> pbm<br />
¡ ¢<br />
FIG. 5: The fontmak<strong>in</strong>gprocess<br />
£<br />
Glyph HanGlyph expression<br />
¤<br />
p n |<br />
¥<br />
hSt ba_1 | !~<br />
¦<br />
wang_b_2 d q | ~ | !~ wang_a_2 | !~<br />
§<br />
zhou_1 shu_1 |<br />
¨<br />
pian_4 /Pq kn ^7 _]/|<br />
©<br />
h s+ d Z|!~ h= @ ’<br />
Å<br />
qih sih ^7 0.8 0.8 _ x<strong>in</strong>_1_b | hhs1 =<br />
«<br />
div qih sih ^7 0.8 0.8 _ x<strong>in</strong>_1_b | ^1<br />
¬<br />
x<strong>in</strong>_1_b D q | ~ | x<strong>in</strong>_1_b |<br />
<br />
ri_4 ri_4 = <<br />
Æ<br />
p h=0.4]e+#[<br />
¯<br />
sih ph m = |!~ 0.5 0.5<br />
<br />
sih qian_4 |!~ 0.5 0.5<br />
<br />
she_2 x<strong>in</strong>_1_b |<br />
<br />
zhi_3_a wp |<br />
<br />
h S+