Unicode mathematics in LATEX: Advantages and challenges - TUG
Unicode mathematics in LATEX: Advantages and challenges - TUG
Unicode mathematics in LATEX: Advantages and challenges - TUG
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
218 <strong>TUG</strong>boat, Volume 31 (2010), No. 2<br />
<strong>and</strong> simply switch<strong>in</strong>g the math font. This meant<br />
that, <strong>in</strong>ternally, \mathbf <strong>and</strong> friends simply resulted<br />
<strong>in</strong> a font switch, which is efficient <strong>and</strong> straightforward<br />
(although sometimes tricky to juggle with only<br />
sixteen maths fonts <strong>in</strong> eight-bit TEX).<br />
Unfortunately, th<strong>in</strong>gs are not so simple with<br />
unicode-math. With<strong>in</strong> <strong>Unicode</strong>, each alphabet style<br />
(for bold <strong>and</strong> script <strong>and</strong> so on) is encoded <strong>in</strong> a dist<strong>in</strong>ct<br />
<strong>Unicode</strong> range. For example, the italic mathematical<br />
‘w’ accessed with $w$ is symbol U+1D464. The<br />
bold upright mathematical ‘w’ ($\mathbf{w}$) is<br />
U+1D430. In order to switch from one to the other<br />
us<strong>in</strong>g a comm<strong>and</strong> like \mathbf requires that the<br />
mathcodes for all affected letters must change locally<br />
<strong>in</strong>side its argument. This isn’t too <strong>in</strong>efficient, s<strong>in</strong>ce<br />
assign<strong>in</strong>g \mathcodes is pretty fast, but it’s not particularly<br />
elegant. It would be easier not to support<br />
the \mathbf{...}-style syntax at all <strong>and</strong> <strong>in</strong>stead<br />
refer to such symbols with macros, such as \mbfw for<br />
the bold ‘w’. But we must support the switch<strong>in</strong>gstyle<br />
comm<strong>and</strong>s for backwards compatibility.<br />
There are some alternatives to do<strong>in</strong>g th<strong>in</strong>gs this<br />
way, but they all have trade-offs. The simplest solution<br />
would be to use X TEX’s <strong>in</strong>put mapp<strong>in</strong>g feature<br />
that allows letters <strong>in</strong> the source to be transformed<br />
<strong>in</strong>to other letters before typesett<strong>in</strong>g. Thus, the ‘variable<br />
math code’ approach as used <strong>in</strong> L A TEX could be<br />
used for <strong>Unicode</strong> maths alphabets. However, this<br />
system is less flexible (features such as Example 8<br />
would be more difficult) <strong>and</strong> an alternative approach<br />
would be required for LuaTEX.<br />
Another approach would be to use (math-)active<br />
characters for all maths symbols. In this approach,<br />
ASCII letters such as ‘a’ would be active <strong>in</strong> maths<br />
mode <strong>and</strong> exp<strong>and</strong> (as if it were a macro) to a construction<br />
such as<br />
E<br />
\csname mathchar_\mathstyle_a \endcsname<br />
where \mathstyle would resolve to ‘up’ or ‘bf’ (etc.)<br />
depend<strong>in</strong>g on the context <strong>and</strong> \mathchar_up_a <strong>and</strong><br />
\mathchar_bf_a (etc.) would be def<strong>in</strong>ed accord<strong>in</strong>gly<br />
with the appropriate <strong>Unicode</strong> maths glyph. This is<br />
more efficient for font switch<strong>in</strong>g but less efficient (<strong>and</strong><br />
perhaps more fragile) when symbol remapp<strong>in</strong>g is not<br />
tak<strong>in</strong>g place. Us<strong>in</strong>g active characters is the technique<br />
used by the breqn package to do its automatic l<strong>in</strong>e<br />
break<strong>in</strong>g of <strong>mathematics</strong>, <strong>and</strong> extend<strong>in</strong>g that system<br />
for unicode-math would be quite logical. One way or<br />
the other, breqn compatibility is planned for unicodemath<br />
<strong>in</strong> the future.<br />
Us<strong>in</strong>g LuaTEX for alphabet remapp<strong>in</strong>g (which<br />
is how ConTEXt’s implementation works) is probably<br />
the best way to tackle this problem, but while<br />
unicode-math is written for X L A TEX as well (<strong>and</strong><br />
E<br />
it will cont<strong>in</strong>ue to be for the immediate future) we<br />
must stick with TEX-based programm<strong>in</strong>g solutions.<br />
8 Experiences writ<strong>in</strong>g the package<br />
Some aspects of writ<strong>in</strong>g the unicode-math package<br />
have been more organised than other L A TEX code<br />
I’ve written. As more <strong>and</strong> more L A TEX code is be<strong>in</strong>g<br />
developed publicly <strong>in</strong> source code repositories such<br />
as GitHub, BitBucket, <strong>and</strong> others, I would like to<br />
discuss quickly some of the <strong>in</strong>frastructure of this<br />
package’s development.<br />
8.1 Cross-platform development<br />
The fontspec <strong>and</strong> unicode-math packages are both<br />
now targeted towards runn<strong>in</strong>g on both X L A TEX <strong>and</strong><br />
LuaL A TEX. Despite small differences <strong>in</strong> how certa<strong>in</strong><br />
th<strong>in</strong>gs are done, this generally works well for both.<br />
Most of the code <strong>in</strong> unicode-math <strong>and</strong> fontspec<br />
has been written (or re-written) with the expl3 programm<strong>in</strong>g<br />
<strong>in</strong>terface. This has proven to be a very<br />
useable <strong>in</strong>terface to numerous high-level programm<strong>in</strong>g<br />
constructs; expl3 allows more complex ideas<br />
to be easily realisable with<strong>in</strong> the limitations of TEX<br />
macro programm<strong>in</strong>g.<br />
In this shared X L A TEX/LuaL A TEX environment,<br />
Lua code is restricted to a m<strong>in</strong>imum <strong>in</strong> order to<br />
m<strong>in</strong>imise separate code branches for each eng<strong>in</strong>e,<br />
as much as possible. Functions <strong>in</strong> Lua are ‘hidden’<br />
<strong>in</strong>side TEX macros, so all of the ma<strong>in</strong> programm<strong>in</strong>g <strong>in</strong><br />
unicode-math resembles pla<strong>in</strong> old TEX programm<strong>in</strong>g.<br />
I personally f<strong>in</strong>d this much easier to read than mix<strong>in</strong>g<br />
Lua code <strong>and</strong> TEX macro code together.<br />
8.2 Version control<br />
E<br />
E<br />
I use the Git version control system, <strong>and</strong> for some<br />
time I’ve been us<strong>in</strong>g GitHub repositories 9 for most<br />
of my public code (L A TEX <strong>and</strong> otherwise). GitHub<br />
provides free accounts for developers of open source<br />
software, <strong>and</strong> their site <strong>in</strong>cludes a very functional<br />
bug tracker/issue reporter per project. Track<strong>in</strong>g bugs<br />
over several years is certa<strong>in</strong>ly no fun with email.<br />
I’ve had many people contribute code <strong>and</strong> provide<br />
feedback through the GitHub project page, <strong>and</strong><br />
I highly recommend such a public development environment<br />
for all package developers.<br />
The ma<strong>in</strong> advantages to these systems, for me,<br />
are the ease with which others can collaborate on<br />
code or documentation writ<strong>in</strong>g <strong>and</strong> with how issues<br />
can be resolved. Hav<strong>in</strong>g a public code repository<br />
also allows users to access historical versions of the<br />
code, which can be important for those on legacy<br />
systems who cannot upgrade their distributions but<br />
9 http://github.com/wspr<br />
Will Robertson