07.10.2014 Views

Unicode mathematics in LATEX: Advantages and challenges - TUG

Unicode mathematics in LATEX: Advantages and challenges - TUG

Unicode mathematics in LATEX: Advantages and challenges - TUG

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

218 <strong>TUG</strong>boat, Volume 31 (2010), No. 2<br />

<strong>and</strong> simply switch<strong>in</strong>g the math font. This meant<br />

that, <strong>in</strong>ternally, \mathbf <strong>and</strong> friends simply resulted<br />

<strong>in</strong> a font switch, which is efficient <strong>and</strong> straightforward<br />

(although sometimes tricky to juggle with only<br />

sixteen maths fonts <strong>in</strong> eight-bit TEX).<br />

Unfortunately, th<strong>in</strong>gs are not so simple with<br />

unicode-math. With<strong>in</strong> <strong>Unicode</strong>, each alphabet style<br />

(for bold <strong>and</strong> script <strong>and</strong> so on) is encoded <strong>in</strong> a dist<strong>in</strong>ct<br />

<strong>Unicode</strong> range. For example, the italic mathematical<br />

‘w’ accessed with $w$ is symbol U+1D464. The<br />

bold upright mathematical ‘w’ ($\mathbf{w}$) is<br />

U+1D430. In order to switch from one to the other<br />

us<strong>in</strong>g a comm<strong>and</strong> like \mathbf requires that the<br />

mathcodes for all affected letters must change locally<br />

<strong>in</strong>side its argument. This isn’t too <strong>in</strong>efficient, s<strong>in</strong>ce<br />

assign<strong>in</strong>g \mathcodes is pretty fast, but it’s not particularly<br />

elegant. It would be easier not to support<br />

the \mathbf{...}-style syntax at all <strong>and</strong> <strong>in</strong>stead<br />

refer to such symbols with macros, such as \mbfw for<br />

the bold ‘w’. But we must support the switch<strong>in</strong>gstyle<br />

comm<strong>and</strong>s for backwards compatibility.<br />

There are some alternatives to do<strong>in</strong>g th<strong>in</strong>gs this<br />

way, but they all have trade-offs. The simplest solution<br />

would be to use X TEX’s <strong>in</strong>put mapp<strong>in</strong>g feature<br />

that allows letters <strong>in</strong> the source to be transformed<br />

<strong>in</strong>to other letters before typesett<strong>in</strong>g. Thus, the ‘variable<br />

math code’ approach as used <strong>in</strong> L A TEX could be<br />

used for <strong>Unicode</strong> maths alphabets. However, this<br />

system is less flexible (features such as Example 8<br />

would be more difficult) <strong>and</strong> an alternative approach<br />

would be required for LuaTEX.<br />

Another approach would be to use (math-)active<br />

characters for all maths symbols. In this approach,<br />

ASCII letters such as ‘a’ would be active <strong>in</strong> maths<br />

mode <strong>and</strong> exp<strong>and</strong> (as if it were a macro) to a construction<br />

such as<br />

E<br />

\csname mathchar_\mathstyle_a \endcsname<br />

where \mathstyle would resolve to ‘up’ or ‘bf’ (etc.)<br />

depend<strong>in</strong>g on the context <strong>and</strong> \mathchar_up_a <strong>and</strong><br />

\mathchar_bf_a (etc.) would be def<strong>in</strong>ed accord<strong>in</strong>gly<br />

with the appropriate <strong>Unicode</strong> maths glyph. This is<br />

more efficient for font switch<strong>in</strong>g but less efficient (<strong>and</strong><br />

perhaps more fragile) when symbol remapp<strong>in</strong>g is not<br />

tak<strong>in</strong>g place. Us<strong>in</strong>g active characters is the technique<br />

used by the breqn package to do its automatic l<strong>in</strong>e<br />

break<strong>in</strong>g of <strong>mathematics</strong>, <strong>and</strong> extend<strong>in</strong>g that system<br />

for unicode-math would be quite logical. One way or<br />

the other, breqn compatibility is planned for unicodemath<br />

<strong>in</strong> the future.<br />

Us<strong>in</strong>g LuaTEX for alphabet remapp<strong>in</strong>g (which<br />

is how ConTEXt’s implementation works) is probably<br />

the best way to tackle this problem, but while<br />

unicode-math is written for X L A TEX as well (<strong>and</strong><br />

E<br />

it will cont<strong>in</strong>ue to be for the immediate future) we<br />

must stick with TEX-based programm<strong>in</strong>g solutions.<br />

8 Experiences writ<strong>in</strong>g the package<br />

Some aspects of writ<strong>in</strong>g the unicode-math package<br />

have been more organised than other L A TEX code<br />

I’ve written. As more <strong>and</strong> more L A TEX code is be<strong>in</strong>g<br />

developed publicly <strong>in</strong> source code repositories such<br />

as GitHub, BitBucket, <strong>and</strong> others, I would like to<br />

discuss quickly some of the <strong>in</strong>frastructure of this<br />

package’s development.<br />

8.1 Cross-platform development<br />

The fontspec <strong>and</strong> unicode-math packages are both<br />

now targeted towards runn<strong>in</strong>g on both X L A TEX <strong>and</strong><br />

LuaL A TEX. Despite small differences <strong>in</strong> how certa<strong>in</strong><br />

th<strong>in</strong>gs are done, this generally works well for both.<br />

Most of the code <strong>in</strong> unicode-math <strong>and</strong> fontspec<br />

has been written (or re-written) with the expl3 programm<strong>in</strong>g<br />

<strong>in</strong>terface. This has proven to be a very<br />

useable <strong>in</strong>terface to numerous high-level programm<strong>in</strong>g<br />

constructs; expl3 allows more complex ideas<br />

to be easily realisable with<strong>in</strong> the limitations of TEX<br />

macro programm<strong>in</strong>g.<br />

In this shared X L A TEX/LuaL A TEX environment,<br />

Lua code is restricted to a m<strong>in</strong>imum <strong>in</strong> order to<br />

m<strong>in</strong>imise separate code branches for each eng<strong>in</strong>e,<br />

as much as possible. Functions <strong>in</strong> Lua are ‘hidden’<br />

<strong>in</strong>side TEX macros, so all of the ma<strong>in</strong> programm<strong>in</strong>g <strong>in</strong><br />

unicode-math resembles pla<strong>in</strong> old TEX programm<strong>in</strong>g.<br />

I personally f<strong>in</strong>d this much easier to read than mix<strong>in</strong>g<br />

Lua code <strong>and</strong> TEX macro code together.<br />

8.2 Version control<br />

E<br />

E<br />

I use the Git version control system, <strong>and</strong> for some<br />

time I’ve been us<strong>in</strong>g GitHub repositories 9 for most<br />

of my public code (L A TEX <strong>and</strong> otherwise). GitHub<br />

provides free accounts for developers of open source<br />

software, <strong>and</strong> their site <strong>in</strong>cludes a very functional<br />

bug tracker/issue reporter per project. Track<strong>in</strong>g bugs<br />

over several years is certa<strong>in</strong>ly no fun with email.<br />

I’ve had many people contribute code <strong>and</strong> provide<br />

feedback through the GitHub project page, <strong>and</strong><br />

I highly recommend such a public development environment<br />

for all package developers.<br />

The ma<strong>in</strong> advantages to these systems, for me,<br />

are the ease with which others can collaborate on<br />

code or documentation writ<strong>in</strong>g <strong>and</strong> with how issues<br />

can be resolved. Hav<strong>in</strong>g a public code repository<br />

also allows users to access historical versions of the<br />

code, which can be important for those on legacy<br />

systems who cannot upgrade their distributions but<br />

9 http://github.com/wspr<br />

Will Robertson

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!