26.08.2013 Views

Representing Myanmar in Unicode - Evertype


Lower Vowel

These are the standard Burmese lower vowels. This also specifies the order of ါိ ု as being U+102D U+102F.

Karen Vowel

This slot does not occur with the previous Lower Vowel slot. It contains characters that are used as vowels in other languages. Notice that in Sgaw Karen one can have two occurrences of ါၢ U+1062 as in ကၢၢ U+1000 U+1062 U+1062 U+103A.
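The code point notation used throughout this document can be turned into actual strings for testing. The following is a minimal sketch in Python; the helper names `from_codepoints` and `to_codepoints` are hypothetical and are not part of the original document.

```python
def from_codepoints(notation: str) -> str:
    """Turn 'U+1000 U+1062' style notation into the string it denotes."""
    # Each token is assumed to have the form U+XXXX (hex digits after 'U+').
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


def to_codepoints(text: str) -> str:
    """Inverse of from_codepoints: render each character as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The Sgaw Karen example from the text, with U+1062 occurring twice.
sgaw = from_codepoints("U+1000 U+1062 U+1062 U+103A")
print(to_codepoints(sgaw))
print(sgaw.count("\u1062"))
```

Round-tripping a sequence through both helpers is a quick way to confirm that a stored string matches the order given in the text.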

Shan Vowel

This upper diacritic may occur either above a consonant or above a following Shan a vowel ါၢ U+1062. The position depends on which of the various Shan scripts is being written. As a result, this slot position is optimal since it occurs between two slots containing U+1062.

A Vowel

Unlike other slots, which may or may not include spacing characters, the A Vowel slot always contains a spacing character. This is not to say that the slot always has to be filled.

Anusvara

In Mon ါဲ U+1032 acts as a final character and so may occur over a ါာ U+102C. In the situation where it occurs after a ါိ U+102D, it is still rendered as a visual ligature with the U+1032 occurring first as in: ါိ.

Different languages use ါံ U+1036 in different ways. ါံ U+1036 here is acting as a final character, in contrast to the same character in the Upper Vowel slot where it is acting as a vowel.

There is one language in which this approach may result in an invisible ambiguity, and that is Mon. Mon treats anusvara ါံ U+1036 as a final nasal, and as such it may follow a ါာ U+102C as in Burmese. In Mon, though, anusvara may also follow ါါ U+102B. But when that happens, it is rendered above the preceding consonant. As a result, two sequences that are both valid according to the above table, ါံ ါ U+1036 U+102B and U+102B U+1036, would render the same. This requires us to add a further constraint that is not captured by the chart above: U+1036 may not directly precede U+102B. We can say this because there are no known situations in which U+1036, acting as a vowel, is used in conjunction with the vowel U+102B.

Likewise for ါုဲ U+102F U+1032. The visually identical sequence U+1032 followed by a Lower Vowel (U+102F or U+1030) is illegal. For more details on Mon see the section on Mon further down this document.
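The two extra constraints stated above can be checked mechanically. The sketch below is only an illustration: the function name `find_forbidden_pairs` is hypothetical, and it assumes that the Lower Vowel slot holds U+102F and U+1030. It does not validate the full slot order from the chart, only these two adjacent pairs.

```python
import re

# Adjacent pairs ruled out by the additional constraints:
#   U+1036 directly before U+102B
#   U+1032 directly before a Lower Vowel (assumed here: U+102F or U+1030)
_FORBIDDEN = re.compile("\u1036\u102B|\u1032[\u102F\u1030]")


def find_forbidden_pairs(text: str) -> list[tuple[int, str]]:
    """Return (index, pair) for every forbidden adjacent pair in text."""
    return [(m.start(), m.group()) for m in _FORBIDDEN.finditer(text)]


# U+102B U+1036 is the permitted order; U+1036 U+102B is reported.
print(find_forbidden_pairs("\u1019\u102B\u1036"))
print(find_forbidden_pairs("\u1019\u1036\u102B"))
```

A check like this could run as a pre-processing step to flag text that renders correctly but is stored in the disallowed order.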

Pwo Tones

These are all spac<strong>in</strong>g and may take ါ့ U+1037.<br />

Lower Dot

This lower dot slot position may only be filled when either the A Vowel slot or the Pwo Tone slot is filled. It is possible for two ါ့ U+1037 to occur. For example, in Pwo Karen: ကၠ ့ၫ့ U+1000 U+1060 U+1037 U+106B U+1037.

Mon h

Mon has the concept of contracting final consonants using diacritics. One such contraction uses medial h followed by an asat to represent a final h. Since the medial h may occur under a U+102C, it is listed here before the visible virama, which will also occur. This slot is only filled if there is a U+102C and a following visible virama.

Visible Virama

This is only used if there is a spacing character after the consonant on which the asat is rendered (i.e. something in either the A Vowel or Pwo Tone slot).

Visarga

The visarga slot not only includes visarga U+1038 but also Shan tone letters.

Representing Myanmar in Unicode Page 7 of 37 Version: 433
