Representing Myanmar in Unicode - Evertype
Lower Vowel<br />
These are the standard Burmese lower vowels. This also specifies the order of ါိ ု as being U+102D U+102F.<br />
Karen Vowel<br />
This slot does not co-occur with the preceding Lower Vowel slot. It contains characters that are used as vowels<br />
in other languages. Notice that in Sgaw Karen one can have two occurrences of ါၢ U+1062, as in ကၢၢ U+1000<br />
U+1062 U+1062 U+103A.<br />
Shan Vowel<br />
This upper diacritic may occur either above a consonant or above a following Shan a vowel ါၢ U+1062. The<br />
position depends on which of the various Shan scripts is being written. As a result, this slot position is<br />
optimal since it occurs between two slots containing U+1062.<br />
A Vowel<br />
Unlike other slots, which may or may not include spacing characters, the A Vowel slot always contains a<br />
spacing character. This is not to say that the slot always has to be filled.<br />
Anusvara<br />
In Mon ါဲ U+1032 acts as a final character and so may occur over a ါာ U+102C. In the situation where it<br />
occurs after a ါိ U+102D, it is still rendered as a visual ligature with the U+1032 occurring first as in: ါိ.<br />
Different languages use ါံ U+1036 in different ways. ါံ U+1036 here is acting as a final character, in contrast<br />
to the same character in the Upper Vowel slot where it is acting as a vowel.<br />
There is one language in which this approach may result in an invisible ambiguity, and that is Mon.<br />
Mon treats anusvara ါံ U+1036 as a final nasal, and as such it may follow a ါာ U+102C as per Burmese. In<br />
Mon, though, anusvara may also follow ါါ U+102B. But when that happens, it is rendered above the<br />
preceding consonant. This may result in two valid sequences, ါံ ါ U+1036 U+102B and U+102B U+1036,<br />
according to the above table, rendering the same. This requires us to add a further constraint that is not<br />
captured by the chart above: U+1036 may not directly precede U+102B. We can say this because there are no<br />
known situations in which U+1036, acting as a vowel, is used in conjunction with the vowel U+102B.<br />
Likewise for ါုဲ U+102F U+1032. The visually identical sequence U+1032 followed by a Lower Vowel<br />
(U+102F or U+1030) is illegal. For more details on Mon see the section on Mon further down this document.<br />
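The two ordering constraints above can be checked mechanically. The following is a minimal sketch in Python, not part of the original specification: the function name `find_illegal_orders` and the labels are hypothetical, and the second rule covers only U+102F, the case spelled out in the text.

```python
import re

# Orders ruled out above because they render the same as a valid order.
# The labels are illustrative names chosen for this sketch.
ILLEGAL_ORDERS = {
    # U+1036 (anusvara) may not directly precede U+102B (tall aa).
    "anusvara before tall aa": re.compile("\u1036\u102B"),
    # U+1032 (ai) may not directly precede the lower vowel U+102F (u).
    "ai before lower u": re.compile("\u1032\u102F"),
}

def find_illegal_orders(text):
    """Return a sorted list of (index, label) for each ruled-out pair in text."""
    hits = []
    for label, pattern in ILLEGAL_ORDERS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label))
    return sorted(hits)
```

For instance, `find_illegal_orders("\u1000\u1036\u102B")` reports the pair at index 1, while the permitted order `"\u1000\u102B\u1036"` yields an empty list.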
Pwo Tones<br />
These are all spacing and may take ါ့ U+1037.<br />
Lower Dot<br />
This lower dot slot position may only be filled when either the A Vowel or the Pwo Tone slot is filled. It is<br />
possible for two ါ့ U+1037 to occur. For example, in Pwo Karen: ကၠ ့ၫ့ U+1000 U+1060 U+1037 U+106B<br />
U+1037.<br />
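Because stacked marks are hard to read in rendered text, it can help to rebuild an example from its code point listing and inspect it programmatically. The sketch below is an illustration only; the helper name `from_code_points` is an assumption of this example, not something defined in the document. It decodes the Pwo Karen sequence above and locates both lower dots.

```python
import re

def from_code_points(notation):
    """Turn a listing such as 'U+1000 U+1060' into the characters it names."""
    hex_values = re.findall(r"U\+([0-9A-Fa-f]{4,6})", notation)
    return "".join(chr(int(value, 16)) for value in hex_values)

LOWER_DOT = "\u1037"

# The Pwo Karen example from the text, which carries two lower dots.
pwo_example = from_code_points("U+1000 U+1060 U+1037 U+106B U+1037")

# Zero-based positions of each U+1037 within the decoded string.
lower_dot_positions = [i for i, ch in enumerate(pwo_example) if ch == LOWER_DOT]
```

Here `lower_dot_positions` is `[2, 4]`: one dot follows U+1060 and the other follows U+106B.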
Mon h<br />
Mon has the concept of contracting final consonants using diacritics. One such contraction uses medial h followed by<br />
an asat to represent a final h. Since the medial h may occur under a U+102C, it is listed here before the visible<br />
virama, which will also occur. This slot is only filled if there is a U+102C and a following visible virama.<br />
Visible Virama<br />
This is only used if there is a spacing character after the consonant on which the asat is rendered (i.e.<br />
something in either the A Vowel or the Pwo Tone slot).<br />
Visarga<br />
The visarga slot includes not only visarga U+1038 but also Shan tone letters.<br />
Representing Myanmar in Unicode Page 7 of 37 Version: 433