Recommendations ❘ 23

Language Considerations

FOSS has typically been localized by a few volunteers working remotely, without the benefit of linguists or a technical dictionary for translation. The work can take a long time, and the results can be riddled with inconsistencies or errors.

The pace of FOSS localization is uneven. In countries where the language is similar to English and there are many bilingual volunteers, FOSS localization is well established. Where governments and other agencies have stepped in to provide financial support for localization, the results have also been impressive. (The “CJK” partnership of China, Japan, and Korea stands out as an example.)

In countries without much technical infrastructure, localization of both commercial software and FOSS is slow. It is far slower when the language is not of the Indo-European group. Commercial companies see little profit in the work, and few local professionals have the time or skills to localize FOSS. Even though the source code is freely available for localization work to begin, few specialized technical standards or technical dictionaries exist.

Some languages, particularly those with Latin-based scripts, are relatively easy to localize. Others can be very difficult. As an example, both Lao and Thai share a 42-consonant script, with vowel and intonation marks. These scripts follow complex rules of layout involving consonants, vowels, special symbols, conjuncts and ligatures. All of these writing systems share certain characteristics: spaces are not necessarily used to separate words, and vowels appear before, after, under, and over consonants.

Thai and Lao volunteers responsible for localizing FOSS have saved a great deal of time and avoided frustration by cooperating on technical issues, and sharing information on resources and tools.

Across Asia, opportunities exist for shared localization efforts at the inter-governmental level. Many other Asian languages share similarities, and often the programming tasks are nearly identical across similar language groups. Properly funded and organized, pan-Asian software localization is a realistic goal.
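
The script characteristics described above (vowels written before or above a consonant, and words that are not separated by spaces) can be observed by inspecting the stored characters of a short Thai string. The following is a minimal sketch using only Python's standard unicodedata module. The sample word and phrase are illustrative choices, not taken from this text, and the helper name describe is hypothetical.

```python
import unicodedata

# Illustrative sample (an assumption, not from the text): a Thai word whose
# vowel is written to the left of its consonant and which carries a mark
# above the consonant.
word = "\u0e40\u0e1b\u0e47\u0e19"

# Illustrative phrase made of two words written without a space between them.
phrase = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"


def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


if __name__ == "__main__":
    for row in describe(word):
        # Category "Mn" marks a non-spacing character drawn above or below
        # its base consonant; "Lo" covers consonants and spacing vowels.
        print(row)

    # Whitespace splitting cannot find word boundaries in this phrase,
    # because the script does not put spaces between words.
    print(phrase.split())
```

In this sample the leading vowel is stored before the consonant it accompanies, and the mark above the consonant is a separate non-spacing character. A localized system therefore needs rendering rules and dictionary-based or rule-based word breaking rather than simple whitespace handling.
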
24 ❘ FREE/OPEN SOURCE SOFTWARE: LOCALIZATION

ANNEX A. LOCALIZATION – KEY CONCEPTS

This annex provides a quick tour of the key concepts of localization, so that those interested in localizing FOSS for their own language get a broad picture of the kind of knowledge that is needed. The next annex provides the technical details required to get started.

Standardization

When two or more entities interact, common conventions are important. Car drivers must abide by traffic rules to prevent accidents. People need common conventions on languages and gestures to communicate. Likewise, software needs standards and protocols to interoperate seamlessly. In terms of software engineering, contracts between parts of programs need to be established before implementation. The contracts are most important for systems developed by a large group of individual developers from different backgrounds, and are essential for cross-platform interoperability.

Standards provide such contracts for all computing systems in the world. Software developers need to conform to such conventions to prevent miscommunication. Therefore, standardization should be the very first step for any kind of software development, including localization.

To start localization, it is a good idea to study related standards and use them throughout the project. Nowadays, many international standards and specifications have been developed to cover the languages of the world. If these do not fit the project’s needs, one may consider participating in standardization activities. Important sources are:

ISO/IEC JTC1 (International Organization for Standardization and International Electrotechnical Commission Joint Technical Committee 1): A joint technical committee for international standards for information technology. There are many subcommittees (SC) for different categories, under which working groups (WG) are formed to work on subcategories of standards.
For example, ISO/IEC JTC1/SC2/WG2 is the working group for the Universal Coded Character Set (UCS). The standardization process, however, proceeds in a closed manner. If the national standards body is an ISO/IEC member, it can propose the requirements for the project. Otherwise, one may need to approach individual committees. They may ask for participation as a specialist. Information for JTC1/SC2 (coded character sets) is published at anubis.dkuug.dk/JTC1/SC2. Information for JTC1/SC22 (programming languages, their environments and system software interfaces) is at anubis.dkuug.dk/JTC1/SC22.

Unicode Consortium: A non-profit organization working on a universal character set. It is closely related to ISO/IEC JTC1 subcommittees. Its Web site is at www.unicode.org, where channels of contribution are provided.

Free Standards Group: A non-profit organization dedicated to accelerating the use of FOSS by developing and promoting standards. Its Web site is at www.freestandards.org. It is open to participation. There are a number of work groups under its umbrella, including OpenI18N for internationalization (www.openi18n.org).

Note, however, that some issues such as national keyboard maps and input/output methods are not covered by the standards mentioned above. The national standards body should define these standards, or unify existing solutions used by different vendors, so that users can benefit from the consistency.

Unicode

Characters are the most fundamental units for representing text data of any particular language. In mathematical terms, the character set defines the set of all characters used in a language.
In ICT terms, the character set must be encoded as bytes in storage, according to some conventions, called an encoding. These conventions must be agreed upon by both the sender and the receiver of data for the information to remain intact and exact.

In the 1970s, the character set used by most programs consisted of letters of the English alphabet, decimal digits and some punctuation marks. The most widely used encoding was the 7-bit ASCII (American