Proceedings of the Conference on Language & Technology 2009

Design of Urdu Virtual Keyboard

M. Aamir Khan, M. Abid Khan and M. Naveed Ali
Department of Computer Science, University of Peshawar, Pakistan
bitvox@yahoo.com, m.abid6@gmail.com, naveed_asadecp@yahoo.com

Abstract

This paper presents the first virtual keyboard layout based on character frequency analysis of an Urdu corpus. The layout is optimized using Monte Carlo simulation with simulated annealing. The proposed layout is further augmented with a word prediction list derived from the Urdu corpus to speed up text entry. A performance analysis of the layout is presented to justify the design.

1. Introduction

Virtual (soft) keyboards allow users to input text using a touch screen and stylus. For English, several virtual keyboard layouts have been proposed. These include MacKenzie and Zhang's OPTI layout, an improved OPTI layout in a 5x6 arrangement (OPTI II) with 38 wpm (words per minute), the FITALY keyboard and the Chubon keyboard [6]. The performance of virtual keyboards is evaluated using Fitts' law [1][2]. Keyboard input speed is measured in wpm. The mean time (MT) to reach a key on a virtual keyboard is computed for a movement to a target key K of width W lying at distance A from the current position of the pointing device [3]. The keys should be laid out so that the mean time over all digraph movements is minimized. Digraph frequencies are a natural feature of a language.
MacKenzie and Zhang evaluated the performance of their virtual keyboard by computing 27x27 digraph frequencies from a corpus [5]. The distances (amplitudes) for all 27x27 digraph movements in a given keyboard layout were computed, and Fitts' law was used to compute the MT for each movement with the following equation [5].

$$MT = \frac{1}{4.9} \log_2\left(\frac{A}{W} + 1\right)$$

Here, W is the width of the key and A is the distance to the target key K. Each mean time was weighted by the digraph probability. The wpm was then calculated from the weighted MT and the average number of characters per word. The computed wpm is an "upper limit" on the text entry speed. The "visual scan" time to find a key was assumed to be zero. The keyboard layout can be done manually [5] or using optimization techniques such as Monte Carlo simulation [6].

2. Urdu virtual keyboard design

Urdu has 37 base characters. The character set used for the keyboard proposed in this paper also contains Arabic characters. This allows the keyboard to be used for entering Arabic text as well, although it is not optimized for Arabic. Table 1 shows the set of Urdu characters.

Table 1: Urdu language characters

ا ب پ ت ٹ
ث ج چ ح خ
د ڈ ذ ر ڑ
ز ژ س ش ص
ض ط ظ ع غ
ف ق ک گ ل
م ن و ہ ه
ی ے آ ں ئ
ؤ ء ۀ ة ۓ

The first step in designing a virtual keyboard is to determine the digraph frequencies. For Urdu, computing digraph frequencies requires a corpus. A raw corpus consisting of 16,638,852 words was collected. It contained collections of newspaper articles, books and magazines. Table 2 shows the frequencies of individual Urdu characters in descending order.

Table 2: Urdu character frequencies

Unicode | Character | Frequency | Percentage
U+0627 | ا | 6733610 | 12.23570
U+06CC | ی | 5752357 | 10.45266
U+06A9 | ک | 3911143 | 7.10697
U+0631 | ر | 3669392 | 6.66768
U+0648 | و | 3327481 | 6.04639
U+06C1 | ہ | 2994305 | 5.44098
U+06D2 | ے | 2857846 | 5.19302
U+0646 | ن | 2773651 | 5.04003
U+0645 | م | 2684946 | 4.87884
U+062A | ت | 2117669 | 3.84803
U+0633 | س | 1987451 | 3.61141
U+0644 | ل | 1915841 | 3.48129
U+0628 | ب | 1492997 | 2.71294
U+06BA | ں | 1469466 | 2.67018
U+062F | د | 1431230 | 2.60070
U+067E | پ | 914273 | 1.66133
U+062C | ج | 844670 | 1.53486
U+06BE | ه | 800600 | 1.45478
U+0626 | ئ | 664594 | 1.20764
U+06AF | گ | 643263 | 1.16888
U+0639 | ع | 636166 | 1.15598
U+0641 | ف | 546973 | 0.99391
U+0642 | ق | 544460 | 0.98934
U+0634 | ش | 532262 | 0.96718
U+062D | ح | 501602 | 0.91147
U+0632 | ز | 454158 | 0.82525
U+0679 | ٹ | 420666 | 0.76440
U+0686 | چ | 358159 | 0.65081
U+062E | خ | 352729 | 0.64095
U+0635 | ص | 327434 | 0.59498
U+0622 | آ | 259879 | 0.47223
U+0637 | ط | 220613 | 0.40088
U+0688 | ڈ | 183081 | 0.33268
U+0691 | ڑ | 143244 | 0.26029
U+0636 | ض | 142813 | 0.25951
U+0638 | ظ | 104163 | 0.18928
U+063A | غ | 100331 | 0.18231
U+0630 | ذ | 79372 | 0.14423
U+062B | ث | 69641 | 0.12655
U+0624 | ؤ | 32355 | 0.05879
U+0621 | ء | 24930 | 0.04530
U+06C2 | ۀ | 4390 | 0.00798
U+0698 | ژ | 2522 | 0.00458
U+0629 | ة | 2275 | 0.00413
U+06D3 | ۓ | 1479 | 0.00269
Total | | 55032482 | 100.00000

The 46x46 digraph frequency table was computed from the corpus. The digraph frequencies are shown as a color chart in Figure 1. Dark cells represent higher frequency digraphs, whereas light cells represent lower frequency digraphs.

In Figure 1, the character on the Y axis is the first character of a digraph and the character on the X axis is the second. The order of characters in the columns and rows of the color chart is the same as in Table 1. The last column and the last row represent the space character.
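The counting step behind Table 2 and the 46x46 digraph table can be illustrated with a minimal sketch. The paper does not describe its counting procedure, so the function names (`digraph_counts`, `to_probabilities`), the whitespace handling, and the rule for skipping characters outside the table are all assumptions made for illustration.

```python
from collections import Counter

def digraph_counts(text, alphabet):
    """Count single characters and ordered character pairs (digraphs).

    Assumption: runs of whitespace count as one space, and any pair that
    touches a character outside `alphabet` (plus space) is skipped.
    """
    symbols = set(alphabet) | {" "}
    cleaned = " ".join(text.split())  # collapse whitespace runs
    unigrams = Counter(c for c in cleaned if c in set(alphabet))
    digraphs = Counter(
        (first, second)
        for first, second in zip(cleaned, cleaned[1:])
        if first in symbols and second in symbols
    )
    return unigrams, digraphs

def to_probabilities(counts):
    """Normalize raw counts so that they sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}

if __name__ == "__main__":
    # Tiny illustrative input, not taken from the paper's corpus.
    unigrams, digraphs = digraph_counts("کر کرنے", "کرنے")
    for pair, p in to_probabilities(digraphs).items():
        print(pair, round(p, 3))
```

With the 45 characters of Table 1 plus the space, the keys of `digraphs` range over the 46x46 cells shown in the color chart.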
The digraph frequency table was used for computing the wpm performance of the keyboard.

To compute the performance of a given keyboard layout, the following equation was used [5].

$$MT = \sum_{i=0}^{k} \sum_{j=0}^{k} \frac{1}{4.9} \log_2\left(\frac{A_{i,j}}{W} + 1\right) \times d(i,j)$$

Here, A_{i,j} is the distance from key i to key j, and d(i,j) is the digraph frequency of character i followed by character j. The variable k is the number of characters in a given language. The diagonal entries of the digraph frequency table, where i = j, denote repeated characters for which no movement of the stylus is involved. For repeated characters, the repeat stylus tapping time was set to 0.127 seconds, as in Zhai et al. [6].
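The weighted sum above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: key positions are taken to be (x, y) centres in pixels, distances are Euclidean, and `digraph_prob` is assumed to hold digraph frequencies normalized to sum to 1. The function and parameter names are hypothetical.

```python
import math

FITTS_RATE = 4.9    # constant from the Fitts' law equation in the text
REPEAT_TAP = 0.127  # seconds for tapping the same key twice [6]

def fitts_time(distance, width):
    """Time for one movement of length `distance` to a key of size `width`."""
    return math.log2(distance / width + 1) / FITTS_RATE

def mean_time(positions, digraph_prob, width):
    """Digraph-weighted mean time for a layout.

    positions: dict mapping character -> (x, y) key centre (assumed geometry)
    digraph_prob: dict mapping (first, second) -> probability
    """
    total = 0.0
    for (first, second), p in digraph_prob.items():
        if first == second:
            # Repeated character: no stylus movement, fixed tapping time.
            total += REPEAT_TAP * p
        else:
            (x1, y1), (x2, y2) = positions[first], positions[second]
            total += fitts_time(math.hypot(x2 - x1, y2 - y1), width) * p
    return total

if __name__ == "__main__":
    layout = {"a": (0, 0), "b": (50, 0)}
    probs = {("a", "b"): 0.5, ("a", "a"): 0.5}
    print(mean_time(layout, probs, width=50))
```

For two adjacent 50-pixel keys, A/W = 1, so a single movement costs log2(2)/4.9 seconds, which is the smallest non-zero movement time this model can produce for keys on a grid with that spacing.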


Figure 1: Urdu digraph color chart

Designing a keyboard layout is a combinatorial task and requires O(n!) searches [6]. The layout should be arranged such that the MT is minimized. To optimize the keyboard layout, 700 runs of Monte Carlo simulation with simulated annealing were executed. The best layout was found in the 53rd run. In each run, 100,000 to 200,000 random movements were tried on the keyboard layout. The annealing schedule was adjusted by trial and error. The width of each key was set to 50 pixels. The shape of the keys is based on the work of Zhai et al. [6]. Figure 2 shows the optimized layout of the Urdu keyboard.

Figure 2: Corpus based optimized Urdu virtual keyboard layout arranged in 7x7 cells

The speed of entering text using the layout in Figure 2 was computed using the following equation from Zhai et al. [6].

$$wpm = \frac{60}{AWL \times MT}$$

where AWL is the average word length in a language and MT is the mean time. The constant 60 is the number of seconds in a minute. In Urdu, the average word length was found to be 7 characters. When the space after each word is added, it becomes 8 characters.

For the optimized keyboard, MT was found to be

$$MT = 0.20609985$$

$$wpm = \frac{60}{8 \times 0.20609985} = 36.3901$$

The predicted speed of the keyboard is thus 36.3901 words per minute. To utilize the space between circular keys, the shape of the keys was changed to hexagonal. Figure 3 shows the improved design of the optimized layout.

Figure 3: Improved Urdu virtual keyboard to utilize the empty gaps between keys

A prototype version of the keyboard was implemented using Microsoft Visual C++ 6.0 for Microsoft Windows. The program helps the user by highlighting the next probable keys and drawing rings around them. A darker color indicates a higher probability of occurrence, while a lighter color indicates a lower probability. The ring around the most probable next character blinks.

Figure 4 shows the typing of the word مشاعرہ on the keyboard. After the first three characters have been entered, the next set of probable characters is highlighted in different shades of red. To further improve user input speed, a prediction list was added. Figure 5 shows the use of the prediction list.

Figure 4: Predictive input of the word مشاعرہ
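The layout search and the wpm estimate described earlier in this section can be sketched as follows. The paper states only that Monte Carlo simulation with simulated annealing was used and that the schedule was tuned by trial and error, so everything specific here is an assumption: the move is a swap of two keys, the cooling schedule is geometric, and the names `anneal`, `cells`, `t_start`, `t_end` and `steps` are hypothetical.

```python
import math
import random

def fitts_time(distance, width, rate=4.9):
    return math.log2(distance / width + 1) / rate

def mean_time(layout, cells, digraph_prob, width, repeat_tap=0.127):
    """Digraph-weighted mean time; layout maps character -> index into cells."""
    total = 0.0
    for (a, b), p in digraph_prob.items():
        if a == b:
            total += repeat_tap * p  # repeated key, no movement
        else:
            (x1, y1), (x2, y2) = cells[layout[a]], cells[layout[b]]
            total += fitts_time(math.hypot(x2 - x1, y2 - y1), width) * p
    return total

def anneal(chars, cells, digraph_prob, width,
           steps=20000, t_start=0.01, t_end=1e-5, seed=0):
    """One annealing run; returns the best layout seen and its mean time."""
    rng = random.Random(seed)
    order = list(range(len(cells)))
    rng.shuffle(order)
    layout = dict(zip(chars, order))  # random initial assignment
    cost = mean_time(layout, cells, digraph_prob, width)
    best, best_cost = dict(layout), cost
    for step in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(chars, 2)
        layout[a], layout[b] = layout[b], layout[a]  # try swapping two keys
        new_cost = mean_time(layout, cells, digraph_prob, width)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(layout), cost
        else:
            layout[a], layout[b] = layout[b], layout[a]  # undo the swap
    return best, best_cost

def words_per_minute(mt, awl):
    """wpm = 60 / (AWL x MT), as in the equation from Zhai et al. [6]."""
    return 60.0 / (awl * mt)

if __name__ == "__main__":
    print(round(words_per_minute(0.20609985, 8), 4))
```

Repeating `anneal` with different `seed` values and keeping the lowest `best_cost` corresponds to the multiple runs reported in the text. Because this sketch only swaps keys that hold characters, any spare cells in a grid larger than the character set stay where the initial shuffle left them.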

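The next-key highlighting and the prediction list used in the prototype can be illustrated with a word-frequency lookup. The paper does not say how the prototype derives its probabilities, so this sketch assumes they come from corpus word frequencies filtered by the typed prefix. The function names and the `size` parameter are hypothetical.

```python
from collections import Counter

def next_char_probabilities(word_freq, prefix):
    """Probability of each possible next character after `prefix`.

    word_freq: dict mapping word -> corpus frequency (assumed data source)
    """
    counts = Counter()
    for word, freq in word_freq.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            counts[word[len(prefix)]] += freq
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders candidates from most to least likely.
    return {ch: n / total for ch, n in counts.most_common()}

def prediction_list(word_freq, prefix, size=5):
    """Most frequent words that start with `prefix`."""
    matches = [(w, f) for w, f in word_freq.items() if w.startswith(prefix)]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in matches[:size]]

if __name__ == "__main__":
    # Latin placeholder words; a real table would hold corpus word counts.
    sample = {"car": 6, "cat": 3, "cab": 1, "dog": 5}
    print(next_char_probabilities(sample, "ca"))
    print(prediction_list(sample, "ca", size=2))
```

A display layer could map the returned probabilities to shades, with the first entry being the candidate that receives the blinking ring.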

Figure 5: Input with prediction list assistance

For analysis, the performance of the keyboard was determined on various words. Table 3 shows the computed distances for typing the 38 most frequently occurring words [2] in the corpus, each followed by a space character. Distances are measured as the number of keys traversed to type a given word. The distance covered depends on the positions of the keys and the characters in the word.

Table 3: Distances for 38 frequent words along with a space character

Word | Characters | Frequency | Distance
کے | ک ے | 618958 | 3
ميں | م ی ں | 510330 | 6
کی | ک ی | 495344 | 3
ہے | ہ ے | 417230 | 4
اور | ا و ر | 352897 | 5
سے | س ے | 319683 | 4
کا | ک ا | 268072 | 4
کو | ک و | 239480 | 4
اس | ا س | 221585 | 5
نے | ن ے | 200405 | 4
ہيں | ہ ی ں | 196799 | 6
کہ | ک ہ | 184643 | 3
پر | پ ر | 173181 | 5
بهی | ب ه ی | 127457 | 5
يہ | ی ہ | 120063 | 2
ايک | ا ی ک | 116695 | 4
کر | ک ر | 111749 | 5
نہيں | ن ہ ی ں | 103967 | 7
ان | ا ن | 97549 | 3
ہو | ہ و | 90129 | 4
کيا | ک ی ا | 89452 | 4
تو | ت و | 82484 | 3
وہ | و ہ | 75497 | 3
لئے | ل ئ ے | 60458 | 8
تها | ت ه ا | 55527 | 5
پاکستان | پ ا ک س ت ا ن | 55404 | 15
کرنے | ک ر ن ے | 52084 | 8
جو | ج و | 51059 | 4
ہی | ہ ی | 45321 | 2
و | و | 43449 | 2
نہ | ن ہ | 42718 | 3
اپنے | ا پ ن ے | 41801 | 10
کہا | ک ہ ا | 41399 | 5
آپ | آ پ | 40080 | 6
گيا | گ ی ا | 39963 | 5
جس | ج س | 38483 | 5
تهے | ت ه ے | 37790 | 6
تک | ت ک | 37160 | 4

3. Evaluation

The proposed layout presented in Figure 2 was evaluated with 20 students of a computer science program. After an initial two-hour training session prior to the evaluation, the average text entry speed was found to be 13.47 wpm. The maximum speed achieved was 22.5 wpm. When compared to virtual keyboards for English, the text entry speed is comparable to the OPTI and QWERTY layouts [5]. The predicted speed of the OPTI layout is 58.2 wpm.
The actual speed of OPTI has been found to be 44.3 wpm after 20 sessions of text entry, each lasting 45 minutes [5]. With extended user training, the text entry speed of the Urdu virtual keyboard can be improved.

4. Conclusion

The design of the virtual keyboard presented in this paper is based on character analysis of an Urdu corpus. The virtual keyboard is particularly useful for occasional users who do not want, or do not have time, to learn a hardware keyboard layout. Because this is the first virtual keyboard for Urdu, a comparative study of its performance with other Urdu virtual keyboards is not possible. The predicted speed of text entry using this keyboard layout is 36.3901 wpm. The keyboard is also usable for entering Arabic text, but it is not optimized for Arabic.

5. References

[1] P. M. Fitts, "The information capacity of the human motor system in controlling the amplitude of movement", Journal of Experimental Psychology, 1954, pp. 381-391.

[2] M. Ijaz and S. Hussain, "Corpus Based Urdu Lexicon Development", in proc. the Conference on Language & Technology (CLT07), Bara Gali Summer Campus, University of Peshawar, August 7-11, 2007, pp. 85-94.


[3] I. S. MacKenzie, A. Sellen and W. Buxton, "A comparison of input devices in elemental pointing and dragging tasks", in proc. CHI, 1991.

[4] I. S. MacKenzie, "A note on the information theoretic basis for Fitts' law", Journal of Motor Behavior, 1989, pp. 323-330.

[5] I. S. MacKenzie and S. X. Zhang, "The Design and Evaluation of a High Performance Soft Keyboard", in proc. CHI 99, 15-20 May 1999.

[6] S. Zhai, M. Hunter and B. A. Smith, "The Metropolis Keyboard -- An Exploration of Quantitative Techniques for Virtual Keyboard Design", in proc. UIST 2000, the 13th Annual ACM Symposium on User Interface Software and Technology, 2000.
