Arabic Domain Names


Dr. Abdulaziz H. Al-Zoman
Director of SaudiNIC - CITC
Chairman of Steering Committee - Arabic Domain Name Pilot Project
Internet Governance Forum, Sharm El-Sheikh, Nov 15-18, 2009


Arabic Users/Language in the Internet
IDN: Arabic Domain Names
Arabic Content Initiative (KAIAC)


Population of Arab world: 343 M (5% of world population)
Native speakers: 422 M
English-speaking Arabs: 6.5 M (2%)
= 3%
Source: “Addressing the Gap in Arabic Content”, by Usama Fayyad, The 2nd Int. Symposium on Computers and Arabic, October 11th, 2009

Source: Internet world stats – internetworldstats.com


Total web pages
– 56 B – Yahoo
Total English web pages
– 14 B – Google Probe
Total Arabic web pages
– 0.7 B – Google Probe (2009)
– 0.2 B – d1g.com web crawl (2008)
Source: “Addressing the Gap in Arabic Content”, by Usama Fayyad, Executive Chairman d1g.com, presentation at the 2nd Int. Symposium on Computers and Arabic, October 11th, 2009


Source: www.citc.gov.sa




To reach the next billion people, according to IGF speakers in Hyderabad, the Internet must support the large number of languages in the world at all levels.
Source: http://www.networkworld.com/news/2008/120308-internet-needsmultilingual-support-for.html

Content:
– KAIAC: King Abdullah Initiative for Arabic Content
– http://www.econtent.org.sa
IDN:
– Arabic Domain Name Pilot Project
– http://www.arabic-domains.org


– Current ASCII-based domain names (DNs) are incapable of representing Arabic characters
– It is difficult to reach Arabic sites using English DNs (pronunciation and spelling problems)
– Full Arabic DNs will encourage Arab users to use the Internet widely

Arabic newspaper: صحيفة الشرق الأوسط
www.al-sharqalawsat.com
www.asharqalawsat.com
www.asharq-alaowsat.com
www.elsharkelaosat.com

E-government site: يسّر
www.yasser.gov.sa
www.yaser.gov.sa
www.yasir.gov.sa
www.yassir.gov.sa


Linguistic Issues
• To define the accepted Arabic character set to be used for writing Arabic domain names
Arabic TLDs
• To define the top-level domains of the Arabic domain name tree structure (i.e., Arabic gTLDs and ccTLDs)
Technical Solutions
• IETF (IDNA), UNICODE, …
Root Servers
• ICANN/IANA
• (e.g., IDN ccTLD Fast Track)


Identified a number of linguistic issues with respect to domain names:
– Usage of Tashkeel (diacritics).
– Usage of Kashidah.
– Character folding.
– Which numerical digits should be supported.
– Connecting multiple words.
– Mixing with other languages.
– Usage of special characters.

Discussed with more than 60 members of the AINC linguistic committee, reaching final recommendations.
A number of surveys have been collected from the public:
• Received more than 550 responses
• The collected information has been analyzed and compared with the recommendations of the AINC linguistic committee
Discussed with Arabic linguists to get their guidance regarding the Arabic linguistic issues in domain names.
Discussed and agreed by the Arab Team for Domain Names (under the Arab League).


Issues and recommendations:
– Tashkeel (diacritics): should not be allowed. However, if there is a need to allow users to enter it as part of a domain name, then it should be stripped off by nameprep.
– Kasheeda: should be disallowed.
– Character folding (Teh Marbuta + Heh, different forms of Hamzah, Alif Maqsura + Yeh): folding should not be allowed.
– Numbers (numerical digits): if it is technically possible, it is preferred to support both digit sets (Latin and Arabic) with folding to one set. Otherwise, the Latin set is sufficient.
– Connecting multiple words: it is recommended that multiple words be separated by the character "-".
– Mixing Latin and Arabic characters: it is recommended that Arabic domain names be pure Arabic and not be mixed with other languages.
– Special characters (e.g., @, #, $, %, ...): it is recommended that Arabic domain names follow the standard with respect to the use of special characters.

http://tools.ietf.org/html/draft-farah-adntf-ling-guidelines-04
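
As a rough illustration of the Tashkeel, Kasheeda, and language-mixing recommendations, the sketch below pre-processes a single label. The helper name `prepare_label` is hypothetical, and treating Tashkeel as the range U+064B to U+0652 is an assumption made for this example, not something the guidelines draft is quoted as stating.

```python
import re

# Assumption: "Tashkeel" is taken to be U+064B..U+0652 (FATHATAN..SUKUN).
TASHKEEL = re.compile("[\u064B-\u0652]")
KASHEEDA = "\u0640"  # ARABIC TATWEEL


def prepare_label(label: str) -> str:
    """Hypothetical helper: apply three of the recommendations to one label."""
    if KASHEEDA in label:
        raise ValueError("Kasheeda (U+0640) is not allowed")
    if re.search("[A-Za-z]", label):
        raise ValueError("Arabic labels should not be mixed with Latin letters")
    # Tashkeel is stripped rather than rejected, following the recommendation.
    return TASHKEEL.sub("", label)


# U+064A U+0633 U+0651 (shadda) U+0631: the shadda is removed.
with_shadda = "\u064A\u0633\u0651\u0631"
print(prepare_label(with_shadda) == "\u064A\u0633\u0631")
```

Rejecting Kasheeda but silently stripping Tashkeel mirrors the wording of the two recommendations; a registry could equally choose to reject both.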


Characters from Unicode Arabic Table (0600–06FF)
0621 (ء) Arabic Letter HAMZA
0622 (آ) Arabic Letter ALEF with MADDA above
0623 (أ) Arabic Letter ALEF with HAMZA above
0624 (ؤ) Arabic Letter WAW with HAMZA above
0625 (إ) Arabic Letter ALEF with HAMZA below
0626 (ئ) Arabic Letter YEH with HAMZA above
0627 (ا) Arabic Letter ALEF
0628 (ب) Arabic Letter BEH
0629 (ة) Arabic Letter TEH MARBUTA
062A (ت) Arabic Letter TEH
062B (ث) Arabic Letter THEH
062C (ج) Arabic Letter JEEM
062D (ح) Arabic Letter HAH
062E (خ) Arabic Letter KHAH
062F (د) Arabic Letter DAL
0630 (ذ) Arabic Letter THAL
0631 (ر) Arabic Letter REH
0632 (ز) Arabic Letter ZAIN
0633 (س) Arabic Letter SEEN
0634 (ش) Arabic Letter SHEEN
0635 (ص) Arabic Letter SAD
0636 (ض) Arabic Letter DAD
0637 (ط) Arabic Letter TAH
0638 (ظ) Arabic Letter ZAH
0639 (ع) Arabic Letter AIN
063A (غ) Arabic Letter GHAIN
0641 (ف) Arabic Letter FEH
0642 (ق) Arabic Letter QAF
0643 (ك) Arabic Letter KAF
0644 (ل) Arabic Letter LAM
0645 (م) Arabic Letter MEEM
0646 (ن) Arabic Letter NOON
0647 (ه) Arabic Letter HEH
0648 (و) Arabic Letter WAW
0649 (ى) Arabic Letter ALEF MAKSURA
064A (ي) Arabic Letter YEH
0660 (٠) Arabic-Indic Digit Zero
0661 (١) Arabic-Indic Digit One
0662 (٢) Arabic-Indic Digit Two
0663 (٣) Arabic-Indic Digit Three
0664 (٤) Arabic-Indic Digit Four
0665 (٥) Arabic-Indic Digit Five
0666 (٦) Arabic-Indic Digit Six
0667 (٧) Arabic-Indic Digit Seven
0668 (٨) Arabic-Indic Digit Eight
0669 (٩) Arabic-Indic Digit Nine


Characters from Unicode Basic Latin Table (0000–007F):
0030 (0) Digit Zero
0031 (1) Digit One
0032 (2) Digit Two
0033 (3) Digit Three
0034 (4) Digit Four
0035 (5) Digit Five
0036 (6) Digit Six
0037 (7) Digit Seven
0038 (8) Digit Eight
0039 (9) Digit Nine
002D (-) Hyphen-Minus
002E (.) Full Stop (Dot)

http://tools.ietf.org/html/draft-farah-adntf-ling-guidelines-04
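
The two character tables can be turned into a simple membership check. The following is a minimal sketch, assuming the listed code points are the complete permitted set and that the full stop only separates labels; the function names are illustrative and not taken from the draft.

```python
# Code point ranges copied from the Arabic and Basic Latin tables.
ALLOWED_RANGES = [
    (0x0621, 0x063A),  # HAMZA .. GHAIN
    (0x0641, 0x064A),  # FEH .. YEH
    (0x0660, 0x0669),  # Arabic-Indic digits
    (0x0030, 0x0039),  # European digits
    (0x002D, 0x002D),  # hyphen-minus
]


def is_allowed_char(ch: str) -> bool:
    """True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def is_allowed_name(name: str) -> bool:
    """Check every dot-separated label; empty labels are rejected."""
    labels = name.split(".")  # U+002E separates labels
    return all(
        label and all(is_allowed_char(c) for c in label) for label in labels
    )


print(is_allowed_name("\u0645\u062B\u0627\u0644"))  # Arabic letters only
print(is_allowed_name("\u0645\u0640\u0645"))        # contains U+0640
```

Note that U+0640 (Kasheeda) and the Tashkeel marks fall outside the listed ranges, so a check of this kind rejects them without any special case.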


Country level
– Done individually by some Arab countries (ccTLDs)
• Arabic.English, e.g., نطاق.com.sa
• Problem of mixing languages (left-to-right and right-to-left)
• Not accepted (linguistically and socially)

GCC level
– Implementing a pilot project for Arabic Domain Names in the GCC countries

Arab world level
– Extend the GCC Pilot Project to include members of the Arab League. Renamed: "Arabic Domain Names Pilot Project", under the auspices of the Arab League
– 2 committees (steering and technical) as part of the Arabic Team for Domain Names
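
The direction-mixing problem at the country level can be made visible with the Unicode bidirectional classes. This is only a sketch using Python's standard `unicodedata` module; the function names are illustrative.

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # strong left-to-right and right-to-left classes


def directions(name: str) -> set:
    """Return the strong bidi classes that occur in name."""
    return {unicodedata.bidirectional(c) for c in name} & STRONG


def mixes_directions(name: str) -> bool:
    """True if name holds both left-to-right and right-to-left characters."""
    found = directions(name)
    return "L" in found and bool(found & {"R", "AL"})


mixed = "\u0646\u0637\u0627\u0642.com.sa"  # Arabic label under com.sa
print(sorted(directions(mixed)))
print(mixes_directions(mixed))
```

A name that mixes both directions is displayed in an order that depends on the rendering context, which is one reason the Arabic.English form was not well received.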


Compare with:
www.tadawul.com.sa


– Participated in the example.test (مثال.إختبار) test
– Published a technical report about the test: “IDN Top Level Domain Evaluations and Testing Report”

OS (4)
– Windows XP/Vista
– Mac OS X
– Linux (Ubuntu)
Applications (10)
– Internet Explorer, version 6.0/7.0
– Firefox, version 2.0/3.0b
– Opera, version 9.24
– Safari, version 3.0
– Office Outlook 2003
– Outlook Express, version 6.0
– Thunderbird 2.0
– Apple Mail, version 2.1
Web mail (3)
– Yahoo Mail
– Google Mail (Gmail)
– Microsoft Mail (Hotmail)

http://www.arabic-domains.org/docs/IDN-ADNPP-Report.pdf


Test Cases: http://www.arabic-domains.org/docs/IDN-ADNPP-Report.pdf


Arabic-Indic vs. European-Arabic digits
– ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
– 0 1 2 3 4 5 6 7 8 9
“Windows has supported number substitution by allowing the representation of different cultural shapes for the same digits while keeping the internal storage of these digits unified among different locales, for example numbers are stored in their well known hexadecimal values, 0x40, 0x41, but displayed according to the selected language.”
Source: http://msdn2.microsoft.com/en-us/library/aa350685(VS.85).aspx?PHPSESSID=o1fb21liejulfgrptbmi9dec92

م‎12‎م<br />

input[0] = U+0645<br />

input[1] = U+0031<br />

input[2] = U+0032<br />

input[3] = U+0645<br />

≠<br />

م‎١٢‎م<br />

input[0] = U+0645<br />

input[1] = U+0661<br />

input[2] = U+0662<br />

input[3] = U+0645
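
The two labels above look alike to a reader who treats both digit shapes as the same numbers, yet they are different code point sequences. The sketch below shows the difference and one possible folding of Arabic-Indic digits to European digits; `fold_digits` is an illustrative helper, not part of any standard.

```python
import unicodedata


def code_points(s: str) -> list:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


def fold_digits(s: str) -> str:
    """Illustrative: map Arabic-Indic digits U+0660..U+0669 to 0..9."""
    return "".join(
        str(unicodedata.digit(c)) if "\u0660" <= c <= "\u0669" else c
        for c in s
    )


european = "\u0645\u0031\u0032\u0645"      # MEEM 1 2 MEEM
arabic_indic = "\u0645\u0661\u0662\u0645"  # MEEM ١ ٢ MEEM

print(european == arabic_indic)                # the raw strings differ
print(code_points(arabic_indic))
print(fold_digits(arabic_indic) == european)   # equal after folding
```

Folding in this direction matches the recommendation that both digit sets may be accepted as long as they are folded to one set.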


Arabic script: the 2nd most widely used alphabetic writing system in the world
Used by many languages such as:
– Persian, Urdu, Turkish, Kurdish, Pashto, Jawi, …
Source: http://en.wikipedia.org/wiki/Arabic_script


There are a number of groups of characters that have the same shapes
– e.g., the Kaf, Heh, Yeh, and Alef groups
– Kaf group example: U+0643 and U+06A9


كلمني
input[0] = U+0643
input[1] = U+0644
input[2] = U+0645
input[3] = U+0646
input[4] = U+064a

کلمني
input[0] = U+06a9
input[1] = U+0644
input[2] = U+0645
input[3] = U+0646
input[4] = U+064a

کلمنې
input[0] = U+06a9
input[1] = U+0644
input[2] = U+0645
input[3] = U+0646
input[4] = U+06d0
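
The three spellings can be compared programmatically. The sketch below uses a tiny look-alike table covering only the two code points that differ in this example; the table and the `skeleton` helper are assumptions for illustration, not a standard confusables list, and they are meant for detecting look-alikes rather than for folding registered names.

```python
# Illustrative look-alike table (an assumption, not a standard):
# each code point maps to one representative of its shape group.
LOOKALIKES = {
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
    "\u06D0": "\u064A",  # ARABIC LETTER E -> ARABIC LETTER YEH
}


def skeleton(s: str) -> str:
    """Replace each character by its group representative."""
    return "".join(LOOKALIKES.get(c, c) for c in s)


words = [
    "\u0643\u0644\u0645\u0646\u064A",
    "\u06A9\u0644\u0645\u0646\u064A",
    "\u06A9\u0644\u0645\u0646\u06D0",
]

for w in words:
    print(" ".join(f"U+{ord(c):04X}" for c in w))

print(len(set(words)))                   # distinct strings
print(len({skeleton(w) for w in words})) # distinct skeletons
```

Three distinct strings collapse to a single skeleton, which is the situation a registry would need to detect before delegating look-alike names to different registrants.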


Announced on 12 Nov. 2007
It is directly supported by HM King Abdullah
Goes along with:
– The Arab League Council’s declaration (in their 19th meeting)
– The national ICT plan
Implemented by KACST with the cooperation of others
Al-Hayat, 3/11/1428 AH, corresponding to 13/11/2007 CE


Support and encouragement:
– Activities that enrich Arabic content
– Development of Arabic language tools
Help to make Arabic content and tools available to everyone
Set up:
– Standards for Arabic content and tools
– Indicators
Promote awareness of Arabic content and how to develop it


Strategy
• Assessment of current content
• International benchmarks
• Needs
• Roadmap
• Projects
• Role of stakeholders
• Indicators

Projects
• Digital library
• Multimedia center
• Digital dictionary
• Open content
• S&T books translation
• Arabic corpus
• …

Tools
• Linguistic tools
• Search engine
• Automatic translation
• OCR
• Spell checker

Infrastructure
• Corpus
• Dictionaries
• Hardware
• Internet portal


شکرا (Kaf entered as U+06A9)
xn--mgbti28b
input[0] = U+0634
input[1] = U+06a9
input[2] = U+0631
input[3] = U+0627

شكرا (Kaf entered as U+0643)
xn--mgbti4d
input[0] = U+0634
input[1] = U+0643
input[2] = U+0631
input[3] = U+0627
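
The two ASCII forms above can be reproduced with Python's built-in `punycode` codec. This is a minimal sketch: `to_ace` is a hypothetical helper that skips nameprep and is only meaningful for labels that contain non-ASCII characters.

```python
def to_ace(label: str) -> str:
    """Encode one non-ASCII label as an ASCII-compatible 'xn--' label."""
    return "xn--" + label.encode("punycode").decode("ascii")


with_kaf = "\u0634\u0643\u0631\u0627"    # SHEEN, KAF (U+0643), REH, ALEF
with_keheh = "\u0634\u06A9\u0631\u0627"  # SHEEN, KEHEH (U+06A9), REH, ALEF

print(to_ace(with_kaf))    # xn--mgbti4d
print(to_ace(with_keheh))  # xn--mgbti28b
```

Two words that look the same on screen therefore resolve as two unrelated names in the DNS.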
