Arabic Domain Names
Dr. Abdulaziz H. Al-Zoman<br />
Director of SaudiNIC - CITC<br />
Chairman of Steering Committee - <strong>Arabic</strong> <strong>Domain</strong> Name Pilot Project<br />
Internet Governance Forum, Sharm El Sheikh, Nov 15-18, 2009
<strong>Arabic</strong> Users/Language in the Internet<br />
IDN: <strong>Arabic</strong> <strong>Domain</strong> <strong>Names</strong><br />
<strong>Arabic</strong> Content Initiative (KAIAC)
Population of Arab world: 343 M (5% of world population)<br />
Native Speakers:<br />
422 M<br />
English Speaking Arabs: 6.5 M (2%)<br />
= 3%<br />
Source: “Addressing the Gap in <strong>Arabic</strong> Content”, by Usama Fayyad, The 2nd Int. Symposium on Computers and <strong>Arabic</strong>, October 11th, 2009<br />
Source: Internet world stats – internetworldstats.com
Total web pages<br />
– 56 B – Yahoo<br />
Total English web pages<br />
– 14 B – Google Probe<br />
Total <strong>Arabic</strong> web pages<br />
– 0.7 B – Google Probe<br />
(2009)<br />
– 0.2 B – d1g.com web crawl<br />
(2008)<br />
Source: “Addressing the Gap in <strong>Arabic</strong> Content”, by Usama Fayyad, Executive Chairman d1g.com, presentation<br />
at the 2nd Int. Symposium on Computers and <strong>Arabic</strong>, October 11th, 2009
Source: www.citc.gov.sa
Content:<br />
– KAIAC: King Abdullah Initiative for <strong>Arabic</strong> Content<br />
– http://www.econtent.org.sa<br />
To reach the next billion people, according to<br />
IGF speakers in Hyderabad: the Internet<br />
must support the large number of languages<br />
in the world at all levels<br />
IDN<br />
– <strong>Arabic</strong> <strong>Domain</strong> Name Pilot Project<br />
– http://www.arabic-domains.org<br />
Source: http://www.networkworld.com/news/2008/120308-internet-needsmultilingual-support-for.html
Current ASCII-based<br />
DNs are incapable of<br />
representing <strong>Arabic</strong><br />
characters<br />
Difficulty reaching<br />
<strong>Arabic</strong> sites using<br />
English DNs<br />
(pronunciation &<br />
spelling problems)<br />
Full <strong>Arabic</strong> DNs will<br />
encourage Arab users to<br />
widely use the Internet<br />
<strong>Arabic</strong> newspaper<br />
صحيفة الشرق الأوسط<br />
www.al-sharqalawsat.com<br />
www.asharqalawsat.com<br />
www.asharq-alaowsat.com<br />
www.elsharkelaosat.com<br />
<br />
E-government Site<br />
يسّر<br />
www.yasser.gov.sa<br />
www.yaser.gov.sa<br />
www.yasir.gov.sa<br />
www.yassir.gov.sa
Linguistic Issues<br />
• To define the accepted <strong>Arabic</strong> character<br />
set to be used for writing <strong>Arabic</strong> domain<br />
names<br />
<strong>Arabic</strong> TLDs<br />
• To define the top-level domains of the<br />
<strong>Arabic</strong> domain name tree structure (i.e.,<br />
<strong>Arabic</strong> gTLDs and ccTLDs)<br />
Technical Solutions<br />
• IETF (IDNA), Unicode, …<br />
Root Servers<br />
• ICANN/IANA<br />
• (e.g., IDN ccTLD Fast Track)
Identified a number of<br />
linguistic issues with<br />
respect to domain names:<br />
– Usage of Tashkeel<br />
(diacritics).<br />
– Usage of Kashidah.<br />
– Character folding.<br />
– Which numerical digits<br />
should be supported.<br />
– Connecting multiple words.<br />
– Mixing with other languages.<br />
– Usage of special characters.<br />
Discussed with more than 60 members of<br />
the AINC linguistic committee reaching<br />
final recommendations<br />
A number of surveys have been collected<br />
from the public<br />
•Received more than 550 responses<br />
•Collected information has been analyzed and<br />
compared with the recommendations of the AINC<br />
linguistic committee<br />
Discussed with <strong>Arabic</strong> linguists to get<br />
their guidance regarding the <strong>Arabic</strong><br />
linguistic issues in domain names.<br />
Discussed with and agreed by the Arab Team<br />
for <strong>Domain</strong> <strong>Names</strong> (under Arab League)
Tashkeel (diacritics): should not be allowed. However, if there is a need to allow users to enter it as part of a domain name, then it should be stripped off by nameprep.<br />
Kasheeda: should be disallowed.<br />
Character folding (Teh Marbuta + Heh, different forms of Hamzah, Alif Maqsura + Yeh): folding should not be allowed.<br />
Numbers (numerical digits): if it is technically possible, it is preferred to support both sets (Latin and <strong>Arabic</strong>) with folding to one set. Otherwise, the Latin set is sufficient.<br />
Connecting multiple words: it is recommended that multiple words be separated by the character "-".<br />
Mixing Latin and <strong>Arabic</strong> characters: it is recommended that <strong>Arabic</strong> domain names be pure <strong>Arabic</strong> and not be mixed with other languages.<br />
Special characters (e.g., @, #, $, %, ...): it is recommended that <strong>Arabic</strong> domain names follow the standard with respect to the use of special characters.<br />
http://tools.ietf.org/html/draft-farah-adntf-ling-guidelines-04
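The recommendations above can be sketched in code. The following is a minimal Python illustration, not the project's implementation: the function name `apply_linguistic_recommendations` is hypothetical, and treating U+064B to U+0652 as the Tashkeel range is an assumption made for this example. It strips Tashkeel, rejects Kasheeda (U+0640), and rejects labels that mix Latin letters with Arabic letters.

```python
# Assumption: Tashkeel is taken to be U+064B (FATHATAN) .. U+0652 (SUKUN).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}
KASHEEDA = "\u0640"  # ARABIC TATWEEL


def apply_linguistic_recommendations(label: str) -> str:
    """Return the label with Tashkeel stripped; raise ValueError on disallowed input."""
    # Tashkeel is stripped rather than rejected, as the recommendation suggests.
    stripped = "".join(ch for ch in label if ch not in TASHKEEL)
    if KASHEEDA in stripped:
        raise ValueError("Kasheeda (U+0640) is not allowed")
    has_latin = any(ch.isascii() and ch.isalpha() for ch in stripped)
    has_arabic = any("\u0621" <= ch <= "\u064A" for ch in stripped)
    if has_latin and has_arabic:
        raise ValueError("Latin and Arabic letters must not be mixed")
    return stripped


# MEEM + FATHA + REH: the FATHA is removed, leaving MEEM + REH.
print(apply_linguistic_recommendations("\u0645\u064E\u0631"))
```

A real registry would apply such checks at registration time; here they only illustrate how each recommendation maps to a concrete test on code points.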
Characters from Unicode <strong>Arabic</strong> Table (0600–06FF)<br />
0621 (ء) <strong>Arabic</strong> Letter HAMZA<br />
0622 (آ) <strong>Arabic</strong> Letter ALEF with MADDA above<br />
0623 (أ) <strong>Arabic</strong> Letter ALEF with HAMZA above<br />
0624 (ؤ) <strong>Arabic</strong> Letter WAW with HAMZA above<br />
0625 (إ) <strong>Arabic</strong> Letter ALEF with HAMZA below<br />
0626 (ئ) <strong>Arabic</strong> Letter YEH with HAMZA above<br />
0627 (ا) <strong>Arabic</strong> Letter ALEF<br />
0628 (ب) <strong>Arabic</strong> Letter BEH<br />
0629 (ة) <strong>Arabic</strong> Letter TEH MARBUTA<br />
062A (ت) <strong>Arabic</strong> Letter TEH<br />
062B (ث) <strong>Arabic</strong> Letter THEH<br />
062C (ج) <strong>Arabic</strong> Letter JEEM<br />
062D (ح) <strong>Arabic</strong> Letter HAH<br />
062E (خ) <strong>Arabic</strong> Letter KHAH<br />
062F (د) <strong>Arabic</strong> Letter DAL<br />
0630 (ذ) <strong>Arabic</strong> Letter THAL<br />
0631 (ر) <strong>Arabic</strong> Letter REH<br />
0632 (ز) <strong>Arabic</strong> Letter ZAIN<br />
0633 (س) <strong>Arabic</strong> Letter SEEN<br />
0634 (ش) <strong>Arabic</strong> Letter SHEEN<br />
0635 (ص) <strong>Arabic</strong> Letter SAD<br />
0636 (ض) <strong>Arabic</strong> Letter DAD<br />
0637 (ط) <strong>Arabic</strong> Letter TAH<br />
0638 (ظ) <strong>Arabic</strong> Letter ZAH<br />
0639 (ع) <strong>Arabic</strong> Letter AIN<br />
063A (غ) <strong>Arabic</strong> Letter GHAIN<br />
0641 (ف) <strong>Arabic</strong> Letter FEH<br />
0642 (ق) <strong>Arabic</strong> Letter QAF<br />
0643 (ك) <strong>Arabic</strong> Letter KAF<br />
0644 (ل) <strong>Arabic</strong> Letter LAM<br />
0645 (م) <strong>Arabic</strong> Letter MEEM<br />
0646 (ن) <strong>Arabic</strong> Letter NOON<br />
0647 (ه) <strong>Arabic</strong> Letter HEH<br />
0648 (و) <strong>Arabic</strong> Letter WAW<br />
0649 (ى) <strong>Arabic</strong> Letter ALEF MAKSURA<br />
064A (ي) <strong>Arabic</strong> Letter YEH<br />
0660 (٠) <strong>Arabic</strong>-Indic Digit Zero<br />
0661 (١) <strong>Arabic</strong>-Indic Digit One<br />
0662 (٢) <strong>Arabic</strong>-Indic Digit Two<br />
0663 (٣) <strong>Arabic</strong>-Indic Digit Three<br />
0664 (٤) <strong>Arabic</strong>-Indic Digit Four<br />
0665 (٥) <strong>Arabic</strong>-Indic Digit Five<br />
0666 (٦) <strong>Arabic</strong>-Indic Digit Six<br />
0667 (٧) <strong>Arabic</strong>-Indic Digit Seven<br />
0668 (٨) <strong>Arabic</strong>-Indic Digit Eight<br />
0669 (٩) <strong>Arabic</strong>-Indic Digit Nine
Characters from Unicode Basic Latin Table (0000–007F):<br />
0030 (0) Digit Zero<br />
0031 (1) Digit One<br />
0032 (2) Digit Two<br />
0033 (3) Digit Three<br />
0034 (4) Digit Four<br />
0035 (5) Digit Five<br />
0036 (6) Digit Six<br />
0037 (7) Digit Seven<br />
0038 (8) Digit Eight<br />
0039 (9) Digit Nine<br />
002D (-) Hyphen-Minus<br />
002E (.) Full Stop (Dot)<br />
http://tools.ietf.org/html/draft-farah-adntf-ling-guidelines-04
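The two character tables above can be turned into a simple repertoire check. This is a hedged sketch only: the names `ALLOWED` and `disallowed_characters` are hypothetical, and the dot (U+002E) is treated as the label separator rather than as a character inside a label.

```python
# Code points taken from the two tables above.
ALLOWED = (
    {chr(cp) for cp in range(0x0621, 0x063B)}    # HAMZA .. GHAIN
    | {chr(cp) for cp in range(0x0641, 0x064B)}  # FEH .. YEH
    | {chr(cp) for cp in range(0x0660, 0x066A)}  # Arabic-Indic digits
    | {chr(cp) for cp in range(0x0030, 0x003A)}  # European digits
    | {"-"}                                      # hyphen-minus
)


def disallowed_characters(domain: str) -> list:
    """Return the characters of a dot-separated name that fall outside the tables."""
    return [ch for label in domain.split(".") for ch in label if ch not in ALLOWED]


# KEHEH (U+06A9) is not in the table, so it is reported.
print([f"U+{ord(ch):04X}" for ch in disallowed_characters("\u06A9\u0644")])
```

An empty result means every character of the name is in the accepted set; it does not check other constraints such as label length or hyphen placement.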
Country level<br />
– Done individually by some Arab<br />
countries (ccTLDs)<br />
• <strong>Arabic</strong>.English, e.g., com.sa.نطاق<br />
• Problem of mixing languages (left-to-right<br />
and right-to-left)<br />
• Not accepted (linguistically and<br />
socially)<br />
GCC level<br />
– Implementing a<br />
pilot project for<br />
<strong>Arabic</strong> <strong>Domain</strong><br />
names in the GCC<br />
Countries<br />
<br />
Arab world level<br />
– Extend the GCC Pilot Project to include members of the Arab League.<br />
Renamed: "<strong>Arabic</strong> <strong>Domain</strong> <strong>Names</strong> Pilot Project", under the auspices of the<br />
Arab League<br />
– 2 committees (Steering and technical) as part of the <strong>Arabic</strong> Team for <strong>Domain</strong><br />
<strong>Names</strong>
Compare with:<br />
www.tadawul.com.sa
Participated in the example.test (مثال.إختبار) test
– Published a technical report about the test “IDN Top Level <strong>Domain</strong><br />
Evaluations and Testing Report”<br />
OS (4)<br />
– Windows XP/Vista<br />
– Mac OS X<br />
– Linux (Ubuntu)<br />
Applications (10)<br />
– Internet Explorer, version 6.0/7.0<br />
– Firefox, version 2.0/3.0b<br />
– Opera, version 9.24<br />
– Safari, version 3.0<br />
– Office Outlook 2003<br />
– Outlook Express version 6.0<br />
– Thunderbird 2.0<br />
– Apple Mail version 2.1<br />
Web mail (3)<br />
– Yahoo Mail<br />
– Google Mail (Gmail)<br />
– Microsoft Mail (Hotmail)<br />
http://www.arabic-domains.org/docs/IDN-ADNPP-Report.pdf
Test Cases<br />
http://www.arabic-domains.org/docs/IDN-ADNPP-Report.pdf
<strong>Arabic</strong>-Indic vs. European-<strong>Arabic</strong> digits<br />
– ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩<br />
– 0 1 2 3 4 5 6 7 8 9<br />
“Windows has supported number substitution by allowing the<br />
representation of different cultural shapes for the same digits while<br />
keeping the internal storage of these digits unified among different<br />
locales, for example numbers are stored in their well known<br />
hexadecimal values, 0x40, 0x41, but displayed according to the selected<br />
language.”<br />
Source: http://msdn2.microsoft.com/en-us/library/aa350685(VS.85).aspx?PHPSESSID=o1fb21liejulfgrptbmi9dec92<br />
م12م<br />
input[0] = U+0645<br />
input[1] = U+0031<br />
input[2] = U+0032<br />
input[3] = U+0645<br />
≠<br />
م١٢م<br />
input[0] = U+0645<br />
input[1] = U+0661<br />
input[2] = U+0662<br />
input[3] = U+0645
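The comparison above can be reproduced with a few lines of Python. This is an illustrative sketch: `fold_digits` is a hypothetical helper showing what "folding to one set" could look like, and Python's built-in `punycode` codec is used only to show that the two strings encode differently.

```python
# Map each Arabic-Indic digit (U+0660..U+0669) to its European counterpart.
ARABIC_INDIC_TO_EUROPEAN = {0x0660 + i: ord("0") + i for i in range(10)}


def fold_digits(label: str) -> str:
    """Fold Arabic-Indic digits to European digits; other characters are unchanged."""
    return label.translate(ARABIC_INDIC_TO_EUROPEAN)


european = "\u0645\u0031\u0032\u0645"      # MEEM, '1', '2', MEEM
arabic_indic = "\u0645\u0661\u0662\u0645"  # MEEM, U+0661, U+0662, MEEM

print(european == arabic_indic)               # False: different code points
print(european.encode("punycode") == arabic_indic.encode("punycode"))  # False
print(fold_digits(arabic_indic) == european)  # True once digits are folded
```

Without folding, the two spellings are distinct domain names even though a user may consider them the same.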
The 2nd most widely used alphabetic writing<br />
system in the world<br />
Used by many languages such as:<br />
– Persian, Urdu, Turkish, Kurdish, Pashto, Jawi, …<br />
Source: http://en.wikipedia.org/wiki/<strong>Arabic</strong>_script
There are a number of<br />
groups of characters<br />
that have the same<br />
shapes<br />
– e.g., the Kaf, Heh, Yeh, Alef,<br />
… groups<br />
– Kaf group example: U+0643 vs. U+06A9
كلمني<br />
input[0] = U+0643<br />
input[1] = U+0644<br />
input[2] = U+0645<br />
input[3] = U+0646<br />
input[4] = U+064a<br />
≠<br />
کلمني<br />
input[0] = U+06a9<br />
input[1] = U+0644<br />
input[2] = U+0645<br />
input[3] = U+0646<br />
input[4] = U+064a<br />
≠<br />
کلمنې<br />
input[0] = U+06a9<br />
input[1] = U+0644<br />
input[2] = U+0645<br />
input[3] = U+0646<br />
input[4] = U+06d0
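One possible way to detect such look-alike labels is to map each character to a representative of its shape group before comparing. The sketch below is an assumption made for illustration, not a method described in the slides: `VARIANTS` is a hypothetical two-entry table covering only the characters in the example above, and a real variant table would be much larger.

```python
# Hypothetical, deliberately tiny variant table: look-alike -> representative.
VARIANTS = {
    0x06A9: 0x0643,  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
    0x06D0: 0x064A,  # ARABIC LETTER E     -> ARABIC LETTER YEH
}


def fold_variants(label: str) -> str:
    """Replace each look-alike character with its group representative."""
    return label.translate(VARIANTS)


def code_points(label: str) -> list:
    """List the code points of a label in the U+XXXX form used above."""
    return [f"U+{ord(ch):04X}" for ch in label]


words = [
    "\u0643\u0644\u0645\u0646\u064A",  # KAF ... YEH
    "\u06A9\u0644\u0645\u0646\u064A",  # KEHEH ... YEH
    "\u06A9\u0644\u0645\u0646\u06D0",  # KEHEH ... E
]

print(len(set(words)))                            # 3 distinct strings
print(len({fold_variants(w) for w in words}))     # 1 after folding
print(code_points(fold_variants(words[2])))
```

The folded form is only used for comparison here, for example to flag a new registration that collides visually with an existing one.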
Announced on 12 Nov. 2007<br />
It is directly supported by HM King Abdullah<br />
Goes along with:<br />
– the Arab League Council’s declaration (in their 19th<br />
meeting)<br />
– The national ICT plan<br />
Implemented by KACST with the cooperation of<br />
others<br />
Al-Hayat, 3/11/1428 AH, corresponding to 13/11/2007 CE
Support and encouragement:<br />
– Activities that enrich <strong>Arabic</strong> content<br />
– Development of <strong>Arabic</strong> language Tools<br />
Help to make <strong>Arabic</strong> content and tools available<br />
to everyone<br />
Set up<br />
– Standards for <strong>Arabic</strong> content and tools<br />
– Indicators<br />
Promote awareness of <strong>Arabic</strong> content and<br />
how to develop it
Strategy<br />
• Assessment of current content<br />
• International benchmarks<br />
• Needs<br />
• Roadmap<br />
• Projects<br />
• Role of stakeholders<br />
• Indicators<br />
Projects<br />
• Digital library<br />
• Multimedia center<br />
• Digital dictionary<br />
• Open content<br />
• S&T books translation<br />
• <strong>Arabic</strong> corpus<br />
• …<br />
Tools<br />
• Linguistic tools<br />
• Search engine<br />
• Automatic translation<br />
• OCR<br />
• Spell checker<br />
Infrastructure<br />
• Corpus<br />
• Dictionaries<br />
• Hardware<br />
• Internet<br />
• Portal
شکرا<br />
xn--mgbti28b<br />
input[0] = U+0634<br />
input[1] = U+06a9<br />
input[2] = U+0631<br />
input[3] = U+0627<br />
≠<br />
شكرا<br />
xn--mgbti4d<br />
input[0] = U+0634<br />
input[1] = U+0643<br />
input[2] = U+0631<br />
input[3] = U+0627
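The two encoded forms above can be reproduced with Python's built-in `punycode` codec. This is a minimal sketch: `to_ace` is a hypothetical helper that only prepends the `xn--` prefix to the raw Punycode output and deliberately skips the nameprep step that a full IDNA implementation performs.

```python
def to_ace(label: str) -> str:
    """Encode one label as 'xn--' plus its raw Punycode form (no nameprep)."""
    return "xn--" + label.encode("punycode").decode("ascii")


with_keheh = "\u0634\u06A9\u0631\u0627"  # SHEEN, KEHEH (U+06A9), REH, ALEF
with_kaf = "\u0634\u0643\u0631\u0627"    # SHEEN, KAF (U+0643), REH, ALEF

print(to_ace(with_keheh))  # xn--mgbti28b
print(to_ace(with_kaf))    # xn--mgbti4d
```

The two labels look alike on screen but resolve as different names, because the DNS only sees the encoded strings.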