17.07.2013 Views

4. Character Sets - OCLC

4. Character Sets - OCLC

4. Character Sets - OCLC

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>4.</strong>2 Non-Latin Script <strong>Character</strong> Set (continued)<br />

Valid character sets:<br />

UTF-8 Unicode<br />

Field 066<br />

Bengali, Devanagari, Tamil, and Thai<br />

There are no MARC-8 character sets for Bengali, Devanagari, Tamil, or Thai. <strong>OCLC</strong><br />

implemented script identification codes for these scripts based on ISO 15924 Code<br />

Lists (http://www.unicode.org/iso15924/codelists.html).<br />

The following list shows the ranges of UTF-8 Unicode characters that define valid<br />

characters for these scripts in WorldCat records:<br />

Bengali (character range U+0980 to U+09FF)<br />

Devanagari (character range U+0900 to U+097F)<br />

Tamil (character range U+0B80 to U+0BFF)<br />

Thai (character range U+0E00 to U+0E7F)<br />

If you receive records in MARC-8 data format, characters that are not supported in<br />

MARC-8 are represented by numeric character references (NCR).<br />

Field 066 identifies the presence of any MARC-8 character sets for non-Latin scripts<br />

in the record. Field 066 is not used: for UTF-8 character sets.<br />

‡c Alternate graphic character set identification Subfield ‡c contains a code<br />

identifying the MARC-8 character set used in the record. The subfield is repeated for<br />

each additional character set present. The following codes display:<br />

$1 Chinese, Japanese, Korean present<br />

(3 Basic Arabic present<br />

(4 Extended Arabic present<br />

(N Basic Cyrillic present<br />

(Q Extended Cyrillic present<br />

(S Extended Greek present<br />

(2 Basic Hebrew present<br />

4:10 <strong>OCLC</strong>-MARC Records

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!