
SNOMED CT® Release Format 1 (RF1) Guide - ihtsdo


2.3.7.2.2.1. Single byte encoding

Characters in the range 'U+0000' to 'U+007F' are encoded as a single byte.

Table 23: UTF-8 Single Byte Encoding

Byte     Fixed bits   Character bits
byte 0   0            bits 0-6

2.3.7.2.2.2. Two byte encoding

Characters in the range 'U+0080' to 'U+07FF' are encoded as two bytes.

Table 24: Two byte encoding

Byte     Fixed bits   Character bits
byte 0   1 1 0        bits 6-10
byte 1   1 0          bits 0-5

2.3.7.2.2.3. Three byte encoding

Characters in the range 'U+0800' to 'U+FFFF' are encoded as three bytes.

Table 25: UTF-8 Three Byte Encoding

Byte     Fixed bits   Character bits
byte 0   1 1 1 0      bits 12-15
byte 1   1 0          bits 6-11
byte 2   1 0          bits 0-5

2.3.7.2.3. Notes on encoding rules
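The bit layouts in Tables 23 to 25 can be sketched in code as follows. This is a minimal illustration rather than part of the guide: the function name `encode_code_point` is hypothetical, and the sketch covers only the range 'U+0000' to 'U+FFFF' described above, without rejecting surrogate code points.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point (U+0000 to U+FFFF) following Tables 23 to 25.

    Hypothetical helper for illustration only; it does not reject
    surrogate code points (U+D800 to U+DFFF).
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point outside the range covered by Tables 23 to 25")
    if cp <= 0x7F:
        # Single byte: a 0 bit followed by bits 0-6 of the character.
        return bytes([cp])
    if cp <= 0x7FF:
        # Two bytes: 110 + bits 6-10, then 10 + bits 0-5.
        return bytes([
            0b11000000 | (cp >> 6),
            0b10000000 | (cp & 0b00111111),
        ])
    # Three bytes: 1110 + bits 12-15, then 10 + bits 6-11, then 10 + bits 0-5.
    return bytes([
        0b11100000 | (cp >> 12),
        0b10000000 | ((cp >> 6) & 0b00111111),
        0b10000000 | (cp & 0b00111111),
    ])
```

For example, `encode_code_point(0x20AC)` (the euro sign) yields the three bytes `E2 82 AC`, which matches what Python's built-in `"\u20ac".encode("utf-8")` produces.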

The first bits of each byte indicate the role of the byte. A zero bit terminates this role information. Thus possible byte values are:

Table 26: UTF-8 Encoding Rules

Bits       Byte value   Role
0???????   000-127      Single byte encoding of a character
10??????   128-191      Continuation of a multi-byte encoding
110?????   192-223      First byte of a two byte character encoding
1110????   224-239      First byte of a three byte character encoding
1111????   240-255      Invalid in UTF-8
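The rules in Table 26 can be checked with a few bit masks. The sketch below is illustrative only: the function name `byte_role` and the returned labels are hypothetical. It follows the table as given here, so values 240-255 are reported as invalid, although general UTF-8 also defines four-byte sequences whose first byte falls in that range.

```python
def byte_role(value: int) -> str:
    """Classify one byte value according to Table 26.

    Hypothetical helper; the returned labels are illustrative names,
    not terms defined by the guide.
    """
    if not 0 <= value <= 255:
        raise ValueError("not a byte value")
    if value & 0b10000000 == 0:
        return "single"        # 0???????  (0-127)
    if value & 0b11000000 == 0b10000000:
        return "continuation"  # 10??????  (128-191)
    if value & 0b11100000 == 0b11000000:
        return "lead2"         # 110?????  (192-223)
    if value & 0b11110000 == 0b11100000:
        return "lead3"         # 1110????  (224-239)
    return "invalid"           # 1111????  (240-255)


# Roles of the bytes in "a" followed by the euro sign (U+20AC).
roles = [byte_role(b) for b in "a\u20ac".encode("utf-8")]
```

Here `roles` comes out as one single-byte character followed by a three-byte lead and two continuation bytes.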

Structure and Content Guide | 91

© 2002-2012 International Health Terminology Standards Development Organisation CVR #: 30363434
