04.02.2013 Views

HTML 5 Draft Standard - 30 July 2009 - Huihoo

HTML 5 Draft Standard - 30 July 2009 - Huihoo

HTML 5 Draft Standard - 30 July 2009 - Huihoo

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

42<br />

Note: Implementors are strongly urged to carefully examine any third-party<br />

libraries they might consider using to implement the parsing of syntaxes<br />

described below. For example, date libraries are likely to implement error<br />

handling behavior that differs from what is required in this specification, since<br />

error-handling behavior is often not defined in specifications that describe<br />

date syntaxes similar to those used in this specification, and thus<br />

implementations tend to vary greatly in how they handle errors.<br />

2.4.1 Common parser idioms<br />

The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009<br />

CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D<br />

CARRIAGE RETURN (CR).<br />

The White_Space characters are those that have the Unicode property "White_Space".<br />

[UNICODE]<br />

The alphanumeric ASCII characters are those in the ranges U+00<strong>30</strong> DIGIT ZERO .. U+0039<br />

DIGIT NINE, U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z, U+0061<br />

LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z.<br />

Some of the micro-parsers described below follow the pattern of having an input variable that<br />

holds the string being parsed, and having a position variable pointing at the next character to<br />

parse in input.<br />

For parsers based on this pattern, a step that requires the user agent to collect a sequence of<br />

characters means that the following algorithm must be run, with characters being the set of<br />

characters that can be collected:<br />

1. Let input and position be the same variables as those of the same name in the algorithm<br />

that invoked these steps.<br />

2. Let result be the empty string.<br />

3. While position doesn't point past the end of input and the character at position is one of<br />

the characters, append that character to the end of result and advance position to the<br />

next character in input.<br />

4. Return result.<br />

The step skip whitespace means that the user agent must collect a sequence of characters<br />

(page 42) that are space characters (page 42). The step skip White_Space characters means<br />

that the user agent must collect a sequence of characters (page 42) that are White_Space (page<br />

42) characters. In both cases, the collected characters are not used. [UNICODE]<br />

When a user agent is to strip line breaks from a string, the user agent must remove any<br />

U+000A LINE FEED (LF) and U+000D CARRIAGE RETURN (CR) characters from that string.<br />

The code-point length of a string is the number of Unicode code points in that string.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!