21.07.2015 Views

GAWK: Effective AWK Programming


\xhh...
        The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits (‘0’–‘9’, and either ‘A’–‘F’ or ‘a’–‘f’). Like the same construct in ISO C, the escape sequence continues until the first nonhexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results. (The ‘\x’ escape sequence is not allowed in POSIX awk.)

\/
        A literal slash (necessary for regexp constants only). This expression is used when you want to write a regexp constant that contains a slash. Because the regexp is delimited by slashes, you need to escape the slash that is part of the pattern, in order to tell awk to keep processing the rest of the regexp.

\"
        A literal double quote (necessary for string constants only). This expression is used when you want to write a string constant that contains a double quote. Because the string is delimited by double quotes, you need to escape the quote that is part of the string, in order to tell awk to keep processing the rest of the string.

In gawk, a number of additional two-character sequences that begin with a backslash have special meaning in regexps. See Section 2.5 [gawk-Specific Regexp Operators], page 31.

In a regexp, a backslash before any character that is not in the previous list and not listed in Section 2.5 [gawk-Specific Regexp Operators], page 31, means that the next character should be taken literally, even if it would normally be a regexp operator. For example, /a\+b/ matches the three characters ‘a+b’.

For complete portability, do not use a backslash before any character not shown in the previous list.

To summarize:

• The escape sequences in the table above are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.

• gawk processes both regexp constants and dynamic regexps (see Section 2.8 [Using Dynamic Regexps], page 34), for the special operators listed in Section 2.5 [gawk-Specific Regexp Operators], page 31.

• A backslash before any other character means to treat that character literally.

Advanced Notes: Backslash Before Regular Characters

If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX awk purposely leaves what happens as undefined. There are two choices:

Strip the backslash out
        This is what Unix awk and gawk both do. For example, "a\qc" is the same as "aqc". (Because this is such an easy bug both to introduce and to miss, gawk warns you about it.) Consider ‘FS = "[ \t]+\|[ \t]+"’ to use vertical bars surrounded by whitespace as the field separator. There should be two backslashes in the string: ‘FS = "[ \t]+\\|[ \t]+"’.

Leave the backslash alone
        Some other awk implementations do this. In such implementations, typing "a\qc" is the same as typing "a\\qc".
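
The escape rules described above can be tried from a shell. What follows is a minimal sketch rather than part of the original text: the sample inputs and the shell variable names (quoted, slash, plus, field) are illustrative assumptions. It leaves out ‘\x’ and the "a\qc" case on purpose, because those are the ones that behave differently from one awk implementation to another.

```shell
# Single quotes keep the shell from touching the backslashes,
# so awk sees each program text exactly as written.

# \" inside a string constant yields a literal double quote.
quoted=$(awk 'BEGIN { print "say \"hi\"" }')

# \/ inside a regexp constant matches a literal slash.
slash=$(printf 'usr/bin\nusr-bin\n' | awk '/r\/b/ { print $0 }')

# A backslash before a regexp operator makes it literal:
# /a\+b/ matches the three characters a+b, not "one or more a, then b".
plus=$(printf 'a+b\naab\n' | awk '/a\+b/ { print $0 }')

# Two backslashes in the FS string leave one backslash for the regexp,
# so the vertical bar is matched literally instead of acting as alternation.
field=$(printf 'one | two | three\n' |
        awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }')

printf '%s\n' "$quoted" "$slash" "$plus" "$field"
```

Run as written, the script should print four lines: say "hi", usr/bin, a+b, and two. The FS line shows the two-stage processing summarized above: the string constant is reduced first, and only then is the result interpreted as a regexp.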
