21.07.2015 Views

GAWK: Effective AWK Programming


Chapter 3: Reading Input Files

assignment ‘FS = "\.."’ assigns the character string ".." to FS (the backslash is stripped). This creates a regexp meaning “fields are separated by occurrences of any two characters.” If instead you want fields to be separated by a literal period followed by any single character, use ‘FS = "\\.."’.

The following table summarizes how fields are split, based on the value of FS (‘==’ means “is equal to”):

FS == " "
    Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.

FS == any other single character
    Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences. The character can even be a regexp metacharacter; it does not need to be escaped.

FS == regexp
    Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.

FS == ""
    Each individual character in the record becomes a separate field. (This is a gawk extension; it is not specified by the POSIX standard.)

Advanced Notes: Changing FS Does Not Affect the Fields

According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of FS after a record is read, the value of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.

However, many implementations of awk do not work this way. Instead, they defer splitting the fields until a field is actually referenced. The fields are split using the current value of FS! This behavior can be difficult to diagnose. The following example illustrates the difference between the two methods. (The sed[3] command prints just the first line of ‘/etc/passwd’.)

     sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'

which usually prints:

     root

on an incorrect implementation of awk, while gawk prints something like:

     root:nSijPlPhZZwgE:0:0:Root:/:

Advanced Notes: FS and IGNORECASE

The IGNORECASE variable (see Section 6.5.1 [Built-in Variables That Control awk], page 110) affects field splitting only when the value of FS is a regexp. It has no effect when FS is a single character, even if that character is a letter. Thus, in the following code:

     FS = "c"
     IGNORECASE = 1
     $0 = "aCa"

[3] The sed utility is a “stream editor.” Its behavior is also defined by the POSIX standard.
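The rules in the FS table above can be checked with a few one-liners. What follows is a minimal sketch, not part of the original text: the sample records and the shell variable names (default_nf, colon_nf, and so on) are made up for illustration, and it assumes a POSIX shell with an awk on the PATH. The FS == "" case is left out because it is a gawk extension.

```shell
# Default FS (" "): runs of whitespace separate fields, and leading and
# trailing whitespace is ignored, so this record has two fields.
default_nf=$(printf '  a   b  \n' | awk '{ print NF }')

# Single-character FS: every occurrence delimits a field, so leading,
# trailing, and doubled colons produce empty fields (five fields here).
colon_nf=$(printf ':a::b:\n' | awk -F: '{ print NF }')

# A single-character FS may be a regexp metacharacter without escaping.
pipe_f2=$(printf 'a|b|c\n' | awk -F'|' '{ print $2 }')

# Regexp FS: "\\.." means a literal period followed by any one character.
regex_f2=$(printf 'a.xb.yc\n' | awk 'BEGIN { FS = "\\.." } { print $2 }')

# Assigning FS in a BEGIN rule takes effect before the first record is
# read, so the result does not depend on when the fields are split.
begin_f1=$(printf 'root:x:0:0\n' | awk 'BEGIN { FS = ":" } { print $1 }')

echo "$default_nf $colon_nf $pipe_f2 $regex_f2 $begin_f1"
# prints: 2 5 b b root
```

Assigning FS in BEGIN, as in the last command, is one way to sidestep the difference between implementations described in the first advanced note, since no record has been read when the assignment happens.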
