21.07.2015 Views

GAWK: Effective AWK Programming


the formfeed character). Any other character could equally well be used, as long as it won’t be part of the data in a record.

Another technique is to have blank lines separate records. By a special dispensation, an empty string as the value of RS indicates that records are separated by one or more blank lines. When RS is set to the empty string, each record always ends at the first blank line encountered. The next record doesn’t start until the first nonblank line that follows. No matter how many blank lines appear in a row, they all act as one record separator. (Blank lines must be completely empty; lines that contain only whitespace do not count.)

You can achieve the same effect as ‘RS = ""’ by assigning the string "\n\n+" to RS. This regexp matches the newline at the end of the record and one or more blank lines after the record. In addition, a regular expression always matches the longest possible sequence when there is a choice (see Section 2.7 [How Much Text Matches?], page 33). So the next record doesn’t start until the first nonblank line that follows; no matter how many blank lines appear in a row, they are considered one record separator.

There is an important difference between ‘RS = ""’ and ‘RS = "\n\n+"’. In the first case, leading newlines in the input data file are ignored, and if a file ends without extra blank lines after the last record, the final newline is removed from the record. In the second case, this special processing is not done.

Now that the input is separated into records, the second step is to separate the fields in the record. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default as the result of a special feature. When RS is set to the empty string, and FS is set to a single character, the newline character always acts as a field separator.
This is in addition to whatever field separations result from FS.(4)

The original motivation for this special exception was probably to provide useful behavior in the default case (i.e., FS is equal to " "). This feature can be a problem if you really don’t want the newline character to separate fields, because there is no way to prevent it. However, you can work around this by using the split function to break up the record manually (see Section 8.1.3 [String-Manipulation Functions], page 132). If you have a single character field separator, you can work around the special feature in a different way, by making FS into a regexp for that single character. For example, if the field separator is a percent character, instead of ‘FS = "%"’, use ‘FS = "[%]"’.

Another way to separate fields is to put each field on a separate line: to do this, just set the variable FS to the string "\n". (This single character separator matches a single newline.) A practical example of a data file organized this way might be a mailing list, where each entry is separated by blank lines. Consider a mailing list in a file named ‘addresses’, which looks like this:

    Jane Doe
    123 Main Street
    Anywhere, SE 12345-6789

    John Smith
    456 Tree-lined Avenue
    Smallville, MW 98765-4321

(4) When FS is the null string ("") or a regexp, this special feature of RS does not apply. It does apply to the default field separator of a single space: ‘FS = " "’.
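
As a minimal sketch of the technique described above, the following shell script feeds the two sample addresses to awk with RS set to the empty string and FS set to "\n", so that each blank-line-separated entry is one record and each line is one field. The shell variable `addresses`, the function name `print_names_and_cities`, and the output format are illustrative assumptions, not part of the original text; the data is held in a variable rather than in a file named ‘addresses’ only to keep the sketch self-contained.

```shell
#!/bin/sh
# Sample mailing list: entries separated by a blank line, one field per line.
# (Held in a variable here; the text keeps it in a file named 'addresses'.)
addresses='Jane Doe
123 Main Street
Anywhere, SE 12345-6789

John Smith
456 Tree-lined Avenue
Smallville, MW 98765-4321
'

# Hypothetical helper: prints the name, the city line, and the field count.
print_names_and_cities() {
    awk 'BEGIN {
             RS = ""     # blank lines separate records
             FS = "\n"   # each line of a record is one field
         }
         { print $1 " | " $3 " | NF=" NF }'
}

printf '%s' "$addresses" | print_names_and_cities
```

Because RS is the empty string rather than "\n\n+", the final newline of the input is removed from the last record, so both records should report three fields.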
