21.07.2015 Views

GAWK: Effective AWK Programming

GAWK: Effective AWK Programming

GAWK: Effective AWK Programming

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

36 <strong>G<strong>AWK</strong></strong>: <strong>Effective</strong> <strong>AWK</strong> <strong>Programming</strong>3 Reading Input FilesIn the typical awk program, all input is read either from the standard input (by default,this is the keyboard, but often it is a pipe from another command) or from files whosenames you specify on the awk command line. If you specify input files, awk reads themin order, processing all the data from one before going on to the next. The name of thecurrent input file can be found in the built-in variable FILENAME (see Section 6.5 [Built-inVariables], page 110).The input is read in units called records, and is processed by the rules of your programone record at a time. By default, each record is one line. Each record is automatically splitinto chunks called fields. This makes it more convenient for programs to work on the partsof a record.On rare occasions, you may need to use the getline command. The getline commandis valuable, both because it can do explicit input from any number of files, and becausethe files used with it do not have to be named on the awk command line (see Section 3.8[Explicit Input with getline], page 52).3.1 How Input Is Split into RecordsThe awk utility divides the input for your awk program into records and fields. awk keepstrack of the number of records that have been read so far from the current input file. Thisvalue is stored in a built-in variable called FNR. It is reset to zero when a new file is started.Another built-in variable, NR, is the total number of input records read so far from all datafiles. It starts at zero, but is never automatically reset to zero.Records are separated by a character called the record separator. By default, the recordseparator is the newline character. This is why records are, by default, single lines. Adifferent character can be used for the record separator by assigning the character to thebuilt-in variable RS.Like any other variable, the value of RS can be changed in the awk program with theassignment operator, ‘=’ (see Section 5.7 [Assignment Expressions], page 83). The newrecord-separator character should be enclosed in quotation marks, which indicate a stringconstant. Often the right time to do this is at the beginning of execution, before any inputis processed, so that the very first record is read with the proper separator. To do this, usethe special BEGIN pattern (see Section 6.1.4 [The BEGIN and END Special Patterns], page 99).For example:awk ’BEGIN { RS = "/" }{ print $0 }’ BBS-listchanges the value of RS to "/", before reading any input. This is a string whose firstcharacter is a slash; as a result, records are separated by slashes. Then the input file isread, and the second rule in the awk program (the action with no pattern) prints eachrecord. Because each print statement adds a newline at the end of its output, this awkprogram copies the input with each slash changed to a newline. Here are the results ofrunning the program on ‘BBS-list’:$ awk ’BEGIN { RS = "/" }> { print $0 }’ BBS-list⊣ aardvark 555-5553 1200

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!