21.07.2015 Views

GAWK: Effective AWK Programming



one field), that the sort keys should be treated as numeric quantities (otherwise ‘15’ would come before ‘5’), and that the sorting should be done in descending (reverse) order.

The sort could even be done from within the program, by changing the END action to:

    END {
        sort = "sort -k 2nr"
        for (word in freq)
            printf "%s\t%d\n", word, freq[word] | sort
        close(sort)
    }

This way of sorting must be used on systems that do not have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the sort program.

13.3.6 Removing Duplicates from Unsorted Text

The uniq program (see Section 13.2.6 [Printing Nonduplicated Lines of Text], page 229), removes duplicate lines from sorted data.

Suppose, however, you need to remove duplicate lines from a data file but that you want to preserve the order the lines are in. A good example of this might be a shell history file. The history file keeps a copy of all the commands you have entered, and it is not unusual to repeat a command several times in a row. Occasionally you might want to compact the history by removing duplicate entries. Yet it is desirable to maintain the order of the original commands.

This simple program does the job. It uses two arrays. The data array is indexed by the text of each line. For each line, data[$0] is incremented. If a particular line has not been seen before, then data[$0] is zero. In this case, the text of the line is stored in lines[count]. Each element of lines is a unique command, and the indices of lines indicate the order in which those lines are encountered. The END rule simply prints out the lines, in order:

    # histsort.awk --- compact a shell history file
    # Thanks to Byron Rakitzis for the general idea

    {
        if (data[$0]++ == 0)
            lines[++count] = $0
    }

    END {
        for (i = 1; i
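
The two-array technique described above can be tried on a small sample. What follows is a minimal sketch, not the manual's own listing: it wraps an awk program in a shell function called dedup_keep_order (a hypothetical name chosen for illustration). The bounds of the END loop are an assumption, since the listing above breaks off at that point. The sample history lines are made up.

```shell
#!/bin/sh
# dedup_keep_order: hypothetical helper name, not part of gawk.
# Reads lines on standard input and prints each distinct line once,
# in the order of its first appearance.
dedup_keep_order() {
    awk '
    {
        # data[$0] is zero (unset) the first time a line is seen,
        # so only first occurrences are appended to lines[]
        if (data[$0]++ == 0)
            lines[++count] = $0
    }
    END {
        # Assumed loop: walk the indices 1..count in order
        for (i = 1; i <= count; i++)
            print lines[i]
    }'
}

# Usage: compact a small, made-up history sample
printf 'ls\ncd /tmp\nls\nmake\ncd /tmp\n' | dedup_keep_order
```

Run on the sample, this prints ls, cd /tmp, and make, each once and in that order. Keeping a separate lines array matters because a `for (x in data)` loop visits array elements in an unspecified order, so the data array alone would not preserve the original sequence.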
