21.07.2015 Views

GAWK: Effective AWK Programming



format, awk converts all numbers to the same constant string. As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be. Given the following code fragment:

    CONVFMT = "%2.2f"
    a = 12
    b = a ""

b has the value "12", not "12.00".

Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. However, these semantics for OFMT are something to keep in mind if you must port your new style program to older implementations of awk. We recommend that instead of changing your programs, just port gawk itself. See Section 4.1 [The print Statement], page 58, for more information on the print statement.

And, once again, where you are can matter when it comes to converting between numbers and strings. In Section 2.9 [Where You Are Makes A Difference], page 35, we mentioned that the local character set and language (the locale) can affect how gawk matches characters. The locale also affects numeric formats. In particular, for awk programs, it affects the decimal point character. The "C" locale, and most English-language locales, use the period character (‘.’) as the decimal point. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character.

The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see Section 11.3 [Other Command-Line Arguments], page 182).
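Returning briefly to the CONVFMT rule earlier in this section, the following is a minimal sketch that contrasts the integer special case with a non-integral value. The function name convfmt_demo and the variables x and y are illustrative only and do not come from the text; LC_ALL=C is set on the assumption that a period is wanted as the decimal point whatever locale is in effect.

```shell
# Sketch: how CONVFMT affects number-to-string conversion.
# convfmt_demo is a hypothetical helper name, not part of gawk.
convfmt_demo() {
    # LC_ALL=C keeps the decimal point a period in any ambient locale.
    LC_ALL=C awk 'BEGIN {
        CONVFMT = "%2.2f"
        a = 12          # integral value
        x = 3.14159     # non-integral value
        b = a ""        # concatenation forces conversion to a string
        y = x ""
        print b         # integer special case: CONVFMT is ignored
        print y         # converted to a string using CONVFMT
    }'
}

convfmt_demo
```

Run as written, this should print 12 on the first line and 3.14 on the second: only the non-integral value is formatted with CONVFMT.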
However, when interpreting input data, for print and printf output, and for number to string conversion, the local decimal point character is used. Here are some examples indicating the difference in behavior, on a GNU/Linux system:

    $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    ⊣ 3.14159
    $ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    ⊣ 3,14159
    $ echo 4,321 | gawk '{ print $1 + 1 }'
    ⊣ 5
    $ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
    ⊣ 5,321

The ‘en_DK’ locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as ‘4’, while in the Danish locale, it’s treated as the full number, ‘4.321’.

For version 3.1.3 through 3.1.5, gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, since their data used a period as the decimal point. Beginning in version 3.1.6, the default behavior was restored to use a period as the decimal point character. You can use the ‘--use-lc-numeric’ option (see Section 11.2 [Command-Line Options], page 177) to force gawk to use the locale’s
