AWK Language Programming - Built-in Variables
Built-in Variables
Most awk variables are available for you to use for your own purposes; they never change except when your program assigns values to them, and never affect anything except when your program examines them. However, a few variables in awk have special built-in meanings. Some of them awk examines automatically, so that they enable you to tell awk how to do certain things. Others are set automatically by awk, so that they carry information from the internal workings of awk to your program.

This chapter documents all the built-in variables of gawk. Most of them are also documented in the chapters describing their areas of activity.
Built-in Variables that Control awk
This is an alphabetical list of the variables which you can change to
control how awk
does certain things. Those variables that are
specific to gawk
are marked with an asterisk, `*'.

CONVFMT
This string controls conversion of numbers to strings (see section Conversion of Strings and Numbers). It works by being passed, in effect, as the first argument to the sprintf function (see section Built-in Functions for String Manipulation). Its default value is "%.6g". CONVFMT was introduced by the POSIX standard.

FIELDWIDTHS *
This is a space-separated list of column widths that tells gawk how to split input with fixed, columnar boundaries. It is an experimental feature. Assigning to FIELDWIDTHS overrides the use of FS for field splitting. See section Reading Fixed-width Data, for more information. If gawk is in compatibility mode (see section Command Line Options), then FIELDWIDTHS has no special meaning, and field splitting operations are done based exclusively on the value of FS.

FS
FS is the input field separator (see section Specifying How Fields are Separated). The value is a single-character string or a multi-character regular expression that matches the separations between fields in an input record. If the value is the null string (""), then each character in the record becomes a separate field. The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces and tabs is a single separator. It also causes spaces and tabs at the beginning and end of a record to be ignored. You can set the value of FS on the command line using the `-F' option:

    awk -F, 'program' input-files

If gawk is using FIELDWIDTHS for field splitting, assigning a value to FS causes gawk to return to the normal, FS-based field splitting. An easy way to do this is to simply say `FS = FS', perhaps with an explanatory comment.

IGNORECASE *
If IGNORECASE is non-zero or non-null, then all string comparisons and all regular expression matching are case-independent. Thus, regexp matching with `~' and `!~', the gensub, gsub, index, match, split, and sub functions, record termination with RS, and field splitting with FS all ignore case when doing their particular regexp operations. See section Case-sensitivity in Matching. If gawk is in compatibility mode (see section Command Line Options), then IGNORECASE has no special meaning, and string and regexp operations are always case-sensitive.

OFMT
This string controls conversion of numbers to strings (see section Conversion of Strings and Numbers) for printing with the print statement. It works by being passed, in effect, as the first argument to the sprintf function (see section Built-in Functions for String Manipulation). Its default value is "%.6g". Earlier versions of awk also used OFMT to specify the format for converting numbers to strings in general expressions; this is now done by CONVFMT.

OFS
This is the output field separator (see section Output Separators). It is output between the fields output by a print statement. Its default value is " ", a string consisting of a single space.

ORS
This is the output record separator. It is output at the end of every print statement. Its default value is "\n". (See section Output Separators.)

RS
This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text. (See section How Input is Split into Records.)

SUBSEP
SUBSEP is the subscript separator. It has the default value of "\034", and is used to separate the parts of the indices of a multi-dimensional array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"] (see section Multi-dimensional Arrays).
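
To make the control variables above concrete, here is a minimal sketch (not part of the original text) that exercises FS, OFS, ORS, CONVFMT, OFMT, RS, and SUBSEP. Each awk program is wrapped in a shell function whose name (csv_to_tsv, fmt_demo, count_paragraphs, subsep_demo) is hypothetical and chosen only for illustration. The sketch assumes a POSIX-compatible awk and therefore leaves out the gawk-only variables marked with `*'.

```shell
#!/bin/sh
# Hypothetical helper: FS splits input on commas; OFS and ORS control
# what print writes between fields and after each record.
csv_to_tsv() {
    # Assigning $1 = $1 makes awk rebuild $0 using OFS.
    awk 'BEGIN { FS = ","; OFS = "\t"; ORS = ";\n" }
         { $1 = $1; print }'
}

# Hypothetical helper: CONVFMT is used when a number is converted to a
# string inside an expression; OFMT is used when print outputs a number.
fmt_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"
        OFMT = "%.3f"
        x = 3.14159265
        s = x ""        # concatenation converts x with CONVFMT
        print s         # prints 3.14
        print x         # prints 3.142, formatted with OFMT
        print 17        # integral values are printed as integers
    }'
}

# Hypothetical helper: with RS set to the null string, records are
# separated by runs of blank lines.
count_paragraphs() {
    awk 'BEGIN { RS = "" } END { print NR }'
}

# Hypothetical helper: foo["A", "B"] is stored under "A" SUBSEP "B".
subsep_demo() {
    awk 'BEGIN {
        foo["A", "B"] = 1
        key = "A" SUBSEP "B"
        if (key in foo)
            print "same element"
        if (SUBSEP == "\034")
            print "default SUBSEP"
    }'
}
```

For example, printf 'a,b,c\n' | csv_to_tsv prints the three fields separated by tabs, followed by a semicolon and a newline.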
Built-in Variables that Convey Information
This is an alphabetical list of the variables that are set
automatically by awk
on certain occasions in order to provide
information to your program. Those variables that are specific to
gawk
are marked with an asterisk, `*'.

ARGC
ARGV
The command-line arguments available to awk programs are stored in an array called ARGV. ARGC is the number of command-line arguments present. See section Other Command Line Arguments. Unlike most awk arrays, ARGV is indexed from zero to ARGC - 1. For example:

    $ awk 'BEGIN {
    >        for (i = 0; i < ARGC; i++)
    >            print ARGV[i]
    >      }' inventory-shipped BBS-list
    -| awk
    -| inventory-shipped
    -| BBS-list

In this example, ARGV[0] contains "awk", ARGV[1] contains "inventory-shipped", and ARGV[2] contains "BBS-list". The value of ARGC is three, one more than the index of the last element in ARGV, since the elements are numbered from zero.

The names ARGC and ARGV, as well as the convention of indexing the array from zero to ARGC - 1, are derived from the C language's method of accessing command line arguments. See section Using ARGC and ARGV, for information about how awk uses these variables.

ARGIND *
The index in ARGV of the current file being processed. Every time gawk opens a new data file for processing, it sets ARGIND to the index in ARGV of the file name. When gawk is processing the input files, it is always true that `FILENAME == ARGV[ARGIND]'. This variable is useful in file processing; it allows you to tell how far along you are in the list of data files, and to distinguish between successive instances of the same filename on the command line. While you can change the value of ARGIND within your awk program, gawk automatically sets it to a new value when the next file is opened. This variable is a gawk extension. In other awk implementations, or if gawk is in compatibility mode (see section Command Line Options), it is not special.

ENVIRON
An associative array that contains the values of the environment. The array indices are the environment variable names; the values are the values of the particular environment variables. For example, ENVIRON["HOME"] might be `/home/arnold'. Changing this array does not affect the environment passed on to any programs that awk may spawn via redirection or the system function. (In a future version of gawk, it may do so.) Some operating systems may not have environment variables. On such systems, the ENVIRON array is empty (except for ENVIRON["AWKPATH"]).

ERRNO *
If a system error occurs during a redirection for getline, during a read for getline, or during a close operation, then ERRNO contains a string describing the error. This variable is a gawk extension. In other awk implementations, or if gawk is in compatibility mode (see section Command Line Options), it is not special.

FILENAME
This is the name of the file that awk is currently reading. When no data files are listed on the command line, awk reads from the standard input, and FILENAME is set to "-". FILENAME is changed each time a new file is read (see section Reading Input Files). Inside a BEGIN rule, the value of FILENAME is "", since there are no input files being processed yet.(7) (d.c.)

FNR
FNR is the current record number in the current file. FNR is incremented each time a new record is read (see section Explicit Input with getline). It is reinitialized to zero each time a new input file is started.

NF
NF is the number of fields in the current input record. NF is set each time a new record is read, when a new field is created, or when $0 changes (see section Examining Fields).

NR
This is the number of input records awk has processed since the beginning of the program's execution (see section How Input is Split into Records). NR is set each time a new record is read.

RLENGTH
RLENGTH is the length of the substring matched by the match function (see section Built-in Functions for String Manipulation). RLENGTH is set by invoking the match function. Its value is the length of the matched string, or -1 if no match was found.

RSTART
RSTART is the start-index in characters of the substring matched by the match function (see section Built-in Functions for String Manipulation). RSTART is set by invoking the match function. Its value is the position in the string where the matched substring starts, or zero if no match was found.

RT *
RT is set each time a record is read. It contains the input text that matched the text denoted by RS, the record separator. This variable is a gawk extension. In other awk implementations, or if gawk is in compatibility mode (see section Command Line Options), it is not special.
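
The informational variables can be observed the same way. The sketch below, again using hypothetical shell function names (show_counters, find_digits, env_lookup), prints FILENAME, FNR, NR, and NF for each record, shows the values match leaves in RSTART and RLENGTH, and reads one environment variable through ENVIRON. It assumes a POSIX-compatible awk, so the gawk-only variables ARGIND, ERRNO, and RT are left out.

```shell
#!/bin/sh
# Hypothetical helper: print FILENAME, FNR, NR and NF for each record.
# FNR starts again at 1 in every file; NR keeps counting across files.
show_counters() {
    awk '{ print FILENAME, FNR, NR, NF }' "$@"
}

# Hypothetical helper: match() sets RSTART and RLENGTH, which are
# 0 and -1 when the regexp does not match.
find_digits() {
    awk '{
        if (match($0, /[0-9]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print RSTART, RLENGTH
    }'
}

# Hypothetical helper: look up one environment variable through ENVIRON.
env_lookup() {
    awk -v name="$1" 'BEGIN { print ENVIRON[name] }'
}
```

Given a two-line file followed by a one-line file, show_counters prints FNR values of 1, 2, 1 but NR values of 1, 2, 3.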
A side note about NR and FNR.
awk
simply increments both of these variables
each time it reads a record, instead of setting them to the absolute
value of the number of records read. This means that your program can
change these variables, and their new values will be incremented for
each record (d.c.). For example:

    $ echo '1
    > 2
    > 3
    > 4' | awk 'NR == 2 { NR = 17 }
    > { print NR }'
    -| 1
    -| 17
    -| 18
    -| 19

Before FNR
was added to the awk
language
(see section Major Changes between V7 and SVR3.1),
many awk
programs used this feature to track the number of
records in a file by resetting NR
to zero when FILENAME
changed.
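
A minimal sketch of that older idiom next to its FNR equivalent, assuming an awk that allows assignment to NR as described above. The function names count_old_style and count_with_fnr are hypothetical. The reset happens in a rule that fires on the first record of each new file, after awk has already counted that record, so this sketch assigns 1 rather than zero; both functions then print the same per-file numbers.

```shell
#!/bin/sh
# Older idiom: reset NR by hand whenever FILENAME changes. The rule runs
# on the first record of the new file, after NR has already been
# incremented for it, so this sketch assigns 1 rather than zero.
count_old_style() {
    awk 'FILENAME != prev { NR = 1; prev = FILENAME }
         { print FILENAME, NR }' "$@"
}

# With FNR, awk keeps the per-file record number for you.
count_with_fnr() {
    awk '{ print FILENAME, FNR }' "$@"
}
```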
Using ARGC and ARGV
In section Built-in Variables that Convey Information, you saw this program describing the information contained in ARGC and ARGV:

    $ awk 'BEGIN {
    >        for (i = 0; i < ARGC; i++)
    >            print ARGV[i]
    >      }' inventory-shipped BBS-list
    -| awk
    -| inventory-shipped
    -| BBS-list

In this example, ARGV[0] contains "awk", ARGV[1] contains "inventory-shipped", and ARGV[2] contains "BBS-list".
Notice that the awk program text is not entered in ARGV. The other special command line options, with their arguments, are also not entered. But variable assignments on the command line are treated as arguments, and do show up in the ARGV array.
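
A small sketch of which arguments reach ARGV. The function name list_args and the -v x=1 assignment are hypothetical, included only to show that awk's own options stay out of ARGV while a var=value operand does not. Because the program has only a BEGIN rule, awk never tries to open the operands as files.

```shell
#!/bin/sh
# Print ARGV[1] through ARGV[ARGC - 1]. The program text and the -v
# option (with its argument) are not entered in ARGV, but a
# var=value assignment given as an operand is.
list_args() {
    awk -v x=1 'BEGIN {
        for (i = 1; i < ARGC; i++)
            print i, ARGV[i]
    }' "$@"
}
```

Called as list_args n=5 data.txt, it prints 1 n=5 and 2 data.txt on separate lines.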
Your program can alter ARGC and the elements of ARGV.
Each time awk reaches the end of an input file, it uses the next element of ARGV as the name of the next input file. By storing a different string there, your program can change which files are read. You can use "-" to represent the standard input. By storing additional elements and incrementing ARGC, you can cause additional files to be read.
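
A minimal sketch of adding an input file from inside the program. The helper name cat_plus and the awk variable extra are hypothetical; the extra file name is passed in with -v and appended after the files given on the command line.

```shell
#!/bin/sh
# Hypothetical helper: read the files named on the command line, then
# one more file whose name arrives in the awk variable "extra". The
# BEGIN rule appends it to ARGV and increments ARGC.
cat_plus() {
    extra_file=$1
    shift
    awk -v extra="$extra_file" 'BEGIN {
        ARGV[ARGC] = extra
        ARGC++
    }
    { print }' "$@"
}
```

Passing - as the extra name makes awk read the standard input after the other files.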
If you decrease the value of ARGC, that eliminates input files from the end of the list. By recording the old value of ARGC elsewhere, your program can treat the eliminated arguments as something other than file names.
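
A sketch of decreasing ARGC so that the last argument is used as data rather than as a file name. The helper name grep_last and the awk variable pat are hypothetical. Instead of saving the old ARGC, this sketch copies the argument it is about to drop into pat before decrementing ARGC.

```shell
#!/bin/sh
# Hypothetical helper: treat the LAST argument as a regexp, not a file.
# BEGIN saves it and decrements ARGC, so awk never tries to open it.
# If no file names remain, awk reads the standard input.
grep_last() {
    awk 'BEGIN {
        pat = ARGV[ARGC - 1]
        ARGC--
    }
    $0 ~ pat' "$@"
}
```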
To eliminate a file from the middle of the list, store the null string ("") into ARGV in place of the file's name. As a special feature, awk ignores file names that have been replaced with the null string.
You may also use the delete
statement to remove elements from
ARGV
(see section The delete
Statement).
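
A sketch of removing a file from the middle of the list. The helper name cat_except and the awk variable skip are hypothetical. It stores the null string in place of every ARGV entry equal to skip; using delete ARGV[i] there instead is the alternative the text mentions.

```shell
#!/bin/sh
# Hypothetical helper: read every file on the command line except the
# one named by the first argument. Matching ARGV entries are replaced
# with the null string, which awk skips when opening input files.
cat_except() {
    skip_file=$1
    shift
    awk -v skip="$skip_file" 'BEGIN {
        for (i = 1; i < ARGC; i++)
            if (ARGV[i] == skip)
                ARGV[i] = ""
    }
    { print }' "$@"
}
```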
All of these actions are typically done from the BEGIN
rule,
before actual processing of the input begins.
See section Splitting a Large File Into Pieces, and see section Duplicating Output Into Multiple Files, for an example of each way of removing elements from ARGV.
The following fragment processes ARGV
in order to examine, and
then remove, command line options.

    BEGIN {
        for (i = 1; i < ARGC; i++) {
            if (ARGV[i] == "-v")
                verbose = 1
            else if (ARGV[i] == "-d")
                debug = 1
            else if (ARGV[i] ~ /^-./) {   # any other option
                e = sprintf("%s: unrecognized option -- %c",
                        ARGV[0], substr(ARGV[i], 2, 1))
                print e > "/dev/stderr"
            } else
                break                      # first non-option ends the options
            delete ARGV[i]                 # options are not input files
        }
    }

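One way to try option processing of this kind is sketched below. It is an assumption of this sketch, not of the text, that the program is passed inline inside a hypothetical shell function run_opts and that an END rule is added to report what was parsed. The test /^-./ and substr(ARGV[i], 2, 1) are used so that only arguments beginning with a dash count as options and the error message names the offending letter. The `--' ends awk's own options, which makes sure -v and -d are passed through to ARGV instead of being interpreted by awk itself.

```shell
#!/bin/sh
# Hypothetical wrapper: option processing in BEGIN, plus an END rule
# that reports the flags and the number of records read.
run_opts() {
    awk -- 'BEGIN {
        for (i = 1; i < ARGC; i++) {
            if (ARGV[i] == "-v")
                verbose = 1
            else if (ARGV[i] == "-d")
                debug = 1
            else if (ARGV[i] ~ /^-./) {
                e = sprintf("%s: unrecognized option -- %c",
                        ARGV[0], substr(ARGV[i], 2, 1))
                print e > "/dev/stderr"
            } else
                break                # first non-option: stop scanning
            delete ARGV[i]           # so awk does not open it as a file
        }
    }
    END { print "verbose=" verbose, "debug=" debug, "records=" NR }' "$@"
}
```

With a one-line data file, run_opts -v -d file prints verbose=1 debug=1 records=1.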