awk(1)
NAME
awk - pattern-directed scanning and processing language
SYNOPSIS
awk [-F fs] [-v var=value] [program | -f progfile ...] [file ...]
DESCRIPTION
awk scans each input file for lines that match any of a set of patterns
specified literally in program or in one or more files specified as
-f progfile. With each pattern there can be an associated action that
is performed when a line in a file matches the pattern. Each line is
matched against the pattern portion of every pattern-action statement,
and the associated action is performed for each matched pattern. The
file name - means the standard input. Any file operand of the form
var=value is treated as an assignment, not a filename. An assignment is
evaluated at the time it would have been opened if it were a filename,
unless the -v option is used.
An input line is made up of fields separated by white space, or by the
regular expression FS. The fields are denoted $1, $2, ...; $0 refers to
the entire line.
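As a minimal sketch of field splitting, the following command uses a
colon as the field separator and prints two fields followed by the
whole record. The sample record and the shell variable name fields are
made up for illustration.

```shell
# -F: makes ":" the field separator; $1 and $3 are individual fields,
# $0 is the entire input line. The input record is hypothetical.
fields=$(printf 'alice:x:1001\n' | awk -F: '{ print $1, $3; print $0 }')
printf '%s\n' "$fields"
```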
Options
awk recognizes the following options and arguments:
-F fs          Specify the regular expression used to separate fields.
               The default is to recognize space and tab characters,
               and to discard leading spaces and tabs. If the -F
               option is used, leading input field separators are no
               longer discarded.
-f progfile    Specify an awk program file. Up to 100 program files
               can be specified. The pattern-action statements in
               these files are executed in the same order as the
               files were specified.
-v var=value   Cause the var=value assignment to occur before the
               BEGIN action (if it exists) is executed.
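The difference between a -v assignment and an assignment given as a
file operand can be sketched as follows. The variable name n and the
input line are illustrative only.

```shell
# With -v, n is already set when the BEGIN action runs.
early=$(awk -v n=5 'BEGIN { print n + 1 }')

# As an operand, n=7 is applied only when awk reaches it in the argument
# list, so BEGIN still sees an empty n. The operand "-" names stdin.
late=$(printf 'x\n' | awk 'BEGIN { print "[" n "]" } { print n }' n=7 -)

printf '%s\n' "$early" "$late"
```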
Statements
A pattern-action statement has the form:
     pattern { action }
A missing { action } means print the line; a missing pattern always
matches. Pattern-action statements are separated by new-lines or
semicolons.
An action is a sequence of statements. A statement can be one of the
following:
if(expression) statement [ else statement ]
while(expression) statement
for(expression;expression;expression) statement
for(var in array) statement
do statement while(expression)
break
continue
{[statement ...]}
expression # commonly var = expression
print [expression-list] [ > expression]
printf format [, expression-list] [ > expression]
return [expression]
next # skip remaining patterns on this input line.
delete array [expression] # delete an array element.
exit [expression] # exit immediately; status is expression.
Statements are terminated by semicolons, newlines or right braces. An
empty expression-list stands for $0. String constants are quoted (""),
with the usual C escapes recognized within. Expressions take on string
or numeric values as appropriate, and are built using the operators +,
-, *, /, %, ^ (exponentiation), and concatenation (indicated by a
blank). The operators ++, --, +=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==,
!=, "" (double quotes, string conversion operator), and ?: are also
available in expressions. Variables can be scalars, array elements
(denoted x[i]) or fields. Variables are initialized to the null string.
Array subscripts can be any string, not necessarily numeric (this
allows for a form of associative memory). Multiple subscripts such as
[i,j,k] are permitted. The constituents are concatenated, separated by
the value of SUBSEP.
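A short sketch of string subscripts and multiple subscripts follows.
The array names n and m and the input words are hypothetical.

```shell
# Count occurrences of each word, using the word itself as the subscript.
counts=$(printf 'a\nb\na\n' |
    awk '{ n[$1]++ } END { print n["a"], n["b"] }')

# m[1,2] is stored under the single key "1" SUBSEP "2", so building that
# key by concatenation finds the same element.
pair=$(awk 'BEGIN {
    m[1,2] = "x"
    k = 1 SUBSEP 2
    print ((k in m) ? "same key" : "different key")
}')

printf '%s\n' "$counts" "$pair"
```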
The print statement prints its arguments on the standard output (or on
a file if >file or >>file is present, or on a pipe if |cmd is present),
separated by the current output field separator, and terminated by the
output record separator. file and cmd can be literal names or
parenthesized expressions. Identical string values in different
statements denote the same open file. The printf statement formats its
expression list according to the format (see printf(3S)).
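The following sketch sends each input line both to a file and to a
pipe. The temporary directory, the file name copy.txt, and the choice
of sort -n as the receiving command are assumptions made for the
example.

```shell
tmp=$(mktemp -d)

# "print > out" writes to the file named by the awk variable out;
# "print | cmd" feeds the same line to a single running sort process.
sorted=$(printf '3\n1\n2\n' |
    awk -v out="$tmp/copy.txt" '{ print > out; print | "sort -n" }')

copy=$(cat "$tmp/copy.txt")
rm -rf "$tmp"

printf '%s\n' "$sorted" "$copy"
```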
Built-In Functions
The built-in function close(expr) closes the file or pipe opened by a
print or printf statement or a call to getline with the same
string-valued expr. This function returns zero if successful;
otherwise, it returns non-zero.
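One common reason to call close is to reread a file that the same
program has just written. A minimal sketch, assuming a scratch file
created with mktemp and a hypothetical awk variable f holding its name:

```shell
tmp=$(mktemp)

roundtrip=$(awk -v f="$tmp" 'BEGIN {
    print "hello" > f
    close(f)            # finish writing so the file can be reopened
    while ((getline line < f) > 0)
        print "read back: " line
    close(f)
}')

rm -f "$tmp"
printf '%s\n' "$roundtrip"
```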
The customary functions exp, log, sqrt, sin, cos, and atan2 are built
in. Other built-in functions are:
blength[([s])]        Length of its associated argument (in bytes)
                      taken as a string, or of $0 if no argument.
length[([s])]         Length of its associated argument (in
                      characters) taken as a string, or of $0 if no
                      argument.
rand()                Returns a random number between zero and one.
srand([expr])         Sets the seed value for rand, and returns the
                      previous seed value. If no argument is given,
                      the time of day is used as the seed value;
                      otherwise, expr is used.
int(x)                Truncates x to an integer value.
substr(s, m[, n])     Returns the at most n-character substring of s
                      that begins at position m, numbering from 1. If
                      n is omitted, the substring is limited by the
                      length of string s.
index(s, t)           Returns the position, in characters, numbering
                      from 1, in string s where string t first
                      occurs, or zero if it does not occur at all.
match(s, ere)         Returns the position, in characters, numbering
                      from 1, in string s where the extended regular
                      expression ere occurs, or 0 if it does not. The
                      variables RSTART and RLENGTH are set to the
                      position and length of the matched string.
split(s, a[, fs])     Splits the string s into array elements a[1],
                      a[2], ..., a[n], and returns n. The separation
                      is done with the regular expression fs, or with
                      the field separator FS if fs is not given.
sub(ere, repl[, in])  Substitutes repl for the first occurrence of
                      the extended regular expression ere in the
                      string in. If in is not given, $0 is used.
gsub(ere, repl[, in]) Same as sub except that all occurrences of the
                      regular expression are replaced; sub and gsub
                      return the number of replacements.
sprintf(fmt, expr, ...)
                      String resulting from formatting expr ...
                      according to the printf(3S) format fmt.
system(cmd)           Executes cmd and returns its exit status.
toupper(s)            Converts the argument string s to uppercase and
                      returns the result.
tolower(s)            Converts the argument string s to lowercase and
                      returns the result.
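The string functions above can be exercised in a single BEGIN action,
as in the sketch below. The sample strings and variable names are
illustrative; the byte-length function is left out because other awk
implementations may not provide it.

```shell
strings=$(awk 'BEGIN {
    s = "hello world"
    print substr(s, 1, 5)            # first five characters
    print index(s, "wor")            # position of a fixed string
    p = match(s, /o+/)               # also sets RSTART and RLENGTH
    print p, RSTART, RLENGTH
    n = split("a:b:c", parts, ":")   # n is the number of pieces
    print n, parts[1], parts[3]
    t = s
    r = gsub(/o/, "0", t)            # r counts the replacements made in t
    print r, t
    print toupper(s)
    print sprintf("%05.1f", 3.14159)
    print length(s)
}')
printf '%s\n' "$strings"
```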
The built-in function getline sets $0 to the next input record from the
current input file; getline <file sets $0 to the next record from file.
getline x sets variable x instead. Finally, cmd | getline pipes the
output of cmd into getline; each call of getline returns the next line
of output from cmd. In all cases, getline returns 1 for a successful
input, 0 for end of file, and -1 for an error.
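A sketch of reading command output with getline follows. The command
string and the variable names cmd, line, and n are hypothetical.
Comparing the return value with 0 stops the loop on both end of file
and error.

```shell
lines=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per line read, 0 at end of file
        n++
    close(cmd)
    print n, line                      # line still holds the last line read
}')
printf '%s\n' "$lines"
```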
Patterns
Patterns are arbitrary Boolean combinations (with !, ||, and &&) of
regular expressions and relational expressions. awk supports Extended
Regular Expressions as described in regexp(5). Isolated regular
expressions in a pattern apply to the entire line. Regular expressions
can also occur in relational expressions, using the operators ~ and !~.
/re/ is a constant regular expression; any string (constant or
variable) can be used as a regular expression, except in the position
of an isolated regular expression in a pattern.
A pattern can consist of two patterns separated by a comma; in this
case, the action is performed for all lines from an occurrence of the
first pattern through an occurrence of the second.
A relational expression is one of the following:
expression matchop regular-expression
expression relop expression
where a relop is any of the six relational operators in C, and a
matchop is either ~ (matches) or !~ (does not match). A conditional is
an arithmetic expression, a relational expression, or a Boolean
combination of the two.
The special patterns BEGIN and END can be used to capture control
before the first input line is read and after the last. BEGIN and END
do not combine with other patterns.
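The pattern forms described in this section can be combined in one
program, as in the following sketch. The input data and the output
labels are made up for the example.

```shell
input='a 1
start 2
b 30
stop 4
c 50'

matched=$(printf '%s\n' "$input" | awk '
    BEGIN                 { print "begin" }          # before any input
    /start/, /stop/       { print "range:", $1 }     # range pattern
    $2 > 10 && $1 !~ /^c/ { print "big:", $1 }       # Boolean combination
    END                   { print "records:", NR }   # after the last line
')
printf '%s\n' "$matched"
```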
Special Characters
The following special escape sequences are recognized by awk in both
regular expressions and strings:
     Escape    Meaning
     \a        alert character
     \b        backspace character
     \f        form-feed character
     \n        new-line character
     \r        carriage-return character
     \t        tab character
     \v        vertical-tab character
     \nnn      1- to 3-digit octal value nnn
     \xhhh     1- to n-digit hexadecimal number
Variable Names
Variable names with special meanings are:
FS          Input field separator regular expression; a space
            character by default; also settable by option -F fs.
NF          The number of fields in the current record.
NR          The ordinal number of the current record from the start
            of input. Inside a BEGIN action the value is zero.
            Inside an END action the value is the number of the last
            record processed.
FNR         The ordinal number of the current record in the current
            file. Inside a BEGIN action the value is zero. Inside an
            END action the value is the number of the last record
            processed in the last file processed.
FILENAME    A pathname of the current input file.
RS          The input record separator; a newline character by
            default.
OFS         The print statement output field separator; a space
            character by default.
ORS         The print statement output record separator; a newline
            character by default.
OFMT        Output format for numbers (default %.6g). If the value
            of OFMT is not a floating-point format specification,
            the results are unspecified.
CONVFMT     Internal conversion format for numbers (default %.6g).
            If the value of CONVFMT is not a floating-point format
            specification, the results are unspecified.
            Under the UNIX Standard environment (see standards(5)),
            if CONVFMT is not specified, %d is used as the internal
            conversion format for numbers by default.
SUBSEP      The subscript separator string for multi-dimensional
            arrays; the default value is "\034".
ARGC        The number of elements in the ARGV array.
ARGV        An array of command line arguments, excluding options
            and the program argument, numbered from zero to ARGC-1.
            The arguments in ARGV can be modified or added to; ARGC
            can be altered. As each input file ends, awk will treat
            the next non-null element of ARGV, up to the current
            value of ARGC-1 inclusive, as the name of the next input
            file. Thus, setting an element of ARGV to null means
            that it will not be treated as an input file. The name -
            indicates the standard input. If an argument matches the
            format of an assignment operand, this argument will be
            treated as an assignment rather than a file argument.
ENVIRON     Array of environment variables; subscripts are names.
            For example, if environment variable V=thing,
            ENVIRON["V"] produces thing.
RSTART      The starting position of the string matched by the match
            function, numbering from 1. This is always equivalent to
            the return value of the match function.
RLENGTH     The length of the string matched by the match function.
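Several of these variables can be seen together in the following
sketch. The input lines and the environment variable V are
hypothetical.

```shell
vars=$(printf 'a b c\nd e\n' | V=thing awk '
    BEGIN { OFS = "-" }                  # separator used by print
          { print NR, NF, $NF }          # record number, field count, last field
    END   { print ENVIRON["V"], NR }     # NR holds the last record number
')
printf '%s\n' "$vars"
```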
Functions can be defined (at the position of a pattern-action
statement) as follows:
     function foo(a, b, c) { ...; return x }
Parameters are passed by value if scalar, and by reference if array
name. Functions can be called recursively. Parameters are local to
the function; all other variables are global.
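The parameter-passing rules can be illustrated with a short sketch.
The function names fact, fill, and bump are invented for the example.

```shell
fn=$(awk '
function fact(n)      { return (n <= 1) ? 1 : n * fact(n - 1) }  # recursion
function fill(arr, k) { arr[k] = "set" }   # array: passed by reference
function bump(v)      { v = v + 1 }        # scalar: passed by value
BEGIN {
    x = 1
    bump(x)            # the caller copy of x is not changed
    fill(a, "key")     # the caller array a gains an element
    print fact(5), x, a["key"]
}')
printf '%s\n' "$fn"
```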
Note that if pattern-action statements are used in an HP-UX command
line as an argument to the awk command, the pattern-action statement
must be enclosed in single quotes to protect it from the shell. For
example, to print lines longer than 72 characters, the pattern-action
statement as used in a script (-f progfile command form) is:
     length > 72
The same pattern-action statement used as an argument to the awk
command is quoted in this manner:
     awk 'length > 72'
EXTERNAL INFLUENCES
For information about the UNIX standard environment, see standards(5).
Environment Variables
LANG         Provides a default value for the internationalization
             variables that are unset or null. If LANG is unset or
             null, the default value of "C" (see lang(5)) is used.
             If any of the internationalization variables contains
             an invalid setting, awk will behave as if all
             internationalization variables are set to "C". See
             environ(5).
LC_ALL       If set to a non-empty string value, overrides the
             values of all the other internationalization variables.
LC_CTYPE     Determines the interpretation of text as single and/or
             multi-byte characters, the classification of characters
             as printable, and the characters matched by character
             class expressions in regular expressions.
LC_NUMERIC   Determines the radix character used when interpreting
             numeric input, performing conversion between numeric
             and string values and formatting numeric output.
             Regardless of locale, the period character (the
             decimal-point character of the POSIX locale) is the
             decimal-point character recognized in processing awk
             programs (including assignments in command-line
             arguments).
LC_COLLATE   Determines the locale for the behavior of ranges,
             equivalence classes and multi-character collating
             elements within regular expressions.
LC_MESSAGES  Determines the locale that should be used to affect the
             format and contents of diagnostic messages written to
             standard error and informative messages written to
             standard output.
NLSPATH      Determines the location of message catalogues for the
             processing of LC_MESSAGES.
PATH         Determines the search path when looking for commands
             executed by system(cmd), or input and output pipes.
In addition, all environment variables will be visible via the awk
variable ENVIRON.
International Code Set Support
Single- and multi-byte character code sets are supported except that
variable names must contain only ASCII characters and regular expres‐
sions must contain only valid characters.
DIAGNOSTICS
awk supports up to 199 fields ($1, $2, ..., $199) per record.
EXAMPLES
Print lines longer than 72 characters:
     length > 72
Print first two fields in opposite order:
     { print $2, $1 }
Same, with input fields separated by comma and/or blanks and tabs:
     BEGIN { FS = ",[ \t]*|[ \t]+" }
           { print $2, $1 }
Add up first column, print sum and average:
     { s += $1 }
     END { print "sum is", s, " average is", s/NR }
Print all lines between start/stop pairs:
     /start/, /stop/
Simulate the echo command (see echo(1)):
     BEGIN { # Simulate echo(1)
          for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
          printf "\n"
          exit }
WARNINGS
If the input line length to awk is greater than 3,000 bytes, then the
behaviour is undefined.
AUTHOR
awk was developed by AT&T, IBM, OSF, and HP.
SEE ALSO
lex(1), sed(1), standards(5).
A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming
Language, Addison-Wesley, 1988.
STANDARDS CONFORMANCE