awk(1)
NAME
awk - Pattern scanning and processing language
SYNOPSIS
awk [-F ERE] [-f program_file]... [-v var=val]... [argument]...
awk [-F ERE] [-v var=val]... ['program_text'] [argument]...
STANDARDS
Interfaces documented on this reference page conform to industry stan‐
dards as follows:
awk: XCU5.0
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
OPTIONS
-F ERE
       Defines ERE (extended regular expression) as the value of the
       input field separator before any input is read. Using this
       option is comparable to assigning a value to the built-in
       variable FS.
-f program_file
       Specifies the pathname (program_file) of a file containing an
       awk program. If multiple instances of this option are specified,
       the concatenation of the files specified as program_file, in the
       order specified, is the awk program. The awk program can
       alternatively be specified on the command line as the single
       argument program_text.
-v var=val
       The var=val argument is an assignment operand that specifies a
       value (val) for a variable (var). The specified variable
       assignment occurs prior to executing the awk program, including
       the actions associated with BEGIN patterns (if any are in the
       program). Multiple occurrences of the -v option can be specified
       on the awk command line.
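The options above can be combined on one command line. The following is a minimal sketch; the variable name prefix and the sample input are made up for illustration:

```shell
# -F: makes the colon the input field separator.
# -v assigns the (hypothetical) variable "prefix" before BEGIN runs.
printf 'root:x:0\ndaemon:x:1\n' |
    awk -F: -v prefix="user=" '
        BEGIN { print "prefix is " prefix }
              { print prefix $1 }'
```

Because the assignment is made with -v, the value is already visible inside the BEGIN action.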
OPERANDS
program_text
       If -f program_file is not specified, the first parameter to awk
       is program_text, delimited by single quotation (') characters.
       See the DESCRIPTION section for the processing of this
       parameter.
argument
       The following two types of argument can be intermixed:
       input_file
              A pathname of a file that contains the input to be read,
              which is matched against the set of patterns in the
              program. If no input_file operands are specified, or if
              the input_file argument is -, standard input is used.
       var=val
              The characters before the = represent the name of an awk
              variable. If that name is an awk reserved word, the
              behavior is undefined. The characters following the = are
              interpreted as if they appeared in the awk program
              preceded and followed by a double quotation (")
              character; in other words, as a string value. If the
              value is considered a numeric string, the variable is
              assigned a numeric value. Each such variable assignment
              occurs just prior to the processing of the following
              input_file, if any. Thus, an assignment before the first
              input_file argument is executed after the BEGIN actions
              (if any), while an assignment after the last input_file
              argument occurs before the END actions (if any). If
              there are no input_file arguments, assignments are
              executed before processing the standard input.
              If the environment variable STDS_FLAG is set to ALL,
              variable names that begin with a digit or = are
              considered invalid.
DESCRIPTION
The awk command executes programs written in the awk programming
language, which is designed for pattern matching and textual data
manipulation. An awk program is a sequence of patterns and
corresponding actions; an action is carried out when its pattern
matches an input record. The awk command is a more powerful tool for
text manipulation than either sed or grep. The awk command:
       Performs convenient numeric processing
       Allows variables within actions
       Allows general selection of patterns
       Allows control flow in the actions
       Does not require any compiling of programs
The pattern-matching and action statements of the awk language can be
specified either on the command line or in a program file. In either
case, the awk command first reads all program statements.
If -f program_file is not specified, the first operand to awk is pro‐
gram_text, delimited by single quotation (') characters.
Execution of an awk program starts by executing the actions associated
with all BEGIN patterns in the order they occur in the program. Then,
each input_file operand (or standard input if an input file is not
specified) is processed in turn by:
       Reading input data until a record separator is seen (a newline
       character by default)
       Splitting the current record into fields using the current value
       of FS
       Evaluating each pattern in the program in the order of
       occurrence
       Executing the action associated with each pattern that matches
       the current record
The action for a matching pattern is executed before evaluating
subsequent patterns. After all input has been processed, the actions
associated with all END patterns are executed in program order.
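The processing order described above can be observed with a small sketch such as the following; the input lines and messages are invented for illustration:

```shell
printf 'one\ntwo\n' | awk '
BEGIN { print "begin" }                      # runs before any input is read
/o/   { print "first rule: " $0 }            # rules are tried in program order
/t/   { print "second rule: " $0 }
END   { print "end after " NR " records" }   # runs after the last record'
```

For the record two, both rules match, and the action of the first rule runs before the second pattern is evaluated.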
Refer to the EXAMPLES section for an example that demonstrates the
results of specifying a variable assignment as a flag argument or com‐
mand argument in different positions on the awk command line.
The awk command reads input data in the order stated on the command
line. If you specify input_file as a - (dash) or do not specify a
filename, awk reads standard input.
The awk command reads input data from any of the following sources:
       Any input_file operands or their equivalents, which can be
       affected by modifying the awk variables ARGV and ARGC
       Standard input, in the absence of any input_file operands
       Arguments to the getline function
Input files must be text files. When the built-in variable RS is set
to a value other than a newline character, awk supports records termi‐
nated with the specified separator up to LINE_MAX bytes.
Pattern-action statements on the command line are enclosed in ' (single
quote characters) to protect them from interpretation by the shell.
Consecutive pattern-action statements on the same command line are sep‐
arated by a ; (semicolon), within one set of quote delimiters.
By default, the awk command treats input lines as records and splits
each record into fields separated by spaces, tabs, or a field separator
you set with the FS variable. (When a space character is the field
separator, multiple spaces are recognized as a single separator.)
Fields are referenced as $1, $2, and so on. The reference $0 specifies
the entire record (by default, a line).
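As a brief illustration of default field splitting, the following sketch uses a made-up input line that mixes several spaces and a tab:

```shell
# With the default FS, runs of spaces and tabs act as one separator.
printf 'alpha   beta\tgamma\n' | awk '{ print $2; print NF }'
```

The command prints beta and then 3, the number of fields in the record.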
If t is the argument to the -F option, the field separator FS is set to
the tab character. Invoking awk as awk -F\t causes the shell to remove
the backslash and pass only t to the -F option, which is then treated
as the tab character internally. Consequently, awk -F t ... also
selects the tab character. To use the letter t as the field separator,
invoke the command as awk -F'[t]' ....
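Forms that avoid the ambiguity are sketched below. The quoted '\t' form is an assumption about common practice rather than something stated on this page; it relies on awk itself interpreting the escape sequence:

```shell
# A bracket expression makes the letter t itself the separator.
printf 'atb\n'  | awk -F '[t]' '{ print $2 }'
# A quoted escape sequence selects the tab character explicitly.
printf 'a\tb\n' | awk -F '\t' '{ print $2 }'
```

Both commands print b.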
Program Structure
An awk program is composed of pairs of the form:
       pattern { action }
Either the pattern or the action (including the enclosing brace charac‐
ters) can be omitted.
If pattern lacks a corresponding action, awk writes the entire record
that contains the pattern to standard output. If action lacks a corre‐
sponding pattern, awk applies the action to every record.
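The two abbreviated forms can be sketched as follows, using invented input:

```shell
# Pattern without an action: matching records are written unchanged.
printf 'apple\nbanana\ncherry\n' | awk '/an/'
# Action without a pattern: the action runs for every record.
printf 'apple\nbanana\ncherry\n' | awk '{ print length($0) }'
```

The first command prints banana; the second prints the length of each of the three lines.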
Actions
An action is a sequence of statements that follow C language syntax.
Any single statement can be replaced by a statement list enclosed in
braces. When statement is a list of statements, they must be separated
by newline characters or semicolons, and are executed sequentially in
order of appearance. Statements in the awk language include:
       break
       continue
       delete array[expression]
       exit [expression]
       for (expression;expression;expression) statement
       for (variable in array) statement
       if (expression) statement [else statement]
       next
       print [expression_list][>file|>>file][| command]
       printf format[,expression_list][>file|>>file][| command]
       while (expression) statement
       variable=expression
Statements can end with a semicolon, a newline character, or the right
brace enclosing the action:
{ [ statement ... ] }
Expressions can have string or numeric values and are built using the
operators +, -, *, /, %, a space for string concatenation, and the C
operators ++, --, +=, -=, *=, /=, =, ^=, ?:, >, >=, <, <=, ==, $, (),
~, !~, in, ||, &&, !, and !=.
Because the actions process fields, input white space is not preserved
in the output.
The file and command arguments in awk statements can be literal names
or expressions enclosed in double quotation (") characters. Identical
string values in different statements refer to the same open file.
The print statement writes its arguments to standard output (or to a
file if > file or >> file is present), separated by the current output
field separator and terminated by the current output record separator.
The printf statement formats its expression list according to the
format of the printf subroutine and writes the result to standard
output. Unlike print, printf adds neither the output field separator
nor the output record separator, so any newline must be part of the
format. You can redirect the output into a file using the
print ... > file or printf ... > file statements.
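The difference between the two output statements can be seen in a short sketch; the input line and the choice of OFS are illustrative only:

```shell
printf 'widget 3 2.5\n' | awk '
BEGIN { OFS = "-" }
{
    print $1, $2                            # joined with OFS, ended with ORS
    printf "%s costs %.2f\n", $1, $2 * $3   # the format supplies the newline
}'
```

The first line of output is widget-3; the second is widget costs 7.50.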
Variables
Variables can be scalars, array elements (denoted x[i]), or fields.
With the exception of function parameters, variables are not explicitly
declared.
Variable names can consist of uppercase and lowercase alphabetic let‐
ters, the underscore character, the digits (0 to 9), and extended char‐
acters. Variable names cannot begin with a digit. Field variables are
designated by $ (dollar sign), followed by a number or numerical
expression. The effect of the field number expression evaluating to
anything other than a non-negative integer is unspecified.
Variables are initialized to the null string. Array subscripts can be
any string; they do not have to be numeric. This allows for a form of
associative memory. Enclose string constants in expressions in double
quotation (") characters.
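String subscripts make it easy to count occurrences of a value. The following is a minimal sketch; the array name count and the input are made up:

```shell
printf 'red\nblue\nred\nred\n' | awk '
{ count[$1]++ }                      # the subscript is the string in field 1
END {
    print "red", count["red"]
    print "blue", count["blue"]
    print "green", count["green"] + 0   # never assigned, so the null string; + 0 yields 0
}'
```

The sketch prints red 3, blue 1, and green 0.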
There are several variables with special meaning to awk. They include:
ARGC   The number of elements in the ARGV array.
ARGV   An array of command line arguments, excluding options and the
       program_file arguments, numbered from zero to ARGC-1.
       The arguments in ARGV can be modified or added to; ARGC can be
       altered. As each input file ends, awk treats the next non-null
       element of ARGV, up to and including the current value of
       ARGC-1, as the name of the next input file. Therefore, setting
       an element of ARGV to null means that it is not treated as an
       input file. When the element is the character -, standard input
       is specified. When the element matches the format for an
       assignment (variable=value), the element is treated as an
       assignment rather than as the name of an awk input file.
CONVFMT
       The printf format for converting numbers to strings (except for
       output statements, where OFMT is used); %.6g by default.
ENVIRON
       An array representing the value of the environment. The indexes
       of the array are strings consisting of the names of the
       environment variables, and the value of each array element is a
       string consisting of the value of that variable.
FILENAME
       The name of the current input file. Inside a BEGIN action, the
       FILENAME value is undefined. Inside an END action, the value is
       the name of the last input file processed.
FNR    The ordinal number of the current input line (record) in the
       current file. Inside a BEGIN action, the value is zero. Inside
       an END action, the value is the number of the last record
       processed in the last file processed.
FS     Input field separator (default is a space). If it is a space,
       then any number of spaces and tabs can separate fields.
NF     The number of fields in the current input line (record), with a
       limit of 199.
NR     The number of the current input line (record).
OFS    The print statement output field separator (default is a space).
ORS    The print statement output record separator (default is a
       newline character).
OFMT   The printf format for converting numbers to strings in output
       statements (default is %.6g).
RLENGTH
       The length of the string matched by the match function.
RS     Input record separator (default is a newline character).
RSTART The starting position of the string matched by the match
       function, numbering from 1. This is always equivalent to the
       return value of the match function.
SUBSEP The subscript separator string for multi-dimensional arrays.
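A few of these variables can be observed together in a short sketch with invented input:

```shell
# NR counts records, NF counts fields, and $NF is the last field.
printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last is " $NF }'
```

The output is one line per record: 1: 3 fields, last is c, followed by 2: 2 fields, last is e.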
Functions
There are a variety of built-in functions that can be used in awk
actions.
Arithmetic Functions
The arithmetic functions, except for int, are based on the ISO C stan‐
dard. The behavior is undefined in cases where the ISO C standard
specifies that an error be returned or that the behavior is undefined.
atan2(y, x)
       Returns the arctangent of y/x.
cos(x) Returns the cosine of x, where x is in radians.
sin(x) Returns the sine of x, where x is in radians.
exp(x) Returns the exponential function of x.
log(x) Returns the natural logarithm of x.
sqrt(x)
       Returns the square root of x.
int(x) Truncates its argument to an integer. It is truncated toward 0
       when x > 0.
rand() Returns a random number n, such that 0 <= n < 1.
srand([expr])
       Sets the seed value for rand to expr or uses the time of day if
       expr is omitted. The previous seed value is returned.
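The arithmetic functions can be tried from a BEGIN action, which needs no input. The values below were chosen only for illustration:

```shell
awk 'BEGIN {
    print int(3.9)                     # truncates to 3
    print sqrt(16), exp(0), log(1)     # 4 1 0
    printf "%.4f\n", atan2(1, 1) * 4   # approximates pi: 3.1416
}'
```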
String Functions
gsub(ere, repl[, in])
       Behaves like sub (see below), except that it replaces all
       occurrences of the regular expression (like the ed utility
       global substitute) in $0 or in the in argument, when specified.
index(s, t)
       Returns the position, in characters, numbering from 1, in string
       s where string t first occurs, or zero if it does not occur at
       all.
length[(s)]
       Returns the length, in characters, of its argument taken as a
       string, or of the whole record, $0, if there is no argument.
match(s, ere)
       Returns the position, in characters, numbering from 1, in string
       s where the extended regular expression ere occurs, or zero if
       it does not occur at all. RSTART is set to the starting
       position, zero if no match is found; RLENGTH is set to the
       length of the matched string, -1 if no match is found.
split(s, a[, fs])
       Splits the string s into array elements a[1], a[2], ... a[n],
       and returns n. The separation is done with the extended regular
       expression fs or with the field separator FS if fs is not given.
       Each array element has a string value when created. If the
       string assigned to any array element, with any occurrence of the
       decimal point character from the current locale changed to a
       period character, would be considered a numeric string, the
       array element also has the numeric value of the numeric string.
       The effect of a null string as the value of fs is unspecified.
sprintf(fmt, expr, ...)
       Formats the expressions according to the printf format given by
       fmt and returns the resulting string.
sub(ere, repl[, in])
       Substitutes the string repl in place of the first instance of
       the extended regular expression ere in string in and returns the
       number of substitutions. An ampersand (&) appearing in the
       string repl is replaced by the string from in that matches the
       regular expression. For each occurrence of backslash (\)
       encountered when scanning the string repl from beginning to end,
       the next character is taken literally and loses its special
       meaning (for example, \& is interpreted as a literal ampersand
       character). Except for & and \, it is unspecified what the
       special meaning of any such character is. If in is specified
       and it is not an lvalue, the behavior is undefined. If in is
       omitted, awk substitutes in the current record ($0).
substr(s, m[, n])
       Returns the substring of s, at most n characters long, that
       begins at position m, numbering from 1. If n is missing, the
       length of the substring is limited by the length of the string
       s.
tolower(s)
       Returns a string based on the string s. Each character in s
       that is an uppercase letter specified to have a tolower mapping
       by the LC_CTYPE category of the current locale is replaced in
       the returned string by the lowercase letter specified by the
       mapping. Other characters in s are unchanged in the returned
       string.
toupper(s)
       Returns a string based on the string s. Each character in s
       that is a lowercase letter specified to have a toupper mapping
       by the LC_CTYPE category of the current locale is replaced in
       the returned string by the uppercase letter specified by the
       mapping. Other characters in s are unchanged in the returned
       string.
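Several of the string functions are exercised in the sketch below. The variable names (s, t, parts, n, c) and the sample strings are assumptions made for illustration:

```shell
awk 'BEGIN {
    s = "hello world"
    print length(s)                  # 11
    print index(s, "wor")            # 7
    print substr(s, 1, 5)            # hello
    print toupper(s)                 # HELLO WORLD
    n = split("a:b:c", parts, ":")
    print n, parts[2]                # 3 b
    t = s
    c = gsub(/o/, "0", t)            # t is modified in place
    print c, t                       # 2 hell0 w0rld
    if (match(s, /w[a-z]+/))
        print RSTART, RLENGTH        # 7 5
}'
```

The result of gsub is stored before printing so that the output does not depend on the order in which the arguments of print are evaluated.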
Input/Output and General Functions
close(expression)
       Closes the file or pipe opened by a print or printf statement or
       a call to getline with the same string-valued expression. If
       the close was successful, the function returns zero; otherwise,
       it returns non-zero.
expression | getline [var]
       Reads a record of input from a stream piped from the output of a
       command. The stream is created if no stream is currently open
       with the value of expression as its command name. The stream
       created is equivalent to one created by a call to the popen
       function with the value of expression as the command argument
       and a value of r as the mode argument. As long as the stream
       remains open, subsequent calls in which expression evaluates to
       the same string read subsequent records from the stream. The
       stream remains open until the close function is called with an
       expression that evaluates to the same string value. At that
       time, the stream is closed as if by a call to the pclose
       function. If var is missing, $0 and NF are set; otherwise, var
       is set.
getline
       Sets $0 to the next input record from the current input file.
       This form of getline sets the NF, NR, and FNR variables.
getline var
       Sets variable var to the next input record from the current
       input file. This form of getline sets the FNR and NR variables.
getline [var] < expression
       Reads the next record of input from a named file. The
       expression is evaluated to produce a string that is used as a
       full pathname. If the file of that name is not currently open,
       it is opened. As long as the stream remains open, subsequent
       calls in which expression evaluates to the same string value
       read subsequent records from the file. The file remains open
       until the close function is called with an expression that
       evaluates to the same string value. If var is missing, $0 and
       NF are set; otherwise, var is set.
system(expression)
       Executes the command given by expression in a manner equivalent
       to the system function and returns the exit status of the
       command.
All forms of getline return 1 for successful input, zero for end of
file, and -1 for an error.
In summary, getline sets $0 to the next input record from the current
input file, and getline < file sets $0 to the next record from file.
The form getline x sets variable x instead. Finally, command | getline
pipes the output of command into getline; each call of getline returns
the next line of output from command.
Where strings are used as the name of a file or pipeline, the strings
must be textually identical. The terminology “same string value”
implies that “equivalent strings”, even those that differ only by space
characters, represent different files.
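The pipe form of getline and the close function can be sketched as follows. The command string and the variable names (cmd, line, first, n) are illustrative assumptions:

```shell
awk 'BEGIN {
    cmd = "echo x; echo y"
    while ((cmd | getline line) > 0)   # 1 per record read, 0 at end of file
        n++
    close(cmd)                         # the same string identifies the stream
    cmd | getline first                # after close, the command runs again
    close(cmd)
    print n, first
}'
```

The sketch prints 2 x. Without the first call to close, the second getline would find the stream already at end of file and leave first unset.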
User-defined Functions
The awk language also provides user-defined functions. Such functions
can be defined as:
       function name(args,...) { statements }
A function can be referred to anywhere in an awk program; in particu‐
lar, the function's use can precede the function definition. The scope
of a function is global.
Function arguments can be either scalars or arrays; the behavior is
undefined if an array name is passed as an argument that the function
uses as a scalar, or if a scalar expression is passed as an argument
that the function uses as an array. Function arguments are passed by
value if scalar and by reference if array name. Argument names are
local to the function; all other variable names are global. The same
name must not be used as both an argument name and as the name of a
function or special awk variable. The same name must not be used both as a
variable name with global scope and as the name of a function. The
same name must not be used within the same scope both as a scalar vari‐
able and as an array.
The number of parameters in the function definition need not match the
number of parameters in the function call. Excess formal parameters
can be used as local variables. If fewer arguments are supplied in a
function call than are in the function definition, the extra parameters
that are used in the function body as scalars are initialized with a
string value of the null string and a numeric value of zero, and the
extra parameters that are used in the function body as arrays are ini‐
tialized as empty arrays. If more arguments are supplied in a function
call than are in the function definition, the behavior is undefined.
When invoking a function, no white space can be placed between the
function name and the opening parenthesis. Function calls can be
nested and recursive calls can be made upon functions. Upon return
from any nested or recursive function call, the values of all the call‐
ing function's parameters are unchanged, except for array parameters
passed by reference. The return statement can be used to return a
value.
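The rules above can be illustrated with a short sketch. The function names sum_to and fact are hypothetical, and the extra parameters i and total serve only as local variables:

```shell
awk '
# Extra formal parameters (i, total) act as local variables.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
# A recursive function; no space precedes the opening parenthesis in calls.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }
BEGIN { i = 99; print sum_to(4), fact(5), i }'
```

The sketch prints 10 120 99: the global variable i keeps its value because the i inside sum_to is a parameter and therefore local.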
Patterns
Patterns are arbitrary Boolean combinations of regular expressions and
relational expressions (formed with the !, ||, and && operators and
parentheses for grouping). You must start and end regular expressions
with slashes. You can use regular expressions as described for grep,
including the following special characters:
+      One or more occurrences of the pattern.
?      Zero or one occurrence of the pattern.
|      Either of two alternatives.
()     Grouping of expressions.
Isolated regular expressions in a pattern apply to the entire line.
Regular expressions can occur in relational expressions. Any string
(constant or variable) can be used as a regular expression, except in
the position of an isolated regular expression in a pattern.
If two patterns are separated by a comma, the action is performed on
all lines between an occurrence of the first pattern and the next
occurrence of the second.
There are two types of relational expressions that you can use. The
first type has the form:
       expression match_operator pattern
where match_operator is either ~ (for contains) or !~ (for does not
contain).
The second type has the form:
       expression relational_operator expression
where relational_operator is any of the six C relational operators: <,
>, <=, >=, ==, and !=. An expression can be an arithmetic expression,
a relational expression, or a Boolean combination of these.
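A range pattern and a Boolean combination of relational expressions are sketched below with invented input:

```shell
printf '1 a\n2 start\n3 b\n4 stop\n5 c\n' | awk '
/start/,/stop/         { print "range: " $0 }
$1 > 3 && $2 !~ /stop/ { print "relational: " $0 }'
```

The range rule selects records 2 through 4. The second rule selects only record 5, because record 4 fails the !~ test.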
Special Patterns
You can use the BEGIN and END special patterns to capture control
before the first and after the last input line is read, respectively.
BEGIN is conventionally written first and END last, although neither
position is required.
Each BEGIN pattern is matched once and its associated action executed
before the first record of input is read and before any command line
operand assignment is done (assignments made with the -v option take
effect before the BEGIN actions). Each END pattern is matched once and
its associated action executed after the last record of input has been
read. Both patterns must have associated actions.
BEGIN and END do not combine with other patterns. Multiple BEGIN and
END patterns are allowed. The actions associated with the BEGIN
patterns are executed in the order specified in the program, as are the
END actions. An END pattern can precede a BEGIN pattern in a program.
You have two ways to designate an extended regular expression other
than white space to separate fields. You can use the -Fere option on
the command line, or you can assign a string with the expression to the
built-in variable FS. Either action changes the field separator to
ere.
There are no explicit conversions between numbers and strings. To
force an expression to be treated as a number, add 0 to it. To force
it to be treated as a string, append a null string ("").
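The effect of forcing a type shows up most clearly in comparisons. In the following sketch the variable names a, b, and n are illustrative:

```shell
awk 'BEGIN {
    a = "10"; b = "9"
    print (a < b)           # string comparison: "10" sorts before "9", prints 1
    print (a + 0 < b + 0)   # numeric comparison: 10 < 9 is false, prints 0
    n = 10
    print (n "" < b)        # n forced to a string, compared as strings, prints 1
}'
```

The parentheses around each comparison keep the < and > characters from being read as output redirection by the print statement.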
Comment Delimiter
In the awk language, a comment starts with the sharp sign character, #,
and continues to the end of the line. The # does not have to be the
first character on the line. The awk language ignores the rest of the
line following a sharp sign. For example:
       # This program prints a nice friendly message. It helps
       # keep novice users from being afraid of the computer.
The purpose of a comment is to help you or another person understand
the program at a later time.
EXIT STATUS
The following exit values are returned:
0      Successful completion.
>0     An error occurred.
EXAMPLES
To display the file lines that are longer than 72 bytes, enter:
       % awk 'length > 72' chapter1
This command selects each line of the file chapter1 that is longer than
72 bytes. The command then writes these lines to standard output
because no action is specified.
To display all lines between the words start and stop, enter:
       % awk '/start/,/stop/' chapter1
To run an awk program (sum2.awk) that processes a file (chapter1),
enter:
       % awk -f sum2.awk chapter1
The following awk program computes the sum and average of the numbers
in the second column of the input file:
{
sum += $2
} END {
print "Sum: ", sum;
print "Average:", sum/NR;
}
The first action adds the value of the second field of each line to the
sum variable. The awk command initializes sum, and all variables, to 0
(zero) before starting. The keyword END before the second action
causes awk to perform that action after all of the input file is read.
The NR variable, which is used to calculate the average, is a special
variable containing the number of records (lines) that were read.
To print the names of the users who have the C shell as the initial
shell, enter:
       % awk -F: '$7 ~ /csh/ {print $1}' /etc/passwd
To print the first two fields in reversed order, enter:
       % awk '{ print $2, $1 }'
The following awk program prints the first two fields of the input file
in reversed order, with input fields separated by a comma, then adds up
the first column and prints the sum and average:
       BEGIN { FS = "," }
             { print $2, $1 }
             { s += $1 }
       END   { print "sum is", s, "average is", s/NR }
The following example shows how command line assignments synchronize
with awk program statements.
Consider the following set of awk statements that make up a program
named test_program:
       BEGIN { if (RS == ":")
                   print "Assignment in effect for BEGIN statements"
             }
             { if (RS == ":")
                   print "Assignment in effect for middle statements"
             }
       END   { if (RS == ":")
                   print "Assignment in effect for END statements"
             }
Notice the different results that are produced by different ways of
assigning a value to RS on the awk command line. The file text_file
contains the line “Hello, Hello”.
       % awk -f test_program -v RS=: text_file
       Assignment in effect for BEGIN statements
       Assignment in effect for middle statements
       Assignment in effect for END statements
       % awk -f test_program RS=: text_file
       Assignment in effect for middle statements
       Assignment in effect for END statements
       % awk -f test_program text_file RS=:
       Assignment in effect for END statements
ENVIRONMENT VARIABLES
The following environment variables affect the execution of awk:
LANG   Provides a default value for the internationalization variables
       that are unset or null. If LANG is unset or null, the
       corresponding value from the default locale is used. If any of
       the internationalization variables contain an invalid setting,
       the utility behaves as if none of the variables had been
       defined.
LC_ALL If set to a non-empty string value, overrides the values of all
       the other internationalization variables.
LC_CTYPE
       Determines the locale for the interpretation of sequences of
       bytes of text data as characters (for example, single-byte as
       opposed to multibyte characters in arguments).
LC_MESSAGES
       Determines the locale for the format and contents of diagnostic
       messages written to standard error.
NLSPATH
       Determines the location of message catalogs for the processing
       of LC_MESSAGES.
STDS_FLAG
       Resolves the behavior of the command in some scenarios that
       cause noncompliance with POSIX standards. Setting this variable
       to ALL enables the command to overcome all instances of
       noncompliance.
SEE ALSO
Commands: grep(1), lex(1), sed(1)
Routines: printf(3)
Programming Support Tools