regexp(3X)

NAME
compile(), step(), advance() - regular expression compile and match
routines
SYNOPSIS
Remarks
Features documented in this manual entry are obsolescent and may be
removed in a future HP-UX release. Use of regcomp(3C) functions
instead is recommended.
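
As a rough illustration of the recommended replacement, the following sketch matches a Basic Regular Expression with regcomp(3C) and regexec(). The helper name bre_match() and its return convention are assumptions made for this example only; they are not part of any library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if line matches the Basic Regular
 * Expression pattern, 0 if it does not, and -1 if the pattern fails
 * to compile.  On a match, *start and *end receive the byte offsets
 * of the first and one-past-last matched characters.
 */
int bre_match(const char *pattern, const char *line,
              size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    /* A cflags value of 0 selects Basic Regular Expression syntax. */
    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    rc = regexec(&re, line, 1, m, 0);
    if (rc == 0) {
        *start = (size_t)m[0].rm_so;
        *end = (size_t)m[0].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A caller would pass the pattern and the line to test, then read the two offsets only when the return value is 1.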
DESCRIPTION
These functions are general-purpose regular expression matching
routines to be used in programs that perform Basic Regular Expression
(see regexp(5)) matching. These functions are defined in <regexp.h>.

The functions step() and advance() do pattern matching given a
character string and a compiled regular expression as input. compile()
takes a Basic Regular Expression as input and produces a compiled
expression that can be used with step() and advance().

The interface to this file is unpleasantly complex. Programs that
include this file must have the following five macros declared before
the #include <regexp.h> statement. These macros are used by the
compile() routine.

GETC()
       Return the value of the next byte in the regular expression
       pattern. Successive calls to GETC() should return successive
       bytes of the regular expression.

PEEKC()
       Return the next byte in the regular expression. Successive
       calls to PEEKC() should return the same byte (which should also
       be the next byte returned by GETC()).

UNGETC(c)
       Cause the argument c to be returned by the next call to GETC()
       (and PEEKC()). No more than one byte of pushback is ever
       needed, and this byte is guaranteed to be the last byte read by
       GETC(). The value of the macro UNGETC(c) is always ignored.

RETURN(pointer)
       This macro is used on normal exit of the compile() routine. The
       value of the argument pointer is a pointer to the character
       after the last character of the compiled regular expression.
       This is useful to programs that must manage memory allocation.

ERROR(val)
       This is the abnormal return from the compile() routine. The
       argument val is an error number (see table below for meanings).
       This call should never return.

       ERROR   MEANING
       11      Range endpoint too large.
       16      Bad number.
       25      "\digit" out of range.
       36      Illegal or missing delimiter.
       41      No remembered search string.
       42      \( \) imbalance.
       43      Too many \(.
       44      More than 2 numbers given in \{ \}.
       45      } expected after \.
       46      First number exceeds second in \{ \}.
       49      [ ] imbalance.
       50      Regular expression overflow.
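
The contract of the three byte-access macros (next byte, peek, and one byte of pushback) can be sketched without including <regexp.h>. In the sketch below the macros read from a string through a cursor named sp; the cursor name and the scan_to_delim() helper are hypothetical and exist only to exercise the macros. The macro names follow the traditional <regexp.h> convention.

```c
#include <stddef.h>

static const char *sp;   /* hypothetical read cursor into the pattern text */

#define GETC()     (*sp++)   /* consume and return the next byte */
#define PEEKC()    (*sp)     /* return the next byte without consuming it */
#define UNGETC(c)  (--sp)    /* push back the byte just read; value is ignored */

/*
 * Hypothetical helper: scan pattern until the delimiter eof is read,
 * push the delimiter back, and return a pointer to it.  Returns NULL
 * if the string ends before the delimiter is seen.
 */
const char *scan_to_delim(const char *pattern, int eof)
{
    int c;

    sp = pattern;
    while (PEEKC() != '\0') {
        c = GETC();
        if (c == eof) {
            UNGETC(c);   /* only the byte just read is ever pushed back */
            return sp;
        }
    }
    return NULL;
}
```

Because UNGETC(c) is only ever asked to push back the byte most recently read, stepping the cursor back by one is sufficient when the input is an in-memory string.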
The syntax of the compile() routine is as follows:

       compile(instring, expbuf, endbuf, eof)

The first parameter instring is never used explicitly by the compile()
routine, but is useful for programs that pass down different pointers
to input characters. It is sometimes used in the INIT declaration (see
below). Programs that call functions to input characters or have
characters in an external array can pass down a value of ((char *) 0)
for this parameter.

The next parameter expbuf is a character
pointer. It points to the location where
the compiled regular expression will be
placed.

The parameter endbuf is one more than the highest address where the
compiled regular expression can be placed. If the compiled expression
cannot fit in (endbuf−expbuf) bytes, a call to ERROR(50) is made.

The parameter eof is the character which marks the end of the regular
expression. For example, in ed(1), this character is usually a /.

Each program that includes this file must have a #define statement for
INIT. This definition is placed right after the declaration for the
function compile() and the opening curly brace ({). It is used for
dependent declarations and initializations. Most often it is used to
set a register variable to point to the beginning of the regular
expression so that this register variable can be used in the
declarations for GETC(), PEEKC(), and UNGETC(). Otherwise it can be
used to declare external variables that might be used by GETC(),
PEEKC(), and UNGETC(). See the example below of the declarations taken
from grep(1).

step() also performs actual regular expression matching in this file.
The call to step() is as follows:

       step(string, expbuf)

The first parameter to step() is a pointer to a string of characters to
be checked for a match. This string should be null-terminated.

The second parameter expbuf is the compiled regular expression that was
obtained by a call to compile().

step() returns non-zero if the given string matches the regular
expression, and zero if it does not. If there is a match, two external
character pointers are set as a side effect of the call to step(). The
variable set in step() is loc1, a pointer to the first character that
matched the regular expression. The variable loc2, which is set by the
function advance(), points to the character after the last character
that matches the regular expression. Thus, if the regular expression
matches the entire line, loc1 points to the first character of string
and loc2 points to the null at the end of string.

step() uses the external variable circf, which is set by compile() if
the regular expression begins with ^. If this is set, step() tries to
match the regular expression to the beginning of the string only. If
more than one regular expression is to be compiled before the first
step() is executed, the value of circf should be saved for each
compiled expression, and circf should be set to that saved value before
each call to step().

advance() is called from step() with the same arguments as step(). The
purpose of step() is to step through the string argument and call
advance() until advance() returns non-zero, which indicates a match, or
until the end of string is reached. To constrain string to
beginning-of-line in all cases, step() need not be called; simply call
advance().
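
The division of labor between step() and advance() can be illustrated with a toy matcher. This is only a sketch: it matches a literal string instead of a compiled regular expression, and the names step_sketch(), advance_sketch(), match_first, and match_after are hypothetical stand-ins, not the <regexp.h> routines or their external variables.

```c
#include <string.h>

/* Hypothetical stand-ins for the two external match pointers. */
const char *match_first;   /* first character that matched */
const char *match_after;   /* character after the last one that matched */

/* Toy anchored matcher: succeeds only if lit occurs at the very start of s. */
int advance_sketch(const char *s, const char *lit)
{
    size_t n = strlen(lit);

    if (strncmp(s, lit, n) != 0)
        return 0;
    match_after = s + n;
    return 1;
}

/* Toy scanner: tries the anchored matcher at each position in turn. */
int step_sketch(const char *s, const char *lit)
{
    const char *p = s;

    for (;;) {
        if (advance_sketch(p, lit)) {
            match_first = p;
            return 1;
        }
        if (*p == '\0')      /* end of string reached without a match */
            return 0;
        p++;
    }
}
```

Calling advance_sketch() directly gives the beginning-of-line behavior: it never looks past the first position of the string.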

When advance() encounters a * or \{ \} sequence in the regular
expression, it advances its pointer to the string to be matched as far
as possible and recursively calls itself, trying to match the rest of
the string to the rest of the regular expression. As long as there is
no match, advance() backs up along the string until it finds a match or
reaches the point in the string that initially matched the * or \{ \}.
It is sometimes desirable to stop this backing up before the initial
point in the string is reached. If the external character pointer locs
is equal to the point in the string at some time during the backing up
process, advance() breaks out of the loop that backs up and returns
zero. This is used by ed(1) and sed(1) for substitutions done globally
(not just the first occurrence, but the whole line) so, for example,
expressions such as s/y*//g do not loop forever.
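
The looping hazard arises whenever a pattern can match the empty string, because a global edit would otherwise match at the same position again and again. The sketch below shows one way to guard against this using regcomp(3C) and regexec() instead of the routines in this entry; it is not how ed(1) or sed(1) implement the check, and the delete_all() helper is hypothetical. The pattern y* is used here only as an assumed example of an expression that can match nothing.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy in to out with every match of the Basic
 * Regular Expression pattern removed.  Returns 0 on success, or -1 if
 * the pattern does not compile or out is too small.
 */
int delete_all(const char *pattern, const char *in, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = in;
    size_t used = 0;
    size_t tail;
    int flags = 0;

    if (outsize == 0 || regcomp(&re, pattern, 0) != 0)
        return -1;

    while (*p != '\0' && regexec(&re, p, 1, m, flags) == 0) {
        size_t keep = (size_t)m[0].rm_so;   /* unmatched prefix to copy */
        size_t next = (size_t)m[0].rm_eo;   /* where scanning resumes */

        if (next == keep) {
            /* Empty match: keep one more byte so the scan always advances. */
            if (p[keep] == '\0')
                break;
            keep++;
            next++;
        }
        if (used + keep >= outsize) {
            regfree(&re);
            return -1;
        }
        memcpy(out + used, p, keep);
        used += keep;
        p += next;
        flags = REG_NOTBOL;   /* later scans do not start at beginning of line */
    }

    tail = strlen(p);
    regfree(&re);
    if (used + tail >= outsize)
        return -1;
    memcpy(out + used, p, tail + 1);   /* copy the remainder and its null */
    return 0;
}
```

The guard is the empty-match branch: without it, a pattern such as y* would match zero characters at the same position on every iteration.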

The additional external variables sed and nbra are used for special
purposes.

EXTERNAL INFLUENCES
Locale
The LC_COLLATE category determines the collating sequence used in
compiling and executing regular expressions.

The LC_CTYPE category determines the interpretation of text as single
and/or multi-byte characters, and the characters matched by character
class expressions in regular expressions.

International Code Set Support
Single- and multi-byte character code sets
are supported.
EXAMPLES
The following is an example of how the regular expression macros and
calls look from grep(1):
...
...
SEE ALSO
grep(1), regcomp(3C), setlocale(3C), regexp(5).
STANDARDS CONFORMANCE