RWCRegexp(3C++) RWCRegexp(3C++)
Name
RWCRegexp - Rogue Wave library class
Synopsis
#include <rw/regexp.h>
RWCRegexp re(".*\\.doc"); // Matches filename with suffix ".doc"
Description
Class RWCRegexp represents a regular expression. The constructor
"compiles" the expression into a form that can be used more efficiently.
The results can then be used for string searches using class RWCString.
The regular expression (RE) is constructed as follows.
The following rules determine one-character REs that match a single
character:
1.1 Any character that is not a special character (defined in 1.3 below)
matches itself.
1.2 A backslash (\) followed by any special character matches the
literal character itself. I.e., this "escapes" the special character.
1.3 The "special characters" are:
+ * ? . [ ] ^ $
1.4 The period (.) matches any character except the newline. E.g.,
".umpty" matches either "Humpty" or "Dumpty".
1.5 A set of characters enclosed in brackets ([]) is a one-character RE
that matches any of the characters in that set. E.g., "[akm]" matches
either an "a", "k", or "m". A range of characters can be indicated with
a dash. E.g., "[a-z]" matches any lower-case letter. However, if the
first character of the set is the caret (^), then the RE matches any
character except those in the set. It does not match the empty string.
Example: [^akm] matches any character except "a", "k", or "m". The
caret loses its special meaning if it is not the first character of the
set.
The following rules can be used to build a multicharacter RE.
2.1 A one-character RE followed by an asterisk (*) matches zero or more
occurrences of the RE. Hence, [a-z]* matches zero or more lower-case
characters.
2.2 A one-character RE followed by a plus (+) matches one or more
occurrences of the RE. Hence, [a-z]+ matches one or more lower-case
characters.
2.3 A question mark (?) marks the preceding RE as optional: it can
occur zero times or once in the string, but no more. E.g., xy?z matches
either xyz or xz.
2.4 The concatenation of REs is a RE that matches the corresponding
concatenation of strings. E.g., [A-Z][a-z]* matches any capitalized
word.
Finally, the entire regular expression can be anchored to match only
the beginning or end of a line:
3.1 If the caret (^) is at the beginning of the RE, then the matched
string must be at the beginning of a line.
3.2 If the dollar sign ($) is at the end of the RE, then the matched
string must be at the end of the line.
The following escape codes can be used to match control characters:
\b      backspace
\e      ESC (escape)
\f      formfeed
\n      newline
\r      carriage return
\t      tab
\xdd    the literal hex number 0xdd
\ddd    the literal octal number ddd
\^C     Control code. E.g. \^D is "control-D"
Persistence
None
Example
#include <rw/regexp.h>
#include <rw/cstring.h>
#include <rw/rstream.h>
int main(){
  RWCString aString("Hark! Hark! the lark");
  // A regular expression matching any lower-case word
  // starting with "l":
  RWCRegexp reg("l[a-z]*");
  cout << aString(reg) << endl;  // Prints "lark"
  return 0;
}
Public Constructors
RWCRegexp(const char* pat);
Construct a regular expression from the pattern given by pat. The status
of the results can be found by using member function status().
RWCRegexp(const RWCRegexp& r);
Copy constructor. Uses value semantics -- self will be a copy of r.
Public Destructor
~RWCRegexp();
Destructor. Releases any allocated memory.
Assignment Operators
RWCRegexp&
operator=(const RWCRegexp& r);
Uses value semantics -- sets self to a copy of r.
RWCRegexp&
operator=(const char* pat);
Recompiles self to the pattern given by pat. The status of the results
can be found by using member function status().
Public Member Functions
size_t
index(const RWCString& str, size_t* len, size_t start=0) const;
Returns the index of the first instance in the string str that matches
the regular expression compiled in self, or RW_NPOS if there is no such
match. The search starts at index start. The length of the matching
pattern is returned in the variable pointed to by len. If an invalid
regular expression is used for the search, an exception of type
RWInternalErr will be thrown. Note that this member function is
relatively clumsy to use -- class RWCString offers a better interface to
regular expression searches.
statVal
status();
Returns the status of the regular expression and resets status to OK:
statVal              Meaning
RWCRegexp::OK        No errors
RWCRegexp::ILLEGAL   Pattern was illegal
RWCRegexp::TOOLONG   Pattern exceeded maximum length