PERL(1)                                                                PERL(1)
NAME
perl - practical extraction and report language
SYNOPSIS
perl [options] filename args
DESCRIPTION
Perl is an interpreted language optimized for scanning arbitrary text
files, extracting information from those text files, and printing
reports based on that information. It's also a good language for many
system management tasks. The language is intended to be practical
(easy to use, efficient, complete) rather than beautiful (tiny, ele‐
gant, minimal). It combines (in the author's opinion, anyway) some of
the best features of C, sed, awk, and sh, so people familiar with those
languages should have little difficulty with it. (Language historians
will also note some vestiges of csh, Pascal, and even BASIC-PLUS.)
Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities, perl does not arbitrarily limit the size of
your data—if you've got the memory, perl can slurp in your whole file
as a single string. Recursion is of unlimited depth. And the hash
tables used by associative arrays grow as necessary to prevent degraded
performance. Perl uses sophisticated pattern matching techniques to
scan large amounts of data very quickly. Although optimized for scan‐
ning text, perl can also deal with binary data, and can make dbm files
look like associative arrays (where dbm is available). Setuid perl
scripts are safer than C programs through a dataflow tracing mechanism
which prevents many stupid security holes. If you have a problem that
would ordinarily use sed or awk or sh, but it exceeds their capabili‐
ties or must run a little faster, and you don't want to write the silly
thing in C, then perl may be for you. There are also translators to
turn your sed and awk scripts into perl scripts.
Upon startup, perl looks for your script in one of the following
places:
1. Specified line by line via -e switches on the command line.
2. Contained in the file specified by the first filename on the command
   line. (Note that systems supporting the #! notation invoke
   interpreters this way.)
3. Passed in implicitly via standard input. This only works if there
   are no filename arguments; to pass arguments to a stdin script you
   must explicitly specify a - for the script name.
After locating your script, perl compiles it to an internal form. If
the script is syntactically correct, it is executed.
A single-character option may be combined with the following option, if
any. This is particularly useful when invoking a script using the #!
construct which only allows one argument. Example:
#!/usr/bin/perl -spi.bak # same as -s -p -i.bak
...
Options include:
-0digits
specifies the record separator ($/) as an octal number. If there
are no digits, the null character is the separator. Other
switches may precede or follow the digits. For example, if you
have a version of find which can print filenames terminated by the
null character, you can say this:
find . -name '*.bak' -print0 | perl -n0e unlink
The special value 00 will cause Perl to slurp files in paragraph
mode. The value 0777 will cause Perl to slurp files whole since
there is no legal character with that value.
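The same three reading modes can be selected inside a script by assigning to $/ directly. The following is a minimal sketch, assuming a current perl interpreter in which an undefined $/ has the same effect as -0777; the sample text and the read_records helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: two paragraphs separated by a blank line.
my $text = "alpha\nbeta\n\ngamma\n";

# Read every record from $text using the given input record separator.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;    # undef means no separator at all
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close($fh);
    return @records;
}

my @lines      = read_records("\n");     # ordinary line-at-a-time reading
my @paragraphs = read_records("");       # paragraph mode, as with -00
my @whole      = read_records(undef);    # whole file at once, as with -0777

print scalar(@lines), " lines, ", scalar(@paragraphs), " paragraphs, ",
    scalar(@whole), " whole-file record\n";
```

Using local keeps the change to $/ confined to the helper, so the rest of a script continues to read line by line.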
-a turns on autosplit mode when used with a -n or -p. An implicit
split command to the @F array is done as the first thing inside
the implicit while loop produced by the -n or -p.
perl -ane 'print pop(@F), "\n";'
is equivalent to
while (<>) {
@F = split(' ');
print pop(@F), "\n";
}
-c causes perl to check the syntax of the script and then exit with‐
out executing it.
-d runs the script under the perl debugger. See the section on
Debugging.
-Dnumber
sets debugging flags. To watch how it executes your script, use
-D14. (This only works if debugging is compiled into your perl.)
Another nice value is -D1024, which lists your compiled syntax
tree. And -D512 displays compiled regular expressions.
-e commandline
may be used to enter one line of script. Multiple -e commands may
be given to build up a multi-line script. If -e is given, perl
will not look for a script filename in the argument list.
-iextension
specifies that files processed by the <> construct are to be
edited in-place. It does this by renaming the input file, opening
the output file by the same name, and selecting that output file
as the default for print statements. The extension, if supplied,
is added to the name of the old file to make a backup copy. If no
extension is supplied, no backup is made. Saying perl -p -i.bak
-e "s/foo/bar/;" ... is the same as using the script:
#!/usr/bin/perl -pi.bak
s/foo/bar/;
which is equivalent to
#!/usr/bin/perl
while (<>) {
if ($ARGV ne $oldargv) {
rename($ARGV, $ARGV . '.bak');
open(ARGVOUT, ">$ARGV");
select(ARGVOUT);
$oldargv = $ARGV;
}
s/foo/bar/;
}
continue {
print; # this prints to original filename
}
select(STDOUT);
except that the -i form doesn't need to compare $ARGV to $oldargv
to know when the filename has changed. It does, however, use
ARGVOUT for the selected filehandle. Note that STDOUT is restored
as the default output filehandle after the loop. You can use eof
to locate the end of each input file, in case you want to append
to each file, or reset line numbering (see example under eof).
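As a hedged illustration of the eof remark above, the sketch below resets line numbering at the end of each input file by closing ARGV. It assumes a current perl with the File::Temp module available; the two temporary files and their contents are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small input files in a scratch directory (hypothetical data).
my $dir   = tempdir(CLEANUP => 1);
my @files = ("$dir/a.txt", "$dir/b.txt");
for my $file (@files) {
    open(my $out, '>', $file) or die "cannot write $file: $!";
    print $out "one\ntwo\n";
    close($out);
}

my @numbered;
{
    local @ARGV = @files;        # <> reads the files named in @ARGV
    while (<>) {
        chomp;
        push @numbered, "$.:$_";
        close(ARGV) if eof;      # end of this file: closing ARGV resets $.
    }
}

print "$_\n" for @numbered;
```

Without the close, $. would keep counting across both files.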
-Idirectory
may be used in conjunction with -P to tell the C preprocessor
where to look for include files. By default /usr/include and
/usr/lib/perl are searched.
-loctnum
enables automatic line-ending processing. It has two effects:
first, it automatically chops the line terminator when used with
-n or -p, and second, it assigns $\ to have the value of octnum
so that any print statements will have that line terminator added
back on. If octnum is omitted, sets $\ to the current value of
$/. For instance, to trim lines to 80 columns:
perl -lpe 'substr($_, 80) = ""'
Note that the assignment $\ = $/ is done when the switch is pro‐
cessed, so the input record separator can be different than the
output record separator if the -l switch is followed by a -0
switch:
gnufind / -print0 | perl -ln0e 'print "found $_" if -p'
This sets $\ to newline and then sets $/ to the null character.
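The two effects of -l can be approximated by hand inside a script. This is only a sketch, assuming a current perl (it uses chomp and in-memory filehandles); the sample input and the three-column limit are arbitrary choices for the example.

```perl
use strict;
use warnings;

my $input  = "first\nsecond\n";    # hypothetical input text
my $output = '';

open(my $in,  '<', \$input)  or die "cannot read in-memory input: $!";
open(my $out, '>', \$output) or die "cannot write in-memory output: $!";
{
    local $\ = "\n";               # second effect of -l: terminator added by print
    while (my $line = <$in>) {
        chomp($line);              # first effect of -l: terminator removed on input
        print $out substr($line, 0, 3);    # trim each line to three columns
    }
}
close($in);
close($out);

print $output;
```

Because $\ is restored when the block ends, later print statements are unaffected.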
-n causes perl to assume the following loop around your script, which
makes it iterate over filename arguments somewhat like sed -n or
awk:
while (<>) {
... # your script goes here
}
Note that the lines are not printed by default. See -p to have
lines printed. Here is an efficient way to delete all files older
than a week:
find . -mtime +7 -print | perl -nle 'unlink;'
This is faster than using the -exec switch of find because you
don't have to start a process on every filename found.
-p causes perl to assume the following loop around your script, which
makes it iterate over filename arguments somewhat like sed:
while (<>) {
... # your script goes here
} continue {
print;
}
Note that the lines are printed automatically. To suppress print‐
ing use the -n switch. A -p overrides a -n switch.
-P causes your script to be run through the C preprocessor before
compilation by perl. (Since both comments and cpp directives
begin with the # character, you should avoid starting comments
with any words recognized by the C preprocessor such as if, else
or define.)
-s enables some rudimentary switch parsing for switches on the com‐
mand line after the script name but before any filename arguments
(or before a --). Any switch found there is removed from @ARGV
and sets the corresponding variable in the perl script. The fol‐
lowing script prints true if and only if the script is invoked
with a -xyz switch.
#!/usr/bin/perl -s
if ($xyz) { print "true\n"; }
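To make the rule concrete, here is a rough emulation of that switch parsing written in perl itself. It is not how perl implements -s; the parse_switches function and its return convention are hypothetical, and only simple boolean switches are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading -name switches from an argument list,
# stopping at the first non-switch argument or at "--".
sub parse_switches {
    my @args = @_;
    my %switch;
    while (@args && $args[0] =~ /^-(.+)/) {
        my $name = $1;
        shift @args;
        last if $name eq '-';      # "--" ends switch processing
        $switch{$name} = 1;        # stands in for setting $name in the script
    }
    return (\%switch, \@args);
}

my ($switches, $remaining) = parse_switches('-xyz', 'input.txt');
print "true\n" if $switches->{'xyz'};
print "remaining arguments: @$remaining\n";
```

As with the real switch, anything after the first filename or after -- is left untouched in the argument list.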
-S makes perl use the PATH environment variable to search for the
script (unless the name of the script starts with a slash). Typi‐
cally this is used to emulate #! startup on machines that don't
support #!, in the following manner:
#!/usr/bin/perl
eval "exec /usr/bin/perl -S $0 $*"
if $running_under_some_shell;
The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the perl script as a shell
script. The shell executes the second line as a normal shell com‐
mand, and thus starts up the perl interpreter. On some systems $0
doesn't always contain the full pathname, so the -S tells perl to
search for the script if necessary. After perl locates the
script, it parses the lines and ignores them because the variable
$running_under_some_shell is never true. A better construct than
$* would be ${1+"$@"}, which handles embedded spaces and such in
the filenames, but doesn't work if the script is being interpreted
by csh. In order to start up sh rather than csh, some systems may
have to replace the #! line with a line containing just a colon,
which will be politely ignored by perl. Other systems can't con‐
trol that, and need a totally devious construct that will work
under any of csh, sh or perl, such as the following:
eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
& eval 'exec /usr/bin/perl -S $0 $argv:q'
if 0;
-u causes perl to dump core after compiling your script. You can
then take this core dump and turn it into an executable file by
using the undump program (not supplied). This speeds startup at
the expense of some disk space (which you can minimize by strip‐
ping the executable). (Still, a "hello world" executable comes
out to about 200K on my machine.) If you are going to run your
executable as a set-id program then you should probably compile it
using taintperl rather than normal perl. If you want to execute a
portion of your script before dumping, use the dump operator
instead. Note: availability of undump is platform specific and
may not be available for a specific port of perl.
-U allows perl to do unsafe operations. Currently the only unsafe
operations are the unlinking of directories while running as supe‐
ruser, and running setuid programs with fatal taint checks turned
into warnings.
-v prints the version and patchlevel of your perl executable.
-w prints warnings about identifiers that are mentioned only once,
and scalar variables that are used before being set. Also warns
about redefined subroutines, and references to undefined filehan‐
dles or filehandles opened readonly that you are attempting to
write on. Also warns you if you use == on values that don't look
like numbers, and if your subroutines recurse more than 100 deep.
-xdirectory
tells perl that the script is embedded in a message. Leading
garbage will be discarded until the first line that starts with #!
and contains the string "perl". Any meaningful switches on that
line will be applied (but only one group of switches, as with nor‐
mal #! processing). If a directory name is specified, Perl will
switch to that directory before running the script. The -x switch
only controls the disposal of leading garbage. The script
must be terminated with __END__ if there is trailing garbage to be
ignored (the script can process any or all of the trailing garbage
via the DATA filehandle if desired).
ENVIRONMENT
HOME    Used if chdir has no argument.
LOGDIR  Used if chdir has no argument and HOME is not set.
PATH    Used in executing subprocesses, and in finding the script if -S
        is used.
PERLLIB A colon-separated list of directories in which to look for Perl
        library files before looking in the standard library and the
        current directory.
PERLDB  The command used to get the debugger code. If unset, uses
require 'perldb.pl'
Apart from these, perl uses no other environment variables, except to
make them available to the script being executed, and to child pro‐
cesses. However, scripts running setuid would do well to execute the
following lines before doing anything else, just to keep people honest:
$ENV{'PATH'} = '/bin:/usr/bin'; # or whatever you need
$ENV{'SHELL'} = '/bin/sh' if $ENV{'SHELL'} ne '';
$ENV{'IFS'} = '' if $ENV{'IFS'} ne '';
FILES
/tmp/perl-eXXXXXX temporary file for -e commands.
SEE ALSO
The complete perl documentation can be found in the UNIX System man‐
ager's Manual (SMM:19).
a2p awk to perl translator
s2p sed to perl translator
DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to perl via -e switches, each -e is
counted as one line.)
Setuid scripts have additional constraints that can produce error mes‐
sages such as Insecure dependency. See the section on setuid scripts.
TRAPS
Accustomed awk users should take special note of the following:
*  Semicolons are required after all simple statements in perl (except
   at the end of a block). Newline is not a statement delimiter.
*  Curly brackets are required on ifs and whiles.
*  Variables begin with $ or @ in perl.
*  Arrays index from 0 unless you set $[. Likewise string positions in
   substr() and index().
*  You have to decide whether your array has numeric or string indices.
*  Associative array values do not spring into existence upon mere
   reference.
*  You have to decide whether you want to use string or numeric
   comparisons.
*  Reading an input line does not split it for you. You get to split it
   yourself to an array. And the split operator has different
   arguments.
*  The current input line is normally in $_, not $0. It generally does
   not have the newline stripped. ($0 is the name of the program
   executed.)
*  $<digit> does not refer to fields; it refers to substrings matched by
   the last match pattern.
*  The print statement does not add field and record separators unless
   you set $, and $\.
*  You must open your files before you print to them.
*  The range operator is .., not comma. (The comma operator works as in
   C.)
*  The match operator is =~, not ~. (~ is the one's complement operator,
   as in C.)
*  The exponentiation operator is **, not ^. (^ is the XOR operator, as
   in C.)
*  The concatenation operator is ., not the null string. (Using the null
   string would render /pat/ /pat/ unparsable, since the third slash
   would be interpreted as a division operator. The tokener is in fact
   slightly context sensitive for operators like /, ?, and <. And in
   fact, . itself can be the beginning of a number.)
*  Next, exit and continue work differently.
*  The following variables work differently:
Awk         Perl
ARGC        $#ARGV
ARGV[0]     $0
FILENAME    $ARGV
FNR         $. - something
FS          (whatever you like)
NF          $#Fld, or some such
NR          $.
OFMT        $#
OFS         $,
ORS         $\
RLENGTH     length($&)
RS          $/
RSTART      length($`)
SUBSEP      $;
When in doubt, run the awk construct through a2p and see what it gives
you.
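A few of the awk traps above, sketched in perl under the assumption of a current interpreter with zero-based arrays: the input line is split by hand, the field count is taken from the array, and string and numeric comparison use different operators. The sample line and variable names are made up.

```perl
use strict;
use warnings;

my $line = "alice 30 admin\n";      # hypothetical input line

chomp($line);                       # the newline is not stripped for you
my @Fld = split(' ', $line);        # split on whitespace, as awk does implicitly

my $nf    = scalar(@Fld);           # awk's NF
my $first = $Fld[0];                # awk's $1 lives at index 0 here
my $last  = $Fld[$#Fld];            # $#Fld is the last index, one less than NF

# The same two operands compare differently as numbers and as strings.
my $numeric_equal = ("10" == "10.0") ? 1 : 0;
my $string_equal  = ("10" eq "10.0") ? 1 : 0;

print "fields: $nf, first: $first, last: $last\n";
print "numeric: $numeric_equal, string: $string_equal\n";
```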
Cerebral C programmers should take note of the following:
*  Curly brackets are required on ifs and whiles.
*  You should use elsif rather than else if.
*  Break and continue become last and next, respectively.
*  There's no switch statement.
*  Variables begin with $ or @ in perl.
*  Printf does not implement *.
*  Comments begin with #, not /*.
*  You can't take the address of anything.
*  ARGV must be capitalized.
*  The system calls link, unlink, rename, etc. return nonzero for
   success, not 0.
*  Signal handlers deal with signal names, not numbers.
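Three of the C traps above in one short sketch, assuming a current perl with the File::Temp module: loop control with next and last, an elsif chain, and the nonzero success value returned by unlink. All values and names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# next and last take the place of C's continue and break.
my @evens;
for my $n (1 .. 10) {
    next if $n % 2;       # skip odd numbers
    last if $n > 6;       # stop once past 6
    push @evens, $n;
}

# elsif, with curly brackets on every branch.
my $size = 5;
my $label;
if    ($size < 3)  { $label = 'small'; }
elsif ($size < 10) { $label = 'medium'; }
else               { $label = 'large'; }

# unlink reports the number of files removed, so success is nonzero.
my ($fh, $path) = tempfile();
close($fh);
my $removed = unlink($path);

print "evens: @evens, label: $label, removed: $removed\n";
```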
Seasoned sed programmers should take note of the following:
*  Backreferences in substitutions use $ rather than \.
*  The pattern matching metacharacters (, ), and | do not have
   backslashes in front.
*  The range operator is .. rather than comma.
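For comparison with sed, a small sketch (sample strings invented, current perl assumed) of a substitution that uses $1 and $2 in the replacement, and of grouping and alternation written without backslashes:

```perl
use strict;
use warnings;

my $name = "John Smith";            # hypothetical input

# $1 and $2 in the replacement, where sed would use \1 and \2.
(my $swapped = $name) =~ s/(\w+) (\w+)/$2, $1/;

# ( ) and | are metacharacters as written; no backslashes are needed.
my $pet    = "dog";
my $is_pet = ($pet =~ /^(cat|dog)$/) ? 1 : 0;

print "$swapped\n";
print "is_pet: $is_pet\n";
```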
Sharp shell programmers should take note of the following:
*  The backtick operator does variable interpretation without regard to
   the presence of single quotes in the command.
*  The backtick operator does no translation of the return value,
   unlike csh.
*  Shells (especially csh) do several levels of substitution on each
   command line. Perl does substitution only in certain constructs such
   as double quotes, backticks, angle brackets and search patterns.
*  Shells interpret scripts a little bit at a time. Perl compiles the
   whole program before executing it.
*  The arguments are available via @ARGV, not $1, $2, etc.
*  The environment is not automatically made available as variables.
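The backtick and environment points can be seen in a few lines. This sketch assumes a Unix-like system where backticks run the command through a shell that provides echo; the environment variable name PERL_DEMO_GREETING is invented for the example.

```perl
use strict;
use warnings;

# The environment is reached through %ENV, not through ordinary variables.
$ENV{'PERL_DEMO_GREETING'} = 'hello';       # hypothetical variable name
my $greeting = $ENV{'PERL_DEMO_GREETING'};

# Perl interpolates $greeting before the shell sees the command, so the
# single quotes do not protect it the way they would in a shell script.
my $output = `echo '$greeting'`;

# The result is returned untranslated, trailing newline included.
my $has_newline = ($output =~ /\n\z/) ? 1 : 0;

print "got: $output";
print "arguments: ", scalar(@ARGV), "\n";   # command-line arguments are in @ARGV
```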
BUGS
Perl is at the mercy of your machine's definitions of various opera‐
tions such as type casting, atof() and sprintf().
If your stdio requires a seek or eof between reads and writes on a par‐
ticular stream, so does perl. (This doesn't apply to sysread() and
syswrite().)
While none of the built-in data types have any arbitrary size limits
(apart from memory size), there are still a few arbitrary limits: a
given identifier may not be longer than 255 characters, and no compo‐
nent of your PATH may be longer than 255 if you use -S. A regular
expression may not compile to more than 32767 bytes internally.
Perl actually stands for Pathologically Eclectic Rubbish Lister, but
don't tell anyone I said that.
AUTHOR
Larry Wall <lwall@netlabs.com>
MS-DOS port by Diomidis Spinellis <dds@cc.ic.ac.uk>
4.3 Berkeley Distribution June 30, 1993 PERL(1)