Regexp::Common::comment(3)  User Contributed Perl Documentation  Regexp::Common::comment(3)
NAME
Regexp::Common::comment -- provide regexes for comments.
SYNOPSIS
use Regexp::Common qw /comment/;
while (<>) {
/$RE{comment}{C}/ and print "Contains a C comment\n";
/$RE{comment}{'C++'}/ and print "Contains a C++ comment\n";
/$RE{comment}{PHP}/ and print "Contains a PHP comment\n";
/$RE{comment}{Java}/ and print "Contains a Java comment\n";
/$RE{comment}{Perl}/ and print "Contains a Perl comment\n";
/$RE{comment}{awk}/ and print "Contains an awk comment\n";
/$RE{comment}{HTML}/ and print "Contains an HTML comment\n";
}
use Regexp::Common qw /comment RE_comment_HTML/;
while (<>) {
$_ =~ RE_comment_HTML() and print "Contains an HTML comment\n";
}
DESCRIPTION
Please consult the manual of Regexp::Common for a general description
of how this interface works.
Do not use this module directly, but load it via Regexp::Common.
This module gives you regular expressions for comments in various
languages.
THE LANGUAGES
Below, the comments of each of the languages are described. The
patterns are available as $RE{comment}{LANG}, for each language LANG.
Some languages have variants; the entry for each such language
describes how to get the patterns for its variants. Unless mentioned
otherwise, "{-keep}" sets $1, $2, $3 and $4 to the entire comment, the
opening marker, the content of the comment, and the closing marker (for
many languages, the latter is a newline) respectively.
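The four captures described above can be seen with a line comment. This is a minimal sketch, assuming Regexp::Common is installed; the sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Sample input (hypothetical): one line of Perl source, newline included,
# since a line comment's closing marker is the newline itself.
my $line = "my \$x = 1;  # set x\n";

# In list context a successful match returns the captured groups.
my ($whole, $open, $content, $close) = $line =~ /$RE{comment}{Perl}{-keep}/;

if (defined $whole) {
    print "opening marker: '$open'\n";
    print "content:        '$content'\n";
    print "closing marker is a newline\n" if $close eq "\n";
}
```

Without "{-keep}" the same pattern still matches, but no capture groups are set.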
ABC Comments in ABC start with a backslash ("\"), and last till the end
of the line. See <http://homepages.cwi.nl/%7Esteven/abc/>.
Ada Comments in Ada start with "--", and last till the end of the line.
Advisor
Advisor is a language used by the HP product glance. Comments for
this language start with either "#" or "//", and last till the end
of the line.
Advsys
Comments for the Advsys language start with ";" and last till the
end of the line. See also <http://www.wurb.com/if/devsys/12>.
Alan
Alan comments start with "--", and last till the end of the line.
See also <http://w1.132.telia.com/~u13207378/alan/manual/alan-
TOC.html>.
Algol 60
Comments in the Algol 60 language start with the keyword "comment",
and end with a ";". See <http://www.mass-
werk.at/algol60/report.htm>.
Algol 68
In Algol 68, comments are either delimited by "#", or by one of the
keywords "co" or "comment". The keywords should not be part of
another word. See <http://westein.arb-phys.uni-dort-
mund.de/~wb/a68s.txt>. With "{-keep}", only $1 will be set,
returning the entire comment.
ALPACA
The ALPACA language has comments starting with "/*" and ending with
"*/".
awk The awk programming language uses comments that start with "#" and
end at the end of the line.
B The B language has comments starting with "/*" and ending with
"*/".
BASIC
There are various forms of BASIC around. Currently, we only support
the variant supported by mvEnterprise, whose pattern is available
as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language
start with a "!", a "*" or the keyword "REM", and last till the end
of the line. See <http://www.rainingdata.com/prod-
ucts/beta/docs/mve/50/ReferenceManual/Basic.pdf>.
Beatnik
The esoteric language Beatnik only uses words consisting of letters.
Words are scored according to the rules of Scrabble. Words scoring
less than 5 points, or 18 points or more, are considered comments
(although the compiler might mock you if you score less than 5
points). Regardless of whether "{-keep}" is used, $1 will be set,
and set to the entire comment. This pattern requires perl 5.8.0 or
newer.
beta-Juliet
The beta-Juliet programming language has comments that start with
"//" and that continue till the end of the line. See also
<http://www.catseye.mb.ca/esoteric/b-juliet/index.html>.
Befunge-98
The esoteric language Befunge-98 uses comments that start and end
with a ";". See <http://www.catseye.mb.ca/eso-
teric/befunge/98/spec98.html>.
BML BML, or Better Markup Language, is an HTML templating language that
uses comments starting with "<?c_", and ending with "c_?>". See
<http://www.livejournal.com/doc/server/bml.index.html>.
Brainfuck
The minimal language Brainfuck uses only eight characters, "<",
">", "[", "]", "+", "-", "." and ",". Any other characters are
considered comments. With "{-keep}", $1 is set to the entire com-
ment.
C The C language has comments starting with "/*" and ending with
"*/".
C-- The C-- language has comments starting with "/*" and ending with
"*/". See <http://cs.uas.arizona.edu/classes/453/pro-
grams/C--Spec.html>.
C++ The C++ language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
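A pattern that combines both comment forms can pull every comment out of a piece of source in one pass. The following is a sketch only, assuming Regexp::Common is installed; the sample source and variable names are invented. The hash key is quoted because "C++" is not a plain identifier.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical C++ fragment with one line comment and one block comment.
my $source = <<'END';
int n = 0;   // line comment
/* block
   comment */
n++;
END

# Without {-keep} the pattern has no capture groups of its own, so the
# single pair of parentheses below yields one list element per comment.
my $re = $RE{comment}{'C++'};
my @comments = $source =~ /($re)/g;

print "found ", scalar @comments, " comments\n";
```

The "n++" on the last line is not a comment and is left alone; the line comment is returned together with its terminating newline.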
C# The C# language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that start
with "/*" and end with "*/". If "{-keep}" is used, only $1 will be
set, and set to the entire comment. See
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclr-
fcsharpspec_C.asp>.
Caml
Comments in Caml start with "(*", end with "*)", and can be nested.
See <http://www.cs.caltech.edu/courses/cs134/cs134b/book.pdf> and
<http://pauillac.inria.fr/caml/index-eng.html>.
Cg The Cg language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that start
with "/*" and end with "*/". If "{-keep}" is used, only $1 will be
set, and set to the entire comment. See <http://devel-
oper.nvidia.com/attach/3722>.
CLU In "CLU", a comment starts with a percent sign ("%"), and ends with
the next newline. See <ftp://ftp.lcs.mit.edu:/pub/pclu/CLU-syn-
tax.ps> and <http://www.pmg.lcs.mit.edu/CLU.html>.
COBOL
Traditionally, comments in COBOL are indicated by an asterisk in
the seventh column. This is what the pattern matches. Modern
compilers may be more lenient though. See
<http://www.csis.ul.ie/cobol/Course/COBOLIntro.htm>, and
<http://www.csis.ul.ie/cobol/default.htm>. Due to a bug in the reg-
exp engine of perl 5.6.x, this regexp is only available in version
5.8.0 and up.
CQL Comments in the chess query language (CQL) start with a semi-colon
(";") and last till the end of the line. See
<http://www.rbnn.com/cql/>.
Crystal Report
The formula editor in Crystal Reports uses comments that start with
"//", and end with the end of the line.
Dylan
There are two types of comments in Dylan. They either start with
"//", or are nested comments, delimited with "/*" and "*/". Under
"{-keep}", only $1 will be set, returning the entire comment. This
pattern requires perl 5.6.0 or newer.
ECMAScript
The ECMAScript language has two forms of comments: comments that
start with "//" and last till the end of the line, and comments
that start with "/*" and end with "*/". If "{-keep}" is used, only
$1 will be set, and set to the entire comment. JavaScript is
Netscape's implementation of ECMAScript. See
<http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>, and
<http://www.ecma-international.org/publications/stan-
dards/Ecma-262.htm>.
Eiffel
Eiffel comments start with "--", and last till the end of the line.
False
In False, comments start with "{" and end with "}". See
<http://wouter.fov120.com/false/false.txt>
FPL The FPL language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
Forth
Comments in Forth start with "\", and end with the end of the line.
See also <http://docs.sun.com/sb/doc/806-1377-10>.
Fortran
There are two forms of Fortran. There's free form Fortran, which
has comments that start with "!", and end at the end of the line.
The pattern for this is given by $RE{comment}{Fortran}. Fixed form
Fortran, which has been obsoleted, has comments that start with "C",
"c" or "*" in the first column, or with "!" anywhere except the
sixth column. The pattern for this is given by
$RE{comment}{Fortran}{fixed}.
See also <http://www.cray.com/craydoc/manu-
als/007-3692-005/html-007-3692-005/>.
Funge-98
The esoteric language Funge-98 uses comments that start and end
with a ";".
fvwm2
Configuration files for fvwm2 have comments starting with a "#" and
lasting the rest of the line.
Haifu
Haifu, an esoteric language using haikus, has comments starting and
ending with a ",". See <http://www.dangermouse.net/eso-
teric/haifu.html>.
Haskell
There are two types of comments in Haskell. They either start with
at least two dashes, or are nested comments, delimited with "{-"
and "-}". Under "{-keep}", only $1 will be set, returning the
entire comment. This pattern requires perl 5.6.0 or newer.
HTML
In HTML, comments only appear inside a comment declaration. A com-
ment declaration starts with a "<!", and ends with a ">". Inside
this declaration, we have zero or more comments. Comments start
with "--" and end with "--", and are optionally followed by white-
space. The pattern $RE{comment}{HTML} recognizes those comment dec-
larations (and hence more than a comment). Note that this is not
the same as something that starts with "<!--" and ends with "-->",
because the following will be matched completely:
<!-- First Comment --
--> Second Comment <!--
-- Third Comment -->
Do not be fooled by what your favourite browser thinks is an HTML
comment.
If "{-keep}" is used, the following are returned:
$1 captures the entire comment declaration.
$2 captures the MDO (markup declaration open), "<!".
$3 captures the content between the MDO and the MDC.
$4 captures the (last) comment, without the surrounding dashes.
$5 captures the MDC (markup declaration close), ">".
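The three-comment declaration shown earlier can be used to check these captures. This is a sketch under the assumption that Regexp::Common is installed; the variable names are illustrative.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# The comment declaration from the description above: one "<!", three
# "--"-delimited comments, and one ">".
my $decl = "<!-- First Comment --\n  --> Second Comment <!--\n  -- Third Comment -->";

# List-context match: @parts holds $1 .. $5 on success, or is empty.
my @parts = $decl =~ /$RE{comment}{HTML}{-keep}/;

if (@parts) {
    print "matched the whole declaration\n" if $parts[0] eq $decl;
    print "MDO: '$parts[1]'  MDC: '$parts[4]'\n";
    print "last comment: '$parts[3]'\n";
}
```

Only the last of the three comments ends up in $4, since a repeated group keeps its final iteration.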
Hugo
There are two types of comments in Hugo. They either start with "!"
(which cannot be followed by a "\"), or are nested comments, delim-
ited with "!\" and "\!". Under "{-keep}", only $1 will be set,
returning the entire comment. This pattern requires perl 5.6.0 or
newer.
Icon
Icon has comments that start with "#" and end at the next new line.
See <http://www.toolsofcomputing.com/IconHandbook/IconHand-
book.pdf>, <http://www.cs.arizona.edu/icon/index.htm>, and
<http://burks.bton.ac.uk/burks/language/icon/index.htm>.
ILLGOL
The esoteric language ILLGOL uses comments starting with NB and
lasting till the end of the line. See <http://www.cats-
eye.mb.ca/esoteric/illgol/index.html>.
INTERCAL
Comments in INTERCAL are single line comments. They start with one
of the keywords "NOT" or "N'T", and can optionally be preceded by
the keywords "DO" and "PLEASE". If both keywords are used, "PLEASE"
precedes "DO". Keywords are separated by whitespace.
J The language J uses comments that start with "NB.", and that last
till the end of the line. See <http://www.jsoft-
ware.com/books/help/primer/contents.htm>, and <http://www.jsoft-
ware.com/>.
Java
The Java language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
JavaScript
The JavaScript language has two forms of comments: comments that
start with "//" and last till the end of the line, and comments
that start with "/*" and end with "*/". If "{-keep}" is used, only
$1 will be set, and set to the entire comment. JavaScript is
Netscape's implementation of ECMAScript. See
<http://www.mozilla.org/js/language/E262-3.pdf>, and
<http://www.mozilla.org/js/language/>.
LaTeX
The documentation language LaTeX uses comments starting with "%"
and ending at the end of the line.
Lisp
Comments in Lisp start with a semi-colon (";") and last till the
end of the line.
LPC The LPC language has comments starting with "/*" and ending with
"*/".
LOGO
Comments for the language LOGO start with ";", and last till the
end of the line.
lua Comments for the lua language start with "--", and last till the
end of the line. See also <http://www.lua.org/manual/manual.html>.
M, MUMPS
In "M" (aka "MUMPS"), comments start with a semi-colon, and last
till the end of a line. The language specification requires the
semi-colon to be preceded by one or more linestart characters.
Those characters default to a space, but that's configurable. This
requirement, of preceding the comment with linestart characters, is
not tested for. See <ftp://ftp.inter-
sys.com/pub/openm/ism/ism64docs.zip>, <http://mtechnology.inter-
sys.com/mproducts/openm/index.html>, and <http://mcen-
ter.com/mtrc/index.html>.
mutt
Configuration files for mutt have comments starting with a "#" and
lasting the rest of the line.
Nickle
The Nickle language has one line comments starting with "#" (like
Perl), or multiline comments delimited by "/*" and "*/" (like C).
Under "{-keep}", only $1 will be set. See also
<http://www.nickle.org>.
Oberon
Comments in Oberon start with "(*" and end with "*)". See
<http://www.oberon.ethz.ch/oreport.html>.
Pascal
There are many implementations of Pascal. This module provides
patterns for comments of several implementations.
$RE{comment}{Pascal}
This is the pattern that recognizes comments according to the
Pascal ISO standard. This standard says that comments start
with either "{", or "(*", and end with "}" or "*)". This means
that "{*)" and "(*}" are considered to be comments. Many Pascal
applications don't allow this. See <http://www.pascal-cen-
tral.com/docs/iso10206.txt>
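The mixed-delimiter behaviour can be checked directly. A minimal sketch, assuming Regexp::Common is installed; the sample strings are made up.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical samples: two ordinary comments and two with mismatched
# delimiters, which the ISO rules described above still accept.
my @samples = ('{ plain }', '(* plain *)', '{ mixed *)', '(* mixed }');

# Anchor at both ends so that each sample must be a comment in full.
my @matched = grep { /\A$RE{comment}{Pascal}\z/ } @samples;

print "$_\n" for @matched;
```

If mismatched delimiters should be rejected, one of the implementation-specific patterns described below may be a better fit.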
$RE{comment}{Pascal}{Alice}
The Alice Pascal compiler accepts comments that start with "{"
and end with "}". Comments are not allowed to contain newlines.
See <http://www.templetons.com/brad/alice/language/>.
$RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and
$RE{comment}{Pascal}{GPC}
The Delphi Pascal, Free Pascal and the Gnu Pascal Compiler
implementations of Pascal all have comments that either start
with "//" and last till the end of the line, are delimited with
"{" and "}" or are delimited with "(*" and "*)". Patterns for
those comments are given by $RE{comment}{Pascal}{Delphi},
$RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC}
respectively. These patterns only set $1 when "{-keep}" is
used, which will then include the entire comment.
See <http://info.borland.com/techpubs/delphi5/oplg/>,
<http://www.freepascal.org/docs-html/ref/ref.html> and
<http://www.gnu-pascal.de/gpc/>.
$RE{comment}{Pascal}{Workshop}
The Workshop Pascal compiler, from Sun Microsystems, allows
comments that are delimited with either "{" and "}", delimited
with "(*" and "*)", delimited with "/*" and "*/", or starting
and ending with a double quote ("""). When "{-keep}" is used,
only $1 is set, and returns the entire comment.
See <http://docs.sun.com/db/doc/802-5762>.
PEARL
Comments in PEARL start with a "!" and last till the end of the
line, or start with "/*" and end with "*/". With "{-keep}", $1 will
be set to the entire comment.
PHP Comments in PHP start with either "#" or "//" and last till the end
of the line, or are delimited by "/*" and "*/". With "{-keep}", $1
will be set to the entire comment.
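A common use of such a pattern is stripping comments. Because a line comment includes its terminating newline, a plain deletion would join lines together; the sketch below puts the newline back. It assumes Regexp::Common is installed, and the sample input and variable names are invented.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical PHP fragment using all three comment forms.
my $php = "\$a = 1; # one\n\$b = 2; // two\n\$c = 3; /* three */\n";

my $re = $RE{comment}{PHP};

# Remove each comment; if it ended in a newline (the "#" and "//" forms),
# keep that newline so the line structure survives.
(my $stripped = $php) =~ s{($re)}{ substr($1, -1) eq "\n" ? "\n" : "" }ge;

print $stripped;
```

This treats the input as plain text, so comment markers inside string literals would be removed too.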
PL/B
In PL/B, comments start with either "." or ";", and end with the
next newline. See <http://www.mmcctech.com/pl-b/plb-0010.htm>.
PL/I
The PL/I language has comments starting with "/*" and ending with
"*/".
PL/SQL
In PL/SQL, comments either start with "--" and run till the end of
the line, or start with "/*" and end with "*/".
Perl
Perl uses comments that start with a "#", and continue till the end
of the line.
Portia
The Portia programming language has comments that start with "//",
and last till the end of the line.
Python
Python uses comments that start with a "#", and continue till the
end of the line.
Q-BAL
Comments in the Q-BAL language start with "`" (a backtick), and
continue till the end of the line.
QML In "QML", comments start with "#" and last till the end of the
line. See <http://www.questionmark.com/uk/qml/overview.doc>.
R The statistical language R uses comments that start with a "#" and
end with the following new line. See <http://www.r-project.org/>.
REBOL
Comments for the REBOL language start with ";" and last till the
end of the line.
Ruby
Comments in Ruby start with "#" and last till the end of the line.
Scheme
Scheme comments start with ";", and last till the end of the line.
See <http://schemers.org/>.
shell
Comments in various shells start with a "#" and end at the end of
the line.
Shelta
The esoteric language Shelta uses comments that start and end with
a ";". See <http://www.catseye.mb.ca/esoteric/shelta/index.html>.
SLIDE
The SLIDE language has two forms of comments. First there is the
line comment, which starts with a "#" and includes the rest of the
line (just like Perl). Second, there is the multiline, nested
comment, which is delimited by "(*" and "*)". Under "{-keep}", only
$1 is set, and is set to the entire comment. This pattern needs at
least Perl version 5.6.0. See <http://www.cs.berke-
ley.edu/~ug/slide/docs/slide/spec/spec_frame_intro.shtml>.
slrn
Configuration files for slrn have comments starting with a "%" and
lasting the rest of the line.
Smalltalk
Smalltalk uses comments that start and end with a double quote,
""".
SMITH
Comments in the SMITH language start with ";", and last till the
end of the line.
Squeak
In the Smalltalk variant Squeak, comments start and end with """.
Double quotes can appear inside comments by doubling them.
SQL Standard SQL uses comments starting with two or more dashes, and
ending at the end of the line.
MySQL does not follow the standard. Instead, it allows comments
that start with a "#" or "-- " (that's two dashes and a space) end-
ing with the following newline, and comments starting with "/*",
and ending with the next ";" or "*/" that isn't inside single or
double quotes. A pattern for this is returned by $RE{com-
ment}{SQL}{MySQL}. With "{-keep}", only $1 will be set, and it
returns the entire comment.
Tcl In Tcl, comments start with "#" and continue till the end of the
line.
TeX The documentation language TeX uses comments starting with "%" and
ending at the end of the line.
troff
The document formatting language troff uses comments starting with
"\"", and continuing till the end of the line.
vi In configuration files for the editor vi, one can use comments
starting with """, and ending at the end of the line.
*W In the language *W, comments start with "||", and end with "!!".
zonefile
Comments in DNS zonefiles start with ";", and continue till the end
of the line.
REFERENCES
[Go 90]
Charles F. Goldfarb: The SGML Handbook. Oxford: Oxford University
Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, pp 390-391.
HISTORY
$Log: comment.pm,v $
Revision 2.116 2005/03/16 00:00:02 abigail
CQL, INTERCAL, R
Revision 2.115 2005/01/09 23:12:03 abigail
BML comments
Revision 2.114 2004/12/18 11:43:06 abigail
POD: HTML comments end in >, not <
Revision 2.113 2004/12/15 22:06:51 abigail
Fixed regex for J comments
Revision 2.112 2004/06/09 21:44:48 abigail
New languages
Revision 2.111 2003/09/24 08:39:35 abigail
Stupid "syntax" warning issues false positives
Revision 2.110 2003/08/19 21:27:55 abigail
Nickle language
Revision 2.109 2003/08/13 10:07:39 abigail
Added patterns for C--, C#, Cg and SLIDE comments
Revision 2.108 2003/08/01 11:30:25 abigail
Comments for 'QML' and 'PL/SQL'
Revision 2.107 2003/05/25 21:33:48 abigail
POD nits from Bryan C. Warnock
Revision 2.106 2003/03/12 22:25:42 abigail
- More generic setup to define comments for various languages.
- Expanded and redid the documentation for comment.pm.
- Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B,
BASIC (mvEnterprise), Forth, Fortran (both fixed and free form),
fvwm2, mutt, Oberon, 6 versions of Pascal,
PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.
Revision 2.105 2003/03/09 19:04:42 abigail
- More generic setup to define comments for various languages.
- Expanded and redid the documentation for comment.pm.
Now every language has its own paragraph, describing its comment,
and pointers to webpages.
- Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC
(mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt,
Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B,
PL/I, slrn, Squeak.
Revision 2.104 2003/02/21 14:48:06 abigail
Crystal Reports
Revision 2.103 2003/02/11 09:39:08 abigail
Added
Revision 2.102 2003/02/07 15:23:54 abigail
Lua and FPL
Revision 2.101 2003/02/01 22:55:31 abigail
Changed Copyright years
Revision 2.100 2003/01/21 23:19:40 abigail
The whole world understands RCS/CVS version numbers, that 1.9 is an
older version than 1.10. Except CPAN. Curse the idiot(s) who think
that version numbers are floats (in which universe do floats have
more than one decimal dot?).
Everything is bumped to version 2.100 because CPAN couldn't deal
with the fact one file had version 1.10.
Revision 1.19 2002/11/06 13:51:34 abigail
Minor POD changes.
Revision 1.18 2002/09/18 18:13:01 abigail
Fixes for 5.005
Revision 1.17 2002/09/04 17:04:24 abigail
Q-BAL
Revision 1.16 2002/08/27 16:50:50 abigail
Patterns for Beatnik, Befunge-98, Funge-98 and W*.
Revision 1.15 2002/08/22 17:04:03 abigail
SMITH added
Revision 1.14 2002/08/22 16:41:25 abigail
+ Added function 'id' and 'from_to' with associated data.
+ Added function 'combine' for languages having multiple syntaxes.
+ Added 'Shelta'
Revision 1.13 2002/08/21 16:00:32 abigail
beta-Juliet, Portia, ILLGOL and Brainfuck.
Revision 1.12 2002/08/20 17:40:37 abigail
- Created a 'nested' function (simplified version from
Regexp::Common::balanced).
- Comments that use 'from' to eol or balanced (nested) delimiters
are now generated from a data array.
- Added Hugo and Haifu.
Revision 1.11 2002/08/05 12:16:58 abigail
Fixed 'Regex::' and 'Rexexp::' typos to 'Regexp::'
(Found my Mike Castle).
Revision 1.10 2002/07/31 23:33:16 abigail
Documented that Haskell and Dylan comments need at least 5.6.0.
Revision 1.9 2002/07/31 23:12:29 abigail
Dylan and Haskell comments can be nested, hence version 5.6.0 of Perl
is needed to be able to make a regex matching them.
Revision 1.8 2002/07/31 14:48:16 abigail
Added LOGO (to please petdance)
Revision 1.7 2002/07/31 13:06:41 abigail
Dealt with -keep for Haskell and Dylan.
Revision 1.6 2002/07/31 00:54:00 abigail
Added comments for Haskell, Dylan, Smalltalk and MySQL.
Revision 1.5 2002/07/30 16:38:23 abigail
Added support for the languages: LaTeX, Tcl, TeX and troff.
Revision 1.4 2002/07/26 16:48:12 abigail
Simplied datastructure for the languages that use single line comments.
Revision 1.3 2002/07/26 16:37:20 abigail
Added new languages: Ada, awk, Eiffel, Java, LPC, PHP, Python,
REBOL, Ruby, vi and zonefile.
Revision 1.2 2002/07/25 22:37:44 abigail
Added 'use strict'.
Added 'no_defaults' to 'use Regex::Common' to prevent loaded of all
defaults.
Revision 1.1 2002/07/25 19:56:07 abigail
Modularizing Regexp::Common.
SEE ALSO
Regexp::Common for a general description of how to use this interface.
AUTHOR
Damian Conway (damian@conway.org)
MAINTENANCE
This package is maintained by Abigail (regexp-common@abigail.nl).
BUGS AND IRRITATIONS
Bound to be plenty.
For a start, there are many common regexes missing. Send them in to
regexp-common@abigail.nl.
COPYRIGHT
Copyright (c) 2001 - 2003, Damian Conway. All Rights Reserved.
This module is free software. It may be used, redistributed
and/or modified under the terms of the Perl Artistic License
(see http://www.perl.com/perl/misc/Artistic.html)
perl v5.8.8 2003-03-23 Regexp::Common::comment(3)