Mail::SpamAssassin::PerMsgStatus(3)    User Contributed Perl Documentation
NAME
Mail::SpamAssassin::PerMsgStatus - per-message status (spam or not-spam)
SYNOPSIS
my $spamtest = Mail::SpamAssassin->new({
    'rules_filename'     => '/etc/spamassassin.rules',
    'userprefs_filename' => $ENV{HOME}.'/.spamassassin/user_prefs'
});
my $mail = $spamtest->parse();
my $status = $spamtest->check ($mail);
my $rewritten_mail;
if ($status->is_spam()) {
    $rewritten_mail = $status->rewrite_mail ();
}
...
DESCRIPTION
The Mail::SpamAssassin "check()" method returns an object of this
class. This object encapsulates all the per-message state.
METHODS
$status->check ()
Runs the SpamAssassin rules against the message pointed to by the
object.
$status->learn()
After a mail message has been checked, this method can be called.
If the score is outside a certain range around the threshold, i.e.
if the message is judged more-or-less definitely spam or definitely
non-spam, it will be fed into SpamAssassin's learning systems
(currently the naive Bayesian classifier), so that future similar mails
will be caught.
$score = $status->get_autolearn_points()
Return the message's score as computed for auto-learning. Certain
tests are ignored:
- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user white/black-listing rules, etc)
- rules with tflags set to 'noautolearn'
Also note that auto-learning occurs using scores from either
scoreset 0 or 1, depending on what scoreset is used during message
check. It is likely that the message check and auto-learn scores
will be different.
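The exclusion rule described above can be illustrated with a small stand-alone sketch. It is not SpamAssassin's own implementation: the rule table, the scores and the helper name autolearn_points_sketch() are all hypothetical, and only the three excluded tflags values come from the text above.

```perl
use strict;
use warnings;

# Hypothetical rule table: rule name => score and tflags (illustrative only).
my %rules = (
    BAYES_99          => { score => 3.5,  tflags => ['learn'] },
    USER_IN_WHITELIST => { score => -100, tflags => ['userconf'] },
    MISSING_DATE      => { score => 1.4,  tflags => [] },
    HTML_MESSAGE      => { score => 0.1,  tflags => [] },
    LOCAL_SPECIAL     => { score => 2.0,  tflags => ['noautolearn'] },
);

# The tflags values that are ignored for auto-learning, as listed above.
my %excluded = map { $_ => 1 } qw(learn userconf noautolearn);

# Sum the scores of the rules that hit, skipping rules with an excluded tflag.
sub autolearn_points_sketch {
    my ($rules, @hits) = @_;
    my $points = 0;
    for my $name (@hits) {
        my $rule = $rules->{$name} or next;    # unknown rule names are skipped
        next if grep { $excluded{$_} } @{ $rule->{tflags} };
        $points += $rule->{score};
    }
    return $points;
}

my $points = autolearn_points_sketch(
    \%rules, qw(BAYES_99 MISSING_DATE HTML_MESSAGE LOCAL_SPECIAL));
printf "autolearn points: %.1f\n", $points;
```

Only MISSING_DATE and HTML_MESSAGE contribute here, which is why the auto-learn score can differ from the score reported by the message check.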
$score = $status->get_head_only_points()
Return the message's score as computed for auto-learning, ignoring
all rules except for header-based ones.
$score = $status->get_learned_points()
Return the message's score as computed for auto-learning, ignoring
all rules except for learning-based ones.
$score = $status->get_body_only_points()
Return the message's score as computed for auto-learning, ignoring
all rules except for body-based ones.
$isspam = $status->is_spam ()
After a mail message has been checked, this method can be called.
It will return 1 for mail determined likely to be spam, 0 if it
does not seem spam-like.
$list = $status->get_names_of_tests_hit ()
After a mail message has been checked, this method can be called.
It will return a comma-separated string, listing all the symbolic
test names of the tests which were triggered by the mail.
$list = $status->get_names_of_subtests_hit ()
After a mail message has been checked, this method can be called.
It will return a comma-separated string, listing all the symbolic
test names of the meta-rule sub-tests which were triggered by the
mail. Sub-tests are the normally-hidden rules, which score 0 and
have names beginning with two underscores, used in meta rules.
$num = $status->get_score ()
After a mail message has been checked, this method can be called.
It will return the message's score.
$num = $status->get_required_score ()
After a mail message has been checked, this method can be called.
It will return the score required for a mail to be considered spam.
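As a minimal sketch of how the accessors above might be combined, the helper below builds a one-line summary from a status object that has already been checked. The helper name summarize_status() and the output layout are assumptions made for illustration; only the four method calls are taken from this page.

```perl
use strict;
use warnings;

# Build a one-line summary from a checked PerMsgStatus object.
# Calls is_spam(), get_score(), get_required_score() and
# get_names_of_tests_hit(), all documented above.
sub summarize_status {
    my ($status) = @_;
    return sprintf(
        "%s score=%.1f required=%.1f tests=%s",
        ($status->is_spam() ? 'SPAM' : 'ham'),
        $status->get_score(),
        $status->get_required_score(),
        $status->get_names_of_tests_hit(),
    );
}

# Typical use, with $spamtest and $mail set up as in the SYNOPSIS:
#   my $status = $spamtest->check ($mail);
#   print summarize_status($status), "\n";
```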
$autolearn = $status->get_autolearn_status ()
After a mail message has been checked, this method can be called.
It will return one of the following strings depending on whether
the mail was auto-learned or not: "ham", "no", "spam", "disabled",
"failed", "unavailable".
$report = $status->get_report ()
Deliver a "spam report" on the checked mail message. This contains
details of how many spam detection rules it triggered.
The report is returned as a multi-line string, with the lines
separated by "\n" characters.
$preview = $status->get_content_preview ()
Give a "preview" of the content.
This is returned as a multi-line string, with the lines separated
by "\n" characters, containing a fully-decoded, safe, plain-text
sample of the first few lines of the message body.
$msg = $status->get_message()
Return the object representing the message being scanned.
$status->rewrite_mail ()
Rewrite the mail message. This will at minimum add headers, and at
maximum MIME-encapsulate the message text, to reflect its spam or
not-spam status. The function returns the rewritten message as a
scalar.
The actual modifications depend on the configuration (see
"Mail::SpamAssassin::Conf" for more information).
The possible modifications are as follows:
To:, From: and Subject: modification on spam mails
Depending on the configuration, the To: and From: lines can
have a user-defined RFC 2822 comment appended for spam mail.
The subject line may have a user-defined string prepended to it
for spam mail.
X-Spam-* headers for all mails
Depending on the configuration, zero or more headers with names
beginning with "X-Spam-" will be added to mail depending on
whether it is spam or ham.
spam message with report_safe
If report_safe is set to true (1), then spam messages are
encapsulated into their own message/rfc822 MIME attachment
without any modifications being made.
If report_safe is set to false (0), then the message will only
have the above headers added/modified.
$status->set_tag($tagname, $value)
Set a template tag, as used in "add_header", report templates, etc.
This API is intended for use by plugins. Tag names will be
converted to an all-uppercase representation internally.
$value can be a subroutine reference, which will be evaluated each
time the template is expanded. Note that Perl supports closures,
which means that variables set in the caller's scope can be
accessed inside this "sub". For example:
my $text = "hello world!";
$status->set_tag("FOO", sub {
    return $text;
});
See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" section for more
details on how template tags are used.
"undef" will be returned if a tag by that name has not been
defined.
$string = $status->get_tag($tagname)
Get the current value of a template tag, as used in "add_header",
report templates, etc. This API is intended for use by plugins.
Tag names will be converted to an all-uppercase representation
internally. See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS"
section for more details on tags.
"undef" will be returned if a tag by that name has not been
defined.
$status->set_spamd_result_item($subref)
Set an entry for the spamd result log line. $subref should be a
code reference for a subroutine which will return a string in
'name=VALUE' format, similar to the other entries in the spamd
result line:
Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
rport=33153,mid=<9PS291LhupY>,autolearn=spam
"name" and "VALUE" must not contain "=" or "," characters, as it is
important that these log lines are easy to parse.
The code reference will be called by spamd after the message has
been scanned, and the "PerMsgStatus::check()" method has returned.
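One way to respect the 'name=VALUE' constraint is to build the code reference through a small wrapper that rejects bad names and sanitizes values. This is only a sketch: make_result_item() and the item name myplugin_age are hypothetical, and replacing forbidden characters with underscores is an assumption, not something spamd requires.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a name and a value-producing callback into a
# code reference suitable for passing to set_spamd_result_item().
sub make_result_item {
    my ($name, $value_cb) = @_;
    die "invalid result item name '$name'\n" if $name =~ /[=,]/;
    return sub {
        my $value = $value_cb->();
        $value = '' unless defined $value;
        $value =~ tr/=,/__/;    # replace "=" and "," so the line stays parseable
        return "$name=$value";
    };
}

my $started = time;
my $item = make_result_item('myplugin_age', sub { time - $started });

# Inside a plugin, with a real PerMsgStatus object in $status:
#   $status->set_spamd_result_item($item);
print $item->(), "\n";
```

Because the value callback runs only when spamd calls the code reference, the logged value reflects the state after the scan has finished.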
$status->finish ()
Indicate that this $status object is finished with, and can be
destroyed.
If you are using SpamAssassin in a persistent environment, or
checking many mail messages from one "Mail::SpamAssassin" factory,
this method should be called to ensure Perl's garbage collection
will clean up old status objects.
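A persistent caller might structure its loop roughly as follows. The helper name check_batch() is hypothetical, and passing the raw message text to parse() is an assumption about the "Mail::SpamAssassin" API that this page does not itself document (the SYNOPSIS calls parse() without arguments).

```perl
use strict;
use warnings;

# Check a batch of raw messages with one Mail::SpamAssassin object and
# release each status object as soon as its result has been recorded.
sub check_batch {
    my ($spamtest, @raw_messages) = @_;
    my @results;
    for my $raw (@raw_messages) {
        my $mail   = $spamtest->parse($raw);    # assumption: parse() accepts message text
        my $status = $spamtest->check($mail);
        push @results, {
            is_spam => $status->is_spam(),
            score   => $status->get_score(),
        };
        $status->finish();    # lets Perl reclaim the status object
    }
    return @results;
}
```

Copying the values of interest out of the status object before calling finish() avoids touching the object afterwards.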
$name = $status->get_current_eval_rule_name()
Return the name of the currently-running eval rule. "undef" is
returned if no eval rule is currently being run. Useful for
plugins to determine the current rule name while inside an eval test
function call.
$status->get_decoded_body_text_array ()
Returns the message body, with base64 or quoted-printable encodings
decoded, and non-text parts or non-inline attachments stripped.
It is returned as an array of strings, with each string
representing one newline-separated line of the body.
$status->get_decoded_stripped_body_text_array ()
Returns the message body, decoded (as described in
get_decoded_body_text_array()), with HTML rendered, and with
whitespace normalized.
It will always render text/html, and will use a heuristic to
determine if other text/* parts should be considered text/html.
It is returned as an array of strings, with each string
representing one 'paragraph'. Paragraphs, in plain-text mails, are
double-newline-separated blocks of multi-line text.
$status->get (header_name [, default_value])
Returns a message header, pseudo-header, real name or address.
"header_name" is the name of a mail header, such as 'Subject',
'To', etc. If "default_value" is given, it will be used if the
requested "header_name" does not exist.
Appending ":raw" to the header name will inhibit decoding of
quoted-printable or base-64 encoded strings.
Appending ":addr" to the header name will cause everything except
the first email address to be removed from the header. For
example, all of the following will result in "example@foo":
example@foo
example@foo (Foo Blah)
example@foo, example@bar
display: example@foo (Foo Blah), example@bar ;
Foo Blah <example@foo>
"Foo Blah" <example@foo>
"'Foo Blah'" <example@foo>
Appending ":name" to the header name will cause everything except
the first real name to be removed from the header. For example,
all of the following will result in "Foo Blah":
example@foo (Foo Blah)
example@foo (Foo Blah), example@bar
display: example@foo (Foo Blah), example@bar ;
Foo Blah <example@foo>
"Foo Blah" <example@foo>
"'Foo Blah'" <example@foo>
There are several special pseudo-headers that can be specified:
"ALL" can be used to mean the text of all the message's headers.
"ALL-TRUSTED" can be used to mean the text of all the message's
headers that could only have been added by trusted relays.
"ALL-INTERNAL" can be used to mean the text of all the message's
headers that could only have been added by internal relays.
"ALL-UNTRUSTED" can be used to mean the text of all the message's
headers that may have been added by untrusted relays. To make this
pseudo-header more useful for header rules the 'Received' header
that was added by the last trusted relay is included, even though
it can be trusted.
"ALL-EXTERNAL" can be used to mean the text of all the message's
headers that may have been added by external relays. Like
"ALL-UNTRUSTED" the 'Received' header added by the last internal
relay is included.
"ToCc" can be used to mean the contents of both the 'To' and 'Cc'
headers.
"EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the
SMTP transaction that delivered this message, if this data has been
made available by the SMTP server.
"MESSAGEID" is a symbol meaning all Message-Ids found in the
message; some mailing list software moves the real 'Message-Id' to
'Resent-Message-Id' or 'X-Message-Id', then uses its own one in the
'Message-Id' header. The value returned for this symbol is the
text from all 3 headers, separated by newlines.
"X-Spam-Relays-Untrusted" is the generated metadata of untrusted
relays the message has passed through.
"X-Spam-Relays-Trusted" is the generated metadata of trusted relays
the message has passed through.
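The ":addr" and ":name" modifiers, the "ToCc" pseudo-header and the optional default value can be combined as in the sketch below. The helper name sender_info(), the hash keys and the default strings are hypothetical; only the get() calls follow the description above.

```perl
use strict;
use warnings;

# Collect a few sender and recipient details from a checked message.
# The second argument to get() is the default used when the header
# does not exist; the defaults chosen here are placeholders.
sub sender_info {
    my ($status) = @_;
    return {
        addr    => $status->get('From:addr', ''),
        name    => $status->get('From:name', ''),
        subject => $status->get('Subject', '(no subject)'),
        rcpts   => $status->get('ToCc', ''),
    };
}

# Typical use inside a plugin or after $spamtest->check ($mail):
#   my $info = sender_info($status);
#   print "from $info->{name} <$info->{addr}>\n";
```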
$status->get_uri_list ()
Returns an array of all unique URIs found in the message. It takes
a combination of the URIs found in the rendered (decoded and HTML
stripped) body and the URIs found when parsing the HTML in the
message. Will also set $status->{uri_list} (the array as returned by
this function).
The returned array will include the "raw" URI as well as "slightly
cooked" versions. For example, the single URI
'http://%77w%77.example.com/' will get turned into: (
'http://%77w%77.example.com/', 'http://www.example.com/' )
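The relationship between the raw and the "slightly cooked" form in this example can be reproduced by decoding the %XX escapes. The sketch below is a simplification written for illustration, not SpamAssassin's canonicalization code, and cook_uri_sketch() is a hypothetical name.

```perl
use strict;
use warnings;

# Decode %XX escapes to show how a "cooked" form differs from the raw URI.
# Returns the raw URI, followed by the decoded form when it differs.
sub cook_uri_sketch {
    my ($raw) = @_;
    (my $cooked = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $cooked eq $raw ? ($raw) : ($raw, $cooked);
}

my @uris = cook_uri_sketch('http://%77w%77.example.com/');
print "$_\n" for @uris;
```

Rules that match on URIs therefore see both spellings, so an obfuscated host name cannot hide from a pattern written against the plain form.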
$status->get_uri_detail_list ()
Returns a hash reference of all unique URIs found in the message
and various data about where the URIs were found in the message.
It takes a combination of the URIs found in the rendered (decoded
and HTML stripped) body and the URIs found when parsing the HTML in
the message. Will also set $status->{uri_detail_list} (the hash
reference as returned by this function). This function will also
set $status->{uri_domain_count} (count of unique domains).
The hash format looks something like this:
raw_uri => {
    types => { a => 1, img => 1, parsed => 1 },
    cleaned => [ canonified_uri ],
    anchor_text => [ "click here", "no click here" ],
    domains => { domain1 => 1, domain2 => 1 },
}
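A structure of this shape can be walked with ordinary hash and array dereferencing. The sketch below uses a hand-built sample instead of a live message; the sample values, in particular the 'apache.org' domain key, are assumptions chosen to match the format shown above.

```perl
use strict;
use warnings;

# Hand-built sample in the documented shape; in real use this would be
#   my $details = $status->get_uri_detail_list();
my $details = {
    'http://spamassassin.apache%2Eorg/' => {
        types       => { a => 1, parsed => 1 },
        cleaned     => [ 'http://spamassassin.apache%2Eorg/',
                         'http://spamassassin.apache.org/' ],
        anchor_text => [ 'click here' ],
        domains     => { 'apache.org' => 1 },
    },
};

# Collect every distinct domain, and the raw URIs referenced by <a> tags.
my (%domains, @linked);
for my $raw_uri (sort keys %$details) {
    my $info = $details->{$raw_uri};
    $domains{$_} = 1 for keys %{ $info->{domains} || {} };
    push @linked, $raw_uri if $info->{types}{a};
}

print "domains: ", join(', ', sort keys %domains), "\n";
print "linked:  ", join(', ', @linked), "\n";
```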
"raw_uri" is whatever the URI was in the message itself
(http://spamassassin.apache%2Eorg/).
"types" is a hash of the HTML tags (lowercase) which referenced the
raw_uri. parsed is a faked type which specifies that the raw_uri
was seen in the rendered text.
"cleaned" is an array of the raw and canonified version of the
raw_uri (http://spamassassin.apache%2Eorg/,
http://spamassassin.apache.org/).
"anchor_text" is an array of the anchor text (text between <a> and
</a>), if any, which linked to the URI.
"domains" is a hash of the domains found in the canonified URIs.
$status->clear_test_state()
Clear test state, including test log messages from
"$status->test_log()".
$status->got_hit ($rulename, $desc_prepend [, name => value, ...])
Register a hit against a rule in the ruleset.
There are two mandatory arguments. These are $rulename, the name of
the rule that fired, and $desc_prepend, which is a short string
that will be prepended to the rule's "describe" string in output
reports.
In addition, callers can supplement that with the following
optional data:
score => $num
Optional: the score to use for the rule hit. If unspecified,
the value from the "Mail::SpamAssassin::Conf" object's
"{scores}" hash will be used.
value => $num
Optional: the value to assign to the rule; the default value is
1. Rules with "tflags multiple" use values greater than 1 to
indicate multiple hits. This value is accessible to meta rules.
ruletype => $type
Optional, but recommended: the rule type string. This is used
in the "hit_rule" plugin call, called by this method. If
unset, 'unknown' is used.
Backwards compatibility: the two mandatory arguments have been part
of this API since SpamAssassin 2.x. The optional name => value
pairs, however, are a new addition in SpamAssassin 3.2.0.
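A call using the optional pairs might look like the following sketch. The helper name register_multi_hit(), the 'BODY: ' prefix and the 'body' rule type string are illustrative assumptions; the argument layout follows the description above.

```perl
use strict;
use warnings;

# Hypothetical plugin helper: register a hit whose value records how many
# times the rule matched. Returns 1 if a hit was registered, 0 otherwise.
sub register_multi_hit {
    my ($status, $rulename, $count) = @_;
    return 0 unless $count > 0;
    $status->got_hit($rulename, 'BODY: ',
        value    => $count,    # greater than 1 indicates multiple hits
        ruletype => 'body',    # assumed rule type string
    );
    return 1;
}

# Typical use inside a plugin, with $pms holding the PerMsgStatus object:
#   register_multi_hit($pms, 'MY_LOCAL_RULE', $match_count);
```

Leaving out "score" means the value from the configuration's "{scores}" hash is used, as described above.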
$status->create_fulltext_tmpfile (fulltext_ref)
This function creates a temporary file containing the passed scalar
reference data (typically the full/pristine text of the message).
This is typically used by external programs like pyzor and dccproc,
to avoid hangs due to buffering issues. Methods that need this
should call $self->create_fulltext_tmpfile($fulltext) to retrieve
the temporary filename; it will be created if it has not already
been.
Note: This can only be called once until
$status->delete_fulltext_tmpfile() is called.
$status->delete_fulltext_tmpfile ()
Cleans up after a $status->create_fulltext_tmpfile() call.
Deletes the temporary file and uncaches the filename.
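Because the temporary file must be deleted before another one can be created, it can be convenient to pair the two calls in one wrapper. The sketch below is one possible arrangement, not part of the SpamAssassin API; with_fulltext_tmpfile() is a hypothetical name.

```perl
use strict;
use warnings;

# Run $code with the name of the temporary full-text file, and always
# delete the file afterwards, even if $code dies.
sub with_fulltext_tmpfile {
    my ($status, $fulltext_ref, $code) = @_;
    my $tmpfile = $status->create_fulltext_tmpfile($fulltext_ref);
    my $result;
    my $ok  = eval { $result = $code->($tmpfile); 1 };
    my $err = $@;
    $status->delete_fulltext_tmpfile();
    die $err unless $ok;    # re-throw after cleaning up
    return $result;
}

# Typical use inside a plugin, where $fulltext is a scalar reference and
# run_external_checker() is a placeholder for the plugin's own code:
#   my $verdict = with_fulltext_tmpfile($pms, $fulltext, sub {
#       my ($file) = @_;
#       return run_external_checker($file);
#   });
```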
SEE ALSO
"Mail::SpamAssassin" "spamassassin"
perl v5.8.8    2008-06-1    Mail::SpamAssassin::PerMsgStatus(3)