Mail-SpamAssassin


lib/Mail/SpamAssassin/Plugin/TxRep.pm

 _TXREPXXXYMEAN_     Mean score on which TXREP modification is based
 _TXREPXXXYCOUNT_    Number of messages on which TXREP modification is based
 _TXREPXXXYPRESCORE_ Score before TXREP
 _TXREPXXXYUNKNOWN_  New sender (not found in the TXREP list)

The XXX part of the tag is one of the following IDs, depending on the
reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or HELO. The Y suffix is used
only with dual storage, and is either U (for user storage reputations) or G
(for global storage reputations).

=cut

package Mail::SpamAssassin::Plugin::TxRep;

use strict;
use warnings;
# use bytes;
use re 'taint';

use NetAddr::IP 4.000;                          # qw(:upper);
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Plugin::Bayes;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;

our @ISA = qw(Mail::SpamAssassin::Plugin);


###########################################################################
sub new {                       # constructor: register the eval rule
###########################################################################
  my ($class, $main) = @_;

  $class   = ref($class) || $class;
  my $self = $class->SUPER::new($main);
  bless($self, $class);

  $self->{main}          = $main;
  $self->{conf}          = $main->{conf};
  $self->{factor}        = $main->{conf}->{txrep_factor};
  $self->register_eval_rule("check_senders_reputation", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
  $self->set_config($main->{conf});

  # Only the default configuration is loaded at this point; do nothing here
  # that requires the runtime settings.
  dbg("TxRep: new object created");
  return $self;
}


###########################################################################
sub set_config {
###########################################################################
  my($self, $conf) = @_;
  my @cmds;

# -------------------------------------------------------------------------
=head1 USER PREFERENCES

The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
SpamAssassin handles incoming email messages.

=over 4

=item B<use_txrep>

  0 | 1                 (default: 0)

Whether to use the TxRep reputation system.  TxRep tracks the long-term average
score for each sender and then shifts the score of new messages toward that
long-term average.  This can increase or decrease the score for messages,
depending on the long-term behavior of the particular correspondent.

Note that certain tests are ignored when determining the final message score:

 - rules with tflags set to 'noautolearn'

=cut

  push (@cmds, {
    setting     => 'use_txrep',
    default     => 0,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });


=item B<txrep_factor>

 range [0..1]           (default: 0.5)

How much towards the long-term mean for the sender to regress a message.
Basically, the algorithm is to track the long-term total score and the count
of messages for the sender (C<total> and C<count>), and then once we have
otherwise fully calculated the score for this message (C<score>), we calculate
the final score for the message as:

 finalscore = score + factor * ((total + score)/(count + 1) - score)

So if C<factor> = 0.5, then we'll move halfway between the calculated
score and the new mean value.  If C<factor> = 0.3, then we'll move about 1/3
of the way from the score toward the mean.  C<factor> = 1 means use the
long-term mean, including the new unadjusted score; C<factor> = 0 means
just use the calculated score, which disables the score averaging while still
recording the reputation in the database.
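
As a worked example of the description above, with made-up numbers and
ignoring reputation weights and dual storage: suppose the sender has
C<total> = 20 over C<count> = 4 earlier messages, and the current message
scores C<score> = 10.  The new mean, including the current message, is
(20 + 10)/(4 + 1) = 6, so moving the fraction C<factor> of the way from the
score toward that mean gives:

 factor = 0     finalscore = 10 + 0   * (6 - 10) = 10
 factor = 0.5   finalscore = 10 + 0.5 * (6 - 10) =  8
 factor = 1     finalscore = 10 + 1   * (6 - 10) =  6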

=cut

  push (@cmds, {
    setting     => 'txrep_factor',
    default     => 0.5,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_factor} = $value;
    }
  });


=item B<txrep_dilution_factor>

[...]

    setting     => 'txrep_weight_helo',
    default     => 0.5,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_weight_helo} = $value;
    }
  });

=item B<txrep_report_details>

  0 | 1 | 2             (default: 0)

Add TxRep details to the rule's description in the message report or summary,
similar to how RBL rules commonly show listed domains.

If enabled (value 1), the identifiers used in calculating the sender's overall
reputation are listed (From address bound to originating IP address fraction,
From address alone, domain name bound to originating IP address fraction,
originating IP address, and HELO if available), including the originating IP
address fraction (according to the mask settings) where applicable.

If this option is set to 2, the individual mean reputation and count of each
listed identifier are reported as well.

Identifiers and additional data will only be added to the description on a
message's initial scan.  Re-processing a previously scanned message will not
list the individual identifiers and the respective reputation values used
originally.

This option is disabled by default for now, due to potential formatting issues
caused by the number and length of additional description details.

=cut

  push (@cmds, {
    setting     => 'txrep_report_details',
    default     => 0,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;

        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE
          if ($value eq '');
        return $Mail::SpamAssassin::Conf::INVALID_VALUE
          unless ($value =~ /^[012]$/);

        $self->{txrep_report_details} = $value;
    }
  });

=back

=head1 ADMINISTRATOR SETTINGS

These settings differ from the ones above, in that they are considered 'more
privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
No matter what C<allow_user_rules> is set to, these can never be set from a
user's C<user_prefs> file.

=over 4

=item B<txrep_factory module>

 (default: Mail::SpamAssassin::DBBasedAddrList)

Select an alternative database factory module for the TxRep database.

=cut

  push (@cmds, {
    setting      => 'txrep_factory',
    is_admin     => 1,
    default      => 'Mail::SpamAssassin::DBBasedAddrList',
    type         => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
  });


=item B<auto_welcomelist_path /path/filename>

 (default: ~/.spamassassin/tx-reputation)

Previously C<auto_whitelist_path>, which will work interchangeably until 4.1.

This is the TxRep directory and filename.  By default, each user
has their own reputation database in their C<~/.spamassassin> directory with
mode 0700.  For system-wide SpamAssassin use, you may want to share this
across all users.
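
For example, a system-wide setup might point every user at one shared file.
The path below is purely illustrative, and the directory would have to be
readable and writable by every account that runs SpamAssassin:

 auto_welcomelist_path /var/spool/spamassassin/tx-reputation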

=cut

  push (@cmds, {
    setting      => 'auto_welcomelist_path',
    aliases      => ['auto_whitelist_path'], # removed in 4.1
    is_admin     => 1,
    default      => '__userstate__/tx-reputation',
    type         => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
    code         => sub {
        my ($self, $key, $value, $line) = @_;
        unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
        $self->{auto_welcomelist_path} = $value;
    }
  });


=item B<auto_welcomelist_db_modules Module ...>

 (default: see below)

Previously C<auto_whitelist_db_modules>, which will work interchangeably until
4.1.

What database modules should be used for the TxRep storage database
file.   The first named module that can be loaded from the Perl include path
will be used.  The format is:

  PreferredModuleName SecondBest ThirdBest ...

i.e. a space-separated list of Perl module names.  The default is:

[...]

        {$self->{learning} =      $self->{conf}->{txrep_learn_penalty};}
  else  {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};}

  my $ret = !$self->{learning} || $self->check_senders_reputation($pms);
  $self->{learning} = undef;
  return $ret;
}


######################################################### plugin hook #####
sub forget_message {
###########################################################################
  my ($self, $params) = @_;
  return 0 unless ($self->{conf}->{use_txrep});
  my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});

  dbg("TxRep: forgetting a message");
  $self->{forgetting} = 1;
  my $ret = $self->check_senders_reputation($pms);
  $self->{forgetting} = undef;
  return $ret;
}


######################################################### plugin hook #####
sub learner_expire_old_training {
###########################################################################
  my ($self, $params) = @_;
  return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days});

  dbg("TxRep: expiry not implemented yet");
#  dbg("TxRep: expiry starting");
#  my $timer = $self->{main}->time_method("expire_bayes");
#  $self->{store}->expire_old_tokens($params);
#  dbg("TxRep: expiry completed");
}


######################################################### plugin hook #####
sub learner_close {
###########################################################################
  my ($self, $params) = @_;
  my $quiet = $params->{quiet};
  return 0 unless ($self->{conf}->{use_txrep});

  $self->{txKeepStoreTied} = undef;
  $self->finish();
  dbg("TxRep: learner_close");
}

sub has_txrep_min_score { 1 }
sub has_txrep_max_score { 1 }

=head1 OPTIMIZING TXREP

TxRep can be optimized for speed and simplicity, or for the precision in
assigning the reputation scores.

First of all, TxRep can be quickly disabled and re-enabled through the option
L</C<use_txrep>>. This can be done globally, or individually in each respective
C<user_prefs>. Disabling TxRep will not destroy the database, so it can be
re-enabled at any later time.

On many systems, SQL-based storage may perform faster than the default
Berkeley DB storage, so you should consider setting it up.

Then there are multiple settings that can reduce the number of records stored
in the database, hence reducing the size of the storage, and also the processing
time:

1. Setting L</C<txrep_user2global_ratio>> to zero will disable the dual storage,
thus halving the disk space requirements and the processing time of this plugin.

2. You can disable all but one of the L<REPUTATION WEIGHTS>. EMAIL_IP is
the most specific option, so it is the most likely choice in such a case, but
you could base the reputation system on any of the remaining scores. Each of the
enabled reputations adds a new entry to the database for each new identifier.
So while, for example, the number of recorded and scored domains may be large,
the number of stored IP addresses will probably be higher, and would require
more space in the storage.

3. Disabling L</C<txrep_track_messages>> avoids storing a separate entry
for every scanned message, which also reduces the disk space requirements and
the processing time.

4. Disabling the option L</C<txrep_autolearn>> saves processing time
for messages that trigger the auto-learning process.

5. Disabling L</C<txrep_welcomelist_out>> will reduce the processing time for
outbound connections.

6. Keeping the option L</C<auto_welcomelist_distinguish_signed>> enabled may
help slightly reduce the size of the database, because for signed messages the
originating IP address is ignored, so no additional database entries are
needed for each separate IP address (or masked block of IP addresses).
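
As a rough sketch, a C<local.cf> fragment applying all six reductions listed
above might look as follows.  It assumes that a value of 0 disables the
respective option or weight, and that the C<txrep_weight_*> names other than
C<txrep_weight_helo> match those in the L<REPUTATION WEIGHTS> section; check
both against the option descriptions before use:

 txrep_user2global_ratio              0    # 1. no dual storage
 txrep_weight_email                   0    # 2. keep only EMAIL_IP
 txrep_weight_domain                  0
 txrep_weight_ip                      0
 txrep_weight_helo                    0
 txrep_track_messages                 0    # 3. no per-message entries
 txrep_autolearn                      0    # 4. no TxRep auto-learning
 txrep_welcomelist_out                0    # 5. no outbound processing
 auto_welcomelist_distinguish_signed  1    # 6. ignore IP on signed mail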


Since TxRep reuses the storage architecture of the former AWL plugin, the
AWL instructions for initializing the SQL storage also apply to TxRep.
Although the old AWL table can be reused for TxRep, by default TxRep expects
the SQL table to be named "txrep".
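
A minimal configuration sketch for SQL storage, assuming that the AWL option
names (C<user_awl_dsn>, C<user_awl_sql_username>, C<user_awl_sql_password>,
C<user_awl_sql_table>) and the C<Mail::SpamAssassin::SQLBasedAddrList>
factory carry over unchanged to TxRep; the DSN and the credentials are
placeholders:

 txrep_factory          Mail::SpamAssassin::SQLBasedAddrList
 user_awl_dsn           DBI:mysql:spamassassin:localhost
 user_awl_sql_username  sa_user
 user_awl_sql_password  sa_password
 user_awl_sql_table     txrep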

To install a new SQL table for TxRep, run the appropriate SQL file for your
system under the /sql directory.

If you get a syntax error with an older version of MySQL, use TYPE=MyISAM
instead of ENGINE=MyISAM at the end of the command. You can also use other
engine types, depending on what is available on your system. For example, the
MEMORY engine stores the entire table in server memory, achieving
performance similar to Redis. You would need to take care of replicating
the RAM table to disk, for example through a cron job, to avoid losing data on
reboot. The InnoDB engine is used by default, offering high scalability (in
database size and concurrent access). In conjunction with a large InnoDB
buffer pool (C<innodb_buffer_pool_size>) or with the memcached plugin
(MySQL v5.6+), it can also offer performance comparable to Redis.

=cut

1;


