formula results from the CPAN


Mail-SpamAssassin


lib/Mail/SpamAssassin/Plugin/Bayes.pm

$msgatime = $now if ( $msgatime > $now );
 
my @touch_tokens;
my $tinfo_spammy = $permsgstatus->{bayes_token_info_spammy} = [];
my $tinfo_hammy = $permsgstatus->{bayes_token_info_hammy} = [];
 
my %tok_strength = map( ($_, abs($pw{$_}->{prob} - 0.5)), @pw_keys);
my $log_each_token = (would_log('dbg', 'bayes') > 1);
 
# now take the most significant tokens and calculate probs using
# Robinson's formula.
 
@pw_keys = sort { $tok_strength{$b} <=> $tok_strength{$a} } @pw_keys;
 
if (@pw_keys > N_SIGNIFICANT_TOKENS) { $#pw_keys = N_SIGNIFICANT_TOKENS - 1 }
 
my @sorted;
my $score;
foreach my $tok (@pw_keys) {
  next if $tok_strength{$tok} <
              $Mail::SpamAssassin::Bayes::Combine::MIN_PROB_STRENGTH;

lib/Mail/SpamAssassin/Plugin/TxRep.pm

1. B<Scoring> - although AWL tracks the number of messages received from each sender,
it does not take that count into account in any way when calculating the corrective
score for a new message. For example, if a sender previously sent a single ham message
with a score of -5 and then sends a second one with a score of +10, AWL will issue a
corrective score that pulls the result towards -5. With the default
C<auto_welcomelist_factor> of 0.5, the resulting score would be only 2.5. It would be
exactly the same even if the sender had previously sent 1,000 messages with an average
of -5. TxRep tries to take maximal advantage of the collected data, and adjusts the
final score not only by the mean reputation score stored in the database, but also by
the number of messages already seen from the sender. You can see the exact
formula in the section L</C<txrep_factor>>.
 
2. B<Learning> - AWL ignores any spam/ham learning. In fact it acts against it, which
often leads to a frustrating situation where a user repeatedly tags all messages from a
given sender as spam (or ham), but for each new message from that sender AWL adjusts
the score back towards the historical average, which does B<not> include the learned
scores. TxRep changes this: every spam/ham learning is recorded in the reputation
database, and hence taken into consideration for future email from the respective
sender. See the section L</"LEARNING SPAM / HAM"> for more details.
 
3. B<Auto-Learning> - in certain situations SpamAssassin may declare a message an
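
The arithmetic in item 1 (Scoring) can be checked with a few lines of Perl. This is only a sketch: C<awl_adjusted_score> is a hypothetical helper, not a SpamAssassin function, and it assumes that AWL moves the message score towards the stored mean by the fraction C<auto_welcomelist_factor>, which is consistent with the 2.5 figure quoted in that item. The message count is passed in only to show that it has no effect on the result.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): AWL-style correction.
# Assumption: the score moves towards the sender's mean by $factor.
sub awl_adjusted_score {
    my ($msg_score, $mean_score, $count, $factor) = @_;
    # $count is deliberately unused: the number of messages seen
    # does not enter this calculation.
    my $delta = ($mean_score - $msg_score) * $factor;
    return $msg_score + $delta;
}

# One earlier ham message at -5, new message scores +10, factor 0.5.
my $after_one = awl_adjusted_score(10, -5, 1, 0.5);

# 1,000 earlier messages averaging -5 give exactly the same result.
my $after_thousand = awl_adjusted_score(10, -5, 1000, 0.5);

printf "%.1f %.1f\n", $after_one, $after_thousand;   # prints "2.5 2.5"
```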

lib/Mail/SpamAssassin/Plugin/TxRep.pm

}
 
 
=head1 LEARNING SPAM / HAM
 
When SpamAssassin is told to learn (or relearn) a given message as spam or
ham, all reputations relevant to the message (email, email_ip, domain, ip, helo)
in both global and user storages will be updated using the C<txrep_learn_penalty>
or the C<txrep_learn_bonus> value, respectively. The new reputation of a given
sender property (email, domain, ...) will be the result of one of the following
formulas:
 
   new_reputation = old_reputation + learn_penalty
   new_reputation = old_reputation - learn_bonus
 
The TxRep plugin currently does not track each message individually, hence it
does not detect when you learn the same message repeatedly. It will add/subtract
the penalty/bonus score each time the message is fed to the spam learner.
 
=cut
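
The two learning formulas, and the effect of feeding the same message to the learner more than once, can be sketched as follows. C<learn_reputation> is a hypothetical helper, not part of the TxRep plugin. The penalty and bonus values are illustrative only, and the mapping of spam to the penalty and ham to the bonus is an assumption read from the order of the formulas above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep): apply one learning event
# to a stored reputation value.
sub learn_reputation {
    my ($old, $type, $penalty, $bonus) = @_;
    return $old + $penalty if $type eq 'spam';   # new = old + learn_penalty
    return $old - $bonus   if $type eq 'ham';    # new = old - learn_bonus
    die "unknown learning type: $type";
}

# Illustrative values only, not necessarily the plugin defaults.
my ($penalty, $bonus) = (20, 20);

# Learning the same message as spam three times adds the penalty
# three times, because repeats are not detected.
my $reputation = 0;
$reputation = learn_reputation($reputation, 'spam', $penalty, $bonus) for 1 .. 3;

print "$reputation\n";   # prints "60"
```

In this sketch a single later ham learning subtracts only one bonus, so it does not cancel several repeated spam learnings.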
