Mail-Digest-Tools
view release on metacpan or search on metacpan
print "$tokill files deleted\n";
$tokill ? print KILL $killstr : print KILL "[None.]\n";
close KILL or die "Couldn't close $killfile after writing: $!";
}
sub _get_digest_list {
my ($config_in_ref, $config_out_ref) = @_;
opendir(DIR, ${$config_out_ref}{'dir_digest'}) || die "no ${$config_out_ref}{'dir_digest'}?: $!";
my @digests =
sort { lc($a) cmp lc($b) }
grep { /${$config_in_ref}{'grep_formula'}/ }
readdir(DIR);
closedir(DIR) || die "Could not close ${$config_out_ref}{'dir_digest'}: $!";
return \@digests;
}
sub _prep_source_file {
my ($config_in_ref, $config_out_ref, $digests_ref) = @_; # v1.94
# %in_out: hash of all instances in directory of a given digest,
# value refers to digest's title and its message topics
my (%in_out, $id);
Similarly, the Perl Beginner digest files look like this:
[PBML] Digest Number 1491.txt
[PBML] Digest Number 1492.txt
To correctly identify Perl-Win32-Users digest files from any other files in
the same directory, we compose a string which would form the core of a Perl
regular expression, I<i.e.,> everything in a pattern except the outer
delimiters. Internally, Mail::Digest::Tools passes the file name through a
C<grep { /regexp/ }> pattern, so the first key is called C<grep_formula>.
%pw32u_config_in = (
grep_formula => 'Perl-Win32-Users Digest',
...
);
The equivalent pattern for the Perl Beginners digest would be:
%pbml_config_in = (
grep_formula => '\[PBML\]',
...
);
Note that the C<[> and C<]> characters have to be escaped with a C<\>
backslash because they are normally metacharacters inside Perl regular
expressions.
We next have to extract the digest number from the digest's file name.
Certain mailing list programs give individual digests both a 'Volume' number
as well as an individual digest number. Perl-Win32-Users typifies this. In
the example above we need to capture both the C<1> as volume number and C<1771>
as digest number. The next key in our configuration hash is called
C<pattern_target>:
%pw32u_config_in = (
grep_formula => 'Perl-Win32-Users Digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
...
);
Note the two sets of capturing parentheses.
Other digests, such as those at Yahoo! Groups, dispense with a volume number
and simply increment each digest number:
%pbml_config_in = (
grep_formula => '\[PBML\]',
pattern_target => '.*\s(\d+)\.txt$',
...
);
Note that this C<pattern_target> contains only one pair of capturing
parentheses.
=head2 Analysis of Digest's Internal Structure
A digest's internal structure is discussed in detail above (see
'A TYPICAL MAILING LIST DIGEST'). Here we need to identify two
characteristics: the way the digest introduces its list of today's topics
and the string it uses to delimit the list of today's topics from the first
individual message in the digest and all subsequent messages from one another.
Continuing with our two examples from above, we provide values for keys
C<topics_intro> and C<source_msg_delimiter>:
%pw32u_config_in = (
grep_formula => 'Perl-Win32-Users digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
topics_intro => 'Today\'s Topics:',
source_msg_delimiter => "--__--__--\n\n",
...
);
Note the escaped C<'> apostrophe character in the value for key
C<topics_intro>.
%pbml_config_in = (
grep_formula => '\[PBML\]',
pattern_target => '.*\s(\d+)\.txt$',
topics_intro => 'Topics in this digest:',
source_msg_delimiter => "________________________________________________________________________\n________________________________________________________________________\n\n",
...
);
Note that the values provided for the respective C<source_msg_delimiter> keys
had to be double-quoted strings. That's because all such delimiters include
two or more C<\n> newline characters so that they form paragraphs unto
themselves. Unless indicated otherwise, the values for all other values in
Organization: Some Web Address
Reply-To: Sam Spade <sspade@some.web.address.com>
To: "Time" <summers@some.web.address.com>
CC: "Perl List" <perl-win32-users@listserv.activestate.com>
Subject: Re: New IE Update causes script problems
Patterns are easily developed to capture this information and store it in the
configuration hash:
%pw32u_config_in = (
grep_formula => 'Perl-Win32-Users digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
topics_intro => 'Today\'s Topics:',
source_msg_delimiter => "--__--__--\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^From:\s+(.+)$',
org_style_flag => '^Organization:\s+(.+)$',
to_style_flag => '^To:\s+(.+)$',
cc_style_flag => '^CC:\s+(.+)$',
subject_style_flag => '^Subject:\s+(.+)$',
date_style_flag => '^Date:\s+(.+)$',
Message: 4
Date: Sun, 7 Dec 2003 19:24:03 +1100
From: Philip Streets <phil@some.web.address.com.au>
Subject: RH9.0, perl 5.8.2 and qmail-localfilter question
The patterns developed to capture this information and store it in the
configuration hash would be as follows:
%pbml_config_in = (
grep_formula => '\[PBML\]',
pattern_target => '.*\s(\d+)\.txt$',
topics_intro => 'Topics in this digest:',
source_msg_delimiter => "________________________________________________________________________\n________________________________________________________________________\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^\s+From:\s+(.+)$',
subject_style_flag => '^Subject:\s+(.+)$',
date_style_flag => '^\s+Date:\s+(.+)$',
...
);
be able to determine whether the digest needs a dose of digital castor oil or
not, and you set key C<MIME_cleanup_flag> accordingly. If the digest contains
unnecessary multipart MIME content, you set this flag to C<1>; otherwise, to
C<0>.
And with that you have completed your analysis of the internal structure of a
given digest and entered the relevant information into the first configuration
hash:
%pw32u_config_in = (
grep_formula => 'Perl-Win32-Users digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
topics_intro => 'Today\'s Topics:',
source_msg_delimiter => "--__--__--\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^From:\s+(.+)$',
org_style_flag => '^Organization:\s+(.+)$',
to_style_flag => '^To:\s+(.+)$',
cc_style_flag => '^CC:\s+(.+)$',
subject_style_flag => '^Subject:\s+(.+)$',
date_style_flag => '^Date:\s+(.+)$',
reply_to_style_flag => '^Reply-To:\s+(.+)$',
MIME_cleanup_flag => 1,
);
%pbml_config_in = (
grep_formula => '\[PBML\]',
pattern_target => '.*\s(\d+)\.txt$',
topics_intro => 'Topics in this digest:',
source_msg_delimiter => "________________________________________________________________________\n________________________________________________________________________\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^\s+From:\s+(.+)$',
subject_style_flag => '^Subject:\s+(.+)$',
date_style_flag => '^\s+Date:\s+(.+)$',
MIME_cleanup_flag => 0,
);
'require' that file into your script. That way, any changes you make in the
configuration will be automatically picked up by each script that calls a
Mail::Digest::Tools function.
Here is an example of such a file holding the configuration hashes governing
use of the Perl-Win32-Users digest, along with a script making use of that file.
# file: pw32u.digest.data
$topdir = "E:/Digest/pw32u";
%config_in = (
grep_formula => 'Perl-Win32-Users digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
# next element's value must be double-quoted
source_msg_delimiter => "--__--__--\n\n",
topics_intro => 'Today\'s Topics:',
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^From:\s+(.+)$',
org_style_flag => '^Organization:\s+(.+)$',
to_style_flag => '^To:\s+(.+)$',
cc_style_flag => '^CC:\s+(.+)$',
subject_style_flag => '^Subject:\s+(.+)$',
[PBML] Digest Number 1492.txt
digest_log.txt
Threads/
File F<digest.data> would look like this:
# digest.data
$topdir = "E:/Digest";
%digest_structure = (
pbml => {
grep_formula => '\[PBML\]',
pattern_target => '.*\s(\d+)\.txt$',
...
},
pw32u => {
grep_formula => 'Perl-Win32-Users digest',
pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
...
},
);
%digest_output_format = (
pbml => {
title => 'Perl Beginner',
dir_digest => "$topdir/pbml",
dir_threads => "$topdir/pbml/Threads",
...
while ( ($k, $v) = each %{$digest_output_format{'pbml'}} ) {
$pbml_config_out{$k} = $v;
}
while ( ($k, $v) = each %{$digest_structure{'pw32u'}} ) {
$pw32u_config_in{$k} = $v;
}
while ( ($k, $v) = each %{$digest_output_format{'pw32u'}} ) {
$pw32u_config_out{$k} = $v;
}
ok( defined $pbml_config_in{'grep_formula'});# 6
ok( defined $pbml_config_in{'pattern_target'});# 7
ok( defined $pbml_config_in{'topics_intro'});# 8
ok( defined $pbml_config_in{'post_topics_delimiter'});# 9
ok( defined $pbml_config_in{'source_msg_delimiter'});# 10
ok( defined $pbml_config_in{'topics_intro'});# 11
ok( defined $pbml_config_in{'message_style_flag'});# 12
ok( defined $pbml_config_in{'from_style_flag'});# 13
ok(! defined $pbml_config_in{'org_style_flag'});# 14
ok(! defined $pbml_config_in{'to_style_flag'});# 15
ok(! defined $pbml_config_in{'cc_style_flag'});# 16
ok( defined $pbml_config_out{'mimelog'});# 31
ok( defined $pbml_config_out{'id_format'});# 32
ok( defined $pbml_config_out{'output_id_format'});# 33
ok( defined $pbml_config_out{'MIME_cleanup_log_flag'});# 34
ok( defined $pbml_config_out{'thread_msg_delimiter'});# 35
ok( defined $pbml_config_out{'archive_kill_trigger'});# 36
ok( defined $pbml_config_out{'archive_kill_days'});# 37
ok( defined $pbml_config_out{'digests_read_flag'});# 38
ok( defined $pbml_config_out{'archive_config'});# 39
ok( defined $pw32u_config_in{'grep_formula'});# 40
ok( defined $pw32u_config_in{'pattern_target'});# 41
ok( defined $pw32u_config_in{'topics_intro'});# 42
ok( defined $pw32u_config_in{'post_topics_delimiter'});# 43
ok( defined $pw32u_config_in{'source_msg_delimiter'});# 44
ok( defined $pw32u_config_in{'message_style_flag'});# 45
ok( defined $pw32u_config_in{'from_style_flag'});# 46
ok( defined $pw32u_config_in{'org_style_flag'});# 47
ok( defined $pw32u_config_in{'to_style_flag'});# 48
ok( defined $pw32u_config_in{'cc_style_flag'});# 49
ok( defined $pw32u_config_in{'subject_style_flag'});# 50
$pbml_tp[7] => [ qw|
00003_0003
00003_0005
00003_0006
| ],
);
# determine number of digests needing processing
opendir DIG, $pbml_digdir or die "Couldn't open directory $pbml_digdir: $!";
$log{'digs'} = scalar(
grep { /$pbml_config_in{'grep_formula'}/ } readdir DIG);
closedir DIG or die "Couldn't close directory $pbml_digdir: $!";
ok($log{'digs'} == 3, '3 pbml digests found before processing'); # 3
# verify log files are empty or do not yet exist
my $dl = $pbml_config_out{'digests_log'};
my $dr = $pbml_config_out{'digests_read'};
my $drf = $pbml_config_out{'digests_read_flag'};
my $tt = $pbml_config_out{'todays_topics'};
001_0002_0009
001_0002_0010
001_0002_0011
001_0002_0012
| ],
);
# determine number of digests needing processing
opendir DIG, $pw32u_digdir or die "Couldn't open directory $pw32u_digdir: $!";
$log{'digs'} = scalar(
grep { /$pw32u_config_in{'grep_formula'}/ } readdir DIG);
closedir DIG or die "Couldn't close directory $pw32u_digdir: $!";
ok($log{'digs'} == 2, '2 pw32u digests found before processing'); # 3
# verify log files are empty or do not yet exist
my $dl = $pw32u_config_out{'digests_log'};
my $dr = $pw32u_config_out{'digests_read'};
my $drf = $pw32u_config_out{'digests_read_flag'};
my $tt = $pw32u_config_out{'todays_topics'};
t/samples/digest.data view on Meta::CPAN
# digest.data
# as of 2/22/04: being adapted for
# changes to pw32u digest format
use Cwd;
$makefile_workdir = cwd();
$topdir = "$makefile_workdir/t/samples";
%digest_structure = (
pbml => {
grep_formula => '\[PBML\]',
# pattern_target => '.*\s(\d+)\.txt$',
pattern_target => '.*_(\d+)\.txt$',
topics_intro => 'Topics in this digest:',
# next 2 elements' values must be double-quoted
post_topics_delimiter => "________________________________________________________________________\n________________________________________________________________________\n\n",
source_msg_delimiter => "________________________________________________________________________\n________________________________________________________________________\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^\s+From:\s+(.+)$',
org_style_flag => undef,
to_style_flag => undef,
cc_style_flag => undef,
subject_style_flag => '^Subject:\s+(.+)$',
date_style_flag => '^\s+Date:\s+(.+)$',
reply_to_style_flag => undef,
MIME_cleanup_flag => 0,
},
pw32u => {
# grep_formula => 'Perl-Win32-Users Digest',
# pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt$',
grep_formula => 'Perl-Win32-Users_Digest',
pattern_target => '.*Vol_(\d+),_Issue_(\d+)\.txt$',
topics_intro => 'Today\'s Topics:',
# next 2 elements' value must' be double-quoted
post_topics_delimiter => "----------------------------------------------------------------------\n\n",
source_msg_delimiter => "------------------------------\n\n",
message_style_flag => '^Message:\s+(\d+)$',
from_style_flag => '^From:\s+(.+)$',
org_style_flag => '^Organization:\s+(.+)$',
to_style_flag => '^To:\s+(.+)$',
cc_style_flag => '^CC:\s+(.+)$',
( run in 0.407 second using v1.01-cache-2.11-cpan-26ccb49234f )