Mail-Digest-Tools

 view release on metacpan or  search on metacpan

Tools.pm  view on Meta::CPAN

    print "$tokill files deleted\n";
    $tokill ? print KILL $killstr : print KILL "[None.]\n";
    close KILL or die "Couldn't close $killfile after writing: $!";
}

sub _get_digest_list {
    my ($config_in_ref, $config_out_ref) = @_;
    opendir(DIR, ${$config_out_ref}{'dir_digest'}) || die "no ${$config_out_ref}{'dir_digest'}?: $!";
    my @digests = 
        sort { lc($a) cmp lc($b) } 
        grep { /${$config_in_ref}{'grep_formula'}/ } 
        readdir(DIR);
    closedir(DIR) || die "Could not close ${$config_out_ref}{'dir_digest'}: $!";
    return \@digests;
}

sub _prep_source_file {
    my ($config_in_ref, $config_out_ref, $digests_ref) = @_;  # v1.94
    # %in_out: hash of all instances in directory of a given digest, 
    # value refers to digest's title and its message topics
    my (%in_out, $id);

Tools.pm  view on Meta::CPAN


Similarly, the Perl Beginner digest files look like this:

    [PBML] Digest Number 1491.txt
    [PBML] Digest Number 1492.txt

To correctly identify Perl-Win32-Users digest files from any other files in 
the same directory, we compose a string which would form the core of a Perl 
regular expression, I<i.e.,> everything in a pattern except the outer 
delimiters.  Internally, Mail::Digest::Tools passes the file name through a 
C<grep { /regexp/ }> pattern, so the first key is called C<grep_formula>.

    %pw32u_config_in = (
        grep_formula            => 'Perl-Win32-Users Digest',
        ...
    );

The equivalent pattern for the Perl Beginners digest would be:

    %pbml_config_in = (
        grep_formula            => '\[PBML\]',
        ...
    );

Note that the C<[> and C<]> characters have to be escaped with a C<\> 
backslash because they are normally metacharacters inside Perl regular 
expressions.

We next have to extract the digest number from the digest's file name.  
Certain mailing list programs give individual digests both a 'Volume' number 
as well as an individual digest number.  Perl-Win32-Users typifies this.  In 
the example above we need to capture both the C<1> as volume number and C<1771> 
as digest number.  The next key in our configuration hash is called 
C<pattern_target>:

    %pw32u_config_in = (
        grep_formula            => 'Perl-Win32-Users Digest',
        pattern_target          => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
        ...
    );

Note the two sets of capturing parentheses.

Other digests, such as those at Yahoo! Groups, dispense with a volume number 
and simply increment each digest number:

    %pbml_config_in = (
        grep_formula            => '\[PBML\]',
        pattern_target          => '.*\s(\d+)\.txt$',
        ...
    );

Note that this C<pattern_target> contains only one pair of capturing 
parentheses.

=head2 Analysis of Digest's Internal Structure

A digest's internal structure is discussed in detail above (see 
'A TYPICAL MAILING LIST DIGEST').  Here we need to identify two 
characteristics:  the way the digest introduces its list of today's topics 
and the string it uses to delimit the list of today's topics from the first 
individual message in the digest and all subsequent messages from one another.  
Continuing with our two examples from above, we provide values for keys 
C<topics_intro> and C<source_msg_delimiter>: 

    %pw32u_config_in = (
        grep_formula            => 'Perl-Win32-Users digest',
        pattern_target          => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
        topics_intro            => 'Today\'s Topics:',
        source_msg_delimiter    => "--__--__--\n\n",
        ...
    );

Note the escaped C<'> apostrophe character in the value for key 
C<topics_intro>.

    %pbml_config_in = (
        grep_formula            => '\[PBML\]',
        pattern_target          => '.*\s(\d+)\.txt$',
        topics_intro            => 'Topics in this digest:',
        source_msg_delimiter    => "________________________________________________________________________\n________________________________________________________________________\n\n",
        ...
    );

Note that the values provided for the respective C<source_msg_delimiter> keys 
had to be double-quoted strings.  That's because all such delimiters include 
two or more C<\n> newline characters so that they form paragraphs unto 
themselves.  Unless indicated otherwise, the values for all other values in 

Tools.pm  view on Meta::CPAN

    Organization: Some Web Address
    Reply-To: Sam Spade <sspade@some.web.address.com>
    To: "Time" <summers@some.web.address.com>
    CC: "Perl List" <perl-win32-users@listserv.activestate.com>
    Subject: Re: New IE Update causes script problems

Patterns are easily developed to capture this information and store it in the 
configuration hash:

    %pw32u_config_in = (
        grep_formula            => 'Perl-Win32-Users digest',
        pattern_target          => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
        topics_intro            => 'Today\'s Topics:',
        source_msg_delimiter    => "--__--__--\n\n",
        message_style_flag      => '^Message:\s+(\d+)$',
        from_style_flag         => '^From:\s+(.+)$',
        org_style_flag          => '^Organization:\s+(.+)$',
        to_style_flag           => '^To:\s+(.+)$',
        cc_style_flag           => '^CC:\s+(.+)$',
        subject_style_flag      => '^Subject:\s+(.+)$',
        date_style_flag         => '^Date:\s+(.+)$',

Tools.pm  view on Meta::CPAN


    Message: 4
       Date: Sun, 7 Dec 2003 19:24:03 +1100
       From: Philip Streets <phil@some.web.address.com.au>
    Subject: RH9.0, perl 5.8.2 and qmail-localfilter question

The patterns developed to capture this information and store it in the 
configuration hash would be as follows:

    %pbml_config_in = (
        grep_formula            => '\[PBML\]',
        pattern_target          => '.*\s(\d+)\.txt$',
        topics_intro            => 'Topics in this digest:',
        source_msg_delimiter    => "________________________________________________________________________\n________________________________________________________________________\n\n",
        message_style_flag      => '^Message:\s+(\d+)$',
        from_style_flag         => '^\s+From:\s+(.+)$',
        subject_style_flag      => '^Subject:\s+(.+)$',
        date_style_flag         => '^\s+Date:\s+(.+)$',
        ...
    );

Tools.pm  view on Meta::CPAN

be able to determine whether the digest needs a dose of digital castor oil or 
not, and you set key C<MIME_cleanup_flag> accordingly.  If the digest contains 
unnecessary multipart MIME content, you set this flag to C<1>; otherwise, to 
C<0>.

And with that you have completed your analysis of the internal structure of a 
given digest and entered the relevant information into the first configuration 
hash:

    %pw32u_config_in = (
        grep_formula            => 'Perl-Win32-Users digest',
        pattern_target          => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
        topics_intro            => 'Today\'s Topics:',
        source_msg_delimiter    => "--__--__--\n\n",
        message_style_flag      => '^Message:\s+(\d+)$',
        from_style_flag         => '^From:\s+(.+)$',
        org_style_flag          => '^Organization:\s+(.+)$',
        to_style_flag           => '^To:\s+(.+)$',
        cc_style_flag           => '^CC:\s+(.+)$',
        subject_style_flag      => '^Subject:\s+(.+)$',
        date_style_flag         => '^Date:\s+(.+)$',
        reply_to_style_flag     => '^Reply-To:\s+(.+)$',
        MIME_cleanup_flag       => 1,
    );

    %pbml_config_in = (
        grep_formula            => '\[PBML\]',
        pattern_target          => '.*\s(\d+)\.txt$',
        topics_intro            => 'Topics in this digest:',
        source_msg_delimiter    => "________________________________________________________________________\n________________________________________________________________________\n\n",
        message_style_flag      => '^Message:\s+(\d+)$',
        from_style_flag         => '^\s+From:\s+(.+)$',
        subject_style_flag      => '^Subject:\s+(.+)$',
        date_style_flag         => '^\s+Date:\s+(.+)$',
        MIME_cleanup_flag       => 0,
    );

Tools.pm  view on Meta::CPAN

'require' that file into your script.  That way, any changes you make in the 
configuration will be automatically picked up by each script that calls a 
Mail::Digest::Tools function.

Here is an example of such a file holding the configuration hashes governing 
use of the Perl-Win32-Users digest, along with a script making use of that file.

    # file:  pw32u.digest.data
    $topdir = "E:/Digest/pw32u";
    %config_in =  (
         grep_formula           => 'Perl-Win32-Users digest',
         pattern_target          => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
         # next element's value must be double-quoted
         source_msg_delimiter   => "--__--__--\n\n",
         topics_intro           => 'Today\'s Topics:',
         message_style_flag     => '^Message:\s+(\d+)$',
         from_style_flag        => '^From:\s+(.+)$',
         org_style_flag         => '^Organization:\s+(.+)$',
         to_style_flag          => '^To:\s+(.+)$',
         cc_style_flag          => '^CC:\s+(.+)$',
         subject_style_flag     => '^Subject:\s+(.+)$',

Tools.pm  view on Meta::CPAN

        [PBML] Digest Number 1492.txt
        digest_log.txt
        Threads/

File F<digest.data> would look like this:

    # digest.data
    $topdir = "E:/Digest";
    %digest_structure = (
        pbml =>    {
             grep_formula   => '\[PBML\]',
             pattern_target => '.*\s(\d+)\.txt$',
             ...
           },
        pw32u =>   {
             grep_formula   => 'Perl-Win32-Users digest',
             pattern_target => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt',
             ...
           },
    );
    %digest_output_format = (
        pbml =>    {
             title          => 'Perl Beginner',
             dir_digest     => "$topdir/pbml",
             dir_threads    => "$topdir/pbml/Threads",
             ...

t/01.t  view on Meta::CPAN

while ( ($k, $v) = each %{$digest_output_format{'pbml'}} ) {
    $pbml_config_out{$k} = $v;
}
while ( ($k, $v) = each %{$digest_structure{'pw32u'}} ) {
    $pw32u_config_in{$k} = $v;
}
while ( ($k, $v) = each %{$digest_output_format{'pw32u'}} ) {
    $pw32u_config_out{$k} = $v;
}

ok(  defined $pbml_config_in{'grep_formula'});# 6
ok(  defined $pbml_config_in{'pattern_target'});# 7
ok(  defined $pbml_config_in{'topics_intro'});# 8
ok(  defined $pbml_config_in{'post_topics_delimiter'});# 9
ok(  defined $pbml_config_in{'source_msg_delimiter'});# 10
ok(  defined $pbml_config_in{'topics_intro'});# 11
ok(  defined $pbml_config_in{'message_style_flag'});# 12
ok(  defined $pbml_config_in{'from_style_flag'});# 13
ok(! defined $pbml_config_in{'org_style_flag'});# 14
ok(! defined $pbml_config_in{'to_style_flag'});# 15
ok(! defined $pbml_config_in{'cc_style_flag'});# 16

t/01.t  view on Meta::CPAN

ok(  defined $pbml_config_out{'mimelog'});# 31
ok(  defined $pbml_config_out{'id_format'});# 32
ok(  defined $pbml_config_out{'output_id_format'});# 33
ok(  defined $pbml_config_out{'MIME_cleanup_log_flag'});# 34
ok(  defined $pbml_config_out{'thread_msg_delimiter'});# 35
ok(  defined $pbml_config_out{'archive_kill_trigger'});# 36
ok(  defined $pbml_config_out{'archive_kill_days'});# 37
ok(  defined $pbml_config_out{'digests_read_flag'});# 38
ok(  defined $pbml_config_out{'archive_config'});# 39

ok(  defined $pw32u_config_in{'grep_formula'});# 40
ok(  defined $pw32u_config_in{'pattern_target'});# 41
ok(  defined $pw32u_config_in{'topics_intro'});# 42
ok(  defined $pw32u_config_in{'post_topics_delimiter'});# 43
ok(  defined $pw32u_config_in{'source_msg_delimiter'});# 44
ok(  defined $pw32u_config_in{'message_style_flag'});# 45
ok(  defined $pw32u_config_in{'from_style_flag'});# 46
ok(  defined $pw32u_config_in{'org_style_flag'});# 47
ok(  defined $pw32u_config_in{'to_style_flag'});# 48
ok(  defined $pw32u_config_in{'cc_style_flag'});# 49
ok(  defined $pw32u_config_in{'subject_style_flag'});# 50

t/02.t  view on Meta::CPAN

    $pbml_tp[7] => [ qw|
      00003_0003
      00003_0005
      00003_0006
        | ],
);
# determine number of digests needing processing

opendir DIG, $pbml_digdir or die "Couldn't open directory $pbml_digdir: $!";
$log{'digs'} = scalar(
    grep { /$pbml_config_in{'grep_formula'}/ } readdir DIG);
closedir DIG or die "Couldn't close directory $pbml_digdir: $!";
ok($log{'digs'} == 3, '3 pbml digests found before processing'); # 3

# verify log files are empty or do not yet exist

my $dl  = $pbml_config_out{'digests_log'};
my $dr  = $pbml_config_out{'digests_read'};
my $drf = $pbml_config_out{'digests_read_flag'};
my $tt  = $pbml_config_out{'todays_topics'};

t/03.t  view on Meta::CPAN

      001_0002_0009
      001_0002_0010
      001_0002_0011
      001_0002_0012
        | ],
);
# determine number of digests needing processing

opendir DIG, $pw32u_digdir or die "Couldn't open directory $pw32u_digdir: $!";
$log{'digs'} = scalar(
    grep { /$pw32u_config_in{'grep_formula'}/ } readdir DIG);
closedir DIG or die "Couldn't close directory $pw32u_digdir: $!";
ok($log{'digs'} == 2, '2 pw32u digests found before processing'); # 3

# verify log files are empty or do not yet exist

my $dl  = $pw32u_config_out{'digests_log'};
my $dr  = $pw32u_config_out{'digests_read'};
my $drf = $pw32u_config_out{'digests_read_flag'};
my $tt  = $pw32u_config_out{'todays_topics'};

t/samples/digest.data  view on Meta::CPAN

# digest.data
# as of 2/22/04:  being adapted for 
# changes to pw32u digest format
use Cwd;

$makefile_workdir = cwd();
$topdir = "$makefile_workdir/t/samples";

%digest_structure = (
    pbml =>    {
         grep_formula               => '\[PBML\]',
#         pattern_target             => '.*\s(\d+)\.txt$',
         pattern_target             => '.*_(\d+)\.txt$',
         topics_intro               => 'Topics in this digest:',
         # next 2 elements' values must be double-quoted
         post_topics_delimiter      => "________________________________________________________________________\n________________________________________________________________________\n\n",
         source_msg_delimiter       => "________________________________________________________________________\n________________________________________________________________________\n\n",
         message_style_flag         => '^Message:\s+(\d+)$',
         from_style_flag            => '^\s+From:\s+(.+)$',
         org_style_flag             => undef,
         to_style_flag              => undef,
         cc_style_flag              => undef,
         subject_style_flag         => '^Subject:\s+(.+)$',
         date_style_flag            => '^\s+Date:\s+(.+)$',
         reply_to_style_flag        => undef,
         MIME_cleanup_flag          => 0,
       },
    pw32u =>   {
#         grep_formula               => 'Perl-Win32-Users Digest',
#         pattern_target             => '.*Vol\s(\d+),\sIssue\s(\d+)\.txt$',
         grep_formula               => 'Perl-Win32-Users_Digest',
         pattern_target             => '.*Vol_(\d+),_Issue_(\d+)\.txt$',
         topics_intro               => 'Today\'s Topics:',
         # next 2 elements' value must' be double-quoted
         post_topics_delimiter      => "----------------------------------------------------------------------\n\n",
         source_msg_delimiter       => "------------------------------\n\n",
         message_style_flag         => '^Message:\s+(\d+)$',
         from_style_flag            => '^From:\s+(.+)$',
         org_style_flag             => '^Organization:\s+(.+)$',
         to_style_flag              => '^To:\s+(.+)$',
         cc_style_flag              => '^CC:\s+(.+)$',



( run in 0.407 second using v1.01-cache-2.11-cpan-26ccb49234f )