
Algorithm-History-Levels


lib/Algorithm/History/Levels.pm view on Meta::CPAN

package Algorithm::History::Levels;

our $DATE = '2017-06-14'; # DATE
our $VERSION = '0.001'; # VERSION

use 5.010001;
use strict;
use warnings;

use Exporter qw(import);
our @EXPORT_OK = qw(group_histories_into_levels);

our %SPEC;

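# Find the first history whose timestamp ($histories->[$i][1]) lies within
# [$min_time, $max_time], remove it from @$histories and return it.
# Returns undef if no history falls inside the window.
sub _pick_history {
    my ($histories, $min_time, $max_time) = @_;
    for my $i (0..$#{$histories}) {
        if ($histories->[$i][1] >= $min_time &&
                $histories->[$i][1] <= $max_time) {
            # splice removes the entry so it cannot be picked again
            return splice(@$histories, $i, 1);
        }
    }
    undef;
}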

$SPEC{group_histories_into_levels} = {
    v => 1.1,
    summary => 'Group histories into levels',
    description => <<'_',

This routine can group a single, linear history into levels. This is best
explained by an example. Suppose you produce daily database backups. Your backup
files are named:

    mydb.2017-06-13.sql.gz
    mydb.2017-06-12.sql.gz
    mydb.2017-06-11.sql.gz
    mydb.2017-06-10.sql.gz
    mydb.2017-06-09.sql.gz
    ...

After a while, your backups grow into tens and then hundreds of dump files. You
typically want to keep only a certain number of backups, for example 7 daily,
4 weekly, and 6 monthly backups (so you effectively have 6 months of history but
need to store only 7 + 4 + 6 = 17 dumps instead of 6*30 = 180). This routine
helps you select which files to keep and which to discard.

You provide the list of histories either in the form of Unix timestamps:

    [1497286800, 1497200400, 1497114000, ...]

or in the form of `[name, timestamp]` pairs, e.g.:

    [
      ['mydb.2017-06-13.sql.gz', 1497286800],
      ['mydb.2017-06-12.sql.gz', 1497200400],
      ['mydb.2017-06-11.sql.gz', 1497114000],
      ...
    ]

Duplicate timestamps are allowed, but duplicate names are not. If a list of plain
timestamps is given, each name is taken to be the timestamp itself, so there
must be no duplicate timestamps.

Then, you specify the levels with a list of `[period, num-in-this-level]` pairs.
For example, 7 daily + 4 weekly + 6 monthly can be specified using:

    [
      [86400, 7],
      [7*86400, 4],
      [30*86400, 6],
    ]

Each level must have a greater period than the level before it.

This routine returns a hash. The `levels` key contains the history names,
grouped into levels. The `discard` key contains the list of history names to
discard:

    {
      levels => [

        # histories for the first level
        ['mydb.2017-06-13.sql.gz',
         'mydb.2017-06-12.sql.gz',
         'mydb.2017-06-11.sql.gz',
         'mydb.2017-06-10.sql.gz',
         'mydb.2017-06-09.sql.gz',
         'mydb.2017-06-08.sql.gz',
         'mydb.2017-06-07.sql.gz'],

        # histories for the second level
        ['mydb.2017-06-06.sql.gz',
         'mydb.2017-05-30.sql.gz',
         'mydb.2017-05-23.sql.gz',
         'mydb.2017-05-16.sql.gz'],

        # histories for the third level
        ['mydb.2017-06-05.sql.gz',
         'mydb.2017-05-06.sql.gz',
         'mydb.2017-04-06.sql.gz',
         ...],

      discard => [


            }
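            # take as many entries from @filler as this level still has
            # unfilled slots
            my @sample = Array::Sample::Partition::sample_partition(
                \@filler, $num_per_level - $num_filled);
            # merge the sampled entries into this level, newest first
            $res->{levels}[$l] = [
                sort { $b->[1] <=> $a->[1] }
                    (@{ $res->{levels}[$l] }, @sample),
            ];
            # remove the sampled entries from the remaining pool; iterate in
            # reverse so splice does not shift indexes not yet visited, and
            # compare by reference identity (stringified refs)
            for my $i (reverse 0..$#histories) {
                for my $j (0..$#sample) {
                    if ($histories[$i] eq $sample[$j]) {
                        splice @histories, $i, 1;
                        last;
                    }
                }
            }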
        }

        # only return names
        $res->{levels}[$l] = [ map {$_->[0]} @{ $res->{levels}[$l] } ];
    }

    push @{ $res->{discard} }, $_->[0] for @histories;

  END:
    $res;
}

1;
# ABSTRACT: Group histories into levels

__END__

=pod

=encoding UTF-8

=head1 NAME

Algorithm::History::Levels - Group histories into levels

=head1 VERSION

This document describes version 0.001 of Algorithm::History::Levels (from Perl distribution Algorithm-History-Levels), released on 2017-06-14.

=head1 SYNOPSIS

 use Algorithm::History::Levels qw(group_histories_into_levels);

=head1 FUNCTIONS


=head2 group_histories_into_levels

Usage:

 group_histories_into_levels(%args) -> any

Group histories into levels.

This routine can group a single, linear history into levels. This is best
explained by an example. Suppose you produce daily database backups. Your backup
files are named:

 mydb.2017-06-13.sql.gz
 mydb.2017-06-12.sql.gz
 mydb.2017-06-11.sql.gz
 mydb.2017-06-10.sql.gz
 mydb.2017-06-09.sql.gz
 ...

After a while, your backups grow into tens and then hundreds of dump files. You
typically want to keep only a certain number of backups, for example 7 daily,
4 weekly, and 6 monthly backups (so you effectively have 6 months of history but
need to store only 7 + 4 + 6 = 17 dumps instead of 6*30 = 180). This routine
helps you select which files to keep and which to discard.

You provide the list of histories either in the form of Unix timestamps:

 [1497286800, 1497200400, 1497114000, ...]

or in the form of C<[name, timestamp]> pairs, e.g.:

 [
   ['mydb.2017-06-13.sql.gz', 1497286800],
   ['mydb.2017-06-12.sql.gz', 1497200400],
   ['mydb.2017-06-11.sql.gz', 1497114000],
   ...
 ]
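
The backup filenames above already encode their dates, so building the
C<[name, timestamp]> pairs can be automated. The following is a minimal sketch,
not part of the module: the helper C<files_to_histories> and the
C<NAME.YYYY-MM-DD.sql.gz> pattern are assumptions for illustration, and dates are
read as midnight UTC using C<timegm> from the core Time::Local module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper (not part of Algorithm::History::Levels): turn backup
# filenames of the form NAME.YYYY-MM-DD.sql.gz into [name, timestamp] pairs.
# Dates are interpreted as midnight UTC (an assumption of this sketch).
sub files_to_histories {
    my @files = @_;
    my @histories;
    for my $file (@files) {
        my ($y, $m, $d) = $file =~ /\.(\d{4})-(\d{2})-(\d{2})\.sql\.gz\z/
            or next;    # skip files that do not match the naming scheme
        # timegm takes a 0-based month and accepts a four-digit year
        push @histories, [$file, timegm(0, 0, 0, $d, $m - 1, $y)];
    }
    return \@histories;
}

my $histories = files_to_histories(
    'mydb.2017-06-13.sql.gz',
    'mydb.2017-06-12.sql.gz',
    'README',
);
printf "%s %d\n", @$_ for @$histories;
```

Because it uses UTC, this sketch yields 1497312000 for 2017-06-13, whereas the
example timestamps above (1497286800, ...) correspond to midnight at UTC+7; use
C<timelocal> instead if you want local midnight.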

Duplicate timestamps are allowed, but duplicate names are not. If a list of plain
timestamps is given, each name is taken to be the timestamp itself, so there
must be no duplicate timestamps.
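
The input rules above can be expressed as a small normalization step. This is a
hypothetical sketch: C<normalize_histories> is not part of the module's API, and
the module may perform these checks differently. Plain timestamps become
C<[timestamp, timestamp]> pairs, and any repeated name is rejected.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): accept either a list of
# plain timestamps or a list of [name, timestamp] pairs, return pairs, and
# reject duplicate names. Duplicate timestamps are fine in pair form.
sub normalize_histories {
    my ($histories) = @_;
    my (@pairs, %seen);
    for my $h (@$histories) {
        # a plain timestamp doubles as its own name
        my $pair = ref $h eq 'ARRAY' ? [@$h] : [$h, $h];
        die "Duplicate history name '$pair->[0]'\n" if $seen{ $pair->[0] }++;
        push @pairs, $pair;
    }
    return \@pairs;
}

# Pair form: two different names may share a timestamp.
my $pairs = normalize_histories([ ['a.sql.gz', 100], ['b.sql.gz', 100] ]);

# Timestamp form: duplicates are rejected because names equal timestamps.
my $ok = eval { normalize_histories([100, 100]); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```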

Then, you specify the levels with a list of C<[period, num-in-this-level]> pairs.
For example, 7 daily + 4 weekly + 6 monthly can be specified using:

 [
   [86400, 7],
   [7*86400, 4],
   [30*86400, 6],
 ]

Each level must have a greater period than the level before it.
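
The level specification can be checked, and the number of retained histories
computed, with a few lines of Perl. The helper C<check_levels> below is
hypothetical and not part of the module; it only encodes the rule stated above
(strictly increasing periods) and sums the per-level counts.

```perl
use strict;
use warnings;

# Hypothetical check (not the module's own code): each level is
# [period_in_seconds, num_in_level] and periods must strictly increase.
sub check_levels {
    my ($levels) = @_;
    my $total = 0;
    for my $i (0 .. $#$levels) {
        my ($period, $num) = @{ $levels->[$i] };
        die "Level $i: period must be greater than the previous level's\n"
            if $i > 0 && $period <= $levels->[$i - 1][0];
        $total += $num;
    }
    return $total;    # number of histories kept if every slot is filled
}

my $kept = check_levels([ [86400, 7], [7*86400, 4], [30*86400, 6] ]);
print "up to $kept histories kept\n";    # 7 + 4 + 6 = 17

# Periods out of order: the weekly level comes before the daily one.
my $valid = eval { check_levels([ [7*86400, 4], [86400, 7] ]); 1 };
print $valid ? "valid\n" : "invalid: $@";
```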

This routine returns a hash. The C<levels> key contains the history names,
grouped into levels. The C<discard> key contains the list of history names to
discard:

 {
   levels => [
 
     # histories for the first level
     ['mydb.2017-06-13.sql.gz',
      'mydb.2017-06-12.sql.gz',
      'mydb.2017-06-11.sql.gz',
      'mydb.2017-06-10.sql.gz',
      'mydb.2017-06-09.sql.gz',
      'mydb.2017-06-08.sql.gz',
      'mydb.2017-06-07.sql.gz'],
 
     # histories for the second level
     ['mydb.2017-06-06.sql.gz',
      'mydb.2017-05-30.sql.gz',
      'mydb.2017-05-23.sql.gz',
      'mydb.2017-05-16.sql.gz'],
 
     # histories for the third level
     ['mydb.2017-06-05.sql.gz',
      'mydb.2017-05-06.sql.gz',
      'mydb.2017-04-06.sql.gz',
      ...],
 
   discard => [
