App-MediaWiki2Git

 view release on metacpan or  search on metacpan

lib/App/MediaWiki2Git.pm  view on Meta::CPAN


has git => (is => 'rw', isa => 'Git::Repository', lazy_build => 1);

sub _build_git {
    my ($self) = @_;
    return Git::Repository->new();
}


=head2 Page tracking

Configuration lists the pages to fetch, and the last revision fetched
per page.

The Git author is constructed from the page information, including
some post-processing to attempt to improve the usefulness of anonymous
(IP address logged) edits.

The committer and commit timestamp are left to be picked up from the
environment.  This means that Git commitids will not be reproducible
between different runs of this code on the same page revisions.

=cut

sub pages {
    my ($self) = @_;
    my $p = $self->config_for(pages => 'ARRAY');
    die "No pages listed in configuration" unless @$p;
    return @$p;
}


sub page_lastrev {
    my ($self, $pagename, $new_lastrev) = @_;
    my $revs = $self->config->{_page_revs} ||= {};
    $revs->{$pagename} = $new_lastrev if defined $new_lastrev;
    return $revs->{$pagename} || 0;
}


# Destructively takes out all the page content
sub _save_revs {
    my ($self, $pageid, $page) = @_;

    foreach my $rev (@{ $page->{revisions} }) {
        $self->_save_page($page->{title}, $rev);
    }

    $self->config_save;
}


sub _save_page {
    my ($self, $pagename, $props) = @_;

    my $fn = $pagename;
    write_file($fn, { atomic => 1 }, delete $props->{'*'});

    my $author;
    if ($props->{user} =~ m{^[0-9.:]+$}) {
        # user looks like an IP address.

        # exists $props->{anon} # looks promising, but broken in early
        # revs (made with early mediawiki?)
        $author = $self->anon2author($props->{user});
    } else {
        $author = sprintf('%s <%s>', ($props->{user}) x 2);
    }

    my $msg = sprintf
      ("Edit: %s (rev%s) %s\n\n%s",
       $pagename, $props->{revid},
       $props->{comment} || '',
       Dump($props));

    printf("[%s %s] %s\n", $pagename, $props->{revid}, $author);
    $self->git->run(add => $fn);
    $self->git->run
      (commit => '-q',
       -m => $msg,
       -o => $fn,
       '--author' => $author,
       '--date' => $props->{timestamp});

    $self->page_lastrev($pagename, $props->{revid});
}


=head2 Hostname lookup

When users do not log in, we get their IP address.  When this is a web
proxy, we learn nothing; but in a company it is often a one-user
desktop machine.

We do a reverse lookup in the DNS (IPv4) to get a hostname.  Results
are cached during the run and errors are written out as warnings.

Beware that looking up historically-recorded IP addresses against the
current DNS is likely to generate surprises.

=head2 Hostname to user lookup

You may safely ignore this part of the code.

If the custom username-to-hostname mapping is present, we include in
the "anonymous" author info the result of a lookup.

This is a mapping I maintain to generate ssh host aliases, to assist
with internal user support.  The tool using it is small and not (yet)
published.

=cut

has _ptrcache => (is => 'ro', default => sub { {} });
has resolver => (is => 'ro', default => sub { Net::DNS::Resolver->new });
has host2nick => (is => 'rw', isa => 'HashRef', lazy_build => 1);


sub anon2author {
    my ($self, $ip) = @_;
    my $host = $self->ip2host($ip);
    my $nick = $self->host2nick->{$host} || $ip;



( run in 1.940 second using v1.01-cache-2.11-cpan-437f7b0c052 )