App-MediaWiki2Git
view release on metacpan or search on metacpan
can be fetched in one query. This is likely to be 50, 500 or 5000
depending on context. This module uses 500 as the default.
pages
This list defines the pages to be fetched.
TODO: we could populate 'pages' from a category list at the start of
each run.
dns_qual
(Optional, for local use only.) This key is appended to unqualified
hostnames in ~/.ssh/ssh-config.yaml, if you have that file.
_page_revs
For internal use. This hash of pagename to last fetched revision id
is used to avoid querying the api for previous page revisions.
It is the main reason why the configuration file must be rewritten
and committed along with the copies of the pages.
Updating
The configuration is extended in-memory, (atomically) replaced on disk,
and committed as fetching progresses.
XXX: errors during a run can leave the config out of sync with the
committed pages so page revisions may get committed again. One solution
would be to "reset --hard" to the last config save commit. This could be
automated, at some cost to the principle of least surprise.
OTHER COMPONENTS
MediaWiki interface
This is used read-only and anonymously (assumes it does not need to log
in).
Git interface
Uses Git::Repository to drive Git upon the current directory. There is
no configuration.
It is assumed that the previous requirement for the existence of the
configuration file is enough of a sanity check, to prevent messing with
any other Git repositories' history.
It currently performs only "git add" and "git commit" operations, but
might want to "git reset --hard" later. This should probably require
permission from the configuration.
Page tracking
Configuration lists the pages to fetch, and the last revision fetched
per page.
The Git author is constructed from the page information, including some
post-processing to attempt to improve the usefulness of anonymous (IP
address logged) edits.
The committer and commit timestamp are left to be picked up from the
environment. This means that Git commitids will not be reproducible
between different runs of this code on the same page revisions.
Hostname lookup
When users do not log in, we get their IP address. When this is a web
proxy, we learn nothing; but in a company it is often a one-user desktop
machine.
We do a reverse lookup in the DNS (IPv4) to get a hostname. Results are
cached during the run and errors are written out as warnings.
Beware that looking up historically-recorded IP addresses against the
current DNS is likely to generate surprises.
Hostname to user lookup
You may safely ignore this part of the code.
If the custom username-to-hostname mapping is present, we include in the
"anonymous" author info the result of a lookup.
This is a mapping I maintain to generate ssh host aliases, to assist
with internal user support. The tool using it is small and not (yet)
published.
AUTHOR
Copyright (C) 2011 Genome Research Limited
Author Matthew Astley mca@sanger.ac.uk
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
( run in 0.764 second using v1.01-cache-2.11-cpan-5a3173703d6 )