App-MediaWiki2Git

        can be fetched in one query. This is likely to be 50, 500 or 5000
        depending on context. This module uses 500 as the default.

    pages
        This list defines the pages to be fetched.

        TODO: we could populate 'pages' from a category list at the start of
        each run.

    dns_qual
        (Optional, for local use only.) The value of this key is appended to
        unqualified hostnames found in ~/.ssh/ssh-config.yaml, if you have
        that file.

    _page_revs
        For internal use. This hash maps each page name to the last fetched
        revision id, so the API is not queried again for revisions that have
        already been fetched.

        It is the main reason why the configuration file must be rewritten
        and committed along with the copies of the pages.
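    A minimal sketch of how "_page_revs" and the per-query batch size might
    drive incremental fetching, written in Python rather than the module's
    Perl. It assumes the standard MediaWiki query API (prop=revisions with
    rvdir, rvstartid, rvlimit, and "continue" paging) and formatversion=1
    JSON output. The "fetch" callable, which performs one API request, is
    hypothetical, and the module's real queries may differ.

```python
from typing import Callable, Dict, Iterator

# Hypothetical: performs one MediaWiki API GET and returns the decoded JSON.
Fetch = Callable[[dict], dict]


def revision_query(page: str, page_revs: Dict[str, int], limit: int = 500) -> dict:
    """Build query parameters for revisions of `page`, oldest first."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": page,
        "rvprop": "ids|timestamp|user|comment|content",
        "rvdir": "newer",       # oldest first, so commits follow wiki history
        "rvlimit": str(limit),  # batch size; the server may cap it lower
        "continue": "",         # opt in to "continue"-style paging
    }
    last = page_revs.get(page)
    if last is not None:
        # rvstartid is inclusive: the last fetched revision comes back again
        params["rvstartid"] = str(last)
    return params


def new_revisions(page: str, page_revs: Dict[str, int], fetch: Fetch) -> Iterator[dict]:
    """Yield revisions of `page` newer than the one recorded in `page_revs`."""
    params = revision_query(page, page_revs)
    last = page_revs.get(page, 0)
    while True:
        response = fetch(params)
        # formatversion=1 returns query.pages as a dict keyed by page id
        for info in response.get("query", {}).get("pages", {}).values():
            for rev in info.get("revisions", []):
                if rev["revid"] > last:  # drop the already-fetched revision
                    yield rev
        cont = response.get("continue")
        if not cont:
            return
        params = {**params, **cont}  # carry rvcontinue etc. into the next request
```

    After committing each yielded revision, the caller would record it with
    page_revs[page] = rev["revid"], which is why the configuration has to be
    saved alongside the pages.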

  Updating
    The configuration is extended in memory, atomically replaced on disk, and
    committed as fetching progresses.
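    The usual way to replace a file atomically is to write a temporary file
    in the same directory, flush it to disk, and rename it over the original.
    A minimal Python sketch follows; JSON stands in for the real
    configuration format, which this README does not specify.

```python
import json
import os
import tempfile


def save_config(path: str, config: dict) -> None:
    """Atomically replace `path` with the serialised `config`."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(config, fh, indent=2, sort_keys=True)
            fh.write("\n")
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```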

    XXX: errors during a run can leave the config out of sync with the
    committed pages, so some page revisions may be committed again. One
    solution would be to "git reset --hard" to the last commit that saved the
    config. This could be automated, at some cost to the principle of least
    surprise.

OTHER COMPONENTS
  MediaWiki interface
    This is used read-only and anonymously; it assumes no login is needed.

  Git interface
    Uses Git::Repository to drive Git in the current directory. This
    component has no configuration of its own.

    It is assumed that the earlier requirement for the configuration file to
    exist is a sufficient sanity check to prevent tampering with the history
    of any other Git repository.

    It currently performs only "git add" and "git commit" operations, but may
    need "git reset --hard" later. That should probably require explicit
    permission in the configuration.
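    The recovery idea from the Updating section, gated by the configuration
    as suggested here, could be sketched in Python as follows. The
    "allow_reset" key and the function names are hypothetical; the Git
    commands themselves ("git log -1 --format=%H -- <path>" and
    "git reset --hard <commit>") are standard. The runner is injectable so
    the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run git in the current directory and return its standard output."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout


def reset_to_last_config_commit(config: dict, config_path: str, run: Runner = run_git) -> str:
    """Discard commits made after the config was last saved (hypothetical recovery)."""
    if not config.get("allow_reset"):  # hypothetical opt-in key
        raise PermissionError("configuration does not permit 'git reset --hard'")
    # Most recent commit that touched the configuration file.
    sha = run(["log", "-1", "--format=%H", "--", config_path]).strip()
    if not sha:
        raise RuntimeError(f"no commit has touched {config_path}")
    run(["reset", "--hard", sha])  # note: also discards uncommitted changes
    return sha
```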

  Page tracking
    Configuration lists the pages to fetch, and the last revision fetched
    per page.

    The Git author is constructed from the page information, including some
    post-processing to attempt to improve the usefulness of anonymous (IP
    address logged) edits.
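    One way to turn a revision record into a Git author is sketched below in
    Python. It assumes MediaWiki marks IP edits with "anon" (an empty string
    in formatversion=1, true in formatversion=2) and puts the IP address in
    the "user" field. The e-mail formats and the "wiki.invalid" domain are
    hypothetical placeholders, not the module's actual output.

```python
from typing import Callable, Optional, Tuple


def is_anon(rev: dict) -> bool:
    """True for IP-address edits ("anon": "" or "anon": true)."""
    return "anon" in rev and rev["anon"] is not False


def git_author(rev: dict,
               lookup_host: Callable[[str], Optional[str]] = lambda ip: None,
               mail_domain: str = "wiki.invalid") -> Tuple[str, str]:
    """Return a (name, email) pair for a revision; email formats are hypothetical."""
    user = rev["user"]
    if not is_anon(rev):
        return user, f"{user.replace(' ', '_')}@{mail_domain}"
    # For anonymous edits "user" holds the IP address; add the hostname if known.
    host = lookup_host(user)
    name = f"{user} ({host})" if host else user
    return name, f"anon@{host or user}"
```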

    The committer and commit timestamp are left to be picked up from the
    environment. This means that Git commit ids will not be reproducible
    between different runs of this code on the same page revisions.
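    Git reads author and committer identity from the GIT_AUTHOR_* and
    GIT_COMMITTER_* environment variables. The Python sketch below sets only
    the author variables, leaving the committer to the environment as
    described. Using the revision timestamp as the author date is an
    assumption, not something this README states. The optional
    "pin_committer" flag is hypothetical and shows what repeatable commit ids
    would additionally require, given the same starting history.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def git_date(mw_timestamp: str) -> str:
    """Convert a MediaWiki timestamp (2011-05-01T12:00:00Z) to Git's internal date format."""
    dt = datetime.strptime(mw_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return f"{int(dt.timestamp())} +0000"


def commit_env(name: str, email: str, mw_timestamp: str,
               base: Optional[Mapping[str, str]] = None,
               pin_committer: bool = False) -> dict:
    """Environment for a `git commit` subprocess."""
    env = dict(os.environ if base is None else base)
    env["GIT_AUTHOR_NAME"] = name
    env["GIT_AUTHOR_EMAIL"] = email
    env["GIT_AUTHOR_DATE"] = git_date(mw_timestamp)
    if pin_committer:
        # Not the behaviour described above: fixing the committer as well
        # removes the run-dependent parts of the commit object.
        env["GIT_COMMITTER_NAME"] = name
        env["GIT_COMMITTER_EMAIL"] = email
        env["GIT_COMMITTER_DATE"] = env["GIT_AUTHOR_DATE"]
    return env
```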

  Hostname lookup
    When users do not log in, we get their IP address. If that is a web
    proxy, we learn nothing; but in a company it is often a single-user
    desktop machine.

    We do a reverse lookup in the DNS (IPv4) to get a hostname. Results are
    cached during the run and errors are written out as warnings.
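    A per-run cache with warnings on failure might look like this in Python,
    using the standard socket.gethostbyaddr for the reverse lookup. Skipping
    non-IPv4 addresses without a lookup is an assumption based on the
    "(IPv4)" note above. The resolver is injectable so the logic can be
    tested without network access.

```python
import ipaddress
import socket
import warnings
from typing import Callable, Dict, Optional


class ReverseDNS:
    """Cache reverse DNS lookups for one run; failures become warnings."""

    def __init__(self, resolver: Callable = socket.gethostbyaddr):
        self._resolver = resolver
        self._cache: Dict[str, Optional[str]] = {}

    def lookup(self, ip: str) -> Optional[str]:
        if ip in self._cache:
            return self._cache[ip]  # failures are cached too, so each warns once
        host: Optional[str] = None
        try:
            if ipaddress.ip_address(ip).version == 4:  # IPv4 only
                host = self._resolver(ip)[0]  # (hostname, aliases, addresses)
        except ValueError:
            warnings.warn(f"not an IP address: {ip!r}")
        except OSError as exc:  # socket.herror and socket.gaierror subclass OSError
            warnings.warn(f"reverse lookup of {ip} failed: {exc}")
        self._cache[ip] = host
        return host
```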

    Beware that looking up historically-recorded IP addresses against the
    current DNS is likely to generate surprises.

  Hostname to user lookup
    You may safely ignore this part of the code.

    If the custom username-to-hostname mapping is present, we look up the
    hostname in it and include the result in the "anonymous" author info.

    This is a mapping I maintain to generate ssh host aliases, to assist
    with internal user support. The tool using it is small and not (yet)
    published.
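    Since the mapping tool is unpublished, the following Python sketch
    merely assumes ~/.ssh/ssh-config.yaml loads as a dict from username to a
    hostname or list of hostnames. It inverts that mapping and appends the
    dns_qual value to unqualified names so they match fully qualified
    reverse-DNS results. All names here are hypothetical.

```python
from typing import Dict, List, Optional, Union


def normalise(host: str, dns_qual: Optional[str]) -> str:
    """Lower-case, drop any trailing dot, and qualify bare hostnames."""
    host = host.lower().rstrip(".")
    if dns_qual and "." not in host:
        host = f"{host}.{dns_qual.strip('.')}"
    return host


def hosts_to_users(user_hosts: Dict[str, Union[str, List[str]]],
                   dns_qual: Optional[str] = None) -> Dict[str, List[str]]:
    """Invert a username -> hostname(s) mapping into hostname -> usernames."""
    table: Dict[str, List[str]] = {}
    for user, hosts in user_hosts.items():
        for host in [hosts] if isinstance(hosts, str) else hosts:
            table.setdefault(normalise(host, dns_qual), []).append(user)
    return table


def users_for(host: str, table: Dict[str, List[str]], dns_qual: Optional[str] = None) -> List[str]:
    """Usernames associated with a reverse-DNS hostname, or an empty list."""
    return table.get(normalise(host, dns_qual), [])
```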

AUTHOR
    Copyright (C) 2011 Genome Research Limited

    Author Matthew Astley mca@sanger.ac.uk

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.
