App-MtAws
1. `sudo apt-get install build-essential libssl-dev`
3. install/update `LWP::UserAgent` and `LWP::Protocol::https` using [cpanm]
###### Fedora 18+
`sudo yum install perl-core perl-CGI perl-JSON-XS perl-libwww-perl perl-LWP-Protocol-https`
###### SUSE Linux Enterprise Server 11
1. `sudo zypper install perl-libwww-perl libopenssl-devel`
2. `sudo zypper install --type pattern Basis-Devel`
3. Upgrade OpenSSL to at least `0.9.8r` (check your version with `openssl version`). Packages can be found [here](http://download.opensuse.org/repositories/security:/fips/); for more information see [RT#81575](https://rt.cpan.org/Public/Bug/Display.html?id=81575)
4. Update `ExtUtils::MakeMaker` via [cpanm]
5. Install `LWP::UserAgent`, `LWP::Protocol::https`, `JSON::XS` using [cpanm]
###### Amazon Linux 2013.03
`sudo yum install perl-core perl-JSON-XS perl-libwww-perl perl-LWP-Protocol-https`
###### MacOS X
Install the following packages: `LWP::UserAgent` (`p5-libwww-perl`) and `JSON::XS` (`p5-json-XS`). For HTTPS support you need `LWP::Protocol::https`; on MacOS X
you may also need `Mozilla::CA` (it should be installed together with `LWP::Protocol::https`, but it can be missing). Try to use HTTPS without `Mozilla::CA` first; if it does not work, install
`Mozilla::CA`.
#### Install mt-aws-glacier

    git clone https://github.com/vsespb/mt-aws-glacier.git

(or just download and unzip `https://github.com/vsespb/mt-aws-glacier/archive/master.zip` )
After that you can execute the `mtglacier` script (found in the root of the repository) from any directory, or create a symlink to it - it will find the other package files by itself
(don't forget to remove it later, if you decide to switch to the CPAN install)
### *OR* Installation via CPAN

    cpan -i App::MtAws

That's it.
### Installation general instructions, troubleshooting, edge cases and misc instructions
##### In general you need the following perl modules to run *mt-aws-glacier*:
* **LWP::UserAgent** (or Debian package **libwww-perl** or RPM package **perl-libwww-perl** or MacPort **p5-libwww-perl**)
* **JSON::XS** (or Debian package **libjson-xs-perl** or RPM package **perl-JSON-XS** or MacPort **p5-json-XS**)
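
A quick way to see which of these modules are still missing is to ask Perl to load each one. The helper below is only a sketch: `missing_perl_modules` is a hypothetical name, not part of *mt-aws-glacier*, and it checks the `perl` found first in `PATH`, which may differ from the one `mtglacier` actually runs under.

```shell
# Hypothetical helper (not part of mt-aws-glacier).
# Prints the name of every given Perl module that the `perl` in PATH cannot load.
# Prints nothing when all modules are available.
missing_perl_modules() {
    for module in "$@"; do
        # `perl -MName -e 1` exits non-zero when the module cannot be loaded
        perl -M"$module" -e 1 >/dev/null 2>&1 || printf '%s\n' "$module"
    done
}

# Check the two modules listed above
missing_perl_modules LWP::UserAgent JSON::XS
```

Other module names (for example `LWP::Protocol::https` or `Digest::SHA`) can be passed as additional arguments.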
##### Other notes
1. For old Perl < 5.9.3 (e.g. *CentOS 5.x*), also install **Digest::SHA** (or Debian package **libdigest-sha-perl** or RPM package **perl-Digest-SHA**)
2. On some distributions with old Perl packages (examples: *Ubuntu 10.04*, *CentOS 5/6*) you need to upgrade **LWP::Protocol::https** to version 6+ via CPAN to use HTTPS.
3. *Fedora*, *CentOS 6* etc. [decoupled](http://www.nntp.perl.org/group/perl.perl5.porters/2009/08/msg149747.html) Perl,
so the package named `perl`, which is a part of the default installation, is not actually the real, full Perl, which is misleading.
`perl-core` looks much more like a real Perl (I [hope](https://bugzilla.redhat.com/show_bug.cgi?id=985791) so)
4. On newer RHEL distributions (some *Fedora* versions) you need to install **perl-LWP-Protocol-https** to use HTTPS.
5. To install the `perl-JSON-XS` RPM package on RHEL5/6 you need to enable the [EPEL](http://fedoraproject.org/wiki/EPEL) repository
6. If you've used manual installation before the "CPAN" installation, it's probably better to remove the previously installed `mtglacier` executable from your path.
7. The CPAN distribution of *mt-aws-glacier* has a few more dependencies than manual installation, as it requires additional modules for the test suite.
8. New releases of *mt-aws-glacier* usually appear on CPAN within about a week after the official release.
9. On *Fedora* and *CentOS 6 minimal* you need to install `perl-core`, `perl-CPAN`, `perl-CGI` before trying to install via CPAN
10. For some distributions with old Perl packages (examples: *CentOS 5/6*) you need to update CPAN and Module::Build first: `cpan -i CPAN`, `cpan -i Module::Build`
11. The CPAN tool asks too many questions during install (but ignores important errors). You can avoid this by running the `cpan` command and configuring it like this:

        o conf build_requires_install_policy yes
        o conf prerequisites_policy follow
        o conf halt_on_failure on
        o conf commit
        exit

12. Instead of the system `cpan` tool you might want to try [`cpanm`](http://search.cpan.org/perldoc?App%3A%3Acpanminus) - it's a bit easier to install and configure.
13. Installation of **LWP::Protocol::https** requires C header files ( `yum groupinstall "Development Tools"` for RHEL or `build-essential` for Debian ) and OpenSSL dev library (`openssl-devel` RPM or `libssl-dev` DEB).
[cpanm]:http://search.cpan.org/perldoc?App%3A%3Acpanminus
## Warnings ( *MUST READ* )
* When playing with Glacier make sure you will be able to delete all your archives: it's currently impossible to delete an archive
or a non-empty vault in the Amazon console. Also make sure you have read _all_ Amazon Glacier pricing/faq.
* Read Amazon Glacier pricing [FAQ][Amazon Glacier faq] again, really. Beware of retrieval fee.
* Before using this program, you should read Amazon Glacier documentation and understand, in general, Amazon Glacier workflows and entities. This documentation
does not define any new layer of abstraction over Amazon Glacier entities.
* In general, all Amazon Glacier clients store metadata (filenames, file metadata) in their own formats, incompatible with each other. To restore a backup made with `mt-aws-glacier` you'll
need `mt-aws-glacier`; other software will most likely restore your data but lose the filenames.
* With a low "partsize" option you pay a bit more (Amazon charges for each upload request)
* For a backup created with older versions (0.7x) of mt-aws-glacier, the Journal file is **required to restore the backup**.
* Use a **Journal file** only with the **same vault** ( more info [here](#what-is-journal) and [here](#how-to-maintain-a-relation-between-my-journal-files-and-my-vaults) and [here](https://github.com/vsespb/mt-aws-glacier/issues/50))
* When working with CD-ROM/CIFS/other non-Unix/non-POSIX filesystems, you might need to set `leaf-optimization` to `0`
* Please read [ChangeLog][mt-aws glacier changelog] when upgrading to new version, and especially when downgrading.
(See "Compatibility" sections when downgrading)
* Zero-length files and empty directories are ignored (as Amazon Glacier does not support them)
* See other [limitations](#limitations)
[Amazon Glacier faq]:http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_when_retrieving_large_amounts_of_data_from_Amazon_Glacier
[mt-aws glacier changelog]:https://github.com/vsespb/mt-aws-glacier/blob/master/ChangeLog
Thus you should always use a separate Journal file for each Amazon Glacier *vault*. Also, file metadata (except filename and file *modification time*) will
be lost, if you restore files from Amazon Glacier.
#### Some Journal features
* It's a text file. You can parse it with `grep`, `awk`, `cut`, `tail` etc. to extract information in case you need to perform some advanced stuff that `mtglacier` can't do (NOTE: make sure you know what you're doing).

    To view only some files:

        grep Majorca Photos.journal

    To count creation records:

        grep CREATED Photos.journal | wc -l

    To compare only the important fields of two journals:

        cut journal -f 4,5,6,7,8 |sort > journal.cut
        cut new-journal -f 4,5,6,7,8 |sort > new-journal.cut
        diff journal.cut new-journal.cut

* Each text line in the file represents one record
* It's an append-only file. The file is opened in append-only mode, and new records are only added to the end. This guarantees that
you can recover the Journal file to a previous state in case of a bug in the program, a crash, or power/filesystem issues. You can even use `chattr +a` to set append-only protection on the Journal.
* As the Journal file is append-only, it's easy to perform incremental backups of it
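
Because records are only ever appended, an older copy of a Journal should be a byte-for-byte prefix of the current one, so an incremental backup only needs the tail. A minimal sketch under that assumption (the `journal_new_records` function and the file names are hypothetical, and `head -c` is assumed to be available, as it is in GNU and BSD userlands):

```shell
# Hypothetical helper (not part of mt-aws-glacier).
# Prints the records appended to the current journal ($2) since the backup copy ($1) was taken.
# Relies on the journal being append-only: the backup must be a byte-for-byte
# prefix of the current journal, otherwise the function refuses to continue.
journal_new_records() {
    backup=$1
    current=$2
    # Size of the backup in bytes; the arithmetic expansion strips padding some wc versions add
    backup_size=$(( $(wc -c < "$backup") ))
    # Compare only the first $backup_size bytes of the current journal with the backup
    if ! head -c "$backup_size" "$current" | cmp -s - "$backup"; then
        echo "journal_new_records: $backup is not a prefix of $current" >&2
        return 1
    fi
    # tail -c +N starts output at byte N (1-based), i.e. right after the backed-up part
    tail -c "+$((backup_size + 1))" "$current"
}
```

For example, `journal_new_records Photos.journal.bak Photos.journal >> Photos.journal.bak` would bring the backup copy up to date. The prefix check fails if the Journal was truncated or edited in place, in which case nothing is appended.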
#### Why Journal is a file in local filesystem file, but not in online Cloud storage (like Amazon S3 or Amazon DynamoDB)?
Journal is needed to restore a backup, and we can expect that if you need to restore a backup, you have lost your filesystem, together with the Journal.
However, Journal is also needed to perform *new backups* (`sync` command), to determine which files are already in Glacier and which are not, and also to check local file integrity (`check-local-hash` command).
In practice, you usually perform new backups every day, and you restore backups (and lose your filesystem) very rarely.
So a fast (local) journal is essential to perform new backups quickly and cheaply (important for users who back up thousands or millions of files).
And if you lose your journal, you can restore it from Amazon Glacier (see `retrieve-inventory` command). It's also recommended to back up your journal
to another backup system (Amazon S3? Dropbox?) with another tool, because retrieving inventory from Amazon Glacier is pretty slow.
Also, some users might want to back up the *same* files from *multiple* different locations. They will need a *synchronization* solution for journal files.
Anyway, I think the problem of putting Journals into the cloud can be automated and solved with a 3-line bash script.
#### How to maintain a relation between my journal files and my vaults?
1. You can give the journal the same name as your vault. Example: the vault name is `Photos`, the journal file name is `Photos.journal` (or `eu-west-1-Photos.journal`).
2. (Almost) any command line option can be used in a config file, so you can create `myphotos.cfg` with the following content:

        key=YOURKEY
        secret=YOURSECRET
        protocol=http
        region=us-east-1
        vault=Photos
        journal=/home/me/.glacier/photos.journal

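
With one config file per vault, it is easy to check mechanically that no two configs point at the same Journal. The sketch below assumes plain `key=value` lines exactly as in the example above (no spaces around `=`, no quoting) and that all configs live in one directory as `*.cfg`. `cfg_get` and `shared_journals` are hypothetical helper names, not `mtglacier` commands.

```shell
# Hypothetical helpers (not part of mt-aws-glacier).

# Print the value(s) of key $2 from the key=value config file $1.
cfg_get() {
    sed -n "s/^$2=//p" "$1"
}

# Print every journal path that is referenced by more than one *.cfg file
# in directory $1. No output means each config has its own journal.
shared_journals() {
    for cfg in "$1"/*.cfg; do
        [ -f "$cfg" ] || continue      # skip the literal pattern when nothing matches
        cfg_get "$cfg" journal
    done | sort | uniq -d              # uniq -d keeps only the repeated lines
}
```

Running `shared_journals ~/.glacier` (the directory name is just an example) before a `sync` would flag a copy-pasted config that still points at another vault's Journal.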
#### Why Journal does not contain region/vault information?
Keeping the journal/vault relation in the config looks to me more like the Unix way. It can be a bit dangerous, but it is easier to maintain, because:
1. Let's imagine I decided to put region/vault into the Journal. There are two options:
    a. Put it at the beginning of the file, before journal creation.
    b. Store the same region/vault in each record of the file. It looks like a waste of disk space.

    Option (a) looks better. So this way the journal will contain something like

        region=us-east-1
        vault=Photos

    at the beginning. But the same can be achieved by putting the same lines into the config file (see the previous question).
2. Also, putting vault/region into the journal would make the command line options `--vault` and `--region` useless
for general commands and would require adding another command (something like `create-journal-file`).
3. There is a possibility to use a different *account id* in Amazon Glacier (i.e. a different person's account). It's not supported yet in `mtglacier`,
but when it is, I'll have to store the *account id* together with *region*/*vault*. Also, the default *account id* is '-' (meaning 'my account'). If one wishes to use the same
vault from a different Amazon Glacier account, they'll have to change '-' to the real account id. So there needs to be a way to edit the *account id*.
And *region/vault* information does not make sense without the account.
4. Some users can have different permissions for different vaults, so they need to maintain the `key`/`secret`/`account_id`, `region/vault` and `journal` relation in the same place
(this can only be the config file, because it involves `secret`).
5. Amazon might allow renaming vaults or moving them across regions in the future.
6. Currently the journal consists of independent records, so it can be split into separate records using `grep`, or several
journals can be merged using `cat` (but be careful if doing that).
7. In the future, other features and options may be added, such as compression/encryption, which might require deciding again where to put the new attributes for them.
8. Usually there are different policies for backing up config files and journal files (modifiable). So if you lose your journal file, you won't be sure which config corresponds to which *vault* (and the journal file
can be restored from a *vault*).
9. It's better to keep the relation between *vault* and transfer root (`--dir` option) in one place, such as the config file.
#### Why Journal (and metadata stored in Amazon Glacier) does not contain file's metadata (like permissions)?
If you want to store permissions, put your files into archives before backing them up to Amazon Glacier. There are lots of different possible things to store as file metadata,
and most of them are not portable. Take a look at archive file formats - different formats allow storing different metadata.
It's possible that in the future `mtglacier` will support some other kinds of metadata.
## Specification for some commands
### `sync`
Propagates current local filesystem state to Amazon Glacier server.
`sync` accepts one or several of the following mode options: `--new`, `--replace-modified`, `--delete-removed`.
If none of the three mode options above is provided, `--new` is implied (basically for backward compatibility).

1. `--new`
Uploads files which exist in the local filesystem (and have non-zero size) but do not exist in Amazon Glacier (i.e. in the Journal).
2. `--replace-modified`
Uploads modified files (i.e. files which exist both in the local filesystem and in Amazon Glacier). After a file is successfully uploaded,
the previous version of the file is deleted. The logic for detecting modified files is controlled by the `--detect` option.
3. `--delete-removed`
Deletes from Amazon Glacier those files which exist in Amazon Glacier but are missing in the local filesystem (or have zero size).
4. `--detect`
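
The `--new` and `--delete-removed` modes amount to set differences between the local file list and the Journal. The sketch below illustrates that idea with `comm`; it is not how `mtglacier` is implemented. It assumes two hypothetical input files with one relative filename per line: the non-empty local files, and the files recorded in the Journal. Files present on both sides are only candidates for `--replace-modified`; deciding whether they are really modified is the job of `--detect`, which is not modeled here.

```shell
# Hypothetical illustration of the sync modes (not mtglacier's actual code).
# $1: list of non-empty local files, one relative filename per line
# $2: list of files recorded in the Journal, one relative filename per line
plan_sync() {
    local_list=$1
    journal_list=$2
    tmpdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted with the same collation
    LC_ALL=C sort -u "$local_list" > "$tmpdir/local"
    LC_ALL=C sort -u "$journal_list" > "$tmpdir/journal"
    # Only in the local list: what --new would upload
    LC_ALL=C comm -23 "$tmpdir/local" "$tmpdir/journal" | sed 's/^/new: /'
    # In both lists: candidates that --replace-modified would check
    LC_ALL=C comm -12 "$tmpdir/local" "$tmpdir/journal" | sed 's/^/check-modified: /'
    # Only in the Journal: what --delete-removed would delete
    LC_ALL=C comm -13 "$tmpdir/local" "$tmpdir/journal" | sed 's/^/delete-removed: /'
    rm -rf "$tmpdir"
}
```

Called as `plan_sync local.txt journal.txt` (both file names are placeholders), it prints one prefixed line per file, which makes it easy to see that running `sync` with no mode option touches only the `new:` group.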