App-MtAws
A file is always considered modified if its *size has changed* (and the new size is not zero).
1. `treehash` - calculates the TreeHash checksum of the file and compares it with the one in the Journal. If the checksums do not match, the file is modified.
2. `mtime` - compares the file's last modification time in the local filesystem with the one in the Journal. If they differ, the file is modified.
3. `mtime-or-treehash` - compares the file's last modification time; if it differs, the file is modified. If it matches, compares the TreeHash.
4. `mtime-and-treehash` - compares the file's last modification time; if it differs, compares the TreeHash. If the modification time has not changed, the file is treated as not modified and the TreeHash is not checked.
5. `always-positive` - always treats files as modified; modification time and TreeHash are ignored. Probably makes sense only together with `--filter` options.
6. `size-only` - treats files as modified only if their size differs.
NOTE: the default mode for detection is `mtime-and-treehash`. It performs best (the TreeHash is checked only for files whose modification time has changed),
but `mtime-or-treehash` and `treehash` are safer if you're not sure which programs change your files and how.
NOTE: `mtime-or-treehash` is a mnemonic for *File is modified if mtime differs OR treehash differs*;
`mtime-and-treehash` is a mnemonic for *File is modified if mtime differs AND treehash differs*. The words
*AND* and *OR* here mean logical operators with [short-circuit evaluation](http://en.wikipedia.org/wiki/Short-circuit_evaluation),
i.e. with `mtime-and-treehash` the TreeHash is never checked if the mtime does not differ, and with `mtime-or-treehash` the TreeHash is never checked if the mtime differs.
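The detection rules above, including the short-circuit behaviour, can be sketched as a single decision function. This is an illustrative Python sketch, not mtglacier's own (Perl) code: the name `is_modified`, its parameters and the lazily called `compute_treehash` callback are hypothetical, and the file is assumed to be non-empty on both sides.

```python
from typing import Callable


def is_modified(mode: str,
                journal_size: int, journal_mtime: int, journal_treehash: str,
                local_size: int, local_mtime: int,
                compute_treehash: Callable[[], str]) -> bool:
    """Hypothetical sketch of the --detect decision for one non-empty file."""
    # A size change always means "modified", whatever the mode.
    if local_size != journal_size:
        return True
    if mode == "treehash":
        return compute_treehash() != journal_treehash
    if mode == "mtime":
        return local_mtime != journal_mtime
    if mode == "mtime-or-treehash":
        # `or` short-circuits: TreeHash is computed only if the mtime matches.
        return local_mtime != journal_mtime or compute_treehash() != journal_treehash
    if mode == "mtime-and-treehash":
        # `and` short-circuits: TreeHash is computed only if the mtime differs.
        return local_mtime != journal_mtime and compute_treehash() != journal_treehash
    if mode == "always-positive":
        return True
    if mode == "size-only":
        # Sizes are equal at this point, so the file is not modified.
        return False
    raise ValueError(f"unknown detect mode: {mode}")
```

Passing the TreeHash as a callback rather than a precomputed value is what makes the expensive checksum skippable, mirroring the mnemonic in the note above.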
NOTE: files with zero size are not supported by the Amazon Glacier API, so for consistency they are considered non-existent in all `sync` modes.
NOTE: `sync` does not upload empty directories; there is no such thing as a directory in Amazon Glacier.
NOTE: with the `--dry-run` option the TreeHash is not calculated; instead a *Will VERIFY treehash and upload...* message is displayed.
NOTE: TreeHash calculation is performed in parallel, so some of the workers (their number is set with `--concurrency`) might be busy calculating TreeHashes instead
of doing network IO.
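For reference, the TreeHash checksum mentioned throughout this section is the one defined by Amazon Glacier: a SHA-256 digest of every 1 MiB chunk, after which adjacent digests are concatenated and hashed pairwise, level by level, until one root digest remains (an odd digest is carried up unchanged). A minimal Python sketch of that algorithm, written from the public Glacier description rather than from mtglacier's code; `tree_hash` is a hypothetical name.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data: bytes) -> str:
    """Return the Glacier-style TreeHash of `data` as a hex string."""
    # Level 0: SHA-256 of every 1 MiB chunk (the last chunk may be shorter).
    # Zero-size data is not valid for Glacier; hashing b"" is only a guard here.
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    # Combine adjacent pairs until a single root digest remains.
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # odd digest is carried up unchanged
        level = nxt
    return level[0].hex()
```

For data of 1 MiB or less the TreeHash equals the plain SHA-256 of the data; the chunk digests are independent of each other, which is what makes the work easy to spread across workers.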
### `restore`
Initiates an Amazon Glacier RETRIEVE operation for files listed in the Journal which don't *exist* on the local filesystem and for
which RETRIEVE was not initiated during the last 24 hours (this information is obtained from the *Journal* too: each retrieval is logged
into the Journal together with a timestamp).
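The selection rule for `restore` boils down to a filter over Journal entries. The sketch below uses a hypothetical in-memory record (`JournalEntry`, `files_to_restore` and `RETRIEVE_WINDOW` are illustrative names); the real Journal is a text file whose format is not described in this section.

```python
import os
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalEntry:
    """Hypothetical, simplified view of one file tracked in the Journal."""
    relfilename: str
    last_retrieval: Optional[float] = None  # Unix time of the last logged RETRIEVE


RETRIEVE_WINDOW = 24 * 60 * 60  # 24 hours, in seconds


def files_to_restore(entries: List[JournalEntry], root: str,
                     now: Optional[float] = None) -> List[str]:
    now = time.time() if now is None else now
    selected = []
    for e in entries:
        # Skip files that already exist locally.
        if os.path.exists(os.path.join(root, e.relfilename)):
            continue
        # Skip files whose retrieval was initiated within the last 24 hours.
        if e.last_retrieval is not None and now - e.last_retrieval < RETRIEVE_WINDOW:
            continue
        selected.append(e.relfilename)
    return selected
```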
### `restore-completed`
Downloads files listed in the Journal which don't *exist* on the local filesystem, and which were previously
RETRIEVED (using the `restore` command) and are now available for download (i.e. about 4 hours after the retrieval).
Unlike the `restore` command, the list of retrieved files is requested from the Amazon Glacier servers at runtime using the API, not from the
Journal.
Data is downloaded to unique temporary files (created in the same directory as the destination file). Temp files are renamed to the real files
only when the download has finished successfully. If the program terminates with an error or after Ctrl-C, temp files with unfinished
downloads are removed.
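The temp-file-then-rename pattern described above can be sketched like this. It is a generic Python illustration, not mtglacier's implementation; `download_atomically` and the `.download-` prefix are hypothetical, and `chunks` stands in for data arriving from the network.

```python
import os
import tempfile
from typing import Iterable


def download_atomically(dest: str, chunks: Iterable[bytes]) -> None:
    """Write `chunks` to a temp file next to `dest`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Unique temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".download-")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(tmp_path, dest)  # only reached if every chunk was written
    except BaseException:
        # Error or Ctrl-C (KeyboardInterrupt): remove the unfinished download.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temp file in the destination directory matters: a rename within one filesystem is atomic on POSIX, so a reader never sees a half-written file under the final name.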
If `segment-size` is specified (greater than 0) and a particular file's size in megabytes is larger than `segment-size`,
the file is downloaded in multiple segments, i.e. using the HTTP `Range:` header (each segment is `segment-size` MiB, except the last,
which can be smaller). Segments are downloaded in parallel (and segments from different files can
be downloaded at the same time).
Currently only powers of two are supported as `segment-size` values.
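The segmentation rule can be illustrated by computing the `Range:` header values for a file. This Python sketch is an assumption-laden illustration: `segment_ranges` is a hypothetical name, and it compares byte sizes to decide whether a file is "larger than `segment-size`".

```python
from typing import List, Optional

MIB = 1024 * 1024


def segment_ranges(file_size: int, segment_size_mib: int) -> Optional[List[str]]:
    """Return HTTP Range header values, or None for a single full-file download."""
    if segment_size_mib <= 0:
        return None
    # Only powers of two are accepted for segment-size.
    if segment_size_mib & (segment_size_mib - 1):
        raise ValueError("segment-size must be a power of two")
    seg = segment_size_mib * MIB
    if file_size <= seg:
        return None  # not larger than one segment: plain download
    # Range offsets are inclusive; the last segment may be shorter.
    return [f"bytes={start}-{min(start + seg, file_size) - 1}"
            for start in range(0, file_size, seg)]
```

For example, a 10 MiB file with `segment-size` 4 yields three ranges, the last one covering only 2 MiB.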
Currently, if a download breaks due to a network problem, no resumption is performed: the download of the file (or of the current segment)
is started from the beginning.
For multi-segment downloads, the TreeHash reported by Amazon Glacier for each segment is compared with the actual TreeHash calculated for the segment at runtime.
In case of a mismatch, an error is thrown and the process is stopped. The final TreeHash of the whole file is not checked yet.
For full-file downloads, the TreeHash reported by Amazon Glacier for the whole file is compared with the one calculated at runtime and with the one found in the Journal file;
in case of a mismatch, an error is thrown and the process is stopped.
Unlike the `partsize` option, `segment-size` does not allocate in-memory buffers of the specified size, so you can use a large `segment-size`.
### `upload-file`
Uploads a single file into Amazon Glacier. The file will be tracked in the Journal (just like when using the `sync` command).
There are several possible combinations of options for `upload-file`:
1. **--filename** and **--dir**

_Uploads what_: a file, pointed to by `filename`.
_Filename in Journal and Amazon Glacier metadata_: the relative path from `dir` to `filename`.

    ./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --dir /data/backup --filename /data/backup/dir1/myfile

(this will upload the content of `/data/backup/dir1/myfile` to Amazon Glacier and use `dir1/myfile` as the filename in the Journal)

    ./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --dir data/backup --filename data/backup/dir1/myfile

(assuming the current directory is `/home`, this will upload the content of `/home/data/backup/dir1/myfile` to Amazon Glacier and use `dir1/myfile` as the filename in the Journal)
NOTE: the file `filename` should be inside the directory `dir`.
NOTE: both `--filename` and `--dir` are resolved to full paths before the relative path from `--dir` to `--filename` is determined. Thus you'll get an error
if parent directories are unreadable. Also, if `/dir/ds` is a symlink to the `/dir/d3` directory, then `--dir /dir` `--filename /dir/ds/file` will result in the relative
filename `d3/file`, not `ds/file`.

2. **--filename** and **--set-rel-filename**

_Uploads what_: a file, pointed to by `filename`.
_Filename in Journal and Amazon Glacier metadata_: as specified in `set-rel-filename`.

    ./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --filename /tmp/myfile --set-rel-filename a/b/c

(this will upload the content of `/tmp/myfile` to Amazon Glacier and use `a/b/c` as the filename in the Journal)
(NOTE: `set-rel-filename` should be a _relative_ filename, i.e. it must not start with `/`)

3. **--stdin**, **--set-rel-filename** and **--check-max-file-size**

_Uploads what_: a file read from STDIN.
_Filename in Journal and Amazon Glacier metadata_: as specified in `set-rel-filename`.
Also, as the file size is not known until the very end of the upload, mtglacier needs to be sure that the file will not exceed the 10 000 parts limit, so you must
specify `check-max-file-size`: the maximum possible size of the file (in megabytes) that you expect. All this option does is throw an error
if `check-max-file-size`/`partsize` > 10 000 parts (in that case it's recommended to adjust `partsize`). Remember that you can put this option (like
any other option) into the config file.

    ./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --stdin --set-rel-filename path/to/file --check-max-file-size 131

(this will upload the content of the file read from STDIN to Amazon Glacier and use `path/to/file` as the filename in the Journal)
(NOTE: `set-rel-filename` should be a _relative_ filename, i.e. it must not start with `/`)
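The naming and size rules of the three `upload-file` modes can be sketched as small checks. These Python helpers are hypothetical illustrations, not mtglacier functions. `smallest_partsize` additionally assumes that `partsize` must be a power of two (as the Glacier API requires for part sizes), a point not stated in this section.

```python
import math
import os


def relative_name(dir_: str, filename: str) -> str:
    """Mode 1 (--dir + --filename): path of `filename` relative to `dir_`."""
    # Both paths are fully resolved first, so symlinks are followed.
    real_dir = os.path.realpath(dir_)
    real_file = os.path.realpath(filename)
    if os.path.commonpath([real_dir, real_file]) != real_dir or real_dir == real_file:
        raise ValueError("filename must be inside dir")
    return os.path.relpath(real_file, real_dir)


def check_rel_filename(name: str) -> str:
    """Modes 2 and 3 (--set-rel-filename): must be a relative name."""
    if name.startswith("/"):
        raise ValueError("set-rel-filename must not start with '/'")
    return name


def check_max_file_size(max_size_mib: int, partsize_mib: int) -> None:
    """Mode 3 (--stdin): fail if the upload could need more than 10 000 parts."""
    if math.ceil(max_size_mib / partsize_mib) > 10_000:
        raise ValueError("check-max-file-size/partsize exceeds 10 000 parts; increase partsize")


def smallest_partsize(max_size_mib: int) -> int:
    """Smallest power-of-two partsize (MiB) keeping the upload within 10 000 parts."""
    # Assumption: partsize is a power of two, as the Glacier API requires.
    p = 1
    while math.ceil(max_size_mib / p) > 10_000:
        p *= 2
    return p
```

With the symlink layout from the note on mode 1, `relative_name("/dir", "/dir/ds/file")` returns `d3/file`, because resolution happens before the relative path is computed.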
NOTES:
1. In the current version of mtglacier you cannot store multiple versions of the same file, i.e. upload multiple files with the same relative filename
to a single Amazon Glacier vault and a single Journal. Simple file versioning will be implemented in future versions.
2. You can use other options with this command (`concurrency`, `partsize`).
### `retrieve-inventory`
Issues an inventory retrieval request for `--vault`.
You can specify the inventory format with `--request-inventory-format`. Allowed values are `json` and `csv`; the default is `json`.
However, it's not recommended to use `csv` unless you have to: the Amazon CSV format is not documented and has bugs, and the `mt-aws-glacier` CSV parsing
implementation (i.e. the `download-inventory` command) is about 10 times slower than the JSON one.
See also [Restoring journal](#restoring-journal) for `retrieve-inventory`, `download-inventory` commands examples.
### `download-inventory`
Parses the Amazon Glacier job list (for `--vault`) fetched from Amazon servers at runtime, finds the latest (by initiation date) inventory retrieval request,
downloads it, converts it to a journal file and saves it to `--new-journal`. Both `CSV` and `JSON` jobs are supported.
See also [Restoring journal](#restoring-journal) for `retrieve-inventory`, `download-inventory` commands examples.
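Choosing "the latest (by initiation date) inventory retrieval request" can be sketched as follows. The sketch assumes job descriptions shaped like entries of Glacier's ListJobs response (`Action`, `CreationDate`, `JobId`) with timestamps such as `2013-02-01T10:00:00.000Z`; `latest_inventory_job` is a hypothetical name, and mtglacier may apply further conditions not described here.

```python
from datetime import datetime
from typing import Dict, List, Optional


def latest_inventory_job(jobs: List[Dict]) -> Optional[Dict]:
    """Pick the most recently initiated inventory-retrieval job, if any."""
    inventory = [j for j in jobs if j.get("Action") == "InventoryRetrieval"]
    if not inventory:
        return None

    def created(job: Dict) -> datetime:
        # Assumed timestamp format, e.g. "2013-02-01T10:00:00.000Z".
        return datetime.strptime(job["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")

    return max(inventory, key=created)
```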
### `list-vaults`
Lists all vaults in the region specified by `--region` (subject to IAM permissions for listing vaults) and prints them to the screen. The default format is human readable, not
meant for parsing. Use `--format=mtmsg` for a machine readable, tab separated format (which is not yet documented here; however, it's self-explanatory and backward compatibility is guaranteed.
One note: LastInventoryDate can be an empty string, as the Amazon API can return it as null).
### Other commands
See [usage](#usage) for examples of use of the following commands: `purge-vault`, `check-local-hash`, `create-vault`, `delete-vault`.
## File selection options
`filter`, `include`, `exclude` options allow you to construct a list of RULES to select only certain files for an operation.
They can be used with the commands `sync`, `purge-vault`, `restore`, `restore-completed` and `check-local-hash`.
+ **--filter**

Adds one or several RULES to the list of rules. One filter value can contain multiple rules; this has the same effect as multiple filter values with one
RULE each:

    --filter 'RULE1 RULE2' --filter 'RULE3'

is the same as

    --filter 'RULE1 RULE2 RULE3'

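The equivalence above means all `--filter` values are flattened, in order, into one rule list. A minimal Python sketch, assuming RULES inside one value are separated by whitespace (as in the example) and treating each rule as an opaque token, since the rule syntax is not covered here; `collect_rules` is a hypothetical name.

```python
from typing import List


def collect_rules(filter_values: List[str]) -> List[str]:
    """Flatten --filter values into one ordered list of RULES."""
    # Assumption: RULES within one value are separated by whitespace.
    rules: List[str] = []
    for value in filter_values:
        rules.extend(value.split())
    return rules
```

Order is preserved across values, which matters if rules are evaluated in sequence.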
9. `follow` (only `sync` command)
Follow symbolic links during directory traversal. This option reduces performance and increases memory usage. Similar to `find -L`.
## Configuring Character Encodings
Autodetection of locale/encodings is not implemented yet, but you can currently tune encodings manually.
There are 4 options, which can be used both in the config file and on the command line:
1. `terminal-encoding` - encoding of your terminal (STDOUT/STDERR for system messages).
2. `filenames-encoding` - encoding of filenames in the filesystem.
Under most *nix filesystems filenames are stored as byte sequences, not characters, so in theory the application is responsible for managing encodings.
3. `config-encoding` - encoding of your config file (`glacier.cfg` in the examples above).
4. `journal-encoding` - encoding used for the Journal file (when reading and writing the journal specified with the `--journal` and `--new-journal` options).
The default value for all options is 'UTF-8'. Under Linux and Mac OS X you usually don't need to change encodings.
Under *BSD systems, single-byte encodings are often used. Most likely you'll need to change `terminal-encoding` and `filenames-encoding`. Optionally, you can also
change `config-encoding` and `journal-encoding`.
Notes:
* Before switching `config-encoding` and `journal-encoding`, you are responsible for manually transcoding the content of the config and journal files.
* You are responsible for encoding compatibility. For example, don't try to work with a UTF-8 journal containing non-Cyrillic characters on a KOI8-R (Cyrillic) filesystem.
* Don't try to use UTF-16 for *nix filesystems. It's not ASCII compatible and contains `\x00` bytes, which can't be stored in the filesystem.
* Don't use `UTF8`: it does not validate data. Use `UTF-8` (the one with a dash) instead.
* To get the list of encodings installed with your Perl, run: `perl -MEncode -e 'print join qq{\n}, Encode->encodings(q{:all})'`
* The config file name (specified with `--config`) can be in any encoding (it's used as is). Of course, it will work only if your terminal encoding matches your
filesystem encoding or if your config file name consists of 7-bit ASCII characters only.
* Additional information about encoding support in the Perl programming language: [CPAN module Encode::Supported](http://search.cpan.org/perldoc?Encode%3A%3ASupported)
* Amazon Glacier metadata (on Amazon servers) is always stored in UTF-8; there is no way to override this. You can use a Journal in any encoding with the same
metadata without problems, and you can dump metadata to journals with different encodings (using the `download-inventory` command).
* See also [convmv tool](http://www.j3e.de/linux/convmv/man/)
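Manual transcoding of a config or journal file before switching `config-encoding` or `journal-encoding` can be as simple as decoding with the old encoding and encoding with the new one. A hedged Python sketch (`transcode_file` and the `.transcoding` temp suffix are hypothetical; Python codec names such as `koi8_r` differ slightly from Perl's). Both steps are strict, so an incompatible character fails loudly instead of being silently replaced.

```python
import os


def transcode_file(path: str, src_encoding: str, dst_encoding: str) -> None:
    """Re-encode a text file (e.g. a Journal or config) from one encoding to another."""
    with open(path, "rb") as f:
        # errors="strict" (the default) rejects bytes invalid in src_encoding.
        text = f.read().decode(src_encoding)
    tmp = path + ".transcoding"  # hypothetical temp name
    with open(tmp, "wb") as f:
        # Also strict: fails if a character cannot be represented in dst_encoding.
        f.write(text.encode(dst_encoding))
    os.replace(tmp, path)
```

The strict encode step is exactly the compatibility problem from the notes: a name containing, say, Japanese characters cannot be written to a KOI8-R file at all.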
## Limitations
* Only filenames consisting of octets that can be mapped to a valid character sequence in the desired encoding are supported (i.e. filenames
made of random bytes/garbage are not supported; usually this is not a problem).
* Filenames containing CR (carriage return, code 0x0D), LF (line feed, code 0x0A) or TAB (0x09) are not supported (usually not a problem either).
* Length of relative filenames: currently the limit is about 700 ASCII characters, or 350 2-byte UTF-8 characters (or 230 3-byte characters).
* File modification time should be in the range from year 1000 to year 9999.
(NOTE: if the above requirements are not met, an error will be thrown)
* If you uploaded files with modification dates past Y2038 on a system which supports them, and then restored them on a system
which does not (like 32-bit Linux), the resulting file timestamps will (of course) be wrong and also
unpredictable (undefined behaviour). The only guarantee is that if you restore the journal from Amazon servers on an affected (i.e. 32-bit)
machine, the journal will contain the correct timestamp (the same as on a 64-bit machine).
* Memory usage (for `sync`) formula is ~ min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency
* With high partsize*concurrency there is a risk of getting network timeouts (HTTP 408/500).
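The filename and timestamp limits above can be expressed as a pre-flight check. This Python sketch is illustrative only: `check_entry`, `fits_32bit_time_t` and `MAX_RELNAME_BYTES` are hypothetical, and measuring the "about 700 characters" limit as 700 UTF-8 bytes is an assumption inferred from the 350 two-byte / 230 three-byte figures.

```python
from datetime import datetime

MAX_RELNAME_BYTES = 700  # hypothetical: "about 700 ASCII characters"
FORBIDDEN = ("\r", "\n", "\t")


def check_entry(raw_name: bytes, mtime: datetime, encoding: str = "utf-8") -> str:
    """Apply the documented filename and mtime limits; return the decoded name."""
    # Filename bytes must decode to valid characters in the chosen encoding.
    name = raw_name.decode(encoding)  # raises UnicodeDecodeError (a ValueError) on garbage
    if any(c in name for c in FORBIDDEN):
        raise ValueError("CR, LF and TAB are not allowed in filenames")
    # Approximate length limit, measured here in UTF-8 bytes (an assumption).
    if len(name.encode("utf-8")) > MAX_RELNAME_BYTES:
        raise ValueError("relative filename too long")
    if not 1000 <= mtime.year <= 9999:
        raise ValueError("modification time out of range")
    return name


def fits_32bit_time_t(ts: int) -> bool:
    """True if a Unix timestamp is representable as a signed 32-bit time_t."""
    return -2**31 <= ts <= 2**31 - 1
```

The largest signed 32-bit timestamp, 2**31 - 1, corresponds to 2038-01-19 03:14:07 UTC, which is where the Y2038 caveat comes from.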
## Test/Play with it
1. Create an empty directory MYDIR
2. Set the vault name inside `cycletest.sh`
3. Run

       ./cycletest.sh init MYDIR
       ./cycletest.sh retrieve MYDIR
       ./cycletest.sh restore MYDIR

   OR

       ./cycletest.sh init MYDIR
       ./cycletest.sh purge MYDIR
## See also
* Amazon Glacier Perl library on CPAN - see [Net::Amazon::Glacier][Amazon Glacier API CPAN module - Net::Amazon::Glacier] by *Tim Nordenfur*
* Amazon Glacier TreeHash CPAN module [Net::Amazon::TreeHash][Amazon Glacier TreeHash CPAN module - Net::Amazon::TreeHash] (copied from `mtglacier` code)
* [Amazon Glacier development forum][Amazon Glacier development forum]
[Amazon Glacier API CPAN module - Net::Amazon::Glacier]:https://metacpan.org/module/Net::Amazon::Glacier
[Amazon Glacier TreeHash CPAN module - Net::Amazon::TreeHash]:https://metacpan.org/module/Net::Amazon::TreeHash
[Amazon Glacier development forum]:https://forums.aws.amazon.com/forum.jspa?forumID=140
## Minimum Amazon Glacier permissions:
Something like this (including permissions to create/delete vaults):

    {
      "Statement": [
        {
          "Effect": "Allow",
          "Resource": ["arn:aws:glacier:eu-west-1:*:vaults/test1",
                       "arn:aws:glacier:us-east-1:*:vaults/test1",
                       "arn:aws:glacier:eu-west-1:*:vaults/test2",
                       "arn:aws:glacier:eu-west-1:*:vaults/test3"],
          "Action": ["glacier:UploadArchive",
                     "glacier:InitiateMultipartUpload",
                     "glacier:UploadMultipartPart",
                     "glacier:UploadPart",
                     "glacier:DeleteArchive",
                     "glacier:ListParts",
                     "glacier:InitiateJob",
                     "glacier:ListJobs",
                     "glacier:GetJobOutput",
                     "glacier:ListMultipartUploads",
                     "glacier:CompleteMultipartUpload"]