App-MtAws
view release on metacpan or search on metacpan
lib/App/MtAws/MetaData.pm view on Meta::CPAN
# mt-aws-glacier - Amazon Glacier sync client
# Copyright (C) 2012-2014 Victor Efimov
# http://mt-aws.com (also http://vs-dev.com) vs@vs-dev.com
# License: GPLv3
#
# This file is part of "mt-aws-glacier"
#
# mt-aws-glacier is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# mt-aws-glacier is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
package App::MtAws::MetaData;
our $VERSION = '1.120';
use strict;
use warnings;
use utf8;
use Encode;
use MIME::Base64;
use JSON::XS;
use POSIX;
use Time::Local;
use App::MtAws::DateTime;
use constant MAX_SIZE => 1024;
use constant META_JOB_TYPE_FULL => 'full';
use Exporter 'import';
our @EXPORT = qw/meta_decode meta_job_decode meta_encode meta_job_encode META_JOB_TYPE_FULL/;
=head1 MT-AWS-GLACIER metadata format ('x-amz-archive-description' field).
Function definitions:
=====================
base64url() input - byte sequence, output - byte sequence
Is Base64 URL algorithm: http://en.wikipedia.org/wiki/Base64#URL_applications
basically it's base64 but with '=' padding removed, characters '+', '/' replaced with '-', '_' resp. and no new lines.
json_utf8() - input - Hash, output - byte sequence
JSON string in UTF-8 representation. Can contain not-escaped UTF-8 characters. Will not contain linefeed. Hash objects are unordered.
latin1_to_utf8() - input - byte sequence, output - byte sequence
Treats input data as Latin1 (ISO 8859-1) encoded sequence and converts it to UTF-8 sequence
isoO8601() - input - time, output - character string
ISOO8601 time in the following format YYYYMMDDTHHMMSSZ. Only UTC timezone. No leap seconds supported.
Supported year range is from 1000 to 9999
When encoding isoO8601() mt-aws-glacier will not store leap seconds. When decoding from isoO8601 leap seconds will be dropped.
{'filename': FILENAME, 'mtime': iso8601(MTIME)}
Hash with two keys: 'filename' and 'mtime'. Corresponds to JSON 'Object'.
Input data:
=====================
FILENAME (character string)
Is a relative filename (no leading slash). Filename is taken from file system and treated as a character sequence
with known encoding.
MTIME (time)
is file last modification time with 1 second resolution. Can be below Y1970.
Internal representation is epoch time, so it can be any valid epoch time (including negative values and zero).Supported
range - from year 1000 to 9999 (inclusive)
Version 'mt2'
=====================
x-amz-archive-description = 'mt2' <space> base64url(json_utf8({'filename': FILENAME, 'mtime': iso8601(MTIME)}))
Version 'mt1'
=====================
x-amz-archive-description = 'mt1' <space> base64url(latin1_to_utf8(json_utf8({'filename': FILENAME, 'mtime': iso8601(MTIME)})))
This format actually contains a bug - data is double encoded. However it does not affect data integrity. UTF-8 double encoded data can be
perfectly decoded (see http://www.j3e.de/linux/convmv/man/) - that's why the bug was unnoticed during one month.
This format was in use starting from version 0.80beta (2012-12-27) till 0.84beta (2013-01-28).
NOTES:
=====================
1) This specification assumes that in our programming language we have two different types of Strings: Byte string (byte sequence) and Character strings.
Byte string is sequence of octets. Character string is an internal representation of sequence of characters. Character strings cannot have encodings
by definition - it's internal, encoding is known to language implementation.
Some programming languages (like Ruby) have different model, when every string is a sequence of bytes with a known encoding (or no encoding at all).
2) According to this spec. Same (FILENAME,MTIME) values can produce different x-amz-archive-description, as JSON hash is unordered.
3) This specification explains how to _encode_ data (because it's a specification). However it's easy to
understant how to decode it back.
4) Path separator in filename is '/'
=cut
my $meta_coder = ($JSON::XS::VERSION ge '1.4') ?
JSON::XS->new->utf8->max_depth(1)->max_size(MAX_SIZE) : # some additional abuse-protection
JSON::XS->new->utf8; # it's still protected by length checking below
sub meta_decode
( run in 2.155 seconds using v1.01-cache-2.11-cpan-39bf76dae61 )