Apache-Request-I18N
- Minor documentation fixes
0.05 Fri Dec 2 20:50:59 2005
- Added testing using Apache::Test
Enhancements:
- Uploads are now automatically blessed in the appropriate subclass
Bugfixes:
- Suppressed parms() warning that occurred when ENCODE_PARMS was empty
- Fixed incorrect handling of Content-Type charset in form-urlencoded
Documentation:
- Mention case-insensitivity mess in compatibility issues
- Explicitly mention scalar(param()) in compatibility issues
- Added reference to RFC 2388
0.04 Sun Nov 27 23:57:13 2005
- First public release
- Brought the documentation to an acceptable level
Changes:
- It is now a fatal error to have an empty DECODE_PARMS
- It is no longer a fatal error to provide Apache::Request arguments
to new() or instance() if the Apache::Request object already exists
- Calling args() now produces a warning
Enhancements:
- It should now be possible to subclass this module
- The Content-Type charset for x-www-form-urlencoded is now used, if
provided
- Support was added for language specification in encoded words
(RFC 2184, section 5)
Bugfixes:
- Form fields in the query string are no longer decoded for
multipart/form-data
- handler() now returns DECLINED instead of OK
- instance() was ignoring any previously existing Apache::Request
#
# Proper encoding of parameter values is defined in RFC 2184; unfortunately,
# libapreq does not recognize this format, so we can't support it.
{{
my $SPACE = '\040';
my $CTL = '\000-\037\377';
my $especials = quotemeta '()<>@,;:\\"/[]?.=';
my $token = qr/ [^ $SPACE $CTL $especials ]+ /x;
my $charset = $token;
my $language = $token;
my $encoding = $token;
my $encoded_text = qr/ [ \041-\076 \100-\176 ]+ /x;
my $encoded_word = qr/ =\? $charset (?: \* $language )? \? $encoding \?
$encoded_text \?= /x;
sub _decode_value {
my ($self, $value) = @_;
if ($value =~ /$encoded_word/o) {
return Encode::decode('MIME-Header', $value);
} else {
return $self->_decode($value);
}
# Remember which arguments were passed on the query string
#
# This used to call Apache->args, but it doesn't behave so well with
# ill-formed query strings. Apache::Request->query_params would be
# nice, but it was introduced in 1.3, and Debian sarge only has 1.1.
my %args = map { defined $_ ? $_ : '' }
map Apache::unescape_url_info(defined $_ ? $_ : ''),
map /^([^=]*)(?:=(.*))?/,
split /[&;]+/ => $self->query_string;
# Extract the Content-Type charset for x-www-form-urlencoded
my ($is_urlenc, $charset);
my ($ctype) = split_header_words($self->header_in('Content-Type'));
if ($ctype->[0] && $ctype->[0] eq 'application/x-www-form-urlencoded') {
$is_urlenc = 1;
my %tmp = @$ctype;
$charset = $tmp{charset};
}
my $old_parms = $self->SUPER::parms;
my $new_parms = new Apache::Table $self, scalar keys %$old_parms;
$old_parms->do( sub {
my ($key, $val) = @_;
# POSTed multipart/form-data form field names are supplied as
# a Content-Disposition parameter, so they are handled
# differently.
if ($is_urlenc || $args{$key}) {
$key = $self->_decode($key, $charset);
} else {
$key = $self->_decode_value($key);
}
# Same thing for filenames
if ($self->SUPER::upload($key)) {
$val = $self->_decode_value($val)
} else {
$val = $self->_decode($val, $charset);
}
$_ = $self->_encode($_) foreach $key, $val;
$new_parms->add($key, $val);
return 1;
} );
$self->{_old_parms} = $old_parms;
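The query-string bookkeeping in the excerpt above depends on mod_perl's Apache::unescape_url_info. As a rough standalone sketch of the same idea, the splitting and unescaping could look like the following. The helper names unescape_form and parse_query_string are hypothetical and are not part of the module; the unescaping routine is only an approximation of what mod_perl does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Apache::unescape_url_info: turn '+' into a
# space first, then expand %XX escapes (so that %2B stays a literal '+').
sub unescape_form {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Split a raw query string on '&' or ';' and return a hash reference of
# unescaped name => value pairs. A name without '=' gets an empty value.
sub parse_query_string {
    my ($qs) = @_;
    my %args;
    for my $pair (grep { length } split /[&;]+/, defined $qs ? $qs : '') {
        my ($name, $value) = $pair =~ /^([^=]*)(?:=(.*))?/s;
        $args{ unescape_form($name) } = unescape_form($value);
    }
    return \%args;
}

# Query string taken from the charset_urlencoded test input.
my $args = parse_query_string('string=Cha%2bAO4-ne&Cha%2bAO4-ne=string');
print join(', ', map { "$_ => $args->{$_}" } sort keys %$args), "\n";
```

As with the hash in the excerpt, a name that appears more than once keeps only its last value, which is enough when the hash is used only to remember which names arrived on the query string.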
field names and filenames as specified by RFC 2184 is currently not supported.
(This is due to a limitation in I<libapreq>.)
Conversely, since some user-agents are known to encode such values via RFC
2047, we attempt decoding if possible. This means that a value supplied by a
standards-compliant user-agent may be wrongly decoded.
=item *
When using the B<multipart/form-data> encoding, each form field value may have
its character encoding specified via the I<charset> parameter of its
I<Content-Type> header. This value is currently ignored. (This is due to a
limitation in I<libapreq>.)
Similarly, the I<Content-Transfer-Encoding> header is also ignored.
=item *
When using upload hooks, the upload object supplied to UPLOAD_HOOK will not
have had its I<name>() and I<filename>() decoded yet.
=for comment
Note that doing so within a Mason component will have no effect, as Mason will
have already parsed and remembered all form fields.
=for comment
We should probably make _mangle_parms lazy, and only call it from param() and
such.
=item *
Automatically decode the contents of a B<text/*> file upload if a charset has
been provided.
=for comment
This should probably be optional, since we wouldn't know what to do with an
upload that doesn't have a charset. (Neither DECODE_PARMS nor the local
native charset would be appropriate here.) Besides, if ENCODE_PARMS was
defined, we'll still return a handle that spits out wide characters. (Come to
think of it, do any user-agents even bother providing a charset anyway?)
=item *
Allow for more than one DECODE_PARMS, and try to guess which one is
appropriate.
=item *
Use the I<User-Agent> header to figure out how far from the standards we must
stray.
=item *
Write a short text about the various standards and issues.
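The RFC 2047 decoding attempt mentioned in the compatibility notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: decode_if_encoded_word is a hypothetical helper, and the pattern is deliberately looser than the token-based grammar of the RFC.

```perl
use strict;
use warnings;
use Encode ();

# Simplified pattern for an RFC 2047 encoded-word:
#   =?charset?encoding?encoded-text?=
# with an optional "*language" suffix on the charset. Illustration only.
my $encoded_word =
    qr/ =\? [^?*\s]+ (?: \* [^?\s]+ )? \? [BbQq] \? [^?\s]+ \?= /x;

# Hypothetical helper: decode only when the value contains an encoded-word,
# and hand every other value back untouched.
sub decode_if_encoded_word {
    my ($value) = @_;
    return $value unless defined $value && $value =~ $encoded_word;
    return Encode::decode('MIME-Header', $value);
}

# A filename as some user-agents are known to send it.
my $decoded = decode_if_encoded_word('=?ISO-8859-1?Q?cha=EEne.txt?=');
print $decoded eq "cha\x{EE}ne.txt" ? "decoded\n" : "unchanged\n";
```

The sketch also shows why a value may be wrongly decoded: a user who really did type something shaped like an encoded-word into a form field cannot be told apart from a user-agent that applied RFC 2047 encoding.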
=head1 SEE ALSO
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html>
RFC 1522 - MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text
RFC 1806 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header [2.3]
RFC 1866 - Hypertext Markup Language - 2.0 [8.2.1]
RFC 1867 - Form-based File Upload in HTML [3.3, 5.11]
RFC 2047 - MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text [5]
RFC 2070 - Internationalization of the Hypertext Markup Language [5.2]
RFC 2183 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field [2, 2.3]
RFC 2231 - MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
RFC 2388 - Returning Values from Forms: multipart/form-data
t/dump.d/02_post.in
t/dump.d/02_post.pl
t/dump.d/03_multipart.in
t/dump.d/03_multipart.pl
t/dump.d/04_upload.in
t/dump.d/04_upload.pl
t/dump.d/05_decode.in
t/dump.d/05_decode.pl
t/dump.d/06_encode.in
t/dump.d/06_encode.pl
t/dump.d/charset_urlencoded.in
t/dump.d/charset_urlencoded.pl
t/dump.d/mime.in
t/dump.d/mime.pl
t/dump.d/mime_filename.in
t/dump.d/mime_filename.pl
t/dump.d/mime_lang.in
t/dump.d/mime_lang.pl
t/dump.d/multiple_params.in
t/dump.d/multiple_params.pl
t/dump.d/multiple_uploads.in
t/dump.d/multiple_uploads.pl
t/dump.d/charset_urlencoded.in
# application/x-www-form-urlencoded charset override
POST /latin1?string=Cha%2bAO4-ne&Cha%2bAO4-ne=string HTTP/1.0
Content-Type: application/x-www-form-urlencoded; charset=utf-7
long_string=Longue+cha%2bAO4-ne&Longue+cha%2bAO4-ne=long_string
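To make the test input above easier to read, here is a minimal sketch of how one of its tokens could be decoded by hand, assuming the charset from the Content-Type header is applied after URL-unescaping. The helper decode_urlencoded is hypothetical and is not how the module is implemented; it relies only on the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: unescape a form-urlencoded token, then decode the
# resulting bytes using the charset named in the Content-Type header.
sub decode_urlencoded {
    my ($raw, $charset) = @_;
    (my $bytes = $raw) =~ tr/+/ /;                 # '+' means space
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # %2b becomes a literal '+'
    return Encode::decode($charset, $bytes);
}

# "+AO4-" is the UTF-7 shift sequence for U+00EE (i with circumflex).
my $value = decode_urlencoded('Longue+cha%2bAO4-ne', 'UTF-7');
print $value eq "Longue cha\x{EE}ne" ? "ok\n" : "not ok\n";
```

The order of the two unescaping steps matters here: translating '+' to a space before expanding %2b keeps the escaped plus sign available as the start of the UTF-7 shift sequence.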