Catalyst-Runtime
lib/Catalyst/UTF8.pod
form fields should be encoded in the same way that the document was served with. That means
that if you are using modern Catalyst and serving UTF8 encoded responses, a browser is
supposed to notice that and encode the form POSTs accordingly.
As a result, since L<Catalyst> now serves UTF8 encoded responses by default, you can mostly
rely on incoming form POSTs being UTF8 encoded as well. L<Catalyst> makes this
assumption and decodes them accordingly (unless you explicitly turn off encoding). If you are
running Catalyst in debug mode, you will see the correct Unicode characters in
the debug output. For example, if you generate a POST request:
use Catalyst::Test 'MyApp';
use HTTP::Request::Common;  # provides POST
use utf8;
my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥'];
Running in CATALYST_DEBUG=1 mode you should see output like this:
[debug] Body Parameters are:
.-------------------------------------+--------------------------------------.
| Parameter                           | Value                                |
+-------------------------------------+--------------------------------------+
| ♥                                   | ♥                                    |
| ♥♥                                  | ♥                                    |
'-------------------------------------+--------------------------------------'
And if you had a controller like this:
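package MyApp::Controller::Example;
use strict;
use warnings;
use utf8;  # the source below contains a literal '♥'
use base 'Catalyst::Controller';
sub posted :POST Local {
my ($self, $c) = @_;
$c->res->content_type('text/plain');
# Quote the key: '♥' is not a word character, so it is not auto-quoted.
$c->res->body("hearts => ${\$c->req->post_parameters->{'♥'}}");
}
1;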
The following test case would be true:
use Test::More;
use Encode 2.21 'decode_utf8';
is decode_utf8($res->content), 'hearts => ♥';
Here we decode the response body, which arrives as raw UTF-8 octets, into a Perl character string so that we can print it and compare it with a string containing multibyte characters.
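On the request side, Catalyst's own body parsing is more involved, but the core of "assume UTF-8 and decode" for an C<application/x-www-form-urlencoded> POST can be sketched with core Perl alone. The helpers C<unescape_utf8> and C<parse_urlencoded_utf8> are hypothetical and are not Catalyst's parser. They only illustrate the two steps: undo the percent-escaping to recover the octets the browser sent, then decode those octets as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not Catalyst's parser): undo percent-escaping to get
# the raw octets, then decode them as UTF-8 into a Perl character string.
sub unescape_utf8 {
    my ($s) = @_;
    $s =~ tr/+/ /;                              # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;   # %XX -> one octet
    return decode('UTF-8', $s);                 # octets -> characters
}

# Hypothetical helper: parse an application/x-www-form-urlencoded body into
# a hash of name => [values], decoding every name and value as UTF-8.
sub parse_urlencoded_utf8 {
    my ($body) = @_;
    my %params;
    for my $pair (split /&/, $body) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $params{ unescape_utf8($name) } }, unescape_utf8($value);
    }
    return \%params;
}

# "%E2%99%A5" is U+2665 (a heart) as UTF-8 octets, percent-escaped.
my $params = parse_urlencoded_utf8('%E2%99%A5=%E2%99%A5&%E2%99%A5%E2%99%A5=%E2%99%A5');
binmode STDOUT, ':encoding(UTF-8)';
for my $name (sort keys %$params) {
    printf "%s (length %d) => %s\n", $name, length($name), join(',', @{ $params{$name} });
}
```

After decoding, the key C<♥> has a length of one character rather than three octets. This is why the debug table and C<post_parameters> show readable hearts.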
B<NOTE> Some browsers may not follow the specification and set the form POST
encoding based on the server response. Catalyst itself doesn't attempt any workarounds, but one
common approach is to use a hidden form field with a UTF8 value (you might be familiar with
this from Ruby on Rails, whose HTML form helpers add such a field automatically). In that case
some browsers will send the form UTF8 encoded if they notice that the hidden input field contains such a
character. You can also add the C<accept-charset="utf-8"> attribute to your form tag, which many modern browsers
will respect when choosing the encoding. Lastly, there are some JavaScript
based tricks and workarounds for even more unusual cases (searching the web will turn up
a number of approaches). Hopefully, as more compliant browsers become popular, these edge cases
will fade.
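As a sketch of the two markup workarounds mentioned in the note, the following core-Perl helper emits a form opening tag that carries C<accept-charset="utf-8"> and a hidden input whose value is a non-ASCII character. The helper name C<utf8_form_open>, the field name C<utf8> and the check-mark value (modeled loosely on what Rails emits) are hypothetical. Adapt them to your own templates.

```perl
use strict;
use warnings;

# Hypothetical helper: return the opening markup for a form that nudges
# browsers toward submitting UTF-8. accept-charset is a standard HTML form
# attribute; the hidden "utf8" field holds a non-ASCII character (U+2713,
# a check mark) written as an HTML character reference so this source file
# stays ASCII. NOTE: action/method are interpolated unescaped; HTML-escape
# them if they can come from untrusted input.
sub utf8_form_open {
    my (%args) = @_;
    my $action = defined $args{action} ? $args{action} : '/';
    my $method = defined $args{method} ? $args{method} : 'post';
    return qq{<form action="$action" method="$method" accept-charset="utf-8">\n}
         . qq{  <input type="hidden" name="utf8" value="&#x2713;">\n};
}

print utf8_form_open(action => '/example/posted'), "</form>\n";
```

Your application can simply ignore the hidden field. Its only purpose is to coax non-compliant browsers into choosing UTF-8 for the submission.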
B<NOTE> A multipart form POST (normally a file upload) can contain
inline content with mixed character sets and encodings. For example, one might create
a POST like this:
use utf8;
use Encode;
use HTTP::Request::Common;
my $utf8 = 'test ♥';
my $shiftjs = 'test テスト';
my $req = POST '/root/echo_arg',
Content_Type => 'form-data',
Content => [
arg0 => 'helloworld',
# a field whose name and value are both raw UTF-8 octets
Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
# [ $file, $filename, @headers ]: with no file, 'Content' supplies the part body
arg1 => [
undef, '',
'Content-Type' =>'text/plain; charset=UTF-8',
'Content' => Encode::encode('UTF-8', $utf8)],
# Shift_JIS encoded parts; the same field name may appear more than once
arg2 => [
undef, '',
'Content-Type' =>'text/plain; charset=SHIFT_JIS',
'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
arg2 => [
undef, '',
'Content-Type' =>'text/plain; charset=SHIFT_JIS',
'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
];
In this case we've created a POST request in which each part specifies its own
character set (setting a content encoding would also be possible). Generally one
would not run into this situation in a web browser context, but for completeness' sake
Catalyst will notice if a multipart POST contains parts with complex or extended
header information. In these cases it will try to inspect the metadata and do the
right thing (in the above case it would use SHIFT_JIS, not UTF-8, to decode C<arg2>). However, if
after inspecting the headers Catalyst cannot figure out how to decode the data, it
will not attempt to decode the form value. Instead, the part will be represented as
an instance of L<Catalyst::Request::PartData>, which contains all the header
information you need to parse the data yourself.
Ideally we'd make L<Catalyst> smarter about decoding, so please submit examples of such cases
so that we can add intelligence to the parser and find a way to extract a valid value
from them.
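The header inspection described above can be sketched in core Perl. This is not Catalyst's implementation. C<decode_part> is a hypothetical helper that reads the C<charset> parameter from a part's Content-Type header and looks it up with C<Encode::find_encoding>. When it finds no usable charset, it returns undef so that the caller keeps the raw octets, roughly as Catalyst hands you a L<Catalyst::Request::PartData> object.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Hypothetical helper (not Catalyst's implementation): decode one multipart
# part using the charset named in its Content-Type header. Returns a
# character string, or an empty return (undef in scalar context) when no
# usable charset is found, leaving the caller to keep the raw octets.
sub decode_part {
    my ($content_type, $octets) = @_;
    return unless defined $content_type;
    my ($charset) = $content_type =~ /charset\s*=\s*"?([\w.:-]+)"?/i;
    return unless defined $charset;
    my $encoding = find_encoding($charset) or return;   # unknown charset name
    return $encoding->decode($octets);
}

# Build a Shift_JIS part like arg2 in the POST example ("test" plus katakana).
my $original = "test \x{30C6}\x{30B9}\x{30C8}";
my $octets   = encode('SHIFT_JIS', $original);

binmode STDOUT, ':encoding(UTF-8)';
my $text = decode_part('text/plain; charset=SHIFT_JIS', $octets);
print defined $text ? "decoded: $text\n" : "left undecoded\n";

my $raw = decode_part('application/octet-stream', $octets);
print defined $raw ? "decoded\n" : "no charset: left undecoded\n";
```

Returning undef rather than guessing UTF-8 mirrors the cautious behavior the text describes. Decoding octets with the wrong charset would silently corrupt the value.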
=head1 UTF8 Encoding in Body Response
When does L<Catalyst> encode your response body, and what rules does it use to
determine when encoding is needed?
=head2 Summary
use utf8;
use warnings;
use strict;
package MyApp::Controller::Root;
use base 'Catalyst::Controller';
use File::Spec;
sub scalar_body :Local {
my ($self, $c) = @_;
$c->response->content_type('text/html');
$c->response->body("<p>This is scalar_body action ♥</p>");
}
sub stream_write :Local {