Catalyst-Runtime

lib/Catalyst/UTF8.pod

    .------------------------------------------------------------+-----------.
    | Action                                                     | Time      |
    +------------------------------------------------------------+-----------+
    | /root/base                                                 | 0.000080s |
    | /root/capture                                              | 0.000075s |
    | /root/arg                                                  | 0.000755s |
    '------------------------------------------------------------+-----------'

Again, remember that we are displaying the Unicode character and using it to match actions
containing such multibyte characters, BUT over HTTP you are getting these as URL encoded
bytes.  For example, if you looked at the L<PSGI> C<$env> value for C<REQUEST_URI> you
would see (for the above request):

    REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"

So on the incoming request we decode the URL encoding and then the UTF-8, so that we can
match and display Unicode characters.  This makes it straightforward to use these types of
multibyte characters in your actions and see them incoming in captures and arguments.  Please
keep this in mind if you are doing, for example, regular expression matching, length determination
or other string comparisons: you will need to treat these incoming variables as decoded Unicode
character strings, not as bytes.  For example in the following action:

        sub arg :Chained('capture') PathPart('♥') Args(1) {
          my ($self, $c, $arg) = @_;
        }

When C<$arg> is "♥" you should expect C<length($arg)> to be C<1>, since it is indeed one character
although it takes more than one byte to store.
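
The difference between character length and byte length can be checked outside of
Catalyst.  The following is a minimal sketch using only core modules; the helper
C<decode_segment> is hypothetical and only imitates the two decoding steps described
above (percent-decoding, then UTF-8 decoding).  It is not Catalyst's own code.

```perl
use strict;
use warnings;
use utf8;                       # this source contains a literal ♥
use Encode qw(decode encode);

# Hypothetical helper: undo the percent-encoding of one path segment,
# then decode the resulting UTF-8 bytes into a Perl character string.
sub decode_segment {
    my ($segment) = @_;
    (my $bytes = $segment) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

# Roughly what an action would receive for the request shown earlier.
my $arg = decode_segment('%E2%99%A5');

print "characters: ", length($arg), "\n";                   # 1
print "bytes:      ", length(encode('UTF-8', $arg)), "\n";  # 3
print "matches the literal\n" if $arg eq '♥';
```

Because the value is a character string, comparisons against literals only behave as
expected when the source file itself is read as UTF-8, which is why the sketch
enables the C<utf8> pragma.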

=head2 UTF8 in constructing URLs via $c->uri_for

For the reverse (constructing meaningful URLs to actions that contain multibyte characters
in their paths or path parts, or when you want to include such characters in your captures
or arguments) L<Catalyst> will do the right thing (again just remember to use the C<utf8>
pragma).

    use utf8;
    my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']);

When you stringify this object (for use in a template, for example) it will automatically
do the right thing regarding utf8 encoding and url encoding.

    http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5

Again, what you want here is a properly URL encoded version of the string.  In this case your
string length will reflect URL encoded bytes, not the character length.  Ultimately what you
send over the wire via HTTP needs to be bytes.
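
To make the length difference concrete, here is a small sketch of the reverse
direction using only core modules.  The helper C<encode_segment> is hypothetical; it
illustrates UTF-8 encoding followed by percent-encoding and should not be read as
what C<uri_for> does internally.

```perl
use strict;
use warnings;
use utf8;                       # this source contains a literal ♥
use Encode qw(encode);

# Hypothetical helper: UTF-8 encode a character string, then percent-encode
# every byte outside the RFC 3986 "unreserved" set.
sub encode_segment {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $segment = encode_segment('♥');

print "$segment\n";             # %E2%99%A5
print length('♥'), "\n";        # 1 (one character)
print length($segment), "\n";   # 9 (characters of percent-encoded text)
```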

=head1 UTF8 in GET Query and Form POST

What Catalyst does with UTF8 in your GET and classic HTML Form POST

=head2 UTF8 in URL query and keywords

The same rules that we find in URL paths also cover URL query parts.  That is,
suppose one types a URL like this into the browser:

    http://localhost/example?♥=♥♥

When this goes 'over the wire' to your application server, it is going to be sent as
percent encoded bytes:


    http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5

When L<Catalyst> encounters this we decode the percent encoding and then the UTF-8,
so that we can properly display this information (such as in the debugging
logs or in a response).

    [debug] Query Parameters are:
    .-------------------------------------+--------------------------------------.
    | Parameter                           | Value                                |
    +-------------------------------------+--------------------------------------+
    | ♥                                   | ♥♥                                   |
    '-------------------------------------+--------------------------------------'

All the values and keys that are part of $c->req->query_parameters will be
utf8 decoded.  So you should not need to do anything special to take those
values/keys and send them to the body response (since as we will see later
L<Catalyst> will do all the necessary encoding for you).

Again, remember that the values of your parameters are now decoded into Unicode strings, so
for example you'd expect the result of C<length> to reflect the character length, not
the byte length.
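
As an illustration of what "decoded keys and values" means, the sketch below parses
the raw query string from the example above with core modules only.  C<parse_query>
is a hypothetical helper written for this example; it is not the parser Catalyst
uses, and it keeps only the last value of a repeated key.

```perl
use strict;
use warnings;
use utf8;                       # this source contains a literal ♥
use Encode qw(decode);

# Hypothetical helper: split a raw query string into pairs, undo the
# percent-encoding, and decode both keys and values as UTF-8.
sub parse_query {
    my ($raw) = @_;
    my %params;
    for my $pair (split /[&;]/, $raw) {
        next unless length $pair;
        my ($key, $value) = map {
            (my $bytes = $_) =~ tr/+/ /;                    # '+' means space
            $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # undo %XX
            decode('UTF-8', $bytes);                        # bytes to characters
        } split /=/, $pair, 2;
        $params{$key} = defined $value ? $value : '';
    }
    return \%params;
}

my $params = parse_query('%E2%99%A5=%E2%99%A5%E2%99%A5');

print "key found\n" if exists $params->{'♥'};
print length($params->{'♥'}), "\n";   # 2 (characters, not 6 bytes)
```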

Just like with arguments and captures, you can use utf8 literals (or utf8
strings) in $c->uri_for:

    use utf8;
    my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'});

When you stringify this object (for use in a template, for example) it will automatically
do the right thing regarding utf8 encoding and url encoding.

    http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5

Again, what you want here is a properly URL encoded version of the string.  Ultimately what
you send over the wire via HTTP needs to be bytes (not Unicode characters).

Remember if you use any utf8 literals in your source code, you should use the
C<use utf8> pragma.

B<NOTE:> Assuming UTF-8 in your query parameters and keywords may be an issue if you have
legacy code where you created URLs in templates manually and used an encoding other than UTF-8.
In these cases you may find that Catalyst 5.90080 and later will incorrectly decode.  For
backwards compatibility we offer three configuration settings, here described in order of
precedence:

C<do_not_decode_query>

If true, then do not try to character decode any wide characters in your
request URL query or keywords.  You will need to handle this manually in your action code
(although if you choose this setting, chances are you already do this).

C<default_query_encoding>

This setting allows one to specify a fixed value for how to decode your query, instead of using
the default, UTF-8.

C<decode_query_using_global_encoding>

If this is true we decode using whatever you set C<encoding> to.
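
The order of precedence described above can be summarized in a few lines of plain
Perl.  This is only a sketch of the documented rules: the function
C<query_encoding_for> is hypothetical, it takes a plain hash reference standing in
for the application config, and it does not reproduce Catalyst's internal code.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented precedence.  Returns the name
# of the encoding to use for the query, or undef when nothing is decoded.
sub query_encoding_for {
    my ($config) = @_;

    # 1. do_not_decode_query wins over everything else.
    return undef if $config->{do_not_decode_query};

    # 2. A fixed query encoding, if one was configured.
    return $config->{default_query_encoding}
        if $config->{default_query_encoding};

    # 3. Otherwise reuse the global 'encoding' setting when asked to.
    return $config->{encoding}
        if $config->{decode_query_using_global_encoding};

    # 4. The default.
    return 'UTF-8';
}

my %legacy = (
    encoding                           => 'ISO-8859-1',
    decode_query_using_global_encoding => 1,
);

print scalar(query_encoding_for(\%legacy)), "\n";   # ISO-8859-1
print scalar(query_encoding_for({})), "\n";         # UTF-8
```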

=head2 UTF8 in Form POST

In general most modern browsers will follow the specification, which says that POSTed


