Catalyst-Runtime


lib/Catalyst/UTF8.pod

Using UTF8 characters in your Controller classes and actions.

=head2 Summary

In this section we will review changes to how UTF8 characters can be used in
controller actions, how they look in the debugging screens (and your logs),
as well as how you construct L<URI> objects to actions with UTF8 paths
(or using UTF8 args or captures).

=head2 Unicode in Controllers and URLs

    package MyApp::Controller::Root;

    use utf8;
    use base 'Catalyst::Controller';

    sub heart_with_arg :Path('♥') Args(1)  {
      my ($self, $c, $arg) = @_;
    }

    sub base :Chained('/') CaptureArgs(0) {
      my ($self, $c) = @_;
    }

      sub capture :Chained('base') PathPart('♥') CaptureArgs(1) {
        my ($self, $c, $capture) = @_;
      }

        sub arg :Chained('capture') PathPart('♥') Args(1) {
          my ($self, $c, $arg) = @_;
        }

=head2 Discussion

In the example controller above we have constructed two matchable URL routes:

    http://localhost/root/♥/{arg}
    http://localhost/base/♥/{capture}/♥/{arg}

The first one is a classic Path type action and the second uses Chaining, and
spans three actions in total.  As you can see, you can use unicode characters
in your Path and PathPart attributes (remember to use the C<utf8> pragma to allow
these multibyte characters in your source).  The two constructed matchable routes
would match the following incoming URLs:

    (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg}
    (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg}

The path part C<%E2%99%A5> is the URL-encoded UTF-8 byte sequence for the heart character
(assuming you are hitting this with a reasonably modern browser).  It's basically what goes
over HTTP when you type a browser location that has the Unicode 'heart' in it.  However, we
will use the Unicode symbol in your debugging messages:

    [debug] Loaded Path actions:
    .-------------------------------------+--------------------------------------.
    | Path                                | Private                              |
    +-------------------------------------+--------------------------------------+
    | /root/♥/*                           | /root/heart_with_arg                 |
    '-------------------------------------+--------------------------------------'

    [debug] Loaded Chained actions:
    .-------------------------------------+--------------------------------------.
    | Path Spec                           | Private                              |
    +-------------------------------------+--------------------------------------+
    | /base/♥/*/♥/*                       | /root/base (0)                       |
    |                                     | -> /root/capture (1)                 |
    |                                     | => /root/arg                         |
    '-------------------------------------+--------------------------------------'

And if the requested URL uses Unicode characters in your captures or args (such as
C<http://localhost/base/♥/♥/♥/♥>) you should see the arguments and captures as their
Unicode characters as well:

    [debug] Arguments are "♥"
    [debug] "GET" request for "base/♥/♥/♥/♥" from "127.0.0.1"
    .------------------------------------------------------------+-----------.
    | Action                                                     | Time      |
    +------------------------------------------------------------+-----------+
    | /root/base                                                 | 0.000080s |
    | /root/capture                                              | 0.000075s |
    | /root/arg                                                  | 0.000755s |
    '------------------------------------------------------------+-----------'

Again, remember that we are displaying the Unicode character and using it to match actions
containing such multibyte characters, BUT over HTTP you are getting these as URL-encoded
bytes.  For example, if you looked at the L<PSGI> C<$env> value for C<REQUEST_URI> you
would see (for the above request):

    REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"
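As a rough illustration of that decoding step, the following sketch turns the raw path back
into characters in two stages.  It uses only the core L<Encode> module; the helper name
C<decode_request_path> is hypothetical and this is not a copy of what L<Catalyst> does
internally, just the same idea in miniature.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Catalyst): turn a percent-encoded
# request path into a string of Perl characters.
sub decode_request_path {
    my ($raw) = @_;

    # Stage 1: undo the percent-encoding, which yields raw UTF-8 bytes.
    (my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Stage 2: decode the UTF-8 bytes into characters.
    return decode('UTF-8', $bytes);
}

my $path     = decode_request_path('/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5');
my @segments = grep { length } split m{/}, $path;

# @segments now holds 'base' followed by four single-character
# strings, each containing U+2665 (the heart).
```

The order matters: percent-decoding first gives bytes, and only then can those bytes be
interpreted as UTF-8.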

So on the incoming request we decode the URL encoding and then the UTF8 bytes, so that we
can match and display Unicode characters.  This makes it straightforward to use these types of
multibyte characters in your actions and see them incoming in captures and arguments.  Please
keep this in mind if you are doing, for example, regular expression matching, length determination
or other string comparisons: you will need to treat these incoming variables as UTF8 character
strings.  For example, in the following action:

        sub arg :Chained('capture') PathPart('♥') Args(1) {
          my ($self, $c, $arg) = @_;
        }

When C<$arg> is "♥" you should expect C<length($arg)> to be C<1>, since it is indeed one
character, although it takes more than one byte to store.
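The difference between characters and bytes can be checked outside of a controller.  This is
a minimal sketch using the core L<Encode> module; the variable names are illustrative only,
and C<$arg> is simply assigned here rather than received from a request.

```perl
use strict;
use warnings;
use utf8;                               # the source below contains a literal heart
use Encode qw(encode);

my $arg   = '♥';                        # one character, as the action would see it
my $bytes = encode('UTF-8', $arg);      # the same value as raw UTF-8 bytes

my $char_length = length($arg);         # 1: counts characters
my $byte_length = length($bytes);       # 3: counts bytes

# Regular expressions follow the same rule: /^.$/ matches the
# one-character string but not its three-byte encoded form.
my $chars_match = $arg   =~ /^.$/ ? 1 : 0;
my $bytes_match = $bytes =~ /^.$/ ? 1 : 0;
```

If a comparison in your action behaves as though the string were longer than expected, a
likely cause is that one side is still encoded bytes.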

=head2 UTF8 in constructing URLs via $c->uri_for

For the reverse (constructing meaningful URLs to actions that contain multibyte characters
in their paths or path parts, or when you want to include such characters in your captures
or arguments) L<Catalyst> will do the right thing (again just remember to use the C<utf8>
pragma).

    use utf8;
    my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']);

When you stringify this object (for use in a template, for example) it will automatically
do the right thing regarding utf8 encoding and url encoding.

    http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5
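To see roughly where that string comes from, here is a sketch of the two encoding steps
applied by hand.  The helper C<encode_path_segment> is hypothetical and uses only the core
L<Encode> module; it is not how C<uri_for> is implemented, and it assumes every character
outside the RFC 3986 "unreserved" set should be escaped.

```perl
use strict;
use warnings;
use utf8;                               # the source below contains literal hearts
use Encode qw(encode);

# Hypothetical helper (not Catalyst's internals): percent-encode one
# path segment.
sub encode_path_segment {
    my ($segment) = @_;

    # Step 1: characters to UTF-8 bytes.
    my $octets = encode('UTF-8', $segment);

    # Step 2: escape every byte outside the "unreserved" set.
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;

    return $octets;
}

my @parts = ('base', '♥', '♥', '♥', '♥');
my $url   = join '/', 'http://localhost',
    map { encode_path_segment($_) } @parts;

# The heart is 1 character before encoding and 9 characters
# ("%E2%99%A5") afterwards.
my $before = length('♥');
my $after  = length(encode_path_segment('♥'));
```

In an application you would let C<< $c->uri_for >> do this for you; the sketch only shows why
the stringified URL is longer than the character count of its parts suggests.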

Again, what you want here is a properly URL-encoded version of the URL.  In this case your string
length will reflect URL encoded bytes, not the character length.  Ultimately what you want


