unicode results from the CPAN

AnyEvent

=encoding utf-8

=head1 NAME

AnyEvent::Util - various utility functions.

=head1 SYNOPSIS

   use AnyEvent::Util;

=head1 DESCRIPTION

This module implements various utility functions, mostly replacing
well-known functions by event-ised counterparts.

All functions documented without C<AnyEvent::Util::> prefix are exported
by default.

=over 4

=cut

package AnyEvent::Util;

use Carp ();
use Errno ();
use Socket ();

use AnyEvent (); BEGIN { AnyEvent::common_sense }

use base 'Exporter';

our @EXPORT = qw(fh_nonblocking guard fork_call portable_pipe portable_socketpair run_cmd);
our @EXPORT_OK = qw(
   AF_INET6 WSAEWOULDBLOCK WSAEINPROGRESS WSAEINVAL
   close_all_fds_except
   punycode_encode punycode_decode idn_nameprep idn_to_ascii idn_to_unicode
);

our $VERSION = $AnyEvent::VERSION;

BEGIN {
   # provide us with AF_INET6, but only if allowed
   if (
      $AnyEvent::PROTOCOL{ipv6}
      && _AF_INET6
      && socket my $ipv6_socket, _AF_INET6, Socket::SOCK_DGRAM(), 0 # check if they can be created
   ) {
      *AF_INET6 = \&_AF_INET6;
   } else {
      # disable ipv6
      *AF_INET6 = sub () { 0 };
      delete $AnyEvent::PROTOCOL{ipv6};
   }

   # fix buggy Errno on some non-POSIX platforms
   # such as openbsd and windows.
   my %ERR = (
      EBADMSG => Errno::EDOM   (),
      EPROTO  => Errno::ESPIPE (),
   );

   while (my ($k, $v) = each %ERR) {
      next if eval "Errno::$k ()";
      AE::log 8 => "Broken Errno module, adding Errno::$k.";

      eval "sub Errno::$k () { $v }";
      push @Errno::EXPORT_OK, $k;
      push @{ $Errno::EXPORT_TAGS{POSIX} }, $k;
   }
}

=item ($r, $w) = portable_pipe

Calling C<pipe> in Perl is portable - except it doesn't really work on
sucky windows platforms (at least not with most perls - cygwin's perl
notably works fine): On windows, you actually get two file handles you
cannot use select on.

This function gives you a pipe that actually works even on the broken
windows platform (by creating a pair of TCP sockets on windows, so do not
expect any speed from that) and using C<pipe> everywhere else.

See C<portable_socketpair>, below, for a bidirectional "pipe".

Returns the empty list on any errors.

=item ($fh1, $fh2) = portable_socketpair

Just like C<portable_pipe>, above, but returns a bidirectional pipe
(usually by calling C<socketpair> to create a local loopback socket pair,
except on windows, where it again returns two interconnected TCP sockets).

Returns the empty list on any errors.

=cut

lib/AnyEvent/Util.pm view on Meta::CPAN

      my (@oldfh, @close);
      for my $fh (values %redir) {
         push @oldfh, $fh; # make sure we keep it open
         $fh = fileno $fh; # we only want the fd

         # dup if we are in the way
         # if we "leak" fds here, they will be dup2'ed over later
         defined ($fh = POSIX::dup ($fh)) or POSIX::_exit (124)
            while exists $redir{$fh};
      }

      # step 3, execute redirects
      while (my ($k, $v) = each %redir) {
         defined POSIX::dup2 ($v, $k)
            or POSIX::_exit (123);
      }

      # step 4, close everything else, except 0, 1, 2
      if ($arg{close_all}) {
         close_all_fds_except 0, 1, 2, keys %redir
      } else {
         POSIX::close ($_)
            for values %redir;
      }

      eval { $arg{on_prepare}(); 1 } or POSIX::_exit (123)
         if exists $arg{on_prepare};

      ref $cmd
         ? exec {$cmd->[0]} @$cmd
         : exec $cmd;

      POSIX::_exit (126);
   }

   ${$arg{'$$'}} = $pid
      if $arg{'$$'};

   %redir = (); # close child side of the fds

   my $status;
   $cv->begin (sub { shift->send ($status) });
   my $cw; $cw = AE::child $pid, sub {
      $status = $_[1];
      undef $cw; $cv->end;
   };

   $cv
}

=item AnyEvent::Util::punycode_encode $string

Punycode-encodes the given C<$string> and returns its punycode form. Note
that uppercase letters are I<not> casefolded - you have to do that
yourself.

Croaks when it cannot encode the string.

=item AnyEvent::Util::punycode_decode $string

Tries to punycode-decode the given C<$string> and return its unicode
form. Again, uppercase letters are not casefoled, you have to do that
yourself.

Croaks when it cannot decode the string.

=cut

sub punycode_encode($) {
   require "AnyEvent/Util/idna.pl";
   goto &punycode_encode;
}

sub punycode_decode($) {
   require "AnyEvent/Util/idna.pl";
   goto &punycode_decode;
}

=item AnyEvent::Util::idn_nameprep $idn[, $display]

Implements the IDNA nameprep normalisation algorithm. Or actually the
UTS#46 algorithm. Or maybe something similar - reality is complicated
between IDNA2003, UTS#46 and IDNA2008. If C<$display> is true then the name
is prepared for display, otherwise it is prepared for lookup (default).

If you have no clue what this means, look at C<idn_to_ascii> instead.

This function is designed to avoid using a lot of resources - it uses
about 1MB of RAM (most of this due to Unicode::Normalize). Also, names
that are already "simple" will only be checked for basic validity, without
the overhead of full nameprep processing.

=cut

our ($uts46_valid, $uts46_imap);

sub idn_nameprep($;$) {
   local $_ = $_[0];

   # lowercasing these should always be valid, and is required for xn-- detection
   y/A-Z/a-z/;

   if (/[^0-9a-z\-.]/) {
      # load the mapping data
      unless (defined $uts46_imap) {
         require Unicode::Normalize;
         require "AnyEvent/Util/uts46data.pl";
      }

      # uts46 nameprep

      # I naively tried to use a regex/transliterate approach first,
      # with one regex and one y///, but the compiled code was 4.5MB.
      # this version has a bit-table for the valid class, and
      # a char-replacement search string

      # for speed (cough) reasons, we skip-case 0-9a-z, -, ., which
      # really ought to be trivially valid. A-Z is valid, but already lowercased.
      s{
         ([^0-9a-z\-.])
      }{

lib/AnyEvent/Util.pm view on Meta::CPAN


            (substr $uts46_imap, $rep, 128) =~ /\x00 .[\x80-\xbf]* ([^\x00]*) \x00/x
               or die "FATAL: idn_nameprep imap table has unexpected contents";

            $rep = $1;
            $chr = $rep unless $rep =~ s/^\x01// && $_[1]; # replace unless deviation and display
            utf8::decode $chr;
         }
         $chr
      }gex;

      # KC
      $_ = Unicode::Normalize::NFKC ($_);
   }

   # decode punycode components, check for invalid xx-- prefixes
   s{
      (^|\.)(..)--([^\.]*)
   }{
      my ($pfx, $ace, $pc) = ($1, $2, $3);

      if ($ace eq "xn") {
         $pc = punycode_decode $pc; # will croak on error (we hope :)

         require Unicode::Normalize;
         $pc eq Unicode::Normalize::NFC ($pc)
            or Carp::croak "$_[0]: punycode label not in NFC detected during idn_nameprep";

         "$pfx$pc"
      } elsif ($ace !~ /^[a-z0-9]{2}$/) {
         "$pfx$ace--$pc"
      } else {
         Carp::croak "$_[0]: hyphens in 3rd/4th position of a label are not allowed";
      }
   }gex;

   # uts46 verification
   /\.-|-\./
      and Carp::croak "$_[0]: invalid hyphens detected during idn_nameprep";

   # missing: label begin with combining mark, idna2008 bidi

   # now check validity of each codepoint
   if (/[^0-9a-z\-.]/) {
      # load the mapping data
      unless (defined $uts46_imap) {
         require "AnyEvent/Util/uts46data.pl";
      }

      vec $uts46_valid, ord, 1
         or $_[1] && 0 <= index $uts46_imap, pack "C0U*", 0, ord, 1 # deviation == \x00$chr\x01
         or Carp::croak "$_[0]: disallowed characters during idn_nameprep"
         for split //;
   }

   $_
}

=item $domainname = AnyEvent::Util::idn_to_ascii $idn

Converts the given unicode string (C<$idn>, international domain name,
e.g. æ—¥æœ¬èªžã€‚ï¼ªï¼°) to a pure-ASCII domain name (this is usually
called the "IDN ToAscii" transform). This transformation is idempotent,
which means you can call it just in case and it will do the right thing.

Unlike some other "ToAscii" implementations, this one works on full domain
names and should never fail - if it cannot convert the name, then it will
return it unchanged.

This function is an amalgam of IDNA2003, UTS#46 and IDNA2008 - it tries to
be reasonably compatible to other implementations, reasonably secure, as
much as IDNs can be secure, and reasonably efficient when confronted with
IDNs that are already valid DNS names.

=cut

sub idn_to_ascii($) {
   return $_[0]
      unless $_[0] =~ /[^\x00-\x7f]/;

   my @output;

   eval {
      # punycode by label
      for (split /\./, (idn_nameprep $_[0]), -1) {
         if (/[^\x00-\x7f]/) {
            eval {
               push @output, "xn--" . punycode_encode $_;
               1;
            } or do {
               push @output, $_;
            };
         } else {
            push @output, $_;
         }
      }

      1
   } or return $_[0];

   shift @output
      while !length $output[0] && @output > 1;

   join ".", @output
}

=item $idn = AnyEvent::Util::idn_to_unicode $idn

Converts the given unicode string (C<$idn>, international domain name,
e.g. æ—¥æœ¬èªžã€‚ï¼ªï¼°, www.deliantra.net, www.xn--l-0ga.de) to
unicode form (this is usually called the "IDN ToUnicode" transform). This
transformation is idempotent, which means you can call it just in case and
it will do the right thing.

Unlike some other "ToUnicode" implementations, this one works on full
domain names and should never fail - if it cannot convert the name, then
it will return it unchanged.

This function is an amalgam of IDNA2003, UTS#46 and IDNA2008 - it tries to
be reasonably compatible to other implementations, reasonably secure, as
much as IDNs can be secure, and reasonably efficient when confronted with
IDNs that are already valid DNS names.

At the moment, this function simply calls C<idn_nameprep $idn, 1>,
returning its argument when that function fails.

=cut

sub idn_to_unicode($) {
   my $res = eval { idn_nameprep $_[0], 1 };
   defined $res ? $res : $_[0]
}

=back

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://anyevent.schmorp.de

=cut

1

( run in 0.547 second using v1.01-cache-2.11-cpan-39bf76dae61 )