HTML-EntityReference


lib/HTML/EntityReference.pm

    #   General Punctuation
    ensp    => 8194, # en space
    emsp    => 8195, # em space
    thinsp  => 8201, # thin space
    zwnj    => 8204, # zero width non-joiner
    zwj     => 8205, # zero width joiner
    lrm     => 8206, # left-to-right mark
    rlm     => 8207, # right-to-left mark
    ndash   => 8211, # en dash
    mdash   => 8212, # em dash
    lsquo   => 8216, # left single quotation mark
    rsquo   => 8217, # right single quotation mark
    sbquo   => 8218, # single low-9 quotation mark
    ldquo   => 8220, # left double quotation mark
    rdquo   => 8221, # right double quotation mark
    bdquo   => 8222, # double low-9 quotation mark
    dagger  => 8224, # dagger
    Dagger  => 8225, # double dagger
    permil  => 8240, # per mille sign
    lsaquo  => 8249, # single left-pointing angle quotation mark
        # lsaquo is proposed but not yet ISO standardized
    rsaquo  => 8250, # single right-pointing angle quotation mark
        # rsaquo is proposed but not yet ISO standardized
    euro    => 8364, # euro sign
    );

our %HTML5_draft;

=head2 %HTML5_draft

The package variable C<%HTML5_draft> contains the entities defined as part of the HTML5 standard, a work in progress.  These are taken from L<http://dev.w3.org/html5/spec/named-character-references.html#named-character-references>.  This is loaded on...

Unlike the existing standard HTML Entity chart, this chart contains some entries that expand to more than one code point.  These can involve combining characters or variation selectors, and in a couple of cases they really are two separate characters.
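
A minimal sketch of handling both shapes of chart value, assuming (as the C<ordinal> documentation describes) that a multi-code-point entry is stored as an array reference.  The C<%chart> hash and the C<expand> helper are hypothetical, written only for illustration; they are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical miniature chart: one single-code-point entry and two
# multi-code-point entries stored as array references.
my %chart = (
    ndash         => 8211,                # en dash
    fjlig         => [ 0x66, 0x6A ],      # two ordinary letters, "fj"
    NotEqualTilde => [ 0x2242, 0x0338 ],  # base character plus combining overlay
);

# Turn a chart value (number or array ref) into a character string.
sub expand {
    my ($value) = @_;
    return unless defined $value;
    my @points = ref $value eq 'ARRAY' ? @$value : ($value);
    return join '', map { chr } @points;
}

for my $name (sort keys %chart) {
    my $text = expand($chart{$name});
    printf "%-14s %d code point(s)\n", $name, length $text;
}
```

Normalizing to a list first keeps the single-value and multi-value cases on one code path.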

=head2 other charts

Others will be added.

=head2 custom charts

You can pass your own chart data to the various functions, to be used instead of or in addition to the built-in charts.  Do this by passing a reference to the hash as an element in the I<include> or I<exclude> list.

In addition to adding your own custom entities, you can also duplicate existing entities in order to override what gets generated (e.g. precomposed vs decomposed form), or provide priority in inverse lookups.

(This might work in this version but has not been tested yet)
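
A sketch of the override idea, using a stand-in lookup rather than this module's own functions.  C<%builtin>, C<%custom> and C<lookup> are hypothetical names; the custom chart is listed first so that its decomposed form of C<eacute> wins over the precomposed one.

```perl
use strict;
use warnings;

# Stand-in for a built-in chart: eacute as the precomposed U+00E9.
my %builtin = ( eacute => 233, ndash => 8211 );

# Custom chart: same entity, decomposed into "e" plus a combining acute accent.
my %custom = ( eacute => [ 0x65, 0x0301 ] );

# Hypothetical lookup: return the first defined value among the charts given.
sub lookup {
    my ($entity, @charts) = @_;
    for my $chart (@charts) {
        my $value = $chart->{$entity};
        return $value if defined $value;
    }
    return;
}

my $eacute = lookup('eacute', \%custom, \%builtin);  # array ref from %custom
my $ndash  = lookup('ndash',  \%custom, \%builtin);  # falls through to %builtin
print "eacute: @$eacute\n";
print "ndash: $ndash\n";
```

Entities that the custom chart does not mention are unaffected, since the lookup simply moves on to the next chart.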

=cut
    
## >> Other charts will go here.


my %arg_map= (
    HTML4 => \%W3C_Entities,
    HTML5_draft => [ \%HTML5_draft,  "HTML/Entity-HTML5_draft.pl.inc" ],
    ':all' => [qw/ HTML4 HTML5_draft /]
    );


=head1 Functions

The functions provide an easy way to check multiple tables in one go.  They also abstract the way the data is actually stored, handle the simple cases, and take care of fiddly details that you might not have thought of, like multi...

=head2 (parameters)

In general, the functions take the thing to be converted as the first parameter, and can take one or two additional optional arguments.  Only the C<format> function doesn't follow this pattern exactly, taking another parameter first.

The second parameter specifies the chart or charts to use.  This is commonly referred to as the C<include> parameter, because the third parameter works the same way but specifies things to C<exclude>.

The C<include> parameter may be a string or an array reference.  The string is the name of a chart or the name of a bundle.  The chart names available are C<"HTML4"> and C<"HTML5_draft">.  The only bundle name available is C<":all">.  Others will be ...

If you have more to say than just one string, you can use an array reference instead.  Each element of the array can be a string as explained above.  An item can also be a hash reference, which is a custom chart.

If more than one item is given as the include parameter, they are checked in order until something is found or the list is exhausted.
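
The rules above can be sketched as a small stand-alone resolver.  This is an illustration under stated assumptions, not this module's implementation: C<%html4>, C<%html5>, C<resolve> and C<first_hit> are hypothetical names, and the charts are trimmed to a few entries.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the built-in charts, trimmed to a few entries.
my %html4 = ( ndash => 8211, mdash => 8212 );
my %html5 = ( ndash => 8211, fjlig => [ 0x66, 0x6A ] );

my %names   = ( HTML4 => \%html4, HTML5_draft => \%html5 );
my %bundles = ( ':all' => [qw/ HTML4 HTML5_draft /] );

# Expand an include parameter (string or array ref) into a list of hash refs.
sub resolve {
    my ($include) = @_;
    my @pending = ref $include eq 'ARRAY' ? @$include : ($include);
    my @charts;
    while (defined(my $item = shift @pending)) {
        if    (ref $item)              { push @charts, $item }                     # custom chart
        elsif (exists $bundles{$item}) { unshift @pending, @{ $bundles{$item} } }  # bundle name
        elsif (exists $names{$item})   { push @charts, $names{$item} }             # chart name
        else                           { die "No such table $item.\n" }
    }
    return @charts;
}

# Check each chart in order and stop at the first hit.
sub first_hit {
    my ($entity, $include) = @_;
    for my $chart (resolve($include)) {
        return $chart->{$entity} if defined $chart->{$entity};
    }
    return;
}

my %mine = ( ndash => 45 );  # custom chart mapping ndash to a plain hyphen
print first_hit('ndash', [ \%mine, ':all' ]), "\n";  # found in %mine
print first_hit('mdash', [ \%mine, ':all' ]), "\n";  # found in the HTML4 stand-in
```

The sketch copies the list before consuming it, so the caller's array is left unchanged.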

The C<exclude> parameter is not implemented yet.

=cut

sub _next_arg
 {
 my $arglist= shift;
 my $arg= shift(@$arglist) // return ;  # take the next argument off the front of the list
 return $arg  if ref($arg);  # user put table ref directly in list, not a name.
 if ($arg =~ /^:/) {
    # it is a name for more arguments
    my $list= $arg_map{$arg} // croak "No such option $arg.";
    unshift @$arglist, @$list;
    $arg= shift(@$arglist);
    }
 # look up the argument, and load if necessary.
 my $value= $arg_map{$arg} // croak "No such table $arg.";
 if (ref $value eq 'ARRAY') {  # as opposed to a hash
    # it is a delay load entry
    my ($table, $name)= @$value;
    require $name unless %$table;
    $arg_map{$arg}= $table;  # don't check again next time.
    $value= $table;
    }
 return $value;
 }

=head2 ordinal

Calling C<$n=HTML::EntityReference::ordinal($entity);> with no C<include> argument is the same as looking it up in the data hash: C<$n=$HTML::EntityReference::W3C_Entities{$entity};>.  It will return the code point if the C<$entity> is listed, or C<undef> otherwise.

The return value is normally a number, the integer value of the code point that the entity refers to.  In the case of multi-valued entities, the return value is an array reference.
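
Because the return value can take either shape, calling code will usually want to normalize it first.  A minimal sketch, where C<code_points> and C<numeric_refs> are hypothetical helpers and not part of this module:

```perl
use strict;
use warnings;

# Normalize a value of either shape into a flat list of code points.
sub code_points {
    my ($value) = @_;
    return () unless defined $value;
    return ref $value eq 'ARRAY' ? @$value : ($value);
}

# Render the code points as HTML numeric character references.
sub numeric_refs {
    my ($value) = @_;
    return join '', map { sprintf '&#x%X;', $_ } code_points($value);
}

print numeric_refs(8212), "\n";                # single code point
print numeric_refs([ 0x2242, 0x0338 ]), "\n";  # multi-valued entity
```

An undefined value (entity not found) yields an empty string here; a caller that needs to tell "not found" apart should test for C<undef> before formatting.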

=cut

sub ordinal
 {
 my ($entity, $include, $exclude)= @_;
 # >> TODO: handle excludes
 return $W3C_Entities{$entity}  unless defined $include;  # default meaning if no argument
 $include= ref $include eq 'ARRAY' ? [ @$include ] : [ $include ];  # work on a copy so the caller's list is not consumed; a single name may be given directly
 while (my $table= _next_arg($include)) {
    my $val= $$table{$entity};
    return $val if defined $val;
    }
 return;  # not found anywhere it looked.
 }


