HTML-EntityReference
lib/HTML/EntityReference.pm
# General Punctuation
ensp => 8194, # en space
emsp => 8195, # em space
thinsp => 8201, # thin space
zwnj => 8204, # zero width non-joiner
zwj => 8205, # zero width joiner
lrm => 8206, # left-to-right mark
rlm => 8207, # right-to-left mark
ndash => 8211, # en dash
mdash => 8212, # em dash
lsquo => 8216, # left single quotation mark
rsquo => 8217, # right single quotation mark
sbquo => 8218, # single low-9 quotation mark
ldquo => 8220, # left double quotation mark
rdquo => 8221, # right double quotation mark
bdquo => 8222, # double low-9 quotation mark
dagger => 8224, # dagger
Dagger => 8225, # double dagger
permil => 8240, # per mille sign
lsaquo => 8249, # single left-pointing angle quotation mark
# lsaquo is proposed but not yet ISO standardized
rsaquo => 8250, # single right-pointing angle quotation mark
# rsaquo is proposed but not yet ISO standardized
euro => 8364, # euro sign
);
our %HTML5_draft;
=head2 %HTML5_draft
The package variable C<%HTML5_draft> contains the entities defined as part of the HTML5 standard, a work in progress. These are taken from L<http://dev.w3.org/html5/spec/named-character-references.html#named-character-references>. This is loaded on...
Unlike the HTML4 entity chart, this chart contains some entries that expand to more than one code point. These can be combining characters or variation selectors, and in a couple of cases they really are two separate characters.
=head2 other charts
Others will be added.
=head2 custom charts
You can pass your own chart data to the various functions, to be used instead of or in addition to the built-in charts. To do this, pass a reference to the hash as an element in the I<include> or I<exclude> list.
Besides adding your own entities, you can also duplicate existing ones to override what gets generated (e.g. precomposed vs decomposed form), or to control which entity takes priority in inverse lookups.
(This might work in this version but has not been tested yet.)
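Below is a minimal, self-contained sketch of the ordered lookup that a custom chart takes part in. In it, an earlier chart in the list overrides a later one. The two hashes are hypothetical stand-ins for real charts, and C<lookup> is an illustrative helper, not part of this module. With the module itself, the corresponding call would pass C<[ \%custom, 'HTML4' ]> as the I<include> parameter. As noted above, that path is untested.

```perl
use strict;
use warnings;

# Hypothetical subset of a built-in chart (values as in the HTML4 table).
my %builtin = ( euro => 8364, mdash => 8212 );

# Hypothetical custom chart: overrides mdash with a plain hyphen-minus
# and adds an entity name that the built-in subset lacks.
my %custom = ( mdash => 45, gear => 0x2699 );

# Return the value from the first table that defines $entity, or an
# empty list (undef in scalar context) if none does.
sub lookup {
    my ($entity, @tables) = @_;
    for my $table (@tables) {
        my $val = $table->{$entity};
        return $val if defined $val;
    }
    return;
}

print lookup('mdash', \%custom, \%builtin), "\n";  # 45: custom chart wins
print lookup('mdash', \%builtin, \%custom), "\n";  # 8212: built-in wins
print lookup('euro',  \%custom, \%builtin), "\n";  # 8364: falls through
```

Putting the custom chart first is what gives it priority. Putting it last only fills in names that the built-in charts lack.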
=cut
## >> Other charts will go here.
my %arg_map= (
HTML4 => \%W3C_Entities,
HTML5_draft => [ \%HTML5_draft, "HTML/Entity-HTML5_draft.pl.inc" ],
':all' => [qw/ HTML4 HTML5_draft /]
);
=head1 Functions
The function calls also provide an easy way to check multiple tables in one go. They also hide how the data is actually stored, handle the simple cases, and take care of busy details that you might not have thought of like multi...
=head2 (parameters)
In general, the functions take the thing to be converted as the first parameter, and can take one or two additional optional arguments. Only the C<format> function departs from this pattern, taking another parameter first.
The second parameter specifies the chart or charts to use. It is commonly referred to as the C<include> parameter, because the third parameter works the same way but specifies charts to C<exclude>.
The C<include> parameter may be a string or an array reference. The string is the name of a chart or the name of a bundle. The chart names available are C<"HTML4"> and C<"HTML5_draft">. The only bundle name available is C<":all">. Others will be ...
If you have more to say than just one string, you can use an array reference instead. Each element of the array can be a string as explained above. An item can also be a hash reference, which is a custom chart.
If more than one item is given as the include parameter, they are checked in order until something is found or the list exhausted.
The C<exclude> parameter is not implemented yet.
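The I<include> rules above can be sketched as a function that turns any accepted form of the parameter into an ordered list of charts. This is a hypothetical, eager illustration: C<expand_include>, C<%charts> and C<%bundles> are invented for the example. The module's own C<_next_arg> instead walks the list one item at a time and also triggers delay loading of the HTML5 chart.

```perl
use strict;
use warnings;

# Hypothetical charts and bundle table, standing in for the module's
# internal chart map (the real HTML5_draft chart is loaded on demand).
my %charts = (
    HTML4       => { euro => 8364, mdash => 8212 },
    HTML5_draft => { euro => 8364, fjlig => [ 0x66, 0x6A ] },
);
my %bundles = ( ':all' => [qw/ HTML4 HTML5_draft /] );

# Flatten an include parameter into an ordered list of hash references.
# Accepts a chart name, a bundle name, a hash ref (custom chart), or an
# array ref mixing any of these. The caller's array is copied, not modified.
sub expand_include {
    my ($include) = @_;
    my @queue = ref $include eq 'ARRAY' ? @$include : ($include);
    my @tables;
    while (@queue) {
        my $item = shift @queue;
        if (ref $item eq 'HASH') {              # custom chart: use as-is
            push @tables, $item;
        }
        elsif (my $names = $bundles{$item}) {   # bundle: expand in place
            unshift @queue, @$names;
        }
        else {                                  # plain chart name
            my $chart = $charts{$item} or die "No such table $item.\n";
            push @tables, $chart;
        }
    }
    return @tables;
}

my %custom = ( mdash => 45 );
my @tables = expand_include([ \%custom, ':all' ]);
print scalar(@tables), "\n";   # 3: custom, HTML4, HTML5_draft
```

Expanding a bundle in place, rather than appending it, keeps the caller's ordering intact. The charts named by C<:all> are therefore checked exactly where the bundle name appeared.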
=cut
sub _next_arg
{
my $arglist= shift;
my $arg= shift(@$arglist) // return ; # pop off next argument
return $arg if ref($arg); # user put table ref directly in list, not a name.
if ($arg =~ /^:/) {
# it is a name for more arguments
my $list= $arg_map{$arg} // croak "No such option $arg.";
unshift @$arglist, @$list;
$arg= shift(@$arglist);
}
# look up the argument, and load if necessary.
my $value= $arg_map{$arg} // croak "No such table $arg.";
if (ref $value eq 'ARRAY') { # as opposed to a hash
# it is a delay load entry
my ($table, $name)= @$value;
require $name unless %$table;
$arg_map{$arg}= $table; # don't check again next time.
$value= $table;
}
return $value;
}
=head2 ordinal
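When no chart is specified, calling C<$n=HTML::EntityReference::ordinal($entity);> is the same as looking it up in the data hash: C<$n=$HTML::EntityReference::W3C_Entities{$entity};>. It returns the code point if C<$entity> is listed, or C<undef> otherwise. If an I<include> parameter is given, each chart is checked in order and the first defined value is returned.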
The return value is normally a number, the integer value of the code point that the entity refers to. In the case of multi-valued entities, the return value is an array reference.
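Because the return value may be either a number or an array reference, a caller that wants text has to handle both shapes. Below is a minimal sketch that assumes only the return convention just described. C<ordinal_to_string> and the two-entry C<%chart> are hypothetical. The C<fjlig> entry reflects the HTML5 "fj" reference, which expands to two ordinary characters.

```perl
use strict;
use warnings;

# Hypothetical chart entries in the shape ordinal() returns: a plain
# integer for one code point, an array reference for several.
my %chart = (
    euro  => 8364,           # euro sign
    fjlig => [ 0x66, 0x6A ], # "fj": two separate characters in HTML5
);

# Convert an ordinal() result into a Perl string; undef stays undef.
sub ordinal_to_string {
    my ($value) = @_;
    return undef unless defined $value;
    return ref $value eq 'ARRAY'
        ? join('', map { chr($_) } @$value)
        : chr($value);
}

print ordinal_to_string($chart{fjlig}), "\n";   # fj
```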
=cut
sub ordinal
{
my ($entity, $include, $exclude)= @_;
# >> TODO: handle excludes
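return $W3C_Entities{$entity} unless defined $include; # default meaning if no argument
# Copy the list so that _next_arg's shifting doesn't consume the caller's array;
# a single name (or a bare custom hash ref) is wrapped as a one-element list.
$include= ref $include eq 'ARRAY' ? [ @$include ] : [ $include ];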
while (my $table= _next_arg($include)) {
my $val= $$table{$entity};
return $val if defined $val;
}
return; # not found anywhere it looked.
}