HTML-StripScripts


lib/HTML/StripScripts.pm


    return $1
        if $self->{_hssCfg}{AllowRelURL}
        and $text =~ /^((?:[\w\-.!~*|;\/?=+\$,%#]|&amp;){0,100})$/;

    $text =~ m< ^ ( https? :// [\w\-\.]{1,100} (?:\:\d{1,5})?
                    (?: / (?:[\w\-.!~*|;/?=+\$,%#]|&amp;){0,100} )?
                  )
                $
              >x ? $1 : undef;
}

=item validate_mailto ( TEXT )

If the C<AllowMailto> filter configuration option is set, then this
method is used to validate C<href> type attribute values which begin
with C<mailto:>.  TEXT is the attribute value in canonical form.
Returns a possibly modified attribute value (in canonical form) or C<undef>
to reject the attribute.

This uses a lightweight regex and does not guarantee that email
addresses are well formed.  For stricter handling, use an attribute
callback (see L<"Attribute Callbacks">).
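
As an illustration of stricter handling, the following is a minimal
sketch of a check that also requires a dotted domain name.  The name
C<strict_mailto> and the exact pattern are assumptions made for this
example; they are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::StripScripts: accepts a
# canonical-form "mailto:" value only if the domain is made of at
# least two dot-separated labels.  Returns undef (or an empty list)
# to signal rejection, mirroring the validate_* convention.
sub strict_mailto {
    my ($text) = @_;
    return unless defined $text;

    return $1
        if $text =~ m/^(
            mailto:
            [\w.+\-]{1,64}                      # local part, deliberately narrow
            \@
            [A-Za-z0-9\-]{1,63}                 # first domain label
            (?: \. [A-Za-z0-9\-]{1,63} ){1,8}   # one or more further labels
            )$/x;
    return;
}

print defined strict_mailto('mailto:user@example.com') ? "ok\n" : "rejected\n";
print defined strict_mailto('mailto:user@localhost')   ? "ok\n" : "rejected\n";
```

A subclass could reuse the same pattern inside its own
C<validate_mailto> method, taking C<$self> as the first argument.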

=cut

sub validate_mailto {
    my ( $self, $text ) = @_;

    return $1
        if $text =~ m/^(
            mailto:[\w\-!#\$%&'*+\/=?^_`{|}~.]{1,64}     # localpart
            \@                                          # @
            [\w\-\.]{1,100}                             # domain
            (?:                                         # opt query string
                \?
                (?:[\w\-.!~*|;\/?=+\$,%#]|&amp;){0,100}
            )?
            )$/x;
    return;
}

=item validate_src_attribute ( TEXT )

If the C<AllowSrc> filter configuration option is set, then this
method is used to validate C<src> type attribute values.  TEXT is
the attribute value in canonical form.  Returns a possibly modified
attribute value (in canonical form) or C<undef> to reject the attribute.

The default implementation is an alias for C<validate_href_attribute()>
and so behaves identically.

=cut

*validate_src_attribute = \&validate_href_attribute;

=back

=head1 OTHER METHODS TO OVERRIDE

As well as the output, reject, init and cdata methods listed above,
it might make sense for subclasses to override the following methods:

=over

=item filter_text ( TEXT )

This method will be invoked to filter blocks of non-tag text in the
input document.  Both input and output are in canonical form, see
L</"CANONICAL FORM">.

The default implementation does no filtering.
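
The following sketch shows what an override might look like.  So that
it runs on its own, it uses a stand-in base class; C<My::StandInFilter>
and C<My::MaskingFilter> are hypothetical names, and a real subclass
would inherit from this module instead.

```perl
use strict;
use warnings;

# Stand-in for the real base class so this sketch is self-contained.
# Its filter_text mirrors the default: return TEXT unchanged.
package My::StandInFilter;
sub new { return bless {}, shift }
sub filter_text { my ( $self, $text ) = @_; return $text }

# Hypothetical subclass that masks one word in non-tag text.
package My::MaskingFilter;
our @ISA = ('My::StandInFilter');

sub filter_text {
    my ( $self, $text ) = @_;

    # Input and output are both in canonical form, so entities such
    # as "&amp;" pass through untouched.
    $text =~ s/\bdarn\b/****/gi;
    return $text;
}

package main;

my $filter = My::MaskingFilter->new;
print $filter->filter_text('Well, darn it &amp; all'), "\n";
# prints: Well, **** it &amp; all
```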

=cut

sub filter_text {
    my ( $self, $text ) = @_;

    return $text;
}

=item escape_html_metachars ( TEXT )

This method is used to escape all HTML metacharacters in TEXT.
The return value must be a copy of TEXT with metacharacters escaped.

The default implementation escapes a minimal set of
metacharacters for security against XSS vulnerabilities.  The set
of characters to escape is a compromise between the need for
security and the need to ensure that the filter will work for
documents in as many different character sets as possible.

Subclasses which make strong assumptions about the document
character set will be able to escape much more aggressively.
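
To make the effect of the minimal set concrete, here is a standalone
sketch that applies the same five substitutions to a sample string.
The names C<%escape> and C<escape_minimal> are chosen for this example
only.

```perl
use strict;
use warnings;

# The same five metacharacters the default implementation escapes.
my %escape = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical standalone equivalent of the default behaviour:
# returns a copy of the text with each metacharacter replaced.
sub escape_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$escape{$1}/g;
    return $text;
}

print escape_minimal(q{<a title="x">Tom & Jerry's</a>}), "\n";
# prints: &lt;a title=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

All other characters, including non-ASCII ones, pass through
unchanged, which is what keeps the default usable across character
sets.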

=cut

use vars qw(%_Escape_HTML_map);

BEGIN {
    %_Escape_HTML_map = ( '&' => '&amp;',
                          '<' => '&lt;',
                          '>' => '&gt;',
                          '"' => '&quot;',
                          "'" => '&#39;',
    );
}

sub escape_html_metachars {
    my ( $self, $text ) = @_;

    $text =~ s#([&<>"'])# $_Escape_HTML_map{$1} #ge;
    return $text;
}

=item strip_nonprintable ( TEXT )

Returns a copy of TEXT with runs of nonprintable characters replaced
with spaces or some other harmless string.  Avoids replacing anything
with the empty string, as that can lead to other security issues.


