HTML-StripScripts

        whitelist.

        The default implementation simply replaces all ampersands with
        "&amp;", since that corresponds with the way most browsers treat
        entities in unquoted values.

    canonical_form_to_text ( TEXT )
        This method is used to convert the text in canonical form returned
        by the filter_text() method to a form suitable for inclusion in the
        output document.

        The default implementation runs anything that doesn't look like a
        valid entity through the escape_html_metachars() method.

    canonical_form_to_attval ( ATTVAL )
        This method is used to convert the text in canonical form returned
        by the "AttVal" handler subs to a form suitable for inclusion in
        double quotes in the output tag.

        The default implementation converts CR, LF and TAB characters to a
        single space, and runs anything that doesn't look like a valid
        entity through the escape_html_metachars() method.

    validate_href_attribute ( TEXT )
        If the "AllowHref" filter configuration option is set, then this
        method is used to validate "href" type attribute values. TEXT is the
        attribute value in canonical form. Returns a possibly modified
        attribute value (in canonical form) or "undef" to reject the
        attribute.

        The default implementation allows only absolute "http" and "https"
        URLs, permits port numbers and query strings, and imposes reasonable
        length limits.

        It does not URI-escape the query string, and it does not guarantee
        properly formatted URIs; it only tries to return safe ones. You can
        always use an attribute callback (see "Attribute Callbacks") to
        provide stricter handling.
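
        As a rough sketch of stricter handling, a subclass can also override
        validate_href_attribute() itself. The package name
        My::HostLimitedFilter and the host allow-list below are hypothetical
        and not part of HTML::StripScripts. The override defers to the
        default implementation first, then rejects any URL whose host is not
        listed.

```perl
package My::HostLimitedFilter;    # hypothetical subclass name

use strict;
use warnings;
use base 'HTML::StripScripts';

# Hypothetical allow-list of hosts; adjust to suit the application.
my %ALLOWED_HOST = map { $_ => 1 } qw(example.com www.example.com);

sub validate_href_attribute {
    my ( $self, $text ) = @_;

    # Run the default checks first; undef means "reject the attribute".
    my $checked = $self->SUPER::validate_href_attribute($text);
    return undef unless defined $checked;

    # Pull the host out of an absolute http/https URL. Anything without
    # such a host (for example a relative URL) is rejected here.
    my ($host) = $checked =~ m{^https?://([^/:?#]+)}i;
    return undef unless defined($host) && $ALLOWED_HOST{ lc($host) };

    return $checked;
}

package main;

# Feed a small document through the filter by hand.
my $filter = My::HostLimitedFilter->new(
    { Context => 'Inline', AllowHref => 1 } );
$filter->input_start_document;
$filter->input_start('<a href="http://example.com/docs">');
$filter->input_text('docs');
$filter->input_end('</a>');
$filter->input_end_document;
print $filter->filtered_document, "\n";
```

        Calling the default implementation first keeps its length limits
        and scheme checks in force, so the subclass only adds a
        restriction and never loosens one.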

    validate_mailto ( TEXT )
        If the "AllowMailto" filter configuration option is set, then this
        method is used to validate "href" type attribute values which begin
        with "mailto:". TEXT is the attribute value in canonical form.
        Returns a possibly modified attribute value (in canonical form) or
        "undef" to reject the attribute.

        This uses a lightweight regex and does not guarantee that email
        addresses are properly formatted. You can always use an attribute
        callback (see "Attribute Callbacks") to provide stricter handling.

    validate_src_attribute ( TEXT )
        If the "AllowSrc" filter configuration option is set, then this
        method is used to validate "src" type attribute values. TEXT is the
        attribute value in canonical form. Returns a possibly modified
        attribute value (in canonical form) or "undef" to reject the
        attribute.

        The default implementation behaves the same as
        validate_href_attribute().

OTHER METHODS TO OVERRIDE
    As well as the output, reject, init and cdata methods listed above, it
    might make sense for subclasses to override the following methods:

    filter_text ( TEXT )
        This method will be invoked to filter blocks of non-tag text in the
        input document. Both input and output are in canonical form, see
        "CANONICAL FORM".

        The default implementation does no filtering.
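
        A minimal sketch of an override follows, assuming a hypothetical
        subclass (My::MaskingFilter) and a hypothetical word list. The
        replacement string contains no ampersands or markup, so the output
        stays in canonical form.

```perl
package My::MaskingFilter;    # hypothetical subclass name

use strict;
use warnings;
use base 'HTML::StripScripts';

# Hypothetical list of words to mask in non-tag text.
my $MASKED = qr/\b(?:frobnicate|xyzzy)\b/i;

sub filter_text {
    my ( $self, $text ) = @_;

    # Replace each listed word with a harmless placeholder.
    $text =~ s/$MASKED/[masked]/g;

    return $text;
}

package main;

# The filter normally invokes filter_text() itself; it is called
# directly here only to show its effect.
print My::MaskingFilter->filter_text('Say xyzzy twice'), "\n";
# prints "Say [masked] twice"
```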

    escape_html_metachars ( TEXT )
        This method is used to escape all HTML metacharacters in TEXT. The
        return value must be a copy of TEXT with metacharacters escaped.

        The default implementation escapes a minimal set of metacharacters
        for security against XSS vulnerabilities. The set of characters to
        escape is a compromise between the need for security and the need to
        ensure that the filter will work for documents in as many different
        character sets as possible.

        Subclasses which make strong assumptions about the document
        character set will be able to escape much more aggressively.
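
        One possible shape for such a subclass is sketched below. It
        assumes an ASCII-compatible document character set, and the package
        name My::AsciiEscapingFilter is hypothetical. It keeps the default
        escaping and additionally writes backticks as numeric entities.

```perl
package My::AsciiEscapingFilter;    # hypothetical subclass name

use strict;
use warnings;
use base 'HTML::StripScripts';

sub escape_html_metachars {
    my ( $self, $text ) = @_;

    # Apply the default escaping first.
    $text = $self->SUPER::escape_html_metachars($text);

    # Assumption: the character set is ASCII-compatible, so 0x60 really
    # is a backtick. Emit it as a numeric entity instead.
    $text =~ s/`/&#96;/g;

    return $text;
}

package main;

my $filter = My::AsciiEscapingFilter->new( { Context => 'Inline' } );
print $filter->escape_html_metachars('a`b'), "\n";
```

        The extra substitution runs after the default escaping so that it
        never touches the ampersands or semicolons of entities the default
        implementation has just produced.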

    strip_nonprintable ( TEXT )
        Returns a copy of TEXT with runs of nonprintable characters replaced
        with spaces or some other harmless string. Avoids replacing anything
        with the empty string, as that can lead to other security issues.

        The default implementation strips out only NULL characters, in order
        to avoid scrambling text for as many different character sets as
        possible.

        Subclasses which make some sort of assumption about the character
        set in use will be able to have a much wider definition of a
        nonprintable character, and hence a more secure strip_nonprintable()
        implementation.
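
        As an illustration, the sketch below assumes an ASCII-compatible
        character set (for example UTF-8 or ISO-8859-1) and a hypothetical
        subclass name. It treats the C0 control characters other than TAB,
        LF and CR, plus DEL, as nonprintable, and replaces each run with a
        single space rather than the empty string.

```perl
package My::ControlStrippingFilter;    # hypothetical subclass name

use strict;
use warnings;
use base 'HTML::StripScripts';

sub strip_nonprintable {
    my ( $self, $text ) = @_;

    # Assumption: ASCII-compatible text, so 0x00-0x08, 0x0B, 0x0C,
    # 0x0E-0x1F and 0x7F are control characters. TAB (0x09), LF (0x0A)
    # and CR (0x0D) are deliberately left alone.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/ /g;

    return $text;
}

package main;

# Called directly here only to show the effect on a sample string.
print My::ControlStrippingFilter->strip_nonprintable("one\x00\x01two"), "\n";
# prints "one two"
```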

ATTRIBUTE VALUE HANDLER SUBS
    References to the following subs appear in the "AttVal" whitelist
    returned by the init_attval_whitelist() method.

    _hss_attval_style ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for the "style" attribute.

    _hss_attval_size ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values are some sort of
        size or length.

    _hss_attval_number ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values are a simple
        integer.

    _hss_attval_color ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for color attributes.

    _hss_attval_text ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for text attributes.

    _hss_attval_word ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values must consist of
        a single short word, with minus characters permitted.


