HTML-StripScripts
whitelist.
The default implementation simply replaces all ampersands with
"&amp;", since that corresponds with the way most browsers treat
entities in unquoted values.
canonical_form_to_text ( TEXT )
This method is used to convert the text in canonical form returned
by the filter_text() method to a form suitable for inclusion in the
output document.
The default implementation runs anything that doesn't look like a
valid entity through the escape_html_metachars() method.
canonical_form_to_attval ( ATTVAL )
This method is used to convert the text in canonical form returned
by the "AttVal" handler subs to a form suitable for inclusion in
double quotes in the output tag.
The default implementation converts CR, LF and TAB characters to a
single space, and runs anything that doesn't look like a valid
entity through the escape_html_metachars() method.
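The behaviour described above can be sketched in a few lines of standalone Perl. This is an illustration only, not the module's source: the names sketch_attval, $ENTITY and %ESCAPE are hypothetical, and both the "looks like a valid entity" pattern and the set of escaped characters are assumptions.

```perl
use strict;
use warnings;

# Assumed pattern for "looks like a valid entity": named, decimal or hex.
my $ENTITY = qr/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/;

# Assumed minimal metacharacter set; the module's own set may differ.
my %ESCAPE = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

# Hypothetical stand-in for canonical_form_to_attval().
sub sketch_attval {
    my ($attval) = @_;

    # Each CR, LF or TAB character becomes a single space.
    $attval =~ tr/\r\n\t/ /;

    # Keep anything that looks like an entity, escape other metacharacters.
    $attval =~ s/($ENTITY)|([&<>"])/defined $1 ? $1 : $ESCAPE{$2}/ge;

    return $attval;
}
```

With these assumptions, sketch_attval("a\tb & c &amp; <d>") returns 'a b &amp; c &amp; &lt;d&gt;': the bare ampersand is escaped, while the existing "&amp;" entity passes through unchanged.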
validate_href_attribute ( TEXT )
If the "AllowHref" filter configuration option is set, then this
method is used to validate "href" type attribute values. TEXT is the
attribute value in canonical form. Returns a possibly modified
attribute value (in canonical form) or "undef" to reject the
attribute.
The default implementation allows only absolute "http" and "https"
URLs, permits port numbers and query strings, and imposes reasonable
length limits.
It does not URI-escape the query string, and it does not guarantee
properly formatted URIs; it only tries to produce safe URIs. You can
always use an attribute callback (see "Attribute Callbacks") to
provide stricter handling.
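Because this method is meant to be overridden, stricter handling can also live in a subclass. The following is a minimal sketch, not the module's own logic: the package name My::StrictHrefFilter, the host allow-list and the 200 character limit are all assumptions chosen for illustration. The inheritance line is commented out so that the snippet runs on its own.

```perl
package My::StrictHrefFilter;    # hypothetical subclass name
use strict;
use warnings;

# In real use this package would inherit from the module:
#   use base 'HTML::StripScripts';

# Assumed allow-list of hosts; adjust to the sites you trust.
my %ALLOWED_HOST = map { $_ => 1 } qw( example.com www.example.com );

sub validate_href_attribute {
    my ( $self, $text ) = @_;

    # Assumed length limit of 200 characters.
    return undef unless defined $text && length($text) <= 200;

    # https only, no port, no query string, conservative path characters.
    my ($host) = $text =~ m{\A https:// ([A-Za-z0-9.-]+)
                            (?: / [A-Za-z0-9._~/%-]* )? \z}x
        or return undef;

    # Return the value unchanged to accept it, or undef to reject it.
    return $ALLOWED_HOST{ lc $host } ? $text : undef;
}

1;
```

Returning the value unchanged accepts the attribute and returning undef rejects it, matching the contract described above.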
validate_mailto ( TEXT )
If the "AllowMailto" filter configuration option is set, then this
method is used to validate "href" type attribute values which begin
with "mailto:". TEXT is the attribute value in canonical form.
Returns a possibly modified attribute value (in canonical form) or
"undef" to reject the attribute.
This uses a lightweight regex and does not guarantee that email
addresses are properly formatted. You can always use an attribute
callback (see "Attribute Callbacks") to provide stricter handling.
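As one illustration of stricter handling, a subclass could override validate_mailto() itself. This is a sketch under stated assumptions: the package name My::StrictMailtoFilter is hypothetical, the pattern is deliberately narrow (no "?subject=" style parameters) and it is not a full address parser. The inheritance line is commented out so that the snippet runs standalone.

```perl
package My::StrictMailtoFilter;    # hypothetical subclass name
use strict;
use warnings;

# In real use this package would inherit from the module:
#   use base 'HTML::StripScripts';

sub validate_mailto {
    my ( $self, $text ) = @_;
    return undef unless defined $text;

    # Assumed shape: mailto:local@domain, with nothing after the domain.
    my ( $local, $domain ) =
        $text =~ m{\A mailto: ([A-Za-z0-9._%+-]{1,64}) \@ ([A-Za-z0-9.-]{1,255}) \z}x
        or return undef;

    # Require a dotted domain that ends in an alphabetic label.
    return undef if $domain =~ /\.\./;
    return undef if $domain !~ /\A[A-Za-z0-9].*\.[A-Za-z]{2,}\z/;

    # Return a possibly modified value: the domain is lowercased.
    return "mailto:$local\@" . lc $domain;
}

1;
```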
validate_src_attribute ( TEXT )
If the "AllowSrc" filter configuration option is set, then this
method is used to validate "src" type attribute values. TEXT is the
attribute value in canonical form. Returns a possibly modified
attribute value (in canonical form) or "undef" to reject the
attribute.
The default implementation behaves the same way as
validate_href_attribute().
OTHER METHODS TO OVERRIDE
As well as the output, reject, init and cdata methods listed above, it
might make sense for subclasses to override the following methods:
filter_text ( TEXT )
This method will be invoked to filter blocks of non-tag text in the
input document. Both input and output are in canonical form, see
"CANONICAL FORM".
The default implementation does no filtering.
escape_html_metachars ( TEXT )
This method is used to escape all HTML metacharacters in TEXT. The
return value must be a copy of TEXT with metacharacters escaped.
The default implementation escapes a minimal set of metacharacters
for security against XSS vulnerabilities. The set of characters to
escape is a compromise between the need for security and the need to
ensure that the filter will work for documents in as many different
character sets as possible.
Subclasses which make strong assumptions about the document
character set will be able to escape much more aggressively.
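A more aggressive override might look like the sketch below. It assumes that TEXT is a decoded Perl character string (for example, input already decoded from UTF-8), so every character outside a small safe set can be written as a numeric character reference. The package name and the choice of safe characters are hypothetical.

```perl
package My::AggressiveEscapeFilter;    # hypothetical subclass name
use strict;
use warnings;

# In real use this package would inherit from the module:
#   use base 'HTML::StripScripts';

sub escape_html_metachars {
    my ( $self, $text ) = @_;

    # Assumption: $text holds decoded characters, not raw bytes.
    # Everything outside the safe set becomes a decimal character reference.
    $text =~ s{([^A-Za-z0-9 .,:;_/\n-])}{ sprintf '&#%d;', ord $1 }ge;

    return $text;
}

1;
```

The trade-off is larger output and less readable source in exchange for not having to enumerate dangerous characters one by one.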
strip_nonprintable ( TEXT )
Returns a copy of TEXT with runs of nonprintable characters replaced
with spaces or some other harmless string. Avoids replacing anything
with the empty string, as that can lead to other security issues.
The default implementation strips out only NUL characters, in order
to avoid scrambling text for as many different character sets as
possible.
Subclasses which make some sort of assumption about the character
set in use will be able to have a much wider definition of a
nonprintable character, and hence a more secure strip_nonprintable()
implementation.
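For instance, a subclass that assumes an ASCII-compatible character set could treat the C0 control characters and DEL as nonprintable. The sketch below is illustrative only; the package name is hypothetical, and TAB, LF and CR are deliberately left alone.

```perl
package My::ControlStripFilter;    # hypothetical subclass name
use strict;
use warnings;

# In real use this package would inherit from the module:
#   use base 'HTML::StripScripts';

sub strip_nonprintable {
    my ( $self, $text ) = @_;

    # Assumption: an ASCII-compatible character set.
    # Each run of control characters becomes one space, never the empty
    # string, so that two harmless fragments cannot be fused together.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/ /g;

    return $text;
}

1;
```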
ATTRIBUTE VALUE HANDLER SUBS
References to the following subs appear in the "AttVal" whitelist
returned by the init_attval_whitelist() method.
_hss_attval_style( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for the "style" attribute.
_hss_attval_size ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for attributes whose values are some sort of
size or length.
_hss_attval_number ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for attributes whose values are a simple
integer.
_hss_attval_color ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for color attributes.
_hss_attval_text ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for text attributes.
_hss_attval_word ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
Attribute value handler for attributes whose values must consist of
a single short word, with minus characters permitted.
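A custom handler can follow the same four-argument signature. The sketch below is hypothetical: my_attval_percent is not part of the module, and the return convention (the value in canonical form to accept, undef to reject) is assumed from the way the validate methods above behave.

```perl
use strict;
use warnings;

# Hypothetical handler for attributes whose values are a percentage
# between 0% and 100%. Takes the same arguments as the subs listed above.
sub my_attval_percent {
    my ( $filter, $tagname, $attrname, $attrval ) = @_;

    # Assumed convention: return the value to accept it, undef to reject it.
    return undef unless defined $attrval;
    return $attrval =~ /\A([0-9]{1,3})%\z/ && $1 <= 100 ? $attrval : undef;
}
```

A subclass would presumably make such a handler available by adding a reference to it to the hash returned by init_attval_whitelist(); the key to use depends on how the whitelists are set up and is not shown here.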