HTML-ListScraper
lib/HTML/ListScraper/Interactive.pm
=item text
Include the plain text in the output.
=item index
Include tag positions in the output.
=back
The returned values are essentially lines of XHTML: opening tags, text
with quoted entities, and closing tags. Tags are enclosed in angle
brackets. The returned values don't necessarily form a valid XML
fragment, though, because the input tags need not form a
tree.
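As a rough illustration of this output format, and of the C<index> layout covered in the next paragraph, here is a minimal Python sketch. It is not the module's Perl implementation. The function name C<render_lines>, the C<(kind, value, unclosed)> token tuples, and the sample data are all hypothetical. It also assumes that the index is the token's position in the input, that text lines carry no index prefix, and that a closing tag lines up with its opening tag; the documentation above does not confirm any of these.

```python
from html import escape


def render_lines(tokens, index=False):
    """Render (kind, value, unclosed) tokens as XHTML-like lines.

    kind is "open", "close" or "text"; unclosed marks an opening tag
    that was identified as missing its closing tag.
    """
    lines = []
    depth = 0  # current indentation in spaces, shown only when index=True
    for pos, (kind, value, unclosed) in enumerate(tokens):
        if kind == "text":
            # Text is emitted with its special characters quoted as entities.
            # Assumption: text lines get no index or indentation prefix.
            lines.append(escape(value, quote=False))
            continue
        if kind == "close":
            # Assumption: a closing tag lines up with its opening tag.
            # max() keeps the depth sane when the tags do not form a tree.
            depth = max(depth - 2, 0)
            body = f"</{value}>"
        else:
            # An opening tag without a closing tag gets '/' after its name.
            body = f"<{value}/>" if unclosed else f"<{value}>"
        lines.append(f"{pos}\t{' ' * depth}{body}" if index else body)
        if kind == "open" and not unclosed:
            depth += 2
    return lines


# Hypothetical sample input: a list item followed by an unclosed <br>.
tokens = [
    ("open", "ul", False),
    ("open", "li", False),
    ("text", "Tom & Jerry", False),
    ("close", "li", False),
    ("open", "br", True),
    ("close", "ul", False),
]

for line in render_lines(tokens, index=True):
    print(line)
```

Calling C<render_lines(tokens)> without C<index> yields the bare tag and text lines, with no index or indentation prefix.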
When C<index> is set, tag values start with the tag's index, followed
by a tab and then by spaces showing indentation. An opening tag not
identified as missing a closing tag increases indentation by 2 spaces;
its closing tag decreases it again. An opening tag with a missing closing
tag is output with '/' appended to its name. For the rules of associating
testdata/del.icio.us.html
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
>
<html id=
"delicious"
>
<head>
<title>del.icio.us</title>
<meta http-equiv=
"Content-Type"
content=
"text/html; charset=UTF-8"
/>
<meta name=
"robots"
content=
"noarchive,nofollow,noindex"
/>
<
link
rel=
"stylesheet"
type=
"text/css"
href=
"/delicious.css?v=61E-123"
/>
<script type=
"text/javascript"
src=
"/ui/static/lib.js?v=61E-123"
></script>
<script type=
"text/javascript"
src=
"/ui/static/delicious.js?v=61E-123"
></script>
testdata/reddit.html
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
>
<html>
<head>
<meta http-equiv=
'Content-Type'
content=
'text/html; charset=UTF-8'
/>
<title>reddit.com: what&
#39;s new online</title>
<script src=
"/static/psrs.js"
language=
"javascript"
type=
"text/javascript"
></script>
<script src=
"/static/reddit.js"
language=
"javascript"
type=
'text/javascript'
></script>
<script language=
'javascript'
>var logged = false </script>
<script language=
'javascript'
> window.onload = init </script>
<
link
rel=
'stylesheet'
href=
'/static/styles.css'
type=
'text/css'
/>
<
link
rel=
'shortcut icon'
href=
'/favicon.ico'
type=
"image/x-icon"
/>