HTML-ListScraper


lib/HTML/ListScraper/Interactive.pm

=item text
 
Include the plain text in the output.
 
=item index
 
Include tag positions in the output.
 
=back
 
The returned values are lines resembling XHTML: opening tags, text with
quoted entities, and closing tags. Tags are enclosed in angle
brackets. The returned values don't necessarily form a valid XML
fragment, though, because the input tags need not form a
tree.
 
When C<index> is set, each tag value starts with the tag's index, followed
by a tab and then by spaces showing indentation. An opening tag not
identified as missing its closing tag increases indentation by 2 spaces;
the matching closing tag decreases it again. An opening tag whose closing
tag is missing is output with '/' appended to its name. For the rules of associating

testdata/del.icio.us.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html id="delicious">
<head>
        <title>del.icio.us</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <meta name="robots" content="noarchive,nofollow,noindex"/>
        <link rel="stylesheet" type="text/css" href="/delicious.css?v=61E-123"/>
 
        <script type="text/javascript" src="/ui/static/lib.js?v=61E-123"></script>
        <script type="text/javascript" src="/ui/static/delicious.js?v=61E-123"></script>

testdata/reddit.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
<title>reddit.com: what&#39;s new online</title>
<script src="/static/psrs.js" language="javascript" type="text/javascript"></script>
<script src="/static/reddit.js" language="javascript" type='text/javascript'></script>
<script language='javascript'>var logged = false </script>
<script language='javascript'> window.onload = init </script>
<link rel='stylesheet' href='/static/styles.css' type='text/css' />
<link rel='shortcut icon' href='/favicon.ico' type="image/x-icon" />


