Alvis-Convert
bin/html2alvis
Options:
    --html-ext                  filename extension identifying HTML files
    --meta-ext                  filename extension identifying meta files
    --out-dir                   output directory
    --N-per-out-dir             number of records per output directory
    --meta-encoding             encoding of the meta files
    --html-encoding             encoding of all HTML files
    --html-encoding-from-meta   take the encoding of the HTML files from
                                the meta files (attribute 'detected-charset')
    --[no]original              include the original document?
    --help                      brief help message
    --man                       full documentation
    --[no]warnings              turn warning output on or off
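The option list above maps naturally onto Perl's core Getopt::Long module. The sketch below is a hypothetical illustration and not the option handling inside bin/html2alvis. The function name parse_html2alvis_like_options, the hash keys, and the defaults assumed for --original and --warnings are all illustrative assumptions. A trailing '!' in a spec lets a single option accept both --original and --nooriginal.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical sketch: option names follow the html2alvis usage text;
# the hash keys and the defaults below are assumptions.
sub parse_html2alvis_like_options {
    my @args = @_;
    my %opt = (
        original => 1,    # assumed default for --[no]original
        warnings => 1,    # assumed default for --[no]warnings
    );
    GetOptionsFromArray(
        \@args,
        'html-ext=s'              => \$opt{html_ext},
        'meta-ext=s'              => \$opt{meta_ext},
        'out-dir=s'               => \$opt{out_dir},
        'N-per-out-dir=i'         => \$opt{n_per_out_dir},   # integer value
        'meta-encoding=s'         => \$opt{meta_encoding},
        'html-encoding=s'         => \$opt{html_encoding},
        'html-encoding-from-meta' => \$opt{html_encoding_from_meta},
        'original!'               => \$opt{original},  # --original / --nooriginal
        'warnings!'               => \$opt{warnings},  # --warnings / --nowarnings
        'help'                    => \$opt{help},
        'man'                     => \$opt{man},
    ) or die "invalid command-line options\n";
    # Whatever is left in @args are the non-option arguments.
    return (\%opt, \@args);
}

my ($opt, $rest) = parse_html2alvis_like_options(
    '--html-ext', 'html', '--N-per-out-dir', '1000', '--nooriginal', 'input-dir',
);
print "N-per-out-dir=$opt->{n_per_out_dir} original=$opt->{original} rest=@$rest\n";
```

Binding each option to its own reference, rather than relying on Getopt::Long's hash storage, keeps the keys independent of the option spelling. For example, --N-per-out-dir becomes n_per_out_dir.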
=head1 OPTIONS
=over 8
=item B<--html-ext>
lib/Alvis/Document/Meta.pm
    $self->{attr}{url}=$value;
}
elsif ($name=~/^\s*date\s*$/isgo)
{
    $self->{attr}{date}=$value;
}
elsif ($name=~/^\s*title\s*$/isgo)
{
    $self->{attr}{title}=$value;
}
elsif ($name=~/^\s*detected\s*\-\s*charset\s*$/isgo)
{
    $self->{attr}{detectedCharSet}=$value;
}
elsif ($name=~/^\s*Meta\-\s*(.*)$/isgo)
{
    my $metafield=$1;
    $metafield=lc($metafield);
    if (exists($MetaMap{$metafield}))
    {
lib/Alvis/Document/Meta.pm
See the source for the exact mapping from HTML header fields to Dublin Core (DC).
Syntax of the meta information file (one feature per line):
    <feature name>\t<feature value>\n
"Special" field names are:
    url
    title
    date
    detected-charset
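Here is a minimal sketch of reading this format. It assumes one name/value pair per line, separated by the first tab. Special names are matched case-insensitively and stored under the attribute keys seen in the Meta.pm excerpt (url, date, title, detectedCharSet). Every other feature is kept as a raw [name, value] pair. The function parse_meta_text and its return layout are hypothetical. The real module additionally maps Meta-* fields to Dublin Core through its own %MetaMap table.

```perl
use strict;
use warnings;

# Special field names (matched case-insensitively) and the attribute keys
# they are stored under, following the Meta.pm excerpt.
my %SPECIAL = (
    'url'              => 'url',
    'title'            => 'title',
    'date'             => 'date',
    'detected-charset' => 'detectedCharSet',
);

# Hypothetical helper: parses "<feature name>\t<feature value>\n" lines.
# Returns (\%attr, \@features): special fields in %attr, all other
# features as [name, value] pairs in their original order.
sub parse_meta_text {
    my ($text) = @_;
    my (%attr, @features);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                   # skip blank lines
        my ($name, $value) = split /\t/, $line, 2;  # split on the first tab only
        next unless defined $value;                 # no tab: not a feature line
        (my $key = lc $name) =~ s/^\s+|\s+$//g;     # trim and lowercase the name
        $key =~ s/\s*-\s*/-/g;                      # accept "detected - charset"
        if (exists $SPECIAL{$key}) {
            $attr{ $SPECIAL{$key} } = $value;
        }
        else {
            push @features, [ $name, $value ];
        }
    }
    return (\%attr, \@features);
}

my ($attr, $features) = parse_meta_text(
    "url\thttp://www.example.com/\n"
  . "Detected-Charset\tutf-8\n"
  . "Meta-Description\tAn example page\n"
);
print "url=$attr->{url} charset=$attr->{detectedCharSet}\n";
print "other features: ", scalar(@$features), "\n";
```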
=head1 METHODS
=head2 new()
Options:
    text    The text of a meta information file.
=head2 parse($meta)
Maps the features to the Dublin Core set (dc:title etc.).
"Special" field names are:
    url
    title
    date
    detected-charset
=head2 get_dcs()
Returns all features mapped to Dublin Core, as a list of name/value pairs:
    ([<name>,<value>],[<name>,<value>],...)
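To illustrate this return shape, here is a hypothetical sketch that turns Meta-* features into Dublin Core pairs. It is loosely modeled on the Meta- branch in the source excerpt: strip the prefix, lowercase the rest, and look it up in a table. The table %META_TO_DC and its two entries are assumptions; the actual mapping lives in the module's %MetaMap.

```perl
use strict;
use warnings;

# Hypothetical mapping table; the real one is %MetaMap in the module source.
my %META_TO_DC = (
    'description' => 'dc:description',
    'keywords'    => 'dc:subject',
);

# Collects Dublin Core pairs in the ([<name>,<value>], ...) shape that
# get_dcs() is documented to return.
sub meta_features_to_dcs {
    my @features = @_;    # list of [name, value] pairs
    my @dcs;
    for my $f (@features) {
        my ($name, $value) = @$f;
        next unless $name =~ /^\s*Meta-\s*(.*)$/i;   # only Meta-* names
        my $field = lc $1;
        push @dcs, [ $META_TO_DC{$field}, $value ] if exists $META_TO_DC{$field};
    }
    return @dcs;
}

my @dcs = meta_features_to_dcs(
    [ 'Meta-Keywords',    'perl, xml' ],
    [ 'Meta-Description', 'An example page' ],
    [ 'Meta-Robots',      'noindex' ],   # no entry in the table: skipped
);
for my $pair (@dcs) {
    my ($name, $value) = @$pair;
    print "$name: $value\n";
}
```

In this sketch, fields without a table entry (Meta-Robots here) are simply dropped. How the real module treats unmapped fields is not shown in the excerpt.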
=head2 get($param)
Returns the value of the attribute named by $param.
"Special" parameters are
t/test-data/to-split/29.xml
<modifiedDate>1146649940912</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) mod_fastcgi/2.4.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7i</httpServer>
<urls>
<url>http://www.searchenginejournal.com/?p=3363</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Yahoo’s YPN Says No to MySpace Traffic If you use MySpace profiles, blogs, comments, and mailings to spam or influence the teenie boppers over at MySpace to clickover to your website and that MySpace traffic is a major source of yo...
<metaData>
<meta name="title">Yahoo’s YPN Says No to MySpace Traffic</meta>
<meta name="dc:type">text/html; charset=utf-8</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Jen Slegg</anchorText>
<location>http://www.jensense.com/archives/2006/05/myspacecom_and.html</location>
</link>
<link type="a">
<anchorText>Problogger.net</anchorText>
<location>http://www.problogger.net/archives/2006/05/03/yahoo-publisher-network-terminates-more-publisher-accounts/</location>
t/test-data/to-split/29.xml
<modifiedDate>1150315246240</modifiedDate>
<httpServer>Apache/1.3.36 (Unix) mod_fastcgi/2.4.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://www.searchenginejournal.com/?p=3530</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>RSS - Things That Make You Go Hmmm Why doesn’t the new Yahoo Spark Blog publish an RSS feed? Of any kind? Not even an “add to my Yahoo” button? Why can’t I subscribe to the Technorati Hot Tags widget that’s (supposedly) upd...
<metaData>
<meta name="title">RSS - Things That Make You Go Hmmm</meta>
<meta name="dc:type">text/html; charset=utf-8</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Technorati Hot Tags</anchorText>
<location>http://www.technorati.com/tags/</location>
</link>
<link type="a">
<anchorText>eBay</anchorText>
<location>http://www2.ebay.com/aw/core/200603200913002.html</location>