Alvis-Convert
t/test-data/to-split/29.xml view on Meta::CPAN
<?xml version="1.0" encoding="UTF-8"?>
<documentCollection xmlns="http://alvis.info/enriched/" version="1.1">
<documentRecord id="A4AFC8E9BD3073A4EFADEB400B80D54A" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1146649940912</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) mod_fastcgi/2.4.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7i</httpServer>
<urls>
<url>http://www.searchenginejournal.com/?p=3363</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Yahoo’s YPN Says No to MySpace Traffic If you use MySpace profiles, blogs, comments, and mailings to spam or influence the teenie boppers over at MySpace to clickover to your website and that MySpace traffic is a major source of yo...
<metaData>
<meta name="title">Yahoo’s YPN Says No to MySpace Traffic</meta>
<meta name="dc:type">text/html; charset=utf-8</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Jen Slegg</anchorText>
<location>http://www.jensense.com/archives/2006/05/myspacecom_and.html</location>
</link>
<link type="a">
<anchorText>Problogger.net</anchorText>
<location>http://www.problogger.net/archives/2006/05/03/yahoo-publisher-network-terminates-more-publisher-accounts/</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Search Marketing</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Publisher Network</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google AdSense</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="A62EEF2D8BE45A8D097087B515598C68" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1148355445154</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) DAV/1.0.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://battellemedia.com/archives/002584.php</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Two items of very related interest today: 1. Wired News Releases Full Text of AT&amp;T NSA Document (Slashdot). 2. Gonzales Says Publishing Leaks Is A Crime (Also Slashdot) Thank God for outlets like Wired. And best of luck.</section...
<metaData>
<meta name="title">Wired News: Will the US Sue?</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Gonzales Says Publishing Leaks Is A Crime</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/1039257&amp;from=rss</location>
</link>
<link type="a">
<anchorText>Wired News Releases Full Text of AT&amp;T NSA Document</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/132206</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="FF2C88E89A1DDFE4F8CD4845EEC285E3" xmlns="http://alvis.info/enriched/">
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="7F0D97BDACC9D73DA79364ADF93A9080" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1144768340466</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003639.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>There is a DigitalPoint Forum thread named that discusses a neat PageRank tool at http://www.webmastereyes.com/. The PageRank tool is different from others, in that it will enable you to plug in a URL and it will then place graphical...
<metaData>
<meta name="title">New Google PageRank Tool Plots PR Values Overlays On Page</meta>
<meta name="dc:date">Tue, 11 Apr 2006 12:40:49 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>http://www.webmastereyes.com/</anchorText>
<location>http://www.webmastereyes.com/</location>
</link>
<link type="a">
<anchorText>thread</anchorText>
<location>http://forums.digitalpoint.com/showthread.php?t=74054</location>
</link>
<link type="a">
<anchorText>DigitalPoint Forums</anchorText>
<location>http://forums.digitalpoint.com/showthread.php?t=74054</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Digital</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google PageRank</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="E25E5DBF90E6C6A3CDF200F61F6A20E6" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1150315246240</modifiedDate>
<httpServer>Apache/1.3.36 (Unix) mod_fastcgi/2.4.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://www.searchenginejournal.com/?p=3530</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>RSS - Things That Make You Go Hmmm Why doesn’t the new Yahoo Spark Blog publish an RSS feed? Of any kind? Not even an “add to my Yahoo” button? Why can’t I subscribe to the Technorati Hot Tags widget that’s (supposedly) upd...
<metaData>
<meta name="title">RSS - Things That Make You Go Hmmm</meta>
<meta name="dc:type">text/html; charset=utf-8</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Technorati Hot Tags</anchorText>
<location>http://www.technorati.com/tags/</location>
</link>
<link type="a">
<anchorText>eBay</anchorText>
<location>http://www2.ebay.com/aw/core/200603200913002.html</location>
</link>
<link type="a">
<anchorText>Yahoo Spark Blog</anchorText>
<location>http://dir.yahoo.com/thespark/240/peek-through-the-pinhole</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Technorati</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="070E7EB628CC943FBF90E7C6A703D9B2" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149606759016</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003894.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Any SEO/M will tell you their job description sucks because in the process of describing exactly what they do, they nearly always watch the listener's eyes glaze over, waiting for a topic that may make better sense. Same thing with u...
<metaData>
<meta name="title">Officer Usability and General SEO</meta>
<meta name="dc:date">Mon, 05 Jun 2006 11:52:34 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Should links still be underlined and blue?</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=36893&amp;view=findpost&amp;p=181570</location>
</link>
<link type="a">
<anchorText>Contextual Usability?</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=37336&amp;view=findpost&amp;p=183860</location>
</link>
<link type="a">
<anchorText>Features don't matter anymore, Welcome to the Age of User Experience</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=37237&amp;view=findpost&amp;p=183418</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>