Alvis-Convert
t/test-data/to-split/29.xml view on Meta::CPAN
<location>http://www.problogger.net/archives/2006/05/03/yahoo-publisher-network-terminates-more-publisher-accounts/</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Search Marketing</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Publisher Network</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google AdSense</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="A62EEF2D8BE45A8D097087B515598C68" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1148355445154</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) DAV/1.0.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://battellemedia.com/archives/002584.php</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Two items of very related interest today: 1. Wired News Releases Full Text of AT&amp;T NSA Document (Slashdot). 2. Gonzales Says Publishing Leaks Is A Crime (Also Slashdot) Thank God for outlets like Wired. And best of luck.</section...
<metaData>
<meta name="title">Wired News: Will the US Sue?</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Gonzales Says Publishing Leaks Is A Crime</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/1039257&amp;from=rss</location>
</link>
<link type="a">
<anchorText>Wired News Releases Full Text of AT&amp;T NSA Document</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/132206</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="FF2C88E89A1DDFE4F8CD4845EEC285E3" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1142938329956</modifiedDate>
<httpServer>Apache</httpServer>
<urls>
<url>http://searchenginewatch.com/searchday/article.php/3592876</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>At long last, Google has launched its ownGoogle Finance service. For years, those seeking specialty financial information via Google have been sent to competitors such as Yahoo and MSN. Now Google's providing financial information di...
<metaData>
<meta name="title">Google Launches Google Finance</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>
wrote</anchorText>
<location>http://searchenginewatch.com/_subscribers/articles/article.php/3353401</location>
</link>
<link type="a">
<anchorText>Google Groups</anchorText>
<location>http://groups.google.com/</location>
</link>
<link type="a">
<anchorText>
Forrester</anchorText>
<location>http://searchenginewatch.com/_subscribers/updates/article.php/3326461#forrester</location>
</link>
<link type="a">
<anchorText>
enhancements</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/2160891</location>
</link>
<link type="a">
<anchorText>gap</anchorText>
<location>http://searchenginewatch.com/sereport/article.php/2164171</location>
</link>
<link type="a">
<anchorText>members</anchorText>
<location>http://searchenginewatch.com/benefits/article.php</location>
</link>
<link type="a">
<anchorText>search
wars</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/3347181</location>
</link>
<link type="a">
<anchorText>house move</anchorText>
<location>http://blog.searchenginewatch.com/blog/060320-184234</location>
</link>
<link type="a">
<anchorText>OneBox</anchorText>
<location>http://www.google.com/help/interpret.html#H</location>
</link>
<link type="a">
<anchorText>Google Finance</anchorText>
<location>http://finance.google.com/</location>
</link>
<link type="a">
<anchorText>http://finance.google.com</anchorText>
<location>http://finance.google.com</location>
</link>
<link type="a">
<anchorText>introduced</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/3506541</location>
</link>
<link type="a">
<anchorText>
<documentRecord id="35D3C71D8D04A7A782CD2E8CBF17220C" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1144681935588</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003633.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>A featured Search Engine Watch Forum thread named SEO &amp; Newspapers discusses a recent NYTimes article named This Boring Headline Is Written for Google. The first paragraph of the article somes it up; Journalists over the years ha...
<metaData>
<meta name="title">New York Times Changes Web Only Headlines To Be Search Engine Friendly</meta>
<meta name="dc:date">Mon, 10 Apr 2006 13:37:11 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Search Engine Watch Forums</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?threadid=11001</location>
</link>
<link type="a">
<anchorText>SEO &amp; Newspapers</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?threadid=11001</location>
</link>
<link type="a">
<anchorText>explains</anchorText>
<location>http://blog.searchenginewatch.com/blog/060410-090051</location>
</link>
<link type="a">
<anchorText>This Boring Headline Is Written for Google</anchorText>
<location>http://www.nytimes.com/2006/04/09/weekinreview/09lohr.html?ex=1302235200&amp;en=86fd20f27aa1d645&amp;ei=5090&amp;partner=rssuserland&amp;emc=rss</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Danny Sullivan</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>NYTimes</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>MSN</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>MSN</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="B4158BE3ACF2447B8B2FF1AFFB5361A0" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1147168350172</modifiedDate>
<httpServer>Apache</httpServer>
<urls>
<url>http://searchenginewatch.com/searchday/article.php/3603301</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Paying attention to web metrics is an increasingly important aspect of search marketing, with methodologies, processes and tools that can dramatically lift marketing and business performance. A special report from the Search Engine S...
<metaData>
<meta name="title">Multichannel Metrics: Managing the Sea of Data</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText> Grantastic Designs, Inc.</anchorText>
<location>http://www.grantasticdesigns.com/</location>
</link>
<link type="a">
<anchorText>Search Engine Visibility</anchorText>
<location>http://www.searchenginesbook.com</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Eric Peterson</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Shari Thurow</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Pete</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>WebSideStory</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Visual Sciences</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="6373E6ED154F42639933FA99BCE915DB" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149760377185</modifiedDate>
<httpServer>Apache/2.0</httpServer>
<urls>
<url>http://google.weblogsinc.com/2006/06/06/google-getting-sued-in-france-by-book-publisher/</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Google is getting sued again from another book publisher. What else is new? These book publishers do not like Google to use excerpts from their books without permission. Even though they might be making additional sales from individu...
<metaData>
<meta name="title">Google getting sued in France by book publisher</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Alexander</anchorText>
<location>http://www.mobileread.com</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google Inc</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google France</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>