Alvis-Convert
t/test-data/to-split/29.xml view on Meta::CPAN
<location>http://www.problogger.net/archives/2006/05/03/yahoo-publisher-network-terminates-more-publisher-accounts/</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Search Marketing</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Publisher Network</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google AdSense</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="A62EEF2D8BE45A8D097087B515598C68" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1148355445154</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) DAV/1.0.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://battellemedia.com/archives/002584.php</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Two items of very related interest today: 1. Wired News Releases Full Text of AT&amp;T NSA Document (Slashdot). 2. Gonzales Says Publishing Leaks Is A Crime (Also Slashdot) Thank God for outlets like Wired. And best of luck.</section...
<metaData>
<meta name="title">Wired News: Will the US Sue?</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Gonzales Says Publishing Leaks Is A Crime</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/1039257&amp;from=rss</location>
</link>
<link type="a">
<anchorText>Wired News Releases Full Text of AT&amp;T NSA Document</anchorText>
<location>http://yro.slashdot.org/article.pl?sid=06/05/22/132206</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="FF2C88E89A1DDFE4F8CD4845EEC285E3" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1142938329956</modifiedDate>
<httpServer>Apache</httpServer>
<urls>
<url>http://searchenginewatch.com/searchday/article.php/3592876</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>At long last, Google has launched its own Google Finance service. For years, those seeking specialty financial information via Google have been sent to competitors such as Yahoo and MSN. Now Google's providing financial information di...
<metaData>
<meta name="title">Google Launches Google Finance</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>
wrote</anchorText>
<location>http://searchenginewatch.com/_subscribers/articles/article.php/3353401</location>
</link>
<link type="a">
<anchorText>Google Groups</anchorText>
<location>http://groups.google.com/</location>
</link>
<link type="a">
<anchorText>
Forrester</anchorText>
<location>http://searchenginewatch.com/_subscribers/updates/article.php/3326461#forrester</location>
</link>
<link type="a">
<anchorText>
enhancements</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/2160891</location>
</link>
<link type="a">
<anchorText>gap</anchorText>
<location>http://searchenginewatch.com/sereport/article.php/2164171</location>
</link>
<link type="a">
<anchorText>members</anchorText>
<location>http://searchenginewatch.com/benefits/article.php</location>
</link>
<link type="a">
<anchorText>search
wars</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/3347181</location>
</link>
<link type="a">
<anchorText>house move</anchorText>
<location>http://blog.searchenginewatch.com/blog/060320-184234</location>
</link>
<link type="a">
<anchorText>OneBox</anchorText>
<location>http://www.google.com/help/interpret.html#H</location>
</link>
<link type="a">
<anchorText>Google Finance</anchorText>
<location>http://finance.google.com/</location>
</link>
<link type="a">
<anchorText>http://finance.google.com</anchorText>
<location>http://finance.google.com</location>
</link>
<link type="a">
<anchorText>introduced</anchorText>
<location>http://searchenginewatch.com/searchday/article.php/3506541</location>
</link>
<link type="a">
<anchorText>
<semantic_unit><named_entity><form>Gmail</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>MSN</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo News</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Yahoo Finance</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google Blog</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>MSN MoneyCentral</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Flash</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>way Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google Blog Search</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google News</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="48FFC0A03C2756C583F6D80C9E527393" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1142422246164</modifiedDate>
<httpServer>Apache/1.3.33 (Unix)</httpServer>
<urls>
<url>http://blog.outer-court.com/archive/2006-03-15-n42.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Google releases their desktop search tool in an updated version today. Among some bugfixes, there’s a new Quick Search box. Hit Ctrl twice to make it appear in the middle of your desktop, and then search for anything – your compu...
<metaData>
<meta name="title">Google Desktop's Quick Search Box</meta>
<meta name="dc:date">Wed, 15 Mar 2006 11:20:57 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Quick Search box</anchorText>
<location>http://desktop.google.com/features.html#quicksearch</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Brin</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google Desktop</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="18C9FD35812DFC4D4CCF0FD6AC1646BC" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149133052555</modifiedDate>
<httpServer>Apache/1.3.33 (Unix)</httpServer>
<urls>
<url>http://blog.outer-court.com/archive/2006-05-30-n12.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Some bloggers are complaining that Google didn’t have a Memorial day logo yesterday. Memorial Day “commemorates U.S. men and women who have died in military service,” Wikipedia explains. From a comment at Newsbusters by Warner T...
<metaData>
<meta name="title">Complaints Due to Lack of Google Memorial Day Logo</meta>
<meta name="dc:date">Thu, 01 Jun 2006 02:44:56 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>it’s good the way it is</anchorText>
<location>http://blog.lewrockwell.com/lewrw/archives/010666.html</location>
</link>
<link type="a">
<anchorText>Wikipedia</anchorText>
<location>http://en.wikipedia.org/wiki/Memorial_Day</location>
</link>
<link type="a">
<anchorText>a comment at Newsbusters</anchorText>
<location>http://newsbusters.org/node/5580</location>
</link>
<link type="a">
<anchorText>Some bloggers</anchorText>
<location>http://technorati.com/search/google%20memorial</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Wikipedia</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="0770964CAC923ACCDC189E0EA4208AE0" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1141993156883</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) DAV/1.0.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://battellemedia.com/archives/002391.php</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>From a Reuters story: Sen. Ron Wyden on Thursday proposed legislation aimed at preventing high-speed Internet service providers from charging content companies extra so consumers have faster access to their Web sites or receive speci...
<metaData>
<meta name="title">Net Neutrality Bill Unveiled</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>IPDemocracy</anchorText>
<location>http://www.ipdemocracy.com/archives/2006/03/02/index.php#001213</location>
</link>
<link type="a">
<anchorText>just joined FM</anchorText>
<location>http://fmpub.net/archives/2006/03/were_growingmee.php</location>
</link>
</urls>
</acquisitionData>
<canonicalDocument>
<section>This morning I described what is Google Co-op, but I also promised I would try to implement an example for this site. Well, we have implemented phase one of Google Co-op subscription links for this site. You can subscribe to the coop...
<metaData>
<meta name="title">Dynamic Implementation of Google Co-op for Search Engine Roundtable</meta>
<meta name="dc:date">Thu, 11 May 2006 19:35:25 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>subscribe</anchorText>
<location>http://www.google.com/coop/trust/add?user=015090516856763095929&amp;continue=http://www.google.com/coop/profile?user=015090516856763095929&amp;sig=Y_aOf96WG5HGmgVEImc3p144xnXGY=</location>
</link>
<link type="a">
<location>http://www.google.com/coop/trust/add?user=015090516856763095929&amp;continue=http://www.google.com/coop/profile?user=015090516856763095929&amp;sig=Y_aOf96WG5HGmgVEImc3p144xnXGY=</location>
</link>
<link type="a">
<anchorText>Google AdSense</anchorText>
<location>http://www.google.com/search?q=Google+AdSense</location>
</link>
<link type="a">
<anchorText>SER Categories</anchorText>
<location>http://www.seroundtable.com/archives.html#category</location>
</link>
<link type="a">
<anchorText>what is Google Co-op</anchorText>
<location>http://www.seroundtable.com/archives/003796.html</location>
</link>
<link type="a">
<anchorText>by clicking here</anchorText>
<location>http://www.google.com/coop/profile?user=015090516856763095929</location>
</link>
<link type="a">
<anchorText>Link Building</anchorText>
<location>http://www.google.com/search?q=Link+Building</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google AdSense</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="57E3FF55199853DF2777EF6B8DC24516" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149969689989</modifiedDate>
<httpServer>Apache</httpServer>
<urls>
<url>http://searchenginewatch.com/searchday/article.php/3612406</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Links to the week's topics from search engine forums across the web. What Top 5 Skills Would You Study to Become a Better SEO? Search Engine Watch Forums "What skills would you put on your Matrix 'must have' list for your career path...
<metaData>
<meta name="title">Search Engine Forums Spotlight</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>June 2006: Start of the Traditional Summer Slump</anchorText>
<location>http://www.webmasterworld.com/forum89/14428.htm</location>
</link>
<link type="a">
<anchorText>Google Goes to Congress to Block IAPs Charging for Faster Data</anchorText>
<location>http://www.webmasterworld.com/forum86/4531.htm</location>
</link>
<link type="a">
<anchorText>Search Engine Guide</anchorText>
<location>http://www.searchengineguide.com/</location>
</link>
<link type="a">
<anchorText>Does Citing Sources Help Rankings?</anchorText>
<location>http://www.v7n.com/forums/google-forum/31501-does-citing-sources-help-rankings.html</location>
</link>
<link type="a">
<anchorText>Cache Problems Growing for Directories?</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?threadid=11916</location>
</link>
<link type="a">
<anchorText>What Top 5 Skills Would You Study to Become a Better SEO?</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?t=11945</location>
</link>
<link type="a">
<anchorText>Google Office Continued: Spreadsheet Application Launched</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?showtopic=37455</location>
</link>
<link type="a">
<anchorText>Separate Page for PPC?</anchorText>
<location>http://www.webproworld.com/viewtopic.php?t=64119</location>
</link>
<link type="a">
<anchorText>Brin Says Google Compromised Principles</anchorText>
<location>http://www.webmasterworld.com/forum86/4529.htm</location>
</link>
<link type="a">
<anchorText>Is Reciprocal Linking Dead</anchorText>
<location>http://www.highrankings.com/forum/index.php?showtopic=22885</location>
</link>
<link type="a">
<anchorText>Google Browser Sync For FireFox</anchorText>
<location>http://www.webmasterworld.com/forum30/34677.htm</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>John McCain</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Sergey Brin</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Brin</form><named_entity_type>person</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google Inc</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Reciprocal</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>