Alvis-Convert
t/test-data/to-split/29.xml view on Meta::CPAN
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="18C9FD35812DFC4D4CCF0FD6AC1646BC" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149133052555</modifiedDate>
<httpServer>Apache/1.3.33 (Unix)</httpServer>
<urls>
<url>http://blog.outer-court.com/archive/2006-05-30-n12.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Some bloggers are complaining that Google didn’t have a Memorial day logo yesterday. Memorial Day “commemorates U.S. men and women who have died in military service,” Wikipedia explains. From a comment at Newsbusters by Warner T...
<metaData>
<meta name="title">Complaints Due to Lack of Google Memorial Day Logo</meta>
<meta name="dc:date">Thu, 01 Jun 2006 02:44:56 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>it’s good the way it is</anchorText>
<location>http://blog.lewrockwell.com/lewrw/archives/010666.html</location>
</link>
<link type="a">
<anchorText>Wikipedia</anchorText>
<location>http://en.wikipedia.org/wiki/Memorial_Day</location>
</link>
<link type="a">
<anchorText>a comment at Newsbusters</anchorText>
<location>http://newsbusters.org/node/5580</location>
</link>
<link type="a">
<anchorText>Some bloggers</anchorText>
<location>http://technorati.com/search/google%20memorial</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Wikipedia</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="0770964CAC923ACCDC189E0EA4208AE0" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1141993156883</modifiedDate>
<httpServer>Apache/1.3.34 (Unix) DAV/1.0.3 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://battellemedia.com/archives/002391.php</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>From a Reuters story: Sen. Ron Wyden on Thursday proposed legislation aimed at preventing high-speed Internet service providers from charging content companies extra so consumers have faster access to their Web sites or receive speci...
<metaData>
<meta name="title">Net Neutrality Bill Unveiled</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>IPDemocracy</anchorText>
<location>http://www.ipdemocracy.com/archives/2006/03/02/index.php#001213</location>
</link>
<link type="a">
<anchorText>just joined FM</anchorText>
<location>http://fmpub.net/archives/2006/03/were_growingmee.php</location>
</link>
<link type="a">
<anchorText>Reuters story</anchorText>
<location>http://today.reuters.com/news/articleinvesting.aspx?type=governmentFilingsNews&amp;storyid=URI:urn:newsml:reuters.com:20060302:MTFH08897_2006-03-02_22-55-27_N02376259:1</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="35D3C71D8D04A7A782CD2E8CBF17220C" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1144681935588</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003633.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>A featured Search Engine Watch Forum thread named SEO &amp; Newspapers discusses a recent NYTimes article named This Boring Headline Is Written for Google. The first paragraph of the article somes it up; Journalists over the years ha...
<metaData>
<meta name="title">New York Times Changes Web Only Headlines To Be Search Engine Friendly</meta>
<meta name="dc:date">Mon, 10 Apr 2006 13:37:11 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Search Engine Watch Forums</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?threadid=11001</location>
</link>
<link type="a">
<anchorText>SEO &amp; Newspapers</anchorText>
<location>http://forums.searchenginewatch.com/showthread.php?threadid=11001</location>
</link>
<link type="a">
<anchorText>explains</anchorText>
<location>http://blog.searchenginewatch.com/blog/060410-090051</location>
</link>
<link type="a">
<anchorText>This Boring Headline Is Written for Google</anchorText>
<location>http://www.nytimes.com/2006/04/09/weekinreview/09lohr.html?ex=1302235200&amp;en=86fd20f27aa1d645&amp;ei=5090&amp;partner=rssuserland&amp;emc=rss</location>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Google</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Digital</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Google PageRank</form><named_entity_type>soft</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="E25E5DBF90E6C6A3CDF200F61F6A20E6" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1150315246240</modifiedDate>
<httpServer>Apache/1.3.36 (Unix) mod_fastcgi/2.4.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.2 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.27 OpenSSL/0.9.7a</httpServer>
<urls>
<url>http://www.searchenginejournal.com/?p=3530</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>RSS - Things That Make You Go Hmmm Why doesn’t the new Yahoo Spark Blog publish an RSS feed? Of any kind? Not even an “add to my Yahoo” button? Why can’t I subscribe to the Technorati Hot Tags widget that’s (supposedly) upd...
<metaData>
<meta name="title">RSS - Things That Make You Go Hmmm</meta>
<meta name="dc:type">text/html; charset=utf-8</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Technorati Hot Tags</anchorText>
<location>http://www.technorati.com/tags/</location>
</link>
<link type="a">
<anchorText>eBay</anchorText>
<location>http://www2.ebay.com/aw/core/200603200913002.html</location>
</link>
<link type="a">
<anchorText>Yahoo Spark Blog</anchorText>
<location>http://dir.yahoo.com/thespark/240/peek-through-the-pinhole</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
<semantic_unit><named_entity><form>Yahoo</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
<semantic_unit><named_entity><form>Technorati</form><named_entity_type>comp</named_entity_type></named_entity></semantic_unit>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="070E7EB628CC943FBF90E7C6A703D9B2" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1149606759016</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003894.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Any SEO/M will tell you their job description sucks because in the process of describing exactly what they do, they nearly always watch the listener's eyes glaze over, waiting for a topic that may make better sense. Same thing with u...
<metaData>
<meta name="title">Officer Usability and General SEO</meta>
<meta name="dc:date">Mon, 05 Jun 2006 11:52:34 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>Should links still be underlined and blue?</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=36893&amp;view=findpost&amp;p=181570</location>
</link>
<link type="a">
<anchorText>Contextual Usability?</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=37336&amp;view=findpost&amp;p=183860</location>
</link>
<link type="a">
<anchorText>Features don't matter anymore, Welcome to the Age of User Experience</anchorText>
<location>http://www.cre8asiteforums.com/forums/index.php?s=&amp;showtopic=37237&amp;view=findpost&amp;p=183418</location>
</link>
</outlinks>
</links>
</acquisition>
<linguisticAnalysis>
<semantic_unit_level>
</semantic_unit_level>
</linguisticAnalysis>
</documentRecord>
<documentRecord id="C5E3217E0849D4E0F5C78C132B7E826D" xmlns="http://alvis.info/enriched/">
<acquisition>
<acquisitionData>
<modifiedDate>1146772829195</modifiedDate>
<httpServer>Apache/1.3.28 (Unix) mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.15 OpenSSL/0.9.7c</httpServer>
<urls>
<url>http://www.seroundtable.com/archives/003764.html</url>
</urls>
</acquisitionData>
<canonicalDocument>
<section>Yesterday, I posted at SEW blog Ask.com Second TV Blitz Stars Chief Scientist Guru, Apostolos Gerasoulis. I have now spotted the commercials that you can view for yourself at http://about.ask.com/docs/about/televisionads.shtml. Yes, ...
<metaData>
<meta name="title">Ask.com's New TV Commercials Sport Apostolos Gerasoulis, Ask.com's Technology Founder</meta>
<meta name="dc:date">Thu, 04 May 2006 19:35:39 GMT</meta>
<meta name="dc:type">text/html</meta>
</metaData>
<links>
<outlinks>
<link type="a">
<anchorText>http://about.ask.com/docs/about/televisionads.shtml</anchorText>
<location>http://about.ask.com/docs/about/televisionads.shtml</location>
</link>
<link type="a">
<anchorText>Ask.com Second TV Blitz Stars Chief Scientist Guru, Apostolos Gerasoulis</anchorText>
<location>http://blog.searchenginewatch.com/blog/060503-084529</location>
</link>
<link type="a">
<anchorText>Search Engine Roundtable Forums</anchorText>
<location>http://forums.seroundtable.com/showthread.php?t=699</location>
</link>
</outlinks>
</links>