Acme-OneHundredNotOut


OneHundredNotOut.pm  view on Meta::CPAN

Kasei was as much about linguistic and textual analysis as it was about
mail munging. With that in mind, I wrote a few language-related modules
during my time with them.

The first important module, which I started work on while I was playing
with C<Mail::Miner>, was L<Lingua::EN::Keywords>. This started life as a
relatively naive algorithm for picking common words out of a text in an
attempt to provide some keywords to describe what the text is "about", and
has matured into quite a handy little automatic topic recognition
module. Its natural counterpart is L<Lingua::EN::NamedEntity>, which
B<is> still a naive algorithm but sometimes those are the best ones.

This module has a bit of a story behind it. While analysing mails we were
trying to find people, places, times, and other things we could link
together into a knowledge base. The technical term for this is named
entity extraction. I found a useful library to do this, called C<GATE>.
It's written in Java, which meant using C<Inline::Java>, and is
extremely slow and complex. At the same time, I was writing a chapter on
computational linguistics with Perl in Advanced Perl Programming, and
wanted to talk about named entity extraction. Unfortunately, I only had
one module which did this, L<GATE::ANNIE::Simple>, and it was a hack. If



=head2 Text Munging, and Some More Mail Stuff

Applying my linguistic experience to the problems of intelligent mail
indexing, searching and displaying led to churning out another set of
modules.

The first problem was what to do with search results. You know those
little snippets that Google and other search engines display when you
search for some terms? They contextualise the terms in the body of the
document and highlight them in a snippet that best represents how
they're used in the document. This is actually a really hard problem,
and it took me several goes to get L<Text::Context> right. It uses
L<Text::Context::EitherSide> as an "emergency" contextualiser if it
can't get anything right at all, but the algorithm itself is a bit of a
swine. I actually had to prototype this module in Ruby to get my
thinking clear enough to code it up in Perl...

L<Text::Quoted> was another mail display problem - it's nice to
display different layers of quoted text in an email in different
colours. Identifying the quoted text isn't that hard, but working out


L<Maypole::Authentication::UserSessionCookie> and L<Maypole::Component>.

Template Toolkit and XML came back together again in a recent project
where I've had to render some XML as part of a Maypole application.
Amazingly, there wasn't an XSLT filter for the Template Toolkit, so
L<Template::Plugin::XSLT> was born.

=head2 Games, Diversions and Toys

It was only when I got back from Japan that I learnt to play Go. How
stupid was that. For a year I had access to some of the best Go clubs
and professional teachers and players in the world, and then I only pick
the bloody game up when I get back to England. Anyway, any computer
programmer who learns to play Go, and they all do sooner or later,
eventually decides to do something about the pitiful state of computer
Go. It's quite ridiculous that the game's been around for thousands of
years and the best computer programs we've devised regularly get beaten
resoundingly by small children. Anyway, I did my bit, producing
L<Games::Go::GMP> and L<Games::Go::SGF> as utility libraries, before
working on L<Games::Goban> to represent the state of the game.

But then while working for Kasei we discovered another addictive
diversion: poker. Computer poker isn't that great either, and I wanted
to write some robots to play on the internet poker servers;
L<Games::Poker::HandEvaluator> was the first product there, with the
hard work done by a GNU library, and L<Games::Poker::OPP> being the
interface to the network protocol. The comments to that module contain a


