Acme-OneHundredNotOut
My views on human-computer interface and computer usability have been
unchanged since I wrote C<Tie::DiscoveryHash> way back in the mists of
time. The underlying principle behind that module was simple: the user
should B<never> tell the computer anything it already knows or can
reasonably be expected to work out. C<Tie::DiscoveryHash> was all about
having the computer find out stuff for itself.

This has influenced a number of my modules, which have focussed on
trying to make everything as simple as possible for the user (or more
usually, for the programmer using my modules) and then a bit simpler.
So, for instance, I found the whole process of keeping values persistent
between runs of Perl a bit of a nightmare - I could never remember the
syntax for tying to C<DB_File>, and I would always forget to use the
extremely handy C<MLDBM> module. I just wanted to say "keep this
variable around". L<Attribute::Persistent> does just that, cleanly and
simply. It even works out a sensible place to put the database, so you
don't have to.

Similarly, L<Config::Auto> works out where your application might keep a
configuration file, works out what format it's in, parses it, and hands
you back a hash. No muss, no fuss. And more importantly, no need to even
think about writing a config file parser again. It's done once, forever.
L<Getopt::Auto> applies the same design principles to handling command
line arguments - I hate forgetting how to use C<Getopt::Long>.

Other attempts at making things simple for the end-user weren't that
successful. As part of writing my (first) mail archiving and indexing
program, C<Mail::Miner>, of which more later, I wanted a nice way for
users to specify a time period in which they're looking for mails - "a
week ago", "sometime last summer", "near the beginning of last month" -
and so on. L<Date::PeriodParser> would take these descriptions and turn
them into a start and end time in which to search. Except, of course,
that this is a very hard thing to do and requires a lot of heuristics,
and while I started off quite well, as ever, I got distracted with other
interesting and considerably more tractable problems.

=head2 Mail Handling

A good number of my Perl modules focussed on mail handling, so many that
I was actually able to get a job basically doing mail processing in
Perl. It all started with L<Mail::Audit>. I was introduced to
F<procmail> at University, and it was useful enough, but it kept having
locking problems and losing my mail, and I didn't really understand it,
to be honest, so I wanted to write my mail filtering rules in Perl.
C<Mail::Audit> worked well for a couple of years before it grew into an
obese monster. I actually only use a very old version of C<Mail::Audit>
on my production server.

As part of the attempt to slim it back down again, I abstracted out one
of the major parts of its functionality, delivering an email to a local
mailbox. Now I only use mbox files, so it was reasonably easy for me,
but people wanted me to add Maildir and whatever to C<Mail::Audit>, so I
kicked it all out to L<Mail::LocalDelivery> instead.

But I found that I still wasn't able to filter my mail adequately and
find the stuff I needed from it. Attachments were a big problem, since
they both made ordinary search with C<grep> or C<grepmail> much slower,
and they weren't always easy to find anyway. So I wrote something to
remove attachments from mail and stick them in a database, and while I
was at it, index mail for quick retrieval. And then it grew to identifying
"interesting" features of an email and searching for them too, and then
L<Mail::Miner> was born.

Finally, I got into web display of archived email, and needed a way of
displaying threads. Amazingly, nobody had coded up JWZ's mail threading
algorithm in Perl yet, so I did that too: L<Mail::Thread>.

But then I decided that C<Mail::*> was in a very sick state. I had been
working with the mail handling modules from CPAN - including my own -
and grown to hate them; they were all too slow, too complicated, too
buggy or all three. It was time for action, and the Perl Email Project
was born.

L<Email::Simple> was the first thing to come out of this, and is
a fantastic way of just getting at the bits you need from an email. It's
much simpler, and therefore much faster, than its more fully-featured
cousins on CPAN. L<Email::MIME> was its natural successor, which added
rudimentary MIME handling, and spawned two subsidiary modules,
L<Email::MIME::ContentType> and L<Email::MIME::Encodings> in order to
keep C<Email::MIME> itself focussed on the "do one thing and do it well"
principle.

Of course we then had to replace C<Mail::Audit>, so
L<Email::LocalDelivery> and L<Email::Filter> appeared. C<Email::Filter> is
another module I don't use, because my C<Mail::Audit> setup works and I'm
terrified of breaking it and losing all my mail. But I'm told that
C<Email::Filter> works just fine too.

By this stage, C<Mail::Miner> was getting crufty. It was replaced by a much
more modular and beautiful L<Email::Store>; this is extended with
plug-in modules like L<Email::Store::Summary>, L<Email::Store::Plucene>
and L<Email::Store::Thread>. I had to write the plug-in framework
myself, since neither C<Module::Pluggable> nor C<Class::Trigger> did
quite what I wanted, and so the C<Email::Store> project also produced
L<Module::Pluggable::Ordered>.

Now C<Email::Store> naturally uses C<Email::Simple> objects, since
it's the most efficient mail representation class on CPAN.
Unfortunately, C<Email::Store> also wants to make use of some modules on
CPAN like C<Mail::ListDetector> which don't want to know about
C<Email::Simple> objects and want to talk C<Mail::Internet> or whatever.
To get around this, I wrote L<Email::Abstract> which provides module
writers with an interface to B<any> kind of mail object, so they don't
have to force a particular representation on their users.

=head2 Linguistics

I'm actually a linguist by training, not a computer programmer,
graduating from the school of Oriental Studies with second and third
year options in Japanese linguistics. I'd like to think that my work at
Kasei was as much about linguistic and textual analysis as it was about
mail munging. With that in mind, I wrote a few language-related modules
during my time with them.

The first important module, which I started work on while I was playing
with C<Mail::Miner>, was L<Lingua::EN::Keywords>. This started life as a
relatively naive algorithm for picking common words out of a text in an
attempt to provide some keywords to describe what the text is "about", and
has matured into quite a handy little automatic topic recognition