Acme-OneHundredNotOut

=head1 TEXT

I have just released my 100th module to CPAN, the first time that anyone
has reached that target. As some of you may know, I am getting ready to
go back to college and reinvent myself from being a programmer into
being a missionary. I don't foresee that many more Perl modules coming
out of this.

Of course, this doesn't mean that I'm going to abjure Perl forever; any
time there's a computer and something I need automated, out will come
the Swiss Army Chainsaw and the job will get done. In fact, we recently
needed to manipulate some text from a mission handbook as part of
translating it into Japanese, and Perl was there to handle and collate
it all.

But 100 modules is a convenient place to stop and take stock, and I hope
that those of you who have benefitted from my modules, programs or
writing about Perl will forgive me a certain spot of self-indulgence as
I look back over my CPAN career, especially since I feel that the
diversity of modules that I've produced is a good indication of the
diversity of what can be done with Perl.

Let's start, then, with some humble beginnings, and then catch up on
recent history.

=head2 The Embarrassing Past

Contrary to popular belief, I was not always a CPAN author. I started
writing modules in 1998, immediately after reading the first edition of
the Perl Cookbook - yes, you can blame Nat and Tom for all this. The
first module that I released was L<Tie::DiscoveryHash>, since I'd just
learnt about tied hashes. As with many of my modules, it was an integral
part of another software project which I actually never finished, and
now can't find. 

The first module that I ever B<wrote> (but, by a curious quirk of fate,
precisely the fiftieth module I released) was called L<String::Tokeniser>,
which is still a reasonably handy way of getting an iterator over the
tokens of a string. (Someone recently released C<String::Tokenizer>,
which makes me laugh.) This too was for an abortive project, C<webperl>,
an application of Don Knuth's WEB system of literate programming to
Perl. However, given the code quality of these two modules, it's perhaps
just as well that the projects never saw the light of day.

There are a few other modules I'd rather like to forget, too.
C<Devel::Pointer> was a sick joke that went badly wrong - it allowed
people to use pointers in Perl. Some people failed to notice that
referring to memory locations directly in an extremely high-level
language was a dangerous and silly thing to do, and actually used the
damned thing, and I started getting requests for support for it. Then at
some point in 2001, when I should really have known better, I developed
an interest in Microsoft's .NET and the C# language, which I still think
is pretty neat; but I decided it might be a good idea to translate the
Mono project's tokenizer and parser into Perl, ending up with
L<C::Sharp>. I never got around to doing the parser part, or indeed
anything else with it, and so it died a lonely death in a dark corner of
CPAN. L<GTK::HandyClist> was my foray into programming graphical
applications, which started and ended there. L<Bundle::SDK::SIMON> was
actually the slides from a talk on my top ten favourite CPAN modules -
except that such a list changes so quickly over time that it doesn't
really make much sense any more.

Finally, L<Array::FileReader> was an attempt to optimize a file access
process. Unfortunately, my "optimization" ended up introducing more
overheads than the naive solution. It all goes to show. Since then,
Mark-Jason Dominus, another huge influence on my CPAN career, has
written C<Tie::File>, which not only has a better name but
is actually efficient too.

=head2 The Internals Phase

1999-2000 were disastrous years for me personally but magnificent years
Perl-sonally. Stuck in a boring job and a tiny flat in the middle of
Tokyo, I had plenty of time to get stuck into more Perl development. I
felt that getting involved with C<perl5-porters> would be a good way of
getting to know more about Perl, and so I needed a hobby horse - an
issue of Perl's development that I cared about. Since I was in Japan and
working a lot with non-Latin text, Unicode support seemed a good thing
to work on, and so L<Unicode::Decompose> appeared, while I fixed up a
substantial part of the post-5.6 core Unicode support.

I'd recommend this way to anyone who wants to get more involved in the
Perl community, although I was very lucky in terms of who else happened
to be around at the time: Gurusamy Sarathy was extremely gracious in
helping me turn my fledgling C code into something fit for the Perl
core, and he also helped me understand the C<perl5-porters> etiquette
(yes, there was some at the time) and what makes a good patch, while
Jarkko Hietaniemi was always good for suggestions of interesting things
for keen people to work on. Seriously, get involved. If I can do it,
anyone can.

Anyway, this fixation with understanding the Perl 5 internals, and
especially the Perl 5 compiler (thanks to yet another of my Perl
influences, the great Malcolm Beattie), led to quite a torrent of
modules: from L<ByteCache>, an implementation of just-in-time
compilation for Perl modules, through L<B::Flags> and L<B::Tree> to help
visualise the Perl op tree, to L<uninit>, L<B::Generate>, L<optimizer>
and L<B::Utils> for modifying it.

=head2 Perl About The House

Now we abandon chronological order somewhat and take a look at the
various areas in which I've used Perl. One of these areas has been the
automation of everyday life: checking my bank balance with
L<Finance::Bank::LloydsTSB> (the first Perl module to interface to
personal internet banking, no less) and my phone bill with a release of
Tony Bowden's L<Data::BT::PhoneBill>. 

L<Finance::Bank::LloydsTSB> was meant to go with L<Finance::QIF>, my
Quicken file parser, to produce another now-abandoned idea, a Perl
finances manager. It seemed that I was only capable of producing modules,
not full standalone applications - or at least, it seemed that way until
I produced L<Bryar>, my blogging software, based on the concepts from
Rael Dornfest's C<blosxom> and beginning my adventures with Andy
Wardley's Template Toolkit. Bryar also tuned me in to the
Model-View-Controller framework idea, of which more later.

Another project I briefly played with was a personal robot, using the
C<Sphinx>/C<Festival> speech recognition and synthesis modules from
Cepstral and Kevin Lenzo. I didn't have X10, so I couldn't shout
"lights" into the air in a wonderfully sci-fi way, but I could shout
"mail" and have a summary of my inbox read to me, "news" to get the
latest BBC news headlines, and "time" to hear the time. Of course,
getting computers to tell the time nicely takes a little bit of work. I
don't like "It's eleven oh-three pee em", since that's not what someone
would say if you asked them the time. I wanted my robot to say "It's
just after eleven", and that's what L<Time::Human> does. Shame about the
localisation.
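The bucketing behind a phrase like "just after eleven" is simple enough to sketch. What follows is a hypothetical illustration, not Time::Human's actual interface or rules: the C<fuzzy_time> name, the minute thresholds and the wording are all assumptions, and like the original it only speaks English.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Time::Human's API: describe a clock time the
# way a person might say it aloud. English only; thresholds are assumed.
my @hour_names = qw(twelve one two three four five six
                    seven eight nine ten eleven);

sub fuzzy_time {
    my ($hour, $min) = @_;                  # 24-hour clock: 0-23, 0-59
    my $this = $hour_names[$hour % 12];
    my $next = $hour_names[($hour + 1) % 12];

    return "It's $this o'clock"            if $min == 0;
    return "It's just after $this"         if $min <= 7;
    return "It's about quarter past $this" if $min <= 22;
    return "It's about half past $this"    if $min <= 37;
    return "It's about quarter to $next"   if $min <= 52;
    return "It's nearly $next";
}

# (localtime)[2, 1] is the current (hour, minute).
my $now = fuzzy_time((localtime)[2, 1]);
print "$now\n";
```

For example, C<fuzzy_time(23, 3)> returns "It's just after eleven" rather than "eleven oh-three pee em".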

=head2 Messing About With Classes

One of the things that continues to amaze me about Perl is its
flexibility; the way you can change core parts of its operation, even
from pure Perl. This led to quite a few modules, many of which were
mere proofs of concept.

L<Sub::Versive>, for instance, was the first module on CPAN to handle
pre- and post-hooks for a subroutine; it has since been joined by a
plethora of imitators. It was written, though, in response to a peculiar
scenario. I was writing a module (C<Safety::First>) which provided
additional built-in-like functions for Perl to encourage and facilitate
defensive programming and intelligible error reporting. ("Couldn't open
file? Why not?") These built-ins had to be available from every
package, which meant playing with C<UNIVERSAL::AUTOLOAD>. But what if
another package was already using C<UNIVERSAL::AUTOLOAD>? Hence,
C<Sub::Versive> wrapped it in a pre-hook. Of course, with the
interesting bit of the problem solved, C<Safety::First> was abandoned.
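The core trick behind pre- and post-hooks in pure Perl is to replace a subroutine's glob entry with a closure that runs the hooks around the original code. The sketch below shows that technique under stated assumptions: the C<wrap_sub> helper and its argument order are hypothetical, not Sub::Versive's interface, and it ignores details a real module has to care about, such as C<caller> inside the original seeing an extra stack frame.

```perl
use strict;
use warnings;

# Hypothetical helper, not Sub::Versive's API: wrap the fully qualified
# subroutine $name so that $pre runs before it and $post runs after it.
sub wrap_sub {
    my ($name, $pre, $post) = @_;
    no strict 'refs';
    no warnings 'redefine';
    my $original = \&{$name};               # keep the old code around
    *{$name} = sub {
        $pre->(@_) if $pre;
        my @result;
        if (wantarray) {                    # preserve the caller's context
            @result = $original->(@_);
        } elsif (defined wantarray) {
            $result[0] = $original->(@_);
        } else {
            $original->(@_);
        }
        $post->(@_) if $post;
        return wantarray ? @result : $result[0];
    };
}

# Usage: log around a plain function.
my @log;
sub greet { return "hello, $_[0]" }

wrap_sub('main::greet',
    sub { push @log, "before @_" },
    sub { push @log, "after @_" });

my $greeting = greet('world');
```

After the wrap, C<@log> holds "before world" and then "after world", while C<$greeting> is still "hello, world".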

L<Class::Dynamic> was an interesting attempt to provide support for code
references in C<@ISA>, analogous to code references in C<@INC>. It
works, but of course I could never find any practical use for it.

L<Class::Wrap> was written as a lazy profiler. A certain application I
was writing for my employer of the time, Kasei, made use of the (IMHO
evil) C<Mail::Message> module. How do we isolate all calls to that
class? There are plenty of modules out there for instrumenting
individual methods, including of course C<Sub::Versive>. But the whole
class? C<Class::Wrap> takes a wonderfully brute-force but workable
approach to the problem. A real profiler, however, can be constructed
from L<Devel::DProfPP>, which is sort of a profiler toolkit.

I wrote a couple of other modules with Kasei in this category,
particularly while working on our Plucene port of the Lucene search
engine. (I guess I could claim C<Plucene> as one of my 100 modules, but
that would be to deny Marc Kerr the recognition he deserves for the work
he put in to packaging, documenting and providing tests for my insane
and scrambled code.) I wrote L<Bit::Vector::Minimal>, for instance, as I
ported C<org.apache.lucene.util.BitVector>; L<Tie::Array::Sorted>, which
I'm amazed wasn't already implemented on CPAN, provided the Perl
equivalent of C<org.apache.lucene.util.PriorityQueue>.
L<Lucene::QueryParser>, of course, does what it says on the tin. (I also
produced a couple of add-ons for Plucene after leaving Kasei when I was
doing a bit of Plucene consultancy:
L<Plucene::Plugin::Analyzer::PorterAnalyzer> and
L<Plucene::Plugin::WeightedQueryParser>.)

Another module produced in the course of writing Plucene was
L<Class::HasA>, a handy little utility module which works well with Tony
Bowden's C<Class::Accessor> and merely dispatches certain method calls
to objects contained within your object.
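Delegation of this kind amounts to generating one small forwarding method per delegated name. A minimal sketch, assuming a hash-based object; the C<delegate> helper and the C<Car>/C<Engine> classes are hypothetical illustrations, not Class::HasA's actual interface.

```perl
use strict;
use warnings;

# Hypothetical helper, not Class::HasA's API: for each name in @methods,
# install a method in $class that forwards the call to $self->{$slot}.
sub delegate {
    my ($class, $slot, @methods) = @_;
    for my $method (@methods) {
        no strict 'refs';
        *{"${class}::$method"} = sub {
            my $self = shift;
            return $self->{$slot}->$method(@_);
        };
    }
}

package Engine;
sub new   { my $class = shift; return bless { running => 0 }, $class }
sub start { my $self = shift; $self->{running} = 1; return "vroom" }

package Car;
sub new { my $class = shift; return bless { engine => Engine->new }, $class }

package main;
delegate('Car', 'engine', 'start');
my $car   = Car->new;
my $noise = $car->start;    # really calls $car->{engine}->start
```

Because real methods are installed in the C<Car> package rather than caught by C<AUTOLOAD>, C<< Car->can('start') >> also works for the delegated names.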

And speaking of C<Class::Accessor>, L<Class::Accessor::Assert> would
have been a godsend while writing Plucene, as it's a version of accessor
handling which typechecks what you're putting into the accessor slots.
When you're porting code from a statically typed language to a
dynamically typed one, occasional checks that you're handling the right
kind of object don't go amiss. I
learnt my lesson eventually, though, and wrote the module after Plucene
was done.
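A type-checking accessor is just an ordinary get/set method that inspects the value before storing it. The following sketch shows the idea using only core Perl (C<Scalar::Util> and C<Carp>); the C<make_typed_accessor> helper and the C<My::Term>/C<My::Query> classes are hypothetical, and this is not Class::Accessor::Assert's interface.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed);

# Hypothetical helper, not Class::Accessor::Assert's API: install a
# get/set accessor named $field in $class that only accepts objects
# which are instances of $type (or of a subclass of it).
sub make_typed_accessor {
    my ($class, $field, $type) = @_;
    no strict 'refs';
    *{"${class}::$field"} = sub {
        my $self = shift;
        if (@_) {
            my $value = shift;
            croak "$field must be a $type"
                unless blessed($value) && $value->isa($type);
            $self->{$field} = $value;
        }
        return $self->{$field};
    };
}

package My::Term;
sub new { my $class = shift; return bless {}, $class }

package My::Query;
sub new { my $class = shift; return bless {}, $class }

package main;
make_typed_accessor('My::Query', 'term', 'My::Term');

my $query = My::Query->new;
$query->term(My::Term->new);                        # accepted
my $ok = eval { $query->term('not a term'); 1 };    # croaks, so $ok is undef
```

Checking with C<isa> rather than comparing the result of C<ref> means subclasses of the expected type are accepted too.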

Another Java-influenced module was C<Attribute::Final>, which was written
for my book Advanced Perl Programming as an example of both attributes
and messing about with the class model - by marking a subroutine as
C<:final>, you get an error if a derived class attempts to override it.
As with many of my proof-of-concept modules, this isn't something I'd
ever use myself, but I know others have used it. I'll let you into a
secret - over the past few months I've settled on giving modules a
version number of C<0.x> if I've never used them myself and C<1.x> if I
have.

Java wasn't the only language to influence my Perl coding activities.
Ruby is a wonderful little language I first encountered in Japan, but
didn't really get into until around 2003. Of course, when you see
another language has some good ideas, you steal them, which is what I
did with L<rubyisms>, L<SUPER>, and L<Class::SingletonMethod> - all of
which, by the way, are B<excellent> examples of what you can do to the
behaviour of Perl just from pure Perl. C<SUPER> is the kind of module
I've so often wanted to use in production code but never dared.

=head2 Smart Perl

My views on human-computer interfaces and computer usability have not
changed since I wrote C<Tie::DiscoveryHash> way back in the mists of
time. The underlying principle behind that module was simple: the user
should B<never> tell the computer anything it already knows or can
reasonably be expected to work out. C<Tie::DiscoveryHash> was all about
having the computer find out stuff for itself.

This has influenced a number of my modules, which have focussed on
trying to make everything as simple as possible for the user (or more
usually, for the programmer using my modules) and then a bit simpler.

So, for instance, I found the whole process of keeping values persistent
between runs of Perl a bit of a nightmare - I could never remember the
syntax for tying to C<DB_File>, and I would always forget to use the
extremely handy C<MLDBM> module. I just wanted to say "keep this
variable around". L<Attribute::Persistent> does just that, cleanly and
simply. It even works out a sensible place to put the database, so you
don't have to.

Similarly, L<Config::Auto> works out where your application might keep a
configuration file, works out what format it's in, parses it, and hands
you back a hash. No muss, no fuss. And more importantly, no need to even
think about writing a config file parser again. It's done once, forever.
L<Getopt::Auto> applies the same design principles to handling command
line arguments - I hate forgetting how to use C<Getopt::Long>.

Other attempts at making things simple for the end-user weren't that
successful. As part of writing my (first) mail archiving and indexing
program, C<Mail::Miner>, of which more later, I wanted a nice way for
users to specify a time period in which they're looking for mails - "a
week ago", "sometime last summer", "near the beginning of last month" -
and so on. L<Date::PeriodParser> would take these descriptions and turn
them into a start and end time in which to search. Except, of course,
that this is a very hard thing to do and requires a lot of heuristics,
and while I started off quite well, as ever, I got distracted with other
interesting and considerably more tractable problems.

=head2 Mail Handling

A good number of my Perl modules focussed on mail handling, so many that
I was actually able to get a job basically doing mail processing in
Perl. It all started with L<Mail::Audit>. I was introduced to
F<procmail> at university, and it was useful enough, but it kept having
locking problems and losing my mail, and I didn't really understand it,
to be honest, so I wanted to write my mail filtering rules in Perl.
C<Mail::Audit> worked well for a couple of years before it grew into an
obese monster. I actually only use a very old version of C<Mail::Audit>
on my production server.

As part of the attempt to slim it back down again, I abstracted out one
of the major parts of its functionality, delivering an email to a local
mailbox. Now I only use mbox files, so it was reasonably easy for me,
but people wanted me to add Maildir and whatever to C<Mail::Audit>, so I
kicked it all out to L<Mail::LocalDelivery> instead.

But I found that I still wasn't able to filter my mail adequately and
find the stuff I needed from it. Attachments were a big problem, since
they both made ordinary search with C<grep> or C<grepmail> much slower,
and they weren't always easy to find anyway. So I wrote something to
remove attachments from mail and stick them in a database, and while I'm
at it, index mail for quick retrieval. And then it grew to identifying
"interesting" features of an email and searching for them too, and then