Acme-OneHundredNotOut
OneHundredNotOut.pm view on Meta::CPAN
was writing for my employer at the time, Kasei, made use of the (IMHO
evil) C<Mail::Message> module. How do we isolate all calls to that
class? There are plenty of modules out there for instrumenting
individual methods, including of course C<Sub::Versive>. But the whole
class? C<Class::Wrap> takes a wonderfully brute-force but workable
approach to the problem. A real profiler, however, can be constructed
from L<Devel::DProfPP>, which is sort of a profiler toolkit.
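The brute-force idea can be sketched in a few lines: walk the target class's symbol table and replace every sub with a wrapper that reports the call before delegating to the original. This is only an illustration of the technique, not C<Class::Wrap>'s actual code; C<wrap_class> and C<Toy::Mailer> are made-up names standing in for the real thing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the brute-force approach: walk the class's symbol table
# and wrap every sub so a callback sees each call before it proceeds.
sub wrap_class {
    my ($class, $callback) = @_;
    no strict 'refs';
    for my $name (keys %{"${class}::"}) {
        my $orig = $class->can($name) or next;   # skip non-sub entries
        no warnings 'redefine';
        *{"${class}::${name}"} = sub {
            $callback->($class, $name, @_);      # observe the call
            goto &$orig;                         # delegate, preserving @_ and caller
        };
    }
}

package Toy::Mailer;                 # a stand-in for Mail::Message
sub new  { bless {}, shift }
sub send { return "sent" }

package main;
my @calls;
wrap_class("Toy::Mailer", sub { push @calls, $_[1] });
my $m = Toy::Mailer->new;
$m->send;
print "@calls\n";                    # prints "new send"
```

The `goto &$orig` trick keeps the original sub's view of its arguments and caller intact, which is what makes symbol-table wrapping workable for instrumentation.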
I wrote a couple of other modules with Kasei in this category,
particularly while working on our Plucene port of the Lucene search
engine. (I guess I could claim C<Plucene> as one of my 100 modules, but
that would be to deny Marc Kerr the recognition he deserves for the work
he put in to packaging, documenting and providing tests for my insane
and scrambled code.) I wrote L<Bit::Vector::Minimal>, for instance, as I
ported C<org.apache.lucene.util.BitVector>; L<Tie::Array::Sorted>, which
I'm amazed wasn't already implemented on CPAN, provided the Perl
equivalent of C<org.apache.lucene.util.PriorityQueue>.
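The tied-array approach can be sketched as follows: a tie class that keeps its elements sorted on every insert, so that shifting always yields the smallest element, which is all a priority queue needs. C<Toy::SortedArray> here is a deliberately tiny illustration, not the CPAN module's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A toy sorted tied array: re-sort the underlying array on every PUSH,
# so SHIFT (inherited from Tie::StdArray) always returns the minimum.
package Toy::SortedArray;
use Tie::Array;
our @ISA = ('Tie::StdArray');

sub PUSH {
    my ($self, @items) = @_;
    @$self = sort { $a <=> $b } @$self, @items;
}

package main;
tie my @queue, 'Toy::SortedArray';
push @queue, 5, 1, 3;
print shift(@queue), "\n";   # prints 1 - the smallest element first
```

Subclassing C<Tie::StdArray> means only the operations you care about need overriding; everything else behaves like a normal array.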
L<Lucene::QueryParser>, of course, does what it says on the tin. (I also
produced a couple of add-ons for Plucene after leaving Kasei when I was
doing a bit of Plucene consultancy:
L<Plucene::Plugin::Analyzer::PorterAnalyzer> and
Another thing those wacky computational linguists do a lot of is
working with n-gram streams. In every discipline, there's a particular
hammer you can use to solve any given problem. In data mining, it's
called market basket analysis. In computational linguistics, it's
maximal entropy. You look at the past stream of n characters (that's an
n-gram) and work out how hard it is to see what's coming next.
For instance, if I feed you the 4-gram C<xylo> the chances of a C<p>
next are very high. The chances of an C<e>, or indeed anything else, are
pretty low. Low entropy area. But if I feed you C<then>, it's really not
easy to guess the next letter, since we're likely to be at the end of a
word and the next word might be anything; high entropy. That's how you
use maximal entropy to find word breaks in unsegmented text, and there's
a huge amount of other cool stuff you can do with it.
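The trick is simple enough to sketch in plain Perl: count which characters follow each occurrence of an n-gram in a corpus, then compute the Shannon entropy of that next-character distribution. The corpus and C<next_char_entropy> below are made-up illustrations, not any particular CPAN module's API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the characters that follow each occurrence of $context in
# $text, then return the Shannon entropy (in bits) of that distribution.
sub next_char_entropy {
    my ($text, $context) = @_;
    my $n = length $context;
    my %count;
    for my $i (0 .. length($text) - $n - 1) {
        next unless substr($text, $i, $n) eq $context;
        $count{ substr($text, $i + $n, 1) }++;
    }
    my $total = 0;
    $total += $_ for values %count;
    return 0 unless $total;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $total;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

my $corpus = "xylophone xylophone then. thenceforth thence then!";
printf "xylo: %.2f bits\n", next_char_entropy($corpus, "xylo");  # 0.00 - always 'p'
printf "then: %.2f bits\n", next_char_entropy($corpus, "then");  # 1.50 - could go anywhere
```

Low entropy after C<xylo>, high entropy after C<then>: spikes in this measure are exactly where the word-break detector described above would put its boundaries.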
I swear the day I wrote L<Text::Ngram>, there were no other modules on
CPAN which extracted n-grams, but as soon as I released it, it looked
like there were three or four there all along. (Including one from
Jarkko, no less.) Anyway, I wanted to see if I could still remember how
to write XS modules, especially since I'd just written a book about it.