CPAN-Search-Lite
in one match, supply the "subletter" information.
- remove some leading and trailing spaces in the information
gathered from the CPAN indices for author full names and
chapter ids.
- in the example Apache directives, indicate that CPAN ids
can contain numbers, as well as letters and the '-' sign.
- when a distribution is scheduled for deletion from the
database, remove its pod and html docs as well, if present.
Sept 4, 2004 - 0.55
- use AI::Categorizer to guess at a category for unassigned
modules/distributions, based on the learning from existing
categorized modules. There are two new options available
in the [CPAN] section of the configuration file:
- cat_threshold: a number between 0 and 1 used as a threshold
for assigning categories
- no_cat: to not use this feature
- bump up the length of module names stored in the mods table,
as some module names were longer than the previous limit
- if there is only one result available for a link from a "letter"
page of modules/distributions or a chapter page, redirect to
build/gen_conf.pl
my $remote_mirror = prompt('CPAN mirror to fetch index files?',
'http://www.cpan.org');
$cfg->newval('CPAN', 'remote_mirror', $remote_mirror);
}
unless (prompt_y('Gather Win32 ppm package information?')) {
$cfg->newval('CPAN', 'no_ppm', 1);
}
if (prompt_y('Guess unassigned module categories?')) {
my $cat_threshold = prompt('Threshold value for guesses?', 0.998);
$cfg->newval('CPAN', 'cat_threshold', $cat_threshold);
}
else {
$cfg->newval('CPAN', 'no_cat', 1);
}
$cfg->newval('CPAN', 'DEBUG', 1);
my $multiplex;
if (prompt_y('Use a multiplexer for mirror redirects?')) {
$multiplex = prompt('Address of multiplexer',
lib/CPAN/Search/Lite/Index.pm
=item * multiplexer = http://cpan.redirect.edu/cpan
This can be used to specify a multiplexer to redirect
downloads to nearby CPAN mirrors. See, for example,
L<Apache::GeoIP> for one implementation.
=item * cat_threshold = 0.99
Many modules do not have a category (chapter) associated with
them. In such cases, when populating the database, the
I<AI::Categorizer> module is used to guess which category
should be assigned to such a module, based on available information
for those modules that do have a category. The value of I<cat_threshold>
is used to determine if the guessed category should be accepted
(a perfect match has a score of 1, and no match has 0). If no
such value is given, a default of 0.995 is used.
=item * no_cat = 1
Set I<no_cat> equal to a true value if you don't want
I<AI::Categorizer> to try categorizing modules which
don't have a category assigned.
=back
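The interplay of I<cat_threshold> and I<no_cat> described above can be sketched as follows. This is only an illustration under stated assumptions: C<categorizer_settings> is a hypothetical helper, not part of CPAN::Search::Lite, and it takes the [CPAN] section of the configuration as a plain hash. Only the option names and the 0.995 default come from the documentation above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CPAN::Search::Lite): resolve the
# effective categorizer settings from the [CPAN] section, passed as a hash.
sub categorizer_settings {
    my (%cpan) = @_;

    # A true no_cat value turns category guessing off entirely.
    return { enabled => 0 } if $cpan{no_cat};

    # Fall back to the documented default when cat_threshold is not given.
    my $threshold = defined $cpan{cat_threshold} ? $cpan{cat_threshold} : 0.995;

    # Scores run from 0 (no match) to 1 (perfect match), so a threshold
    # outside that range is rejected here (an assumption of this sketch).
    die "cat_threshold must be between 0 and 1\n"
        if $threshold < 0 || $threshold > 1;

    return { enabled => 1, threshold => $threshold };
}
```

Calling C<categorizer_settings(cat_threshold =E<gt> 0.99)> would give an enabled setting with a threshold of 0.99, while C<categorizer_settings(no_cat =E<gt> 1)> would give a disabled one.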
lib/CPAN/Search/Lite/Populate.pm
to (eg, C<File_Handle_Input_Output>).
=back
=head1 CATEGORIES
When uploading a module to PAUSE, there exists an option
to assign it to one of 24 broad categories. However, many
modules have not been assigned such a category, for one
reason or another. When populating the tables, the
I<AI::Categorizer> module is used to guess a possible
category for those modules that haven't been assigned one,
using a training set built from the modules that have been
assigned a category (see L<AI::Categorizer> for general
details). If the score of this guess is above a configurable
threshold (see L<CPAN::Search::Lite::Index>), the guess is
accepted and inserted into the database, and the categories
associated with the module's distribution are updated
accordingly.
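The acceptance step described above might look roughly like the following. This is a minimal sketch, not the module's actual code: C<accept_guess> and C<record_guess> are hypothetical names, the C<$state> hash reference is an in-memory stand-in for the database tables, and a score exactly equal to the threshold is assumed to count as accepted.

```perl
use strict;
use warnings;

# Hypothetical sketch: return the guessed category if its score clears the
# threshold, or undef otherwise. $guess is { category => ..., score => ... }.
sub accept_guess {
    my ($guess, $threshold) = @_;
    # Assumption: a score equal to the threshold is accepted.
    return $guess->{score} >= $threshold ? $guess->{category} : undef;
}

# Hypothetical sketch: record an accepted guess for a module and add the
# category to the set kept for its distribution. $state stands in for the
# database tables: { mods => { name => cat }, dists => { name => { cat => 1 } } }.
sub record_guess {
    my ($state, $mod, $dist, $guess, $threshold) = @_;
    my $cat = accept_guess($guess, $threshold);
    return 0 unless defined $cat;        # guess rejected, nothing stored
    $state->{mods}{$mod} = $cat;         # category for the module
    $state->{dists}{$dist}{$cat} = 1;    # category set for the distribution
    return 1;
}
```

For instance, with a threshold of 0.995, a guess scoring 0.999 would be stored for both the module and its distribution, while a guess scoring 0.5 would leave C<$state> untouched.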
=head1 SEE ALSO
L<CPAN::Search::Lite::Index>
=cut