CPAN-Search-Lite

Changes

      in one match, supply the "subletter" information.
    - remove some leading and trailing spaces in the information
      gathered from the CPAN indices for author full names and
      chapter ids.
    - in the example Apache directives, indicate that CPAN ids
      can contain numbers, as well as letters and the '-' sign.
    - when a distribution is scheduled for deletion from the
      database, remove its pod and html docs as well, if present.

Sept 4, 2004 - 0.55
    - use AI::Categorizer to guess at a category for unassigned
      modules/distributions, based on the learning from existing
      categorized modules. There are two new options available
      in the [CPAN] section of the configuration file:
         - cat_threshold: a number between 0 and 1 used as a threshold
           for assigning categories
         - no_cat: to not use this feature
    - bump up the maximum length of module names stored in the mods
      table, as some module names exceeded the old limit
    - if there is only one result available for a link from a "letter"
      page of modules/distributions or a chapter page, redirect to

build/gen_conf.pl

  my $remote_mirror = prompt('CPAN mirror to fetch index files?',
                            'http://www.cpan.org');
  $cfg->newval('CPAN', 'remote_mirror', $remote_mirror);
}

unless (prompt_y('Gather Win32 ppm package information?')) {
  $cfg->newval('CPAN', 'no_ppm', 1);
}

if (prompt_y('Guess unassigned module categories?')) {
  my $cat_threshold = prompt('Threshold value for guesses?', 0.998);
  $cfg->newval('CPAN', 'cat_threshold', $cat_threshold);
}
else {
  $cfg->newval('CPAN', 'no_cat', 1);
}
$cfg->newval('CPAN', 'DEBUG', 1);

my $multiplex;
if (prompt_y('Use a multiplexer for mirror redirects?')) {
  $multiplex = prompt('Address of multiplexer',

lib/CPAN/Search/Lite/Index.pm

=item * multiplexer = http://cpan.redirect.edu/cpan

This can be used to specify a multiplexer to redirect
downloads to nearby CPAN mirrors. See, for example,
L<Apache::GeoIP> for one implementation.

=item * cat_threshold = 0.99

Many modules do not have a category (chapter) associated with
them. In such cases, when populating the database, the
I<AI::Categorizer> module is used to guess which category
should be assigned to such a module, based on available information
for those modules that do have a category. The value of I<cat_threshold>
is used to determine if the guessed category should be accepted
(a perfect match has a score of 1, and no match has 0). If no
such value is given, a default of 0.995 is used.

=item * no_cat = 1

Set I<no_cat> equal to a true value if you don't want
I<AI::Categorizer> to try categorizing modules which
don't have a category assigned.

=back
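
As a rough illustration of how I<cat_threshold> and I<no_cat> interact, the following sketch resolves an effective threshold from a plain hash standing in for the [CPAN] section of the configuration file. The helper name I<effective_threshold> and the hash layout are assumptions made for this example and are not part of the module; only the option names and the default of 0.995 come from the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CPAN::Search::Lite): returns the
# threshold to use, or undef when guessing is switched off via no_cat.
sub effective_threshold {
    my (%cpan) = @_;
    return if $cpan{no_cat};                 # guessing disabled
    return defined $cpan{cat_threshold}
        ? $cpan{cat_threshold}               # value from the config
        : 0.995;                             # documented default
}

# Three illustrative [CPAN] sections, written as plain hashes.
my %examples = (
    'explicit threshold' => { cat_threshold => 0.99 },
    'default threshold'  => {},
    'guessing disabled'  => { no_cat => 1 },
);

for my $name (sort keys %examples) {
    my $t = effective_threshold(%{ $examples{$name} });
    printf "%-20s %s\n", $name, defined $t ? $t : 'disabled';
}
```

In this sketch I<no_cat> takes precedence when both options are present; whether the module itself resolves the two settings in that order is not stated here.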

lib/CPAN/Search/Lite/Populate.pm

to (eg, C<File_Handle_Input_Output>).

=back

=head1 CATEGORIES

When uploading a module to PAUSE, there is an option
to assign it to one of 24 broad categories. However, many
modules have not been assigned a category, for one
reason or another. When populating the tables, the
I<AI::Categorizer> module is used to guess a possible
category for those modules that haven't been assigned one,
using a training set built from the modules that have been
assigned a category (see L<AI::Categorizer> for general
details). If this guess scores above a configurable
threshold (see L<CPAN::Search::Lite::Index>), the guess is
accepted and inserted into the database, and the categories
associated with the module's distribution are updated
accordingly.
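
The acceptance step described above might look roughly like the following sketch. It does not call I<AI::Categorizer>; the scores are made-up inputs, and the helper name I<accept_guess>, the module names, and every category label other than C<File_Handle_Input_Output> are hypothetical. It assumes a guess must score strictly above the threshold, which is one reading of "above" and may not match the actual comparison used.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref of category => score (scores
# between 0 and 1) and a threshold, return the best-scoring category
# if its score is strictly above the threshold, otherwise nothing.
sub accept_guess {
    my ($scores, $threshold) = @_;
    # Highest score first; ties broken alphabetically for stability.
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b }
                 keys %$scores;
    return unless defined $best && $scores->{$best} > $threshold;
    return $best;
}

# Made-up scores for two uncategorized modules (illustrative only).
my %guesses = (
    'Some::Module'  => { File_Handle_Input_Output => 0.999,
                         Hypothetical_Category_A  => 0.120 },
    'Other::Module' => { Hypothetical_Category_A  => 0.610,
                         Hypothetical_Category_B  => 0.580 },
);

my $threshold = 0.995;    # the documented default
for my $module (sort keys %guesses) {
    my $category = accept_guess($guesses{$module}, $threshold);
    printf "%-14s %s\n", $module,
        defined $category ? "assigned to $category" : 'left unassigned';
}
```

With a high threshold such as the default, only near-certain guesses are stored, so a module with two middling candidate categories stays uncategorized rather than being filed incorrectly.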

=head1 SEE ALSO

L<CPAN::Search::Lite::Index>

=cut


