Algorithm-LibLinear

lib/Algorithm/LibLinear/FeatureScaling.pm

        my $target;

    my $method = $self->can("scale_$target_type");
    unless ($method) {
        Carp::croak("Cannot scale such type of target: $target_type.");
    }
    $self->$method($target);
}

sub scale_data_set {
    args_pos
        my $self => $InstanceOfPackage,
        my $data_set => InstanceOf['Algorithm::LibLinear::DataSet'];

    my @scaled_data_set =
        map { $self->scale_labeled_data($_) } @{ $data_set->as_arrayref };
    Algorithm::LibLinear::DataSet->new(data_set => \@scaled_data_set);
}

sub scale_feature {
    args_pos
        my $self => $InstanceOfPackage,
        my $feature => Feature;

    my ($lower_bound, $upper_bound) = ($self->lower_bound, $self->upper_bound);
    my $min_max_values = $self->min_max_values;
    my %scaled_feature;
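    for my $index (1 .. @$min_max_values) {
        # Features are sparse: a missing element is treated as 0.
        my $unscaled = $feature->{$index} // 0;
        my ($min, $max) = @{ $min_max_values->[$index - 1] // [0, 0] };
        # A constant element carries no information; skipping it also
        # avoids a division by zero below.
        next if $min == $max;
        my $scaled;
        if ($unscaled == $min) {
            # Map the extremes straight to the bounds to avoid rounding error.
            $scaled = $lower_bound;
        } elsif ($unscaled == $max) {
            $scaled = $upper_bound;
        } else {
            # Linear interpolation between the lower and upper bounds.
            my $ratio = ($unscaled - $min) / ($max - $min);
            $scaled = $lower_bound + ($upper_bound - $lower_bound) * $ratio;
        }
        # Keep the result sparse: zero elements are not stored.
        $scaled_feature{$index} = $scaled if $scaled != 0;
    }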
    return \%scaled_feature;
}

sub scale_labeled_data {
    args_pos
        my $self => $InstanceOfPackage,
        my $labeled_data => FeatureWithLabel;

    +{
        feature => $self->scale_feature($labeled_data->{feature}),
        label => $labeled_data->{label},
    };
}

sub upper_bound { $_[0]->{upper_bound} }

1;

__DATA__

=head1 NAME

Algorithm::LibLinear::FeatureScaling

=head1 SYNOPSIS

  use Algorithm::LibLinear::DataSet;
  use Algorithm::LibLinear::FeatureScaling;
  
  my $scale = Algorithm::LibLinear::FeatureScaling->new(
    data_set => Algorithm::LibLinear::DataSet->new(...),
    lower_bound => -10,
    upper_bound => 10,
  );
  # Or load scaling parameters from a file instead:
  my $scale = Algorithm::LibLinear::FeatureScaling->load(
    filename => '/path/to/file',
  );
  
  my $scaled_feature = $scale->scale(feature => +{ 1 => 30, 2 => -25, ... });
  my $scaled_labeled_data = $scale->scale(
    labeled_data => +{ feature => +{ 1 => 30, ... }, label => 1 },
  );
  my $scaled_data_set = $scale->scale(
    data_set => Algorithm::LibLinear::DataSet->new(...),
  );
  
  say $scale->as_string;
  $scale->save(filename => '/path/to/another/file');

=head1 DESCRIPTION

Support vector classification is essentially the calculation of the inner product of a feature vector and the normal vector of the separating hyperplane. If some elements of the feature vectors have a greater dynamic range than others, they can have stronger influence o...

For example, consider the normal vector C<{ 1 1 1 }> and the feature vectors to be classified C<{ -2 10 5 }>, C<{ 5 -50 0 }> and C<{ 10 100 10 }>. The inner products of the normal vector with these feature vectors are 13, -45 and 120, respectively. Obvious...

To avoid this problem, it is important to normalize the range of each feature vector element. This module provides that scaling functionality. You can think of it as a library version of LIBLINEAR's C<svm-scale> command.
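
The effect described above can be reproduced with a few lines of plain Perl. The following is only a sketch, not this module's implementation: C<inner_product> and C<rescale> are hypothetical helpers working on dense array references, and the target range C<[0, 1]> is an assumption chosen for illustration. It computes the inner products of the example vectors before and after linearly rescaling each element by its minimum and maximum.

```perl
  use strict;
  use warnings;
  use List::Util qw(max min sum);

  # Hypothetical helper: inner product of two dense vectors of equal length.
  sub inner_product {
      my ($u, $v) = @_;
      return sum(map { $u->[$_] * $v->[$_] } 0 .. $#$u);
  }

  # Hypothetical helper: linearly map each element into [$lower, $upper]
  # using the per-element [min, max] pairs in $ranges.
  # Constant elements (min == max) are mapped to 0 in this sketch.
  sub rescale {
      my ($vector, $ranges, $lower, $upper) = @_;
      return [
          map {
              my ($min, $max) = @{ $ranges->[$_] };
              $min == $max
                  ? 0
                  : $lower
                      + ($upper - $lower) * ($vector->[$_] - $min) / ($max - $min);
          } 0 .. $#$vector
      ];
  }

  my @normal  = (1, 1, 1);
  my @vectors = ([-2, 10, 5], [5, -50, 0], [10, 100, 10]);

  # Inner products of the unscaled vectors: 13, -45 and 120.
  my @raw_products = map { inner_product(\@normal, $_) } @vectors;

  # Per-element [min, max] over all example vectors.
  my @ranges = map {
      my $i = $_;
      my @column = map { $_->[$i] } @vectors;
      [ min(@column), max(@column) ];
  } 0 .. $#normal;

  # Rescale every element into [0, 1] and recompute the inner products.
  my @scaled_vectors  = map { rescale($_, \@ranges, 0, 1) } @vectors;
  my @scaled_products = map { inner_product(\@normal, $_) } @scaled_vectors;

  print "raw:    @raw_products\n";
  print "scaled: @scaled_products\n";
```

After rescaling, every element contributes at most 1 to the inner product, so no single element can dominate merely because its raw values span a wider range.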

=head1 METHODS

=head2 new(data_set => $data_set | min_max_values => \@min_max_values [, lower_bound => 0.0] [, upper_bound => 1.0])

Constructor. It accepts the named parameters below. At least one of C<data_set> or C<min_max_values> is required.

=over 4

=item data_set

An instance of L<Algorithm::LibLinear::DataSet>. This is used to compute the dynamic range of each vector element.

=item min_max_values

Pre-calculated dynamic range of each vector element. It is structured like this:

  my @min_max_values = (
    [ -10, 10 ],  # Dynamic range of 1st elements of vectors.
    [ 0, 1 ],     # 2nd.
    [ -1, 1 ],    # 3rd.
    ...
  );
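
As an illustration of where such a structure might come from, the sketch below derives the pairs from sparse feature hashes keyed by 1-based index. C<compute_min_max_values> is a hypothetical helper, not part of this module, and it assumes that a missing element counts as 0, which is how C<scale_feature> reads sparse features.

```perl
  use strict;
  use warnings;
  use List::Util qw(max min);

  # Hypothetical helper: build [min, max] pairs from sparse feature hashes.
  # Assumption: an element missing from a hash has the value 0.
  sub compute_min_max_values {
      my @features = @_;
      # The highest index seen determines the number of elements.
      my $dimension = max(0, map { keys %$_ } @features);
      my @min_max_values;
      for my $index (1 .. $dimension) {
          my @values = map { $_->{$index} // 0 } @features;
          push @min_max_values, [ min(@values), max(@values) ];
      }
      return \@min_max_values;
  }

  my $min_max_values = compute_min_max_values(
      { 1 => -10, 2 => 1 },
      { 1 => 10, 3 => -1 },
      { 3 => 1 },
  );

  # One "min max" line per vector element.
  print join(' ', @$_), "\n" for @$min_max_values;
```

The resulting array reference has the same shape as the structure shown above, so it could be passed as the C<min_max_values> parameter of C<new>.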


