AI-Perceptron-Simple
view release on metacpan or search on metacpan
lib/AI/Perceptron/Simple.pm view on Meta::CPAN
package AI::Perceptron::Simple;
use 5.008001;
use strict;
use warnings;
use Carp "croak";
use utf8;
binmode STDOUT, ":utf8";
require local::lib; # no local::lib in tests, this is also to avoid loading local::lib multiple times
use Text::CSV qw( csv );
use Text::Matrix;
use File::Basename qw( basename );
use List::Util qw( shuffle );
=head1 NAME
AI::Perceptron::Simple
A Newbie Friendly Module to Create, Train, Validate and Test Perceptrons / Neurons
=head1 VERSION
Version 1.04
=cut
our $VERSION = '1.04';
# default values
use constant LEARNING_RATE => 0.05;
use constant THRESHOLD => 0.5;
use constant TUNE_UP => 1;
use constant TUNE_DOWN => 0;
=head1 SYNOPSIS
#!/usr/bin/perl
use AI::Perceptron::Simple qw(...);
# create a new nerve / neuron / perceptron
$nerve = AI::Perceptron::Simple->new( {
initial_value => $size_of_each_dendrite,
learning_rate => 0.3, # optional
threshold => 0.85, # optional
attribs => \@dendrites,
} );
# train
$nerve->tame( ... );
$nerve->exercise( ... );
$nerve->train( $training_data_csv, $expected_column_name, $save_nerve_to );
# or
$nerve->train(
$training_data_csv, $expected_column_name, $save_nerve_to,
$show_progress, $identifier); # these two parameters must go together
# validate
$nerve->take_lab_test( ... );
$nerve->take_mock_exam( ... );
# fill results to original file
$nerve->validate( {
stimuli_validate => $validation_data_csv,
predicted_column_index => 4,
} );
lib/AI/Perceptron/Simple.pm view on Meta::CPAN
here on is all mathematics and numbers which only makes sense to the computer and not humans anymore.
Whenever the perceptron fine tunes itself, it will increase/decrease all the dendrites that is significant (attributes labelled 1) for each input. This means that even when the
perceptron successfully fine tunes itself to suite all the data in your file for the first round, the perceptron might still get some of the things wrong for the next round of training.
Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to correctly handle, the more "mature" the perceptron is.
No one defines how "mature" it is except the programmer himself/herself :)
=head1 CONVENTIONS USED
Please take note that not all subroutines/method must be used to make things work. All the subroutines and methods are listed out for the sake of writing the documentation.
Private methods/subroutines are prefixed with C<_> or C<&_> and they aren't meant to be called directly. You can if you want to. There are quite a number of them to be honest, just ignore them if you happen to see them :)
Synonyms are placed before the actual ie. technical subroutines/methods. You will see C<...> as the parameters if they are synonyms. Move to the next subroutine/method until you find something like C<\%options> as the parameter or anything that isn't...
=head1 DATASET STRUCTURE
I<This module can only process CSV files.>
Any field ie columns that will be used for processing must be binary ie. C<0> or C<1> only. Your dataset can contain other columns with non-binary data as long as they are not one of the dendrites.
There are soem sample dataset which can be found in the C<t> directory. The original dataset can also be found in C<docs/book_list.csv>. The files can also be found L<here|https://github.com/Ellednera/AI-Perceptron-Simple>.
=head1 PERCEPTRON DATA
The perceptron/neuron data is stored using the C<Storable> module.
See C<Portability of Nerve Data> section below for more info on some known issues.
=head1 DATA PROCESSING RELATED SUBROUTINES
These subroutines can be imported using the tag C<:process_data>.
These subroutines should be called in the procedural way.
=head2 shuffle_stimuli ( ... )
The parameters and usage are the same as C<shuffled_data>. See the next two subroutines.
=head2 shuffle_data ( $original_data => $shuffled_1, $shuffled_2, ... )
=head2 shuffle_data ( ORIGINAL_DATA, $shuffled_1, $shuffled_2, ... )
Shuffles C<$original_data> or C<ORIGINAL_DATA> and saves them to other files.
=cut
sub shuffle_stimuli {
shuffle_data( @_ );
}
sub shuffle_data {
my $stimuli = shift or croak "Please specify the original file name";
my @shuffled_stimuli_names = @_
or croak "Please specify the output files for the shuffled data";
my @aoa;
for ( @shuffled_stimuli_names ) {
# copied from _real_validate_or_test
# open for shuffling
my $aoa = csv (in => $stimuli, encoding => ":encoding(utf-8)");
my $attrib_array_ref = shift @$aoa; # 'remove' the header, it's annoying :)
@aoa = shuffle( @$aoa ); # this can only process actual array
unshift @aoa, $attrib_array_ref; # put back the headers before saving file
csv( in => \@aoa, out => $_, encoding => ":encoding(utf-8)" )
and
print "Saved shuffled data into ", basename($_), "!\n";
}
}
=head1 CREATION RELATED SUBROUTINES/METHODS
=head2 new ( \%options )
Creates a brand new perceptron and initializes the value of each attribute / dendrite aka. weight. Think of it as the thickness or plasticity of the dendrites.
For C<%options>, the followings are needed unless mentioned:
=over 4
=item initial_value => $decimal
The value or thickness of ALL the dendrites when a new perceptron is created.
Generally speaking, this value is usually between 0 and 1. However, it all depend on your combination of numbers for the other options.
=item attribs => $array_ref
An array reference containing all the attributes / dendrites names. Yes, give them some names :)
=item learning_rate => $decimal
Optional. The default is C<0.05>.
The learning rate of the perceptron for the fine-tuning process.
This value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
=item threshold => $decimal
Optional. The default is C<0.5>
This is the passing rate to determine the neuron output (C<0> or C<1>).
Generally speaking, this value is usually between C<0> and C<1>. However, it all depend on your combination of numbers for the other options.
=back
=cut
sub new {
my $class = shift;
my $data_ref = shift;
my %data = %{ $data_ref };
# check keys
$data{ learning_rate } = LEARNING_RATE if not exists $data{ learning_rate };
$data{ threshold } = THRESHOLD if not exists $data{ threshold };
#####
# don't pack this key checking process into a subroutine for now
# this is also used in &_real_validate_or_test
my @missing_keys;
lib/AI/Perceptron/Simple.pm view on Meta::CPAN
=head2 test ( \%options )
This method is used to put the trained nerve to the test. You can think of it as deploying the nerve for the actual work or maybe putting the nerve into an empty brain and see how
well the brain survives :)
This method works and behaves the same way as the C<validate> method. See C<validate> for the details.
I<*This method will call &_real_validate_or_test to do the actual work.>
=cut
# redirect to _real_validate_or_test
sub take_real_exam {
my ( $self, $data_hash_ref ) = @_;
$self->_real_validate_or_test( $data_hash_ref );
}
sub work_in_real_world {
my ( $self, $data_hash_ref ) = @_;
$self->_real_validate_or_test( $data_hash_ref );
}
sub test {
my ( $self, $data_hash_ref ) = @_;
$self->_real_validate_or_test( $data_hash_ref );
}
=head2 _real_validate_or_test ( $data_hash_ref )
This is where the actual validation or testing takes place.
C<$data_hash_ref> is the list of parameters passed into the C<validate> or C<test> methods.
This is a B<method>, so use the OO way. This is one of the exceptions to the rules where private subroutines are treated as methods :)
=cut
sub _real_validate_or_test {
my $self = shift; my $data_hash_ref = shift;
#####
my @missing_keys;
for ( qw( stimuli_validate predicted_column_index ) ) {
push @missing_keys, $_ unless exists $data_hash_ref->{ $_ };
}
croak "Missing keys: @missing_keys" if @missing_keys;
#####
my $stimuli_validate = $data_hash_ref->{ stimuli_validate };
my $predicted_index = $data_hash_ref->{ predicted_column_index };
# actual processing starts here
my $output_file = defined $data_hash_ref->{ results_write_to }
? $data_hash_ref->{ results_write_to }
: $stimuli_validate;
# open for writing results
my $aoa = csv (in => $stimuli_validate, encoding => ":encoding(utf-8)");
my $attrib_array_ref = shift @$aoa; # 'remove' the header, it's annoying :)
$aoa = _fill_predicted_values( $self, $stimuli_validate, $predicted_index, $aoa );
# put back the array of headers before saving file
unshift @$aoa, $attrib_array_ref;
print "Saving data to $output_file\n";
csv( in => $aoa, out => $output_file, encoding => ":encoding(utf-8)" );
print "Done saving!\n";
}
=head2 &_fill_predicted_values ( $self, $stimuli_validate, $predicted_index, $aoa )
This is where the filling in of the predicted values takes place. Take note that the parameters naming are the same as the ones used in the C<validate> and C<test> method.
This subroutine should be called in the procedural way.
=cut
sub _fill_predicted_values {
my ( $self, $stimuli_validate, $predicted_index, $aoa ) = @_;
# CSV processing is all according to the documentation of Text::CSV
open my $data_fh, "<:encoding(UTF-8)", $stimuli_validate
or croak "Can't open $stimuli_validate: $!";
my $csv = Text::CSV->new( {auto_diag => 1, binary => 1} );
my $attrib = $csv->getline($data_fh);
$csv->column_names( $attrib );
# individual row
my $row = 0;
while ( my $data = $csv->getline_hr($data_fh) ) {
if ( _calculate_output( $self, $data ) >= $self->threshold ) {
# write 1 into aoa
$aoa->[ $row ][ $predicted_index ] = 1;
} else {
#write 0 into aoa
$aoa->[ $row ][ $predicted_index ] = 0;
}
$row++;
}
close $data_fh;
$aoa;
}
=head1 RESULTS RELATED SUBROUTINES/METHODS
This part is related to generating the confusion matrix.
=head2 get_exam_results ( ... )
The parameters and usage are the same as C<get_confusion_matrix>. See the next method.
=head2 get_confusion_matrix ( \%options )
Returns the confusion matrix in the form of a hash. The hash will contain these keys: C<true_positive>, C<true_negative>, C<false_positive>, C<false_negative>, C<accuracy>, C<sensitivity>. More stats like C<precision>, C<specificity> and C<F1_Score> ...
If you are trying to manipulate the confusion matrix hash or something, take note that all the stats are in percentage (%) in decimal (if any) except the total entries.
For C<%options>, the followings are needed unless mentioned:
( run in 0.764 second using v1.01-cache-2.11-cpan-e1769b4cff6 )