Version 1.04 17 SEPTEMBER 2021
* fixed some critical problems
* yml nerve not loading back as an AI::Perceptron::Simple object
* fix docs: missing parameter $nerve for:
* save_perceptron
* save_perceptron_yaml
* changed die to croak for file opening
Version 1.03 9 SEPTEMBER 2021
* data processing subroutine available:
* shuffle data
* added import tag ":process_data"
* added more useful data to the confusion matrix:
* sum of columns and rows to make it look more classic :)
* more_stats option to show more stats:
* precision, specificity, F1 score, negative predictive value, false negative rate, false positive rate
* false discovery rate, false omission rate, balanced accuracy
Version 1.02 26 AUGUST 2021
* minimum perl version changed to 5.8.1 due to YAML
* fix test for display_confusion_matrix
* modifier "n" ( >= perl 5.22 ) changed to primitive '?:', 5.22 is too high
* fixed inaccurate test for output
* YAML (nerve file) for portability supported
* made subroutines exportable, as the fully qualified names are too long
* :local_data
* :portable_data
* cleaned & refactored codes
* &display_confusion_matrix
* improved the documentation
Version 1.01 24 AUGUST 2021
* Fixed some technical issues
* fixed test scripts not running in the correct sequence
* must be creation -> train -> validate/test
* local::lib issue should be fixed by now
Version 1.00 23 AUGUST 2021
* The following features were implemented over time (see also My::Perceptron v0.04 on GitHub):
* create perceptron
* process data: &train method
* read csv - for training stage
* save and load the perceptron
* output algorithm for train
* read and calculate data line by line
* validate method
* read csv bulk
* write predicted values into original file
* write predicted values into new file
* test method
* read csv bulk
* write predicted values into original file
* write predicted values into new file
* confusion matrix
* read only expected and predicted columns, line by line
* return a hash of data
* TP, TN, FP, FN
* total entries
* accuracy
* sensitivity
* display confusion matrix data to console
* use Text::Matrix
* synonyms
* synonyms MUST call the actual subroutines, not copy-pasted code!
* train: tame, exercise
* validate: take_mock_exam, take_lab_test
* test: take_real_exam, work_in_real_world
* generate_confusion_matrix: get_exam_results
* display_confusion_matrix: display_exam_results
* save_perceptron: preserve
t/04-train_synonyms_exercise.t
t/04-train_synonyms_tame.t
t/06-validate.t
t/06-validate_synonyms_lab.t
t/06-validate_synonyms_mock.t
t/08-confusion_matrix.t
t/08-confusion_matrix_synonyms.t
t/10-test.t
t/10-test_synonyms_exam.t
t/10-test_synonyms_work.t
t/12-shuffle_data.t
t/12-shuffle_data_synonym.t
t/book_list_test-filled-non-binary.csv
t/book_list_test-filled.csv
t/book_list_test.csv
t/book_list_test_exam-filled.csv
t/book_list_test_work-filled.csv
t/book_list_to_shuffle.csv
t/book_list_train.csv
t/book_list_validate-filled.csv
t/book_list_validate.csv
t/book_list_validate_lab-filled.csv
docs/AI-Perceptron-Simple-1.04.html
* VERSION
* SYNOPSIS
* EXPORT
* DESCRIPTION
* CONVENTIONS USED
* DATASET STRUCTURE
* PERCEPTRON DATA
* DATA PROCESSING RELATED SUBROUTINES
    * shuffle_stimuli ( ... )
    * shuffle_data ( $original_data => $shuffled_1, $shuffled_2, ... )
    * shuffle_data ( ORIGINAL_DATA, $shuffled_1, $shuffled_2, ... )
* CREATION RELATED SUBROUTINES/METHODS
    * new ( \%options )
    * get_attributes
    * learning_rate ( $value )
    * learning_rate
    * threshold ( $value )
    * threshold
lib/AI/Perceptron/Simple.pm
$nerve = AI::Perceptron::Simple->new( {
initial_value => $size_of_each_dendrite,
learning_rate => 0.3, # optional
threshold => 0.85, # optional
attribs => \@dendrites,
} );
# train
$nerve->tame( ... );
$nerve->exercise( ... );
$nerve->train( $training_data_csv, $expected_column_name, $save_nerve_to );
# or
$nerve->train(
$training_data_csv, $expected_column_name, $save_nerve_to,
$show_progress, $identifier); # these two parameters must go together
# validate
$nerve->take_lab_test( ... );
$nerve->take_mock_exam( ... );
# fill results to original file
$nerve->validate( {
stimuli_validate => $validation_data_csv,
predicted_column_index => 4,
} );
# or
# fill results to a new file
$nerve->validate( {
stimuli_validate => $validation_data_csv,
predicted_column_index => 4,
results_write_to => $new_csv
} );
# test - see "validate" method, same usage
$nerve->take_real_exam( ... );
$nerve->work_in_real_world( ... );
$nerve->test( ... );
# confusion matrix
my %c_matrix = $nerve->get_confusion_matrix( {
full_data_file => $file_csv,
actual_output_header => $header_name,
predicted_output_header => $predicted_header_name,
more_stats => 1, # optional
} );
# accessing the confusion matrix
my @keys = qw( true_positive true_negative false_positive false_negative
total_entries accuracy sensitivity );
for ( @keys ) {
print $_, " => ", $c_matrix{ $_ }, "\n";
}
# output to console
$nerve->display_confusion_matrix( \%c_matrix, {
zero_as => "bad apples", # cat milk green etc.
one_as => "good apples", # dog honey pink etc.
} );
# saving and loading data of perceptron locally
# NOTE: nerve data is automatically saved after each training process
use AI::Perceptron::Simple ":local_data";
my $nerve_file = "apples.nerve";
preserve( ... );
save_perceptron( $nerve, $nerve_file );
# load data of perceptron for use in the actual program
my $apple_nerve = revive( ... );
my $apple_nerve = load_perceptron( $nerve_file );
# for portability of nerve data
use AI::Perceptron::Simple ":portable_data";
my $yaml_nerve_file = "pearls.yaml";
preserve_as_yaml ( ... );
save_perceptron_yaml ( $nerve, $yaml_nerve_file );
# load nerve data on the other computer
my $pearl_nerve = revive_from_yaml ( ... );
my $pearl_nerve = load_perceptron_yaml ( $yaml_nerve_file );
# processing data
use AI::Perceptron::Simple ":process_data";
shuffle_stimuli ( ... )
shuffle_data ( ORIGINAL_STIMULI, $new_file_1, $new_file_2, ... );
shuffle_data ( $original_stimuli => $new_file_1, $new_file_2, ... );
=head1 EXPORT
None by default.
All the subroutines from C<DATA PROCESSING RELATED SUBROUTINES>, C<NERVE DATA RELATED SUBROUTINES> and C<NERVE PORTABILITY RELATED SUBROUTINES> sections are importable through tags or manually specifying them.
The tags available include the following:
=over 4
=item C<:process_data> - subroutines under C<DATA PROCESSING RELATED SUBROUTINES> section.
=item C<:local_data> - subroutines under C<NERVE DATA RELATED SUBROUTINES> section.
=item C<:portable_data> - subroutines under C<NERVE PORTABILITY RELATED SUBROUTINES> section.
=back
Most of the other functionality is OO.
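For example, a single subroutine can be imported by name, or a whole group through its tag (a minimal sketch, assuming the module is installed):

    # import one helper manually ...
    use AI::Perceptron::Simple "shuffle_data";

    # ... or pull in a whole group through its tag
    use AI::Perceptron::Simple ":portable_data";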
=cut
use Exporter qw( import );
our @EXPORT_OK = qw(
shuffle_data shuffle_stimuli
preserve save_perceptron revive load_perceptron
preserve_as_yaml save_perceptron_yaml revive_from_yaml load_perceptron_yaml
);
our %EXPORT_TAGS = (
process_data => [ qw( shuffle_data shuffle_stimuli ) ],
local_data => [ qw( preserve save_perceptron revive load_perceptron ) ],
portable_data => [ qw( preserve_as_yaml save_perceptron_yaml revive_from_yaml load_perceptron_yaml ) ],
);
=head1 DESCRIPTION
This module provides methods to build, train, validate and test a perceptron. It can also save the perceptron's data for future use in any actual AI program.
This module also aims to help newcomers grasp the concepts of perceptron, training, validation and testing as much as possible. Hence, all the methods and subroutines in this module are decoupled as much as possible so that the actual script...
The implementation here is very basic: it simply takes the input from the dendrites and calculates the output. If the output is higher than the threshold, the final result (category) will be 1, i.e. the perceptron is activated. If not, the result will be 0 (not activated).
Depending on how you view or categorize the final result, the perceptron will fine-tune itself (i.e. train) based on the learning rate until the desired result is met. Everything from here on is mathematics and numbers which only make sense to the computer and not to humans anymore.
Whenever the perceptron fine-tunes itself, it will increase or decrease all the dendrites that are significant (attributes labelled 1) for each input. This means that even when the perceptron successfully fine-tunes itself to suit all the data in your file for the first round, it might still get some things wrong in the next round of training.
Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to handle correctly, the more "mature" the perceptron is.
No one defines how "mature" it is except the programmer himself/herself :)
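As a rough illustration of the idea above (illustrative pseudo-Perl with made-up variable names, not the module's internal code), one training pass over a single row of data might look like this:

    # %weights : attribute name => dendrite weight
    # %row     : one row of data, binary attribute values ( 0 or 1 )
    my $sum = 0;
    $sum += $weights{ $_ } * $row{ $_ } for keys %weights;

    my $output = $sum >= $threshold ? 1 : 0;    # activated or not

    if ( $output != $expected ) {
        # nudge only the significant dendrites ( attributes labelled 1 )
        for my $attrib ( grep { $row{ $_ } == 1 } keys %weights ) {
            $weights{ $attrib } += $expected == 1 ? $learning_rate : -$learning_rate;
        }
    }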
=head1 CONVENTIONS USED
Please take note that not every subroutine/method must be used to make things work; they are all listed here for the sake of complete documentation.
Private methods/subroutines are prefixed with C<_> or C<&_> and they aren't meant to be called directly. You can if you want to. There are quite a number of them, to be honest; just ignore them if you happen to see them :)
Synonyms are documented before the actual, i.e. technical, subroutines/methods. You will see C<...> as the parameters if they are synonyms. Move on to the next subroutine/method until you find something like C<\%options> as the parameter or anything that isn't...
=head1 DATASET STRUCTURE
I<This module can only process CSV files.>
Any field, i.e. column, that will be used for processing must be binary, i.e. C<0> or C<1> only. Your dataset can contain other columns with non-binary data as long as they are not one of the dendrites.
There are some sample datasets in the C<t> directory. The original dataset can also be found in C<docs/book_list.csv>. The files can also be found L<here|https://github.com/Ellednera/AI-Perceptron-Simple>.
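For illustration only (these column names are made up and do not match the bundled files), a valid dataset could look like this, with a non-binary identifier column, binary dendrite columns and a binary expected-output column:

    book_name,has_hard_cover,has_pictures,is_thick,brand
    "Some Book",1,0,1,1
    "Another Book",0,1,0,0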
=head1 PERCEPTRON DATA
The perceptron/neuron data is stored using the C<Storable> module.
See C<Portability of Nerve Data> section below for more info on some known issues.
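A quick sketch of saving and reviving a nerve on the same machine, following the SYNOPSIS (the file name is a placeholder):

    use AI::Perceptron::Simple ":local_data";

    save_perceptron( $nerve, "apples.nerve" );               # Storable-based
    my $revived_nerve = load_perceptron( "apples.nerve" );   # gives back the blessed object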
=head1 DATA PROCESSING RELATED SUBROUTINES
These subroutines can be imported using the tag C<:process_data>.
These subroutines should be called in the procedural way.
=head2 shuffle_stimuli ( ... )
The parameters and usage are the same as C<shuffle_data>. See the next two subroutines.
=head2 shuffle_data ( $original_data => $shuffled_1, $shuffled_2, ... )
=head2 shuffle_data ( ORIGINAL_DATA, $shuffled_1, $shuffled_2, ... )
Shuffles C<$original_data> or C<ORIGINAL_DATA> and saves them to other files.
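A short usage sketch (file names are placeholders); each output file receives its own independently shuffled copy of the original rows, as the code below shows:

    use AI::Perceptron::Simple ":process_data";

    shuffle_data( "book_list.csv" => "shuffled_1.csv", "shuffled_2.csv", "shuffled_3.csv" );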
=cut
sub shuffle_stimuli {
shuffle_data( @_ );
}
sub shuffle_data {
my $stimuli = shift or croak "Please specify the original file name";
my @shuffled_stimuli_names = @_
or croak "Please specify the output files for the shuffled data";
my @aoa;
for ( @shuffled_stimuli_names ) {
# copied from _real_validate_or_test
# open for shuffling
my $aoa = csv (in => $stimuli, encoding => ":encoding(utf-8)");
my $attrib_array_ref = shift @$aoa; # 'remove' the header, it's annoying :)
@aoa = shuffle( @$aoa ); # this can only process actual array
unshift @aoa, $attrib_array_ref; # put back the headers before saving file
csv( in => \@aoa, out => $_, encoding => ":encoding(utf-8)" )
and
print "Saved shuffled data into ", basename($_), "!\n";
}
}
=head1 CREATION RELATED SUBROUTINES/METHODS
=head2 new ( \%options )
Creates a brand new perceptron and initializes the value of each attribute / dendrite aka. weight. Think of it as the thickness or plasticity of the dendrites.
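A minimal creation sketch following the SYNOPSIS (the attribute names below are taken from the test scripts):

    my @attributes = qw( has_trees trees_coverage_more_than_half has_other_living_things );

    my $nerve = AI::Perceptron::Simple->new( {
        initial_value => 0.01,         # starting weight of every dendrite
        attribs       => \@attributes,
        learning_rate => 0.3,          # optional
        threshold     => 0.85,         # optional
    } );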
t/02-state_portable.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use AI::Perceptron::Simple qw( :portable_data );
# for the :local_data test, see 04-train.t; 02-state_synonyms.t utilizes the full invocation
use FindBin;
use constant MODULE_NAME => "AI::Perceptron::Simple";
my @attributes = qw ( has_trees trees_coverage_more_than_half has_other_living_things );
my $total_headers = scalar @attributes;
my $perceptron = AI::Perceptron::Simple->new( {
initial_value => 0.01,
attribs => \@attributes
} );
subtest "All data related subroutines found" => sub {
# this only checks if the subroutines are contained in the package
ok( AI::Perceptron::Simple->can("preserve_as_yaml"), "&preserve_as_yaml is present" );
ok( AI::Perceptron::Simple->can("save_perceptron_yaml"), "&save_perceptron_yaml is present" );
ok( AI::Perceptron::Simple->can("revive_from_yaml"), "&revive_from_yaml is present" );
ok( AI::Perceptron::Simple->can("load_perceptron_yaml"), "&load_perceptron_yaml is present" );
};
my $yaml_nerve_file = $FindBin::Bin . "/portable_nerve.yaml";
# save file
save_perceptron_yaml( $perceptron, $yaml_nerve_file );
ok( -e $yaml_nerve_file, "Found the YAML perceptron." );
# load and check
ok( my $transfered_nerve = load_perceptron_yaml( $yaml_nerve_file ), "&load_perceptron_yaml is working" );
is_deeply( $transfered_nerve, $perceptron, "&load_perceptron_yaml - correct data after loading" );
is ( ref ($transfered_nerve), "AI::Perceptron::Simple", "Loaded back as a blessed object" );
# test synonyms
AI::Perceptron::Simple::preserve_as_yaml( $perceptron, $yaml_nerve_file );
ok( -e $yaml_nerve_file, "Synonym - Found the YAML perceptron." );
ok( $transfered_nerve = AI::Perceptron::Simple::revive_from_yaml( $yaml_nerve_file ), "&revive_from_yaml is working correctly" );
is_deeply( $transfered_nerve, $perceptron, "&revive_from_yaml - correct data after loading" );
is ( ref ($transfered_nerve), "AI::Perceptron::Simple", "Loaded back as a blessed object" );
done_testing();
# besiyata d'shmaya
t/04-train.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use AI::Perceptron::Simple qw( :local_data );
# pwd is the actual .pm module in blib
# ie. My-Perceptron/blib/lib/My/Perceptron.pm
use FindBin;
use constant TRAINING_DATA => $FindBin::Bin . "/book_list_train.csv";
use constant MODULE_NAME => "AI::Perceptron::Simple";
use constant WANT_STATS => 1;
use constant IDENTIFIER => "book_name";
# 36 headers
t/06-validate.t
use strict;
use warnings;
use Test::More;
use Test::Output;
use AI::Perceptron::Simple;
use FindBin;
# TRAINING_DATA & VALIDATION_DATA have the same contents, in real world, don't do this
# use different sets of data for training and validating the nerve. Same goes to testing data.
# I'm doing this only to make sure the nerve is working correctly
use constant TRAINING_DATA => $FindBin::Bin . "/book_list_train.csv";
use constant VALIDATION_DATA => $FindBin::Bin . "/book_list_validate.csv";
use constant VALIDATION_DATA_OUTPUT_FILE => $FindBin::Bin . "/book_list_validate-filled.csv";
use constant MODULE_NAME => "AI::Perceptron::Simple";
use constant WANT_STATS => 1;
use constant IDENTIFIER => "book_name";
t/06-validate_synonyms_lab.t
use strict;
use warnings;
use Test::More;
use Test::Output;
use AI::Perceptron::Simple;
use FindBin;
# TRAINING_DATA & VALIDATION_DATA have the same contents, in real world, don't do this
# use different sets of data for training and validating the nerve. Same goes to testing data.
# I'm doing this only to make sure the nerve is working correctly
use constant TRAINING_DATA => $FindBin::Bin . "/book_list_train.csv";
use constant VALIDATION_DATA => $FindBin::Bin . "/book_list_validate.csv";
use constant VALIDATION_DATA_OUTPUT_FILE => $FindBin::Bin . "/book_list_validate_lab-filled.csv";
use constant MODULE_NAME => "AI::Perceptron::Simple";
use constant WANT_STATS => 1;
use constant IDENTIFIER => "book_name";
t/06-validate_synonyms_mock.t
use strict;
use warnings;
use Test::More;
use Test::Output;
use AI::Perceptron::Simple;
use FindBin;
# TRAINING_DATA & VALIDATION_DATA have the same contents, in real world, don't do this
# use different sets of data for training and validating the nerve. Same goes to testing data.
# I'm doing this only to make sure the nerve is working correctly
use constant TRAINING_DATA => $FindBin::Bin . "/book_list_train.csv";
use constant VALIDATION_DATA => $FindBin::Bin . "/book_list_validate.csv";
use constant VALIDATION_DATA_OUTPUT_FILE => $FindBin::Bin . "/book_list_validate_mock-filled.csv";
use constant MODULE_NAME => "AI::Perceptron::Simple";
use constant WANT_STATS => 1;
use constant IDENTIFIER => "book_name";
t/08-confusion_matrix.t
use FindBin;
use constant TEST_FILE => $FindBin::Bin . "/book_list_test-filled.csv";
use constant NON_BINARY_FILE => $FindBin::Bin . "/book_list_test-filled-non-binary.csv";
my $nerve_file = $FindBin::Bin . "/perceptron_1.nerve";
my $perceptron = AI::Perceptron::Simple::load_perceptron( $nerve_file );
ok ( my %c_matrix = $perceptron->get_confusion_matrix( {
full_data_file => TEST_FILE,
actual_output_header => "brand",
predicted_output_header => "predicted",
} ),
"get_confusion_matrix method is working");
is ( ref \%c_matrix, ref {}, "Confusion matrix in correct data structure" );
is ( $c_matrix{ true_positive }, 2, "Correct true_positive" );
is ( $c_matrix{ true_negative }, 4, "Correct true_negative" );
is ( $c_matrix{ false_positive }, 1, "Correct false_positive" );
is ( $c_matrix{ false_negative }, 3, "Correct false_negative" );
is ( $c_matrix{ total_entries }, 10, "Total entries is correct" );
ok ( AI::Perceptron::Simple::_calculate_total_entries( \%c_matrix ),
"Testing the 'untestable' &_calculate_total_entries" );
is ( $c_matrix{ total_entries }, 10, "'illegal' calculation of total entries is correct" );
t/08-confusion_matrix_synonyms.t
use FindBin;
use constant TEST_FILE => $FindBin::Bin . "/book_list_test-filled.csv";
use constant NON_BINARY_FILE => $FindBin::Bin . "/book_list_test-filled-non-binary.csv";
my $nerve_file = $FindBin::Bin . "/perceptron_1.nerve";
my $perceptron = AI::Perceptron::Simple::load_perceptron( $nerve_file );
ok ( my %c_matrix = $perceptron->get_exam_results( {
full_data_file => TEST_FILE,
actual_output_header => "brand",
predicted_output_header => "predicted",
} ),
"get_exam_results method is working");
is ( ref \%c_matrix, ref {}, "Confusion matrix in correct data structure" );
is ( $c_matrix{ true_positive }, 2, "Correct true_positive" );
is ( $c_matrix{ true_negative }, 4, "Correct true_negative" );
is ( $c_matrix{ false_positive }, 1, "Correct false_positive" );
is ( $c_matrix{ false_negative }, 3, "Correct false_negative" );
is ( $c_matrix{ total_entries }, 10, "Total entries is correct" );
ok ( AI::Perceptron::Simple::_calculate_total_entries( \%c_matrix ),
"Testing the 'untestable' &_calculate_total_entries" );
is ( $c_matrix{ total_entries }, 10, "'illegal' calculation of total entries is correct" );
t/12-shuffle_data.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use Test::Output;
# use AI::Perceptron::Simple "shuffle_data";
use AI::Perceptron::Simple ":process_data";
use FindBin;
# NOTE: with the fat comma ( => ), the bareword ORIGINAL_STIMULI would be auto-quoted into the
# literal string "ORIGINAL_STIMULI" instead of expanding the constant, so a plain comma is used below
use constant ORIGINAL_STIMULI => $FindBin::Bin . "/book_list_to_shuffle.csv";
my $original_stimuli = $FindBin::Bin . "/book_list_to_shuffle.csv";
my $shuffled_data_1 = $FindBin::Bin . "/shuffled_1.csv";
my $shuffled_data_2 = $FindBin::Bin . "/shuffled_2.csv";
my $shuffled_data_3 = $FindBin::Bin . "/shuffled_3.csv";
ok( -e $original_stimuli, "Found the original file" );
{
local $@;
eval { shuffle_data };
like( $@, qr/^Please specify/, "Croaked at invocation without any arguments" )
}
{
local $@;
eval { shuffle_data($original_stimuli) };
like( $@, qr/output files/, "Croaked when new file names not present" )
}
shuffle_data( $original_stimuli => $shuffled_data_1, $shuffled_data_2, $shuffled_data_3 );
stdout_like {
shuffle_data( ORIGINAL_STIMULI, $shuffled_data_1, $shuffled_data_2, $shuffled_data_3 );
} qr/^Saved/, "Correct output after saving file";
ok( -e $shuffled_data_1, "Found the first shuffled file" );
ok( -e $shuffled_data_2, "Found the second shuffled file" );
ok( -e $shuffled_data_3, "Found the third shuffled file" );
done_testing();
# besiyata d'shmaya
t/12-shuffle_data_synonym.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use Test::Output;
use AI::Perceptron::Simple "shuffle_stimuli";
#use AI::Perceptron::Simple ":process_data";
use FindBin;
# NOTE: with the fat comma ( => ), the bareword ORIGINAL_STIMULI would be auto-quoted into the
# literal string "ORIGINAL_STIMULI" instead of expanding the constant, so a plain comma is used below
use constant ORIGINAL_STIMULI => $FindBin::Bin . "/book_list_to_shuffle.csv";
my $original_stimuli = $FindBin::Bin . "/book_list_to_shuffle.csv";
my $shuffled_data_1 = $FindBin::Bin . "/shuffled_1.csv";
my $shuffled_data_2 = $FindBin::Bin . "/shuffled_2.csv";
my $shuffled_data_3 = $FindBin::Bin . "/shuffled_3.csv";
ok( -e $original_stimuli, "Found the original file" );
{
local $@;
eval { shuffle_stimuli };
like( $@, qr/^Please specify/, "Croaked at invocation without any arguments" )
}
{
local $@;
eval { shuffle_stimuli($original_stimuli) };
like( $@, qr/output files/, "Croaked when new file names not present" )
}
shuffle_stimuli( $original_stimuli => $shuffled_data_1, $shuffled_data_2, $shuffled_data_3 );
stdout_like {
shuffle_stimuli( ORIGINAL_STIMULI, $shuffled_data_1, $shuffled_data_2, $shuffled_data_3 );
} qr/^Saved/, "Correct output after saving file";
ok( -e $shuffled_data_1, "Found the first shuffled file" );
ok( -e $shuffled_data_2, "Found the second shuffled file" );
ok( -e $shuffled_data_3, "Found the third shuffled file" );
done_testing();
# besiyata d'shmaya