AI-NaiveBayes1

 view release on metacpan or  search on metacpan

Changes  view on Meta::CPAN

Revision history for Perl extension AI::NaiveBayes1.

2.012  Sat May 29 11:00:47 ADT 2021
 - Removing non-ASCII character in "Bohmer" last name to avoid testing
   issues, "Daniel B\"ohmer" name changed to "Daniel Bohmer" for
   documentation purposes

2.011  Fri May 28 07:55:50 ADT 2021
  - merged corrections from Daniel Bohmer

2.010  Tue May 25 18:30:47 ADT 2021
  - fixing bugs reported by perltesters on some platforms.
    It seems to be due to running tests in a parallel way, and
    that using the same name of the temporary file is the problem.

2.009  Sun May 23 09:23:06 ADT 2021
  - fixed bug with t/auxfunctions.pl not being located during
    testing (hopefully fixed because it requires a different
    environment to reproduce)
  - made Changes reverse chronological

2.008  Fri May 21 10:56:33 ADT 2021
  - GitHub release
  - documentation improvements

2.007  Wed Jan 29 05:58:37 AST 2020
  - documentation update
  - fixed bug: import_from_YAML and import_from_YAML_file now
    return bless-ed references (they did not return before)

2.006  Tue Mar 15 06:17:04 ADT 2011
  - reduced precision in \d{4}.\d+ to \d{4}.\d{10} avoid testing
    problems on different platforms

2.005  Tue Mar  8 06:35:43 AST 2011
  - reduced precision in comparison to avoid testing problems on
    different platforms (from 14 to 12 digits after decimal point)

2.004  Sat Mar  5 12:39:35 AST 2011
  - added missing files to MANIFEST (t/7-3.out and t/7-5.out)

2.003  Fri Mar  4 06:43:31 AST 2011
  - fixed more testing problems due to differences in precision

2.002  Wed Mar  2 06:29:06 AST 2011
  - fixed some testing problems due to differences in precision

2.001  Mon Feb 28 11:18:22 AST 2011
  - fixed some testing problems due to differences in precision

2.000  Tue Feb 22 09:58:04 AST 2011
  - fixing lexical sorting of version numbers

1.10  Tue Feb 22 09:31:12 AST 2011
  - fixed testing problems due to differences in precision
  - fixed podchecker warning (some space)
  - better test error reporting in 2.t
  - added t/pod.t, thanks to Michael Stevens

1.9   Tue Aug 31 09:27:51 ADT 2010
  - fixed testing problems due to differences in precision in t/2.t

1.8   Fri Aug 21 06:36:34 ADT 2009
  - fixed a pod documentation error

1.7   Thu Aug 20 14:20:15 ADT 2009
  - improvements in documentation
  - added method add_csv_file
  - added method drop_attributes
  - removed real_attr and added attribute_type field

1.6   Wed Aug 19 09:09:57 ADT 2009
  - improved an error message
  - fixed some testing problems due to whitespace
  - small improvement in generating documentation

1.5   Wed Jan 30 08:06:22 AST 2008
  - fixed testing problems due to differences in the lowest
    significant digit on different platforms

1.4   Fri Dec 14 11:42:16 AST 2007
  - added test skipping if YAML module is not available
  - removing deprecated "our" reserved word

1.3   Fri Dec  7 07:26:47 AST 2007
  - added reading from a table format (add_table)
  - fixed some warnings reported during testing
  - added option "with counts" to print_model
  - added smoothing option for attributes

1.2   Mon Mar 14 08:03:16 AST 2005
  - addition to documentation
  - fixing a minor bug and a warning

1.1   Tue Apr 20 08:07:33 ADT 2004
  - added several more test
  - implemented optional Gaussian distribution for numerical
    attributes

1.0   Thu Sep  4 08:01:19 ADT 2003
  - bug fix in testing (attributes and labels are sorted now in
    print_model)

0.03  Wed Sep  3 08:25:43 ADT 2003
  - bug fix
  - import and export to/from string and file (using YAML)

0.02  Fri May  9 15:53:53 ADT 2003
  - bug fix

0.01  Fri May  9 14:48:53 ADT 2003
  - original version;

MANIFEST  view on Meta::CPAN

README
MANIFEST
Changes
Makefile.PL
NaiveBayes1.pm
META.yml
t/auxfunctions.pl
t/pod.t
t/1.t
t/1-1.out
t/1-2.out
t/2-1.out
t/2-2.out
t/2-3.out
t/2.t
t/3-1.out
t/3-2.out
t/3-3.out
t/3.t
t/4-1.out
t/4-2.out
t/4-3.out
t/4.t
t/5-1.out
t/5-2.out
t/5-3.out
t/5.t
t/a2.arff
t/6.t
t/6-1.out
t/6-2.out
t/6-3.out
t/6-4.out
t/7.t
t/7-1.out
t/7-2.out
t/7-3.out
t/7-5.out
t/8.t
t/8-1.out
META.json                                Module JSON meta-data (added by MakeMaker)

META.json  view on Meta::CPAN

{
   "abstract" : "Naive Bayes Classification",
   "author" : [
      "Vlado Keselj https://web.cs.dal.ca/~vlado"
   ],
   "dynamic_config" : 1,
   "generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001",
   "license" : [
      "perl_5"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
      "version" : "2"
   },
   "name" : "AI-NaiveBayes1",
   "no_index" : {
      "directory" : [
         "t",
         "inc"
      ]
   },
   "prereqs" : {
      "build" : {
         "requires" : {
            "ExtUtils::MakeMaker" : "0"
         }
      },
      "configure" : {
         "requires" : {
            "ExtUtils::MakeMaker" : "0"
         }
      },
      "runtime" : {
         "requires" : {
            "YAML" : "0.0"
         }
      }
   },
   "release_status" : "stable",
   "version" : "2.012"
}

META.yml  view on Meta::CPAN

---
abstract: 'Naive Bayes Classification'
author:
  - 'Vlado Keselj https://web.cs.dal.ca/~vlado'
build_requires:
  ExtUtils::MakeMaker: '0'
configure_requires:
  ExtUtils::MakeMaker: '0'
dynamic_config: 1
generated_by: 'ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001'
license: perl
meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
  version: '1.4'
name: AI-NaiveBayes1
no_index:
  directory:
    - t
    - inc
requires:
  YAML: '0.0'
version: '2.012'

Makefile.PL  view on Meta::CPAN

use ExtUtils::MakeMaker;
#<? read_starfish_conf() !>

my $module = 'AI::NaiveBayes1';
my $name   = 'NaiveBayes1.pm';
(my $dir = $module) =~ s/::/-/g;

WriteMakefile
( 'NAME'       => $module,
  'VERSION_FROM' => $name, # finds $VERSION
  'dist' => { COMPRESS=>"gzip",
	      SUFFIX=>"gz",
	      #PREOP=>('starfish README; '.
	      #        "cp -f README $dir-\$(VERSION); "
	      #       ),
	  },
  'clean' => {FILES => "tmp* testfiles/tmp1 AI *~ t/tmp1 t/*~ t/tmp6 t/tmp6-2"},
  'PREREQ_PM' => { YAML => '0.0' },
  'PL_FILES' => {},
  ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
   (ABSTRACT_FROM => $name, # retrieve abstract from module
   # <? echo "   AUTHOR => 'Vlado Keselj $VladoURL')" !>#+
      AUTHOR => 'Vlado Keselj https://web.cs.dal.ca/~vlado')
   #-
   : ()
   ),
  ($ExtUtils::MakeMaker::VERSION ge '6.30_00'? 
   ('LICENSE'         => 'perl', ) : ()),
  );

open(M, ">>Makefile") or die;

if ( -f 'priv.make' ) { print M getfile('priv.make') }

close(M);

sub getfile($) {
    my $f = shift;
    local *F;
    open(F, "<$f") or die "getfile:cannot open $f:$!";
    my @r = <F>;
    close(F);
    return wantarray ? @r : join ('', @r);
}

NaiveBayes1.pm  view on Meta::CPAN

# (c) 2003-21 Vlado Keselj https://web.cs.dal.ca/~vlado

package AI::NaiveBayes1;
use strict;
require Exporter;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
@EXPORT = qw(new);
use vars qw($Version);
$Version = $VERSION = '2.012';

use vars @EXPORT_OK;

# non-exported package globals go here
use vars qw();

sub new {
  my $package = shift;
  return bless {
                attributes => [ ],
		labels     => [ ],
		attvals    => {},
		real_stat  => {},
		numof_instances => 0,
		stat_labels => {},
		stat_attributes => {},
		smoothing => {},
		attribute_type => {},
	       }, $package;
}

sub set_real {
    my ($self, @attr) = @_;
    foreach my $a (@attr) { $self->{attribute_type}{$a} = 'real' }
}

sub import_from_YAML {
    my $package = shift;
    my $yaml = shift;
    my $self = YAML::Load($yaml);
    return bless $self, $package;
}

sub import_from_YAML_file {
    my $package = shift;
    my $yamlf = shift;
    my $self = YAML::LoadFile($yamlf);
    return bless $self, $package;
}

# assume that the last header count means counts
# after optionally removing counts, the last header is label
sub add_table {
    my $self = shift;
    my @atts = (); my $lbl=''; my $cnt = '';
    while (@_) {
	my $table = shift;
	if ($table =~ /^(.*)\n[ \t]*-+\n/) {
	    my $a = $1; $table = $';
	    $a =~ s/^\s+//; $a =~ s/\s+$//;
	    if ($a =~ /\s*\bcount\s*$/) {
		$a=$`; $cnt=1; } else { $cnt='' }
	    @atts = split(/\s+/, $a);
	    $lbl = pop @atts;
	}
	while ($table ne '') {
	    $table =~ /^(.*)\n?/ or die;
	    my $r=$1; $table = $';
	    $r =~ s/^\s+//; $r=~ s/\s+$//;
	    if ($r =~ /^-+$/) { next }
	    my @v = split(/\s+/, $r);
	    die "values (#=$#v): {@v}\natts (#=$#atts): @atts, lbl=$lbl,\n".
                 "count: $cnt\n" unless $#v-($cnt?2:1) == $#atts;
	    my %av=(); my @a = @atts;
	    while (@a) { $av{shift @a} = shift(@v) }
	    $self->add_instances(attributes=>\%av,
				 label=>"$lbl=$v[0]",
				 cases=>($cnt?$v[1]:1) );
	}
    }
} # end of add_table

# Simplified; not generally compatible.
# Assume that the last header is label.  The first row contains
# attribute names.
sub add_csv_file {
    my $self = shift; my $fn = shift; local *F;
    open(F,$fn) or die "Cannot open CSV file `$fn': $!";
    local $_ = <F>; my @atts = (); my $lbl=''; my $cnt = '';
    chomp; @atts = split(/\s*,\s*/, $_); $lbl = pop @atts;
    while (<F>) {
	chomp; my @v = split(/\s*,\s*/, $_);
	die "values (#=$#v): {@v}\natts (#=$#atts): @atts, lbl=$lbl,\n".
	    "count: $cnt\n" unless $#v-($cnt?2:1) == $#atts;
	my %av=(); my @a = @atts;
	while (@a) { $av{shift @a} = shift(@v) }
	$self->add_instances(attributes=>\%av,
			     label=>"$lbl=$v[0]",
			     cases=>($cnt?$v[1]:1) );
    }
    close(F);
} # end of add_csv_file

sub drop_attributes {
    my $self = shift;
    foreach my $a (@_) {
	my @tmp = grep { $a ne $_ } @{ $self->{attributes} };
	$self->{attributes} = \@tmp;
	delete($self->{attvals}{$a});
	delete($self->{stat_attributes}{$a});
	delete($self->{attribute_type}{$a});
	delete($self->{real_stat}{$a});
	delete($self->{smoothing}{$a});
    }
} # end of drop_attributes

sub add_instances {
  my ($self, %params) = @_;
  for ('attributes', 'label', 'cases') {
      die "Missing required '$_' parameter" unless exists $params{$_};
  }

  if (scalar(keys(%{ $self->{stat_attributes} })) == 0) {
      foreach my $a (keys(%{$params{attributes}})) {
	  $self->{stat_attributes}{$a} = {};
	  push @{ $self->{attributes} }, $a;
	  $self->{attvals}{$a} = [ ];
	  $self->{attribute_type}{$a} = 'nominal' unless defined($self->{attribute_type}{$a});
      }
  } else {
      foreach my $a (keys(%{$self->{stat_attributes}}))
      { die "attribute not given in instance: $a"
	    unless exists($params{attributes}{$a}) }
  }

  $self->{numof_instances} += $params{cases};

  push @{ $self->{labels} }, $params{label} unless
      exists $self->{stat_labels}->{$params{label}};

  $self->{stat_labels}{$params{label}} += $params{cases};

  foreach my $a (keys(%{$self->{stat_attributes}})) {
      if ( not exists($params{attributes}{$a}) )
      { die "attribute $a not given" }
      my $attval = $params{attributes}{$a};
      if (not exists($self->{stat_attributes}{$a}{$attval})) {
	  push @{ $self->{attvals}{$a} }, $attval;
	  $self->{stat_attributes}{$a}{$attval} = {};
      }
      $self->{stat_attributes}{$a}{$attval}{$params{label}} += $params{cases};
  }
}

sub add_instance {
    my ($self, %params) = @_; $params{cases} = 1;
    $self->add_instances(%params);
}

sub train {
    my $self = shift;
    my $m = $self->{model} = {};
    
    $m->{labelprob} = {};
    foreach my $label (keys(%{$self->{stat_labels}}))
    { $m->{labelprob}{$label} = $self->{stat_labels}{$label} /
                                $self->{numof_instances} } 

    $m->{condprob} = {};
    $m->{condprobe} = {};
    foreach my $att (keys(%{$self->{stat_attributes}})) {
        next if $self->{attribute_type}{$att} eq 'real';
	$m->{condprob}{$att} = {};
	$m->{condprobe}{$att} = {};
	foreach my $label (keys(%{$self->{stat_labels}})) {
	    my $total = 0; my @attvals = ();
	    foreach my $attval (keys(%{$self->{stat_attributes}{$att}})) {
		next unless
		    exists($self->{stat_attributes}{$att}{$attval}{$label}) and
		    $self->{stat_attributes}{$att}{$attval}{$label} > 0;
		push @attvals, $attval;
		$m->{condprob}{$att}{$attval} = {} unless
		    exists( $m->{condprob}{$att}{$attval} );
		$m->{condprob}{$att}{$attval}{$label} = 
		    $self->{stat_attributes}{$att}{$attval}{$label};
		$m->{condprobe}{$att}{$attval} = {} unless
		    exists( $m->{condprob}{$att}{$attval} );
		$m->{condprobe}{$att}{$attval}{$label} = 
		    $self->{stat_attributes}{$att}{$attval}{$label};
		$total += $m->{condprob}{$att}{$attval}{$label};
	    }
	    if (exists($self->{smoothing}{$att}) and
		$self->{smoothing}{$att} =~ /^unseen count=/) {
		my $uc = $'; $uc = 0.5 if $uc <= 0;
		if(! exists($m->{condprob}{$att}{'*'}) ) {
		    $m->{condprob}{$att}{'*'} = {};
		    $m->{condprobe}{$att}{'*'} = {};
		}
		$m->{condprob}{$att}{'*'}{$label} = $uc;
		$total += $uc;
		if (grep {$_ eq '*'} @attvals) { die }
		push @attvals, '*';
	    }
	    foreach my $attval (@attvals) {
		$m->{condprobe}{$att}{$attval}{$label} =
		    "(= $m->{condprob}{$att}{$attval}{$label} / $total)";
		$m->{condprob}{$att}{$attval}{$label} /= $total;
	    }
	}
    }

    # For real-valued attributes, we use Gaussian distribution
    # let us collect statistics
    foreach my $att (keys(%{$self->{stat_attributes}})) {
        next unless $self->{attribute_type}{$att} eq 'real';
	print STDERR "Smoothing ignored for real attribute $att!\n" if
	    defined($self->{smoothing}{att}) and $self->{smoothing}{att};
        $m->{real_stat}->{$att} = {};
        foreach my $attval (keys %{$self->{stat_attributes}{$att}}){
            foreach my $label (keys %{$self->{stat_attributes}{$att}{$attval}}){
                $m->{real_stat}{$att}{$label}{sum}
                += $attval * $self->{stat_attributes}{$att}{$attval}{$label};

                $m->{real_stat}{$att}{$label}{count}
                += $self->{stat_attributes}{$att}{$attval}{$label};
            }
            foreach my $label (keys %{$self->{stat_attributes}{$att}{$attval}}){
		next if
                !defined($m->{real_stat}{$att}{$label}{count}) ||
		$m->{real_stat}{$att}{$label}{count} == 0;

                $m->{real_stat}{$att}{$label}{mean} =
                    $m->{real_stat}{$att}{$label}{sum} /
                        $m->{real_stat}{$att}{$label}{count};
            }
        }

        # calculate stddev
        foreach my $attval (keys %{$self->{stat_attributes}{$att}}) {
            foreach my $label (keys %{$self->{stat_attributes}{$att}{$attval}}){
                $m->{real_stat}{$att}{$label}{stddev} +=
		    ($attval - $m->{real_stat}{$att}{$label}{mean})**2 *
		    $self->{stat_attributes}{$att}{$attval}{$label};
            }
        }
	foreach my $label (keys %{$m->{real_stat}{$att}}) {
	    $m->{real_stat}{$att}{$label}{stddev} =
		sqrt($m->{real_stat}{$att}{$label}{stddev} /
		     ($m->{real_stat}{$att}{$label}{count}-1)
		     );
	}
    }				# foreach real attribute
}				# end of sub train

sub predict {
  my ($self, %params) = @_;
  my $newattrs = $params{attributes} or die "Missing 'attributes' parameter for predict()";
  my $m = $self->{model};  # For convenience
  
  my %scores;
  my @labels = @{ $self->{labels} };
  $scores{$_} = $m->{labelprob}{$_} foreach (@labels);
  foreach my $att (keys(%{ $newattrs })) {
      if (!defined($self->{attribute_type}{$att})) { die "Unknown attribute: `$att'" }
      next if $self->{attribute_type}{$att} eq 'real';
      die unless exists($self->{stat_attributes}{$att});
      my $attval = $newattrs->{$att};
      die "Unknown value `$attval' for attribute `$att'."
      unless exists($self->{stat_attributes}{$att}{$attval}) or
	  exists($self->{smoothing}{$att});
      foreach my $label (@labels) {
	  if (exists($m->{condprob}{$att}{$attval}) and
	      exists($m->{condprob}{$att}{$attval}{$label}) and
	      $m->{condprob}{$att}{$attval}{$label} > 0 ) {
	      $scores{$label} *=
		  $m->{condprob}{$att}{$attval}{$label};
	  } elsif (exists($self->{smoothing}{$att})) {
	      $scores{$label} *=
                  $m->{condprob}{$att}{'*'}{$label};
	  } else { $scores{$label} = 0 }

      }
  }

  foreach my $att (keys %{$newattrs}){
      next unless $self->{attribute_type}{$att} eq 'real';
      my $sum=0; my %nscores;
      foreach my $label (@labels) {
	  die unless exists $m->{real_stat}{$att}{$label}{mean};
	  $nscores{$label} =
              0.398942280401433 / $m->{real_stat}{$att}{$label}{stddev}*
              exp( -0.5 *
                  ( ( $newattrs->{$att} -
                      $m->{real_stat}{$att}{$label}{mean})
                    / $m->{real_stat}{$att}{$label}{stddev}
                  ) ** 2
		 );
	  $sum += $nscores{$label};
      }
      if ($sum==0) { print STDERR "Ignoring all Gaussian probabilities: all=0!\n" }
      else {
	  foreach my $label (@labels) { $scores{$label} *= $nscores{$label} }
      }
  }

  my $sumPx = 0.0;
  $sumPx += $scores{$_} foreach (keys(%scores));
  $scores{$_} /= $sumPx foreach (keys(%scores));
  return \%scores;
}

sub print_model {
    my $self = shift;
    my $withcounts = '';
    if ($#_>-1 && $_[0] eq 'with counts')
    { shift @_; $withcounts = 1; }
    my $m = $self->{model};
    my @labels = $self->labels;
    my $r;

    # prepare table category P(category)
    my @lines;
    push @lines, 'category ', '-';
    push @lines, "$_ " foreach @labels;
    @lines = _append_lines(@lines);
    @lines = map { $_.='| ' } @lines;
    $lines[1] = substr($lines[1],0,length($lines[1])-2).'+-';
    $lines[0] .= "P(category) ";
    foreach my $i (2..$#lines) {
	my $label = $labels[$i-2];
	$lines[$i] .= $m->{labelprob}{$label} .' ';
	if ($withcounts) {
	    $lines[$i] .= "(= $self->{stat_labels}{$label} / ".
		"$self->{numof_instances} ) ";
	}
    }
    @lines = _append_lines(@lines);

    $r .= join("\n", @lines) . "\n". $lines[1]. "\n\n";

    # prepare conditional tables
    my @attributes = sort $self->attributes;
    foreach my $att (@attributes) {
	@lines = ( "category ", '-' );
	my @lines1 = ( "$att ", '-' );
	my @lines2 = ( "P( $att | category ) ", '-' );
	my @attvals = sort keys(%{ $m->{condprob}{$att} });
	foreach my $label (@labels) {
	    if ( $self->{attribute_type}{$att} ne 'real' ) {
		foreach my $attval (@attvals) {
		    next unless exists($m->{condprob}{$att}{$attval}{$label});
		    push @lines, "$label ";
		    push @lines1, "$attval ";

		    my $line = $m->{condprob}{$att}{$attval}{$label};
		    if ($withcounts)
		    { $line.= ' '.$m->{condprobe}{$att}{$attval}{$label} }
		    $line .= ' ';
		    push @lines2, $line;
		}
	    } else {
		push @lines, "$label ";
		push @lines1, "real ";
		push @lines2, "Gaussian(mean=".
		    $m->{real_stat}{$att}{$label}{mean}.",stddev=".
		    $m->{real_stat}{$att}{$label}{stddev}.") ";
	    }
	    push @lines, '-'; push @lines1, '-'; push @lines2, '-';
	}
	@lines = _append_lines(@lines);
	foreach my $i (0 .. $#lines)
	{ $lines[$i] .= ($lines[$i]=~/-$/?'+-':'| ') . $lines1[$i] }
	@lines = _append_lines(@lines);
	foreach my $i (0 .. $#lines)
	{ $lines[$i] .= ($lines[$i]=~/-$/?'+-':'| ') . $lines2[$i] }
	@lines = _append_lines(@lines);

	$r .= join("\n", @lines). "\n\n";
    }

    return $r;
}

sub _append_lines {
    my @l = @_;
    my $m = 0;
    foreach (@l) { $m = length($_) if length($_) > $m }
    @l = map 
    { while (length($_) < $m) { $_.=substr($_,length($_)-1) }; $_ }
    @l;
    return @l;
}

sub labels {
  my $self = shift;
  return @{ $self->{labels} };
}

sub attributes {
  my $self = shift;
  return keys %{ $self->{stat_attributes} };
}

sub export_to_YAML {
    my $self = shift;
    require YAML;
    return YAML::Dump($self);
}

sub export_to_YAML_file {
    my $self = shift;
    my $file = shift;
    require YAML;
    YAML::DumpFile($file, $self);
}

1;
__END__

=head1 NAME

AI::NaiveBayes1 - Naive Bayes Classification

=head1 SYNOPSIS

  use AI::NaiveBayes1;
  my $nb = AI::NaiveBayes1->new;
  $nb->add_table(
  "Html  Caps  Free  Spam  count
  -------------------------------
     Y     Y     Y     Y    42   
     Y     Y     Y     N    32   
     Y     Y     N     Y    17   
     Y     Y     N     N     7   
     Y     N     Y     Y    32   
     Y     N     Y     N    12   
     Y     N     N     Y    20   
     Y     N     N     N    16   
     N     Y     Y     Y    38   
     N     Y     Y     N    18   
     N     Y     N     Y    16   
     N     Y     N     N    16   
     N     N     Y     Y     2   
     N     N     Y     N     9   
     N     N     N     Y    11   
     N     N     N     N    91   
  -------------------------------
  ");
  $nb->train;
  print "Model:\n" . $nb->print_model;
  print "Model (with counts):\n" . $nb->print_model('with counts');

  $nb = AI::NaiveBayes1->new;
  $nb->add_instances(attributes=>{model=>'H',place=>'B'},
		     label=>'repairs=Y',cases=>30);
  $nb->add_instances(attributes=>{model=>'H',place=>'B'},
		     label=>'repairs=N',cases=>10);
  $nb->add_instances(attributes=>{model=>'H',place=>'N'},
		     label=>'repairs=Y',cases=>18);
  $nb->add_instances(attributes=>{model=>'H',place=>'N'},
		     label=>'repairs=N',cases=>16);
  $nb->add_instances(attributes=>{model=>'T',place=>'B'},
		     label=>'repairs=Y',cases=>22);
  $nb->add_instances(attributes=>{model=>'T',place=>'B'},
		     label=>'repairs=N',cases=>14);
  $nb->add_instances(attributes=>{model=>'T',place=>'N'},
		     label=>'repairs=Y',cases=> 6);
  $nb->add_instances(attributes=>{model=>'T',place=>'N'},
		     label=>'repairs=N',cases=>84);

  $nb->train;

  print "Model:\n" . $nb->print_model;
  
  # Find results for unseen instances
  my $result = $nb->predict
     (attributes => {model=>'T', place=>'N'});

  foreach my $k (keys(%{ $result })) {
      print "for label $k P = " . $result->{$k} . "\n";
  }

  # export the model into a string
  my $string = $nb->export_to_YAML();

  # create the same model from the string
  my $nb1 = AI::NaiveBayes1->import_from_YAML($string);

  # write the model to a file (shorter than model->string->file)
  $nb->export_to_YAML_file('t/tmp1');

  # read the model from a file (shorter than file->string->model)
  my $nb2 = AI::NaiveBayes1->import_from_YAML_file('t/tmp1');

See Examples for more examples.

=head1 DESCRIPTION

This module implements the classic "Naive Bayes" machine learning
algorithm.

=head2 Data Structure

An object contains the following fields:

=over 4

=item C<{attributes}>

List of attribute names.

=item C<{attribute_type}{$a}>

Attribute types - 'real', or not (e.g., 'nominal')

=item C<{labels}>

List of labels.

=item C<{attvals}{$a}>

List of attribute values

=item C<{real_stat}{$a}{$v}{$l}{sum}>

Statistics for real valued attributes; besides 'sum' also: count, mean, stddev

=item C<{numof_instances}>

Number of training instances.

=item C<{stat_labels}{$l}>

Label count in training data.

=item C<{stat_attributes}{$a}>

Statistics for an attribute: C<...{$value}{$label}> = count of
instances.

=item C<{smoothing}{$attribute}>

Attribute smoothing.  No smoothing if does not exist.  Implemented smoothing:

      - /^unseen count=/ followed by number, e.g., 0.5

=back

=head2 Attribute Smoothing

For an attribute A one can specify:

    $nb->{smoothing}{A} = 'unseen count=0.5';

to provide a count for unseen data.  The count is taken into
consideration in training and prediction, when any unseen attribute
values are observed.  Zero probabilities can be prevented in this way.
A count other than 0.5 can be provided, but if it is <=0 it will be
set to 0.5.  The method is similar to add-one smoothing.  A special
attribute value '*' is used for all unseen data. 

=head1 METHODS

=head2 Constructor Methods

=over 4

=item new()

Constructor. Creates a new C<AI::NaiveBayes1> object and returns it.

=item import_from_YAML($string)

Constructor. Creates a new C<AI::NaiveBayes1> object from a string where it is
represented in C<YAML>.  Requires YAML module.

=item import_from_YAML_file($file_name)

Constructor. Creates a new C<AI::NaiveBayes1> object from a file where it is
represented in C<YAML>.  Requires YAML module.

=back

=head2 Non-Constructor Methods

=over 4

=item add_table()

Add instances from a table.  The first row are attributes, followed by
values.  If the name of the last attribute is `count', it is
interpreted as a repetition count and used appropriatelly.  The last
attribute (after optionally removing `count') is the class attribute.
The attributes and values are separated by white space.

=item add_csv_file($filename)

Add instances from a CSV file.  Primitive format implementation (e.g.,
no commas allowed in attribute names or values).

=item drop_attributes(@attributes)

Delete attributes after adding instances.

=item set_real(list_of_attributes)

Delares a list of attributes to be real-valued.  During training,
their conditional probabilities will be modeled with Gaussian (normal)
distributions. 

=item C<add_instance(attributes=E<gt>HASH,label=E<gt>STRING|ARRAY)>

Adds a training instance to the categorizer.

=item C<add_instances(attributes=E<gt>HASH,label=E<gt>STRING|ARRAY,cases=E<gt>NUMBER)>

Adds a number of identical instances to the categorizer.

=item export_to_YAML()

Returns a C<YAML> string representation of an C<AI::NaiveBayes1>
object.  Requires YAML module.

=item C<export_to_YAML_file( $file_name )>

Writes a C<YAML> string representation of an C<AI::NaiveBayes1>
object to a file.  Requires YAML module.

=item C<print_model( OPTIONAL 'with counts' )>

Returns a string, human-friendly representation of the model.
The model is supposed to be trained before calling this method.
One argument 'with counts' can be supplied, in which case explanatory
expressions with counts are printed as well.

=item train()

Calculates the probabilities that will be necessary for categorization
using the C<predict()> method.

=item C<predict( attributes =E<gt> HASH )>

Use this method to predict the label of an unknown instance.  The
attributes should be of the same format as you passed to
C<add_instance()>.  C<predict()> returns a hash reference whose keys
are the names of labels, and whose values are corresponding
probabilities.

=item C<labels>

Returns a list of all the labels the object knows about (in no
particular order), or the number of labels if called in a scalar
context.

=back

=head1 THEORY

Bayes' Theorem is a way of inverting a conditional probability. It
states:

                P(y|x) P(x)
      P(x|y) = -------------
                   P(y)

and so on...

This is a pretty standard algorithm explained in many machine learning
textbooks (e.g., "Data Mining" by Witten and Eibe).

The algorithm relies on estimating P(A|C), where A is an arbitrary
attribute, and C is the class attribute.  If A is not real-valued,
then this conditional probability is estimated using a table of all
possible values for A and C.

If A is real-valued, then the distribution P(A|C) is modeled as a
Gaussian (normal) distribution for each possible value of C=c,  Hence,
for each C=c we collect the mean value (m) and standard deviation (s)
for A during training.  During classification, P(A=a|C=c) is estimated
using Gaussian distribution, i.e., in the following way:

                    1               (a-m)^2
 P(A=a|C=c) = ------------ * exp( - ------- )
              sqrt(2*Pi)*s           2*s^2

this boils down to the following lines of code:

    $scores{$label} *=
    0.398942280401433 / $m->{real_stat}{$att}{$label}{stddev}*
      exp( -0.5 *
           ( ( $newattrs->{$att} -
               $m->{real_stat}{$att}{$label}{mean})
             / $m->{real_stat}{$att}{$label}{stddev}
           ) ** 2
	   );

i.e.,

  P(A=a|C=c) = 0.398942280401433 / s *
    exp( -0.5 * ( ( a-m ) / s ) ** 2 );


=head1 EXAMPLES

Example with a real-valued attribute modeled by a Gaussian
distribution (from Witten I. and Frank E. book "Data Mining" (the WEKA
book), page 86):

 # @relation weather
 # 
 # @attribute outlook {sunny, overcast, rainy}
 # @attribute temperature real
 # @attribute humidity real
 # @attribute windy {TRUE, FALSE}
 # @attribute play {yes, no}
 # 
 # @data
 # sunny,85,85,FALSE,no
 # sunny,80,90,TRUE,no
 # overcast,83,86,FALSE,yes
 # rainy,70,96,FALSE,yes
 # rainy,68,80,FALSE,yes
 # rainy,65,70,TRUE,no
 # overcast,64,65,TRUE,yes
 # sunny,72,95,FALSE,no
 # sunny,69,70,FALSE,yes
 # rainy,75,80,FALSE,yes
 # sunny,75,70,TRUE,yes
 # overcast,72,90,TRUE,yes
 # overcast,81,75,FALSE,yes
 # rainy,71,91,TRUE,no
 
 $nb->set_real('temperature', 'humidity');
 
 $nb->add_instance(attributes=>{outlook=>'sunny',temperature=>85,humidity=>85,windy=>'FALSE'},label=>'play=no');
 $nb->add_instance(attributes=>{outlook=>'sunny',temperature=>80,humidity=>90,windy=>'TRUE'},label=>'play=no');
 $nb->add_instance(attributes=>{outlook=>'overcast',temperature=>83,humidity=>86,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'rainy',temperature=>70,humidity=>96,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'rainy',temperature=>68,humidity=>80,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'rainy',temperature=>65,humidity=>70,windy=>'TRUE'},label=>'play=no');
 $nb->add_instance(attributes=>{outlook=>'overcast',temperature=>64,humidity=>65,windy=>'TRUE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'sunny',temperature=>72,humidity=>95,windy=>'FALSE'},label=>'play=no');
 $nb->add_instance(attributes=>{outlook=>'sunny',temperature=>69,humidity=>70,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'rainy',temperature=>75,humidity=>80,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'sunny',temperature=>75,humidity=>70,windy=>'TRUE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'overcast',temperature=>72,humidity=>90,windy=>'TRUE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'overcast',temperature=>81,humidity=>75,windy=>'FALSE'},label=>'play=yes');
 $nb->add_instance(attributes=>{outlook=>'rainy',temperature=>71,humidity=>91,windy=>'TRUE'},label=>'play=no');
 
 $nb->train;
 
 my $printedmodel =  "Model:\n" . $nb->print_model;
 my $p = $nb->predict(attributes=>{outlook=>'sunny',temperature=>66,humidity=>90,windy=>'TRUE'});

 YAML::DumpFile('file', $p);
 die unless (abs($p->{'play=no'}  - 0.792) < 0.001);
 die unless(abs($p->{'play=yes'} - 0.208) < 0.001);

=head1 HISTORY

L<Algorithm::NaiveBayes> by Ken Williams was not what I needed so I
wrote this one.  L<Algorithm::NaiveBayes> is oriented towards text
categorization, it includes smoothing, and log probabilities.  This
module is a generic, basic Naive Bayes algorithm.

=head1 THANKS

I would like to thank Daniel Bohmer for documentation corrections,
Yung-chung Lin (cpan:xern) for the implementation of the Gaussian model
for continuous variables, and the following people for bug reports, support,
and comments (in no particular order):

Michael Stevens, Tom Dyson, Dan Von Kohorn, Craig Talbert,
Andrew Brian Clegg,

and CPAN-testers, including: Andreas Koenig, Alexandr Ciornii, jlatour,
Jost.Krieger, tvmaly, Matthew Musgrove, Michael Stevens, Nigel Horne,
Graham Crookham, David Cantrell (dcantrell).

=head1 AUTHOR

Copyright 2003-21 Vlado Keselj L<https://web.cs.dal.ca/~vlado>.
In 2004 Yung-chung Lin provided implementation of the Gaussian model for
continous variables.

This script is provided "as is" without expressed or implied warranty.
This is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

The module is available on CPAN (L<https://metacpan.org/author/VLADO>), and
L<https://web.cs.dal.ca/~vlado/srcperl/>.  The latter site is
updated more frequently.

=head1 SEE ALSO

L<Algorithm::NaiveBayes>, L<perl>.

=cut

README  view on Meta::CPAN

AI/NaiveBayes1 version 2.011
============================

This is an implementation of the basic Naive Bayes classification
algorithm.  Documentation is a part of the file NaiveBayes1.pm in the
pod format.  To convert it into manual page format, type something
like the following:

	  pod2html NaiveBayes1.pm

INSTALLATION

To install this module type the following:

   perl Makefile.PL
   make
   make test
   make install

DEPENDENCIES

The methods for importing to and exporting from YAML format require
module YAML.

COPYRIGHT AND LICENCE

Copyright 2003-21 Vlado Keselj https://web.cs.dal.ca/~vlado

This script is provided "as is" without express or implied warranty.
This is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

t/1-1.out  view on Meta::CPAN

Model:
category  | P(category) 
---+---
repairs=Y | 0.38        
repairs=N | 0.62        
---+---

category  | model | P( model | category ) 
---+---+---
repairs=Y | H     | 0.631578947368     
repairs=Y | T     | 0.368421052631     
---+---+---
repairs=N | H     | 0.209677419354     
repairs=N | T     | 0.790322580645     
---+---+---

category  | place | P( place | category ) 
---+---+---
repairs=Y | B     | 0.684210526315     
repairs=Y | N     | 0.315789473684     
---+---+---
repairs=N | B     | 0.193548387096     
repairs=N | N     | 0.806451612903     
---+---+---

t/1-2.out  view on Meta::CPAN

--- #YAML:1.0 !perl/AI::NaiveBayes1
attributes:
  - place
  - model
attvals:
  model:
    - H
    - T
  place:
    - B
    - N
labels:
  - repairs=Y
  - repairs=N
model:
  condprob:
    model:
      H:
        repairs=N: 0.209677419354839
        repairs=Y: 0.631578947368421
      T:
        repairs=N: 0.790322580645161
        repairs=Y: 0.368421052631579
    place:
      B:
        repairs=N: 0.193548387096774
        repairs=Y: 0.684210526315789
      N:
        repairs=N: 0.806451612903226
        repairs=Y: 0.315789473684211
  labelprob:
    repairs=N: 0.62
    repairs=Y: 0.38
  numerical_attr: {}
numof_instances: 200
stat_attributes:
  model:
    H:
      repairs=N: 26
      repairs=Y: 48
    T:
      repairs=N: 98
      repairs=Y: 28
  place:
    B:
      repairs=N: 24
      repairs=Y: 52
    N:
      repairs=N: 100
      repairs=Y: 24
stat_labels:
  repairs=N: 124
  repairs=Y: 76

t/1.t  view on Meta::CPAN

#!/usr/bin/perl

use Test::More tests => 4;
use_ok("AI::NaiveBayes1");

use lib '.';
require 't/auxfunctions.pl';

my $nb = AI::NaiveBayes1->new;

$nb->add_instances(attributes=>{model=>'H',place=>'B'},label=>'repairs=Y',cases=>30);
$nb->add_instances(attributes=>{model=>'H',place=>'B'},label=>'repairs=N',cases=>10);
$nb->add_instances(attributes=>{model=>'H',place=>'N'},label=>'repairs=Y',cases=>18);
$nb->add_instances(attributes=>{model=>'H',place=>'N'},label=>'repairs=N',cases=>16);
$nb->add_instances(attributes=>{model=>'T',place=>'B'},label=>'repairs=Y',cases=>22);
$nb->add_instances(attributes=>{model=>'T',place=>'B'},label=>'repairs=N',cases=>14);
$nb->add_instances(attributes=>{model=>'T',place=>'N'},label=>'repairs=Y',cases=> 6);
$nb->add_instances(attributes=>{model=>'T',place=>'N'},label=>'repairs=N',cases=>84);

$nb->train;

my $printedmodel =  "Model:\n" . $nb->print_model;
$printedmodel = &shorterdecimals($printedmodel);
#putfile('t/1-1.out', $printedmodel);
&compare_by_line($printedmodel, 't/1-1.out');

#putfile('t/1-2.out', $nb->export_to_YAML());
#is($nb->export_to_YAML(), getfile('t/1-2.out'));

eval "require YAML;";
plan skip_all => "YAML module required for the remaining tests in 1.t" if $@;

$nb->export_to_YAML_file('t/tmp1');

my $nb1 = AI::NaiveBayes1->import_from_YAML_file('t/tmp1');

$printedmodel = &shorterdecimals($nb1->print_model);
#is("Model:\n" . $printedmodel, getfile('t/1-1.out'));
&compare_by_line("Model:\n" . $printedmodel, 't/1-1.out');

my $tmp = $nb->export_to_YAML();
my $nb2 = AI::NaiveBayes1->import_from_YAML($tmp);
$printedmodel = &shorterdecimals($nb2->print_model);
#is("Model:\n" . $printedmodel, getfile('t/1-1.out'));
&compare_by_line("Model:\n" . $printedmodel, 't/1-1.out');

t/2-1.out  view on Meta::CPAN

Model:
category | P(category) 
---+---
spam=N   | 0.15        
spam=Y   | 0.85        
---+---

category | html | P( html | category ) 
---+---+---
spam=N   | N    | 0.666666666666    
spam=N   | Y    | 0.333333333333    
---+---+---
spam=Y   | N    | 0.058823529411   
spam=Y   | Y    | 0.941176470588    
---+---+---

category | morning | P( morning | category ) 
---+---+---
spam=N   | N       | 0.333333333333       
spam=N   | Y       | 0.666666666666       
---+---+---
spam=Y   | N       | 0.647058823529       
spam=Y   | Y       | 0.352941176470       
---+---+---

category | size1 | P( size1 | category ) 
---+---+---
spam=N   | L     | 0.333333333333     
spam=N   | S     | 0.666666666666     
---+---+---
spam=Y   | L     | 0.588235294117     
spam=Y   | S     | 0.411764705882     
---+---+---

t/2-2.out  view on Meta::CPAN

--- #YAML:1.0 !perl/AI::NaiveBayes1
attributes:
  - size1
  - morning
  - html
attvals:
  html:
    - N
    - Y
  morning:
    - Y
    - N
  size1:
    - S
    - L
labels:
  - spam=N
  - spam=Y
model:
  condprob:
    html:
      N:
        spam=N: 0.666666666666667
        spam=Y: 0.0588235294117647
      Y:
        spam=N: 0.333333333333333
        spam=Y: 0.941176470588235
    morning:
      N:
        spam=N: 0.333333333333333
        spam=Y: 0.647058823529412
      Y:
        spam=N: 0.666666666666667
        spam=Y: 0.352941176470588
    size1:
      L:
        spam=N: 0.333333333333333
        spam=Y: 0.588235294117647
      S:
        spam=N: 0.666666666666667
        spam=Y: 0.411764705882353
  labelprob:
    spam=N: 0.15
    spam=Y: 0.85
  numerical_attr: {}
numof_instances: 20
stat_attributes:
  html:
    N:
      spam=N: 2
      spam=Y: 1
    Y:
      spam=N: 1
      spam=Y: 16
  morning:
    N:
      spam=N: 1
      spam=Y: 11
    Y:
      spam=N: 2
      spam=Y: 6
  size1:
    L:
      spam=N: 1
      spam=Y: 10
    S:
      spam=N: 2
      spam=Y: 7
stat_labels:
  spam=N: 3
  spam=Y: 17

t/2-3.out  view on Meta::CPAN

--- #YAML:1.0
spam=N: 0.062703406378824
spam=Y: 0.937296593621176

t/2.t  view on Meta::CPAN

#!/usr/bin/perl

use Test::More tests => 6;
use_ok("AI::NaiveBayes1");

use lib '.';
require 't/auxfunctions.pl';

my $nb = AI::NaiveBayes1->new;

# @relation spam-b
#
# @attribute morning {Y,N}
# @attribute html    {Y,N}
# @attribute size1   {S,L}
# @attribute spam    {Y,N}
#
# @data
# Y, N, S, N
# N, Y, L, Y
# Y, Y, L, Y
# N, Y, S, Y
# N, N, S, N
# Y, Y, L, Y
# Y, Y, L, N
# N, Y, L, Y
# N, Y, L, Y
# N, Y, S, Y
# N, Y, S, Y
# N, Y, L, Y
# Y, Y, L, Y
# N, Y, L, Y
# Y, Y, L, Y
# N, Y, S, Y
# N, Y, S, Y
# Y, Y, L, Y
# Y, Y, S, Y
# N, N, S, Y

$nb->add_instance(attributes=>{morning=>'Y',html=>'N',size1=>'S'},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'N',size1=>'S'},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'L'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size1=>'S'},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'N',size1=>'S'},label=>'spam=Y');

$nb->train;

my $printedmodel =  "Model:\n" . $nb->print_model;
$printedmodel = &shorterdecimals($printedmodel);
#putfile('t/2-1.out', $printedmodel);
&compare_by_line($printedmodel, 't/2-1.out', __FILE__ , __LINE__);

#putfile('t/2-2.out', $nb->export_to_YAML());
#is($nb->export_to_YAML(), getfile('t/2-2.out'));

eval "require YAML;";
plan skip_all => "YAML module required for the remaining tests in 2.t" if $@;

$nb->export_to_YAML_file('t/tmp2');
my $nb1 = AI::NaiveBayes1->import_from_YAML_file('t/tmp2');
$printedmodel = "Model:\n" . $nb1->print_model;
$printedmodel = &shorterdecimals($printedmodel);
&compare_by_line($printedmodel, 't/2-1.out', __FILE__ , __LINE__);
#is("Model:\n" . $nb1->print_model, getfile('t/2-1.out'));

my $tmp = $nb->export_to_YAML();
my $nb2 = AI::NaiveBayes1->import_from_YAML($tmp);
$printedmodel = "Model:\n" . $nb2->print_model;
$printedmodel = &shorterdecimals($printedmodel);
&compare_by_line($printedmodel, 't/2-1.out', __FILE__ , __LINE__);
#is("Model:\n" . $nb2->print_model, getfile('t/2-1.out'));

my $p = $nb->predict(attributes=>{morning=>'Y',html=>'Y',size1=>'L'});

#putfile('t/2-3.out', YAML::Dump($p));
like($p->{'spam=N'}, qr/0\.0627/);
like($p->{'spam=Y'}, qr/0\.9372/);

t/3-1.out  view on Meta::CPAN

Model:
category | P(category)       
---+---
play=no  | 0.357142857142 
play=yes | 0.642857142857 
---+---

category | humidity | P( humidity | category ) 
---+---+---
play=no  | high     | 0.8                      
play=no  | normal   | 0.2                      
---+---+---
play=yes | high     | 0.333333333333        
play=yes | normal   | 0.666666666666        
---+---+---

category | outlook  | P( outlook | category ) 
---+---+---
play=no  | rainy    | 0.4                     
play=no  | sunny    | 0.6                     
---+---+---
play=yes | overcast | 0.444444444444       
play=yes | rainy    | 0.333333333333       
play=yes | sunny    | 0.222222222222       
---+---+---

category | temperature | P( temperature | category ) 
---+---+---
play=no  | cool        | 0.2                         
play=no  | hot         | 0.4                         
play=no  | mild        | 0.4                         
---+---+---
play=yes | cool        | 0.333333333333           
play=yes | hot         | 0.222222222222           
play=yes | mild        | 0.444444444444           
---+---+---

category | windy | P( windy | category ) 
---+---+---
play=no  | FALSE | 0.4                   
play=no  | TRUE  | 0.6                   
---+---+---
play=yes | FALSE | 0.666666666666     
play=yes | TRUE  | 0.333333333333     
---+---+---

t/3-2.out  view on Meta::CPAN

--- #YAML:1.0 !perl/AI::NaiveBayes1
attributes:
  - outlook
  - windy
  - humidity
  - temperature
attvals:
  humidity:
    - high
    - normal
  outlook:
    - sunny
    - overcast
    - rainy
  temperature:
    - hot
    - mild
    - cool
  windy:
    - FALSE
    - TRUE
labels:
  - play=no
  - play=yes
model:
  condprob:
    humidity:
      high:
        play=no: 0.8
        play=yes: 0.333333333333333
      normal:
        play=no: 0.2
        play=yes: 0.666666666666667
    outlook:
      overcast:
        play=no: 0
        play=yes: 0.444444444444444
      rainy:
        play=no: 0.4
        play=yes: 0.333333333333333
      sunny:
        play=no: 0.6
        play=yes: 0.222222222222222
    temperature:
      cool:
        play=no: 0.2
        play=yes: 0.333333333333333
      hot:
        play=no: 0.4
        play=yes: 0.222222222222222
      mild:
        play=no: 0.4
        play=yes: 0.444444444444444
    windy:
      FALSE:
        play=no: 0.4
        play=yes: 0.666666666666667
      TRUE:
        play=no: 0.6
        play=yes: 0.333333333333333
  labelprob:
    play=no: 0.357142857142857
    play=yes: 0.642857142857143
  numerical_attr: {}
numof_instances: 14
stat_attributes:
  humidity:
    high:
      play=no: 4
      play=yes: 3
    normal:
      play=no: 1
      play=yes: 6
  outlook:
    overcast:
      play=yes: 4
    rainy:
      play=no: 2
      play=yes: 3
    sunny:
      play=no: 3
      play=yes: 2
  temperature:
    cool:
      play=no: 1
      play=yes: 3
    hot:
      play=no: 2
      play=yes: 2
    mild:
      play=no: 2
      play=yes: 4
  windy:
    FALSE:
      play=no: 2
      play=yes: 6
    TRUE:
      play=no: 3
      play=yes: 3
stat_labels:
  play=no: 5
  play=yes: 9

t/3-3.out  view on Meta::CPAN

--- #YAML:1.0
play=no: 0.795417348608838
play=yes: 0.204582651391162

t/3.t  view on Meta::CPAN

#!/usr/bin/perl

use Test::More tests => 6;
use_ok("AI::NaiveBayes1");

use lib '.';
require 't/auxfunctions.pl';

my $nb = AI::NaiveBayes1->new;

# Example from Witten I. and Frank E. book "Data Mining" (the WEKA
# book), page82
#
# @relation weather.symbolic
# 
# @attribute outlook {sunny, overcast, rainy}
# @attribute temperature {hot, mild, cool}
# @attribute humidity {high, normal}
# @attribute windy {TRUE, FALSE}
# @attribute play {yes, no}
# 
# @data
# sunny,hot,high,FALSE,no
# sunny,hot,high,TRUE,no
# overcast,hot,high,FALSE,yes
# rainy,mild,high,FALSE,yes
# rainy,cool,normal,FALSE,yes
# rainy,cool,normal,TRUE,no
# overcast,cool,normal,TRUE,yes
# sunny,mild,high,FALSE,no
# sunny,cool,normal,FALSE,yes
# rainy,mild,normal,FALSE,yes
# sunny,mild,normal,TRUE,yes
# overcast,mild,high,TRUE,yes
# overcast,hot,normal,FALSE,yes
# rainy,mild,high,TRUE,no

$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>'hot',humidity=>'high',windy=>'FALSE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>'hot',humidity=>'high',windy=>'TRUE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>'hot',humidity=>'high',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>'mild',humidity=>'high',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>'cool',humidity=>'normal',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>'cool',humidity=>'normal',windy=>'TRUE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>'cool',humidity=>'normal',windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>'mild',humidity=>'high',windy=>'FALSE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>'cool',humidity=>'normal',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>'mild',humidity=>'normal',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>'mild',humidity=>'normal',windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>'mild',humidity=>'high',windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>'hot',humidity=>'normal',windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>'mild',humidity=>'high',windy=>'TRUE'},label=>'play=no');

$nb->train;

my $printedmodel =  "Model:\n" . $nb->print_model;
$printedmodel = &shorterdecimals($printedmodel);

#putfile('t/3-1.out', $printedmodel);
&compare_by_line($printedmodel, 't/3-1.out', __FILE__, __LINE__);

#putfile('t/3-2.out', $nb->export_to_YAML());
#is($nb->export_to_YAML(), getfile('t/3-2.out'));

eval "require YAML;";
plan skip_all => "YAML module required for the remaining tests in 3.t" if $@;

$nb->export_to_YAML_file('t/tmp3');
my $nb1 = AI::NaiveBayes1->import_from_YAML_file('t/tmp3');
&compare_by_line("Model:\n" . &shorterdecimals($nb1->print_model),
		 't/3-1.out', __FILE__, __LINE__);

my $tmp = $nb->export_to_YAML();
my $nb2 = AI::NaiveBayes1->import_from_YAML($tmp);
&compare_by_line("Model:\n" . &shorterdecimals($nb2->print_model),
		 't/3-1.out', __FILE__, __LINE__);

my $p = $nb->predict(attributes=>{outlook=>'sunny',temperature=>'cool',humidity=>'high',windy=>'TRUE'});

#putfile('t/3-3.out', YAML::Dump($p));
ok(abs($p->{'play=no'}  - 0.795) < 0.001);
ok(abs($p->{'play=yes'} - 0.205) < 0.001);

t/4-1.out  view on Meta::CPAN

Model:
category | P(category)       
---+---
play=no  | 0.357142857142 
play=yes | 0.642857142857 
---+---

category | humidity | P( humidity | category )                                
---+---+---
play=no  | real     | Gaussian(mean=86.2,stddev=9.731392500562)             
---+---+---
play=yes | real     | Gaussian(mean=79.111111111111,stddev=10.215728613814) 
---+---+---

category | outlook  | P( outlook | category ) 
---+---+---
play=no  | rainy    | 0.4                     
play=no  | sunny    | 0.6                     
---+---+---
play=yes | overcast | 0.444444444444       
play=yes | rainy    | 0.333333333333       
play=yes | sunny    | 0.222222222222       
---+---+---

category | temperature | P( temperature | category )                 
---+---+---
play=no  | real        | Gaussian(mean=74.6,stddev=7.893034904268) 
---+---+---
play=yes | real        | Gaussian(mean=73,stddev=6.164414002968)   
---+---+---

category | windy | P( windy | category ) 
---+---+---
play=no  | FALSE | 0.4                   
play=no  | TRUE  | 0.6                   
---+---+---
play=yes | FALSE | 0.666666666666     
play=yes | TRUE  | 0.333333333333     
---+---+---

t/4-2.out  view on Meta::CPAN

--- #YAML:1.0 !perl/AI::NaiveBayes1
attributes:
  - outlook
  - windy
  - humidity
  - temperature
attvals:
  humidity:
    - 85
    - 90
    - 86
    - 96
    - 80
    - 70
    - 65
    - 95
    - 75
    - 91
  outlook:
    - sunny
    - overcast
    - rainy
  temperature:
    - 85
    - 80
    - 83
    - 70
    - 68
    - 65
    - 64
    - 72
    - 69
    - 75
    - 81
    - 71
  windy:
    - FALSE
    - TRUE
labels:
  - play=>no
  - play=>yes
model:
  condprob:
    outlook:
      overcast:
        play=>no: 0
        play=>yes: 0.444444444444444
      rainy:
        play=>no: 0.4
        play=>yes: 0.333333333333333
      sunny:
        play=>no: 0.6
        play=>yes: 0.222222222222222
    windy:
      FALSE:
        play=>no: 0.4
        play=>yes: 0.666666666666667
      TRUE:
        play=>no: 0.6
        play=>yes: 0.333333333333333
  labelprob:
    play=>no: 0.357142857142857
    play=>yes: 0.642857142857143
  real_attr: &1
    humidity: 1
    temperature: 1
  real_stat:
    humidity:
      play=>no:
        count: 5
        mean: 86.2
        stddev: 9.73139250056229
        sum: 431
      play=>yes:
        count: 9
        mean: 79.1111111111111
        stddev: 10.2157286138146
        sum: 712
    temperature:
      play=>no:
        count: 5
        mean: 74.6
        stddev: 7.89303490426845
        sum: 373
      play=>yes:
        count: 9
        mean: 73
        stddev: 6.16441400296898
        sum: 657
numof_instances: 14
real_attr: *1
real_stat: {}
stat_attributes:
  humidity:
    65:
      play=>yes: 1
    70:
      play=>no: 1
      play=>yes: 2
    75:
      play=>yes: 1
    80:
      play=>yes: 2
    85:
      play=>no: 1
    86:
      play=>yes: 1
    90:
      play=>no: 1
      play=>yes: 1
    91:
      play=>no: 1
    95:
      play=>no: 1
    96:
      play=>yes: 1
  outlook:
    overcast:
      play=>yes: 4
    rainy:
      play=>no: 2
      play=>yes: 3
    sunny:
      play=>no: 3
      play=>yes: 2
  temperature:
    64:
      play=>yes: 1
    65:
      play=>no: 1
    68:
      play=>yes: 1
    69:
      play=>yes: 1
    70:
      play=>yes: 1
    71:
      play=>no: 1
    72:
      play=>no: 1
      play=>yes: 1
    75:
      play=>yes: 2
    80:
      play=>no: 1
    81:
      play=>yes: 1
    83:
      play=>yes: 1
    85:
      play=>no: 1
  windy:
    FALSE:
      play=>no: 2
      play=>yes: 6
    TRUE:
      play=>no: 3
      play=>yes: 3
stat_labels:
  play=>no: 5
  play=>yes: 9

t/4-3.out  view on Meta::CPAN

--- #YAML:1.0
play=no: 0.792097926094358
play=yes: 0.207902073905642

t/4.t  view on Meta::CPAN

#!/usr/bin/perl

use Test::More tests => 6;
use_ok("AI::NaiveBayes1");

use lib '.';
require 't/auxfunctions.pl';

my $nb = AI::NaiveBayes1->new;

# Example from Witten I. and Frank E. book "Data Mining" (the WEKA
# book), page 86
#
# @relation weather
# 
# @attribute outlook {sunny, overcast, rainy}
# @attribute temperature real
# @attribute humidity real
# @attribute windy {TRUE, FALSE}
# @attribute play {yes, no}
# 
# @data
# sunny,85,85,FALSE,no
# sunny,80,90,TRUE,no
# overcast,83,86,FALSE,yes
# rainy,70,96,FALSE,yes
# rainy,68,80,FALSE,yes
# rainy,65,70,TRUE,no
# overcast,64,65,TRUE,yes
# sunny,72,95,FALSE,no
# sunny,69,70,FALSE,yes
# rainy,75,80,FALSE,yes
# sunny,75,70,TRUE,yes
# overcast,72,90,TRUE,yes
# overcast,81,75,FALSE,yes
# rainy,71,91,TRUE,no
#

$nb->set_real('temperature', 'humidity');

$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>85,humidity=>85,windy=>'FALSE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>80,humidity=>90,windy=>'TRUE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>83,humidity=>86,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>70,humidity=>96,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>68,humidity=>80,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>65,humidity=>70,windy=>'TRUE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>64,humidity=>65,windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>72,humidity=>95,windy=>'FALSE'},label=>'play=no');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>69,humidity=>70,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>75,humidity=>80,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'sunny',temperature=>75,humidity=>70,windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>72,humidity=>90,windy=>'TRUE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'overcast',temperature=>81,humidity=>75,windy=>'FALSE'},label=>'play=yes');
$nb->add_instance(attributes=>{outlook=>'rainy',temperature=>71,humidity=>91,windy=>'TRUE'},label=>'play=no');

$nb->train;

my $printedmodel =  "Model:\n" . $nb->print_model;
$printedmodel = &shorterdecimals($printedmodel);

#putfile('t/4-1.out', $printedmodel);
&compare_by_line($printedmodel, 't/4-1.out', __FILE__, __LINE__);

#putfile('t/4-2.out', $nb->export_to_YAML());
#is($nb->export_to_YAML(), getfile('t/4-2.out'));

eval "require YAML;";
plan skip_all => "YAML module required for the remaining tests in 4.t" if $@;

$nb->export_to_YAML_file('t/tmp4');
my $nb1 = AI::NaiveBayes1->import_from_YAML_file('t/tmp4');
&compare_by_line("Model:\n" . &shorterdecimals($nb1->print_model),
		 't/4-1.out', __FILE__, __LINE__);

my $tmp = $nb->export_to_YAML();
my $nb2 = AI::NaiveBayes1->import_from_YAML($tmp);
&compare_by_line("Model:\n" . &shorterdecimals($nb2->print_model),
		 't/4-1.out', __FILE__, __LINE__);

my $p = $nb->predict(attributes=>{outlook=>'sunny',temperature=>66,humidity=>90,windy=>'TRUE'});

#putfile('t/4-3.out', YAML::Dump($p));
ok(abs($p->{'play=no'}  - 0.792) < 0.001);
ok(abs($p->{'play=yes'} - 0.208) < 0.001);

t/5-1.out  view on Meta::CPAN

Model:
category | P(category) 
---+---
spam=N   | 0.15        
spam=Y   | 0.85        
---+---

category | html | P( html | category ) 
---+---+---
spam=N   | N    | 0.666666666666    
spam=N   | Y    | 0.333333333333    
---+---+---
spam=Y   | N    | 0.058823529411   
spam=Y   | Y    | 0.941176470588    
---+---+---

category | morning | P( morning | category ) 
---+---+---
spam=N   | N       | 0.333333333333       
spam=N   | Y       | 0.666666666666       
---+---+---
spam=Y   | N       | 0.647058823529       
spam=Y   | Y       | 0.352941176470       
---+---+---

category | size | P( size | category )                                    
---+---+---
spam=N   | real | Gaussian(mean=1443.3333333333,stddev=1521.3077707463) 
---+---+---
spam=Y   | real | Gaussian(mean=2344.6470588235,stddev=1397.4010672126) 
---+---+---

t/5-2.out  view on Meta::CPAN

--- !!perl/hash:AI::NaiveBayes1
attribute_type:
  html: nominal
  morning: nominal
  size: real
attributes:
  - morning
  - html
  - size
attvals:
  html:
    - N
    - Y
  morning:
    - Y
    - N
  size:
    - 408
    - 3353
    - 4995
    - 1853
    - 732
    - 4017
    - 3190
    - 2345
    - 3569
    - 559
    - 1732
    - 2042
    - 3893
    - 3601
    - 2176
    - 877
    - 272
    - 2740
    - 514
    - 1321
labels:
  - spam=N
  - spam=Y
model:
  condprob:
    html:
      N:
        spam=N: 0.666666666666667
        spam=Y: 0.0588235294117647
      Y:
        spam=N: 0.333333333333333
        spam=Y: 0.941176470588235
    morning:
      N:
        spam=N: 0.333333333333333
        spam=Y: 0.647058823529412
      Y:
        spam=N: 0.666666666666667
        spam=Y: 0.352941176470588
    size: {}
  condprobe:
    html:
      N:
        spam=N: (= 2 / 3)
        spam=Y: (= 1 / 17)
      Y:
        spam=N: (= 1 / 3)
        spam=Y: (= 16 / 17)
    morning:
      N:
        spam=N: (= 1 / 3)
        spam=Y: (= 11 / 17)
      Y:
        spam=N: (= 2 / 3)
        spam=Y: (= 6 / 17)
  labelprob:
    spam=N: 0.15
    spam=Y: 0.85
  real_stat:
    size:
      spam=N:
        count: 3
        mean: 1443.33333333333
        stddev: 1521.30777074638
        sum: 4330
      spam=Y:
        count: 17
        mean: 2344.64705882353
        stddev: 1397.40106721265
        sum: 39859
numof_instances: 20
real_stat: {}
smoothing: {}
stat_attributes:
  html:
    N:
      spam=N: 2
      spam=Y: 1
    Y:
      spam=N: 1
      spam=Y: 16
  morning:
    N:
      spam=N: 1
      spam=Y: 11
    Y:
      spam=N: 2
      spam=Y: 6
  size:
    1321:
      spam=Y: 1
    1732:
      spam=Y: 1
    1853:
      spam=Y: 1
    2042:
      spam=Y: 1
    2176:
      spam=Y: 1
    2345:
      spam=Y: 1
    272:
      spam=Y: 1
    2740:
      spam=Y: 1
    3190:
      spam=N: 1
    3353:
      spam=Y: 1
    3569:
      spam=Y: 1
    3601:
      spam=Y: 1
    3893:
      spam=Y: 1
    4017:
      spam=Y: 1
    408:
      spam=N: 1
    4995:
      spam=Y: 1
    514:
      spam=Y: 1
    559:
      spam=Y: 1
    732:
      spam=N: 1
    877:
      spam=Y: 1
stat_labels:
  spam=N: 3
  spam=Y: 17

t/5-3.out  view on Meta::CPAN

--- #YAML:1.0
spam=N: 0.0430201234637135
spam=Y: 0.956979876536287

t/5.t  view on Meta::CPAN

#!/usr/bin/perl

use Test::More tests => 6;
use_ok("AI::NaiveBayes1");

use lib '.';
require 't/auxfunctions.pl';

my $nb = AI::NaiveBayes1->new;

# @relation spam
#
# @attribute morning {Y,N}
# @attribute html    {Y,N}
# @attribute size    real
# @attribute spam    {Y,N}
#
# @data
# Y, N, 408, N
# N, Y, 3353, Y
# Y, Y, 4995, Y
# N, Y, 1853, Y
# N, N, 732, N
# Y, Y, 4017, Y
# Y, Y, 3190, N
# N, Y, 2345, Y
# N, Y, 3569, Y
# N, Y, 559, Y
# N, Y, 1732, Y
# N, Y, 2042, Y
# Y, Y, 3893, Y
# N, Y, 3601, Y
# Y, Y, 2176, Y
# N, Y, 877, Y
# N, Y, 272, Y
# Y, Y, 2740, Y
# Y, Y, 514, Y
# N, N, 1321, Y

$nb->add_instance(attributes=>{morning=>'Y',html=>'N',size=>408},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>3353},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>4995},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>1853},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'N',size=>732},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>4017},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>3190},label=>'spam=N');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>2345},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>3569},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>559},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>1732},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>2042},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>3893},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>3601},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>2176},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>877},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'Y',size=>272},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>2740},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'Y',html=>'Y',size=>514},label=>'spam=Y');
$nb->add_instance(attributes=>{morning=>'N',html=>'N',size=>1321},label=>'spam=Y');

$nb->set_real('size');
$nb->train;

my $printedmodel =  "Model:\n" . $nb->print_model;
$printedmodel = &shorterdecimals($printedmodel);

#putfile('t/5-1.out', $printedmodel);
&compare_by_line($printedmodel, 't/5-1.out', __FILE__, __LINE__);

#putfile('t/5-2.out', $nb->export_to_YAML());
#is($nb->export_to_YAML(), getfile('t/5-2.out'));

eval "require YAML;";
plan skip_all => "YAML module required for the remaining tests in 5.t" if $@;

$nb->export_to_YAML_file('t/tmp5');
my $nb1 = AI::NaiveBayes1->import_from_YAML_file('t/tmp5');
&compare_by_line("Model:\n" . &shorterdecimals($nb1->print_model),
		 't/5-1.out', __FILE__, __LINE__);

my $tmp = $nb->export_to_YAML();
my $nb2 = AI::NaiveBayes1->import_from_YAML($tmp);
&compare_by_line("Model:\n" . &shorterdecimals($nb2->print_model),
		 't/5-1.out', __FILE__, __LINE__);

my $p = $nb->predict(attributes=>{morning=>'Y',html=>'Y',size=>4749});

#putfile('t/5-3.out', YAML::Dump($p));
ok(abs($p->{'spam=N'} - 0.043) < 0.001);
ok(abs($p->{'spam=Y'} - 0.957) < 0.001);

t/6-1.out  view on Meta::CPAN

Model:
category | P(category) 
---+---
S=Y      | 0.38        
S=N      | 0.62        
---+---

category | C | P( C | category ) 
---+---+---
S=Y      | N | 0.368421052631 
S=Y      | Y | 0.631578947368 
---+---+---
S=N      | N | 0.790322580645 
S=N      | Y | 0.209677419354 
---+---+---

category | F | P( F | category ) 
---+---+---
S=Y      | 0 | 0.315789473684 
S=Y      | 2 | 0.684210526315 
---+---+---
S=N      | 0 | 0.806451612903 
S=N      | 2 | 0.193548387096 
---+---+---

t/6-2.out  view on Meta::CPAN

--- #YAML:1.0
S=N: 0.580411692828273
S=Y: 0.419588307171727

t/6-3.out  view on Meta::CPAN

Model:
category | P(category) 
---+---
S=Y      | 0.38        
S=N      | 0.62        
---+---

category | C | P( C | category ) 
---+---+---
S=Y      | N | 0.368421052631 
S=Y      | Y | 0.631578947368 
---+---+---
S=N      | N | 0.790322580645 
S=N      | Y | 0.209677419354 
---+---+---

category | F    | P( F | category )                                         
---+---+---
S=Y      | real | Gaussian(mean=1.368421052631,stddev=0.935836242984)  
---+---+---
S=N      | real | Gaussian(mean=0.387096774193,stddev=0.793363503758) 
---+---+---

t/6-4.out  view on Meta::CPAN

--- #YAML:1.0
S=N: 0.338752080681736
S=Y: 0.661247919318264

 view all matches for this distribution
 view release on metacpan -  search on metacpan

( run in 2.614 seconds using v1.00-cache-2.02-grep-82fe00e-cpan-cec75d87357c )