Graphics-Skullplot

lib/Graphics/Skullplot/ClassifyColumns.pm

=cut 

# TODO revise these before shipping
our $VERSION = '0.02';
my $DEBUG = 1;

=head1 SYNOPSIS

  use Graphics::Skullplot::ClassifyColumns;

  my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );  
  my $plot_cols = 
    $cc->classify_columns_simple( { indie_count => $indie_count, } );

=head1 DESCRIPTION

Graphics::Skullplot::ClassifyColumns is a stripped-down version 
of Data::Classify, an old experimental module I was developing.
I expect to go back to that project and develop a more elaborate 
system of plug-ins to target different kinds of databases and so on,
most likely under the name Table::TypeInference.

This particular module just needs a "classify_columns_simple" routine 
that works well enough to figure out how to plot some data via 
ggplot2 in R (i.e. the "Graphics::Skullplot" project).

=cut

use 5.10.0;
use strict;
use warnings;
use Carp;
use Data::Dumper;

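use Scalar::Classify qw();

# The 'has' declarations below need an object system and type constraints.
# Assumed here: Moo, with types from MooX::Types::MooseLike::Base.
use Moo;
use MooX::Types::MooseLike::Base qw( ArrayRef HashRef );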

=over

=item new

Creates a new Graphics::Skullplot::ClassifyColumns object.

Takes a hashref as an argument, with named fields identical
to the names of the object attributes. These attributes are:

=over

=item data

A required field: the data as an array of array references 
(one per row), with the column headers in the first row.

=back
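
As an illustration of the expected shape of C<data>, here is a minimal
sketch. The column names and values are made up for the example and do
not come from the module. It shows the header row being separated from
the data rows:

```perl
  use strict;
  use warnings;

  # Hypothetical sample data: header in the first row, then one row per record.
  my $data = [
    [ 'date',       'region', 'sales' ],
    [ '2016-01-01', 'east',   120     ],
    [ '2016-01-01', 'west',   95      ],
    [ '2016-01-02', 'east',   130     ],
  ];

  # Copy the list of rows so that $data itself is left untouched,
  # then peel off the header row.
  my @rows   = @{ $data };
  my @header = @{ shift @rows };

  printf "%d columns, %d data rows\n", scalar @header, scalar @rows;
```

Working on a copy of the row list keeps the header in place in the
original structure, so the same C<data> can be passed around afterwards.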

=cut

# Example attribute:
# has is_loop => ( is => 'rw', isa => Int, default => 0 );
# Tempted to use Mouse over Moo so I can do my usual "isa => 'Int'"

has data => ( is => 'ro', isa => ArrayRef );

has patterns => ( is => 'ro', isa => HashRef, builder => "define_regxeps" );

# $DB::single = 1;


=item classify_columns_simple

Note: here "simple" might be thought of as "stub":
This does the simplest possible categorization using only 
a single numeric hint for the number of independent fields.

The presumption here is that the incoming data is organized like 
the output of a typical SQL "group by" select: the x-axis in the 
first column, a number of columns of dependent data at the
end, and (possibly) a certain number of categorical variables
(ones with a small number of allowed values) in between.

If both the dependent_requested and independent_requested options 
are given, no guessing is done: those two fields are used directly 
and gb_cats is left empty.

This returns a hash reference indicating how the different columns 
should be handled in the plotting stage. The keys are:

  indie_x       the independent (x-axis) field
  y             the dependent field, but just for when there's only one dependent
  gb_cats       the categorical fields
  dependents_y  the dependent fields

Example usage:

  my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );  
  my $opt = { indie_count => 1, };
  my $plot_cols_href = 
    $cc->classify_columns_simple( $opt ); 
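
The column layout described above can be sketched as a standalone
routine. This is one plausible reading of the description, not the
module's actual code: the helper name C<sketch_classify_by_position> is
hypothetical, and it assumes that the first C<indie_count> columns are
independent (the first being the x-axis, any others categorical) and that
every remaining column is dependent.

```perl
  use strict;
  use warnings;

  # Hypothetical helper, not part of the module: split a header row by position.
  # Assumption: the first $indie_count columns are independent (the first is x,
  # any others are categorical), and every remaining column is dependent.
  sub sketch_classify_by_position {
    my ( $header, $indie_count ) = @_;
    $indie_count //= 1;

    my @fields  = @{ $header };                     # copy, so the caller's header is unchanged
    my @indies  = splice @fields, 0, $indie_count;  # leading independent columns
    my $x       = shift @indies;                    # the first column is the x-axis
    my @gb_cats = @indies;                          # in-between categorical columns

    my %field_data =
      ( indie_x      => $x,
        gb_cats      => [ @gb_cats ],
        dependents_y => [ @fields ],
      );

    # 'y' is only set when there is exactly one dependent column
    $field_data{ y } = $fields[0] if @fields == 1;

    return \%field_data;
  }

  my $cols = sketch_classify_by_position( [ qw( date region sales ) ], 2 );
  print join( ', ', $cols->{ indie_x }, @{ $cols->{ gb_cats } }, $cols->{ y } ), "\n";
```

With a three-column header and an independent count of 2, the sketch
treats the first column as the x-axis, the second as a categorical
field, and the third as the single dependent field.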

=cut

sub classify_columns_simple {
  my $self = shift;

  my $opt          = shift;
  my $indie_count  = $opt->{ indie_count } // 1;

  my %field_data; # return values

  my $dependent_requested   = $opt->{ dependent_requested };
  my $independent_requested = $opt->{ independent_requested };

  my $data = $self->data;
  my @header = @{ $data->[0] };

  # when we're told what to do there's no need to guess
  if ( $dependent_requested && $independent_requested ) {

    # TODO might be better to just use the empty set
    # my @gb_cats = grep{ !/^$dependent_requested$/ } grep{ !/^$independent_requested$/ } @header;
    my @gb_cats = ();

    %field_data =
      ( indie_x       => $independent_requested,  
        y             => $dependent_requested,   # redundant with dependents_y
        gb_cats       => [ @gb_cats ],
        dependents_y  => [ $dependent_requested ],
      );


