Flat-Profile
lib/Flat/Profile.pm
        or croak "Failed to open '$path' for reading: $!";

    return Flat::Profile::Iterator->new(
        fh         => $fh,
        delimiter  => $delimiter,
        has_header => $has_header,
    );
}
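
# Returns the current UTC time as an ISO 8601 timestamp,
# for example 2024-01-31T12:34:56Z.
sub _format_utc_timestamp {
    # gmtime returns (sec, min, hour, mday, mon, year, ...);
    # mon is 0-based and year is an offset from 1900.
    my @t    = gmtime(time());
    my $year = $t[5] + 1900;
    my $mon  = $t[4] + 1;
    my $day  = $t[3];
    my $hour = $t[2];
    my $min  = $t[1];
    my $sec  = $t[0];
    return sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year, $mon, $day, $hour, $min, $sec);
}
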
1;
__END__
=pod
=head1 NAME
Flat::Profile - Streaming-first profiling for CSV/TSV flat files
=head1 SYNOPSIS
    use Flat::Profile;

    my $p = Flat::Profile->new();

    my $it = $p->iter_rows(
        path       => "data.csv",
        has_header => 1,
        delimiter  => ",",
        encoding   => "UTF-8",
    );

    while (my $row = $it->next_row) {
        # $row is an arrayref
    }

    my $report = $p->profile_file(
        path        => "data.csv",
        has_header  => 1,
        delimiter   => ",",
        null_empty  => 1,
        null_tokens => ["NULL", "NA"],
        example_cap => 10,
        max_errors  => 1000,
    );
=head1 DESCRIPTION
Flat::Profile is part of the Flat::* series. It profiles CSV/TSV inputs in a
single streaming pass and is aimed at practical ETL and legacy data workflows.
Design goals:
=over 4
=item *
Streaming-first (single pass, predictable memory)
=item *
Practical diagnostics (ragged rows, null policy, examples)
=item *
Stable report format intended to feed Flat::Schema / Flat::Validate
=back
=head1 METHODS
=head2 new
my $p = Flat::Profile->new();
Constructor. Takes named arguments (currently reserved for future configuration).
=head2 iter_rows
my $it = $p->iter_rows(%args);
Returns an iterator object (L<Flat::Profile::Iterator>).
Required named arguments:
=over 4
=item * path
=back
Common named arguments:
=over 4
=item * has_header (boolean)
=item * delimiter ("," or "\t")
=item * encoding (default "UTF-8")
=back
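
To illustrate the pull-style iteration described above, here is a minimal sketch of a row iterator built as a closure. It is not the implementation of L<Flat::Profile::Iterator>: C<make_row_iterator> is a hypothetical name, the naive C<split> ignores CSV quoting and escaping, and C<has_header> handling is left out because its exact behavior is not described here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a row iterator; NOT Flat::Profile::Iterator itself.
# Assumption: fields never contain the delimiter, quotes, or embedded newlines.
sub make_row_iterator {
    my (%args) = @_;
    my $fh        = $args{fh};
    my $delimiter = defined $args{delimiter} ? $args{delimiter} : ",";

    return sub {
        my $line = <$fh>;
        return if !defined $line;       # end of input
        $line =~ s/\r?\n\z//;           # strip LF or CRLF line ending
        # A limit of -1 keeps trailing empty fields instead of dropping them.
        return [ split /\Q$delimiter\E/, $line, -1 ];
    };
}

# Read from an in-memory filehandle so the sketch needs no file on disk.
my $data = "id\tname\n1\tAda\n2\t\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $next_row = make_row_iterator(fh => $fh, delimiter => "\t");
my @rows;
while (my $row = $next_row->()) {
    push @rows, $row;                   # each $row is an arrayref of fields
}
```

Only one line is read per call, so memory use depends on the longest line rather than on the size of the file.
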
=head2 profile_file
my $report = $p->profile_file(%args);
Profiles a CSV/TSV file in a streaming pass and returns a hashref report.
Key named arguments include:
=over 4
=item * path (required)
=item * has_header
=item * delimiter
=item * encoding
=item * null_empty (default true)
=item * null_tokens (arrayref; default empty)
=item * example_cap (default 10)
=item * max_errors (threshold stop; default 1000)
=back
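
As a rough illustration of how a cap such as C<example_cap> keeps memory bounded, the sketch below records at most C<$cap> distinct example values per column. The helper name C<add_example> and the choice to keep only distinct values are assumptions made for this example; the module's actual bookkeeping may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Flat::Profile's documented API):
# remember up to $cap distinct example values for column index $col.
sub add_example {
    my ($examples, $col, $value, $cap) = @_;
    my $kept = $examples->[$col] ||= [];
    return if @$kept >= $cap;                  # cap reached: ignore further values
    return if grep { $_ eq $value } @$kept;    # value already recorded
    push @$kept, $value;
    return;
}

my @examples;
my $cap = 2;
for my $row (["a", "x"], ["b", "x"], ["c", "y"], ["a", "z"]) {
    add_example(\@examples, $_, $row->[$_], $cap) for 0 .. $#$row;
}
```

Because the duplicate check only scans the kept list, the per-column state never grows beyond C<$cap> entries, no matter how many rows are read.
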
=head1 NULL SEMANTICS
By default, empty string is treated as null:
null_empty => 1 # default
To treat empty string as a value:
null_empty => 0
You can also treat specific exact tokens as null:
null_tokens => ["NULL", "N/A"]
Notes:
=over 4
=item *
Token matching is exact (no trimming, case-sensitive) in v1.
=item *
undef is always treated as null.
=back
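
The rules above can be sketched as a small predicate. C<make_null_check> is a hypothetical helper written for this illustration and is not part of the Flat::Profile API. It assumes the documented defaults (C<null_empty> true, no C<null_tokens>).

```perl
use strict;
use warnings;

# Hypothetical helper: builds a predicate implementing the null policy
# described above. Not taken from the Flat::Profile source.
sub make_null_check {
    my (%opt) = @_;
    my $null_empty = exists $opt{null_empty} ? $opt{null_empty} : 1;   # default true
    my %is_token   = map { $_ => 1 } @{ $opt{null_tokens} || [] };     # default empty

    return sub {
        my ($value) = @_;
        return 1 if !defined $value;                # undef is always null
        return 1 if $null_empty && $value eq '';    # empty string, when enabled
        return 1 if $is_token{$value};              # exact match: no trimming, case-sensitive
        return 0;
    };
}

my $is_null = make_null_check(null_tokens => ["NULL", "N/A"]);
my @results = map { $is_null->($_) } (undef, '', 'NULL', 'null', ' NULL', 'N/A', '0');

# With null_empty disabled, the empty string counts as a value.
my $keeps_empty = make_null_check(null_empty => 0)->('');
```

In this sketch C<'null'> and C<' NULL'> are not null, because matching is case-sensitive and does no trimming, and C<'0'> is an ordinary value.
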
=head1 RAGGED ROWS
Flat::Profile tracks width mismatches relative to an expected width:
=over 4
=item *
If has_header is true, expected width is the header width.