Flat-Profile

lib/Flat/Profile.pm

        or croak "Failed to open '$path' for reading: $!";

    return Flat::Profile::Iterator->new(
        fh         => $fh,
        delimiter  => $delimiter,
        has_header => $has_header,
    );
}

sub _format_utc_timestamp {
    my @t = gmtime(time());
    my $year = $t[5] + 1900;
    my $mon  = $t[4] + 1;
    my $day  = $t[3];
    my $hour = $t[2];
    my $min  = $t[1];
    my $sec  = $t[0];

    return sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year, $mon, $day, $hour, $min, $sec);
}

1;

__END__

=pod

=head1 NAME

Flat::Profile - Streaming-first profiling for CSV/TSV flat files

=head1 SYNOPSIS

    use Flat::Profile;

    my $p = Flat::Profile->new();

    my $it = $p->iter_rows(
        path       => "data.csv",
        has_header => 1,
        delimiter  => ",",
        encoding   => "UTF-8",
    );

    while (my $row = $it->next_row) {
        # $row is an arrayref
    }

    my $report = $p->profile_file(
        path        => "data.csv",
        has_header  => 1,
        delimiter   => ",",
        null_empty  => 1,
        null_tokens => ["NULL", "NA"],
        example_cap => 10,
        max_errors  => 1000,
    );

=head1 DESCRIPTION

Flat::Profile is part of the Flat::* series. It profiles CSV/TSV inputs in a
streaming fashion and is aimed at practical ETL and legacy data workflows.

Design goals:

=over 4

=item *

Streaming-first (single pass, predictable memory)

=item *

Practical diagnostics (ragged rows, null policy, examples)

=item *

Stable report format intended to feed Flat::Schema / Flat::Validate

=back

=head1 METHODS

=head2 new

    my $p = Flat::Profile->new();

Constructor. Accepts named arguments, but none are used yet; they are reserved
for future configuration.

=head2 iter_rows

    my $it = $p->iter_rows(%args);

Returns an iterator object (L<Flat::Profile::Iterator>).

Required named arguments:

=over 4

=item * path

=back

Common named arguments:

=over 4

=item * has_header (boolean)

=item * delimiter ("," or "\t")

=item * encoding (default "UTF-8")

=back

=head2 profile_file

    my $report = $p->profile_file(%args);

Profiles a CSV/TSV file in a streaming pass and returns a hashref report.

Key named arguments include:

=over 4

=item * path (required)

=item * has_header

=item * delimiter

=item * encoding

=item * null_empty (default true)

=item * null_tokens (arrayref; default empty)

=item * example_cap (default 10)

=item * max_errors (stop threshold for errors; default 1000)

=back

=head1 NULL SEMANTICS

By default, empty string is treated as null:

    null_empty => 1   # default

To treat empty string as a value:

    null_empty => 0

You can also treat specific exact tokens as null:

    null_tokens => ["NULL", "N/A"]

Notes:

=over 4

=item *

Token matching is exact (no trimming, case-sensitive) in v1.

=item *

undef is always treated as null.

=back
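
The rules above can be sketched as a small predicate. This is only an
illustration of the documented semantics, not the module's actual
implementation: C<make_null_checker> is a hypothetical helper name, and the
only inputs it assumes are the C<null_empty> and C<null_tokens> arguments
described in this section.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Flat::Profile's documented API).
# Builds a closure that applies the null policy described above.
sub make_null_checker {
    my (%args) = @_;

    # null_empty defaults to true when the caller does not pass it.
    my $null_empty = exists $args{null_empty} ? $args{null_empty} : 1;

    # Exact-match lookup table: no trimming, case-sensitive.
    my %is_token = map { $_ => 1 } @{ $args{null_tokens} || [] };

    return sub {
        my ($value) = @_;
        return 1 if !defined $value;               # undef is always null
        return 1 if $null_empty && $value eq '';   # empty string, if enabled
        return $is_token{$value} ? 1 : 0;          # exact token match only
    };
}

my $is_null = make_null_checker(null_tokens => ["NULL", "N/A"]);
```

With this sketch, C<"NULL"> and C<"N/A"> count as null, while C<"null"> and
C<" NULL"> do not, because matching is exact.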

=head1 RAGGED ROWS

Flat::Profile tracks width mismatches relative to an expected width:

=over 4

=item *

If has_header is true, expected width is the header width.


