DataExtract-FixedWidth
0.04 Better pod -- almost a complete overhaul
Constructor tests
0.03 Speed fixes
Pods fixes
And, BrowserUK's heuristic algo in the constructor by opt in
0.02 Speed fix per uri's suggestion
Fixed pod
Fixed default for ->fix_overlay
0.01 Date/time
First version, released on an unsuspecting world.
lib/DataExtract/FixedWidth.pm view on Meta::CPAN
, is => 'rw'
, predicate => 'has_header_row'
);
has 'first_col_zero' => (
isa => 'Bool'
, is => 'ro'
, default => 1
);
has 'fix_overlay' => (
isa => 'Bool'
, is => 'ro'
, default => 0
);
has 'trim_whitespace' => (
isa => 'Bool'
, is => 'ro'
, default => 1
);
lib/DataExtract/FixedWidth.pm view on Meta::CPAN
) {
warn "Skipping duplicate header row\n" if $self->verbose;
return undef
}
#printf "\nData:|%s|\tHeader:|%s|", $data, $self->header_row;
my @cols = unpack ( $self->unpack_string, $data );
## If we bleed over a bit we can fix that.
if ( $self->fix_overlay ) {
foreach my $idx ( 0 .. $#cols ) {
if (
$cols[$idx] =~ m/\S+$/
&& exists $cols[$idx+1]
&& $cols[$idx+1] =~ s/^(\S+)//
) {
$cols[$idx] .= $1;
}
}
}
lib/DataExtract/FixedWidth.pm view on Meta::CPAN
This option forces the unpack string to make the first column assume the characters to the left of the header column. So, in the below example the first column also includes the first char of the row, even though the word stock begins at the second c...
CHAR NUMBERS: |1|2|3|4|5|6|7|8|9|10
HEADER ROW : | |S|T|O|C|K| |V|I|N
=item ->trim_whitespace(*1|0)
Trim the whitespace for the elements that C<-E<gt>parse($line)> outputs.
=item ->fix_overlay(1|0*)
Fixes columns that bleed into other columns by moving all non-whitespace characters that precede the first whitespace of the next column into the current column. This does not work with the heuristic algorithm because the unpack string assumes the data is not mangled.
So if ColumnA is 'foob' and ColumnB is 'ar Hello world'
* ColumnA becomes 'foobar', and ColumnB becomes 'Hello world'
=item ->null_as_undef(1|0*)
Sets every element for which C<length(element) == 0> to undef. Requires C<-E<gt>trim_whitespace>.
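The C<-E<gt>fix_overlay> behaviour described above can be tried in isolation. The following is a minimal sketch, not the module's own code: C<fix_overlay_cols> is a hypothetical helper that applies the same substitution as the loop in C<lib/DataExtract/FixedWidth.pm> to a list of already-unpacked columns, and the final trim only approximates what C<-E<gt>trim_whitespace> is documented to do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DataExtract::FixedWidth): merges text
# that bled into the next column back into the column it belongs to.
sub fix_overlay_cols {
    my @cols = @_;
    foreach my $idx ( 0 .. $#cols - 1 ) {
        # Merge only when the left column ends in non-whitespace and the
        # right column begins with non-whitespace.
        if ( $cols[$idx] =~ m/\S$/ && $cols[$idx+1] =~ s/^(\S+)// ) {
            $cols[$idx] .= $1;
        }
    }
    return @cols;
}

# The example from the documentation above.
my @cols = fix_overlay_cols( 'foob', 'ar Hello world' );

# Strip leading and trailing whitespace from each column.
s/^\s+|\s+$//g for @cols;

print join( '|', @cols ), "\n";    # prints "foobar|Hello world"
```

Because the merge works purely on whitespace boundaries, it cannot tell a bled-over word from a column that legitimately starts flush against its neighbour, which is one reason to leave the option off unless the data is known to overflow.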
t/04-Fix-Overlay.t view on Meta::CPAN
#!/usr/bin/env perl
## Quick test for fix_overlay
use strict;
use warnings;
use feature ':5.10';
use Test::More tests => 3;
use File::Spec;
use DataExtract::FixedWidth;
my $file = File::Spec->catfile( 't', 'data', 'Fix-Overlay.txt' );
open ( my $fh, '<', $file ) or die "Cannot open $file: $!";
while ( my $line = <$fh> ) {
state $fw;
if ( $. == 1 ) {
$fw = DataExtract::FixedWidth->new({
header_row => $line
, fix_overlay => 1
});
}
else {
my $arrRef = $fw->parse( $line );
my $hashRef = $fw->parse_hash( $line );
given ( $. ) {
when ( 2 ) {
ok ( $hashRef->{id} eq 1, "Testing output (->parse_hash)" );
ok ( $hashRef->{name} eq 'Amy is foobared', "Testing output (->parse_hash)" );