DataExtract-FixedWidth


Changes


0.04    Better pod -- almost a complete overhaul
        Constructor tests

0.03    Speed fixes
        Pod fixes
        Added BrowserUK's heuristic algorithm to the constructor (opt-in)

0.02    Speed fix per uri's suggestion
        Fixed pod
        Fixed default for ->fix_overlay

0.01    Date/time
        First version, released on an unsuspecting world.

lib/DataExtract/FixedWidth.pm

	, is         => 'rw'
	, predicate  => 'has_header_row'
);

has 'first_col_zero' => (
	isa       => 'Bool'
	, is      => 'ro'
	, default => 1
);

has 'fix_overlay' => (
	isa       => 'Bool'
	, is      => 'ro'
	, default => 0
);

has 'trim_whitespace' => (
	isa       => 'Bool'
	, is      => 'ro'
	, default => 1
);

lib/DataExtract/FixedWidth.pm

	) {
		warn "Skipping duplicate header row\n" if $self->verbose;
		return undef
	}

	#printf "\nData:|%s|\tHeader:|%s|", $data, $self->header_row;

	my @cols = unpack ( $self->unpack_string, $data );

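	## If we bleed over a bit we can fix that.
	if ( $self->fix_overlay ) {
		foreach my $idx ( 0 .. $#cols ) {
			## The column ends in non-whitespace and the next column
			## starts with non-whitespace: move that leading run back.
			if (
				$cols[$idx] =~ m/\S+$/
				&& exists $cols[$idx+1]
				&& $cols[$idx+1] =~ s/^(\S+)//
			) {
				$cols[$idx] .= $1;
			}
		}
	}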

lib/DataExtract/FixedWidth.pm


This option forces the unpack string to make the first column absorb the characters to the left of the first header column. So, in the example below, the first column also includes the first char of the row, even though the word stock begins at the second c...

	CHAR NUMBERS: |1|2|3|4|5|6|7|8|9|10
	HEADER ROW  : | |S|T|O|C|K| |V|I|N
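
The effect on the unpack string can be sketched as follows. This is a minimal illustration, not the module's actual code: C<unpack_string_for> is a hypothetical helper, and the choice of the C<A> and C<x> template letters is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DataExtract::FixedWidth's API):
# derive an unpack template from the word positions in a header row.
sub unpack_string_for {
	my ( $header, $first_col_zero ) = @_;

	# Record the offset at which each header word starts.
	my @starts;
	while ( $header =~ m/(\S+)/g ) {
		push @starts, $-[1];
	}
	return '' unless @starts;

	my $template = '';
	if ( $first_col_zero ) {
		# The first column also claims the characters to its left.
		$starts[0] = 0;
	}
	elsif ( $starts[0] > 0 ) {
		# Otherwise skip the bytes that precede the first header word.
		$template = "x$starts[0]";
	}

	# Every column but the last runs up to the start of the next one.
	foreach my $idx ( 0 .. $#starts - 1 ) {
		$template .= 'A' . ( $starts[$idx+1] - $starts[$idx] );
	}

	# The last column takes the rest of the line.
	return $template . 'A*';
}

print unpack_string_for( ' STOCK VIN', 1 ), "\n";   # A7A*
print unpack_string_for( ' STOCK VIN', 0 ), "\n";   # x1A6A*
```

With the header row above, the first template starts the first column at char 1 of the row, while the second skips it.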

=item ->trim_whitespace(*1|0)

Trim the whitespace for the elements that C<-E<gt>parse($line)> outputs.

=item ->fix_overlay(1|0*)

Fixes columns that bleed into other columns by moving all non-whitespace characters that precede the first whitespace of the next column back into the current column. This does not work with the heuristic algorithm, because the unpack string assumes the data is not mangled.

So if ColumnA is 'foob' and ColumnB is 'ar Hello world':

* ColumnA becomes 'foobar', and ColumnB becomes 'Hello world'
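
The same repair can be tried outside the module. The sketch below uses a hypothetical standalone sub, C<fix_overlay_cols>, that mirrors the loop in C<parse>; it assumes the columns still carry their padding when the repair runs, and it trims afterwards the way C<-E<gt>trim_whitespace> is described as doing.

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not the module's method.
# Assumes each column still has its padding, so a column that ends in
# non-whitespace really did bleed into its right-hand neighbour.
sub fix_overlay_cols {
	my @cols = @_;

	foreach my $idx ( 0 .. $#cols - 1 ) {
		# Move the leading non-whitespace run of the next column back
		# onto the current one.
		if ( $cols[$idx] =~ m/\S$/ && $cols[$idx+1] =~ s/^(\S+)// ) {
			$cols[$idx] .= $1;
		}
	}

	return @cols;
}

my @fixed = fix_overlay_cols( 'foob', 'ar Hello world' );

# Strip leading and trailing whitespace from every element.
s/^\s+|\s+$//g for @fixed;

print join( '|', @fixed ), "\n";   # foobar|Hello world
```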

=item ->null_as_undef(1|0*)

Sets to undef every element for which C<length(element) == 0>. Requires C<-E<gt>trim_whitespace>.
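
Why trimming is required can be seen in a small sketch. C<clean_cols> is a hypothetical helper written for this example, not a method of the module; it only illustrates the order of the two steps.

```perl
use strict;
use warnings;

# Hypothetical helper: trim each element, then optionally turn
# zero-length strings into undef.
sub clean_cols {
	my ( $cols, %opt ) = @_;
	my @out = @$cols;

	if ( $opt{trim_whitespace} ) {
		s/^\s+|\s+$//g for @out;
	}

	if ( $opt{null_as_undef} ) {
		# '0' has length 1, so it is kept as a value.
		@out = map { length($_) ? $_ : undef } @out;
	}

	return \@out;
}

my $row = [ ' 1 ', '    ', 'Amy' ];

# The blank column becomes undef only once it has been trimmed to ''.
my $trimmed   = clean_cols( $row, trim_whitespace => 1, null_as_undef => 1 );
my $untrimmed = clean_cols( $row, null_as_undef => 1 );

print defined $trimmed->[1]   ? "defined\n" : "undef\n";   # undef
print defined $untrimmed->[1] ? "defined\n" : "undef\n";   # defined
```

Without trimming, a column of spaces has a non-zero length and is left as it is.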

t/04-Fix-Overlay.t

#!/usr/bin/env perl
## Quick test for fix_overlay
use strict;
use warnings;
use feature ':5.10';

use Test::More tests => 3;
use File::Spec;
use DataExtract::FixedWidth;

my $file = File::Spec->catfile( 't', 'data', 'Fix-Overlay.txt' );
open ( my $fh, '<', $file ) or die "Cannot open $file: $!";

while ( my $line = <$fh> ) {
	state $fw;

	if ( $. == 1 ) {
		$fw = DataExtract::FixedWidth->new({
			header_row => $line
			, fix_overlay => 1
		});
	}
	else {
		my $arrRef = $fw->parse( $line );
		my $hashRef = $fw->parse_hash( $line );

		given ( $. ) {
			when ( 2 ) {
				ok ( $hashRef->{id} eq 1, "Testing output (->parse_hash)" );
				ok ( $hashRef->{name} eq 'Amy is foobared', "Testing output (->parse_hash)" );


