App-Jawk
#########################################################
# convert_args_to_fields( $args_ref, $numparts_in_line )
# args come in 1-based (or negative), and are returned 0-based
# handles ranges like 1..3 or 3..1
# as well as negative args alone or in ranges,
# like -2..1, or like -1..1
# note: could probably be optimized 20-30% by doing this all in one giant map{}
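# Illustrative calls (worked by hand from the code below, for a
# hypothetical line with 5 fields; duplicates are kept):
#   convert_args_to_fields( ['1', '3..', '-1'], 5 )  # returns (0, 2, 3, 4, 4)
#   convert_args_to_fields( ['-2..1'], 5 )           # returns (3, 2, 1, 0)
#   convert_args_to_fields( ['..2'], 5 )             # returns (0, 1)
#   convert_args_to_fields( ['7'], 5 )               # returns (), since there is no 7th field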
sub convert_args_to_fields {
    my ($args_ref, $numparts) = @_;
    return () unless $numparts;
    print STDERR "$prog: 0: initially (@$args_ref)\n" if $debug;
    my $numreg = '-?\d+'; # our 'number regex'
    # convert the user's field specs to indexes into this line's fields:
    # parse the command-line arguments for ints and for ranges like a..b, a.., and ..b,
    # skip indexes that we don't have a value for, and
    # shift each int down by one (1-based to 0-based).
    # (This was originally more broken up for clarity, but we optimized (and shortened)
    # it to this.)
    my @ret = map { $_ - 1 }
              grep { $_ <= $numparts && $_ >= 1 }
              map {  /^$numreg$/                ? (_converted_version($_, $numparts))   : # ** a single int
                     /^($numreg)\.\.($numreg)$/ ? (get_range($1, $2, $numparts))        : # ** an int range
                     /^($numreg)\.\.$/          ? (get_range($1, $numparts, $numparts)) : # ** an integer and up
                     /^\.\.($numreg)$/          ? (get_range(1, $1, $numparts))         : # ** up to an integer
                     die "$prog: Don't know how to handle field '$_'\n" . Usage();
                  }
              @$args_ref;
    print STDERR "$prog: 1: modified to (@ret)\n" if $debug;
    return @ret;
}
############################################
# invert_fields( [ 1, 2], 3 );# ( [activated], num_fields )
# returns the complement of the activated fields: every index in 0 .. num_fields-1 that is not in the list.
# all indexes have been converted to normal 0-based perl style.
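# Illustrative calls (worked by hand from the code below):
#   invert_fields( [1, 2], 5 )  # returns (0, 3, 4)
#   invert_fields( [1, 2], 3 )  # returns (0)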
sub invert_fields {
    my ($fieldsref, $numparts) = @_;
    my %fieldshash = map { ( $_, undef ) } @$fieldsref; # keys are fields for exclusion
    my @inversefields = grep { !exists($fieldshash{$_}) } ( 0 .. $numparts-1); # this is correct
    return @inversefields; # all the other fields
}
############################################
# get_range( $a, $b, $numparts )
# where $a and $b are 1-based (or negative) indexes as specified by the user.
# Converts $a and $b to positive versions, clamps them with _restrict_to_possible(),
# and returns all the ints from $a to $b, inclusive.
# Handles descending lists, unlike perl's '..' :)
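# Illustrative calls (worked by hand from the code below, for a
# hypothetical 5-field line; the results are still 1-based):
#   get_range( 2, 4, 5 )   # returns (2, 3, 4)
#   get_range( 4, 2, 5 )   # returns (4, 3, 2)
#   get_range( -2, 1, 5 )  # returns (4, 3, 2, 1)
#   get_range( 1, 9, 5 )   # returns (1, 2, 3, 4, 5); 9 is clamped to 5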
sub get_range {
    my ($a, $b, $numparts) = @_;
    $a = _restrict_to_possible( _converted_version($a, $numparts), $numparts);
    $b = _restrict_to_possible( _converted_version($b, $numparts), $numparts);
    if ( $a <= $b ) { return ($a..$b); }
    return (reverse($b..$a)); # they asked for a list in reverse. make it ascending, and reverse it.
    # or as Missy Elliott would say - flip it and reverse it.
}
############################################
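# $num = _converted_version( $num, $numparts )
# converts a negative index to the equivalent positive 1-based index,
# counting back from the end of the line (-1 becomes $numparts, the last field).
# Positive indexes are returned unchanged. Out-of-range results (above $numparts,
# or below 1 when a negative index reaches past the first field) are not
# filtered here; the callers grep or clamp them.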
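# Illustrative calls (worked by hand from the code below, for a
# hypothetical 7-field line):
#   _converted_version( -1, 7 )  # returns 7 (the last field)
#   _converted_version( -7, 7 )  # returns 1 (the first field)
#   _converted_version(  3, 7 )  # returns 3 (unchanged)
#   _converted_version(  9, 7 )  # returns 9 (out of range; left for the caller)
#   _converted_version( -9, 7 )  # returns -1 (out of range; left for the caller)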
sub _converted_version {
    my ($num, $numparts) = @_;
    if ($num < 0) {
        $num = $numparts + $num + 1;
    }
    return $num;
}
############################################
# $num = _restrict_to_possible( $index, $numparts );
# clamps an index to the possible positive indexes for this row:
# negative values become 1 (or 0 for an empty row), and values above
# $numparts become $numparts.
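# Illustrative calls (worked by hand from the code below):
#   _restrict_to_possible( -1, 7 )  # returns 1
#   _restrict_to_possible(  9, 7 )  # returns 7
#   _restrict_to_possible(  4, 7 )  # returns 4
#   _restrict_to_possible( -1, 0 )  # returns 0 (empty row)
#   _restrict_to_possible(  0, 7 )  # returns 0 (not clamped; convert_args_to_fields() drops it later)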
sub _restrict_to_possible {
    my ($num, $numparts) = @_;
    if ($num < 0) { $num = $numparts ? 1 : 0; }
    if ($num > $numparts) { $num = $numparts; }
    return $num;
}
############################################
# pod docs moved back from lib/App/Jawk.pm
1;
=pod

=head1 NAME

jawk -- like awk, but post-modern and perly. AKA, Josh's Awk.

=head1 SYNOPSIS

  jawk [-x] [-e 'code'] [-d delim] fieldspec [fieldspec...] [-- (FILES..)]

If you haven't seen awk, then jawk can be described as a flexible tool for extracting columns
of data from text files.

If you've seen 'awk', then we can describe jawk as a replacement for statements like

  awk '{print $N}'

that supports ranges, indexing columns by negative numbers, a Perl mode, and more.

=head1 DESCRIPTION

jawk 1 is somewhat like awk '{print $1}'. Let's start with a fairly complex example.
Suppose you have a file called 'users.txt' with lines of data in this format:

  Bob Elmer, 2716 Fremont Blvd, New York, NY, 12344, ID:91818, CanastaRating:3.1415
  Elmer Fudd, 1 Bunny Hill Drive, Tarrytown, NY, 87654, ID:1, CanastaRating:123456789

This statement would pull out the 1st and the 3rd through last columns, using ', ' as an