App-CSVUtils
- Add options: --input-skip-num-lines, --input-skip-until-pattern.
[cli csv-concat]
- [doc][internal] Add more code comments on how csv_concat works.
1.035 2025-01-14 Released-By: PERLANCAR; Urgency: medium
- [cli csv-concat] Add --overlay mode & --overwrite-fields option.
1.034 2024-02-02 Released-By: PERLANCAR; Urgency: medium
- [ux] (Re)add csv2csv as alias for csv-csv.
1.033 2023-09-06 Released-By: PERLANCAR; Urgency: medium
- Add utility: csv-sort-fields-by-spec.
lib/App/CSVUtils/csv_concat.pm
col1,col2,col4,col3
1,2,,
3,4,,
,a,b,
,c,d,
,e,f,
,,,X
,,,Y
When the `--overlay` option is enabled, the result will be:
col1,col2,col4,col3
1,2,b,X
3,4,d,Y
,e,f,
When both the `--overlay` and `--overwrite-fields` options are enabled, the
result will be:
col1,col2,col4,col3
1,a,b,X
3,c,d,Y
,e,f,
Keywords: join, merge, overlay
MARKDOWN
add_args => {
overlay => {
summary => 'Whether to overlay rows from the second and subsequent CSV files onto the first',
schema => 'bool*',
description => <<'MARKDOWN',
By default, rows from the second CSV file are added after all the rows from
the first CSV file, and so on. When this option is enabled, the rows from the
second and subsequent CSV files are instead merged, row by row, into the rows of
the first (overlaid). See the utility's example for an illustration.
See also the `--overwrite-fields` option.
MARKDOWN
},
overwrite_fields => {
summary => 'Whether fields from subsequent CSV files should overwrite existing fields from previous CSV files',
schema => 'bool*',
description => <<'MARKDOWN',
When in overlay mode (`--overlay`), by default the value for a field is
retrieved from the first CSV file that has the field. With the
`--overwrite-fields` option enabled, the value is retrieved from the last CSV
file that has the field. See the utility's example for an illustration.
MARKDOWN
},
},
tags => ['category:combining', 'join', 'merge'],
reads_multiple_csv => 1,
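The difference between the default, `--overlay`, and `--overlay --overwrite-fields` modes can be modelled on in-memory data. The following is a minimal sketch, not App::CSVUtils' implementation: the names `union_fields` and `concat_csv` and the `{ fields, rows }` input shape are hypothetical, and the three inputs were chosen to reproduce the documented example output. It assumes, as the example shows, that a file which has run out of rows simply contributes nothing to later overlaid rows.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of csv-concat's modes (not App::CSVUtils code).
# Each input is { fields => [...], rows => [[...], ...] }; "" marks a missing value.

# Output header: union of all input fields, in order of first appearance.
sub union_fields {
    my (@out, %seen);
    for my $f (@_) {
        for my $name (@{ $f->{fields} }) {
            push @out, $name unless $seen{$name}++;
        }
    }
    return \@out;
}

sub concat_csv {
    my ($files, %opt) = @_;
    my $out = union_fields(@$files);
    my %idx;
    @idx{@$out} = 0 .. $#$out;
    my @rows;

    unless ($opt{overlay}) {
        # Default: every row of every file, one after another, padded to full width.
        for my $f (@$files) {
            for my $row (@{ $f->{rows} }) {
                my @combined = ("") x @$out;
                $combined[ $idx{ $f->{fields}[$_] } ] = $row->[$_]
                    for 0 .. $#{ $f->{fields} };
                push @rows, \@combined;
            }
        }
        return ($out, \@rows);
    }

    # Overlay: output row N merges row N of every file that still has one.
    my $max = 0;
    for my $f (@$files) {
        $max = @{ $f->{rows} } if @{ $f->{rows} } > $max;
    }
    for my $n (0 .. $max - 1) {
        my @combined = ("") x @$out;
        my %seen;    # fields already filled in this output row
        for my $f (@$files) {
            my $row = $f->{rows}[$n] or next;    # this file has no row N
            for my $j (0 .. $#{ $f->{fields} }) {
                my $name = $f->{fields}[$j];
                # First file with the field wins, unless overwrite_fields is set.
                next if $seen{$name}++ && !$opt{overwrite_fields};
                $combined[ $idx{$name} ] = $row->[$j];
            }
        }
        push @rows, \@combined;
    }
    return ($out, \@rows);
}

# Demo: inputs chosen to reproduce the documented example.
my @inputs = (
    { fields => [qw(col1 col2)], rows => [ [1, 2], [3, 4] ] },
    { fields => [qw(col2 col4)], rows => [ [qw(a b)], [qw(c d)], [qw(e f)] ] },
    { fields => [qw(col3)],      rows => [ ['X'], ['Y'] ] },
);
for my $opts ([], [overlay => 1], [overlay => 1, overwrite_fields => 1]) {
    my ($h, $r) = concat_csv(\@inputs, @$opts);
    print join("\n", map { join(',', @$_) } $h, @$r), "\n\n";
}
```

The `%seen` hash is reset for every output row, so "first file that has the field" is decided per row; this is why the third overlaid row takes `col2` from the second file (`,e,f,`) once the first file is exhausted.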
lib/App/CSVUtils/csv_concat.pm
my $field = $input_fields->[$j];
unless (grep {$field eq $_} @{ $r->{output_fields} }) {
push @{ $r->{output_fields} }, $field;
$r->{output_fields_idx}{$field} = $#{ $r->{output_fields} };
}
}
}
my $csv = $r->{input_parser};
if ($r->{util_args}{overlay}) {
my $overwrite_fields = $r->{util_args}{overwrite_fields};
my $output_fields_idx = $r->{output_fields_idx};
while (1) {
my $has_not_eof;
my $combined_row = [("") x @{ $r->{output_fields} }];
my %seen_fields;
for my $i (0 .. $#{ $r->{all_input_fh} }) {
my $fh = $r->{all_input_fh}[$i];
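The overlay loop above reads from all input handles in lockstep: each pass of `while (1)` takes one row from every handle and the loop ends only when no handle produced anything (the role of `$has_not_eof`). A minimal sketch of that control flow, assuming header lines have already been consumed and reading raw lines instead of parsing CSV; `lockstep_read` is a hypothetical name, not part of App::CSVUtils.

```perl
use strict;
use warnings;

# Read one line from every handle per pass; stop once all are at EOF.
# Calls $cb->(\@lines), where $lines[$i] is undef if handle $i is exhausted.
sub lockstep_read {
    my ($fhs, $cb) = @_;
    while (1) {
        my $has_not_eof;    # true if at least one handle produced a line
        my @lines;
        for my $i (0 .. $#$fhs) {
            my $line = readline($fhs->[$i]);
            if (defined $line) {
                chomp $line;
                $has_not_eof = 1;
            }
            $lines[$i] = $line;
        }
        last unless $has_not_eof;
        $cb->(\@lines);
    }
}

# Demo with in-memory filehandles of unequal length.
open my $fh1, '<', \"1,2\n3,4\n"      or die $!;
open my $fh2, '<', \"a,b\nc,d\ne,f\n" or die $!;
lockstep_read([$fh1, $fh2], sub {
    my ($lines) = @_;
    print join(' | ', map { defined($_) ? $_ : '(eof)' } @$lines), "\n";
});
```

The shorter handle keeps returning `undef` after its last line, so the loop naturally continues until the longest input is exhausted, which matches the extra trailing row in the overlay examples.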
lib/App/CSVUtils/csv_concat.pm
col1,col2,col4,col3
1,2,,
3,4,,
,a,b,
,c,d,
,e,f,
,,,X
,,,Y
When the C<--overlay> option is enabled, the result will be:
col1,col2,col4,col3
1,2,b,X
3,4,d,Y
,e,f,
When both the C<--overlay> and C<--overwrite-fields> options are enabled, the
result will be:
col1,col2,col4,col3
1,a,b,X
3,c,d,Y
,e,f,
Keywords: join, merge, overlay
This function is not exported.
Arguments ('*' denotes required arguments):
=over 4
=item * B<inplace> => I<true>
Output to the same file as input.
lib/App/CSVUtils/csv_concat.pm
=item * B<output_tsv> => I<bool>
Indicate that the output file is in TSV (tab-separated) format instead of CSV.
This is like the C<--input-tsv> option but for output instead of input.
Overridden by the C<--output-sep-char>, C<--output-quote-char>, and
C<--output-escape-char> options: if any of those options is specified,
C<--output-tsv> is ignored.
=item * B<overlay> => I<bool>
Whether to overlay rows from the second and subsequent CSV files onto the first.
By default, rows from the second CSV file are added after all the rows from
the first CSV file, and so on. When this option is enabled, the rows from the
second and subsequent CSV files are instead merged, row by row, into the rows of
the first (overlaid). See the utility's example for an illustration.
See also the C<--overwrite-fields> option.
=item * B<overwrite> => I<bool>
Whether to overwrite an existing output file.
=item * B<overwrite_fields> => I<bool>
Whether fields from subsequent CSV files should overwrite existing fields from previous CSV files.
When in overlay mode (C<--overlay>), by default the value for a field is
retrieved from the first CSV file that has the field. With the
C<--overwrite-fields> option enabled, the value is retrieved from the last CSV
file that has the field. See the utility's example for an illustration.
=back
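The precedence rule described for C<output_tsv> (it is ignored as soon as any explicit output separator, quote, or escape character is given) can be sketched as a small decision function. This is an illustration only: the argument names `output_sep_char`, `output_quote_char`, and `output_escape_char` are assumed to mirror the documented CLI options, `effective_output_sep` is a hypothetical helper, and falling back to `,` when only a quote or escape character is given is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the effective output field separator.
# Documented rule: output_tsv is ignored if any explicit output
# sep/quote/escape option is given. The ',' fallback is an assumption.
sub effective_output_sep {
    my (%args) = @_;
    my $explicit = grep { defined $args{$_} }
        qw(output_sep_char output_quote_char output_escape_char);
    if ($explicit) {
        return defined $args{output_sep_char} ? $args{output_sep_char} : ',';
    }
    return $args{output_tsv} ? "\t" : ',';
}

printf "%s\n", effective_output_sep(output_tsv => 1) eq "\t" ? 'tab' : 'other';
printf "%s\n", effective_output_sep(output_tsv => 1, output_sep_char => ';');
```

Note that `output_tsv` loses even when only a quote character is given, because the documented rule checks for the presence of any of the three options, not just the separator.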
Returns an enveloped result (an array).
First element ($status_code) is an integer containing HTTP-like status code
script/csv-concat
=head1 VERSION
This document describes version 1.036 of csv-concat (from Perl distribution App-CSVUtils), released on 2025-02-04.
=head1 SYNOPSIS
B<csv-concat> B<L<--help|/"--help, -h, -?">> (or B<L<-h|/"--help, -h, -?">>, B<L<-?|/"--help, -h, -?">>)
B<csv-concat> B<L<--version|/"--version, -v">> (or B<L<-v|/"--version, -v">>)
B<csv-concat> [B<L<--debug|/"--debug">>|B<L<--log-level|/"--log-level=s">>=I<level>|B<L<--quiet|/"--quiet">>|B<L<--trace|/"--trace">>|B<L<--verbose|/"--verbose">>] [B<L<--format|/"--format=s">>=I<name>|B<L<--json|/"--json">>] [B<L<--inplace|/"--inpla...
=head1 DESCRIPTION
For example, concatenating this CSV:
col1,col2
1,2
3,4
and:
script/csv-concat
col1,col2,col4,col3
1,2,,
3,4,,
,a,b,
,c,d,
,e,f,
,,,X
,,,Y
When the C<--overlay> option is enabled, the result will be:
col1,col2,col4,col3
1,2,b,X
3,4,d,Y
,e,f,
When both the C<--overlay> and C<--overwrite-fields> options are enabled, the
result will be:
col1,col2,col4,col3
1,a,b,X
3,c,d,Y
,e,f,
Keywords: join, merge, overlay
=head1 OPTIONS
C<*> marks required options.
=head2 Main options
=over
=item B<--input-skip-num-lines>=I<s>
script/csv-concat
This is an alternative to C<--input-skip-num-lines> and can be useful if you
have CSV files (usually generated reports, sometimes converted from a
spreadsheet) that have additional header lines or info before the CSV header
row.
With C<--input-skip-num-lines>, you skip a fixed number of lines. With this
option, rows will be skipped until the first field matches the specified regex
pattern.
=item B<--overlay>
Whether to overlay rows from the second and subsequent CSV files onto the first.
By default, rows from the second CSV file are added after all the rows from
the first CSV file, and so on. When this option is enabled, the rows from the
second and subsequent CSV files are instead merged, row by row, into the rows of
the first (overlaid). See the utility's example for an illustration.
See also the C<--overwrite-fields> option.
=item B<--overwrite-fields>
Whether fields from subsequent CSV files should overwrite existing fields from previous CSV files.
When in overlay mode (C<--overlay>), by default the value for a field is
retrieved from the first CSV file that has the field. With the
C<--overwrite-fields> option enabled, the value is retrieved from the last CSV
file that has the field. See the utility's example for an illustration.
=back
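The two ways of skipping preamble lines before the CSV header can be contrasted in a short sketch: `--input-skip-num-lines` drops a fixed count, while `--input-skip-until-pattern` drops lines until the first field matches a regex. This is a hypothetical model, not the utility's code: `skip_preamble` is an invented name, fields are split naively on commas (no quoting), giving the fixed count priority when both are supplied is an assumption, and so is returning an empty list when the pattern never matches.

```perl
use strict;
use warnings;

# Hypothetical model of the two documented skip options.
# skip_num_lines:     drop a fixed number of leading lines.
# skip_until_pattern: drop lines until the FIRST FIELD matches the regex.
# Naive comma split (no CSV quoting) - an assumption for illustration.
sub skip_preamble {
    my ($lines, %opt) = @_;
    my @l = @$lines;
    if (defined $opt{skip_num_lines}) {
        splice @l, 0, $opt{skip_num_lines};
    } elsif (defined $opt{skip_until_pattern}) {
        my $re = qr/$opt{skip_until_pattern}/;
        while (@l) {
            my ($first) = split /,/, $l[0], 2;
            $first = '' unless defined $first;    # empty line has no fields
            last if $first =~ $re;
            shift @l;
        }
    }
    return \@l;
}

my @report = ('Sales report', 'Generated by some tool', 'name,qty', 'apple,3');
print join("\n", @{ skip_preamble(\@report, skip_num_lines => 2) }), "\n";
print join("\n", @{ skip_preamble(\@report, skip_until_pattern => '^name$') }), "\n";
```

Both calls leave the same header and data lines here; the pattern form is more robust when the number of preamble lines varies between reports.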
=head2 Input options
=over
t/01-basic.t
$res = App::CSVUtils::csv_convert_to_hash::csv_convert_to_hash(input_filename=>"$dir/1.csv", rownum=>10);
is_deeply($res, [200,"OK",{f1=>undef, f2=>undef, f3=>undef}], "result 3");
};
subtest csv_concat => sub {
my ($res, $stdout);
require App::CSVUtils::csv_concat;
$stdout = capture_stdout { $res = App::CSVUtils::csv_concat::csv_concat(input_filenames=>["$dir/1.csv","$dir/2.csv","$dir/4.csv"]) };
is($stdout, qq(f1,f2,f3,F3\n1,2,3,\n4,5,6,\n7,8,9,\n1,,,\n2,,,\n3,,,\n1,3,,2\n4,6,,5\n), "output");
$stdout = capture_stdout { $res = App::CSVUtils::csv_concat::csv_concat(input_filenames=>["$dir/1.csv","$dir/2.csv","$dir/4.csv"], overlay=>1) };
is($stdout, qq(f1,f2,f3,F3\n1,2,3,2\n4,5,6,5\n7,8,9,\n), "output");
$stdout = capture_stdout { $res = App::CSVUtils::csv_concat::csv_concat(input_filenames=>["$dir/1.csv","$dir/2.csv","$dir/4.csv"], overlay=>1, overwrite_fields=>1) };
is($stdout, qq(f1,f2,f3,F3\n1,3,3,2\n4,6,6,5\n3,8,9,\n), "output");
};
subtest csv_select_fields => sub {
my ($res, $stdout);
require App::CSVUtils::csv_select_fields;
$res = App::CSVUtils::csv_select_fields::csv_select_fields(input_filename=>"$dir/1.csv", include_fields=>["f1", "f4"]);
is($res->[0], 400, "specifying unknown field -> error");