App-CSVUtils
view release on metacpan or search on metacpan
lib/App/CSVUtils/csv2paras.pm view on Meta::CPAN
package App::CSVUtils::csv2paras;
use 5.010001;
use strict;
use warnings;
our $AUTHORITY = 'cpan:PERLANCAR'; # AUTHORITY
our $DATE = '2025-02-04'; # DATE
our $DIST = 'App-CSVUtils'; # DIST
our $VERSION = '1.036'; # VERSION
use App::CSVUtils qw(gen_csv_util);
use String::Pad qw(pad);
sub _escape_value {
my $val = shift;
$val =~ s/(\\|\n)/$1 eq "\\" ? "\\\\" : "\\n\n "/eg;
$val;
}
sub _escape_header {
my $val = shift;
$val =~ s/(\\|\n|:)/$1 eq "\\" ? "\\\\" : $1 eq ":" ? "\\:" : "\\n\n "/eg;
$val;
}
gen_csv_util(
name => 'csv2paras',
summary => 'Convert CSV to paragraphs',
description => <<'_',
This utility converts CSV format like this:
name,email,phone,notes
bill,bill@example.com,555-1236,+
lisa,lisa@example.com,555-1235,from work
jimmy,jimmy@example.com,555-1237,
into paragraphs format like this, which resembles (but not strictly follows)
email headers (RFC-822) or internet message headers (RFC-5322):
name: bill
email: bill@example.com
phone: 555-1236
notes: +
name: lisa
email: lisa@example.com
phone: 555-1235
notes: from work
name: jimmy
email: jimmy@example.com
phone: 555-1237
notes:
Why display in this format? It might be more visually readable or diff-able
especially if there are a lot of fields and/or there are long values.
If a CSV value contains newline, it will escaped "\n", e.g.:
# CSV
name,email,phone,notes
beth,beth@example.com,555-1231,"Has no last name
Might be adopted sometime by Jimmy"
matthew,matthew@example.com,555-1239,"Quit
or fired?"
# paragraph
name: beth
email: beth@example.com
phone: 555-1231
notes: Has no last name\nMight be adopted sometime by Jimmy
name: matthew
email: matthew@example.com
phone: 555-1239
notes: Quit\n\n or fired?
If a CSV value contains literal "\" (backslash) it will be escaped as "\\".
Long lines are also by default folded at 78 columns (but you can customize with
the `--width` option); if a line is folded a literal backslash is added to the
end of each physical line and the next line will be indented by two spaces:
notes: This is a long note. This is a long note. This is a long note. This is
a long note. This is a long note.
A long word is also folded and the next line will be indented by one space:
notes: Thisisalongwordthisisalongwordthisisalongwordthisisalongwordthisisalongw
ord
Newline and backslash are also escaped in header; additionally a literal ":"
(colon) is escaped into "\:".
There is option to skip displaying empty fields (`--hide-empty-values`) and to
align the ":" header separator.
Keywords: paragraphs, cards, pages, headers
_
add_args => {
width => {
summary => 'The width at which to fold long lines, -1 means to never fold',
schema => ['int*', 'clset|'=>[{is=>-1, "is.err_msg"=>"Must be >0 or -1"}, {min=>1}]],
default => 78,
},
hide_empty_values => {
summary => 'Whether to skip showing empty values',
schema => 'bool*',
},
align => {
summary => 'Whether to align header separator across lines',
schema => 'bool*',
description => <<'_',
Note that if you want to convert the paragraphs back to CSV later using
<prog:paras2csv>, the padding spaces added by this option will become part of
header value, unless you use its `--trim-header` or `--rtrim-header` option.
_
},
},
links => [
{url=>'prog:paras2csv'},
],
tags => ['category:converting'],
examples => [
{
summary => 'Convert to paragraphs format, show fields alphabetically, do not fold, hide empty values',
src => 'csv-sort-fields INPUT.csv | [[prog]] --width=-1 --hide-empty-values',
src_plang => 'bash',
test => 0,
'x.doc.show_result' => 0,
},
],
on_input_header_row => sub {
my $r = shift;
# these are the keys we add to the stash
$r->{escaped_headers} = [];
$r->{longest_header_len} = 0;
for my $field (@{ $r->{input_fields} }) {
push @{ $r->{escaped_headers} }, _escape_header($field);
my $l = length($r->{escaped_headers}[-1]);
$r->{longest_header_len} = $l if $r->{longest_header_len} < $l;
}
},
on_input_data_row => sub {
my $r = shift;
print "\n" if $r->{input_data_rownum} > 1;
for my $i (0 .. $#{ $r->{input_fields} }) {
my $val = $r->{input_row}[$i];
next if $r->{util_args}{hide_empty_values} && length $val == 0;
my $line =
($r->{util_args}{align} ? pad($r->{escaped_headers}[$i], $r->{longest_header_len}, "r") : $r->{escaped_headers}[$i]).
": ".
_escape_value($val);
if ($r->{util_args}{width} == -1 || length($line) <= $r->{util_args}{width}) {
print $line, "\n";
} else {
require Text::Wrap::NoStrip;
local $Text::Wrap::NoStrip::columns = $r->{util_args}{width};
my $wrapped_line = Text::Wrap::NoStrip::wrap("", " ", $line);
print $wrapped_line, "\n";
}
}
},
writes_csv => 0,
);
1;
# ABSTRACT: Convert CSV to paragraphs
__END__
=pod
=encoding UTF-8
=head1 NAME
App::CSVUtils::csv2paras - Convert CSV to paragraphs
=head1 VERSION
This document describes version 1.036 of App::CSVUtils::csv2paras (from Perl distribution App-CSVUtils), released on 2025-02-04.
=head1 FUNCTIONS
=head2 csv2paras
Usage:
csv2paras(%args) -> [$status_code, $reason, $payload, \%result_meta]
Convert CSV to paragraphs.
This utility converts CSV format like this:
name,email,phone,notes
bill,bill@example.com,555-1236,+
lisa,lisa@example.com,555-1235,from work
jimmy,jimmy@example.com,555-1237,
into paragraphs format like this, which resembles (but not strictly follows)
email headers (RFC-822) or internet message headers (RFC-5322):
name: bill
email: bill@example.com
phone: 555-1236
notes: +
name: lisa
email: lisa@example.com
phone: 555-1235
notes: from work
name: jimmy
email: jimmy@example.com
phone: 555-1237
notes:
Why display in this format? It might be more visually readable or diff-able
especially if there are a lot of fields and/or there are long values.
If a CSV value contains newline, it will escaped "\n", e.g.:
# CSV
name,email,phone,notes
beth,beth@example.com,555-1231,"Has no last name
Might be adopted sometime by Jimmy"
matthew,matthew@example.com,555-1239,"Quit
or fired?"
# paragraph
name: beth
email: beth@example.com
phone: 555-1231
notes: Has no last name\nMight be adopted sometime by Jimmy
name: matthew
email: matthew@example.com
phone: 555-1239
notes: Quit\n\n or fired?
If a CSV value contains literal "\" (backslash) it will be escaped as "\".
Long lines are also by default folded at 78 columns (but you can customize with
the C<--width> option); if a line is folded a literal backslash is added to the
end of each physical line and the next line will be indented by two spaces:
notes: This is a long note. This is a long note. This is a long note. This is
a long note. This is a long note.
A long word is also folded and the next line will be indented by one space:
notes: Thisisalongwordthisisalongwordthisisalongwordthisisalongwordthisisalongw
ord
Newline and backslash are also escaped in header; additionally a literal ":"
(colon) is escaped into "\:".
There is option to skip displaying empty fields (C<--hide-empty-values>) and to
align the ":" header separator.
Keywords: paragraphs, cards, pages, headers
This function is not exported.
Arguments ('*' denotes required arguments):
=over 4
=item * B<align> => I<bool>
Whether to align header separator across lines.
Note that if you want to convert the paragraphs back to CSV later using
L<paras2csv>, the padding spaces added by this option will become part of
header value, unless you use its C<--trim-header> or C<--rtrim-header> option.
=item * B<hide_empty_values> => I<bool>
Whether to skip showing empty values.
=item * B<input_escape_char> => I<str>
Specify character to escape value in field in input CSV, will be passed to Text::CSV_XS.
Defaults to C<\\> (backslash). Overrides C<--input-tsv> option.
=item * B<input_filename> => I<filename> (default: "-")
Input CSV file.
Use C<-> to read from stdin.
Encoding of input file is assumed to be UTF-8.
=item * B<input_header> => I<bool> (default: 1)
Specify whether input CSV has a header row.
By default, the first row of the input CSV will be assumed to contain field
names (and the second row contains the first data row). When you declare that
input CSV does not have header row (C<--no-input-header>), the first row of the
CSV is assumed to contain the first data row. Fields will be named C<field1>,
C<field2>, and so on.
=item * B<input_quote_char> => I<str>
Specify field quote character in input CSV, will be passed to Text::CSV_XS.
Defaults to C<"> (double quote). Overrides C<--input-tsv> option.
=item * B<input_sep_char> => I<str>
Specify field separator character in input CSV, will be passed to Text::CSV_XS.
Defaults to C<,> (comma). Overrides C<--input-tsv> option.
=item * B<input_skip_num_lines> => I<posint>
Number of lines to skip before header row.
This can be useful if you have a CSV files (usually some generated reports,
sometimes converted from spreadsheet) that have additional header lines or info
before the CSV header row.
See also the alternative option: C<--input-skip-until-pattern>.
=item * B<input_skip_until_pattern> => I<re_from_str>
Skip rows until the first header row matches a regex pattern.
This is an alternative to the C<--input-skip-num-lines> and can be useful if you
have a CSV files (usually some generated reports, sometimes converted from
spreadsheet) that have additional header lines or info before the CSV header
row.
With C<--input-skip-num-lines>, you skip a fixed number of lines. With this
option, rows will be skipped until the first field matches the specified regex
pattern.
=item * B<input_tsv> => I<true>
Inform that input file is in TSV (tab-separated) format instead of CSV.
Overriden by C<--input-sep-char>, C<--input-quote-char>, C<--input-escape-char>
options. If one of those options is specified, then C<--input-tsv> will be
ignored.
=item * B<width> => I<int> (default: 78)
The width at which to fold long lines, -1 means to never fold.
=back
Returns an enveloped result (an array).
First element ($status_code) is an integer containing HTTP-like status code
(200 means OK, 4xx caller error, 5xx function error). Second element
($reason) is a string containing error message, or something like "OK" if status is
200. Third element ($payload) is the actual result, but usually not present when enveloped result is an error response ($status_code is not 2xx). Fourth
element (%result_meta) is called result metadata and is optional, a hash
that contains extra information, much like how HTTP response headers provide additional metadata.
Return value: (any)
=head1 HOMEPAGE
Please visit the project's homepage at L<https://metacpan.org/release/App-CSVUtils>.
=head1 SOURCE
Source repository is at L<https://github.com/perlancar/perl-App-CSVUtils>.
=head1 SEE ALSO
L<Acme::MetaSyntactic::newsradio>
=head1 AUTHOR
perlancar <perlancar@cpan.org>
=head1 CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull requests on
GitHub.
Most of the time, you don't need to build the distribution yourself. You can
simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally on your
system), you can install L<Dist::Zilla>,
L<Dist::Zilla::PluginBundle::Author::PERLANCAR>,
L<Pod::Weaver::PluginBundle::Author::PERLANCAR>, and sometimes one or two other
Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond
that are considered a bug and can be reported to me.
=head1 COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under
( run in 1.128 second using v1.01-cache-2.11-cpan-39bf76dae61 )