App-SpreadsheetUtils

 view release on metacpan or  search on metacpan

lib/App/SpreadsheetUtils.pm  view on Meta::CPAN


- `input_row_as_hashref`, hos (hash of str). The current input CSV row as a
  hashref, with field names as hash keys and field values as hash values. This
  will only be calculated if utility wants it. Utility can express so by setting
  `$r->{wants_input_row_as_hashref}` to true, e.g. in the `on_begin` hook.

- `input_header_row_count`, uint. Contains the number of actual header rows that
  have been read. If CLI user specifies `--no-input-header`, this will stay at
  zero. Will be reset for each CSV file.

- `input_data_row_count`, int. Contains the number of actual data rows that have
  read. Will be reset for each CSV file.

If you are outputting CSV (`writes_csv` gen argument set to true), the following
keys will be available:

- `output_emitter`, a <pm:Text::CSV_XS> instance for output.

- `output_filenames`, array of str.

- `output_filename`, str, name of current output file.

- `output_filenum`, uint, the number of the current output file, 1 being the
  first file, 2 for the second, and so on.

- `output_fh`, handle to the current output file.

- `output_rownum`, uint. The number of rows that have been outputted (reset
  after each output file).

- `output_data_rownum`, uint. The number of data rows that have been outputted
  (reset after each output file). This will be equal to `input_rownum` less 1 if
  input file has header.

For other hook-specific keys, see the documentation for associated hook point.


*ACCEPTING ADDITIONAL COMMAND-LINE OPTIONS/ARGUMENTS*

As mentioned above, you will get additional command-line options/arguments in
`$r->{util_args}` hashref. Some options/arguments are already added by
`gen_csv_util`, e.g. `input_filename` or `input_filenames` along with
`input_sep_char`, etc (when your utility declares `reads_csv`),
`output_filename` or `output_filenames` along with `overwrite`,
`output_sep_char`, etc (when your utility declares `writes_csv`).

If you want to accept additional arguments/options, you specify them in
`add_args` (hashref, with key being Each option/argument has to be specified
first via `add_args` (as hashref, with key being argument name and value the
argument specification as defined in <pm:Rinci::function>)). Some argument
specifications have been defined in <pm:App::CSVUtils> and can be used. See
existing utilities for examples.


*READING CSV DATA*

To read CSV data, normally your utility would provide handler for the
`on_input_data_row` hook and sometimes additionally `on_input_header_row`.


*OUTPUTTING STRING OR RETURNING RESULT*

To output string, usually you call the provided routine `$r->{code_print}`. This
routine will open the output files for you.

You can also return enveloped result directly by setting `$r->{result}`.


*OUTPUTTING CSV DATA*

To output CSV data, usually you call the provided routine `$r->{code_print_row}`.
This routine accepts a row (arrayref or hashref). This routine will open the
output files for you when needed, as well as print header row automatically.

You can also buffer rows from input to e.g. `$r->{output_rows}`, then call
`$r->{code_print_row}` repeatedly in the `after_read_input` hook to print all the
rows.


*READING MULTIPLE CSV FILES*

To read multiple CSV files, you first specify `reads_multiple_csv`. Then, you
can supply handler for `on_input_header_row` and `on_input_data_row` as usual.
If you want to do something before/after each input file, you can also supply
handler for `before_open_input_file` or `after_close_input_file`.


*WRITING TO MULTIPLE CSV FILES*

Similarly, to write to many CSv files, you first specify `writes_multiple_csv`.
Then, you can supply handler for `on_input_header_row` and `on_input_data_row`
as usual. To switch to the next file, set
`$r->{wants_switch_to_next_output_file}` to true, in which case the next call to
`$r->{code_print_row}` will close the current file and open the next file.


*CHANGING THE OUTPUT FIELDS*

When calling `$r->{code_print_row}`, you can output whatever fields you want. By
convention, you can set `$r->{output_fields}` and `$r->{output_fields_idx}` to
let other handlers know about the output fields. For example, see the
implementation of <prog:csv-concat>.

_
    args => {
        name => {
            schema => 'perl::identifier::unqualified_ascii*',
            req => 1,
            tags => ['category:metadata'],
        },
        summary => {
            schema => 'str*',
            tags => ['category:metadata'],
        },
        description => {
            schema => 'str*',
            tags => ['category:metadata'],
        },
        links => {
            schema => ['array*', of=>'hash*'], # XXX defhashes
            tags => ['category:metadata'],
        },
        examples => {
            schema => ['array*'], # defhashes
            tags => ['category:metadata'],
        },
        add_meta_props => {
            summary => 'Add additional Rinci function metadata properties',
            schema => ['hash*'],
            tags => ['category:metadata'],
        },
        add_args => {
            schema => ['hash*'],
            tags => ['category:metadata'],
        },
        add_args_rels => {
            schema => ['hash*'],
            tags => ['category:metadata'],
        },

        reads_csv => {
            summary => 'Whether utility reads CSV data',
            'summary.alt.bool.not' => 'Specify that utility does not read CSV data',
            schema => 'bool*',
            default => 1,
        },
        reads_multiple_csv => {
            summary => 'Whether utility accepts CSV data',
            schema => 'bool*',
            description => <<'_',

Setting this option to true will implicitly set the `reads_csv` option to true,
obviously.

_
        },
        writes_csv => {

lib/App/SpreadsheetUtils.pm  view on Meta::CPAN

                            $r->{output_fields_idx} = {};
                            for my $j (0 .. $#{ $r->{output_fields} }) {
                                $r->{output_fields_idx}{ $r->{output_fields}[$j] } = $j;
                            }
                        }

                        $code_open_file->();

                        # print header line, if not yet
                        if ($outputs_header && !$r->{has_printed_header}) {
                            $r->{has_printed_header}++;
                            $r->{output_emitter}->print($r->{output_fh}, $r->{output_fields});
                            print { $r->{output_fh} } "\n";
                            $r->{output_rownum}++;
                        }
                    };
                    $r->{code_print_header_row} = $code_print_header_row;

                    my $code_print_row = sub {
                        my $row = shift;

                        $code_print_header_row->();

                        # print data line
                        if ($row) {
                            if (ref $row eq 'HASH') {
                                my $row0 = $row;
                                $row = [];
                                for my $j (0 .. $#{ $r->{output_fields} }) {
                                    $row->[$j] = $row0->{ $r->{output_fields}[$j] } // '';
                                }
                            }
                            $r->{output_emitter}->print( $r->{output_fh}, $row );
                            print { $r->{output_fh} } "\n";
                            $r->{output_rownum}++;
                            $r->{output_data_rownum}++;
                        }
                    }; # code_print_row
                    $r->{code_print_row} = $code_print_row;
                } # if outputs csv

                if ($before_read_input) {
                    log_trace "Calling before_read_input handler ...";
                    $before_read_input->($r);
                }

              READ_CSV: {
                    last unless $reads_csv;

                    my $input_parser = _instantiate_parser(\%util_args, 'input_');
                    $r->{input_parser} = $input_parser;

                    my @input_filenames;
                    if ($reads_multiple_csv) {
                        @input_filenames = @{ $util_args{input_filenames} // ['-'] };
                    } else {
                        @input_filenames = ($util_args{input_filename} // '-');
                    }
                    $r->{input_filenames} //= \@input_filenames;

                  BEFORE_INPUT_FILENAME:
                    $r->{input_filenum} = 0;

                  INPUT_FILENAME:
                    for my $input_filename (@input_filenames) {
                        $r->{input_filenum}++;
                        $r->{input_filename} = $input_filename;

                        if ($r->{input_filenum} == 1 && $before_open_input_files) {
                            log_trace "Calling before_open_input_files handler ...";
                            $before_open_input_files->($r);
                            if (delete $r->{wants_skip_files}) {
                                log_trace "Handler wants to skip files, skipping all input files";
                                last READ_CSV;
                            }
                        }

                        if ($before_open_input_file) {
                            log_trace "Calling before_open_input_file handler ...";
                            $before_open_input_file->($r);
                            if (delete $r->{wants_skip_file}) {
                                log_trace "Handler wants to skip this file, moving on to the next file";
                                next INPUT_FILENAME;
                            } elsif (delete $r->{wants_skip_files}) {
                                log_trace "Handler wants to skip all files, skipping all input files";
                                last READ_CSV;
                            }
                        }

                        log_info "[file %d/%d] Reading input file %s ...",
                            $r->{input_filenum}, scalar(@input_filenames), $input_filename;
                        my ($fh, $err) = _open_file_read($input_filename);
                        die $err if $err;
                        $r->{input_fh} = $r->{input_fhs}[ $r->{input_filenum}-1 ] = $fh;

                        my $i;
                        $r->{input_header_row_count} = 0;
                        $r->{input_data_row_count} = 0;
                        $r->{input_fields} = []; # array, field names in order
                        $r->{input_field_idxs} = {}; # key=field name, value=index (0-based)
                        my $row0;
                        my $code_getline = sub {
                            if ($r->{stdin_input_fields} && $r->{input_filename} eq '-') {
                                if ($i == 0) {
                                    # we have read the header for stdin. since
                                    # we can't seek to the beginning, we return
                                    # the saved fields
                                    $r->{input_header_row_count}++;
                                    return $r->{stdin_input_fields};
                                } else {
                                    my $row = $input_parser->getline($r->{input_fh});
                                    $r->{input_data_row_count}++ if $row;
                                    return $row;
                                }
                            } elsif ($i == 0 && !$has_header) {
                                # this is the first line of a file and user
                                # specifies there is no input header. we save
                                # the line and return the generated field names
                                # instead.
                                $row0 = $input_parser->getline($r->{input_fh});
                                return unless $row0;
                                return [map { "field$_" } 1..@$row0];
                            } elsif ($i == 1 && !$has_header) {
                                # we return the saved first line
                                $r->{input_data_row_count}++ if $row0;
                                return $row0;
                            }
                            my $res = $input_parser->getline($r->{input_fh});
                            if ($res) {
                                $r->{input_header_row_count}++ if $i==0;
                                $r->{input_data_row_count}++ if $i;
                            }
                            $res;
                        };
                        $r->{code_getline} = $code_getline;

                        $i = 0;
                        while ($r->{input_row} = $code_getline->()) {
                            $i++;
                            $r->{input_rownum} = $i;
                            $r->{input_data_rownum} = $has_header ? $i-1 : $i;
                            if ($i == 1) {
                                # gather the list of fields
                                $r->{input_fields} = $r->{input_row};
                                $r->{stdin_input_fields} //= $r->{input_row} if $input_filename eq '-';
                                $r->{orig_input_fields} = $r->{input_fields};
                                $r->{input_fields_idx} = {};
                                for my $j (0 .. $#{ $r->{input_fields} }) {
                                    $r->{input_fields_idx}{ $r->{input_fields}[$j] } = $j;
                                }

                                if ($on_input_header_row) {
                                    log_trace "Calling on_input_header_row hook handler ...";
                                    $on_input_header_row->($r);

                                    if (delete $r->{wants_skip_file}) {
                                        log_trace "Handler wants to skip this file, moving on to the next file";
                                        next INPUT_FILENAME;
                                    } elsif (delete $r->{wants_skip_files}) {
                                        log_trace "Handler wants to skip all files, skipping all input files";
                                        last READ_CSV;
                                    }
                                }

                                # reindex the fields, should the above hook
                                # handler adds/removes fields. let's save the
                                # old fields_idx to orig_fields_idx.
                                $r->{orig_input_fields_idx} = $r->{input_fields_idx};
                                $r->{input_fields_idx} = {};
                                for my $j (0 .. $#{ $r->{input_fields} }) {
                                    $r->{input_fields_idx}{ $r->{input_fields}[$j] } = $j;
                                }

                            } else {
                                # generate the hashref version of row if utility
                                # requires it
                                if ($r->{wants_input_row_as_hashref}) {
                                    $r->{input_row_as_hashref} = {};
                                    for my $j (0 .. $#{ $r->{input_row} }) {
                                        # ignore extraneous data fields
                                        last if $j >= @{ $r->{input_fields} };
                                        $r->{input_row_as_hashref}{ $r->{input_fields}[$j] } = $r->{input_row}[$j];
                                    }
                                }

                                if ($on_input_data_row) {
                                    log_trace "Calling on_input_data_row hook handler (for first data row) ..." if $r->{input_rownum} <= 2;
                                    $on_input_data_row->($r);

                                    if (delete $r->{wants_skip_file}) {
                                        log_trace "Handler wants to skip this file, moving on to the next file";
                                        next INPUT_FILENAME;
                                    } elsif (delete $r->{wants_skip_files}) {
                                        log_trace "Handler wants to skip all files, skipping all input files";
                                        last READ_CSV;
                                    }
                                }
                            }

                        } # while getline

                        # XXX actually close filehandle except stdin

                        if ($after_close_input_file) {
                            log_trace "Calling after_close_input_file handler ...";
                            $after_close_input_file->($r);
                            if (delete $r->{wants_skip_files}) {
                                log_trace "Handler wants to skip reading all file, skipping";
                                last READ_CSV;
                            }
                        }
                    } # for input_filename

                    if ($after_close_input_files) {
                        log_trace "Calling after_close_input_files handler ...";
                        $after_close_input_files->($r);
                    }

                } # READ_CSV

                # cleanup stash from csv-reading-related keys
                delete $r->{input_filenames};
                delete $r->{input_filenum};
                delete $r->{input_filename};
                delete $r->{input_fh};
                delete $r->{input_rownum};
                delete $r->{input_data_rownum};
                delete $r->{input_row};
                delete $r->{input_row_as_hashref};
                delete $r->{input_fields};
                delete $r->{input_fields_idx};
                delete $r->{orig_input_fields_idx};
                delete $r->{code_getline};
                delete $r->{wants_input_row_as_hashref};

                if ($after_read_input) {
                    log_trace "Calling after_read_input handler ...";
                    $after_read_input->($r);
                }

                # cleanup stash from csv-outputting-related keys
                delete $r->{output_filenames};
                delete $r->{output_num_of_files};
                delete $r->{output_filenum};
                if ($r->{output_fh}) {
                    if ($r->{output_filename} ne '-') {
                        log_info "Closing output file '$r->{output_filename}' ...";
                        close $r->{output_fh} or die [500, "Can't close output file '$r->{output_filename}': $!"];
                    }
                    delete $r->{output_fh};
                }
                delete $r->{output_filename};



( run in 0.651 second using v1.01-cache-2.11-cpan-39bf76dae61 )