DBIx-BatchChunker
view release on metacpan or search on metacpan
If either of the min/max statements don't return any ID data, this
method will return 0.
execute
my $batch_chunker = DBIx::BatchChunker->new(
# ...other attributes for calculate_ranges...
dbi_connector => $conn, # DBIx::Connector::Retry object
stmt => $do_stmt, # INSERT/UPDATE/DELETE $stmt with BETWEEN placeholders
### OR ###
dbi_connector => $conn, # DBIx::Connector::Retry object
stmt => $select_stmt, # SELECT $stmt with BETWEEN placeholders
count_stmt => $count_stmt, # SELECT COUNT $stmt to be used for min_chunk_percent; optional
coderef => $coderef, # called code that does the actual work
### OR ###
rs => $account_rs, # base ResultSet, which gets filtered with -between later on
id_name => 'account_id', # can be looked up if not provided
coderef => $coderef, # called code that does the actual work
### OR ###
coderef => $coderef, # DIY database work; just pass the $start/$end IDs
### Optional but recommended ###
sleep => 0.25, # number of seconds to sleep each chunk; defaults to 0
process_past_max => 1, # use this if processing the whole table
single_rows => 1, # does $coderef get a single $row or the whole $chunk_rs / $stmt
min_chunk_percent => 0.25, # minimum row count of chunk size percentage; defaults to 0.5 (or 50%)
target_time => 5, # target runtime for dynamic chunk size scaling; default is 5 seconds
max_runtime => 12 * 60 * 60, # stop processing after 12 hours
progress_name => 'Updating Accounts', # easier than creating your own progress_bar
### Optional ###
progress_bar => $progress, # defaults to "Processing $source_name" bar
verbose => 1, # displays timing stats on each chunk
);
$batch_chunker->execute if $batch_chunker->calculate_ranges;
Applies the configured DB changes in chunks. Runs through the loop,
processing a statement handle, ResultSet, and/or coderef as it goes.
Each loop iteration processes a chunk of work, determined by
"chunk_size".
The "calculate_ranges" method should be run first to fill in "min_id"
and "max_id". If either of these are missing, the function will assume
"calculate_ranges" couldn't find them and warn about it.
More details can be found in the "Processing Modes" and "ATTRIBUTES"
sections.
PRIVATE METHODS
_process_block
Runs the DB work and passes it to the coderef. Its return value
determines whether the block should be processed or not.
_process_past_max_checker
Checks to make sure the current endpoint is actually the end, by
checking the database. Its return value determines whether the block
should be processed or not.
See "process_past_max".
_chunk_count_checker
Checks the chunk count to make sure it's properly sized. If not, it
will try to shrink or expand the current chunk (in chunk_size
increments) as necessary. Its return value determines whether the block
should be processed or not.
See "min_chunk_percent".
This is not to be confused with the "_runtime_checker", which adjusts
chunk_size after processing, based on previous run times.
_runtime_checker
Stores the previously processed chunk's runtime, and then adjusts
chunk_size as necessary.
See "target_time".
_increment_progress
Increments the progress bar.
_print_chunk_status
Prints out a standard chunk status line, if "verbose" is enabled. What
it prints is generally uniform, but it depends on the processing
action. Most of the data is pulled from "loop_state".
CAVEATS
Big Number Support
If the module detects that the ID numbers are no longer safe for
standard Perl NV storage, it will automatically switch to using
Math::BigInt and Math::BigFloat for big number support. If any blessed
numbers are already being used to define the attributes, this will also
switch on the support.
String-based IDs
If you're working with VARCHAR types or other string-based IDs to
represent integers, these may be subject to whatever string-based
comparison rules your RDBMS uses when calculating with MIN/MAX or using
BETWEEN. Row counting and chunk size scaling will try to compensate,
but will be mixing string-based comparisons from the RDBMS and
Perl-based integer math.
Using the CAST function may help, but it may also cause critical
indexes to be ignored, especially if the function is used on the
left-hand side against the column. Strings with the exact same length
may be safe from comparison weirdness, but YMMV.
Non-integer inputs from ID columns, such as GUIDs or other alphanumeric
strings, are not currently supported. They would have to be converted
( run in 0.656 second using v1.01-cache-2.11-cpan-13bb782fe5a )