App-ElasticSearch-Utilities
scripts/es-copy-index.pl view on Meta::CPAN
scroll => '1m',
}
});
verbose(sprintf "Batch of %d done in %00.2fs.", $batch, $took);
}
sub show_counts {
my $inc_records = shift;
output({color=>'green'}, "Starting copy of $INDEX{from} to $HOST{to}:$INDEX{to}.") if $RECORDS == 0;
$RECORDS += $inc_records;
if( $RECORDS % ($INDEX{block} * 10) == 0 ) {
my $now = time;
my $diff = $now - $LAST;
my @time=localtime;
my $msg = sprintf "%00.2f%% %02d:%02d:%02d Records: %d of %d in %0.2fs", ($RECORDS/$TOTAL)*100, @time[2,1,0], $RECORDS, $TOTAL, $diff;
output({color=>'yellow'}, $msg);
$LAST=$now;
}
}
__END__
=pod
=head1 NAME
es-copy-index.pl - Copy an index from one cluster to another
=head1 VERSION
version 8.9
=head1 SYNOPSIS
es-copy-index.pl [options] [query to select documents]
Options:
--source (Required) The source index name for the copy
--destination Destination index name, defaults to the source index name
--from (Required) A server in the cluster where the index lives
--to A server in the cluster where the index will be copied to
--block How many docs to process in one batch, default: 1,000
--mapping JSON mapping to use instead of the source mapping
--settings JSON index settings to use instead of those from the source
--append Instead of creating the index, add the documents to the destination
--help print help
--manual print full manual
From App::ElasticSearch::Utilities:
--local Use localhost as the elasticsearch host
--host ElasticSearch host to connect to
--port HTTP port for your cluster
--proto Defaults to 'http', can also be 'https'
--http-username HTTP Basic Auth username
--password-exec Script to run to get the user's password
--insecure Don't verify TLS certificates
--cacert Specify the TLS CA file
--capath Specify the directory with TLS CAs
--cert Specify the path to the client certificate
--key Specify the path to the client private key file
--noop Any operations other than GET are disabled, can be negated with --no-noop
--timeout Timeout to ElasticSearch, default 10
--keep-proxy Do not remove any proxy settings from %ENV
--index Index to run commands against
--base For daily indexes, reference only those starting with "logstash"
(same as --pattern logstash-* or logstash-DATE)
--pattern Use a pattern to operate on the indexes
--days If using a pattern or base, how many days back to go, default: 1
See also the "CONNECTION ARGUMENTS" and "INDEX SELECTION ARGUMENTS" sections from App::ElasticSearch::Utilities.
From CLI::Helpers:
--data-file Path to a file to write lines tagged with 'data => 1'
--tags A comma separated list of tags to display
--color Boolean, enable/disable color, default use git settings
--verbose Incremental, increase verbosity (Alias is -v)
--debug Show developer output
--debug-class Show debug messages originating from a specific package, default: main
--quiet Show no output (for cron)
--syslog Generate messages to syslog as well
--syslog-facility Default "local0"
--syslog-tag The program name, default is the script name
--syslog-debug Enable debug messages to syslog if in use, default false
--nopaste Use App::Nopaste to paste output to configured paste service
--nopaste-public Defaults to false, specify to use public paste services
--nopaste-service Comma-separated App::Nopaste service, defaults to Shadowcat
=head1 DESCRIPTION
This script allows you to copy data from one index to another on the same cluster or
on a separate cluster. It handles index creation, either directly copying the mapping
and settings from the source index or from mapping/settings JSON files.
This script could also be used to split up an index into smaller indexes for any number of reasons.
This uses the reindex API to copy data from one cluster to another.
=head1 OPTIONS
=over 8
=item B<from>
B<REQUIRED>: hostname or IP of the source cluster
=item B<to>
Hostname or IP of the destination cluster, defaults to the same host unless otherwise specified.
=item B<source>
scripts/es-copy-index.pl view on Meta::CPAN
src_ip:test.dat => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
src_ip:test.json[ip] => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
This makes it simple to use the --data-file output options and build queries
based off previous queries. For .txt and .dat files, the delimiter for columns
in the file must be either a tab or a null. For files ending in
.csv, Text::CSV_XS is used for accurate parsing of the file format. Files
ending in .json are considered to be newline-delimited JSON.
You can also specify the column of the data file to use; the default is the last column (-1). Column
indexing is B<zero-based>: the first column is index 0, the second is 1, and so on. The previous example can be rewritten
as:
src_ip:test.dat[1]
or:
src_ip:test.dat[-1]
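As a rough illustration of the column selection described above, here is a minimal sketch. The helper name C<column_values> and the sample rows are hypothetical and are not taken from the module, which may do this differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of App::ElasticSearch::Utilities):
# pull one column out of tab- or NUL-delimited lines.
sub column_values {
    my ($lines, $col) = @_;
    $col = -1 unless defined $col;          # default: last column
    my @values;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @cols = split /[\t\0]/, $copy;   # tab or null delimiter
        next unless @cols;
        my $value = $cols[$col];            # negative indexes count from the end
        push @values, $value if defined $value && length $value;
    }
    return @values;
}

# Assumed sample data: a count column followed by an IP column.
my @lines = ("10\t1.2.3.4\n", "7\t1.2.3.5\n");
my @last  = column_values(\@lines);         # same as index 1 for two columns
print "src_ip:(@last)\n";                   # prints: src_ip:(1.2.3.4 1.2.3.5)
```

With two columns per row, index 1 and index -1 select the same values, which is why both spellings of the example are equivalent.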
For newline-delimited JSON files, you need to specify the key path you want to extract from the file. If we have a
JSON source file with:
{ "first": { "second": { "third": [ "bob", "alice" ] } } }
{ "first": { "second": { "third": "ginger" } } }
{ "first": { "second": { "nope": "fred" } } }
We could search using:
actor:test.json[first.second.third]
Which would expand to:
{ "terms": { "actor": [ "alice", "bob", "ginger" ] } }
This option will iterate through the whole file and deduplicate the elements of the list. They will then be transformed into
an appropriate L<terms query|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html>.
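The key-path extraction and deduplication can be sketched roughly as follows. The helpers C<dig> and C<terms_from_ndjson> are hypothetical names used only for illustration; the module's own implementation may differ. Only the core JSON::PP module is assumed.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: follow a list of keys through a decoded document.
# Returns an empty list when the path does not exist.
sub dig {
    my ($doc, @keys) = @_;
    for my $key (@keys) {
        return unless ref $doc eq 'HASH' && exists $doc->{$key};
        $doc = $doc->{$key};
    }
    return ref $doc eq 'ARRAY' ? @$doc : ($doc);
}

# Hypothetical helper: build a terms query from newline-delimited JSON.
sub terms_from_ndjson {
    my ($field, $path, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        my $doc = decode_json($line);
        # Keep only plain scalar values; the hash keys deduplicate them.
        $seen{$_} = 1
            for grep { defined($_) && !ref($_) } dig($doc, split /\./, $path);
    }
    return { terms => { $field => [ sort keys %seen ] } };
}

my @lines = (
    '{ "first": { "second": { "third": [ "bob", "alice" ] } } }',
    '{ "first": { "second": { "third": "ginger" } } }',
    '{ "first": { "second": { "nope": "fred" } } }',
);
my $query = terms_from_ndjson('actor', 'first.second.third', @lines);
print JSON::PP->new->canonical->encode($query), "\n";
# prints: {"terms":{"actor":["alice","bob","ginger"]}}
```

The third line has no C<third> key, so it contributes nothing to the list.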
=head3 Wildcards
We can also keep a group of wildcard patterns or regular expressions in a file:
$ cat wildcards.dat
*@gmail.com
*@yahoo.com
To enable wildcard parsing, prefix the filename with a C<*>.
es-search.pl to_address:*wildcards.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"wildcard":{"to_address":{"value":"*@gmail.com"}}},
{"wildcard":{"to_address":{"value":"*@yahoo.com"}}}
]
}
}
No attempt is made to verify or validate the wildcard patterns.
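The shape of the expanded query can be reproduced with a short sketch. The helper C<should_query> is a hypothetical name, not the module's API; it only shows how one clause per pattern ends up under C<should>.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: wrap each pattern in a clause of the given type
# ('wildcard' here) and require at least one of them to match.
sub should_query {
    my ($type, $field, @patterns) = @_;
    return {
        bool => {
            minimum_should_match => 1,
            should => [
                map { +{ $type => { $field => { value => $_ } } } } @patterns
            ],
        },
    };
}

# Patterns as they would be read from wildcards.dat, one per line.
my @patterns = ('*@gmail.com', '*@yahoo.com');
my $query = should_query('wildcard', 'to_address', @patterns);
print JSON::PP->new->canonical->pretty->encode($query);
```

The patterns are passed through untouched, which matches the note above that nothing validates them.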
=head3 Regular Expressions
If you'd like to specify a file full of regular expressions, you can do that as well:
$ cat regexp.dat
.*google\.com$
.*yahoo\.com$
To enable regexp parsing, prefix the filename with a C<~>.
es-search.pl to_address:~regexp.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"regexp":{"to_address":{"value":".*google\\.com$"}}},
{"regexp":{"to_address":{"value":".*yahoo\\.com$"}}}
]
}
}
No attempt is made to verify or validate the regular expressions.
=head2 App::ElasticSearch::Utilities::QueryString::Nested
Implements the proposed nested query syntax early. Example:
nested_path:"field:match AND string"
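One plausible expansion of that token is an Elasticsearch nested query wrapping a query_string query. The sketch below is an assumption about the resulting shape, not the plugin's actual code; the helper C<nested_query> is hypothetical, and the plugin may parse the quoted part differently.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: split  path:"query"  and wrap the quoted part
# in a nested query. Returns nothing if the token does not match.
sub nested_query {
    my ($token) = @_;
    my ($path, $query) = $token =~ /^([^:]+):"(.*)"$/ or return;
    return {
        nested => {
            path  => $path,
            # Assumption: the inner text is handed to query_string as-is.
            query => { query_string => { query => $query } },
        },
    };
}

my $q = nested_query('nested_path:"field:match AND string"');
print JSON::PP->new->canonical->pretty->encode($q);
```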
=head1 AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
=head1 COPYRIGHT AND LICENSE
This software is Copyright (c) 2026 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License
=cut