App-ElasticSearch-Utilities
view release on metacpan or search on metacpan
scripts/es-search.pl view on Meta::CPAN
: $indices{$a} <=> $indices{$b};
}
__END__
=pod
=head1 NAME
es-search.pl - Provides a CLI for quick searches of data in ElasticSearch daily indexes
=head1 VERSION
version 8.8
=head1 SYNOPSIS
es-search.pl [search string]
Options:
--help print help
--manual print full manual
--filter Force filter context for all query elements
--show Comma separated list of fields to display, default is ALL, switches to tab output
--tail Continue the query until CTRL+C is sent
--top Perform an aggregation on the fields, by a comma separated list of up to 2 items
--by Perform an aggregation using the result of this, example: --by cardinality:src_ip
--with Perform a sub aggregation on the query
--bg-filter Only used if --top aggregation is significant_terms, applies a background filter
--match-all Enables the ElasticSearch match_all operator
--interval When running aggregations, wrap the aggreation in a date_histogram with this interval
--prefix Takes "field:string" and enables the Lucene prefix query for that field
--exists Field which must be present in the document
--missing Field which must not be present in the document
--size Result size, default is 20, aliased to -n and --limit
--max-batch-size When making requests to ES, retrieve this many docs in a single request, defaults to 50
--all Don't consider result size, just give me *everything*
--asc Sort by ascending timestamp
--desc Sort by descending timestamp (Default)
--sort List of fields for custom sorting
--format When --show isn't used, use this method for outputting the record, supported: json, jsonpretty, yaml
json assumes --no-decorator as we assume you're piping through jq
--pretty Where possible, use JSON->pretty
--precision For floating point values, use this many digits of precision, defaults to 3
--no-decorators Do not show the header with field names in the query results
--no-header Same as above
--no-implications Don't attempt to imply filters from statistical aggregations
--fields Display the field list for this index!
--bases Display the index base list for this cluster.
--timestamp Field to use as the date object, default: @timestamp
From App::ElasticSearch::Utilities:
--local Use localhost as the elasticsearch host
--host ElasticSearch host to connect to
--port HTTP port for your cluster
--proto Defaults to 'http', can also be 'https'
--http-username HTTP Basic Auth username
--password-exec Script to run to get the users password
--insecure Don't verify TLS certificates
--cacert Specify the TLS CA file
--capath Specify the directory with TLS CAs
--cert Specify the path to the client certificate
--key Specify the path to the client private key file
--noop Any operations other than GET are disabled, can be negated with --no-noop
--timeout Timeout to ElasticSearch, default 10
--keep-proxy Do not remove any proxy settings from %ENV
--index Index to run commands against
--base For daily indexes, reference only those starting with "logstash"
(same as --pattern logstash-* or logstash-DATE)
--pattern Use a pattern to operate on the indexes
--days If using a pattern or base, how many days back to go, default: 1
See also the "CONNECTION ARGUMENTS" and "INDEX SELECTION ARGUMENTS" sections from App::ElasticSearch::Utilities.
From CLI::Helpers:
--data-file Path to a file to write lines tagged with 'data => 1'
--tags A comma separated list of tags to display
--color Boolean, enable/disable color, default use git settings
--verbose Incremental, increase verbosity (Alias is -v)
--debug Show developer output
--debug-class Show debug messages originating from a specific package, default: main
--quiet Show no output (for cron)
--syslog Generate messages to syslog as well
--syslog-facility Default "local0"
--syslog-tag The program name, default is the script name
--syslog-debug Enable debug messages to syslog if in use, default false
--nopaste Use App::Nopaste to paste output to configured paste service
--nopaste-public Defaults to false, specify to use public paste services
--nopaste-service Comma-separated App::Nopaste service, defaults to Shadowcat
=head1 DESCRIPTION
This tool takes a search string parameter to search the cluster. It is in the format of the Lucene
L<query string|http://lucene.apache.org/core/2_9_4/queryparsersyntax.html>
Examples might include:
# Search for past 10 days vhost admin.example.com and client IP 1.2.3.4
es-search.pl --days=10 --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.4"
# Search for all apache logs past with status 500
es-search.pl program:"apache" AND crit:500
# Search for all apache logs with status 500 show only file and out_bytes
es-search.pl program:"apache" AND crit:500 --show file,out_bytes
# Search for ip subnet client IP 1.2.3.0 to 1.2.3.255 or 1.2.0.0 to 1.2.255.255
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.0/24"
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.0/16"
# Show the top src_ip for 'www.example.com'
es-search.pl --base access dst:www.example.com --top src_ip
# Tail the access log for www.example.com 404's
es-search.pl --base access --tail --show src_ip,file,referer_domain dst:www.example.com AND crit:404
=head1 NAME
scripts/es-search.pl view on Meta::CPAN
src_ip:test.dat => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
src_ip:test.json[ip] => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
This make it simple to use the --data-file output options and build queries
based off previous queries. For .txt and .dat file, the delimiter for columns
in the file must be either a tab or a null. For files ending in
.csv, Text::CSV_XS is used to accurate parsing of the file format. Files
ending in .json are considered to be newline-delimited JSON.
You can also specify the column of the data file to use, the default being the last column or (-1). Columns are
B<zero-based> indexing. This means the first column is index 0, second is 1, .. The previous example can be rewritten
as:
src_ip:test.dat[1]
or:
src_ip:test.dat[-1]
For newline delimited JSON files, you need to specify the key path you want to extract from the file. If we have a
JSON source file with:
{ "first": { "second": { "third": [ "bob", "alice" ] } } }
{ "first": { "second": { "third": "ginger" } } }
{ "first": { "second": { "nope": "fred" } } }
We could search using:
actor:test.json[first.second.third]
Which would expand to:
{ "terms": { "actor": [ "alice", "bob", "ginger" ] } }
This option will iterate through the whole file and unique the elements of the list. They will then be transformed into
an appropriate L<terms query|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html>.
=head3 Wildcards
We can also have a group of wildcard or regexp in a file:
$ cat wildcards.dat
*@gmail.com
*@yahoo.com
To enable wildcard parsing, prefix the filename with a C<*>.
es-search.pl to_address:*wildcards.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"wildcard":{"to_outbound":{"value":"*@gmail.com"}}},
{"wildcard":{"to_outbound":{"value":"*@yahoo.com"}}}
]
}
}
No attempt is made to verify or validate the wildcard patterns.
=head3 Regular Expressions
If you'd like to specify a file full of regexp, you can do that as well:
$ cat regexp.dat
.*google\.com$
.*yahoo\.com$
To enable regexp parsing, prefix the filename with a C<~>.
es-search.pl to_address:~regexp.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"regexp":{"to_outbound":{"value":".*google\\.com$"}}},
{"regexp":{"to_outbound":{"value":".*yahoo\\.com$"}}}
]
}
}
No attempt is made to verify or validate the regexp expressions.
=head2 App::ElasticSearch::Utilities::QueryString::Nested
Implement the proposed nested query syntax early. Example:
nested_path:"field:match AND string"
=head1 Meta-Queries
Helpful in building queries is the --bases and --fields options which lists the index bases and fields:
es-search.pl --bases
es-search.pl --fields
es-search.pl --base access --fields
=head1 AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
=head1 COPYRIGHT AND LICENSE
This software is Copyright (c) 2024 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License
=cut
( run in 0.587 second using v1.01-cache-2.11-cpan-39bf76dae61 )