verify results from the CPAN

App-ElasticSearch-Utilities

        : $indices{$a} <=> $indices{$b};
}

__END__

=pod

=head1 NAME

es-search.pl - Provides a CLI for quick searches of data in ElasticSearch daily indexes

=head1 VERSION

version 8.8

=head1 SYNOPSIS

es-search.pl [search string]

Options:

    --help              print help
    --manual            print full manual
    --filter            Force filter context for all query elements
    --show              Comma separated list of fields to display, default is ALL, switches to tab output
    --tail              Continue the query until CTRL+C is sent
    --top               Perform an aggregation on the fields, by a comma separated list of up to 2 items
    --by                Perform an aggregation using the result of this, example: --by cardinality:src_ip
    --with              Perform a sub aggregation on the query
    --bg-filter         Only used if --top aggregation is significant_terms, applies a background filter
    --match-all         Enables the ElasticSearch match_all operator
    --interval          When running aggregations, wrap the aggreation in a date_histogram with this interval
    --prefix            Takes "field:string" and enables the Lucene prefix query for that field
    --exists            Field which must be present in the document
    --missing           Field which must not be present in the document
    --size              Result size, default is 20, aliased to -n and --limit
    --max-batch-size    When making requests to ES, retrieve this many docs in a single request, defaults to 50
    --all               Don't consider result size, just give me *everything*
    --asc               Sort by ascending timestamp
    --desc              Sort by descending timestamp (Default)
    --sort              List of fields for custom sorting
    --format            When --show isn't used, use this method for outputting the record, supported: json, jsonpretty, yaml
                        json assumes --no-decorator as we assume you're piping through jq
    --pretty            Where possible, use JSON->pretty
    --precision         For floating point values, use this many digits of precision, defaults to 3
    --no-decorators     Do not show the header with field names in the query results
    --no-header         Same as above
    --no-implications   Don't attempt to imply filters from statistical aggregations
    --fields            Display the field list for this index!
    --bases             Display the index base list for this cluster.
    --timestamp         Field to use as the date object, default: @timestamp

From App::ElasticSearch::Utilities:

    --local         Use localhost as the elasticsearch host
    --host          ElasticSearch host to connect to
    --port          HTTP port for your cluster
    --proto         Defaults to 'http', can also be 'https'
    --http-username HTTP Basic Auth username
    --password-exec Script to run to get the users password
    --insecure      Don't verify TLS certificates
    --cacert        Specify the TLS CA file
    --capath        Specify the directory with TLS CAs
    --cert          Specify the path to the client certificate
    --key           Specify the path to the client private key file
    --noop          Any operations other than GET are disabled, can be negated with --no-noop
    --timeout       Timeout to ElasticSearch, default 10
    --keep-proxy    Do not remove any proxy settings from %ENV
    --index         Index to run commands against
    --base          For daily indexes, reference only those starting with "logstash"
                     (same as --pattern logstash-* or logstash-DATE)
    --pattern       Use a pattern to operate on the indexes
    --days          If using a pattern or base, how many days back to go, default: 1

See also the "CONNECTION ARGUMENTS" and "INDEX SELECTION ARGUMENTS" sections from App::ElasticSearch::Utilities.

From CLI::Helpers:

    --data-file         Path to a file to write lines tagged with 'data => 1'
    --tags              A comma separated list of tags to display
    --color             Boolean, enable/disable color, default use git settings
    --verbose           Incremental, increase verbosity (Alias is -v)
    --debug             Show developer output
    --debug-class       Show debug messages originating from a specific package, default: main
    --quiet             Show no output (for cron)
    --syslog            Generate messages to syslog as well
    --syslog-facility   Default "local0"
    --syslog-tag        The program name, default is the script name
    --syslog-debug      Enable debug messages to syslog if in use, default false
    --nopaste           Use App::Nopaste to paste output to configured paste service
    --nopaste-public    Defaults to false, specify to use public paste services
    --nopaste-service   Comma-separated App::Nopaste service, defaults to Shadowcat

=head1 DESCRIPTION

This tool takes a search string parameter to search the cluster.  It is in the format of the Lucene
L<query string|http://lucene.apache.org/core/2_9_4/queryparsersyntax.html>

Examples might include:

    # Search for past 10 days vhost admin.example.com and client IP 1.2.3.4
    es-search.pl --days=10 --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.4"

    # Search for all apache logs past with status 500
    es-search.pl program:"apache" AND crit:500

    # Search for all apache logs with status 500 show only file and out_bytes
    es-search.pl program:"apache" AND crit:500 --show file,out_bytes

    # Search for ip subnet client IP 1.2.3.0 to 1.2.3.255 or 1.2.0.0 to 1.2.255.255
    es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.0/24"
    es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.0/16"

    # Show the top src_ip for 'www.example.com'
    es-search.pl --base access dst:www.example.com --top src_ip

    # Tail the access log for www.example.com 404's
    es-search.pl --base access --tail --show src_ip,file,referer_domain dst:www.example.com AND crit:404

=head1 NAME

scripts/es-search.pl view on Meta::CPAN

    src_ip:test.dat      => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
    src_ip:test.json[ip] => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)

This make it simple to use the --data-file output options and build queries
based off previous queries. For .txt and .dat file, the delimiter for columns
in the file must be either a tab or a null.  For files ending in
.csv, Text::CSV_XS is used to accurate parsing of the file format.  Files
ending in .json are considered to be newline-delimited JSON.

You can also specify the column of the data file to use, the default being the last column or (-1).  Columns are
B<zero-based> indexing. This means the first column is index 0, second is 1, ..  The previous example can be rewritten
as:

    src_ip:test.dat[1]

or:
    src_ip:test.dat[-1]

For newline delimited JSON files, you need to specify the key path you want to extract from the file.  If we have a
JSON source file with:

    { "first": { "second": { "third": [ "bob", "alice" ] } } }
    { "first": { "second": { "third": "ginger" } } }
    { "first": { "second": { "nope":  "fred" } } }

We could search using:

    actor:test.json[first.second.third]

Which would expand to:

    { "terms": { "actor": [ "alice", "bob", "ginger" ] } }

This option will iterate through the whole file and unique the elements of the list.  They will then be transformed into
an appropriate L<terms query|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html>.

=head3 Wildcards

We can also have a group of wildcard or regexp in a file:

    $ cat wildcards.dat
    *@gmail.com
    *@yahoo.com

To enable wildcard parsing, prefix the filename with a C<*>.

    es-search.pl to_address:*wildcards.dat

Which expands the query to:

    {
      "bool": {
        "minimum_should_match":1,
        "should": [
           {"wildcard":{"to_outbound":{"value":"*@gmail.com"}}},
           {"wildcard":{"to_outbound":{"value":"*@yahoo.com"}}}
        ]
      }
    }

No attempt is made to verify or validate the wildcard patterns.

=head3 Regular Expressions

If you'd like to specify a file full of regexp, you can do that as well:

    $ cat regexp.dat
    .*google\.com$
    .*yahoo\.com$

To enable regexp parsing, prefix the filename with a C<~>.

    es-search.pl to_address:~regexp.dat

Which expands the query to:

    {
      "bool": {
        "minimum_should_match":1,
        "should": [
          {"regexp":{"to_outbound":{"value":".*google\\.com$"}}},
          {"regexp":{"to_outbound":{"value":".*yahoo\\.com$"}}}
        ]
      }
    }

No attempt is made to verify or validate the regexp expressions.

=head2 App::ElasticSearch::Utilities::QueryString::Nested

Implement the proposed nested query syntax early.  Example:

    nested_path:"field:match AND string"

=head1 Meta-Queries

Helpful in building queries is the --bases and --fields options which lists the index bases and fields:

    es-search.pl --bases

    es-search.pl --fields

    es-search.pl --base access --fields

=head1 AUTHOR

Brad Lhotsky <brad@divisionbyzero.net>

=head1 COPYRIGHT AND LICENSE

This software is Copyright (c) 2024 by Brad Lhotsky.

This is free software, licensed under:

  The (three-clause) BSD License

=cut

( run in 0.587 second using v1.01-cache-2.11-cpan-39bf76dae61 )