App-WebSearchUtils


lib/App/WebSearchUtils.pm  view on Meta::CPAN

package App::WebSearchUtils;

use 5.010001;
use strict;
use warnings;
use Log::ger;

use Perinci::Object 'envresmulti';

our $AUTHORITY = 'cpan:PERLANCAR'; # AUTHORITY
our $DATE = '2022-10-10'; # DATE
our $DIST = 'App-WebSearchUtils'; # DIST
our $VERSION = '0.001'; # VERSION

our %SPEC;

# Format a URL and its HTML-escaped query text as an HTML <a> element.
sub _fmt_html_link {
    my ($url, $query) = @_;
    require HTML::Entities;
    my $query_htmlesc = HTML::Entities::encode_entities($query // "(query)");
    qq(<a href="$url">$query_htmlesc</a>);
}

sub _fmt_org_link {
    my ($url, $query) = @_;
    qq([[$url][$query]]);
}

$SPEC{web_search} = {
    v => 1.1,
    summary => 'Open web search page in browser',
    description => <<'_',

This utility can save you time when you want to open multiple queries (with
added common prefix/suffix words) or specify some options like time limit. It
will formulate the search URL(s) then open them for you in a browser. You can
also choose to print the URLs instead.

Aside from standard web search, you can also generate/open other searches like
image, video, news, or map.

_
    args => {
        queries => {
            'x.name.is_plural' => 1,
            'x.name.singular' => 'query',
            schema => ['array*', of=>'str*', min_len=>1],
            pos => 0,
            slurpy => 1,
        },
        queries_from => {
            summary => 'Supply queries from lines of text file (specify "-" for stdin)',
            schema => 'filename*',
        },
        delay => {
            summary => 'Delay between opening each query',
            schema => 'duration*',
            description => <<'_',

As an alternative to the `--delay` option, you can also use `--min-delay` and
`--max-delay` to set a random delay between a minimum and maximum value.

_
        },
        min_delay => {
            summary => 'Delay between opening each query',
            schema => 'duration*',
            description => <<'_',

As an alternative to the `--min-delay` and `--max-delay` options, you can also
use `--delay` to set a constant delay between requests.

_
        },
        max_delay => {
            summary => 'Delay between opening each query',
            schema => 'duration*',
        },
        prepend => {
            summary => 'String to add at the beginning of each query',
            schema => 'str*',
        },
        append => {
            summary => 'String to add at the end of each query',
            schema => 'str*',
        },
        num => {
            summary => 'Number of results per page',
            schema => 'posint*',
        },
        time_start => {
            schema => ['date*', 'x.perl.coerce_rules' => ['From_str::natural'], 'x.perl.coerce_to'=>'DateTime'],
            tags => ['category:time-period-criteria'],
        },
        time_end => {
            schema => ['date*', 'x.perl.coerce_rules' => ['From_str::natural'], 'x.perl.coerce_to'=>'DateTime'],

lib/App/WebSearchUtils.pm  view on Meta::CPAN

                    if ($action =~ /html/) {
                        push @rows, _fmt_html_link($link->url_abs . "", $link->text);
                    } elsif ($action =~ /org/) {
                        push @rows, _fmt_org_link($link->url_abs . "", $link->text);
                    } else {
                        push @rows, $link->url_abs . "";
                    }
                }
            }
        } else {
            return [400, "Unknown action '$action'"];
        }
    }
    if ($action eq 'open_url') {
        return $envres->as_struct;
    } else {
        return [200, "OK", \@rows];
    }
}

1;
# ABSTRACT: CLI utilities related to searching with search engines

__END__

=pod

=encoding UTF-8

=head1 NAME

App::WebSearchUtils - CLI utilities related to searching with search engines

=head1 VERSION

This document describes version 0.001 of App::WebSearchUtils (from Perl distribution App-WebSearchUtils), released on 2022-10-10.

=head1 SYNOPSIS

This distribution provides the following utilities:

=over

=item * L<web-search>

=back

=head1 FUNCTIONS


=head2 web_search

Usage:

 web_search(%args) -> [$status_code, $reason, $payload, \%result_meta]

Open web search page in browser.

This utility can save you time when you want to open multiple queries (with
added common prefix/suffix words) or specify some options like time limit. It
will formulate the search URL(s) then open them for you in a browser. You can
also choose to print the URLs instead.

Aside from standard web search, you can also generate/open other searches like
image, video, news, or map.

This function is not exported.
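
A minimal, untested sketch of calling the function directly from Perl, assuming
the distribution is installed. The query strings are made up for illustration.
C<print_url> is used so that no browser is opened, C<engine> is passed
explicitly rather than relying on the default, and the payload is assumed to be
an array reference of URL strings.

 use strict;
 use warnings;
 use App::WebSearchUtils;

 # Illustrative queries; "prepend" adds a common prefix to each of them.
 my $res = App::WebSearchUtils::web_search(
     queries => ['log ger', 'perinci object'],
     prepend => 'perl',
     engine  => 'google',
     action  => 'print_url',
 );

 # The result is an enveloped response: [$status_code, $reason, $payload, ...].
 my ($status, $reason, $payload) = @$res;
 if ($status == 200) {
     print "$_\n" for @$payload;
 } else {
     warn "web_search failed: $status $reason\n";
 }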

Arguments ('*' denotes required arguments):

=over 4

=item * B<action> => I<str> (default: "open_url")

What to do with the URLs.

Instead of opening the queries in a browser (C<open_url>), you can choose
another action.

B<Printing search URLs>: C<print_url> will print the search URL.
C<print_html_link> will print the HTML link (the <a> tag). And C<print_org_link>
will print the Org-mode link, e.g. C<[[url...][query]]>.

B<Saving search result HTMLs>: C<save_html> will first visit each search URL
(currently using L<Firefox::Marionette>) then save each result page to a file
named C<< E<lt>numE<gt>-E<lt>queryE<gt>.html >> in the current directory. Existing files will not be
overwritten; the utility will save to C<*.html.1>, C<*.html.2> and so on instead.

B<Extracting search result links>: C<print_result_link> will first visit each
search URL (currently using L<Firefox::Marionette>) then extract result links
and print them. C<print_result_html_link> and C<print_result_org_link> are
similar but will instead format each link as an HTML link and an Org link,
respectively.

The C<print_result_*link> actions are not very useful for some search engines
like Google because the result HTML page is obfuscated. Thus we can only extract
all links on each page rather than selecting (via DOM) only the actual search
result entry links.

If you want to filter the links further by domain, path, etc. you can use
L<grep-url>.

=item * B<append> => I<str>

String to add at the end of each query.

=item * B<delay> => I<duration>

Delay between opening each query.

As an alternative to the C<--delay> option, you can also use C<--min-delay> and
C<--max-delay> to set a random delay between a minimum and maximum value.

=item * B<engine> => I<str> (default: "google")

Search engine to use.

=item * B<max_delay> => I<duration>

Delay between opening each query.


