App-GoogleSearchUtils


lib/App/GoogleSearchUtils.pm


This utility can save you time when you want to open multiple queries (with
common prefix/suffix words added) or specify options such as a time limit. It
formulates the search URL(s), then opens them in a browser. You can also have it
print the URLs instead.

Aside from standard web search, you can also generate/open other kinds of
searches, such as image, video, news, or map.

_
    args => {
        queries => {
            'x.name.is_plural' => 1,
            'x.name.singular' => 'query',
            schema => ['array*', of=>'str*', min_len=>1],
            pos => 0,
            slurpy => 1,
        },
        queries_from => {
            summary => 'Supply queries from lines of text file (specify "-" for stdin)',
            schema => 'filename*',
        },
        delay => {
            summary => 'Delay between opening each query',
            schema => 'duration*',
            description => <<'_',

As an alternative to the `--delay` option, you can also use `--min-delay` and
`--max-delay` to set a random delay between a minimum and maximum value.
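
As an illustration only, a random delay between two bounds could be picked as in
the sketch below. The helper name `pick_delay` is hypothetical and is not part
of this module; the bounds are assumed to be plain numbers of seconds.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's actual code): pick a delay in
# seconds, uniformly distributed in the range [$min, $max).
sub pick_delay {
    my ($min, $max) = @_;
    ($min, $max) = ($max, $min) if $min > $max;   # tolerate swapped bounds
    return $min if $min == $max;                  # rand(0) would act like rand(1)
    return $min + rand($max - $min);
}

my $delay = pick_delay(5, 10);
printf "sleeping for %.1f seconds\n", $delay;
```

The explicit equal-bounds check matters because Perl's `rand(0)` behaves like
`rand(1)`, which would otherwise add up to one extra second.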

_
        },
        min_delay => {
            summary => 'Minimum delay between opening each query',
            schema => 'duration*',
            description => <<'_',

As an alternative to the `--min-delay` and `--max-delay` options, you can also
use `--delay` to set a constant delay between requests.

_
        },
        max_delay => {
            summary => 'Maximum delay between opening each query',
            schema => 'duration*',
        },
        prepend => {
            summary => 'String to add at the beginning of each query',
            schema => 'str*',
        },
        append => {
            summary => 'String to add at the end of each query',
            schema => 'str*',
        },
        num => {
            summary => 'Number of results per page',
            schema => 'posint*',
            default => 100,
        },
        time_start => {
            schema => ['date*', 'x.perl.coerce_rules' => ['From_str::natural'], 'x.perl.coerce_to'=>'DateTime'],
            tags => ['category:time-period-criteria'],
        },
        time_end => {
            schema => ['date*', 'x.perl.coerce_rules' => ['From_str::natural'], 'x.perl.coerce_to'=>'DateTime'],
            tags => ['category:time-period-criteria'],
        },
        time_past => {
            summary => 'Limit time period to the past hour/24hour/day/week/month/year',
            schema => ['str*', in=>[qw/hour 24hour day week month year/]],
            tags => ['category:time-period-criteria'],
        },
        action => {
            summary => 'What to do with the URLs',
            schema => ['str*', in=>[qw/
                                          open_url
                                          print_url print_html_link print_org_link
                                          save_html
                                          print_result_link
                                          print_result_html_link
                                          print_result_org_link
                                      /]],
            default => 'open_url',
            cmdline_aliases => {
                open_url               => {is_flag=>1, summary=>'Alias for --action=open_url'       , code=>sub {$_[0]{action}='open_url'       }},
                print_url              => {is_flag=>1, summary=>'Alias for --action=print_url'      , code=>sub {$_[0]{action}='print_url'      }},
                print_html_link        => {is_flag=>1, summary=>'Alias for --action=print_html_link', code=>sub {$_[0]{action}='print_html_link'}},
                print_org_link         => {is_flag=>1, summary=>'Alias for --action=print_org_link' , code=>sub {$_[0]{action}='print_org_link' }},
                save_html              => {is_flag=>1, summary=>'Alias for --action=save_html'      , code=>sub {$_[0]{action}='save_html'      }},
                print_result_link      => {is_flag=>1, summary=>'Alias for --action=print_result_link'     , code=>sub {$_[0]{action}='print_result_link'      }},
                print_result_html_link => {is_flag=>1, summary=>'Alias for --action=print_result_html_link', code=>sub {$_[0]{action}='print_result_html_link' }},
                print_result_org_link  => {is_flag=>1, summary=>'Alias for --action=print_result_org_link' , code=>sub {$_[0]{action}='print_result_org_link'  }},
            },
            description => <<'_',

Instead of opening the queries in a browser (`open_url`), you can choose another
action.

**Printing search URLs**: `print_url` will print the search URL.
`print_html_link` will print the HTML link (the <a> tag). And `print_org_link`
will print the Org-mode link, e.g. `[[url...][query]]`.
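
As a rough sketch of the three output formats, the code below builds one search
URL and prints it as a plain URL, an HTML link, and an Org-mode link. The URL
layout (`q` and `num` parameters) and both helper functions are assumptions made
for illustration, not this module's actual implementation.

```perl
use strict;
use warnings;

# Illustrative helper: percent-encode a string using only core Perl.
sub uri_escape_simple {
    my ($str) = @_;
    utf8::encode($str);    # operate on UTF-8 bytes
    $str =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $str;
}

# Illustrative helper: escape characters special in HTML text/attributes.
sub html_escape_simple {
    my ($str) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $str =~ s/([&<>"])/$ent{$1}/g;
    return $str;
}

my $query = "perl heredoc";
# Assumed URL layout; the real utility may add other parameters.
my $url = "https://www.google.com/search?num=100&q=" . uri_escape_simple($query);

print "$url\n";                                  # comparable to print_url
printf qq(<a href="%s">%s</a>\n),                # comparable to print_html_link
    html_escape_simple($url), html_escape_simple($query);
print "[[$url][$query]]\n";                      # comparable to print_org_link
```

Note that the `&` separating URL parameters has to be written as `&amp;` inside
the HTML `href` attribute, while the Org-mode link takes the URL unchanged.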

**Saving search result HTMLs**: `save_html` will first visit each search URL
(currently using <pm:Firefox::Marionette>) then save each result page to a file
named `<num>-<query>.html` in the current directory. Existing files will not be
overwritten; the utility will save to `*.html.1`, `*.html.2` and so on instead.
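
The no-overwrite naming scheme can be sketched as follows. The helper name
`free_filename` is hypothetical, and the sketch only checks for existing files;
it does not guard against another process creating the same file in between.

```perl
use strict;
use warnings;

# Hypothetical helper: return $path if it is unused, otherwise the first of
# "$path.1", "$path.2", ... that does not exist yet.
sub free_filename {
    my ($path) = @_;
    return $path unless -e $path;
    my $i = 1;
    $i++ while -e "$path.$i";
    return "$path.$i";
}

# Prints the name unchanged unless such a file already exists here.
print free_filename("1-perl heredoc.html"), "\n";
```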

**Extracting search result links**: `print_result_link` will first visit each
search URL (currently using <pm:Firefox::Marionette>) then extract result links
and print them. `print_result_html_link` and `print_result_org_link` are similar
but will instead format each link as an HTML link and an Org link,
respectively.

Currently the `print_result_*link` actions are not very useful because the
result HTML page is now obfuscated by Google. Thus we can only extract all links
on each page instead of selecting (via DOM) only the actual search result entry
links.

If you want to filter the links further by domain, path, etc., you can use
<prog:grep-url>.


_
        },
        type => {


