Net-Async-WebSearch

applied. At the Search plan rate of C<$5 / 1000 requests> that's about
1000 queries/month. B<You must pick a plan on signup even to use the
free credits>, and B<a credit card is required> as an anti-fraud check
(not charged while you stay within the credit allowance). API key is
minted at L<https://api.search.brave.com/app/dashboard>.

=item * B<Serper.dev> — L<https://serper.dev>

Best free-tier deal of the paid providers: B<2500 free queries on
signup, no credit card required>. After that, paid plans in the
~$1 / 1000 range. Google results behind a proxy, very fast. Sign up
on the homepage; the API key is shown in the dashboard afterward
(there is no standalone C</api-key> URL).

=item * B<Google Programmable Search> (Custom Search JSON API) —
L<https://programmablesearchengine.google.com>

Two things to set up, both free at low volume:

=over 4

=item 1. Create a Programmable Search Engine at the URL above. That
gives you the C<cx> value ("Search engine ID"). By default the PSE
is scoped to specific sites you list — to get full web results, open
I<Search features> for that engine and turn B<Search the entire web>
on. (Google has been steadily burying this toggle but it's still
there.)

=item 2. Enable the Custom Search API in a Google Cloud project
(L<https://console.cloud.google.com/apis/library/customsearch.googleapis.com>)
and create an API key under I<Credentials>. No credit card needed at
the free tier.

=back

Quota: 100 free queries/day. Paid: $5 / 1000, capped at 10,000/day.
Results per call capped at 10.

=item * B<Yandex Search API>

=over 4

=item Signup:  L<https://console.yandex.cloud/link/search-api/>

=item Docs:    L<https://yandex.cloud/en/docs/search-api/>

=back

Requires a Yandex Cloud account and a "folder" (their project-scope
concept — the folder id is your C<folderid>). Pricing is via Yandex
Cloud credits; a free trial exists via the standard Cloud welcome
credits. API key: create a service account in the Cloud Console,
grant it the C<search-api.executor> role, then generate an API key
(C<apikey>) or IAM token — that's your C<api_key>.

=item * B<Reddit> (public JSON) — no key

Works out of the box, but Reddit aggressively rate-limits generic
User-Agent strings. Fine for low-volume use; for anything serious use
OAuth (below).

=item * B<Reddit OAuth> — L<https://www.reddit.com/prefs/apps>

Free. You need a Reddit account and a working User-Agent string
(Reddit insists on the form C<< app/1.0 by /u/yourname >>). At the
bottom of L<https://www.reddit.com/prefs/apps> click I<create app>,
pick type B<script> (for C<client_credentials>/C<password>) or
B<installed> (for C<installed>) or B<web> (for the full
C<authorization_code> consent flow). The short string under the app
name is C<client_id>; C<secret> is shown once on creation. Rate limit
is 100 QPM per OAuth identity. See
L<Net::Async::WebSearch::Provider::Reddit::OAuth/SETUP> for the full
walkthrough.

=back
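
As a rough illustration of how the two Google values (C<cx> and the API key)
fit together, the sketch below builds a Custom Search JSON API request URL by
hand. It bypasses this module entirely; the C<google_cse_url> helper and the
C<GOOGLE_API_KEY> / C<GOOGLE_CX> environment variable names are made up for
the example.

    use strict;
    use warnings;
    use HTTP::Tiny;

    # Hypothetical helper, not part of Net::Async::WebSearch.
    sub google_cse_url {
      my (%arg) = @_;
      my $num = $arg{num} // 10;
      $num = 10 if $num > 10;    # results per call are capped at 10
      my $query = HTTP::Tiny->new->www_form_urlencode({
        key => $arg{key},        # API key from Cloud Console "Credentials"
        cx  => $arg{cx},         # "Search engine ID" of the PSE
        q   => $arg{q},
        num => $num,
      });
      return "https://www.googleapis.com/customsearch/v1?$query";
    }

    print google_cse_url(
      key => $ENV{GOOGLE_API_KEY} // 'YOUR_API_KEY',
      cx  => $ENV{GOOGLE_CX}      // 'YOUR_CX',
      q   => 'perl async',
      num => 25,                 # clamped to 10 by the helper
    ), "\n";

Fetching that URL with any HTTP client is a quick way to confirm the key and
C<cx> work before wiring them into a provider.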

Summary table:

  Provider         Free tier                        CC?  Key source
  ---------------- -------------------------------- ---- --------------------------------------
  DuckDuckGo       unlimited (HTML scrape)          no   (no key)
  SearxNG          self-hosted, unlimited           no   (self-host; see ex/docker-compose.*)
  Brave            $5/month credits (~1000 q)       yes  api.search.brave.com/app/dashboard
  Serper           2500 / signup                    no   serper.dev (dashboard after signup)
  Google CSE       100 / day                        no   Cloud Console + programmablesearchengine.google.com
  Yandex           Cloud trial credits              no   console.yandex.cloud/link/search-api/
  Reddit           keyless (rate-limited)           no   (no key)
  Reddit OAuth     100 QPM per client_id            no   reddit.com/prefs/apps

=head2 Fetching result bodies

Pass C<fetch =E<gt> N> to any of the search modes to additionally GET the top
N result URLs and attach the response to each C<Result> under C<< $r->fetched >>
(see L<Net::Async::WebSearch::Result/fetched> for the hash shape). You still
get the full search result list — fetch is I<additive>.

Semantics per mode:

=over 4

=item * C<collect> — fetches the top C<N> URLs I<after> RRF dedup/ranking, so
every URL is hit at most once no matter how many providers surfaced it.

=item * C<stream> — fetches the first C<N> unique URLs in arrival order, kicked
off the moment C<on_result> fires for each. An optional C<on_fetch> coderef
fires per result once its fetch settles. The outer Future resolves once every
search I<and> every fetch has finished.

=item * C<race> — fetches the top C<N> of the winning provider's list.

=back
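
To see why C<collect> can hit each URL at most once, here is a minimal,
self-contained sketch of reciprocal rank fusion followed by a top-C<N> cut. It
assumes the conventional constant C<k = 60> and an alphabetical tie-break; the
C<rrf_top_urls> helper is hypothetical, and the module's own constant and
tie-breaking may differ.

    use strict;
    use warnings;

    # Illustrative only, not the module's implementation.
    sub rrf_top_urls {
      my ($n, $k, @lists) = @_;
      my %score;
      for my $list (@lists) {
        my (%seen, $rank);
        for my $url (@$list) {
          $rank++;
          next if $seen{$url}++;              # count a URL once per provider
          $score{$url} += 1 / ($k + $rank);   # RRF contribution for this list
        }
      }
      # Highest fused score first; ties broken by URL for a stable order.
      my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
      $n = @ranked if $n > @ranked;
      return @ranked[0 .. $n - 1];
    }

    my @to_fetch = rrf_top_urls(2, 60,
      [ 'https://a.example/', 'https://b.example/' ],   # provider 1
      [ 'https://b.example/', 'https://c.example/' ],   # provider 2
    );
    print "$_\n" for @to_fetch;

Because C<https://b.example/> is surfaced by both providers, its contributions
add up and it ranks first, ahead of C<https://a.example/>. It still appears
only once in the list handed to the fetch stage.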

Knobs (constructor defaults, all overridable per call):

=over 4

=item * C<fetch_concurrency> — global cap on parallel in-flight fetches
(default 100). In C<collect>/C<race> this is the C<concurrent> arg to
L<Future::Utils/fmap_void>. In C<stream> it's the ceiling for fetches queued
on result arrival.

=item * C<fetch_concurrency_per_target_ip> — per-host cap (default 5). Wired
to L<Net::Async::HTTP>'s C<max_connections_per_host> on the shared HTTP
client. Keeps you from hammering a single origin even when the global pool
has headroom. Currently this is B<per-hostname>, not per-resolved-IP;
different names pointing at the same CDN edge are counted separately.

=item * C<fetch_timeout> — seconds per request, passed straight to
L<Net::Async::HTTP>.

=item * C<fetch_max_bytes> — truncate the response body to this many bytes.

=item * C<fetch_user_agent> — User-Agent for fetch requests. Default is the
library's own UA; set it to something representative if you care about
politeness.

=item * C<fetch_accept> — per-call Accept header (e.g. C<text/html>).

=back
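
To make the per-hostname caveat concrete, the sketch below buckets URLs the
way a per-host cap would see them. The C<host_of> helper and the example
hostnames are hypothetical, and IPv6 literals are not handled; the real limit
is enforced inside the HTTP client, not by code like this.

    use strict;
    use warnings;

    # Hypothetical helper: pull the lowercase hostname out of an http(s) URL.
    sub host_of {
      my ($url) = @_;
      return $url =~ m{^https?://(?:[^/@]*@)?([^/:?#]+)}i ? lc $1 : undef;
    }

    my @urls = (
      'https://www.example.com/a',
      'https://www.example.com/b',
      'https://cdn.example.com/c',
    );

    my %per_host;
    for my $url (@urls) {
      my $host = host_of($url);
      next unless defined $host;
      $per_host{$host}++;        # one bucket per name, not per resolved IP
    }
    printf "%-20s %d\n", $_, $per_host{$_} for sort keys %per_host;

Here C<www.example.com> and C<cdn.example.com> land in separate buckets even
if both names resolve to the same address, so each gets its own allowance
under the per-host cap.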

This feature is deliberately separate from the provider plumbing — providers
hand back search results only. Fetching is for use-cases like RAG, crawling,
and summarization where you want the actual page bodies, and is optional for
MCP-style consumers that only care about the search hits themselves.

=head2 Stacking providers

You can register multiple instances of the same provider class — five SearxNG


