Net-Async-WebSearch
applied. At the Search plan rate of C<$5 / 1000 requests> that's about
1000 queries/month. B<You must pick a plan on signup even to use the
free credits>, and B<a credit card is required> as an anti-fraud check
(not charged while you stay within the credit allowance). The API key is
minted at L<https://api.search.brave.com/app/dashboard>.
=item * B<Serper.dev> - L<https://serper.dev>
Best free-tier deal of the paid providers: B<2500 free queries on
signup, no credit card required>. After that, paid plans in the
~$1 / 1000 range. Google results behind a proxy, very fast. Sign up
on the homepage; the API key is shown in the dashboard afterward
(there is no standalone C</api-key> URL).
=item * B<Google Programmable Search> (Custom Search JSON API) -
L<https://programmablesearchengine.google.com>
Two things to set up, both free at low volume:
=over 4
=item 1. Create a Programmable Search Engine at the URL above. That
gives you the C<cx> value ("Search engine ID"). By default the PSE
is scoped to specific sites you list; to get full web results, open
I<Search features> for that engine and turn B<Search the entire web>
on. (Google has been steadily burying this toggle but it's still
there.)
=item 2. Enable the Custom Search API in a Google Cloud project
(L<https://console.cloud.google.com/apis/library/customsearch.googleapis.com>)
and create an API key under I<Credentials>. No credit card needed at
the free tier.
=back
Quota: 100 free queries/day. Paid: $5 / 1000, capped at 10,000/day.
Results per call capped at 10.
=item * B<Yandex Search API>
=over 4
=item Signup: L<https://console.yandex.cloud/link/search-api/>
=item Docs: L<https://yandex.cloud/en/docs/search-api/>
=back
Requires a Yandex Cloud account and a "folder" (their project-scope
concept; the folder ID is your C<folderid>). Pricing is via Yandex
Cloud credits; a free trial exists via the standard Cloud welcome
credits. API key: create a service account in the Cloud Console,
grant it the C<search-api.executor> role, then generate an API key
(C<apikey>) or IAM token; that's your C<api_key>.
=item * B<Reddit> (public JSON) - no key
Works out of the box, but requests with a generic User-Agent are
rate-limited aggressively. Fine for low-volume use; for anything serious
use OAuth (below).
=item * B<Reddit OAuth> - L<https://www.reddit.com/prefs/apps>
Free. You need a Reddit account and a working User-Agent string
(Reddit insists on the form C<< app/1.0 by /u/yourname >>). At the
bottom of L<https://www.reddit.com/prefs/apps> click I<create app>,
pick type B<script> (for C<client_credentials>/C<password>) or
B<installed> (for C<installed>) or B<web> (for the full
C<authorization_code> consent flow). The short string under the app
name is C<client_id>; C<secret> is shown once on creation. Rate limit
is 100 QPM per OAuth identity. See
L<Net::Async::WebSearch::Provider::Reddit::OAuth/SETUP> for the full
walkthrough.
=back
Summary table:

    Provider          Free tier                         CC?   Key source
    ----------------  --------------------------------  ----  --------------------------------------
    DuckDuckGo        unlimited (HTML scrape)           no    (no key)
    SearxNG           self-hosted, unlimited            no    (self-host; see ex/docker-compose.*)
    Brave             $5/month credits (~1000 q)        yes   api.search.brave.com/app/dashboard
    Serper            2500 / signup                     no    serper.dev (dashboard after signup)
    Google CSE        100 / day                         no    Cloud Console + programmablesearchengine.google.com
    Yandex            Cloud trial credits               no    console.yandex.cloud/link/search-api/
    Reddit            keyless (rate-limited)            no    (no key)
    Reddit OAuth      100 QPM per client_id             no    reddit.com/prefs/apps

=head2 Fetching result bodies
Pass C<fetch =E<gt> N> to any of the search modes to additionally GET the top
N result URLs and attach the response to each C<Result> under C<< $r->fetched >>
(see L<Net::Async::WebSearch::Result/fetched> for the hash shape). You still
get the full search result list; fetch is I<additive>.
Semantics per mode:
=over 4
=item * C<collect> - fetches the top C<N> URLs I<after> RRF dedup/ranking, so
every URL is hit at most once no matter how many providers surfaced it.
=item * C<stream> - fetches the first C<N> unique URLs in arrival order, kicked
off the moment C<on_result> fires for each. An optional C<on_fetch> coderef
fires per result once its fetch settles. The outer Future resolves after every
search I<and> every fetch is done.
=item * C<race> - fetches the top C<N> of the winning provider's list.
=back
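The C<collect> semantics above can be pictured with a small, self-contained sketch. It is not the module's implementation: the C<rrf_top_n> helper, the sample provider lists, and the smoothing constant C<60> (a value commonly used for Reciprocal Rank Fusion) are assumptions made for illustration. It merges several ranked URL lists, scores each distinct URL once, and returns the top C<N> candidates that a fetch step would then request.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::Async::WebSearch.
# Merge ranked URL lists with Reciprocal Rank Fusion (RRF) and
# return the $n best-scoring distinct URLs.
sub rrf_top_n {
    my ( $n, $k, @lists ) = @_;
    my %score;
    for my $list (@lists) {
        my %seen;    # a URL counts once per provider list
        my $rank = 0;
        for my $url (@$list) {
            $rank++;
            next if $seen{$url}++;
            $score{$url} += 1 / ( $k + $rank );
        }
    }

    # Highest score first; ties broken by URL so the order is deterministic.
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    $#ranked = $n - 1 if @ranked > $n;
    return @ranked;
}

# Two made-up provider result lists that overlap on two URLs.
my @provider_a = qw( https://a.example/ https://b.example/ https://c.example/ );
my @provider_b = qw( https://b.example/ https://d.example/ https://a.example/ );

# Each URL appears at most once here, however many providers returned it.
my @to_fetch = rrf_top_n( 2, 60, \@provider_a, \@provider_b );
print "$_\n" for @to_fetch;
```

With these inputs the two URLs returned by both providers outrank the single-provider ones, so only those two would be fetched, each exactly once.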
Knobs (constructor defaults, all overridable per call):
=over 4
=item * C<fetch_concurrency> - global cap on parallel in-flight fetches
(default 100). In C<collect>/C<race> this is the C<concurrent> arg to
L<Future::Utils/fmap_void>. In C<stream> it's the ceiling for fetches queued
on result arrival.
=item * C<fetch_concurrency_per_target_ip> - per-host cap (default 5). Wired
to L<Net::Async::HTTP>'s C<max_connections_per_host> on the shared HTTP
client. Keeps you from hammering a single origin even when the global pool
has headroom. Currently this is B<per-hostname>, not per-resolved-IP;
different names pointing at the same CDN edge are counted separately.
=item * C<fetch_timeout> - seconds per request, passed straight to
L<Net::Async::HTTP>.
=item * C<fetch_max_bytes> - truncate the response body to this many bytes.
=item * C<fetch_user_agent> - User-Agent for fetch requests. Default is the
library's own UA; set it to something representative if you care about
politeness.
=item * C<fetch_accept> - per-call Accept header (e.g. C<text/html>).
=back
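The "constructor defaults, overridable per call" pattern can be sketched with a plain hash merge. The knob names come from the list above, as do the defaults of 100 and 5. The C<effective_fetch_opts> helper and the timeout and byte-limit values are hypothetical, and the module may resolve options differently internally.

```perl
use strict;
use warnings;

# Constructor-level defaults. 100 and 5 are the documented defaults;
# the timeout and byte limit are illustrative assumptions.
my %defaults = (
    fetch_concurrency               => 100,
    fetch_concurrency_per_target_ip => 5,
    fetch_timeout                   => 10,         # assumed value
    fetch_max_bytes                 => 262_144,    # assumed value
);

# Hypothetical helper: per-call options win over constructor defaults.
sub effective_fetch_opts {
    my (%per_call) = @_;
    return ( %defaults, %per_call );
}

# One call tightens the timeout and adds an Accept header.
my %opts = effective_fetch_opts(
    fetch_timeout => 3,
    fetch_accept  => 'text/html',
);
printf "%-32s %s\n", $_, $opts{$_} for sort keys %opts;

# fetch_max_bytes modelled as a simple truncation of a response body.
my $body = 'x' x 300_000;
my $kept = substr( $body, 0, $opts{fetch_max_bytes} );
print length($kept), " bytes kept\n";
```

Later keys in a Perl list assignment override earlier ones, which is what makes the per-call value for C<fetch_timeout> replace the default while untouched knobs keep their constructor values.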
This feature is deliberately separate from the provider plumbing: providers
hand back search results only. Fetching is for use-cases like RAG, crawling,
and summarization where you want the actual page bodies, and is optional for
MCP-style consumers that only care about the search hits themselves.
=head2 Stacking providers
You can register multiple instances of the same provider class: five SearxNG