WWW-Firecrawl
lib/WWW/Firecrawl.pm
__END__
=pod

=encoding UTF-8

=head1 NAME

WWW::Firecrawl - Firecrawl v2 API bindings (self-host first, cloud compatible)

=head1 VERSION

version 0.001

=head1 SYNOPSIS

    use WWW::Firecrawl;

    # Self-hosted
    my $fc = WWW::Firecrawl->new(
        base_url => 'http://localhost:3002',
    );

    # Cloud (alternative to the above; base_url falls back to its default)
    my $fc = WWW::Firecrawl->new(
        api_key => 'fc-...',
    );

    # Synchronous calls (uses LWP::UserAgent)
    my $doc     = $fc->scrape( url => 'https://example.com', formats => ['markdown'] );
    my $links   = $fc->map( url => 'https://example.com' );
    my $results = $fc->search( query => 'perl firecrawl', limit => 5 );
    my $job     = $fc->crawl( url => 'https://example.com', limit => 50 );
    my $status  = $fc->crawl_status( $job->{id} );

    # Request builders (bring your own UA / async framework)
    # $my_ua is any user agent that can send an HTTP::Request
    my $req  = $fc->scrape_request( url => 'https://example.com' );
    my $res  = $my_ua->request($req);
    my $data = $fc->parse_scrape_response($res);

=head1 DESCRIPTION

Firecrawl (L<https://firecrawl.dev>, L<https://github.com/firecrawl/firecrawl>)
is an open-source web scraping and crawling API. This module provides Perl
bindings for its v2 API, with a focus on self-hosted deployments (the hosted
cloud service works too).

Every endpoint is exposed in three flavours:

=over 4

=item * C<< $fc->foo_request(%args) >>: returns an L<HTTP::Request> and performs no network I/O.

=item * C<< $fc->parse_foo_response($http_response) >>: decodes the JSON body, dies on error, and returns the payload.

=item * C<< $fc->foo(%args) >>: convenience wrapper that builds the request, sends it via L<LWP::UserAgent>, and parses the response.

=back

Because building requests and parsing responses are kept separate from network
I/O, the module is easy to use with any async framework; see
L<Net::Async::Firecrawl> for the L<IO::Async> integration.
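To make the request/parse split concrete, here is a sketch of the checks a C<parse_foo_response>-style step has to make before handing back a payload. It is hypothetical. C<parse_payload> is an invented name. It takes a plain hashref (C<code>, C<content>) instead of an L<HTTP::Response>, so it runs with core modules only. Returning the C<data> member of the body is an assumption about the v2 response envelope, not documented behaviour of this module.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical stand-in for a parse_foo_response step (not the module's code).
# Takes a plain hashref { code => ..., content => ... } instead of an
# HTTP::Response so it runs with core modules only.
sub parse_payload {
    my ($res) = @_;

    # Non-2xx HTTP status: an API error.
    die "api: HTTP $res->{code}\n"
        unless $res->{code} >= 200 && $res->{code} < 300;

    # Body that is not a JSON object: also an API error.
    my $body = eval { JSON::PP::decode_json( $res->{content} ) };
    die "api: invalid or non-object JSON\n" unless ref $body eq 'HASH';

    # A 2xx response can still report { "success": false }.
    die "api: success=false\n" unless $body->{success};

    # Assumption: the useful payload lives under "data".
    return $body->{data};
}

my $data = parse_payload({
    code    => 200,
    content => '{"success":true,"data":{"markdown":"# Example"}}',
});
print $data->{markdown}, "\n";    # prints "# Example"
```

The convenience method C<foo> is then just "build, send, parse"; an async caller swaps only the "send" step.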

=head1 ERROR HANDLING

All failures throw a L<WWW::Firecrawl::Error> object. It stringifies to its
message, so existing C<< die "..." >> / C<$@>-matching code keeps working.

There are five error types:

=over 4

=item * C<transport>: could not reach Firecrawl (DNS, connect, TLS, or socket failure).

=item * C<api>: Firecrawl returned a non-2xx HTTP response, invalid JSON, or C<< {success: false} >>.

=item * C<job>: for flows that use C<*_status>, the Firecrawl job ended with
status C<failed> or C<cancelled>.

=item * C<scrape>: for a single scrape, the target URL was classified as failed
by L</is_failure>. Only thrown when L</strict> is on.

=item * C<page>: surfaced in the C<failed[]> arrayref of
L</scrape_many> / L</retry_failed_pages>. An individual URL's scrape
was classified as failed, but the overall operation continued.

=back
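The stringification guarantee is the standard L<overload> technique. The hypothetical class below sketches how one object can carry the five types and C<is_*> predicates while still matching old C<$@ =~ /.../> checks. The class name C<My::Firecrawl::Error> and its C<type>, C<message>, and C<url> fields are invented for this example. It is an illustration only; see L<WWW::Firecrawl::Error> for the real constructor and accessors.

```perl
use strict;
use warnings;

package My::Firecrawl::Error;    # hypothetical, not WWW::Firecrawl::Error

# Stringify to the message so "die $err" output and regex checks on $@
# behave as they would for a plain string exception. bool is forced true
# so an empty message never makes the error object look false.
use overload
    '""'     => sub { $_[0]->{message} },
    bool     => sub { 1 },
    fallback => 1;

my @TYPES = qw(transport api job scrape page);

sub new {
    my ( $class, %args ) = @_;
    die "unknown error type\n"
        unless defined $args{type} && grep { $_ eq $args{type} } @TYPES;
    return bless {%args}, $class;
}

sub type    { $_[0]->{type} }
sub message { $_[0]->{message} }
sub url     { $_[0]->{url} }

# Generate one predicate per type: is_transport, is_api, and so on.
for my $t (@TYPES) {
    no strict 'refs';
    *{ __PACKAGE__ . "::is_$t" } = sub { $_[0]->{type} eq $t };
}

package main;

my $err;
eval {
    die My::Firecrawl::Error->new(
        type    => 'scrape',
        message => 'target returned 404',
        url     => 'https://example.com/missing',
    );
    1;
} or $err = $@;

print "caught: $err\n";                          # prints "caught: target returned 404"
print "url: ", $err->url, "\n" if $err->is_scrape;
```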

Retries are automatic for C<transport> errors and for C<api> errors with a
retryable status (see L</retry_statuses>). They never happen for C<job>,
C<scrape>, or C<page> errors: Firecrawl already retries target-level failures
server-side, and re-running a failed job is the caller's decision. See
L</retry_failed_pages> for the manual re-scrape helper.
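The retry policy reduces to a small predicate. In this sketch, C<should_retry> and the example status set (429, 502, 503, 504) are assumptions chosen for illustration. They are not the module's documented L</retry_statuses> defaults.

```perl
use strict;
use warnings;

# Hypothetical retry predicate mirroring the policy described above.
#   $type   - one of transport, api, job, scrape, page
#   $status - HTTP status for api errors (undef otherwise)
#   $retry  - hashref used as a set of retryable statuses
sub should_retry {
    my ( $type, $status, $retry ) = @_;
    return 1 if $type eq 'transport';    # could not reach Firecrawl: retry
    return 0 unless $type eq 'api';      # job / scrape / page: never retry
    return defined $status && $retry->{$status} ? 1 : 0;
}

# Illustrative set only; not the module's defaults.
my %retryable = map { $_ => 1 } 429, 502, 503, 504;

for my $case (
    [ transport => undef ],
    [ api       => 503   ],
    [ api       => 400   ],
    [ job       => undef ],
) {
    my ( $type, $status ) = @$case;
    printf "%-9s %-5s -> %s\n", $type, $status // '-',
        should_retry( $type, $status, \%retryable ) ? 'retry' : 'give up';
}
```

Keeping the decision in one predicate makes it easy to see that target-level failures are never retried client-side.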
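
Usage:

    use Try::Tiny;
    use Scalar::Util qw(blessed);

    try {
        my $data = $fc->scrape( url => $u, strict => 1 );
        ...
    }
    catch {
        my $e = $_;
        # blessed() guards against plain strings and unblessed refs,
        # on which ->isa would itself die.
        if ( blessed($e) && $e->isa('WWW::Firecrawl::Error') ) {
            if    ( $e->is_transport ) { ... }
            elsif ( $e->is_scrape )    { warn "target dead: ", $e->url, "\n" }
            else                       { warn "firecrawl: $e\n" }
        }
        else {
            die $e;    # not a Firecrawl error: rethrow
        }
    };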

=head2 base_url

Base URL of the Firecrawl server. Defaults to C<$ENV{FIRECRAWL_BASE_URL}> if
set, otherwise C<https://api.firecrawl.dev>.

=head2 api_key

Bearer token for authentication. Defaults to C<$ENV{FIRECRAWL_API_KEY}>.
Optional: self-hosted instances can run without auth.
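The precedence described for L</base_url> and L</api_key> can be sketched as follows: an explicit argument wins, then the environment, then the cloud default for the URL. C<resolve_settings> is a hypothetical helper, not part of the module's API. The exact C<Authorization: Bearer ...> header format is the usual bearer-token convention, assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Firecrawl): resolves settings the
# way the base_url and api_key attributes describe. An explicit argument
# wins, then the environment, then (for base_url only) the cloud endpoint.
sub resolve_settings {
    my (%args) = @_;
    my $base_url = $args{base_url}
        // $ENV{FIRECRAWL_BASE_URL}
        // 'https://api.firecrawl.dev';
    my $api_key = $args{api_key} // $ENV{FIRECRAWL_API_KEY};

    # Self-hosted instances may run without auth, so the bearer-token
    # header is only added when a key is actually configured.
    my %headers;
    $headers{Authorization} = "Bearer $api_key" if defined $api_key;

    return ( $base_url, \%headers );
}

{
    # Work on a copy of the environment with both variables removed.
    local %ENV = %ENV;
    delete @ENV{qw(FIRECRAWL_BASE_URL FIRECRAWL_API_KEY)};

    my ( $url, $headers ) = resolve_settings( base_url => 'http://localhost:3002' );
    print "$url auth=", ( exists $headers->{Authorization} ? 'yes' : 'no' ), "\n";
    # prints "http://localhost:3002 auth=no"
}
```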
=head2 api_version