__END__

=pod

=encoding UTF-8

=head1 NAME

WWW::Firecrawl - Firecrawl v2 API bindings (self-host first, cloud compatible)

=head1 VERSION

version 0.001

=head1 SYNOPSIS

  use WWW::Firecrawl;

  # Self-hosted
  my $fc = WWW::Firecrawl->new(
    base_url => 'http://localhost:3002',
  );

  # Cloud (alternative to the self-hosted constructor above)
  my $fc = WWW::Firecrawl->new(
    api_key => 'fc-...',
  );

  # Synchronous calls (uses LWP::UserAgent)
  my $doc     = $fc->scrape( url => 'https://example.com', formats => ['markdown'] );
  my $links   = $fc->map( url => 'https://example.com' );
  my $results = $fc->search( query => 'perl firecrawl', limit => 5 );

  my $job    = $fc->crawl( url => 'https://example.com', limit => 50 );
  my $status = $fc->crawl_status( $job->{id} );

  # Request builders (bring your own UA / async framework).
  # $my_ua is any object whose request() accepts an HTTP::Request,
  # for example an LWP::UserAgent.
  my $req  = $fc->scrape_request( url => 'https://example.com' );
  my $res  = $my_ua->request($req);
  my $data = $fc->parse_scrape_response($res);

=head1 DESCRIPTION

Firecrawl (L<https://firecrawl.dev>, L<https://github.com/firecrawl/firecrawl>)
is an open-source web scraping and crawling API. This module provides Perl
bindings for the v2 API, with a focus on self-hosted deployments (cloud works
too).

Every endpoint is exposed in three flavours:

=over 4

=item * C<< $fc->foo_request(%args) >> — returns an L<HTTP::Request>, no network I/O

=item * C<< $fc->parse_foo_response($http_response) >> — decodes JSON, dies on error, returns the payload

=item * C<< $fc->foo(%args) >> — convenience: builds, fires via L<LWP::UserAgent>, parses

=back

Because the request builders and response parsers perform no network I/O
themselves, the module can be driven by any HTTP client or async framework;
see L<Net::Async::Firecrawl> for the L<IO::Async> integration.

=head1 ERROR HANDLING

All failures throw a L<WWW::Firecrawl::Error> object. The object stringifies
to its message, so existing code that catches a C<< die "..." >> string and
matches against C<$@> keeps working.
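
The stringification behaviour can be illustrated with a stand-in class. This
is a minimal sketch, not the implementation of L<WWW::Firecrawl::Error>: the
package C<My::DemoError> and its C<message> and C<type> fields are
hypothetical, and the sketch only assumes that the real class overloads
string conversion in a broadly similar way.

  package My::DemoError;    # hypothetical stand-in, not WWW::Firecrawl::Error
  use strict;
  use warnings;
  use overload
    '""'     => sub { $_[0]{message} },    # string context yields the message
    fallback => 1;                         # eq, =~ etc. fall back to the string

  sub new  { my ($class, %args) = @_; return bless {%args}, $class }
  sub type { $_[0]{type} }

  package main;

  # Legacy-style handling: match the error text in $@
  my $legacy_matched = 0;
  eval { die My::DemoError->new( type => 'api', message => 'HTTP 503 from server' ); 1 }
    or do { $legacy_matched = 1 if $@ =~ /503/ };

  # Object-style handling of the same kind of error
  my $type = '';
  eval { die My::DemoError->new( type => 'api', message => 'HTTP 503 from server' ); 1 }
    or do { my $e = $@; $type = $e->type if ref $e };

Both handlers see the same exception; the first treats it as text, the second
inspects it as an object.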

Five error types:

=over 4

=item * C<transport> — Could not reach Firecrawl (DNS / connect / TLS / socket).

=item * C<api> — Firecrawl returned a non-2xx HTTP response, invalid JSON, or C<< {success: false} >>.

=item * C<job> — For flows using C<*_status>: the Firecrawl job ended with
status C<failed> or C<cancelled>.

=item * C<scrape> — Single scrape: the target URL was classified as failed
by L</is_failure>. Only thrown when L</strict> is on.

=item * C<page> — Surfaced in the C<failed[]> arrayref of
L</scrape_many> / L</retry_failed_pages>: an individual URL's scrape
was classified as failed but the overall operation continued.

=back

Retries are automatic for C<transport> errors and for C<api> errors with a
retryable HTTP status (see L</retry_statuses>). C<job>, C<scrape>, and C<page>
errors are never retried automatically: Firecrawl already retries
target-level failures server-side, and re-running a failed job is a caller
decision. See L</retry_failed_pages> for the manual re-scrape helper.
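
The retry policy described above amounts to a small decision rule. The sketch
below restates it as a plain function for illustration only: C<should_retry>
is a hypothetical helper that is not part of this module, and the status list
is an assumed example rather than the actual L</retry_statuses> default.

  use strict;
  use warnings;

  # Assumed example list; consult retry_statuses for the real default.
  my @example_retry_statuses = (429, 502, 503, 504);

  # Hypothetical helper: would an error of this type and status be retried?
  sub should_retry {
    my ($type, $http_status) = @_;
    return 1 if $type eq 'transport';        # transport errors: always
    return 0 if $type ne 'api';              # job / scrape / page: never
    return 0 unless defined $http_status;    # no status to compare against
    return ( grep { $_ == $http_status } @example_retry_statuses ) ? 1 : 0;
  }

  my @decisions = (
    should_retry('transport'),    # 1
    should_retry('api', 503),     # 1 (status is in the example list)
    should_retry('api', 404),     # 0 (status is not in the example list)
    should_retry('job'),          # 0
  );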

Usage:

  use Try::Tiny;
  use Scalar::Util qw(blessed);

  try {
    my $data = $fc->scrape( url => $u, strict => 1 );
    ...
  }
  catch {
    my $e = $_;
    if (blessed $e && $e->isa('WWW::Firecrawl::Error')) {
      if    ($e->is_transport) { ... }                             # server unreachable
      elsif ($e->is_scrape)    { warn "target dead: ", $e->url }
      else                     { warn "firecrawl: $e" }            # api or job error
    }
    else {
      die $e;    # not a Firecrawl error: rethrow
    }
  };

=head2 base_url

Base URL of the Firecrawl server. Defaults to C<$ENV{FIRECRAWL_BASE_URL}> or
C<https://api.firecrawl.dev>.

=head2 api_key

Bearer token for authentication. Defaults to C<$ENV{FIRECRAWL_API_KEY}>.
Optional — self-hosted instances can run without auth.
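
The documented defaults suggest a simple precedence: an explicit constructor
argument first, then the environment variable, then (for C<base_url> only) the
cloud endpoint. A minimal sketch of that lookup order follows. C<resolve_config>
is a hypothetical helper written for illustration; it does not show how the
module actually implements its defaults.

  use strict;
  use warnings;

  # Hypothetical helper mirroring the documented precedence.
  sub resolve_config {
    my (%args) = @_;
    return (
      base_url => $args{base_url}
                  // $ENV{FIRECRAWL_BASE_URL}
                  // 'https://api.firecrawl.dev',
      api_key  => $args{api_key} // $ENV{FIRECRAWL_API_KEY},    # may be undef
    );
  }

  # Self-hosted setup driven purely by the environment, without auth
  my %cfg = do {
    local $ENV{FIRECRAWL_BASE_URL} = 'http://localhost:3002';
    delete local $ENV{FIRECRAWL_API_KEY};
    resolve_config();
  };

Here C<$cfg{base_url}> comes from the environment and C<$cfg{api_key}> stays
undefined, which matches the self-hosted case where no bearer token is sent.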

=head2 api_version


