=encoding utf8

=head1 NAME

App::Phoebe::SpeedBump - defend Phoebe against bots and leeches

=head1 DESCRIPTION

We want to block crawlers that are too fast or that don’t follow the
instructions in robots.txt. We do this by keeping a list of recent visitors: for
every IP number, we remember the timestamps of their last visits. If they make
more than 30 requests in 60s, we block them for an ever-increasing number of
seconds, starting with 60s and doubling every time this happens.

For every IP number, Phoebe also records whether the last 30 requests were
“suspicious” or not. A suspicious request is a request that is “disallowed” for
bots according to “robots.txt” (more or less). If 10 requests or more of the
last 30 requests in the last 60 seconds are suspicious, the IP number is
blocked.
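
The two rules above can be sketched in a few lines of Perl. This is only an
illustration and not the code used by this module: the function
C<should_block>, its argument list, and the way the limit of 10 suspicious
requests is passed in by the caller are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of App::Phoebe::SpeedBump.
# $visits:   array ref of timestamps, newest first
# $warnings: array ref of flags (1 = suspicious), parallel to $visits
# Returns a reason string if the visitor should be blocked, nothing otherwise.
sub should_block {
  my ($visits, $warnings, $now, $requests, $window, $limit) = @_;
  # indexes of all the visits that fall inside the time window
  my @recent = grep { $visits->[$_] >= $now - $window } 0 .. $#$visits;
  # rule 1: more than $requests requests within $window seconds
  return 'too fast' if @recent > $requests;
  # rule 2: $limit or more of the recent requests were suspicious
  my $suspicious = grep { $warnings->[$_] } @recent;
  return 'suspicious' if $suspicious >= $limit;
  return;
}

# 31 requests in the same second, none of them suspicious
my $reason = should_block([(1000) x 31], [(0) x 31], 1000, 30, 60, 10);
print "Blocked: $reason\n" if $reason;
```

Passing the limits as arguments keeps the sketch independent of the
package variables that the module itself uses.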

When an IP number is blocked, it is blocked for 60s, and there’s a 120s
probation time. When you’re blocked, Phoebe responds with a “44” response. This
means: slow down!

If the IP number is unblocked but gives cause for another block in the probation
time, it is blocked again and the blocking time is doubled: the IP is blocked
for 120s and there’s 240s probation time. And if it happens again, it is doubled
again.
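
The doubling described above can be sketched as follows. This is a minimal
illustration, not the code used by this module: the function C<add_block> is
made up for this example, and it assumes that the probation time is counted
from the moment the block starts (60s blocked, probation ends after 120s).
The keys C<seconds>, C<until> and C<probation> follow the data layout
documented in the source comments.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of App::Phoebe::SpeedBump.
# $entry is the hash ref holding the data for one IP number.
sub add_block {
  my ($entry, $now) = @_;
  my $seconds = 60;
  # still on probation: double the previous blocking time
  if ($entry->{probation} and $entry->{probation} > $now) {
    $seconds = $entry->{seconds} * 2;
  }
  $entry->{seconds} = $seconds;
  $entry->{until} = $now + $seconds;
  # assumption: probation lasts twice as long as the block itself
  $entry->{probation} = $now + 2 * $seconds;
  return $entry;
}

my $entry = {};
add_block($entry, 0);     # first offence
add_block($entry, 100);   # second offence, while still on probation
print "Blocked for $entry->{seconds}s\n";
```

An offence after the probation time has passed starts over with 60s.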

There is no configuration required, but adding a known fingerprint is suggested.
The C</do/speed-bump> URL shows you more information, if you have a client
certificate with a known fingerprint.

The exact number of requests and the length of the time window (in seconds) can
be changed in the F<config> file, too.

Here’s one way to do all that:

    package App::Phoebe;
    our @known_fingerprints = qw(
      sha256$0ba6ba61da1385890f611439590f2f0758760708d1375859b2184dcd8f855a00);
    package App::Phoebe::SpeedBump;
    our $speed_bump_requests = 20;
    our $speed_bump_window = 20;
    use App::Phoebe::SpeedBump;

Here’s how to get the fingerprint from a certificate named F<client-cert.pem>:

    openssl x509 -in client-cert.pem -noout -sha256 -fingerprint \
    | sed -e 's/://g' -e 's/SHA256 Fingerprint=/sha256$/' \
    | tr '[:upper:]' '[:lower:]'

This should give you the fingerprint in the correct format to add to the list
above.
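
If you prefer to stay in Perl, the same value can probably be computed with
core modules only. The following is a sketch under the assumption that the
fingerprint is the SHA-256 digest of the DER-encoded certificate, which is
what C<openssl x509 -fingerprint> prints; the function
C<fingerprint_from_pem> is made up for this example and is not part of
Phoebe.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: takes the text of a PEM file and returns the
# fingerprint of the first certificate in it, in the "sha256$..." format.
sub fingerprint_from_pem {
  my ($pem) = @_;
  # the DER data is the Base64 text between the two marker lines
  my ($base64) = $pem =~ /-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----/s
    or die "No certificate found\n";
  return 'sha256$' . sha256_hex(decode_base64($base64));
}

# Usage, assuming a file named client-cert.pem in the current directory:
if (open(my $fh, '<', 'client-cert.pem')) {
  local $/; # read the whole file at once
  print fingerprint_from_pem(<$fh>), "\n";
}
```

Compare the result with the output of the C<openssl> pipeline before relying
on it.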

=cut

package App::Phoebe::SpeedBump;
use App::Phoebe qw(@extensions $log $server @known_fingerprints
		   success result port host_regex );
use Modern::Perl;
use File::Slurper qw(read_binary write_binary);
use List::Util qw(sum);
use Mojo::JSON qw(decode_json encode_json);
use Net::IP;
use Net::DNS qw(rr);

@known_fingerprints = qw(
  sha256$54c0b95dd56aebac1432a3665107d3aec0d4e28fef905020ed6762db49e84ee1);

our $speed_bump_requests ||= 30;
our $speed_bump_window ||= 60;

# $speed_data->{$ip}->{visits} = [$last, ... , $oldest]
# $speed_data->{$ip}->{warnings} = [1, ... , 0]
# $speed_data->{$ip}->{seconds} = $sec
# $speed_data->{$ip}->{until} = $ts
# $speed_data->{$ip}->{probation} = $ts + $sec
my $speed_data;
# $speed_cidr_data->{$cidr} = $ts
my $speed_cidr_data;

# order is important: we must be able to reset the stats for tests; and we need
# to be there before others handle our requests
unshift(@extensions, \&speed_bump_admin, \&speed_bump);

sub speed_bump {
  my ($stream, $url) = @_;
  my $now = time;
  # go through the data we keep and delete it if the two time limits elapsed
  # and the last visit is past the time window we're interested in
  for my $ip (keys %$speed_data) {
    if ((not $speed_data->{$ip}->{until}
	 or $speed_data->{$ip}->{until} < $now)
	and (not $speed_data->{$ip}->{probation}
	     or $speed_data->{$ip}->{probation} < $now)
	and (not $speed_data->{$ip}->{visits}
	     or @{$speed_data->{$ip}->{visits}} == 0
	     or $speed_data->{$ip}->{visits}->[0] < $now - $speed_bump_window)) {
      delete($speed_data->{$ip});
    }
  }
  for my $cidr (keys %$speed_cidr_data) {
    delete($speed_cidr_data->{$cidr}) if $speed_cidr_data->{$cidr} < $now;
  }
  # check whether the range is blocked
  my $ip = $stream->handle->peerhost;
  my $ob = Net::IP->new($ip);
  for my $cidr (keys %$speed_cidr_data) {
    my $range = Net::IP->new($cidr);
    if (not $range) {
      # an invalid range cannot be compared: log it and skip it
      $log->error(Net::IP::Error());
      next;
    }
    my $overlap = $range->overlaps($ob);
    # $IP_PARTIAL_OVERLAP (ranges overlap) $IP_NO_OVERLAP (no overlap)
    # $IP_A_IN_B_OVERLAP (range2 contains range1) $IP_B_IN_A_OVERLAP (range1
    # contains range2) $IP_IDENTICAL (ranges are identical) undef (problem)
    if (defined $overlap and $overlap != $IP_NO_OVERLAP) {
      $log->info("Net range $cidr is blocked");
      my $delta = $speed_cidr_data->{$cidr} - $now;
      result($stream, "44", "$delta");
      # no more processing
      return 1;
    }


