App-Phoebe
lib/App/Phoebe/SpeedBump.pm
=encoding utf8
=head1 NAME
App::Phoebe::SpeedBump - defend Phoebe against bots and leeches
=head1 DESCRIPTION
We want to block crawlers that are too fast or that don’t follow the
instructions in robots.txt. We do this by keeping a list of recent visitors: for
every IP number, we remember the timestamps of their last visits. If they make
more than 30 requests in 60s, we block them for an ever-increasing number of
seconds, starting with 60s and doubling every time this happens.
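The counting described above can be sketched as a sliding window over
timestamps. This is a minimal sketch and not the code Phoebe runs: the names
C<too_fast> and C<%visits> are hypothetical, and the limit and the window are
passed in as arguments instead of being read from the configuration.

```perl
use strict;
use warnings;

# Hypothetical store, not part of App::Phoebe::SpeedBump:
# $visits{$ip} = [$latest, ..., $oldest]
my %visits;

# Record a visit and report whether this IP number made more than
# $requests requests within the last $window seconds.
sub too_fast {
  my ($ip, $now, $requests, $window) = @_;
  my $list = $visits{$ip} //= [];
  unshift(@$list, $now);
  # drop timestamps that have fallen out of the window
  pop(@$list) while @$list and $list->[-1] <= $now - $window;
  return @$list > $requests;
}

# 31 requests within the same second: only the 31st exceeds the limit of 30
my @results = map { too_fast("192.0.2.1", 1000, 30, 60) } 1 .. 31;
print "blocked on request 31\n" if $results[30] and not $results[29];
```

Keeping the newest timestamp first matches the layout noted for C<visits> in
the comments of the module, so the oldest entries can be dropped from the end.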
For every IP number, Phoebe also records whether the last 30 requests were
“suspicious” or not. A suspicious request is a request that is “disallowed” for
bots according to “robots.txt” (more or less). If 10 requests or more of the
last 30 requests in the last 60 seconds are suspicious, the IP number is
blocked.
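The bookkeeping for suspicious requests can be sketched in the same way. This
is only an illustration under simplifying assumptions: C<too_suspicious> and
C<%warnings> are hypothetical names, the caller decides whether a request is
suspicious, and the 60 second window is left out so that only the “10 of the
last 30” rule is shown.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical store: $warnings{$ip} = [1, ..., 0], newest flag first
my %warnings;

# Record one flag per request and report whether at least $limit of the
# last $requests requests were suspicious.
sub too_suspicious {
  my ($ip, $suspicious, $requests, $limit) = @_;
  my $list = $warnings{$ip} //= [];
  unshift(@$list, $suspicious ? 1 : 0);
  # keep only the newest $requests flags
  splice(@$list, $requests) if @$list > $requests;
  return sum(0, @$list) >= $limit;
}

# ten suspicious requests followed by 21 harmless ones
my @flags = ((1) x 10, (0) x 21);
my @blocked = map { too_suspicious("192.0.2.1", $_, 30, 10) } @flags;
print "blocked after request 10\n" if $blocked[9] and not $blocked[8];
```

Once the 31st request arrives, the oldest suspicious flag drops out of the
list and the count falls below the limit again.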
When an IP number is blocked, it is blocked for 60s, and there’s a 120s
probation time. When you’re blocked, Phoebe responds with a “44” response. This
means: slow down!
If the IP number is unblocked but gives cause for another block in the probation
time, it is blocked again and the blocking time is doubled: the IP is blocked
for 120s and there’s 240s probation time. And if it happens again, it is doubled
again.
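The doubling can be sketched as follows. This is a sketch under assumptions,
not the module’s implementation: C<block> and C<%data> are hypothetical names,
the probation is assumed to start when the block ends and to last twice as
long as the block, and the block time is assumed to fall back to 60s once the
probation has passed. The keys C<seconds>, C<until> and C<probation> follow
the comments in the module.

```perl
use strict;
use warnings;

# Hypothetical store:
# $data{$ip} = { seconds => $sec, until => $ts, probation => $ts }
my %data;

# Block an IP number at time $now and return the length of the block.
sub block {
  my ($ip, $now) = @_;
  my $entry = $data{$ip} //= {};
  if ($entry->{probation} and $now < $entry->{probation}) {
    $entry->{seconds} *= 2;    # another offence during probation
  } else {
    $entry->{seconds} = 60;    # first offence, or probation has passed
  }
  $entry->{until} = $now + $entry->{seconds};
  # assumption: probation lasts twice the block time after the block ends
  $entry->{probation} = $entry->{until} + 2 * $entry->{seconds};
  return $entry->{seconds};
}

# three offences, each during the probation of the previous one
print join(", ", map { block("192.0.2.1", $_) } 0, 100, 300), "\n";
# prints "60, 120, 240"
```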
There is no configuration required, but adding a known fingerprint is suggested.
The C</do/speed-bump> URL shows you more information, if you have a client
certificate with a known fingerprint.
The exact number of requests and the length of the time window (in seconds) can
be changed in the F<config> file, too.
Here’s one way to do all that:

    package App::Phoebe;
    our @known_fingerprints = qw(
      sha256$0ba6ba61da1385890f611439590f2f0758760708d1375859b2184dcd8f855a00);
    package App::Phoebe::SpeedBump;
    our $speed_bump_requests = 20;
    our $speed_bump_window = 20;
    use App::Phoebe::SpeedBump;

Here’s how to get the fingerprint from a certificate named F<client-cert.pem>:

    openssl x509 -in client-cert.pem -noout -sha256 -fingerprint \
    | sed -e 's/://g' -e 's/SHA256 Fingerprint=/sha256$/' \
    | tr '[:upper:]' '[:lower:]'

This should give you the fingerprint in the correct format to add to the list
above.
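The same conversion can be sketched in Perl, for example when the fingerprint
line has already been captured from C<openssl>. The function name
C<fingerprint_from_openssl> is hypothetical; the sketch accepts the label in
upper or lower case and returns nothing if the line does not look like a
SHA-256 fingerprint.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one line of "openssl x509 -fingerprint" output
# into the "sha256$<hex>" form used in @known_fingerprints.
sub fingerprint_from_openssl {
  my ($line) = @_;
  my ($hex) = $line =~ /^sha256 fingerprint=([0-9a-f:]+)\s*$/i or return;
  $hex =~ s/://g;
  return 'sha256$' . lc($hex);
}

# Rebuild a sample input line from the fingerprint used in the example
# configuration, then convert it back.
my $hex = '0ba6ba61da1385890f611439590f2f0758760708d1375859b2184dcd8f855a00';
my $line = 'SHA256 Fingerprint=' . join(':', uc($hex) =~ /../g) . "\n";
print fingerprint_from_openssl($line), "\n";
```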
=cut
package App::Phoebe::SpeedBump;
use App::Phoebe qw(@extensions $log $server @known_fingerprints
success result port host_regex );
use Modern::Perl;
use File::Slurper qw(read_binary write_binary);
use List::Util qw(sum);
use Mojo::JSON qw(decode_json encode_json);
use Net::IP qw(:PROC);
use Net::DNS qw(rr);
@known_fingerprints = qw(
sha256$54c0b95dd56aebac1432a3665107d3aec0d4e28fef905020ed6762db49e84ee1);
our $speed_bump_requests ||= 30;
our $speed_bump_window ||= 60;
# $speed_data->{$ip}->{visits} = [$last, ... , $oldest]
# $speed_data->{$ip}->{warnings} = [1, ... , 0]
# $speed_data->{$ip}->{seconds} = $sec
# $speed_data->{$ip}->{until} = $ts
# $speed_data->{$ip}->{probation} = $ts + $sec
my $speed_data;
# $speed_cidr_data->{$cidr} = $ts
my $speed_cidr_data;
# order is important: we must be able to reset the stats for tests; and we need
# to be there before others handle our requests
unshift(@extensions, \&speed_bump_admin, \&speed_bump);
sub speed_bump {
  my ($stream, $url) = @_;
  my $now = time;
  # go through the data we keep and delete it if the two time limits elapsed
  # and the last visit is past the time window we're interested in
  for my $ip (keys %$speed_data) {
    if ((not $speed_data->{$ip}->{until}
         or $speed_data->{$ip}->{until} < $now)
        and (not $speed_data->{$ip}->{probation}
             or $speed_data->{$ip}->{probation} < $now)
        and (not $speed_data->{$ip}->{visits}
             or @{$speed_data->{$ip}->{visits}} == 0
             or $speed_data->{$ip}->{visits}->[0] < $now - $speed_bump_window)) {
      delete($speed_data->{$ip});
    }
  }
  for my $cidr (keys %$speed_cidr_data) {
    delete($speed_cidr_data->{$cidr}) if $speed_cidr_data->{$cidr} < $now;
  }
  # check whether the range is blocked
  my $ip = $stream->handle->peerhost;
  if (not $ip) {
    $log->info("IP number cannot be determined");
    # 44 means "slow down"; the meta value is the number of seconds to wait
    result($stream, "44", "10");
    # no more processing
    return 1;
  }
  my $ob = Net::IP->new($ip);
  for my $cidr (keys %$speed_cidr_data) {
    my $range = Net::IP->new($cidr);
    if (not $range) {
      $log->error("$cidr: " . Net::IP::Error());
      next;
    }
    my $overlap = $range->overlaps($ob);