Regexp-Log-BlueCoat


lib/Regexp/Log/BlueCoat.pm  view on Meta::CPAN

package Regexp::Log::BlueCoat;

use strict;
use Carp;
use Regexp::Log 0.01;
use base qw( Regexp::Log );
use vars qw( $VERSION %DEFAULT %FORMAT %REGEXP %UFS );

$VERSION = 0.03;

=head1 NAME

Regexp::Log::BlueCoat - A regexp builder to parse BlueCoat log files

=head1 SYNOPSIS

    my $blue = Regexp::Log::BlueCoat->new(
        format  => '%g %e %a %w/%s %b %m %i %u %H/%d %c',
        capture => [qw( host code )],
    );

    # format() and capture() each set the attribute when given
    # arguments, and return its current value otherwise
    $blue->format('%g %e %a %w/%s %b %m %i %u %H/%d %c %f %A');
    $blue->capture(qw( host code ));
    $blue->ufs( 'smartfilter' );

    # this is necessary to know in which order
    # we will receive the captured fields from the regex
    my @fields = $blue->capture;

    # the all-powerful capturing regex :-)
    my $re = $blue->regex;

    while (<>) {
        my %data;
        @data{@fields} = /$re/;

        # do something with the fields
    }
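
A failed match returns an empty list, so the hash slice above leaves
every field undefined on lines the regex cannot handle. The sketch below
shows one way to guard against that. It is only an illustration: the
hand-written C<$re> stands in for whatever C<regex()> would return for
your format, and the sample lines are made up.

```perl
use strict;
use warnings;

# Stand-in for $blue->regex (an assumption, not the generated regex):
# captures the client IP and the HTTP status code.
my @fields = qw( host code );
my $re     = qr{^\d+\.\d+ \d+ (\d{1,3}(?:\.\d{1,3}){3}) \w+/(\d{3}) };

# Made-up sample input: one well-formed line, one that will not match.
my @lines = (
    '1045673290.123 15 192.168.0.7 TCP_HIT/200 1024 GET http://example.com/ - NONE/- text/html',
    'this line does not match',
);

my @parsed;
my $skipped = 0;
for my $line (@lines) {
    my %data;

    # in boolean context the list assignment yields the number of
    # captured values, which is 0 when the match fails
    unless ( @data{@fields} = $line =~ $re ) {
        $skipped++;
        next;
    }
    push @parsed, \%data;
}

printf "%s -> %s\n", $_->{host}, $_->{code} for @parsed;
```

Testing the assignment itself keeps the loop to a single match per line.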

=head1 DESCRIPTION

Regexp::Log::BlueCoat is a module that computes custom regular
expressions to parse log files generated by the Blue Coat Systems
I<Port 80 Security Appliance>.

See the Regexp::Log documentation for a description of the standard
Regexp::Log interface.

=head2 Streaming media logs

This version of Regexp::Log::BlueCoat does not support streaming-related
logs. If your Blue Coat appliance is configured to log those events,
add the following line at the beginning of the log parsing loop in
your scripts:

    next if /^(?:Windows_Media|<RealMedia>)/;

This may or may not be faster than having the regular expression generated
by the regexp() method fail on each streaming log line.
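
As a rough sketch of where that test sits in a parsing loop: the C<$re>
below is a hand-written stand-in for the generated regex, and the sample
lines (including the text after the streaming prefixes) are invented for
illustration.

```perl
use strict;
use warnings;

# Stand-in for the regex returned by regexp(); assumed for illustration.
my $re = qr/^(\d+\.\d+) /;

# Made-up sample input: two streaming entries and one regular entry.
my @log = (
    'Windows_Media some streaming entry',
    '<RealMedia> another streaming entry',
    '1045673290.123 15 192.168.0.7 TCP_HIT/200 1024 GET http://example.com/ - NONE/- text/html',
);

my $streaming = 0;
my @timestamps;
for (@log) {

    # cheap anchored test first: streaming lines never reach the main regex
    if (/^(?:Windows_Media|<RealMedia>)/) {
        $streaming++;
        next;
    }

    my ($ts) = /$re/ or next;
    push @timestamps, $ts;
}

print "skipped $streaming streaming line(s), parsed ",
  scalar @timestamps, " regular line(s)\n";
```

Counting the skipped lines is optional; it simply makes it easy to check
how much of a log is streaming traffic.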

=cut

my $IP   = '\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}';
my $HOST = '[-.\\S]+';

# define the BlueCoat specific stuff
%REGEXP = (

    # %% - Denotes '%' character -
    '%%' => '%',

    # %a   c-ip Client IP address. Yes
    '%a' => "(?#=c-ip)$IP(?#!c-ip)",

    # %b   sc-bytes Number of bytes returned by the server (or the Cache).  Yes
    '%b' => '(?#=sc-bytes)-|\\d+(?#!sc-bytes)',

    # %c   cs (content-type) The type of object. Usually the MIME-type. No
    '%c' => '(?#=cs-content-type)-|UNKNOWN|\\S+(?:/\\S+)?(?#!cs-content-type)',

    # %d   cs-supplier-name SUPPLIER NAME - Name or IP address of the
    #      server/cache from which the object was received.  Yes
    '%d' => "(?#=cs-supplier-name)-|$HOST(?#!cs-supplier-name)",

    # %e   time-taken Number of milliseconds request took to process.  Yes
    '%e' => '(?#=time-taken)\\d+(?#!time-taken)',

    # %f   sc-filter-category Filtering reason. Why it was denied (such as sex or business) No
    # this is handled in _postprocess()
    '%f' => '(?#=sc-filter-category)%f(?#!sc-filter-category)',

    # %g    timestamp UNIX type timestamp. Yes
    '%g' => '(?#=timestamp)\\d+\\.\\d+(?#!timestamp)',

    # %h    c-ip Client Hostname (uses IP to avoid reverse DNS) - same as %a Yes
    '%h' => "(?#=c-hostname)-|$HOST(?#!c-hostname)",

    # %i    cs-uri The requested URI. Note: Web trends expects this to be
    #       only cs-uri-stem + cs-uri-query No
    '%i' => '(?#=cs-uri)-|\\S+://\\S+|.*?(?#!cs-uri)',

    # %j    -  [Not used.] -
    '%j' => '',

    # %l    - Client Identification string. (User Login name remote).
    #       - always '-' Yes
    # %m    cs-method HTTP method. HTTP methods include GET, PUT, POST,
    #       and so on.  Yes
    '%m' =>
'(?#=cs-method)-|OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT(?#!cs-method)',

    # %n    - [Not used.] -
    '%n' => '',

    # %o    - [Not used.] -
    '%o' => '',

    # %p    r-port Port fetched from on host - origin server port Yes
    '%p' => '(?#=r-port)\\d+(?#!r-port)',

    # %q    - [Not used.] -
    '%q' => '',

lib/Regexp/Log/BlueCoat.pm  view on Meta::CPAN

        po                 => "politics, opinion, religion",
        mm                 => "dating",
        ac                 => "art/culture",
        na                 => "usenet news access",
        oc                 => "cults/occult",
        na                 => "Usenet News",
        sh                 => "self help",
        tr                 => "travel",
        mt                 => "mature",
        wm                 => "webmail",
        ps                 => "portal sites",
        an                 => "anonymizer/translator",
        u0                 => "user defined category 0",
        u1                 => "user defined category 1",
        u2                 => "user defined category 2",
        u3                 => "user defined category 3",
        u4                 => "user defined category 4",
        u5                 => "user defined category 5",
        u6                 => "user defined category 6",
        u7                 => "user defined category 7",
        u8                 => "user defined category 8",
        u9                 => "user defined category 9",
    },
    websense => {},
);
 
%DEFAULT = (
    format  => '',
    capture => [],
    ufs     => '',
    login   => '',
    _ufs    => { map { ($_, {} ) } keys %UFS },
);

sub _preprocess {
    my $self  = shift;
    my $login = $self->{login};

    # Login specific regexps
    $self->{_regexp} =~ s/%u/%u-$login/g
      if defined $login && $login =~ /^(?:ldap|username)$/;

    # Multiple consecutive spaces are compressed to a single space
    $self->{_regexp} =~ s/ +/ /g;
}

sub _postprocess {
    my $self = shift;
    my $ufs  = $self->{ufs};

    # UFS specific regexps
    if ( defined $ufs and $ufs ne '' ) {
        my %categories = ( %{ $UFS{$ufs} }, %{ $self->{_ufs}{$ufs} } );
        my $categories = join '|', sort values %categories;
        $self->{_regexp} =~ s/%f/$categories/g;
    }
}

=head1 TODO

Support streaming logs: Windows Media and RealMedia.

Support the W3C Extended Log File Format (ELFF), which is a subset of
the Blue Coat format where each field is described using a text string.

Have a look at the entries that produce multi-line logs.

=head1 BUGS

Most of the development was done while I was trying to process
logs created with the following format:
C<%g %e %a %w/%s %b %m %i %u %H/%d %c %f %A>.

As a result, the regular expressions that this module produces do not
cover every possible format.

If Regexp::Log::BlueCoat's regular expressions do not match some of the
log lines that you are trying to munge, please use the F<eg/notmatch.pl>
script and send the resulting file to me.
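
The F<eg/notmatch.pl> script itself is not reproduced here. Purely as an
illustration of the idea, and not of that script's actual behaviour,
unmatched lines can be set aside as follows. The stand-in C<$re> and the
sample lines are assumptions.

```perl
use strict;
use warnings;

# Stand-in for $blue->regex; the real one depends on your format string.
my $re = qr/^\d+\.\d+ \d+ \d{1,3}(?:\.\d{1,3}){3} /;

# Made-up sample input.
my @log = (
    "1045673290.123 15 192.168.0.7 TCP_HIT/200 1024 GET http://example.com/ - NONE/- text/html\n",
    "1045673291.456 oops, a line the regex cannot handle\n",
);

# Collect every unmatched line in a report. An in-memory filehandle keeps
# the sketch free of side effects; a real script would open a file instead.
my $report = '';
open my $out, '>', \$report or die "Cannot open in-memory file: $!";
for my $line (@log) {
    print {$out} $line unless $line =~ $re;
}
close $out;

print $report;
```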

=head1 REFERENCES

Blue Coat Systems Port 80 Security Appliance, I<Configuration and Management
Guide>: http://www.bluecoat.com/downloads/manuals/BC_Config_Mgmt_Guide.pdf

Secure Computing Smartfilter, I<Installation & Configuration Guide>,
version 3.1.2: http://www.securecomputing.com/pdf/SFConfig312_IC_RevE.pdf

=head1 THANKS

Thanks to Jarkko Hietaniemi for Regex::PreSuf.

=head1 AUTHOR

Philippe 'BooK' Bruhat E<lt>book@cpan.orgE<gt>.

=head1 LICENCE

This module is free software; you can redistribute it or modify it under
the same terms as Perl itself.

=cut

1;


