Apache2-ClickPath


lib/Apache2/ClickPath.pm

}

1;

__END__

=head1 NAME

Apache2::ClickPath - Apache WEB Server User Tracking

=head1 SYNOPSIS

 LoadModule perl_module ".../mod_perl.so"
 PerlLoadModule Apache2::ClickPath
 <ClickPathUAExceptions>
   Google     Googlebot
   MSN        msnbot
   Mirago     HeinrichderMiragoRobot
   Yahoo      Yahoo-MMCrawler
   Seekbot    Seekbot
   Picsearch  psbot
   Globalspec Ocelli
   Naver      NaverBot
   Turnitin   TurnitinBot
   dir.com    Pompos
   search.ch  search\.ch
   IBM        http://www\.almaden\.ibm\.com/cs/crawler/
 </ClickPathUAExceptions>
 ClickPathSessionPrefix "-S:"
 ClickPathMaxSessionAge 18000
 PerlTransHandler Apache2::ClickPath
 PerlOutputFilterHandler Apache2::ClickPath::OutputFilter
 LogFormat "%h %l %u %t \"%m %U%q %H\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{SESSION}e\""

=head1 ABSTRACT

C<Apache2::ClickPath> can be used to track user activity on your web server
and gather click streams. Unlike mod_usertrack it does not use a cookie.
Instead, the session identifier is transferred as the first part of the URI.

Furthermore, in conjunction with a load balancer it can be used to direct
all requests belonging to a session to the same server.

=head1 DESCRIPTION

C<Apache2::ClickPath> adds a PerlTransHandler and an output filter to
Apache's request cycle. The transhandler inspects the requested URI to
decide if an existing session is used or a new one has to be created.

=head2 The Translation Handler

If the requested URI starts with a slash followed by the session prefix
(see L</"B<ClickPathSessionPrefix>"> below) the rest of the URI up to the next
slash is treated as session identifier. If for example the requested URI
is C</-S:s9NNNd:doBAYNNNiaNQOtNNNNNM/index.html> then assuming
C<ClickPathSessionPrefix> is set to C<-S:> the session identifier would be
C<s9NNNd:doBAYNNNiaNQOtNNNNNM>.
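
The extraction described above can be sketched in a few lines of Perl. This
is only an illustration, not the module's actual code: the function name
C<split_session> is hypothetical, and no validation of the identifier is
attempted.

 use strict;
 use warnings;

 # Hypothetical helper, not part of Apache2::ClickPath.
 # Returns (session identifier, remaining URI), or (undef, $uri)
 # if the URI does not start with "/" followed by the prefix.
 sub split_session {
     my ($uri, $prefix) = @_;
     if ($uri =~ m{^/\Q$prefix\E([^/]+)(/.*)?$}s) {
         return ($1, defined $2 ? $2 : '/');
     }
     return (undef, $uri);
 }

 my ($id, $rest) =
     split_session('/-S:s9NNNd:doBAYNNNiaNQOtNNNNNM/index.html', '-S:');
 print "$id\n$rest\n";

For the example URI this prints C<s9NNNd:doBAYNNNiaNQOtNNNNNM> followed by
C</index.html>.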

Starting with version 1.8 a checksum is included in the session ID. Further,
some parts of the information contained in the session, including the checksum,
can be encrypted. Both measures make a valid session ID hard to guess. If an
invalid session ID is detected an error message is written to the ErrorLog,
so a log watching agent can be set up to catch frequent abuse.

If no session identifier is found a new one is created.

Then the session prefix and identifier are stripped from the current URI.
Also a potentially existing session is stripped from the incoming C<Referer>
header.

There are several exceptions to this scheme. Even if the incoming URI
contains a session, a new one is created if the existing one is too old.
This is done to prevent link collections, bookmarks or search engines from
generating endless click streams.

If the incoming C<User-Agent> header matches a configurable regular
expression, no session identifier is generated and no output filtering
is done. That way search engine crawlers do not create sessions, and
links to your site remain readable (without the session part).
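
A minimal sketch of such a lookup, using two name/pattern pairs from the
SYNOPSIS. The helper C<ua_exception> is hypothetical, and returning the
first matching name is an assumption made for this example, not a
statement about the module's behavior.

 use strict;
 use warnings;

 # Name/pattern pairs as found in a ClickPathUAExceptions section.
 my @exceptions = (
     [ Google => qr/Googlebot/ ],
     [ MSN    => qr/msnbot/ ],
 );

 # Hypothetical helper: returns the name of the first matching
 # pattern, or undef if the user agent is not an exception.
 sub ua_exception {
     my ($ua) = @_;
     for my $e (@exceptions) {
         return $e->[0] if $ua =~ $e->[1];
     }
     return undef;
 }

 my $name = ua_exception('Mozilla/5.0 (compatible; Googlebot/2.1)');
 print defined $name ? "exception: $name\n" : "create a session\n";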

The translation handler sets the following environment variables that
can be used in CGI programs or template systems (e.g. SSI):

=over 4

=item B<SESSION>

the session identifier itself. In the example above
C<s9NNNd:doBAYNNNiaNQOtNNNNNM> is assigned. If the C<User-Agent> prevents
session generation the name of the matching regular expression is
assigned (see L</"B<ClickPathUAExceptions>">).

=item B<CGI_SESSION>

the session prefix + the session identifier. In the example above
C</-S:s9NNNd:doBAYNNNiaNQOtNNNNNM> is assigned. If the C<User-Agent> prevents
session generation C<CGI_SESSION> is empty.

=item B<SESSION_START>

the request time of the request starting a session in seconds since 1/1/1970.

=item B<CGI_SESSION_AGE>

the session age in seconds, i.e. CURRENT_TIME - SESSION_START.

=item B<REMOTE_SESSION>

in case a friendly session was caught this variable contains it, see below.

=item B<REMOTE_SESSION_HOST>

in case a friendly session was caught this variable contains the host it
belongs to, see below.

=item B<EXPIRED_SESSION>

if a session has expired and a new one has been created the old session is
stored here.

=item B<INVALID_SESSION>

[...]

does its own URL based session management but there are links from the
shop to the company site and back. Wouldn't it be nice if a customer, once
he has stepped into the shop, could click links to the company site without
losing the shopping session? This is where friendly sessions come in.

Since your shop's session management is URL based the C<Referer> seen
by C<www.tld.org> will be something like

 https://shop.tld.org/cgi-bin/shop.pl?session=sdafsgr;clusterid=25

(if session and clusterid are passed as CGI parameters) or

 https://shop.tld.org/C:25/S:sdafsgr/cgi-bin/shop.pl

(if session and clusterid are passed as URL parts) or something mixed.

Assuming that C<clusterid> and C<session> both identify the session on
C<shop.tld.org>, C<Apache2::ClickPath> can extract them, encode them in its
own session and place them in environment variables.

Each line in the C<ClickPathFriendlySessions> section describes one friendly
site. The line consists of the friendly hostname, a list of URL parts or
CGI parameters identifying the friendly session and an optional short name
for this friend, e.g.:

 shop.tld.org uri(1) param(session) shop

This means sessions at C<shop.tld.org> are identified by the combination
of the first URL part after the leading slash (/) and a CGI parameter named
C<session>.

If a request now comes in with a C<Referer> of
C<http://shop.tld.org/25/bin/shop.pl?action=showbasket;session=213>
the C<REMOTE_SESSION> environment variable will contain two lines:

 25
 session=213

Their order is determined by the order of C<uri()> and C<param()> statements
in the configuration section between the hostname and the short name. The
C<REMOTE_SESSION_HOST> environment variable will contain the host name the
session belongs to.

Now a CGI script or a mod_perl handler or something similar can fetch the
environment and build links back to C<shop.tld.org>. Instead of directly
linking back to the shop your links then point to that script. The script
then puts out an appropriate redirect.
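
Such a redirect script might look roughly like the following sketch. It
assumes the configuration line and the C<Referer> from the example above.
The helper C<shop_url>, the C<http> scheme and the path C</bin/shop.pl>
are assumptions taken from that example, and a sample hash stands in for
C<%ENV> so that the snippet runs outside of a web server.

 use strict;
 use warnings;

 # Hypothetical helper: rebuilds a link into the shop from the two
 # environment variables set for a friendly session.
 sub shop_url {
     my ($env) = @_;
     my $host = $env->{REMOTE_SESSION_HOST};
     # Line 1 is the uri(1) part, line 2 the param(session) pair.
     my ($cluster, $param) = split /\n/, $env->{REMOTE_SESSION};
     return "http://$host/$cluster/bin/shop.pl?$param";
 }

 # Sample values; a real CGI script would pass \%ENV instead.
 my %env = (
     REMOTE_SESSION_HOST => 'shop.tld.org',
     REMOTE_SESSION      => "25\nsession=213",
 );

 print "Status: 302 Found\r\n";
 print "Location: ", shop_url(\%env), "\r\n\r\n";

With the sample values the C<Location> header points to
C<http://shop.tld.org/25/bin/shop.pl?session=213>.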

=item B<ClickPathFriendlySessionsFile>

this directive takes a filename as argument. The file's syntax and semantics
are the same as for C<ClickPathFriendlySessions>. The file is reread every time
it has been changed, avoiding server restarts after configuration changes at
the price of memory consumption.
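
As a sketch, the directive could be used like this. The path
F</etc/apache2/friendly-sessions.conf> is a hypothetical example.

 ClickPathFriendlySessionsFile /etc/apache2/friendly-sessions.conf

The file would then contain lines in the format described for
C<ClickPathFriendlySessions>, for instance:

 shop.tld.org uri(1) param(session) shop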

=item B<ClickPathSecret>

=item B<ClickPathSecretIV>

if you want to run something like a shop with our session identifiers they
must be unguessable. That means that even when a valid session ID is known
it must be difficult to guess another one. With these directives a
significant part of the session ID is encrypted with Blowfish in cipher
block chaining (CBC) mode, which makes the session ID unguessable.
C<ClickPathSecret> specifies the key, C<ClickPathSecretIV> the
initialization vector.

C<ClickPathSecretIV> is a simple string of arbitrary length. The first 8
bytes of its MD5 digest are used as initialization vector. If omitted the
string C<abcd1234> is the IV.
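
The derivation of the initialization vector can be sketched as follows.
The sketch assumes that the binary (not the hex-encoded) MD5 digest is
meant; the helper name C<iv_from_string> and the sample string are
hypothetical. Eight bytes match the block size of Blowfish.

 use strict;
 use warnings;
 use Digest::MD5 qw(md5);

 # Hypothetical helper: first 8 bytes of the binary MD5 digest.
 sub iv_from_string {
     my ($string) = @_;
     return substr(md5($string), 0, 8);
 }

 my $iv = iv_from_string('some arbitrary string');
 print length($iv), " bytes: ", unpack('H*', $iv), "\n";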

C<ClickPathSecret> is given as C<http:>, C<https:>, C<file:> or C<data:> URL.
Thus the secret can be stored directly as data-URL in the httpd.conf or in a
separate file on the local disk or on a possibly secured server. To support
all ways of accessing the web, the http(s) URL syntax is slightly extended.
You may already have used C<http://user:password@server.tld/...>. Many
browsers allow this syntax to specify a username and password for HTTP
authentication. But what about proxies, SSL authentication etc.? Well, add
another colon (:) after the password and append a semicolon (;) delimited
list of C<key=value> pairs. The special characters (@:;\) can be quoted
with a backslash (\). In fact, all characters can be quoted. Thus, C<\a> and
C<a> produce the same string C<a>.

The following keys are defined:

=over 2

=item B<https_proxy>

=item B<https_proxy_username>

=item B<https_proxy_password>

=item B<https_version>

=item B<https_cert_file>

=item B<https_key_file>

=item B<https_ca_file>

=item B<https_ca_dir>

=item B<https_pkcs12_file>

=item B<https_pkcs12_password>

their meaning is defined in L<Crypt::SSLeay>.

=item B<http_proxy>

=item B<http_proxy_username>

=item B<http_proxy_password>

these are passed to L<LWP::UserAgent>.

Remember that an HTTP proxy is accessed with the GET, POST, ... methods whereas
an HTTPS proxy is accessed with CONNECT. Don't mix them, see L<Crypt::SSLeay>.

=back

B<Examples>

 ClickPathSecret https://john:a\@b\;c\::https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me
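
To see how the quoting rules apply to this URL, the part before the first
unquoted C<@> can be taken apart with a small sketch. The helpers
C<split_unquoted>, C<unquote> and C<parse_userinfo> are hypothetical and
written only for this illustration; they are not the module's own parser.

 use strict;
 use warnings;

 # Split $s at every unquoted occurrence of the character $delim.
 sub split_unquoted {
     my ($s, $delim) = @_;
     my @parts = ('');
     while ($s =~ /\G(\\.|.)/gs) {
         if ($1 eq $delim) { push @parts, '' }
         else              { $parts[-1] .= $1 }
     }
     return @parts;
 }

 # Remove one level of backslash quoting: \a and a both yield a.
 sub unquote {
     my ($s) = @_;
     $s =~ s/\\(.)/$1/gs;
     return $s;
 }

 sub parse_userinfo {
     my ($url) = @_;
     # Everything between "://" and the first unquoted "@".
     my ($userinfo) = $url =~ m{^https?://((?:\\.|[^\\@])*)@}s
         or return;
     my ($user, $pass, @rest) = split_unquoted($userinfo, ':');
     my %opt;
     for my $pair (split_unquoted(join(':', @rest), ';')) {
         my ($k, $v) = split /=/, $pair, 2;
         $opt{unquote($k)} = unquote($v) if defined $v;
     }
     return (unquote($user), unquote(defined $pass ? $pass : ''), \%opt);
 }

 my ($user, $pass, $opt) = parse_userinfo(
     'https://john:a\@b\;c\::https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me'
 );
 print "$user\n$pass\n$opt->{https_ca_file}\n";

This prints C<john>, C<a@b;c:> and C</my/ca.pem>, i.e. the username, the
unquoted password and the value of the C<https_ca_file> key.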


