Apache2-ClickPath
NAME
Apache2::ClickPath - Apache WEB Server User Tracking
SYNOPSIS
LoadModule perl_module ".../mod_perl.so"
PerlLoadModule Apache2::ClickPath
<ClickPathUAExceptions>
Google Googlebot
MSN msnbot
Mirago HeinrichderMiragoRobot
Yahoo Yahoo-MMCrawler
Seekbot Seekbot
Picsearch psbot
Globalspec Ocelli
Naver NaverBot
Turnitin TurnitinBot
dir.com Pompos
search.ch search\.ch
IBM http://www\.almaden\.ibm\.com/cs/crawler/
</ClickPathUAExceptions>
ClickPathSessionPrefix "-S:"
ClickPathMaxSessionAge 18000
PerlTransHandler Apache2::ClickPath
PerlOutputFilterHandler Apache2::ClickPath::OutputFilter
LogFormat "%h %l %u %t \"%m %U%q %H\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{SESSION}e\""
ABSTRACT
"Apache2::ClickPath" can be used to track user activity on your web
server and gather click streams. Unlike mod_usertrack it does not use a
cookie. Instead, the session identifier is transferred as the first part
of a URI.
Furthermore, in conjunction with a load balancer it can be used to
direct all requests belonging to a session to the same server.
DESCRIPTION
"Apache2::ClickPath" adds a PerlTransHandler and an output filter to
Apache's request cycle. The transhandler inspects the requested URI to
decide if an existing session is used or a new one has to be created.
The Translation Handler
If the requested URI starts with a slash followed by the session prefix
(see "ClickPathSessionPrefix" below) the rest of the URI up to the next
slash is treated as session identifier. If for example the requested URI
is "/-S:s9NNNd:doBAYNNNiaNQOtNNNNNM/index.html" then assuming
"ClickPathSessionPrefix" is set to "-S:" the session identifier would be
"s9NNNd:doBAYNNNiaNQOtNNNNNM".
Starting with version 1.8 a checksum is included in the session ID.
Furthermore, some parts of the information contained in the session ID,
including the checksum, can be encrypted. Both measures make a valid
session ID hard to guess. If an invalid session ID is detected, an error
message is written to the ErrorLog, so a log-watching agent can be set
up to catch frequent abuse.
If no session identifier is found a new one is created.
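The prefix handling can be illustrated with a minimal Python sketch. It
is not the module's Perl code; the function name "split_session" is
hypothetical, and checksum validation and decryption are left out.

```python
def split_session(uri, prefix="-S:"):
    """Return (session_id, stripped_uri).

    session_id is None when the URI does not start with "/" + prefix,
    which is the case where a new session would have to be created.
    """
    lead = "/" + prefix
    if not uri.startswith(lead):
        return None, uri
    # Everything up to the next slash is the session identifier.
    session_id, _, tail = uri[len(lead):].partition("/")
    return session_id, "/" + tail


session_id, stripped = split_session("/-S:s9NNNd:doBAYNNNiaNQOtNNNNNM/index.html")
print(session_id)  # s9NNNd:doBAYNNNiaNQOtNNNNNM
print(stripped)    # /index.html
```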
Then the session prefix and identifier are stripped from the current
URI. A session found in the incoming "Referer" header is stripped as
well.
There are several exceptions to this scheme. Even if the incoming URI
contains a session, a new one is created if the existing one is too old.
This prevents link collections, bookmarks or search engines from
generating endless click streams.
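The age check can be sketched in a few lines of Python. This is an
assumption-laden illustration: the function names are hypothetical, the
default of 18000 seconds is taken from "ClickPathMaxSessionAge" in the
synopsis, and whether the module compares with ">" or ">=" is not
stated here.

```python
def session_age(session_start, now):
    """Age in seconds, i.e. CURRENT_TIME - SESSION_START."""
    return int(now) - int(session_start)


def is_expired(session_start, now, max_age=18000):
    """True if the session is older than max_age seconds (strict ">" is assumed)."""
    return session_age(session_start, now) > max_age
```

When "is_expired" returns true, a new session would be created instead
of continuing the one found in the URI.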
If the incoming "User-Agent" header matches a configurable regular
expression, no session identifier is generated and no output filtering
is done. That way search engine crawlers will not create sessions and
links to your site remain readable (without the session stuff).
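The following Python sketch illustrates such a lookup against a table
of name and regular expression pairs like the one in
"ClickPathUAExceptions". The table holds three entries copied from the
synopsis; the function name and the case-sensitive, unanchored matching
are assumptions, not the module's documented behaviour.

```python
import re

# (name, regular expression) pairs, as in the <ClickPathUAExceptions> section.
UA_EXCEPTIONS = [
    ("Google", r"Googlebot"),
    ("MSN", r"msnbot"),
    ("search.ch", r"search\.ch"),
]


def match_ua_exception(user_agent, table=UA_EXCEPTIONS):
    """Return the name of the first matching expression, or None."""
    for name, pattern in table:
        if re.search(pattern, user_agent):
            return name
    return None
```

A non-None result corresponds to the case where no session is created
and the name of the matching expression is reported instead.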
The translation handler sets the following environment variables that
can be used in CGI programs or template systems (e.g. SSI):
SESSION
the session identifier itself. In the example above
"s9NNNd:doBAYNNNiaNQOtNNNNNM" is assigned. If the "User-Agent"
prevents session generation, the name of the matching regular
expression is assigned (see "ClickPathUAExceptions").
CGI_SESSION
the session prefix + the session identifier. In the example above
"/-S:s9NNNd:doBAYNNNiaNQOtNNNNNM" is assigned. If the "User-Agent"
prevents session generation, "CGI_SESSION" is empty.
SESSION_START
the request time of the request starting a session in seconds since
1/1/1970.
CGI_SESSION_AGE
the session age in seconds, i.e. CURRENT_TIME - SESSION_START.
REMOTE_SESSION
in case a friendly session was caught this variable contains it, see
below.
REMOTE_SESSION_HOST
in case a friendly session was caught this variable contains the
host it belongs to, see below.
EXPIRED_SESSION
if a session has expired and a new one has been created the old
session is stored here.
INVALID_SESSION
when a "ClickPathMachineTable" is used, a check is performed to
ensure the session was created by one of the machines of the cluster.
If it was not, a message is written to the "ErrorLog", a new session
is created and the invalid session is written to this environment
variable.
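As an illustration of how a CGI program might use these variables, the
Python sketch below builds a site-local link that carries the session
by prepending "CGI_SESSION". The helper name "session_link" is
hypothetical, and the sketch assumes the script wants to emit session
links itself.

```python
import os


def session_link(path, environ=os.environ):
    """Prefix a site-local path with CGI_SESSION (empty for crawlers)."""
    return environ.get("CGI_SESSION", "") + path


# Example with an explicit environment instead of the real one:
env = {"CGI_SESSION": "/-S:s9NNNd:doBAYNNNiaNQOtNNNNNM"}
print(session_link("/index.html", env))
# /-S:s9NNNd:doBAYNNNiaNQOtNNNNNM/index.html
```

Because "CGI_SESSION" is empty when the "User-Agent" prevents session
generation, the same code yields plain links for crawlers.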
ClickPathMachineName
ClickPathFriendlySessions
Suppose you have a shop running on "shop.tld.org" and your company
site running on "www.tld.org". The shop does its own URL based
session management but there are links from the shop to the company
site and back. Wouldn't it be nice if a customer, once he has stepped
into the shop, could click links to the company site without losing
the shopping session? This is where friendly sessions come in.
Since your shop's session management is URL based the "Referer" seen
by "www.tld.org" will be something like
https://shop.tld.org/cgi-bin/shop.pl?session=sdafsgr;clusterid=25
(if session and clusterid are passed as CGI parameters) or
https://shop.tld.org/C:25/S:sdafsgr/cgi-bin/shop.pl
(if session and clusterid are passed as URL parts) or a mixture of
both.
Assuming that "clusterid" and "session" together identify the session
on "shop.tld.org", "Apache2::ClickPath" can extract them, encode them
in its own session and place them in environment variables.
Each line in the "ClickPathFriendlySessions" section describes one
friendly site. The line consists of the friendly hostname, a list of
URL parts or CGI parameters identifying the friendly session and an
optional short name for this friend, e.g.:
shop.tld.org uri(1) param(session) shop
This means sessions at "shop.tld.org" are identified by the
combination of the first URL part after the leading slash (/) and a
CGI parameter named "session".
If now a request comes in with a "Referer" of
"http://shop.tld.org/25/bin/shop.pl?action=showbasket;session=213"
the "REMOTE_SESSION" environment variable will contain 2 lines:
25
session=213
Their order is determined by the order of "uri()" and "param()"
statements in the configuration section between the hostname and the
short name. The "REMOTE_SESSION_HOST" environment variable will
contain the host name the session belongs to.
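The extraction can be illustrated with a Python sketch that parses
one configuration line and applies it to a "Referer". The function
names are hypothetical. Splitting the query on both ";" and "&" and
returning None when a part is missing are assumptions; the module's
actual behaviour in those cases is not described here.

```python
import re
from urllib.parse import urlsplit


def parse_friend_line(line):
    """Split a line such as 'shop.tld.org uri(1) param(session) shop'."""
    host, *fields = line.split()
    extractors, short_name = [], None
    for field in fields:
        match = re.fullmatch(r"(uri|param)\((.+)\)", field)
        if match:
            extractors.append((match.group(1), match.group(2)))
        else:
            short_name = field  # the optional short name for this friend
    return host, extractors, short_name


def remote_session(referer, host, extractors):
    """Return the REMOTE_SESSION text for a Referer, or None if it does not apply."""
    parts = urlsplit(referer)
    if parts.hostname != host:
        return None
    segments = parts.path.split("/")[1:]  # URL parts after the leading slash
    params = {}
    for pair in re.split(r"[;&]", parts.query):
        key, sep, value = pair.partition("=")
        if sep:
            params.setdefault(key, value)
    lines = []
    for kind, arg in extractors:  # keeps the order of the uri()/param() statements
        if kind == "uri":
            index = int(arg) - 1
            if index >= len(segments):
                return None
            lines.append(segments[index])
        else:
            if arg not in params:
                return None
            lines.append(f"{arg}={params[arg]}")
    return "\n".join(lines)


host, extractors, name = parse_friend_line("shop.tld.org uri(1) param(session) shop")
referer = "http://shop.tld.org/25/bin/shop.pl?action=showbasket;session=213"
print(remote_session(referer, host, extractors))
# 25
# session=213
```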
Now a CGI script, a mod_perl handler or something similar can fetch
the environment and build links back to "shop.tld.org". Instead of
linking back to the shop directly, your links then point to that
script, which issues an appropriate redirect.
ClickPathFriendlySessionsFile
this directive takes a filename as argument. The file's syntax and
semantics are the same as for "ClickPathFriendlySessions". The file
is reread every time it has been changed, avoiding server restarts
after configuration changes at the price of memory consumption.
ClickPathSecret
ClickPathSecretIV
if you want to run something like a shop with our session
identifiers they must be unguessable. That means that, knowing one
valid session ID, it must be difficult to guess another one. With
these directives a significant part of the session ID is encrypted
with Blowfish in cipher block chaining mode, thus making the session
ID unguessable. "ClickPathSecret" specifies the key,
"ClickPathSecretIV" the initialization vector.
"ClickPathSecretIV" is a simple string of arbitrary length. The
first 8 bytes of its MD5 digest are used as initialization vector.
If omitted the string "abcd1234" is the IV.
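The derivation of the initialization vector from a given string can be
sketched in Python as follows. The function name is hypothetical and
UTF-8 encoding of the string is an assumption. The default used when
the directive is omitted is not covered.

```python
import hashlib


def derive_iv(iv_string):
    """First 8 bytes of the MD5 digest of the ClickPathSecretIV string."""
    return hashlib.md5(iv_string.encode("utf-8")).digest()[:8]


iv = derive_iv("any string of arbitrary length")
print(len(iv))  # 8
```

Eight bytes match the 64-bit block size of Blowfish, which is the
length a CBC initialization vector must have for this cipher.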
"ClickPathSecret" is given as "http:", "https:", "file:" or "data:"
URL. Thus the secret can be stored directly as a data-URL in the
httpd.conf, in a separate file on the local disk or on a possibly
secured server. To enable all modes of accessing the WEB the
http(s)-URL syntax is a bit extended. Maybe you have already used
"http://user:password@server.tld/...". Many browsers allow this
syntax to specify a username and password for HTTP authentication.
But what about proxies, SSL authentication etc.? Well, add another
colon (:) after the password and append a semicolon (;) delimited
list of "key=value" pairs. The special characters (@:;\) can be
quoted with a backslash (\). In fact, all characters can be quoted.
Thus, "\a" and "a" produce the same string "a".
The following keys are defined:
https_proxy
https_proxy_username
https_proxy_password
https_version
https_cert_file
https_key_file
https_ca_file
https_ca_dir
https_pkcs12_file
https_pkcs12_password
their meaning is defined in Crypt::SSLeay.
http_proxy
http_proxy_username
http_proxy_password
these are passed to LWP::UserAgent.
Remember that an HTTP proxy is accessed with the GET, POST, ...
methods whereas an HTTPS proxy is accessed with CONNECT. Don't mix
them, see Crypt::SSLeay.
Examples
ClickPathSecret https://john:a\@b\;c\::https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me
fetches the secret from
"https://secrethost.tld/bin/secret.pl?host=me" using "john" as
username and "a@b;c:" as password. The server certificate of
secrethost.tld is verified against the CA certificate found in
"/my/ca.pem".
ClickPathSecret https://::https_pkcs12_file=/my/john.p12;https_pkcs12_password=a\@b\;c\:;https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me
fetches the secret again from
"https://secrethost.tld/bin/secret.pl?host=me" using "/my/john.p12"
as client certificate with "a@b;c:" as password. The server
certificate of secrethost.tld is again verified against the CA
certificate found in "/my/ca.pem".
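To make the extended URL syntax more concrete, here is a Python sketch
that splits such a URL into user, password and "key=value" options
while honouring backslash quoting. It is an illustration only, not the
module's parser; all function names are hypothetical.

```python
import re


def split_unescaped(text, delim, maxsplit=-1):
    """Split text on delim, ignoring delimiters quoted with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the quoted pair intact for now
            i += 2
            continue
        if ch == delim and (maxsplit < 0 or len(parts) < maxsplit):
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unquote(text):
    """Remove quoting backslashes: a backslash followed by x yields x."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.S)


def parse_secret_url(url):
    """Split scheme://user:password:key=value;...@location into its parts."""
    scheme, rest = url.split("://", 1)
    pieces = split_unescaped(rest, "@", maxsplit=1)
    if len(pieces) == 1:  # no user information at all
        return {"scheme": scheme, "user": None, "password": None,
                "options": {}, "location": rest}
    userinfo, location = pieces
    fields = split_unescaped(userinfo, ":", maxsplit=2)
    fields += [""] * (3 - len(fields))
    user, password, raw_options = fields
    options = {}
    for item in split_unescaped(raw_options, ";"):
        if item:
            key, _, value = item.partition("=")
            options[unquote(key)] = unquote(value)
    return {"scheme": scheme, "user": unquote(user),
            "password": unquote(password), "options": options,
            "location": location}


first = parse_secret_url(
    r"https://john:a\@b\;c\::https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me"
)
print(first["user"], first["password"], first["options"])
# john a@b;c: {'https_ca_file': '/my/ca.pem'}
```

Splitting happens on the still-quoted text and the backslashes are
removed only afterwards, so a quoted "@", ":" or ";" never acts as a
delimiter.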