Apache2-ClickPath

(if session and clusterid are passed as URL parts) or something mixed.

Assuming that C<clusterid> and C<session> both identify the session on
C<shop.tld.org>, C<Apache2::ClickPath> can extract them, encode them in its
own session and place them in environment variables.

Each line in the C<ClickPathFriendlySessions> section describes one friendly
site. The line consists of the friendly hostname, a list of URL parts or
CGI parameters identifying the friendly session and an optional short name
for this friend, e.g.:

 shop.tld.org uri(1) param(session) shop

This means sessions at C<shop.tld.org> are identified by the combination
of the 1st URL part after the leading slash (/) and a CGI parameter named
C<session>.

If a request now comes in with a C<Referer> of
C<http://shop.tld.org/25/bin/shop.pl?action=showbasket;session=213>
the C<REMOTE_SESSION> environment variable will contain 2 lines:

 25
 session=213

Their order is determined by the order of C<uri()> and C<param()> statements
in the configuration section between the hostname and the short name. The
C<REMOTE_SESSION_HOST> environment variable will contain the host name the
session belongs to.

Now a CGI script, a mod_perl handler or something similar can fetch the
environment and build links back to C<shop.tld.org>. Instead of linking
directly back to the shop, your links then point to that script, which
issues an appropriate redirect.
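
As a minimal sketch of such a script (it is not part of
C<Apache2::ClickPath>), the following helper rebuilds a link from
C<REMOTE_SESSION> and C<REMOTE_SESSION_HOST>. It assumes that lines
containing C<=> stem from C<param()> statements and all other lines from
C<uri()> statements. The function name C<friendly_url> and the target path
C<bin/shop.pl> are hypothetical.

 use strict;
 use warnings;

 # Build a link back to the friendly site. $env is a hash reference such as
 # \%ENV, $path a (hypothetical) path below the session-specific URL parts.
 sub friendly_url {
     my ($env, $path) = @_;
     my $host    = $env->{REMOTE_SESSION_HOST};
     my $session = $env->{REMOTE_SESSION};
     return unless defined $host and defined $session;

     my (@uri_parts, @params);
     for my $line (split /\n/, $session) {
         # Assumption: "name=value" lines come from param(), others from uri().
         if ($line =~ /=/) { push @params, $line }
         else              { push @uri_parts, $line }
     }

     my $url = "http://$host/" . join('/', @uri_parts, $path);
     $url .= '?' . join(';', @params) if @params;
     return $url;
 }

 # In a CGI script the redirect could then be issued like this:
 #   my $url = friendly_url(\%ENV, 'bin/shop.pl');
 #   print "Status: 302 Found\nLocation: $url\n\n" if defined $url;

With the C<Referer> from the example above, the helper would yield a link
that again carries C<25> as first URL part and C<session=213> as parameter.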

=item B<ClickPathFriendlySessionsFile>

This directive takes a filename as its argument. The file's syntax and
semantics are the same as for C<ClickPathFriendlySessions>. The file is reread
every time it has been changed, avoiding server restarts after configuration
changes at the price of memory consumption.
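
For illustration, a configuration using this directive might look as
follows. The path C</etc/apache2/friendly_sessions> is a hypothetical
example:

 ClickPathFriendlySessionsFile /etc/apache2/friendly_sessions

The file itself would then contain lines such as the one shown for
C<ClickPathFriendlySessions> above:

 shop.tld.org uri(1) param(session) shop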

=item B<ClickPathSecret>

=item B<ClickPathSecretIV>

If you want to run something like a shop with our session identifiers, they
must be unguessable. That means that knowing one valid session ID must not
make it easy to guess another. With these directives a significant part
of the session ID is encrypted with Blowfish in cipher block chaining (CBC)
mode, thus making the session ID unguessable. C<ClickPathSecret> specifies
the key, C<ClickPathSecretIV> the initialization vector.

C<ClickPathSecretIV> is a simple string of arbitrary length. The first 8
bytes of its MD5 digest are used as the initialization vector. If omitted, the
string C<abcd1234> is the IV.
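
The derivation described above can be sketched with L<Digest::MD5>. This is
an illustration, not the module's own code. The function name
C<iv_from_string> is hypothetical, and the sketch assumes that the default
C<abcd1234> is used directly as the 8-byte IV rather than being digested.

 use strict;
 use warnings;
 use Digest::MD5 qw(md5);

 # Return the 8-byte IV for a given ClickPathSecretIV string.
 sub iv_from_string {
     my ($string) = @_;
     # Assumption: without a configured string the literal default is the IV.
     return 'abcd1234' unless defined $string;
     # md5() returns the 16-byte binary digest; keep its first 8 bytes.
     return substr(md5($string), 0, 8);
 }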

C<ClickPathSecret> is given as an C<http:>, C<https:>, C<file:> or C<data:> URL.
Thus the secret can be stored directly as a data URL in the httpd.conf, in a
separate file on the local disk, or on a possibly secured server. To enable
all modes of accessing the web, the http(s) URL syntax is slightly extended.
You may already have used C<http://user:password@server.tld/...>. Many
browsers allow this syntax to specify a username and password for HTTP
authentication. But what about proxies, SSL authentication etc.? Simply add
another colon (:) after the password and append a semicolon (;) delimited
list of C<key=value> pairs. The special characters (@:;\) can be quoted
with a backslash (\). In fact, all characters can be quoted. Thus, C<\a> and
C<a> produce the same string C<a>.
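
The quoting rules can be illustrated with a small parser for the part of
the URL in front of the unquoted C<@>. This is a sketch under the rules
stated above, not the parser used by C<Apache2::ClickPath>. All function
names are hypothetical.

 use strict;
 use warnings;

 # Remove one level of backslash quoting: "\a" becomes "a", "\@" becomes "@".
 sub unquote {
     my ($s) = @_;
     $s =~ s/\\(.)/$1/gs;
     return $s;
 }

 # Split $s at unquoted occurrences of the single character $delim.
 # With $limit set, at most $limit fields are returned.
 sub split_unquoted {
     my ($delim, $s, $limit) = @_;
     my @parts = ('');
     while ($s =~ /\G(\\.|.)/gs) {
         my $tok = $1;          # either a quoted pair or a single character
         if ($tok eq $delim and (!$limit or @parts < $limit)) {
             push @parts, '';
         }
         else {
             $parts[-1] .= $tok;
         }
     }
     return @parts;
 }

 # Parse "user:password:key=value;key=value" into its components.
 sub parse_userinfo {
     my ($userinfo) = @_;
     my ($user, $pass, $opts) = split_unquoted(':', $userinfo, 3);
     my %opt;
     for my $pair (split_unquoted(';', defined $opts ? $opts : '')) {
         next unless length $pair;
         my ($k, $v) = split_unquoted('=', $pair, 2);
         $opt{ unquote($k) } = unquote(defined $v ? $v : '');
     }
     return (unquote($user), unquote(defined $pass ? $pass : ''), \%opt);
 }

 # Usage:
 #   my ($user, $pass, $opt) =
 #       parse_userinfo('john:a\@b\;c\::https_ca_file=/my/ca.pem');

Splitting first and unquoting afterwards keeps quoted delimiters such as
C<\:> inside the field they belong to.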

The following keys are defined:

=over 2

=item B<https_proxy>

=item B<https_proxy_username>

=item B<https_proxy_password>

=item B<https_version>

=item B<https_cert_file>

=item B<https_key_file>

=item B<https_ca_file>

=item B<https_ca_dir>

=item B<https_pkcs12_file>

=item B<https_pkcs12_password>

Their meaning is defined in L<Crypt::SSLeay>.

=item B<http_proxy>

=item B<http_proxy_username>

=item B<http_proxy_password>

These are passed to L<LWP::UserAgent>.

Remember that an HTTP proxy is accessed with the GET, POST, ... methods,
whereas an HTTPS proxy is accessed with CONNECT. Do not mix them; see
L<Crypt::SSLeay>.

=back

B<Examples>

 ClickPathSecret https://john:a\@b\;c\::https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me

fetches the secret from C<https://secrethost.tld/bin/secret.pl?host=me>
using C<john> as username and C<a@b;c:> as password. The server certificate
of secrethost.tld is verified against the CA certificate found in
C</my/ca.pem>.

 ClickPathSecret https://::https_pkcs12_file=/my/john.p12;https_pkcs12_password=a\@b\;c\:;https_ca_file=/my/ca.pem@secrethost.tld/bin/secret.pl?host=me

fetches the secret again from C<https://secrethost.tld/bin/secret.pl?host=me>
using C</my/john.p12> as client certificate with C<a@b;c:> as password.
The server certificate of secrethost.tld is again verified against the CA
certificate found in C</my/ca.pem>.

 ClickPathSecret data:,password:very%20secret%20password

Here a data URL is used that produces the content
C<password:very secret password>.

The URL's content is fetched by L<LWP::UserAgent> once at server startup.

Its content defines the secret either in binary form, as a string of
hexadecimal characters, or as a password. If it starts with C<binary:>, the
rest of the content is taken as is as the key. If it starts with C<hex:>,
C<pack( 'H*', $arg )> is used to convert it to binary. If it starts with
C<password:> or with none of these prefixes, the MD5 digest of the rest of
the content is used as the secret.

The Blowfish algorithm allows up to 56 bytes as secret. In hex and binary
mode the first 56 bytes are used. You can specify more bytes, but they
are ignored. In password mode the MD5 algorithm produces
a 16-byte secret.
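
The three modes can be sketched as follows. This is an illustration of the
rules above and not the module's own code. The function name
C<secret_from_content> is hypothetical.

 use strict;
 use warnings;
 use Digest::MD5 qw(md5);

 # Turn the fetched content into the binary Blowfish key.
 sub secret_from_content {
     my ($content) = @_;
     my $key;
     if ($content =~ s/^binary://) {
         $key = $content;                 # taken as is
     }
     elsif ($content =~ s/^hex://) {
         $key = pack 'H*', $content;      # hexadecimal digits to bytes
     }
     else {
         $content =~ s/^password://;      # the prefix is optional
         $key = md5($content);            # 16-byte binary digest
     }
     return substr($key, 0, 56);          # Blowfish uses at most 56 bytes
 }

 # Usage:
 #   my $key = secret_from_content('password:very secret password');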

=back

=head2 Working with a load balancer

Most load balancers are able to map a request to a particular machine
based on a part of the request URI. They look for a prefix followed
by a given number of characters or until a suffix is found. The string
between identifies the machine to route the request to.

The name set with C<ClickPathMachine> can be used by a load balancer.
It immediately follows the session prefix and is terminated by a single
colon. The default name is always 6 bytes long.
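
The extraction a load balancer performs can be sketched in Perl as follows.
This is an illustration only. It assumes that the session part is the first
URI segment, and the prefix C<-S:>, the machine name C<web001> and the
function name C<machine_from_uri> are hypothetical; substitute the values of
your own configuration.

 use strict;
 use warnings;

 # Return the machine name that follows the session prefix, or undef.
 sub machine_from_uri {
     my ($uri, $prefix) = @_;
     # The machine name starts right after the prefix and ends at the
     # first colon.
     return $uri =~ m{^/\Q$prefix\E([^:/]*):} ? $1 : undef;
 }

 # Usage:
 #   my $machine = machine_from_uri('/-S:web001:abcdef/index.html', '-S:');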

=head2 Logging

The most important part of user tracking and clickstreams is logging.
With C<Apache2::ClickPath> many request URIs contain an initial session part.
Thus, to logfile analyzers most requests look unique, which leads to
useless results. Normally Apache's common logfile format starts with
 %h %l %u %t \"%r\"

C<%r> stands for I<the request>. It is the first line a browser sends to
a server. For use with C<Apache2::ClickPath>, C<%r> is better replaced by
C<%m %U%q %H>. Since C<Apache2::ClickPath> strips the session part from
the current URI, C<%U> appears without the session. With this modification
logfile analyzers will produce meaningful results again.

The session can be logged as C<%{SESSION}e> at the end of a logfile line.
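
Put together, a log configuration along these lines might look as follows.
This is a sketch based on Apache's common log format; the format nickname
C<clickpath> and the log file path are hypothetical.

 LogFormat "%h %l %u %t \"%m %U%q %H\" %>s %b %{SESSION}e" clickpath
 CustomLog logs/access_log clickpath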

=head2 A word about proxies

Depending on your content and your user community, HTTP proxies can
serve a significant part of your traffic. With C<Apache2::ClickPath>
almost all requests have to be served by your server.

=head2 Debugging

Sometimes it is useful to know the information encoded in a session
identifier. This is why L<Apache2::ClickPath::Decode> exists.

=head1 SEE ALSO

L<Apache2::ClickPath::Store>,
L<Apache2::ClickPath::StoreClient>,
L<Apache2::ClickPath::Decode>,
L<http://perl.apache.org>,
L<http://httpd.apache.org>

=head1 AUTHOR

Torsten Foertsch, E<lt>torsten.foertsch@gmx.netE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2004-2005 by Torsten Foertsch

This library is free software; you can redistribute it and/or modify


