Apache-Dynagzip

Dynagzip.pm  view on Meta::CPAN

	}
	return $headers;
}

sub send_lightly_compressed_stream { # call model: send_lightly_compressed_stream($r, $fh);
	# Transfer the stream from filehandle $fh to standard output
	# using "blank-space compression" only.
	#
	my $r = shift;
	my $fh = shift;
	my $body = ''; # incoming content
	my $buf;
	my $lbr = Compress::LeadingBlankSpaces->new();
	while (defined($buf = <$fh>)){
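		$buf = $lbr->squeeze_string ($buf);
		if (defined($buf) and length($buf) > 0) { # test the length: a line such as "0" is false but must still be sent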
			$body .= $buf;
			print ($buf);
		}
	}
	return $body;
}
sub send_lightly_compressed_stream_chunked { # call model: send_lightly_compressed_stream_chunked($r, $fh, $minChSize);
	# Transfer the stream chunked from filehandle $fh to standard output
	# using "blank-space compression" only.
	#
	my $r = shift;
	my $fh = shift;
	my $minChunkSizePP = shift;
	my $body = ''; # incoming content
	my $buf;
	my $chunkBody = '';
	my $lbr = Compress::LeadingBlankSpaces->new();

	while (defined($buf = <$fh>)){
		$buf = $lbr->squeeze_string ($buf);
		if (length($buf) > 0){
			$chunkBody .= $buf;
		}
		if (length($chunkBody) > $minChunkSizePP){ # send it...
			$body .= $chunkBody;
			print (chunk_out($chunkBody));
			$chunkBody = '';
		}
	}
	if (length($chunkBody) > 0){ # send it...
		$body .= $chunkBody;
		print (chunk_out($chunkBody));
		$chunkBody = '';
	}
	return $body;
}

sub chunkable { # call model: $var = chunkable($r);
	# Check if the response could be chunked
	#
	my $r = shift;
	my $result = undef;
	# this is to downgrade to HTTP/1.0 for MSIE requests over SSL
	# works in conjunction with this snippet from httpd.conf:
	# SetEnvIf User-Agent ".*MSIE.*" \
	# nokeepalive ssl-unclean-shutdown \
	# downgrade-1.0 force-response-1.0
	#
	if ( $ENV{'downgrade-1.0'} or $ENV{'force-response-1.0'} ) {
		$result = 0;
	} elsif ($r->protocol =~ /http\/1\.(\d+)/io) {
		# any HTTP/1.X is OK, just X==0 will be evaluated to FALSE in result
		$result = $1;
	}
	return $result;
}

sub chunk_out { # call model: my $chunk = chunk_out ($string);
	my $HttpEol = "\015\012";  # HTTP end of line marker (see RFC 2068)
	my $source = shift;
	return  sprintf("%x",length($source)).$HttpEol.$source.$HttpEol;
}

sub kill_over_env { # just to clean up the unnecessary environment
	delete($ENV{HISTSIZE});
	delete($ENV{HOSTNAME});
	delete($ENV{LOGNAME});
	delete($ENV{HISTFILESIZE});
	delete($ENV{SSH_TTY});
	delete($ENV{MAIL});
	delete($ENV{MACHTYPE});
	delete($ENV{TERM});
	delete($ENV{HOSTTYPE});
	delete($ENV{OLDPWD});
	delete($ENV{HOME});
	delete($ENV{INPUTRC});
	delete($ENV{SUDO_GID});
	delete($ENV{SHELL});
	delete($ENV{SUDO_UID});
	delete($ENV{USER});
	delete($ENV{SUDO_USER});
	delete($ENV{SSH_CLIENT});
	delete($ENV{OSTYPE});
	delete($ENV{PWD});
	delete($ENV{SHLVL});
	delete($ENV{SUDO_COMMAND});
	delete($ENV{_});
	delete($ENV{HTTP_CONNECTION});
}

sub cgi_headers_from_script {
	# boolean function: true when the handler is configured to retrieve CGI headers from the script.
	# 
	# Could it be possible to have Content-Type coming from the previous filter?
	# call model: my $condition = cgi_headers_from_script($r);
	my $r = shift;
	my $res = lc $r->dir_config('UseCGIHeadersFromScript') eq 'on';
	return $res;
}

sub handler { # it is supposed to be only a dispatcher since now...

	my $r = shift;
	my $HttpEol = "\015\012";  # HTTP end of line marker (see RFC 2068)
	my $fh = undef; # will be the reference to the incoming data stream



I<... So I am basically looking for anyone who has had any success in achieving this
kind of "streaming" compression, who could direct me at an appropriate Apache module.">

=for html
</blockquote>

Unfortunately for him, C<Apache::Dynagzip> was not yet publicly available at that time.

Since its release, this handler has been especially useful when one needs to compress outgoing
web content that is generated dynamically on the fly from templates,
DB data, XML, etc., and when the length of the response cannot be
determined at the time of the request.

A content provider benefits additionally from the fact that the handler begins transmitting
compressed data concurrently with further document creation.
At the same time, the internal buffer inside the
handler prevents Apache from creating chunks that are too short over C<HTTP/1.1>.

In order to simplify the use of this handler on public/open-source sites,
content compression over HTTP/1.0 has been supported since version 0.06.
This helps to avoid dynamic invocation of other Apache handlers
for the content generation phase.

=head2 Acknowledgments

Thanks to Tom Evans, Valerio Paolini, and Serge Bizyayev for their valuable ideas and extensive testing.
Thanks to Igor Sysoev and Henrik Nordstrom, who helped me to better understand the HTTP/1.0 compression features.
Thanks to Vlad Jebelev for the patch that helps to survive a possible dynamic Apache downgrade
from HTTP/1.1 to HTTP/1.0 (especially when serving MSIE requests over SSL).
Thanks to Rob Bloodgood and Damyan Ivanov for the patches that help to eliminate some unnecessary warnings.
Thanks to John Siracusa for the hint that helps to control the content type properly.
Thanks to Richard Chen for the bug report concerning some uncompressed responses.

Obviously, I take full responsibility for how all those contributions are implemented.

=head1 DESCRIPTION

The main purpose of this package is to serve the C<content generation phase> within mod_perl-enabled
C<Apache 1.3.X>, providing dynamic on-the-fly compression of outgoing web content.
This is done through the C<zlib> library via the C<Compress::Zlib> Perl interface,
serving both C<HTTP/1.0> and C<HTTP/1.1> requests from clients/browsers
that are capable of understanding the C<gzip> format and decompressing it on the fly.
This handler never applies C<gzip> to content for clients/browsers
that do not declare the ability to decompress the C<gzip> format.
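
The following is only a minimal sketch of what C<gzip> compression through C<Compress::Zlib> looks like for a body that is already complete in memory. It is not the handler's own streaming code; the helper name C<gzip_body> and the sample content are assumptions made for illustration.

```perl
use strict;
use warnings;
use Compress::Zlib;    # provides Compress::Zlib::memGzip

# Hypothetical helper (not part of Apache::Dynagzip):
# gzip a complete response body held in memory.
sub gzip_body {
    my ($body) = @_;
    my $gz = Compress::Zlib::memGzip($body);
    die "gzip compression failed\n" unless defined $gz;
    return $gz;
}

# Sample content, invented for the illustration.
my $html = "<html><body>" . ("hello world " x 200) . "</body></html>";
my $gz   = gzip_body($html);
printf "original: %d bytes, gzipped: %d bytes\n", length($html), length($gz);
```

The handler itself compresses a stream of unknown length, so it cannot rely on a one-shot call like this; the sketch only shows the library interface involved.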

In fact, this handler mainly serves as a kind of
customizable filter of outbound web content for C<Apache 1.3.X>.

This handler is supposed to be used within the C<Apache::Filter> chain, mostly in order to serve
outgoing content that is dynamically generated on the fly by Perl and/or Java.
It can also serve regular CGI binaries (written in C, for example)
as a standalone handler outside the C<Apache::Filter> chain.
As an extra option, this handler can be used to dynamically compress huge static
files and to transfer the gzipped content as a stream back to the
client browser. For this last purpose, the C<Apache::Dynagzip> handler should likewise be configured as
a standalone handler outside the C<Apache::Filter> chain.

When working over C<HTTP/1.0>, this handler indicates the end of the data stream by closing the connection.
Over C<HTTP/1.1>, by contrast, the outgoing data is compressed within a chunked outgoing stream,
keeping the connection alive. Reasonable control over the chunk size is provided in this case.
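
As a rough sketch of the chunked framing described above: each chunk is the payload length in hexadecimal, CRLF, the payload, and CRLF, and an C<HTTP/1.1> chunked body ends with a zero-length chunk. The helper names C<frame_chunk>, C<last_chunk>, and C<chunk_stream> are hypothetical; the buffering rule (send once the pending data exceeds a minimum size) mirrors the C<send_lightly_compressed_stream_chunked> routine shown earlier.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";    # HTTP end-of-line marker

# Frame one chunk: hex length, CRLF, payload, CRLF.
sub frame_chunk {
    my ($data) = @_;
    return sprintf("%x", length $data) . $CRLF . $data . $CRLF;
}

# The zero-length chunk that terminates a chunked body.
sub last_chunk {
    return "0" . $CRLF . $CRLF;
}

# Hypothetical helper: collect small pieces until more than
# $min_size bytes are pending, then emit them as one chunk.
sub chunk_stream {
    my ($min_size, @pieces) = @_;
    my ($pending, $out) = ('', '');
    for my $piece (@pieces) {
        $pending .= $piece;
        if (length($pending) > $min_size) {
            $out .= frame_chunk($pending);
            $pending = '';
        }
    }
    $out .= frame_chunk($pending) if length $pending;    # flush the rest
    return $out . last_chunk();
}
```

Buffering until a minimum size is reached is what keeps the per-chunk overhead (the length line and two CRLF pairs) small relative to the payload.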

In order to better serve older web clients,
an C<extra light> compression is provided independently to remove
unnecessary leading blank spaces and/or blank lines
from the outgoing web content. This C<extra light> compression can be combined with
the main C<gzip> compression when necessary.

The list of features of this handler includes:

=over 4

=item ·
Support for both HTTP/1.0 and HTTP/1.1 requests.

=item ·
Reasonable control over the size of content chunks for HTTP/1.1.

=item ·
Support for Perl, Java, or C/C++ CGI applications in order to provide dynamic on-the-fly compression of outbound content.

=item ·
Optional C<extra light> compression for all browsers, including older ones that are incapable of decompressing gzipped content.

=item ·
Optional control over the duration of the content's life in client/proxy local cache.

=item ·
Limited control over the proxy caching.

=item ·
Optional support for server-side caching of dynamically generated content.

=back

=head2 Compression Features

C<Apache::Dynagzip> provides content compression for both C<HTTP/1.0> and C<HTTP/1.1>
in accordance with the type of the initial request.

This handler can apply two types of compression to outgoing content:

  - extra light compression
  - gzip compression

These compressions can be applied independently or in combination.

The C<extra light> compression removes leading blank spaces and/or blank lines
from the outgoing web content. It is intended for text (ASCII) data types such as C<html>,
C<JavaScript>, C<css>, etc. The C<extra light> compression is turned off
by default. It can be turned on with the statement

  PerlSetVar LightCompression On

in C<httpd.conf>. The value "On" is case-insensitive.
Any other value turns the C<extra light> compression off.
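
The idea can be sketched in plain Perl as follows. This is only a regex-based stand-in for illustration: the module itself delegates the work to C<Compress::LeadingBlankSpaces>, whose exact rules may differ, and the helper names C<is_on> and C<squeeze_line> are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a PerlSetVar-style value.
# Only "on" (in any letter case) enables the feature.
sub is_on {
    my ($value) = @_;
    return (defined $value && lc($value) eq 'on') ? 1 : 0;
}

# Hypothetical stand-in for the "extra light" compression of one line:
# strip leading spaces and tabs, and drop lines that end up empty.
sub squeeze_line {
    my ($line) = @_;
    $line =~ s/^[ \t]+//;                   # remove leading blank space
    return '' if $line =~ /^\r?\n?\z/;      # a blank line contributes nothing
    return $line;
}

my @input  = ("   <ul>\n", "\t<li>item</li>\n", "  \n", "</ul>\n");
my $output = join '', map { squeeze_line($_) } @input;
print $output;
```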

The main C<gzip> format is described in RFC 1952.
This type of compression is applied when the client is recognized as capable
of decompressing the C<gzip> format on the fly. In this version the decision depends
on whether the client sends the C<Accept-Encoding: gzip> HTTP header with the request.
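
A check of this kind might look like the sketch below. The function name C<client_accepts_gzip> is hypothetical, and treating C<gzip;q=0> as a refusal is an assumption added here for illustration; the text above only states that the decision depends on the presence of C<Accept-Encoding: gzip>.

```perl
use strict;
use warnings;

# Hypothetical helper: decide from the raw Accept-Encoding value
# whether the client declares support for gzip.
sub client_accepts_gzip {
    my ($accept_encoding) = @_;
    return 0 unless defined $accept_encoding;
    for my $token (split /,/, lc $accept_encoding) {
        $token =~ s/^\s+|\s+$//g;                       # trim the token
        my ($coding, @params) = split /\s*;\s*/, $token;
        next unless defined $coding && $coding eq 'gzip';
        # Assumption: a quality value of zero means "do not send gzip".
        return 0 if grep { /^q=0(?:\.0{0,3})?$/ } @params;
        return 1;
    }
    return 0;
}

for my $header ('gzip, deflate', 'deflate', 'gzip;q=0') {
    printf "%-15s => %s\n", $header,
        client_accepts_gzip($header) ? 'gzip' : 'no gzip';
}
```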


When serving a CGI binary, this version of the handler is CGI/1.1 compatible.
It accepts CGI headers from the binary and produces the set of required HTTP headers,
followed by the gzipped content.

=head2 POST Request Features

POST requests for a CGI binary have to be served with special care,
because in this case the handler stands alone and has to serve
the data flow in both directions
while C<stdin> is tied to
Apache and cannot be exposed to the CGI binary transparently.

To solve the problem, I replace the POST request with a GET request internally,
performing the required transformations of the incoming data on the fly.

This could cause a problem when you have a huge incoming stream from your client (more than 4K bytes).
Another problem could appear if your CGI binary distinguishes between POST and GET requests internally.

=head2 Control over the Client Cache

Control over the lifetime of the response in the client's cache is provided
through the C<Expires> HTTP header:

The Expires entity-header field gives the date/time after which the response should be considered stale.
A stale cache entry may not normally be returned by a cache (either a proxy cache or an user agent cache)
unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy
of the entity). The format is an absolute date and time as defined by HTTP-date in section 3.3;
it MUST be in rfc1123-date format:

C<Expires = "Expires" ":" HTTP-date>

This handler creates the C<Expires> HTTP header by adding C<pageLifeTime> to the date-time
of the request. The internal variable C<pageLifeTime> has the default value

  pageLifeTime = 300 # sec.

which can be overridden in C<httpd.conf>, for example as:

  PerlSetVar pageLifeTime 1800

to make the C<pageLifeTime = 30 minutes>.
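
The computation can be sketched as below, assuming the request time is available as an epoch value. The helper names C<http_date> and C<expires_value> are hypothetical, and the day and month names are hard-coded so that the result does not depend on the server's locale.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an rfc1123-date, always in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# Hypothetical helper: Expires value = request time + pageLifeTime,
# falling back to the documented default of 300 seconds.
sub expires_value {
    my ($request_time, $page_life_time) = @_;
    $page_life_time = 300 unless defined $page_life_time;
    return http_date($request_time + $page_life_time);
}

print "Expires: ", expires_value(time(), 1800), "\n";
```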

During this lifetime the client (browser) will
not even try to access the server when the user requests the same URL again.
Instead, it reloads the page from the local cache.

It is important to point out that all initial JavaScript is still re-executed when the page is loaded from the cache,
so you can rotate your advertisements and dynamic content when needed.

A second important point: when the user clicks the "Refresh" button, the
browser reloads the page from the server unconditionally. This is the right behavior,
because it is exactly what a human user expects from the "Refresh" button.

=over 4

=item Notes:

The lifetime defined by C<Expires> depends on the accuracy of the time settings on the client machine.
If the client's local clock is running 1 hour behind, the cached copy of
the page will stay alive 60 minutes longer on that machine.

C<Apache::Dynagzip> never overwrites an C<Expires> header set by an earlier handler in the filter chain.

=back

=head2 Support for the Server-Side Cache

In order to support the Server-Side Cache,
I place a reference to the dynamically generated document into C<notes()>
when Server-Side Cache support is requested.
The referenced document may already be compressed with
the C<extra light> compression (if the C<extra light> compression is in effect for the current request).

In this case the regular dynamic C<gzip> compression takes place as usual
and the effective C<gzip> compression is supposed to take place within the C<log> stage
of the request processing flow.

You usually do not need to care about this feature of C<Apache::Dynagzip>
unless you use it in your own chain of handlers for the various phases of request processing.

=head2 Control over the Proxy Cache

Control over a (possible) proxy cache is provided through the C<Vary>
HTTP header.
Within C<Apache::Dynagzip> this header is governed by a few simple rules:

=over 4

=item *

C<Apache::Dynagzip> never generates this header unless C<gzip> compression is provided.

=item *

The value of C<Accept-Encoding> is always provided for this header, accompanying C<gzip> compression.

=item *

Advanced control over the proxy cache has been provided since version 0.07
through an optional extension of the C<Vary> HTTP header.
This extension can be placed into your configuration file using the directive

C<PerlSetVar Vary E<lt>valueE<gt>>

In particular, it can be helpful for marking content that depends on conditions
other than just the compression features.
For example, when the content is personalized, one might wish to use
the "*" C<Vary> extension in order to prevent any proxy caching.

When the outgoing content is gzipped, this extension is appended to the regular C<Vary> header,
as in the following example:

Using the following fragment within the C<httpd.conf>:

  PerlModule Apache::Dynagzip
  <Files ~ "*\.html">
    SetHandler perl-script
    PerlHandler Apache::Dynagzip
    PerlSetVar LightCompression On
    PerlSetVar Vary *


