\bsay[ (] results from the CPAN

\bsay[ (]
Apache-Filter
view release on metacpan or search on metacpan
Currently the following public modules are Filter-aware.  Please tell
me of others you know about.

 Apache::Registry (using Apache::RegistryFilter, included here)
 Apache::SSI
 Apache::ASP
 HTML::Mason
 Apache::SimpleReplace
 Text::Forge

=head1 METHODS

Apache::Filter is a subclass of Apache, so all Apache methods are available.

This module doesn't create an Apache handler class of its own - rather, it adds some
methods to the Apache:: class.  Thus, it's really a mix-in package
that just adds functionality to the $r request object.

=over 4

=item * $r = $r->filter_register()

Every Filter-aware module must call this method exactly once, so that
Apache::filter can properly rotate its filters from previous handlers,
and so it can know when the output should eventually go to the
browser.

=item * $r->filter_input()

This method will give you a filehandle that contains either the file 
requested by the user ($r->filename), or the output of a previous filter.
If called in a scalar context, that filehandle is all you'll get back.  If
called in a list context, you'll also get an Apache status code (OK, 
NOT_FOUND, or FORBIDDEN) that tells you whether $r->filename was successfully
found and opened.  If it was not, the filehandle returned will be undef.

=item * $r->changed_since($time)

Returns true or false based on whether the current input seems like it 
has changed since C<$time>.  Currently the criteria to figure this out
is this: if the file pointed to by C<$r-E<gt>finfo> hasn't changed since
the time given, and if all previous filters in the chain are deterministic
(see below), then we return false.  Otherwise we return true.

This method is meant to be useful in implementing caching schemes.

A caution: always call the C<changed_since()> and C<deterministic()> methods
B<AFTER> the C<filter_register()> method.  This is because Apache::Filter uses a 
crude counting method to figure out which handler in the chain is currently 
executing, and calling these routines out of order messes up the counting.

=item * $r->deterministic(1|0);

As of version 0.07, the concept of a "deterministic" filter is supported.  A
deterministic filter is one whose output is entirely determined by the contents
of its input file (whether the $r->filename file or the output of another filter),
and doesn't depend at all on outside factors.  For example, a filter that translates
all its output to upper-case is deterministic, but a filter that adds a date
stamp to a page, or looks things up in a database which may vary over time, is not.

Why is this a big deal?  Let's say you have the following setup:

 <Files ~ "\.boffo$">
  SetHandler perl-script
  PerlSetVar Filter On
  PerlHandler Apache::FormatNumbers Apache::DoBigCalculation
  # The above are fake modules, you get the idea
 </Files>

Suppose the FormatNumbers module is deterministic, and the DoBigCalculation module
takes a long time to run.  The DoBigCalculation module can now cache its results,
so that when an input file is unchanged on disk, its results will remain known
when passed through the FormatNumbers module, and the DoBigCalculation module
will be able to used cached results from a previous run.

The guts of the modules would look something like this:

 sub Apache::FormatNumbers::handler {
    my $r = shift;
    $r->content_type("text/html");
    my ($fh, $status) = $r->filter_input();
    return $status unless $status == OK;
    $r->deterministic(1); # Set to true; default is false
    
    # ... do some formatting, print to STDOUT
    return OK;
 }
 
 sub Apache::DoBigCalculation::handler {
    my $r = shift;
    $r->content_type("text/html");
    my ($fh, $status) = $r->filter_input();
    return $status unless $status == OK;
    
    # This module implements a caching scheme by using the 
    # %cache_time and %cache_content hashes.
    my $time = $cache_time{$r->filename};
    my $output;
    if ($r->changed_since($time)) {
        # Read from <$fh>, perform a big calculation on it, and print to STDOUT
    } else {
        print $cache_content{$r->filename};
    }
    
    return OK;
 }

A caution: always call the C<changed_since()> and C<deterministic()> methods
B<AFTER> the C<filter_register()> method.  This is because Apache::Filter uses a 
crude counting method to figure out which handler in the chain is currently 
executing, and calling these routines out of order messes up the counting.


=back


=head1 HEADERS

In previous releases of this module, it was dangerous to call
$r->send_http_header(), because a previous/postvious filter might also
try to send headers, and then you'd have duplicate headers getting
sent.  In current releases you can simply send the headers.  If the
current filter is the last filter, the headers will be sent as usual,
and otherwise send_http_header() is a no-op.

=head1 NOTES

You'll notice in the SYNOPSIS that I say C<"PerlSetVar Filter On">.  That
information isn't actually used by this module, it's used by modules which
are themselves filters (like Apache::SSI).  I hereby suggest that filtering
modules use this parameter, using it as the switch to detect whether they 
should call $r->filter_register.  However, it's often not necessary -
there is very little overhead in simply calling $r->filter_register
even when you don't need to do any filtering, and $r->filter_input can
be a handy way of opening the $r->filename file.

VERY IMPORTANT: if one handler in a stacked handler chain uses 
C<Apache::Filter>, then THEY ALL MUST USE IT.  This means they all must
call $r->filter_register exactly once.  Otherwise C<Apache::Filter> couldn't
capture the output of the handlers properly, and it wouldn't know when
to release the output to the browser.

The output of each filter (except the last) is accumulated in memory
before it's passed to the next filter, so memory requirements are
large for large pages.  Apache::OutputChain only needs to keep one
item from print()'s argument list in memory at a time, so it doesn't
have this problem, but there are others (each chunk is filtered
independently, so content spanning several chunks won't be properly
parsed).  In future versions I might find a way around this, or cache
large pages to disk so memory requirements don't get out of hand.
We'll see whether it's a problem.

A couple examples of filters are provided with this distribution in the t/
subdirectory: UC.pm converts all its input to upper-case, and Reverse.pm
prints the lines of its input reversed.

Finally, a caveat: in version 0.09 I started explicitly setting the
Content-Length to undef.  This prevents early
filters from incorrectly setting the content length, which will almost
certainly be wrong if there are any filters after it.  This means that
if you write any filters which set the content length, they should do
it B<after> the $r->filter_register call.

=head1 TO DO

Add a buffered mode to the final output, so that we can send a proper
Content-Length header. [gozer@hbesoftware.com (Philippe M. Chiasson)]

=head1 BUGS

This uses some funny stuff to figure out when the currently executing
handler is the last handler in the chain.  As a result, code that
manipulates the handler list at runtime (using push_handlers and the
like) might produce mayhem.  Poke around a bit in the code before you 
try anything.  Let me know if you have a better idea.

As of 0.07, Apache::Filter will automatically return DECLINED when
$r->filename points to a directory.  This is just because in most
cases this is what you want to do (so that mod_dir can take care of
the request), and because figuring out the "right" way to handle
directories seems pretty tough - the right way would allow a directory
indexing handler to be a filter, which isn't possible now.  Also, you
can't properly pass control to a non-mod_perl indexer like
mod_autoindex.  Suggestions are welcome.

I haven't considered what will happen if you use this and you haven't
turned on PERL_STACKED_HANDLERS.  So don't do it.
( run in 2.338 seconds using v1.01-cache-2.11-cpan-524268b4103 )