See also L<IO::Handle::Util> to turn anything iterator-like into an
IO::Handle-like object.

=head3 How should a server decide whether to switch to sendfile(2) based serving?

First of all, an application SHOULD always set an IO::Handle-like
object (or an array of chunks) that responds to C<getline> and
C<close> as a body. That is guaranteed to work with any server.

Optionally, if the server is written in Perl or can pass a file
descriptor number to C-land code to serve the file, then the server MAY
check if the body is a real filehandle (possibly using
L<Plack::Util>'s C<is_real_fh> function), then get a file descriptor
with C<fileno> and call sendfile(2) or an equivalent zero-copy data
transfer using that.
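
The check described above can be sketched with core Perl only. The
helper name C<looks_like_real_fh> is hypothetical and is only a
simplified stand-in for C<is_real_fh>; a real server would pass the
descriptor to sendfile(2) rather than just recording the result.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Scalar::Util qw(reftype);

# Hypothetical, simplified stand-in for Plack::Util::is_real_fh:
# the body counts as a real filehandle if it is a glob reference
# with a non-negative file descriptor.
sub looks_like_real_fh {
    my $body = shift;
    my $type = reftype($body) || '';
    return 0 unless $type eq 'GLOB';
    my $fd = eval { fileno $body };
    my $ok = defined($fd) && $fd >= 0;
    return $ok ? 1 : 0;
}

# A handle backed by a file on disk has a usable descriptor.
my ($file, $filename) = tempfile(UNLINK => 1);

# An in-memory handle is IO::Handle-like but has no real descriptor.
open my $in_memory, '<', \"hello" or die "Cannot open in-memory handle: $!";

my $file_is_real   = looks_like_real_fh($file);
my $memory_is_real = looks_like_real_fh($in_memory);
my $array_is_real  = looks_like_real_fh([ "chunk" ]);
```

Bodies that fail the check are still served the ordinary way, by
calling C<getline> or by iterating over the array of chunks.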

Otherwise, if the server can't send a file using the file descriptor
but needs a local file path (like mod_perl or nginx), the application
can return an IO::Handle-like object that also responds to the C<path>
method. This type of IO-like object can easily be created using
L<IO::File::WithPath>, L<IO::Handle::Util> or L<Plack::Util>'s
C<set_io_path> function.

Middlewares can also check whether the body has a C<path> method and
do something interesting with it, like setting C<X-Sendfile>
headers.
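
As a rough illustration of that idea, the sketch below wraps an
application and adds an C<X-Sendfile> header when the body responds to
C<path>. The class C<My::BodyWithPath>, the function C<add_x_sendfile>
and the file path are all hypothetical, and the handling is much
simpler than what a production middleware would do.

```perl
use strict;
use warnings;

# Hypothetical IO::Handle-like body that also responds to path().
package My::BodyWithPath;
sub new {
    my ($class, $path) = @_;
    return bless { path => $path, done => 0 }, $class;
}
sub path    { return $_[0]{path} }
sub getline { my $self = shift; return $self->{done}++ ? undef : "file contents\n" }
sub close   { return 1 }

package main;
use Scalar::Util qw(blessed);

# Hypothetical middleware-style wrapper: if the body has a path method,
# advertise the path in a header and send an empty body so that a
# frontend server can deliver the file itself.
sub add_x_sendfile {
    my $app = shift;
    return sub {
        my $res  = $app->(@_);
        my $body = $res->[2];
        if (blessed($body) && $body->can('path')) {
            push @{ $res->[1] }, 'X-Sendfile' => $body->path;
            $res->[2] = [];
        }
        return $res;
    };
}

my $app = sub {
    my $body = My::BodyWithPath->new('/var/www/static/logo.png');
    return [ 200, [ 'Content-Type' => 'image/png' ], $body ];
};
my $res     = add_x_sendfile($app)->({});
my %headers = @{ $res->[1] };
```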

To summarize:

=over 4

=item *

To serve static files, applications should always return a real
filehandle or IO::Handle object. That should work everywhere, and can
be optimized in some environments.

=item *

Applications can also set an IO::Handle-like object with an additional
C<path> method. That should again work everywhere, and can be
optimized in even more environments.

=back

=head3 What if I want to stream content or do a long-poll Comet?

The most straightforward way to implement server push is for your
application to return an IO::Handle-like object as a content body that
implements C<getline> to return pushed content. This is guaranteed to
work everywhere, but it's more like I<pull> than I<push>, and it's
hard to do non-blocking I/O unless you use Coro.
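
The I<pull> nature of this approach is easier to see in a small
sketch. The class C<My::QueueBody> is hypothetical, and the loop at
the end only imitates what a server does with such a body.

```perl
use strict;
use warnings;

# Hypothetical IO::Handle-like body: each getline call returns the next
# chunk, and undef signals the end of the stream.
package My::QueueBody;
sub new {
    my ($class, @chunks) = @_;
    return bless { chunks => [@chunks], closed => 0 }, $class;
}
sub getline { my $self = shift; return shift @{ $self->{chunks} } }
sub close   { $_[0]{closed} = 1; return 1 }

package main;

my $app = sub {
    my $env = shift;
    my $queue = My::QueueBody->new("first\n", "second\n", "third\n");
    return [ 200, [ 'Content-Type' => 'text/plain' ], $queue ];
};

# What a server does: it pulls chunks until getline returns undef,
# so the application only produces data when it is asked for it.
my $res  = $app->({});
my $body = $res->[2];
my @sent;
while (defined(my $chunk = $body->getline)) {
    push @sent, $chunk;    # a real server would write to the socket
}
$body->close;
```

If C<getline> has to wait for the next message, the whole server
process waits with it, which is why this style fits blocking servers
best.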

If you want to do server push, where your application runs in an event
loop and pushes the content body to the client as it becomes ready,
you should return a callback to delay the response.

  # long-poll comet like a chat application
  my $app = sub {
      my $env = shift;
      unless ($env->{'psgi.streaming'}) {
          die "This application needs psgi.streaming support";
      }
      return sub {
          my $respond = shift;
          wait_for_new_message(sub {
              my $message = shift;
              my $body = [ $message->to_json ];
              $respond->([200, ['Content-Type', 'application/json'], $body]);
          });
      };
  };

C<wait_for_new_message> can be blocking or non-blocking: it's up to
you. In most cases you want to run it in a non-blocking way and should
use an event loop like L<AnyEvent>. You may also check the
C<psgi.nonblocking> value to see whether that is possible, and fall
back to a blocking call otherwise.
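
A minimal sketch of that fallback follows. The two message functions
are hypothetical stand-ins (no real event loop is involved), and the
responder callbacks at the end imitate what a server would pass in.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a message source. The non-blocking version
# only registers the callback; the blocking version returns a message.
my @waiting;
sub wait_for_new_message_async    { push @waiting, shift; return }
sub wait_for_new_message_blocking { return "hello (blocking)" }

my $app = sub {
    my $env = shift;
    return sub {
        my $respond = shift;
        my $send = sub {
            my $message = shift;
            $respond->([ 200, [ 'Content-Type' => 'text/plain' ], [ $message ] ]);
        };
        if ($env->{'psgi.nonblocking'}) {
            wait_for_new_message_async($send);          # returns at once
        } else {
            $send->(wait_for_new_message_blocking());   # waits, then responds
        }
    };
};

# Drive the app by hand, once per kind of environment.
my (@blocking_res, @nonblocking_res);
$app->({ 'psgi.streaming' => 1, 'psgi.nonblocking' => 0 })
    ->(sub { push @blocking_res, shift });
$app->({ 'psgi.streaming' => 1, 'psgi.nonblocking' => 1 })
    ->(sub { push @nonblocking_res, shift });

my $pending_before_event = scalar @nonblocking_res;   # nothing sent yet
$_->("hello (event)") for splice @waiting;            # the "event" fires
```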

Also, to stream the content body (like streaming messages over a
Flash socket or multipart XMLHttpRequest):

  my $app = sub {
      my $env = shift;
      unless ($env->{'psgi.streaming'}) {
          die "This application needs psgi.streaming support";
      }
      return sub {
          my $respond = shift;
          my $writer = $respond->([200, ['Content-Type', 'text/plain']]);
          wait_for_new_message(sub {
              my $message = shift;
              if ($message) {
                  $writer->write($message->to_json);
              } else {
                  $writer->close;
              }
          });
      };
  };
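
To make the server's side of this exchange concrete, here is a small
sketch that drives a streaming application by hand. The class
C<My::Writer> is hypothetical and collects chunks in an array where a
real server would write them to the client socket.

```perl
use strict;
use warnings;

# Hypothetical writer object handed back to the application.
package My::Writer;
sub new {
    my ($class, $out) = @_;
    return bless { out => $out, closed => 0 }, $class;
}
sub write { my ($self, $chunk) = @_; push @{ $self->{out} }, $chunk; return }
sub close { $_[0]{closed} = 1; return }

package main;

my $app = sub {
    my $env = shift;
    return sub {
        my $respond = shift;
        # No body in the response: the responder returns a writer.
        my $w = $respond->([ 200, [ 'Content-Type' => 'text/plain' ] ]);
        $w->write("chunk $_\n") for 1 .. 3;
        $w->close;
    };
};

my (@headers, @out, $writer);
my $responder = sub {
    my $res = shift;
    @headers = @{ $res->[1] };
    # A third element means a complete response, so no writer is needed.
    if (defined $res->[2]) {
        push @out, @{ $res->[2] };
        return;
    }
    $writer = My::Writer->new(\@out);
    return $writer;
};

$app->({ 'psgi.streaming' => 1 })->($responder);
```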

=head3 Which framework should I use to do streaming though?

We have servers that support non-blocking I/O (where
C<psgi.nonblocking> is set to true), but the problem is that the
framework side doesn't necessarily support an asynchronous event loop.
For instance, Catalyst has a C<write> method on the response object:

  while ($cond) {
      $c->res->write($some_stuff);
  }

This should work with all servers with C<psgi.streaming> support even
if they are blocking, and it should be fine if they're running in
multiple processes (C<psgi.multiprocess> is true).

L<Catalyst::Engine::PSGI> also supports setting an IO::Handle-like
object that supports C<getline>, so using L<IO::Handle::Util>:

  my $io = io_from_getline sub {
      return $data; # or undef when done
  };
  $c->res->body($io);

And that works fine to do streaming, but it's blocking (I<pull>)
rather than asynchronous server push, so again you should be careful
not to run this application on non-blocking (and non-multiprocess)
server environments.

We expect that more web frameworks will appear that are focused on, or
that existing frameworks will add support for, asynchronous and
non-blocking streaming interfaces.

=head3 Is psgi.streaming interface a requirement for the servers?

It is specified as B<SHOULD>, so unless there is a strong reason not
to implement the interface, all servers are encouraged to implement
this interface.

However, if you implement a PSGI server using a Perl XS interface for
the ultimate performance or integration with web servers like Apache
or nginx, or implement a sandbox-like environment (like Google
AppEngine or Heroku) or a distributed platform using tools like
Gearman, you might not want to implement this interface.

That's fine, and in that case applications relying on the streaming
interface can still use L<Plack::Middleware::BufferedStreaming> to
fall back to buffered writes on unsupported servers.
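
The idea behind such a fallback can be sketched in a few lines. This
is a simplified illustration, not the actual implementation of the
middleware: C<buffer_streaming> and C<My::BufferingWriter> are
hypothetical names, and the sketch assumes the application finishes
writing before its callback returns.

```perl
use strict;
use warnings;

# Hypothetical writer that keeps every chunk in memory.
package My::BufferingWriter;
sub new {
    my $class = shift;
    return bless { chunks => [] }, $class;
}
sub write { my ($self, $chunk) = @_; push @{ $self->{chunks} }, $chunk; return }
sub close { return }

package main;

# Hypothetical wrapper: run a streaming app to completion and turn the
# result into a plain three-element response.
sub buffer_streaming {
    my $app = shift;
    return sub {
        my $env = shift;
        my $res = $app->({ %$env, 'psgi.streaming' => 1 });
        return $res unless ref $res eq 'CODE';    # already a plain response
        my $final;
        $res->(sub {
            my $partial = shift;
            if (defined $partial->[2]) { $final = $partial; return }
            my $writer = My::BufferingWriter->new;
            $final = [ @$partial[0, 1], $writer->{chunks} ];
            return $writer;
        });
        return $final;
    };
}

my $streaming_app = sub {
    my $env = shift;
    return sub {
        my $respond = shift;
        my $writer  = $respond->([ 200, [ 'Content-Type' => 'text/plain' ] ]);
        $writer->write($_) for "a", "b", "c";
        $writer->close;
    };
};

# The server here does not support streaming, yet gets a normal response.
my $res = buffer_streaming($streaming_app)->({ 'psgi.streaming' => 0 });
```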

=head3 Why CGI-style environment variables instead of HTTP headers as a hash?

Most existing web application frameworks already have code or a handler
to run under the CGI environment. Using CGI-style hash keys instead of
HTTP headers makes it trivial for framework developers to implement
an adapter to support PSGI. For instance, L<Catalyst::Engine::PSGI> is
only a few dozen lines different from L<Catalyst::Engine::CGI> and was
written in less than an hour.

=head3 Why is PATH_INFO URI decoded?

To be compatible with the CGI spec (RFC 3875) and most web servers'
implementations (like Apache and lighttpd).

I understand it could be inconvenient that you can't distinguish
C<foo%2fbar> from C<foo/bar> in the trailing path, but the CGI spec
clearly says C<PATH_INFO> should be decoded by servers, and that web
servers can deny requests containing C<%2f> (since such requests
would lose information in PATH_INFO). Leaving those reserved characters
undecoded (partial decoding) would make things worse, since then you
can't tell C<foo%2fbar> from C<foo%252fbar>, and that could be a
security hole with double encoding or decoding.
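
Both problems can be reproduced with two small decoding functions.
The names C<decode_all> and C<decode_except_slash> are hypothetical
and only illustrate full versus partial decoding.

```perl
use strict;
use warnings;

# Full decoding, as the CGI spec asks servers to do for PATH_INFO.
sub decode_all {
    my $s = shift;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Hypothetical "partial" decoding that leaves an encoded slash alone.
sub decode_except_slash {
    my $s = shift;
    $s =~ s/%(?!2[Ff])([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Full decoding: both requests end up with the same PATH_INFO.
my @full = map { decode_all($_) } '/foo%2fbar', '/foo/bar';

# Partial decoding: two different requests still collapse into one value,
# and that value now looks like it needs another round of decoding.
my @partial = map { decode_except_slash($_) } '/foo%2fbar', '/foo%252fbar';
```

With full decoding the result is at least unambiguous about its own
state: it never needs to be decoded again.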

For web application frameworks that need more control over the actual
raw URI (such as L<Catalyst>), we made the C<REQUEST_URI> environment
hash key REQUIRED. The servers should set the undecoded (unparsed)
original URI (containing the query string) to this key. Note that
C<REQUEST_URI> is completely raw even if the encoded entities are
URI-safe.

For comparison, WSGI (PEP-333) defines both C<SCRIPT_NAME> and
C<PATH_INFO> to be decoded, while Rack leaves it implementation
dependent and I<fixes> most cases where Ruby web server
implementations left PATH_INFO encoded.

L<http://www.python.org/dev/peps/pep-0333/#url-reconstruction>
L<http://groups.google.com/group/rack-devel/browse_thread/thread/ddf4622e69bea53f>

=head1 SEE ALSO

WSGI's FAQ clearly answers lots of questions about how some API design
decisions were made, some of which can directly apply to PSGI.

L<http://www.python.org/dev/peps/pep-0333/#questions-and-answers>

=head1 MORE QUESTIONS?

If you have a question that is not answered here, or there are things
you totally disagree with, come join the IRC channel #plack on
irc.perl.org or the mailing list
L<http://groups.google.com/group/psgi-plack>. Be sure to clarify which
hat you're wearing: application developer, server implementor or
middleware developer. And don't criticize the spec just to criticize
it: show the exact code that doesn't work or gets too messy because of
spec restrictions. We'll ignore all nitpicks and bikeshed discussion.

=head1 AUTHOR