Evaluates a construct query and returns the RDF serialization of the resulting
RDF graph.  A minimal invocation looks something like:

    my $q = qq(
        CONSTRUCT {Parent} ex:hasChild {Child}
        FROM {Child} ex:hasParent {Parent}
        USING NAMESPACE
            ex = <http://example.org/things#>
    );
    my $rdf = $repo->construct(
        query  => $q,
        format => 'turtle',
    );

If an error occurs while the query is being evaluated, an exception is thrown.
This differs from some RDF::Sesame methods, which return C<undef> on error.

=head3 format

    Required: Yes

Indicates the RDF serialization format that the Sesame server should return.
Acceptable values are 'rdfxml', 'turtle' and 'ntriples'.

=head3 language

    Default: SeRQL

Specifies the language in which the construct query is written.  This option
is included only for forward compatibility, since the only query language
supported by Sesame is SeRQL.

=head3 output

    Default: undef

Indicates where the RDF serialization should be placed.  The default value of
C<undef> means that the serialization should simply be returned as the value
of the C<construct> method.

If the value is a filehandle, the serialization is written to that filehandle.
The filehandle must already be open for writing.  Otherwise, the value is
taken to be a filename which is opened for writing (clobbering existing
contents) and the serialization is written to the file.
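
As a rough sketch of the filehandle form, the helper below opens a file and
passes the handle as C<output>.  The name C<construct_to_file> is
hypothetical, and C<$repo> is assumed to be a repository object that provides
the C<construct> method described here.

    use strict;
    use warnings;

    # Hypothetical helper, not part of RDF::Sesame.  $repo is assumed to be
    # a repository object with the construct() method documented above.
    sub construct_to_file {
        my ( $repo, $query, $path ) = @_;

        # The handle must already be open for writing before it is passed.
        open my $fh, '>', $path or die "Cannot open $path: $!";
        $repo->construct(
            query  => $query,
            format => 'turtle',
            output => $fh,
        );
        close $fh or die "Cannot close $path: $!";
        return $path;
    }

Passing the filename itself as C<output> should have the same effect without
the explicit C<open> and C<close>.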

=head3 query

    Required: Yes

The text of the construct query.

=head2 extract ( %opts )

Extract an RDF representation of all the triples in the repository.  The only
required option is L</format> which specifies the serialization format of the
resulting RDF.  The minimal method invocation looks like

    my $rdf = $repo->extract( format => 'turtle' )

where C<$rdf> is a reference to a scalar containing the serialization of all
the triples in the repository.  The streaming results returned by Sesame are
handled appropriately so that memory usage is minimized.  If the output is
sent to a file (see L</output>), only one "chunk" is held in memory at a time
(subject to caching by your OS).  The serialization may also be compressed (or
otherwise processed) as it's being streamed from the server (see
L</compress>).

Error handling is done differently in this method than in other methods in
L<RDF::Sesame>.  Namely, if an error occurs, an exception is thrown (rather
than returning undef and setting C<errstr()>).  Eventually, I'd like all
methods to behave this way.
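
Since failures arrive as exceptions, callers that want to keep running need to
trap them.  The sketch below shows one way to do so with C<eval>.  The name
C<try_extract> is hypothetical, and C<$repo> is assumed to be a repository
object that provides the C<extract> method.

    use strict;
    use warnings;

    # Hypothetical helper, not part of RDF::Sesame.  Returns ($rdf, undef)
    # on success and (undef, $error) if extract() threw an exception.
    sub try_extract {
        my ( $repo, %opts ) = @_;
        my $rdf;

        # The trailing "1" makes the eval true only if extract() returned
        # normally, whatever value it returned.
        my $ok = eval { $rdf = $repo->extract(%opts); 1 };
        return ( $rdf, undef ) if $ok;
        return ( undef, $@ || 'unknown error' );
    }

    # Usage:
    #   my ( $rdf, $err ) = try_extract( $repo, format => 'turtle' );
    #   warn "extract failed: $err" if defined $err;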

=head3 compress

    Default: 'none'

Indicates how the RDF serialization returned by the Sesame server should be
compressed (or otherwise processed) before it's sent to the designated output
destination (see L</output>).  The default value of C<none> indicates that no
compression or processing should be performed.  The value C<gz> indicates that
L<Compress::Zlib> should be used to compress the serialization into the gzip
file format.  Unfortunately, gzip compression is incompatible with an C<undef>
value of the L</output> option.  This is because of a problem with
L<Compress::Zlib> writing to in-memory filehandles.  If you try it, you'll get
an error message about a "bad file descriptor".

One may also specify a hash reference as the value of this option.  The hash
reference should contain the keys 'init', 'content', and 'finish'.  The value
for each key should be a subroutine reference which will be called during the
extraction process.

The 'init' coderef is called before any data is received from Sesame.  It
receives an output filehandle as its sole argument and should return a
"context" value which will be passed to the 'content' and 'finish' callbacks.
The context may be any value, but objects and hashrefs seem to be the most
useful.

The 'content' coderef is called once for each chunk of data returned from the
Sesame server.  It receives the context, the output filehandle and a
serialization chunk as arguments.  Its return value is ignored.

The 'finish' coderef is called after all data has been received from the
server and after the last call to the 'content' coderef has completed.
'finish' receives the context and the output filehandle as arguments.  Its
return value is ignored.
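
The calling sequence described above can be sketched without a Sesame server.
In the code below, C<drive_callbacks> is a hypothetical stand-in for the
extraction loop (it is not RDF::Sesame code), and the callbacks are assumed to
be responsible for writing each chunk to the output filehandle themselves.

    use strict;
    use warnings;

    # Callbacks that pass each chunk through unchanged while counting it.
    my %counter = (
        init => sub {
            my ($fh) = @_;
            return { bytes => 0, chunks => 0 };    # our context hashref
        },
        content => sub {
            my ( $ctx, $fh, $chunk ) = @_;
            $ctx->{bytes}  += length $chunk;
            $ctx->{chunks} += 1;
            print {$fh} $chunk;
        },
        finish => sub {
            my ( $ctx, $fh ) = @_;
            $ctx->{finished} = 1;
        },
    );

    # Hypothetical driver: calls init once, content once per chunk, then
    # finish once, in the order the documentation describes.
    sub drive_callbacks {
        my ( $cb, $fh, @chunks ) = @_;
        my $ctx = $cb->{init}->($fh);
        $cb->{content}->( $ctx, $fh, $_ ) for @chunks;
        $cb->{finish}->( $ctx, $fh );
        return $ctx;
    }

    # Exercise the callbacks against an in-memory filehandle.
    open my $out, '>', \my $buffer
        or die "Cannot open in-memory handle: $!";
    my $ctx = drive_callbacks( \%counter, $out,
        "<a> <b> <c> .\n", "<d> <e> <f> .\n" );
    close $out or die "Cannot close in-memory handle: $!";

With a real repository, the same hash would presumably be passed as
C<< compress => \%counter >> in the call to C<extract>.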

Here is a short example of using callbacks to implement gzip compression (of
course gzip compression is already implemented by specifying 'gz' as the
compression value):

    my $rdf_gz = $repo->extract(
        format   => 'turtle',
        compress => {
            init => sub {
                my ($fh) = @_;
                require Compress::Zlib;
                binmode $fh;
                my $gz = Compress::Zlib::gzopen( $fh, 'wb' );
                return $gz;    # our context object
            },
            content => sub {