RDFStore
the Perl language environment.
By using the Perl TIE interface, a generic application script can access RDF
triples using normal key/value hashes; storage can be either in in-memory
data structures (not tied) or on the local filesystem using the DB_File.pm
or BerkeleyDB.pm modules. An experimental remote storage service is also
provided using a custom DBMS.pm module coupled with a fast TCP/IP daemon
(http://rdfstore.sourceforge.net/dbms.html). The daemon is written entirely
in C and stores the data in Berkeley DB v1.x files; it is similar to the
rdfdb (http://web1.guha.com/rdfdb/) approach from Guha. Input RDF files are
parsed and processed by a streaming SiRPAC-like parser written completely
in Perl. The implementation includes most of the bug fixes and updates
proposed on the W3C RDF Interest Group mailing list and on the SiRPAC Web
site. A strawman parser for the simplified syntax proposed by Jonathan Borden
at http://www.openhealth.org/RDF/rdf_Syntax_and_Names.htm, Jason Diamond at
http://www.injektilo.org/rdf/rdf.xsl and Dan Connolly at
http://www.w3.org/XML/2000/04rdf-parse/ is also included. Using the Sablotron
XSLT engine, it is then possible to easily transform XML documents to RDF and
query them from Perl.
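The general idea of reaching triples through an ordinary hash can be sketched with Perl's tie mechanism. This is a minimal sketch only: the My::TripleHash class, its key_for helper, and the tab-separated subject/predicate key are hypothetical and are not part of the RDFStore API; the example URIs and values are made up.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; it is NOT part of RDFStore.
package My::TripleHash;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

# Build the hash key from subject and predicate (the separator is an assumption).
sub key_for {
    my ($subject, $predicate) = @_;
    return join "\t", $subject, $predicate;
}

# Intercept writes so that each key accumulates a list of objects.
sub STORE {
    my ($self, $key, $object) = @_;
    push @{ $self->{$key} }, $object;
}

package main;

# After tie, %triples looks like a normal hash to the application script.
tie my %triples, 'My::TripleHash';

my $key = My::TripleHash::key_for(
    'http://example.org/doc',
    'http://purl.org/dc/elements/1.1/creator',
);
$triples{$key} = 'Alice';
$triples{$key} = 'Bob';

print join(', ', @{ $triples{$key} }), "\n";    # prints "Alice, Bob"
```

Swapping the in-memory Tie::StdHash base for a class backed by DB_File would keep the calling code unchanged, which is the attraction of the tie approach.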
INSTALLATION
doc/SWADe-rdfstore.html view on Meta::CPAN
<H2>The compression algorithm</H2>
Both the graph and the free-text word index are relatively sparsely populated, which makes simple compression possible. The bit arrays used in each can grow to very significant sizes, on the order of several, if not tens of, page multiples. Comb...
Initially a Run Length Encoding (RLE) method was used, with two small optimizations. The first optimization was early termination, i.e. if the remainder of the row contained only zeros, those were simply not listed explicitly. The second optimizati...
The first issue is that certain values, such as a reference to a schema or a common property, are disproportionately over-represented, by several orders of magnitude (e.g. the rdf:type property or contextual information). Secondly certain other values; suc...
For this reason a variant of Variable Run Length encoding is used along with part of the above RLE method. This method is still applicable to the word indexing but adds the ability to recognize short patterns; and code the patterns which occur...
At this point (de-)compression keeps storage volumes reasonable and transfer volumes manageable, so we do not expect to give priority to work in this area. However, we expect to examine this issue again and will be looking...
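The early-termination optimization of the initial RLE method can be illustrated with a small sketch. It assumes a row is given as an array of 0/1 values and is encoded as alternating run lengths starting with a run of zeros; the function names rle_encode and rle_decode and this list-of-integers representation are illustrative assumptions, not the on-disk format RDFStore actually uses.

```perl
use strict;
use warnings;

# Encode a row of bits (array ref of 0/1) as alternating run lengths,
# starting with a run of zeros (possibly of length 0).
# Early termination: a trailing run of zeros is never emitted.
sub rle_encode {
    my ($bits) = @_;
    my @runs;
    my ($current, $length) = (0, 0);
    for my $bit (@$bits) {
        if ($bit == $current) {
            $length++;
        } else {
            push @runs, $length;
            ($current, $length) = ($bit, 1);
        }
    }
    # Flush the last run only when it is a run of ones.
    push @runs, $length if $current == 1;
    return \@runs;
}

# Decode the run lengths back into a row of $width bits; whatever the
# encoder left out at the end of the row is restored as zeros.
sub rle_decode {
    my ($runs, $width) = @_;
    my @bits;
    my $value = 0;
    for my $run (@$runs) {
        push @bits, ($value) x $run;
        $value = 1 - $value;
    }
    push @bits, (0) x ($width - @bits) if @bits < $width;
    return \@bits;
}

my @row     = (0, 0, 1, 1, 1, 0, 0, 0, 0);
my $encoded = rle_encode(\@row);
print join(',', @$encoded), "\n";    # prints "2,3"
```

For a sparsely populated row the saving comes mostly from the omitted tail: the four trailing zeros above cost nothing, and an all-zero row encodes to an empty list.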
<H2>Conclusion: RDFStore</H2>
RDFStore <a href="#47">[47]</a> is a Perl/C toolkit to process, store, retrieve and manage RDF; it consists of a programming API, streaming RDF/XML and N-Triples parsers and a generic hashed data storage which implements the indexing algorithm as des...
<BR>
RDFStore has been successfully used for the development of several Semantic Web applications <a href="#16a">[16a]</a><a href="#16b">[16b]</a><a href="#16c">[16c]</a> and others which read/write and query RDF descriptions using RDQL.
<H2>References</H2>
<a name="1">[1]</a> "A Relational Model of Data for Large Shared Data Banks", E.F. Codd, Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377-387. <a href="http://www.acm.org/classics/nov95/toc.html">http://www.acm.org/classics/nov95/toc.htm...
<a name="2">[2]</a> P. Buneman, S. Davidson, G. Hillebrand and D. Suciu, "A query language and optimization techniques for unstructured data". In SIGMOD, San Diego, 1996<BR>
<a name="3">[3]</a> S. Abiteboul, D. Quass, J. McHugh, J. Widom and J. Wiener, "The Lorel query language for semistructured data", 1996, ftp://db.stanford.edu/pub/papers/lorel96.ps<BR>
<a name="4">[4]</a> Dan Brickley, R.V. Guha "RDF Vocabulary Description Language 1.0: RDF Schema" <a href="http://www.w3.org/TR/rdf-schema/">http://www.w3.org/TR/rdf-schema/</a><BR>
<a name="5">[5]</a> Grady Booch, "Object-Oriented Analysis and Design with Applications", pp. 71-72<BR>
lib/DBD/RDFStore.pm view on Meta::CPAN
# i.e. $sth->{'result'} = ( '?x' => 1, '?y' => Test1 )
#
sub _nextMatch {
    my( $sth, $rpi, $gp, $tpi, %bind ) = @_;
    if($DBD::RDFStore::st::debug>1) {
        print STDERR (" " x $tpi);
        print STDERR "$tpi BEGIN\n";
    };
    # if we have a previous state try to recover it (this is needed for streaming results)
    my $bind_state = pop @{ $sth->{'binds'} };
    if( ( $bind_state ) && ($DBD::RDFStore::st::debug>1) ) {
        print STDERR (" " x $tpi);
        print STDERR "RECOVER previous state for $tpi\n";
    };
    _nextMatch( $sth, $rpi, $gp, $tpi+1, %{$bind_state} )
        if( $bind_state );
lib/DBD/RDFStore.pm view on Meta::CPAN
    my($sth) = @_;
    if($sth->{'RDF_or_XML_stream_finished'}) {
        $sth->{'RDF_or_XML_stream_finished'} = 0;
        return;
    };
    return _fetchrow_RDF_or_XML( $sth );
};
# fetch the whole matching graph in one call (not streaming then)
# return RDFStore::Model of matching statements
sub fetchallgraph {
    my($sth) = @_;
    my $whole_graph;
    while ( my $graph = fetchsubgraph($sth) ) {
        $whole_graph = $graph
            unless($whole_graph);
        my $e = $graph->elements;
        while(my $ss = $e->each) {
            $whole_graph->add($ss);
        };
    };
    return $whole_graph;
};
# should be streaming
sub _fetchrow_RDF_or_XML {
    my($sth, $syntax) = @_;
    return
        if($sth->{'RDF_or_XML_stream_finished'});
    unless($syntax) {
        $syntax = $sth->{'results'}->{'syntax'}
            if(exists $sth->{'results'}->{'syntax'});
    };
    return
        unless( (!$syntax) ||
                ($syntax =~ m#(RDF/XML|N-Triples|dawg-results|rdf-for-xml|dawg-xml)#i) );
    my $result = '';
    my $mm = new RDFStore::Model; # we want streaming - that's why this...
    # DESCRIBE <URI> are done once in one single subgraph / match
    if( ( $sth->{'Statement'}->getQueryType eq 'DESCRIBE' ) &&
        ( grep m/^<([^>]+)>/, @{ $sth->{'Statement'}->{'describes'} }) ) {
        foreach my $d ( @{ $sth->{'Statement'}->{'describes'} } ) {
            next
                unless($d =~ m/^<([^>]+)>/);
            $d = $1;
lib/RDFStore/Parser/NTriples.pm view on Meta::CPAN
return 'genid' . $class->{iReificationCounter}++;
};
1;
};
__END__
=head1 NAME
RDFStore::Parser::NTriples - This module implements a streaming N-Triples parser
=head1 SYNOPSIS
    use RDFStore::Parser::NTriples;
    use RDFStore::NodeFactory;
    my $p=new RDFStore::Parser::NTriples(
        ErrorContext => 2,
        Handlers => {
            Init => sub { print "INIT\n"; },
            Final => sub { print "FINAL\n"; },
lib/RDFStore/Parser/NTriples.pm view on Meta::CPAN
            persistent => 1,
            seevalues => 1,
            store_options => { Name => '/tmp/test' }
        }
    );
    $pstore->parsefile('http://www.gils.net/bsr-gils.nt');
=head1 DESCRIPTION
This module implements an N-Triples I<streaming> parser.
=head1 METHODS
=over 4
=item new
This is a class method, the constructor for RDFStore::Parser::NTriples. B<Options> are passed as keyword value
pairs. Recognized options are:
lib/RDFStore/Parser/SiRPAC.pm view on Meta::CPAN
$expat->{SiRPAC}->{EXPECT_Element} = $newElement
    if($setScanModeElement);
my $sLiteralValue;
if($expat->{SiRPAC}->{scanMode} ne 'SKIPPING') {
    # goes through the attributes of newElement to see
    # 1. if there are symbolic references to other nodes in the data model.
    #    in which case they must be stored for later resolving with
    #    resolveLater method (fix aboutEach on streaming!!!)
    # 2. if there is an identity attribute, it is registered using
    #    registerResource or registerID method.
    my $sResource;
    $sResource = getAttributeValue($expat,$newElement->{attlist}, $RDFStore::Parser::SiRPAC::RDFMS_resource);
    if (defined $sResource) {
        $newElement->{sResource} = normalizeResourceIdentifier($expat,$sResource);
    } else {
        $sResource = getAttributeValue($expat,$newElement->{attlist}, $RDFStore::Parser::SiRPAC::RDFMS_nodeID);
        if (defined $sResource) {
lib/RDFStore/Parser/SiRPAC.pm view on Meta::CPAN
sub namespace { };
};
1;
};
__END__
=head1 NAME
RDFStore::Parser::SiRPAC - This module implements a streaming RDF Parser as a direct implementation of XML::Parser::Expat(3)
=head1 SYNOPSIS
    use RDFStore::Parser::SiRPAC;
    use RDFStore::NodeFactory;
    my $p=new RDFStore::Parser::SiRPAC(
        ErrorContext => 2,
        Handlers => {
            Init => sub { print "INIT\n"; },
            Final => sub { print "FINAL\n"; },
lib/RDFStore/Parser/SiRPAC.pm view on Meta::CPAN
        }
    );
    my $rdfstore_model = $pstore->parsefile('http://www.gils.net/bsr-gils.rdfs');
    # using the expat non-blocking feature (generally for large XML streams) - see XML::Parser::Expat(3)
    my $rdfstore_stream_model = $pstore->parsestream(*STDIN);
=head1 DESCRIPTION
This module implements a Resource Description Framework (RDF) I<streaming> parser completely in
Perl using the XML::Parser::Expat(3) module. The actual RDF parsing happens using an instance of XML::Parser::Expat with the Namespaces option enabled and start/stop and char handlers set.
The RDF-specific code is based on Sergey Melnik's modified Java version of SiRPAC; many
changes and adaptations were needed to run it under Perl.
Expat options may be provided when the RDFStore::Parser::SiRPAC object is created. These options are then passed on to the Expat object on each parse call.
Exactly like XML::Parser(3), the behavior of the parser is controlled either by the "Style" and/or "Handlers" options, or by the "setHandlers" method. The...
For examples of how to use it, see the sections below and the samples and utils directories that come with this software distribution.
E.g.
lib/RDFStore/Parser/SiRPAC.pm view on Meta::CPAN
Benchmarking XML Parsers by Clark Cooper - http://www.xml.com/pub/Benchmark/article.html
See also http://www.w3.org/RDF/Implementations/SiRPAC/SiRPAC-defects.html
RDF::Parser(3) from http://www.pro-solutions.com
=head1 AUTHOR
Alberto Reggiori <areggiori@webweaving.org>
Sergey Melnik <melnik@db.stanford.edu> is the original author of the streaming version of SiRPAC in Java
Clark Cooper is the author of the XML::Parser(3) module together with Larry Wall