BerkeleyDB


BerkeleyDB.pod.P  view on Meta::CPAN


    $env = new BerkeleyDB::Env
             -Home         => "/home/databases"
    ...

    $db1 = new BerkeleyDB::Hash
	     -Filename => "fred.db",
	     -Env => $env
    ...

    $db2 = new BerkeleyDB::Hash
	     -Filename => "/other/joe.db",
	     -Env => $env
    ...

=item -Server

If present, this parameter should be the hostname of a server that is running
the Berkeley DB RPC server. All databases will be accessed via the RPC server.

=item -Encrypt

If present, this parameter will enable encryption of all data before
it is written to the database. This parameter must be given a hash
reference. The format is shown below.

    -Encrypt => { -Password => "abc", Flags => DB_ENCRYPT_AES }

Valid values for the Flags are 0 or C<DB_ENCRYPT_AES>.

This option requires Berkeley DB 4.1 or better.

=item -Cachesize

If present, this parameter sets the size of the environment's shared memory
buffer pool.

=item -TxMax

If present, this parameter sets the number of simultaneous
transactions that are allowed.  Default 100.  This default is
definitely too low for programs using the MVCC capabilities.

=item -LogConfig

If present, this parameter is used to configure log options.

=item -MaxLockers

If present, this parameter is used to configure the maximum number of
processes doing locking on the database.  Default 1000.

=item -MaxLocks

If present, this parameter is used to configure the maximum number of
locks on the database.  Default 1000.  This is often lower than required.

=item -MaxObjects

If present, this parameter is used to configure the maximum number of
locked objects.  Default 1000.  This is often lower than required.

=item -SharedMemKey

If present, this parameter sets the base segment ID for the shared memory
region used by Berkeley DB.

This option requires Berkeley DB 3.1 or better.

Use C<$env-E<gt>get_shm_key($id)> to find out the base segment ID used
once the environment is open.

=item -ThreadCount

If present, this parameter declares the approximate number of threads that
will be used in the database environment. This parameter is only necessary
when the C<< $env->failchk >> method will be used. It does not actually set the
maximum number of threads but rather is used to determine memory sizing.

This option requires Berkeley DB 4.4 or better. It is only supported on
Unix/Linux.

=item -BlobThreshold

Sets the size threshold that will be used to decide when data is stored as
a BLOB. This option must be set for blobs to be used.

This option requires Berkeley DB 6.0 or better.

=item -BlobDir

The directory where the BLOB objects are stored.

If not specified, blob files are stored in the environment directory.


This option requires Berkeley DB 6.0 or better.

=item -Config

This is a variation on the C<-Home> parameter, but it allows finer
control of where specific types of files will be stored.

The parameter expects a reference to a hash. Valid keys are:
B<DB_DATA_DIR>, B<DB_LOG_DIR> and B<DB_TMP_DIR>.

The code below shows an example of how it can be used.

    $env = new BerkeleyDB::Env
             -Config => { DB_DATA_DIR => "/home/databases",
                          DB_LOG_DIR  => "/home/logs",
                          DB_TMP_DIR  => "/home/tmp"
                        }
    ...

=item -ErrFile

Expects a filename or filehandle. Any errors generated internally by
Berkeley DB will be logged to this file. A useful debug setting is to
open environments with either


=over 5

=item B<DB_RMW>

TODO.

=back

=head2 ($flag, $old_offset, $old_length) = $db->partial_set($offset, $length) ;

TODO

=head2 ($flag, $old_offset, $old_length) = $db->partial_clear() ;

TODO

=head2 $db->byteswapped()

TODO

=head2 $status = $db->get_blob_threshold($t1) ;

Sets the parameter $t1 to the threshold value (in bytes) that is used to
determine when a data item is stored as a Blob.

=head2 $status = $db->get_blob_dir($dir) ;

Sets the $dir parameter to the directory where blob files are stored.

=head2 $db->type()

Returns the type of the database. The possible return codes are B<DB_HASH>
for a B<BerkeleyDB::Hash> database, B<DB_BTREE> for a B<BerkeleyDB::Btree>
database and B<DB_RECNO> for a B<BerkeleyDB::Recno> database. This method
is typically used when a database has been opened with
B<BerkeleyDB::Unknown>.
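
As an illustration, the sketch below creates a Btree database, reopens it
with B<BerkeleyDB::Unknown> and then asks the handle for its type. The file
name and the C<%name_of> lookup table are made up for this example; treat
it as one possible way to use C<type>, not the only one.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.db";    # hypothetical file name

# Create a Btree database, then close it by letting the handle go
# out of scope.
{
    my $btree = BerkeleyDB::Btree->new(
        -Filename => $file,
        -Flags    => DB_CREATE,
    ) or die "cannot create database: $BerkeleyDB::Error\n";

    $btree->db_put("apple", "red") == 0
        or die "put failed: $BerkeleyDB::Error\n";
}

# Reopen the file without stating its format.
my $db = BerkeleyDB::Unknown->new(-Filename => $file)
    or die "cannot open database: $BerkeleyDB::Error\n";

# Hypothetical lookup table from type code to a readable name.
my %name_of = (
    DB_HASH()  => "Hash",
    DB_BTREE() => "Btree",
    DB_RECNO() => "Recno",
);

my $type      = $db->type();
my $type_name = $name_of{$type} // "other";
print "database format: $type_name\n";
```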

=head2   $bool = $env->cds_enabled();

Returns true if the Berkeley DB environment C<$env> has been opened in
CDS mode.

=head2   $bool = $db->cds_enabled();

Returns true if the database C<$db> has been opened in CDS mode.

=head2 $lock = $db->cds_lock();

Creates a CDS write lock object C<$lock>.

It is a fatal error to attempt to create a cds_lock if the Berkeley DB
environment has not been opened in CDS mode.

=head2 $lock->cds_unlock();

Removes a CDS lock. The destruction of the CDS lock object automatically
calls this method.

Note that if multiple CDS lock objects are created, the underlying write
lock will not be released until all CDS lock objects are either explicitly
unlocked with this method, or the CDS lock objects have been destroyed.

=head2 $ref = $db->db_stat()

Returns a reference to an associative array containing information about
the database. The keys of the associative array correspond directly to the
names of the fields defined in the Berkeley DB documentation. For example,
in the DB documentation, the field B<bt_version> stores the version of the
Btree database. Assuming you called B<db_stat> on a Btree database the
equivalent field would be accessed as follows:

    $version = $ref->{'bt_version'} ;

If you are using Berkeley DB 3.x or better, this method will work with
all database formats. When DB 2.x is used, it only works with
B<BerkeleyDB::Btree>.

=head2 $status = $db->status()

Returns the status of the last C<$db> method called.

=head2 $status = $db->truncate($count)

Truncates the database and returns the number of records deleted
in C<$count>.
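
A minimal sketch of how C<truncate> might be called is shown below. It
assumes a scratch Btree database in a temporary directory; the file name
and the three sample records are invented for the example.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

my $db = BerkeleyDB::Btree->new(
    -Filename => "$dir/scratch.db",    # hypothetical file name
    -Flags    => DB_CREATE,
) or die "cannot open database: $BerkeleyDB::Error\n";

# Store a few sample records.
for my $key (qw(a b c)) {
    $db->db_put($key, uc $key) == 0
        or die "put failed: $BerkeleyDB::Error\n";
}

# Empty the database; the number of discarded records is written
# into $count.
my $count = 0;
$db->truncate($count) == 0
    or die "truncate failed: $BerkeleyDB::Error\n";

print "removed $count records\n";
```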

=head2  $status = $db->compact($start, $stop, $c_data, $flags, $end);

Compacts the database C<$db>.

All the parameters are optional. If you only want to make use of some of them,
use C<undef> for those you don't want.  Trailing unused parameters can be
omitted. For example, if you only want to use the C<$c_data> parameter to
set the C<compact_fillpercent>, write your code like this:

    my %hash;
    $hash{compact_fillpercent} = 50;
    $db->compact(undef, undef, \%hash);

The parameters operate identically to the C equivalent of this method.
The C<$c_data> needs a bit of explanation - it must be a hash reference.
The values of the following keys can be set before calling C<compact> and
will affect the operation of the compaction.

=over 5

=item * compact_fillpercent

=item * compact_timeout

=back

The following keys, along with associated values, will be created in the
hash reference if the C<compact> operation was successful.

=over 5

=item * compact_deadlock

=item * compact_levels


        or die "cannot open database: $BerkeleyDB::Error\n";

or this, if you use the tied interface

    tie %hash, "BerkeleyDB::Hash",
                -Filename       => 'test2.db',
                -Flags          => DB_CREATE,
                -Env            => $env
        or die "cannot open database: $BerkeleyDB::Error\n";

The first thing to note is that you B<MUST> always use a Berkeley DB
environment if you want to use locking with Berkeley DB.

Remember, that apart from the actual database files you explicitly create
yourself, Berkeley DB will create a few behind the scenes to handle locking
- they usually have names like "__db.001". It is therefore a good idea to
use the C<-Home> option, unless you are happy for all these files to be
written in the current directory.

Next, remember to include the C<DB_CREATE> flag when opening the
environment for the first time. A common mistake is to forget to add this
option and then wonder why the application doesn't work.

Finally, it is vital that all processes that are going to access the
database files use the same Berkeley DB environment.


=head2 Safely Updating a Record

One of the main gotchas when using CDS arises when you want to update a
record in a database, i.e. you want to retrieve a record from a database,
modify it in some way and put it back in the database.

For example, say you are writing a web application and you want to keep a
record of the number of times your site is accessed in a Berkeley DB
database. So your code will have a line of code like this (assume, of
course, that C<%hash> has been tied to a Berkeley DB database):

    $hash{Counter} ++ ;

That may look innocent enough, but there is a race condition lurking in
there. If I rewrite the line of code using the low-level Berkeley DB API,
which is what will actually be executed, the race condition may be more
apparent:

    $db->db_get("Counter", $value);
    ++ $value ;
    $db->db_put("Counter", $value);

Consider what happens behind the scenes when you execute the commands
above.  Firstly, the existing value for the key "Counter" is fetched from
the database using C<db_get>. A read lock will be used for this part of the
update.  The value is then incremented, and the new value is written back
to the database using C<db_put>. This time a write lock will be used.

Here's the problem - there is nothing to stop two (or more) processes
executing the read part at the same time. Remember multiple processes can
hold a read lock on the database at the same time. So both will fetch the
same value, let's say 7, from the database. Both increment the value to 8
and attempt to write it to the database. Berkeley DB will ensure that only
one of the processes gets a write lock, while the other will be blocked. So
the process that happened to get the write lock will store the value 8 to
the database and release the write lock. Now the other process will be
unblocked, and it too will write the value 8 to the database. The result,
in this example, is we have missed a hit in the counter.

To deal with this kind of scenario, you need to make the update atomic. A
convenience method, called C<cds_lock>, is supplied with the BerkeleyDB
module for this purpose. Using C<cds_lock>, the counter update code can now
be rewritten thus:

    my $lk = $dbh->cds_lock() ;
    $hash{Counter} ++ ;
    $lk->cds_unlock;

or this, where scoping is used to limit the lifetime of the lock object

    {
        my $lk = $dbh->cds_lock() ;
        $hash{Counter} ++ ;
    }

Similarly, C<cds_lock> can be used with the native Berkeley DB API

    my $lk = $db->cds_lock() ;
    $db->db_get("Counter", $value);
    ++ $value ;
    $db->db_put("Counter", $value);
    $lk->cds_unlock;


The C<cds_lock> method will ensure that the current process has exclusive
access to the database until the lock is released, either explicitly via
C<< $lk->cds_unlock() >> or by the lock object being destroyed.

If you are interested, all that C<cds_lock> does is open a "write" cursor.
This has the useful side-effect of holding a write-lock on the database
until the cursor is deleted. This is how you create a write-cursor

    $cursor = $db->db_cursor(DB_WRITECURSOR);

If you have instantiated multiple C<cds_lock> objects for one database
within a single process, that process will hold a write-lock on the
database until I<ALL> C<cds_lock> objects have been destroyed.

As with all write-cursors, you should try to limit the scope of the
C<cds_lock> to as short a time as possible. Remember the complete database
will be locked to other processes whilst the write lock is in place.
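
Putting the pieces of this section together, the following is a sketch of a
complete counter update in a CDS environment. The helper name
C<bump_counter>, the file name and the temporary home directory are
assumptions made for the example; only C<cds_lock>, C<cds_unlock>,
C<db_get> and C<db_put> come from the module itself.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $home = tempdir(CLEANUP => 1);

# DB_INIT_CDB selects Concurrent Data Store locking.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

my $db = BerkeleyDB::Hash->new(
    -Filename => "counter.db",    # hypothetical file name
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open database: $BerkeleyDB::Error\n";

# Hypothetical helper: read, increment and write back a counter while
# holding a single CDS write lock, so no other writer can interleave.
sub bump_counter {
    my ($db, $key) = @_;

    my $lock = $db->cds_lock();

    my $value;
    my $status = $db->db_get($key, $value);
    die "get failed: $BerkeleyDB::Error\n"
        if $status != 0 && $status != DB_NOTFOUND;

    # A missing key is treated as a counter that starts at zero.
    my $next = ($status == 0 ? $value : 0) + 1;
    $db->db_put($key, $next) == 0
        or die "put failed: $BerkeleyDB::Error\n";

    # Explicit release; the lock would also go away when $lock is destroyed.
    $lock->cds_unlock();
    return $next;
}

my $hits = 0;
$hits = bump_counter($db, "Counter") for 1 .. 3;
print "Counter is now $hits\n";
```

Keeping the lock inside a small helper like this keeps the write lock
short-lived, which is the point made in the paragraph above.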

=head2 Cannot write with a read cursor while a write cursor is active

This issue is easier to demonstrate with an example, so consider the code
below. The intention of the code is to increment the values of all the
elements in a database by one.

    # Assume $db is a database opened in a CDS environment.

    # Create a write-lock
    my $lock = $db->db_cursor(DB_WRITECURSOR);
    # or
    # my $lock = $db->cds_lock();


    my $cursor = $db->db_cursor();

    # Now loop through the database, and increment
    # each value using c_put.
    while ($cursor->c_get($key, $value, DB_NEXT) == 0)
    {
         $cursor->c_put($key, $value+1, DB_CURRENT) == 0
             or die "$BerkeleyDB::Error\n";
    }


When this code is run, it will fail on the C<c_put> line with this error

    Write attempted on read-only cursor

The read cursor has automatically disallowed a write operation to prevent a
deadlock.


So the rule is -- you B<CANNOT> carry out a write operation using a
read-only cursor (i.e. you cannot use C<c_put> or C<c_del>) whilst another
write-cursor is already active.

The workaround for this issue is to just use C<db_put> instead of C<c_put>,
like this

    # Assume $db is a database opened in a CDS environment.

    # Create a write-lock
    my $lock = $db->db_cursor(DB_WRITECURSOR);
    # or
    # my $lock = $db->cds_lock();


    my $cursor = $db->db_cursor();

    # Now loop through the database, and increment
    # each value using db_put.
    while ($cursor->c_get($key, $value, DB_NEXT) == 0)
    {
         $db->db_put($key, $value+1) == 0
             or die "$BerkeleyDB::Error\n";
    }
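
Another option, sketched here under the assumption that no separate lock
object is needed, is to iterate with the write cursor itself, so that the
cursor doing the C<c_put> is the one holding the write lock. The Btree
database, its file name and the sample values are made up for the example.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $home = tempdir(CLEANUP => 1);

my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

my $db = BerkeleyDB::Btree->new(
    -Filename => "values.db",    # hypothetical file name
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open database: $BerkeleyDB::Error\n";

# Seed the database with sample values.
for my $key (qw(a b c)) {
    $db->db_put($key, 10) == 0
        or die "put failed: $BerkeleyDB::Error\n";
}

{
    # The write cursor holds the write lock and performs the updates.
    my $cursor = $db->db_cursor(DB_WRITECURSOR)
        or die "cannot create cursor: $BerkeleyDB::Error\n";

    my ($key, $value) = ("", "");
    while ($cursor->c_get($key, $value, DB_NEXT) == 0) {
        $cursor->c_put($key, $value + 1, DB_CURRENT) == 0
            or die "$BerkeleyDB::Error\n";
    }

    # Close the cursor promptly so the write lock is released.
    $cursor->c_close();
}

# Read the values back to see the effect.
my %after;
for my $key (qw(a b c)) {
    my $value;
    $db->db_get($key, $value) == 0
        or die "get failed: $BerkeleyDB::Error\n";
    $after{$key} = $value;
}
print join(", ", map { "$_=$after{$_}" } sort keys %after), "\n";
```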



=head2 Implicit Cursors

All Berkeley DB cursors will hold either a read lock or a write lock on the
database for the existence of the cursor. In order to prevent blocking of
other processes you need to make sure that they are not long lived.

There are a number of instances where the Perl interface to Berkeley DB
will create a cursor behind the scenes without you being aware of it. Most
of these are very short-lived and will not affect the running of your
script, but there are a few notable exceptions.

Consider this snippet of code

    while (my ($k, $v) = each %hash)
    {
        # do something
    }


To implement the "each" functionality, a read cursor will be created behind
the scenes to allow you to iterate through the tied hash, C<%hash>. While
that cursor is still active, a read lock will obviously be held against the
database. If your application has any other writing processes, these will
be blocked until the read cursor is closed. That won't happen until the
loop terminates.

To avoid blocking problems, only keep cursors open as long as they are
needed. The same is true when you use the C<db_cursor> method or the
C<cds_lock> method.


The locking behaviour of the C<values> or C<keys> functions, shown below,
is subtly different.

    foreach my $k (keys %hash)
    {
        # do something
    }

    foreach my $v (values %hash)
    {
        # do something
    }


Just as in the C<each> function, a read cursor will be created to iterate
over the database in both of these cases. Where C<keys> and C<values>
differ is the place where the cursor carries out the iteration through the
database. Whilst C<each> carries out a single iteration every time it is
invoked, the C<keys> and C<values> functions will iterate through the
entire database in one go -- the complete database will be read into memory
before the first iteration of the loop.

Apart from the fact that a read lock will be held for the amount of time
required to iterate through the database, the use of C<keys> and C<values>
is B<not> recommended because it will result in the complete database being
read into memory.
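
One way to walk the database a record at a time, without building the full
key or value list, is an explicit cursor that is closed as soon as the loop
finishes. The sketch below assumes a CDS environment in a temporary
directory; the file name and sample data are invented for the example.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $home = tempdir(CLEANUP => 1);

my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

my $db = BerkeleyDB::Btree->new(
    -Filename => "fruit.db",    # hypothetical file name
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open database: $BerkeleyDB::Error\n";

my %seed = (apple => "red", banana => "yellow", cherry => "dark red");
for my $key (sort keys %seed) {
    $db->db_put($key, $seed{$key}) == 0
        or die "put failed: $BerkeleyDB::Error\n";
}

my $seen = 0;
{
    # A read cursor fetches one record per c_get call.
    my $cursor = $db->db_cursor()
        or die "cannot create cursor: $BerkeleyDB::Error\n";

    my ($k, $v) = ("", "");
    while ($cursor->c_get($k, $v, DB_NEXT) == 0) {
        $seen++;    # do something with $k and $v here
    }

    # Close the cursor as soon as the loop is done so that its read
    # lock does not outlive the iteration.
    $cursor->c_close();
}
print "visited $seen records\n";
```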


=head2 Avoiding Deadlock with multiple databases

If your CDS application uses multiple database files, and you need to write
to more than one of them, you need to be careful you don't create a
deadlock.

For example, say you have two databases, D1 and D2, and two processes, P1
and P2. Assume you want to write a record to each database. If P1 writes
the records to the databases in the order D1, D2 while process P2 writes
the records in the order D2, D1, there is the potential for a deadlock to
occur.

This scenario can be avoided by either always acquiring the write locks in
exactly the same order in your application code, or by using the
C<DB_CDB_ALLDB> flag when opening the environment. This flag will make a
write-lock apply to all the databases in the environment.

Add example here
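
As a minimal sketch of the first approach, consistent lock ordering, the
code below always takes the write locks in sorted order of a name assigned
to each database. The helper C<write_all>, the names C<d1> and C<d2> and
the file names are assumptions made for this example. It does not use
C<DB_CDB_ALLDB>.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

my $home = tempdir(CLEANUP => 1);

my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

# Two databases in the same CDS environment (hypothetical names).
my %dbs;
for my $name (qw(d1 d2)) {
    $dbs{$name} = BerkeleyDB::Hash->new(
        -Filename => "$name.db",
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "cannot open $name: $BerkeleyDB::Error\n";
}

# Hypothetical helper: every caller acquires the write locks in the
# same (sorted) order, so two processes cannot each hold one lock
# while waiting for the other.
sub write_all {
    my ($dbs, $key, $value) = @_;

    my @names = sort keys %$dbs;
    my @locks = map { $dbs->{$_}->cds_lock() } @names;

    for my $name (@names) {
        $dbs->{$name}->db_put($key, $value) == 0
            or die "put to $name failed: $BerkeleyDB::Error\n";
    }

    # Release in reverse order of acquisition.
    $_->cds_unlock() for reverse @locks;
    return scalar @names;
}

my $written = write_all(\%dbs, "greeting", "hello");
print "wrote to $written databases\n";
```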

=head1 DBM Filters

A DBM Filter is a piece of code that is used when you I<always>
want to make the same transformation to all keys and/or values in a DBM
database. All of the database classes (BerkeleyDB::Hash,
BerkeleyDB::Btree and BerkeleyDB::Recno) support DBM Filters.


