FlatFile-DataStore
lib/FlatFile/DataStore/FMTEYEWTK.pm
- validation of a datastore
- migration of datastore records to newly configured datastore
- comparison of pre-migration and post-migration datastores
=head2 Motivation
Several factors motivated the development of this module:
- the desire for simple, efficient reading and writing of records
- the desire to handle any number and size of records
- the desire to identify records using sequence numbers
- the need to retain previous versions of records and to view update history
- the ability to store any sort of data: binary or text in any encoding
- the desire for a relatively simple file structure
- the desire for the data to be reasonably easily read by a human
- the ability to easily increase the datastore size (through migration)
The key file makes it easy and efficient to retrieve the current
version of a record--you just need the record's key sequence number.
Other retrievals via file number and seek position (e.g., gotten from a
history list) are also fast and easy.
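The key-file lookup described above can be sketched as a simple map from keynum to file number, seek position, and record length (an illustrative sketch only, not the module's actual key-file format):

```python
# Illustrative sketch (not the module's actual key-file format): the key
# file maps each key sequence number (keynum) to where the record's
# current version lives.
key_file = {
    0: {"file": 1, "seek": 0,  "len": 18},  # keynum 0 -> data file 1, offset 0
    1: {"file": 1, "seek": 19, "len": 25},  # offsets account for the record separator
    2: {"file": 2, "seek": 0,  "len": 12},  # current version landed in data file 2
}

def locate(keynum):
    """Return (file number, seek position, length) for a record's current version."""
    entry = key_file[keynum]
    return entry["file"], entry["seek"], entry["len"]

print(locate(2))  # -> (2, 0, 12)
```

Retrieval by an older (file number, seek position) pair from a history list follows the same shape, just without the key-file lookup.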
Because the size and number of data files is configurable, the data
store should scale up to large numbers of (perhaps large) records --
while still retaining efficient reading and writing.
(In the extreme case that a record is too large for a single file,
users might break up the record into parts, store them as multiple data
store records and store a "directory" record to guide the reassembly.
While that's outside the scope of this module, that sort of scheme is
accommodated by the fact that the datastore doesn't care if the record
data is not a complete unit of a known format.)
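The split-and-reassemble scheme sketched in that parenthetical might look like this (hypothetical user-level code, assuming a plain keynum-to-data mapping as the store; none of these names come from the module):

```python
# Hypothetical scheme from the text: a record too large for one data file
# is split into parts, each stored as its own record, plus a "directory"
# record listing the part keynums so the data can be reassembled.

def split_record(data: bytes, part_size: int):
    """Split oversized record data into fixed-size parts."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def store_large(store: dict, data: bytes, part_size: int) -> int:
    """Store the parts plus a directory record; return the directory keynum."""
    part_keys = []
    for part in split_record(data, part_size):
        keynum = len(store)          # stand-in for the datastore's keynum assignment
        store[keynum] = part
        part_keys.append(keynum)
    dir_keynum = len(store)
    store[dir_keynum] = ("DIR", part_keys)   # directory record guides reassembly
    return dir_keynum

def fetch_large(store: dict, dir_keynum: int) -> bytes:
    """Reassemble the original data from the directory record."""
    _tag, part_keys = store[dir_keynum]
    return b"".join(store[k] for k in part_keys)

store = {}
key = store_large(store, b"x" * 10, part_size=4)
assert fetch_large(store, key) == b"x" * 10
```

This works precisely because the datastore does not care whether each stored record is a complete unit of a known format.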
When a record is created, it is assigned a key sequence number (keynum)
that persistently identifies that record for the life of the data
store. This should help user-developed indexing schemes that employ,
e.g., bit maps, to remain correct.
Since a record links to its predecessors, it's easy to get a history
of that record's changes over time. This can facilitate recovery and
reporting.
Since record retrieval is by seek position and record length in bytes,
any sequence of bytes may be stored and retrieved. Disparate types of
data may be stored in the same datastore.
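Byte-exact retrieval by seek position and length can be demonstrated directly (a sketch using a temporary file; the newline separator after each record is written but not counted in the record's length, mirroring the structure described here):

```python
import os
import tempfile

# Because retrieval is by seek position and length in bytes, any byte
# sequence (binary or text in any encoding) round-trips intact.
records = [b"plain text", b"\x00\x01\xffbinary\x00", "caf\u00e9".encode("utf-8")]

with tempfile.NamedTemporaryFile(delete=False) as f:
    index = []                      # (seek position, length) per record
    pos = 0
    for data in records:
        index.append((pos, len(data)))
        f.write(data + b"\n")       # record separator after each record
        pos += len(data) + 1        # separator occupies one byte, not counted in length
    path = f.name

with open(path, "rb") as f:
    for (seek, length), original in zip(index, records):
        f.seek(seek)
        assert f.read(length) == original   # byte-exact round trip

os.remove(path)
```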
Outside of the record data itself, the datastore file structure uses
ASCII characters for the key file, toc file, and preambles. It appends
a record separator, typically a newline character, after each record.
This is intended to make the file structure relatively simple and more
easily read by a human--to aid copying, debugging, disaster recovery,
simple curiosity, etc.
Migration scripts are included in the module distribution. If your
initial configuration values prove too small to accommodate your data,
you can configure a new datastore with larger values and migrate all
the records to the new datastore. All of the transaction and sequence
numbers remain the same; the record data and user data are identical;
and interfacing with the new datastore vs. the old one should be
completely transparent to programs using the FlatFile::DataStore
module.
=head2 CRUD cases
Create: no previous preamble required or allowed
- create a record object (with no previous)
- write the record
- return the record object
Retrieve:
- read a data record
- create a record object (with a preamble, which may become a previous)
- return the record object
Update: previous preamble required (and it must not have changed)
- create a record object (with a previous preamble)
- write the record (updating the previous in the datastore)
- return the record object
Delete: previous preamble required (and it must not have changed)
- create a record object (with a previous preamble)
- write the record (updating the previous in the datastore)
- return the record object
Some notes about the "previous" preamble:
In order to protect data from conflicting concurrent updates, you may
not update or delete a record without first retrieving it from the data
store. Supplying the previous preamble along with the new record data
is proof that you did this. Before the new record is written, the
supplied previous preamble is compared with what's in the datastore,
and if they are not exactly the same, it means that someone else
retrieved and updated/deleted the record between the time you read it
and the time you tried to update/delete it.
So unless you supply a previous preamble and unless the one you supply
matches exactly the one in the datastore, your update/delete will not
be accepted--you will have to re-retrieve the new version of the record
(getting a more recent preamble) and apply your updates to it.
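The previous-preamble rule is a form of optimistic concurrency control, and its logic can be sketched as follows (names and preamble strings here are illustrative stand-ins, not the module's API or preamble format):

```python
# Sketch of the rule described above: an update must supply the preamble
# it retrieved, and it is rejected if the datastore's current preamble
# no longer matches (i.e., someone else updated the record first).

class ConflictError(Exception):
    pass

datastore = {7: {"preamble": "U keynum=7 v1", "data": "original"}}

def retrieve(keynum):
    rec = datastore[keynum]
    return rec["preamble"], rec["data"]

def update(keynum, previous_preamble, new_data):
    rec = datastore[keynum]
    if previous_preamble != rec["preamble"]:
        raise ConflictError("record changed since retrieval; re-retrieve it")
    rec["preamble"] = previous_preamble + " updated"  # stand-in for a fresh preamble
    rec["data"] = new_data

preamble, data = retrieve(7)
update(7, preamble, "edited")          # succeeds: supplied preamble matched
try:
    update(7, preamble, "late edit")   # fails: supplied preamble is now stale
except ConflictError:
    print("conflict detected")
```

A delete follows the same pattern: the supplied previous preamble must match exactly before the operation is accepted.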
=head2 Scaling to infinity (and beyond)
Past experience designing datastores suggests that once a design is in
place, you will always want to throw a lot more data at it than you
thought you were going to.
So in this module, I want to make an extra effort to accommodate all the
data anyone might want to put in a datastore. For that reason, any
file that increases in size as data is stored (toc file, key file, data
file) may be split into multiple files. Logically, these are one entity:
the toc files are one table of contents, the key files are one index,
the data files make up a single datastore. But by allowing them to be
split up, the file sizes can be kept manageable.
Similarly, since the number of files increases, and too many files in a
single directory can be problematic for a number of reasons, the module
accommodates multiple directories for these files. That is, as the
number of data files grows, they can be stored in multiple data
directories. As the number of key files grows, and as the number of
toc files grows, they can be stored in multiple key and toc
directories.
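The arithmetic behind that splitting might be sketched as follows (the limits and helper names are hypothetical, chosen only to illustrate how a byte offset maps to a file and a file to a directory):

```python
# Illustrative arithmetic only (parameter names and limits are
# hypothetical): with a per-file size limit and a per-directory file
# limit, an overall byte offset determines the data file, and the file
# number determines the directory.
MAX_FILE_BYTES = 1_000_000   # assumed maximum data-file size
FILES_PER_DIR  = 100         # assumed maximum files per directory

def data_file_for(total_offset: int) -> int:
    """1-based data file number for a byte offset into the whole datastore."""
    return total_offset // MAX_FILE_BYTES + 1

def dir_for(file_number: int) -> int:
    """1-based directory number holding a given data file."""
    return (file_number - 1) // FILES_PER_DIR + 1

print(data_file_for(2_500_000))  # -> 3
print(dir_for(250))              # -> 3
```

Each file stays under the size limit and each directory under the file-count limit, yet logically the files remain one datastore.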
To keep things simpler, the specs for the data file number can be