Alien-SVN


src/subversion/subversion/libsvn_fs_base/notes/structure

table; instead, they always refer into the `representations' table,
which knows whether a given string is a fulltext or a delta, and if it
is a delta, what it is a delta against.  That, combined with the
knowledge in (a) and (b) above, allows Subversion to retrieve the data
and parse it appropriately.  A representation has the form:

   (HEADER KIND-SPECIFIC)

where HEADER is

   (KIND TXN [MD5 [SHA1]])

The KIND is "fulltext" or "delta".  TXN is the txn ID for the txn in
which this representation was created.  MD5 is a checksum of the
representation's contents, that is, what the representation produces,
regardless of whether it is stored deltified or as fulltext.  (For
compatibility with older versions of Subversion, MD5 may be
absent, in which case the filesystem behaves as though the checksum is
there and is correct.) An additional kind of checksum, SHA1, is present
in newer formats, starting with version ...
### TODO

The TXN also serves as a kind of mutability flag: if txn T tries to
change a representation's contents, but the rep's TXN is not T, then
something has gone horribly wrong and T should leave the rep alone
(and probably raise an error).  Of course, "change a representation"
here means
changing what the rep's consumer sees.  Switching a representation's
storage strategy, for example from fulltext to deltified, wouldn't
count as a change, since that wouldn't affect what the rep produces.
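
The header layout and the mutability rule can be sketched in a few
lines of Python.  This is only an illustration, not the filesystem's
actual C code: the names RepHeader, parse_header, is_mutable and
check_mutable are hypothetical, and the header is assumed to have
already been parsed from its skel into a plain tuple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepHeader:
    """Hypothetical in-memory form of (KIND TXN [MD5 [SHA1]])."""
    kind: str                    # "fulltext" or "delta"
    txn: str                     # ID of the txn that created the rep
    md5: Optional[str] = None    # may be absent in older filesystems
    sha1: Optional[str] = None   # only present in newer formats


def parse_header(header):
    """Build a RepHeader from a 2- to 4-element tuple."""
    if not 2 <= len(header) <= 4:
        raise ValueError("malformed header: %r" % (header,))
    if header[0] not in ("fulltext", "delta"):
        raise ValueError("unknown representation kind: %r" % (header[0],))
    return RepHeader(*header)


def is_mutable(header, txn_id):
    """Only the txn that created a rep may change what it produces."""
    return header.txn == txn_id


def check_mutable(header, txn_id):
    """Raise instead of touching a rep that belongs to another txn."""
    if not is_mutable(header, txn_id):
        raise PermissionError(
            "rep created in txn %s is not mutable in txn %s"
            % (header.txn, txn_id))
```

A caller would run check_mutable before rewriting a rep's contents;
re-deltifying a rep does not change what it produces, so it would not
need the check.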

KIND-SPECIFIC varies considerably depending on the kind of
representation.  Here are the two forms currently recognized:

   (("fulltext" TXN [MD5 [SHA1]]) STRING-KEY)
       The data is at STRING-KEY in the `strings' table.

   (("delta" TXN [MD5 [SHA1]]) (OFFSET WINDOW) ...)
       Each OFFSET indicates the point in the fulltext that this
       element reconstructs, and WINDOW says how to reconstruct it:

       WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ;
       DIFF   ::= ("svndiff" VERSION STRING-KEY)

       Notice that a WINDOW holds only metadata.  REP-KEY says what
       the window should be applied against, or none if this is a
       self-compressed delta; SIZE says how much data this window
       reconstructs; VERSION says what version of the svndiff format
       is being used (currently only version 0 is supported); and
       STRING-KEY says which string contains the actual svndiff data
       (there is no diff data held directly in the representations
       table, of course).

       Note also that REP-KEY might refer to a representation that
       itself requires undeltification.  We use a delta combiner to
       combine all the deltas needed to reproduce the fulltext from
       some stored fulltext.

       Branko says this is what REP-OFFSET is for:
       > The offsets embedded in the svndiff are stored in a string;
       > these offsets would be in the representation. The point is that
       > you get all the information you need to select the appropriate
       > windows from the rep skel -- without touching a single
       > string. This means a bit more space used in the repository, but
       > lots less memory used on the server.

       We'll see if it turns out to be necessary.
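
As a rough illustration of why the OFFSET and SIZE metadata is
useful, the following Python sketch picks out the windows needed to
reconstruct a byte range using only the rep skel, without reading
any string.  The function name select_windows and the tuple layout
are assumptions made for this example; the skel is taken to be
already parsed into nested tuples.

```python
def select_windows(windows, start, length):
    """Return (offset, size, string_key) for each window that overlaps
    the fulltext range [start, start + length).

    `windows` is a list of (OFFSET, WINDOW) pairs, where
    WINDOW = (DIFF, SIZE, REP_KEY[, REP_OFFSET]) and
    DIFF   = ("svndiff", VERSION, STRING_KEY).
    """
    end = start + length
    selected = []
    for offset, window in windows:
        diff, size = window[0], window[1]
        # A window covers fulltext bytes [offset, offset + size).
        if offset < end and offset + size > start:
            selected.append((offset, size, diff[2]))
    return selected


# Three windows covering bytes 0-99, 100-199 and 200-249.
example = [
    (0,   (("svndiff", 0, "10"), 100, "5")),
    (100, (("svndiff", 0, "11"), 100, "5")),
    (200, (("svndiff", 0, "12"), 50,  "5")),
]
needed = select_windows(example, 150, 60)
```

Here `needed` names only the strings "11" and "12"; the svndiff data
for the first window never has to be fetched.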

In the future, there may be other representations, for example
indicating that the text is stored elsewhere in the database, or
perhaps in an ordinary Unix file.

Let's work through an example node revision:

   (("file" REV COUNT) PROP-KEY "2345")

The entry for key "2345" in `representations' is:

   (("delta" TXN CHECKSUM) (0 (("svndiff" 0 "1729") 65 "2343")))

and the entry for key "2343" in `representations' is:

   (("fulltext" TXN CHECKSUM) "1001")

while the entry for key "1729" in `strings' is:

   <some unprintable blob of svndiff data>

which, when applied to the fulltext at key "1001" in strings, results
in this new fulltext:

   "((some text) (that looks) (deceptively like) (directory entries))"

Et voila!  Subversion knew enough, via the `representations' and
`strings' tables, to undeltify and get that fulltext; and knew enough,
because of the node revision's "file" type, to interpret the result as
file contents, not as a directory entry list.
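
The lookup in this example can be sketched in Python as follows.
This is a simplified illustration under several assumptions: the two
tables are plain dictionaries, read_rep is a hypothetical name, and
the svndiff decoder is replaced by a toy stand-in (toy_apply) that
just appends the "diff" bytes to the source.  It also materializes
each base fulltext recursively rather than using a delta combiner.

```python
def read_rep(reps, strings, rep_key, apply_delta):
    """Return the fulltext that representation `rep_key` produces."""
    rep = reps[rep_key]
    header, body = rep[0], rep[1:]
    kind = header[0]
    if kind == "fulltext":
        (string_key,) = body
        return strings[string_key]
    if kind != "delta":
        raise ValueError("unknown representation kind: %r" % (kind,))
    pieces = []
    produced = 0
    for offset, window in body:
        diff, size = window[0], window[1]
        base_key = window[2] if len(window) > 2 else None
        _name, version, string_key = diff
        if version != 0:
            raise ValueError("unsupported svndiff version %r" % (version,))
        # An empty REP-KEY means a self-compressed delta: no source.
        source = (read_rep(reps, strings, base_key, apply_delta)
                  if base_key else b"")
        piece = apply_delta(source, strings[string_key])
        if offset != produced or len(piece) != size:
            raise ValueError("window metadata does not match its data")
        pieces.append(piece)
        produced += size
    return b"".join(pieces)


def toy_apply(source, diff_data):
    """Stand-in for a real svndiff decoder: append diff to source."""
    return source + diff_data


reps = {
    "2345": (("delta", "TXN", "CHECKSUM"),
             (0, (("svndiff", 0, "1729"), 65, "2343"))),
    "2343": (("fulltext", "TXN", "CHECKSUM"), "1001"),
}
strings = {
    "1001": b"((some text) (that looks) ",
    "1729": b"(deceptively like) (directory entries))",
}
text = read_rep(reps, strings, "2345", toy_apply)
```

Note that nothing in read_rep looks at the node revision; deciding
whether `text` is file contents or a directory entry list is left to
the caller, which knows the node's type.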

(Note that the `strings' table stores multiple DB values per key.
That is, although it's accurate to say there is one string per key,
the string may be divided into multiple consecutive blocks, all
sharing that key.  You use a Berkeley DB cursor to find the desired
value[s], when retrieving a particular offset+len in a string.)
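
A minimal sketch of that offset+len retrieval, assuming the blocks
stored under one key have already been fetched in order into a
Python list (the real code walks them with a Berkeley DB cursor; the
name read_string is hypothetical):

```python
def read_string(blocks, offset, length):
    """Read up to `length` bytes starting at `offset` from a string
    that is stored as consecutive blocks sharing one key."""
    out = []
    pos = 0  # absolute offset of the current block's first byte
    for block in blocks:
        block_end = pos + len(block)
        if length > 0 and offset < block_end:
            start = offset - pos
            chunk = block[start:start + length]
            out.append(chunk)
            offset += len(chunk)
            length -= len(chunk)
        pos = block_end
        if length == 0:
            break
    return b"".join(out)


blocks = [b"hello ", b"wor", b"ld!"]
middle = read_string(blocks, 4, 5)
```

The requested range here starts in the first block and ends in the
second, so `middle` is assembled from two pieces.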

Representations know nothing about ancestry -- the `representations'
table never refers to node revision IDs, only to strings or to other
representations.  In other words, while the `nodes' table allows
recovery of ancestry information, the `representations' and `strings'
tables together handle deltification and undeltification
*independently* of ancestry.  At present, Subversion generally stores
the youngest strings in "fulltext" form, and older strings as "delta"s
against them (unless the delta would save no space compared to the
fulltext).  However, there's nothing magic about that particular
arrangement.  Other interesting alternatives:

   * We could store the N most recently accessed strings as fulltexts,
     letting access patterns determine the most appropriate
     representation for each revision.

   * We could occasionally store deltas against the N'th younger
     revision, storing larger jumps with a frequency inverse to the
     distance covered, yielding a tree-structured history.
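
To make the second alternative concrete, here is one hypothetical
rule of that shape, sketched in Python.  It is not something these
notes specify: revisions are assumed to be numbered from 1, each
revision jumps ahead by the value of its lowest set bit, and jumps
past the youngest revision are clamped to it.  Half of all revisions
then jump by 1, a quarter by 2, an eighth by 4, and so on.  The
sketch ignores the question of re-deltifying when a newer revision
arrives.

```python
def delta_base(rev, youngest):
    """Return the younger revision that `rev` is deltified against,
    or None if `rev` is the youngest (stored as fulltext)."""
    if not 1 <= rev <= youngest:
        raise ValueError("revision out of range: %r" % (rev,))
    if rev == youngest:
        return None
    jump = rev & -rev  # lowest set bit: 1, 2, 4, 8, ...
    return min(rev + jump, youngest)


def delta_chain(rev, youngest):
    """List the revisions visited when undeltifying `rev`."""
    path = [rev]
    while True:
        base = delta_base(path[-1], youngest)
        if base is None:
            return path
        path.append(base)


short = delta_chain(5, 16)
long_history = delta_chain(1, 1000)
```

With this rule, reaching revision 1 in a 1000-revision history
touches 11 representations rather than 1000, which is the kind of
tree-structured history the bullet describes.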


