Alien-SVN
src/subversion/subversion/libsvn_fs_base/notes/structure
WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ;
DIFF ::= ("svndiff" VERSION STRING-KEY)
Notice that a WINDOW holds only metadata. REP-KEY says what
the window should be applied against, or none if this is a
self-compressed delta; SIZE says how much data this window
reconstructs; VERSION says what version of the svndiff format
is being used (currently only version 0 is supported); and
STRING-KEY says which string contains the actual svndiff data
(there is no diff data held directly in the representations
table, of course).
Note also that REP-KEY might refer to a representation that
itself requires undeltification. We use a delta combiner to
combine all the deltas needed to reproduce the fulltext from
some stored plaintext.
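As a rough illustration of the chain a delta combiner has to walk, here
is a minimal Python sketch. Everything in it is an assumption made for
illustration: the class names, the in-memory `reps' dict standing in for
the `representations' table, and the reading of the leading number in
each window entry as an offset. It only finds the chain of rep keys; it
does not combine any svndiff data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    # Hypothetical in-memory form of one (OFFSET WINDOW) entry.
    offset: int              # assumed: where this window's output starts
    svndiff_version: int     # VERSION from the DIFF skel
    string_key: str          # STRING-KEY: which string holds the svndiff data
    size: int                # SIZE: how much data this window reconstructs
    rep_key: Optional[str]   # REP-KEY: delta base, None if self-compressed


@dataclass
class DeltaRep:
    windows: list


@dataclass
class FulltextRep:
    string_key: str


def delta_chain(reps, key):
    """Return the rep keys visited from `key` until nothing is left to undeltify."""
    chain = []
    seen = set()
    while True:
        if key in seen:
            # A rep that (indirectly) deltas against itself can never be read.
            raise ValueError("cycle in delta chain at rep %r" % key)
        seen.add(key)
        chain.append(key)
        rep = reps[key]
        if isinstance(rep, FulltextRep):
            return chain
        base = rep.windows[0].rep_key
        if base is None:          # self-compressed delta: no base to follow
            return chain
        key = base
```

The sketch follows only the first window's REP-KEY, on the assumption
that all windows of one representation share a base.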
Branko says this is what REP-OFFSET is for:
> The offsets embedded in the svndiff are stored in a string;
> these offsets would be in the representation. The point is that
> you get all the information you need to select the appropriate
> windows from the rep skel -- without touching a single
> string. This means a bit more space used in the repository, but
> lots less memory used on the server.
We'll see if it turns out to be necessary.
In the future, there may be other representations, for example
indicating that the text is stored elsewhere in the database, or
perhaps in an ordinary Unix file.
Let's work through an example node revision:
(("file" REV COUNT) PROP-KEY "2345")
The entry for key "2345" in `representations' is:
(("delta" TXN CHECKSUM) (0 (("svndiff" 0 "1729") 65 "2343")))
and the entry for key "2343" in `representations' is:
(("fulltext" TXN CHECKSUM) "1001")
while the entry for key "1729" in `strings' is:
<some unprintable glob of svndiff data>
which, when applied to the fulltext at key "1001" in strings, results
in this new fulltext:
"((some text) (that looks) (deceptively like) (directory entries))"
Et voila! Subversion knew enough, via the `representations' and
`strings' tables, to undeltify and get that fulltext; and knew enough,
because of the node revision's "file" type, to interpret the result as
file contents, not as a directory entry list.
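The lookup above can be replayed in a few lines of Python. This is only
a sketch under loud assumptions: the two tables are plain dicts, the
contents of string "1001" are invented (the notes do not give them), and
the "svndiff data" in string "1729" is replaced by a toy list of
copy/insert instructions, since real svndiff is a binary format.

```python
representations = {
    "2345": (("delta", "TXN", "CHECKSUM"),
             (0, (("svndiff", 0, "1729"), 65, "2343"))),
    "2343": (("fulltext", "TXN", "CHECKSUM"), "1001"),
}

strings = {
    # Hypothetical base fulltext; the real contents are not in the notes.
    "1001": b"((some text) (that looks) (like) (directory entries))",
    # Toy stand-in for svndiff data: NOT the real svndiff encoding.
    "1729": [("copy", 0, 26),
             ("insert", b"(deceptively like)"),
             ("copy", 32, 21)],
}


def apply_toy_delta(ops, base):
    """Rebuild a text from copy-from-base and insert-literal instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def read_rep(rep_key):
    """Return the fulltext for a rep key, undeltifying recursively if needed."""
    header, *body = representations[rep_key]
    if header[0] == "fulltext":
        return strings[body[0]]
    result = bytearray()
    for offset, (diff, size, base_key) in body:
        _, version, string_key = diff
        if version != 0:
            raise ValueError("only svndiff version 0 is supported")
        if offset != len(result):
            raise ValueError("window does not start where the last one ended")
        chunk = apply_toy_delta(strings[string_key], read_rep(base_key))
        if len(chunk) != size:
            raise ValueError("window reconstructed the wrong amount of data")
        result += chunk
    return bytes(result)
```

Calling read_rep("2345") touches "2345", then "2343", then strings
"1001" and "1729", and yields the 65-byte fulltext shown above, which
agrees with the SIZE of 65 recorded in the window.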
(Note that the `strings' table stores multiple DB values per key.
That is, although it's accurate to say there is one string per key,
the string may be divided into multiple consecutive blocks, all
sharing that key. You use a Berkeley DB cursor to find the desired
value[s], when retrieving a particular offset+len in a string.)
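A minimal sketch of such a read, assuming the blocks for one key have
already been fetched in order into a Python list (the function name and
the list-of-blocks model are hypothetical; no Berkeley DB cursor calls
are shown):

```python
def string_read(blocks, offset, length):
    """Read `length` bytes starting at `offset` from a string whose data
    is split across consecutive blocks.  Returns fewer bytes if the
    string ends early."""
    out = bytearray()
    pos = 0                      # absolute offset where the current block starts
    for block in blocks:
        end = pos + len(block)
        if end > offset and len(out) < length:
            start = max(offset - pos, 0)
            take = min(len(block) - start, length - len(out))
            out += block[start:start + take]
        pos = end
        if len(out) >= length:
            break                # no need to visit the remaining blocks
    return bytes(out)
```

The caller sees one logical string; the block boundaries only matter
inside the loop.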
Representations know nothing about ancestry -- the `representations'
table never refers to node revision id's, only to strings or to other
representations. In other words, while the `nodes' table allows
recovery of ancestry information, the `representations' and `strings'
tables together handle deltification and undeltification
*independently* of ancestry. At present, Subversion generally stores
the youngest strings in "fulltext" form, and older strings as "delta"s
against them (unless the delta would save no space compared to the
fulltext). However, there's nothing magic about that particular
arrangement. Other interesting alternatives:
* We could store the N most recently accessed strings as fulltexts,
letting access patterns determine the most appropriate
representation for each revision.
* We could occasionally store deltas against the N'th younger
revision, storing larger jumps with a frequency inverse to the
distance covered, yielding a tree-structured history.
Since the filesystem interface doesn't expose these details, we can
change the representation pretty much as we please to optimize
whatever parameter we care about --- storage size, speed, robustness,
etc.
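To make the second alternative concrete, here is one possible scheme,
sketched in Python. It is an assumption chosen for illustration, not
something these notes specify: each revision is numbered by its distance
from the youngest (distance 0 is the fulltext), and it deltas against
the revision whose distance has the lowest set bit cleared. The sketch
ignores the re-deltification needed when new revisions arrive and the
distances shift.

```python
def skip_base(distance):
    """Distance (from the youngest revision) of the delta base for the
    revision at `distance`; clears the lowest set bit."""
    if distance <= 0:
        raise ValueError("the youngest revision is a fulltext and has no base")
    return distance & (distance - 1)


def chain_length(distance):
    """Number of deltas to apply to reach the revision at `distance`."""
    hops = 0
    while distance:
        distance = skip_base(distance)
        hops += 1
    return hops
```

Under this scheme half the revisions jump by 1, a quarter by 2, an
eighth by 4, and so on, and the revision at distance d needs one delta
per set bit in d rather than d deltas.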
Representations never share strings - every string is referred to by
exactly one representation. This is so that when we change a
representation to a different form (e.g. during deltification), we can
delete the strings containing the old form, and know that we're not
messing up any other reps by doing so.
Further Notes On Deltifying:
----------------------------
When a representation is deltified, it is changed in place.
New strings are created containing the new delta, the representation
is changed to refer to the new strings, and the original (usually
fulltext) string or strings are deleted.
The node revisions referring to that representation will not be
changed; instead, the same rep key will now be associated with a
different value. That way, we get reader locking for free: if someone
is reading a file while Subversion is deltifying that file, one of the
two sides will get a DB_DEADLOCK and svn_fs__retry_txn() will retry.
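The in-place change can be sketched as follows. This is a simplified
Python model with hypothetical names: the two tables are dicts, a
representation is a flat tuple rather than a skel, the delta bytes are
supplied by the caller, and the Berkeley DB transaction (and therefore
the deadlock/retry behaviour) is not modelled at all.

```python
def deltify_in_place(representations, strings, rep_key, base_rep_key,
                     delta_bytes, new_string_key):
    """Turn the fulltext rep at `rep_key` into a delta against
    `base_rep_key`.  Returns False, changing nothing, if the delta
    would save no space."""
    rep = representations[rep_key]
    if rep[0] != "fulltext":
        raise ValueError("this sketch only deltifies fulltext reps")
    old_string_key = rep[1]
    fulltext = strings[old_string_key]
    if len(delta_bytes) >= len(fulltext):
        return False

    # 1. Create the new string holding the delta.
    strings[new_string_key] = delta_bytes
    # 2. Re-point the *same* rep key at the new string.
    representations[rep_key] = ("delta", new_string_key, base_rep_key,
                                len(fulltext))
    # 3. Delete the old string; safe because no other rep shares it.
    del strings[old_string_key]
    return True
```

Node revisions are never passed in: they keep pointing at the same rep
key before and after the call.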
### todo: add a note about cycle-checking here, too.
The Berkeley DB "nodes" table
The database contains a table called "nodes", which is a btree indexed
by node revision ID's, mapping them onto REPRESENTATION skels. Node 0
is always the root directory, and node revision ID 0.0.0 is always the
empty directory. We use the value of the key 'next-key' to indicate
the next unused node ID.
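A minimal sketch of how 'next-key' might be used to hand out node IDs,
assuming a dict in place of the btree and decimal strings for the keys
(both are assumptions for illustration; the key encoding is not stated
here):

```python
def allocate_node_id(nodes_table):
    """Return the next unused node ID and advance 'next-key'."""
    node_id = nodes_table["next-key"]
    nodes_table["next-key"] = str(int(node_id) + 1)
    return node_id
```

In the real table both steps would have to happen inside a single
Berkeley DB transaction so that two writers never receive the same ID.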
Assuming that we store the most recent revision on every branch as