                            -*- text -*-

                   Subversion Filesystem History
         (a love song for libsvn_fs, by C. Michael Pilato)


The Subversion filesystem can be your best friend, or your worst
enemy, usually depending on which side of the public API you are
working on.  Callers of the libsvn_fs interfaces do their work in a
world pleasantly addressed by roots (the name given to a revision or
transaction snapshot of the versioned directory tree) and paths under
those roots.  But once you swim beneath the surface, you quickly
realize that there is a beast both beautiful and dangerous lying in
wait.  What appears to the outside world as a sort of coordinate system
with axes for "Time" and "Location" is, in fact, a complicated DAG
subsystem, with nodes that represent revisions of versioned locations
all interconnected in various relationships with each other.

The goal of this document is straightforward: to relay knowledge about
how to untangle that DAG subsystem -- knowledge which likely lives
only in the minds of a few developers -- so that the Few might become
the Many.



Node-Revisions: The Nodes of the DAG

When working outside the filesystem API, people generally talk about
their versioned resources in terms of the paths of those resources,
and the global revisions (or revisions-to-be) in which those paths
are located.  But inside the filesystem, paths are broken down and
stored as a hierarchical linked-list of path components.  Each of
these path components has its own historical lineage (because
Subversion versions directories, too!), and each revision of that
lineage is referred to as a "node-revision".  It is these
node-revisions which are the nodes of the DAG subsystem, or "DAG
nodes".

DAG nodes are identified by unique keys called "node-revision IDs",
and are inter-connected in a variety of ways.  A DAG node that
represents a directory stores information about which other DAG nodes
represent the children of that directory.  A DAG node contains
information about which other DAG node is its historical predecessor.
By tracing these links from node to node, we can effectively traverse
both space and time, both the geography and the chronology of the
filesystem landscape.

For example, the path "/trunk/src/main.c" in revision 4 of the
filesystem consumes four DAG nodes: one for "/", one for "/trunk", one
for "/trunk/src", and one for "/trunk/src/main.c".  The DAG node for
"/" contains a list of the names and node-revision IDs of its
children, among which is the node-revision ID for the child named
"trunk".  Similar links are found in "/trunk" (for "src") and
"/trunk/src" (for "main.c").  Additionally, if these paths existed in
different forms in previous revisions of the filesystem, their DAG
nodes would store the node-revision IDs of their respective
predecessor nodes.
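A toy model can make these two kinds of links concrete.  The Python
sketch below is purely illustrative.  `NodeRev`, `history()`, the
dictionary standing in for the node-revision table, and the ID
strings are all hypothetical and are not libsvn_fs types or
functions.  It shows only how repeatedly following predecessor links
walks one node backward through its history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeRev:
    """Hypothetical stand-in for a stored node-revision (not a libsvn_fs type)."""
    kind: str                                               # "dir" or "file"
    entries: Dict[str, str] = field(default_factory=dict)   # dirs only: child name -> node-revision ID
    predecessor_id: Optional[str] = None                    # previous node-revision in this lineage


def history(table: Dict[str, NodeRev], noderev_id: str) -> List[str]:
    """Return noderev_id followed by each of its predecessors, newest first."""
    chain: List[str] = []
    current: Optional[str] = noderev_id
    while current is not None:
        chain.append(current)
        current = table[current].predecessor_id
    return chain


# Three successive node-revisions of "/trunk/src/main.c" (made-up IDs).
table = {
    "3.0.1": NodeRev("file"),
    "3.0.2": NodeRev("file", predecessor_id="3.0.1"),
    "3.0.4": NodeRev("file", predecessor_id="3.0.2"),
}
print(history(table, "3.0.4"))  # ['3.0.4', '3.0.2', '3.0.1']
```

Walking down through `entries` moves through space; walking back
through `predecessor_id` moves through time.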

Whenever someone uses the public API to query for information about a
versioned path under a particular root, the typical course of action
under the hood is as follows:

   1. The root refers to a particular snapshot of the DAG node tree,
      and from this we can learn the node-revision ID of the node
      which represents the root directory ("/") as it appears in that
      snapshot.  Given this node-revision ID, it's all DAG from here.

   2. The path is split into components and traversed, beginning with
      the root node, and walking down toward the full path.  Each
      intermediate node-revision is read, its entries list parsed, and
      the next component looked up in that entries list to find the
      next node-revision ID along the traversal path.

   3. Finally, we wind up with a node-revision ID for our original
      path.  We use it and its associated node-revision to answer the
      query.
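The three steps can be sketched as a short Python function.  It is a
toy that assumes a hypothetical in-memory `table` mapping made-up
node-revision ID strings to directory entries.  Real lookups in
libsvn_fs read and parse node-revisions from the backend's storage
instead.  The table below encodes the "/trunk/src/main.c" example from
revision 4.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeRev:
    """Hypothetical stand-in for a stored node-revision."""
    kind: str                                               # "dir" or "file"
    entries: Dict[str, str] = field(default_factory=dict)   # dirs only: child name -> node-revision ID


def lookup(table: Dict[str, NodeRev], root_id: str, path: str) -> str:
    """Walk from the root node down to `path`, returning its node-revision ID."""
    # Step 1: the root (revision or transaction) tells us which node represents "/".
    current = root_id
    # Step 2: read each directory's entries and look up the next path component.
    for component in (c for c in path.split("/") if c):
        node = table[current]
        if node.kind != "dir" or component not in node.entries:
            raise FileNotFoundError(path)
        current = node.entries[component]
    # Step 3: `current` now names the node-revision that answers the query.
    return current


# The "/trunk/src/main.c" example in revision 4 (made-up IDs).
table = {
    "0.0.4": NodeRev("dir", {"trunk": "1.0.4"}),
    "1.0.4": NodeRev("dir", {"src": "2.0.4"}),
    "2.0.4": NodeRev("dir", {"main.c": "3.0.4"}),
    "3.0.4": NodeRev("file"),
}
print(lookup(table, "0.0.4", "/trunk/src/main.c"))  # 3.0.4
```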

Seems pretty easy, doesn't it?  Keep reading.



All About Node-Revision IDs

As previously mentioned, each node-revision in the filesystem has a
unique key, referred to as the node-revision ID.  This key is
typically represented as a string which looks like a period-separated
list of its three components:

   1. node ID: This key is unique to the members of a single
      historical lineage.  Differing versions of the same versioned
      resource, irrespective of the paths and revisions in which those
      versions are located, all share this ID.  If two node-revisions
      have different node IDs, they are historically unrelated.

   2. copy ID: This key uniquely identifies a copy operation, and is
      sometimes referred to (or at least thought of) as a "branch ID."
      If two node-revisions have the same copy ID, they are said to be
      on the same branch.  The only exception to this is the key "0",
      a special key that means "not branched".

   3. txn ID: This key uniquely identifies the Subversion transaction
      in which this node-revision came into existence.
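A node-revision ID string can be split into these three components
mechanically.  The sketch below treats each component as an opaque
string and assumes exactly three period-separated fields.
`NodeRevId`, `parse_id()`, and the two comparison helpers are
hypothetical illustrations rather than libsvn_fs functions, the
sample IDs are made up, and `same_branch()` encodes one reading of the
"0" exception (unbranched nodes are not on any shared branch).

```python
from typing import NamedTuple


class NodeRevId(NamedTuple):
    node_id: str   # shared by every node-revision in one line of history
    copy_id: str   # identifies a copy ("branch"); "0" means not branched
    txn_id: str    # transaction in which this node-revision was created


def parse_id(text: str) -> NodeRevId:
    """Split a "node.copy.txn" string into its three components."""
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed node-revision ID: {text!r}")
    return NodeRevId(*parts)


def related(a: NodeRevId, b: NodeRevId) -> bool:
    """Node-revisions are historically related only if their node IDs match."""
    return a.node_id == b.node_id


def same_branch(a: NodeRevId, b: NodeRevId) -> bool:
    """Same copy ID means same branch, except for the special "0" key."""
    return a.copy_id == b.copy_id and a.copy_id != "0"


print(parse_id("3.0.4"))  # NodeRevId(node_id='3', copy_id='0', txn_id='4')
```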

Whenever someone uses the public API to *modify* a versioned resource,
the steps taken under the hood are much the same as those used when
querying.  But there are important differences.

   1. The path is traversed in the same manner as described in the
      previous section.  The result is an in-memory linked-list of
      information about the node-revisions which comprise the
      components of the path.

   2. But before any changes can be made to a path, its node-revision
      and those of its parent directories must first be cloned so that
      changes to them don't affect previous incarnations of those
      node-revisions.  This process is called "making the path
      mutable".  If previous operations under this transaction caused
      one or more of the parent directories to be made mutable
      already, they are not cloned again.

   3. Once the path and all its parents are mutable, the desired
      changes can be made to the cloned node-revision, and they in no
      way affect prior history.
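The "making the path mutable" step can be sketched top-down: clone the
root if needed, then clone each child that is not yet mutable and
repoint its (already mutable) parent's entry at the clone.  This is a
simplified Python model under stated assumptions.  The table, the
hypothetical `txn` dictionary holding the transaction's ID and root,
and all IDs are invented.  A node-revision is treated as mutable
exactly when its txn ID matches the transaction's.  The copy ID is
simply carried over unchanged, which glosses over the real copy-ID
rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeRev:
    """Hypothetical stand-in for a stored node-revision."""
    kind: str
    entries: Dict[str, str] = field(default_factory=dict)
    predecessor_id: Optional[str] = None


def txn_of(noderev_id: str) -> str:
    return noderev_id.split(".")[2]


def clone(table: Dict[str, NodeRev], noderev_id: str, txn_id: str) -> str:
    """Duplicate a node-revision under a new ID owned by txn_id."""
    node_id, copy_id, _ = noderev_id.split(".")
    new_id = f"{node_id}.{copy_id}.{txn_id}"  # simplification: copy ID carried over unchanged
    old = table[noderev_id]
    table[new_id] = NodeRev(old.kind, dict(old.entries), predecessor_id=noderev_id)
    return new_id


def make_path_mutable(table: Dict[str, NodeRev], txn: Dict[str, str], path: str) -> str:
    """Clone `path` and its parents into txn (unless already mutable); return the path's ID."""
    if txn_of(txn["root"]) != txn["id"]:
        txn["root"] = clone(table, txn["root"], txn["id"])
    parent_id = txn["root"]
    for name in (c for c in path.split("/") if c):
        child_id = table[parent_id].entries[name]
        if txn_of(child_id) != txn["id"]:
            child_id = clone(table, child_id, txn["id"])
            table[parent_id].entries[name] = child_id  # safe: the parent is already mutable
        parent_id = child_id
    return parent_id


table = {
    "0.0.4": NodeRev("dir", {"trunk": "1.0.4"}),
    "1.0.4": NodeRev("dir", {"src": "2.0.4"}),
    "2.0.4": NodeRev("dir", {"main.c": "3.0.4"}),
    "3.0.4": NodeRev("file"),
}
txn = {"id": "5", "root": "0.0.4"}  # a new transaction based on revision 4
print(make_path_mutable(table, txn, "/trunk/src/main.c"))  # 3.0.5
print(table["0.0.4"].entries)  # {'trunk': '1.0.4'}  (revision 4 is untouched)
```

Calling `make_path_mutable()` a second time for the same path in the
same transaction creates no new clones, mirroring the rule that
already-mutable parents are not cloned again.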

To clone a node-revision means to literally make a duplicate of it
which is granted its own unique node-revision ID.  The new
node-revision ID consists of the same node ID as the node-revision
that was cloned (since this is just another point along the historical
lineage of this versioned resource), a copy ID (which will be
discussed later), and the txn ID in which this modification is
occurring.

There are some cool things we can read between the lines above.  Since
the only time a node-revision comes into existence is when it is brand
new or a fresh clone, and we never do cloning except during a
modification, we can use the txn ID as a sort of mutability flag.
Mutability of a node-revision is determined by comparing the txn ID of
the node-revision with the ID of the Subversion transaction being used
to modify the filesystem -- if, and only if, they are the same, the node
is allowed to be changed inside that transaction.
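Both the ID of a fresh clone and the mutability test reduce to a few
lines.  The sketch below is illustrative only.  `NodeRevId`,
`clone_id()`, and `is_mutable()` are hypothetical names, and because
the text defers the copy-ID rules, `clone_id()` simply takes the copy
ID as a parameter rather than guessing how it is chosen.

```python
from typing import NamedTuple


class NodeRevId(NamedTuple):
    node_id: str
    copy_id: str
    txn_id: str


def clone_id(source: NodeRevId, copy_id: str, txn_id: str) -> NodeRevId:
    """ID for a clone of `source` made inside transaction `txn_id`."""
    # Same lineage, so the node ID is kept; the copy ID is supplied by the caller.
    return NodeRevId(source.node_id, copy_id, txn_id)


def is_mutable(noderev: NodeRevId, txn_id: str) -> bool:
    """A node-revision may be changed only by the transaction that created it."""
    return noderev.txn_id == txn_id


committed = NodeRevId("3", "0", "4")
fresh = clone_id(committed, committed.copy_id, "5")
print(fresh)                        # NodeRevId(node_id='3', copy_id='0', txn_id='5')
print(is_mutable(committed, "5"))   # False
print(is_mutable(fresh, "5"))       # True
```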

So, we know how txn IDs come into existence now.  And the origin of
node IDs hardly warrants its own paragraph: brand new lines of history
(introduced with svn_fs_make_file() and svn_fs_make_dir()) get new
unique node IDs, and every other node-revision that is created simply
keeps the same node ID as the node-revision on which it is based.
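The node ID rule is equally small.  In this hypothetical sketch, a
counter stands in for however the backend actually allocates fresh
keys (an assumption, not the real mechanism), and `node_id_for()` is
an invented helper.  A brand new line of history gets a fresh node ID;
anything based on an existing node-revision inherits that
node-revision's node ID.

```python
import itertools
from typing import Iterator, Optional


class NodeIdAllocator:
    """Hands out fresh node IDs (hypothetical; a counter stands in for real key allocation)."""

    def __init__(self, start: int = 1) -> None:
        self._counter: Iterator[int] = itertools.count(start)

    def fresh(self) -> str:
        return str(next(self._counter))


def node_id_for(alloc: NodeIdAllocator, based_on: Optional[str] = None) -> str:
    """Node ID for a new node-revision; `based_on` is the ID it derives from, if any."""
    if based_on is None:                 # e.g. created by svn_fs_make_file()/svn_fs_make_dir()
        return alloc.fresh()
    return based_on.split(".")[0]        # same lineage: inherit the node ID


alloc = NodeIdAllocator()
print(node_id_for(alloc))              # 1  (a new file)
print(node_id_for(alloc))              # 2  (a new directory)
print(node_id_for(alloc, "1.0.4"))     # 1  (another version of the first file)
```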


