Alien-Libjio

lib/Alien/Libjio.pm  view on Meta::CPAN

=head1 DESCRIPTION

To ensure reliability, some file systems and databases provide support for
something known as journalling. The idea is to ensure data consistency by
creating a log of actions to be taken (called a Write Ahead Log) before
committing them to disk. That way, if a transaction were to fail due to a
system crash or other unexpected event, the write ahead log could be used to
finish writing the data.

While this functionality is often available with networked databases, it can
be a rather memory- and processor-intensive solution, even where reliable
writes are important. In other cases, the filesystem does not provide native
journalling support, so other tricks may be used to ensure data integrity,
such as writing to a separate temporary file and then overwriting the file
instead of modifying it in-place. Unfortunately, this method cannot handle
threaded operations appropriately.
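
The temporary-file trick described above can be sketched roughly as follows. This is only an illustration of the general technique, not something taken from libjio or from this module; the function name C<replace_file_atomically> is hypothetical, and the sketch assumes a POSIX system where renaming within one filesystem is atomic.

```python
import os
import tempfile


def replace_file_atomically(path, data):
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created next to the target so that both live on
    # the same filesystem; a rename across filesystems would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Best-effort cleanup; the original file was never modified.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that nothing here coordinates concurrent writers: two threads doing this at the same time will silently overwrite each other's changes, which is the limitation the paragraph above refers to.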

Thankfully, Alberto Bertogli published a userspace C library called libjio
that can provide these features in a small (less than 1500 lines of code)
library with no external dependencies.

libjio/doc/guide.rst  view on Meta::CPAN

instead of the regular calls.

The full function list is available on the man page and I won't reproduce it
here; however the naming is quite simple: just prepend a 'j' to all the names:
*jread()*, *jwrite()*, etc.


Processes, threads and locking
------------------------------

The library is completely safe to use in multi-process and/or multi-thread
applications, as long as you abide by the following rules:

 - Within a process, a file must not be held open at the same time more than
   once, due to *fcntl()* locking limitations. Opening, closing and then
   opening again is safe.
 - *jclose()* must only be called when there are no other I/O operations in
   progress.
 - *jfsck()* must only be called when the file is known **not** to be open by
   any process.
 - *jmove_journal()* must only be called when the file is known **not** to be
   open by any other processes.

All other operations (committing a transaction, rolling it back, adding
operations, etc.) and all the wrappers are safe and don't require any special
considerations.


Lingering transactions
----------------------

If you need to increase performance, you can use lingering transactions. In
this mode, transactions take up more disk space, but the synchronous write is
done only once, making commits much faster. To use them, just add
*J_LINGER* to the *jflags* parameter in *jopen()*. You should call *jsync()*
frequently to avoid using up too much space, or start an asynchronous thread
that calls *jsync()* automatically using *jfs_autosync_start()*. Note that
files opened with this mode must not be opened by more than one process at the
same time.


Disk layout
-----------

The library creates a single directory for each file opened, named after it.
So if we open a file *output*, a directory named *.output.jio* will be
created. We call it the journal directory, and it's used internally by the
library to save temporary data; **you shouldn't modify any of the files that
are inside it, nor move it while it's in use**.

It doesn't grow much (it only uses space for transactions that are in the
process of committing) and gets automatically cleaned while working with it so
you can (and should) ignore it. Besides that, the file you work with has no
special modification and is just like any other file; all the internal stuff
is kept isolated in the journal directory.
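
As a small illustration of the naming rule, the default journal directory for a given file can be computed as in the sketch below. The helper name ``journal_dir`` is hypothetical, and the sketch assumes the directory lives next to the file, as in the *output* / *.output.jio* example; it does not cover a journal relocated with *jmove_journal()*.

```python
import os


def journal_dir(path):
    """Return the default journal directory name for the given file path."""
    dirname, basename = os.path.split(path)
    # The journal directory sits next to the file: ".<file name>.jio"
    return os.path.join(dirname, "." + basename + ".jio")


print(journal_dir("/tmp/output"))  # prints /tmp/.output.jio
```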


ANSI C alike API
----------------

Besides the UNIX-alike API you can find an ANSI C alike API, which emulates
the traditional *fread()*, *fwrite()*, etc. It's still in development and has

libjio/doc/libjio.rst  view on Meta::CPAN


The former, "safely", means that after a commit has been done we can assume
the data will not get lost and can be retrieved, unless of course some major
event happens (like a physical hard disk crash). For us, this means that the
data was effectively written to the disk and if a crash occurs after the
commit operation has returned, the operation will be complete and data will be
available from the file.

The latter, "atomically", guarantees that the operation is either completely
done, or not done at all. This is a really common word, especially if you have
worked with multiprocessing, and should be quite familiar. We implement
atomicity by combining fine-grained locks and journaling, which together let
us recover from crashes and have exclusive access to a portion of the file
without any other transaction overlapping it.

Well, so much for talking, now let's get real; libjio applies commits in a
very simple and straightforward way, inside jtrans_commit():

 - Lock the file offsets where the commit takes place
 - Open the transaction file
 - Write the header

libjio/doc/libjio.rst  view on Meta::CPAN

the journal, so there is really nothing left to be done. So if the transaction
is complete, we only need to rollback.


UNIX-alike API
--------------

We call UNIX-alike API the set of functions provided by the library that
emulate the good old UNIX file manipulation calls. Most of them are just
wrappers around commits, and implement proper locking when operating in order
to allow simultaneous operations (either across threads or processes). They
are described in detail in the manual pages; we'll only list them here for
completeness:

 - jopen()
 - jread(), jpread(), jreadv()
 - jwrite(), jpwrite(), jwritev()
 - jtruncate()
 - jclose()


libjio/doc/libjio.rst  view on Meta::CPAN

  locks and relying on a commit procedure.

Consistency
  A transaction either creates a new and valid state of data, or, if any
  failure occurs, returns all data to its state before the transaction was
  started. This, like atomicity, has been discussed before, especially in the
  recovery section, when we saw how in case of a crash we end up with a fully
  applied transaction, or no transaction applied at all.

Isolation
  A transaction in process and not yet committed must remain isolated from any
  other transaction. This comes as a side effect of doing proper locking on
  the sections each transaction affects, and guarantees that there can't be two
  transactions working on the same section at the same time.

Durability
  Committed data is saved by the system such that, even in the event of a
  failure, the data is available in a correct state. To provide this, libjio
  relies on the disk as a method of permanent storage, and expects that when
  it does synchronous I/O, data is safely written and can be recovered after a
  crash.

libjio/doc/tids.rst  view on Meta::CPAN


To avoid parallel modifications, we will always lock the file with *fcntl()*
before accessing it.

Let's begin by describing how *get_tid()* works, because it's quite simple: it
locks the lockfile, gets the max tid, adds 1 to it, unlocks the file and returns
that value. That way, the new tid is always the new max, and with the locking
we can be sure it's impossible to assign the same tid to two different
transactions.
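
The steps above can be sketched in a few lines. This is a simplified illustration, not the library's actual C code: it assumes the lockfile holds a single native unsigned int at offset 0, that the new maximum is written back before unlocking, and that an empty lockfile means no tid has been assigned yet.

```python
import fcntl
import os
import struct

TID_FORMAT = "I"  # native unsigned int (assumed on-disk layout)
TID_SIZE = struct.calcsize(TID_FORMAT)


def get_tid(lock_fd):
    """Lock the lockfile, read the max tid, store max + 1, unlock, return it."""
    fcntl.lockf(lock_fd, fcntl.LOCK_EX)  # blocks until the lock is granted
    try:
        raw = os.pread(lock_fd, TID_SIZE, 0)
        # An empty or short lockfile is treated as "no tids assigned yet".
        if len(raw) == TID_SIZE:
            max_tid = struct.unpack(TID_FORMAT, raw)[0]
        else:
            max_tid = 0
        new_tid = max_tid + 1
        os.pwrite(lock_fd, struct.pack(TID_FORMAT, new_tid), 0)
        return new_tid
    finally:
        fcntl.lockf(lock_fd, fcntl.LOCK_UN)
```

Because the read, the increment and the write all happen while the lock is held, two processes calling this on the same lockfile cannot observe the same maximum.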

After a tid has been assigned, the commit process will create a file named
after it inside the journal directory. Then, it will operate on that file all
it wants, and when the moment comes, the transaction is no longer needed and
has to be freed.

The first thing we do is to unlink that transaction file. And then, we call
*free_tid()*, which will update the lockfile to represent the new max tid, in
case it has changed.

*free_tid()* begins by checking whether the transaction we're freeing is the
greatest, and if not, just returns.

libjio/libjio/libjio.3  view on Meta::CPAN

.BR pread() :
a buffer, its length and the offset where it should read from, and adds it to
the transaction. Note that if there is not enough data in the file to read
the specified amount of bytes, the commit will fail, so do not attempt to read
beyond EOF (you can use jread() for that purpose).

.B jtrans_commit()
commits the given transaction to disk. After it has returned, write operations
have been saved to the disk, and read operations have been read from it. The
commit operation is atomic with regards to other read or write operations on
different processes, as long as they all access it via libjio. It returns the
number 0 on success, -1 if there was an error but atomic guarantees were
preserved, or -2 if there was an error and there is a possible break of atomic
guarantees (which is an indication of a severe underlying condition).

.B jtrans_rollback()
reverses a transaction that was applied with
.BR jtrans_commit() ,
and leaves the file as it was before applying it. Be very careful with
this function: it's quite dangerous if you don't know for sure that you're
doing the right thing. It returns the same values as

libjio/libjio/libjio.h  view on Meta::CPAN

/*
 * Public types
 */

/** The result of a jfsck() run.
 *
 * @see jfsck()
 * @ingroup check
 */
struct jfsck_result {
	/** Total transaction files processed */
	int total;

	/** Number of invalid transactions */
	int invalid;

	/** Number of transactions in progress */
	int in_progress;

	/** Number of broken transactions */
	int broken;

libjio/libjio/libjio.h  view on Meta::CPAN

 * @see jread()
 */
int jtrans_add_r(jtrans_t *ts, void *buf, size_t count, off_t offset);

/** Commit a transaction.
 * 
 * All the operations added to it using jtrans_add_w()/jtrans_add_r() will be
 * written to/read from disk, in the same order they were added.
 *
 * After this function returns successfully, all the data can be trusted to be
 * on the disk. The commit is atomic with regards to other processes using
 * libjio, but not to those accessing the file directly.
 *
 * @param ts transaction
 * @returns 0 on success, or -1 if there was an error but atomic guarantees
 * 	were preserved, or -2 if there was an error and there is a possible
 * 	break of atomic guarantees (which is an indication of a severe
 * 	underlying condition).
 * @ingroup basic
 */
ssize_t jtrans_commit(jtrans_t *ts);

libjio/libjio/libjio.h  view on Meta::CPAN

/** Free a transaction structure.
 *
 * @param ts transaction to free
 * @see jtrans_new()
 * @ingroup basic
 */
void jtrans_free(jtrans_t *ts);

/** Change the location of the journal directory.
 *
 * The file MUST NOT be in use by any other thread or process. The older
 * journal directory will be removed.
 *
 * @param fs open file
 * @param newpath path to the new journal directory, which will be created if
 * 	it doesn't exist
 * @returns 0 on success, -1 on error
 * @ingroup basic
 */
int jmove_journal(jfs_t *fs, const char *newpath);

libjio/libjio/libjio.h  view on Meta::CPAN

 */
int jfs_autosync_stop(jfs_t *fs);


/*
 * Journal checker
 */

/** Check and repair the given path.
 *
 * The file MUST NOT be in use by any other thread or process. This
 * requirement will be lifted in future releases.
 *
 * @param name path to the file to check
 * @param jdir journal directory of the given file, use NULL for the default
 * @param res structure where to store the result
 * @param flags flags that change the checking behaviour, currently only
 *	J_CLEANUP is supported, which removes the journal directory after a
 *	successful recovery
 * @see struct jfsck_result
 * @returns 0 on success, < 0 on error, with the following possible negative

libjio/libjio/trans.c  view on Meta::CPAN

		goto error_exit;

	snprintf(jlockfile, PATH_MAX, "%s/lock", jdir);
	jfd = open(jlockfile, O_RDWR | O_CREAT, 0600);
	if (jfd < 0)
		goto error_exit;

	fs->jfd = jfd;

	/* initialize the lock file by writing the first tid to it, but only
	 * if it's empty, otherwise there is a race if two processes call
	 * jopen() simultaneously and both initialize the file */
	plockf(jfd, F_LOCKW, 0, 0);
	lstat(jlockfile, &sinfo);
	if (sinfo.st_size != sizeof(unsigned int)) {
		t = 0;
		rv = spwrite(jfd, &t, sizeof(t), 0);
		if (rv != sizeof(t)) {
			goto error_exit;
		}
	}

libjio/tests/behaviour/tf.py  view on Meta::CPAN

	shame. Yes, it's not safe; I know and I don't care."""
	tmpdir = os.environ.get('TMPDIR', '/tmp')
	now = time.time()
	now_s = str(int(now))
	now_f = str((now - int(now)) * 10000)
	now_str = "%s.%s" % (now_s[-5:], now_f[:now_f.find('.')])
	return tmpdir + '/jiotest.%s.%s' % (now_str, os.getpid())


def run_forked(f, *args, **kwargs):
	"""Runs the function in a different process."""
	sys.stdout.flush()
	pid = os.fork()
	if pid == 0:
		# child
		f(*args, **kwargs)
		sys.exit(0)
	else:
		# parent
		id, status = os.waitpid(pid, 0)
		if not os.WIFEXITED(status):
			raise RuntimeError(id, status)

def forked(f):
	"Decorator that makes the function run in a different process."
	def newf(*args, **kwargs):
		run_forked(f, *args, **kwargs)
	return newf


def gencontent(size = 9377):
	"Generates random content."
	s = ''
	a = "%.20f" % random.random()
	while len(s) < size:

libjio/tests/behaviour/tf.py  view on Meta::CPAN


def bitmp(mode = 'w+', jflags = 0):
	"Opens a temporary file with biopen()."
	path = tmppath()
	return biopen(path, mode, jflags)


def run_with_tmp(func, jflags = 0):
	"""Runs the given function, that takes a file and a jfile as
	parameters, using a temporary file. Returns the path of the temporary
	file. The function runs in a new process that exits afterwards."""
	f, jf = bitmp(jflags = jflags)
	run_forked(func, f, jf)
	return f.name


def jiodir(path):
	return os.path.dirname(path) + '/.' + os.path.basename(path) + '.jio'

def transpath(path, ntrans):
	jpath = jiodir(path)

libjio/tests/stress/jiostress  view on Meta::CPAN

		n = fd.read(total - c)
		if (n == ''):
			break
		c += len(n)
		r += n
	fd.seek(ppos, 0)
	assert c == end - start
	return r

#
# Output handler, used to get a nice output when using multiple processes
#

class OutputHandler:
	def __init__(self, every):
		# fds to read from
		self.rs = []

		# we will report every this number of seconds
		self.every = every

		# how many transactions has each child processed; we use the
		# read end of the pipe to identify them
		self.ntrans = {}

		# like self.ntrans but counts only the failed ones
		self.nfailures = {}

		# fd to write to, only relevant in the child
		self.w = None

		# p = parent, c = child

libjio/tests/stress/jiostress  view on Meta::CPAN

			self.print()

	def print(self):
		self.last_print_time = time.time()
		for r in sorted(self.ntrans):
			print("%4d" % self.ntrans[r], end = ' ')
		print()


#
# Lock manager, used to lock ranges between multiple processes
#
# We can't lock the real file because that would ruin libjio's locking, so we
# create a new file, remove it, and use fcntl locking. Not very elegant but it
# does the trick.
#

class VoidLockManager:
	def __init__(self):
		pass

libjio/tests/stress/jiostress  view on Meta::CPAN

#

def run_stressers(nproc, fname, fsize, nops, use_fi, use_as, output, lockmgr,
		do_verify):
	pids = []
	print("Launching stress test")
	for i in range(nproc):
		# Calculate how many operations this child will perform. The
		# last one will work a little more so we get exactly nops.
		# Note we prefer to work extra in the end rather than having
		# the last process with 0 child_nops, that's why we use int()
		# instead of round() or ceil().
		child_nops = int(nops / nproc)
		if i == nproc - 1:
			child_nops = nops - int(nops / nproc) * i

		output.prefork()
		sys.stdout.flush()
		pid = os.fork()
		if pid == 0:
			# child

libjio/tests/stress/jiostress  view on Meta::CPAN

	totalops, nfailures = output.output_loop()
	print("Stress test completed, waiting for children")
	nerrors = 0
	for pid in pids:
		rpid, status = os.waitpid(pid, 0)
		if os.WEXITSTATUS(status) != 0:
			nerrors += 1

	print("  %d operations" % totalops)
	print("  %d simulated failures" % nfailures)
	print("  %d processes ended with errors" % nerrors)
	if nerrors:
		return False
	return True

def main():
	usage = "Use: %prog [options] <file name> <file size in Mb>"
	parser = OptionParser(usage = usage)
	parser.add_option("-n", "--nops", dest = "nops", type = "int",
		default = 100,
		help = "number of operations (defaults to %default)")
	parser.add_option("-p", "--nproc", dest = "nproc", type = "int",
		default = 1,
		help = "number of processes (defaults to %default)")
	parser.add_option("", "--fi", dest = "use_fi",
		action = "store_true", default = False,
		help = "use fault injection (conflicts with --as and -p > 1)")
	parser.add_option("", "--as", dest = "use_as",
		action = "store_true", default = False,
		help = "use J_LINGER + autosync (conflicts with --fi)")
	parser.add_option("", "--no-internal-lock",
		dest = "use_internal_locks", action = "store_false",
		default = True,
		help = "do not lock internally, disables verification")

libjio/tests/stress/jiostress  view on Meta::CPAN

	except ValueError:
		print("Error: the size of the file must be numeric")
		return 1

	if not options.force:
		if options.use_fi and options.use_as:
			print("Error: --fi and --as cannot be used together")
			return 1

		if options.use_fi and options.nproc > 1:
			print("Error: --fi cannot be used with multiple processes")
			return 1

	if not options.use_internal_locks:
		options.do_verify = False

	output = OutputHandler(every = 2)
	if options.use_internal_locks:
		lockmgr = LockManager()
	else:
		lockmgr = VoidLockManager()


