Cache-Static
- ./t/filedep.t doesn't clean up /tmp/Cache-Static-test/ after itself,
and should be using mktemp anyway
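  a minimal sketch of the File::Temp route (File::Temp is core perl; the
  template name here is illustrative):
    use File::Temp qw(tempdir);
    # CLEANUP removes the whole tree at exit, even if the test dies
    my $dir = tempdir('Cache-Static-test-XXXXXX', TMPDIR => 1, CLEANUP => 1);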
- more .pod documentation for sub-modules, etc.
- we don't detect the error of
"_XML::Comma|Store|Post|main|$id"
(that should be Doc, not Store - detect by the number of pipes in the spec)
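  a hedged sketch of the pipe-count check (the expected counts are guesses
  from the spec formats in this file, and check_comma_spec is a made-up
  name):
    my %pipes_for = ( Store => 3, Doc => 4 );   # expected '|' count per type
    sub check_comma_spec {
        my ($spec) = @_;
        my (undef, $type) = split /\|/, $spec;
        my $pipes = () = $spec =~ /\|/g;        # count the pipes
        warn "'$spec' has $pipes pipes, expected $pipes_for{$type} for $type\n"
            if exists $pipes_for{$type} && $pipes != $pipes_for{$type};
    }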
- t/dbi.t fails with a server in the dsn (it's ok if we omit 127.0.0.1 tho);
see the svn diff between dbi.t in revs. 599 and 603:
http://chronicle.allafrica.com/svnweb/index.cgi/allafrica/log/trunk/Cache-Static/t/dbi.t
- a bug: when we change a depended-upon component, we don't check
the filemod time? or perhaps we aren't checking down the tree?
(note: I have no idea what this refers to, or if it still
exists. maybe it was something in hmc.pm?)
- if there is no config file, do we keep trying to load it in?
-- the API should not force you to call most of the functions it's making you call...
new API:
init
make_key (optional)
get_if_same
set
in gis/set:
$key = make_key($key) unless ref($key) && $key->isa('Cache::Static::Key');
you should get the same results regardless of whether you called make_key,
but calling make_key and saving the result saves you one MD5 lookup
(think about memoization tho)
ALSO - it'd be nice to have the deps encoded in the key object too (?)
THINK: do we want to expose is_same?
every other function should be underscored!!!
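  sketch of how the new API might read from caller code (the four names
  above are the proposed ones; argument shapes are guesses):
    Cache::Static::init();
    my $key = Cache::Static::make_key($raw_key);    # optional, saves an MD5
    my $val = Cache::Static::get_if_same($key, \@deps);
    unless (defined $val) {
        $val = regenerate();                        # hypothetical
        Cache::Static::set($key, $val, \@deps);
    }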
-- get a better solution for permission stuff.
making everything 777/666 is not right.
plus all the extra chmod calls (which are slow and introduce a race condition)
is there umask in perl? YES, see perldoc -f umask
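  sketch of the umask route (0022 is an illustrative default; a real fix
  would probably take the mode from the config):
    my $cache_file = '/tmp/example-entry';   # illustrative path
    my $old = umask(0022);                   # new files 644, dirs 755
    open(my $FH, '>', $cache_file) or die "open: $!";   # no chmod race
    umask($old);                             # restore for the process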
-- lock around set (EXCL) / get (LOCK_SH)
actually, need to lock around basically everything in HMU::cache_it...
but this should not be hacked in, should be an API level change...
get_if_same_or_lock() ... set() ?
probably do this:
get_if_same():
take an optional lock argument (default to what conf says)
set():
if there is a lock, clear it.
problem:
we need to pass a filehandle back to inherit the lock.
or we could save it globally somewhere?
pseudo code:
  get_if_same:
    if (!defined(is_same(...))) {
        if (my $FH = open_with_flock(LOCK_EX, block => 0)) {
            $Cache::Static::_curr_locked_FH = $FH;
        } else {
            # wait until it's been set, then return it
            open_with_flock(LOCK_SH, block => 1);
            return get(...);
        }
    } else {
        return get(...);
    }
  set:
    my $FH = $Cache::Static::_curr_locked_FH || open(...);
    ...
    close($FH);
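  a possible open_with_flock(), since the pseudo code above leans on it
  (the name and signature are this file's invention, not an existing API;
  here it grows an explicit $path argument that the pseudo code elides):
    use Fcntl qw(:flock);
    sub open_with_flock {
        my ($path, $lock_type, %opts) = @_;
        open(my $FH, '+>>', $path) or return undef;
        my $flags = $opts{block} ? $lock_type : $lock_type | LOCK_NB;
        flock($FH, $flags) or do { close($FH); return undef; };
        return $FH;   # caller holds the lock until close()
    }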
an optimization idea stolen from the lighttpd folks:
(http://www.lighttpd.net/documentation/performance.html)
stat() cache
A stat(2) can be expensive; caching it saves time and context switches.
Instead of using stat() every time to check for the existence of a file
you can stat() it once and monitor the directory the file is in for
modifications. As long as the directory doesn't change, the files in it
must all still be the same.
With the help of FAM or gamin you can use kernel events to assure that
your stat cache is up to date.
server.stat-cache-engine = "fam" # either fam, simple or disabled
granted, they are talking about the context of httpd serving static
pages, but it could still cut down on our number of stat calls by a
large factor in the common case...
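  a hedged perl sketch of the same idea (cached_stat is a made-up name;
  this still stats the directory once per call, so the real win is
  memoizing the dir mtime per request or swapping in FAM/gamin events):
    use File::Basename qw(dirname);
    my %stat_cache;   # file => [ dir mtime when cached, stat results ]
    sub cached_stat {
        my ($file) = @_;
        my $dir_mtime = (stat(dirname($file)))[9];
        my $e = $stat_cache{$file};
        return $e->[1] if $e && $e->[0] == $dir_mtime;   # dir unchanged
        my @st = stat($file);
        $stat_cache{$file} = [ $dir_mtime, \@st ];
        return \@st;
    }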
next up:
in the case where you allow old results:
before writing, copy the cache file "$f" to "$f.bak"
any get_if_same() during that time reads "$f.bak"
when done, remove "$f.bak" (this results in a race condition in the reader:
1 - find the file locked, try the .bak file
2 - .bak file is removed, goto 1; see the sketch below)
OR have a crontab that runs e.g. hourly to clean up .bak files...
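  reader-side sketch of the .bak scheme, reusing the open_with_flock
  sketch from above (all helper names remain hypothetical):
    use Fcntl qw(:flock);
    sub get_allowing_stale {
        my ($f) = @_;
        while (1) {
            # 1 - live file; fails without blocking if a writer holds it
            if (my $FH = open_with_flock($f, LOCK_SH, block => 0)) {
                local $/; my $v = <$FH>; close($FH); return $v;
            }
            # writer active: serve the stale copy if it still exists
            if (open(my $BAK, '<', "$f.bak")) {
                local $/; my $v = <$BAK>; close($BAK); return $v;
            }
            # 2 - .bak removed between the two checks, goto 1
        }
    }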
start the lock when generation starts, remove the lock when new value
has been set
situation:
refresh of key K takes 30 seconds.
every 5 seconds, we get a hit.
this means we regenerate 6 times, when we only needed to do it once.
this is slower by half on average and takes 6x the resources.
solution:
if we have fcntl:
exclusive lock while writing to the file
all attempts to read will block on the initial thread.
that's all.
-- TODO: think - when is the right time to reload the config files?
right now we keep them loaded forever... which is ok, but there
should be some signal...
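  one possible "signal", literally (a hedged sketch; _load_config and
  %Cache::Static::conf are made-up names for whatever the real loader
  and config store are):
    $SIG{HUP} = sub {
        %Cache::Static::conf = ();       # drop the cached config
        Cache::Static::_load_config();   # re-read on next use
    };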
-- dbi.t:
exec this shit first:
drop database scache_test_db;
create database scache_test_db;
use scache_test_db;
create table test_table ( test_field1 TINYINT, test_field2 TINYINT );
-- think about using a non-standard depend to parse the SQL code:
http://www.perl.com/pub/a/2006/01/05/parsing.html
http://perlmonks.org/index.pl?node_id=472684
http://search.cpan.org/perldoc/SQL::Statement
http://search.cpan.org/perldoc/SQL::Parser
( SQL::Statement uses SQL::Parser )
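  a hedged sketch of pulling table names out with SQL::Statement (API from
  memory, so double-check against the current docs):
    use SQL::Parser;
    use SQL::Statement;
    my $parser = SQL::Parser->new('ANSI', { RaiseError => 1 });
    my $stmt = SQL::Statement->new(
        'SELECT test_field1 FROM test_table WHERE test_field2 = 7', $parser);
    print $stmt->command, "\n";               # "SELECT"
    print $_->name, "\n" for $stmt->tables;   # tables the query depends on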
-- DBI implementation TODO:
column level depends "DBI|column|$dsn|$tablename|$column_name"
row depends "DBI|row|$dsn|$tablename|$uid_column_name|$uid_value"
NOTE: don't have to do the hard work of figuring out where
the primary key is - let the programmer do that - for two reasons:
- 1: it's nonsensical/unclear to specify a uid_value without
saying where it comes from
- 2: it's hard
how to do row depends:
WORM - it's ok for a write to be expensive. so before exec(), do a
select primary_key_name from table_name where $WHERE_CLAUSE
then update those timestamps.
this implies we need a list of table -> primary key mappings
somewhere (we can try to get fancy and autodetect later;
see the sketch after this section)
separate out prepare()
what about SELECT expressions, etc. that have
side-effects? for now, caveat emptor...
auto-increment is just used at row-create time (not read time).
actually I think we're ok except for triggers...
also, when parsing, we need to watch out for compound statements (aka
stored procedures)
http://dev.mysql.com/doc/refman/5.0/en/stored-procedure-syntax.html
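  hedged sketch of the WORM row-depend flow (%pk_for and touch_depend are
  made-up names; the table/column come from the dbi.t schema above):
    my %pk_for = ( test_table => 'test_field1' );   # table -> primary key
    sub touch_row_depends {
        my ($dbh, $dsn, $table, $where, @binds) = @_;
        my $pk = $pk_for{$table} or die "no primary key mapping for $table";
        my $ids = $dbh->selectcol_arrayref(
            "SELECT $pk FROM $table WHERE $where", undef, @binds);
        touch_depend("DBI|row|$dsn|$table|$pk|$_") for @$ids;
    }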
-- it'd be nice to add a config value for a threshold above which
items are compressed, e.g. 250k
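  hedged sketch ("compress_threshold" is the proposed config key, and
  %Cache::Static::conf is assumed; Compress::Zlib ships memGzip):
    use Compress::Zlib qw(memGzip);
    my $threshold = $Cache::Static::conf{compress_threshold} || 250 * 1024;
    sub maybe_compress {
        my ($data) = @_;
        return $data if length($data) < $threshold;
        return memGzip($data);   # reader can test for the gzip magic bytes
    }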