Cache-Static
view release on metacpan or search on metacpan
- if there is no config file, do we keep trying to load it in?
-- API should not force you to call most of the it's making you call...
new API:
init
make_key (optional)
get_if_same
set
in gis/set:
$key = make_key($key) unless($key isa Cache::Static::Key);
you should get the same results regardless of if you called make_key,
but calling make_key and saving the results saves you one MD5 lookup
(think about memoization tho)
ALSO - it'd be nice to have the deps encoded in the key object too (?)
THINK: do we want to expose is_same?
every other function should be underscored!!!
-- get a better solution for permission stuff.
making everything 777/666 is not right.
plus all the extra chmod calls (which are slow and a race condition)
is there umask in perl? YES, see perldoc -f umask
-- lock around set (EXCL) / get (LOCK_SH)
actually, need to lock around basically everything in HMU::cache_it...
but this should not be hacked in, should be an API level change...
get_if_same_or_lock() ... set() ?
probably do this:
get_if_same():
take an optional lock argument (default to what conf says)
set():
if there is a lock, clear it.
problem:
we need to pass a filehandle back to inherit the lock.
or we could save it globally somewheres?
pseudo code:
get_if_same:
if(!defined(is_same(...))) {
if(my $FH = open_with_flock(LOCK_EX, block => 0)) {
$cache::static::_curr_locked_FH = $FH;
} else {
#wait until it's been set, then return it
open_with_flock(LOCK_SH, block => 1);
return get(...);
}
} else {
return get(...);
}
set:
my $FH = $cache::static::_curr_locked_FH || open(...);
...
close($FH);
an optimization idea stolen from the lighthttpd folks:
(http://www.lighttpd.net/documentation/performance.html)
stat() cache
A stat(2) can be expensive; caching it saves time and context switches.
Instead of using stat() every time to check for the existence of a file
you can stat() it once and monitor the directory the file is in for
modifications. As long as the directory doesn't change, the files in it
must all still be the same.
With the help of FAM or gamin you can use kernel events to assure that
your stat cache is up to date.
server.stat-cache-engine = "fam" # either fam, simple or disabled
granted, they are talking about the context of httpd serving static
pages, but it could still cut down on our number of stat calls by a
large factor in the common case...
next up:
in the case where you allow old results:
before writing, copy the cache file "$f" to "$f.bak"
any get_if_same() during that time reads "$f.bak"
when done, remove "$f.bak" (this results in race condition in reader:
1 - find the file locked, try the .bak file
2 - .bak file is removed, goto 1
OR have a crontab that runs e.g. hourly to clean up .bak files...
start the lock when generation starts, remove the lock when new value
has been set
situation:
refresh of key K takes 30 seconds.
every 5 seconds, we get a hit.
this means we regenerate 6 times, when we only needed to do it once.
this is slower by half on average and takes 6* the resources.
solution:
if we have fcntl:
exclusive lock while writing to the file
all attempts to read will block on the initial thread.
that's all.
-- TODO: think - when is the right time to reload the config files?
right now we keep them loaded forever... which is ok, but there
should be some signal...
-- dbi.t:
exec this shit first:
drop database scache_test_db;
create database scache_test_db;
use scache_test_db;
create table test_table ( test_field1 TINYINT, test_field2 TINYINT );
-- think about using a non-standard depend to parse the SQL code:
http://www.perl.com/pub/a/2006/01/05/parsing.html
http://perlmonks.org/index.pl?node_id=472684
http://search.cpan.org/perldoc/SQL::Statement
http://search.cpan.org/perldoc/SQL::Parser
( SQL::Statement uses SQL::Parser )
-- DBI implementation TODO:
column level depends "DBI|column|$dsn|$tablename|$columname"
row depends "DBI|row|$dsn|$tablename|$uid_column_name|$uid_value"
NOTE: don't have to do the hard work of figuring out where
the primary key is - let the programmer do that - for two reasons:
- 1: it's non-sensical/unclear to specify a uid_value without
saying where it comes from
actually i think we're ok except for triggers...
also parsing, we need to watch out for compound statements (aka
procedures)
http://dev.mysql.com/doc/refman/5.0/en/stored-procedure-syntax.html
-- it'd be nice to add a config value for a threshold above which
items are compressed, e.g. 250k
-- writing .dep file should be optional, controlled by a config time
option
-- implement a similar store which dumps friendly key name & conf time
option for it
-- HTML_Mason_Util:
- cache component compile stuff for non-top-level depends
find the list of nodes that have changed.
save the tree and just update the children of those nodes
this requires saving a tree structure, not merely an array, to disk
- think about $m->comp() and $m->scomp() (currently unrecognized)
- configurability:
unrecognized_html_mason_dependency_returns => 0
(inherit from unrecognized_dependency_returns)
html_mason_recurse_levels => -1 (unlimited)
(disable with 0, or specify number of times to recurse)
- think about components that have content
... it'd be nice to also depend on the relevant autohandlers too ...
-- cleanup:
- use a TEST namespace for t/*.t in TEST.pm
- _DBI|... vs. DBI|... specs
is there a similar problem with XML_Comma vs. _XML_Comma ?
- permission handling on directories - making them all 777 is probably
not what we want...
-- test suite
finish tests:
test namespace overrides and init(), rebase(), etc. behavior
dbi tests should try to create db/table if they don't exist...
? dbi tests should look for a config file (from previous install) ?
html-mason-util.t (use HTML::Mason::Tests, hmc)
xml-comma-util.t
misc.t (_log... makekey, md5path, etc... misc from Static.pm)
-- read old/write asynchronous option
-- cleanup:
extension/wrapper thing could be cleaner
change way extensions work -> if possible, translate to a file
dependency, instead of checking the dependency in the extension.
this allows for scache to work on extensions
-- documentation:
null $deps in set()... two usage modes, one where we care,
one where we don't. be more explicit about this...
this is not even the tip of the iceberg...
---- STUFF FOR LATER ----
-- the one second wait bug:
you have to wait one second or else you will always regenerate
this is a limitation of stat() and/or what's stored in most filesystems
solution:
- if $have_fcntl, lock writes to lessen dependency expiration bursts
(only regen once)
since we have a WORM model, writes can be expensive. therefore,
- when you call set, also save a file called $cachefile.microseconds
- if timestamps are equal, THEN read $cachefile.microseconds.
use this to determine if we need a refresh or not
to do this we require Time::HiRes, but that's ok b/c this whole
functionality will be optional
-- binary dependency lists, eg.
( "file|/die/young" AND "file|/leave/a/beautiful/corpse" )
OR ( "file|/get/old" AND "file|/get/fat" AND "file|/get/ugly" )
-- tools for manually purging groups of stuff from the cache...
e.g.
/items/200512140010?doc=mm_item|post|200512140010&offset=&per_page=20
a/0/Q/0lMqP_p04cdaaBZK2Dw
This will require having a list of MD5->orig names somewhere.
-- cache expiration/replacement policies:
we want to cache "forever" - but in reality, pages will go away, etc.
we should have a default timeout of say a week...
cronjob cleanups?
-- perhaps make _Cache_Static_hmc a restricted namespace but allow
writes to it with !_Cache_Static_hmc ?
-- XML_Comma_Util:
it'd be nice to have an automagic depend on the def file
... but gettihg thru comma is too slow (0.02 sec on p4, 3GHz)...
this would require a caching scheme...
0) check in a hash in memory
1) check in /usr/local/Cache-Static/ext/XML_Comma/$defname
(whose contents will be just a .def file)
2) do the expensive lookup
( run in 0.811 second using v1.01-cache-2.11-cpan-39bf76dae61 )