Data-HashMap-Shared
live reader is concurrently present, the dead slot is left intact for
the next recovery cycle (preserves the only record of the stuck
counter). Beyond 1024 simultaneous handles per map, new handles skip
slot tracking and fall back to the slow per-timeout drain.
The same path validates and rebuilds the LRU doubly-linked list if a
dead writer left it inconsistent. "stat_recoveries" in "stats" counts
every recovery event.
Recovery uses "kill($pid, 0)" for liveness, which cannot distinguish a
reused PID from the original. Hitting a false "alive" requires a process
to die in the brief window it holds a read lock and the kernel to cycle
through the entire PID space back to that exact number within the
~2-second recovery window and hand it to a long-lived process, i.e. a
runaway fork storm. Even then the effect is bounded: writers stall until
the recycled process exits; reads are unaffected and no data is
corrupted. Writer-crash recovery is immune (the writer PID lives in the
lock word and is reclaimed independently of the slot table).
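
The liveness probe described above can be sketched roughly as follows. This is a minimal illustration assuming POSIX kill(2); the name pid_looks_alive is hypothetical and is not part of the module's API. Only ESRCH is treated as proof of death, so a PID that has been recycled to an unrelated process still reports "alive".

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Returns 0 only when the kernel reports ESRCH (no such process).
 * EPERM means a process with that PID exists but we may not signal it,
 * so it still counts as alive. A recycled PID cannot be told apart
 * from the original owner by this check. */
static int pid_looks_alive(pid_t pid) {
    if (pid <= 0) return 1;            /* no owner recorded: be conservative */
    if (kill(pid, 0) == 0) return 1;   /* signal 0 performs the check, sends nothing */
    return errno != ESRCH;
}
```

Signal 0 performs only the existence and permission checks, which is why the probe is cheap enough to run from a lock-timeout path.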
Limitation: PID-based recovery assumes all processes share the same PID
namespace. Cross-container sharing (different PID namespaces) is not
print "permanent=$pv ttl_remaining=$pr\n";
}
print "\nSleeping 3 seconds...\n";
sleep 3;
my $v = shm_si_get $map, "counter";
printf "counter=%s (expired: %s)\n", $v // 'undef', defined $v ? 'no' : 'yes';
my $p = shm_si_get $map, "permanent";
printf "permanent=%s (still alive)\n", $p;
$map->unlink;
lib/Data/HashMap/Shared.pm
If a live reader is concurrently present, the dead slot is left intact
for the next recovery cycle (preserves the only record of the stuck
counter). Beyond 1024 simultaneous handles per map, new handles skip
slot tracking and fall back to the slow per-timeout drain.
The same path validates and rebuilds the LRU doubly-linked list if a
dead writer left it inconsistent. C<stat_recoveries> in C<stats> counts
every recovery event.
Recovery uses C<kill($pid, 0)> for liveness, which cannot distinguish a
reused PID from the original. Hitting a false "alive" requires a process to
die in the brief window it holds a read lock B<and> the kernel to cycle
through the entire PID space back to that exact number within the ~2-second
recovery window B<and> hand it to a long-lived process, i.e. a runaway fork
storm. Even then the effect is bounded: writers stall until the recycled
process exits; reads are unaffected and no data is corrupted. Writer-crash
recovery is immune (the writer PID lives in the lock word and is reclaimed
independently of the slot table).
B<Limitation>: PID-based recovery assumes all processes share the same
PID namespace. Cross-container sharing (different PID namespaces) is not
shm_generic.h
#else
__asm__ volatile("" ::: "memory");
#endif
}
/* Extract writer PID from rwlock value (lower 31 bits when write-locked). */
#define SHM_RWLOCK_WRITER_BIT 0x80000000U
#define SHM_RWLOCK_PID_MASK 0x7FFFFFFFU
#define SHM_RWLOCK_WR(pid) (SHM_RWLOCK_WRITER_BIT | ((uint32_t)(pid) & SHM_RWLOCK_PID_MASK))
/* Check if a PID is alive. Returns 1 if alive or unknown, 0 if definitely dead. */
/* Liveness via kill(pid,0). NOTE: cannot detect PID reuse; if a dead
* lock-holder's PID is recycled to an unrelated live process before recovery
* runs, this reports "alive" and that slot's orphaned contribution is not
* reclaimed until the recycled process exits. Robust detection would require
* a per-slot process-start-time epoch (a header-layout/SHM_VERSION change).
* Documented under "Crash Safety" in the POD. */
static inline int shm_pid_alive(uint32_t pid) {
if (pid == 0) return 1; /* no owner recorded, assume alive */
return !(kill((pid_t)pid, 0) == -1 && errno == ESRCH);
}
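
The lock-word layout defined by the macros above can be exercised with a small sketch. The three macros are copied from this excerpt; the helpers lock_is_write_locked and lock_writer_pid are hypothetical names introduced only for illustration. Treating values below the writer bit as "not write-locked" is an assumption drawn from the recovery code's `cur < SHM_RWLOCK_WRITER_BIT` test.

```c
#include <stdint.h>

#define SHM_RWLOCK_WRITER_BIT 0x80000000U
#define SHM_RWLOCK_PID_MASK   0x7FFFFFFFU
#define SHM_RWLOCK_WR(pid) (SHM_RWLOCK_WRITER_BIT | ((uint32_t)(pid) & SHM_RWLOCK_PID_MASK))

/* Hypothetical helper: the top bit marks a write lock, so any value at or
 * above WRITER_BIT means a writer holds the lock. */
static inline int lock_is_write_locked(uint32_t val) {
    return val >= SHM_RWLOCK_WRITER_BIT;
}

/* Hypothetical helper: the lower 31 bits carry the writer's PID.
 * Only meaningful when lock_is_write_locked(val) is true. */
static inline uint32_t lock_writer_pid(uint32_t val) {
    return val & SHM_RWLOCK_PID_MASK;
}
```

For example, SHM_RWLOCK_WR(1234) yields 0x800004D2, and masking it with SHM_RWLOCK_PID_MASK recovers 1234, which is the value a recovering process would pass to the liveness check.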
/* Forward declaration; defined later in the LRU helpers section. */
static void shm_lru_rebuild_if_corrupt(ShmHandle *h);
/* Force-recover a stale write lock left by a dead process.
* CAS to OUR pid to hold the lock while fixing seqlock, then release.
* Using our pid (not a bare WRITER_BIT sentinel) means a subsequent
* recovering process can detect and re-recover if we crash mid-recovery. */
shm_generic.h
* future claimants and drain any orphan parked-waiter counters. Slots
* with dead pid and sc > 0 are left intact in this pass: if force-
* reset cannot fire (because a live reader is concurrently present),
* wiping the dead slot would lose the only record of its orphan
* rwlock contribution and strand writers permanently once the live
* reader releases. */
for (uint32_t i = 0; i < SHM_READER_SLOTS; i++) {
uint32_t pid = __atomic_load_n(&h->reader_slots[i].pid, __ATOMIC_ACQUIRE);
if (pid == 0) continue;
uint32_t sc = __atomic_load_n(&h->reader_slots[i].subcount, __ATOMIC_RELAXED);
if (shm_pid_alive(pid)) {
if (sc > 0) any_live_reader = 1;
continue;
}
if (sc > 0) { found_dead_reader = 1; continue; }
if (shm_drain_dead_slot(h, i, pid)) any_recovery = 1;
}
/* Pass 2: only if force-reset will fire. Issue the rwlock force-
* reset CAS FIRST, while the window since pass 1's last scan is
* still narrow (a handful of instructions, as in the original
shm_generic.h
if (cur > 0 && cur < SHM_RWLOCK_WRITER_BIT) {
if (__atomic_compare_exchange_n(&hdr->rwlock, &cur, 0,
0, __ATOMIC_RELEASE, __ATOMIC_RELAXED)) {
any_recovery = 1;
if (__atomic_load_n(&hdr->rwlock_waiters, __ATOMIC_RELAXED) > 0)
syscall(SYS_futex, &hdr->rwlock, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
}
for (uint32_t i = 0; i < SHM_READER_SLOTS; i++) {
uint32_t pid = __atomic_load_n(&h->reader_slots[i].pid, __ATOMIC_ACQUIRE);
if (pid == 0 || shm_pid_alive(pid)) continue;
if (shm_drain_dead_slot(h, i, pid)) any_recovery = 1;
}
}
if (any_recovery)
__atomic_add_fetch(&hdr->stat_recoveries, 1, __ATOMIC_RELAXED);
}
/* Inspect the lock word after a futex-wait timeout. If a dead writer
* holds it, force-recover the lock (which also rebuilds the LRU list
* if it was left half-linked, all under the recovered write lock).
* Otherwise drain dead readers' shares of the rwlock/waiter counters.
* Called from rdlock and wrlock ETIMEDOUT branches; identical recovery
* logic in both. */
static inline void shm_recover_after_timeout(ShmHandle *h) {
ShmHeader *hdr = h->hdr;
uint32_t val = __atomic_load_n(&hdr->rwlock, __ATOMIC_RELAXED);
if (val >= SHM_RWLOCK_WRITER_BIT) {
uint32_t pid = val & SHM_RWLOCK_PID_MASK;
if (!shm_pid_alive(pid))
shm_recover_stale_lock(h, val);
} else {
shm_recover_dead_readers(h);
}
}
/* Park/unpark helpers: bump the global waiter counters together with this
* process's mirrored slot counters so a wrlock-timeout recovery scan can
* attribute and reverse a dead PID's contribution. Kept paired to make
* accidental drift between global and per-slot counts impossible. */
shm_generic.h
__atomic_sub_fetch(&h->reader_slots[h->my_slot_idx].writers_parked, 1, __ATOMIC_RELAXED);
}
}
static inline void shm_rwlock_rdlock(ShmHandle *h) {
shm_claim_reader_slot(h);
ShmHeader *hdr = h->hdr;
uint32_t *lock = &hdr->rwlock;
uint32_t *writers_waiting = &hdr->rwlock_writers_waiting;
/* Claim subcount BEFORE bumping the shared rwlock counter. This way
* a concurrent writer-side recovery scan that sees our PID alive with
* subcount > 0 will (correctly) defer force-reset, even while we are
* still spinning trying to win the rwlock CAS. Without this, a reader
* killed between rwlock CAS-success and subcount++ would let recovery
* force-reset rwlock to 0 underneath us, causing a UINT32_MAX wrap on
* our eventual rdunlock dec. */
if (h->my_slot_idx != UINT32_MAX)
__atomic_add_fetch(&h->reader_slots[h->my_slot_idx].subcount, 1, __ATOMIC_RELAXED);
for (int spin = 0; ; spin++) {
uint32_t cur = __atomic_load_n(lock, __ATOMIC_RELAXED);
/* Write-preferring: when lock is free (cur==0) and writers are
shm_generic.h
if (__builtin_expect((s & 1) == 0, 1)) return s;
if (__builtin_expect(spin < 100000, 1)) {
shm_rwlock_spin_pause();
spin++;
continue;
}
/* Prolonged odd seq: check for dead writer */
uint32_t val = __atomic_load_n(&hdr->rwlock, __ATOMIC_RELAXED);
if (val >= SHM_RWLOCK_WRITER_BIT) {
uint32_t pid = val & SHM_RWLOCK_PID_MASK;
if (!shm_pid_alive(pid)) {
shm_recover_stale_lock(h, val);
spin = 0;
continue;
}
}
/* Writer is alive, yield CPU */
struct timespec ts = {0, 1000000}; /* 1ms */
nanosleep(&ts, NULL);
spin = 0;
}
}
static inline int shm_seqlock_read_retry(uint32_t *seq, uint32_t start) {
__atomic_thread_fence(__ATOMIC_ACQUIRE); /* ensure data loads complete before retry check */
return __atomic_load_n(seq, __ATOMIC_RELAXED) != start;
}
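
The begin/retry pair above follows the usual seqlock reader protocol, which can be sketched in isolation. Everything named demo_* below is hypothetical and not taken from shm_generic.h; the sketch assumes a single writer and omits the dead-writer check and the 1 ms back-off that the real read-begin path performs after prolonged spinning.

```c
#include <stdint.h>

/* Hypothetical two-field record guarded by a sequence counter. */
typedef struct {
    uint32_t seq;   /* even: stable, odd: write in progress */
    uint64_t a;
    uint64_t b;
} demo_t;

/* Wait until the sequence is even, then return it as the snapshot id. */
static uint32_t demo_read_begin(uint32_t *seq) {
    uint32_t s;
    while ((s = __atomic_load_n(seq, __ATOMIC_ACQUIRE)) & 1) {
        /* writer active: keep polling (the real code pauses and backs off) */
    }
    return s;
}

/* Nonzero if a writer ran since demo_read_begin, so the copy must be redone. */
static int demo_read_retry(uint32_t *seq, uint32_t start) {
    __atomic_thread_fence(__ATOMIC_ACQUIRE); /* data loads complete before the check */
    return __atomic_load_n(seq, __ATOMIC_RELAXED) != start;
}

/* Single-writer update: make seq odd, write the data, make seq even again. */
static void demo_write(demo_t *d, uint64_t a, uint64_t b) {
    __atomic_add_fetch(&d->seq, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_RELEASE);
    d->a = a;
    d->b = b;
    __atomic_add_fetch(&d->seq, 1, __ATOMIC_RELEASE);
}

/* Reader: copy both fields, retrying until the copy is consistent. */
static uint64_t demo_read_sum(demo_t *d) {
    uint64_t a, b;
    uint32_t s;
    do {
        s = demo_read_begin(&d->seq);
        a = d->a;
        b = d->b;
    } while (demo_read_retry(&d->seq, s));
    return a + b;
}
```

A writer that dies between the two increments leaves the counter odd forever, which is the situation the "prolonged odd seq" branch above detects and repairs.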
xt/edge-cases.t
}
# unlink: instance method
{
my $path = tmpfile();
my $map = Data::HashMap::Shared::II->new($path, 100);
shm_ii_put $map, 1, 42;
ok(-f $path, 'backing file exists');
ok($map->unlink, 'instance unlink returns true');
ok(!-f $path, 'backing file removed after unlink');
# map still works (mmap stays alive after unlink)
my $v = shm_ii_get $map, 1;
is($v, 42, 'map still readable after unlink');
}
# unlink: class method
{
my $path = tmpfile();
my $map = Data::HashMap::Shared::II->new($path, 100);
undef $map;
ok(-f $path, 'backing file exists before class unlink');
xt/lru-ttl.t
shm_ii_put $map, 1, 10;
shm_ii_put_ttl $map, 2, 20, 100; # long TTL
sleep 4;
my @k = shm_ii_keys $map;
# key 1 should be expired during iteration, key 2 should survive
# Note: keys iteration may or may not lazily expire
# But get should definitely expire
ok(!defined(shm_ii_get $map, 1), 'key 1 expired');
is(shm_ii_get $map, 2, 20, 'key 2 still alive');
unlink $path;
}
# clear resets LRU state
{
my $path = tmpfile();
my $map = Data::HashMap::Shared::II->new($path, 1000, 3);
shm_ii_put $map, 1, 10;
xt/stale_recovery_crash.t
use warnings;
use Test::More;
use POSIX qw(_exit);
use Time::HiRes qw(time);
use Data::HashMap::Shared::II;
# Regression (Pass 14): if the process recovering a stale lock itself
# crashes mid-recovery, the lock must remain recoverable.
# Pre-fix: shm_recover_stale_lock held lock as bare WRITER_BIT (PID=0),
# which shm_pid_alive treated as always alive, causing permanent hang.
# This regression is hard to trigger deterministically; it requires a
# crash in the ~5 instruction window between CAS and seqlock-fix + release.
# Best we can do in a portable test: verify that basic operations succeed
# after a forced SIGKILL during writes, which is the common trigger path.
use File::Temp qw(tmpnam);
my $path = tmpnam() . ".$$";
my $m = Data::HashMap::Shared::II->new($path, 1024);