Data-NDArray-Shared
$a->path; $a->memfd; $a->sync; $a->unlink; # or Class->unlink($path)
"sync" flushes the mapping to its backing store (a no-op for anonymous
and memfd arrays, which have none); "unlink" removes the backing file
(also callable as "Class->unlink($path)"); "path" returns the backing
path ("undef" for anonymous, memfd, or fd-reopened arrays); "memfd"
returns the backing descriptor -- the memfd of a "new_memfd" array or
the dup'd fd of a "new_from_fd" array, and -1 for file-backed or
anonymous arrays.
STATS
stats() returns a hashref describing the array:
* "dtype" -- the dtype name string.
* "ndim" -- the number of dimensions.
* "size" -- the total element count.
* "itemsize" -- bytes per element.
* "shape" -- an arrayref of the dimension sizes.
* "ops" -- running count of operations that took the write lock (every
"set", "set_flat", "fill", "zero", "reshape", "add_scalar",
"mul_scalar", "add", "subtract", "multiply").
* "mmap_size" -- bytes of the shared mapping.
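The fields above are related to each other, which makes a quick sanity check possible. The following is a minimal sketch that uses a hand-written hashref in the documented layout so that it runs without the module; the "ops" and "mmap_size" values and the helper name data_bytes are illustrative assumptions, not real output or part of the API. With the module installed you would pass $array->stats instead.

```perl
use strict;
use warnings;
use List::Util qw(product);

# Hand-written stand-in for $array->stats on a 3x4 "f64" array.
# The "ops" and "mmap_size" values are illustrative, not real output.
my $stats = {
    dtype     => "f64",
    ndim      => 2,
    size      => 12,
    itemsize  => 8,
    shape     => [3, 4],
    ops       => 0,
    mmap_size => 4096,
};

# Hypothetical helper: returns the number of data bytes, or dies if the
# fields disagree with each other.
sub data_bytes {
    my ($s) = @_;
    my @shape = @{ $s->{shape} };
    die "ndim does not match shape\n" unless $s->{ndim} == @shape;
    die "size does not match shape\n" unless $s->{size} == product(@shape);
    return $s->{size} * $s->{itemsize};
}

my $bytes = data_bytes($stats);
printf "%s array, %d data bytes in a %d-byte mapping\n",
    $stats->{dtype}, $bytes, $stats->{mmap_size};
```

"size * itemsize" is the same byte count that "update_from_bytes" expects.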
PDL INTEROP
If PDL is installed, the array converts to and from PDL ndarrays. PDL is
an optional, load-on-demand dependency -- it is not a required build- or
runtime prerequisite; the four conversion methods ("to_pdl", "from_pdl",
"update_from_pdl", "as_pdl_alias") "croak" if PDL is missing, while
"buffer" and "update_from_bytes" have no PDL dependency. Each dtype maps
to a PDL type of the same byte width ("f64" to "double", "i32" to
"long", "u64" to "ulonglong", and so on), so the data moves with no
per-element conversion.
Axis order: this array is row-major (C-order) while PDL's dim(0) is the
fastest-varying axis, so the shape is reversed across the boundary -- an
"($r, $c)" array corresponds to PDL dims "($c, $r)", and
"$piddle->at($j, $i)" is "$array->get($i, $j)". The conversion methods
handle this for you.
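The reason a plain reversal of the shape is enough can be checked with index arithmetic alone. This is a minimal sketch in pure Perl that needs neither the module nor PDL; the two helper names are hypothetical and only model the two addressing conventions described above.

```perl
use strict;
use warnings;

# Flat offset of element ($i, $j) in a row-major ($r, $c) array:
# the last index varies fastest.
sub row_major_offset {
    my ($i, $j, $r, $c) = @_;
    return $i * $c + $j;
}

# Flat offset of element ($x, $y) in a PDL-style array with dims
# ($d0, $d1): the first index varies fastest.
sub first_dim_fastest_offset {
    my ($x, $y, $d0, $d1) = @_;
    return $x + $y * $d0;
}

my ($r, $c) = (2, 3);               # shape of the shared array
my @pdl_dims = reverse($r, $c);     # (3, 2) on the PDL side

for my $i (0 .. $r - 1) {
    for my $j (0 .. $c - 1) {
        my $here  = row_major_offset($i, $j, $r, $c);
        my $there = first_dim_fastest_offset($j, $i, @pdl_dims);
        die "offsets differ at ($i, $j)\n" unless $here == $there;
    }
}
print "get(\$i, \$j) and at(\$j, \$i) address the same element\n";
```

Because both conventions land on the same offset, the bytes never need to be reordered; only the list of dimensions is reversed.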
* "$piddle = $array->to_pdl"
A new piddle holding a copy of the data, of the mapped PDL type and
dims reverse($array->shape). Read under the lock, so it is a
consistent snapshot.
* "$array = Data::NDArray::Shared->from_pdl($piddle, $path)"
A new shared array copied from $piddle (made physical and contiguous
first); the dtype and shape follow the piddle's type and "reverse"
of its dims. $path is the backing file ("undef" or omitted for an
anonymous mapping).
* "$array->update_from_pdl($piddle)"
Copy $piddle into this array in place (write-locked). The piddle's
type must match the dtype and its dims must equal
reverse($array->shape), else it croaks. Returns the array.
* "$piddle = $array->as_pdl_alias"
A piddle that aliases the shared mapping with no copy (a real
"PDL_DONTTOUCHDATA" ndarray over our memory): an in-place PDL
operation ("$p .= ...", "$p->inplace->...") writes straight through
to shared memory -- visible to every process that maps it -- and
reads see live data. The array is kept alive for as long as the
piddle.
This one method needs PDL at build time (it is compiled against
PDL's C API): if the module was installed without PDL present it
"croak"s, while the copy methods above keep working through a
runtime "require PDL". Reinstall with PDL installed to enable it.
Caveats. The alias bypasses the rwlock: you must coordinate access
yourself (no other process mutating concurrently), as with any
unlocked shared-memory view. Do not resize or retype the alias (a
reshape that grows it, a type conversion) -- it is a fixed window
onto the mapping; use "to_pdl"/"from_pdl" when you want an
independent, resizable copy.
* "$bytes = $array->buffer"
The raw contiguous data region as a byte string (read-locked
snapshot), row-major C-order -- useful on its own for serialization
or IPC, and the basis for "to_pdl".
"$array->update_from_bytes($bytes)" is the inverse (write-locked;
the string must be exactly "size * itemsize" bytes).
See eg/pdl_interop.pl for a worked example, including a cross-process
PDL transform on one shared array.
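For readers without PDL, the byte string returned by "buffer" can be decoded with core Perl. The sketch below builds a stand-in byte string with "pack" so that it runs without the module; it assumes an "f64" array of shape (2, 3) and assumes the buffer holds native-endian doubles, which is what a raw shared-memory region would normally contain. With the module installed, $bytes would come from $array->buffer.

```perl
use strict;
use warnings;

# Stand-in for $array->buffer on a 2x3 "f64" array: six native-endian
# doubles in row-major order. pack's "d" is the platform's native double.
my @shape = (2, 3);
my $bytes = pack "d*", 0, 1, 2, 10, 11, 12;

# update_from_bytes requires exactly size * itemsize bytes.
my $itemsize = length pack "d", 0;
my $size     = $shape[0] * $shape[1];
die "wrong byte length\n" unless length($bytes) == $size * $itemsize;

# Decode, then read element (1, 2) with the row-major rule i * cols + j.
my @flat = unpack "d*", $bytes;
my ($i, $j) = (1, 2);
print $flat[$i * $shape[1] + $j], "\n";    # prints 12
```

The same "pack" template, applied to a modified @flat, yields a string of the right length to hand back to "update_from_bytes".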
SHARING ACROSS PROCESSES
The array lives in a shared mapping, shared the same three ways as the
rest of the family: a backing file (every process calls "new($path,
$dtype, @shape)" on the same path), an anonymous mapping inherited
across "fork", or a memfd whose descriptor is passed to an unrelated
process (over a UNIX socket via "SCM_RIGHTS", or via "/proc/$pid/fd/$n")
and reopened with new_from_fd($fd). Because the mapping is shared, every
process reads and writes the same elements. All mutation is serialized
by the write lock, so a set of disjoint writers produces a well-defined
final array regardless of how they interleave.
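# a forked child fills a slice of one shared array; the parent sees the writes
my $a = Data::NDArray::Shared->new(undef, "f64", 4000); # anonymous mapping, created before fork
defined(my $pid = fork) or die "fork failed: $!";
unless ($pid) { $a->set_flat($_, $_) for 0 .. 999; exit } # child
waitpid $pid, 0; # wait for the child to finish
print $a->get_flat(500), "\n"; # reflects the child's writes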
SECURITY
The mmap region is writable by all processes that open it. Do not share
backing files with untrusted processes.
CRASH SAFETY
Mutation is guarded by a futex-based write-preferring rwlock with
PID-encoded ownership; if a holder dies, the next contender detects the
dead owner and recovers. Because each mutation updates the data buffer
(and, for "reshape", a few header words) while holding the lock, a crash
leaves the array consistent up to the last completed operation.
Limitation: PID reuse is not detected (very unlikely in practice).
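To illustrate why PID reuse is a blind spot, here is a minimal sketch of a PID liveness probe in core Perl. How the module itself detects a dead owner is not stated here, so treating it as a signal-0 probe is an assumption; pid_alive is a hypothetical helper, not part of the API.

```perl
use strict;
use warnings;
use Errno qw(ESRCH);

# True if a process with this PID currently exists. Signal 0 delivers
# nothing; it only performs the existence and permission checks.
sub pid_alive {
    my ($pid) = @_;
    return 1 if kill 0, $pid;
    return $! == ESRCH ? 0 : 1;    # other errors: it exists, but is not ours to signal
}

defined(my $child = fork) or die "fork failed: $!";
exit 0 unless $child;              # child exits immediately
waitpid $child, 0;                 # reap it so the PID is really gone

print "self:  ", pid_alive($$),     "\n";
print "child: ", pid_alive($child), "\n";
```

A probe of this kind only answers "does some process have this PID"; it cannot tell the original lock holder from an unrelated process that was later given the same number.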
SEE ALSO
Data::Histogram::Shared, Data::RoaringBitmap::Shared,
Data::DisjointSet::Shared, Data::CountMinSketch::Shared,
Data::HyperLogLog::Shared, Data::BloomFilter::Shared,
Data::Intern::Shared, Data::SortedSet::Shared,
Data::SpatialHash::Shared, and the rest of the "Data::*::Shared" family.
AUTHOR
vividsnow
LICENSE
This is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.