Data-NDArray-Shared

        $a->path; $a->memfd; $a->sync; $a->unlink;   # or Class->unlink($path)

    "sync" flushes the mapping to its backing store (a no-op for anonymous
    and memfd arrays, which have none); "unlink" removes the backing file
    (also callable as "Class->unlink($path)"); "path" returns the backing
    path ("undef" for anonymous, memfd, or fd-reopened arrays); "memfd"
    returns the backing descriptor -- the memfd of a "new_memfd" array or
    the dup'd fd of a "new_from_fd" array, and -1 for file-backed or
    anonymous arrays.

STATS
    stats() returns a hashref describing the array:

    *   "dtype" -- the dtype name string.

    *   "ndim" -- the number of dimensions.

    *   "size" -- the total element count.

    *   "itemsize" -- bytes per element.

    *   "shape" -- an arrayref of the dimension sizes.

    *   "ops" -- running count of operations that took the write lock (every
        "set", "set_flat", "fill", "zero", "reshape", "add_scalar",
        "mul_scalar", "add", "subtract", "multiply").

    *   "mmap_size" -- bytes of the shared mapping.
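
    The fields above constrain one another, which makes a quick sanity
    check possible. The following is a minimal sketch, not part of the
    module: "check_stats" is a hypothetical helper, and the hashref is a
    hand-written stand-in for "$array->stats" whose "ops" and "mmap_size"
    values are invented for illustration. It assumes only that the mapping
    is at least as large as the data region.

```perl
use strict;
use warnings;
use List::Util qw(product);

# Cross-check the fields of a stats() hashref against each other.
# Returns a list of problems found (empty when everything agrees).
sub check_stats {
    my ($s) = @_;
    my @problems;
    my @shape = @{ $s->{shape} };
    push @problems, "ndim does not match shape length"
        unless $s->{ndim} == @shape;
    push @problems, "size is not the product of shape"
        unless $s->{size} == product(1, @shape);
    push @problems, "mapping is smaller than the data region"
        unless $s->{mmap_size} >= $s->{size} * $s->{itemsize};
    return @problems;
}

# Hand-written stand-in for $array->stats on a 3 x 4 "f64" array.
# The ops and mmap_size values are made up for illustration.
my $stats = {
    dtype => "f64", ndim => 2, size => 12, itemsize => 8,
    shape => [3, 4], ops => 0, mmap_size => 4096,
};
my @problems = check_stats($stats);
print @problems ? join("\n", @problems) . "\n" : "stats are consistent\n";
```

    With the module installed, "check_stats($array->stats)" would take the
    place of the stand-in hashref.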

PDL INTEROP
    If PDL is installed the array converts to and from PDL ndarrays. PDL is
    an optional, load-on-demand dependency -- there is no build- or runtime
    prereq; the four conversion methods ("to_pdl", "from_pdl",
    "update_from_pdl", "as_pdl_alias") "croak" if PDL is missing, while
    "buffer" and "update_from_bytes" have no PDL dependency. Each dtype maps
    to a PDL type of the same byte width ("f64" to "double", "i32" to
    "long", "u64" to "ulonglong", and so on), so the data moves with no
    per-element conversion.

    Axis order: this array is row-major (C-order) while PDL's dim(0) is the
    fastest-varying axis, so the shape is reversed across the boundary -- an
    "($r, $c)" array corresponds to PDL dims "($c, $r)", and
    "$piddle->at($j, $i)" is "$array->get($i, $j)". The conversion methods
    handle this for you.
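
    The index swap can be checked with plain arithmetic. The sketch below
    uses neither this module nor PDL; "row_major_offset" and "pdl_offset"
    are hypothetical helpers that compute where an element sits in the flat
    buffer under each convention, and the loop confirms that "get($i, $j)"
    on an "($r, $c)" array and "at($j, $i)" on dims "($c, $r)" address the
    same slot.

```perl
use strict;
use warnings;

# Flat offset of an element in a row-major array: the last index
# varies fastest.
sub row_major_offset {
    my ($shape, @idx) = @_;
    my $off = 0;
    $off = $off * $shape->[$_] + $idx[$_] for 0 .. $#idx;
    return $off;
}

# Flat offset in PDL's convention: dim(0) varies fastest.
sub pdl_offset {
    my ($dims, @idx) = @_;
    my $off = 0;
    $off = $off * $dims->[$_] + $idx[$_] for reverse 0 .. $#idx;
    return $off;
}

my ($r, $c) = (2, 3);
for my $i (0 .. $r - 1) {
    for my $j (0 .. $c - 1) {
        my $ours = row_major_offset([$r, $c], $i, $j);
        my $pdl  = pdl_offset([$c, $r], $j, $i);
        die "offsets differ" unless $ours == $pdl;
        printf "get(%d, %d) and at(%d, %d) -> offset %d\n",
            $i, $j, $j, $i, $ours;
    }
}
```

    Because both conventions describe the same byte layout, reversing the
    shape is enough and no data has to be rearranged.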

    *   "$piddle = $array->to_pdl"

        A new piddle holding a copy of the data, of the mapped PDL type and
        dims reverse($array->shape). Read under the lock, so it is a
        consistent snapshot.

    *   "$array = Data::NDArray::Shared->from_pdl($piddle, $path)"

        A new shared array copied from $piddle (made physical and contiguous
        first); the dtype and shape follow the piddle's type and "reverse"
        of its dims. $path is the backing file ("undef" or omitted for an
        anonymous mapping).

    *   "$array->update_from_pdl($piddle)"

        Copy $piddle into this array in place (write-locked). The piddle's
        type must match the dtype and its dims must equal
        reverse($array->shape), else it croaks. Returns the array.

    *   "$piddle = $array->as_pdl_alias"

        A piddle that aliases the shared mapping with no copy (a real
        "PDL_DONTTOUCHDATA" ndarray over our memory): an in-place PDL
        operation ("$p .= ...", "$p->inplace->...") writes straight through
        to shared memory -- visible to every process that maps it -- and
        reads see live data. The array is kept alive for as long as the
        piddle.

        This one method needs PDL at build time (it is compiled against
        PDL's C API): if the module was installed without PDL present it
        "croak"s, while the copy methods above keep working through a
        runtime "require PDL". Reinstall with PDL installed to enable it.

        Caveats. The alias bypasses the rwlock: you must coordinate access
        yourself (no other process mutating concurrently), as with any
        unlocked shared-memory view. Do not resize or retype the alias (a
        reshape that grows it, a type conversion) -- it is a fixed window
        onto the mapping; use "to_pdl"/"from_pdl" when you want an
        independent, resizable copy.

    *   "$bytes = $array->buffer"

        The raw contiguous data region as a byte string (read-locked
        snapshot), row-major C-order -- useful on its own for serialization
        or IPC, and the basis for "to_pdl".
        "$array->update_from_bytes($bytes)" is the inverse (write-locked;
        the string must be exactly "size * itemsize" bytes).
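
    A byte string of this kind can be decoded with Perl's own "unpack". The
    sketch below does not call the module: it builds a stand-in for
    "$array->buffer" with "pack", on the assumption (not stated above) that
    "f64" elements are stored as native-endian doubles, and then reads one
    element back by its row-major offset.

```perl
use strict;
use warnings;

my @shape    = (2, 3);
my $itemsize = 8;                                  # bytes per "f64" element
my @values   = (1.5, 2.5, 3.5, 4.5, 5.5, 6.5);     # row-major order

# Stand-in for $array->buffer: native doubles packed back to back.
my $bytes = pack "d*", @values;
die "unexpected length" unless length($bytes) == @values * $itemsize;

# Decode the byte string and read element ($i, $j) by its row-major offset.
my @flat = unpack "d*", $bytes;
my ($i, $j) = (1, 2);
my $elem = $flat[$i * $shape[1] + $j];
print "element ($i, $j) = $elem\n";
```

    The length check mirrors the "size * itemsize" requirement that
    "update_from_bytes" places on its argument.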

    See eg/pdl_interop.pl for a worked example, including a cross-process
    PDL transform on one shared array.
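
    For a quick orientation before opening that file, here is a minimal
    round-trip sketch. It is not taken from eg/pdl_interop.pl; it uses only
    the methods described in this README, and "pdl_roundtrip_demo" is a
    hypothetical wrapper that returns nothing when either module is
    missing.

```perl
use strict;
use warnings;

# Returns nothing when either module is unavailable, so the sketch
# degrades gracefully on a machine without them.
sub pdl_roundtrip_demo {
    return unless eval { require Data::NDArray::Shared; require PDL; 1 };

    my $array = Data::NDArray::Shared->new(undef, "f64", 2, 3);   # anonymous
    $array->set_flat($_, $_) for 0 .. 5;       # 0..5 in row-major order

    my $piddle = $array->to_pdl;               # a copy; dims are reversed
    my @dims   = $piddle->dims;

    # Same element through both APIs: array (row 1, col 2), PDL (2, 1).
    my $same = $array->get(1, 2) == $piddle->at(2, 1);

    # Double every element in PDL, then copy the result back in place.
    $array->update_from_pdl($piddle * 2);

    return { dims => \@dims, same => $same, last => $array->get_flat(5) };
}

if (my $r = pdl_roundtrip_demo()) {
    print "PDL dims: @{ $r->{dims} }\n";
    print "last element after doubling: $r->{last}\n";
} else {
    print "Data::NDArray::Shared or PDL not installed; skipped\n";
}
```

    The copy methods are used here rather than "as_pdl_alias", so the
    sketch does not depend on the module having been built against PDL.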

SHARING ACROSS PROCESSES
    The array lives in a shared mapping, shared the same three ways as the
    rest of the family: a backing file (every process calls "new($path,
    $dtype, @shape)" on the same path), an anonymous mapping inherited
    across "fork", or a memfd whose descriptor is passed to an unrelated
    process (over a UNIX socket via "SCM_RIGHTS", or via "/proc/$pid/fd/$n")
    and reopened with new_from_fd($fd). Because the mapping is shared, every
    process reads and writes the same elements. All mutation is serialized
    by the write lock, so a set of disjoint writers produces a well-defined
    final array regardless of how they interleave.

        # parent and child fill disjoint slices of one shared array
        my $a = Data::NDArray::Shared->new(undef, "f64", 4000);   # before fork
        my $pid = fork // die "fork failed: $!";
        unless ($pid) { $a->set_flat($_, $_) for 0 .. 999; exit }   # child
        $a->set_flat($_, $_) for 1000 .. 1999;                      # parent
        waitpid $pid, 0;
        print $a->get_flat(500), "\n";   # reflects the child's writes

SECURITY
    Every process that maps the array can write to it, so any process able
    to open the backing file (or to obtain the memfd) can alter or corrupt
    the data. Do not share backing files with untrusted processes.
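
    One way to limit who can open a backing file is ordinary file
    permissions. The sketch below is an illustration only, not something
    the module does for you: "restrict_to_owner" is a hypothetical helper,
    and an empty temporary file stands in for a backing file that would
    normally be created by "new($path, ...)".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Restrict a backing file to its owner. Returns the resulting mode bits.
sub restrict_to_owner {
    my ($path) = @_;
    chmod 0600, $path or die "chmod $path: $!";
    return (stat $path)[2] & 07777;
}

# Stand-in for a backing file, placed in a private temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/array.bin";
open my $fh, ">", $path or die "open $path: $!";
close $fh;

my $mode = restrict_to_owner($path);
printf "%s mode is now %04o\n", $path, $mode;
```

    With mode 0600 only processes running as the file's owner can open the
    file, so this suits cooperating processes under a single account.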

CRASH SAFETY
    Mutation is guarded by a futex-based write-preferring rwlock with
    PID-encoded ownership; if a holder dies, the next contender detects the
    dead owner and recovers. Because each mutation updates the data buffer
    (and, for "reshape", a few header words) while holding the lock, a crash
    leaves the array consistent up to the last completed operation.
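    Limitation: PID reuse is not detected. If a dead holder's PID has
    already been reassigned to a live process, the owner still looks alive
    and the stale lock is not recovered (very unlikely in practice).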
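
    The idea of checking whether a recorded owner PID is still alive can be
    illustrated in plain Perl. This is a sketch of the general technique
    and makes no claim about how the module implements it internally;
    "pid_alive" is a hypothetical helper. It also shows why PID reuse is a
    blind spot: the check asks only whether some process has that PID.

```perl
use strict;
use warnings;
use Errno qw(EPERM);
use POSIX ();

# Does a process with this PID currently exist? Signal 0 performs only the
# existence and permission checks and delivers nothing.
sub pid_alive {
    my ($pid) = @_;
    return 1 if kill 0, $pid;
    return $! == EPERM ? 1 : 0;     # exists, but owned by another user
}

my $pid = fork // die "fork failed: $!";
POSIX::_exit(0) unless $pid;        # child: exit immediately
waitpid $pid, 0;                    # parent: reap it so the PID is released

printf "self  (%d) alive: %d\n", $$,   pid_alive($$);
printf "child (%d) alive: %d\n", $pid, pid_alive($pid);
```

    If the kernel had already handed the child's PID to a new process, the
    second check would report it as alive, which is the case the limitation
    above describes.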

SEE ALSO
    Data::Histogram::Shared, Data::RoaringBitmap::Shared,
    Data::DisjointSet::Shared, Data::CountMinSketch::Shared,
    Data::HyperLogLog::Shared, Data::BloomFilter::Shared,
    Data::Intern::Shared, Data::SortedSet::Shared,
    Data::SpatialHash::Shared, and the rest of the "Data::*::Shared" family.

AUTHOR
    vividsnow

LICENSE
    This is free software; you can redistribute it and/or modify it under
    the same terms as Perl itself.


