Algorithm-SpatialIndex-Strategy-MedianQuadTree

 view release on metacpan or  search on metacpan

README  view on Meta::CPAN

NAME
    Algorithm::SpatialIndex::Strategy::MedianQuadTree - QuadTree splitting
    on bucket medians

SYNOPSIS
      use Algorithm::SpatialIndex;
      my $idx = Algorithm::SpatialIndex->new(
        strategy => 'MedianQuadTree',
      );

DESCRIPTION
    A modified quad tree implementation that I'll call Median Quad Tree
    (MQT) in this document. (Not sure if this data structure has a different
    name elsewhere.) See "ALGORITHM" below.

    For a description of the public interface, see Algorithm::SpatialIndex.
    This spatial index strategy uses the default
    "Algorithm::SpatialIndex::Strategy::QuadTree" as a base class and gets
    away with containing very little code because of that. This also means
    that most of the interface and behaviour is inherited.

ALGORITHM
    For a basic discussion of quad trees, take a look at the documentation
    of the Algorithm::QuadTree module or look it up on Wikipedia. The
    following describes how the MQT differs from a normal quad tree and some
    implementation details.

    Any x/y coordinate pair can be used to divide a rectangular area into
    four sub-rectangles. When splitting up a node of an ordinary quad tree,
    the center of the quad tree is chosen to split the node into four
    sub-nodes. This can be done either when the tree is created (before it
    is populated) with a static depth of the tree, or dynamically whenever
    the number of items associated with a node becomes too large.

    For the MQT, the point which splits a given node into four is chosen to
    be the median of all contained item coordinates in each dimension.

    This has several consequences:

    * Due to the dynamic nature of the coordinate choice when splitting a
      node in four, the MQT cannot be of a fixed depth and preallocated. It
      needs to grow dynamically as it is filled.

    * When the data in the tree is a reasonably general sample of the
      underlying distribution of data, then this algorithm will create a
      tree of evenly filled buckets, but not necessarily a well balanced
      tree. To obtain a reasonable sample of the underlying data
      distribution, it is prudent to insert items in random order.

    * Due to this behaviour, the tree will create very small nodes/bins
      where the most data is concentrated and bins of gradually increasing
      size as one moves away from the highest concentrations. A normal quad
      tree will have a similar behaviour, but due to the fixed size of the
      bins, will "converge" much more slowly.

    * If the data is inserted in order of the item coordinates, an ordinary
      quad tree is a more efficient. The MQT will make bad choices as the
      median of a contiguous subset of the ordered data will not reflect the
      overall distribution and the property of having fairly evenly filled
      buckets vanishes.

    * It is likely that polling a spatial index using an MQT is faster than
      an ordinary quad tree if the distribution of data is very different
      from uniformity. If in doubt, benchmark.

    * If the data is uniform but inserted in random order, the MQT will at
      best be equal in performance to a quad tree.

    * Filling a dynamically growing MQT has slightly more overhead than
      filling a dynamically growing quad tree due to the median calculation,
      but it is neither algorithmically slower nor necessarily slower in
      practice. Algorithmically, splitting a quad tree node will be O(n) in
      the bucket size. Splitting an MQT node will be O(n*log(n)) if a naive
      median calculation is used or also O(n) with a linear-time median
      calculation (like this implementation).



( run in 1.978 second using v1.01-cache-2.11-cpan-13bb782fe5a )