Algorithm-SpatialIndex-Strategy-MedianQuadTree
view release on metacpan or search on metacpan
NAME
Algorithm::SpatialIndex::Strategy::MedianQuadTree - QuadTree splitting
on bucket medians
SYNOPSIS
use Algorithm::SpatialIndex;
my $idx = Algorithm::SpatialIndex->new(
strategy => 'MedianQuadTree',
);
DESCRIPTION
A modified quad tree implementation that I'll call Median Quad Tree
(MQT) in this document. (Not sure if this data structure has a different
name elsewhere.) See "ALGORITHM" below.
For a description of the public interface, see Algorithm::SpatialIndex.
This spatial index strategy uses the default
"Algorithm::SpatialIndex::Strategy::QuadTree" as a base class and gets
away with containing very little code because of that. This also means
that most of the interface and behaviour is inherited.
ALGORITHM
For a basic discussion of quad trees, take a look at the documentation
of the Algorithm::QuadTree module or look it up on Wikipedia. The
following describes how the MQT differs from a normal quad tree and some
implementation details.
Any x/y coordinate pair can be used to divide a rectangular area into
four sub-rectangles. When splitting up a node of an ordinary quad tree,
the center of the quad tree is chosen to split the node into four
sub-nodes. This can be done either when the tree is created (before it
is populated) with a static depth of the tree, or dynamically whenever
the number of items associated with a node becomes too large.
For the MQT, the point which splits a given node into four is chosen to
be the median of all contained item coordinates in each dimension.
This has several consequences:
* Due to the dynamic nature of the coordinate choice when splitting a
node in four, the MQT cannot be of a fixed depth and preallocated. It
needs to grow dynamically as it is filled.
* When the data in the tree is a reasonably general sample of the
underlying distribution of data, then this algorithm will create a
tree of evenly filled buckets, but not necessarily a well balanced
tree. To obtain a reasonable sample of the underlying data
distribution, it is prudent to insert items in random order.
* Due to this behaviour, the tree will create very small nodes/bins
where the most data is concentrated and bins of gradually increasing
size as one moves away from the highest concentrations. A normal quad
tree will have a similar behaviour, but due to the fixed size of the
bins, will "converge" much more slowly.
* If the data is inserted in order of the item coordinates, an ordinary
quad tree is a more efficient. The MQT will make bad choices as the
median of a contiguous subset of the ordered data will not reflect the
overall distribution and the property of having fairly evenly filled
buckets vanishes.
* It is likely that polling a spatial index using an MQT is faster than
an ordinary quad tree if the distribution of data is very different
from uniformity. If in doubt, benchmark.
* If the data is uniform but inserted in random order, the MQT will at
best be equal in performance to a quad tree.
* Filling a dynamically growing MQT has slightly more overhead than
filling a dynamically growing quad tree due to the median calculation,
but it is neither algorithmically slower nor necessarily slower in
practice. Algorithmically, splitting a quad tree node will be O(n) in
the bucket size. Splitting an MQT node will be O(n*log(n)) if a naive
median calculation is used or also O(n) with a linear-time median
calculation (like this implementation).
( run in 1.978 second using v1.01-cache-2.11-cpan-13bb782fe5a )