Geo-Index
view release on metacpan or search on metacpan
lib/Geo/Index.pm view on Meta::CPAN
eval { DynaLoader::bootstrap Geo::Index $VERSION; };
#. Note whether C low-level code library is available
my $C_CODE_COMPILED;
if ($@) {
$C_CODE_COMPILED = 0;
} else {
$C_CODE_COMPILED = 1;
}
#. Choose which C function to export
@Geo::Index::EXPORT = qw(); # Symbols to export by default
#@Geo::Index::EXPORT_OK = qw(
# GetCCodeVersion fast_log2_double fast_log2_float ComputeAreaExtrema_float
# ComputeAreaExtrema_double ComputeAreaExtrema_double SetUpDistance_float HaversineDistance_float
# SetUpDistance_double SetUpDistance_double HaversineDistance_double
# );
@Geo::Index::EXPORT_OK = qw(); # Symbols to export on request
=head1 SYNOPSIS
# Create and populate a geographic index
use Geo::Index;
@points = (
{ lat => 1.0, lon => 2.0 },
{ lat => -90.0, lon => 0.0, name => 'South Pole' },
{ lat => 30.0, lon => -20.0, ele => 123.4 }
);
$point = { lat=>10.0, lon=>20.0 };
$index = Geo::Index->new();
$index->IndexPoints( \@points );
$index->Index( $point );
$index->Index( [ 30, 40 ] );
# Search index
%search_options = ( sort_results => 1, radius=>5_000_000 );
$results = $index->Search( [ -80, 20 ], \%search_options );
print "$$results[0]{name}\n"; # Prints 'South Pole'
# Get all points in the southern hemisphere
$results = $index->SearchByBounds( [ -180, -90, 180, 0 ] );
print "$$results[0]{name}\n"; # Also prints 'South Pole'
($closest) = $index->Closest( [ -80, 20 ] );
print "$$closest{name}\n"; # Also prints 'South Pole'
($closest) = $index->Closest( $points[1], { post_condition=>'NONE' } );
print "$$closest{name}\n"; # Also prints 'South Pole'
($farthest) = $index->Farthest( [ 90, 0 ] );
print "$$farthest{name}\n"; # Also prints 'South Pole'
# Compute distance in meters between two points (using haversine formula)
$m = $index->Distance( { lat=>51.507222, lon=>-0.1275 }, [ -6.2, 106.816667 ] );
printf("London to Jakarta: %i km\n", $m / 1000);
$index->DistanceFrom( [ 90, 0 ] );
$m = $index->DistanceTo( $points[1] );
printf("Pole to pole: %i km\n", $m / 1000);
=head1 DESCRIPTION
Geo::Index is a Perl module for creating in-memory geographic points indices.
Once points have been indexed, fast searches can be run.
Efficient searches methods include B<C<L<Search(...)|/Search( ... )>>> to get all points
within a distance from a given point, B<C<L<SearchByBounds(...)|/SearchByBounds( ... )>>> to get all
points in an given area, B<C<L<Closest(...)|/Closest( ... )>>> to get the closest points to a
given point, and B<C<L<Farthest(...)|/Farthest( ... )>>> to get the farthest points from a given
point.
Additional methods are provided to compute distances between arbitrary points
(E<nbsp>B<C<L<Distance(...)|/Distance( ... )>>>, B<C<L<DistanceFrom(...)|/DistanceFrom( ... )>>>, and B<C<L<DistanceTo(...)|/DistanceTo( ... )>>>E<nbsp>)
and to get the size in meters of one degree or the size in degrees of one meter
at a given point (B<C<L<OneDegreeInMeters(...)|/OneDegreeInMeters( ... )>>> and B<C<L<OneMeterInDegrees(...)|/OneMeterInDegrees( ... )>>>,
respectively).
While by default computations are done for the Earth, other bodies can be used
by supplying appropriates radii and circumferences to B<C<L<new(...)|/Geo::Index-E<gt>new( ... )>>>.
=head1 POINTS
Geo::Index works with points on a spherical body. Points are hash references
containing, at a minimum, C<lat> and C<lon> entries which give the point's
position in degrees. Additional hash entries can be present and will be both
ignored and preserved. The C<L<Index(...)|/Index( ... )>>, C<L<IndexPoints(...)|/IndexPoints( ... )>>,
C<L<Search(...)|/Search( ... )>>, C<L<Closest(...)|/Closest( ... )>>, C<L<Farthest(...)|/Farthest( ... )>>,
C<L<Distance(...)|/Distance( ... )>>, C<L<DistanceFrom(...)|/DistanceFrom( ... )>>, and C<L<DistanceTo(...)|/DistanceTo( ... )>>
methods add additional entries in point hashes.
The hash entries used by Geo::Gpx are shown below. Apart from C<lat> and C<lon>
these values are created by Geo::Gpx. Unless noted, these values may be read but
should not be set, altered, or deleted.
=over
=item *
B<C<lat>> - Point's latitude in degrees [ -90 .. 90 ]
=item *
B<C<lon>> - Point longitude in degrees [ -180 .. 180 )
These two values may be changed but the altered point should then be re-indexed
using C<L<Index(...)|/Index( ... )>> before further searches are run.
=item *
B<C<data>> - The optional user data supplied when a point was created
using the array shorthand. This contents of this field may be freely modified
by the user. See C<L<Index(...)|/Index( ... )>> and C<L<IndexPoints(...)|/IndexPoints( ... )>>, below.
=item *
B<C<lat_rad>> - The point's latitude in radians [ -pi/2 .. pi/2 ]
=item *
B<C<lon_rad>> - The point's longitude in radians [ -pi .. pi )
=item *
B<C<circumference>> - Circumference (in meters) of the circle of latitude
that the point falls on. This is computed from the body's equatorial
circumference assuming a spherical (not an oblate) body.
=item *
B<C<search_result_distance>> - Distance (in meters) of point from search
point of previous search. The distance computation assumes a spherical body
and is computed using a ruggedized version of the haversine formula. This
value is only generated when C<L<Search(...)|/Search( ... )>> is called with the C<radius>
or C<sort_results> option. See also C<L<Distance(...)|/Distance( ... )>>, C<L<DistanceFrom(...)|/DistanceFrom( ... )>>,
and C<L<DistanceTo(...)|/DistanceTo( ... )>>.
=item *
B<C<antipode_distance>> - Distance (in meters) of point from search
point's antipode as determined by a previous call to C<L<Farthest(...)|/Farthest( ... )>>.
This distance is computed using a ruggedized version of the haversine formula.
=back
As a convenience, most methods allow points to be specified using a shorthand
notation S<C<[ I<lat>, I<lon> ]>> or S<C<[ I<lat>, I<lon>, I<data> ]>>. Points
given in this notation will be converted to hash-based points. If a point
created using this notation is returned as a search result it will be as a
reference to the hash constructed by Geo::Index and not as a reference to the
original array. To access the data field of a point created using the shorthand
notation use C<$$point{'data'}> where C<$point> is a search result point.
Any fields added to the indexed points by Geo::Index can be removed using
C<L<Sweep(...)|/Sweep( ... )>> and C<L<Vacuum(...)|/Vacuum( ... )>>.
=head1 METHODS
=cut
BEGIN {
} # END BEGIN
use fields qw(index indices positions planetary_radius planetary_diameter polar_circumference equatorial_circumference levels max_level max_size quiet);
=head2 Geo::Index-E<gt>new( ... )
=over
C<$index = Geo::Index-E<gt>new()>;
=over
Create a new empty index using default options: radius and circumferences are those of Earth, C<levels> is set to 20 (~40E<nbsp>m index resolution).
=back
C<$index = Geo::Index-E<gt>new( \@points );>
=over
Create a new index using default options and populate it with the given points.
The points in the array can be in either hash or array notation.
=back
C<$index = Geo::Index-E<gt>new( \%options );>
=over
Create a new empty index using specified options.
=back
C<$index = Geo::Index-E<gt>new( \@points, \%options );>
=over
lib/Geo/Index.pm view on Meta::CPAN
SetUpDistance($self->{planetary_diameter}, $$p0{lat_rad}, $$p0{lon_rad});
}
=head2 DistanceTo( ... )
=over
C<$meters = $index-E<gt>DistanceTo( \%point_2 );>
C<$meters = $index-E<gt>DistanceTo( \@point_2 );>
Returns the distance in meters between the specified point and the one set
earlier with C<L<DistanceFrom(...)|/DistanceFrom( ... )>>.
The haversine function is used to compute the distance. As this assumes a
spherical body the distances returned may show errors. Using the default
options, these errors are up to 0.056% (north - south) or 1.12% (east - west).
Such errors typically start becoming significant at distances over S<20 km>.
B<C<%point_2>> or B<C<@point_2>>
=over
The point to measure distances to
This can be either a hash containing at a minimum a C<lat> and a C<lon> value
(both in degrees) or an array giving the point. See the B<L<Points|/POINTS>>
section above for details.
=back
=back
=cut
# Used by Distance(...)
sub DistanceTo($$) {
my ($self, $p1) = @_;
if (ref $p1 eq 'ARRAY') {
#. Got array; expand arguments into a full point
$p1 = { 'lat'=>$$p1[0], 'lon'=>$$p1[1] };
}
$$p1{lat_rad} = Math::Trig::deg2rad($$p1{lat}) unless ($$p1{lat_rad});
$$p1{lon_rad} = Math::Trig::deg2rad($$p1{lon}) unless ($$p1{lon_rad});
return HaversineDistance($$p1{lat_rad}, $$p1{lon_rad});
}
#. Distance functions
#.
#. Geo::Index uses the haversine formula to compute great circle distances
#. between points.
#.
#. Three versions are supported: a fallback version written in Perl (used if the
#. C versions fail to compile) and two accelerated versions written in C, one
#. using floats and the other using doubles. By default the C float version is
#. used; if it fails to compile then the Perl version is used. Use of a specific
#. version can also be requested with Geo::Index->SetDistanceFunctionType(...).
#.
#. The Perl version uses doubles. When using floats instead of doubles the loss
#. of precision is typically under a meter (about 2 meters in the worst case).
#. #. Compared to the errors inherent to the haversine function, this loss of
#. precision is negligable.
#. Here are the results of benchmarking the three versions on a fairly high-end
#. workstation (higher numbers are better). The test dataset is 1 million random
#. points and each search type was performed once for each point in random order.
#. The same points were used for each test and they were in the same order. All
#. searches returned results as lists except for the 'all points' search which
#. returned a list reference. The default options (Earth, 20-level index) were
#. used for each test. Each version's benchmark was run 32 times; some jitter
#. was observed.
#.
#. Average number of operations per second using each version (rounded):
#. Percentages (in parentheses) are relative to the pure-Perl version.
#.
#. Results for 32 iterations of each test:
#.
#. Operation Perl C double (%) C float (%)
#. -------------------------------- ------ ------------ ------------
#. Add point to index 35861 36067 (101) 36060 (101)
#. Search: return all points 256127 256877 (101) 252116 (98)
#. Search: sort, max 5 6733 8718 (129) 8860 (132)
#. Search: sort, radius 1000, max 5 45364 49063 (108) 49831 (110)
#. Search: sort, radius 1000 45902 49418 (108) 51673 (113)
#. Search: max 5 198499 190404 (96) 204905 (103)
#. Search: radius 1000, max 5 46942 51295 (109) 54554 (116)
#. Search: radius 1000 47908 51941 (108) 55522 (116)
#. These will be set to references to the Perl versions of the functions
my $SetUpDistance_perl = undef;
my $HaversineDistance_perl = undef;
my $ComputeAreaExtrema_perl = undef;
my $fast_log2_perl = undef;
#. Choose whether to use Perl or C code for distance and log2 calculations
#. Default is to use the Perl functions
# Used by new(...)
sub SetDistanceFunctionType($) {
my ($type) = @_;
#. Get function pointers for the Perl versions (if not already recorded)
$SetUpDistance_perl = *SetUpDistance unless (defined $SetUpDistance_perl);
$HaversineDistance_perl = *HaversineDistance unless (defined $HaversineDistance_perl);
$ComputeAreaExtrema_perl = *ComputeAreaExtrema unless (defined $ComputeAreaExtrema_perl);
$fast_log2_perl = *fast_log2 unless (defined $fast_log2_perl);
#. Choose the type of functions to use:
lib/Geo/Index.pm view on Meta::CPAN
return [ @SUPPORTED_CODE ];
}
#. Perl version of the distance functions
# Used by Search(...)
#. For the values that the module is interested in the
#. return value is the same as ceil(log2(n))
sub fast_log2($) {
my ($n) = @_;
my $i = 0;
my $c = 1;
for ( $n = ceil( $n );
$n > $c;
$c<<=1, $i++ ) { }
return $i;
}
#. Perl doesn't have a log2(n) function; if one wants
#. to use it the following performs it:
# Not used internally
sub log2($) {
my ($n) = @_;
return log($n) / log(2);
}
#. Used internally by the Perl versions of SetUpDistance and HaversineDistance
my ( $DistanceFrom_diameter, $DistanceFrom_lat_1, $DistanceFrom_lon_1 );
my $DistanceFrom_cos_lat_1;
#. Specify the point to get distances from
#. Diameter is in meters, Lat and Lon are in radians
#.
#. If possible, this function will be replaced by an equivalent written in C.
sub SetUpDistance($$$) {
my ($new_diameter, $new_lat_1, $new_lon_1) = @_;
$DistanceFrom_diameter = $new_diameter;
$DistanceFrom_lat_1 = $new_lat_1;
$DistanceFrom_lon_1 = $new_lon_1;
$DistanceFrom_cos_lat_1 = cos( $DistanceFrom_lat_1 );
}
#. Returns the approximate distance from previously-set point to specified point
#. Lat and Lon are in radians, return value is in meters
#.
#. If possible, this function will be replaced by an equivalent written in C.
sub HaversineDistance($$) {
my ($lat_0, $lon_0)= @_;
my $sin_lat_diff_over_2 = sin( ( $lat_0 - $DistanceFrom_lat_1 ) / 2.0 );
my $sin_lon_diff_over_2 = sin( ( $lon_0 - $DistanceFrom_lon_1 ) / 2.0 );
my $n = ( $sin_lat_diff_over_2 * $sin_lat_diff_over_2 )
+ (
( $sin_lon_diff_over_2 * $sin_lon_diff_over_2 )
* $DistanceFrom_cos_lat_1
* cos( $lat_0 )
);
#. The haversine formula may get messy around antipodal points so clip to the largest sane value.
if ( $n < 0.0 ) { $n = 0.0; }
return $DistanceFrom_diameter * asin( sqrt($n) );
}
=head2 GetConfiguration( )
=over
C<%configuration = $index-E<gt>GetConfiguration( );>
Returns the running configuration of the Geo::Index object.
See also C<L<GetStatistics(...)|/GetStatistics( )>> and C<examples/show_configuration.pl>
The return value is a hash with the following entries:
=over
B<C<key_type>> - The key type in use:
=over
'C<text>' for text keys (e.g. 'C<12:345,6789>')
'C<numeric>' for 64-bit numeric keys
'C<packed>' for 64-bit numeric keys packed into an 8-byte string
=back
B<C<supported_key_types>> - The types of keys that can be used
=over
Value is a reference to a list of supported key types (as given above).
=back
B<C<code_type>> - The type of low-level code in use:
=over
'C<perl>' for Perl functions
'C<float>' for C functions mostly using C<float> values.
'C<double>' for C functions mostly using C<double> values.
=back
B<C<supported_code_types>> - The types of low-level code that can be used
=over
Value is a reference to a list of supported code types (as given above).
lib/Geo/Index.pm view on Meta::CPAN
To fix this, C<DistanceFrom*(...)> and C<DistanceTo*(...)> would
need to be removed plus C<SetUpDistance*(...)> and
C<HaversineDistance*(...)> would need to be combined into a single
4-argument C<HaversineDistance*(...)> function. Calling code would need
to be modified as appropriate. In terms of performance, the overall cost of
doing this is likely quite low.
=item *
The search code stores distances computed for a specific search into the point
datastructures. If multiple concurrent searches are run against a single index
then distances computed by one search may overwrite those from another search.
This can lead to inconsistent results.
To fix this a per-search distance hash would need to be maintained. This could
have serious performance implications and would preclude returning the point
distances within the point hashes. The distances could, however, be returned in
an additional datastructure.
=item *
Adding and deleting points to/from the index is not atomic. Running e.g. a
search while points are being added or deleted can lead to unpredictable
behavior (up to and including the program crashing).
One could fix this by adding object-level locks:
=over
=item * Block concurrent calls to the C<Index(...)> and C<Unindex(...)>methods
=item * Block calls to the C<Index(...)> and C<Unindex(...)> methods while searches are running
=item * Block calls to C<Search(...)> I<et al.> when the C<Index(...)> or C<Unindex(...)> methods are active
=back
=back
=back
=over
=item *
Including the same point in multiple indices or searches at the same time could
lead to interesting results.
As mentioned above, this is due to the storage of search result distances within
the points and not within the index object. Each search that involves a given
point will likely overwrite its C<search_result_distance> value.
This could be encountered in a number of ways. For example, a search using a
condition function that itself runs a search against the second index could be
problematic. This could be encountered even when using a single index. For
example, if code relies on the distances values from a search it should save a
copy of them before running subsearches against the same set of points. If this
is not done then the distance values from the first search may be overwritten by
those of the subsequent searches.
=item *
Geo::Index uses the spherical haversine formula to compute distances. While
quite fast, its accuracy over long distances is poor, with a worst case error
of about 0.1% (22 km). Since the module already has provision for changing the
backends used for the distance methods, adding a new backend to, for example,
compute accurate distances on e.g. a WGS-84 spheroid would be simple and
straight-forward.
=item *
In places the code can be repetitious or awkward in style. This was done
because, especially in the inner loops, speed has been favoured over clarity.
=back
=head3 Reporting bugs
Please submit any bugs or feature requests either to C<bug-geo-index at rt.cpan.org>,
through L<CPAN's web interface|http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Geo-Index>,
or through L<Github|https://github.com/Alex-Kent/Geo-Index/issues>. In any case I will
receive notification when you do and you will be automatically notified of progress
on your submission as it takes place. Any other comments can be sent to C<akh@cpan.org>.
=head1 VERSION HISTORY
B<0.0.8> (2019-04-10) - Corrected internal links, CPAN release
=over
=item * Corrected POD links to use spaces instead of underscores
=item * Uploaded to CPAN
=back
B<0.0.7> (2019-04-08) - Methods can now be called using snake case
=over
=item * Added method aliases so that CamelCase methods can also be called using snake_case.
=back
B<0.0.6> (2019-04-05) - Bug fixes, additional tests
=over
=item * B<C<GetIntLatLon(...)>>: Fixed off-by-one error at the north pole
This affected B<C<L<Index(...)|/Index( ... )>>> and B<C<L<Search(...)|/Search( ... )>>>.
=item * B<C<GetIntLat(...)>>: Fixed off-by-one error at the north pole
=item * More thorough tests
=item * Improved documentation
=back
( run in 2.711 seconds using v1.01-cache-2.11-cpan-f56aa216473 )