List-BinarySearch
view release on metacpan or search on metacpan
lib/List/BinarySearch.pm view on Meta::CPAN
=item * In the case of non-unique keys, a stable binary search will always
return the lowest-indexed matching element. An unstable binary search would
return the first one found, which may not be the chronological first.
=item * Best and worst case time complexity is always O(log n). Unstable
searches may stop once the target is found, but in the worst case are still
O(log n). In practical terms, this difference is usually not meaningful.
=item * Stable binary searches only require one relational comparison of a
given pair of data elements per iteration, where unstable binary searches
require two comparisons per iteration.
=item * The net result is that although an unstable binary search might have
better "best case" performance, the fact that a stable binary search gets away
with fewer comparisons per iteration gives it better performance in the worst
case, and approximately equal performance in the average case. By trading away
slightly better "best case" performance, the stable search gains the guarantee
that the element found will always be the lowest-indexed element in a range of
non-unique keys.
=back
B<< This module has a companion "XS" module: L<List::BinarySearch::XS> which
users are strongly encouraged to install as well. >> If List::BinarySearch::XS
is also installed, C<binsearch> and C<binsearch_pos> will use XS code. This
behavior may be overridden by setting C<$ENV{List_BinarySearch_PP}> to a
true value. Most CPAN installers will either automatically install the XS
module, or prompt to automatically install it. See CONFIGURATION for details.
=head1 RATIONALE
B<A binary search is pretty simple, right? Why use a module for this?>
Quoting from
L<Wikipedia|http://en.wikipedia.org/wiki/Binary_search_algorithm>: I<When Jon
Bentley assigned it as a problem in a course for professional
programmers, he found that an astounding ninety percent failed to code a
binary search correctly after several hours of working on it, and another
study shows that accurate code for it is only found in five out of twenty
textbooks. Furthermore, Bentley's own implementation of binary search,
published in his 1986 book Programming Pearls, contains an error that remained
undetected for over twenty years.>
So to answer the question, you might use a module so that you
don't have to write, test, and debug your own implementation.
B<< Perl has C<grep>, hashes, and other alternatives, right? >>
Yes, before using this module the user should weigh the other options such as
linear searches ( C<grep> or C<List::Util::first> ), or hash based searches. A
binary search requires an ordered list, so one must weigh the cost of sorting or
maintaining the list in sorted order. Ordered lists have O(n) time complexity
for inserts. Binary Searches are best when the data set is already ordered, or
will be searched enough times to justify the cost of an initial sort.
There are cases where a binary search may be an excellent choice. Finding the
first matching element in a list of 1,000,000 items with a linear search would
have a worst-case of 1,000,000 iterations, whereas the worst case for a binary
search of 1,000,000 elements is about 20 iterations. In fact, if many lookups
will be performed on a seldom-changed list, the savings of O(log n) lookups may
outweigh the cost of sorting or performing occasional linear time inserts.
=head1 EXPORT
Nothing is exported by default. C<binsearch>, C<binsearch_pos>, and
C<binsearch_range> may be exported by listing them on the export list.
Or import all functions by specifying C<:all>.
=head1 SUBROUTINES/METHODS
=head2 WHICH SEARCH ROUTINE TO USE
=over 4
=item * C<binsearch>: Returns lowest index where match is found, or undef.
=item * C<binsearch_pos>: Returns lowest index where match is found, or the
index of the best insert point for needle if the needle isn't found.
=item * C<binsearch_range>: Performs a search for both low and high needles.
Returns a pair of indices refering to a range of elements corresponding to
low and high needles.
=back
=head2 binsearch CODE NEEDLE ARRAY_HAYSTACK
$first_found_ix = binsearch { $a cmp $b } $needle, @haystack;
Pass a code block, a search target, and an array to search. Uses
the supplied code block C<$needle> to test the needle against elements
in C<@haystack>.
See the section entitled B<The Callback Block>, below, for an explanation
of how the comparator works
(hint: very similar to C<< sort { $a <=> $b } ... >> ).
Return value will be the lowest index of an element that matches target, or
undef if target isn't found.
=head2 binsearch_pos CODE NEEDLE ARRAY_HAYSTACK
$first_found_ix = binsearch_pos { $a cmp $b } $needle, @haystack;
The only difference between this function and C<binsearch> is its return
value upon failure. C<binsearch> returns undef upon failure.
C<binsearch_pos> returns the index of a valid insert point for
C<$needle>.
Pass a code block, a search target, and an array to search. Uses
the code block to test C<$needle> against elements in C<@haystack>.
Return value is the index of the first element equaling C<$needle>. If no
element is found, the best insert-point for C<$needle> is returned.
=head2 binsearch_range CODE LOW_NEEDLE HIGH_NEEDLE ARRAY_HAYSTACK
( run in 0.676 second using v1.01-cache-2.11-cpan-71847e10f99 )