List-BinarySearch-XS

 view release on metacpan or  search on metacpan

lib/List/BinarySearch/XS.pm  view on Meta::CPAN

optimal performance.  You are free to use this module directly, or as a plugin
for the more general L<List::BinarySearch>.

The binary search algorithm implemented in this module is known as a
I<Deferred Detection> Binary Search.  Deferred Detection provides
B<stable searches>.  Stable binary search algorithms have the following
characteristics, contrasted with their unstable binary search cousins:

=over 4

=item * In the case of non-unique keys, a stable binary search will always
return the lowest-indexed matching element.  An unstable binary search would
return the first one found, which may not be the chronological first.

=item * Best and worst case time complexity is always O(log n).  Unstable
searches may stop once the target is found, but in the worst case are still
O(log n).  In practical terms, this difference is usually not meaningful.

=item * Stable binary searches only require one relational comparison of a
given pair of data elements per iteration, where unstable binary searches
require two comparisons per iteration.

=item * The net result is that although an unstable binary search might have
better "best case" performance, the fact that a stable binary search gets away
with fewer comparisons per iteration gives it better performance in the worst
case, and approximately equal performance in the average case. By trading away
slightly better "best case" performance, the stable search gains the guarantee
that the element found will always be the lowest-indexed element in a range of
non-unique keys.

=back

=head1 RATIONALE

B<A binary search is pretty simple, right?  Why use a module for this?>

Quoting from
L<Wikipedia|http://en.wikipedia.org/wiki/Binary_search_algorithm>:  I<When Jon
Bentley assigned it as a problem in a course for professional
programmers, he found that an astounding ninety percent failed to code a
binary search correctly after several hours of working on it, and another
study shows that accurate code for it is only found in five out of twenty
textbooks. Furthermore, Bentley's own implementation of binary search,
published in his 1986 book Programming Pearls, contains an error that remained
undetected for over twenty years.>

So the answer to the question "Why use a module for this?" is "So that you
don't have to write, test, and debug your own implementation."

B<< Perl has C<grep>, hashes, and other alternatives, right? >>

Yes, before using this module the user should weigh the other options such as
linear searches ( C<grep> or C<List::Util::first> ), or hash based searches. A
binary search requires an ordered list, so one must weigh the cost of sorting or
maintaining the list in sorted order.  Ordered lists have O(n) time complexity
for inserts.  Binary Searches are best when the data set is already ordered, or
will be searched enough times to justify the cost of an initial sort.

There are cases where a binary search may be an excellent choice. Finding the
first matching element in a list of 1,000,000 items with a linear search would
have a worst-case of 1,000,000 iterations, whereas the worst case for a binary
search of 1,000,000 elements is about 20 iterations.  In fact, if many lookups
will be performed on a seldom-changed list, the savings of O(log n) lookups may
outweigh the cost of sorting or performing occasional linear time inserts.


=head1 EXPORT

Nothing is exported by default.  Upon request this module will export
C<binsearch>, and C<binsearch_pos>.

Or import all functions by specifying the export tag C<:all>.

=head1 SUBROUTINES/METHODS

=head2 WHICH SEARCH ROUTINE TO USE

=over 4

=item * C<binsearch>: Returns lowest index where match is found, or undef.

=item * C<binsearch_pos>: Returns lowest index where match is found, or the
index of the best insert point for needle if the needle isn't found.

=back

=head2 binsearch CODE NEEDLE ARRAY_HAYSTACK

    $first_found_ix = binsearch { $a cmp $b } $needle, @haystack;

Pass a code block, a search target, and an array to search.  Uses
the supplied code block C<$needle> to test the needle against elements
in C<@haystack>.

See the section entitled B<The Callback Block>, below, for an explanation
of how the comparator works
(hint: very similar to C<< sort { $a <=> $b } ... >> ).

Return value will be the lowest index of an element that matches target, or
undef if target isn't found.

=head2 binsearch_pos CODE NEEDLE ARRAY_HAYSTACK

    $first_found_ix = binsearch_pos { $a cmp $b } $needle, @haystack;

The only difference between this function and C<binsearch> is its return
value upon failure.  C<binsearch> returns undef upon failure.
C<binsearch_pos> returns the index of a valid insert point for
C<$needle>.

Pass a code block, a search target, and an array to search.  Uses
the code block to test C<$needle> against elements in C<@haystack>.

Return value is the index of the first element equaling C<$needle>.  If no
element is found, the best insert-point for C<$needle> is returned.


=head2 The callback block (The comparator)

Comparators in L<List::BinarySearch::XS> are used to compare the target (needle)
with individual haystack elements, and should return the result of the
relational comparison of the two values.  A good example would be the code block



( run in 1.581 second using v1.01-cache-2.11-cpan-71847e10f99 )