File-Locate-Iterator
lib/File/Locate/Iterator.pm view on Meta::CPAN
$self->{'fh_start'} = tell $self->{'fh'};
}
return $self;
}
sub _ERROR_BAD_HEADER {
  croak 'Invalid database contents (no LOCATE02 header)';
}
sub rewind {
  my ($self) = @_;
  $self->{'sharelen'} = 0;
  $self->{'entry'} = '';
  if ($self->{'mref'}) {
    $self->{'pos'} = length($header);
  } else {
    $self->{'fh_start'} > 0
      or croak "Cannot seek database";
    seek ($self->{'fh'}, $self->{'fh_start'}, 0)
      or croak "Cannot seek database: $!";
  }
}
# return true if mmap is in use
# (an actual mmap, not the slightly similar 'database_str' option)
# this is meant for internal use as a diagnostic ...
sub _using_mmap {
  my ($self) = @_;
  return defined $self->{'fm'};
}
# Not yet documented, likely worthwhile as long as it works properly.
# Return empty list for nothing yet? Same as next().
# Return empty list at EOF? At EOF 'entry' is undefed out.
#
# =item C<< $entry = $it->current >>
#
# Return the current entry from the database, meaning the same as the last
# call to C<next> returned. At the start of the database (before the first
# C<next>) or at end of the database the return is an empty list.
#
#     while (defined $it->next) {
#       ...
#       print $it->current,"\n";
#     }
#
sub _current {
  my ($self) = @_;
  if (defined $self->{'entry'}) {
    return $self->{'entry'};
  } else {
    return;
  }
}
1;
__END__
=for stopwords filename filenames filesystem slocate filehandle arrayref mmap mmaps seekable PerlIO mmapped XSUB coroutining fd Findutils Ryde wildcard charset wordsize wildcards Taintedness taintedness untaint ie
=head1 NAME
File::Locate::Iterator -- read "locate" database with an iterator
=head1 SYNOPSIS
    use File::Locate::Iterator;
    my $it = File::Locate::Iterator->new;
    while (defined (my $entry = $it->next)) {
      print $entry,"\n";
    }
=head1 DESCRIPTION
C<File::Locate::Iterator> reads a "locate" database file in iterator style.
Each C<next()> call on the iterator returns the next entry from the
database.
    /
    /bin
    /bin/bash
    /bin/cat
Locate databases normally hold filename strings as a way of finding files by
name faster than searching through all directories. Optional glob, suffix
and regexp options on the iterator can restrict the entries returned.
Although it's called a database, the format is really just a long list of
filenames with some "front coding" compression to save space. There's no
random access, so any search requires a scan through the file from the
start. Generally this is still much faster than an equivalent traversal
through the directory structure of an entire file system (C<find> etc).
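As a rough illustration of the front coding idea, the following sketch stores each name as the number of leading characters shared with the previous name plus the remaining suffix. The functions C<front_encode> and C<front_decode> are hypothetical names for this example only. The real LOCATE02 file is a binary format and differs in detail, so this shows the principle rather than the module's own decoder.

```perl
use strict;
use warnings;

# Encode a list of names as [shared_prefix_length, suffix] pairs,
# each measured against the previous name in the list.
sub front_encode {
    my @names = @_;
    my $prev  = '';
    my @pairs;
    foreach my $name (@names) {
        my $max = length($prev) < length($name) ? length($prev) : length($name);
        my $share = 0;
        $share++ while $share < $max
          && substr($prev, $share, 1) eq substr($name, $share, 1);
        push @pairs, [ $share, substr($name, $share) ];
        $prev = $name;
    }
    return @pairs;
}

# Rebuild the names.  Each entry depends on the one before it, which is
# why reading must always start from the beginning of the list.
sub front_decode {
    my @pairs = @_;
    my $prev  = '';
    my @names;
    foreach my $pair (@pairs) {
        my ($share, $suffix) = @$pair;
        $prev = substr($prev, 0, $share) . $suffix;
        push @names, $prev;
    }
    return @names;
}

my @names   = ('/', '/bin', '/bin/bash', '/bin/cat');
my @pairs   = front_encode(@names);
my @decoded = front_decode(@pairs);
print join("\n", @decoded), "\n";
```

Here C</bin/cat> shares five characters (C</bin/>) with C</bin/bash>, so only the suffix C<cat> needs storing.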
See F<examples/native.pl> for a simple sample read, or
F<examples/mini-locate.pl> for a whole program like the real C<locate>.
Only "LOCATE02" format files are supported, per current versions of GNU
C<locate>, not the previous "slocate" format.
Iterators from this module are stand-alone and don't need any of the Perl
iterator frameworks. But see L<Iterator::Locate>,
L<Iterator::Simple::Locate> and L<MooseX::Iterator::Locate> to inter-operate
with those others. Those frameworks include ways to grep, map and otherwise
manipulate iterations.
=head2 Forks and Threads
If an iterator using a file handle is cloned to a new thread or to a
process-level C<fork()> then generally it can be used by the parent or the child but
not both. The underlying file descriptor position is shared by parent and
child, so when one of them reads it will upset the position for the other.
This sort of thing affects almost all code working with file handles across
C<fork()> and threads. Perhaps some thread C<CLONE> code here could let
threads work correctly (but slower), but a C<fork()> is probably doomed.
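The shared file position can be seen with plain file handles, independent of this module. The sketch below assumes a Unix-like system with a working C<fork()>. The child reads a few bytes with C<sysread>, then the parent asks for the current position of its own copy of the handle, which has moved even though the parent never read anything.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# A small scratch file standing in for a database file.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "0123456789";
close $out or die "Cannot close: $!";

open my $fh, '<', $filename or die "Cannot open: $!";

my $pid = fork();
defined $pid or die "Cannot fork: $!";
if ($pid == 0) {
    # Child: read 5 bytes straight from the inherited descriptor,
    # then leave without running the parent's cleanup handlers.
    sysread($fh, my $buf, 5);
    POSIX::_exit(0);
}
waitpid($pid, 0);

# Parent: query the position of the descriptor (whence 1 is SEEK_CUR).
my $parent_pos = sysseek($fh, 0, 1);
print "parent position after child read: $parent_pos\n";
```

The parent's position reflects the child's read because both handles refer to the same underlying open file.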
Iterators using C<mmap> work correctly for both forks and threads, except
that the size calculation and sharing for C<if_sensible> is not thread-aware
beyond the mmaps existing when the thread is spawned. (C<File::Map> knows
the C<mmap>s across all threads, but currently does not reveal them.)
=head2 Operations
=over 4
=item C<< $entry = $it->next() >>
Return the next entry from the database, or no values at end of file. No
values means C<undef> in scalar context or an empty list in array context so
you can loop with either
    while (defined (my $filename = $it->next)) ...
or
    while (my ($filename) = $it->next) ...
The return is a byte string since it's normally a filename and Perl handles
filenames as byte strings.
=item C<< $it->rewind() >>
Rewind C<$it> back to the start of the database. The next C<$it-E<gt>next>
call will return the first entry.
This is only possible when the underlying database file or handle is
seekable, ie. C<seek()> works. This means a plain file, or a seekable char
special, or PerlIO layers with seek support.
=back
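The two loop styles described for C<next()> rely on a bare C<return> giving C<undef> in scalar context and an empty list in list context. The sketch below shows that behaviour with a hypothetical stand-in iterator, C<make_iterator>, built from a closure over a fixed list. It is not the module's own code, only an illustration of the calling convention.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns the entries one at a time, then a bare
# "return" (undef in scalar context, empty list in list context).
sub make_iterator {
    my @entries = @_;
    return sub {
        return unless @entries;
        return shift @entries;
    };
}

# Scalar style: test definedness, not truth, so an entry "0" is kept.
my $it = make_iterator('/', '0', '/bin');
my @scalar_style;
while (defined (my $filename = $it->())) {
    push @scalar_style, $filename;
}

# List style: the loop ends when the list assignment has no elements.
$it = make_iterator('/', '0', '/bin');
my @list_style;
while (my ($filename) = $it->()) {
    push @list_style, $filename;
}

print scalar(@scalar_style), " and ", scalar(@list_style), " entries\n";
```

An entry such as C<0> is false in Perl, which is why neither loop tests the returned value for truth directly.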
=head1 ENVIRONMENT VARIABLES
=over 4
=item C<LOCATE_PATH>
Default locate database.
=back
=head1 FILES
=over 4
=item F</var/cache/locate/locatedb>
Default locate database, if C<LOCATE_PATH> environment variable not set.
=back
=head1 OTHER WAYS TO DO IT
C<File::Locate> reads a locate database with callbacks instead. Whether you
want callbacks or an iterator is generally a matter of personal preference.
Iterators let you write your own loop, and can have multiple searches in
progress simultaneously.
The speed of an iterator is about the same as callbacks when
C<File::Locate::Iterator> is built with its XS code.
Iterators are good for cooperative coroutining like C<POE> or C<Gtk> where
state must be held in some sort of variable to be progressed by calls from
the main loop. Note that C<next()> will block on reading from the database,
so the database should generally be a plain file rather than a socket or
something, so as not to hold up a main loop.
If you have the recommended C<File::Map> module then iterators share an
C<mmap()> of the database file. Otherwise the database file is a separate
open handle in each iterator, meaning a file descriptor and PerlIO buffering
each. Sharing a handle and having each seek to its desired position would
be possible, but a seek drops buffered data so would be slower. Maybe some
hairy C<PerlIO> or C<IO::Handle> trickery could transparently share an fd
and keep buffered blocks from multiple file positions.
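The handle-sharing alternative mentioned above might look roughly like the following sketch, where each reader remembers its own offset and seeks before every read. C<make_cursor> is a hypothetical helper operating on plain text lines, not part of this module, and the per-read C<seek> is the cost the paragraph above refers to.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A small scratch file standing in for a database file.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "aaaa\nbbbb\ncccc\n";
close $out or die "Cannot close: $!";

open my $fh, '<', $filename or die "Cannot open: $!";

# Hypothetical cursor: keeps a private offset into the shared handle.
sub make_cursor {
    my ($shared_fh) = @_;
    my $pos = 0;
    return sub {
        # Reposition before every read, since another cursor may have
        # moved the handle in the meantime.
        seek($shared_fh, $pos, 0) or die "Cannot seek: $!";
        my $line = readline($shared_fh);
        return unless defined $line;
        $pos = tell($shared_fh);
        chomp $line;
        return $line;
    };
}

my $c1 = make_cursor($fh);
my $c2 = make_cursor($fh);

# Interleave the two cursors; each continues from its own position.
my @got = ($c1->(), $c1->(), $c2->(), $c1->(), $c2->());
print join(' ', @got), "\n";
```

Each cursor progresses independently, at the price of a seek on every call.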
=head1 SEE ALSO
L<Iterator::Locate>, L<Iterator::Simple::Locate>,
L<MooseX::Iterator::Locate>
L<File::Locate>, L<locate(1)>, L<locatedb(5)>, GNU Findutils manual,
L<File::FnMatch>, L<File::Map>
=head1 HOME PAGE
L<http://user42.tuxfamily.org/file-locate-iterator/index.html>
=head1 COPYRIGHT
Copyright 2009, 2010, 2011, 2012, 2013, 2014, 2017, 2018, 2019 Kevin Ryde
File-Locate-Iterator is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option) any
later version.
File-Locate-Iterator is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along with
File-Locate-Iterator. If not, see http://www.gnu.org/licenses/
=cut