File-Locate-Iterator
lib/File/Locate/Iterator.pm
# }
#
sub _current {
my ($self) = @_;
if (defined $self->{'entry'}) {
return $self->{'entry'};
} else {
return;
}
}
1;
__END__
=for stopwords filename filenames filesystem slocate filehandle arrayref mmap mmaps seekable PerlIO mmapped XSUB coroutining fd Findutils Ryde wildcard charset wordsize wildcards Taintedness taintedness untaint ie
=head1 NAME
File::Locate::Iterator -- read "locate" database with an iterator
=head1 SYNOPSIS
 use File::Locate::Iterator;
 my $it = File::Locate::Iterator->new;
 while (defined (my $entry = $it->next)) {
   print $entry,"\n";
 }
=head1 DESCRIPTION
C<File::Locate::Iterator> reads a "locate" database file in iterator style.
Each C<next()> call on the iterator returns the next entry from the
database.
    /
    /bin
    /bin/bash
    /bin/cat
Locate databases normally hold filename strings as a way of finding files by
name faster than searching through all directories. Optional glob, suffix
and regexp options on the iterator can restrict the entries returned.
Although it's called a database, the format is actually just a long list of
filenames with "front coding" compression to save space: each entry records
how many leading bytes it shares with the previous entry, followed by only
the differing tail. There's no random access, so any search requires a scan
through the file from the start. Generally this is still much faster than
an equivalent traversal through the directory structure of an entire file
system (C<find> etc).
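As a rough illustration of front coding, the following is a minimal sketch of a decoder for an in-memory "LOCATE02" style string. It assumes the usual layout of a C<"\0LOCATE02\0"> header followed by entries made of a signed one-byte change in the shared-prefix length (or a 0x80 byte and a signed 16-bit big-endian value) and a NUL-terminated tail. The name C<decode_locate02> is hypothetical and is not part of this module, which does considerably more (file handles, mmap, matching).

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: decode a front-coded
# LOCATE02-style string into a list of filenames.
sub decode_locate02 {
    my ($db) = @_;
    my $header = "\0LOCATE02\0";
    substr($db, 0, length $header) eq $header
      or die "not a LOCATE02 database";
    my $pos = length $header;

    my @entries;
    my $prev  = '';   # previous entry, needed to rebuild the next one
    my $share = 0;    # number of leading bytes shared with $prev
    while ($pos < length $db) {
        # Signed one-byte delta, or 0x80 then a signed 16-bit big-endian delta.
        my $delta = unpack 'c', substr($db, $pos++, 1);
        if ($delta == -128) {
            $pos + 2 <= length $db or die "truncated count";
            $delta = unpack 's>', substr($db, $pos, 2);
            $pos += 2;
        }
        $share += $delta;
        die "corrupt share length" if $share < 0 || $share > length $prev;

        # The differing tail runs up to the next NUL byte.
        my $end = index($db, "\0", $pos);
        die "unterminated entry" if $end < 0;
        $prev = substr($prev, 0, $share) . substr($db, $pos, $end - $pos);
        push @entries, $prev;
        $pos = $end + 1;
    }
    return @entries;
}

# Hand-built sample holding "/", "/bin", "/bin/bash" and "/bin/cat".
my $sample_db = "\0LOCATE02\0"
  . "\0/\0" . "\x{01}bin\0" . "\x{03}/bash\0" . "\x{01}cat\0";
print "$_\n" for decode_locate02($sample_db);
```

Each entry depends on the one before it, which is why a reader must start from the beginning of the file.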
See F<examples/native.pl> for a simple sample read, or
F<examples/mini-locate.pl> for a whole program like the real C<locate>.
Only "LOCATE02" format files are supported, per current versions of GNU
C<locate>, not the previous "slocate" format.
Iterators from this module are stand-alone and don't need any of the Perl
iterator frameworks. But see L<Iterator::Locate>,
L<Iterator::Simple::Locate> and L<MooseX::Iterator::Locate> to inter-operate
with those others. Those frameworks include ways to grep, map and otherwise
manipulate iterations.
=head2 Forks and Threads
If an iterator using a file handle is cloned to a new thread or duplicated by
a process-level C<fork()> then generally it can be used by the parent or the child but
not both. The underlying file descriptor position is shared by parent and
child, so when one of them reads it will upset the position for the other.
This sort of thing affects almost all code working with file handles across
C<fork()> and threads. Perhaps some thread C<CLONE> code here could let
threads work correctly (but slower), but a C<fork()> is probably doomed.
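The shared file position can be seen with plain core Perl, independent of this module. The sketch below assumes a Unix-like system with a working C<fork()>; the temporary file and its contents are made up for the demonstration. The child reads four bytes and the parent then finds its own position has moved.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp ();
use POSIX ();

# Scratch file with known contents (illustrative data only).
my $tmp = File::Temp->new;
print {$tmp} "0123456789";
close $tmp or die "close: $!";

open my $fh, '<', $tmp->filename or die "open: $!";

my $pid = fork();
defined $pid or die "fork: $!";
if ($pid == 0) {
    # Child: an unbuffered read advances the shared file offset.
    sysread $fh, my $buf, 4;
    POSIX::_exit(0);    # skip destructors so the parent's temp file survives
}
waitpid $pid, 0;

# Parent: it never read anything, yet the offset has moved.
my $parent_pos = sysseek $fh, 0, SEEK_CUR;
print "parent position after child read: $parent_pos\n";
```

An iterator reading from such a handle in both processes would have each side skipping data consumed by the other, so a safer pattern is for each process to open the database afresh after the fork.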
Iterators using C<mmap> work correctly for both forks and threads, except
that the size calculation and sharing for C<if_sensible> is not thread-aware
beyond the mmaps existing when the thread is spawned. (C<File::Map> knows
the C<mmap>s across all threads, but currently does not reveal them.)
=head2 Taint Mode
Under taint mode (see L<perlsec/Taint mode>), strings read from a file or
file handle are always tainted, the same as other file input. Taintedness
of a C<database_str> string propagates to the entry strings returned.
For C<database_str_ref>, the initial taintedness of the database string
propagates to the entries. If you untaint it during iteration then
subsequent entries returned are still tainted, because the front coding of
the database format means a later entry may reuse bytes read back
when the input was tainted. Perhaps entries should follow an untaint of the
database string, but normally you'd expect an untaint to be worked out
before beginning iteration. In all cases a C<rewind()> will reset to the
new taintedness of the database string.
For reference, taint mode causes only a small slowdown for the XS iterator
code, and usually (it seems) only a little more for the pure Perl code.
=head2 Other Notes
The locate database format is only designed to be read forwards, hence no
C<prev()> method on the iterator. The start of a previous record can't be
distinguished by its content, and the "front coding" means the state at a
given point may depend on records an arbitrary distance back too. A "tell"
which gave file position plus state would be possible, though perhaps some
"clone" of the whole iterator would be more use.
On some systems, C<mmap()> may be a bit too effective, giving a process more
of the CPU than other processes which make periodic C<read()> system calls.
This is a matter of OS scheduling, but you might have to apply some C<nice>
or C<ionice> if doing a lot of mmapped work (see L<nice(1)>, L<ionice(1)>,
L<perlfunc/setpriority>, and L<ioprio_set(2)>).
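For instance, a process could lower its own scheduling priority before starting a long mmapped scan. This is only a sketch using Perl's builtin C<getpriority> and C<setpriority>; the value 0 for C<PRIO_PROCESS>, the increment of 5 and the cap of 19 are assumptions that suit common Unix systems and may need adjusting elsewhere.

```perl
use strict;
use warnings;

# Assumed constant: PRIO_PROCESS is 0 on common Unix systems, and a
# "who" of 0 means the current process.
my $PRIO_PROCESS = 0;

my $old_nice = getpriority($PRIO_PROCESS, 0);

# Be five steps nicer, without going past the usual maximum of 19.
my $new_nice = $old_nice + 5 > 19 ? 19 : $old_nice + 5;
setpriority($PRIO_PROCESS, 0, $new_nice)
  or warn "setpriority failed: $!";

print "nice value now ", getpriority($PRIO_PROCESS, 0), "\n";
```

Raising the nice value needs no special privilege, whereas lowering it again usually does.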
=head1 FUNCTIONS
=head2 Constructor
=over 4
=item C<< $it = File::Locate::Iterator->new (key=>value,...) >>
Create and return a new locate database iterator object. The following
optional key/value pairs can be given,