Code-DRY
=head2 C<get_next_ranked_index> not yet implemented
This XS function returns the next index of the sorted lcp values, or C<~0> if no entries are left.
=head2 C<reset_rank_iterator> not yet implemented
This XS function resets the iterator of the sorted lcp values.
=head2 C<get_size>
This XS function returns the size of the string (in 8-bit characters) used by the C<build_suffixarray_and_lcp> function.
=head2 C<get_lcp>
This XS function returns a reference to a copy of the lcp array from the C<build_suffixarray_and_lcp> function.
=head2 C<get_sa>
This XS function returns a reference to a copy of the suffix array from the C<build_suffixarray_and_lcp> function.
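To make the contents of these two arrays concrete, here is a minimal pure-Perl sketch that builds a suffix array and an lcp array for a short string. It does not call the XS functions; C<naive_suffix_array> and C<naive_lcp> are hypothetical helpers written for illustration, and the convention that the first lcp entry is 0 is an assumption that may differ from what the XS code stores.

```perl
use strict;
use warnings;

# Naive construction by sorting all suffixes; for illustration only.
# The XS code uses the linear-time sais-lite algorithm instead.
sub naive_suffix_array {
    my ($text) = @_;
    return [ sort { substr($text, $a) cmp substr($text, $b) }
             0 .. length($text) - 1 ];
}

# lcp[i] = length of the longest common prefix of the suffixes starting
# at sa[i-1] and sa[i]. lcp[0] is set to 0 here (assumed convention).
sub naive_lcp {
    my ($text, $sa) = @_;
    my @lcp = (0) x @$sa;
    for my $i (1 .. $#$sa) {
        my ($p, $q) = ($sa->[$i - 1], $sa->[$i]);
        my $n = 0;
        $n++ while $p + $n < length($text)
                && $q + $n < length($text)
                && substr($text, $p + $n, 1) eq substr($text, $q + $n, 1);
        $lcp[$i] = $n;
    }
    return \@lcp;
}

my $text = 'banana';
my $sa   = naive_suffix_array($text);
my $lcp  = naive_lcp($text, $sa);

print "sa:  @$sa\n";     # sa:  5 3 1 0 4 2
print "lcp: @$lcp\n";    # lcp: 0 1 3 0 0 2
```

A large lcp value marks two neighbouring suffixes that share a long prefix, which is what makes the lcp array useful for finding duplicated text.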
=head2 C<__get_text>
Internal function
=head2 C<__free_all>
Internal function
=head1 DIAGNOSTICS
=head2 Output messages
With the default reporter, duplicates are reported in the following format:
  1 duplicate(s) found with a length of 8 (>= 2 lines) and 78 bytes reduced to complete lines:
  1.  File: t/00_lowlevel.t in lines 57..64 (offsets 1467..1544)
  2.  File: t/00_lowlevel.t in lines 74..81 (offsets 1865..1942)
  =================
  ...<duplicated content>
  =================
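If the report is to be processed by another tool, the location lines can be picked apart with a regular expression. The following is a rough sketch that assumes the line layout shown in the sample above; the pattern and the hash keys are my own choices and are not part of the module.

```perl
use strict;
use warnings;

# The two location lines from the sample report above.
my @report = (
    '1. File: t/00_lowlevel.t in lines 57..64 (offsets 1467..1544)',
    '2. File: t/00_lowlevel.t in lines 74..81 (offsets 1865..1942)',
);

my @hits;
for my $line (@report) {
    # Skip anything that is not a location line (header, separators, content).
    next unless $line =~ m{
        ^\s* (\d+) \. \s+ File: \s+ (.+?) \s+ in \s+ lines \s+ (\d+)\.\.(\d+)
        \s+ \(offsets \s+ (\d+)\.\.(\d+) \)
    }x;
    push @hits, {
        number => $1,
        file   => $2,
        lines  => $4 - $3 + 1,    # both ends are inclusive
        bytes  => $6 - $5 + 1,
    };
}

printf "%s: %d lines, %d bytes\n", @{$_}{qw(file lines bytes)} for @hits;
```

For the sample, each location covers 8 lines and 78 bytes, which matches the numbers in the report's header line.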
=head2 Error messages
This module can die with the following error messages:
=over
=item * "cannot open file $file: $!\n"
Opening a file for reading failed.
=item * "Error building suffix array:$!\n"
The XS code could not allocate enough memory for the combined file content.
=back
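Since these errors are raised with C<die>, a caller can trap them with a block C<eval>. The sketch below is self-contained and does not call the module: C<read_file_or_die> is a hypothetical stand-in that dies with the same message format as the first item above.

```perl
use strict;
use warnings;

# Hypothetical stand-in that fails the way the first message documents.
sub read_file_or_die {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open file $file: $!\n";
    local $/;                 # slurp mode
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $missing = 'no/such/file.pl';
my $content = eval { read_file_or_die($missing) };
my $error   = $@;             # copy at once, later code may reset $@

if (!defined $content) {
    # The message already ends in a newline, so none is added here.
    print "skipped: $error";
}
```

Because the messages end in a newline, Perl does not append the "at ... line ..." suffix, so matching on the leading text of the message is enough to tell the two cases apart.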
=head1 BUGS AND LIMITATIONS
Probably some, it is new code :-).
Currently the underlying XS code operates with 8-bit characters only.
For Perl source code this seems to work with most texts.
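The practical consequence is that sizes and offsets count bytes, not characters, whenever the input contains non-ASCII text. A small sketch of the difference, assuming the input reaches the XS layer as UTF-8 encoded octets (the sample string is arbitrary):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "Ei\x{DF}feldt";            # 8 characters, one of them non-ASCII
my $bytes = encode('UTF-8', $chars);    # the same text as UTF-8 octets

# The sharp s takes two octets in UTF-8, so the byte count is one higher.
printf "%d characters, %d bytes\n", length($chars), length($bytes);
```

For pure ASCII source code both counts agree, which is presumably why most Perl files are unaffected.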
The full extent of masking out submatches has not yet been implemented.
To report bugs, go to
E<lt>http://rt.cpan.org/NoAuth/Bugs.html?Dist=Code-DRYE<gt>
or send mail to E<lt>bug-Code-DRY@rt.cpan.orgE<gt>
=head1 EXPORTED SYMBOLS
None by default.
=head1 ACKNOWLEDGEMENTS
Thanks to Yuta Mori for providing the C code for the construction of the suffix array
(sais-lite) and to Johannes Fischer for extending it with the efficient
generation of lcp values. I am grateful that both authors provided their work as open
source.
Some code and ideas cribbed from:
Ovid's blog L<http://blogs.perl.org/users/ovid/2012/12/finding-duplicate-code-in-perl.html>
=head1 SEE ALSO
=over
=item * Suffix array construction algorithm: G. Nong, S. Zhang, and W. H. Chan. 'Linear suffix array construction by almost pure induced-sorting', In Proc. DCC, pages 193--202. IEEE Press, 2009
=item * LCP construction algorithm: Johannes Fischer, 'Inducing the LCP-Array' L<http://arxiv.org/abs/1101.3448>
=item * C code: Yuta Mori, sais-lite 2.4.1 at L<http://sites.google.com/site/yuta256/sais>
=item * C code: Johannes Fischer, sais-lite-lcp-master at L<https://github.com/elventear/sais-lite-lcp>
=item * Perl code: Ovid, blog at L<http://blogs.perl.org/users/ovid/2012/12/finding-duplicate-code-in-perl.html>, code at L<https://gist.github.com/Ovid/4231878#file-find_duplicate_code-pl>
=item * Theory: Dan Gusfield, 'Algorithms on Strings, Trees, and Sequences', Cambridge University Press, 1999, ISBN 978-0521670357
=back
=head1 AUTHOR
Heiko Eißfeldt, E<lt>hexcoder@cpan.orgE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2014,2019 by hexcoder
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.
For files salcpis.[ch] from the sais-lite-lcp-master package: