Code-DRY

lib/Code/DRY.pm

=head2 C<get_next_ranked_index> (not yet implemented)

This XS function returns the next index into the lcp array, taken in the order of the sorted lcp values, or C<~0> if no entries are left.

=head2 C<reset_rank_iterator> (not yet implemented)

This XS function resets the iterator over the sorted lcp values, so that C<get_next_ranked_index> starts again with the first entry.
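
Because these two functions are not yet implemented, the following pure-Perl sketch only models the behavior described above. The helper names C<init_rank_iterator>, C<next_ranked_index> and C<reset_ranked_iterator> are hypothetical and not part of Code::DRY. Ordering by descending lcp value, with ties broken by index, is an assumption about what "sorted" means here.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of the (not yet implemented) ranked iterator.
# Assumption: "sorted" means descending lcp value, ties broken by index.
{
    my @ranked;     # lcp indices in ranked order
    my $pos = 0;    # position of the next index to hand out

    sub init_rank_iterator {       # hypothetical helper, not part of Code::DRY
        my ($lcp) = @_;
        @ranked = sort { $lcp->[$b] <=> $lcp->[$a] || $a <=> $b } 0 .. $#$lcp;
        $pos = 0;
    }

    sub next_ranked_index {        # models get_next_ranked_index: ~0 when exhausted
        return $pos < @ranked ? $ranked[$pos++] : ~0;
    }

    sub reset_ranked_iterator {    # models reset_rank_iterator
        $pos = 0;
    }
}

init_rank_iterator([0, 1, 3, 0, 0, 2]);    # lcp values of the string "banana"
my @order;
while ((my $i = next_ranked_index()) != ~0) {
    push @order, $i;
}
print "@order\n";    # 2 5 1 0 3 4
```

The lcp values used are those of the string C<banana>, so the longest repeat (C<ana>, lcp 3) is returned first. Iterating from the longest repeat downwards is a natural order for reporting the largest duplicates first.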

=head2 C<get_size>

This XS function returns the size (in 8-bit characters, i.e. bytes) of the string used by the C<build_suffixarray_and_lcp> function.

=head2 C<get_lcp>

This XS function returns a reference to a copy of the lcp array built by the C<build_suffixarray_and_lcp> function.

=head2 C<get_sa>

This XS function returns a reference to a copy of the suffix array built by the C<build_suffixarray_and_lcp> function.
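
To illustrate what C<get_size>, C<get_sa> and C<get_lcp> describe, here is a naive pure-Perl sketch that builds a suffix array and lcp array for a short string. It is for illustration only: the module itself uses the linear-time SA-IS code from sais-lite in XS. The function name C<naive_sa_and_lcp> is hypothetical, and the lcp convention (entry I<i> compares the suffixes at positions I<i>-1 and I<i> of the suffix array, entry 0 is 0) is an assumption that should be checked against the XS output.

```perl
use strict;
use warnings;

# Naive O(n^2 log n) construction, for illustration only.
# Hypothetical helper, not part of Code::DRY.
sub naive_sa_and_lcp {
    my ($text) = @_;
    my $n  = length $text;    # for a byte string this is the size in bytes
    my @sa = sort { substr($text, $a) cmp substr($text, $b) } 0 .. $n - 1;

    # Assumed convention: $lcp[$i] is the length of the longest common prefix
    # of the suffixes starting at $sa[$i-1] and $sa[$i]; $lcp[0] is 0.
    my @lcp = $n ? (0) : ();
    for my $i (1 .. $n - 1) {
        my ($p, $q) = @sa[$i - 1, $i];
        my $l = 0;
        $l++ while $p + $l < $n && $q + $l < $n
                && substr($text, $p + $l, 1) eq substr($text, $q + $l, 1);
        push @lcp, $l;
    }
    return ($n, \@sa, \@lcp);
}

my ($size, $sa, $lcp) = naive_sa_and_lcp('banana');
print "size=$size\n";    # size=6
print "sa=@$sa\n";       # sa=5 3 1 0 4 2
print "lcp=@$lcp\n";     # lcp=0 1 3 0 0 2
```

A large lcp value marks two adjacent suffixes that share a long prefix, which is exactly a repeated (duplicated) substring; here the value 3 at index 2 corresponds to C<ana> occurring twice.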

=head2 C<__get_text>

Internal function

=head2 C<__free_all>

Internal function



=head1 DIAGNOSTICS

=head2 Output messages

With the default reporter, duplicates are reported in the following format:

	1 duplicate(s) found with a length of 8 (>= 2 lines) and 78 bytes reduced to complete lines:
	1.  File: t/00_lowlevel.t in lines 57..64 (offsets 1467..1544)
	2.  File: t/00_lowlevel.t in lines 74..81 (offsets 1865..1942)
	=================
	...<duplicated content>
	=================
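
The numbers in the header can be derived from the location lines. The sketch below reproduces the header and location lines of the sample above (it omits the separator lines and the duplicated content). The function C<format_duplicate_report> is hypothetical, and the interpretation is inferred from the sample only: ranges are inclusive, "length" counts lines, the byte count covers one copy, "N duplicate(s)" is the number of copies minus one, and "(>= 2 lines)" echoes the minimum match length.

```perl
use strict;
use warnings;

# Hypothetical formatter mimicking the default report layout.
# Assumptions inferred from the sample output only (see lead-in).
sub format_duplicate_report {
    my ($min_lines, @locs) = @_;   # each: [file, first_line, last_line, first_offset, last_offset]
    my (undef, $l1, $l2, $o1, $o2) = @{ $locs[0] };
    my $out = sprintf
        "%d duplicate(s) found with a length of %d (>= %d lines) and %d bytes reduced to complete lines:\n",
        @locs - 1, $l2 - $l1 + 1, $min_lines, $o2 - $o1 + 1;
    for my $i (0 .. $#locs) {
        $out .= sprintf "%d.  File: %s in lines %d..%d (offsets %d..%d)\n",
            $i + 1, @{ $locs[$i] };
    }
    return $out;
}

print format_duplicate_report(
    2,
    [ 't/00_lowlevel.t', 57, 64, 1467, 1544 ],
    [ 't/00_lowlevel.t', 74, 81, 1865, 1942 ],
);
```

With inclusive ranges, lines 57..64 give a length of 8 lines and offsets 1467..1544 give 78 bytes, matching the sample header.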


=head2 Error messages

This module can die with the following error messages:

=over

=item * "cannot open file $file: $!\n"

A file could not be opened for reading. C<$!> contains the reason reported by the operating system.

=item * "Error building suffix array:$!\n"

The XS code could not allocate enough memory for the combined file content.

=back
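
Since both errors are raised with C<die>, a caller can trap them with C<eval> and tell them apart by their documented prefixes. In the sketch below, C<slurp_or_die> is a hypothetical stand-in that fails with the same message format as the module, and C<classify_error> is a hypothetical helper; neither is part of Code::DRY.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file-reading step; it dies with the same
# message format the module documents for an unreadable file.
sub slurp_or_die {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open file $file: $!\n";
    local $/;                  # undefined record separator: read the whole file
    return scalar <$fh>;
}

# Map the documented error messages to a short category.
sub classify_error {
    my ($err) = @_;
    return 'open'   if $err =~ /^cannot open file /;
    return 'memory' if $err =~ /^Error building suffix array:/;
    return 'other';
}

my $content;
my $ok = eval { $content = slurp_or_die('/nonexistent/dir/file.pl'); 1 };
if (!$ok) {
    my $err = $@;              # copy $@ before doing anything else
    if (classify_error($err) eq 'open') {
        warn "skipping: $err"; # e.g. continue with the remaining files
    } else {
        die $err;              # re-throw anything else
    }
}
```

Testing the boolean result of C<eval { ...; 1 }> rather than C<$@> alone is the usual Perl idiom, because it also works when an error message happens to be false.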

=head1 BUGS AND LIMITATIONS

Probably some; it is new code :-).

Currently, the underlying XS code operates on 8-bit characters only.
For Perl source code this seems to work for most texts.
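
With 8-bit processing, sizes and offsets count bytes rather than Perl characters, and the two differ as soon as the text contains non-ASCII characters. The sketch below uses only the core Encode module to show the difference and to convert a byte offset back to a character offset. Whether Code::DRY reads files as raw bytes is not stated here; the helper C<byte_to_char_offset> is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded (character) string with one non-ASCII character, U+00DF.
my $chars = "Heiko Ei\x{df}feldt";

# Encode to UTF-8 bytes before handing text to byte-oriented code.
my $bytes = encode('UTF-8', $chars);
printf "%d characters, %d bytes\n", length $chars, length $bytes;   # 14 characters, 15 bytes

# Hypothetical helper: convert a byte offset back to a character offset.
# Assumes the offset falls on a character boundary.
sub byte_to_char_offset {
    my ($bytes, $byte_off) = @_;
    return length decode('UTF-8', substr($bytes, 0, $byte_off));
}

print byte_to_char_offset($bytes, index($bytes, 'f')), "\n";         # 9
```

The letter C<f> sits at byte offset 10 but character offset 9, because the preceding C<ß> occupies two bytes in UTF-8.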

Masking out submatches has not yet been fully implemented.

To report bugs, visit
E<lt>http://rt.cpan.org/NoAuth/Bugs.html?Dist=Code-DRYE<gt>
or send email to E<lt>bug-Code-DRY@rt.cpan.orgE<gt>.


=head1 EXPORTED SYMBOLS

None by default.

=head1 ACKNOWLEDGEMENTS

Thanks to Yuta Mori for providing the C code for the construction of the suffix array
(sais-lite) and to Johannes Fischer for extending it with the efficient
generation of lcp values. I am grateful that both authors provided their work as open
source.

Some code and ideas cribbed from:

Ovid's blog L<http://blogs.perl.org/users/ovid/2012/12/finding-duplicate-code-in-perl.html>

=head1 SEE ALSO

=over

=item * Suffix array construction algorithm: G. Nong, S. Zhang, and W. H. Chan, 'Linear suffix array construction by almost pure induced-sorting', in Proc. DCC, pages 193-202, IEEE Press, 2009

=item * LCP construction algorithm: Johannes Fischer, 'Inducing the LCP-Array' L<http://arxiv.org/abs/1101.3448>

=item * C code: Yuta Mori, sais-lite 2.4.1 at L<http://sites.google.com/site/yuta256/sais>

=item * C code: Johannes Fischer, sais-lite-lcp-master at L<https://github.com/elventear/sais-lite-lcp>

=item * Perl code: Ovid, blog at L<http://blogs.perl.org/users/ovid/2012/12/finding-duplicate-code-in-perl.html>, code at L<https://gist.github.com/Ovid/4231878#file-find_duplicate_code-pl>

=item * Theory: Dan Gusfield, 'Algorithms on Strings, Trees, and Sequences', Cambridge University Press, 1999, ISBN 978-0521670357

=back

=head1 AUTHOR

Heiko Eißfeldt, E<lt>hexcoder@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2014,2019 by hexcoder

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.

For files salcpis.[ch] from the sais-lite-lcp-master package:


