Digest-ssdeep
=item *
Addition or deletion: 1
=item *
Substitution: 2
=back
=item *
Following the original ssdeep algorithm, we scale the value so that the output
is between 0 and 100.
=back
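As an illustration of the edit costs listed above, here is a minimal sketch of a weighted Wagner-Fischer distance in plain Perl. The function name C<weighted_edit_distance> is ours and is not part of this module; the sketch covers only the raw distance, not the scaling to the 0..100 range.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Weighted edit distance (Wagner-Fischer dynamic programming).
# Costs follow the list above: addition or deletion 1, substitution 2.
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub weighted_edit_distance {
    my ( $s, $t ) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev holds the previous row of the dynamic programming table.
    my @prev = ( 0 .. scalar @t );
    for my $i ( 1 .. scalar @s ) {
        my @curr = ($i);
        for my $j ( 1 .. scalar @t ) {
            my $subst = $s[ $i - 1 ] eq $t[ $j - 1 ] ? 0 : 2;
            $curr[$j] = min(
                $prev[$j] + 1,               # deletion
                $curr[ $j - 1 ] + 1,         # addition
                $prev[ $j - 1 ] + $subst,    # match or substitution
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print weighted_edit_distance( 'kitten', 'sitting' ), "\n";    # prints 5
```

Because a substitution costs the same as one deletion plus one addition, the result here is two substitutions (4) plus one addition (1).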
=head1 INTERFACE
This section describes the recommended interface for generating and comparing
ssdeep fuzzy hashes.
=over
=item B<ssdeep_hash>
Calculates the ssdeep hash of the input string.
Usage:
$hash = ssdeep_hash( $string );
or in array context
@hash = ssdeep_hash( $string );
In scalar context it returns a hash string in the format C<bs:hash1:hash2>,
where C<bs> is the block size, C<hash1> is the fuzzy hash for this block size
and C<hash2> is the hash for double the block size.
The maximum length of each hash is 64 characters.
In array context it returns the same components as a three-element array.
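If you only have the scalar form, the three components can be recovered with a plain C<split>. The following is a small sketch; the helper name C<split_ssdeep_hash> and the sample hash string are made up for illustration and do not come from this module.

```perl
use strict;
use warnings;

# Split a "bs:hash1:hash2" string into its three components.
# A limit of 3 keeps an empty trailing field instead of dropping it.
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub split_ssdeep_hash {
    my ($hash) = @_;
    my ( $bs, $hash1, $hash2 ) = split /:/, $hash, 3;
    return unless defined($hash2) && $bs =~ /^\d+$/;
    return ( $bs, $hash1, $hash2 );
}

# Hypothetical hash string, not the hash of any real input.
my ( $bs, $h1, $h2 ) = split_ssdeep_hash('96:abcDEF123+/:xyz789');
print "block size $bs, hash1 $h1, hash2 $h2\n";
```

The helper returns an empty list when the string does not have three fields or the block size is not numeric.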
=item B<ssdeep_hash_file>
Calculates the hash of a file.
Usage:
$hash = ssdeep_hash_file( "/tmp/malware1.exe" );
This is a convenience function. It returns the same as C<ssdeep_hash> in scalar
or array context.
Because this function slurps the whole file into memory, you should not use it
on big files. For big files, use a libfuzzy wrapper instead of this module
(L<BUGS AND LIMITATIONS>).
Returns B<undef> on errors.
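One way to avoid slurping a big file by accident is to check its size before hashing. The sketch below assumes a hypothetical helper C<hash_if_small> that receives the hashing function as a code reference, so it runs here with a stand-in hasher; in real use you would pass a reference to C<ssdeep_hash_file>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hash a file only if it is not larger than $max_bytes.
# $hasher is a code reference that takes a path and returns a hash.
# Returns undef if the file is missing, too big, or the hasher fails.
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub hash_if_small {
    my ( $path, $max_bytes, $hasher ) = @_;
    my $size = -s $path;    # undef when the file cannot be stat'ed
    return undef unless defined $size;
    return undef if $size > $max_bytes;
    return scalar $hasher->($path);
}

# Demonstration with a 100-byte temporary file and a stand-in hasher.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} 'x' x 100;
close $fh;

my $stub   = sub { return 'stub-hash' };
my $result = hash_if_small( $path, 1024, $stub );
print defined $result ? "hashed: $result\n" : "skipped\n";
```

The size limit is a choice for the caller; the text above does not define what counts as a big file.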
=item B<ssdeep_compare>
Calculates the match score between two hashes.
Usage: to compare two hashes in scalar format:
$match = ssdeep_compare( $hashA, $hashB );
To compare two hashes in array format:
$match = ssdeep_compare( \@hashA, \@hashB );
By default, a pair of hashes is discarded if their longest common substring is
shorter than 7 characters. To override this default, pass the minimum length as
a third argument:
$match = ssdeep_compare( $hashA, $hashB, 4 );
The result is a matching score between 0 and 100. See L<Comparison> for
algorithm details.
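To make the common-substring rule concrete, the following sketch tests whether two strings share a substring of at least a given length. It is an independent illustration, not the code used by C<ssdeep_compare>, and the name C<has_common_substring> is hypothetical.

```perl
use strict;
use warnings;

# Return 1 if $x and $y share a common substring of at least $min
# characters, 0 otherwise. Any longer common substring contains one of
# exactly $min characters, so only windows of that length are checked.
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub has_common_substring {
    my ( $x, $y, $min ) = @_;
    $min = 7 unless defined $min;
    return 0 if length($x) < $min || length($y) < $min;

    # Index every window of $min characters found in $x.
    my %seen;
    for my $i ( 0 .. length($x) - $min ) {
        my $window = substr( $x, $i, $min );
        $seen{$window} = 1;
    }

    # Look for any of those windows in $y.
    for my $i ( 0 .. length($y) - $min ) {
        my $window = substr( $y, $i, $min );
        return 1 if $seen{$window};
    }
    return 0;
}

# Made-up strings that share "abcdefg" (7 characters) and nothing longer.
print has_common_substring( 'abcdefgh12', 'xxabcdefgzz' ),    "\n";    # prints 1
print has_common_substring( 'abcdefgh12', 'xxabcdefgzz', 8 ), "\n";    # prints 0
```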
=item B<ssdeep_dump_last>
Returns an array with information about the last hash calculation. Useful for
debugging or for obtaining extended details.
Usage after a calculation:
$hash = ssdeep_hash_file( "/tmp/malware1.exe" );
@details = ssdeep_dump_last();
Each element of the output array is a line of comma-separated values:
...
2,125870,187|245|110|27|190|66|97,1393131242,q
1,210575,13|216|13|115|29|52|208,4009217630,e
2,210575,13|216|13|115|29|52|208,4009217630,e
1,210730,61|231|220|179|40|89|210,1069791891,T
1,237707,45|66|251|98|56|138|91,4014305026,C
....
Meaning of the output array:
=over
=item B<Field 1>
Part of the hash that is affected: 1 for the first part, 2 for the second part.
=item B<Field 2>
Offset in the file at which the chunk ends.
=item B<Field 3>
Sequence of 7 characters, shown as byte values separated by C<|>, that
triggered the rolling hash.
=item B<Field 4>
Value of the FNV hash of the current chunk at this moment.
=item B<Field 5>
Character output to the fuzzy hash due to this rolling hash trigger.
=back
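The five fields can be pulled apart with C<split>. This is a minimal sketch; the helper name C<parse_dump_line> and the key names are chosen here for illustration and are not defined by the module.

```perl
use strict;
use warnings;

# Parse one line of ssdeep_dump_last() output into named fields.
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub parse_dump_line {
    my ($line) = @_;
    my ( $part, $offset, $window, $value, $char ) = split /,/, $line, 5;
    return {
        part       => $part,                      # field 1
        end_offset => $offset,                    # field 2
        window     => [ split /\|/, $window ],    # field 3, as a list of bytes
        hash_value => $value,                     # field 4
        char       => $char,                      # field 5
    };
}

# First row of the sample output above.
my $row = parse_dump_line('2,125870,187|245|110|27|190|66|97,1393131242,q');
printf "part %d ends at offset %d and emits '%s'\n",
    @{$row}{qw(part end_offset char)};
```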
The first sample row can be read this way:
At byte 125870 of the input file there is a sequence of these 7 characters:
C<187 245 110 27 190 66 97>. That sequence triggered the second part of the
hash. The FNV hash value of the current chunk is 1393131242, which maps to the
character C<q>.
The fourth row can be read this way:
The letter C<T> in the first hash comes from the chunk that starts at
210575+1 (one past the end offset of the previous row whose first field is 1)
and ends at 210730. The FNV hash of this whole block was 1069791891.
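The same reading can be automated. The sketch below walks the sample rows and derives the byte range of each chunk from the previous row of the same hash part. It also checks an observation that holds for the sample rows and matches the original ssdeep, but that is an assumption about this module rather than something stated here: the emitted character is the hash value modulo 64 used as an index into the standard base64 alphabet. The name C<chunk_report> is hypothetical.

```perl
use strict;
use warnings;

# Standard base64 alphabet; index 0 is 'A', index 63 is '/'.
my @B64 = ( 'A' .. 'Z', 'a' .. 'z', '0' .. '9', '+', '/' );

# For each dump row whose predecessor (same hash part) is known, return
# [ part, char, chunk_start, chunk_end, char_matches_value_mod_64 ].
# NOTE: illustrative helper, not a function exported by Digest::ssdeep.
sub chunk_report {
    my @lines = @_;
    my ( %prev_end, @report );
    for my $line (@lines) {
        my ( $part, $end, undef, $value, $char ) = split /,/, $line;
        if ( defined $prev_end{$part} ) {
            my $ok = $B64[ $value % 64 ] eq $char ? 1 : 0;
            push @report, [ $part, $char, $prev_end{$part} + 1, $end, $ok ];
        }
        $prev_end{$part} = $end;
    }
    return @report;
}

# Rows copied from the sample output above.
my @rows = (
    '1,210575,13|216|13|115|29|52|208,4009217630,e',
    '2,210575,13|216|13|115|29|52|208,4009217630,e',
    '1,210730,61|231|220|179|40|89|210,1069791891,T',
    '1,237707,45|66|251|98|56|138|91,4014305026,C',
);

for my $r ( chunk_report(@rows) ) {
    printf "part %d: '%s' covers bytes %d..%d, value %% 64 %s\n",
        @{$r}[ 0 .. 3 ], $r->[4] ? 'matches' : 'does not match';
}
```

Rows without a known predecessor are skipped, because the start of their chunk lies in the elided part of the output.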
=back
=head1 BUGS AND LIMITATIONS
=over
=item B<Small blocksize comparison>
The original ssdeep limits the match score of hashes with a small block size,
so when they are compared the score is capped according to their size and never
reaches 100. This implementation does not behave that way: hashes with small
block sizes are compared in the same way as hashes with big block sizes.
=item B<Performance>
This is a Pure Perl implementation. The performance is far from optimal. To
calculate hashes more efficiently, please use compiled software like libfuzzy
bindings (L<SEE ALSO>).
=item B<Testing on 64-bit systems>
This module has not been tested on 64-bit systems yet.
=back
Please report any bugs or feature requests to
C<bug-digest-ssdeep@rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org>.
=head1 SEE ALSO
=over
=item Ssdeep's home page
L<http://ssdeep.sourceforge.net/>
=item Jesse Kornblum's original paper I<Identifying almost identical files using context triggered piecewise hashing>
L<http://dfrws.org/2006/proceedings/12-Kornblum.pdf>
=item I<Data::FuzzyHash> Perl binding of binary libfuzzy libraries
L<https://github.com/hideo55/Data-FuzzyHash>
=item Text::WagnerFischer - An implementation of the Wagner-Fischer edit distance.
L<http://search.cpan.org/perldoc?Text%3A%3AWagnerFischer>
=item FNV hash's description
L<http://www.isthe.com/chongo/tech/comp/fnv/>
=back
=head1 AUTHOR
Reinoso Guzman C<< <reinoso.guzman@gmail.com> >>
=head1 LICENCE AND COPYRIGHT