Algorithm-MinPerfHashTwoLevel
view release on metacpan or search on metacpan
lib/Algorithm/MinPerfHashTwoLevel.pm view on Meta::CPAN
use Algorithm::MinPerfHashTwoLevel;
my $buckets= Algorithm::MinPerfHashTwoLevel->compute(\%source_hash);
=head1 DESCRIPTION
This module implements an algorithm to construct (relatively efficiently) a
minimal perfect hash using the "two level" algorithm. A perfect hash is one
which has no collisions for any keys, a minimal perfect hash has exactly the
same number of buckets as it has keys. The "two level" algorithm involves
computing two hash values for each key. The first is used as an initial lookup
into the bucket array to find a mask value which is used to modify the second
hash value to determine the actual bucket to read from. This module computes
the appropriate mask values.
In this implementation only one 64 bit hash value is computed, but the high
and low 32 bits are used as if there were two hash values. The hash function
used is SipHash 1-3. The full 64 bit hash is called h0, the high 32 bits are
called h1, and the low 32 bits are called h2.)
Computing the hash and mask is done in C (via XS).
lib/Algorithm/MinPerfHashTwoLevel.pm view on Meta::CPAN
Whether this key is encoded as utf8. Will be either
0 for "not utf8" or 1 for "is utf8".
=item val_normalized
The raw bytes of the normalized key. (See val_is_utf8).
=item xor_val
The mask to be xor'ed with the second hash (h2) to determine
the actual lookup bucket. If the xor_val for a given bucket
is 0 then the key is not in the hash.
=back
=back
=head2 EXPORT
None by default.
lib/Tie/Hash/MinPerfHashTwoLevel/OnDisk.pm view on Meta::CPAN
U32 variant -> 5
U32 num_buckets -> number of buckets/keys in hash
U32 state_ofs -> offset in file where hash preseeded state is found
U32 table_ofs -> offset in file where bucket table starts
U32 key_flags_ofs -> offset in file where key flags are located
U32 val_flags ofs -> offset in file where val flags are located
U32 str_buf_ofs -> offset in file where strings are located
U64 general_flags -> flags used for this header
U64 reserved -> reserved field.
All "_ofs" values in the header are a multiple of 8, and the relevant sections
maybe be null padded to ensure this is so.
The string buffer contains the comment at str_buf_ofs+1, its length can be found
with strlen(), the comment may NOT contain nulls, and will be null terminated. All
other strings in the table are NOT null padded, the length data stored in the
bucket records should be used to determine the length of the keys and values.
The last 8 bytes of the file contains a hash checksum of the rest of the entire
file. This value is itself 8 byte aligned.
Buckets:
mph_hv_macro.h view on Meta::CPAN
#ifndef PERL_SEEN_HV_MACRO_H /* compile once */
#define PERL_SEEN_HV_MACRO_H
#if IVSIZE == 8
#define CAN64BITHASH
#endif
/*-----------------------------------------------------------------------------
* Endianess, misalignment capabilities and util macros
*
* The following 3 macros are defined in this section. The other macros defined
* are only needed to help derive these 3.
*
* U8TO16_LE(x) Read a little endian unsigned 32-bit int
* U8TO32_LE(x) Read a little endian unsigned 32-bit int
* U8TO28_LE(x) Read a little endian unsigned 32-bit int
* ROTL32(x,r) Rotate x left by r bits
* ROTL64(x,r) Rotate x left by r bits
* ROTR32(x,r) Rotate x right by r bits
* ROTR64(x,r) Rotate x right by r bits
*/
macro. Just C<#define> the macro before including C<ppport.h>:
#define DPPP_NAMESPACE MyOwnNamespace_
#include "ppport.h"
The default namespace is C<DPPP_>.
=back
The good thing is that most of the above can be checked by running
F<ppport.h> on your source code. See the next section for
details.
=head1 EXAMPLES
To verify whether F<ppport.h> is needed for your module, whether you
should make any changes to your code, and whether any special defines
should be used, F<ppport.h> can be run as a Perl script to check your
source code. Simply say:
perl ppport.h
sortsv||5.007003|
space_join_names_mortal|||
ss_dup|||
ssc_add_range|||
ssc_and|||
ssc_anything|||
ssc_clear_locale|||n
ssc_cp_and|||
ssc_finalize|||
ssc_init|||
ssc_intersection|||
ssc_is_anything|||n
ssc_is_cp_posixl_init|||n
ssc_or|||
ssc_union|||
stack_grow|||
start_subparse||5.004000|
stdize_locale|||
strEQ|||
strGE|||
strGT|||
#endif
#if (PERL_BCDVERSION < 0x5031002)
/* Versions prior to this accepted things that are now considered
* malformations, and didn't return -1 on error with warnings enabled
* */
# undef utf8_to_uvchr_buf
#endif
/* This implementation brings modern, generally more restricted standards to
* utf8_to_uvchr_buf. Some of these are security related, and clearly must
* be done. But its arguable that the others need not, and hence should not.
* The reason they're here is that a module that intends to play with the
* latest perls shoud be able to work the same in all releases. An example is
* that perl no longer accepts any UV for a code point, but limits them to
* IV_MAX or below. This is for future internal use of the larger code points.
* If it turns out that some of these changes are breaking code that isn't
* intended to work with modern perls, the tighter restrictions could be
* relaxed. khw thinks this is unlikely, but has been wrong in the past. */
#ifndef utf8_to_uvchr_buf
* an IV can hold */
if (ret > PERL_INT_MAX) {
overflows = 1;
}
# if (PERL_BCDVERSION < 0x5026000)
# ifndef EBCDIC
/* There are bugs in versions earlier than this on non-EBCDIC platforms
* in which it did not detect all instances of overflow, which could be
* a security hole. Also, earlier versions did not allow the overflow
* malformation under any circumstances, and modern ones do. So we
* need to check here. */
else if (curlen > 0 && *s >= 0xFE) {
/* If the main routine detected overflow, great; it returned 0. But if the
* input's first byte indicates it could overflow, we need to verify.
* First, on a 32-bit machine the first byte being at least \xFE
* automatically is overflow */
if (sizeof(ret) < 8) {
* data from C. All statics in extensions should be reworked to use
* this, if you want to make the extension thread-safe. See ext/re/re.xs
* for an example of the use of these macros.
*
* Code that uses these macros is responsible for the following:
* 1. #define MY_CXT_KEY to a unique string, e.g. "DynaLoader_guts"
* 2. Declare a typedef named my_cxt_t that is a structure that contains
* all the data that needs to be interpreter-local.
* 3. Use the START_MY_CXT macro after the declaration of my_cxt_t.
* 4. Use the MY_CXT_INIT macro such that it is called exactly once
* (typically put in the BOOT: section).
* 5. Use the members of the my_cxt_t structure everywhere as
* MY_CXT.member.
* 6. Use the dMY_CXT macro (a declaration) in all the functions that
* access MY_CXT.
*/
#if defined(MULTIPLICITY) || defined(PERL_OBJECT) || \
defined(PERL_CAPI) || defined(PERL_IMPLICIT_CONTEXT)
#ifndef START_MY_CXT
( run in 0.690 second using v1.01-cache-2.11-cpan-39bf76dae61 )