Crypt-xxHash
view release on metacpan or search on metacpan
ext/xxHash/doc/xxhash_spec.md view on Meta::CPAN
The algorithm then proceeds directly to step 4.
### Step 2. Process stripes
A stripe is a contiguous segment of 16 bytes.
It is evenly divided into 4 _lanes_, of 4 bytes each.
The first lane is used to update accumulator 1, the second lane is used to update accumulator 2, and so on.
Each lane read its associated 32-bit value using __little-endian__ convention.
For each {lane, accumulator}, the update process is called a _round_, and applies the following formula:
```c
accN = accN + (laneN * PRIME32_2);
accN = accN <<< 13;
accN = accN * PRIME32_1;
```
This shuffles the bits so that any bit from input _lane_ impacts several bits in output _accumulator_. All operations are performed modulo 2^32.
Input is consumed one full stripe at a time. Step 2 is looped as many times as necessary to consume the whole input, except for the last remaining bytes which cannot form a stripe (< 16 bytes).
When that happens, move to step 3.
### Step 3. Accumulator convergence
All 4 lane accumulators from the previous steps are merged to produce a single remaining accumulator of the same width (32-bit). The associated formula is as follows:
```c
acc = (acc1 <<< 1) + (acc2 <<< 7) + (acc3 <<< 12) + (acc4 <<< 18);
```
### Step 4. Add input length
The input total length is presumed known at this stage. This step is just about adding the length to accumulator, so that it participates to final mixing.
```c
ext/xxHash/doc/xxhash_spec.md view on Meta::CPAN
The algorithm then proceeds directly to step 4.
### Step 2. Process stripes
A stripe is a contiguous segment of 32 bytes.
It is evenly divided into 4 _lanes_, of 8 bytes each.
The first lane is used to update accumulator 1, the second lane is used to update accumulator 2, and so on.
Each lane read its associated 64-bit value using __little-endian__ convention.
For each {lane, accumulator}, the update process is called a _round_, and applies the following formula:
```c
round(accN,laneN):
accN = accN + (laneN * PRIME64_2);
accN = accN <<< 31;
return accN * PRIME64_1;
```
This shuffles the bits so that any bit from input _lane_ impacts several bits in output _accumulator_. All operations are performed modulo 2^64.
Input is consumed one full stripe at a time. Step 2 is looped as many times as necessary to consume the whole input, except for the last remaining bytes which cannot form a stripe (< 32 bytes).
When that happens, move to step 3.
### Step 3. Accumulator convergence
All 4 lane accumulators from previous steps are merged to produce a single remaining accumulator of same width (64-bit). The associated formula is as follows.
Note that accumulator convergence is more complex than 32-bit variant, and requires to define another function called _mergeAccumulator()_:
```c
mergeAccumulator(acc,accN):
acc = acc xor round(0, accN);
acc = acc * PRIME64_1;
return acc + PRIME64_4;
```
which is then used in the convergence formula:
```c
acc = (acc1 <<< 1) + (acc2 <<< 7) + (acc3 <<< 12) + (acc4 <<< 18);
acc = mergeAccumulator(acc, acc1);
acc = mergeAccumulator(acc, acc2);
acc = mergeAccumulator(acc, acc3);
acc = mergeAccumulator(acc, acc4);
```
### Step 4. Add input length
ext/xxHash/xxhash.h view on Meta::CPAN
* - SSE2
* - ARM NEON
* - WebAssembly SIMD128
* - POWER8 VSX
* - s390x ZVector
* This can be controlled via the @ref XXH_VECTOR macro, but it automatically
* selects the best version according to predefined macros. For the x86 family, an
* automatic runtime dispatcher is included separately in @ref xxh_x86dispatch.c.
*
* XXH3 implementation is portable:
* it has a generic C90 formulation that can be compiled on any platform,
* all implementations generate exactly the same hash value on all platforms.
* Starting from v0.8.0, it's also labelled "stable", meaning that
* any future version will also generate the same hash value.
*
* XXH3 offers 2 variants, _64bits and _128bits.
*
* When only 64 bits are needed, prefer invoking the _64bits variant, as it
* reduces the amount of mixing, resulting in faster speed on small inputs.
* It's also generally simpler to manipulate a scalar return type than a struct.
*
( run in 3.746 seconds using v1.01-cache-2.11-cpan-39bf76dae61 )