Apache Arrow (C++)
A columnar in-memory analytics layer designed to accelerate big data.
Static Public Member Functions | Static Public Attributes | List of all members
arrow::HashUtil Class Reference

Utility class to compute hash values. More...

#include <arrow/util/hash-util.h>

Static Public Member Functions

static uint32_t CrcHash (const void *data, int32_t bytes, uint32_t hash)
 Compute the Crc32 hash for data using SSE4 instructions. More...
 
static uint32_t CrcHash1 (const void *v, uint32_t hash)
 CrcHash() specialized for 1-byte data. More...
 
static uint32_t CrcHash2 (const void *v, uint32_t hash)
 CrcHash() specialized for 2-byte data. More...
 
static uint32_t CrcHash4 (const void *v, uint32_t hash)
 CrcHash() specialized for 4-byte data. More...
 
static uint32_t CrcHash8 (const void *v, uint32_t hash)
 CrcHash() specialized for 8-byte data. More...
 
static uint32_t CrcHash12 (const void *v, uint32_t hash)
 CrcHash() specialized for 12-byte data. More...
 
static uint32_t CrcHash16 (const void *v, uint32_t hash)
 CrcHash() specialized for 16-byte data. More...
 
static uint64_t MurmurHash2_64 (const void *input, int len, uint64_t seed)
 Murmur2 hash implementation returning 64-bit hashes. More...
 
static uint64_t FnvHash64 (const void *data, int32_t bytes, uint64_t hash)
 Implementation of the Fowler-Noll-Vo hash function. More...
 
static uint32_t FnvHash64to32 (const void *data, int32_t bytes, uint32_t hash)
 Return a 32-bit hash computed by invoking FNV-64 and folding the result to 32-bits. More...
 
static uint32_t Hash (const void *data, int32_t bytes, uint32_t seed)
 Computes the hash value for data. More...
 
static uint32_t HashCombine32 (uint32_t value, uint32_t seed)
 Combine hashes 'value' and 'seed' to get a new hash value. More...
 
static uint32_t Rehash32to32 (const uint32_t hash)
 
static uint64_t Rehash32to64 (const uint32_t hash)
 

Static Public Attributes

static const uint64_t MURMUR_PRIME = 0xc6a4a7935bd1e995
 
static const int MURMUR_R = 47
 
static const uint32_t FNV_PRIME = 0x01000193
 default values recommended by http://isthe.com/chongo/tech/comp/fnv/ More...
 
static const uint32_t FNV_SEED = 0x811C9DC5
 
static const uint64_t FNV64_PRIME = 1099511628211UL
 
static const uint64_t FNV64_SEED = 14695981039346656037UL
 
static const uint32_t HASH_COMBINE_SEED = 0x9e3779b9
 The magic number (used in hash_combine()) 0x9e3779b9 = 2^32 / (golden ratio). More...
 

Detailed Description

Utility class to compute hash values.

Member Function Documentation

◆ CrcHash()

static uint32_t arrow::HashUtil::CrcHash ( const void *  data,
int32_t  bytes,
uint32_t  hash 
)
inlinestatic

Compute the Crc32 hash for data using SSE4 instructions.

The input hash parameter is the current hash/seed value. This should only be called if SSE is supported. This is ~4x faster than Fnv/Boost Hash. TODO: crc32 hashes with different seeds do not result in different hash functions. The resulting hashes are correlated. TODO: update this to also use SSE4_crc32_u64 and SSE4_crc32_u16 where appropriate.

◆ CrcHash1()

static uint32_t arrow::HashUtil::CrcHash1 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 1-byte data.

◆ CrcHash12()

static uint32_t arrow::HashUtil::CrcHash12 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 12-byte data.

◆ CrcHash16()

static uint32_t arrow::HashUtil::CrcHash16 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 16-byte data.

◆ CrcHash2()

static uint32_t arrow::HashUtil::CrcHash2 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 2-byte data.

◆ CrcHash4()

static uint32_t arrow::HashUtil::CrcHash4 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 4-byte data.

◆ CrcHash8()

static uint32_t arrow::HashUtil::CrcHash8 ( const void *  v,
uint32_t  hash 
)
inlinestatic

CrcHash() specialized for 8-byte data.

◆ FnvHash64()

static uint64_t arrow::HashUtil::FnvHash64 ( const void *  data,
int32_t  bytes,
uint64_t  hash 
)
inlinestatic

Implementation of the Fowler-Noll-Vo hash function.

This is not as performant as boost's hash on int types (2x slower) but has bit entropy. For ints, boost just returns the value of the int which can be pathological. For example, if the data is <1000, 2000, 3000, 4000, ..> and then the mod of 1000 is taken on the hash, all values will collide to the same bucket. For string values, Fnv is slightly faster than boost. IMPORTANT: FNV hash suffers from poor diffusion of the least significant bit, which can lead to poor results when input bytes are duplicated. See FnvHash64to32() for how this can be mitigated.

◆ FnvHash64to32()

static uint32_t arrow::HashUtil::FnvHash64to32 ( const void *  data,
int32_t  bytes,
uint32_t  hash 
)
inlinestatic

Return a 32-bit hash computed by invoking FNV-64 and folding the result to 32-bits.

This technique is recommended instead of FNV-32 since the LSB of an FNV hash is the XOR of the LSBs of its input bytes, leading to poor results for duplicate inputs. The input seed 'hash' is duplicated so the top half of the seed is not all zero. Data length must be at least 1 byte: zero-length data should be handled separately, for example using CombineHash with a unique constant value to avoid returning the hash argument. Zero-length data gives terrible results: the initial hash value is xored with itself cancelling all bits.

◆ Hash()

static uint32_t arrow::HashUtil::Hash ( const void *  data,
int32_t  bytes,
uint32_t  seed 
)
inlinestatic

Computes the hash value for data.

Will call either CrcHash or MurmurHash depending on hardware capabilities. Seed values for different steps of the query execution should use different seeds to prevent accidental key collisions. (See IMPALA-219 for more details).

◆ HashCombine32()

static uint32_t arrow::HashUtil::HashCombine32 ( uint32_t  value,
uint32_t  seed 
)
inlinestatic

Combine hashes 'value' and 'seed' to get a new hash value.

Similar to boost::hash_combine(), but for uint32_t. This function should be used with a constant first argument to update the hash value for zero-length values such as NULL, boolean, and empty strings.

◆ MurmurHash2_64()

static uint64_t arrow::HashUtil::MurmurHash2_64 ( const void *  input,
int  len,
uint64_t  seed 
)
inlinestatic

Murmur2 hash implementation returning 64-bit hashes.

◆ Rehash32to32()

static uint32_t arrow::HashUtil::Rehash32to32 ( const uint32_t  hash)
inlinestatic

◆ Rehash32to64()

static uint64_t arrow::HashUtil::Rehash32to64 ( const uint32_t  hash)
inlinestatic

Member Data Documentation

◆ FNV64_PRIME

const uint64_t arrow::HashUtil::FNV64_PRIME = 1099511628211UL
static

◆ FNV64_SEED

const uint64_t arrow::HashUtil::FNV64_SEED = 14695981039346656037UL
static

◆ FNV_PRIME

const uint32_t arrow::HashUtil::FNV_PRIME = 0x01000193
static

default values recommended by http://isthe.com/chongo/tech/comp/fnv/

◆ FNV_SEED

const uint32_t arrow::HashUtil::FNV_SEED = 0x811C9DC5
static

◆ HASH_COMBINE_SEED

const uint32_t arrow::HashUtil::HASH_COMBINE_SEED = 0x9e3779b9
static

The magic number (used in hash_combine()) 0x9e3779b9 = 2^32 / (golden ratio).

◆ MURMUR_PRIME

const uint64_t arrow::HashUtil::MURMUR_PRIME = 0xc6a4a7935bd1e995
static

◆ MURMUR_R

const int arrow::HashUtil::MURMUR_R = 47
static

The documentation for this class was generated from the following file: