Class Hashes

java.lang.Object
it.unimi.dsi.sux4j.mph.Hashes

public class Hashes extends Object
Basic hash functions.

This class provides a number of hash functions for bit vectors. For this reason, they are sometimes tweaked with respect to the official C code, which can assume at least byte granularity. In particular, we provide:

In particular, SpookyHash is available in the 4-word-state version and in the 12-word-state version. The 12-word-state version delegates to the 4-word-state version for short keys (i.e., less than 12 words).

All functions (with the exception of the 12-word-state version of SpookyHash) provide preprocessing: a linear pass on the bit vector generates a vector storing the internal state of the function after each mixing round. Using the preprocessed state, one can hash any prefix of the bit vector in constant time.

It is your responsibility to use preprocessing correctly: using the state vector of the wrong bit vector will generate unpredictable results.

  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    jenkins(long[] triple, long seed, long[] h)
    Jenkins 64-bit hashing (all three values produced) for a triple of longs.
    static long
    Jenkins 64-bit hashing.
    static long
    jenkins(BitVector bv, long seed)
    Jenkins 64-bit hashing.
    static void
    jenkins(BitVector bv, long seed, long[] h)
    Jenkins 64-bit hashing (all three values produced).
    static long
    jenkins(BitVector bv, long prefixLength, long[] aa, long[] bb, long[] cc)
    Constant-time Jenkins 64-bit hashing for any prefix.
    static void
    jenkins(BitVector bv, long prefixLength, long[] aa, long[] bb, long[] cc, long[] h)
    Constant-time Jenkins 64-bit hashing for any prefix (all three values produced).
    static void
    main(String[] arg)
    A simple test to check the relative speed of various hashes on your architecture.
    static long
    murmur(BitVector bv, long seed)
    MurmurHash 64-bit
    static long
    murmur(BitVector bv, long prefixLength, long[] state)
    Constant-time MurmurHash 64-bit hashing for any prefix.
    static long
    murmur(BitVector bv, long prefixLength, long[] state, long lcp)
    Constant-time MurmurHash 64-bit hashing reusing precomputed state partially.
    static long
    murmur3(BitVector bv, long seed)
    MurmurHash3 64-bit
    static void
    murmur3(BitVector bv, long seed, long[] h)
    MurmurHash3 128-bit
    static long
    murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2)
    Constant-time MurmurHash3 64-bit hashing for any prefix.
    static long
    murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long lcp)
    Constant-time MurmurHash3 64-bit hashing reusing precomputed state partially.
    static void
    murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long[] h)
    Constant-time MurmurHash3 128-bit hashing for any prefix.
    static void
    murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long lcp, long[] h)
    Constant-time MurmurHash3 128-bit hashing reusing precomputed state partially.
    static long[][]
    preprocessJenkins(BitVector bv, long seed)
    Preprocesses a bit vector so that Jenkins 64-bit hashing can be computed in constant time on all prefixes.
    static long[]
    preprocessMurmur(BitVector bv, long seed)
    Preprocesses a bit vector so that MurmurHash 64-bit can be computed in constant time on all prefixes.
    static long[][]
    preprocessMurmur3(BitVector bv, long seed)
    Preprocesses a bit vector so that MurmurHash3 can be computed in constant time on all prefixes.
    static long[]
    preprocessSpooky4(BitVector bv, long seed)
    Preprocesses a bit vector so that SpookyHash 4-word-state can be computed in constant time on all prefixes.
    static void
    spooky12(BitVector bv, long seed, long[] tuple)
    SpookyHash 12-word-state (up to four values produced).
    static void
    spooky4(long[] triple, long seed, long[] tuple)
    SpookyHash (up to four values produced) for a triple of longs.
    static void
    spooky4(long x, long y, long seed, long[] tuple)
    SpookyHash (up to four values produced) for a pair of longs.
    static long
    spooky4(BitVector bv, long seed)
    SpookyHash 4-word-state (up to four values produced).
    static void
    spooky4(BitVector bv, long seed, long[] tuple)
    SpookyHash 4-word-state (up to four values produced).
    static long
    spooky4(BitVector bv, long prefixLength, long seed, long[] state)
    Constant-time SpookyHash hashing reusing precomputed state.
    static void
    spooky4(BitVector bv, long prefixLength, long seed, long[] state, long[] tuple)
    Constant-time SpookyHash 4-word-state hashing reusing precomputed state.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • jenkins

      public static void jenkins(BitVector bv, long seed, long[] h)
      Jenkins 64-bit hashing (all three values produced).

      This code is based on the lookup8.c, and in particular on the version consuming 64 bits at a time, but it has been slightly modified to work correctly with any bit vector length (not just multiples of 64). Moreover, we return all three values generated by the algorithm.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      h - a triple of long values in which the three generated hashes will be saved.
    • jenkins

      public static long jenkins(BitVector bv, long seed)
      Jenkins 64-bit hashing.
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      the third of the three hash values returned by jenkins(BitVector, long, long[]).
    • jenkins

      public static long jenkins(BitVector bv)
      Jenkins 64-bit hashing.
      Parameters:
      bv - a bit vector.
      Returns:
      the third of the three hash values returned by jenkins(BitVector, long, long[]) with seed 0.
    • preprocessJenkins

      public static long[][] preprocessJenkins(BitVector bv, long seed)
      Preprocesses a bit vector so that Jenkins 64-bit hashing can be computed in constant time on all prefixes.
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      an array of three element; each element is an array containing the state of the variables a, b and c during the hash computation; these vector must be passed to jenkins(BitVector, long, long[], long[], long[]) (and analogous methods) in this order.
      See Also:
    • jenkins

      public static void jenkins(BitVector bv, long prefixLength, long[] aa, long[] bb, long[] cc, long[] h)
      Constant-time Jenkins 64-bit hashing for any prefix (all three values produced).
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      aa - the first state array returned by preprocessJenkins(BitVector, long).
      bb - the second state array returned by preprocessJenkins(BitVector, long).
      cc - the third state array returned by preprocessJenkins(BitVector, long).
      h - a triple of long values in which the three generated hashes will be saved.
    • jenkins

      public static long jenkins(BitVector bv, long prefixLength, long[] aa, long[] bb, long[] cc)
      Constant-time Jenkins 64-bit hashing for any prefix.
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      aa - the first state array returned by preprocessJenkins(BitVector, long).
      bb - the second state array returned by preprocessJenkins(BitVector, long).
      cc - the third state array returned by preprocessJenkins(BitVector, long).
      Returns:
      the third of the three hash values returned by jenkins(BitVector, long, long[], long[], long[], long[]).
    • jenkins

      public static void jenkins(long[] triple, long seed, long[] h)
      Jenkins 64-bit hashing (all three values produced) for a triple of longs.
      Parameters:
      triple - three longs.
      seed - a seed for the hash.
      h - a triple of long values in which the three generated hashes will be saved.
    • murmur

      public static long murmur(BitVector bv, long seed)
      MurmurHash 64-bit

      This code is based on a mix of the sources that can be found at MurmurHash's web site, and in particular on the version consuming 64 bits at a time, which has been merged with the 2A version to obtain an incremental implementation.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      the hash.
    • murmur

      public static long murmur(BitVector bv, long prefixLength, long[] state)
      Constant-time MurmurHash 64-bit hashing for any prefix.
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      state - the state array returned by preprocessMurmur(BitVector, long).
      Returns:
      the hash for the prefix of bv or prefixLength bits.
    • murmur

      public static long murmur(BitVector bv, long prefixLength, long[] state, long lcp)
      Constant-time MurmurHash 64-bit hashing reusing precomputed state partially.
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      state - the state array returned by preprocessMurmur(BitVector, long).
      lcp - the length of the longest common prefix between bv and the vector over which state was computed.
      Returns:
      the hash for the prefix of bv or prefixLength bits.
    • preprocessMurmur

      public static long[] preprocessMurmur(BitVector bv, long seed)
      Preprocesses a bit vector so that MurmurHash 64-bit can be computed in constant time on all prefixes.
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      an array containing the state of the variables h during the hash computation; these vector must be passed to murmur(BitVector, long, long[]) (and analogous methods) in this order.
      See Also:
    • murmur3

      public static void murmur3(BitVector bv, long seed, long[] h)
      MurmurHash3 128-bit

      This code is based on a mix of the sources that can be found at MurmurHash's web site.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      h - a pair of long values in which the two generated hashes will be saved.
    • murmur3

      public static long murmur3(BitVector bv, long seed)
      MurmurHash3 64-bit

      This code is based on a mix of the sources that can be found at MurmurHash's web site.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      a hash.
    • murmur3

      public static void murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long[] h)
      Constant-time MurmurHash3 128-bit hashing for any prefix.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      hh1 - the first component of the state array returned by preprocessMurmur3(BitVector, long).
      hh2 - the second component of the state array returned by preprocessMurmur3(BitVector, long).
      cc1 - the third component of the state array returned by preprocessMurmur3(BitVector, long).
      cc2 - the fourth component of the state array returned by preprocessMurmur3(BitVector, long).
      h - a pair of long values in which the two generated hashes will be saved.
    • murmur3

      public static long murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2)
      Constant-time MurmurHash3 64-bit hashing for any prefix.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      hh1 - the first component of the state array returned by preprocessMurmur3(BitVector, long).
      hh2 - the second component of the state array returned by preprocessMurmur3(BitVector, long).
      cc1 - the third component of the state array returned by preprocessMurmur3(BitVector, long).
      cc2 - the fourth component of the state array returned by preprocessMurmur3(BitVector, long).
      Returns:
      a hash.
    • murmur3

      public static void murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long lcp, long[] h)
      Constant-time MurmurHash3 128-bit hashing reusing precomputed state partially.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      hh1 - the first component of the state array returned by preprocessMurmur3(BitVector, long).
      hh2 - the second component of the state array returned by preprocessMurmur3(BitVector, long).
      cc1 - the third component of the state array returned by preprocessMurmur3(BitVector, long).
      cc2 - the fourth component of the state array returned by preprocessMurmur3(BitVector, long).
      lcp - the length of the longest common prefix between bv and the vector over which state was computed.
      h - a pair of long values in which the two generated hashes will be saved.
    • murmur3

      public static long murmur3(BitVector bv, long prefixLength, long[] hh1, long[] hh2, long[] cc1, long[] cc2, long lcp)
      Constant-time MurmurHash3 64-bit hashing reusing precomputed state partially.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which the hash must be computed.
      hh1 - the first component of the state array returned by preprocessMurmur3(BitVector, long).
      hh2 - the second component of the state array returned by preprocessMurmur3(BitVector, long).
      cc1 - the third component of the state array returned by preprocessMurmur3(BitVector, long).
      cc2 - the fourth component of the state array returned by preprocessMurmur3(BitVector, long).
      lcp - the length of the longest common prefix between bv and the vector over which state was computed.
      Returns:
      a hash.
    • preprocessMurmur3

      public static long[][] preprocessMurmur3(BitVector bv, long seed)
      Preprocesses a bit vector so that MurmurHash3 can be computed in constant time on all prefixes.

      Warning: this code is still experimental.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      an array of four component arrays containing the state of the variables h1, h2, c1 and c2 during the hash computation; these vector must be passed to murmur3(BitVector, long, long[], long[], long[], long[]) (and analogous methods) in this order.
      See Also:
    • spooky4

      public static void spooky4(BitVector bv, long seed, long[] tuple)
      SpookyHash 4-word-state (up to four values produced).
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      tuple - a tuple of longs in which up to four generated hashes will be saved.
    • spooky4

      public static long spooky4(BitVector bv, long seed)
      SpookyHash 4-word-state (up to four values produced).
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      the first of the four available hash values.
    • spooky12

      public static void spooky12(BitVector bv, long seed, long[] tuple)
      SpookyHash 12-word-state (up to four values produced).

      This method will automatically switch to spooky4(BitVector, long, long[]) when the provided bit vector is shorter than 768 bits.

      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      tuple - a tuple of longs in which up to four generated hashes will be saved.
    • preprocessSpooky4

      public static long[] preprocessSpooky4(BitVector bv, long seed)
      Preprocesses a bit vector so that SpookyHash 4-word-state can be computed in constant time on all prefixes.
      Parameters:
      bv - a bit vector.
      seed - a seed for the hash.
      Returns:
      an array containing the four internal words of state during the hash computation; it can be passed to spooky4(BitVector, long, long, long[], long[]) (and analogous methods).
      See Also:
    • spooky4

      public static void spooky4(BitVector bv, long prefixLength, long seed, long[] state, long[] tuple)
      Constant-time SpookyHash 4-word-state hashing reusing precomputed state.
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which to compute the hash.
      seed - a seed for the hash.
      state - the state vector returned by preprocessSpooky4(BitVector, long); note that seed must be the same.
      tuple - a tuple of longs in which up to four generated hashes will be saved.
    • spooky4

      public static long spooky4(BitVector bv, long prefixLength, long seed, long[] state)
      Constant-time SpookyHash hashing reusing precomputed state.
      Parameters:
      bv - a bit vector.
      prefixLength - the length of the prefix of bv over which to compute the hash. *
      seed - a seed for the hash.
      state - the state vector returned by preprocessSpooky4(BitVector, long); note that seed must be the same.
      Returns:
      the first of the four available hash values.
    • spooky4

      public static void spooky4(long[] triple, long seed, long[] tuple)
      SpookyHash (up to four values produced) for a triple of longs.
      Parameters:
      triple - three longs.
      seed - a seed for the hash.
      tuple - a tuple of longs in which up to four generated hashes will be saved.
    • spooky4

      public static void spooky4(long x, long y, long seed, long[] tuple)
      SpookyHash (up to four values produced) for a pair of longs.
      Parameters:
      x - the first long.
      y - the second long.
      seed - a seed for the hash.
      tuple - a tuple of longs in which up to four generated hashes will be saved.
    • main

      public static void main(String[] arg)
      A simple test to check the relative speed of various hashes on your architecture.
      Parameters:
      arg - the length of the bit vector to hash, and then the number of evaluations.