Class TwoStepsGOV3Function<T>

All Implemented Interfaces:
Function<T,Long>, Object2LongFunction<T>, Size64, Serializable, Function<T,Long>, ToLongFunction<T>

public class TwoStepsGOV3Function<T> extends AbstractHashFunction<T> implements Serializable, Size64
A function stored using two GOV3Functions—one for frequent values, and one for infrequent values. This naive idea turns out to be very effective in reducing the function size when the distribution of values is skewed (e.g., as it happens in a TwoStepsLcpMonotoneMinimalPerfectHashFunction).

To create an instance, we perform a pre-scan of the values to be assigned. If possible, we finds the best possible r such that the 2r − 1 most frequent values can be stored in a GOV3Function and suitably remapped when read. The function uses 2r − 1 as an escape symbol for all other values, which are stored in a separate function.

Warning: during the construction phase, a filter will be set on the BucketedHashStore used to store the keys. If you are passing a store, you will have to reset it to its previous state.

Sebastiano Vigna
See Also:
  • Field Details

    • serialVersionUID

      public static final long serialVersionUID
      See Also:
    • n

      protected final long n
      The number of keys.
    • transform

      protected final TransformationStrategy<? super T> transform
      The transformation strategy to turn objects of type T into bit vectors.
    • firstFunction

      protected final GOV3Function<T> firstFunction
      The first function, or null. The special output value escape denotes that secondFunction should be queried instead.
    • secondFunction

      protected final GOV3Function<T> secondFunction
      The second function. All queries for which firstFunction returns escape (or simply all queries, if firstFunction is null) will be rerouted here.
    • remap

      protected final long[] remap
      A mapping from values of the first function to actual values, provided that there is a first function.
    • escape

      protected final int escape
      The escape value returned by firstFunction to suggest that secondFunction should be queried instead, provided that there is a first function.
    • seed

      protected long seed
      The seed to be used when converting keys to signatures.
    • width

      protected final int width
      The width of the output of this function, in bits.
    • rankMean

      protected final double rankMean
      The mean of the rank distribution.
  • Constructor Details

    • TwoStepsGOV3Function

      protected TwoStepsGOV3Function(Iterable<? extends T> keys, TransformationStrategy<? super T> transform, LongBigList values, File tempDir, BucketedHashStore<T> bucketedHashStore) throws IOException
      Creates a new two-step function for the given keys and values.
      keys - the keys in the domain of the function.
      transform - a transformation strategy for the keys.
      values - values to be assigned to each key, in the same order of the iterator returned by keys; if null, the assigned value will the ordinal number of each key.
      tempDir - a temporary directory for the store files, or null for the standard temporary directory.
      bucketedHashStore - a bucketed hash store containing the keys associated with their rank, or null; the store can be unchecked, but in this case keys and transform must be non-null.
  • Method Details