Class HashTableDictionaryEncoder<E extends BaseIntVector,D extends ElementAddressableVector>

java.lang.Object
org.apache.arrow.algorithm.dictionary.HashTableDictionaryEncoder<E,D>
Type Parameters:
E - encoded vector type.
D - decoded vector type, which is also the dictionary type.
All Implemented Interfaces:
DictionaryEncoder<E,D>

public class HashTableDictionaryEncoder<E extends BaseIntVector,D extends ElementAddressableVector> extends Object implements DictionaryEncoder<E,D>
A dictionary encoder that uses a hash table to look up each value's index in the dictionary.
  • Constructor Details

    • HashTableDictionaryEncoder

      public HashTableDictionaryEncoder(D dictionary)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary.
    • HashTableDictionaryEncoder

      public HashTableDictionaryEncoder(D dictionary, boolean encodeNull)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary.
      encodeNull - a flag indicating whether nulls should be encoded. It determines how null values in the input are handled during encoding and decoding.
        • For encoding, when a null is encountered in the input: 1) if the flag is true, the encoder searches for null in the dictionary and outputs its dictionary index; 2) if the flag is false, the encoder simply produces a null in the output.
        • For decoding: 1) if the flag is true, the decoder should never encounter a null in the input; 2) if the flag is false, a null in the input simply produces a null in the output.
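The effect of encodeNull on encoding can be illustrated with a minimal sketch on plain Java lists. It does not use Arrow vectors or this class; the class name EncodeNullSketch, the use of java.util.HashMap, and the exception thrown for a value missing from the dictionary are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: models encodeNull on plain lists, not on Arrow vectors. */
final class EncodeNullSketch {

  static Integer[] encode(List<String> dictionary, List<String> input, boolean encodeNull) {
    // Map each dictionary value to its index; java.util.HashMap permits a null key.
    Map<String, Integer> indexOf = new HashMap<>();
    for (int i = 0; i < dictionary.size(); i++) {
      indexOf.putIfAbsent(dictionary.get(i), i);
    }

    Integer[] output = new Integer[input.size()];
    for (int i = 0; i < input.size(); i++) {
      String value = input.get(i);
      if (value == null && !encodeNull) {
        output[i] = null; // flag is false: a null input yields a null output
        continue;
      }
      // Flag is true (or value is non-null): look the value up like any other.
      Integer index = indexOf.get(value);
      if (index == null) {
        // Assumed failure mode for this sketch.
        throw new IllegalArgumentException("Value not in dictionary at position " + i);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> dictionary = Arrays.asList("a", null, "b");
    List<String> input = Arrays.asList("b", null, "a");
    System.out.println(Arrays.toString(encode(dictionary, input, true)));  // prints [2, 1, 0]
    System.out.println(Arrays.toString(encode(dictionary, input, false))); // prints [2, null, 0]
  }
}
```

With the flag set to true, the dictionary in this sketch must itself contain a null entry, otherwise the lookup fails.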
    • HashTableDictionaryEncoder

      public HashTableDictionaryEncoder(D dictionary, boolean encodeNull, ArrowBufHasher hasher)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary.
      encodeNull - a flag indicating whether nulls should be encoded. It determines how null values in the input are handled during encoding. When a null is encountered in the input: 1) if the flag is true, the encoder searches for null in the dictionary and outputs its dictionary index; 2) if the flag is false, the encoder simply produces a null in the output.
      hasher - the hasher used to compute the hash codes of vector elements.
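To show why a hasher can be supplied separately, here is a rough sketch in which the hash-table key delegates hashCode() to a pluggable function while equality still compares content. The names PluggableHasherSketch and BytesKey are hypothetical, and java.util.function.ToIntFunction merely stands in for ArrowBufHasher; none of this is Arrow's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: a hash table whose keys hash through a caller-supplied function. */
final class PluggableHasherSketch {

  /** Key wrapping raw bytes; hashCode() delegates to the supplied hasher. */
  static final class BytesKey {
    final byte[] bytes;
    final ToIntFunction<byte[]> hasher;

    BytesKey(byte[] bytes, ToIntFunction<byte[]> hasher) {
      this.bytes = bytes;
      this.hasher = hasher;
    }

    @Override
    public int hashCode() {
      return hasher.applyAsInt(bytes);
    }

    @Override
    public boolean equals(Object other) {
      // Equality compares content, so a weak hasher costs speed, not correctness.
      return other instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) other).bytes);
    }
  }

  static int[] encode(String[] dictionary, String[] input, ToIntFunction<byte[]> hasher) {
    Map<BytesKey, Integer> indexOf = new HashMap<>();
    for (int i = 0; i < dictionary.length; i++) {
      byte[] bytes = dictionary[i].getBytes(StandardCharsets.UTF_8);
      indexOf.putIfAbsent(new BytesKey(bytes, hasher), i);
    }
    int[] output = new int[input.length];
    for (int i = 0; i < input.length; i++) {
      byte[] bytes = input[i].getBytes(StandardCharsets.UTF_8);
      Integer index = indexOf.get(new BytesKey(bytes, hasher));
      if (index == null) {
        throw new IllegalArgumentException("Value not in dictionary: " + input[i]);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    String[] dictionary = {"x", "y", "z"};
    String[] input = {"z", "x", "z"};
    // A content-based hasher and a degenerate one that makes every key collide.
    int[] withGoodHasher = encode(dictionary, input, bytes -> Arrays.hashCode(bytes));
    int[] withConstantHasher = encode(dictionary, input, bytes -> 0);
    System.out.println(Arrays.toString(withGoodHasher));     // prints [2, 0, 2]
    System.out.println(Arrays.toString(withConstantHasher)); // prints [2, 0, 2]
  }
}
```

Both hashers give the same encoding in this sketch; the choice only affects how evenly keys spread across buckets, and therefore lookup speed.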
  • Method Details

    • encode

      public void encode(D input, E output)
      Encodes an input vector using a hash table. Each element needs one hash lookup, which takes constant time on average, so the algorithm takes O(n) time, where n is the length of the input vector.
      Specified by:
      encode in interface DictionaryEncoder<E extends BaseIntVector,D extends ElementAddressableVector>
      Parameters:
      input - the input vector.
      output - the output vector.
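The O(n) behavior described for encode can be sketched on plain Java collections as follows. This is not the Arrow implementation: the class name HashEncodeSketch, the use of lists in place of vectors, and the exception for a value missing from the dictionary are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: hash-table dictionary encoding on plain lists, not Arrow vectors. */
final class HashEncodeSketch {

  /** Encodes each input value as its index in the dictionary. */
  static int[] encode(List<String> dictionary, List<String> input) {
    // Build the value -> index table once: O(m) for a dictionary of m entries.
    Map<String, Integer> indexOf = new HashMap<>();
    for (int i = 0; i < dictionary.size(); i++) {
      indexOf.putIfAbsent(dictionary.get(i), i); // keep the first index of a duplicate
    }

    // One lookup per element, constant time on average: O(n) for n input values.
    int[] output = new int[input.size()];
    for (int i = 0; i < input.size(); i++) {
      Integer index = indexOf.get(input.get(i));
      if (index == null) {
        // Assumed failure mode for this sketch.
        throw new IllegalArgumentException("Value not in dictionary: " + input.get(i));
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    int[] encoded = encode(
        List.of("apple", "banana", "cherry"),
        List.of("banana", "apple", "banana", "cherry"));
    System.out.println(Arrays.toString(encoded)); // prints [1, 0, 1, 2]
  }
}
```

The table is built once from the dictionary, so repeated calls with different inputs would only pay for the per-element lookups.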