Class DictionaryEncoder

java.lang.Object
org.apache.arrow.vector.dictionary.DictionaryEncoder

public class DictionaryEncoder extends Object
Encoder/decoder for dictionary-encoded ValueVectors. Dictionary encoding produces an integer ValueVector in which each entry is an index into the dictionary; the dictionary itself can hold values of any type.
  • Constructor Details

  • Method Details

    • encode

      public static ValueVector encode(ValueVector vector, Dictionary dictionary)
      Dictionary-encodes a vector using the provided dictionary. The dictionary must contain all values in the vector.
      Parameters:
      vector - vector to encode
      dictionary - dictionary used for encoding
      Returns:
      dictionary encoded vector
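
The encoding step described above can be illustrated with a minimal plain-Java sketch. It does not use the Arrow API: `EncodeSketch`, its `encode` method, and the use of `List<String>` in place of a ValueVector and a Dictionary are all hypothetical stand-ins, and null handling is left out. It only shows the idea of replacing each value with its dictionary index and rejecting values the dictionary does not contain.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, not the Arrow API.
final class EncodeSketch {

    // Replaces each value with its position in the dictionary.
    // Throws if a value is missing, mirroring the documented requirement
    // that the dictionary contain every value in the vector.
    static int[] encode(List<String> values, List<String> dictionary) {
        // Build a value -> index lookup table once, up front.
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            indexOf.putIfAbsent(dictionary.get(i), i);
        }

        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            Integer index = indexOf.get(values.get(i));
            if (index == null) {
                throw new IllegalArgumentException(
                    "Value not found in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }
}
```

With a dictionary of `["foo", "bar", "baz"]`, the values `["foo", "bar", "foo", "baz"]` would be encoded as the indices 0, 1, 0, 2 in this sketch.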
    • decode

      public static ValueVector decode(ValueVector indices, Dictionary dictionary)
      Decodes a dictionary-encoded vector using the provided dictionary.
      Parameters:
      indices - dictionary encoded values, must be int type
      dictionary - dictionary used to decode the values
      Returns:
      vector with values restored from dictionary
    • decode

      public static ValueVector decode(ValueVector indices, Dictionary dictionary, BufferAllocator allocator)
      Decodes a dictionary-encoded vector using the provided dictionary.
      Parameters:
      indices - dictionary encoded values, must be int type
      dictionary - dictionary used to decode the values
      allocator - allocator used for the decoded values
      Returns:
      vector with values restored from dictionary
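
Decoding is the inverse lookup: each index selects a dictionary entry by position. The following is a minimal plain-Java sketch of that idea, not the Arrow implementation. `DecodeSketch` and its `decode` method are hypothetical, an `int[]` stands in for the index vector, a `List<String>` stands in for the dictionary, and allocators and nulls are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; hypothetical names, not the Arrow API.
final class DecodeSketch {

    // Restores the original values by looking up each index in the dictionary.
    static List<String> decode(int[] indices, List<String> dictionary) {
        List<String> decoded = new ArrayList<>(indices.length);
        for (int index : indices) {
            if (index < 0 || index >= dictionary.size()) {
                throw new IllegalArgumentException(
                    "Index out of dictionary range: " + index);
            }
            // A positional lookup is all that decoding needs.
            decoded.add(dictionary.get(index));
        }
        return decoded;
    }
}
```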
    • getIndexType

      public static ArrowType.Int getIndexType(int valueCount)
      Gets the index type based on the value count of the dictionary vector.
      Parameters:
      valueCount - value count of the dictionary vector
      Returns:
      index type.
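
One plausible way to derive an index type from a value count is to pick the narrowest signed integer width that can address every dictionary entry. The sketch below illustrates that idea in plain Java. The class `IndexWidthSketch`, the method `indexBitWidth`, and the exact thresholds are assumptions made for illustration; the cutoffs Arrow actually uses are not stated in this page and may differ.

```java
// Plain-Java illustration only; hypothetical names and assumed thresholds.
final class IndexWidthSketch {

    // Returns 8, 16, or 32: the bit width of the narrowest signed integer
    // type whose maximum value is at least valueCount.
    static int indexBitWidth(int valueCount) {
        if (valueCount < 0) {
            throw new IllegalArgumentException("valueCount must be non-negative");
        }
        if (valueCount <= Byte.MAX_VALUE) {
            return 8;
        }
        if (valueCount <= Short.MAX_VALUE) {
            return 16;
        }
        return 32;
    }
}
```

A narrower index type keeps the encoded vector small when the dictionary has few entries, which is the usual motivation for choosing the width from the value count.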
    • encode

      public ValueVector encode(ValueVector vector)
      Encodes a vector using the hash table built in this encoder.
    • decode

      public ValueVector decode(ValueVector indices)
      Decodes a vector using the dictionary in this encoder. If only decoding is required, use decode(ValueVector, Dictionary, BufferAllocator) instead: it avoids building the DictionaryHashTable, which is only needed for encoding.
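
The trade-off between the instance methods and the static decode can be sketched in plain Java. This is an illustration under assumptions, not Arrow code: `ReusableEncoderSketch` is a hypothetical class in which a `HashMap` stands in for the DictionaryHashTable. The lookup table is built once in the constructor and reused by every `encode` call, while `decode` never touches it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, not the Arrow API.
final class ReusableEncoderSketch {
    private final List<String> dictionary;
    // Stand-in for the hash table: built once, used only by encode().
    private final Map<String, Integer> indexOf = new HashMap<>();

    ReusableEncoderSketch(List<String> dictionary) {
        this.dictionary = dictionary;
        for (int i = 0; i < dictionary.size(); i++) {
            indexOf.putIfAbsent(dictionary.get(i), i);
        }
    }

    // Reuses the prebuilt table, so encoding many batches pays the
    // table-construction cost only once.
    int[] encode(List<String> values) {
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            Integer index = indexOf.get(values.get(i));
            if (index == null) {
                throw new IllegalArgumentException(
                    "Value not found in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }

    // Needs only positional lookups; the table is irrelevant here.
    List<String> decode(int[] indices) {
        List<String> decoded = new ArrayList<>(indices.length);
        for (int index : indices) {
            decoded.add(dictionary.get(index));
        }
        return decoded;
    }
}
```

In this sketch, constructing the object solely to call `decode` would build a table that is never read, which is the cost the static overload is documented to avoid.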