Class LinearDictionaryEncoder<E extends BaseIntVector,D extends ValueVector>

java.lang.Object
org.apache.arrow.algorithm.dictionary.LinearDictionaryEncoder<E,D>
Type Parameters:
E - encoded vector type.
D - decoded vector type, which is also the dictionary type.
All Implemented Interfaces:
DictionaryEncoder<E,D>

public class LinearDictionaryEncoder<E extends BaseIntVector,D extends ValueVector> extends Object implements DictionaryEncoder<E,D>
Dictionary encoder based on linear search.
  • Constructor Details

    • LinearDictionaryEncoder

      public LinearDictionaryEncoder(D dictionary)
      Constructs a dictionary encoder with the encodeNull flag set to false.
      Parameters:
      dictionary - the dictionary. Its entries should be sorted in non-increasing order of frequency (most frequent first). If they are not, the encoder still produces correct results, but encoding is slower.
    • LinearDictionaryEncoder

      public LinearDictionaryEncoder(D dictionary, boolean encodeNull)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary. Its entries should be sorted in non-increasing order of frequency (most frequent first). If they are not, the encoder still produces correct results, but encoding is slower.
      encodeNull - a flag indicating whether null values should be encoded. It determines how nulls in the input are handled during encoding. When a null is encountered in the input: 1) if the flag is true, the encoder searches for the null value in the dictionary and outputs its index in the dictionary; 2) if the flag is false, the encoder simply writes a null to the output.
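
The two null-handling modes described above can be illustrated with a minimal plain-Java sketch. It is not the Arrow implementation: arrays stand in for vectors, a null array element stands in for a null slot, and the class name NullHandlingSketch and its encode helper are hypothetical. Throwing an exception for a value missing from the dictionary is an assumption of this sketch, not something this page specifies.

```java
/** Plain-Java illustration only; this is not the Arrow implementation. */
final class NullHandlingSketch {

  /**
   * Encodes input against dictionary by linear search.
   * A null element in the returned array stands for a null slot in the output.
   */
  static Integer[] encode(String[] dictionary, String[] input, boolean encodeNull) {
    // A "fresh" output: every slot starts out as null.
    Integer[] output = new Integer[input.length];
    for (int i = 0; i < input.length; i++) {
      if (input[i] == null && !encodeNull) {
        continue; // flag is false: leave a null in the output
      }
      // Flag is true, or the value is non-null: search the dictionary front to back.
      int index = -1;
      for (int j = 0; j < dictionary.length; j++) {
        if (java.util.Objects.equals(dictionary[j], input[i])) {
          index = j;
          break;
        }
      }
      if (index < 0) {
        // Assumption of this sketch: a value missing from the dictionary is an error.
        throw new IllegalArgumentException("value not found in dictionary at position " + i);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    String[] dictionary = {"a", "b", null};
    String[] input = {"b", null, "a"};
    // encodeNull = true: the null is looked up and encoded as index 2.
    System.out.println(java.util.Arrays.toString(encode(dictionary, input, true)));  // [1, 2, 0]
    // encodeNull = false: the null is passed through as a null output slot.
    System.out.println(java.util.Arrays.toString(encode(dictionary, input, false))); // [1, null, 0]
  }
}
```

Note that with the flag set to true, the dictionary itself must contain a null entry for the lookup to succeed in this sketch.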
  • Method Details

    • encode

      public void encode(D input, E output)
      Encodes an input vector by linear search. When the dictionary is sorted in non-increasing order of entry frequency, frequent values are found near the front of the dictionary, so the expected cost per element approaches constant time. No extra memory is required.
      Specified by:
      encode in interface DictionaryEncoder<E extends BaseIntVector,D extends ValueVector>
      Parameters:
      input - the input vector.
      output - the output vector. It must be in a fresh state; at a minimum, all of its validity bits must be clear.
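
To see why the dictionary order matters for a linear search, the following plain-Java sketch counts element comparisons for the same input under two dictionary orders. It only models the search cost and is not taken from the Arrow source; the class name FrequencyOrderSketch and the comparisons helper are hypothetical, and every input value is assumed to be non-null and present in the dictionary.

```java
/** Plain-Java illustration of linear-search cost; not the Arrow implementation. */
final class FrequencyOrderSketch {

  /** Counts how many dictionary entries are compared while encoding the whole input. */
  static long comparisons(String[] dictionary, String[] input) {
    long count = 0;
    for (String value : input) {
      for (String entry : dictionary) {
        count++;
        if (entry.equals(value)) {
          break; // found: stop scanning the dictionary for this value
        }
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // "a" occurs 8 times, "b" and "c" once each.
    String[] input = {"a", "a", "a", "a", "a", "a", "a", "a", "b", "c"};

    String[] byFrequency = {"a", "b", "c"}; // most frequent entry first
    String[] reversed = {"c", "b", "a"};    // most frequent entry last

    System.out.println(comparisons(byFrequency, input)); // 13 = 8*1 + 2 + 3
    System.out.println(comparisons(reversed, input));    // 27 = 8*3 + 2 + 1
  }
}
```

Both orders yield a valid encoding; only the amount of work differs, which matches the note on the dictionary parameter of the constructors.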