Class SearchDictionaryEncoder<E extends BaseIntVector,D extends ValueVector>

java.lang.Object
org.apache.arrow.algorithm.dictionary.SearchDictionaryEncoder<E,D>
Type Parameters:
E - encoded vector type.
D - decoded vector type, which is also the dictionary type.
All Implemented Interfaces:
DictionaryEncoder<E,D>

public class SearchDictionaryEncoder<E extends BaseIntVector,D extends ValueVector> extends Object implements DictionaryEncoder<E,D>
A dictionary encoder that locates each input value in a sorted dictionary by binary search.
  • Constructor Details

    • SearchDictionaryEncoder

      public SearchDictionaryEncoder(D dictionary, VectorValueComparator<D> comparator)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary. It must be sorted in the order defined by comparator.
      comparator - the comparator that defines the sort order of the dictionary.
    • SearchDictionaryEncoder

      public SearchDictionaryEncoder(D dictionary, VectorValueComparator<D> comparator, boolean encodeNull)
      Constructs a dictionary encoder.
      Parameters:
      dictionary - the dictionary. It must be sorted in the order defined by comparator.
      comparator - the comparator that defines the sort order of the dictionary.
      encodeNull - whether null values in the input should be encoded. When a null is encountered in the input: 1) if the flag is true, the encoder searches for the null value in the dictionary and outputs its index in the dictionary; 2) if the flag is false, the encoder simply writes a null to the output.
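
      The two encodeNull behaviors can be illustrated with a minimal sketch on plain Java arrays. This is not the Arrow implementation: the class NullHandlingSketch, its encode helper, the use of a null Integer to stand for a null output slot, the nulls-first ordering, and the exception thrown for a value missing from the dictionary are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for the encoder; plain arrays replace Arrow vectors.
final class NullHandlingSketch {

  /**
   * Maps each input value to its index in a sorted dictionary.
   * A null entry in the result models a null slot in the output vector.
   */
  static Integer[] encode(String[] input, String[] sortedDictionary,
                          Comparator<String> comparator, boolean encodeNull) {
    Integer[] output = new Integer[input.length];
    for (int i = 0; i < input.length; i++) {
      if (input[i] == null && !encodeNull) {
        continue; // flag is false: leave the output slot null
      }
      // flag is true (or the value is non-null): look the value up in the dictionary
      int index = Arrays.binarySearch(sortedDictionary, input[i], comparator);
      if (index < 0) {
        // Assumption of this sketch: a missing value is reported as an error.
        throw new IllegalArgumentException("value not found in dictionary: " + input[i]);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    // Assumption: the comparator orders null before every non-null value,
    // so a dictionary that contains null keeps it at index 0.
    Comparator<String> nullsFirst = Comparator.nullsFirst(Comparator.naturalOrder());
    String[] dictionary = {null, "apple", "banana"};
    String[] input = {"banana", null, "apple"};

    System.out.println(Arrays.toString(encode(input, dictionary, nullsFirst, true)));
    // prints [2, 0, 1]
    System.out.println(Arrays.toString(encode(input, dictionary, nullsFirst, false)));
    // prints [2, null, 1]
  }
}
```

      Note that encoding nulls only makes sense in this sketch when the dictionary itself contains a null entry and the comparator can order it; otherwise the lookup for null fails.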
  • Method Details

    • encode

      public void encode(D input, E output)
      Encodes an input vector by binary search. The algorithm therefore takes O(n * log(m)) time, where n is the length of the input vector and m is the length of the dictionary.
      Specified by:
      encode in interface DictionaryEncoder<E extends BaseIntVector,D extends ValueVector>
      Parameters:
      input - the input vector.
      output - the output vector. It must be in a fresh state; at a minimum, all of its validity bits should be clear.
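
      The O(n * log(m)) bound stated for encode can be seen in a minimal sketch of the technique on primitive arrays. It is an illustration only, not the Arrow implementation: the class SearchEncodeSketch, its search and encode helpers, and the exception thrown for a value missing from the dictionary are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of search-based dictionary encoding on long arrays.
final class SearchEncodeSketch {

  /** Binary search over a sorted dictionary of length m: O(log(m)) comparisons. */
  static int search(long[] sortedDictionary, long value) {
    int low = 0;
    int high = sortedDictionary.length - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
      if (sortedDictionary[mid] < value) {
        low = mid + 1;
      } else if (sortedDictionary[mid] > value) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -1; // not found
  }

  /** One search per input element: n * O(log(m)) in total. */
  static int[] encode(long[] input, long[] sortedDictionary) {
    int[] output = new int[input.length];
    for (int i = 0; i < input.length; i++) {
      int index = search(sortedDictionary, input[i]);
      if (index < 0) {
        // Assumption of this sketch: a missing value is reported as an error.
        throw new IllegalArgumentException("value not found in dictionary: " + input[i]);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    long[] dictionary = {10L, 20L, 30L, 40L}; // must already be sorted
    long[] input = {30L, 10L, 40L, 30L};
    System.out.println(Arrays.toString(encode(input, dictionary)));
    // prints [2, 0, 3, 2]
  }
}
```

      The sorted-dictionary requirement on the constructor is what makes the inner search valid; with an unsorted dictionary the binary search may miss values that are present.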