pyarrow.compute.hash_count_distinct

pyarrow.compute.hash_count_distinct(array, group_id_array, *, memory_pool=None, options=None, mode='only_valid')

Count the distinct values in each group.

Whether non-null values, nulls, or both are counted is controlled by CountOptions; by default only non-null values are counted. NaNs and signed zeroes are not normalized.

Parameters
  • array (Array-like or scalar-like) – Argument to compute function: the values whose distinct entries are counted.

  • group_id_array (Array-like or scalar-like) – Argument to compute function: the group identifier of each element of array.

  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the default memory pool.

  • options (pyarrow.compute.CountOptions, optional) – Parameters altering compute function semantics.

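  • mode (str, default "only_valid") – Which values to count: "only_valid" (non-null values), "only_null" (nulls), or "all" (both). Parameter for the CountOptions constructor. Either options or mode can be passed, but not both at the same time.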