pyarrow.compute.extract_regex

pyarrow.compute.extract_regex(strings, *, memory_pool=None, options=None, pattern)

Extract substrings captured by a regex pattern.

For each string in strings, match the regular expression and, if it matches, emit a struct whose field names and values come from the regular expression’s named capture groups. If the input is null or the regular expression does not match, a null output value is emitted.

Regular expression matching is done using the Google RE2 library.

Parameters
  • strings (Array-like or scalar-like) – Argument to compute function: the strings to match the regular expression against.

  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, memory is allocated from the default memory pool.

  • options (pyarrow.compute.ExtractRegexOptions, optional) – Parameters altering compute function semantics.

  • pattern (optional) – Parameter for ExtractRegexOptions constructor. Either options or pattern can be passed, but not both at the same time.