pyarrow.Int64Array

class pyarrow.Int64Array

Bases: pyarrow.lib.IntegerArray

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

buffers(self) Return a list of Buffer objects pointing to this array’s physical storage.
cast(self, target_type, bool safe=True) Cast array values to another data type.
dictionary_encode(self) Compute dictionary-encoded representation of array
equals(self, Array other)
format(self, int indent=0, int window=10)
from_buffers(DataType type, length, buffers) Construct an Array from a sequence of buffers.
from_pandas(obj[, mask, type]) Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls.
isnull(self)
slice(self[, offset, length]) Compute zero-copy slice of this array
to_numpy(self) Experimental: return a NumPy view of this array.
to_pandas(self, …) Convert to a NumPy array object suitable for use in pandas.
to_pylist(self) Convert to a list of native Python objects.
unique(self) Compute distinct elements in array
validate(self) Perform any validation checks implemented by arrow::ValidateArray.

Attributes

null_count
offset A relative position into another array’s data, to enable zero-copy slicing.
type
buffers(self)

Return a list of Buffer objects pointing to this array’s physical storage.

To interpret these buffers correctly, you must also apply the array’s offset, multiplied by the byte size of the stored data type.

cast(self, target_type, bool safe=True)

Cast array values to another data type.

Example

>>> from datetime import datetime
>>> import pyarrow as pa
>>> arr = pa.array([datetime(2010, 1, 1), datetime(2015, 1, 1)])
>>> arr.type
TimestampType(timestamp[us])

You can use pyarrow.DataType objects to specify the target type:

>>> arr.cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast(pa.timestamp('ms')).type
TimestampType(timestamp[ms])

Alternatively, you can use the string aliases for these types:

>>> arr.cast('timestamp[ms]')
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast('timestamp[ms]').type
TimestampType(timestamp[ms])
Parameters:
  • target_type (DataType) – Type to cast to
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

casted (Array)

dictionary_encode(self)

Compute dictionary-encoded representation of array

equals(self, Array other)
format(self, int indent=0, int window=10)
static from_buffers(DataType type, length, buffers, null_count=-1, offset=0)

Construct an Array from a sequence of buffers. The concrete type returned depends on the datatype.

Parameters:
  • type (DataType) – The value type of the array
  • length (int) – The number of values in the array
  • buffers (List[Buffer]) – The buffers backing this array
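  • null_count (int, default -1) – The number of null values in the array; -1 means the count is not known in advance and is computed from the validity bitmap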
  • offset (int, default 0) – The array’s logical offset (in values, not in bytes) from the start of each buffer
Returns:

array (Array)

static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)

Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays.

Parameters:
  • obj (ndarray, pandas.Series) – The values to convert
  • mask (array (boolean), optional) – Indicate which values are null (True) or not null (False)
  • type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool

Notes

Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.

Returns:
  • array (pyarrow.Array or pyarrow.ChunkedArray) – A ChunkedArray is returned if object data overflows the binary buffer
isnull(self)
null_count
offset

A relative position into another array’s data, to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers.

slice(self, offset=0, length=None)

Compute zero-copy slice of this array

Parameters:
  • offset (int, default 0) – Offset from start of array to slice
  • length (int, default None) – Length of slice (default is until end of Array starting from offset)
Returns:

sliced (Array)

to_numpy(self)

Experimental: return a NumPy view of this array. Only primitive arrays with the same memory layout as NumPy (i.e. integers, floating point), without any nulls, are supported.

Returns: array (numpy.ndarray)
to_pandas(self, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=False)

Convert to a NumPy array object suitable for use in pandas.

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default False) – Cast dates to objects
to_pylist(self)

Convert to a list of native Python objects.

Returns: lst (list)
type
unique(self)

Compute distinct elements in array

validate(self)

Perform any validation checks implemented by arrow::ValidateArray. Raises an exception with an error message if the array does not validate.

Raises: ArrowInvalid