pyarrow.Int8Array¶
- class pyarrow.Int8Array¶
- Bases: IntegerArray
- Concrete class for Arrow arrays of int8 data type.
 - __init__(*args, **kwargs)¶
 - Methods
 - __init__(*args, **kwargs)
 - buffers(self): Return a list of Buffer objects pointing to this array's physical storage.
 - cast(self[, target_type, safe, options]): Cast array values to another data type.
 - dictionary_encode(self[, null_encoding]): Compute dictionary-encoded representation of array.
 - diff(self, Array other): Compare contents of this array against another one.
 - drop_null(self): Remove missing values from an array.
 - equals(self, Array other)
 - fill_null(self, fill_value): See pyarrow.compute.fill_null() for usage.
 - filter(self, Array mask, *[, ...]): Select values from an array.
 - format(self, **kwargs)
 - from_buffers(DataType type, length, buffers): Construct an Array from a sequence of buffers.
 - from_pandas(obj[, mask, type]): Convert pandas.Series to an Arrow Array.
 - get_total_buffer_size(self): The sum of bytes in each buffer referenced by the array.
 - index(self, value[, start, end, memory_pool]): Find the first index of a value.
 - is_null(self, *[, nan_is_null]): Return BooleanArray indicating the null values.
 - is_valid(self): Return BooleanArray indicating the non-null values.
 - slice(self[, offset, length]): Compute zero-copy slice of this array.
 - sort(self[, order]): Sort the Array.
 - sum(self, **kwargs): Sum the values in a numerical array.
 - take(self, indices): Select values from an array.
 - to_numpy(self[, zero_copy_only, writable]): Return a NumPy view or copy of this array (experimental).
 - to_pandas(self[, memory_pool, categories, ...]): Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
 - to_pylist(self): Convert to a list of native Python objects.
 - to_string(self, *, int indent=2, ...): Render a "pretty-printed" string representation of the Array.
 - tolist(self): Alias of to_pylist for compatibility with NumPy.
 - unique(self): Compute distinct elements in array.
 - validate(self, *[, full]): Perform validation checks.
 - value_counts(self): Compute counts of unique elements in array.
 - view(self, target_type): Return zero-copy "view" of array as another data type.
 - Attributes
 - nbytes: Total number of bytes consumed by the elements of the array.
 - offset: A relative position into another array's data.
 - buffers(self)¶
- Return a list of Buffer objects pointing to this array’s physical storage. - To interpret these buffers correctly, you also need to apply the array's offset multiplied by the size of the stored data type.
 - cast(self, target_type=None, safe=None, options=None)¶
- Cast array values to another data type. - See pyarrow.compute.cast() for usage.
 - dictionary_encode(self, null_encoding='mask')¶
- Compute dictionary-encoded representation of array. - See pyarrow.compute.dictionary_encode() for full usage.
- Parameters:
- null_encoding : str, default "mask"
- How to handle null entries.
- Returns:
- encoded : DictionaryArray
- A dictionary-encoded version of this array.
 
 - diff(self, Array other)¶
- Compare contents of this array against another one. - Return a string containing the result of diffing this array (on the left side) against the other array (on the right side).
- Parameters:
- other : Array
- The other array to compare this array with.
- Returns:
- diff : str
- A human-readable printout of the differences.
 - Examples
>>> import pyarrow as pa
>>> left = pa.array(["one", "two", "three"])
>>> right = pa.array(["two", None, "two-and-a-half", "three"])
>>> print(left.diff(right))

@@ -0, +0 @@
-"one"
@@ -2, +1 @@
+null
+"two-and-a-half"
 - drop_null(self)¶
- Remove missing values from an array. 
 - equals(self, Array other)¶
 - fill_null(self, fill_value)¶
- See pyarrow.compute.fill_null() for usage.
 - filter(self, Array mask, *, null_selection_behavior='drop')¶
- Select values from an array. - See pyarrow.compute.filter() for full usage.
- Parameters:
- mask : Array or array-like
- The boolean mask to filter the array with.
- null_selection_behavior : str, default "drop"
- How nulls in the mask should be handled.
- Returns:
- filtered : Array
- An array of the same type, with only the elements selected by the boolean mask.
 
 - format(self, **kwargs)¶
 - static from_buffers(DataType type, length, buffers, null_count=-1, offset=0, children=None)¶
- Construct an Array from a sequence of buffers. - The concrete type returned depends on the datatype.
- Parameters:
- type : DataType
- The value type of the array.
- length : int
- The number of values in the array.
- buffers : List[Buffer]
- The buffers backing this array.
- null_count : int, default -1
- The number of null entries in the array. A negative value means that the null count is not known.
- offset : int, default 0
- The array’s logical offset (in values, not in bytes) from the start of each buffer.
- children : List[Array], default None
- Nested type children with length matching type.num_fields.
- Returns:
- array : Array
 
 - static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)¶
- Convert pandas.Series to an Arrow Array. - This method uses pandas semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays.
- Parameters:
- obj : ndarray, pandas.Series, array-like
- mask : array (bool), optional
- Indicate which values are null (True) or not null (False).
- type : pyarrow.DataType
- Explicit type to attempt to coerce to, otherwise will be inferred from the data.
- safe : bool, default True
- Check for overflows or other unsafe conversions.
- memory_pool : pyarrow.MemoryPool, optional
- If not passed, will allocate memory from the currently-set default memory pool.
- Returns:
- array : pyarrow.Array or pyarrow.ChunkedArray
- ChunkedArray is returned if object data overflows binary buffer.
 - Notes - Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC. 
 - get_total_buffer_size(self)¶
- The sum of bytes in each buffer referenced by the array. - An array may only reference a portion of a buffer. This method will overestimate in this case and return the byte size of the entire buffer. - If a buffer is referenced multiple times then it will only be counted once. 
 - index(self, value, start=None, end=None, *, memory_pool=None)¶
- Find the first index of a value. - See pyarrow.compute.index() for full usage.
- Parameters:
- value : Scalar or object
- The value to look for in the array.
- start : int, optional
- The start index where to look for value.
- end : int, optional
- The end index where to look for value.
- memory_pool : MemoryPool, optional
- A memory pool for potential memory allocations.
- Returns:
- index : Int64Scalar
- The index of the value in the array (-1 if not found).
 
 - is_null(self, *, nan_is_null=False)¶
- Return BooleanArray indicating the null values. 
 - is_valid(self)¶
- Return BooleanArray indicating the non-null values. 
 - nbytes¶
- Total number of bytes consumed by the elements of the array. - In other words, the sum of bytes from all buffer ranges referenced. - Unlike get_total_buffer_size this method will account for array offsets. - If buffers are shared between arrays then the shared portion will be counted multiple times. - The dictionary of dictionary arrays will always be counted in their entirety even if the array only references a portion of the dictionary. 
 - null_count¶
 - offset¶
- A relative position into another array’s data. - The purpose is to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers. 
 - slice(self, offset=0, length=None)¶
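- Compute zero-copy slice of this array.
- Parameters:
- offset : int, default 0
- Offset from the start of the array to slice.
- length : int, default None
- Length of the slice (by default, until the end of the array starting from offset).
- Returns:
- sliced : Array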
 
 - sort(self, order='ascending', **kwargs)¶
- Sort the Array 
 - sum(self, **kwargs)¶
- Sum the values in a numerical array. - See pyarrow.compute.sum() for full usage.
- Parameters:
- **kwargs : dict, optional
- Options to pass to pyarrow.compute.sum().
- Returns:
- sum : Scalar
- A scalar containing the sum value.
 
 - take(self, indices)¶
- Select values from an array. - See pyarrow.compute.take() for full usage.
- Parameters:
- indices : Array or array-like
- The indices in the array whose values will be returned.
- Returns:
- taken : Array
- An array with the same datatype, containing the taken values.
 
 - to_numpy(self, zero_copy_only=True, writable=False)¶
- Return a NumPy view or copy of this array (experimental). - By default, tries to return a view of this array. This is only supported for primitive arrays with the same memory layout as NumPy (i.e. integers, floating point, ...) and without any nulls.
- Parameters:
- zero_copy_only : bool, default True
- If True, an exception will be raised if the conversion to a numpy array would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types).
- writable : bool, default False
- For numpy arrays created with zero copy (view on the Arrow data), the resulting array is not writable (Arrow data is immutable). By setting this to True, a copy of the array is made to ensure it is writable.
- Returns:
- array : numpy.ndarray
 
 - to_pandas(self, memory_pool=None, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool timestamp_as_object=False, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False, bool safe=True, bool split_blocks=False, bool self_destruct=False, types_mapper=None)¶
- Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
- Parameters:
- memory_pool : MemoryPool, default None
- Arrow MemoryPool to use for allocations. Uses the default memory pool if not passed.
- categories : list, default empty
- List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures.
- strings_to_categorical : bool, default False
- Encode string (UTF8) and binary types to pandas.Categorical.
- zero_copy_only : bool, default False
- Raise an ArrowException if this function call would require copying the underlying data.
- integer_object_nulls : bool, default False
- Cast integers with nulls to objects.
- date_as_object : bool, default True
- Cast dates to objects. If False, convert to datetime64[ns] dtype.
- timestamp_as_object : bool, default False
- Cast non-nanosecond timestamps (np.datetime64) to objects. This is useful if you have timestamps that don’t fit in the normal date range of nanosecond timestamps (1678 CE-2262 CE). If False, all timestamps are converted to datetime64[ns] dtype.
- use_threads : bool, default True
- Whether to parallelize the conversion using multiple threads.
- deduplicate_objects : bool, default True
- Do not create multiple copies of Python objects when created, to save on memory use. Conversion will be slower.
- ignore_metadata : bool, default False
- If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present.
- safe : bool, default True
- For certain data types, a cast is needed in order to store the data in a pandas DataFrame or Series (e.g. timestamps are always stored as nanoseconds in pandas). This option controls whether it is a safe cast or not.
- split_blocks : bool, default False
- If True, generate one internal “block” for each column when creating a pandas.DataFrame from a RecordBatch or Table. While this can temporarily reduce memory, note that various pandas operations can trigger “consolidation”, which may balloon memory use.
- self_destruct : bool, default False
- EXPERIMENTAL: If True, attempt to deallocate the originating Arrow memory while converting the Arrow object to pandas. If you use the object after calling to_pandas with this option, it will crash your program. - Note that you may not always see memory usage improvements. For example, if multiple columns share an underlying allocation, memory can’t be freed until all columns are converted.
- types_mapper : function, default None
- A function mapping a pyarrow DataType to a pandas ExtensionDtype. This can be used to override the default pandas type for conversion of built-in pyarrow types or in absence of pandas_metadata in the Table schema. The function receives a pyarrow DataType and is expected to return a pandas ExtensionDtype or None if the default conversion should be used for that type. If you have a dictionary mapping, you can pass dict.get as the function.
- Returns:
- pandas.Series or pandas.DataFrame, depending on the type of object
 
 - Examples
>>> import pyarrow as pa
>>> import pandas as pd

Convert a Table to pandas DataFrame:

>>> table = pa.table([
...    pa.array([2, 4, 5, 100]),
...    pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
...    ], names=['n_legs', 'animals'])
>>> table.to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede
>>> isinstance(table.to_pandas(), pd.DataFrame)
True

Convert a RecordBatch to pandas DataFrame:

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
>>> batch = pa.record_batch([n_legs, animals],
...                         names=["n_legs", "animals"])
>>> batch
pyarrow.RecordBatch
n_legs: int64
animals: string
>>> batch.to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede
>>> isinstance(batch.to_pandas(), pd.DataFrame)
True

Convert a Chunked Array to pandas Series:

>>> import pyarrow as pa
>>> n_legs = pa.chunked_array([[2, 2, 4], [4, 5, 100]])
>>> n_legs.to_pandas()
0      2
1      2
2      4
3      4
4      5
5    100
dtype: int64
>>> isinstance(n_legs.to_pandas(), pd.Series)
True
 - to_string(self, *, int indent=2, int top_level_indent=0, int window=10, int container_window=2, bool skip_new_lines=False)¶
- Render a “pretty-printed” string representation of the Array.
- Parameters:
- indent : int, default 2
- How much to indent the internal items in the string to the right.
- top_level_indent : int, default 0
- How much to indent the entire content of the array to the right.
- window : int, default 10
- How many primitive items to preview at the beginning and end of the array when the array is bigger than the window. The other items will be elided.
- container_window : int, default 2
- How many container items (such as a list in a list array) to preview at the beginning and end of the array when the array is bigger than the window.
- skip_new_lines : bool, default False
- Whether the array should be rendered as a single line of text instead of with each element on its own line.
 
 - tolist(self)¶
- Alias of to_pylist for compatibility with NumPy. 
 - type¶
 - unique(self)¶
- Compute distinct elements in array.
- Returns:
- unique : Array
- An array of the same data type, with deduplicated elements.
 
 - validate(self, *, full=False)¶
- Perform validation checks. An exception is raised if validation fails. - By default only cheap validation checks are run. Pass full=True for thorough validation checks (potentially O(n)). 
 - value_counts(self)¶
- Compute counts of unique elements in array.
- Returns:
- StructArray
- An array of <input type “Values”, int64 “Counts”> structs 
 
 
 
