pyarrow.plasma.PlasmaClient

class pyarrow.plasma.PlasmaClient

Bases: pyarrow.lib._Weakrefable

The PlasmaClient is used to interface with a plasma store and manager.

The PlasmaClient can ask the PlasmaStore to allocate a new buffer, seal a buffer, and get a buffer. Buffers are referred to by object IDs, which are strings.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__()

Initialize self.

contains(self, ObjectID object_id)

Check if the object is present and sealed in the PlasmaStore.

create(self, ObjectID object_id, …)

Create a new buffer in the PlasmaStore for a particular object ID.

create_and_seal(self, ObjectID object_id, …)

Store a new object in the PlasmaStore for a particular object ID.

debug_string(self)

decode_notifications(self, const uint8_t *buf)

Get the notification from the buffer.

delete(self, object_ids)

Delete the objects with the given IDs from the object store.

disconnect(self)

Disconnect this client from the Plasma store.

evict(self, int64_t num_bytes)

Evict objects until some bytes are recovered.

get(self, object_ids, int timeout_ms=-1[, …])

Get one or more Python values from the object store.

get_buffers(self, object_ids[, timeout_ms, …])

Returns data buffer from the PlasmaStore based on object ID.

get_metadata(self, object_ids[, timeout_ms])

Returns metadata buffer from the PlasmaStore based on object ID.

get_next_notification(self)

Get the next notification from the notification socket.

get_notification_socket(self)

Get the notification socket.

hash(self, ObjectID object_id)

Compute the checksum of an object in the object store.

list(self)

Experimental: List the objects in the store.

put(self, value, ObjectID object_id=None, …)

Store a Python value into the object store.

put_raw_buffer(self, value, …)

Store Python buffer into the object store.

seal(self, ObjectID object_id)

Seal the buffer in the PlasmaStore for a particular object ID.

set_client_options(self, client_name, …)

store_capacity(self)

Get the memory capacity of the store.

subscribe(self)

Subscribe to notifications about sealed objects.

to_capsule(self)

Attributes

store_socket_name

contains(self, ObjectID object_id)

Check if the object is present and sealed in the PlasmaStore.

Parameters

object_id (ObjectID) – A string used to identify an object.

create(self, ObjectID object_id, int64_t data_size, string metadata=b'')

Create a new buffer in the PlasmaStore for a particular object ID.

The returned buffer is mutable until seal is called.

Parameters
  • object_id (ObjectID) – The object ID used to identify an object.

  • data_size (int) – The size in bytes of the created buffer.

  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.

Raises
  • PlasmaObjectExists – This exception is raised if the object could not be created because there already is an object with the same ID in the plasma store.

  • PlasmaStoreFull – This exception is raised if the object could not be created because the plasma store is unable to evict enough objects to create room for it.
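The create-then-seal workflow described above can be sketched as follows. This is a minimal illustration, not part of pyarrow: `write_and_seal` is a hypothetical helper, `client` is assumed to be an already connected PlasmaClient, and the buffer returned by `create` is assumed to expose a writable Python buffer interface.

```python
def write_and_seal(client, object_id, payload, metadata=b""):
    """Allocate a buffer for payload, copy the bytes in, then seal it.

    Hypothetical helper: `client` is assumed to be a connected PlasmaClient
    and `object_id` an ObjectID that is not yet present in the store.
    """
    # create() reserves len(payload) bytes; the buffer stays mutable until seal().
    buf = client.create(object_id, len(payload), metadata)
    # Assumption: the returned buffer supports the writable buffer protocol.
    # cast("B") normalizes the item format so bytes can be assigned directly.
    view = memoryview(buf).cast("B")
    view[:] = payload
    # After seal() the buffer is immutable and can only be read through get.
    client.seal(object_id)
```

If the ID is already in use or the store cannot make room, `create` is documented to raise PlasmaObjectExists or PlasmaStoreFull, so callers may want to wrap the helper in a `try` block.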

create_and_seal(self, ObjectID object_id, string data, string metadata=b'')

Store a new object in the PlasmaStore for a particular object ID.

Parameters
  • object_id (ObjectID) – The object ID used to identify an object.

  • data (bytes) – The object to store.

  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.

Raises
  • PlasmaObjectExists – This exception is raised if the object could not be created because there already is an object with the same ID in the plasma store.

  • PlasmaStoreFull – This exception is raised if the object could not be created because the plasma store is unable to evict enough objects to create room for it.

debug_string(self)

decode_notifications(self, const uint8_t *buf)

Get the notification from the buffer.

Returns

  • [ObjectID] – The list of object IDs in the notification message.

  • c_vector[int64_t] – The data sizes of the objects in the notification message.

  • c_vector[int64_t] – The metadata sizes of the objects in the notification message.

delete(self, object_ids)

Delete the objects with the given IDs from the object store.

Parameters

object_ids (list) – A list of strings used to identify the objects.

disconnect(self)

Disconnect this client from the Plasma store.

evict(self, int64_t num_bytes)

Evict objects until some bytes are recovered.

Recover at least num_bytes bytes if possible.

Parameters

num_bytes (int) – The number of bytes to attempt to recover.

get(self, object_ids, int timeout_ms=-1, serialization_context=None)

Get one or more Python values from the object store.

Parameters
  • object_ids (list or ObjectID) – Object ID or list of object IDs associated with the values to retrieve from the store.

  • timeout_ms (int, default -1) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block and 0 if the call should return immediately.

  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.

Returns

list or object – Python value or list of Python values for the data associated with the object_ids, with ObjectNotAvailable in place of any object that was not available.
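Because get signals a missing object by returning ObjectNotAvailable rather than raising, callers usually need an explicit check. The sketch below is a hypothetical helper, not part of pyarrow; it takes the marker as the `not_available` argument, on the assumption that callers pass `pyarrow.plasma.ObjectNotAvailable` and that `client` is a connected PlasmaClient.

```python
def get_or_default(client, object_id, not_available, default=None, timeout_ms=0):
    """Return the stored value, or `default` if it is not available in time.

    Hypothetical helper. `not_available` is the marker that get() returns
    for missing objects (assumed: pyarrow.plasma.ObjectNotAvailable).
    """
    # timeout_ms=0 returns immediately; -1 would block until the object exists.
    value = client.get(object_id, timeout_ms=timeout_ms)
    # Compare by identity: the marker is a sentinel, not an ordinary value.
    return default if value is not_available else value
```

A non-zero `timeout_ms` trades latency for a better chance that a producer has sealed the object in the meantime.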

get_buffers(self, object_ids, timeout_ms=-1, with_meta=False)

Returns data buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters
  • object_ids (list) – A list of ObjectIDs used to identify some objects.

  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block and 0 if the call should return immediately.

  • with_meta (bool) – If True, return the metadata bytes together with each data buffer.

Returns

list – If with_meta=False, this is a list of PlasmaBuffers for the data associated with the object_ids and None if the object was not available. If with_meta=True, this is a list of tuples of PlasmaBuffer and metadata bytes.

get_metadata(self, object_ids, timeout_ms=-1)

Returns metadata buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters
  • object_ids (list) – A list of ObjectIDs used to identify some objects.

  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block and 0 if the call should return immediately.

Returns

list – List of PlasmaBuffers for the metadata associated with the object_ids and None if the object was not available.

get_next_notification(self)

Get the next notification from the notification socket.

Returns

  • ObjectID – The object ID of the object that was stored.

  • int – The data size of the object that was stored.

  • int – The metadata size of the object that was stored.
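A minimal sketch of consuming notifications, assuming `client` is a connected PlasmaClient: the hypothetical helper below (not part of pyarrow) subscribes once and then reads a fixed number of notifications, unpacking the three return values listed above.

```python
def collect_notifications(client, count):
    """Subscribe, then read `count` notifications from the store.

    Hypothetical helper: `client` is assumed to be a connected PlasmaClient.
    """
    # subscribe() must be called before notifications can be read.
    client.subscribe()
    events = []
    for _ in range(count):
        # Each notification carries the object ID and its two sizes.
        object_id, data_size, metadata_size = client.get_next_notification()
        events.append((object_id, data_size, metadata_size))
    return events
```

Each read waits for the next notification, so a loop like this is best run in a dedicated thread or process when the producer's timing is unknown.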

get_notification_socket(self)

Get the notification socket.

hash(self, ObjectID object_id)

Compute the checksum of an object in the object store.

Parameters

object_id (ObjectID) – A string used to identify an object.

Returns

bytes – The digest of the object’s hash. If the object isn’t in the object store, the digest will have length zero.

list(self)

Experimental: List the objects in the store.

Returns

dict – Dictionary from ObjectIDs to an “info” dictionary describing the object. The “info” dictionary has the following entries:

  • data_size – Size of the object in bytes.

  • metadata_size – Size of the object metadata in bytes.

  • ref_count – Number of clients referencing the object buffer.

  • create_time – Unix timestamp of the creation of the object.

  • construct_duration – Time the creation of the object took, in seconds.

  • state – “created” if the object is still being created and “sealed” if it is already sealed.
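The dictionary returned by list can be summarized with ordinary Python. The sketch below is a hypothetical helper, not part of pyarrow; it relies only on the `data_size` and `state` entries described above and takes the listing as a plain dict, so it can be applied to the result of `client.list()`.

```python
def sealed_data_bytes(listing):
    """Sum `data_size` over all objects whose state is "sealed".

    Hypothetical helper: `listing` is a dict shaped like the return value
    of PlasmaClient.list(), mapping object IDs to "info" dictionaries.
    """
    return sum(
        info["data_size"]
        for info in listing.values()
        if info["state"] == "sealed"
    )
```

Comparing this total with `store_capacity()` gives a rough idea of how full the store is, ignoring metadata and objects still being created.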

put(self, value, ObjectID object_id=None, int memcopy_threads=6, serialization_context=None)

Store a Python value into the object store.

Parameters
  • value (object) – A Python object to store.

  • object_id (ObjectID, default None) – If this is provided, the specified object ID will be used to refer to the object.

  • memcopy_threads (int, default 6) – The number of threads to use to write the serialized object into the object store for large objects.

  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.

Returns

The object ID associated with the Python object.
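When an explicit `object_id` is passed to put, it is often convenient to derive it from an application-level key. The sketch below is hypothetical and rests on two assumptions not stated in this page: that an ObjectID is built from exactly 20 bytes, and that `make_object_id` is a callable such as `pyarrow.plasma.ObjectID`. SHA-1 is used only because its digest happens to be 20 bytes long, not for security.

```python
import hashlib


def id_bytes_for(key):
    """Derive 20 deterministic bytes from a text key (SHA-1 digest)."""
    return hashlib.sha1(key.encode("utf-8")).digest()


def put_with_key(client, make_object_id, key, value):
    """Store `value` under an object ID derived from `key`.

    Hypothetical helper: `client` is assumed to be a connected PlasmaClient
    and `make_object_id` a callable that turns 20 bytes into an ObjectID
    (assumed: pyarrow.plasma.ObjectID).
    """
    object_id = make_object_id(id_bytes_for(key))
    # put() returns the object ID that now refers to the stored value.
    return client.put(value, object_id)
```

Any client that knows the key can then rebuild the same ID and pass it to get.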

put_raw_buffer(self, value, ObjectID object_id=None, string metadata=b'', int memcopy_threads=6)

Store Python buffer into the object store.

Parameters
  • value (Python object that implements the buffer protocol) – A Python buffer object to store.

  • object_id (ObjectID, default None) – If this is provided, the specified object ID will be used to refer to the object.

  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.

  • memcopy_threads (int, default 6) – The number of threads to use to write the serialized object into the object store for large objects.

Returns

The object ID associated with the Python buffer object.

seal(self, ObjectID object_id)

Seal the buffer in the PlasmaStore for a particular object ID.

Once a buffer has been sealed, the buffer is immutable and can only be accessed through get.

Parameters

object_id (ObjectID) – A string used to identify an object.

set_client_options(self, client_name, int64_t limit_output_memory)

store_capacity(self)

Get the memory capacity of the store.

Returns

int – The memory capacity of the store in bytes.

store_socket_name

subscribe(self)

Subscribe to notifications about sealed objects.

to_capsule(self)