pyarrow.plasma.PlasmaClient

class pyarrow.plasma.PlasmaClient

Bases: object

The PlasmaClient is used to interface with a plasma store and manager.

The PlasmaClient can ask the PlasmaStore to allocate a new buffer, seal a buffer, and get a buffer. Buffers are referred to by object IDs, which are strings.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

contains(self, ObjectID object_id) Check if the object is present and sealed in the PlasmaStore.
create(self, ObjectID object_id, …) Create a new buffer in the PlasmaStore for a particular object ID.
disconnect(self) Disconnect this client from the Plasma store.
evict(self, int64_t num_bytes) Evict objects until some bytes are recovered.
fetch(self, object_ids) Fetch the objects with the given IDs from other plasma managers.
get(self, object_ids, int timeout_ms=-1[, …]) Get one or more Python values from the object store.
get_buffers(self, object_ids[, timeout_ms]) Returns data buffer from the PlasmaStore based on object ID.
get_metadata(self, object_ids[, timeout_ms]) Returns metadata buffer from the PlasmaStore based on object ID.
get_next_notification(self) Get the next notification from the notification socket.
hash(self, ObjectID object_id) Compute the checksum of an object in the object store.
put(self, value, ObjectID object_id=None[, …]) Store a Python value into the object store.
release(self, ObjectID object_id) Notify Plasma that the object is no longer needed.
seal(self, ObjectID object_id) Seal the buffer in the PlasmaStore for a particular object ID.
subscribe(self) Subscribe to notifications about sealed objects.
to_capsule(self)
transfer(self, address, int port, …) Transfer the local object with ID object_id to another plasma instance.
wait(self, object_ids, …) Wait until num_returns objects in object_ids are ready.
contains(self, ObjectID object_id)

Check if the object is present and sealed in the PlasmaStore.

Parameters: object_id (ObjectID) – A string used to identify an object.
create(self, ObjectID object_id, int64_t data_size, string metadata=b'')

Create a new buffer in the PlasmaStore for a particular object ID.

The returned buffer is mutable until seal is called.

Parameters:
  • object_id (ObjectID) – The object ID used to identify an object.
  • data_size (int) – The size in bytes of the created buffer.
  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.
Raises:
  • PlasmaObjectExists – This exception is raised if the object could not be created because there already is an object with the same ID in the plasma store.
  • PlasmaStoreFull – This exception is raised if the object could not be created because the plasma store is unable to evict enough objects to create room for it.
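The create/seal lifecycle described above can be sketched as follows. This is a minimal illustration, not the library's own recipe: it assumes client is an already-connected PlasmaClient and object_id is an ObjectID, and the helper name write_object is hypothetical.

```python
def write_object(client, object_id, payload, metadata=b""):
    """Create a buffer for object_id, copy payload into it, then seal it.

    write_object is a hypothetical helper, not part of pyarrow.
    """
    # create() returns a buffer of len(payload) bytes that stays mutable
    # until seal() is called.
    buf = client.create(object_id, len(payload), metadata)

    # Cast to unsigned bytes so the slice assignment accepts a bytes
    # payload regardless of the byte format the buffer exposes.
    view = memoryview(buf).cast("B")
    view[:] = payload

    # After seal() the object is immutable and visible to get/get_buffers.
    client.seal(object_id)
```

Callers would typically wrap the call in a try/except for PlasmaObjectExists and PlasmaStoreFull, the two exceptions listed under Raises.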
disconnect(self)

Disconnect this client from the Plasma store.

evict(self, int64_t num_bytes)

Evict objects until some bytes are recovered.

Recover at least num_bytes bytes if possible.

Parameters: num_bytes (int) – The number of bytes to attempt to recover.
fetch(self, object_ids)

Fetch the objects with the given IDs from other plasma managers.

Parameters: object_ids (list) – A list of strings used to identify the objects.
get(self, object_ids, int timeout_ms=-1, serialization_context=None)

Get one or more Python values from the object store.

Parameters:
  • object_ids (list or ObjectID) – Object ID or list of object IDs associated with the values to retrieve from the store.
  • timeout_ms (int, default -1) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.
Returns:

list or object – The Python value, or a list of Python values, for the data associated with object_ids. ObjectNotAvailable is returned in place of any object that was not available.

get_buffers(self, object_ids, timeout_ms=-1)

Returns data buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters:
  • object_ids (list) – A list of ObjectIDs used to identify some objects.
  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
Returns:

list – List of PlasmaBuffers for the data associated with the object_ids, with None in place of any object that was not available.
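A hedged sketch of reading a sealed object back with get_buffers, handling the None that marks an unavailable object. It assumes client is a connected PlasmaClient; read_object is a hypothetical helper name, and copying into bytes is a choice made here for simplicity (it gives up the zero-copy view that the returned buffer offers).

```python
def read_object(client, object_id, timeout_ms=0):
    """Return the object's data as bytes, or None if it is not available.

    read_object is a hypothetical helper, not part of pyarrow.
    """
    # get_buffers() takes a list of IDs and returns one entry per ID.
    [buf] = client.get_buffers([object_id], timeout_ms)
    if buf is None:
        # The object was not available within timeout_ms.
        return None
    # The retrieved buffer is immutable; copy it out through the
    # buffer protocol.
    return bytes(memoryview(buf))
```

With timeout_ms=0 the call returns immediately, which makes the helper usable as a non-blocking probe.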

get_metadata(self, object_ids, timeout_ms=-1)

Returns metadata buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters:
  • object_ids (list) – A list of ObjectIDs used to identify some objects.
  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
Returns:

list – List of PlasmaBuffers for the metadata associated with the object_ids, with None in place of any object that was not available.

get_next_notification(self)

Get the next notification from the notification socket.

Returns:
  • ObjectID – The object ID of the object that was stored.
  • int – The data size of the object that was stored.
  • int – The metadata size of the object that was stored.
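As a rough illustration of how subscribe and get_next_notification fit together, the sketch below subscribes once and then collects a fixed number of notifications. It assumes client is a connected PlasmaClient; collect_sealed is a hypothetical helper name.

```python
def collect_sealed(client, count):
    """Subscribe, then gather `count` notifications as tuples.

    collect_sealed is a hypothetical helper, not part of pyarrow.
    """
    # Notifications are only delivered after subscribe() has been called.
    client.subscribe()

    seen = []
    for _ in range(count):
        # Each notification is (object ID, data size, metadata size).
        # This sketch assumes the call blocks until one arrives.
        object_id, data_size, metadata_size = client.get_next_notification()
        seen.append((object_id, data_size, metadata_size))
    return seen
```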
hash(self, ObjectID object_id)

Compute the checksum of an object in the object store.

Parameters: object_id (ObjectID) – A string used to identify an object.
Returns: bytes – A byte string containing the digest of the object’s hash. If the object isn’t in the object store, the string will have length zero.
manager_socket_name
put(self, value, ObjectID object_id=None, serialization_context=None)

Store a Python value into the object store.

Parameters:
  • value (object) – A Python object to store.
  • object_id (ObjectID, default None) – If this is provided, the specified object ID will be used to refer to the object.
  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.
Returns:

The object ID associated with the Python object.
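A minimal sketch of pairing put with get for several values at once. It assumes client is a connected PlasmaClient; store_all and load_all are hypothetical helper names.

```python
def store_all(client, values):
    """Store each value and return the object IDs chosen by put().

    store_all is a hypothetical helper, not part of pyarrow.
    """
    # With object_id left as None, put() picks the ID and returns it.
    return [client.put(value) for value in values]


def load_all(client, object_ids, timeout_ms=0):
    """Fetch the values for object_ids as a list, in the same order.

    Passing a list to get() yields a list; entries for objects that are
    not available come back as ObjectNotAvailable.
    """
    return client.get(list(object_ids), timeout_ms)
```

Passing a single ObjectID to get instead of a list returns a single value rather than a one-element list.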

release(self, ObjectID object_id)

Notify Plasma that the object is no longer needed.

Parameters: object_id (ObjectID) – A string used to identify an object.
seal(self, ObjectID object_id)

Seal the buffer in the PlasmaStore for a particular object ID.

Once a buffer has been sealed, the buffer is immutable and can only be accessed through get.

Parameters: object_id (ObjectID) – A string used to identify an object.
store_socket_name
subscribe(self)

Subscribe to notifications about sealed objects.

to_capsule(self)
transfer(self, address, int port, ObjectID object_id)

Transfer the local object with ID object_id to another plasma instance.

Parameters:
  • address (str) – IPv4 address of the plasma instance the object is sent to.
  • port (int) – Port number of the plasma instance the object is sent to.
  • object_id (ObjectID) – A string used to identify an object.
wait(self, object_ids, int64_t timeout=PLASMA_WAIT_TIMEOUT, int num_returns=1)

Wait until num_returns objects in object_ids are ready. Currently, the object ID arguments to wait must be unique.

Parameters:
  • object_ids (list) – List of object IDs to wait for.
  • timeout (int) – Return to the caller after timeout milliseconds.
  • num_returns (int) – We are waiting for this number of objects to be ready.
Returns:

  • list – List of object IDs that are ready.
  • list – List of object IDs we might still wait on.
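Because the object IDs passed to wait must be unique, a caller may want to deduplicate first. The sketch below is one way to do that, assuming client is a connected PlasmaClient and that the IDs are hashable; wait_for_all is a hypothetical helper name.

```python
def wait_for_all(client, object_ids, timeout_ms):
    """Wait up to timeout_ms for every distinct ID in object_ids.

    wait_for_all is a hypothetical helper, not part of pyarrow.
    Returns (ready, pending) as reported by wait().
    """
    # Drop duplicates while preserving order (assumes hashable IDs).
    unique_ids = list(dict.fromkeys(object_ids))
    if not unique_ids:
        return [], []

    # Ask for all of them; wait() returns once that many are ready
    # or the timeout elapses.
    ready, pending = client.wait(
        unique_ids, timeout=timeout_ms, num_returns=len(unique_ids)
    )
    return ready, pending
```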