pyarrow.fs.S3FileSystem

class pyarrow.fs.S3FileSystem(access_key=None, *, secret_key=None, session_token=None, bool anonymous=False, region=None, scheme=None, endpoint_override=None, bool background_writes=True, default_metadata=None, role_arn=None, session_name=None, external_id=None, load_frequency=900, proxy_options=None)

Bases: pyarrow._fs.FileSystem

S3-backed FileSystem implementation

If access_key, secret_key and role_arn are all omitted, the filesystem attempts to initialize from the standard AWS environment variables. If either access_key or secret_key is provided, both must be provided.

If role_arn is provided instead of access_key and secret_key, temporary credentials will be fetched by issuing a request to STS to assume the specified role.
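The credential rules above can be sketched as a small validation step before the constructor is called. This is an illustrative sketch only: `s3_credential_kwargs` and `make_s3_filesystem` are hypothetical helper names, not part of pyarrow, and rejecting the combination of explicit keys with role_arn is an assumption of this sketch rather than documented behaviour.

```python
def s3_credential_kwargs(access_key=None, secret_key=None,
                         session_token=None, role_arn=None):
    """Return keyword arguments for S3FileSystem (hypothetical helper)."""
    # access_key and secret_key only make sense as a pair.
    if (access_key is None) != (secret_key is None):
        raise ValueError("access_key and secret_key must be provided together")

    if access_key is not None:
        # Assumption of this sketch: do not mix explicit keys with role_arn.
        if role_arn is not None:
            raise ValueError("pass either explicit keys or role_arn, not both")
        kwargs = {"access_key": access_key, "secret_key": secret_key}
        # A session token accompanies temporary credentials from STS.
        if session_token is not None:
            kwargs["session_token"] = session_token
        return kwargs

    if role_arn is not None:
        # Temporary credentials are fetched by assuming this role.
        return {"role_arn": role_arn}

    # Nothing given: fall back to the standard AWS environment lookup.
    return {}


def make_s3_filesystem(**credentials):
    """Build the filesystem; pyarrow is imported only when this is called."""
    from pyarrow import fs
    return fs.S3FileSystem(**s3_credential_kwargs(**credentials))
```

Calling `make_s3_filesystem()` with no arguments corresponds to the environment-variable case, while `make_s3_filesystem(role_arn=...)` corresponds to the STS case.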

Note: S3 buckets are special and the operations available on them may be limited or more expensive than desired.

Parameters
  • access_key (str, default None) – AWS Access Key ID. Pass None to use the standard AWS environment variables and/or configuration file.

  • secret_key (str, default None) – AWS Secret Access key. Pass None to use the standard AWS environment variables and/or configuration file.

  • session_token (str, default None) – AWS Session Token. An optional session token, required if access_key and secret_key are temporary credentials from STS.

  • anonymous (boolean, default False) – Whether to connect anonymously if access_key and secret_key are None. If true, will not attempt to look up credentials using standard AWS configuration methods.

  • role_arn (str, default None) – AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role.

  • session_name (str, default None) – An optional identifier for the assumed role session.

  • external_id (str, default None) – An optional unique identifier that might be required when you assume a role in another account.

  • load_frequency (int, default 900) – The frequency (in seconds) with which temporary credentials from an assumed role session will be refreshed.

  • region (str, default 'us-east-1') – AWS region to connect to.

  • scheme (str, default 'https') – S3 connection transport scheme.

  • endpoint_override (str, default None) – Override the region-derived endpoint with a connection string such as “localhost:9000”.

  • background_writes (boolean, default True) – Whether file writes will be issued in the background, without blocking.

  • default_metadata (mapping or KeyValueMetadata, default None) – Default metadata for open_output_stream. This will be ignored if non-empty metadata is passed to open_output_stream.

  • proxy_options (dict or str, default None) –

    If a proxy is used, provide the options here. Supported options are: ‘scheme’ (str: ‘http’ or ‘https’; required), ‘host’ (str; required), ‘port’ (int; required), ‘username’ (str; optional), ‘password’ (str; optional). A proxy URI (str) can also be provided, in which case these options will be derived from the provided URI. The following are equivalent:

    S3FileSystem(proxy_options='http://username:password@localhost:8020')
    S3FileSystem(proxy_options={'scheme': 'http', 'host': 'localhost',
                                'port': 8020, 'username': 'username',
                                'password': 'password'})
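    To make the equivalence concrete, the following sketch derives the dict form from a proxy URI using only the standard library. `proxy_options_from_uri` is a hypothetical helper written for illustration; it is not the parsing code pyarrow itself uses.

```python
from urllib.parse import urlsplit


def proxy_options_from_uri(uri):
    """Derive a proxy_options dict from a proxy URI (hypothetical helper)."""
    parts = urlsplit(uri)
    # 'scheme', 'host' and 'port' are the required options.
    if parts.scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    if parts.hostname is None or parts.port is None:
        raise ValueError("host and port are required")

    options = {"scheme": parts.scheme, "host": parts.hostname,
               "port": parts.port}
    # 'username' and 'password' are optional.
    if parts.username is not None:
        options["username"] = parts.username
    if parts.password is not None:
        options["password"] = parts.password
    return options


options = proxy_options_from_uri("http://username:password@localhost:8020")
```

    Here `options` holds the same five keys and values as the dict passed in the second call above.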
    

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(*args, **kwargs)

Initialize self.

copy_file(self, src, dest)

Copy a file.

create_dir(self, path, *, bool recursive=True)

Create a directory and subdirectories.

delete_dir(self, path)

Delete a directory and its contents, recursively.

delete_dir_contents(self, path, *, …)

Delete a directory’s contents, recursively.

delete_file(self, path)

Delete a file.

equals(self, FileSystem other)

from_uri(uri)

Create a new FileSystem from URI or Path.

get_file_info(self, paths_or_selector)

Get info for the given files.

move(self, src, dest)

Move / rename a file or directory.

normalize_path(self, path)

Normalize filesystem path.

open_append_stream(self, path[, …])

DEPRECATED: Open an output stream for appending.

open_input_file(self, path)

Open an input file for random access reading.

open_input_stream(self, path[, compression, …])

Open an input stream for sequential reading.

open_output_stream(self, path[, …])

Open an output stream for sequential writing.

Attributes

region

The AWS region this filesystem connects to.

type_name

The filesystem’s type name.

copy_file(self, src, dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Parameters
  • src (str) – The path of the file to be copied from.

  • dest (str) – The destination path where the file is copied to.

create_dir(self, path, *, bool recursive=True)

Create a directory and subdirectories.

This function succeeds if the directory already exists.

Parameters
  • path (str) – The path of the new directory.

  • recursive (bool, default True) – Create nested directories as well.

delete_dir(self, path)

Delete a directory and its contents, recursively.

Parameters

path (str) – The path of the directory to be deleted.

delete_dir_contents(self, path, *, bool accept_root_dir=False)

Delete a directory’s contents, recursively.

Like delete_dir, but doesn’t delete the directory itself.

Parameters
  • path (str) – The path of the directory to be deleted.

  • accept_root_dir (boolean, default False) – Allow deleting the root directory’s contents (if path is empty or “/”)

delete_file(self, path)

Delete a file.

Parameters

path (str) – The path of the file to be deleted.

equals(self, FileSystem other)

static from_uri(uri)

Create a new FileSystem from URI or Path.

Recognized URI schemes are “file”, “mock”, “s3fs”, “hdfs” and “viewfs”. In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.

Parameters

uri (string) – URI-based path, for example: file:///some/local/path.

Returns

tuple of (filesystem, path) – path is the abstract path inside the FileSystem instance.

get_file_info(self, paths_or_selector)

Get info for the given files.

Symlinks are automatically dereferenced, recursively. A non-existent or unreachable file returns a FileInfo object whose type is FileType.NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).

Parameters

paths_or_selector (FileSelector, path-like or list of path-likes) – Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If the base directory may not exist, set allow_not_found on the selector so that an empty result is returned instead of an error.

Returns

FileInfo or list of FileInfo – Single FileInfo object is returned for a single path, otherwise a list of FileInfo objects is returned.

move(self, src, dest)

Move / rename a file or directory.

If the destination exists:
  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Parameters
  • src (str) – The path of the file or the directory to be moved.

  • dest (str) – The destination path where the file or directory is moved to.

normalize_path(self, path)

Normalize filesystem path.

Parameters

path (str) – The path to normalize

Returns

normalized_path (str) – The normalized path

open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)

DEPRECATED: Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

Deprecated since version 6.0: Several filesystems don’t support this functionality, and it will be removed in a later release.

Parameters
  • path (str) – The path of the file to open for writing.

  • compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and path is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

  • buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.

  • metadata (dict, optional, default None) – If not None, a mapping of string keys to string values. Some filesystems support storing metadata along with the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns

stream (NativeFile)

open_input_file(self, path)

Open an input file for random access reading.

Parameters

path (str) – The source to open for reading.

Returns

stream (NativeFile)

open_input_stream(self, path, compression='detect', buffer_size=None)

Open an input stream for sequential reading.

Parameters
  • path (str) – The path of the file to open for reading.

  • compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly decompression. If “detect” and path is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

  • buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.

Returns

stream (NativeFile)

open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Parameters
  • path (str) – The path of the file to open for writing.

  • compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and path is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

  • buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.

  • metadata (dict, optional, default None) – If not None, a mapping of string keys to string values. Some filesystems support storing metadata along with the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns

stream (NativeFile)

region

The AWS region this filesystem connects to.

type_name

The filesystem’s type name.