pyarrow.fs.S3FileSystem
- class pyarrow.fs.S3FileSystem(access_key=None, *, secret_key=None, session_token=None, bool anonymous=False, region=None, scheme=None, endpoint_override=None, bool background_writes=True, default_metadata=None, role_arn=None, session_name=None, external_id=None, load_frequency=900, proxy_options=None)
- Bases: pyarrow._fs.FileSystem
- S3-backed FileSystem implementation.
- If neither access_key nor secret_key is provided, and role_arn is also not provided, the filesystem attempts to initialize from AWS environment variables. Otherwise, both access_key and secret_key must be provided.
- If role_arn is provided instead of access_key and secret_key, temporary credentials are fetched by issuing a request to STS to assume the specified role.
- Note: S3 buckets are special, and the operations available on them may be limited or more expensive than desired.
- Parameters
- access_key: str, default None
- AWS Access Key ID. Pass None to use the standard AWS environment variables and/or configuration file.
- secret_key: str, default None
- AWS Secret Access key. Pass None to use the standard AWS environment variables and/or configuration file.
- session_token: str, default None
- AWS Session Token. An optional session token, required if access_key and secret_key are temporary credentials from STS.
- anonymous: bool, default False
- Whether to connect anonymously if access_key and secret_key are None. If true, will not attempt to look up credentials using standard AWS configuration methods.
- role_arn: str, default None
- AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role.
- session_name: str, default None
- An optional identifier for the assumed role session.
- external_id: str, default None
- An optional unique identifier that might be required when you assume a role in another account.
- load_frequency: int, default 900
- The frequency (in seconds) with which temporary credentials from an assumed role session will be refreshed.
- region: str, default ‘us-east-1’
- AWS region to connect to.
- scheme: str, default ‘https’
- S3 connection transport scheme.
- endpoint_override: str, default None
- Override region with a connect string such as “localhost:9000”.
- background_writes: bool, default True
- Whether file writes will be issued in the background, without blocking.
- default_metadata: mapping or pyarrow.KeyValueMetadata, default None
- Default metadata for open_output_stream. This will be ignored if non-empty metadata is passed to open_output_stream.
- proxy_options: dict or str, default None
- If a proxy is used, provide the options here. Supported options are: ‘scheme’ (str: ‘http’ or ‘https’; required), ‘host’ (str; required), ‘port’ (int; required), ‘username’ (str; optional), ‘password’ (str; optional). A proxy URI (str) can also be provided, in which case these options will be derived from the provided URI. The following are equivalent:
    S3FileSystem(proxy_options='http://username:password@localhost:8020')
    S3FileSystem(proxy_options={'scheme': 'http', 'host': 'localhost', 'port': 8020, 'username': 'username', 'password': 'password'})
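The correspondence between the URI form and the dict form of proxy_options can be illustrated with the standard library alone. This is only a sketch of the mapping, not pyarrow's own parsing code; `proxy_options_from_uri` is a hypothetical helper name introduced here.

```python
from urllib.parse import urlsplit


def proxy_options_from_uri(uri):
    """Split a proxy URI into the documented option keys (hypothetical helper)."""
    parts = urlsplit(uri)
    # scheme, host and port are the required options
    options = {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}
    # username and password are optional, so add them only when present
    if parts.username is not None:
        options["username"] = parts.username
    if parts.password is not None:
        options["password"] = parts.password
    return options


derived = proxy_options_from_uri("http://username:password@localhost:8020")
print(derived)
```

Under these assumptions, `derived` carries the same five keys as the dict passed in the second call above.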
 
 - __init__(*args, **kwargs)
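The credential rules described for the constructor can be paraphrased as a small decision function. This is an illustrative sketch of the documented behavior, not pyarrow's implementation; `credential_source` is a hypothetical name, and checking role_arn before anonymous is an assumption, since the text does not say which takes precedence.

```python
def credential_source(access_key=None, secret_key=None, role_arn=None, anonymous=False):
    """Return a label for where credentials would come from (illustrative only)."""
    # access_key and secret_key only make sense as a pair
    if (access_key is None) != (secret_key is None):
        raise ValueError("access_key and secret_key must be provided together")
    if access_key is not None:
        return "explicit access key and secret key"
    if role_arn is not None:
        return "temporary credentials from STS (assumed role)"
    # Assumption: anonymous is consulted only when no keys and no role are given
    if anonymous:
        return "anonymous"
    return "AWS environment variables and/or configuration file"


print(credential_source())
print(credential_source(role_arn="arn:aws:iam::123456789012:role/example"))
```

The role ARN in the usage lines is a made-up placeholder value.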
 - Methods
- __init__(*args, **kwargs)
- copy_file(self, src, dest): Copy a file.
- create_dir(self, path, *, bool recursive=True): Create a directory and subdirectories.
- delete_dir(self, path): Delete a directory and its contents, recursively.
- delete_dir_contents(self, path, *, ...): Delete a directory's contents, recursively.
- delete_file(self, path): Delete a file.
- equals(self, FileSystem other)
- from_uri(uri): Create a new FileSystem from URI or Path.
- get_file_info(self, paths_or_selector): Get info for the given files.
- move(self, src, dest): Move / rename a file or directory.
- normalize_path(self, path): Normalize filesystem path.
- open_append_stream(self, path[, ...]): Open an output stream for appending.
- open_input_file(self, path): Open an input file for random access reading.
- open_input_stream(self, path[, compression, ...]): Open an input stream for sequential reading.
- open_output_stream(self, path[, ...]): Open an output stream for sequential writing.
 - Attributes
- region: The AWS region this filesystem connects to.
- type_name: The filesystem's type name.
 - copy_file(self, src, dest)
- Copy a file.
- If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
 - create_dir(self, path, *, bool recursive=True)
- Create a directory and subdirectories.
- This function succeeds if the directory already exists.
- Parameters
- path: str
- The path of the new directory. 
- recursive: bool, default True
- Create nested directories as well. 
 
 
 - delete_dir(self, path)
- Delete a directory and its contents, recursively.
- Parameters
- path: str
- The path of the directory to be deleted. 
 
 
 - delete_dir_contents(self, path, *, bool accept_root_dir=False)
- Delete a directory’s contents, recursively.
- Like delete_dir, but doesn’t delete the directory itself.
 - equals(self, FileSystem other)
 - static from_uri(uri)
- Create a new FileSystem from URI or Path.
- Recognized URI schemes are “file”, “mock”, “s3fs”, “hdfs” and “viewfs”. In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters
- uri: str
- URI-based path, for example: file:///some/local/path. 
 
- Returns
- tuple of (FileSystem, str path)
- A (filesystem, path) tuple, where path is the abstract path inside the FileSystem instance.
 
 
 - get_file_info(self, paths_or_selector)
- Get info for the given files.
- Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileStat object and has a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters
- paths_or_selector: FileSelector, path-like or list of path-likes
- Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, use allow_not_found. 
 
- Returns
 
 - move(self, src, dest)
- Move / rename a file or directory.
- If the destination exists:
- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).
 - normalize_path(self, path)
- Normalize filesystem path. 
 - open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)
- Open an output stream for appending.
- If the target doesn’t exist, a new empty file is created.
- Note: Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.
- Parameters
- path: str
- The path of the file to open for writing.
- compression: str, optional, default ‘detect’
- The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size: int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata: dict, optional, default None
- If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored. 
 
- Returns
- stream: NativeFile
 
 
 - open_input_file(self, path)
- Open an input file for random access reading.
- Parameters
- path: str
- The source to open for reading. 
 
- Returns
- stream: NativeFile
 
 
 - open_input_stream(self, path, compression='detect', buffer_size=None)
- Open an input stream for sequential reading.
- Parameters
- path: str
- The source to open for reading.
- compression: str, optional, default ‘detect’
- The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size: int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer. 
 
- Returns
- stream: NativeFile
 
 
 - open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)
- Open an output stream for sequential writing.
- If the target already exists, existing data is truncated.
- Parameters
- path: str
- The path of the file to open for writing.
- compression: str, optional, default ‘detect’
- The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size: int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata: dict, optional, default None
- If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored. 
 
- Returns
- stream: NativeFile
 
 
 - region
- The AWS region this filesystem connects to. 
 - type_name
- The filesystem’s type name. 
 
