pyarrow.fs.S3FileSystem
- class pyarrow.fs.S3FileSystem
- Bases: pyarrow._fs.FileSystem
- S3-backed FileSystem implementation.
- If neither access_key nor secret_key is provided, and role_arn is also not provided, the filesystem attempts to initialize from AWS environment variables; otherwise both access_key and secret_key must be provided.
- If role_arn is provided instead of access_key and secret_key, temporary credentials will be fetched by issuing a request to STS to assume the specified role.
- Note: S3 buckets are special and the operations available on them may be limited or more expensive than desired.
- Parameters
- access_key (str, default None) – AWS Access Key ID. Pass None to use the standard AWS environment variables and/or configuration file. 
- secret_key (str, default None) – AWS Secret Access key. Pass None to use the standard AWS environment variables and/or configuration file. 
- session_token (str, default None) – AWS Session Token. An optional session token, required if access_key and secret_key are temporary credentials from STS. 
- anonymous (boolean, default False) – Whether to connect anonymously if access_key and secret_key are None. If true, will not attempt to look up credentials using standard AWS configuration methods. 
- role_arn (str, default None) – AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. 
- session_name (str, default None) – An optional identifier for the assumed role session. 
- external_id (str, default None) – An optional unique identifier that might be required when you assume a role in another account. 
- load_frequency (int, default 900) – The frequency (in seconds) with which temporary credentials from an assumed role session will be refreshed. 
- region (str, default 'us-east-1') – AWS region to connect to. 
- scheme (str, default 'https') – S3 connection transport scheme. 
- endpoint_override (str, default None) – Override region with a connect string such as “localhost:9000”. 
- background_writes (boolean, default True) – Whether OutputStream writes will be issued in the background, without blocking. 
- proxy_options (dict or str, default None) – If a proxy is used, provide the options here. Supported options are: ‘scheme’ (str: ‘http’ or ‘https’; required), ‘host’ (str; required), ‘port’ (int; required), ‘username’ (str; optional), ‘password’ (str; optional). A proxy URI (str) can also be provided, in which case these options will be derived from the provided URI. The following are equivalent:
  S3FileSystem(proxy_options='http://username:password@localhost:8020')
  S3FileSystem(proxy_options={'scheme': 'http', 'host': 'localhost', 'port': 8020, 'username': 'username', 'password': 'password'})
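To make the URI-to-options equivalence concrete, here is a minimal sketch that derives the documented option keys from a proxy URI using only the standard library. The helper name proxy_uri_to_options is hypothetical, and this is not necessarily how pyarrow itself parses the URI; it only illustrates the mapping described above.

```python
from urllib.parse import urlsplit


def proxy_uri_to_options(uri: str) -> dict:
    """Hypothetical helper: split a proxy URI into the documented option keys."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    if parts.hostname is None or parts.port is None:
        raise ValueError("host and port are required")
    options = {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}
    # username and password are optional, so add them only when present
    if parts.username is not None:
        options["username"] = parts.username
    if parts.password is not None:
        options["password"] = parts.password
    return options


options = proxy_uri_to_options("http://username:password@localhost:8020")
print(options)
```

The resulting dictionary has the same shape as the dict form shown in the parameter description, so either form could be passed as proxy_options.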
 
 - __init__(*args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature. 
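The credential rules in the class description can be read as a small decision procedure. The sketch below spells out one such reading in plain Python; the function credential_mode and its return labels are hypothetical, and it does not reproduce the library's actual validation. In particular, the text does not say what happens for conflicting combinations (for example anonymous together with explicit keys), so the sketch simply lets explicit keys win.

```python
from typing import Optional


def credential_mode(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    role_arn: Optional[str] = None,
    anonymous: bool = False,
) -> str:
    """Hypothetical helper: name the credential source the documentation describes."""
    if (access_key is None) != (secret_key is None):
        # Explicit credentials only work as a pair.
        raise ValueError("access_key and secret_key must be provided together")
    if access_key is not None:
        return "explicit keys"
    if anonymous:
        # No lookup through the standard AWS configuration methods.
        return "anonymous"
    if role_arn is not None:
        # Temporary credentials are fetched from STS by assuming the role.
        return "assume role"
    # Nothing supplied: fall back to AWS environment variables / config file.
    return "environment"


print(credential_mode())
print(credential_mode(access_key="AKIA-EXAMPLE", secret_key="example-secret"))
print(credential_mode(role_arn="arn:aws:iam::123456789012:role/example"))
print(credential_mode(anonymous=True))
```

The key and ARN values are placeholders for illustration only.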
 - Methods
- __init__(*args, **kwargs) – Initialize self.
- copy_file() – Copy a file.
- create_dir() – Create a directory and subdirectories.
- delete_dir() – Delete a directory and its contents, recursively.
- delete_dir_contents() – Delete a directory’s contents, recursively.
- delete_file() – Delete a file.
- from_uri() – Create a new FileSystem from URI or Path.
- get_file_info() – Get info for the given files.
- move() – Move / rename a file or directory.
- normalize_path() – Normalize filesystem path.
- open_append_stream() – Open an output stream for appending.
- open_input_file() – Open an input file for random access reading.
- open_input_stream() – Open an input stream for sequential reading.
- open_output_stream() – Open an output stream for sequential writing.
 - Attributes
- region – The AWS region this filesystem connects to.
- type_name – The filesystem’s type name.
 - copy_file()
- Copy a file.
- If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters
- src (str) – The path of the file to be copied from. 
- dest (str) – The destination path where the file is copied to. 
 
 
 - create_dir()
- Create a directory and subdirectories.
- This function succeeds if the directory already exists.
- Parameters
- path (str) – The path of the new directory. 
- recursive (bool, default True) – Create nested directories as well. 
 
 
 - delete_dir()
- Delete a directory and its contents, recursively.
- Parameters
- path (str) – The path of the directory to be deleted. 
 
 - delete_dir_contents()
- Delete a directory’s contents, recursively.
- Like delete_dir, but doesn’t delete the directory itself.
- Parameters
- path (str) – The path of the directory whose contents will be deleted. 
- accept_root_dir (boolean, default False) – Allow deleting the root directory’s contents (if path is empty or “/”). 
 
 
 - delete_file()
- Delete a file.
- Parameters
- path (str) – The path of the file to be deleted. 
 
 - equals()
 - static from_uri()
- Create a new FileSystem from URI or Path.
- Recognized URI schemes are “file”, “mock”, “s3fs”, “hdfs” and “viewfs”. In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters
- uri (string) – URI-based path, for example: file:///some/local/path. 
- Returns
- A (filesystem, path) tuple, where path is the abstract path inside the FileSystem instance. 
 
 
 - get_file_info()
- Get info for the given files.
- Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object with a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters
- paths_or_selector (FileSelector, path-like or list of path-likes) – Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, use the selector’s allow_not_found option. 
- Returns
- FileInfo or list of FileInfo – Single FileInfo object is returned for a single path, otherwise a list of FileInfo objects is returned. 
 
 - move()
- Move / rename a file or directory.
- If the destination exists:
  - if it is a non-empty directory, an error is returned;
  - otherwise, if it has the same type as the source, it is replaced;
  - otherwise, behavior is unspecified (implementation-dependent).
- Parameters
- src (str) – The path of the file or the directory to be moved. 
- dest (str) – The destination path where the file or directory is moved to. 
 
 
 - normalize_path()
- Normalize filesystem path.
- Parameters
- path (str) – The path to normalize. 
- Returns
- normalized_path (str) – The normalized path. 
 
 - open_append_stream()
- Open an output stream for appending.
- If the target doesn’t exist, a new empty file is created.
- Parameters
- path (str) – The source to open for writing. 
- compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”). 
- buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise, the size of the temporary write buffer. 
 
- Returns
- stream (NativeFile) 
 
 - open_input_file()
- Open an input file for random access reading.
- Parameters
- path (str) – The source to open for reading. 
- Returns
- stream (NativeFile) 
 
 - open_input_stream()
- Open an input stream for sequential reading.
- Parameters
- source (str) – The source to open for reading. 
- compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”). 
- buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise, the size of the temporary read buffer. 
 
- Returns
- stream (NativeFile) 
 
 - open_output_stream()
- Open an output stream for sequential writing.
- If the target already exists, existing data is truncated.
- Parameters
- path (str) – The source to open for writing. 
- compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”). 
- buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise, the size of the temporary write buffer. 
 
- Returns
- stream (NativeFile) 
 
 - region
- The AWS region this filesystem connects to. 
 - type_name
- The filesystem’s type name. 
 
