pyarrow.fs.HadoopFileSystem

class pyarrow.fs.HadoopFileSystem(unicode host, int port=8020, unicode user=None, *, int replication=3, int buffer_size=0, default_block_size=None, kerb_ticket=None, extra_conf=None)

Bases: pyarrow._fs.FileSystem

HDFS backed FileSystem implementation

Parameters

host (str) – HDFS host to connect to.
port (int, default 8020) – HDFS port to connect to.
user (str, default None) – Username when connecting to HDFS; None implies the login user.
replication (int, default 3) – Number of copies each block will have.
buffer_size (int, default 0) – If 0, no buffering will happen; otherwise, the size of the temporary read and write buffer.
default_block_size (int, default None) – None means the default configuration for HDFS; a typical block size is 128 MB.
kerb_ticket (string or path, default None) – If not None, the path to the Kerberos ticket cache.
extra_conf (dict, default None) – Extra key/value pairs for the configuration; these override any hdfs-site.xml properties.

__init__(*args, **kwargs): Initialize self. See help(type(self)) for the accurate signature.

Methods

`__init__`(*args, **kwargs)	Initialize self.
`copy_file`(self, src, dest)	Copy a file.
`create_dir`(self, path, *, bool recursive=True)	Create a directory and subdirectories.
`delete_dir`(self, path)	Delete a directory and its contents, recursively.
`delete_dir_contents`(self, path, *, …)	Delete a directory’s contents, recursively.
`delete_file`(self, path)	Delete a file.
`equals`(self, FileSystem other)
`from_uri`(uri)	Instantiate HadoopFileSystem object from a URI string.
`get_file_info`(self, paths_or_selector)	Get info for the given files.
`move`(self, src, dest)	Move / rename a file or directory.
`normalize_path`(self, path)	Normalize filesystem path.
`open_append_stream`(self, path[, …])	Open an output stream for appending.
`open_input_file`(self, path)	Open an input file for random access reading.
`open_input_stream`(self, path[, compression, …])	Open an input stream for sequential reading.
`open_output_stream`(self, path[, …])	Open an output stream for sequential writing.

Attributes

type_name

The filesystem’s type name.

copy_file(self, src, dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Parameters

src (str) – The path of the file to be copied from.
dest (str) – The destination path where the file is copied to.

create_dir(self, path, *, bool recursive=True)

Create a directory and subdirectories.

This function succeeds if the directory already exists.

Parameters

path (str) – The path of the new directory.
recursive (bool, default True) – Create nested directories as well.

delete_dir(self, path)

Delete a directory and its contents, recursively.

Parameters: path (str) – The path of the directory to be deleted.

delete_dir_contents(self, path, *, bool accept_root_dir=False)

Delete a directory’s contents, recursively.

Like delete_dir, but doesn’t delete the directory itself.

Parameters

path (str) – The path of the directory whose contents are to be deleted.
accept_root_dir (boolean, default False) – Allow deleting the root directory’s contents (if path is empty or “/”)
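As a minimal sketch of how create_dir and delete_dir_contents can be combined, the helper below prepares an empty working directory. The name `reset_dir` and the `filesystem` argument are assumptions made for illustration; `filesystem` is expected to be a connected HadoopFileSystem (or any other pyarrow FileSystem).

```python
def reset_dir(filesystem, path):
    """Ensure `path` exists as an empty directory (hypothetical helper)."""
    # create_dir succeeds even if the directory already exists;
    # recursive=True also creates any missing parent directories.
    filesystem.create_dir(path, recursive=True)
    # Remove everything under `path` but keep the directory itself.
    # accept_root_dir stays at its default (False), so clearing the
    # root directory ("" or "/") is not allowed by this call.
    filesystem.delete_dir_contents(path)
```

Because create_dir is idempotent, the helper can be called repeatedly, for example at the start of each batch job that stages files under the same path.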

delete_file(self, path)

Delete a file.

Parameters: path (str) – The path of the file to be deleted.

equals(self, FileSystem other)

static from_uri(uri)

Instantiate HadoopFileSystem object from a URI string.

The following two calls are equivalent:

HadoopFileSystem.from_uri('hdfs://localhost:8020/?user=test&replication=1')
HadoopFileSystem('localhost', port=8020, user='test', replication=1)

Parameters: uri (str) – A string URI describing the connection to HDFS. To change the user, replication, buffer_size, or default_block_size, pass the values as query parameters.
Returns: HadoopFileSystem
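The URI form can also be assembled programmatically. The sketch below uses only the standard library to build such a string; `build_hdfs_uri` is a hypothetical helper, not part of pyarrow, and it assumes the option values need no special handling beyond URL encoding.

```python
from urllib.parse import urlencode


def build_hdfs_uri(host, port=8020, **options):
    """Build an hdfs:// URI with connection options as query parameters."""
    # Drop options left as None so the filesystem defaults apply.
    query = urlencode({k: v for k, v in options.items() if v is not None})
    uri = f"hdfs://{host}:{port}/"
    return f"{uri}?{query}" if query else uri


uri = build_hdfs_uri("localhost", user="test", replication=1)
print(uri)  # hdfs://localhost:8020/?user=test&replication=1
```

The resulting string can then be passed to HadoopFileSystem.from_uri(uri), which requires a reachable HDFS cluster and a working libhdfs installation.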

get_file_info(self, paths_or_selector)

Get info for the given files.

Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object with a FileType of NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).

Parameters: paths_or_selector (FileSelector, path-like or list of path-likes) – Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If it might not exist, set allow_not_found on the selector.
Returns: FileInfo or list of FileInfo – Single FileInfo object is returned for a single path, otherwise a list of FileInfo objects is returned.
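Since missing paths are reported through the returned objects rather than through exceptions, an existence check can be written as a simple filter. The following is a sketch only; `existing_files` is a hypothetical helper and `filesystem` is assumed to be a connected HadoopFileSystem (or any other pyarrow FileSystem).

```python
def existing_files(filesystem, paths):
    """Return the paths that refer to regular files (hypothetical helper)."""
    # Passing a list yields a list with one FileInfo per requested path.
    infos = filesystem.get_file_info(list(paths))
    # A missing path does not raise; its FileInfo reports a NotFound
    # type, so is_file is False for it (and for directories).
    return [info.path for info in infos if info.is_file]
```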

move(self, src, dest)

Move / rename a file or directory.

If the destination exists:

- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).

Parameters

src (str) – The path of the file or the directory to be moved.
dest (str) – The destination path where the file or directory is moved to.
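One common use of move is to write data under a temporary name and rename it once the write has finished, so the final name only appears after the stream has been closed. The sketch below illustrates that pattern; `publish` and the `.tmp` suffix are assumptions made for illustration, and `filesystem` is expected to be a connected HadoopFileSystem (or any other pyarrow FileSystem).

```python
def publish(filesystem, payload, final_path, tmp_suffix=".tmp"):
    """Write `payload` to a temporary path, then move it into place."""
    tmp_path = final_path + tmp_suffix
    # compression=None writes the bytes unchanged, regardless of extension.
    with filesystem.open_output_stream(tmp_path, compression=None) as out:
        out.write(payload)
    # The stream is closed here, so the temporary file is complete.
    filesystem.move(tmp_path, final_path)
    return final_path
```

If `final_path` already exists, the outcome follows the rules listed above for an existing destination.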

normalize_path(self, path)

Normalize filesystem path.

Parameters: path (str) – The path to normalize
Returns: normalized_path (str) – The normalized path

open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

Parameters

path (str) – The source to open for writing.
compression (str optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
metadata (dict optional, default None) – If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns

stream (NativeFile)

open_input_file(self, path)

Open an input file for random access reading.

Parameters: path (str) – The source to open for reading.
Returns: stream (NativeFile)
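Random access means the returned file can be positioned with seek before reading, which is useful when only part of a file is needed. The sketch below reads the end of a file; `read_tail` is a hypothetical helper and `filesystem` is assumed to be a connected HadoopFileSystem (or any other pyarrow FileSystem).

```python
def read_tail(filesystem, path, nbytes):
    """Read up to the last `nbytes` bytes of a file (hypothetical helper)."""
    with filesystem.open_input_file(path) as f:
        # size() reports the total file length in bytes.
        start = max(f.size() - nbytes, 0)
        # Jump to the offset instead of reading from the beginning.
        f.seek(start)
        # With no argument, read() returns everything up to the end.
        return f.read()
```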

open_input_stream(self, path, compression='detect', buffer_size=None)

Open an input stream for sequential reading.

Parameters

path (str) – The source to open for reading.
compression (str optional, default 'detect') – The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.

Returns

stream (NativeFile)

open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Parameters

path (str) – The source to open for writing.
compression (str optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
metadata (dict optional, default None) – If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns

stream (NativeFile)
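The output and input streams can be paired for a simple write and read-back round trip. This is a minimal sketch: `write_bytes` and `read_bytes` are hypothetical helpers, `filesystem` is assumed to be a connected HadoopFileSystem (or any other pyarrow FileSystem), and both helpers default to compression=None so the bytes are stored unchanged.

```python
def write_bytes(filesystem, path, payload, compression=None):
    """Write `payload` to `path`, truncating any existing file."""
    with filesystem.open_output_stream(path, compression=compression) as out:
        out.write(payload)


def read_bytes(filesystem, path, compression=None):
    """Read the whole content of `path` sequentially."""
    with filesystem.open_input_stream(path, compression=compression) as src:
        return src.read()
```

Passing compression='detect' to both helpers instead would let the file extension (for example `.gz`) select the codec on write and on read.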

type_name: The filesystem’s type name.
