pyarrow.fs.HadoopFileSystem
-
class pyarrow.fs.HadoopFileSystem
Bases: pyarrow._fs.FileSystem
HDFS-backed FileSystem implementation.
- Parameters
host (str) – HDFS host to connect to.
port (int, default 8020) – HDFS port to connect to.
replication (int, default 3) – Number of copies each block will have.
buffer_size (int, default 0) – The size of the temporary read and write buffer. If 0, no buffering will happen.
default_block_size (int, default None) – None means the default configuration for HDFS is used; a typical block size is 128 MB.
kerb_ticket (string or path, default None) – If not None, the path to the Kerberos ticket cache.
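A minimal sketch of how the parameters above might be passed to the constructor. The host name `namenode.example.com` and the helper names `hdfs_options` and `connect` are illustrative assumptions, not part of pyarrow. Actually opening the connection additionally needs a reachable cluster and a local Hadoop client library (libhdfs), so that call is left commented out.

```python
def hdfs_options(host, port=8020, replication=3, buffer_size=0,
                 default_block_size=None, kerb_ticket=None):
    """Collect the documented constructor parameters as keyword arguments.

    Hypothetical helper: the defaults mirror the parameter list above.
    """
    return {
        "host": host,
        "port": port,
        "replication": replication,
        "buffer_size": buffer_size,
        "default_block_size": default_block_size,
        "kerb_ticket": kerb_ticket,
    }


def connect(options):
    """Open the connection (hypothetical helper; needs libhdfs and a cluster)."""
    # Imported lazily so that building the options does not touch HDFS.
    from pyarrow import fs
    return fs.HadoopFileSystem(**options)


# Placeholder host; only values that differ from the defaults are passed.
options = hdfs_options("namenode.example.com", replication=2)
# hdfs = connect(options)  # run this on a machine that can reach the cluster
```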
-
__init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(*args, **kwargs) – Initialize self.
copy_file() – Copy a file.
create_dir() – Create a directory and subdirectories.
delete_dir() – Delete a directory and its contents, recursively.
delete_dir_contents() – Delete a directory’s contents, recursively.
delete_file() – Delete a file.
from_uri() – Instantiate a HadoopFileSystem object from a URI string.
get_file_info() – Get info for the given files.
move() – Move / rename a file or directory.
normalize_path() – Normalize filesystem path.
open_append_stream() – Open an output stream for appending.
open_input_file() – Open an input file for random access reading.
open_input_stream() – Open an input stream for sequential reading.
open_output_stream() – Open an output stream for sequential writing.
Attributes
type_name – The filesystem’s type name.
-
copy_file()
Copy a file.
If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters
src (str) – The path of the file to be copied from.
dest (str) – The destination path where the file is copied to.
-
create_dir()
Create a directory and subdirectories.
This function succeeds if the directory already exists.
- Parameters
path (str) – The path of the new directory.
recursive (bool, default True) – Create nested directories as well.
-
delete_dir()
Delete a directory and its contents, recursively.
- Parameters
path (str) – The path of the directory to be deleted.
-
delete_dir_contents()
Delete a directory’s contents, recursively.
Like delete_dir, but doesn’t delete the directory itself.
- Parameters
path (str) – The path of the directory whose contents are to be deleted.
accept_root_dir (bool, default False) – Allow deleting the root directory’s contents (if path is empty or “/”).
-
delete_file()
Delete a file.
- Parameters
path (str) – The path of the file to be deleted.
-
equals()
-
static from_uri()
Instantiate a HadoopFileSystem object from a URI string.
The following two calls are equivalent:
* HadoopFileSystem.from_uri('hdfs://localhost:8020/?user=test&replication=1')
* HadoopFileSystem('localhost', port=8020, user='test', replication=1)
- Parameters
uri (str) – A string URI describing the connection to HDFS. In order to change the user, replication, buffer_size or default_block_size, pass the values as query parts.
- Returns
HadoopFileSystem
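One way to assemble such a URI is to encode the options as query parts with the standard library. The helper names `hdfs_uri` and `connect_from_uri` are assumptions made for this sketch; only `HadoopFileSystem.from_uri` is the documented call, and it is left commented out because it needs a reachable cluster.

```python
from urllib.parse import urlencode


def hdfs_uri(host, port=8020, **query):
    """Build an hdfs:// URI; keyword arguments become query parts."""
    uri = f"hdfs://{host}:{port}/"
    if query:
        uri += "?" + urlencode(query)
    return uri


def connect_from_uri(uri):
    """Open the connection (hypothetical helper; needs libhdfs and a cluster)."""
    # Imported lazily so that building the URI does not touch HDFS.
    from pyarrow import fs
    return fs.HadoopFileSystem.from_uri(uri)


uri = hdfs_uri("localhost", user="test", replication=1)
# hdfs = connect_from_uri(uri)  # run this on a machine that can reach the cluster
```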
-
get_file_info()
Get info for the given files.
Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object with a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters
paths_or_selector (FileSelector, path-like or list of path-likes) – Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If the base directory doesn’t exist, set allow_not_found on the selector.
- Returns
FileInfo or list of FileInfo – Single FileInfo object is returned for a single path, otherwise a list of FileInfo objects is returned.
-
move()
Move / rename a file or directory.
If the destination exists:
- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).
- Parameters
src (str) – The path of the file or the directory to be moved.
dest (str) – The destination path where the file or directory is moved to.
-
normalize_path()
Normalize filesystem path.
- Parameters
path (str) – The path to normalize
- Returns
normalized_path (str) – The normalized path
-
open_append_stream()
Open an output stream for appending.
If the target doesn’t exist, a new empty file is created.
- Parameters
path (str) – The source to open for writing.
compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- Returns
stream (NativeFile)
-
open_input_file()
Open an input file for random access reading.
- Parameters
path (str) – The source to open for reading.
- Returns
stream (NativeFile)
-
open_input_stream()
Open an input stream for sequential reading.
- Parameters
source (str) – The source to open for reading.
compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.
- Returns
stream (NativeFile)
-
open_output_stream()
Open an output stream for sequential writing.
If the target already exists, existing data is truncated.
- Parameters
path (str) – The source to open for writing.
compression (str, optional, default 'detect') – The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
buffer_size (int, optional, default None) – If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- Returns
stream (NativeFile)
-
type_name
The filesystem’s type name.