Filesystems¶
Interface¶
- 
enum arrow::fs::FileType¶
- FileSystem entry type. - Values: - 
enumerator NotFound¶
- Entry is not found. 
 - 
enumerator Unknown¶
- Entry exists but its type is unknown. - This can designate a special file such as a Unix socket or character device, or Windows NUL / CON / … 
 - 
enumerator File¶
- Entry is a regular file. 
 - 
enumerator Directory¶
- Entry is a directory. 
 
- 
struct arrow::fs::FileInfo: public arrow::util::EqualityComparable<FileInfo>¶
- FileSystem entry info. - Public Functions - 
const std::string &path() const¶
- The full file path in the filesystem. 
 - 
std::string base_name() const¶
- The file base name (the component after the last directory separator). 
 - 
int64_t size() const¶
- The size in bytes, if available. - Only regular files are guaranteed to have a size. 
 - 
std::string extension() const¶
- The file extension (excluding the dot). 
 - 
TimePoint mtime() const¶
- The time of last modification, if available. 
 - 
struct ByPath¶
- Function object implementing less-than comparison and hashing by path, to support sorting infos, using them as keys, and other interactions with the STL. 
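The descriptions of base_name() and extension() above can be illustrated with a small standalone sketch. It does not call Arrow; `BaseName` and `Extension` are hypothetical helpers that only mirror the documented wording ("component after the last directory separator", "excluding the dot"), assuming `/` is the separator. Edge cases such as dot-files may be handled differently by the library.

```cpp
#include <string>

// Hypothetical helper (not part of Arrow): the component after the last '/'.
std::string BaseName(const std::string& path) {
  const auto pos = path.find_last_of('/');
  if (pos == std::string::npos) {
    return path;  // no separator: the whole path is the base name
  }
  return path.substr(pos + 1);
}

// Hypothetical helper (not part of Arrow): the text after the last '.' of
// the base name, or an empty string when the base name contains no dot.
std::string Extension(const std::string& path) {
  const std::string base = BaseName(path);
  const auto pos = base.find_last_of('.');
  if (pos == std::string::npos) {
    return std::string();
  }
  return base.substr(pos + 1);
}
```

Under these assumptions, `BaseName("data/2020/part-0.parquet")` yields `part-0.parquet` and `Extension` of the same path yields `parquet`; a dot in a directory name is ignored because only the base name is inspected.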
 
- 
struct arrow::fs::FileSelector¶
- File selector for filesystem APIs. - Public Members - 
std::string base_dir¶
- The directory in which to select files. - If the path exists but doesn’t point to a directory, this should be an error. 
 - 
bool allow_not_found¶
- The behavior if base_dir isn’t found in the filesystem. - If false, an error is returned. If true, an empty selection is returned. 
 - 
bool recursive¶
- Whether to recurse into subdirectories. 
 - 
int32_t max_recursion¶
- The maximum number of subdirectories to recurse into. 
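To make the interaction of the four selector fields concrete, here is a minimal in-memory model. It is only a sketch: `Selector`, `Tree` and `Select` are hypothetical names, the filesystem is a map from path to an is-directory flag, `std::nullopt` stands in for an error, and max_recursion is read as a depth limit below base_dir (one plausible reading of the description above, not confirmed behavior of any implementation).

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the FileSelector fields documented above.
struct Selector {
  std::string base_dir;
  bool allow_not_found = false;
  bool recursive = false;
  std::int32_t max_recursion = std::numeric_limits<std::int32_t>::max();
};

// Toy filesystem: path -> true if the entry is a directory.
using Tree = std::map<std::string, bool>;

// Returns the selected paths, or std::nullopt to stand in for an error.
std::optional<std::vector<std::string>> Select(const Tree& tree,
                                               const Selector& sel) {
  const auto base = tree.find(sel.base_dir);
  if (base == tree.end()) {
    if (sel.allow_not_found) return std::vector<std::string>{};
    return std::nullopt;  // base_dir missing and not tolerated
  }
  if (!base->second) return std::nullopt;  // exists, but not a directory

  const std::string prefix = sel.base_dir + "/";
  const std::int64_t limit = sel.recursive ? sel.max_recursion : 0;
  std::vector<std::string> out;
  for (const auto& entry : tree) {
    const std::string& path = entry.first;
    if (path.compare(0, prefix.size(), prefix) != 0) continue;
    // Depth 0 means a direct child of base_dir.
    std::int64_t depth = 0;
    for (std::size_t i = prefix.size(); i < path.size(); ++i) {
      if (path[i] == '/') ++depth;
    }
    if (depth <= limit) out.push_back(path);
  }
  return out;  // base_dir itself is never part of the result
}
```

In this model a non-recursive selection returns only the direct children of base_dir, and setting recursive with max_recursion = 1 additionally returns the children of those subdirectories.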
 
- 
class arrow::fs::FileSystem: public std::enable_shared_from_this<FileSystem>¶
- Abstract file system API. - Subclassed by arrow::fs::HadoopFileSystem, arrow::fs::internal::MockFileSystem, arrow::fs::LocalFileSystem, arrow::fs::S3FileSystem, arrow::fs::SlowFileSystem, arrow::fs::SubTreeFileSystem, arrow::py::fs::PyFileSystem - Public Functions - 
Result<std::string> NormalizePath(std::string path)¶
- Normalize path for the given filesystem. - The default implementation of this method is a no-op, but subclasses may allow normalizing irregular path forms (such as Windows local paths). 
 - 
Result<FileInfo> GetFileInfo(const std::string &path) = 0¶
- Get info for the given target. - Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file returns an Ok status and has a FileType of value NotFound. An error status indicates a truly exceptional condition (low-level I/O error, etc.). 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const std::vector<std::string> &paths)¶
- Same, for many targets at once. 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const FileSelector &select) = 0¶
- Same, according to a selector. - The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see FileSelector::allow_not_found.
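The single-path GetFileInfo contract above (a missing entry is a normal result, not an error) can be mimicked with the C++17 standard library alone. The sketch below does not use Arrow: `EntryType` and `Classify` are hypothetical names, `std::nullopt` stands in for an error status, and `std::filesystem::status` is used because it follows symlinks.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical enum mirroring the four FileType values documented above.
enum class EntryType { NotFound, Unknown, File, Directory };

// Returns the entry type, or std::nullopt to stand in for an error status.
std::optional<EntryType> Classify(const std::string& path) {
  std::error_code ec;
  // status() dereferences symlinks; the non-throwing overload reports
  // failures through `ec` and through the returned file type.
  const std::filesystem::file_status st = std::filesystem::status(path, ec);
  switch (st.type()) {
    case std::filesystem::file_type::none:
      return std::nullopt;  // attributes could not be determined: an error
    case std::filesystem::file_type::not_found:
      return EntryType::NotFound;  // a normal, non-error outcome
    case std::filesystem::file_type::regular:
      return EntryType::File;
    case std::filesystem::file_type::directory:
      return EntryType::Directory;
    default:
      return EntryType::Unknown;  // sockets, devices, and similar entries
  }
}
```

The point of the design is that callers can probe for existence without error handling: only the `std::nullopt` branch corresponds to the "truly exceptional condition" case.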
 - 
Status CreateDir(const std::string &path, bool recursive = true) = 0¶
- Create a directory and subdirectories. - This function succeeds if the directory already exists. 
 - 
Status DeleteDirContents(const std::string &path) = 0¶
- Delete a directory’s contents, recursively. - Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“” or “/”) is disallowed, see DeleteRootDirContents. 
 - 
Status DeleteRootDirContents() = 0¶
- EXPERIMENTAL: Delete the root directory’s contents, recursively. - Implementations may decide to raise an error if this operation is too dangerous. 
 - 
Status DeleteFiles(const std::vector<std::string> &paths)¶
- Delete many files. - The default implementation issues individual delete operations in sequence. 
 - 
Status Move(const std::string &src, const std::string &dest) = 0¶
- Move / rename a file or directory. - If the destination exists: - if it is a non-empty directory, an error is returned 
- otherwise, if it has the same type as the source, it is replaced 
- otherwise, behavior is unspecified (implementation-dependent). 
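The three destination rules for Move listed above can be written down as a small decision function. This is a sketch of the documented rules only, not of any implementation; `Kind`, `MoveOutcome` and `ClassifyMove` are hypothetical names, and the source is assumed to exist.

```cpp
#include <optional>

// Hypothetical types used only for this illustration.
enum class Kind { File, Directory };
enum class MoveOutcome { PlainMove, Error, Replaces, Unspecified };

// `dest` is empty when the destination does not exist. `dest_is_empty`
// is only meaningful when the destination is a directory.
MoveOutcome ClassifyMove(Kind src, std::optional<Kind> dest,
                         bool dest_is_empty) {
  if (!dest.has_value()) {
    return MoveOutcome::PlainMove;  // the rules apply only to existing destinations
  }
  if (*dest == Kind::Directory && !dest_is_empty) {
    return MoveOutcome::Error;  // non-empty directory: an error is returned
  }
  if (*dest == src) {
    return MoveOutcome::Replaces;  // same type as the source: replaced
  }
  return MoveOutcome::Unspecified;  // implementation-dependent
}
```

Note the ordering: the non-empty directory check comes first, so moving a directory onto a non-empty directory is an error even though the types match.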
 
 - 
Status CopyFile(const std::string &src, const std::string &dest) = 0¶
- Copy a file. - If the destination exists and is a directory, an error is returned. Otherwise, it is replaced. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const std::string &path) = 0¶
- Open an input stream for sequential reading. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const FileInfo &info)¶
- Open an input stream for sequential reading. - This overload assumes the given FileInfo validly represents the file’s characteristics, and may optimize access depending on them (for example avoid querying the file size or its existence). 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const std::string &path) = 0¶
- Open an input file for random access reading. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const FileInfo &info)¶
- Open an input file for random access reading. - This overload assumes the given FileInfo validly represents the file’s characteristics, and may optimize access depending on them (for example avoid querying the file size or its existence). 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenOutputStream(const std::string &path) = 0¶
- Open an output stream for sequential writing. - If the target already exists, existing data is truncated. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(const std::string &path) = 0¶
- Open an output stream for appending. - If the target doesn’t exist, a new empty file is created. 
 
High-level factory function¶
- 
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string &uri, std::string *out_path = NULLPTR)¶
- Create a new FileSystem by URI. - Recognized schemes are “file”, “mock”, “hdfs” and “s3fs”.
- Parameters
- [in] uri: a URI-based path, e.g. file:///some/local/path
- [out] out_path: (optional) the path inside the filesystem.
- Returns
- The FileSystem instance.
 
 
- 
Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(const std::string &uri, std::string *out_path = NULLPTR)¶
- Create a new FileSystem by URI. - Same as FileSystemFromUri, but also recognizes non-URIs and treats them as local filesystem paths. Only absolute local filesystem paths are allowed. 
Concrete implementations¶
- 
class arrow::fs::SubTreeFileSystem: public arrow::fs::FileSystem¶
- A FileSystem implementation that delegates to another implementation after prepending a fixed base path. - This is useful to expose a logical view of a subtree of a filesystem, for example a directory in a LocalFileSystem. This works on abstract paths, i.e. paths using forward slashes and a single root “/”. Windows paths are not guaranteed to work. This makes no security guarantee. For example, symlinks may allow “escaping” the subtree and accessing other parts of the underlying filesystem. - Public Functions - 
Result<std::string> NormalizePath(std::string path) override¶
- Normalize path for the given filesystem. - The default implementation of this method is a no-op, but subclasses may allow normalizing irregular path forms (such as Windows local paths). 
 - 
Result<FileInfo> GetFileInfo(const std::string &path) override¶
- Get info for the given target. - Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file returns an Ok status and has a FileType of value NotFound. An error status indicates a truly exceptional condition (low-level I/O error, etc.). 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const FileSelector &select) override¶
- Same, according to a selector. - The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see FileSelector::allow_not_found.
 - 
Status CreateDir(const std::string &path, bool recursive = true) override¶
- Create a directory and subdirectories. - This function succeeds if the directory already exists. 
 - 
Status DeleteDir(const std::string &path) override¶
- Delete a directory and its contents, recursively. 
 - 
Status DeleteDirContents(const std::string &path) override¶
- Delete a directory’s contents, recursively. - Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“” or “/”) is disallowed, see DeleteRootDirContents. 
 - 
Status DeleteRootDirContents() override¶
- EXPERIMENTAL: Delete the root directory’s contents, recursively. - Implementations may decide to raise an error if this operation is too dangerous. 
 - 
Status Move(const std::string &src, const std::string &dest) override¶
- Move / rename a file or directory. - If the destination exists: - if it is a non-empty directory, an error is returned 
- otherwise, if it has the same type as the source, it is replaced 
- otherwise, behavior is unspecified (implementation-dependent). 
 
 - 
Status CopyFile(const std::string &src, const std::string &dest) override¶
- Copy a file. - If the destination exists and is a directory, an error is returned. Otherwise, it is replaced. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const std::string &path) override¶
- Open an input stream for sequential reading. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const FileInfo &info) override¶
- Open an input stream for sequential reading. - This overload assumes the given FileInfo validly represents the file’s characteristics, and may optimize access depending on them (for example avoid querying the file size or its existence). 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const std::string &path) override¶
- Open an input file for random access reading. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const FileInfo &info) override¶
- Open an input file for random access reading. - This overload assumes the given FileInfo validly represents the file’s characteristics, and may optimize access depending on them (for example avoid querying the file size or its existence). 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenOutputStream(const std::string &path) override¶
- Open an output stream for sequential writing. - If the target already exists, existing data is truncated. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(const std::string &path) override¶
- Open an output stream for appending. - If the target doesn’t exist, a new empty file is created. 
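The core idea of SubTreeFileSystem, prepending a fixed base path to every abstract path before delegating, can be sketched without Arrow. `PrependBase` below is a hypothetical helper written for this illustration; it assumes forward-slash paths and does nothing about symlinks, which is consistent with the class making no security guarantee.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Arrow's code): map a path inside the subtree to
// a path in the underlying filesystem by prepending a fixed base path.
std::string PrependBase(const std::string& base_path, const std::string& path) {
  // Drop trailing slashes from the base, but keep a lone root "/".
  std::string base = base_path;
  while (base.size() > 1 && base.back() == '/') {
    base.pop_back();
  }
  // Drop leading slashes from the path so the join has a single separator.
  std::size_t start = 0;
  while (start < path.size() && path[start] == '/') {
    ++start;
  }
  const std::string rel = path.substr(start);

  if (rel.empty()) return base;   // the subtree root maps to the base itself
  if (base.empty()) return rel;   // no base: nothing to prepend
  if (base == "/") return base + rel;
  return base + "/" + rel;
}
```

With a base of `/data/project`, the subtree path `raw/a.csv` maps to `/data/project/raw/a.csv`, and the empty path maps to the base directory itself.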
 
- 
struct arrow::fs::LocalFileSystemOptions¶
- Options for the LocalFileSystem implementation. - Public Members - 
bool use_mmap = false¶
- Whether OpenInputStream and OpenInputFile return a mmap’ed file, or a regular one. 
 - Public Static Functions - 
LocalFileSystemOptions Defaults()¶
- Initialize with defaults. 
 
- 
class arrow::fs::LocalFileSystem: public arrow::fs::FileSystem¶
- A FileSystem implementation accessing files on the local machine. - This class handles only /-separated paths. If desired, conversion from Windows backslash-separated paths should be done by the caller. Details such as symlinks are abstracted away (symlinks are always followed, except when deleting an entry). - Public Functions - 
Result<std::string> NormalizePath(std::string path) override¶
- Normalize path for the given filesystem. - The default implementation of this method is a no-op, but subclasses may allow normalizing irregular path forms (such as Windows local paths). 
 - 
Result<FileInfo> GetFileInfo(const std::string &path) override¶
- Get info for the given target. - Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file returns an Ok status and has a FileType of value NotFound. An error status indicates a truly exceptional condition (low-level I/O error, etc.). 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const FileSelector &select) override¶
- Same, according to a selector. - The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see FileSelector::allow_not_found.
 - 
Status CreateDir(const std::string &path, bool recursive = true) override¶
- Create a directory and subdirectories. - This function succeeds if the directory already exists. 
 - 
Status DeleteDir(const std::string &path) override¶
- Delete a directory and its contents, recursively. 
 - 
Status DeleteDirContents(const std::string &path) override¶
- Delete a directory’s contents, recursively. - Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“” or “/”) is disallowed, see DeleteRootDirContents. 
 - 
Status DeleteRootDirContents() override¶
- EXPERIMENTAL: Delete the root directory’s contents, recursively. - Implementations may decide to raise an error if this operation is too dangerous. 
 - 
Status Move(const std::string &src, const std::string &dest) override¶
- Move / rename a file or directory. - If the destination exists: - if it is a non-empty directory, an error is returned 
- otherwise, if it has the same type as the source, it is replaced 
- otherwise, behavior is unspecified (implementation-dependent). 
 
 - 
Status CopyFile(const std::string &src, const std::string &dest) override¶
- Copy a file. - If the destination exists and is a directory, an error is returned. Otherwise, it is replaced. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const std::string &path) override¶
- Open an input stream for sequential reading. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const std::string &path) override¶
- Open an input file for random access reading. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenOutputStream(const std::string &path) override¶
- Open an output stream for sequential writing. - If the target already exists, existing data is truncated. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(const std::string &path) override¶
- Open an output stream for appending. - If the target doesn’t exist, a new empty file is created. 
 
- 
struct arrow::fs::S3Options¶
- Options for the S3FileSystem implementation. - Public Functions - 
void ConfigureDefaultCredentials()¶
- Configure with the default AWS credentials provider chain. 
 - 
void ConfigureAnonymousCredentials()¶
- Configure with anonymous credentials. This will only let you access public buckets. 
 - 
void ConfigureAccessKey(const std::string &access_key, const std::string &secret_key, const std::string &session_token = "")¶
- Configure with explicit access and secret key. 
 - Configure with credentials from an assumed role. 
 - Public Members - 
std::string region¶
- AWS region to connect to. - If unset, the AWS SDK will choose a default value. The exact algorithm depends on the SDK version. Before 1.8, the default is hardcoded to “us-east-1”. Since 1.8, several heuristics are used to determine the region (environment variables, configuration profile, EC2 metadata server). 
 - 
std::string endpoint_override¶
- If non-empty, override region with a connect string such as “localhost:9000”. 
 - 
std::string scheme = "https"¶
- S3 connection transport, default “https”. 
 - 
std::string role_arn¶
- ARN of role to assume. 
 - 
std::string session_name¶
- Optional identifier for an assumed role session. 
 - 
std::string external_id¶
- Optional external identifier to pass to STS when assuming a role. 
 - 
int load_frequency¶
- Frequency (in seconds) to refresh temporary credentials from assumed role. 
 - 
std::shared_ptr<Aws::Auth::AWSCredentialsProvider> credentials_provider¶
- AWS credentials provider. 
 - 
bool background_writes = true¶
- Whether OutputStream writes will be issued in the background, without blocking. 
 - Public Static Functions - 
S3Options Defaults()¶
- Initialize with default credentials provider chain. - This is recommended if you use the standard AWS environment variables and/or configuration file. 
 - 
S3Options Anonymous()¶
- Initialize with anonymous credentials. - This will only let you access public buckets. 
 - 
S3Options FromAccessKey(const std::string &access_key, const std::string &secret_key, const std::string &session_token = "")¶
- Initialize with explicit access and secret key. - Optionally, a session token may also be provided for temporary credentials (from STS). 
 - Initialize from an assumed role. 
 
- 
class arrow::fs::S3FileSystem: public arrow::fs::FileSystem¶
- S3-backed FileSystem implementation. - Some implementation notes: - buckets are special and the operations available on them may be limited or more expensive than desired. 
 - Public Functions - 
std::string region() const¶
- Return the actual region this filesystem connects to. 
 - 
Result<FileInfo> GetFileInfo(const std::string &path) override¶
- Get info for the given target. - Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file returns an Ok status and has a FileType of value NotFound. An error status indicates a truly exceptional condition (low-level I/O error, etc.). 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const FileSelector &select) override¶
- Same, according to a selector. - The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see FileSelector::allow_not_found.
 - 
Status CreateDir(const std::string &path, bool recursive = true) override¶
- Create a directory and subdirectories. - This function succeeds if the directory already exists. 
 - 
Status DeleteDir(const std::string &path) override¶
- Delete a directory and its contents, recursively. 
 - 
Status DeleteDirContents(const std::string &path) override¶
- Delete a directory’s contents, recursively. - Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“” or “/”) is disallowed, see DeleteRootDirContents. 
 - 
Status DeleteRootDirContents() override¶
- EXPERIMENTAL: Delete the root directory’s contents, recursively. - Implementations may decide to raise an error if this operation is too dangerous. 
 - 
Status Move(const std::string &src, const std::string &dest) override¶
- Move / rename a file or directory. - If the destination exists: - if it is a non-empty directory, an error is returned 
- otherwise, if it has the same type as the source, it is replaced 
- otherwise, behavior is unspecified (implementation-dependent). 
 
 - 
Status CopyFile(const std::string &src, const std::string &dest) override¶
- Copy a file. - If the destination exists and is a directory, an error is returned. Otherwise, it is replaced. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const std::string &path) override¶
- Create a sequential input stream for reading from an S3 object. - NOTE: Reads from the stream will be synchronous and unbuffered. You may want to wrap the stream in a BufferedInputStream or use a custom readahead strategy to avoid idle waits. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const FileInfo &info) override¶
- Create a sequential input stream for reading from an S3 object. - This overload avoids a HEAD request by assuming the FileInfo contains correct information. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const std::string &path) override¶
- Create a random access file for reading from an S3 object. - See OpenInputStream for performance notes. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const FileInfo &info) override¶
- Create a random access file for reading from an S3 object. - This overload avoids a HEAD request by assuming the FileInfo contains correct information. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenOutputStream(const std::string &path) override¶
- Create a sequential output stream for writing to an S3 object. - NOTE: Writes to the stream will be buffered. Depending on S3Options.background_writes, they can be synchronous or not. It is recommended to enable background_writes unless you prefer implementing your own background execution strategy. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(const std::string &path) override¶
- Open an output stream for appending. - If the target doesn’t exist, a new empty file is created. 
 - Public Static Functions - 
Result<std::shared_ptr<S3FileSystem>> Make(const S3Options &options)¶
- Create an S3FileSystem instance from the given options. 
 
- 
struct arrow::fs::HdfsOptions¶
- Options for the HDFS implementation. 
- 
class arrow::fs::HadoopFileSystem: public arrow::fs::FileSystem¶
- HDFS-backed FileSystem implementation. - Implementation notes: - This is a wrapper around arrow/io/hdfs, so that HDFS can be accessed through the FileSystem API. 
 - Public Functions - 
Result<FileInfo> GetFileInfo(const std::string &path) override¶
- Get info for the given target. - Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file returns an Ok status and has a FileType of value NotFound. An error status indicates a truly exceptional condition (low-level I/O error, etc.). 
 - 
Result<std::vector<FileInfo>> GetFileInfo(const FileSelector &select) override¶
- Same, according to a selector. - The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see FileSelector::allow_not_found.
 - 
Status CreateDir(const std::string &path, bool recursive = true) override¶
- Create a directory and subdirectories. - This function succeeds if the directory already exists. 
 - 
Status DeleteDir(const std::string &path) override¶
- Delete a directory and its contents, recursively. 
 - 
Status DeleteDirContents(const std::string &path) override¶
- Delete a directory’s contents, recursively. - Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“” or “/”) is disallowed, see DeleteRootDirContents. 
 - 
Status DeleteRootDirContents() override¶
- EXPERIMENTAL: Delete the root directory’s contents, recursively. - Implementations may decide to raise an error if this operation is too dangerous. 
 - 
Status Move(const std::string &src, const std::string &dest) override¶
- Move / rename a file or directory. - If the destination exists: - if it is a non-empty directory, an error is returned 
- otherwise, if it has the same type as the source, it is replaced 
- otherwise, behavior is unspecified (implementation-dependent). 
 
 - 
Status CopyFile(const std::string &src, const std::string &dest) override¶
- Copy a file. - If the destination exists and is a directory, an error is returned. Otherwise, it is replaced. 
 - 
Result<std::shared_ptr<io::InputStream>> OpenInputStream(const std::string &path) override¶
- Open an input stream for sequential reading. 
 - 
Result<std::shared_ptr<io::RandomAccessFile>> OpenInputFile(const std::string &path) override¶
- Open an input file for random access reading. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenOutputStream(const std::string &path) override¶
- Open an output stream for sequential writing. - If the target already exists, existing data is truncated. 
 - 
Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(const std::string &path) override¶
- Open an output stream for appending. - If the target doesn’t exist, a new empty file is created. 
 - Public Static Functions - 
Result<std::shared_ptr<HadoopFileSystem>> Make(const HdfsOptions &options)¶
- Create a HadoopFileSystem instance from the given options.