Filesystems

Interface

enum arrow::fs::FileType

EXPERIMENTAL: FileSystem entry type.

Values:

NonExistent

Entry does not exist.

Unknown

Entry exists but its type is unknown.

This can designate a special file such as a Unix socket or character device, or Windows NUL / CON / …

File

Entry is a regular file.

Directory

Entry is a directory.

struct FileStats

EXPERIMENTAL: FileSystem entry stats.

Public Functions

FileType type() const

The file type.

std::string path() const

The full file path in the filesystem.

std::string base_name() const

The file base name (component after the last directory separator).

int64_t size() const

The size in bytes, if available.

Only regular files are guaranteed to have a size.

std::string extension() const

The file extension (excluding the dot).

TimePoint mtime() const

The time of last modification, if available.
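As a rough sketch of how base_name() and extension() relate to path(), here are two hypothetical string helpers (BaseName, Extension) written from the descriptions above. The base name is everything after the last '/', and the extension is whatever follows the last dot of the base name, without the dot. Treating a name with no dot as having an empty extension is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical helper: the component after the last '/' separator.
std::string BaseName(const std::string& path) {
  const auto slash = path.find_last_of('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical helper: text after the last '.' in the base name, excluding the dot.
// Assumption: a base name without a dot has an empty extension.
std::string Extension(const std::string& path) {
  const std::string base = BaseName(path);
  const auto dot = base.find_last_of('.');
  return dot == std::string::npos ? std::string() : base.substr(dot + 1);
}
```

For example, "logs/archive.tar.gz" yields the base name "archive.tar.gz" and the extension "gz".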

class FileSystem

EXPERIMENTAL: abstract file system API.

Subclassed by arrow::fs::internal::MockFileSystem, arrow::fs::LocalFileSystem, arrow::fs::S3FileSystem, arrow::fs::SlowFileSystem, arrow::fs::SubTreeFileSystem

Public Functions

virtual Status GetTargetStats(const std::string &path, FileStats *out) = 0

Get statistics for the given target.

Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file yields an Ok status with a FileType of NonExistent. An error status indicates a truly exceptional condition (low-level I/O error, etc.).
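The type-detection contract above (symlinks followed, a missing target is not an error) can be modelled on a local disk with C++17 std::filesystem. This is a standalone sketch, not Arrow code: the FileType enum only mirrors the values listed earlier, ClassifyTarget is a hypothetical helper, and an empty std::optional stands in for an error Status.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

// Standalone mirror of the arrow::fs::FileType values, for illustration only.
enum class FileType { NonExistent, Unknown, File, Directory };

// Hypothetical model of GetTargetStats' type detection on a local disk.
// Returns std::nullopt only for a genuine I/O error (the "error status" case).
std::optional<FileType> ClassifyTarget(const std::filesystem::path& path) {
  std::error_code ec;
  // status() follows symlinks, so a dangling link reports not_found.
  const std::filesystem::file_status st = std::filesystem::status(path, ec);
  switch (st.type()) {
    case std::filesystem::file_type::not_found:
      return FileType::NonExistent;  // missing target: Ok, not an error
    case std::filesystem::file_type::regular:
      return FileType::File;
    case std::filesystem::file_type::directory:
      return FileType::Directory;
    case std::filesystem::file_type::none:
      return std::nullopt;  // attributes could not be determined: real error
    default:
      return FileType::Unknown;  // sockets, devices, FIFOs, ...
  }
}
```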

virtual Status GetTargetStats(const std::vector<std::string> &paths, std::vector<FileStats> *out)

Same, for many targets at once.

virtual Status GetTargetStats(const Selector &select, std::vector<FileStats> *out) = 0

Same, according to a selector.

The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see Selector::allow_non_existent.
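A rough local model of selector-based listing follows, assuming a selector carries a base directory, an allow_non_existent flag and a recursive flag. SelectorModel and ListEntries are hypothetical names, not Arrow's Selector API. The point is that the base directory itself never appears in the output, and that a missing base directory yields either an empty result or an error depending on allow_non_existent.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for the selector options discussed above.
struct SelectorModel {
  std::string base_dir;
  bool allow_non_existent = false;
  bool recursive = false;
};

// Lists entries under base_dir, never including base_dir itself.
// std::nullopt models an error status (e.g. a missing base_dir when not allowed).
std::optional<std::vector<std::string>> ListEntries(const SelectorModel& sel) {
  namespace stdfs = std::filesystem;
  std::error_code ec;
  if (!stdfs::is_directory(sel.base_dir, ec)) {
    if (!stdfs::exists(sel.base_dir, ec) && sel.allow_non_existent) {
      return std::vector<std::string>{};  // missing base: empty, successful result
    }
    return std::nullopt;
  }
  std::vector<std::string> out;
  if (sel.recursive) {
    for (const auto& entry : stdfs::recursive_directory_iterator(sel.base_dir)) {
      out.push_back(entry.path().generic_string());
    }
  } else {
    for (const auto& entry : stdfs::directory_iterator(sel.base_dir)) {
      out.push_back(entry.path().generic_string());
    }
  }
  return out;
}
```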

virtual Status CreateDir(const std::string &path, bool recursive = true) = 0

Create a directory and, if recursive is true, any missing parent directories.

This function succeeds if the directory already exists.
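The rule that an existing directory is not an error matches how C++17 std::filesystem treats directory creation, which makes for a compact local model. CreateDirModel is hypothetical, and reading recursive=true as "also create missing parents" is an assumption about the parameter.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical model of CreateDir's documented contract using std::filesystem.
// Assumption: recursive=true also creates missing parents; recursive=false does not.
// Returns true on success, including when the directory already exists.
bool CreateDirModel(const std::filesystem::path& path, bool recursive) {
  std::error_code ec;
  if (recursive) {
    std::filesystem::create_directories(path, ec);
  } else {
    std::filesystem::create_directory(path, ec);
  }
  // Both calls leave ec clear when path already exists as a directory.
  return !ec;
}
```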

virtual Status DeleteDir(const std::string &path) = 0

Delete a directory and its contents, recursively.

virtual Status DeleteDirContents(const std::string &path) = 0

Delete a directory’s contents, recursively.

Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“”) will wipe the entire filesystem tree.
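The difference from DeleteDir is that the directory itself survives. The hypothetical DeleteDirContentsModel below sketches that behavior on a local disk with std::filesystem; it is not Arrow's implementation and does not model the empty-path case.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

// Hypothetical model of DeleteDirContents on a local disk: removes every child
// of dir recursively but keeps dir itself. Returns false on the first failure.
bool DeleteDirContentsModel(const std::filesystem::path& dir) {
  std::error_code ec;
  // Collect children first: removing entries while iterating is unspecified.
  std::vector<std::filesystem::path> children;
  for (std::filesystem::directory_iterator it(dir, ec), end; !ec && it != end;
       it.increment(ec)) {
    children.push_back(it->path());
  }
  if (ec) return false;
  for (const auto& child : children) {
    std::filesystem::remove_all(child, ec);  // recursive delete of one child
    if (ec) return false;
  }
  return true;
}
```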

virtual Status DeleteFile(const std::string &path) = 0

Delete a file.

virtual Status DeleteFiles(const std::vector<std::string> &paths)

Delete many files.

The default implementation issues individual delete operations in sequence.
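A default implementation of this kind can be sketched as a plain loop over a single-file delete. DeleteFilesSequentially and its delete_one callable are hypothetical. Whether the real default stops at the first failure or keeps going is not stated here, so this sketch keeps going and reports every path that failed.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: issue one delete per path, in order.
// delete_one(path) returns true on success; failed paths are collected.
template <typename DeleteOne>
std::vector<std::string> DeleteFilesSequentially(const std::vector<std::string>& paths,
                                                 DeleteOne delete_one) {
  std::vector<std::string> failed;
  for (const auto& path : paths) {
    if (!delete_one(path)) {
      failed.push_back(path);
    }
  }
  return failed;
}
```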

virtual Status Move(const std::string &src, const std::string &dest) = 0

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).
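These rules can be approximated on a local disk with std::filesystem::rename, after explicitly rejecting a non-empty destination directory. MoveModel is a hypothetical helper. The remaining cases follow the platform's rename semantics, which mirrors the implementation-dependent clause above.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical model of Move's documented destination rules on a local disk.
// A non-empty destination directory is rejected up front; everything else is
// delegated to std::filesystem::rename, whose details are platform-dependent.
std::error_code MoveModel(const std::filesystem::path& src,
                          const std::filesystem::path& dest) {
  std::error_code ec;
  if (std::filesystem::is_directory(dest, ec)) {
    const bool empty = std::filesystem::is_empty(dest, ec);
    if (ec) return ec;
    if (!empty) return std::make_error_code(std::errc::directory_not_empty);
  }
  std::filesystem::rename(src, dest, ec);  // clears ec on success
  return ec;
}
```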

virtual Status CopyFile(const std::string &src, const std::string &dest) = 0

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

virtual Status OpenInputStream(const std::string &path, std::shared_ptr<io::InputStream> *out) = 0

Open an input stream for sequential reading.

virtual Status OpenInputFile(const std::string &path, std::shared_ptr<io::RandomAccessFile> *out) = 0

Open an input file for random access reading.

virtual Status OpenOutputStream(const std::string &path, std::shared_ptr<io::OutputStream> *out) = 0

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

virtual Status OpenAppendStream(const std::string &path, std::shared_ptr<io::OutputStream> *out) = 0

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.
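The truncate-versus-append distinction between OpenOutputStream and OpenAppendStream maps onto the standard std::ios::trunc and std::ios::app open modes. The three functions below are hypothetical local-file stand-ins used to show the two behaviors, not Arrow streams.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical model of OpenOutputStream: any existing data is truncated.
void WriteTruncating(const std::string& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << data;
}  // stream flushed and closed here

// Hypothetical model of OpenAppendStream: the file is created if missing,
// otherwise new bytes go after the existing ones.
void WriteAppending(const std::string& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary | std::ios::app);
  out << data;
}

// Reads the whole file back, to check the two behaviors.
std::string ReadAll(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::istreambuf_iterator<char> begin(in), end;
  return std::string(begin, end);
}
```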

Concrete implementations

class SubTreeFileSystem : public arrow::fs::FileSystem

EXPERIMENTAL: a FileSystem implementation that delegates to another implementation after prepending a fixed base path.

This is useful to expose a logical view of a subtree of a filesystem, for example a directory in a LocalFileSystem. It makes no security guarantee: for example, symlinks may allow escaping the subtree and accessing other parts of the underlying filesystem.
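The path mapping itself can be pictured as simple string prefixing. PrependBase is a hypothetical helper, not Arrow's code; it only normalizes trailing slashes on the base. The last test case shows why lexical prefixing is not a sandbox: this sketch leaves ".." segments untouched (whether Arrow rejects them is not covered here).

```cpp
#include <string>

// Hypothetical model of SubTreeFileSystem's path mapping: the fixed base path
// is prepended to every path before delegating to the wrapped filesystem.
std::string PrependBase(std::string base, const std::string& path) {
  while (!base.empty() && base.back() == '/') {
    base.pop_back();  // normalize "data/tenant1/" to "data/tenant1"
  }
  if (path.empty()) {
    return base;  // the subtree root itself
  }
  return base + "/" + path;
}
```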

Public Functions

Status GetTargetStats(const std::string &path, FileStats *out)

Get statistics for the given target.

Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file yields an Ok status with a FileType of NonExistent. An error status indicates a truly exceptional condition (low-level I/O error, etc.).

Status GetTargetStats(const Selector &select, std::vector<FileStats> *out)

Same, according to a selector.

The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see Selector::allow_non_existent.

Status CreateDir(const std::string &path, bool recursive = true)

Create a directory and, if recursive is true, any missing parent directories.

This function succeeds if the directory already exists.

Status DeleteDir(const std::string &path)

Delete a directory and its contents, recursively.

Status DeleteDirContents(const std::string &path)

Delete a directory’s contents, recursively.

Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“”) will wipe the entire filesystem tree.

Status DeleteFile(const std::string &path)

Delete a file.

Status Move(const std::string &src, const std::string &dest)

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Status CopyFile(const std::string &src, const std::string &dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Status OpenInputStream(const std::string &path, std::shared_ptr<io::InputStream> *out)

Open an input stream for sequential reading.

Status OpenInputFile(const std::string &path, std::shared_ptr<io::RandomAccessFile> *out)

Open an input file for random access reading.

Status OpenOutputStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Status OpenAppendStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

class LocalFileSystem : public arrow::fs::FileSystem

EXPERIMENTAL: a FileSystem implementation accessing files on the local machine.

This class handles only /-separated paths. If desired, conversion from Windows backslash-separated paths should be done by the caller. Details such as symlinks are abstracted away (symlinks are always followed, except when deleting an entry).
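Since the conversion is the caller's job, a one-line helper is often enough. ToSlashSeparated is hypothetical and only swaps separators; it does not handle drive letters, UNC prefixes or other Windows path peculiarities.

```cpp
#include <algorithm>
#include <string>

// Hypothetical caller-side helper: LocalFileSystem only understands
// '/'-separated paths, so Windows separators are converted beforehand.
std::string ToSlashSeparated(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}
```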

Public Functions

Status GetTargetStats(const std::string &path, FileStats *out)

Get statistics for the given target.

Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file yields an Ok status with a FileType of NonExistent. An error status indicates a truly exceptional condition (low-level I/O error, etc.).

Status GetTargetStats(const Selector &select, std::vector<FileStats> *out)

Same, according to a selector.

The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see Selector::allow_non_existent.

Status CreateDir(const std::string &path, bool recursive = true)

Create a directory and, if recursive is true, any missing parent directories.

This function succeeds if the directory already exists.

Status DeleteDir(const std::string &path)

Delete a directory and its contents, recursively.

Status DeleteDirContents(const std::string &path)

Delete a directory’s contents, recursively.

Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“”) will wipe the entire filesystem tree.

Status DeleteFile(const std::string &path)

Delete a file.

Status Move(const std::string &src, const std::string &dest)

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Status CopyFile(const std::string &src, const std::string &dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Status OpenInputStream(const std::string &path, std::shared_ptr<io::InputStream> *out)

Open an input stream for sequential reading.

Status OpenInputFile(const std::string &path, std::shared_ptr<io::RandomAccessFile> *out)

Open an input file for random access reading.

Status OpenOutputStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Status OpenAppendStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

struct S3Options

Options for the S3FileSystem implementation.

Public Functions

void ConfigureDefaultCredentials()

Configure with the default AWS credentials provider chain.

void ConfigureAccessKey(const std::string &access_key, const std::string &secret_key)

Configure with explicit access and secret key.

Public Members

std::string region = kS3DefaultRegion

AWS region to connect to (default “us-east-1”).

std::string endpoint_override

If non-empty, override region with a connect string such as “localhost:9000”.

std::string scheme = "https"

S3 connection transport, default “https”.

std::shared_ptr<Aws::Auth::AWSCredentialsProvider> credentials_provider

AWS credentials provider.

bool background_writes = true

Whether OutputStream writes will be issued in the background, without blocking.

Public Static Functions

static S3Options Defaults()

Initialize with the default credentials provider chain.

This is recommended if you use the standard AWS environment variables and/or configuration file.

static S3Options FromAccessKey(const std::string &access_key, const std::string &secret_key)

Initialize with explicit access and secret key.

class S3FileSystem : public arrow::fs::FileSystem

S3-backed FileSystem implementation.

Some implementation notes:

  • buckets are special and the operations available on them may be limited or more expensive than desired.

Public Functions

Status GetTargetStats(const std::string &path, FileStats *out)

Get statistics for the given target.

Any symlink is automatically dereferenced, recursively. A nonexistent or unreachable file yields an Ok status with a FileType of NonExistent. An error status indicates a truly exceptional condition (low-level I/O error, etc.).

Status GetTargetStats(const Selector &select, std::vector<FileStats> *out)

Same, according to a selector.

The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, see Selector::allow_non_existent.

Status CreateDir(const std::string &path, bool recursive = true)

Create a directory and, if recursive is true, any missing parent directories.

This function succeeds if the directory already exists.

Status DeleteDir(const std::string &path)

Delete a directory and its contents, recursively.

Status DeleteDirContents(const std::string &path)

Delete a directory’s contents, recursively.

Like DeleteDir, but doesn’t delete the directory itself. Passing an empty path (“”) will wipe the entire filesystem tree.

Status DeleteFile(const std::string &path)

Delete a file.

Status Move(const std::string &src, const std::string &dest)

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Status CopyFile(const std::string &src, const std::string &dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Status OpenInputStream(const std::string &path, std::shared_ptr<io::InputStream> *out)

Create a sequential input stream for reading from an S3 object.

NOTE: Reads from the stream will be synchronous and unbuffered. You may want to wrap the stream in a BufferedInputStream or use a custom readahead strategy to avoid idle waits.
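Why buffering helps can be shown without S3 at all. With an unbuffered source, each small read costs a separate request, while a read-ahead buffer turns many small reads into a few large ones. CountingSource and BufferedReader are hypothetical stand-ins written for this sketch; in Arrow, the BufferedInputStream mentioned above plays the buffering role.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical unbuffered source: every Read() call counts as one remote request.
class CountingSource {
 public:
  explicit CountingSource(std::string data) : data_(std::move(data)) {}

  std::size_t Read(char* out, std::size_t n) {
    ++requests_;
    const std::size_t k = std::min(n, data_.size() - pos_);
    std::copy_n(data_.data() + pos_, k, out);
    pos_ += k;
    return k;  // 0 means end of stream
  }

  int requests() const { return requests_; }

 private:
  std::string data_;
  std::size_t pos_ = 0;
  int requests_ = 0;
};

// Minimal read-ahead wrapper: refills its buffer in large chunks and serves
// small reads from memory.
class BufferedReader {
 public:
  BufferedReader(CountingSource* source, std::size_t capacity)
      : source_(source), buffer_(capacity, '\0') {}

  std::size_t Read(char* out, std::size_t n) {
    std::size_t done = 0;
    while (done < n) {
      if (begin_ == end_) {  // buffer exhausted: issue one large request
        end_ = source_->Read(&buffer_[0], buffer_.size());
        begin_ = 0;
        if (end_ == 0) break;  // end of stream
      }
      const std::size_t k = std::min(n - done, end_ - begin_);
      std::copy_n(buffer_.data() + begin_, k, out + done);
      begin_ += k;
      done += k;
    }
    return done;
  }

 private:
  CountingSource* source_;
  std::string buffer_;
  std::size_t begin_ = 0;
  std::size_t end_ = 0;
};
```

Reading 100 bytes one at a time costs 100 requests on the raw source, but only 3 through a 64-byte buffer (two refills plus the final end-of-stream probe).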

Status OpenInputFile(const std::string &path, std::shared_ptr<io::RandomAccessFile> *out)

Create a random access file for reading from an S3 object.

See OpenInputStream for performance notes.

Status OpenOutputStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Create a sequential output stream for writing to an S3 object.

NOTE: Writes to the stream will be buffered. Depending on S3Options::background_writes, they can be synchronous or not. Enabling background_writes is recommended unless you implement your own background execution strategy.
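The effect of background_writes can be pictured as handing each buffered part to another thread instead of uploading it inline, with Close() waiting for everything still in flight. PartWriter is a hypothetical model built on std::async; it does not reflect how S3FileSystem actually schedules its uploads.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an output stream whose part uploads may run in the background.
class PartWriter {
 public:
  explicit PartWriter(bool background_writes) : background_(background_writes) {}

  // Hands one buffered part to the "uploader". In background mode this returns
  // immediately and the upload runs on another thread.
  void WritePart(std::string part) {
    auto upload = [this, part = std::move(part)] { Upload(part); };
    if (background_) {
      pending_.push_back(std::async(std::launch::async, upload));
    } else {
      upload();  // synchronous: the caller waits for the upload
    }
  }

  // Waits for all outstanding uploads, as closing a stream must.
  void Close() {
    for (auto& f : pending_) f.get();
    pending_.clear();
  }

  std::size_t uploaded_bytes() {
    std::lock_guard<std::mutex> lock(mu_);
    return uploaded_;
  }

 private:
  void Upload(const std::string& part) {
    std::lock_guard<std::mutex> lock(mu_);
    uploaded_ += part.size();  // stand-in for a network request
  }

  bool background_;
  std::mutex mu_;
  std::size_t uploaded_ = 0;
  // Declared last so it is destroyed first: std::async futures block in their
  // destructors, so pending tasks finish while mu_ and uploaded_ still exist.
  std::vector<std::future<void>> pending_;
};
```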

Status OpenAppendStream(const std::string &path, std::shared_ptr<io::OutputStream> *out)

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

Public Static Functions

static Status Make(const S3Options &options, std::shared_ptr<S3FileSystem> *out)

Create an S3FileSystem instance from the given options.