pyarrow.fs.copy_files

pyarrow.fs.copy_files(source, destination, source_filesystem=None, destination_filesystem=None, *, chunk_size=1048576, use_threads=True)

Copy files between FileSystems.

This function lets you recursively copy directories of files from one filesystem to another, for example from S3 to your local machine.

Parameters:

source (str): Source file path or URI of a single file or a directory. If it is a directory, files are copied recursively from this path.
destination (str): Destination file path or URI. If source is a file, destination is also interpreted as a file path (not a directory). Directories are created as necessary.
source_filesystem (FileSystem, optional): Source filesystem. Must be specified if source is not a URI; otherwise it is inferred from the URI.
destination_filesystem (FileSystem, optional): Destination filesystem. Must be specified if destination is not a URI; otherwise it is inferred from the URI.
chunk_size (int, default 1048576, i.e. 1 MiB): The maximum size of a block to read before flushing to the destination file. A larger chunk_size uses more memory while copying but may help with high-latency filesystems.
use_threads (bool, default True): Whether to use multiple threads to accelerate copying.

Examples

Inspect an S3 bucket’s files:

>>> from pyarrow import fs
>>> s3, path = fs.FileSystem.from_uri(
...            "s3://registry.opendata.aws/roda/ndjson/")
>>> selector = fs.FileSelector(path)
>>> s3.get_file_info(selector)
[<FileInfo for 'registry.opendata.aws/roda/ndjson/index.ndjson':...]

Copy one file from the S3 bucket to a local directory (local_path is assumed to be the path of an existing local directory):

>>> fs.copy_files("s3://registry.opendata.aws/roda/ndjson/index.ndjson",
...               "file:///{}/index_copy.ndjson".format(local_path))

>>> fs.LocalFileSystem().get_file_info(str(local_path)+
...                                    '/index_copy.ndjson')
<FileInfo for '.../index_copy.ndjson': type=FileType.File, size=...>

Copy a file, passing the source filesystem explicitly as a FileSystem object instead of a URI:

>>> fs.copy_files("registry.opendata.aws/roda/ndjson/index.ndjson",
...               "file:///{}/index_copy.ndjson".format(local_path),
...               source_filesystem=fs.S3FileSystem())
