pyarrow.fs.copy_files

pyarrow.fs.copy_files(source, destination, source_filesystem=None, destination_filesystem=None, *, chunk_size=1048576, use_threads=True)[source]

Copy files between FileSystems.

This function allows you to recursively copy directories of files from one filesystem to another, such as from S3 to your local machine.

Parameters
source : str

Source file path or URI to a single file or directory. If a directory, files will be copied recursively from this path.

destination : str

Destination file path or URI. If source is a single file, destination is interpreted as the destination file path (not a directory). Directories will be created as necessary.

source_filesystem : FileSystem, optional

Source filesystem. Must be specified if source is not a URI; otherwise it is inferred from the URI.

destination_filesystem : FileSystem, optional

Destination filesystem. Must be specified if destination is not a URI; otherwise it is inferred from the URI.

chunk_size : int, default 1 MB

The maximum size of a block to read before flushing to the destination file. A larger chunk_size uses more memory while copying but may help accommodate high-latency filesystems (see the final example below).

use_threads : bool, default True

Whether to use multiple threads to accelerate copying.

Examples

Copy an S3 bucket’s files to a local directory:

>>> copy_files("s3://your-bucket-name", "local-directory")

Using a FileSystem object:

>>> copy_files("your-bucket-name", "local-directory",
...            source_filesystem=S3FileSystem(...))