Filesystem Interface (legacy)#


This section documents the deprecated filesystem layer. You should use the new filesystem layer instead.

Hadoop File System (HDFS)#

PyArrow comes with bindings to a C++-based interface to the Hadoop File System. You connect like so:

import pyarrow as pa
fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
with, 'rb') as f:
    # Do something with f

By default, pyarrow.hdfs.HadoopFileSystem uses libhdfs, a JNI-based interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your LD_LIBRARY_PATH), and relies on some environment variables.

  • HADOOP_HOME: the root of your installed Hadoop distribution. Often has lib/native/

  • JAVA_HOME: the location of your Java SDK installation.

  • ARROW_LIBHDFS_DIR (optional): explicit location of if it is installed somewhere other than $HADOOP_HOME/lib/native.

  • CLASSPATH: must contain the Hadoop jars. You can set these using:

export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`

If CLASSPATH is not set, then it will be set automatically if the hadoop executable is in your system path, or if HADOOP_HOME is set.


hdfs.connect([host, port, user, ...])

DEPRECATED: Connect to an HDFS cluster.

Return contents of file as a bytes object.

HadoopFileSystem.chmod(self, path, mode)

Change file permissions

HadoopFileSystem.chown(self, path[, owner, ...])

Change file permissions

HadoopFileSystem.delete(path[, recursive])

Delete the indicated file or directory.


Return free space on disk, like the UNIX df command


Compute bytes used by all contents under indicated path in file tree., path, stream)


Return True if path exists.


Get reported total capacity of file system


Get space used on file system, path)

Return detailed HDFS information for path[, detail])

Retrieve directory contents and metadata, if requested.

HadoopFileSystem.mkdir(path, **kwargs)

Create directory in HDFS., path[, mode, ...])

Open HDFS file for reading or writing

HadoopFileSystem.rename(path, new_path)

Rename file, like UNIX mv command.

HadoopFileSystem.rm(path[, recursive])

Alias for FileSystem.delete.

HadoopFileSystem.upload(self, path, stream)

Upload file-like object to HDFS path