pyarrow.fs.GcsFileSystem#
- class pyarrow.fs.GcsFileSystem(bool anonymous=False, *, access_token=None, target_service_account=None, credential_token_expiration=None, default_bucket_location='US', scheme=None, endpoint_override=None, default_metadata=None, retry_time_limit=None, project_id=None)#
- Bases: FileSystem

Google Cloud Storage (GCS) backed FileSystem implementation.

By default uses the process described in https://google.aip.dev/auth/4110 to resolve credentials. If not running on Google Cloud Platform (GCP), this generally requires the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to a JSON file containing credentials.

Note: GCS buckets are special and the operations available on them may be limited or more expensive than expected compared to local file systems.

Note: When pickling a GcsFileSystem that uses default credentials, the resolved credentials are not stored in the serialized data. Therefore, when unpickling it is assumed that the necessary credentials are in place for the target process.

- Parameters:
- anonymous : bool, default False
- Whether to connect anonymously. If true, will not attempt to look up credentials using standard GCP configuration methods.
- access_token : str, default None
- GCP access token. If provided, temporary credentials will be fetched using this token; credential_token_expiration must also be specified.
- target_service_account : str, default None
- An optional service account to try to impersonate when accessing GCS. This requires the specified credential user or service account to have the necessary permissions.
- credential_token_expiration : datetime, default None
- Expiration for credentials generated with an access token. Must be specified if access_token is specified.
- default_bucket_location : str, default 'US'
- GCP region to create buckets in.
- scheme : str, default 'https'
- GCS connection transport scheme.
- endpoint_override : str, default None
- Override endpoint with a connect string such as "localhost:9000".
- default_metadata : mapping or pyarrow.KeyValueMetadata, default None
- Default metadata for open_output_stream. This will be ignored if non-empty metadata is passed to open_output_stream.
- retry_time_limit : timedelta, default None
- Set the maximum amount of time the GCS client will attempt to retry transient errors. Subsecond granularity is ignored.
- project_id : str, default None
- The GCP project identifier to use for creating buckets. If not set, the library uses the GOOGLE_CLOUD_PROJECT environment variable. Most I/O operations do not need a project id; only applications that create new buckets do.
 
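Examples

A minimal usage sketch (the emulator endpoint below is hypothetical): connect anonymously, or point the client at a local GCS emulator via scheme and endpoint_override:

>>> from pyarrow import fs
>>> gcs = fs.GcsFileSystem(anonymous=True)  # skip credential lookup entirely
>>> # Hypothetical local emulator (e.g. fake-gcs-server) listening on port 9001:
>>> emulated = fs.GcsFileSystem(anonymous=True, scheme='http',
...                             endpoint_override='localhost:9001')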
- __init__(*args, **kwargs)#

Methods

- __init__(*args, **kwargs)
- copy_file(self, src, dest): Copy a file.
- create_dir(self, path, *, bool recursive=True): Create a directory and subdirectories.
- delete_dir(self, path): Delete a directory and its contents, recursively.
- delete_dir_contents(self, path, *, ...): Delete a directory's contents, recursively.
- delete_file(self, path): Delete a file.
- equals(self, FileSystem other)
- from_uri(uri): Create a new FileSystem from URI or Path.
- get_file_info(self, paths_or_selector): Get info for the given files.
- move(self, src, dest): Move / rename a file or directory.
- normalize_path(self, path): Normalize filesystem path.
- open_append_stream(self, path[, ...]): Open an output stream for appending.
- open_input_file(self, path): Open an input file for random access reading.
- open_input_stream(self, path[, compression, ...]): Open an input stream for sequential reading.
- open_output_stream(self, path[, ...]): Open an output stream for sequential writing.

Attributes

- default_bucket_location: The GCP location this filesystem will write to.
- project_id: The GCP project id this filesystem will use.
- type_name: The filesystem's type name.

- copy_file(self, src, dest)#
- Copy a file.
- If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters:
- src : str
- The path of the file to be copied from.
- dest : str
- The destination path where the file is copied to.
- Examples

>>> local.copy_file(path,
...                 local_path + '/pyarrow-fs-example_copy.dat')

Inspect the file info:

>>> local.get_file_info(local_path + '/pyarrow-fs-example_copy.dat')
<FileInfo for '/.../pyarrow-fs-example_copy.dat': type=FileType.File, size=4>
>>> local.get_file_info(path)
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
 - create_dir(self, path, *, bool recursive=True)#
- Create a directory and subdirectories.
- This function succeeds if the directory already exists.
- Parameters:
- path : str
- The path of the new directory.
- recursive : bool, default True
- Create nested directories as well.
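As an illustrative sketch (the bucket name is hypothetical): with recursive=True, the default, intermediate levels are created as needed:

>>> gcs.create_dir('my-bucket/nested/dir')  # 'my-bucket' is hypothetical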
 - default_bucket_location#
- The GCP location this filesystem will write to. 
 - delete_dir(self, path)#
- Delete a directory and its contents, recursively.
- Parameters:
- path : str
- The path of the directory to be deleted.
 
 
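A sketch reusing the hypothetical bucket from create_dir above; the directory and everything below it are removed:

>>> gcs.delete_dir('my-bucket/nested')  # hypothetical path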
 - delete_dir_contents(self, path, *, bool accept_root_dir=False, bool missing_dir_ok=False)#
- Delete a directory's contents, recursively.
- Like delete_dir, but doesn't delete the directory itself.
- Parameters:
- path : str
- The path of the directory whose contents are to be deleted.
- accept_root_dir : bool, default False
- Allow deleting the root directory's contents (if path is empty or "/").
- missing_dir_ok : bool, default False
- If False, an error is raised if path does not exist.
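A sketch with the same hypothetical bucket; the directory itself is kept, and missing_dir_ok suppresses the error for absent paths:

>>> gcs.delete_dir_contents('my-bucket/nested')
>>> gcs.delete_dir_contents('my-bucket/maybe-gone', missing_dir_ok=True)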
 - equals(self, FileSystem other)#
- Parameters:
- other : pyarrow.fs.FileSystem
- The filesystem to compare against.
- Returns:
- are_equal : bool
 
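For illustration, a filesystem always equals itself and differs from another implementation:

>>> gcs.equals(gcs)
True
>>> gcs.equals(fs.LocalFileSystem())
False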
 - static from_uri(uri)#
- Create a new FileSystem from URI or Path.
- Recognized URI schemes are “file”, “mock”, “s3fs”, “gs”, “gcs”, “hdfs” and “viewfs”. In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters:
- uri : str
- URI-based path, for example: file:///some/local/path.
 
- Returns:
- tuple of (FileSystem, str path)
- With (filesystem, path) tuple where path is the abstract path inside the FileSystem instance.
 
- Examples

Create a new FileSystem subclass from a URI:

>>> uri = f'file:///{local_path}/pyarrow-fs-example.dat'
>>> local_new, path_new = fs.FileSystem.from_uri(uri)
>>> local_new
<pyarrow._fs.LocalFileSystem object at ...
>>> path_new
'/.../pyarrow-fs-example.dat'

Or from an S3 bucket:

>>> fs.FileSystem.from_uri("s3://usgs-landsat/collection02/")
(<pyarrow._s3fs.S3FileSystem object at ...>, 'usgs-landsat/collection02')

Or from an fsspec+ URI:

>>> fs.FileSystem.from_uri("fsspec+memory:///path/to/file")
(<pyarrow._fs.PyFileSystem object at ...>, '/path/to/file')
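For this class specifically, a gs:// (or gcs://) URI resolves to a GcsFileSystem; the bucket below is hypothetical:

>>> fs.FileSystem.from_uri("gs://my-bucket/some/path")
(<pyarrow._gcsfs.GcsFileSystem object at ...>, 'my-bucket/some/path')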
 - get_file_info(self, paths_or_selector)#
- Get info for the given files.
- Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object with a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters:
- paths_or_selector : FileSelector, path-like or list of path-likes
- Either a selector object, a path-like object or a list of path-like objects. The selector's base directory will not be part of the results, even if it exists. If it doesn't exist, use allow_not_found.
 
- Returns:
- FileInfo or list of FileInfo
- A single FileInfo object is returned for a single path; a list of FileInfo objects is returned for a list of paths or a selector.
- Examples

>>> local
<pyarrow._fs.LocalFileSystem object at ...>
>>> local.get_file_info(f"/{local_path}/pyarrow-fs-example.dat")
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
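A FileSelector lists a whole tree; a sketch against the local example directory used above:

>>> selector = fs.FileSelector(local_path, recursive=True)
>>> local.get_file_info(selector)
[<FileInfo for '/...': ...>, ...]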
 - move(self, src, dest)#
- Move / rename a file or directory.
- If the destination exists:
  - if it is a non-empty directory, an error is returned
  - otherwise, if it has the same type as the source, it is replaced
  - otherwise, behavior is unspecified (implementation-dependent).
- Parameters:
- src : str
- The path of the file or directory to be moved.
- dest : str
- The destination path where the file or directory is moved to.
- Examples

Create a new folder with a file:

>>> local.create_dir('/tmp/other_dir')
>>> local.copy_file(path, '/tmp/move_example.dat')

Move the file:

>>> local.move('/tmp/move_example.dat',
...            '/tmp/other_dir/move_example_2.dat')

Inspect the file info:

>>> local.get_file_info('/tmp/other_dir/move_example_2.dat')
<FileInfo for '/tmp/other_dir/move_example_2.dat': type=FileType.File, size=4>
>>> local.get_file_info('/tmp/move_example.dat')
<FileInfo for '/tmp/move_example.dat': type=FileType.NotFound>

Delete the folder:

>>> local.delete_dir('/tmp/other_dir')
 - normalize_path(self, path)#
- Normalize filesystem path.
- Parameters:
- path : str
- The path to normalize.
- Returns:
- normalized_path : str
- The normalized path.
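A small sketch; normalization is filesystem-specific, and a well-formed 'bucket/key' path is typically returned unchanged on GCS:

>>> gcs.normalize_path('my-bucket/dir/file')  # hypothetical path
'my-bucket/dir/file'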
 - open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
- Open an output stream for appending.
- If the target doesn't exist, a new empty file is created.

Note: Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.

- Parameters:
- path : str
- The source to open for writing.
- compression : str, optional, default 'detect'
- The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata : dict, optional, default None
- If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
 
- Returns:
- stream : NativeFile
 
- Examples

Append new data to a FileSystem subclass with a nonempty file:

>>> with local.open_append_stream(path) as f:
...     f.write(b'+newly added')
12

Print out the content of the file:

>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data+newly added'
 - open_input_file(self, path)#
- Open an input file for random access reading.
- Parameters:
- path : str
- The source to open for reading.
 
- Returns:
- stream : NativeFile
 
- Examples

Print the data from the file with open_input_file():

>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data'
 - open_input_stream(self, path, compression='detect', buffer_size=None)#
- Open an input stream for sequential reading.
- Parameters:
- path : str
- The source to open for reading.
- compression : str, optional, default 'detect'
- The compression algorithm to use for on-the-fly decompression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.
 
- Returns:
- stream : NativeFile
 
- Examples

Print the data from the file with open_input_stream():

>>> with local.open_input_stream(path) as f:
...     print(f.readall())
b'data'
 - open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
- Open an output stream for sequential writing.
- If the target already exists, existing data is truncated.
- Parameters:
- path : str
- The source to open for writing.
- compression : str, optional, default 'detect'
- The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int, optional, default None
- If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata : dict, optional, default None
- If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
 
- Returns:
- stream : NativeFile
 
- Examples

>>> local = fs.LocalFileSystem()
>>> with local.open_output_stream(path) as stream:
...     stream.write(b'data')
4
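Since GCS stores per-object metadata, a sketch (hypothetical bucket) of setting a Content-Type via the metadata argument; the constructor's default_metadata supplies the same keys when none are passed here:

>>> with gcs.open_output_stream('my-bucket/data.csv',
...                             metadata={'Content-Type': 'text/csv'}) as f:
...     f.write(b'a,b\n1,2\n')
8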
 - project_id#
- The GCP project id this filesystem will use. 
 - type_name#
- The filesystem’s type name. 