File systems

A file system block is an object that allows you to read data from and write data to paths. Prefect provides multiple built-in file system types that cover a wide range of use cases.

Additional file system types are available in Prefect Collections.

Local file system

The LocalFileSystem block enables interaction with the files in your current development environment.

LocalFileSystem properties include:

Property | Description
basepath | String path to the location of files on the local filesystem. Access to files outside of the base path will not be allowed.

from prefect.filesystems import LocalFileSystem

fs = LocalFileSystem(basepath="/foo/bar")

Limited access to local file system

Be aware that LocalFileSystem access is limited to the exact path provided. This file system may not be ideal for some use cases: the execution environment for your workflows may not have the same file system as the environment in which you write and deploy your code.

Use of this file system can limit the availability of results after a flow run has completed or prevent the code for a flow from being retrieved successfully at the start of a run.
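The base-path restriction can be pictured with a short standard-library sketch. This is not Prefect's implementation; resolve_within is a hypothetical helper that only illustrates the idea of rejecting paths that escape the base path.

```python
from pathlib import Path


def resolve_within(basepath: str, path: str) -> Path:
    """Resolve `path` against `basepath`, refusing anything outside it.

    Hypothetical helper for illustration; not part of Prefect.
    """
    base = Path(basepath).resolve()
    # Relative paths are joined onto the base; ".." segments are collapsed.
    candidate = (base / path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{candidate} is outside of the base path {base}")
    return candidate


print(resolve_within("/foo/bar", "data/results.json"))
try:
    resolve_within("/foo/bar", "../secrets.txt")
except ValueError as exc:
    print(f"rejected: {exc}")
```

The same check also shows why a base path that exists on your development machine may resolve to nothing useful in a different execution environment.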

Remote file system

The RemoteFileSystem block enables interaction with arbitrary remote file systems. Under the hood, RemoteFileSystem uses fsspec and supports any file system that fsspec supports.

RemoteFileSystem properties include:

Property | Description
basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed.
settings | Dictionary containing extra parameters required to access the remote file system.

The file system is specified using a protocol:

  • s3://my-bucket/my-folder/ will use S3
  • gcs://my-bucket/my-folder/ will use GCS
  • az://my-bucket/my-folder/ will use Azure
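To make the protocol rule concrete, the sketch below extracts the protocol prefix from a base path using only the standard library. protocol_of is a hypothetical helper, not a Prefect or fsspec function; fsspec performs its own parsing internally.

```python
from urllib.parse import urlsplit


def protocol_of(basepath: str) -> str:
    """Return the protocol prefix of a base path, or "" if there is none."""
    return urlsplit(basepath).scheme


for basepath in (
    "s3://my-bucket/my-folder/",
    "gcs://my-bucket/my-folder/",
    "az://my-bucket/my-folder/",
):
    print(basepath, "->", protocol_of(basepath))
```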

For example, to use it with Amazon S3:

from prefect.filesystems import RemoteFileSystem

block = RemoteFileSystem(basepath="s3://my-bucket/folder/")
block.save("dev")

You may need to install additional libraries to use some remote storage types.
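One way to check up front whether the backing library for a protocol is importable is sketched below. The protocol-to-package mapping reflects the libraries named elsewhere on this page (s3fs, gcsfs, adlfs); REQUIRED_PACKAGE and is_installed are hypothetical names used for illustration only.

```python
from importlib.util import find_spec

# Protocol -> package that backs it, as named elsewhere on this page.
REQUIRED_PACKAGE = {"s3": "s3fs", "gcs": "gcsfs", "az": "adlfs"}


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in this environment."""
    return find_spec(package) is not None


for protocol, package in REQUIRED_PACKAGE.items():
    if is_installed(package):
        status = "installed"
    else:
        status = f"missing (pip install {package})"
    print(f"{protocol}:// -> {package}: {status}")
```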

RemoteFileSystem examples

How can we use RemoteFileSystem to store our flow code? The following is a use case where we use MinIO as a storage backend:

from prefect.filesystems import RemoteFileSystem

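minio_block = RemoteFileSystem(
    basepath="s3://my-bucket",
    settings={
        # Replace with the values of your MinIO root user and password.
        "key": "MINIO_ROOT_USER",
        "secret": "MINIO_ROOT_PASSWORD",
        # Point the S3 client at the MinIO server instead of AWS.
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)
# Save the block configuration under the name "minio".
minio_block.save("minio")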

S3

The S3 file system block enables interaction with Amazon S3. Under the hood, S3 uses s3fs.

S3 properties include:

Property | Description
basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed.
aws_access_key_id | AWS Access Key ID
aws_secret_access_key | AWS Secret Access Key

To create a block:

from prefect.filesystems import S3

block = S3(basepath="my-bucket/folder/")
block.save("dev")

To use it in a deployment:

prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb s3/dev

You need to install s3fs to use it.

GCS

The GCS file system block enables interaction with Google Cloud Storage. Under the hood, GCS uses gcsfs.

GCS properties include:

Property | Description
basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed.
service_account_info | The contents of a service account keyfile as a JSON string.
project | The project the GCS bucket resides in. If not provided, the project will be inferred from the credentials or environment.

To create a block:

from prefect.filesystems import GCS

block = GCS(basepath="my-bucket/folder/")
block.save("dev")

To use it in a deployment:

prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb gcs/dev

You need to install gcsfs to use it.

Azure

The Azure file system block enables interaction with Azure Datalake and Azure Blob Storage. Under the hood, Azure uses adlfs.

Azure properties include:

Property | Description
basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed.
azure_storage_connection_string | Azure storage connection string.
azure_storage_account_name | Azure storage account name.
azure_storage_account_key | Azure storage account key.

To create a block:

from prefect.filesystems import Azure

block = Azure(basepath="my-bucket/folder/")
block.save("dev")

To use it in a deployment:

prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb az/dev

You need to install adlfs to use it.

Handling credentials for cloud object storage services

If you use the S3, GCS, or Azure storage blocks and don't explicitly configure credentials on the block, the credentials are inferred from the environment. Make sure credentials are available in both the build and runtime environments for your deployments, either set explicitly on the block or provided through environment variables, configuration files, or IAM roles.
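As a sketch of the explicit option, the helper below collects AWS credentials from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables into keyword arguments named after the S3 block properties listed earlier. s3_credentials_from_env is a hypothetical helper, not part of Prefect, and the commented usage assumes Prefect and s3fs are installed.

```python
import os
from typing import Dict, Mapping


def s3_credentials_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build S3 block keyword arguments from AWS environment variables.

    Returns an empty dict when either variable is unset, so that the
    block is left to infer credentials from the environment.
    """
    key_id = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        return {}
    return {"aws_access_key_id": key_id, "aws_secret_access_key": secret}


# Assumed usage, given that Prefect and s3fs are installed:
#   from prefect.filesystems import S3
#   block = S3(basepath="my-bucket/folder/", **s3_credentials_from_env())
#   block.save("dev")
kwargs = s3_credentials_from_env(
    {"AWS_ACCESS_KEY_ID": "example-id", "AWS_SECRET_ACCESS_KEY": "example-secret"}
)
print(sorted(kwargs))
```

Returning an empty dict when a variable is missing keeps the two modes separate: either both values are set on the block, or none are and inference takes over.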

Filesystem-specific package dependencies in Docker images

The core package and Prefect base images don't include filesystem-specific package dependencies such as s3fs, gcsfs or adlfs. To solve that problem in dockerized deployments, you can leverage the EXTRA_PIP_PACKAGES environment variable. Those dependencies will be installed at runtime within your Docker container or Kubernetes Job before the flow starts running.

Here is an example showing how you can specify that in your deployment YAML manifest:

infrastructure:
  type: docker-container
  env:
    EXTRA_PIP_PACKAGES: s3fs  # could be gcsfs, adlfs, etc.

Saving and loading file systems

Configuration for a file system can be saved to the Prefect API. For example:

from prefect.filesystems import RemoteFileSystem

fs = RemoteFileSystem(basepath="s3://my-bucket/folder/")
fs.write_path("foo", b"hello")
fs.save("dev-s3")

This file system can be retrieved for later use with load.

fs = RemoteFileSystem.load("dev-s3")
fs.read_path("foo")  # b'hello'

Readable and writable file systems

Prefect provides two abstract file system types, ReadableFileSystem and WritableFileSystem.

  • All readable file systems must implement read_path, which takes a file path to read content from and returns bytes.
  • All writable file systems must implement write_path, which takes a file path and content and writes the content to the file as bytes.

A file system may implement both of these types.
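The read_path and write_path contract can be illustrated with a small in-memory file system. The two abstract classes below are simplified stand-ins written for this sketch, not Prefect's own base classes (which may differ, for example by being asynchronous), and InMemoryFileSystem is a hypothetical name.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ReadableSketch(ABC):
    """Stand-in for a readable file system interface."""

    @abstractmethod
    def read_path(self, path: str) -> bytes:
        ...


class WritableSketch(ABC):
    """Stand-in for a writable file system interface."""

    @abstractmethod
    def write_path(self, path: str, content: bytes) -> None:
        ...


class InMemoryFileSystem(ReadableSketch, WritableSketch):
    """Implements both interfaces by keeping file contents in a dict."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read_path(self, path: str) -> bytes:
        return self._files[path]

    def write_path(self, path: str, content: bytes) -> None:
        self._files[path] = bytes(content)


fs = InMemoryFileSystem()
fs.write_path("foo", b"hello")
print(fs.read_path("foo"))  # b'hello'
```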