Amazon Simple Storage Service (S3)
Storing artifacts in an AWS S3 bucket.
The S3 Artifact Store is an Artifact Store flavor provided with the S3 ZenML integration that uses the AWS S3 managed object storage service or one of the self-hosted S3 alternatives, such as MinIO or Ceph RGW, to store artifacts in an S3-compatible object storage backend.
Running ZenML pipelines with the local Artifact Store is usually sufficient if you just want to evaluate ZenML or get started quickly without incurring the trouble and cost of cloud storage services in your stack. However, the local Artifact Store becomes insufficient or unsuitable if you have more elaborate needs for your project:
if you want to share your pipeline run results with other team members or stakeholders inside or outside your organization
if you have other components in your stack that are running remotely (e.g. a Kubeflow or Kubernetes Orchestrator running in a public cloud)
if you outgrow what your local machine can offer in terms of storage space and need to use some form of private or public storage service that is shared with others
if you are running pipelines at scale and need an Artifact Store that can handle the demands of production-grade MLOps
In all these cases, you need an Artifact Store that is backed by a form of public cloud or self-hosted shared object storage service.
You should use the S3 Artifact Store when you decide to keep your ZenML artifacts in shared object storage and you have access to the AWS S3 managed service or one of the S3-compatible alternatives (e.g. MinIO, Ceph RGW). You should consider one of the other Artifact Store flavors if you don't have access to an S3-compatible service.
The S3 Artifact Store flavor is provided by the S3 ZenML integration. You need to install it on your local machine to be able to register an S3 Artifact Store and add it to your stack:
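The installation step can be sketched as the following command, assuming the ZenML CLI is already installed and on your PATH (the -y flag skips the confirmation prompt):

```shell
# Install the Python packages required by the S3 integration
zenml integration install s3 -y
```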
The only configuration parameter mandatory for registering an S3 Artifact Store is the root path URI, which needs to point to an S3 bucket and take the form s3://bucket-name. Please read the documentation relevant to the S3 service that you are using on how to create an S3 bucket, for example the AWS S3 documentation.
With the URI to your S3 bucket known, registering an S3 Artifact Store and using it in a stack can be done as follows:
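A minimal sketch of the registration, in which s3_store, custom_stack and bucket-name are placeholder names and the stack reuses the default orchestrator purely for illustration:

```shell
# Register the S3 Artifact Store, pointing it at the bucket root path
zenml artifact-store register s3_store -f s3 --path=s3://bucket-name

# Register a stack that uses the new Artifact Store and make it active
zenml stack register custom_stack -o default -a s3_store --set
```

Replace the orchestrator and add any other components your stack needs.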
Certain dashboard functionality, such as visualizing or deleting artifacts, is not available when using an implicitly authenticated artifact store together with a deployed ZenML server because the ZenML server will not have permission to access the filesystem.
The implicit authentication method also needs to be coordinated with other stack components that are highly dependent on the Artifact Store and need to interact with it directly to work. If these components are not running on your machine, they do not have access to the local AWS CLI configuration and will encounter authentication failures while trying to access the S3 Artifact Store.
s3_additional_kwargs: advanced parameters that are used when calling the S3 API, typically used for things like ServerSideEncryption and ACL.
To include these advanced parameters in your Artifact Store configuration, pass them using JSON format during registration, e.g.:
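For instance, a sketch that requests server-side encryption for written objects could look like this. The names s3_store and bucket-name are placeholders, and the flag name is assumed to mirror the s3_additional_kwargs configuration attribute:

```shell
# Pass the advanced parameters as a JSON object (single-quoted for the shell)
zenml artifact-store register s3_store -f s3 \
    --path='s3://bucket-name' \
    --s3_additional_kwargs='{"ServerSideEncryption": "AES256"}'
```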
Depending on your use case, however, you may also need to provide additional configuration parameters pertaining to authentication or advanced settings to match your S3-compatible service or deployment scenario.
Integrating and using an S3-compatible Artifact Store in your pipelines is not possible without employing some form of authentication. If you're looking for a quick way to get started locally, you can use the Implicit Authentication method. However, the recommended way to authenticate to the AWS cloud platform is through an AWS Service Connector. This is particularly useful if you are configuring ZenML stacks that combine the S3 Artifact Store with other remote stack components also running in AWS.
The Implicit Authentication method uses the implicit AWS authentication available in the environment where the ZenML code is running. On your local machine, this is the quickest way to configure an S3 Artifact Store. You don't need to supply credentials explicitly when you register the S3 Artifact Store, as it leverages the local credentials and configuration that the AWS CLI stores on your local machine. However, you will need to install and set up the AWS CLI on your machine as a prerequisite, as covered in the AWS CLI documentation, before you register the S3 Artifact Store.
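As a rough sketch of that prerequisite, the following AWS CLI commands set up local credentials and confirm they work. The bucket name is a placeholder, and the check assumes your identity is allowed to list that bucket:

```shell
# Store credentials and a default region under ~/.aws (interactive prompts)
aws configure

# Show which AWS identity the local credentials resolve to
aws sts get-caller-identity

# Confirm that this identity can list the target bucket
aws s3 ls s3://bucket-name
```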
Orchestrators need to access the Artifact Store to manage pipeline artifacts
Step Operators need to access the Artifact Store to manage step-level artifacts
Model Deployers need to access the Artifact Store to load served models
To enable these use cases, it is recommended to use an AWS Service Connector to link your S3 Artifact Store to the remote S3 bucket.
To set up the S3 Artifact Store to authenticate to AWS and access an S3 bucket, it is recommended to leverage the many features provided by the AWS Service Connector, such as auto-configuration, best security practices regarding long-lived credentials, fine-grained access control, and reusing the same credentials across multiple stack components.
A non-interactive CLI example that leverages the AWS CLI configuration on your local machine to auto-configure an AWS Service Connector targeting a single S3 bucket is:
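A sketch of such a command follows. The connector name s3-connector, the store name s3_store and the bucket name are placeholders, and the flags reflect the ZenML CLI as commonly documented, so check them against your installed version:

```shell
# Auto-configure an AWS Service Connector scoped to one bucket,
# using the credentials found in the local AWS CLI configuration
zenml service-connector register s3-connector \
    --type aws \
    --resource-type s3-bucket \
    --resource-id s3://bucket-name \
    --auto-configure

# Link an already registered S3 Artifact Store to the connector
zenml artifact-store connect s3_store --connector s3-connector
```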
Note: Please remember to grant the entity associated with your AWS credentials permissions to read and write to your S3 bucket as well as to list accessible S3 buckets. For a full list of permissions required to use an AWS Service Connector to access one or more S3 buckets, please refer to the AWS Service Connector documentation or read the documentation available in the interactive CLI commands and dashboard. The AWS Service Connector supports multiple authentication methods with different levels of security and convenience. You should pick the one that best fits your use case.
When you register the S3 Artifact Store, you can generate an AWS access key, store it in a ZenML Secret and then reference it in the Artifact Store configuration.
After having set up the IAM user and generated the access key, as described in the AWS documentation, you can register the S3 Artifact Store as follows:
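A sketch of the two steps, where s3_secret, s3_store and bucket-name are placeholder names and the angle-bracket values stand for your own access key:

```shell
# Store the AWS access key in a ZenML secret
zenml secret create s3_secret \
    --aws_access_key_id='<YOUR_S3_ACCESS_KEY_ID>' \
    --aws_secret_access_key='<YOUR_S3_SECRET_KEY>'

# Register the S3 Artifact Store and reference the secret by name
zenml artifact-store register s3_store -f s3 \
    --path='s3://bucket-name' \
    --authentication_secret=s3_secret
```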
The S3 Artifact Store accepts a range of advanced configuration options that can be used to further customize how ZenML connects to the S3 storage service that you are using. These are accessible via the client_kwargs, config_kwargs and s3_additional_kwargs configuration attributes and are passed transparently to the underlying S3Fs library:
client_kwargs: arguments that will be transparently passed to the botocore client. You can use it to configure parameters like endpoint_url and region_name when connecting to an S3-compatible endpoint (e.g. MinIO).
config_kwargs: advanced parameters passed to botocore.client.Config.
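As an illustration, connecting to a self-hosted MinIO deployment might look like the sketch below. The endpoint URL, region, bucket and component names are hypothetical, and s3_secret is assumed to be an existing ZenML secret holding the MinIO access key:

```shell
# Point the S3 client at a custom endpoint through client_kwargs (JSON)
zenml artifact-store register minio_store -f s3 \
    --path='s3://minio_bucket' \
    --authentication_secret=s3_secret \
    --client_kwargs='{"endpoint_url": "http://minio.cluster.local:9000", "region_name": "us-east-1"}'
```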
For more up-to-date information on the S3 Artifact Store implementation and its configuration, you can have a look at the SDK docs.
Aside from the fact that the artifacts are stored in an S3-compatible backend, using the S3 Artifact Store is no different from using any other Artifact Store flavor.