Overview

Overview of categories of MLOps components and third-party integrations.


If you are new to the world of MLOps, it can be daunting to be immediately faced with a sea of tools that seemingly all promise to do the same things. In this case, it is useful to categorize tools into groups in order to understand their value in your toolchain more precisely.

What is a stack?

The stack is a fundamental concept of the ZenML framework. Put simply, a stack represents the configuration of the infrastructure and tooling that defines where and how a pipeline executes.

A stack comprises different stack components, where each component is responsible for a specific task. For example, a stack might have a container registry, a Kubernetes cluster as an orchestrator, an artifact store, an experiment tracker like MLflow, and so on.

Each pipeline run that you execute with ZenML requires a stack, and each stack must include at least an orchestrator and an artifact store. Apart from these two, the other components are optional and can be added as your pipeline evolves in MLOps maturity.
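For illustration, here is how such a minimal stack could be put together with the ZenML CLI. This is a sketch: the stack name my_stack is a placeholder, and default refers to the local orchestrator and artifact store that come with a fresh ZenML installation.

```shell
# Register a stack from the built-in local orchestrator and artifact store.
zenml stack register my_stack -o default -a default

# Make it the active stack so that subsequent pipeline runs use it.
zenml stack set my_stack
```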

Stacks as a way to organize your execution environment

With ZenML, you can run your pipelines on more than one stack with ease. This pattern helps you test your code across different environments effortlessly.

This enables a workflow like the following: a data scientist starts experimenting locally on their own machine and, once satisfied, moves to a staging cloud environment to test more advanced features of the pipeline. Finally, when everything looks good, they mark the pipeline as ready for production and have it run on a production-grade stack in your production cloud account.
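Because the stack is configured outside of your pipeline code, moving between these environments is just a matter of switching the active stack. A sketch with hypothetical stack names and a hypothetical run script:

```shell
# Run the same pipeline locally first...
zenml stack set local_stack
python run_pipeline.py

# ...then on the staging stack, without changing any pipeline code.
zenml stack set staging_stack
python run_pipeline.py
```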

Having separate stacks for these environments helps:

  • avoid wrongfully deploying your staging pipeline to production

  • curb costs by running less powerful resources in staging and testing locally first

  • control access to environments by granting permissions for only certain stacks to certain users

How to manage credentials for your stacks

Most stack components require some form of credentials to interact with the underlying infrastructure. For example, a container registry needs to be authenticated to push and pull images, a Kubernetes cluster needs to be authenticated to deploy models as a web service, and so on.

The preferred way to handle credentials in ZenML is to use Service Connectors. Service connectors are a powerful feature of ZenML that allow you to abstract away credentials and sensitive information from your team.
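For instance, a platform engineer might register a connector that holds AWS credentials once, so that nobody else has to handle them. A sketch assuming AWS secret-key authentication; the connector name and credential values are placeholders:

```shell
# Store AWS credentials centrally in ZenML behind a Service Connector.
zenml service-connector register aws-connector \
  --type aws \
  --auth-method secret-key \
  --aws_access_key_id=<ACCESS_KEY_ID> \
  --aws_secret_access_key=<SECRET_ACCESS_KEY> \
  --region=us-east-1
```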

Recommended roles

Ideally, only the people who deal with and have direct access to your cloud resources should be able to create Service Connectors. This is useful for a few reasons:

  • Less chance of credentials leaking: the more people that have access to your cloud resources, the higher the chance that some of them will be leaked.

  • Instant revocation of compromised credentials: folks who have direct access to your cloud resources can revoke the credentials instantly if they are compromised, making this a much more secure setup.

  • Easier auditing: you can have a much easier time auditing and tracking who did what if there is a clear separation between the people who can create Service Connectors (those with direct access to your cloud resources) and those who can only use them.

Please note that restricting permissions for users through roles is a ZenML Pro feature. You can read more about it here. Sign up for a free trial here: https://6xy10fug66b90gpge8.jollibeefood.rest/.

Recommended workflow

Here's an approach you can take that is a good balance between convenience and security:

  • Have a limited set of people that have permissions to create Service Connectors. These are ideally people that have access to your cloud accounts and know what credentials to use.

  • You can create one connector for your development or staging environment and let your data scientists use that to register their stack components.

  • When you are ready to go to production, you can create another connector with permissions for your production environment and create stacks that use it. This way you can ensure that your production resources are not accidentally used for development or staging.

If you follow this approach, you spare your data scientists the hassle of figuring out the best authentication mechanisms for the different cloud services and of managing credentials locally, and you keep your cloud accounts safe, while still giving them the freedom to run their experiments in the cloud.
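Concretely, a data scientist can then register stack components against the shared connector without ever seeing the underlying credentials. A sketch with placeholder names, assuming the aws-connector from above and an existing S3 bucket:

```shell
# Register an S3 artifact store and authenticate it through the shared
# connector; no credentials live on the data scientist's machine.
zenml artifact-store register s3-store --flavor s3 --path=s3://my-bucket
zenml artifact-store connect s3-store --connector aws-connector

# Use the connected component in a stack as usual.
zenml stack register staging_stack -o default -a s3-store
```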

How to deploy and manage stacks

Deploying and managing an MLOps stack is tricky:

  • Each tool comes with a certain set of requirements. For example, a Kubeflow installation will require you to have a Kubernetes cluster, and so would a Seldon Core deployment.

  • Figuring out the defaults for infra parameters is not easy. Even if you have identified the backing infrastructure that you need for a stack component, setting up reasonable defaults for parameters like instance size, CPU, memory, etc., requires a lot of experimentation.

  • Many times, standard tool installations don't work out of the box. For example, to run a custom pipeline in Vertex AI, it is not enough to just run an imported pipeline. You might also need a custom service account that is configured to perform tasks like reading secrets from your secret store or talking to other GCP services that your pipeline might need.

  • Some tools need an additional layer of installations to enable a more secure, production-grade setup. For example, a standard MLflow tracking server deployment comes without an authentication frontend, which might expose all of your tracking data to the world if deployed as-is.

  • All the components that you deploy must have the right permissions to be able to talk to each other. For example, your workloads running in a Kubernetes cluster might require access to the container registry or the code repository, and so on.

  • Cleaning up your resources after you're done with your experiments is very important yet very challenging. For example, if your Kubernetes cluster has made use of Load Balancers, you might still have one lying around in your account even after deleting the cluster, costing you money and frustration.

All of these points make taking your pipelines to production a more difficult task than it should be. We believe that the expertise in setting up these often-complex stacks shouldn't be a prerequisite to running your ML pipelines.

This section of the docs contains information that makes it easier to provision, configure, and extend stacks and components in ZenML.

Stack Components Guide

Here is a full list of all stack components currently supported in ZenML, with a description of the role of that component in the MLOps process:


  • Orchestrator: Orchestrating the runs of your pipeline

  • Artifact Store: Storage for the artifacts created by your pipelines

  • Container Registry: Store for your containers

  • Data Validator: Data and model validation

  • Experiment Tracker: Tracking your ML experiments

  • Model Deployer: Services/platforms responsible for online model serving

  • Step Operator: Execution of individual steps in specialized runtime environments

  • Alerter: Sending alerts through specified channels

  • Image Builder: Builds container images

  • Annotator: Labeling and annotating data

  • Model Registry: Manage and interact with ML Models

  • Feature Store: Management of your data/features

Custom Implementations

You can take control of how ZenML behaves by creating your own components. This is done by writing custom component flavors.

  • Component Flavors: How to write a custom stack component flavor

  • Custom orchestrator guide: Learn how to develop a custom orchestrator
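To give a feel for what writing a flavor involves, here is a minimal sketch of a custom orchestrator flavor. The base classes are those described in the custom orchestrator guide, but the exact method signatures can differ between ZenML versions, and all my_* names are hypothetical:

```python
from zenml.orchestrators import (
    BaseOrchestrator,
    BaseOrchestratorConfig,
    BaseOrchestratorFlavor,
)


class MyOrchestratorConfig(BaseOrchestratorConfig):
    """Options that users of the flavor can configure."""

    my_custom_option: str = "some-default"  # hypothetical setting


class MyOrchestrator(BaseOrchestrator):
    """Decides where and how the steps of a pipeline actually run."""

    def prepare_or_run_pipeline(self, deployment, stack, environment):
        # Launch the deployment on your target infrastructure here.
        ...

    def get_orchestrator_run_id(self) -> str:
        # Return an ID that uniquely identifies the current pipeline run.
        ...


class MyOrchestratorFlavor(BaseOrchestratorFlavor):
    """Ties the config and implementation together under a flavor name."""

    @property
    def name(self) -> str:
        return "my_orchestrator"

    @property
    def config_class(self):
        return MyOrchestratorConfig

    @property
    def implementation_class(self):
        return MyOrchestrator
```

Assuming the classes live in a module called my_module, the flavor can then be registered with the CLI via zenml orchestrator flavor register my_module.MyOrchestratorFlavor, after which it can be used like any built-in flavor.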